The Statistical Interpretation of Degrees of Freedom
Author(s): William J. Moonan
Source: The Journal of Experimental Education, Vol. 21, No. 3 (Mar., 1953), pp. 259-264
Published by: Taylor & Francis, Ltd.
Stable URL: http://www.jstor.org/stable/20153902
Accessed: 11/09/2011 01:58
THE STATISTICAL INTERPRETATION OF DEGREES OF FREEDOM

WILLIAM J. MOONAN
University of Minnesota
Minneapolis, Minnesota
1. Introduction
THE CONCEPT of "degrees
of freedom"
has
a very simple nature, but this simplicity
is not
in statistical
It
textbooks.
exemplified
generally
is the purpose of this paper to discuss
and define
the statistical
of degrees
of freedom and
aspects
of the term.
This
thereby clarify the meaning
a very elem
shall be accomplished
by considering
and pro
entary statistical
problem of estimation
onward through more difficult but com
gressing
mon problems
until finally a multivariate
prob
is used.
The available
literature which is devot
ed to degrees
of freedom
is very limited.
Some
are given in the bibliography
of these references
and they contain algebraic,
physical
geometrical,
and rational
The main emphasis
interpretations.
in this article will be found to be on discovering
the degrees
of freedom associated
with certain
standard errors of common and useful significance
and
tests,
that
for
some
models,
are
parameters
or indirectly,
estimated
directly
by certain d e
The procedures
grees of freedom.
given here
in the system of es
may be put forth completely
of least
timation which utilizes
the principle
The application
squares.
given here are special
cases of this system.
2.

In most statistical problems it is assumed that n random variables are available for some analysis. With these variables, it is possible to construct certain functions called statistics with which estimations and tests of hypotheses are made. Associated with these statistics are numbers of degrees of freedom. To elaborate and explain what this means, let us start out with a very simple situation.

Suppose we have two random variables, y1 and y2. If we pursue an objective of statistics which is called the reduction of data, we might construct the linear function Y1 = (1/2)y1 + (1/2)y2. This function estimates the mean of the population from which the random variables were drawn. For that matter, so does any other linear function of the form Y1 = a11 y1 + a12 y2, where the a's are equal real numbers. When the coefficients of the random variables are equal to the reciprocal of the number of them, the statistic defined is the sample mean. This statistic may be chosen here for logical reasons, but its specification really comes from the theory of estimation mentioned before. We also could construct another linear function of the random variables, Y2 = (1/2)y1 - (1/2)y2. This contrast statistic is a measure of how well our observations agree, since it yields a measure of the average difference of the variables. These statistics, Y1 and Y2, have the valuable property that they contain all the available information relevant to discerning characteristics of the population from which the y's were drawn. This is true because it is possible to reconstruct the original random variables from them. Clearly, y1 = Y1 + Y2 and y2 = Y1 - Y2. We discern that we have constructed a pair of statistics which are reducible to the original variables, but they state the information contained in the variables in a more useful form.

There are certain other characteristics worth noticing. The sum of the coefficients of the random variables of Y2 equals zero, and the sum of the products of the corresponding coefficients of the random variables of Y1 and Y2 equals zero. That is, (1/2)(1/2) + (1/2)(-1/2) = 0. This latter property is known as the quasi-orthogonality of Y1 and Y2. This property is analogous to the property of independence which is associated with the random variables.

In changing our random variables to the statistics we have performed a quasi-orthogonal transformation. Quasi-orthogonal transformations are of special interest because the statistics to which they lead have valuable properties. In particular, if our data are composed of random variables from a normal population, these statistics are independent in the probability sense, or in other words, they are stochastically independent (i.e., uncorrelated). That remark has a rational interpretation which says that the statistics used are not overlapping in the information they reveal about the data. As long as we preserve the property of orthogonality we will be able to reproduce the original random variables at will. This reproductive property is guaranteed when the coefficients of the random variables of the statistics are mutually orthogonal (i.e., every statistic is orthogonal to every other one); since the determinant of such coefficients does not vanish when this is true, our equations (statistics) have a solution which is the explicit designation of the original random variables. The determinant

(1)  | 1/2   1/2 |
     | 1/2  -1/2 |  = (1/2)(-1/2) - (1/2)(1/2) = -1/2

does not vanish for this problem. There is another valuable property of quasi-orthogonal transformations which we shall come to a little later.
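The two-variable transformation of this section can be checked numerically. The following is a minimal sketch (the function names are illustrative, not from the paper) of the statistics Y1 and Y2, their quasi-orthogonality, and the reconstruction y1 = Y1 + Y2, y2 = Y1 - Y2.

```python
# Sketch of the two-variable example: Y1 estimates the mean, Y2 is the contrast.
from fractions import Fraction as F

def transform(y1, y2):
    """Apply the quasi-orthogonal transformation of section 2."""
    Y1 = F(1, 2) * y1 + F(1, 2) * y2   # sample mean
    Y2 = F(1, 2) * y1 - F(1, 2) * y2   # contrast: half the difference
    return Y1, Y2

def reconstruct(Y1, Y2):
    """Recover the original observations: y1 = Y1 + Y2, y2 = Y1 - Y2."""
    return Y1 + Y2, Y1 - Y2

# Quasi-orthogonality: the coefficient vectors (1/2, 1/2) and (1/2, -1/2)
# have a zero sum of products, (1/2)(1/2) + (1/2)(-1/2) = 0.
a1 = (F(1, 2), F(1, 2))
a2 = (F(1, 2), -F(1, 2))
assert sum(p * q for p, q in zip(a1, a2)) == 0
```

Exact rational arithmetic (`fractions.Fraction`) is used so the identities hold exactly rather than up to rounding.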
3.

If we have three observations, we can construct three mutually quasi-orthogonal statistics. Again we might let Y1 be the mean of the random variables with Y2 and Y3 as contrast statistics. Specifically, let Y1 = (1/3)y1 + (1/3)y2 + (1/3)y3. There exist two other mutually quasi-orthogonal linear statistics which might be chosen, and it can be said that we enjoy the freedom of two choices in the statistics we actually use to summarize the data. We could let

(2)  Y2 = (1/2)y1 - (1/2)y2 + 0·y3;   Y3 = (1/3)y1 + (1/3)y2 - (2/3)y3

or,

(3)  Y2 = (1/3)y1 + (1/3)y2 - (2/3)y3;   Y3 = (1/2)y1 - (1/2)y2 + 0·y3.

(It can be shown that there exists an infinity of possible choices!) Either pair of the statistics which we have chosen together with Y1 can be shown to reproduce the random variables y1, y2 and y3. As a consequence, they possess all the information that the original variables do.

In general, if we have n random variables, we might then construct a statistic representing the sample mean (which estimates θ) and have n - 1 choices, or degrees of freedom, for other mutually quasi-orthogonal linear statistics to summarize the data. Each degree of freedom corresponds to a mutually quasi-orthogonal linear function of the random variables. In general, the term degree of freedom does not necessarily refer to a linear function which is orthogonal to all the others which are or may be constructed; however, in common usage it usually does refer to quasi-orthogonal linear functions. When the observational model we are working with contains only one parameter which is estimated by a linear function, there is little purpose in specifying the remaining degrees of freedom in the form of contrasts. For instance, if our model is yi = θ + ei, where ei is normally distributed with zero mean and variance σ², i.e., N(0, σ²), i = 1, ..., n, we would also like to estimate σ². Unfortunately, this parameter is not estimated directly by linear functions other than Y1.

Before proceeding, the other property of quasi-orthogonal transformations will be discussed. We might inquire about the relationship of the number called the sum of the squares of the yi's to the sum of squares of the Yj's. If we require this number to be invariant, then

(4)  Σ(j=1..n) Yj² = Σ(i=1..n) yi².

One can write, in matrix notation,

(5)  [Y1]   [a11  a12] [y1]
     [Y2] = [a21  a22] [y2],   or Y = Ay.

Now if Σ Yj² = Y'Y = (Ay)'(Ay) = y'A'Ay is to equal Σ yi² = y'y, then A'A must be a two row-two column matrix with ones in the main diagonal, i.e.,

A'A = [1  0]
      [0  1]

A matrix, A', which when multiplied by its transpose, A, equals a unit matrix, is called an orthogonal matrix, and the yi's which are transformed to the Yj's by this matrix are said to be orthogonally transformed. You will notice that the matrix of the coefficients of Y1 and Y2 of section 2 is not an orthogonal matrix, since

A'A = [1/2   1/2] [1/2   1/2]   [1/2   0 ]
      [1/2  -1/2] [1/2  -1/2] = [ 0   1/2]

If the coefficients of the Y's had been 1/√2's instead of 1/2's, then A' would be an orthogonal matrix. Because the matrices of our transformations do not fulfill the accepted mathematical definition of orthogonal transformations, but are very much like them, they are termed, for the purposes of this paper, quasi-orthogonal transformations. However, it seems unnatural to beginning students to define Y1 as

Y1 = (1/√2)y1 + (1/√2)y2.

Actually, for Y1 any linear function with positive and equal coefficients would serve as well as Y1 itself, for they would be logically and mathematically equivalent and reducible to the usual definition of the sample mean. If we are to use the common-sense statistics, obviously something must be done in order to preserve the property (4). One thing that can be done is to change our definition of what the sum of squares of the jth linear function, Yj, would be. Let us define the sum of squares associated with the linear function Yj to be

(6)  SS(Yj) = (a1j y1 + a2j y2 + ... + anj yn)² / (a1j² + a2j² + ... + anj²).
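Definition (6) and the invariance property (4) can be checked numerically. A minimal sketch, assuming the mean and contrast coefficients as reconstructed in (2); exact rational arithmetic avoids any rounding questions.

```python
# Sketch of definition (6): SS(Yj) = (a.y)^2 / (a.a), for the mean statistic
# and the two contrasts of (2); checks that property (4) is preserved.
from fractions import Fraction as F

def ss(a, y):
    """Sum of squares of a linear function per definition (6)."""
    num = sum(ai * yi for ai, yi in zip(a, y)) ** 2
    den = sum(ai * ai for ai in a)
    return num / den

# Coefficient vectors: the mean statistic and the two contrasts of (2).
a1 = (F(1, 3), F(1, 3), F(1, 3))
a2 = (F(1, 2), -F(1, 2), F(0))
a3 = (F(1, 3), F(1, 3), -F(2, 3))

# Property (4): for a full quasi-orthogonal set, the SS's sum to sum(y^2).
y = (F(1), F(2), F(3))
assert ss(a1, y) + ss(a2, y) + ss(a3, y) == sum(v * v for v in y)
</antml```

The assertion holds for any observations y, not only the ones shown; the contrast coefficients are the assumed ingredients here.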
Using this definition instead of just the numerator of it, property (4) will be preserved. As an illustration of this formula, let j = 1 and Y1 = (1/3)y1 + (1/3)y2 + (1/3)y3; then

(7)  SS(Y1) = ((1/3)y1 + (1/3)y2 + (1/3)y3)² / ((1/3)² + (1/3)² + (1/3)²) = (Σ(i=1..3) yi)²/3,

or, for n random variables, (Σ(i=1..n) yi)²/n. Further, if y1 = 24, y2 = 18 and y3 = 36, then SS(Y1) = 2028, and if we use (2), then SS(Y2) = 18 and SS(Y3) = 150. Note that SS(Y1) + SS(Y2) + SS(Y3) = 2196 and that

Σ yi² = 24² + 18² + 36² = 2196;

the sum of squares of the linear functions equals the sum of squares of the random variables. These results can, of course, be generalized to the n-variable case. Clearly, the sum of squares of the two linear functions Y2 and Y3 equals the total sum of squares of the random variables minus the sum of squares associated with Y1, so:

(8)  SS(Y2) + SS(Y3) = Σ(i=1..3) yi² - SS(Y1) = Σ(i=1..3) (yi - ȳ)²

or, in general,

(9)  SS(Y2) + ... + SS(Yn) = Σ(i=1..n) yi² - SS(Y1) = Σ(i=1..n) yi² - (Σ yi)²/n = Σ(i=1..n) (yi - ȳ)².
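The worked figures above (SS(Y1) = 2028, SS(Y2) = 18, SS(Y3) = 150, total 2196) can be reproduced directly from definition (6). A sketch, with the contrast coefficients read off from (2):

```python
# Verifying the worked numbers y1=24, y2=18, y3=36 under definition (6).
from fractions import Fraction as F

def ss(a, y):
    """(a1*y1 + ... + an*yn)^2 / (a1^2 + ... + an^2), per definition (6)."""
    return sum(ai * yi for ai, yi in zip(a, y)) ** 2 / sum(ai * ai for ai in a)

y = (24, 18, 36)
ss1 = ss((F(1, 3), F(1, 3), F(1, 3)), y)    # mean statistic Y1
ss2 = ss((F(1, 2), -F(1, 2), F(0)), y)      # first contrast of (2)
ss3 = ss((F(1, 3), F(1, 3), -F(2, 3)), y)   # second contrast of (2)
# ss1, ss2, ss3 -> 2028, 18, 150; their sum equals 24^2 + 18^2 + 36^2 = 2196.
```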
Now define the sample variance of a set of linear functions as the average of the sums of squares associated with the contrast linear functions. We see that for the special case where n = 3, our divisor for this average will be 2, because there are two sums of squares to be averaged in (8). This argument accounts for the degrees of freedom divisor which has been traditionally difficult to explain to beginning students in the formula

(10)  s² = Σ(i=1..n) (yi - ȳ)²/(n - 1) = [n Σ yi² - (Σ yi)²]/[n(n - 1)].

The statistic Y1 accounts for one degree of freedom in the numerator of the formula for Student's t, and the denominator is a function of (10) and is associated with n - 1 degrees of freedom. Note that it is not necessary to construct the contrast degrees of freedom to obtain the sums of squares associated with them.
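The argument for the n - 1 divisor in (10) can be illustrated with the running example: averaging the two contrast sums of squares gives exactly the textbook sample variance. A sketch, with data and coefficients as above:

```python
# The degrees-of-freedom divisor of (10), numerically: the sample variance is
# the AVERAGE of the contrast sums of squares, so the divisor is 2 = n - 1.
from fractions import Fraction as F

def ss(a, y):
    return sum(ai * yi for ai, yi in zip(a, y)) ** 2 / sum(ai * ai for ai in a)

y = (24, 18, 36)
contrast_ss = (ss((F(1, 2), -F(1, 2), F(0)), y)
               + ss((F(1, 3), F(1, 3), -F(2, 3), ), y))
s2 = contrast_ss / (len(y) - 1)          # average over the n - 1 contrasts

# Textbook form of (10): sum of squared deviations over n - 1.
mean = F(sum(y), len(y))
s2_textbook = sum((v - mean) ** 2 for v in y) / (len(y) - 1)
# Both routes give 168 / 2 = 84.
```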
4.

The problem just presented is a simple analysis of variance (anova) type and leads to the test of the hypothesis θ = θ0. The next logical elaboration would be to consider Fisher's t test of the hypothesis θ1 = θ2. The observation model is yik = θk + eik, where i = 1, ..., nk; k = 1, 2, and the eik are N(0, σ² = σ1² = σ2²). The orthogonal linear functions which estimate the parameters θ1 and θ2, respectively, are

Y1 = (1/n1)y11 + ... + (1/n1)yn11 + 0·y12 + ... + 0·yn22

and

Y2 = 0·y11 + ... + 0·yn11 + (1/n2)y12 + ... + (1/n2)yn22.

Then,

(11)  SS(Y3) + ... + SS(Yn1+n2) = Σ(k=1..2) Σ(i=1..nk) yik² - SS(Y1) - SS(Y2) = Σ(i=1..n1) (yi1 - ȳ1)² + Σ(i=1..n2) (yi2 - ȳ2)²,

and if we average these sums of squares, the appropriate denominator will be n1 + n2 - 2. The numerator of Fisher's t is Y1 - Y2 under the null hypothesis θ1 = θ2, and the denominator is a function of (11) and is associated with n1 + n2 - 2 degrees of freedom.

5.

As another example, we might consider the regression model yi = θ + β(xi - x̄) + ei, where i = 1, ..., n and the ei are N(0, σ²y.x). The linear functions of interest are Y1 = Σ (1/n)yi and

Y2 = (x1 - x̄)y1 + ... + (xn - x̄)yn.

For these functions, Y1 is used to estimate the mean, θ, and Y2, being a sum of products of the deviation x's and concomitant y's, leads to an estimate of the unknown constant of proportionality, β. This is rationally and algebraically true, since if the yi and (xi - x̄) tend to proportionately increase and decrease simultaneously or inversely, Y2 will tend to increase absolutely. However, if the yi and (xi - x̄) do not proportionately rise and fall simultaneously or inversely, Y2 will tend to be zero. This can be shown by the following table. In this table, several sets of x's, designated by xik, k = 1, ..., 3, each of which have the same mean, 4, are substituted in Y2 together with their corresponding yi's. The values of the Y2k are given in the bottom line of Table I.
TABLE I

EVALUATION OF Y2 FOR CHANGING VALUES OF xi IN THE SIMPLE REGRESSION MODEL
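Since the entries of Table I are not reproduced above, the following sketch makes the same point with hypothetical x's (sharing the mean 4) and hypothetical y's, none of which are the table's own values: Y2 is large in magnitude when the y's follow the x-deviations proportionately, and zero when they show no such pattern.

```python
# The point of Table I, sketched with hypothetical values (not the table's):
# Y2 = sum((x_i - xbar) * y_i) grows in magnitude when the y's rise and fall
# with the x-deviations, and collapses to zero when they do not.

def y2_statistic(xs, ys):
    """Sum of products of deviation x's and concomitant y's."""
    xbar = sum(xs) / len(xs)
    return sum((x - xbar) * y for x, y in zip(xs, ys))

xs = [1.0, 4.0, 7.0]                                # hypothetical x's, mean 4
proportional = y2_statistic(xs, [2.0, 8.0, 14.0])   # y's rise with the x's
unrelated = y2_statistic(xs, [8.0, 8.0, 8.0])       # flat y's, no pattern
# proportional is 36.0; unrelated is 0.0.
```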
Using (6) we find

(12)  SS(Y1) = (Σ yi)²/n  and  SS(Y2) = [Σ(i=1..n) (xi - x̄)yi]² / Σ(i=1..n) (xi - x̄)².

Consequently, the sum of squares for the remaining degrees of freedom is given by

(13)  SS(Y3) + ... + SS(Yn) = Σ(i=1..n) (yi - ȳ)² - b Σ(i=1..n) (xi - x̄)yi = Σ(i=1..n) (yi - ŷi)²,

where b is the usual regression coefficient for predicting y from a knowledge of x, and ŷi is the predicted value of yi. Again, to find the variance estimate of σ²y.x associated with these sums of squares, we divide them by the number of degrees of freedom from which they were derived. This number is n - 2. Under the null hypothesis, β = 0, the denominator of the t test, t = b/S.E.(b), has n - 2 degrees of freedom, and the numerator is associated with one degree of freedom.

6.

It is fairly laborious to calculate the SS(Yj), and because of this it is desirable to have a method whereby the sum of squares associated with several linear functions may be conveniently found. The proof of the method is fairly long and will not be reproduced here. Its exposition will have to suffice. Let aj be the coefficient vector of the random variables of the jth degree of freedom, and let y be the observation vector, (y1, y2, ..., yn). With these values construct the following system of equations:

(14)  p1(a1.a1) + p2(a1.a2) + ... + pm(a1.am) = (a1.y)
      p1(a2.a1) + p2(a2.a2) + ... + pm(a2.am) = (a2.y)
      ...
      p1(am.a1) + p2(am.a2) + ... + pm(am.am) = (am.y)

When these equations are solved, by whatever method is convenient, the sum of squares for the degrees of freedom Y1, Y2, ..., Ym (m ≤ n) is given by

(15)  p1(a1.y) + p2(a2.y) + ... + pm(am.y).

The method reveals the correct sum of squares whether or not the degrees of freedom are mutually orthogonal, but we shall illustrate it for the orthogonal case. Consider again (2), and then let a2 = (1/2, -1/2, 0), a3 = (1/3, 1/3, -2/3) and y = (y1, y2, y3) = (24, 18, 36). Corresponding to (14) we have

(16)  p2(1/2) + p3(0) = 3
      p2(0) + p3(2/3) = -10.

Therefore p2 = 6 and p3 = -15; then SS(Y2) + SS(Y3) = 6(3) + (-15)(-10) = 168. In some previous work in section 3, we found SS(Y2) = 18 and SS(Y3) = 150, so this result checks. In this problem, Y1 was neglected in order to show that (14) is quite general for m ≤ n.
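The computing method of this section can be sketched as follows. The system (14) is solved here by ordinary Gaussian elimination (the paper does not prescribe a particular solution method), and (15) is then evaluated for the illustration's a2, a3 and y.

```python
# Sketch of the section-6 method: build (14) from dot products, solve for the
# p's, and return (15). Exact fractions keep the check exact.
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ss_via_p(vectors, y):
    """Solve the system (14) and return (15), the sum of p_j * (a_j . y)."""
    m = len(vectors)
    A = [[dot(vectors[i], vectors[j]) for j in range(m)] + [dot(vectors[i], y)]
         for i in range(m)]
    for col in range(m):                      # Gauss-Jordan elimination
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(m):
            if r != col and A[r][col] != 0:
                A[r] = [v - A[r][col] * w for v, w in zip(A[r], A[col])]
    p = [A[r][m] for r in range(m)]
    return sum(pj * dot(aj, y) for pj, aj in zip(p, vectors))

a2 = (F(1, 2), -F(1, 2), F(0))
a3 = (F(1, 3), F(1, 3), -F(2, 3))
y = (24, 18, 36)
total = ss_via_p([a2, a3], y)   # reproduces SS(Y2) + SS(Y3) = 18 + 150 = 168
```

For the orthogonal illustration the system is diagonal and the elimination is trivial, but the same routine handles the non-orthogonal case the text mentions.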
7.

All of these principles may be easily generalized to the multivariate case. What is needed is to use matrix variables instead of the single ones we have been using. Using the Least Squares Principle, the ideas presented here (and many others) have been applied to multivariate analysis of variance in reference number 4. The following and last example is taken from this source. Suppose

y1^1 = 11, y2^1 = 5, y3^1 = 8;   y1^2 = 2, y2^2 = 6, y3^2 = 13.

Here the superscripts indicate which variate is being considered (these numbers are not to be confused with powers), and the subscripts designate the variables. Also let

Y1^a = (1/3)y1^a + (1/3)y2^a + (1/3)y3^a,
Y2^a = (1/2)y1^a - (1/2)y2^a + 0·y3^a,
Y3^a = (1/3)y1^a + (1/3)y2^a - (2/3)y3^a,
where a = 1, 2. We have, corresponding to (14), for the first variate

(17)  p1^1(1/3) + p2^1(0) + p3^1(0) = 8
      p1^1(0) + p2^1(1/2) + p3^1(0) = 3
      p1^1(0) + p2^1(0) + p3^1(2/3) = 0

Therefore p1^1 = 24, p2^1 = 6, p3^1 = 0, and using (15) we find (24)(8) + (6)(3) + (0)(0) = 210, which is equal to 11² + 5² + 8². For the second variate we get

(18)  p1^2(1/3) + p2^2(0) + p3^2(0) = 7
      p1^2(0) + p2^2(1/2) + p3^2(0) = -2
      p1^2(0) + p2^2(0) + p3^2(2/3) = -6

Solving, p1^2 = 21, p2^2 = -4, p3^2 = -9, and corresponding to (15), (21)(7) + (-4)(-2) + (-9)(-6) = 209, which is equal to 2² + 6² + 13². The sum of cross-products of these three vector degrees of freedom for the two variates may be found in one of two ways; either (24)(7) + (6)(-2) + (0)(-6) = 156 or (21)(8) + (-4)(3) + (-9)(0) = 156. Both results are equal to (11)(2) + (5)(6) + (8)(13). The matrix

[210  156]
[156  209]

corresponds to the total sum of squares and cross-products for the bivariate sample observations which have been transformed by the vector degrees of freedom Yj^a, j = 1, 2, 3; a = 1, 2. We note that the sums of squares and cross-products of the variables for each variate are preserved by the orthogonal vector set of degrees of freedom. This simple problem serves to illustrate this invariance property for a multivariate case.
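The bivariate computation above can be verified end to end. A minimal sketch that forms the sum-of-squares-and-cross-products matrix from the three vector degrees of freedom (coefficients as in (2), data as given); because the vectors are mutually orthogonal, each p_j is simply (a_j . y)/(a_j . a_j), the solution of (14).

```python
# Recomputing the bivariate example: the matrix of sums of squares and
# cross-products is preserved under the orthogonal vector degrees of freedom.
from fractions import Fraction as F

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

vectors = [(F(1, 3), F(1, 3), F(1, 3)),    # mean degree of freedom
           (F(1, 2), -F(1, 2), F(0)),      # first contrast of (2)
           (F(1, 3), F(1, 3), -F(2, 3))]   # second contrast of (2)
y1 = (11, 5, 8)     # first variate
y2 = (2, 6, 13)     # second variate

def sscp(u, v):
    """Sum over the degrees of freedom of p_j(a_j.u)(a_j.v), per (15)."""
    return sum((dot(a, u) / dot(a, a)) * dot(a, v) for a in vectors)

matrix = [[sscp(y1, y1), sscp(y1, y2)],
          [sscp(y2, y1), sscp(y2, y2)]]
# matrix -> [[210, 156], [156, 209]], matching the raw sums of squares
# and cross-products of the two variates.
```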
Summary

We have seen that certain statistical problems are formulated in terms of linear functions of the random variables. These linear functions, called degrees of freedom, served the purpose of presenting the data in a more usable form, because the functions led directly or indirectly to estimates of the parameters of the observation model and the estimate of the variance of the observations. Moreover, these estimates may be used to test hypotheses about the population parameters by the standard statistical tests.

Modern statistical usage of the concept of degrees of freedom had its inception in Student's classic work, reference 7, which is often considered the paper which was necessary to the development of modern statistics. Fisher, beginning with his frequency distribution study, reference 2, has generalized this work in his many contributions to the general theory of regression analysis.

This paper has resulted from an attempt to bring clarification to the statistical interpretation of degrees of freedom. The author feels that his attempt will not be altogether successful, for there remain many questions which students may or should ask that have not been answered here. A satisfactory exposition could be given by a complete presentation of the theory of least squares which is slanted towards the problems of modern regression theory of the analysis of variance type. This discussion would appropriately take book form, however.
REFERENCES

1. Cramer, Harald, Mathematical Methods of Statistics (Princeton, N.J.: Princeton University Press, 1946).
2. Fisher, Ronald A., "Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population," Biometrika, X (1915), pp. 507-521.
3. Johnson, Palmer O., Statistical Methods in Research (New York: Prentice-Hall, Inc., 1948).
4. Moonan, William J., The Generalization of the Principles of Some Modern Experimental Designs for Educational and Psychological Research. Unpublished thesis, University of Minnesota, Minneapolis, Minnesota, 1952.
5. Rulon, Phillip J., "Matrix Representation of Models for the Analysis of Variance and Covariance," Psychometrika, XIV (1949), pp. 259-278.
6. Snedecor, George W., Statistical Methods (Ames, Iowa: Collegiate Press, 1946).
7. Student, "The Probable Error of the Mean," Biometrika, VI (1908), pp. 1-25.
8. Tukey, John W., "Standard Methods of Analyzing Data," Proceedings: Computation Seminar (New York: International Business Machines Corporation, 1949), pp. 95-112.
9. Walker, Helen M., "Degrees of Freedom," Journal of Educational Psychology, XXXI (1940), pp. 253-269.
10. Walker, Helen M., Mathematics Essential for Elementary Statistics (New York: Henry Holt and Co., 1951).
11. Yates, Frank, The Design and Analysis of Factorial Experiments, Imperial Bureau of Soil Science, Technical Communication No. 35 (Harpenden, England: 1937).