The Statistical Interpretation of Degrees of Freedom
Author: William J. Moonan
Source: The Journal of Experimental Education, Vol. 21, No. 3 (Mar., 1953), pp. 259-264
Published by: Taylor & Francis, Ltd.
Stable URL: http://www.jstor.org/stable/20153902

THE STATISTICAL INTERPRETATION OF DEGREES OF FREEDOM

WILLIAM J. MOONAN
University of Minnesota
Minneapolis, Minnesota

1. Introduction

THE CONCEPT of "degrees of freedom" has a very simple nature, but this simplicity is not generally exemplified in statistical textbooks. It is the purpose of this paper to discuss and define the statistical aspects of degrees of freedom and thereby clarify the meaning of the term. This shall be accomplished by considering a very elementary statistical problem of estimation and progressing onward through more difficult but common problems until finally a multivariate problem is used. The available literature devoted to degrees of freedom is very limited. Some of these references are given in the bibliography; they contain algebraic, geometrical, physical, and rational interpretations. The main emphasis in this article is on discovering the degrees of freedom associated with the standard errors of common and useful significance tests, and on showing that, for some models, parameters are estimated directly or indirectly by certain degrees of freedom. The procedures given here may be put forth completely within the system of estimation which utilizes the principle of least squares; the applications given here are special cases of that system.

2. STATISTICAL INTERPRETATION

In most statistical problems it is assumed that n random variables are available for analysis. With these variables it is possible to construct certain functions called statistics, with which estimations and tests of hypotheses are made. Associated with these statistics are numbers of degrees of freedom. To elaborate and explain what this means, let us start out with a very simple situation. Suppose we have two random variables, y_1 and y_2. If we pursue an objective of statistics which is called the reduction of data, we might construct the linear function

Y_1 = \tfrac{1}{2}y_1 + \tfrac{1}{2}y_2.

This function estimates the mean of the population from which the random variables were drawn. For that matter, so does any other linear function of the form Y_1 = a_{11}y_1 + a_{12}y_2, where the a's are real numbers. When the coefficients of the random variables are all equal to the reciprocal of the number of them, the statistic defined is the sample mean. This statistic might be chosen here for logical reasons, but its specification really comes from the theory of estimation mentioned before. We could also construct another linear function of the random variables,

Y_2 = \tfrac{1}{2}y_1 - \tfrac{1}{2}y_2.

This contrast statistic is a measure of how well our observations agree, since it yields a measure of the average difference of the variables. These statistics, Y_1 and Y_2, have the valuable property that they contain all the available information relevant to discerning the characteristics of the population from which the y's were drawn. This is true because it is possible to reconstruct the original random variables from them: clearly, Y_1 + Y_2 = y_1 and Y_1 - Y_2 = y_2.
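A short numerical sketch (in Python; not part of the original paper, and the observation values are illustrative assumptions) makes this reconstruction concrete: the pair (Y_1, Y_2) is computed from (y_1, y_2), and because the determinant of the coefficient matrix does not vanish, the original observations are recovered exactly.

```python
import numpy as np

# Two observations (illustrative values only).
y = np.array([7.0, 3.0])

# Rows are the coefficients of Y1 (the mean) and Y2 (the contrast).
A = np.array([[0.5,  0.5],
              [0.5, -0.5]])

Y = A @ y
print(Y)                      # [5. 2.] -> Y1 estimates the mean, Y2 half the difference

# The determinant of A does not vanish, so the observations are
# recoverable: y1 = Y1 + Y2, y2 = Y1 - Y2.
print(np.linalg.solve(A, Y))  # [7. 3.]
```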
We discern that we have constructed a pair of statistics which are reducible to the original variables, but which state the information contained in the variables in a more useful form. There are certain other characteristics worth noticing. The sum of the coefficients of the random variables of Y_2 equals zero, and the sum of the products of the corresponding coefficients of the random variables of Y_1 and Y_2 equals zero; that is, (\tfrac{1}{2})(\tfrac{1}{2}) + (\tfrac{1}{2})(-\tfrac{1}{2}) = 0. This latter property is known as the quasi-orthogonality of Y_1 and Y_2. It is analogous to the property of independence which is associated with the random variables.

In changing our random variables to the statistics we have performed a quasi-orthogonal transformation. Quasi-orthogonal transformations are of special interest because they lead to statistics which have valuable properties. In particular, if our data are composed of random variables from a normal population, these statistics are independent in the probability sense, or in other words, they are stochastically independent (i.e., uncorrelated). That remark has a rational interpretation: it says that the statistics used do not overlap in the information they reveal about the data. As long as we preserve the property of orthogonality, we will be able to reproduce the original random variables at will. This reproductive property is guaranteed when the coefficients of the random variables of the statistics are mutually orthogonal (i.e., every statistic is orthogonal to every other one), for since the determinant of such coefficients does not vanish, the equations (statistics) have a solution which is the explicit designation of the original random variables. The determinant for this problem is

(1)   \begin{vmatrix} \tfrac{1}{2} & \tfrac{1}{2} \\ \tfrac{1}{2} & -\tfrac{1}{2} \end{vmatrix} = (\tfrac{1}{2})(-\tfrac{1}{2}) - (\tfrac{1}{2})(\tfrac{1}{2}) = -\tfrac{1}{2}.

We might also inquire about the relationship of the number called the sum of squares of the y_i's to the sum of squares of the Y_j's. If we require this number to be invariant, we obtain another valuable property of quasi-orthogonal transformations, which we shall come to a little later.

3. If we have three observations, we can construct three mutually quasi-orthogonal statistics. Again we might let Y_1 be the mean of the random variables, with Y_2 and Y_3 as contrast statistics. Specifically, let Y_1 = \tfrac{1}{3}y_1 + \tfrac{1}{3}y_2 + \tfrac{1}{3}y_3. There exist two other mutually quasi-orthogonal linear statistics which might be chosen, and it can be said that we enjoy the freedom of two choices in the statistics we actually use to summarize the data. We could let

(2)   Y_2 = \tfrac{1}{2}y_1 - \tfrac{1}{2}y_2;   Y_3 = \tfrac{1}{3}y_1 + \tfrac{1}{3}y_2 - \tfrac{2}{3}y_3;

or,

(3)   Y_2 = \tfrac{1}{2}y_2 - \tfrac{1}{2}y_3;   Y_3 = \tfrac{2}{3}y_1 - \tfrac{1}{3}y_2 - \tfrac{1}{3}y_3.

(It can be shown that there exists an infinity of possible choices!) Either pair of these statistics, together with Y_1, can be shown to reproduce the random variables y_1, y_2 and y_3. As a consequence, they possess all the information the original variables do. In general, if we have n random variables, we might construct n - 1 other mutually quasi-orthogonal linear statistics to summarize the data. Each corresponds to a degree of freedom, a mutually quasi-orthogonal linear function of the random variables.
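The defining checks of this section — contrast coefficients summing to zero, mutual quasi-orthogonality, and recoverability of the y's — can be verified mechanically. The following sketch (mine, not the paper's; it uses the coefficient set (2) as given above) does so with numpy.

```python
import numpy as np

# Coefficient rows for Y1, Y2, Y3 from (2).
A = np.array([[1/3,  1/3,  1/3],   # Y1: the mean
              [1/2, -1/2,  0.0],   # Y2: first contrast
              [1/3,  1/3, -2/3]])  # Y3: second contrast

# Contrast rows sum to zero ...
print(A[1].sum(), A[2].sum())      # 0.0 0.0

# ... and the rows are mutually quasi-orthogonal:
# every off-diagonal dot product of coefficient rows is zero.
print(np.round(A @ A.T, 10))

# The determinant does not vanish, so the y's are recoverable
# from the Y's at will.
y = np.array([24.0, 18.0, 36.0])
print(np.linalg.solve(A, A @ y))   # [24. 18. 36.]
```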
In general, the term degree of freedom does not necessarily refer to a linear function which is orthogonal to all the others which are or may be constructed; however, in common usage it usually does refer to quasi-orthogonal linear functions. When the observational model we are working with contains only one parameter which is estimated by a linear function, there is little purpose in specifying the remaining degrees of freedom in the form of contrasts. For instance, if our model is y_i = \theta + e_i, where the e_i are normally distributed with zero mean and variance \sigma^2, i.e., N(0, \sigma^2), and i = 1, ..., n, we would construct the statistic representing the sample mean (which estimates \theta) and have n - 1 remaining choices, or degrees of freedom. We would also like to estimate \sigma^2. Unfortunately, this parameter is not estimated directly by linear functions other than Y_1.

Before proceeding, the other property of quasi-orthogonal transformations will be discussed: the invariance of the sum of squares,

(4)   \sum_{j=1}^{n} Y_j^2 = \sum_{i=1}^{n} y_i^2.

For the two-variable problem one can write, in matrix notation,

(5)   Y = Ay,  where  Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},  A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},  y = \begin{pmatrix} y_1 \\ y_2 \end{pmatrix}.

Then \sum Y_j^2 = Y'Y = (Ay)'(Ay) = y'A'Ay, and if this is to equal \sum y_i^2 = y'y, then A'A must be a unit matrix, i.e., diagonal with ones:

A'A = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.

A matrix A' which, when multiplied by its transpose A, equals a unit matrix is called an orthogonal matrix, and the y_i's which are transformed to the Y_j's by this matrix are said to be orthogonally transformed. You will notice that the matrix of the coefficients of Y_1 and Y_2 of section 2 is not an orthogonal matrix, since

A'A = \begin{pmatrix} \tfrac{1}{2} & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix}.

If the coefficients of the Y's had been 1/\sqrt{2}'s instead of \tfrac{1}{2}'s, then A' would be an orthogonal matrix. Because the matrix of our transformations does not fulfill the accepted mathematical definition of an orthogonal transformation, but is one very much like it, such transformations are termed, for the purposes of this paper, quasi-orthogonal. However, it seems unnatural to beginning students to define Y_1 as Y_1 = \tfrac{1}{\sqrt{2}}y_1 + \tfrac{1}{\sqrt{2}}y_2. Actually, any linear function with positive and equal coefficients would serve as well as Y_1 itself, for they would be logically and mathematically equivalent and reducible to the usual definition of the sample mean. If we are to use the common-sense statistics, obviously something must be done in order to preserve the property (4). One thing that can be done is to change our definition of what the sum of squares of the jth linear function, Y_j, would be. Let us define the sum of squares associated with the linear function Y_j to be

(6)   SS(Y_j) = \frac{(a_{1j}y_1 + a_{2j}y_2 + \cdots + a_{nj}y_n)^2}{a_{1j}^2 + a_{2j}^2 + \cdots + a_{nj}^2}.

Using this definition instead of just the numerator of it, property (4) will be preserved. As an illustration of this formula, let j = 1 and

(7)   SS(Y_1) = \frac{(\tfrac{1}{3}y_1 + \tfrac{1}{3}y_2 + \tfrac{1}{3}y_3)^2}{(\tfrac{1}{3})^2 + (\tfrac{1}{3})^2 + (\tfrac{1}{3})^2} = \frac{(\tfrac{1}{3}\sum_{i=1}^{3} y_i)^2}{\tfrac{1}{3}} = \frac{(\sum_{i=1}^{3} y_i)^2}{3},

or, for n random variables, SS(Y_1) = (\sum_{i=1}^{n} y_i)^2 / n.
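As a concrete check on definition (6) and illustration (7), here is a small helper (a sketch, not from the paper) that computes SS(Y_j) for an arbitrary coefficient vector; it is applied to the three observations used in the worked example that follows.

```python
import numpy as np

def ss(a, y):
    """Sum of squares of the linear function Y_j per definition (6)."""
    a, y = np.asarray(a, float), np.asarray(y, float)
    return (a @ y) ** 2 / (a @ a)

y = np.array([24.0, 18.0, 36.0])    # the paper's three observations
a1 = np.full(3, 1/3)                # coefficients of Y1, the mean

print(ss(a1, y))                    # 2028.0
print(y.sum() ** 2 / len(y))        # 2028.0, matching (7): (sum y_i)^2 / n
```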
Further, if y_1 = 24, y_2 = 18 and y_3 = 36, then SS(Y_1) = 2028, and if we use (2), then SS(Y_2) = 18 and SS(Y_3) = 150. Note that SS(Y_1) + SS(Y_2) + SS(Y_3) = 2196 and that

\sum_{i=1}^{3} y_i^2 = 24^2 + 18^2 + 36^2 = 2196.

Thus the sum of squares of the linear functions equals the sum of squares of the random variables. These results can, of course, be generalized to the n-variable case. Clearly, the sum of squares of the two linear functions Y_2 and Y_3 equals the total sum of squares of the random variables minus the sum of squares associated with Y_1, so:

(8)   SS(Y_2) + SS(Y_3) = \sum_{i=1}^{3} y_i^2 - SS(Y_1)

or, in general,

(9)   SS(Y_2) + \cdots + SS(Y_n) = \sum_{i=1}^{n} y_i^2 - SS(Y_1) = \sum_{i=1}^{n} y_i^2 - \frac{(\sum_{i=1}^{n} y_i)^2}{n}.

4. Now define the sample variance of a set of linear functions as the average of the sums of squares associated with the contrast linear functions. We see that for the special case where n = 3, our divisor for this average will be 2, because there are two sums of squares to be averaged in (8). This argument accounts for the degrees of freedom divisor which has traditionally been difficult to explain to beginning students in the formula

(10)   s^2 = \frac{\sum_{j=2}^{n} SS(Y_j)}{n - 1} = \frac{\sum_{i=1}^{n} y_i^2 - (\sum_{i=1}^{n} y_i)^2/n}{n - 1} = \frac{n\sum_{i=1}^{n} y_i^2 - (\sum_{i=1}^{n} y_i)^2}{n(n - 1)}.

The statistic Y_1 accounts for one degree of freedom in the numerator of the formula for Student's t, and the denominator is a function of (10) and is associated with n - 1 degrees of freedom. Note that it is not necessary to construct the contrast degrees of freedom to obtain the sums of squares associated with them.

5. The problem just presented is of a simple analysis of variance (anova) type and leads to the t test of the hypothesis \theta = \theta_0. The next logical elaboration would be to consider Fisher's t test of the hypothesis \theta_1 = \theta_2. The observation model is y_{ik} = \theta_k + e_{ik}, where k = 1, 2; i = 1, ..., n_k; and the e_{ik} are N(0, \sigma^2 = \sigma_1^2 = \sigma_2^2). The orthogonal linear functions which estimate the parameters \theta_1 and \theta_2 are, respectively,

Y_1 = \tfrac{1}{n_1}y_{11} + \cdots + \tfrac{1}{n_1}y_{n_1 1} + 0 \cdot y_{12} + \cdots + 0 \cdot y_{n_2 2}

and

Y_2 = 0 \cdot y_{11} + \cdots + 0 \cdot y_{n_1 1} + \tfrac{1}{n_2}y_{12} + \cdots + \tfrac{1}{n_2}y_{n_2 2}.

Then,

(11)   SS(Y_3) + \cdots + SS(Y_{n_1+n_2}) = \sum_{k=1}^{2}\sum_{i=1}^{n_k} y_{ik}^2 - SS(Y_1) - SS(Y_2) = \sum_{i=1}^{n_1} (y_{i1} - \bar{y}_1)^2 + \sum_{i=1}^{n_2} (y_{i2} - \bar{y}_2)^2,

and if we average these sums of squares, the appropriate denominator will be n_1 + n_2 - 2. The numerator of Fisher's t is Y_1 - Y_2 under the null hypothesis \theta_1 = \theta_2, and the denominator is a function of (11) and is associated with n_1 + n_2 - 2 degrees of freedom.

As another example, we might consider the regression model y_i = \alpha + \beta(x_i - \bar{x}) + e_i, where i = 1, ..., n and the e_i are N(0, \sigma_{y \cdot x}^2). The linear functions of interest are Y_1 = \tfrac{1}{n}y_1 + \cdots + \tfrac{1}{n}y_n and Y_2 = (x_1 - \bar{x})y_1 + \cdots + (x_n - \bar{x})y_n. Of these functions, Y_1 is used to estimate the mean, \alpha, and Y_2, being the product of n deviation x's and concomitant y's, leads to an estimate of the unknown constant of proportionality, \beta. This is rationally and algebraically true, since if the y_i and the (x_i - \bar{x}) tend to increase and decrease proportionately, simultaneously or inversely, Y_2 will tend to increase absolutely; however, if the y_i and the (x_i - \bar{x}) do not rise and fall proportionately, simultaneously or inversely, Y_2 will tend to be zero. This can be shown by the following table, in which several sets of x's, designated by x_{ik}, k = 1, ..., 3, each of which has the same mean, 4, are substituted in Y_2 together with their corresponding y_i's. The values of the Y_{2k} are given in the bottom line of Table I.

TABLE I. EVALUATION OF Y_2 FOR CHANGING VALUES OF x_i IN THE SIMPLE REGRESSION MODEL. [The body of the table is not recoverable from the scan.]
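Since the body of Table I did not survive reproduction, the following sketch substitutes made-up x-sets (each with mean 4, as the text specifies) and illustrative y's to make the same point: Y_2 is large in absolute value when the y's move with the deviations (x_i - \bar{x}), and near zero when they do not.

```python
import numpy as np

y = np.array([2.0, 4.0, 6.0, 8.0, 10.0])        # illustrative y's

for x in (np.array([2.0, 3.0, 4.0, 5.0, 6.0]),  # rises with y
          np.array([6.0, 5.0, 4.0, 3.0, 2.0]),  # falls as y rises
          np.array([5.0, 2.0, 6.0, 3.0, 4.0])): # no systematic relation
    Y2 = (x - x.mean()) @ y                     # Y2 = sum (x_i - xbar) y_i
    print(Y2)                                   # 20.0, -20.0, -2.0
```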
Using (6) we find

(12)   SS(Y_1) = \frac{(\sum_{i=1}^{n} y_i)^2}{n}  and  SS(Y_2) = \frac{[\sum_{i=1}^{n} (x_i - \bar{x})y_i]^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.

Consequently, to find the sample estimate of \sigma_{y \cdot x}^2, the remaining sum of squares is

(13)   SS(Y_3) + \cdots + SS(Y_n) = \sum_{i=1}^{n} y_i^2 - SS(Y_1) - SS(Y_2) = \sum_{i=1}^{n} (y_i - \bar{y})^2 - b\sum_{i=1}^{n} (x_i - \bar{x})y_i = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,

where b is the usual regression coefficient for predicting y from a knowledge of x, and \hat{y}_i is the predicted value of y_i. Again, to find the variance associated with these sums of squares, we divide them by the number of degrees of freedom from which they were derived. This number is n - 2. Under the null hypothesis of the t test, \beta = 0, the denominator of t = b/\mathrm{SE}_b has n - 2 degrees of freedom, and the numerator is associated with one degree of freedom.

6. It is fairly laborious to calculate the SS(Y_j), and because of this it is desirable to have a method whereby the sum of squares associated with several linear functions may be conveniently found. The proof of the method is fairly long and will not be reproduced here; its exposition will have to suffice. Let a_j be the coefficient vector of the random variables of the jth degree of freedom, and let y be the observation vector, (y_1, y_2, ..., y_n). With any m degrees of freedom, Y_1, Y_2, ..., Y_m (m < n), construct the following system of equations:

(14)   p_1(a_1 \cdot a_1) + p_2(a_1 \cdot a_2) + \cdots + p_m(a_1 \cdot a_m) = (a_1 \cdot y)
       p_1(a_2 \cdot a_1) + p_2(a_2 \cdot a_2) + \cdots + p_m(a_2 \cdot a_m) = (a_2 \cdot y)
       ...
       p_1(a_m \cdot a_1) + p_2(a_m \cdot a_2) + \cdots + p_m(a_m \cdot a_m) = (a_m \cdot y).

When these equations are solved, by whatever method is convenient, the sum of squares for the degrees of freedom Y_1, Y_2, ..., Y_m is given by

(15)   SS(Y_1) + SS(Y_2) + \cdots + SS(Y_m) = p_1(a_1 \cdot y) + p_2(a_2 \cdot y) + \cdots + p_m(a_m \cdot y),

for m < n. The method reveals the correct sum of squares whether or not the degrees of freedom are mutually orthogonal, but we shall illustrate it for the orthogonal case. Consider again (2), and let a_2 = (\tfrac{1}{2}, -\tfrac{1}{2}, 0), a_3 = (\tfrac{1}{3}, \tfrac{1}{3}, -\tfrac{2}{3}), and y = (y_1, y_2, y_3) = (24, 18, 36). Corresponding to (14) we have

(16)   p_2(\tfrac{1}{2}) + p_3(0) = 3
       p_2(0) + p_3(\tfrac{2}{3}) = -10.

Therefore p_2 = 6 and p_3 = -15; then SS(Y_2) + SS(Y_3) = 6(3) + (-15)(-10) = 168. In some previous work, in section 3, we found SS(Y_2) = 18 and SS(Y_3) = 150, so this result checks. In this problem, Y_1 was neglected in order to show that the method is quite general for m < n.

7. All of these principles may easily be generalized to the multivariate case. What is needed is to use matrix variables instead of the single ones we have been using. Using the least squares principle, the ideas presented here (and many others) have been applied to multivariate analysis of variance in reference 4, from which this final example is taken. Suppose y_1^1 = 11, y_2^1 = 5, y_3^1 = 8; y_1^2 = 2, y_2^2 = 6, y_3^2 = 13. Here the superscripts indicate which variate is being considered (these numbers are not to be confused with powers), and the subscripts designate the variables. Also let

Y_1^a = \tfrac{1}{3}y_1^a + \tfrac{1}{3}y_2^a + \tfrac{1}{3}y_3^a,   Y_2^a = \tfrac{1}{2}y_1^a - \tfrac{1}{2}y_2^a,   Y_3^a = \tfrac{1}{3}y_1^a + \tfrac{1}{3}y_2^a - \tfrac{2}{3}y_3^a

be the orthogonal vector set of degrees of freedom, where a = 1, 2. This simple problem serves to illustrate the invariance property. For the first variate we have

(17)   p_1^1(\tfrac{1}{3}) + p_2^1(0) + p_3^1(0) = 8
       p_1^1(0) + p_2^1(\tfrac{1}{2}) + p_3^1(0) = 3
       p_1^1(0) + p_2^1(0) + p_3^1(\tfrac{2}{3}) = 0.

Therefore p_1^1 = 24, p_2^1 = 6, p_3^1 = 0, and using (15) we find (24)(8) + (6)(3) + (0)(0) = 210, which is equal to 11^2 + 5^2 + 8^2. For the second variate we get

(18)   p_1^2(\tfrac{1}{3}) + p_2^2(0) + p_3^2(0) = 7
       p_1^2(0) + p_2^2(\tfrac{1}{2}) + p_3^2(0) = -2
       p_1^2(0) + p_2^2(0) + p_3^2(\tfrac{2}{3}) = -6.

Solving, p_1^2 = 21, p_2^2 = -4, p_3^2 = -9, and corresponding to (15), (21)(7) + (-4)(-2) + (-9)(-6) = 209, which is equal to 2^2 + 6^2 + 13^2. The sum of cross-products of these three vector degrees of freedom for the two variates may be found in one of two ways: either (24)(7) + (6)(-2) + (0)(-6) = 156 or (21)(8) + (-4)(3) + (-9)(0) = 156. Both results are equal to (11)(2) + (5)(6) + (8)(13). The matrix

\begin{pmatrix} 210 & 156 \\ 156 & 209 \end{pmatrix}

corresponds to the total sum of squares and cross-products for the bivariate sample observations which have been transformed by the vector degrees of freedom Y_j^a, j = 1, 2, 3; a = 1, 2. We note that the sums of squares and cross-products of the variables are preserved by the transformation, even in a multivariate case.
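Equations (14) and (15) amount to solving a small linear system in the dot products of the coefficient vectors. The following sketch (not the paper's code; it uses section 6's vectors and observations) reproduces the check SS(Y_2) + SS(Y_3) = 168.

```python
import numpy as np

def ss_via_system(A, y):
    """Sum of squares for the degrees of freedom whose coefficient
    vectors are the rows of A, via the system (14) and formula (15)."""
    G = A @ A.T                # matrix of dot products (a_j . a_k)
    r = A @ y                  # right-hand sides      (a_j . y)
    p = np.linalg.solve(G, r)  # solve (14) for the p_j
    return p @ r               # (15): sum of p_j (a_j . y)

y = np.array([24.0, 18.0, 36.0])

# Contrast vectors a2 and a3 of section 6 (Y1 deliberately neglected).
A23 = np.array([[1/2, -1/2,  0.0],
                [1/3,  1/3, -2/3]])
print(ss_via_system(A23, y))   # 168.0 = SS(Y2) + SS(Y3) = 18 + 150

# Including a1 as well recovers the total sum of squares of the
# y's, i.e., the invariance property (4).
A123 = np.vstack([np.full(3, 1/3), A23])
print(ss_via_system(A123, y), (y ** 2).sum())   # 2196.0 2196.0
```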
Summary

We have seen that certain statistical problems are formulated in terms of linear functions of the random variables. These linear functions, called degrees of freedom, serve the purpose of presenting the data in a more usable form, because the functions lead directly or indirectly to estimates of the parameters of the observation model and to the estimate of variance of the observations. Moreover, these estimates may be used to test hypotheses about the population parameters by the standard statistical tests.

Modern statistical usage of the concept of degrees of freedom had its inception in Student's classic work, reference 7, which is often considered the paper that was necessary to the development of modern statistics. Fisher, beginning with his frequency distribution study, reference 2, generalized this work in his many contributions to the general theory of regression analysis.

This paper has resulted from an attempt to bring clarification to the statistical interpretation of degrees of freedom. The author feels that his attempt will not be altogether successful, for there remain many questions which students may or should ask that have not been answered here. A satisfactory exposition could be given by a complete presentation of the theory of least squares, slanted towards the problems of modern regression theory of the analysis of variance type. Such a discussion would appropriately take book form, however.

REFERENCES

1. Cramér, Harald, Mathematical Methods of Statistics (Princeton, N.J.: Princeton University Press, 1946).
2. Fisher, Ronald A., "Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population," Biometrika, X (1915), pp. 507-521.
3. Johnson, Palmer O., Statistical Methods in Research (New York: Prentice-Hall, Inc., 1948).
4. Moonan, William J., The Generalization of the Principles of Some Modern Experimental Designs for Educational and Psychological Research. Unpublished thesis, University of Minnesota, Minneapolis, Minnesota, 1952.
5. Rulon, Phillip J., "Matrix Representation of Models for the Analysis of Variance and Covariance," Psychometrika, XIV (1949), pp. 259-278.
6. Snedecor, George W., Statistical Methods (Ames, Iowa: Collegiate Press, 1946).
7. Student, "The Probable Error of the Mean," Biometrika, VI (1908), pp. 1-25.
8. Tukey, John W., "Standard Methods of Analyzing Data," Proceedings: Computation Seminar (New York: International Business Machines Corporation, 1949), pp. 95-112.
9. Walker, Helen M., "Degrees of Freedom," Journal of Educational Psychology, XXXI (1940), pp. 253-269.
10. Walker, Helen M., Mathematics Essential for Elementary Statistics (New York: Henry Holt and Co., 1951).
11. Yates, Frank, The Design and Analysis of Factorial Experiments, Imperial Bureau of Soil Science, Technical Communication No. 35 (Harpenden, England, 1937).