Econometrics I
Professor William Greene
Stern School of Business, Department of Economics
Part 5: Regression Algebra and Fit

Gauss-Markov Theorem

A theorem of Gauss and Markov: least squares is the minimum variance linear unbiased estimator (MVLUE) of $\beta$.
1. Linear estimator: $b = (X'X)^{-1}X'y$ is a linear function of y; each element has the form $\sum_{i=1}^{n} v_i y_i$.
2. Unbiased: $E[b|X] = \beta$.
Theorem: $Var[b^*|X] - Var[b|X]$ is nonnegative definite for any other linear and unbiased estimator $b^*$ that is not equal to b.
Definition: b is efficient in this class of estimators.

Implications of the Gauss-Markov Theorem

$Var[b^*|X] - Var[b|X]$ is nonnegative definite for any other linear and unbiased estimator $b^*$ that is not equal to b. This implies:
  $b_k$ = the kth element of b; $Var[b_k|X]$ = the kth diagonal element of $Var[b|X]$.
  $Var[b_k|X] \le Var[b_k^*|X]$ for each coefficient.
  $c'b$ = any linear combination of the elements of b; $Var[c'b|X] \le Var[c'b^*|X]$ for any nonzero c and any $b^*$ that is not equal to b.

Summary: Finite Sample Properties of b

Unbiased: $E[b|X] = \beta$.
Variance: $Var[b|X] = \sigma^2 (X'X)^{-1}$.
Efficiency: Gauss-Markov theorem, with all of its implications.
Distribution: under normality, $b|X \sim N[\beta, \sigma^2 (X'X)^{-1}]$. (Without normality, the exact distribution is generally unknown.)

Model Comparison

We can compare models without using statistical tests, using measures known as information criteria that summarize a model's goodness of fit. For these indicators, the central ingredient is a measure of the absolute magnitude of the errors.

Measure of Fit

$$R^2 = \frac{b'X'M^0Xb}{y'M^0y} = \frac{\text{regression variation}}{\text{total variation}} = 1 - \frac{e'e}{\sum_{i=1}^{N}(y_i - \bar{y})^2}$$

(Very important result.) $R^2$ is bounded by zero and one only if (a) there is a constant term in X and (b) the line is computed by linear least squares.

Comparing Fits of Regressions

Make sure the denominator in $R^2$ is the same, i.e., the same left-hand-side variable. Example: linear vs. loglinear. The loglinear model will almost always appear to fit better because taking logs reduces the variation.

Adjusted R Squared

The adjusted $R^2$ (adjusted for degrees of freedom),
$$\bar{R}^2 = 1 - \frac{n-1}{n-K}(1 - R^2),$$
includes a penalty for variables that do not add much fit. It can fall when a variable is added to the equation.

Adjusted R2

What is being adjusted? The penalty is for using up degrees of freedom.
$$\bar{R}^2 = 1 - \frac{e'e/(n-K)}{y'M^0y/(n-1)}$$
uses the ratio of two "unbiased" estimators. Is the ratio itself unbiased? Equivalently, $\bar{R}^2 = 1 - [(n-1)/(n-K)](1 - R^2)$.
Will $\bar{R}^2$ rise when a variable z is added to the regression? $\bar{R}^2$ is higher with z than without z if and only if the t ratio on z, when it is added, is larger than one in absolute value.
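To make these fit measures concrete, here is a minimal numerical sketch in Python with NumPy. The simulated data, sample size, and coefficient values are hypothetical (they are not the gasoline-demand regression reported next); the point is only the mechanics of $R^2$ and $\bar{R}^2$.

```python
# Minimal sketch: R-squared and adjusted R-squared from a least squares fit.
# All data below are simulated; nothing here reproduces the slides' examples.
import numpy as np

rng = np.random.default_rng(0)
n, K = 36, 3                                   # n observations, K columns including the constant
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(scale=0.5, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)      # least squares coefficients
e = y - X @ b                                  # residuals
SSE = e @ e                                    # e'e
SST = np.sum((y - y.mean())**2)                # total variation about the mean, y'M0y

R2 = 1.0 - SSE / SST
R2_adj = 1.0 - (n - 1) / (n - K) * (1.0 - R2)  # penalizes the degrees of freedom used up
print(R2, R2_adj)
```

Adding an irrelevant column to X will typically raise R2 but can lower R2_adj, which is exactly the point of the gasoline-demand example that follows.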
Full Regression (Without PD)

Ordinary least squares regression
LHS=G        Mean                  =   226.09444
             Standard deviation    =    50.59182
             Number of observs.    =          36
Model size   Parameters            =           9
             Degrees of freedom    =          27
Residuals    Sum of squares        =   596.68995
             Standard error of e   =     4.70102
Fit          R-squared             =      .99334  <**********
             Adjusted R-squared    =      .99137  <**********
Info criter. LogAmemiya Prd. Crt.  =     3.31870  <**********
             Akaike Info. Criter.  =     3.30788  <**********
Model test   F[  8,    27] (prob)  =   503.3(.0000)
--------+---------------------------------------------------------------
Variable|  Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
--------+---------------------------------------------------------------
Constant|  -8220.38**       3629.309       -2.265     .0317
      PG|   -26.8313***        5.76403     -4.655     .0001      2.31661
       Y|      .02214***       .00711       3.116     .0043      9232.86
     PNC|    36.2027          21.54563      1.680     .1044      1.67078
     PUC|    -6.23235          5.01098     -1.244     .2243      2.34364
     PPT|     9.35681          8.94549      1.046     .3048      2.74486
      PN|    53.5879*         30.61384      1.750     .0914      2.08511
      PS|   -65.4897***       23.58819     -2.776     .0099      2.36898
    YEAR|     4.18510**        1.87283      2.235     .0339      1977.50
--------+---------------------------------------------------------------

PD added to the model: R-squared rises, adjusted R-squared falls.

Ordinary least squares regression
LHS=G        Mean                  =   226.09444
             Standard deviation    =    50.59182
             Number of observs.    =          36
Model size   Parameters            =          10
             Degrees of freedom    =          26
Residuals    Sum of squares        =   594.54206
             Standard error of e   =     4.78195
Fit          R-squared             =      .99336   Was .99334
             Adjusted R-squared    =      .99107   Was .99137
--------+---------------------------------------------------------------
Variable|  Coefficient   Standard Error   t-ratio   P[|T|>t]   Mean of X
--------+---------------------------------------------------------------
Constant|  -7916.51**       3822.602       -2.071     .0484
      PG|   -26.8077***        5.86376     -4.572     .0001      2.31661
       Y|      .02231***       .00725       3.077     .0049      9232.86
     PNC|    30.0618          29.69543      1.012     .3207      1.67078
     PUC|    -7.44699          6.45668     -1.153     .2592      2.34364
     PPT|     9.05542          9.15246       .989     .3316      2.74486
      PD|    11.8023          38.50913       .306     .7617      1.65056   (note the low t ratio)
      PN|    47.3306          37.23680      1.271     .2150      2.08511
      PS|   -60.6202**        28.77798     -2.106     .0450      2.36898
    YEAR|     4.02861*         1.97231      2.043     .0514      1977.50
--------+---------------------------------------------------------------

Other Measures of Fit

For non-nested alternatives. These include a degrees-of-freedom penalty.
Information criteria:
  Schwarz (BIC): $n \log(e'e) + K \log(n)$
  Akaike (AIC): $n \log(e'e) + 2K$
When choosing between two non-nested models, the better model is the one that produces the smaller value of the criterion. The BIC penalty for including something irrelevant is heavier than the AIC penalty.

Other Measures of Fit (continued)

Both AIC and BIC increase as the residual sum of squares increases. Both criteria also penalize models with many variables, and smaller values of AIC and BIC are preferred. Since models with more variables tend to produce a smaller residual sum of squares but use more parameters, the best choice balances fit against the number of variables. Again, the BIC penalty for including something irrelevant is heavier than the AIC penalty.

Multicollinearity

Functional Forms

Specification and Functional Form: Nonlinearity

Population: $y = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 z + \varepsilon$
Estimator:  $\hat{y} = b_1 + b_2 x + b_3 x^2 + b_4 z$
Partial effect of x: $\delta_x = \partial E[y|x,z]/\partial x = \beta_2 + 2\beta_3 x$, estimated by $\hat{\delta}_x = b_2 + 2 b_3 x$.
Estimator of the variance of $\hat{\delta}_x$:
$$Est.Var[\hat{\delta}_x] = Var[b_2] + 4x^2\,Var[b_3] + 4x\,Cov[b_2, b_3]$$
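The partial-effect and delta-method formulas above can be computed directly from the coefficient vector and the estimated covariance matrix. The sketch below uses simulated data with hypothetical variables x and z; it is not the income regression reported next.

```python
# Minimal sketch: partial effect of x in y = b1 + b2*x + b3*x^2 + b4*z + e,
# evaluated at the sample mean of x, with its delta-method standard error.
# Simulated, hypothetical data; not the income equation from the slides.
import numpy as np

rng = np.random.default_rng(1)
n = 500
x = rng.normal(40.0, 10.0, size=n)
z = rng.normal(size=n)
y = 1.0 + 0.06 * x - 0.0007 * x**2 + 0.3 * z + rng.normal(scale=0.4, size=n)

X = np.column_stack([np.ones(n), x, x**2, z])       # columns: const, x, x^2, z
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
K = X.shape[1]
s2 = e @ e / (n - K)                                # s^2 = e'e/(n - K)
V = s2 * np.linalg.inv(X.T @ X)                     # estimated Var[b|X]

x0 = x.mean()
pe = b[1] + 2 * b[2] * x0                           # b2 + 2*b3*x, at x = mean
var_pe = V[1, 1] + 4 * x0**2 * V[2, 2] + 4 * x0 * V[1, 2]
print(pe, np.sqrt(var_pe))                          # estimate and standard error
```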
Log Income Equation

Ordinary least squares regression
LHS=LOGY     Mean                  =    -1.15746
             Standard deviation    =      .49149
             Number of observs.    =       27322
Model size   Parameters            =           7
             Degrees of freedom    =       27315
Residuals    Sum of squares        =  5462.03686
             Standard error of e   =      .44717
Fit          R-squared             =      .17237
--------+---------------------------------------------------------------
Variable|  Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
--------+---------------------------------------------------------------
     AGE|     .06225***       .00213        29.189    .0000     43.5272
   AGESQ|    -.00074***       .242482D-04  -30.576    .0000     2022.99
Constant|   -3.19130***       .04567       -69.884    .0000
 MARRIED|     .32153***       .00703        45.767    .0000      .75869
  HHKIDS|    -.11134***       .00655       -17.002    .0000      .40272
  FEMALE|    -.00491          .00552         -.889    .3739      .47881
    EDUC|     .05542***       .00120        46.050    .0000     11.3202
--------+---------------------------------------------------------------
Average Age = 43.5272.
Estimated partial effect   = .066225 - 2(.00074)(43.5272) = .00018.
Estimated variance         = 4.54799e-6 + 4(43.5272)^2(5.87973e-10) + 4(43.5272)(-5.1285e-8) = 7.4755086e-08.
Estimated standard error   = .00027341.

Specification and Functional Form: Interaction Effect

Population: $y = \beta_1 + \beta_2 x + \beta_3 z + \beta_4 xz + \varepsilon$
Estimator:  $\hat{y} = b_1 + b_2 x + b_3 z + b_4 xz$
Partial effect of x: $\delta_x = \partial E[y|x,z]/\partial x = \beta_2 + \beta_4 z$, estimated by $\hat{\delta}_x = b_2 + b_4 z$.
Estimator of the variance of $\hat{\delta}_x$:
$$Est.Var[\hat{\delta}_x] = Var[b_2] + z^2\,Var[b_4] + 2z\,Cov[b_2, b_4]$$

Interaction Effect

Ordinary least squares regression
LHS=LOGY     Mean                  =    -1.15746
             Standard deviation    =      .49149
             Number of observs.    =       27322
Model size   Parameters            =           4
             Degrees of freedom    =       27318
Residuals    Sum of squares        =  6540.45988
             Standard error of e   =      .48931
Fit          R-squared             =      .00896
             Adjusted R-squared    =      .00885
Model test   F[  3, 27318] (prob)  =    82.4(.0000)
--------+---------------------------------------------------------------
Variable|  Coefficient   Standard Error   b/St.Er.  P[|Z|>z]   Mean of X
--------+---------------------------------------------------------------
Constant|   -1.22592***       .01605       -76.376    .0000
     AGE|     .00227***       .00036         6.240    .0000     43.5272
  FEMALE|     .21239***       .02363         8.987    .0000      .47881
 AGE_FEM|    -.00620***       .00052       -11.819    .0000     21.2960
--------+---------------------------------------------------------------
Do women earn more than men (in this sample)? The +.21239 coefficient on FEMALE would suggest so. But the female "difference" is +.21239 - .00620(Age). At the average Age, the effect is .21239 - .00620(43.5272) = -.05748.

Structural Break

Linear Restrictions

Context: how do linear restrictions affect the properties of the least squares estimator?
Model: $y = X\beta + \varepsilon$
Theory (information): $R\beta - q = 0$
Restricted least squares estimator:
$$b_* = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q)$$
Expected value: $E[b_*|X] = \beta - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(R\beta - q)$
Variance: $\sigma^2(X'X)^{-1} - \sigma^2(X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}R(X'X)^{-1}$ = Var[b] minus a nonnegative definite matrix, which is $\le$ Var[b].
Implication: (as before) nonsample information reduces the variance of the estimator.

Interpretation

Case 1: The theory is correct, $R\beta - q = 0$ (the restrictions do hold). Then b* is unbiased and Var[b*] is smaller than Var[b]. How do we know this?
Case 2: The theory is incorrect, $R\beta - q \ne 0$ (the restrictions do not hold). Then b* is biased (what does this mean?), yet Var[b*] is still smaller than Var[b].
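Here is a minimal sketch of the restricted least squares estimator just defined, $b_* = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q)$. The data are simulated and the single adding-up restriction $\beta_2 + \beta_3 = 1$ is purely illustrative.

```python
# Minimal sketch: restricted least squares via
#   b* = b - (X'X)^{-1} R' [R (X'X)^{-1} R']^{-1} (Rb - q).
# Simulated data; the restriction beta2 + beta3 = 1 is hypothetical.
import numpy as np

rng = np.random.default_rng(2)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([0.5, 0.7, 0.3]) + rng.normal(scale=0.3, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                           # unrestricted least squares

R = np.array([[0.0, 1.0, 1.0]])                 # R beta = q encodes beta2 + beta3 = 1
q = np.array([1.0])
m = R @ b - q                                   # discrepancy vector, Rb - q
C = XtX_inv @ R.T @ np.linalg.inv(R @ XtX_inv @ R.T)
b_star = b - C @ m                              # restricted estimator
print(b, b_star, R @ b_star - q)                # the restriction holds exactly for b*
```

If the unrestricted coefficients already satisfied the restriction (m = 0), b* would equal b; otherwise b* is pulled away from b just enough that Rb* = q holds exactly.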
Linear Least Squares Subject to Restrictions

Restrictions: theory imposes certain restrictions on the parameters.
Some common applications:
  Dropping variables from the equation: certain coefficients in b are forced to equal 0. (Probably the most common testing situation: "Is a certain variable significant?")
  Adding-up conditions: sums of certain coefficients must equal fixed values. Examples: adding-up conditions in demand systems; constant returns to scale in production functions.
  Equality restrictions: certain coefficients must equal other coefficients. Example: using real vs. nominal variables in equations.
General formulation for linear restrictions: minimize the sum of squares, $e'e$, subject to the linear constraint $R\beta = q$.

Restricted Least Squares

In practice, restrictions can usually be imposed by solving them out.
1. Force a coefficient to equal zero (drop the variable from the equation).
   Problem: minimize over $\beta_1, \beta_2, \beta_3$: $\sum_{i=1}^{n}(y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3})^2$ subject to $\beta_3 = 0$.
   Solution: minimize over $\beta_1, \beta_2$: $\sum_{i=1}^{n}(y_i - \beta_1 x_{i1} - \beta_2 x_{i2})^2$.
2. Adding-up restriction. Impose $\beta_1 + \beta_2 + \beta_3 = 1$. Strategy: substitute $\beta_3 = 1 - \beta_1 - \beta_2$.
   Solution: minimize over $\beta_1, \beta_2$: $\sum_{i=1}^{n}(y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - (1 - \beta_1 - \beta_2)x_{i3})^2 = \sum_{i=1}^{n}[(y_i - x_{i3}) - \beta_1(x_{i1} - x_{i3}) - \beta_2(x_{i2} - x_{i3})]^2$.
3. Equality restriction. Impose $\beta_3 = \beta_2$.
   Problem: minimize over $\beta_1, \beta_2, \beta_3$: $\sum_{i=1}^{n}(y_i - \beta_1 x_{i1} - \beta_2 x_{i2} - \beta_3 x_{i3})^2$ subject to $\beta_3 = \beta_2$.
   Solution: minimize over $\beta_1, \beta_2$: $\sum_{i=1}^{n}[y_i - \beta_1 x_{i1} - \beta_2(x_{i2} + x_{i3})]^2$.
In each case, the result is least squares using transformations of the data.

Restricted Least Squares Solution

General approach: a programming problem. Minimize over $\beta$: $L = (y - X\beta)'(y - X\beta)$ subject to $R\beta = q$.
Each row of R holds the K coefficients of one restriction; with J restrictions, R has J rows.
  $\beta_3 = 0$:  R = [0, 0, 1, 0, ...], q = (0).
  $\beta_2 = \beta_3$:  R = [0, 1, -1, 0, ...], q = (0).
  $\beta_2 = 0$ and $\beta_3 = 0$:  R = [0, 1, 0, 0, ...; 0, 0, 1, 0, ...], q = (0, 0)'.

Solution Strategy

Quadratic program: minimize a quadratic criterion subject to linear restrictions. All restrictions are binding. Solve using the Lagrangean formulation: minimize over $(\beta, \lambda)$
$$L^* = (y - X\beta)'(y - X\beta) + 2\lambda'(R\beta - q).$$
(The 2 is for convenience; see below.)

Restricted LS Solution: Necessary Conditions

$$\frac{\partial L^*}{\partial \beta} = -2X'(y - X\beta) + 2R'\lambda = 0$$
$$\frac{\partial L^*}{\partial \lambda} = 2(R\beta - q) = 0$$
Divide everything by 2 and collect in matrix form:
$$\begin{bmatrix} X'X & R' \\ R & 0 \end{bmatrix}\begin{bmatrix} \hat{\beta} \\ \hat{\lambda} \end{bmatrix} = \begin{bmatrix} X'y \\ q \end{bmatrix}, \qquad \text{i.e., } A\hat{\gamma} = w, \text{ with solution } \hat{\gamma} = A^{-1}w.$$
This does not rely on full rank of X; it relies on A having full column rank K + J.

Restricted Least Squares

If X has full rank, there is a partitioned solution for $\beta_*$ and $\lambda_*$:
$$\beta_* = b - (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}(Rb - q)$$
$$\lambda_* = [R(X'X)^{-1}R']^{-1}(Rb - q)$$
where b is the simple least squares coefficient vector, $b = (X'X)^{-1}X'y$.
There are cases in which X does not have full rank. E.g., $X = [1, x_1, x_2, d_1, d_2, d_3, d_4]$, where $d_1, d_2, d_3, d_4$ are a complete set of dummy variables with coefficients $a_1, a_2, a_3, a_4$. The unrestricted b cannot be computed, but restricted LS with $a_1 + a_2 + a_3 + a_4 = 0$ can be.

Aspects of Restricted LS

1. $b_* = b - Cm$, where $C = (X'X)^{-1}R'[R(X'X)^{-1}R']^{-1}$ and m is the "discrepancy vector" $Rb - q$. Note what happens if m = 0. What does m = 0 mean?
2. $\lambda_* = [R(X'X)^{-1}R']^{-1}(Rb - q) = [R(X'X)^{-1}R']^{-1}m$. When does $\lambda_* = 0$? What does this mean?
3. Combining results: $b_* = b - (X'X)^{-1}R'\lambda_*$. How could $b_* = b$?

Restrictions and the Criterion Function

Assume the full rank X case (the usual case). $b = (X'X)^{-1}X'y$ uniquely minimizes $(y - X\beta)'(y - X\beta)$ over $\beta$:
$$(y - Xb)'(y - Xb) < (y - Xb_*)'(y - Xb_*) \text{ for any } b_* \ne b.$$
Imposing restrictions cannot improve the criterion value. It follows that $R^2_* \le R^2$: restrictions must degrade the fit.
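To connect the pieces above, here is a minimal sketch of the Lagrangean (bordered) system [X'X, R'; R, 0][beta; lambda] = [X'y; q]. It mirrors, in spirit, the dummy-variable example: X contains a constant plus a complete set of four dummies, so unrestricted least squares cannot be computed, but the restricted problem with $a_1 + a_2 + a_3 + a_4 = 0$ can. The data and group effects are simulated and hypothetical.

```python
# Minimal sketch: restricted LS from the bordered system
#   [X'X  R'] [beta]   [X'y]
#   [ R   0 ] [lam ] = [ q ] ,
# which is solvable even though X itself is short-ranked (dummy-variable trap).
# Simulated, hypothetical data.
import numpy as np

rng = np.random.default_rng(3)
n = 400
group = rng.integers(0, 4, size=n)                  # four exhaustive groups
D = np.eye(4)[group]                                # d1..d4, a complete set of dummies
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x, D])             # constant + x + d1..d4: rank deficient
y = 1.0 + 0.5 * x + D @ np.array([0.2, -0.1, 0.3, -0.4]) + rng.normal(scale=0.3, size=n)

R = np.array([[0.0, 0.0, 1.0, 1.0, 1.0, 1.0]])      # restriction a1 + a2 + a3 + a4 = 0
q = np.array([0.0])
K, J = X.shape[1], R.shape[0]

A = np.block([[X.T @ X, R.T],
              [R, np.zeros((J, J))]])               # bordered matrix; full rank even though X'X is not
w = np.concatenate([X.T @ y, q])
sol = np.linalg.solve(A, w)
b_star, lam = sol[:K], sol[K:]
print(b_star, R @ b_star - q)                       # coefficients and (numerically zero) discrepancy
```

Here the restriction supplies exactly the information lost to the collinearity between the constant and the dummies, so A has full rank K + J even though X'X is singular.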