Economics Letters 85 (2004) 247 – 255 www.elsevier.com/locate/econbase
Seemingly unrelated regressions with identical regressors: a note
Debopam Bhattacharya *
Department of Economics, Dartmouth College, 327 Rockefeller Hall, 03755, Hanover, NH, USA
Received 29 February 2004; accepted 20 April 2004

Abstract
I show that in a seemingly unrelated, multivariate normal model with identical regressors, where some equations are linear and others are in the limited dependent variable (LDV) form, joint MLE can lead to efficiency gains for parameters of the LDV equations but not for parameters of the linear equations. © 2004 Elsevier B.V. All rights reserved.
Keywords: SUR; Bivariate normal; Likelihood
JEL classification: C0; C3
* Tel.: +1-603-359-5994; fax: +1-603-643-2122. E-mail address: [email protected] (D. Bhattacharya).
0165-1765/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.econlet.2004.04.012

1. Introduction

It is well known that in a system of linear seemingly unrelated regression (SUR) equations with identical regressors, equation-by-equation OLS yields efficient estimates of the coefficient vectors (see Greene, 1989, pp. 488–489, for a textbook treatment). This note extends that result to a seemingly unrelated, (latent) multivariate normal model with identical regressors, where some equations are linear and others are in the limited dependent variable form (here, we consider only binary dependent variables). An example is a system in which the dependent variable in the first equation denotes whether an infant survived up to 1 year after birth, the dependent variable in the second equation is weight at birth, and the common regressors are the mother's health and educational status and family income. Note that this differs from the usual selection-type models, where one variable is observed only for individuals whose second variable has crossed a threshold.

One may well have more than two equations in the model; as an example, consider multichild households. The dependent variables of interest are the school enrollment status of different siblings (one for the oldest, one for the second oldest, etc.) and another equation where the dependent variable is family income. Although family income is likely a major determinant of school enrollment, it is clearly an endogenous variable, and in the absence of a plausible instrument, one may still estimate the reduced-form equations (where the dependent variables are the enrollment status of the different siblings plus income), with all exogenous variables included as regressors. Given that the unobserved components of the different equations are likely to be correlated, a reasonable specification for this set of reduced-form equations is the SUR, and a question of interest is whether using all the equations in the system improves the efficiency of the estimates.

The main result of this note is that, in such a system with multivariate normal structural errors, equation-by-equation OLS still yields efficient estimates of the coefficient vectors for the linear equations. However, a (multivariate) probit for the limited dependent variable (LDV) equations alone, ignoring the linear equations in the system, will not in general yield estimates of the LDV equation parameters that are asymptotically equivalent to those obtained by maximizing the full-information joint likelihood (FIML). The result continues to hold if the regressors in the linear equations are a linear combination (in particular, a subset) of the regressors in the nonlinear equations. In Section 2, we show the results for one pair of equations to convey the idea in a simplified manner and state (without proof) the general case. In Section 3, we provide an empirical illustration.
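The well-known linear result mentioned at the start of the introduction can be checked numerically. The following Python sketch is only an illustration under simplifying assumptions that are not in the note: two linear equations, a single common regressor without intercept, a known error covariance matrix, and simulated data. The function names `ols_slope` and `sur_gls_slopes` are hypothetical.

```python
import random

def ols_slope(x, y):
    """OLS slope for a single regressor without intercept."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def sur_gls_slopes(x, y1, y2, s11, s12, s22):
    """GLS slopes for a two-equation SUR system sharing the regressor x.

    With identical regressors, X'(S^-1 kron I)X = S^-1 * (x'x) and
    X'(S^-1 kron I)y = S^-1 @ (x'y1, x'y2)', so both are 2x2 objects.
    """
    det = s11 * s22 - s12 * s12
    # Elements of the inverse of the error covariance matrix S.
    i11, i12, i22 = s22 / det, -s12 / det, s11 / det
    xx = sum(a * a for a in x)
    xy1 = sum(a * b for a, b in zip(x, y1))
    xy2 = sum(a * b for a, b in zip(x, y2))
    # Left-hand matrix A = S^-1 * xx and right-hand vector b = S^-1 @ (xy1, xy2).
    a11, a12, a22 = i11 * xx, i12 * xx, i22 * xx
    b1 = i11 * xy1 + i12 * xy2
    b2 = i12 * xy1 + i22 * xy2
    # Solve the 2x2 system A @ g = b by Cramer's rule.
    det_a = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det_a, (a11 * b2 - a12 * b1) / det_a

random.seed(0)
n, rho = 200, 0.6  # illustrative sample size and error correlation
x = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
v = [rho * a + (1 - rho ** 2) ** 0.5 * random.gauss(0, 1) for a in u]
y1 = [1.5 * a + e for a, e in zip(x, u)]
y2 = [-0.5 * a + e for a, e in zip(x, v)]

gls = sur_gls_slopes(x, y1, y2, 1.0, rho, 1.0)
ols = (ols_slope(x, y1), ols_slope(x, y2))
print(gls, ols)
```

The two pairs of slopes agree up to floating-point error, because the inverse covariance factor cancels when the regressors are identical across equations.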
2. Theory

2.1. One pair of equations

For simplicity, we consider a two-equation normal SUR model where the first equation is a probit and the second is a normal linear regression. We also simplify the notation by normalizing both variances to 1 (our main results do not depend on this normalization). Specifically, consider the model

$$y^*_{1i} = X_i\beta + \varepsilon_i, \qquad y_{2i} = X_i\gamma + \nu_i, \tag{1}$$

where $(\varepsilon_i, \nu_i)$ are i.i.d. across $i$, independent of $X_i$, and

$$\begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right),$$

with $\beta, \gamma \in \mathbb{R}^k$ and $\rho \in (-1, 1)$; the regressors $\{X_i\}$ take values in $\mathbb{R}^k$ and are i.i.d. with finite second moments. The observed version of the first equation is given by

$$y_{1i} = \begin{cases} 0 & \text{if } y^*_{1i} \le 0, \\ 1 & \text{if } y^*_{1i} > 0. \end{cases}$$
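To make the data-generating process concrete, the following Python sketch draws a sample from the two-equation model. It is an illustration only: the scalar standard normal regressor, the parameter values, and the function name `simulate` are assumptions made for the example and do not come from the note.

```python
import math
import random

def simulate(n, beta, gamma, rho, seed=0):
    """Draw n observations (x, y1, y2) from the two-equation model, scalar x."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)   # assumed regressor distribution
        nu = rng.gauss(0.0, 1.0)  # error of the linear equation
        # eps | nu ~ N(rho * nu, 1 - rho^2), so (eps, nu) has unit variances
        # and correlation rho.
        eps = rho * nu + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        y1_star = x * beta + eps          # latent variable, never observed
        y1 = 1 if y1_star > 0 else 0      # observed binary outcome
        y2 = x * gamma + nu               # observed continuous outcome
        data.append((x, y1, y2))
    return data

sample = simulate(n=5000, beta=0.8, gamma=-0.3, rho=0.5)
share_ones = sum(y1 for _, y1, _ in sample) / len(sample)
print(len(sample), share_ones)
```

Only $(x_i, y_{1i}, y_{2i})$ is returned, mirroring the fact that the latent $y^*_{1i}$ is unobserved.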
Denoting by $\phi$ and $\Phi$ the standard normal density and cdf, the log-likelihood for a simple random sample is given by

$$l = \sum_{i=1}^{N} \ln\phi(y_{2i} - X_i\gamma) + \sum_{i=1}^{N} y_{1i}\ln\Phi\!\left(\frac{X_i\beta}{\sqrt{1-\rho^2}} - \frac{\rho}{\sqrt{1-\rho^2}}(y_{2i} - X_i\gamma)\right) + \sum_{i=1}^{N} (1 - y_{1i})\ln\!\left(1 - \Phi\!\left(\frac{X_i\beta}{\sqrt{1-\rho^2}} - \frac{\rho}{\sqrt{1-\rho^2}}(y_{2i} - X_i\gamma)\right)\right),$$

where we have first conditioned on $y_2$. Reparametrizing

$$\theta = \frac{\rho}{\sqrt{1-\rho^2}}$$

and writing

$$e_{2i} = y_{2i} - X_i\gamma, \qquad z_i = \frac{X_i\beta}{\sqrt{1-\rho^2}} - \frac{\rho}{\sqrt{1-\rho^2}}(y_{2i} - X_i\gamma) = X_i\beta\sqrt{1+\theta^2} - \theta(y_{2i} - X_i\gamma),$$

the log-likelihood is given by

$$l = \sum_{i=1}^{N} \ln\phi(e_{2i}) + \sum_{i=1}^{N} y_{1i}\ln\Phi(z_i) + \sum_{i=1}^{N} (1 - y_{1i})\ln(1 - \Phi(z_i)).$$
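The reparametrization $\theta = \rho/\sqrt{1-\rho^2}$ is one-to-one from $(-1, 1)$ onto the real line, and it implies $1/\sqrt{1-\rho^2} = \sqrt{1+\theta^2}$, which is the identity used to rewrite $z_i$. A minimal Python sketch of the mapping and its inverse follows; the function names and the value of $\rho$ are illustrative assumptions.

```python
import math

def theta_from_rho(rho):
    """theta = rho / sqrt(1 - rho^2), defined for -1 < rho < 1."""
    return rho / math.sqrt(1.0 - rho ** 2)

def rho_from_theta(theta):
    """Inverse map: rho = theta / sqrt(1 + theta^2)."""
    return theta / math.sqrt(1.0 + theta ** 2)

rho = 0.5  # illustrative value
theta = theta_from_rho(rho)
rho_back = rho_from_theta(theta)

# Identity used when rewriting z_i: 1 / sqrt(1 - rho^2) = sqrt(1 + theta^2).
lhs = 1.0 / math.sqrt(1.0 - rho ** 2)
rhs = math.sqrt(1.0 + theta ** 2)
print(theta, rho_back, lhs, rhs)
```

Working with the unrestricted $\theta$ rather than the bounded $\rho$ also avoids boundary constraints in numerical maximization.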
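First-order conditions are given by

$$0 = \frac{\partial l}{\partial\beta} = \sqrt{1+\theta^2}\,\sum_{i=1}^{N} X_i\,\phi(z_i)\,\frac{y_{1i} - \Phi(z_i)}{\Phi(z_i)(1 - \Phi(z_i))}, \tag{2}$$

$$0 = \frac{\partial l}{\partial\gamma} = \sum_{i=1}^{N} x_i e_{2i} + \theta\sum_{i=1}^{N} X_i\,\phi(z_i)\,\frac{y_{1i} - \Phi(z_i)}{\Phi(z_i)(1 - \Phi(z_i))} = \sum_{i=1}^{N} x_i e_{2i} + \frac{\theta}{\sqrt{1+\theta^2}}\,\frac{\partial l}{\partial\beta}, \tag{3}$$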
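and

$$0 = \frac{\partial l}{\partial\theta} = \frac{\theta}{1+\theta^2}\left(\frac{\partial l}{\partial\beta}\right)'\beta - \sum_{i=1}^{N} e_{2i}\,\phi(z_i)\,\frac{y_{1i} - \Phi(z_i)}{\Phi(z_i)(1 - \Phi(z_i))}. \tag{4}$$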
Note from (2) that the second term in (3) is 0, so that (3) reduces to

$$\sum_{i=1}^{N} x_i e_{2i} = 0, \tag{5}$$
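which implies the OLS solution for $\gamma$ (because the likelihood function is globally concave in the parameters,$^{1}$ the first-order conditions have a unique solution, and that solution must satisfy (5)). Note that this result continues to hold if $X_2 \subset \mathcal{L}(X_1)$, where $X_1$ are the regressors in the first equation, $X_2$ are those in the second, and $\mathcal{L}(X_1)$ denotes the linear space spanned by the columns of $X_1$. We now derive the asymptotic covariance matrix for the MLEs and show that the asymptotic variance of $\hat\beta$ in FIML does not, in general, equal the asymptotic variance of the simple probit estimator of $\beta$ obtained by ignoring the linear equation. This shows that the estimates of $\beta$ from FIML and from the simple probit are not only numerically different in finite samples but also asymptotically different.

2.2. Asymptotic variances

The asymptotic covariance matrix for $\hat\alpha = (\hat\gamma, \hat\beta, \hat\theta)$ is given by

$$\Sigma = \left\{ E\left[\frac{\partial l}{\partial\alpha}\frac{\partial l}{\partial\alpha'}\right] \right\}^{-1}_{\alpha=\alpha_0}.$$

In what follows, all expectations are computed at the true parameter values and we drop the subscript $\alpha = \alpha_0$. Now (the matrix is symmetric; only the lower triangle is shown),

$$E\left[\frac{\partial l}{\partial\alpha}\frac{\partial l}{\partial\alpha'}\right] = \begin{pmatrix} E(x_i x_i') + \dfrac{\theta^2}{1+\theta^2}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & & \\[2ex] \dfrac{\theta}{\sqrt{1+\theta^2}}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & \\[2ex] \dfrac{\theta^2}{(1+\theta^2)^{3/2}}\,\beta' E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] + E\!\left[\dfrac{e_{2i}^2\,\phi(z_i)\,x_i'}{\Phi(z_i)(1-\Phi(z_i))}\right] & \dfrac{\theta}{1+\theta^2}\,\beta' E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & \dfrac{\theta^2}{(1+\theta^2)^2}\,\beta' E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right]\beta + E\!\left[\dfrac{e_{2i}^2\,\phi^2(z_i)}{\Phi(z_i)(1-\Phi(z_i))}\right] \end{pmatrix}$$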
1 This follows from the fact that the likelihood for a bivariate normal distribution is concave in the parameters and our likelihood is simply the integral of the bivariate normal likelihood, integrated over the range of one of the variables.
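Standard results on inverses of partitioned matrices (e.g., Rao, 1989, p. 33) imply that the second diagonal block of $\Sigma$ (which equals the asymptotic variance of $\hat\beta$ when the system is jointly estimated) equals the inverse of

$$E\!\left[\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\beta'}\right] - \begin{pmatrix} \dfrac{\theta}{\sqrt{1+\theta^2}}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & \dfrac{\theta}{1+\theta^2}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right]\beta \end{pmatrix} \begin{pmatrix} E(x_i x_i') + \dfrac{\theta^2}{1+\theta^2}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & \dfrac{\theta^2}{(1+\theta^2)^{3/2}}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right]\beta + E\!\left[\dfrac{e_{2i}^2\,x_i\,\phi(z_i)}{\Phi(z_i)(1-\Phi(z_i))}\right] \\[2ex] \dfrac{\theta^2}{(1+\theta^2)^{3/2}}\,\beta' E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] + E\!\left[\dfrac{e_{2i}^2\,\phi(z_i)\,x_i'}{\Phi(z_i)(1-\Phi(z_i))}\right] & \dfrac{\theta^2}{(1+\theta^2)^2}\,\beta' E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right]\beta + E\!\left[\dfrac{e_{2i}^2\,\phi^2(z_i)}{\Phi(z_i)(1-\Phi(z_i))}\right] \end{pmatrix}^{-1} \begin{pmatrix} \dfrac{\theta}{\sqrt{1+\theta^2}}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right] & \dfrac{\theta}{1+\theta^2}\,E\!\left[\dfrac{\partial l}{\partial\beta}\dfrac{\partial l}{\partial\beta'}\right]\beta \end{pmatrix}'.$$

Letting

$$\begin{pmatrix} P^{11} & P^{12} \\ P^{21} & P^{22} \end{pmatrix}$$

denote the inverse of the $2 \times 2$ block matrix in the preceding display, and

$$A_2 = E\!\left[\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\beta'}\right],$$

the asymptotic variance of $\hat\beta$ in the FIML estimation is the inverse of

$$V = A_2 - \frac{\theta^2}{1+\theta^2}\,A_2 P^{11} A_2 - \frac{\theta^2}{(1+\theta^2)^{3/2}}\left(A_2 P^{12}\beta' A_2 + A_2\beta P^{21} A_2\right) - \frac{\theta^2}{(1+\theta^2)^2}\,A_2\beta\beta' A_2\,P^{22}.$$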
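In the special case where $k = 1$, i.e., $x$ is a scalar, the asymptotic variance of the FIML estimate of $\beta$ is the inverse of

$$\frac{M\left\{ E(x_i^2)\,E\!\left[\dfrac{\phi^2(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right] - \left( E\!\left[\dfrac{x_i\,\phi(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right] \right)^{2} \right\}}{D},$$

where

$$M = E\!\left[\frac{\phi^2(z_i)\,x_i^2}{\Phi(z_i)(1-\Phi(z_i))}\right],$$

$$D = E\!\left[\frac{\phi^2(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right] E(x_i^2) - \left( E\!\left[\frac{x_i\,\phi(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right] \right)^{2} + M\,\frac{\theta^2}{1+\theta^2}\,E\!\left[\frac{\phi^2(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right] + M\,\frac{\beta^2\theta^2}{(1+\theta^2)^2}\,E(x_i^2) - 2M\,\frac{\beta\theta^2}{(1+\theta^2)^{3/2}}\,E\!\left[\frac{x_i\,\phi(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right].$$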
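The ratio $\phi^2(z)/[\Phi(z)(1-\Phi(z))]$ appears throughout these expressions. As a rough numerical illustration (not part of the original derivation), the following Python sketch evaluates this weight and approximates $M$ by Monte Carlo in the scalar case with $\theta = 0$, where $z_i = x_i\beta$. The standard normal design for $x_i$, the sample size, and the function names are assumptions made for the example.

```python
import math
import random

def norm_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def norm_cdf(z):
    """Standard normal cdf Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def info_weight(z):
    """phi(z)^2 / (Phi(z) * (1 - Phi(z))), the weight inside M."""
    p = norm_cdf(z)
    return norm_pdf(z) ** 2 / (p * (1.0 - p))

def mc_M(beta, n=20000, seed=0):
    """Monte Carlo average of info_weight(x * beta) * x^2 for x ~ N(0, 1)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        total += info_weight(x * beta) * x * x
    return total / n

weight_at_zero = info_weight(0.0)  # equals 2 / pi
m_flat = mc_M(0.0)                 # beta = 0: every weight equals 2 / pi
m_slope = mc_M(0.8)                # beta = 0.8: weights shrink away from z = 0
print(weight_at_zero, m_flat, m_slope)
```

The weight peaks at $z = 0$, where it equals $2/\pi$, and decays in $|z|$, so observations with predicted probabilities near one half contribute most to the information about $\beta$.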
The asymptotic variance of the probit estimator that ignores the linear equation is the inverse of

$$A_2^0 = E\!\left[\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\beta'}\right]_{\beta=\beta_0,\,\theta=0} = E\!\left[\frac{x_i x_i'\,\{\phi(x_i'\beta_0)\}^2}{\Phi(x_i'\beta_0)(1 - \Phi(x_i'\beta_0))}\right].$$

While these expressions do not permit a direct comparison of the matrices $V$ and $A_2^0$, using the expressions for $P^{12}$, $P^{21}$, $P^{22}$, one may verify that in general $V \neq A_2^0$ (no term in the expression for $V$ involves the quantity $E\!\left[\frac{x_i x_i'\{\phi(x_i'\beta_0)\}^2}{\Phi(x_i'\beta_0)(1-\Phi(x_i'\beta_0))}\right]$), although the definiteness of $(A_2^0 - V)$ is ambiguous from these expressions.$^{3}$ What we have shown here is that the asymptotic variance of the FIML estimator of $\beta$ differs from that of the marginal likelihood maximizer. Also note that in the special case $\rho = 0$, and therefore $\theta = 0$, the expression for $V$ implies that $A_2^0 = V$.

2.3. The general case

We state the general result without proof. The proof follows exactly the same steps as the one for a pair of equations but is notationally far more unwieldy. Consider a setup with $k$ linear and $m$ nonlinear equations with identical regressors (which are assumed to possess finite second moments) and correlated structural errors that follow a joint normal distribution. Specifically:

$$Y_j = X\gamma_j + u_j, \qquad j = 1, \ldots, k,$$

$$Y_j^* = X\beta_{j-k} + v_{j-k}, \qquad Y_j = 1(Y_j^* > 0), \qquad j = k+1, \ldots, k+m,$$

where

$$\begin{pmatrix} u \\ v \end{pmatrix} \sim N_{k+m}(0, V)$$

is independent of $X$, $u$ is $k \times 1$, $v$ is $m \times 1$, and $V$ is conformably partitioned as

$$V = \begin{pmatrix} \Sigma_{uu} & \Sigma_{vu} \\ \Sigma_{uv} & \Sigma_{vv} \end{pmatrix}.$$

One observes realizations of $(Y_j)_{j=1,\ldots,k+m}$ and $X$ in the data. First, note that

$$(v \mid u) \sim N_m\!\left(\Sigma_{uv}\Sigma_{uu}^{-1}u,\; \Sigma_{vv} - \Sigma_{uv}\Sigma_{uu}^{-1}\Sigma_{vu}\right).$$

Let

$$Y_k = (y_{11}, y_{12}, \ldots, y_{1n}, y_{21}, \ldots, y_{2n}, \ldots, y_{k,1}, \ldots, y_{k,n})', \qquad \gamma = (\gamma_1, \ldots, \gamma_k)', \qquad \beta = (\beta_1, \ldots, \beta_m)'.$$

2 For scale identification of $\beta$, a scale normalization of $\mathrm{diag}(\Sigma_{vv}) = I_m$ is essential.

3 An analogy is the tobit model, where one can estimate the slopes using the probit part of the likelihood alone, and those estimates are less efficient than the ones that maximize the full tobit likelihood (cf. Amemiya, 1985, p. 366). The situation is different here in that the FIML also estimates the additional parameters $\rho$ and $\gamma$, which affect the asymptotic variance of $\theta$ and make the comparison ambiguous.
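Then, ignoring constants in the likelihood that do not depend on $(\beta, \gamma)$, the likelihood for an i.i.d. sample is given by

$$L(y_1, \ldots, y_k, y_{k+1}, \ldots, y_{k+m} \mid X, \theta) = L(y_1, \ldots, y_k \mid X, \theta) \times L(y_{k+1}, \ldots, y_{k+m} \mid y_1, \ldots, y_k, X, \theta)$$

$$= \exp\!\left\{ -\frac{1}{2}\,[Y_k - (I_k \otimes X)\gamma]'\,(\Sigma_{uu}^{-1} \otimes I_n)\,[Y_k - (I_k \otimes X)\gamma] \right\} \times L_2(\beta, \gamma),$$

where $I_n$ is the identity matrix of order $n$, the number of observations. Therefore the log-likelihood is given by

$$l = -\frac{1}{2}\,(Y_k - (I_k \otimes X)\gamma)'\,(\Sigma_{uu}^{-1} \otimes I_n)\,(Y_k - (I_k \otimes X)\gamma) + l_2(\beta, \gamma).$$

Then, the first-order conditions are given by

$$0 = \frac{\partial l}{\partial\beta} = \frac{\partial l_2}{\partial\beta}$$

and

$$0 = \frac{\partial l}{\partial\gamma} = (I_k \otimes X)'\,(\Sigma_{uu}^{-1} \otimes I_n)\,(Y_k - (I_k \otimes X)\gamma) + \frac{\partial l_2}{\partial\gamma}. \tag{6}$$

One may verify that here $\frac{\partial l}{\partial\beta} = 0$ implies $\frac{\partial l_2}{\partial\gamma} = 0$, so that maximizing the full likelihood gives the same estimates of $\gamma$ as $k$ separate OLS estimations of the first $k$ equations. In addition, the asymptotic variance of the FIML estimates of $\beta$ differs from that of the estimates that ignore the linear equations.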
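Once the term involving $l_2$ drops out, the condition for $\gamma$ is the usual OLS normal equation: the residuals of each linear equation are orthogonal to the regressors, as in condition (5). The following Python sketch checks this orthogonality for a single linear equation with an intercept and one regressor. The simulated data, the coefficient values, and the function name `ols_two_regressors` are hypothetical and chosen only for illustration.

```python
import random

def ols_two_regressors(x, y):
    """OLS of y on an intercept and x, solved from the 2x2 normal equations."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    intercept = (sxx * sy - sx * sxy) / det
    slope = (n * sxy - sx * sy) / det
    return intercept, slope

random.seed(1)
n = 500
x = [random.gauss(0, 1) for _ in range(n)]
y2 = [3.4 + 0.1 * a + random.gauss(0, 0.5) for a in x]  # illustrative values

g0, g1 = ols_two_regressors(x, y2)
resid = [b - g0 - g1 * a for a, b in zip(x, y2)]

# Sample moments sum_i x_i * e_2i, one per regressor (intercept and x).
moment_const = sum(resid)
moment_x = sum(a * e for a, e in zip(x, resid))
print(g0, g1, moment_const, moment_x)
```

Both moments are zero up to floating-point error, which is the sense in which the FIML estimate of $\gamma$ coincides with OLS.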
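3. Empirical illustration

In this section, we illustrate our results using a dataset of infant births. The data come from the National Maternal and Infant Health Survey (NMIHS) conducted in the US in 1988. We estimate the following SUR model:

$$BO^*_i = X_i\beta + \varepsilon_i, \qquad BO_i = 1(BO^*_i > 0), \qquad BW_i = X_i\gamma + \nu_i,$$

with $(\varepsilon_i, \nu_i)$ independent of $X_i$ and

$$\begin{pmatrix} \varepsilon_i \\ \nu_i \end{pmatrix} \sim N_2\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho\sigma \\ \rho\sigma & \sigma^2 \end{pmatrix} \right).$$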
Table 1
Results from joint estimation

                        Coefficient    S.D.      t
Panel 1 (birthweight)
Momed                   0.0152         0.0064    2.37
Income                  0.0021         0.0048    0.43
Income_square           0.0002         0.0001    1.44
Female_child            0.1162         0.0282    4.12
Smoke                   0.0112         0.0026    4.26
Mom_overweight          0.0207         0.0599    0.35
Intercept               3.4432         0.0899    38.30

Panel 2 (survival)
Momed                   0.0173         0.0081    2.08
Income                  0.0124         0.0050    2.39
Income_square           0.0002         0.0001    2.20
Female_child            0.0474         0.0340    1.39
Smoke                   0.0064         0.0020    2.13
Mom_overweight          0.1619         0.0628    2.56
Intercept               2.5124         0.1145    21.95
ρ                       0.4126         0.0113    36.37
σ                       0.5827         0.0182    32.01

Panel 1 shows the results for the birthweight equation; panel 2 shows the results for infant survival up to 1 year after birth, when estimation is done jointly by maximizing the full likelihood. ρ and σ are as defined in Section 3 of the text.
Here, $i$ indexes the birth, BO denotes whether the infant survived up to one year after birth (BO* is the latent health stock), BW denotes birthweight in kilograms, and the common regressors are the mother's years of education, household income (in thousands of dollars) and its square, the average number of cigarettes smoked by the mother during pregnancy, a dummy for whether the child was female, and a dummy for whether the mother was overweight [body mass index (BMI) greater than 30]. The number of observations is 2529. The estimations were performed in the statistical package Stata; the full-information maximum likelihood estimates were obtained with Stata's maximum likelihood routine, with the likelihood function entered manually. Table 1 reports the results of the joint estimation, Table 2 reports the OLS coefficients for the second (birthweight) equation, and Table 3 reports the coefficients from a probit of the first (survival) equation. From Table 2 and panel 1 of Table 1, one can see that for the second (i.e., birthweight) equation, we get identical coefficients and standard errors. From Table 3 and panel 2 of Table 1, we get nonidentical but close estimates for the coefficients in the survival equation, but the standard errors are no larger, and for most coefficients smaller, when the system is jointly estimated.

Table 2
OLS estimates for birthweight equation

Birthweight             Coefficient    S.D.      t
Momed                   0.0152         0.0064    2.37
Income                  0.0021         0.0048    0.43
Income_square           0.0002         0.0001    1.44
Female_child            0.1162         0.0282    4.12
Smoke                   0.0112         0.0026    4.26
Mom_overweight          0.0207         0.0599    0.35
Intercept               3.4432         0.0899    38.30

Table 3
Probit estimates for survival equation

Survival                Coefficient    S.D.      t
Momed                   0.0156         0.0087    1.80
Income                  0.0119         0.0056    2.12
Income_square           0.0003         0.0001    2.11
Female_child            0.0709         0.0347    2.05
Smoke                   0.0058         0.0028    2.03
Mom_overweight          0.1919         0.0641    2.99
Intercept               1.8677         0.1174    15.91
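The comparison of standard errors for the survival equation can be made explicit by taking, coefficient by coefficient, the ratio of the joint-estimation standard error (panel 2 of Table 1) to the probit standard error (Table 3). The short Python sketch below does this with the values as reported in the tables; the dictionary and variable names are only illustrative.

```python
# Standard errors as reported in panel 2 of Table 1 (joint estimation).
fiml_se = {
    "Momed": 0.0081, "Income": 0.0050, "Income_square": 0.0001,
    "Female_child": 0.0340, "Smoke": 0.0020, "Mom_overweight": 0.0628,
    "Intercept": 0.1145,
}
# Standard errors as reported in Table 3 (probit on the survival equation alone).
probit_se = {
    "Momed": 0.0087, "Income": 0.0056, "Income_square": 0.0001,
    "Female_child": 0.0347, "Smoke": 0.0028, "Mom_overweight": 0.0641,
    "Intercept": 0.1174,
}

# Ratio below 1 means the joint estimate is more precise for that coefficient.
ratios = {name: fiml_se[name] / probit_se[name] for name in fiml_se}
for name, ratio in ratios.items():
    print(f"{name:15s} {ratio:.3f}")
```

Every ratio is at most 1. The ratio for Income_square equals 1 only because both standard errors are reported as 0.0001 at four decimal places, so the comparison for that coefficient is limited by rounding.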
References

Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge, Massachusetts.
Greene, W., 1989. Econometric Analysis, 2nd edition. Macmillan Book.
Rao, C.R., 1989. Linear Statistical Inference and Its Applications. Wiley Eastern, New York.