Economics Letters 85 (2004) 247 – 255 www.elsevier.com/locate/econbase

Seemingly unrelated regressions with identical regressors: a note

Debopam Bhattacharya *

Department of Economics, Dartmouth College, 327 Rockefeller Hall, Hanover, NH 03755, USA

Received 29 February 2004; accepted 20 April 2004

Abstract

I show that in a seemingly unrelated, multivariate normal model with identical regressors, where some equations are linear and others are in the limited dependent variable (LDV) form, joint MLE can lead to efficiency gains for parameters of the LDV equations but not for parameters of the linear equations.

© 2004 Elsevier B.V. All rights reserved.

Keywords: SUR; Bivariate normal; Likelihood

JEL classification: C0; C3

1. Introduction

It is well known that in a system of linear seemingly unrelated regression (SUR) equations with identical regressors, equation-by-equation OLS yields efficient estimates of the coefficient vectors (see Greene, 1989, pp. 488–489, for a textbook treatment). This note extends that result to a seemingly unrelated (latent) multivariate normal model with identical regressors, in which some equations are linear and others are in the limited dependent variable form (here, we consider only binary dependent variables). For example, the dependent variable in the first equation could indicate whether an infant survived up to 1 year after birth, the dependent variable in the second equation could be weight at birth, and the common regressors could be the mother's health and educational status and family income. Note that this differs from the usual selection-type models, in which one variable is observed only for individuals whose second variable has crossed a threshold. The model may well contain more than two equations; as an example, consider multichild households. The dependent variables of interest are the school enrollment status of the different siblings (one for the oldest, one for the second oldest, etc.), and another equation has family income as its dependent variable. Although family income is likely a major determinant of school enrollment, it is clearly an endogenous variable, and in the absence of a plausible instrument one may still estimate the reduced-form equations (whose dependent variables are the enrollment status of the different siblings plus income), with all exogenous variables included as regressors. Given that the unobserved components of the different equations are likely to be correlated, a reasonable specification for this set of reduced-form equations is SUR, and a question of interest is whether using all the equations in the system improves the efficiency of the estimates.

The main result of this note is that, in such a system with multivariate normal structural errors, equation-by-equation OLS still yields efficient estimates of the coefficient vectors of the linear equations. However, a (multivariate) probit for the limited dependent variable (LDV) equations alone, ignoring the linear equations in the system, will not necessarily yield estimates of the LDV equation parameters that are asymptotically equivalent to those obtained by maximizing the full-information joint likelihood (FIML). The result continues to hold if the regressors in the linear equations are linear combinations (in particular, a subset) of the regressors in the nonlinear equations. In Section 2, we show the results for one pair of equations, to convey the idea in a simplified setting, and state (without proof) the general case. In Section 3, we provide an empirical illustration.

* Tel.: +1-603-359-5994; fax: +1-603-643-2122. E-mail address: [email protected] (D. Bhattacharya).
0165-1765/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.econlet.2004.04.012

2. Theory

2.1. One pair of equations

For simplicity, we consider a two-equation normal SUR model in which the first equation is a probit and the second is a normal linear regression. We also simplify the notation by normalizing both variances to 1 (our main results do not depend on this normalization). Specifically, consider the model

$$y^*_{1i} = X_i\beta + \varepsilon_i, \qquad y_{2i} = X_i\gamma + v_i, \tag{1}$$

where $(\varepsilon_i, v_i)$ are i.i.d. across $i$, independent of $X_i$, and

$$\begin{pmatrix}\varepsilon_i\\ v_i\end{pmatrix} \sim N_2\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \rho\\ \rho & 1\end{pmatrix}\right),$$

with $\beta, \gamma \in \mathbb{R}^k$ and $\rho \in (-1, 1)$. Here $X_i = x_i'$, where the $x_i \in \mathbb{R}^k$ are i.i.d. with finite second moments. The observed version of the first equation in (1) is

$$y_{1i} = \begin{cases} 0 & \text{if } y^*_{1i} \le 0,\\ 1 & \text{if } y^*_{1i} > 0.\end{cases}$$
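A small simulation makes model (1) concrete. The sketch below is illustrative only: the function name `simulate_probit_linear`, the design (an intercept plus standard normal regressors shared by both equations) and the parameter values are hypothetical choices, not taken from the note. It draws $(\varepsilon_i, v_i)$ with unit variances and correlation $\rho$ and checks two implications of the model: the sample correlation of the errors is close to $\rho$, and, marginally, $P(y_{1i} = 1 \mid x_i) = \Phi(x_i'\beta)$.

```python
import numpy as np
from scipy.stats import norm


def simulate_probit_linear(n, beta, gamma, rho, seed=0):
    """Draw (X, y1, y2) from model (1) with unit variances and corr(eps, v) = rho.

    Hypothetical design: an intercept plus len(beta) - 1 standard normal
    regressors, shared by both equations (identical regressors).
    """
    rng = np.random.default_rng(seed)
    k = len(beta)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    cov = np.array([[1.0, rho], [rho, 1.0]])
    eps, v = rng.multivariate_normal(np.zeros(2), cov, size=n).T
    y1 = (X @ beta + eps > 0).astype(int)   # observed binary outcome, 1{y1* > 0}
    y2 = X @ gamma + v                       # observed continuous outcome
    return X, y1, y2, eps, v


beta = np.array([0.2, 0.7])     # hypothetical parameter values
gamma = np.array([1.0, -0.4])
X, y1, y2, eps, v = simulate_probit_linear(50_000, beta, gamma, rho=0.5)

# Sanity checks: the error correlation should be close to rho, and the
# share of y1 = 1 should be close to the average of Phi(x'beta).
corr_hat = np.corrcoef(eps, v)[0, 1]
print("sample corr(eps, v):", corr_hat)
print("mean y1:", y1.mean(), " mean Phi(X beta):", norm.cdf(X @ beta).mean())
```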

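The conditional probability inside the log-likelihood below follows from standard bivariate normal results. A short derivation under model (1) with unit variances, written as a sketch:

$$\varepsilon_i \mid v_i \sim N\!\left(\rho v_i,\ 1-\rho^2\right), \qquad v_i = y_{2i} - X_i\gamma,$$

$$P(y_{1i} = 1 \mid y_{2i}, X_i) = P(\varepsilon_i > -X_i\beta \mid v_i) = \Phi\!\left(\frac{X_i\beta + \rho\,(y_{2i} - X_i\gamma)}{\sqrt{1-\rho^2}}\right).$$

With $\theta = \rho/\sqrt{1-\rho^2}$ one has $1 + \theta^2 = 1/(1-\rho^2)$, so the argument of $\Phi$ equals $X_i\beta\sqrt{1+\theta^2} + \theta\,(y_{2i} - X_i\gamma)$. The sign attached to $\rho\,(y_{2i} - X_i\gamma)$ is the one implied by $\operatorname{corr}(\varepsilon_i, v_i) = \rho$.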

Denoting by $\phi$ and $\Phi$ the standard normal density and cdf, the log-likelihood for a simple random sample is

$$l = \sum_{i=1}^N \ln\phi(y_{2i} - X_i\gamma) + \sum_{i=1}^N y_{1i}\ln\Phi\left(\frac{X_i\beta}{\sqrt{1-\rho^2}} + \frac{\rho}{\sqrt{1-\rho^2}}(y_{2i} - X_i\gamma)\right) + \sum_{i=1}^N (1-y_{1i})\ln\left[1 - \Phi\left(\frac{X_i\beta}{\sqrt{1-\rho^2}} + \frac{\rho}{\sqrt{1-\rho^2}}(y_{2i} - X_i\gamma)\right)\right],$$

where we have first conditioned on $y_2$. Reparametrizing

$$\theta = \frac{\rho}{\sqrt{1-\rho^2}}$$

(so that $1/\sqrt{1-\rho^2} = \sqrt{1+\theta^2}$) and writing

$$e_{2i} = y_{2i} - X_i\gamma, \qquad z_i = \frac{X_i\beta}{\sqrt{1-\rho^2}} + \frac{\rho}{\sqrt{1-\rho^2}}(y_{2i} - X_i\gamma) = X_i\beta\sqrt{1+\theta^2} + \theta e_{2i},$$

the log-likelihood becomes

$$l = \sum_{i=1}^N \ln\phi(e_{2i}) + \sum_{i=1}^N y_{1i}\ln\Phi(z_i) + \sum_{i=1}^N (1-y_{1i})\ln\left(1 - \Phi(z_i)\right).$$

The first-order conditions are

$$0 = \frac{\partial l}{\partial\beta} = \sqrt{1+\theta^2}\sum_{i=1}^N x_i\phi(z_i)\frac{y_{1i} - \Phi(z_i)}{\Phi(z_i)(1-\Phi(z_i))}, \tag{2}$$

$$0 = \frac{\partial l}{\partial\gamma} = \sum_{i=1}^N x_i e_{2i} - \theta\sum_{i=1}^N x_i\phi(z_i)\frac{y_{1i} - \Phi(z_i)}{\Phi(z_i)(1-\Phi(z_i))} = \sum_{i=1}^N x_i e_{2i} - \frac{\theta}{\sqrt{1+\theta^2}}\frac{\partial l}{\partial\beta}, \tag{3}$$

and

$$0 = \frac{\partial l}{\partial\theta} = \frac{\theta}{1+\theta^2}\beta'\frac{\partial l}{\partial\beta} + \sum_{i=1}^N e_{2i}\phi(z_i)\frac{y_{1i} - \Phi(z_i)}{\Phi(z_i)(1-\Phi(z_i))}. \tag{4}$$
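The log-likelihood and the first-order conditions (2)–(4) can be checked numerically. The sketch below is a minimal illustration under stated assumptions, not the author's code: the simulated design and parameter values are hypothetical, and it writes $z_i = X_i\beta\sqrt{1+\theta^2} + \theta e_{2i}$, the sign implied by $\operatorname{corr}(\varepsilon_i, v_i) = \rho$. It compares the analytic scores with finite differences via `scipy.optimize.check_grad`, maximizes the likelihood with BFGS through `scipy.optimize.minimize`, and compares the FIML estimate of $\gamma$ with OLS, which the first-order conditions imply should coincide.

```python
import numpy as np
from scipy.optimize import check_grad, minimize
from scipy.stats import norm


def simulate(n, beta, gamma, rho, seed=0):
    """Hypothetical design: intercept plus standard normal regressors, shared by both equations."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, len(beta) - 1))])
    eps, v = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T
    return X, (X @ beta + eps > 0).astype(float), X @ gamma + v


def unpack(p, k):
    return p[:k], p[k:2 * k], p[2 * k]            # (beta, gamma, theta)


def parts(p, X, y2):
    beta, gamma, theta = unpack(p, X.shape[1])
    e2 = y2 - X @ gamma
    return beta, theta, e2, X @ beta * np.sqrt(1.0 + theta**2) + theta * e2


def loglik(p, X, y1, y2):
    _, _, e2, z = parts(p, X, y2)
    # ln phi(e2) + y1 ln Phi(z) + (1 - y1) ln(1 - Phi(z)), using ln(1 - Phi(z)) = ln Phi(-z)
    return np.sum(norm.logpdf(e2) + y1 * norm.logcdf(z) + (1 - y1) * norm.logcdf(-z))


def score(p, X, y1, y2):
    beta, theta, e2, z = parts(p, X, y2)
    root = np.sqrt(1.0 + theta**2)
    # Generalized residual phi(z)(y1 - Phi(z)) / (Phi(z)(1 - Phi(z))), evaluated for
    # y1 in {0, 1} as y1 phi/Phi(z) - (1 - y1) phi/Phi(-z) on the log scale.
    lp = norm.logpdf(z)
    r = y1 * np.exp(lp - norm.logcdf(z)) - (1 - y1) * np.exp(lp - norm.logcdf(-z))
    d_beta = root * (X.T @ r)                                 # condition (2)
    d_gamma = X.T @ e2 - theta / root * d_beta                # condition (3)
    d_theta = theta / root**2 * (beta @ d_beta) + r @ e2      # condition (4)
    return np.concatenate([d_beta, d_gamma, [d_theta]])


X, y1, y2 = simulate(2000, np.array([0.3, 0.8]), np.array([1.0, -0.5]), rho=0.6)
n, k = X.shape
gamma_ols = np.linalg.lstsq(X, y2, rcond=None)[0]
start = np.concatenate([np.zeros(k), gamma_ols, [0.0]])

# Norm of (analytic score - forward-difference gradient) at the starting point.
grad_err = check_grad(loglik, score, start, X, y1, y2)

# FIML: minimize the negative average log-likelihood with BFGS.
res = minimize(lambda p: -loglik(p, X, y1, y2) / n, start,
               jac=lambda p: -score(p, X, y1, y2) / n, method="BFGS")
beta_hat, gamma_hat, theta_hat = unpack(res.x, k)
print("check_grad error:", grad_err)
print("FIML gamma:", gamma_hat)
print("OLS gamma: ", gamma_ols)
print("implied rho:", theta_hat / np.sqrt(1.0 + theta_hat**2))
```

The generalized residual is computed through Mills ratios on the log scale, which avoids dividing by $\Phi(z)(1-\Phi(z))$ when $|z|$ is large; starting $\gamma$ at its OLS value is a convenience, not a requirement.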

Note from (2) that the second term in (3) is 0, so that (3) reduces to

$$\sum_{i=1}^N x_i e_{2i} = 0, \tag{5}$$

which implies the OLS solution for $\gamma$ (because the likelihood function is globally concave in the parameters,¹ there is a unique solution to the first-order conditions, and that solution must satisfy (5)). Note that this result continues to hold if $X_2 \subset \mathcal{L}(X_1)$, where $X_1$ are the regressors in the first equation of (1), $X_2$ are those in the second, and $\mathcal{L}(X_1)$ denotes the linear space spanned by the columns of $X_1$. We now derive the asymptotic covariance matrix of the MLEs and show that the asymptotic variance of $\hat\beta$ in FIML does not, in general, equal the asymptotic variance of the simple probit estimator of $\beta$ obtained by ignoring the linear equation. This shows that the FIML and simple probit estimates of $\beta$ are not only numerically different in finite samples but also asymptotically different.

¹ This follows from the fact that the likelihood for a bivariate normal distribution is concave in the parameters and our likelihood is simply the bivariate normal likelihood integrated over the range of one of the variables.

2.2. Asymptotic variances

The asymptotic covariance matrix for $\hat\alpha = (\hat\gamma, \hat\beta, \hat\theta)$ is given by

$$\Sigma = \left\{E\left(\frac{\partial l}{\partial\alpha}\frac{\partial l}{\partial\alpha'}\right)\Big|_{\alpha=\alpha_0}\right\}^{-1}.$$

In what follows, all expectations are computed at the true parameter values, $\partial l/\partial\alpha$ denotes the contribution of a single observation, and we drop the subscript $\alpha = \alpha_0$. Write

$$A_2 = E\left(\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\beta'}\right) = (1+\theta^2)\,E\left[\frac{\phi^2(z_i)\,x_ix_i'}{\Phi(z_i)(1-\Phi(z_i))}\right], \qquad c = E\left[\frac{\phi^2(z_i)\,e_{2i}x_i}{\Phi(z_i)(1-\Phi(z_i))}\right], \qquad d = E\left[\frac{\phi^2(z_i)\,e_{2i}^2}{\Phi(z_i)(1-\Phi(z_i))}\right].$$

Using $E(e_{2i}^2 \mid x_i) = 1$ and $E(y_{1i} - \Phi(z_i) \mid x_i, e_{2i}) = 0$ in the scores (2)–(4), the blocks of

$$E\left(\frac{\partial l}{\partial\alpha}\frac{\partial l}{\partial\alpha'}\right) = \begin{pmatrix} I_{\gamma\gamma} & I_{\gamma\beta} & I_{\gamma\theta}\\ I_{\beta\gamma} & A_2 & I_{\beta\theta}\\ I_{\theta\gamma} & I_{\theta\beta} & I_{\theta\theta}\end{pmatrix}$$

are

$$I_{\gamma\gamma} = E(x_ix_i') + \frac{\theta^2}{1+\theta^2}A_2, \qquad I_{\gamma\beta} = I_{\beta\gamma}' = -\frac{\theta}{\sqrt{1+\theta^2}}A_2,$$

$$I_{\gamma\theta} = I_{\theta\gamma}' = -\frac{\theta^2}{(1+\theta^2)^{3/2}}A_2\beta - \theta c, \qquad I_{\beta\theta} = I_{\theta\beta}' = \frac{\theta}{1+\theta^2}A_2\beta + \sqrt{1+\theta^2}\,c,$$

$$I_{\theta\theta} = \frac{\theta^2}{(1+\theta^2)^2}\beta'A_2\beta + \frac{2\theta}{\sqrt{1+\theta^2}}\beta'c + d.$$

Standard results on inverses of partitioned matrices (e.g., Rao, 1989, p. 33) imply that the second diagonal block of $\Sigma$, which equals the asymptotic variance of $\hat\beta$ when the system is estimated jointly, is the inverse of the matrix $V$ defined below. Let

$$\begin{pmatrix}\Pi_{11} & \Pi_{12}\\ \Pi_{21} & \Pi_{22}\end{pmatrix} = \begin{pmatrix} I_{\gamma\gamma} & I_{\gamma\theta}\\ I_{\theta\gamma} & I_{\theta\theta}\end{pmatrix}, \qquad \begin{pmatrix}\Pi^{11} & \Pi^{12}\\ \Pi^{21} & \Pi^{22}\end{pmatrix} = \begin{pmatrix}\Pi_{11} & \Pi_{12}\\ \Pi_{21} & \Pi_{22}\end{pmatrix}^{-1},$$

and $a = I_{\beta\theta}$. The asymptotic variance of $\hat\beta$ in the FIML estimation is then the inverse of

$$V = A_2 - \begin{pmatrix} I_{\beta\gamma} & a\end{pmatrix}\begin{pmatrix}\Pi^{11} & \Pi^{12}\\ \Pi^{21} & \Pi^{22}\end{pmatrix}\begin{pmatrix} I_{\gamma\beta}\\ a'\end{pmatrix} = A_2 - \frac{\theta^2}{1+\theta^2}A_2\Pi^{11}A_2 + \frac{\theta}{\sqrt{1+\theta^2}}\left(A_2\Pi^{12}a' + a\Pi^{21}A_2\right) - \Pi^{22}aa'.$$

In the special case where k = 1 i.e. x is a scalar, the asymptotic variance of the FIML estiamte of b is the inverse of     2 /2 ðzi Þe22 xi /ðzi Þe22 Eðx2i ÞE Uðzi Þð1Uðz  E ÞÞ Uðz Þð1Uðz ÞÞ i i i M D

where 

 /2 ðzi Þx2i ; Uðzi Þð1  Uðzi ÞÞ     2 /2 ðzi Þe22 xi /ðzi Þe22 Ex2i  E D¼E Uðzi Þð1  Uðzi ÞÞ Uðzi ð1  Uðzi ÞÞÞ M ¼E

  h2 /2 ðzi Þe22 b2 h2 þM E Ex2i 2 Uðzi Þð1  Uðzi ÞÞ 1þh ð1 þ h2 Þ2   bh2 xi /ðzi Þe22 2M E : Uðzi Þð1  Uðzi ÞÞ ð1 þ h2 Þ3=2 þM

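The partitioned-inverse step can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's computation: it uses $z_i = X_i\beta\sqrt{1+\theta^2} + \theta e_{2i}$ (the sign implied by $\operatorname{corr}(\varepsilon_i, v_i) = \rho$) and $\lambda(z) = \phi^2(z)/(\Phi(z)(1-\Phi(z)))$, integrates $y_{1i}$ out analytically, and replaces expectations over $(x_i, e_{2i})$ by simulation averages. It builds the expected outer product of the scores (2)–(4) for $(\gamma, \beta, \theta)$, computes $V$ as the Schur complement for the $\beta$ block, and compares it with the probit-only information $A_2^0 = E[x_ix_i'\lambda(x_i'\beta)]$. It also checks the scalar-regressor formula $V = (1+\theta^2)E(x^2)(MG - C^2)/D$, with $M = E[\lambda x^2]$, $C = E[\lambda e_2 x]$, $G = E[\lambda e_2^2]$ and $D = E(x^2)G + \theta^2(MG - C^2) + \theta^2\beta^2E(x^2)M/(1+\theta^2) + 2\theta\beta E(x^2)C/\sqrt{1+\theta^2}$, against the same Schur complement. The design, parameter values and function names are hypothetical.

```python
import numpy as np
from scipy.stats import norm


def lam_fn(z):
    """phi(z)^2 / (Phi(z) (1 - Phi(z))): conditional variance of a generalized residual."""
    return norm.pdf(z) ** 2 / (norm.cdf(z) * norm.sf(z))


def info_matrix(x, e2, beta, rho):
    """Per-observation E[(dl/da)(dl/da)'] for a = (gamma, beta, theta).

    y1 is integrated out exactly; expectations over (x, e2) are averages over the draws.
    Uses z = x'beta sqrt(1 + theta^2) + theta e2 (sign implied by corr(eps, v) = rho).
    """
    n = x.shape[0]
    theta = rho / np.sqrt(1.0 - rho**2)
    s = np.sqrt(1.0 + theta**2)
    lam = lam_fn(s * (x @ beta) + theta * e2)
    A2 = s**2 * (x.T * lam) @ x / n              # E[(dl/dbeta)(dl/dbeta)']
    c = x.T @ (lam * e2) / n                      # E[lam e2 x]
    d = np.mean(lam * e2**2)                      # E[lam e2^2]
    I_gg = x.T @ x / n + theta**2 / s**2 * A2
    I_bg = -theta / s * A2
    I_gt = -theta**2 / s**3 * (A2 @ beta) - theta * c
    I_bt = theta / s**2 * (A2 @ beta) + s * c
    I_tt = theta**2 / s**4 * (beta @ A2 @ beta) + 2.0 * theta / s * (c @ beta) + d
    return np.block([[I_gg, I_bg, I_gt[:, None]],
                     [I_bg, A2, I_bt[:, None]],
                     [I_gt[None, :], I_bt[None, :], np.array([[I_tt]])]])


def V_schur(info, k):
    """V = A2 - I_{b,(g,t)} [I_{(g,t),(g,t)}]^{-1} I_{(g,t),b}."""
    other = np.r_[0:k, 2 * k]                     # positions of (gamma, theta)
    B = info[k:2 * k][:, other]
    return info[k:2 * k, k:2 * k] - B @ np.linalg.solve(info[np.ix_(other, other)], B.T)


def A20(x, beta):
    """Probit-only information E[x x' lam(x'beta)]."""
    return (x.T * lam_fn(x @ beta)) @ x / x.shape[0]


def V_k1_closed_form(x1, e2, beta, rho):
    """Scalar-regressor formula V = (1 + theta^2) E(x^2) (M G - C^2) / D."""
    theta = rho / np.sqrt(1.0 - rho**2)
    s2 = 1.0 + theta**2
    lam = lam_fn(np.sqrt(s2) * (x1 * beta) + theta * e2)
    Ex2, M = np.mean(x1**2), np.mean(lam * x1**2)
    C, G = np.mean(lam * e2 * x1), np.mean(lam * e2**2)
    D = (Ex2 * G + theta**2 * (M * G - C**2) + theta**2 * beta**2 * Ex2 * M / s2
         + 2.0 * theta * beta * Ex2 * C / np.sqrt(s2))
    return s2 * Ex2 * (M * G - C**2) / D


rng = np.random.default_rng(1)
n = 200_000
x = np.column_stack([np.ones(n), rng.normal(size=n)])     # hypothetical design
e2 = rng.normal(size=n)
beta = np.array([0.3, 0.8])                                # hypothetical values

info = info_matrix(x, e2, beta, rho=0.6)
V = V_schur(info, 2)
print("V (FIML):\n", V)
print("A_2^0 (probit only):\n", A20(x, beta))

x1 = rng.normal(loc=1.0, size=n)                           # hypothetical scalar regressor
v_closed = V_k1_closed_form(x1, e2, 0.5, 0.6)
v_schur = V_schur(info_matrix(x1[:, None], e2, np.array([0.5]), 0.6), 1)[0, 0]
print("k = 1:", v_closed, v_schur)
```

Integrating $y_{1i}$ out analytically, rather than averaging squared simulated scores, removes one source of simulation noise; with $\rho = 0$ the computed $V$ coincides with $A_2^0$ up to simulation error, in line with the special case $\theta = 0$.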

The asymptotic variance of the probit estimator that ignores the linear equation is the inverse of

$$A_2^0 = E\left(\frac{\partial l}{\partial\beta}\frac{\partial l}{\partial\beta'}\right)\Big|_{\beta=\beta_0,\,\theta=0} = E\left[\frac{x_ix_i'\{\phi(x_i'\beta_0)\}^2}{\Phi(x_i'\beta_0)\{1-\Phi(x_i'\beta_0)\}}\right].$$

While these expressions do not permit a direct comparison of the matrices $V$ and $A_2^0$, using the expressions for $\Pi^{12}$, $\Pi^{21}$ and $\Pi^{22}$ one may verify that in general $V \neq A_2^0$ (no term in the expression for $V$ involves the quantity $E[x_ix_i'\{\phi(x_i'\beta_0)\}^2/(\Phi(x_i'\beta_0)\{1-\Phi(x_i'\beta_0)\})]$), although the definiteness of $A_2^0 - V$ is ambiguous from these expressions.³ What we have shown here is that the asymptotic variance of the FIML estimator of $\beta$ differs from that of the marginal likelihood maximizer. Also note that in the special case $\rho = 0$, and therefore $\theta = 0$, the expression for $V$ implies that $V = A_2^0$: then $z_i = X_i\beta_0$ and all cross-information terms between $\beta$ and $(\gamma, \theta)$ vanish.

2.3. The general case

We state the general result without proof. The proof follows exactly the same steps as the one for a pair of equations but is notationally far more unwieldy. Consider a setup with $k$ linear and $m$ nonlinear equations with identical regressors (which are assumed to possess finite second moments) and correlated structural errors that follow a joint normal distribution. Specifically,

$$Y_j = X\gamma_j + u_j, \qquad j = 1,\dots,k,$$

$$Y_j^* = X\beta_{j-k} + v_{j-k}, \qquad Y_j = \mathbf{1}(Y_j^* > 0), \qquad j = k+1,\dots,k+m,$$

where the errors $u$ ($k\times 1$) and $v$ ($m\times 1$) are independent of $X$ and

$$\begin{pmatrix}u\\ v\end{pmatrix} \sim N_{k+m}(0, V), \qquad V = \begin{pmatrix}\Sigma_{uu} & \Sigma_{vu}\\ \Sigma_{uv} & \Sigma_{vv}\end{pmatrix},$$

with $V$ conformably partitioned.² One observes realizations of $(Y_j)_{j=1,\dots,k+m}$ and $X$ in the data. First, note that

$$(v \mid u) \sim N_m\left(\Sigma_{uv}\Sigma_{uu}^{-1}u,\ \Sigma_{vv} - \Sigma_{uv}\Sigma_{uu}^{-1}\Sigma_{vu}\right).$$

Let

$$Y_k = (y_{11}, y_{12}, \dots, y_{1n}, y_{21}, \dots, y_{2n}, \dots, y_{k1}, \dots, y_{kn})', \qquad \gamma = (\gamma_1', \dots, \gamma_k')', \qquad \beta = (\beta_1', \dots, \beta_m')'.$$

² For scale identification of $\beta$, the diagonal elements of $\Sigma_{vv}$ must be normalized to 1, i.e., $\operatorname{diag}(\Sigma_{vv}) = I_m$.

³ An analogy is the tobit model, where one can estimate the slopes using the probit part of the likelihood alone, and those estimates are less efficient than the ones that maximize the full tobit likelihood (cf. Amemiya, 1985, p. 366). The situation differs here in that FIML also estimates the additional parameters $\rho$ and $\gamma$, which affect the asymptotic variance of $\hat\beta$ and make the comparison ambiguous.


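Then, ignoring constants that do not depend on $(\beta, \gamma)$, the likelihood for an i.i.d. sample is

$$L(y_1, \dots, y_k, y_{k+1}, \dots, y_{k+m} \mid X; \theta) = L(y_1, \dots, y_k \mid X; \theta) \times L(y_{k+1}, \dots, y_{k+m} \mid y_1, \dots, y_k, X; \theta)$$

$$= \exp\left\{-\tfrac{1}{2}\left[Y_k - (I_k \otimes X)\gamma\right]'\left(\Sigma_{uu}^{-1} \otimes I_n\right)\left[Y_k - (I_k \otimes X)\gamma\right]\right\} \times L_2(\beta, \gamma).$$

Therefore the log-likelihood is

$$l = -\tfrac{1}{2}\left(Y_k - (I_k \otimes X)\gamma\right)'\left(\Sigma_{uu}^{-1} \otimes I_n\right)\left(Y_k - (I_k \otimes X)\gamma\right) + l_2(\beta, \gamma),$$

and the first-order conditions are

$$0 = \frac{\partial l}{\partial\beta} = \frac{\partial l_2}{\partial\beta}, \qquad 0 = \frac{\partial l}{\partial\gamma} = (I_k \otimes X)'\left(\Sigma_{uu}^{-1} \otimes I_n\right)\left(Y_k - (I_k \otimes X)\gamma\right) + \frac{\partial l_2}{\partial\gamma}. \tag{6}$$

One may verify that here $\partial l_2/\partial\beta = 0$ implies $\partial l_2/\partial\gamma = 0$, so that (6) reduces to $(\Sigma_{uu}^{-1} \otimes X')(Y_k - (I_k \otimes X)\gamma) = 0$. Since $\Sigma_{uu}^{-1}$ is nonsingular, this is equivalent to $(I_k \otimes X')(Y_k - (I_k \otimes X)\gamma) = 0$, the normal equations of $k$ separate OLS regressions. Hence maximizing the full likelihood yields the same estimates of $\gamma$ as $k$ separate OLS estimations of the first $k$ equations. In addition, the asymptotic variance of the FIML estimates of $\beta$ differs from that of the estimates that ignore the linear equations.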
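The step from (6) to OLS uses only the linear part of the likelihood: once $\partial l_2/\partial\gamma = 0$, condition (6) is the GLS normal equation for the stacked linear equations, and because $(I_k \otimes X)'(\Sigma_{uu}^{-1} \otimes I_n) = \Sigma_{uu}^{-1} \otimes X'$, its solution is equation-by-equation OLS whatever $\Sigma_{uu}$ is. The sketch below checks this with simulated data; the sizes, covariance matrix and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K, k = 200, 3, 2                                  # observations, regressors, linear equations
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
S_uu = np.array([[1.0, 0.6], [0.6, 2.0]])            # hypothetical Sigma_uu
U = rng.multivariate_normal(np.zeros(k), S_uu, size=n)
Gamma = rng.normal(size=(K, k))                      # column j holds gamma_j
Y = X @ Gamma + U                                    # n x k outcomes

# Stack as in the text: Y_k = (y_11, ..., y_1n, y_21, ..., y_kn)'.
Yk = Y.T.reshape(-1)
Z = np.kron(np.eye(k), X)                            # I_k kron X
W = np.kron(np.linalg.inv(S_uu), np.eye(n))          # Sigma_uu^{-1} kron I_n

# Solve the linear part of (6): Z' W (Y_k - Z gamma) = 0.
gamma_gls = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ Yk)

# Equation-by-equation OLS, stacked as (gamma_1', ..., gamma_k')'.
gamma_ols = np.linalg.lstsq(X, Y, rcond=None)[0].T.reshape(-1)
print("max |GLS - OLS|:", np.max(np.abs(gamma_gls - gamma_ols)))
```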

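3. Empirical illustration

In this section, we illustrate our results using a dataset of infant births. The data come from the National Maternal and Infant Health Survey (NMIHS), conducted in the US in 1988. We estimate the following SUR model:

$$BO_i^* = X_i\beta + \varepsilon_i, \qquad BO_i = \mathbf{1}(BO_i^* > 0), \qquad BW_i = X_i\gamma + v_i,$$

with $(\varepsilon_i, v_i)$ independent of $X_i$ and

$$\begin{pmatrix}\varepsilon_i\\ v_i\end{pmatrix} \sim N_2\left(\begin{pmatrix}0\\ 0\end{pmatrix}, \begin{pmatrix}1 & \rho\sigma\\ \rho\sigma & \sigma^2\end{pmatrix}\right).$$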


Table 1
Results from joint estimation

Panel 1 (birthweight)
Variable          Coefficient    S.E.       t
Momed                  0.0152  0.0064    2.37
Income                 0.0021  0.0048    0.43
Income_square         -0.0002  0.0001   -1.44
Female_child          -0.1162  0.0282   -4.12
Smoke                 -0.0112  0.0026   -4.26
Mom_overweight         0.0207  0.0599    0.35
Intercept              3.4432  0.0899   38.30

Panel 2 (survival)
Variable          Coefficient    S.E.       t
Momed                  0.0173  0.0081    2.08
Income                 0.0124  0.0050    2.39
Income_square         -0.0002  0.0001   -2.20
Female_child           0.0474  0.0340    1.39
Smoke                 -0.0064  0.0020   -2.13
Mom_overweight        -0.1619  0.0628   -2.56
Intercept              2.5124  0.1145   21.95
ρ                      0.4126  0.0113   36.37
σ                      0.5827  0.0182   32.01

Panel 1 shows the results for the birthweight equation and panel 2 the results for infant survival up to 1 year after birth, when estimation is done jointly by maximizing the full likelihood. ρ and σ are as defined in Section 3 of the text.

Here, $i$ indexes the birth, BO denotes whether the infant survived up to 1 year after birth (BO* is the latent health stock), BW denotes birthweight in kilograms, and the common regressors are the mother's years of education, household income (in thousands of dollars) and its square, the average number of cigarettes smoked by the mother during pregnancy, a dummy for whether the child was female, and a dummy for whether the mother was overweight [body mass index (BMI) greater than 30]. The number of observations is 2529. The estimations were performed in the statistical package Stata; the full-information maximum likelihood estimates were obtained with Stata's maximum likelihood routine, with the likelihood function entered manually. Table 1 reports the results of the joint estimation, Table 2 the OLS coefficients for the second (birthweight) equation, and Table 3 the coefficients from a probit of the first (survival) equation. From Table 2 and panel 1 of Table 1, one can see that for the second (i.e., birthweight) equation, we get identical coefficients and standard errors. From Table 3 and panel 2 of Table 1, we get nonidentical but close estimates of the coefficients in the survival equation, but the standard errors are smaller when the system is jointly estimated.

Table 2
OLS estimates for birthweight equation

Variable          Coefficient    S.E.       t
Momed                  0.0152  0.0064    2.37
Income                 0.0021  0.0048    0.43
Income_square         -0.0002  0.0001   -1.44
Female_child          -0.1162  0.0282   -4.12
Smoke                 -0.0112  0.0026   -4.26
Mom_overweight         0.0207  0.0599    0.35
Intercept              3.4432  0.0899   38.30

Table 3
Probit estimates for survival equation

Variable          Coefficient    S.E.       t
Momed                  0.0156  0.0087    1.80
Income                 0.0119  0.0056    2.12
Income_square         -0.0003  0.0001   -2.11
Female_child           0.0709  0.0347    2.05
Smoke                 -0.0058  0.0028   -2.03
Mom_overweight        -0.1919  0.0641   -2.99
Intercept              1.8677  0.1174   15.91

References

Amemiya, T., 1985. Advanced Econometrics. Harvard University Press, Cambridge, Massachusetts.

Greene, W., 1989. Econometric Analysis, 2nd edition. Macmillan.

Rao, C.R., 1989. Linear Statistical Inference and Its Applications. Wiley Eastern, New York.
