Discussion Paper: 2011/02

Identification and inference in a simultaneous equation under alternative information sets and sampling schemes

Jan F. Kiviet

www.feb.uva.nl/ke/UvA-Econometrics

Amsterdam School of Economics Department of Quantitative Economics Valckenierstraat 65-67 1018 XE AMSTERDAM The Netherlands

Identification and inference in a simultaneous equation under alternative information sets and sampling schemes

Jan F. Kiviet
Tinbergen Institute, University of Amsterdam

1 April 2011

JEL classification: C12, C13, C15, C26, J31
Keywords: partial identification, weak instruments, (un)restrained repeated sampling, (un)conditional (limiting) distributions, credible robust inference

Abstract

In simple static linear simultaneous equation models the empirical distributions of IV and OLS are examined under alternative sampling schemes and compared with their first-order asymptotic approximations. It is demonstrated why in this context the limiting distribution of a consistent estimator is not affected by conditioning on exogenous regressors, whereas that of an inconsistent estimator is. The asymptotic variance and the simulated actual variance of the inconsistent OLS estimator are shown to diminish by extending the set of exogenous variables kept fixed in sampling, whereas such an extension disrupts the distribution of consistent IV estimation and deteriorates the accuracy of its standard asymptotic approximation, not only when instruments are weak. Against this background the consequences for the identification of the parameters of interest are examined for a setting in which (in practice often incredible) assumptions regarding the zero correlation between instruments and disturbances are replaced by (generally more credible) interval assumptions on the correlation between endogenous regressors and disturbances. This leads to a feasible procedure for constructing purely OLS-based robust confidence intervals, which yield conservative coverage probabilities in finite samples, and often outperform IV-based intervals regarding their length.

1. Introduction

We approach the fundamental problems of simultaneity and identification in econometric linear structural relationships in a non-traditional way. First, we illustrate the effects of the chosen sampling regime on the limiting and empirical finite sample distributions of both OLS and IV.

Department of Quantitative Economics, Amsterdam School of Economics, University of Amsterdam, Valckenierstraat 65-67, 1018 XE Amsterdam, The Netherlands; phone +31.20.5254224; email [email protected]. For the many constructive remarks regarding earlier stages of this research I want to thank seminar participants at the universities of Amsterdam, Montréal, Yale (New Haven), Brown (Providence), Harvard (Cambridge), Carleton (Ottawa), New York, Penn State (University Park), Rutgers (New Brunswick) and Columbia (New York), and those giving feedback at presentations at ESWC (Shanghai, 19 August 2010), EC2 (Toulouse, 19 December 2010) and CESifo (Munich, February 2011).

We distinguish sampling schemes in which exogenous variables are either random or kept fixed. We find that, apart from its shifted location due to its inconsistency, the OLS distribution is rather well-behaved, though mildly dependent on the chosen sampling regime, both in small and in large samples. On the other hand, and not predicted by standard asymptotic theory, the sampling regime is found to have a much more profound effect on the actual IV distribution. Especially for fixed and not very strong instruments, IV is in fact often much less informative on the true parameter value than OLS is. Next, following ideas from the recent partial identification literature¹, we dare to replace the standard point-identifying but in practice often incredible assumptions on the zero correlation between (possibly weak) instruments and the disturbance term, supplemented with the exclusion restriction of the instruments as regressors from the structural equation, by an alternative identifying information set. This abstains from instruments; instead, it restricts the range of possible values of the correlation between endogenous regressor and disturbance. This range can be chosen quite wide, so that the resulting inference gains credibility. In this context, partial identification refers to the opportunity to achieve full identification of all parameters of interest by adopting particular assumptions. When alternative sets of assumptions are available for this purpose, one should exploit those which seem more credible, but of course the resulting inferences should also be judged on the basis of their robustness, their efficiency and their accuracy. Achieving identification through replacing orthogonality conditions by simultaneity assumptions yields confidence intervals which are purely based on standard OLS statistics.
These need only be transformed by a very simple linear formula, which involves the endpoints of the assumed interval for the simultaneity correlation coefficient, the sample size, and a standard Gaussian or Student critical value. The simplicity of the procedure is partly based on the following truly remarkable result regarding the asymptotic variance of OLS under simultaneity. This asymptotic variance is smaller (in a matrix sense) than the standard expression which holds under classic assumptions. Under partial identification the OLS estimator is corrected for its inconsistency. This correction term is based on standard OLS results and on the assumed simultaneity coefficient. Because this correction is random, it yields an increment to the asymptotic variance of inconsistent OLS. Under normality, and adopting the sampling scheme which does not involve conditioning on the exogenous variables, this increment is found to yield again an asymptotic variance of the partially identified modified consistent OLS estimator which is identical to the classic well-known simple expression. The structure of this paper is as follows. In Section 2 we sketch the settings that we will examine regarding the DGP and the available and exploited information, and we review the relevant available literature. For the sake of simplicity we just focus on a simultaneous equation involving one endogenous explanatory variable and one external instrument. We distinguish settings in which the researcher uses either OLS or IV-based techniques. Standard OLS may be chosen either because the researcher simply disregards simultaneity, or perhaps because the instrument is not available, or is available but its exogeneity is not trusted. We compare this with the situation where IV is used, just focussing on the ideal situation where the instrumental variable is indeed valid, in the sense that its corresponding orthogonality condition holds, although it may in fact be a weak instrument.
In addition to that, yielding a two-way clustering of settings, we also distinguish different sampling schemes in which particular exogenous variables are either kept fixed in repeated sampling or are genuinely random. In Section 3 a generic approach is developed to derive the limiting distribution of an inconsistent estimator while conditioning

¹See Manski (2003, 2007) and for a recent review see Tamer (2010). Phillips (1989) and others have used ‘partial identification’ to indicate a fundamentally different situation.


on a subset of the exogenous variables. This enables us to unify the derivation of some earlier (then separate) results on the limiting distributions of inconsistent OLS and consistent IV under the various sampling schemes. A systematic comparison is made of the consequences of conditioning with respect to the asymptotic variances and asymptotic root mean squared errors, both analytically and numerically, by providing graphs covering most cases of interest over the whole parameter space for the chosen simple DGP. In Section 4 the actual distributions of OLS and IV for the various settings are simulated in finite samples. From these it emerges that for many relevant cases the limiting distribution of IV does not provide a very accurate approximation to its actual distribution, especially not when the sampling scheme is restrained, even when the instrument is not a very weak one. Next, in Section 5 it is demonstrated that the OLS estimator becomes partially identified when one is willing to make an assumption restraining the set of values of the correlation coefficient between endogenous regressor and structural disturbance. In such a framework bias correction of the inconsistent OLS estimator becomes feasible, but does affect its distribution, also asymptotically. After deriving this limiting distribution, which also requires an assumption on the third and fourth moments of the disturbance term, partially identified OLS-based confidence intervals for the parameter of interest become feasible too. From simulations we find that the accuracy of this instrument-free inference is remarkable. Its robustness and credibility can be enhanced by using as a conservative confidence interval the union of the intervals obtained for a range of values for the simultaneity correlation coefficient. Next it is demonstrated that the length of the resulting intervals often compares very favorably with what can be achieved by IV-based methods.
Finally we present an empirical illustration, and Section 6 concludes.

2. The data generating processes and settings to be considered

For the sake of simplicity we will focus on single regressor models only, and leave multivariate generalizations for future work. The DGP involves just two jointly dependent data series, which embody salient features met in practice when analyzing their relationship, especially when this has to be based on a cross-section sample. Instead of specifying these relationships top-down, as is usually done in econometrics, we specify them here bottom-up, following precisely their generating schemes as used in the simulation experiments that will be exploited later. Next we discuss the settings under which this DGP might be analyzed in practice, and make references to some related studies.

2.1. The simple DGP

The basic building block of the DGP consists of the three mutually independent series (for $i = 1, \ldots, n$) of IID zero mean random variables

$$\varepsilon_i \sim \text{IID}(0, \sigma_\varepsilon^2), \quad \tilde v_i \sim \text{IID}(0, \sigma_{\tilde v}^2) \quad\text{and}\quad z_i \sim \text{IID}(0, \sigma_z^2). \tag{2.1}$$

To avoid particular pathological cases we assume $\sigma_\varepsilon^2 > 0$ and $\sigma_z^2 > 0$, but allow $\sigma_{\tilde v}^2 \ge 0$. The $\varepsilon_i$ will establish the disturbance term in the single simultaneous equation of interest with just one regressor $x_i$. The reduced form equation for this regressor $x_i$ has reduced form disturbance $v_i$, which consists of two independent components, since

$$v_i = \tilde v_i + \gamma\varepsilon_i. \tag{2.2}$$

Variable $z_i$, the third of the independent IID series given in (2.1), is to be used as instrument. The reduced form equation for $x_i$ is

$$x_i = \pi z_i + v_i, \tag{2.3}$$

and the simultaneous equation under study is

$$y_i = \beta x_i + \varepsilon_i. \tag{2.4}$$

So, in total we have 6 parameters, namely $\sigma_\varepsilon^2$, $\sigma_{\tilde v}^2$, $\sigma_z^2$, $\pi$, $\gamma$ and $\beta$, where the latter is the parameter of primary interest. However, some of these parameters can be fixed without loss of generality. We may take $\beta = 0$, hence acting as if we had first subtracted $\beta x_i$ from both sides of (2.4). Also, we may take $\sigma_z = 1$ and next interpret $\pi$ as the original $\pi\sigma_z$. Moreover, scaling equation (2.4) by taking $\sigma_\varepsilon = 1$ has no principal consequences either. So, we find that this DGP has in fact only three free parameters, namely $\pi$, $\gamma$ and $\sigma_{\tilde v}^2$. Nevertheless, it represents the essentials of a general linear just identified (provided $\pi \ne 0$) simultaneous equation containing two endogenous variables, from which any further exogenous regressors, including the constant, have been partialled out.

We shall consider a simple nonlinear reparametrization, because this yields three different basic parameters which are easier to interpret. Not yet imposing the normalizations ($\sigma_z = 1 = \sigma_\varepsilon$) we find

$$\text{Cov}(x_i, \varepsilon_i) = \rho_{x\varepsilon}\sigma_x\sigma_\varepsilon = \gamma\sigma_\varepsilon^2 \quad\text{and}\quad \text{Cov}(x_i, z_i) = \rho_{xz}\sigma_x\sigma_z = \pi\sigma_z^2. \tag{2.5}$$

The correlations $\rho_{x\varepsilon}$ and $\rho_{xz}$ parametrize simultaneity and instrument strength respectively. Because the instrument obeys the moment restrictions $E(z_i\varepsilon_i) = 0$ (thus $\rho_{z\varepsilon} = 0$) and $E(z_i\tilde v_i) = 0$, we also find

$$\text{Var}(x_i) = \sigma_x^2 = \pi^2\sigma_z^2 + \sigma_v^2, \tag{2.6}$$

with

$$\sigma_v^2 \equiv \text{Var}(v_i) = \sigma_{\tilde v}^2 + \gamma^2\sigma_\varepsilon^2, \tag{2.7}$$

thus

$$\sigma_{\tilde v}^2 = \sigma_x^2(1 - \rho_{x\varepsilon}^2 - \rho_{xz}^2), \tag{2.8}$$

and from $\sigma_{\tilde v}^2 \ge 0$ it follows that

$$\rho_{x\varepsilon}^2 + \rho_{xz}^2 \le 1. \tag{2.9}$$

So, only combinations of $\rho_{x\varepsilon}$ and $\rho_{xz}$ values in or on a circle with radius unity are admissible. Such a restriction is plausible because, given that $z_i$ is uncorrelated with $\varepsilon_i$, regressor $x_i$ cannot be strongly correlated with both $\varepsilon_i$ and $z_i$ at the same time.

The above derivations yield a more insightful three-dimensional parametrization for this DGP. This is based on simply choosing values for $\sigma_x > 0$, $\rho_{x\varepsilon}$ and $\rho_{xz}$, respecting (2.9), and taking $\sigma_\varepsilon = \sigma_z = 1$ and $\beta = 0$. After choosing the three values $\sigma_x$, $\rho_{x\varepsilon}$ and $\rho_{xz}$, the data can be generated according to the equations (2.1) through (2.4), where

$$\pi = \rho_{xz}\sigma_x \quad\text{and}\quad \gamma = \rho_{x\varepsilon}\sigma_x, \tag{2.10}$$

with $\sigma_{\tilde v}^2$ as in (2.8). Another basic model characteristic often mentioned in this context is the population concentration parameter ($PCP$), given by

$$PCP \equiv \frac{n\,\text{Var}(\pi z_i)}{\text{Var}(v_i)} = n\,\frac{\pi^2\sigma_z^2}{\sigma_{\tilde v}^2 + \gamma^2\sigma_\varepsilon^2} = n\,\frac{\rho_{xz}^2}{1 - \rho_{xz}^2}. \tag{2.11}$$

We control this simply by setting values for $\rho_{xz}$ and $n$. These two, together with $\rho_{x\varepsilon}$ and $\sigma_x$, and the further distributional characteristics (third and higher moments) of the IID variables $\varepsilon_i$, $\tilde v_i$ and $z_i$, will determine the actual properties of any inference technique for this model.
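This generating scheme is straightforward to simulate. The sketch below (our code, not the paper's, assuming Gaussian draws) maps the three basic parameters $(\sigma_x, \rho_{x\varepsilon}, \rho_{xz})$ into $(\pi, \gamma, \sigma_{\tilde v})$ via (2.10) and (2.8), under the normalizations $\beta = 0$ and $\sigma_\varepsilon = \sigma_z = 1$:

```python
import numpy as np

def simulate_dgp(n, sigma_x, rho_xe, rho_xz, rng):
    """Draw one sample (y, x, z) from the three-parameter DGP of Section 2.1,
    under the normalizations beta = 0 and sigma_eps = sigma_z = 1."""
    assert rho_xe**2 + rho_xz**2 <= 1, "admissibility restriction (2.9)"
    pi = rho_xz * sigma_x                                # (2.10)
    gamma = rho_xe * sigma_x                             # (2.10)
    s_vt = sigma_x * np.sqrt(1 - rho_xe**2 - rho_xz**2)  # sigma_vtilde, from (2.8)
    eps = rng.standard_normal(n)                         # the three IID series (2.1)
    vt = s_vt * rng.standard_normal(n)
    z = rng.standard_normal(n)
    x = pi * z + vt + gamma * eps    # (2.3), with v_i composed as in (2.2)
    y = eps                          # (2.4) with beta = 0
    return y, x, z

rng = np.random.default_rng(0)
y, x, z = simulate_dgp(100_000, sigma_x=2.0, rho_xe=0.3, rho_xz=0.4, rng=rng)
print(x.std(), np.corrcoef(x, z)[0, 1])  # close to sigma_x = 2 and rho_xz = 0.4
```

In large samples the sample moments should reproduce the chosen parametrization, including $\text{corr}(x_i, \varepsilon_i) \approx \rho_{x\varepsilon}$.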

2.2. Possible extensions of the simple DGP

The DGP with just three fundamental parameters can of course easily be generalized in various directions. Not only could further exogenous variables be introduced explicitly, either as regressors included in the simultaneous equation or, when excluded, as exploitable overidentification restrictions. It would also be interesting to generate further endogenous regressors and either include or exclude these. In the latter case, one could also examine the effects of inappropriately using these as instruments. However, here we will just stick to the three-parameter DGP, simply because there is still plenty to be discovered about it, as we shall demonstrate. Nevertheless, we will indicate here one particular type of possible extension, just because it supports the interpretation of the component $\tilde v_i$ of the reduced form disturbance $v_i$. When allowing for overidentification, this requires replacing $z_i \sim \text{IID}(0,1)$ by the $L > 1$ series $z_i^{(l)} \sim \text{IID}(0,1)$, $l = 1, \ldots, L$. Then the reduced form equation generalizes to

$$x_i = \sum_{l=1}^L \pi_l z_i^{(l)} + v_i, \tag{2.12}$$

where again $v_i = \tilde v_i + \gamma\varepsilon_i$. Without loss of generality we may assume that the $L$ instruments (possibly after a transformation) are all mutually independent. Then it follows that

$$\sigma_x^2 = \sum_{l=1}^L \pi_l^2 + \gamma^2 + \sigma_{\tilde v}^2. \tag{2.13}$$

With $E(x_i z_i^{(l)}) = \rho_{xz^{(l)}}\sigma_x = \pi_l$ and (2.10) one finds $\sigma_{\tilde v}^2 = \sigma_x^2\big(1 - \rho_{x\varepsilon}^2 - \sum_{l=1}^L \rho_{xz^{(l)}}^2\big)$. This again implies admissibility restrictions, which here involve an $(L+1)$-dimensional unit sphere, namely

$$\rho_{x\varepsilon}^2 + \rho_{xz^{(1)}}^2 + \cdots + \rho_{xz^{(L)}}^2 \le 1. \tag{2.14}$$

Now imagine that a researcher has available (or is aware of) only $L^\# < L$ of the instrumental variables, namely $z_i^{(l)}$ for $l = 1, \ldots, L^\#$. Then $x_i = \pi_1 z_i^{(1)} + \cdots + \pi_{L^\#} z_i^{(L^\#)} + v_i^\# + \tilde v_i + \gamma\varepsilon_i$, with $v_i^\# = \pi_{L^\#+1} z_i^{(L^\#+1)} + \cdots + \pi_L z_i^{(L)}$. Now $v_i^\# + \tilde v_i$ plays the role of the earlier $\tilde v_i$. If a sampling scheme makes sense which keeps all the instruments fixed, then one in which $v_i^\#$ is kept fixed makes sense too. Because $v_i^\# + \tilde v_i$ represents all the contributions to $x_i$ which are exogenous with respect to $\varepsilon_i$ but are not explicitly taken into account by the available instruments, one may choose to treat them either as random, or condition on some or on all of them and treat these as fixed. For our simple model where $L = 1$ this implies that it makes sense to define the latent variable $\bar x_i \equiv \pi z_i + \tilde v_i$. Regarding the reduced form equation

$$x_i = \pi z_i + \tilde v_i + \gamma\varepsilon_i = \bar x_i + \gamma\varepsilon_i, \tag{2.15}$$

we will distinguish for the three random variables $z_i$, $\tilde v_i$ and $\varepsilon_i$ of (2.1) the three nested sampling schemes: (a) no conditioning on any of them; (b) conditioning on $z_i$; and (c) conditioning on $\bar x_i$. Note that $\bar x_i$ represents all components of $x_i$ which are unrelated to $\varepsilon_i$.

2.3. Various alternative practically relevant settings

Despite its simplicity, the data generation scheme above for the simple relationship (2.4) is to be used now to represent the essentials of particular practically relevant modelling situations, in which the ultimate goal is to produce interpretable (that is, identified), accurate (that is, supplemented by rather precise probability statements) and credible (thus based on nonsuspect assumptions) inferences regarding $\beta$. This has to be obtained from a finite sample and will of course be based on estimators and test statistics and the asymptotic approximations to their distributions. To that end we will examine for these different settings issues concerning the available and exploited information and its consequences for the identification of $\beta$. The settings are characterized by whether particular variables are supposed to be available (example: $z_i$ has been observed or not) and by conditioning, either on particular adopted assumptions, usually expressed in the form of parametric restrictions (such as $\rho_{z\varepsilon} = 0$ or $\rho_{x\varepsilon} = 0$), or on the realized values of random exogenous variables (such as $z_i$ or $\bar x_i$). To avoid confusion about all this, in what follows we shall never use the word conditioning with respect to particular model assumptions or parametric restrictions, but exclusively for the notion of analyzing issues of interest according to a sampling scheme involving sticking to given realizations of particular random exogenous variables. Such conditioning will involve restraining the simulation design in a particular way. Whether such conditioning is relevant regarding the interpretation of inference depends on the particular practical situation one has in mind. Here an important consideration is whether the IID sample of size $n$ has been obtained from a much larger population and is to be used for inference on this population, as is often the case for survey data. Or, whether the IID property is especially a feature of just the $n$ observations of the series $\varepsilon_i$. Then the interest in inference conditional on the actual realization of the exogenous variables may be motivated by keeping as closely as possible to the underlying sample, possibly because there is no clear-cut population, as is often the case in time series, or when the sample data are the result of a census, in which case population and sample may coincide.
If the analysis pertains just to the situation in which particular assumptions regarding limitations of the parameter values hold, then we will not use words like "conditional under the assumption that ...", but exclusively words like "under the restriction that ...". Restrictions that we will consider are, for example, the point-restriction $\rho_{x\varepsilon} = 0$ or another admissible value, but also interval-restrictions like $\rho_{x\varepsilon} \in R_{x\varepsilon} \equiv [\rho_{x\varepsilon}^L, \rho_{x\varepsilon}^U]$, where, for instance, $\rho_{x\varepsilon}^L = 0.1$ and $\rho_{x\varepsilon}^U = 0.6$. Of course, this specializes to a point-restriction when $\rho_{x\varepsilon}^L = \rho_{x\varepsilon}^U$. We do assume throughout that the variables $y_i$ and $x_i$ have been observed and are available to the practitioner, but not the disturbance (components) $\varepsilon_i$, $v_i$ and $\tilde v_i$. We will distinguish settings in which $z_i$ is available too, and is used as an instrument, but also the situation in which it is not used. Instrument $z_i$ may not be used because: (i) it is not available, (ii) the researcher (possibly erroneously) assumes that $\rho_{x\varepsilon} = 0$, or (iii) the researcher is not willing to make the assumption $\rho_{z\varepsilon} = 0$. In these cases OLS may be used, although $x_i$ establishes an invalid instrument when $\rho_{x\varepsilon} \ne 0$. However, in Section 5 we will show that under $\rho_{x\varepsilon} \in R_{x\varepsilon}$ the coefficient $\beta$ is partially identified by a modified version of OLS, which does not require any instruments.

2.4. Some related literature

There has already been an extensive literature on the small sample properties of IV estimators. Pioneers are Basmann and Richardson; see also Sawa (1969) and Phillips (1983) and references therein. The first to pay specific attention to weak instrument problems were Nelson and Startz (1990a,b). They study the same simple simultaneous model as we do, but use quite a different parametrization. They focus exclusively on the situation where (in our notation) both $z_i$ and $\tilde v_i$ are kept fixed. That this has curious consequences was already noted by Maddala and Jeong (1992). Keeping $z_i$ fixed is also a characteristic of the studies by Woglom (2001), Hillier


(2006) and Forchini (2006). Of course, any simulation findings in which $z_i$ has been kept fixed are specific to the particular realization used. To grasp its effects, it would be instructive to illustrate results for a few different realizations. We will do so, and further examine the practical consequences of conditioning (or not), not just for IV, but also for OLS.

A very specific case of the simple simultaneous model considered here has received extensive special interest in the literature. Consider the classic consumption function model $c_i = \kappa x_i + \varepsilon_i$ (where $x_i$ is income and $\kappa$, with $0 < \kappa < 1$, the marginal propensity to consume), supplemented and closed by the budget restriction equation $x_i = c_i + z_i$, where $z_i$ is exogenous. Defining $y_i = c_i - \kappa x_i$ this yields $y_i = \beta x_i + \varepsilon_i$ (with true value $\beta = 0$) and reduced form equation

$$x_i = \frac{1}{1-\kappa}\, z_i + \frac{1}{1-\kappa}\,\varepsilon_i. \tag{2.16}$$

So, this represents the special parameterization $\pi = \gamma = 1/(1-\kappa)$ and $\sigma_{\tilde v}^2 = 0$. Thus, instead of three free parameters, this DGP has just one free parameter. Note that here $\rho_{x\varepsilon} = \rho_{xz}$; thus from $\rho_{x\varepsilon}^2 + \rho_{xz}^2 = 1$ we obtain $\rho_{x\varepsilon} = \rho_{xz} = \frac{1}{2}\sqrt{2} \approx .7$, with $PCP = n$ and $\sigma_x^2 = 2/(1-\kappa)^2 > 2$. So, the instrument is not weak for $n \ge 10$, see Staiger and Stock (1997), and the simultaneity is always serious. Recently, the distribution of the IV estimator and its t-ratio have been examined for this highly special model by Phillips (2006, 2009), but exclusively for fixed $z_i$.

The above makes clear that much of our present-day understanding of the finite sample distribution of IV estimators largely refers just to the setting in which some kind of conditioning on exogenous variables has been accommodated. We will demonstrate that a major finding in that literature on the shape of IV probability densities (often bimodal and even zero between the modes) is not typical for the three-parameter model, but only occurs in specific cases under a conditional sampling scheme. Before we do so, we will first design a generic approach by which we can derive for both consistent and inconsistent estimators what the effects of conditioning are on their limiting distributions.
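The special parameterization of the consumption-function case can be verified by direct computation; in this sketch (ours, using $\kappa$ as above for the marginal propensity to consume, here set to 0.8) the implied correlations and $PCP$ follow from (2.10), (2.13) and (2.11):

```python
import numpy as np

kappa = 0.8                        # marginal propensity to consume, 0 < kappa < 1
pi = gamma = 1.0 / (1.0 - kappa)   # special parameterization, sigma2_vtilde = 0
sigma2_x = pi**2 + gamma**2        # = 2/(1-kappa)^2 > 2, with sigma_z = sigma_eps = 1
rho_xz = pi / np.sqrt(sigma2_x)
rho_xe = gamma / np.sqrt(sigma2_x)
pcp_over_n = rho_xz**2 / (1 - rho_xz**2)   # PCP/n from (2.11)
print(rho_xe, rho_xz, pcp_over_n)          # both correlations ~0.7071, PCP = n
```

Whatever the value of $\kappa$, both correlations equal $\frac{1}{2}\sqrt{2}$ and $PCP/n = 1$, as claimed.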

3. Limiting distributions of IV and OLS under alternative sampling schemes

The IV and OLS estimators of $\beta$ are given by

$$\hat\beta_{IV} = \frac{\sum_{i=1}^n z_i y_i}{\sum_{i=1}^n z_i x_i} = \beta + \frac{\sum_{i=1}^n z_i\varepsilon_i}{\sum_{i=1}^n z_i x_i}, \tag{3.1}$$

$$\hat\beta_{OLS} = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2} = \beta + \frac{\sum_{i=1}^n x_i\varepsilon_i}{\sum_{i=1}^n x_i^2}. \tag{3.2}$$

Here we will examine their limiting distributions under the different sampling schemes mentioned above. We restrict ourselves to the standard case where the variable $z_i$ is an appropriate instrument in the sense that it corresponds to the valid orthogonality condition $E[z_i(y_i - \beta x_i)] = 0$, with $E(z_i x_i) \ne 0$. Hence, $\beta$ is point-identified, provided $\rho_{xz} = O(1)$, although the instrument may be weak when $\rho_{xz}$ is close to zero. In the asymptotic analysis to follow we do not consider consequences of instrument weakness in which one assumes, for instance, $\rho_{xz} = O(n^{-1/2})$. To maintain in this section generality of the analytic results, we will not yet impose the normalization restrictions to be used later in the simulations, which are: $\beta = 0$, $\sigma_\varepsilon^2 = 1$ and $\sigma_z^2 = 1$.
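On simulated data from the DGP of Section 2 both ratios (3.1) and (3.2) are one-line computations. In this sketch (our code, under the simulation normalizations $\beta = 0$, $\sigma_\varepsilon = \sigma_z = 1$, Gaussian draws), IV should be close to $\beta = 0$, while OLS drifts towards $\beta + \gamma\sigma_\varepsilon^2/\sigma_x^2 = \beta + \rho_{x\varepsilon}\sigma_\varepsilon/\sigma_x$:

```python
import numpy as np

rng = np.random.default_rng(42)
n, sigma_x, rho_xe, rho_xz = 10_000, 2.0, 0.3, 0.4
pi, gamma = rho_xz * sigma_x, rho_xe * sigma_x            # (2.10)
s_vt = sigma_x * np.sqrt(1 - rho_xe**2 - rho_xz**2)       # from (2.8)
eps, vt, z = rng.standard_normal((3, n))
x = pi * z + s_vt * vt + gamma * eps                      # reduced form (2.3)
y = eps                                                   # (2.4) with beta = 0

beta_iv = z @ y / (z @ x)     # (3.1)
beta_ols = x @ y / (x @ x)    # (3.2)
print(beta_iv, beta_ols)      # IV near 0; OLS near rho_xe/sigma_x = 0.15
```

The sample size is deliberately large here so that the simultaneity-induced drift of OLS dominates sampling noise.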

To investigate the effects on limiting distributions of conditioning on realized random variables which are independent of $\varepsilon_i$, we will employ the following

Lemma 1: Let $R = N/D$ be a ratio of scalar random variables, where the numerator $N = \sum_{i=1}^n N_i$ and the denominator $D = \sum_{i=1}^n D_i$ are both sample aggregates for a sample of size $n$, whereas both $N$ and $D$ can be decomposed, employing some set of conditioning variables $C$, such that $N = \bar N + \tilde N$ and $D = \bar D + \tilde D$, where $\bar N \equiv E(N \mid C) = O(n)$ and $\bar D \equiv E(D \mid C) = O(n)$, while $\tilde N \mid C = O_p(n^{1/2})$ and $\tilde D \mid C = O_p(n^{1/2})$. Then, provided $\bar D \ne 0$,

$$R = \frac{1}{\bar D}\Big(\bar N + \tilde N - \frac{\bar N}{\bar D}\tilde D\Big) + o_p(n^{-1/2}).$$

Proof of Lemma 1: We may rewrite $D^{-1} = [\bar D + (D - \bar D)]^{-1} = \bar D^{-1}(1 + \bar D^{-1}\tilde D)^{-1}$, where $\bar D^{-1}\tilde D = O_p(n^{-1/2})$. Then the simple Taylor expansion $(1 + \bar D^{-1}\tilde D)^{-1} = 1 - \bar D^{-1}\tilde D + o_p(n^{-1/2})$ yields $D^{-1} = \bar D^{-1}(1 - \bar D^{-1}\tilde D) + o_p(n^{-3/2})$, giving

$$R = N/D = (\bar N + \tilde N)\,\bar D^{-1}(1 - \bar D^{-1}\tilde D) + o_p(n^{-1/2}).$$

From this, the result of the lemma directly follows, upon noting that $\tilde N \bar D^{-2}\tilde D = O_p(n^{-1})$.

From Lemma 1 we find that, conditional on $C$,

$$R - \frac{\bar N}{\bar D} = \frac{1}{\bar D}\Big(\tilde N - \frac{\bar N}{\bar D}\tilde D\Big) + o_p(n^{-1/2}). \tag{3.3}$$

Now, employing results attributed to Slutsky and Cramér, one can easily obtain

Lemma 2: For sample data such that Lemma 1 holds, whereas $n^{-1/2}(\tilde N - (\bar N/\bar D)\tilde D) \mid C \xrightarrow{d} N(0, V_0)$, one has $n^{1/2}(R - \bar N/\bar D) \mid C \xrightarrow{d} N(0, V)$, where $V = (\lim n^{-1}\bar D)^{-2} V_0$.

In the next subsections these lemmas are applied under the various sampling schemes, first with respect to IV and next regarding OLS.

3.1. Limiting distribution of IV and conditioning

For the consistent IV estimator we shall now employ the above lemmas to

$$R = \hat\beta_{IV} - \beta = \frac{N}{D} = \frac{\sum_{i=1}^n z_i\varepsilon_i}{\sum_{i=1}^n z_i x_i} \tag{3.4}$$

in order to derive the limiting distribution under the three settings: (i) unconditional, which we indicate by $C_\emptyset$; (ii) under the conditioning set $C_z \equiv \{z_1, \ldots, z_n\}$; and (iii) conditioning on both $z$ and $\tilde v$, indicated by $C_{\bar x} \equiv \{\bar x_1, \ldots, \bar x_n\}$. As we discussed above, the latter kind of conditioning may seem odd, because $\tilde v_i$ is latent. Its justification can be based on the mere fact that it was done too in the Nelson and Startz papers, which did arouse the great interest in weak instruments. Our justification is that $\bar x_i$ represents the effect on $x_i$ of both the included and all the omitted instruments. By conditioning on $\bar x_i$ the only remaining random element of the DGP is $\varepsilon_i$. In cases where we would have $L > 1$ instruments there would be many intermediary and possibly more relevant cases in between the present $C_z$ and $C_{\bar x}$; now we just focus on these two extremes.

3.1.1. Unconditional IV

For obtaining the unconditional limiting distribution we do not need the above lemmas, and can just follow the standard textbook approach. This exploits that according to the Law of Large Numbers (LLN) $\text{plim}\, n^{-1}\sum_{i=1}^n z_i x_i = \sigma_{xz} \ne 0$, while a standard Central Limit Theorem (CLT) yields $n^{-1/2}\sum_{i=1}^n z_i\varepsilon_i \xrightarrow{d} N(0, \sigma_\varepsilon^2\sigma_z^2)$, since $E(z_i\varepsilon_i) = 0$, $E(z_i z_j\varepsilon_i\varepsilon_j) = 0$ for $i \ne j$ and $E(z_i^2\varepsilon_i^2) = E[z_i^2 E(\varepsilon_i^2 \mid z_i)] = \sigma_\varepsilon^2\sigma_z^2$, provided the disturbances are conditionally homoskedastic. Then it follows that

$$n^{1/2}(\hat\beta_{IV} - \beta) = \Big(n^{-1}\sum_{i=1}^n z_i x_i\Big)^{-1} n^{-1/2}\sum_{i=1}^n z_i\varepsilon_i \xrightarrow{d} N\big(0,\, \sigma_\varepsilon^2\sigma_z^2/\sigma_{xz}^2\big). \tag{3.5}$$

Note that $\sigma_z^2/\sigma_{xz}^2 = 1/(\rho_{xz}^2\sigma_x^2)$. Hence, the asymptotic variance is inversely related to the strength of the instrument, and the asymptotic distribution is invariant with respect to the degree of simultaneity $\rho_{x\varepsilon}$.

3.1.2. Conditioning IV on reduced form regressors

When conditioning on the $z_i$ we obtain

$$\begin{aligned} \bar N &\equiv E(N \mid C_z) = E\big(\textstyle\sum_{i=1}^n z_i\varepsilon_i \mid C_z\big) = \textstyle\sum_{i=1}^n z_i E(\varepsilon_i \mid C_z) = 0, \\ \tilde N &\equiv N - \bar N = \textstyle\sum_{i=1}^n z_i\varepsilon_i, \quad\text{with } \tilde N \mid C_z = O_p(n^{1/2}), \\ \bar D &\equiv E(D \mid C_z) = E\big(\textstyle\sum_{i=1}^n z_i x_i \mid C_z\big) = \textstyle\sum_{i=1}^n z_i E(x_i \mid C_z) = \pi\textstyle\sum_{i=1}^n z_i^2 = O(n), \\ \tilde D &\equiv D - \bar D = \textstyle\sum_{i=1}^n z_i(x_i - \pi z_i) = \textstyle\sum_{i=1}^n z_i v_i, \quad\text{with } \tilde D \mid C_z = O_p(n^{1/2}). \end{aligned}$$

We conclude that $\tilde N \mid C_z = O_p(n^{1/2})$, because $\text{Var}\big(\sum_{i=1}^n z_i\varepsilon_i \mid C_z\big) = \sigma_\varepsilon^2\sum_{i=1}^n z_i^2 = O(n)$. The result for $\bar D$ follows immediately from (2.3), and $\tilde D \mid C_z = O_p(n^{1/2})$, because $\text{Var}\big(\sum_{i=1}^n z_i v_i \mid C_z\big) = \sigma_v^2\sum_{i=1}^n z_i^2 = O(n)$. Hence, the conditions of Lemma 1 are satisfied, although $\bar N$ is actually much smaller than $O(n)$. Moreover, $\tilde N - (\bar N/\bar D)\tilde D = \tilde N = \sum_{i=1}^n z_i\varepsilon_i$, thus $n^{-1/2}(\tilde N - (\bar N/\bar D)\tilde D) = n^{-1/2}\sum_{i=1}^n z_i\varepsilon_i$, for which we have already obtained that it has conditional expectation zero and conditional variance $\sigma_\varepsilon^2\, n^{-1}\sum_{i=1}^n z_i^2$; thus a standard CLT yields $V_0 = \lim \sigma_\varepsilon^2\, n^{-1}\sum_{i=1}^n z_i^2 = \sigma_\varepsilon^2\sigma_z^2$. Because $\lim \bar D/n = \pi\sigma_z^2 = \rho_{xz}\sigma_z\sigma_x$, Lemma 2 yields

$$n^{1/2}(\hat\beta_{IV} - \beta) \mid C_z \xrightarrow{d} N\big(0,\, \sigma_\varepsilon^2/(\rho_{xz}^2\sigma_x^2)\big). \tag{3.6}$$

Conditioning did not cause (3.6) to differ from (3.5), basically because $\bar N = 0$ and because $\lim n^{-1}\bar D = \lim n^{-1}\sum_{i=1}^n E(z_i x_i \mid z_i) = \pi\lim n^{-1}\sum_{i=1}^n z_i^2 = \pi\sigma_z^2 = \text{plim}\, n^{-1}\sum_{i=1}^n z_i x_i$.
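The coincidence of (3.5) and (3.6) can be illustrated numerically. In this Monte Carlo sketch (our code; Gaussian draws, $\beta = 0$, $\sigma_\varepsilon = \sigma_z = 1$), $n$ times the replication variance of $\hat\beta_{IV}$ is close to the common asymptotic value $1/(\rho_{xz}^2\sigma_x^2)$ whether $z$ is redrawn in every replication or held fixed throughout:

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 1_000, 3_000
sigma_x, rho_xe, rho_xz = 2.0, 0.5, 0.5
pi, gamma = rho_xz * sigma_x, rho_xe * sigma_x
s_vt = sigma_x * np.sqrt(1 - rho_xe**2 - rho_xz**2)
avar = 1.0 / (rho_xz**2 * sigma_x**2)   # asymptotic variance in (3.5) and (3.6)

def iv_draws(z):
    """beta-hat_IV over `reps` replications (true beta = 0)."""
    eps = rng.standard_normal((reps, n))
    vt = s_vt * rng.standard_normal((reps, n))
    x = pi * z + vt + gamma * eps
    return (z * eps).sum(axis=-1) / (z * x).sum(axis=-1)

b_unc = iv_draws(rng.standard_normal((reps, n)))   # z redrawn: unconditional
b_cz = iv_draws(rng.standard_normal(n))            # one fixed z: conditional on C_z
print(n * b_unc.var(), n * b_cz.var(), avar)       # all close to 1.0
```

With $PCP = n\rho_{xz}^2/(1-\rho_{xz}^2) \approx 333$ the instrument is strong here, so the normal approximation should be accurate under both schemes; Section 4 shows it can fail badly for weak fixed instruments.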

3.1.3. Conditioning IV on all exogenous (latent) reduced form variables

Exploiting $E(\bar x_i\varepsilon_i) = 0$ we now find

$$\begin{aligned} \bar N &\equiv E(N \mid C_{\bar x}) = E\big(\textstyle\sum_{i=1}^n z_i\varepsilon_i \mid C_{\bar x}\big) = \textstyle\sum_{i=1}^n z_i E(\varepsilon_i \mid C_{\bar x}) = 0, \\ \tilde N &\equiv N - \bar N = \textstyle\sum_{i=1}^n z_i\varepsilon_i, \quad\text{with } \tilde N \mid C_{\bar x} = O_p(n^{1/2}), \\ \bar D &\equiv E(D \mid C_{\bar x}) = E\big[\textstyle\sum_{i=1}^n z_i(\bar x_i + \gamma\varepsilon_i) \mid C_{\bar x}\big] = \textstyle\sum_{i=1}^n z_i\bar x_i = \pi\textstyle\sum_{i=1}^n z_i^2 + \textstyle\sum_{i=1}^n z_i\tilde v_i = O(n), \\ \tilde D &\equiv D - \bar D = \textstyle\sum_{i=1}^n z_i(x_i - \pi z_i - \tilde v_i) = \gamma\textstyle\sum_{i=1}^n z_i\varepsilon_i, \quad\text{with } \tilde D \mid C_{\bar x} = O_p(n^{1/2}). \end{aligned}$$

That $\tilde N \mid C_{\bar x} = O_p(n^{1/2})$ is now self-evident. Because $\text{Var}\big(\gamma\sum_{i=1}^n z_i\varepsilon_i \mid C_{\bar x}\big) = \gamma^2\sigma_\varepsilon^2\sum_{i=1}^n z_i^2 = O(n)$ we find $\tilde D \mid C_{\bar x} = O_p(n^{1/2})$. Again $\bar N = 0$, giving $n^{-1/2}(\tilde N - (\bar N/\bar D)\tilde D) = n^{-1/2}\sum_{i=1}^n z_i\varepsilon_i$ and $V_0 = \lim \sigma_\varepsilon^2\, n^{-1}\sum_{i=1}^n z_i^2 = \sigma_\varepsilon^2\sigma_z^2$. Because $\lim \bar D/n = \pi\sigma_z^2 + \lim n^{-1}\sum_{i=1}^n z_i\tilde v_i = \pi\sigma_z^2 = \rho_{xz}\sigma_z\sigma_x$, we also find

$$n^{1/2}(\hat\beta_{IV} - \beta) \mid C_{\bar x} \xrightarrow{d} N\big(0,\, \sigma_\varepsilon^2/(\rho_{xz}^2\sigma_x^2)\big). \tag{3.7}$$

Hence, the limiting distributions (3.5), (3.6) and (3.7) are all equivalent. The underlying reasons for this equivalence are that: (i) $E(N \mid C_z) = E(N \mid C_{\bar x}) = 0$ and (therefore) also $E(N) = 0$, and the limiting distribution of the numerator $n^{-1/2}\sum_{i=1}^n z_i\varepsilon_i$ is not affected by the conditioning; (ii) although the conditional and unconditional expectations of $n^{-1}\sum_{i=1}^n z_i x_i$ differ, they have the same probability limit. We now turn to inconsistent OLS estimation and will find out that here conditioning does affect limiting distributions.

3.2. Limiting distribution of OLS and conditioning

For

$$R = \hat\beta_{OLS} - \beta = \frac{N}{D} = \frac{\sum_{i=1}^n x_i\varepsilon_i}{\sum_{i=1}^n x_i^2} \tag{3.8}$$

we will now examine $\bar N/\bar D$ and $\bar D^{-1}(\tilde N - (\bar N/\bar D)\tilde D)$ for the same three sampling schemes and invoke Lemmas 1 and 2 to obtain the associated (un)conditional limiting distributions of $\hat\beta_{OLS}$.

3.2.1. Unconditional OLS

For $C$ empty and using decomposition (2.15) we easily obtain

$$\begin{aligned} \bar N &= E\big(\textstyle\sum_{i=1}^n x_i\varepsilon_i\big) = n\gamma\sigma_\varepsilon^2 = O(n), \\ \tilde N &= \textstyle\sum_{i=1}^n (x_i\varepsilon_i - \gamma\sigma_\varepsilon^2) = \textstyle\sum_{i=1}^n [\bar x_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2)] = O_p(n^{1/2}), \\ \bar D &= E\big(\textstyle\sum_{i=1}^n x_i^2\big) = n\sigma_x^2 = O(n), \\ \tilde D &= \textstyle\sum_{i=1}^n (x_i^2 - \sigma_x^2) = O_p(n^{1/2}), \end{aligned}$$

where the orders of probability of $\tilde N$ and $\tilde D$ follow because their expectation is zero and their variance is $O(n)$, provided we assume that $\bar x_i$ and $\varepsilon_i$, and hence $x_i$ and also $y_i$, have finite 3rd and 4th moments. We may write

$$\tilde N - \frac{\bar N}{\bar D}\tilde D = \sum_{i=1}^n u_i,$$

where, because $\bar N/\bar D = \gamma\sigma_\varepsilon^2/\sigma_x^2$,

$$u_i = \bar x_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\gamma\sigma_\varepsilon^2}{\sigma_x^2}\big[(\bar x_i^2 - \sigma_{\bar x}^2) + 2\gamma\bar x_i\varepsilon_i + \gamma^2(\varepsilon_i^2 - \sigma_\varepsilon^2)\big], \tag{3.9}$$

with $E(u_i) = 0$ and $E(u_i u_j) = 0$ for $i \ne j$. Assuming zero skewness and zero excess kurtosis for $\bar x_i$ and $\varepsilon_i$, and no conditional heteroskedasticity, i.e. $E(\varepsilon_i^2 \mid \bar x_i) = \sigma_\varepsilon^2$, we find that

$$\begin{aligned} E(u_i^2) &= \sigma_\varepsilon^2\sigma_{\bar x}^2 + 2\gamma^2\sigma_\varepsilon^4 - 4\,\frac{\gamma\sigma_\varepsilon^2}{\sigma_x^2}\big(\gamma\sigma_{\bar x}^2\sigma_\varepsilon^2 + \gamma^3\sigma_\varepsilon^4\big) + \frac{\gamma^2\sigma_\varepsilon^4}{\sigma_x^4}\big(2\sigma_{\bar x}^4 + 4\gamma^2\sigma_{\bar x}^2\sigma_\varepsilon^2 + 2\gamma^4\sigma_\varepsilon^4\big) \\ &= \sigma_\varepsilon^2\sigma_x^2 + \gamma^2\sigma_\varepsilon^4 - 4\gamma^2\sigma_\varepsilon^4 + 2\gamma^2\sigma_\varepsilon^4 \\ &= \sigma_\varepsilon^2\big(\sigma_x^2 - \gamma^2\sigma_\varepsilon^2\big) = \sigma_\varepsilon^2\sigma_x^2(1 - \rho_{x\varepsilon}^2) \end{aligned}$$

is finite and constant. Therefore, a standard CLT applies, giving

$$n^{-1/2}\sum_{i=1}^n u_i \xrightarrow{d} N\big(0,\, \sigma_\varepsilon^2\sigma_x^2(1 - \rho_{x\varepsilon}^2)\big).$$

Making use of $n^{-1}\bar D = \sigma_x^2$ we find from Lemma 2 that the limiting distribution of unconditional $\hat\beta_{OLS}$ can be characterized by

$$n^{1/2}\Big(\hat\beta_{OLS} - \beta - \frac{\rho_{x\varepsilon}\sigma_\varepsilon}{\sigma_x}\Big) \xrightarrow{d} N\Big(0,\, \frac{\sigma_\varepsilon^2(1 - \rho_{x\varepsilon}^2)}{\sigma_x^2}\Big). \tag{3.10}$$
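Result (3.10) can be checked by a small Monte Carlo (our sketch; Gaussian draws, under the normalizations $\beta = 0$, $\sigma_\varepsilon = \sigma_z = 1$): across unrestrained replications the mean of $\hat\beta_{OLS}$ should sit near $\beta + \rho_{x\varepsilon}\sigma_\varepsilon/\sigma_x$ and $n$ times its variance near $\sigma_\varepsilon^2(1-\rho_{x\varepsilon}^2)/\sigma_x^2$:

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 1_000, 3_000
sigma_x, rho_xe, rho_xz = 2.0, 0.4, 0.3
pi, gamma = rho_xz * sigma_x, rho_xe * sigma_x
s_vt = sigma_x * np.sqrt(1 - rho_xe**2 - rho_xz**2)

eps = rng.standard_normal((reps, n))
x = (pi * rng.standard_normal((reps, n))         # pi * z_i, redrawn each replication
     + s_vt * rng.standard_normal((reps, n))     # vtilde_i
     + gamma * eps)                              # gamma * eps_i
b_ols = (x * eps).sum(axis=1) / (x * x).sum(axis=1)   # (3.2) with beta = 0

print(b_ols.mean(), rho_xe / sigma_x)                  # inconsistency: near 0.2
print(n * b_ols.var(), (1 - rho_xe**2) / sigma_x**2)   # asymptotic variance: near 0.21
```

Note that the simulated variance is indeed below the classic expression $\sigma_\varepsilon^2/\sigma_x^2 = 0.25$ that would apply without simultaneity.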

This neat result does not seem to be very well known, simply because authors of textbooks show no interest in inconsistent estimators, although we may suspect that they are omnipresent in practice. This result (for a vector $\beta$) can already be found in Goldberger (1964, p.359), who does not mention, though, that it only holds when the IID observations on all the variables have 3rd and 4th moments corresponding to those of the Normal distribution, whereas the disturbances $\varepsilon_i$ should be conditionally homoskedastic.

3.2.2. Conditioning OLS on reduced form regressors

Now conditioning on the instrument, hence taking again $C_z \equiv \{z_1, \ldots, z_n\}$, we find

$$\begin{aligned} \bar N &= E\big(\textstyle\sum_{i=1}^n x_i\varepsilon_i \mid C_z\big) = \textstyle\sum_{i=1}^n E[(\pi z_i + \tilde v_i + \gamma\varepsilon_i)\varepsilon_i \mid C_z] = n\gamma\sigma_\varepsilon^2 = O(n), \\ \tilde N &= \textstyle\sum_{i=1}^n [\pi z_i\varepsilon_i + \tilde v_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2)] \quad\text{and}\quad \tilde N \mid C_z = O_p(n^{1/2}), \\ \bar D &= E\big(\textstyle\sum_{i=1}^n x_i^2 \mid C_z\big) = \pi^2\textstyle\sum_{i=1}^n z_i^2 + n\sigma_v^2 = O(n), \\ \tilde D &= \textstyle\sum_{i=1}^n [2\pi z_i v_i + (v_i^2 - \sigma_v^2)] \quad\text{and}\quad \tilde D \mid C_z = O_p(n^{1/2}), \end{aligned}$$

where the orders of probability of $\tilde N$ and $\tilde D$ again follow upon noting that their conditional expectation is zero and their conditional variance is $O(n)$. Here we find for $\tilde N - (\bar N/\bar D)\tilde D = \sum_{i=1}^n u_i$ that it has

$$u_i = \pi z_i\varepsilon_i + \tilde v_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\bar N}{\bar D}\big[2\pi z_i(\tilde v_i + \gamma\varepsilon_i) + (\tilde v_i^2 - \sigma_{\tilde v}^2) + 2\gamma\tilde v_i\varepsilon_i + \gamma^2(\varepsilon_i^2 - \sigma_\varepsilon^2)\big], \tag{3.11}$$

with $E(u_i \mid C_z) = 0$ and $E(u_i u_j \mid C_z) = 0$ for $i \ne j$. Substituting $\bar N = n\gamma\sigma_\varepsilon^2$, and again assuming zero skewness and excess kurtosis of all random variables and conditional homoskedasticity of $\tilde v_i$ and $\varepsilon_i$, we obtain

$$E(u_i^2 \mid C_z) = \sigma_\varepsilon^2\big(\pi^2 z_i^2 + \sigma_{\tilde v}^2 + 2\gamma^2\sigma_\varepsilon^2\big) - 4\,\frac{\bar N}{\bar D}\,\gamma\sigma_\varepsilon^2\big(\pi^2 z_i^2 + \sigma_{\tilde v}^2 + \gamma^2\sigma_\varepsilon^2\big) + \Big(\frac{\bar N}{\bar D}\Big)^2\big(4\pi^2 z_i^2\sigma_v^2 + 2\sigma_v^4\big),$$

which is finite. Therefore, a standard CLT applies, giving

$$n^{-1/2}\sum_{i=1}^n u_i \mid C_z \xrightarrow{d} N\Big(0,\, \lim n^{-1}\sum_{i=1}^n E(u_i^2 \mid C_z)\Big).$$

Making use of $\lim n^{-1}\bar D = \lim n^{-1}\big(\pi^2\sum_{i=1}^n z_i^2 + n\sigma_v^2\big) = \pi^2\sigma_z^2 + \sigma_v^2 = \sigma_x^2$ we find

$$\lim n^{-1}\sum_{i=1}^n E(u_i^2 \mid C_z) = \sigma_\varepsilon^2\sigma_x^2\big[1 - \rho_{x\varepsilon}^2(1 + 2\rho_{xz}^4)\big]$$

from which we obtain that conditional on $C_z = \{z_1, \dots, z_n\}$ the limiting distribution of $\hat\beta_{OLS}$ is characterized by

$$n^{1/2}\left(\hat\beta_{OLS} - \beta - \rho_{x\varepsilon}\frac{\sigma_\varepsilon}{\sigma_x}\Big[1 - \rho_{xz}^2\Big(1 - \frac{1}{n\sigma_z^2}\sum_{i=1}^n z_i^2\Big)\Big]^{-1}\right) \Big|\ C_z \xrightarrow{\,d\,} N\!\left(0,\ \frac{\sigma_\varepsilon^2}{\sigma_x^2}\big[1 - \rho_{x\varepsilon}^2(1 + 2\rho_{xz}^4)\big]\right). \qquad (3.12)$$

This corresponds to the – in our opinion less transparent – formula Hausman (1978, p.1257) and Hahn and Hausman (2003, p.124) used, upon referring to the actual derivations in Rothenberg (1972, p.9).

3.2.3. Conditioning OLS on all exogenous (latent) reduced form variables

When conditioning on $C_{\bar x} \equiv \{\bar x_1, \dots, \bar x_n\}$ we find

$$\begin{aligned}
\bar N &= E\Big(\textstyle\sum_{i=1}^n x_i\varepsilon_i \,\Big|\, C_{\bar x}\Big) = \textstyle\sum_{i=1}^n E[(\bar x_i + \gamma\varepsilon_i)\varepsilon_i \mid C_{\bar x}] = n\gamma\sigma_\varepsilon^2 = O(n),\\
\tilde N &= \textstyle\sum_{i=1}^n \big[\bar x_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2)\big] \ \text{ and }\ \tilde N \mid C_{\bar x} = O_p(n^{1/2}),\\
\bar D &= E\Big(\textstyle\sum_{i=1}^n x_i^2 \,\Big|\, C_{\bar x}\Big) = \textstyle\sum_{i=1}^n \bar x_i^2 + n\gamma^2\sigma_\varepsilon^2 = O(n),\\
\tilde D &= \textstyle\sum_{i=1}^n \big[2\gamma\bar x_i\varepsilon_i + \gamma^2(\varepsilon_i^2 - \sigma_\varepsilon^2)\big] \ \text{ and }\ \tilde D \mid C_{\bar x} = O_p(n^{1/2}),
\end{aligned}$$

where the orders of probability of $\tilde N$ and $\tilde D$ again follow as before. Here $\tilde N - (\bar N/\bar D)\tilde D = \sum_{i=1}^n u_i$ has

$$u_i = \bar x_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\bar N}{\bar D}\big[2\gamma\bar x_i\varepsilon_i + \gamma^2(\varepsilon_i^2 - \sigma_\varepsilon^2)\big],$$

with $E(u_i \mid C_{\bar x}) = 0$, $E(u_iu_j \mid C_{\bar x}) = 0$ for $i \neq j$, and

$$E(u_i^2 \mid C_{\bar x}) = \Big(1 - 2\gamma\frac{\bar N}{\bar D}\Big)^2\bar x_i^2\sigma_\varepsilon^2 + 2\gamma^2\Big(1 - \gamma\frac{\bar N}{\bar D}\Big)^2\sigma_\varepsilon^4,$$

which is finite again. Making use of $\lim n^{-1}\bar D = \lim n^{-1}\sum_{i=1}^n \bar x_i^2 + \gamma^2\sigma_\varepsilon^2 = \sigma_{\bar x}^2 + \gamma^2\sigma_\varepsilon^2 = \sigma_x^2$, so that $\gamma\bar N/\bar D \to \rho_{x\varepsilon}^2$, we obtain

$$\begin{aligned}
\lim n^{-1}\sum_{i=1}^n E(u_i^2 \mid C_{\bar x}) &= (1 - 2\rho_{x\varepsilon}^2)^2\sigma_{\bar x}^2\sigma_\varepsilon^2 + 2\gamma^2(1-\rho_{x\varepsilon}^2)^2\sigma_\varepsilon^4\\
&= \sigma_\varepsilon^2\sigma_x^2(1-\rho_{x\varepsilon}^2)\big[(1-2\rho_{x\varepsilon}^2)^2 + 2\rho_{x\varepsilon}^2(1-\rho_{x\varepsilon}^2)\big]\\
&= \sigma_\varepsilon^2\sigma_x^2(1-\rho_{x\varepsilon}^2)\big[1 - 2(1-\rho_{x\varepsilon}^2)\rho_{x\varepsilon}^2\big], \qquad (3.13)
\end{aligned}$$

from which, upon using a CLT, we establish the limiting distribution of $\hat\beta_{OLS}$ conditional on $C_{\bar x}$. It is characterized by

$$n^{1/2}\left(\hat\beta_{OLS} - \beta - \rho_{x\varepsilon}\frac{\sigma_\varepsilon}{\sigma_x}\Big[\rho_{x\varepsilon}^2 + (1-\rho_{x\varepsilon}^2)\frac{1}{n\sigma_{\bar x}^2}\sum_{i=1}^n \bar x_i^2\Big]^{-1}\right) \Big|\ C_{\bar x} \xrightarrow{\,d\,} N\!\left(0,\ \frac{\sigma_\varepsilon^2}{\sigma_x^2}(1-\rho_{x\varepsilon}^2)\big[1 - 2(1-\rho_{x\varepsilon}^2)\rho_{x\varepsilon}^2\big]\right). \qquad (3.14)$$

This result can be found in Kiviet and Niemczyk (2007, 2010).

Obviously, if $\bar x_i$ were available, it would establish the strongest possible instrument. We find $E(\bar x_i x_i) = \sigma_{\bar x}^2 = \sigma_x^2(1-\rho_{x\varepsilon}^2)$ and hence, due to (2.9),

$$\rho_{\bar x x}^2 = \frac{\sigma_x^4(1-\rho_{x\varepsilon}^2)^2}{\sigma_x^4(1-\rho_{x\varepsilon}^2)} = 1 - \rho_{x\varepsilon}^2 \geq \rho_{xz}^2. \qquad (3.15)$$

We repeat that one might consider the latent variable $v_i$ as the effect of all omitted instruments in the specification of the reduced form. Result (3.14) describes the sampling situation in which all exogenous variables (both observed and latent) are kept fixed and only the randomness of $\varepsilon_i$ is taken into account. We established that varying the conditioning set does not matter for IV, but requires different centering for OLS and yields different asymptotic variances. Similar distinct limiting distributions have been obtained in the context of errors in variables models by Schneeweiss (1980).

3.3. Comparison of the asymptotic variances

We shall now compare for the three different conditioning settings the asymptotic OLS variance with that of IV. Note that when $\rho_{x\varepsilon} = 0$ and there is no simultaneity the asymptotic variance of OLS (both conditionally and unconditionally) would be $\sigma_\varepsilon^2/\sigma_x^2$. Under simultaneity, and applying IV with instrument $z$, this has to be multiplied by $\rho_{xz}^{-2}$. Since $|\rho_{xz}| \leq 1$, this factor is never smaller than 1. However, when sticking to OLS, the expression $\sigma_\varepsilon^2/\sigma_x^2$ has to be multiplied under simultaneity in the three situations considered by the factors $(1-\rho_{x\varepsilon}^2)$, $[1-\rho_{x\varepsilon}^2(1+2\rho_{xz}^4)]$ and $(1-\rho_{x\varepsilon}^2)[1-2(1-\rho_{x\varepsilon}^2)\rho_{x\varepsilon}^2]$, for unconditional, for conditioned on the instruments $z_i$, and for conditioned on the latent variables $\bar x_i$, respectively. It can easily be established that all these three factors are positive and never bigger than 1. So, apart from the inconsistency, the OLS asymptotic distribution seems always more attractive than that of IV. As we will derive now, the asymptotic variance of OLS gets smaller by extending the conditioning set. Of course, we obtain $1 \geq 1-\rho_{x\varepsilon}^2 \geq 1-\rho_{x\varepsilon}^2(1+2\rho_{xz}^4)$, because $1+2\rho_{xz}^4 \geq 1$. Moreover, from $0 \leq \rho_{xz}^2 \leq 1-\rho_{x\varepsilon}^2$ we have $1+2\rho_{xz}^4 \leq 1+2(1-\rho_{x\varepsilon}^2)^2$, thus

$$1 - \rho_{x\varepsilon}^2(1+2\rho_{xz}^4) \geq 1 - \rho_{x\varepsilon}^2\big[1+2(1-\rho_{x\varepsilon}^2)^2\big] = (1-\rho_{x\varepsilon}^2)\big[1-2(1-\rho_{x\varepsilon}^2)\rho_{x\varepsilon}^2\big] \geq 0.$$

So, we see that conditioning OLS on the truly exogenous instrument $z_i$ (which it does not exploit) does reduce the asymptotic variance, and the more so the stronger the instrument is. Further extending the conditioning set from $z_i$ to $\bar x_i = \pi z_i + v_i$ decreases the asymptotic variance even more, provided $\sigma_v^2 > 0$. This all results in the inequalities

$$\text{AsyVar}(\hat\beta_{IV}) \geq \text{AsyVar}(\hat\beta_{OLS} \mid C_\varnothing) \geq \text{AsyVar}(\hat\beta_{OLS} \mid C_z) \geq \text{AsyVar}(\hat\beta_{OLS} \mid C_{\bar x}). \qquad (3.16)$$
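The four multiplying factors and the ordering in (3.16) can be verified numerically over the admissible parameter region $\rho_{xz}^2 \leq 1-\rho_{x\varepsilon}^2$; the sketch below uses an illustrative grid:

```python
import numpy as np

# The four asymptotic-variance multipliers of sig_eps^2/sig_x^2 derived above,
# checked for the ordering in (3.16) on a grid over the admissible region.
def factor_iv(r_xz):            return 1.0 / r_xz**2
def factor_uncond(r_xe):        return 1.0 - r_xe**2
def factor_cond_z(r_xe, r_xz):  return 1.0 - r_xe**2 * (1.0 + 2.0 * r_xz**4)
def factor_cond_xbar(r_xe):     return (1.0 - r_xe**2) * (1.0 - 2.0 * (1.0 - r_xe**2) * r_xe**2)

for r_xe in np.arange(0.0, 0.95, 0.05):
    for r_xz in np.arange(0.05, np.sqrt(1.0 - r_xe**2), 0.05):
        f = (factor_iv(r_xz), factor_uncond(r_xe),
             factor_cond_z(r_xe, r_xz), factor_cond_xbar(r_xe))
        assert f[0] >= f[1] >= f[2] >= f[3] >= 0.0, (r_xe, r_xz, f)
print("ordering (3.16) confirmed on the grid")
```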

The magnitudes of the four distinguished multiplying factors of $\sigma_\varepsilon^2/\sigma_x^2$ are depicted in Diagram 1, in separate panels for six increasing values of the strength $\rho_{xz}$ of the instrument, and with the degree of simultaneity $\rho_{x\varepsilon}$ on the horizontal axis. Note that these six panels have different scales on their horizontal axis, because its range of admissible values is restricted by $0 \leq |\rho_{x\varepsilon}| \leq \sqrt{1-\rho_{xz}^2}$. All depicted curves are symmetric around zero in both $\rho_{x\varepsilon}$ and $\rho_{xz}$, so we may restrict ourselves to positive values only. Diagram 1 shows that conditioning on the instrument $z$ has a noticeable effect on the OLS variance (i.e. there is a discrepancy between the blue and the red curve) only when the instrument is sufficiently strong, say $|\rho_{xz}| > .5$. Adding $v_i$ to the conditioning set already consisting of $z_i$ has little effect when the instrument

is very strong and $\sigma_v^2$ is correspondingly small. But for $|\rho_{xz}| < .8$ it has a noteworthy effect, especially for intermediate values of the simultaneity $\rho_{x\varepsilon}$ (then the discrepancy between the green and the red and blue curves is substantial). When the strength parameter is below .8 the factor $1/\rho_{xz}^2$ for IV is so much larger than unity that it is difficult to combine it with the OLS results in the same plot. Therefore, the dashed (magenta) IV line is only shown in the bottom two panels referring to a very strong instrument. Due to the bias and inconsistency of OLS we should not make a comparison with IV purely based on just the asymptotic variance. Therefore, in Diagram 2, we compare the asymptotic root mean squared errors (ARMSE). For IV this is simply $\sigma_\varepsilon/(\sigma_x|\rho_{xz}|)$. For OLS we have to take into account also the squared inconsistency, multiplied by $n$, because we confront it with the asymptotic variance. Hence, for unconditional OLS this leads to

$$\text{ARMSE}(\hat\beta_{OLS} \mid C_\varnothing) = \frac{\sigma_\varepsilon}{\sigma_x}\big[(1-\rho_{x\varepsilon}^2) + n\rho_{x\varepsilon}^2\big]^{1/2},$$

and similarly for the conditional OLS results. In Diagram 2 we present the ratio (hence, $\sigma_\varepsilon/\sigma_x$ drops out, and the only determinants are $\rho_{x\varepsilon}^2$, $\rho_{xz}^2$ and $n$) with the IV result in the denominator. So, it results in what the ARMSE of OLS is as a fraction of the IV figure. The effect of the inconsistency is so substantial that the differences in variance due to the conditioning setting become trifling and the curves for the three (un)conditional OLS results almost coincide. OLS beats IV according to its asymptotic distribution when the curve is below the dotted line at unity. Hence, we note that for $n = 100$ IV seems preferable when $|\rho_{xz}| > .5$ and $|\rho_{x\varepsilon}| > .2$. However, when $|\rho_{xz}| < .5$ and $n = 30$ the ARMSE of OLS could be half or just one fifth of that of IV. Whether this is useful information for samples that actually have size 30 depends on the accuracy of these asymptotic approximations, which we will check in the next section. Especially when the instrument is weak, special weak-instrument asymptotic results are available, see Andrews and Stock (2007) and Staiger and Stock (1997), which might be more appropriate for making comparisons with the ARMSE of OLS.
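The ARMSE ratio just described can be sketched as follows; parameter values are illustrative:

```python
import numpy as np

# ARMSE comparison for the unconditional case, as used in Diagram 2;
# sig_eps/sig_x cancels from the ratio ARMSE(OLS)/ARMSE(IV).
def armse_ratio(r_xe, r_xz, n):
    armse_ols = np.sqrt((1.0 - r_xe**2) + n * r_xe**2)   # times sig_eps/sig_x
    armse_iv = 1.0 / abs(r_xz)                           # times sig_eps/sig_x
    return armse_ols / armse_iv

# OLS preferred (ratio < 1) for a weak instrument and mild simultaneity:
print(round(armse_ratio(0.1, 0.2, n=30), 3))    # 0.227
# IV preferred for a stronger instrument and serious simultaneity:
print(round(armse_ratio(0.5, 0.6, n=100), 3))   # 3.045
```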

4. Actual distributions of IV and OLS under alternative sampling schemes

We obtained the empirical densities of OLS and IV by generating the estimators many times. The first set of results has sample size $n = 100$. We examined the cases $\rho_{x\varepsilon} = 0, .3, .5, .7$ and focus in particular on $\rho_{xz} = .4$ and $\rho_{xz} = .1$. This gives $2 \times 4$ panels of densities; the top four are those for the not so weak instrument, and the bottom four those for the much weaker instrument. We also present some results for a larger sample where $n = 1000$. In all simulations we chose $\sigma_x/\sigma_\varepsilon = 3$. This does not lead to loss of generality, because all depicted densities are multiples of $\sigma_\varepsilon/\sigma_x$, which means that the form and relative positions of the densities would not change if we changed this ratio; the only effect would be that the scale on the horizontal axis would be different. The number of simulation replications is indicated by R in the diagrams. Diagram 3 considers the unconditional case, so in all replications new vectors for the three series (2.1) have been drawn. The IV density is dashed (red) and the OLS density is solid (blue). The panels also show the standard asymptotic approximations for IV (dotted, red) and for OLS (dotted, blue). By not fully displaying the mode of all distributions we get a clearer picture of the tails. Even at $\rho_{xz} = .4$, where PCP = 19, the asymptotic approximation for IV is not very accurate, especially for high $\rho_{x\varepsilon}$. Although the limiting distribution is invariant with respect to $\rho_{x\varepsilon}$, this is clearly not the case for the finite sample distribution of IV. The asymptotic

approximation for the OLS distribution is so accurate that in all these drawings it collapses with the empirical distribution. For $\rho_{xz} = .1$ the standard IV approximation starts to get really bad, but it is much worse for even smaller values of $\rho_{xz}$ (not depicted). For $\rho_{x\varepsilon}$ away from zero OLS is clearly biased, but so is IV when the instrument is weak. In most of the panels OLS seems preferable to IV, because most of its actual probability mass is much closer to the true value of zero than is the case for IV. Diagram 4 presents results in which the $z_i$ series has been kept fixed. Both IV and OLS are plotted for 6 different arbitrary realizations of $z_i$ (in the diagram indicated by "6 raw"). For OLS the densities almost coincide, except for the stronger instrument when simultaneity is serious. For IV the effects of conditioning are more pronounced, especially for the stronger instrument. In Diagram 5 the same 6 realizations of the $z_i$ series have been used after stylizing them by normalizing them such that their first two sample moments correspond to the population moments (indicated by "6 stylized"). Now all 6 curves almost coincide, and apparently higher order sample moments not fully corresponding with their population counterparts have hardly any effect. In Diagram 6 we conditioned on both $z_i$ and $v_i$, again using 6 arbitrary realizations. Note that although $E(z_iv_i) = 0$ the sample correlation coefficient $r_{zv}$ may deviate from zero. The effects of this type of conditioning are much more pronounced now, just a little for OLS, but extremely so for IV in the weak instrument case. Then, the empirical distribution of IV demonstrates characteristics completely different from its standard asymptotic approximation; especially for low $\rho_{xz}$ and high $\rho_{x\varepsilon}$ bimodality is manifest, with also a region of zero density between the two modes.
Stylizing the $z_i$ and $v_i$ series (also making sure that their sample covariance is zero) makes the curves coincide again in Diagram 7, but shows bimodality again in the case where the instrument is weak and the simultaneity serious. From all these densities it seems obvious that the distribution of IV gets more problematic under conditioning, and that it seems worthwhile to develop a method such that OLS could be employed for inference. That this conclusion does not just pertain to very small samples can be learned from Diagram 8, where $n = 1000$ and $\rho_{xz} = .1$, giving rise to PCP just above 10. The top four diagrams are unconditional, whereas the bottom four diagrams present results conditional on $\bar x_i$. In most cases all realizations of the OLS estimator are much closer to the true value than most of those obtained by IV. Especially if OLS could be corrected for bias (without a major increase in variance) it would provide a welcome alternative to IV, not only when the sample is small, and even when the instrument is not all that weak.
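The simulation design of this section can be sketched as follows: the unrestrained scheme redraws all three series per replication, while the restrained scheme fixes one realization of $z_i$. Parameter values are illustrative and medians are used because just-identified IV has no finite moments:

```python
import numpy as np

# Sketch of the sampling schemes: IV and OLS draws with fresh z each
# replication (unconditional) versus one fixed z realization (conditional).
rng = np.random.default_rng(1)
beta, rho_xe, rho_xz, sig_x, sig_eps = 0.0, 0.5, 0.4, 3.0, 1.0
n, R = 100, 5000

gamma = rho_xe * sig_x / sig_eps
pi = rho_xz * sig_x                  # with sig_z = 1
sig_v = np.sqrt(sig_x**2 * (1 - rho_xe**2 - rho_xz**2))

def draw(z):
    eps = rng.normal(0, sig_eps, (R, n))
    v = rng.normal(0, sig_v, (R, n))
    x = pi * z + v + gamma * eps     # z broadcasts whether shape (R, n) or (n,)
    y = beta * x + eps
    b_ols = (x * y).sum(1) / (x * x).sum(1)
    b_iv = (z * y).sum(1) / (z * x).sum(1)
    return b_ols, b_iv

b_ols_u, b_iv_u = draw(rng.normal(0, 1.0, (R, n)))   # unrestrained scheme
b_ols_c, b_iv_c = draw(rng.normal(0, 1.0, n))        # restrained: one z draw
print(np.median(b_ols_u), np.median(b_iv_u))  # OLS near rho_xe*sig_eps/sig_x, IV near beta
```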

5. Robust OLS inference under simultaneity

In Kiviet and Niemczyk (2010) an attempt was made to design inference on $\beta$ based on a bias corrected OLS estimator and an assumption regarding $\rho_{x\varepsilon}$. An infeasible version, which also used the true values of $\sigma_x$ and $\sigma_\varepsilon$, worked well, which is promising. However, after replacing $\sigma_x$ and $\sigma_\varepsilon$ by sample equivalents it became clear that such a substitution at the same time alters the limiting distribution of the bias corrected estimator, so that further refinements of the asymptotic approximations are called for.

5.1. The limiting distribution of bias corrected OLS

We shall derive the appropriate limiting distribution in the hope that it will lead to an asymptotic approximation that is still reasonably accurate in finite samples. We will focus on

the unrestrained sampling scheme, because we found that the variance of OLS has a tendency to decrease by conditioning on exogenous regressors. Therefore, any resulting unconditional (regarding the exogenous variables) confidence sets are expected to have larger coverage probability under conditioning. Moreover, when we compare unconditional robust OLS inference with unconditional IV inference and the former is found to provide a useful alternative, then we know for sure that this will certainly be the case under a restrained sampling scheme, because we have seen that conditioning worsens the situation for IV and improves it for OLS. So, our starting point is (3.10). This suggests the unfeasible bias corrected OLS-based estimator $\hat\beta_{OLS}(\rho_{x\varepsilon}, \sigma_\varepsilon/\sigma_x) \equiv \hat\beta_{OLS} - \rho_{x\varepsilon}\sigma_\varepsilon/\sigma_x$ for $\beta$, with limiting distribution

$$n^{1/2}\big(\hat\beta_{OLS}(\rho_{x\varepsilon}, \sigma_\varepsilon/\sigma_x) - \beta\big) \xrightarrow{\,d\,} N\!\left(0,\ \frac{\sigma_\varepsilon^2(1-\rho_{x\varepsilon}^2)}{\sigma_x^2}\right). \qquad (5.1)$$

For $\hat\sigma_\varepsilon^2 = n^{-1}\sum_{i=1}^n (y_i - x_i\hat\beta_{OLS})^2$ we find

$$\operatorname{plim}\hat\sigma_\varepsilon^2 = \lim n^{-1}\sum_{i=1}^n E(\varepsilon_i^2) - \frac{\big[\lim n^{-1}\sum_{i=1}^n E(x_i\varepsilon_i)\big]^2}{\lim n^{-1}\sum_{i=1}^n E(x_i^2)} = \sigma_\varepsilon^2 - \frac{\gamma^2\sigma_\varepsilon^4}{\sigma_x^2} = \sigma_\varepsilon^2(1 - \rho_{x\varepsilon}^2). \qquad (5.2)$$

So, if $\rho_{x\varepsilon}$ were known, $\hat\sigma_\varepsilon^2/(1-\rho_{x\varepsilon}^2)$ establishes a consistent estimator of $\sigma_\varepsilon^2$, and likewise

$$\hat\beta_{KLS}(\rho_{x\varepsilon}) \equiv \hat\beta_{OLS} - \rho_{x\varepsilon}\left(\frac{\hat\sigma_\varepsilon^2/(1-\rho_{x\varepsilon}^2)}{n^{-1}\sum_{i=1}^n x_i^2}\right)^{1/2} \qquad (5.3)$$

for $\beta$. This latter estimator, nicknamed KLS (kinky least squares), is at first sight unfeasible. Below, though, we shall demonstrate how it can produce operational inference. The limiting distribution of $\hat\beta_{KLS}(\rho_{x\varepsilon})$ does not coincide with that of $\hat\beta_{OLS}(\rho_{x\varepsilon}, \sigma_\varepsilon/\sigma_x)$ given in (5.1), due to the randomness of the correction term. To derive it, we substitute (3.2) and (5.2) and obtain (omitting from now on in the notation the dependence of $\hat\beta_{KLS}$ on $\rho_{x\varepsilon}$)

$$\hat\beta_{KLS} - \beta = \frac{\sum_{i=1}^n x_i\varepsilon_i}{\sum_{i=1}^n x_i^2} - \frac{\rho_{x\varepsilon}}{(1-\rho_{x\varepsilon}^2)^{1/2}}\left(\frac{\sum_{i=1}^n \varepsilon_i^2 - \big(\sum_{i=1}^n x_i\varepsilon_i\big)^2\big/\sum_{i=1}^n x_i^2}{\sum_{i=1}^n x_i^2}\right)^{1/2}. \qquad (5.4)$$

This has now to be linearized and scaled in order that we can invoke a CLT to its leading terms. We consider first the major components of (5.4) separately. From our treatment of (3.8) we have

$$\frac{\sum_{i=1}^n x_i\varepsilon_i}{\sum_{i=1}^n x_i^2} = \frac{\gamma\sigma_\varepsilon^2}{\sigma_x^2} + \frac{1}{n\sigma_x^2}\sum_{i=1}^n u_i + o_p(n^{-1/2}), \qquad (5.5)$$

with $u_i$ as defined in (3.9). In a similar way, employing Lemma 1, we obtain

$$\frac{\sum_{i=1}^n \varepsilon_i^2}{\sum_{i=1}^n x_i^2} = \frac{\sigma_\varepsilon^2}{\sigma_x^2} + \frac{1}{n\sigma_x^2}\sum_{i=1}^n \Big[(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\sigma_\varepsilon^2}{\sigma_x^2}(x_i^2 - \sigma_x^2)\Big] + o_p(n^{-1/2}),$$

and from (5.5)

$$\left(\frac{\sum_{i=1}^n x_i\varepsilon_i}{\sum_{i=1}^n x_i^2}\right)^2 = \frac{\gamma^2\sigma_\varepsilon^4}{\sigma_x^4} + \frac{2\gamma\sigma_\varepsilon^2}{n\sigma_x^4}\sum_{i=1}^n u_i + o_p(n^{-1/2}).$$

Combining the latter two results yields

$$\frac{\sum_{i=1}^n \varepsilon_i^2}{\sum_{i=1}^n x_i^2} - \left(\frac{\sum_{i=1}^n x_i\varepsilon_i}{\sum_{i=1}^n x_i^2}\right)^2 = \frac{\sigma_\varepsilon^2}{\sigma_x^2}(1-\rho_{x\varepsilon}^2) + \frac{1}{n\sigma_x^2}\sum_{i=1}^n \Big[(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\sigma_\varepsilon^2}{\sigma_x^2}(x_i^2 - \sigma_x^2) - \frac{2\gamma\sigma_\varepsilon^2}{\sigma_x^2}u_i\Big] + o_p(n^{-1/2}).$$

This has the form $v = \theta + w + o_p(n^{-1/2})$, where $\theta = O(1)$ with $w$ containing three terms that are all $O_p(n^{-1/2})$. One can easily obtain by a Taylor expansion (and check by taking the square) that $v^{1/2} = \theta^{1/2} + \frac12\theta^{-1/2}w + o_p(n^{-1/2})$. Hence, with $\theta = (\sigma_\varepsilon^2/\sigma_x^2)(1-\rho_{x\varepsilon}^2)$,

$$\left(\frac{\sum \varepsilon_i^2}{\sum x_i^2} - \Big(\frac{\sum x_i\varepsilon_i}{\sum x_i^2}\Big)^2\right)^{1/2} = \frac{\sigma_\varepsilon(1-\rho_{x\varepsilon}^2)^{1/2}}{\sigma_x} + \frac{1}{2n\sigma_x\sigma_\varepsilon(1-\rho_{x\varepsilon}^2)^{1/2}}\sum_{i=1}^n \Big[(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\sigma_\varepsilon^2}{\sigma_x^2}(x_i^2 - \sigma_x^2) - \frac{2\gamma\sigma_\varepsilon^2}{\sigma_x^2}u_i\Big] + o_p(n^{-1/2}).$$

Collecting the various components of (5.4) now yields

$$\begin{aligned}
n^{1/2}(\hat\beta_{KLS} - \beta) &= \frac{1}{n^{1/2}\sigma_x^2}\sum_{i=1}^n u_i - \frac{\rho_{x\varepsilon}}{2n^{1/2}\sigma_x\sigma_\varepsilon(1-\rho_{x\varepsilon}^2)}\sum_{i=1}^n \Big[(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\sigma_\varepsilon^2}{\sigma_x^2}(x_i^2 - \sigma_x^2) - \frac{2\gamma\sigma_\varepsilon^2}{\sigma_x^2}u_i\Big] + o_p(1)\\
&= \frac{1}{n^{1/2}\sigma_x^2(1-\rho_{x\varepsilon}^2)}\sum_{i=1}^n \Big\{u_i - \frac{\rho_{x\varepsilon}\sigma_x}{2\sigma_\varepsilon}\Big[(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\sigma_\varepsilon^2}{\sigma_x^2}(x_i^2 - \sigma_x^2)\Big]\Big\} + o_p(1),
\end{aligned}$$

from which we should derive the variance of its $O_p(1)$ terms. Substituting (3.9), and noting that $\bar x_i\varepsilon_i + \gamma(\varepsilon_i^2 - \sigma_\varepsilon^2) = x_i\varepsilon_i - \gamma\sigma_\varepsilon^2$ while $(\bar x_i^2 - \sigma_{\bar x}^2) + 2\gamma\bar x_i\varepsilon_i + \gamma^2(\varepsilon_i^2 - \sigma_\varepsilon^2) = x_i^2 - \sigma_x^2$, so that $u_i = (x_i\varepsilon_i - \gamma\sigma_\varepsilon^2) - (\rho_{x\varepsilon}\sigma_\varepsilon/\sigma_x)(x_i^2 - \sigma_x^2)$, we find

$$\begin{aligned}
\text{Var}\Big\{u_i - \frac{\rho_{x\varepsilon}\sigma_x}{2\sigma_\varepsilon}\Big[(\varepsilon_i^2 - \sigma_\varepsilon^2) - \frac{\sigma_\varepsilon^2}{\sigma_x^2}(x_i^2 - \sigma_x^2)\Big]\Big\}
&= \text{Var}\Big\{(x_i\varepsilon_i - \gamma\sigma_\varepsilon^2) - \frac{\rho_{x\varepsilon}\sigma_\varepsilon}{2\sigma_x}(x_i^2 - \sigma_x^2) - \frac{\rho_{x\varepsilon}\sigma_x}{2\sigma_\varepsilon}(\varepsilon_i^2 - \sigma_\varepsilon^2)\Big\}\\
&= \sigma_x^2\sigma_\varepsilon^2\big[1 + \rho_{x\varepsilon}^2 + \tfrac12\rho_{x\varepsilon}^2 + \tfrac12\rho_{x\varepsilon}^2 - 2\rho_{x\varepsilon}^2 - 2\rho_{x\varepsilon}^2 + \rho_{x\varepsilon}^4\big]\\
&= \sigma_x^2\sigma_\varepsilon^2(1 - \rho_{x\varepsilon}^2)^2,
\end{aligned}$$

where, under the maintained zero skewness and zero excess kurtosis, we used $\text{Var}(x_i\varepsilon_i) = \sigma_x^2\sigma_\varepsilon^2(1+\rho_{x\varepsilon}^2)$, $\text{Var}(x_i^2) = 2\sigma_x^4$, $\text{Var}(\varepsilon_i^2) = 2\sigma_\varepsilon^4$, $\text{Cov}(x_i\varepsilon_i, x_i^2) = 2\rho_{x\varepsilon}\sigma_x^3\sigma_\varepsilon$, $\text{Cov}(x_i\varepsilon_i, \varepsilon_i^2) = 2\rho_{x\varepsilon}\sigma_x\sigma_\varepsilon^3$ and $\text{Cov}(x_i^2, \varepsilon_i^2) = 2\rho_{x\varepsilon}^2\sigma_x^2\sigma_\varepsilon^2$. Dividing by $\sigma_x^4(1-\rho_{x\varepsilon}^2)^2$ and invoking a standard CLT yields the amazingly simple result

$$n^{1/2}(\hat\beta_{KLS} - \beta) \xrightarrow{\,d\,} N\!\left(0,\ \frac{\sigma_\varepsilon^2}{\sigma_x^2}\right). \qquad (5.6)$$

Hence, remarkably, the standard OLS estimator, which has asymptotic variance $\sigma_\varepsilon^2/\sigma_x^2$ when $\rho_{x\varepsilon} = 0$, and which when $\rho_{x\varepsilon} \neq 0$ has inconsistency $\rho_{x\varepsilon}\sigma_\varepsilon/\sigma_x$ and under normality of both $\varepsilon$ and $x$ unconditional asymptotic variance $\sigma_\varepsilon^2(1-\rho_{x\varepsilon}^2)/\sigma_x^2$, has again asymptotic variance $\sigma_\varepsilon^2/\sigma_x^2$ after one has subtracted from the OLS estimator an OLS-based estimate of its inconsistency. However, to estimate the variance of $\hat\beta_{KLS}$ one should not use the standard expression $\widehat{\text{Var}}(\hat\beta_{OLS}) = \hat\sigma_\varepsilon^2/\sum_{i=1}^n x_i^2$, but

$$\widehat{\text{Var}}(\hat\beta_{KLS}) = \frac{\hat\sigma_\varepsilon^2}{(1-\rho_{x\varepsilon}^2)\sum_{i=1}^n x_i^2} = \frac{1}{1-\rho_{x\varepsilon}^2}\,\widehat{\text{Var}}(\hat\beta_{OLS}), \qquad (5.7)$$

because then $\operatorname{plim} n\widehat{\text{Var}}(\hat\beta_{KLS}) = \sigma_\varepsilon^2/\sigma_x^2$, as it should.
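A minimal implementation of the KLS point estimate (5.3) and variance estimate (5.7) from plain OLS quantities can be sketched as follows; the check on simulated data assumes joint normality and illustrative parameter values:

```python
import numpy as np

# KLS point and variance estimates, following (5.3) and (5.7):
# inputs are the data plus a postulated value of rho_xe.
def kls(x, y, rho_xe):
    x = np.asarray(x, float); y = np.asarray(y, float)
    n = x.size
    sxx = (x * x).sum()
    b_ols = (x * y).sum() / sxx
    s2_eps_ols = ((y - b_ols * x) ** 2).sum() / n        # plim sig_eps^2*(1-rho^2), cf. (5.2)
    sig_eps_hat = np.sqrt(s2_eps_ols / (1.0 - rho_xe**2))
    sig_x_hat = np.sqrt(sxx / n)
    b_kls = b_ols - rho_xe * sig_eps_hat / sig_x_hat     # (5.3)
    var_kls = s2_eps_ols / ((1.0 - rho_xe**2) * sxx)     # (5.7)
    return b_kls, var_kls

# quick check on simulated data with known rho_xe
rng = np.random.default_rng(2)
n, beta, rho = 100000, 1.0, 0.4
eps = rng.normal(0, 1.0, n)
x = rng.normal(0, np.sqrt(1 - rho**2) * 3.0, n) + rho * 3.0 * eps   # sig_x = 3
y = beta * x + eps
b, v = kls(x, y, rho)
print(b)   # close to beta = 1.0
```

At $\rho_{x\varepsilon} = 0$ the estimator reduces to plain OLS, and $n\,\widehat{\text{Var}}$ should be close to $\sigma_\varepsilon^2/\sigma_x^2 = 1/9$ here.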

5.2. Simulation results on KLS-based inference

In a Monte Carlo study based on 100,000 replications we have examined the bias of both $\hat\beta_{OLS}$ and $\hat\beta_{KLS}$, the true value (i.e. the Monte Carlo estimate) of $\text{Var}(\hat\beta_{KLS})$ and the expectation of its empirical estimate $\widehat{\text{Var}}(\hat\beta_{KLS})$. And finally we have examined the coverage probability of an asymptotic KLS based confidence interval with nominal confidence coefficient of 95%. Hence, we analyzed the frequency over the Monte Carlo replications by which the true value of $\beta$ was covered by the interval

$$\big[\hat\beta_{KLS} - 1.96\,\widehat{SD}(\hat\beta_{KLS}),\ \hat\beta_{KLS} + 1.96\,\widehat{SD}(\hat\beta_{KLS})\big], \qquad (5.8)$$

where both $\widehat{SD}(\hat\beta_{KLS}) \equiv (\widehat{\text{Var}}(\hat\beta_{KLS}))^{1/2} = (1-\rho_{x\varepsilon}^2)^{-1/2}\widehat{SD}(\hat\beta_{OLS})$ and $\hat\beta_{KLS}$ are calculated on the basis of the true value of $\rho_{x\varepsilon}$. Since the simulation results in Table 1 are purely OLS based, they do not require the availability of an instrumental variable, and thus instrument strength is irrelevant. No conditioning on exogenous variables occurred and all results are invariant regarding the true values of $\beta$ and $\sigma_\varepsilon$. Just $n$, $\rho_{x\varepsilon}$ and $\sigma_x/\sigma_\varepsilon$ matter, but the latter is found (and can be shown) not to affect the coverage probability of KLS confidence intervals. The bias of OLS is found to be (almost) invariant with respect to $n$, and to be extremely close to the inconsistency, which predicts it to increase with (the absolute value of) $\rho_{x\varepsilon}$ and to be inversely related to $\sigma_x/\sigma_\varepsilon$. The KLS coefficient estimator, which is based on the true value of $\rho_{x\varepsilon}$, is almost unbiased, and variance estimator (5.7) is found to have a minor negative bias for the true variance. The actual coverage probability of the KLS confidence intervals is remarkably close to the nominal value of 95%. These estimates have a Monte Carlo standard error of about .0007, hence we establish a significant slight under-coverage over all the design parameter values examined, also when $\rho_{x\varepsilon} = 0$. The latter is due to using a critical value from the normal and not from the Student distribution. As far as the slight under-coverage is due to inaccuracies for small $n$ in the OLS bias approximations, this could possibly be repaired by employing higher-order approximations as in Kiviet and Phillips (1996). The appearance of the factor $1-\rho_{x\varepsilon}^2$ in the denominator of (5.7) indicates that $\rho_{x\varepsilon}$ values really close to unity will be problematic.

Table 1  Qualities of KLS-based inference

    σx/σε   n    ρxε   E(β̂OLS)   E(β̂KLS)   Var(β̂KLS)   E[V̂ar(β̂KLS)]   Cov.Prob.
      3     30   .0    -.000     -.000     .00395      .00397         .9405
      3     30   .3     .100     -.002     .00396      .00397         .9409
      3     30   .6     .200     -.003     .00401      .00397         .9410
      3     30   .9     .300     -.005     .00408      .00397         .9401
      3    300   .0     .000      .000     .000372     .000373        .9487
      3    300   .3     .100     -.000     .000374     .000373        .9481
      3    300   .6     .200     -.000     .000375     .000373        .9479
      3    300   .9     .300     -.000     .000377     .000373        .9483
     10     30   .3     .030     -.001     .000356     .000358        .9409
     10     30   .9     .090     -.002     .000367     .000357        .9401
     10    300   .3     .030     -.000     .000034     .000034        .9481
     10    300   .9     .090     -.000     .000034     .000034        .9483

Intervals like (5.8) can straightforwardly be based on standard OLS results, because the interval amounts to

$$\Big[\hat\beta_{OLS} + \xi^{L,KLS}_{\alpha/2}(n, \rho_{x\varepsilon})\,\widehat{SD}(\hat\beta_{OLS}),\ \hat\beta_{OLS} + \xi^{R,KLS}_{1-\alpha/2}(n, \rho_{x\varepsilon})\,\widehat{SD}(\hat\beta_{OLS})\Big], \qquad (5.9)$$

with $\zeta_p$ the $p$-th quantile of the standard normal distribution and

$$\left.\begin{aligned}
\xi^{L,KLS}_{\alpha/2}(n, \rho_{x\varepsilon}) &= \frac{\zeta_{\alpha/2} - n^{1/2}\rho_{x\varepsilon}}{(1-\rho_{x\varepsilon}^2)^{1/2}}\\[4pt]
\xi^{R,KLS}_{1-\alpha/2}(n, \rho_{x\varepsilon}) &= \frac{\zeta_{1-\alpha/2} - n^{1/2}\rho_{x\varepsilon}}{(1-\rho_{x\varepsilon}^2)^{1/2}}
\end{aligned}\ \right\} \qquad (5.10)$$
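Interval (5.9), and its conservative union over an interval assumption on $\rho_{x\varepsilon}$, can be sketched as follows from OLS output alone (the numerical example uses the schooling figures discussed in Section 5.4):

```python
from statistics import NormalDist

# KLS interval via (5.9)-(5.10), built from standard OLS output, plus the
# conservative union over a postulated range rho_xe in [rL, rR].
def kls_interval(b_ols, sd_ols, n, rho_xe, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)
    s = (1.0 - rho_xe**2) ** 0.5
    lo = b_ols + (-z - n**0.5 * rho_xe) / s * sd_ols   # xi^L in (5.10)
    hi = b_ols + ( z - n**0.5 * rho_xe) / s * sd_ols   # xi^R in (5.10)
    return lo, hi

def kls_union_interval(b_ols, sd_ols, n, rL, rR, alpha=0.05, grid=201):
    rhos = [rL + (rR - rL) * i / (grid - 1) for i in range(grid)]
    ends = [kls_interval(b_ols, sd_ols, n, r, alpha) for r in rhos]
    return min(e[0] for e in ends), max(e[1] for e in ends)

# with b_ols = .0673, sd_ols = .0003, n = 329500 and rho_xe = -.2
print(kls_interval(0.0673, 0.0003, 329500, -0.2))   # close to (.1019, .1031)
```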

It can deviate substantially from a standard OLS interval. Note that the factors (5.10) can both be either positive or negative. So, its location with respect to the standard OLS interval may shift, due to the bias correction. Moreover, its width is multiplied by the factor $(1-\rho_{x\varepsilon}^2)^{-1/2}$ and hence bulges up for $\rho_{x\varepsilon}^2$ close to unity. If one wants to base inference on an interval assumption regarding $\rho_{x\varepsilon}$, say $\rho_{x\varepsilon} \in [\rho^L_{x\varepsilon}, \rho^R_{x\varepsilon}]$, then an asymptotically conservative $(1-\alpha)100\%$ confidence interval is obtained by taking the union of all intervals for $\rho_{x\varepsilon}$ in that interval. Diagrams 9 and 10 provide an impression of what such intervals might achieve in comparison to those provided by IV under ideal circumstances, namely based on the actual finite sample distribution of IV, which in practice is very hard to approximate, whereas we have seen that this works fine for the KLS estimator. Diagram 9 replicates the results of Diagram 3 for the finite sample distributions of IV and OLS, but in addition it shows the densities of KLS when

based on an (in)correct assumption regarding $\rho_{x\varepsilon}$. The blue curve uses $\rho_{x\varepsilon} = 0$, so here KLS simplifies to OLS. The other densities use the assumptions $\rho_{x\varepsilon} = -.9$ (red), $-.6$ (green), $-.3$ (cyan), $.3$ (magenta), $.6$ (yellow), $.9$ (black). Hence, in each panel at most one KLS curve uses a $\rho_{x\varepsilon}$ value very close to its true value (which is mentioned in the panel). The superiority of KLS when employing a $\rho_{x\varepsilon}$ value that is wrong by a margin of about 0.3 is apparent, because it yields a probability mass much closer to the true value of zero than IV, especially when the instrument is weak. In Diagram 10 densities of the same estimators are presented but now for $n = 1000$. For a not very weak instrument and a large sample IV may do better, but for cases where PCP is smaller than or around 10 the instrument-free KLS inference seems a very welcome alternative. This will be even more the case in practice, when the available (weak) instrument could actually be endogenous.

5.3. Comparison between the qualities of alternative confidence sets

The reputation of any inference technique should be based on the following four criteria: (a) is it based on credible assumptions; (b) does it fulfill (also in finite samples) its promises regarding its claimed accuracy; (c) can it boast other credentials, such as robustness; and (d) is its inference more efficient than that produced by competing techniques. As a rule, in most contexts there does not exist a single technique which uniformly dominates all others under all relevant circumstances. For the DGP examined above we will now compare the properties of various confidence sets on the scalar parameter $\beta$. We will compare KLS-based confidence sets with IV-based confidence sets, where the latter are constructed from either the standard asymptotic Wald (W) test statistic or by inversion of the Anderson-Rubin (AR) statistic.
The W procedure is known to be defective when instruments are weak, whereas the AR procedure is known to be robust under the null when instruments are weak and relatively simple to use. Mikusheva (2010) claims that it performs reasonably well amongst other techniques which focus on invariance of the asymptotic null distribution with respect to weakness of instruments. However, in the present just identified case these other (LM and LR based) techniques simplify to the AR procedure. In the present context, the Wald statistic for $H_0: \beta = \beta_0$ is given by

$$W(\beta_0) = \left(\frac{\hat\beta_{IV} - \beta_0}{\widehat{SD}(\hat\beta_{IV})}\right)^2, \quad\text{with}\quad \widehat{SD}(\hat\beta_{IV}) = \left(\frac{(y - x\hat\beta_{IV})'(y - x\hat\beta_{IV})/(n-1)}{x'z(z'z)^{-1}z'x}\right)^{1/2}, \qquad (5.11)$$

where $y$, $x$ and $z$ are $n \times 1$ vectors containing all the sample data. For nominal $(1-\alpha)100\%$ confidence coefficient, it yields confidence interval

$$\Big[\hat\beta_{IV} + \zeta_{\alpha/2}\,\widehat{SD}(\hat\beta_{IV}),\ \hat\beta_{IV} + \zeta_{1-\alpha/2}\,\widehat{SD}(\hat\beta_{IV})\Big]. \qquad (5.12)$$

The AR test of the same hypothesis is performed by substituting the reduced form equation in the structural form relationship and next testing the significance of the slope in the equation $y - x\beta_0 = z\pi(\beta - \beta_0) + \varepsilon + (\beta - \beta_0)v$. Under the null the test statistic

$$AR(\beta_0) = (n-1)\,\frac{(y - x\beta_0)'z(z'z)^{-1}z'(y - x\beta_0)}{(y - x\beta_0)'[I - z(z'z)^{-1}z'](y - x\beta_0)} \qquad (5.13)$$

is asymptotically distributed as $\chi^2(1)$ and under normality exactly as $F(1, n-1)$. It implies the confidence interval

$$C_{AR}(\alpha) = \big\{\beta_0 : AR(\beta_0) < F_\alpha(1, n-1)\big\}. \qquad (5.14)$$

This can be established by solving $a\beta_0^2 + b\beta_0 + c \geq 0$, where $a = x'Ax$, $b = -2x'Ay$ and $c = y'Ay$, with $A = I - d\,z(z'z)^{-1}z'$ and $d = 1 + (n-1)/F_\alpha(1, n-1)$. If $\Delta = b^2 - 4ac \geq 0$ then an interval follows easily provided $a \leq 0$; when $a > 0$, though, the confidence set consists of the whole real line, except for a finite interval, and thus has infinite length. When $\Delta < 0$ the set is empty for $a < 0$ and equals the full real line when $a > 0$ (note that $\Delta < 0$ and $a = 0$ cannot occur). These anomalies, if they really do occur, would be the result of uncomfortable behavior of the AR statistic under the alternative hypothesis, implying either unit rejection probability for all alternatives (CI empty), or zero rejection probability for either all alternatives (CI is the full real line) or for just a closed set of alternatives (CI is the real line, except for a particular interval). In the simulations to follow, we will monitor the occurrence of these anomalies. Since we will focus on the width of the interval, we will examine the $\Delta \geq 0$ and $a \leq 0$ cases exclusively, in order to exclude non-existing intervals and intervals of infinite length. For those with finite length we will report the median width, because the occurrence of outliers clearly indicates that for both W and AR the random variable width has no finite moments. Regarding the KLS-based intervals we will present results for three intervals $[\rho^L_{x\varepsilon}, \rho^U_{x\varepsilon}]$, namely: (A) $\rho^L_{x\varepsilon} = \rho^U_{x\varepsilon} = \rho_{x\varepsilon}$; (B) $\rho^L_{x\varepsilon} = \rho_{x\varepsilon} - .2$, $\rho^U_{x\varepsilon} = \rho_{x\varepsilon} + .2$; and (C) $\rho^L_{x\varepsilon} = 0$, $\rho^U_{x\varepsilon} = .5$. Of course, in practice it will be unrealistic that one could ever attain (A), and although (B) is more realistic, in practice one might occasionally specify an interval that excludes the true value. This is possible with interval (C), which simply states that any simultaneity will be non-negative and not exceeding .5. We ran 100,000 replications for a few combinations of $\rho_{x\varepsilon}$ and $\rho_{xz}$ values for $n = 100$, giving rise to PCP values equal to 56.25, 4.16 and 1.01 respectively.
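The AR inversion just described, including its anomalous cases, can be sketched as follows; `F_crit` must be supplied by the user (about 3.84 at $\alpha = .05$ for large $n$), and the demo data are illustrative:

```python
import numpy as np

# Inversion of the AR statistic via the quadratic a*b0^2 + b*b0 + c >= 0
# with A = I - d*P_z and d = 1 + (n-1)/F_crit; a sketch, not the paper's code.
def ar_confidence_set(y, x, z, F_crit):
    n = y.size
    Pz = np.outer(z, z) / (z @ z)              # projection on the single instrument
    d = 1.0 + (n - 1) / F_crit
    A = np.eye(n) - d * Pz
    a, b, c = x @ A @ x, -2.0 * (x @ A @ y), y @ A @ y
    disc = b * b - 4.0 * a * c
    if disc >= 0.0:
        roots = sorted([(-b - disc**0.5) / (2.0 * a), (-b + disc**0.5) / (2.0 * a)])
        if a < 0.0:
            return ("finite interval", roots[0], roots[1])
        return ("real line except finite interval", roots[0], roots[1])
    return ("empty set",) if a < 0.0 else ("full real line",)

# demo with a strong instrument and no simultaneity
rng = np.random.default_rng(3)
n = 200
z = rng.normal(size=n); v = rng.normal(size=n); eps = rng.normal(size=n)
x = 2.0 * z + v
y = x + eps                                    # true beta = 1
result = ar_confidence_set(y, x, z, 3.89)
print(result[0])                               # a strong instrument yields a finite interval
```

Since $AR(\hat\beta_{IV}) = 0$, the IV point estimate always lies inside a non-empty AR set.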

Table 2  Actual properties of various confidence intervals; σx/σε = 10; n = 100; α = 0.05
CP = coverage prob.; MW = median width; PF = prob. of finite width interval

                    W                  AR          KLS(A)         KLS(B)         KLS(C)
    ρxε   ρxz   CP    MW    PF    CP    MW     CP    MW     CP     MW     CP     MW
    .0    .6   .954  .065  1.00  .950  .069   .948  .039   1.000  .081    .973  .100
    .0    .2   .993  .188   .52  .949  .246   .947  .039   1.000  .081    .973  .100
    .0    .1   .998  .305   .17  .949  .349   .947  .039   1.000  .081    .973  .100
    .2    .6   .953  .065  1.00  .950  .069   .947  .039   1.000  .083   1.000  .098
    .2    .2   .985  .186   .52  .950  .244   .947  .039   1.000  .083   1.000  .098
    .2    .1   .994  .300   .17  .944  .339   .947  .039   1.000  .083   1.000  .098
    .4    .6   .949  .065  1.00  .950  .069   .947  .039   1.000  .091    .999  .091
    .4    .2   .958  .179   .52  .951  .239   .947  .039   1.000  .091    .999  .091
    .4    .1   .971  .283   .17  .924  .323   .947  .039   1.000  .091    .999  .091
    .6    .6   .941  .064  1.00  .950  .070   .947  .039   1.000  .115    .656  .080
    .6    .2   .905  .166   .52  .952  .234   .946  .039   1.000  .115    .657  .080
    .6    .1   .903  .252   .17  .895  .293   .946  .039   1.000  .115    .658  .080

Table 2 shows that for a weak instrument the IV-based Wald test can both be conservative (when the simultaneity is moderate) and yield under-coverage (when the simultaneity is more serious). The AR procedure is of little practical use in this model, because it does not improve on the Wald test when the latter works well, and although it does produce intervals with

appropriate coverage when the instrument is weak, much too frequently it does not deliver an interval (of finite length) at all. Moreover, the few intervals that it delivers when $\rho_{xz} = .1$ have larger width on average than those produced by the Wald procedure, even when the latter are conservative. The KLS(A) procedure performs close to perfect. Its coverage is not only very close to 95%, but its performance is also invariant with respect to both $\rho_{x\varepsilon}$ and $\rho_{xz}$, and the width of the interval is 60% (for $\rho_{xz} = .6$) or just about 15% (for $\rho_{xz} = .1$) of the Wald interval. The more realistic KLS(B) intervals are much too conservative, but nevertheless have smaller width than the Wald intervals when the instrument is weak. The same holds for the realistic KLS(C) intervals, provided that the true value of $\rho_{x\varepsilon}$ is in the interval. If this is not the case, then the KLS procedure breaks down.

5.4. Consequences for empirical findings

The application that has undoubtedly received most attention over the last two decades in the debate and research on the effects of weakness and validity of instruments on the coefficient estimate of an endogenous variable is Angrist and Krueger (1991), who analyzed the returns to schooling on log wage. In Donald and Newey (2001) and Flores-Lagunes (2007) many variants of IV based estimates have been obtained. Donald and Newey (2001, p.1178) also present the OLS results for the coefficient estimate for schooling, being .0673 with a standard error equal to .0003. This has been obtained from a sample of size 329,500. They also indicate sample second moments of the structural and reduced form disturbances which suggest estimates of the simultaneity coefficient equal to -.127, -.192 and -.204 according to 2SLS, LIML and B2SLS respectively. However, when the IV estimates are biased or even inconsistent due to the use of invalid or weak instruments these assessments may be misleading.
From the OLS estimates we can deduce for different assumptions on $\rho_{x\varepsilon}$ the KLS inferences collected in Table 3.

Table 3  KLS estimates and confidence intervals for the effect of schooling

    ρxε    β̂KLS     SD̂(β̂KLS)   95% CI
    -.5    .1667     .0003      (.1660, .1674)
    -.4    .1425     .0003      (.1418, .1431)
    -.3    .1215     .0003      (.1208, .1221)
    -.2    .1025     .0003      (.1019, .1031)
    -.1    .0846     .0003      (.0840, .0852)
     0     .0673     .0003      (.0667, .0679)
     .1    .0500     .0003      (.0494, .0506)
     .2    .0321     .0003      (.0315, .0327)
     .3    .0131     .0003      (.0125, .0138)
     .4   -.0079     .0003      (-.0085, -.0072)
     .5   -.0321     .0003      (-.0328, -.0314)

Note that unlike the IV results the KLS inferences are not affected by the quality or validity of any of the external instrumental variables. However, like the IV results, they assume that apart from the schooling variable all regressors are exogenous. We find that if $\rho_{x\varepsilon}$ were indeed close to -.2, KLS infers that the effect of schooling is very close to .10. Most techniques examined in Flores-Lagunes (2007) yield confidence intervals of about [.08, .13]. From KLS one

can deduce that such values are plausible only if x" is in the range (-.35,-.08). If x" is in fact mildly positive (as many have argued on theoretical grounds), then the e¤ect of schooling on wage is actually much smaller than .10. Of course, KLS may be inappropriate too here, because of remaining model misspecifcation.
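The mapping from an assumed ρ_xε to the Table 3 estimates can be made concrete. In the simple model, plim b_OLS = β + ρ_xε·σ_ε/σ_x and plim s² = σ_ε²(1 − ρ_xε²), which together yield the closed form sketched below. This is our reconstruction, not code from the paper, and the ratio s/sd(x) ≈ .1723 is an assumption backed out of Table 3; with it the sketch reproduces the tabulated estimates to within about one unit in the fourth decimal.

```python
# Sketch of a KLS-style bias-corrected OLS estimate for an assumed
# simultaneity correlation rho = corr(x, eps). Reconstruction from
# plim b_OLS = beta + rho*sigma_eps/sigma_x and
# plim s^2 = sigma_eps^2*(1 - rho^2); the ratio s/sd(x) below is
# inferred from Table 3, not reported directly in the text.
from math import sqrt

B_OLS = 0.0673       # OLS schooling coefficient (Donald and Newey, 2001)
S_OVER_SX = 0.1723   # assumed residual s.d. over sd(x), implied by Table 3

def kls(rho):
    """Bias-corrected OLS estimate under assumed simultaneity rho."""
    return B_OLS - rho * S_OVER_SX / sqrt(1.0 - rho**2)

for rho in (-0.5, -0.2, 0.0, 0.3):
    print(rho, round(kls(rho), 4))  # close to the Table 3 column of estimates
```

Note that at ρ_xε = 0 the correction vanishes and the plain OLS estimate .0673 is returned, as in the middle row of the table.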

6. Conclusions

It is well known that the actual distribution of IV estimators becomes rather anomalous when based on weak instruments. This paper shows that another factor causing deviations of the finite sample distribution from its standard asymptotic approximation is the sampling scheme. Conditioning on exogenous regressors aggravates the anomalies of IV. It leads more often to the occurrence of bimodality, and occasionally to an area close to the median of the distribution where the density is zero. On the other hand, the distribution of the OLS estimator in simultaneous equations is much smoother, always unimodal and much less dispersed than that of IV, and it is much less affected by the sampling scheme. Its major problem is its inconsistency. We show that it is relatively easy to correct the OLS estimator for bias and render it consistent. According to the established view, however, such a corrected estimator is infeasible, because it is based on an unknown parameter, namely the degree of simultaneity. Of course, to a large extent the same criticism applies to an IV estimator, because its consistency also rests on unknown parameters, namely the assumed orthogonality conditions (zero correlations between instruments and disturbance) and the required exclusion of the instruments from the structural equation under study. Only some of these orthogonality conditions and overidentifying restrictions can formally be tested. In that light, not only the bias-corrected OLS estimator but also the standard IV estimator is in fact just partially identified. Only by adopting particular parametric assumptions is identification of the parameters of interest achieved. We derive the limiting distribution of bias-corrected OLS, which allows one to produce inference on the structural parameters for a range of possible values of the simultaneity parameter.
By taking the union of a series of confidence intervals we obtain relatively efficient, accurate, robust and credible inference, without the need to nominate instrumental variables.
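The union-of-intervals construction can be illustrated with the Table 3 endpoints. This is a minimal sketch; the dictionary layout and function name are ours, while the numbers are those tabulated in Section 5.4.

```python
# Union of the 95% KLS confidence intervals over an assumed range for the
# simultaneity correlation rho_x_eps, using the endpoints from Table 3.
# The resulting interval is the robust confidence set under the interval
# assumption rho_low <= rho_x_eps <= rho_high.

TABLE3_CI = {  # rho_x_eps -> (lower, upper) of the 95% KLS interval
    -0.5: (0.1660, 0.1674), -0.4: (0.1418, 0.1431), -0.3: (0.1208, 0.1221),
    -0.2: (0.1019, 0.1031), -0.1: (0.0840, 0.0852),  0.0: (0.0667, 0.0679),
     0.1: (0.0494, 0.0506),  0.2: (0.0315, 0.0327),  0.3: (0.0125, 0.0138),
     0.4: (-0.0085, -0.0072), 0.5: (-0.0328, -0.0314),
}

def union_ci(rho_low, rho_high):
    """Union of the tabulated intervals for rho_x_eps in [rho_low, rho_high]."""
    chosen = [ci for rho, ci in TABLE3_CI.items() if rho_low <= rho <= rho_high]
    return (min(lo for lo, _ in chosen), max(hi for _, hi in chosen))

# Plausible schooling effects if rho_x_eps is believed to lie in [-.3, -.1]:
print(union_ci(-0.3, -0.1))  # -> (0.084, 0.1221)
```

Widening the assumed ρ_xε range widens the union interval, which is exactly the conservatism-for-credibility trade-off discussed above.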

References

Andrews, D.W.K., Stock, J.H., 2007. Inference with Weak Instruments. Chapter 6 in: Blundell, R., Newey, W.K., Persson, T. (eds.), Advances in Economics and Econometrics, Theory and Applications, 9th Congress of the Econometric Society, Vol. 3. Cambridge, UK: Cambridge University Press.
Angrist, J.D., Krueger, A.B., 1991. Does compulsory school attendance affect schooling and earnings? Quarterly Journal of Economics 106, 979-1014.
Donald, S.G., Newey, W., 2001. Choosing the number of instruments. Econometrica 69, 1161-1191.
Flores-Lagunes, A., 2007. Finite sample evidence of IV estimators under weak instruments. Journal of Applied Econometrics 22, 677-694.
Forchini, G., 2006. On the bimodality of the exact distribution of the TSLS estimator. Econometric Theory 22, 932-946.
Goldberger, A.S., 1964. Econometric Theory. New York: John Wiley & Sons.

Hahn, J., Hausman, J.A., 2003. Weak instruments: Diagnosis and cures in empirical econometrics. American Economic Review 93, 118-125.
Hausman, J.A., 1978. Specification tests in econometrics. Econometrica 46, 1251-1271.
Hillier, G., 2006. Yet more on the exact properties of IV estimators. Econometric Theory 22, 913-931.
Kiviet, J.F., Niemczyk, J., 2007. The asymptotic and finite sample distribution of OLS and simple IV in simultaneous equations. Computational Statistics and Data Analysis 51, 3296-3318.
Kiviet, J.F., Niemczyk, J., 2010. The asymptotic and finite sample (un)conditional distributions of OLS and simple IV in simultaneous equations. Computational Statistics and Data Analysis, forthcoming.
Kiviet, J.F., Phillips, G.D.A., 1996. The bias of the ordinary least squares estimator in simultaneous equation models. Economics Letters 53, 161-167.
Maddala, G.S., Jeong, J., 1992. On the exact small sample distribution of the instrumental variable estimator. Econometrica 60, 181-183.
Manski, C.F., 2003. Partial Identification of Probability Distributions. New York: Springer.
Manski, C.F., 2007. Identification for Prediction and Decision. Cambridge, MA: Harvard University Press.
Mikusheva, A., 2010. Robust confidence sets in the presence of weak instruments. Journal of Econometrics 157, 236-247.
Nelson, C.R., Startz, R., 1990a. Some further results on the exact small sample properties of the instrumental variable estimator. Econometrica 58, 967-976.
Nelson, C.R., Startz, R., 1990b. The distribution of the instrumental variables estimator and its t-ratio when the instrument is a poor one. Journal of Business 63, S125-S140.
Phillips, P.C.B., 1983. Exact small sample theory in the simultaneous equations model. In: Intriligator, M.D., Griliches, Z. (eds.), Handbook of Econometrics, Ch. 8, 449-516. North-Holland.
Phillips, P.C.B., 1989. Partially identified econometric models. Econometric Theory 5, 181-240.
Phillips, P.C.B., 2006. A remark on bimodality and weak instrumentation in structural equation estimation. Econometric Theory 22, 947-960.
Phillips, P.C.B., 2009. Exact distribution theory in structural estimation with an identity. Econometric Theory 25, 958-984.
Rothenberg, T.J., 1972. The asymptotic distribution of the least squares estimator in the errors in variables model. Unpublished mimeo.
Sawa, T., 1969. The exact sampling distribution of ordinary least squares and two-stage least squares estimators. Journal of the American Statistical Association 64, 923-937.
Schneeweiss, H., 1980. Different asymptotic variances for the same estimator in a regression with errors in the variables. Methods of Operations Research 36, 249-269.
Staiger, D., Stock, J.H., 1997. Instrumental variables regression with weak instruments. Econometrica 65, 557-586.
Tamer, E., 2010. Partial identification in econometrics. Annual Review of Economics 2, 167-195.
Woglom, G., 2001. More results on the exact small sample properties of the instrumental variable estimator. Econometrica 69, 1381-1389.


[Figure: six panels, one per instrument strength ρ_xz ∈ {0.01, 0.1, 0.3, 0.5, 0.8, 0.95}; vertical axis: factor; horizontal axis: simultaneity ρ_xε]
Diagram 1: Factor of σ_ε² = σ_x² to obtain the asymptotic variance for OLS when conditioned on C_∅ (blue), C_z (red) and C_x (green) respectively, and for IV (dashed magenta); simultaneity = ρ_xε, strength = ρ_xz

[Figure: six panels; vertical axis: ratio; horizontal axis: simultaneity ρ_xε; panels: strength = 0.1, n = 30, PCP = 0.3; strength = 0.5, n = 30, PCP = 10; strength = 0.8, n = 30, PCP = 53; strength = 0.95, n = 30, PCP = 270; strength = 0.5, n = 100, PCP = 33; strength = 0.8, n = 100, PCP = 178]
Diagram 2: Relative asymptotic precision ARMSE(OLS | C)/ARMSE(IV), where C = C_∅ (blue), C_z (red), C_x (green)

[Figure: densities for ρ_xz = 0.10 (PCP = 1.01) and ρ_xz = 0.40 (PCP = 19.05), n = 100, σ = 3, R = 500000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 3: Probability densities for IV (dashed, red) and OLS (solid, blue), both unconditional, supplemented by their respective asymptotic approximations (dotted)

[Figure: densities for ρ_xz = 0.10 (PCP = 1.01) and ρ_xz = 0.40 (PCP = 19.05), n = 100, σ = 3, R = 100000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 4: IV (dashed) and OLS (solid) conditional on z (6 raw)

[Figure: densities for ρ_xz = 0.10 (PCP = 1.01) and ρ_xz = 0.40 (PCP = 19.05), n = 100, σ = 3, R = 100000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 5: IV (dashed) and OLS (solid) conditional on z (6 stylized)

[Figure: densities for ρ_xz = 0.10 (PCP = 1.01) and ρ_xz = 0.40 (PCP = 19.05), n = 100, σ = 3, R = 100000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 6: IV (dashed) and OLS (solid) conditional on x (6 raw). The 6 arbitrary realizations had r_zv = -.14, .11, -.03, -.04, .08, -.05 respectively.

[Figure: densities for ρ_xz = 0.10 (PCP = 1.01) and ρ_xz = 0.40 (PCP = 19.05), n = 100, σ = 3, R = 100000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 7: IV (dashed) and OLS (solid) conditional on x (6 stylized)

[Figure: densities for ρ_xz = 0.10, n = 1000, PCP = 10.10, σ = 3, R = 100000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 8: IV (dashed) and OLS (solid) for n = 1000 and ρ_xz = .1; unconditional in the top panel and conditional on x (6 raw) in the bottom panel

[Figure: densities for ρ_xz = 0.10 (PCP = 1.01) and ρ_xz = 0.40 (PCP = 19.05), n = 100, σ = 3, R = 500000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 9: Unconditional IV (dashed) and KLS for n = 100, where KLS uses ρ_xε = -.9 (red), -.6 (green), -.3 (cyan), 0 (blue), .3 (magenta), .6 (yellow), .9 (black)

[Figure: densities for ρ_xz = 0.10 (PCP = 10.10) and ρ_xz = 0.40 (PCP = 190.48), n = 1000, σ = 3, R = 50000; panels for ρ_xε = 0.00, 0.30, 0.50, 0.70]
Diagram 10: Unconditional IV (dashed) and KLS for n = 1000, where KLS uses ρ_xε = -.9 (red), -.6 (green), -.3 (cyan), 0 (blue), .3 (magenta), .6 (yellow), .9 (black)
