Explanation of the HDFE iteration with 3 FEs Sergio Correia July 30, 2014

1

Background

Our objective is to obtain the OLS estimates for β in the regression y = Xβ+D1 α1 +D2 α2 +D3 α3 +, where Di represents a set of indicator or design variables, and αi is the set of associated fixed effects. Let M , P be the usual annihilator and projection matrices in an OLS regression. When the regressors are dummy variables, P and M are just averages and demeaned values within each group respectively. Also, from the normal equation we know the projection matrix is orthogonal to the residuals: P e = 0. Recall the two-part FWL theorem. Its first part states that we can regress y = Xβ + D1 α1 + D2 α2 +D3 α3 + by first regressing y and each X against the set D1 , D2 , D3 , and then just regressing the residuals of y against the residuals of X. The second part states that the residuals of both regressions are also identical. Thus, the strategy will be to apply FWL to obtain residuals of (y, X) with respect to all the indicator variables, and then regress the residuals to obtain estimates for β. Notice that estimates for each α can still be recovered, but they may not be identified.

2

Fixed Point Iteration

To regress against dummies there are two usual approaches: (i) add the dummies explictly or (ii) demean all the variables. Option (ii) is much faster but can only be done once, so if we have two or more sets of DVs the usual approach is to apply (ii) to the fixed effect with more dimensions (i.e. categories) and (i) to all the others. However, if there are two or more sets of high-dimensional DVs, we would have to create a lot of dummies which is impractical (b/c of out of memory errors, etc.). Instead, we’ll apply a fixed-point iteration over the normal equations, as suggested by Guimaraes & Portugal (2010) (i.e., guess a value, and iterate our estimates until the normal equations hold). Note that instead of iterating the estimates α ˆ1, α ˆ2, α ˆ 3 , we will iterate Z2 := D2 αˆ2 and Z3 := D3 αˆ3 . Also, note that we are not employing Guimaraes et al (2013) version for three sets of FEs, but an extension of their original work, which works significatively faster.

3

Algorithm

Wlog, start with y, 1. Compute P1 y and y˜ := M1 y (0)

(0)

2. Initialize Z2 = Z3 = 0 1

3. Until Z2 and Z3 converge, set h   i (n) (n−1) (n−1) (n−1) (a) Z2 = P2 y˜ + P1 Z2 + Z3 − Z3 h   i (n) (n) (n−1) (n) (b) Z3 = P3 y˜ + P1 Z2 + Z3 − Z2 4. Then, compute Z1 = P1 (y − Z2 − Z3 ) and with that compute y ∗ = y − Z1 − Z2 − Z3 , which is what we desire. After repeating this for every variable, regress the transformed variables (y ∗ = X ∗ βˆ +e)to obtain ˆ Notice that the residuals of this FWL regression are the same as those of the correct estimates β. the full regression. If you want to obtain the fixed effects for the regression, notice that y = X βˆ + Z1 + Z2 + Z3 + e where both βˆ and e are the same as in the transformed regression. Thus, compute e˜ := y − X βˆ = e + Z1 + Z2 + Z3 , and then apply the previous step to obtain the Z1 , Z2 , Z3 fixed effects for that variable. Notice that if we remove the third FE (i.e. set Z3 = 0), the formula collapses to the usual two-FE version.

4

Deriving the fixed point iteration with three fixed-effects 1. Start with the identity y = D1 α ˆ 1 + D2 α ˆ 2 + D3 α ˆ 3 + e = Z1 + Z2 + Z3 + e, where Zi := Di α ˆi 2. Premultiply by M1 to obtain y˜ = M1 Z2 + M1 Z3 + e, since M1 D1 = D1 − P1 D1 = D1 − D1 = 0 and M1 e = e − P1 e = 0 (from the normal equation). 3. Rearrange Z2 + Z3 = y˜ + P1 (Z2 + Z3 ) − e 4. Premultiply by P2 and notice that P2 e = 0, and after rearranging, obtain Z2 = P2 [˜ y + P1 (Z2 + Z3 ) − Z3 ] 5. Premultiply by P3 and obtain the equivalent equation for the third fixed effect.

With these formulas, and adding a little more structure (to show that the fixed point is attractive, etc.), we can turn the formulas above into a fixed point iteration that will converge linearly.

5

Implementation details

This fixed point iteration is very slow, but there are (at least) two venues for speeding it up. 1. We can minimize the number of P operations and speed them up. First, notice that sorting the dataset (an O(n log(n)) operation) is not needed to obtain group means. Also, since in this case P just demeans a variable within a group, we can store the output of Pi as a G1 × 1 vector, instead of a N × 1 vector. Finally, we can try precompute intermediate steps such as P2 y˜ (which we only compute once) and P1 Z2 (which we only compute once per iteration, instead of twice). 2. We can try to extrapolate the fixed-point iteration so it converges faster. These methods are called acceleration tecniques, and of those, we use a vector-based version of Aitken’s ∆2 method. In particular, we are employing method 3 as described by Macleod (1986), which has a reasonable performance, and accelerating every three iterations. 2

All in all, each iteration requires only 6 P operations (2 in the case of just two fixed effects), which are implemented reasonably fast. Together with the acceleration, this method becomes both fast and conservative in terms of memory usage.

6

Extension: Algorithm beyond three sets of fixed effects

The algorithm of part 3 can be extended to an arbitrarily large number of fixed effects. This will lead to the following steps: Wlog, start with y, and let there be G sets of fixed effects. 1. Compute P1 y and y˜ := M1 y 2. Precompute Pg y˜ ∀g > 1 (0)

3. Initialize Zg

4. Initialize Σ(0)

= 0 ∀g > 1 P (0) := G g=2 Zg = 0

5. Until all Zg converge for all g > 1 (this condition can also be replaced by a condition on Σ), set for g = 2, 3, . . . , G:   (n) (n−1) (a) Zg = Zg + Pg y˜ + (P1 − 1)Σ(n−1,g−1) (n)

(b) Σ(n−1,g) = Σ(n−1,g) + Zg

(n−1)

− Zg

6. Then, compute Z1 = P1 (y − Σ) and with that compute y ∗ [= M y = M1 (y − Σ)] = y − Z1 − Σ. Notice that step 5 could be expressed more compactly just in terms of Σ, but that complicates obtaining the individual FEs.

7

Extension: Interactions between FEs and continuous variables

If we interact a variable V elementwise with each column of a design matrix D representing a set of K fixed effects, a few adjustements would be required. First, note that it’s equivalent to estimating a system of bivariate regressions. For instance, if the D matrix had three fixed effects (i.e. it’s a N × 3 matrix), then PD could be written as the partitioned matrix:   P1 0 0  0 P2 0  0 0 P3 And each Pj would be in turn be equal to  2 v11  v2 v1 v3 v1

(for the case of 3 obs. in the group j):  v1 v2 v1 v3 2 v22 v2 v3  2 v3 v2 v33

(subindices are relative to each group) Then, we can show that U := PD y is such that ui = vi × qg(i) , where Q is a K × 1 vector s.t. P qj =

i s.t. Dij =1 vi2

vi yi

. 3

In terms of code, before we obtained a Q matrix where vi = 1, so we were basically computing P i s.t. D

=1

yi

ij group means (qj = ) and now we are i) weighting them and ii) multiplying by vi Nj afterwards. Some optimizations can be done, such as precomputing the denominators of Q, and doing scalar products of v and y so we can use the code of group means without many changes.

8

Extension: Instrumental Variables

Just demean all variables w.r.t the set of fixed effects. Then, we just run the first stage and second stage regressions taking care of adjusting the standard errors.

4

Explanation of the HDFE iteration with 3 FEs - GitHub

Jul 30, 2014 - DVs, we would have to create a lot of dummies which is impractical (b/c of out of memory errors, etc.). Instead, we'll apply a fixed-point iteration ...

163KB Sizes 1 Downloads 238 Views

Recommend Documents

Safety, Explanation, Iteration
Good: Appeal to some general principles about gravity and potential energy, to show ... together with non-mental environmental conditions (e.g., facts about the ..... argument.16 But if there is a well-motivated alternative conception of safety, ...

FIS-FES 3-13-2017 Routes.pdf
FIS-FES 3-13-2017 Routes.pdf. FIS-FES 3-13-2017 Routes.pdf. Open. Extract. Open with. Sign In. Main menu. Displaying FIS-FES 3-13-2017 Routes.pdf.

Explanation of the LAB Color Space
space, which is typically used in image editing programs. For example, the Lab space is useful for sharpening images and the removing artifacts in JPEG images ...

An Iteration Method.pdf
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. An Iteration ...

fes-meknes-prim.pdf
Download. Connect more apps... Try one of the apps below to open or edit this item. fes-meknes-prim.pdf. fes-meknes-prim.pdf. Open. Extract. Open with. Sign In.

Random Iteration of Rational Functions
For the case of an RDS consisting of rational functions, there is only one previously known result due to Jonsson ('00):. Theorem (Jonsson, '00). Suppose that (T,Ω,P,θ) is a random dynamical system on ̂C consisting of rational functions Tω, such

Metric regularity of Newton's iteration
Under ample parameterization, metric regularity of the mapping associated with convergent Newton's ... programs “José Castillejo” and “Juan de la Cierva.”.

Haxe 3 Manual - GitHub
of programs in Haxe. Each Haxe class has an explicit name, an implied path and zero or more class fields. Here we will focus on the general structure of classes and their relations, while leaving the details of class fields for Class Fields (Chapter

Lab 3: Structure - GitHub
Structure Harvester is very easy to use, and is all web-based! You simply upload your zip file and then click “Harvest!” It may take a few minutes to run.

symbiotic 3 - GitHub
Marek Chalupa, Martin Jonáš, Jiri Slaby,. Jan Strejcek, and Martina Vitovská. Masaryk University, Brno. Page 2. Symbiotic workflow. SOURCES. LLVM.

The Nature of Dynamical Explanation
The received view of dynamical explanation is that dynamical cognitive science seeks to provide covering-law explanations of cognitive phenomena. By analyzing three prominent examples of dynamicist research, I show that the received view is misleadin

EU \3 - GitHub
l)The switch has been open for a long time when at time t = 0, the switch is closed. What is. 11(0), the magnitude of the current through the resistor R1 just after ...

Chapter 3 - GitHub
N(0, 1). The CLT tells us about the shape of the “piling”, when appropriately normalized. Evaluation. Once I choose some way to “learn” a statistical model, I need to decide if I'm doing a good job. How do I decide if I'm doing anything good?

Hero's Journey explanation with graphic.pdf
There was a problem previewing this document. Retrying... Download. Connect more apps... Try one of the apps below to open or edit this item. Hero's Journey ...

Lab 3 Example - GitHub
Download “polygonum.stru”'. • Look at “polygonum.stru” using a text editor. – Column 1 refers to individual ID (516 total individuals). – Column 2 refers to ...

Project 3 - GitHub
Discuss the following: 1. Plot the residual vs. number of iteration for each method. Use different relaxation factors for PSOR and LSOR. 2. What relaxation factor ...

The Convincing Explanation
(Nicholas Carr's new book The Shallows: What the Internet is Doing to our ... pedestrian may be that you were texting while driving, though this behavior does ...

Combined GitHub GeoQ Fact Sheetrdb WITH PUB REL (3).pdf ...
The GeoQ system features include: Crisis workflow management and distribution of work across geographically disparate locations. HTML5-based ...

Combined GitHub GeoQ Fact Sheetrdb WITH PUB REL (3).pdf ...
creation and management process. Page 2 of 2. Combined GitHub GeoQ Fact Sheetrdb WITH PUB REL (3).pdf. Combined GitHub GeoQ Fact Sheetrdb WITH ...

Retraite 2016 fes meknes.pdf
45 السطوبي عبد العالي 165516 أستاذ التعليم االبتدائي الحاجب. 46 محمد الشبا 176467 أستاذ التعليم االبتدائي الحاجب. 47 محمد ازوازي 205573 أستاذ التعليم

Queens Community District 3 - GitHub
This metric from the Mayor's Office for Economic Opportunity accounts for NYC's high cost of housing, as well as other costs of living and anti-poverty benefits. Land Use Category. % Lot. Area. # Lots. Click here for a more detailed land use map of Q

Manhattan Community District 3 - GitHub
E 14 St. EastRiver. MN 6. MN 2. MN 1. Manhattan Community District 3. Neighborhoods1: Chinatown, East Village, Lower East Side, NoHo, Two Bridges. LAND USE MAP. 164,407. 163,277. -1% ... ACCESS TO PARKS7 of residents live within.