17  Panel Data

17.1 Introduction

Economists traditionally use the term panel data to refer to data structures consisting of observations on individuals for multiple time periods. Other fields such as statistics typically call this structure longitudinal data. The observed “individuals” can be, for example, people, households, workers, firms, schools, production plants, industries, regions, states, or countries. The distinguishing feature relative to cross-sectional data sets is the presence of multiple observations for each individual. More broadly, panel data methods can be applied to any context with cluster-type dependence.

There are several distinct advantages of panel data relative to cross-section data. One is the possibility of controlling for unobserved time-invariant endogeneity without the use of instrumental variables. A second is the possibility of allowing for broader forms of heterogeneity. A third is modeling dynamic relationships and effects.

There are two broad categories of panel data sets in economic applications: micro panels and macro panels. Micro panels are typically surveys or administrative records on individuals and are characterized by a large number of individuals (often in the 1000’s or higher) and a relatively small number of time periods (often 2 to 20 years). Macro panels are typically national or regional macroeconomic variables and are characterized by a moderate number of individuals (e.g. 7-20) and a moderate number of time periods (20-60 years).

Panel data was once relatively esoteric in applied economic practice. Now, it is a dominant feature of applied research.

A typical maintained assumption for micro panels (which we follow in this chapter) is that the individuals are mutually independent while the observations for a given individual are correlated across time periods. This means that the observations follow a clustered dependence structure. Because of this, current econometric practice is to use cluster-robust covariance matrix estimators when possible. Similar assumptions are often used for macro panels though the assumption of independence across individuals (e.g. countries) is much less compelling.

The application of panel data methods in econometrics started with the pioneering work of Mundlak (1961) and Balestra and Nerlove (1966).

Several excellent monographs and textbooks have been written on panel econometrics, including Arellano (2003), Hsiao (2003), Wooldridge (2010), and Baltagi (2013). This chapter will summarize some of the main themes but for a more in-depth treatment see these references.

One challenge arising in panel data applications is that the computational methods can require meticulous attention to detail. It is therefore advised to use established packages for routine applications. For most panel data applications in economics Stata is the standard package.

17.2 Time Indexing and Unbalanced Panels

It is typical to index observations by both the individual $i$ and the time period $t$, thus $Y_{it}$ denotes a variable for individual $i$ in period $t$. We index individuals as $i=1,\ldots,N$ and time periods as $t=1,\ldots,T$. Thus $N$ is the number of individuals in the panel and $T$ is the number of time series periods.

Panel data sets can involve data at any time series frequency though the typical application involves annual data. The observations in a data set will be indexed by calendar time which for the case of annual observations is the year. For notational convenience it is customary to denote the time periods as $t=1,\ldots,T$, so that $t=1$ is the first time period observed and $T$ is the final time period.

When observations are available on all individuals for the same time periods we say that the panel is balanced. In this case there are an equal number T of observations for each individual and the total number of observations is n=NT.

When different time periods are available for the individuals in the sample we say that the panel is unbalanced. This is the most common type of panel data set. It does not pose a problem for applications but does make the notation cumbersome and also complicates computer programming.

To illustrate, consider the data set Invest 1993 on the textbook webpage. This is a sample of 1962 U.S. firms extracted from Compustat, assembled by Bronwyn Hall, and used in the empirical work in Hall and Hall (1993). In Table 17.1 we display a set of variables from the data set for the first 13 observations. The first variable is the firm code number. The second variable is the year of the observation. These two variables are essential for any panel data analysis. In Table 17.1 you can see that the first firm (#32) is observed for the years 1970 through 1977. The second firm (#209) is observed for 1987 through 1991. You can see that the years vary considerably across the firms so this is an unbalanced panel.

For unbalanced panels the time index $t=1,\ldots,T$ denotes the full set of time periods. For example, in the data set Invest1993 there are observations for the years 1960 through 1991, so the total number of time periods is $T=32$. Each individual is observed for a subset of $T_i$ periods. The set of time periods for individual $i$ is denoted as $S_i$, so that individual-specific sums (over time periods) are written as $\sum_{t\in S_i}$.

The observed time periods for a given individual are typically contiguous (for example, in Table 17.1, firm #32 is observed for each year from 1970 through 1977) but in some cases are non-contiguous (if, for example, 1973 was missing for firm #32). The total number of observations in the sample is $n=\sum_{i=1}^{N} T_i$.
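The bookkeeping described above ($T_i$, $S_i$, and $n=\sum_i T_i$) can be sketched in a few lines. This is an illustrative Python sketch; the firm ids and years below are hypothetical values patterned on Table 17.1, not the actual data set.

```python
import numpy as np

# Hypothetical (firm id, year) pairs patterned on Table 17.1 (not the actual data).
ids   = np.array([32] * 8 + [209] * 5)
years = np.array(list(range(1970, 1978)) + list(range(1987, 1992)))

firms = [int(i) for i in np.unique(ids)]     # the N individuals
S = {i: years[ids == i] for i in firms}      # S_i: time periods observed for individual i
T = {i: len(S[i]) for i in firms}            # T_i: number of periods for individual i
N = len(firms)
n = sum(T.values())                          # total observations n = sum over i of T_i
```

For the hypothetical ids above this gives $T_{32}=8$, $T_{209}=5$, and $n=13$, matching the 13 rows displayed in Table 17.1.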

Table 17.1: Observations from Investment Data Set

| Firm Code Number | Year | $I_{it}$ | $\bar I_i$ | $\dot I_{it}$ | $Q_{it}$ | $\bar Q_i$ | $\dot Q_{it}$ | $\hat e_{it}$ |
|---|---|---|---|---|---|---|---|---|
| 32 | 1970 | 0.122 | 0.155 | -0.033 | 1.17 | 0.62 | 0.55 | . |
| 32 | 1971 | 0.092 | 0.155 | -0.063 | 0.79 | 0.62 | 0.17 | 0.005 |
| 32 | 1972 | 0.094 | 0.155 | -0.061 | 0.91 | 0.62 | 0.29 | 0.005 |
| 32 | 1973 | 0.116 | 0.155 | -0.039 | 0.29 | 0.62 | -0.33 | 0.014 |
| 32 | 1974 | 0.099 | 0.155 | -0.057 | 0.30 | 0.62 | -0.32 | 0.002 |
| 32 | 1975 | 0.187 | 0.155 | 0.032 | 0.56 | 0.62 | -0.06 | 0.086 |
| 32 | 1976 | 0.349 | 0.155 | 0.194 | 0.38 | 0.62 | -0.24 | 0.248 |
| 32 | 1977 | 0.182 | 0.155 | 0.027 | 0.57 | 0.62 | -0.05 | 0.081 |
| 209 | 1987 | 0.095 | 0.071 | 0.024 | 9.06 | 21.57 | -12.51 | . |
| 209 | 1988 | 0.044 | 0.071 | -0.027 | 16.90 | 21.57 | -4.67 | -0.244 |
| 209 | 1989 | 0.069 | 0.071 | -0.002 | 25.14 | 21.57 | 3.57 | -0.257 |
| 209 | 1990 | 0.113 | 0.071 | 0.042 | 25.60 | 21.57 | 4.03 | -0.226 |
| 209 | 1991 | 0.034 | 0.071 | -0.037 | 31.14 | 21.57 | 9.57 | -0.283 |

17.3 Notation

This chapter focuses on panel data regression models whose observations are pairs (Yit,Xit) where Yit is the dependent variable and Xit is a k-vector of regressors. These are the observations on individual i for time period t.

It will be useful to cluster the observations at the level of the individual. We borrow the notation from Section 4.21 to write Yi as the Ti×1 stacked observations on Yit for tSi, stacked in chronological order. Similarly, we write Xi as the Ti×k matrix of stacked Xit for tSi, stacked in chronological order.

We will also sometimes use matrix notation for the full sample. To do so, let $Y=(Y_1',\ldots,Y_N')'$ denote the $n\times 1$ vector of stacked $Y_i$, and set $X=(X_1',\ldots,X_N')'$ similarly.

17.4 Pooled Regression

The simplest model in panel regression is pooled regression

$$Y_{it}=X_{it}'\beta+e_{it}, \qquad E[X_{it}e_{it}]=0. \tag{17.1}$$

where β is a k×1 coefficient vector and eit is an error. The model can be written at the level of the individual as

$$Y_i=X_i\beta+e_i, \qquad E[X_i'e_i]=0$$

where ei is Ti×1. The equation for the full sample is Y=Xβ+e where e is n×1.

The standard estimator of β in the pooled regression model is least squares, which can be written as

$$\hat\beta_{\text{pool}}=\left(\sum_{i=1}^{N}\sum_{t\in S_i}X_{it}X_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\in S_i}X_{it}Y_{it}\right)=\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'Y_i\right)=(X'X)^{-1}(X'Y).$$

In the context of panel data $\hat\beta_{\text{pool}}$ is called the pooled regression estimator. The vector of residuals for the $i$th individual is $\hat e_i=Y_i-X_i\hat\beta_{\text{pool}}$.
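As a concrete illustration, pooled least squares is just one regression on the stacked data. The following is a minimal numpy sketch on simulated data (all names and parameter values are hypothetical, chosen only for the illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, k = 100, 5, 2                       # hypothetical balanced panel
X = rng.normal(size=(N * T, k))           # stacked n x k regressor matrix
beta = np.array([1.0, -0.5])
Y = X @ beta + rng.normal(size=N * T)     # stacked dependent variable

# Pooled least squares: a single regression on the stacked data,
# ignoring the panel structure entirely.
beta_pool, *_ = np.linalg.lstsq(X, Y, rcond=None)
resid = Y - X @ beta_pool                 # stacked residuals
```

The residual vector can then be split by individual to obtain the cluster-level residuals $\hat e_i$ used below.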

The pooled regression model is ideally suited for the context where the errors eit satisfy strict mean independence:

$$E[e_{it}\mid X_i]=0. \tag{17.2}$$

This occurs when the errors $e_{it}$ are mean independent of all regressors $X_{ij}$ for all time periods $j=1,\ldots,T$. Strict mean independence is stronger than pairwise mean independence $E[e_{it}\mid X_{it}]=0$ as well as the projection assumption (17.1). Strict mean independence requires that neither lagged nor future values of $X_{it}$ help to forecast $e_{it}$. It excludes lagged dependent variables (such as $Y_{it-1}$) from $X_{it}$ (otherwise $e_{it}$ would be predictable given $X_{i,t+1}$). It also requires that $X_{it}$ is exogenous in the sense discussed in Chapter 12.

We now describe some statistical properties of β^pool  under (17.2). First, notice that by linearity and the cluster-level notation we can write the estimator as

$$\hat\beta_{\text{pool}}=\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'(X_i\beta+e_i)\right)=\beta+\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'e_i\right).$$

Using (17.2)

$$E[\hat\beta_{\text{pool}}\mid X]=\beta+\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'E[e_i\mid X_i]\right)=\beta$$

so β^pool  is unbiased for β.

Under the additional assumption that the error eit is serially uncorrelated and homoskedastic the covariance estimator takes a classical form and the classical homoskedastic variance estimator can be used. If the error eit is heteroskedastic but serially uncorrelated then a heteroskedasticity-robust covariance matrix estimator can be used.

In general, however, we expect the errors $e_{it}$ to be correlated across time $t$ for a given individual. This does not necessarily violate (17.2) but invalidates classical covariance matrix estimation. The conventional solution is to use a cluster-robust covariance matrix estimator which allows arbitrary within-cluster dependence. The cluster-robust covariance matrix estimator for pooled regression equals

$$\hat V_{\text{pool}}=(X'X)^{-1}\left(\sum_{i=1}^{N}X_i'\hat e_i\hat e_i'X_i\right)(X'X)^{-1}.$$

As in (4.55) this can be multiplied by a degree-of-freedom adjustment. The adjustment used by the Stata regress command is

$$\hat V_{\text{pool}}=\left(\frac{n-1}{n-k}\right)\left(\frac{N}{N-1}\right)(X'X)^{-1}\left(\sum_{i=1}^{N}X_i'\hat e_i\hat e_i'X_i\right)(X'X)^{-1}.$$

The pooled regression estimator with cluster-robust standard errors can be obtained using the Stata command regress with the cluster(id) option, where id identifies the individual.
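The cluster-robust formula with the Stata-style degree-of-freedom adjustment can be sketched directly. A hedged numpy illustration on simulated clustered data (all names and values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 200, 4, 2
ids = np.repeat(np.arange(N), T)
u = np.repeat(rng.normal(size=N), T)          # individual effect induces within-cluster correlation
X = rng.normal(size=(N * T, k))
Y = X @ np.ones(k) + u + rng.normal(size=N * T)

b, *_ = np.linalg.lstsq(X, Y, rcond=None)
ehat = Y - X @ b
bread = np.linalg.inv(X.T @ X)

# Middle term: sum over clusters of (X_i' e_i)(X_i' e_i)'
meat = np.zeros((k, k))
for i in range(N):
    m = ids == i
    s = X[m].T @ ehat[m]
    meat += np.outer(s, s)

n = N * T
adj = ((n - 1) / (n - k)) * (N / (N - 1))     # Stata-style degree-of-freedom adjustment
V_cluster = adj * bread @ meat @ bread
se = np.sqrt(np.diag(V_cluster))              # cluster-robust standard errors
```

The loop over clusters is the key difference from the heteroskedasticity-robust estimator, which would instead sum outer products observation by observation.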

When strict mean independence (17.2) fails the pooled least squares estimator β^pool  is not necessarily consistent for β. Since strict mean independence is a strong and undesirable restriction it is typically preferred to adopt one of the alternative estimators described in the following sections.

To illustrate the pooled regression estimator consider the data set Invest1993 described earlier. We consider a simple investment model

$$I_{it}=\beta_1 Q_{it-1}+\beta_2 D_{it-1}+\beta_3 CF_{it-1}+\beta_4 T_i+e_{it} \tag{17.3}$$

where I is investment/assets, Q is market value/assets, D is long term debt/assets, CF is cash flow/assets, and T is a dummy variable indicating if the corporation’s stock is traded on the NYSE or AMEX. The regression also includes 19 dummy variables indicating an industry code. The Q theory of investment suggests that β1>0 while β2=β3=0. Theories of liquidity constraints suggest that β2<0 and β3>0. We will be using this example throughout this chapter. The values of I and Q for the first 13 observations are also displayed in Table 17.1.

In Table 17.2 we present the pooled regression estimates of (17.3) in the first column with cluster-robust standard errors.

17.5 One-Way Error Component Model

One approach to panel data regression is to model the correlation structure of the regression error eit. The most common choice is an error-components structure. The simplest takes the form

$$e_{it}=u_i+\varepsilon_{it} \tag{17.4}$$

Table 17.2: Estimates of Investment Equation

Cluster-robust standard errors in parentheses.

where $u_i$ is an individual-specific effect and $\varepsilon_{it}$ are idiosyncratic (i.i.d.) errors. This is known as a one-way error component model.

In vector notation we can write $e_i=1_iu_i+\varepsilon_i$ where $1_i$ is a $T_i\times 1$ vector of ones.

The one-way error component regression model is

$$Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it}$$

written at the level of the observation, or $Y_i=X_i\beta+1_iu_i+\varepsilon_i$ written at the level of the individual.

To illustrate why an error-component structure such as (17.4) might be appropriate, examine Table 17.1. In the final column we have included the pooled regression residuals $\hat e_{it}$ for these observations. (There is no residual for the first year for each firm due to the lack of lagged regressors for this observation.) What is quite striking is that the residuals for the second firm (#209) are all negative, clustering around $-0.25$. While informal, this suggests that it may be appropriate to model these errors using (17.4), expecting that firm #209 has a large negative value for its individual effect $u_i$.

17.6 Random Effects

The random effects model assumes that the errors ui and εit in (17.4) are conditionally mean zero, uncorrelated, and homoskedastic.

Assumption 17.1 Random Effects. Model (17.4) holds with

$$E[\varepsilon_{it}\mid X_i]=0 \tag{17.5}$$
$$E[\varepsilon_{it}^2\mid X_i]=\sigma_\varepsilon^2 \tag{17.6}$$
$$E[\varepsilon_{it}\varepsilon_{is}\mid X_i]=0 \tag{17.7}$$
$$E[u_i\mid X_i]=0 \tag{17.8}$$
$$E[u_i^2\mid X_i]=\sigma_u^2 \tag{17.9}$$
$$E[u_i\varepsilon_{it}\mid X_i]=0 \tag{17.10}$$

where (17.7) holds for all $s\neq t$. Assumption 17.1 is known as a random effects specification. It implies that the vector of errors $e_i$ for individual $i$ has the covariance structure

$$E[e_i\mid X_i]=0,\qquad E[e_ie_i'\mid X_i]=1_i1_i'\sigma_u^2+I_i\sigma_\varepsilon^2=\begin{pmatrix}\sigma_u^2+\sigma_\varepsilon^2&\sigma_u^2&\cdots&\sigma_u^2\\ \sigma_u^2&\sigma_u^2+\sigma_\varepsilon^2&\cdots&\sigma_u^2\\ \vdots&\vdots&\ddots&\vdots\\ \sigma_u^2&\sigma_u^2&\cdots&\sigma_u^2+\sigma_\varepsilon^2\end{pmatrix}=\sigma_\varepsilon^2\,\Omega_i,$$

say, where Ii is an identity matrix of dimension Ti. The matrix Ωi depends on i since its dimension depends on the number of observed time periods Ti.
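This covariance structure is easy to build and inspect numerically. A small illustrative sketch (the variance values are hypothetical):

```python
import numpy as np

sigma_u2, sigma_e2 = 0.5, 1.0      # hypothetical variance components
Ti = 4
ones = np.ones((Ti, 1))

# E[e_i e_i' | X_i] = sigma_u^2 * 1 1' + sigma_e^2 * I:
# sigma_u^2 + sigma_e^2 on the diagonal, sigma_u^2 off the diagonal.
cov = sigma_u2 * (ones @ ones.T) + sigma_e2 * np.eye(Ti)
Omega_i = cov / sigma_e2           # so that cov = sigma_e^2 * Omega_i

# Equi-correlation form (see below): rho = sigma_u^2 / (sigma_u^2 + sigma_e^2)
rho = sigma_u2 / (sigma_u2 + sigma_e2)
```

Every off-diagonal entry equals $\sigma_u^2$ and every diagonal entry equals $\sigma_u^2+\sigma_\varepsilon^2$, which is exactly the equi-correlation pattern discussed next.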

Assumptions 17.1.1 and 17.1.4 state that the idiosyncratic error εit and individual-specific error ui are strictly mean independent so the combined error eit is strictly mean independent as well.

The random effects model is equivalent to an equi-correlation model. That is, suppose that the error eit satisfies

$$E[e_{it}\mid X_i]=0,\qquad E[e_{it}^2\mid X_i]=\sigma^2$$

and

$$E[e_{is}e_{it}\mid X_i]=\rho\sigma^2$$

for $s\neq t$. These conditions imply that $e_{it}$ can be written as (17.4) with the components satisfying Assumption 17.1 with $\sigma_u^2=\rho\sigma^2$ and $\sigma_\varepsilon^2=(1-\rho)\sigma^2$. Thus random effects and equi-correlation are identical.

The random effects regression model is

$$Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it}$$

or $Y_i=X_i\beta+1_iu_i+\varepsilon_i$ where the errors satisfy Assumption 17.1.

Given the error structure the natural estimator for β is GLS. Suppose σu2 and σε2 are known. The GLS estimator of β is

$$\hat\beta_{\text{gls}}=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}Y_i\right).$$

A feasible GLS estimator replaces the unknown σu2 and σε2 with estimators. See Section 17.15.
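Assuming the variance components were known, the GLS formula above could be implemented as follows. This is an illustrative numpy sketch on simulated balanced data; a feasible version would first estimate $\sigma_u^2$ and $\sigma_\varepsilon^2$ from residuals as in Section 17.15.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, k = 150, 4, 2
sigma_u2, sigma_e2 = 1.0, 1.0            # treated as known here; FGLS would estimate them
beta = np.array([1.0, 2.0])

# Omega_i = I + (sigma_u^2 / sigma_e^2) 1 1' (same for all i in a balanced panel)
Omega = np.eye(T) + (sigma_u2 / sigma_e2) * np.ones((T, T))
Omega_inv = np.linalg.inv(Omega)

A = np.zeros((k, k))
b = np.zeros(k)
for _ in range(N):
    Xi = rng.normal(size=(T, k))
    ei = np.sqrt(sigma_u2) * rng.normal() + np.sqrt(sigma_e2) * rng.normal(size=T)
    Yi = Xi @ beta + ei
    A += Xi.T @ Omega_inv @ Xi           # sum of X_i' Omega^{-1} X_i
    b += Xi.T @ Omega_inv @ Yi           # sum of X_i' Omega^{-1} Y_i

beta_gls = np.linalg.solve(A, b)
```

With an unbalanced panel, $\Omega_i$ would be rebuilt at each individual's own dimension $T_i$.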

We now describe some statistical properties of the estimator under Assumption 17.1. By linearity

$$\hat\beta_{\text{gls}}-\beta=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}e_i\right).$$

Thus

$$E[\hat\beta_{\text{gls}}-\beta\mid X]=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}E[e_i\mid X_i]\right)=0.$$

Thus β^gls  is conditionally unbiased for β. The conditional variance of β^gls  is

$$V_{\text{gls}}=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\sigma_\varepsilon^2 \tag{17.11}$$

Now let’s compare β^gls  with the pooled estimator β^pool. . Under Assumption 17.1 the latter is also conditionally unbiased for β and has conditional variance

$$V_{\text{pool}}=\sigma_\varepsilon^2\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_iX_i\right)\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}.$$

Using the algebra of the Gauss-Markov Theorem we deduce that

$$V_{\text{gls}}\leq V_{\text{pool}}$$

and thus the random effects estimator $\hat\beta_{\text{gls}}$ is more efficient than the pooled estimator $\hat\beta_{\text{pool}}$ under Assumption 17.1. (See Exercise 17.1.) The two variance matrices are identical when there is no individual-specific effect (when $\sigma_u^2=0$), for then $V_{\text{gls}}=V_{\text{pool}}=(X'X)^{-1}\sigma_\varepsilon^2$.

Under the assumption that the random effects model is a useful approximation but not literally true then we may consider a cluster-robust covariance matrix estimator such as

$$\hat V_{\text{gls}}=\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}\hat e_i\hat e_i'\Omega_i^{-1}X_i\right)\left(\sum_{i=1}^{N}X_i'\Omega_i^{-1}X_i\right)^{-1} \tag{17.14}$$

where $\hat e_i=Y_i-X_i\hat\beta_{\text{gls}}$. This may be re-scaled by a degree-of-freedom adjustment if desired.

The random effects estimator $\hat\beta_{\text{gls}}$ can be obtained using the Stata command xtreg, for which random effects is the default. The default covariance matrix estimator is (17.11). For the cluster-robust covariance matrix estimator (17.14) add the vce(robust) option. (The xtset command must be used first to declare the group identifier. For example, cusip is the group identifier in Table 17.1.)

To illustrate, in the second column of Table 17.2 we present the random effect regression estimates of the investment model (17.3) with cluster-robust standard errors (17.14). The point estimates are reasonably different from the pooled regression estimator. The coefficient on debt switches from positive to negative (the latter consistent with theories of liquidity constraints) and the coefficient on cash flow increases significantly in magnitude. These changes appear to be greater in magnitude than would be expected if Assumption 17.1 were correct. In the next section we consider a less restrictive specification.

17.7 Fixed Effect Model

Consider the one-way error component regression model

$$Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it} \tag{17.15}$$

or

$$Y_i=X_i\beta+1_iu_i+\varepsilon_i. \tag{17.16}$$

In many applications it is useful to interpret the individual-specific effect ui as a time-invariant unobserved missing variable. For example, in a wage regression ui may be the unobserved ability of individual i. In the investment model (17.3) ui may be a firm-specific productivity factor.

When ui is interpreted as an omitted variable it is natural to expect it to be correlated with the regressors Xit. This is especially the case when Xit includes choice variables.

To illustrate, consider the entries in Table 17.1. The final column displays the pooled regression residuals $\hat e_{it}$ for the first 13 observations which we interpret as estimates of the error $e_{it}=u_i+\varepsilon_{it}$. As described before, what is particularly striking about the residuals is that they are all strongly negative for firm #209, clustering around $-0.25$. We can interpret this as an estimate of $u_i$ for this firm. Examining the values of the regressor $Q$ for the two firms we can see that firm #209 has very large values (in all time periods) for $Q$. (The average value $\bar Q_i$ for the two firms appears in the seventh column.) Thus it appears (though we are only looking at two firms) that $u_i$ and $Q_{it}$ are correlated. It is not reasonable to infer too much from these limited observations, but the relevance is that such correlation violates strict mean independence.

In the econometrics literature if the stochastic structure of ui is treated as unknown and possibly correlated with Xit then ui is called a fixed effect.

Correlation between $u_i$ and $X_{it}$ will cause both pooled and random effect estimators to be biased. This is due to the classic problems of omitted variables bias and endogeneity. To see this in a generated example view Figure 17.1. This shows a scatter plot of three observations $(Y_{it},X_{it})$ from each of three firms. The true model is $Y_{it}=-X_{it}+u_i$. (The true slope coefficient is $-1$.) The variables $u_i$ and $X_{it}$ are highly correlated so the fitted pooled regression line through the nine observations has a slope close to $+1$. (The random effects estimator is identical.) The apparent positive relationship between $Y$ and $X$ is driven entirely by the positive correlation between $X$ and $u$. Conditional on $u$, however, the slope is $-1$. Thus regression techniques which do not control for $u_i$ will produce biased and inconsistent estimators.

Figure 17.1: Scatter Plot and Pooled Regression Line

The presence of the unstructured individual effect $u_i$ means that it is not possible to identify $\beta$ under a simple projection assumption such as $E[X_{it}\varepsilon_{it}]=0$. It turns out that a sufficient condition for identification is the following.

Definition 17.1 The regressor $X_{it}$ is strictly exogenous for the error $\varepsilon_{it}$ if

$$E[X_{is}\varepsilon_{it}]=0 \tag{17.17}$$

for all $s=1,\ldots,T$.

Strict exogeneity is a strong projection condition, meaning that if $X_{is}$ for any $s\neq t$ is added to (17.15) it will have a zero coefficient. Strict exogeneity is a projection analog of strict mean independence

$$E[\varepsilon_{it}\mid X_i]=0. \tag{17.18}$$

(17.18) implies (17.17) but not conversely. While (17.17) is sufficient for identification and asymptotic theory we will also use the stronger condition (17.18) for finite sample analysis.

While (17.17) and (17.18) are strong assumptions they are much weaker than (17.2) or Assumption 17.1, which require that the individual effect ui is also strictly mean independent. In contrast, (17.17) and (17.18) make no assumptions about ui.

Strict exogeneity (17.17) is typically inappropriate in dynamic models. In Section 17.41 we discuss estimation under the weaker assumption of predetermined regressors.

17.8 Within Transformation

In the previous section we showed that if ui and Xit are correlated then pooled and random-effects estimators will be biased and inconsistent. If we leave the relationship between ui and Xit fully unstructured then the only way to consistently estimate the coefficient β is by an estimator which is invariant to ui. This can be achieved by transformations which eliminate ui.

One such transformation is the within transformation. In this section we describe this transformation in detail.

Define the mean of a variable for a given individual as

$$\bar Y_i=\frac{1}{T_i}\sum_{t\in S_i}Y_{it}.$$

We call this the individual-specific mean since it is the mean for a given individual. Alternatively, some authors call this the time-average or time-mean since it is the average over the time periods.

Subtracting the individual-specific mean from the variable we obtain the deviations

$$\dot Y_{it}=Y_{it}-\bar Y_i.$$

This is known as the within transformation. We also refer to $\dot Y_{it}$ as the demeaned values or deviations from individual means. Some authors refer to $\dot Y_{it}$ as deviations from time means. What is important is that the demeaning has occurred at the individual level.

Some algebra may also be useful. We can write the individual-specific mean as $\bar Y_i=(1_i'1_i)^{-1}1_i'Y_i$. Stacking the observations for individual $i$ we can write the within transformation using the notation

$$\dot Y_i=Y_i-1_i\bar Y_i=Y_i-1_i(1_i'1_i)^{-1}1_i'Y_i=M_iY_i \tag{17.19}$$

where $M_i=I_i-1_i(1_i'1_i)^{-1}1_i'$ is the individual-specific demeaning operator. Notice that $M_i$ is an idempotent matrix.
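The properties of the demeaning operator $M_i$ (idempotency, and that it annihilates the vector of ones) can be checked numerically. An illustrative sketch:

```python
import numpy as np

Ti = 5
ones = np.ones((Ti, 1))
# Individual-specific demeaning operator M_i = I - 1 (1'1)^{-1} 1'
Mi = np.eye(Ti) - ones @ ones.T / Ti

y = np.arange(Ti, dtype=float)
ydot = Mi @ y                      # deviations from the individual mean
```

Multiplying any $T_i$-vector by $M_i$ subtracts its mean, applying $M_i$ twice changes nothing, and $M_i 1_i = 0$, which is exactly why the transformation eliminates $u_i$ below.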

Similarly for the regressors we define the individual-specific means and demeaned values:

$$\bar X_i=\frac{1}{T_i}\sum_{t\in S_i}X_{it},\qquad \dot X_{it}=X_{it}-\bar X_i,\qquad \dot X_i=M_iX_i.$$

We illustrate demeaning in Table 17.1. In the fourth and seventh columns we display the firm-specific means I¯i and Q¯i and in the fifth and eighth columns the demeaned values I˙it and Q˙it.

We can also define the full-sample within operator. Define $D=\operatorname{diag}\{1_{T_1},\ldots,1_{T_N}\}$ and $M_D=I_n-D(D'D)^{-1}D'$. Note that $M_D=\operatorname{diag}\{M_1,\ldots,M_N\}$. Thus

$$M_DY=\dot Y=\begin{pmatrix}\dot Y_1\\ \vdots\\ \dot Y_N\end{pmatrix},\qquad M_DX=\dot X=\begin{pmatrix}\dot X_1\\ \vdots\\ \dot X_N\end{pmatrix}.$$

Now apply these operations to equation (17.15). Taking individual-specific averages we obtain

$$\bar Y_i=\bar X_i'\beta+u_i+\bar\varepsilon_i$$

where $\bar\varepsilon_i=\frac{1}{T_i}\sum_{t\in S_i}\varepsilon_{it}$. Subtracting from (17.15) we obtain

$$\dot Y_{it}=\dot X_{it}'\beta+\dot\varepsilon_{it} \tag{17.21}$$

where $\dot\varepsilon_{it}=\varepsilon_{it}-\bar\varepsilon_i$. The individual effect $u_i$ has been eliminated!

We can alternatively write this in vector notation. Applying the demeaning operator $M_i$ to (17.16) we obtain

$$\dot Y_i=\dot X_i\beta+\dot\varepsilon_i. \tag{17.22}$$

The individual-effect ui is eliminated because Mi1i=0. Equation (17.22) is a vector version of (17.21).

The equation (17.21) is a linear equation in the transformed (demeaned) variables. As desired the individual effect $u_i$ has been eliminated. Consequently estimators constructed from (17.21) (or equivalently (17.22)) will be invariant to the values of $u_i$. This means that the endogeneity bias described in the previous section will be eliminated.

Another consequence, however, is that all time-invariant regressors are also eliminated. That is, if the original model (17.15) had included any regressors Xit=Xi which are constant over time for each individual then for these regressors the demeaned values are identically 0 . What this means is that if equation (17.21) is used to estimate β it will be impossible to estimate (or identify) a coefficient on any regressor which is time invariant. This is not a consequence of the estimation method but rather a consequence of the model assumptions. In other words, if the individual effect ui has no known structure then it is impossible to disentangle the effect of any time-invariant regressor Xi. The two have observationally equivalent effects and cannot be separately identified.

The within transformation can greatly reduce the variance of the regressors. This can be seen in Table 17.1 where you can see that the variation between the elements of the transformed variables I˙it and Q˙it is less than that of the untransformed variables, as much of the variation is captured by the firm-specific means.

It is not typically needed to directly program the within transformation, but if it is desired the following Stata commands easily do so.

Stata Commands for Within Transformation
x is the original variable
id is the group identifier
xdot is the within-transformed variable
egen xmean = mean(x), by(id)
gen xdot = x - xmean

17.9 Fixed Effects Estimator

Consider least squares applied to the demeaned equation (17.21) or equivalently (17.22). This is

$$\hat\beta_{\text{fe}}=\left(\sum_{i=1}^{N}\sum_{t\in S_i}\dot X_{it}\dot X_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\in S_i}\dot X_{it}\dot Y_{it}\right)=\left(\sum_{i=1}^{N}\dot X_i'\dot X_i\right)^{-1}\left(\sum_{i=1}^{N}\dot X_i'\dot Y_i\right)=\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iY_i\right).$$

This is known as the fixed-effects or within estimator of β. It is called the fixed-effects estimator because it is appropriate for the fixed effects model (17.15). It is called the within estimator because it is based on the variation of the data within each individual.

The above definition implicitly assumes that the matrix $\sum_{i=1}^{N}\dot X_i'\dot X_i$ is full rank. This requires that all components of $X_{it}$ have time variation for at least some individuals in the sample.
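A small simulation illustrates both the within transformation and the endogeneity bias it removes. This is an illustrative single-regressor design (all values hypothetical); the regressor is deliberately built to be correlated with $u_i$, so pooled least squares is biased while the within estimator is not.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 200, 5
beta = 1.5
u = rng.normal(size=N)                        # individual effects
# Regressor correlated with u: pooled OLS inherits omitted-variable bias.
X = u[:, None] + rng.normal(size=(N, T))
Y = beta * X + u[:, None] + rng.normal(size=(N, T))

# Within transformation: subtract individual-specific means, then least squares.
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (Xd * Yd).sum() / (Xd ** 2).sum()

beta_pool = (X * Y).sum() / (X ** 2).sum()    # contaminated by corr(X, u)
```

In this design the pooled slope converges to $\beta+\operatorname{cov}(X,u)/\operatorname{var}(X)=2.0$, while the within estimator converges to the true $\beta=1.5$.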

The fixed effects residuals are

$$\hat\varepsilon_{it}=\dot Y_{it}-\dot X_{it}'\hat\beta_{\text{fe}},\qquad \hat\varepsilon_i=\dot Y_i-\dot X_i\hat\beta_{\text{fe}}. \tag{17.23}$$

Let us describe some of the statistical properties of the estimator under strict mean independence (17.18). By linearity and the fact $M_i1_i=0$, we can write

$$\hat\beta_{\text{fe}}-\beta=\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_i\varepsilon_i\right).$$

Then (17.18) implies

$$E[\hat\beta_{\text{fe}}-\beta\mid X]=\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iE[\varepsilon_i\mid X_i]\right)=0.$$

Thus β^fe is unbiased for β under (17.18).

Let $\Sigma_i=E[\varepsilon_i\varepsilon_i'\mid X_i]$ denote the $T_i\times T_i$ conditional covariance matrix of the idiosyncratic errors. The variance of $\hat\beta_{\text{fe}}$ is

$$V_{\text{fe}}=\operatorname{var}[\hat\beta_{\text{fe}}\mid X]=\left(\sum_{i=1}^{N}\dot X_i'\dot X_i\right)^{-1}\left(\sum_{i=1}^{N}\dot X_i'\Sigma_i\dot X_i\right)\left(\sum_{i=1}^{N}\dot X_i'\dot X_i\right)^{-1} \tag{17.24}$$

This expression simplifies when the idiosyncratic errors are homoskedastic and serially uncorrelated:

$$E[\varepsilon_{it}^2\mid X_i]=\sigma_\varepsilon^2 \tag{17.25}$$
$$E[\varepsilon_{ij}\varepsilon_{it}\mid X_i]=0 \tag{17.26}$$

for all $j\neq t$. In this case, $\Sigma_i=I_i\sigma_\varepsilon^2$ and (17.24) simplifies to

$$V_{\text{fe}}^0=\sigma_\varepsilon^2\left(\sum_{i=1}^{N}\dot X_i'\dot X_i\right)^{-1}. \tag{17.27}$$

It is instructive to compare the variances of the fixed-effects and pooled estimators under (17.25)-(17.26) and the assumption that there is no individual-specific effect ($u_i=0$). In this case we see that

$$V_{\text{fe}}^0=\sigma_\varepsilon^2\left(\sum_{i=1}^{N}\dot X_i'\dot X_i\right)^{-1}\geq\sigma_\varepsilon^2\left(\sum_{i=1}^{N}X_i'X_i\right)^{-1}=V_{\text{pool}}.$$

The inequality holds since the demeaned variables X˙i have reduced variation relative to the original observations Xi. (See Exercise 17.28.) This shows the cost of using fixed effects relative to pooled estimation. The estimation variance increases due to reduced variation in the regressors. This reduction in efficiency is a necessary by-product of the robustness of the estimator to the individual effects ui.

17.10 Differenced Estimator

The within transformation is not the only transformation which eliminates the individual-specific effect. Another important transformation which does the same is first-differencing.

The first-differencing transformation is $\Delta Y_{it}=Y_{it}-Y_{it-1}$. This can be applied to all but the first observation (which is essentially lost). At the level of the individual this can be written as $\Delta Y_i=D_iY_i$ where $D_i$ is the $(T_i-1)\times T_i$ matrix differencing operator

$$D_i=\begin{pmatrix}-1&1&0&\cdots&0&0\\ 0&-1&1&\cdots&0&0\\ \vdots&&&\ddots&&\vdots\\ 0&0&0&\cdots&-1&1\end{pmatrix}.$$

Applying the transformation $\Delta$ to (17.15) or (17.16) we obtain $\Delta Y_{it}=\Delta X_{it}'\beta+\Delta\varepsilon_{it}$ or

$$\Delta Y_i=\Delta X_i\beta+\Delta\varepsilon_i. \tag{17.29}$$

We can see that the individual effect ui has been eliminated.

Least squares applied to the differenced equation (17.29) is

$$\hat\beta_{\Delta}=\left(\sum_{i=1}^{N}\sum_{t\geq 2}\Delta X_{it}\Delta X_{it}'\right)^{-1}\left(\sum_{i=1}^{N}\sum_{t\geq 2}\Delta X_{it}\Delta Y_{it}\right)=\left(\sum_{i=1}^{N}\Delta X_i'\Delta X_i\right)^{-1}\left(\sum_{i=1}^{N}\Delta X_i'\Delta Y_i\right)=\left(\sum_{i=1}^{N}X_i'D_i'D_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'D_i'D_iY_i\right). \tag{17.30}$$

(17.30) is called the differenced estimator. For $T=2$, $\hat\beta_{\Delta}$ equals the fixed effects estimator $\hat\beta_{\text{fe}}$. (See Exercise 17.6.) They differ, however, for $T>2$.
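The $T=2$ equivalence is easy to confirm numerically. An illustrative sketch with a scalar regressor (all values hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50
X = rng.normal(size=(N, 2))                   # T = 2 periods, scalar regressor
Y = 2.0 * X + rng.normal(size=(N, 1)) + rng.normal(size=(N, 2))

# First-difference estimator
dX = X[:, 1] - X[:, 0]
dY = Y[:, 1] - Y[:, 0]
beta_fd = (dX * dY).sum() / (dX ** 2).sum()

# Within (fixed effects) estimator
Xd = X - X.mean(axis=1, keepdims=True)
Yd = Y - Y.mean(axis=1, keepdims=True)
beta_fe = (Xd * Yd).sum() / (Xd ** 2).sum()
```

With $T=2$ the demeaned values are $\pm\frac{1}{2}$ times the first difference, so the two ratios coincide exactly (up to floating-point error).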

When the errors $\varepsilon_{it}$ are serially uncorrelated and homoskedastic then the error $\Delta\varepsilon_i=D_i\varepsilon_i$ in (17.29) has covariance matrix $H\sigma_\varepsilon^2$ where

$$H=D_iD_i'=\begin{pmatrix}2&-1&\cdots&0\\ -1&2&&\vdots\\ \vdots&&\ddots&-1\\ 0&\cdots&-1&2\end{pmatrix}.$$

We can reduce estimation variance by using GLS. When the errors εit are i.i.d. (serially uncorrelated and homoskedastic), this is

$$\tilde\beta_{\Delta}=\left(\sum_{i=1}^{N}\Delta X_i'H^{-1}\Delta X_i\right)^{-1}\left(\sum_{i=1}^{N}\Delta X_i'H^{-1}\Delta Y_i\right)=\left(\sum_{i=1}^{N}X_i'D_i'(D_iD_i')^{-1}D_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'D_i'(D_iD_i')^{-1}D_iY_i\right)=\left(\sum_{i=1}^{N}X_i'M_iX_i\right)^{-1}\left(\sum_{i=1}^{N}X_i'M_iY_i\right)$$

where the final equality holds because $D_i'(D_iD_i')^{-1}D_i=M_i$. Recall, the matrix $D_i$ is $(T_i-1)\times T_i$ with rank $T_i-1$ and its rows are orthogonal to the vector of ones $1_i$. This means $D_i'(D_iD_i')^{-1}D_i$ projects orthogonally to $1_i$ and thus equals the within transformation matrix. Hence $\tilde\beta_{\Delta}=\hat\beta_{\text{fe}}$, the fixed effects estimator!
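The key step, that $D_i'(D_iD_i')^{-1}D_i$ equals the within operator $M_i$, can be verified numerically. An illustrative sketch for $T_i=4$:

```python
import numpy as np

Ti = 4
# Differencing operator D_i: (Ti-1) x Ti with rows (-1, 1, 0, ...), (0, -1, 1, ...), ...
Di = np.zeros((Ti - 1, Ti))
for t in range(Ti - 1):
    Di[t, t], Di[t, t + 1] = -1.0, 1.0

P  = Di.T @ np.linalg.inv(Di @ Di.T) @ Di     # projection onto the rows of D_i
Mi = np.eye(Ti) - np.ones((Ti, Ti)) / Ti      # within (demeaning) operator
```

Both matrices project orthogonally to the vector of ones, and a direct comparison shows they are identical entry by entry.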

What we have shown is that under i.i.d. errors, GLS applied to the first-differenced equation precisely equals the fixed effects estimator. Since the Gauss-Markov theorem shows that GLS has lower variance than least squares, this means that the fixed effects estimator is more efficient than first differencing under the assumption that εit is i.i.d.

This argument extends to any other transformation which eliminates the fixed effect. GLS applied after such a transformation is equal to the fixed effects estimator and is more efficient than least squares applied after the same transformation under i.i.d. errors. This shows that the fixed effects estimator is Gauss-Markov efficient in the class of estimators which eliminate the fixed effect, under these assumptions.

17.11 Dummy Variables Regression

An alternative way to estimate the fixed effects model is by least squares of Yit on Xit and a full set of dummy variables, one for each individual in the sample. It turns out that this is algebraically equivalent to the within estimator.

To see this start with the error-component model without a regressor:

$$Y_{it}=u_i+\varepsilon_{it}. \tag{17.32}$$

Consider least squares estimation of the vector of fixed effects $u=(u_1,\ldots,u_N)'$. Since each fixed effect $u_i$ is an individual-specific mean and the least squares estimate of the intercept is the sample mean, it follows that the least squares estimate of $u_i$ is $\hat u_i=\bar Y_i$. The least squares residual is then $\hat\varepsilon_{it}=Y_{it}-\bar Y_i=\dot Y_{it}$, the within transformation. If you would prefer an algebraic argument, let $d_i$ be a vector of $N$ dummy variables where the $i$th element indicates the $i$th individual. Thus the $i$th element of $d_i$ is 1 and the remaining elements are zero. Notice that $u_i=d_i'u$ and (17.32) equals $Y_{it}=d_i'u+\varepsilon_{it}$. This is a regression with the regressors $d_i$ and coefficients $u$. We can also write this in vector notation at the level of the individual as $Y_i=1_id_i'u+\varepsilon_i$, or using full matrix notation as $Y=Du+\varepsilon$ where $D=\operatorname{diag}\{1_{T_1},\ldots,1_{T_N}\}$.

The least squares estimate of u is

$$\hat u=(D'D)^{-1}(D'Y)=\left\{(1_i'1_i)^{-1}1_i'Y_i\right\}_{i=1,\ldots,N}=\left\{\bar Y_i\right\}_{i=1,\ldots,N}.$$

The least squares residuals are

$$\hat\varepsilon=(I_n-D(D'D)^{-1}D')Y=\dot Y$$

as shown in (17.19). Thus the least squares residuals from the simple error-component model are the within transformed variables.

Now consider the error-component model with regressors, which can be written as

$$Y_{it}=X_{it}'\beta+d_i'u+\varepsilon_{it}$$

since $u_i=d_i'u$ as discussed above. In matrix notation

$$Y=X\beta+Du+\varepsilon. \tag{17.34}$$

We consider estimation of $(\beta,u)$ by least squares and write the estimates as $Y=X\hat\beta+D\hat u+\hat\varepsilon$. We call this the dummy variable estimator of the fixed effects model.

By the Frisch-Waugh-Lovell Theorem (Theorem 3.5) the dummy variable estimates $\hat\beta$ and residuals $\hat\varepsilon$ may be obtained by regressing the residuals from the regression of $Y$ on $D$ on the residuals from the regression of $X$ on $D$. We learned above that the residuals from a regression on $D$ are the within transformations. Thus $\hat\beta$ and $\hat\varepsilon$ may be obtained from least squares regression of the within-transformed $\dot Y$ on the within-transformed $\dot X$. This is exactly the fixed effects estimator $\hat\beta_{\text{fe}}$. Thus the dummy variable and fixed effects estimators of $\beta$ are identical.

This is sufficiently important that we state this result as a theorem.

Theorem 17.1 The fixed effects estimator of β algebraically equals the dummy variable estimator of β. The two estimators have the same residuals.

This may be the most important practical application of the Frisch-Waugh-Lovell Theorem. It shows that we can estimate the coefficients either by applying the within transformation or by inclusion of dummy variables (one for each individual in the sample). This is important because in some cases one approach is more convenient than the other and it is important to know that the two methods are algebraically equivalent.

When $N$ is large it is advisable to use the within transformation rather than the dummy variable approach. This is because the latter requires considerably more computer memory. To see this consider the matrix $D$ in (17.34) in the balanced case. It has $TN^2$ elements which must be created and stored in memory. When $N$ is large this can be excessive. For example, if $T=10$ and $N=10{,}000$, the matrix $D$ has one billion elements! Whether or not a package can technically handle a matrix of this dimension depends on several particulars (system RAM, operating system, package version), but even if it can execute the calculation the computation time is slow. Hence for fixed effects estimation with large $N$ it is recommended to use the within transformation rather than dummy variable regression.
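Theorem 17.1 can be confirmed numerically on a small panel: regressing on a full set of individual dummies gives the same coefficient on $X$ as the within estimator. An illustrative sketch (all values hypothetical; $N$ is kept small precisely because of the memory point made above):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, k = 30, 4, 2
n = N * T
ids = np.repeat(np.arange(N), T)
X = rng.normal(size=(n, k))
Y = X @ np.array([1.0, -1.0]) + np.repeat(rng.normal(size=N), T) + rng.normal(size=n)

# Dummy variable estimator: regress Y on [X, D], one dummy per individual.
D = (ids[:, None] == np.arange(N)[None, :]).astype(float)
coef, *_ = np.linalg.lstsq(np.hstack([X, D]), Y, rcond=None)
beta_dummy = coef[:k]

# Within estimator: demean Y and X by individual, then regress.
Xd, Yd = X.copy(), Y.copy()
for i in range(N):
    m = ids == i
    Xd[m] -= X[m].mean(axis=0)
    Yd[m] -= Y[m].mean()
beta_fe, *_ = np.linalg.lstsq(Xd, Yd, rcond=None)
```

The two coefficient vectors agree to machine precision, as the Frisch-Waugh-Lovell argument predicts.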

The dummy variable formulation may add insight about how the fixed effects estimator achieves invariance to the fixed effects. Given the regression equation (17.34) we can write the least squares estimator of β using the residual regression formula:

$$\hat\beta_{fe}=\left(X'M_DX\right)^{-1}\left(X'M_DY\right)=\left(X'M_DX\right)^{-1}\left(X'M_D\left(X\beta+Du+\varepsilon\right)\right)=\beta+\left(X'M_DX\right)^{-1}\left(X'M_D\varepsilon\right)\tag{17.35}$$

since $M_DD=0$. The expression (17.35) is free of the vector u and thus $\hat\beta_{fe}$ is invariant to u. This is another demonstration that the fixed effects estimator is invariant to the actual values of the fixed effects, and thus its statistical properties do not rely on assumptions about $u_i$.

17.12 Fixed Effects Covariance Matrix Estimation

First consider estimation of the classical covariance matrix Vfe0 as defined in (17.27). This is

$$\hat V_{fe}^0=\hat\sigma_\varepsilon^2\left(\dot X'\dot X\right)^{-1}\tag{17.36}$$

with

$$\hat\sigma_\varepsilon^2=\frac{1}{n-N-k}\sum_{i=1}^N\sum_{t\in S_i}\hat\varepsilon_{it}^2=\frac{1}{n-N-k}\sum_{i=1}^N\hat\varepsilon_i'\hat\varepsilon_i.\tag{17.37}$$

The N+k degree of freedom adjustment is motivated by the dummy variable representation. You can verify that σ^ε2 is unbiased for σε2 under assumptions (17.18), (17.25) and (17.26). See Exercise 17.8.

Notice that the assumptions (17.18), (17.25), and (17.26) are identical to (17.5)-(17.7) of Assumption 17.1. The assumptions (17.8)-(17.10) are not needed. Thus the fixed effect model weakens the random effects model by eliminating the assumptions on ui but retaining those on εit.

The classical covariance matrix estimator (17.36) for the fixed effects estimator is valid when the errors εit are homoskedastic and serially uncorrelated but is invalid otherwise. A covariance matrix estimator which allows εit to be heteroskedastic and serially correlated across t is the cluster-robust covariance matrix estimator, clustered by individual

$$\hat V_{fe}^{\mathrm{cluster}}=\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\hat\varepsilon_i\hat\varepsilon_i'\dot X_i\right)\left(\dot X'\dot X\right)^{-1}\tag{17.38}$$

where $\hat\varepsilon_i$ are the fixed effects residuals as defined in (17.23). The estimator (17.38) was first proposed by Arellano (1987). As in (4.55), $\hat V_{fe}^{\mathrm{cluster}}$ can be multiplied by a degree-of-freedom adjustment. The adjustment recommended by the theory of C. Hansen (2007) is

$$\hat V_{fe}^{\mathrm{cluster}}=\left(\frac{N}{N-1}\right)\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\hat\varepsilon_i\hat\varepsilon_i'\dot X_i\right)\left(\dot X'\dot X\right)^{-1}\tag{17.39}$$

and that corresponding to (4.55) is

$$\hat V_{fe}^{\mathrm{cluster}}=\left(\frac{n-1}{n-N-k}\right)\left(\frac{N}{N-1}\right)\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\hat\varepsilon_i\hat\varepsilon_i'\dot X_i\right)\left(\dot X'\dot X\right)^{-1}\tag{17.40}$$

These estimators are convenient because they are simple to apply and allow for unbalanced panels.

In typical micropanel applications N is very large and k is modest. Thus the adjustment in (17.39) is minor, while that in (17.40) is approximately $\bar T/(\bar T-1)$ where $\bar T=n/N$ is the average number of time periods per individual. When $\bar T$ is small this can be a very large adjustment. Hence the choice between (17.38), (17.39), and (17.40) can matter substantially.
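The three variants differ only by scalar factors, which the following sketch makes concrete (simulated data, NumPy; all variable names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, k = 200, 4, 2
n = N * T
id_ = np.repeat(np.arange(N), T)
X = rng.normal(size=(n, k))
Y = X @ np.array([1.0, 2.0]) + np.repeat(rng.normal(size=N), T) + rng.normal(size=n)

# Fixed effects estimator via the within transformation
Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])
Xd, Yd = X - Xbar[id_], Y - Ybar[id_]
beta_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
e = Yd - Xd @ beta_fe                       # fixed effects residuals

# Cluster-robust "meat": sum over individuals of (Xdot_i' e_i)(Xdot_i' e_i)'
meat = np.zeros((k, k))
for i in range(N):
    s = Xd[id_ == i].T @ e[id_ == i]
    meat += np.outer(s, s)
bread = np.linalg.inv(Xd.T @ Xd)

V_unadj = bread @ meat @ bread                       # (17.38), no adjustment
V_hansen = (N / (N - 1)) * V_unadj                   # (17.39), Hansen (2007)
V_adhoc = ((n - 1) / (n - N - k)) * V_hansen         # (17.40), ad hoc

# With T = 4 the extra factor in (17.40) is roughly T/(T-1), about 33% here.
```

Since the factors multiply the whole matrix, the ranking of the three standard errors is the same for every coefficient.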

To understand whether the degree-of-freedom adjustment in (17.40) is appropriate, consider the simplified setting where the residuals are constructed with the true $\beta$ but estimated fixed effects $u_i$. This is a useful approximation since the number of estimated slope coefficients $\beta$ is small relative to the sample size n. Then $\hat\varepsilon_i=\dot\varepsilon_i=M_i\varepsilon_i$, so $\dot X_i'\hat\varepsilon_i=\dot X_i'\varepsilon_i$ and (17.38) equals

$$\hat V_{fe}^{\mathrm{cluster}}=\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\varepsilon_i\varepsilon_i'\dot X_i\right)\left(\dot X'\dot X\right)^{-1}$$

which is the idealized estimator with the true errors rather than the residuals. Since $E\left[\varepsilon_i\varepsilon_i'\mid X_i\right]=\Sigma_i$ it follows that $E\left[\hat V_{fe}^{\mathrm{cluster}}\mid X\right]=V_{fe}$, so $\hat V_{fe}^{\mathrm{cluster}}$ is unbiased for $V_{fe}$! Thus no degree-of-freedom adjustment is required, despite the fact that N fixed effects have been estimated. While this analysis concerns the idealized case where the residuals are constructed with the true coefficients $\beta$, and so does not translate into a direct recommendation for the feasible estimator, it suggests that the strong ad hoc adjustment in (17.40) is unwarranted.

This (crude) analysis suggests that for the cluster-robust covariance estimator in fixed effects regression the adjustment (17.39) recommended by C. Hansen is the most appropriate. It is typically well approximated by the unadjusted estimator (17.38). Based on current theory there is no justification for the ad hoc adjustment (17.40). The main argument for the latter is that it produces the largest standard errors and is thus the most conservative choice.

In current practice the estimators (17.38) and (17.40) are the most commonly used covariance matrix estimators for fixed effects estimation.

In Sections 17.22 and 17.23 we discuss covariance matrix estimation under heteroskedasticity but no serial correlation.

To illustrate, in Table 17.2 we present the fixed effect regression estimates of the investment model (17.3) in the third column with cluster-robust standard errors. The trading indicator Ti and the industry dummies cannot be included as they are time-invariant. The point estimates are similar to the random effects estimates, though the coefficients on debt and cash flow increase in magnitude.

17.13 Fixed Effects Estimation in Stata

There are several methods to obtain the fixed effects estimator β^fe in Stata.

The first method is dummy variable regression. This can be obtained by the Stata regress command, for example reg y x i.id, cluster(id) where id is the group (individual) identifier. In most cases, as discussed in Section 17.11, this is not recommended due to the excessive computer memory requirements and slow computation. If this command is used it may be useful to suppress display of the full list of coefficient estimates. To do so, type quietly reg y x i.id, cluster(id) followed by estimates table, keep(x _cons) b se. The second command reports the coefficient(s) on x only, not those on the indicator variable id. (Other statistics can be reported as well.) The second method is to manually create the within-transformed variables as described in Section 17.8 and then use regress.

The third method is xtreg y x, fe which is specifically written for panel data. This estimates the slope coefficients using the partialling-out approach. The default covariance matrix estimator is the classical (17.36). The cluster-robust covariance matrix (17.38) can be obtained using the options vce(robust) or r.

The fourth method is areg y x, absorb(id). This command is an alternative implementation of partialling-out regression. The default covariance matrix estimator is the classical (17.36). The cluster-robust covariance matrix estimator (17.40) can be obtained using the cluster(id) option. The heteroskedasticity-robust covariance matrix is obtained when r or vce(robust) is specified, but this is not recommended unless Ti is large, as will be discussed in Section 17.22.

An important difference between the Stata xtreg and areg commands is that they implement different cluster-robust covariance matrix estimators: (17.38) in the case of xtreg and (17.40) in the case of areg. As discussed in the previous section the adjustment used by areg is ad hoc and not well-justified but produces the largest and hence most conservative standard errors.

Another difference between the commands is how they report the equation R2. This difference can be huge and stems from the fact that they estimate distinct population counterparts. Full dummy variable regression and the areg command calculate R2 the same way: as the squared correlation between Yit and the fitted regression with all predictors including the individual dummy variables. The xtreg fe command reports three values for R2: within, between, and overall. The “within” R2 is identical to what is obtained from a second stage regression using the within-transformed variables (the second method described above). The “overall” R2 is the squared correlation between Yit and the fitted regression excluding the individual effects.

Which R2 should be reported? The answer depends on the baseline model before regressors are added. If we view the baseline as an individual-specific mean, then the within calculation is appropriate. If the baseline is a single mean for all observations then the full regression (areg) calculation is appropriate. The latter (areg) calculation is typically much higher than the within calculation, as the fixed effects typically “explain” a large portion of the variance. In any event as there is not a single definition of R2 it is important to be explicit about the method if it is reported.

In current econometric practice both xtreg and areg are used, though areg appears to be the more popular choice. Since the latter typically produces a much higher value of R2, reported R2 values should be viewed skeptically unless their calculation method is documented by the author.

17.14 Between Estimator

The between estimator is calculated from the individual-mean equation (17.20)

$$\bar Y_i=\bar X_i'\beta+u_i+\bar\varepsilon_i.\tag{17.41}$$

Estimation can be done at the level of individuals or at the level of observations. Least squares applied to (17.41) at the level of the N individuals is

$$\hat\beta_{be}=\left(\sum_{i=1}^N\bar X_i\bar X_i'\right)^{-1}\left(\sum_{i=1}^N\bar X_i\bar Y_i\right).$$

Least squares applied to (17.41) at the level of observations is

$$\tilde\beta_{be}=\left(\sum_{i=1}^N\sum_{t\in S_i}\bar X_i\bar X_i'\right)^{-1}\left(\sum_{i=1}^N\sum_{t\in S_i}\bar X_i\bar Y_i\right)=\left(\sum_{i=1}^NT_i\bar X_i\bar X_i'\right)^{-1}\left(\sum_{i=1}^NT_i\bar X_i\bar Y_i\right).$$

In balanced panels $\tilde\beta_{be}=\hat\beta_{be}$, but they differ on unbalanced panels. $\tilde\beta_{be}$ equals weighted least squares applied at the level of individuals with weight $T_i$.
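Both between estimators are simple functions of the individual means. A sketch (simulated balanced panel, NumPy, hypothetical names) confirming that they coincide when the panel is balanced:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T, k = 100, 5, 2
id_ = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, k))
Y = X @ np.array([0.5, 1.5]) + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)

# Individual means (one row per individual)
Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])

# Between estimator at the level of individuals
beta_be = np.linalg.lstsq(Xbar, Ybar, rcond=None)[0]

# Between estimator at the level of observations = WLS with weight T_i
Ti = np.full(N, T, dtype=float)
TX = Ti[:, None] * Xbar
beta_be_wls = np.linalg.solve(TX.T @ Xbar, TX.T @ Ybar)

# With a balanced panel (all T_i equal) the two estimators coincide.
assert np.allclose(beta_be, beta_be_wls)
```

On an unbalanced panel, replacing `np.full(N, T)` with the actual counts per individual would make the two estimates differ.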

Under the random effects assumptions (Assumption 17.1) β^be  is unbiased for β and has variance

$$V_{be}=\mathrm{var}\left[\hat\beta_{be}\mid X\right]=\left(\sum_{i=1}^N\bar X_i\bar X_i'\right)^{-1}\left(\sum_{i=1}^N\bar X_i\bar X_i'\sigma_i^2\right)\left(\sum_{i=1}^N\bar X_i\bar X_i'\right)^{-1}$$

where

$$\sigma_i^2=\mathrm{var}\left[u_i+\bar\varepsilon_i\right]=\sigma_u^2+\frac{\sigma_\varepsilon^2}{T_i}$$

is the variance of the error in (17.41). When the panel is balanced the variance formula simplifies to

$$V_{be}=\mathrm{var}\left[\hat\beta_{be}\mid X\right]=\left(\sum_{i=1}^N\bar X_i\bar X_i'\right)^{-1}\left(\sigma_u^2+\frac{\sigma_\varepsilon^2}{T}\right).$$

Under the random effects assumption the between estimator β^be  is unbiased for β but is less efficient than the random effects estimator β^gls . Consequently there seems little direct use for the between estimator in linear panel data applications.

Instead, its primary application is to construct an estimate of σu2. First, consider estimation of

$$\sigma_b^2=\frac{1}{N}\sum_{i=1}^N\sigma_i^2=\sigma_u^2+\frac{1}{N}\sum_{i=1}^N\frac{\sigma_\varepsilon^2}{T_i}=\sigma_u^2+\frac{\sigma_\varepsilon^2}{\bar T}$$

where $\bar T=N\left(\sum_{i=1}^NT_i^{-1}\right)^{-1}$ is the harmonic mean of $T_i$. (In the case of a balanced panel $\bar T=T$.) A natural estimator of $\sigma_b^2$ is

$$\hat\sigma_b^2=\frac{1}{N-k}\sum_{i=1}^N\hat e_{bi}^2\tag{17.42}$$

where $\hat e_{bi}=\bar Y_i-\bar X_i'\hat\beta_{be}$ are the between residuals. (Either $\hat\beta_{be}$ or $\tilde\beta_{be}$ can be used.)

From the relation $\sigma_b^2=\sigma_u^2+\sigma_\varepsilon^2/\bar T$ and (17.42) we can deduce an estimator of $\sigma_u^2$. We have already described an estimator $\hat\sigma_\varepsilon^2$ of $\sigma_\varepsilon^2$ in (17.37) for the fixed effects model. Since the fixed effects model holds under weaker conditions than the random effects model, $\hat\sigma_\varepsilon^2$ is valid for the latter as well. This suggests the following estimator of $\sigma_u^2$

$$\hat\sigma_u^2=\hat\sigma_b^2-\frac{\hat\sigma_\varepsilon^2}{\bar T}.\tag{17.43}$$

To summarize: the fixed effects estimator is used to construct $\hat\sigma_\varepsilon^2$, the between estimator to construct $\hat\sigma_b^2$, and $\hat\sigma_u^2$ is constructed from the two.

It is possible for (17.43) to be negative. It is typical to use the constrained estimator

$$\hat\sigma_u^2=\max\left[0,\ \hat\sigma_b^2-\frac{\hat\sigma_\varepsilon^2}{\bar T}\right].\tag{17.44}$$

Estimator (17.44) is the most common estimator of $\sigma_u^2$ in the random effects model.
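The construction of $\hat\sigma_u^2$ can be sketched end-to-end. The simulation below (NumPy, unbalanced panel, all names hypothetical) generates data with known $\sigma_u^2=1$ and $\sigma_\varepsilon^2=0.49$ and recovers both components via (17.37), (17.42), and (17.44):

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 500, 1
Ti = rng.integers(3, 8, size=N)                 # unbalanced: T_i in {3,...,7}
id_ = np.repeat(np.arange(N), Ti)
n = id_.size
sigma_u, sigma_e = 1.0, 0.7                     # true standard deviations
X = rng.normal(size=(n, k))
Y = X @ np.array([2.0]) + rng.normal(scale=sigma_u, size=N)[id_] \
    + rng.normal(scale=sigma_e, size=n)

# sigma_e^2 from the fixed effects (within) residuals, n - N - k dof
Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])
Xd, Yd = X - Xbar[id_], Y - Ybar[id_]
beta_fe = np.linalg.lstsq(Xd, Yd, rcond=None)[0]
sig2_e = np.sum((Yd - Xd @ beta_fe) ** 2) / (n - N - k)

# sigma_b^2 from the between residuals, N - k dof
beta_be = np.linalg.lstsq(Xbar, Ybar, rcond=None)[0]
sig2_b = np.sum((Ybar - Xbar @ beta_be) ** 2) / (N - k)

# sigma_u^2 via (17.44), with Tbar the harmonic mean of the T_i
Tbar = N / np.sum(1.0 / Ti)
sig2_u = max(0.0, sig2_b - sig2_e / Tbar)
```

With these sample sizes the recovered `sig2_e` and `sig2_u` should land close to the true 0.49 and 1.0; the `max` with zero mirrors the constraint in (17.44).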

The between estimator $\hat\beta_{be}$ can be obtained using the Stata command xtreg y x, be. The estimator $\tilde\beta_{be}$ can be obtained by xtreg y x, be wls.

17.15 Feasible GLS

The random effects estimator can be written as

$$\hat\beta_{re}=\left(\sum_{i=1}^NX_i'\Omega_i^{-1}X_i\right)^{-1}\left(\sum_{i=1}^NX_i'\Omega_i^{-1}Y_i\right)=\left(\sum_{i=1}^N\tilde X_i'\tilde X_i\right)^{-1}\left(\sum_{i=1}^N\tilde X_i'\tilde Y_i\right)\tag{17.45}$$

where $\tilde X_i=\Omega_i^{-1/2}X_i$ and $\tilde Y_i=\Omega_i^{-1/2}Y_i$. It is instructive to study these transformations.

Define $P_i=\mathbf 1_i\left(\mathbf 1_i'\mathbf 1_i\right)^{-1}\mathbf 1_i'$ so that $M_i=I_i-P_i$. Thus while $M_i$ is the within operator, $P_i$ can be called the individual-mean operator since $P_iY_i=\mathbf 1_i\bar Y_i$. We can write

$$\Omega_i=I_i+\mathbf 1_i\mathbf 1_i'\frac{\sigma_u^2}{\sigma_\varepsilon^2}=I_i+T_i\frac{\sigma_u^2}{\sigma_\varepsilon^2}P_i=M_i+\rho_i^{-2}P_i$$

where

$$\rho_i=\frac{\sigma_\varepsilon}{\sqrt{\sigma_\varepsilon^2+T_i\sigma_u^2}}.$$

Since the matrices $M_i$ and $P_i$ are idempotent and orthogonal we find that $\Omega_i^{-1}=M_i+\rho_i^2P_i$ and

$$\Omega_i^{-1/2}=M_i+\rho_iP_i=I_i-\left(1-\rho_i\right)P_i.$$

Therefore the transformation used by the GLS estimator is

$$\tilde Y_i=\left(I_i-\left(1-\rho_i\right)P_i\right)Y_i=Y_i-\left(1-\rho_i\right)\mathbf 1_i\bar Y_i$$

which is a partial within transformation.
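The spectral algebra behind the expressions for $\Omega_i^{-1}$ and $\Omega_i^{-1/2}$ can be verified numerically in a few lines (NumPy; the parameter values are arbitrary):

```python
import numpy as np

T_i = 4
sig_u2, sig_e2 = 2.0, 1.0
one = np.ones((T_i, 1))
I = np.eye(T_i)
P = one @ one.T / T_i                      # individual-mean operator P_i
M = I - P                                  # within operator M_i
Omega = I + one @ one.T * (sig_u2 / sig_e2)
rho = np.sqrt(sig_e2 / (sig_e2 + T_i * sig_u2))

# Omega_i = M_i + rho^{-2} P_i, hence Omega_i^{-1} = M_i + rho^2 P_i
assert np.allclose(Omega, M + P / rho**2)
assert np.allclose(np.linalg.inv(Omega), M + rho**2 * P)

# Omega_i^{-1/2} = I_i - (1 - rho) P_i squares to Omega_i^{-1}
half = I - (1 - rho) * P
assert np.allclose(half @ half, np.linalg.inv(Omega))
```

The check works because $M_i$ and $P_i$ are idempotent and orthogonal, so powers of $\Omega_i$ act separately on the two subspaces.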

The transformation as written depends on ρi which is unknown. It can be replaced by the estimator

$$\hat\rho_i=\frac{\hat\sigma_\varepsilon}{\sqrt{\hat\sigma_\varepsilon^2+T_i\hat\sigma_u^2}}$$

where the estimators σ^ε2 and σ^u2 are given in (17.37) and (17.44). We obtain the feasible transformations

$$\tilde Y_i=Y_i-\left(1-\hat\rho_i\right)\mathbf 1_i\bar Y_i\tag{17.49}$$

and

$$\tilde X_i=X_i-\left(1-\hat\rho_i\right)\mathbf 1_i\bar X_i'.\tag{17.50}$$

The feasible random effects estimator is (17.45) using (17.49) and (17.50).

In the previous section we noted that it is possible for $\hat\sigma_u^2=0$. In this case $\hat\rho_i=1$ and $\hat\beta_{re}=\hat\beta_{pool}$.

What this shows is the following. The random effects estimator (17.45) is least squares applied to the transformed variables $\tilde X_i$ and $\tilde Y_i$ defined in (17.50) and (17.49). When $\hat\rho_i=0$ these are the within transformations, so $\tilde X_i=\dot X_i$, $\tilde Y_i=\dot Y_i$, and $\hat\beta_{re}=\hat\beta_{fe}$ is the fixed effects estimator. When $\hat\rho_i=1$ the data are untransformed, $\tilde X_i=X_i$, $\tilde Y_i=Y_i$, and $\hat\beta_{re}=\hat\beta_{pool}$ is the pooled estimator. In general, $\tilde X_i$ and $\tilde Y_i$ can be viewed as partial within transformations.

Recalling the definition $\hat\rho_i=\hat\sigma_\varepsilon/\sqrt{\hat\sigma_\varepsilon^2+T_i\hat\sigma_u^2}$, we see that when the idiosyncratic error variance $\hat\sigma_\varepsilon^2$ is large relative to $T_i\hat\sigma_u^2$ then $\hat\rho_i\approx1$ and $\hat\beta_{re}\approx\hat\beta_{pool}$. Thus when the variance estimates suggest that the individual effect is relatively small, the random effects estimator is close to the pooled estimator. On the other hand, when the individual effect variance $\hat\sigma_u^2$ is large relative to $\hat\sigma_\varepsilon^2$ then $\hat\rho_i\approx0$ and $\hat\beta_{re}\approx\hat\beta_{fe}$. Thus when the variance estimates suggest that the individual effect is relatively large, the random effects estimator is close to the fixed effects estimator.
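The continuum between the pooled and fixed effects estimators is easy to see in code. This sketch (simulated data, NumPy, hypothetical names) applies the partial within transformation for a given ρ and checks the two endpoint cases:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, k = 80, 5, 2
id_ = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, k))
Y = X @ np.array([1.0, 1.0]) + np.repeat(rng.normal(size=N), T) + rng.normal(size=N * T)

Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])[id_]
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])[id_]

def partial_within(rho):
    """Quasi-demean with coefficient rho, then least squares."""
    Xt = X - (1 - rho) * Xbar
    Yt = Y - (1 - rho) * Ybar
    return np.linalg.lstsq(Xt, Yt, rcond=None)[0]

beta_fe = partial_within(0.0)       # rho = 0: full demeaning -> fixed effects
beta_pool = partial_within(1.0)     # rho = 1: no transformation -> pooled OLS
assert np.allclose(beta_pool, np.linalg.lstsq(X, Y, rcond=None)[0])
```

Intermediate values of ρ, such as the feasible $\hat\rho_i$ of (17.48), produce estimates between these two extremes.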

17.16 Intercept in Fixed Effects Regression

The fixed effects estimator does not apply to any regressor which is time-invariant for all individuals. This includes an intercept. Yet some authors and packages (e.g. Amemiya (1971) and xtreg in Stata) report an intercept. To see how to construct an estimator of an intercept, take the error-components regression equation with an explicit intercept added

$$Y_{it}=\alpha+X_{it}'\beta+u_i+\varepsilon_{it}.$$

We have already discussed estimation of β by β^fe. Replacing β in this equation with β^fe and then estimating α by least squares, we obtain

$$\hat\alpha_{fe}=\bar Y-\bar X'\hat\beta_{fe}$$

where Y¯ and X¯ are averages from the full sample. This is the estimator reported by xtreg.

17.17 Estimation of Fixed Effects

For most applications researchers are interested in the coefficients $\beta$, not the fixed effects $u_i$. But in some cases the fixed effects themselves are of interest. This arises when we want to measure the distribution of $u_i$ to understand its heterogeneity. It also arises in the context of prediction. As discussed in Section 17.11 the fixed effects estimate $\hat u$ is obtained by least squares applied to the regression (17.33). To find the solution, replace $\beta$ in (17.33) with the least squares minimizer $\hat\beta_{fe}$ and apply least squares. Since $u_i$ is an individual-specific intercept the solution is

$$\hat u_i=\frac{1}{T_i}\sum_{t\in S_i}\left(Y_{it}-X_{it}'\hat\beta_{fe}\right)=\bar Y_i-\bar X_i'\hat\beta_{fe}.\tag{17.51}$$

Alternatively, using (17.34) this is

$$\hat u=\left(D'D\right)^{-1}D'\left(Y-X\hat\beta_{fe}\right)=\mathrm{diag}\left\{T_i^{-1}\right\}_{i=1}^N\begin{pmatrix}\mathbf 1_1'\left(Y_1-X_1\hat\beta_{fe}\right)\\\vdots\\\mathbf 1_N'\left(Y_N-X_N\hat\beta_{fe}\right)\end{pmatrix}=\begin{pmatrix}\bar Y_1-\bar X_1'\hat\beta_{fe}\\\vdots\\\bar Y_N-\bar X_N'\hat\beta_{fe}\end{pmatrix}=\left(\hat u_1,\dots,\hat u_N\right)'.$$

Thus the least squares estimates of the fixed effects can be obtained from the individual-specific means and do not require a regression with $N+k$ regressors.
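The shortcut in (17.51) — fixed effect estimates from individual means rather than a regression with N+k regressors — can be verified against the dummy variable coefficients (simulated data, NumPy, hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(5)
N, T, k = 30, 6, 1
id_ = np.repeat(np.arange(N), T)
X = rng.normal(size=(N * T, k))
Y = X @ np.array([1.0]) + rng.normal(size=N)[id_] + rng.normal(size=N * T)

# Within estimator of beta
Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])
beta_fe = np.linalg.lstsq(X - Xbar[id_], Y - Ybar[id_], rcond=None)[0]

# Fixed effect estimates from individual means, as in (17.51)
u_hat = Ybar - Xbar @ beta_fe

# Identical to the coefficients on the N dummies in the long regression
D = np.zeros((N * T, N))
D[np.arange(N * T), id_] = 1.0
coef = np.linalg.lstsq(np.hstack([X, D]), Y, rcond=None)[0]
assert np.allclose(u_hat, coef[k:])
```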

If an intercept has been estimated (as discussed in the previous section) it should be subtracted from (17.51). In this case the estimated fixed effects are

$$\hat u_i=\bar Y_i-\bar X_i'\hat\beta_{fe}-\hat\alpha_{fe}.\tag{17.52}$$

With either estimator, when the number of time series observations $T_i$ is small, $\hat u_i$ will be an imprecise estimate of $u_i$. Thus calculations based on $\hat u_i$ should be interpreted cautiously.

The fixed effects estimates (17.52) may be obtained in Stata after xtreg, fe using the predict u command, or after areg using the predict d command.

17.18 GMM Interpretation of Fixed Effects

We can also interpret the fixed effects estimator through the generalized method of moments.

Take the fixed effects model after applying the within transformation (17.21). We can view this as a system of T equations, one for each time period t. This is a multivariate regression model. Using the notation of Chapter 11 define the T×kT regressor matrix

$$X_i=\begin{pmatrix}\dot X_{i1}'&0&\cdots&0\\0&\dot X_{i2}'&&\vdots\\\vdots&&\ddots&0\\0&\cdots&0&\dot X_{iT}'\end{pmatrix}.$$

If we treat each time period as a separate equation we have the kT moment conditions

$$E\left[X_i'\left(\dot Y_i-\dot X_i\beta\right)\right]=0.$$

This is an overidentified system of equations when $T\geq3$ as there are $k$ coefficients and $kT$ moments. (However, the moments are collinear due to the within transformation; there are $k(T-1)$ effective moments.) Interpreting this model in the context of multivariate regression, overidentification is achieved by the restriction that the coefficient vector $\beta$ is constant across time periods.

This model can be interpreted as a regression of Y˙i on X˙i using the instruments Xi. The 2SLS estimator using matrix notation is

$$\hat\beta=\left(\left(\dot X'X\right)\left(X'X\right)^{-1}\left(X'\dot X\right)\right)^{-1}\left(\left(\dot X'X\right)\left(X'X\right)^{-1}\left(X'\dot Y\right)\right).$$

Notice that

$$X'X=\sum_{i=1}^n\begin{pmatrix}\dot X_{i1}\dot X_{i1}'&0&\cdots&0\\0&\dot X_{i2}\dot X_{i2}'&&\vdots\\\vdots&&\ddots&0\\0&\cdots&0&\dot X_{iT}\dot X_{iT}'\end{pmatrix},\qquad X'\dot X=\begin{pmatrix}\sum_{i=1}^n\dot X_{i1}\dot X_{i1}'\\\vdots\\\sum_{i=1}^n\dot X_{iT}\dot X_{iT}'\end{pmatrix}$$

and

$$X'\dot Y=\begin{pmatrix}\sum_{i=1}^n\dot X_{i1}\dot Y_{i1}\\\vdots\\\sum_{i=1}^n\dot X_{iT}\dot Y_{iT}\end{pmatrix}.$$

Thus the 2SLS estimator simplifies to

$$\hat\beta_{2sls}=\left(\sum_{t=1}^T\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)^{-1}\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)\right)^{-1}\left(\sum_{t=1}^T\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)^{-1}\left(\sum_{i=1}^n\dot X_{it}\dot Y_{it}\right)\right)=\left(\sum_{t=1}^T\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)^{-1}\left(\sum_{t=1}^T\sum_{i=1}^n\dot X_{it}\dot Y_{it}\right)=\hat\beta_{fe},$$

the fixed effects estimator!

This shows that if we treat each time period as a separate equation with its own moment conditions, so that the system is overidentified, and then estimate by GMM using the 2SLS weight matrix, the resulting GMM estimator equals the simple fixed effects estimator. The additional moment conditions produce no change.
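The 2SLS equivalence can be checked directly by building the period-by-period instrument matrix (simulated balanced panel, NumPy, hypothetical names):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T, k = 60, 4, 2
n_obs = N * T
id_ = np.repeat(np.arange(N), T)
t_ = np.tile(np.arange(T), N)
X = rng.normal(size=(n_obs, k))
Y = X @ np.array([1.0, -1.0]) + rng.normal(size=N)[id_] + rng.normal(size=n_obs)

# Within transforms
Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])
Xd, Yd = X - Xbar[id_], Y - Ybar[id_]

# Instrument matrix: the period-t block of row (i,t) holds Xdot_it (kT columns)
Z = np.zeros((n_obs, k * T))
for r in range(n_obs):
    Z[r, t_[r] * k:(t_[r] + 1) * k] = Xd[r]

# 2SLS with these instruments (pinv guards against collinear moments)
A = Xd.T @ Z @ np.linalg.pinv(Z.T @ Z)
beta_2sls = np.linalg.solve(A @ Z.T @ Xd, A @ Z.T @ Yd)

# ...equals the fixed effects estimator, as derived in the text
beta_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
assert np.allclose(beta_2sls, beta_fe)
```

Because $Z'Z$ is block diagonal across time periods, the 2SLS quadratic form telescopes into the simple within moment matrix, exactly as in the displayed derivation.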

The 2SLS estimator is the appropriate GMM estimator when the equation error is serially uncorrelated and homoskedastic. If we use a two-step efficient weight matrix which allows for heteroskedasticity and serial correlation the GMM estimator is

$$\hat\beta_{gmm}=\left(\sum_{t=1}^T\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\hat e_{it}^2\right)^{-1}\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)\right)^{-1}\left(\sum_{t=1}^T\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\right)\left(\sum_{i=1}^n\dot X_{it}\dot X_{it}'\hat e_{it}^2\right)^{-1}\left(\sum_{i=1}^n\dot X_{it}\dot Y_{it}\right)\right)$$

where e^it are the fixed effects residuals.

Notationally, this GMM estimator has been written for a balanced panel. For an unbalanced panel the sums over i need to be replaced by sums over individuals observed during time period t. Otherwise no changes need to be made.

17.19 Identification in the Fixed Effects Model

The identification of the slope coefficient β in fixed effects regression is similar to that in conventional regression but somewhat more nuanced.

It is most useful to consider the within-transformed equation, which can be written as $\dot Y_{it}=\dot X_{it}'\beta+\dot\varepsilon_{it}$ or $\dot Y_i=\dot X_i\beta+\dot\varepsilon_i$.

From regression theory we know that the coefficient β is the linear effect of X˙it on Y˙it. The variable X˙it is the deviation of the regressor from its individual-specific mean and similarly for Y˙it. Thus the fixed effects model does not identify the effect of the average level of Xit on the average level of Yit, but rather the effect of the deviations in Xit on Yit.

In any given sample the fixed effects estimator is only defined if $\sum_{i=1}^N\dot X_i'\dot X_i$ is full rank. The population analog (when individuals are i.i.d.) is

$$E\left[\dot X_i'\dot X_i\right]>0.\tag{17.54}$$

Equation (17.54) is the identification condition for the fixed effects estimator. It requires that the regressor matrix is full rank in expectation after application of the within transformation. The regressors cannot contain any variable which has no time-variation at the individual level, nor a set of regressors whose time-variation at the individual level is collinear.

17.20 Asymptotic Distribution of Fixed Effects Estimator

In this section we present an asymptotic distribution theory for the fixed effects estimator in balanced panels. Unbalanced panels are considered in the following section.

We use the following assumptions.

Assumption 17.2

  1. $Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it}$ for $i=1,\dots,N$ and $t=1,\dots,T$ with $T\geq2$.

  2. The variables $(\varepsilon_i,X_i)$, $i=1,\dots,N$, are independent and identically distributed.

  3. $E[X_{is}\varepsilon_{it}]=0$ for all $s=1,\dots,T$.

  4. $Q_T=E[\dot X_i'\dot X_i]>0$.

  5. $E[\varepsilon_{it}^4]<\infty$.

  6. $E\|X_{it}\|^4<\infty$.

Given Assumption 17.2 we can establish asymptotic normality for β^fe.

Theorem 17.2 Under Assumption 17.2, as $N\to\infty$, $\sqrt N\left(\hat\beta_{fe}-\beta\right)\xrightarrow{d}\mathrm N\left(0,V_\beta\right)$ where $V_\beta=Q_T^{-1}\Omega_TQ_T^{-1}$ and $\Omega_T=E\left[\dot X_i'\varepsilon_i\varepsilon_i'\dot X_i\right]$.

This asymptotic distribution is derived as the number of individuals N diverges to infinity while the number of time periods T is held fixed. Therefore the normalization is $\sqrt N$ rather than $\sqrt n$ (though either could be used since T is fixed). This approximation is appropriate for the context of a large number of individuals. We could alternatively derive an approximation for the case where both N and T diverge to infinity but this would not be a stronger result. One way of thinking about this is that Theorem 17.2 does not require T to be large.

Theorem 17.2 may appear standard given our arsenal of asymptotic theory, but in a fundamental sense it is quite different from any other result we have introduced. Fixed effects regression effectively estimates $N+k$ coefficients: the k slope coefficients $\beta$ plus the N fixed effects u. The theory specifies that $N\to\infty$, so the number of estimated parameters diverges to infinity at the same rate as the sample size, yet the estimator obtains a conventional mean-zero sandwich-form asymptotic distribution. In this sense Theorem 17.2 is new and special.

We now discuss the assumptions.

Assumption 17.2.2 states that the observations are independent across individuals i. This is commonly used for panel data asymptotic theory. An important implied restriction is that the regressors must exclude any serially correlated aggregate time series variation. Assumption 17.2.3 imposes that $X_{it}$ is strictly exogenous for $\varepsilon_{it}$. This is stronger than simple projection but weaker than strict mean independence (17.18). It does not impose any condition on the individual-specific effects $u_i$.

Assumption 17.2.4 is the identification condition discussed in the previous section.

Assumptions 17.2.5 and 17.2.6 are needed for the central limit theorem.

We now prove Theorem 17.2. The assumptions imply that the variables (X˙i,εi) are i.i.d. across i and have finite fourth moments. Thus by the WLLN

$$\frac{1}{N}\sum_{i=1}^N\dot X_i'\dot X_i\xrightarrow{p}E\left[\dot X_i'\dot X_i\right]=Q_T.$$

Assumption 17.2.3 implies

$$E\left[\dot X_i'\varepsilon_i\right]=\sum_{t=1}^TE\left[\dot X_{it}\varepsilon_{it}\right]=\sum_{t=1}^TE\left[X_{it}\varepsilon_{it}\right]-\frac{1}{T}\sum_{t=1}^T\sum_{j=1}^TE\left[X_{ij}\varepsilon_{it}\right]=0$$

so they are mean zero. Assumptions 17.2.5 and 17.2.6 imply that X˙iεi has a finite covariance matrix ΩT. The assumptions for the CLT (Theorem 6.3) hold, thus

$$\frac{1}{\sqrt N}\sum_{i=1}^N\dot X_i'\varepsilon_i\xrightarrow{d}\mathrm N\left(0,\Omega_T\right).$$

Together we find

$$\sqrt N\left(\hat\beta_{fe}-\beta\right)=\left(\frac{1}{N}\sum_{i=1}^N\dot X_i'\dot X_i\right)^{-1}\left(\frac{1}{\sqrt N}\sum_{i=1}^N\dot X_i'\varepsilon_i\right)\xrightarrow{d}Q_T^{-1}\,\mathrm N\left(0,\Omega_T\right)=\mathrm N\left(0,V_\beta\right)$$

as stated.

17.21 Asymptotic Distribution for Unbalanced Panels

In this section we extend the theory of the previous section to cover unbalanced panels under random selection. Our presentation is built on Section 17.1 of Wooldridge (2010).

Think of an unbalanced panel as a shortened version of an idealized balanced panel, where the shortening arises from observations "missing" through random selection. Thus suppose that the underlying (potentially latent) variables are $Y_i=(Y_{i1},\dots,Y_{iT})'$ and $X_i=(X_{i1},\dots,X_{iT})'$. Let $s_i=(s_{i1},\dots,s_{iT})'$ be a vector of selection indicators, meaning that $s_{it}=1$ if time period t is observed for individual i and $s_{it}=0$ otherwise. Then we can describe the estimators algebraically as follows.

Let $S_i=\mathrm{diag}(s_i)$ and $M_i=S_i-s_i\left(s_i's_i\right)^{-1}s_i'$, which is idempotent. The within transformations can be written as $\dot Y_i=M_iY_i$ and $\dot X_i=M_iX_i$. They have the property that if $s_{it}=0$ (so that time period t is missing) then the $t$th element of $\dot Y_i$ and the $t$th row of $\dot X_i$ are all zeros. The missing observations have been replaced by zeros. Consequently, they do not appear in matrix products and sums.
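A quick numerical check of the selection-based within operator (NumPy; the observation pattern and values are arbitrary):

```python
import numpy as np

T = 5
s = np.array([1.0, 0.0, 1.0, 1.0, 0.0])     # periods 2 and 5 unobserved
S = np.diag(s)
M = S - np.outer(s, s) / s.sum()            # M_i = S_i - s_i (s_i's_i)^{-1} s_i'
assert np.allclose(M @ M, M)                # idempotent

y = np.array([3.0, 9.0, 5.0, 7.0, 9.0])
ydot = M @ y
assert np.allclose(ydot[s == 0], 0.0)       # missing rows zeroed out
# Observed rows are demeaned over the observed periods only
assert np.allclose(ydot[s == 1], y[s == 1] - y[s == 1].mean())
```

Because the missing rows are exactly zero, products such as $\dot X_i'\dot X_i$ automatically sum over the observed periods only.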

The fixed effects estimator of β based on the observed sample is

$$\hat\beta_{fe}=\left(\sum_{i=1}^N\dot X_i'\dot X_i\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\dot Y_i\right).$$

Centered and normalized,

$$\sqrt N\left(\hat\beta_{fe}-\beta\right)=\left(\frac{1}{N}\sum_{i=1}^N\dot X_i'\dot X_i\right)^{-1}\left(\frac{1}{\sqrt N}\sum_{i=1}^N\dot X_i'\varepsilon_i\right).$$

Notationally this appears to be identical to the case of a balanced panel but the difference is that the within operator Mi incorporates the sample selection induced by the unbalanced panel structure.

To derive a distribution theory for $\hat\beta_{fe}$ we need to be explicit about the stochastic nature of $s_i$. That is, why are some time periods observed and some not? We could take several approaches:

  1. We could treat si as fixed (non-random). This is the easiest approach but the most unsatisfactory.

  2. We could treat si as random but independent of (Yi,Xi). This is known as “missing at random” and is a common assumption used to justify methods with missing observations. It is justified when the reason why observations are not observed is independent of the observations. This is appropriate, for example, in panel data sets where individuals enter and exit in “waves”. The statistical treatment is not substantially different from the case of fixed si.

  3. We could treat (Yi,Xi,si) as jointly random but impose a condition sufficient for consistent estimation of β. This is the approach we take below. The condition turns out to be a form of mean independence. The advantage of this approach is that it is less restrictive than full independence. The disadvantage is that we must use a conditional mean restriction rather than uncorrelatedness to identify the coefficients.

The specific assumptions we impose are as follows.

Assumption 17.3

  1. $Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it}$ for $i=1,\dots,N$ with $T_i\geq2$.

  2. The variables $(\varepsilon_i,X_i,s_i)$, $i=1,\dots,N$, are independent and identically distributed.

  3. $E[\varepsilon_{it}\mid X_i,s_i]=0$.

  4. $Q_T=E[\dot X_i'\dot X_i]>0$.

  5. $E[\varepsilon_{it}^4]<\infty$.

  6. $E\|X_{it}\|^4<\infty$.

The primary difference with Assumption 17.2 is that we have strengthened strict exogeneity to strict mean independence. This imposes that the regression model is properly specified and that selection does not affect the mean of εit. It is less restrictive than full independence since si can affect other moments of εit and more importantly does not restrict the joint dependence between si and Xi.

Given the above development it is straightforward to establish asymptotic normality.

Theorem 17.3 Under Assumption 17.3, as $N\to\infty$, $\sqrt N\left(\hat\beta_{fe}-\beta\right)\xrightarrow{d}\mathrm N\left(0,V_\beta\right)$ where $V_\beta=Q_T^{-1}\Omega_TQ_T^{-1}$ and $\Omega_T=E\left[\dot X_i'\varepsilon_i\varepsilon_i'\dot X_i\right]$.

We now prove Theorem 17.3. The assumptions imply that the variables $(\dot X_i,\varepsilon_i)$ are i.i.d. across i and have finite fourth moments. By the WLLN

$$\frac{1}{N}\sum_{i=1}^N\dot X_i'\dot X_i\xrightarrow{p}E\left[\dot X_i'\dot X_i\right]=Q_T.$$

The random vectors $\dot X_i'\varepsilon_i$ are i.i.d. The matrix $\dot X_i$ is a function of $(X_i,s_i)$ only. Assumption 17.3.3 and the law of iterated expectations imply

$$E\left[\dot X_i'\varepsilon_i\right]=E\left[\dot X_i'E\left[\varepsilon_i\mid X_i,s_i\right]\right]=0$$

so that $\dot X_i'\varepsilon_i$ is mean zero. Assumptions 17.3.5 and 17.3.6 and the fact that $s_i$ is bounded imply that $\dot X_i'\varepsilon_i$ has a finite covariance matrix, which is $\Omega_T$. The assumptions for the CLT hold, thus

$$\frac{1}{\sqrt N}\sum_{i=1}^N\dot X_i'\varepsilon_i\xrightarrow{d}\mathrm N\left(0,\Omega_T\right).$$

Together we obtain the stated result.

17.22 Heteroskedasticity-Robust Covariance Matrix Estimation

We have introduced two covariance matrix estimators for the fixed effects estimator. The classical estimator (17.36) is appropriate for the case where the idiosyncratic errors εit are homoskedastic and serially uncorrelated. The cluster-robust estimator (17.38) allows for heteroskedasticity and arbitrary serial correlation. In this and the following section we consider the intermediate case where εit is heteroskedastic but serially uncorrelated.

Assume that (17.18) and (17.26) hold but not necessarily (17.25). Define the conditional variances

$$E\left[\varepsilon_{it}^2\mid X_i\right]=\sigma_{it}^2.\tag{17.55}$$

Then $\Sigma_i=E\left[\varepsilon_i\varepsilon_i'\mid X_i\right]=\mathrm{diag}\left(\sigma_{it}^2\right)$. The covariance matrix (17.24) can be written as

$$V_{fe}=\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\sum_{t\in S_i}\dot X_{it}\dot X_{it}'\sigma_{it}^2\right)\left(\dot X'\dot X\right)^{-1}.\tag{17.56}$$

A natural estimator of σit2 is ε^it2. Replacing σit2 with ε^it2 in (17.56) and making a degree-of-freedom adjustment we obtain a White-type covariance matrix estimator

$$\hat V_{fe}=\frac{n}{n-N-k}\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\sum_{t\in S_i}\dot X_{it}\dot X_{it}'\hat\varepsilon_{it}^2\right)\left(\dot X'\dot X\right)^{-1}.\tag{17.57}$$

Following the insight of White (1980) it may seem appropriate to expect $\hat V_{fe}$ to be a reasonable estimator of $V_{fe}$. Unfortunately this is not the case, as discovered by Stock and Watson (2008). The problem is that $\hat V_{fe}$ is a function of the individual-specific means $\bar\varepsilon_i$, which are negligible only if the number of time series observations $T_i$ is large.

We can see this by a simple bias calculation. Assume that the sample is balanced and that the residuals are constructed with the true β. Then

$$\hat\varepsilon_{it}=\dot\varepsilon_{it}=\varepsilon_{it}-\frac{1}{T}\sum_{j=1}^T\varepsilon_{ij}.$$

Using (17.26) and (17.55)

$$E\left[\hat\varepsilon_{it}^2\mid X_i\right]=\left(\frac{T-2}{T}\right)\sigma_{it}^2+\frac{\bar\sigma_i^2}{T}$$

where $\bar\sigma_i^2=T^{-1}\sum_{t=1}^T\sigma_{it}^2$. (See Exercise 17.10.) Using (17.57) and setting $k=0$ we obtain

$$E\left[\hat V_{fe}\mid X\right]=\frac{T}{T-1}\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\sum_{t\in S_i}\dot X_{it}\dot X_{it}'E\left[\hat\varepsilon_{it}^2\mid X_i\right]\right)\left(\dot X'\dot X\right)^{-1}=\left(\frac{T-2}{T-1}\right)V_{fe}+\frac{1}{T-1}\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\dot X_i\bar\sigma_i^2\right)\left(\dot X'\dot X\right)^{-1}.$$

Thus $\hat V_{fe}$ is biased of order $O(T^{-1})$. Unless $T\to\infty$ this bias persists as $N\to\infty$. $\hat V_{fe}$ is unbiased in two contexts. The first is when the errors $\varepsilon_{it}$ are homoskedastic. The second is when $T=2$. (Showing the latter requires some algebra so is omitted.)

To correct the bias for the case T>2, Stock and Watson (2008) proposed the estimator

$$\tilde V_{fe}=\left(\frac{T-1}{T-2}\right)\left[\hat V_{fe}-\frac{1}{T-1}\hat B_{fe}\right]\tag{17.58}$$
$$\hat B_{fe}=\left(\dot X'\dot X\right)^{-1}\left(\sum_{i=1}^N\dot X_i'\dot X_i\hat\sigma_i^2\right)\left(\dot X'\dot X\right)^{-1}$$
$$\hat\sigma_i^2=\frac{1}{T-1}\sum_{t=1}^T\hat\varepsilon_{it}^2.$$

You can check that $E\left[\hat\sigma_i^2\mid X_i\right]=\bar\sigma_i^2$ and $E\left[\tilde V_{fe}\mid X\right]=V_{fe}$, so $\tilde V_{fe}$ is unbiased for $V_{fe}$. (See Exercise 17.11.)

Stock and Watson (2008) show that $\tilde V_{fe}$ is consistent with T fixed and $N\to\infty$. In simulations they show that $\tilde V_{fe}$ has excellent performance.
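A sketch of the corrected estimator for a balanced panel (simulated heteroskedastic data, NumPy, hypothetical names; as in the bias calculation above, k is treated as zero in the correction factors):

```python
import numpy as np

rng = np.random.default_rng(7)
N, T, k = 300, 5, 1
n = N * T
id_ = np.repeat(np.arange(N), T)
X = rng.normal(size=(n, k))
# Heteroskedastic, serially uncorrelated idiosyncratic errors
eps = rng.normal(size=n) * np.sqrt(0.5 + X[:, 0] ** 2)
Y = X @ np.array([1.0]) + rng.normal(size=N)[id_] + eps

Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])
Xd, Yd = X - Xbar[id_], Y - Ybar[id_]
beta_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
e = Yd - Xd @ beta_fe
bread = np.linalg.inv(Xd.T @ Xd)

# Uncorrected White-type estimator (17.57), with k set to 0 in the factor
V_hat = (n / (n - N)) * bread @ (Xd.T @ (Xd * e[:, None] ** 2)) @ bread

# Stock-Watson correction (17.58)
sig2_i = np.array([np.sum(e[id_ == i] ** 2) for i in range(N)]) / (T - 1)
B_mid = np.zeros((k, k))
for i in range(N):
    B_mid += (Xd[id_ == i].T @ Xd[id_ == i]) * sig2_i[i]
B_hat = bread @ B_mid @ bread
V_tilde = ((T - 1) / (T - 2)) * (V_hat - B_hat / (T - 1))
```

The correction subtracts an estimate of the $\bar\sigma_i^2$ contamination term and rescales, removing the $O(T^{-1})$ bias.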

Because of the Stock-Watson analysis, Stata no longer calculates the heteroskedasticity-robust covariance matrix estimator $\hat V_{fe}$ when the fixed effects estimator is calculated using the xtreg command. Instead, the cluster-robust estimator $\hat V_{fe}^{\mathrm{cluster}}$ is reported when robust standard errors are requested. However, fixed effects is often implemented using the areg command, which reports the biased estimator $\hat V_{fe}$ if robust standard errors are requested. This leads to the practical recommendation that areg should be used with the cluster(id) option.

At present the corrected estimator (17.58) has not been programmed as a Stata option.

17.23 Heteroskedasticity-Robust Estimation - Unbalanced Case

A limitation with the bias-corrected robust covariance matrix estimator of Stock and Watson (2008) is that it was only derived for balanced panels. In this section we generalize their estimator to cover unbalanced panels.

The estimator is

$$\tilde V_{fe}=\left(\dot X'\dot X\right)^{-1}\tilde\Omega_{fe}\left(\dot X'\dot X\right)^{-1}$$
$$\tilde\Omega_{fe}=\sum_{i=1}^N\sum_{t\in S_i}\dot X_{it}\dot X_{it}'\left[\left(\frac{T_i\hat\varepsilon_{it}^2-\hat\sigma_i^2}{T_i-2}\right)\mathbf 1\left\{T_i>2\right\}+\left(\frac{T_i\hat\varepsilon_{it}^2}{T_i-1}\right)\mathbf 1\left\{T_i=2\right\}\right]$$

where

$$\hat\sigma_i^2=\frac{1}{T_i-1}\sum_{t\in S_i}\hat\varepsilon_{it}^2.$$

To justify this estimator, as in the previous section make the simplifying assumption that the residuals are constructed with the true β. We calculate that

$$E\left[\hat\varepsilon_{it}^2\mid X_i\right]=\left(\frac{T_i-2}{T_i}\right)\sigma_{it}^2+\frac{\bar\sigma_i^2}{T_i},\qquad E\left[\hat\sigma_i^2\mid X_i\right]=\bar\sigma_i^2.$$

You can show that under these assumptions $E\left[\tilde V_{fe}\mid X\right]=V_{fe}$, and thus $\tilde V_{fe}$ is unbiased for $V_{fe}$. (See Exercise 17.12.)

In balanced panels the estimator $\tilde V_{fe}$ simplifies to the Stock-Watson estimator (with $k=0$).

17.24 Hausman Test for Random vs Fixed Effects

The random effects model is a special case of the fixed effects model. Thus we can test the null hypothesis of random effects against the alternative of fixed effects. The Hausman test is typically used for this purpose. The statistic is a quadratic in the difference between the fixed effects and random effects estimators. The statistic is

$$H=\left(\hat\beta_{fe}-\hat\beta_{re}\right)'\widehat{\mathrm{var}}\left[\hat\beta_{fe}-\hat\beta_{re}\right]^{-1}\left(\hat\beta_{fe}-\hat\beta_{re}\right)=\left(\hat\beta_{fe}-\hat\beta_{re}\right)'\left(\hat V_{fe}-\hat V_{re}\right)^{-1}\left(\hat\beta_{fe}-\hat\beta_{re}\right)$$

where both V^fe and V^re take the classical (non-robust) form.

The test can be implemented on a subset of the coefficients β. In particular this needs to be done if the regressors Xit contain time-invariant elements so that the random effects estimator contains more coefficients than the fixed effects estimator. In this case the test should be implemented only on the coefficients on the time-varying regressors.

An asymptotic test with size $\alpha$ rejects if H exceeds the $1-\alpha$ quantile of the $\chi_k^2$ distribution, where $k=\dim(\beta)$. If the test rejects, this is evidence that the individual effect $u_i$ is correlated with the regressors, so the random effects model is not appropriate. If the test fails to reject, the random effects hypothesis is compatible with the data.

It is tempting to use the Hausman test to select whether to use the fixed effects or random effects estimator. One could imagine using the random effects estimator if the Hausman test fails to reject the random effects hypothesis and using the fixed effects estimator otherwise. This is not, however, a wise approach. This procedure - selecting an estimator based on a test - is known as a pretest estimator and is biased. The bias arises because the result of the test is random and correlated with the estimators.

Instead, the Hausman test can be used as a specification test. If you are planning to use the random effects estimator (and believe that the random effects assumptions are appropriate in your context) the Hausman test can be used to check this assumption and provide evidence to support your approach.
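A sketch of the full Hausman calculation (simulated data satisfying the random effects assumptions; NumPy; all names hypothetical; note that in finite samples $\hat V_{fe}-\hat V_{re}$ need not be positive definite, in which case a generalized inverse is sometimes used):

```python
import numpy as np

rng = np.random.default_rng(8)
N, T, k = 200, 5, 2
n = N * T
id_ = np.repeat(np.arange(N), T)
X = rng.normal(size=(n, k))
Y = X @ np.array([1.0, 0.5]) + rng.normal(scale=0.3, size=N)[id_] + rng.normal(size=n)

# Fixed effects estimate with classical covariance
Xbar = np.stack([X[id_ == i].mean(axis=0) for i in range(N)])
Ybar = np.array([Y[id_ == i].mean() for i in range(N)])
Xd, Yd = X - Xbar[id_], Y - Ybar[id_]
beta_fe = np.linalg.solve(Xd.T @ Xd, Xd.T @ Yd)
sig2_e = np.sum((Yd - Xd @ beta_fe) ** 2) / (n - N - k)
V_fe = sig2_e * np.linalg.inv(Xd.T @ Xd)

# Random effects via quasi-demeaning (Section 17.15) with classical covariance
beta_be = np.linalg.lstsq(Xbar, Ybar, rcond=None)[0]
sig2_b = np.sum((Ybar - Xbar @ beta_be) ** 2) / (N - k)
sig2_u = max(0.0, sig2_b - sig2_e / T)
rho = np.sqrt(sig2_e / (sig2_e + T * sig2_u))
Xt, Yt = X - (1 - rho) * Xbar[id_], Y - (1 - rho) * Ybar[id_]
beta_re = np.linalg.solve(Xt.T @ Xt, Xt.T @ Yt)
V_re = np.sum((Yt - Xt @ beta_re) ** 2) / (n - k) * np.linalg.inv(Xt.T @ Xt)

# Hausman statistic: compare against chi-squared(k) critical values
d = beta_fe - beta_re
H = d @ np.linalg.inv(V_fe - V_re) @ d
```

Since the simulated individual effect is independent of the regressors, H should usually fall below the $\chi_2^2$ critical value here; a large H would signal correlation between $u_i$ and the regressors.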

17.25 Random Effects or Fixed Effects?

We have presented the random effects and fixed effects estimators of the regression coefficients. Which should be used in practice? How should we view the difference?

The basic distinction is that the random effects estimator requires the individual error ui to satisfy the conditional mean assumption (17.8). The fixed effects estimator does not require (17.8) and is robust to its violation. In particular, the individual effect ui can be arbitrarily correlated with the regressors. On the other hand the random effects estimator is efficient under random effects (Assumption 17.1). Current econometric practice is to prefer robustness over efficiency. Consequently, current practice is (nearly uniformly) to use the fixed effects estimator for linear panel data models. Random effects estimators are only used in contexts where fixed effects estimation is unknown or challenging (which occurs in many nonlinear models).

The labels “random effects” and “fixed effects” are misleading. These are labels which arose in the early literature and we are stuck with these labels today. In a previous era regressors were viewed as “fixed”. Viewing the individual effect as an unobserved regressor leads to the label of the individual effect as “fixed”. Today, we rarely refer to regressors as “fixed” when dealing with observational data. We view all variables as random. Consequently describing ui as “fixed” does not make much sense and it is hardly a contrast with the “random effect” label since under either assumption ui is treated as random. Once again, the labels are unfortunate but the key difference is whether ui is correlated with the regressors.

17.27 Two-Way Error Components

In the previous section we discussed inclusion of time trends and individual-specific time trends. The functional forms imposed by linear time trends are restrictive. There is no economic reason to expect the "trend" of a series to be linear. Business cycle "trends" are cyclic. This suggests that it is desirable to be more flexible than a linear (or polynomial) specification. In this section we consider the most flexible specification, where the trend is allowed to take an arbitrary shape but is required to be common rather than individual-specific.

The model we consider is the two-way error component model

$$Y_{it}=X_{it}'\beta+v_t+u_i+\varepsilon_{it}. \qquad (17.63)$$

In this model ui is an unobserved individual-specific effect, vt is an unobserved time-specific effect, and εit is an idiosyncratic error.

The two-way model (17.63) can be handled either using random effects or fixed effects. In a random effects framework the errors $v_t$ and $u_i$ are modeled as in Assumption 17.1. When the panel is balanced the covariance matrix of the error vector $e=v\otimes 1_N+1_T\otimes u+\varepsilon$ is

$$\mathrm{var}[e]=\Omega=\left(I_T\otimes 1_N1_N'\right)\sigma_v^2+\left(1_T1_T'\otimes I_N\right)\sigma_u^2+I_n\sigma_\varepsilon^2. \qquad (17.64)$$

When the panel is unbalanced a similar but cumbersome expression for (17.64) can be derived. This variance (17.64) can be used for GLS estimation of β.

More typically (17.63) is handled using fixed effects. The two-way within transformation subtracts both individual-specific means and time-specific means to eliminate both vt and ui from the two-way model (17.63). For a variable Yit we define the time-specific mean as follows. Let St be the set of individuals i for which the observation t is included in the sample and let Nt be the number of these individuals. Then the time-specific mean at time t is

$$\widetilde{Y}_t=\frac{1}{N_t}\sum_{i\in S_t}Y_{it}.$$

This is the average across all values of Yit observed at time t.

For the case of balanced panels the two-way within transformation is

$$\ddot{Y}_{it}=Y_{it}-\bar{Y}_i-\widetilde{Y}_t+\bar{Y}$$

where $\bar{Y}=n^{-1}\sum_{i=1}^{N}\sum_{t=1}^{T}Y_{it}$ is the full-sample mean. If $Y_{it}$ satisfies the two-way component model

$$Y_{it}=v_t+u_i+\varepsilon_{it}$$

then $\bar{Y}_i=\bar{v}+u_i+\bar{\varepsilon}_i$, $\widetilde{Y}_t=v_t+\bar{u}+\widetilde{\varepsilon}_t$, and $\bar{Y}=\bar{v}+\bar{u}+\bar{\varepsilon}$. Hence

$$\ddot{Y}_{it}=v_t+u_i+\varepsilon_{it}-\left(\bar{v}+u_i+\bar{\varepsilon}_i\right)-\left(v_t+\bar{u}+\widetilde{\varepsilon}_t\right)+\bar{v}+\bar{u}+\bar{\varepsilon}=\varepsilon_{it}-\bar{\varepsilon}_i-\widetilde{\varepsilon}_t+\bar{\varepsilon}=\ddot{\varepsilon}_{it}$$

so the individual and time effects are eliminated.
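For a balanced panel the two-way within transformation can be computed directly, and it reproduces the dummy-variable regression exactly. The following sketch (simulated data with hypothetical values $N=50$, $T=6$, $\beta=2$ chosen only for illustration) verifies the equivalence numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 6

# Simulate the two-way error component model Y_it = X_it*beta + v_t + u_i + eps_it
u = rng.normal(size=(N, 1))      # individual effects u_i
v = rng.normal(size=(1, T))      # time effects v_t
X = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
Y = 2.0 * X + v + u + eps        # true beta = 2 (hypothetical)

def two_way_within(A):
    """Subtract individual means and time means, then add back the grand mean."""
    return A - A.mean(axis=1, keepdims=True) - A.mean(axis=0, keepdims=True) + A.mean()

Yd, Xd = two_way_within(Y), two_way_within(X)

# Two-way within estimator: least squares on the transformed variables
beta_within = (Xd * Yd).sum() / (Xd ** 2).sum()

# Equivalent dummy-variable regression: X plus N individual dummies and
# T-1 time dummies (one period omitted as the baseline)
D_i = np.kron(np.eye(N), np.ones((T, 1)))
D_t = np.kron(np.ones((N, 1)), np.eye(T))[:, 1:]
W = np.column_stack([X.ravel(), D_i, D_t])
beta_dummy = np.linalg.lstsq(W, Y.ravel(), rcond=None)[0][0]
```

By the Frisch-Waugh-Lovell argument the two estimates agree to machine precision in the balanced case, which is the equivalence the following paragraphs exploit computationally.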

The two-way within transformation applied to (17.63) yields

$$\ddot{Y}_{it}=\ddot{X}_{it}'\beta+\ddot{\varepsilon}_{it} \qquad (17.66)$$

which is invariant to both vt and ui. The two-way within estimator is least squares applied to (17.66).

For the unbalanced case there are two computational approaches to implement the estimator. Both are based on the realization that the estimator is equivalent to including dummy variables for all time periods. Let $\tau_t$ be a $T\times 1$ vector of dummy variables whose $t$th element indicates the $t$th time period. Thus the $t$th element of $\tau_t$ is 1 and the remaining elements are zero. Set $\nu=(\nu_1,\ldots,\nu_T)'$ as the vector of time fixed effects. Notice that $v_t=\tau_t'\nu$. We can write the two-way model as

$$Y_{it}=X_{it}'\beta+\tau_t'\nu+u_i+\varepsilon_{it}. \qquad (17.67)$$

This is the dummy variable representation of the two-way error components model.

Model (17.67) can be estimated by one-way fixed effects with regressors Xit and τt and coefficient vectors β and ν. This can be implemented by standard one-way fixed effects methods including xtreg or areg in Stata. This produces estimates of the slopes β as well as the time effects ν. To achieve identification one time dummy variable is omitted from τt so the estimated time effects are all relative to this baseline time period. This is the most common method in practice to estimate a two-way fixed effects model. As the number of time periods is typically modest this is a computationally attractive approach.

The second computational approach is to eliminate the time effects by residual regression. This is done by the following steps. First, subtract individual-specific means for (17.67). This yields

$$\dot{Y}_{it}=\dot{X}_{it}'\beta+\dot{\tau}_t'\nu+\dot{\varepsilon}_{it}.$$

Second, regress Y˙it on τ˙t to obtain a residual Y¨it and regress each element of X˙it on τ˙t to obtain a residual X¨it. Third, regress Y¨it on X¨it to obtain the within estimator of β. These steps eliminate the fixed effects vt so the estimator is invariant to their value. What is important about this two-step procedure is that the second step is not a within transformation across the time index but rather standard regression.

If the two-way within estimator is used then the regressors Xit cannot include any time-invariant variables Xi or common time series variables Xt. Both are eliminated by the two-way within transformation. Coefficients are only identified for regressors which have variation both across individuals and across time.

If desired, the relevance of the time effects can be tested by an exclusion test on the coefficients ν. If the test rejects the hypothesis of zero coefficients then this indicates that the time effects are relevant in the regression model.

The fixed effects estimator of (17.63) is invariant to the values of vt and ui, thus no assumptions need to be made concerning their stochastic properties.

To illustrate, the fourth column of Table 17.2 presents fixed effects estimates of the investment equation, augmented to include year dummy indicators, and is thus a two-way fixed effects model. In this example the coefficient estimates and standard errors are not greatly affected by the inclusion of the year dummy variables.

17.28 Instrumental Variables

Take the fixed effects model

$$Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it}. \qquad (17.68)$$

We say Xit is exogenous for εit if E[Xitεit]=0, and we say Xit is endogenous for εit if E[Xitεit]0. In Chapter 12 we discussed several economic examples of endogeneity and the same issues apply in the panel data context. The primary difference is that in the fixed effects model we only need to be concerned if the regressors are correlated with the idiosyncratic error εit, as correlation between Xit and ui is allowed.

As in Chapter 12 if the regressors are endogenous the fixed effects estimator will be biased and inconsistent for the structural coefficient β. The standard approach to handling endogeneity is to specify instrumental variables Zit which are both relevant (correlated with Xit ) yet exogenous (uncorrelated with εit ).

Let $Z_{it}$ be an $\ell\times 1$ instrumental variable where $\ell\ge k$. As in the cross-section case, $Z_{it}$ may contain both included exogenous variables (variables in $X_{it}$ that are exogenous) and excluded exogenous variables (variables not in $X_{it}$). Let $Z_i$ be the stacked instruments by individual and $Z$ be the stacked instruments for the full sample.

The dummy variable formulation of the fixed effects model is Yit=Xitβ+diu+εit where di is an N×1 vector of dummy variables, one for each individual in the sample. The model in matrix notation for the full sample is

$$Y=X\beta+Du+\varepsilon. \qquad (17.69)$$

Theorem 17.1 shows that the fixed effects estimator for β can be calculated by least squares estimation of (17.69). Thus the dummies D should be viewed as included exogenous variables. Consider 2SLS estimation of β using the instruments Z for X. Since D is an included exogenous variable it should also be used as an instrument. Thus 2SLS estimation of the fixed effects model (17.68) is algebraically 2SLS of the regression (17.69) of Y on (X,D) using the pair (Z,D) as instruments.

Since the dimension of D can be excessively large, as discussed in Section 17.11, it is advisable to use residual regression to compute the 2SLS estimator as we now describe.

In Section 12.12, we described several alternative representations for the 2SLS estimator. The fifth (equation (12.32)) shows that the 2SLS estimator for β equals

$$\hat{\beta}_{2sls}=\left(X'M_DZ\left(Z'M_DZ\right)^{-1}Z'M_DX\right)^{-1}\left(X'M_DZ\left(Z'M_DZ\right)^{-1}Z'M_DY\right)$$

where $M_D=I_n-D\left(D'D\right)^{-1}D'$. The latter is the matrix within operator, thus $M_DY=\dot{Y}$, $M_DX=\dot{X}$, and $M_DZ=\dot{Z}$. It follows that the 2SLS estimator is

$$\hat{\beta}_{2sls}=\left(\dot{X}'\dot{Z}\left(\dot{Z}'\dot{Z}\right)^{-1}\dot{Z}'\dot{X}\right)^{-1}\left(\dot{X}'\dot{Z}\left(\dot{Z}'\dot{Z}\right)^{-1}\dot{Z}'\dot{Y}\right).$$

This is convenient. It shows that the 2SLS estimator for the fixed effects model can be calculated by applying 2SLS to the within-transformed Yit,Xit, and Zit. The 2SLS residuals are e^=Y˙X˙β^2sls.
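The recipe is short enough to sketch numerically. Below is a minimal simulation (hypothetical DGP and parameter values: $X_{it}$ loads on $\varepsilon_{it}$ while $Z_{it}$ does not, true $\beta=1$) showing 2SLS applied to the within-transformed variables, alongside the inconsistent plain fixed effects estimator:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 400, 5

# Hypothetical DGP: X_it is endogenous (correlated with eps_it);
# Z_it is a valid instrument, correlated with u_i but not with eps_it.
u = rng.normal(size=(N, 1))
Z = rng.normal(size=(N, T)) + u
eps = rng.normal(size=(N, T))
X = Z + 0.5 * eps + rng.normal(size=(N, T)) + u
Y = 1.0 * X + u + eps            # true beta = 1

within = lambda A: A - A.mean(axis=1, keepdims=True)
yd, xd, zd = within(Y).ravel(), within(X).ravel(), within(Z).ravel()

# Just-identified 2SLS on the within-transformed data: (z'x)^{-1}(z'y)
beta_2sls = (zd @ yd) / (zd @ xd)

# The plain fixed effects estimator is inconsistent here since X is endogenous
beta_fe = (xd @ yd) / (xd @ xd)
```

In this design `beta_fe` is biased upward (its probability limit is roughly 1.22 under the stated hypothetical DGP) while `beta_2sls` is consistent for 1.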

This estimator can be obtained using the Stata command xtivreg, fe. It can also be obtained using the Stata command ivregress after making the within transformations.

The presentation above focused for clarity on the one-way fixed effects model. There is no substantial change in the two-way fixed effects model

$$Y_{it}=X_{it}'\beta+u_i+v_t+\varepsilon_{it}.$$

The easiest way to estimate the two-way model is to add T1 time-period dummies to the regression model and include these dummy variables as both regressors and instruments.

17.29 Identification with Instrumental Variables

To understand the identification of the structural slope coefficient β in the fixed effects model it is necessary to examine the reduced form equation for the endogenous regressors Xit. This is

$$X_{it}=\Gamma Z_{it}+W_i+\zeta_{it}$$

where Wi is a k×1 vector of fixed effects for the k regressors and ζit is an idiosyncratic error.

The coefficient matrix Γ is the linear effect of Zit on Xit holding the fixed effects Wi constant. Thus Γ has a similar interpretation as the coefficient β in the fixed effects regression model. It is the effect of the variation in Zit about its individual-specific mean on Xit.

The 2SLS estimator is a function of the within transformed variables. Applying the within transformation to the reduced form we find X˙it=ΓZ˙it+ζ˙it. This shows that Γ is the effect of the within-transformed instruments on the regressors. If there is no time-variation in the within-transformed instruments or there is no correlation between the instruments and the regressors after removing the individual-specific means then the coefficient Γ will be either not identified or singular. In either case the coefficient β will not be identified.

Thus for identification of the fixed effects instrumental variables model we need

$$\mathrm{E}\left[\dot{Z}_i'\dot{Z}_i\right]>0 \qquad (17.70)$$

and

$$\mathrm{rank}\left(\mathrm{E}\left[\dot{Z}_i'\dot{X}_i\right]\right)=k. \qquad (17.71)$$

Condition (17.70) is the same as the condition for identification in fixed effects regression - the instruments must have full variation after the within transformation. Condition (17.71) is analogous to the relevance condition for identification of instrumental variable regression in the cross-section context but applies to the within-transformed instruments and regressors.

Condition (17.71) shows that to examine instrument validity in the context of fixed effects 2SLS it is important to estimate the reduced form equation using fixed effects (within) regression. Standard tests for instrument validity (F tests on the excluded instruments) can be applied. However, since the correlation structure of the reduced form equation is in general unknown it is appropriate to use a cluster-robust covariance matrix, clustered at the level of the individual.

17.30 Asymptotic Distribution of Fixed Effects 2SLS Estimator

In this section we present an asymptotic distribution theory for the fixed effects 2SLS estimator. We provide a formal theory for the case of balanced panels and discuss an extension to the unbalanced case.

We use the following assumptions for balanced panels.

Assumption 17.4

  1. $Y_{it}=X_{it}'\beta+u_i+\varepsilon_{it}$ for $i=1,\ldots,N$ and $t=1,\ldots,T$ with $T\ge 2$.

  2. The variables $(\varepsilon_i,X_i,Z_i)$, $i=1,\ldots,N$, are independent and identically distributed.

  3. $\mathrm{E}[Z_{is}\varepsilon_{it}]=0$ for all $s=1,\ldots,T$.

  4. $Q_{ZZ}=\mathrm{E}[\dot{Z}_i'\dot{Z}_i]>0$.

  5. $\mathrm{rank}(Q_{ZX})=k$ where $Q_{ZX}=\mathrm{E}[\dot{Z}_i'\dot{X}_i]$.

  6. $\mathrm{E}[\varepsilon_{it}^4]<\infty$.

  7. $\mathrm{E}\|X_{it}\|^2<\infty$.

  8. $\mathrm{E}\|Z_{it}\|^4<\infty$.

Given Assumption 17.4 we can establish asymptotic normality for β^2sls.

Theorem 17.4 Under Assumption 17.4, as $N\to\infty$, $\sqrt{N}\left(\hat{\beta}_{2sls}-\beta\right)\xrightarrow{d}\mathrm{N}\left(0,V_{\beta}\right)$ where

$$V_{\beta}=\left(Q_{ZX}'Q_{ZZ}^{-1}Q_{ZX}\right)^{-1}\left(Q_{ZX}'Q_{ZZ}^{-1}\Omega_{Z\varepsilon}Q_{ZZ}^{-1}Q_{ZX}\right)\left(Q_{ZX}'Q_{ZZ}^{-1}Q_{ZX}\right)^{-1},\qquad \Omega_{Z\varepsilon}=\mathrm{E}\left[\dot{Z}_i'\varepsilon_i\varepsilon_i'\dot{Z}_i\right].$$

The proof of the result is similar to Theorem 17.2 so is omitted. The key condition is Assumption 17.4.3, which states that the instruments are strictly exogenous for the idiosyncratic errors. The identification conditions are Assumptions 17.4.4 and 17.4.5, which were discussed in the previous section.

The theorem is stated for balanced panels. For unbalanced panels we can modify the theorem as in Theorem 17.3 by adding the selection indicators $s_i$ and replacing Assumption 17.4.3 with $\mathrm{E}[\varepsilon_{it}\mid Z_i,s_i]=0$, which states that the idiosyncratic errors are mean independent of the instruments and selection.

If the idiosyncratic errors εit are homoskedastic and serially uncorrelated then the covariance matrix simplifies to

$$V_{\beta}=\left(Q_{ZX}'Q_{ZZ}^{-1}Q_{ZX}\right)^{-1}\sigma_{\varepsilon}^2.$$

In this case a classical homoskedastic covariance matrix estimator can be used. Otherwise a cluster-robust covariance matrix estimator can be used, and takes the form

$$\hat{V}_{\hat{\beta}}=\left(\dot{X}'\dot{Z}\left(\dot{Z}'\dot{Z}\right)^{-1}\dot{Z}'\dot{X}\right)^{-1}\left(\dot{X}'\dot{Z}\right)\left(\dot{Z}'\dot{Z}\right)^{-1}\left(\sum_{i=1}^{N}\dot{Z}_i'\hat{\varepsilon}_i\hat{\varepsilon}_i'\dot{Z}_i\right)\left(\dot{Z}'\dot{Z}\right)^{-1}\left(\dot{Z}'\dot{X}\right)\left(\dot{X}'\dot{Z}\left(\dot{Z}'\dot{Z}\right)^{-1}\dot{Z}'\dot{X}\right)^{-1}.$$

As for the case of fixed effects regression, the heteroskedasticity-robust covariance matrix estimator is not recommended due to bias when T is small, and a bias-corrected version has not been developed.

The Stata command xtivreg, fe by default reports the classical homoskedastic covariance matrix estimator. To obtain a cluster-robust covariance matrix use the option vce(robust) or vce(cluster id).

17.31 Linear GMM

Consider the just-identified 2SLS estimator. It solves the equations $\dot{Z}'(\dot{Y}-\dot{X}\beta)=0$, which are the sample analogs of the population moment conditions $\mathrm{E}[\dot{Z}_i'(\dot{Y}_i-\dot{X}_i\beta)]=0$. These population conditions hold at the true $\beta$ because $\dot{Z}'u=Z'M_Du=0$ (the stacked individual effects lie in the column space of $D$, which $M_D$ annihilates) and $\mathrm{E}[\dot{Z}_i'\varepsilon_i]=0$ is implied by Assumption 17.4.3.

The population orthogonality conditions hold in the overidentified case as well. In this case an alternative to 2SLS is GMM. Let W^ be an estimator of W=E[Z˙iεiεiZ˙i], for example

$$\hat{W}=\frac{1}{N}\sum_{i=1}^{N}\dot{Z}_i'\hat{\varepsilon}_i\hat{\varepsilon}_i'\dot{Z}_i \qquad (17.72)$$

where ε^i are the 2SLS fixed effects residuals. The GMM fixed effects estimator is

$$\hat{\beta}_{gmm}=\left(\dot{X}'\dot{Z}\hat{W}^{-1}\dot{Z}'\dot{X}\right)^{-1}\left(\dot{X}'\dot{Z}\hat{W}^{-1}\dot{Z}'\dot{Y}\right). \qquad (17.73)$$

The estimator (17.72)-(17.73) does not have a dedicated Stata command but can be obtained by generating the within-transformed variables $\dot{X}$, $\dot{Z}$ and $\dot{Y}$, and then estimating by GMM a regression of $\dot{Y}$ on $\dot{X}$ using $\dot{Z}$ as instruments with a weight matrix clustered by individual.
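The two-step procedure can be sketched directly. The simulation below (hypothetical over-identified DGP with two instruments for one endogenous regressor, true $\beta=1$) computes a first-step 2SLS estimate, builds the cluster-by-individual weight matrix from its residuals, and then forms the GMM fixed effects estimator:

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 300, 5

# Hypothetical DGP: two valid instruments Z1, Z2 for one endogenous X
u = rng.normal(size=(N, 1))
Z1 = rng.normal(size=(N, T))
Z2 = rng.normal(size=(N, T))
eps = rng.normal(size=(N, T))
X = Z1 + Z2 + 0.5 * eps + u + rng.normal(size=(N, T))
Y = 1.0 * X + u + eps            # true beta = 1

within = lambda A: A - A.mean(axis=1, keepdims=True)
zd = np.stack([within(Z1), within(Z2)], axis=2)     # N x T x 2
Zf = zd.reshape(N * T, 2)
Xf = within(X).reshape(N * T, 1)
Yf = within(Y).reshape(N * T, 1)

# First step: 2SLS, i.e. GMM with weight matrix (Z'Z)^{-1}
A = Zf.T @ Xf
b = Zf.T @ Yf
W0 = np.linalg.inv(Zf.T @ Zf)
beta_2sls = np.linalg.solve(A.T @ W0 @ A, A.T @ W0 @ b)

# Cluster-by-individual weight matrix from the 2SLS residuals
e = (Yf - Xf @ beta_2sls).reshape(N, T, 1)
S = sum(zd[i].T @ e[i] @ e[i].T @ zd[i] for i in range(N)) / N
What = np.linalg.inv(S)

# GMM fixed effects estimator with the cluster weight matrix
beta_gmm = np.linalg.solve(A.T @ What @ A, A.T @ What @ b)
```

Both steps are consistent; the second step is asymptotically efficient among estimators using these moments when the cluster structure is correct.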

17.32 Estimation with Time-Invariant Regressors

One of the disappointments with the fixed effects estimator is that it cannot estimate the effect of regressors which are time-invariant. They are not identified separately from the fixed effect and are eliminated by the within transformation. In contrast, the random effects estimator allows for time-invariant regressors but does so only by assuming strict exogeneity which is stronger than typically desired in economic applications.

It turns out that we can consider an intermediate case which maintains the fixed effects assumptions for the time-varying regressors but uses stronger assumptions on the time-invariant regressors. For our exposition we will denote the time-varying regressors by the $k\times 1$ vector $X_{it}$ and the time-invariant regressors by the $\ell\times 1$ vector $Z_i$.

Consider the linear regression model

$$Y_{it}=X_{it}'\beta+Z_i'\gamma+u_i+\varepsilon_{it}. \qquad (17.74)$$

At the level of the individual this can be written as

$$Y_i=X_i\beta+\mathbf{Z}_i\gamma+1_iu_i+\varepsilon_i$$

where $\mathbf{Z}_i=1_iZ_i'$. For the full sample in matrix notation we can write this as

$$Y=X\beta+\mathbf{Z}\gamma+u+\varepsilon.$$

We maintain the assumption that the idiosyncratic errors εit are uncorrelated with both Xit and Zi at all time horizons:

$$\mathrm{E}\left[X_{is}\varepsilon_{it}\right]=0 \qquad (17.75)$$
$$\mathrm{E}\left[Z_i\varepsilon_{it}\right]=0. \qquad (17.76)$$

In this section we consider the case where Zi is uncorrelated with the individual-level error ui, thus

$$\mathrm{E}\left[Z_iu_i\right]=0, \qquad (17.77)$$

but the correlation of Xit and ui is left unrestricted. In this context we say that Zi is exogenous with respect to the fixed effect ui while Xit is endogenous with respect to ui. Note that this is a different type of endogeneity than considered in the sections on instrumental variables: there endogeneity meant correlation with the idiosyncratic error εit. Here endogeneity means correlation with the fixed effect ui.

We consider estimation of (17.74) by instrumental variables and thus need instruments which are uncorrelated with the error ui+εit. The time-invariant regressors Zi satisfy this condition due to (17.76) and (17.77), thus

$$\mathrm{E}\left[\mathbf{Z}_i'\left(Y_i-X_i\beta-\mathbf{Z}_i\gamma\right)\right]=0.$$

While the time-varying regressors Xit are correlated with ui the within transformed variables X˙it are uncorrelated with ui+εit under (17.75), thus

$$\mathrm{E}\left[\dot{X}_i'\left(Y_i-X_i\beta-\mathbf{Z}_i\gamma\right)\right]=0.$$

Therefore we can estimate (β,γ) by instrumental variable regression using the instrument set (X˙,Z). Specifically, regression of Y on X and Z treating X as endogenous, Z as exogenous, and using the instrument X˙. Write this estimator as (β^,γ^). This can be implemented using the Stata ivregress command after constructing the within transformed X˙.

This instrumental variables estimator is algebraically equal to a simple two-step estimator. The first step $\hat{\beta}=\hat{\beta}_{fe}$ is the fixed effects estimator. The second step sets $\hat{\gamma}=(\mathbf{Z}'\mathbf{Z})^{-1}(\mathbf{Z}'\hat{u})$, the least squares coefficient from the regression of the estimated fixed effect $\hat{u}_i$ on $Z_i$. To see this equivalence observe that the instrumental variables estimator solves the sample moment equations

$$\dot{X}'\left(Y-X\beta-\mathbf{Z}\gamma\right)=0 \qquad (17.78)$$
$$\mathbf{Z}'\left(Y-X\beta-\mathbf{Z}\gamma\right)=0. \qquad (17.79)$$

Notice that $\dot{X}_i'\mathbf{Z}_i=\dot{X}_i'1_iZ_i'=0$ so $\dot{X}'\mathbf{Z}=0$. Thus (17.78) is the same as $\dot{X}'(Y-X\beta)=0$ whose solution is $\hat{\beta}_{fe}$. Plugging this into the left side of (17.79) we obtain

$$\mathbf{Z}'\left(Y-X\hat{\beta}_{fe}-\mathbf{Z}\gamma\right)=\mathbf{Z}'\left(\overline{Y}-\overline{X}\hat{\beta}_{fe}-\mathbf{Z}\gamma\right)=\mathbf{Z}'\left(\hat{u}-\mathbf{Z}\gamma\right)$$

where $\overline{Y}$ and $\overline{X}$ are the stacked individual means $1_i\bar{Y}_i$ and $1_i\bar{X}_i'$. Setting this equal to 0 and solving we obtain the least squares estimator $\hat{\gamma}=(\mathbf{Z}'\mathbf{Z})^{-1}(\mathbf{Z}'\hat{u})$ as claimed. This equivalence was first observed by Hausman and Taylor (1981).
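The two-step equivalence is easy to confirm numerically. The sketch below (hypothetical DGP: $X_{it}$ correlated with $u_i$, $Z_i$ not; true $\beta=1$, $\gamma=2$; all variables mean zero so the second-step regression omits an intercept) computes the fixed effects estimator, regresses the estimated fixed effects on $Z_i$, and checks that the joint IV estimator returns the same pair:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 500, 4

# Hypothetical DGP: X_it time-varying and correlated with u_i;
# Z_i time-invariant and uncorrelated with u_i.
u = rng.normal(size=(N, 1))
Zi = rng.normal(size=(N, 1))
X = rng.normal(size=(N, T)) + u
eps = rng.normal(size=(N, T))
Y = 1.0 * X + 2.0 * Zi + u + eps     # true beta = 1, gamma = 2

# Step 1: fixed effects estimator of beta
within = lambda A: A - A.mean(axis=1, keepdims=True)
xd, yd = within(X).ravel(), within(Y).ravel()
beta_fe = (xd @ yd) / (xd @ xd)

# Step 2: regress the estimated fixed effects on Z_i
uhat = (Y - X * beta_fe).mean(axis=1)
z = Zi.ravel()
gamma_hat = (z @ uhat) / (z @ z)

# Joint IV: regress Y on (X, Z) using (within-X, Z) as instruments
Q = np.column_stack([X.ravel(), np.repeat(z, T)])
W = np.column_stack([xd, np.repeat(z, T)])
coef_iv = np.linalg.solve(W.T @ Q, W.T @ Y.ravel())
```

The joint IV coefficients equal `(beta_fe, gamma_hat)` to machine precision, which is the Hausman-Taylor (1981) equivalence stated above.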

For standard error calculation it is recommended to estimate (β,γ) jointly by instrumental variable regression and use a cluster-robust covariance matrix clustered at the individual level. Classical and heteroskedasticity-robust estimators are misspecified due to the individual-specific effect ui.

The estimator (β^,γ^) is a special case of the Hausman-Taylor estimator described in the next section. (For an unknown reason the above estimator cannot be estimated using Stata’s xthtaylor command.)

17.33 Hausman-Taylor Model

Hausman and Taylor (1981) consider a generalization of the previous model. Their model is

$$Y_{it}=X_{1it}'\beta_1+X_{2it}'\beta_2+Z_{1i}'\gamma_1+Z_{2i}'\gamma_2+u_i+\varepsilon_{it}$$

where $X_{1it}$ and $X_{2it}$ are time-varying and $Z_{1i}$ and $Z_{2i}$ are time-invariant. Let the dimensions of $X_{1it}$, $X_{2it}$, $Z_{1i}$, and $Z_{2i}$ be $k_1$, $k_2$, $\ell_1$, and $\ell_2$, respectively.

Write the model in matrix notation as

$$Y=X_1\beta_1+X_2\beta_2+Z_1\gamma_1+Z_2\gamma_2+u+\varepsilon. \qquad (17.80)$$

Let $\overline{X}_1$ and $\overline{X}_2$ denote conformable matrices of individual-specific means and let $\dot{X}_1=X_1-\overline{X}_1$ and $\dot{X}_2=X_2-\overline{X}_2$ denote the within-transformed variables.

The Hausman-Taylor model assumes that all regressors are uncorrelated with the idiosyncratic error εit at all time horizons and that X1it and Z1i are exogenous with respect to the fixed effect ui so that

$$\mathrm{E}\left[X_{1it}u_i\right]=0$$
$$\mathrm{E}\left[Z_{1i}u_i\right]=0.$$

The regressors X2it and Z2i, however, are allowed to be correlated with ui.

Set $X=(X_1,X_2,Z_1,Z_2)$ and $\beta=(\beta_1',\beta_2',\gamma_1',\gamma_2')'$. The assumptions imply the following population moment conditions

$$\mathrm{E}\left[\dot{X}_1'\left(Y-X\beta\right)\right]=0$$
$$\mathrm{E}\left[\dot{X}_2'\left(Y-X\beta\right)\right]=0$$
$$\mathrm{E}\left[\overline{X}_1'\left(Y-X\beta\right)\right]=0$$
$$\mathrm{E}\left[Z_1'\left(Y-X\beta\right)\right]=0.$$

There are $2k_1+k_2+\ell_1$ moment conditions and $k_1+k_2+\ell_1+\ell_2$ coefficients. Identification requires $k_1\ge\ell_2$: that there are at least as many exogenous time-varying regressors as endogenous time-invariant regressors. (This includes the model of the previous section where $k_1=\ell_2=0$.) Given the moment conditions the coefficients $\beta=(\beta_1',\beta_2',\gamma_1',\gamma_2')'$ can be estimated by 2SLS regression of (17.80) using the instruments $Z=(\dot{X}_1,\dot{X}_2,\overline{X}_1,Z_1)$ or equivalently $Z=(X_1,\dot{X}_2,\overline{X}_1,Z_1)$. This is 2SLS regression treating $X_1$ and $Z_1$ as exogenous and $X_2$ and $Z_2$ as endogenous, using the excluded instruments $\dot{X}_2$ and $\overline{X}_1$.

It is recommended to use cluster-robust covariance matrix estimation clustered at the individual level. Neither conventional nor heteroskedasticity-robust covariance matrix estimators should be used as they are misspecified due to the individual-specific effect ui.

When the model is just-identified the estimators simplify as follows. $\hat{\beta}_1$ and $\hat{\beta}_2$ equal the fixed effects estimator. $\hat{\gamma}_1$ and $\hat{\gamma}_2$ equal the 2SLS estimator from a regression of $\hat{u}_i$ on $Z_{1i}$ and $Z_{2i}$ using $\overline{X}_{1i}$ as an instrument for $Z_{2i}$. (See Exercise 17.14.)

When the model is over-identified the equation can also be estimated by GMM with a cluster-robust weight matrix using the same equations and instruments.

This estimator with cluster-robust standard errors can be calculated using the Stata ivregress command with the vce(cluster id) option after constructing the transformed variables $\dot{X}_2$ and $\overline{X}_1$.

The 2SLS estimator described above corresponds with the Hausman and Taylor (1981) estimator in the just-identified case with a balanced panel.

Hausman and Taylor derived their estimator under the stronger assumption that the errors $\varepsilon_{it}$ and $u_i$ are strictly mean independent and homoskedastic and consequently proposed a GLS-type estimator which is more efficient when these assumptions are correct. Define $\Omega=\mathrm{diag}(\Omega_i)$ where $\Omega_i=I_i+1_i1_i'\sigma_u^2/\sigma_\varepsilon^2$ and $\sigma_\varepsilon^2$ and $\sigma_u^2$ are the variances of the error components $\varepsilon_{it}$ and $u_i$. Define as well the transformed variables $\widetilde{Y}=\Omega^{-1/2}Y$, $\widetilde{X}=\Omega^{-1/2}X$ and $\widetilde{Z}=\Omega^{-1/2}Z$. The Hausman-Taylor estimator is

$$\hat{\beta}_{ht}=\left(X'\Omega^{-1}Z\left(Z'\Omega^{-1}Z\right)^{-1}Z'\Omega^{-1}X\right)^{-1}\left(X'\Omega^{-1}Z\left(Z'\Omega^{-1}Z\right)^{-1}Z'\Omega^{-1}Y\right)=\left(\widetilde{X}'\widetilde{Z}\left(\widetilde{Z}'\widetilde{Z}\right)^{-1}\widetilde{Z}'\widetilde{X}\right)^{-1}\left(\widetilde{X}'\widetilde{Z}\left(\widetilde{Z}'\widetilde{Z}\right)^{-1}\widetilde{Z}'\widetilde{Y}\right).$$

Recall from (17.47) that $\Omega_i^{-1/2}=M_i+\rho_iP_i$ where $\rho_i$ is defined in (17.46). Thus

$$\widetilde{Y}_i=Y_i-\left(1-\rho_i\right)\bar{Y}_i$$
$$\widetilde{X}_{1i}=X_{1i}-\left(1-\rho_i\right)\overline{X}_{1i}$$
$$\widetilde{X}_{2i}=X_{2i}-\left(1-\rho_i\right)\overline{X}_{2i}$$
$$\widetilde{Z}_{1i}=\rho_iZ_{1i}$$
$$\widetilde{Z}_{2i}=\rho_iZ_{2i}$$
$$\widetilde{\dot{X}}_{1i}=\dot{X}_{1i}$$
$$\widetilde{\dot{X}}_{2i}=\dot{X}_{2i}.$$

It follows that the Hausman-Taylor estimator can be calculated by 2SLS regression of $\widetilde{Y}_i$ on $(\widetilde{X}_{1i},\widetilde{X}_{2i},\rho_iZ_{1i},\rho_iZ_{2i})$ using the instruments $(\dot{X}_{1i},\dot{X}_{2i},\rho_i\overline{X}_{1i},\rho_iZ_{1i})$.

When the panel is balanced the coefficients $\rho_i$ are all equal and scale out from the instruments. Thus the estimator can be calculated by 2SLS regression of $\widetilde{Y}_i$ on $(\widetilde{X}_{1i},\widetilde{X}_{2i},Z_{1i},Z_{2i})$ using the instruments $(\dot{X}_{1i},\dot{X}_{2i},\overline{X}_{1i},Z_{1i})$.

In practice $\rho_i$ is unknown. It can be estimated as in (17.48) with the modification that the error variance is estimated from the untransformed 2SLS regression. Under the homoskedasticity assumptions used by Hausman and Taylor the estimator $\hat{\beta}_{ht}$ has a classical asymptotic covariance matrix. When these assumptions are relaxed the covariance matrix can be estimated using cluster-robust methods. The Hausman-Taylor estimator with cluster-robust standard errors can be implemented in Stata by the command xthtaylor with the vce(robust) option. This Stata command, for an unknown reason, requires that there is at least one exogenous time-invariant variable ($\ell_1\ge 1$) and at least one exogenous time-varying variable ($k_1\ge 1$), even when the model is identified. Otherwise, the estimator can be implemented using the instrumental variable method described above.

The Hausman-Taylor estimator was refined by Amemiya and MaCurdy (1986) and Breusch, Mizon and Schmidt (1989) who proposed more efficient versions using additional instruments which are valid under stronger orthogonality conditions. The observation that in the unbalanced case the instruments should be weighted by ρi was made by Gardner (1998).

In the over-identified case it is unclear if it is preferred to use the simpler 2SLS estimator $\hat{\beta}_{2sls}$ or the GLS-type Hausman-Taylor estimator $\hat{\beta}_{ht}$. The advantages of $\hat{\beta}_{ht}$ are that it is asymptotically efficient under the stated homoskedasticity and serial correlation conditions and that there is an available program in Stata. The advantages of $\hat{\beta}_{2sls}$ are that it is much simpler to program (if doing so yourself), may have better finite sample properties (because it avoids variance-component estimation), and is the natural estimator from the modern GMM viewpoint.

To illustrate, the final column of Table 17.2 contains Hausman-Taylor estimates of the investment model treating $Q_{i,t-1}$, $D_{i,t-1}$, and $T_i$ as endogenous for $u_i$ and $CF_{i,t-1}$ and the industry dummies as exogenous. Relative to the fixed effects models this allows estimation of the coefficient on the trading indicator $T_i$. The most interesting change relative to the previous estimates is that the coefficient on the trading indicator $T_i$ doubles in magnitude relative to the random effects estimate. This is consistent with the hypothesis that $T_i$ is correlated with the fixed effect and hence the random effects estimate is biased.

17.34 Jackknife Covariance Matrix Estimation

As an alternative to asymptotic inference the delete-cluster jackknife can be used for covariance matrix calculation. In the context of fixed effects estimation the delete-cluster estimators take the form

$$\hat{\beta}_{(-i)}=\left(\sum_{j\ne i}\dot{X}_j'\dot{X}_j\right)^{-1}\left(\sum_{j\ne i}\dot{X}_j'\dot{Y}_j\right)=\hat{\beta}_{fe}-\left(\sum_{i=1}^{N}\dot{X}_i'\dot{X}_i\right)^{-1}\dot{X}_i'\widetilde{e}_i$$

where

$$\widetilde{e}_i=\left(I_i-\dot{X}_i\left(\dot{X}'\dot{X}\right)^{-1}\dot{X}_i'\right)^{-1}\hat{e}_i$$
$$\hat{e}_i=\dot{Y}_i-\dot{X}_i\hat{\beta}_{fe}.$$

The delete-cluster jackknife estimator of the variance of β^fe is

$$\hat{V}_{\hat{\beta}}^{jack}=\frac{N-1}{N}\sum_{i=1}^{N}\left(\hat{\beta}_{(-i)}-\bar{\beta}\right)\left(\hat{\beta}_{(-i)}-\bar{\beta}\right)', \qquad \bar{\beta}=\frac{1}{N}\sum_{i=1}^{N}\hat{\beta}_{(-i)}.$$

The delete-cluster jackknife estimator V^β^jack  is similar to the cluster-robust covariance matrix estimator.
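The similarity is easy to see in a simulation. The sketch below (hypothetical DGP with heteroskedastic idiosyncratic errors and a scalar regressor) computes the delete-cluster jackknife variance by dropping one individual at a time, and compares it to the cluster-robust sandwich:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 200, 5

# Hypothetical DGP with heteroskedastic errors
u = rng.normal(size=(N, 1))
X = rng.normal(size=(N, T)) + u
eps = rng.normal(size=(N, T)) * (1 + 0.5 * np.abs(X))
Y = 1.0 * X + u + eps

within = lambda A: A - A.mean(axis=1, keepdims=True)
xd, yd = within(X), within(Y)
sxx, sxy = (xd ** 2).sum(), (xd * yd).sum()
beta_fe = sxy / sxx

# Delete-cluster jackknife: drop one individual (cluster) at a time
betas = np.array([(sxy - (xd[i] * yd[i]).sum()) / (sxx - (xd[i] ** 2).sum())
                  for i in range(N)])
vjack = (N - 1) / N * ((betas - betas.mean()) ** 2).sum()
se_jack = np.sqrt(vjack)

# Cluster-robust sandwich variance for comparison
ehat = yd - xd * beta_fe
vcr = sum(((xd[i] * ehat[i]).sum()) ** 2 for i in range(N)) / sxx ** 2
se_cr = np.sqrt(vcr)
```

With $N=200$ clusters the two standard errors agree closely, illustrating the claim that the jackknife is a computational substitute for the cluster-robust formula.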

For parameters which are functions $\hat{\theta}_{fe}=r(\hat{\beta}_{fe})$ of the fixed effects estimator the delete-cluster jackknife estimator of the variance of $\hat{\theta}_{fe}$ is

$$\hat{V}_{\hat{\theta}}^{jack}=\frac{N-1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_{(-i)}-\bar{\theta}\right)\left(\hat{\theta}_{(-i)}-\bar{\theta}\right)', \qquad \hat{\theta}_{(-i)}=r\left(\hat{\beta}_{(-i)}\right), \qquad \bar{\theta}=\frac{1}{N}\sum_{i=1}^{N}\hat{\theta}_{(-i)}.$$

The estimator V^θ^jack  is similar to the delta-method cluster-robust covariance matrix estimator for θ^.

As in the context of i.i.d. samples one advantage of the jackknife covariance matrix estimator is that it does not require the user to make a technical calculation of the asymptotic distribution. A downside is an increase in computation cost as N separate regressions are effectively estimated. This can be particularly costly in micro panels which have a large number N of individuals.

In Stata jackknife standard errors for fixed effects estimators are obtained by using either xtreg, fe vce(jackknife) or areg, absorb(id) cluster(id) vce(jackknife) where id is the cluster variable. For the fixed effects 2SLS estimator use xtivreg, fe vce(jackknife).

17.35 Panel Bootstrap

Bootstrap methods can also be applied to panel data by a straightforward application of the pairs cluster bootstrap which samples entire individuals rather than single observations. In the context of panel data we call this the panel nonparametric bootstrap.

The panel nonparametric bootstrap samples N individual histories (Yi,Xi) to create the bootstrap sample. Fixed effects (or any other estimation method) is applied to the bootstrap sample to obtain the coefficient estimates. By repeating B times, bootstrap standard errors for coefficients estimates, or functions of the coefficient estimates, can be calculated. Percentile-type and percentile-t confidence intervals can be calculated. The BCa interval requires an estimator of the acceleration coefficient a which is a scaled jackknife estimate of the third moment of the estimator. In panel data the delete-cluster jackknife should be used for estimation of a.

In Stata, to obtain bootstrap standard errors and confidence intervals use either xtreg, vce(bootstrap, reps(#)) or areg, absorb(id) cluster(id) vce(bootstrap, reps(#)) where id is the cluster variable and # is the number of bootstrap replications. For the fixed effects 2SLS estimator use xtivreg, fe vce(bootstrap, reps(#)).

17.36 Dynamic Panel Models

The models so far considered in this chapter have been static with no dynamic relationships. In many economic contexts it is natural to expect that behavior and decisions are dynamic, explicitly depending on past behavior. In our investment equation, for example, economic models predict that a firm’s investment in any given year will depend on investment decisions from previous years. These considerations lead us to consider explicitly dynamic models.

The workhorse dynamic model in a panel framework is the pth-order autoregression with regressors and a one-way error component structure. This is

$$Y_{it}=\alpha_1Y_{i,t-1}+\cdots+\alpha_pY_{i,t-p}+X_{it}'\beta+u_i+\varepsilon_{it} \qquad (17.81)$$

where $\alpha_j$ are the autoregressive coefficients, $X_{it}$ is a $k\times 1$ vector of regressors, $u_i$ is an individual effect, and $\varepsilon_{it}$ is an idiosyncratic error. It is conventional to assume that the errors $u_i$ and $\varepsilon_{it}$ are mutually independent and the $\varepsilon_{it}$ are serially uncorrelated and mean zero. For the present we will assume that the regressors $X_{it}$ are strictly exogenous (17.17). In Section 17.41 we discuss predetermined regressors.

For many illustrations we will focus on the AR(1) model

$$Y_{it}=\alpha Y_{i,t-1}+u_i+\varepsilon_{it}. \qquad (17.82)$$

The dynamics should be interpreted individual-by-individual. The coefficient α in (17.82) equals the first-order autocorrelation. When α=0 the series is serially uncorrelated (conditional on ui ). α>0 means Yit is positively serially correlated. α<0 means Yit is negatively serially correlated. An autoregressive unit root holds when α=1, which means that Yit follows a random walk with possible drift. Since ui is constant for a given individual it should be treated as an individual-specific intercept. The idiosyncratic error εit plays the role of the error in a standard time series autoregression.

If |α|<1 the model (17.82) is stationary. By standard autoregressive backwards recursion we calculate that

$$Y_{it}=\sum_{j=0}^{\infty}\alpha^j\left(u_i+\varepsilon_{i,t-j}\right)=\left(1-\alpha\right)^{-1}u_i+\sum_{j=0}^{\infty}\alpha^j\varepsilon_{i,t-j}.$$

Thus conditional on $u_i$ the mean and variance of $Y_{it}$ are $(1-\alpha)^{-1}u_i$ and $(1-\alpha^2)^{-1}\sigma_\varepsilon^2$, respectively. The $k$th autocorrelation (conditional on $u_i$) is $\alpha^k$. Notice that the effect of cross-section variation in $u_i$ is to shift the mean but not the variance or serial correlation. This implies that if we view time series plots of $Y_{it}$ against time for a set of individuals $i$, the series will appear to have different means but similar variances and serial correlation.
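These stationary properties can be verified by simulation. The sketch below (hypothetical values $\alpha=0.8$, $\sigma_\varepsilon=1$) generates long series for two individuals with different $u_i$: their sample means differ (approximately $u_i/(1-\alpha)$) while their variances (approximately $1/(1-\alpha^2)\approx 2.78$) and first autocorrelations (approximately $\alpha$) coincide:

```python
import numpy as np

rng = np.random.default_rng(7)
alpha, T = 0.8, 100_000

def simulate(u):
    """Simulate Y_t = alpha*Y_{t-1} + u + eps_t from the stationary start."""
    y = np.empty(T)
    y[0] = u / (1 - alpha) + rng.normal() / np.sqrt(1 - alpha ** 2)
    for t in range(1, T):
        y[t] = alpha * y[t - 1] + u + rng.normal()
    return y

ya, yb = simulate(0.0), simulate(1.0)   # two individuals, u_i = 0 and u_i = 1

mean_a, mean_b = ya.mean(), yb.mean()   # approx 0 and 1/(1-alpha) = 5
var_a, var_b = ya.var(), yb.var()       # both approx 1/(1-alpha^2) = 2.78
ac_a = np.corrcoef(ya[:-1], ya[1:])[0, 1]   # approx alpha
ac_b = np.corrcoef(yb[:-1], yb[1:])[0, 1]
```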

As in the time series case, high serial correlation (large $\alpha$) can proxy for other factors such as time trends. Thus in applications it will often be useful to include time effects to eliminate spurious serial correlation.

17.37 The Bias of Fixed Effects Estimation

To estimate the panel autoregression (17.81) it may appear natural to use the fixed effects (within) estimator. Indeed, the within transformation eliminates the individual effect ui. The trouble is that the within operator induces correlation between the AR(1) lag and the error. The result is that the within estimator is inconsistent for the coefficients when T is fixed. A thorough explanation appears in Nickell (1981). We describe the basic problem in this section focusing on the AR(1) model (17.82).

Applying the within operator to (17.82) we obtain

$$\dot{Y}_{it}=\alpha\dot{Y}_{i,t-1}+\dot{\varepsilon}_{it}$$

for $t\ge 2$. As expected the individual effect is eliminated. The difficulty is that $\mathrm{E}[\dot{Y}_{i,t-1}\dot{\varepsilon}_{it}]\ne 0$ because both $\dot{Y}_{i,t-1}$ and $\dot{\varepsilon}_{it}$ are functions of the entire time series.

To see this clearly in a simple example, suppose we have a balanced panel with T=3. There are two observed pairs (Yit,Yit1) per individual so the within estimator equals the differenced estimator. Applying the differencing operator to (17.82) for t=3 we find

$$\Delta Y_{i3}=\alpha\Delta Y_{i2}+\Delta\varepsilon_{i3}. \qquad (17.84)$$

Because of the lagged dependent variable and differencing there is effectively one observation per individual. Notice that the individual effect has been eliminated.

The fixed effects estimator of α is equal to the least squares estimator applied to (17.84), which is

$$\hat{\alpha}_{fe}=\left(\sum_{i=1}^{N}\Delta Y_{i2}^2\right)^{-1}\left(\sum_{i=1}^{N}\Delta Y_{i2}\Delta Y_{i3}\right)=\alpha+\left(\sum_{i=1}^{N}\Delta Y_{i2}^2\right)^{-1}\left(\sum_{i=1}^{N}\Delta Y_{i2}\Delta\varepsilon_{i3}\right).$$

The differenced regressor and error are negatively correlated. Indeed

$$\mathrm{E}\left[\Delta Y_{i2}\Delta\varepsilon_{i3}\right]=\mathrm{E}\left[\left(Y_{i2}-Y_{i1}\right)\left(\varepsilon_{i3}-\varepsilon_{i2}\right)\right]=\mathrm{E}\left[Y_{i2}\varepsilon_{i3}\right]-\mathrm{E}\left[Y_{i1}\varepsilon_{i3}\right]-\mathrm{E}\left[Y_{i2}\varepsilon_{i2}\right]+\mathrm{E}\left[Y_{i1}\varepsilon_{i2}\right]=0-0-\sigma_\varepsilon^2+0=-\sigma_\varepsilon^2.$$

Using the variance formula for AR(1) models (assuming $|\alpha|<1$) we calculate that $\mathrm{E}[(\Delta Y_{i2})^2]=2\sigma_\varepsilon^2/(1+\alpha)$. It follows that the probability limit of the fixed effects estimator $\hat{\alpha}_{fe}$ of $\alpha$ in (17.84) is

$$\operatorname*{plim}_{N\to\infty}\left(\hat{\alpha}_{fe}-\alpha\right)=\frac{\mathrm{E}\left[\Delta Y_{i2}\Delta\varepsilon_{i3}\right]}{\mathrm{E}\left[\left(\Delta Y_{i2}\right)^2\right]}=-\frac{1+\alpha}{2}. \qquad (17.85)$$

It is typical to call (17.85) the “bias” of α^fe , though it is technically a probability limit.

The bias found in (17.85) is large. For $\alpha=0$ the bias is $-1/2$ and it grows in magnitude towards $-1$ as $\alpha\to 1$. Thus for any $\alpha<1$ the probability limit of $\hat{\alpha}_{fe}$ is negative! This is extreme bias.
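The probability limit (17.85) is easily verified by Monte Carlo. The sketch below (hypothetical values $\alpha=0.5$, $\sigma_\varepsilon=1$, $T=3$, with $Y_{i1}$ drawn from the stationary distribution conditional on $u_i$) shows the differenced least squares estimator converging to $\alpha-(1+\alpha)/2=-0.25$ rather than to $\alpha=0.5$:

```python
import numpy as np

rng = np.random.default_rng(5)
N, alpha = 200_000, 0.5

# Balanced panel with T = 3; Y_i1 drawn from the stationary distribution
u = rng.normal(size=N)
Y1 = u / (1 - alpha) + rng.normal(size=N) / np.sqrt(1 - alpha ** 2)
Y2 = alpha * Y1 + u + rng.normal(size=N)
Y3 = alpha * Y2 + u + rng.normal(size=N)

# With T = 3 the within estimator equals least squares on first differences
d2, d3 = Y2 - Y1, Y3 - Y2
alpha_fe = (d2 @ d3) / (d2 @ d2)

# Predicted probability limit: alpha - (1 + alpha)/2 = -0.25
```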

Now take the case T>3. From Nickell’s (1981) expressions and some algebra, we can calculate that the probability limit of the fixed effects estimator for |α|<1 is

$$\operatorname*{plim}_{N\to\infty}\left(\hat{\alpha}_{fe}-\alpha\right)=\frac{1+\alpha}{\dfrac{2\alpha}{1-\alpha}-\dfrac{T-1}{1-\alpha^{T-1}}}. \qquad (17.86)$$

It follows that the bias is of order O(1/T).

It is often asserted that it is acceptable to use fixed effects if T is sufficiently large, e.g. T ≥ 30. However, from (17.86) we can calculate that for T = 30 the bias of the fixed effects estimator is −0.056 when α = 0.5 and −0.15 when α = 0.9. For T = 60 and α = 0.9 the bias is −0.05. These magnitudes are unacceptably large. This includes the longer time series encountered in macro panels. Thus the Nickell bias problem applies to both micro and macro panel applications.

The conclusion from this analysis is that the fixed effects estimator should not be used for models with lagged dependent variables even if the time series dimension T is large.
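The severity of the Nickell bias is easy to see in a small simulation. The following is an illustrative sketch (not from the text; all names are ours): the within estimator is applied to a simulated panel AR(1) with α = 0.5, and the bias is close to −(1 + α)/2 = −0.75 at T = 3 and shrinks only slowly as T grows.

```python
# Illustrative Monte Carlo (not from the text): the within estimator applied
# to a panel AR(1) with individual effects, for increasing T.
import numpy as np

rng = np.random.default_rng(0)

def within_ar1(alpha, N, T, burn=50):
    # simulate Y_it = alpha*Y_{i,t-1} + u_i + e_it, discarding a burn-in
    u = rng.standard_normal(N)
    Y = np.zeros((N, T + burn))
    Y[:, 0] = u / (1 - alpha)
    for t in range(1, T + burn):
        Y[:, t] = alpha * Y[:, t - 1] + u + rng.standard_normal(N)
    Y = Y[:, burn:]
    # within (fixed effects) estimator: demean Y_it and Y_{i,t-1} by individual
    y = Y[:, 1:] - Y[:, 1:].mean(axis=1, keepdims=True)
    x = Y[:, :-1] - Y[:, :-1].mean(axis=1, keepdims=True)
    return (x * y).sum() / (x * x).sum()

for T in (3, 10, 30):
    print(T, within_ar1(0.5, 20000, T) - 0.5)  # bias is about -0.75 at T = 3
```

Even at T = 30 the estimated bias remains economically meaningful, in line with the discussion above.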

17.38 Anderson-Hsiao Estimator

Anderson and Hsiao (1982) made an important breakthrough by showing that a simple instrumental variables estimator is consistent for the parameters of (17.81).

The method first eliminates the individual effect u_i by first-differencing (17.81). For t ≥ p + 2,

ΔY_{it} = α₁ΔY_{i,t−1} + α₂ΔY_{i,t−2} + ⋯ + α_pΔY_{i,t−p} + ΔX_{it}′β + Δε_{it}.

The challenge is that first-differencing induces correlation between ΔY_{i,t−1} and Δε_{it}:

E[ΔY_{i,t−1}Δε_{it}] = E[(Y_{i,t−1} − Y_{i,t−2})(ε_{it} − ε_{i,t−1})] = −σ_ε².

The other regressors are not correlated with Δε_{it}. For s > 1, E[ΔY_{i,t−s}Δε_{it}] = 0, and when X_{it} is strictly exogenous E[ΔX_{it}Δε_{it}] = 0.

The correlation between ΔY_{i,t−1} and Δε_{it} means that ΔY_{i,t−1} is endogenous. One solution to endogeneity is to use an instrument. Anderson-Hsiao pointed out that Y_{i,t−2} is a valid instrument because it is correlated with ΔY_{i,t−1} yet uncorrelated with Δε_{it}:

E[Y_{i,t−2}Δε_{it}] = E[Y_{i,t−2}ε_{it}] − E[Y_{i,t−2}ε_{i,t−1}] = 0.

The Anderson-Hsiao estimator is IV using Y_{i,t−2} as an instrument for ΔY_{i,t−1}. Equivalently, this is IV using the instruments (Y_{i,t−2}, …, Y_{i,t−p−1}) for (ΔY_{i,t−1}, …, ΔY_{i,t−p}). The estimator requires T ≥ p + 2.

To show that this estimator is consistent, for simplicity assume we have a balanced panel with T=3, p=1, and no regressors. In this case the Anderson-Hsiao IV estimator is

α̂_iv = (∑_{i=1}^{N} Y_{i1}ΔY_{i2})⁻¹ (∑_{i=1}^{N} Y_{i1}ΔY_{i3}) = α + (∑_{i=1}^{N} Y_{i1}ΔY_{i2})⁻¹ (∑_{i=1}^{N} Y_{i1}Δε_{i3}).

Under the assumption that ε_{it} is serially uncorrelated, (17.88) shows that E[Y_{i1}Δε_{i3}] = 0. In general, E[Y_{i1}ΔY_{i2}] ≠ 0. As N → ∞,

α̂_iv →_p α + E[Y_{i1}Δε_{i3}] / E[Y_{i1}ΔY_{i2}] = α.

Thus the IV estimator is consistent for α.

The Anderson-Hsiao IV estimator relies on two critical assumptions. First, the validity of the instrument (uncorrelatedness with the equation error) relies on the assumption that the dynamics are correctly specified so that ε_{it} is serially uncorrelated. For example, many applications use an AR(1). If instead the true model is an AR(2) then Y_{i,t−2} is not a valid instrument and the IV estimates will be biased. Second, the relevance of the instrument (correlation with the endogenous regressor) requires E[Y_{i1}ΔY_{i2}] ≠ 0. This turns out to be problematic and is explored further in Section 17.40. These considerations suggest that the validity and accuracy of the estimator are likely to be sensitive to these unknown features.
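The consistency argument above can be illustrated by simulation. The following sketch (illustrative, not from the text) reproduces the T = 3, p = 1 case: IV using Y_{i1} as the instrument recovers α, while least squares on the differenced equation does not.

```python
# Illustrative simulation (not from the text) of the Anderson-Hsiao estimator
# for T = 3, p = 1: instrument Y_i1 for the differenced regressor dY_i2.
import numpy as np

rng = np.random.default_rng(1)
alpha, N = 0.5, 200000

u = rng.standard_normal(N)
Y1 = u / (1 - alpha) + rng.standard_normal(N) / np.sqrt(1 - alpha**2)  # stationary start
Y2 = alpha * Y1 + u + rng.standard_normal(N)
Y3 = alpha * Y2 + u + rng.standard_normal(N)

dY2, dY3 = Y2 - Y1, Y3 - Y2
alpha_iv = (Y1 * dY3).sum() / (Y1 * dY2).sum()    # IV with instrument Y_i1
alpha_ls = (dY2 * dY3).sum() / (dY2 * dY2).sum()  # differenced least squares
print(alpha_iv, alpha_ls)  # IV is near 0.5; least squares is severely biased
```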

17.39 Arellano-Bond Estimator

The orthogonality condition (17.88) is one of many implied by the dynamic panel model. Indeed, all lags Y_{i,t−2}, Y_{i,t−3}, … are valid instruments. If T > p + 2 these can be used to potentially improve estimation efficiency. This was first pointed out by Holtz-Eakin, Newey, and Rosen (1988) and further developed by Arellano and Bond (1991).

Using these extra instruments has a complication that there are a different number of instruments for each time period. The solution is to view the model as a system of T equations as in Section 17.18.

It will be useful to first write the model in vector notation. Stack the differenced regressors (ΔY_{i,t−1}, …, ΔY_{i,t−p}, ΔX_{it}) into a matrix ΔX_i and the coefficients into a vector α. We can write (17.87) as ΔY_i = ΔX_iα + Δε_i. Stacking all N individuals this can be written as ΔY = ΔXα + Δε.

For period t = p + 2 we have p + k valid instruments [Y_{i1}, …, Y_{ip}, ΔX_{i,p+2}]. For period t = p + 3 there are p + 1 + k valid instruments [Y_{i1}, …, Y_{i,p+1}, ΔX_{i,p+3}]. For period t = p + 4 there are p + 2 + k instruments. In general, for any t ≥ p + 2 there are t − 2 + k instruments [Y_{i1}, …, Y_{i,t−2}, ΔX_{it}]. Similarly to (17.53) we can define the instrument matrix for individual i as

Z_i = diag( [Y_{i1}, …, Y_{ip}, ΔX_{i,p+2}],  [Y_{i1}, …, Y_{i,p+1}, ΔX_{i,p+3}],  …,  [Y_{i1}, Y_{i2}, …, Y_{i,T−2}, ΔX_{iT}] ).

This is (T − p − 1) × ℓ where ℓ = k(T − p − 1) + ((T − 2)(T − 1) − p(p − 1))/2. This instrument matrix consists of all lagged values Y_{i,t−2}, Y_{i,t−3}, … which are available in the data set, plus the differenced strictly exogenous regressors.

The moment conditions are

E[Z_i′(ΔY_i − ΔX_iα)] = 0.

If T > p + 2 then ℓ exceeds the number of coefficients and the model is overidentified. Define the ℓ × ℓ covariance matrix for the moment conditions

Ω = E[Z_i′Δε_iΔε_i′Z_i].

Let Z denote Z_i stacked into a (T − p − 1)N × ℓ matrix. The efficient GMM estimator of α is

α̂_gmm = (ΔX′ZΩ⁻¹Z′ΔX)⁻¹(ΔX′ZΩ⁻¹Z′ΔY).

If the errors εit are conditionally homoskedastic then

Ω = E[Z_i′HZ_i]σ_ε²

where H is given in (17.31). In this case set

Ω̂₁ = ∑_{i=1}^{N} Z_i′HZ_i

as a (scaled) estimate of Ω. Under these assumptions an asymptotically efficient GMM estimator is

α̂₁ = (ΔX′ZΩ̂₁⁻¹Z′ΔX)⁻¹(ΔX′ZΩ̂₁⁻¹Z′ΔY).

Estimator (17.91) is known as the one-step Arellano-Bond GMM estimator.

Under the assumption that the error εit is homoskedastic and serially uncorrelated, a classical covariance matrix estimator for α^1 is

V̂₁⁰ = (ΔX′ZΩ̂₁⁻¹Z′ΔX)⁻¹σ̂_ε²

where σ̂_ε² is the sample variance of the one-step residuals ε̂_i = ΔY_i − ΔX_iα̂₁. A covariance matrix estimator which is robust to violation of these assumptions is

V̂₁ = (ΔX′ZΩ̂₁⁻¹Z′ΔX)⁻¹(ΔX′ZΩ̂₁⁻¹Ω̂₂Ω̂₁⁻¹Z′ΔX)(ΔX′ZΩ̂₁⁻¹Z′ΔX)⁻¹

where

Ω̂₂ = ∑_{i=1}^{N} Z_i′ε̂_iε̂_i′Z_i

is a (scaled) cluster-robust estimator of Ω using the one-step residuals.

An asymptotically efficient two-step GMM estimator which allows heteroskedasticity is

α̂₂ = (ΔX′ZΩ̂₂⁻¹Z′ΔX)⁻¹(ΔX′ZΩ̂₂⁻¹Z′ΔY).

Estimator (17.94) is known as the two-step Arellano-Bond GMM estimator. An appropriate robust covariance matrix estimator for α^2 is

V̂₂ = (ΔX′ZΩ̂₂⁻¹Z′ΔX)⁻¹(ΔX′ZΩ̂₂⁻¹Ω̂₃Ω̂₂⁻¹Z′ΔX)(ΔX′ZΩ̂₂⁻¹Z′ΔX)⁻¹

where

Ω̂₃ = ∑_{i=1}^{N} Z_i′ε̂_iε̂_i′Z_i

is a (scaled) cluster-robust estimator of Ω using the two-step residuals ε̂_i = ΔY_i − ΔX_iα̂₂. Asymptotically, V̂₂ is equivalent to

Ṽ₂ = (ΔX′ZΩ̂₂⁻¹Z′ΔX)⁻¹.

The GMM estimator can be iterated until convergence to produce an iterated GMM estimator.

The advantage of the Arellano-Bond estimator over the Anderson-Hsiao estimator is that when T> p+2 the additional (overidentified) moment conditions reduce the asymptotic variance of the estimator and stabilize its performance. The disadvantage is that when T is large using the full set of lags as instruments may cause a “many weak instruments” problem. The advised compromise is to limit the number of lags used as instruments.

The advantage of the one-step Arellano-Bond estimator is that the weight matrix Ω^1 does not depend on residuals and is therefore less random than the two-step weight matrix Ω^2. This can result in better performance by the one-step estimator in small to moderate samples especially when the errors are approximately homoskedastic. The advantage of the two-step estimator is that it achieves asymptotic efficiency allowing for heteroskedasticity and is thus expected to perform better in large samples with non-homoskedastic errors.

To summarize, the Arellano-Bond estimator applies GMM to the first-differenced equation (17.87) using the set of available lags Y_{i,t−2}, Y_{i,t−3}, … as instruments for ΔY_{i,t−1}, …, ΔY_{i,t−p}.
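The construction above can be sketched in a few lines of code. The following is an illustrative implementation (all names are ours; H is taken to be the tridiagonal first-difference covariance matrix referenced at (17.31)) of the one-step estimator (17.91) for a simulated AR(1) panel with no regressors.

```python
# Illustrative one-step Arellano-Bond GMM (names are ours) for a panel AR(1)
# with no regressors; instruments are all available lags Y_{i,t-2}, Y_{i,t-3}, ...
import numpy as np

rng = np.random.default_rng(2)
alpha, N, T = 0.5, 5000, 6

u = rng.standard_normal(N)
Y = np.zeros((N, T))
Y[:, 0] = u / (1 - alpha) + rng.standard_normal(N) / np.sqrt(1 - alpha**2)
for t in range(1, T):
    Y[:, t] = alpha * Y[:, t - 1] + u + rng.standard_normal(N)

eqs = list(range(2, T))              # differenced equations, indexed by period t
ell = sum(t - 1 for t in eqs)        # total number of instruments
H = 2 * np.eye(T - 2) - np.eye(T - 2, k=1) - np.eye(T - 2, k=-1)

A = np.zeros((ell, ell))             # accumulates sum_i Zi' H Zi (one-step weight)
Zdx = np.zeros(ell)                  # accumulates sum_i Zi' dX_i
Zdy = np.zeros(ell)                  # accumulates sum_i Zi' dY_i
for i in range(N):
    Zi = np.zeros((T - 2, ell))
    col = 0
    for row, t in enumerate(eqs):    # instruments for period t: Y_{i,0},...,Y_{i,t-2}
        Zi[row, col:col + t - 1] = Y[i, :t - 1]
        col += t - 1
    dy = Y[i, 2:] - Y[i, 1:-1]       # dY_it
    dx = Y[i, 1:-1] - Y[i, :-2]      # dY_{i,t-1}
    A += Zi.T @ H @ Zi
    Zdx += Zi.T @ dx
    Zdy += Zi.T @ dy

W = np.linalg.inv(A)
alpha_ab = (Zdx @ W @ Zdy) / (Zdx @ W @ Zdx)
print(alpha_ab)  # close to the true alpha = 0.5
```

This is a sketch under i.i.d. errors and a balanced panel; production code would instead use xtabond/xtdpd or an equivalent package.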

The Arellano-Bond estimator may be obtained in Stata using either the xtabond or xtdpd command. The default setting is the one-step estimator (17.91) and non-robust standard errors (17.92). For the two-step estimator and robust standard errors use the twostep and vce(robust) options. Reported standard errors in Stata are based on Windmeijer’s (2005) finite-sample correction to the asymptotic estimator (17.96). Neither the robust covariance matrix (17.95) nor the iterated GMM estimator is implemented.

17.40 Weak Instruments

Blundell and Bond (1998) pointed out that the Anderson-Hsiao and Arellano-Bond estimators suffer from weak instruments. This is seen most easily in the AR(1) model with the Anderson-Hsiao estimator, which uses Y_{i,t−2} as an instrument for ΔY_{i,t−1}. The reduced form equation for ΔY_{i,t−1} is

ΔY_{i,t−1} = Y_{i,t−2}γ + v_{it}.

The reduced form coefficient γ is defined by projection. Using ΔY_{i,t−1} = (α − 1)Y_{i,t−2} + u_i + ε_{i,t−1} and E[Y_{i,t−2}ε_{i,t−1}] = 0 we calculate that

γ = E[Y_{i,t−2}ΔY_{i,t−1}] / E[Y_{i,t−2}²] = (α − 1) + E[Y_{i,t−2}u_i] / E[Y_{i,t−2}²].

Assuming stationarity so that (17.83) holds,

E[Y_{i,t−2}u_i] = E[(u_i/(1 − α) + ∑_{j=0}^{∞} α^j ε_{i,t−2−j}) u_i] = σ_u²/(1 − α)

and

E[Y_{i,t−2}²] = E[(u_i/(1 − α) + ∑_{j=0}^{∞} α^j ε_{i,t−2−j})²] = σ_u²/(1 − α)² + σ_ε²/(1 − α²)

where σ_u² = E[u_i²] and σ_ε² = E[ε_{it}²]. Using these expressions and a fair amount of algebra, Blundell and Bond (1998) found that the reduced form coefficient equals

γ = (α − 1) (k / (k + σ_u²/σ_ε²))

where k = (1 − α)/(1 + α). The Anderson-Hsiao instrument Y_{i,t−2} is weak if γ is close to zero. From (17.97) we see that γ = 0 when either α = 1 (a unit root) or σ_u²/σ_ε² = ∞ (the idiosyncratic effect is small relative to the individual-specific effect). In either case the coefficient α is not identified. We know from our earlier study of the weak instruments problem (Section 12.36) that when γ is close to zero then α is weakly identified and the estimators will perform poorly. This means that when the autoregressive coefficient α is large or the individual-specific effect dominates the idiosyncratic effect these estimators will be weakly identified, have poor performance, and conventional inference methods will be misleading. Since the value of α and the relative variances are unknown a priori, this means that we should generically treat this class of estimators as weakly identified.
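A quick tabulation of (17.97) shows how fast the reduced form coefficient collapses toward zero as α grows, holding the variance ratio fixed. This is an illustrative sketch; the helper name and argument names are ours.

```python
# The reduced form coefficient (17.97): gamma = (alpha - 1) k/(k + s) with
# k = (1 - alpha)/(1 + alpha) and s the variance ratio sigma_u^2/sigma_e^2.
def gamma(alpha, s):
    k = (1 - alpha) / (1 + alpha)
    return (alpha - 1) * k / (k + s)

for a in (0.0, 0.5, 0.9, 0.99):
    print(a, gamma(a, 1.0))  # gamma shrinks rapidly toward zero as alpha -> 1
```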

An alternative estimator which has improved performance is discussed in Section 17.42.

17.41 Dynamic Panels with Predetermined Regressors

The assumption that regressors are strictly exogenous is restrictive. A less restrictive assumption is that the regressors are predetermined. Dynamic panel methods can be modified to handle predetermined regressors by using their lags as instruments.

Definition 17.2 The regressor Xit is predetermined for the error εit if

E[X_{i,t−s}ε_{it}] = 0

for all s ≥ 0.

The difference between strictly exogenous and predetermined regressors is that for the former (17.98) holds for all s, not just s ≥ 0. One way of interpreting a regression model with predetermined regressors is that the model is a projection on the complete past history of the regressors.

Under (17.98), leads of X_{it} can be correlated with ε_{it}, that is E[X_{i,t+s}ε_{it}] ≠ 0 for s ≥ 1, or equivalently X_{it} can be correlated with lags of ε_{it}, that is E[X_{it}ε_{i,t−s}] ≠ 0 for s ≥ 1. This means that X_{it} can respond dynamically to past values of Y_{it}, as in, for example, an unrestricted vector autoregression.

Consider the differenced equation (17.87)

ΔY_{it} = α₁ΔY_{i,t−1} + α₂ΔY_{i,t−2} + ⋯ + α_pΔY_{i,t−p} + ΔX_{it}′β + Δε_{it}.

When the regressors are predetermined but not strictly exogenous, Xit and εit are uncorrelated but ΔXit and Δεit are correlated. To see this,

E[ΔX_{it}Δε_{it}] = E[X_{it}ε_{it}] − E[X_{i,t−1}ε_{it}] − E[X_{it}ε_{i,t−1}] + E[X_{i,t−1}ε_{i,t−1}] = −E[X_{it}ε_{i,t−1}] ≠ 0.

This means that if we treat ΔXit as exogenous the coefficient estimates will be biased.
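These moment calculations are easy to verify numerically. The sketch below is illustrative (the feedback coefficient theta and the simulated process are our assumptions, not from the text): a regressor that responds to last period's shock makes ΔX_{it} endogenous in the differenced equation, while its lagged level remains uncorrelated with Δε_{it}.

```python
# Illustrative check: a predetermined regressor makes dX_it endogenous in the
# differenced equation, while its lagged level is uncorrelated with de_it.
import numpy as np

rng = np.random.default_rng(4)
N, T, rho, theta = 500000, 6, 0.5, 0.8

eps = rng.standard_normal((N, T))
X = np.zeros((N, T))
for t in range(1, T):
    # X responds to last period's shock: predetermined but not strictly exogenous
    X[:, t] = rho * X[:, t - 1] + theta * eps[:, t - 1] + rng.standard_normal(N)

t = 4
dX, de = X[:, t] - X[:, t - 1], eps[:, t] - eps[:, t - 1]
print(np.mean(dX * de))           # ~ -theta: E[dX de] = -E[X_it e_{i,t-1}] != 0
print(np.mean(X[:, t - 1] * de))  # ~ 0: the lagged level is a valid instrument
```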

To solve the correlation problem we can use instruments for ΔX_{it}. A valid instrument is X_{i,t−1} because it is generally correlated with ΔX_{it} yet uncorrelated with Δε_{it}. Indeed, for any s ≥ 1,

E[X_{i,t−s}Δε_{it}] = E[X_{i,t−s}ε_{it}] − E[X_{i,t−s}ε_{i,t−1}] = 0.

Consequently, Arellano and Bond (1991) recommend the instrument set (X_{i1}, X_{i2}, …, X_{i,t−1}). When the number of time periods is large it is advised to limit the number of instrument lags to avoid the many weak instruments problem. Algebraically, GMM estimation is the same as the estimators described in Section 17.39, except that the instrument matrix (17.89) is modified to include these lags of the predetermined regressors in each block, giving the instrument matrix (17.99).

To understand how the model is identified we examine the reduced form equation for the regressor. For t=p+2 and using the first lag as an instrument the reduced form is

ΔX_{it} = γ₁Y_{i,t−2} + Γ₂X_{i,t−1} + ζ_{it}.

The model is identified if Γ₂ is full rank. This holds (in general) when X_{it} is stationary. Identification fails, however, when X_{it} has a unit root. This indicates that the model will be weakly identified when the predetermined regressors are highly persistent.

The method generalizes to handle multiple lags of the predetermined regressors. To see this, write the model explicitly as

Y_{it} = α₁Y_{i,t−1} + ⋯ + α_pY_{i,t−p} + X_{it}′β₁ + ⋯ + X_{i,t−q}′β_q + u_i + ε_{it}.

In first differences the model is

ΔY_{it} = α₁ΔY_{i,t−1} + ⋯ + α_pΔY_{i,t−p} + ΔX_{it}′β₁ + ⋯ + ΔX_{i,t−q}′β_q + Δε_{it}.

A sufficient set of instruments for the regressors is (X_{i,t−1}, ΔX_{i,t−1}, …, ΔX_{i,t−q}) or equivalently (X_{i,t−1}, X_{i,t−2}, …, X_{i,t−q−1}).

In many cases it is more reasonable to assume that X_{i,t−1} is predetermined but not X_{it}, because X_{it} and ε_{it} may be correlated. This, for example, is the standard assumption in vector autoregressions. In this case the estimation method is modified to use the instruments (X_{i,t−2}, X_{i,t−3}, …, X_{i,t−q−1}). While this weakens the exogeneity assumption it also weakens the instrument set, as now the reduced form uses the second lag X_{i,t−2} to predict ΔX_{it}.

The advantage obtained by treating a regressor as predetermined (rather than strictly exogenous) is that it substantially relaxes the dynamic assumptions. If a regressor which is only predetermined is incorrectly treated as strictly exogenous, the parameter estimates will be inconsistent due to endogeneity.

The major disadvantage of treating a regressor as predetermined is that it substantially reduces the strength of identification especially when the predetermined regressors are highly persistent.

In Stata the xtabond command by default treats independent regressors as strictly exogenous. To treat the regressors as predetermined use the pre() option. By default all regressor lags are used as instruments, but the number can be limited if specified.

17.42 Blundell-Bond Estimator

Arellano and Bover (1995) and Blundell and Bond (1998) introduced a set of orthogonality conditions which reduce the weak instrument problem discussed in Section 17.40 and improve performance in finite samples.

Consider the levels AR(1) model with no regressors (17.82). Recall that least squares (pooled) regression is inconsistent because the regressor Y_{i,t−1} is correlated with the error u_i. This raises the question: Is there an instrument Z_{it} which solves this problem, in the sense that Z_{it} is correlated with Y_{i,t−1} yet uncorrelated with u_i + ε_{it}? Blundell-Bond propose the instrument ΔY_{i,t−1}. Clearly, ΔY_{i,t−1} and Y_{i,t−1} are correlated so ΔY_{i,t−1} satisfies the relevance condition. Also, ΔY_{i,t−1} is uncorrelated with the idiosyncratic error ε_{it} when the latter is serially uncorrelated. Thus the key to the Blundell-Bond instrument is whether or not

E[ΔY_{i,t−1}u_i] = 0.

Blundell and Bond (1998) show that a sufficient condition for (17.100) is

E[(Y_{i1} − u_i/(1 − α)) u_i] = 0.

Recall that u_i/(1 − α) is the conditional mean of Y_{it} under stationarity. Condition (17.101) states that the deviation of the initial condition Y_{i1} from this conditional mean is uncorrelated with the individual effect u_i. Condition (17.101) is implied by stationarity but is somewhat weaker.

To see that (17.101) implies (17.100), by applying recursion to (17.87) we find that

ΔY_{i,t−1} = α^{t−3}ΔY_{i2} + ∑_{j=0}^{t−4} α^j Δε_{i,t−1−j}.

Also,

ΔY_{i2} = (α − 1)Y_{i1} + u_i + ε_{i2} = (α − 1)(Y_{i1} − u_i/(1 − α)) + ε_{i2}.

Hence

E[ΔY_{i,t−1}u_i] = E[(α^{t−3}(α − 1)(Y_{i1} − u_i/(1 − α)) + α^{t−3}ε_{i2} + ∑_{j=0}^{t−4} α^j Δε_{i,t−1−j}) u_i] = α^{t−3}(α − 1) E[(Y_{i1} − u_i/(1 − α)) u_i] = 0

under (17.101), as claimed.
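The orthogonality condition (17.100) can also be checked by simulation. In the illustrative sketch below the panel is initialized at stationarity, so ΔY_{i,t−1} is uncorrelated with u_i even though the level Y_{i,t−1} is strongly correlated with u_i.

```python
# Illustrative check of (17.100): under stationary initial conditions the
# Blundell-Bond instrument dY_{i,t-1} is uncorrelated with u_i, unlike the level.
import numpy as np

rng = np.random.default_rng(3)
alpha, N, T = 0.5, 500000, 4

u = rng.standard_normal(N)
Y = np.zeros((N, T))
Y[:, 0] = u / (1 - alpha) + rng.standard_normal(N) / np.sqrt(1 - alpha**2)
for t in range(1, T):
    Y[:, t] = alpha * Y[:, t - 1] + u + rng.standard_normal(N)

dY2 = Y[:, 2] - Y[:, 1]
print(np.mean(dY2 * u))      # ~ 0: valid instrument for the levels equation
print(np.mean(Y[:, 2] * u))  # ~ s_u^2/(1 - alpha) = 2: levels are contaminated
```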

Now consider the full model (17.81) with predetermined regressors. Consider the assumption that the regressors have constant correlation with the individual effect

E[X_{it}u_i] = E[X_{is}u_i]

for all s. This implies

E[ΔX_{it}u_i] = 0

which means that the differenced predetermined regressors ΔXit can also be used as instruments for the level equation.

Using (17.100) and (17.102) Blundell and Bond propose the following moment conditions for GMM estimation

E[ΔY_{i,t−1}(Y_{it} − α₁Y_{i,t−1} − ⋯ − α_pY_{i,t−p} − X_{it}′β)] = 0
E[ΔX_{it}(Y_{it} − α₁Y_{i,t−1} − ⋯ − α_pY_{i,t−p} − X_{it}′β)] = 0

for t = p + 2, …, T. Notice that these are for the levels (undifferenced) equation while the Arellano-Bond moments (17.90) are for the differenced equation (17.87). We can write (17.103)-(17.104) in vector notation if we set Z_{2i} = diag(ΔY_{i2}, …, ΔY_{i,T−1}, ΔX_{i3}, …, ΔX_{iT}). Then (17.103)-(17.104) equals

E[Z_{2i}′(Y_i − X_iθ)] = 0.

Blundell and Bond proposed combining the Arellano-Bond moments with the levels moments. This can be done by stacking the moment conditions (17.90) and (17.105). Recall from Section 17.39 the variables ΔY_i, ΔX_i, and Z_i. Define the stacked variables Y_i = (ΔY_i′, Y_i′)′, X_i = (ΔX_i′, X_i′)′ and Z_i = diag(Z_i, Z_{2i}), reusing the symbols for the stacked versions. The stacked moment conditions are

E[Z_i′(Y_i − X_iθ)] = 0.

The Blundell-Bond estimator is found by applying GMM to this equation. They call this a systems GMM estimator. Let Y, X, and Z denote Y_i, X_i, and Z_i stacked into matrices. Define H̄ = diag(H, I_{T−2}) where H is from (17.31) and set

Ω̂₁ = ∑_{i=1}^{N} Z_i′H̄Z_i.

The Blundell-Bond one-step GMM estimator is

θ̂₁ = (X′ZΩ̂₁⁻¹Z′X)⁻¹(X′ZΩ̂₁⁻¹Z′Y).

The systems residuals are ε̂_i = Y_i − X_iθ̂₁. A robust covariance matrix estimator is

V̂₁ = (X′ZΩ̂₁⁻¹Z′X)⁻¹(X′ZΩ̂₁⁻¹Ω̂₂Ω̂₁⁻¹Z′X)(X′ZΩ̂₁⁻¹Z′X)⁻¹

where

Ω̂₂ = ∑_{i=1}^{N} Z_i′ε̂_iε̂_i′Z_i.

The Blundell-Bond two-step GMM estimator is

θ̂₂ = (X′ZΩ̂₂⁻¹Z′X)⁻¹(X′ZΩ̂₂⁻¹Z′Y).

The two-step systems residuals are ε̂_i = Y_i − X_iθ̂₂. A robust covariance matrix estimator is

V̂₂ = (X′ZΩ̂₂⁻¹Z′X)⁻¹(X′ZΩ̂₂⁻¹Ω̂₃Ω̂₂⁻¹Z′X)(X′ZΩ̂₂⁻¹Z′X)⁻¹

where

Ω̂₃ = ∑_{i=1}^{N} Z_i′ε̂_iε̂_i′Z_i.

Asymptotically, V^2 is equivalent to

Ṽ₂ = (X′ZΩ̂₂⁻¹Z′X)⁻¹.

The GMM estimator can be iterated until convergence to produce an iterated GMM estimator.

Simulation experiments reported in Blundell and Bond (1998) indicate that their systems GMM estimator performs substantially better than the Arellano-Bond estimator, especially when α is close to one or the variance ratio σ_u²/σ_ε² is large. The explanation is that the orthogonality condition (17.103) does not suffer from the weak instrument problem in these cases.

The advantage of the Blundell-Bond estimator is that the added orthogonality condition (17.103) greatly improves performance relative to the Arellano-Bond estimator when the latter is weakly identified. A disadvantage of the Blundell-Bond estimator is that their orthogonality condition is justified by a stationarity condition (17.101) and violation of the latter may induce estimation bias.

The advantages and disadvantages of the one-step versus two-step Blundell-Bond estimators are the same as those described for the Arellano-Bond estimator in Section 17.39. Also as described there, when T is large it may be desirable to limit the number of lags used as instruments in order to avoid the many weak instruments problem.

The Blundell-Bond estimator may be obtained in Stata using either the xtdpdsys or xtdpd command. The default setting is the one-step estimator (17.106) and non-robust standard errors. For the two-step estimator and robust standard errors use the twostep and vce(robust) options. Stata standard errors are Windmeijer’s (2005) finite-sample correction to the asymptotic estimate (17.110). Neither the robust covariance matrix estimator (17.109) nor the iterated GMM estimator is implemented.

17.43 Forward Orthogonal Transformation

Arellano and Bover (1995) proposed an alternative transformation which eliminates the individual-specific effect and may have advantages in dynamic panel models. The forward orthogonal transformation is

Y*_{it} = c_{it}(Y_{it} − (1/(T_i − t))(Y_{i,t+1} + ⋯ + Y_{iT_i}))

where c_{it}² = (T_i − t)/(T_i − t + 1). This can be applied to all but the final observation (which is lost). Essentially, Y*_{it} subtracts from Y_{it} the average of the remaining values and then rescales so that the variance is constant under the assumption of homoskedastic errors. The transformation (17.111) was originally proposed for time-series observations by Hayashi and Sims (1983).

At the level of the individual this can be written as Y*_i = A_iY_i where A_i is the (T_i − 1) × T_i orthogonal deviation operator

A_i = diag(√((T_i−1)/T_i), …, √(1/2)) ×
⎡ 1   −1/(T_i−1)   −1/(T_i−1)   ⋯   −1/(T_i−1)   −1/(T_i−1) ⎤
⎢ 0       1        −1/(T_i−2)   ⋯   −1/(T_i−2)   −1/(T_i−2) ⎥
⎢ ⋮                    ⋱                              ⋮      ⎥
⎢ 0       0            ⋯        1      −1/2         −1/2    ⎥
⎣ 0       0            ⋯        0        1           −1     ⎦.

Important properties of the matrix A_i are that A_i1_i = 0 (so it eliminates individual effects), A_i′A_i = M_i, and A_iA_i′ = I_{T_i−1}. These can be verified by direct multiplication.
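The same properties can be confirmed numerically. The sketch below (illustrative; the helper name is ours) builds A_i for a balanced panel and verifies A_i1 = 0, A_iA_i′ = I_{T−1}, and A_i′A_i = M_i.

```python
# Illustrative numerical verification of the properties of A_i (balanced panel).
import numpy as np

def fod_matrix(T):
    # forward orthogonal deviation operator, one row per period t = 1,...,T-1
    A = np.zeros((T - 1, T))
    for t in range(T - 1):
        c = np.sqrt((T - t - 1) / (T - t))
        A[t, t] = c                        # weight on Y_it
        A[t, t + 1:] = -c / (T - t - 1)    # minus the mean of remaining values
    return A

T = 5
A = fod_matrix(T)
M = np.eye(T) - np.ones((T, T)) / T        # within (demeaning) operator M_i
print(np.allclose(A @ np.ones(T), 0))      # A_i 1 = 0: fixed effect removed
print(np.allclose(A @ A.T, np.eye(T - 1))) # A_i A_i' = I_{T-1}
print(np.allclose(A.T @ A, M))             # A_i' A_i = M_i
```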

Applying the transformation Ai to (17.81) we obtain

Y*_{it} = α₁Y*_{i,t−1} + ⋯ + α_pY*_{i,t−p} + X*_{it}′β + ε*_{it}

for t = p + 1, …, T − 1. This is equivalent to first differencing (17.87) when T = 3 but differs for T > 3.

What is special about the transformed equation (17.112) is that under the assumption that the ε_{it} are serially uncorrelated and homoskedastic, the error vector ε*_i has variance σ_ε²A_iA_i′ = σ_ε²I_{T_i−1}. This means that ε*_i has the same covariance structure as ε_i. Thus the orthogonal transformation operator eliminates the fixed effect while preserving the covariance structure. This is in contrast to (17.87) which has serially correlated errors Δε_{it}.

The transformed error ε*_{it} is a function of ε_{it}, ε_{i,t+1}, …, ε_{iT}. Thus valid instruments are Y_{i,t−1}, Y_{i,t−2}, …. Using the instrument matrix Z_i from (17.89) in the case of strictly exogenous regressors, or (17.99) with predetermined regressors, the moment conditions can be written in matrix notation as

E[Z_i′(Y*_i − X*_iθ)] = 0.

Define the ℓ × ℓ covariance matrix

Ω = E[Z_i′ε*_iε*_i′Z_i].

If the errors ε_{it} are conditionally homoskedastic then Ω = E[Z_i′Z_i]σ_ε². Thus an asymptotically efficient GMM estimator is 2SLS applied to the orthogonalized equation using Z_i as an instrument. In matrix notation,

θ̂₁ = (X*′Z(Z′Z)⁻¹Z′X*)⁻¹(X*′Z(Z′Z)⁻¹Z′Y*).

This is the one-step GMM estimator.

Given the residuals ε̂*_i = Y*_i − X*_iθ̂₁, the two-step GMM estimator which is robust to heteroskedasticity and arbitrary serial correlation is

θ̂₂ = (X*′ZΩ̂₂⁻¹Z′X*)⁻¹(X*′ZΩ̂₂⁻¹Z′Y*)

where

Ω̂₂ = ∑_{i=1}^{N} Z_i′ε̂*_iε̂*_i′Z_i.

Standard errors for θ^1 and θ^2 can be obtained using cluster-robust methods.

Forward orthogonalization may have advantages over first differencing. First, the equation errors in (17.112) have a scalar covariance structure under i.i.d. idiosyncratic errors which is expected to improve estimation precision. It also implies that the one-step estimator is 2SLS rather than GMM. Second, while there has not been a formal analysis of the weak instrument properties of the estimators after forward orthogonalization it appears that if T>p+2 the method is less affected by weak instruments than first differencing. The disadvantages of forward orthogonalization are that it treats early observations asymmetrically from late observations, it is less thoroughly studied than first differencing, and is not available with several popular estimation methods.

The Stata command xtdpd includes forward orthogonalization as an option, but not when levels (Blundell-Bond) instruments are included or if there are gaps in the data. An alternative is the downloadable Stata package xtabond2.

17.44 Empirical Illustration

We illustrate the dynamic panel methods with the investment model (17.3). Estimates from two models are presented in Table 17.3. Both are estimated by Blundell-Bond two-step GMM with lags 2 through 6 as instruments, a cluster-robust weight matrix, and clustered standard errors.

The first column presents estimates of an AR(2) model. The estimates show that the series has a moderate amount of positive serial correlation but appears to be well modeled as an AR(1) as the AR(2) coefficient is close to zero. This pattern of serial correlation is consistent with the presence of investment projects which span two years.

The second column presents estimates of the dynamic version of the investment regression (17.3) excluding the trading indicator. Two lags are included of the dependent variable and each regressor. The regressors are treated as predetermined in contrast to the fixed effects regressions which treated the regressors as strictly exogenous. The regressors are not contemporaneous with the dependent variable but lagged one and two periods. This is done so that they are valid predetermined variables. Contemporaneous variables are likely endogenous so should not be treated as predetermined.

The estimates in the second column of Table 17.3 complement the earlier results. The evidence shows that investment has a moderate degree of serial dependence, is positively related to the first lag of Q, and is negatively related to lagged debt. Investment appears to be positively related to the change in cash flow, rather than the level. Thus an increase in cash flow in year t−1 leads to investment in year t.

Table 17.3: Estimates of Dynamic Investment Equation

                AR(2)       AR(2) with Regressors
I_{i,t−1}       0.3191      0.2519
               (0.0172)    (0.0220)
I_{i,t−2}       0.0309      0.0137
               (0.0112)    (0.0125)
Q_{i,t−1}                   0.0018
                           (0.0007)
Q_{i,t−2}                   0.0000
                           (0.0003)
D_{i,t−1}                  −0.0154
                           (0.0058)
D_{i,t−2}                   0.0043
                           (0.0054)
CF_{i,t−1}                  0.0400
                           (0.0091)
CF_{i,t−2}                 −0.0290
                           (0.0051)

Two-step GMM estimates. Cluster-robust standard errors in parentheses.

All regressions include time effects. GMM instruments include lags 2 through 6.

17.45 Exercises

Exercise 17.1

  1. Show (17.11) and (17.12).

  2. Show (17.13).

Exercise 17.2 Is E[ε_{it} | X_{it}] = 0 sufficient for β̂_fe to be unbiased for β? Explain why or why not.

Exercise 17.3 Show that var[Ẋ_{it}] ≤ var[X_{it}].

Exercise 17.4 Show (17.24).

Exercise 17.5 Show (17.28).

Exercise 17.6 Show that when T=2 the differenced estimator equals the fixed effects estimator.

Exercise 17.7 In Section 17.14 it is described how to estimate the individual-effect variance σ_u² using the between residuals. Develop an alternative estimator of σ_u² using only the fixed effects error variance σ̂_ε² and the levels error variance σ̂_e² = n⁻¹∑_{i=1}^{N}∑_{t∈S_i} ê_{it}², where the ê_{it} = Y_{it} − X_{it}′β̂_fe are computed from the levels variables.

Exercise 17.8 Verify that σ̂_ε² defined in (17.37) is unbiased for σ_ε² under (17.18), (17.25) and (17.26).

Exercise 17.9 Develop a version of Theorem 17.2 for the differenced estimator β̂_Δ. Can you weaken Assumption 17.2.3? State an appropriate version which is sufficient for asymptotic normality.

Exercise 17.10 Show (17.57).

Exercise 17.11

  1. For σ̂_i² defined in (17.59) show E[σ̂_i² | X_i] = σ̄_i².

  2. For Ṽ_fe defined in (17.58) show E[Ṽ_fe | X] = V_fe.

Exercise 17.12

  1. Show (17.61).

  2. Show (17.62).

  3. For Ṽ_fe defined in (17.60) show E[Ṽ_fe | X] = V_fe.

Exercise 17.13 Take the fixed effects model Y_{it} = X_{it}β₁ + X_{it}²β₂ + u_i + ε_{it}. A researcher estimates the model by first obtaining the within transformed Ẏ_{it} and Ẋ_{it} and then regressing Ẏ_{it} on Ẋ_{it} and Ẋ_{it}². Is this the correct estimation method? If not, describe the correct fixed effects estimator.

Exercise 17.14 In Section 17.33 verify that in the just-identified case the 2SLS estimator β̂_2sls simplifies as claimed: β̂₁ and β̂₂ are the fixed effects estimator; γ̂₁ and γ̂₂ equal the 2SLS estimator from a regression of û on Z₁ and Z₂ using X̄₁ as an instrument for Z₂.

Exercise 17.15 In this exercise you will replicate and extend the empirical work reported in Arellano and Bond (1991) and Blundell and Bond (1998). Arellano-Bond gathered a dataset of 1031 observations from an unbalanced panel of 140 U.K. companies for 1976-1984, which is in the datafile AB1991 on the textbook webpage. The variables we will be using are log employment (N), log real wages (W), and log capital (K). See the description file for definitions.

  1. Estimate the panel AR(1) K_{it} = αK_{i,t−1} + u_i + v_t + ε_{it} using Arellano-Bond one-step GMM with clustered standard errors. Note that the model includes year fixed effects.

  2. Re-estimate using Blundell-Bond one-step GMM with clustered standard errors.

  3. Explain the difference in the estimates.

Exercise 17.16 This exercise uses the same dataset as the previous question. Blundell and Bond (1998) estimated a dynamic panel regression of log employment N on log real wages W and log capital K. The following specification¹ used the Arellano-Bond one-step estimator, treating W_{i,t−1} and K_{i,t−1} as predetermined.

This equation also included year dummies and the standard errors are clustered.

¹ Blundell and Bond (1998), Table 4, column 3.

  (a) Estimate (17.114) using the Arellano-Bond one-step estimator treating W_{it} and K_{it} as strictly exogenous.

  (b) Estimate (17.114) treating W_{i,t−1} and K_{i,t−1} as predetermined to verify the results in (17.114). What is the difference between the estimates treating the regressors as strictly exogenous versus predetermined?

  (c) Estimate the equation using the Blundell-Bond one-step systems GMM estimator.

  (d) Interpret the coefficient estimates viewing (17.114) as a firm-level labor demand equation.

  (e) Describe the impact on the standard errors of the Blundell-Bond estimates in part (c) if you forget to use clustering. (You do not have to list all the standard errors, but describe the magnitude of the impact.)

Exercise 17.17 Use the datafile Invest1993 on the textbook webpage. You will be estimating the panel AR(1) D_{it} = αD_{i,t−1} + u_i + ε_{it} for D = debt/assets (this is debta in the datafile). See the description file for definitions.

  1. Estimate the model using Arellano-Bond two-step GMM with clustered standard errors.

  2. Re-estimate using Blundell-Bond two-step GMM.

  3. Experiment with your results, trying twostep versus onestep, AR(1) versus AR(2), number of lags used as instruments, and classical versus robust standard errors. What makes the most difference for the coefficient estimates? For the standard errors?

Exercise 17.18 Use the datafile Invest1993 on the textbook webpage. You will be estimating the model

D_{it} = αD_{i,t−1} + β₁I_{i,t−1} + β₂Q_{i,t−1} + β₃CF_{i,t−1} + u_i + ε_{it}.

The variables are debta, inva, vala, and cfa in the datafile. See the description file for definitions.

  1. Estimate the above regression using Arellano-Bond two-step GMM with clustered standard errors treating all regressors as predetermined.

  2. Re-estimate using Blundell-Bond two-step GMM treating all regressors as predetermined.

  3. Experiment with your results, trying two-step versus one-step, number of lags used as instruments, and classical versus robust standard errors. What makes the most difference for the coefficient estimates? For the standard errors?