4 Least Squares Regression
4.1 Introduction
In this chapter we investigate some finite-sample properties of the least squares estimator in the linear regression model. In particular we calculate its finite-sample expectation and covariance matrix and propose standard errors for the coefficient estimators.
4.2 Random Sampling
Assumption
The simplest context is when the observations are mutually independent, in which case we say that they are independent and identically distributed, or i.i.d. It is also common to describe i.i.d. observations as a random sample. Traditionally, random sampling has been the default assumption in cross-section (e.g. survey) contexts. It is quite convenient as i.i.d. sampling leads to straightforward expressions for estimation variance. The assumption seems appropriate (meaning that it should be approximately valid) when samples are small and relatively dispersed. That is, if you randomly sample 1000 people from a large country such as the United States it seems reasonable to model their responses as mutually independent.
Assumption 4.1 The random variables
For most of this chapter we will use Assumption
Assumption
This assumption may be violated if individuals in the sample are connected in some way, for example if they are neighbors, members of the same village, classmates at a school, or even firms within a specific industry. In this case it seems plausible that decisions may be inter-connected and thus mutually dependent rather than independent. Allowing for such interactions complicates inference and requires specialized treatment. A currently popular approach which allows for mutual dependence is known as clustered dependence, which assumes that observations are grouped into “clusters” (for example, schools). We will discuss clustering in more detail in Section 4.21.
4.3 Sample Mean
We start with the simplest setting of the intercept-only model
which is equivalent to the regression model with
We now calculate the expectation and variance of the estimator
This shows that the expected value of the least squares estimator (the sample mean) equals the projection coefficient (the population expectation). An estimator with the property that its expectation equals the parameter it is estimating is called unbiased.
Definition
We next calculate the variance of the estimator
Then
The second-to-last equality is because
We have shown that
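As a compact summary of the two calculations, a sketch assuming the observations $Y_i$ are i.i.d. with mean $\mu$ and variance $\sigma^2 < \infty$:

$$\mathbb{E}\left[\bar{Y}\right]=\frac{1}{n}\sum_{i=1}^{n}\mathbb{E}\left[Y_{i}\right]=\mu,\qquad \operatorname{var}\left[\bar{Y}\right]=\frac{1}{n^{2}}\sum_{i=1}^{n}\operatorname{var}\left[Y_{i}\right]=\frac{\sigma^{2}}{n},$$

where the variance calculation uses the fact that the covariances between distinct observations are zero under independence.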
4.4 Linear Regression Model
We now consider the linear regression model. Throughout this chapter we maintain the following.
Assumption 4.2 Linear Regression Model The variables
The variables have finite second moments
and an invertible design matrix
We will consider both the general case of heteroskedastic regression where the conditional variance
Assumption 4.3 Homoskedastic Linear Regression Model In addition to Assumption
is independent of
4.5 Expectation of Least Squares Estimator
In this section we show that the OLS estimator is unbiased in the linear regression model. This calculation can be done using either summation notation or matrix notation. We will use both.
First take summation notation. Observe that under (4.1)-(4.2)
The first equality states that the conditional expectation of
Now let’s show the same result using matrix notation. (4.4) implies
Similarly
Using
At the risk of belaboring the derivation, another way to calculate the same result is as follows. Insert
This is a useful linear decomposition of the estimator
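In the matrix notation of the chapter, this decomposition and the unbiasedness calculation can be sketched as

$$\widehat{\beta}=\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}\boldsymbol{X}'\boldsymbol{Y}=\beta+\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}\boldsymbol{X}'\boldsymbol{e},$$

so that, conditioning on $\boldsymbol{X}$ and using $\mathbb{E}\left[\boldsymbol{e}\mid\boldsymbol{X}\right]=0$,

$$\mathbb{E}\left[\widehat{\beta}\mid\boldsymbol{X}\right]=\beta+\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}\boldsymbol{X}'\,\mathbb{E}\left[\boldsymbol{e}\mid\boldsymbol{X}\right]=\beta.$$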
Regardless of the method we have shown that
Theorem 4.1 Expectation of Least Squares Estimator In the linear regression model (Assumption 4.2) with i.i.d. sampling (Assumption 4.1)
Equation (4.7) says that the estimator
It is worth mentioning that Theorem 4.1, and all finite sample results in this chapter, make the implicit assumption that
4.6 Variance of Least Squares Estimator
In this section we calculate the conditional variance of the OLS estimator.
For any
and for any pair
We define
The conditional covariance matrix of the
The
while the
where the first equality uses independence of the observations (Assumption 4.1) and the second is (4.2). Thus
In the special case of the linear homoskedastic regression model (4.3), then
For any
In particular, we can write
It is useful to note that
a weighted version of
In the special case of the linear homoskedastic regression model,
Theorem 4.2 Variance of Least Squares Estimator In the linear regression model (Assumption 4.2) with i.i.d. sampling (Assumption 4.1)
where
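As a hedged reconstruction of the two formulas in the theorem, write $\boldsymbol{D}=\operatorname{diag}\left(\sigma_{1}^{2},\ldots,\sigma_{n}^{2}\right)$ with $\sigma_{i}^{2}=\mathbb{E}\left[e_{i}^{2}\mid X_{i}\right]$. The conditional covariance matrix then takes the sandwich form

$$\boldsymbol{V}_{\widehat{\beta}}=\operatorname{var}\left[\widehat{\beta}\mid\boldsymbol{X}\right]=\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}\left(\boldsymbol{X}'\boldsymbol{D}\boldsymbol{X}\right)\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1},$$

which under conditional homoskedasticity $\sigma_{i}^{2}=\sigma^{2}$ simplifies to $\boldsymbol{V}_{\widehat{\beta}}=\sigma^{2}\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}$.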
4.7 Unconditional Moments
The previous sections derived the form of the conditional expectation and variance of the least squares estimator where we conditioned on the regressor matrix
Indeed, it is not obvious if
is well defined if
This dilemma is avoided when the regressors have continuous distributions. A clean statement was obtained by Kinal (1980) under the assumption of normal regressors and errors.

Theorem 4.3 Kinal (1980)
In the linear regression model with i.i.d. sampling, if in addition
This shows that when the errors and regressors are normally distributed the least squares estimator possesses all moments up to
The law of iterated expectations (Theorem 2.1) combined with Theorems
Hence
Furthermore, if
the second equality because
In both cases the expectation cannot pass through the matrix inverse because this is a nonlinear function. Thus there is not a simple expression for the unconditional variance, other than stating that it is the expectation of the conditional variance.
4.8 Gauss-Markov Theorem
The Gauss-Markov Theorem is one of the most celebrated results in econometric theory. It provides a classical justification for the least squares estimator, showing that it has the lowest variance among unbiased estimators.
Write the homoskedastic linear regression model in vector format as
In this model we know that the least squares estimator is unbiased for
The following version of the theorem is due to B. E. Hansen (2021).
Theorem 4.4 Gauss-Markov Take the homoskedastic linear regression model (4.11)-(4.13). If
Theorem
This earliest version of Theorem
Their versions of the Theorem restricted attention to linear estimators of
The derivation of the Gauss-Markov Theorem under the restriction to linear estimators is straightforward, so we now provide this demonstration. For
the second equality because
the last equality using the homoskedasticity assumption (4.13). To establish the Theorem we need to show that for any such matrix
Set
The final inequality states that the matrix
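A compact sketch of this linear-estimator argument, in the notation used above: for any estimator $\widetilde{\beta}=\boldsymbol{A}'\boldsymbol{Y}$ with $\boldsymbol{A}=\boldsymbol{A}\left(\boldsymbol{X}\right)$ an $n\times k$ matrix satisfying the unbiasedness requirement $\boldsymbol{A}'\boldsymbol{X}=\boldsymbol{I}_{k}$,

$$\operatorname{var}\left[\widetilde{\beta}\mid\boldsymbol{X}\right]=\boldsymbol{A}'\operatorname{var}\left[\boldsymbol{e}\mid\boldsymbol{X}\right]\boldsymbol{A}=\sigma^{2}\boldsymbol{A}'\boldsymbol{A}.$$

Setting $\boldsymbol{C}=\boldsymbol{A}-\boldsymbol{X}\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}$, which satisfies $\boldsymbol{C}'\boldsymbol{X}=0$,

$$\boldsymbol{A}'\boldsymbol{A}-\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}=\boldsymbol{C}'\boldsymbol{C}\geq0,$$

so $\sigma^{2}\boldsymbol{A}'\boldsymbol{A}\geq\sigma^{2}\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}$ in the positive semi-definite sense, as claimed.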
The above derivation imposed the restriction that the estimator
Since
Under the assumptions,
because
To illustrate, take the case
Figure 4.1: Original and Auxiliary Density
Let
This shows that
In Figure 4.1, the means of the two densities are indicated by the arrows to the
The parametric family
The likelihood score of the auxiliary density function for an observation, using the fact that
Therefore the information matrix is
By assumption,
This is the variance lower bound, completing the proof of Theorem 4.4.
The above argument is rather tricky. At its core is the observation that the model
4.9 Generalized Least Squares
Take the linear regression model in matrix format
Consider a generalized situation where the observation errors are possibly correlated and/or heteroskedastic. Specifically, suppose that
for some
Under these assumptions, by arguments similar to the previous sections we can calculate the expectation and variance of the OLS estimator:
(see Exercise 4.5).
Aitken (1935) established a generalization of the Gauss-Markov Theorem. The following statement is due to B. E. Hansen (2021).

Theorem 4.5 Take the linear regression model (4.17)-(4.19). If
We defer the proof to Section 4.24. See also Exercise 4.6.
Theorem
When
This is called the Generalized Least Squares (GLS) estimator of
You can calculate that
This shows that the GLS estimator is unbiased and has a covariance matrix which equals the lower bound from Theorem 4.5. This shows that the lower bound is sharp. GLS is thus efficient in the class of unbiased estimators.
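As a hedged reconstruction of the estimator and the two moments just described (with $\boldsymbol{\Sigma}=\operatorname{var}\left[\boldsymbol{e}\mid\boldsymbol{X}\right]$ assumed known and invertible):

$$\widetilde{\beta}_{\mathrm{gls}}=\left(\boldsymbol{X}'\boldsymbol{\Sigma}^{-1}\boldsymbol{X}\right)^{-1}\boldsymbol{X}'\boldsymbol{\Sigma}^{-1}\boldsymbol{Y},\qquad\mathbb{E}\left[\widetilde{\beta}_{\mathrm{gls}}\mid\boldsymbol{X}\right]=\beta,\qquad\operatorname{var}\left[\widetilde{\beta}_{\mathrm{gls}}\mid\boldsymbol{X}\right]=\left(\boldsymbol{X}'\boldsymbol{\Sigma}^{-1}\boldsymbol{X}\right)^{-1}.$$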
In the linear regression model with independent observations and known conditional variances, so that
The assumption
In most settings the matrix
4.10 Residuals
What are some properties of the residuals
Recall from (3.24) that we can write the residuals in vector notation as
and
where
We can simplify this expression under the assumption of conditional homoskedasticity
In this case (4.25) simplifies to
In particular, for a single observation
As this variance is a function of
Similarly, recall from (3.45) that the prediction errors
and
which simplifies under homoskedasticity to
The variance of the
A residual with constant conditional variance can be obtained by rescaling. The standardized residuals are
and in vector notation
From the above calculations, under homoskedasticity,
and
and thus these standardized residuals have the same bias and variance as the original errors when the latter are homoskedastic.
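These rescalings are easy to compute. Here is a minimal sketch in base R, assuming a data frame dat with an outcome y and regressors x1 and x2 (the names are illustrative, not from the text):

```r
# Residuals, leverage values, standardized residuals, and prediction errors
fit  <- lm(y ~ x1 + x2, data = dat)
ehat <- resid(fit)          # least squares residuals
h    <- hatvalues(fit)      # leverage values h_ii
ebar <- ehat / sqrt(1 - h)  # standardized residuals
etil <- ehat / (1 - h)      # prediction (leave-one-out) errors
```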
4.11 Estimation of Error Variance
The error variance
In the linear regression model we can calculate the expectation of
Then
The final equality holds because the trace is the sum of the diagonal elements of
Adding the assumption of conditional homoskedasticity
the final equality by (3.22). This calculation shows that
Another way to see this is to use (4.27). Note that
the last equality using Theorem 3.6.
Since the bias takes a scale form, a classic method to obtain an unbiased estimator is by rescaling. Define
By the above calculation
You can show (see Exercise 4.9) that
and thus
When
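As a sketch of the calculation under conditional homoskedasticity, write $\widehat{\sigma}^{2}=n^{-1}\sum_{i=1}^{n}\widehat{e}_{i}^{2}$ and use the fact that the annihilator matrix $\boldsymbol{M}=\boldsymbol{I}_{n}-\boldsymbol{X}\left(\boldsymbol{X}'\boldsymbol{X}\right)^{-1}\boldsymbol{X}'$ has trace $n-k$:

$$\mathbb{E}\left[\widehat{\sigma}^{2}\mid\boldsymbol{X}\right]=\frac{1}{n}\mathbb{E}\left[\widehat{\boldsymbol{e}}'\widehat{\boldsymbol{e}}\mid\boldsymbol{X}\right]=\frac{\sigma^{2}}{n}\operatorname{tr}\left(\boldsymbol{M}\right)=\sigma^{2}\left(1-\frac{k}{n}\right),$$

so the rescaled estimator $s^{2}=\left(n-k\right)^{-1}\sum_{i=1}^{n}\widehat{e}_{i}^{2}$ satisfies $\mathbb{E}\left[s^{2}\mid\boldsymbol{X}\right]=\sigma^{2}$.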
4.12 Mean-Square Forecast Error
One use of an estimated regression is to predict out-of-sample. Consider an out-of-sample realization
The first term in (4.33) is
where we use the fact that
Under conditional homoskedasticity this simplifies to
A simple estimator for the MSFE is obtained by averaging the squared prediction errors (3.46)
where
By a similar calculation as in (4.34) we find
This is the MSFE based on a sample of size
Theorem 4.6 MSFE In the linear regression model (Assumption 4.2) and i.i.d. sampling (Assumption 4.1)
where
4.13 Covariance Matrix Estimation Under Homoskedasticity
For inference we need an estimator of the covariance matrix
Under homoskedasticity the covariance matrix takes the simple form
which is known up to the scale
Since
This was the dominant covariance matrix estimator in applied econometrics for many years and is still the default method in most regression packages. For example, Stata uses the covariance matrix estimator (4.35) by default in linear regression unless an alternative is specified. If the estimator (4.35) is used but the regression error is heteroskedastic it is possible for
(Notice that we use the fact that
4.14 Covariance Matrix Estimation Under Heteroskedasticity
In the previous section we showed that the classic covariance matrix estimator can be highly biased if homoskedasticity fails. In this section we show how to construct covariance matrix estimators which do not require homoskedasticity.
Recall that the general form for the covariance matrix is
with
where
Indeed,
verifying that
The label “HC” refers to “heteroskedasticity-consistent”. The label “HC0” refers to this being the baseline heteroskedasticity-consistent covariance matrix estimator.
We know, however, that
While the scaling by
Alternatively, we could use the standardized residuals
and
The four estimators
Since
(See Exercise 4.10.) The inequality
This calculation shows that
By a similar calculation (again under homoskedasticity) we can calculate that the HC2 estimator is unbiased
(See Exercise 4.11.)
It might seem rather odd to compare the bias of heteroskedasticity-robust estimators under the assumption of homoskedasticity but it does give us a baseline for comparison.
Another interesting calculation shows that in general (that is, without assuming homoskedasticity) the HC3 estimator is biased away from zero. Indeed, using the definition of the prediction errors (3.44)
so
Note that
It follows that
This means that the HC3 estimator is conservative in the sense that it is weakly larger (in expectation) than the correct variance for any realization of
We have introduced five covariance matrix estimators, including the homoskedastic estimator
Of the four robust estimators
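These estimators can be computed directly from the residuals and leverage values. A hedged R sketch follows; the data frame dat and variables y, x1, x2 are illustrative placeholders, and the standard errors discussed in the next section are the square roots of the diagonal elements.

```r
# Homoskedastic and HC0-HC3 covariance matrix estimators computed "by hand"
fit   <- lm(y ~ x1 + x2, data = dat)
X     <- model.matrix(fit)
ehat  <- resid(fit)
h     <- hatvalues(fit)                    # leverage values
n     <- nrow(X); k <- ncol(X)
XXinv <- solve(crossprod(X))               # (X'X)^{-1}
meat  <- function(u) crossprod(X * u)      # X' diag(u^2) X for weights u

V0  <- sum(ehat^2) / (n - k) * XXinv       # homoskedastic estimator s^2 (X'X)^{-1}
HC0 <- XXinv %*% meat(ehat)               %*% XXinv
HC1 <- (n / (n - k)) * HC0
HC2 <- XXinv %*% meat(ehat / sqrt(1 - h)) %*% XXinv
HC3 <- XXinv %*% meat(ehat / (1 - h))     %*% XXinv

se_HC2 <- sqrt(diag(HC2))                  # heteroskedasticity-robust standard errors
```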
4.15 Standard Errors
A variance estimator such as
Definition
When
When the classical covariance matrix estimator (4.35) is used the standard error takes the simple form
As we discussed in the previous section there are multiple possible covariance matrix estimators so standard errors are not unique. It is therefore important to understand what formula and method is used by an author when studying their work. It is also important to understand that a particular standard error may be relevant under one set of model assumptions but not under another set of assumptions.
To illustrate, we return to the log wage regression (3.12) of Section 3.7. We calculate that
We also calculate that
Therefore the HC2 covariance matrix estimate is
The standard errors are the square roots of the diagonal elements of these matrices. A conventional format to write the estimated equation with standard errors is
Alternatively, standard errors could be calculated using the other formulae. We report the different standard errors in the following table.
Table 4.1: Standard Errors
|                      | Education | Intercept |
|----------------------|-----------|-----------|
| Homoskedastic (4.35) |           |           |
| HC0 (4.36)           |           |           |
| HC1 (4.37)           |           |           |
| HC2 (4.38)           |           |           |
| HC3 (4.39)           |           |           |
The homoskedastic standard errors are noticeably different (larger in this case) than the others. The robust standard errors are reasonably close to one another though the HC3 standard errors are larger than the others.
4.16 Estimation with Sparse Dummy Variables
The heteroskedasticity-robust covariance matrix estimators can be quite imprecise in some contexts. One is in the presence of sparse dummy variables - when a dummy variable takes the value 1 (or 0) for only a small number of observations. In these contexts one component of the covariance matrix is estimated on just those few observations and will be imprecise. This is effectively hidden from the user. To see the problem, let
The number of observations for which
To simplify our analysis, we take the extreme case
In the regression model (4.45) we can calculate that the true covariance matrix of the least squares estimator for the coefficients under the simplifying assumption of conditional homoskedasticity is
In particular, the variance of the estimator for the coefficient on the dummy variable is
Essentially, the coefficient
Now let’s examine the standard HC1 covariance matrix estimator (4.37). The regression has perfect fit for the observation for which
where
In particular, the estimator for
It has expectation
The variance estimator
The fact that
Another insight is to examine the leverage values. The (single) observation with
This is an extreme leverage value.
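A small simulation sketch makes the problem concrete. The design below (a dummy equal to 1 for a single observation, homoskedastic errors with unit variance) is purely illustrative; it uses the sandwich package for the HC1 calculation.

```r
# Sparse dummy illustration: the robust standard error is far too small
library(sandwich)
set.seed(1)
n <- 500
d <- c(1, rep(0, n - 1))          # dummy equal to 1 for a single observation
y <- 1 + 0.5 * d + rnorm(n)       # true error variance is 1
fit <- lm(y ~ d)

max(hatvalues(fit))               # leverage of the treated observation is (numerically) 1
# The true sd of the dummy coefficient is about sqrt(1 + 1/(n-1)), roughly 1,
# but the residual identifying it is exactly zero, so HC1 reports a tiny value:
sqrt(diag(vcovHC(fit, type = "HC1")))["d"]
```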
A possible solution is to replace the biased covariance matrix estimator
It is unclear if there is a best practice to avoid this situation. One possibility is to calculate the maximum leverage value. If it is very large, calculate the standard errors using several methods to see whether they vary substantially.
4.17 Computation
We illustrate methods to compute standard errors for equation (3.13) extending the code of Section
Stata do File (continued)
- Homoskedastic formula (4.35):
  reg wage education experience exp2 if (mnwf == 1)
- HC1 formula (4.37):
  reg wage education experience exp2 if (mnwf == 1), r
- HC2 formula (4.38):
  reg wage education experience exp2 if (mnwf == 1), vce(hc2)
- HC3 formula (4.39):
  reg wage education experience exp2 if (mnwf == 1), vce(hc3)
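For comparison, a hedged R sketch of the same calculations using the sandwich and lmtest packages; the data frame cps and the variable names mirror the Stata commands above and are assumptions about how the data are stored.

```r
# R counterpart: homoskedastic and HC1-HC3 standard errors
library(sandwich)
library(lmtest)
sub <- subset(cps, mnwf == 1)
fit <- lm(wage ~ education + experience + exp2, data = sub)
coeftest(fit, vcov = vcovHC(fit, type = "const"))  # homoskedastic formula (4.35)
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))    # HC1 (4.37)
coeftest(fit, vcov = vcovHC(fit, type = "HC2"))    # HC2 (4.38)
coeftest(fit, vcov = vcovHC(fit, type = "HC3"))    # HC3 (4.39)
```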
4.18 Measures of Fit
As we described in the previous chapter a commonly reported measure of regression fit is the regression $R^2$
where
However,
While
where
One problem with
In the statistical literature the MSPE
In summary, it is recommended to omit
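For reference, hedged reconstructions of the three fit measures discussed in this section, with $\widetilde{e}_{i}$ the prediction (leave-one-out) errors of Section 4.10:

$$R^{2}=1-\frac{\sum_{i=1}^{n}\widehat{e}_{i}^{2}}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}},\qquad\bar{R}^{2}=1-\frac{\left(n-1\right)\sum_{i=1}^{n}\widehat{e}_{i}^{2}}{\left(n-k\right)\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}},\qquad\widetilde{R}^{2}=1-\frac{\sum_{i=1}^{n}\widetilde{e}_{i}^{2}}{\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}}.$$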
4.19 Empirical Example
We again return to our wage equation but use a much larger sample of all individuals with at least 12 years of education. For regressors we include years of education, potential work experience, experience squared, and dummy variable indicators for the following: female, female union member, male union member, married female, married male, formerly married female, formerly married male, Hispanic, Black, American Indian, Asian, and mixed race.
Table
Table 4.2: OLS Estimates of Linear Equation for Log(Wage)

|                         | Estimate | Standard Error |
|-------------------------|----------|----------------|
| Education               |          |                |
| Experience              |          |                |
| Experience²             |          |                |
| Female                  |          |                |
| Female Union Member     |          |                |
| Male Union Member       |          |                |
| Married Female          |          |                |
| Married Male            |          |                |
| Formerly Married Female |          |                |
| Formerly Married Male   |          |                |
| Hispanic                |          |                |
| Black                   |          |                |
| American Indian         |          |                |
| Asian                   |          |                |
| Mixed Race              |          |                |
| Intercept               |          |                |
| Sample Size             | 46,943   |                |
Standard errors are heteroskedasticity-consistent (Horn-Horn-Duncan formula).
As a general rule it is advisable to always report standard errors along with parameter estimates. This allows readers to assess the precision of the parameter estimates, and as we will discuss in later chapters, form confidence intervals and t-tests for individual coefficients if desired.
The results in Table
4.20 Multicollinearity
As discussed in Section 3.24, if
A related common situation is near multicollinearity which is often called “multicollinearity” for brevity. This is the situation when the regressors are highly correlated. An implication of near multicollinearity is that individual coefficient estimates will be imprecise. This is not necessarily a problem for econometric analysis if the reported standard errors are accurate. However, robust standard errors can be sensitive to large leverage values which can occur under near multicollinearity. This leads to the undesirable situation where the coefficient estimates are imprecise yet the standard errors are misleadingly small.
We can see the impact of near multicollinearity on precision in a simple homoskedastic linear regression model with two regressors
and
In this case
The correlation
What is happening is that when the regressors are highly dependent it is statistically difficult to disentangle the impact of
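As a hedged reconstruction of the standard two-regressor illustration: suppose both regressors have (sample) mean zero and unit variance with correlation $\rho$, and the error is homoskedastic with variance $\sigma^{2}$. Then

$$\operatorname{var}\left[\widehat{\beta}\mid\boldsymbol{X}\right]=\frac{\sigma^{2}}{n}\begin{pmatrix}1&\rho\\\rho&1\end{pmatrix}^{-1}=\frac{\sigma^{2}}{n\left(1-\rho^{2}\right)}\begin{pmatrix}1&-\rho\\-\rho&1\end{pmatrix},$$

so each coefficient estimator has variance $\sigma^{2}/\left(n\left(1-\rho^{2}\right)\right)$, which increases without bound as $\rho$ approaches 1.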
Many early-generation textbooks overemphasized multicollinearity. An amusing parody of these texts is Micronumerosity, Chapter
The extreme case, ‘exact micronumerosity’, arises when
Tests for the presence of micronumerosity require the judicious use of various fingers. Some researchers prefer a single finger, others use their toes, still others let their thumbs rule.
A generally reliable guide may be obtained by counting the number of observations. Most of the time in econometric analysis, when
Arthur S. Goldberger, A Course in Econometrics (1991), pp.
Arthur S. Goldberger

Art Goldberger (1930-2009) was one of the most distinguished members of the Department of Economics at the University of Wisconsin. His Ph.D. thesis developed a pioneering macroeconometric forecasting model (the Klein-Goldberger model). Most of his remaining career focused on microeconometric issues. He was the leading pioneer of what has been called the Wisconsin Tradition of empirical work - a combination of formal econometric theory with a careful critical analysis of empirical work. Goldberger wrote a series of highly regarded and influential graduate econometric textbooks, including Econometric Theory (1964), Topics in Regression Analysis (1968), and A Course in Econometrics (1991).
4.21 Clustered Sampling
In Section
It might be easiest to understand the idea of clusters by considering a concrete example. Duflo, Dupas, and Kremer (2011) investigate the impact of tracking (assigning students based on initial test score) on educational attainment in a randomized experiment. An extract of their data set is available on the textbook webpage in the file DDK2011.
In 2005, 140 primary schools in Kenya received funding to hire an extra first grade teacher to reduce class sizes. In half of the schools (selected randomly) students were assigned to classrooms based on an initial test score (“tracking”); in the remaining schools the students were randomly assigned to classrooms. For their analysis the authors restricted attention to the 121 schools which initially had a single first-grade class.
The key regression
where TestScore
where
In clustering contexts it is convenient to double index the observations as
While it is typical to write the observations using the double index notation
and using cluster notation as
where
Using this notation we can write the sums over the observations using the double sum
The residuals are
The standard clustering assumption is that the clusters are known to the researcher and that the observations are independent across clusters.
Assumption 4.4 The clusters
In our example clusters are schools. In other common applications cluster dependence has been assumed within individual classrooms, families, villages, regions, and within larger units such as industries and states. This choice is up to the researcher though the justification will depend on the context, the nature of the data, and will reflect information and assumptions on the dependence structure across observations. The model is a linear regression under the assumption
This is the same as assuming that the individual errors are conditionally mean zero
or that the conditional expectation of
In the regression (4.46) the conditional expectation is necessarily linear and satisfies (4.50) since the
controls, (4.50) requires that the achievement of any student is unaffected by the individual controls (e.g. age, gender, and initial test score) of other students within the same school.
Given (4.50) we can calculate the expectation of the OLS estimator. Substituting (4.48) into (4.49) we find
The mean of
The first equality holds by linearity, the second by Assumption 4.4, and the third by (4.50).
This shows that OLS is unbiased under clustering if the conditional expectation is linear.
Theorem 4.7 In the clustered linear regression model (Assumption
Now consider the covariance matrix of
It follows that
This differs from the formula in the independent case due to the correlation between observations within clusters. The magnitude of the difference depends on the degree of correlation between observations within clusters and the number of observations within clusters. To see this, suppose that all clusters have the same number of observations
If
Arellano (1987) proposed a cluster-robust covariance matrix estimator which is an extension of the White estimator. Recall that the insight of the White covariance estimator is that the squared error
The three expressions in (4.54) give three equivalent formulae which could be used to calculate
Given the expressions (4.51)-(4.52) a natural cluster covariance matrix estimator takes the form
where
The factor
Alternative cluster-robust covariance matrix estimators can be constructed using cluster-level prediction errors such as
We then have the robust covariance matrix estimator
The label “CR” refers to “cluster-robust” and “CR3” refers to the analogous formula for the HC3 estimator.
Similarly to the heteroskedastic-robust case you can show that CR3 is a conservative estimator for
To illustrate in the context of the Kenyan schooling example we present the regression of student test scores on the school-level tracking dummy with two standard errors displayed. The first (in parenthesis) is the conventional robust standard error. The second [in square brackets] is the clustered standard error (4.55)-(4.56) where clustering is at the level of the school.
We can see that the cluster-robust standard errors are roughly three times the conventional robust standard errors. Consequently, confidence intervals for the coefficients are greatly affected by the choice.
For illustration, we list here the commands needed to produce the regression results with clustered standard errors in Stata, R, and MATLAB.
You can see that clustered standard errors are simple to calculate in Stata.
Programming clustered standard errors in
Here we see that programming clustered standard errors in MATLAB is less convenient than the other packages but still can be executed with just a few lines of code. This example uses the accumarray command which is similar to the rowsum command in R.
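A hedged R sketch of the cluster-robust calculation follows. The first block computes the Arellano estimator manually with rowsum (paralleling the MATLAB accumarray approach described above); the second uses the sandwich package. The object ddk and the variable names totalscore, tracking, and schoolid are assumptions about how the DDK2011 extract is stored.

```r
# Cluster-robust covariance matrix: manual computation and packaged version
library(sandwich)
library(lmtest)
fit <- lm(totalscore ~ tracking, data = ddk)

X  <- model.matrix(fit)
e  <- resid(fit)
g  <- ddk$schoolid                           # cluster identifier
G  <- length(unique(g))
n  <- nrow(X); k <- ncol(X)
Xe <- rowsum(X * e, group = g)               # cluster sums of x_i * e_i
a  <- ((n - 1) / (n - k)) * (G / (G - 1))    # common finite-sample adjustment
V  <- a * solve(crossprod(X)) %*% crossprod(Xe) %*% solve(crossprod(X))
sqrt(diag(V))                                # clustered standard errors

# Essentially the same estimator via the sandwich package
coeftest(fit, vcov = vcovCL(fit, cluster = ddk$schoolid, type = "HC1"))
```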
4.22 Inference with Clustered Samples
In this section we give some cautionary remarks and general advice about cluster-robust inference in econometric practice. There has been remarkably little theoretical research about the properties of cluster-robust methods - until quite recently - so these remarks may become dated rather quickly.
In many respects cluster-robust inference should be viewed similarly to heteroskedasticity-robust inference where a “cluster” in the cluster-robust case is interpreted similarly to an “observation” in the heteroskedasticity-robust case. In particular, the effective sample size should be viewed as the number of clusters, not the “sample size”
Furthermore, most cluster-robust theory (for example, the work of Chris Hansen (2007)) assumes that the clusters are homogeneous, including the assumption that the cluster sizes are all identical. This turns out to be a very important simplification. When this is violated - when, for example, cluster sizes are highly heterogeneous - the regression should be viewed as roughly equivalent to the heteroskedastic case with an extremely high degree of heteroskedasticity. Cluster sums have variances which are proportional to the cluster sizes, so if the latter are heterogeneous so will be the variances of the cluster sums. This also has a large effect on finite sample inference. When clusters are heterogeneous then cluster-robust inference is similar to heteroskedasticity-robust inference with highly heteroskedastic observations.
Put together, if the number of clusters
A further complication occurs when we are interested in treatment as in the tracking example given in the previous section. In many cases (including Duflo, Dupas, and Kremer (2011)) the interest is in the effect of a treatment applied at the cluster level (e.g., schools). In many cases (not, however, Duflo, Dupas, and Kremer (2011)), the number of treated clusters is small relative to the total number of clusters; in an extreme case there is just a single treated cluster. Based on the reasoning given above these applications should be interpreted as equivalent to heteroskedasticity-robust inference with a sparse dummy variable as discussed in Section 4.16. As discussed there, standard error estimates can be erroneously small. In the extreme of a single treated cluster (in the example, if only a single school was tracked) then the estimated coefficient on tracking will be very imprecisely estimated yet will have a misleadingly small cluster standard error. In general, reported standard errors will greatly understate the imprecision of parameter estimates.
4.23 At What Level to Cluster?
A practical question which arises in the context of cluster-robust inference is “At what level should we cluster?” In some examples you could cluster at a very fine level, such as families or classrooms, or at higher levels of aggregation, such as neighborhoods, schools, towns, counties, or states. What is the correct level at which to cluster? Rules of thumb have been advocated by practitioners but at present there is little formal analysis to provide useful guidance. What do we know?
First, suppose cluster dependence is ignored or imposed at too fine a level (e.g. clustering by households instead of villages). Then variance estimators will be biased as they will omit covariance terms. As correlation is typically positive, this suggests that standard errors will be too small giving rise to spurious indications of significance and precision.
Second, suppose cluster dependence is imposed at too aggregate a level (e.g. clustering by states rather than villages). This does not cause bias. But the variance estimators will contain many extra components so the precision of the covariance matrix estimator will be poor. This means that reported standard errors will be more imprecise - more random - than if clustering had been done at a less aggregate level.
These considerations show that there is a trade-off between bias and variance in the estimation of the covariance matrix by cluster-robust methods. It is not at all clear, based on current theory, what to do. I state this emphatically. We really do not know what is the “correct” level at which to do cluster-robust inference. This is a very interesting question and should certainly be explored by econometric research. One challenge is that in empirical practice many people have observed: “Clustering is important. Standard errors change a lot whether or not we cluster. Therefore we should only report clustered standard errors.” The flaw in this reasoning is that we do not know why in a specific empirical example the standard errors change under clustering. One possibility is that clustering reduces bias and thus is more accurate. The other possibility is that clustering adds sampling noise and is thus less accurate. In reality it is likely that both factors are present.
In any event a researcher should be aware of the number of clusters used in the reported calculations and should treat the number of clusters as the effective sample size for assessing inference. If the number of clusters is, say,
To illustrate the thought experiment consider the empirical example of Duflo, Dupas, and Kremer (2011). They reported standard errors clustered at the school level and the application uses 111 schools. Thus
4.24 Technical Proofs*
Proof of Theorems 4.4 and 4.5
Our approach is to calculate the Cramér-Rao bound for a carefully crafted parametric model. This is based on an insight of Newey (1990, Appendix B) for the simpler context of a population expectation.
Without loss of generality, assume that the true coefficient equals
Define the truncation function
Notice that it satisfies
As
Define the auxiliary joint density function
for parameters
The bounds imply that for
This implies that
We calculate that
the last equality because
Let
because
The bound (4.61) implies
This means that
The likelihood score for
The information matrix is
where the inequality is
By assumption, the estimator
where the second inequality is (4.62). Since this holds for all
This is the variance lower bound.
4.25 Exercises
Exercise 4.1 For some integer
(a) Construct an estimator for .
(b) Show that is unbiased for .
(c) Calculate the variance of , say . What assumption is needed for to be finite?
(d) Propose an estimator of
Exercise 4.2 Calculate
Exercise 4.3 Explain the difference between
Exercise 4.4 True or False. If
Exercise 4.5 Prove (4.20) and (4.21).
Exercise 4.6 Prove Theorem
Exercise 4.7 Let
(a) Show (4.23).
(b) Show (4.24).
(c) Prove that , where .
(d) Prove that .
(e) Find .
(f) Is a reasonable estimator for ?
Exercise 4.8 Let
(a) In which contexts would be a good estimator?
(b) Using your intuition, in which situations do you expect to perform better than OLS?
Exercise 4.9 Show (4.32) in the homoskedastic regression model.
Exercise 4.10 Prove (4.40).
Exercise 4.11 Show (4.41) in the homoskedastic regression model.
Exercise 4.12 Let
Exercise 4.13 Take the simple regression model
Exercise 4.14 Take a regression model
(a) Find using our knowledge of and . Is biased for ?
(b) Suggest an (approximate) bias-corrected estimator using an estimator for .
(c) For to be potentially unbiased, which estimator of is most appropriate?
(d) Under which conditions is
Exercise 4.15 Consider an i.i.d. sample
(a) Find
(b) In general, are and for correlated or uncorrelated?
(c) Find a sufficient condition so that and for are uncorrelated.
Exercise 4.16 Take the linear homoskedastic CEF
and suppose that
(a) Derive an equation for as a function of . Be explicit to write the error term as a function of the structural errors and . What is the effect of this measurement error on the model (4.63)?
(b) Describe the effect of this measurement error on OLS estimation of in the feasible regression of the observed on .
(c) Describe the effect (if any) of this measurement error on standard error calculation for .
Exercise 4.17 Suppose that for the random variables
A friend suggests that you estimate
Investigate your friend’s suggestion.
(a) Define . Show that is implied by (4.64).
(b) Use to calculate . What does this tell you about the implied equation (4.65)?
(c) Can you recover either and/or from estimation of (4.65)? Are additional assumptions required?
(d) Is this a reasonable suggestion?
Exercise 4.18 Take the model
where
Exercise 4.19 Let
Exercise 4.20 Take the model in vector notation
Assume for simplicity that
Find the (conditional) covariance matrix for
Exercise 4.21 The model is
The parameter
Exercise 4.22 An economist friend tells you that the assumption that the observations
Exercise 4.23 Take the linear regression model with
where
Exercise 4.24 Continue the empirical analysis in Exercise 3.24.
(a) Calculate standard errors using the homoskedasticity formula and using the four covariance matrices from Section 4.14.
(b) Repeat in a second programming language. Are they identical?
Exercise 4.25 Continue the empirical analysis in Exercise 3.26. Calculate standard errors using the HC3 method. Repeat in your second programming language. Are they identical?
Exercise 4.26 Extend the empirical analysis reported in Section
(a) Compare the two sets of standard errors. Which standard error changes the most by clustering? Which changes the least?
(b) How does the coefficient on tracking change by inclusion of the individual controls (in comparison to the results from (4.60))?