8 Restricted Estimation
8.1 Introduction
In the linear projection model
a common task is to impose a constraint on the coefficient vector
At first glance this appears the same as the linear projection model but there is one important difference: the error
In general, a set of
where
Sometimes we will call (8.1) a constraint and sometimes a restriction. They are the same thing. Similarly sometimes we will call estimators which satisfy (8.1) constrained estimators and sometimes restricted estimators. They mean the same thing.
The constraint
a selector matrix, and
A typical reason to impose a constraint is that we believe (or have information) that the constraint is true. By imposing the constraint we hope to improve estimation efficiency. The goal is to obtain consistent estimates with reduced variance relative to the unconstrained estimator.
The questions then arise: How should we estimate the coefficient vector
8.2 Constrained Least Squares
An intuitively appealing method to estimate a constrained linear projection is to minimize the least squares criterion subject to the constraint
The constrained least squares estimator is
where
The estimator
One method to find the solution to (8.3) is the technique of Lagrange multipliers. The problem (8.3) is equivalent to finding the critical points of the Lagrangian
over
and
Premultiplying (8.6) by
where
Notice that
(See Exercise
This is a general formula for the CLS estimator. It also can be written as
The CLS residuals are
To illustrate we generated a random sample of 100 observations for the variables
Figure 8.1: Constrained Least Squares Criterion
In Stata constrained least squares is implemented using the cnsreg command.
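To make the computation concrete, the following is a minimal R sketch, assuming the constraint takes the linear form R'beta = c and that the CLS solution has the standard closed form beta_cls = beta_ols - (X'X)^{-1} R [R'(X'X)^{-1} R]^{-1} (R'beta_ols - c); the function name cls() and the illustrative data are not from the text.

```r
# Minimal CLS sketch, assuming a linear constraint R'beta = c and the closed form
# beta_cls = beta_ols - (X'X)^{-1} R [R'(X'X)^{-1} R]^{-1} (R'beta_ols - c).
cls <- function(y, X, R, c0) {
  XXinv    <- solve(crossprod(X))            # (X'X)^{-1}
  beta_ols <- XXinv %*% crossprod(X, y)      # unconstrained least squares
  A        <- XXinv %*% R                    # (X'X)^{-1} R
  drop(beta_ols - A %*% solve(t(R) %*% A, t(R) %*% beta_ols - c0))
}

# Illustrative use: impose beta_2 + beta_3 = 0 in a k = 3 regression.
set.seed(1)
n <- 100
X <- cbind(1, rnorm(n), rnorm(n))
y <- X %*% c(1, 0.5, -0.5) + rnorm(n)
R <- matrix(c(0, 1, 1), 3, 1)                # constraint: beta_2 + beta_3 = 0
cls(y, X, R, c0 = 0)
```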
8.3 Exclusion Restriction
While (8.8) is a general formula for CLS, in most cases the estimator can be found by applying least squares to a reparameterized equation. To illustrate let us return to the first example presented at the beginning of the chapter - a simple exclusion restriction. Recall that the unconstrained model is
the exclusion restriction is
In this setting the CLS estimator is OLS of
The CLS estimator of the entire vector
It is not immediately obvious but (8.8) and (8.13) are algebraically identical. To see this the first component of (8.8) with (8.2) is
Using (3.39) this equals
which is (8.13) as originally claimed.
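As a numerical check of this equivalence, the following R sketch (reusing the cls() function from the previous section; the data are illustrative) imposes a zero constraint on the coefficient of the excluded regressor and compares the result with OLS on the short regression.

```r
# Exclusion restriction: constrain the coefficient on x2 to zero. The CLS estimate
# from the general formula should match OLS of y on the remaining regressors.
set.seed(2)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + rnorm(n)                # x2 is irrelevant
X  <- cbind(1, x1, x2)
R  <- matrix(c(0, 0, 1), 3, 1)               # selects the x2 coefficient

b_cls   <- cls(y, X, R, c0 = 0)              # cls() defined in the previous sketch
b_short <- coef(lm(y ~ x1))                  # OLS on the short regression
b_cls[1:2] - b_short                         # numerically zero (up to rounding)
```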
8.4 Finite Sample Properties
In this section we explore some of the properties of the CLS estimator in the linear regression model
First, it is useful to write the estimator and the residuals as linear functions of the error vector. These are algebraic relationships and do not rely on the linear regression assumptions. Theorem 8.1 The CLS estimator satisfies
is symmetric and idempotent
where
For a proof see Exercise 8.6.
Given the linearity of Theorem 8.1.2 it is not hard to show that the CLS estimator is unbiased for
Theorem 8.2 In the linear regression model (8.14)-(8.15) under (8.1),
For a proof see Exercise 8.7.
We can also calculate the covariance matrix of
Theorem 8.3 In the homoskedastic linear regression model (8.14)-(8.15) with
For a proof see Exercise 8.8.
We use the
For inference we need an estimate of
where
is a bias-corrected estimator of
The estimator (8.16) has the property that it is unbiased for
We defer the remainder of the proof to Exercise 8.9.
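As a small sketch, and assuming the adjustment divides the sum of squared CLS residuals by n - k + q (the CLS degrees of freedom discussed below), the estimator can be computed as follows; the function name is illustrative.

```r
# Sketch of the adjusted variance estimator for CLS, assuming df = n - k + q.
s2_cls <- function(y, X, R, c0) {
  n <- nrow(X); k <- ncol(X); q <- ncol(R)
  e <- y - X %*% cls(y, X, R, c0)      # CLS residuals, cls() as in the Section 8.2 sketch
  sum(e^2) / (n - k + q)
}
```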
Theorem 8.4 In the homoskedastic linear regression model (8.14)-(8.15) with
Now consider the distributional properties in the normal regression model
Similarly, from Exercise
From (8.17) and the fact that
It follows that the
a student
The relevance of this calculation is that the “degrees of freedom” for CLS regression equal
We summarize the properties of the normal regression model. Theorem 8.5 In the normal linear regression model (8.14)-(8.15) with constraint (8.1),
An interesting relationship is that in the homoskedastic regression model
This means that
A second corollary is
This also shows that the difference between the CLS and OLS variance matrices equals
the final equality meaning positive semi-definite. It follows that
The relationship (8.18) is rather interesting and will appear again. The expression says that the variance of the difference between the estimators is equal to the difference between the variances. This is rather special. It occurs generically when we are comparing an efficient and an inefficient estimator. We call (8.18) the Hausman Equality as it was first pointed out in econometrics by Hausman (1978).
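A quick numerical illustration in R of this ordering, assuming the homoskedastic formulas V_ols = sigma^2 (X'X)^{-1} and V_cls = sigma^2 [(X'X)^{-1} - (X'X)^{-1} R (R'(X'X)^{-1} R)^{-1} R'(X'X)^{-1}]: all eigenvalues of the difference should be non-negative (sigma^2 is set to one, since a positive scale factor does not affect positive semi-definiteness).

```r
# Numerical check: the OLS-minus-CLS covariance difference is positive semi-definite
# under homoskedasticity (assumed covariance formulas; sigma^2 normalized to one).
set.seed(3)
X     <- cbind(1, rnorm(100), rnorm(100))
R     <- matrix(c(0, 1, 1), 3, 1)
XXinv <- solve(crossprod(X))
A     <- XXinv %*% R
V_ols <- XXinv
V_cls <- XXinv - A %*% solve(t(R) %*% A) %*% t(A)
eigen(V_ols - V_cls)$values                  # all >= 0 up to numerical error
```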
8.5 Minimum Distance
The previous section explored the finite sample distribution theory under the assumptions of the linear regression model, homoskedastic regression model, and normal regression model. We now return to the general projection model where we do not impose linearity, homoskedasticity, nor normality. We are interested in the question: Can we do better than CLS in this setting?
A minimum distance estimator tries to find a parameter value satisfying the constraint which is as close as possible to the unconstrained estimator. Let
This is a (squared) weighted Euclidean distance between
The CLS estimator is the special case when
To see the equality of CLS and minimum distance rewrite the least squares criterion as follows. Substitute the unconstrained least squares fitted equation
where the third equality uses the fact that
We can solve for
The solution to the pair of first order conditions is
(See Exercise 8.10.) Comparing (8.23) with (8.9) we can see that
An obvious question is which weight matrix
8.6 Asymptotic Distribution
We first show that the class of minimum distance estimators are consistent for the population parameters when the constraints are valid.
Assumption 8.1
Theorem 8.6 Consistency Under Assumptions 7.1, 8.1, and 8.2,
For a proof see Exercise 8.11.
Theorem
Similarly, the constrained estimators are asymptotically normally distributed.
Theorem 8.7 Asymptotic Normality Under Assumptions 7.2, 8.1, and 8.2,
as
and
For a proof see Exercise 8.12.
Theorem
Theorem 8.8 Asymptotic Distribution of CLS Estimator Under Assumptions
where
For a proof see Exercise 8.13.
8.7 Variance Estimation and Standard Errors
Earlier we introduced the covariance matrix estimator under the assumption of conditional homoskedasticity. We now introduce an estimator which does not impose homoskedasticity.
The asymptotic covariance matrix
Notice that we have used an adjusted degrees of freedom. This is an
and that for
We can calculate standard errors for any linear combination
8.8 Efficient Minimum Distance Estimator
Theorem
The asymptotic distribution of (8.25) can be deduced from Theorem 8.7. (See Exercises
Theorem 8.9 Efficient Minimum Distance Estimator Under Assumptions
as
Since
the estimator (8.25) has lower asymptotic variance than the unrestricted estimator. Furthermore, for any
so (8.25) is asymptotically efficient in the class of minimum distance estimators.
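A minimal R sketch of the estimator, assuming (8.25) takes the form beta_emd = beta_ols - Vhat R (R' Vhat R)^{-1} (R' beta_ols - c), where Vhat is a heteroskedasticity-robust covariance estimate for the OLS coefficients; the function name emd() is not from the text.

```r
# Efficient minimum distance (EMD) sketch, assuming
# beta_emd = beta_ols - Vhat R (R' Vhat R)^{-1} (R' beta_ols - c),
# with Vhat a heteroskedasticity-robust (White) covariance estimate for beta_ols.
emd <- function(y, X, R, c0) {
  XXinv    <- solve(crossprod(X))
  beta_ols <- XXinv %*% crossprod(X, y)
  e        <- drop(y - X %*% beta_ols)                # OLS residuals
  Vhat     <- XXinv %*% crossprod(X * e) %*% XXinv    # robust covariance estimate
  A        <- Vhat %*% R
  drop(beta_ols - A %*% solve(t(R) %*% A, t(R) %*% beta_ols - c0))
}
```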
Theorem
The fact that CLS is generally inefficient is counter-intuitive and requires some reflection. Standard intuition suggests applying the same estimation method (least squares) to the unconstrained and constrained models, and this is the common empirical practice. But Theorem
Inequality (8.27) shows that the efficient minimum distance estimator
8.9 Exclusion Restriction Revisited
We return to the example of estimation with a simple exclusion restriction. The model is
with the exclusion restriction
The second estimator of
The third estimator of
where we have partitioned
From Theorem
See Exercise
In general the three estimators are different and they have different asymptotic variances. It is instructive to compare the variances to assess whether or not the constrained estimator is more efficient than the unconstrained estimator.
First, assume conditional homoskedasticity. In this case the two covariance matrices simplify to
This means that under conditional homoskedasticity
However, in the general case of conditional heteroskedasticity this ranking is not guaranteed. In fact, strikingly, the ranking can be reversed: the CLS estimator can have a larger asymptotic variance than the unconstrained least squares estimator.
To see this let’s use the simple heteroskedastic example from Section 7.4. In that example,
Thus the CLS estimator
What we have found is that when the estimation method is least squares, deleting the irrelevant variable
It turns out that a more refined answer is appropriate. Constrained estimation is desirable but not necessarily CLS. While least squares is asymptotically efficient for estimation of the unconstrained projection model it is not an efficient estimator of the constrained projection model.
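One way to explore this numerically is a small Monte Carlo experiment comparing the sampling variances of the least squares, CLS, and EMD estimates of the retained coefficient under heteroskedasticity. The sketch below (reusing the cls() and emd() functions above) uses an illustrative design rather than the specific example of Section 7.4, so the resulting variance ranking should be checked rather than presumed.

```r
# Monte Carlo sketch: sampling variances of the estimators of beta_1 when the true
# coefficient on x2 is zero, under an illustrative heteroskedastic design.
set.seed(4)
n <- 100; nsim <- 2000
R <- matrix(c(0, 0, 1), 3, 1)                 # exclusion restriction on x2
est <- replicate(nsim, {
  x1 <- rnorm(n)
  x2 <- 0.8 * x1 + rnorm(n)                   # x2 correlated with x1
  e  <- rnorm(n, sd = sqrt(1 + 2 * x2^2))     # conditional heteroskedasticity
  y  <- 1 + 0.5 * x1 + e                      # x2 is irrelevant (beta_2 = 0)
  X  <- cbind(1, x1, x2)
  c(ols = coef(lm(y ~ x1 + x2))[2],           # unconstrained least squares
    cls = cls(y, X, R, 0)[2],                 # constrained least squares
    emd = emd(y, X, R, 0)[2])                 # efficient minimum distance
})
apply(est, 1, var)                            # compare the sampling variances
```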
8.10 Variance and Standard Error Estimation
We have discussed covariance matrix estimation for CLS but not yet for the EMD estimator.
The asymptotic covariance matrix (8.26) may be estimated by replacing
Following the formula for CLS we recommend an adjusted degrees of freedom. Given
A standard error for
8.11 Hausman Equality
From (8.25) we have
It follows that the asymptotic variances of the estimators satisfy the relationship
We call (8.37) the Hausman Equality: the asymptotic variance of the difference between an efficient and another estimator is the difference in the asymptotic variances.
8.12 Example: Mankiw, Romer and Weil (1992)
We illustrate the methods by replicating some of the estimates reported in a well-known paper by Mankiw, Romer, and Weil (1992). The paper investigates the implications of the Solow growth model using cross-country regressions. A key equation in their paper regresses the change between 1960 and 1985 in
Standard errors are heteroskedasticity-consistent
GDP, (3) the log of the sum of the population growth rate
The data is available on the textbook webpage in the file MRW1992.
The sample is 98 non-oil-producing countries and the data was reported in the published paper. As
The authors show that in the Solow model the
We now present Stata, R and MATLAB code which implements these estimates.
You may notice that the Stata code has a section which uses the Mata matrix programming language. This is used because Stata does not implement the efficient minimum distance estimator, so it needs to be programmed separately. As illustrated here, the Mata language allows a Stata user to implement methods using commands which are quite similar to MATLAB.
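The original listings are not reproduced here. As a rough R sketch of the restricted fit only: the column names, the construction of the growth variable, and the sum-to-zero form of the Solow restriction are assumptions for illustration and may differ from the actual MRW1992 file and from the restriction used in the text.

```r
# Rough sketch of the restricted Solow regression (hypothetical column names).
mrw <- read.csv("MRW1992.csv")                          # adjust path / file format
y   <- log(mrw$Y85) - log(mrw$Y60)                      # 1960-1985 growth (assumed names)
X   <- cbind(1, log(mrw$Y60), log(mrw$invest),
             log(mrw$ndg), log(mrw$school))             # regressors (assumed names)
R   <- matrix(c(0, 0, 1, 1, 1), 5, 1)                   # sum-to-zero restriction (assumed)
cls(y, X, R, 0)                                         # CLS, using cls() from the Section 8.2 sketch
emd(y, X, R, 0)                                         # EMD, using emd() from the Section 8.8 sketch
```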
8.13 Misspecification
What are the consequences for a constrained estimator
where
This situation is a generalization of the analysis of “omitted variable bias” from Section
One answer is to apply formula (8.23) to find that
The second term,
However, we can say more.
For example, we can describe some characteristics of the approximating projections. The CLS estimator projection coefficient has the representation
the best linear predictor subject to the constraint (8.1). The minimum distance estimator converges in probability to
where
We can also show that
(Note that (8.38) and (8.39) are different!) Then
In particular
This means that even when the constraint (8.1) is misspecified the conventional covariance matrix estimator (8.35) and standard errors (8.36) are appropriate measures of the sampling variance though the distributions are centered at the pseudo-true values (projections)
An alternative approach to the asymptotic distribution theory under misspecification uses the concept of local alternatives. It is a technical device which might seem a bit artificial but it is a powerful method to derive useful distributional approximations in a wide variety of contexts. The idea is to index the true coefficient
for some
The asymptotic theory is derived as
Since
There is no difference under fixed (classical) or local asymptotics since the right-hand side is independent of the coefficient
A difference arises for the constrained estimator. Using (8.41),
and
It follows that
The first term is asymptotically normal (from (8.42)). The second term converges in probability to a constant. This is because the
Consequently we find that the asymptotic distribution equals
where
The asymptotic distribution (8.43) is an approximation of the sampling distribution of the restricted estimator under misspecification. The distribution (8.43) contains an asymptotic bias component
8.14 Nonlinear Constraints
In some cases it is desirable to impose nonlinear constraints on the parameter vector
where
The constrained least squares and minimum distance estimators of
where
or
Computationally there is no general closed-form solution so they must be found numerically. Algorithms to numerically solve (8.45) and (8.46) are known as constrained optimization methods and are available in programming languages including MATLAB and R. See Chapter 12 of Probability and Statistics for Economists.
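As a simple illustration (not the text's recommended algorithm), a quadratic-penalty scheme in base R approximates the nonlinear constrained least squares problem by heavily penalizing violations of r(beta) = 0; the function cls_nonlinear() and the example constraint are illustrative.

```r
# Crude quadratic-penalty sketch for nonlinear constrained least squares:
# minimize SSE(beta) subject to r(beta) = 0 by penalizing r(beta)^2 with
# progressively larger weights.
cls_nonlinear <- function(y, X, r, beta0, penalties = c(1, 1e2, 1e4, 1e6)) {
  b <- beta0
  for (lambda in penalties) {
    obj <- function(beta) sum((y - X %*% beta)^2) + lambda * sum(r(beta)^2)
    b <- optim(b, obj, method = "BFGS")$par   # warm-start from the previous solution
  }
  b
}

# Illustrative use: impose the nonlinear constraint beta_2 * beta_3 = 1.
set.seed(5)
X <- cbind(1, rnorm(100), rnorm(100))
y <- X %*% c(1, 2, 0.5) + rnorm(100)
cls_nonlinear(y, X, r = function(b) b[2] * b[3] - 1, beta0 = rep(0, 3))
```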
Assumption 8.3
r(β) is continuously differentiable at the true β, where R = (∂/∂β) r(β)′.
The asymptotic distribution is a simple generalization of the case of a linear constraint but the proof is more delicate. Theorem 8.10 Under Assumptions 7.2, 8.2, and 8.3, for
as
The asymptotic covariance matrix for the efficient minimum distance estimator can be estimated by
where
Standard errors for the elements of
8.15 Inequality Restrictions
Inequality constraints on the parameter vector
for some function
The constrained least squares and minimum distance estimators can be written as
and
Except in special cases the constrained estimators do not have simple algebraic solutions. An important exception is when there is a single non-negativity constraint, e.g.
The computation problems (8.50) and (8.51) are examples of quadratic programming. Quick computer algorithms are available in programming languages including MATLAB and R.
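For linear inequality constraints of the form A'beta >= b, the constrained least squares problem is a standard quadratic program. The following R sketch uses the quadprog package (one of several available QP solvers); the single non-negativity constraint imposed here is illustrative.

```r
# Inequality-constrained least squares as a quadratic program:
# minimize (1/2) beta'(X'X) beta - (X'y)' beta  subject to  A' beta >= b0,
# which differs from the least squares criterion only by an additive constant.
library(quadprog)

set.seed(6)
X <- cbind(1, rnorm(100), rnorm(100))
y <- X %*% c(1, -0.2, 0.5) + rnorm(100)

A   <- matrix(c(0, 1, 0), 3, 1)               # illustrative constraint: beta_2 >= 0
b0  <- 0
fit <- solve.QP(Dmat = crossprod(X), dvec = drop(crossprod(X, y)),
                Amat = A, bvec = b0)
fit$solution                                  # inequality-constrained estimate
```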
Inference on inequality-constrained estimators is unfortunately quite challenging. The conventional asymptotic theory gives rise to the following dichotomy. If the true parameter satisfies the strict inequality
8.16 Technical Proofs*
Proof of Theorem 8.9, equation (8.28) Let
and
Thus
Since
Proof of Theorem 8.10 We show the result for the minimum distance estimator
For each element
Let
Since
Since
The first-order condition for (8.47) is
Thus
From Theorem
8.17 Exercises
Exercise 8.1 In the model
Exercise 8.2 In the model
Exercise 8.3 In the model
Exercise 8.4 In the linear projection model
- Find the CLS estimator of
under the restriction .
- Find an expression for the efficient minimum distance estimator of
under the restriction .
Exercise 8.5 Verify that for
Exercise 8.6 Prove Theorem 8.1.
Exercise 8.7 Prove Theorem 8.2, that is,
Exercise 8.9 Prove Theorem 8.4. That is, show
Exercise 8.10 Verify (8.22), (8.23), and that the minimum distance estimator
Exercise 8.11 Prove Theorem 8.6.
Exercise 8.12 Prove Theorem 8.7.
Exercise 8.13 Prove Theorem 8.8. (Hint: Use that CLS is a special case of Theorem 8.7.)
Exercise 8.14 Verify that (8.26) is
Exercise 8.15 Prove (8.27). Hint: Use (8.26).
Exercise 8.16 Verify (8.29), (8.30) and (8.31).
Exercise 8.17 Verify (8.32), (8.33), and (8.34).
Exercise 8.18 Suppose you have two independent samples each with
- Find the estimator
of .
- Find the asymptotic distribution of
.
- How would you approach the problem if the sample sizes are different, say
and ?
Exercise 8.19 Use the cps09mar dataset and the subsample of white male Hispanics.
- Estimate the regression
where married
- Estimate the equation by CLS imposing the constraints
and . Report the estimates and standard errors.
- Estimate the equation using efficient minimum distance imposing the same constraints. Report the estimates and standard errors.
- Under what constraint on the coefficients is the wage equation non-decreasing in experience for experience up to 50?
- Estimate the equation imposing
, and the inequality from part (d).

Exercise 8.20 Take the model
with i.i.d. observations
- How should we interpret the function
given the projection assumption? How should we interpret ? (Briefly)
- Describe an estimator
of .
- Find the asymptotic distribution of
as .
- Show how to construct an asymptotic 95% confidence interval for
(for a single ).
- Assume
. Describe how to estimate imposing the constraint that is concave.
- Assume
. Describe how to estimate imposing the constraint that is increasing on the region
Exercise 8.21 Take the linear model with restrictions
the unconstrained least squares estimator
the constrained least squares estimator
the constrained efficient minimum distance estimator
For the three estimators define the residuals
As
is the most efficient estimator and the least, do you expect in large samples?Consider the statistic
Find the asymptotic distribution for
- Does the result of the previous question simplify when the error
is homoskedastic?
Exercise 8.22 Take the linear model
- Find an explicit expression for the CLS estimator
of under the restriction. Your answer should be specific to the restriction. It should not be a generic formula for an abstract general restriction.
- Derive the asymptotic distribution of
under the assumption that the restriction is true.