6  A Review of Large Sample Asymptotics

6.1 Introduction

The most widely used tool in sampling theory is large sample asymptotics. By "asymptotics" we mean approximating a finite-sample sampling distribution by taking its limit as the sample size diverges to infinity. In this chapter we provide a brief review of the main results of large sample asymptotics. It is meant as a reference, not as a teaching guide. Asymptotic theory is covered in detail in Chapters 7-9 of Probability and Statistics for Economists. If you have not previously studied asymptotic theory in detail you should study these chapters before proceeding.

6.2 Modes of Convergence

Definition 6.1 A sequence of random vectors $Z_n \in \mathbb{R}^k$ converges in probability to $Z$ as $n \to \infty$, denoted $Z_n \to_p Z$ or alternatively $\operatorname{plim}_{n\to\infty} Z_n = Z$, if for all $\delta > 0$

$$\lim_{n\to\infty} P\left[\|Z_n - Z\| \le \delta\right] = 1. \qquad (6.1)$$

We call $Z$ the probability limit (or plim) of $Z_n$.

The above definition treats random variables and random vectors simultaneously using the vector norm. It is useful to know that for a random vector, (6.1) holds if and only if each element in the vector converges in probability to its limit.

Definition 6.2 Let $Z_n$ be a sequence of random vectors with distributions $F_n(u) = P[Z_n \le u]$. We say that $Z_n$ converges in distribution to $Z$ as $n \to \infty$, denoted $Z_n \to_d Z$, if for all $u$ at which $F(u) = P[Z \le u]$ is continuous, $F_n(u) \to F(u)$ as $n \to \infty$. We refer to $Z$ and its distribution $F(u)$ as the asymptotic distribution, large sample distribution, or limit distribution of $Z_n$.

6.3 Weak Law of Large Numbers

Theorem 6.1 Weak Law of Large Numbers (WLLN)

If $Y_i \in \mathbb{R}^k$ are i.i.d. and $E\|Y\| < \infty$, then as $n \to \infty$,

$$\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i \to_p E[Y].$$

The WLLN shows that the sample mean $\bar{Y}$ converges in probability to the true population expectation $\mu = E[Y]$. The result applies to any transformation of a random vector with a finite mean.
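As a quick numerical illustration (a minimal sketch in Python, assuming NumPy is available; the exponential distribution is chosen only for illustration), the sample mean of i.i.d. draws settles near the population expectation as $n$ grows:

```python
import numpy as np

# WLLN illustration: sample means of i.i.d. Exponential(1) draws (E[Y] = 1)
# settle near the population expectation as the sample size grows.
rng = np.random.default_rng(seed=0)

for n in [10, 100, 1_000, 10_000, 100_000]:
    y = rng.exponential(scale=1.0, size=n)
    print(f"n = {n:>6}: sample mean = {y.mean():.4f}")
```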

Theorem 6.2 If $Y_i \in \mathbb{R}^k$ are i.i.d., $h(y): \mathbb{R}^k \to \mathbb{R}^q$, and $E\|h(Y)\| < \infty$, then $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} h(Y_i) \to_p \mu = E[h(Y)]$ as $n \to \infty$.

An estimator which converges in probability to the population value is called consistent.

Definition 6.3 An estimator $\hat{\theta}$ of $\theta$ is consistent if $\hat{\theta} \to_p \theta$ as $n \to \infty$.

6.4 Central Limit Theorem

Theorem 6.3 Multivariate Lindeberg–Lévy Central Limit Theorem (CLT). If $Y_i \in \mathbb{R}^k$ are i.i.d. and $E\|Y\|^2 < \infty$, then as $n \to \infty$

$$\sqrt{n}\left(\bar{Y} - \mu\right) \to_d N(0, V)$$

where $\mu = E[Y]$ and $V = E\left[(Y - \mu)(Y - \mu)'\right]$.

The central limit theorem shows that the distribution of the sample mean is approximately normal in large samples. For some applications it may be useful to notice that Theorem 6.3 does not impose any restrictions on $V$ other than that its elements are finite. Therefore this result allows for the possibility of singular $V$.
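A simulation can make this concrete. The sketch below (assuming NumPy; the skewed exponential population is illustrative) computes $\sqrt{n}(\bar{Y} - \mu)$ over many replications and checks that its spread matches $V$:

```python
import numpy as np

# CLT illustration: for i.i.d. Exponential(1) draws (mu = 1, V = 1),
# sqrt(n)*(Ybar - mu) is approximately N(0, 1) in large samples.
rng = np.random.default_rng(seed=0)
n, reps = 500, 20_000

samples = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (samples.mean(axis=1) - 1.0)

print("mean of z      :", z.mean())              # close to 0
print("variance of z  :", z.var())               # close to V = 1
print("P[z <= 1.645]  :", (z <= 1.645).mean())   # close to 0.95
```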

The following two generalizations allow for heterogeneous random variables.

Theorem 6.4 Multivariate Lindeberg CLT. Suppose that for all $n$, $Y_{ni} \in \mathbb{R}^k$, $i = 1, \ldots, r_n$, are independent but not necessarily identically distributed with expectations $E[Y_{ni}] = 0$ and variance matrices $V_{ni} = E\left[Y_{ni} Y_{ni}'\right]$. Set $V_n = \sum_{i=1}^{r_n} V_{ni}$. Suppose $v_n^2 = \lambda_{\min}(V_n) > 0$ and for all $\epsilon > 0$

$$\lim_{n\to\infty} \frac{1}{v_n^2} \sum_{i=1}^{r_n} E\left[\|Y_{ni}\|^2 \, \mathbb{1}\left\{\|Y_{ni}\|^2 \ge \epsilon v_n^2\right\}\right] = 0.$$

Then as $n \to \infty$

$$V_n^{-1/2} \sum_{i=1}^{r_n} Y_{ni} \to_d N\left(0, I_k\right).$$

Theorem 6.5 Suppose $Y_{ni} \in \mathbb{R}^k$ are independent but not necessarily identically distributed with expectations $E[Y_{ni}] = 0$ and variance matrices $V_{ni} = E\left[Y_{ni} Y_{ni}'\right]$. Suppose

$$\frac{1}{n}\sum_{i=1}^{n} V_{ni} \to V > 0$$

and for some $\delta > 0$

$$\sup_{n,i} E\|Y_{ni}\|^{2+\delta} < \infty.$$

Then as $n \to \infty$

$$\sqrt{n}\,\bar{Y} \to_d N(0, V).$$
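The following hypothetical simulation (assuming NumPy; the variance pattern is chosen only to satisfy the averaging condition) illustrates Theorem 6.5 with independent but heteroskedastic observations:

```python
import numpy as np

# Theorem 6.5 illustration: independent, mean-zero draws with varying
# variances sigma_i^2 whose average converges to V. Then sqrt(n)*Ybar
# is approximately N(0, V).
rng = np.random.default_rng(seed=0)
n, reps = 1_000, 5_000

sigma2 = 1.0 + 0.5 * np.sin(np.arange(1, n + 1))             # bounded variances
V = sigma2.mean()                                            # (1/n) * sum of V_ni

u = rng.uniform(-0.5, 0.5, size=(reps, n)) * np.sqrt(12.0)   # mean 0, variance 1
y = u * np.sqrt(sigma2)                                      # var(Y_ni) = sigma_i^2
z = np.sqrt(n) * y.mean(axis=1)                              # sqrt(n) * Ybar

print("average variance V :", V)
print("simulated variance :", z.var())                       # close to V
```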

6.5 Continuous Mapping Theorem and Delta Method

Continuous functions are limit-preserving. There are two forms of the continuous mapping theorem, for convergence in probability and convergence in distribution.

Theorem 6.6 Continuous Mapping Theorem (CMT). Let $Z_n \in \mathbb{R}^k$ and $g(u): \mathbb{R}^k \to \mathbb{R}^q$. If $Z_n \to_p c$ as $n \to \infty$ and $g(u)$ is continuous at $c$, then $g(Z_n) \to_p g(c)$ as $n \to \infty$.

Theorem 6.7 Continuous Mapping Theorem. If $Z_n \to_d Z$ as $n \to \infty$ and $g: \mathbb{R}^m \to \mathbb{R}^k$ has the set of discontinuity points $D_g$ such that $P[Z \in D_g] = 0$, then $g(Z_n) \to_d g(Z)$ as $n \to \infty$.

Differentiable functions of asymptotically normal random estimators are asymptotically normal.

Theorem 6.8 Delta Method. Let $\mu \in \mathbb{R}^k$ and $g(u): \mathbb{R}^k \to \mathbb{R}^q$. If $\sqrt{n}\left(\hat{\mu} - \mu\right) \to_d \xi$, where $g(u)$ is continuously differentiable in a neighborhood of $\mu$, then as $n \to \infty$

$$\sqrt{n}\left(g(\hat{\mu}) - g(\mu)\right) \to_d G'\xi$$

where $G(u) = \frac{\partial}{\partial u} g(u)'$ and $G = G(\mu)$. In particular, if $\xi \sim N(0, V)$ then as $n \to \infty$

$$\sqrt{n}\left(g(\hat{\mu}) - g(\mu)\right) \to_d N\left(0, G'VG\right).$$
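As a check of the delta-method variance formula, consider the scalar transformation $g(\mu) = \exp(\mu)$, for which $G = \exp(\mu)$ and the asymptotic variance is $\exp(2\mu)V$. A hypothetical simulation (assuming NumPy; the normal population and parameter values are illustrative):

```python
import numpy as np

# Delta method check for g(mu) = exp(mu) with Y ~ N(mu, V), mu = 0.5, V = 2.
# Since g'(mu) = exp(mu), sqrt(n)*(g(muhat) - g(mu)) is approx N(0, exp(2*mu)*V).
rng = np.random.default_rng(seed=0)
mu, V, n, reps = 0.5, 2.0, 1_000, 10_000

y = rng.normal(loc=mu, scale=np.sqrt(V), size=(reps, n))
mu_hat = y.mean(axis=1)
z = np.sqrt(n) * (np.exp(mu_hat) - np.exp(mu))

print("delta-method variance:", np.exp(2 * mu) * V)  # = 2*e, about 5.44
print("simulated variance   :", z.var())             # close to the value above
```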

6.6 Smooth Function Model

The smooth function model is $\theta = g(\mu)$ where $\mu = E[h(Y)]$ and $g(\mu)$ is smooth in a suitable sense.

The parameter $\theta = g(\mu)$ is not a population moment so it does not have a direct moment estimator. Instead, it is common to use a plug-in estimator formed by replacing the unknown $\mu$ with its point estimator $\hat{\mu}$ and then "plugging" this into the expression for $\theta$. The first step is the sample mean $\hat{\mu} = n^{-1}\sum_{i=1}^{n} h(Y_i)$. The second step is the transformation $\hat{\theta} = g(\hat{\mu})$. The hat "^" indicates that $\hat{\theta}$ is a sample estimator of $\theta$. The smooth function model includes a broad class of estimators including sample variances and the least squares estimator.
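For example, the population variance $\theta = \operatorname{var}(Y)$ fits the smooth function model with $h(y) = (y, y^2)'$ and $g(\mu_1, \mu_2) = \mu_2 - \mu_1^2$. A minimal plug-in sketch (assuming NumPy; the distribution is illustrative):

```python
import numpy as np

# Plug-in estimator in the smooth function model:
# theta = var(Y) = g(mu) with h(y) = (y, y^2)' and g(m1, m2) = m2 - m1^2.
rng = np.random.default_rng(seed=0)
y = rng.exponential(scale=2.0, size=5_000)        # var(Y) = 4

mu_hat = np.array([y.mean(), (y ** 2).mean()])    # step 1: sample mean of h(Y_i)
theta_hat = mu_hat[1] - mu_hat[0] ** 2            # step 2: theta_hat = g(mu_hat)

print("plug-in variance estimate:", theta_hat)    # close to 4
```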

Theorem 6.9 If $Y_i \in \mathbb{R}^m$ are i.i.d., $h(u): \mathbb{R}^m \to \mathbb{R}^k$, $E\|h(Y)\| < \infty$, and $g(u): \mathbb{R}^k \to \mathbb{R}^q$ is continuous at $\mu$, then $\hat{\theta} \to_p \theta$ as $n \to \infty$.

Theorem 6.10 If $Y_i \in \mathbb{R}^m$ are i.i.d., $h(u): \mathbb{R}^m \to \mathbb{R}^k$, $E\|h(Y)\|^2 < \infty$, $g(u): \mathbb{R}^k \to \mathbb{R}^q$, and $G(u) = \frac{\partial}{\partial u} g(u)'$ is continuous in a neighborhood of $\mu$, then as $n \to \infty$

$$\sqrt{n}\left(\hat{\theta} - \theta\right) \to_d N\left(0, V_\theta\right)$$

where $V_\theta = G'VG$, $V = E\left[(h(Y) - \mu)(h(Y) - \mu)'\right]$, and $G = G(\mu)$.

Theorem 6.9 establishes the consistency of $\hat{\theta}$ for $\theta$ and Theorem 6.10 establishes its asymptotic normality. It is instructive to compare the conditions. Consistency requires that $h(Y)$ has a finite expectation; asymptotic normality requires that $h(Y)$ has a finite variance. Consistency requires that $g(u)$ be continuous; asymptotic normality requires that $g(u)$ be continuously differentiable.
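Continuing the variance example, the gradient of $g(\mu_1, \mu_2) = \mu_2 - \mu_1^2$ is $G = (-2\mu_1, 1)'$, so Theorem 6.10 gives $V_\theta = G'VG$ with $V$ the variance matrix of $h(Y) = (Y, Y^2)'$. The hypothetical simulation below (assuming NumPy; for an Exponential(1) population, $E[Y^k] = k!$, so $\mu = (1, 2)'$ and $V_\theta = 8$) compares this formula with the sampling spread of $\hat{\theta}$:

```python
import numpy as np

# Asymptotic variance of the plug-in variance estimator via Theorem 6.10:
# theta = mu_2 - mu_1^2 with h(Y) = (Y, Y^2)', so G = (-2*mu_1, 1)'.
rng = np.random.default_rng(seed=0)
n, reps = 1_000, 5_000

y = rng.exponential(scale=1.0, size=(reps, n))    # Exp(1): E[Y^k] = k!
theta_hat = (y ** 2).mean(axis=1) - y.mean(axis=1) ** 2

# Population quantities for Exp(1): mu = (1, 2)', V = variance matrix of (Y, Y^2)'.
G = np.array([-2.0, 1.0])
V = np.array([[1.0, 4.0],
              [4.0, 20.0]])

print("delta-method variance G'VG     :", G @ V @ G)          # = 8
print("simulated var of sqrt(n)*theta :", n * theta_hat.var()) # close to 8
```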

6.7 Stochastic Order Symbols

It is convenient to have simple symbols for random variables and vectors which converge in probability to zero or are stochastically bounded. In this section we introduce some of the most common notation.

Let $Z_n$ and $a_n$, $n = 1, 2, \ldots$, be sequences of random variables and constants. The notation

$$Z_n = o_p(1)$$

("small oh-P-one") means that $Z_n \to_p 0$ as $n \to \infty$. We also write

$$Z_n = o_p(a_n)$$

if $a_n^{-1} Z_n = o_p(1)$.

Similarly, the notation $Z_n = O_p(1)$ ("big oh-P-one") means that $Z_n$ is bounded in probability. Precisely, for any $\epsilon > 0$ there is a constant $M_\epsilon < \infty$ such that

$$\limsup_{n\to\infty} P\left[|Z_n| > M_\epsilon\right] \le \epsilon.$$

Furthermore, we write

$$Z_n = O_p(a_n)$$

if $a_n^{-1} Z_n = O_p(1)$.

$O_p(1)$ is weaker than $o_p(1)$ in the sense that $Z_n = o_p(1)$ implies $Z_n = O_p(1)$ but not the reverse. However, if $Z_n = O_p(a_n)$ then $Z_n = o_p(b_n)$ for any $b_n$ such that $a_n / b_n \to 0$.

A random sequence with a bounded moment is stochastically bounded.

Theorem 6.11 If $Z_n$ is a random vector which satisfies $E\|Z_n\|^\delta = O(a_n)$ for some sequence $a_n$ and $\delta > 0$, then $Z_n = O_p(a_n^{1/\delta})$. Similarly, $E\|Z_n\|^\delta = o(a_n)$ implies $Z_n = o_p(a_n^{1/\delta})$.

There are many simple rules for manipulating op(1) and Op(1) sequences which can be deduced from the continuous mapping theorem. For example,

$$\begin{aligned}
o_p(1) + o_p(1) &= o_p(1) \\
o_p(1) + O_p(1) &= O_p(1) \\
O_p(1) + O_p(1) &= O_p(1) \\
o_p(1)\, o_p(1) &= o_p(1) \\
o_p(1)\, O_p(1) &= o_p(1) \\
O_p(1)\, O_p(1) &= O_p(1).
\end{aligned}$$
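For instance, the WLLN and CLT together imply $\bar{Y} - \mu = o_p(1)$ while $\sqrt{n}(\bar{Y} - \mu) = O_p(1)$. The sketch below (assuming NumPy; the distribution is illustrative) shows the first quantity shrinking while the second stays bounded in probability:

```python
import numpy as np

# Ybar - mu = o_p(1): its spread shrinks toward zero as n grows.
# sqrt(n)*(Ybar - mu) = O_p(1): its spread stabilizes rather than growing.
rng = np.random.default_rng(seed=0)

for n in [100, 1_000, 10_000, 100_000]:
    y = rng.exponential(scale=1.0, size=(200, n))   # E[Y] = 1, var(Y) = 1
    dev = y.mean(axis=1) - 1.0
    print(f"n = {n:>6}: std(Ybar - mu) = {dev.std():.5f}, "
          f"std(sqrt(n)*(Ybar - mu)) = {(np.sqrt(n) * dev).std():.3f}")
```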

6.8 Convergence of Moments

We give a sufficient condition for the existence of the mean of the asymptotic distribution, define uniform integrability, provide a primitive condition for uniform integrability, and show that uniform integrability is the key condition under which $E[Z_n]$ converges to $E[Z]$.

Theorem 6.12 If $Z_n \to_d Z$ and $E\|Z_n\| \le C$ then $E\|Z\| \le C$.

Definition 6.4 The random vector $Z_n$ is uniformly integrable as $n \to \infty$ if

$$\lim_{M\to\infty} \limsup_{n\to\infty} E\left[\|Z_n\| \, \mathbb{1}\left\{\|Z_n\| > M\right\}\right] = 0.$$

Theorem 6.13 If for some $\delta > 0$, $E\|Z_n\|^{1+\delta} \le C < \infty$, then $Z_n$ is uniformly integrable.

Theorem 6.14 If $Z_n \to_d Z$ and $Z_n$ is uniformly integrable then $E[Z_n] \to E[Z]$.

The following is a uniform stochastic bound.

Theorem 6.15 If $|Y_i|^r$ is uniformly integrable, then as $n \to \infty$

$$n^{-1/r} \max_{1 \le i \le n} |Y_i| \to_p 0. \qquad (6.6)$$

Equation (6.6) implies that if $Y$ has $r$ finite moments then the largest observation will diverge at a rate slower than $n^{1/r}$. The higher the moments, the slower the rate of divergence.
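A hypothetical simulation (assuming NumPy; the Lomax/Pareto-type population with tail index 3 has finite second moments, so $r = 2$ applies) illustrates the rate: $n^{-1/2}\max_i |Y_i|$ drifts toward zero as $n$ grows, though the convergence is slow:

```python
import numpy as np

# Theorem 6.15 illustration with r = 2: for a Lomax(3) population
# (finite second moments), n^(-1/2) * max_i |Y_i| drifts toward zero.
rng = np.random.default_rng(seed=0)

for n in [1_000, 10_000, 100_000, 1_000_000]:
    y = rng.pareto(a=3.0, size=n)     # heavy-tailed but E[Y^2] < infinity
    print(f"n = {n:>8}: n**(-1/2) * max|Y_i| = {np.abs(y).max() / np.sqrt(n):.4f}")
```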