These are the notes I took during my weekly meetings with Dr. Bekker at RuG. We are using the econometrics book written by Fumio Hayashi[1] in 2000, focusing mainly on the chapters about asymptotic theory. The book covers almost every corner of classical econometrics and is, in my view, by far the best advanced econometrics text. I'm sure I'll come back to these notes from time to time.


A probability space (概率空间) consists of: 1. a sample space \(\Omega=\{\omega\}\) (样本空间), 2. a \(\sigma\)-algebra \(\mathcal{A}\) (西格玛代数), 3. a probability measure \(P\) (概率测度). Here a collection \(\mathcal{A}\) of subsets of \(\Omega\) is called a \(\sigma\)-algebra if 1. \(\Omega\in\mathcal{A}\) (the whole sample space is itself an event), 2. \(\epsilon\in\mathcal{A}\Rightarrow\overline\epsilon\in\mathcal{A}\) (closed under complementation), 3. \(\epsilon_j\in\mathcal{A}\), \(j=1,2,\ldots\) \(\Rightarrow\) \(\cup_{j=1}^{\infty}\epsilon_j\in\mathcal{A}\) (closed under countable unions).

Hence, by De Morgan's law, \(\cap_{j=1}^{\infty}\epsilon_j=\overline{\cup_{j=1}^{\infty}\overline{\epsilon_j}}\in\mathcal{A}\) (so \(\mathcal{A}\) is also closed under countable intersections).

Example 1 Suppose \(\Omega=\{a,b,c,d\}\), then the \(\sigma\)-algebra generated by \(\{a,b\}\) and \(\{c\}\) is given by \(\{a,b\}\), \(\{c\}\), \(\{c,d\}\), \(\{a,b,d\}\), \(\{a,b,c\}\), \(\{d\}\), \(\Omega\) and \(\emptyset\). The set \(\{a,c\}\) is not an element of this smallest \(\sigma\)-algebra containing \(\{a,b\}\) and \(\{c\}\).
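Example 1 can be checked mechanically. The following is a minimal Python sketch, assuming a finite \(\Omega\) (so closing under pairwise unions is enough); the function name `generate_sigma_algebra` is my own and not from the book.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Close a family of subsets of a finite omega under complement and union."""
    omega = frozenset(omega)
    family = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for s in current:                      # closure under complement
            comp = omega - s
            if comp not in family:
                family.add(comp)
                changed = True
        for s, t in combinations(current, 2):  # closure under (finite) union
            u = s | t
            if u not in family:
                family.add(u)
                changed = True
    return family

sigma = generate_sigma_algebra("abcd", [{"a", "b"}, {"c"}])
print(sorted("".join(sorted(s)) for s in sigma))
print(frozenset("ac") in sigma)
```

The loop stops once no new set appears, which must happen because a finite \(\Omega\) has finitely many subsets.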

A probability measure (概率测度) \(P\) is a real scalar function over the \(\sigma\)-algebra \(\mathcal{A}\) such that 1. \(\epsilon\in\mathcal{A}\Rightarrow P(\epsilon)\ge 0\) (every event in \(\mathcal{A}\) has nonnegative probability), 2. \(P(\Omega)=1\) (the sure event has probability 1), 3. All \(\epsilon_j,j=1,2,\ldots\) disjoint \(\Rightarrow P(\cup_{j=1}^{\infty}\epsilon_j)=\sum_{j=1}^{\infty}P(\epsilon_j)\) (countable additivity: the probability of a union of mutually exclusive events is the sum of their probabilities).

Let \(\mathcal{S}_1\subset\mathcal{S}_2\subset\ldots\) be a growing sequence of sets. Then the limit is defined as \(\displaystyle\lim_{n\to\infty}\mathcal{S}_n\equiv\cup_{j=1}^{\infty}\mathcal{S}_j\).

Result 1 If \(\mathcal{S}_j\in\mathcal{A},j=1,2,\ldots\) s.t. \(\mathcal{S}_i\subset\mathcal{S}_j\) if \(i<j\), then \[ P(\lim_{n\to\infty}\mathcal{S}_n)=\lim_{n\to\infty}P(\mathcal{S}_n) \] which can be proved by writing \(\cup_{j=1}^{\infty}\mathcal{S}_j\) as the disjoint union of \(\mathcal{S}_1\) and the differences \(\mathcal{S}_j\setminus\mathcal{S}_{j-1}\), \(j\ge 2\), and applying the countable additivity of the probability measure.

A random variable (随机变量) \(X(\omega)\) on (\(\Omega\), \(\mathcal{A}\), \(P\)) is a real function such that \(\{\omega\mid X(\omega)\le x\}\in\mathcal{A}\) and the (cumulative) distribution function (分布函数) \(F(x)\) of a random variable \(X(\omega)\) is given by

\[ F(x)=P(\{\omega\mid X(\omega)\le x\})=P(X\le x). \]

Result 2 If \(F(x)\) is a distribution function, then \(F(-\infty)=0\), \(F(\infty)=1\) and \(F(x)\) is nondecreasing and continuous from the right, i.e. \(\displaystyle\lim_{x\downarrow a}F(x)=F(a)\). Exercise 1 Prove \(F(x)\) is continuous from the right.
Proof. Define a sequence \(\{x_n\}\) by \[ x_1>x_2>\ldots>x_n>\ldots,\lim_{n\to\infty}x_n=a \]

and define \[ \mathcal{S}_n=\left\{\omega\ \mid\ X(\omega)> x_n\right\}, \] so \(\mathcal{S}_1\subset\mathcal{S}_2\subset\ldots\) is an infinite growing sequence of sets with \(\displaystyle\lim_{n\to\infty}\mathcal{S}_n=\cup_{n=1}^{\infty}\mathcal{S}_n=\{\omega\mid X(\omega)>a\}\). (The sets \(\{\omega\mid X(\omega)\le x_n\}\) themselves are shrinking, which is why we work with their complements.) Using Result 1 yields \[ \lim_{n\to\infty}P(X> x_n)=\lim_{n\to\infty}P(\mathcal{S}_n)=P(\lim_{n\to\infty}\mathcal{S}_n)=P(X> a), \] which, since \(P(X\le x)=1-P(X>x)\) and by definition of the distribution function, is equivalent to \[ \lim_{n\to\infty}F(x_n)=F(a) \] for all sequences \(\{x_n\}\) defined as above. Notice that the LHS is exactly the Heine definition of the right limit, i.e. \[ \lim_{x\downarrow a}F(x)=\lim_{n\to\infty}F(x_n)=F(a).\tag*{Q.E.D.} \]

Using the Riemann-Stieltjes integral (黎曼·斯蒂尔杰斯积分), the expectation (期望) of \(h(X(\omega))\) is given by

\[ E(h(X))=\int_{-\infty}^{\infty}h(x)dF(x)=\lim_{\substack{b\to\infty\\a\to -\infty}}\int_a^bh(x)dF(x), \]

provided the limit exists and does not depend on how \(a\to-\infty\) and \(b\to\infty\). If the density (密度函数) \(f(x)=dF(x)/dx\) exists, then by the mean value theorem \(F(x_{i+1})-F(x_i)=f(x_i^*)(x_{i+1}-x_i)\) for some \(x_i^*\in[x_i,x_{i+1}]\), so

\[ E(h(X))=\int_{-\infty}^{\infty}h(x)f(x)dx. \]

If \(X\) is discrete, i.e. \(X=c_i\) with probability \(p_i,i=1,2,\ldots,K\), then

\[ E(h(X))=\sum_{i=1}^Kh(c_i)p_i. \]

The events \(\mathcal{A}\) and \(\mathcal{B}\) are independent (独立) if \(P(\mathcal{A}\cap\mathcal{B})=P(\mathcal{A})P(\mathcal{B})\). The random variables \(X\) and \(Y\) are independent if for every event \(\mathcal{A}\) (in the value space of \(X\), i.e. \(\{\omega\mid X(\omega)\in\mathcal{A}\}\)) and for every event \(\mathcal{B}\) (in the value space of \(Y\)) the following equality holds:

\[ P(X\text{ in }\mathcal{A}\text{ and }Y\text{ in }\mathcal{B})=P(X\text{ in }\mathcal{A})P(Y\text{ in }\mathcal{B}). \]

In that case \(F_{X,Y}(x,y)=F_X(x)F_Y(y)\) and, if the bivariate density exists, \(f_{X,Y}(x,y)=f_X(x)f_Y(y)\). A density need not exist, even when the distribution function is continuous: an example is given by letting the distribution function be the Cantor function (康托函数). Such a distribution is called a Cantor distribution (康托分布).

Modes of Convergence

Convergence in distribution (依分布收敛): A sequence \(\{X_n\}\) is said to converge in distribution, \(X_n\overset{d}{\to}X\), if \(F_n\to F\) at all continuity points of \(F\). This is also denoted as \(X_n\overset{L}{\to}X\) (in law), or \(X_n\overset{A}{\sim}F\) (asymptotically distributed as). If \(Y_n\overset{d}{\to}X\) as well, then this is denoted as \(X_n\overset{LD}{=}Y_n\) (same limit distribution).

Example 2 If \(X\) is symmetrically distributed (about zero), and \(X_n=X\), \(Y_n=-X\), \(n=1,2,\ldots\), then \(X_n\overset{LD}{=}Y_n\). So, convergence in distribution is a rather weak form of convergence. Example 3 Let \[ F_{X_n}(x)= \begin{cases} 0,&\mbox{if $x\le a-n^{-1}$;}\\ (x-a+n^{-1})n/2,&\mbox{if $a-n^{-1}<x\le a+n^{-1}$;}\\ 1,&\mbox{otherwise.} \end{cases} \] Then, \(\displaystyle\lim_{n\to\infty}F_{X_n}(x)\) is not a distribution function (not continuous from the right). Still \(X_n\overset{d}{\to}X\), where \[ F(x)= \begin{cases} 0,&\mbox{if $x<a$;}\\ 1,&\mbox{otherwise.} \end{cases} \] which is continuous from the right.
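To see the pointwise limit in Example 3 concretely, here is a small Python sketch. The choice \(a=0\) and the evaluation points are arbitrary assumptions for illustration only.

```python
def cdf_xn(x, n, a=0.0):
    """F_{X_n}(x) from Example 3 (a = 0.0 is an arbitrary illustrative value)."""
    if x <= a - 1.0 / n:
        return 0.0
    if x <= a + 1.0 / n:
        return (x - a + 1.0 / n) * n / 2.0
    return 1.0

# Evaluate just left of a, at a, and just right of a for growing n.
for n in (1, 10, 1000, 10**6):
    print(n, cdf_xn(-0.01, n), cdf_xn(0.0, n), cdf_xn(0.01, n))
```

At \(x=a\) the value stays at \(1/2\) for every \(n\), while the values to the left and right of \(a\) move to 0 and 1. The pointwise limit therefore jumps from \(1/2\) at \(a\) to 1 immediately to the right of \(a\), which is the failure of right-continuity mentioned in the example; \(a\) itself is not a continuity point of \(F\), so it does not matter for convergence in distribution.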

Example 4 Let \(P(X_n=a)=p\) and \(P(X_n=n)=1-p\), and let \(F_{X_n}(x)\) be the distribution function of \(X_n\). Then \(\displaystyle\lim_{n\to\infty}F_{X_n}(x)\) is not a distribution function (not 1 when \(x\to+\infty\)). Furthermore, \(X_n\) does not converge in distribution to any random variable \(X\).

Result 3 If \(E(\left|X_n\right|^r)<c\), \(\forall n\), and \(X_n\overset{d}{\to}X\), then \(E(X_n^s)\to E(X^s)\) if \(s<r\).

Proof. By Helly's lemma (海利引理), there exists a subsequence \(F_m\) of \(F_{X_n}\) such that \[ F_m\to F \] at all continuity points of \(F\), where \(F\) is a nondecreasing bounded function. If \(x_0>0\) is such that \(\pm x_0\) are continuity points of \(F\), then \[ \int_{|x|\ge x_0}dF_m\le x_0^{-r}\int_{|x|\ge x_0}\left|x\right|^{r}dF_m< x_0^{-r}c\tag{Markov-type bound} \] for all \(m\). This implies \[ 1-[F_m(x_0)-F_m(-x_0)]<x_0^{-r}c \] or \[ F_m(x_0)-F_m(-x_0)>1-x_0^{-r}c \] Letting \(m\to\infty\) yields \[ F(x_0)-F(-x_0)\ge 1-x_0^{-r}c \] Letting \(x_0\to+\infty\) yields \(\forall\epsilon>0\) \[ F(+\infty)-F(-\infty)>1-\epsilon \] while, as the limit of the functions \(F_m\), which take values in \([0,1]\), \(F\) also takes values in \([0,1]\), so \[ F(+\infty)-F(-\infty)=1 \] So \(F_m\) converges to a distribution function \(F\), and since \(X_n\overset{d}{\to}X\) this limit is \(F=F_X\). Now, \[ \int_{|x|\ge x_0}\left|x\right|^s dF_m\le x_0^{s-r}\int_{|x|\ge x_0}\left|x\right|^r dF_m<x_0^{s-r}c \] So \(\left|x\right|^s\) is uniformly integrable with respect to the \(F_m\) if \(s<r\), which means that for every \(\epsilon>0\) there exists a \(K>0\), independent of \(m\), such that \[ \epsilon+\int_{-K}^{K}\left|x\right|^s dF_m\ge\int\left|x\right|^s dF_m\ge\int_{-K^{\prime}}^{K^{\prime}}\left|x\right|^s dF_m,\quad K^{\prime}>K \]

Taking the limit \(m\to\infty\) (with \(\pm K\) and \(\pm K^{\prime}\) continuity points of \(F\)) gives \[ \epsilon+\int_{-K}^{K}\left|x\right|^s dF\ge\int_{-K^{\prime}}^{K^{\prime}}\left|x\right|^s dF \]

Hence, \(\left|x\right|^s\) and therefore \(x^s\) is integrable w.r.t. \(dF\). Now, \[ \int x^s dF_m-\int x^s dF=\int_{-K}^{K} x^s (dF_m-dF)+\int_{|x|>K} x^s dF_m-\int_{|x|>K} x^s dF \]

For \(K\) sufficiently large the last two terms are each smaller than \(\epsilon\) in absolute value, uniformly in \(m\), and for fixed \(K\) the first term tends to 0 as \(m\to\infty\) because \(F_m\to F\). Since \(\epsilon\) is arbitrary, \[ \int x^s dF_m-\int x^s dF\to0\tag*{Q.E.D.} \]

Example 5 Let \(F_{X_n}\), \(F_{Y_n}\) and \(F_{Z}\) be the distribution functions of positive random variables \(X_n\), \(Y_n\), \(n=1,2,\ldots\), and \(Z\), respectively, such that \[ F_{X_n}(x)=(1-n^{-1})F_Z(x)+n^{-1}F_{Y_n}(x). \]

Furthermore, let \(E(Z)=1\) and \(E(Y_n)=n\). Then \(E(X_n)=2-n^{-1}\). If we wish to apply Result 3, we find for \(r=1\): \(E(\left|X_n\right|^r)=E(X_n)<2\) and \(X_n\overset{d}{\to}Z\), while \(E(X_n)\to 2\not=E(Z)\). We cannot apply Result 3 here, since it requires \(s<r\) and we have \(s=r=1\).
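The arithmetic behind \(E(X_n)=2-n^{-1}\) in Example 5 is just linearity of the expectation in the mixture weights. A minimal sketch with exact fractions (the function name `mean_xn` is hypothetical):

```python
from fractions import Fraction

def mean_xn(n, mean_z=Fraction(1)):
    """E(X_n) for the mixture (1 - 1/n) F_Z + (1/n) F_{Y_n}, with E(Y_n) = n."""
    w = Fraction(1, n)                 # mixture weight on Y_n
    return (1 - w) * mean_z + w * n    # (1 - 1/n) * E(Z) + (1/n) * E(Y_n)

for n in (1, 10, 1000):
    print(n, mean_xn(n))
```

The weight on \(Y_n\) vanishes, so \(X_n\overset{d}{\to}Z\), but that vanishing weight is multiplied by a mean that grows like \(n\), which is what keeps \(E(X_n)\) away from \(E(Z)=1\).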

Convergence in probability (依概率收敛): A sequence \(\{X_n\}\) is said to converge in probability, \(X_n\overset{p}{\to}X\), if

\[ \lim_{n\to\infty}P(\left|X_n-X\right|>\epsilon)=0,\quad\forall\epsilon>0. \]

This is also denoted as \(plim(X_n)=X\) (probability limit), or \(X_n-X=o_p(1)\).

Convergence in \(r\)-th mean (依r阶矩收敛): Let \(r>0\) and \(E(|X_n|^r+|X|^r)<\infty\). The sequence \(\{X_n\}\) is said to converge in \(r\)-th mean, or in \(L^r\), \(X_n\overset{L^r}{\to}X\), if

\[ \lim_{n\to\infty}E(\left|X_n-X\right|^r)=0 \]

In particular, convergence in quadratic mean, \(r=2\), is frequently used.

Almost sure convergence (几乎确定收敛): A sequence \(\{X_n\}\) is said to converge almost surely, \(X_n\overset{a.s.}{\to}X\), (with probability 1, strong convergence) if

\[ P(\omega\mid\lim_{n\to\infty}X_n(\omega)=X(\omega))=1 \]

In short notation we also write \(P(\displaystyle\lim_{n\to\infty}|X_n-X|=0)=1\); it amounts to

\[ P(\lim_{n\to\infty}\cap_{N\ge n}\{\omega\mid \left|X_N(\omega)-X(\omega)\right|\le \epsilon\})=1,\quad \forall\epsilon>0. \]

There are several implications between the various modes of convergence. We will prove the following:

\[ \begin{cases} L^r\Rightarrow p\Rightarrow d;\\ a.s.\Rightarrow p. \end{cases} \]

Result 4 Markov inequality (马尔可夫不等式), or Chebyshev inequality (切比雪夫不等式): let \(r>0\) and \(E(\left|Z\right|^r)<\infty\), then \(\forall\epsilon>0\) \[ P(\left|Z\right|\ge \epsilon)\le \dfrac{E(|Z|^r)}{\epsilon^r}. \]

Proof. The result is equivalent to \[ E(\left|Z\right|^r)=\int_{-\infty}^{\infty}\left|z\right|^rdF(z)\ge \int_{\left|z\right|\ge\epsilon}\left|z\right|^rdF(z)\ge \epsilon^rP(\left|Z\right|\ge \epsilon),\tag*{Q.E.D} \]
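As a sanity check of Result 4, the inequality can be verified exactly on a small discrete distribution. The values and probabilities below are made up purely for illustration, and \(r\) is restricted to positive integers so that the arithmetic stays exact.

```python
from fractions import Fraction

# Hypothetical discrete Z: support and probabilities chosen only for illustration.
values = [-3, -1, 0, 2, 5]
probs = [Fraction(1, 10), Fraction(3, 10), Fraction(2, 10),
         Fraction(3, 10), Fraction(1, 10)]

def tail_prob(eps):
    """P(|Z| >= eps)."""
    return sum(p for z, p in zip(values, probs) if abs(z) >= eps)

def abs_moment(r):
    """E(|Z|^r) for a positive integer r."""
    return sum(p * abs(z) ** r for z, p in zip(values, probs))

def markov_holds(eps, r):
    """Check P(|Z| >= eps) <= E(|Z|^r) / eps^r."""
    return tail_prob(eps) <= abs_moment(r) / Fraction(eps) ** r

for eps in (1, 2, 3, 5):
    print(eps, tail_prob(eps), abs_moment(2) / Fraction(eps) ** 2)
```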

Result 5 \[ X_n\overset{L^r}{\to}X\Rightarrow X_n\overset{p}{\to}X. \]

Exercise 2 Prove Result 5.

Proof. By assumption, \(X_n\overset{L^r}{\to}X\) yields \[ \lim_{n\to\infty}E(\left|X_n-X\right|^r)=0. \]

Now consider Chebyshev inequality \[ P(\left|X_n-X\right|\ge \epsilon)\le \dfrac{E(|X_n-X|^r)}{\epsilon^r}. \]

for all \(\epsilon>0\). Taking limit \(n\to\infty\) on both sides yields \[ \lim_{n\to\infty}P(\left|X_n-X\right|\ge \epsilon)\le \lim_{n\to\infty}\dfrac{E(|X_n-X|^r)}{\epsilon^r}=0\tag*{Q.E.D.} \]

Result 6 \[ X_n\overset{p}{\to}X\Rightarrow X_n\overset{d}{\to}X. \]

This is a special case of the following result.

Result 7 If \(Y_n\overset{d}{\to}X\) and \(X_n=Y_n+o_p(1)\), then \(X_n\overset{d}{\to}X\).

Proof. Let \(Z_n=Y_n-X_n\), then \(Z_n=o_p(1)\). Let \(x\) be a point of continuity of \(F_X(x)\) and let \(\epsilon>0\) be such that \(x\pm\epsilon\) are also points of continuity of \(F_X\), then \[ \begin{aligned}F_{X_n}(x) &=P(X_n\le x)=P(Y_n\le x+Z_n)\\ &=P(Y_n\le x+Z_n\text{ and }Z_n<\epsilon)+P(Y_n\le x+Z_n\text{ and }Z_n\ge\epsilon)\\ &\le P(Y_n\le x+\epsilon)+P(Z_n\ge \epsilon)\\ &=F_{Y_n}(x+\epsilon)+o(1), \end{aligned} \]

where \(P(Z_n\ge \epsilon)=o(1)\) means \(\displaystyle\lim_{n\to\infty}P(Z_n\ge \epsilon)=0\), which holds since \(Z_n=o_p(1)\). As \(F_{Y_n}(x+\epsilon)\to F_{X}(x+\epsilon)\), since \(Y_n\overset{d}{\to}X\) and \(x+\epsilon\) is a point of continuity of \(F_{X}\), we find \(\displaystyle\limsup_{n\to\infty}F_{X_n}(x)\le F_{X}(x+\epsilon)\). Similarly, we find \(\displaystyle\liminf_{n\to\infty}F_{X_n}(x)\ge F_{X}(x-\epsilon)\). Such \(\epsilon\) can be chosen arbitrarily small because \(F_X\) has at most countably many discontinuities, so due to the continuity of \(F_X\) at \(x\) we find as \(\epsilon\to 0\): \[ F_{X_n}(x)\to F_X(x)\tag*{Q.E.D.} \]

Exercise 3 Prove that Result 6 is a special case of Result 7.

Proof. Suppose \(X_n\overset{p}{\to}X\) and let \(Y_n=X\) for all \(n\). Then \(Y_n\overset{d}{\to}X\) trivially, and \[ X_n=Y_n+(X_n-X)=Y_n+o_p(1), \]

since \(X_n\overset{p}{\to}X\) means exactly that \(X_n-X=o_p(1)\). Applying Result 7 we have therefore \[ X_n\overset{d}{\to}X. \]

Result 8 \[ X_n\overset{a.s.}{\to}X\Rightarrow X_n\overset{p}{\to}X. \]

Exercise 4 Prove Result 8 using Result 1.

Proof. Let \(X_n\overset{a.s.}{\to}X\), which amounts to \[ P(\lim_{n\to\infty}\cap_{N\ge n}\{\omega\mid \left|X_N(\omega)-X(\omega)\right|\le \epsilon\})=1,\quad \forall\epsilon>0. \]

Fix \(\epsilon>0\) and define \(\mathcal{S}_n=\cap_{N\ge n}\{\omega\mid\left|X_N(\omega)-X(\omega)\right|\le\epsilon\}\), then \(\mathcal{S}_1\subset\mathcal{S}_2\subset\ldots\) is a growing sequence of sets, so applying Result 1 yields \[ \lim_{n\to\infty}P(\mathcal{S}_n)=P(\lim_{n\to\infty}\mathcal{S}_n)=1. \]

Since \(\mathcal{S}_n\subset\{\omega\mid\left|X_n(\omega)-X(\omega)\right|\le\epsilon\}\), we have \(P(\left|X_n-X\right|>\epsilon)\le 1-P(\mathcal{S}_n)\), and therefore \[ \lim_{n\to\infty}P(\left|X_n-X\right|>\epsilon)\le 1-\lim_{n\to\infty}P(\mathcal{S}_n)=0\tag*{Q.E.D.} \]

Result 9 Let \(a\) be a constant, then \[ X_n\overset{d}{\to}a\Rightarrow X_n\overset{p}{\to}a. \] Exercise 5 Prove Result 9.

Proof. For any \(\epsilon>0\), by definition of the cdf of \(X_n\) we have \[ P(\left|X_n-a\right|\le\epsilon)=P(a-\epsilon\le X_n\le a+\epsilon)\ge P(a-\epsilon< X_n\le a+\epsilon)=F_{X_n}(a+\epsilon)-F_{X_n}(a-\epsilon) \]

As \(X_n\overset{d}{\to}a\), where \(a\) is a constant, we have for every \(x\ne a\) (i.e. at every continuity point of \(F_a\)) \[ \lim_{n\to\infty}F_{X_n}(x)= F_a(x)=P(a\le x)= \begin{cases} 1,&\mbox{$x\ge a$}\\ 0,&\mbox{$x<a$} \end{cases} \]

and thus \[ \begin{aligned} \lim_{n\to\infty}F_{X_n}(a+\epsilon)&=F_a(a+\epsilon)=1,\\ \lim_{n\to\infty}F_{X_n}(a-\epsilon)&=F_a(a-\epsilon)=0. \end{aligned} \] So we have \[ 1\ge\liminf_{n\to\infty}P(\left|X_n-a\right|\le\epsilon)\ge\lim_{n\to\infty}F_{X_n}(a+\epsilon)-\lim_{n\to\infty}F_{X_n}(a-\epsilon)=1-0=1, \] i.e. \(\displaystyle\lim_{n\to\infty}P(\left|X_n-a\right|>\epsilon)=0\). \[ \tag*{Q.E.D.} \]

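Result 10 Let \(1\le s\le r\), then \[ X_n\overset{L^r}{\to}X\Rightarrow X_n\overset{L^s}{\to}X. \]

Example 6 Let \(a_1=0\) and \[ a_{n+1}= \begin{cases} a_n+n^{-1}-1&\mbox{if $a_n+n^{-1}\ge 1$;}\\ a_n+n^{-1}&\mbox{otherwise}. \end{cases} \] Define a sequence of random variables on \(\Omega=[0,1]\) as follows: \[ X_n(\omega)=\begin{cases} 1,&\mbox{if $a_n\le\omega\le\min(a_n+n^{-1},1)$;}\\ 1,&\mbox{if $0\le\omega\le a_n+n^{-1}-1$, (when $a_n+n^{-1}\ge 1$);}\\ 0,&\mbox{otherwise}. \end{cases} \] Each \(X_n\) is the indicator of an interval of length \(n^{-1}\) that travels around \([0,1]\), wrapping at 1. Taking \(P\) to be the uniform distribution on \([0,1]\) (assumed here), \(E(\left|X_n\right|^r)=P(X_n=1)=n^{-1}\to 0\), so \(X_n\overset{L^r}{\to}0\) and \(X_n\overset{p}{\to}0\). However, since \(\sum_n n^{-1}=\infty\), the interval passes over every \(\omega\) infinitely often, so \(X_n(\omega)=1\) for infinitely many \(n\) and \(X_n\) does not converge almost surely.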
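The travelling interval of Example 6 is easy to trace numerically. The sketch below follows the recursion for \(a_n\) with exact fractions and lists the indices \(n\) at which a fixed \(\omega\) is covered; the function name `hits` and the choice \(\omega=1/3\) are my own assumptions for illustration.

```python
from fractions import Fraction

def hits(omega, n_max):
    """Return the indices n <= n_max with X_n(omega) = 1 in Example 6."""
    a = Fraction(0)                      # a_1 = 0
    result = []
    for n in range(1, n_max + 1):
        end = a + Fraction(1, n)         # right end of the interval before wrapping
        in_main_part = a <= omega <= min(end, 1)
        in_wrapped_part = end >= 1 and 0 <= omega <= end - 1
        if in_main_part or in_wrapped_part:
            result.append(n)
        a = end - 1 if end >= 1 else end  # recursion for a_{n+1}
    return result

covered = hits(Fraction(1, 3), 500)
print(covered)
```

The gaps between successive hits grow quickly, because the interval shrinks like \(n^{-1}\), yet the hits never stop: each full trip around \([0,1]\) covers \(\omega\) again. That is the sense in which the sequence can be small in probability without settling down pointwise.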

Result 11 Slutsky (斯拉茨基定理) Let \(a\) be a constant and let \(X_n\overset{d}{\to}X\) and \(Y_n\overset{p}{\to}a\), then \[ \begin{aligned} (i)\quad & X_n+Y_n\overset{d}{\to}X+a\\ (ii)\quad & X_nY_n\overset{d}{\to}aX\\ (iii)\quad & X_n/Y_n\overset{d}{\to}X/a\quad(a\not =0) \end{aligned} \] Proof. Use Result 7.

Convergence of Random Vectors and Functions

Result 11 can also be derived as a special case of multivariate results on random vectors. For example, convergence in probability for random vectors can be defined by replacing \(\left|X_n-X\right|\), in the scalar definition, by the L\(^2\)-norm \(\Vert X_n-X\Vert\), where \(X\) is a random k-vector, say, and \(\Vert X\Vert=(X^{\prime}X)^{1/2}\).

Result 12 Elementwise convergence in p implies convergence in p of the whole vector. Proof. Use the conclusion that \(\left\{\sum_{i=1}^k(X_n-X)_i^2\right\}^{1/2}\le\sum_{i=1}^k\left|(X_n-X)_i\right|\).

Just note that this is not the case when it's about convergence in d. Convergence elementwise in d need not imply the convergence of the vector in d.

Example 7 Let \(X_n=X\sim\mathcal{N}(0,\sigma^2)\) for \(n=1,2,\ldots\) Let \(Y_n=X\) if \(\left|X\right|<a\) and \(Y_n=-X\) if \(\left|X\right|\ge a\). So \(Y_n\sim\mathcal{N}(0,\sigma^2)\) for \(n=1,2,\ldots\) If we choose \(a\) such that \[ \int_a^{\infty}x^2f_X(x)dx=\sigma^2/4 \] then \(X_n\) and \(Y_n\) are uncorrelated (check). Yet \(\left|X_n+Y_n\right|<2a\), so \(X_n+Y_n\) cannot be normally distributed and (\(X_n, Y_n\)) does not have a multivariate (asymptotic) normal distribution.
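The "(check)" in Example 7 can be done numerically. The sketch below assumes \(\sigma=1\) and uses the identity \(\int_a^{\infty}x^2\varphi(x)dx=a\varphi(a)+1-\Phi(a)\) for the standard normal density \(\varphi\) and cdf \(\Phi\) (integration by parts); the threshold is found by bisection. With \(E(X)=E(Y_n)=0\), the covariance is \(E(XY_n)=\sigma^2-4\int_a^{\infty}x^2f_X(x)dx\).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def tail_second_moment(a):
    """Integral of x^2 * phi(x) over [a, inf), via integration by parts."""
    return a * phi(a) + 1.0 - Phi(a)

# Bisection for the threshold with tail_second_moment(a) = 1/4 (sigma = 1 assumed).
lower, upper = 0.0, 10.0
for _ in range(200):
    mid = 0.5 * (lower + upper)
    if tail_second_moment(mid) > 0.25:   # function is decreasing in a on [0, inf)
        lower = mid
    else:
        upper = mid
a = 0.5 * (lower + upper)

cov_xy = 1.0 - 4.0 * tail_second_moment(a)   # E(X * Y_n) with sigma^2 = 1
print(a, cov_xy)
```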

The following result can be shown by using the Continuity Theorem (连续性定理), Result 20.

Result 13 Reduction Theorem or Cramer-Wold Device (约减定理) Let \(X_n\) and \(X\) be random k-vectors. If for all k-vectors \(\lambda\), with \(\lambda^{\prime}\lambda=1\), it holds that \(\lambda^{\prime}X_n\overset{d}{\to}\lambda^{\prime}X\), then \(X_n\overset{d}{\to}X\).

Result 14 Mann and Wald Let \(X_n\) and \(X\) be random k-vectors. Let \(g:\mathbb{R}^k\to\mathbb{R}\) be a scalar mapping where \(g\) is continuous (except perhaps on a closed set \(\mathcal{E}\) such that \(P(X\in\mathcal{E})=0\)), then \[ X_n\overset{d}{\to}X\Rightarrow g(X_n)\overset{d}{\to}g(X). \]

Result 15 Let \(X_n\) be a random k-vector. Let \(g:\mathbb{R}^k\to\mathbb{R}\) be a scalar mapping where \(g\) is continuous at \(a\), then \[ X_n\overset{p}{\to}a\Rightarrow g(X_n)\overset{p}{\to}g(a). \] Exercise 6 Prove Result 15. Proof. Fix \(\epsilon>0\). Since \(g\) is continuous at \(a\), there exists a \(\delta>0\) such that \[ \left\Vert x-a\right\Vert<\delta\Rightarrow\left|g(x)-g(a)\right|<\epsilon. \] Hence the event \(\{\left|g(X_n)-g(a)\right|\ge\epsilon\}\) is contained in the event \(\{\left\Vert X_n-a\right\Vert\ge\delta\}\), so for all \(n>0\), \[ 0\le P\left(\left|g(X_n)-g(a)\right|\ge\epsilon\right)\le P\left(\left\Vert X_n-a\right\Vert\ge\delta\right). \] While by assumption, we have \[ X_n\overset{p}{\to}a \iff\lim_{n\to\infty}P\left(\left\Vert X_n-a\right\Vert>\delta^{\prime}\right)=0,\quad\forall\delta^{\prime}>0, \] where \(\left\Vert X_n-a\right\Vert=\left(\sum_{i=1}^k(X_n-a)_i^2\right)^{1/2}\). Applying this with any \(0<\delta^{\prime}<\delta\) shows that the right-hand side of the previous inequality tends to 0. So we can conclude that \[ \lim_{n\to\infty}P\left(\left|g(X_n)-g(a)\right|\ge\epsilon\right)=0,\quad\forall\epsilon>0, \] that is, \[ g(X_n)\overset{p}{\to}g(a)\tag*{Q.E.D.} \]

Result 16 The "convergence in p" in Result 15 may be replaced by a.s. convergence.

Result 17 Let \(X_n\) be a random k-vector s.t. \(X_n\overset{p}{\to}a_n\), where, for \(n\) sufficiently large, \(a_n\) is interior to \(\mathcal{C}\) uniformly in \(n\). If \(g:\mathbb{R}^k\to\mathbb{R}\) is uniformly continuous on \(\mathcal{C}\), then \(g(X_n)\overset{p}{\to}g(a_n)\).

Characteristic Functions

The characteristic function (特征函数) \(\phi_X(\lambda)\) of a random variable is defined by \(E(e^{i\lambda X})\), where \(e^{i\lambda X}=\cos{(\lambda X)}+i\sin{(\lambda X)}\). Consequently, \(\phi_X(\lambda)\) is real if \(X\) is distributed symmetrically about zero.

Result 18 If \(X\sim\mathcal{N}(\mu,\sigma^2)\), then \(\phi_X(\lambda)=e^{i\lambda\mu-\frac{\sigma^2\lambda^2}{2}}\).

Result 19 If \(E(|X|^r)<\infty\), then \[ g_X(\lambda)=\log\{\phi_X(\lambda)\}=\sum_{j=1}^r\kappa_j\dfrac{(i\lambda)^j}{j!}+o(|\lambda|^r). \]

Here the constants

\[ \kappa_j=\left.\left\{\dfrac{\partial^j g_X(\lambda)}{\partial\lambda^j}\right\}/i^j\right\rvert_{\lambda=0} \]

are called cumulants (累积量), \(\kappa_1=E(X)\), \(\kappa_2=E\left[(X-E(X))^2\right]=E(X^2)-[E(X)]^2=Var(X)\), \(\kappa_3=E\left[(X-E(X))^3\right]\), \(\kappa_4=E\left[(X-E(X))^4\right]-3[Var(X)]^2\). For the normal distribution, \(\kappa_j=0\) if \(j>2\). Compared to moments, cumulants are easily manipulated when random variables are linearly transformed, or when independent random variables are added.

For random vectors we use similar expressions. If \(X\) is a k-vector of random variables, then

\[ \phi_X(\lambda)=E\left(e^{i\lambda^{\prime}X}\right)=E\left(\prod_{j=1}^k e^{i\lambda_jX_j}\right) \]

where \(\lambda\in\mathbb{R}^k\). In particular, if random variables \(X_j\) are independent, then \(\phi_X(\lambda)=\prod_{j=1}^k\phi_{X_j}(\lambda_j)\). If two random variables are added, we find \(\phi_{X+Y}(\lambda)=E\left(e^{i\lambda(X+Y)}\right)\); if \(X\) and \(Y\) are independent, then \(\phi_{X+Y}(\lambda)=\phi_X(\lambda)\phi_Y(\lambda)\). In general we have \(\phi_{a^{\prime}X}(\lambda)=E\left(e^{i\lambda a^{\prime}X}\right)=\phi_X(\lambda a)\).

Result 20 Continuity Theorem / Levy-Cramer (连续性定理) The following two results hold 1. \(X_n\overset{d}{\to}X\Rightarrow\phi_{X_n}(\lambda)\to\phi_{X}(\lambda)\) for all \(\lambda\). 2. If \(\phi_{X_n}(\lambda)\to h(\lambda)\) for all \(\lambda\), where \(h(\lambda)\) is continuous at \(\lambda=0\), then there exists a random variable \(X\) with \(\phi_X(\lambda)=h(\lambda)\), and \(X_n\overset{d}{\to}X\).

The Small o and the Big O

We write \(X_n=o_p(n^{-r})\) if \(n^rX_n=o_p(1)\) and we write \(X_n=o_p(1)\) if \(X_n\overset{p}{\to}0\), i.e. for all \(\epsilon>0\) and all \(\Delta>0\) there exists an \(N_{\epsilon,\Delta}\) s.t. \(P(\left|X_n\right|>\Delta)<\epsilon\) if \(n\ge N_{\epsilon,\Delta}\).

We write \(X_n=O_p(n^{-r})\) if \(n^rX_n=O_p(1)\) and we write \(X_n=O_p(1)\) if for all \(\epsilon>0\) there exists a \(\Delta_{\epsilon}\) and \(N_{\epsilon}\) s.t. \(P(\left|X_n\right|>\Delta_{\epsilon})<\epsilon\) if \(n\ge N_{\epsilon}\).

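Example 8 The sequence \(\{X_n\}\) described in Example 4 (with \(p<1\)) is not bounded in probability, i.e. it is not \(O_p(1)\): for any fixed \(\Delta\ge\left|a\right|\), \(P(\left|X_n\right|>\Delta)=1-p\) as soon as \(n>\Delta\).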
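To make the failure of boundedness in probability concrete, the tail probability of the two-point variable from Example 4 can be written down directly. The default values \(a=0\) and \(p=0.5\) below are arbitrary assumptions for illustration.

```python
def tail_prob(n, delta, a=0.0, p=0.5):
    """P(|X_n| > delta) when P(X_n = a) = p and P(X_n = n) = 1 - p."""
    prob = 0.0
    if abs(a) > delta:      # mass at the fixed point a
        prob += p
    if abs(n) > delta:      # mass at the escaping point n
        prob += 1.0 - p
    return prob

# However large delta is chosen, the tail mass returns to 1 - p once n > delta.
for delta in (10, 1000, 10**6):
    print(delta, tail_prob(delta, delta), tail_prob(delta + 1, delta))
```

So no single \(\Delta_{\epsilon}\) can work for all large \(n\) once \(\epsilon<1-p\), which is exactly what the definition of \(O_p(1)\) would require.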

Result 21 \[ X_n\overset{d}{\to}X\Rightarrow X_n=O_p(1). \] Proof. Choose \(\Delta_{\epsilon}\) s.t. \(1-F_X(\Delta_{\epsilon})<\epsilon/2\) and let \(N_{\epsilon}\) be large enough to ensure \(\left|F_X(\Delta_{\epsilon})-F_{X_n}(\Delta_{\epsilon})\right|<\epsilon/2\) if \(n\ge N_{\epsilon}\), which is possible since \(X_n\overset{d}{\to}X\). Then we find \[ P(\left|X_n\right|>\Delta_{\epsilon})\le 1-F_{X_n}(\Delta_{\epsilon})\le 1-F_X(\Delta_{\epsilon})+\epsilon/2<\epsilon\text{ if }n\ge N_{\epsilon}. \] Something is wrong. Exercise 7 Provide a correct proof for Result 21. Proof. Choose \(\Delta_{\epsilon}\) such that \(1-F_X(\Delta_{\epsilon})+F_X(-\Delta_{\epsilon})<\epsilon/2\) and let \(N_{\epsilon}\) be large enough to ensure \[ \max\left\{\left|F_X(\Delta_{\epsilon})-F_{X_{n}}(\Delta_{\epsilon})\right|,\left|F_X(-\Delta_{\epsilon})-F_{X_{n}}(-\Delta_{\epsilon})\right|\right\}<\epsilon/4 \] if \(n\ge N_{\epsilon}\), which is possible since \(X_n\overset{d}{\to}X\). Then we find \[ \begin{aligned} P(\left|X_n\right|>\Delta_{\epsilon}) =&P(X_n>\Delta_{\epsilon})+P(X_n<-\Delta_{\epsilon})\\ \le&P(X_n>\Delta_{\epsilon})+P(X_n\le -\Delta_{\epsilon})\\ =&1-F_{X_n}(\Delta_{\epsilon})+F_{X_n}(-\Delta_{\epsilon})\\ <&1-F_X(\Delta_{\epsilon})+F_X(-\Delta_{\epsilon})+\epsilon/4+\epsilon/4<\epsilon \end{aligned} \] if \(n\ge N_{\epsilon}\). So it holds that \[ X_n\overset{d}{\to}X\Rightarrow X_n=O_p(1).\tag*{Q.E.D.} \]

Result 22 \(o_p(1)O_p(1)=o_p(1).\) Exercise 8 Prove Result 22. Proof. Let \(X_n=o_p(1)\) and \(Y_n=O_p(1)\), and fix \(\epsilon>0\) and \(\Delta>0\). Then: 1. there exist a \(\Delta_Y>0\) and an \(N_Y\) s.t. \(P(\left|Y_n\right|>\Delta_Y)<\epsilon/2\) if \(n\ge N_Y\), and 2. since \(\Delta_X>0\) may be chosen arbitrarily, we may let \(\Delta_X=\Delta/\Delta_Y\), and there exists an \(N_X\) s.t. \(P(\left|X_n\right|>\Delta_X)<\epsilon/2\) if \(n\ge N_X\). If both \(\left|X_n\right|\le\Delta_X\) and \(\left|Y_n\right|\le\Delta_Y\), then \(\left|X_nY_n\right|\le\Delta_X\Delta_Y=\Delta\), so the event \(\{\left|X_nY_n\right|>\Delta\}\) is contained in \(\{\left|X_n\right|>\Delta_X\}\cup\{\left|Y_n\right|>\Delta_Y\}\). Hence, with \[ N(\epsilon,\Delta)=\max\{N_X,N_Y\} \] we find for all \(n\ge N\) \[ \begin{aligned} P(\left|X_nY_n\right|>\Delta)\le&P\left(\left|X_n\right|>\Delta_X\right)+P\left(\left|Y_n\right|>\Delta_Y\right)\\ <&\epsilon/2+\epsilon/2=\epsilon \end{aligned} \] where we used the fact \(\Delta_X=\Delta/\Delta_Y\). Therefore in conclusion, for all \(\epsilon>0\) and all \(\Delta>0\), there exists an \(N(\epsilon,\Delta)\) s.t. for all \(n\ge N\) holds \[ P\left(\left|X_nY_n\right|>\Delta\right)<\epsilon.\tag*{Q.E.D.} \]

Result 23 Let \(\kappa=\max\{\lambda,\mu\}\), then \[ \begin{aligned} o_p(n^{\lambda})o_p(n^{\mu})=o_p(n^{\lambda+\mu}),\quad & o_p(n^{\lambda})+o_p(n^{\mu})=o_p(n^{\kappa}),\\ O_p(n^{\lambda})O_p(n^{\mu})=O_p(n^{\lambda+\mu}),\quad & O_p(n^{\lambda})+O_p(n^{\mu})=O_p(n^{\kappa}),\\ o_p(n^{\lambda})O_p(n^{\mu})=o_p(n^{\lambda+\mu}),\quad & o_p(n^{\lambda})+O_p(n^{\mu})= \begin{cases} o_p(n^{\lambda}),&\text{if $\lambda>\mu$;}\\ O_p(n^{\mu}),&\text{if $\lambda\le\mu$.} \end{cases} \end{aligned} \]

Laws of Large Numbers

Consider a sequence \(\{X_n\}\). If, as \(n\to\infty\),

\[ \overline{X_n}=\dfrac{\sum_{i=1}^nX_i}{n}\overset{p}{\to}a_n \]

where \(a_n\) are constants, then \(\{X_n\}\) satisfies the Weak Law of Large Numbers (w.l.l.n., 弱大数定理). If "p" is replaced by "a.s." then \(\{X_n\}\) satisfies the Strong Law of Large Numbers (s.l.l.n., 强大数定理).

Result 24 Chebyshev's Theorem of w.l.l.n. (切比雪夫弱大数定理) Let \(E(X_i)=\mu_i\), \(Var(X_i)=\sigma_i^2\) and let \(X_i\) and \(X_j\) be uncorrelated, \(Cov(X_i,X_j)=0\) if \(i\not=j\). Then

\[ \dfrac{\overline{\sigma_n^2}}{n}\equiv\sum_{i=1}^n\dfrac{\sigma_i^2}{n^2}\to 0\Rightarrow \overline{X_n}\overset{p}{\to}\overline{\mu_n}\equiv\dfrac{\sum_{i=1}^n\mu_i}{n}. \]

Proof. The proof follows from Result 5.

Result 25 Khinchine's Theorem of w.l.l.n. (辛钦弱大数定理) Let \(\{X_n\}\) be an i.i.d. sequence (homogeneous) with \(E(X_i)=\mu\), then \(\overline{X_n}\overset{p}{\to}\mu\).

Notice that here the existence of the second moment is not required.

Exercise 9 Prove Result 25.

Proof. Consider an i.i.d. sequence of random variables \(\{X_n\}\) and their characteristic functions, \(\phi_X(\lambda)\). By using Result 19 and letting \(r=1\) yields

\[ \begin{aligned} \phi_X(\lambda)&=\exp\left(i\lambda\kappa_1+o(\left|\lambda\right|)\right)\\ &=\exp\left(i\lambda\mu+o(\left|\lambda\right|)\right)\\ &=1+i\lambda\mu+o(\left|\lambda\right|),\quad\lambda\to 0 \end{aligned} \]

The properties of characteristic functions stated earlier also give that if \(X_i\) and \(X_j\) are independent, then

\[ \phi_{X_i+X_j}(\lambda)=\phi_{X_i}(\lambda)\phi_{X_j}(\lambda), \]

and that

\[ \phi_{a^{\prime}X}(\lambda)=\phi_X(\lambda a). \]

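Now, let \(X=(X_1,\ldots,X_n)^{\prime}\) and \(a=(1/n,\ldots,1/n)^{\prime}\), so that \(a^{\prime}X=\overline{X_n}\). Since the \(X_i\) are i.i.d., we have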

\[ \phi_{\overline{X_n}}(\lambda)=\left[\phi_X\left(\dfrac{\lambda}{n}\right)\right]^n= \left[1+\dfrac{i\lambda\mu}{n}+o\left(\left|\dfrac{\lambda}{n}\right|\right)\right]^n \to e^{i\lambda\mu},\quad\text{as }n\to\infty. \]

Notice that \(\phi_X(\lambda)=e^{i\lambda\mu}\) is the characteristic function of the constant random variable \(X=\mu\) and is continuous at \(\lambda=0\), so with Result 20 we have

\[ \overline{X_n}\overset{d}{\to}\mu \]

Since \(\mu\) is a constant, we may use Result 9 and yield

\[ \overline{X_n}\overset{d}{\to}\mu\Rightarrow \overline{X_n}\overset{p}{\to}\mu.\tag*{Q.E.D.} \]
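The key limit in this proof, \(\left[\phi_X(\lambda/n)\right]^n\to e^{i\lambda\mu}\), can be watched numerically for a concrete distribution. As an illustration only, the sketch below assumes \(X_i\sim\) Exponential(1), whose characteristic function is \(1/(1-i\lambda)\) and whose mean is \(\mu=1\); the value \(\lambda=2\) is arbitrary.

```python
import cmath

def cf_exponential(lam):
    """Characteristic function of an Exponential(1) variable (mean 1)."""
    return 1.0 / (1.0 - 1j * lam)

def cf_sample_mean(lam, n):
    """Characteristic function of the mean of n i.i.d. Exponential(1) draws."""
    return cf_exponential(lam / n) ** n

lam = 2.0
limit = cmath.exp(1j * lam * 1.0)   # characteristic function of the constant mu = 1
for n in (1, 10, 100, 10000):
    print(n, abs(cf_sample_mean(lam, n) - limit))
```

The printed distances shrink as \(n\) grows, in line with the continuity theorem argument above.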

Result 26 Kolmogorov 2 of s.l.l.n. (柯尔莫果洛夫第二强大数定理) Let \(\{X_n\}\) be an i.i.d. sequence (homogeneous), then

\[ \overline{X_n}\overset{a.s.}{\to}\mu\iff E(X_i)\text{ exists, and }E(X_i)=\mu. \]

Result 27 Kolmogorov 1 of s.l.l.n. (柯尔莫果洛夫第一强大数定理) Let the sequence \(\{X_n\}\) be independent (heterogeneous) and let the first two moments exist, \(E(X_n)=\mu_n\) and \(Var(X_n)=\sigma_n^2\). Then

\[ \sum_{i=1}^{\infty}\dfrac{\sigma_i^2}{i^2}<\infty\Rightarrow\overline{X_n}\overset{a.s.}{\to}\overline{\mu_n}. \]

The strange condition \(\sum_{i=1}^{\infty}\sigma_i^2/i^2<\infty\) is equivalent to \(\lim_{n\to\infty}\sum_{i=1}^n\sigma_i^2/i^2=c<\infty\), and it can be related to two other, more transparent conditions.

Result 28

\[ \overline{\sigma_n^2}=\sum_{i=1}^n\dfrac{\sigma_i^2}{n}=O(1)\Rightarrow\lim_{n\to\infty}\sum_{i=1}^n\dfrac{\sigma_i^2}{i^2}=c<\infty\Rightarrow Var(\overline{X_n})=\dfrac{\overline{\sigma_n^2}}{n}\to 0. \]

To prove Result 28 we need three additional results.

Result 29 Let \(x_i\in\mathbb{R}\), \(i=0,1,\ldots,n\), and let \(0\le a_1\le a_2\le \ldots\le a_n\), then

\[ a_n(x_n-\max\{x_0,x_1,\ldots,x_{n-1}\})\le\sum_{i=1}^n a_i(x_i-x_{i-1})\le a_n(x_n-\min\{x_0,x_1,\ldots,x_{n-1}\}). \]

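Proof. Set \(a_0=0\). Summation by parts gives \[ \sum_{i=1}^n a_i(x_i-x_{i-1})=a_nx_n-\sum_{i=0}^{n-1}(a_{i+1}-a_i)x_i. \] If \(a_n=0\), all terms vanish and the inequalities hold trivially. If \(a_n>0\), the weights \((a_{i+1}-a_i)/a_n\) are nonnegative and sum to one, so the middle expression equals \(a_n(x_n-\bar{x})\), where \(\bar{x}\) is a convex combination of \(x_0,x_1,\ldots,x_{n-1}\) and therefore lies between their minimum and maximum.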

Result 30 Kronecker's Lemma (克罗尼克引理) Let \(0\le b_1\le b_2\le \ldots\le b_n\) and \(x_i\in\mathbb{R}\), \(i=1,2,\ldots,n\), if \(b_n\to\infty\), as \(n\to\infty\), then

\[ \sum_{i=1}^n\dfrac{x_i}{b_i}\to c\Rightarrow\sum_{i=1}^n\dfrac{x_i}{b_n}\to 0. \]

Proof. Omitted here, but you can see that on the syllabus.

Result 31 Let \(0\le a_1\le a_2\le \ldots\le a_n\), \(0\le b_1\le b_2\le \ldots\le b_n\) and \(0<b_1/a_1\le b_2/a_2\le \ldots\le b_n/a_n\). Furthermore, let \(\sum_{i=1}^n(a_i-a_{i-1})/b_i=O(1)\), as \(n\to\infty\). Then for \(x_i\in\mathbb{R}\), \(i=0,1,\ldots,n\),

\[ \sum_{i=1}^n\dfrac{x_i}{a_n}=O(1)\Rightarrow\sum_{i=1}^n\dfrac{x_i}{b_i}=O(1). \]

Proof. The proof is also omitted here.

Exercise 10 Prove Result 28.

Proof. (1) Let \(a_i=i\) and \(b_i=i^2\), \(i=1,2,\ldots,n\), with \(a_0=0\). So it holds that \(0\le a_1\le a_2\le \ldots\le a_n\), \(0\le b_1\le b_2\le \ldots\le b_n\) and \(0<b_1/a_1\le b_2/a_2\le \ldots\le b_n/a_n\), since \(b_i/a_i=i\). Furthermore, \(\sum_{i=1}^n(a_i-a_{i-1})/b_i=\sum_{i=1}^n 1/i^2=O(1)\). Therefore, using Result 31 with \(x_i=\sigma_i^2\in\mathbb{R}\) yields

\[ \sum_{i=1}^n\dfrac{\sigma_i^2}{n}=\sum_{i=1}^n\dfrac{\sigma_i^2}{a_n}=O(1)\Rightarrow\sum_{i=1}^n\dfrac{\sigma_i^2}{b_i}=\sum_{i=1}^n\dfrac{\sigma_i^2}{i^2}=O(1). \]

This means that the sequence \(\left\{\sum_{i=1}^n\sigma_i^2/i^2\right\}\) is both bounded and monotonically nondecreasing, and thus must have a limit, say, \(c<\infty\), as \(n\to\infty\). So in conclusion,

\[ \overline{\sigma_n^2}=\sum_{i=1}^n\dfrac{\sigma_i^2}{n}=O(1)\Rightarrow\lim_{n\to\infty}\sum_{i=1}^n\dfrac{\sigma_i^2}{i^2}=c<\infty. \]

(2) Define \(\{a_n\}\) and \(\{b_n\}\) just as above. Since \(0\le b_1\le b_2\le \ldots\le b_n\) and \(b_n=n^2\to\infty\) as \(n\to\infty\), we may apply Result 30:

\[ \sum_{i=1}^n\dfrac{\sigma_i^2}{b_i}\to c\Rightarrow\sum_{i=1}^n\dfrac{\sigma_i^2}{b_n}\to 0. \]

Notice that \(\sum_{i=1}^n\dfrac{\sigma_i^2}{b_n}=\sum_{i=1}^n\dfrac{\sigma_i^2}{n^2}=\dfrac{1}{n}\sum_{i=1}^n\dfrac{\sigma_i^2}{n}=\dfrac{\overline{\sigma_n^2}}{n}\), therefore, we may conclude that

\[ \lim_{n\to\infty}\sum_{i=1}^n\dfrac{\sigma_i^2}{i^2}=c<\infty\Rightarrow Var(\overline{X_n})=\dfrac{\overline{\sigma_n^2}}{n}\to 0.\tag*{Q.E.D.} \]
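Neither implication in Result 28 can be reversed in general. A small numerical sketch, assuming independent \(X_i\) with the hypothetical variance sequence \(\sigma_i^2=\sqrt{i}\) (my own choice for illustration): the average variance \(\overline{\sigma_n^2}\) grows without bound, yet the Kolmogorov sum stays bounded and \(Var(\overline{X_n})\) still vanishes.

```python
import math

def summary(n):
    """Return (average variance, Kolmogorov sum, Var of sample mean) up to n."""
    var = [math.sqrt(i) for i in range(1, n + 1)]             # sigma_i^2 = sqrt(i)
    mean_var = sum(var) / n                                   # bar sigma_n^2
    kolmogorov = sum(v / i**2 for i, v in enumerate(var, start=1))
    var_of_mean = sum(var) / n**2                             # bar sigma_n^2 / n
    return mean_var, kolmogorov, var_of_mean

for n in (10, 1000, 100000):
    print(n, summary(n))
```

Here the Kolmogorov sum is a partial sum of \(\sum_i i^{-3/2}\), which converges, so Result 27 applies even though the first condition in Result 28 fails.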

1. Hayashi, Fumio. Econometrics. Princeton University Press. Section 1 (2000): 60-69.