# Notes on Multivariate Data Analysis via Matrix Decomposition

### 2019-09-30


# Notations

First let’s give some standard notation used in this course. We assume no prior knowledge of linear algebra and start from matrix multiplication.

## Matrix Multiplication

We denote a matrix $$\bs{A}\in\R^{m\times n}$$, with its entries defined as $$[a_{ij}]_{i,j=1}^{m,n}$$. Similarly, we define $$\bs{B}=[b_{jk}]_{j,k=1}^{n,p}$$ and thus the multiplication is defined as $$\bs{AB} = [\sum_{j=1}^n a_{ij}b_{jk}]_{i,k=1}^{m,p}$$, which can also be represented in three other ways:

• entry form: each entry of $$\bs{AB}$$ is an inner product of a row $$\bs{a}_i'$$ of $$\bs{A}$$ and a column $$\bs{b}_k$$ of $$\bs{B}$$
• column form: the $$k$$-th column of $$\bs{AB}$$ is $$\bs{A}\bs{b}_k$$, a product of the whole matrix $$\bs{A}$$ and a column $$\bs{b}_k$$
• row form: the $$i$$-th row of $$\bs{AB}$$ is $$\bs{a}_i'\bs{B}$$, a product of a row $$\bs{a}_i'$$ and the whole matrix $$\bs{B}$$

A special example of such representation: let’s assume

$\bs{A}=[\bs{a}_1,\bs{a}_2,\ldots,\bs{a}_n]\in\R^{m\times n}\text{ and } \bs{D} = \diag(d_1,d_2,\ldots,d_n) \in\R^{n\times n},$

then we have right away $$\bs{AD}=[\bs{a}_id_i]_{i=1}^n$$.

Exercise When multiplying matrices, we care about their ranks. Here is a quick conclusion: if $$\bs{x}\neq \bs{0}, \bs{y}\neq \bs{0}$$, then $$\rank(\bs{xy'})=1$$. Conversely, if $$\rank(\bs{A})=1$$, then $$\exists\ \bs{x}\neq \bs{0}, \bs{y}\neq \bs{0}$$ s.t. $$\bs{xy'}=\bs{A}$$. Prove it.

## Norms

We consider two types of norms in this course:

• (Euclidean) We define the $$l^1$$-norm as $$\norm{x}_1 = \sum_{i=1}^n |x_i|$$, the $$l^2$$-norm as $$\norm{x}_2 = \sqrt{\bs{x'x}}$$, the $$l^{\infty}$$-norm as $$\norm{x}_{\infty} = \max_{1\le i \le n}\{|x_i|\}$$, and the Mahalanobis norm as $$\norm{x}_A = \sqrt{\bs{x'Ax}}$$ for a symmetric positive definite $$\bs{A}$$.
• (Frobenius) We define the Frobenius norm of a matrix as $$\norm{\bs{A}}_F=\sqrt{\sum_{i=1}^m\sum_{j=1}^n a_{ij}^2}$$. The spectral 2-norm of a matrix is defined as $$\norm{\bs{A}}_2=\max_{\bs{x}\neq \bs{0}} \norm{\bs{Ax}}_2 / \norm{\bs{x}}_2$$.

Property What makes these norms? Every norm $$\norm{\cdot}$$ on a vector space $$\mathcal{V}$$ satisfies:

• $$\norm{\bs{v}}=0$$ iff. $$\bs{v}=\bs{0}$$.
• $$\norm{\alpha \bs{v}} = |\alpha|\cdot\norm{\bs{v}}$$ for any $$\alpha\in\R$$ and any $$\bs{v}\in\mathcal{V}$$.
• (Triangle Inequality) $$\norm{\bs{u} + \bs{v}} \le \norm{\bs{u}} + \norm{\bs{v}}$$ for any $$\bs{u}, \bs{v}\in\mathcal{V}$$.
• (Submultiplicative) $$\norm{\bs{AB}}\le \norm{\bs{A}}\cdot \norm{\bs{B}}$$ for any matrices $$\bs{A}$$ and $$\bs{B}$$ whose product is defined.

Exercise Try to prove these properties for the Euclidean 2-norm, the Frobenius norm, and the spectral 2-norm.

## Inner Products

There are two types of inner products we consider:

• (Euclidean) We define the inner product of vectors $$\bs{x},\bs{y}\in\R^n$$ as $$\bs{x'y}=\sum_{i=1}^n x_iy_i$$.
• (Frobenius) We define the inner product of matrices $$\bs{A},\bs{B}\in\R^{m\times n}$$ as $$\braket{\bs{A},\bs{B}}=\tr(\bs{A'B})=\sum_{i=1}^m\sum_{j=1}^n a_{ij}b_{ij}$$.

A famous inequality related to these inner products is the Cauchy-Schwarz inequality, which states

• (Euclidean) $$|\bs{x'y}|\le \norm{\bs{x}}_2\cdot\norm{\bs{y}}_2$$ for any $$\bs{x,y}\in\R^n$$.
• (Frobenius) $$|\braket{\bs{A},\bs{B}}|\le\norm{\bs{A}}_F\cdot\norm{\bs{B}}_F$$ for any $$\bs{A},\bs{B}\in\R^{m\times n}$$.

# Eigenvalue Decomposition (EVD)

The first matrix decomposition we’ll talk about is the eigenvalue decomposition.

## Eigenvalues and Eigenvectors

For a square matrix $$\bs{A}\in\R^{n\times n}$$, if $$\bs{0}\neq \bs{x}\in\C^n$$ and $$\lambda\in\C$$ are s.t. $$\bs{Ax} = \lambda\bs{x}$$, then $$\lambda$$ is called an eigenvalue of $$\bs{A}$$ and $$\bs{x}$$ is called a $$\lambda$$-eigenvector of $$\bs{A}$$.

Ideally, we want a matrix to have $$n$$ eigenvalues and $$n$$ corresponding eigenvectors that are linearly independent of each other. This is not always the case.

## Existence of EVD

Theorem $$\bs{A}\in\R^{n\times n}$$ has $$n$$ linearly independent eigenvectors iff. there exists an invertible $$\bs{X}\in\R^{n\times n}$$ s.t. $$\bs{X}^{-1}\bs{A}\bs{X}=\bs{\Lambda}$$ is diagonal, i.e. $$\bs{A}$$ is diagonalizable. This gives $$\bs{A}=\bs{X}\bs{\Lambda}\bs{X}^{-1}$$, which is called the eigenvalue decomposition (EVD).

Theorem (Spectral Theorem for Symmetric Matrices) For a symmetric matrix $$\bs{A}\in\R^{n\times n}$$ there always exists an orthogonal matrix $$\bs{Q}$$, namely $$\bs{Q}'\bs{Q}=\bs{I}$$, that gives

$\bs{A}=\bs{Q\Lambda Q}' = \sum_{i=1}^n \lambda_i \bs{q}_i \bs{q}_i'$

where $$\bs{q}_i$$ are the column vectors of $$\bs{Q}$$ and $$\bs{\Lambda}=\diag(\lambda_1,\ldots,\lambda_n)$$. This is called the symmetric EVD, aka. $$\bs{A}$$ being orthogonally diagonalizable.

## Properties of EVD

Property We have several properties following the second theorem above. For all $$i=1,2,\ldots, n$$

• $$\bs{A}\bs{q}_i = \lambda_i \bs{q}_i$$ (can be proved using $$\bs{Q}^{-1}=\bs{Q}'$$)
• $$\norm{\bs{q}_i}_2=1$$ (can be proved using $$\bs{QQ}'=\bs{I}$$)

The second theorem above can also be represented as

Theorem If $$\bs{A}=\bs{A}'$$, then $$\bs{A}$$ has $$n$$ orthonormal eigenvectors.

# Singular Value Decomposition (SVD)

For general matrices, we have singular value decomposition.

## Definition

The most famous form of SVD is defined as

$\bs{A} = \bs{U} \bs{\Sigma} \bs{V}'$

where $$\bs{A}\in\R^{m\times n}$$, $$\bs{U}\in\R^{m\times m}$$, $$\bs{\Sigma}\in\R^{m\times n}$$ and $$\bs{V}\in\R^{n\times n}$$. Specifically, both $$\bs{U}$$ and $$\bs{V}$$ are orthogonal (i.e. $$\bs{U}'\bs{U}=\bs{I}$$, same for $$\bs{V}$$) and $$\bs{\Sigma}$$ is diagonal. Usually, we order the singular values to be non-increasing, namely

$\bs{\Sigma}=\diag(\sigma_1,\sigma_2,\ldots,\sigma_{\min\{m,n\}})\quad\text{where}\quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min\{m,n\}}.$

## Terminology

Here we define a list of terms that’ll be used from time to time:

• (SVD) $$\bs{A} = \bs{U} \bs{\Sigma} \bs{V}'$$.
• (Left Singular Vectors) Columns of $$\bs{U}$$.
• (Right Singular Vectors) Columns of $$\bs{V}$$.
• (Singular Values) Diagonal entries of $$\bs{\Sigma}$$.

## Three Forms of SVD

Besides the regular SVD given above, we have the outer product SVD:

$\bs{A} = \sum_{i=1}^{\min\{m,n\}}\sigma_i \bs{u}_i \bs{v}_i'$

and condensed SVD:

$\bs{A} = \bs{U}_r\bs{\Sigma}_r\bs{V}_r'$

where $$r=\rank(\bs{A})$$ is also the number of non-zero singular values. In this form, $$\bs{\Sigma}_r\in\R^{r\times r}$$ holds the non-zero singular values, with $$\bs{U}_r$$ and $$\bs{V}_r$$ truncated to their first $$r$$ columns accordingly.

## Existence of SVD

Theorem (Existence of SVD) Let $$\bs{A}\in\R^{m\times n}$$ and $$r=\rank(\bs{A})$$. Then $$\exists\ \bs{U}_r\in\R^{m\times r}$$, $$\bs{V}_r\in\R^{n\times r}$$ and $$\bs{\Sigma}_r\in\R^{r\times r}$$ s.t. $$\bs{A} = \bs{U}_r\bs{\Sigma}_r\bs{V}_r'$$ where $$\bs{U}_r$$ and $$\bs{V}_r$$ have orthonormal columns and $$\bs{\Sigma}_r$$ is diagonal. This means the condensed SVD exists, and therefore so do the other two forms.

Proof Define symmetric $$\bs{W}\in\R^{(m+n)\times(m+n)}$$ as

$\bs{W} = \begin{bmatrix} \bs{0} & \bs{A} \\ \bs{A}' & \bs{0} \end{bmatrix}$

which has an orthogonal EVD as $$\bs{W} = \bs{Z}\bs{\Lambda}\bs{Z}'$$ where $$\bs{Z}'\bs{Z}=\bs{I}$$. Now, assume $$\bs{z}\in\R^{m+n}$$ is an eigenvector of $$\bs{W}$$ corresponding to $$\lambda$$, then $$\bs{W}\bs{z} = \lambda \bs{z}$$. Denote the first $$m$$ entries of $$\bs{z}$$ as $$\bs{x}$$ and the rest $$\bs{y}$$, which gives

$\begin{bmatrix} \bs{0} & \bs{A}\\ \bs{A}' & \bs{0} \end{bmatrix} \begin{bmatrix} \bs{x} \\ \bs{y} \end{bmatrix} = \lambda \begin{bmatrix} \bs{x} \\ \bs{y} \end{bmatrix} \Rightarrow \begin{cases} \bs{Ay} = \lambda \bs{x},\\ \bs{A}'\bs{x} = \lambda \bs{y}. \end{cases}$

Using this result,

$\begin{bmatrix} \bs{0} & \bs{A}\\ \bs{A}' & \bs{0} \end{bmatrix} \begin{bmatrix} \bs{x} \\ -\bs{y} \end{bmatrix} = \begin{bmatrix} -\bs{Ay} \\ \bs{A}'\bs{x} \end{bmatrix} = \begin{bmatrix} -\lambda \bs{x}\\ \lambda \bs{y} \end{bmatrix} = -\lambda\begin{bmatrix} \bs{x}\\ -\bs{y} \end{bmatrix}$

which means $$-\lambda$$ is also an eigenvalue of $$\bs{W}$$, with eigenvector $$\bar{\bs{z}}=(\bs{x},-\bs{y})$$. Hence, pairing the eigenvectors this way and normalizing the columns (whence the $$1/\sqrt{2}$$ factors), we know

\begin{align} \bs{W} &= \bs{Z}\bs{\Lambda}\bs{Z}' = \bs{Z}_r\bs{\Lambda}_r\bs{Z}_r'\\ &= \frac{1}{\sqrt{2}}\begin{bmatrix} \bs{X} & \bs{X}\\ \bs{Y} & -\bs{Y} \end{bmatrix} \begin{bmatrix} \bs{\Sigma} & \bs{0}\\ \bs{0} & -\bs{\Sigma} \end{bmatrix} \cdot\frac{1}{\sqrt{2}}\begin{bmatrix} \bs{X} & \bs{X}\\ \bs{Y} & -\bs{Y} \end{bmatrix}'\\ &= \begin{bmatrix} \bs{0} & \bs{X}\bs{\Sigma}\bs{Y}'\\ \bs{Y}\bs{\Sigma}\bs{X}' & \bs{0} \end{bmatrix}. \end{align}

Therefore, we conclude $$\bs{A}=\bs{X}\bs{\Sigma}\bs{Y}'$$, where all that remains is to show the columns of $$\bs{X}$$ and $$\bs{Y}$$ are orthonormal. Take the $$\bs{z}=(\bs{x},\bs{y})$$ we just defined and scale it so that

$\norm{\bs{z}}_2^2=\bs{z}'\bs{z}=\bs{x}'\bs{x} + \bs{y}'\bs{y} = 2.$

From the orthogonality of eigenvectors corresponding to the distinct eigenvalues $$\lambda$$ and $$-\lambda$$, we also know

$\bs{z}'\bar{\bs{z}} = \bs{x}'\bs{x} - \bs{y}'\bs{y} = 0$

which altogether gives $$\norm{\bs{x}}_2=\norm{\bs{y}}_2=1$$. Q.E.D.

## Properties of SVD

Property There are several useful characteristics of the SVD:

• The $$\bs{W}$$ decomposition above.
• The right singular vector $$\bs{v}$$ and the left singular vector $$\bs{u}$$ satisfy $$\bs{Av} = \sigma \bs{u}$$ and $$\bs{A}'\bs{u} = \sigma \bs{v}$$.
• Relationship with eigenvectors/eigenvalues…
• of $$\bs{A}'\bs{A}$$: $$\bs{A}'\bs{A}\bs{v} = \sigma\bs{A}'\bs{u} = \sigma^2\bs{v}$$.
• of $$\bs{AA}'$$: $$\bs{AA}'\bs{u} = \sigma\bs{A}\bs{v} = \sigma^2\bs{u}$$.
• Norms via singular values (eigenvalues cannot define a norm!):
• $$\norm{\bs{A}}_F^2 = \sum_{i,j=1}^{m,n}a_{ij}^2=\sum_{i=1}^r\sigma_i^2$$.
• $$\norm{\bs{A}}_2 = \max_{\bs{x}\neq 0} \norm{\bs{Ax}}_2 / \norm{\bs{x}}_2 = \sigma_1$$.

Exercise Show how to use SVD to calculate these two norms.

## Applications of SVD

1. One of the most important uses of SVD is computing projections. $$\bs{P}\in\R^{n\times n}$$ is a projection matrix iff. $$\bs{P}^2=\bs{P}$$. More commonly, we consider an orthogonal projection $$\bs{P}$$ that is also symmetric. Now let’s consider a dataset $$\bs{A}\in\R^{m\times n}$$ of $$n$$ observations, each with $$m$$ dimensions. Suppose we want to project this dataset onto a $$k$$-dimensional subspace $$\bs{W}\subseteq\R^m$$, i.e.

$\bs{W} = \span\{\bs{q}_1,\bs{q}_2,\ldots,\bs{q}_k\},\quad \bs{q}_i'\bs{q}_j=\1{i=j}$

then the projection matrix would be $$\bs{P}_{\bs{W}}=\bs{Q}_k\bs{Q}_k'$$ where $$\bs{Q}_k=[\bs{q}_1,\ldots,\bs{q}_k]$$, as sketched below.

2. The nearest orthogonal matrix to $$\bs{A}\in\R^{p\times p}$$ is given by the solution to

$\min_{\bs{X}'\bs{X}=\bs{I}}\norm{\bs{A}-\bs{X}}_F$

which is solved once we find the optimum of

\begin{align} \min_{\bs{X}'\bs{X}=\bs{I}}\norm{\bs{A}-\bs{X}}_F^2 &= \min_{\bs{X}'\bs{X}=\bs{I}}\tr[(\bs{A}-\bs{X})'(\bs{A}-\bs{X})]\\&= \min_{\bs{X}'\bs{X}=\bs{I}}\tr[\bs{A}'\bs{A} - \bs{X}'\bs{A} - \bs{A}'\bs{X} + \bs{X}'\bs{X}]\\&= \min_{\bs{X}'\bs{X}=\bs{I}} \norm{\bs{A}}_F^2 - \tr(\bs{A}'\bs{X}) - \tr(\bs{X}'\bs{A}) + \tr(\bs{X}'\bs{X})\\&= \norm{\bs{A}}_F^2 + p - 2\max_{\bs{X}'\bs{X}=\bs{I}} \tr(\bs{A}'\bs{X}) \end{align}

where we used $$\tr(\bs{X}'\bs{X})=\tr(\bs{I}_p)=p$$ for orthogonal $$\bs{X}\in\R^{p\times p}$$.

Now we try to solve

$\max_{\bs{X}'\bs{X}=\bs{I}} \tr(\bs{A}'\bs{X})$

and claim the solution is given by $$\bs{X} = \bs{U}\bs{V}'$$ where $$\bs{U}$$ and $$\bs{V}$$ are derived from SVD of $$\bs{A}$$, namely $$\bs{A} = \bs{U\Sigma V}'$$. Proof: We know

$\tr(\bs{A}'\bs{X}) = \tr(\bs{V}\bs{\Sigma}'\bs{U}'\bs{X}) = \tr(\bs{\Sigma}'\bs{U}'\bs{X}\bs{V}) =: \tr(\bs{\Sigma}'\bs{Z})$

where we define $$\bs{Z}$$ as the product of the three orthogonal matrices, which therefore is orthogonal: $$\bs{Z}'\bs{Z}=\bs{I}$$.

Orthogonality of $$\bs{Z}$$ gives $$\forall i$$

$z_{i1}^2 + z_{i2}^2 + \cdots + z_{ip}^2 = 1 \Rightarrow z_{ii} \le 1.$

Hence, (note all singular values are non-negative)

$\tr(\bs{\Sigma}'\bs{Z}) = \sum_{i=1}^p \sigma_i z_{ii} \le \sum_{i=1}^p \sigma_i$

which gives the optimal $$\bs{Z}^*=\bs{I}$$, and thus the solution $$\bs{X}=\bs{U}\bs{V}'$$ follows.

3. The orthogonal Procrustes problem seeks the solution to

$\min_{\bs{X}'\bs{X}=\bs{I}}\norm{\bs{A}-\bs{BX}}_F$

which is, similar to the problem above, given by the SVD $$\bs{B}'\bs{A}=\bs{U\Sigma V}'$$, namely $$\bs{X}=\bs{UV}'$$, since the term to maximize is now $$\tr(\bs{A}'\bs{B}\bs{X})=\tr[(\bs{B}'\bs{A})'\bs{X}]$$.

4. The nearest symmetric matrix to $$\bs{A}\in\R^{p\times p}$$ seeks the solution to $$\min_{\bs{X}=\bs{X}'}\norm{\bs{A}-\bs{X}}_F$$, which is simply

$\bs{X} = \frac{\bs{A}+\bs{A}'}{2}.$

In order to prove it, write $$\bs{A}$$ in the form of

$\bs{A} = \frac{\bs{A} + \bs{A}'}{2} + \frac{\bs{A} - \bs{A}'}{2} =: \bs{X} + \bs{Y}.$

Notice $$\bs{Y}$$ is antisymmetric, so $$\tr[(\bs{X}-\bs{S})'\bs{Y}]=0$$ for any symmetric $$\bs{S}$$; hence by Pythagoras, $$\norm{\bs{A}-\bs{S}}_F^2=\norm{\bs{X}-\bs{S}}_F^2+\norm{\bs{Y}}_F^2$$, which is minimized at $$\bs{S}=\bs{X}$$ with minimum value $$\norm{\bs{Y}}_F^2$$.

5. Best rank-$$r$$ approximation. To find the best rank-$$r$$ approximation in Frobenius norm, we need the solution to

$\min_{\rank(\bs{X})\le r} \norm{\bs{A}-\bs{X}}_F$

which is merely the truncated SVD $$\bs{X}=\bs{U}_r\bs{\Sigma}_r\bs{V}_r'$$, keeping only the $$r$$ largest singular values (see the condensed SVD above for notation, and the sketch below).

The best approximation in 2-norm, namely solution to

$\min_{\rank(\bs{X})\le r} \norm{\bs{A}-\bs{X}}_2,$

is exactly the same. We may prove both by contradiction. Proof: Suppose $$\exists\ \bs{B}\in\R^{m\times n}$$ with $$\rank(\bs{B})\le r$$ s.t.

$\norm{\bs{A}-\bs{B}}_2 < \norm{\bs{A}-\bs{X}}_2 = \sigma_{r+1}.$

Now choose a non-zero $$\bs{w}$$ from $$\ker(\bs{B})\cap\span\{\bs{v}_1,\ldots,\bs{v}_{r+1}\}$$, which is non-trivial since $$\dim\ker(\bs{B})\ge n-r$$ while the span has dimension $$r+1$$. Then we have

$\bs{Aw}=\bs{Aw}+\bs{0} = (\bs{A}-\bs{B})\bs{w}$

and thus

$\norm{\bs{Aw}}_2 = \norm{(\bs{A}-\bs{B})\bs{w}}_2 \le \norm{\bs{A}-\bs{B}}_2\cdot \norm{\bs{w}}_2 <\sigma_{r+1}\norm{\bs{w}}_2\tag{1}.$

Meanwhile, since $$\bs{w}\in\span\{\bs{v}_1,\bs{v}_2,\ldots,\bs{v}_{r+1}\}$$, we may write $$\bs{w}=\bs{V}_{r+1}\bs{\alpha}$$ with $$\bs{V}_{r+1}=[\bs{v}_1,\ldots,\bs{v}_{r+1}]$$; then

\begin{align} \norm{\bs{Aw}}_2^2&=\norm{\bs{U}\bs{\Sigma}\bs{V}'\bs{w}}_2^2 = \sum_{i=1}^{r+1}\sigma_i^2\alpha_i^2 \ge \sigma_{r+1}^2\sum_{i=1}^{r+1}\alpha_i^2\\ &= \sigma_{r+1}^2\norm{\bs{\alpha}}_2^2\equiv \sigma_{r+1}^2\norm{\bs{w}}_2^2.\tag{2} \end{align}

Due to the contradiction between eqs. (1) and (2), we conclude such $$\bs{B}$$ doesn’t exist. Q.E.D.