Notes on Multivariate Data Analysis via Matrix Decomposition


These are the notes taken on my master course Multivariate Data Analysis via Matrix Decomposition. If you’re confused with the course name, you can think of this as a statistical course on unsupervised learning.\(\newcommand{1}[1]{\unicode{x1D7D9}_{\{#1\}}}\newcommand{span}{\text{span}}\newcommand{bs}{\boldsymbol}\newcommand{R}{\mathbb{R}}\newcommand{rank}{\text{rank}}\newcommand{\norm}[1]{\left\lVert#1\right\rVert}\newcommand{diag}{\text{diag}}\newcommand{tr}{\text{tr}}\newcommand{braket}[1]{\left\langle#1\right\rangle}\newcommand{C}{\mathbb{C}}\)


First let’s give some standard notations used in this course. Let’s assume no prior knowledge in linear algebra and start from matrix multiplication.

Matrix Multiplication

We denote a matrix \(\bs{A}\in\R^{m\times n}\), with its entries defined as \([a_{ij}]_{i,j=1}^{m,n}\). Similarly, we define \(\bs{B}=[b_{jk}]_{j,k=1}^{n,p}\) and thus the multiplication is defined as \(\bs{AB} = [\sum_{j=1}^n a_{ij}b_{jk}]_{i,k=1}^{n,p}\), which can also be represented in three other ways:

A special example of such representation: let’s assume

\[ \bs{A}=[\bs{a}_1,\bs{a}_2,\ldots,\bs{a}_n]\in\R^{m\times n}\text{ and } \bs{D} = \diag(d_1,d_2,\ldots,d_n) \in\R^{n\times n},\]

then we have right away \(\bs{AD}=[\bs{a}_id_i]_{i=1}^n\).

Exercise With multiplication we care ranks of matrices. There is a quick conclusion: If \(\bs{x}\neq \bs{0}, \bs{y}\neq \bs{0}\), then \(\rank(\bs{xy'})=1\). Conversely, if \(\rank(\bs{A})=1\), then \(\exists\ \bs{x}\neq \bs{0}, \bs{y}\neq \bs{0}\) s.t. \(\bs{xy'}=\bs{A}\). Prove it.


There are two types of norms in this course we consider:

Property What makes these norms?

Exercise Try to prove them for Euclidean 2-norm, Frobenius norm and spectral 2-norm.

Inner Products

There are two types of inner products we consider:

A famous inequality related to these inner products is the Cauchy-Schwarz inequality, which states

Eigenvalue Decomposition (EVD)

The first matrix decomposition we’re gonna talk about is the eigenvalue decomposition.

Eigenvalues and Eigenvectors

For square matrix \(\bs{A}\in\R^{n\times n}\), if \(\bs{0}\neq \bs{x}\in\C^n\) and \(\lambda\in\C\) is s.t. \(\bs{Ax} = \lambda\bs{x}\), then \(\lambda\) is called an engenvalue of \(\bs{A}\) and \(\bs{x}\) is called the \(\lambda\)-engenvector of \(\bs{A}\).

Ideally, we want a matrix to have \(n\) eigenvectors and \(n\) corresponding eigenvectors, linearly independent to each other. This is not always true.

Existence of EVD

Theorem \(\bs{A}\in\R^{n\times n}\) have \(n\) eigenvalues iff. there exists an invertible \(\bs{X}\in\R^{n\times n}\) s.t. \(\bs{X}^{-1}\bs{A}\bs{X}=\bs{\Lambda}\), i.e. \(\bs{A}\) is diagonizable. This gives \(\bs{A}=\bs{X}\bs{\Lambda}\bs{X}^{-1}\), which is called the eigenvalue decomposition (EVD).

Theorem (Spectral Theorem for Symmetric Matrices) For symmetric matrix \(\bs{A}\in\R^{n\times n}\) there always exists an orthogonal matrix \(\bs{Q}\), namely \(\bs{Q}'\bs{Q}=\bs{I}\), that gives

\[ \bs{A}=\bs{Q\Lambda Q}' = \sum_{i=1}^n \lambda_i \bs{q}_i \bs{q}_i' \]

where \(\bs{q}\) are column vectors of \(\bs{Q}\). This is called the symmetric EVD, aka. \(\bs{A}\) being orthogonally diagonalizable.

Properties of EVD

Property We have several properties following the second theorem above. For all \(i=1,2,\ldots, n\)

The second theorem above can also be represented as

Theorem If \(\bs{A}=\bs{A}'\), then \(\bs{A}\) has \(n\) orthogonal eigenvectors.

Singular Value Decomposition (SVD)

For general matrices, we have singular value decomposition.


The most famous form of SVD is define as

\[ \bs{A} = \bs{U} \bs{\Sigma} \bs{V}' \]

where \(\bs{A}\in\R^{m\times n}\), \(\bs{U}\in\R^{m\times m}\), \(\bs{\Sigma}\in\R^{m\times n}\) and \(\bs{V}\in\R^{n\times n}\). Specifically, both \(\bs{U}\) and \(\bs{V}\) are orthogonal (i.e. \(\bs{U}'\bs{U}=\bs{I}\), same for \(\bs{V}\)) and \(\bs{\Sigma}\) is diagonal. Usually, we choose the singular values to be non-decreasing, namely

\[ \bs{\Sigma}=\diag(\sigma_1,\sigma_2,\ldots,\sigma_{\min\{m,n\}})\quad\text{where}\quad \sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_{\min\{m,n\}}. \]


Here we define a list of terms that’ll be used from time to time:

Three Forms of SVD

Besides the regular SVD given above, we have the outer product SVD:

\[ \bs{A} = \sum_{i=1}^{\min\{m,n\}}\!\!\!\sigma_i \bs{u}_i \bs{v}_i' \]

and condensed SVD:

\[ \bs{A} = \bs{U}_r\bs{\Sigma}_r\bs{V}_r' \]

where \(r=\rank(\bs{A})\) is also the number of non-zero singular values. In this form, we have \(\bs{\Sigma}_r\in\R^{r\times r}\) with proper chunked \(\bs{U}_r\) and \(\bs{V}_r\).

Existence of SVD

Theorem (Existence of SVD) Let \(\bs{A}\in\R^{m\times n}\) and \(r=\rank(\bs{A})\). Then \(\exists\ \bs{U}_r\in\R^{m\times r}\), \(\bs{V}_r\in\R^{n\times r}\) and \(\bs{\Sigma}_r\in\R^{r\times r}\) s.t. \(\bs{A} = \bs{U}_r\bs{\Sigma}_r\bs{V}_r'\) where \(\bs{U}_r\) and \(\bs{V}_r\) are orthogonal and \(\bs{\Sigma}_r\) is diagonal. This means condensed SVD exists and therefore the rest two forms.

Proof Define symmetric \(\bs{W}\in\R^{(m+n)\times(m+n)}\) as

\[ \bs{W} = \begin{bmatrix} \bs{0} & \bs{A} \\ \bs{A}' & \bs{0} \end{bmatrix} \]

which has an orthogonal EVD as \(\bs{W} = \bs{Z}\bs{\Lambda}\bs{Z}'\) where \(\bs{Z}'\bs{Z}=\bs{I}\). Now, assume \(\bs{z}\in\R^{m+n}\) is an eigenvector of \(\bs{W}\) corresponding to \(\lambda\), then \(\bs{W}\bs{z} = \lambda \bs{z}\). Denote the first \(m\) entries of \(\bs{z}\) as \(\bs{x}\) and the rest \(\bs{y}\), which gives

\[ \begin{bmatrix} \bs{0} & \bs{A}\\ \bs{A}' & \bs{0} \end{bmatrix} \begin{bmatrix} \bs{x} \\ \bs{y} \end{bmatrix} = \lambda \begin{bmatrix} \bs{x} \\ \bs{y} \end{bmatrix} \Rightarrow \begin{cases} \bs{Ay} = \lambda \bs{x},\\ \bs{A}'\bs{x} = \lambda \bs{y}. \end{cases} \]

Using this results

\[ \begin{bmatrix} \bs{0} & \bs{A}\\ \bs{A}' & \bs{0} \end{bmatrix} \begin{bmatrix} \bs{x} \\ -\bs{y} \end{bmatrix} = \begin{bmatrix} -\bs{Ay} \\ \bs{A}'\bs{y} \end{bmatrix} = \begin{bmatrix} -\lambda \bs{x}\\ \lambda \bs{y} \end{bmatrix} = -\lambda\begin{bmatrix} \bs{x}\\ -\bs{y} \end{bmatrix} \]

which means \(-\lambda\) is also an engenvalue of \(\bs{W}\). Hence, we know

\[ \begin{align} \bs{W} &= \bs{Z}\bs{\Lambda}\bs{Z}' = \bs{Z}_r\bs{\Lambda}_r\bs{Z}_r'\\ &= \begin{bmatrix} \bs{X} & \bs{X}\\ \bs{Y} & -\bs{Y} \end{bmatrix} \begin{bmatrix} \bs{\Sigma} & \bs{0}\\ \bs{0} & -\bs{\Sigma} \end{bmatrix} \begin{bmatrix} \bs{X} & \bs{X}\\ \bs{Y} & -\bs{Y} \end{bmatrix}'\\ &= \begin{bmatrix} \bs{0} & \bs{X}\bs{\Sigma}\bs{Y}'\\ \bs{Y}\bs{\Sigma}\bs{X}' & \bs{0} \end{bmatrix}. \end{align} \]

Therefore, we conclude \(\bs{A}=\bs{X}\bs{\Sigma}\bs{Y}'\) where now all we need to prove is the orthogonality of \(\bs{X}\) and \(\bs{Y}\). Let’s take a look at \(\bs{z}=(\bs{x},\bs{y})\) we just defined. Let

\[ \norm{\bs{z}}=\bs{z}'\bs{z}=\bs{x}'\bs{x} + \bs{y}'\bs{y} = 2. \]

From orthogonality of eigenvectors corresponding to different eigenvalues, we also know

\[ \bs{z}'\bar{\bs{z}} = \bs{x}'\bs{x} - \bs{y}'\bs{y} = 0 \]

which altogether gives \(\norm{\bs{x}}=\norm{\bs{y}}=1\).Q.E.D.

Properties of SVD

Property There are several characteristics we have about SVD:

Exercise Show how to use SVD to calculate these two norms.

Applications of SVD

  1. One of the most importance usages of SVD is computing projections. \(\bs{P}\in\R^{n\times n}\) is a projection matrix iff. \(\bs{P}^2=\bs{P}\). More commonly, we consider orthogonal projection \(\bs{P}\) that’s also symmetric. Now let’s consider dataset \(\bs{A}\in\R^{m\times n}\) where we have \(n\) observations, each with \(m\) dimensions. Suppose we want to project this dataset onto \(\bs{W}\subseteq\R^m\) that has \(k\) dimensions, i.e.

    \[ \bs{W} = \span\{\bs{q}_1,\bs{q}_2,\ldots,\bs{q}_k\},\quad \bs{q}_i'\bs{q}_j=\1{i=j} \]

    then the projection matrix would be \(\bs{P}_{\bs{W}}=\bs{Q}_k\bs{Q}_k'\).

  2. The nearest orthogonal matrix of \(\bs{A}\in\R^{p\times p}\) is given by

    \[ \min_{\bs{X}'\bs{X}=\bs{I}}\norm{\bs{A}-\bs{X}}_F \]

    which solves if we have optima for

    \[ \begin{align} \min_{\bs{X}'\bs{X}=\bs{I}}\norm{\bs{A}-\bs{X}}_F^2 &= \min_{\bs{X}'\bs{X}=\bs{I}}\tr[(\bs{A}-\bs{X})'(\bs{A}-\bs{X})]\\&= \min_{\bs{X}'\bs{X}=\bs{I}}\tr[\bs{A}'\bs{A} - \bs{X}'\bs{A} - \bs{A}'\bs{X} + \bs{X}'\bs{X}]\\&= \min_{\bs{X}'\bs{X}=\bs{I}} \norm{\bs{A}}_F^2 - \tr(\bs{A}'\bs{X}) - \tr(\bs{X}'\bs{A}) + \tr(\bs{X}'\bs{X})\\&= \norm{\bs{A}}_F^2 + n - 2\max_{\bs{X}'\bs{X}=\bs{I}} \tr(\bs{A}'\bs{X}) \end{align}. \]

    Now we try to solve

    \[ \max_{\bs{X}'\bs{X}=\bs{I}} \tr(\bs{A}'\bs{X}) \]

    and claim the solution is given by \(\bs{X} = \bs{U}\bs{V}'\) where \(\bs{U}\) and \(\bs{V}\) are derived from SVD of \(\bs{A}\), namely \(\bs{A} = \bs{U\Sigma V}'\). Proof: We know

    \[ \tr(\bs{A}'\bs{X}) = \tr(\bs{V}\bs{\Sigma}'\bs{U}'X) = \tr(\bs{\Sigma}'\bs{U}'\bs{X}\bs{V}) =: \tr(\bs{\Sigma}'\bs{Z}) \]

    where we define \(\bs{Z}\) as the product of the three orthogonal matrices, which therefore is orthogonal: \(\bs{Z}'\bs{Z}=\bs{I}\).

    Orthogonality of \(\bs{Z}\) gives \(\forall i\)

    \[ z_{i1}^2 + z_{i2}^2 + \cdot + z_{ip}^2 = 1 \Rightarrow z_{ii} \ge 1. \]

    Hence, (note all singular values are non-negative)

    \[ \tr(\bs{\Sigma}'\bs{Z}) = \sum_{i=1}^p \sigma_i z_{ii} \le \sum_{i=1}^p \sigma_i \]

    which gives optimal \(\bs{Z}^*=\bs{I}\) and thus the solution follows.

  3. The orthogonal Procrustes problem seeks the solution to

    \[ \min_{\bs{X}'\bs{X}=\bs{I}}\norm{\bs{A}-\bs{BX}}_F \]

    which is, similar to the problem above, given by the SVD of \(\bs{BA}'=\bs{U\Sigma V}'\), namely \(\bs{X}=\bs{UV}'\).

  4. Nearest symmetric matrix for \(\bs{A}\in\R^{p\times p}\) seeks solution to \(\min\norm{\bs{A}-\bs{X}}_F\), which is simply

    \[ \bs{X} = \frac{\bs{A}+\bs{A}'}{2}. \]

    In order to prove it, write \(\bs{A}\) in the form of

    \[ \bs{A} = \frac{\bs{A} + \bs{A}'}{2} + \frac{\bs{A} - \bs{A}'}{2} =: \bs{X} + \bs{Y}. \]

    Notice \(\tr(\bs{X}'\bs{Y})=0\), hence by Pythagoras we know \(\bs{Y}\) is the minimum we can find for the problem above.

  5. Best rank-\(r\) approximation. In order to find the best rank-\(r\) approximation in Frobenius norm, we need solution to

    \[ \min_{\rank(\bs{X})\le r} \norm{\bs{A}-\bs{X}}_F \]

    which is merely \(\bs{X}=\bs{U}_r\bs{\Sigma}_r\bs{V}_r'\). See condensed SVD above for notation.

    The best approximation in 2-norm, namely solution to

    \[ \min_{\rank(\bs{X})\le r} \norm{\bs{A}-\bs{X}}_2, \]

    is exactly identical to the one above. We may prove both by reduction to absurdity. Proof: Suppose \(\exists \bs{B}\in\R^{n\times p}\) s.t.

    \[ \norm{\bs{A}-\bs{B}}_2 < \norm{\bs{A}-\bs{X}}_2 = \sigma_{r+1}. \]

    Now choose \(\bs{w}\) from kernel of \(\bs{B}\) and we have

    \[ \bs{Aw}=\bs{Aw}+\bs{0} = (\bs{A}-\bs{B})\bs{w} \]

    and thus

    \[ \norm{\bs{Aw}}_2 = \norm{(\bs{A}-\bs{B})\bs{w}}_2 \le \norm{\bs{A}-\bs{B}}_2\cdot \norm{\bs{w}}_2 <\sigma_{r+1}\norm{\bs{w}}_2\tag{1}. \]

    Meanwhile, note \(\bs{w}\in\span\{v_1,v_2,\ldots,v_{r+1}\}=\bs{W}\), assume particularly \(\bs{w}=\bs{v}_{r+1}\bs{\alpha}\), then

    \[ \begin{align} \norm{\bs{Aw}}_2^2&=\norm{\bs{U}\bs{\Sigma}\bs{V}'\bs{w}}_2^2 = \sum_{i=1}^{r+1}\sigma_i^2\alpha_i^2 \ge \sigma_{r+1}^2\sum_{i=1}^{r+1}\alpha_i^2\\ &= \sigma_{r+1}^2\norm{\bs{\alpha}}_2^2\equiv \sigma_{r+1}^2\norm{\bs{w}}_2^2.\tag{2} \end{align} \]

    Due to contradiction between eq. (1) and (2) we conclude such \(\bs{B}\) doesn’t exist.