
Eigenvectors from Eigenvalues?


2019-11-15

The paper uploaded yesterday by Peter Denton, Stephen Parke, Terence Tao and Xining Zhang has immediately set off a heated worldwide discussion. It states a new theorem about something that has been deeply rooted in modern mathematics for hundreds of years, yet went unseen and unexpected, which is basically why people are so excited about it. In this post, however, I’d like to share some doubts of mine about this new way of finding eigenvectors. I’ll mainly cover two aspects: assumptions and computation. As a disclaimer: I’ve enjoyed reading the (multiple) rigorous proofs on Terry’s blog and have no intention of arguing about the correctness of the theorem.

The theorem in the paper reads:$\newcommand{1}[1]{\unicode{x1D7D9}_{\{#1\}}} \newcommand{Corr}{\text{Corr}} \newcommand{E}{\text{E}} \newcommand{Cov}{\text{Cov}} \newcommand{Var}{\text{Var}} \newcommand{span}{\text{span}} \newcommand{bs}{\mathbf} \newcommand{R}{\mathbb{R}} \newcommand{rank}{\text{rank}} \newcommand{\norm}[1]{\left\lVert#1\right\rVert} \newcommand{diag}{\text{diag}} \newcommand{tr}{\text{tr}} \newcommand{braket}[1]{\left\langle#1\right\rangle} \newcommand{C}{\mathbb{C}}$

Theorem. Let $\bs{A}$ be an $n\times n$ Hermitian matrix with eigenvalues $\{\lambda_i(\bs{A}):i=1,2,\ldots,n\}$. Let $\bs{v}_i$ be a unit eigenvector corresponding to $\lambda_i(\bs{A})$ and let $v_{i,j}$ be the $j$-th entry of this vector. Then

$$ |v_{i,j}|^2 \prod_{k=1;k\neq i}^n [\lambda_i(\bs{A}) - \lambda_k(\bs{A})] = \prod_{k=1}^{n-1} [\lambda_i(\bs{A}) - \lambda_k(\bs{M}_j)] $$

where $\bs{M}_j$ is the $(n-1)\times(n-1)$ Hermitian matrix obtained by removing the $j$-th row and column from $\bs{A}$. Namely, provided the denominator below is non-zero, the following can be used to calculate the squared modulus of any eigenvector entry from pre-computed eigenvalues:

$$ |v_{i,j}|^2 = \frac{\prod_{k=1}^{n-1} [\lambda_i(\bs{A}) - \lambda_k(\bs{M}_j)]}{\prod_{k=1;k\neq i}^n [\lambda_i(\bs{A}) - \lambda_k(\bs{A})]}. $$
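To make the identity concrete, here is a minimal numerical check in numpy (a sketch of mine, not from the paper: it builds a random Hermitian matrix, picks an arbitrary entry, and compares both sides of the identity; the names lam_A and lam_M are my own):

import numpy as np

n = 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
A = (X + X.conj().T) / 2                        # Hermitian by construction

lam_A, V = np.linalg.eigh(A)                    # eigenvalues and unit eigenvectors of A
i, j = 2, 3                                     # arbitrary eigenvector/entry indices
M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
lam_M = np.linalg.eigvalsh(M_j)                 # the n-1 eigenvalues of M_j

lhs = abs(V[j, i])**2 * np.prod(np.delete(lam_A[i] - lam_A, i))
rhs = np.prod(lam_A[i] - lam_M)
print(lhs, rhs)                                 # the two sides should agree numerically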

The statement of this theorem restricts our universe to Hermitian matrices only, namely matrices that are self-adjoint: $\bs{A}=\bar{\bs{A}'}$. This is such a strong assumption that most machine learning problems fall outside its scope, except e.g. EVD-related ones involving covariance matrices.

Algorithm 1. The theorem gives the following algorithm right out of the box, here written for a single squared entry $|v_{i,j}|^2$:

function v_ij_sq(i, lambda_A: Array[n], lambda_M_j: Array[n-1]) {
    # lambda_A: eigenvalues of A; lambda_M_j: eigenvalues of M_j
    numerator = 1
    denominator = 1
    for (k = 1...n-1) {
        numerator *= (lambda_A[i] - lambda_M_j[k])
    }
    for (k = 1...n) {
        if (k == i) continue
        denominator *= (lambda_A[i] - lambda_A[k])
    }
    return (numerator / denominator)  # this is |v_ij|^2
}
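For readers who prefer running code, a straightforward numpy translation of this pseudocode might look like the sketch below. It returns the full matrix of squared moduli $|v_{i,j}|^2$ and assumes all eigenvalues of $\bs{A}$ are distinct, so that the denominator never vanishes; the function name is mine.

import numpy as np

def eigvec_sq_from_eigvals(A):
    # |v_{i,j}|^2 for all i, j, computed from eigenvalues only (Algorithm 1)
    n = A.shape[0]
    lam_A = np.linalg.eigvalsh(A)                    # n eigenvalues of A
    V_sq = np.empty((n, n))
    for j in range(n):
        M_j = np.delete(np.delete(A, j, axis=0), j, axis=1)
        lam_M = np.linalg.eigvalsh(M_j)              # n-1 eigenvalues of M_j
        for i in range(n):
            num = np.prod(lam_A[i] - lam_M)
            den = np.prod(np.delete(lam_A[i] - lam_A, i))
            V_sq[j, i] = num / den                   # row j, column i: |v_{i,j}|^2
    return V_sq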

The time complexity of this algorithm is $O(n)$ per entry, which means the time complexity for the whole eigenvector $\bs{v}_i$ would be $O(n^2)$, and $O(n^3)$ if we’re looking for all eigenvectors.

Algorithm 2. As a comparison, let’s see how the classical algorithm calculates an eigenvector:

function v_i(i, A: Array[n][n], lambda_A: Array[n]) {
    T = A - I * lambda_A[i]  # O(n^2)
    v_i = solve(T * v == 0)  # O(f(u)), null-space solve
    return v_i
}

where $f(u)$ is a strictly monotonically decreasing function of the required precision $u$. This precision is pre-determined for the numerical method used to solve the linear system $\bs{Tv}=\bs{0}$ (a direct method such as Gaussian elimination or LU decomposition requires $O(\frac{2}{3}n^3+2n^2)\equiv O(n^3)$ time). For $n\gg 1$, it can be proved that $O(f(u))\ll O(n^2)$, and thus the time complexity of the classical algorithm is also $O(n^2)$ for each eigenvector, and $O(n^3)$ for all of them.
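For comparison, one possible numpy sketch of this classical route for a single eigenvector is shown below: form $\bs{T}=\bs{A}-\lambda_i\bs{I}$ and extract a numerical null vector, here via an SVD purely for illustration. Production code would rather use inverse iteration or simply call np.linalg.eigh; the function name is mine.

import numpy as np

def eigvec_classical(A, lam_i):
    # returns a unit vector v with (A - lam_i * I) v ≈ 0
    n = A.shape[0]
    T = A - lam_i * np.eye(n)
    _, _, Vh = np.linalg.svd(T)        # smallest singular vector spans the (numerical) null space
    v = Vh[-1].conj()
    return v / np.linalg.norm(v)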

Therefore, with all eigenvalues given, the time complexity of the new algorithm is identical to that of the classical one. However, if we take into consideration the time spent on finding the $n$ eigenvalues of $\bs{A}$ and the $n(n-1)$ eigenvalues of $\{\bs{M}_j:j=1,2,\ldots,n\}$, the new algorithm is in fact doing not less but even more work; since each of these $n+1$ eigenvalue problems costs $O(n^3)$ time, the overhead is far from negligible.
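A rough sanity check of this cost argument (a sketch of mine; exact numbers depend on the BLAS/LAPACK build and the matrix size) is to time the eigenvalue computations the new algorithm needs as input against one full eigendecomposition:

import time
import numpy as np

n = 400
rng = np.random.default_rng(1)
X = rng.standard_normal((n, n))
A = (X + X.T) / 2                                # real symmetric, hence Hermitian

t0 = time.perf_counter()
lam_A = np.linalg.eigvalsh(A)                    # eigenvalues of A
lam_M = [np.linalg.eigvalsh(np.delete(np.delete(A, j, 0), j, 1)) for j in range(n)]
t_inputs = time.perf_counter() - t0              # n + 1 eigenvalue problems

t0 = time.perf_counter()
lam, V = np.linalg.eigh(A)                       # one full eigendecomposition
t_classical = time.perf_counter() - t0

print(t_inputs, t_classical)                     # preparing the inputs alone already costs more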

As a conclusion, despite the novelty of using only eigenvalues as inputs, the new paper does not actually “overturn” the way most of us calculate eigenvectors, as some media outlets have mistakenly reported. It is true that real beauty comes from simplicity, but nothing lasts when the world loses its rationality.