Singular Value Decomposition (SVD)

  • We saw in earlier lectures that a matrix can have complex eigenvalues and eigenvectors. Symmetric matrices have the nice property that all eigenvalues and eigenvectors are real. The singular value decomposition (SVD) generalizes the spectral decomposition to general rectangular matrices.

  • Statisticians call SVD the singly most valuable decomposition.

  • Let $\mathbf{A} \in \mathbb{R}^{m \times n}$ with $\text{rank}(\mathbf{A})=r$. We assume $m \ge n$. Instead of the eigen-equation, now we have \begin{eqnarray*} \mathbf{A} \mathbf{v}_i &=& \sigma_i \mathbf{u}_i, \quad i = 1,\ldots,r \\ \mathbf{A} \mathbf{v}_i &=& 0 \, \mathbf{u}_i, \quad i = r+1,\ldots,n, \end{eqnarray*} where the left singular vectors $\mathbf{u}_i \in \mathbb{R}^m$ are orthonormal, the right singular vectors $\mathbf{v}_i \in \mathbb{R}^n$ are orthonormal, and the singular values $$ \sigma_1 \ge \cdots \ge \sigma_r > \sigma_{r+1} = \cdots = \sigma_{n} = 0. $$

  • Collecting the above equations into matrix multiplication format: $$ \mathbf{A} \begin{pmatrix} \mid & & \mid \\ \mathbf{v}_1 & \cdots & \mathbf{v}_n \\ \mid & & \mid \end{pmatrix} = \begin{pmatrix} \mid & & \mid \\ \mathbf{u}_1 & \cdots & \mathbf{u}_n \\ \mid & & \mid \end{pmatrix} \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \mathbf{O}_{n-r} \end{pmatrix}, $$ or $$ \mathbf{A} \mathbf{V} = \mathbf{U} \boldsymbol{\Sigma}. $$ Multiplying both sides on the right by $\mathbf{V}'$ and using $\mathbf{V} \mathbf{V}' = \mathbf{I}_n$, we have the famous singular value decomposition (SVD) $$ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}' = \sigma_1 \mathbf{u}_1 \mathbf{v}_1' + \cdots + \sigma_r \mathbf{u}_r \mathbf{v}_r', $$ where $\mathbf{U} \in \mathbb{R}^{m \times n}$ has orthonormal columns, $\boldsymbol{\Sigma} = \text{diag}(\sigma_1, \ldots, \sigma_r, \mathbf{0}_{n-r})$, and $\mathbf{V} \in \mathbb{R}^{n \times n}$ is orthogonal.

In [1]:
using LinearAlgebra, Random
In [2]:
A = [3.0 0.0; 4.0 5.0]
Out[2]:
2×2 Array{Float64,2}:
 3.0  0.0
 4.0  5.0
In [3]:
# eigenvalues and eigenvectors
eigen(A)
Out[3]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
eigenvalues:
2-element Array{Float64,1}:
 3.0
 5.0
eigenvectors:
2×2 Array{Float64,2}:
  0.447214  0.0
 -0.894427  1.0
In [4]:
# singular values and singular vectors
# different from eigenvalues and eigenvectors
dump(svd(A))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((2, 2)) [-0.316227766016838 -0.9486832980505135; -0.9486832980505135 0.3162277660168379]
  S: Array{Float64}((2,)) [6.70820393249937, 2.2360679774997894]
  Vt: Array{Float64}((2, 2)) [-0.7071067811865475 -0.7071067811865475; -0.7071067811865475 0.7071067811865475]
  • Reduced form of the SVD. If we just keep the first $r$ singular triplets ($\sigma_i, \mathbf{u}_i, \mathbf{v}_i$), then $$ \mathbf{A} = \mathbf{U}_r \boldsymbol{\Sigma}_r \mathbf{V}_r', $$ where $\mathbf{U}_r \in \mathbb{R}^{m \times r}$, $\boldsymbol{\Sigma}_r = \text{diag}(\sigma_1, \ldots, \sigma_r)$, and $\mathbf{V}_r \in \mathbb{R}^{n \times r}$.

  • Full SVD. In contrast to the reduced form, we can also augment the $\mathbf{U}$ matrix with $m-n$ additional orthonormal columns so that it becomes a square orthogonal matrix. This gives the full SVD $$ \mathbf{A} = \begin{pmatrix} \mid & & \mid \\ \mathbf{u}_1 & \cdots & \mathbf{u}_m \\ \mid & & \mid \end{pmatrix} \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \mathbf{O}_{n-r} \\ \\ \\ & & \mathbf{O}_{m-n, n} \end{pmatrix} \begin{pmatrix} - & \mathbf{v}_1' & - \\ & \vdots & \\ - & \mathbf{v}_n' & - \end{pmatrix}. $$

In [5]:
m, n, r = 5, 3, 2
# a rank r matrix by rank factorization
A = randn(m, r) * randn(r, n)
Out[5]:
5×3 Array{Float64,2}:
 -1.16112     0.324579   -0.10827 
  2.26232    -0.62308     0.150383
 -0.0428611   0.0698842  -0.38009 
 -3.3883      0.90145    -0.019036
 -4.08441     0.961217    0.791759
In [6]:
# thin SVD (the default): U is m×n, Σ is n×n, V is n×n
dump(svd(A))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((5, 3)) [-0.1960693270966463 -0.2575624775376599 -0.3156365104437382; 0.3823966691752967 0.4234764299099389 -0.8061377060537435; … ; -0.5740058567059791 -0.36750786101335015 -0.4872799567574954; -0.6970099974875842 0.6109167343256656 0.048536501409241595]
  S: Array{Float64}((3,)) [6.087804441897142, 0.7815770948252015, 1.3099000213337518e-16]
  Vt: Array{Float64}((3, 3)) [0.9666467268360039 -0.24469601331408258 -0.07561723722428665; 0.03627692476763585 -0.16145633852616323 0.9862128753363182; -0.253531241125513 -0.9560626087332981 -0.14719442229730606]
In [7]:
dump(svd(A, full=true))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((5, 5)) [-0.1960693270966463 -0.2575624775376599 … -0.6762361543969052 -0.5816327414253052; 0.3823966691752967 0.4234764299099389 … -0.02850319328569248 0.15417544410604989; … ; -0.5740058567059791 -0.36750786101335015 … 0.545731141603897 -0.013820759268128924; -0.6970099974875842 0.6109167343256656 … -0.27193998717631196 0.2542649376707681]
  S: Array{Float64}((3,)) [6.087804441897142, 0.7815770948252015, 1.3099000213337518e-16]
  Vt: Array{Float64}((3, 3)) [0.9666467268360039 -0.24469601331408258 -0.07561723722428665; 0.03627692476763585 -0.16145633852616323 0.9862128753363182; -0.253531241125513 -0.9560626087332981 -0.14719442229730606]
In [8]:
Asvd = svd(A, full=true)
Asvd.V'Asvd.V
Out[8]:
3×3 Array{Float64,2}:
 1.0          1.38778e-17  3.29597e-17
 1.38778e-17  1.0          5.55112e-17
 3.29597e-17  5.55112e-17  1.0        
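
The reduced form can be checked numerically. The following is a minimal sketch, assuming an arbitrary seed and the same sizes as above; the names `Ur`, `Σr`, `Vr`, `reduced_ok`, and `tail_is_zero` are illustrative only.

```julia
using LinearAlgebra, Random

Random.seed!(123)               # arbitrary seed, only for reproducibility
m, n, r = 5, 3, 2
A = randn(m, r) * randn(r, n)   # rank-r matrix by rank factorization

F = svd(A)                      # thin SVD: U is m×n, S has length n, Vt is n×n
Ur = F.U[:, 1:r]                # first r left singular vectors
Σr = Diagonal(F.S[1:r])         # the r positive singular values
Vr = F.V[:, 1:r]                # first r right singular vectors

# the reduced form reproduces A up to rounding error
reduced_ok = Ur * Σr * Vr' ≈ A
# the discarded singular values are zero up to rounding error
tail_is_zero = all(F.S[r+1:end] .< 1e-10 * F.S[1])
```

Both `reduced_ok` and `tail_is_zero` should evaluate to `true`: dropping the singular triplets with $\sigma_i = 0$ loses nothing.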

SVD tells us everything about a matrix

  • SVD and four fundamental subspaces. Given the full SVD \begin{eqnarray*} \mathbf{A} &=& \begin{pmatrix} \mid & & \mid & \mid & & \mid \\ \mathbf{u}_1 & \cdots & \mathbf{u}_r & \mathbf{u}_{r+1} & \cdots & \mathbf{u}_m \\ \mid & & \mid & \mid & & \mid \end{pmatrix} \begin{pmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & \mathbf{O}_{n-r} \\ \\ & & \mathbf{O}_{m-n, n} & \end{pmatrix} \begin{pmatrix} - & \mathbf{v}_1' & - \\ & \vdots & \\ - & \mathbf{v}_r' & - \\ - & \mathbf{v}_{r+1}' & - \\ & \vdots & \\ - & \mathbf{v}_n' & - \end{pmatrix} \\ &=& \begin{pmatrix} \mathbf{U}_r & \mathbf{U}_{m-r} \end{pmatrix} \begin{pmatrix} \boldsymbol{\Sigma}_r & & \\ & & \mathbf{O}_{n-r} \\ \\ & \mathbf{O}_{m-n,n} & \end{pmatrix} \begin{pmatrix} \mathbf{V}_r' \\ \mathbf{V}_{n-r}' \end{pmatrix}, \end{eqnarray*} Then

    1. $\mathcal{C}(\mathbf{A}) = \mathcal{C}(\mathbf{U}_r)$; $\quad \mathbf{U}_r \mathbf{U}_r'$ is the orthogonal projector onto $\mathcal{C}(\mathbf{A})$.
    2. $\mathcal{N}(\mathbf{A}') = \mathcal{C}(\mathbf{U}_{m-r})$; $\quad \mathbf{U}_{m-r} \mathbf{U}_{m-r}'$ is the orthogonal projector onto $\mathcal{N}(\mathbf{A}')$.
    3. $\mathcal{C}(\mathbf{A}') = \mathcal{C}(\mathbf{V}_{r})$; $\quad \mathbf{V}_{r} \mathbf{V}_{r}'$ is the orthogonal projector onto $\mathcal{C}(\mathbf{A}')$.
    4. $\mathcal{N}(\mathbf{A}) = \mathcal{C}(\mathbf{V}_{n-r})$; $\quad \mathbf{V}_{n-r} \mathbf{V}_{n-r}'$ is the orthogonal projector onto $\mathcal{N}(\mathbf{A})$.
  • $\text{rank}(\mathbf{A}) = r = \text{# positive singular values}$.

  • Frobenius norm $\|\mathbf{A}\|_{\text{F}}^2 = \sum_{i,j} a_{ij}^2 = \text{tr}(\mathbf{A}' \mathbf{A}) = \sum_i \sigma_i^2$.

  • Spectral norm or $\ell_2$ matrix norm : $\|\mathbf{A}\|_2 = \max \frac{\|\mathbf{A} \mathbf{x}\|}{\|\mathbf{x}\|} = \sigma_1$.

In [9]:
Random.seed!(216)
# generate a rank deficient matrix
m, n, r = 5, 3, 2
A = randn(m, r) * randn(r, n)
# full svd
Asvd = svd(A, full=true)
Out[9]:
SVD{Float64,Float64,Array{Float64,2}}([-0.1919068385243238 0.4892656676362979 … -0.48787314523533865 0.5001847905748491; -0.2735530364296413 0.488020176853115 … 0.7554211505497237 -0.14506082046150626; … ; -0.6936060591096974 -0.5724294014443431 … 0.1694628906740698 0.3982219823102384; 0.5648467629106955 -0.39924456286605636 … 0.23644394403673177 0.2068506642093175], [3.92086624799641, 1.5634315539256385, 7.808175744336089e-17], [0.7810581198939158 0.37019090542038724 0.5028985055573496; -0.6228044632019009 0.5203737519744842 0.584230912287077; -0.04541821180510714 -0.7695257317335348 0.6369986924919032])
In [10]:
dump(Asvd)
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((5, 5)) [-0.1919068385243238 0.4892656676362979 … -0.48787314523533865 0.5001847905748491; -0.2735530364296413 0.488020176853115 … 0.7554211505497237 -0.14506082046150626; … ; -0.6936060591096974 -0.5724294014443431 … 0.1694628906740698 0.3982219823102384; 0.5648467629106955 -0.39924456286605636 … 0.23644394403673177 0.2068506642093175]
  S: Array{Float64}((3,)) [3.92086624799641, 1.5634315539256385, 7.808175744336089e-17]
  Vt: Array{Float64}((3, 3)) [0.7810581198939158 0.37019090542038724 0.5028985055573496; -0.6228044632019009 0.5203737519744842 0.584230912287077; -0.04541821180510714 -0.7695257317335348 0.6369986924919032]
In [11]:
# Frobenius norm
norm(A) ≈ norm(Asvd.S)
Out[11]:
true
In [12]:
# projector to C(A)
A * pinv(A'A) * A' # WARNING: this is very inefficient code; take Biostat 257
Out[12]:
5×5 Array{Float64,2}:
  0.276209    0.291268   -0.0350405  -0.146962   -0.303735
  0.291268    0.312995   -0.0105586  -0.0896191  -0.349355
 -0.0350405  -0.0105586   0.123583    0.313667   -0.09265 
 -0.146962   -0.0896191   0.313667    0.808765   -0.163242
 -0.303735   -0.349355   -0.09265    -0.163242    0.478448
In [13]:
# projector to C(A)
Asvd.U[:, 1:2] * Asvd.U[:, 1:2]'
Out[13]:
5×5 Array{Float64,2}:
  0.276209    0.291268   -0.0350405  -0.146962   -0.303735
  0.291268    0.312995   -0.0105586  -0.0896191  -0.349355
 -0.0350405  -0.0105586   0.123583    0.313667   -0.09265 
 -0.146962   -0.0896191   0.313667    0.808765   -0.163242
 -0.303735   -0.349355   -0.09265    -0.163242    0.478448
In [14]:
# they should be equal by the uniqueness of orthogonal projector
A * pinv(A'A) * A' ≈ Asvd.U[:, 1:2] * Asvd.U[:, 1:2]'
Out[14]:
true
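
The cells above only check the projector onto $\mathcal{C}(\mathbf{A})$. The remaining three subspaces can be checked the same way. This is a sketch under the assumption of an arbitrary seed; the variable names (`Ur`, `Um`, `Vr`, `Vn`, `checks`) are made up for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(123)                 # arbitrary seed
m, n, r = 5, 3, 2
A = randn(m, r) * randn(r, n)     # rank-deficient matrix

F = svd(A, full=true)             # full SVD: U is m×m, V is n×n
Ur, Um = F.U[:, 1:r], F.U[:, r+1:end]   # bases of C(A) and N(A')
Vr, Vn = F.V[:, 1:r], F.V[:, r+1:end]   # bases of C(A') and N(A)

checks = (
    Ur * Ur' * A ≈ A,                           # projecting onto C(A) leaves A unchanged
    norm(Um' * A) < 1e-10,                      # columns of U_{m-r} are orthogonal to C(A)
    Vr * Vr' * A' ≈ A',                         # projecting onto C(A') leaves A' unchanged
    norm(A * Vn) < 1e-10,                       # columns of V_{n-r} lie in N(A)
    Ur * Ur' + Um * Um' ≈ Matrix(1.0I, m, m),   # C(A) and N(A') together fill R^m
    Vr * Vr' + Vn * Vn' ≈ Matrix(1.0I, n, n),   # C(A') and N(A) together fill R^n
)
```

Every entry of `checks` should be `true`; the last two lines restate that each pair of subspaces is an orthogonal complement.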

Proof of the SVD using eigen-decomposition

  • Relating SVD to eigen-decomposition. From SVD $\mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}'$, we have \begin{eqnarray*} \mathbf{A}' \mathbf{A} &=& (\mathbf{V} \boldsymbol{\Sigma} \mathbf{U}') (\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}') = \mathbf{V} \boldsymbol{\Sigma}^2 \mathbf{V}' \\ \mathbf{A} \mathbf{A}' &=& (\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}') (\mathbf{V} \boldsymbol{\Sigma} \mathbf{U}') = \mathbf{U} \boldsymbol{\Sigma}^2 \mathbf{U}'. \end{eqnarray*} This says

    • $\mathbf{u}_i$ are simply the eigenvectors of the symmetric matrix $\mathbf{A} \mathbf{A}'$
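    • $\mathbf{v}_i$ are simply the eigenvectors of the symmetric matrix $\mathbf{A}' \mathbf{A}$
    • $\sigma_i$ are simply $\sqrt{\lambda_i}$, where $\lambda_i$ are the nonzero eigenvalues shared by $\mathbf{A}' \mathbf{A}$ and $\mathbf{A} \mathbf{A}'$.

      Proof of SVD: Start from positive eigenvalues $\lambda_i > 0$ and corresponding (orthonormal) eigenvectors $\mathbf{v}_i$ of $\mathbf{A}' \mathbf{A}$. Set $\sigma_i = \sqrt \lambda_i$ and $$ \mathbf{u}_i = \frac{\mathbf{A} \mathbf{v}_i}{\sigma_i}, \quad i = 1,\ldots,r. $$ Verify that $\mathbf{u}_i$ are orthonormal: $\mathbf{u}_i' \mathbf{u}_j = \mathbf{v}_i' \mathbf{A}' \mathbf{A} \mathbf{v}_j / (\sigma_i \sigma_j) = \lambda_j \mathbf{v}_i' \mathbf{v}_j / (\sigma_i \sigma_j)$, which is 1 if $i = j$ and 0 otherwise. Lastly, augment the $\mathbf{u}_i$ by an orthonormal basis of $\mathcal{N}(\mathbf{A}')$ and augment the $\mathbf{v}_i$ by an orthonormal basis of $\mathcal{N}(\mathbf{A})$.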
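
The construction in the proof can be followed step by step in code. This is only a sketch for illustration, assuming an arbitrary seed and a known rank `r`; forming $\mathbf{A}' \mathbf{A}$ explicitly is not how `svd` is computed in practice.

```julia
using LinearAlgebra, Random

Random.seed!(123)                 # arbitrary seed
m, n, r = 5, 3, 2
A = randn(m, r) * randn(r, n)     # rank-r matrix

# eigen-decomposition of the symmetric matrix A'A (eigenvalues in ascending order)
E = eigen(Symmetric(A'A))
idx = sortperm(E.values, rev=true)[1:r]   # positions of the r positive eigenvalues
λ = E.values[idx]
V = E.vectors[:, idx]                     # orthonormal eigenvectors v_1, ..., v_r

σ = sqrt.(λ)                              # σ_i = sqrt(λ_i)
U = (A * V) ./ σ'                         # u_i = A v_i / σ_i, column by column

U_orthonormal = U' * U ≈ Matrix(1.0I, r, r)
reconstructs  = U * Diagonal(σ) * V' ≈ A
matches_svd   = σ ≈ svdvals(A)[1:r]
```

All three checks should be `true`: the $\mathbf{u}_i$ built this way are orthonormal, the $r$ triplets reproduce $\mathbf{A}$, and the $\sigma_i$ agree with `svdvals`.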

In [15]:
A
Out[15]:
5×3 Array{Float64,2}:
 -1.0641     0.119504     0.0684963
 -1.31293   -1.57973e-5  -0.0936312
 -0.726329  -0.584099    -0.757408 
 -1.56673   -1.47246     -1.89051  
  2.11855    0.495045     0.749092 
In [16]:
dump(Asvd)
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((5, 5)) [-0.1919068385243238 0.4892656676362979 … -0.48787314523533865 0.5001847905748491; -0.2735530364296413 0.488020176853115 … 0.7554211505497237 -0.14506082046150626; … ; -0.6936060591096974 -0.5724294014443431 … 0.1694628906740698 0.3982219823102384; 0.5648467629106955 -0.39924456286605636 … 0.23644394403673177 0.2068506642093175]
  S: Array{Float64}((3,)) [3.92086624799641, 1.5634315539256385, 7.808175744336089e-17]
  Vt: Array{Float64}((3, 3)) [0.7810581198939158 0.37019090542038724 0.5028985055573496; -0.6228044632019009 0.5203737519744842 0.584230912287077; -0.04541821180510714 -0.7695257317335348 0.6369986924919032]
In [17]:
eigen(A'A)
Out[17]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
eigenvalues:
3-element Array{Float64,1}:
 -8.881784244014104e-16
  2.4443182238103383   
 15.37319213467742     
eigenvectors:
3×3 Array{Float64,2}:
 -0.0454182   0.622804  -0.781058
 -0.769526   -0.520374  -0.370191
  0.636999   -0.584231  -0.502899
In [18]:
eigen(A * A')
Out[18]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
eigenvalues:
5-element Array{Float64,1}:
 -3.108624468950438e-16  
 -4.3519893263681266e-306
  3.108624468950437e-16  
  2.4443182238103365     
 15.37319213467741       
eigenvectors:
5×5 Array{Float64,2}:
 -0.238916  0.420577   0.699875    0.489266  -0.191907
  0.146071  0.483747  -0.657006    0.48802   -0.273553
  0.885542  0.128291   0.27527    -0.188106  -0.296984
 -0.370672  0.226039  -0.0523861  -0.572429  -0.693606
  0.0       0.722186   0.0        -0.399245   0.564847
  • Another relation of SVD to eigen-decomposition: $$ \begin{pmatrix} \mathbf{O}_n & \mathbf{A}' \\ \mathbf{A} & \mathbf{O}_m \end{pmatrix} = \frac{1}{\sqrt 2} \begin{pmatrix} \mathbf{V} & \mathbf{V} \\ \mathbf{U} & - \mathbf{U} \end{pmatrix} \begin{pmatrix} \boldsymbol{\Sigma} & \mathbf{O}_n \\ \mathbf{O}_n & - \boldsymbol{\Sigma} \end{pmatrix} \frac{1}{\sqrt 2} \begin{pmatrix} \mathbf{V}' & \mathbf{U}' \\ \mathbf{V}' & - \mathbf{U}' \end{pmatrix}. $$
In [19]:
A
Out[19]:
5×3 Array{Float64,2}:
 -1.0641     0.119504     0.0684963
 -1.31293   -1.57973e-5  -0.0936312
 -0.726329  -0.584099    -0.757408 
 -1.56673   -1.47246     -1.89051  
  2.11855    0.495045     0.749092 
In [20]:
dump(svd(A))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((5, 3)) [-0.1919068385243238 0.4892656676362979 -0.48537185871842553; -0.2735530364296413 0.488020176853115 -0.3087090651625637; … ; -0.6936060591096974 -0.5724294014443431 -0.0627438970552384; 0.5648467629106955 -0.39924456286605636 -0.6502760779051611]
  S: Array{Float64}((3,)) [3.92086624799641, 1.5634315539256385, 7.808175744336089e-17]
  Vt: Array{Float64}((3, 3)) [0.7810581198939158 0.37019090542038724 0.5028985055573496; -0.6228044632019009 0.5203737519744842 0.584230912287077; -0.04541821180510714 -0.7695257317335348 0.6369986924919032]
In [21]:
B = [zeros(n, n) A';
    A zeros(m, m)]
Out[21]:
8×8 Array{Float64,2}:
  0.0        0.0          0.0        …  -0.726329  -1.56673  2.11855 
  0.0        0.0          0.0           -0.584099  -1.47246  0.495045
  0.0        0.0          0.0           -0.757408  -1.89051  0.749092
 -1.0641     0.119504     0.0684963      0.0        0.0      0.0     
 -1.31293   -1.57973e-5  -0.0936312      0.0        0.0      0.0     
 -0.726329  -0.584099    -0.757408   …   0.0        0.0      0.0     
 -1.56673   -1.47246     -1.89051        0.0        0.0      0.0     
  2.11855    0.495045     0.749092       0.0        0.0      0.0     
In [22]:
Beig = eigen(B)
Out[22]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
eigenvalues:
8-element Array{Float64,1}:
 -3.9208662479964014    
 -1.5634315539256338    
 -4.9281046358664486e-17
 -2.0583475421247505e-17
  1.7928006363002645e-16
  3.1086244689504383e-15
  1.5634315539256418    
  3.920866247996409     
eigenvectors:
8×8 Array{Float64,2}:
 -0.552291  -0.440389   0.0292924  …  -6.10623e-16  -0.440389  -0.552291
 -0.261764   0.36796    0.496304       2.15106e-16   0.36796   -0.261764
 -0.355603   0.413114  -0.410831       1.94289e-16   0.413114  -0.355603
 -0.135699  -0.345963  -0.450252       0.420577      0.345963   0.135699
 -0.193431  -0.345082   0.441479       0.483747      0.345082   0.193431
 -0.209999   0.133011  -0.412612   …   0.128291     -0.133011   0.209999
 -0.490454   0.404769   0.127129       0.226039     -0.404769   0.490454
  0.399407   0.282309   0.0            0.722186     -0.282309  -0.399407
In [23]:
# this should be V, up to column signs, in the columns with nonzero singular values
Beig.vectors[1:n, 1:n] * sqrt(2)
Out[23]:
3×3 Array{Float64,2}:
 -0.781058  -0.622804   0.0414257
 -0.370191   0.520374   0.70188  
 -0.502899   0.584231  -0.581003 
In [24]:
# this should be U, up to column signs, in the columns with nonzero singular values
Beig.vectors[n+1:end, 1:n] * sqrt(2)
Out[24]:
5×3 Array{Float64,2}:
 -0.191907  -0.489266  -0.636753
 -0.273553  -0.48802    0.624346
 -0.296984   0.188106  -0.583522
 -0.693606   0.572429   0.179788
  0.564847   0.399245   0.0     
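
A more direct check of the block-matrix relation compares eigenvalues rather than eigenvectors, which sidesteps the sign ambiguity. The sketch below assumes a full-column-rank random matrix and an arbitrary seed; `expected` and `w` are illustrative names.

```julia
using LinearAlgebra, Random

Random.seed!(123)                       # arbitrary seed
m, n = 5, 3
A = randn(m, n)

B = [zeros(n, n) A'; A zeros(m, m)]     # symmetric (n+m)×(n+m) block matrix
λ = eigvals(Symmetric(B))               # ascending order
σ = svdvals(A)                          # descending order

# eigenvalues of B: -σ_1, ..., -σ_n, then m-n zeros, then σ_n, ..., σ_1
expected = vcat(-σ, zeros(m - n), reverse(σ))
eigenvalues_match = λ ≈ expected

# (v_1; u_1) is an eigenvector of B with eigenvalue σ_1
F = svd(A)
w = vcat(F.V[:, 1], F.U[:, 1])
eigenvector_ok = B * w ≈ σ[1] * w
```

The $m - n$ zero eigenvalues come from vectors $(\mathbf{0}; \mathbf{u})$ with $\mathbf{u} \in \mathcal{N}(\mathbf{A}')$, which the thin factors in the displayed formula do not cover.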

Some exercises

  • Question: If $\mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}'$ is symmetric positive definite, what is its SVD?

    Answer: The SVD is exactly the same as the eigen-decomposition $\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}' = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}'$, with the eigenvalues listed in decreasing order.

In [25]:
# a pd matrix
A = [1 1; 1 3]
Out[25]:
2×2 Array{Int64,2}:
 1  1
 1  3
In [26]:
eigen(A)
Out[26]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
eigenvalues:
2-element Array{Float64,1}:
 0.5857864376269051
 3.414213562373095 
eigenvectors:
2×2 Array{Float64,2}:
 -0.92388   0.382683
  0.382683  0.92388 
In [27]:
dump(svd(A))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((2, 2)) [-0.3826834323650894 -0.9238795325112865; -0.9238795325112867 0.3826834323650897]
  S: Array{Float64}((2,)) [3.4142135623730945, 0.5857864376269051]
  Vt: Array{Float64}((2, 2)) [-0.38268343236508984 -0.9238795325112866; -0.9238795325112866 0.38268343236508984]
  • Question: If $\mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}'$ is symmetric and indefinite (has negative eigenvalues), then what is its SVD?

    Answer: Its SVD is $$ \mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}' = \mathbf{Q} \begin{pmatrix} |\lambda_1| & & \\ & \ddots & \\ & & |\lambda_n| \end{pmatrix} \begin{pmatrix} \text{sgn}(\lambda_1) & & \\ & \ddots & \\ & & \text{sgn}(\lambda_n) \end{pmatrix} \mathbf{Q}'. $$

In [28]:
# an indefinite matrix
A = [1 2; 2 3]
Out[28]:
2×2 Array{Int64,2}:
 1  2
 2  3
In [29]:
eigen(A)
Out[29]:
Eigen{Float64,Float64,Array{Float64,2},Array{Float64,1}}
eigenvalues:
2-element Array{Float64,1}:
 -0.2360679774997897
  4.23606797749979  
eigenvectors:
2×2 Array{Float64,2}:
 -0.850651  0.525731
  0.525731  0.850651
In [30]:
dump(svd(A))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((2, 2)) [-0.5257311121191336 -0.8506508083520399; -0.8506508083520399 0.5257311121191336]
  S: Array{Float64}((2,)) [4.236067977499789, 0.23606797749978947]
  Vt: Array{Float64}((2, 2)) [-0.5257311121191337 -0.8506508083520399; 0.8506508083520399 -0.5257311121191337]
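
The answer for the indefinite case can be assembled explicitly from the eigen-decomposition. This is a sketch using the same $2 \times 2$ example; note that the singular values produced this way follow the eigenvalue order rather than the decreasing order that `svd` returns.

```julia
using LinearAlgebra

A = [1.0 2.0; 2.0 3.0]            # symmetric and indefinite
E = eigen(Symmetric(A))
Q, λ = E.vectors, E.values

U = Q                             # left singular vectors
Σ = Diagonal(abs.(λ))             # singular values |λ_i|
V = Q * Diagonal(sign.(λ))        # V' = sgn(Λ) Q'

reconstructs  = U * Σ * V' ≈ A
V_orthogonal  = V' * V ≈ Matrix(1.0I, 2, 2)
same_singvals = sort(abs.(λ), rev=true) ≈ svdvals(A)
```

Flipping the sign of the columns of $\mathbf{Q}$ that belong to negative eigenvalues keeps $\mathbf{V}$ orthogonal, which is why $|\lambda_i|$ are valid singular values.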
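  • Question: Why are the singular values of an orthogonal matrix $\mathbf{Q}$ all 1?

    Answer: $\mathbf{Q}' \mathbf{Q} = \mathbf{Q} \mathbf{Q}' = \mathbf{I}_n$, so every eigenvalue of $\mathbf{Q}' \mathbf{Q}$ is 1 and therefore every singular value is $\sqrt{1} = 1$.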

In [31]:
# Hadamard matrix of order 2
H = (1 / sqrt(2)) * [1 1; 1 -1]
Out[31]:
2×2 Array{Float64,2}:
 0.707107   0.707107
 0.707107  -0.707107
In [32]:
# check orthogonality
H'H
Out[32]:
2×2 Array{Float64,2}:
 1.0  0.0
 0.0  1.0
In [33]:
# singular values are all 1
dump(svd(H))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((2, 2)) [-0.7071067811865475 -0.7071067811865475; -0.7071067811865475 0.7071067811865476]
  S: Array{Float64}((2,)) [1.0, 0.9999999999999999]
  Vt: Array{Float64}((2, 2)) [-1.0 -0.0; -0.0 -1.0]
  • Question: Why are all eigenvalues of a square matrix less than or equal to $\sigma_1$ in absolute value?

    Answer: Orthogonal matrices preserve vector length, so $$ \|\mathbf{A} \mathbf{x}\| = \|\mathbf{U} \boldsymbol{\Sigma} \mathbf{V}' \mathbf{x}\| = \|\boldsymbol{\Sigma} \mathbf{V}' \mathbf{x}\| \le \sigma_1 \|\mathbf{V}' \mathbf{x}\| = \sigma_1 \|\mathbf{x}\|. $$ Now an eigenvector $\mathbf{x}$ satisfies $$ \|\mathbf{A} \mathbf{x}\| = |\lambda| \|\mathbf{x}\|. $$ Thus we have $|\lambda| \le \sigma_1$.

In [34]:
Random.seed!(216)
A = randn(5, 5)
Out[34]:
5×5 Array{Float64,2}:
 -1.28506    0.0582264   0.854177    0.848249   0.183508 
 -1.44549   -0.135717    0.576381   -0.358689  -1.46013  
 -0.244914  -0.8972     -0.0627216  -1.07614    0.0341577
 -0.326449  -2.23444     0.668146    0.336559  -1.38244  
  1.8623     0.915744   -0.0148676   0.321647   0.527643 
In [35]:
eigvals(A) .|> abs
Out[35]:
5-element Array{Float64,1}:
 1.6948209445252074
 2.1234712333046373
 2.1234712333046373
 0.6922214431747319
 1.368017351950627 
In [36]:
svdvals(A)
Out[36]:
5-element Array{Float64,1}:
 3.6804683700651917
 2.1069753146853443
 1.598133837981803 
 1.1531124089665103
 0.5064142417525395
  • Question: If $\mathbf{A} = \mathbf{x} \mathbf{y}'$, find the singular values and singular vectors. Check $|\lambda| \le \sigma_1$.

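    Answer: For nonzero $\mathbf{x}$ and $\mathbf{y}$, write $$ \mathbf{A} = \mathbf{x} \mathbf{y}' = \|\mathbf{x}\| \|\mathbf{y}\| \cdot \frac{\mathbf{x}}{\|\mathbf{x}\|} \frac{\mathbf{y}'}{\|\mathbf{y}\|}, $$ which is a rank-one SVD with $\sigma_1 = \|\mathbf{x}\| \|\mathbf{y}\|$, $\mathbf{u}_1 = \mathbf{x} / \|\mathbf{x}\|$, and $\mathbf{v}_1 = \mathbf{y} / \|\mathbf{y}\|$; all other singular values are zero. When $\mathbf{A}$ is square, $\mathbf{A} \mathbf{x} = (\mathbf{y}' \mathbf{x}) \mathbf{x}$, so the only possibly nonzero eigenvalue is $\lambda = \mathbf{y}' \mathbf{x}$, and $|\mathbf{y}' \mathbf{x}| \le \|\mathbf{x}\| \|\mathbf{y}\| = \sigma_1$ by the Cauchy-Schwarz inequality.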
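
The rank-one exercise can also be explored numerically. The sketch below uses two hypothetical vectors chosen so that the norms are easy to read off ($\|\mathbf{x}\| = 3$, $\|\mathbf{y}\| = 5$, $\mathbf{y}' \mathbf{x} = 11$); it compares the largest singular value with $\|\mathbf{x}\| \|\mathbf{y}\|$ and the largest eigenvalue modulus with $|\mathbf{y}' \mathbf{x}|$.

```julia
using LinearAlgebra

x = [1.0, 2.0, 2.0]               # example vector with norm 3
y = [3.0, 0.0, 4.0]               # example vector with norm 5
A = x * y'                        # rank-one 3×3 matrix

F = svd(A)
σ1 = F.S[1]                       # largest singular value
ρ = maximum(abs.(eigvals(A)))     # largest eigenvalue modulus

sigma_ok  = σ1 ≈ norm(x) * norm(y)
lambda_ok = ρ ≈ abs(dot(y, x))
bound_ok  = ρ <= σ1
# singular vectors agree with x/‖x‖ and y/‖y‖ up to sign
u_ok = abs(dot(F.U[:, 1], x / norm(x))) ≈ 1
v_ok = abs(dot(F.V[:, 1], y / norm(y))) ≈ 1
```

The absolute values are taken because `eigvals` of a nonsymmetric matrix may return complex numbers and because singular vectors are only determined up to sign.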

Geometry of SVD

Visualize how a $2 \times 2$ matrix acts on a vector via SVD: $$ \mathbf{A} \mathbf{x} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}' \mathbf{x}. $$

TODO

SVD and generalized inverse

Using the SVD, the Moore-Penrose inverse is given by $$ \mathbf{A}^+ = \mathbf{V} \boldsymbol{\Sigma}^+ \mathbf{U}' = \mathbf{V}_r \boldsymbol{\Sigma}_r^{-1} \mathbf{U}_r' = \sum_{i=1}^r \sigma_i^{-1} \mathbf{v}_i \mathbf{u}_i', $$ where $\boldsymbol{\Sigma}^+ = \text{diag}(\sigma_1^{-1}, \ldots, \sigma_r^{-1}, 0, \ldots, 0)$, $r= \text{rank}(\mathbf{A})$. This is how the pinv function is implemented in Julia, Matlab, Python, or R, with singular values below a small tolerance treated as zero.

In [37]:
Random.seed!(216)
m, n, r = 5, 3, 2
A = randn(m, r) * randn(r, n)
Out[37]:
5×3 Array{Float64,2}:
 -1.0641     0.119504     0.0684963
 -1.31293   -1.57973e-5  -0.0936312
 -0.726329  -0.584099    -0.757408 
 -1.56673   -1.47246     -1.89051  
  2.11855    0.495045     0.749092 
In [38]:
Asvd = svd(A)
Asvd.Vt[1:2, :]' * Diagonal(inv.(Asvd.S[1:2])) * Asvd.U[:, 1:2]'
Out[38]:
3×5 Array{Float64,2}:
 -0.233131  -0.2489     0.0157725   0.0898613   0.271563 
  0.144729   0.136605  -0.0906491  -0.256015   -0.0795545
  0.158217   0.147279  -0.108384   -0.302872   -0.0767433
In [39]:
pinv(A)
Out[39]:
3×5 Array{Float64,2}:
 -0.233131  -0.2489     0.0157725   0.0898613   0.271563 
  0.144729   0.136605  -0.0906491  -0.256015   -0.0795545
  0.158217   0.147279  -0.108384   -0.302872   -0.0767433

Eckart-Young theorem: best approximation to a matrix

In [40]:
using Images, LinearAlgebra, Interact

img = load("golub-561-838-rank-120.jpg")
channels = float(channelview(img))
size(channels)

Out[40]:
(3, 838, 561)
In [41]:
channels[3, :, :]
Out[41]:
838×561 Array{Float32,2}:
 0.486275  0.482353  0.482353  0.439216  …  0.478431  0.462745  0.478431
 0.482353  0.470588  0.447059  0.435294     0.466667  0.478431  0.478431
 0.482353  0.447059  0.423529  0.54902      0.435294  0.427451  0.478431
 0.443137  0.435294  0.54902   0.984314     0.537255  0.462745  0.478431
 0.490196  0.439216  0.482353  0.976471     0.486275  0.415686  0.478431
 0.454902  0.431373  0.47451   0.976471  …  0.490196  0.458824  0.478431
 0.498039  0.458824  0.47451   0.956863     0.486275  0.45098   0.478431
 0.498039  0.447059  0.47451   0.976471     0.517647  0.462745  0.478431
 0.470588  0.431373  0.490196  0.968628     0.498039  0.45098   0.478431
 0.466667  0.423529  0.490196  0.980392     0.494118  0.458824  0.478431
 0.462745  0.423529  0.490196  0.976471  …  0.501961  0.462745  0.478431
 0.462745  0.427451  0.490196  0.976471     0.501961  0.462745  0.478431
 0.470588  0.427451  0.490196  0.964706     0.501961  0.454902  0.478431
 ⋮                                       ⋱                      ⋮       
 0.478431  0.431373  0.396078  0.431373     0.396078  0.419608  0.478431
 0.486275  0.435294  0.392157  0.423529     0.403922  0.427451  0.478431
 0.490196  0.435294  0.388235  0.415686     0.392157  0.419608  0.478431
 0.494118  0.435294  0.388235  0.415686     0.407843  0.419608  0.478431
 0.486275  0.435294  0.392157  0.423529  …  0.411765  0.423529  0.478431
 0.478431  0.431373  0.396078  0.431373     0.4       0.415686  0.478431
 0.478431  0.411765  0.388235  0.407843     0.407843  0.403922  0.494118
 0.470588  0.427451  0.419608  0.45098      0.431373  0.423529  0.482353
 0.470588  0.458824  0.447059  0.458824     0.470588  0.466667  0.458824
 0.423529  0.423529  0.415686  0.419608  …  0.431373  0.439216  0.431373
 0.388235  0.380392  0.372549  0.364706     0.380392  0.392157  0.372549
 1.0       1.0       1.0       1.0          1.0       1.0       1.0     
In [42]:
function rank_approx(F::SVD, k)
    U, S, Vt = F.U, F.S, F.Vt
    M = U[:, 1:k] * Diagonal(S[1:k]) * Vt[1:k, :]
    M .= clamp.(M, 0.0, 1.0)
end
svdfactors = (svd(channels[1, :, :]), svd(channels[2, :, :]), svd(channels[3, :, :]))

n = 100
@manipulate throttle = 0.1 for k1 in 1:n, k2 in 1:n, k3 in 1:n
    colorview(RGB,
              rank_approx(svdfactors[1], k1),
              rank_approx(svdfactors[2], k2),
              rank_approx(svdfactors[3], k3)
              )
end
Out[42]:
In [43]:
function rank_approx(F::SVD, k)
    U, S, Vt = F.U, F.S, F.Vt
    M = U[:, 1:k] * Diagonal(S[1:k]) * Vt[1:k, :]
    M .= clamp.(M, 0.0, 1.0)
end
svdfactors = (svd(channels[1, :, :]), svd(channels[2, :, :]), svd(channels[3, :, :]))
Out[43]:
(SVD{Float32,Float32,Array{Float32,2}}(Float32[-0.003201127 -0.0029711816 … 0.063501656 -0.018671475; -0.0041820407 -0.003956642 … 0.03171113 0.037942678; … ; -0.0063472968 -0.0061110826 … -0.024975946 -0.0205824; -0.04618214 -0.038482077 … -0.004199179 -0.0015758779], Float32[468.93704, 54.575035, 39.72175, 28.824512, 22.577719, 19.781693, 18.400646, 16.750614, 15.209273, 13.959431  …  0.018809672, 0.018746085, 0.018220583, 0.018147789, 0.017952539, 0.017445087, 0.01719639, 0.017101845, 0.01633041, 0.015538299], Float32[-0.003792159 -0.0050134063 … -0.005576717 -0.0040300395; -0.0009210165 -0.0010787451 … -0.001424789 -0.0004640298; … ; 0.039903246 -0.0046934625 … 0.069537506 -0.0671383; -0.04118015 -0.01409792 … -0.00090598874 0.029236076]), SVD{Float32,Float32,Array{Float32,2}}(Float32[-0.012597024 -0.016373156 … -0.009303845 -0.025560979; -0.012313485 -0.016414464 … -0.06487671 -0.0071538123; … ; -0.011940538 -0.016432794 … 0.0043642614 -0.009932868; -0.055147693 -0.07242119 … 0.02291314 0.009163091], Float32[419.95956, 60.726654, 34.20169, 29.930443, 23.477402, 21.344364, 16.51298, 16.356625, 15.089839, 12.111712  …  0.017568486, 0.016782546, 0.016724119, 0.016491562, 0.016371625, 0.01577605, 0.015774291, 0.015144074, 0.0146682635, 0.013853766], Float32[-0.015238976 -0.014965594 … -0.01480304 -0.015215306; -0.0031973203 -0.002948952 … -0.0029819617 -0.0033105335; … ; -0.04289294 0.08758404 … -0.028907152 0.01769764; 0.0010402817 0.031613845 … -0.011965975 -0.035158202]), SVD{Float32,Float32,Array{Float32,2}}(Float32[-0.028401136 -0.044610694 … -0.0144824395 -0.0026077332; -0.02591753 -0.040858477 … 0.029586107 -0.030325025; … ; -0.022012524 -0.035037924 … -0.05976062 0.0013332515; -0.05971126 -0.094077684 … -0.005332835 -0.009971296], Float32[383.6886, 59.94334, 34.007774, 27.138838, 24.214834, 21.362972, 17.506678, 16.533255, 14.504245, 12.662757  …  0.020750163, 0.02036577, 0.019924434, 0.019518789, 0.019465964, 0.01912894, 0.018933607, 0.017828913, 
0.017233869, 0.016600385], Float32[-0.03475884 -0.031883895 … -0.032425266 -0.034796096; -0.01866697 -0.015722007 … -0.013515507 -0.018629666; … ; 0.011948569 -0.047937628 … 0.02514852 0.044578794; -0.012338921 0.06579735 … 0.01310142 -0.027308453]))
In [44]:
k1, k2, k3 = 5, 5, 5
colorview(RGB,
    rank_approx(svdfactors[1], k1),
    rank_approx(svdfactors[2], k2),
    rank_approx(svdfactors[3], k3)
)
Out[44]:
  • SVD has some inherent optimality properties. It prescribes how to approximate a general matrix $\mathbf{A}$ by a low rank matrix.

  • Before talking about approximation, we need a metric that measures the quality of approximation. We discuss 3 matrix norms.

    • Spectral norm or $\ell_2$ norm: $$ \|\mathbf{A}\|_2 = \max \frac{\|\mathbf{A} \mathbf{x}\|}{\|\mathbf{x}\|} = \sigma_1. $$

    • Frobenius norm: $$ \|\mathbf{A}\|_{\text{F}} = \sqrt{\sum_{i,j} a_{ij}^2} = \sqrt{\text{tr}(\mathbf{A}' \mathbf{A})} = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2}. $$

    • Nuclear norm: $$ \|\mathbf{A}\|_{\text{nuc}} = \sigma_1 + \cdots + \sigma_r. $$

  • These 3 norms already have different values for the identity matrix: \begin{eqnarray*} \|\mathbf{I}_n\|_2 &=& 1 \\ \|\mathbf{I}_n\|_{\text{F}} &=& \sqrt{n} \\ \|\mathbf{I}_n\|_{\text{nuc}} &=& n. \end{eqnarray*} Indeed for any orthogonal matrix $\mathbf{Q} \in \mathbb{R}^{n \times n}$, \begin{eqnarray*} \|\mathbf{Q}\|_2 &=& 1 \\ \|\mathbf{Q}\|_{\text{F}} &=& \sqrt{n} \\ \|\mathbf{Q}\|_{\text{nuc}} &=& n. \end{eqnarray*}

  • Invariance under orthogonal transform. For all three norms, $$ \|\mathbf{Q}_1 \mathbf{A} \mathbf{Q}_2'\| = \|\mathbf{A}\| \text{ for orthogonal } \mathbf{Q}_1 \text{ and } \mathbf{Q}_2. $$
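
The three norms and their orthogonal invariance can be computed directly from the singular values. This is a minimal sketch assuming an arbitrary seed; the orthogonal matrices `Q1` and `Q2` are taken from QR factorizations of random matrices, which is just one convenient way to generate them.

```julia
using LinearAlgebra, Random

Random.seed!(123)                 # arbitrary seed
A = randn(5, 3)
σ = svdvals(A)

spectral  = opnorm(A)             # operator 2-norm
frobenius = norm(A)               # norm of a matrix defaults to the Frobenius norm
nuclear   = sum(σ)                # sum of singular values

spectral_ok  = spectral ≈ σ[1]
frobenius_ok = frobenius ≈ sqrt(sum(abs2, σ))

# orthogonal matrices from QR factorizations of random square matrices
Q1 = Matrix(qr(randn(5, 5)).Q)
Q2 = Matrix(qr(randn(3, 3)).Q)
B = Q1 * A * Q2'

# all three norms depend only on the singular values, which are unchanged
invariant = svdvals(B) ≈ σ
```

Since $\mathbf{Q}_1 \mathbf{A} \mathbf{Q}_2'$ has the same singular values as $\mathbf{A}$, any norm defined through the $\sigma_i$ is invariant.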

  • Eckart-Young theorem. Assuming SVD of $\mathbf{A} \in \mathbb{R}^{m \times n}$ with rank $r$: $$ \mathbf{A} = \mathbf{U} \boldsymbol{\Sigma} \mathbf{V}' = \sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i'. $$ Then the matrix $$ \mathbf{A}_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i' $$ is the best rank-$k$ approximation to the original matrix $\mathbf{A}$ in the 3 matrix norms ($\ell_2$, Frobenius, and nuclear). More precisely, $$ \|\mathbf{A} - \mathbf{B}\| $$ is minimized by $\mathbf{A}_k$ among all matrices $\mathbf{B}$ with rank $\le k$.
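
A numerical illustration of the Eckart-Young theorem: the truncation errors equal the corresponding norms of the discarded singular values, and random rank-$k$ competitors do no better. This is a sketch under the assumption of an arbitrary seed and sizes; sampling 100 random matrices is only a sanity check, not a proof.

```julia
using LinearAlgebra, Random

Random.seed!(123)                 # arbitrary seed
m, n, k = 6, 4, 2
A = randn(m, n)
F = svd(A)

# best rank-k approximation from the top k singular triplets
Ak = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]

err2 = opnorm(A - Ak)             # spectral norm error
errF = norm(A - Ak)               # Frobenius norm error
errN = sum(svdvals(A - Ak))       # nuclear norm error

err2_ok = err2 ≈ F.S[k+1]
errF_ok = errF ≈ sqrt(sum(abs2, F.S[k+1:end]))
errN_ok = errN ≈ sum(F.S[k+1:end])

# smallest spectral error among 100 random rank-k matrices
best_random = minimum(opnorm(A - randn(m, k) * randn(k, n)) for _ in 1:100)
no_better = best_random >= err2
```

In this experiment the random competitors are typically far worse than $\mathbf{A}_k$, since they are not adapted to $\mathbf{A}$ at all.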

  • Proof of Eckart-Young theorem for the $\ell_2$ norm.

    First we note $$ \|\mathbf{A} - \mathbf{A}_k\|_2 = \left\|\sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i' - \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i' \right\|_2 = \left\|\sum_{i=k+1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i' \right\|_2 = \sigma_{k+1}. $$ We want to show that $$ \|\mathbf{A} - \mathbf{B}\|_2 = \max \frac{\|(\mathbf{A} - \mathbf{B}) \mathbf{x}\|}{\|\mathbf{x}\|} \ge \sigma_{k+1} $$ for any $\mathbf{B}$ with $\text{rank}(\mathbf{B}) \le k$. We show this by choosing a particular $\mathbf{x}$ such that $$ \frac{\|(\mathbf{A} - \mathbf{B}) \mathbf{x}\|}{\|\mathbf{x}\|} \ge \sigma_{k+1}. $$ Choose $\mathbf{x} \ne \mathbf{0}$ such that $$ \mathbf{B} \mathbf{x} = \mathbf{0} \text{ and } \mathbf{x} = \sum_{i=1}^{k+1} c_i \mathbf{v}_i. $$ Such an $\mathbf{x}$ exists because (1) $\mathcal{N}(\mathbf{B})$ has dimension $\ge n-k$ since $\text{rank}(\mathbf{B}) \le k$ (rank-nullity theorem) and (2) $\mathbf{v}_1, \ldots, \mathbf{v}_{k+1}$ span a subspace of dimension $k+1$. Since $(n-k) + (k+1) > n$, the subspaces $\mathcal{N}(\mathbf{B})$ and $\text{span}\{\mathbf{v}_1, \ldots, \mathbf{v}_{k+1}\}$ have a non-trivial intersection. For this $\mathbf{x}$, we have \begin{eqnarray*} & & \|(\mathbf{A} - \mathbf{B}) \mathbf{x}\|^2 \\ &=& \|\mathbf{A} \mathbf{x}\|^2 \\ &=& \left\|\left(\sum_{i=1}^r \sigma_i \mathbf{u}_i \mathbf{v}_i'\right)\left(\sum_{i=1}^{k+1} c_i \mathbf{v}_i\right)\right\|^2 \\ &=& \left\| \sum_{i=1}^{k+1} c_i \sigma_i \mathbf{u}_i \right\|^2 \\ &=& \sum_{i=1}^{k+1} c_i^2 \sigma_i^2 \\ &\ge& \left(\sum_{i=1}^{k+1} c_i^2\right) \sigma_{k+1}^2 \\ &=& \|\mathbf{x}\|^2 \sigma_{k+1}^2. \end{eqnarray*} The proof is finished.

  • (Nathan Srebro's) Proof of Eckart-Young theorem for the Frobenius norm.

    Let $\mathbf{B}$ be a matrix of rank $k$. By the rank factorization, $\mathbf{B} = \mathbf{C} \mathbf{R}$, where $\mathbf{C} \in \mathbb{R}^{m \times k}$ and $\mathbf{R} \in \mathbb{R}^{k \times n}$. Using SVD of $\mathbf{B}$, we will assume that $\mathbf{C}$ has orthogonal columns (so $\mathbf{C}' \mathbf{C} = \mathbf{D}$) and $\mathbf{R}$ has orthonormal rows (so $\mathbf{R} \mathbf{R}' = \mathbf{I}_k$). We want to show $\mathbf{C} = \mathbf{U}_k \boldsymbol{\Sigma}_k$ and $\mathbf{R} = \mathbf{V}_k'$, where $(\boldsymbol{\Sigma}_k, \mathbf{U}_k, \mathbf{V}_k)$ are the top $k$ singular values/vectors of $\mathbf{A}$. In order for $\mathbf{B} = \mathbf{C} \mathbf{R}$ to minimize $$ f(\mathbf{C}, \mathbf{R}) = \|\mathbf{A} - \mathbf{C} \mathbf{R}\|_{\text{F}}^2, $$ the gradient (partial derivatives) must vanish \begin{eqnarray*} \frac{\partial f(\mathbf{C}, \mathbf{R})}{\partial \mathbf{C}} &=& - 2(\mathbf{A} - \mathbf{C} \mathbf{R}) \mathbf{R}' = \mathbf{O}_{m \times k} \\ \frac{\partial f(\mathbf{C}, \mathbf{R})}{\partial \mathbf{R}} &=& - 2 \mathbf{C}' (\mathbf{A} - \mathbf{C} \mathbf{R}) = \mathbf{O}_{k \times n}. \end{eqnarray*} The first equation shows $$ \mathbf{A} \mathbf{R}' = \mathbf{C} \mathbf{R} \mathbf{R}' = \mathbf{C}. $$ Then by the second equation $$ \mathbf{A}' \mathbf{A} \mathbf{R}' = \mathbf{A}' \mathbf{C} = \mathbf{R}' \mathbf{C}' \mathbf{C} = \mathbf{R}' \mathbf{D}. $$ This says the columns of $\mathbf{R}'$ (rows of $\mathbf{R}$) must be eigenvectors of $\mathbf{A}' \mathbf{A}$, or equivalently, right singular vectors $\mathbf{v}_i$ of $\mathbf{A}$. Similarly the columns of $\mathbf{C}$ must be eigenvectors of $\mathbf{A} \mathbf{A}'$, or equivalently, left singular vectors $\mathbf{u}_i$ of $\mathbf{A}$: $$ \mathbf{A} \mathbf{A}' \mathbf{C} = \mathbf{A} \mathbf{R}' \mathbf{D} = \mathbf{C} \mathbf{D}. $$ Which $\mathbf{u}_i$ and $\mathbf{v}_i$ shall we take to minimize $f$? 
Evidently we should choose those associated with the $k$ largest singular values, which leaves the minimum value $\sigma_{k+1}^2 + \cdots + \sigma_r^2$, the sum of the discarded squared singular values.
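
The conclusion of the proof can be checked numerically. The following is a minimal sketch on a made-up random matrix (the sizes, seed, and variable names are assumptions for illustration): it truncates the SVD to rank $k$, compares the squared Frobenius error with the sum of the discarded squared singular values, and tries one arbitrary rank-$k$ competitor $\mathbf{B} = \mathbf{C} \mathbf{R}$.

```julia
using LinearAlgebra, Random

Random.seed!(123)
A = randn(6, 4)          # made-up test matrix
F = svd(A)
k = 2

# rank-k truncation: keep the top k singular values/vectors
Ak = F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]

# squared Frobenius error of the truncation (norm of a matrix is Frobenius)
err2 = norm(A - Ak)^2
# sum of the discarded squared singular values
tail2 = sum(abs2, F.S[k+1:end])

# an arbitrary rank-k competitor B = C * R
B = randn(6, k) * randn(k, 4)
err2_B = norm(A - B)^2

println((err2, tail2, err2_B))
```

The first two printed numbers should agree up to rounding, and the third should be no smaller than the first. One random competitor is only a sanity check, not a proof.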

Singular vectors and Rayleigh quotient

  • Goal: Maximize the Rayleigh quotient $$ \text{maximize} \quad f(\mathbf{x}) = \frac{\|\mathbf{A} \mathbf{x}\|^2}{\|\mathbf{x}\|^2} = \frac{\mathbf{x}' \mathbf{A}' \mathbf{A} \mathbf{x}}{\mathbf{x}' \mathbf{x}} = \frac{\mathbf{x}' \mathbf{S} \mathbf{x}}{\mathbf{x}' \mathbf{x}}. $$

  • Let's calculate the partial derivatives of the objective function $f(\mathbf{x})$: $$ \frac{\partial f(\mathbf{x})}{\partial x_i} = \frac{\left(2\sum_j s_{ij} x_j\right) (\mathbf{x}' \mathbf{x}) - (\mathbf{x}' \mathbf{S} \mathbf{x}) (2x_i)}{(\mathbf{x}' \mathbf{x})^2} = 2 (\mathbf{x}' \mathbf{x})^{-1} \left(\sum_j s_{ij} x_j - f(\mathbf{x}) x_i \right). $$ Collecting partial derivatives into the gradient vector and setting it to zero $$ \nabla f(\mathbf{x}) = 2 (\mathbf{x}' \mathbf{x})^{-1} \left[ \mathbf{S} \mathbf{x} - f(\mathbf{x}) \cdot \mathbf{x} \right] = \mathbf{0} $$ yields $$ \mathbf{S} \mathbf{x} = f(\mathbf{x}) \cdot \mathbf{x}. $$ Thus the optimal $\mathbf{x}$ must be an eigenvector of $\mathbf{S} = \mathbf{A}' \mathbf{A}$ with corresponding eigenvalue $f(\mathbf{x})$. Which one shall we choose? Since the objective value at an eigenvector equals its eigenvalue, we choose the top right singular vector $\mathbf{x}^\star = \mathbf{v}_1$, which attains the maximal value $\lambda_1 = \sigma_1^2$.

  • Above we have shown $$ \max_{\mathbf{x} \ne \mathbf{0}} \frac{\|\mathbf{A} \mathbf{x}\|}{\|\mathbf{x}\|} = \sigma_1, $$ which is the spectral norm (or $\ell_2$ norm) of a matrix $\mathbf{A}$.

In [45]:
Random.seed!(216)
m, n, r = 5, 3, 2
A = randn(m, r) * randn(r, n)
Out[45]:
5×3 Array{Float64,2}:
 -1.0641     0.119504     0.0684963
 -1.31293   -1.57973e-5  -0.0936312
 -0.726329  -0.584099    -0.757408 
 -1.56673   -1.47246     -1.89051  
  2.11855    0.495045     0.749092 
In [46]:
x = randn(n)
norm(A * x) / norm(x)
Out[46]:
1.6977178927883125
In [47]:
Asvd = svd(A)
dump(Asvd)
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((5, 3)) [-0.1919068385243238 0.4892656676362979 -0.48537185871842553; -0.2735530364296413 0.488020176853115 -0.3087090651625637; … ; -0.6936060591096974 -0.5724294014443431 -0.0627438970552384; 0.5648467629106955 -0.39924456286605636 -0.6502760779051611]
  S: Array{Float64}((3,)) [3.92086624799641, 1.5634315539256385, 7.808175744336089e-17]
  Vt: Array{Float64}((3, 3)) [0.7810581198939158 0.37019090542038724 0.5028985055573496; -0.6228044632019009 0.5203737519744842 0.584230912287077; -0.04541821180510714 -0.7695257317335348 0.6369986924919032]
In [48]:
x = Asvd.Vt[1, :]
norm(A * x) / norm(x)
Out[48]:
3.9208662479964085
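As a further check of the claims above, the following sketch verifies on a made-up rank-2 matrix that the top eigenvalue of $\mathbf{S} = \mathbf{A}' \mathbf{A}$ equals $\sigma_1^2$, that `opnorm(A)` (the spectral norm) equals $\sigma_1$, and that random directions never beat $\mathbf{v}_1$. The helper name `rq` is an assumption introduced for illustration.

```julia
using LinearAlgebra, Random

Random.seed!(216)
A = randn(5, 2) * randn(2, 3)         # made-up rank-2 matrix

S = Symmetric(A' * A)
λ = eigvals(S)                        # eigenvalues in ascending order
σ = svdvals(A)                        # singular values in descending order
v1 = svd(A).V[:, 1]                   # top right singular vector

# Rayleigh quotient ‖Ax‖² / ‖x‖²
rq(x) = norm(A * x)^2 / norm(x)^2

println((maximum(λ), σ[1]^2, opnorm(A)^2, rq(v1)))
```

All four printed values should agree up to rounding.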
  • Similarly, the second right singular vector maximizes the Rayleigh quotient subject to an orthogonality constraint \begin{eqnarray*} \text{maximize} &\quad& f(\mathbf{x}) = \frac{\|\mathbf{A} \mathbf{x}\|^2}{\|\mathbf{x}\|^2} = \frac{\mathbf{x}' \mathbf{A}' \mathbf{A} \mathbf{x}}{\mathbf{x}' \mathbf{x}} = \frac{\mathbf{x}' \mathbf{S} \mathbf{x}}{\mathbf{x}' \mathbf{x}} \\ \text{subject to} &\quad& \mathbf{x} \perp \mathbf{v}_1. \end{eqnarray*}

    The proof uses the method of Lagrange multipliers.
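
    A sketch of that argument, assuming we normalize $\|\mathbf{x}\| = 1$ (which does not change the quotient) and writing $\lambda$, $\mu$ for the multipliers: the Lagrangian is $$ L(\mathbf{x}, \lambda, \mu) = \mathbf{x}' \mathbf{S} \mathbf{x} - \lambda (\mathbf{x}' \mathbf{x} - 1) - \mu \, \mathbf{v}_1' \mathbf{x}. $$ Setting its gradient with respect to $\mathbf{x}$ to zero gives $$ 2 \mathbf{S} \mathbf{x} - 2 \lambda \mathbf{x} - \mu \mathbf{v}_1 = \mathbf{0}. $$ Multiplying on the left by $\mathbf{v}_1'$ and using $\mathbf{v}_1' \mathbf{S} = \sigma_1^2 \mathbf{v}_1'$, $\mathbf{v}_1' \mathbf{x} = 0$, and $\mathbf{v}_1' \mathbf{v}_1 = 1$ yields $\mu = 0$, so $$ \mathbf{S} \mathbf{x} = \lambda \mathbf{x}, \quad \mathbf{x} \perp \mathbf{v}_1. $$ Hence $\mathbf{x}$ is an eigenvector of $\mathbf{S}$ orthogonal to $\mathbf{v}_1$ with objective value $\mathbf{x}' \mathbf{S} \mathbf{x} = \lambda$, and the largest such value is $\lambda_2 = \sigma_2^2$, attained at $\mathbf{x} = \mathbf{v}_2$.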

  • Submatrices have smaller singular values. Let $\mathbf{B}$ be a submatrix of $\mathbf{A} \in \mathbb{R}^{m \times n}$. Then $$ \|\mathbf{B}\| \le \|\mathbf{A}\| $$ or equivalently $$ \sigma_1 (\mathbf{B}) \le \sigma_1 (\mathbf{A}). $$

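    Proof: For any vector $\mathbf{y}$ conformable with $\mathbf{B}$, let $\tilde{\mathbf{y}} \in \mathbb{R}^n$ hold the entries of $\mathbf{y}$ in the positions of the columns of $\mathbf{A}$ retained in $\mathbf{B}$ and be zero elsewhere, so $\|\tilde{\mathbf{y}}\| = \|\mathbf{y}\|$. The entries of $\mathbf{B} \mathbf{y}$ are a subset of the entries of $\mathbf{A} \tilde{\mathbf{y}}$, so $\|\mathbf{B} \mathbf{y}\| \le \|\mathbf{A} \tilde{\mathbf{y}}\|$. Then $$ \sigma_1 (\mathbf{B}) = \max \frac{\|\mathbf{B} \mathbf{y}\|}{\|\mathbf{y}\|} \le \max \frac{\|\mathbf{A} \tilde{\mathbf{y}}\|}{\|\tilde{\mathbf{y}}\|} \le \max \frac{\|\mathbf{A} \mathbf{x}\|}{\|\mathbf{x}\|} = \sigma_1 (\mathbf{A}). $$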

In [49]:
dump(svd(A[1:3, 1:2]))
SVD{Float64,Float64,Array{Float64,2}}
  U: Array{Float64}((3, 2)) [-0.5670728681476196 -0.38825236705036237; -0.7073948338005123 -0.22220529766185115; -0.4219252437614734 0.8943628487201375]
  S: Array{Float64}((2,)) [1.8473013324735643, 0.5714710759462295]
  Vt: Array{Float64}((2, 2)) [0.9953106418210693 0.09673017253024528; 0.09673017253024528 -0.9953106418210693]

Principal component analysis (PCA)

  • PCA is a dimension reduction technique that finds the most informative linear combinations of the $p$ random variables $\mathbf{X} \in \mathbb{R}^p$. Mathematically it finds the linear combinations of the $p$ variables that have the largest variances. If we know the population covariance of the $p$ variables is $\text{Cov}(\mathbf{X}) = \boldsymbol{\Sigma} \in \mathbb{R}^{p \times p}$, then $$ \text{Var}(\mathbf{a}' \mathbf{X}) = \mathbf{a}' \text{Cov}(\mathbf{X}) \mathbf{a} = \mathbf{a}' \boldsymbol{\Sigma} \mathbf{a}. $$ Evidently, the larger the magnitude of $\mathbf{a}$, the larger the variance $\text{Var}(\mathbf{a}' \mathbf{X})$. It therefore makes sense to maximize the normalized version $$ \text{maximize} \quad \frac{\mathbf{a}' \boldsymbol{\Sigma} \mathbf{a}}{\mathbf{a}' \mathbf{a}}. $$

  • Given a data matrix $\mathbf{X} \in \mathbb{R}^{n \times p}$ holding $n$ observations of $p$ variables, we substitute in the sample covariance matrix $$ \widehat{\boldsymbol{\Sigma}} = \frac{\tilde{\mathbf{X}}'\tilde{\mathbf{X}}}{n-1}, $$ where $\tilde{\mathbf{X}}$ is the column-centered data, and maximize the Rayleigh quotient $$ \text{maximize} \quad \frac{\mathbf{a}' \widehat{\boldsymbol{\Sigma}} \mathbf{a}}{\mathbf{a}' \mathbf{a}}. $$ From the earlier discussion we know the optimal $\mathbf{a}^\star$ maximizing this Rayleigh quotient is the top eigenvector of $\widehat{\boldsymbol{\Sigma}}$, or equivalently, the top right singular vector $\mathbf{v}_1$ of $\tilde{\mathbf{X}}$, achieving the optimal value $\lambda_1$, the top eigenvalue of $\widehat{\boldsymbol{\Sigma}}$, which equals $\sigma_1^2 / (n-1)$ for the top singular value $\sigma_1$ of $\tilde{\mathbf{X}}$.

    Similarly, the $k$-th right singular vector $\mathbf{v}_k$ maximizes $$ \frac{\mathbf{a}' \widehat{\boldsymbol{\Sigma}} \mathbf{a}}{\mathbf{a}' \mathbf{a}} $$ subject to the constraint $\mathbf{a} \perp \mathbf{v}_i$ for $i =1,\ldots,k-1$.

  • Columns of $$ \mathbf{V}_r = \begin{pmatrix} \mid & & \mid \\ \mathbf{v}_1 & \cdots & \mathbf{v}_r \\ \mid & & \mid \end{pmatrix} \in \mathbb{R}^{p \times r} $$ are called the principal components.

  • Columns of $\tilde{\mathbf{X}} \mathbf{V}_r \in \mathbb{R}^{n \times r}$, computed from the column-centered data, are called the principal scores.
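
The relations above can be sketched on synthetic data before turning to a real example. The data, mixing matrix, seed, and variable names below are made up for illustration. The sketch centers the columns, forms the sample covariance, and checks that its eigenvalues equal $\sigma_i^2 / (n-1)$ and that the principal scores have exactly those variances.

```julia
using LinearAlgebra, Statistics, Random

Random.seed!(2020)
n, p = 200, 3
# made-up data with correlated columns
X = randn(n, p) * [2.0 0.0 0.0; 1.0 1.0 0.0; 0.0 0.5 0.3]

Xc = X .- mean(X, dims=1)            # column-centered data
Σhat = Xc' * Xc / (n - 1)            # sample covariance matrix
F = svd(Xc)

λ = eigvals(Symmetric(Σhat))         # eigenvalues in ascending order
scores = Xc * F.V                    # principal scores (all p of them)

println(reverse(λ))                  # eigenvalues, largest first
println(F.S .^ 2 ./ (n - 1))         # squared singular values over n - 1
println(vec(var(scores, dims=1)))    # variances of the principal scores
```

The three printed vectors should agree up to rounding, which is why PCA can be computed from the SVD of the centered data without forming the covariance matrix explicitly.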

PCA example: Fisher's Iris data

In [50]:
using LinearAlgebra, RDatasets, Plots, StatsBase
plotly() # use the plotly backend for interactive 3D graphing

# load iris dataset
iris = dataset("datasets", "iris")

# retrieve features and labels
X_labels = convert(Vector, iris[!, 5])
X        = Matrix(iris[!, 1:4])
Out[50]:
150×4 Array{Float64,2}:
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2
 4.6  3.1  1.5  0.2
 5.0  3.6  1.4  0.2
 5.4  3.9  1.7  0.4
 4.6  3.4  1.4  0.3
 5.0  3.4  1.5  0.2
 4.4  2.9  1.4  0.2
 4.9  3.1  1.5  0.1
 5.4  3.7  1.5  0.2
 4.8  3.4  1.6  0.2
 4.8  3.0  1.4  0.1
 ⋮                 
 6.0  3.0  4.8  1.8
 6.9  3.1  5.4  2.1
 6.7  3.1  5.6  2.4
 6.9  3.1  5.1  2.3
 5.8  2.7  5.1  1.9
 6.8  3.2  5.9  2.3
 6.7  3.3  5.7  2.5
 6.7  3.0  5.2  2.3
 6.3  2.5  5.0  1.9
 6.5  3.0  5.2  2.0
 6.2  3.4  5.4  2.3
 5.9  3.0  5.1  1.8
In [51]:
X_labels
Out[51]:
150-element Array{CategoricalString{UInt8},1}:
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 "setosa"   
 ⋮          
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
 "virginica"
In [52]:
# center but not scale
X̃ = zscore(X, mean(X, dims=1), ones(1, size(X, 2)))
Out[52]:
150×4 Array{Float64,2}:
 -0.743333    0.442667   -2.358  -0.999333
 -0.943333   -0.0573333  -2.358  -0.999333
 -1.14333     0.142667   -2.458  -0.999333
 -1.24333     0.0426667  -2.258  -0.999333
 -0.843333    0.542667   -2.358  -0.999333
 -0.443333    0.842667   -2.058  -0.799333
 -1.24333     0.342667   -2.358  -0.899333
 -0.843333    0.342667   -2.258  -0.999333
 -1.44333    -0.157333   -2.358  -0.999333
 -0.943333    0.0426667  -2.258  -1.09933 
 -0.443333    0.642667   -2.258  -0.999333
 -1.04333     0.342667   -2.158  -0.999333
 -1.04333    -0.0573333  -2.358  -1.09933 
  ⋮                                       
  0.156667   -0.0573333   1.042   0.600667
  1.05667     0.0426667   1.642   0.900667
  0.856667    0.0426667   1.842   1.20067 
  1.05667     0.0426667   1.342   1.10067 
 -0.0433333  -0.357333    1.342   0.700667
  0.956667    0.142667    2.142   1.10067 
  0.856667    0.242667    1.942   1.30067 
  0.856667   -0.0573333   1.442   1.10067 
  0.456667   -0.557333    1.242   0.700667
  0.656667   -0.0573333   1.442   0.800667
  0.356667    0.342667    1.642   1.10067 
  0.0566667  -0.0573333   1.342   0.600667
In [53]:
# Xsvd.V contains the principal components
Xsvd = svd(X̃)
Xsvd.V
Out[53]:
4×4 Adjoint{Float64,Array{Float64,2}}:
  0.361387   -0.656589   0.58203     0.315487
 -0.0845225  -0.730161  -0.597911   -0.319723
  0.856671    0.173373  -0.0762361  -0.479839
  0.358289    0.075481  -0.545831    0.753657
In [54]:
# compute the top 3 principal scores
Y = X̃ * Xsvd.V[:, 1:3]
Out[54]:
150×3 Array{Float64,2}:
 -2.68413  -0.319397    0.0279148
 -2.71414   0.177001    0.210464 
 -2.88899   0.144949   -0.0179003
 -2.74534   0.318299   -0.0315594
 -2.72872  -0.326755   -0.0900792
 -2.28086  -0.74133    -0.168678 
 -2.82054   0.0894614  -0.257892 
 -2.62614  -0.163385    0.0218793
 -2.88638   0.578312   -0.0207596
 -2.67276   0.113774    0.197633 
 -2.50695  -0.645069    0.075318 
 -2.61276  -0.0147299  -0.10215  
 -2.78611   0.235112    0.206844 
  ⋮                              
  1.16933   0.16499    -0.281836 
  2.10761  -0.372288   -0.0272911
  2.31415  -0.183651   -0.322694 
  1.92227  -0.409203   -0.113587 
  1.41524   0.574916   -0.296323 
  2.56301  -0.277863   -0.29257  
  2.41875  -0.304798   -0.504483 
  1.94411  -0.187532   -0.177825 
  1.52717   0.375317    0.121898 
  1.76435  -0.0788589  -0.130482 
  1.90094  -0.116628   -0.723252 
  1.39019   0.282661   -0.36291  
In [55]:
# group principal scores by species label for color coding
setosa     = Y[X_labels .== "setosa"    , :]
versicolor = Y[X_labels .== "versicolor", :]
virginica  = Y[X_labels .== "virginica" , :];
In [56]:
# # visualize first 2 principal components in 2D interactive plot
# p  = scatter(setosa[:, 1], setosa[:, 2], marker=:circle, linewidth=0)
# scatter!(versicolor[:, 1], versicolor[:, 2], marker=:circle, linewidth=0)
# scatter!(virginica[:, 1], virginica[:, 2], marker=:circle, linewidth=0)
# plot!(p, xlabel="PC1", ylabel="PC2")
In [57]:
# # visualize first 3 principal components in 3D interactive plot
# p  = scatter(setosa[:, 1], setosa[:, 2], setosa[:, 3], marker=:circle, linewidth=0)
# scatter!(versicolor[:, 1], versicolor[:, 2], versicolor[:, 3], marker=:circle, linewidth=0)
# scatter!(virginica[:, 1], virginica[:, 2], virginica[:, 3], marker=:circle, linewidth=0)
# plot!(p, xlabel="PC1", ylabel="PC2", zlabel="PC3")

PCA example: genomics

The picture above is from the article "Genes mirror geography within Europe" by Novembre et al. (2008), published in Nature.

Matrix completion

Missing data is ubiquitous. Matrix completion (a hot topic in machine learning) aims to recover the missing values in a huge matrix.

Candes and Tao propose a technique to complete a matrix $\mathbf{Y}$ with a large number of missing entries by solving an optimization problem \begin{eqnarray*} &\text{minimize}& \|\mathbf{X}\|_{\text{nuc}} \\ &\text{subject to }& x_{ij}=y_{ij} \text{ for all observed entries } y_{ij}. \end{eqnarray*} That is, we seek the matrix with minimal nuclear norm (the sum of its singular values) that agrees with the observed entries.
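
To make the objective concrete, here is a toy sketch. It is not the solver used in practice, where the problem is handled by convex optimization. It assumes a made-up $2 \times 2$ matrix with a single missing entry and simply searches a grid of candidate values for the one that minimizes the nuclear norm. The names `nucnorm` and `complete` are hypothetical helpers introduced for illustration.

```julia
using LinearAlgebra

# nuclear norm = sum of singular values
nucnorm(X) = sum(svdvals(X))

# toy problem (made up): Y = [1 1; 1 ?] with the (2,2) entry missing;
# complete(t) fills the missing entry with the candidate value t
complete(t) = [1.0 1.0; 1.0 t]

ts = -2:0.01:4                                  # grid of candidate values
vals = [nucnorm(complete(t)) for t in ts]       # objective at each candidate
t_best = ts[argmin(vals)]                       # grid minimizer

println((t_best, minimum(vals)))
```

In this particular example the grid minimizer is the value that makes the completed matrix rank one. That coincidence should not be expected in general: recovery guarantees for nuclear norm minimization rest on assumptions about the matrix and the pattern of observed entries.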

See an example here.

Compressed sensing

See an example here.