Symmetric Positive Definite Matrices

  • In the last lecture, we learned that every symmetric matrix $\mathbf{A} = \mathbf{A}'$ admits an eigen-decomposition $$ \mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}', $$ where $\mathbf{Q}$ is orthogonal and $\boldsymbol{\Lambda} = \text{diag}(\lambda_1, \ldots, \lambda_n)$ has the eigenvalues on its diagonal.

  • A positive definite matrix is a symmetric matrix with all its eigenvalues positive. That is, $\lambda_i > 0$ for all $i$. We use $\mathbf{A} \succ \mathbf{O}$ to indicate positive definiteness of $\mathbf{A}$.

    A positive semidefinite matrix is a symmetric matrix with all its eigenvalues nonnegative. That is, $\lambda_i \ge 0$ for all $i$. We use $\mathbf{A} \succeq \mathbf{O}$ to indicate positive semidefiniteness of $\mathbf{A}$.

  • Examples:

    1. $\begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix}$ is positive definite. Its eigenvalues are 2 and 6.
    2. $\mathbf{Q} \begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix} \mathbf{Q}'$ is positive definite if $\mathbf{Q}' = \mathbf{Q}^{-1}$. Its eigenvalues are 2 and 6.
    3. $\mathbf{C} \begin{pmatrix} 2 & 0 \\ 0 & 6 \end{pmatrix} \mathbf{C}'$ is positive definite if $\mathbf{C}$ is invertible. This is easier to show using the energy-based definition (below).
    4. $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ is positive definite when $a > 0$ and $ac > b^2$. Show by energy-based definition or Schur complement.
    5. $\begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}$ is positive semidefinite. Its eigenvalues are 2 and 0.

Energy-based definition

  • A symmetric matrix $\mathbf{A}$ is positive definite if and only if the energy $\mathbf{x}' \mathbf{A} \mathbf{x} > 0$ for all $\mathbf{x} \ne \mathbf{0}$.

    A symmetric matrix $\mathbf{A}$ is positive semidefinite if and only if the energy $\mathbf{x}' \mathbf{A} \mathbf{x} \ge 0$ for all $\mathbf{x}$.

    Proof of the "only if" part: $\mathbf{A} = \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}' = \sum_{i=1}^n \lambda_i \mathbf{q}_i \mathbf{q}_i'$, where $\lambda_i > 0$ and $\mathbf{q}_i$ are orthonormal to each other. Then $$ \mathbf{x}' \mathbf{A} \mathbf{x} = \mathbf{x}' \left( \sum_{i=1}^n \lambda_i \mathbf{q}_i \mathbf{q}_i' \right) \mathbf{x} = \sum_{i=1}^n \lambda_i (\mathbf{q}_i' \mathbf{x})^2 $$ Since $\mathbf{Q}$ is orthogonal, $\mathbf{Q}' \mathbf{x} \ne \mathbf{0}$ (otherwise it conflicts with linear indepence between $\mathbf{q}_i$s). Therefore $\mathbf{x}' \mathbf{A} \mathbf{x} = \sum_{i=1}^n \lambda_i (\mathbf{q}_i' \mathbf{x})^2 > 0$.

    Proof of the "if" part: Take $\mathbf{x} = \mathbf{q}_i$. Then $\mathbf{x}' \mathbf{A} \mathbf{x} = \lambda_i > 0$.

  • Show Examples 3 and 4 above.
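
    The energy-based arguments for Examples 3 and 4 can be sanity-checked numerically. A minimal sketch (the particular $\mathbf{C}$, $a$, $b$, $c$ below are chosen for illustration):

In [ ]:
using LinearAlgebra, Random

Random.seed!(123)
D = Diagonal([2.0, 6.0])
C = randn(2, 2)                  # invertible with probability one
isposdef(Symmetric(C * D * C')) # Example 3: expect true

a, b, c = 3.0, 2.0, 4.0          # a > 0 and ac > b²
isposdef(Symmetric([a b; b c]))  # Example 4: expect true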

  • If $\mathbf{A}_1$ and $\mathbf{A}_2$ are positive definite, then $\alpha_1 \mathbf{A}_1 + \alpha_2 \mathbf{A}_2$ is positive definite for any $\alpha_1, \alpha_2 > 0$.

    Proof is trivial using the energy-based definition: $\mathbf{x}' (\alpha_1 \mathbf{A}_1 + \alpha_2 \mathbf{A}_2) \mathbf{x} = \alpha_1 \mathbf{x}' \mathbf{A}_1 \mathbf{x} + \alpha_2 \mathbf{x}' \mathbf{A}_2 \mathbf{x} > 0$ for any $\mathbf{x} \ne \mathbf{0}$.

Gram-matrix based definition

  • A symmetric matrix $\mathbf{A}$ is positive semidefinite if and only if $\mathbf{A} = \mathbf{B}' \mathbf{B}$ for some matrix $\mathbf{B}$.

    A symmetric matrix $\mathbf{A}$ is positive definite if and only if $\mathbf{A} = \mathbf{B}' \mathbf{B}$ for some matrix $\mathbf{B}$ with linearly independent columns.

    Proof of the "if" part: use the energy-based definition.

    Proof of the "only if" part: take $\mathbf{B} = \boldsymbol{\Lambda}^{1/2} \mathbf{Q}'$.

Schur complement test for positive definiteness

  • Consider a matrix $\mathbf{X}$ partitioned as $$ \mathbf{X} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{B}' & \mathbf{C} \end{pmatrix}, $$ where the diagonal blocks $\mathbf{A}$ and $\mathbf{C}$ are symmetric. Then

    1. $\mathbf{X} \succ \mathbf{O}$ if and only if $\mathbf{A} \succ \mathbf{O}$ and $\mathbf{S} = \mathbf{C} - \mathbf{B}' \mathbf{A}^{-1} \mathbf{B} \succ \mathbf{O}$.
    2. If $\mathbf{A} \succ \mathbf{O}$, then $\mathbf{X} \succeq \mathbf{O}$ if and only if $\mathbf{S} \succeq \mathbf{O}$.

      Note: the matrix $\mathbf{S} = \mathbf{C} - \mathbf{B}' \mathbf{A}^{-1} \mathbf{B}$ is called the Schur complement of the block $\mathbf{A}$ of $\mathbf{X}$.

      Proof sketch: use the congruence $$ \mathbf{X} = \begin{pmatrix} \mathbf{I} & \mathbf{O} \\ \mathbf{B}' \mathbf{A}^{-1} & \mathbf{I} \end{pmatrix} \begin{pmatrix} \mathbf{A} & \mathbf{O} \\ \mathbf{O} & \mathbf{S} \end{pmatrix} \begin{pmatrix} \mathbf{I} & \mathbf{A}^{-1} \mathbf{B} \\ \mathbf{O} & \mathbf{I} \end{pmatrix}, $$ and note that a congruence $\mathbf{X} \mapsto \mathbf{M}' \mathbf{X} \mathbf{M}$ with $\mathbf{M}$ invertible preserves positive (semi)definiteness (energy-based definition).
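
      A small numerical check of the Schur complement test (block values chosen for illustration):

In [ ]:
using LinearAlgebra

A = [5.0 4.0; 4.0 5.0]
B = [1.0 0.0; 0.0 1.0]
C = [3.0 0.0; 0.0 3.0]
X = [A B; B' C]
S = C - B' * (A \ B)     # Schur complement of the block A
# all three should be true, consistent with part 1
isposdef(Symmetric(A)), isposdef(Symmetric(S)), isposdef(Symmetric(X))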

Covariance test for positive definiteness

  • This is a trick due to Ingram Olkin (1985).

  • A symmetric matrix $\mathbf{A} \in \mathbb{R}^{n \times n}$ is positive semidefinite if and only if $\mathbf{A} = \text{Cov}(\mathbf{X})$ for some random vector $\mathbf{X} \in \mathbb{R}^n$.

    Proof of the "if" part: We use the energy-based definition: $$ \mathbf{a}' \mathbf{A} \mathbf{a} = \mathbf{a}' \text{Cov}(\mathbf{X}) \mathbf{a} = \text{Var}(\mathbf{a}' \mathbf{X}) \ge 0 $$ for any vector $\mathbf{a}$.

    Proof of the "only if" part: Let $\mathbf{Y} \sim \text{Normal}(\mathbf{0}, \mathbf{I}_n)$. Then $\mathbf{A} = \text{Cov}(\mathbf{A}^{1/2} \mathbf{Y})$.

  • Schur product theorem (Schur lemma): If $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{n \times n}$ are positive semidefinite, then their Hadamard product (elementwise product) $\mathbf{A} \circ \mathbf{B} = (a_{ij} b_{ij})$ is positive semidefinite.

    Proof: Let $\mathbf{X}$ and $\mathbf{Y}$ be independent, mean-zero random vectors with $\text{Cov}(\mathbf{X}) = \mathbf{A}$ and $\text{Cov}(\mathbf{Y}) = \mathbf{B}$. By independence, $\text{Cov}(X_i Y_i, X_j Y_j) = \text{E}(X_i X_j) \text{E}(Y_i Y_j) = a_{ij} b_{ij}$, so $\mathbf{A} \circ \mathbf{B} = \text{Cov}(\mathbf{X} \circ \mathbf{Y})$ is positive semidefinite by the covariance test.
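
    A numerical check of the Schur product theorem (random Gram matrices for illustration):

In [ ]:
using LinearAlgebra, Random

Random.seed!(123)
B1, B2 = randn(5, 5), randn(5, 5)
A1, A2 = B1'B1, B2'B2          # positive semidefinite by the Gram-matrix definition
eigvals(Symmetric(A1 .* A2))   # Hadamard product; all eigenvalues ≥ 0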

  • Exercise: Show the symmetric matrix with entries $a_{ij} = i(n-j+1)$ for $j \ge i$ is positive semidefinite.

    Hint: Casella-Berger Example 5.4.5 (p230).

In [1]:
n = 5
A = [j ≥ i ? i * (n - j + 1) : j * (n - i + 1) for i in 1:n, j in 1:n]
Out[1]:
5×5 Array{Int64,2}:
 5  4  3  2  1
 4  8  6  4  2
 3  6  9  6  3
 2  4  6  8  4
 1  2  3  4  5
In [3]:
using LinearAlgebra
eigvals(A)
Out[3]:
5-element Array{Float64,1}:
  1.6076951545867355
  2.000000000000001 
  2.9999999999999996
  6.0               
 22.39230484541327  

Cholesky decomposition

  • The Cholesky decomposition is a special case of the LU decomposition for a symmetric, positive definite matrix $\mathbf{A}$.

  • Any positive definite matrix $\mathbf{A}$ admits the decomposition $$ \mathbf{A} = \mathbf{L} \mathbf{L}', $$ where $\mathbf{L}$ is a lower triangular matrix with positive diagonal entries.

    Proof (by induction): If $n=1$, then $\ell = \sqrt{a}$. For $n>1$, the block equation $$ \begin{eqnarray*} \begin{pmatrix} a_{11} & \mathbf{a}' \\ \mathbf{a} & \mathbf{A}_{22} \end{pmatrix} = \begin{pmatrix} \ell_{11} & \mathbf{0}_{n-1}' \\ \mathbf{l} & \mathbf{L}_{22} \end{pmatrix} \begin{pmatrix} \ell_{11} & \mathbf{l}' \\ \mathbf{0}_{n-1} & \mathbf{L}_{22}' \end{pmatrix} \end{eqnarray*} $$ has solution $$ \begin{eqnarray*} \ell_{11} &=& \sqrt{a_{11}} \\ \mathbf{l} &=& \ell_{11}^{-1} \mathbf{a} \\ \mathbf{L}_{22} \mathbf{L}_{22}' &=& \mathbf{A}_{22} - \mathbf{l} \mathbf{l}' = \mathbf{A}_{22} - a_{11}^{-1} \mathbf{a} \mathbf{a}'. \end{eqnarray*} $$
    Now $a_{11}>0$ (why?), so $\ell_{11}$ and $\mathbf{l}$ are uniquely determined. $\mathbf{A}_{22} - a_{11}^{-1} \mathbf{a} \mathbf{a}'$ is positive definite because $\mathbf{A}$ is positive definite (why?). By the induction hypothesis, $\mathbf{L}_{22}$ exists and is unique.
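
    The induction is constructive. Below is a minimal recursive sketch that mirrors the proof (mychol is a hypothetical helper, not the library routine):

In [ ]:
using LinearAlgebra

function mychol(A::Matrix{Float64})
    n = size(A, 1)
    L = zeros(n, n)
    L[1, 1] = sqrt(A[1, 1])                  # ℓ₁₁ = √a₁₁
    if n > 1
        l = A[2:end, 1] / L[1, 1]            # l = a / ℓ₁₁
        L[2:end, 1] = l
        # recurse on A₂₂ - l l'
        L[2:end, 2:end] = mychol(A[2:end, 2:end] - l * l')
    end
    return LowerTriangular(L)
end

L = mychol(Float64.([4 12 -16; 12 37 -43; -16 -43 98]))
L * L'   # recovers A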

  • Pivot test of positive definiteness. A symmetric matrix $\mathbf{A}$ is positive definite if and only if the Cholesky decomposition can be carried out without encountering a zero or negative pivot.

In [1]:
using LinearAlgebra

A = Float64.([4 12 -16; 12 37 -43; -16 -43 98])
Out[1]:
3×3 Array{Float64,2}:
   4.0   12.0  -16.0
  12.0   37.0  -43.0
 -16.0  -43.0   98.0
In [2]:
# Cholesky without pivoting
Achol = cholesky(Symmetric(A))
Out[2]:
Cholesky{Float64,Array{Float64,2}}
U factor:
3×3 UpperTriangular{Float64,Array{Float64,2}}:
 2.0  6.0  -8.0
  ⋅   1.0   5.0
  ⋅    ⋅    3.0
In [3]:
Achol.L
Out[3]:
3×3 LowerTriangular{Float64,Array{Float64,2}}:
  2.0   ⋅    ⋅ 
  6.0  1.0   ⋅ 
 -8.0  5.0  3.0
In [5]:
# LU decomposition is different from Cholesky
lu(A)
Out[5]:
LU{Float64,Array{Float64,2}}
L factor:
3×3 Array{Float64,2}:
  1.0   0.0       0.0
 -0.75  1.0       0.0
 -0.25  0.263158  1.0
U factor:
3×3 Array{Float64,2}:
 -16.0  -43.0   98.0     
   0.0    4.75  30.5     
   0.0    0.0    0.473684
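
  • The pivot test can be exercised directly: the isposdef function in LinearAlgebra tests positive definiteness by attempting a Cholesky factorization. A minimal illustration, reusing A from above and flipping one entry to break positive definiteness:

In [ ]:
isposdef(Symmetric(A))           # true: the factorization above succeeded

B = copy(A); B[3, 3] = -98.0     # flip one diagonal entry (illustrative)
isposdef(Symmetric(B))           # false; cholesky(Symmetric(B)) throws PosDefException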

Positive definite matrices and minimization problems

  • For a scalar function $f: \mathbb{R} \mapsto \mathbb{R}$, the test for a local minimum is: $$ \frac{df(x)}{dx} = 0 \quad (\text{first derivative test}) $$ and $$ \frac{d^2f(x)}{dx^2} > 0 \quad (\text{second derivative test}). $$

  • For a multivariate function $f: \mathbb{R}^n \mapsto \mathbb{R}$, the test for a local minimum is: $$ \nabla f(\mathbf{x}) = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix} = \mathbf{0} \quad (\text{first derivative test}) $$ and $$ \nabla^2 f(\mathbf{x}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_i \partial x_j} \end{pmatrix} \succ \mathbf{O} \quad (\text{second derivative test}). $$ A numerical check follows below.

  • TODO: graph.
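
  • A quick numerical check on a strictly convex quadratic (the particular $\mathbf{A}$ and $\mathbf{b}$ are illustrative): $f(\mathbf{x}) = \frac{1}{2} \mathbf{x}' \mathbf{A} \mathbf{x} - \mathbf{b}' \mathbf{x}$ has $\nabla f(\mathbf{x}) = \mathbf{A} \mathbf{x} - \mathbf{b}$ and $\nabla^2 f(\mathbf{x}) = \mathbf{A}$, so for $\mathbf{A} \succ \mathbf{O}$ the unique minimizer solves $\mathbf{A} \mathbf{x} = \mathbf{b}$.

In [ ]:
using LinearAlgebra

A = [5.0 4.0; 4.0 5.0]           # Hessian ∇²f = A ≻ O
b = [1.0, 2.0]
f(x) = 0.5 * dot(x, A * x) - dot(b, x)
xstar = A \ b                    # stationary point: ∇f(x⋆) = A x⋆ - b = 0
f(xstar) ≤ f(xstar + randn(2))   # expect true: x⋆ minimizes f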

The ellipse $ax^2 + 2bxy + cy^2 = 1$

  • Consider the 2-by-2 matrix $$ \mathbf{A} = \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix}. $$ We can test whether it is positive definite by

    • eigenvalues: $\lambda_1 = 9$, $\lambda_2 = 1$.
    • energy test: $$ (x, y) \begin{pmatrix} 5 & 4 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = 5x^2 + 8xy + 5y^2 = 9 \left( \frac{x + y}{\sqrt 2} \right)^2 + 1 \left( \frac{x-y}{\sqrt 2} \right)^2 > 0. $$
    • Schur complement: $a = 5 > 0$, $c - a^{-1} b^2 = 5 - 16/5 > 0$.
  • Visualize $5x^2 + 8xy + 5y^2 = 1$: the ellipse's axes align with the eigenvectors of $\mathbf{A}$, and the semi-axis lengths are $1 / \sqrt{\lambda_i}$. A plotting sketch is below.
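
    A possible plotting sketch (assuming the Plots.jl package; grid and styling choices are arbitrary):

In [ ]:
using Plots, LinearAlgebra

xs = ys = range(-1.2, 1.2, length = 201)
# level set 5x² + 8xy + 5y² = 1
contour(xs, ys, (x, y) -> 5x^2 + 8x*y + 5y^2, levels = [1.0], aspect_ratio = 1)
λ, Q = eigen(Symmetric([5.0 4.0; 4.0 5.0]))
for i in 1:2
    v = Q[:, i] / sqrt(λ[i])     # semi-axis of length 1/√λᵢ along eigenvector qᵢ
    plot!([-v[1], v[1]], [-v[2], v[2]], label = "axis $i")
end
current()   # display the plot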