An important property of real symmetric matrices is that they have an orthonormal basis of eigenvectors. If #A# is a symmetric #(n\times n)#-matrix with eigenvalues #\lambda_1,\dots,\lambda_n# and associated orthonormal basis of eigenvectors #\alpha=\basis{\vec{a}_1,\dots,\vec{a}_n}#, then #A# is conjugate to the diagonal matrix #D# whose diagonal entries are #\lambda_1,\dots,\lambda_n#:
\[ A={}_\varepsilon I_\alpha \,D\,{}_\alpha I_\varepsilon \]
We can also write this as \[ A=\lambda_1\vec{a}_1\vec{a}_1^\top+\cdots+\lambda_n\vec{a}_n\vec{a}_n^\top\] where the vectors #\vec{a}_i# of #\alpha# are viewed as column vectors. This way, #A# is written as a sum of rank-one matrices #\vec{a}_i\vec{a}_i^\top# for #i=1,\ldots,n#.
If we assume #|\lambda_1|\ge |\lambda_2| \ge\cdots \ge |\lambda_n|#, then the first terms of this sum have "more clout" than the terms at the end. By taking only the first #k# terms (or, equivalently, ignoring the terms for which #|\lambda_i|# is negligibly small), we can find good approximations of #A# by matrices of lower rank. In practice, this can lead to significant cost savings in return for a slight loss of quality.
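For a symmetric matrix, this truncation is easy to carry out with an eigenvalue routine. Below is a minimal numpy sketch; the matrix and the choice #k=2# are made-up for illustration:

```python
import numpy as np

# Made-up symmetric matrix; eigh returns real eigenvalues in ascending
# order and an orthonormal basis of eigenvectors (the columns of Q).
A = np.array([[4.0, 1.0, 0.1],
              [1.0, 3.0, 0.2],
              [0.1, 0.2, 0.05]])
lam, Q = np.linalg.eigh(A)

# Keep the k terms of A = sum_i lambda_i a_i a_i^T with largest |lambda_i|.
order = np.argsort(-np.abs(lam))
k = 2
A_k = sum(lam[i] * np.outer(Q[:, i], Q[:, i]) for i in order[:k])

print(np.linalg.norm(A - A_k))  # small: A_k is a rank-k approximation of A
```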
For non-symmetric matrices this does not work, and for non-square matrices we do not even have eigenvalues or eigenvectors. Nevertheless, in these cases there is still a useful decomposition as a sum of rank-one matrices. Before we get into this, we define an alternative to eigenvalues in the case of a matrix that is not square:
If #A# is an #(m\times n)#-matrix and #\lambda# is an eigenvalue of #A^\top A# distinct from #0#, then #\lambda# is positive, and also an eigenvalue of #A\, A^\top # with the same multiplicity as for #A^\top A#.
In that case, we call #\sqrt{\lambda}# a singular value of #A#.
Let #E_\lambda# be the eigenspace in #\mathbb{R}^n# of #A^\top A# with eigenvalue #\lambda#, and let #F_\lambda# be the eigenspace in #\mathbb{R}^m# of #A\, A^\top # with the same eigenvalue.
Suppose that #\vec{v}# is an eigenvector of #A^\top A# with eigenvalue #\lambda#, so #\vec{v}# is a nonzero element of #E_\lambda#. Then
\[\lambda (\dotprod{\vec{v}}{\vec{v}}) =\dotprod{(\lambda \vec{v})}{\vec{v}}=\dotprod{(A^\top A \vec{v})}{\vec{v}}=\dotprod{(A \vec{v})}{(A\vec{v})}\ge0\]
Since \(\dotprod{\vec{v}}{\vec{v}}\gt0\), it follows that #\lambda\ge0#. Because of the assumption that #\lambda# is unequal to #0#, the eigenvalue #\lambda# is positive.
Now put #\vec{u} = \lambda^{-\frac12} A\vec{v}#. We then have
\[\begin{array}{rcl}A^\top\vec{u} &=& \lambda^{-\frac12} A^\top A\vec{v} = \lambda^{-\frac12} \lambda\vec{v} = \lambda^{\frac12} \vec{v} \\&&\text{and so }\\ (A\, A^\top)\vec{u} &=&A\,( A^\top\vec{u}) =\lambda^{\frac12} A \vec{v} = \lambda\vec{u}\end{array}\] The first line shows that #\vec{u}# is unequal to the zero vector (for otherwise its image under #A^\top# would be the zero vector, which contradicts that it is #\lambda^{\frac12} \vec{v}#). The second line shows that #\vec{u}# is an eigenvector of #A\, A^\top# with eigenvalue #\lambda#, so #\vec{u}# belongs to #F_\lambda# and #\lambda# is an eigenvalue of #A\, A^\top #.
Therefore, the map that assigns to a vector #\vec{v}# in #E_\lambda# the vector #\lambda^{-\frac12}A\vec{v}# is a linear map #E_\lambda\to F_\lambda#. In a similar way, the map that assigns to #\vec{u}# in #F_\lambda# the vector #\lambda^{-\frac12}A^\top \vec{u}# is a linear map #F_\lambda\to E_\lambda#. The second equation above shows that these maps are each other's inverses:
\[\lambda^{-\frac12}A\,\left(\lambda^{-\frac12}A^\top\vec{u}\right) =\lambda^{-1}\, (A\,A^\top)\vec{u}=\lambda^{-1} \lambda \vec{u}=\vec{u}\]
In particular, #E_\lambda# and #F_\lambda# have the same dimension, so the multiplicity of #\lambda# as an eigenvalue of #A^\top A# coincides with the multiplicity of #\lambda# as an eigenvalue of #A\, A^\top#.
If the matrix #A# is symmetric, then #A^\top A# is equal to #A^2#, and the singular values of #A# are simply the absolute values of the nonzero eigenvalues of #A#.
The matrix \[A = \left(\begin{array}{cc}1&0 \\0&1 \\ 1&1\end{array}\right)\] has singular values #\sqrt3# and #1#, since \[A^\top A=\matrix{2&1\\ 1&2}\] has eigenvalues #3# and #1#. The matrix \[A\, A^\top=\matrix{1&0&1\\ 0&1&1\\ 1&1&2}\] has eigenvalues #3#, #1#, and #0#. Indeed, the positive eigenvalues of #A\, A^\top# are equal to those of #A^\top A#.
Because #A^\top A= A^\top\,(A^\top)^\top#, the matrices #A# and #A^\top# have the same singular values.
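Both facts are easy to check numerically on the example above. A small numpy sketch (eigvalsh reports eigenvalues in ascending order, svd reports singular values in descending order):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

# The positive eigenvalues of A^T A and of A A^T coincide.
print(np.linalg.eigvalsh(A.T @ A))  # [1. 3.]
print(np.linalg.eigvalsh(A @ A.T))  # [0. 1. 3.] (up to rounding)

# The singular values are their square roots, shared by A and A^T.
print(np.linalg.svd(A, compute_uv=False))    # [1.732... 1.]
print(np.linalg.svd(A.T, compute_uv=False))  # [1.732... 1.]
```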
For the closest analogue of a diagonal matrix in the case of a matrix that is not square, we need the following definition.
An #(m\times n)#-matrix #A# is called a generalized diagonal matrix if #A_{ij}=0# when #i\ne j#. The numbers #A_{ii}# are called the diagonal entries of #A#.
The matrix #A=\left(\begin{array}{ccc}2&0&0\\ 0&3&0\end{array}\right)# is a generalized diagonal matrix with diagonal entries #2# and #3#. Moreover, \[A\,A^\top=\matrix{4&0\\ 0&9}\qquad\text{and}\qquad A^\top A=
\matrix{4&0&0\\ 0&9&0\\ 0&0&0}\]are diagonal matrices with the same positive diagonal elements.
If #A# is a generalized diagonal matrix, then both #A\,A^\top# and #A^\top A# are diagonal matrices with the same positive diagonal entries, but possibly with a different number of zeros on the diagonal.
The following decomposition of an arbitrary matrix #A# as the product of two orthogonal matrices with a generalized diagonal matrix in between is achieved by use of the eigenvalue theory for #A^\top A# and #A\, A^\top#:
Each #(m\times n)#-matrix #A# with rank #r# can be written as #A=U\, D\, V^\top# where
- #D# is a generalized diagonal #(m\times n)#-matrix whose diagonal entries are the singular values of #A# in decreasing order,
- #V# (and hence also #V^\top#) is an orthogonal #(n\times n)#-matrix, and
- #U# is an orthogonal #(m\times m)#-matrix.
The columns of #V# form an orthonormal basis of #\mathbb{R}^n# consisting of eigenvectors of the symmetric matrix #A^\top A#. The first #r# columns are eigenvectors with positive eigenvalues. If #r\lt n#, then the remaining #n-r# columns are eigenvectors with eigenvalue zero.
The columns of #U# form an orthonormal basis of #\mathbb{R}^m# consisting of eigenvectors of the symmetric matrix #A\, A^\top#. The first #r# columns are eigenvectors with positive eigenvalues. If #r\lt m#, then the remaining #m-r# columns are eigenvectors with eigenvalue zero.
For each positive eigenvalue #\lambda# of #A^\top A# we can obtain a column of #U# by applying the map #\lambda^{-\frac12} A# to the corresponding eigenvector of #A^\top A#. The result is an eigenvector of #A\,A^\top# with the same eigenvalue #\lambda#.
The decomposition is called the singular value decomposition. If #\vec{u}_1,\ldots,\vec{u}_m# are the columns of #U#, and #\vec{v}_1,\ldots,\vec{v}_n# are the columns of #V#, and if, furthermore, #\lambda_1,\ldots,\lambda_r# are the positive eigenvalues of #A\, A^\top# in decreasing order, then the decomposition can be rewritten as
\[ A =\sum_{j=1}^r \lambda_j^{\frac12}{\vec{u}_j} {\vec{v}_j^\top}\]
Since all rows of a matrix with rank one are scalar multiples of each other, an #(m\times n)#-matrix with rank one has the following form:\[M=\matrix{c_1x_1& c_1x_2&\cdots &c_1x_n\\c_2x_1& c_2x_2&\cdots &c_2x_n\\\vdots&\vdots&\vdots &\vdots\\c_mx_1& c_mx_2&\cdots &c_mx_n\\}\]Alternatively, we may write:\[M=\matrix{c_1\\c_2\\\vdots\\c_m}\matrix{x_1&x_2&\cdots &x_n}\]For example:\[\matrix{14&21&35\\22&33&55}=\matrix{7\\11}\matrix{2&3&5}\]
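This column-times-row form is precisely an outer product, which numpy computes directly; a one-line check of the example above:

```python
import numpy as np

M = np.outer([7, 11], [2, 3, 5])  # column vector times row vector
print(M)                          # [[14 21 35] [22 33 55]]
print(np.linalg.matrix_rank(M))   # 1
```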
If #r\lt n#, then the last #n-r# columns of #V# are eigenvectors of #A^\top A# corresponding to eigenvalue zero. These #n-r# columns therefore form an orthonormal basis #\basis{\vec{v}_{r+1},\ldots,\vec{v}_n}# for the kernel of #A^\top A#.
A vector #\vec{v}# is in the kernel of #A^\top A# if and only if it is in the kernel of #A#.
Indeed, if #\vec{v}# is in the kernel of #A#, then\[A^\top A\vec{v}=A^\top\vec{0}=\vec{0}\] showing that #\vec{v}# is in the kernel of #A^\top A#. Conversely, if #\vec{v}# is in the kernel of #A^\top A#, then\[\dotprod{A\vec{v}}{A\vec{v}}=\dotprod{A^\top A\vec{v}}{\vec{v}}=\dotprod{\vec{0}}{\vec{v}}=0\]From the positive-definiteness of the inner product it then follows that #A\vec{v}=\vec{0}# showing that #\vec{v}# is in the kernel of #A#.
Similarly, it follows that the last #m-r# columns of #U# form an orthonormal basis #\basis{\vec{u}_{r+1},\ldots,\vec{u}_m}# for the kernel of #A\, A^\top#, and that a vector #\vec{u}# is in the kernel of #A\, A^\top# if and only if it is in the kernel of #A^\top#.
We use these results in the proof of the statement.
Choose an orthonormal basis for each eigenspace #E_\lambda#, where #\lambda# runs over the positive eigenvalues of #A^\top A# in decreasing order. Concatenate these bases in this order; the result is an orthonormal system #\basis{\vec{v}_1,\ldots,\vec{v}_r}# (notice that #A^\top A# is a symmetric linear map and see Invariant orthogonal complements for symmetric linear maps). Extend this system by vectors #\vec{v}_{r+1},\ldots,\vec{v}_n# to an orthonormal basis for #\mathbb{R}^n#.
Let #\vec{u}_i = \lambda_i^{-\frac12}A\vec{v}_i# for #i=1,\ldots,r#, where #\lambda_i# is the eigenvalue belonging to #\vec{v}_i#. This system is orthonormal (see theorem Diagonalizability of symmetric maps). Complete this system with #\vec{u}_{r+1},\ldots,\vec{u}_m# to an orthonormal basis for #\mathbb{R}^m#.
Let #U# be the matrix whose columns are the vectors #\vec{u}_1,\ldots,\vec{u}_m#, let #V# be the matrix whose columns are the vectors #\vec{v}_1,\ldots,\vec{v}_n#, and let #D# be the #(m\times n)#-matrix with #D_{jj}=\lambda_j^{\frac12}# for #j=1,\ldots,r# and all other entries equal to #0#. Then #U\,D\,V^\top# is a decomposition as required. It is clear that #D# is a generalized diagonal matrix of size #m\times n#. Because the selected bases are orthonormal, the matrices #U# and #V# are orthogonal.
We will prove that #A# coincides with #U\,D\,V^\top#. To this end, we show that #\dotprod{\vec{u}_i}{(U\,D\, V^\top\vec{v}_j)}# equals #\dotprod{\vec{u}_i}{(A\vec{v}_j)}# for all #i# and #j# with #1\le i\le m# and #1\le j\le n#. We begin by rewriting the first inner product:
\[\begin{array}{rcl}\dotprod{\vec{u}_i}{(U\,D\, V^\top\vec{v}_j)} &=& \dotprod{U^\top \vec{u}_i}{(DV^\top\vec{v}_j)} \\ &=& \dotprod{U^{-1} \vec{u}_i}{(DV^{-1}\vec{v}_j)} \\ &=&\dotprod{\vec{e}_i}{(D\vec{e}_j)}\end{array}\]
If #j\gt r#, then we have #\dotprod{\vec{u}_i}{(U\,D\, V^\top\vec{v}_j)}=0=\dotprod{\vec{u}_i}{(A\vec{v}_j)}# because #A\vec{v}_j=\vec{0}# and #D\vec{e}_j = \vec{0}#. The same holds if #i\gt r#, because then #A^\top\vec{u}_i=\vec{0}# and #D^\top U^\top \vec{u}_i = D^\top\vec{e}_i = \vec{0}#. We therefore assume that #i,j\le r#. We now have
\[\begin{array}{rcl}\dotprod{\vec{u}_i}{(U\,D\, V^\top\vec{v}_j)} &=&\dotprod{\vec{e}_i}{(D\vec{e}_j)}\\&=&\dotprod{\vec{e}_i}{(\lambda_j^{\frac12}\vec{e}_j)}\\ &= &\lambda_j^{\frac12}\delta_{ij}\\ &=& \lambda_j^{\frac12}\dotprod{\vec{u}_i}{\vec{u}_j}\\ &=&\dotprod{\vec{u}_i}{(A\vec{v}_j)} \end{array} \]
We conclude that #U\,D\, V^\top\vec{v}_j# and #A\vec{v}_j# have the same inner products with every column vector of #U#. These columns constitute a basis for #\mathbb{R}^m#, so this means that #U\,D\, V^\top\vec{v}_j=A\vec{v}_j# holds for every #j#. We conclude that #U\,D\, V^\top# and #A# have the same images on each column vector of #V#. Because these columns form a basis for #\mathbb{R}^n#, we have established that #U\,D\, V^\top=A#.
To deduce the structure of #U# we use the equality
\[ A\, A^\top =U\, (D\, D^\top) U^{-1}\] The matrix #D\,D^\top# is a diagonal matrix. This shows that the columns of #U# form an orthonormal basis of eigenvectors of #A\,A^\top#. In a similar way, the statement on the structure of #V# can be derived from the equality \(A^\top A =V\,(D^\top D)V^{-1}\).
Finally, we derive the expression of #A# as a sum of rank-one matrices:
\[\begin{array}{rcl}A &= &U\, D\, V^\top \\ &=&\displaystyle\sum_{j=1}^m\matrix{0&\cdots& 0&\vec{u}_j&0&\cdots&0}D\, \matrix{\vec{v}_1^\top\\ \vec{v}_2^\top\\ \vdots \\ \vec{v}_n^\top}\\ &=&\displaystyle\sum_{j=1}^r \lambda_j^{\frac12}\matrix{0&\cdots& 0&\vec{u}_j&0&\cdots&0} \matrix{\vec{v}_1^\top\\ \vec{v}_2^\top\\ \vdots \\\vec{v}_n^\top}\\&=&\displaystyle\sum_{j=1}^r \lambda_j^{\frac12}{\vec{u}_j} {\vec{v}_j^\top}\\\end{array}\]
The proof of the theorem provides a method for finding a singular value decomposition #U\,D\,V^\top# of an #(m\times n)#-matrix #A#. Let #r# be the rank of #A#. A code sketch of the whole procedure follows after the list.
- Calculate #A^\top A# and its eigenvalues #\lambda_j\gt0# in decreasing order for #j=1,\ldots,r#.
- For #D# we take the generalized diagonal matrix of size #m\times n# with the singular values #\sqrt{\lambda_j}# for #j=1,\ldots,r# on the diagonal; if necessary, this diagonal is augmented by zeros.
- For #V# we take a matrix whose columns form an orthonormal basis #\basis{\vec{v}_1,\vec{v}_2,\dots,\vec{v}_n}# of eigenvectors of #A^\top A# with eigenvalues #\lambda_j\ge0# for #j=1,\ldots,n#.
- For #U# we define #\vec{u}_j=\lambda_j^{-\frac12}A\vec{v}_j# if #j\le r# (that is, if #\lambda_j\gt 0#). This way we get an orthonormal system (in #\mathbb{R}^m#) of vectors, which we use as the first #r# columns of #U#. If needed, we augment it to an orthonormal basis of #\mathbb{R}^m# and use the new vectors to fill the remaining columns of #U#.
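The numpy sketch below carries out these four steps literally; svd_via_gram is a hypothetical helper name for illustration, and a production routine would not form #A^\top A# explicitly (doing so squares the condition number):

```python
import numpy as np

def svd_via_gram(A, tol=1e-10):
    """Sketch of the recipe above; not a robust, production SVD."""
    m, n = A.shape
    # Step 1: eigenvalues of A^T A, sorted in decreasing order.
    lam, V = np.linalg.eigh(A.T @ A)         # eigh returns ascending order
    order = np.argsort(-lam)
    lam, V = lam[order], V[:, order]         # V: step 3 of the recipe
    r = int(np.sum(lam > tol))               # rank of A

    # Step 2: generalized diagonal D with the singular values.
    D = np.zeros((m, n))
    D[:r, :r] = np.diag(np.sqrt(lam[:r]))

    # Step 4: u_j = lambda_j^{-1/2} A v_j for j <= r, ...
    U = np.zeros((m, m))
    U[:, :r] = (A @ V[:, :r]) / np.sqrt(lam[:r])
    if r < m:
        # ... augmented to an orthonormal basis of R^m via a full QR.
        U[:, r:] = np.linalg.qr(U[:, :r], mode='complete')[0][:, r:]
    return U, D, V

A = np.array([[14.0, 16.0, 4.0],
              [-2.0, -13.0, -22.0]])
U, D, V = svd_via_gram(A)
print(np.allclose(U @ D @ V.T, A))           # True
```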
The singular value decomposition is often abbreviated to SVD in the literature.
We determine a singular value decomposition of the matrix \[A=\left(\begin{array}{rrr}14&16&4\\ -2&-13&-22\end{array}\right)\] The matrix \[A^\top A=\matrix{200&250&100\\ 250&425&350\\ 100&350&500}\] has eigenvalues #900#, #225#, and #0#, and an orthonormal basis of eigenvectors:\[\vec{v}_1=\frac13\cv{1\\2\\2},\qquad\vec{v}_2=\frac13\cv{2\\1\\-2},\qquad\vec{v}_3=\frac13\cv{2\\-2\\1}\]We find #\sqrt{\lambda_1}=30# and #\sqrt{\lambda_2}=15#, and #\vec{u}_1=\frac1{30}A\vec{v}_1=\frac15\cv{3\\-4}# and #\vec{u}_2=\frac1{15}A\vec{v}_2=\frac15\cv{4\\3}#. So
\[ \matrix{14&16&4\\ -2&-13&-22} = \matrix{\frac35 &\frac45 \\
-\frac45&\frac35}\, \matrix{30&0&0\\ 0&15&0}\,
\matrix{\frac13&\frac23&\frac23\\ \frac23&\frac13&-\frac23\\ \frac23&-\frac23&\frac13} \] Furthermore, #A=30\vec{u}_1\vec{v}_1^\top+15\vec{u}_2\vec{v}_2^\top=\left(\begin{array}{rrr}6&12&12\\ -8&-16&-16
\end{array}\right)+\left(\begin{array}{rrr}8&4&-8\\ 6&3&-6 \end{array}\right)#.
By theorem Diagonalizability of symmetric matrices, a symmetric matrix #A# has a decomposition #A=X^{-1} \, D\, X = U\, D\, U^\top #, with #U=X^{-1}# an orthogonal matrix and #D# a diagonal matrix. The diagonal entries of #D# are the eigenvalues of #A#. If all eigenvalues of #A# are nonnegative (and ordered in decreasing size), this decomposition is a singular value decomposition #A = U\, D\, U^\top# with #V = U#. More generally, we can take the same #U# as before, replace #D # by # D\, T# and put #V=U\,T#, where #T# is a suitable diagonal matrix with diagonal entries from #\pm1#, so as to obtain a singular value decomposition for #A#.
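A short numpy sketch of this sign-fix construction, assuming an invertible symmetric matrix with a negative eigenvalue (the matrix is a made-up example; for a singular #A# the corresponding diagonal entry of #T# can be set to #1#):

```python
import numpy as np

A = np.array([[0.0, 2.0],
              [2.0, 3.0]])             # symmetric; eigenvalues 4 and -1

lam, U = np.linalg.eigh(A)
order = np.argsort(-np.abs(lam))       # decreasing |lambda|
lam, U = lam[order], U[:, order]

T = np.diag(np.sign(lam))              # diagonal entries +/-1 (A invertible)
D = np.diag(lam) @ T                   # = diag(|lambda_i|): singular values
V = U @ T

print(np.allclose(U @ D @ V.T, A))     # True: a singular value decomposition
```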
The singular value decomposition is not unique: in the algorithm we can replace a normalized eigenvector by its negative, and the orthonormal bases for the kernels of #A^\top A# and #A\, A^\top# provide freedom of choice. Thanks to the ordering by size of the diagonal entries, the diagonal matrix #D# is unique.
Calculate the singular value decomposition of the matrix
\[ A = \matrix{4 & 11 & 14 \\ 8 & 7 & -2 \\ } \]
\(\matrix{4 & 11 & 14 \\ 8 & 7 & -2 \\ } =\) \( \matrix{{{3}\over{\sqrt{10}}} & -{{1}\over{\sqrt{10}}} \\ {{1}\over{\sqrt{10}}} & {{3}\over{\sqrt{10}}} \\ }\, \matrix{6\cdot \sqrt{10} & 0 & 0 \\ 0 & 3\cdot \sqrt{10} & 0 \\ } \, \matrix{{{1}\over{3}} & {{2}\over{3}} & {{2}\over{3}} \\ {{2}\over{3}} & {{1}\over{3}} & -{{2}\over{3}} \\ {{2}\over{3}} & -{{2}\over{3}} & {{1}\over{3}} \\ } \)
We first calculate the matrix product
\[\begin{array}{rcl}
A^\top A &=& \matrix{80 & 100 & 40 \\ 100 & 170 & 140 \\ 40 & 140 & 200 \\ }\end{array}\] The characteristic polynomial of this matrix is \(p_{ A^\top A }(x) = \left(360-x\right)\cdot \left(x-90\right)\cdot x \). The eigenvalues of #A^\top A# are the solutions of the characteristic equation. These are #360#, #90#, and #0#. The singular values are the square roots of the eigenvalues unequal to #0#; these are #6\cdot \sqrt{10}# and #3\cdot \sqrt{10}#. Therefore, the matrix #D# in the singular value decomposition is
\[D = \matrix{6\cdot \sqrt{10} & 0 & 0 \\ 0 & 3\cdot \sqrt{10} & 0 \\ }\] In order to find the matrix #V# we look for an orthonormal basis of eigenvectors of #A^\top A#. Determination of orthonormal bases for the null space of #A^\top A-\lambda \, I_3# for #\lambda= 360 , 90 , 0 # gives
\[ \begin{array}{ll} \\ \text{ eigenvalue } 360 :&\matrix{{{1}\over{3}} \\ {{2}\over{3}} \\ {{2}\over{3}} \\ } , \\ \text{ eigenvalue } 90 :&\matrix{{{2}\over{3}} \\ {{1}\over{3}} \\ -{{2}\over{3}} \\ } , \\ \text{ eigenvalue } 0 :&\matrix{{{2}\over{3}} \\ -{{2}\over{3}} \\ {{1}\over{3}} \\ } \end{array}\] Thus we find
\[ V = \matrix{{{1}\over{3}} & {{2}\over{3}} & {{2}\over{3}} \\ {{2}\over{3}} & {{1}\over{3}} & -{{2}\over{3}} \\ {{2}\over{3}} & -{{2}\over{3}} & {{1}\over{3}} \\ }\] Finally, we need to find the matrix #U# of the singular value decomposition. To this end we calculate the images #D_{ii}^{-1}A\vec{v}_i# for #i=1,2# where #\vec{v}_i# is the #i#-th column of #V# and #D_{ii}# is the #i#-th diagonal element of #D#. This gives the complete matrix #U#:
\[ U = \matrix{{{3}\over{\sqrt{10}}} & -{{1}\over{\sqrt{10}}} \\ {{1}\over{\sqrt{10}}} & {{3}\over{\sqrt{10}}} \\ }\] Thus we have found the following singular value decomposition for #A#:
\[\matrix{4 & 11 & 14 \\ 8 & 7 & -2 \\ } = \matrix{{{3}\over{\sqrt{10}}} & -{{1}\over{\sqrt{10}}} \\ {{1}\over{\sqrt{10}}} & {{3}\over{\sqrt{10}}} \\ }\, \matrix{6\cdot \sqrt{10} & 0 & 0 \\ 0 & 3\cdot \sqrt{10} & 0 \\ } \, \matrix{{{1}\over{3}} & {{2}\over{3}} & {{2}\over{3}} \\ {{2}\over{3}} & {{1}\over{3}} & -{{2}\over{3}} \\ {{2}\over{3}} & -{{2}\over{3}} & {{1}\over{3}} \\ } \]
The corresponding expression as a sum of rank-one matrices is
\[\begin{array}{rcl}A& =&\displaystyle\sum_{i=1}^{2}D_{ii}\, \vec{u}_i\,\vec{v}_i^\top \\ & =& 6\cdot \sqrt{10} \, \matrix{{{3}\over{\sqrt{10}}} \\ {{1}\over{\sqrt{10}}} \\ } \, \matrix{{{1}\over{3}} & {{2}\over{3}} & {{2}\over{3}} \\ } + 3\cdot \sqrt{10} \, \matrix{-{{1}\over{\sqrt{10}}} \\ {{3}\over{\sqrt{10}}} \\ } \, \matrix{{{2}\over{3}} & {{1}\over{3}} & -{{2}\over{3}} \\ } \\ & =& \matrix{6 & 12 & 12 \\ 2 & 4 & 4 \\ } + \matrix{-2 & -1 & 2 \\ 6 & 3 & -6 \\ } \end{array}\]
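As a final check, numpy reproduces both the singular values and the rank-one sum of this exercise (a sketch; #6\cdot \sqrt{10}\approx 18.97# and #3\cdot \sqrt{10}\approx 9.49#):

```python
import numpy as np

A = np.array([[4.0, 11.0, 14.0],
              [8.0, 7.0, -2.0]])
print(np.linalg.svd(A, compute_uv=False))   # [18.97... 9.48...]

u1, v1 = np.array([3, 1]) / np.sqrt(10), np.array([1, 2, 2]) / 3
u2, v2 = np.array([-1, 3]) / np.sqrt(10), np.array([2, 1, -2]) / 3
A_sum = 6*np.sqrt(10)*np.outer(u1, v1) + 3*np.sqrt(10)*np.outer(u2, v2)
print(np.allclose(A_sum, A))                # True
```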