Section 7.4 Gram-Schmidt Orthogonalization and QR-Factorization

In this section we explore the notion of starting with a linearly independent set \(\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\}\) and transforming it into an orthonormal basis \(\{\vec{q_1},\vec{q_2},\ldots,\vec{q_n}\}\) of span\(\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\}.\) Along with the notion of projection, we will first need some additional tools.

Proof.

The proof is a worksheet exercise.

Definition 7.4.2. Orthogonal matrices.

An orthogonal matrix is an \(n\times n\) matrix with orthonormal columns. (We wish they were called orthonormal matrices, but they are not.)
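As an informal numerical check of this definition, here is a small sketch in Python with numpy (the helper name is_orthogonal is ours, not the text's) that tests whether a square matrix has orthonormal columns by verifying \(Q^TQ=I:\)

import numpy as np

def is_orthogonal(Q, tol=1e-12):
    """Return True if the square matrix Q has orthonormal columns,
    i.e. if Q^T Q equals the identity up to floating-point tolerance."""
    return np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=tol)

# The matrix Q from Example 7.4.13 below is orthogonal:
Q = np.array([[ 1/3, -14/15,   2/15],
              [-2/3,  -1/3,   -2/3 ],
              [ 2/3,   2/15, -11/15]])
print(is_orthogonal(Q))  # True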

Proof.

The proof is a worksheet exercise.

Theorem 7.4.4.

Matrix-vector multiplication by an orthogonal matrix preserves the length of a vector and the inner product of two vectors.

Proof.

The proof is a homework exercise.
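As a quick numerical illustration of this fact (our own sketch; the vectors x and y are arbitrary choices), multiplication by the orthogonal matrix of Example 7.4.13 leaves lengths and dot products unchanged:

import numpy as np

Q = np.array([[ 1/3, -14/15,   2/15],
              [-2/3,  -1/3,   -2/3 ],
              [ 2/3,   2/15, -11/15]])
x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.0, 4.0])

# Length is preserved: ||Qx|| = ||x||
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True
# The inner product is preserved: (Qx).(Qy) = x.y
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                  # True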

Remark 7.4.5.

In light of this definition we may recast our problem in this section as that of creating an orthonormal basis from a linearly independent set. Understanding the process by which we will accomplish this requires an appreciation for one particularly interesting property of projection matrices, which is illustrated by the next two theorems.

Theorem 7.4.6.

If \(\P\) is an \(m\times m\) projection matrix, then col\((\I-\P)=N(\P).\)

Proof.

We want to demonstrate subset inclusion in each direction. First fix a vector of coefficients \(\b\in\R^m\) and form the linear combination \((\I-\P)\b\) of the columns of \(\I-\P.\) To see if this is in \(N(\P)\) we multiply on the left by \(\P:\)
\begin{equation*} \P(\I-\P)\b=(\P-\P^2)\b=0_{m\times m}\b=\vec{0} \end{equation*}
so indeed col\((\I-\P)\subseteq N(\P).\) Next, to show that \(N(\P)\subseteq\) col\((\I-\P),\) note that if \(\b\in N(\P)\) so that \(\P\b=\vec{0},\) then \((\I-\P)\b=\b\) which tells us that \(\b\in\) col\((\I-\P)\) since there is a coefficient vector (namely \(\b\) itself) from which we can form \(\b\) as a linear combination of the columns of \((\I-\P).\) Thus \(N(\P)\) is a subset of col\((\I-\P)\) and hence col\((\I-\P)=N(\P)\) as desired.
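To make the theorem concrete, here is a small numerical sketch (the matrix \(A\) and the coefficient vector \(\b\) below are arbitrary choices of ours): we build the projection matrix \(\P=A(A^TA)^{-1}A^T\) onto col\((A)\) and check that \(\P(\I-\P)\b=\vec{0}:\)

import numpy as np

# Projection matrix onto col(A) for an arbitrary full-rank A (3x2).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T  # P is 3x3 with P^2 = P
I = np.eye(3)

b = np.array([3.0, -1.0, 2.0])        # any coefficient vector in R^3
# (I - P)b is a linear combination of the columns of I - P, and P kills it:
print(np.allclose(P @ (I - P) @ b, 0))  # True, so col(I-P) lies in N(P)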

Theorem 7.4.7.

If \(\P\) is an \(m\times m\) projection matrix, then col\((\P)=N(\I-\P).\)

Proof.

This follows by applying Theorem 7.4.6 to the projection matrix \(\I-\P.\)

Remark 7.4.8.

Theorem 7.4.6 and Theorem 7.4.7 show that any projection matrix \(\P\) decomposes \(\R^m\) into two orthogonal subspaces col\((\P)\) and col\((\I-\P):\) for any \(\x\in\) col\((\P)\) and \(\y\in\) col\((\I-\P)\) we have \(\x\perp\y.\) This notion forms the basis for a process by which we take a linearly independent set \(\left\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\right\}\) and transform it into an orthonormal basis \(\left\{\vec{q_1},\vec{q_2},\ldots,\vec{q_n}\right\}\) of span\(\left\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\right\}.\) This process is called Gram-Schmidt orthogonalization.
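A short numerical check of this orthogonality (a sketch under the same arbitrary choice of \(A\) as in the previous sketch): any \(\x=\P\vec{b_1}\) and \(\y=(\I-\P)\vec{b_2}\) have zero dot product.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T  # projection onto col(A)
I = np.eye(3)

rng = np.random.default_rng(0)
b1, b2 = rng.standard_normal(3), rng.standard_normal(3)
x, y = P @ b1, (I - P) @ b2           # x in col(P), y in col(I-P)
print(np.isclose(x @ y, 0.0))         # True: x is orthogonal to y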

Remark 7.4.9.

Figure 7.4.10. Step 2 of the Gram-Schmidt process
Suppose \(\left\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\right\}\subset\R^m\) is a linearly independent set.
First we create an orthogonal (but not necessarily orthonormal) basis \(\left\{\vec{u_i}\right\}_{i=1}^n\) as follows.
We select the initial direction to be that of \(\vec{v_1}\) and set \(\vec{u_1}=\vec{v_1}\text{.}\)
We create \(\vec{u_2}\) by decomposing \(\vec{v_2}\) using Theorem 7.1.27:
\begin{gather} \vec{v_2}=\vec{u_2}+\frac{\dpr{\vec{u_1}}{\vec{v_2}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}\tag{7.4.1} \end{gather}
which yields
\begin{gather} \vec{u_2}=\vec{v_2}-\frac{\dpr{\vec{u_1}}{\vec{v_2}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}\tag{7.4.2} \end{gather}
Note that \(\vec{u_2}\perp\vec{u_1}\) (take dot products).
To create \(\vec{u_3}\) we decompose \(\vec{v_3}\) as:
\begin{gather} \vec{v_3}=\vec{u_3}+\frac{\dpr{\vec{u_1}}{\vec{v_3}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}+\frac{\dpr{\vec{u_2}}{\vec{v_3}}}{\dpr{\vec{u_2}}{\vec{u_2}}}\vec{u_2}\tag{7.4.3} \end{gather}
yielding
\begin{gather} \vec{u_3}=\vec{v_3}-\frac{\dpr{\vec{u_1}}{\vec{v_3}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}-\frac{\dpr{\vec{u_2}}{\vec{v_3}}}{\dpr{\vec{u_2}}{\vec{u_2}}}\vec{u_2}\tag{7.4.4} \end{gather}
Taking dot products shows \(\vec{u_3}\perp\vec{u_1}\) and \(\vec{u_3}\perp\vec{u_2}\text{.}\)
We continue in this manner until the set of linearly independent vectors is exhausted. In general the \(j^{\text{th}}\) vector is
\begin{gather} \vec{u_j}=\vec{v_j}-\sum_{i=1}^{j-1}\frac{\dpr{\vec{u_i}}{\vec{v_j}}}{\dpr{\vec{u_i}}{\vec{u_i}}}\vec{u_i}\tag{7.4.5} \end{gather}
and taking dot products shows that \(\vec{u_j}\) is orthogonal to each of \(\vec{u_1},\ldots,\vec{u_{j-1}}.\)
Finally we normalize each of the \(\vec{u_j}\) to obtain the orthonormal vectors \(\vec{q_j}.\)
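The process just described translates directly into code. Here is a minimal sketch of classical Gram-Schmidt in Python with numpy (the function name gram_schmidt is ours); like the text, it assumes the input columns are linearly independent.

import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt.  V is m x n with linearly independent
    columns v_1,...,v_n; returns the m x n matrix whose orthonormal
    columns q_1,...,q_n span the same subspace."""
    m, n = V.shape
    U = np.zeros((m, n))
    for j in range(n):
        u = V[:, j].copy()
        for i in range(j):
            # Subtract the projection of v_j onto u_i, as in (7.4.5).
            u -= (U[:, i] @ V[:, j]) / (U[:, i] @ U[:, i]) * U[:, i]
        U[:, j] = u
    return U / np.linalg.norm(U, axis=0)  # normalize each u_j to get q_j

V = np.array([[  5.0, -29.0, -16.0],
              [-10.0,  25.0, -40.0],
              [ 10.0, -28.0,  13.0]])
Q = gram_schmidt(V)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: the columns are orthonormal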
A neat feature of the Gram-Schmidt process is that it provides another factorization of a matrix. How? We move the subtracted terms in (7.4.5) to the other side to get
\begin{align} \vec{v_1}\amp=\vec{u_1}\,=\,\left\|\vec{u_1}\right\|\vec{q_1}\notag\\ \vec{v_2}\amp=\vec{u_2}+\frac{\vec{u_1}^T\vec{v_2}}{\vec{u_1}^T\vec{u_1}}\vec{u_1}\,=\,\left\|\vec{u_2}\right\|\vec{q_2}+\frac{\vec{u_1}^T\vec{v_2}}{\left\|\vec{u_1}\right\|}\frac{\vec{u_1}}{\left\|\vec{u_1}\right\|}\notag\\ \amp=\left\|\vec{u_2}\right\|\vec{q_2}+\left(\dpr{\vec{q_1}}{\vec{v_2}}\right)\vec{q_1}\notag\\ \vec{v_3}\amp=\vec{u_3}+\frac{\vec{u_1}^T\vec{v_3}}{\vec{u_1}^T\vec{u_1}}\vec{u_1}+\frac{\vec{u_2}^T\vec{v_3}}{\vec{u_2}^T\vec{u_2}}\vec{u_2}\notag\\ \amp=\left\|\vec{u_3}\right\|\vec{q_3}+\left(\dpr{\vec{q_1}}{\vec{v_3}}\right)\vec{q_1}+\left(\dpr{\vec{q_2}}{\vec{v_3}}\right)\vec{q_2}\notag\\ \amp \qquad\qquad\qquad\vdots\nonumber\notag\\ \vec{v_n}\amp=\vec{u_n}+\sum_{i=1}^{n-1}\frac{\vec{u_i}^T\vec{v_n}}{\vec{u_i}^T\vec{u_i}}\vec{u_i}\notag\\ \amp=\left\|\vec{u_n}\right\|\vec{q_n}+\sum_{i=1}^{n-1}\left(\dpr{\vec{q_i}}{\vec{v_n}}\right)\vec{q_i}\tag{7.4.6} \end{align}
which expresses each vector \(\vec{v_j}\) as a linear combination of the \(\displaystyle{\{\vec{q_i}\}_{i\le j}}\) whose coefficients are the dot products of \(\vec{v_j}\) with the unit direction vectors in each of the directions determined up to that point.
Remember the \(LU\)-factorization? We may now construct the \(QR\)-factorization, in which we write \(\A\) as the product of an orthogonal matrix and an upper-triangular matrix. This factorization is made explicit in the next theorem.

Proof.

The proof, a simple calculation, is a worksheet exercise.
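Before the worked example, here is a minimal numerical sketch of the factorization (our own illustration, not the theorem's proof): orthonormalize the columns of \(\A\) by Gram-Schmidt, then read off \(R=Q^T\A\), which by (7.4.6) is upper triangular with the \(\left\|\vec{u_j}\right\|\) on its diagonal.

import numpy as np

A = np.array([[  5.0, -29.0, -16.0],
              [-10.0,  25.0, -40.0],
              [ 10.0, -28.0,  13.0]])

# Classical Gram-Schmidt with normalized vectors q_i.
m, n = A.shape
Q = np.zeros((m, n))
for j in range(n):
    u = A[:, j].copy()
    for i in range(j):
        u -= (Q[:, i] @ A[:, j]) * Q[:, i]  # subtract (q_i . v_j) q_i
    Q[:, j] = u / np.linalg.norm(u)

R = Q.T @ A                        # upper triangular by (7.4.6)
print(np.allclose(A, Q @ R))       # True: A = QR
print(np.allclose(R, np.triu(R)))  # True: R is upper triangular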

Example 7.4.13.

Find the \(QR\)-factorization of \(\A=\left(\begin{array}{rrr}5\amp -29\amp -16\\-10\amp 25\amp -40\\10\amp -28\amp 13 \end{array} \right).\)
Solution: Applying Gram-Schmidt to the columns of \(\A\) yields \(\A=\boldsymbol{Q}\boldsymbol{R}=\left(\begin{array}{rrr}1/3\amp -14/15\amp 2/15\\-2/3\amp -1/3\amp -2/3\\2/3\amp 2/15\amp -11/15 \end{array} \right)\left(\begin{array}{rrr}15\amp -45\amp 30\\0\amp 15\amp 30\\0\amp 0\amp 15 \end{array} \right).\)
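As a numerical sanity check of this example (our own sketch), we can confirm that \(\boldsymbol{Q}\boldsymbol{R}\) recovers \(\A\) and that \(\boldsymbol{Q}\) is orthogonal:

import numpy as np

A = np.array([[  5, -29, -16],
              [-10,  25, -40],
              [ 10, -28,  13]], dtype=float)
Q = np.array([[ 1/3, -14/15,   2/15],
              [-2/3,  -1/3,   -2/3 ],
              [ 2/3,   2/15, -11/15]])
R = np.array([[15, -45, 30],
              [ 0,  15, 30],
              [ 0,   0, 15]], dtype=float)

print(np.allclose(Q @ R, A))            # True: QR recovers A
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal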