Section 7.4 Gram-Schmidt Orthogonalization and QR-Factorization

In this section we explore the notion of starting with a linearly independent set \(\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\}\) and transforming it into an orthonormal basis \(\{\vec{q_1},\vec{q_2},\ldots,\vec{q_n}\}\) of span\(\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\}.\) Along with the notion of projection, we will first need some additional tools.

Proof.

The proof is a worksheet exercise.

Definition 7.4.2. Orthogonal matrices.

An orthogonal matrix is an \(n\times n\) matrix with orthonormal columns. (We wish they were called orthonormal matrices, but they are not.)
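As an informal numerical check of this definition, here is a small sketch in Python with numpy (the helper name is_orthogonal is ours, not the text's) that tests whether a square matrix has orthonormal columns by verifying \(Q^TQ=I:\)

import numpy as np

def is_orthogonal(Q, tol=1e-12):
    """Return True if the square matrix Q has orthonormal columns,
    i.e. if Q^T Q equals the identity up to floating-point tolerance."""
    return np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=tol)

# The matrix Q from Example 7.4.13 below is orthogonal:
Q = np.array([[ 1/3, -14/15,   2/15],
              [-2/3,  -1/3,   -2/3 ],
              [ 2/3,   2/15, -11/15]])
print(is_orthogonal(Q))  # True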

Proof.

The proof is a worksheet exercise.

Theorem 7.4.4.

Matrix-vector multiplication by an orthogonal matrix preserves the length of a vector and the inner product of two vectors.

Proof.

The proof is a homework exercise.
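As a quick numerical illustration of this fact (our own sketch; the vectors x and y are arbitrary choices), multiplication by the orthogonal matrix of Example 7.4.13 leaves lengths and dot products unchanged:

import numpy as np

Q = np.array([[ 1/3, -14/15,   2/15],
              [-2/3,  -1/3,   -2/3 ],
              [ 2/3,   2/15, -11/15]])
x = np.array([1.0, 2.0, 3.0])
y = np.array([-1.0, 0.0, 4.0])

# Length is preserved: ||Qx|| = ||x||
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # True
# The inner product is preserved: (Qx).(Qy) = x.y
print(np.isclose((Q @ x) @ (Q @ y), x @ y))                  # True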

Remark 7.4.5.

In light of this definition we may recast our problem in this section as that of creating an orthonormal basis from a linearly independent set. Understanding the process by which we will accomplish this requires an appreciation for one particularly interesting property of projection matrices, which is illustrated by the next two theorems.

Theorem 7.4.6.

If \(\P\) is an \(m\times m\) projection matrix, then col\((\I-\P)=N(\P).\)

Proof.

We want to demonstrate subset inclusion in each direction. First fix a vector of coefficients \(\b\in\R^m\) and form the linear combination \((\I-\P)\b\) of the columns of \(\I-\P.\) To see if this is in \(N(\P)\) we multiply on the left by \(\P:\)
\begin{equation*} \P(\I-\P)\b=(\P-\P^2)\b=0_{m\times m}\b=\vec{0} \end{equation*}
so indeed col\((\I-\P)\subseteq N(\P).\) Next, to show that \(N(\P)\subseteq\) col\((\I-\P),\) note that if \(\b\in N(\P)\) so that \(\P\b=\vec{0},\) then \((\I-\P)\b=\b\) which tells us that \(\b\in\) col\((\I-\P)\) since there is a coefficient vector (namely \(\b\) itself) from which we can form \(\b\) as a linear combination of the columns of \((\I-\P).\) Thus \(N(\P)\) is a subset of col\((\I-\P)\) and hence col\((\I-\P)=N(\P)\) as desired.
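To make the theorem concrete, here is a small numerical sketch (the matrix \(A\) and the coefficient vector \(\b\) below are arbitrary choices of ours): we build the projection matrix \(\P=A(A^TA)^{-1}A^T\) onto col\((A)\) and check that \(\P(\I-\P)\b=\vec{0}:\)

import numpy as np

# Projection matrix onto col(A) for an arbitrary full-rank A (3x2).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T  # P is 3x3 with P^2 = P
I = np.eye(3)

b = np.array([3.0, -1.0, 2.0])        # any coefficient vector in R^3
# (I - P)b is a linear combination of the columns of I - P, and P kills it:
print(np.allclose(P @ (I - P) @ b, 0))  # True, so col(I-P) lies in N(P)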

Theorem 7.4.7.

If \(\P\) is an \(m\times m\) projection matrix, then col\((\P)=N(\I-\P).\)

Proof.

This follows by applying Theorem 7.4.6 to the projection matrix \(\I-\P.\)

Remark 7.4.8.

Theorem 7.4.6 and Theorem 7.4.7 show that any projection matrix \(\P\) decomposes \(\R^m\) into two orthogonal subspaces col\((\P)\) and col\((\I-\P):\) for any \(\x\in\) col\((\P)\) and \(\y\in\) col\((\I-\P)\) we have \(\x\perp\y.\) This notion forms the basis for a process by which we take a linearly independent set \(\left\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\right\}\) and transform it into an orthonormal basis \(\left\{\vec{q_1},\vec{q_2},\ldots,\vec{q_n}\right\}\) of span\(\left\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\right\}.\) This process is called Gram-Schmidt orthogonalization.
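A short numerical check of this orthogonality (a sketch under the same arbitrary choice of \(A\) as in the previous sketch): any \(\x=\P\vec{b_1}\) and \(\y=(\I-\P)\vec{b_2}\) have zero dot product.

import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [0.0, 2.0]])
P = A @ np.linalg.inv(A.T @ A) @ A.T  # projection onto col(A)
I = np.eye(3)

rng = np.random.default_rng(0)
b1, b2 = rng.standard_normal(3), rng.standard_normal(3)
x, y = P @ b1, (I - P) @ b2           # x in col(P), y in col(I-P)
print(np.isclose(x @ y, 0.0))         # True: x is orthogonal to y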

Remark 7.4.9.

Figure 7.4.10. Step 2 of the Gram-Schmidt process
Suppose \(\left\{\vec{v_1},\vec{v_2},\ldots,\vec{v_n}\right\}\subset\R^m\) is a linearly independent set.
First we create an orthogonal (but not necessarily orthonormal) basis \(\left\{\vec{u_i}\right\}_{i=1}^n\) as follows.
We select the initial direction to be that of \(\vec{v_1}\) and set \(\vec{u_1}=\vec{v_1}\text{.}\)
We create \(\vec{u_2}\) by decomposing \(\vec{v_2}\) using Theorem 7.1.27:
\begin{gather} \vec{v_2}=\vec{u_2}+\frac{\dpr{\vec{u_1}}{\vec{v_2}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}\tag{7.4.1} \end{gather}
which yields
\begin{gather} \vec{u_2}=\vec{v_2}-\frac{\dpr{\vec{u_1}}{\vec{v_2}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}\tag{7.4.2} \end{gather}
Note that \(\vec{u_2}\perp\vec{u_1}\) (take dot products).
To create \(\vec{u_3}\) we decompose \(\vec{v_3}\) as:
\begin{gather} \vec{v_3}=\vec{u_3}+\frac{\dpr{\vec{u_1}}{\vec{v_3}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}+\frac{\dpr{\vec{u_2}}{\vec{v_3}}}{\dpr{\vec{u_2}}{\vec{u_2}}}\vec{u_2}\tag{7.4.3} \end{gather}
yielding
\begin{gather} \vec{u_3}=\vec{v_3}-\frac{\dpr{\vec{u_1}}{\vec{v_3}}}{\dpr{\vec{u_1}}{\vec{u_1}}}\vec{u_1}-\frac{\dpr{\vec{u_2}}{\vec{v_3}}}{\dpr{\vec{u_2}}{\vec{u_2}}}\vec{u_2}\tag{7.4.4} \end{gather}
Taking dot products shows \(\vec{u_3}\perp\vec{u_1}\) and \(\vec{u_3}\perp\vec{u_2}\text{.}\)
We continue in this manner until the set of linearly independent vectors is exhausted. In general the \(j^{\text{th}}\) vector is
\begin{gather} \vec{u_j}=\vec{v_j}-\sum_{i=1}^{j-1}\frac{\dpr{\vec{u_i}}{\vec{v_j}}}{\dpr{\vec{u_i}}{\vec{u_i}}}\vec{u_i}\tag{7.4.5} \end{gather}
and taking dot products shows that \(\vec{u_j}\) is orthogonal to each of \(\vec{u_1},\ldots,\vec{u_{j-1}}.\)
Finally we normalize each of the \(\vec{u_j}\) to obtain the orthonormal vectors \(\vec{q_j}.\)
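The process just described translates directly into code. Here is a minimal sketch of classical Gram-Schmidt in Python with numpy (the function name gram_schmidt is ours); like the text, it assumes the input columns are linearly independent.

import numpy as np

def gram_schmidt(V):
    """Classical Gram-Schmidt.  V is m x n with linearly independent
    columns v_1,...,v_n; returns the m x n matrix whose orthonormal
    columns q_1,...,q_n span the same subspace."""
    m, n = V.shape
    U = np.zeros((m, n))
    for j in range(n):
        u = V[:, j].copy()
        for i in range(j):
            # Subtract the projection of v_j onto u_i, as in (7.4.5).
            u -= (U[:, i] @ V[:, j]) / (U[:, i] @ U[:, i]) * U[:, i]
        U[:, j] = u
    return U / np.linalg.norm(U, axis=0)  # normalize each u_j to get q_j

V = np.array([[  5.0, -29.0, -16.0],
              [-10.0,  25.0, -40.0],
              [ 10.0, -28.0,  13.0]])
Q = gram_schmidt(V)
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: the columns are orthonormal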
A neat feature of the Gram-Schmidt process is that it provides another factorization of a matrix. How? We move the subtracted terms in (7.4.5) to the other side to get
\begin{align} \vec{v_1}\amp=\vec{u_1}\,=\,\left\|\vec{u_1}\right\|\vec{q_1}\notag\\ \vec{v_2}\amp=\vec{u_2}+\frac{\vec{u_1}^T\vec{v_2}}{\vec{u_1}^T\vec{u_1}}\vec{u_1}\,=\,\left\|\vec{u_2}\right\|\vec{q_2}+\frac{\vec{u_1}^T\vec{v_2}}{\left\|\vec{u_1}\right\|}\frac{\vec{u_1}}{\left\|\vec{u_1}\right\|}\notag\\ \amp=\left\|\vec{u_2}\right\|\vec{q_2}+\left(\dpr{\vec{q_1}}{\vec{v_2}}\right)\vec{q_1}\notag\\ \vec{v_3}\amp=\vec{u_3}+\frac{\vec{u_1}^T\vec{v_3}}{\vec{u_1}^T\vec{u_1}}\vec{u_1}+\frac{\vec{u_2}^T\vec{v_3}}{\vec{u_2}^T\vec{u_2}}\vec{u_2}\notag\\ \amp=\left\|\vec{u_3}\right\|\vec{q_3}+\left(\dpr{\vec{q_1}}{\vec{v_3}}\right)\vec{q_1}+\left(\dpr{\vec{q_2}}{\vec{v_3}}\right)\vec{q_2}\notag\\ \amp \qquad\qquad\qquad\vdots\nonumber\notag\\ \vec{v_n}\amp=\vec{u_n}+\sum_{i=1}^{n-1}\frac{\vec{u_i}^T\vec{v_n}}{\vec{u_i}^T\vec{u_i}}\vec{u_i}\notag\\ \amp=\left\|\vec{u_n}\right\|\vec{q_n}+\sum_{i=1}^{n-1}\left(\dpr{\vec{q_i}}{\vec{v_n}}\right)\vec{q_i}\tag{7.4.6} \end{align}
which expresses each vector \(\vec{v_j}\) as a linear combination of the \(\displaystyle{\{\vec{q_i}\}_{i\le j}}\) whose coefficients are the dot products of \(\vec{v_j}\) with the unit direction vectors in each of the directions determined up to that point.
Remember the \(LU\)-factorization? We may now construct the \(QR\)-factorization, in which we write \(\A\) as the product of an orthogonal matrix and an upper-triangular matrix. This factorization is made explicit in the next theorem.

Proof.

The proof, a simple calculation, is a worksheet exercise.
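Before the worked example, here is a minimal numerical sketch of the factorization (our own illustration, not the theorem's proof): orthonormalize the columns of \(\A\) by Gram-Schmidt, then read off \(R=Q^T\A\), which by (7.4.6) is upper triangular with the \(\left\|\vec{u_j}\right\|\) on its diagonal.

import numpy as np

A = np.array([[  5.0, -29.0, -16.0],
              [-10.0,  25.0, -40.0],
              [ 10.0, -28.0,  13.0]])

# Classical Gram-Schmidt with normalized vectors q_i.
m, n = A.shape
Q = np.zeros((m, n))
for j in range(n):
    u = A[:, j].copy()
    for i in range(j):
        u -= (Q[:, i] @ A[:, j]) * Q[:, i]  # subtract (q_i . v_j) q_i
    Q[:, j] = u / np.linalg.norm(u)

R = Q.T @ A                        # upper triangular by (7.4.6)
print(np.allclose(A, Q @ R))       # True: A = QR
print(np.allclose(R, np.triu(R)))  # True: R is upper triangular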

Example 7.4.13.

Find the \(QR\)-factorization of \(\A=\left(\begin{array}{rrr}5\amp -29\amp -16\\-10\amp 25\amp -40\\10\amp -28\amp 13 \end{array} \right).\)
Solution: Applying Gram-Schmidt to the columns of \(\A\) yields \(\A=\boldsymbol{Q}\boldsymbol{R}=\left(\begin{array}{rrr}1/3\amp -14/15\amp 2/15\\-2/3\amp -1/3\amp -2/3\\2/3\amp 2/15\amp -11/15 \end{array} \right)\left(\begin{array}{rrr}15\amp -45\amp 30\\0\amp 15\amp 30\\0\amp 0\amp 15 \end{array} \right).\)
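As a numerical sanity check of this example (our own sketch), we can confirm that \(\boldsymbol{Q}\boldsymbol{R}\) recovers \(\A\) and that \(\boldsymbol{Q}\) is orthogonal:

import numpy as np

A = np.array([[  5, -29, -16],
              [-10,  25, -40],
              [ 10, -28,  13]], dtype=float)
Q = np.array([[ 1/3, -14/15,   2/15],
              [-2/3,  -1/3,   -2/3 ],
              [ 2/3,   2/15, -11/15]])
R = np.array([[15, -45, 30],
              [ 0,  15, 30],
              [ 0,   0, 15]], dtype=float)

print(np.allclose(Q @ R, A))            # True: QR recovers A
print(np.allclose(Q.T @ Q, np.eye(3)))  # True: Q is orthogonal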