The notion of projecting to a subspace is fundamental to applied linear algebra. Given \(\v\in V\) and a subspace \(S\) of \(V,\) the process of finding the closest vector in \(S\) to \(\v\) underlies the solution to such diverse problems as finding a least-squares regression line, resolving physical forces, compressing data, modeling video games, and many more.
Subsection 7.1.1 Orthogonal complements, projections and projectors
Definition 7.1.1. Orthogonal complement of a subspace.
Let \(V\) be an inner product space and \(S\) a subspace of \(V.\) Then the orthogonal complement \(\,S^\perp\) of \(S\) is the set of all vectors in \(V\) which are orthogonal to every vector in \(S\) (a set which always contains \(\vec{0}\)). That is
The proof for general vector spaces lies a bit beyond the scope of this course, but we can prove it for \(V=\R^n.\)
Let \(S\) be a subspace of \(\R^n.\) If \(\dim(S)=0\) then \(S=\left\{\vec{0}\right\}\) and \(S^\perp=\R^n.\) In this case, since \(0+n=n\) the theorem holds.
So suppose \(\dim(S)=k\gt0\) and fix a basis \(\vec{v_1},\vec{v_2},\ldots,\vec{v_k}\) for \(S.\) Form the matrix
The proof is accomplished by establishing two subset inclusions: \(S\subset\left(S^\perp\right)^\perp,\) which is relatively easy, and \(\left(S^\perp\right)^\perp\subset S,\) which is beyond the scope of this course (interested readers may consult [1] for a proof depending on the notion of direct sum).
So far we have collected some general theory about orthogonal complements. After defining projectors and stating a crucial lemma we consider the notion of projecting specifically to a subspace.
Definition 7.1.9. Operators.
An operator \(\,\boldsymbol{\mathcal{F}}\) on a vector space \(V\) is a function \(\boldsymbol{\mathcal{F}}:V\rightarrow V.\)
Definition 7.1.10. Projectors.
Let \(V\) be a vector space. An operator \(\boldsymbol{\mathcal{P}}\) on \(V\) for which two consecutive applications of \(\boldsymbol{\mathcal{P}}\) have the same effect as a single application of \(\boldsymbol{\mathcal{P}}\) is called a projector. That is, \(\boldsymbol{\mathcal{P}}\) is a projector on \(V\) if for all \(\v\in V,\)
The identity operator on a vector space \(V\) is defined by \(\boldsymbol{\mathcal{I}}\v=\v,\) for all \(\v\in V.\)
Theorem 7.1.12. \(\boldsymbol{\mathcal{P}}\) and \(\boldsymbol{\mathcal{I}}-\boldsymbol{\mathcal{P}}\).
Let \(V\) be a vector space and \(\boldsymbol{\mathcal{P}}\) a projector defined on \(V.\) Denote by \(\boldsymbol{\mathcal{I}}\) the identity operator on \(V.\) Then \(\boldsymbol{\mathcal{I}}-\boldsymbol{\mathcal{P}}\) is a projector on \(V.\)
Proof.
The proof is a worksheet exercise.
Definition 7.1.13. The orthogonal projection of a vector to a subspace.
Let \(S\) be a subspace of a vector space \(V\) and fix \(\v\in V.\) The orthogonal projection of \(\v\) to \(S,\) denoted \(\v_S,\) is the unique vector in \(S\) which is closest to \(\v\) as measured by the Euclidean distance. That is, over all \(\u\in S,\,\|\v-\u\|\) is minimized when \(\u=\v_S.\)
In the next definition note that we do not give an explicit formula for the orthogonal projector; rather, as for any function, we define it by specifying its output.
Definition 7.1.14. The orthogonal projector.
Let \(S\) be a subspace of a vector space \(V\) and fix \(\v\in V.\) Since \(\v_S\) is unique we define the orthogonal projector on \(V\) associated with \(S\) as the function \(\boldsymbol{\mathcal{P}}_S:\,V\rightarrow S\) where \(\boldsymbol{\mathcal{P}}_S\v=\v_S.\)
Unless an orthogonal projector satisfies Definition 7.1.10, it is misnamed. In the next two results we prove that orthogonal projectors are indeed projectors.
Corollary 7.1.15. \(\boldsymbol{\mathcal{P}}_S\) is the identity on \(\boldsymbol{S}\).
Let \(S\) be a subspace of a vector space \(V\) and let \(\boldsymbol{\mathcal{P}}_S\) be the orthogonal projector associated with \(S.\) If \(\v\in S,\) then \(\boldsymbol{\mathcal{P}}_S\v=\v.\)
Proof.
Let \(S\) be a subspace of a vector space \(V,\) let \(\boldsymbol{\mathcal{P}}_S\) be the orthogonal projection operator associated with \(S\) and fix \(\v\in S.\) The closest vector in \(S\) to \(\v\) is then \(\v,\) since \(\v\) has the minimum possible Euclidean distance \(0\) from \(\v.\) It follows that \(\boldsymbol{\mathcal{P}}_S\v=\v.\)
Corollary 7.1.16. \(\boldsymbol{\mathcal{P}}_S\) is a projector.
Let \(S\) be a subspace of a vector space \(V\) and let \(\boldsymbol{\mathcal{P}}_S\) be the orthogonal projector associated with \(S.\) Then \(\boldsymbol{\mathcal{P}}_S\) is a projector.
Proof.
Assume the hypotheses and let \(\v\in V\) be arbitrary. Since \(\boldsymbol{\mathcal{P}}_S\v\in S,\) applying Corollary 7.1.15 to the vector \(\boldsymbol{\mathcal{P}}_S\v\) gives
where \(\v_S=\boldsymbol{\mathcal{P}}_S\v\) and \(\v_{S^\perp}\in S^\perp.\)
Proof.
A general proof of this important theorem is beyond the scope of this text, but we will prove it in the case of \(V=\R^n.\)
The cases of \(S=\R^n\) and \(S=\left\{\vec{0}\right\}\) are trivial because in the former, \(\v_S=\v\) and in the latter, \(\v_S=\vec{0}.\)
So assume \(S\) is a nontrivial proper subspace of \(V\) with \(k=\dim(S)\) satisfying \(1\le k\lt n.\) By Theorem 7.1.5 we have \(\dim(S^\perp)=n-k.\) Choose bases \(\left\{\boldsymbol{v_1},\boldsymbol{v_2},\ldots,\boldsymbol{v_k}\right\}\) and \(\left\{\boldsymbol{u_1},\boldsymbol{u_2},\ldots,\boldsymbol{u_{n-k}}\right\}\) for \(S\) and \(S^\perp\) respectively. We will prove that
Define \(\x:=c_1\boldsymbol{v_1}+c_2\boldsymbol{v_2}+\cdots+c_k\boldsymbol{v_k}\in S\) and \(\y:=c_1'\boldsymbol{u_1}+c_2'\boldsymbol{u_2}+\cdots+c_{n-k}'\boldsymbol{u_{n-k}}\in S^\perp.\) This gives us \(\x=-\y,\) so \(\x\in S\cap S^\perp\) and hence \(\x\cdot\x=\|\x\|^2=0,\) so \(\x=\vec{0}.\) Now \(\y=-\x=\vec{0},\) and since the respective sets \(\left\{\boldsymbol{v_i}\right\}\) and \(\left\{\boldsymbol{u_i}\right\}\) are linearly independent, we must have all \(c_i=0\) and all \(c_i'=0,\) guaranteeing the linear independence of the set
which by Theorem 5.4.30 is now a basis for \(\R^n.\) We have proven that the disjoint union of the bases for \(S\) and for \(S^\perp\) is a basis for \(\R^n.\) This means any vector in \(\R^n\) can be expressed as a linear combination of the vectors in that union, and since we can group the terms of that linear combination as we wish, for any \(\v\in\R^n\) we may write
where \(\v_S,\u_S\in S\) and \(\v_{S^\perp},\u_{S^\perp}\in S^\perp.\) This leads immediately to \(\v_S-\u_S\in S\cap S^\perp,\) so \(\dpr{\left(\v_S-\u_S\right)}{\left(\v_S-\u_S\right)}=0,\) which gives \(\v_S=\u_S.\) With this in hand, (7.1.2) yields \(\v_{S^\perp}=\u_{S^\perp}\) and uniqueness is proved.
Corollary 7.1.18. The Pythagorean Theorem for orthogonal decompositions.
Let \(S\) be a subspace of a vector space \(V\) and let \(\u\in S\) and \(\v\in S^\perp.\) Then
Theorem 7.1.19. \(\boldsymbol{\mathcal{I}}-\boldsymbol{\mathcal{P}_S}\) is the orthogonal projector to \(\boldsymbol{S}^{\boldsymbol{\perp}}\).
If \(V\) is a vector space with subspace \(S\) and \(\boldsymbol{\mathcal{P}}_S\) the associated orthogonal projector, then the orthogonal projector to \(S^\perp\) is \(\boldsymbol{\mathcal{P}}_{S^\perp}=\boldsymbol{\mathcal{I}}-\boldsymbol{\mathcal{P}}_S.\)
Proof.
Let \(V\) be a vector space with subspace \(S\) and \(\boldsymbol{\mathcal{P}}_S\) the associated orthogonal projector. By Theorem 7.1.3 and Definition 7.1.13 the orthogonal projector \(\boldsymbol{\mathcal{P}}_{S^\perp}\) is defined. We know \(S^\perp\) is a subspace of \(V\) by Theorem 7.1.3, so by Theorem 7.1.17, for any \(\v\in V\) we have the unique decomposition \(\v=\v_{S^\perp}+\v_{S^{\perp\perp}}.\) By Theorem 7.1.8 this may be re-expressed as \(\v=\v_{S^\perp}+\v_S,\) which may in turn be expressed as \(\boldsymbol{\mathcal{I}}\v=\boldsymbol{\mathcal{P}}_{S^\perp}\v+\boldsymbol{\mathcal{P}}_S\v\) or
Since \(\v\) was arbitrary we may conclude that \(\boldsymbol{\mathcal{P}}_{S^\perp}=\boldsymbol{\mathcal{I}}-\boldsymbol{\mathcal{P}}_S.\)
Definition 7.1.20. Projection error.
Let \(V\) be a vector space and \(S\) be a subspace of \(V.\) Let \(\v\in V\) and let \(\v_S=\boldsymbol{\mathcal{P}}_S\v\) be the projection of \(\v\) to \(S.\) The projection error associated with the projection of \(\v\) to \(S\) is \(\vec{e_{\boldsymbol{\mathcal{P}}_S}}:=\v-\boldsymbol{\mathcal{P}}_S\v.\) When the subspace associated with the projection is unambiguous we may drop the reference to \(\boldsymbol{\mathcal{P}}_S\) and simply write \(\vec{e}\) for the projection error.
Figure 7.1.21. Projecting a vector to a one-dimensional subspace of the plane
Definition 7.1.13 is a fine definition but using it to find \(\v_S\) requires that we solve an optimization problem. The following theorem gives a somewhat improved characterization of \(\v_S\) relying on the orthogonality of the projection error, evident in Figure 7.1.21.
Lemma 7.1.22.
Let \(S\) be a subspace of a vector space \(V\) and fix \(\v\in V.\) If \(\u\in S\) satisfies \(\v-\u\in S^\perp\) then for all \(\w\in S\) we have \(\|\v-\u\|\le\|\v-\w\|\) and \(\|\v-\u\|=\|\v-\w\|\) if and only if \(\w=\u.\) That is, for each \(\v\in V\) there is a unique closest element of \(S\) to \(\v.\)
Proof.
Let \(S\) be a subspace of a vector space \(V\) and fix \(\v\in V.\) Since the square root function is increasing, the direction of the inequality is preserved when taking square roots, so we will first prove that if \(\u\in S\) satisfies \(\v-\u\in S^\perp\) then for all \(\w\in S\) we have \(\|\v-\u\|^2\le\|\v-\w\|^2,\) from which we may conclude \(\|\v-\u\|\le\|\v-\w\|.\)
So fix \(\u\in S\) for which \(\v-\u\in S^\perp\) and let \(\w\in S.\) Then
\begin{align}
\|\v-\w\|^2\amp =\left\|\left(\v-\u\right)+\left(\u-\w\right)\right\|^2\notag\\
\amp =\|\v-\u\|^2+\|\u-\w\|^2\qquad(\text{by the Pythagorean Theorem})\tag{7.1.3}\\
\amp \qquad(\text{since }\v-\u\in S^\perp\text{ and }\u-\w\in S)\notag\\
\amp \ge\|\v-\u\|^2.\notag
\end{align}
Now since the square root function is increasing, we have \(\|\v-\u\|\le\|\v-\w\|.\) Suppose then that \(\|\v-\u\|=\|\v-\w\|.\) In this case by (7.1.3) we must have \(\|\u-\w\|^2=0\) which guarantees that \(\w=\u.\)
Theorem 7.1.23. Equivalent characterization of the orthogonal projection.
Let \(S\) be a subspace of a vector space \(V\) and let \(\v\in V.\) Then \(\u\in S\) is the orthogonal projection of \(\v\) to \(S\) if and only if \(\vec{e_{\boldsymbol{\mathcal{P}}}}=(\v-\u) \in S^{\perp}.\) In this case \(\boldsymbol{\mathcal{P}}\v\) is unique. That is, if \(\vec{w_1},\vec{w_2}\in S\) and \(\vec{e_1}=\v-\vec{w_1},\vec{e_2}=\v-\vec{w_2}\in S^{\perp}\) then \(\vec{w_1}=\vec{w_2}=\boldsymbol{\mathcal{P}}\v.\)
Proof.
Let \(\boldsymbol{\mathcal{P}}\v=\boldsymbol{\mathcal{P}}_S\v\in S\) be the orthogonal projection of \(\v\) to \(S.\) Let \(\u\in S\) and define the function \(F_{\u}:\R\rightarrow\R\) by \(F_{\u}(t)=\|\v-\boldsymbol{\mathcal{P}}\v+t\u \|^2.\) Applying the properties from Theorem 2.4.2 we find
Since \(S\) is a subspace, \(\boldsymbol{\mathcal{P}}\v-t\u \in S\) for every \(t,\) and \(\v-\boldsymbol{\mathcal{P}}\v+t\u=\v-\left(\boldsymbol{\mathcal{P}}\v-t\u\right),\) so by Definition 7.1.13 we know \(F_{\u}(t)\) has a minimum when \(t=0.\)
Thus \(F_{\u}'(0)=2(\v-\boldsymbol{\mathcal{P}}\v)\cdot \u=0.\) This shows that \((\v-\boldsymbol{\mathcal{P}}\v)\cdot\u=0\) for all \(\u\in S,\) so that \(\vec{e_{\boldsymbol{\mathcal{P}}}}=(\v-\boldsymbol{\mathcal{P}}\v)\in S^{\perp}.\)
Next suppose that \(\w \in S\) satisfies \(\v-\w \in S^{\perp}.\) Let \(\u \in S\) and consider
Thus \(\|\v-\w\|^2\le\|\v-\u\|^2\) for all \(\u\in S\) and so by Definition 7.1.13 we must have \(\w=\boldsymbol{\mathcal{P}}_S\v.\)
Finally we show uniqueness. If \(\v=\vec{0}\) then since \(\vec{0} \in S\) we know \(\boldsymbol{\mathcal{P}}\vec{0}=\vec{0}.\) Now suppose that \(\v\ne\vec{0}\) and \(\vec{w_1},\vec{w_2}\in S\) with both \(\vec{e_1}=\v-\vec{w_1}\) and \(\vec{e_2}=\v-\vec{w_2}\) in \(S^{\perp}.\) Then
Corollary 7.1.24. Orthogonal projections and orthogonal decompositions.
In \(\v=\v_S+\v_{S^\perp},\,\v_S\) is the orthogonal projection of \(\v\) to \(S,\) and \(\v_{S^\perp}\) is the orthogonal projection of \(\v\) to \(S^\perp.\)
Proof.
The proof is a worksheet exercise.
Theorem 7.1.25. Projection of \(\v\) to a subspace spanned by an orthogonal basis.
Let \(V\) be a vector space, let \(\v\in V\) and let \(S\) be a \(k\)-dimensional subspace of \(V\) with orthogonal basis
Assume the hypotheses and express \(\v\) by Theorem 7.1.17 as \(\v=\v_S+\v_{S^\perp};\) by Corollary 7.1.24, \(\v_S\) is the orthogonal projection of \(\v\) to \(S.\) Using Theorem 5.4.50 to express \(\v_S\) in the orthogonal basis for \(S\) we have
Assume the hypotheses and express \(\v\) by Theorem 7.1.17 as \(\v=\v_S+\v_{S^\perp};\) by Corollary 7.1.24, \(\v_S\) is the orthogonal projection of \(\v\) to \(S.\) Using Corollary 5.4.51 to express \(\v_S\) in the orthonormal basis for \(S\) we have
Suppose \(\v\in\R^n\) and \(\w\in\R^n\setminus\left\{\vec{0}\right\}.\) Let \(S=\text{span}\{\w\}\) and let \(\v_S\) be the orthogonal projection of \(\v\) to \(S.\) By Theorem 7.1.23, since every nonzero vector in span\(\left\{\w\right\}\) is parallel to \(\w,\) we need only show that \(\displaystyle{\left(\v-\frac{\dpr{\w}{\v}}{\dpr{\w}{\w}}\w\right)\perp\w}.\)
By Property 4 of Definition 5.1.27 we have that \(\displaystyle{\left(\v-\frac{\dpr{\w}{\v}}{\dpr{\w}{\w}}\w\right)\perp\w}\) and hence \(\displaystyle{\v-\frac{\dpr{\w}{\v}}{\dpr{\w}{\w}}\w}\) is orthogonal to every vector in span\(\left\{\w\right\}.\) By Theorem 7.1.23 the orthogonal projection of \(\v\) to span\(\left\{\w\right\}\) is given by \(\displaystyle{\mathcal{P}_{\{\w\}}\v=\frac{\dpr{\w}{\v}}{\dpr{\w}{\w}}\w.}\)
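For readers who like to experiment numerically, the one-dimensional formula above is easy to check by machine. The following is a minimal sketch in Python using the NumPy library (neither of which is used elsewhere in this text); the vectors \(\w\) and \(\v\) are arbitrary illustrations.

```python
import numpy as np

# Project v onto span{w} using  proj = (w . v)/(w . w) * w.
w = np.array([3.0, 4.0])
v = np.array([2.0, 1.0])

proj = (w @ v) / (w @ w) * w   # the orthogonal projection of v to span{w}
error = v - proj               # the projection error

print(proj)       # [1.2 1.6]
print(w @ error)  # 0.0 (up to roundoff): the error is orthogonal to w
```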
Example 7.1.28.
What is the projection of the vector \(\x=\begin{pmatrix}x\\y\\z \end{pmatrix}\) to the \(z\)-axis in \(\R^3\text{?}\) Answer: \(\boldsymbol{\mathcal{P}}\begin{pmatrix}x\\y\\z \end{pmatrix} =\begin{pmatrix}0\\0\\z \end{pmatrix}.\) Note that the operation \(\boldsymbol{\mathcal{P}}\) on \(\x\) can be expressed as matrix-vector multiplication:
What is the projection of the vector \(\x=\begin{pmatrix}x\\y\\z \end{pmatrix}\) to the \(xz\)-plane \(\left\{\left.\begin{pmatrix}x\\y\\z \end{pmatrix} \right|\,y=0,\,x,z\in\R\right\}\) in \(\R^3\text{?}\) Answer: \(\begin{pmatrix}x\\0\\z \end{pmatrix}.\) Note that as in Example 7.1.28 the operation \(\boldsymbol{\mathcal{P}}\) on \(\x\) can be expressed as matrix-vector multiplication:
Note that the matrices \(\P\) and \(\I-\P\) in each of the examples above have the property that \(\P^2=\P\) and \((\I-\P)^2=\I-\P.\) This important property of projectors carries over naturally to projection matrices.
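As a quick numerical check of this idempotence, here is a short Python/NumPy sketch (an illustration only) using the two coordinate projection matrices from Example 7.1.28 and Example 7.1.29.

```python
import numpy as np

I = np.eye(3)
P_zaxis = np.diag([0.0, 0.0, 1.0])    # sends (x, y, z) to (0, 0, z)
P_xzplane = np.diag([1.0, 0.0, 1.0])  # sends (x, y, z) to (x, 0, z)

for P in (P_zaxis, P_xzplane):
    assert np.allclose(P @ P, P)                  # P^2 = P
    assert np.allclose((I - P) @ (I - P), I - P)  # (I - P)^2 = I - P
```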
Remark 7.1.31.
It turns out that the orthogonal projection operator \(\boldsymbol{\mathcal{P}}_S\) for a subspace \(S\) of \(\R^m\) can always be realized as multiplication by a matrix \(\P.\) The projection matrices in Examples 7.1.28 and 7.1.29 were easy to derive, but it is less clear how to derive a projection matrix for subspaces of \(\R^m\) which are not coordinate axes, coordinate planes, or their higher-dimensional equivalents.
Our goal in this section is to determine how to build the matrix \(\P\) so that \(\mathcal{P}_S\b=\P\b\) for a projection of a vector \(\b \in \R^m\) to a subspace \(S\) of \(\R^m.\) Concretely, suppose we want to project \(\b\) to the subspace of \(\R^m\) spanned by a basis \(B=\{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\subset\R^m.\) Since \(B\) is linearly independent, by Corollary 6.1.38 the matrix
has rank \(r(\A)=k\le m.\) Although such an \(\A\) may not be invertible since it may not be square, for such \(\A,\,\A^T\A\) is always invertible. This fact is crucial to our derivation of the formula for projection matrices.
We demonstrate that for such \(\A,\,\A^T\A\) is always invertible by first showing in Lemma 7.1.32 that \(N\left(\A^T\A\right)=N(\A),\) then noting that \(N(\A)=\left\{\vec{0}\right\}.\) After Corollary 7.1.33 we will return to our present derivation, in Remark 7.1.34.
Lemma 7.1.32. The nullspace of the Gram matrix is the same as the nullspace of \(\A\).
We prove this by showing that \(N(\A^T\A)\) and \(N(\A)\) are subsets of each other. First, let \(\x\in N(\A);\) we wish to show \(\x\in N(\A^T\A).\) Then \(\A\x=\vec{0}\) so \(\A^T\A\x=\A^T\vec{0}=\vec{0}.\) Thus if \(\x\in N(\A),\) then \(\x\in N(\A^T\A)\) and we conclude that \(N(\A)\subseteq N(\A^T\A).\)
Now let \(\y\in N(\A^T\A);\) we wish to show \(\y\in N(\A).\) Since \(\y\in N(\A^T\A),\) we have \(\A^T\A\y=\vec{0}\) so that \(\y^T\A^T\A\y=\y^T\vec{0}=0.\) But \(\y^T\A^T\A\y=\left(\A\y\right)^T\left(\A\y\right)\) equals the sum of the squares of the components of \(\A\y,\) so all the components of \(\A\y\) equal zero, making \(\A\y=\vec{0}\) as desired. Thus if \(\y\in N(\A^T\A),\) then \(\y\in N(\A),\) and we infer that \(N(\A^T\A)\subseteq N(\A).\)
From the two subset inclusions we conclude that \(N(\A^T\A)=N(\A),\) as desired.
Corollary 7.1.33. When \(\boldsymbol{\A}\) is of full rank, the Gram matrix is nonsingular.
For the matrix \(\A\) in (7.1.4), \(\A^T\A\) is invertible.
Proof.
Let \(\A\,=\,\begin{pmatrix}|\amp |\amp \cdots\amp |\\ \vec{v_1}\amp \vec{v_2}\amp \cdots\amp \vec{v_k}\\|\amp |\amp \cdots\amp | \end{pmatrix}\) where the set of \(\vec{v_i}\)’s is linearly independent. By Lemma 7.1.32, \(N(\A^T\A)=N(\A).\) Since \(\A\) has linearly independent columns by construction, by Theorem 5.4.6, \(N(\A)=\left\{\vec{0}\right\}\) and hence \(N\left(\A^T\A\right)=\left\{\vec{0}\right\}.\) Thus by Theorem 5.4.6 the set of columns of \(\A^T\A\) is linearly independent, so by Theorem 5.4.8, \(\A^T\A\) is invertible.
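Here is a small numerical illustration of the corollary, sketched in Python/NumPy; the matrix below is an arbitrary full-rank example chosen for illustration, not one taken from the text.

```python
import numpy as np

# A 4x2 matrix with linearly independent columns (rank 2).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])

G = A.T @ A                          # the 2x2 Gram matrix A^T A
print(np.linalg.matrix_rank(A))      # 2: the columns of A are independent
print(np.linalg.matrix_rank(G))      # 2: so A^T A has full rank
print(np.linalg.det(G))              # 20 (up to roundoff), nonzero, so A^T A is invertible
```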
Remark 7.1.34.
Continuing with Remark 7.1.31 we find the projection matrix \(\P\) and the projection \(\P\b\) of \(\b\) to span\(\{\vec{v_1},\,\vec{v_2},\,\ldots,\vec{v_k}\}.\) We seek the particular linear combination of the columns of \(\A\,=\,\begin{pmatrix}|\amp |\amp \cdots\amp |\\ \vec{v_1}\amp \vec{v_2}\amp \cdots\amp \vec{v_k}\\|\amp |\amp \cdots\amp | \end{pmatrix}\) which equals \(\P\b;\) that is, we seek \(\x_{\P\b}=\begin{pmatrix}x_1\\x_2\\ \vdots\\x_k \end{pmatrix}\) for which
\begin{align}
\x_{\P\b}^T\amp =\vec{0}^T\qquad\text{ or } \nonumber\tag{7.1.8}\\
\A^T\A\x_{\P\b}\amp =\A^T\b.\tag{7.1.9}
\end{align}
We have that
\begin{align*}
\x_{\P\b}^T=\vec{0}^T\amp \Longleftrightarrow\x_{\P\b}=\vec{0}\,\overset{(7.1.2)}{\Longleftrightarrow}\,\P\b=\vec{0}\\
\amp \Longleftrightarrow\b\perp\vec{v_i}\text{ for each } i
\end{align*}
which is detectable by checking whether \(\A^T\b=\vec{0}.\) If this holds, we conclude that \(\b\in\left(\text{ span } \{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\right)^\perp\) but we can make no conclusion about \(\P.\) That is not a problem, since the condition \(\A^T\b=\vec{0}\) is handled in the case below.
If \(\x_{\P\b}^T\ne\vec{0}^T,\) then \(\A^T\A\x_{\P\b}=\A^T\b.\) Equation (7.1.9) is called the normal equation. Since \(\A^T\A\) is invertible by Corollary 7.1.33, the normal equation is solved by
Thus \(\P\b=\A\x_{\P\b}=\A(\A^T\A)^{-1}\A^T\b\) and we have \(\P=\A(\A^T\A)^{-1}\A^T.\)
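Observe that this formula is consistent with Definition 7.1.10: since \(\A^T\A\left(\A^T\A\right)^{-1}\) is the identity matrix,
\begin{equation*}
\P^2=\A(\A^T\A)^{-1}\A^T\A(\A^T\A)^{-1}\A^T=\A(\A^T\A)^{-1}\A^T=\P.
\end{equation*}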
Definition 7.1.35.
Let \(\A\,=\,\begin{pmatrix}|\amp |\amp \cdots\amp |\\ \vec{v_1}\amp \vec{v_2}\amp \cdots\amp \vec{v_n}\\|\amp |\amp \cdots\amp | \end{pmatrix}\in\R^{m\times n}\) where the set of \(\vec{v_i}\)’s is linearly independent. Then for \(\b\in\R^m\) the normal equation is \(\A^T\A\x_{\P\b}=\A^T\b,\) and the projection matrix for \(\text{ col }(\A)\) is
\begin{equation}
\P_{\text{ col }(\A)}=\A(\A^T\A)^{-1}\A^T.\tag{7.1.12}
\end{equation}
Example 7.1.36.
Project \(\begin{pmatrix}1\\1\\1 \end{pmatrix}\) to span\(\left\{\begin{pmatrix}1\\2\\2 \end{pmatrix} \right\}.\)
Solution: We have \(\A=\begin{pmatrix}1\\2\\2 \end{pmatrix}\) so that \(\A^T\A=9\) and \(\A^T\b=5.\) Solving \(9\x_{\P\b}=5\) for \(\x_{\P\b}\) yields \(\displaystyle{\x_{\P\b}=\frac{5}{9}}.\) Thus \(\P\b=\A\x_{\P\b}=\begin{pmatrix}5/9\\10/9\\10/9\end{pmatrix} .\)
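A quick numerical check of this example (a Python/NumPy sketch, for illustration only):

```python
import numpy as np

A = np.array([[1.0], [2.0], [2.0]])    # the single spanning vector, as a column
b = np.array([1.0, 1.0, 1.0])

x = np.linalg.solve(A.T @ A, A.T @ b)  # solves 9 x = 5, giving x = 5/9
print(A @ x)                           # [5/9, 10/9, 10/9], as computed above
```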
Example 7.1.37.
Project \(\left(\begin{array}{r}3\\1\\-2 \end{array} \right)\) to span\(\left\{\begin{pmatrix}1\\1\\1 \end{pmatrix} ,\begin{pmatrix}0\\1\\2 \end{pmatrix} \right\}.\)
Solution: We have \(\A=\begin{pmatrix}1\amp 0\\1\amp 1\\1\amp 2 \end{pmatrix}\) so that \(\A^T\A=\begin{pmatrix}3\amp 3\\3\amp 5 \end{pmatrix}\) and \(\A^T\b=\left(\begin{array}{rr}2\\-3 \end{array} \right).\) Solving \(\begin{pmatrix}3\amp 3\\3\amp 5 \end{pmatrix} \x_{\P\b}=\left(\begin{array}{r}2\\-3 \end{array} \right)\) for \(\x_{\P\b}\) yields \(\x_{\P\b}=\left(\begin{array}{r}19/6\\-5/2 \end{array} \right).\) Thus \(\P\b=\A\x_{\P\b}=\left(\begin{array}{r}19/6\\2/3\\-11/6 \end{array} \right).\)
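The same normal-equation solve can be sketched in Python/NumPy (illustration only):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 1.0, -2.0])

x = np.linalg.solve(A.T @ A, A.T @ b)  # x = [19/6, -5/2]
print(A @ x)                           # [19/6, 2/3, -11/6]
```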
Example 7.1.38.
Find the projection matrix \(\P\) for span\(\left\{\begin{pmatrix}1\\1\\1 \end{pmatrix} ,\begin{pmatrix}0\\1\\2 \end{pmatrix} \right\}\) and verify that for \(\b=\left(\begin{array}{r}3\\1\\-2 \end{array} \right)\) we have \(\P\b=\left(\begin{array}{r}19/6\\2/3\\-11/6 \end{array} \right)\) as in Example 7.1.37.
Solution: As above, we have \(\A^T\A=\begin{pmatrix}3\amp 3\\3\amp 5 \end{pmatrix}\) so that \(\displaystyle{\left(\A^T\A\right)^{-1}=\frac{1}{6}\left(\begin{array}{rr}5\amp -3\\-3\amp 3 \end{array} \right)}.\) Thus
Tactical Alert! There is a significant difference in the amount of work required between computing the projection of a given vector \(\b\) to the span of \(\{\vec{v_1},\,\vec{v_2},\,\ldots,\vec{v_k}\},\) and finding a matrix \(\P\) which will project any vector to span\(\{\vec{v_1},\,\vec{v_2},\,\ldots,\vec{v_k}\},\) just as there is a difference between solving \(\A\x=\b\) for a given \(\A,\b\) by elimination on the matrix \(\A\) augmented by that specific \(\b,\) and finding \(\A^{-1}\) for use in the solution \(\x=\A^{-1}\b,\) which is applicable to any \(\b.\) To find \(\P_{\text{ col } (\A)}\b\) we need only solve the normal equation \(\A^T\A\x_{\P\b}=\A^T\b\) for \(\x_{\P\b}.\) That is, we do not need to go the extra step and find the projection matrix \(\P\) unless we’re asked to do so. Algorithmically (both procedures are also sketched in code after the lists below),
If you are asked to project some \(\b\) to the span of some set of linearly independent vectors \(\{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\text{:}\)
Form a matrix \(\A\) with the vectors \(\{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\) as columns (skip this step if you have been asked to project to col\((\A)\)).
Solve the normal equation \(\A^T\A\x_{\P\b}=\A^T\b\) for \(\x_{\P\b}.\)
Calculate \(\P_{\text{ col } (\A)}\b=\A\x_{\P\b}.\)
If you are asked to find the projection matrix \(\P\) for the span of some linearly independent set \(\{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\text{:}\)
Form a matrix \(\A\) with the vectors \(\{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\) as columns (skip this step if you have been asked to find the projection matrix for col\((\A)\)).
Calculate \(\A\left(\A^T\A\right)^{-1}\A^T.\)
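The two procedures above can be sketched as short functions in Python/NumPy; the function names project_to_span and projection_matrix are our own, chosen purely for illustration.

```python
import numpy as np

def project_to_span(vectors, b):
    """Project b to span(vectors) by solving the normal equation A^T A x = A^T b."""
    A = np.column_stack(vectors)
    x = np.linalg.solve(A.T @ A, A.T @ b)
    return A @ x

def projection_matrix(vectors):
    """Return P = A (A^T A)^{-1} A^T for the span of the (independent) vectors."""
    A = np.column_stack(vectors)
    return A @ np.linalg.inv(A.T @ A) @ A.T

# Example 7.1.37 revisited:
v1, v2 = np.array([1.0, 1.0, 1.0]), np.array([0.0, 1.0, 2.0])
b = np.array([3.0, 1.0, -2.0])
print(project_to_span([v1, v2], b))     # [19/6, 2/3, -11/6]
print(projection_matrix([v1, v2]) @ b)  # the same vector
```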
Remark 7.1.40.
Note that if you are asked to project a vector to the span of a set \(\{\vec{v_1},\,\vec{v_2},\,\ldots,\vec{v_k}\}\) of vectors which has not already been identified as linearly independent:
Form the matrix \(\A\) with \(\{\vec{v_1},\,\vec{v_2},\,\ldots\vec{v_k}\}\) as columns (skip this step if you have been asked to project to col\((\A)\)).
Find a basis \(\{\vec{u_1},\,\vec{u_2},\,\ldots\vec{u_{k'}}\}\) for col\((\A).\)
Form a matrix \(B\) with \(\{\vec{u_1},\,\vec{u_2},\,\ldots,\vec{u_{k'}}\}\) as columns, and proceed as in Remark 7.1.39 with \(B\) in place of \(\A.\)
A final note: if we are asked to project a vector \(\b\) to \(\left(\text{ col } (\A)\right)^\perp\) we may take advantage of the following identity:
\begin{equation*}
\b=\P_{\text{ col } (\A)}\b\,+\,\P_{\left(\text{ col } (\A)\right)^\perp}\b.
\end{equation*}
That is, \(\P_{\left(\text{ col } (\A)\right)^\perp}\b=\b-\P_{\text{ col } (\A)}\b.\) If the projection matrix for the projection to col\((\A)\) is \(\P,\) then the projection matrix for the projection to \(\left(\text{ col } (\A)\right)^\perp\) is simply \(\I-\P.\)
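A quick numerical illustration of this identity, sketched in Python/NumPy with the matrix \(\A\) and vector \(\b\) from Example 7.1.37 (illustration only):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([3.0, 1.0, -2.0])

P = A @ np.linalg.inv(A.T @ A) @ A.T  # projects to col(A)
P_perp = np.eye(3) - P                # projects to (col(A))^perp

print(P_perp @ b)          # equals b - P b, here [-1/6, 1/3, -1/6]
print(A.T @ (P_perp @ b))  # essentially zero: the result is orthogonal to col(A)
```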
Subsection 7.1.3 Application of Projectors: Least Squares Approximation
In this short section we solve the normal equations to construct a polynomial \(p(x)\) of degree \(n\) which is as close as possible, in the Euclidean sense, to all the points in a given set of \(m\gt n+1\) points in \(\R^2.\)
Recall Subsection 3.1.5 in which we described how to find the equations of the unique line passing through two points with different \(x\)-coordinates and of the unique parabola passing through three points with different \(x\)-coordinates. Of course the line cannot be forced to pass through, for example, three non-collinear points; in that case there is no solution. But we can find the line that is simultaneously closest to the three points. You may already be familiar with this line, called the regression line in basic algebra and statistics.
The line will not contain every \((x_i,y_i)\) when \(\left(\begin{array}{c}y_1\\y_2\\ \vdots\\y_n\end{array}\right)\) is not an element of the column space of the matrix \(\A=\left(\begin{array}{cc}x_1\amp 1\\x_2\amp 1\\ \vdots\amp \vdots\\x_n\amp 1 \end{array} \right).\)
So generally, we are left with the task of finding the line \(\left\{\left.(x,y)\in\R^2\right|y=mx+b\right\},\) specified by the values of \(m\) and \(b,\) which is closest to the points in \(\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\}.\) To accomplish this task we may appeal to the methods of Remark 7.1.31 by solving the normal equations for \(m\) and \(b.\) We need not find the actual projection of \(\left(\begin{array}{c}y_1\\y_2\\ \vdots\\y_n \end{array} \right)\) to col\((\A)\) any more than we’d need to find \(\A^{-1}\) to solve \(\A\x=\b;\) see Remark 7.1.39.
Example 7.1.43.
Let \((1,0),(2,1),(3,3)\in\R^2.\) Find the line closest in the least-squares sense to this set of points.
which are satisfied by \(\displaystyle{m=\frac{3}{2}}\) and \(\displaystyle{b=-\frac{5}{3}}.\) The equation describing the least-squares line is therefore \(\displaystyle{y=\frac{3}{2}x-\frac{5}{3}}.\)
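The same least-squares line can be found numerically; here is a Python/NumPy sketch of the normal-equation solve for the three points above (illustration only).

```python
import numpy as np

# Points (1,0), (2,1), (3,3); model y = m x + b.
A = np.array([[1.0, 1.0],
              [2.0, 1.0],
              [3.0, 1.0]])
y = np.array([0.0, 1.0, 3.0])

m, b = np.linalg.solve(A.T @ A, A.T @ y)
print(m, b)   # 1.5 and -1.666..., that is, m = 3/2 and b = -5/3
```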
Remark 7.1.44.
If we are asked to find the closest parabola in the least-squares sense to \(\{(x_1,y_1),(x_2,y_2),\ldots,(x_n,y_n)\},\) the process is the same, except that we now model with the equation \(y=ax^2+bx+c,\) which yields
which are often found, without explanation, in elementary statistics texts or notes. Now you know where they come from! Corresponding formulas can be derived for the coefficients of higher-degree least-squares curves such as parabolas, cubics, and so on.
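For the parabola \(y=ax^2+bx+c\) the computation is again a normal-equation solve, with one more column in the matrix. Here is a minimal Python/NumPy sketch; the data points are arbitrary illustrations, not taken from the text.

```python
import numpy as np

# Arbitrary sample points (x_i, y_i).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 0.9, 2.2, 5.1, 9.8])

# Columns x^2, x, 1 for the model y = a x^2 + b x + c.
A = np.column_stack([x**2, x, np.ones_like(x)])

a, b, c = np.linalg.solve(A.T @ A, A.T @ y)  # least-squares coefficients
print(a, b, c)
```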