Chapter 4 Orthogonality
4.1 Dot Product, Norm, and Euclidean Distance
Definition 4.1 (Transpose) Let A \in \mathrm{M}_{m,n}(\mathbf{R}). The transpose of A is the matrix ^tA \in \mathrm{M}_{n,m}(\mathbf{R}) given by
(^tA)_{i,j} = A_{j,i}
for all 1 \le i \le n and 1 \le j \le m.
Remark. The transpose corresponds to exchanging the rows with the columns.
Example 4.1 The transpose of a column vector X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} is the row vector ^tX = [X_1, \dots, X_n].
Example 4.2 The transpose of the matrix
A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} is the matrix
^tA = \begin{bmatrix} 1 & 3 \\ 0 & 2 \\ 2 & 1 \end{bmatrix}.
Proposition 4.1 If A \in \mathrm{M}_{n,m}(\mathbf{K}) and B \in \mathrm{M}_{m,l}(\mathbf{K}), then ^t(AB) = (^tB)(^tA).
Proof. The coefficient at index (i,j) of AB is \sum_{k=1}^m A_{i,k} B_{k,j}, so the coefficient at index (i,j) of ^t(AB) is \sum_{k=1}^m A_{j,k} B_{k,i}.
The coefficient of (^tB)(^tA) at index (i,j) is
\sum_{k=1}^m (^tB)_{i,k} (^tA)_{k,j} = \sum_{k=1}^m B_{k,i} A_{j,k} = \sum_{k=1}^m A_{j,k} B_{k,i}.
The matrices ^t(AB) and (^tB)(^tA) have the same coefficients, so they are equal.
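As a quick numerical sanity check of Proposition 4.1, the sketch below (Python with numpy; the matrix B is an arbitrary compatible matrix chosen for illustration) verifies that ^t(AB) = (^tB)(^tA) for the matrix A of Example 4.2.
```python
import numpy as np

# Matrix A from Example 4.2 and an arbitrary 3x2 matrix B chosen for illustration.
A = np.array([[1, 0, 2],
              [3, 2, 1]])
B = np.array([[1, 4],
              [0, 2],
              [5, 3]])

lhs = (A @ B).T      # t(AB)
rhs = B.T @ A.T      # (tB)(tA)
print(np.array_equal(lhs, rhs))   # True
```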
Definition 4.2 (Dot Product) Let u, v be two vectors in \mathbf{R}^n. The dot product of u and v is the real number
\langle u, v \rangle = \sum_{i=1}^n u_i v_i
where (u_i) and (v_i) are the coordinates of u and v in the canonical basis.
Remark. If X and Y are column vectors in \mathbf{R}^n, then \langle X, Y \rangle is identified with the 1 \times 1 matrix ^tXY.
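A minimal numerical illustration of Definition 4.2 and of the remark above (Python with numpy; the vectors are arbitrary):
```python
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, -1.0, 2.0])

# Dot product as the sum of coordinate-wise products (Definition 4.2).
dot = np.sum(u * v)               # 1*4 + 2*(-1) + 3*2 = 8

# Identification with the 1x1 matrix tX Y: write the vectors as columns.
X = u.reshape(-1, 1)
Y = v.reshape(-1, 1)
print(dot, (X.T @ Y)[0, 0])       # both equal 8.0
```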
Proposition 4.2 For all u, v, w \in \mathbf{R}^n and \lambda \in \mathbf{R}, the following properties hold:
- \langle u, v \rangle = \langle v, u \rangle,
- \langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle,
- \langle \lambda u, v \rangle = \lambda \langle u, v \rangle,
- \langle u, u \rangle \ge 0,
- \langle u, u \rangle = 0 \iff u = 0.
Proof. We prove the properties one after the other:
- \langle u, v \rangle = \sum_{i=1}^n u_i v_i = \sum_{i=1}^n v_i u_i = \langle v, u \rangle,
- \langle u + v, w \rangle = \sum_{i=1}^n (u_i + v_i) w_i = \sum_{i=1}^n u_i w_i + \sum_{i=1}^n v_i w_i = \langle u, w \rangle + \langle v, w \rangle,
- \langle \lambda u, v \rangle = \sum_{i=1}^n \lambda u_i v_i = \lambda \sum_{i=1}^n u_i v_i = \lambda \langle u, v \rangle,
- \langle u, u \rangle = \sum_{i=1}^n u_i^2 \ge 0,
- \langle u, u \rangle = \sum_{i=1}^n u_i^2 = 0 \iff u_i = 0 \text{ for all } i \iff u = 0.
Definition 4.3 (Euclidean Norm) Let u \in \mathbf{R}^n. The Euclidean norm (or simply norm) of u is the nonnegative real number
\|u\| = \sqrt{\langle u, u \rangle} = \sqrt{\sum_{i=1}^n u_i^2}.
Remark. The last property of Proposition 4.2 shows that \|u\| = 0 \iff u = 0.
Definition 4.4 (Euclidean Distance) Let u, v \in \mathbf{R}^n. The Euclidean distance (or simply distance) between u and v is the nonnegative real number
d(u, v) = \|u - v\| = \sqrt{\sum_{i=1}^n (u_i - v_i)^2}.
Remark. The norm \|u\| of a vector u is thus the distance from u to 0.
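The norm and distance formulas translate directly into code; a small sketch (Python with numpy, arbitrary vectors):
```python
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 1.0])

# Euclidean norm (Definition 4.3): square root of <u, u>.
print(np.sqrt(np.sum(u * u)), np.linalg.norm(u))   # 5.0 both ways

# Euclidean distance (Definition 4.4): norm of the difference u - v.
print(np.linalg.norm(u - v))                       # sqrt(4 + 9) = sqrt(13)
```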
4.2 Orthogonality
Definition 4.5 Let u, v be two vectors in \mathbf{R}^n. They are orthogonal if \langle u, v \rangle = 0.
We then write u \perp v.
Example 4.3 If u = \begin{bmatrix} 1 \\ 1 \end{bmatrix} and v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, then u and v are orthogonal because \langle u, v \rangle = 1 - 1 = 0.
Example 4.4 The vectors e_1, \dots, e_n of the canonical basis are pairwise orthogonal, because \langle e_i, e_j \rangle = 0 whenever i \neq j (and \langle e_i, e_i \rangle = 1 for all i).
Definition 4.6 Let E be a subset of \mathbf{R}^n. The orthogonal complement of E is the set
\{ y \in \mathbf{R}^n \mid \langle x, y \rangle = 0 \text{ for all } x \in E \}. It is denoted E^\perp. If u \in E^\perp, we also write u \perp E.
Lemma 4.1 The set E^\perp is a vector subspace of \mathbf{R}^n.
Proof. First, 0 \in E^\perp, so E^\perp is non-empty. Let y_1, y_2 \in E^\perp. For all x \in E, \langle y_1 + y_2, x \rangle = \langle y_1, x \rangle + \langle y_2, x \rangle = 0, so y_1 + y_2 \in E^\perp. If y \in E^\perp and \lambda \in \mathbf{R}, then for all x \in E, \langle \lambda y, x \rangle = \lambda \langle y, x \rangle = 0, so \lambda y \in E^\perp.
Definition 4.7 A set F = \{u_1, \dots, u_k\} is orthogonal if u_i \perp u_j for all i \neq j.
Lemma 4.2 Let F be an orthogonal set of non-zero vectors. Then F is linearly independent.
Proof. Let u_1, \dots, u_k \in F and \lambda_1, \dots, \lambda_k \in \mathbf{R} be such that \lambda_1 u_1 + \dots + \lambda_k u_k = 0. We want to show that \lambda_1 = \dots = \lambda_k = 0. Let v = \lambda_1 u_1 + \dots + \lambda_k u_k, fix i, and compute \langle v, u_i \rangle. Since v = 0, \langle v, u_i \rangle = 0, and by linearity, \langle v, u_i \rangle = \sum_{j=1}^k \lambda_j \langle u_j, u_i \rangle = \lambda_i \langle u_i, u_i \rangle. Since u_i \neq 0, \langle u_i, u_i \rangle \neq 0, so \lambda_i = 0. This shows that the set is linearly independent.
Definition 4.8 A basis \mathcal{B} = (u_1, \dots, u_n) of \mathbf{R}^n is orthonormal if u_i \perp u_j for all i \neq j, and \|u_i\| = 1 for all i \in \{1, \dots, n\}.
Remark. By Lemma 4.2, an orthogonal set of n vectors of norm 1 is automatically an orthonormal basis.
Example 4.5 The canonical basis is an orthonormal basis.
Example 4.6 The vectors u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} and u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} -1 \\ 1 \end{bmatrix} form an orthonormal basis because \|u_1\| = \|u_2\| = 1 and u_1 \perp u_2.
4.3 Projections and Closest Point
Definition 4.9 Let u be a non-zero vector in \mathbf{R}^n and let D be the line generated by u. The projection of a vector v \in \mathbf{R}^n onto D is the vector \frac{\langle u, v \rangle}{\langle u, u \rangle} u. It is denoted P_D(v).
Definition 4.10 Let E be a vector subspace of \mathbf{R}^n, and let (u_1, \dots, u_k) be an orthonormal basis of E. The projection of a vector v onto E is the vector P_E(v) = \sum_{i=1}^k \langle v, u_i \rangle u_i \in E.
Remark. This is the sum of the projections onto the lines generated by the vectors u_i.

Figure 4.1: The projection of vector w onto a plane.
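Definition 4.10 gives a direct recipe for computing projections once an orthonormal basis is known. Below is a minimal sketch (Python with numpy; the helper name project and the example subspace are chosen for illustration):
```python
import numpy as np

def project(v, onb):
    """Orthogonal projection of v onto the subspace spanned by the
    orthonormal family onb, following Definition 4.10: sum of <v, u_i> u_i."""
    return sum(np.dot(v, u) * u for u in onb)

# Illustration: project (1, 2, 3) onto the (x, y)-plane of R^3,
# for which (e1, e2) is an orthonormal basis.
e1 = np.array([1.0, 0.0, 0.0])
e2 = np.array([0.0, 1.0, 0.0])
print(project(np.array([1.0, 2.0, 3.0]), [e1, e2]))   # [1. 2. 0.]
```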
Lemma 4.3 Let E be a vector subspace of \mathbf{R}^n. The map u \mapsto P_E(u) is linear.
Proof. Let v_1, v_2 \in \mathbf{R}^n. Then
P_E(v_1 + v_2) = \sum_{i=1}^k \langle v_1 + v_2, u_i \rangle u_i = \sum_{i=1}^k \langle v_1, u_i \rangle u_i + \sum_{i=1}^k \langle v_2, u_i \rangle u_i = P_E(v_1) + P_E(v_2).
For \lambda \in \mathbf{R} and v \in \mathbf{R}^n,
P_E(\lambda v) = \sum_{i=1}^k \langle \lambda v, u_i \rangle u_i = \lambda \sum_{i=1}^k \langle v, u_i \rangle u_i = \lambda P_E(v).
Thus, the orthogonal projection onto E is a linear map.
Lemma 4.4 Let E be a vector subspace of \mathbf{R}^n and v \in \mathbf{R}^n. Then v - P_E(v) \in E^\perp and
\|v\|^2 = \|P_E(v)\|^2 + \|v - P_E(v)\|^2.
Proof. Let v^\perp = v - P_E(v), so that v = P_E(v) + v^\perp, and let (u_1, \dots, u_k) be the orthonormal basis of E used to define P_E.
For i \in \{1, \dots, k\},
\langle u_i, v^\perp \rangle = \langle v, u_i \rangle - \langle P_E(v), u_i \rangle = \langle v, u_i \rangle - \sum_{j=1}^k \langle v, u_j \rangle \langle u_j, u_i \rangle = \langle v, u_i \rangle - \langle v, u_i \rangle = 0,
since \langle u_j, u_i \rangle = 0 for j \neq i and \langle u_i, u_i \rangle = 1. Every vector of E is a linear combination of the u_i, so v^\perp = v - P_E(v) \in E^\perp.
Now, since P_E(v) \in E and v^\perp \in E^\perp are orthogonal,
\|v\|^2 = \|P_E(v) + v^\perp\|^2 = \|P_E(v)\|^2 + 2\langle P_E(v), v^\perp \rangle + \|v^\perp\|^2 = \|P_E(v)\|^2 + \|v - P_E(v)\|^2.
Lemma 4.5 (Cauchy-Schwarz Inequality) For u, v \in \mathbf{R}^n, we have
|\langle u, v \rangle| \le \|u\| \, \|v\|.
Proof. If v = 0, both sides are zero, so assume v \neq 0 and let D be the line generated by v. Since u - P_D(u) \perp v, we have \langle u, v \rangle = \langle P_D(u), v \rangle. Moreover P_D(u) = \frac{\langle u, v \rangle}{\langle v, v \rangle} v is collinear with v, so |\langle P_D(u), v \rangle| = \|P_D(u)\| \, \|v\|. Finally, \|P_D(u)\| \le \|u\| by Lemma 4.4, and the result follows.
Proposition 4.3 (Triangle Inequality) For u, v \in \mathbf{R}^n, we have
\|u + v\| \le \|u\| + \|v\|.
Remark. Let a, b, c be three points in \mathbf{R}^n. Applying this to u = a - b and v = b - c, so that u + v = a - c, we obtain
d(a, c) \le d(a, b) + d(b, c).
Proof. We compute \|u + v\|^2, using the Cauchy-Schwarz inequality for the middle term:
\|u + v\|^2 = \|u\|^2 + 2\langle u, v \rangle + \|v\|^2 \le \|u\|^2 + 2\|u\|\|v\| + \|v\|^2 = (\|u\| + \|v\|)^2.
Taking square roots, \|u + v\| \le \|u\| + \|v\|.
Proposition 4.4 Let E be a vector subspace of \mathbf{R}^n and u \in \mathbf{R}^n. For all v \in E,
\|u - v\| \ge \|u - P_E(u)\|.
Remark. This means that P_E(u) is the closest point in E to u. In particular, this point P_E(u) does not depend on the orthonormal basis chosen to define it.
Proof. Write u - v = (u - P_E(u)) + (P_E(u) - v). The first term lies in E^\perp by Lemma 4.4 and the second lies in E, so they are orthogonal and
\|u - v\|^2 = \|u - P_E(u)\|^2 + \|P_E(u) - v\|^2 \ge \|u - P_E(u)\|^2.
Definition 4.11 (Gram-Schmidt Orthogonalization) Let u_1, \dots, u_k be a basis of a subspace F of \mathbf{R}^n. The vectors v_1, \dots, v_k are defined recursively as follows:
\begin{align*}
v_1 &= u_1 \\
v_2 &= u_2 - \frac{\langle u_2, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 \\
v_3 &= u_3 - \frac{\langle u_3, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \frac{\langle u_3, v_2 \rangle}{\langle v_2, v_2 \rangle} v_2 \\
&\;\,\vdots \\
v_k &= u_k - \frac{\langle u_k, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \dots - \frac{\langle u_k, v_{k-1} \rangle}{\langle v_{k-1}, v_{k-1} \rangle} v_{k-1}
\end{align*}
Finally, set e_i = \frac{v_i}{\|v_i\|}.
Theorem 4.1 The set (e_1, \dots, e_k) is an orthonormal basis of F.
Proof. We prove by induction on i = 1, \dots, k that (e_1, \dots, e_i) is an orthonormal basis of \mathrm{Span}(u_1, \dots, u_i).
Base case i = 1: since u_1 \neq 0, the vector v_1 = u_1 is non-zero, so e_1 is well defined, has norm 1, and spans the same line as u_1.
Inductive step: assume the result for i - 1. Since u_1, \dots, u_i are linearly independent, v_i \neq 0 (otherwise u_i would lie in \mathrm{Span}(u_1, \dots, u_{i-1})), so e_i has norm 1 and \mathrm{Span}(e_1, \dots, e_i) = \mathrm{Span}(u_1, \dots, u_i). It remains to show that e_i is orthogonal to e_j for j < i, which is equivalent to v_i being orthogonal to v_j for j < i. Using \langle v_m, v_j \rangle = 0 for m \neq j with m, j < i, we calculate \langle v_i, v_j \rangle = \langle u_i, v_j \rangle - \frac{\langle u_i, v_j \rangle}{\langle v_j, v_j \rangle} \langle v_j, v_j \rangle = 0.
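The recursion of Definition 4.11 is easy to implement. Here is a minimal sketch (Python with numpy; the function name gram_schmidt and the sample vectors are illustrative, and the input is assumed to be linearly independent):
```python
import numpy as np

def gram_schmidt(vectors):
    """Return an orthonormal family spanning the same subspace as the given
    linearly independent family, following Definition 4.11."""
    orthonormal = []
    for u in vectors:
        v = np.array(u, dtype=float)
        # Remove the component of v along each previously built direction.
        for e in orthonormal:
            v = v - np.dot(v, e) * e
        orthonormal.append(v / np.linalg.norm(v))
    return orthonormal

# Illustrative input: three linearly independent vectors of R^3.
for e in gram_schmidt([(1, 1, 0), (1, 0, 1), (0, 1, 1)]):
    print(np.round(e, 6))
```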
Example 4.7 Let F be the plane defined by x + y + z = 0, and let v = (1, 2, 3). We want to calculate the orthogonal projection of v onto F. We start by finding an orthonormal basis of F. The vectors u_1 = (1, -1, 0) and u_2 = (0, 1, -1) form a basis of F.
Applying the Gram-Schmidt orthonormalization process, we get v_1 = u_1 and e_1 = \left(\frac{1}{\sqrt{2}}, \frac{-1}{\sqrt{2}}, 0\right), then v_2 = u_2 + \frac{1}{2}u_1 = \left(\frac{1}{2}, \frac{1}{2}, -1\right) and finally e_2 = \sqrt{\frac{2}{3}} \left(\frac{1}{2}, \frac{1}{2}, -1\right) = \left(\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, -\frac{2}{\sqrt{6}}\right).
Since \langle v, e_1 \rangle = -\frac{1}{\sqrt{2}} and \langle v, e_2 \rangle = -\frac{3}{\sqrt{6}}, we get
\begin{align*} P_F(v) &= \frac{-1}{\sqrt{2}} e_1 - \frac{3}{\sqrt{6}} e_2 \\ &= (-1, 0, 1). \end{align*}
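A quick numerical check of Example 4.7 (Python with numpy), reproducing the computation step by step:
```python
import numpy as np

# Basis of the plane x + y + z = 0 and the vector to project (Example 4.7).
u1 = np.array([1.0, -1.0, 0.0])
u2 = np.array([0.0, 1.0, -1.0])
v  = np.array([1.0, 2.0, 3.0])

# Gram-Schmidt on (u1, u2), then the projection formula of Definition 4.10.
e1 = u1 / np.linalg.norm(u1)
v2 = u2 - np.dot(u2, e1) * e1
e2 = v2 / np.linalg.norm(v2)

P_F_v = np.dot(v, e1) * e1 + np.dot(v, e2) * e2
print(np.round(P_F_v, 6))   # [-1.  0.  1.] up to rounding
```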
4.4 Orthogonal Matrices
Definition 4.12 (Orthogonal Matrix) Let A \in \mathrm{M}_n(\mathbf{R}). The matrix A is orthogonal if ^tAA = I_n.
Lemma 4.6 Let A \in \mathrm{M}_n(\mathbf{R}) and let A_1, \dots, A_n denote the column vectors of A. Let P be the matrix ^tAA. The coefficient at index (i,j) of P is \langle A_i, A_j \rangle.
Proof. Let B = ^tA. By the definition of matrix multiplication, the coefficient P_{i,j} of P is \sum_{k=1}^n B_{i,k} A_{k,j}. Since B_{i,k} = A_{k,i}, P_{i,j} = \sum_{k=1}^n A_{k,i} A_{k,j} = \langle A_i, A_j \rangle.
Proposition 4.5 A matrix A is orthogonal if and only if A is invertible with inverse ^tA if and only if the columns of A form an orthonormal basis.
Proof. Since A is square, the matrix equation ^tAA = I_n means exactly that A is invertible with inverse ^tA (for square matrices, a left inverse is automatically a two-sided inverse).
By Lemma 4.6, the coefficient at index (i,j) of P = {}^tAA is \langle A_i, A_j \rangle. Thus, P = I_n if and only if \langle A_i, A_j \rangle = 0 for i \neq j and \langle A_i, A_i \rangle = 1 for all i. This means exactly that the set of columns of the matrix A is an orthonormal basis.
Remark. An orthogonal matrix corresponds exactly to a change of basis matrix from the standard basis to an orthonormal basis.
Example 4.8 Here are some examples.
- The matrix \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} is an orthogonal matrix in dimension 2.
- The matrix \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} is orthogonal in dimension 3.
- The matrix \frac{1}{9} \begin{bmatrix} -8 & 4 & 1 \\ 4 & 7 & 4 \\ 1 & 4 & -8 \end{bmatrix} is orthogonal in dimension 3.
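These claims are easy to verify numerically; the sketch below (Python with numpy) checks the third matrix against Definition 4.12 and Proposition 4.5:
```python
import numpy as np

# Third matrix of Example 4.8; it is orthogonal when tA A = I_3.
A = (1 / 9) * np.array([[-8, 4,  1],
                        [ 4, 7,  4],
                        [ 1, 4, -8]])

print(np.allclose(A.T @ A, np.eye(3)))   # True: A is orthogonal

# Equivalently (Proposition 4.5), every column has norm 1
# and distinct columns are orthogonal.
for j in range(3):
    print(np.linalg.norm(A[:, j]))       # approximately 1.0 for each column
```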
Proposition 4.6 Let A \in \mathrm{M}_n(\mathbf{R}) be an orthogonal matrix. For all vectors X, Y \in \mathbf{R}^n, we have
\langle AX, AY \rangle = \langle X, Y \rangle.
Proof. We have \langle AX, AY \rangle = ^t(AX) AY = ^tX ^tA A Y = ^tXY = \langle X, Y \rangle.
Corollary 4.1 Orthogonal changes of basis preserve orthogonality, norms, and distances: For all X, Y \in \mathbf{R}^n and orthogonal matrix A:
- X \bot Y \iff AX \bot AY,
- \|AX\| = \|X\|, and
- d(AX, AY) = d(X, Y).
Proof. We have
- X \bot Y \iff \langle X, Y \rangle = 0 \iff \langle AX, AY \rangle = 0 \iff AX \bot AY,
- \|AX\|^2 = \langle AX, AX \rangle = \langle X, X \rangle = \|X\|^2, and
- d(AX, AY) = \|AX -AY\| = \|A(X - Y)\| = \|X - Y\| = d(X, Y).
4.5 Exercises
Exercise 4.1 (Pythagorean Theorem) Let u, v \in \mathbf{R}^n.
- Show that \|u + v\|^2 = \|u\|^2 + 2 \langle u, v \rangle + \|v\|^2.
- Deduce that u \bot v \iff \|u + v\|^2 = \|u\|^2 + \|v\|^2.
- What is the connection with the Pythagorean theorem?
Exercise 4.2 Consider the vectors u_1 = (1, 0, 1), u_2 = (1, 1, 1), and u_3 = (-1, 1, 0). Apply the Gram-Schmidt process to the set (u_1, u_2, u_3).
Exercise 4.3 In \mathbf{R}^3, consider the vectors v_1 = (1, 1, 0) and v_2 = (1, 1, 1). Let F = \mathrm{Span}(v_1, v_2).
- Find an orthonormal basis for F using the Gram-Schmidt orthonormalization process.
- Calculate the image of the standard basis vectors under the orthogonal projection onto F, denoted P_F.
- Deduce the matrix of P_F in the standard basis.
- Provide a system of equations defining F^\bot.
- Provide an orthonormal basis for F^\bot.
- What is the distance from the vector (1, -1, 1) to F?
Exercise 4.4 Verify that the matrix A = \frac{-1}{3} \begin{bmatrix} -2 & 1 & 2 \\ 2 & 2 & 1 \\ 1 & -2 & 2 \end{bmatrix} is an orthogonal matrix.
Exercise 4.5 In \mathbf{R}^3, consider the point C = (1, 2, 1).
- Provide an equation for the sphere S centered at C with radius 2, i.e., the set of points at distance 2 from C.
- Provide an equation for C^\bot. What is the dimension of this subspace? Give an orthonormal basis for it.
- Represent S and C^\bot using Geogebra 3D.
- Justify that C^\bot \cap S = \emptyset.
Exercise 4.6 Let \mathcal{B} = (u_1, \dots, u_n) be a basis of \mathbf{R}^n and \mathcal{B}' = (e_1, \dots, e_n) the orthonormal basis obtained by the Gram-Schmidt process. Let R be the change-of-basis matrix from \mathcal{B}' to \mathcal{B}.
- Provide the entries of R.
- Observe that R is an upper triangular matrix.
- Let Q be the change-of-basis matrix from the standard basis \mathcal{B}_0 to \mathcal{B}'. Justify that Q is an orthogonal matrix.
- Let A be an invertible matrix in \mathrm{M}_n(\mathbf{R}). Denote u_1, \dots, u_n as its column vectors. Interpreting A as the change-of-basis matrix from \mathcal{B}_0 to \mathcal{B}, justify that A = QR.
- Justify that the solution of the linear system AX = B, where X \in \mathbf{R}^n is the unknown and B \in \mathbf{R}^n is given, is X = R^{-1} (^tQ) B, and that this expression can be computed quite easily.
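The last two items of Exercise 4.6 describe the classical QR approach to solving AX = B. Below is a minimal sketch (Python with numpy; it uses numpy's built-in QR factorization rather than the Gram-Schmidt construction of the exercise, and the matrix A and vector B are arbitrary illustrative data):
```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
B = np.array([1.0, 2.0, 3.0])

# Factor A = QR with Q orthogonal and R upper triangular.
Q, R = np.linalg.qr(A)

# Since Q is orthogonal, AX = B becomes RX = tQ B.
Y = Q.T @ B

# R is upper triangular, so X is obtained by back substitution.
n = len(Y)
X = np.zeros(n)
for i in range(n - 1, -1, -1):
    X[i] = (Y[i] - R[i, i + 1:] @ X[i + 1:]) / R[i, i]

print(np.allclose(A @ X, B))   # True
```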