Chapter 4 Orthogonality

4.1 Dot Product, Norm, and Euclidean Distance

Definition 4.1 (transpose) Let A \in \mathrm{M}_{m,n}(\mathbf{R}), the transpose of A is the matrix ^tA \in \mathrm{M}_{n,m}(\mathbf{R}) given by

(^tA)_{i,j} = A_{j,i}

for all 1 \leq i \leq n and 1 \leq j \leq m.

Remark. The transpose corresponds to exchanging the rows with the columns.

Example 4.1 The transpose of a column vector X = \begin{bmatrix} X_1 \\ \vdots \\ X_n \end{bmatrix} is the row vector ^tX = [X_1, \dots, X_n].

Example 4.2 The transpose of the matrix

A = \begin{bmatrix} 1 & 0 & 2 \\ 3 & 2 & 1 \end{bmatrix} is the matrix

^tA = \begin{bmatrix} 1 & 3 \\ 0 & 2 \\ 2 & 1 \end{bmatrix}.

Proposition 4.1 If A \in \mathrm{M}_{n,m}(\mathbf{K}) and B \in \mathrm{M}_{m,l}(\mathbf{K}), then ^t(AB) = (^tB)(^tA).

Proof. The coefficient at index (i,j) of AB is \sum_{k=1}^m A_{i,k} B_{k,j}, so the coefficient at index (i,j) of ^t(AB) is \sum_{k=1}^m A_{j,k} B_{k,i}.

The coefficient of (^tB)(^tA) at index (i,j) is

\sum_{k=1}^m (^tB)_{i,k} (^tA)_{k,j} = \sum_{k=1}^m B_{k,i} A_{j,k} = \sum_{k=1}^m A_{j,k} B_{k,i}.

The matrices t(AB) and (tB)(tA) have the same coefficients, so they are equal.
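
Proposition 4.1 is easy to check numerically; here is a minimal sketch, assuming NumPy is available, using the matrix A of Example 4.2 and an arbitrary matrix B:

    import numpy as np

    # A is the 2x3 matrix of Example 4.2; B is an arbitrary 3x4 matrix.
    A = np.array([[1.0, 0.0, 2.0],
                  [3.0, 2.0, 1.0]])
    B = np.arange(12.0).reshape(3, 4)

    # t(AB) and (tB)(tA) coincide.
    print(np.allclose((A @ B).T, B.T @ A.T))  # True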

Definition 4.2 (dot product) Let u, v be two vectors in \mathbf{R}^n. The dot product of u and v is the real number

\langle u, v \rangle = \sum_{i=1}^n u_i v_i

where (u_i) and (v_i) are the coordinates of u and v in the canonical basis.

Remark. If X and Y are column vectors in \mathbf{R}^n, then \langle X, Y \rangle is identified with the 1 \times 1 matrix ^tX Y.
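
This identification can be illustrated numerically; a minimal NumPy sketch, with arbitrarily chosen vectors:

    import numpy as np

    X = np.array([1.0, -1.0, 2.0])
    Y = np.array([3.0, 0.0, 1.0])

    # <X, Y> as a sum of coordinate products ...
    print(np.dot(X, Y))                         # 5.0
    # ... equals the single entry of the 1x1 matrix tX Y.
    print(X.reshape(1, -1) @ Y.reshape(-1, 1))  # [[5.]]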

Proposition 4.2 For all u, v, w \in \mathbf{R}^n and \lambda \in \mathbf{R}, the following properties hold:

  • \langle u, v \rangle = \langle v, u \rangle,
  • \langle u + v, w \rangle = \langle u, w \rangle + \langle v, w \rangle,
  • \langle \lambda u, v \rangle = \lambda \langle u, v \rangle,
  • \langle u, u \rangle \geq 0,
  • \langle u, u \rangle = 0 \iff u = 0.

Proof. We prove the properties one after the other:

  • \langle u, v \rangle = \sum_{i=1}^n u_i v_i = \sum_{i=1}^n v_i u_i = \langle v, u \rangle,
  • \langle u + v, w \rangle = \sum_{i=1}^n (u_i + v_i) w_i = \sum_{i=1}^n u_i w_i + \sum_{i=1}^n v_i w_i = \langle u, w \rangle + \langle v, w \rangle,
  • \langle \lambda u, v \rangle = \sum_{i=1}^n \lambda u_i v_i = \lambda \sum_{i=1}^n u_i v_i = \lambda \langle u, v \rangle,
  • \langle u, u \rangle = \sum_{i=1}^n u_i^2 \geq 0,
  • \langle u, u \rangle = \sum_{i=1}^n u_i^2 = 0 \iff \forall i,\ u_i = 0 \iff u = 0.

Definition 4.3 (euclidean norm) Let u \in \mathbf{R}^n, the Euclidean norm (or simply norm) of u is the nonnegative real number

\|u\| = \sqrt{\langle u, u \rangle} = \sqrt{\sum_{i=1}^n u_i^2}.

Remark. The last property of Proposition 4.2 shows that \|u\| = 0 \iff u = 0.

Definition 4.4 (euclidean distance) Let u, v \in \mathbf{R}^n, the Euclidean distance (or simply distance) between u and v is the nonnegative real number

d(u, v) = \|u - v\| = \sqrt{\sum_{i=1}^n (u_i - v_i)^2}.

Remark. The norm \|u\| of a vector u is thus the distance from u to 0.
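
In code, these two formulas read as follows; a minimal NumPy sketch with arbitrarily chosen vectors:

    import numpy as np

    u = np.array([1.0, 2.0, 2.0])
    v = np.array([1.0, 0.0, 0.0])

    norm_u = np.sqrt(np.dot(u, u))           # ||u|| = sqrt(1 + 4 + 4) = 3
    dist_uv = np.sqrt(np.dot(u - v, u - v))  # d(u, v) = ||u - v|| = sqrt(8)

    # Same values via NumPy's built-in Euclidean norm.
    print(norm_u, np.linalg.norm(u))         # 3.0 3.0
    print(dist_uv, np.linalg.norm(u - v))    # 2.828... 2.828...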

4.2 Orthogonality

Definition 4.5 Let u, v be two vectors in \mathbf{R}^n. They are orthogonal if \langle u, v \rangle = 0.

We then write u \bot v.

Example 4.3 If u = \begin{bmatrix} 1 \\ 1 \end{bmatrix} and v = \begin{bmatrix} 1 \\ -1 \end{bmatrix}, then u and v are orthogonal because \langle u, v \rangle = 1 - 1 = 0.

Example 4.4 The vectors e_1, \dots, e_n of the canonical basis are orthogonal because \langle e_i, e_j \rangle = 1 if i = j and \langle e_i, e_j \rangle = 0 if i \neq j.

Definition 4.6 Let E be a subset of \mathbf{R}^n. The orthogonal of E is the set

\{ y \in \mathbf{R}^n \mid \forall x \in E,\ \langle x, y \rangle = 0 \}. It is denoted E^\bot. If u \in E^\bot, we also write u \bot E.

Lemma 4.1 The set E^\bot is a vector subspace of \mathbf{R}^n.

Proof. The zero vector belongs to E^\bot, since \langle x, 0 \rangle = 0 for all x \in E. Let y_1, y_2 \in E^\bot. For all x \in E, \langle y_1 + y_2, x \rangle = \langle y_1, x \rangle + \langle y_2, x \rangle = 0, so y_1 + y_2 \in E^\bot. If y \in E^\bot and \lambda \in \mathbf{R}, \langle \lambda y, x \rangle = \lambda \langle y, x \rangle = 0, so \lambda y \in E^\bot.

Definition 4.7 A set F = \{u_1, \dots, u_k\} is orthogonal if for all i \neq j, u_i \bot u_j.

Lemma 4.2 Let F be an orthogonal set of non-zero vectors. Then F is linearly independent.

Proof. Let u_1, \dots, u_k \in F and \lambda_1, \dots, \lambda_k \in \mathbf{R} such that \lambda_1 u_1 + \dots + \lambda_k u_k = 0. We want to show that \lambda_1 = \dots = \lambda_k = 0. Let v = \lambda_1 u_1 + \dots + \lambda_k u_k and, for a fixed i, compute \langle v, u_i \rangle. Since v = 0, \langle v, u_i \rangle = 0, and by linearity, \langle v, u_i \rangle = \sum_{j=1}^k \lambda_j \langle u_j, u_i \rangle = \lambda_i \langle u_i, u_i \rangle. Since u_i \neq 0, \langle u_i, u_i \rangle \neq 0, so \lambda_i = 0. This shows that the set is linearly independent.

Definition 4.8 A basis B = (u_1, \dots, u_n) of \mathbf{R}^n is orthonormal if for all i \neq j, u_i \bot u_j, and \|u_i\| = 1 for all i \in \{1, \dots, n\}.

Remark. By Lemma 4.2, an orthogonal set with n vectors of norm 1 is automatically an orthonormal basis.

Example 4.5 The canonical basis is an orthonormal basis.

Example 4.6 The vectors u_1 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix} and u_2 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -1 \end{bmatrix} form an orthonormal basis of \mathbf{R}^2 because \|u_1\| = \|u_2\| = 1 and u_1 \bot u_2.

4.3 Projections and Closest Point

Definition 4.9 Let u be a non-zero vector in \mathbf{R}^n and let D be the line generated by u. The projection of a vector v \in \mathbf{R}^n onto D is the vector \frac{\langle u, v \rangle}{\langle u, u \rangle} u. It is denoted P_D(v).

Definition 4.10 Let E be a vector subspace of \mathbf{R}^n, and u_1, \dots, u_k be an orthonormal basis of E. The projection of a vector v onto E is the vector P_E(v) = \sum_{i=1}^k \langle v, u_i \rangle u_i \in E.

Remark. This is the sum of the projections onto the lines generated by ui.

Figure 4.1: The projection of vector w onto a plane.
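
Definition 4.10 translates directly into code. Below is a minimal NumPy sketch projecting a vector onto a plane of \mathbf{R}^3 given by an orthonormal basis; the vectors and the helper name project are illustrative choices:

    import numpy as np

    # An orthonormal basis (u1, u2) of a plane E in R^3.
    u1 = np.array([1.0, 0.0, 0.0])
    u2 = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)

    def project(v, basis):
        # P_E(v) = sum over i of <v, u_i> u_i, for an orthonormal basis (u_i) of E.
        return sum(np.dot(v, u) * u for u in basis)

    v = np.array([1.0, 2.0, 3.0])
    p = project(v, [u1, u2])
    print(p)                                     # [1.  2.5 2.5]
    # v - P_E(v) is orthogonal to E (see Lemma 4.4 below).
    print(np.dot(v - p, u1), np.dot(v - p, u2))  # 0.0 0.0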

Lemma 4.3 Let E be a vector subspace of \mathbf{R}^n. The map u \mapsto P_E(u) is linear.

Proof. Let v_1, v_2 \in \mathbf{R}^n. Then

P_E(v_1 + v_2) = \sum_{i=1}^k \langle v_1 + v_2, u_i \rangle u_i = \sum_{i=1}^k \langle v_1, u_i \rangle u_i + \sum_{i=1}^k \langle v_2, u_i \rangle u_i = P_E(v_1) + P_E(v_2).

For \lambda \in \mathbf{R} and v \in \mathbf{R}^n,

P_E(\lambda v) = \sum_{i=1}^k \langle \lambda v, u_i \rangle u_i = \lambda \sum_{i=1}^k \langle v, u_i \rangle u_i = \lambda P_E(v).

Thus, the orthogonal projection onto E is linear.

Lemma 4.4 Let E be a vector subspace of \mathbf{R}^n and v \in \mathbf{R}^n. Then v - P_E(v) \in E^\bot and

\|v\|^2 = \|P_E(v)\|^2 + \|v - P_E(v)\|^2.

Proof. Let v' = v - P_E(v). Then v = P_E(v) + v'.

For i \in \{1, \dots, k\},

\langle u_i, v' \rangle = \langle v, u_i \rangle - \langle P_E(v), u_i \rangle = \langle v, u_i \rangle - \langle v, u_i \rangle = 0,

since \langle P_E(v), u_i \rangle = \sum_{j=1}^k \langle v, u_j \rangle \langle u_j, u_i \rangle = \langle v, u_i \rangle by orthonormality of (u_1, \dots, u_k). Every vector of E is a linear combination of the u_i, so v' = v - P_E(v) \in E^\bot.

Now, since \langle P_E(v), v' \rangle = 0,

\|v\|^2 = \|P_E(v) + v'\|^2 = \|P_E(v)\|^2 + 2 \langle P_E(v), v' \rangle + \|v'\|^2 = \|P_E(v)\|^2 + \|v - P_E(v)\|^2.

Lemma 4.5 (Cauchy-Schwarz Inequality) For u, v \in \mathbf{R}^n, we have

|\langle u, v \rangle| \leq \|u\| \, \|v\|.

Proof. If v = 0, both sides are zero. Otherwise, let D be the line generated by v. Since u - P_D(u) \bot v, we have \langle u, v \rangle = \langle P_D(u), v \rangle, and since P_D(u) is collinear with v, |\langle P_D(u), v \rangle| = \|P_D(u)\| \, \|v\|. By Lemma 4.4, \|P_D(u)\| \leq \|u\|, so the result follows.

Proposition 4.3 (Triangle Inequality) For u, v \in \mathbf{R}^n, we have

\|u + v\| \leq \|u\| + \|v\|.

Remark. Let a, b, c be three points in \mathbf{R}^n. Using u = a - b and v = b - c, we obtain

d(a, c) \leq d(a, b) + d(b, c).

Proof. We compute \|u + v\|^2. By the Cauchy-Schwarz inequality, \langle u, v \rangle \leq \|u\| \, \|v\|, so

\|u + v\|^2 = \|u\|^2 + 2 \langle u, v \rangle + \|v\|^2 \leq \|u\|^2 + 2 \|u\| \, \|v\| + \|v\|^2 = (\|u\| + \|v\|)^2.

Taking square roots, \|u + v\| \leq \|u\| + \|v\|.
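
Both inequalities can be illustrated numerically on arbitrary vectors; a minimal NumPy sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    u, v = rng.standard_normal(5), rng.standard_normal(5)

    # Cauchy-Schwarz: |<u, v>| <= ||u|| ||v||
    print(abs(np.dot(u, v)) <= np.linalg.norm(u) * np.linalg.norm(v))      # True
    # Triangle inequality: ||u + v|| <= ||u|| + ||v||
    print(np.linalg.norm(u + v) <= np.linalg.norm(u) + np.linalg.norm(v))  # True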

Proposition 4.4 Let E be a vector subspace of \mathbf{R}^n and u \in \mathbf{R}^n. For all v \in E,

\|u - v\| \geq \|u - P_E(u)\|.

Remark. This means that PE(u) is the closest point in E to u. In particular, this point PE(u) does not depend on the orthonormal basis chosen to define it.

Definition 4.11 (Gram-Schmidt Orthogonalization) Let u_1, \dots, u_k be a basis of a subspace F of \mathbf{R}^n. The vectors v_1, \dots, v_k are defined by recurrence as follows:

\begin{align*} v_1 &= u_1 \\ v_2 &= u_2 - \frac{\langle u_2, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 \\ v_3 &= u_3 - \frac{\langle u_3, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \frac{\langle u_3, v_2 \rangle}{\langle v_2, v_2 \rangle} v_2 \\ &\ \ \vdots \\ v_k &= u_k - \frac{\langle u_k, v_1 \rangle}{\langle v_1, v_1 \rangle} v_1 - \dots - \frac{\langle u_k, v_{k-1} \rangle}{\langle v_{k-1}, v_{k-1} \rangle} v_{k-1} \end{align*}

Finally, set e_i = \frac{v_i}{\|v_i\|} for i = 1, \dots, k.

Theorem 4.1 The set (e_1, \dots, e_k) is an orthonormal basis of F.

Proof. We prove by induction on i = 1, \dots, k that (e_1, \dots, e_i) is an orthonormal basis of \mathrm{Vect}(u_1, \dots, u_i). By construction, v_i differs from u_i by a linear combination of v_1, \dots, v_{i-1}, so \mathrm{Vect}(v_1, \dots, v_i) = \mathrm{Vect}(u_1, \dots, u_i) for every i; in particular v_i \neq 0 (otherwise u_i \in \mathrm{Vect}(u_1, \dots, u_{i-1}), contradicting linear independence), so e_i = v_i / \|v_i\| is well defined.

Base case i = 1: the vector e_1 = u_1 / \|u_1\| is non-zero and has norm 1.

Inductive step: Assume the result for k - 1. It remains to show that e_k is orthogonal to e_i for i < k, since \|e_k\| = 1; this is equivalent to v_k being orthogonal to v_i for i < k. By the induction hypothesis, \langle v_j, v_i \rangle = 0 for j \neq i with i, j < k, so only the term j = i survives in the sum defining v_k, and we calculate \langle v_k, v_i \rangle = \langle u_k, v_i \rangle - \frac{\langle u_k, v_i \rangle}{\langle v_i, v_i \rangle} \langle v_i, v_i \rangle = 0.
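
The recurrence of Definition 4.11 can be implemented directly; below is a minimal NumPy sketch (the function name gram_schmidt is an illustrative choice, and the example basis is the one used in Example 4.7 below):

    import numpy as np

    def gram_schmidt(basis):
        # Returns the orthonormal family (e_1, ..., e_k) built from the basis (u_1, ..., u_k).
        vs = []
        for u in basis:
            # v_i = u_i minus its projections onto the previous v_j.
            v = u - sum(np.dot(u, w) / np.dot(w, w) * w for w in vs)
            vs.append(v)
        # e_i = v_i / ||v_i||
        return [v / np.linalg.norm(v) for v in vs]

    # A basis of the plane x + y + z = 0 (see Example 4.7 below).
    e1, e2 = gram_schmidt([np.array([1.0, -1.0, 0.0]),
                           np.array([0.0, 1.0, -1.0])])
    print(e1)  # approximately ( 0.707, -0.707,  0.000)
    print(e2)  # approximately ( 0.408,  0.408, -0.816)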

Example 4.7 Let F be the plane defined by x + y + z = 0 and v = (1, 2, 3). We want to calculate the orthogonal projection of v onto F. We start by computing an orthonormal basis of F. The vectors u_1 = (1, -1, 0) and u_2 = (0, 1, -1) form a basis of F.

Applying the Gram-Schmidt orthonormalization process, we get v_1 = u_1 and e_1 = \left(\frac{1}{\sqrt{2}}, \frac{-1}{\sqrt{2}}, 0\right), then v_2 = u_2 + \frac{1}{2}u_1 = \left(\frac{1}{2}, \frac{1}{2}, -1\right) and finally e_2 = \sqrt{\frac{2}{3}} \left(\frac{1}{2}, \frac{1}{2}, -1\right) = \left(\frac{1}{\sqrt{6}}, \frac{1}{\sqrt{6}}, -\frac{2}{\sqrt{6}}\right).

Since \langle v, e_1 \rangle = -\frac{1}{\sqrt{2}} and \langle v, e_2 \rangle = -\frac{3}{\sqrt{6}}, we get

\begin{align*} P_F(v) &= \frac{-1}{\sqrt{2}} e_1 - \frac{3}{\sqrt{6}} e_2 \\ &= (-1, 0, 1). \end{align*}
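
This result can be double-checked numerically by another route: the orthogonal projection onto F = \mathrm{Span}(u_1, u_2) is also given by the matrix B(^tBB)^{-1}\,^tB, where B has u_1 and u_2 as columns (a standard alternative formula, not the method used above). A minimal NumPy sketch:

    import numpy as np

    # Columns of B: the basis u1, u2 of the plane x + y + z = 0.
    B = np.array([[ 1.0,  0.0],
                  [-1.0,  1.0],
                  [ 0.0, -1.0]])
    v = np.array([1.0, 2.0, 3.0])

    # Orthogonal projection matrix onto the column space of B.
    P = B @ np.linalg.inv(B.T @ B) @ B.T
    print(P @ v)  # [-1.  0.  1.]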

4.4 Orthogonal Matrices

Definition 4.12 (Orthogonal Matrix) Let A \in \mathrm{M}_n(\mathbf{R}). The matrix A is orthogonal if ^tAA = I_n.

Lemma 4.6 Let A \in \mathrm{M}_n(\mathbf{R}). Denote A_1, \dots, A_n as the column vectors of the matrix A. Let P be the matrix ^tAA. The coefficient at index (i,j) of P is \langle A_i, A_j \rangle.

Proof. Let B = ^tA. By the definition of matrix multiplication, the coefficient P_{i,j} of P is \sum_{k=1}^n B_{i,k} A_{k,j}. Since B_{i,k} = A_{k,i}, P_{i,j} = \sum_{k=1}^n A_{k,i} A_{k,j} = \langle A_i, A_j \rangle.

Proposition 4.5 A matrix A is orthogonal if and only if A is invertible with inverse ^tA if and only if the columns of A form an orthonormal basis.

Proof. The matrix equation ^tAA = I_n means exactly that ^tA is the inverse of A.

By Lemma 4.6, the coefficient at index (i,j) of P = ^tAA is \langle A_i, A_j \rangle. Thus, P = I_n \iff \left\{\begin{matrix}\langle A_i, A_j \rangle = 0, \text{ for } i \neq j\\ \langle A_i, A_i \rangle = 1, \text{ for all } i \end{matrix}\right., which means exactly that the set of columns of the matrix A is an orthonormal basis.

Remark. An orthogonal matrix corresponds exactly to a change of basis matrix from the standard basis to an orthonormal basis.

Example 4.8 Here are some examples.

  1. The matrix \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix} is an orthogonal matrix in dimension 2.
  2. The matrix \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{bmatrix} is orthogonal in dimension 3.
  3. The matrix \frac{1}{9} \begin{bmatrix} -8 & 4 & 1 \\ 4 & 7 & 4 \\ 1 & 4 & -8 \end{bmatrix} is orthogonal in dimension 3.
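
These claims are easy to check numerically; for instance, for the third matrix (a minimal NumPy sketch):

    import numpy as np

    A = np.array([[-8.0,  4.0,  1.0],
                  [ 4.0,  7.0,  4.0],
                  [ 1.0,  4.0, -8.0]]) / 9.0

    # tA A = I_3, i.e. the columns of A form an orthonormal basis of R^3.
    print(np.allclose(A.T @ A, np.eye(3)))  # True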

Proposition 4.6 Let A \in \mathrm{M}_n(\mathbf{R}) be an orthogonal matrix. For all vectors X, Y \in \mathbf{R}^n, we have

\langle AX, AY \rangle = \langle X, Y \rangle.

Proof. We have \langle AX, AY \rangle = ^t(AX) AY = ^tX ^tA A Y = ^tXY = \langle X, Y \rangle.

Corollary 4.1 Orthogonal changes of basis preserve orthogonality, norms, and distances: For all X, Y \in \mathbf{R}^n and orthogonal matrix A:

  1. X \bot Y \iff AX \bot AY,
  2. \|AX\| = \|X\|, and
  3. d(AX, AY) = d(X, Y).

Proof. We have

  1. X \bot Y \iff \langle X, Y \rangle = 0 \iff \langle AX, AY \rangle = 0 \iff AX \bot AY,
  2. \|AX\|^2 = \langle AX, AX \rangle = \langle X, X \rangle = \|X\|^2, and
  3. d(AX, AY) = \|AX -AY\| = \|A(X - Y)\| = \|X - Y\| = d(X, Y).

4.5 Exercises

Exercise 4.1 (Pythagorean Theorem) Let u, v \in \mathbf{R}^n.

  1. Show that \|u + v\|^2 = \|u\|^2 + 2 \langle u, v \rangle + \|v\|^2.
  2. Deduce that u \bot v \iff \|u + v\|^2 = \|u\|^2 + \|v\|^2.
  3. What is the connection with the Pythagorean theorem?

Exercise 4.2 Consider the vectors u_1 = (1, 0, 1), u_2 = (1, 1, 1), and u_3 = (-1, 1, 0). Apply the Gram-Schmidt process to the set (u_1, u_2, u_3).

Exercise 4.3 In \mathbf{R}^3, consider the vectors v_1 = (1, 1, 0) and v_2 = (1, 1, 1). Let F = \mathrm{Span}(v_1, v_2).

  1. Find an orthonormal basis for F using the Gram-Schmidt orthonormalization process.
  2. Calculate the image of the standard basis vectors under the orthogonal projection onto F, denoted P_F.
  3. Deduce the matrix of P_F in the standard basis.
  4. Provide a system of equations defining F^\bot.
  5. Provide an orthonormal basis for F^\bot.
  6. What is the distance from the vector (1, -1, 1) to F?

Exercise 4.4 Verify that the matrix A = \frac{-1}{3} \begin{bmatrix} -2 & 1 & 2 \\ 2 & 2 & 1 \\ 1 & -2 & 2 \end{bmatrix} is an orthogonal matrix.

Exercise 4.5 In \mathbf{R}^3, consider the point C = (1, 2, 1).

  1. Provide an equation for the sphere S centered at C with radius 2, i.e., the set of points at distance 2 from C.
  2. Provide an equation for C^\bot. What is the dimension of this subspace? Give an orthonormal basis for it.
  3. Represent S and C^\bot using Geogebra 3D.
  4. Justify that C^\bot \cap S = \emptyset.

Exercise 4.6 Let \mathcal{B} = (u_1, \dots, u_n) be a basis of \mathbf{R}^n and \mathcal{B}' = (e_1, \dots, e_n) the orthonormal basis obtained by the Gram-Schmidt process. Let R be the change-of-basis matrix from \mathcal{B}' to \mathcal{B}.

  1. Provide the entries of R.
  2. Observe that R is an upper triangular matrix.
  3. Let Q be the change-of-basis matrix from the standard basis \mathcal{B}_0 to \mathcal{B}'. Justify that Q is an orthogonal matrix.
  4. Let A be an invertible matrix in \mathrm{M}_n(\mathbf{R}). Denote u_1, \dots, u_n as its column vectors. Interpreting A as the change-of-basis matrix from \mathcal{B}_0 to \mathcal{B}, justify that A = QR.
  5. Justify that the linear system AX = B, where X \in \mathbf{R}^n is the unknown and B \in \mathbf{R}^n is given, is solved by X = R^{-1} (^tQ) B, and that this can be computed quite easily.
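
A numerical illustration of the last two questions, sketched with NumPy (numpy.linalg.qr returns a factorization A = QR with Q orthogonal and R upper triangular, possibly with signs differing from the Gram-Schmidt construction); the columns of A below are the vectors of Exercise 4.2, and b is an arbitrary right-hand side:

    import numpy as np

    # Columns of A: the vectors u1, u2, u3 of Exercise 4.2.
    A = np.array([[1.0, 1.0, -1.0],
                  [0.0, 1.0,  1.0],
                  [1.0, 1.0,  0.0]])
    b = np.array([1.0, 2.0, 3.0])

    Q, R = np.linalg.qr(A)           # A = QR with Q orthogonal and R upper triangular
    x = np.linalg.solve(R, Q.T @ b)  # X = R^{-1} (tQ) b; cheap since R is triangular
    print(np.allclose(A @ x, b))     # True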