
Section 1.3 Linear algebra

Subsection 1.3.1 Motivation

One of the most profound ideas of linear algebra is that any finite dimensional vector space over \(\R\) or \(\C\) is secretly \(\R^n\) or \(\C^n\). This insight allows us to reduce the study of vector spaces and the maps between them to the study of matrices.

The key idea is that every finite dimensional vector space can be represented in coordinates once we choose a basis. We denote the representation of a vector \(v \in V\) with respect to a basis \(\mathcal V\) by \(v_\mathcal{V}\text{.}\) Better yet, that basis can be chosen to be orthonormal by way of the Gram-Schmidt process and the dot product structure of Euclidean space. The coordinatization of \(V\) also gives unique representations of linear maps between those spaces.

Typical examples introduced in a linear algebra course include the space of polynomials of degree less than or equal to \(n\text{.}\) At the same time, we usually also get to see a very suggestive example of a useful linear map and the representation of that map in matrix form.

Let \(P_n\) denote the space of polynomials of degree \(\leq n\text{.}\) Consider the map \(D: P_3 \to P_2\) defined by

\begin{equation*} D(a_0 + a_1 t + a_2 t^2 + a_3 t^3) = a_1 + 2 a_2 t + 3 a_3 t^2. \end{equation*}

That is, \(D\) is the map that takes the derivative of a polynomial. It isn't hard to use the standard basis for \(P_n\) to get the matrix representation

\begin{equation*} D(p) = \bbm 0 \amp 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 0 \amp 3 \ebm \bbm a_0 \\ a_1 \\ a_2 \\ a_3 \ebm \end{equation*}

for the action of \(D\) on \(P_3\text{.}\)
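As a quick check, take a particular polynomial (chosen here just for illustration), say \(p(t) = 2 + t - 4t^2 + 5t^3\text{,}\) whose derivative is \(p'(t) = 1 - 8t + 15t^2\text{.}\) The matrix acting on the coordinates of \(p\) produces exactly the coordinates of \(p'\) in the standard basis of \(P_2\text{:}\)

\begin{equation*} \bbm 0 \amp 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 0 \amp 3 \ebm \bbm 2 \\ 1 \\ -4 \\ 5 \ebm = \bbm 1 \\ -8 \\ 15 \ebm. \end{equation*}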

This example is a good place to begin asking questions about how far we can push finite dimensional linear algebra. The fact that differentiation of polynomials can be encoded as a matrix is wonderful, but what else can we apply this idea to? Nice functions have power series that converge absolutely, and we like to think of an absolutely convergent power series as sort of an “infinite polynomial”. Our intuition might lead us to make a connection with calculus at this point. When we learn to work with power series, we learn that for a convergent power series,

\begin{equation*} \frac{d}{dx} \sum a_n (x-a)^n = \sum n a_n (x-a)^{n-1}. \end{equation*}

In analogy with our example about polynomials above, we're tempted to write, for a function \(f\) defined by a convergent power series, that

\begin{equation*} D(f) = \underbrace{\bbm 0 \amp 1 \amp 0 \amp 0 \amp \ldots \\ 0 \amp 0 \amp 2 \amp 0 \amp \ldots \\ 0 \amp 0 \amp 0 \amp 3 \amp \ldots \\ \vdots \amp \amp \amp \amp \ddots \ebm}_{A} \bbm a_0 \\ a_1 \\ a_2 \\ a_3 \\ \vdots \ebm = a_1 + 2a_2 (x - a) + 3 a_3 (x-a)^2 + \ldots. \end{equation*}

This idea is shot through with issues that need to be addressed.

  • The object \(A\) is some kind of \(\infty \times \infty\) matrix. How does that make sense?
  • What are the vector spaces that \(D\) is mapping between?
  • Does the idea of coordinatization still work?
  • If it does, what exactly is “\(\R^\infty\)” supposed to be?
  • Do infinite dimensional vector spaces and bases make sense at all?

Subsection 1.3.2 Inner products

The dot product of two vectors \(x, y \in \C^n\) is

\begin{equation} x \cdot y = \sum_{i=1}^n \cc{y_i}\, x_i.\label{def-dot}\tag{1.3.1} \end{equation}

Standard notation for the dot product is \(\ip{x}{y}\text{,}\) which in \(\C^n\) is equivalent to the matrix product \(y\ad x\text{,}\) where \(\ad\) designates the conjugate transpose of a matrix. The dot product has the following properties (a short worked example follows the list):

  1. \(\displaystyle \ip{x}{y} = \cc{\ip{y}{x}} \hspace{.2in} \text{conjugate symmetry}\)
  2. \(\displaystyle \ip{x + y}{z} = \ip{x}{z} + \ip{y}{z} \hspace{.2in} \text{linearity in the first term}\)
  3. \(\displaystyle \ip{x}{x} \geq 0 \hspace{.2in} \text{non-negativity}\)
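As a small worked example (with vectors chosen here just for illustration), take \(x = (1, i)\) and \(y = (2, 1+i)\) in \(\C^2\text{.}\) Using (1.3.1),

\begin{equation*} \ip{x}{y} = \cc{2} \cdot 1 + \cc{(1+i)} \cdot i = 2 + (1-i)i = 3 + i, \end{equation*}

while

\begin{equation*} \ip{y}{x} = \cc{1} \cdot 2 + \cc{i} \cdot (1+i) = 2 - i(1+i) = 3 - i = \cc{\ip{x}{y}}, \end{equation*}

illustrating conjugate symmetry.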

Once we have the dot product, we can start building the geometry of \(\C^n\text{.}\) First, note that

\begin{equation} \norm{x}^2 = \ip{x}{x} = \sum_{i=1}^n \abs{x_i}^2.\label{eq-Euclidean-norm}\tag{1.3.2} \end{equation}

Motivated by the real case, we say that two vectors \(x, y\) are orthogonal and write \(x \perp y\) if \(\ip{x}{y} = 0\text{.}\)
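For instance (again with vectors chosen for illustration), \(x = (1, i)\) and \(y = (i, 1)\) are orthogonal in \(\C^2\text{,}\) since

\begin{equation*} \ip{x}{y} = \cc{i} \cdot 1 + \cc{1} \cdot i = -i + i = 0. \end{equation*}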

An important inequality is suggested by the relationship between angles and the dot product in \(\R^n\text{,}\) where we have

\begin{equation*} \ip{x}{y} = \norm{x}\norm{y}\cos \theta, \end{equation*}

where \(\theta\) is the angle between the vectors. While the idea of “angle” doesn't make sense in \(\C^n\) (at least in the same way), we still have the Cauchy-Schwarz inequality

\begin{equation} \abs{\ip{x}{y}} \leq \norm{x}\norm{y}.\tag{1.3.3} \end{equation}
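Continuing the illustrative example above with \(x = (1, i)\) and \(y = (2, 1+i)\text{,}\) we have \(\abs{\ip{x}{y}} = \abs{3+i} = \sqrt{10}\text{,}\) \(\norm{x} = \sqrt 2\text{,}\) and \(\norm{y} = \sqrt 6\text{,}\) so the Cauchy-Schwarz inequality reads

\begin{equation*} \sqrt{10} \leq \sqrt{2}\sqrt{6} = \sqrt{12}. \end{equation*}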

Orthogonality also underlies the vector version of the Pythagorean theorem: if \(x \perp y\text{,}\) then

\begin{equation} \norm{x+ y}^2 = \norm{x}^2 + \norm{y}^2.\tag{1.3.4} \end{equation}

Over \(\R\text{,}\) the converse holds as well; over \(\C\text{,}\) the identity only forces the real part of \(\ip{x}{y}\) to vanish.
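To see the identity in a concrete case, take the orthogonal pair \(x = (1, i)\) and \(y = (i, 1)\) from the illustration above. Then \(x + y = (1+i, 1+i)\) and

\begin{equation*} \norm{x+y}^2 = \abs{1+i}^2 + \abs{1+i}^2 = 4 = 2 + 2 = \norm{x}^2 + \norm{y}^2. \end{equation*}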

Finally, it would be remiss to leave out the single most important inequality in mathematics, our old friend the triangle inequality, which in vector terms can be expressed

\begin{equation} \norm{x + y} \leq \norm{x} + \norm{y}.\tag{1.3.5} \end{equation}
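With the same illustrative pair of orthogonal vectors, the triangle inequality becomes the (in this case strict) bound

\begin{equation*} \norm{x + y} = 2 \leq \sqrt 2 + \sqrt 2 = \norm{x} + \norm{y}. \end{equation*}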

Because finite dimensional vector spaces have representations in coordinates as \(\R^n\) or \(\C^n\text{,}\) all finite dimensional vector spaces carry the geometric structure delineated above.

Subsection 1.3.3 Basis and coordinates

Let \(V\) be a vector space over a field \(\F\text{.}\) Recall that a (finite) set of vectors \(S = \{v_i\}_{i \in \mathcal I} \subset V\) is linearly independent if only the trivial solution \(c_i = 0\) exists for the equation

\begin{equation} 0 = \sum_\mathcal{I} c_i v_i.\tag{1.3.6} \end{equation}

A set \(S\) of vectors in \(V\) is said to span \(V\) if every vector in \(V\) can be realized as a linear combination of vectors in \(S\text{.}\) That is, given \(v \in V\text{,}\) there exist coefficients \(c_i\) so that

\begin{equation*} v = \sum_{\mathcal I} c_i v_i. \end{equation*}

A basis \(\mathcal V\) for \(V\) is a subset of \(V\) so that \(\mathcal V\) is linearly independent and \(\mathcal V\) spans \(V\text{.}\) It is a major result that every vector space has a basis. The full result requires the invocation of Zorn's Lemma or another equivalent of the axiom of choice and will not be proven here. Our interest is in modeling vector spaces that carry the logic and structure of Euclidean space. The dimension of \(V\) is the number of elements in a basis \(\mathcal V\) (any two bases of \(V\) have the same cardinality, so this is well defined). If a basis has a finite number of elements, say \(n\text{,}\) then \(V\) is called finite dimensional. In particular (and clearly providing motivation for the definition), \(\dim \R^n = n\text{.}\)
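As a concrete example, the monomials \(1, t, t^2, \ldots, t^n\) are linearly independent and span \(P_n\text{,}\) so they form a basis (the standard basis used in the derivative example above), and therefore

\begin{equation*} \dim P_n = n + 1. \end{equation*}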

Suppose that \(V\) is a finite dimensional vector space with a basis \(\mathcal V\text{.}\) Let \(v\) be a vector in \(V\text{.}\) Then the coordinates of \(v\) with respect to \(\mathcal V\) are the constants \(c_i\) so that \(v = \sum_{\mathcal I} c_i v_i\text{.}\) These coordinates are unique once we have fixed a basis \(\mathcal V\text{.}\) That is, we have a bijective relationship between the vectors \(v \in V\) and the coordinate representations \(\bbm c_1 \\ \vdots \\ c_n \ebm \in \F^n\text{.}\) In \(\F^n\text{,}\) the coordinate representation of a vector with respect to an orthonormal basis is straightforward to compute using the dot product.
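Concretely, if \(v_1, \ldots, v_n\) is an orthonormal basis of \(\F^n\text{,}\) then \(c_i = \ip{v}{v_i}\text{.}\) For example (with a basis chosen here for illustration), \(v_1 = \tfrac{1}{\sqrt 2}(1,1)\) and \(v_2 = \tfrac{1}{\sqrt 2}(1,-1)\) form an orthonormal basis of \(\R^2\text{,}\) and for \(v = (3,1)\) we compute

\begin{equation*} c_1 = \ip{v}{v_1} = \frac{4}{\sqrt 2} = 2\sqrt 2, \qquad c_2 = \ip{v}{v_2} = \frac{2}{\sqrt 2} = \sqrt 2, \qquad 2\sqrt 2 \, v_1 + \sqrt 2 \, v_2 = (3,1) = v. \end{equation*}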

Furthermore, we can use the coordinate representation to write representing matrices for linear functions \(T:V \to W\text{.}\) Suppose that \(V, W\) are vector spaces of dimension \(m,n\) respectively over \(\F\text{,}\) with bases \(\mathcal V, \mathcal W\text{.}\) Then there is a unique \(n \times m\) matrix \(A\) so that

\begin{equation*} (Tv)_{\mathcal W} = A v_{\mathcal V} \end{equation*}

for every \(v \in V\text{,}\) where \(A\) is the matrix that represents \(T\) and the coordinatizations are the natural bijections between \(V, W\) and \(\F^m, \F^n\) respectively. We should note that matrix multiplication is defined so that composition of linear maps corresponds to multiplication of representing matrices: if \(S: W \to U\) is represented by \(B\text{,}\) then \(S \circ T\) is represented by \(BA\text{.}\) That is, the representing matrix of a composition is the product of the representing matrices of the functions.
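As an illustration of the composition rule, the differentiation map \(D: P_3 \to P_2\) from the first subsection and the analogous map \(D: P_2 \to P_1\) are represented in the standard bases by a \(3 \times 4\) and a \(2 \times 3\) matrix, and their product represents the second derivative \(D^2: P_3 \to P_1\text{:}\)

\begin{equation*} \bbm 0 \amp 1 \amp 0 \\ 0 \amp 0 \amp 2 \ebm \bbm 0 \amp 1 \amp 0 \amp 0 \\ 0 \amp 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 0 \amp 3 \ebm = \bbm 0 \amp 0 \amp 2 \amp 0 \\ 0 \amp 0 \amp 0 \amp 6 \ebm, \end{equation*}

which agrees with \(\frac{d^2}{dt^2}(a_0 + a_1 t + a_2 t^2 + a_3 t^3) = 2a_2 + 6a_3 t\text{.}\)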

Any basis of a finite dimensional inner product space can be replaced with an orthonormal basis; the algorithm that produces an orthonormal basis from a given basis is called the Gram-Schmidt process.
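In symbols (a brief sketch, with the notation \(u_k, e_k\) introduced just for this display): starting from a basis \(v_1, \ldots, v_n\text{,}\) subtract from each vector its components along the previously constructed vectors and then normalize,

\begin{equation*} u_k = v_k - \sum_{j=1}^{k-1} \frac{\ip{v_k}{u_j}}{\ip{u_j}{u_j}} u_j, \qquad e_k = \frac{u_k}{\norm{u_k}}, \end{equation*}

so that \(e_1, \ldots, e_n\) is an orthonormal basis spanning the same space.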

Subsection 1.3.4 Operators

When a linear function maps \(V\) into itself, special things happen. First, the matrix that represents \(T: \F^n \to \F^n\) is square. There are a large number of equivalences between the structure of square matrices, linear maps, and sets of vectors. Many of these are captured in the invertible matrix theorem, one of the central results of elementary linear algebra.

An operator carries more information than just the invertibility of the function it represents. For the following discussion, let us fix a basis of a vector space \(V\) and let \(A\) be the matrix that represents a function \(T: V \to V\text{.}\) A scalar \(\la\) and a nonzero vector \(v\) are said to be an eigenpair for \(A\) if

\begin{equation*} A v = \la v. \end{equation*}
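For a concrete example (with the matrix chosen here for illustration), the symmetric matrix \(\bbm 2 \amp 1 \\ 1 \amp 2 \ebm\) satisfies

\begin{equation*} \bbm 2 \amp 1 \\ 1 \amp 2 \ebm \bbm 1 \\ 1 \ebm = \bbm 3 \\ 3 \ebm = 3 \bbm 1 \\ 1 \ebm, \end{equation*}

so \(\la = 3\) and \(v = \bbm 1 \\ 1 \ebm\) form an eigenpair; similarly, \(\la = 1\) and \(v = \bbm 1 \\ -1 \ebm\) form a second one.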

It is straightforward to see that the set of all vectors \(v\) for which the eigenvector equation holds is a subspace of \(V\text{,}\) called the eigenspace associated with \(\la\text{.}\) The eigenspaces of the matrix \(A\) are invariant subspaces, which is to say that a vector in an eigenspace is mapped by \(A\) back into the same eigenspace. It turns out that knowing the invariant subspaces of \(A\) is often enough to completely characterize \(A\text{.}\) If \(A\) is \(n\times n\) and \(A\) has \(n\) linearly independent eigenvectors (that is, one can find a basis of \(\F^n\) consisting of eigenvectors of \(A\)), then

\begin{equation*} A = S D S\inv, \end{equation*}

where \(S\) is a matrix of eigenvectors and \(D\) is a diagonal matrix of the associated eigenvalues (including repetition of course). (One should think of \(S\) as a change of basis matrix under which the operator \(A\) becomes diagonal.)
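Returning to the \(2 \times 2\) matrix used for illustration above, its eigenvectors \(\bbm 1 \\ 1 \ebm\) and \(\bbm 1 \\ -1 \ebm\) are linearly independent, so it factors as

\begin{equation*} \bbm 2 \amp 1 \\ 1 \amp 2 \ebm = \bbm 1 \amp 1 \\ 1 \amp -1 \ebm \bbm 3 \amp 0 \\ 0 \amp 1 \ebm \bbm 1 \amp 1 \\ 1 \amp -1 \ebm\inv. \end{equation*}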

Many operators are not diagonalizable, even very simple ones. For example, \(A = \bbm 1 \amp 1 \\ 0 \amp 1 \ebm\) has only a one-dimensional eigenspace. Diagonalizability is so useful that we give the results characterizing certain classes of diagonalizable operators a special name, the Spectral Theorem. An operator on a real vector space is called symmetric if \(A^T = A\text{.}\) An operator on a complex vector space is called Hermitian (or conjugate symmetric) if \(A\ad = \cc{A^T} = A\text{.}\) One of the major theorems of elementary linear algebra is that such operators are diagonalizable and that there exists an orthonormal basis of eigenvectors for \(V\text{.}\)
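The \(2 \times 2\) matrix from the running illustration is symmetric, and its eigenvectors are indeed orthogonal; normalizing them gives an orthonormal eigenbasis, so the diagonalization can be written with an orthogonal change of basis:

\begin{equation*} \bbm 2 \amp 1 \\ 1 \amp 2 \ebm = Q \bbm 3 \amp 0 \\ 0 \amp 1 \ebm Q^T, \qquad Q = \frac{1}{\sqrt 2} \bbm 1 \amp 1 \\ 1 \amp -1 \ebm, \qquad Q^T Q = I. \end{equation*}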

For complex operators, one can say more. \(A\) is called normal if \(A A\ad = A\ad A\text{.}\) One reason that complex vector spaces are so much nicer than real vector spaces is that the normal operators are exactly the operators that are diagonalizable with respect to an orthonormal basis of eigenvectors.
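For a quick example (with the matrix chosen here for illustration), \(\bbm 0 \amp -1 \\ 1 \amp 0 \ebm\) is not symmetric, but it commutes with its conjugate transpose (both products equal \(I\)), so it is normal. Viewed as a complex operator it has eigenvalues \(\pm i\text{,}\) with eigenvectors

\begin{equation*} \bbm 1 \\ -i \ebm \quad \text{and} \quad \bbm 1 \\ i \ebm, \end{equation*}

which are orthogonal in \(\C^2\text{.}\) So this operator is diagonalizable with respect to an orthonormal basis even though it has no real eigenvalues at all.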