2.3 Computer Application Considerations for Linear Least-Squares

Condition Numbers

Theory versus application

In the previous section, it was shown that the problem

$$ \underset{\,b \in \Bbb R^n}{\text{min}}\ \|\boldsymbol{\rm y} - \boldsymbol{\rm Xb}\|^2 $$

has a unique solution, provided that the square matrix $\,\boldsymbol{\rm X}^t\boldsymbol{\rm X}\,$ is invertible.

Theoretically (using exact arithmetic), the invertibility of a square matrix $\,A\,$ is easily characterized in a variety of ways: for example, $\,A\,$ is invertible if and only if the determinant of $\,A\,$ (denoted by $\,\text{det}(A)\,$) is nonzero, and in this case, the unique solution to $\,Ax = b\,$ is given by $\,x = A^{-1}b\,.$

However, when the entries of a matrix are represented only to the degree of accuracy of a digital computer, the question of invertibility is more subtle: there are invertible matrices $\,A\,$ for which correct numerical solutions to $\,Ax = b\,$ are difficult to obtain, stemming from the fact that small changes in the matrices $\,A\,$ and $\,b\,$ can lead to large changes in the solution $\,x\,.$

Example: MATLAB Casts Suspicion on a Solution Vector

To illustrate this point, reconsider the MATLAB example of the previous section, where a $4^{\text{th}}$-order polynomial least-squares approximation to the data

$$ y(t) = 3.2 - 0.7t + 0.05t^2 + \sin\frac{2\pi t}3 + \langle\text{noise}\rangle $$

was found. If, alternatively, a $7^{\text{th}}$-order polynomial approximation were sought, the MATLAB user would be faced with the following warning, casting suspicion on the resulting vector $\,\boldsymbol{\rm b}\,$:

[Figure: MATLAB casts suspicion on a result — a warning of the form ‘Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = …’]

The size of $\,\boldsymbol{\rm X}^t\boldsymbol{\rm X}\,$ is only $\,8\times8\,$; however, the entries of $\,\boldsymbol{\rm X}^t\boldsymbol{\rm X}\,$ are such that the matrix does not lend itself nicely to numerical computation. This problem is the subject of the current section.
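
A minimal MATLAB sketch of this situation (the sample times and noise level below are assumptions for illustration, not the actual data of the previous section):

t = (0 : 0.3 : 6)';                % hypothetical sample times
y = 3.2 - 0.7*t + 0.05*t.^2 + sin(2*pi*t/3) + 0.1*randn(size(t));
X = t .^ (0:7);                    % columns 1, t, t^2, ..., t^7 (implicit expansion, R2016b+)
b = (X'*X) \ (X'*y);               % normal equations: triggers the warning
rcond(X'*X)                        % tiny: X'*X is ill-conditioned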

The contents of this section

More precisely, this section gives an analysis of the system $\,Ax = b\,$ in order to understand the sensitivity of solutions $\,x\,$ to small changes in the matrices $\,A\,$ and $\,b\,.$ A number $\,\text{cond}(A)\,$ will be defined that measures this sensitivity.

For a ‘sensitive’ system, correct numerical solution can be challenging, but the problem can often be overcome by replacing the system $\,Ax = b\,$ with an equivalent system $\,\tilde A\tilde x = \tilde b\,,$ where the new matrix $\,\tilde A\,$ is more suitable for numerical computations. This replacement requires a knowledge of discrete orthogonal functions, which are also discussed in this section.

$\phi\, :\, X\rightarrow Y\,$ is a map between normed spaces

Let $\,\phi\, :\, X\rightarrow Y\,$ be a function between normed spaces. The reader is referred to Appendix 2 for a review of normed spaces. The symbol $\,\|\cdot\|\,$ will be used to denote both the $\,X\,$ and $\,Y\,$ norms, with context determining the correct interpretation.

Relative error

Let $\,x\in X\,,$ and let $\,\tilde x\,$ be an approximation to $\,x\,.$ For $\,x \ne 0\,,$ the relative error of $\,\tilde x\,$ (as an approximation to $\,x\,$) is defined to be the number:

$$ \frac{\|\tilde x - x\|}{\|x\|} $$

Similarly, for $\,\phi(x)\ne 0\,,$ the relative error of $\,\phi(\tilde x)\,$ (as an approximation to $\,\phi(x)\,$) is the number:

$$ \frac{\|\phi(\tilde x) - \phi(x)\|}{\|\phi(x)\|} $$
Condition number for a function between normed spaces

A desirable situation for the function $\,\phi\,$ is this: when the relative error in $\,\tilde x\,$ is small, so is the relative error in $\,\phi(\tilde x)\,.$ In other words, small input errors lead to small output errors. To quantify this input/output relationship, the idea of a condition number is introduced:

Definition condition number for $\,\phi\, :\, X\rightarrow Y\,,$ at $\,x\in X$

Let $\,\phi\ :\ X\rightarrow Y\,$ be a function between normed spaces, and let $\,x\in X\,.$ If a real number $\,c \gt 0\,$ exists for which

$$ \frac{\|\phi(\tilde x) - \phi(x)\|}{\|\phi(x)\|} \le c\,\frac{\|\tilde x - x\|}{\|x\|} $$

for all $\,\tilde x\,$ sufficiently close to $\,x\,,$ then $\,c\,$ is called a condition number for $\,\phi\,$ at $\,x\,.$

Interpreting the condition number

This definition shows that $\,c\,$ gives information about the amplification of relative input errors. If $\,c\,$ is small, then small input errors must produce small output errors. However, if $\,c\,$ is large, or if no such positive number $\,c\,$ exists, then small input errors may produce large output errors.

In numerical problems, where numbers often inherently contain error (due to a forced finite computer representation), such considerations become important.
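
For a small numeric illustration (the values below are hypothetical), consider a function that subtracts the two components of its input. Near an input where the output is nearly zero, a tiny relative input error produces a huge relative output error, so no small condition number exists there:

x   = [1; 1 + 1e-10];             % true input
xt  = [1; 1 + 2e-10];             % approximation with a tiny relative error
phi = @(v) v(1) - v(2);           % subtraction of nearly equal numbers
rel_in  = norm(xt - x)/norm(x)                % about 7e-11
rel_out = abs(phi(xt) - phi(x))/abs(phi(x))   % 1, i.e., 100% relative error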

Note that the definition of condition number is dependent on both the norm in the domain $\,X\,,$ and the norm in the codomain $\,Y\,.$

What are to be viewed as the ‘input’ and ‘output’ for the system $\,Ax = b\,$?

For an invertible matrix $\,A\,,$ the system $\,Ax = b\,$ will be analyzed as follows: the matrices $\,A\,$ and $\,b\,$ are taken as the ‘inputs’ to the system, and a solution vector $\,x\,$ is the ‘output’.

When the inputs $\,A\,$ and $\,b\,$ change slightly (say, by replacing theoretical matrices $\,A\,$ and $\,b\,$ by their computer representations), then it is desired to study how the solution $\,x\,$ changes.

The Euclidean norm $\,\|\cdot\|\,$ will be used to measure ‘changes’ in the $\,n \times 1\,$ matrix $\,b\,.$ It is also necessary to have a means by which to measure changes in the matrix $\,A\,$; this means will be provided by a matrix norm, defined next. In this next definition, $\,\text{sup}\,$ denotes the supremum, i.e., the least upper bound.

Definition the $2$-norm for matrices

Let $\,A\,$ be any $\,n \times n\,$ matrix, and let $\,\|\cdot\|$ be the Euclidean norm on $\,\Bbb R^n\,.$ The $2$-norm of the matrix $\,A\,$ is given by:

$$ \|A\|_2 := \underset{\substack{x\in\Bbb R^n\,,\cr x\ne 0}} {\text{sup}} \frac{\|Ax\|}{\|x\|} $$

It can be shown that the $2$-norm is indeed a norm on the vector space of all $\,n\times n\,$ matrices. Consequently, if $\,\|A\|_2 = 0\,,$ then $\,A = \boldsymbol{0}\,,$ where $\,\boldsymbol{0}\,$ denotes the $\,n \times n\,$ zero matrix. Thus, if $\,\|A\|_2 = 0\,,$ then $\,A\,$ is not invertible. Equivalently, if $\,A\,$ is invertible, then $ \,\|A\|_2 \ne 0\,.$

MATLAB approximation to $\,\|A\|_2$

The number $\|A\|_2\,$ is approximated in MATLAB via the command:

norm(A)

There are other matrix norms (see, e.g., [S&B, p. 178]), many of which are more easily calculated than $\,\|\cdot\|_2\,,$ but it will be seen that the $2$-norm is particularly well suited to the current situation.

The reader versed in functional analysis will recognize the $2$-norm as the norm of the bounded linear operator $\,A: \Bbb R^n\rightarrow \Bbb R^n\,,$ when the Euclidean norm is used in $\,\Bbb R^n\,.$

An important consequence of the definition

It is immediate from the definition of $\,\|A\|_2\,$ that for any $\,x\ne 0\,,$

$$ \|A\|_2 \ge \frac{\|Ax\|}{\|x\|}\,, $$

so that for all $\,x\,$:

$$ \|Ax\| \le \|A\|_2\cdot \|x\| \tag{1} $$

Thus, the number $\|A\|_2\,$ gives the greatest magnification that a vector may attain under the mapping determined by $\,A\,.$ That is, the norm of an output $\,\|Ax\|\,$ cannot exceed $\,\|A\|_2\,$ times the norm of the corresponding input $\,\|x\|\,.$
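
As a quick numerical sanity check of property ($1$), with arbitrary random data:

A = randn(5);  x = randn(5,1);    % arbitrary test data
norm(A*x) <= norm(A)*norm(x)      % returns 1 (true), confirming (1)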

Computing $\|A\|_2$

The $2$-norm is not easy to compute. ($\bigstar$ When $\,A\,$ has real number entries, $\|A\|_2\,$ is the square root of the largest eigenvalue of the matrix $\,A^tA\,.$) However, it is easy to obtain an upper bound for $\,\|A\|_2\,$ as the square root of the sum of the squared entries from $\,A\,,$ as follows:

Getting an upper bound for $\,\|A\|_2$

Let $\,x = (x_1,\ldots,x_n)^t\,,$ so that $\,x\,$ is a column vector, and:

$$ \|x\| = \left( \sum_{i=1}^n x_i^2 \right)^{1/2} $$

Let $\,\alpha_{ij}\,$ denote the element in row $\,i\,$ and column $\,j\,$ of the matrix $\,A\,.$ The $\,i^{\text{th}}\,$ component of the column vector $\,Ax\,$ is denoted by $\,(Ax)_i\,,$ and is the inner product of the $\,i^{\text{th}}\,$ row of $\,A\,$ with $\,x\,$:

$$ (Ax)_i = \sum_{j=1}^n \alpha_{ij}x_j $$

Then:

$$ \begin{align} \|Ax\|^2 &=\sum_{i=1}^n (Ax)_i^2 = \sum_{i=1}^n \left( \sum_{j=1}^n \alpha_{ij}x_j \right)^2\cr\cr &\le \sum_{i=1}^n \left[ \left( \sum_{j=1}^n \alpha_{ij}^2 \right)^{1/2} \left( \sum_{j=1}^n x_j^2 \right)^{1/2} \right]^2\cr &\qquad \text{(Cauchy-Schwarz)}\cr\cr &= \|x\|^2 \sum_{i=1}^n\sum_{j=1}^n \alpha_{ij}^2 \end{align} $$

(The Cauchy-Schwarz Inequality is reviewed in Appendix $2\,.$)

The Frobenius norm, $\,\|A\|_F\,,$ is an upper bound for $\,\|A\|_2$

Thus, for any $\,x\ne 0\,,$ division by $\,\|x\|^2\,$ yields

$$ \frac{\|Ax\|^2}{\|x\|^2} \le \sum_{i=1}^n\sum_{j=1}^n \alpha_{ij}^2\ , $$

so that taking square roots gives:

$$ \begin{align} \|A\|_2 &:= \underset{\substack{x\in\Bbb R^n\cr x\ne 0}}{\text{sup}} \frac{\|Ax\|}{\|x\|}\cr\cr &\le \left( \sum_{i=1}^n \sum_{j=1}^n \alpha_{ij}^2 \right)^{1/2}\cr\cr &:= \|A\|_F \end{align} $$

The square root of the sum of the squared entries from $\,A\,$ gives another norm of $\,A\,,$ called the Frobenius norm, and denoted by $\,\|A\|_F\,$ [G&VL, p. 56].
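
These facts are easily checked in MATLAB (the matrix below is an arbitrary choice):

A = [2 1; 1 3];                   % arbitrary example matrix
norm(A)                           % 2-norm: about 3.6180
sqrt(max(eig(A'*A)))              % the starred characterization: same value
norm(A, 'fro')                    % Frobenius norm: sqrt(15), about 3.8730
sqrt(sum(A(:).^2))                % same sum-of-squared-entries computation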

$\bigstar$   Singular values of $\,A$

Many of the ideas discussed in this section are conveniently expressed in terms of the singular values of $\,A\,.$ In particular, the norms $\,\|A\|_2\,$ and $\,\|A\|_F\,$ have simple characterizations in terms of the singular values: $\,\|A\|_2\,$ is the largest singular value of $\,A\,,$ and $\,\|A\|_F\,$ is the square root of the sum of the squared singular values.

The singular value decomposition of a matrix is discussed in Appendix $2\,.$

With the matrix $2$-norm in hand, the analysis of the system $\,Ax = b\,$ can now begin. It is assumed that $\,A\,$ is an invertible $\,n \times n\,$ matrix with real number entries. It is desired to study the sensitivity of solutions $\,x\,$ of this system to small changes in $\,A\,$ and $\,b\,.$

Motivation for the definition of the condition number of a matrix

[S&B, 178–180]   Begin by supposing that only the column vector $\,b\,$ changes slightly; $\,A\,$ remains fixed. What is the corresponding effect on the solution $\,x\,$? The analysis of this situation will motivate the definition of the condition number of a matrix.

Let

$$ \begin{gather} Ax = b\cr \text{and}\cr A\tilde x = \tilde b \end{gather} $$

be true statements; that is, $\,x\,$ denotes the system solution when column vector $\,b\,$ is used, and $\,\tilde x\,$ denotes the system solution when column vector $\,\tilde b\,$ is used. Define

$$ \begin{gather} \Delta x := \tilde x - x\cr \text{and}\cr \Delta b := \tilde b - b\ , \end{gather} $$

so that:

$$ A\Delta x = \Delta b $$

Since $\,A\,$ is invertible (by hypothesis):

$$ \Delta x = A^{-1}\Delta b $$

Taking norms and using property ($1$) gives:

$$ \begin{align} \|\Delta x\| &= \|A^{-1}\Delta b\|\cr\cr &\le \|A^{-1}\|_2\cdot \|\Delta b\| \end{align} $$

Let $\,x\ne 0\,$; division by the positive number $\,\|x\|\,$ gives:

$$ \frac{\|\Delta x\|}{\|x\|} \le \frac{\|A^{-1}\|_2\cdot \|\Delta b\|}{\|x\|} $$

Since $\,b = Ax\,,$ it follows that $\,\|b\| \le \|A\|_2\cdot\|x\|\,$ and hence $\,\frac1{\|x\|} \le \frac{\|A\|_2}{\|b\|}\,$; substitution into the previous display yields:

$$ \frac{\|\Delta x\|}{\|x\|} \le \|A^{-1}\|_2\ \|A\|_2\ \frac{\|\Delta b\|}{\|b\|} $$

This inequality shows that the constant $\,\|A^{-1}\|_2\,\|A\|_2\,$ gives information about how an error in $\,b\,$ is potentially ‘amplified’ to produce an error in the solution $\,x\,$.
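
This amplification is easily observed numerically. The sketch below uses the Hilbert matrix, a standard ill-conditioned example; the choice of matrix and perturbation size are illustrative assumptions:

A  = hilb(8);                     % 8x8 Hilbert matrix: ill-conditioned
x  = ones(8,1);
b  = A*x;                         % right-hand side with known solution x
db = 1e-10*randn(8,1);            % tiny perturbation of b
xt = A \ (b + db);                % solution of the perturbed system
rel_in  = norm(db)/norm(b)        % relative change in b: tiny
rel_out = norm(xt - x)/norm(x)    % relative change in x: typically far larger
norm(inv(A))*norm(A)              % the amplification constant, about 1.5e10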

Thus, the number $\,\|A^{-1}\|_2\,\|A\|_2\,$ is a condition number for the function that ‘inputs’ a column vector $\,b\,,$ and ‘outputs’ a solution to the system $\,Ax = b\,.$ With this motivation, the next definition should come as no surprise:

Definition condition number of a matrix

Let $\,A\,$ be an invertible matrix. The condition number of $\,A\,$ is:

$$ \text{cond}(A) := \|A^{-1}\|_2\cdot\|A\|_2 $$
MATLAB approximation to $\,\text{cond}(A)$

The condition number $\,\text{cond}(A)\,$ is approximated in MATLAB via the command:

cond(A)
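
A quick check that this command agrees with the definition (the matrix is an arbitrary ill-conditioned choice):

A = hilb(8);                      % standard ill-conditioned example
cond(A)                           % about 1.5e10
norm(A)*norm(inv(A))              % the definition: same value up to roundoff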

Properties of $\,\text{cond}(A)$

When $\,\text{cond}(A)\,$ is close to $\,1\,,$ $\,A\,$ is said to be well-conditioned, and when $\,\text{cond}(A)\,$ is large, $\,A\,$ is said to be ill-conditioned.

It is always true that $\,\text{cond}(A) \ge 1\,,$ as follows. Since $\,\|A\|_2\ge\frac{\|Ay\|}{\|y\|}\,$ for all $\,y\ne 0\,,$ take $\,y\,$ to be $\,A^{-1}x\,$ for any nonzero $\,x\,$ (note that $\,A^{-1}x \ne 0\,,$ since $\,A^{-1}\,$ is invertible), thus obtaining

$$ \begin{align} \|A\|_2 &\ge \frac{\|A(A^{-1}x)\|}{\|A^{-1}x\|}\cr\cr &= \frac{\|x\|}{\|A^{-1}x\|}\cr\cr &\ge \frac{\|x\|}{\|A^{-1}\|_2\,\|x\|}\ , \end{align} $$

from which:

$$ \begin{align} \|A\|_2\,\|A^{-1}\|_2 &\ge\frac{\|x\|}{\|A^{-1}\|_2\,\|x\|}\cdot\|A^{-1}\|_2\cr\cr &= 1 \end{align} $$
Orthogonal matrix

If $\,A\,$ is an orthogonal matrix (that is, $\,A^t A = I\,,$ or equivalently, $\,A^{-1} = A^t\,$), then $\,\text{cond}(A) = 1\,.$ Indeed, orthogonal matrices preserve the Euclidean norm ($\,\|Ax\| = \|x\|\,$ for all $\,x\,$), so $\,\|A\|_2 = 1\,$; and $\,A^{-1} = A^t\,$ is also orthogonal, so $\,\|A^{-1}\|_2 = 1\,.$
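
For example, every rotation matrix is orthogonal (the angle below is arbitrary):

th = 0.7;                                    % arbitrary angle
Q  = [cos(th) -sin(th); sin(th) cos(th)];    % rotation matrix: Q'*Q = I
cond(Q)                                      % returns 1 (up to roundoff)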

rcond(A)

When MATLAB issues warnings regarding ill-conditioned matrices, it reports the information in terms of the reciprocal of the condition number:

rcond(A) ≈ 1/(cond(A))

(MATLAB's  rcond  is a fast estimate based on the $1$-norm rather than the $2$-norm, so it only approximates $\,1/\text{cond}(A)\,$ as defined here.)

If $\,A\,$ is well-conditioned, then  rcond(A)  is near $\,1.0\,$; if $\,A\,$ is ill-conditioned, then  rcond(A)  is near $\,0\,.$
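
For example (the matrices below are arbitrary choices):

rcond(hilb(10))                   % near 0: hilb(10) is ill-conditioned
rcond(eye(10))                    % 1: the identity is perfectly conditioned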

It is important to note that, in the system $\,Ax=b\,,$ the sensitivity of the solution $\,x\,$ to small changes in the data is a property of the matrix $\,A\,,$ not of the particular vector $\,b\,$: the condition number just defined depends only on $\,A\,.$

Since $\,A\,$ is invertible if and only if $\,\text{det}(A) \ne 0\,,$ one might conjecture that when the determinant of $\,A\,$ is close to $\,0\,,$ then solutions to $\,Ax = b\,$ are sensitive to small changes in $\,A\,.$ However, this is not the case. The examples below show that $\,\text{det}(A)\,$ can be made as small as desired, while keeping $\,\text{cond}(A) = 1\,.$

determinant can be made small, keeping condition number equal to 1

For example, let $\,A = aI\,,$ where $\,a \gt 0\,$ and $\,I\,$ is the $\,n\times n\,$ identity matrix. Then $\,\text{det}(A) = a^n\,,$ which can be made as small as desired by choosing $\,a\,$ close to $\,0\,.$ However, $\,A^{-1} = \frac1a I\,,$ so $\,\|A\|_2 = a\,$ and $\,\|A^{-1}\|_2 = \frac1a\,,$ whence $\,\text{cond}(A) = a\cdot\frac1a = 1\,.$
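
Checking this in MATLAB (the size and scale below are arbitrary):

a = 1e-6;
A = a*eye(4);                     % tiny multiple of the identity
det(A)                            % 1e-24: as small as desired
cond(A)                           % 1: perfectly conditioned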
$\text{cond}(A)\,$ does indeed measure sensitivity to changes in BOTH $\,A\,$ and $\,b$

The definition of the condition number of a matrix was motivated by studying the sensitivity of solutions to $\,Ax = b\,$ to small changes in $\,b\,.$ Does the number $\,\text{cond}(A)\,,$ as defined, also measure the sensitivity of solutions to changes in both $\,A\,$ and $\,b\,$? The following discussion shows that the answer is ‘Yes’.

[G&VL, 79–80] Allow both $\,A\,$ and $\,b\,$ to vary in the system $\,Ax = b\,$; recall that $\,A\,$ is assumed to be invertible.

Denote a change in $\,A\,$ by $\,\epsilon F\,,$ where $\,F\,$ is an $\,n \times n\,$ constant matrix and $\epsilon\gt 0\,.$ Similarly, denote a change in $\,b\,$ by $\,\epsilon f\,,$ where $\,f\,$ is a fixed $\,n \times 1\,$ matrix. Different values of $\,\epsilon\,$ lead to different approximations $\,\tilde A := A + \epsilon F\,$ and $\,\tilde b := b + \epsilon f\,$ of $\,A\,$ and $\,b\,,$ respectively.

Let $\,x(\epsilon)\,$ denote the solution of the ‘perturbed’ system $\,\tilde Ax = \tilde b\,$; the situation is then summarized by:

$$ (A + \epsilon F)\,x(\epsilon) = b + \epsilon f\,,\ \ x(0) = x $$

Differentiating both sides with respect to $\,\epsilon\,$ yields:

$$ (A + \epsilon F)\,\dot x(\epsilon) + F\,x(\epsilon) = f $$

Evaluation at $\,\epsilon = 0\,$ and a slight rearrangement yields:

$$ A\,\dot x(0) = f - Fx $$

Or, since $\,A\,$ is invertible:

$$ \dot x(0) = A^{-1}(f-Fx) \tag{2} $$

For small $\,\epsilon\,,$ Taylor expansion about $\,\epsilon = 0\,$ gives $\,x(\epsilon) = x + \epsilon\,\dot x(0) + O(\epsilon^2)\,.$ Taking norms, and using ($2$), property ($1$), and the triangle inequality:

$$ \begin{align} \frac{\|x(\epsilon) - x\|}{\|x\|} &= \epsilon\,\frac{\|A^{-1}(f - Fx)\|}{\|x\|} + O(\epsilon^2)\cr\cr &\le \epsilon\,\|A^{-1}\|_2 \left( \frac{\|f\|}{\|x\|} + \|F\|_2 \right) + O(\epsilon^2) \end{align} $$

Since $\,b = Ax\,$ gives $\,\frac1{\|x\|} \le \frac{\|A\|_2}{\|b\|}\,$ (as before), it follows that:

$$ \frac{\|x(\epsilon) - x\|}{\|x\|} \le \text{cond}(A)\left( \frac{\epsilon\,\|f\|}{\|b\|} + \frac{\epsilon\,\|F\|_2}{\|A\|_2} \right) + O(\epsilon^2) $$

The numbers $\,\frac{\epsilon\|f\|}{\|b\|}\,$ and $\,\frac{\epsilon\|F\|_2}{\|A\|_2}\,$ are precisely the relative errors in $\,b\,$ and $\,A\,,$ respectively. Thus, $\,\text{cond}(A)\,$ does indeed measure the sensitivity of solutions to small changes in both $\,A\,$ and $\,b\,.$
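
Equation ($2$) can be checked numerically via finite differences; the matrices below are arbitrary illustrative choices:

A = [4 1; 2 3];  b = [1; 2];      % arbitrary invertible system
F = [0 1; 1 0];  f = [1; -1];     % arbitrary perturbation directions
x  = A \ b;
e  = 1e-6;                        % a small epsilon
xe = (A + e*F) \ (b + e*f);       % the perturbed solution x(epsilon)
(xe - x)/e                        % finite-difference estimate of xdot(0)
A \ (f - F*x)                     % equation (2): matches to about 1e-6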