Appendix 3: Derivation of the Least-Squares Approximation Formula

Let $\,f_1,\ldots,f_N\,$ be differentiable functions from $\,\Bbb R^m\,$ into $\,\Bbb R\,.$ Let $\,{\bf b} = (b_1,\ldots,b_m)\,,$ and define $\,{\bf f}\,:\,\Bbb R^m\rightarrow\Bbb R^N\,$ by:

$$ {\bf f}({\bf b}) = \begin{bmatrix} f_1({\bf b})\\ f_2({\bf b})\\ \vdots\\ f_N({\bf b}) \end{bmatrix} $$

The derivative of $\,{\bf f}\,$ is represented by the $\,N \times m\,$ Jacobian matrix:

$$ {\bf Df} = \begin{bmatrix} \frac{\partial f_1}{\partial b_1} & \frac{\partial f_1}{\partial b_2} & \cdots & \frac{\partial f_1}{\partial b_m}\\ \frac{\partial f_2}{\partial b_1} & \frac{\partial f_2}{\partial b_2} & \cdots & \frac{\partial f_2}{\partial b_m}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_N}{\partial b_1} & \frac{\partial f_N}{\partial b_2} & \cdots & \frac{\partial f_N}{\partial b_m} \end{bmatrix} $$

As in the dissertation proper, the notation $\,{\bf A}_{ij}\,$ is used to denote the entry in row $\,i\,$ and column $\,j\,$ of the matrix $\,{\bf A}\,.$

Note that $\,{\bf Df}_{ij} = \frac{\partial f_i}{\partial b_j}\,.$ Define $\,S: \Bbb R^m \rightarrow \Bbb R\,$ by:

$$ \begin{align} S({\bf b}) &:= {\bf f}({\bf b})^t \cdot {\bf f}({\bf b})\cr\cr &= \sum_{i=1}^N \bigl( f_i(b_1,\ldots,b_m) \bigr)^2 \end{align} $$

Then, by the chain rule applied to each term of the sum,

$$ \frac{\partial S}{\partial b_k} = 2\sum_{i=1}^N f_i({\bf b})\frac{\partial f_i}{\partial b_k}\ , $$

and so

$$ \begin{align} {\bf D}S^t &= \begin{bmatrix} \frac{\partial S}{\partial b_1}\\ \frac{\partial S}{\partial b_2}\\ \vdots\\ \frac{\partial S}{\partial b_m} \end{bmatrix}\cr\cr &= 2 \begin{bmatrix} \sum_{i=1}^N f_i({\bf b})\frac{\partial f_i}{\partial b_1}\\ \sum_{i=1}^N f_i({\bf b})\frac{\partial f_i}{\partial b_2}\\ \vdots\\ \sum_{i=1}^N f_i({\bf b})\frac{\partial f_i}{\partial b_m} \end{bmatrix}\cr\cr &= 2 \begin{bmatrix} \frac{\partial f_1}{\partial b_1} & \frac{\partial f_2}{\partial b_1} & \cdots & \frac{\partial f_N}{\partial b_1}\\ \frac{\partial f_1}{\partial b_2} & \frac{\partial f_2}{\partial b_2} & \cdots & \frac{\partial f_N}{\partial b_2}\\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial f_1}{\partial b_m} & \frac{\partial f_2}{\partial b_m} & \cdots & \frac{\partial f_N}{\partial b_m} \end{bmatrix} \, \begin{bmatrix} f_1({\bf b})\\ f_2({\bf b})\\ \vdots\\ f_N({\bf b}) \end{bmatrix}\cr\cr &= 2({\bf Df})^t\bigl( {\bf f}({\bf b}) \bigr) \end{align} $$
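The identity $\,{\bf D}S^t = 2({\bf Df})^t\,{\bf f}({\bf b})\,$ can be checked numerically. The following is a minimal sketch in plain Python, assuming a hypothetical map $\,{\bf f}:\Bbb R^2\rightarrow\Bbb R^3\,$ chosen only for illustration (it does not appear in the text). The names `f`, `jacobian`, `gradient_formula` and `gradient_numeric` are likewise illustrative. The sketch compares the formula against central finite differences of $\,S\,.$

```python
# Hypothetical example (not from the text): f maps R^2 into R^3.
def f(b):
    b1, b2 = b
    return [b1 * b2, b1 ** 2 - b2, 3.0 * b1 + b2 ** 3]

def jacobian(b):
    # N x m matrix with entries Df[i][j] = d f_i / d b_j, computed by hand.
    b1, b2 = b
    return [[b2, b1],
            [2.0 * b1, -1.0],
            [3.0, 3.0 * b2 ** 2]]

def S(b):
    # S(b) = f(b)^t f(b), the sum of the squared components of f(b).
    return sum(fi * fi for fi in f(b))

def gradient_formula(b):
    # k-th entry of 2 (Df)^t f(b), that is, 2 * sum_i Df[i][k] * f_i(b).
    fb, J = f(b), jacobian(b)
    return [2.0 * sum(J[i][k] * fb[i] for i in range(len(fb)))
            for k in range(len(b))]

def gradient_numeric(b, h=1e-6):
    # Central-difference approximation of each partial derivative of S.
    grad = []
    for k in range(len(b)):
        up = list(b)
        dn = list(b)
        up[k] += h
        dn[k] -= h
        grad.append((S(up) - S(dn)) / (2.0 * h))
    return grad

b = [0.7, -1.2]
exact = gradient_formula(b)
approx = gradient_numeric(b)
max_diff = max(abs(e - a) for e, a in zip(exact, approx))
print(exact)
print(approx)
print(max_diff)
```

The two gradients should agree up to finite-difference and rounding error, so `max_diff` is expected to be very small compared with the gradient entries themselves.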

The linear least-squares approximation formula in the text now follows by applying this result with $\,{\bf f}({\bf b}) = {\bf y} - {\bf Xb}\,,$ where $\,{\bf y}\in\Bbb R^N\,$ and $\,{\bf X}\,$ is an $\,N\times m\,$ matrix. The $\,i^{\text{th}}\,$ component function is

$$ f_i = y_i - \sum_{k=1}^m {\bf X}_{ik}b_k\,, $$

so that:

$$ \frac{\partial f_i}{\partial b_j} = -{\bf X}_{ij} $$

Then:

$$ \begin{align} {\bf D}S^t &= 2({\bf Df})^t \bigl( {\bf f}({\bf b})\bigr)\cr\cr &= 2\left[ \frac{\partial f_j}{\partial b_i}\right]({\bf y} - {\bf Xb})\cr\cr &= 2[-{\bf X}_{ji}]({\bf y} - {\bf Xb})\cr\cr &= 2(-{\bf X}^t)({\bf y} - {\bf Xb})\cr\cr &= -2({\bf X}^t{\bf y} - {\bf X}^t{\bf X}{\bf b})\cr\cr &= 2({\bf X}^t{\bf X}{\bf b} - {\bf X}^t{\bf y}) \end{align} $$
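Setting this gradient equal to zero gives the linear system $\,{\bf X}^t{\bf X}{\bf b} = {\bf X}^t{\bf y}\,.$ The sketch below illustrates both the gradient formula and this system in plain Python, under stated assumptions: the data `X` and `y` are hypothetical (not from the text), $\,m = 2\,$ so that the system can be solved by Cramer's rule, and $\,{\bf X}^t{\bf X}\,$ is assumed invertible.

```python
# Hypothetical data (not from the text): N = 4 observations, m = 2 parameters.
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0],
     [1.0, 3.0]]
y = [1.0, 3.0, 4.0, 8.0]
m = 2

def matvec(A, v):
    # Matrix-vector product A v.
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# X^t X (an m x m matrix) and X^t y (a vector of length m).
XtX = [[sum(row[i] * row[j] for row in X) for j in range(m)]
       for i in range(m)]
Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(m)]

def S(b):
    # S(b) = (y - Xb)^t (y - Xb), the sum of squared residuals.
    residual = [yi - p for yi, p in zip(y, matvec(X, b))]
    return sum(r * r for r in residual)

def grad_S(b):
    # The derived formula: 2 (X^t X b - X^t y).
    return [2.0 * (p - q) for p, q in zip(matvec(XtX, b), Xty)]

# Check the formula against central differences at an arbitrary point.
b0 = [0.5, -1.0]
h = 1e-4
numeric = []
for k in range(m):
    up = list(b0)
    dn = list(b0)
    up[k] += h
    dn[k] -= h
    numeric.append((S(up) - S(dn)) / (2.0 * h))
max_diff = max(abs(g - n) for g, n in zip(grad_S(b0), numeric))

# Solve X^t X b = X^t y by Cramer's rule (valid here because m = 2
# and the determinant is assumed nonzero).
det = XtX[0][0] * XtX[1][1] - XtX[0][1] * XtX[1][0]
b_hat = [(Xty[0] * XtX[1][1] - XtX[0][1] * Xty[1]) / det,
         (XtX[0][0] * Xty[1] - XtX[1][0] * Xty[0]) / det]

print(max_diff)
print(b_hat)
print(grad_S(b_hat))
```

For larger $\,m\,,$ Cramer's rule would be replaced by a general linear solver; it is used here only to keep the sketch free of external libraries.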