2.4 Nonlinear Least Squares Approximation

A Linearizing Technique and Gradient Methods

Introduction

Let $\,\{(t_i,y_i)\}_{i=1}^N\,$ be a given data set. Consider the following minimization problem:

Find a sinusoid that best approximates the data, in the least-squares sense. That is, find real constants $\,A\ne 0\,,$ $\,\phi\,,$ and $\,\omega\ne 0\,$ so that the function $\,f\,$ given by

$$ f(t) = A\sin(\omega t + \phi) $$

minimizes the sum of the squared errors between $\,f(t_i)\,$ and $\,y_i\,$:

$$ \begin{align} SSE &:= \sum_{i=1}^N \bigl( y_i - f(t_i) \bigr)^2\cr &= \sum_{i=1}^N \bigl( y_i - A\sin(\omega t_i + \phi) \bigr)^2 \end{align} $$
A nonlinear least-squares problem

This problem differs from the least-squares problem considered in Section 2.2, in that the parameters to be ‘adjusted’ ($\,A\,,$ $\,\phi\,,$ and $\,\omega\,$) do not all appear linearly in the formula for the approximate $\,f\,.$ (This notion of linear dependence on a parameter is made precise below.)

Consequently, the resulting problem is called nonlinear least-squares approximation, and is much more difficult to solve, because it does not lend itself to exact solution by matrix methods.

Indeed, nonlinear least-squares problems must generally be solved by iterative techniques. Nonlinear least-squares approximation is the subject of this section.
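To make the objective concrete, the sum of squared errors for one trial choice of $\,A\,,$ $\,\phi\,,$ and $\,\omega\,$ can be evaluated directly. The following is a minimal MATLAB/Octave sketch; the data vectors and the trial parameter values are hypothetical.

t = (0:0.1:10)';                               % hypothetical sample times
y = 2*sin(1.5*t + 0.3) + 0.1*randn(size(t));   % hypothetical noisy data
A = 2;  phi = 0.3;  omega = 1.5;               % trial parameter values
SSE = sum( (y - A*sin(omega*t + phi)).^2 );    % sum of squared errors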

First, a definition:

Definition linear dependence on a parameter

Suppose $\,f\,$ is a function that depends on time, a fixed parameter $\,b\,,$ and possibly other parameters. This dependence is denoted by $\,f = f(t,b,x)\,,$ where $\,x\,$ represents all other parameters.

The function $\,f\,$ is said to be linear with respect to the parameter $\,b\,$ if and only if the function $\,f\,$ has the form

$$ f(t,b,x) = b\cdot f_1(t,x) + f_2(t,x) $$

for functions $\,f_1\,$ and $\,f_2\,$ that may depend on $\,t\,$ and $\,x\,,$ but not on $\,b\,.$

Generalizing the definition

This definition is generalized in the obvious way. For example, a function $\,f = f(t,b_1,b_2,x)\,$ is linear with respect to the parameters $\,b_1\,$ and $\,b_2\,$ if and only if $\,f\,$ has the form

$$ f(t,b_1,b_2,x) = b_1\cdot f_1(t,x) + b_2\cdot f_2(t,x) + f_3(t,x) $$

for functions $\,f_1\,,$ $\,f_2\,,$ and $\,f_3\,$ that may depend on $\,t\,$ and $\,x\,,$ but not on $\,b_1\,$ and $\,b_2\,.$

Example

The function $\,f(t,A,\phi,\omega) = A\sin(\omega t + \phi)\,$ is linear with respect to the parameter $\,A\,,$ but not linear with respect to $\,\phi\,$ or $\,\omega\,.$
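Indeed, taking $\,b = A\,$ and $\,x = (\phi,\omega)\,$ in the definition,

$$ f(t,A,\phi,\omega) = A\cdot\overbrace{\sin(\omega t + \phi)}^{f_1(t,x)} + \overbrace{\ 0\ }^{f_2(t,x)}\,, $$

where $\,f_1\,$ and $\,f_2\,$ do not depend on $\,A\,.$ However, no decomposition of the form $\,\phi\cdot f_1(t,x) + f_2(t,x)\,$ (with $\,f_1\,$ and $\,f_2\,$ independent of $\,\phi\,$) is possible, since $\,\phi\,$ appears inside the sine; the same is true for $\,\omega\,.$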

When $\,f\,$ depends linearly on $\,b\,,$ so does $\,\frac{\partial SSE}{\partial b}$

If $\,f\,$ depends linearly on a parameter $\,b\,,$ then $\,\frac{\partial SSE}{\partial b}$ also depends linearly on $\,b\,,$ as follows:

$$ \begin{align} SSE &= \sum_{i=1}^N \bigl( y_i - b\cdot f_1(t_i,x) - f_2(t_i,x) \bigr)^2\,;\cr\cr \frac{\partial SSE}{\partial b} &= \sum_{i=1}^N\ 2 \bigl( y_i - b\cdot f_1(t_i,x)\cr &\qquad\qquad - f_2(t_i,x) \bigr) \cdot \bigl( -f_1(t_i,x) \bigr)\cr\cr &= b\cdot\sum_{i=1}^N\,2\bigl(f_1(t_i,x)\bigr)^2\cr &\qquad - 2\sum_{i=1}^N \bigl( y_i\,f_1(t_i,x) - f_1(t_i,x)\,f_2(t_i,x) \bigr) \end{align} $$

Similarly, if $\,f\,$ depends linearly on parameters $\,b_1,\ldots,b_m\,,$ then so do $\,\frac{\partial SSE}{\partial b_i}\,$ for all $\,i = 1,\ldots,m\,.$ Consequently, as seen in Section 2.2, matrix methods can be used to find parameters that make the partials equal to zero.
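For example, for a model $\,f(t) = b_1f_1(t) + b_2f_2(t)\,$ with the basis functions fully known (here, a sinusoid with $\,\omega\,$ held fixed), the optimal $\,b_1\,$ and $\,b_2\,$ can be found with a single matrix solve. The following is a minimal MATLAB/Octave sketch; the data and the fixed value of $\,\omega\,$ are hypothetical.

t  = (0:0.1:10)';                              % hypothetical sample times
y  = 2*sin(1.5*t + 0.3) + 0.1*randn(size(t));  % hypothetical noisy data
omega = 1.5;                                    % held fixed for this solve
X  = [sin(omega*t)  cos(omega*t)];              % N-by-2 design matrix
b  = X \ y;                                     % least-squares solution; makes the partials zero
SSE = sum( (y - X*b).^2 );                      % resulting sum of squared errors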

Unfortunately, when $\,f\,$ does NOT depend linearly on a parameter $\,b\,,$ the situation is considerably more difficult, as illustrated next:

Example

Let $\,f(t) = A\sin(\omega t + \phi)\,,$ so that, in particular, $\,f\,$ does not depend linearly on $\,\omega\,.$ In this case, differentiation of

$$ SSE = \sum_{i=1}^N \bigl( y_i - A\sin(\omega t_i + \phi) \bigr)^2 $$

with respect to $\,\omega\,$ yields:

$$ \begin{align} \frac{\partial SSE}{\partial\omega} &= \sum_{i=1}^N\ 2 \bigl( y_i - A\sin(\omega t_i + \phi) \bigr)\cr &\qquad\qquad \cdot \bigl( -At_i\cos(\omega t_i + \phi) \bigr) \end{align} $$

Even if $\,A\,$ and $\,\phi\,$ are held constant, the equation $\,\frac{\partial SSE}{\partial\omega} = 0\,$ is not easy to solve for $\,\omega\,.$

A linearizing technique

Since a linear dependence on parameters is desirable, it is reasonable to try to replace the class of approximating functions,

$$ \begin{align} {\cal F} &= \{ f\ |\ f(t) = A\sin(\omega t + \phi)\,,\cr &\qquad A\ne 0\,, \phi\in\Bbb R\,, \omega \ne 0 \} \end{align} \tag{1} $$

with another class $\,\overset{\sim }{\cal F}\,$ that satisfies the following properties: every function in $\,\cal F\,$ should also be expressible in the new class (so that no approximating power is lost), and the functions in $\,\overset{\sim }{\cal F}\,$ should depend linearly on as many of their parameters as possible.

If it is possible to find such a class $\,\overset{\sim }{\cal F}\,,$ then the process of replacing $\,\cal F\,$ by $\,\overset{\sim }{\cal F}\,$ is called a linearizing technique.

Example

For example, the next lemma shows that the set

$$ \begin{align} \overset{\sim}{\cal F} &:= \{ f\ |\ f(t) = C\sin\omega t + D\cos\omega t\,,\cr &\qquad \omega\ne 0\,, C\ \text{ and } D \text{ not both zero} \} \end{align} $$

is equal to the set $\,\cal F\,$ given in ($1$). Thus, any function representable in the form $\,A\sin(\omega t + \phi)\,$ is also representable in the form $\,C\sin\omega t + D\cos\omega t\,,$ and vice versa.

Both classes $\,\cal F\,$ and $\,\overset{\sim}{\cal F}\,$ involve three parameters: $\,\cal F\,$ involves $\,A\,,$ $\,\phi\,$ and $\,\omega\,$; and $\,\overset{\sim}{\cal F}\,$ involves $\,C\,,$ $\,D\,$ and $\,\omega\,.$

However, a function $\,f(t) = C\sin\omega t + D\cos\omega t\,$ from $\,\overset{\sim}{\cal F}\,$ depends linearly on both $\,C\,$ and $\,D\,,$ whereas $\,f(t) = A\sin(\omega t + \phi)\,$ only depends linearly on $\,A\,.$ Thus, the class $\,\overset{\sim}{\cal F}\,$ is more desirable.

Lemma a linearizing transformation

Let $\,\cal F\,$ and $\,\overset{\sim }{\cal F}\,$ be classes of functions of one variable, defined by:

$$ \begin{align} {\cal F} &= \{ f\ |\ f(t) = A\sin(\omega t + \phi)\,,\cr &\qquad A\ne 0\,, \phi\in\Bbb R\,, \omega \ne 0 \} \end{align} $$ $$ \text{and} $$ $$ \begin{align} \overset{\sim}{\cal F} &:= \{ f\ |\ f(t) = C\sin\omega t + D\cos\omega t\,,\cr &\qquad \omega\ne 0\,, C\ \text{ and } D \text{ not both zero} \} \end{align} $$

Then, $\,\cal F = \overset{\sim}{\cal F}\,.$

Proof

Let $\,f\in \cal F\,.$ For all $\,t\,,$ $\,A\,,$ $\,\phi\,$ and $\,\omega\,$:

$$ \begin{align} f(t) &= A\sin(\omega t + \phi)\cr\cr &= A(\sin\omega t\cos\phi + \cos\omega t\sin\phi)\cr\cr &= \overbrace{(A\cos\phi)}^{C}\sin\omega t + \overbrace{(A\sin\phi)}^{D}\cos\omega t \tag{*} \end{align} $$

Define $\, C := A\cos\phi\,$ and $\,D := A\sin\phi\,.$ Since $\,f\in\cal F\,,$ one has $\,A\ne 0\,.$ Therefore, the only way that both $\,C\,$ and $\,D\,$ can equal zero is if $\,\cos\phi = \sin\phi = 0\,,$ which is impossible. Thus, $\,f\in \overset{\sim}{\cal F}\,.$
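For instance, if $\,A = 2\,$ and $\,\phi = \frac\pi 6\,,$ then $\,C = 2\cos\frac\pi 6 = \sqrt 3\,$ and $\,D = 2\sin\frac\pi 6 = 1\,,$ so that $\,2\sin(\omega t + \frac\pi 6) = \sqrt 3\,\sin\omega t + \cos\omega t\,.$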

Now, let $\,f\in\overset{\sim}{\cal F}\,,$ so that $\,f(t) = C\sin\omega t + D\cos \omega t\,$ for $\,\omega\ne 0\,,$ for numbers $\,C\,$ and $\,D\,$ that are not both zero. It may be useful to refer to the flow chart below while reading the remainder of the proof.

Flow chart for proof

If $\,C = 0\,,$ then $\,D\ne 0\,.$ In this case:

$$ \begin{align} &C\sin\omega t + D\cos\omega t\cr &\quad = D\cos\omega t\cr &\quad = D\sin(\omega t + \frac\pi 2) \end{align} $$

Take $\,A := D\,$ and $\,\phi := \frac\pi 2\,$ to see that $\,f\in\cal F\,.$

Now, let $\,C\ne 0\,.$ If there exist constants $\,A\ne 0\,$ and $\,\phi\,$ for which $\,C = A\cos\phi\,$ and $\,D = A\sin\phi\,,$ then $\,A\,$ and $\,\phi\,$ must satisfy certain requirements, as follows:

Since $\,C\ne 0\,,$ it follows that $\,\cos\phi \ne 0\,.$ Thus:

$$ \frac DC = \frac{A\sin\phi}{A\cos\phi} = \tan\phi $$

There are infinitely many choices of $\,\phi\,$ that satisfy this requirement. Choose $\,\phi := \tan^{-1}\frac DC\,,$ where $\,\tan^{-1}\,$ denotes the arctangent function, defined by:

$$ \begin{gather} y = \tan^{-1}x\cr \iff\cr \bigl( \tan y = x\ \text{ and }\ y\in (-\frac\pi 2,\frac\pi 2)\, \bigr) \end{gather} $$

the arctangent function

Also note that

$$ \begin{align} &C^2 + D^2\cr &\quad = (A\cos\phi)^2 + (A\sin\phi)^2\cr &\quad = A^2\,, \end{align} $$

so that $\,A\,$ must equal either $\,\sqrt{C^2 + D^2}\,$ or $\,-\sqrt{C^2 + D^2}\,.$

The table below gives the correct formulas for

$$ \cos\bigl(\tan^{-1}\frac DC\bigr) \ \ \text{and}\ \ \sin\bigl(\tan^{-1}\frac DC\bigr)\ , $$

for all possible sign choices of $\,C\,$ and $\,D\,.$ In this table:

$$ K := \sqrt{C^2 + D^2} $$
Formulas for $\,\sin\bigl(\tan^{-1}\frac DC\bigr)\,$ and $\,\cos\bigl(\tan^{-1}\frac DC\bigr)\,$

$$ \begin{array}{c|c|c|c} D & C & \sin\bigl(\tan^{-1}\frac DC\bigr) & \cos\bigl(\tan^{-1}\frac DC\bigr)\cr \hline + & + & \frac DK & \frac CK\cr + & - & -\frac DK & -\frac CK\cr - & + & \frac DK & \frac CK\cr - & - & -\frac DK & -\frac CK \end{array} $$

If $\,C \gt 0\,,$ then take $\,A := \sqrt{C^2 + D^2}\,.$ Using (*):

$$ \begin{align} &A\sin(\omega t + \phi)\cr &\quad = \sqrt{C^2 + D^2}\,\cos\bigl(\tan^{-1}\frac DC\bigr)\sin\omega t\cr &\qquad + \sqrt{C^2+D^2}\,\sin\bigl(\tan^{-1}\frac DC\bigr)\cos\omega t\cr &\quad = \sqrt{C^2 + D^2}\,\frac{C}{\sqrt{C^2+D^2}}\,\sin\omega t\cr &\qquad + \sqrt{C^2+D^2}\,\frac{D}{\sqrt{C^2+D^2}}\,\cos\omega t\cr &\quad = C\sin\omega t + D\cos\omega t \end{align} $$

If $\,C \lt 0\,,$ take $\,A := -\sqrt{C^2 + D^2}\,.$ Then:

$$ \begin{align} &A\sin(\omega t + \phi)\cr &\quad = -\sqrt{C^2 + D^2}\,\cos\bigl(\tan^{-1}\frac DC\bigr)\sin\omega t\cr &\qquad - \sqrt{C^2+D^2}\,\sin\bigl(\tan^{-1}\frac DC\bigr)\cos\omega t\cr &\quad = -\sqrt{C^2 + D^2}\,\frac{-C}{\sqrt{C^2+D^2}}\,\sin\omega t\cr &\qquad - \sqrt{C^2+D^2}\,\frac{-D}{\sqrt{C^2+D^2}}\,\cos\omega t\cr &\quad = C\sin\omega t + D\cos\omega t \end{align} $$

In both cases, it has been shown that $\,f\in\cal F\,.$     $\blacksquare$
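As a quick numerical check of the case $\,C \lt 0\,$: if $\,C = -\sqrt 3\,$ and $\,D = 1\,,$ then $\,\phi = \tan^{-1}\frac{1}{-\sqrt 3} = -\frac\pi 6\,$ and $\,A = -\sqrt{C^2 + D^2} = -2\,,$ and indeed

$$ -2\sin\bigl(\omega t - \frac\pi 6\bigr) = -2\bigl(\tfrac{\sqrt 3}2\sin\omega t - \tfrac 12\cos\omega t\bigr) = -\sqrt 3\,\sin\omega t + \cos\omega t\ . $$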

General approach

The following iterative approach can be used to find a choice of parameters $\,C\,,$ $\,D\,$ and $\,\omega\,$ so that the function $\,f(t) = C\sin\omega t + D \cos\omega t\,$ best approximates the data, in the least-squares sense. Observe that, once a value of $\,\omega\,$ is fixed, the remaining problem is just linear least-squares approximation, treated in Section $2.2\,.$

One algorithm for ‘adjusting’ $\,\omega\,$ is based on the following result from multivariable calculus:

Theorem a differentiable function $\,h\,$ changes most rapidly in the direction of the gradient; $\,\triangledown h\,$ is the gradient of $\,h$

Let $\,h\,$ be a function from $\,\Bbb R^m\,$ into $\,\Bbb R\,,$ and let $\,\boldsymbol{\rm x} = (x_1,\ldots,x_m)\,$ be an element of $\,\Bbb R^m\,.$

If $\,h\,$ has continuous first partials in a neighborhood of $\,\boldsymbol{\rm x}\,,$ then the function $\,\triangledown h\ :\ \Bbb R^m\rightarrow\Bbb R^m\,$ defined by

$$ \triangledown h(x_1,\ldots,x_m) := \bigl( \frac{\partial h}{\partial x_1}(\boldsymbol{\rm x}),\ldots, \frac{\partial h}{\partial x_m}(\boldsymbol{\rm x}) \bigr) $$

has the following property:

At $\,\boldsymbol{\rm x}\,,$ the function $\,h\,$ increases most rapidly in the direction of the vector $\,\triangledown h(\boldsymbol{\rm x})\,$ (the rate of change is then $\,\|\triangledown h(\boldsymbol{\rm x})\|\,$), and decreases most rapidly in the direction of $\,-\triangledown h(\boldsymbol{\rm x})\,$ (the rate of change is then $\,-\|\triangledown h(\boldsymbol{\rm x})\|\,$).

The function $\,\triangledown h\,$ is called the gradient of $\,h\,.$
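For example, if $\,h(x_1,x_2) = x_1^2 + x_2^2\,,$ then $\,\triangledown h(x_1,x_2) = (2x_1,2x_2)\,.$ At the point $\,(1,2)\,,$ the function $\,h\,$ increases most rapidly in the direction $\,(2,4)\,$ (at rate $\,\|(2,4)\| = 2\sqrt 5\,$) and decreases most rapidly in the direction $\,(-2,-4)\,.$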

This theorem will be used to ‘adjust’ parameter(s) so that the function $\,SSE\,$ will decrease most rapidly.

When $\,\|\triangledown SSE(\omega)\|\,$ is large, the function $\,SSE\,$ is changing rapidly at $\,\omega\,$; when $\,\|\triangledown SSE(\omega)\|\,$ is small, the function $\,SSE\,$ is changing slowly at $\,\omega\,.$
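For the linearized sinusoid model, one common way to carry out such an adjustment is a gradient step $\,\omega \leftarrow \omega - \epsilon\,\frac{\partial SSE}{\partial\omega}\,,$ where $\,\epsilon \gt 0\,$ is a step size. A minimal MATLAB/Octave sketch of a single step follows; the data, the current values of $\,C\,,$ $\,D\,,$ and $\,\omega\,,$ and the step size  epsil  are hypothetical.

t = (0:0.1:10)';  y = 2*sin(1.5*t + 0.3) + 0.1*randn(size(t));    % hypothetical data
C = 1.7;  D = 0.6;  omega = 1.4;  epsil = 1e-4;                   % current values and step size
r = y - C*sin(omega*t) - D*cos(omega*t);                          % residuals y_i - f(t_i)
dSSE = sum( 2*r .* ( -C*t.*cos(omega*t) + D*t.*sin(omega*t) ) );  % dSSE/domega
omega = omega - epsil*dSSE;                                       % step opposite the gradient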

The MATLAB implementation given in this section allows the user to choose the value of $\,\epsilon\,.$ ‘Optimal’ values of $\,\epsilon\,$ will depend on both the function $\,SSE\,,$ and the current value of $\,\omega\,,$ as suggested by the sketches below (where, for convenience, $\,SSE\,$ is a function of one variable only).

These sketches also illustrate how ‘bouncing’ can occur, and how $\,\omega\,$ can converge to a non-optimal minimum. In general, experience and experimentation by the analyst will play a role in the success of this technique.

Far away from minimum; large $\,\epsilon\,$ will speed process initially

Convergence to ‘wrong’ minimum

‘Bouncing’

A general nonlinear least-squares problem

The previous theorem is now applied to a general nonlinear least-squares problem. MATLAB implementation, for a special case of the situation discussed here, will follow.

Let $\,f_1,f_2,\ldots,f_m\,$ be functions that depend on time, and on parameters $\,\boldsymbol{\rm c} := (c_1,\ldots,c_p)\,.$ That is, $\,f_i = f_i(t,\boldsymbol{\rm c})\,$ for $\,i = 1,\ldots,m\,.$ The dependence on any of the inputs may be nonlinear.

Define a function $\,f\,$ that depends on time, the parameters $\,\boldsymbol{\rm c}\,,$ and parameters $\,\boldsymbol{\rm b} = (b_1,\ldots,b_m)\,,$ via:

$$ f(t,\boldsymbol{\rm c},\boldsymbol{\rm b}) = b_1f_1(t,\boldsymbol{\rm c}) + \cdots + b_mf_m(t,\boldsymbol{\rm c}) $$

Thus, $\,f\,$ is linear with respect to the parameters in $\,\boldsymbol{\rm b}\,,$ but not in general with respect to the parameters in $\,\boldsymbol{\rm c}\,.$

Given a data set $\,\{(t_i,y_i)\}_{i=1}^N\,,$ let $\,\boldsymbol{\rm t} = (t_1,\ldots,t_N)\,$ and $\,\boldsymbol{\rm y} = (y_1,\ldots,y_N)\,.$ The sum of the squared errors between $\,f(t_i,\boldsymbol{\rm c},\boldsymbol{\rm b})\,$ and $\,y_i\,$ is

$$ SSE = \sum_{i=1}^N \bigl( y_i - f(t_i,\boldsymbol{\rm c},\boldsymbol{\rm b}) \bigr)^2\,, $$

which has partials

$$ \frac{\partial SSE}{\partial c_j} = \sum_{i=1}^N 2\bigl( y_i - f(t_i,\boldsymbol{\rm c},\boldsymbol{\rm b}) \bigr) \bigl( -\frac{\partial f}{\partial c_j}(t_i,\boldsymbol{\rm c},\boldsymbol{\rm b}) \bigr)\,, $$

for $\,j = 1,\ldots,p\,.$ Note that:

$$ \begin{align} &\frac{\partial f}{\partial c_j}(t_i,\boldsymbol{\rm c},\boldsymbol{\rm b})\cr &\quad = b_1\frac{\partial f_1}{\partial c_j}(t_i,\boldsymbol{\rm c}) + \cdots + b_m\frac{\partial f_m}{\partial c_j}(t_i,\boldsymbol{\rm c}) \end{align} $$

Then, $\,\triangledown SSE = (\frac{\partial SSE}{\partial c_1},\ldots, \frac{\partial SSE}{\partial c_p})\,. $
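For example, for the linearized sinusoid class $\,\overset{\sim}{\cal F}\,$: $\,m = 2\,,$ $\,p = 1\,,$ $\,\boldsymbol{\rm b} = (C,D)\,,$ $\,\boldsymbol{\rm c} = (\omega)\,,$ $\,f_1(t,\omega) = \sin\omega t\,,$ and $\,f_2(t,\omega) = \cos\omega t\,.$ Then

$$ \frac{\partial f}{\partial\omega}(t_i,\omega,C,D) = Ct_i\cos\omega t_i - Dt_i\sin\omega t_i\,, $$

so that $\,\triangledown SSE\,$ is the single number

$$ \frac{\partial SSE}{\partial\omega} = \sum_{i=1}^N 2\bigl( y_i - C\sin\omega t_i - D\cos\omega t_i \bigr)\bigl( -Ct_i\cos\omega t_i + Dt_i\sin\omega t_i \bigr)\ . $$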

The algorithm

The algorithm alternates two kinds of updates. With the nonlinear parameters $\,\boldsymbol{\rm c}\,$ held fixed, the linear parameters $\,\boldsymbol{\rm b}\,$ are found exactly by matrix methods, as in Section 2.2. Then the nonlinear parameters are adjusted by a step opposite the gradient, $\,\boldsymbol{\rm c} \leftarrow \boldsymbol{\rm c} - \epsilon\triangledown SSE(\boldsymbol{\rm c})\,,$ where $\,\epsilon \gt 0\,$ is a step size. These two updates are repeated until the improvement in $\,SSE\,$ (or the change in the parameters) falls below a prescribed tolerance.
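The following MATLAB/Octave sketch shows one possible realization of these steps for the model $\,f(t) = C\sin\omega t + D\cos\omega t\,.$ It is an illustration only (the data, the starting value of $\,\omega\,,$ the step size, the iteration limit, and the stopping rule are hypothetical), not the function  nonlin  given below.

t = (0:0.1:10)';  y = 2*sin(1.5*t + 0.3) + 0.1*randn(size(t));   % hypothetical data
omega = 1.4;  epsil = 1e-4;  tol = 1e-8;  SSEold = Inf;          % hypothetical settings
for k = 1:1000
  X  = [sin(omega*t)  cos(omega*t)];    % linear subproblem: omega held fixed
  cd = X \ y;  C = cd(1);  D = cd(2);   % matrix solution for C and D
  r  = y - X*cd;                        % residuals
  SSE = sum(r.^2);
  if abs(SSEold - SSE) < tol, break, end
  SSEold = SSE;
  dSSE  = sum( 2*r .* ( -C*t.*cos(omega*t) + D*t.*sin(omega*t) ) );  % dSSE/domega
  omega = omega - epsil*dSSE;           % gradient step on omega
end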

MATLAB IMPLEMENTATION: Nonlinear Least-Squares, Gradient Method

Using the function  nonlin

The following MATLAB function uses the gradient method discussed in this section to fit data with a sum of the form:

$$ f(t) = A + Bt + \sum_{i=1}^m \bigl( C_i\sin\omega_i t + D_i\cos\omega_i t \bigr) $$

The fundamental period of $\,\sin\omega_i t\,$ is $\,P_i := \frac{2\pi}{\omega_i}\,.$ To use the function, type:

[E,f] = nonlin(P,D,epsil,tol);

Required inputs

The required inputs are:

Outputs from the function

The function returns the following information:

The MATLAB function  nonlin  is given next.

MATLAB function  nonlin

For Octave, the line

X(:,1) = ones(t);

must be changed to:

X(:,1) = ones(size(t));

Also,  endfunction  must be added as the last line.
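For reference, a design matrix for the model $\,f(t) = A + Bt + \sum_{i=1}^m\bigl(C_i\sin\omega_i t + D_i\cos\omega_i t\bigr)\,,$ with the frequencies held fixed, could be assembled as follows. This is only a sketch with hypothetical sample times and periods, and is not necessarily how  nonlin  builds the matrix internally, but it is consistent with the line  X(:,1) = ones(size(t));  noted above.

t = (0:0.5:20)';                    % hypothetical sample times (column vector)
P = [12; 5];                        % hypothetical periods P_i
omega = 2*pi ./ P;                  % omega_i = 2*pi / P_i
m = length(omega);
X = zeros(length(t), 2 + 2*m);
X(:,1) = ones(size(t));             % column for the constant term A
X(:,2) = t;                         % column for the linear term B*t
for i = 1:m
  X(:, 2*i+1) = sin(omega(i)*t);    % column for C_i
  X(:, 2*i+2) = cos(omega(i)*t);    % column for D_i
end
% With the omega_i fixed, the linear coefficients follow from  coeffs = X \ y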

(THIS SECTION IS IN PROGRESS)