Chapter 2: ‘Fitting’ a Data Set with a Function

2.1 Random Behavior

The Turning Point Test

What is ‘random’ behavior?

What is random behavior? For the purposes of this dissertation, a list is called random if the list entries are completely non-deterministic; i.e., if the occurrence of any observation in the list in no way influences the occurrence of any other observation.

Should a list be random, then there is no sense in searching for deterministic components.

This section presents a test for random behavior, called the turning point test. The idea behind the test is this: if data is truly random, then certain behavior dictated by randomness is expected. A probabilistic comparison of properties of the actual data with what is expected—should the data be truly random—is used to support or deny the hypothesis of random behavior.

The turning point test is adapted from [Ken, 21-24]. Several other tests for randomness are discussed in the same cited section.

Some review of probability and statistics concepts is interspersed throughout this section. The reader is referred to [Dghty] for additional material.

A MATLAB program to apply the Turning Point Test is included in this section.

Definition neighbor; peak; trough; turning point

Given an entry $\,y_i\,$ in a list $\,(\ldots,y_{i-1},y_i,y_{i+1},\ldots)\,,$ the adjacent entries $\,y_{i-1}\,$ and $\,y_{i+1}\,$ are called the neighbors of $\,y_i\,.$

An entry in a list is called a peak if it is strictly greater than each neighbor; and is called a trough if it is strictly less than each neighbor.

A list entry is a turning point if it is either a peak or a trough.

turning points: peaks and troughs

Turning points
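These definitions translate directly into code. The sketch below is written in Python purely for illustration (the programs in this section are otherwise MATLAB), and the function names are this sketch's own:

```python
def classify(left, mid, right):
    """Classify the middle entry of a triple of values."""
    if mid > left and mid > right:
        return 'peak'      # strictly greater than each neighbor
    if mid < left and mid < right:
        return 'trough'    # strictly less than each neighbor
    return 'neither'

def is_turning_point(left, mid, right):
    """A list entry is a turning point if it is a peak or a trough."""
    return classify(left, mid, right) in ('peak', 'trough')
```

Note that the first and last entries of a list have only one neighbor each, so the predicate applies only to interior entries.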

Turning Point Test

The turning point test is so named because it counts the number of turning points in a finite list of data, and compares this number with what is expected, should the data be truly random.

Hypothesis for the turning point test

For the turning point test that is developed in this section, it is assumed that the entries in a finite list are allowed to come from an interval of real numbers. Since there are an infinite number of choices for any list entry, the probability that two entries in the list are equal is zero. In particular, the probability that there are equal adjacent entries is zero.

If a finite list contains a large number of duplicate values, then the hypothesis that the list entries come from some interval is probably unwarranted, and the turning point test as developed here will not apply.

$\bigstar\,$ Probability considerations

the probability of an event is 1/2

An event $\,E\,$ is a subset of a sample space $\,S\,.$ The probability of $\,E\,$ is determined by how much ‘room’ $\,E\,$ takes up in $\,S\,.$

Suppose that two numbers are chosen from an interval $\,I\,.$ The corresponding sample space is $\,I\times I\,.$ The phrase ‘the probability that there are equal adjacent entries is zero’, means, precisely, that the measure of the set $\{(x,x)\,|\,x\in I\}\,$ (as a subset of $\,I \times I\,$) is zero.

the diagonal in I x I has measure zero

Suppose that three numbers are chosen from an interval $\,I\,.$ The corresponding sample space is $\,I \times I \times I\,.$ The phrase ‘the probability that there are equal adjacent entries is zero’, means, precisely, that the set

$$ S := \{(x,x,y)\,|\,x,y\in I\} \cup \{(x,y,y)\,|\,x,y\in I\}\,, $$

(as a subset of $\,I \times I \times I\,$), has measure zero. The set $\,S\,$ is the union of the intersections of the two planes $\,x = y\,$ and $\,y = z\,$ in $\,\Bbb R^3\,$ with the cube $\,I \times I \times I\,$; the resulting set is ‘thin’ in $\,\Bbb R^3\,.$

The hypothesis that the entries in a finite list come from an interval justifies taking the sample space, in the following development of the turning point test, to be all possible arrangements of three distinct values.

If, on the other hand, the entries come from some finite set, then the probability that there are equal adjacent entries is nonzero, and one would have to enlarge the sample space to account for these possible repeat values.

Probability of finding a turning point in a set of $\,3\,$ distinct values

Let $\,\ell\,,$ $\,m\,,$ and $\,g\,$ be three distinct real numbers, with $\,\ell \lt m\lt g\,.$ The letters were chosen to remind the reader of this ordering: $\,\ell\,$ for ‘least’, $\,m\,$ for ‘middle’, and $\,g\,$ for ‘greatest’.

There are $\,3\cdot 2\cdot 1 = 6\,$ ways that these three numbers can be arranged in three slots. If the ordering is random, then these $\,6\,$ possible arrangements will occur with equal probability. Only four arrangements yield a turning point:

turning point analysis for three real numbers

Thus, the probability of finding a turning point in a set of three distinct real values is $\,\frac46 = \frac 23\,.$
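This count is small enough to confirm by brute force: enumerate all six arrangements of three distinct numbers and check the middle slot. A Python sketch (illustrative only; any three distinct values give the same count):

```python
from itertools import permutations

l, m, g = 1, 2, 3   # any three distinct values, l < m < g

# count arrangements whose middle slot is a peak or a trough
hits = sum(
    1 for a, b, c in permutations((l, m, g))
    if (b > a and b > c) or (b < a and b < c)
)
print(hits, "of 6 arrangements have a turning point")   # 4 of 6
```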

$S\,,$ sample space

Let $\,S\,$ denote the sample space containing all possible arrangements of three distinct values, so that $\,S\,$ contains the six $3$-tuples investigated above.

A ‘counting’ random variable, $\,C\,:\,S\rightarrow \{0,1\}$

Define a ‘counting’ random variable $\,C\,:\,S\rightarrow \{0,1\}\,$ via:

$$ C(a,b,c) := \cases{ 1 & \text{if } b \text{ is a turning point}\cr\cr 0 & \text{otherwise}} $$
Derive:   $E(C) = \mu_C = \frac 23$

The probability that $\,b\,$ is a turning point is $\,\frac 23\,.$ It follows that the expected value of $\,C\,,$ denoted by both $\,E(C)\,$ and $\,\mu_C\,,$ is:

$$ E(C) = \mu_C = (1)(\frac 23) + (0)(\frac 13) = \frac 23 $$

Here, $\,E\,$ is the expected value operator.

Derive:   $E(C^2) = \frac 23$

Define the function $\,C^2\,$ by:

$$C^2(a,b,c) := C(a,b,c)\cdot C(a,b,c)$$

Since $\,1^2 = 1\,$ and $\,0^2 = 0\,,$ it follows that:

$$ C^2(a,b,c) = \cases{ 1 & \text{if}\ b\ \text{ is a turning point}\cr\cr 0 & \text{otherwise}} $$

Therefore, $\,E(C^2)\,$ also equals $\,\frac 23\,.$

Derive:   $\text{var}(C) = \frac 29$

Let $\,\text{var}(C)\,$ denote the variance of $\,C\,.$ Using the definition of variance, and the linearity of the expected value operator, one computes:

$$ \begin{align} \text{var}(C) &:= E\bigl((C-\mu_C)^2\bigr)\cr &= E(C^2 - 2\mu_CC + \mu_C^2)\cr &= E(C^2) - 2\mu_CE(C) + E(\mu_C^2)\cr &= E(C^2) - 2(\mu_C)^2 + (\mu_C)^2\cr &= E(C^2) - (\mu_C)^2\cr &= \frac 23 - (\frac 23)^2\cr &= \frac 29 \end{align} $$
Count the number of turning points in a list, $\,C_i$

Consider now a list $\,\boldsymbol{\rm y} := (y_1,\ldots,y_n)\,$ of length $\,n\,.$ It is desired to count the number of turning points in this list. Since knowledge of both neighbors is required to classify a turning point, the first and last entries in a list cannot be turning points; so the maximum possible number of turning points present is $\,n - 2\,.$

Using the list $\,\boldsymbol{\rm y}\,,$ define $\,C_i\,,$ for $\,i = 2,\ldots,n-1\,,$ by:

$$ C_i(y_{i-1},y_i,y_{i+1}) := \cases{ 1 & \text{if}\ y_i\ \text{is a turning point}\cr\cr 0 & \text{otherwise} } $$
The $\,C_i\,$ do NOT form a random sample

Each random variable $\,C_i\,$ is distributed identically to the random variable $\,C\,.$

However, it is important to note that this collection of random variables $\,\{C_i\}_{i=2}^{n-1}\,$ is not a random sample corresponding to $\,C\,,$ because $\,C_i\,$ and $\,C_j\,$ are not independent for $\,0\lt |j-i|\le 2\,$; that is, when the $3$-tuples acted on by $\,C_i\,$ and $\,C_j\,$ overlap. This issue is addressed later on in this section.

$T\,$ gives the total number of turning points in a list

Define a random variable $\,T\,$ by:

$$ T := \sum_{i=2}^{n-1} C_i $$

Then, $\,T\,$ gives the total number of turning points in the list.

Derive:   $E(T) = \mu_T = \frac 23(n-2)$

Via linearity of the expected value operator:

$$E(T) = \sum_{i=2}^{n-1} E(C_i) = \frac 23(n-2) := \mu_T $$
Computing $\,\text{var}(T)$

Next, $\,\text{var}(T)\,,$ the variance of the random variable $\,T\,,$ is computed. Since

$$ \text{var}(T) = E(T^2) - (\mu_T)^2\,, $$

one first computes $\,E(T^2)\,$:

$$ \begin{align} E(T^2) &= E\left( \bigl( \sum_{i=2}^{n-1} C_i \bigr)^2 \right)\cr &= E\bigl( (C_2 + \cdots + C_{n-1})^2 \bigr) \end{align} $$
Counting terms of the form $\,C_iC_j$

There are $ \,(n-2)(n-2) = n^2 - 4n + 4\,$ terms in the product $\,(C_2 + \cdots + C_{n-1})^2\,.$ It is necessary to count the number of terms of the form $\,C_iC_j\,$ for $\,j = i\,,$ $\,|j - i| = 1\,,$ $\,|j - i| = 2\,,$ and $\,|j - i| \gt 2\,$; that is, when the indices on $\,C\,$ are the same, or differ by exactly $\,1\,,$ exactly $\,2\,$, or more than $\,2\,.$

This ‘counting’ is easily accomplished by performing the multiplication as a matrix product, and analyzing the result:

counting terms of the form C_iC_j

The main diagonal has $\,n-2\,$ entries, each of the form $\,C_i^2\,.$ Thus, there are $\,n-2\,$ terms of the form $\,C_i^2\,.$

There are $\,(n-2)-1 = n-3\,$ entries on the first diagonal above and below the main diagonal, and these are the terms for which $\,|j - i| = 1\,.$ Thus, there are $\,2(n-3)\,$ terms of the form $\,C_iC_{i+1}\,.$

There are $\,(n-2)-2 = n-4\,$ entries on the second diagonal above and below the main diagonal, and these are the terms for which $\,|j-i| = 2\,.$ Thus, there are $\,2(n - 4)\,$ terms of the form $\,C_iC_{i+2}\,.$

The remaining terms are those for which $\,|j - i| \gt 2\,$; thus, there are

$$ \begin{align} &(n^2-4n+4) - (n-2) - 2(n-3)-2(n-4)\cr &\quad = n^2 - 9n + 20\cr &\quad = (n-4)(n-5) \end{align} $$

terms of the form $\,C_iC_j\,$ for $\,|j - i| \gt 2\,.$

With a slight abuse of summation notation, the findings thus far are summarized as:

$$ \begin{align} &E(T^2)\cr &\quad = E\left( \bigl( \sum_{i=2}^{n-1} C_i \bigr)^2 \right)\cr &\quad = E\biggl( \sum_{n-2} C_i^2 + \sum_{2(n-3)} C_iC_{i+1} \cr &\qquad \quad + \sum_{2(n-4)} C_iC_{i+2} + \sum_{ \substack{(n-4)(n-5)\\ |j-i|\gt 2}} C_iC_j \biggr) \end{align} \tag{*} $$

In each sum, the subscript denotes the number of terms, and the summand depicts the form of the terms being added. The expectation of each type of term in (*) must be considered separately.

Investigating $\,E(C_i^2)\,$ and $\,E(C_iC_j)\,, |j-i|\gt 2$

It has already been observed that $\,E(C_i^2) = \frac 23\,,$ since $\,C_i^2 = C_i\,.$

When $\,|j - i| \gt 2\,,$ the random variables $\,C_i\,$ and $\,C_j\,$ have non-overlapping domains. Under the assumption of randomly generated data, the occurrence or non-occurrence of a turning point for $\,C_i\,$ in no way influences the existence of a turning point for $\,C_j\,$ in this case; i.e., $\,C_i\,$ and $\,C_j\,$ are independent. Thus:

$$ \begin{align} E(C_iC_j) &=E(C_i)\,E(C_j)\cr &=\frac 23\cdot\frac 23 = \frac 49\,,\ \ \text{for}\ |j-i|\gt 2 \end{align} $$

non-overlapping C_i
Investigating $\,E(C_iC_{i+1})$

However, for $\,1\le j\le 2\,,$ the random variables $\,C_i\,$ and $\,C_{i+j}\,$ have overlapping domains, and $\,E(C_iC_{i+j}) \ne E(C_i)\,E(C_{i+j})\,$; i.e., $\,C_i\,$ and $\,C_{i+j}\,$ are not independent for $\,1\le j \le 2\,.$

overlapping C_i

The proof of this statement follows.

To evaluate $\,E(C_iC_{i+1})\,$ requires the investigation of existence of turning points in $\,4\,$ consecutive slots. For convenience of notation, let four distinct real numbers be labeled in order of increasing magnitude as $\,a\,,$ $\,b\,,$ $\,c\,$ and $\,d\,.$ There are $\,4\cdot 3\cdot 2\cdot 1 = 24\,$ ways that these four numbers can be arranged in four slots, as shown below:

$abcd$ $bacd$ $cabd$ $dabc$
$abdc$ $badc$ $cadb$ $dacb$
$acbd$ $bcad$ $cbad$ $dbac$
$acdb$ $bcda$ $cbda$ $dbca$
$adbc$ $bdac$ $cdab$ $dcab$
$adcb$ $bdca$ $cdba$ $dcba$
Investigating particular arrangements

For the arrangement $\,abcd\,,$ $\,C_i = 0\,$ and $\,C_{i+1} = 0\,.$ Thus, $\,C_iC_{i+1} = 0\,.$

arrangement abcd

For the arrangement $\,bacd\,,$ $\,C_i = 1\,$ (there is a trough). However, $\,C_{i+1} = 0\,$ (no turning point). Again, $\,C_iC_{i+1} = 0\,.$

arrangement bacd

For the arrangement $\,badc\,,$ $\,C_i = 1\,$ (there is a trough), and $\,C_{i+1} = 1\,$ (there is a peak). Thus, $\,C_iC_{i+1} = 1\,.$

arrangement badc

Indeed, for the product random variable $\,C_iC_{i+1}\,$ to be nonzero, the arrangement of $\,a\,,$ $\,b\,,$ $\,c\,$ and $\,d\,$ must display both a trough and a peak. This occurs in $\,10\,$ of the $\,24\,$ possible arrangements, and hence:

$$ E(C_iC_{i+1}) = \frac{10}{24} = \frac 5{12} $$
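The count of $\,10\,$ arrangements can be confirmed by brute force over all $\,24\,$ permutations. A Python sketch (illustrative; the specific values are irrelevant, only their ordering matters):

```python
from itertools import permutations

def turning(a, b, c):
    """True if b is a peak or a trough relative to its neighbors."""
    return (b > a and b > c) or (b < a and b < c)

# count arrangements of four distinct values in which BOTH interior
# slots are turning points (a trough and a peak, in some order)
hits = sum(
    1 for p in permutations((1, 2, 3, 4))
    if turning(p[0], p[1], p[2]) and turning(p[1], p[2], p[3])
)
print(hits, "/ 24")   # 10 / 24 = 5/12
```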

Note that $\,\frac{5}{12}\ne \frac 23\cdot\frac 23\,,$ confirming that $\,C_i\,$ and $\,C_{i+1}\,$ are not independent.

Investigating $\,E(C_iC_{i+2})$

The random variables $\,C_i\,$ and $\,C_{i+2}\,$ again have overlapping domains.

investigating C_iC_(i+2)

To evaluate $\,E(C_iC_{i+2})\,$ requires the investigation of turning points in $\,5\,$ consecutive slots. By methods similar to those just discussed, it can be shown that:

$$ E(C_iC_{i+2}) = \frac{54}{120} = \frac 9{20} $$
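Rather than reproduce the $\,120\,$-arrangement analysis, the claimed count can be checked by brute force. A Python sketch (illustrative only):

```python
from itertools import permutations

def turning(a, b, c):
    """True if b is a peak or a trough relative to its neighbors."""
    return (b > a and b > c) or (b < a and b < c)

# arrangements of five distinct values in which slot 2 and slot 4
# (the middles of the two overlapping triples) are both turning points
hits = sum(
    1 for p in permutations(range(5))
    if turning(p[0], p[1], p[2]) and turning(p[2], p[3], p[4])
)
print(hits, "/ 120")   # 54 / 120 = 9/20
```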
Combining results

Substitution of the computed expectations into (*) gives:

$$ \begin{align} &E(T^2)\cr\cr &\quad = \frac 23(n-2) + \frac5{12}\cdot 2(n-3)\cr &\qquad + \frac 9{20}\cdot 2(n-4) + \frac49(n-4)(n-5)\cr\cr &\quad = \cdots = \frac{40n^2 - 144n + 131}{90} \end{align} $$
Computing $\,\text{var}(T)$

Thus:

$$ \begin{align} \text{var}(T) &= E(T^2) - (\mu_T)^2\cr\cr &= \frac{40n^2 - 144n + 131}{90} - \bigl(\frac23(n-2)\bigr)^2\cr\cr &= \cdots = \frac{16n - 29}{90} \end{align} $$
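As a sanity check on both formulas, one can simulate many random lists and compare the sample mean and variance of the turning point count with $\,\frac 23(n-2)\,$ and $\,\frac{16n-29}{90}\,.$ A Python sketch (illustrative; the trial count and seed are arbitrary):

```python
import random
from statistics import mean, pvariance

def count_turning_points(y):
    """Count interior entries strictly above or below both neighbors."""
    return sum(
        1 for i in range(1, len(y) - 1)
        if (y[i] > y[i-1] and y[i] > y[i+1]) or (y[i] < y[i-1] and y[i] < y[i+1])
    )

random.seed(1)                       # reproducible
n, trials = 50, 4000
counts = [
    count_turning_points([random.random() for _ in range(n)])
    for _ in range(trials)
]

print(mean(counts))       # about 2/3 * (50 - 2) = 32.0
print(pvariance(counts))  # about (16*50 - 29)/90 = 8.57
```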

With both the mean and variance of the random variable $\,T\,$ now known, Chebyshev’s Inequality (stated next) can be used to compare the actual number of turning points from a given data set with the number that is expected under the hypothesis of random behavior.

Chebyshev’s Inequality  
[Dghty, 121] If the random variable $\,X\,$ has mean $\,\mu\,$ and variance $\,\sigma^2\,,$ then, for any $\,t\gt 0\,$: $$ P\bigl(|X-\mu|\ge t\bigr) \le \frac{\sigma^2}{t^2} $$

Equivalently:

$$ P\bigl(|X-\mu|\lt t\bigr) \ge 1 - \frac{\sigma^2}{t^2} $$

This theorem states that for any random variable $\,X\,$ with mean $\,\mu\,$ and variance $\,\sigma^2\,,$ the probability that $\,X\,$ takes on a value which is at least distance $\,t\,$ from the mean, is at most $\,\frac{\sigma^2}{t^2}\,.$

Chebyshev's Inequality

Observe that Chebyshev’s Inequality is a ‘distribution-free’ result; that is, it is independent of the form of the probability density function for $\,X\,.$

It is interesting to note that no ‘tighter’ bound on $\,P(|X - \mu| \ge t)\,$ is possible, without additional information about the actual distribution of $\,X\,.$ That is, there exists a random variable for which equality is obtained in Chebyshev’s Inequality: $\,P(|X - \mu| \ge t) = \frac{\sigma^2}{t^2}\,$ (see, e.g., [Dghty, 123–124]).

Using Chebyshev’s Inequality to test for random behavior

Here is how Chebyshev’s Inequality and the turning point test are used to investigate the hypothesis that a given data set is random.

Suppose that a finite list of data values is given, where it is assumed that the entries in the list are allowed to come from some interval of real numbers.

As cautioned earlier, if there are a large number of identical adjacent values in the data set, then the hypothesis that the values come from some interval of real numbers is probably unwarranted, and the turning point test as developed here does not apply.

Adjusting the list for occasional identical adjacent values

For any occasional adjacent data values that are identical, delete the repeated value, and decrease $\,n\,$ (the length of the list) by $\,1\,.$ For example, the data list

$$ (1,3,5, \overbrace{2,2},7,6,4,3,9,0,1,5,8) $$

of length $\,14\,$ would be transformed to the list

$$ (1,3,5,2,7,6,4,3,9,0,1,5,8) $$

of length $\,13\,,$ before applying the turning point test.
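This adjustment is a one-pass scan. A Python sketch (illustrative; the function name is this sketch's own):

```python
def remove_adjacent_duplicates(y):
    """Delete the repeats among runs of identical adjacent entries."""
    out = [y[0]]
    for v in y[1:]:
        if v != out[-1]:       # keep only entries that differ from the previous one
            out.append(v)
    return out

data = [1, 3, 5, 2, 2, 7, 6, 4, 3, 9, 0, 1, 5, 8]   # length 14
print(remove_adjacent_duplicates(data))              # length 13
```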

Let $\,N\,$ denote the length of the (possibly adjusted) data set.

$T_{\text{act}}\,,$ the actual number of turning points

Let $\,T_{\text{act}}\,$ denote the actual number of turning points in the (adjusted) list. Under the hypothesis of random behavior, the expected value and variance of the random variable $\,T\,$ that counts the number of turning points in the list are given by:

$$ \begin{gather} E(T) = \frac 23(N-2) := \mu\cr\cr \text{and}\cr\cr \text{var}(T) = \frac{16N-29}{90} := \sigma^2 \end{gather} $$
Define $\,d := |T_{\text{act}} - \mu|$

Let $\,d := |T_{\text{act}} - \mu|$ denote the distance between the actual and expected number of turning points. By Chebyshev’s Inequality, the probability that a distance of $\,d\,$ or greater between $\,T\,$ and $\,\mu\,$ would be observed, should the data be truly random, satisfies:

$$ P\bigl(|T - \mu|\ge d\bigr) \le \frac{\sigma^2}{d^2} $$

application of Chebyshev's Inequality

If $\,\frac{\sigma^2}{d^2}\,$ is close to $\,0\,,$ then it is unlikely that $\,T_{\text{act}}\,$ turning points would be observed if the data were truly random. In this case, the hypothesis that the data is random would be rejected, and the search for deterministic components could begin.

If $\,\frac{\sigma^2}{d^2}\,$ is close to $\,1\,,$ then there is no reason to reject the hypothesis of random behavior. In this case, it may be fruitless to search for deterministic components.

Example 1: Applying the Turning Point Test

The data graphed below give the biweekly stock price of a mutual fund over a two-year time period.

biweekly stock price of mutual fund over two-year period

There are no identical adjacent values. The total number of data points is $\,N = 61\,.$

The expected number of turning points, under the hypothesis of random behavior, is:

$$ \mu = \frac 23(N - 2)\approx 39.33 $$

The actual number of turning points is

$$ T_{\text{act}} = 28 \,, $$

so that the distance between the actual and expected values is:

$$ d := |39.33 - 28| = 11.33 $$

The variance of T is:

$$ \begin{align} \sigma^2 &= \frac{16N - 29}{90}\cr\cr &= \frac{16(61)-29}{90}\cr\cr &\approx 10.52 \end{align} $$

Chebyshev’s Inequality yields:

$$ P\bigl(|T-\mu|\ge 11.33\bigr) \le \frac{10.52}{(11.33)^2} \approx 0.08 $$

Thus, it is quite unlikely that only $\,28\,$ turning points would be observed, if the data were truly random. The hypothesis of random behavior is therefore rejected, and a search for deterministic components can begin.
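The arithmetic of this example can be reproduced in a few lines. A Python sketch (illustrative only):

```python
# Reproduce the numbers of Example 1 (N = 61 data points, 28 turning points)
N, T_act = 61, 28

mu = 2 * (N - 2) / 3            # expected number of turning points
sigma2 = (16 * N - 29) / 90     # variance under the randomness hypothesis
d = abs(T_act - mu)             # distance between actual and expected
bound = sigma2 / d**2           # Chebyshev bound

print(round(mu, 2), round(sigma2, 2), round(bound, 2))
```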

‘Local’ random behavior

Some data sets, as in the next example, are ‘locally’ random, and yet exhibit some deterministic behavior from a more ‘global’ point of view. In such instances, short-term data prediction may be unwarranted, whereas longer-term prediction may be possible.

For sufficiently large data sets, the turning point test can be used to help determine the ‘breadth of local random behavior’. This idea is explored in the next example.

Example 2

The data list graphed below was generated within MATLAB by first producing some pure data, via the MATLAB commands

i = [1:100];
y = i/6;

and then introducing noise by use of the MATLAB command rand(A).

For Octave, you need to say  rand(size(A))  instead of  rand(A).
Matlab Command rand(A)

The MATLAB command  rand(A)  produces a matrix the same size as A, with random entries. By default, the random numbers are uniformly distributed in the interval $\,(0,1)\,.$ Then,

2*(rand(A) - 0.5)

gives numbers uniformly distributed in $\,(-1,1)\,.$

The command  rand('normal')  can be used to switch to a normal distribution with mean $\,0\,$ and variance $\,1\,.$ The command  rand('uniform')  then switches back to the uniform distribution.

For Octave, the command  randn  is used to generate random numbers using the standard normal distribution (mean $\,0\,$ and variance $\,1\,$).

Then, the command  randn(size(A))  produces a matrix that is the same size as A, with random numbers generated using the standard normal distribution.

The list  noisey  graphed below was generated by the MATLAB command:

noisey = y + 2*(rand(y) - 0.5);

noisy data with a linear trend
Applying the turning point test to  noisey

The list  noisey  has $\,63\,$ turning points, so $\,T_{\text{act}} = 63\,.$

The list  noisey  has length $\,100\,,$ so $\,N = 100\,,$ and thus $\,\mu = \frac 23(100-2)\approx 65.33\,$ and $\,\sigma^2 = \frac{16(100)-29}{90} \approx 17.46\,.$

Then, $\,d = |65.33 - 63| = 2.33\,.$

Chebyshev’s Inequality yields:

$$ P\bigl( |T-\mu|\ge 2.33 \bigr) \le \frac{17.46}{(2.33)^2} \approx 3.2 $$

Since a probability can never exceed $\,1\,,$ a bound greater than $\,1\,$ gives no information.

Even though the data clearly illustrate a linear ‘trend’, there is no reason, based on this test, to reject the hypothesis of random behavior.

Here is what the turning point test is revealing in this situation: in moving through the list entry-by-entry, the numbers rise and fall in such a way that they could certainly have been produced by an entirely random process.

Indeed, since the slope of the ‘pure’ line is $\,\frac 16\,,$ and the noise is $\,\pm 1\,,$ it could take more than $\,6\,$ data entries before any increase due to the linear trend is observed.

Prediction of one data value into the future is unwarranted.

noisy data with a linear trend

However, if one were to move through the list by taking, say, every fourth entry, then the ‘local’ random behavior may be overshadowed by the ‘global’ linear trend. This idea is investigated in a second, slightly different, application of the turning point test.

Example 2, continued

Generate a new list from noisey, by taking every fourth piece of data. Call the new list noisey4. This is accomplished via the MATLAB command

noisey4 = noisey(1:4:100);

Whereas the time list corresponding to  noisey  has spacing $\,T = 1\,,$ the time list corresponding to  noisey4  has spacing $\,T = 4\,.$ The new list  noisey4  is graphed below, and has length $\,N = 25\,.$

the list noisey4

The list  noisey4  has $\,10\,$ turning points, so $\,T_{\text{act}} = 10\,.$

The list  noisey4  has length $\,25\,,$ so $\,N = 25\,,$ and thus $\,\mu = \frac 23(25-2)\approx 15.33\,$ and $\,\sigma^2 = \frac{16(25)-29}{90} \approx 4.12\,.$

Then, $\,d = |15.33 - 10| = 5.33\,.$

Chebyshev’s Inequality yields:

$$ P\bigl(|T-\mu|\ge 5.33\bigr) \le \frac{4.12}{(5.33)^2} \approx 0.15 $$

The hypothesis of random behavior is rejected for noisey4. Thus, predicting future values of the list noisey4, based on identified components, may be warranted. In other words, prediction of $\,4\,$ or more time units into the future for the original list  noisey  may be warranted.

In this example, the data clearly exhibit a linear trend. A method of ‘fitting’ data with a function of a specific form is discussed in Section 2.2.

In general, it is possible, with sufficient data, to continually produce sublists, xk, of a list x, by taking every $\,k^{\text{th}}\,$ entry from x. If the turning point test, when applied to xk, concludes that the hypothesis of random behavior is rejected, then prediction of $\,k\,$ or more units into the future for the original list  x  may be warranted.

MATLAB FUNCTION: Turning Point Test

The following MATLAB function is used by typing

y = tptest(x)

where:  x  is the INPUT row or column vector;  y  is the program OUTPUT.

The output matrix  y  consists of rows, where each row is of the form:

[length nofdup mu k TP P]

The variable  length  is the length of the list produced by taking every $\,k^{\text{th}}\,$ entry from x, and adjusting the resulting sublist to account for adjacent identical values. The number of duplicates found in the list is recorded in nofdup.

The variable  mu  is the expected number of turning points, if the behavior is truly random.

The variable  TP  is the actual number of turning points.

The variable  P  is $\,\frac{\sigma^2}{d^2}\,,$ from Chebyshev’s Inequality:

$$ P\bigl( |T-\mu|\ge d \bigr) \le \frac{\sigma^2}{d^2} $$

The test is repeatedly applied to sublists, until either the list is depleted, or until P $\lt 0.1\,.$

the function tptest

In Octave, the code looks like this:

the function tptest in Octave
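For readers working outside MATLAB and Octave, here is a Python sketch following the description of  tptest  above. The row format [length nofdup mu k TP P] is taken from the text, but the sublist handling and stopping rule are one reading of the description, and details of the original program may differ:

```python
from math import inf

def turning_points(y):
    """Count the turning points (peaks and troughs) in a list."""
    return sum(
        1 for i in range(1, len(y) - 1)
        if (y[i] > y[i-1] and y[i] > y[i+1]) or (y[i] < y[i-1] and y[i] < y[i+1])
    )

def dedup_adjacent(y):
    """Delete repeats among identical adjacent values; return (list, # deleted)."""
    out = [y[0]]
    for v in y[1:]:
        if v != out[-1]:
            out.append(v)
    return out, len(y) - len(out)

def tptest(x, pmax=0.1):
    """Turning point test applied to x and to its every-kth sublists.

    Each output row is [length, nofdup, mu, k, TP, P], mirroring the
    MATLAB output described above.  Stops when a sublist becomes too
    short or the Chebyshev bound P drops below pmax.
    """
    rows = []
    k = 1
    while True:
        sub, nofdup = dedup_adjacent(x[::k])
        n = len(sub)
        if n < 5:                           # sublist depleted
            break
        mu = 2.0 * (n - 2) / 3.0            # expected number of turning points
        var = (16.0 * n - 29.0) / 90.0      # variance under randomness
        tp = turning_points(sub)            # actual number of turning points
        d = abs(tp - mu)
        p = var / d**2 if d > 0 else inf    # Chebyshev bound
        rows.append([n, nofdup, mu, k, tp, p])
        if p < pmax:
            break
        k += 1
    return rows
```

For example, a strictly increasing list has no turning points at all, so the bound is small and the test stops at $\,k = 1\,.$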

The following diary of an actual MATLAB session shows the application of this Turning Point Test to the list  noisey  in Example 2.

an application of the Turning Point Test

Recall that, in Octave, you must say  rand(size(y))  instead of  rand(y) .

Economics Application: Taking Advantage of Turning Points

Introduction

If the hypothesis of random behavior is not rejected for given data, then it may be fruitless to seek deterministic components. However, the economics application discussed in this section shows how one can, even in this situation, often take advantage of the turning points (rises and falls) in stock market data.

The strategy is due to Eliason [Eli]. First, the underlying mathematical theory is presented. Then, an example illustrating the application of this theory to stock market trading is given.

Preliminary notation

Let $\,\boldsymbol{\rm x}(0)\,$ be a given finite list of real numbers.

Let $\,\boldsymbol{\rm x}(n)\,$ denote the list present at the completion of step $\,n\,$ ($\,n \ge 1\,$) in the Martingale Algorithm (below), where $\,\boldsymbol{\rm x}(0)\,$ is the initial input.

Let $\,A\,:\, \{1,2,3,\ldots\}\rightarrow \{W,L\}\,$ be a given function; for $\,n \ge 1\,,$ either $\,A(n) = W\,$ or $\,A(n) = L\,.$

Let $\,P\,$ and $\,B\,$ be functions,

$$ \begin{gather} P\,:\,\{0,1,2,3,\ldots\}\rightarrow \Bbb R\,,\cr B\,:\,\{0,1,2,3,\ldots\}\rightarrow \Bbb R\,; \end{gather} $$

the values assigned to $\,P(n)\,$ and $\,B(n)\,$ (for $\,n \ge 0\,$) are determined by the Martingale Algorithm.

Martingale Algorithm

Initialization, $\,n = 0$

Define $\,P(0) = 0\,.$ If $\,\boldsymbol{\rm x}(0)\,$ has two or more entries, then let $\,B(0)\,$ be the sum of the first and last entries in $\,\boldsymbol{\rm x}(0)\,.$ If $\,\boldsymbol{\rm x}(0)\,$ has only one entry, $\,x\,,$ then let $\,B(0) = x\,.$

STEP $\,n\,$; $\,n\ge 1$
If $\,A(n) = W\,,$ find $\,P(n)\,$ and $\,\boldsymbol{\rm x}(n)$

If $\,A(n) = W\,,$ then let $\,P(n) = P(n-1) + B(n-1)\,.$ In this case, adjust the list $\,\boldsymbol{\rm x}(n-1)\,$ to get the list $\,\boldsymbol{\rm x}(n)\,,$ as follows:

  • If $\,\boldsymbol{\rm x}(n-1)\,$ has more than two entries, then delete the first and last entries to obtain $\,\boldsymbol{\rm x}(n)\,.$
  • If $\,\boldsymbol{\rm x}(n-1)\,$ has only one or two entries, then delete these entries, and STOP the algorithm.
If $\,A(n) = L\,,$ find $\,P(n)\,$ and $\,\boldsymbol{\rm x}(n)$

If $\,A(n) = L\,,$ then let $\,P(n) = P(n-1) - B(n-1)\,.$ In this case, adjust the list $\,\boldsymbol{\rm x}(n-1)\,$ to get the list $\,\boldsymbol{\rm x}(n)\,,$ by appending $\,B(n-1)\,$ to the end of $\,\boldsymbol{\rm x}(n-1)\,.$

Find $\,B(n)$

If $\,\boldsymbol{\rm x}(n)\,$ has two or more entries, then let $\,B(n)\,$ be the sum of the first and last entries in $\,\boldsymbol{\rm x}(n)\,.$ If $\,\boldsymbol{\rm x}(n)\,$ has only one entry, $\,x\,,$ then let $\,B(n) = x\,.$ Go to the next value of $\,n\,.$
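The Martingale Algorithm translates directly into code. A Python sketch (illustrative; the document gives no program for this algorithm, so the function and variable names are this sketch's own):

```python
def martingale(x0, actions):
    """Run the Martingale Algorithm; return (n, P(n)) when a STOP occurs.

    x0      : the initial list x(0) of real numbers
    actions : a sequence of 'W' (win) / 'L' (loss), the values A(1), A(2), ...
    """
    x = list(x0)
    profit = 0                                       # P(0) = 0
    bet = x[0] + x[-1] if len(x) >= 2 else x[0]      # B(0)
    for n, a in enumerate(actions, start=1):
        if a == 'W':
            profit += bet                            # P(n) = P(n-1) + B(n-1)
            if len(x) <= 2:
                return n, profit                     # list depleted: STOP
            x = x[1:-1]                              # delete first and last entries
        else:
            profit -= bet                            # P(n) = P(n-1) - B(n-1)
            x.append(bet)                            # append the previous bet
        bet = x[0] + x[-1] if len(x) >= 2 else x[0]  # B(n)
    raise ValueError("action list exhausted before a STOP occurred")
```

Running it on the list of Example 1 reproduces the series number: for $\,\boldsymbol{\rm x}(0) = (1,2,3,5)\,$ and the action list $\,(L,W,W,L,W)\,,$ the algorithm stops at $\,n = 5\,$ with profit $\,11 = 1+2+3+5\,.$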

The letters $\,A\,,$ $\,P\,,$ and $\,B\,,$ $\,W\,$ and $\,L$

The variable names used in the Martingale Algorithm are suggestive of common roles that these variables play in applications of the algorithm.

The function $\,A\,$ is the ‘Action’ function; $\,W\,$ denotes a ‘Win’ and $\,L\,$ denotes a ‘Loss’.

The function $\,P\,$ is the ‘Profit’ function, and $\,B\,$ is the ‘Bet’ function. When a WIN occurs, the profit is increased by the previous bet; when a LOSS occurs, the profit is decreased by the previous bet.

Example 1: Applying the Martingale Algorithm

Let $\,\boldsymbol{\rm x} = (1,2,3,5)\,,$ and let $\,A(n) = (L, W, W, L, W,\ldots)\,.$ The table below summarizes the algorithm:

an application of the Martingale Algorithm

Observe that the algorithm STOPPED at $\,n = 5\,,$ and $\,P(5) = 11 = 1 + 2 + 3 +5\,$; that is, when the algorithm stopped, the value of $\,P\,$ equaled the sum of the entries in the initial list.

Example 2: Applying the Martingale Algorithm With a Different Action List

Suppose that a different action function is used:

$$ A(n) = (L,L,L,W,L,W,W,\ldots) $$

The algorithm is summarized in the table below:

another application of the Martingale Algorithm

This time, the algorithm stopped at $\,n = 10\,,$ but again $\,P(10) = 11\,.$ The next theorem shows that this behavior is no coincidence:

Theorem the series number associated with $\,\boldsymbol{\rm x}$

Let $\,\boldsymbol{\rm x}(0) = (x_1,\ldots,x_m)\,$ be a finite list of real numbers, with $\,m \ge 1\,.$ Let $\,M := x_1 + \cdots + x_m\,$ be the sum of the entries in $\,\boldsymbol{\rm x}(0)\,.$

If a STOP occurs in the Martingale Algorithm at step $\,N\,,$ where $\,\boldsymbol{\rm x}(0)\,$ has been used as the initial input, then $\,P(N) = M\,.$

The number $\,M\,$ is called the series number associated with the list $\,\boldsymbol{\rm x}(0)\,.$

Proof

Definition of $\,T(n)\,,$ the induction statement

The proof is by induction.

For $\,n \ge 1\,,$ let $\,T(n)\,$ be the statement:

‘If the algorithm stops at step $\,n\,,$ then $\,P(n) = M\,$’

A series of $\,W\,$’s is the quickest way to STOP

A series of $\,W\,$’s in the corresponding action list is the quickest way to stop the algorithm. First, action lists containing only $\,W\,$’s are considered. Then, the induction step is applied to action lists that contain at least one $\,L\,.$

$\boldsymbol{\rm x}(0)\,$ has one or two entries; $\,T(1)\,$ is true

If $\,\boldsymbol{\rm x}(0)\,$ has only one entry, $\,\boldsymbol{\rm x}(0) = (x)\,,$ then $\,T(1)\,$ is true (see below).

x(0) has only one entry

If $\,\boldsymbol{\rm x}(0)\,$ has two entries, $\,\boldsymbol{\rm x}(0) = (x,y)\,,$ then $\,T(1)\,$ is true (see below).

x(0) has two entries
$\boldsymbol{\rm x}(0)\,$ has three entries

If $\boldsymbol{\rm x}(0)\,$ has three entries, $\,\boldsymbol{\rm x}(0) = (x, y, z)\,,$ then it takes at least two steps to STOP the algorithm. In this case, $\,T(1)\,$ is vacuously true, and $\,T(2)\,$ is true (see below).

x(0) has three entries
$\boldsymbol{\rm x}(0)\,$ has $\,m\,$ entries, $\,m\,$ is odd

Now suppose $\boldsymbol{\rm x}(0)\,$ has $\,m\,$ entries, $\,\boldsymbol{\rm x}(0) = (x_1,\ldots,x_m)\,,$ where $\,m \ge 4\,.$

If $\,m\,$ is odd, then let $\,m = 2j-1\,$ for $\, j\ge 3\,.$ A series of $\,W\,$’s produces a STOP in $\,j\,$ steps, and $\,T(j)\,$ is true. For $\,1 \le k \lt j\,,$ $\,T(k)\,$ is vacuously true. Only the $\,W\,$’s are shown in the flow chart below.

x(0) has m entries, m odd
$\boldsymbol{\rm x}(0)\,$ has $\,m\,$ entries, $\,m\,$ is even

If $\,m\,$ is even, then let $\,m = 2j\,$ for $\,j \ge 2\,.$ A series of $\,W\,$’s produces a STOP in $\,j\,$ steps, and $\,T(j)\,$ is true. For $\,1 \le k \lt j\,,$ $\,T(k)\,$ is vacuously true.

The action list has at least one $\,L$

Suppose now that the action list has at least one $\,L\,.$ Suppose that $\,T(k)\,$ is true for all $\,k = 1,\ldots, N - 1\,,$ and consider the statement $\,T(N)\,$.

To motivate what follows, consider a typical flow chart that summarizes all possible actions on a list:

x(0) has m entries, m even, at least one L
Important observations

The following observations are important:

Suppose a STOP occurs at step $\,N\,,$ $\,\boldsymbol{\rm x}(N - 1)\,$ has only one entry

Now, the induction argument. Suppose that a STOP occurs in step $\,N\,,$ and suppose that $\,\boldsymbol{\rm x}(N - 1)\,$ has one entry.

The flow chart below is useful in summarizing the results:

a STOP occurs at step N, x(N-1) has only one entry
Notation: $\,P(n)\,$ and $\,\boldsymbol{\rm x}(n)\,,$ $\,P'(n)\,$ and $\,\boldsymbol{\rm x}'(n)$

For ease of notation in what follows, the profit functions and lists are denoted by $\,P(n)\,$ and $\,\boldsymbol{\rm x}(n)\,,$ respectively, along the lower series of $\,W\,$’s, and are denoted by $\,P'(n)\,$ and $\,\boldsymbol{\rm x}'(n)\,$ around the TURN and along the upper series of $\,W\,$’s.

Lower series of $\,W\,$’s
The TURN

Taking the first $\,L\,$ that occurs in step $\,N - k\,,$ one has:

$$ \boldsymbol{\rm x}'(N - k - 1) = (x_{2k-3},\ldots,x_1,x,x_2,\ldots,x_{2k-4})\,, $$

where $\,x_{2k-2}\,$ must equal $\,x_{2k-3}+x_{2k-4}\,,$ since the first and last entries of $\,\boldsymbol{\rm x}'(N-k-1)\,$ are summed and appended to $\,\boldsymbol{\rm x}'(N-k-1)\,$ to produce $\,\boldsymbol{\rm x}(N-k)\,.$

Since

$$ P(N-k) = P'(N-k-1) - (x_{2k-3} + x_{2k-4})\,, $$

it follows that:

$$ \begin{align} &P'(N-k-1)\cr\cr &\quad = P(N-k) + (x_{2k-3} + x_{2k-4})\cr &\quad = P(N) - \bigl(x + x_1 + x_2 + \cdots + x_{2k-3} + \overbrace{x_{2k-2}}^{=x_{2k-3}+x_{2k-4}} \bigr)\cr &\qquad + (x_{2k-3} + x_{2k-4})\cr\cr &\quad = P(N) - (x + x_1 + x_2 + \cdots + x_{2k-3}) \end{align} $$
THIS SECTION IS IN PROGRESS