
Dr. Carol J.V. Fisher's Homepage


MEASURES OF SPREAD


Mean and median are measures of central tendency;
that is, they each provide a single number that attempts to describe the center of a collection of data.
The following data sets all have mean equal to 1:

1, 1, 1, 1, 1
-1, 0, 1, 2, 3
-1, -1, 1, 3, 3

These three data sets are pictured below (as pebbles of equal weight on a number line).
Notice that each has its balancing point (mean) at 1, but the data is spread about this mean in very different ways:





Clearly, the mean does not capture any information about the spread or variability of data about the mean.

This exercise explores three common measures of spread: range, variance, and standard deviation.

DEFINITION: range
Let $x_{\max}$ and $x_{\min}$ denote the greatest and least numbers in a data set, respectively.
The range of the data set is the difference $x_{\max} - x_{\min}$.

Thus, the range is the difference between the greatest and least numbers in the data set.
Since $x_{\max}$ is always greater than or equal to $x_{\min}$, it follows that the range is always greater than or equal to zero.

EXAMPLES
The range of the data set 1, 1, 1, 1, 1 is 0.
The range of the data set -1, 0, 1, 2, 3 is 3 - (-1) = 4.
The range of the data set -1, -1, 1, 3, 3 is also 4.
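
The range computation can be sketched in a few lines of Python. This is only an illustration: the function name `data_range` is a hypothetical helper, not something the lesson defines.

```python
def data_range(values):
    """Return the range: the greatest value minus the least value."""
    return max(values) - min(values)

# The three example data sets from the text.
for data in ([1, 1, 1, 1, 1], [-1, 0, 1, 2, 3], [-1, -1, 1, 3, 3]):
    print(data, "has range", data_range(data))
```

The last two data sets get the same range even though their values are arranged differently, because only the two extreme values enter the calculation.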

Since computation of the range uses only two members from a data set,
it is necessarily incomplete in the information that it provides.
However, the range is extremely easy to compute.

Another reasonable way to measure the spread takes into account how far each data element is from the mean:

DEFINITION: deviation from the mean
Suppose a data set has mean $\bar{x}$, and let $x_i$ denote an element in this data set.
The deviation of $x_i$ from the mean is given by the formula $x_i - \bar{x}$.

From this definition, it is apparent that if a data element is greater than the mean, then its deviation from the mean is positive.
If a data element is less than the mean, then its deviation from the mean is negative.

Merely summing the deviations from the mean is useless as a measure of spread
because the sum of all the deviations is always equal to zero,
as the following calculation shows:

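$$
\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x}) &= (x_1 - \bar{x}) + (x_2 - \bar{x}) + \cdots + (x_n - \bar{x}) \\
&= (x_1 + x_2 + \cdots + x_n) - n\bar{x} \\
&= n \cdot \frac{x_1 + x_2 + \cdots + x_n}{n} - n\bar{x} \\
&= n\bar{x} - n\bar{x} = 0
\end{aligned}
$$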
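
As a quick numerical check of this fact, the sketch below computes the deviations from the mean for a data set and adds them up. The helper name `deviations` is hypothetical, and the code assumes integer data so that `Fraction` can keep the arithmetic exact.

```python
from fractions import Fraction

def deviations(values):
    """Return the deviation of each value from the mean (assumes integer data)."""
    mean = Fraction(sum(values), len(values))  # exact mean, no rounding
    return [x - mean for x in values]

for data in ([-1, 0, 1, 2, 3], [-1, -1, 1, 3, 3], [2, 3, 10]):
    devs = deviations(data)
    print(data, "deviations sum to", sum(devs))
```

Positive and negative deviations cancel exactly, whatever the data set, which is why the plain sum carries no information about spread.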

Also, we don't really care whether data elements lie above or below the mean;
we're more interested simply in the distances from the mean.

A reasonable idea is to sum the absolute values of the deviations from the mean, $|x_i - \bar{x}|$.
However, the absolute value function is not particularly easy to work with.

Instead, we get a good measure of spread by summing the squares of the deviations from the mean, $(x_i - \bar{x})^2$.
There's just one little problem to resolve first.

DEFINITIONS: population, sample
The entire collection of individuals or objects about which information is desired
is called the population.
A nonempty proper subset (choosing some, but not all) of the population is called a sample.

The formulas for the population mean and the sample mean are identical:
add up the numbers, and divide by how many there are.

The population mean is denoted by $\mu$ and a sample mean is denoted by $\bar{x}$.

In general, population statistics are reported using Greek letters, like $\mu$ (mu) and $\sigma$ (sigma).

However, sample statistics are reported using Roman letters, like $\bar{x}$ and $s$.

The common formulas for measures of spread are slightly different,
depending upon whether you're looking at the entire population, or just a sample from this population, as shown next:

DEFINITIONS: population variance, population standard deviation
Suppose a population has $N$ data values with mean $\mu$.
The variance of the population, denoted by $\sigma^2$, is given by the formula

$$\sigma^2 = \frac{\sum (x-\mu)^2}{N},$$

where the sum is over all data values $x$ in the population.

The standard deviation of the population, denoted by $\sigma$,
is the square root of the variance:

$$\sigma = \sqrt{\frac{\sum (x-\mu)^2}{N}}.$$

To find the variance of a population,
you sum the squared deviations from the mean,
and divide by the number of data values.
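
The recipe above can be sketched directly in Python. The function names `population_variance` and `population_std` are hypothetical, and each example data set is treated as an entire population for the sake of illustration.

```python
import math

def population_variance(values):
    """Sum the squared deviations from the mean, then divide by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """The population standard deviation is the square root of the variance."""
    return math.sqrt(population_variance(values))

for data in ([1, 1, 1, 1, 1], [-1, 0, 1, 2, 3], [-1, -1, 1, 3, 3]):
    print(data, population_variance(data), population_std(data))
```

The second and third data sets share a range of 4, but their variances are 2 and 16/5 respectively, so the variance distinguishes them where the range does not.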

DEFINITIONS: sample variance, sample standard deviation
Suppose a sample has $n$ data values with sample mean $\bar{x}$.
The sample variance, denoted by $s^2$, is given by the formula

$$s^2 = \frac{\sum (x-\bar{x})^2}{n-1},$$

where the sum is over all data values $x$ in the sample.

The sample standard deviation, denoted by $s$,
is the square root of the sample variance:

$$s = \sqrt{\frac{\sum (x-\bar{x})^2}{n-1}}.$$

Observe the difference between population variance and sample variance:
for the sample variance, you divide by one less than the number of data values,
instead of the actual number of data values.
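
To see the two divisors side by side, here is a small sketch using Python's standard `statistics` module, where `pvariance` and `pstdev` divide by the number of data values while `variance` and `stdev` divide by one less. Treating the same five numbers first as a population and then as a sample is an assumption made purely for comparison.

```python
import statistics

data = [-1, 0, 1, 2, 3]

# Treat the data as an entire population: divide by N.
pop_var = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Treat the same data as a sample: divide by n - 1.
samp_var = statistics.variance(data)
samp_sd = statistics.stdev(data)

print("population:", pop_var, pop_sd)
print("sample:    ", samp_var, samp_sd)
```

The squared deviations sum to 10 in both cases, so the population variance is 10/5 = 2 and the sample variance is 10/4 = 2.5: the sample formula always gives the larger result.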

Here's one way to understand why:
if you randomly choose a sample from a population,
what's the likelihood that you'll choose both the greatest and the least values,
to represent the true variability in the data set?
Not very high!
A sample tends to underestimate the true variability in a population.
To compensate, we divide by $n-1$ instead of $n$;
dividing by a smaller number makes the result a bit larger.
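
The underestimate can be checked exactly on a tiny example. The sketch below lists every possible sample of size 3 from a five-value population and averages the two candidate formulas. It rests on one loud assumption: values are drawn with replacement (a common model for sampling from a large population), which differs from the "proper subset" definition of a sample given earlier. The names `sum_sq_dev`, `avg_div_n`, and `avg_div_n_minus_1` are hypothetical.

```python
from fractions import Fraction
from itertools import product

population = [-1, 0, 1, 2, 3]
n = 3  # sample size

def sum_sq_dev(values):
    """Sum of squared deviations from the mean of `values` (exact arithmetic)."""
    mean = Fraction(sum(values), len(values))
    return sum((x - mean) ** 2 for x in values)

# True population variance: divide by N.
pop_var = sum_sq_dev(population) / len(population)

# ASSUMPTION: every ordered sample of size n, drawn WITH replacement.
samples = list(product(population, repeat=n))

# Average of each formula over all possible samples.
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print("population variance:       ", pop_var)
print("average when dividing by n:", avg_div_n)
print("average when dividing by n-1:", avg_div_n_minus_1)
```

Under this assumption, dividing by $n$ averages out to 4/3, below the population variance of 2, while dividing by $n-1$ averages out to exactly 2.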

Standard deviation has the advantage of having the same units as the data values.
Standard deviation may be informally interpreted as the size of a "typical" deviation from the mean.
