Statistics is the discipline devoted to organizing, summarizing, and drawing conclusions from data.
Given a collection of data, it is often convenient to come up with a single number that
somehow describes its center;
a number that in some way is representative of the entire collection.
Such a number is called a measure of central tendency.
The two most popular measures of central tendency are the mean and the median.
Another measure sometimes used to describe a ‘typical’ data value is the mode.
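The mean (or average) is already familiar to you: add up the numbers, and divide by how many there are.
The mean of the $\,n\,$ data values
$\,x_1, x_2, x_3, \ldots, x_n\,$
is denoted by
$\,\bar{x}\,$
and read as ‘$\,x\,$ bar’, so that
$\displaystyle\,\bar{x} = \frac{\sum_{i=1}^n\ x_i}{n}\,$.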
Similarly, the mean of the $\,n\,$ data values
$\,y_1, y_2, y_3, \ldots, y_n\,$
would be denoted by
$\,\bar{y}\,$
and read as ‘$\,y\,$ bar’.
Since dividing by $\,n\,$ is the same as multiplying by $\,\frac{1}{n}\,$,
the notation
$\displaystyle\,\frac{\sum_{i=1}^n\ x_i}{n}\,$
is more commonly written as
$\,\frac 1n\sum_{i=1}^n\ x_i\,$
or
$\displaystyle\,\frac 1n\sum_{i=1}^n\ x_i\,$ .
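The formula $\,\frac 1n\sum_{i=1}^n\ x_i\,$ translates almost word for word into code. The following is a minimal Python sketch, not a definitive implementation; the function name `mean` and the variable names are chosen here for illustration, and the sample values are the ones listed later in this section.

```python
def mean(values):
    """Return the arithmetic mean: (1/n) times the sum of the values."""
    n = len(values)
    if n == 0:
        # The mean of an empty data set is undefined.
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / n


data = [2, -1, 2, 3, 0, 25, -1, 2]   # eight data values; their sum is 32
x_bar = mean(data)
print(x_bar)  # 4.0
```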
Notice that in the data set $\,2,\ -1,\ 2,\ 3,\ 0,\ 25,\ -1,\ 2\,$ (which has mean $\,4\,$),
the number $\,25\,$ seems to be unusually large compared to the other numbers.
An outlier is an unusually large or small observation in a data set.
A drawback of the mean is that its value can be greatly affected by the presence of even a single outlier.
If the outlier $\,25\,$ is changed to $\,250\,$,
then the new mean would be $\,32.125\,$,
which does not seem at all representative of a ‘typical’ number
in this data set!
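The effect of a single outlier on the mean can be checked directly. The sketch below assumes that the data set in question is $\,2,\ -1,\ 2,\ 3,\ 0,\ 25,\ -1,\ 2\,$ (the values listed later in this section, which are consistent with the stated new mean of $\,32.125\,$); the names `original` and `modified` are illustrative only.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Assumed data set: the eight values listed later in this section.
original = [2, -1, 2, 3, 0, 25, -1, 2]

# Replace the outlier 25 by 250 and leave every other value unchanged.
modified = [250 if v == 25 else v for v in original]

print(mean(original))  # 4.0
print(mean(modified))  # 32.125
```

Changing one value out of eight moves the mean from $\,4\,$ to $\,32.125\,$, even though seven of the eight values are no larger than $\,3\,$.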
The median, on the other hand, is quite insensitive to outliers.
Just as the median strip
of a highway goes right down the middle,
the median of a set of numbers goes right through the
middle of the ordered list.
Of course, only lists with an odd number of values have a true middle:
the middle number in the ordered list
$\,5,\ 7,\ 20\,$
is $\,7\,$.
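The definition of the median also covers the case where there is an even number of data values:
for an odd number of values, the median is the middle number in the ordered list;
for an even number of values, it is the average of the two middle numbers in the ordered list.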
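Both the odd and the even case can be sketched in a few lines of Python. This sketch assumes the usual convention that, for an even number of data values, the median is the average of the two middle values of the ordered list; the function name `median` is chosen here for illustration.

```python
def median(values):
    """Return the median of a non-empty collection of numbers."""
    ordered = sorted(values)          # the median is defined on the ordered list
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: there is a true middle.
        return ordered[mid]
    # Even number of values (assumed convention): average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 7, 20]))                       # 7
print(median([2, -1, 2, 3, 0, 25, -1, 2]))      # 2.0
print(median([2, -1, 2, 3, 0, 250, -1, 2]))     # 2.0
```

The last two calls illustrate the insensitivity to outliers: replacing $\,25\,$ by $\,250\,$ leaves the two middle values of the ordered list, and hence the median, unchanged.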
Finally, a mode is a value that occurs ‘most often’ in a data set.
Whereas a data set has exactly one mean and exactly one median,
it can have one or more modes.
For example, consider these data values:
$\,2,\ -1,\ 2,\ 3,\ 0,\ 25,\ -1,\ 2\,$
Re-group them according to their frequency:
$$
\begin{align}
\cssId{s102}{2,\ \ 2,\ \ 2,}\ \ \ \ &\cssId{s103}{\text{three occurrences of the number 2}}\\
\cssId{s104}{-1,\ \ -1,}\ \ \ \ &\cssId{s105}{\text{two occurrences of the number $-1$}}\\
\cssId{s106}{0,}\ \ \ \ &\cssId{s107}{\text{one occurrence of the number 0}}\\
\cssId{s108}{3,}\ \ \ \ &\cssId{s109}{\text{one occurrence of the number 3}}\\
\cssId{s110}{25}\ \ \ \ &\cssId{s111}{\text{one occurrence of the number 25}}
\end{align}
$$
The mode of this data set is $\,2\,$,
since this data value occurs three times,
and this is the most occurrences of any data value.
Every member of the data set $\,3,\ 7,\ 9\,$ is a mode,
since each value occurs only once.
The data set $\,3,\ 3,\ 7,\ 7,\ 9\,$ has two modes:
$\,3\,$ and $\,7\,$.
Each of these numbers occurs twice,
and no number occurs more than two times.
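The three examples above can be reproduced by counting how often each value occurs and keeping every value that attains the largest count. The following is a minimal Python sketch using the standard library's `collections.Counter`; the function name `modes` is chosen here for illustration.

```python
from collections import Counter


def modes(values):
    """Return a sorted list of all values that occur most often."""
    if not values:
        return []
    counts = Counter(values)            # maps each value to its number of occurrences
    highest = max(counts.values())      # the largest number of occurrences
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([2, -1, 2, 3, 0, 25, -1, 2]))  # [2]
print(modes([3, 7, 9]))                    # [3, 7, 9]
print(modes([3, 3, 7, 7, 9]))              # [3, 7]
```

Returning a list rather than a single number reflects the fact that a data set can have more than one mode.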