Unit 1 Describing Data

2.4 Measures of Variation

Range, Variance, Standard Deviation

In this course you will encounter 2 different uses of the Greek letter sigma.

Upper case sigma \( \begin{equation} \sum=\end{equation}\) sum

Lower case sigma \(\begin{equation} \sigma= \end{equation}\) standard deviation

Range:
Difference between the minimum and maximum values in the data x max - x min

VERY sensitive to extreme values. Can increase dramatically with the inclusion of one or more outliers.

Standard Deviation:
A measure of variation of all values from the mean.

sample standard deviation is a statistic

\( \begin{equation} s=\sqrt{\frac{\sum(x-\bar{x})^{2}}{(n-1)}} \end{equation}\)

\(\begin{equation} s=\sqrt{\frac{\left(n \sum x^{2}-\left(\sum x\right)^{2}\right)}{n(n-1)}} \end{equation}\)

population standard deviation is a parameter

\(\begin{equation} \sigma=\sqrt{\frac{\sum(x-\mu)^{2}}{N}} \end{equation}\)

Variance:
A measure of the variation equal to the square of the standard deviation.

The sample variance, \(s^2\), is an unbiased estimator of the population variance, \(\sigma^2\).

Range Rule Of Thumb:

  minimum usual value = \(\begin{equation} \mu-2 \sigma \end{equation}\)

  maximum usual value = \(\begin{equation} \mu+2 \sigma \end{equation}\)

Note: if a sample is NOT REPRESENTATIVE of the population, you can’t tell much about the population center or variation based on the sample.


Calculating Standard Deviation

  1. Calculate the standard deviation and variance of the following data set using the formulas. Check your answer with technology.

    Data Set: 3   5   6   7   9  

    \(\begin{equation} \bar{x}_{A}=6 \quad n_{A}=5 \quad s_{A}=2.24 \end{equation}\)

    \(s_{A}=\sqrt{\frac{\sum(x-\overline{x})^{2}}{n-1}}=\sqrt{\frac{(3-6)^{2}+(5-6)^{2}+(6-6)^{2}+(7-6)^{2}+(9-6)^{2}}{5-1}}=\sqrt{\frac{9+1+0+1+9}{4}}=\sqrt{\frac{20}{4}}=\sqrt{5} \approx 2.24\)

  2. Suppose that Samira records the ages of customers at the Snack Bar on a college campus from 12-2pm tomorrow, while Mary records the ages of customers at the McDonald’s near the highway at the same time. Who will have the larger SD of their ages – Samira or Mary? Explain why.

    Mary will have the larger standard deviation because there is more likely to be a greater variation of ages for the customers at the McDonalds on the highway compared to the ages of the customers at the college snack bar.

  3. Compare and contrast the following data sets. Given the standard deviation of Graph 1 is approximately 4.8, estimate the standard deviations of graphs 2-4. Record your observations. A number line segment from 0 to 11, counting by 1. There are five dots above the 0, one dot above the 5 and 5 dots above the 10.

    Graph 1: Mean=5, Median=5

    Standard Deviation=4.77

    A number line segment from 0 to 11, counting by 1. There are two dots above the 0, seven dots above the 5 and 2 dots above the 10.

    Graph 2: Mean=5, Median=5

    Standard Deviation=3.02

    A number line segment from 0 to 11, counting by 1. There is one dot above the 0, nine dots above the 5 and 1 dot above the 10.

    Graph 3: Mean=5, Median=5

    Standard Deviation=2.13

    A number line segment from 0 to 11, counting by 1. There are eleven dots above the 5.

    Graph 4: Mean=5, Median=5

    Standard Deviation=0

  4. The graphs below show the results for the medal race of the mens' 400 meters at the 2012 and 2016 games. What do you notice about these times? How are the times from the two years similar? How are they different? Which year do you think had a more "exciting" race? Explain.
  5. What was the mean time in each of the 2 years? On the dotplots shown, sketch vertical lines which you feel best represent the mean for each year.

    A number line titled Mens 400 Meter Final Results that goes from 43.0 to 46.0, counting by 0.5. Two sets of data are plotted on the number line. The first set represents the results for the medal race in London in 2012. The recorded times are 43.9, 44.5, 44.5, 44.8, 44.8, 44.8, 44.9, and 45.2. The second set of data is the results for the medal race in Rio in 2016. The recorded times are 43.0, 43.8, 43.9, 44.1, 44.3, 44.4, 44.5, and 44.7.

    The picture is identical to the previous one with a red vertical line drawn at the mean. The mean for the London Olympics is 44.7 and the mean for the Rio Olympics is 44.0.

    The standard deviations from each year are: 2016: 0.471 seconds 2012: 0.349 seconds. Explain to somebody who has never seen standard deviations before how these numbers allow us to compare the results from 2012 and 2016.

    2016's higher standard deviation shows the 8 runners' times are individually farther from the mean (from "average").



    Using Standard Deviation to Identify Unusual Values in the Data Set

  6. Fuel Economy: A sample of midsized cars have the given fuel efficiencies (Combined City/Highway MPG). Use the sample data below to answer the questions.

    (Data from fueleconomy.gov)

    23   20   58   32   12   26   27   20   40   24
    1. Find the max and min USUAL values. Sketch the number line and shade the unusual regions.

        Using Technology   Mean, \(\bar{x}=28.2\)   Standard Deviation, \(s=12.9\)

        Minimum USUAL value = \(\bar{x}-2s=28.2-2(12.9)=2.4\)

        Maximum USUAL value = \(\bar{x}+2s=28.2+2(12.9)=54\)

      A number line that goes from 0 to 70, counting by 10. There is a point on the number line at 28.2 and Mean=28.2 is written above it. There is an open circle at 2 and the number line is shaded from that open circle to the left infinitely. There is an open circle on 54, and the number line is shaded from that open circle to the right infinitely.

    2. Is the value of 58 MPG unusual? Yes
    3. Is the value of 28 MPG unusual? No
  7. Load the data set from StatCrunch Agricultural Land Percentages. The amounts of agricultural land (% of land area) for a sample of countries are listed.

    (2009 Data from World Bank)

    Link to accessible version of the Agricultural Land Percentages Accessible Agricultural Land Percentages

    1. Find the max and min USUAL values. Sketch the number line and shade the unusual regions:

        Using Technology  Mean, \(\overline{x}=38.14\)   Standard Deviation, \(s=22.81\)

        Minimum usual value = \(38.14 - 2(22.81) = -7.48\)

        Maximum usual value = \(38.14 + 2(22.81) = 83.76\)

      A number line that goes from -100 to 100, counting by 50. There is a point plotted at 38.14 and Mean=38.14 is written above it. There is an open circle at -7.48 and the number line is shaded from that number to the left infinitely. There is an open circle on 83.76 and the number line is shaded from that number to the right infintely.

    2. Are there any unusual values in the data? Yes, 84.6 Uruguay
  8. Calculating Measures of Variations in Binned Data

    Mean:  \(\bar{x}=\frac{\sum x_{i} f_{i}}{\sum f}\)

    Standard Deviation  \(s=\sqrt{\frac{\sum\left(x_{i}-\overline{x}\right)^{2} f_{i}}{n-1}}\)

  9. Estimate the mean and standard deviation of the following frequency distribution using the formulas above. Confirm your answers using technology.
    Class Frequency
    0 - 8 6
    9 - 17 10
    18 - 26 5
    27 - 35 3
    36 - 44 1

    \(\bar{x}=\frac{(4 \times 6)+(13 \times 10)+(22 \times 5)+(31 \times 3)+(40 \times 1)}{25} \approx 15.88\)

    \(s=\sqrt{\frac{\left((4-15.88)^{2} \times 6\right)+\left((13-15.88)^{2} \times 10\right)+\left((22-15.88)^{2} \times 5\right)+\left((31-15.88)^{2} \times 3\right)+\left((40-15.88)^{2} \times 1\right)}{25-1}}\)

    \(s \approx \sqrt{\frac{2384.64}{24}} \approx \sqrt{99.36} \approx 9.9679\)

  10. Estimate the mean and standard deviation for the age of a U.S. President at the time of inauguration. (This is ALL the U.S. presidents, so this is a population. You need to use the unadjusted variance and standard deviation.)
    Age Frequency
    40 - 44 2
    45 - 49 7
    50 - 54 13
    55 - 59 12
    60 - 64 7
    65 - 69 3
    70 - 74 1

    \(\mu = 55.1\) years

    \(\sigma = 6.7\) years.

    Note, the sample standard deviation would be \(s = 6.8\) years. You must calculate the unadjusted standard deviation for a population dataset.

  11. Using technology, estimate the mean and standard deviation for one student’s grade information from their first year of college.
    Grade Credit Hours
    A 9
    B 6
    C 14
    D 0
    F 3

    \(\overline{x} = 2.6\)

    \(s = 1.2\)

    A frequency graph with grades on the x-axis and frequency on the y-axis.  The grades are labeled from left to right A, B, C, D, and F.  The frequencies are labeled on the top of each bar above the grades.  A has 9, B has 6, C has 14, D has 0 and F has 3.

  12. The Empirical (68-95-99.7) Rule for Data With a Bell Shaped Distribution

    ABOUT 68% of all the values fall within 1 standard deviation of the mean.  \(\mu \pm \sigma \)

    ABOUT 95% of all the values fall within 2 standard deviations of the mean.  (the other 5% are unusual) \(\mu \pm 2 \sigma\)

    ABOUT 99.7% of all the values fall within 3 standard deviations of the mean.   \(\mu \pm 3 \sigma\)

    A picture of a bell shaped curve. The mean is at the highest point of the curve, the top of the bell. 34% of the area under the curve is one standard deviation to the left of the mean and one standard deviation to the right of the mean. One standard deviation to the left of the mean is labeled mu minus sigma. One standard deviation to the right of the mean is labeled mu plus sigma. 13.5% of the area under the curve is between one standard deviation and two standard deviations to the left of the mean and to the right of the mean. Two standard deviations are labeled mu minus two times sigma and mu plus two times sigma. 2.35% of the area under the curve is between two standard deviations and three standard deviations to the left of the mean and to the right of the mean. Three standard deviations from the mean are labeled mu minus three times sigma and mu plus three times sigma. 0.15% of the area under the curve is more than three standard deviations from the mean. Usual data is between two standard deviations to the left of the mean and two standard deviations to the right of the mean. Unusual data is more than two standard deviations from the mean.

    As an example, assume \(\mu=10 \text { and } \sigma=2\)

    Maximum usual value \(=\mu+2 \sigma=10+2(2)=14\)

    Minimum usual value \(=\mu-2 \sigma=10-2(2)=6\)

  13. IQ SCORES: IQ scores have a bell-shaped distribution with a mean of 100 (\(\mu = 100\)) and a standard deviation of 15 (\(\sigma = 15\)). What percentage of IQ scores are:
    A picture of a bell shaped curve. The mean is at the highest point of the curve, the top of the bell.  34% of the area under the curve is one standard deviation to the left of the mean and one standard deviation to the right of the mean.  13.5% of the area under the curve is between one standard deviation and two standard deviations to the left of the mean and to the right of the mean.  2.35% of the area under the curve is between two standard deviations and three standard deviations to the left of the mean and to the right of the mean.  0.15% of the area under the curve is more than three standard deviations from the mean.

    A picture of a bell shaped curve. The mean is at the highest point of the curve, the top of the bell. The mean is 100. 34% of the area under the curve is one standard deviation to the left of the mean and one standard deviation to the right of the mean. One standard deviation to the left of the mean is 85. One standard deviation to the right of the mean is 115. 13.5% of the area under the curve is between one standard deviation and two standard deviations to the left of the mean and to the right of the mean. Two standard deviations to the left of the mean is 70 and two standard deviations to the right of the mean is 130. 2.35% of the area under the curve is between two standard deviations and three standard deviations to the left of the mean and to the right of the mean. Three standard deviations to the left of the mean is 55 and three standard deviations to the right of the mean is 145. 0.15% of the area under the curve is more than three standard deviations from the mean.

    1. Between 70 and 130? \(13.5+34+34+13.5=95 \%\)
    2. Below 115? \(0.15+2.35+13.5+34+34=84 \%\)
    3. Above 70? \(13.5+34+34+13.5+2.35+0.15=97.5 \%\)
    4. Between 85 and 130? \(34+34+13.5=81.5 \%\)
    5. Usual Values are between        and       . \(70 \text { and } 130\)

      Mimimum usual value: \(\mu-2 \sigma=100-2(15)=70\)

      Maximum usual value: \(\mu+2 \sigma=100+2(15)=130\)

  14. Sketchy Milk Company: A dairy fills gallons of milk with a mean of 125.0 fluid ounces and a standard deviation of 0.3 fluid ounces. The volumes have a bell-shaped distribution. Using the empirical rule, what is the approximate percentage of milk amounts: A picture of a bell shaped curve. The mean is at the highest point of the curve, the top of the bell.  34% of the area under the curve is one standard deviation to the left of the mean and one standard deviation to the right of the mean.  13.5% of the area under the curve is between one standard deviation and two standard deviations to the left of the mean and to the right of the mean.  2.35% of the area under the curve is between two standard deviations and three standard deviations to the left of the mean and to the right of the mean.  0.15% of the area under the curve is more than three standard deviations from the mean.

    A picture of a bell shaped curve. The mean is at the highest point of the curve, the top of the bell. The mean is 125.0. 34% of the area under the curve is one standard deviation to the left of the mean and one standard deviation to the right of the mean. One standard deviation to the left of the mean is 124.7. One standard deviation to the right of the mean is 125.3. 13.5% of the area under the curve is between one standard deviation and two standard deviations to the left of the mean and to the right of the mean. Two standard deviations to the left of the mean is 124.4 and two standard deviations to the right of the mean is 125.6. 2.35% of the area under the curve is between two standard deviations and three standard deviations to the left of the mean and to the right of the mean. Three standard deviations to the left of the mean is 124.1 and three standard deviations to the right of the mean is 125.9. 0.15% of the area under the curve is more than three standard deviations from the mean.

    1. Between 124.4 fluid ounces and 125.6 fluid ounces? \(13.5+34+34+13.5=95 \%\)
    2. Below 124.7 fluid ounces? \(0.15+2.35+13.5=16 \%\)
    3. Above 124.7 fluid ounces? \(34+34+13.5+2.35+0.15=84 \%\)
    4. Between 125.0 fluid ounces and 125.6 fluid ounces? \(34+13.5=47.5 \%\)
    5. Usual values are between        and       ? \(124.4 \text { and } 125.6\)

      Minimum usual value: \(\mu-2 \sigma=125.0-2(0.3)=124.4\) fluid ounces

      Maximum usual value: \(\mu+2 \sigma=125.0+2(0.3)=125.6\) fluid ounces