Chapter 5: Describing Distributions Numerically
I'll concentrate below on instructions for using the TI-84 and Data Desk to compute the mean, median, standard deviation and IQR of a data set and to draw boxplots.
More Property Taxes
For our first example, let's work with the house data set previously encountered in the Chapter 2 and 4 Resources:
| house | size | assess | lot | taxes | stories |
| 20911 | 1561 | 304.0 | 0.20 | 2604 | 1 |
| 20912 | 1038 | 297.6 | 0.20 | 280 | 1 |
| 20918 | 1224 | 289.5 | 0.17 | 2353 | 1 |
| 20921 | 1232 | 292.8 | 0.17 | 756 | 1 |
| 20924 | 1995 | 314.6 | 0.17 | 2620 | 2 |
| 20927 | 1714 | 322.7 | 0.18 | 2632 | 1 |
| 20930 | 1832 | 336.1 | 0.18 | 2779 | 2 |
| 21003 | 1095 | 279.0 | 0.18 | 2321 | 1 |
| 21006 | 2011 | 319.5 | 0.18 | 2663 | 2 |
| 21015 | 1366 | 289.3 | 0.18 | 2415 | 1 |
| 21018 | 1292 | 301.4 | 0.18 | 2477 | 1 |
| 21023 | 1458 | 314.3 | 0.18 | 1386 | 1 |
| 21028 | 2031 | 320.9 | 0.18 | 2676 | 2 |
| 21105 | 1366 | 304.0 | 0.18 | 2473 | 1 |
Let's compute the summary statistics for the 2007 assessed value variable (assess), which is recorded in thousands of dollars.
Enter the assessed value data into a list, say L1, then press 2ND and QUIT (above the MODE key), as you did in Chapter 4 to create a histogram. Now press STAT, move the cursor to the right to highlight CALC and notice that 1-Var Stats is already highlighted:

Press ENTER and type L1 (2ND and then the 1 key). Your screen should look like this:

Now press ENTER. The calculator will display many different values:

These values are:
- x̄ (the mean)
- ∑x (the sum of the values in L1; you can ignore this for now)
- ∑x² (the sum of the squares of the values in L1; you can ignore this as well)
- Sx (the standard deviation of the values in L1; we simply call it s)
- σx (the population standard deviation; we'll talk about this in Chapter 6, but we will NEVER use the TI-84 to calculate this, so always ignore this part of the 1-Var Stats output)
- n (the number of data values in L1)
So far we can see that the mean of the 2007 assessed property values for homes in my neighborhood is $306,121, with a standard deviation of $15,898. You could compute these statistics "by hand" but it would take a ridiculously long time: ALWAYS use the calculator or computer to compute summary statistics, especially the standard deviation.
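If you'd like to see where the calculator's numbers come from, here is a minimal sketch in Python (Python isn't one of our course tools; it's just an assumption for illustration) that reproduces n, the mean, Sx and σx for the assess values in the table above. The function name one_var_stats is hypothetical. The only difference between Sx and σx is the divisor: n - 1 for Sx, n for σx.

```python
import math

# 2007 assessed values (the assess column), in thousands of dollars
assess = [304.0, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279.0, 319.5, 289.3, 301.4, 314.3, 320.9, 304.0]


def one_var_stats(data):
    """Return n, the mean, Sx (sample SD) and sigma_x (population SD)."""
    n = len(data)
    mean = sum(data) / n
    # Sum of squared deviations from the mean
    ss = sum((x - mean) ** 2 for x in data)
    sx = math.sqrt(ss / (n - 1))   # Sx: divide by n - 1; this is the s we report
    sigma_x = math.sqrt(ss / n)    # sigma_x: divide by n; ignore it for now
    return n, mean, sx, sigma_x


n, mean, sx, sigma_x = one_var_stats(assess)
print(f"n = {n}, mean = {mean:.3f}, Sx = {sx:.3f}, sigma_x = {sigma_x:.3f}")
```

Running it should give a mean of about 306.121 and Sx of about 15.898 (both in thousands of dollars), agreeing with the 1-Var Stats output.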
But there's more! Use the down cursor to scroll down the screen as far as you can. You should see:

We can now read off
- minX (the minimum data value in L1)
- Q1 (the first quartile of the values in L1)
- med (the median of the values in L1)
- Q3 (the third quartile of the values in L1)
- maxX (the maximum of the values in L1)
We call these five quantities the 5-number summary for the data set. The median 2007 assessed value of a home in this neighborhood is $304,000. The IQR is given by IQR = Q3-Q1 = 319.5-292.8 = 26.7 thousand dollars, or $26,700. Note that the TI-84 doesn't report the IQR directly, but it's a simple subtraction once we know Q1 and Q3.
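Here is a short Python sketch of the 5-number summary and IQR for the same data. It assumes the TI-84 finds Q1 and Q3 as the medians of the lower and upper halves of the sorted data, leaving out the overall median when n is odd; other software may use slightly different quartile rules, so its Q1 and Q3 can differ a little from the calculator's. The helper name five_number_summary is hypothetical.

```python
import statistics

# 2007 assessed values (the assess column), in thousands of dollars
assess = [304.0, 297.6, 289.5, 292.8, 314.6, 322.7, 336.1,
          279.0, 319.5, 289.3, 301.4, 314.3, 320.9, 304.0]


def five_number_summary(data):
    """Return (minX, Q1, med, Q3, maxX).

    Assumes the TI-84 convention: Q1 and Q3 are the medians of the lower
    and upper halves of the sorted data, and the overall median is left
    out of both halves when n is odd.
    """
    vals = sorted(data)
    n = len(vals)
    lower = vals[: n // 2]          # everything below the middle position
    upper = vals[(n + 1) // 2 :]    # everything above the middle position
    return (vals[0], statistics.median(lower), statistics.median(vals),
            statistics.median(upper), vals[-1])


min_x, q1, med, q3, max_x = five_number_summary(assess)
iqr = q3 - q1   # the TI-84 leaves this subtraction to you
print(f"min={min_x} Q1={q1} med={med} Q3={q3} max={max_x} IQR={iqr:.1f}")
```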
Boxplots with the TI-84
To draw a boxplot of the assessed value data, follow the instructions in the Chapter 4 Resources for making a histogram, but choose the boxplot (or modified boxplot) icon instead of the histogram icon:

Then use ZoomStat to get the boxplot:

We can see a bit more clearly from the boxplot that the data is skewed positively (but notice that we can't tell if the data set is unimodal or bimodal from the boxplot, so we should look at both a histogram and a boxplot whenever possible). Note again that the axis isn't labeled and no scale is indicated, so this would not be a satisfactory graph on a HW solution, exam or project.
Summary statistics from frequency tables
Recall the example from the Chapter 4 Resources with data about the number of attempts students in my Fall 2006 online class made on a 5-point quiz. We displayed the number of attempts like this:
| attempts | count |
| 0 | 3 |
| 1 | 8 |
| 2 | 8 |
| 3 | 4 |
| 4 | 2 |
| 5 | 2 |
| 6 | 1 |
As before, we can enter the number of attempts (the left column) into one list (L1) and the counts into the next list (L2). Now type 1-Var Stats L1 as above, but then type , (a comma, above the 7 key) and then L2:

Now press ENTER to get the summary statistics for the quiz attempts by the 28 students:

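Using a frequency list works the same way in other software: each value is simply weighted by its count. Here's an illustrative Python sketch (the name freq_stats is hypothetical) that computes n, the mean and Sx from the attempts/count table, then expands the table back into one entry per student as a check.

```python
import math
import statistics

attempts = [0, 1, 2, 3, 4, 5, 6]   # L1: number of attempts
counts = [3, 8, 8, 4, 2, 2, 1]     # L2: number of students with that many attempts


def freq_stats(values, freqs):
    """Return n, the mean and Sx when values[i] occurs freqs[i] times."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    ss = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs))
    return n, mean, math.sqrt(ss / (n - 1))


n, mean, sx = freq_stats(attempts, counts)

# Check: one entry per student should give the same statistics
expanded = [v for v, f in zip(attempts, counts) for _ in range(f)]
print(f"n = {n}, mean = {mean:.3f}, Sx = {sx:.3f}")
print("median:", statistics.median(expanded))
print("matches expanded list:", math.isclose(sx, statistics.stdev(expanded)))
```

The weighted mean here is 60/28, or about 2.143 attempts per student.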
Boxplots with Data Desk
To use a computer to make a boxplot, use Data Desk. Import the houses.txt data file (from the preceding link or from the Data Sets folder in the online classroom) into Data Desk, as we did in the Chapter 4 Resources. Click on the assess variable so that the variable's icon has a Y over it:
then click on Plot and select BoxPlot Side by Side:

You should see something like this:

You can adjust the plot options by clicking on the hyperview menu (the triangle in the upper-left corner of the boxplot window) and selecting BoxPlot Options:


since you have no idea what this means yet; you can also select Set Defaults to make this the default display option.
As with the histogram in Chapter 4, you can make the boxplot window larger by clicking on the lower-right corner of the window and dragging it across the screen. The Data Desk boxplot labels the variable name and indicates a scale on the axis, which is better than the TI-84, but the units are still missing. This would be better:

although I again had to hack this using Photoshop.
Summary Statistics with Data Desk
To compute summary statistics of the 2007 assessed value variable, select the assess variable as Y (as before) and click Calc, then Summaries and then Reports:

You should see output like this:

If you don't see all of the statistics that you want, click the hyperview menu and choose Select Summary Statistics.

Select or deselect the appropriate checkboxes and click OK:

As we saw from the calculator, the mean assessed value is $306,121 with a standard deviation of $15,898.
Median and IQR vs. Mean and Standard Deviation
Keep in mind that you should never simply compute the summary statistics and report them: you should also draw a picture, such as a histogram, boxplot, or stem-and-leaf display. (This is fairly easy to do if you already have the data in the calculator or computer, and it's a good idea to draw the picture before you compute the summary statistics since a picture is often the easiest way to see that you have made a data entry error.)
If the data is roughly unimodal and symmetric, then the mean and standard deviation are usually the most appropriate measures of center and spread, respectively, for the data set; if, on the other hand, the data is strongly skewed or has one or more major outliers, you should report the median and IQR.
A boxplot of the 2006 property tax data for these homes reveals three outliers:

so we should report the median and IQR for the property tax variable, not the mean and standard deviation. If you do see a major outlier, you should investigate it: if it was the result of a data-entry error, you should correct it; if it was something that never should have been included in the data set in the first place (such as the age of the teacher in a data set consisting of the ages of students in a second-grade class), you can remove it; if it was reported in the wrong units (e.g. someone reporting their height in feet rather than inches) you can convert to the proper units. But you should never remove a data point just because it's an outlier.
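How does a modified boxplot decide which points are outliers? The usual rule, which is the one the TI-84's modified boxplot uses, flags any value more than 1.5 IQRs below Q1 or above Q3. The Python sketch below applies that rule to the 2006 taxes column, assuming the same median-of-halves quartiles as the TI-84 (other software may compute quartiles slightly differently); the helper names are hypothetical.

```python
import statistics

# 2006 property taxes (the taxes column), in dollars
taxes = [2604, 280, 2353, 756, 2620, 2632, 2779,
         2321, 2663, 2415, 2477, 1386, 2676, 2473]


def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (TI-84 style, assumed)."""
    vals = sorted(data)
    n = len(vals)
    return statistics.median(vals[: n // 2]), statistics.median(vals[(n + 1) // 2 :])


def flag_outliers(data, k=1.5):
    """Return the values lying more than k * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low_fence or x > high_fence]


q1, q3 = quartiles(taxes)
flagged = flag_outliers(taxes)
print(f"Q1 = {q1}, Q3 = {q3}, outliers: {flagged}")
```

With Q1 = 2321 and Q3 = 2632 the fences fall at 1854.5 and 3098.5, so the three low values are the ones the boxplot marks.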
You might, however, decide to report the summary statistics both with the outlier included and with it omitted. In the property tax data set, three of the homes are owned by senior citizens who participate in a program that freezes their property taxes (although they or their estate have to pay all of the deferred taxes when the home is sold). This explains the outliers, so we might choose to analyze the remaining 11 homes; if the remaining data is roughly unimodal and symmetric, then we could report the mean and standard deviation for the property taxes of a homeowner in this neighborhood not involved in the deferred-tax program.
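To see how much the three frozen-tax homes affect each summary, here's a Python sketch comparing the mean and median of all 14 homes with the mean and standard deviation of the remaining 11. It assumes the three deferred-tax homes are exactly the three outliers (houses 20912, 20921 and 21023); the set name deferred is hypothetical. You would still want a histogram or boxplot of the 11 remaining values before deciding that a mean and standard deviation are appropriate.

```python
import statistics

# 2006 property taxes (dollars), keyed by house number
taxes = {20911: 2604, 20912: 280, 20918: 2353, 20921: 756, 20924: 2620,
         20927: 2632, 20930: 2779, 21003: 2321, 21006: 2663, 21015: 2415,
         21018: 2477, 21023: 1386, 21028: 2676, 21105: 2473}

# Assumption: these are the three homes in the tax-freeze program
deferred = {20912, 20921, 21023}

all_taxes = list(taxes.values())
remaining = [t for house, t in taxes.items() if house not in deferred]

print(f"all {len(all_taxes)}: mean = {statistics.mean(all_taxes):.1f}, "
      f"median = {statistics.median(all_taxes)}")
print(f"remaining {len(remaining)}: mean = {statistics.mean(remaining):.1f}, "
      f"SD = {statistics.stdev(remaining):.1f}")
```

With all 14 homes the mean is pulled well below the median of $2,475 by the three low values, which is exactly why the median is the safer summary when outliers are present.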
Comparing groups
Use Data Desk to create a histogram of the size data from the houses.txt data set. You should get something like this:

which appears bimodal. We certainly shouldn't report the mean and standard deviation for a variable like this. In fact, there may be two separate groups here.
With the histogram still open, double-click on the stories variable to open up the variable that lists the number of stories in each house.

Now click on Modify and then Palettes to open the Data Desk palettes (if some things disappear instead of appearing, click Palettes again to make them reappear).

Click on the knife symbol to select it:

Next hold down the SHIFT key and click on the rightmost bar of the histogram:

You should see that all the houses in this upper group are 2-story houses. Perhaps it would be wise to investigate the 1-story and 2-story houses separately.
Click on the size variable to select it as Y, then hold down the SHIFT key and click on the stories variable to select it as X:

Now click on Plot and Boxplot y by x:

You should see side-by-side boxplots, like this:

Clearly the 2-story houses are bigger than the 1-story houses, which is not terribly surprising! You can make side-by-side boxplots on the TI-84 as well, but you'll need to enter the 1-story house sizes into one list and set up a boxplot of it using Plot1 (as described above), then enter the 2-story house sizes into another list and set up a second boxplot using Plot2; when you press GRAPH you should see both boxplots.
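Splitting a variable by group is what the two TI-84 lists accomplish, and in Python it's a single loop. The sketch below (hypothetical names, and the same assumed median-of-halves quartile rule as the TI-84) builds one list of sizes per number of stories and prints a 5-number summary for each group, which are the numbers behind the side-by-side boxplots.

```python
import statistics
from collections import defaultdict

# size and stories columns from the house data set, in table order
size = [1561, 1038, 1224, 1232, 1995, 1714, 1832,
        1095, 2011, 1366, 1292, 1458, 2031, 1366]
stories = [1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 1, 1, 2, 1]


def five_number_summary(data):
    """(min, Q1, median, Q3, max) with TI-84-style quartiles (assumed)."""
    vals = sorted(data)
    n = len(vals)
    return (vals[0], statistics.median(vals[: n // 2]), statistics.median(vals),
            statistics.median(vals[(n + 1) // 2 :]), vals[-1])


# One list of sizes per number of stories (like two lists on the TI-84)
groups = defaultdict(list)
for s, st in zip(size, stories):
    groups[st].append(s)

summaries = {st: five_number_summary(vals) for st, vals in sorted(groups.items())}
for st, summary in summaries.items():
    print(f"{st}-story (n = {len(groups[st])}): {summary}")
```

Every 2-story house turns out to be larger than the largest 1-story house (1714), so the two boxes don't overlap at all.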
Homework
Work the following problems in Chapter 5: 11, 15, 21, 25, 27, 29, 33, 39 and 45. (As usual, you are encouraged to work additional problems.)
Errata
This is more of an omission than an error, but the full data set of HALE values for the examples on pp. 72–75 can be found in the data set folder on the CD and on the Intro Stats textbook Web site (look for the file called Ch05_HALE.txt).
The data set for exercise #15 is also on the CD and Web site (Ch05_Population_growth.txt) even though this is not indicated by the usual T icon.
ActivStats
Work the activities on pages 5-1 through 5-4 in the ActivStats lesson book, as time permits.
Additional Resources
- Describing Distributions
  - Episode 3 from Against All Odds features a discussion of means, medians, quartiles and boxplots.
- Decisions Through Data: Boxplots
  - Unit 5 of Decisions Through Data talks about boxplots and Unit 6 discusses standard deviation.
- Carnegie Mellon: Introduction to Statistics
  - This open source course has a lesson called "One Quantitative Variable: Numerical Measures" that may be of interest (see Unit 2, Module 1).
- Sofia: Elementary Statistics
  - Lessons 2.3 and 2.4 of the Sofia Open Content Initiative's Elementary Statistics course include a discussion of summary statistics.
- Boxplot tool
  - A Java applet for creating boxplots.
- TI-83 Resource: 1-VarStats
  - Instructions on using 1-Var Stats with the TI-83; check out the link about entering data into lists if you are having difficulty with that part of the process.
- TI-83/84 Troubleshooting
  - Guide to some common errors encountered when using the TI-84.