TABLE 1.2 Reading achievement scores for seventh graders

| Class | Number of students | Percent |
|---|---|---|
| 2.0-2.9 | 9 | 0.95 |
| 3.0-3.9 | 28 | 2.96 |
| 4.0-4.9 | 59 | 6.23 |
| 5.0-5.9 | 165 | 17.42 |
| 6.0-6.9 | 244 | 25.77 |
| 7.0-7.9 | 206 | 21.75 |
| 8.0-8.9 | 146 | 15.42 |
| 9.0-9.9 | 60 | 6.34 |
| 10.0-10.9 | 24 | 2.53 |
| 11.0-11.9 | 5 | 0.53 |
| 12.0-12.9 | 1 | 0.11 |
| Total | 947 | 100.01 |

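The Percent column, and the reason its total is 100.01 rather than 100, can be checked with a short sketch. The counts are copied from Table 1.2; the variable names are only illustrative, and rounding each entry to two decimals is an assumption about how the table was built.

```python
# Counts copied from Table 1.2, in class order 2.0-2.9 up to 12.0-12.9.
counts = [9, 28, 59, 165, 244, 206, 146, 60, 24, 5, 1]
n = sum(counts)  # total number of students

# Percent for each class, rounded to two decimals (assumed table convention).
percents = [round(100 * c / n, 2) for c in counts]

# Each entry is rounded separately, so the rounded entries
# need not add up to exactly 100.
rounded_total = round(sum(percents), 2)

print(n)
print(percents)
print(rounded_total)
```

The small excess over 100 comes purely from rounding each class separately.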
A Puzzle with A Cute Solution
Suppose you have a table of data like the one given above. You have enough information to construct a histogram.
But what are the mean and standard deviation of the data?
How do you get around having only a table summary?
Answer: Treat the data in class 2.0-2.9 as 9 copies of the mid-class value 2.5, treat the data in class 3.0-3.9 as 28 copies of 3.5, and so on.
Your 'approximate' data list becomes:
{2.5, …, 2.5, 3.5, …, 3.5, …, 11.5, …, 11.5, 12.5}
where 2.5 appears 9 times, 3.5 appears 28 times, …, 11.5 appears 5 times, and 12.5 appears once.
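Formulae:

$$\bar{x} \approx \frac{1}{n}\sum_i f_i m_i \qquad\qquad s \approx \sqrt{\frac{1}{n-1}\sum_i f_i \,(m_i - \bar{x})^2}$$

where $f_i$ = number of students in class $i$, $m_i$ = midpoint value of class $i$, and $n = 947$.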
Text (page 66) says:
![]()
(Pretty close! The approximation works best when the unobserved values in each class are close to the class midpoint.)
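The midpoint trick above can be sketched in a few lines. This is only an illustration: the counts come from Table 1.2, the variable names are made up, and the standard deviation is assumed to use the n - 1 divisor. The last lines cross-check the frequency-weighted formulas against the expanded 'approximate' data list.

```python
from math import sqrt
from statistics import mean as list_mean, stdev

# Class midpoints m_i and class counts f_i from Table 1.2.
midpoints = [2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5, 11.5, 12.5]
counts = [9, 28, 59, 165, 244, 206, 146, 60, 24, 5, 1]
n = sum(counts)  # 947 students

# Frequency-weighted mean: each midpoint counted f_i times.
grouped_mean = sum(f * m for f, m in zip(counts, midpoints)) / n

# Frequency-weighted standard deviation (assumes the n - 1 divisor).
grouped_var = sum(f * (m - grouped_mean) ** 2
                  for f, m in zip(counts, midpoints)) / (n - 1)
grouped_sd = sqrt(grouped_var)

# Cross-check: build the 'approximate' data list explicitly.
expanded = [m for f, m in zip(counts, midpoints) for _ in range(f)]

print(round(grouped_mean, 2), round(grouped_sd, 2))
print(round(list_mean(expanded), 2), round(stdev(expanded), 2))
```

Both routes give the same answer, a mean of about 6.92 and a standard deviation of about 1.63; the weighted sums simply avoid writing out all 947 values.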
1.3 Mathematical Modelling and Density Curves

Suppose you were given the following plot of information
Suppose somebody asked you "To a reasonable degree of accuracy, what do the data show about the relationship between gas consumption and distance?"
You might say… there is evidence of an increasing or positive relationship.
You may go so far as to say:

Note: you have to wait until Econ. 3210 to learn how to measure that straight line relationship.
By summarizing the relationship as a straight line you have given up some detail (i.e. the relationship is not exactly a straight line) but have gotten, in return, a precisely stated mathematical model (i.e. a straight line relation) that is easy to understand, explain and work with (e.g. to forecast gas use for other values of distance).
In a sense, the mathematical model (the straight line) is like an 'idealized' representation of the relationship.
In this chapter (1.3) we do the same kind of thing for distributions: we construct idealized forms called density curves and spend most of our time looking at a particular density curve called the NORMAL.
Density Curves
We can think of a density curve as a smooth idealized version of a relative frequency histogram.
The density curve is an abstraction (soon we will get a test to see if the abstraction is reasonable). It is often easier to work with a density curve than with a histogram or a stemplot. It shows more detail than a boxplot or a five-number summary, but it is not without its faults: it tends to smooth over irregularities such as outliers.
Examples:



Properties that Density Curves Inherit (from relative frequency histograms)


Normal Distribution/Density Curve
(Note: other density curves often have these properties too, but the Normal is 'special' in many other ways.)
What is so neat about the Normal Density?
The following is always true of a Normal density. Try a blind test on Figure 1.12 and then compare the result with what we already know to be true.
68%: About 68% of the observations (area/relative frequency) lie within one standard deviation on either side of the mean.
95%: About 95% of the observations (area) lie within two standard deviations on either side of the mean.
[Figure]
99.7%: About 99.7% of the observations (area) lie within three standard deviations on either side of the mean.
[Figure]
Special Case Normal Density:
The Normal density with mean 0 and standard deviation 1 is called the Standard Normal Density.