The Normal Curve
The Normal or "bell-shaped" curve is one of the best-understood distributions in statistics, and it has many uses. Before we can talk about those uses, we have to talk about the curve itself. The mathematical formula for the curve is

$$Y = \frac{N}{\sigma\sqrt{2\pi}}\, e^{-(X-\mu)^2 / (2\sigma^2)}$$
where
Y = height of the curve for a particular X (Y and X say where to draw the curve)
π = a constant = 3.1416
e = base of natural logarithms = a constant = 2.7183
N = number of cases; the total area under the curve is N
μ and σ = population mean and standard deviation, respectively.
The height of the curve represents the relative frequency of scores. The function itself is not important for our purposes. The shape of the distribution depends on only two parameters, the mean μ (the location of the center) and the standard deviation σ (roughly, the average distance from the center), so if we know these, we can determine everything else.
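To make the roles of the parameters concrete, here is a minimal Python sketch of the curve's height. The function names `normal_height` and `area_under_curve` are illustrative, and the values 1000, 68, and 6.67 are simply borrowed from the height example later in these notes. The second function uses a crude midpoint sum to check that the total area comes out close to N.

```python
import math

def normal_height(x, n, mu, sigma):
    """Height Y of the normal curve at x when the total area under the curve is n."""
    coefficient = n / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coefficient * math.exp(exponent)

def area_under_curve(n, mu, sigma, steps=20000):
    """Approximate the total area with a midpoint sum over mu +/- 8 sigma."""
    low = mu - 8 * sigma
    width = 16 * sigma / steps
    return sum(
        normal_height(low + (i + 0.5) * width, n, mu, sigma) * width
        for i in range(steps)
    )

# Height at the mean of the standard normal curve (N = 1, mean 0, SD 1).
peak = normal_height(0, 1, 0, 1)

# Total area for N = 1000 cases, mean 68, SD 6.67 (assumed example values).
total = area_under_curve(1000, 68, 6.67)

print(round(peak, 4), round(total, 1))
```

Changing `mu` slides the curve along the X axis, and changing `sigma` makes it wider and flatter or narrower and taller; the area stays at N either way.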
Because the shape of the curve is known, the relative frequency and percentage of scores at various points of the curve are also known. For example, half of the scores fall above the mean of the curve. The standard normal curve has a mean of zero and a standard deviation of 1.0. It looks like this:

Remember the z score (to find z from the raw score X):
$$z = \frac{X - \mu}{\sigma}$$
To find the raw score from the z score:
$$X = \mu + z\sigma$$
Number line:
[Figure: a number line from -5 to +10 with three rows of five scores each: the original scores X, the shifted scores X+2, and the stretched scores X*2. The three rows appear in two panels.]
Score transformations have a shift and a stretch. The shift moves the whole distribution up or down the number line. Shift corresponds to changes in the mean. To change shift, add or subtract a number. Stretch changes the spread of the scores. Spread corresponds to changes in the standard deviation. To change spread, multiply or divide. For stretch to work properly, you have to use the deviates, that is, the scores after the mean of the distribution has been set to zero. Otherwise, when you multiply or divide, you will move the mean at the same time you change the standard deviation (see the example above).
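The shift and stretch rules can be checked with a few lines of Python. This is only a sketch: the five scores are made up for illustration (they are not read off the figure), and `summary` is a hypothetical helper that returns the mean and the population standard deviation.

```python
from statistics import mean, pstdev

scores = [1, 2, 3, 4, 5]  # illustrative scores, not taken from the figure

def summary(values):
    """Return (mean, population standard deviation) of a list of scores."""
    return (mean(values), pstdev(values))

# Shift: adding a constant moves the mean and leaves the spread alone.
shifted = [x + 2 for x in scores]

# Stretch on raw scores: the spread doubles, but the mean moves too.
stretched_raw = [x * 2 for x in scores]

# Stretch on deviates: the spread doubles and the mean stays at zero.
m = mean(scores)
deviates = [x - m for x in scores]
stretched_dev = [d * 2 for d in deviates]

for label, values in [
    ("X", scores),
    ("X + 2", shifted),
    ("X * 2", stretched_raw),
    ("(X - mean) * 2", stretched_dev),
]:
    print(label, summary(values))
```

Multiplying the raw scores moves the mean from 3 to 6, while multiplying the deviates leaves the mean at 0. That is why the stretch is applied to the deviates.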
Interpreting a z score:
If a z score is zero, it's on the mean.
If a z score is positive, it's above the mean.
If a z score is negative, it's below the mean.
The value of the z score tells how many standard deviations above or below the mean it is. A z score of 2 is two standard deviations above the mean. Scores with z greater than 1 or less than -1 lie farther from the mean than one standard deviation, that is, farther than the typical score.
Examples of raw to z
| X | μ | X - μ | σ | z = (X - μ)/σ |
| 5 | 3 | 2 | 2 | 1 |
| 6 | 3 | 3 | 2 | 1.5 |
| 7 | 3 | 4 | 2 | 2 |
| 8 | 3 | 5 | 2 | 2.5 |
| 5 | 3 | 2 | 4 | .5 |
| 6 | 3 | 3 | 4 | .75 |
| 7 | 3 | 4 | 4 | 1 |
| 8 | 3 | 5 | 4 | 1.25 |
| 5 | 10 | -5 | 4 | -1.25 |
| 6 | 10 | -4 | 4 | -1 |
| 7 | 10 | -3 | 4 | -.75 |
| 8 | 10 | -2 | 4 | -.50 |
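The two conversions can be sketched in Python and checked against a few of the examples above (for instance X = 5 with a mean of 3 and a standard deviation of 2). The function names `to_z` and `to_raw` are illustrative only.

```python
def to_z(x, mu, sigma):
    """Convert a raw score to a z score: deviation from the mean in SD units."""
    return (x - mu) / sigma

def to_raw(z, mu, sigma):
    """Convert a z score back to a raw score."""
    return mu + z * sigma

# (X, mean, standard deviation) for three of the worked examples.
rows = [(5, 3, 2), (8, 3, 4), (5, 10, 4)]

for x, mu, sigma in rows:
    z = to_z(x, mu, sigma)
    # Converting back should return the raw score we started with.
    print(x, mu, sigma, z, to_raw(z, mu, sigma))
```

The same raw score of 5 gives z = 1 in the first case and z = -1.25 in the last: a raw score means nothing until we know the mean and standard deviation of its distribution.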
If X is normally distributed, there will be a correspondence between the standard normal and the z score.

Using tabled values of the Normal to estimate percentages
| z | % between mean and z | % beyond z | z | % between mean and z | % beyond z |
| 0.00 | 0.00 | 50.00 | 0.90 | 31.59 | 18.41 |
| 0.10 | 3.98 | 46.02 | 1.00 | 34.13 | 15.87 |
| 0.20 | 7.93 | 42.07 | 1.10 | 36.43 | 13.57 |
| 0.30 | 11.79 | 38.21 | 1.20 | 38.49 | 11.51 |
| 0.40 | 15.54 | 34.46 | 1.30 | 40.32 | 9.68 |
| 0.50 | 19.15 | 30.85 | 1.40 | 41.92 | 8.08 |
| 0.60 | 22.57 | 27.43 | 1.50 | 43.32 | 6.68 |
| 0.70 | 25.80 | 24.20 | 1.60 | 44.52 | 5.48 |
| 0.80 | 28.81 | 21.19 | 1.70 | 45.54 | 4.46 |
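Tabled values like these do not have to be looked up; they can be computed. The sketch below uses the error function from Python's standard `math` module to get the proportion of a standard normal distribution below z, then turns that into the two tabled percentages. The names `cdf` and `table_row` are illustrative.

```python
import math

def cdf(z):
    """Proportion of a standard normal distribution that falls below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def table_row(z):
    """Return (percent between mean and z, percent beyond z) for z >= 0."""
    below = cdf(z)
    between = round(100 * (below - 0.5), 2)
    beyond = round(100 * (1 - below), 2)
    return (between, beyond)

# Print the table for z = 0.00, 0.10, ..., 1.70.
for tenths in range(18):
    z = tenths / 10
    between, beyond = table_row(z)
    print(f"{z:.2f}  {between:6.2f}  {beyond:6.2f}")
```

For every z, the two percentages add up to 50, because together they cover exactly the half of the curve above the mean.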
If z = 0, 50 percent of the scores are above it. If z = 1.50, 6.68 percent of the scores are above it, 43.32 percent are between the mean (z = 0) and z, and 50 percent are below the mean. (Scores like to hug the mean.) Draw pictures to solve these problems.
Q: What z score separates the bottom 70 percent from the top 30 percent of the scores?

Q: What z score separates the top 10 percent from the bottom 90 percent?
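Both questions run the table backwards: we know the percentage and want the z score. As a hedged check, the sketch below uses `NormalDist.inv_cdf` from Python's standard `statistics` module (available from Python 3.8). The variable names are illustrative, and the printed values are computed here rather than taken from these notes.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# z score with 70 percent of the scores below it (30 percent above).
z_bottom_70 = standard_normal.inv_cdf(0.70)

# z score with 90 percent of the scores below it (10 percent above).
z_bottom_90 = standard_normal.inv_cdf(0.90)

print(round(z_bottom_70, 2), round(z_bottom_90, 2))
```

The results agree with the table: z = 0.50 leaves 30.85 percent beyond it, so the first answer is a little over 0.5, and 10 percent beyond falls between z = 1.20 (11.51 percent) and z = 1.30 (9.68 percent).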

A percentile rank is the percentage of cases up to and including the one in which we are interested. The SAT and GRE are scaled so that the mean is 500 and the standard deviation is 100 (the scales were normed on a reference group of people long ago so that scores are equivalent over the years). The scores are approximately normally distributed, so we can find percentile ranks using the z score and the normal distribution.
Q: What is the percentile rank of an SAT verbal score of 600?
A: First we find the z score: (600 - 500)/100 = 1. Then we find the area for z = 1.

We discover that the area below 600 is 50 + 34.13, or 84.13 percent. The percentile rank is about 84.
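The same two steps (find z, then find the area below z) can be sketched as a small Python function. `percentile_rank` is a hypothetical helper; its default mean of 500 and standard deviation of 100 are the scale values given above, and it assumes the scores are exactly normal, which is only approximately true.

```python
from statistics import NormalDist

def percentile_rank(score, mean=500, sd=100):
    """Percentage of a normal distribution at or below the given score."""
    z = (score - mean) / sd          # step 1: raw score to z score
    return 100 * NormalDist().cdf(z)  # step 2: area below z, as a percentage

print(round(percentile_rank(600), 2))
print(round(percentile_rank(500), 2))
```

A score at the mean has a percentile rank of 50, and each additional standard deviation adds less than the one before it, because the scores thin out away from the mean.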
Q: Suppose our basketball coach wants to estimate how many entering freshmen will be over 6'6'' (78 inches) tall. Further suppose the mean height of entering freshmen is 68 inches, that the standard deviation of height is 6.67 inches, and that there will be 1,000 entering freshmen. What is your estimate?

A: First we have to turn the desired height into a z score: z = (78 - 68)/6.67 = 1.499, or about 1.5. The area beyond z = 1.5 is 6.68 percent (the tallest people). If there were 100 people, 6.68 would be expected to be taller than 6'6''; there are 1,000, so 66.8, or about 67, people should be taller than 6'6'' (.0668 × 1000 = 66.8).
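The coach's estimate can be reproduced without the table. The sketch below assumes heights are exactly normal with the stated mean and standard deviation; `expected_count_above` is an illustrative name, not part of any library.

```python
from statistics import NormalDist

def expected_count_above(cutoff, mean, sd, n):
    """Expected number of cases above the cutoff out of n normal cases."""
    heights = NormalDist(mu=mean, sigma=sd)
    proportion_above = 1 - heights.cdf(cutoff)  # area beyond the cutoff
    return n * proportion_above

# 78 inches is 6'6''; mean 68, SD 6.67, and 1,000 entering freshmen.
estimate = expected_count_above(78, 68, 6.67, 1000)
print(round(estimate))
```

Because the code does not round z to 1.5 before finding the area, its unrounded estimate differs slightly from 66.8, but it still rounds to 67 people.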
Probability
Probability: the long-run expected relative frequency of an event
[number of things of interest/number of possible things]
Probability refers to what happens over an indefinitely large number of trials.
Examples:
Coin toss: p(head) = 1/2 (heads/(heads + tails))
Dice roll: p(1) = 1/6 (one face/six possible faces: 1, 2, 3, 4, 5, 6)
Card draw: p(A♠) = 1/52 (ace of spades/all cards)
These are probabilities of single events.
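The idea of a long-run relative frequency can be illustrated by simulation. The sketch below is only a demonstration under stated assumptions: a fair coin and a fair die, Python's pseudo-random generator with a fixed seed so the run is repeatable, and a hypothetical helper named `relative_frequency`.

```python
import random

def relative_frequency(event, draw, trials, rng):
    """Fraction of trials in which the drawn outcome satisfies the event."""
    hits = sum(1 for _ in range(trials) if event(draw(rng)))
    return hits / trials

rng = random.Random(0)  # fixed seed, chosen arbitrarily

# Relative frequency of heads in 100,000 tosses of a fair coin.
heads = relative_frequency(
    lambda side: side == "H", lambda r: r.choice("HT"), 100_000, rng
)

# Relative frequency of rolling a 1 in 100,000 rolls of a fair die.
ones = relative_frequency(
    lambda face: face == 1, lambda r: r.randint(1, 6), 100_000, rng
)

print(heads, ones)
```

With many trials the two relative frequencies settle near 1/2 and 1/6. With only a handful of trials they can be far off, which is why probability is defined in terms of the long run.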
Note that the Normal distribution also gives us long-run expected frequencies. For any z score, we can calculate an area and a percentage, and the percentage can be interpreted as a probability. For example, 34.13 percent of the scores fall between the mean and one standard deviation above the mean. Therefore the probability that a randomly chosen high school student taking the SAT-V will score between 500 and 600 is .34, or about 1/3.
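The probability of landing in an interval is the area below the upper score minus the area below the lower score. A minimal sketch, assuming SAT-V scores are exactly normal with mean 500 and standard deviation 100 (the variable names are illustrative):

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

# Area below 600 minus area below 500 leaves the area between them.
p_between = sat.cdf(600) - sat.cdf(500)

print(round(p_between, 4))
```

The same subtraction works for any pair of scores, not just ones that sit on the mean or a whole number of standard deviations away from it.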
How rare does an event have to be before we decide it did not happen by accident, for example before we decide that a coin or a die is fixed? This is the basic problem of inferential statistics. We will use sample data to make decisions about the population based on probabilities.