Central Tendency

We are interested in variables, things that vary. Variables can take on two or more values over a group of people at one or more times; that is, variables have distributions.

Distributions can be graphed, or they can be summarized by a few numbers. Such summary numbers are called statistics or parameters. The two most important summary numbers describe the two most important characteristics of a distribution, central tendency and variability. Central tendency tells us about the middle of the distribution. Variability tells us about the spread of the distribution about the middle.

Two distributions can have the same spread and different middles; perhaps height or weight in men and women.

There are 3 summaries of central tendency we will cover: the mode, the median, and the mean.

The Mode

The mode is (1) the most frequently occurring score in the frequency distribution or (2) the midpoint of the most populous class interval.

A distribution can have more than one mode: it can be bimodal (two modes) or multimodal (several modes).
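Definition (1) of the mode can be sketched in a few lines of Python. This is only an illustration: the function name `modes` and the second data set are made up for the example, and every value that ties for the highest frequency is returned so that bimodal and multimodal cases show up.

```python
from collections import Counter

def modes(scores):
    """Return every score value that ties for the highest frequency."""
    counts = Counter(scores)          # score value -> frequency
    top = max(counts.values())        # the highest frequency observed
    return sorted(value for value, f in counts.items() if f == top)

print(modes([1, 2, 2, 3, 3, 3, 4, 4, 5]))  # one mode: [3]
print(modes([1, 2, 2, 2, 3, 4, 5, 5, 5]))  # hypothetical bimodal data: [2, 5]
```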

 

The Median

The median is the score that separates the top 50% of scores from the bottom 50%; it is the score value at the 50th percentile.

Even number of scores: median is half way between the two middle scores.

1 2 3 4 | 5 6 7 8 -- the median is 4.5

Odd number of scores: the median is the number in the middle

1 2 3 4 5 6 7 -- the median is 4.
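The odd and even rules can be written as one short function. This is a minimal sketch; the function name `median` is our own choice, and the scores are sorted first because the rule applies to scores in rank order.

```python
def median(scores):
    """Middle score for an odd count; halfway between the two middle scores for an even count."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2   # even: halfway between the two middle scores

print(median([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
print(median([1, 2, 3, 4, 5, 6, 7]))     # 4
```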

For grouped frequency data, the median is the midpoint of the class interval that contains the median (50th percentile).

 

Class interval   Midpoint   f    Cum f
16-18            17         4    26
13-15            14         5    22
10-12            11         8    17
7-9              8          4    9
4-6              5          3    5
1-3              2          2    2

N/2 = 26/2 = 13, so the median lies between the 13th and 14th scores (counting up from the lowest score). Both fall in the class interval 10-12, whose midpoint is 11, so the median is 11. The procedure is the same for even and odd numbers of scores.
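The grouped-data procedure amounts to accumulating frequencies from the lowest interval until N/2 is reached. The sketch below uses the table's values; the function name `grouped_median` and the tuple layout are our own. One assumption goes beyond the notes: if the cumulative frequency lands exactly on N/2, the code stays in the lower interval.

```python
# (lower limit, upper limit, frequency), lowest class interval first
intervals = [(1, 3, 2), (4, 6, 3), (7, 9, 4), (10, 12, 8), (13, 15, 5), (16, 18, 4)]

def grouped_median(intervals):
    """Midpoint of the class interval that contains the 50th percentile."""
    n = sum(f for _, _, f in intervals)
    if n == 0:
        raise ValueError("no scores")
    half = n / 2
    cum_f = 0
    for low, high, f in intervals:
        cum_f += f                      # cumulative frequency so far
        if cum_f >= half:               # assumption: an exact tie stays in the lower interval
            return (low + high) / 2     # midpoint of that interval
    raise ValueError("unreachable for valid frequencies")

print(grouped_median(intervals))  # 11.0
```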

 

For ungrouped data in which score values repeat, we will use the score at the median's location as the value of the median.

For example

1 2 2 3 3 3 4 4 5 -- (odd total) the median is 3.

1 2 3 4 5 6 7 8 9 -- position counters for the scores above, not scores (the middle position is 5)

1 2 2 3 3 3 4 4 -- (even total) the median is 3.

T & S use a more complicated formula based on the real limits of class intervals. We will ignore it; it is correct but not used much in psychology. ψ (psychology) typically uses the location as the value of the median.

Mean or arithmetic mean

The mean is the sum of the scores in the distribution divided by the number of people (scores). The symbol for the sample mean is X-bar ($\bar{X}$); the symbol for the population mean is mu ($\mu$).

Sample mean:

$$\bar{X} = \frac{\sum X}{N}$$

where $\Sigma$ (sigma) means "the sum of" or "add 'em all up." This reads: X-bar, the sample mean, is the sum of all scores (X) divided by the number of people (scores), N.

Population mean:

$$\mu = \frac{\sum X}{N}$$

Note that the formula for the population mean is the same as that for the sample mean. To find the population mean, add up all the scores and divide by the number of scores.

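The mean for a frequency distribution is

$$\bar{X} = \frac{\sum fX}{N}$$

where f is the frequency of a score value (or class interval) X, fX is that frequency multiplied by the score, and N is the sum of the frequencies.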

X        f        fX
10       1        10
9        3        27
8        2        16
7        4        28
6        6        36
5        5        25
Total    N = 21   ΣfX = 142

$\bar{X}$ = 142/21 = 6.76
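The same arithmetic can be checked in Python. This is a small sketch using the table's values; the dictionary name `freq` and the other variable names are only for illustration.

```python
# score value X -> frequency f, taken from the table above
freq = {10: 1, 9: 3, 8: 2, 7: 4, 6: 6, 5: 5}

n = sum(freq.values())                        # N = sum of the frequencies
sum_fx = sum(x * f for x, f in freq.items())  # sum of fX, each score times its frequency
mean = sum_fx / n

print(n, sum_fx, round(mean, 2))  # 21 142 6.76
```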

 

 

Deviation from the mean (Spread)

Little x is found by subtracting the mean from the score: $x = X - \bar{X}$. Little x is the deviation from the mean. We subtract the mean from the score, rather than the score from the mean, so that the deviation is positive when the score is larger than the mean and negative when the score is smaller than the mean.

 

Raw scores (X)

          9
     8    9   10
7    8    9   10   11

The mean is 9 (81/9). If we subtract 9, we get:

Deviation scores (x)

           0
     -1    0    1
-2   -1    0    1    2

The mean has the property that the sum of deviations from the mean is zero, that is,

$$\sum x = \sum (X - \bar{X}) = 0$$

The mean always has this property. Deviations from the median or the mode sum to zero only when that statistic happens to equal the mean. (That is, by accident; the mode and median usually don't have this property.)
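A quick check of this property with the nine raw scores from the example (mean 9) might look like the following. The variable names are ours. With non-integer means, floating-point rounding can leave a sum that is tiny rather than exactly zero.

```python
scores = [9, 8, 9, 10, 7, 8, 9, 10, 11]   # raw scores X from the example

mean = sum(scores) / len(scores)           # 81 / 9
deviations = [x - mean for x in scores]    # little x = X minus the mean

print(mean)             # 9.0
print(sum(deviations))  # 0.0
```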

 

Comparison of Mode, Median and Mean

 

Mode

1) The mode is useful for nominal variables (the best selling brand of car is (say) a Ford, the modal psychology major is female). The mode tells you the way to bet (knowing nothing else, the odds are a psychology major is female).

2) It can be used to identify the value at which the largest number of scores fall in a distribution (e.g., the modal number of publications of a new psychology Ph.D. is zero).

Two other, less important properties of the mode:

1) usually descriptive rather than inferential (about sample rather than population)

2) quick and easy to compute

  

Median

Used for

1) distributions with bad shapes (skewed distributions or distributions with extreme observations).

2) distributions that contain scores that are arbitrary ceiling or floor scores.

Example of extreme score:

          6
     5    6    7
4    5    6    7    8    ...    26

Mode = 6, Median = 6, Mean = 80/10 = 8
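The effect of the extreme score can be reproduced with Python's standard `statistics` module. This is only a sketch using the ten scores from the example; the comparison with the score of 26 removed is our own addition, to show how far a single value pulls the mean.

```python
import statistics

scores = [4, 5, 5, 6, 6, 6, 7, 7, 8, 26]    # the ten scores, including the extreme 26

print(statistics.mode(scores))               # 6
print(statistics.median(scores))             # 6.0
print(sum(scores) / len(scores))             # 8.0

# Same data without the extreme score (illustrative comparison, not in the notes)
trimmed = [x for x in scores if x != 26]
print(statistics.median(trimmed))            # 6
print(sum(trimmed) / len(trimmed))           # 6.0
```

The mode and median stay at 6 either way; only the mean moves, from 6 to 8.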

Mean

1) Used for inferential as well as descriptive purposes. The mean varies less from sample to sample than the median or mode, so it better estimates the population parameter.

2) Based on all data in distribution.

3) Generally preferred to the median and mode except when the data are problematic (extreme values, arbitrary values). It is the most commonly used measure of central tendency in ψ (psychology).