Definition of Test Bias
A biased test is one in which there are systematic differences in the meaning of test scores associated with group membership. Another way of saying this is that a biased test is one in which people from two groups who have the same observed score do not have the same standing on the trait of interest. A third way of saying this is that using a test to predict some criterion of interest results in systematic over- or underprediction based on group membership.
Examples: racist performance appraisal, opening a jar in the U.S. and Germany.
Fairness has to do with how a test is used. Fairness and bias are not the same thing. A judgment of fairness rests on values, and reasonable people may disagree about the fairness of a test even when they agree about the facts of the matter. Suppose we use a test to decide who will be admitted to college. An individualist may say that the test should be administered to all those who apply and those with the highest scores should be admitted, regardless of race, sex, or other group membership, even if this means that some groups will be admitted in greater numbers than others. Others may say that admissions should be in proportion to the numbers from each group that apply, so the test should be used to select those with high scores within each group so that the proper proportions are admitted.
A biased test may be used fairly. Suppose that a test is biased such that males score 10 points higher on average than do females with the same standing on the trait. If we simply add 10 points to the observed scores of the females and use that adjusted score for making decisions, the biased test will prove to be fair in use.
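The 10-point adjustment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scores are made up, the bias is assumed to be a known constant of exactly 10 points, and the names (BIAS, true_standing, adjust_female) are hypothetical.

```python
# Hypothetical data: each position in the lists is a male and a female with
# the same true standing on the trait. The test is assumed to add a constant
# 10 points to every male's observed score.
BIAS = 10  # assumed size of the bias, taken as known

true_standing = [40, 50, 60, 70]
male_observed = [t + BIAS for t in true_standing]
female_observed = list(true_standing)


def adjust_female(score, bias=BIAS):
    """Add the assumed bias to a female's observed score."""
    return score + bias


female_adjusted = [adjust_female(s) for s in female_observed]

# Before the adjustment, equal true standing gives unequal observed scores;
# after it, the scores used for decisions match.
print("male observed:  ", male_observed)
print("female observed:", female_observed)
print("female adjusted:", female_adjusted)
```

In practice the size of the bias would have to be estimated, and it need not be the same at every score level, so a constant correction is the simplest possible case.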
Models of Test Bias
The most intuitive definition of bias is observation of a mean difference between groups. So for example, if we saw that females scored higher than males on the SAT Verbal test we might suspect that the test is biased. However, the mean difference by itself is a poor model of bias. This is because a mean difference could reflect bias, but it could also reflect a real difference between groups. If you measure the height of a representative sample of adult males and females in the U.S. with a tape measure, you will find that males are taller on average. Does this mean that the tape measure is biased? People differ in lots of ways, so finding a mean difference between groups doesn’t necessarily mean that the test is biased. On the other hand, finding no mean difference doesn’t necessarily mean lack of bias. If you developed a new tape measure that showed no mean difference between males and females in height, the new measure would be biased, because there really is a difference. In essence your new measure would be adding inches to the height of females, and this is what we defined bias to be.
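The tape measure argument can be checked with a small simulation. This is a sketch, not real data: the population means (69 and 64 inches), the standard deviation (3 inches), and the sample size are assumed values chosen only for illustration, and the "rigged" tape is a hypothetical instrument built to erase the group difference.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed true heights in inches (illustrative values, not survey data).
male_true = [rng.gauss(69, 3) for _ in range(1000)]
female_true = [rng.gauss(64, 3) for _ in range(1000)]

# An accurate tape measure reports the true height for everyone.
male_accurate = list(male_true)
female_accurate = list(female_true)

# A hypothetical "new" tape measure rigged to show no mean difference:
# it adds the true gap to every female reading.
gap = mean(male_true) - mean(female_true)
male_rigged = list(male_true)
female_rigged = [h + gap for h in female_true]


def mean_error(observed, true):
    """Average of (observed - true): the systematic error of the measure."""
    return mean(o - t for o, t in zip(observed, true))


accurate_diff = mean(male_accurate) - mean(female_accurate)
rigged_diff = mean(male_rigged) - mean(female_rigged)

print("accurate tape: mean difference", round(accurate_diff, 2),
      "female error", mean_error(female_accurate, female_true))
print("rigged tape:   mean difference", round(rigged_diff, 2),
      "female error", round(mean_error(female_rigged, female_true), 2))
```

The accurate tape shows a group difference and no systematic error; the rigged tape shows no group difference and a systematic error of several inches for females. The mean difference tracks the real difference, not the bias.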
The most widely accepted (but not the only) model of test bias is the regression model (a.k.a. the Cleary model). This model places bias into the context of the interpretation of test scores (that is, validity), where it should be. The model says that if different groups share the same regression line, the test is not biased (even if there are differences in means across groups). If the groups have different regression lines, then the test is biased because it is measuring different things for different groups. The model says that people with the same test scores should do equally well, on average, on some external criterion. For example, if the test is not biased, then blacks and whites with the same SAT score will, on average, earn the same freshman grade point average. On the other hand, if the SAT is biased against blacks, then blacks with the same SAT scores as whites will have higher freshman GPAs on average; that is, a common regression line will underpredict their GPAs.
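A minimal sketch of the regression model in Python follows. Everything in it is assumed for illustration: the data are simulated, the two groups are labeled only A and B, the slope (0.002 GPA points per SAT point), the intercepts, and the noise level are made-up values, and the helper names (fit_line, simulate_group, mean_residual) are hypothetical. Group B is given a higher intercept, which is the case described above of a test biased against group B.

```python
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_group(rng, n, intercept, slope=0.002, noise_sd=0.3):
    """Simulate SAT scores and freshman GPAs for one group (assumed model)."""
    sat = [rng.uniform(800, 1600) for _ in range(n)]
    gpa = [intercept + slope * s + rng.gauss(0, noise_sd) for s in sat]
    return sat, gpa


rng = random.Random(1)  # fixed seed so the sketch is repeatable
sat_a, gpa_a = simulate_group(rng, 500, intercept=0.0)
sat_b, gpa_b = simulate_group(rng, 500, intercept=0.3)  # same SAT, higher GPA

# Fit one common line to everyone, as if group membership were ignored.
common_intercept, common_slope = fit_line(sat_a + sat_b, gpa_a + gpa_b)


def mean_residual(sat, gpa):
    """Average of actual GPA minus the GPA predicted by the common line."""
    return mean(g - (common_intercept + common_slope * s)
                for s, g in zip(sat, gpa))


resid_a = mean_residual(sat_a, gpa_a)  # negative: GPA is overpredicted
resid_b = mean_residual(sat_b, gpa_b)  # positive: GPA is underpredicted

# Fit a separate line per group and compare predictions at the same SAT score.
int_a, slope_a = fit_line(sat_a, gpa_a)
int_b, slope_b = fit_line(sat_b, gpa_b)
pred_a_1200 = int_a + slope_a * 1200
pred_b_1200 = int_b + slope_b * 1200

print("mean residual from common line, A:", round(resid_a, 3))
print("mean residual from common line, B:", round(resid_b, 3))
print("predicted GPA at SAT 1200, A vs. B:",
      round(pred_a_1200, 2), round(pred_b_1200, 2))
```

When the groups share one regression line, both mean residuals are near zero and the group lines give about the same prediction at any score. Here they do not, so the common line systematically overpredicts for group A and underpredicts for group B. A fuller analysis would also test whether the slopes differ and whether the differences exceed sampling error.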