Taylor-Russell Tables
Cronbach: Validity is not "the degree to which a test measures what it purports to measure." Validity is a property of the inferences based on test scores. Validity refers to the quality of decisions or judgments we make given test scores. For example, tests are used to decide who to hire for a job. The validity of a test for that decision is how useful the test is in hiring. A test may be used to decide whether a person is schizophrenic, obsessive-compulsive, or alcoholic. The validity of the test is evaluated in terms of how well it classifies clients into their respective categories.
Hull (a very famous psychologist of the early twentieth century) noted in 1928 that psychological tests rarely predicted job performance with a correlation greater than .30. He took this to mean that tests could never be of much use in selecting people because they did not predict job performance well.
Taylor and Russell (1939) answered Hull in what is one of the most famous papers in industrial and organizational psychology. Their answer was that sometimes tests could be very useful in selecting people even though the correlation between test scores and job performance was not very high. They also noted that sometimes tests are not very useful even when the correlation between test scores and job performance is very high.
According to Taylor and Russell, there are three important factors to consider when judging the usefulness of a test (i.e., its validity by Cronbach's definition). They are (a) the correlation between the test score and job performance, (b) the base rate of success on the job, and (c) the selection ratio.
Show T & R quadrants before proceeding to definitions.
The correlation. Other things being equal, the larger the correlation between test scores and job performance, the more useful the test will be. If the correlation is 0, then the test is useless because people with high test scores are no more likely to be successful on the job than people with low test scores. If the correlation is 1.0, then the test predicts job performance perfectly, and picking people with high test scores will always give better job performance.
(Draw examples)
The base rate. Taylor and Russell divided incumbents by their job performance into two groups: successes and failures. If you are doing well enough on the job, you are a success; otherwise you are a failure. The base rate is the proportion of successes among current incumbents who were not selected by the test, that is, the rate of success among applicants hired without using the test. It answers the question "if we hired everyone who applied, what proportion of people would be successful on the job?" It turns out that, other things being equal, tests are most useful when the ratio of success to failure is 50/50, and tests get less useful as the ratio moves toward 100/0 or 0/100. Consider 100/0, where everyone who applies is successful. Then the test is useless because it cannot improve on perfect success. Now consider 0/100, where everyone who applies fails. The test cannot be useful because it cannot pick anyone who will succeed on the job. But if the ratio is about 50/50, and the test can pick better people, it can improve the 50/50 ratio of success on the job considerably.
(Draw pictures)
Selection ratio. The selection ratio is the number hired divided by the number who applied. If 100 people apply and 50 are hired, the selection ratio is .5. If 100 people apply and 10 are hired, the selection ratio is .1. We will assume that the top people (i.e., those who score highest on the test) will be selected. For example, if we are selecting 10 of 100, we will take the top 10 scorers. In general, other things being equal, the smaller the selection ratio, the more useful the test becomes.
(Draw pictures)

Go over examples in Taylor-Russell Tables.
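For working through examples without the printed tables, the entries can be approximated numerically. The sketch below is not Taylor and Russell's own procedure; it assumes test scores and job performance follow a standard bivariate normal distribution with correlation r, and it integrates over the accepted region with Simpson's rule. The function name proportion_successful and its parameters are made up for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def proportion_successful(r, base_rate, selection_ratio, steps=4000):
    """Approximate proportion of hires who succeed when the top scorers are hired.

    Assumption (not stated in the notes): test score X and job performance Y
    are standard bivariate normal with correlation r.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    if not (0.0 < base_rate < 1.0 and 0.0 < selection_ratio < 1.0):
        raise ValueError("base_rate and selection_ratio must be between 0 and 1")
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals

    # Cutoffs: hire if X > x_cut; a person is a success if Y > y_cut.
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)
    scale = (1.0 - r * r) ** 0.5

    def integrand(x):
        # Density of X at x times P(Y > y_cut | X = x).
        p_success = 1.0 - STD_NORMAL.cdf((y_cut - r * x) / scale)
        return STD_NORMAL.pdf(x) * p_success

    # Integrate from the cutoff to 12 sd above it (the remaining tail is negligible).
    width = 12.0
    h = width / steps
    total = integrand(x_cut) + integrand(x_cut + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    accepted_and_successful = total * h / 3.0

    return accepted_and_successful / selection_ratio


if __name__ == "__main__":
    # Hull's "ceiling" correlation of .30 with a 50/50 base rate.
    for sr in (0.9, 0.5, 0.1):
        print(sr, round(proportion_successful(0.30, 0.50, sr), 2))
```

With r = .30 and a base rate of .50, hiring the top half gives about .60 successful, and hiring only the top tenth gives about .71. With r = 0 the function returns the base rate at every selection ratio, which is the "useless test" case described above.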
Two major points:
(1) Validity changes with the decision or context of the use of the test. A test which is valid in one situation may not be valid in another situation. Q: Imagine testing for factory jobs in different parts of the country. Can you think of a test that might be valid in one part of the country but not in another?
(2) Tests that do not predict job performance well can be extremely valid if the base rate of success is near .5 and the selection ratio is small. Consider the use of the SAT and GRE.
Base Rate Review
The amount of improvement possible over the base rate (BR) is set by the BR itself. The Taylor-Russell (TR) tables give the proportion successful if you use the test. The BR is shown at the top of the table. The improvement due to testing is the difference between the proportion successful if you use the test and the BR. If the BR is 1.00, what improvement is possible? A: None. If the BR is .50, what is the maximum possible improvement? A: .50, because 1.00 - .50 = .50. If the BR is high, e.g., .80, then 80 percent of the people succeed, and the maximum improvement is .20, less than .50. If the BR is low, say .20, then even very selective situations do not result in everyone succeeding. Perhaps the best that can be achieved is about 50 percent, which results in a gain of .30 (.50 - .20), still less than .50.
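The arithmetic in this review can be written out as a short sketch. The function names are invented for illustration, and the .50 figure for the low-base-rate case is the rough "perhaps the best that can be achieved" value from the paragraph above, not a table entry.

```python
def max_improvement(base_rate):
    """Ceiling on the gain from testing: success cannot exceed 1.00."""
    if not 0.0 <= base_rate <= 1.0:
        raise ValueError("base_rate must be between 0 and 1")
    return 1.0 - base_rate


def improvement(base_rate, proportion_successful_with_test):
    """Gain from testing: proportion successful with the test minus the base rate."""
    return proportion_successful_with_test - base_rate


if __name__ == "__main__":
    for br in (1.00, 0.80, 0.50):
        print(f"BR = {br:.2f}: maximum possible improvement = {max_improvement(br):.2f}")
    # Low base rate: the ceiling is .80, but the realistic gain is far smaller.
    print(f"BR = 0.20, about 0.50 successful with the test: gain = {improvement(0.20, 0.50):.2f}")
```

The ceiling 1.00 - BR shrinks as the BR rises above .50, while the gain actually reachable shrinks as the BR falls below it, which is why .50 is the most favorable case.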
Use of tests in context
Review use of clinical tests in Army and in industry.
Introduce utility -- value of actions. This is important for required paper on the use of the SAT.
There are 4 quads:
| Rejected, Would Succeed | Accepted, Succeed |
| Rejected, Would Fail    | Accepted, Fail    |
Everyone (employers, admissions people, applicants) agrees that it is good to accept people who succeed and reject people who would fail, and that it is bad to accept people who fail and reject people who would succeed. Applicants and policy makers disagree most about the value of the people who are rejected but would have succeeded. Businesses, and to some extent academic institutions, don't care much about those who were rejected but would have succeeded, except in spectacular cases (rejected for tenure, won Nobel Prize). Applicants care very much about this category. Setting a proper cutoff point (a reject or accept criterion for a test score) depends on your values about the outcomes.
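One way to see how values move the cutoff is to attach a number to each quadrant and compare expected utility across selection ratios. The sketch below is illustrative only: it assumes a standard bivariate normal model for test score and performance, and the two sets of utility values (EMPLOYER_VALUES, APPLICANT_VALUES) are hypothetical numbers chosen to differ only in how much a rejected would-be success matters. All names are made up for this example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def quadrants(r, base_rate, selection_ratio, steps=2000):
    """Proportion of applicants in each quadrant (steps must be even).

    Assumption: test score and performance are standard bivariate normal
    with correlation r, and the top scorers are accepted.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    x_cut = STD_NORMAL.inv_cdf(1.0 - selection_ratio)  # accept if score > x_cut
    y_cut = STD_NORMAL.inv_cdf(1.0 - base_rate)        # success if performance > y_cut
    scale = (1.0 - r * r) ** 0.5

    def integrand(x):
        return STD_NORMAL.pdf(x) * (1.0 - STD_NORMAL.cdf((y_cut - r * x) / scale))

    # Simpson's rule over the accepted region (12 sd above the cutoff).
    width = 12.0
    h = width / steps
    total = integrand(x_cut) + integrand(x_cut + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(x_cut + i * h)
    accepted_succeed = total * h / 3.0

    return {
        "accepted_succeed": accepted_succeed,
        "accepted_fail": selection_ratio - accepted_succeed,
        "rejected_would_succeed": base_rate - accepted_succeed,
        "rejected_would_fail": 1.0 - selection_ratio - base_rate + accepted_succeed,
    }


def expected_utility(proportions, values):
    """Average value per applicant, given a value for each quadrant."""
    return sum(proportions[name] * values[name] for name in proportions)


def best_selection_ratio(r, base_rate, values, candidates):
    """Candidate selection ratio with the highest expected utility."""
    return max(
        candidates,
        key=lambda sr: expected_utility(quadrants(r, base_rate, sr), values),
    )


# Hypothetical values: the employer is indifferent to rejected would-be successes.
EMPLOYER_VALUES = {
    "accepted_succeed": 1.0,
    "accepted_fail": -1.0,
    "rejected_would_succeed": 0.0,
    "rejected_would_fail": 1.0,
}
# Hypothetical values: rejecting a would-be success counts as a real loss.
APPLICANT_VALUES = dict(EMPLOYER_VALUES, rejected_would_succeed=-1.0)

CANDIDATES = [i / 20 for i in range(1, 20)]  # selection ratios .05, .10, ..., .95

if __name__ == "__main__":
    for label, values in (("employer", EMPLOYER_VALUES), ("applicant", APPLICANT_VALUES)):
        print(label, best_selection_ratio(0.30, 0.50, values, CANDIDATES))
```

Under these made-up values, with r = .30 and a base rate of .50, the value set that penalizes rejecting would-be successes favors a larger selection ratio (a more lenient cutoff) than the employer's value set does. The specific numbers matter less than the direction of the shift.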