Reliability
Consistency or Repeatability
Classical Test Theory
Assumption 1: X = T + E,
where X is an observed score,
T is a true score, and
E is an error score.
Assumption 2:
The correlation between true scores and error scores is zero.
Conclusion 1
The variance of test scores is the sum of the variance of true score and the variance of error.
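A short derivation shows why Conclusion 1 follows from the two assumptions. The sigma notation below is an assumption of this sketch and does not appear in the notes themselves.

```latex
\begin{align*}
\sigma^2_X &= \operatorname{Var}(T + E) \\
           &= \sigma^2_T + \sigma^2_E + 2\operatorname{Cov}(T, E) \\
           &= \sigma^2_T + \sigma^2_E
              \qquad \text{because } \operatorname{Cov}(T, E) = 0 \text{ by Assumption 2.}
\end{align*}
```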
| | Observed | True | Error |
| | 10 | 8 | 2 |
| | 8 | 9 | -1 |
| | . | . | . |
| | | . | . |
| | 10 | . | . |
| | 10 | 10 | 0 |
| Mean | 50 | 50 | 0 |
| Variance | 100 | 90 | 10 |
Conclusion 2
The correlation between parallel tests equals the ratio of true score variance to observed variance.
The correlation between parallel tests is an index of reliability.
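Conclusion 2 can be sketched as a derivation. The sketch assumes the usual definition of parallel tests, which the notes do not spell out: both tests share the same true score (X = T + E and X' = T + E'), have equal error variances, and have errors that are uncorrelated with each other and with T. The symbols X', E', sigma and rho are notation introduced here for illustration.

```latex
\begin{align*}
\operatorname{Cov}(X, X') &= \operatorname{Cov}(T + E,\; T + E') \\
  &= \sigma^2_T + \operatorname{Cov}(T, E') + \operatorname{Cov}(E, T) + \operatorname{Cov}(E, E') \\
  &= \sigma^2_T \\[4pt]
\rho_{XX'} &= \frac{\operatorname{Cov}(X, X')}{\sigma_X \, \sigma_{X'}}
            = \frac{\sigma^2_T}{\sigma^2_X}
  \qquad \text{because } \sigma_X = \sigma_{X'} \text{ for parallel tests.}
\end{align*}
```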
The trick in classical test theory is to use the correlation between two tests to estimate the amount of error in the tests.
The magic of classical test theory is that we can know the amount of variance of true scores and error scores without ever knowing the actual values of the true and error scores.
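The two conclusions can be checked with a small simulation. This is a minimal sketch, not a standard procedure: it assumes normally distributed scores and borrows the illustrative values from the table above (mean 50, true-score variance 90, observed variance 100, which by Conclusion 1 implies error variance 10). The function names, sample size, and seed are hypothetical choices made for the example.

```python
import math
import random


def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))


def simulate_parallel_tests(n_people=20000, mean=50.0,
                            true_var=90.0, error_var=10.0, seed=0):
    """Simulate X = T + E for two parallel forms sharing the same T.

    Errors are drawn independently of T and of each other, which is
    what Assumption 2 and the parallel-test definition require.
    """
    rng = random.Random(seed)
    true = [rng.gauss(mean, math.sqrt(true_var)) for _ in range(n_people)]
    form_a = [t + rng.gauss(0.0, math.sqrt(error_var)) for t in true]
    form_b = [t + rng.gauss(0.0, math.sqrt(error_var)) for t in true]
    return true, form_a, form_b


true, form_a, form_b = simulate_parallel_tests()

# Conclusion 2: the correlation between parallel forms should be close to
# the ratio of true-score variance to observed variance (90 / 100 here).
r_parallel = pearson_r(form_a, form_b)
variance_ratio = variance(true) / variance(form_a)

# The "trick": recover the error variance from observed scores alone,
# without looking at the true or error scores.
estimated_error_var = (1 - r_parallel) * variance(form_a)

print(f"r between parallel forms: {r_parallel:.3f}")
print(f"var(T) / var(X):          {variance_ratio:.3f}")
print(f"estimated error variance: {estimated_error_var:.2f}")
```

In practice the true scores are never available, so only `r_parallel` and `estimated_error_var` could be computed from real data; `variance_ratio` is shown only because the simulation knows T.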
Estimating Reliability
The major forms:
1. Test-retest. Give the same test to the same people at some later time, and compute the correlation between the two sets of scores. The time lag varies.
2. Alternate forms. Two forms of the test (form A, form B) are developed and given at the same time to a group of people. For example, I could make up two exams for tests and measures covering the same topic but with slightly different questions.
--> Some people also talk about alternate forms with delay, where there is a time lag and the items are different.
3. Internal consistency. One test is given once. The individual items themselves or else groups of items serve as their own alternate (mini) forms.
Split-half: cut the test in two and compute the correlation between the halves. Then correct for the shortened test with the Spearman-Brown prophecy formula.
Alpha: gives only one estimate. Theoretically, it estimates the expected correlation with a "randomly parallel" test.
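The two internal consistency estimates can be sketched in code. This is an illustration under stated assumptions rather than a definitive implementation: the split is odd versus even items (one of many possible splits), alpha uses the common form k/(k-1) * (1 - sum of item variances / variance of total scores), and the `scores` data (5 people by 4 items) is invented for the example.

```python
import math


def variance(xs):
    """Population variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / math.sqrt(variance(xs) * variance(ys))


def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when the test is lengthened by length_factor.

    With length_factor=2 this corrects a half-test correlation back up
    to the full test length.
    """
    return length_factor * r / (1 + (length_factor - 1) * r)


def split_half_reliability(item_scores):
    """Correlate odd-item and even-item half scores, then step up."""
    odd_half = [sum(row[0::2]) for row in item_scores]
    even_half = [sum(row[1::2]) for row in item_scores]
    return spearman_brown(pearson_r(odd_half, even_half))


def cronbach_alpha(item_scores):
    """Coefficient alpha from a people-by-items score matrix."""
    k = len(item_scores[0])
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: each row is one person, each column is one item.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

split_half = split_half_reliability(scores)
alpha = cronbach_alpha(scores)
print(f"split-half (Spearman-Brown corrected): {split_half:.3f}")
print(f"coefficient alpha:                     {alpha:.3f}")
```

A different way of splitting the items would generally give a different split-half value, whereas alpha gives a single value for the whole item set.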
Examples of traits measured and expectations of high or low estimates of reliability.
Internal Consistency
high: verbal analogies (a:b::c:d), mood
low: general knowledge (Trivial Pursuit: sports vs. geography), biodata
Alternate Forms
high: analogies, mood
low: simulations (assessment centers), biodata
Test-retest
high: analogies, general knowledge (Trivial Pursuit), biodata
low: mood
Types of error