Reliability

Consistency or Repeatability

Classical Test Theory

Assumption 1: X = T + E,

where X is the observed score, T is the true score, and E is the error.

Assumption 2:

The correlation between true scores and error scores is zero.

Conclusion 1

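The variance of observed test scores is the sum of the variance of the true scores and the variance of the errors: Var(X) = Var(T) + Var(E). This follows from the two assumptions: because T and E are uncorrelated, the covariance term in Var(T + E) drops out.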

            Observed    True    Error
            10          8       2
            8           9       -1
            .           .       .
            10          10      0
Mean        50          50      0
Variance    100         90
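The pattern in the table can be checked with simulated scores. The sketch below is only an illustration: the sample size, the seed, the normal distributions, and the error variance of 10 are assumptions chosen so that the numbers land near the table's observed variance of 100 and true variance of 90. It draws true scores and errors independently, adds them to get observed scores, and compares the variances.

```python
import random


def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance (divide by N).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def covariance(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


# Hypothetical population (assumed values, not from the notes' data):
# true scores with mean 50 and variance 90, errors with mean 0 and variance 10.
rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000
true_scores = [rng.gauss(50, 90 ** 0.5) for _ in range(n)]
errors = [rng.gauss(0, 10 ** 0.5) for _ in range(n)]  # drawn independently of T
observed = [t + e for t, e in zip(true_scores, errors)]  # Assumption 1: X = T + E

var_x = variance(observed)
var_t = variance(true_scores)
var_e = variance(errors)
cov_te = covariance(true_scores, errors)  # near zero, as in Assumption 2

print("Var(X)          =", round(var_x, 2))
print("Var(T) + Var(E) =", round(var_t + var_e, 2))
print("Cov(T, E)       =", round(cov_te, 2))
```

In any finite sample the covariance is only approximately zero, so Var(X) and Var(T) + Var(E) agree closely rather than exactly; the exact identity is Var(X) = Var(T) + Var(E) + 2 Cov(T, E).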

Conclusion 2

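The correlation between parallel tests (two tests that measure the same true scores with equal error variances) equals the ratio of true score variance to observed score variance:

r(X, X') = Var(T) / Var(X)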

The correlation between parallel tests is an index of reliability.

The trick in classical test theory is to use the correlation between two tests to estimate the amount of error in the tests.

The magic of classical test theory is that we can estimate the variance of the true scores and the variance of the error scores without ever knowing any individual's actual true score or error score.
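A small simulation can make this concrete. The sketch below rests on stated assumptions: the two "parallel forms" are built by adding independent errors with the same variance to the same true scores, and the variances (90 and 10), sample size, and seed are illustrative choices. The helper name decompose_variance is hypothetical. Only the two observed score lists are used to recover the true and error variances.

```python
import random


def pearson(a, b):
    # Pearson correlation between two equal-length lists of scores.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def decompose_variance(observed_variance, reliability):
    # Conclusion 2 rearranged: Var(T) = r * Var(X), and Var(E) is what is left.
    true_variance = reliability * observed_variance
    error_variance = observed_variance - true_variance
    return true_variance, error_variance


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 20000
# Hidden from the "researcher": true scores with variance 90 (assumed value).
true_scores = [rng.gauss(50, 90 ** 0.5) for _ in range(n)]
# Two parallel forms: same true scores, independent errors with variance 10.
form_1 = [t + rng.gauss(0, 10 ** 0.5) for t in true_scores]
form_2 = [t + rng.gauss(0, 10 ** 0.5) for t in true_scores]

# Everything below uses only the observed scores.
r_parallel = pearson(form_1, form_2)
m1 = sum(form_1) / n
var_form_1 = sum((x - m1) ** 2 for x in form_1) / n
est_true_var, est_error_var = decompose_variance(var_form_1, r_parallel)

print("reliability estimate    =", round(r_parallel, 3))
print("estimated true variance =", round(est_true_var, 1))
print("estimated error variance=", round(est_error_var, 1))
```

The estimates should come out close to the values used to generate the data, even though the estimation step never looks at true_scores or at the individual errors.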

Estimating Reliability

 The major forms:

1. Test-retest. Give the same test to the same people at some later time, and compute the correlation between the two sets of scores. The time lag varies.

2. Alternate forms. Two forms of the test (form A, form B) are developed and given at the same time to a group of people. For example, I could make up two exams for a tests and measures course that cover the same topics but with slightly different questions.

--> Some people also talk about alternate forms with delay, where there is a time lag and the items are different.

3. Internal consistency. One test is given once. The individual items themselves or else groups of items serve as their own alternate (mini) forms.

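Split-half: cut the test in two and compute the correlation between the two halves. Then correct for the shortened test length with the Spearman-Brown prophecy formula.

Alpha: gives a single estimate, rather than one that depends on how the test happens to be split. Theoretically, it estimates the expected correlation with a "randomly parallel" test.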
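The two internal consistency estimates can be sketched in a few lines. This is a minimal illustration, not a reference implementation: the item scores are made up, the function names are hypothetical, the split is an odd-even split (one of many possible splits), and alpha is computed with the usual coefficient alpha formula, k / (k - 1) * (1 - sum of item variances / variance of total scores).

```python
def mean(values):
    return sum(values) / len(values)


def sample_variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


def pearson(a, b):
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def spearman_brown(r, length_factor=2.0):
    # Predicted reliability of a test length_factor times as long as the
    # one whose reliability is r; a factor of 2 corrects a half-test correlation.
    return length_factor * r / (1.0 + (length_factor - 1.0) * r)


def split_half_reliability(scores):
    # scores: one row per person, one column per item.
    odd_half = [sum(row[0::2]) for row in scores]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in scores]  # items 2, 4, 6, ...
    return spearman_brown(pearson(odd_half, even_half))


def coefficient_alpha(scores):
    k = len(scores[0])  # number of items
    item_vars = [sample_variance([row[j] for row in scores]) for j in range(k)]
    total_var = sample_variance([sum(row) for row in scores])
    return k / (k - 1) * (1.0 - sum(item_vars) / total_var)


# Hypothetical data: 6 people answering 4 items on a 1-5 scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
    [4, 4, 5, 5],
]

print("split-half (odd-even, corrected):", round(split_half_reliability(scores), 3))
print("coefficient alpha:               ", round(coefficient_alpha(scores), 3))
```

A different split (say, first half versus second half) would generally give a different split-half estimate from the same data, whereas alpha gives one value for the whole item set.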

Examples of traits measured, and whether each type of reliability estimate is expected to be high or low:

Internal Consistency

high: verbal analogies (a:b::c:d), mood

low: general knowledge (Trivial Pursuit: sports vs. geography), biodata

Alternate Forms

high: analogies, mood

low: simulations (assessment centers), biodata

Test-retest

high: analogies, general knowledge (Trivial Pursuit), biodata

low: mood

Types of error