Factor Analysis

1. What is factor analysis? It is a statistical technique that boils a correlation matrix down into a few major pieces. It tells you which variables "go together." Factor analysis is really an assumed causal model. If we give a bunch of tests to a bunch of people, we can compute a correlation matrix that shows how highly related each test is to every other test. We assume that tests are correlated because they share one or more common causes: some underlying factor or factors. (A figure illustrating this model appears at the top of page 3.)
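
As a rough illustration of the common-cause idea, the sketch below simulates two tests that both depend on one unobserved factor. The path value of .9, the sample size, and the variable names are assumptions chosen for the example, not values taken from a real study.

```python
import numpy as np

rng = np.random.default_rng(0)
n_people = 100_000

# Hypothetical path (loading) from the common factor to each test.
path = 0.9

# One unobserved common factor, plus an independent unique part for each test.
factor = rng.standard_normal(n_people)
unique_sd = np.sqrt(1 - path**2)  # keeps each test at unit variance
test_a = path * factor + unique_sd * rng.standard_normal(n_people)
test_b = path * factor + unique_sd * rng.standard_normal(n_people)

# The tests never influence each other, yet they correlate
# because they share a cause.
observed_r = np.corrcoef(test_a, test_b)[0, 1]
print(round(observed_r, 2))  # should land close to path * path = 0.81
```

Under these assumptions the two tests correlate at about the product of their paths, even though neither test causes the other.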

Factor analysis is a statistical technique with several uses in test development and evaluation, including:

Here, of course, we are mainly interested in factor analysis as a tool for construct validation, and that means for theory testing.

 

2. Factor analysis estimates the paths from the factors to the observed variables. If the factors are not correlated, the paths represent the correlation between the factor and the variable. (See the bottom of page 3.)

                          Factor
Variable                  V     VR    MR
Vocab 1                   .9    0     0
Vocab 2                   .9    0     0
Verbal Reasoning 1        0     .9    0
Verbal Reasoning 2        0     .9    0
Mechanical Reasoning 1    0     0     .9
Mechanical Reasoning 2    0     0     .9

 

Factor analysis can also be used to estimate correlations among the factors. For our example, the factor correlation matrix would look like this:

Factor    V     VR    MR
V         1
VR        .7    1
MR        .6    .6    1
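
To see how the paths and the factor correlations work together, the sketch below computes the test correlations they imply for the example above. It assumes the standard common factor model, in which the implied correlation between two tests is loading times factor correlation times loading; the variable names are made up for the illustration.

```python
import numpy as np

tests = [
    "Vocab 1", "Vocab 2",
    "Verbal Reasoning 1", "Verbal Reasoning 2",
    "Mechanical Reasoning 1", "Mechanical Reasoning 2",
]

# Paths from the factors (columns: V, VR, MR) to the tests (rows).
loadings = np.array([
    [0.9, 0.0, 0.0],
    [0.9, 0.0, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0],
    [0.0, 0.0, 0.9],
    [0.0, 0.0, 0.9],
])

# Correlations among the factors V, VR, MR.
factor_corr = np.array([
    [1.0, 0.7, 0.6],
    [0.7, 1.0, 0.6],
    [0.6, 0.6, 1.0],
])

# Implied correlations among the tests under the common factor model.
implied = loadings @ factor_corr @ loadings.T
np.fill_diagonal(implied, 1.0)  # each test correlates 1 with itself

print(np.round(implied, 3))
```

Two tests on the same factor are implied to correlate .9 x .9 = .81, while Vocab 1 and Verbal Reasoning 1 are implied to correlate .9 x .7 x .9 = .567, because they are linked only through the correlation between their factors.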

Number of Factors

Factor analysis is sometimes called the psychometrician's Rorschach because unlike many statistical techniques, using factor analysis requires so many judgments. Before we can estimate the paths from the factors to the observed variables, we have to know how many factors there are. Generally, we have to guess, because factors cannot be directly observed. There are many ways to determine the number of factors. I will briefly describe two ways: (a) prior theoretical reasons and (b) the scree test.

Theoretical reasons. Sometimes you have ideas about the number of different things in a matrix based on test content. For example, if you have several tests which measure mathematical reasoning (e.g., geometry problems, algebra problems, maximum-minimum problems) and several tests which measure verbal reasoning (e.g., analogies, vocabulary, synonyms), you would expect two factors, one for math and one for verbal. The idea would be that an underlying math factor would explain the correlations among the math tests, and an underlying verbal factor would explain the correlations among the verbal tests.

Scree test. The scree test was developed by a guy named Cattell. "Scree" is a term from geology. The scree is the rubble at the bottom of a cliff. If you take a correlation matrix, you can decompose it into independent weighted combinations of the original variables (these combinations correspond to factors). Each combination will have some variance associated with it. The idea in the scree test is that if a factor is important, it will have a large variance. What you do is order the factors by variance, and plot the variance against the factor number. Then you keep the number of factors above the "elbow" in the plot. These are the important factors which account for the bulk of the correlations in the matrix. It's called a scree test because the graph looks a bit like where a cliff meets the plain. In looking at a cliff, you might want to decide where the cliff stops and the plain begins. With the scree test you see where the important factors stop and the unimportant ones start. What we can do is to create a plot of the eigenvalues (variances) against their serial order.

 

The scree plot might look something like this:

Now the geological analogy comes home. We look for the "elbow" and keep the number of factors above the elbow. On this graph, the elbow is either at 2 or 3, so we would keep one or two factors. The idea is that when there are substantive factors, the slope of the line will be steep. When the factors correspond to error or random numbers, the slope will be flat. The elbow is the place where we move from the good, substantive factors to the bad, error factors. This is analogous to moving from the cliff to the plain. The scree tells where the change takes place, and thus the term 'scree test.'
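
The eigenvalues behind a scree plot can be sketched as follows. The correlation matrix here is hypothetical: six tests in two groups of three, correlating .6 within a group and .2 between groups. The text-based "plot" is just a stand-in for a real graph.

```python
import numpy as np

# Hypothetical correlations: .6 within a group of tests, .2 between groups.
within, between = 0.6, 0.2
same_group = np.full((3, 3), within)
other_group = np.full((3, 3), between)
corr = np.block([[same_group, other_group],
                 [other_group, same_group]])
np.fill_diagonal(corr, 1.0)

# Eigenvalues of the correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(corr))[::-1]

# Crude text version of the scree plot: one row per factor number.
for number, value in enumerate(eigenvalues, start=1):
    bar = "*" * int(round(value * 10))
    print(f"{number}  {value:5.2f}  {bar}")
```

For this made-up matrix the eigenvalues come out at roughly 2.8, 1.6, and then four values of .4. The line drops steeply over the first two and is flat afterward, so the elbow is at 3 and we would keep two factors.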

 

Rotation

After you pick the number of factors you want, you tell the computer to estimate the paths from the factors to the observed variables. The computer will do this, and print a table for you. Usually the table that is printed is hard to interpret. The initial table of paths is called an unrotated solution. We would like to understand, label, or name the factors we have. It's easy to understand the factors if the observed variables only correlate highly with a single factor. If variables are correlated with multiple factors, then who knows what the factors are. Exercise in naming factors:

 

 

                                Factor
             Unrotated        Rotated          Messy
Variable     I       II       I       II       I       II
Geometry     .80     .80      .88     .03      .70     .70
Algebra      .83     .80      .83     .05      .85     .00
Max-Min      .80     .70      .70     .00      .60     .50
Analogies    .90    -.90      .08     .90      .70     .70
Vocab        .88    -.88      .10     .90      .20     .10
Synonyms     .85    -.80      .05     .85      .40     .40

 

Here the first factor (rotated) is about math, and the second is about verbal. The unrotated matrix tells sort of the same story, but all the variables appear correlated with the first factor, and the second factor contrasts the math and verbal tests. The messy factor pattern is hard to interpret. Something is wrong with this one. Rotation is a means of estimating factor loadings so that the factors are easily interpretable. Once upon a time rotation was done by hand, but now it's done by computer. There are several different methods a computer can use to rotate, but we will not be exploring them here. All of them attempt to make the results more interpretable.
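
For readers curious about what the computer is doing, here is a minimal sketch of one possible approach for two factors. It tries many rotation angles and keeps the one that maximizes the raw varimax criterion (the variance of the squared loadings within each factor). The loadings are hypothetical, and the simple grid search is an assumption made for illustration, not how any particular program does it.

```python
import numpy as np

# Hypothetical unrotated loadings: three math tests, then three verbal tests.
# Every test loads on factor I; factor II contrasts math with verbal.
unrotated = np.array([
    [0.6,  0.6],
    [0.6,  0.6],
    [0.6,  0.6],
    [0.6, -0.6],
    [0.6, -0.6],
    [0.6, -0.6],
])

def rotate(loadings, angle):
    """Rotate a two-factor loading matrix by the given angle in radians."""
    c, s = np.cos(angle), np.sin(angle)
    return loadings @ np.array([[c, -s], [s, c]])

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return np.var(loadings**2, axis=0).sum()

# Try every whole-degree angle from 0 to 90 and keep the best one.
angles = np.linspace(0, np.pi / 2, 91)
best_angle = max(angles, key=lambda a: varimax_criterion(rotate(unrotated, a)))
rotated = rotate(unrotated, best_angle)

print(round(float(np.degrees(best_angle))))
print(np.round(rotated, 2))
```

With these made-up loadings the best angle is 45 degrees. After rotation each test loads about .85 (in absolute value) on one factor and about zero on the other. The sign of a factor is arbitrary, so a column of negative loadings can simply be reflected. Rotation does not change how much of each test the factors explain: the sum of squared loadings in each row is the same before and after.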