
Factor Analysis

Factor analysis is a statistical technique with several uses in test development and evaluation, including item analysis (building internally consistent subscales) and scale or construct validation; both uses are described at the end of these notes.

Basically, factor analysis tells us what variables group or go together. Factor analysis boils down a correlation matrix into a few major pieces so that the variables within the pieces are more highly correlated with each other than with variables in the other pieces. Factor analysis is actually a causal model. We assume that observed variables are correlated or go together because they share one or more underlying causes. The underlying causes are called factors.

Basic Concepts

The following page shows two examples of diagrams that represent relations between observed variables, factors, and errors. Looking at the top of the page, we can see three factors (F1, F2, and F3). The factors are not observed and are represented by circles. The factors may be correlated; correlations are represented by curved arrows. The straight arrows represent regression coefficients, or in the case of uncorrelated factors, correlation coefficients. The factors are assumed to be causes of the observed variables. Causal direction is represented by a straight arrow with a single head; the head of the arrow indicates the direction of cause to effect. In the diagram, the first factor (F1) causes variation in the first observed variable (X1). The variance in X1 not attributable to the factors is an error term called a uniqueness. (Technical detail: a uniqueness is made up of all the variance in an observed variable not predicted by the factors. This includes classical test theory unreliability plus any true score variance not predictable by the factors.) The uniqueness is represented in the diagram by e1.

In the second example (bottom half of the page), we have made the diagram more concrete. Suppose we have three underlying cognitive abilities: Vocabulary (V; which word means the same thing as pristine?), Verbal Reasoning (VR; if all rabbits are animals and all animals are brown, are all rabbits brown?), and Mechanical Reasoning (MR; e.g., the Bennett Mechanical Comprehension Test, with questions about the smoothness of a ride on a bus, the action of gears and pulleys, levers and planes, and so forth). Correlations among the factors are .7 between the two verbal factors and .6 from each verbal factor to the mechanical reasoning factor. The causal arrows from the factors to the observed variables show regression coefficients equal to .9, which is large assuming that the factors and observed variables are in standard score form (mean 0, variance 1). The errors are small (all equal to .19, that is, 1 - .81) under the same assumptions. Note that there are two tests for each factor.

 

 

It is customary to collect the regression or correlation coefficients that relate the factors to the observed variables into a table. This table is said to contain Factor Loadings; it is sometimes called the Factor Pattern Matrix. The correlations among the factors are also collected into a table, which is said to contain Factor Correlations or to be called the Factor Correlation Matrix.

The regression of observed variables onto the factors can be made more explicit by writing a linear model for each variable, thus:

X1 = b11*F1 + b12*F2 + b13*F3 + e1
X2 = b21*F1 + b22*F2 + b23*F3 + e2
(and so on, one equation for each observed variable)

The regression coefficients are factor loadings, and they are collected into a Factor loading or Factor Pattern Matrix. The factor pattern matrix from the bottom example would look like this:

Factor Pattern Matrix

Variable                      Factor
                           V     VR    MR
Vocab 1                   .9      0     0
Vocab 2                   .9      0     0
Verbal Reasoning 1         0     .9     0
Verbal Reasoning 2         0     .9     0
Mechanical Reasoning 1     0      0    .9
Mechanical Reasoning 2     0      0    .9


The factor pattern matrix is also sometimes called F, LX, or Λ (lambda).

The factor correlation matrix would look like this:

Factor Correlation Matrix

Factor     V     VR    MR
V          1
VR        .7      1
MR        .6     .6     1

The factor correlation matrix is also known as Φ (phi), and when the factors are not correlated, the factor correlation matrix will be an identity matrix, I.

Questions about the basic concepts?

Primary Factor Equations

The equations for the observed variables were shown earlier. Matrix equations are used to show the computation of the correlation matrix or covariance matrix, that is, the relations among the observed variables given the factors. In classical descriptions of factor analysis, when the factors are orthogonal (uncorrelated), the primary equation is:

R = F F' + U2

where R is a population correlation matrix, F is a factor pattern matrix (F' is its transpose), and U2 is a diagonal matrix of uniqueness values. That is, U2 is a square symmetric matrix that has uniqueness values on its main diagonal and zeros elsewhere.

R is of order [# observed variables, # observed variables],

F is of order [# observed variables, # factors],

and U2 is of order [# observed variables, # observed variables].

With correlated (oblique) factors, we often use notation found in Linear Structural Relations (LISREL). The same equation is then written:

Σ = Λ Φ Λ' + Θ

where Σ has the same meaning as R, the population correlation matrix,

Λ has the same meaning as F, the factor pattern matrix,

Φ is the factor correlation matrix, already defined, and

Θ has the same meaning as U2.

In our example, the equation might look something like this:

 

 

Σ (Population Correlation Matrix)

        V1     V2     VR1    VR2    MR1    MR2
V1       1
V2      .81     1
VR1     .57    .57     1
VR2     .57    .57    .81     1
MR1     .49    .49    .49    .49     1
MR2     .49    .49    .49    .49    .81     1

equals

Λ * Φ * Λ' + Θ

Λ (factor pattern)     Φ (factor correlations)   Λ' (pattern transposed)         Θ (uniquenesses)
.9    0    0           1    .7   .6               .9   .9    0    0    0    0    .2    0    0    0    0    0
.9    0    0           .7   1    .6                0    0   .9   .9    0    0     0   .2    0    0    0    0
 0   .9    0           .6   .6   1                 0    0    0    0   .9   .9     0    0   .2    0    0    0
 0   .9    0                                                                      0    0    0   .2    0    0
 0    0   .9                                                                      0    0    0    0   .2    0
 0    0   .9                                                                      0    0    0    0    0   .2

(Note. The entries on the diagonal of Θ should actually be .19, but that creates format problems.)
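The matrix arithmetic above is easy to check numerically. Below is a minimal sketch (not part of the original notes) in Python with numpy that plugs the example loadings, factor correlations, and uniquenesses into Λ Φ Λ' + Θ and reproduces the population correlation matrix.

```python
import numpy as np

# Loadings (Lambda), factor correlations (Phi), and uniquenesses (Theta)
# taken from the worked example above.
Lam = np.array([[.9, 0, 0],
                [.9, 0, 0],
                [0, .9, 0],
                [0, .9, 0],
                [0, 0, .9],
                [0, 0, .9]])
Phi = np.array([[1.0, 0.7, 0.6],
                [0.7, 1.0, 0.6],
                [0.6, 0.6, 1.0]])
Theta = np.diag([0.19] * 6)          # 1 - .9**2 = .19 for each variable

Sigma = Lam @ Phi @ Lam.T + Theta    # Lambda * Phi * Lambda' + Theta
print(np.round(Sigma, 2))
# Off-diagonal entries come out to .81 (same factor), .57 (.9*.7*.9),
# and .49 (.9*.6*.9), matching the correlation matrix shown above.
```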

The population parameters are generally unknown, and are therefore estimated from sample data using the following series of steps:

  1. Collect data and compute the sample correlation matrix (estimate R or Σ).
  2. Subtract error from the main diagonal (subtract U2 or Θ).
  3. Decompose the resulting matrix into k factors so that either

R - U2 ≈ F F' (uncorrelated factors)

or

Σ - Θ ≈ Λ Φ Λ' (correlated factors).

Decisions, decisions.

Factor analysis is sometimes called the psychometrician's Rorschach because it is something of an art form in which lots of decisions have to be made. According to some, you can read most anything you want into the results of a factor analysis. Just how much you can bend the results to meet your expectations or desires is debatable. However, there is no doubt that several questions need to be answered before you can finish an analysis.

1. Communalities and uniquenesses. Before we can decompose the reduced matrix, that is (R - U2), we have to know what the errors or uniquenesses are (we have to know U2). The difference between 1 and the uniqueness is the proportion of variance in the observed variable due to the factors. This proportion is called the communality. Communalities and uniquenesses sum to one, so if we know one of them, we can find the other by subtraction. There are several different ways to estimate communalities. One way is to assume that they are all equal to 1.0, that is, that all the variables are completely predicted by the factors. This kind of analysis has a special name, principal components analysis. Principal components analysis is not actually factor analysis. Philosophically, principal components analysis is done for data reduction rather than to explain observed variables with underlying factors. (For more on this, take a course on factor analysis.) You should generally avoid principal components analysis if you want to do factor analysis.

A second method for estimating communalities is to use the squared multiple correlation (SMC) as an initial estimate. What you do is take the first variable and treat it as a dependent variable in a regression equation in which all the other variables are independent variables. You find the R-square for this equation (that is, the squared multiple correlation) and use it as the communality estimate. For the second variable, you treat it as the dependent variable, and the first variable, the third variable, and all the others except the second are the independent variables. Compute R-square for this equation and you have the second communality. You do this once for each observed variable. This method is called principal axis or principal axes factor analysis. This is the method I usually use. There are other methods for estimating communalities, but you need a factor analysis class for these.

When we start the analysis, we will be using SMCs as communality estimates. These are called initial communality estimates. After the factor analysis is done, we can compute the actual communality values that result. These will be our final communality estimates. (A small code sketch of this procedure appears below.)
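To make the procedure concrete, here is a minimal sketch (not part of the original notes; the function names are made up for illustration) of principal axis factoring in Python/numpy: squared multiple correlations serve as initial communality estimates, the reduced matrix is decomposed, and the communalities are re-estimated until they stabilize.

```python
import numpy as np

def smc(R):
    """Squared multiple correlation of each variable with all the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def principal_axis(R, k, iters=100, tol=1e-6):
    """Iterated principal axis factoring (hypothetical helper, for illustration).

    Uses SMCs as initial communality estimates, decomposes the reduced
    correlation matrix into k factors, and iterates until the communalities
    stop changing. Returns unrotated loadings and final communalities.
    """
    R = np.asarray(R, dtype=float)
    h2 = smc(R)                                # initial communality estimates
    for _ in range(iters):
        Rr = R.copy()
        np.fill_diagonal(Rr, h2)               # reduced matrix: R - U2 on the diagonal
        vals, vecs = np.linalg.eigh(Rr)
        keep = np.argsort(vals)[::-1][:k]      # k largest eigenvalues
        loadings = vecs[:, keep] * np.sqrt(np.clip(vals[keep], 0, None))
        h2_new = (loadings ** 2).sum(axis=1)   # updated (final) communalities
        if np.max(np.abs(h2_new - h2)) < tol:
            h2 = h2_new
            break
        h2 = h2_new
    return loadings, h2
```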

2. The number of factors. Before we can decompose the matrix into k factors, we have to know or to decide what the number k should be. There are several different approaches to doing this:

1. Prior theoretical notions. This would include both strong and weak theory. For example, we might have a personality theory that suggests three factors (id, ego, superego). Or we might have written items to tap four different facets of job satisfaction (work, pay, coworkers, supervision).

2. The scree test was invented by a fellow named Cattell. Scree is a term from geology: scree is the rubble at the bottom of a cliff, where the cliff and the plain meet. Now, in factor analysis we extract a number of factors, and the factors are related to the observed variables by sets of regression weights. We apply the sets of weights to create a series of composites. Each composite corresponds to a factor, and each composite has a variance associated with it. The variance associated with the composite formed by a factor is called an eigenvalue. Factor analysis finds factors that have a certain pattern of eigenvalues. The first factor has the largest possible eigenvalue, that is, the biggest composite variance (subject to some constraints you don't need to know about here). The second factor will have the largest possible variance subject to being uncorrelated with the first one. The third will have the largest possible variance subject to being uncorrelated with the first two, and so forth; each subsequent factor has the maximum variance subject to being uncorrelated with the earlier factors. What we can do is create a plot of the eigenvalues (variances) against their serial order. The plot might look something like this:

[Scree plot: eigenvalues plotted against factor number.]

Now the geological analogy comes home. We look for the "elbow" and keep the number of factors above the elbow. On this graph, the elbow is either at 2 or 3, so we would keep one or two factors. The idea is that when there are substantive factors, the slope of the line will be steep. When the factors correspond to error or random numbers, the slope will be flat. The elbow is the place where we move from the good, substantive factors to the bad, error factors. This is analogous to moving from the cliff to the plain. The scree tells where the change takes place, and thus the term 'scree test.' (A code sketch of the scree plot appears after this list.)

3. Parallel Analysis. Parallel analysis is based on a Monte Carlo study done by Montanelli & Humphreys (1976). What they did was generate random numbers to produce correlation matrices for various numbers of variables and people. They then factor analyzed the resulting correlation matrices. They did this over and over until they had a sampling distribution of scree plots. They took the means of the scree plots, so we know what to expect for random data with NO FACTORS but a given sample size and number of variables. We can compare our sample results to their no-factor results. If our sample eigenvalues are larger than theirs, we probably have a real factor. On the other hand, if ours are smaller, we have a factor smaller than that expected from random numbers, and it is almost certainly due to error. I will show you how to do this in greater detail later. Parallel analysis tends to overfactor, so it puts an upper limit on the number of factors to extract. (A simulation sketch of parallel analysis appears after this list.)

4. Interpretability (naming of factors). In this approach, several different numbers of factors are extracted, and the final solution that is kept is the one that makes the most sense. For example, if you select 6 factors and only 4 make sense, you might try 4. If 4 still make sense, you might keep 4.

5. Keep the number of factors that have eigenvalues greater than 1.0. This is not a good rule, so don't use it. You will see some people doing this, mostly in older articles. The idea was that each input variable had an input variance of 1.0 (the diagonal of the correlation matrix), so a factor ought to have at least this much variance to be kept.
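Here is a small sketch (not part of the original notes) of the scree plot and a simple form of parallel analysis in Python, using numpy and matplotlib. Instead of the Montanelli and Humphreys regression equations, it simulates random data directly with the same number of people and variables, which is a common substitute; the function name is made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def scree_and_parallel(X, n_sims=100, seed=0):
    """Plot observed eigenvalues against mean eigenvalues from random data."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    obs = np.sort(np.linalg.eigvalsh(np.corrcoef(X, rowvar=False)))[::-1]
    rand = np.zeros(p)
    for _ in range(n_sims):
        Z = rng.standard_normal((n, p))        # random data: NO factors
        rand += np.sort(np.linalg.eigvalsh(np.corrcoef(Z, rowvar=False)))[::-1]
    rand /= n_sims
    factors = np.arange(1, p + 1)
    plt.plot(factors, obs, "o-", label="observed eigenvalues (scree)")
    plt.plot(factors, rand, "s--", label="random-data eigenvalues (parallel)")
    plt.xlabel("Factor number")
    plt.ylabel("Eigenvalue")
    plt.legend()
    plt.show()
    # Keep factors whose observed eigenvalue exceeds the random-data eigenvalue.
    return int(np.sum(obs > rand))
```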

3. Rotation

In garden variety exploratory factor analysis, we get an initial factor pattern matrix. This factor pattern matrix shows the regression weights or correlations between the factors and observed variables. The entries in the matrix correspond to the eigenvalues; that is, the entries are the weights that result in the first eigenvalue being the largest variance, the second being the largest variance uncorrelated with the first, and so on. (Technical note: the columns of the factor pattern are called eigenvectors; they are the sets of weights that create the composites that result in the different eigenvalues.) The initial factor pattern matrix is unrotated. Unfortunately, the unrotated matrix is usually hard to interpret, so several different methods of rotation have been developed to make interpretation easier. There are two main classes of rotation, orthogonal and oblique. Orthogonal rotations require that the factors remain uncorrelated; oblique rotations allow the factors to become correlated. In all cases, interpretation is easiest if we achieve what is called simple structure. In simple structure, each variable is highly associated with one and only one factor. If that is the case, we can name the factors for the observed variables highly associated with them.

For example (loadings on Factors I and II for three solutions):

Variable        Unrotated        Rotated          Messy
                I       II       I       II       I       II
Geometry       .80     .80      .88     .03      .70     .70
Algebra        .83     .80      .83     .05      .85     .00
Max-Min        .80     .70      .70     .00      .60     .50
Analogies      .90    -.90      .08     .90      .70     .70
Vocab          .88    -.88      .10     .90      .20     .10
Synonyms       .85    -.80      .05     .85      .40     .40

In this example, the rotated factor pattern matrix shows simple structure. Note that each variable loads (correlates) highly with one and only one factor. We can see that the first factor has to do with mathematical variables (geometry, algebra, and maximum-minimum problems) and the second factor has to do with verbal variables (analogies, vocabulary, and synonyms). The unrotated and rotated matrices tell the same story, but the rotated matrix is easier to understand. The messy matrix is hard to interpret. What might the factors be? Algebra correlates highly with only the first factor, but geometry correlates highly with both factors, as does analogies. Vocabulary, on the other hand, correlates highly with neither factor.

1. Orthogonal rotation. In the old days, factor analysis was computed by hand. My major professor did a factor analysis for his dissertation. I believe that the computations took 6 months of full time work. Two people worked with calculators; at each step of the way, they would compare computations and proceed if the numbers agreed. Otherwise, they would have to compute the step over again. They used graphical hand rotation to find simple structure. (Technical note: There is a very nice geometric interpretation of factor analysis in which factors are axes in multidimensional space. Read Nunnally.) Now there is one analytical algorithm that everybody uses to do orthogonal rotation: VARIMAX. Varimax is available in all the popular statistical software programs.

2. Oblique rotation. Unlike orthogonal rotation, in which a clear winner has emerged, there are several different oblique rotation programs in use. In my opinion, there is little reason to prefer one to another. However, I have a favorite that I will show you in SAS. It is called PROMAX. Promax begins with a VARIMAX rotation so that simple structure is achieved as well as it can be with orthogonal factors. Then a target matrix is computed. The target matrix is found by raising all the elements in each column to a power and then dividing through by the largest element in the column. This has the effect of making the larger loadings closer to 1.0 and the smaller loadings closer to 0, and thus closer to simple structure. Then the unrotated factor pattern matrix is brought into maximum congruence with the target matrix while allowing the factors to become correlated. This is done with a Procrustes rotation, another term I believe comes from Cattell. Procrustes was an innkeeper in Greek legend who claimed that everyone fit his bed. Those who were initially too short were stretched with a rack until they fit nicely; those who were initially too tall were shortened with a blade. A rather gruesome notion, but I think you get the idea: the rotation will try to best fit the target. (A small sketch of the varimax and promax target steps appears below.)
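For concreteness, here is a brief sketch (not part of the original notes; the function names are made up) of a varimax rotation and of the promax target-matrix step just described, in Python/numpy. The varimax function is the standard Kaiser criterion computed with repeated singular value decompositions; the promax function only builds the target matrix (raise the varimax loadings to a power and scale each column), not the full Procrustes fit.

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal (varimax) rotation of a loading matrix A."""
    p, k = A.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = A @ R
        u, s, vt = np.linalg.svd(
            A.T @ (L ** 3 - (gamma / p) * L @ np.diag((L ** 2).sum(axis=0)))
        )
        R = u @ vt
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:   # converged
            break
    return A @ R

def promax_target(A_varimax, power=4):
    """Promax target: raise each loading to a power (keeping its sign) and
    divide each column by its largest element, as described in the text."""
    T = np.sign(A_varimax) * np.abs(A_varimax) ** power
    return T / np.abs(T).max(axis=0)
```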

Uses of Factor Analysis in Scale Development and Validation

1. Item analysis. Factor analysis can be used to create subscales of items in a test. For example, in a job satisfaction scale, we might find several different factors corresponding to satisfaction with the work itself, supervision, pay and so forth. We could use the analysis to delete items based on the following criteria:

1. Low final communality (fails to load highly on any factor).

2. Small loading on proper factor (e.g., an item from the work scale doesn't load on the work factor).

3. Large loadings on the wrong factor (e.g., an item from the work scale loads highly on the supervision factor).

Some people advise us to avoid using factor analysis on items, for several reasons. One reason is that we often get factors that correspond to characteristics of the distribution of responses rather than to content. For example, we may get factors that correspond to easy and hard items. We may get factors of positive and negative items just because a few people missed the NOT in some of the negative items. Another reason is that the distributions of responses and errors cannot be normally distributed (even approximately) with variables that have only 9 or fewer possible values. This matters if we are going to use maximum likelihood estimates or significance tests. In my opinion, you certainly have to watch out for bogus factors. However, when the factors correspond to meaningful content differences, factor analysis is a very powerful tool for creating multiple scales with high internal consistency and good discriminant validity. High internal consistency will result if you choose items that all have high factor loadings on the same factor (there is a mathematical relation between the loadings and alpha; see the sketch below). If you delete items that load on the wrong factor, you promote discriminant validity.
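As a small illustration of the relation between loadings and alpha mentioned above: under a one-factor model with standardized items, the implied correlation between two items is the product of their loadings, so standardized alpha can be computed directly from the loadings. A sketch (not part of the original notes; the function name is made up):

```python
import numpy as np

def alpha_from_loadings(loadings):
    """Standardized alpha implied by a one-factor model.

    Assumes standardized items whose shared variance comes from a single
    common factor, so the implied correlation r_ij = l_i * l_j for i != j.
    """
    l = np.asarray(loadings, dtype=float)
    k = len(l)
    R = np.outer(l, l)                                  # implied correlations
    rbar = (R.sum() - np.trace(R)) / (k * (k - 1))      # mean off-diagonal r
    return k * rbar / (1 + (k - 1) * rbar)

# Items with uniformly high loadings on one factor yield high alpha:
print(round(alpha_from_loadings([.8, .8, .8, .8]), 2))  # about 0.88
print(round(alpha_from_loadings([.4, .4, .4, .4]), 2))  # about 0.43
```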

2. Scale validation. When we have developed tests, we can factor analyze a series of tests to see whether they conform to the expected pattern of relations. This is, of course, relevant to construct validation. We expect that tests that purport to measure the same construct will load on the same factor, and that different factors will emerge for different constructs.

Lots of people have factor analyzed multitrait-multimethod (MTMM) matrices. A matrix that conforms to the Campbell and Fiske criteria will show factors that correspond to traits. Method variance will show up as method factors. Messy factors correspond to other measurement problems.