Confirmatory Factor Analysis
A philosophical difference
In theory, exploratory and confirmatory factor analyses can be thought of as two ends of a spectrum. In exploratory analysis, one is trying to make sense of the data. How many factors are there? What are they? Exploratory analysis can be thought of as a technique for data reduction. Holy cow, we've got all these data, and now what the heck does all this stuff mean? In confirmatory analysis, one tests hypotheses corresponding to prior theoretical notions. These can concern the number and nature of the factors, but can also be much more complex, such as the equality of factor pattern matrices across populations.
In confirmatory analysis, we think we already know what the measures mean, and we want to test propositions, such as whether the factor structure of a job satisfaction measure is the same in English speaking Texas as it is in Spanish speaking Mexico.
In practice, what gets done is often somewhere in between the two ends of the spectrum. In exploratory analysis, we often have an idea of how many factors we expect. In confirmatory analysis, it is nearly inevitable that our data will falsify our models (that is, our model is not going to be exactly true in the population, and will be proved false if only the sample size is large enough).
A difference in loss functions
A loss function is a statistical term for a function that explicitly defines the goodness (or badness) of a solution to a problem. If we were to estimate the mean of a distribution, we might consider the absolute error of prediction (the larger, the worse), the squared error of prediction (what you get taught in graduate school), some threshold within which errors count as acceptable and beyond which they are all equally unacceptable (a miss by an inch is as bad as a miss by a mile), functions where it is better to guess too high than too low, and many others.
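As a toy illustration (made-up numbers, not from the text), different loss functions applied to the same data pick out different "best" estimates of the center of a distribution:

```python
import numpy as np

# Toy data with one high value to separate the two answers.
data = np.array([1.0, 2.0, 2.0, 3.0, 10.0])

def squared_loss(estimate):
    # Sum of squared errors: heavily penalizes large misses.
    return np.sum((data - estimate) ** 2)

def absolute_loss(estimate):
    # Sum of absolute errors: penalizes misses in proportion.
    return np.sum(np.abs(data - estimate))

# Scan a grid of candidate estimates and keep the best under each loss.
candidates = np.linspace(0, 10, 1001)
best_sq = candidates[np.argmin([squared_loss(c) for c in candidates])]
best_abs = candidates[np.argmin([absolute_loss(c) for c in candidates])]

print(best_sq)   # the mean of the data (3.6)
print(best_abs)  # the median of the data (2.0)
```

Squared loss leads to the mean; absolute loss leads to the median. Same data, different loss function, different "best" answer.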
The parameters in exploratory factor analysis (factor pattern, uniqueness) are typically estimated by least squares (ordinary least squares, OLS). Least squares is simple (for mathematicians) to work with and has some desirable properties. For example, you always get a solution to the problem. The solution has the property that it minimizes the sum of squared errors in the sample used to estimate the parameters. Parameter estimates are often unbiased.
The parameters in confirmatory factor analysis (factor pattern, factor correlations, uniquenesses) are typically estimated by maximum likelihood (ML). Statisticians like ML estimates because they are consistent and efficient (they are asymptotically unbiased and converge more quickly to their population values than most other estimators if the hypothesized model is true). ML estimators also allow statistical tests of hypotheses that cannot be obtained with OLS. That is, with ML estimators, we can test the hypothesis that a specific two-factor solution could be true in a population. No such test is available with OLS. On the other hand, ML does not always provide a solution to the problem. Sometimes it wanders off and never comes back. Sometimes it estimates correlations to be 672.23. This can be something of a problem.
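As a sketch, the discrepancy function that ML covariance-structure programs minimize can be written F_ML = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, where S is the sample matrix, Σ is the model-implied matrix, and p is the number of observed variables. It is zero exactly when the model reproduces the sample matrix:

```python
import numpy as np

def f_ml(S, Sigma):
    # Standard ML discrepancy function for covariance structure models:
    # F_ML = ln|Sigma| + tr(S Sigma^-1) - ln|S| - p
    p = S.shape[0]
    _, logdet_S = np.linalg.slogdet(S)
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - p

# A 2x2 sample correlation matrix (r = .56, as in the text's example).
S = np.array([[1.0, 0.56],
              [0.56, 1.0]])

print(f_ml(S, S))          # ~0: a model that reproduces S exactly
print(f_ml(S, np.eye(2)))  # > 0: an independence model fits worse
```

Minimizing F_ML is what can misbehave: if the surface has no sensible minimum for the hypothesized model, the iterations "wander off."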
The identification problem
A solution is said to be identified when applying the loss function will result in a single, unique, best fitting solution. If more than one solution exists that fits just as well, the problem is said to be unidentified (or sometimes underidentified). In both exploratory and confirmatory analysis, the researcher (this means you) must decide on the number of factors to solve for. Then the program begins to estimate for you the elements of the factor pattern matrix. It turns out that in the case of two variables that share a single underlying cause (factor) the expected correlation is equal to the product of the factor loadings. Consider the following example:
Assume that the correlation between X1 and X2 is .56. A factor pattern in which the first factor loading is .7 and the second is .8 will solve the problem perfectly by either OLS or ML, because .7 × .8 = .56. But note that a solution in which the first loading is .8 and the second is .7 will also satisfy the problem perfectly. Because there is more than one equally good solution to the problem, the problem is said to be unidentified (that is, no single best solution can be produced). A fair amount of work is supposed to go into most confirmatory factor analyses to make sure that the problem is identified before using ML to estimate the parameters.
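A few lines of Python make the point concrete (the numbers come from the example above):

```python
# Under a single common factor, the expected correlation between two
# indicators is the product of their loadings.  Both orderings of the
# loadings reproduce r12 = .56 equally well, so no loss function can
# prefer one over the other: the model is unidentified.
r12 = 0.56

for lam1, lam2 in [(0.7, 0.8), (0.8, 0.7)]:
    implied = lam1 * lam2
    print((lam1, lam2), abs(implied - r12) < 1e-12)  # True for both
```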
When to use which
Exploratory analysis is to my mind generally preferable to confirmatory analysis, but my view is somewhat controversial. In scale development, confirmatory analysis will show that your hypothesized model does not fit very well. A statistical test will virtually always reject the very model you hoped to confirm. Unfortunately, the confirmatory programs will offer you little help in producing a better representation of your data. Exploratory analysis, on the other hand, is supposed to help you make sense of the data, and this is typically where you are in scale development.
Confirmatory techniques work best when you have measures that have been carefully developed and have been subjected to (and survived) prior exploratory analyses.
In scale development, you need to worry about difficulty factors emerging in your data.
Pseudo factor approach
By difficulty
At random
Tests in confirmatory factor analysis
Global model fit (chi-square, GFI, etc.)
Sample LISREL Analysis of the MTMM
Input for LISREL
Input Correlation Matrix

        CRA   CRG   CRD   SRA   SRG   SRD   PTA   PTG   PTD
CRA    1
CRG    .41   1
CRD    .44   .45   1
SRA    .28   .21   .24   1
SRG    .20   .30   .23   .44   1
SRD    .19   .19   .29   .42   .40   1
PTA    .29   .22   .25   .33   .22   .21   1
PTG    .24   .35   .26   .25   .39   .22   .48   1
PTD    .20   .20   .31   .22   .21   .26   .42   .46   1
The LX or Λ matrix

        _____Traits_____    ____Methods____
        A     G     D       CR    SR    PT
CRA     x     0     0       x     0     0
CRG     0     x     0       x     0     0
CRD     0     0     x       x     0     0
SRA     x     0     0       0     x     0
SRG     0     x     0       0     x     0
SRD     0     0     x       0     x     0
PTA     x     0     0       0     0     x
PTG     0     x     0       0     0     x
PTD     0     0     x       0     0     x
The input correlation matrix contains the data to be analyzed by LISREL. Strictly speaking, for the significance tests to be meaningful, the input matrix should be a covariance matrix rather than a correlation matrix. This particular matrix is a multitrait-multimethod (MTMM) matrix corresponding to the path diagram for three methods of evaluating anger, guilt, and depression.
The next thing LISREL requires is the model to analyze, that is, the hypothesized constraints and the parameters to be estimated. The CFA can be programmed in several ways; the best is to treat the measures as exogenous variables. Three matrices of parameters will be used.
LX or lambda X
is the factor pattern matrix. It shows the regression coefficients that relate the factors (shown in the columns) to the observed variables (shown in the rows). In this example, we have 6 factors. The first three correspond to the traits anger (A), guilt (G), and depression (D). The second three correspond to the measurement methods: clinical ratings (CR), self ratings (SR), and psychological tests (PT). An x represents a parameter to be estimated; the other entries are fixed at zero and not estimated. The pattern matrix can be modified to fit almost any CFA hypothesis.
PH or Φ

        A     G     D     CR    SR    PT
A       1
G       x     1
D       x     x     1
CR      0     0     0     1
SR      0     0     0     x     1
PT      0     0     0     x     x     1
TD or Θδ

        CRA   CRG   CRD   SRA   SRG   SRD   PTA   PTG   PTD
CRA     x
CRG     0     x
CRD     0     0     x
SRA     0     0     0     x
SRG     0     0     0     0     x
SRD     0     0     0     0     0     x
PTA     0     0     0     0     0     0     x
PTG     0     0     0     0     0     0     0     x
PTD     0     0     0     0     0     0     0     0     x
Reduced Correlation Matrix

        CRA   CRG   CRD   SRA   SRG   SRD   PTA   PTG   PTD
CRA     .48
CRG     .41   .50
CRD     .44   .45   .59
SRA     .28   .21   .24   .54
SRG     .20   .30   .23   .44   .56
SRD     .19   .19   .29   .42   .40   .44
PTA     .29   .22   .25   .33   .22   .21   .54
PTG     .24   .35   .26   .25   .39   .22   .48   .70
PTD     .20   .20   .31   .22   .21   .26   .42   .46   .48
PH or phi is the factor correlation matrix. It shows the correlations of the factors with one another. Note that in this example, the trait correlations are estimated and the method correlations are estimated, but traits and methods are uncorrelated because the relevant correlations are fixed at zero. To identify the scale of the factors, either one element in each column of LX must be fixed at 1.0, or the main diagonal of PH must be fixed at 1.0. I use the second approach because it makes PH a correlation matrix instead of a covariance matrix and, for some unknown reason, it tends to be subject to fewer estimation problems.
TD or theta-delta is the matrix of uniquenesses. Note that the matrix is diagonal; that is, zeros appear everywhere except on the main diagonal. Each uniqueness is equal to 1 minus the corresponding communality.
The reduced correlation matrix is found by doing a little matrix multiplication, that is:
R − Θδ = ΛΦΛ′
This means that if we take the factor pattern matrix, multiply by the factor correlation matrix and multiply by the factor pattern matrix transposed, we get the correlation matrix with communalities on the main diagonal.
This matrix is used in OLS factor analysis, not in LISREL. It is shown here for educational reasons.
If we were to add the uniqueness matrix to the reduced correlation matrix, we would recover an observed correlation matrix with unity on the main diagonal.
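The matrix algebra can be sketched in a few lines of Python; the toy two-factor pattern below is made up for illustration and is not the MTMM example from the text:

```python
import numpy as np

# Toy factor pattern (Lambda) and factor correlation matrix (Phi).
Lambda = np.array([[0.7, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.6],
                   [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])

# Reduced correlation matrix: communalities on the main diagonal.
reduced = Lambda @ Phi @ Lambda.T

# Uniqueness = 1 - communality, placed in a diagonal matrix (ThetaDelta).
ThetaDelta = np.diag(1.0 - np.diag(reduced))

# Adding the uniquenesses back restores unity on the main diagonal.
R = reduced + ThetaDelta
print(np.diag(R))        # all 1.0
print(reduced[0, 1])     # .7 * .8 = .56, as in the identification example
```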
LISREL OUTPUT
LISREL first prints back the matrices implied by the model statement you supplied. You check this to be sure LISREL is doing what you want.
Next, LISREL prints an iteration history that shows how long it took to reach a solution, and whether it encountered estimation problems. Sometimes the ML solution fails to converge, and LISREL prints a warning. Then LISREL prints results. They will look something like this:
LISREL ESTIMATES (MAXIMUM LIKELIHOOD)
LAMBDA X

        _____Traits_____    ____Methods____
        A     G     D       CR    SR    PT
CRA     .30   0     0       .60   0     0
CRG     0     .32   0       .63   0     0
CRD     0     0     .41     .65   0     0
SRA     .39   0     0       0     .62   0
SRG     0     .43   0       0     .61   0
SRD     0     0     .29     0     .60   0
PTA     .42   0     0       0     0     .60
PTG     0     .47   0       0     0     .69
PTD     0     0     .34     0     0     .60
PHI

        A     G     D     CR    SR    PT
A       1
G       .35   1
D       .41   .31   1
CR      0     0     0     1
SR      0     0     0     .43   1
PT      0     0     0     .45   .44   1
THETA DELTA

        CRA   CRG   CRD   SRA   SRG   SRD   PTA   PTG   PTD
        .55   .50   .41   .46   .44   .56   .46   .30   .52
TOTAL COEFFICIENT OF DETERMINATION FOR X VARIABLES IS .992
MEASURES OF GOODNESS OF FIT FOR THE WHOLE MODEL:
CHI-SQUARE WITH 12 DEGREES OF FREEDOM IS 55 (PROB. LEVEL = .001)
GOODNESS OF FIT INDEX IS .89
ADJUSTED GOODNESS OF FIT INDEX IS .85
ROOT MEAN SQUARE RESIDUAL IS .02
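The 12 degrees of freedom can be checked by counting: with 9 observed variables there are 9 × 10 / 2 = 45 distinct variances and covariances, and the model estimates 18 loadings, 6 factor correlations, and 9 uniquenesses. As arithmetic:

```python
# Degrees of freedom for a covariance structure model:
# distinct elements of the covariance matrix minus free parameters.
p = 9                                # observed variables
distinct_moments = p * (p + 1) // 2  # 45 variances and covariances

# Free parameters in this MTMM model: 18 loadings (one trait and one
# method loading per measure), 3 trait correlations, 3 method
# correlations, and 9 uniquenesses.
free_params = 18 + 3 + 3 + 9

df = distinct_moments - free_params
print(df)  # 12, matching the reported chi-square degrees of freedom
```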
A Good Looking MTMM Matrix

        CRA   CRG   CRD   SRA   SRG   SRD   PTA   PTG   PTD
CRA     1
CRG     .00   1
CRD     .00   .00   1
SRA     .81   .00   .00   1
SRG     .00   .81   .00   .00   1
SRD     .00   .00   .81   .00   .00   1
PTA     .81   .00   .00   .81   .00   .00   1
PTG     .00   .81   .00   .00   .81   .00   .00   1
PTD     .00   .00   .81   .00   .00   .81   .00   .00   1
Note that a model for this matrix is unidentified for reasons that are not obvious. However, I wanted you to see one set of parameters that might produce it.
The LX or Λ matrix

        _____Traits_____    ____Methods____
        A     G     D       CR    SR    PT
CRA     .9    0     0       0     0     0
CRG     0     .9    0       0     0     0
CRD     0     0     .9      0     0     0
SRA     .9    0     0       0     0     0
SRG     0     .9    0       0     0     0
SRD     0     0     .9      0     0     0
PTA     .9    0     0       0     0     0
PTG     0     .9    0       0     0     0
PTD     0     0     .9      0     0     0
PH or Φ

        A     G     D     CR    SR    PT
A       1
G       0     1
D       0     0     1
CR      0     0     0     1
SR      0     0     0     0     1
PT      0     0     0     0     0     1
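A short Python check confirms that this trait-only pattern (loadings of .9, orthogonal factors) implies exactly the good looking matrix: .81 wherever two measures share a trait, zero everywhere else off the diagonal:

```python
import numpy as np

loading = 0.9
# Which trait (0 = A, 1 = G, 2 = D) each measure loads on,
# in the order CRA, CRG, CRD, SRA, SRG, SRD, PTA, PTG, PTD.
traits = [0, 1, 2, 0, 1, 2, 0, 1, 2]

# Build the 9 x 3 factor pattern with one .9 loading per row.
Lambda = np.zeros((9, 3))
for row, trait in enumerate(traits):
    Lambda[row, trait] = loading

Phi = np.eye(3)                      # uncorrelated factors
implied = Lambda @ Phi @ Lambda.T    # reduced correlation matrix
np.fill_diagonal(implied, 1.0)       # unities on the main diagonal

print(implied[3, 0])  # CRA with SRA (same trait): .9 * .9 = .81
print(implied[1, 0])  # CRA with CRG (different traits): 0
```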