Copyright @2014.
All Rights Reserved.
StatsToDo : Factor Analysis Explained

Related link :
Factor Analysis Program Page
Factor Analysis - Principal Component Extraction Program Page
Factor Analysis - Factor Rotation Program Page
Factor Analysis - Produce Factor Scores Program Page
Factor Analysis - Parallel Analysis Explained, Tables, and Program Page

This page describes the suite of explanations, tables, and programs related to Exploratory Factor Analysis that are available in StatsToDo.

Excellent statistical packages for Factor Analysis are widely available, for example SAS, STATA, SPSS, and LISREL. There are also excellent free packages available for download (see references). All of these require users to set up options, then perform all the procedures in a single session.

StatsToDo provides tools for exploratory Factor Analysis in 3 different formats, each described in its own panel of this page.

Exploratory Factor Analysis

StatsToDo presents only a simplified and cursory explanation for exploratory Factor Analysis, sufficient to help users of the programs here. Users looking for further information are referred to the references section.

Exploratory Factor Analysis starts with no a priori theory or hypothesis, and is sometimes called unsupervised clustering. The variables are clustered according to how they correlate with each other.

Exploratory Factor Analysis has two models

  • Principal Component Analysis is mainly used to reduce multiple measurements into fewer factors. It is carried out using the covariance or correlation matrix. It evaluates the relationship between the measurements and factors in the process.
  • Principal Factor Analysis is used mainly to evaluate the relationships within a set of multiple measurements. It uses the correlation matrix, but replaces the diagonal elements with the communalities, or with the largest correlation coefficient in each column as an estimate of them. In doing so, it produces a more precise estimate of the relationships between measurements and factors.
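
As a rough illustration of the diagonal replacement described for Principal Factor Analysis, the following Python sketch substitutes each diagonal element with the largest correlation in its column. Two details are assumptions made for this example and not taken from StatsToDo's own program: the largest value is judged by absolute size, and the diagonal element itself is excluded from the search. The function name and the matrix values are hypothetical.

```python
def reduced_correlation_matrix(r):
    """Return a copy of r whose diagonal holds rough communality estimates."""
    k = len(r)
    reduced = [list(row) for row in r]
    for j in range(k):
        # Largest absolute correlation of variable j with any other variable
        reduced[j][j] = max(abs(r[i][j]) for i in range(k) if i != j)
    return reduced


# Made-up correlation matrix for three variables
r = [
    [1.0, 0.6, -0.7],
    [0.6, 1.0, 0.2],
    [-0.7, 0.2, 1.0],
]
reduced = reduced_correlation_matrix(r)
for row in reduced:
    print(row)
```

Only the diagonal changes. The off-diagonal correlations are passed on to the Eigen analysis as they are.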
As the results of Factor Analysis depend on the scale of the values, StatsToDo follows common practice and uses the correlation matrix to produce Principal Components. When raw data are presented, they are first reduced to a common scale of standardized z values (z=(value-mean)/SD).
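
The effect of standardizing can be seen in a small Python sketch. It converts each column to z values and computes the correlation matrix from them, then shows that changing the unit of one variable leaves the correlations unchanged. The sample SD (n - 1 denominator) is an assumption here, as the page does not say which SD is used, and the data values and function names are made up for illustration.

```python
from statistics import mean, stdev


def standardize(rows):
    """Convert each column of a cases-by-variables table to z values."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    sds = [stdev(c) for c in columns]  # sample SD, n - 1 denominator (assumed)
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(rows):
    """Correlation matrix as the average cross product of z values."""
    z = standardize(rows)
    n, k = len(z), len(z[0])
    return [[sum(row[i] * row[j] for row in z) / (n - 1) for j in range(k)]
            for i in range(k)]


# Made-up data: height in cm and weight in kg for five cases
data_cm = [[150.0, 50.0], [160.0, 58.0], [170.0, 63.0], [180.0, 80.0], [190.0, 84.0]]
# The same cases with height expressed in metres
data_m = [[h / 100.0, w] for h, w in data_cm]

r_cm = correlation_matrix(data_cm)
r_m = correlation_matrix(data_m)
print(round(r_cm[0][1], 6), round(r_m[0][1], 6))
```

A covariance matrix computed from the two versions of the data would differ by a factor of 100 in the height column, which is why the correlation matrix is preferred.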

StatsToDo also uses the Principal Component Analysis model by default, as in most cases exploratory Factor Analysis is used to condense multiple measurements into fewer factors, and the relationship between measurements is a secondary consideration that helps in interpreting what the resulting factors represent.

Procedures and Options

The following steps are used for Factor Analysis in StatsToDo
  1. Data entry is in one of two formats
    • A matrix of correlation coefficients. A covariance matrix can also be used, but is not recommended.
    • A matrix of values, where the columns represent variables and the rows represent cases. The program then converts the values to a correlation matrix, which is used for the rest of the Factor Analysis.
  2. Eigen Analysis, which produces an array of Eigen values in descending order of magnitude
  3. The complete matrix of Principal Components (factors), in the same order as the Eigen Values
  4. The decision on how many factors to retain for further analysis. One of the following 3 options is available in StatsToDo
    • The user may arbitrarily determine the number of factors to retain.
    • The K1 rule, where a factor is retained if its Eigen value is >= 1. This is commonly used and is the default option in StatsToDo
    • Parallel Analysis. The Eigen values are calculated over multiple iterations (default=1000), each using data of the same size as the original but containing normally distributed random numbers. From these, the 95th percentile of each Eigen value is calculated. A factor is then retained if its Eigen value is >= the corresponding 95th percentile value.
  5. Factor rotation. The retained factors are subjected to rotation, so that each variable loads predominantly on one factor. The following rotations are commonly used
    • Orthogonal rotation, where the resulting factors are not correlated to each other. The usual procedure is the Standardized Varimax Rotation
    • Oblique rotation, where the factors are allowed to be correlated, which makes it easier for each variable to load predominantly on one factor. The Oblimin rotation is usually used, as the correlations between the resulting factors are also calculated and presented. The Promax rotation is provided in the R code. It is said to run quicker for large matrices, but it does not estimate correlations between factors.
    There are different approaches to choosing a rotation. Usually, the reasons for doing the Factor Analysis and the nature of the variables included determine whether the factors should be correlated (oblique) or uncorrelated (orthogonal).

    If there are no prior theoretical assumptions, then the results from oblique rotation (Oblimin in this case) should be initially adopted, as this provides the closest fit between variables and factors. However, if the oblique factors have no significant correlation with each other, then the results of the orthogonal rotation (Varimax in this case) should be adopted, as what each factor represents is much more clearly defined.

  6. Calculating factor scores. This requires 3 sets of data: the matrix of values with the same number of variables (usually the original data matrix), a two column matrix of means and Standard Deviations (SDs) from the original data set, and the rotated factor matrix
    • The rotated factor matrix is converted to the coefficient matrix.
    • Each value (v) in the data matrix is converted to a standardized z value, where z = (v-mean)/SD
    • The factor score value is the product of each z and coefficient, summed across all variables
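
Steps 2 to 4 above can be sketched in Python using only the standard library. The Eigen analysis uses cyclic Jacobi rotations, the method named in the references, but this is a simplified illustration written for this page and not StatsToDo's own code. The function names, the nearest-rank percentile, the small iteration count in the example, and the correlation matrix are all assumptions made for illustration.

```python
import math
import random


def jacobi_eigen(matrix, tol=1e-20, max_sweeps=50):
    """Eigen values (descending) and vectors of a symmetric matrix."""
    n = len(matrix)
    a = [list(row) for row in matrix]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q of a and v
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
                for k in range(n):  # rotate rows p and q of a
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
    order = sorted(range(n), key=lambda j: a[j][j], reverse=True)
    return [a[j][j] for j in order], [[v[i][j] for j in order] for i in range(n)]


def principal_components(r):
    """Step 3: loadings are Eigen vectors scaled by the root of the Eigen value."""
    values, vectors = jacobi_eigen(r)
    k = len(r)
    loadings = [[vectors[i][j] * math.sqrt(max(values[j], 0.0)) for j in range(k)]
                for i in range(k)]
    return values, loadings


def k1_rule(values):
    """Number of factors with Eigen value >= 1."""
    return sum(1 for value in values if value >= 1.0)


def random_correlation(n_cases, n_vars, rng):
    """Correlation matrix of normally distributed random numbers."""
    z = []
    for _ in range(n_vars):
        col = [rng.gauss(0.0, 1.0) for _ in range(n_cases)]
        m = sum(col) / n_cases
        dev = [x - m for x in col]
        norm = math.sqrt(sum(x * x for x in dev))
        z.append([x / norm for x in dev])  # centred, unit-length column
    return [[sum(x * y for x, y in zip(z[i], z[j])) for j in range(n_vars)]
            for i in range(n_vars)]


def parallel_thresholds(n_cases, n_vars, iterations=1000, seed=0):
    """95th percentile (nearest rank) of each ordered random Eigen value."""
    rng = random.Random(seed)
    sims = [jacobi_eigen(random_correlation(n_cases, n_vars, rng))[0]
            for _ in range(iterations)]
    rank = math.ceil(0.95 * iterations) - 1
    return [sorted(sim[j] for sim in sims)[rank] for j in range(n_vars)]


def parallel_rule(values, thresholds):
    """Number of factors whose Eigen value is >= its random 95th percentile."""
    return sum(1 for value, limit in zip(values, thresholds) if value >= limit)


# Made-up correlation matrix: two pairs of closely related variables
r = [
    [1.0, 0.7, 0.2, 0.2],
    [0.7, 1.0, 0.2, 0.2],
    [0.2, 0.2, 1.0, 0.7],
    [0.2, 0.2, 0.7, 1.0],
]
values, loadings = principal_components(r)
n_k1 = k1_rule(values)
# 200 iterations keep the example quick; the page's default is 1000
thresholds = parallel_thresholds(n_cases=100, n_vars=4, iterations=200)
n_pa = parallel_rule(values, thresholds)
print([round(x, 4) for x in values], n_k1, n_pa)
```

The Eigen values of this particular matrix can be worked out by hand as 2.1, 1.3, 0.3 and 0.3, which makes it a convenient check on the Jacobi routine. The signs of the loadings in any column may be reversed compared with other programs.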

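Steps 5 and 6 can be sketched in the same spirit. The rotation below is a Varimax with Kaiser normalization, carried out by rotating one pair of factors at a time, and it is assumed here that this is what "Standardized Varimax" refers to. The page does not say how the rotated factor matrix is converted to the coefficient matrix, so the factor score function simply takes a coefficient matrix as input. All names and numerical values are hypothetical, and the code is an illustration, not StatsToDo's implementation.

```python
import math


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Varimax rotation with Kaiser normalization (rows need non-zero loadings)."""
    n, m = len(loadings), len(loadings[0])
    # Root communality of each variable, used to normalize the rows
    h = [math.sqrt(sum(x * x for x in row)) for row in loadings]
    a = [[x / hi for x in row] for row, hi in zip(loadings, h)]
    for _ in range(max_sweeps):
        largest = 0.0
        for p in range(m - 1):
            for q in range(p + 1, m):
                u = [row[p] ** 2 - row[q] ** 2 for row in a]
                v = [2.0 * row[p] * row[q] for row in a]
                su, sv = sum(u), sum(v)
                c4 = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d4 = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximizes the varimax criterion for this pair
                phi = math.atan2(d4 - 2.0 * su * sv / n,
                                 c4 - (su * su - sv * sv) / n) / 4.0
                largest = max(largest, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for row in a:
                    row[p], row[q] = c * row[p] + s * row[q], -s * row[p] + c * row[q]
        if largest < tol:
            break
    # Undo the normalization
    return [[x * hi for x in row] for row, hi in zip(a, h)]


def factor_scores(data, means, sds, coefficients):
    """Scores = z values multiplied by coefficients, summed across variables."""
    n_factors = len(coefficients[0])
    scores = []
    for row in data:
        z = [(value - m) / sd for value, m, sd in zip(row, means, sds)]
        scores.append([sum(zi * coef[j] for zi, coef in zip(z, coefficients))
                       for j in range(n_factors)])
    return scores


# Hypothetical unrotated loadings: 4 variables, 2 retained factors
a1, a2 = math.sqrt(2.1) / 2, math.sqrt(1.3) / 2
unrotated = [[a1, a2], [a1, a2], [a1, -a2], [a1, -a2]]
rotated = varimax(unrotated)
for row in rotated:
    print([round(x, 4) for x in row])

# Hypothetical coefficient matrix (variables x factors), means and SDs
coefficients = [[0.5, 0.1], [0.5, -0.1]]
scores = factor_scores([[10.0, 20.0], [12.0, 25.0]], [10.0, 20.0], [2.0, 5.0],
                       coefficients)
print(scores)
```

Before rotation every variable loads on both factors. After rotation each variable has one large and one small loading, while its communality (the sum of its squared loadings) is unchanged. A case whose values all equal the means has a score of 0 on every factor.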
Technical Considerations

Factor Analysis carried out with different programs and platforms often produces similar but slightly different results. This is because much of the calculation is by iteration, so the results are approximate. Depending on the version of the algorithm, the initial values, and the limits of iteration, results will differ slightly. The three most common discrepancies are discussed here.
  • Values from different programs may be different at the third or more decimal places, more so in the minor factors, and more so if the sample size is small. These differences can be accepted.
  • After rotation, the factors are often in different orders. Users should interpret final factors according to what the variable loadings indicate, and not in the order they appear in the results.
  • The positive and negative signs of the loadings on a factor may be reversed between different programs and procedures. However, the interpretation of each factor can be reversed by changing all the signs in a factor. For example, a factor representing happiness becomes one for unhappiness if all the signs of the loadings are reversed. The thing to remember is that, following Oblimin rotation, changing the signs of the loadings in a factor will also reverse the signs of that factor's correlation coefficients with all the other factors.
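
The last point can be checked with a small Python sketch. It reverses the signs of one factor in an oblique solution, reverses that factor's correlations with the other factors, and confirms that the correlations between variables implied by the solution stay the same. The loadings, the factor correlation of 0.3, and the function names are hypothetical values chosen for illustration.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def reflect_factor(loadings, factor_corr, j):
    """Reverse the signs of factor j and of its correlations with other factors."""
    new_loadings = [[-x if col == j else x for col, x in enumerate(row)]
                    for row in loadings]
    new_corr = [[-x if (r == j) != (c == j) else x for c, x in enumerate(row)]
                for r, row in enumerate(factor_corr)]
    return new_loadings, new_corr


def implied_correlations(loadings, factor_corr):
    """Correlations between variables implied by loadings and factor correlations."""
    return matmul(matmul(loadings, factor_corr), transpose(loadings))


# Hypothetical oblique solution: 4 variables, 2 correlated factors
loadings = [[0.8, 0.1], [0.7, 0.0], [0.1, 0.9], [0.0, 0.6]]
factor_corr = [[1.0, 0.3], [0.3, 1.0]]

flipped_loadings, flipped_corr = reflect_factor(loadings, factor_corr, 1)
before = implied_correlations(loadings, factor_corr)
after = implied_correlations(flipped_loadings, flipped_corr)
print(flipped_corr)
```

The correlation between the two factors changes sign, and the implied correlations between variables are unchanged, so both versions describe the same data equally well.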

References

Algorithms : It is difficult to find the algorithms for calculations associated with Factor Analysis, as most modern text books and technical manuals advise users to use one of the commercial packages. I eventually found some useful algorithms in old text books, and they are as follows.

    Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989). Numerical Recipes in Pascal. Cambridge University Press ISBN 0-521-37516-9 p.395-396 and p.402-404. Jacobi method for finding Eigen values and Eigen vectors

    Norusis MJ (1979) SPSS Statistical Algorithms Release 8. SPSS Inc Chicago

    • p. 86 for converting Eigen values and vectors to Principal Components
    • p. 91-93 for Varimax Rotation
    • p. 94-97 for Oblimin Rotation
    • p. 97-98 for Factor scores

Text books : I learnt Factor Analysis some time ago, so all my text books are old, but they are adequate in explaining the basic concepts and provide the calculations used in these pages. Users should search for better and newer text books.

  • Thurstone LL (1937) Multiple Factor Analysis. University of Chicago Press. I have not read this book, but it is quoted almost universally as the original Factor Analysis text book, which set out the principles and procedures
  • Gorsuch RL (1974) Factor Analysis. W. B. Saunders Company London ISBN 0-7216-4170-9 A standard text book teaching Factor Analysis at the Master's level. This is my copy and I believe there are later editions of this book available

Orthogonal Powered-Vector Factor Analysis
Overall JE and Klett CJ (1972) Applied Multivariate Analysis. McGraw Hill Series in Psychology. McGraw Hill Book Company New York. Library of Congress No. 73-147164 ISBN 07-047935-6 p.137-156

Sample Size
Mundfrom DJ, Shaw DG, Tian LK (2005) Minimum sample size recommendations for conducting factor analysis. International Journal of Testing 5:2:p 159-168

Free Factor Analysis software and its user manual can be downloaded from http://psico.fcep.urv.es/utilitats/factor/Download.html. This is a package for Windows written by Drs. Lorenzo-Seva and Ferrando from Universitat Rovira i Virgili in Tarragona, Spain. The presentation and options are very similar to those of SPSS, and the manual is excellent. The best part is that it is free and, yes, it is in English.

Teaching and discussion papers on the www : There is an enormous list of discussion papers, technical notes and tutorials on the www that can easily be found by Google search. The following is a small sample.