Copyright © 2020. All Rights Reserved.


Parallel Analysis: Explanation, Tables of Minimum Eigen Values, Computer Program, R Code
Introduction
Technical Considerations
Example
Parallel Analysis is a procedure sometimes used to determine the number of Factors or Principal Components to retain in the initial stage of Exploratory Factor Analysis. This discussion assumes that the user understands Factor Analysis and the procedure of Principal Component extraction, so no details of these are provided here.

A critical decision in Exploratory Factor Analysis is how many Principal Components to retain, as each successive extraction produces a less significant Factor. Retaining too few leads to a loss of information in the data, while retaining too many includes trivial and random information. Both produce misleading and unreproducible results. Traditionally, researchers depend on one or more of the following criteria to determine how many components to retain.
- The most common criterion, and the default in most statistical packages, is the K1 rule: Principal Components are retained while the Eigen value (the variance associated with the component) is >=1. This is based on the argument that, in a correlation matrix, each variable contributes a variance of 1, so a component that accounts for less than that has no meaning and should be discarded. The criticism of the K1 rule is that Eigen values are inflated by random associations in the data, so the rule often retains more components or factors than appropriate, particularly when the sample size is small.
- Another rule is the Scree Test. The Eigen values are plotted against component number, and where the sharp decrease in Eigen values levels off (the scree), the remaining components are abandoned. This is based on the argument that the initial, significant components each extract a large proportion of the variance from the correlation matrix, while the insignificant ones contain mostly data noise, so their Eigen values are similar. The criticism of the Scree Test is that it depends on eyeballing, and when there is no sharp transition, researchers often disagree on where the scree begins.
- Some researchers try different retention levels, and match the results with the theoretical model of the data. This does produce neat outcomes, but runs the risk that, if the underlying theory is flawed to start with, the results tend not to be reproducible.
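The inflation of Eigen values by random associations, mentioned under the K1 rule above, is easy to demonstrate. The following minimal R sketch (not part of the programs on this page) analyses purely random data, which contain no real factors at all, yet often yield two or three Eigen values of 1 or more:

```r
set.seed(42)  # for a reproducible illustration
# purely random data the size of a small study: 25 cases, 6 variables
x <- matrix(rnorm(25 * 6), nrow = 25, ncol = 6)
ev <- eigen(cor(x))$values   # Eigen values of the correlation matrix
sum(ev >= 1)                 # components the K1 rule would retain by chance alone
```

The Eigen values of a correlation matrix always sum to the number of variables, so with random data several of them typically drift above 1, which is exactly what the K1 rule cannot distinguish from genuine structure.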
Parallel Analysis takes a different approach, based on Monte Carlo simulation. A data set of random numbers, with the same sample size and number of variables as the user's research data, is subjected to analysis, and the Eigen values obtained are recorded. This is repeated many times (often between 50 and 100 iterations; the tables later on this page used 1000 iterations). The mean and Standard Deviation of the replicated Eigen values for each component are then calculated, from which the 95th percentile values are derived. These serve as the critical minimum Eigen values: a component from the research data is retained only if its Eigen value exceeds the corresponding 95th percentile from the simulation.
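As a minimal sketch of the final step, the 95th percentile can be derived from the replicated mean and Standard Deviation by a normal approximation (the mean and SD below are hypothetical values for a single component):

```r
m <- 1.60                     # hypothetical mean Eigen value over all iterations
s <- 0.22                     # hypothetical SD of that Eigen value
crit <- m + qnorm(0.95) * s   # 95th percentile under a normal approximation
crit                          # about 1.96
```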
References

- Hayton JC, Allen DG, Scarpello V (2004) Factor Retention Decisions in Exploratory Factor Analysis: a Tutorial on Parallel Analysis. Organizational Research Methods 7:191
- Watkins MW (2006) Determining Parallel Analysis Criteria. Journal of Modern Applied Statistical Methods 5(2):344-346 (a free program for Parallel Analysis, downloadable from the www)
- Ledesma RD (2007) Determining the Number of Factors to Retain in EFA: an easy-to-use computer program for carrying out Parallel Analysis. Practical Assessment, Research, and Evaluation 12(2):1-11 (accessible on the www; a Word file with SPSS commands for Parallel Analysis)
- Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical Recipes in Pascal. Cambridge University Press, ISBN 0-521-37516-9, p.395-396 and p.402-404 (computer program to extract Eigen values and vectors from a correlation matrix using the Jacobi algorithm)

Technical Considerations

Monte Carlo simulation models the statistical process, but uses randomly generated numbers. The calculations are replicated numerous times to produce a standard based on random numbers, against which research data and results can be compared. For an adequate simulation, numerous iterations may be necessary. When the calculations involve iterations and matrix manipulations, the memory requirement and computing time may exceed what a server allows. For example, a Parallel Analysis for 40 variables, a sample size of more than 50, and 50 replications will exceed the 30 seconds of computing time usually allowed by a php server, causing the program to crash. Resources for Parallel Analysis on this page therefore exist in two forms:
- Tables of critical Eigen values for Factor Analysis of 4 to 100 variables, over a range of commonly used sample sizes, each obtained over 1000 replications of the simulation.
- A small Javascript program to carry out a specific Parallel Analysis, if requirements are not met by the tables.
The algorithm used for the resources on this page is from that published in Numerical Recipes in Pascal, translated to Javascript. Please note that, as with any Monte Carlo simulation, each run uses different sets of random numbers, so the results are closely similar but not identical. Users should therefore not be alarmed by minor differences in the output. Simulations on this page use computer-generated random numbers that are normally distributed, with a mean of 0 and SD of 1. Although this is usually acceptable, and the results differ little from other methods of generating random numbers, it differs from the strict requirements of Parallel Analysis.
- Strictly, each variable in the simulation should mimic the real data. Normally distributed random numbers should only be used for variables that are normally distributed. For measurements such as a Likert Scale, the random numbers should be integers randomised to values between 1 and 5 without a defined distribution. For binary variables (female/male, alive/dead, no/yes), the numbers should be randomised between 0 and 1.
- To comply with the strict requirements of Parallel Analysis, therefore, the random data set needs to be customised to the nature of each variable in the research data. To use normally distributed random numbers for all variables, as the programs on this page do, the user needs to accept the argument that the difference thus created is trivial and unlikely to affect the decision on how many components to retain. Users should be aware that this argument is accepted by many, but not all, authorities on these matters.
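A customised random data set of the kind described above might be generated in R as follows (a sketch only; the variable names are illustrative):

```r
nc <- 25  # sample size of the research data being mimicked
normal_var <- rnorm(nc)                        # a normally distributed measurement
likert_var <- sample(1:5, nc, replace = TRUE)  # a Likert item: integers 1 to 5
binary_var <- sample(0:1, nc, replace = TRUE)  # a binary variable, e.g. no/yes
sim_data <- cbind(normal_var, likert_var, binary_var)
```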
There are many papers discussing Parallel Analysis, and some free software is available on the www. I have put links to two downloadable programs in the introduction panel for users who wish to have a desktop program for this. I have neither downloaded nor evaluated these programs myself, so users should evaluate these packages for themselves. The tables and programs on this page are for Principal Components only, where the diagonals of the correlation matrix are all 1s, and not for Principal Factor Analysis, where the diagonals of the correlation matrix are replaced with estimates of communalities. The results are not appropriate for the latter model.
This section demonstrates how Parallel Analysis can be used instead of the K1 rule to determine the number of factors to retain. It consists of the following steps.
- Using the researcher's data, an initial Principal Component Analysis is carried out, but overriding the K1 rule, so that the number of components retained is the same as the number of variables. This produces the full array of Eigen values, usually in descending order of magnitude.
- Either looking up the tables on this page, or using the Javascript program, the standards (minimum Eigen values for component retention) are determined by Parallel Analysis.
- The Eigen values from the research data are compared with the standards. The components to be retained are those whose Eigen values exceed both 1 (the K1 rule) and the standard value from Parallel Analysis.
- The rest of the Factor Analysis is then calculated using only the components that are retained. If a statistical package such as SPSS or SAS is used, the program is configured to extract the appropriate number of components.
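The comparison in the third step can be sketched in a few lines of R (the helper n_retain and the numbers in the usage line are illustrative, not part of the programs on this page):

```r
# count components whose Eigen values exceed both 1 (the K1 rule) and the
# Parallel Analysis standard; cumprod stops the count at the first failure,
# since components are retained in order of magnitude
n_retain <- function(data_ev, pa_ev) {
  sum(cumprod(data_ev > 1 & data_ev > pa_ev))
}
n_retain(c(2.5, 1.4, 1.1, 0.6), c(1.8, 1.5, 1.2, 0.9))  # retains 1 component
```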
Example
The following example demonstrates the procedure. The data are the default example data from the FactorAnalysis.php page. Please note that this is only an example using a small sample of computer-generated random numbers; a Factor Analysis with 6 variables would need a sample size much greater than 25 cases to have any reproducibility at all.
We wish to conduct a Factor Analysis on a data set with 6 variables and a sample size of 25. The correlation matrix for this set of data is shown in the table to the left.
The original data or the correlation matrix can be analysed initially to examine the Eigen values. In order of magnitude, they are 2.68, 1.31, 1.15, 0.45, 0.23, 0.19. These are shown (in red) in the plot to the right.
The parameters for this example are number of variables = 6 and sample size = 25, the same as those from the research data.
The plot to the right demonstrates the various methods of decision making.
- The Scree Test is performed by eyeballing the Eigen values from the data (red line). The Eigen values decrease rapidly over the first 4 Principal Components before they flatten out. According to the Scree Test, therefore, 4 factors should be retained.
- Comparing the Eigen values from the data (red line) against the K1 rule of 1 (black horizontal line), 3 Eigen values exceed the value of 1. According to the K1 rule, therefore, 3 factors should be retained.
- Comparing the Eigen values from the data (red line) against those estimated from Parallel Analysis (blue line), only the first Eigen value exceeds that from Parallel Analysis. Accordingly, only one factor should be retained.
Once the number of factors to be retained is determined (by whichever method is chosen), the Factor Analysis can be performed again, with the number of factors to be retained specified.
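The comparisons in this example can be checked directly in R, using the example's Eigen values and Parallel Analysis standards rounded from a run with 6 variables, 25 cases and 100 iterations:

```r
data_ev <- c(2.68, 1.31, 1.15, 0.45, 0.23, 0.19)  # Eigen values from the data
pa_ev   <- c(1.96, 1.52, 1.20, 1.01, 0.77, 0.58)  # Parallel Analysis standards
sum(data_ev >= 1)     # K1 rule: 3 components pass
sum(data_ev > pa_ev)  # Parallel Analysis: only the first component passes
```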
Explanation
Tables are provided for the following numbers of variables:
- 4-9 Variables
- 10-20 Variables
- 25-50 Variables
- 60 Variables
- 70 Variables
- 80 Variables
- 90 Variables
- 100 Variables
The tables show critical Eigen values for retention of a factor. Each table provides values for
Factor Analysis with a particular number of variables. The columns are the sample size of the research data,
and the rows are factor numbers (in order of size). Each cell contains the critical minimum Eigen value, which the
Eigen value of that factor must exceed if it is to be retained. Where there is no value provided, the K1 rule
(Eigen value >=1.0) should be used.
These critical values are obtained by using normally distributed random numbers to create correlation matrices of the appropriate number of variables, repeated over 1000 iterations. The values represent the 95th percentile of the Eigen values obtained from these iterations. The tables contain only those Eigen values that are >1, as factors with Eigen values of less than 1 are usually not retained.
Three parameters are necessary for Parallel Analysis: the number of variables involved, the sample size of the research data with which the results will be compared, and the number of replications.
The number of iterations determines the stability of the results. Generally, decisions are made using Eigen values to 2 decimal places, and to be stable at this level, about 500 replications are needed. Most of the results are background information; the results used for comparison with Eigen values from research data are those in the last column, the 95th percentile values.

There are no constraints on the size of the data, as most laptops have sufficient memory to cope with matrices of more than 100 variables. However, there are constraints related to the time required for computation. The more commonly used browsers (Explorer, Firefox) impose limits, either on the number of calculations or on the amount of time. When the limit is reached, the browser asks the user whether he/she wishes to continue. This means that, with prolonged computation, the user cannot leave the computer to do other things, but needs to nurse the program to its conclusion. For example, the Firefox browser will pause and ask the user whether to continue after every 10 seconds of computing, and this becomes tedious when the computation requires several minutes.
The R program takes 3 parameters:
- nc is sample size and in this example is 25 cases or rows of data
- nv is the number of variables and in this example is 6
- ni is the number of iterations. In most cases this is between 50 and 1000, and in this example set at 100
In each iteration, the Eigen values are extracted, and for each component the sum of the Eigen values (Σx) and the sum of their squares (Σx²) are accumulated. From these the mean and Standard Deviations are calculated.
The final results are the 95th percentiles of these values, which can then be used as the standards to determine the number of factors to retain.

```r
# parameters
nc <- 25   # nc = number of cases, sample size
nv <- 6    # nv = number of variables
ni <- 100  # ni = number of iterations in Monte Carlo simulation
# parallel analysis begin
exVect <- rep(0, nv)    # null vector for Σx  (sums of Eigen values)
exxVect <- rep(0, nv)   # null vector for Σx² (sums of squared Eigen values)
for (i in 1:ni) {       # Eigen values of a random data set's correlation matrix
  ev <- eigen(cor(matrix(rnorm(nc * nv), nc, nv)))$values
  exVect <- exVect + ev
  exxVect <- exxVect + ev^2
}
meanVect <- exVect / ni                               # mean of each Eigen value
sdVect <- sqrt((exxVect - exVect^2 / ni) / (ni - 1))  # SD of each Eigen value
meanVect + qnorm(0.95) * sdVect                       # the 95th percentile values
```

The results are

[1] 1.9605460 1.5193420 1.2037633 1.0112186 0.7737773 0.5794345

Please note that the values will be slightly different for each run, as different random numbers are used. However, the differences become smaller as the number of iterations increases.