Starting to understanding Factor Analysis. Yeah!

PCA1

New Member
#1
I have wanted to understand factor analysis for quite some time, though I have never actually gotten around to it-- until recently.
Part of the problem is that much of the web content on FA is not overly helpful; it concentrates more on how to code the answer
than providing actual insight into what factor analysis is. Recently I came across a great starting reference, that walks through
a simple 2 factor analysis. This pdf has greatly enhanced my understanding, however, this document did not complete the
analysis with the computation of actual scores for each individual's F1 and F2. If anyone might know of datasets that go to this
final step, then it would be appreciated if urls might be posted to this thread {I am especially interested in psychometric datasets
(perhaps involving IQ, or SAT FAs).}.

http://www.yorku.ca/ptryfos/f1400.pdf
 
Last edited:

PCA1

New Member
#2
Here's my due diligence.


I worked the example from the pdf in my previous post in R.
Here's the code.


% Entered Grades
% I used the data.frame entry because I wanted to do some programming later and
% R did not seem to like it when I tried to take sub-matrices using a "matrix" declaration
gradesB<- data.frame( c(3,6,5),c(7,3,3),c(10,9,8),c(3,9,7),c(10,6,5))

% Here is the call for the factor analysis
gradesB.fa.covar<- principal(t(gradesB), nfactors=2, rotate= 'none',covar=TRUE)

gradesB.fa.covar
Principal Components Analysis
Call: principal(r = t(gradesB), nfactors = 2, rotate = "none",
covar = TRUE)
Unstandardized loadings (pattern matrix) based upon covariance matrix
PC1 PC2 h2 u2 H2 U2
1 3.51 0.03 12.3 0.00016 1.00 1.3e-05
2 -0.15 2.50 6.3 0.01817 1.00 2.9e-03
3 0.14 1.94 3.8 0.03039 0.99 8.0e-03

PC1 PC2
SS loadings 12.34 10.01
Proportion Var 0.55 0.45
Cumulative Var 0.55 1.00
Proportion Explained 0.55 0.45
Cumulative Proportion 0.55 1.00

Standardized loadings (pattern matrix)
item PC1 PC2 h2 u2
1 1 1.00 0.01 1.00 1.3e-05
2 2 -0.06 1.00 1.00 2.9e-03
3 3 0.07 0.99 0.99 8.0e-03

PC1 PC2
SS loadings 1.01 1.98
Proportion Var 0.34 0.66
Cumulative Var 0.34 1.00
Cum. factor Var 0.34 1.00

Mean item complexity = 1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is 0.01
with the empirical chi square 0.01 with prob < NA

Fit based upon off diagonal values = 1
 

PCA1

New Member
#3
% I calculated the factorscores with the built in function.
% This result did not make a great deal of sense to me.
% The third student received marks of 10, 9 and 8.
% Why did they receive PC1 and PC2 scores that were lower than e.g, student 5 who received grades of 10,6, and 5?

factor.scores(t(gradesB),gradesB.fa.covar,Phi=NULL,method="Thurstone",rho=NULL,impute="none")
$scores
PC1 PC2
c.3..6..5. -4.1308097 -2.407048
c.7..3..3. -0.2103228 -5.367622
c.10..9..8. 2.1024528 -1.648465
c.3..9..7. -2.8073744 4.654129
c.10..6..5. 5.0460542 4.769006

$weights
PC1 PC2
[1,] 4.597787 3.595342
[2,] 8.247384 28.461893
[3,] -8.317986 -26.274719

$r.scores
PC1 PC2
PC1 1.000000 0.277824
PC2 0.277824 1.000000

$missing
c.3..6..5. c.7..3..3. c.10..9..8. c.3..9..7. c.10..6..5.
0 0 0 0 0

$R2
[1] NA NA


% I thought it would be neat to solve the linear equations myself.
% In the 2 factor model we have already calculated the factor weights.

% What I thought was why not solve the systems of equations for F1 and F2?
% Each students has 3 grades each of which express the F1 and F2 factor scores.
% So, I could there will be 3 equations in 2 variables.
% That means that I will have 3 estimates for the pairs of factor scores.

% Thus, Student 1 (Finance Mark) = b11 * F1 + b12 * F2 (1)
Student 1 ( Marketing Mark) = b21 * F1 + b22 * F2 (2)
Student 1 (Policy Mark) = b31 * F1 + b32 * F2 (3)

% The bijs have all been calculated in R.
% The only unknowns are F1 and F2 for each student.
% Yet, the F1 and F2 will be constant for each student.
% If F1 were to equal IQ, then a student would have available the same IQ for each subject.



% This is my code to solve the equations.
% I changed the -2 below to -1 and -3 in turn so that I could generate the complete
% output.

% The weights that I used were not the same as those found in the factor analysis (principal component posted above).
% The results from the pdf were duplicated when I went through a principal component factor analysis
% using the decomposition into AA' in R.
% There are a range of R functions that will do a factor analysis, I used "principal" above which might have been in error.
% I will explore fa, factnal ...


weights<- data.frame(c(3.137,0.24),c(-0.132,2.24),c(0.128,1.73))
Tweights <- t(weights)

for (i in 1:5) {
gradesC<- gradesB[-2,]
Tweights1<- Tweights[-2,]
aaa<-solve(Tweights1, gradesC[,i])
print (aaa)}


% This is the output.
% This result makes more sense to me than the official R output with the PCs.
% So, In the first set with 3rd row missing Student 1 had factor scores of 0.748 and 2.72.
In the second set with 2nd row missing Student 1 had factor scores of 0.739 and 2.84
In the third set with 1st row missing Student 1 had factor scores of 1.59 and 2.77
(not sure why in the last set Student 5 had the exact same scores).
% These scores can simply be calculated as the solutions to the linear equations for student 1.
% The Factor 2 scores for student 1 are fairly constant 2.72-2.84.
% It is the Factor 1 scores that jump around more.



York Example 3rd row missing

[1] 0.7480279 2.7226516
[1] 2.119412 1.464180
[1] 2.867440 4.186831
[1] 0.6460241 4.0559264
[1] 2.969444 2.853557



York Example 2nd row missing
[1] 0.7393969 2.8354666
[1] 2.110709 1.577936
[1] 2.850106 4.413403
[1] 0.6504469 3.9981172
[1] 2.983531 2.669427



York Example 1st row missing
[1] 1.591986 2.772385
[1] 2.970412 1.514328
[1] 4.562398 4.286713
[1] 0.2135591 4.0304419
[1] 1.591986 2.772385




The above was done for the unstandardized values,
here is the standardized grades as per the pdf.

gradestan<- data.frame(c(-1.14763,-0.26726,-0.34412),c(0.12751,-1.60356,-1.49117),c(1.08387,1.06904,
+ 1.37646),c(-1.14763,1.06904,0.80294),c(1.08387,-0.26726,-0.34412))
 
Last edited:

PCA1

New Member
#4
% Now I am including the standardized grades

gradestan<- data.frame(c(-1.14763,-0.26726,-0.34412),c(0.12751,-1.60356,-1.49117),c(1.08387,1.06904,
+ 1.37646),c(-1.14763,1.06904,0.80294),c(1.08387,-0.26726,-0.34412))
newgrades <- t(gradestan)
cor(newgrades)


% Here is the factor analysis call
gradestanew.fa<- principal(newgrades, nfactors=2, rotate= 'none')

gradestanew.fa


% These are the same results from the pdf

Principal Components Analysis
Call: principal(r = newgrades, nfactors = 2, rotate = "none")
Standardized loadings (pattern matrix) based upon correlation matrix
PC1 PC2 h2 u2 com
1 0.03 1.00 1.00 0.00009 1
2 0.99 -0.08 0.99 0.00506 1
3 1.00 0.05 0.99 0.00508 1

PC1 PC2
SS loadings 1.98 1.01
Proportion Var 0.66 0.34
Cumulative Var 0.66 1.00
Proportion Explained 0.66 0.34
Cumulative Proportion 0.66 1.00

Mean item complexity = 1
Test of the hypothesis that 2 components are sufficient.

The root mean square of the residuals (RMSR) is 0
with the empirical chi square 0 with prob < NA

Fit based upon off diagonal values = 1
> fa.diagram(gradestanew.fa)


% Calling for the factor scores. Will need to double check these with my linear equation solver.
% Though these do seem to make more sense. The third student scores highest in both factors
% which makes sense as this student had higher marks than any other student.

> gradestanew.fa$scores
PC1 PC2
c..1.14763...0.26726...0.34412. -0.2901439 -1.0138818
c.0.12751...1.60356...1.49117. -1.3883971 0.1610471
c.1.08387..1.06904..1.37646. 1.1132873 0.9464313
c..1.14763..1.06904..0.80294. 0.8253079 -1.0582317
c.1.08387...0.26726...0.34412. -0.2600541 0.9646351
 

PCA1

New Member
#5
That is great!
My solving of the linear equations gives similar answers to the answers in my above post.
Looks like standardizing the variables helped.
There is still quite a bit of jumping around of these factor scores according to which
pair of linear equations are solved.
For example, student 1 has a Factor 2 score of -1.137 with the second row removed, but -0.56 with the first row removed.



for (i in 1:5) {
+ gradesT1<- gradesT[-2,]
+ tweigh1<- tweigh[-2,]
+ aaa<-solve(tweigh1, gradesT1[,i])
+ print (aaa)}

Second Row Removed
[1] -0.2871693 -1.1390149
[1] -1.4997952 0.1725039
[1] 1.324253 1.044142
[1] 0.8616139 -1.1734784
[1] -0.3989119 1.0958374



Third row removed
[1] -0.3618202 -1.1367754
[1] -1.6055615 0.1756768
[1] 1.164601 1.048932
[1] 0.9847134 -1.1771714
[1] -0.1819331 1.0893280

First Row Removed
[1] -0.3157730 -0.5669405
[1] -1.5403212 0.9830247
[1] 1.263080 2.267609
[1] 0.9087815 -2.1168293
[1] -0.3157730 -0.5669405