CFA by 'hand'

Lazar

Phineas Packard
#1
[SOLVED] CFA by 'hand'

Ok so I am trying to do a cfa by hand. I am doing the tutorial found here

The central equation is the model implied covariance matrix [TEX]\Sigma = LL^T+E [/TEX] where L is a matrix of loadings and E is a matrix of errors. The idea is to use maximum liklihood to get estimates for L and E which minimise the descrepency between the implied and the observed covariance matrix. The descrepency function to be minimized is [TEX] F = log \left | \Sigma \right | + tr(S\Sigma{^{-1}}) - log\left | S \right | - p[/TEX] where sigma is the implied covariance matrix and S is the observed and p is the number of indicator items.

Ok so far in R I have got this far:
Code:
dat<- '.804 .399 .500 .367 .451 .510 .487 .833 .433 .283 .372 .377 .621 .529 .805 .339 .551 .543 .478 .362 .441 .733 .332 .341 .570 .461 .695 .438 .780 .556 .626 .455 .667 .438 .693 .825'

dat<- strsplit(dat, ' ')

dat<-as.numeric(unlist(dat))

covar1<- t(matrix(dat, nrow=6, ncol=6))#cov matrix for one factor CFA

L1<- rep(.7,6)#starting values for loadings
LF<- NULL# one factor model with latent variance constrained to 1 is
#an identity matrix so not needed in this case
E1<- diag(.3, 6, 6)#starting values for error


discrepency<- function(covar, L, E){
  #descrepency function 
  # I need to apply ML to this to find L and E
  sigma<- L%*%t(L)+E
  log(det(sigma))+
  sum(diag(covar%*%solve(sigma)))-
  log(det(covar))-
  nrow(covar)}
My questions are:
1. Have I correctly reproduced the descrepency function here. My initial results for the implied covariance matrix based on starting values is different from the tutorial but I think that is because they got it wrong.

2. How do I get the ML estimates for L and E. I have tried stats4:::mle from base but with no success.
 

Lazar

Phineas Packard
#3
From the chat box. The method I have been trying so far is:

Code:
stats4:::mle(discrepency, fixed= list(covar=covar1), start = list(L=L1, E=E1))
#see what the starts and fixed matrixes are above)

The error I get is:

Error in optim(start, f, method = method, hessian = TRUE, ...) :
(list) object cannot be coerced to type 'double'
 

bryangoodrich

Probably A Mammal
#4
The error if I understand is that it's taking a list object where a vector of type double is expected. After you get the error, run traceback and show that output. It'll show exactly the tree of calls made that led to the error.
 

Lazar

Phineas Packard
#5
Yep Bryan traceback indicates the problem is with optim and with the start values. The problem is that I dont know how to include start values and fixed values when they are matrices.
 

Lazar

Phineas Packard
#6
Much closer now:

Code:
discrepency<- function(covar, L1,L2,L3,L4,L5,L6, E1,E2,E3,E4,E5,E6){
  #descrepency function 
  # I need to apply ML to this to find L and E
  sigma<- c(L1,L2,L3,L4,L5,L6)%*%t(c(L1,L2,L3,L4,L5,L6))+diag(c(E1,E2,E3,E4,E5,E6))
  log(det(sigma))+
  sum(diag(covar%*%solve(sigma)))-
  log(det(covar))-
  nrow(covar)}
with:
Code:
stats4:::mle(discrepency, start = list(L1=.07, L2=.07, L3=.07, L4=.07, L5=.07,L6=.07,
                                       E1=.03,E2=.03,E3=.03,E4=.03,E5=.03,E6=.03), 
                                      fixed= list(covar=covar1), method="L-BFGS-B")
Gets me really close to the estimates from MPLUS but still a little way off. Need to work on this some more.

EDIT:
Outputs are
Code:
L1    L2    L3    L4    L5    L6    E1    E2    E3    E4    E5    E6 
0.708 0.574 0.793 0.514 0.772 0.782 0.302 0.503 0.177 0.469 0.185 0.213
Which is consistent with but still concerningly distant from MPLUS or AMOS estimates.
 

Lazar

Phineas Packard
#7
Grr I am an idiot. Coding error in data entry. Replace the covariance matrix script in the op with:

Code:
covar1<- matrix(dat, nrow=6, ncol=6, byrow=FALSE)
covar1<-t(covar1)
covar1<-as.matrix(forceSymmetric(covar1))
and it works perfectly!
 

Lazar

Phineas Packard
#8
A bit more coding and I now have all the major output produced by MPlus and whats more it is all consistent with mplus and amusingly alightly faster :) I want to clean up the code before posting here but the output now looks like:

Code:
> fit
$N.Obs
[1] 358

$Df
[1] 9

$Chi.square
[1] 21.73486

$sig
[1] 0.009758342

$RMSEA.Fit
[1] 0.063

$TLI
[1] 0.973

$CFI
[1] 0.99

$Estimates
   loadings    SE Residual    SE
L1    0.673 0.042    0.042 0.031
L2    0.549 0.046    0.046 0.043
L3    0.748 0.040    0.040 0.025
L4    0.476 0.044    0.044 0.040
L5    0.720 0.040    0.040 0.026
L6    0.742 0.041    0.041 0.027
 

Lazar

Phineas Packard
#9
Ok so I finished the CFA by hand (script below). However, I have an issue I was hoping someone could help solve. The stats4:::mle function gives an inverted hessian matrix of the estimated parameters so that you can calculate the SEs (by taking the square root of the variances/diagonals of that matrix). This gives SEs that are exactly in line with MPlus and AMOS, etc. BUT I have to multiple my estimates from the inverted hessian matrix by a constant (in this case .075) to get the correct SEs. Does anyone know why that constant and where I can work it out from?

Code:
#data cut and pasted from web
dat<- '.804 .399 .500 .367 .451 .510 .487 .833 .433 .283 .372 .377 .621 .529 .805 .339 .551 .543 .478 .362 .441 .733 .332 .341 .570 .461 .695 .438 .780 .556 .626 .455 .667 .438 .693 .825'
dat<- strsplit(dat, ' ')

dat<-as.numeric(unlist(dat))#cov matrix for one factor CFA

covar1<- matrix(dat, nrow=6, ncol=6, byrow=FALSE)
covar1<-t(covar1)
covar1<-as.matrix(forceSymmetric(covar1))


discrepency<- function(covar, L1,L2,L3,L4,L5,L6, E1,E2,E3,E4,E5,E6){
  #descrepency function 
  # I need to apply ML to this to find L and E
  #Every free and fixed parameter must be named. 
  #I have not figured out how to use lists or matrcies here yet.
  sigma<- c(L1,L2,L3,L4,L5,L6)%*%t(c(L1,L2,L3,L4,L5,L6))+diag(c(E1,E2,E3,E4,E5,E6))
  log(det(sigma))+
  sum(diag(covar%*%solve(sigma)))-
  log(det(covar))-
  nrow(covar)}

#The method used here requires the start values be either mins or maxes. Not ideal really
#but it is the quickest of the optimize methods on offer
out<-stats4:::mle(discrepency, start = list(L1=.07, L2=.07, L3=.07,
                                            L4=.07, L5=.07,L6=.07,
                                            E1=.03,E2=.03,E3=.03,
                                            E4=.03,E5=.03,E6=.03), 
                 fixed= list(covar=covar1), nobs=as.integer(358),
                 method="L-BFGS-B", control=list(maxit=10000))


#Not output is of type S4 hence @ rather than $
out1<-round(matrix(out@coef, 6, 2),3)

#SEs the same as MPlus and AMOS but the question is:
#Why do I need to multiple by this constant.075 to get correct SEs?
SE<- (diag(out@vcov)^.5)*.075
out1<- cbind(out1[,1],round(SE[1:6],3), out1[,2], round(SE[7:12],3))
colnames(out1)<- c('loadings', "SE", 'Residual', "SE")

#Chi-square value
N<-358
df<- 9
chi<-out@min*(N-1)
p.value<-1-pchisq(chi, df) 

#RMSEA value
RMSEA<-function(cs, df, N){
                nom <- (cs-df)^.5
                dom <- (df*(N-1))^.5
                RMSEA.est <- nom/dom
                RMSEA.est
                }

#Null model for the calculation of CFI and TLI
#Not only the variances are esitmated all other values are fixed
Null<-stats4:::mle(discrepency, start = list(E1=.03,E2=.03,E3=.03,E4=.03,E5=.03,E6=.03), 
                               fixed= list(L1=.00, L2=.0, L3=.0, L4=.0, L5=.0,L6=.0, covar=covar1),
                   method="L-BFGS-B", control=list(maxit=10000))

#Null Chi for CFI and TLI
Null.chi<-Null@min*(N-1)

#CFI and TLI calculations
TLI<- ((Null.chi/15)-(chi/12))/(Null.chi/15)
CFI<- ((Null.chi-15)-(chi-12))/(Null.chi-15)

#Output  in list format
fit<- list(N.Obs=N, Df=df, Chi.square=chi, sig=p.value, RMSEA.Fit=round(RMSEA(chi, df, N),3),
           TLI=round(TLI,3), CFI=round(CFI,3), Estimates=out1)
EDIT: Now cross posted on SEMNET