SEM with categorical variable/parceling/how to enter in AMOS?

#1
I need to do an SEM and I'm stumped about what to do with one of my variables, which is a nominal categorical variable (previous experience in x vs. no previous experience in x). It is the only observed variable for the latent variable it is attached to. The rest of the variables in my model are measured with continuous indicators and I am parceling some of them to create multiple observed variables that equally represent the underlying concept. What is the best way to code the nominal variable in my model? I have scoured my SEM books and I can't find anything on this. There are some mentions of dummy coding, but there is no guidance on how you actually do this. I am using AMOS. Also, I can't figure out how I could possibly parcel the variance for the indicator since there is only one item and you can't do a CFA on one item. Basically, I am stumped and would be extremely grateful if anyone could provide some clarification.
 

Lazar

Phineas Packard
#3
Is it a predictor or an outcome variable? If it's a predictor, which I suspect it is, then you don't need to do anything to it. Just have it predict the variables you want it to. I think AMOS scales latent variables to have an intercept of zero, so the effect of experience in x on latent variable A is the difference between the no-experience group (coded 0, or whatever) and the experience group (coded 1, or whatever).
 
#4
Oh, thank you so much for replying! It is a predictor variable. So I don't need to dummy code it, then? Does it limit the analysis in any way to have a categorical variable like that (it is the only predictor of the latent variable)? Would I need to use a different estimation method because of it?
 

Lazar

Phineas Packard
#5
1. If the variable only has two levels, it is already effectively dummy coded.
2. Nope. It would be a different story if it were an outcome variable.
3. Nope, though all the usual assumption checks should still be run, of course.
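For what it's worth, here is a minimal sketch of what that 0/1 coding looks like in practice (Python; the variable name and category labels are hypothetical, not from the actual dataset):

```python
# Dummy coding a two-level nominal variable: the reference category
# ("no_experience" here, a made-up label) becomes 0, the other level 1.

def dummy_code(values, reference="no_experience"):
    """Return 0 for the reference category and 1 for the other level."""
    return [0 if v == reference else 1 for v in values]

experience = ["experience", "no_experience", "experience", "no_experience"]
print(dummy_code(experience))  # [1, 0, 1, 0]
```

In a regression or SEM, the path coefficient on a 0/1 predictor like this is then the estimated mean difference between the two groups.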
 
#6
Hi,
in AMOS, there is one suitable option: Bayesian estimation, which is more appropriate when there are low-level categorical variables (i.e., ones that can only take values like 0/1 or 0/1/2).
That estimator should handle your dichotomous variable.

M.
 
#7
Thank you for all of the help so far!

So you just select Bayesian estimation instead of using something like maximum likelihood? Would it affect the interpretation of the model--would I still get the same fit indices and everything? We had a very brief intro to Bayesian estimation in my SEM class and all I can remember is that it was confusing.

I just ran some tests and NONE of my variables are normally distributed, so I guess I can't use MLE anyway unless I run transformations on all of them? It sounds like it would be easier just to use Bayesian estimation?
 
#8
(They are not severely non-normal... the normality test p-values are .00, but skewness and kurtosis are all within ±1, and the histograms and Q-Q plots look pretty normal too. I know basically no data in psych are truly normally distributed anyway... if the variables are within the skewness and kurtosis cutoffs, should that still be okay for meeting SEM assumptions? My sample is only about 300, so it's not as robust if assumptions are violated.)
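The skewness and kurtosis values being checked against that ±1 rule of thumb can be computed by hand; here is a sketch in plain Python using the simple sample-moment formulas (no small-sample bias corrections, which some packages apply):

```python
# Sample skewness (third standardized moment) and excess kurtosis
# (fourth standardized moment minus 3), without bias corrections.

def skewness(xs):
    """Third standardized moment; 0 for perfectly symmetric data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Fourth standardized moment minus 3; 0 for a normal distribution."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2 ** 2 - 3

data = [1, 2, 3, 4, 5]  # toy symmetric data, not the real sample
print(skewness(data))  # 0.0
```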
 

Lazar

Phineas Packard
#9
There is no reason to use Bayes in this case (and I would not recommend it in any case unless you really know what you are doing). Stick with ML; if you are worried about the distributions, then use a robust estimator like MLR or MLM.
 
#10
Okay, more questions that I hope someone might be able to answer. Your replies so far have helped a lot!

1) I need to parcel the variance of the indicators for four of my latent variables. I read through the literature I could find on parceling and I am a bit stumped about parceling methods. Is there a good algorithm to help with this? Four of the measures should be very unidimensional; one is probably not (it taps both depression and anxiety, but they are not explicitly divided into separate subscales). Is there an optimal way to parcel these items? I found some scripts for parceling algorithms, but they are all in SAS or Mplus... I am using SPSS and AMOS for my analyses. Additionally, the indicators for some of the measures might be tricky to parcel. From what I have read, you should basically always create three parcels because of over- and underidentification. But some of my scales would be hard to divide into 3 (two scales have 8 items each, one has 25, and one has only 4). What is the best way to parcel measures like this? Any advice you can offer would be very appreciated.

2) When trying to set up my model in AMOS, I came across a problem. I will include a picture of the model of my latent variables to help explain. AMOS will let me draw one-way arrows across the model, but it won't let me draw two-way arrows (covariance) between some of the latent predictor variables. Am I doing something wrong? Is there a way to add these? I wanted to include them because they are in the theoretical model that guided the development of my model.
http://tinypic.com/view.php?pic=33yqjro&s=8

3) Dummy coding... I just want to make sure I'm understanding it correctly. People with no prior experience would be coded as 0 and people with prior experience as 1, right? Do I need to do anything else? And I should only need one dummy variable since I only have two groups, right?
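(The general rule behind that last question: a nominal variable with k levels needs k − 1 dummy variables, each contrasting one level against a chosen reference; with two groups, that is one dummy. A quick Python sketch with hypothetical group labels:)

```python
# k - 1 dummy variables for a k-level nominal variable: one 0/1
# column per non-reference level. Labels here are made up.

def dummy_matrix(values, reference):
    """Return {level: 0/1 column} for every level except the reference."""
    levels = sorted(set(values) - {reference})
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in levels}

groups = ["none", "some", "lots", "none"]
print(dummy_matrix(groups, reference="none"))
# {'lots': [0, 0, 1, 0], 'some': [0, 1, 0, 0]}
```

With only two groups, this collapses to a single 0/1 column, which is exactly the coding described above.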
 
#11
Okay, messed with my data some more...here's what I came up with.

I did EFAs on all of the measures I need to parcel.

Measure 1 is a scale of symptom severity. I came out with 2 factors: depression (16 items) and anxiety (9 items). A couple of items are really cross-loaded (e.g., .423 on depression and .338 on anxiety). Should I go with these factors? Should I try to create domain-representative parcels (e.g., 3 parcels, each sampling a bit of both factors)? Or should I try to create multiple parcels for each of the two factors?

All of the other measures are unidimensional and have one-factor solutions.

Measure 2 has 4 items, so I created 2 parcels that should each represent the construct equally.
Measure 3 has 6 items, so I created 3 parcels that should represent the construct equally.
Measures 4 and 5 have 8 items each, so for each scale, I created 2 parcels with 4 items each that should represent the construct equally.
To construct these parcels, I just rank-ordered the items by how highly they loaded and then dealt them out so that the parcels would have roughly equivalent average loadings on the factor. There wasn't any specific algorithm or formula I used to do this.
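That rank-and-alternate assignment can be sketched as code; the item names and loadings below are hypothetical, not the actual scale items:

```python
# "Serpentine" item-to-parcel assignment: sort items by factor
# loading, then deal them across parcels back and forth so the
# parcels' average loadings stay balanced.

def assign_parcels(loadings, n_parcels):
    """loadings: dict mapping item name -> factor loading."""
    ranked = sorted(loadings, key=loadings.get, reverse=True)
    parcels = [[] for _ in range(n_parcels)]
    for i, item in enumerate(ranked):
        cycle, pos = divmod(i, n_parcels)
        # reverse direction on every other pass to balance loadings
        idx = pos if cycle % 2 == 0 else n_parcels - 1 - pos
        parcels[idx].append(item)
    return parcels

def parcel_score(row, items):
    """One respondent's parcel indicator: the mean of its member items."""
    return sum(row[item] for item in items) / len(items)

loadings = {"q1": .81, "q2": .74, "q3": .69, "q4": .66, "q5": .58, "q6": .52}
print(assign_parcels(loadings, 3))  # [['q1', 'q6'], ['q2', 'q5'], ['q3', 'q4']]
```

Each parcel score (the mean of its items, computed per respondent) then serves as one observed indicator of the latent variable.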

Finally, one of my latent variables is measured using a scale that has 7 subscales. There are 53 items total. This is probably going to sound stupid, but it's been 3 years since I took my SEM class and I keep getting CFA and SEM muddled in my head. When I set up the SEM in AMOS, should I be adding the total score on each of these subscales as the 7 indicators, or should I create a little box for each individual item and have them feed into another box for each of the 7 subscales? Same thing with the parcels--would I actually load any of the individual items into the SEM, or would I create a variable representing the total or average score for each parcel and load those in as the indicators? I hope that makes sense...

Thanks in advance.
 

Lazar

Phineas Packard
#12
Parcelling is a vexed issue; there has been a recent debate on it in the journal Structural Equation Modeling, and I fall on the side of parcelling almost never being appropriate. If you are doing EFAs, I would look into exploratory structural equation modelling (ESEM; i.e., measurement structure defined by EFA, with the regressions handled by SEM regression parameters) or Bayesian structural equation modelling for dealing with partial knowledge of the factor structure (a relatively similar approach to ESEM, but one that requires stronger knowledge of the underlying factor structure). For ESEM, a recent review has been published in the Annual Review of Clinical Psychology.

In terms of how to use AMOS, I have never used it, so I cannot give technical advice, but clearly you want to maintain latent variables in your SEM and not use total/average/manifest scores, or the regression parameters will (most likely) be attenuated.
 
#13
I was only doing the EFAs because one of the articles on parceling suggested you should do it just to verify that the scales are unidimensional. They are all established scales with good evidence of validity and stability of factor structure based on previous research. The one scale with the crossloadings is the only one I am having a bit of a problem with because it isn't unidimensional. The SEM itself is confirmatory--I am testing two different models to see which one fits better. I had planned to parcel because one of my committee members suggested it and then after reading, it seemed like the parcels would be advantageous because each of my scales has so many individual items (so lots of parameters) and my sample size is fairly small (300).

If I don't use parcels and I'm not doing exploratory SEM, is there some other method I could use as an alternative to parceling to address the number of items I have? Or should I just try to use the items as is?
 
#16
Okay, so I ended up running my SEM only to get the message that my model is unidentified. It says in order to achieve identifiability, it will probably be necessary to impose 1 additional constraint. There are four different parameters that are marked as unidentified (one regression weight and three variances). I do have one of the paths to the indicators set to 1 for each of my latent variables. Three of the unidentified parameters are for the same latent/observed variable--the "previous experience" variable that I had to dummy code. I think the problem is that it is a latent variable with only one indicator (a single dichotomous item). Is there any way to fix this?
 

Lazar

Phineas Packard
#17
Why do you have a latent variable with only one indicator? If it is the experience variable, I doubt it is measured with much error, so I would just treat it as manifest.
 
#18
Yes, yes, thank you! It is the experience variable. I removed the latent variable and just treated it as a manifest variable and the analysis was able to run.

Now I have another problem, though. One of the questions I had earlier was about my inability to add covariances in my more complicated model. Here is a picture of the structural model (http://tinypic.com/view.php?pic=33yqjro&s=8). I wanted to add covariances between attitudes, norms, and perceived behavioral control per the theory that the model is based on, but AMOS would not let me do so.

Now, the second model I tested was a nested model, so it looks exactly the same but without the masculinity variable. When I entered that model into AMOS, I was able to add the covariances between those latent variables. The simpler model has much better fit indices, even though the regression weights indicate that the masculinity variable is a significant predictor of attitudes and norms. Also, the modification indices for the more complicated model (the one in the picture) actually indicate that I should add paths between attitudes, norms, and control anyway, so I think the lack of relationships between those variables is what's creating the poor fit for the more complex model.

Any ideas why I can't add these covariances in AMOS, or any way that I could? Should I try drawing one-way paths among the three variables instead (though that implies a causal relationship, and I only want to specify covariance)?

I found this thread that seems to be about the same topic, but it just confuses me more and it looks like the original poster never replied to Lazar to get any resolution for the problem. Does it mean that my model would be unidentified if I added the covariances, and that's why I can't? If so, why is my other model identified with the covariances?
http://www.talkstats.com/showthread.php/17255-AMOS-SEM-model-drawing-rules
 

Lazar

Phineas Packard
#19
I have not really used AMOS, so I am not sure I can help here. I see no reason why you cannot add covariances (the model in your link is not saturated, and even if it were at the structural level, it certainly is not at the measurement level). As such, I suspect this is an AMOS-specific question to which I have no answer, other than to suggest trying it in lavaan (a free package for the free R software). I certainly do not think you should add regression paths instead; your software should conform to your theory, not your theory to the software!
 
#20
Found another thread about this issue: http://culist.semnet.narkive.com/PSAVMw8l/correlate-endogenous-variables-in-amos

It really makes it sound like the problem is conceptual/theoretical and stems from the fact that they are endogenous variables. But I don't understand how else you could show a relationship between them without adding the covariance.

Here's what I did: I changed my model by moving masculinity to directly predict intentions. It makes theoretical sense, because it's not totally clear whether masculinity predicts norms/attitudes or whether they are just correlated. That allowed me to add in the covariances between the other variables in the model. Then when I ran my SEM, the model fit was absolutely beautiful... but the model still fits better without the masculinity variable (the RMSEA is even below .05, which has never ever happened for me when doing any kind of psych stats before). Blegh! Basically it fits best with the simplest model, the one that doesn't include any of the variables I added, which were the whole point of my dissertation. How disappointing. I don't suppose there is some kind of SEM penalty for complex models that I'm not understanding, is there?
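(As an aside, that .05 benchmark refers to the RMSEA point estimate, which comes directly from the model chi-square. A sketch of one common form of the formula, with made-up fit values rather than the actual results:)

```python
import math

def rmsea(chi_square, df, n):
    """RMSEA point estimate: sqrt(max(0, chi2 - df) / (df * (n - 1))).
    Some programs use n rather than n - 1 in the denominator.
    Values below roughly .05 are conventionally read as close fit."""
    return math.sqrt(max(0.0, chi_square - df) / (df * (n - 1)))

# Hypothetical fit values, not the actual dissertation results:
print(round(rmsea(chi_square=100.0, df=50, n=300), 3))  # 0.058
```

Note that because the chi-square is divided by the degrees of freedom, RMSEA does build in a parsimony adjustment: adding parameters that don't improve fit enough can worsen it.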

Anyhow, thanks for your help!