Any help with the following would be greatly appreciated.
I have a model of the form:
Y = \mu +C +X+S+e
where Y is continuous and C, X and S are categorical, with 2, 5 and 3 levels respectively. C, X and S are all variables with a fixed number of levels.
I want to be able to make comparisons between the levels of S, within each X within each C. There are varying numbers in each subcategory. The sample size is large so that should not be a problem.
Is this possible? Any suggestions?
I've tried a simple linear regression model, but this only allows comparisons between the levels of S for a randomly selected Y.
Thanks in advance!
L.
Do you have observations for each level of the combinations of C,X,S?
I don't have emotions and sometimes that makes me very sad.
Of course there is only one level of each S, X, and C for each Y. But for each unique combination of S, X, and C (there should be 30 unique combinations) is there a value for Y? And a better question - are there multiple observations for each of these unique combinations?
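A quick way to check both questions in R is to tabulate the cell counts. This is only a sketch on simulated stand-in data: the data frame d and everything in it are made up for illustration and would be replaced by the real dataset.

```r
# Simulated stand-in data (hypothetical; replace with the real data frame)
set.seed(1)
n <- 2000
d <- data.frame(
  C = factor(sample(c("c1", "c2"), n, replace = TRUE)),
  X = factor(sample(paste0("x", 1:5), n, replace = TRUE)),
  S = factor(sample(paste0("s", 1:3), n, replace = TRUE))
)
d$Y <- rnorm(n)

# Number of observations in each C-by-X-by-S cell (a 2 x 5 x 3 array)
counts <- table(d$C, d$X, d$S)
print(ftable(counts))

# How many of the 30 cells are empty, and the smallest cell's share of the data
n_empty <- sum(counts == 0)
share <- prop.table(counts)
print(n_empty)
print(min(share))
```

If n_empty is zero and no cell has only a handful of observations, every S-within-X-within-C comparison can at least be estimated.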
First up - thank you for your help!
The answer to this is yes, there are observations for all combinations.
Yes, the dataset has over 10,000 observations, but there are not an equal number of observations for each of the combinations. Some combinations have a low proportion (<1%) of the dataset.
What software are you using for the analysis?
I am using R.
Has anyone got any ideas? I'm not looking for a software-specific answer, just a few ideas on the right approach, or on the kinds of analysis that apply to this kind of problem.
Any help appreciated.
Just go ahead and estimate the model:
Y = \mu +C +X+S+e
You have lots of data, and you will see how precise the results are by printing the covariance matrix of the estimates.
If you just specify C, X and S as "factor" in R, the software will take care of most of the rest.
(Remember that analysis of variance is a kind of regression analysis, and many would gladly run a regression on far fewer, and probably more unbalanced, data than yours.)
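As a rough sketch of what that looks like in base R (simulated stand-in data again; the data frame d and the effect sizes used to generate Y are assumptions for illustration only):

```r
# Simulated stand-in data with made-up main effects (hypothetical)
set.seed(2)
n <- 3000
d <- data.frame(
  C = factor(sample(c("c1", "c2"), n, replace = TRUE)),
  X = factor(sample(paste0("x", 1:5), n, replace = TRUE)),
  S = factor(sample(paste0("s", 1:3), n, replace = TRUE))
)
d$Y <- 10 + 0.5 * (d$C == "c2") + 0.3 * as.integer(d$X) +
  0.4 * (d$S == "s3") + rnorm(n)

# Main-effects model; factors are dummy-coded automatically
# (treatment contrasts by default, first level as reference)
fit <- lm(Y ~ C + X + S, data = d)
print(summary(fit)$coefficients)

# Covariance matrix of the estimates and the standard errors derived from it
V <- vcov(fit)
se <- sqrt(diag(V))
print(round(se, 4))
```

With the default coding there are 1 + 1 + 4 + 2 = 8 coefficients, so V is an 8 x 8 matrix.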
Why can't you simply specify interaction terms? (Probably a really dumb question on my part....)
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Yes, of course it is natural to include the interaction effects.
(I was writing too fast and just copied loglikely's (nice name) model.)
You can start with the main effects (the "C + X + S") and then include interactions for those factors that have large main effects.
In the classic books by Box, Hunter and Hunter and by Box and Draper there is some material on empirical model building.
If you use an interaction term, and assuming it is significant, you probably want to focus on the simple effects (the impact of X on Y at some level of S, for example). Most software doesn't do this automatically; you have to ask for it (or get the system to calculate it, which is not always a simple matter).
Opinions differ, but interpreting main effects when the interaction effects are significant is tricky, particularly if you have a disordinal interaction. In that case I would not even address main effects; I would stick to simple effects.
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Thanks for your replies so far. I still need help with this problem.
I can add the interactions no problem. For example with the model
Y = \mu +C +X+S+C:X:S +e (using R syntax)
But I still don't see how this will allow me to compare the levels of S whilst holding the levels of C and X constant. Or, to put it another way, I want to compare the levels of S (see if they are significantly different) for a given level of X within C.
Is this even possible?
With an interaction in the model you cannot, as you normally would in regression, hold the other independent variables constant at an arbitrary level. (Without an interaction, the level at which they are held constant does not matter; by convention it is commonly the grand mean of the other IVs, I believe.) Instead you have to discuss the impact of one IV on the DV at specific levels of the IV it interacts with. That is the nature of interaction: the impact of one IV on the DV differs at different levels of the other IV.
So you can do what you want at specific levels of the variables you are interacting with (aka simple effects). Before you do all this, run the model with the interaction term and see if it is significant.
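One possible way to get these simple effects in base R is sketched below, under some assumptions: the data are simulated stand-ins, the model is the full C*X*S factorial, and simple_effect() is a hypothetical helper written for this example, not a built-in. Each comparison is the difference between two rows of the design matrix, tested with the model's residual variance.

```r
# Simulated, unbalanced stand-in data (hypothetical values throughout)
set.seed(3)
n <- 3000
d <- data.frame(
  C = factor(sample(c("c1", "c2"), n, replace = TRUE)),
  X = factor(sample(paste0("x", 1:5), n, replace = TRUE,
                    prob = c(0.3, 0.3, 0.2, 0.1, 0.1))),
  S = factor(sample(paste0("s", 1:3), n, replace = TRUE))
)
d$Y <- 10 + 0.8 * (d$S == "s3") * (d$C == "c2") + rnorm(n)

fit <- lm(Y ~ C * X * S, data = d)

# One design-matrix row per cell (30 rows), built the way predict() would
grid <- expand.grid(C = levels(d$C), X = levels(d$X), S = levels(d$S))
M <- model.matrix(delete.response(terms(fit)), data = grid, xlev = fit$xlevels)
stopifnot(identical(colnames(M), names(coef(fit))))

# Hypothetical helper: compare S level s_a with s_b inside one (C, X) cell
simple_effect <- function(c_lev, x_lev, s_a, s_b) {
  cell_row <- function(s) M[grid$C == c_lev & grid$X == x_lev & grid$S == s, ]
  L <- cell_row(s_a) - cell_row(s_b)            # contrast vector
  est <- sum(L * coef(fit))
  se <- sqrt(drop(t(L) %*% vcov(fit) %*% L))
  tval <- est / se
  c(estimate = est, se = se, t = tval,
    p = 2 * pt(-abs(tval), df = df.residual(fit)))
}

# All pairwise S comparisons within every C-by-X combination
cells <- expand.grid(C = levels(d$C), X = levels(d$X), stringsAsFactors = FALSE)
s_pairs <- combn(levels(d$S), 2)
out <- list()
for (i in seq_len(nrow(cells))) {
  for (j in seq_len(ncol(s_pairs))) {
    r <- simple_effect(cells$C[i], cells$X[i], s_pairs[1, j], s_pairs[2, j])
    out[[length(out) + 1]] <- data.frame(
      C = cells$C[i], X = cells$X[i],
      contrast = paste(s_pairs[1, j], "-", s_pairs[2, j]), t(r))
  }
}
res <- do.call(rbind, out)
res$p_holm <- p.adjust(res$p, method = "holm")  # adjust for 30 comparisons
print(head(res))
```

Because the full factorial model fits one mean per cell, each estimate is simply the difference between two cell means; the model contributes the pooled error variance. The Holm adjustment is one choice among several for handling the multiple comparisons.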
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
I would include the two factor interactions (2fi) and the three-factor interaction (3fi).
Y = \mu +C +X+S+C:X + C:S + X:S + C:X:S +e
Then you look at the p-values to see whether the main effects and interactions are significant.
Maybe you will need to combine them in a linear contrast. Look back at what "Jake" has written here about that.
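A hedged sketch of how the interaction terms could be tested in R, using nested model comparisons (simulated stand-in data; the data frame d and the generated effects are assumptions for illustration). In R's formula syntax, C*X*S expands to the same terms as the model written out above.

```r
# Simulated stand-in data with a made-up C-by-S interaction (hypothetical)
set.seed(4)
n <- 3000
d <- data.frame(
  C = factor(sample(c("c1", "c2"), n, replace = TRUE)),
  X = factor(sample(paste0("x", 1:5), n, replace = TRUE)),
  S = factor(sample(paste0("s", 1:3), n, replace = TRUE))
)
d$Y <- 10 + 0.5 * (d$C == "c2") + 0.6 * (d$C == "c2") * (d$S == "s2") + rnorm(n)

fit_main <- lm(Y ~ C + X + S, data = d)       # main effects only
fit_2fi  <- lm(Y ~ (C + X + S)^2, data = d)   # plus all two-factor interactions
fit_full <- lm(Y ~ C * X * S, data = d)       # plus the three-factor interaction

# F-tests: do the 2fi terms add to the main effects, and the 3fi to the 2fi?
cmp <- anova(fit_main, fit_2fi, fit_full)
print(cmp)

# Term-by-term table; with unbalanced data these sequential (Type I)
# sums of squares depend on the order of the terms in the formula
print(anova(fit_full))
```

The nested comparison does not depend on the order of terms, which is why it may be the safer summary when the cell counts are unequal.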