Regression with multiple dependent variables - Help!!



cjcass
03-25-2012, 01:07 PM
Firstly, I’m not a statistician but have a project where I need to analyse survey data. Overall the survey will have 50 questions (variables). Out of the 50 questions I will be selecting 3 groups of 4 questions that will become the dependent variables. So when the pre-determined questions have been isolated I will be left with 38 predictor variables and 12 dependent variables, with the dependent variables being split into 3 dependent variable groups. I will need to analyse each group of dependent variables against the 38 predictor variables to see which predictor variables (targeting about 6) have the strongest influence on the dependent variable groups.

Having done some research on the web about this type of analysis, I’m thinking that for each group of dependent variables, I will need to do the following:

1. Carry out a Principal Components Analysis to reduce the quantity of predictor variables and then carry out a Multivariate Regression on each group of dependent variables?

or…

2. Carry out a Principal Components Regression on each group of dependent variables?

I have XLSTAT Pro 7.5 and also SPSS 19. My preference is to use XLSTAT as I would like to keep the whole analysis and survey, with its reports, in Excel. If there is a simpler method using just Excel on its own then even better.

Having looked at lots of tutorial videos and PDFs online, I’m still struggling to see how I can easily and quickly identify the most influential predictor variables with minimal manipulation/intervention from me, as I’m a statistical novice. The speed and ease of running this analysis (probably deploying some macros and formulas feeding the final survey report) is also key, as I will be running it several times for a variety of respondents.

To conclude, if you are able to help me or give me some solid direction, I guess my 2 real questions are:

1. Am I on the right lines regarding the 2 options above? What would you recommend?
2. Where/how does the regression output tell me which predictor variables are the most influential?

Any help on this subject would be massively appreciated as currently I seem to be getting nowhere fast!!
Many thanks in anticipation,
Chris

dgi
06-08-2012, 06:52 AM
1. I am not a statistician either, but I think you are on the right track. However, I would do it with Partial Least Squares Regression (PLSR), using your 38 questions as the predictor matrix and the relevant group of four variables as the response. PLSR is robust to multicollinearity, so in principle you don't need to reduce the predictors beforehand.

2. I would look at three numerical indicators: 1) the loading weights, 2) the size of the regression coefficients and 3) the significance of the regression coefficients. Depending on which software you use (sorry, I don't know about XLSTAT), you should also get some informative plots that make visual inspection very easy (loadings plot, bi-plot, etc.).
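
To make the first two indicators concrete, here is a minimal sketch of PLSR in Python with NumPy, fitted with the NIPALS algorithm. Everything about the data is an assumption for illustration: the 200 respondents are simulated, and predictors 0, 5 and 10 are made influential on purpose so the ranking can be checked. The function name `pls2_nipals` and the importance score (sum of absolute standardized coefficients across the four responses) are my own choices, not something XLSTAT or SPSS reports under that name. Significance of the coefficients is not covered; that would need some form of resampling.

```python
import numpy as np

def pls2_nipals(X, Y, n_components):
    """Fit PLS regression (NIPALS) on standardized data.

    Returns the X weight matrix W (predictors x components) and the
    coefficient matrix B (predictors x responses) for standardized variables.
    """
    E = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    F = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)
    W, P, Q = [], [], []
    for _ in range(n_components):
        u = F[:, 0].copy()                    # starting Y score
        for _ in range(500):
            w = E.T @ u
            w /= np.linalg.norm(w)            # loading weights (unit length)
            t = E @ w                         # X scores
            q = F.T @ t / (t @ t)             # Y loadings
            u_new = F @ q / (q @ q)
            converged = np.linalg.norm(u_new - u) <= 1e-10 * np.linalg.norm(u_new)
            u = u_new
            if converged:
                break
        p = E.T @ t / (t @ t)                 # X loadings
        E = E - np.outer(t, p)                # deflate X
        F = F - np.outer(t, q)                # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    B = W @ np.linalg.inv(P.T @ W) @ Q.T      # standardized regression coefficients
    return W, B

# Simulated survey (assumption): 38 predictors, one group of 4 dependent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 38))
C = np.array([[1.0, 0.8, 0.6, 0.9],
              [0.7, 1.0, 0.8, 0.6],
              [-0.9, -0.6, -1.0, -0.8]])
Y = X[:, [0, 5, 10]] @ C + rng.normal(scale=0.5, size=(200, 4))

W, B = pls2_nipals(X, Y, n_components=3)
importance = np.abs(B).sum(axis=1)            # one score per predictor
top3 = np.argsort(importance)[::-1][:3]
print("Most influential predictors:", sorted(top3.tolist()))
```

Because both X and Y are standardized before fitting, the coefficients in `B` are on a common scale and can be compared across predictors. The first column of `W` holds the loading weights of the first component, which is what a loading-weights plot would show.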

noetsi
06-08-2012, 01:31 PM
Another possibility is structural equation modelling (SEM), which allows multiple dependent variables. It is arguably a specialized form of regression, but one that permits much more complex analyses than ordinary regression. It uses regression to estimate the paths between variables.
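
As a rough illustration of "regression along paths", here is a small path-analysis sketch in Python with NumPy. It is only the simplest observed-variable case, not a full SEM: dedicated SEM software fits all equations at once and can handle latent variables. The model (one predictor `x`, two dependent variables `y1` and `y2`, with `y1` also feeding into `y2`), the simulated data and the helper `ols` are all assumptions made up for the example.

```python
import numpy as np

def ols(y, *columns):
    """Least-squares slopes of y on the given columns (intercept fitted but not returned)."""
    A = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef[1:]

# Simulated data (assumption): x -> y1 -> y2, plus a direct path x -> y2.
rng = np.random.default_rng(1)
n = 300
x = rng.normal(size=n)
y1 = 0.6 * x + rng.normal(scale=0.8, size=n)
y2 = 0.3 * x + 0.5 * y1 + rng.normal(scale=0.8, size=n)

(a,) = ols(y1, x)            # path x -> y1
direct, b = ols(y2, x, y1)   # paths x -> y2 and y1 -> y2
(total,) = ols(y2, x)        # total effect of x on y2
indirect = a * b             # effect of x on y2 that runs through y1

print(f"direct={direct:.3f} indirect={indirect:.3f} total={total:.3f}")
```

Each path is one ordinary regression, and for least-squares fits the total effect of `x` on `y2` splits exactly into the direct path plus the indirect path through `y1`.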