Interpreting Multiple Regression vs Correlation


I'm analysing the results of a satisfaction survey. I want to work out the key drivers of satisfaction, so I'm using a question on overall satisfaction as my dependent variable and the other satisfaction questions (e.g. satisfaction with cost, satisfaction with product features) as independent variables. In total I have about 11 independent variables, all with adequate sample sizes.

I run multiple linear regression in SPSS and remove independent variables until all the remaining ones are significant (p < 0.05). This leaves me with 3 variables, which, as I understand it, means they can be used to 'predict' overall satisfaction.

I have a strong suspicion that some of the independent variables that were not significant in the regression are actually strong drivers of satisfaction. Take cost, which was not significant: if customers were much less satisfied with cost (most of them were happy with it), that would surely have a direct impact on overall satisfaction. So I believe cost is a driver of satisfaction and that the regression is flawed?

I run bivariate correlations, and all of the independent variables have fairly strong to strong correlations with the dependent variable. Some of the strongest correlations are between the dependent variable and independent variables that were not significant in the regression analysis. I realise this just means there is a linear relationship between the two.

However, I'm confused as to how I interpret these results and get to the bottom of what is driving satisfaction....

Help please?

You want to use some form of multiple regression, not correlation, but your method of variable selection (known as "backward selection"), while very widely used, is not good. (I co-wrote a paper on this titled "Stopping Stepwise: Why stepwise and similar variable selection methods are bad and what you should use"; it uses SAS, not SPSS, but the principles are the same.)
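One widely used alternative to stepwise selection (I'm not claiming it's what the paper recommends) is the lasso, which picks a penalty by cross-validation and shrinks weak coefficients toward zero instead of dropping variables by p-value. A rough sketch in Python on made-up data, assuming a situation like yours: 11 correlated items, only three of which actually drive the outcome.

```python
import numpy as np
from sklearn.linear_model import LassoCV

# Fabricated stand-in for survey data (not your survey): 11 satisfaction
# items that share a common factor, of which only the first three have
# a real effect on overall satisfaction.
rng = np.random.default_rng(1)
n, p = 400, 11
base = rng.normal(size=(n, 1))
X = 0.6 * base + 0.8 * rng.normal(size=(n, p))   # items correlated via 'base'
y = X[:, 0] + 0.7 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

# LassoCV chooses the penalty strength by 5-fold cross-validation;
# irrelevant items get coefficients shrunk toward (often exactly to) zero.
model = LassoCV(cv=5).fit(X, y)
kept = np.flatnonzero(np.abs(model.coef_) > 1e-6)
print("items with non-trivial coefficients:", kept)
```

The point is that the penalty, not a sequence of significance tests, decides which variables survive, which avoids the inflated significance levels that stepwise methods produce.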

But if your "satisfaction" measure comes from a single question (which it seems to, from what you say) and that question takes on a limited number of levels (I'm guessing it's some kind of Likert-type item), then you probably don't want multiple *linear* regression but ordinal logistic regression.

You should also (if you haven't already done so) plot the satisfaction variable against each of the 11 independent variables to look for odd patterns.
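As for the puzzle of variables that correlate strongly with overall satisfaction yet come out non-significant in the multiple regression: correlated predictors share their explanatory power, so the partial (regression) effect of one can vanish once the other is in the model. A small simulation in plain numpy (made-up data, not SPSS output) shows the pattern:

```python
import numpy as np

# Two predictors that both correlate strongly with the outcome, but only
# one of them has a real partial effect.
rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)                    # e.g. "product features"
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)   # e.g. "cost", correlated with x1
y = x1 + 0.5 * rng.normal(size=n)          # outcome driven by x1 only

# Bivariate correlations: both predictors look like "drivers"
r1 = np.corrcoef(x1, y)[0, 1]
r2 = np.corrcoef(x2, y)[0, 1]

# Multiple regression: t-statistic for x2 collapses once x1 is in the model
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t = beta / se
print(f"corr(x1,y)={r1:.2f}  corr(x2,y)={r2:.2f}")
print(f"t(x1)={t[1]:.1f}  t(x2)={t[2]:.1f}")
```

So a strong bivariate correlation and a non-significant regression coefficient are not contradictory: the regression coefficient answers a different question (the effect of that variable *holding the others fixed*).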


Ambassador to the humans
@Peter - I absolutely hate stepwise procedures. I was wondering what you came to recommend in your paper - or was it just a warning against using stepwise procedures?
Thanks very much for getting back to me.

The survey consists of a number of different satisfaction questions, e.g. "What is your satisfaction with our prices?", along with one that asks about overall satisfaction with the company. We want to know what is driving this overall satisfaction.

The questions aren't traditional Likert scales; they score satisfaction with different features on a 0-10 scale (10 being extremely satisfied, 0 being extremely dissatisfied).

Given that the differences between 0 and 1, 1 and 2, 2 and 3, etc. are all the same, would you still suggest using ordinal logistic regression?
Also, my survey data has quite a few missing values for some questions, as not all respondents answered every question. I believe SPSS runs ordinal regression using listwise deletion of missing values, which won't work in my case as I'd be left with almost no respondents in my sample...

Is there a way around this?
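On the listwise-deletion problem: with 11 items, even a modest per-question skip rate means almost nobody answers everything. A quick illustration in Python/pandas on fabricated data (the 20% skip rate is just an assumption for the demo); the usual remedies are multiple imputation (SPSS has a Multiple Imputation procedure) rather than the crude single imputation shown at the end, which is only there to make the mechanics concrete:

```python
import numpy as np
import pandas as pd

# Fabricated 0-10 survey responses for 300 respondents on 11 questions.
rng = np.random.default_rng(3)
n, p = 300, 11
data = pd.DataFrame(rng.integers(0, 11, size=(n, p)),
                    columns=[f"q{i}" for i in range(1, p + 1)]).astype(float)
# each respondent skips each question independently with probability 0.2
data = data.mask(rng.random((n, p)) < 0.2)

complete = data.dropna()            # what listwise deletion keeps
print(f"{len(complete)} of {n} respondents survive listwise deletion")

# crude single (mean) imputation, for illustration only; multiple
# imputation is the statistically defensible version of this idea
imputed = data.fillna(data.mean())
print(imputed.isna().sum().sum(), "missing values after imputation")
```

With a 20% skip rate per item, the expected fraction of complete cases is 0.8^11, under 9%, which is exactly the collapse you're describing.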