Multiple Regression Help

#1
Hi, I'm currently working on an assignment in which I have to run a multiple regression, but I am a little confused about the order of the procedure.
I'll briefly outline the problem and what I have done so far:

The assignment deals with predicting donating behaviour within a population. The predictors are age, gender, social integration, years of education, religion, and whether a serious life event (such as the illness of a family member) has occurred in the person's life in the last 5 years.
So what I did was:

- Run a hierarchical regression with demographics such as age, gender and education in the first block and the other predictors in the second block.
- I found that the demographics were non-significant; the only significant predictors were religion and life event.
- Those were included in the final model.

I was wondering whether this is the correct course of action when doing a multiple regression. A related question concerns the order in which to check assumptions. For instance, I checked for outliers and found one particularly large xy-outlier. When I removed it and reran the initial model, education became significant. When reporting this, can I disregard the initial model? In other words, could I have checked the assumptions with all the predictors included, found and removed the outliers, and then used the resulting model as my final model?

Thanks in advance.
 

Blaz
#2
Hello!

First of all, I would like to point out that hierarchical regression, although very appealing, really only helps when one has a relatively sound theoretical framework in mind. Let me elaborate.

It seems that you don’t know which variables should be included in your “final model”. That question shouldn’t really arise in a hierarchical analysis, because you should include all the contributing factors the theory specifies, no more and no less. If one includes additional variables, one risks controlling for too many factors, which may result in a loss of power. If one doesn’t include all the relevant variables, one risks omitting important explanatory sources, and the conclusions one draws from the analysis might be flawed. Just because some predictors were non-significant in a given analysis doesn’t mean they are irrelevant.

Secondly, I would suggest running all of the diagnostics (including outlier analysis) prior to any model testing. The reason is that you want to avoid situations like yours. All univariate, bivariate and multivariate outliers should be checked before any data analysis. One often finds oneself in a situation where “removing this one observation could radically transform the outcome”, which is why it is important to do this beforehand.
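To make the outlier check concrete, here is a minimal sketch of how one might flag influential observations using leverage and Cook's distance. It uses only NumPy and simulated data; the function name `influence_diagnostics`, the variable names, and the planted outlier are all made up for illustration and have nothing to do with your actual data set. The 4/n cutoff is just a common rule of thumb, not a hard threshold.

```python
import numpy as np

def influence_diagnostics(X, y):
    """Leverage, internally studentized residuals and Cook's distance for an OLS fit.

    X: (n, p) predictors WITHOUT an intercept column; y: (n,) outcome.
    (Hypothetical helper written for this example.)
    """
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])        # design matrix with intercept
    k = A.shape[1]                              # number of fitted coefficients
    Q, _ = np.linalg.qr(A)                      # thin QR: hat matrix H = Q Q^T
    leverage = np.sum(Q ** 2, axis=1)           # diagonal of the hat matrix
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    mse = resid @ resid / (n - k)               # residual variance estimate
    student = resid / np.sqrt(mse * (1.0 - leverage))
    cooks = student ** 2 * leverage / (k * (1.0 - leverage))
    return leverage, student, cooks

# Simulated data (assumption: two standardized predictors, linear outcome).
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 0.5 * X[:, 0] + rng.normal(size=n)

# Plant one extreme xy-outlier at index 0: unusual on X and far off the trend on y.
X[0] = [6.0, 6.0]
y[0] = -15.0

leverage, student, cooks = influence_diagnostics(X, y)
flagged = np.flatnonzero(cooks > 4.0 / n)       # rule-of-thumb cutoff
most_influential = int(np.argmax(cooks))
print("flagged cases:", flagged)
print("most influential case:", most_influential)
```

Whatever you flag this way still needs a substantive look (data entry error? genuinely unusual respondent?) before you decide to drop it, and the decision should be made and reported before you interpret any coefficients.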

So my advice would be: start over. Identify any outliers or influential observations, build a model on sound theoretical grounds, and run the analysis with all the relevant variables, without removing any from the “final model”. This is the only way you can really come to any sort of conclusion using hierarchical regression.
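If it helps, the block-wise comparison can be sketched roughly as follows: fit the demographics-only model, fit the full model with every predictor kept in, and test the change in R². This is a NumPy-only illustration on simulated data; the helper names (`r_squared`, `r2_change_test`), the column names and the effect sizes are assumptions for the example, not your real variables or results.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an OLS fit of y on X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

def r2_change_test(X_block1, X_block2, y):
    """F statistic for the R^2 change when block 2 is added to block 1."""
    n = len(y)
    X_full = np.column_stack([X_block1, X_block2])
    r2_block1 = r_squared(X_block1, y)
    r2_full = r_squared(X_full, y)
    q = X_block2.shape[1]                        # predictors added in block 2
    df_resid = n - X_full.shape[1] - 1           # residual df of the full model
    f_change = ((r2_full - r2_block1) / q) / ((1.0 - r2_full) / df_resid)
    return r2_block1, r2_full, f_change, (q, df_resid)

# Simulated stand-ins for the variables in the assignment (all hypothetical).
rng = np.random.default_rng(1)
n = 200
age = rng.normal(45, 12, n)
gender = rng.integers(0, 2, n).astype(float)
education = rng.normal(13, 3, n)
religion = rng.integers(0, 2, n).astype(float)
life_event = (rng.random(n) < 0.3).astype(float)
integration = rng.normal(0, 1, n)
donation = 0.8 * religion + 1.0 * life_event + rng.normal(0, 1, n)

block1 = np.column_stack([age, gender, education])              # demographics
block2 = np.column_stack([religion, life_event, integration])   # theory-driven block

r2_block1, r2_full, f_change, dfs = r2_change_test(block1, block2, donation)
print(f"R2 block 1 = {r2_block1:.3f}, R2 full = {r2_full:.3f}")
print(f"R2 change = {r2_full - r2_block1:.3f}, F{dfs} = {f_change:.2f}")
```

Note that the full model keeps the demographic predictors whether or not they are individually significant; the question being tested is whether the second block adds explanatory power over and above the first. A p-value for the F statistic can be obtained from the F distribution with the returned degrees of freedom (for example with `scipy.stats.f.sf`, if SciPy is available).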

Hope this helps!