# Thread: Multivariate linear regression analysis (multiple dependent variable, one independent

1. ## Multivariate linear regression analysis (multiple dependent variable, one independent

I am trying to determine why (and how many) people with health insurance do not fully use its benefits (like free flu vaccines). I am using a sample of 400 people, with age, income and education as dependent variables and having health insurance as the independent variable. I glanced at the information in http://www-01.ibm.com/support/docvie...id=swg21476743 and followed the steps described there.

I got some results like

Multivariate Tests (Design: Intercept + haveinsure)

| Effect | Test | Value | F | Hypothesis df | Error df | Sig. |
|---|---|---|---|---|---|---|
| Intercept | Pillai's Trace | .053 | 11.361(b) | 3.000 | 470.000 | .000 |
| Intercept | Wilks' Lambda | .827 | 11.361(b) | 3.000 | 470.000 | .000 |
| Intercept | Hotelling's Trace | .069 | 11.361(b) | 3.000 | 470.000 | .000 |
| Intercept | Roy's Largest Root | .083 | 11.361(b) | 3.000 | 470.000 | .000 |
| haveinsure | Pillai's Trace | .138 | 4.570 | 12.000 | 1420.000 | .000 |
| haveinsure | Wilks' Lambda | .877 | 4.797 | 12.000 | 1248.086 | .000 |
| haveinsure | Hotelling's Trace | .151 | 4.998 | 12.000 | 1410.000 | .000 |
| haveinsure | Roy's Largest Root | .141 | 16.101(c) | 4.000 | 473.000 | .000 |

b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.

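Tests of Between-Subjects Effects

| Source | Dependent Variable | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| Corrected Model | age | 37.546(a) | 4 | 9.637 | 3.893 | .004 |
| Corrected Model | education | 10.619(b) | 4 | 2.655 | .477 | .752 |
| Corrected Model | income | 334.245(c) | 4 | 84.061 | 16.766 | .000 |
| Intercept | age | 32.173 | 1 | 34.173 | 13.805 | .000 |
| Intercept | education | 141.268 | 1 | 143.268 | 25.752 | .000 |
| Intercept | income | 30.201 | 1 | 30.201 | 6.024 | .014 |
| haveinsure | age | 37.546 | 4 | 9.637 | 3.893 | .004 |
| haveinsure | education | 10.619 | 4 | 2.655 | .477 | .752 |
| haveinsure | income | 335.245 | 4 | 84.061 | 16.766 | .000 |
| Error | age | 1171.320 | 474 | 2.475 | | |
| Error | education | 2636.013 | 474 | 5.563 | | |
| Error | income | 2375.494 | 474 | 5.014 | | |
| Total | age | 3150.000 | 479 | | | |
| Total | education | 12315.000 | 479 | | | |
| Total | income | 6289.000 | 479 | | | |
| Corrected Total | age | 1210.866 | 478 | | | |
| Corrected Total | education | 2646.633 | 478 | | | |
| Corrected Total | income | 2711.739 | 478 | | | |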

a. R Squared = .032 (Adjusted R Squared = .024)
b. R Squared = .004 (Adjusted R Squared = -.004)
c. R Squared = .124 (Adjusted R Squared = .117)

| Dependent Variable | Parameter | B | Std. Error | t | Sig. | 95% CI Lower Bound | 95% CI Upper Bound |
|---|---|---|---|---|---|---|---|
| age | Intercept | 1 | 1.573 | 0.637 | 0.525 | -2.092 | 4.092 |
| age | [haveinsure=1] | 1.173 | 1.576 | 0.745 | 0.456 | -1.923 | 4.268 |
| age | [haveinsure=2] | 0.589 | 1.578 | 0.373 | 0.708 | -2.514 | 3.693 |
| education | Intercept | 4 | 2.358 | 1.697 | 0.091 | -0.635 | 8.636 |
| education | [haveinsure=1] | 0.578 | 2.362 | 0.245 | 0.808 | -4.063 | 5.219 |
| education | [haveinsure=2] | 0.388 | 2.367 | 0.164 | 0.87 | -4.265 | 5.04 |
| income | Intercept | 1 | 2.238 | 0.448 | 0.659 | -3.4 | 5.4 |
| income | [haveinsure=1] | 2.289 | 2.242 | 1.021 | 0.309 | -2.118 | 6.696 |
| income | [haveinsure=2] | 0.419 | 2.245 | 0.188 | 0.852 | -3.999 | 4.837 |

1. Am I approaching the problem in a proper way? I mean am I doing the right analysis in SPSS?

2. Which test (Pillai's Trace, Wilks' Lambda, Hotelling's Trace, Roy's Largest Root) should be used for a case like mine?

3. Why is Type III Sum of Squares error 1171.320 for age, education and income?

4. I am new to Multivariate linear regression analysis. How can I interpret and learn more about the output SPSS generated?

Any suggestions would be appreciated.

Thanks

2. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Originally Posted by p_s
I am trying to determine the reason why (and how many) people with health insurance do not fully use all of its benefits (like free flu vaccines).
The most usual approach in this situation would be to treat the "insurance" variable as the dependent variable and "age", "income" and "education" as explanatory (that is, independent) variables. Then you would have a model something like this:

insurance = a + b1*age + b2*income + b3*education + error

That would be called a multiple regression model. (Skip the thoughts about multivariate models. That is another thing.)

But if the insurance variable is "have" or "do not have" insurance, then you will need to use a logistic (logit) model:

log(p/(1-p)) = a + b1*age + b2*income + b3*education

where p is the proportion having insurance at the given values of the explanatory variables. Don't worry if it looks complicated. The computer takes care of it: it estimates b1, b2 and b3 and gives you significance tests.
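To make the logit formula concrete, here is a minimal Python sketch that turns the linear predictor into a probability and back. The coefficient values and the example person are invented purely to illustrate the arithmetic; they are not estimates from the data in this thread.

```python
import math


def predicted_probability(a, b1, b2, b3, age, income, education):
    """Invert log(p/(1-p)) = a + b1*age + b2*income + b3*education to get p."""
    log_odds = a + b1 * age + b2 * income + b3 * education
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical coefficients, chosen only for illustration.
a, b1, b2, b3 = -1.5, 0.03, 0.2, 0.1

# Hypothetical person: age 40, income category 3, education category 2.
p = predicted_probability(a, b1, b2, b3, age=40, income=3, education=2)

# Applying the logit to p recovers the linear predictor.
log_odds_back = math.log(p / (1 - p))

print(round(p, 3))
print(round(log_odds_back, 3))
```

The point of the sketch is only that the model is linear on the log-odds scale, while the probability itself always stays between 0 and 1.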

3. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

p_s (08-04-2014)

4. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Thanks GretaGarbo:

Originally Posted by GretaGarbo
The most usual approach in this situation would be to treat the "insurance" variable as the dependent variable and "age", "income" and "education" as explanatory (that is, independent) variables. Then you would have a model something like this:

insurance = a + b1*age + b2*income + b3*education + error

That would be called a multiple regression model. (Skip the thoughts about multivariate models. That is another thing.)
I think insurance could be the dependent variable if I were trying to study how age, income and education influence whether a person has insurance.

However, I am trying to determine the reason why (and how many) people with health insurance do not fully use all of its benefits (like free flu vaccines).
Originally Posted by GretaGarbo
But if the insurance variable is "have" or "do not have" insurance, then you will need to use a logistic (logit) model:

log(p/(1-p)) = a + b1*age + b2*income + b3*education

where p is the proportion having insurance at the given values of the explanatory variables. Don't worry if it looks complicated. The computer takes care of it: it estimates b1, b2 and b3 and gives you significance tests.
Well, in a sample of 400 people, say 300 have insurance. I am taking these 300 people and want to know which of them do not use free services like flu vaccines and routine health checkups, and why. Is it because they are too young to know the benefits of flu vaccines and routine checkups (is age a factor), because they do not know enough about the benefits (lack of education), or something else?

Which model should I use?

I appreciate your assistance and time.

5. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Then the dependent variable is "use" or "do not use" the insurance benefits, and "age", "income" and "education" are the explanatory variables.

The sample size is 300: those who are insured. You cannot know anything about benefit use among those who are not insured, so they are not part of the population you are interested in. So drop the 100 who are not insured.
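As a small illustration of restricting the analysis to the insured, the sketch below filters a handful of made-up records and counts how many insured respondents do not use preventive services. The field names (`has_insurance`, `uses_preventive`) and the records themselves are hypothetical.

```python
# Hypothetical respondents; field names and values are made up for illustration.
respondents = [
    {"id": 1, "has_insurance": True, "uses_preventive": True},
    {"id": 2, "has_insurance": True, "uses_preventive": False},
    {"id": 3, "has_insurance": False, "uses_preventive": False},
    {"id": 4, "has_insurance": True, "uses_preventive": True},
    {"id": 5, "has_insurance": True, "uses_preventive": True},
]

# Keep only the insured: they are the population of interest.
insured = [r for r in respondents if r["has_insurance"]]

# "How many" part of the question: share of insured people not using the benefit.
non_users = sum(1 for r in insured if not r["uses_preventive"])
share_not_using = non_users / len(insured)

print(len(insured), non_users, share_not_using)
```

The "why" part of the question is then the job of the regression model fitted on the insured subset only.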

6. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

p_s (08-08-2014)

7. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Thanks GretaGarbo:

Originally Posted by GretaGarbo
Then the dependent variable is "use" or "do not use" the insurance benefits, and "age", "income" and "education" are the explanatory variables.

The sample size is 300: those who are insured. You cannot know anything about benefit use among those who are not insured, so they are not part of the population you are interested in. So drop the 100 who are not insured.
So, should I use logistic regression (http://www.ats.ucla.edu/stat/spss/dae/logit.htm), that is, binary logistic in SPSS (Analyze->Regression->Binary Logistic), since the dependent variable is dichotomous (people use preventive care services or not)?

I tried to understand the SPSS output using http://www.ats.ucla.edu/stat/spss/output/logistic.htm, but I need to know more to interpret it correctly.

I appreciate your assistance and time.

8. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Originally Posted by p_s
So, should I use logistic regression
Yes, use logit.

9. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

p_s (08-08-2014)

10. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Thanks GretaGarbo,
Originally Posted by GretaGarbo
Yes, use logit.
1. In SPSS 22, can I choose Analyze->Regression->Binary Logistic, then choose "utilize preventive services" as the dependent variable and age, income and education as covariates?

2. For methods, there are a few, like forward conditional, forward LR and forward Wald. Which are used for cases like mine?

3. Can selection variable be left blank?

4. How can a layman like me get a primer on this and how to do it in SPSS?

11. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Originally Posted by p_s

1. In SPSS 22, can I choose Analyze->Regression->Binary Logistic, then choose "utilize preventive services" as the dependent variable and age, income and education as covariates?
OK.
(Maybe you want to declare one of the variables as a "categorical" variable, e.g. if "education" has different categories.)

Originally Posted by p_s
2. For methods, there are a few, like forward conditional, forward LR and forward Wald. Which are used for cases like mine?
Just use the "Enter" method. The rest of them are crazy stepwise regression methods (the "forward" and "backward" stuff). Don't use those! Formulate your model. Estimate it. Think about the result and write down your thoughts about it. Then, possibly, reformulate the model (include or delete model terms), re-estimate and think again.

Originally Posted by p_s
3. Can selection variable be left blank?
Yes.

Originally Posted by p_s
4. How can a layman like me get a primer on this and how to do it in SPSS?
I am not sure what the English word "primer" means, but if you are asking someone here to write a private guide for you, the answer is no! Otherwise, search the internet! And look in your own textbooks about regression and analysis of variance.

12. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

p_s (08-08-2014)

13. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Thanks GretaGarbo,

Originally Posted by GretaGarbo
OK.
(Maybe you want to declare one of the variables as a "categorical" variable, e.g. if "education" has different categories.)
1. Yes, education has different categories. How can they be separated?

2. Also, do I need to change the contrast (default is Indicator) and the reference category?

Originally Posted by GretaGarbo
Just use the "Enter" method. The rest of them are crazy stepwise regression methods (the "forward" and "backward" stuff). Don't use those! Formulate your model. Estimate it. Think about the result and write down your thoughts about it. Then, possibly, reformulate the model (include or delete model terms), re-estimate and think again.
I expect people with more education and income to use more preventive health services (like free flu vaccines). I also anticipate that older people will use more free routine physical checkups. I know gender also matters, so including or changing this variable might affect the results. Is this the proper way to think about a model, estimate it, and re-estimate it with some factors removed?

Originally Posted by GretaGarbo
I am not sure what the English word "primer" means, but if you are asking someone here to write a private guide for you, the answer is no! Otherwise, search the internet! And look in your own textbooks about regression and analysis of variance.
Sorry, I did not mean to be a freeloader. I realize all too well that public forums exist thanks to kind and knowledgeable volunteers like you. I was asking whether there are some good web links that a layman like me can study to tackle the problem at hand. As you can see, my background is not in statistics, but for this task I have to learn it. Just as there are beginner-level calculus tutorials that explain the minimum someone attempting to solve calculus problems must know, I thought there might be some web tutorials for logistic regression that someone could point me to. Searching led me to http://bama.ua.edu/~jhartman/689/mlr.ppt and http://www.nemoursresearch.org/open/...011/Class6.ppt, which explain the steps of linear regression in SPSS, and https://onlinecourses.science.psu.edu/stat501/node/86

14. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

UCLA has lots of "guides" for SPSS, Stata, and R. For instance: http://www.ats.ucla.edu/stat/spss/

15. ## The Following User Says Thank You to Phaedrus For This Useful Post:

p_s (08-08-2014)

16. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Originally Posted by p_s
1. Yes, education has different categories, how can that be separated?
Just declare it as a category variable in the category box.

Originally Posted by p_s
2. Also, do I need to change contrast(default is Indicator) and reference category?
No, you don't need to do that.
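For readers wondering what Indicator coding does to a categorical variable such as education, here is a minimal Python sketch. It assumes the last category is the reference, which, as far as I know, matches the SPSS default for Binary Logistic; treat that as an assumption and check the Categorical dialog. The category labels are hypothetical.

```python
def indicator_code(values, levels):
    """Indicator (dummy) coding, using the LAST level as the reference category.

    Each non-reference level gets its own 0/1 column; the reference level
    is the row where all columns are 0.
    """
    reference = levels[-1]
    non_reference = levels[:-1]
    rows = [[1 if v == level else 0 for level in non_reference] for v in values]
    return rows, non_reference, reference


# Hypothetical education categories and a few made-up respondents.
levels = ["primary", "secondary", "university"]
education = ["secondary", "university", "primary", "secondary"]

rows, columns, reference = indicator_code(education, levels)

print("columns:", columns, "reference:", reference)
for value, row in zip(education, rows):
    print(value, row)
```

Each estimated coefficient for an indicator column is then a comparison of that category against the reference category, which is why changing the reference changes the numbers but not the fitted model.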

If you are not sure about the meaning or interpretation of the estimates, you can make up some data (a very small data set) with a very clear pattern, for example with only a clear difference between genders. Experiment with such small fake data sets until you understand the meaning of the parameter estimates.
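A minimal Python sketch of that kind of experiment follows. The counts are made up with a deliberately strong gender difference. For a logistic model with a single binary predictor, the maximum-likelihood estimates have a closed form (the intercept is the log odds in the reference group, the slope is the log odds ratio), so the sketch computes them directly instead of calling a fitting routine.

```python
import math

# Made-up data with a deliberately clear pattern (not real data):
# 40 women, 30 of whom use preventive services; 40 men, 10 of whom do.
data = (
    [("female", 1)] * 30 + [("female", 0)] * 10
    + [("male", 1)] * 10 + [("male", 0)] * 30
)


def log_odds(group):
    """Log odds of using the service within one group."""
    used = sum(y for g, y in data if g == group)
    not_used = sum(1 - y for g, y in data if g == group)
    return math.log(used / not_used)


# With "male" as the reference category (coded 0) and "female" coded 1:
intercept = log_odds("male")             # log odds of use among men
slope = log_odds("female") - intercept   # log odds ratio, women vs men
odds_ratio = math.exp(slope)             # what SPSS labels Exp(B)

print(round(intercept, 3), round(slope, 3), round(odds_ratio, 3))
```

Running a binary logistic regression in SPSS on the same 80 rows should reproduce these values up to rounding, which makes it easy to see which number in the output corresponds to which quantity.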

17. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

p_s (08-08-2014)

18. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

As others have noted, you cannot have more than one dependent variable in regression (there are other methods like MANOVA and SEM that do this, but not multivariate regression, where the "multi" refers to the predictors, not the predicted variable). If you want to know why a predictor is behaving as it is, as you suggested, you might model this separately. That is, model what is causing variation in insurance as a separate analysis (there is a specialized form of regression called multilevel regression which supports this, but if you are new to regression that is a big step up in complexity).

If you use logistic regression, remember to request the odds ratios. These are far more useful to interpret than the slopes in terms of the impact of the predictor (the slopes are difficult to interpret in terms of the original predicted variable, except for their sign and statistical significance).
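As a minimal sketch of the relationship between slopes and odds ratios: the odds ratio is simply the exponential of the slope, which SPSS reports in the Exp(B) column. The slope values below are hypothetical and are not estimates from this thread's data.

```python
import math

# Hypothetical slopes (B) on the log-odds scale, for illustration only.
slopes = {"age": 0.03, "income": 0.20, "education": -0.10}

# Odds ratio = exp(B): the multiplicative change in the odds per one-unit
# increase in the predictor, holding the other predictors fixed.
odds_ratios = {name: math.exp(b) for name, b in slopes.items()}

for name, ratio in odds_ratios.items():
    percent_change = (ratio - 1) * 100
    print(f"{name}: OR = {ratio:.3f} ({percent_change:+.1f}% change in odds per unit)")
```

An odds ratio above 1 means the odds of using the service rise with the predictor, and a value below 1 means they fall; a value of exactly 1 corresponds to a slope of zero.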

19. ## The Following User Says Thank You to noetsi For This Useful Post:

p_s (08-08-2014)

20. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Originally Posted by noetsi
multivariate regression where the multi refers to the predictors not the predicted variable
Well, I don't know what Noetsi means by "multivariate" or "predictors" or "predicted variable".

But in my book multivariate regression is a situation where there are several dependent variables, like in the output in the first post, and one or several explanatory variables. And multiple regression is where there are several explanatory variables (also called independent variables) like "age", "income" and "education".

Thus, I suggest using a logistic multiple regression with "use" or "not use" of insurance as the dependent variable, and "age", "income" and "education" as explanatory variables.

Originally Posted by noetsi
If you want to know why a predictor is behaving as it is, as you suggested, you might model this separately.
As I understand it, there is general agreement that it is advantageous to include all relevant explanatory variables in a multiple regression model (among other things, to avoid "omitted-variable bias"), and that it is not so good to run a separate regression for each explanatory variable and try to conclude something about the influence of each variable.
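A small sketch of omitted-variable bias, using ordinary linear regression rather than logistic regression to keep the arithmetic transparent. The data are made up: the outcome is built exactly as y = 1*x1 + 2*x2, and x1 and x2 are correlated, so regressing y on x1 alone attributes part of the effect of x2 to x1.

```python
def mean(xs):
    return sum(xs) / len(xs)


def simple_slope(x, y):
    """Least-squares slope of y regressed on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Made-up predictors that are correlated with each other.
x1 = [0, 1, 2, 3, 4, 5]
x2 = [0, 0, 1, 1, 2, 2]

# Outcome constructed without noise, so the true slope of x1 is exactly 1.
y = [1 * a + 2 * b for a, b in zip(x1, x2)]

# Leaving x2 out of the model inflates the estimated slope of x1.
slope_without_x2 = simple_slope(x1, y)

print(round(slope_without_x2, 3))
```

The slope from the one-predictor regression comes out well above the true value of 1, even though the data contain no noise at all. The same mechanism is at work when one runs a separate regression for each explanatory variable.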

Originally Posted by noetsi
If you use logistic regression remember to request the Odds Ratios. These are far more useful to interpret than the slopes in terms of the impact of the predictor

21. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Multivariate regression is referenced in the title and commonly in the literature. As I explained this term refers not to the variable you are predicting [which is given various titles in the literature including response and dependent variable] but what you are predicting it with [called independent variables often although there are many terms used in the literature]. Because there are so many terms used for the same thing, I stuck with functional ones, showing what is being predicted [the Y on the left side of the equation] and what you are using to predict it with [the X on the right side of the equation].

I thought the author was, in addition to explaining the original dependent variable, also trying to explain one of the predicting [or independent] variables, and I suggested an approach to do so. But I misread what they originally said.

22. ## The Following User Says Thank You to noetsi For This Useful Post:

p_s (08-08-2014)

23. ## Re: Multivariate linear regression analysis (multiple dependent variable, one indepen

Frankly, I don't understand what Noetsi is saying and what he means by "multivariate regression".

For those interested, here is a link to one text and here is a commonly used textbook (page 388).

24. ## The Following User Says Thank You to GretaGarbo For This Useful Post:

JesperHP (08-14-2014)