# Multivariate linear regression analysis (multiple dependent variable, one independent

#### p_s

##### New Member
I am trying to determine the reason why (and how many) people with health insurance do not fully use all of its benefits (like free flu vaccines). I am using a sample of 400 people, with age, income, and education as dependent variables and having health insurance as the independent variable. I glanced at the information at http://www-01.ibm.com/support/docview.wss?uid=swg21476743 and followed the steps described there.

I got some results like

Multivariate Tests (Design: Intercept + haveinsure)

| Effect | Test | Value | F | Hypothesis df | Error df | Sig. |
|---|---|---|---|---|---|---|
| Intercept | Pillai's Trace | .053 | 11.361(b) | 3.000 | 470.000 | .000 |
| | Wilks' Lambda | .827 | 11.361(b) | 3.000 | 470.000 | .000 |
| | Hotelling's Trace | .069 | 11.361(b) | 3.000 | 470.000 | .000 |
| | Roy's Largest Root | .083 | 11.361(b) | 3.000 | 470.000 | .000 |
| haveinsure | Pillai's Trace | .138 | 4.570 | 12.000 | 1420.000 | .000 |
| | Wilks' Lambda | .877 | 4.797 | 12.000 | 1248.086 | .000 |
| | Hotelling's Trace | .151 | 4.998 | 12.000 | 1410.000 | .000 |
| | Roy's Largest Root | .141 | 16.101(c) | 4.000 | 473.000 | .000 |

b. Exact statistic
c. The statistic is an upper bound on F that yields a lower bound on the significance level.

Tests of Between-Subjects Effects

| Source | Dependent Variable | Type III Sum of Squares | df | Mean Square | F | Sig. |
|---|---|---|---|---|---|---|
| Corrected Model | age | 37.546(a) | 4 | 9.637 | 3.893 | .004 |
| | education | 10.619(b) | 4 | 2.655 | .477 | .752 |
| | income | 334.245(c) | 4 | 84.061 | 16.766 | .000 |
| Intercept | age | 32.173 | 1 | 34.173 | 13.805 | .000 |
| | education | 141.268 | 1 | 143.268 | 25.752 | .000 |
| | income | 30.201 | 1 | 30.201 | 6.024 | .014 |
| haveinsure | age | 37.546 | 4 | 9.637 | 3.893 | .004 |
| | education | 10.619 | 4 | 2.655 | .477 | .752 |
| | income | 335.245 | 4 | 84.061 | 16.766 | .000 |
| Error | age | 1171.320 | 474 | 2.475 | | |
| | education | 2636.013 | 474 | 5.563 | | |
| | income | 2375.494 | 474 | 5.014 | | |
| Total | age | 3150.000 | 479 | | | |
| | education | 12315.000 | 479 | | | |
| | income | 6289.000 | 479 | | | |
| Corrected Total | age | 1210.866 | 478 | | | |
| | education | 2646.633 | 478 | | | |
| | income | 2711.739 | 478 | | | |

a. R Squared = .032 (Adjusted R Squared = .024)
b. R Squared = .004 (Adjusted R Squared = -.004)
c. R Squared = .124 (Adjusted R Squared = .117)

| Dependent Variable | Parameter | B | Std. Error | t | Sig. | 95% Confidence Interval Lower Bound | 95% Confidence Interval Upper Bound |
|---|---|---|---|---|---|---|---|
| age | Intercept | 1 | 1.573 | 0.637 | 0.525 | -2.092 | 4.092 |
| | [haveinsure=1] | 1.173 | 1.576 | 0.745 | 0.456 | -1.923 | 4.268 |
| | [haveinsure=2] | 0.589 | 1.578 | 0.373 | 0.708 | -2.514 | 3.693 |
| education | Intercept | 4 | 2.358 | 1.697 | 0.091 | -0.635 | 8.636 |
| | [haveinsure=1] | 0.578 | 2.362 | 0.245 | 0.808 | -4.063 | 5.219 |
| | [haveinsure=2] | 0.388 | 2.367 | 0.164 | 0.87 | -4.265 | 5.04 |
| income | Intercept | 1 | 2.238 | 0.448 | 0.659 | -3.4 | 5.4 |
| | [haveinsure=1] | 2.289 | 2.242 | 1.021 | 0.309 | -2.118 | 6.696 |
| | [haveinsure=2] | 0.419 | 2.245 | 0.188 | 0.852 | -3.999 | 4.837 |

1. Am I approaching the problem in the proper way? That is, am I doing the right analysis in SPSS?

2. Which test (Pillai's Trace, Wilks' Lambda, Hotelling's Trace, Roy's Largest Root) should be used for a case like mine?

3. Why is Type III Sum of Squares error 1171.320 for age, education and income?

4. I am new to multivariate linear regression analysis. How can I interpret, and learn more about, the output SPSS generated?

Any suggestions would be appreciated.

Thanks

#### GretaGarbo

##### Human
> I am trying to determine the reason why (and how many) people with health insurance do not fully use all of its benefits (like free flu vaccines).

The most usual approach in this situation would be to treat the "insurance" variable as the dependent variable and "age", "income" and "education" as explanatory variables, that is, as independent variables. Then you would have a model something like this one:

insurance = a + b1*age + b2*income + b3*education + error

That would be called a multiple regression model. (Skip the thoughts about multivariate models. That is another thing.)

But if the insurance variable is "have" or "do not have" insurance, then you will need to use a logistic (logit) model:

log(p/(1-p)) = a + b1*age + b2*income + b3*education

where p is the proportion having insurance at the given values of the explanatory variables. Don't worry if it looks complicated. The computer takes care of it: it estimates a, b1, b2 and b3 and gives you significance tests.
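To make the logit equation above concrete, here is a minimal Python sketch (not SPSS output) of how the right-hand side is turned back into a proportion p. The function name and all coefficient values are hypothetical, chosen only for illustration and not estimated from any data.

```python
import math

def predicted_probability(a, b1, b2, b3, age, income, education):
    """Convert the linear predictor (the log-odds) of the logit model into p."""
    log_odds = a + b1 * age + b2 * income + b3 * education
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical coefficients and covariate values (assumptions, not estimates).
p = predicted_probability(a=-2.0, b1=0.03, b2=0.2, b3=0.1,
                          age=40, income=3, education=2)

# Going the other way recovers the linear predictor: log(p / (1 - p)).
log_odds_back = math.log(p / (1 - p))
print(round(p, 3), round(log_odds_back, 3))
```

Whatever values the coefficients take, p always stays strictly between 0 and 1, which is the reason the logit form is used for a "have" or "do not have" outcome.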

#### p_s

##### New Member
Thanks GretaGarbo:

> The most usual approach in this situation would be to treat the "insurance" variable as the dependent variable and "age", "income" and "education" as explanatory variables, that is, as independent variables. Then you would have a model something like this one:
>
> insurance = a + b1*age + b2*income + b3*education + error
>
> That would be called a multiple regression model. (Skip the thoughts about multivariate models. That is another thing.)

I think insurance could be the dependent variable if I were trying to study how age, income and education influence whether a person has insurance or not.

However, I am trying to determine the reason why (and how many) people with health insurance do not fully use all of its benefits (like free flu vaccines).

> But if the insurance variable is "have" or "do not have" insurance, then you will need to use a logistic (logit) model:
>
> log(p/(1-p)) = a + b1*age + b2*income + b3*education
>
> where p is the proportion having insurance at the given values of the explanatory variables. Don't worry if it looks complicated. The computer takes care of it and estimates b1, b2 and b3 and gives you significance tests.

Well, in a sample of 400 people, say 300 have insurance. I am taking these 300 people and want to know which of them do not use the free services, like flu vaccines and routine health checkups, and why. Is it because those people are too young to know the benefits of flu vaccines and routine health checkups (is age a factor)? Or do they not know enough about the benefits (lack of education)? Or is it something else?

Which model should I use?

I appreciate your assistance and time.

#### GretaGarbo

##### Human
Then the dependent variable is whether a person "uses" or "does not use" the insurance benefits, and "age", "income" and "education" are the explanatory variables.

The sample size is 300: those who are insured. You cannot know anything about benefit use among those who are not insured, so they are not part of the population you are interested in. So skip the 100 who are not insured.

#### p_s

##### New Member
Thanks GretaGarbo:

> Then, the dependent variable is "use" or "use not" the insurance. That will be the dependent variable and "age", "income" and "education" are explanatory variables.
>
> The sample size is 300, those who are insured. You can not know anything about those who are not insured, so they are not a part of the population you are interested in. So skip the 100 who are not insured.

So, should I use logistic regression http://www.ats.ucla.edu/stat/spss/dae/logit.htm and Binary Logistic in SPSS (Analyze->Regression->Binary Logistic), since the dependent variable is dichotomous (people use preventive care services or not)?

I tried understanding the output of how SPSS does it http://www.ats.ucla.edu/stat/spss/output/logistic.htm and I need to know more to interpret it correctly.

I appreciate your assistance and time.

#### GretaGarbo

##### Human
> So, should I use logistic regression

Yes, use logit.

#### p_s

##### New Member
Thanks GretaGarbo,

> Yes, use logit.

1. In SPSS 22, can I choose Analyze->Regression->Binary Logistic, then choose "utilize preventive services" as the dependent and age, income, education as covariates?

2. For methods, there are a few, like forward conditional, forward LR, forward Wald. Which are used for cases like mine?

3. Can selection variable be left blank?

4. How can a layman like me get a primer on this and how to do it in SPSS?

#### GretaGarbo

##### Human
> 1. In SPSS 22, can I choose Analyze->Regression->Binary Logistic, then choose "utilize preventive services" as the dependent and age, income, education as covariates?

OK. (Maybe you want to declare one of the variables as a "categorical" variable, e.g. if "education" has different categories.)

> 2. For methods, there are a few, like forward conditional, forward LR, forward Wald. Which are used for cases like mine?

Just use the "Enter" method. The rest are crazy stepwise regression methods (the "forward" and "backward" stuff). Don't use them! Formulate your model. Estimate it. Think about the results and write down your thoughts about them. Then possibly reformulate the model (include or delete model terms), re-estimate, and think again.

> 3. Can selection variable be left blank?

Yes.

> 4. How can a layman like me get a primer on this and how to do it in SPSS?

I am not sure what the English word "primer" means, but if you are asking someone here to write a private guide for you, the answer is no! Otherwise, search the internet! And look in your own textbooks about regression and analysis of variance.

#### p_s

##### New Member
Thanks GretaGarbo,

> OK.
> (Maybe you want to declare one of the variables as a "categorical" variable, e.g. if "education" has different categories.)

1. Yes, education has different categories. How can that be separated?

2. Also, do I need to change the contrast (default is Indicator) and the reference category?

> Just use the "Enter" method. The rest are crazy stepwise regression methods (the "forward" and "backward" stuff). Don't use them! Formulate your model. Estimate it. Think about the results and write down your thoughts about them. Then possibly reformulate the model (include or delete model terms), re-estimate, and think again.

I expect people with more education and income to use more preventive health services (like free flu vaccines). I also anticipate that older people will use more free routine physical checkups. I know gender also matters, so changing this variable might affect the results. Is this the proper way to think about a model: estimate it, then re-estimate it with some factors removed?

> I am not sure what the English word "primer" means, but if you are asking someone here to write a private guide for you, the answer is no! Otherwise, search the internet! And look in your own textbooks about regression and analysis of variance.

Sorry, I did not mean to be a freeloader. I realize all too well that public forums exist thanks to kind and knowledgeable volunteers like you. I was asking whether there are some good web links that a layman like me can study to tackle the problem at hand. As you can see, my background is not in statistics, but for this task I have to learn it. Just as there are beginner-level tutorials for calculus that explain the minimum someone attempting calculus must know, I thought there might be some web tutorials for logistic regression which someone could point me to. Searching led me to http://bama.ua.edu/~jhartman/689/mlr.ppt and http://www.nemoursresearch.org/open/StatClass/January2011/Class6.ppt which explain the steps of linear regression in SPSS, and https://onlinecourses.science.psu.edu/stat501/node/86


#### GretaGarbo

##### Human
> 1. Yes, education has different categories. How can that be separated?

Just declare it as a categorical variable in the category box.

> 2. Also, do I need to change the contrast (default is Indicator) and the reference category?

No, you don't need to do that.
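For intuition about what an indicator contrast does with a categorical variable, here is a minimal Python sketch (not SPSS syntax). It assumes four education codes and assumes the last one is the reference category; the function name `indicator_code` and the column labels are hypothetical.

```python
def indicator_code(value, categories, reference):
    """Return one 0/1 indicator for every category except the reference."""
    return {f"education={c}": int(value == c)
            for c in categories if c != reference}

categories = [1, 2, 3, 4]  # hypothetical education codes
row = indicator_code(3, categories, reference=4)
print(row)

# A person in the reference category gets 0 on every indicator.
reference_row = indicator_code(4, categories, reference=4)
print(reference_row)
```

Each estimated coefficient for such an indicator then compares that category with the reference category, which is why the choice of reference changes the individual estimates but not the overall fit.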

If you are not sure about the meaning or interpretation of the estimates, you can make up some data (a very small data set) with a very clear pattern, for example with only a clear difference between genders. Experiment with such small fake data sets so that you understand the meaning of the parameter estimates.
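One way to try the fake-data idea outside SPSS is the following Python sketch. It relies on the fact that, with a single 0/1 predictor, the fitted logistic model reproduces the observed proportion in each group, so the estimates can be computed by hand. The data are entirely made up, and the 0/1 coding of gender is an assumption; software that uses a different reference category would report the slope with the opposite sign.

```python
import math

# Made-up data: (gender, uses_service), both coded 0/1.
# 2 of 10 people use the service in group 0, and 8 of 10 in group 1.
fake = [(0, 1)] * 2 + [(0, 0)] * 8 + [(1, 1)] * 8 + [(1, 0)] * 2

def logit(p):
    """Log-odds of a proportion p."""
    return math.log(p / (1 - p))

def share_using(data, gender):
    """Observed proportion using the service within one gender group."""
    outcomes = [y for g, y in data if g == gender]
    return sum(outcomes) / len(outcomes)

p0 = share_using(fake, 0)
p1 = share_using(fake, 1)

intercept = logit(p0)          # log-odds of use in group 0
slope = logit(p1) - logit(p0)  # difference in log-odds between the groups
odds_ratio = math.exp(slope)   # odds of use in group 1 relative to group 0

print(round(intercept, 3), round(slope, 3), round(odds_ratio, 3))
```

Entering the same twenty rows into a logistic regression with gender as the only covariate should give estimates that match these hand calculations, which makes it easier to see what each number in the output means.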

#### noetsi

##### Fortran must die
As others have noted, you cannot have more than one dependent variable in regression. (There are other methods, like MANOVA and SEM, that do this, but not multivariate regression, where the "multi" refers to the predictors, not the predicted variable.) If you want to know why a predictor is behaving as it is, as you suggested, you might model this separately. That is, model what is causing variation in insurance as a separate analysis. (There is a specialized form of regression called multilevel regression which supports this, but if you are new to regression that is a big step up in complexity.)

If you use logistic regression, remember to request the odds ratios. These are far more useful to interpret than the slopes in terms of the impact of the predictor (the slopes are difficult to interpret in terms of the original predicted variable, except for their sign and statistical significance).
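The link between a slope and its odds ratio is just exponentiation (SPSS labels the result "Exp(B)"). A minimal Python sketch follows, with a hypothetical slope and standard error that are not taken from any real output; the interval is an approximate Wald-type interval, which is an assumption about how one might compute it by hand.

```python
import math

def odds_ratio_with_ci(b, se, z=1.96):
    """Return (odds ratio, lower, upper) for a slope b with standard error se."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Hypothetical slope and standard error for "income", for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(b=0.40, se=0.10)
print(f"OR = {odds_ratio:.2f}, approx. 95% CI {lower:.2f} to {upper:.2f}")
```

An odds ratio above 1 means the odds of the outcome rise with the predictor, below 1 means they fall, and an interval that contains 1 corresponds to a slope that is not statistically distinguishable from zero.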

#### GretaGarbo

##### Human
> multivariate regression where the multi refers to the predictors not the predicted variable

Well, I don't know what Noetsi means by "multivariate" or "predictors" or "predicted variable".

But in my book, multivariate regression is a situation where there are several dependent variables, like in the output in the first post, and one or several explanatory variables. And multiple regression is where there are several explanatory variables (also called independent variables) like "age", "income" and "education".

Thus, I suggest using a logistic multiple regression with "use" or "not use" of insurance as the dependent variable, and "age", "income" and "education" as explanatory variables.

> If you want to know why a predictor is behaving as it is, as you suggested, you might model this separately.

As I understand it, there is general agreement that it is advantageous to include all relevant explanatory variables in a multiple regression model (among other things, to avoid "omitted-variable bias"), and that it is not so good to run a separate regression for each explanatory variable and try to conclude something about the influence of each variable.

> If you use logistic regression remember to request the Odds Ratios. These are far more useful to interpret than the slopes in terms of the impact of the predictor

#### noetsi

##### Fortran must die
Multivariate regression is referenced in the title and commonly in the literature. As I explained this term refers not to the variable you are predicting [which is given various titles in the literature including response and dependent variable] but what you are predicting it with [called independent variables often although there are many terms used in the literature]. Because there are so many terms used for the same thing, I stuck with functional ones, showing what is being predicted [the Y on the left side of the equation] and what you are using to predict it with [the X on the right side of the equation].

I thought the author was, in addition to explaining the original dependent variable, also trying to explain one of the predicting [or independent] variables, and I suggested an approach to do so. But I misread what they said originally.

#### GretaGarbo

##### Human
Frankly, I don't understand what Noetsi is saying and what he means by "multivariate regression".

For those interested, here is a link to one text and here is a commonly used textbook (page 388).

#### GretaGarbo

##### Human
> Did you read the title of the OP...
>
> Multivariate regression simply means regression with more than one independent variable.
>
> http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3049417/

Really?

The link Noetsi shows, isn't it terrible? In the title it says "Introduction to Multivariate Regression Analysis" but in the text it only talks about "multiple linear regression". Even the title is wrong! I really agree with the authors in their conclusion: "It is apparent to anyone who reads the medical literature today that some knowledge of biostatistics and epidemiology is a necessity." Yes, they could start by educating themselves!

That seems to be a published article, in "Hippokratia, quarterly medical journal". But this is TalkSTATS, not TalkMedicin, so I think it is better to take the definition from the statistical subject area and not the medical area.

Hey, where is CoyboyBear? Here is another common misunderstanding!

- - -

I think the OP had misunderstood the model formulation and I believed that it was cleared up.

But if noetsi wants to go on with this, then just go on! I give up. It feels meaningless!

#### noetsi

##### Fortran must die
It's the National Institutes of Health, one of the most important funders of research, which is why I chose it.

Multivariate regression = multiple linear regression (in this case, it could equally well mean multiple logistic regression).

I agree the OP probably does not care and there is little point in arguing over this. The one thing I would caution him about, and a key point I meant to make in my first post, is that multivariate regression (or multiple linear regression) has multiple independent variables, not, as his first post suggested, multiple dependent variables.

#### p_s

##### New Member
Thanks GretaGarbo,

> Just declare it as a category variable in the category box.

1. Thanks, but how will that be used? Education levels are: up to 8th grade (coded as 1), some high school (coded as 1), GED (coded as 1), some college (coded as 3), etc.

2. In SPSS 22, I choose Analyze->Regression->Binary Logistic. Then, do I need to choose anything inside Options, Style, or Bootstrap?

3. The values for the dependent and independent variables are coded on a Likert scale from 1 (Highly Disagree) to 5 (Highly Agree). When I try to run the analysis, I get the warning message "The dependent variable has more than two non-missing values. For logistic regression, the dependent value must assume exactly two values on the cases being processed." Should I recode the variables so that 1 (Highly Disagree), 2 (Disagree) and 3 (Neutral) become 0 (Disagree), and 4 (Agree) and 5 (Highly Agree) become 1 (Agree)?

> If you are not sure about the meaning or interpretation of the estimates, then you can make up some data (a very small data set) with a very clear pattern. For example with only a clear difference between gender. Experiment with such small fake data sets so that you understand the meaning of the parameter estimates.

Thanks, I will set up a small data set and try this.
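The recoding proposed in question 3 above can be sketched in Python as follows. This only illustrates the cut-off described there (1 to 3 become 0, 4 and 5 become 1); the function name `dichotomize` is hypothetical, and treating any other value as missing is an assumption. Whether "neutral" belongs with "disagree" is a judgment call that the sketch does not settle.

```python
def dichotomize(likert):
    """Collapse a 1-5 Likert response into 0 (disagree) or 1 (agree)."""
    if likert not in (1, 2, 3, 4, 5):
        return None  # anything outside the scale is treated as missing
    return 1 if likert >= 4 else 0

responses = [1, 2, 3, 4, 5, None]  # made-up responses, including a missing one
recoded = [dichotomize(r) for r in responses]
print(recoded)
```

After such a recode the dependent variable takes exactly two non-missing values, which is what the warning message asks for.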