# Thread: correct regression model for dependent variable

1. ## correct regression model for dependent variable

Hi all, I am new to this forum and I hope that the questions that are bothering me can be answered with your help.
I have data (N=220) on adaptation to climate change. Adaptation (the DV) was operationalized through a set of possible adaptation measures: respondents were asked whether they had undertaken one or more of these measures in response to current and future environmental changes in the light of climate change. A list of 20 possible measures plus space for 'others' was presented, and all of them are coded as dummy variables (0/1). My question now is: how should I build the DV? A first strategy is to create an additive index out of the dummies (a1 + a2 + ... + a20 + aothers) and treat this index as a metric variable in linear regression analyses.

Would you consider this as an appropriate procedure?
Alternatively, I could just run a logit regression, with the DV coded as 'no adaptation' = 0 versus 'adapted' = 1. However, running a logit would mean a loss of information, which I'd like to avoid if possible.

I am thankful for your comments and arguments on that.
Best wishes,
Andy
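A minimal Python sketch of the additive index described above. It assumes each respondent is stored as a dict of 0/1 dummies keyed by the item names used in the post (a1 ... a20, aothers); the real data layout is not given in the thread, and the example respondent is made up.

```python
# Item names as listed in the post; the real column names may differ.
ITEMS = [f"a{i}" for i in range(1, 21)] + ["aothers"]


def additive_index(respondent):
    """Sum the 0/1 adaptation dummies of one respondent."""
    for name in ITEMS:
        if respondent[name] not in (0, 1):
            raise ValueError(f"{name} must be coded 0/1, got {respondent[name]!r}")
    return sum(respondent[name] for name in ITEMS)


# Made-up respondent who took measures a1, a5 and a12.
example = {name: 0 for name in ITEMS}
example.update({"a1": 1, "a5": 1, "a12": 1})
print(additive_index(example))  # 3
```

The check for 0/1 coding is there because a single mis-coded item (e.g. a missing-value code like 9) would silently inflate the index.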

2. ## Re: correct regression model for dependent variable

Originally Posted by andik
Hi all, I am new to this forum and I hope that the qestions that are bothering me can be answered together with your help.
I have data (N=220) regarding the topic adaptation to climate change. Adaptation (the DV) was operationalized through a set of possible adaptation measures. It was asked if the person has undertaken one or more of these measures in response to current and future environmental changes in the light of climate change. Thereupon a set of 20 possible measures plus space for 'others' is presented, which are all coded as dummy variables (0/1). My question now is: how can I build the DV? A first strategy is to create an additive Index out of the dummies (a1+a2+a3+...+a20, aothers) and treat this index as a metric variable for linear regression analyses.

Would you consider this as an appropriate procedure?
Alternatively, I could just run a logit regression, whereby teh DV is constructed as 'no adaptation'=0 against 'adapted' =1. However, running logi would mean a loss of information, which I'd like to avoid if possible.

I am thankful for your comments and arguments on that.
Best wishes,
Andy
Hi Andy,

If you have a measure comprised of 20 dichotomous items and you build an additive index out of them, then your response variable is most likely ordinal. In that case, you'd want an ordered logit regression. Note that since your data are discrete, not using linear regression won't result in a loss of information: there is no information about the magnitude of the difference between scores on the response scale to lose in the first place. Using an arbitrary rule (like 1 when sum(a) > 10, else 0) to run a simple logit regression would result in a loss of information, though not necessarily enough to reduce power, since your sample size is small and an ordered logit model for a 20-point response scale would be pretty cumbersome.

3. ## Re: correct regression model for dependent variable

Originally Posted by seanw
If you have a measure comprised of 20 dichotomous items and you build an additive index out of them, then your response variable is most likely ordinal.
We know a bit more than that: it is a count of the number of measures a respondent has taken. So unlike a truly ordered variable we now have a meaningful distance between the values. I would therefore consider a linear regression a good first step, and possibly that is enough. If you want something fancier, you might consider a binomial model that models the number of "successes" (measures taken) out of 20 options. That would fit the structure of your variable better than an ordered logit.

Like Sean W., I am usually sceptical about arbitrarily breaking a variable up to make it binary. However, splitting your variable at a cut-off value of 1 could be meaningful in your case. That would answer the question: have you taken any measure against climate change? Depending on the exact question you want to answer, that could definitely be a viable option.
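The binomial model suggested above treats the index as the number of "successes" out of a fixed number of trials and links the success probability to the predictors through a logit. Below is a minimal sketch with a single predictor, fitted by Newton-Raphson on the binomial log-likelihood. It assumes a fixed denominator of 20, as in this thread, and uses synthetic data. In Stata the corresponding model should be something like `glm index x, family(binomial 20) link(logit)`, where `index` and `x` are placeholder variable names.

```python
import math


def fit_binomial_logit(x, y, n_trials=20, max_iter=50, tol=1e-10):
    """Fit y_i ~ Binomial(n_trials, p_i) with logit(p_i) = b0 + b1 * x_i.

    Newton-Raphson on the binomial log-likelihood, one predictor only.
    Returns the estimated (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = 0.0            # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0    # information matrix (minus the Hessian)
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - n_trials * p        # observed minus expected count
            w = n_trials * p * (1.0 - p)     # binomial variance of the count
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step = information^{-1} @ score, with the 2x2 inverse written out.
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1


def predicted_count(b0, b1, x, n_trials=20):
    """Expected number of measures: n_trials times the fitted probability."""
    return n_trials / (1.0 + math.exp(-(b0 + b1 * x)))


# Synthetic data: 'counts' set exactly to their expected values under
# b0 = -1, b1 = 0.5, so the fit should recover those coefficients.
# Real data would be integer counts between 0 and 20.
x = list(range(9))
y = [predicted_count(-1.0, 0.5, xi) for xi in x]
b0_hat, b1_hat = fit_binomial_logit(x, y)
print(round(b0_hat, 4), round(b1_hat, 4))  # -1.0 0.5
```

Because the predicted count is 20 times a probability, it can never fall below 0 or above 20, which is the structural advantage over a linear model for this kind of index.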

4. ## Re: correct regression model for dependent variable

First of all, thank you for helping me solve this. I agree with both of you that applying arbitrary rules to split a variable into 0/1 is questionable.
Recoding the adaptation index as 0 (no adaptation) when it is < 1 and as 1 (took adaptive action) when it is >= 1 would, though, make sense in terms of content.
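A short sketch of this recode, using made-up index values. It contrasts the cut-off at 1 (took any adaptive action) with the arbitrary 'sum(a) > 10' rule mentioned earlier in the thread, and shows which distinct index values the cut-off at 1 lumps together.

```python
def dichotomize(index_values, cutoff):
    """Code 1 where the index is >= cutoff, else 0."""
    return [1 if v >= cutoff else 0 for v in index_values]


# Made-up index values for illustration.
index_values = [0, 0, 1, 3, 4, 7, 11, 16]

any_adaptation = dichotomize(index_values, cutoff=1)    # took any adaptive action
arbitrary_split = dichotomize(index_values, cutoff=11)  # the 'sum(a) > 10' rule

print(any_adaptation)   # [0, 0, 1, 1, 1, 1, 1, 1]
print(arbitrary_split)  # [0, 0, 0, 0, 0, 0, 1, 1]

# Distinct index values that all end up coded 1 under the cut-off at 1.
print(sorted({v for v, d in zip(index_values, any_adaptation) if d == 1}))  # [1, 3, 4, 7, 11, 16]
```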

Originally Posted by maartenbuis
So unlike a truly ordered variable we now have a meaningful distance between the values.
That is how I thought to do it. I have to admit that some doubt remains in my mind when I think of treating the additive index as a metric variable. A colleague suggested taking the mean of the index and continuing with that in a linear model. I am not sure whether that would be an asset, though. Do you have any further arguments on that?

As I was unsure about the treatment of the DV and thus the correct model, I estimated all of them: linear, ordered logit and logistic regression (with the recoding above). The linear model fits nicely so far and passes all post-regression diagnostics. The other two show similar results.

Another idea for the DV was to subdivide the index into 5 categories: 0 = no adaptation; 1-4 = a little adaptation; 5-8 = moderate; and so on (note that the highest observed value of the index is 16). The reason behind it is to create categories that are concrete and tangible for interpretation. But I am unsure whether that would make sense, and I would probably lean towards the ordered logit model. What do you think?
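The proposed 5-category coding can be sketched as follows. The bins 0, 1-4 and 5-8 come from the post; 9-12 and 13-16 are an assumption that continues the same pattern up to the observed maximum of 16. The output makes the information loss visible: an index of 1 and an index of 4 end up in the same category.

```python
def adaptation_category(index_value):
    """Map the additive index (observed range 0-16) to 5 ordered categories.

    0 -> 0 (no adaptation), 1-4 -> 1 (a little), 5-8 -> 2 (moderate),
    9-12 -> 3, 13-16 -> 4. The last two bins are an assumed continuation
    of the pattern in the post.
    """
    if not 0 <= index_value <= 16:
        raise ValueError("index outside the observed range 0-16")
    if index_value == 0:
        return 0
    return (index_value - 1) // 4 + 1


print([adaptation_category(v) for v in (0, 1, 4, 5, 8, 9, 16)])  # [0, 1, 1, 2, 2, 3, 4]
```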

5. ## Re: correct regression model for dependent variable

Taking the mean won't help. That is just a linear transformation (you divide your count by 20), so any doubts you had about your count variable also apply to your mean variable. I would find the count easier to interpret, so if I had to choose between the two I would stick with the count.

By using 5 categories you are moving further away from the interpretable content of your variable, so I would not do that.
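The point that the mean is just a linear transformation of the count can be checked numerically. The sketch below fits a one-predictor least-squares line (made-up data) to the count and to the count divided by 20: the intercept and slope shrink by exactly that factor, while the fit (R²) is unchanged.

```python
def simple_ols(x, y):
    """Intercept, slope and R^2 of a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


x = [1, 2, 3, 4, 5, 6]                # made-up predictor
count = [0, 2, 3, 3, 6, 8]            # made-up index values
mean_score = [c / 20 for c in count]  # the count divided by 20, as in the thread

a_count, b_count, r2_count = simple_ols(x, count)
a_mean, b_mean, r2_mean = simple_ols(x, mean_score)
print(b_count / b_mean)    # 20, up to floating-point rounding
print(r2_count, r2_mean)   # the same R^2, up to rounding
```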

6. ## Re: correct regression model for dependent variable

Originally Posted by maartenbuis
Taking the mean won't help. That is just a linear transformation (you divide your count by 20), so any doubts you had on your count variable also apply to your mean variable. I would find the count easier to interpret, so if I had to choose between those two I would stick with the count.

by using 5 categories you are moving further away from the interpretable content of your variable, so I would not do that.
Thanks maartenbuis, I agree with your opinion on the mean variable. I cannot see the advantage of applying it.

You mentioned a 'count variable'. Is this a true count variable by definition? I have never dealt with count variables in any great detail. I only know that count variables cannot be used as the DV in a linear regression model. Wouldn't that call for a Poisson regression? Note, though, that the index does not follow a Poisson distribution.

Edit: I just ran a Poisson regression with the index, and the results are... let's say, different. If possible, I'd like to use other models instead.

7. ## Re: correct regression model for dependent variable

Originally Posted by andik
You mentioned a 'count variable'. Is this a true count variable by definition? I have never dealt with count variables in any great detail.
It is a true count, but not in the way Poisson expects counts to be. Poisson models a count of events during a given period and/or in a given area, so in principle that count could be very large. In your case, your count is the number of events in 20 trials, so the maximum is fixed at 20. You can think of this as a binomial experiment, hence my earlier recommendation to look at a binomial model.
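To illustrate the difference in support, the sketch below compares a Poisson and a binomial distribution with the same mean (a hypothetical mean count of 10, with 20 possible measures). The Poisson puts some probability on counts above 20, which are impossible here, and it also implies a larger variance than the binomial.

```python
import math


def poisson_pmf(k, mu):
    return math.exp(-mu) * mu ** k / math.factorial(k)


def binomial_pmf(k, n, p):
    # math.comb(n, k) is 0 for k > n, so counts above n get probability 0.
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


n = 20        # number of possible measures (trials)
mu = 10.0     # hypothetical mean count
p = mu / n    # binomial success probability with the same mean

poisson_above_n = 1 - sum(poisson_pmf(k, mu) for k in range(n + 1))
binomial_above_n = sum(binomial_pmf(k, n, p) for k in range(n + 1, 3 * n))

print(f"P(count > 20): Poisson {poisson_above_n:.4f}, binomial {binomial_above_n:.4f}")
print(f"Variance: Poisson {mu}, binomial {n * p * (1 - p)}")
```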

Originally Posted by andik
I only know that count variables cannot be used as the DV in a linear regression model.
That is too dogmatic. Models are by definition a simplification of reality, and simplification is just another word for "wrong in some useful way". As a consequence all models must be wrong, otherwise they cease to be models. So, as long as a linear model provides a reasonable approximation of the dependent variable, it is useful and should be used. In your case we know that we have to pay special attention to unreasonable predictions: we know that it is a count with a theoretical maximum of 20, so we have to check whether the model gives predictions less than 0 or larger than 20. It is also a good idea to look at what happens near those extremes. Just do that, and make your decision accordingly.
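The check described above can be sketched as follows: compute the linear predictions for the observed covariate rows and flag any that fall outside [0, 20]. The coefficients and rows are hypothetical placeholders for the output of a fitted linear model.

```python
def prediction_range_check(intercept, coefs, rows, lower=0.0, upper=20.0):
    """Linear predictions for each covariate row, plus the rows outside [lower, upper]."""
    preds = [intercept + sum(b * v for b, v in zip(coefs, row)) for row in rows]
    outside = [i for i, yhat in enumerate(preds) if yhat < lower or yhat > upper]
    return preds, outside


# Hypothetical coefficients (two predictors) and covariate rows.
intercept, coefs = -4.0, [1.5, 2.0]
rows = [[6, 1], [10, 3], [1, 0]]

preds, outside = prediction_range_check(intercept, coefs, rows)
print(preds)    # [7.0, 17.0, -2.5]
print(outside)  # [2]
```

Beyond flagging impossible values, the rows with predictions close to 0 or 20 are where the linear approximation is most likely to strain, so they are worth inspecting as well.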

8. ## The Following 2 Users Say Thank You to maartenbuis For This Useful Post:

andik (01-21-2015), Karabiner (01-21-2015)

9. ## Re: correct regression model for dependent variable

Originally Posted by maartenbuis
It is a true count, but not in the way Poisson expects counts to be. Poisson models a count of events during a given period and/or in a given area, so in principle that count could be very large. In your case, your count is the number of events in 20 trials, so the maximum is fixed at 20. You can think of this as a binomial experiment, hence my earlier recommendation to look at a binomial model.
As I mentioned, I have no experience with binomial models. But I will dig into it and try to estimate a model in Stata. Do you have any recommendations for reading on binomial models and/or their application in Stata?

Originally Posted by maartenbuis
In your case we know that we have to pay special attention to unreasonable predictions: we know that it is a count with a theoretical maximum of 20, so we have to check whether the model gives predictions less than 0 or larger than 20. It is also a good idea to look at what happens near those extremes. Just do that, and make your decision accordingly.
I am sorry, but I am afraid I cannot follow. Do you mean that we have to check whether any of the IVs shows coefficients < 0 or > 20?

10. ## Re: correct regression model for dependent variable

Hi,
Originally Posted by maartenbuis
we know that it is a count with a theoretical maximum of 20, so we have to check whether the model gives predictions less than 0 or larger than 20. It is also a good idea to look at what happens near those extremes. Just do that, and make your decision accordingly.
I calculated the regression equation from the output of the linear regression model. The constant (_cons) is roughly -9. The model's maximum prediction with this regression equation is ~19.8. However, the maximum observed count is 16. How can that be?

11. ## Re: correct regression model for dependent variable

It is just a prediction; it need not be bound to the actual values of your sample data. It was maartenbuis's advice (if I understood him correctly) to judge whether the predictions of your model are beyond reasonable boundaries. At least the maximum of the predicted values is <= 20 (the maximum of the actually possible values).

Just my 2pence

K.
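The point that predictions are not bound to the observed values can be seen in a toy example with made-up data: a least-squares line fitted to counts whose observed maximum is 12 produces a largest fitted value of 13. The same example shows that the intercept is simply the prediction when the predictor equals 0, which can lie far outside the observed data (here -29) even though all fitted values lie between 1 and 13.

```python
def simple_ols(x, y):
    """Intercept and slope of a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


x = [10, 11, 12, 13, 14]   # made-up predictor values, all far from 0
y = [0, 4, 8, 12, 11]      # made-up counts; observed maximum is 12

intercept, slope = simple_ols(x, y)
fitted = [intercept + slope * a for a in x]
print(intercept, slope)        # -29.0 3.0
print(max(fitted), max(y))     # 13.0 12
```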

12. ## The Following User Says Thank You to Karabiner For This Useful Post:

andik (02-12-2015)

13. ## Re: correct regression model for dependent variable

Thanks, Karabiner, for the clarification; I get the point now.

14. ## Re: correct regression model for dependent variable

I am thinking about the meaning of the negative constant. Is the negative number the result of having a DV ranging from 0 to 16?

Kind regards,
A.

 Tweet

#### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts