# Thread: Query: Using independent variables in regression.

1. ## Query: Using independent variables in regression.

1. When entering independent variables into a regression model, I am unsure how to code some of them. For example, my outcome is health status and one of my covariates is income, which I can use either as a categorical or as a continuous variable. How do I decide? Income is not my variable (factor) of interest; I am looking at the association between sleep duration and health status. Thank you for the help.

2. ## Re: Query: Using independent variables in regression.

You should use income as a continuous / numerical variable for two reasons: 1.) If each income value in your data frame is treated as a level of a factor, you end up with a huge number of factor levels, and this costs lots of degrees of freedom (that is, the ratio of data points to regression parameters becomes really bad), and 2.) you would not use all the information available in this variable, such as order and proportions. So if you want to use income as a covariate (and that is what I understood), use it as a numerical predictor. And you can possibly even improve your model by centring this predictor.
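To make the degrees-of-freedom argument concrete, here is a minimal Python sketch. The income values are invented for illustration (they are not from the original poster's data), and the sketch only counts parameters and centres the predictor; it does not fit a model.

```python
from statistics import mean

# Hypothetical annual incomes (in thousands); every value is distinct,
# as is typical for a finely measured income variable.
income = [18.5, 22.0, 27.3, 31.0, 36.8, 42.1, 55.0, 63.4, 78.9, 120.0]

# Income as a factor with one level per distinct value:
# k levels need k - 1 dummy coefficients (one level is the reference).
n_levels = len(set(income))
params_categorical = n_levels - 1

# Income as a numerical predictor: a single slope coefficient.
params_continuous = 1

# Centring: subtract the sample mean, so that the intercept refers to a
# person with average income rather than to income = 0.
income_mean = mean(income)
income_centred = [x - income_mean for x in income]

print(f"observations:             {len(income)}")
print(f"parameters (categorical): {params_categorical}")
print(f"parameters (continuous):  {params_continuous}")
print(f"mean of centred income:   {abs(mean(income_centred)):.6f}")
```

With ten observations and ten distinct incomes, the factor coding uses up nine coefficients for income alone, which leaves almost nothing to estimate the effect of sleep duration.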

3. ## The Following User Says Thank You to mmercker For This Useful Post:

Tilipa (12-29-2015)

4. ## Re: Query: Using independent variables in regression.

Is there an argument to be made that treating income as a categorical variable allows for non-linear effects on the dependent variable? I understand that you can do this with a continuous variable by adding a squared term, but maybe that assumes a certain functional form. What if the effect of income on health varies by the level of income? For example, what if increases in income have large effects on health for those with low income, small effects on health for those with high income, and no effects on health for those with income in between? (I'm not saying this is plausible, just asking the best way to account for this in a model.)

My inclination is to plot the data to get a sense of what the relationship looks like between income and health, and then choose a transformation of income that is consistent with that picture. Income is trickier than some variables because it tends to be highly skewed.
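As a rough, text-only stand-in for such a plot, one could sort the observations by income, split them into a few equal-sized groups, and compare the mean health score per group. The sketch below assumes made-up (income, health) pairs and a hypothetical helper `binned_means`; it is an exploratory check, not a model.

```python
from statistics import mean

# Hypothetical (income in thousands, health score) pairs, invented for illustration.
data = [
    (12, 41), (15, 45), (18, 50), (22, 54), (26, 58), (30, 60),
    (38, 62), (45, 63), (55, 63), (70, 64), (95, 66), (140, 67),
]

def binned_means(pairs, n_bins):
    """Sort by income, split into n_bins groups of equal size,
    and return (mean income, mean health) for each group."""
    ordered = sorted(pairs)
    size = len(ordered) // n_bins
    result = []
    for b in range(n_bins):
        start = b * size
        # The last bin absorbs any leftover observations.
        end = (b + 1) * size if b < n_bins - 1 else len(ordered)
        chunk = ordered[start:end]
        result.append((mean([x for x, _ in chunk]), mean([y for _, y in chunk])))
    return result

summary = binned_means(data, n_bins=4)
for inc, health in summary:
    print(f"mean income {inc:7.1f}  ->  mean health {health:5.1f}")
```

If the gain in mean health shrinks from one group to the next, that hints at a saturating relationship and suggests trying a concave transformation of income rather than a straight line.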

5. ## The Following User Says Thank You to ErikB For This Useful Post:

Tilipa (12-29-2015)

6. ## Re: Query: Using independent variables in regression.

Originally Posted by ErikB
Is there an argument to be made that treating income as a categorical variable allows for non-linear effects on the dependent variable?
Non-linearity implies that there is an order in the predictor variable, which is not the case for a (nominal) categorical variable. You could form income classes, with each class represented by a factor level, and you could then treat this factor as an ordinal variable...

But the most natural way to model a nonlinear relationship would be to treat income X as a continuous predictor and then test different nonlinear terms (the classical way would be to test some polynomials and everything else that would make sense). In the case of income, some kind of saturation dependency would make sense. The most appropriate dependency can be selected e.g. by the significance of the corresponding predictors, or by the AIC value of the model.
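Selection by AIC could be sketched roughly as follows. This is a simplified illustration under stated assumptions: the data are synthetic (health is generated from log income plus a small deterministic wobble), each candidate model uses a single transformed income term, and the helpers `fit_simple_ols` and `aic_gaussian` are hypothetical names written for this example. In practice a statistics package would do the fitting.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual sum of squares)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, rss

def aic_gaussian(rss, n, n_coef):
    """AIC of a Gaussian linear model, up to an additive constant that is
    identical for all models fitted to the same response."""
    k = n_coef + 1  # regression coefficients plus the error variance
    return n * math.log(rss / n) + 2 * k

# Synthetic data (assumption): a saturating, log-shaped income effect.
income = [10 + 5 * i for i in range(40)]  # 10, 15, ..., 205 (thousands)
health = [2.0 + 1.5 * math.log(x) + 0.05 * ((i % 3) - 1)
          for i, x in enumerate(income)]

# Candidate functional forms for the income term.
candidates = {
    "linear": lambda x: x,
    "sqrt": math.sqrt,
    "log": math.log,
}

aic = {}
for name, transform in candidates.items():
    z = [transform(x) for x in income]
    _, _, rss = fit_simple_ols(z, health)
    aic[name] = aic_gaussian(rss, len(income), n_coef=2)  # intercept + slope

best = min(aic, key=aic.get)
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: AIC = {value:8.2f}")
print("lowest AIC:", best)
```

All three candidates have the same number of coefficients here, so the AIC ranking simply follows the residual sum of squares. A polynomial with an extra squared term would pay an additional penalty of 2 for its extra coefficient.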

Originally Posted by ErikB
My inclination is to plot the data to get a sense of what the relationship looks like between income and health, and then choose a transformation of income that is consistent with that picture
This is a good idea: look at a scatterplot, and you then have a guide for how to model this dependency.

7. ## The Following User Says Thank You to mmercker For This Useful Post:

Tilipa (12-29-2015)

8. ## Re: Query: Using independent variables in regression.

Originally Posted by ErikB
Is there an argument to be made that treating income as a categorical variable allows for non-linear effects on the dependent variable? (...) Income is trickier than some variables because it tends to be highly skewed.
Income IMHO belongs to the variables for which a log-transformation
should routinely be considered. It is not useful in every study population,
of course, but in many. E.g. diminishing marginal utility of income is
a well-known phenomenon.

With kind regards

K.
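A small Python sketch of what the log-transformation does to a skewed income variable. The income values are invented for illustration, and `skewness` is a hypothetical helper using the plain moment formula without small-sample correction.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment divided by the
    cubed standard deviation (no small-sample correction)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Hypothetical incomes (in thousands) with a long right tail.
income = [12, 15, 18, 20, 22, 25, 30, 35, 45, 60, 90, 150, 400]
log_income = [math.log(v) for v in income]

skew_raw = skewness(income)
skew_log = skewness(log_income)
print(f"skewness of income:      {skew_raw:.2f}")
print(f"skewness of log(income): {skew_log:.2f}")

# On the log scale, equal ratios of income are equal steps, so a model that is
# linear in log(income) assigns the same health change to each doubling.
step_20_to_40 = math.log(40) - math.log(20)
step_100_to_200 = math.log(200) - math.log(100)
print(f"log step 20 -> 40:   {step_20_to_40:.4f}")
print(f"log step 100 -> 200: {step_100_to_200:.4f}")
```

The second part is the link to diminishing marginal utility: going from 20 to 40 counts as much as going from 100 to 200, so each additional unit of income matters less the higher the income already is.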

9. ## The Following User Says Thank You to Karabiner For This Useful Post:

Tilipa (12-29-2015)
