# Thread: Converting a continuous variable into a categorical variable (low, medium, high)

1. ## Converting a continuous variable into a categorical variable (low, medium, high)

I have a continuous dependent variable (scores on an exam) and a number of predictor variables that I'd like to use in a regression analysis, such as gender, age, and income. I'd like to convert age and income into categorical variables: low, medium, and high. I'd like to do this using percentiles, but am not sure if I should use tertiles (lower 33% = low, middle 33% = medium, upper 33% = high) or if I should divide the data into lower 25% (low), middle 50% (medium), and upper 25% (high).

Thoughts? Thank you!
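A minimal sketch of the two splits being asked about, using Python's standard-library `statistics.quantiles` on a hypothetical list of 12 ages. The helper names `cut_points` and `categorize` are illustrative only.

```python
import statistics
from collections import Counter

def cut_points(values, scheme="tertiles"):
    """Return (lower, upper) sample cut points for a low/medium/high split.

    "tertiles":  33rd and 67th percentiles (roughly 33-33-33)
    "quartiles": 25th and 75th percentiles (roughly 25-50-25)
    """
    if scheme == "tertiles":
        lower, upper = statistics.quantiles(values, n=3)
    elif scheme == "quartiles":
        q1, _median, q3 = statistics.quantiles(values, n=4)
        lower, upper = q1, q3
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    return lower, upper

def categorize(value, lower, upper):
    """Map a value to 'low', 'medium' or 'high' given two cut points."""
    if value <= lower:
        return "low"
    if value <= upper:
        return "medium"
    return "high"

ages = list(range(21, 33))  # hypothetical ages of 12 participants

for scheme in ("tertiles", "quartiles"):
    lower, upper = cut_points(ages, scheme)
    counts = Counter(categorize(a, lower, upper) for a in ages)
    print(scheme, (lower, upper), dict(counts))
```

With 12 distinct values the tertile split gives groups of 4/4/4 and the 25-50-25 split gives 3/6/3. `statistics.quantiles` uses its "exclusive" interpolation method by default, so other software may place the cut points slightly differently, and tied values can make the groups unequal.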

2. ## Re: Converting a continuous variable into a categorical variable (low, medium, high)

> I'd like to convert age and income into categorical variables: low, medium, and high.
Why? What for? You'll throw away statistical information and you'll
create artificial groups, which could be meaningless. Usually, the
interval-scaled variable works perfectly well as a predictor.
> I'd like to do this using percentiles, but am not sure if I should use tertiles (lower 33% = low, middle 33% = medium, upper 33% = high) or if I should divide the data into lower 25% (low), middle 50% (medium), and upper 25% (high).
So you want to use sample data to define your groups.
Your definition will then be sample-specific. To what could the results
be generalized? The next study with the next sample, or the population,
will have different 33% (etc.) limits.
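To illustrate that the limits are sample-specific, a minimal sketch: two hypothetical samples of 200 incomes drawn from the same simulated population give different tertile cut points. The log-normal parameters and the helper names `tertile_cutoffs` and `draw_incomes` are arbitrary assumptions for illustration.

```python
import random
import statistics

def tertile_cutoffs(sample):
    """Sample-based tertile cut points (33rd and 67th percentiles)."""
    return statistics.quantiles(sample, n=3)

def draw_incomes(seed, size=200):
    """Draw hypothetical incomes from one simulated log-normal population."""
    rng = random.Random(seed)
    return [rng.lognormvariate(10.8, 0.6) for _ in range(size)]

study_a = draw_incomes(seed=1)  # "this" study
study_b = draw_incomes(seed=2)  # the next study, same population

print("study A cut points:", [round(c) for c in tertile_cutoffs(study_a)])
print("study B cut points:", [round(c) for c in tertile_cutoffs(study_b)])
```

Someone labeled "medium" in one study can therefore land in "high" (or "low") in the next, even though nothing about the population changed.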

Moreover, if most participants are poor, or most of them have a medium
income, or most of them are wealthy, you'll label people as
low/medium/high who aren't.
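A minimal sketch of the skewed-sample problem, assuming a hypothetical sample in which everyone earns between $12,000 and $25,000: the top tertile is still labeled "high", although none of these incomes would be called high in absolute terms.

```python
import statistics

# Hypothetical sample in which nearly everyone has a low income (in dollars).
incomes = [12_000, 13_500, 15_000, 16_000, 17_500, 18_000,
           19_000, 20_500, 21_000, 22_000, 23_500, 25_000]

lower, upper = statistics.quantiles(incomes, n=3)  # sample tertile cut points

def label(income):
    """Assign a tertile-based label relative to this sample only."""
    if income <= lower:
        return "low"
    if income <= upper:
        return "medium"
    return "high"

high_group = [x for x in incomes if label(x) == "high"]
print(high_group)  # [21000, 22000, 23500, 25000]
```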

With kind regards

K.

3. ## Re: Converting a continuous variable into a categorical variable (low, medium, high)

As long as there is a roughly linear relationship between the continuous variable and the outcome, ideally you never categorize it. Categorizing loses information, among other drawbacks.
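A minimal sketch of the information loss, using simulated data (the linear relationship and the noise level are assumptions): the same outcome is modeled once with the continuous predictor and once with tertile categories, and R^2 is compared. Predicting each observation by its category mean is equivalent to OLS with an intercept and two dummy variables.

```python
import random
import statistics

def r_squared_linear(x, y):
    """R^2 of a simple linear regression of y on x (squared Pearson correlation)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r_squared_groups(groups, y):
    """R^2 of a model predicting each y by its group mean (intercept + dummies)."""
    my = statistics.fmean(y)
    total = sum((b - my) ** 2 for b in y)
    by_group = {}
    for g, b in zip(groups, y):
        by_group.setdefault(g, []).append(b)
    means = {g: statistics.fmean(vals) for g, vals in by_group.items()}
    residual = sum((b - means[g]) ** 2 for g, b in zip(groups, y))
    return 1 - residual / total

rng = random.Random(42)
x = [float(i) for i in range(100)]       # hypothetical continuous predictor
y = [xi + rng.gauss(0, 10) for xi in x]  # assumed linear relationship plus noise

lower, upper = statistics.quantiles(x, n=3)
groups = ["low" if xi <= lower else "medium" if xi <= upper else "high" for xi in x]

print(f"continuous predictor: R^2 = {r_squared_linear(x, y):.3f}")
print(f"tertile categories:   R^2 = {r_squared_groups(groups, y):.3f}")
```

The categorical model uses more parameters (three group means versus an intercept and a slope) and still explains less variance, because everyone within a tertile is treated as identical.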

The only exception I can think of is when you are not disseminating the results, they are purely for in-house use, and your sample is fairly complete. But if you want to share your results, it can be difficult to generalize them to other samples or populations when the cut rules were developed using only your own sample.

4. ## Re: Converting a continuous variable into a categorical variable (low, medium, high)

Thanks for your advice. The main reason I wanted to categorize the data is that the incomes are estimated from zip codes/Census data, so they’re not precise. I also found the B values from my regression more meaningful: rather than a fraction of a point increase in score per dollar of income, there was a larger change in score per category. Please let me know whether this changes your opinion.
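On the size of the B values: a per-dollar coefficient is small largely because a dollar is a small unit. A minimal sketch with hypothetical incomes and scores shows that expressing income in units of $10,000 (an arbitrary choice) multiplies B by 10,000 while the fitted model stays the same. `ols_slope` is an illustrative helper, not a library function.

```python
def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

# Hypothetical data: incomes in dollars and exam scores.
income_dollars = [20_000, 35_000, 50_000, 65_000, 80_000, 95_000]
scores = [58.0, 63.5, 66.0, 72.5, 74.0, 80.0]

b_per_dollar = ols_slope(income_dollars, scores)

# Same predictor, rescaled to units of $10,000.
income_10k = [inc / 10_000 for inc in income_dollars]
b_per_10k = ols_slope(income_10k, scores)

print(f"B per dollar:  {b_per_dollar:.6f}")
print(f"B per $10,000: {b_per_10k:.3f}")
```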

Also, if there wasn’t a linear relationship between the variables and it was desirable to convert the continuous variable into categories, would you use tertiles or 25-50-25?

Thanks!

5. ## Re: Converting a continuous variable into a categorical variable (low, medium, high)

If a variable is not naturally continuous, then converting it may be more acceptable.

Plotting the relationship between the variables is important for understanding linearity. Options include scatterplots, loess curves, and generalized additive models (splines). Tertiles may be dangerous to use in these situations; you want to first determine where changes in slope occur (knots), and sometimes simple piecewise regression or data transformations (logging or polynomials) are good choices. But this applies when the relationship is not monotonic, where not accounting for the non-monotonicity would be inappropriate. If there is a linear relationship, or one that changes only slowly, that is fine, but if you have something sinuous or, say, quadratic going on, you need to address it.
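A minimal sketch of the piecewise regression mentioned above, in plain Python. It assumes the knot location is already known (for example, read off a loess plot), uses hypothetical noise-free data whose slope drops from 1 to 0.2 at x = 50, and fits a hinge term max(0, x - knot) by ordinary least squares. The helper names `solve`, `least_squares` and `sse` are illustrative, not from any library.

```python
def solve(a, b):
    """Solve the square system a @ beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta

def least_squares(rows, y):
    """Ordinary least squares via the normal equations X'X beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)

def sse(rows, y, beta):
    """Sum of squared residuals for a fitted coefficient vector."""
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))

knot = 50.0  # assumed: where a plot suggests the slope changes
x = [float(i) for i in range(100)]
y = [xi if xi < knot else knot + 0.2 * (xi - knot) for xi in x]  # hypothetical data

linear_rows = [[1.0, xi] for xi in x]                       # single straight line
hinge_rows = [[1.0, xi, max(0.0, xi - knot)] for xi in x]   # slope may change at the knot

beta_linear = least_squares(linear_rows, y)
beta_hinge = least_squares(hinge_rows, y)

print("straight line SSE:", round(sse(linear_rows, y, beta_linear), 2))
print("piecewise SSE:    ", round(sse(hinge_rows, y, beta_hinge), 6))
print("slope before knot:", round(beta_hinge[1], 3),
      "after knot:", round(beta_hinge[1] + beta_hinge[2], 3))
```

The hinge coefficient is the change in slope at the knot, so the model keeps income (or age) continuous on both sides instead of collapsing it into three flat steps.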
