
Thread: Converting a continuous variable into a categorical variable (low, medium, high)

  1. #1

    Converting a continuous variable into a categorical variable (low, medium, high)




    I have a continuous dependent variable (scores on an exam) and a number of predictor variables that I'd like to use in a regression analysis, such as gender, age, and income. I'd like to convert age and income into categorical variables: low, medium, and high. I'd like to do this using percentiles, but am not sure if I should use tertiles (lower 33% = low, middle 33% = medium, upper 33% = high) or if I should divide the data into lower 25% (low), middle 50% (medium), and upper 25% (high).

    Thoughts? Thank you!
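To make the two schemes in the question concrete, here is a minimal sketch using only the Python standard library. The income values are simulated (hypothetical, log-normal), and the helper name `categorize` is made up for illustration; it is not taken from any package.

```python
import random
from statistics import quantiles

def categorize(values, lower_pct, upper_pct):
    """Label each value low/medium/high using two percentile cut points."""
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99
    pct = quantiles(values, n=100, method="inclusive")
    lo_cut, hi_cut = pct[lower_pct - 1], pct[upper_pct - 1]
    labels = []
    for v in values:
        if v <= lo_cut:
            labels.append("low")
        elif v <= hi_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels, (lo_cut, hi_cut)

random.seed(1)
# Hypothetical right-skewed incomes, for illustration only
income = [random.lognormvariate(10.8, 0.5) for _ in range(300)]

tertile_labels, tertile_cuts = categorize(income, 33, 67)     # roughly 33/33/33
quartile_labels, quartile_cuts = categorize(income, 25, 75)   # 25/50/25

for name, labels, cuts in [("tertiles", tertile_labels, tertile_cuts),
                           ("25-50-25", quartile_labels, quartile_cuts)]:
    counts = {k: labels.count(k) for k in ("low", "medium", "high")}
    print(name, "cut points:", [round(c) for c in cuts], "counts:", counts)
```

The two schemes differ only in where the cut points fall, so the same person can be "medium" under one and "low" or "high" under the other.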

  2. #2
    Karabiner (TS Contributor)

    Re: Converting a continuous variable into a categorical variable (low, medium, high)

    > I'd like to convert age and income into categorical variables: low, medium, and high.

    Why? What for? You'll throw away statistical information and
    create artificial groups, which could be meaningless. Usually, the
    interval-scaled variable is perfectly fine as a predictor.

    > I'd like to do this using percentiles, but am not sure if I should use tertiles (lower 33% = low, middle 33% = medium, upper 33% = high) or if I should divide the data into lower 25% (low), middle 50% (medium), and upper 25% (high).

    So you want to use sample data to define your groups.
    Your definition will then be sample-specific. To what could the results
    be generalized? The next study with the next sample, or the population,
    will have different 33% etc. limits.

    Moreover, if most participants are poor, or most of them have medium
    income, or most of them are wealthy, you'll label people as
    low/medium/high who aren't.
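The sample-specific point can be illustrated with a small simulation. This is only a sketch under made-up assumptions: both samples are hypothetical log-normal incomes, and the helper names `tertile_cuts` and `label` are invented for the example.

```python
import random
from statistics import quantiles

def tertile_cuts(values):
    """Return the two cut points that split the values into thirds."""
    lo_cut, hi_cut = quantiles(values, n=3)
    return lo_cut, hi_cut

def label(value, cuts):
    """Assign low/medium/high relative to a given pair of cut points."""
    lo_cut, hi_cut = cuts
    if value <= lo_cut:
        return "low"
    if value <= hi_cut:
        return "medium"
    return "high"

random.seed(42)
# Two hypothetical samples: one from a poorer area, one from a wealthier area
sample_a = [random.lognormvariate(10.3, 0.4) for _ in range(500)]
sample_b = [random.lognormvariate(11.2, 0.4) for _ in range(500)]

cuts_a = tertile_cuts(sample_a)
cuts_b = tertile_cuts(sample_b)

person_income = 50_000  # the same person, judged against each sample's tertiles
print("cut points A:", [round(c) for c in cuts_a], "->", label(person_income, cuts_a))
print("cut points B:", [round(c) for c in cuts_b], "->", label(person_income, cuts_b))
```

The same income lands in different categories depending on which sample supplied the cut points, which is why results based on sample tertiles are hard to carry over to another study.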

    With kind regards

    K.
    "Now the Führer can kiss my ass." (Ernst Kuzorra, 1941)

  3. #3
    hlsmith (Omega Contributor)

    Re: Converting a continuous variable into a categorical variable (low, medium, high)

    As long as there is a roughly linear relationship between the continuous variable and the outcome, ideally you never categorize it. Categorizing throws away information and costs you statistical power.
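A rough way to see the loss of information is to compare R² when the predictor is used as-is versus cut into tertiles. The sketch below uses simulated data (a hypothetical linear age effect plus noise); the helper names are invented for the example.

```python
import random
from statistics import mean, quantiles

def r_squared_linear(x, y):
    """R^2 of a simple linear regression of y on x (squared correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def r_squared_groups(groups, y):
    """R^2 from predicting y by its group mean (same as dummy-coded regression)."""
    by_group = {}
    for g, b in zip(groups, y):
        by_group.setdefault(g, []).append(b)
    ss_within = sum((b - mean(vals)) ** 2 for vals in by_group.values() for b in vals)
    my = mean(y)
    ss_total = sum((b - my) ** 2 for b in y)
    return 1 - ss_within / ss_total

random.seed(7)
# Hypothetical data: exam score rises linearly with age, plus noise
age = [random.uniform(18, 65) for _ in range(1000)]
score = [50 + 0.5 * a + random.gauss(0, 5) for a in age]

# Tertile categories built from the sample itself
lo_cut, hi_cut = quantiles(age, n=3)
age_cat = ["low" if a <= lo_cut else "medium" if a <= hi_cut else "high" for a in age]

r2_continuous = r_squared_linear(age, score)
r2_categorical = r_squared_groups(age_cat, score)
print(f"R^2 continuous: {r2_continuous:.3f}  R^2 tertiles: {r2_categorical:.3f}")
```

When the true relationship is linear, the tertile version explains less of the variance, even though it spends two parameters instead of one.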


    The only exception I can think of is when you are not disseminating results, the analysis is purely for in-house use, and your sample is pretty complete. But if you are looking to share your results, it can be difficult to generalize them to other samples or populations if the cut rules were developed using only your own sample.
    Stop cowardice, ban guns!

  4. #4

    Re: Converting a continuous variable into a categorical variable (low, medium, high)

    Thanks for your advice. The main reason I wanted to categorize the data is that the incomes are estimated using zip codes/Census data, so they’re not perfect. I also found the B values from my regression to seem more meaningful, because rather than a fraction of a point increase in score per dollar income, there was a larger change in score by category. Please let me know if this changes your opinion or not.

    Also, if there wasn’t a linear relationship between the variables and it was desirable to convert the continuous variable into categories, would you use tertiles or 25-50-25?

    Thanks!

  5. #5
    hlsmith (Omega Contributor)

    Re: Converting a continuous variable into a categorical variable (low, medium, high)


    If a variable is not naturally continuous, then conversion may be more acceptable.


    Plotting the relationship between the variables is important for understanding linearity. Options include scatterplots, loess curves, and generalized additive models (splines). Tertiles may be dangerous to use in these situations: you want to first determine where changes in slope occur (knots), and sometimes simple piecewise regression or data transformations (logging or polynomials) are good choices. But this assumes the relationship isn't monotonic and that ignoring the non-monotonicity would be inappropriate. If there is a linear relationship, you are fine, but if you have something wavy or, say, quadratic going on, you need to address it.
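As a minimal sketch of the alternatives mentioned here, the following compares a straight line, a quadratic, and a one-knot piecewise-linear fit on simulated data. It assumes NumPy is available; the data, the knot at 40, and the helper names are all hypothetical, and in practice the knot would come from looking at a plot or a smoother.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: score rises with age up to about 40, then declines
x = rng.uniform(20, 70, size=400)
y = 60 + 1.2 * (x - 20) - 0.03 * (x - 20) ** 2 + rng.normal(0, 3, size=400)

def rss_poly(x, y, degree):
    """Residual sum of squares of a polynomial fit of the given degree."""
    coefs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coefs, x)
    return float(resid @ resid)

def rss_piecewise(x, y, knot):
    """Residual sum of squares of a linear fit whose slope may change at `knot`."""
    # Design matrix: intercept, x, and a hinge term that is zero below the knot
    X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - knot)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rss_linear = rss_poly(x, y, 1)
rss_quadratic = rss_poly(x, y, 2)
rss_hinge = rss_piecewise(x, y, knot=40.0)  # knot picked by eye for this sketch

print(f"RSS linear:    {rss_linear:.0f}")
print(f"RSS quadratic: {rss_quadratic:.0f}")
print(f"RSS piecewise: {rss_hinge:.0f}")
```

A much smaller residual sum of squares for the quadratic or piecewise fit than for the straight line is a hint that the relationship bends, and all three keep the predictor continuous rather than cutting it into groups.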
