Thread: Bayesian approach for logistic regression models? (the case of low-count cells)

  1. #1
    spunky

    so... now that i'm a PhD student my advisor wants me to help out people with their projects. as fate would have it, the same problem has come up in two situations: a friend of mine working on her dissertation and another one working on an applied research project. the problem is low-count cells in logistic regression.

    apparently, this is a lot more common than i thought it was. the friend who's working on her dissertation is documenting the impact that it has on coefficient estimates, SEs, convergence, etc. when i asked her what kind of solutions were out there, she said 99% of people either dropped a category or merged it with another one. so no real help there.

    i reached out to my Stats Dept friends and two people independently mentioned that they've heard a Bayesian approach would probably work well, but they couldn't elaborate on why (the grad student who's the Bayesian buff is already on holidays).

    so i'm just wondering if anyone (and by anyone i mean Dason) has heard of a Bayesian solution to obtain better parameter estimates when there are low-count cells in logistic regression models. or, to be honest, i'm open to *any* approach that doesn't imply getting rid of data.

    (PS- has anyone noticed how EFFING ugly real data is? uuuugghhh!)
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  2. #2
    Englund

    Real data, what is that? This is real data, right?

    Code: 
    > x <- rnorm(10)
    > x
     [1]  0.44272972 -0.12391179 -0.05046462 -0.32750574 -0.01735622  0.40255914  0.30132656
     [8] -0.14345201 -1.76344836 -0.20450095

  3. #3
    TheEcologist

    Quote Originally Posted by spunky View Post

    so i'm just wondering if anyone (and by anyone i mean Dason) has heard of a Bayesian solution to obtain better parameter estimates when there are low-count cells in logistic regression models. or, to be honest, i'm open to *any* approach that doesn't imply getting rid of data.

    Hi! We are glad that you posted here! I would suggest checking out this thread for some guidelines on smart posting behavior that can help you get better answers much more quickly.

    For instance the third bullet of point 2 ...

    No but seriously, what kind of logistic regression models are we talking about? An experiment with X factors (sex, education level) and as response a binary (1,0) or count (# success, # fail) variable?

    If so, yes a Bayesian model would work. But I'm also interested in hearing which technique forced people to drop categories with low counts?


    Here is a good resource with examples from the BUGS language that may have what you need:

    https://github.com/johnmyleswhite/JAGSExamples
    The true ideals of great philosophies always seem to get lost somewhere along the road..

  4. #4
    Jake

    Gelman and friends have a paper about a "default" prior distribution for logistic regression models:

    http://www.stat.columbia.edu/~gelman...d/priors11.pdf

    When I have encountered this in the past (just one time) I obtained standard errors via bootstrapping and that seemed to work well.
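
To make the "default prior" idea concrete: as far as I understand the paper, it puts independent Cauchy priors centered at 0 on the coefficients (scale 2.5 for slopes, scale 10 for the intercept) after rescaling the predictors. Below is a minimal sketch in plain Python (standard library only, not R) that finds the posterior mode for one binary predictor whose x = 1 cell has zero "yes" responses. The cell counts are made up, the predictor rescaling step is skipped for brevity, and the simple gradient ascent is my own stand-in, not the algorithm from the paper or from any package.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def sigmoid(z):
    return math.exp(log_sigmoid(z))

# Hypothetical cell counts: (x, number of "yes", number of "no").
# The x = 1 cell has zero "yes" responses, so the plain ML slope is -infinity.
CELLS = [(0, 40, 60), (1, 0, 8)]
INTERCEPT_SCALE = 10.0  # Cauchy scale for the intercept
SLOPE_SCALE = 2.5       # Cauchy scale for the slope

def log_cauchy(b, scale):
    """Log density of a Cauchy(0, scale) prior, up to an additive constant."""
    return -math.log1p((b / scale) ** 2)

def log_posterior(b0, b1):
    lp = log_cauchy(b0, INTERCEPT_SCALE) + log_cauchy(b1, SLOPE_SCALE)
    for x, yes, no in CELLS:
        z = b0 + b1 * x
        lp += yes * log_sigmoid(z) + no * log_sigmoid(-z)
    return lp

def gradient(b0, b1):
    # Prior part: d/db of -log(1 + (b / s)^2) is -2b / (s^2 + b^2).
    g0 = -2.0 * b0 / (INTERCEPT_SCALE ** 2 + b0 ** 2)
    g1 = -2.0 * b1 / (SLOPE_SCALE ** 2 + b1 ** 2)
    # Likelihood part: observed "yes" count minus expected "yes" count per cell.
    for x, yes, no in CELLS:
        resid = yes - (yes + no) * sigmoid(b0 + b1 * x)
        g0 += resid
        g1 += resid * x
    return g0, g1

def map_estimate(max_iter=20000, tol=1e-5):
    """Gradient ascent with a crude step-size search (halve on failure)."""
    b0, b1, step = 0.0, 0.0, 1.0
    current = log_posterior(b0, b1)
    for _ in range(max_iter):
        g0, g1 = gradient(b0, b1)
        if math.hypot(g0, g1) < tol:
            break
        n0, n1 = b0 + step * g0, b1 + step * g1
        candidate = log_posterior(n0, n1)
        if candidate > current:
            b0, b1, current = n0, n1, candidate
            step *= 1.2
        else:
            step *= 0.5
    return b0, b1

b0_map, b1_map = map_estimate()
print(f"MAP intercept: {b0_map:.3f}, MAP slope: {b1_map:.3f}")
```

The point is only that the prior keeps the slope finite where maximum likelihood would run off to minus infinity. In R, as far as I know, the arm package's bayesglm() is the usual way to fit this kind of model.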
    “In God we trust. All others must bring data.”
    ~W. Edwards Deming

  5. #5
    spunky

    Quote Originally Posted by Englund View Post
    Real data, what is that? This is real data, right?
    of course it is! actually, this is even MORE real than that pesky data people go gather out there in that... that... what do they call it? oh yes, the "real world".

  6. #6
    GretaGarbo

    Is this a standard two level (0 or 1) logit model?

    How is it then possible to drop a category? Then there will be only one level left, and no variation.

    Quote Originally Posted by spunky View Post
    she said 99% of people either dropped a category or merged it with another one.


    ......

    (PS- has anyone noticed how EFFING ugly real data is? uuuugghhh!)
    Yes, and how interesting!

  7. #7
    spunky

    Quote Originally Posted by TheEcologist View Post
    Hi! We are glad that you posted here! I would suggest checking out this thread for some guidelines on smart posting behavior that can help you get better answers much more quickly.

    For instance the third bullet of point 2 ...
    how *DARE* you use the forum rules against me!? GUARDS!!! GUARDS!!! OFF WITH HIS HEAD!!!


    Quote Originally Posted by TheEcologist View Post
    No but seriously, what kind of logistic regression models are we talking about? An experiment with X factors (sex, education level) and as response a binary (1,0) or count (# success, # fail) variable?
    basically. the problem i have (borrowing from my friend who's doing the applied project) goes more or less like this.

    say for example that you're trying to model whether people reply 'yes' or 'no' to some variable that i think is called 'suicidal intention'. so if you start by setting up a contingency table where maybe you have gender as predictor, you can have men/women and then yes/no to suicide ideation. the problem starts (for her) when she starts adding classification layers. like maybe you have the categories 'has sought a mental health specialist', 'has considered seeking a mental health specialist' and 'has not sought a mental health specialist'. suddenly, when you keep classifying and sub-classifying people (like % of men who have sought help and replied 'yes' to suicide ideation, % of women who have sought help and replied 'yes' to suicide ideation), the counts in each cell start dropping more and more. now, logistic regression is better suited than these multiple multi-way contingency tables BUT she's finding that maybe a few thousand people (this is a national database) concentrate in some categories whereas just a few dozen concentrate in others, so she can't get accurate regression coefficients. or she gets them but the SEs are HUGE. that's the problem of low-count cells in logistic regression. she just doesn't have a balanced-enough frequency table to get stable analyses.
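
A quick way to see why the SEs blow up: in a logistic regression with a single binary predictor, the slope is the log odds ratio of the 2x2 table, and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d), where a, b, c, d are the four cell counts. The sketch below (plain Python, made-up counts that are not from the national database in question) compares a table with thousands of people per row against one where a row holds only a dozen.

```python
import math

def log_odds_ratio_se(yes_a, no_a, yes_b, no_b):
    """Wald SE of the log odds ratio for a 2x2 table (all counts must be > 0)."""
    return math.sqrt(1 / yes_a + 1 / no_a + 1 / yes_b + 1 / no_b)

# Hypothetical counts: both predictor levels have about 2000 respondents.
se_large = log_odds_ratio_se(500, 1500, 450, 1550)

# Same first row, but the second predictor level has only 12 respondents.
se_small = log_odds_ratio_se(500, 1500, 2, 10)

print(f"SE with large cells: {se_large:.3f}")
print(f"SE with a sparse row: {se_small:.3f}")
```

The smallest cell dominates the sum, so thousands of people elsewhere in the table do nothing for the coefficient that depends on the sparse cell.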

    Quote Originally Posted by TheEcologist View Post
    If so, yes a Bayesian model would work. But I'm also interested in hearing which technique forced people to drop categories with low counts?
    no real 'technique' is being implemented. they just lump together categories that end up having very few people in them. say, like in my previous example, that there are very few people in the 'has sought a mental health specialist' and the 'has considered seeking a mental health specialist' categories. so they merge those two into one bigger category associated with seeking help from a specialist.


    Quote Originally Posted by TheEcologist View Post
    Here is a good resource with examples from the BUGS language that may have what you need:

    https://github.com/johnmyleswhite/JAGSExamples
    cool! have you used this before?

  8. #8
    spunky

    Quote Originally Posted by Jake View Post
    Gelman and friends have a paper about a "default" prior distribution for logistic regression models:

    http://www.stat.columbia.edu/~gelman...d/priors11.pdf

    When I have encountered this in the past (just one time) I obtained standard errors via bootstrapping and that seemed to work well.
    you mean when you've encountered the low-count cell problems? you didn't run into issues of having estimated regression coefficients that were humongous? so you just bootstrapped your logistic regression or did you do stuff to it before?

    and thanks for the link! it looks promising!

  9. #9
    Dason

    But if you don't have much data... what is the problem with a large standard error? Surely that is to be expected - going Bayesian hopefully doesn't miraculously fix that.

    Also that link appears to be for JAGS - not *BUGS. Similar but not exactly the same.
    I don't have emotions and sometimes that makes me very sad.

  10. #10
    spunky

    Quote Originally Posted by GretaGarbo View Post
    Is this a standard two level (0 or 1) logit model?

    How is it then possible to drop a category? Then there will be only one level left, and no variation.
    i guess i was trying to imply the categories were on the predictors. the response variable stays at 0/1. i elaborated more in my previous reply to TE, using the example my friend had.

    she's looking at people with suicidal ideation (so basically replying "yes" or "no" to a question on a questionnaire) and a bunch of (mostly categorical) predictors like gender, socio-economic status (SES), access to mental health services, etc. this is coming from a national database so her sample is HUGE (in the hundreds of thousands). the problem is that whenever she starts sub-categorizing people (like the proportion of suicidal ideation in men, of a certain SES, with a certain degree of access to mental health services, with certain this and certain that, etc.) she runs into the problem that maybe there are only 10 or 12 people in some categories but thousands in the others. that's what's rendering a lot of her logistic regressions useless. but the response variable is still that "yes"/"no" answer to the suicidal ideation question.

  11. #11
    spunky

    Quote Originally Posted by Dason View Post
    But if you don't have much data... what is the problem with a large standard error? Surely that is to be expected - going
    true. but what to do about the non-convergence and the huge coefficients? it's so weird, whenever she runs into stuff like that the computer either pukes or just gives an answer that seems nonsensical (and, usually, non-significant, although that is not always the case)
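
One likely explanation for the non-convergence and the huge coefficients is separation: if some cell has zero "yes" (or zero "no") responses, the likelihood keeps improving as the matching coefficient heads to infinity, so there is no finite maximum for the optimizer to find. This is a guess about her data, not something confirmed in the thread. The sketch below (plain Python, made-up counts) evaluates the log-likelihood at more and more extreme slopes for a table whose x = 1 cell has no "yes" responses.

```python
import math

def log_sigmoid(z):
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

# Hypothetical cell counts: (x, number of "yes", number of "no").
CELLS = [(0, 40, 60), (1, 0, 8)]

def log_likelihood(b0, b1):
    total = 0.0
    for x, yes, no in CELLS:
        z = b0 + b1 * x
        total += yes * log_sigmoid(z) + no * log_sigmoid(-z)
    return total

# The intercept is well determined: the logit of the "yes" rate in the x = 0 cell.
b0_hat = math.log(40 / 60)

slopes = [0.0, -2.0, -5.0, -10.0, -20.0]
values = [log_likelihood(b0_hat, b1) for b1 in slopes]
for b1, ll in zip(slopes, values):
    print(f"slope = {b1:6.1f}   log-likelihood = {ll:.8f}")
```

Every step towards a more negative slope raises the log-likelihood a little, which is why the software either stops with a warning or reports a very large coefficient with an enormous SE, depending on where it gives up.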

  12. #12
    TheEcologist

    Quote Originally Posted by Dason View Post
    Also that link appears to be for JAGS - not *BUGS. Similar but not exactly the same.
    People tend to refer to JAGS as a dialect of BUGS (see e.g. the readme), BUGS is the language. Just as R is the dialect and S is the language. So I stick to my guns and say no, it's *BUGS.

    Quote Originally Posted by spunky View Post
    say for example that you're trying to model whether people reply 'yes' or 'no' to some variable that i think is called 'suicidal intention'. so if you start by setting up a contingency table where maybe you have gender as predictor, you can have men/women and then yes/no to suicide ideation. the problem starts (for her) when she starts adding classification layers. like maybe you have the categories 'has sought a mental health specialist', 'has considered seeking a mental health specialist' and 'has not sought a mental health specialist'. suddenly, when you keep classifying and sub-classifying people (like % of men who have sought help and replied 'yes' to suicide ideation, % of women who have sought help and replied 'yes' to suicide ideation), the counts in each cell start dropping more and more. now, logistic regression is better suited than these multiple multi-way contingency tables BUT she's finding that maybe a few thousand people (this is a national database) concentrate in some categories whereas just a few dozen concentrate in others, so she can't get accurate regression coefficients. or she gets them but the SEs are HUGE. that's the problem of low-count cells in logistic regression. she just doesn't have a balanced-enough frequency table to get stable analyses.
    The posterior distributions are also going to be wide for these low-information classes. You can't have 1 egg in your basket and make a 6 egg omelet. She will have to accept this. Strongly informative priors may solve some of this, but how is she going to validate them?


    Quote Originally Posted by spunky View Post
    no real 'technique' is being implemented. they just lump together categories that end up having very few people in them. say, like in my previous example, that there are very few people in the 'has sought a mental health specialist' and the 'has considered seeking a mental health specialist' categories. so they merge those two into one bigger category associated with seeking help from a specialist.
    There are techniques that use a hierarchical model approach, lending strength to low-sample estimates from hierarchically modelled relationships. For instance, in plant survival (binary, as in your example), you may be able to use plant growth as a prior for the survival estimates (slow-growing plants will have higher mortality), so you can use the data on growth to estimate survival for plant species that have small sample sizes. If she can think of such a scheme she can improve her estimates. It's not perfect: among other effects, the low-sample-size coefficients tend to show shrinkage towards the mean.
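
The shrinkage mentioned above can be illustrated with a much simpler model than a full hierarchical logistic regression. The sketch below (plain Python) is only a stand-in: each category's "yes" rate gets a Beta prior centered on the overall rate, and the posterior mean is a weighted average of the category's own rate and the overall rate. The counts are invented, and the prior strength PRIOR_N = 20 is an arbitrary assumption; in a real hierarchical model it would be estimated from the data.

```python
# Hypothetical categories: name -> (number of respondents, number of "yes").
CELLS = {
    "has not sought help": (5000, 400),
    "has considered help": (800, 96),
    "has sought help": (12, 5),
}

# Assumed prior strength, expressed as a pseudo-sample size.
PRIOR_N = 20

total_n = sum(n for n, _ in CELLS.values())
total_yes = sum(yes for _, yes in CELLS.values())
overall_rate = total_yes / total_n

def shrunken_rate(n, yes):
    """Posterior mean under a Beta(PRIOR_N * rate, PRIOR_N * (1 - rate)) prior."""
    return (yes + PRIOR_N * overall_rate) / (n + PRIOR_N)

estimates = {name: shrunken_rate(n, yes) for name, (n, yes) in CELLS.items()}

for name, (n, yes) in CELLS.items():
    print(f"{name:22s} n = {n:5d}  raw = {yes / n:.3f}  shrunken = {estimates[name]:.3f}")
```

The large categories barely move, while the 12-person category is pulled most of the way towards the overall rate. That is the trade: a more stable estimate for the sparse cell, at the price of bias towards the mean.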

    Thus, in the end, I have to say that there is no real cure for no data. When you have no data, you have no data, and the only real prescription is to get more data.

    Quote Originally Posted by spunky View Post
    cool! have you used this before?
    Yes.

  13. #13
    Jake

    Quote Originally Posted by spunky View Post
    you mean when you've encountered the low-count cell problems? you didn't run into issues of having estimated regression coefficients that were humongous? so you just bootstrapped your logistic regression or did you do stuff to it before?
    As I think back on the problem more, I think the situation, if I remember correctly, was a logistic regression with 2 categorical predictors where one of the cells had 0 successes. Not surprisingly, glm() didn't like this at all, but a permutation test yielded very reasonable results. This has been a while though and it was not my data.
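
For anyone wondering what a permutation test looks like when a cell has 0 successes, here is a minimal sketch in plain Python. It is not the analysis described above and not what any R package does: it simplifies to a single two-level grouping factor, uses the difference in success proportions as the test statistic (which stays finite even with an empty cell), and the data are made up.

```python
import random

def proportion_gap(outcomes, groups):
    """Success proportion in group 'B' minus success proportion in group 'A'."""
    in_a = [y for y, g in zip(outcomes, groups) if g == "A"]
    in_b = [y for y, g in zip(outcomes, groups) if g == "B"]
    return sum(in_b) / len(in_b) - sum(in_a) / len(in_a)

# Hypothetical data: group A has 30 successes out of 100, group B has 0 out of 15.
groups = ["A"] * 100 + ["B"] * 15
outcomes = [1] * 30 + [0] * 70 + [0] * 15

def permutation_p_value(outcomes, groups, n_perm=5000, seed=1):
    """Two-sided p-value from shuffling the outcomes across the group labels."""
    rng = random.Random(seed)
    observed = abs(proportion_gap(outcomes, groups))
    shuffled = list(outcomes)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so ties with the observed gap count as "as extreme".
        if abs(proportion_gap(shuffled, groups)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_perm + 1)

observed_gap = proportion_gap(outcomes, groups)
p_value = permutation_p_value(outcomes, groups)
print(f"observed gap = {observed_gap:.3f}, permutation p-value = {p_value:.4f}")
```

Because the reference distribution comes from reshuffling the observed outcomes, nothing here depends on the logistic regression converging, which is presumably why this route works when glm() does not.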

  14. #14
    spunky

    Quote Originally Posted by Jake View Post
    but a permutation test yielded very reasonable results. This has been a while though and it was not my data.
    this actually sounds like a very reasonable alternative (and it hadn't even crossed my mind until you mentioned it, so thanks!).

    do you have the code for the permutation test in logistic regression? or did you use any particular R package to do so? just by googling around i found out about this 'glmperm' package which promises to do something similar to what you suggested.

  15. #15
    Jake

    We used some package. I know there are at least 2 packages that do this, but I can't remember which one it was (I no longer have the code).
