
Thread: Link function for proportional outcome

  1. #1
    trinker

    I have a data set where the outcome variable is percent passing (ELA and Math tests) for school districts. I will use a two-level multilevel model with various predictors/covariates at levels one and two.

    The outcome variable is percent passing. Obviously the outcome is bounded between 0 and 1, so it is not sensible to assume a normal distribution for it (even though the underlying scores are likely normally distributed): a Gaussian model could produce predictions > 1 or < 0. A logit (binomial family) might make sense, since this is what logistic regression uses for 0/1 outcomes, but it seems wrong because my outcome can take any value between 0 and 1.

    Poisson deals with count data. I don't have counts.

    So what link function is appropriate here and why?

    If more details are needed I can furnish them.
    "If you torture the data long enough it will eventually confess."
    -Ronald Harry Coase -

  2. #2
    Dason

    Logistic regression works for general binomial data (n > 1); you don't need to have just 0/1 outcomes. Do you have the value of n for each observation?
    I don't have emotions and sometimes that makes me very sad.

  3. #3
    trinker

    Dason, I didn't quite follow you. I assume you're saying that I can treat it the same as though the outcome were 1/0 and use the link function as binomial. This actually seems (seemed) sensible, but I had seen that the binomial link was for 1/0 outcomes only. But maybe that was my misinterpretation.

    I have observational data. The lowest level I have is district-level information on the percent of students who passed. I also have aggregated demographic characteristics for each district. I have percent passed, but I also have the n for the school districts, so pulling the actual number passed out is doable:

    Code: 
    n_passed = round(percent_passed * n)
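    A minimal Python sketch of that conversion, for anyone following along. The district names, proportions and totals are made up for illustration; only the rounding step comes from the line above.

```python
# Hypothetical district rows: (name, proportion passing, students tested).
districts = [("A", 0.45, 2000), ("B", 0.41, 3000)]

counts = {}
for name, percent_passed, n in districts:
    n_passed = round(percent_passed * n)      # nearest whole student
    assert 0 <= n_passed <= n                 # sanity check on the recovered count
    counts[name] = (n_passed, n - n_passed)   # (passed, failed)

print(counts)
```

    If the published percentages were themselves rounded, the recovered counts can be off by a few students. Also note that Python's round() sends exact .5 ties to the nearest even integer.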

  4. #4
    trinker

    @vict I'd be inclined to agree, except that assumption will give predicted values > 1 or < 0. This is not possible.

  5. #5
    spunky

    how come there is no love here for beta regression??
    for all your psychometric needs! https://psychometroscar.wordpress.com/about/

  7. #6
    Dason

    Quote Originally Posted by trinker View Post
    Dason, I didn't quite follow you. I assume you're saying that I can treat it the same as though the outcome were 1/0 and use the link function as binomial. This actually seems (seemed) sensible, but I had seen that the binomial link was for 1/0 outcomes only. But maybe that was my misinterpretation.
    Yeah that's your misinterpretation - this is fine for logistic regression. By the way it's a logit link (not a binomial link) with a binomial family. Basically you're saying that, conditional on your covariates, the response follows a binomial distribution. The logit link function is how you 'link' the covariates to the success probability - it's what models the form of the relationship between x and p.
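    To make the 'link' idea concrete, here is a small sketch in Python. The coefficients are hypothetical and chosen only for illustration. It shows that however extreme the linear predictor gets, the implied success probability stays strictly between 0 and 1.

```python
import math

def logit(p):
    """Link function: map a probability in (0, 1) onto the whole real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: map any real linear predictor back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, not estimated from any data.
beta0, beta1 = -0.5, 0.8

for x in (-10.0, 0.0, 10.0):
    eta = beta0 + beta1 * x   # linear predictor, unbounded
    p = inv_logit(eta)        # implied success probability
    assert 0.0 < p < 1.0
    print(x, eta, p)
```

    This is why the logit link avoids the predictions > 1 or < 0 that a straight-line model for the proportion can give.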

    I have observational data. The lowest level I have is district-level information on the percent of students who passed. I also have aggregated demographic characteristics for each district. I have percent passed, but I also have the n for the school districts, so pulling the actual number passed out is doable:

    Code: 
    n_passed = round(percent_passed * n)
    Yeah you can do logistic regression with that data.

  9. #7
    trinker

    @spunky I'll let the discussion go a bit before I decide, but this seems to be exactly what I'm after. I also have to do this in the HLM program, as my multilevel course requires that I use it. Do you know if this is available in HLM? I have never heard of it (which means next to nothing) so maybe it's not a commonly used link function yet?

  10. #8
    Dason

    Quote Originally Posted by spunky View Post
    how come there is no love here for beta regression??
    It is more difficult, and when you actually have the counts it makes more sense to do something like logistic regression. In my opinion there isn't much motivation for using beta regression in this type of case. Plus logistic regression is hard enough for non-math people to interpret and understand but it's a lot easier to understand than beta regression (binomial distribution is pretty simple compared to the beta distribution...)

  12. #9
    Dason

    Quote Originally Posted by trinker View Post
    I have never heard of it (which means next to nothing) so maybe it's not a commonly used link function yet?
    I think you have a misunderstanding when it comes to the link function. Beta regression uses the beta distribution as the response distribution (what we call the 'family' in a GLM); this doesn't by itself specify the link function. The link function is how you "link" the covariates to the mean of the response at those values of the covariates.
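    Written out side by side, the distinction looks like this. This is a sketch: the beta model is shown in a common mean/precision parameterization, and the precision parameter \phi and the single covariate x_i are assumptions for illustration, not something specified in this thread. Both models can use the same logit link; what differs is the family.

```latex
% Binomial family, logit link (logistic regression on counts Y_i out of n_i)
Y_i \sim \mathrm{Bin}(n_i, p_i), \qquad
\log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i

% Beta family, logit link (beta regression on proportions y_i in (0, 1))
y_i \sim \mathrm{Beta}\big(\mu_i \phi,\; (1-\mu_i)\phi\big), \qquad
\log\left(\frac{\mu_i}{1-\mu_i}\right) = \beta_0 + \beta_1 x_i
```

    In the first model the mean of Y_i / n_i is p_i and in the second the mean of y_i is \mu_i, so in both cases the link is applied to the mean of the response, not to the data.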

  14. #10
    trinker

    Quote Originally Posted by Dason
    Yeah that's your misinterpretation - this is fine for logistic regression. By the way it's a logit link (not a binomial link) with a binomial family. Basically you're saying that, conditional on your covariates, the response follows a binomial distribution. The logit link function is how you 'link' the covariates to the success probability - it's what models the form of the relationship between x and p.
    Thanks for the help on using the correct language. Great explanation.

    Can I use percent passed directly with a logit link and the binomial family, or are you saying I should use n_passed (n_passed = round(percent_passed * n))? Using n_passed makes less sense to me because I don't have actual data on individual students. I could make up IDs for them arbitrarily and then assign pass/fail based on n_passed, but I don't see what that buys me.

  15. #11
    spunky

    Quote Originally Posted by Dason View Post
    Plus logistic regression is hard enough for non-math people to interpret and understand but it's a lot easier to understand than beta regression (binomial distribution is pretty simple compared to the beta distribution...)
    this is *exactly* why beta regression needs to be used MORE often. it helps you leave people puzzled and unable to criticize your work. when faced with their own ignorance, they have little option but to think along the lines of "well, this seems complicated enough so it must be right".

    but you do have a point though. i assumed the emphasis was on the percentages and not on the counts themselves but if you have the counts then go for logistic regression.

  16. #12
    Dason

    You don't need data for individual students. Did I say something that implied that you did? You need the total count and the total number who passed (the outcome from the 'binomial' experiment) but you don't need the outcomes for each student individually.
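    One way to convince yourself of this, sketched in Python with made-up numbers (900 passes out of 2000 students): the log-likelihood computed from the two totals and the one computed from individual 0/1 records differ only by a constant that does not involve p, so they lead to the same estimates.

```python
import math

def loglik_from_counts(n_passed, n_total, p):
    """Binomial log-likelihood using only the two district-level totals."""
    return (math.log(math.comb(n_total, n_passed))
            + n_passed * math.log(p)
            + (n_total - n_passed) * math.log(1.0 - p))

def loglik_from_students(outcomes, p):
    """Bernoulli log-likelihood summed over individual 0/1 records."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

# Hypothetical district: 900 of 2000 students passed.
n_passed, n_total = 900, 2000
students = [1] * n_passed + [0] * (n_total - n_passed)  # invented individual records

# The binomial coefficient is the only difference, and it is free of p.
const = math.log(math.comb(n_total, n_passed))
diffs = [loglik_from_counts(n_passed, n_total, p) - loglik_from_students(students, p)
         for p in (0.2, 0.45, 0.7)]
print(const, diffs)
```

    So inventing student IDs and assigning pass/fail buys nothing: the two counts per district already carry everything the model uses.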

  17. #13
    trinker

    Quote Originally Posted by Dason
    I think you have a misunderstanding when it comes to the link function.
    Yes, this is true. I think it's clearer now. I was thinking the link actually transforms the 0/1 outcomes, but it doesn't; it works on the aggregated outcomes (the percent passed/failed). Is this correct?

  18. #14
    trinker

    Quote Originally Posted by Dason
    You don't need data for individual students. Did I say something that implied that you did? You need the total count and the total number who passed (the outcome from the 'binomial' experiment) but you don't need the outcomes for each student individually.
    No, but my thinking is: if I supply counts, how will it know what the counts mean? Say I tell it that 900 students in District A passed and 1230 in District B passed. How will it (the HLM program) know what those numbers mean without either individual data (passed or not passed) or a way to say 900 out of 2000 students?

    I mean, it's sensible that you can do this with equations and figure it out that way, but I have to give it a data file.

  19. #15
    Dason

    Quote Originally Posted by trinker View Post
    Yes, this is true. I think it's clearer now. I was thinking the link actually transforms the 0/1 outcomes, but it doesn't; it works on the aggregated outcomes (the percent passed/failed). Is this correct?
    No - it doesn't do anything to the data itself. It models the relationship between the covariates and the mean of the response. You don't transform the data, neither the outcome nor the predictors.

    For logistic regression you're assuming that
    Y_i \sim \mathrm{Bin}(n_i, p_i)

    which says that the response has a binomial distribution with parameters n_i (the number of observations/students observed for this response) and p_i (the success probability for each observation/student).

    That seems simple enough, but the logistic regression part adds the assumption that we can additionally model the p_i as a function of the covariates. This is what allows us to say things like "the success probability increases as the covariates increase". How we actually 'link' the p_i with the covariates depends on ... you guessed it - the link function. For logistic regression we assume

    \log\left(\frac{p_i}{1-p_i}\right) = \beta_0 + \beta_1 x_i

    So we are saying that if we apply the link function to p_i we get a linear function of the covariates. Notice we don't apply the link function to the covariates - we apply it to p_i.
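    For anyone who wants to see this model fitted from counts alone, here is a rough, single-level sketch in plain Python using Newton-Raphson. The data are invented, the covariate is hypothetical, and it ignores the multilevel structure entirely, so treat it as an illustration of the likelihood and not as a substitute for HLM or any real GLM routine.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_binomial_logit(x, n_passed, n_total, iterations=25):
    """Fit logit(p_i) = b0 + b1 * x_i to (passed, total) counts by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        u0 = u1 = i00 = i01 = i11 = 0.0
        for xi, yi, ni in zip(x, n_passed, n_total):
            p = inv_logit(b0 + b1 * xi)
            resid = yi - ni * p          # observed minus expected passes
            w = ni * p * (1.0 - p)       # binomial variance of the count
            u0 += resid                  # score for b0
            u1 += xi * resid             # score for b1
            i00 += w                     # information matrix entries
            i01 += w * xi
            i11 += w * xi * xi
        det = i00 * i11 - i01 * i01
        b0 += (i11 * u0 - i01 * u1) / det   # Newton step: inverse information times score
        b1 += (i00 * u1 - i01 * u0) / det
    return b0, b1

# Invented district-level data: one covariate, students tested, students passed.
x = [-1.0, 0.0, 1.0, 2.0]
n_total = [2000, 3000, 1500, 1000]
n_passed = [900, 1230, 600, 350]

b0, b1 = fit_binomial_logit(x, n_passed, n_total)
fitted = [inv_logit(b0 + b1 * xi) for xi in x]
print(b0, b1, fitted)
```

    Note that the only inputs are x_i, the number passed and the number tested per district; the link function is applied to p_i inside the fit and never to the data.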