I have a data set where the outcome variable is percent passing (ELA and Math tests) for school districts. I will use a 2 level multilevel model with various predictors/covariates at level one and two.
The outcome variable is percent passing. Obviously the outcome is limited to between 0 and 1, so it is not sensible to assume a normal distribution: the scores themselves are likely normally distributed, but using a Gaussian link could result in predictions > 1 and < 0. A logit might make sense (binomial family) as this is used in logistic regression (0/1), but it seems wrong because the outcome can take any value between 0 and 1.
Poisson deals with count data. I don't have count.
So what link function is appropriate here and why?
If more details are needed I can furnish them.
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
logistic regression works for general binomial data (n > 1) - you don't need to just have 0/1. Do you have the value for n for each observation?
I don't have emotions and sometimes that makes me very sad.
Dason, I didn't quite follow you. I assume you're saying that I can treat it the same as if the outcome were 1/0 and use the link function as binomial. This actually seems(ed) sensible but I had seen that using the binomial link was for 1/0 outcomes only. But maybe that was my misinterpretation.
I have observational data. The lowest level I have is district level information on percent of students who passed. I also have aggregated demographic characteristics for each district. I have percent passed but I also have the n for the school districts, so pulling the actual number passed out is doable:
Code: n_passed = round(percent_passed * n)
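A minimal sketch of that conversion in Python, assuming the proportions are stored on a 0 to 1 scale. The district names and values are made up for illustration; the helper name `passed_count` is hypothetical.

```python
# Hypothetical district records: (name, proportion passed, students tested).
districts = [
    ("A", 0.45, 2000),
    ("B", 0.82, 1500),
    ("C", 0.633, 300),
]

def passed_count(percent_passed, n):
    """Recover the integer number of students who passed."""
    return round(percent_passed * n)

# Each district becomes a (name, n_passed, n) triple: passes out of students tested.
counts = [(name, passed_count(p, n), n) for name, p, n in districts]
for name, n_passed, n in counts:
    print(f"District {name}: {n_passed} of {n} passed")
```

Rounding is needed because a reported percentage is usually itself rounded, so `percent_passed * n` rarely lands exactly on an integer.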
@vict I'd be inclined to agree except that assumption will give predicted values > 1 and < 0. This is not possible.
how come there is no love here for beta regression??
for all your psychometric needs! https://psychometroscar.wordpress.com/about/
trinker (04-18-2014)
Yeah that's your misinterpretation - this is fine for logistic regression. By the way it's a logit link (not a binomial link) with a binomial family. Basically you're saying that, conditioned on your covariates, the response follows a binomial distribution. The logit link function is how you 'link' the covariates to the success probability - it's what models the form of the relationship between x and p.
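To make the link idea concrete, here is a small Python sketch of the logit and its inverse. The coefficients `b0` and `b1` are hypothetical values picked only for illustration, not estimates from any data. It shows that however extreme the linear predictor gets, the implied pass probability stays strictly between 0 and 1, which is the property a Gaussian model lacks.

```python
import math

def logit(p):
    """Link function: map a probability in (0, 1) to the real line."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse link: map any real linear predictor back into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -0.5, 0.8

for x in (-10.0, 0.0, 10.0):
    eta = b0 + b1 * x   # linear predictor, unbounded
    p = inv_logit(eta)  # implied pass probability, always inside (0, 1)
    print(f"x={x:6.1f}  eta={eta:7.2f}  p={p:.4f}")
```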
Yeah you can do logistic regression with that data.
@spunky I'll let the discussion go a bit before I decide but this seems to be exactly what I'm after. I also have to do this in HLM program as the requirement of my multilevel course is that I use this program. Do you know if this is available in HLM? I have never heard of it (which means next to nothing) so maybe it's not a commonly used link function yet?
It is more difficult and in the case where you actually have the counts it makes more sense to do something like logistic regression. There isn't really much motivation behind using beta regression in this type of case in my opinion. Plus logistic regression is hard enough for non-math people to interpret and understand but it's a lot easier to understand than beta regression (binomial distribution is pretty simple compared to the beta distribution...)
I think you have a misunderstanding when it comes to the link function. Beta regression is using the beta distribution as the response distribution (what we call the 'family' in glm) - this doesn't directly specify the link function. The link function is how you "link" the covariates to the mean of the response at those values of the covariates.
Thanks for the help on using the correct language. Great explanation.
Can I use the percent passed with a logit link and the binomial family, or are you saying use n_passed (n_passed = round(percent_passed * n))? The n_passed makes less sense because I don't have actual data on individual students, though I can make up ids for them arbitrarily and then assign pass/fail based on n_passed, but I don't see what that buys me.
this is *exactly* why beta regression needs to be used MORE often. it helps you leave people puzzled and unable to criticize your work. when faced with their own ignorance, they have little option but to think along the lines of "well, this seems complicated enough so it must be right".
but you do have a point though. i assumed the emphasis was on the percentages and not on the counts themselves but if you have the counts then go for logistic regression.
You don't need data for individual students. Did I say something that implied that you did? You need the total count and the total number of passed (the outcome from the 'binomial' experiment) but you don't need the outcomes for each student individually.
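One way to see why the individual outcomes are not needed, sketched in Python with made-up numbers (a hypothetical district with 900 passes out of 2000): the log-likelihood of the aggregated count and the log-likelihood of invented per-student 0/1 records differ only by a constant that does not involve p, so both point to the same estimate.

```python
import math

def bernoulli_loglik(outcomes, p):
    """Log-likelihood of individual 0/1 outcomes with common pass probability p."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

def binomial_loglik(n_passed, n, p):
    """Log-likelihood of the aggregated count n_passed out of n."""
    log_choose = math.log(math.comb(n, n_passed))  # constant with respect to p
    return log_choose + n_passed * math.log(p) + (n - n_passed) * math.log(1.0 - p)

n, n_passed = 2000, 900                           # hypothetical district
students = [1] * n_passed + [0] * (n - n_passed)  # made-up individual records

# The gap between the two log-likelihoods is the same for every p,
# so both are maximized at the same value of p.
diffs = [binomial_loglik(n_passed, n, p) - bernoulli_loglik(students, p)
         for p in (0.2, 0.45, 0.7)]
print(diffs)
```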
Yes, this is true. I think it's clearer now. I was thinking the link actually transforms the 0/1, but it doesn't; it works on the aggregated outcomes (which is percent passed/failed). Is this correct?
No, but my thinking is: if I supply counts, how will it know what the counts mean? Say I tell it 900 students in District A passed and 1230 in District B passed. How will it (the HLM program) know what those numbers mean without either individual data (passed or not passed) or a way to say 900 out of 2000 students?
I mean, it's sensible that you can do this with equations and figure it out that way, but I have to give it a data file.
No - it doesn't do anything to the data itself. It models the relationship between the data and the mean. You don't transform the predictors.
For logistic regression you're assuming that
$$y_i \sim \text{Binomial}(n_i, p_i)$$
which says that the response $y_i$ has a binomial distribution with parameters $n_i$ (the number of observations/students observed for this response) and $p_i$ (the success probability for each observation/student).
That seems simple enough but the logistic regression part adds the assumption that we can additionally model the $p_i$ as a function of the covariates. This is what allows us to think things like "the success probability increases as the covariates increase". How we actually 'link' the $p_i$ with the covariates depends on ... you guessed it - the link function. For logistic regression we assume
$$\log\left(\frac{p_i}{1 - p_i}\right) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
So we are saying that if we apply the link function to $p_i$ we get a linear function with respect to the covariates. Notice we don't apply the link function to the covariates - we apply it to $p_i$.
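As a rough illustration of fitting such a model straight from aggregated data, here is a pure-Python Newton-Raphson sketch for a single covariate, i.e. logit(p_i) = b0 + b1 * x_i with y_i passes out of n_i students. This is a single-level sketch with no random effects, so it is not what the HLM program does for a 2 level model. The function name `fit_binomial_logit`, the covariate, and all the numbers are hypothetical.

```python
import math

def inv_logit(eta):
    """Inverse logit: map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def fit_binomial_logit(x, n_passed, n, iterations=25):
    """Newton-Raphson fit of logit(p_i) = b0 + b1 * x_i from aggregated counts."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, ni in zip(x, n_passed, n):
            p = inv_logit(b0 + b1 * xi)
            resid = yi - ni * p     # observed minus expected passes
            w = ni * p * (1.0 - p)  # binomial variance of the count
            g0 += resid
            g1 += resid * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = g by Cramer's rule.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data: one standardized district covariate, passes, and students tested.
x = [-1.0, -0.5, 0.0, 0.5, 1.0]
n_passed = [1400, 900, 150, 330, 380]
n = [2000, 1500, 300, 800, 1200]

b0_hat, b1_hat = fit_binomial_logit(x, n_passed, n)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}")
```

The data file only needs two columns per district for the response: the number passed and the number tested. Larger districts automatically carry more weight through `n`.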