Thread: Want to run zero inflated regression... but my dependent variable is continuous

  1. #1
    zombie_kid

    So I have been investigating the best model to run for my data. I originally looked at zero inflated Poisson regression, but when I compared my variance and mean I realized the variance is far greater than the mean, signaling overdispersion. Therefore, I elected to run a zero inflated negative binomial model. However, the problem is that my dependent variable is continuous, not a count. What are my options?

    I am analyzing which environmental factors predict distance moved in an animal. I have predictors such as rainfall, temperature, cloud cover, etc. For most events the animal did not move, hence the zero inflated idea. But distance is not really a count. This may be a stupid question, but I researched the issue quite a bit before getting lost and deciding to post. Thanks for any input; it's greatly appreciated, as always.
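The variance-versus-mean check and the share of zeros mentioned above can be sketched in a few lines of Python (standard library only). The distances below are made-up values for illustration, not the poster's data.

```python
from statistics import mean, variance

# Hypothetical distances; most events are zero, as described in the post.
distances = [0, 0, 0, 0, 0, 0, 12.5, 0, 40.0, 0, 0, 85.3, 0, 0, 7.1, 0]

m = mean(distances)
v = variance(distances)  # sample variance (n - 1 denominator)
zero_share = distances.count(0) / len(distances)

print(f"mean = {m:.2f}, variance = {v:.2f}, variance/mean = {v / m:.2f}")
print(f"share of zeros = {zero_share:.2f}")
```

One caveat worth keeping in mind: for a continuous response the variance-to-mean ratio changes with the measurement units, so it is a rough descriptive check here rather than the count-data overdispersion diagnostic it is for a Poisson model.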

  2. #2
    GretaGarbo

    Maybe "zero inflated gamma" or "zero inflated lognormal" is worth searching for.

  3. The Following User Says Thank You to GretaGarbo For This Useful Post:

    zombie_kid (02-18-2014)

  4. #3
    ted00

    I agree

    I'll also throw this one out there because I saw it recently for the first time and it's really cool: if the dependent variable were bounded you might consider zero-one inflated beta, but I'm guessing distance traveled by your critters isn't bounded

    by the way, are you able to post a graph of the distribution of "distance"?

  5. The Following User Says Thank You to ted00 For This Useful Post:

    zombie_kid (02-18-2014)

  6. #4
    zombie_kid

    [Image: histogram of Distance]

    I spent some time researching zero inflated gamma and what others have been doing. The common suggestion seems to be to run a logistic model, with any movement coded 1 and no movement coded 0, to determine the probability of not moving, and then to run a gamma glm on the data with all the zero movements removed. For the first step, I am not sure how to calculate the probability of not moving. Would I run the logistic glm, take the intercept coefficient, and convert it to an odds ratio and then a probability, or would I do it with the best model (a series of covariates)? If I run the logistic glm and choose the best model with covariates, how do I get the probability of not moving, and how would I report that? "The probability of not moving with the best model (cloud cover and temperature) to predict movement is xx%, and when they do move the best model is precipitation and temperature"? Would it be something like that?

  7. #5
    ted00

    Quote Originally Posted by zombie_kid View Post
    [image snipped]
    this looks promising, I've seen the zero frequency much higher (making the ZI model more difficult) ... but I notice the bin width is ~20, so I'm not sure how many actual zeros there are. What's the proportion of Distance=0 points?

    Quote Originally Posted by zombie_kid View Post
    ...It seems the common suggestion is to run a logistic model with any movement valued 1 and no movement 0 to determine the probability of not moving and then running a gamma glm for the data with all the zero movements removed...
    That's precisely what a ZI model does: it's a mix between a logistic component (to get the probability of Distance=0) and the desired pdf (e.g. gamma). Indeed, your sources are correct that running a logistic model as you described is (probably, usually) a good way to get starting values for the logistic component of the ZI model.

    Quote Originally Posted by zombie_kid View Post
    ...For the first step, I am not sure how to calculate the probability of not moving. Would I run the logistic glm and then take the intercept coef and convert it to odds ratio and then probability % or would I do it with the best model (a series of covariates)? If I run the logistic glm and chose the best model with covariates how do I get the probability not moving and how would I report that? The probability of not moving with the best model (cloud cover and temperature) to predict movement is xx%, and when they do move the best model is precipitation and temperature? Would it be something like that?
    There will be one log-odds estimate for each value of the vector (x1, x2, ..., xk), where x1...xk are k predictor variables, and the log-odds is with respect to whatever the reference level is for all variables combined (e.g. if k=2, the variables are cloud cover and temperature, then the comparison is probably when both = 0, by default) ... is that what you're asking?

    in any event, there will be 2 linear components to your model:

    $\log\left(\frac{p}{1-p}\right) = X_1 B_1$ for the logistic part

    and

    $-\mu^{-1} = X_2 B_2$ for the gamma part

    where $p$, $\mu$, $X_1$, $B_1$, $X_2$, $B_2$ are, respectively, the probability that Distance>0, the gamma component's mean, the design matrix (independent variables) for the logistic component, the coefficients for the logistic component, the design matrix for the gamma component, and the coefficients for the gamma component

    I say all that to say this: when you run a logistic regression as you described, the result can inform you about which variables should be in X_1 and what the starting values of B_1 should be ... of course, given that your sample size is large compared to the number of model parameters, you could start with both design matrices containing all variables and build the model from there
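As a sketch of how the two linear components fit into a single model, one common way to write the density is shown below. It assumes Distance is exactly zero with probability 1 - p and otherwise follows a gamma density g with mean mu; the shape symbol nu and the observation index i are notation introduced here, not taken from the post.

```latex
f(y) =
\begin{cases}
1 - p, & y = 0,\\[2pt]
p \, g(y;\mu,\nu), & y > 0,
\end{cases}
\qquad
\ell = \sum_{i:\,y_i = 0} \log(1 - p_i)
     + \sum_{i:\,y_i > 0} \bigl[ \log p_i + \log g(y_i;\mu_i,\nu) \bigr],
\qquad
E[Y] = p\,\mu .
```

Under these assumptions the gamma density puts no probability mass at zero, so every zero is attributed to the logistic part, and the log-likelihood is a sum of a term involving only B_1 and a term involving only B_2. The last expression combines the two parts into an overall expected distance.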

  8. The Following User Says Thank You to ted00 For This Useful Post:

    zombie_kid (02-19-2014)

  9. #6
    zombie_kid

    The proportion of zeros is 889/1189. So I ran the logistic model and achieved the following results from the best model from an AIC stepwise selection:
    Coefficients:
                                 Estimate Std. Error z value Pr(>|z|)
    (Intercept)                 -1.156957   0.245063  -4.721 2.35e-06 ***
    cover.fpartly                0.443616   0.249647   1.777  0.07557 .
    cover.fsunny                -0.305031   0.162182  -1.881  0.06000 .
    patch.f2                     0.337086   0.191946   1.756  0.07906 .
    patch.f3                     0.765961   0.294270   2.603  0.00924 **
    patch.f4                    -0.286296   0.152999  -1.871  0.06131 .
    Precipitation               -0.006028   0.002309  -2.610  0.00905 **
    TMIN                         0.029289   0.013880   2.110  0.03485 *
    cover.fpartly:Precipitation -0.003900   0.009433  -0.413  0.67929
    cover.fsunny:Precipitation   0.012801   0.002769   4.623 3.79e-06 ***

    I converted these to odds ratios and then into percentage changes in the odds, as I would for reporting in my results section. They look like:
    56% increase in odds of movement if partly cloudy
    26% decrease in odds of movement if sunny
    40% increase in odds of movement if from patch 2
    115% increase in odds of movement if from patch 3
    25% decrease in odds of movement if from patch 4
    0.6% decrease in odds of movement for each unit increase in rainfall (10ths of mm?)
    3.0% increase in odds of movement for each degree increase in TMIN
    0.4% decrease in odds of movement if partly cloudy, per unit increase in precipitation
    1.3% increase in odds of movement if sunny, per unit increase in rain
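The conversion from logistic coefficients to percentage changes can be checked with a short Python sketch. The coefficients are copied from the output above; the helper name `pct_change_in_odds` is made up for this example. Note that exp(coef) is an odds ratio, so the percentages describe changes in the odds of movement relative to the reference level, not changes in the probability itself.

```python
import math

# Logistic-regression coefficients copied from the output above (log-odds scale).
coefs = {
    "cover.fpartly": 0.443616,
    "cover.fsunny": -0.305031,
    "patch.f2": 0.337086,
    "patch.f3": 0.765961,
    "patch.f4": -0.286296,
    "Precipitation": -0.006028,
    "TMIN": 0.029289,
    "cover.fpartly:Precipitation": -0.003900,
    "cover.fsunny:Precipitation": 0.012801,
}

def pct_change_in_odds(b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (math.exp(b) - 1.0) * 100.0

for name, b in coefs.items():
    print(f"{name:28s} OR = {math.exp(b):.3f}  "
          f"change in odds = {pct_change_in_odds(b):+.1f}%")
```

The interaction rows are a little different from the main effects: they modify the precipitation effect within a cover category, so they are probably best read together with the `Precipitation` row rather than on their own.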

    My next step was running the gamma glm with the zeros removed. The best model achieved based on stepwise AIC was the following:
    Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
    (Intercept)  0.0392863  0.0068262   5.755  1.9e-08 ***
    patch.f2    -0.0089734  0.0032709  -2.743  0.00640 **
    patch.f3    -0.0066998  0.0042488  -1.577  0.11574
    patch.f4    -0.0020613  0.0032828  -0.628  0.53047
    Temp        -0.0008266  0.0003139  -2.634  0.00883 **

    Now I am not sure how to interpret the coefficients from the gamma model. I have already done a conditional logistic model for habitat selection and a Cox proportional hazards model for survival, so I am familiar with interpreting odds ratios and hazard ratios. Do I calculate odds ratios the same way for the gamma model?

    As for reporting in my results write-up: would I state the above percentages for movement based on the best model and say something about each covariate like I did above, or do I need to somehow estimate the overall probability of movement given that best model? This is where I get confused. I envisioned it like this, for example: given the best model of Patch, Precipitation, Cloud Cover, and Minimum Temp, the probability of not moving is xx, but when animals do move, the best model of Patch and Average Daily Temp predicts how far the animal moves.
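A probability of moving (or not moving) is not a single number for the whole model: it depends on the covariate values plugged in. A minimal Python sketch, using the logistic coefficients posted above, is given below. It assumes the outcome was coded 1 = moved, that patch 1 and the unlisted cover category are the reference levels, and the covariate values in the example call are hypothetical.

```python
import math

def inv_logit(eta):
    """Convert a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Coefficients copied from the logistic output above.
INTERCEPT = -1.156957
COVER = {"reference": 0.0, "partly": 0.443616, "sunny": -0.305031}
PATCH = {1: 0.0, 2: 0.337086, 3: 0.765961, 4: -0.286296}
B_PRECIP = -0.006028
B_TMIN = 0.029289
COVER_BY_PRECIP = {"reference": 0.0, "partly": -0.003900, "sunny": 0.012801}

def prob_move(cover="reference", patch=1, precip=0.0, tmin=0.0):
    """Predicted probability of movement for one set of covariate values."""
    eta = (INTERCEPT + COVER[cover] + PATCH[patch]
           + (B_PRECIP + COVER_BY_PRECIP[cover]) * precip
           + B_TMIN * tmin)
    return inv_logit(eta)

# Hypothetical covariate values, chosen only for illustration.
p = prob_move(cover="sunny", patch=3, precip=0.0, tmin=10.0)
print(f"P(move) = {p:.3f}, P(no move) = {1 - p:.3f}")
```

For a write-up, one option along these lines would be to report predicted probabilities at a few representative covariate settings alongside the odds ratios, rather than a single overall percentage.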

  10. #7
    zombie_kid

    The last thing you said is interesting about letting the logistic guide the gamma. It seems kind of silly to get a set of predictors that predict movement and then different ones for distance. Are you saying I could use only the predictors from the best logistic model in the gamma model? My next step is repeating all of this with movement/distance compared to weather the day prior. The above test was for the weather the day of the movement.

  11. #8
    ted00

    zombie_kid, are you familiar with SAS, too? I have some SAS code handy that I can send you. I don't have my R version available at the moment; would SAS code be any use to you? It shows how to fit several ZI and "hurdle" models, both with PROC GENMOD and by directly using the log-likelihood (hurdle models are a similar idea to ZI models)

  12. #9
    zombie_kid

    Okay, I found a post explaining interpretation of gamma coefficients. I still take exp(coef), and for each one-unit increase the distance moved would increase by exp(coef). So for what I posted above, the distance moved would increase by .99 per unit change in Temp, or for being from patch 2, 3, or 4 (they all have roughly the same value of exp(coef)).
    Or
    is it like an odds ratio, where a value <1 means the response would decrease? When I plot Distance against the covariates in the best model there appears to be an increasing linear relationship, so I do not think this would be true.
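Whether exp(coef) is the right transformation depends on the link function. The sketch below assumes the gamma model was fit with the inverse link, 1/mu = X B, which is R's default for the Gamma family and is consistent with the small positive intercept in the output (the equation earlier in the thread uses the negative inverse, which flips the signs). Under that assumption the fitted mean is the reciprocal of the linear predictor, and exp(coef) would apply only with a log link. Treating patch 1 as the reference level and the Temp values used are also assumptions.

```python
# Gamma GLM coefficients copied from the output above.
INTERCEPT = 0.0392863
PATCH = {1: 0.0, 2: -0.0089734, 3: -0.0066998, 4: -0.0020613}
B_TEMP = -0.0008266

def mean_distance(temp, patch_id=1):
    """Fitted mean distance (given movement) under an assumed inverse link."""
    eta = INTERCEPT + PATCH[patch_id] + B_TEMP * temp
    if eta <= 0:
        raise ValueError("linear predictor must be positive for the inverse link")
    return 1.0 / eta

# Hypothetical temperatures, chosen only for illustration.
for temp in (0.0, 10.0, 20.0):
    print(f"Temp = {temp:4.1f}: fitted mean distance = {mean_distance(temp):.1f}")
```

With an inverse link, a negative coefficient makes the linear predictor smaller and therefore the fitted mean larger, which would be in line with the increasing relationship seen in the plots. The effect of a one-unit change is not constant, so it is usually easier to report fitted means at a few covariate values.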

  13. #10
    zombie_kid

    Unfortunately I am not familiar with SAS. Everyone drilled that R was the best into my head during grad school so I took a 2 week course in Statistics for Ecology in R.

  14. #11
    ted00

    that's okay, I can give the main gist

    basically I wanted to solidify the idea that the model is a mixture model: it's a mixture between a logistic component and a gamma component, rather than two separate models
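To make the mixture idea concrete, here is a minimal Python sketch of a log-likelihood that uses both components at once. It is not the SAS code mentioned earlier; the function names, the mean/shape parameterization of the gamma, and the toy numbers are all assumptions made for illustration. Here p is the probability that Distance > 0.

```python
import math

def gamma_logpdf(y, mu, shape):
    """Log density of a gamma distribution with mean mu and the given shape."""
    rate = shape / mu
    return (shape * math.log(rate) + (shape - 1.0) * math.log(y)
            - rate * y - math.lgamma(shape))

def two_part_loglik(ys, ps, mus, shape):
    """Log-likelihood of a logistic + gamma mixture; ps[i] is P(y_i > 0)."""
    total = 0.0
    for y, p, mu in zip(ys, ps, mus):
        if y == 0:
            total += math.log(1.0 - p)                          # zero: logistic part only
        else:
            total += math.log(p) + gamma_logpdf(y, mu, shape)   # positive: both parts
    return total

# Toy data: two zeros and two positive distances, with made-up fitted values.
ys = [0.0, 0.0, 12.0, 40.0]
ps = [0.2, 0.3, 0.4, 0.5]
mus = [30.0, 30.0, 25.0, 35.0]
print(f"log-likelihood = {two_part_loglik(ys, ps, mus, shape=2.0):.3f}")
```

In a full fit, ps and mus would come from the two linear predictors, and the coefficients and shape would be chosen to maximize this function.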

  15. #12
    zombie_kid

    Honestly, looking at the results I am getting with the gamma models I am wondering if it would just be best to test the hypothesis of what promotes movement rather than distance. I am thinking distance may not be the important thing at play here. The trick is convincing my committee.

  16. #13
    zombie_kid

    Right, I think that's what Bolker describes in his book. I seem to be having a hard time wrapping my brain around mixture models.

  17. #14
    ted00

    eek
    I just looked at the Wikipedia page for mixture models, it's pretty hairy

    I'm not familiar with your Bolker reference

    Since you seem to be at a university, look up this paper. It's not in your field, but it's a pretty straightforward explanation of ZI models and how they are a mixture between a logistic and a Poisson component ... the only difference is that yours is a mixture of logistic and gamma components

    C E Rose, S W Martin, K A Wannemuehler, B D Plikaytis. 2006. On the use of zero-inflated and hurdle models for modeling vaccine adverse event count data. Journal of Biopharmaceutical Statistics 16(4):463-81.

    The mathematical explanation of a statistical procedure is really just pseudo-code, which we can make operational by translating it into real computer code. --B. Klemens

  18. #15
    ted00

    Quote Originally Posted by zombie_kid View Post
    Honestly, looking at the results I am getting with the gamma models I am wondering if it would just be best to test the hypothesis of what promotes movement rather than distance. I am thinking distance may not be the important thing at play here. The trick is convincing my committee.
    yep, that's a subject-domain decision ... welcome to scientific research!
