# Beta Regression Resources?

#### hlsmith

##### Omega Contributor
Does anyone have some good resources for Beta Regression related to all of its parts:

-assumptions

-diagnostic (residuals)

-interpretation of independent variables (continuous and categorical)

-etc.
_____________________________________________________

I am going to get a dataset soon with a proportion (fraction) dependent variable, which will be either percent correct or post-intervention percent correct minus pre-intervention percent correct.

The key independent variable will be a three-group categorical variable.

I will likely use R or SAS for the analyses, but I am up for reading anything.
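One practical point about an outcome like this: the beta distribution is defined on the open interval (0, 1), so a post-minus-pre difference (which can range from -1 to 1) and any exact 0 or 1 scores would need rescaling before a beta regression can be fit. A minimal sketch in Python, assuming the compression proposed by Smithson and Verkuilen (2006), y' = (y(n - 1) + 0.5)/n, where n is the sample size. The function names and the sample values are hypothetical:

```python
def rescale_to_unit(y, lower, upper):
    """Map a value from the closed interval [lower, upper] onto [0, 1]."""
    return (y - lower) / (upper - lower)


def squeeze(y01, n):
    """Compress [0, 1] into the open interval (0, 1); n is the sample size."""
    return (y01 * (n - 1) + 0.5) / n


# Hypothetical post-minus-pre differences in proportion correct (range -1 to 1)
diffs = [-0.20, 0.00, 0.35, 1.00]
n = len(diffs)

scaled = [squeeze(rescale_to_unit(d, -1.0, 1.0), n) for d in diffs]
print(scaled)  # every value now lies strictly between 0 and 1
```

Whether modelling the rescaled difference is preferable to modelling the post score with the pre score as a covariate is a separate design question that this sketch does not settle.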

#### spunky

##### Super Moderator
I know this may sound like I'm not even trying but the only time I had to work with beta regression I found this incredibly useful for someone (like me) who had never ever used it:

https://cran.r-project.org/web/packages/betareg/vignettes/betareg.pdf

(And yeah, I know it's basically the first thing that pops up on Google when you search for "beta regression in R" or something like that.)

#### kiton

##### New Member
hlsmith, I have emailed you two PDFs -- one on fractional response modeling by Maarten Buis, and another on beta regression by Raydonal Martínez. Note their references as well.

#### GretaGarbo

##### Human
Yes, and here are some references from Maarten Buis.

Ferrari, S.L.P. and Cribari-Neto, F. (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31(7): 799-815.

Paolino, P. (2001). Maximum likelihood estimation of models with beta-distributed dependent variables. Political Analysis 9(4): 325-346.

Smithson, M. and Verkuilen, J. (2006). A better lemon squeezer? Maximum likelihood regression with beta-distributed dependent variables. Psychological Methods 11(1): 54-71.

The expression "lemon squeezer" comes from: "Uncorrectable skew and heteroscedasticity are among the “lemons” of psychological data".

I think that data on proportions that are not based on the binomial distribution (e.g. share of income spent on food) are a very important and common problem. So I thought that beta regression would be a very good "lemon squeezer".

However, the Talkstats user 'maartenbuis' has commented on that here on Talkstats (maybe a few years ago). I don't remember very well, but maybe he said that it would be better to use the inverse of the logit link and estimate with least squares. (So, to my surprise, beta regression would not be the best lemon squeezer.) I hope someone can search and find maartenbuis's comments. (And I hope kiton can share the refs with the rest of us. That is kind of the idea with Talkstats.)

#### hlsmith

##### Omega Contributor
Betareg seems pretty straightforward, but I have some questions:

The scale variable, aka precision, phi: it seems to be like the dispersion parameter in Poisson regression. What does it specifically represent? Just the dispersion around the mean? What do I need to consider with regard to it?
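On what phi represents: in the mean-precision parameterization of Ferrari and Cribari-Neto (2004), which betareg uses, the variance of the response is mu(1 - mu)/(1 + phi). For a fixed mean, a larger phi therefore means less spread. A small Python sketch of that relationship, with an illustrative function name and example values:

```python
def beta_variance(mu, phi):
    """Variance of a beta-distributed response with mean mu and precision phi."""
    return mu * (1.0 - mu) / (1.0 + phi)


# Same mean, increasing precision: the variance shrinks
for phi in (2.0, 10.0, 50.0):
    print(phi, beta_variance(0.5, phi))
```

So phi runs in the opposite direction to a dispersion parameter: higher values mean the data sit more tightly around the mean. The variance also depends on mu itself, which is how the model accommodates heteroscedasticity.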

I see the scale can be constant or variable if I associate model variables with it. If I do the latter, do I keep them in the model if the phi value goes up, or based on AIC, or based on an LR test, or if the coefficients are significant in that part of the model?
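Two of the options mentioned here, AIC and a likelihood-ratio test, only need the maximized log-likelihoods of the constant-phi and variable-phi fits. A hedged Python sketch of the arithmetic: the log-likelihood values are placeholders and not results from any real fit, the function names are hypothetical, and the chi-square tail uses closed forms that only cover 1 or 2 extra parameters (2 would match a three-group factor in the precision part):

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion; smaller is better."""
    return 2.0 * n_params - 2.0 * loglik


def chi2_sf(stat, df):
    """Upper-tail chi-square probability (closed forms for df = 1 or 2 only)."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def lr_test(loglik_constant, loglik_variable, df):
    """Likelihood-ratio statistic and p-value for nested models."""
    stat = 2.0 * (loglik_variable - loglik_constant)
    return stat, chi2_sf(max(stat, 0.0), df)


# Placeholder log-likelihoods (NOT real results)
ll_constant, ll_variable = 84.8, 86.9

# Constant phi: 3 mean coefficients + 1 precision parameter = 4
# Variable phi: 3 mean coefficients + 3 precision coefficients = 6
print(aic(ll_constant, 4), aic(ll_variable, 6))
print(lr_test(ll_constant, ll_variable, df=2))
```

Which criterion to prefer is a judgment call; the sketch only shows how the two quantities are computed from fitted models.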

Also, when presenting results, do I need to do anything to account for the scale parameter in the estimates, or talk about its impact on them, beyond just describing the final model that I used?

Lastly, I have not seen a good description of how to present results. Are the R^2 values even important? How do you describe the model coefficients for continuous and categorical variables, since the model uses a beta distribution and, let's say, a logit link? Is it just something like:

-for every one-unit increase in the continuous variable, the mean dependent proportion increases by blank; and

-the mean dependent proportion is blank higher for the categorical variable group in comparison to the reference categorical group.

Does the partial R^2 come into play at all for effect estimates?

Also, the demo example in betareg has the following, and I don't know exactly what they represent. Are they just parts of the model equation?

omega <- mu * phi

tau <- phi - mu * phi
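Those two lines look like the conversion from the mean-precision parameterization (mu, phi) to the two standard shape parameters of the beta distribution: omega = mu * phi and tau = (1 - mu) * phi, which is the same as phi - mu * phi. A short Python sketch of the round trip, with function names made up for illustration:

```python
def to_shapes(mu, phi):
    """Convert mean and precision to the two beta shape parameters."""
    omega = mu * phi
    tau = phi - mu * phi  # equivalently (1 - mu) * phi
    return omega, tau


def to_mean_precision(omega, tau):
    """Convert the two beta shape parameters back to mean and precision."""
    phi = omega + tau
    mu = omega / phi
    return mu, phi


omega, tau = to_shapes(0.25, 8.0)
print(omega, tau)                     # 2.0 6.0
print(to_mean_precision(omega, tau))  # (0.25, 8.0)
```

In other words, they are not extra model terms, just the same distribution written in the shape-parameter form that most density functions for the beta distribution expect.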

Thanks!!

#### hlsmith

##### Omega Contributor
How do you describe the model coefficients for continuous and categorical variables, since the model uses a beta distribution and, let's say, a logit link? Is it just something like:

-for every one-unit increase in the continuous variable, the mean dependent proportion increases by blank; and

-the mean dependent proportion is blank higher for the categorical variable group in comparison to the reference categorical group.

The link below and the Lemon paper say coefficients are on the log-odds scale and that you can interpret them similarly to odds ratios or convert them to predicted probabilities. This seems very weird if you had a dependent variable that was, say, a percentage.

So for a fictitious categorical IV example: group 1 has 2 times greater odds of a higher outcome percentage than group 2.

For a continuous IV: for every 1-unit increase in X, the predicted probability of a higher percentage-formatted outcome is 0.15 higher?

I found this description in a SAS document:

For example, the odds of having a higher Barthel Index score in the rt-PA group are 1.3 times those in the placebo group given this model (exp(b1)=1.326).

http://stats.stackexchange.com/ques...rpret-the-coefficients-from-a-beta-regression

http://support.sas.com/resources/papers/proceedings11/335-2011.pdf
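To make the two readings concrete: with a logit link, exp(b) is the ratio of mu/(1 - mu) between groups, where mu is the mean proportion. The same odds ratio therefore implies a different change in the mean proportion depending on the baseline. A hedged Python sketch using the exp(b1) = 1.326 figure from the SAS quote above; the baseline means and function names are hypothetical:

```python
import math


def logit(p):
    """Log-odds of a proportion p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of the logit: maps the linear predictor back to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def shifted_mean(baseline_mean, b):
    """Mean proportion after adding coefficient b on the log-odds scale."""
    return inv_logit(logit(baseline_mean) + b)


b1 = math.log(1.326)  # group coefficient on the log-odds scale

# Hypothetical reference-group means
for baseline in (0.20, 0.50, 0.80):
    treated = shifted_mean(baseline, b1)
    print(baseline, round(treated, 3), round(treated - baseline, 3))
```

Reporting predicted mean proportions for each group, alongside or instead of exp(b), may be easier for readers to follow when the outcome is itself a percentage.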

#### GretaGarbo

##### Human
I was searching for maartenbuis's comments. I found one here (post 7) and here (post 3).

In the last post he linked to this paper (which looks interesting):

Meaney, C. and Moineddin, R. (2014). A Monte Carlo simulation study comparing linear regression, beta regression, variable-dispersion beta regression and fractional logit regression at recovering average difference measures in a two sample design. BMC Medical Research Methodology 14: 14. DOI: 10.1186/1471-2288-14-14