When you say that the response consists of probabilities, do you mean that it's actually a proportion? Or are we talking about a continuous outcome that could take any value between 0 and 1?
So I was working on a logistic regression with a binary DV, but it now turns out that the DV is actually a probability, so its values lie between 0 and 1 (obviously).
Should I switch my focus to probit regression now? I am finding some material, but not a whole lot.
A push in the right direction would be greatly appreciated!
You should get the same substantive results with logistic and probit regression.
You can model proportions using a GLM with a logit link. See e.g. here. I can't think of any reason you couldn't also do it with a probit link (I just tried and it worked), but I generally prefer the logistic model because its coefficients are so much easier to interpret.
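A minimal sketch of what "a GLM with a logit (or probit) link for proportions" means, assuming a response stored as numbers in [0, 1]: keep the binomial variance function mu(1 - mu), pick a link, and fit by iteratively reweighted least squares (IRLS). The function name `fit_fractional_glm` and the data are hypothetical and for illustration only; in practice you would more likely use a library routine, such as statsmodels' `GLM` with a `Binomial` family or R's `glm` with `family = quasibinomial`.

```python
import math
import numpy as np

def _norm_cdf(z):
    # Standard normal CDF via the error function (avoids a SciPy dependency).
    return 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))

def _norm_pdf(z):
    # Standard normal density.
    return np.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def fit_fractional_glm(X, y, link="logit", max_iter=100, tol=1e-10):
    """Hypothetical helper: quasi-binomial GLM for y in [0, 1], fitted by IRLS.

    The variance function is mu * (1 - mu) for both links; only the link differs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        eta = X @ beta
        if link == "logit":
            mu = 1.0 / (1.0 + np.exp(-eta))
            dmu = mu * (1.0 - mu)        # d mu / d eta for the logit link
        elif link == "probit":
            mu = _norm_cdf(eta)
            dmu = _norm_pdf(eta)         # d mu / d eta for the probit link
        else:
            raise ValueError("link must be 'logit' or 'probit'")
        mu = np.clip(mu, 1e-10, 1 - 1e-10)
        dmu = np.maximum(dmu, 1e-10)
        var = mu * (1.0 - mu)            # binomial variance function
        w = dmu ** 2 / var               # IRLS weights
        z = eta + (y - mu) / dmu         # working response
        XtW = X.T * w                    # scales observation j by w[j]
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Made-up proportions; exact 0 and 1 values need no adjustment in a GLM.
x = np.linspace(-2.0, 2.0, 41)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
y[0], y[-1] = 0.0, 1.0

b_logit = fit_fractional_glm(X, y, "logit")
b_probit = fit_fractional_glm(X, y, "probit")
print(b_logit, b_probit, b_logit[1] / b_probit[1])
```

The two fits agree in substance; the logit slope is larger by roughly a factor of 1.6 to 1.8, which is the usual scale difference between the two links. The logit coefficients have the easier reading: each is a change in log-odds per unit of the predictor.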
My DV is the probability that a failure was caused by a certain ingredient. The DV (dependent variable) is masked just because of the way this place takes in data, so I have to use a probability instead of a pass/fail outcome. Also, the probabilities do not sum to 1. I do not have the data yet, but it looks like logistic regression might still be the way to go?
Or maybe I should do a beta regression? To handle 0s, I could transform the DV (dependent variable) as follows:

$$y' = \frac{y(N - 1) + s}{N}$$

where N = sample size and s = an arbitrary number in (0, 1); I am choosing 0.5.
http://psychology3.anu.edu.au/people...erkuilen06.pdf
See page 61, top of the right-hand column.
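A minimal sketch of the squeeze transformation from the cited paper, y' = (y(N - 1) + s)/N, where N is the sample size and s is a number strictly between 0 and 1. It assumes the DV is a plain Python list of values in [0, 1]; the function name `squeeze` is hypothetical.

```python
def squeeze(values, s=0.5):
    """Hypothetical helper: map values in [0, 1] into the open interval (0, 1).

    Applies y' = (y * (N - 1) + s) / N with N = len(values) and 0 < s < 1.
    """
    if not 0.0 < s < 1.0:
        raise ValueError("s must lie strictly between 0 and 1")
    n = len(values)
    return [(y * (n - 1) + s) / n for y in values]

# Made-up DV values, including an exact 0 and an exact 1 (N = 4).
y = [0.0, 0.2, 0.5, 1.0]
y_squeezed = squeeze(y)
print(y_squeezed)
```

The endpoints 0 and 1 move inward to s/N and (N - 1 + s)/N, so a beta model (which needs 0 < y < 1) can be fitted; values near the middle barely move.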
If your DV is expressed as a probability (anywhere from 0 to 1, with essentially infinitely many values in between, so it's not a binary variable), why can't you simply use OLS, which is much easier to interpret and run diagnostics on?
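A minimal sketch of the OLS route for a single predictor, using made-up probabilities. It fits the closed-form simple-regression line; the one thing worth checking with this approach is that nothing in the model keeps fitted values inside [0, 1], which matters most when predicting outside the observed range of x. The helper name `ols_fit` is hypothetical.

```python
def ols_fit(x, y):
    """Hypothetical helper: simple-regression OLS, returns (intercept, slope)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

x = [0, 1, 2, 3, 4]
y = [0.05, 0.20, 0.50, 0.80, 0.95]   # made-up probabilities
a, b = ols_fit(x, y)
print(a, b)
print(a + b * 6, a + b * (-1))      # predictions outside the observed x range
```

Here the fitted line predicts a value above 1 at x = 6 and below 0 at x = -1, which a logit-link model cannot do.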
Logit is about 0/1 values. But it can also handle a proportion, for example 15 out of 20 (i.e. 15/20). In that case, though, the count of 15 is basically a sum of 0/1 variables.
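A small check of why a proportion like 15/20 fits the 0/1 framework: the binomial log-likelihood of 15 successes in 20 trials equals the sum of the 20 individual Bernoulli log-likelihoods plus a constant, log C(20, 15), that does not depend on p. So both views lead to the same estimate of p. The helper names are hypothetical.

```python
import math

def bernoulli_loglik(p, outcomes):
    # Sum of log-likelihoods of individual 0/1 outcomes.
    return sum(math.log(p) if o == 1 else math.log(1 - p) for o in outcomes)

def binomial_loglik(p, k, n):
    # Log-likelihood of k successes in n trials, including the log C(n, k) term.
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log(1 - p)

k, n = 15, 20
outcomes = [1] * k + [0] * (n - k)
for p in (0.5, 0.75, 0.9):
    # The difference is the same constant for every p.
    diff = binomial_loglik(p, k, n) - bernoulli_loglik(p, outcomes)
    print(p, diff, math.log(math.comb(n, k)))
```

Both log-likelihoods peak at p = 15/20 = 0.75.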
If the values can be anything between 0 and 1, then a beta distribution might be useful.
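As a first look at whether a beta distribution is plausible, here is a minimal method-of-moments sketch, assuming the values lie strictly inside (0, 1). With mean m and variance v, the beta variance is m(1 - m)/(1 + phi), so the precision is phi = m(1 - m)/v - 1, and then alpha = m * phi and beta = (1 - m) * phi. The data and the helper name are made up; a full beta regression (for example via the R package betareg) would additionally link the mean to predictors.

```python
def beta_method_of_moments(values):
    """Hypothetical helper: method-of-moments (alpha, beta) for data in (0, 1)."""
    n = len(values)
    m = sum(values) / n
    v = sum((y - m) ** 2 for y in values) / (n - 1)   # sample variance
    if not 0 < m < 1 or v <= 0 or v >= m * (1 - m):
        raise ValueError("moments are not compatible with a beta distribution")
    phi = m * (1 - m) / v - 1    # precision parameter, alpha + beta
    return m * phi, (1 - m) * phi

alpha, beta = beta_method_of_moments([0.1, 0.3, 0.4, 0.6, 0.6])   # made-up data
print(alpha, beta)
```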
The only reason it bothered me a bit is that as soon as somebody sidetracks a thread it becomes defunct, and then I get no more useful posts. But as Dason points out, I will make sure to make everything crystal clear for those who might not understand English so well. I do appreciate everyone's help, though. I am planning to use the beta distribution, with the alteration to the 0s, so I can keep my range as (0,1) instead of [0,1].
This is why I don't like this approach: you need to choose an arbitrary number to get your model to run, and if you change the arbitrary number you get slightly different results. Using a generalised linear model (GLM - see Greta I'm listening!) you do not need this kind of fudge.
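A minimal sketch of that sensitivity, using made-up data with exact zeros: apply the squeeze y' = (y(N - 1) + s)/N for a few arbitrary choices of s, then take the mean of the logit-transformed values. The helper names are hypothetical. A GLM avoids the issue because its link applies to the mean of y, not to each observation, so an observed 0 never has to be transformed.

```python
import math

def squeeze(values, s):
    # Hypothetical helper: y' = (y * (N - 1) + s) / N with N = len(values).
    n = len(values)
    return [(y * (n - 1) + s) / n for y in values]

def logit(p):
    return math.log(p / (1 - p))

y = [0.0, 0.0, 0.1, 0.3, 0.6, 0.9]   # made-up data with exact zeros
s_values = (0.1, 0.5, 0.9)
means = [sum(logit(v) for v in squeeze(y, s)) / len(y) for s in s_values]
for s, m in zip(s_values, means):
    print(s, round(m, 3))
```

Every squeezed value rises with s, so the summary moves with an arbitrary tuning choice; the shift is largest for the observations that were exactly 0.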
Only by convention. It's just another data transformation. It happens to be very useful for modelling binary proportions, which is why it's used for that; but as far as I know there is absolutely no statistical or mathematical reason why it shouldn't be used for arbitrary proportions between 0 and 1.
Of course, you can always try a couple of different models and then choose the one that provides the best fit for your data.
