So I was working on a Logistic Regression with binary DV, but now it turns out that the DV is actually probabilities such that they are between 0 and 1 (obviously).
Do I want to switch my focus onto Probit Regression now? I am finding some material but not a whole lot.
A push in the right direction would be greatly appreciated!
Last edited by Autobot; 08-16-2012 at 09:32 AM.
When you say that the response is probabilities do you mean that it's actually a proportion? Or are we talking about a continuous outcome that could take any value between 0 and 1?
I don't have emotions and sometimes that makes me very sad.
Autobot (08-16-2012)
You should get the same substantive results with logistic and probit regression.
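One way to see why the two usually agree is to compare the link functions directly. This is only a rough sketch: the rescaling constant 1.702 is a conventional approximation brought in for illustration, not something from this thread. It puts the logistic curve on roughly the same scale as the standard normal curve, so logit coefficients tend to be larger than probit ones by about that factor while the fitted probabilities stay close.

```python
import math

def logistic_cdf(z):
    """Inverse logit link: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Inverse probit link: the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Conventional rescaling constant (an assumption for this sketch).
SCALE = 1.702

# Compare the two curves on a grid of linear-predictor values from -6 to 6.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(logistic_cdf(SCALE * z) - normal_cdf(z)) for z in grid)
print(f"largest gap between the two curves: {max_gap:.4f}")
```

The gap is small everywhere on the grid, which is why switching between the two links rarely changes the substantive conclusions.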
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Autobot (08-16-2012)
You can model proportions using a GLM with a logit link. See, e.g., here. I can't think of any reason you couldn't also do it with a probit link (I just tried and it worked), but I generally prefer the logistic model because the coefficients are so much easier to interpret.
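For anyone who wants to see what that kind of fit is doing under the hood, here is a minimal sketch in plain Python, assuming a single predictor plus an intercept. The function name `fit_fractional_logit` and the data are made up for illustration; in practice you would use your statistics package's GLM routine. It solves the usual logit-link estimating equations by Newton-Raphson, and nothing in it requires the response to be exactly 0 or 1.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_fractional_logit(x, y, iterations=25):
    """Fit y ~ expit(a + b*x) by Newton-Raphson on the Bernoulli quasi-likelihood.

    y may hold any values in [0, 1], not only 0s and 1s.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # score (gradient) for intercept and slope
        h00 = h01 = h11 = 0.0   # information matrix entries
        for xi, yi in zip(x, y):
            mu = expit(a + b * xi)
            w = mu * (1.0 - mu)
            g0 += yi - mu
            g1 += (yi - mu) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: proportions generated from a known curve so the fit can be checked.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [expit(-2.0 + 0.8 * xi) for xi in x]
a_hat, b_hat = fit_fractional_logit(x, y)
print(a_hat, b_hat)
```

With real data the proportions would be noisy and you would also want standard errors, which this sketch does not compute.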
Autobot (08-16-2012)
My DV (dependent variable) is the probability that a failure was caused by a certain ingredient. The DV is masked just because of the way this place takes in data, so I have to use a probability instead of a pass/fail scenario. And the probabilities do not add up to 1. I do not have the data yet, but it looks like logistic might still be the way to go?
Last edited by Autobot; 08-16-2012 at 12:11 PM. Reason: Clear up confusion of abbreviations
Or maybe I should do a beta regression? And to handle 0's I could transform the DV (dependent variable) as such:
x' = (x(N-1) + s)/N
where N=sample size
s=arbitrary number in (0,1), I am choosing 0.5
http://psychology3.anu.edu.au/people...erkuilen06.pdf
page 61 on right top side.
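As a quick check of what that transformation, x' = (x(N-1) + s)/N, does to the endpoints, here is a small sketch. The function name `squeeze_unit_interval` and the sample size of 50 are made up for illustration.

```python
def squeeze_unit_interval(x, n, s=0.5):
    """Map x in [0, 1] to (x*(n-1) + s)/n.

    For n >= 1 and 0 < s < 1 the result lies strictly inside (0, 1),
    so exact 0s and 1s no longer sit on the boundary.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if n < 1 or not 0.0 < s < 1.0:
        raise ValueError("need n >= 1 and 0 < s < 1")
    return (x * (n - 1) + s) / n

n = 50  # hypothetical sample size
for value in (0.0, 0.25, 1.0):
    print(value, "->", squeeze_unit_interval(value, n))
```

With N = 50 and s = 0.5, an exact 0 moves to 0.01 and an exact 1 moves to 0.99, while values in the middle barely change.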
Last edited by Autobot; 08-16-2012 at 01:10 PM.
If your DV is expressed as a probability (from 0 to 1, but with essentially infinitely many values in between, so it's not a binary variable), why can't you simply use OLS, which is much easier to interpret and run diagnostics on?
"Very few theories have been abandoned because they were found to be invalid on the basis of empirical evidence...." Spanos, 1995
Logit is about 0/1 values. But it can also be about a proportion, for example 15 out of 20 (that is, 15/20). But then that is basically a sum of 0/1 variables.
If it is values that can take any value between 0 and 1, then a beta-distribution might be useful.
I must admit I don’t understand this so I might have misunderstood this post:
The Ecologist's posting instructions come to mind:
"So don't use instant-messaging [SMS] shortcuts. Spelling "you" as "u" makes you look like an semi-literate dud who just saved two entire keystrokes."
I don’t want to go that far, but if you want to be understood and answered, don’t use abbreviations.
Autobot (08-16-2012)
Greta was just quoting this thread on posting guidelines (sure, there are mistakes in it, but the point still stands). She hates abbreviations though. English isn't everybody's first language here (Greta included), so she was just trying to make you more conscious of that.
I don't have emotions and sometimes that makes me very sad.
Autobot (08-16-2012)
I'm sure Greta meant "semi-literate dude" as a term of endearment
The only reason it kind of bothered me is that as soon as somebody sidetracks the thread it becomes defunct, and now I will get no more useful posts. But as Dason points out, I will make sure to make everything crystal clear for those who might not understand English so well. I do appreciate everyone's help though. I am looking to use the beta distribution with the alteration to the 0's so I can keep my range as (0,1) instead of [0,1].
Well as long as the conversation is side tracked I'll help you with the latex tags you used above. You used latex rather than tex or MATH. I prefer math as tex would not have displayed your info correctly.
So...
[MATH]x' = \frac{x(N-1)+s}{N}[/MATH]
Gives you this...
Sorry to sidetrack further, but this may be helpful to you in posting here. I know it was for me.
OK let's get this thread back to its original intent everyone
"If you torture the data long enough it will eventually confess."
-Ronald Harry Coase -
Autobot (08-16-2012)
As Dason pointed out, I was literally quoting “The Ecologists” forum guidelines (not “posting instructions”, sorry for that). I am not a native English speaker and I guess that The Ecologist is not either. (And yes, I also noted that “an” but I didn’t want to change the quotation.) And besides, I said that I did not want to go that far. I interpret it as: don’t cut words down when it is not necessary.
But that is the formulation given in “How to post”. It has been there for years. Anybody who has read the guidelines and is good at English could have suggested a correction.
@Smoothjohn, I did not say "semi-literate". It was the forum guidelines.
No, I don’t hate abbreviations. I just feel sorry for those who post and are not understood. This is an international site. What is obvious for some in one country might not be understandable in other countries, for example for our friends in India and Nigeria. What is obvious for the psychometrician might be unknown for chemometricians.
We can talk about glm, hglm, gee, gllamm, gam, gmm, gamlss, glmm and the pros and cons of each of them. I am sure that Dason understands this and could give a lecture about each of them, but, and this is my point, would the hundreds of readers here understand all of that? And it only takes one abbreviation to lose the reader.
@Autobot. Now you can ask yourself which you prefer: to be given a suggestion in broken English [you seem to have accepted the idea of the beta distribution] and be told that I had not understood your abbreviation, or to be left without a suggestion?
This is a suggestion for the improvement of this community, for Autobot and other writers: If you want to be understood and answered, don’t use abbreviations!
Autobot (08-17-2012)
This is why I don't like this approach - you need to choose an arbitrary number to get your model to run. If you change the arbitrary number then you get slightly different results. Using a generalised linear model (GLM - see Greta I'm listening!) you do not need to do this kind of fudge.
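To put a rough number on that sensitivity, here is a small sketch. It applies the transformation x' = (x(N-1) + s)/N to an exact 0 and to 0.5 for a few choices of s, then looks at the result on the logit scale. The sample size of 50 and the particular values of s are made up for illustration.

```python
import math

def squeeze(x, n, s):
    """The transformation discussed above: x' = (x*(n-1) + s)/n."""
    return (x * (n - 1) + s) / n

def logit(p):
    return math.log(p / (1.0 - p))

n = 50  # hypothetical sample size
for s in (0.1, 0.5, 0.9):
    at_zero = logit(squeeze(0.0, n, s))
    at_half = logit(squeeze(0.5, n, s))
    print(f"s={s}: logit of transformed 0 = {at_zero:.3f}, of transformed 0.5 = {at_half:.3f}")
```

Observations in the middle of the range hardly move, but the exact zeros shift by more than two units on the logit scale between s = 0.1 and s = 0.9, so the more boundary values you have, the more the choice of s can matter.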
Only by convention. It's just another data transformation. It happens to be very useful for modelling binary proportions, which is why it's used for that; but as far as I know there is absolutely no statistical or mathematical reason why it shouldn't be used for arbitrary proportions between 0 and 1.
Of course you can always try a couple of different models and then choose the one that provides the best fit for your data.
Autobot (08-17-2012)