Developing a statistical prediction rule for success

#1
Hi,
My boss wants me to analyze data that we have on past projects and develop a "formula" that will tell how likely a future project is to be a success. We have defined what a success is, so by looking at the past data I can find statistics on when projects are successful.
Now comes the hard part: developing the formula. I was reading some articles on how statistical prediction rules are very accurate, but I can't find anything on how to develop them. I'm not looking to do a super high level analysis; I just want to do some math to show the odds of a project being a success based on certain parameters. Can anyone point me in the right direction? Thanks.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
If you are looking for success y/n and probabilities and odds of this outcome, I would look at logistic regression. It is pretty straightforward and not too high level.
 
#3
I appreciate the input. I looked into logistic regression, and I think you are correct that it is probably the best way to do it. Developing the regressions for each parameter is clear to me, but it seems like there are many different ways to turn the regressions into a probability of the outcome. Do you have a suggestion about which method to use?
 

Dason

Ambassador to the humans
#4
Are you saying that you ran several separate regressions, each using a different predictor, to predict the same outcome? It probably makes more sense to just do one regression where you have all of the predictors in the model simultaneously.
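To make "all of the predictors in the model simultaneously" concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the two predictors (`team_size`, `has_prior_client`), the ten made-up projects, and the helper names `fit_logistic` and `predict_proba`. In practice you would let a statistics package do the fitting; the point is only that there is one intercept plus one coefficient per predictor, all estimated together.

```python
import math

def sigmoid(z):
    # Logistic function, written to avoid overflow for large |z|
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.05, n_iter=5000):
    """Fit one intercept plus one coefficient per predictor,
    all at once, by gradient ascent on the mean log-likelihood."""
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)              # w[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            err = yi - p             # observed outcome minus predicted probability
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w

def predict_proba(w, x):
    # Probability of success for one project with predictor values x
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical data: [team_size, has_prior_client (1 = yes, 0 = no)]
X = [[3, 1], [5, 1], [8, 0], [10, 0], [4, 1],
     [9, 0], [6, 1], [7, 0], [5, 0], [8, 1]]
y = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]   # 1 = success, 0 = failure

w = fit_logistic(X, y)
p_small = predict_proba(w, [4, 1])   # small team, prior client
p_large = predict_proba(w, [9, 0])   # large team, new client
print("coefficients (intercept first):", w)
print("P(success | small team, prior client):", p_small)
print("P(success | large team, new client):", p_large)
```

Each new project then gets a single probability from the one fitted model, rather than five separate answers from five separate plots.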
 
#5
I'm somewhat of a beginner with statistics, so I apologize if this is too basic, but I don't know what you mean by "one regression where you have all of the predictors in the model simultaneously." I have a bunch of different stats about each project and whether it was a success or not. I picked what I think are 5 valuable ones and plotted each stat against the outcome (1 for success, 0 for failure). If the stat was numerical I left it alone; if it was binary, I assigned a 1 or 0 to the choices. That gave me 5 separate scatter plots, each with a regression line relating one stat to success or failure. I found something online that explains how to calculate the probability: π = e^(a+bx) / (1 + e^(a+bx)). But that is not working. Perhaps your suggestion is better, but I don't think I quite understand it.
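For reference, here is a small sketch of that probability formula with the grouping written out, since a + bx has to sit together in the exponent and 1 + e^(a+bx) has to sit together in the denominator. The values of `a`, `b` and `x` are made up for illustration, and the function name `success_probability` is hypothetical. Note also that `a` and `b` are meant to come from a logistic regression fit; coefficients read off a straight regression line through 0/1 points are not on the same scale.

```python
import math

def success_probability(a, b, x):
    # pi = e^(a + b*x) / (1 + e^(a + b*x)), with explicit grouping
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical coefficients and predictor value, for illustration only
a, b, x = -2.0, 0.5, 6.0

p = success_probability(a, b, x)     # about 0.73
odds = p / (1.0 - p)                 # odds of success, equal to e^(a + b*x)

# What the formula evaluates to if typed without parentheses:
# operator precedence turns it into e^a + b*x/1 + e^a + b*x
ungrouped = math.exp(a) + b * x / 1 + math.exp(a) + b * x

print("probability:", p)
print("odds:", odds)
print("ungrouped expression:", ungrouped)
```

With these made-up numbers the ungrouped expression comes out well above 1, which cannot be a probability, so missing parentheses are one possible reason the formula appears not to work.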