@~9min and 30 secs:
https://www.youtube.com/watch?v=TxvEVc8YNlU
Hello,
I have fitted a logistic regression model (with one binary DV and several IVs), and then applied it to a new dataset using the function predict() with the argument type="response" (to get the estimated probabilities).
Now, I am wondering if it is possible to compute the ROC curve for this model on the new data.
Any insight is appreciated.
Best
Gm
http://cainarchaeology.weebly.com/
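One quick sketch of the idea being asked about, in base R only. The data and variable names here are made up for illustration (not from the original question); the point is that, given the probabilities from predict(..., type="response"), the AUC can be computed directly from the rank (Wilcoxon) identity without any extra package:

```r
# Hypothetical data; illustrative only
set.seed(1)
n <- 300
x <- rnorm(n)
y <- as.numeric(runif(n) < plogis(0.3 + 1.2 * x))  # binary DV

fit <- glm(y ~ x, family = binomial)
p   <- predict(fit, type = "response")             # estimated probabilities

# AUC = probability that a randomly chosen positive case gets a higher
# predicted probability than a randomly chosen negative case
auc <- mean(outer(p[y == 1], p[y == 0], ">"))
auc
```

For the full ROC curve (rather than just the AUC), a dedicated package such as pROC is usually more convenient.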
Quote:
@~9min and 30 secs:
https://www.youtube.com/watch?v=TxvEVc8YNlU
Stop cowardice, ban guns!
Hello,
thanks for pointing out that nice video; interesting. But unless I am mistaken, there isn't any reference to a ROC curve. At about 9:30 he shows how to get a confusion matrix.
Gm
http://cainarchaeology.weebly.com/
Without going back to it, I recall they reported an "accuracy" of about 0.55 around that time marker. Accuracy is sometimes used loosely where AUC is meant, though strictly the two are different metrics.
You would have to check what their model consisted of to confirm.
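The two metrics need not agree. A small self-contained check (synthetic data, not the model from the video) computes both on the same fit; accuracy depends on the 0.5 cutoff, while AUC is a cutoff-free ranking measure:

```r
# Synthetic data; illustrative only
set.seed(42)
n <- 1000
x <- rnorm(n)
y <- as.numeric(runif(n) < plogis(x))
p <- predict(glm(y ~ x, family = binomial), type = "response")

accuracy <- mean((p > 0.5) == (y == 1))            # tied to the 0.5 cutoff
auc      <- mean(outer(p[y == 1], p[y == 0], ">")) # cutoff-free
c(accuracy = accuracy, auc = auc)                  # the two numbers generally differ
```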
Another approach:
Code:
confusionMatrix(predictions, testing$y)

confusionMatrix: function (from the caret package)
predictions: the predicted classes you came up with for your testing dataset
testing$y: the $ pulls the categorical outcome variable named "y" out of the testing dataset
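For reference, here is a minimal base-R sketch of what a confusion matrix is, without the caret package. The vectors below are hypothetical stand-ins for the predicted and observed classes:

```r
# Hypothetical predicted and observed classes
set.seed(7)
actual      <- factor(sample(c("no", "yes"), 50, replace = TRUE))
predictions <- factor(sample(c("no", "yes"), 50, replace = TRUE))

# Rows = predicted class, columns = observed class
cm <- table(Predicted = predictions, Actual = actual)
cm

# Overall accuracy is the proportion on the diagonal
sum(diag(cm)) / sum(cm)
```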
Here is how I would do it manually ...
hopefully without too many errors. I followed the definitions from https://en.wikipedia.org/wiki/Receiv...characteristic

Code:
# Create synthetic binary data
N=200
b0=0.5
b1=1.1
x=rnorm(N)
eta=b0 + b1*x
p=exp(eta)/(1+exp(eta))
y=as.numeric(runif(N)<p)

# Plot the TRUE logistic curve
index=order(p)
plot(x[index],p[index],type="l")

# Split (y,x) into two datasets: one for fitting (insample),
# one for prediction (outofsample)
insample=data.frame(x=x[1:100],y=y[1:100])
outofsample=data.frame(x=x[101:N],y=y[101:N])  # x must be named, else outofsample$x is NULL

# Fit model on insample data
model=glm(y~x,data=insample,family=binomial)
summary(model)

# 1. Calculate predicted values - as probabilities - for the out-of-sample data
eta=model$coef[1] + model$coef[2]*outofsample$x
p.outofsample=exp(eta)/(1+exp(eta))

# 2. Wikipedia: "receiver operating characteristic (ROC), or ROC curve,
# is a graphical plot that illustrates the performance of a binary classifier"
# To create the binary classifier choose a threshold tau in [0,1]:
tau=0.5
# If the predicted probability p.outofsample is higher than tau we classify as 1,
# else as 0. Choosing 0.5 is arbitrary; it depends on whether you care more about
# false positives or false negatives. If you care a lot about false negatives you
# can simply always guess 1 - i.e. set tau=0 - and then you would never be
# guessing 0 when the actual value was 1.

# 3. For tau calculate TPR = TRUE POSITIVE RATE and FPR = FALSE POSITIVE RATE
# on the out-of-sample data:
# TPR = true positives (predicted 1 and observed 1) / number observed positive
# FPR = false positives (predicted 1 and observed 0) / number observed negative
predicted_positive=as.numeric(p.outofsample>tau)
TPR=sum(predicted_positive==1 & outofsample$y==1)/sum(outofsample$y)
FPR=sum(predicted_positive==1 & outofsample$y==0)/sum(1-outofsample$y)

# Do this for a lot of values of tau
J=100
TAU=seq(0,1,length.out=J)
ROC=matrix(nrow=J,ncol=2)
colnames(ROC)=c("FPR","TPR")
for (j in 1:J) {
  tau=TAU[j]
  predicted_positive=as.numeric(p.outofsample>tau)
  ROC[j,2]=sum(predicted_positive==1 & outofsample$y==1)/sum(outofsample$y)
  ROC[j,1]=sum(predicted_positive==1 & outofsample$y==0)/sum(1-outofsample$y)
}
plot(ROC[,1],ROC[,2])
points(TAU,TAU,type="l")
I guess the main insight is that once we have the predicted probabilities we have to construct a binary classifier to get from predicted probabilities to predicted values of the binary dependent variable. Choosing the value tau - which defines the binary classifier - we implicitly make a choice in ROC space representing the tradeoff between Type I and Type II errors for the given predictor.
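Once the (FPR, TPR) points are in hand, the area under the curve can be approximated with the trapezoidal rule. A self-contained sketch (with its own synthetic data, so the names here are illustrative):

```r
# Synthetic data; illustrative only
set.seed(3)
n <- 500
x <- rnorm(n)
y <- as.numeric(runif(n) < plogis(0.5 + 1.1 * x))
p <- predict(glm(y ~ x, family = binomial), type = "response")

# Sweep thresholds from high to low so FPR is increasing
tau <- seq(1, 0, length.out = 200)
tpr <- sapply(tau, function(t) sum(p > t & y == 1) / sum(y == 1))
fpr <- sapply(tau, function(t) sum(p > t & y == 0) / sum(y == 0))

# Trapezoidal rule over the (FPR, TPR) points
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc
```

A useful sanity check: an uninformative predictor gives a curve along the diagonal and an AUC near 0.5, while an informative one bows toward the top-left corner with AUC closer to 1.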