# Thread: [R]: ROC curve after fitting a logistic model to a new dataset

1. ## [R]: ROC curve after fitting a logistic model to a new dataset

Hello,
I have fitted a logistic regression model (with one binary DV and several IVs), and then applied it to a new dataset using the function predict() with the argument type="response" (to get the estimated probabilities).

Now, I am wondering if it is possible to compute the ROC curve for the model on this new dataset.

Any insight is appreciated.

Best
Gm

3. ## Re: [R]: ROC curve after fitting a logistic model to a new dataset

Hello,
thanks for pointing out that nice video; interesting. But, unless I am mistaken, there isn't any reference to a ROC curve. At about the 9:30 mark he shows how to get a confusion matrix.

Gm

4. ## Re: [R]: ROC curve after fitting a logistic model to a new dataset

Without going back to it, I recall they reported an "accuracy" of something like 0.55 around that time marker. Accuracy is sometimes used loosely as if it were the AUC, though the two are different metrics.

You would have to see what their model consisted of to confirm.
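To see why the two numbers should not be conflated: accuracy changes with the classification threshold, while the AUC is a single threshold-free summary of the whole ROC curve. A minimal sketch, using made-up probabilities `p` and outcomes `y` (not from the video's model):

```r
# Sketch: accuracy depends on the chosen threshold; AUC does not.
set.seed(3)
p <- runif(100)                      # hypothetical predicted probabilities
y <- as.numeric(runif(100) < p)      # 0/1 outcomes consistent with p

# Accuracy = proportion of correct classifications at threshold tau
acc <- function(tau) mean(as.numeric(p > tau) == y)

# Same model, same data, different thresholds, different accuracies
sapply(c(0.3, 0.5, 0.7), acc)
```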

5. ## Re: [R]: ROC curve after fitting a logistic model to a new dataset

Another approach:

Code:
```r
confusionMatrix(predictions, testing$y)
```

confusionMatrix(): function from the caret package
predictions: the predicted classes you came up with for your testing dataset
$: extracts the categorical outcome variable named "y" from the testing data frame
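If caret is not installed, the same cross-tabulation can be built with base R's table(). A minimal sketch, using hypothetical `predicted` and `observed` vectors standing in for predictions and testing$y:

```r
# Base-R confusion matrix sketch (no caret required).
set.seed(1)
observed  <- rbinom(50, 1, 0.5)                               # hypothetical 0/1 outcomes
predicted <- ifelse(runif(50) < 0.7, observed, 1 - observed)  # noisy predictions

# Cross-tabulate predicted vs observed classes
cm <- table(Predicted = predicted, Observed = observed)
print(cm)

# Overall accuracy: correct classifications / total
accuracy <- sum(diag(cm)) / sum(cm)
print(accuracy)
```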

6. ## Re: [R]: ROC curve after fitting a logistic model to a new dataset

Here is how I would do it manually ...
Code:
```r
# Create synthetic binary data
N <- 200
b0 <- 0.5
b1 <- 1.1
x <- rnorm(N)
eta <- b0 + b1*x
p <- exp(eta)/(1+exp(eta))
y <- as.numeric(runif(N) < p)

# Plot the TRUE logistic curve
index <- order(p)
plot(x[index], p[index], type="l")

# Split (y,x) into two datasets: one for fitting - insample -
# and one for prediction - outofsample
insample    <- data.frame(x=x[1:100],  y=y[1:100])
outofsample <- data.frame(x=x[101:N], y=y[101:N])

# Fit model on insample data
model <- glm(y ~ x, data=insample, family=binomial)
summary(model)

# 1. Calculate predicted values - as probabilities - for the out-of-sample data
eta <- model$coef[1] + model$coef[2]*outofsample$x
p.outofsample <- exp(eta)/(1+exp(eta))

# 2. Wikipedia: "a receiver operating characteristic (ROC), or ROC curve,
# is a graphical plot that illustrates the performance of a binary classifier"

# To create the binary classifier, choose a threshold tau in [0,1]:
tau <- 0.5

# If the predicted probability - p.outofsample - is higher than tau we
# classify as 1, else as 0. (For example, if the predicted probability is
# higher than 0.5 we would probably predict the observed value to be 1 -
# but 0.5 is arbitrary; the choice depends on how much you care about
# false positives versus false negatives. If you care a lot about false
# negatives you can simply always guess 1 - i.e. set tau=0 - and then you
# would never be guessing 0 when the actual value was 1.)

# 3. For a given tau we calculate TPR = TRUE POSITIVE RATE and
# FPR = FALSE POSITIVE RATE on the out-of-sample data:
# TPR = true positives (predicted 1 and observed 1) / number observed positive
# FPR = false positives (predicted 1 and observed 0) / number observed negative
predicted_positive <- as.numeric(p.outofsample > tau)
TPR <- sum(predicted_positive==1 & outofsample$y==1)/sum(outofsample$y)
FPR <- sum(predicted_positive==1 & outofsample$y==0)/sum(1-outofsample$y)

# Do this for a lot of values of tau
J <- 100
TAU <- seq(0, 1, length.out=J)
ROC <- matrix(nrow=J, ncol=2)
colnames(ROC) <- c("FPR","TPR")
for (j in 1:J)
{
  tau <- TAU[j]
  predicted_positive <- as.numeric(p.outofsample > tau)
  ROC[j,2] <- sum(predicted_positive==1 & outofsample$y==1)/sum(outofsample$y)
  ROC[j,1] <- sum(predicted_positive==1 & outofsample$y==0)/sum(1-outofsample$y)
}

plot(ROC[,1], ROC[,2])
points(TAU, TAU, type="l")   # diagonal reference line (random classifier)
```
Hopefully without too many errors. I followed the definitions from https://en.wikipedia.org/wiki/Receiv...characteristic

I guess the main insight is that once we have the predicted probabilities, we still have to define a binary classifier to get from predicted probabilities to predicted values of the binary dependent variable. By choosing the threshold tau - which defines the binary classifier - we implicitly pick a point in ROC space, representing the trade-off between Type I and Type II errors for the given predictor.
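If only the area under that curve is needed, it can be computed directly from the predicted probabilities without looping over tau, via the rank-based (Mann-Whitney) identity: AUC is the probability that a randomly chosen positive gets a higher score than a randomly chosen negative. A sketch under the same kind of synthetic setup as above, with `p` and `y` standing in for p.outofsample and outofsample$y:

```r
# Sketch: AUC from predicted probabilities via the Mann-Whitney rank identity.
set.seed(2)
n <- 200
x <- rnorm(n)
p <- plogis(0.5 + 1.1 * x)            # true probabilities
y <- as.numeric(runif(n) < p)         # 0/1 outcomes

auc <- function(p, y) {
  r  <- rank(p)                       # ranks of all scores, ties averaged
  n1 <- sum(y == 1)                   # number of observed positives
  n0 <- sum(y == 0)                   # number of observed negatives
  # (sum of positive-class ranks - minimum possible sum) / (n1 * n0)
  # = estimated P(score of a positive > score of a negative)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc(p, y)   # well above 0.5 here, since y really does depend on p
```

Packages such as pROC wrap this (and the curve plotting) in one call, but the rank formula shows there is nothing mysterious underneath.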

