
Thread: [R]: ROC curve after fitting a logistic model to a new dataset

  1. #1
    gianmarco (TS Contributor), Italy

    [R]: ROC curve after fitting a logistic model to a new dataset




    Hello,
    I have fitted a logistic regression model (with 1 binary DV and several IVs), and then applied it to a new dataset using the function predict() with the argument type="response" (to get the estimated probabilities).
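    In code, roughly what I did (just a sketch; the model formula, training data, and new_data names are placeholders):

    Code: 
    # Sketch of the workflow described above; object and variable names are placeholders
    fit <- glm(outcome ~ x1 + x2 + x3, data = training, family = binomial)
    
    # Estimated probabilities on the new dataset
    p_hat <- predict(fit, newdata = new_data, type = "response")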

    Now, I am wondering whether it is possible to compute the ROC curve for the model's predictions on this new dataset.

    Any insight is appreciated.

    Best
    Gm

  2. #2
    hlsmith (Omega Contributor), Not Ames, IA

    Re: [R]: ROC curve after fitting a logistic model to a new dataset

    [embedded video link]

  3. #3
    gianmarco (TS Contributor), Italy

    Re: [R]: ROC curve after fitting a logistic model to a new dataset

    Hello,
    thanks for pointing out that nice video; interesting. But, unless I am mistaken, there isn't any reference to a ROC curve. At about 9:30 he shows how to get a confusion matrix.

    Gm

  4. #4
    hlsmith (Omega Contributor), Not Ames, IA

    Re: [R]: ROC curve after fitting a logistic model to a new dataset

    Without going back to it, I recall they reported an "accuracy" of something like 0.55 around that time marker. "Accuracy" is sometimes used loosely in place of AUC, though the two are not the same metric.

    You would have to look at what their model consisted of to confirm.
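    For what it's worth, a minimal sketch contrasting the two metrics on held-out predictions (p_hat and y_test are placeholder names for the predicted probabilities and observed 0/1 outcomes; assumes the pROC package):

    Code: 
    # p_hat: predicted probabilities for the test set; y_test: observed 0/1 outcomes (placeholders)
    library(pROC)
    
    acc <- mean((p_hat > 0.5) == y_test)   # accuracy at an arbitrary 0.5 cutoff
    r <- roc(y_test, p_hat)                # ROC object from observed outcomes and predicted probabilities
    auc(r)                                 # area under the ROC curve; in general not equal to accuracy
    plot(r)                                # draw the ROC curve itself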

  5. #5
    hlsmith (Omega Contributor), Not Ames, IA

    Re: [R]: ROC curve after fitting a logistic model to a new dataset

    Another approach:

    Code: 
    confusionMatrix(predictions,testing$y)

    confusionMatrix: the procedure that cross-tabulates predicted against observed classes
    predictions: the predicted classes you came up with for your testing dataset
    testing$y: the observed categorical outcome variable "y", pulled from the testing dataset with the $ operator
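    A minimal sketch of how that call might look end to end, assuming it is caret's confusionMatrix() and that you have a fitted model fit and a testing data frame (object names are placeholders):

    Code: 
    library(caret)                                                      # assumed: confusionMatrix() from the caret package
    
    p_hat <- predict(fit, newdata = testing, type = "response")         # predicted probabilities on the test set
    predictions <- factor(ifelse(p_hat > 0.5, 1, 0), levels = c(0, 1))  # classify at an arbitrary 0.5 cutoff
    
    confusionMatrix(predictions, factor(testing$y, levels = c(0, 1)))   # predicted vs. observed classes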

  6. #6
    TS Contributor, Copenhagen, Denmark

    Re: [R]: ROC curve after fitting a logistic model to a new dataset


    Here is how I would do it manually ...
    Code: 
    # Create synthetic binary data
    N=200
    b0=0.5
    b1=1.1
    x=rnorm(N)
    eta=b0 + b1*x
    p=(exp(eta)/(1+exp(eta)))
    y=as.numeric(runif(N)<p)
    
    # Plotting the TRUE logistic curve
    index=order(p)
    plot(x[index],p[index],type="l")
    
    # Split up (y,x) into two datasets one for fitting - insample -
    # one for prediction - outofsample
    
    insample=data.frame(x=x[1:100],y=y[1:100])
    outofsample=data.frame(x=x[101:N],y=y[101:N])
    
    
    # Fit model on insample data
    model=glm(y~x,data=insample,family=binomial)
    summary(model)
    
    
    # 1. Calculate predicted values - as probabilities - for out of sample data
    eta= model$coef[1] + model$coef[2]*outofsample$x
    p.outofsample=exp(eta)/(1+exp(eta))
    
    # 2. Wikipedia: "receiver operating characteristic (ROC), or ROC curve, 
    # is a graphical plot that illustrates the performance of a binary classifier"
    
    # To create the binary classifier choose tau:
    
    tau=0.5
    
    # Choose a value tau in [0,1]: if the predicted probability - p.outofsample - is higher than
    # tau we classify as 1, else as 0 (for example, if the predicted probability is higher
    # than 0.5 we would predict the observed value to be 1 - but choosing
    # 0.5 is arbitrary and depends on how much you care about false positives vs. false negatives).
    # If you care a lot about false negatives you can simply always guess 1 - hence set tau=0 -
    # and then you would never be guessing 0 when the actual value is 1.
    
    # 3. For tau we calculate TPR = TRUE POSITIVE RATE and FPR = FALSE POSITIVE RATE
    # We do this for outofsample
    # TPR = true positive (predicted 1 and observed 1) / number observed positive
    # FPR = false positive (predicted 1 and observed 0) / number observed negative
    predicted_positive = as.numeric(p.outofsample>tau)
    TPR=sum(predicted_positive==1 & outofsample$y==1)/sum(outofsample$y)
    FPR=sum(predicted_positive==1 & outofsample$y==0)/sum(1-outofsample$y)
    
    
    # Do this for a lot of values of tau
    J=100
    TAU=seq(0,1,length.out=J)
    ROC=matrix(nrow=J,ncol=2)
    colnames(ROC)=c("FPR","TPR")
    for (j in 1:J)
    	{
    		tau=TAU[j]
    		predicted_positive = as.numeric(p.outofsample>tau)
    		ROC[j,2]=sum(predicted_positive==1 & outofsample$y==1)/sum(outofsample$y)
    		ROC[j,1]=sum(predicted_positive==1 & outofsample$y==0)/sum(1-outofsample$y)
    	}
    
    plot(ROC[,1],ROC[,2])
    points(TAU,TAU,type="l")
    Hopefully without too many errors. I followed the definitions from https://en.wikipedia.org/wiki/Receiv...characteristic


    I guess the main insight is that once we have the predicted probabilities, we have to build a binary classifier to get from predicted probabilities to predicted values of the binary dependent variable. By choosing the value tau that defines the binary classifier, we implicitly make a choice in ROC space representing the trade-off between Type I and Type II errors for the given predictor.
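    As a rough cross-check, the area under that manually computed curve can be approximated with the trapezoidal rule, using the ROC matrix built in the code above:

    Code: 
    # Approximate the AUC from the manually computed (FPR, TPR) pairs via the trapezoidal rule
    ord <- order(ROC[, "FPR"])
    fpr <- ROC[ord, "FPR"]
    tpr <- ROC[ord, "TPR"]
    sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)   # estimated area under the ROC curve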
