Thread: Positive predictive value of biased data

  1. #1
    Hey,

    I am currently working on predicting binary outcomes from structural alerts: if an alert's substructure occurs in the query structure, the structure is labelled as positive.
    To evaluate the predictivity of the individual alerts I have datasets of >1000 structures. The problem is that in most of the sets approx. 60-75% of the structures are classified as positive.

    Now of course I am getting a high positive predictive value, calculated as true positives/(true positives+false positives), because the probability that a structure guessed as positive really is positive is sometimes much higher than 50%. This makes it hard for me to compare the outcomes across the different datasets.
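
    To make the comparison problem concrete, here is a minimal sketch of the PPV formula above together with one possible baseline-normalised figure: the PPV divided by the fraction of positives in the dataset (often called lift). The counts and the 75% prevalence are made-up illustration values, and `ppv` and `lift` are hypothetical helper names, not part of any tool used in this thread.

```python
def ppv(tp: int, fp: int) -> float:
    """Positive predictive value: TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("the alert matched no structures")
    return tp / (tp + fp)


def lift(tp: int, fp: int, prevalence: float) -> float:
    """PPV relative to the fraction of positives in the dataset.

    A value of 1 means the alert does no better than labelling
    structures positive at random.
    """
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return ppv(tp, fp) / prevalence


# Hypothetical alert: 100 matched structures, 90 of them truly positive,
# in a dataset where 75% of all structures are positive.
tp, fp, prevalence = 90, 10, 0.75
print(f"PPV = {ppv(tp, fp):.2f}, lift = {lift(tp, fp, prevalence):.2f}")
# prints: PPV = 0.90, lift = 1.20
```

    A PPV of 0.90 looks strong on its own, but against a 75% baseline it is only a 1.2-fold improvement, which is the effect described above.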

    Is there any method (other than reducing the original datasets) by which I could account for the bias of the dataset and see the actual predictivity?
    Or am I totally missing something here?

    I would really appreciate your help here.

  2. #2
    hlsmith

    So your sample has a higher prevalence than most samples and you think your PPV estimates may be misleading?


    If so, you have selection bias in your sampling, since the sample does not appear to be drawn at random from the population at large. Yes, prevalence can distort the horizontal calculations such as PPV. Can you just report this, or are you using these results yourself? Depending on how you plan to use or report them, I wonder if you could adjust the constant term in a logistic regression model to account for this, though I am not sure you can easily get PPV out of your logistic model.
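
    One possible reading of the constant-term idea, sketched here without fitting any model: on the logit scale, PPV equals the log of the positive likelihood ratio plus the logit of the prevalence, so a PPV observed at one prevalence can be shifted to a common reference prevalence. This assumes the alert's sensitivity and specificity are the same in every dataset, which may not hold. The function name and the 50% reference value are illustrative choices, not something from this thread.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ppv_at_reference_prevalence(ppv: float, sample_prev: float,
                                ref_prev: float = 0.5) -> float:
    """Re-express a PPV at a different prevalence.

    Assumption: sensitivity and specificity do not change with prevalence,
    so only the prior-odds term on the logit scale is swapped.
    """
    for name, value in (("ppv", ppv), ("sample_prev", sample_prev),
                        ("ref_prev", ref_prev)):
        if not 0.0 < value < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    return expit(logit(ppv) - logit(sample_prev) + logit(ref_prev))


# Hypothetical example: PPV of 0.90 observed in a dataset with 75% positives,
# re-expressed at a 50% reference prevalence.
print(f"{ppv_at_reference_prevalence(0.90, 0.75):.2f}")
# prints: 0.75
```

    Applying the same reference prevalence to every dataset puts the alerts on a common scale, under the stated assumption.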

  3. #3
    Hey,

    Thanks for your reply!

    Yes, that's basically my problem. I have samples with different prevalences, and the only value I can get is the PPV: because I am matching chemical structures, I only learn which substructure matches which structure and get no information about the negatives.

    Now if I have, for example, 76% prevalence, then simply guessing positive would give me a PPV of 76% by chance alone, which is very high.
    Or is there some big mistake in my thinking?
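
    The chance argument can be checked directly. If an alert picked its matches at random from the dataset, the number of positives among them would follow a hypergeometric distribution with an expected PPV equal to the prevalence. A minimal sketch using only the standard library is below; all counts are hypothetical, and `chance_tail_probability` is an illustrative name.

```python
from math import comb


def chance_tail_probability(n_total: int, n_positive: int,
                            n_matched: int, n_true_pos: int) -> float:
    """P(at least n_true_pos positives among n_matched random draws).

    Hypergeometric upper tail: draws are made without replacement from a
    dataset of n_total structures, n_positive of which are positive.
    """
    upper = min(n_matched, n_positive)
    favourable = sum(
        comb(n_positive, i) * comb(n_total - n_positive, n_matched - i)
        for i in range(max(n_true_pos, 0), upper + 1)
    )
    return favourable / comb(n_total, n_matched)


# Hypothetical dataset: 1000 structures, 760 positive (76% prevalence).
# Hypothetical alert: matches 100 structures, 90 of them positive.
p = chance_tail_probability(1000, 760, 100, 90)
print(f"chance of a PPV of 0.90 or better by random matching: {p:.2g}")
```

    A small probability suggests the alert does better than the 76% baseline; a probability near 1 suggests it does no better than guessing.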

    I'm sorry, but I can't see how a regression could help me here.
    I should say that I have never used much statistics, so I'm pretty lost on how to interpret the results correctly.
    Last edited by lilchaos; 08-30-2016 at 01:31 AM.
