# Thread: ROC curve and comparison of data from serial tests.

1. ## ROC curve and comparison of data from serial tests.

Hello! A caveat first: I am a simple biotechnologist and have never studied biostatistics, but I have been assigned the task of analyzing some data I've obtained from two serial lab tests.

With the first test I have no population issues, since the two groups I'm comparing are normally distributed and differ only on one variable (disease vs. no disease). I plotted my data on a ROC curve and obtained my optimal cutoff (I'm working with MedCalc, and there is no previous gold standard for this test).

Then, I had to sample a cohort from each group based on their positivity in the first test in order to perform two serial tests only on positive samples (there was no biological interest in testing negative samples). I selected two populations with the same frequency distribution, skewed on the right for positivity. I tested these samples separately for two additional variables, and I obtained my results.

The questions are:
- Can I ROC the data I obtained from the two serial tests non-parametrically and calculate a secondary cut-off (which I would apply only to these data, obviously)?
- Is a classical comparison of the ROC curves from the two serial tests a correct way to interpret their predictive value?
- If I want to calculate the correlation of the two variables, do I assume that the data are non-parametric, even if they have the same frequency distribution but are biased because they were chosen under an arbitrary parameter in the first test? Put simply: Spearman or Pearson?

Thank you for any help, and I apologize for any bastardization of the statistical language.
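For the first question above, an empirical (non-parametric) ROC analysis makes no distributional assumption. Here is a minimal Python sketch of the idea, not the MedCalc workflow used in this thread. The function names and toy values are hypothetical, and "positive" is assumed to mean a value at or above the cutoff.

```python
def empirical_auc(cases, controls):
    """AUC as the Mann-Whitney probability that a case scores above a
    control; ties count as one half."""
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def youden_cutoff(cases, controls):
    """Return (cutoff, sensitivity, specificity) maximising Youden's
    J = sensitivity + specificity - 1 over all observed values.
    On ties in J the lowest cutoff is kept."""
    best = None
    for cut in sorted(set(cases) | set(controls)):
        sens = sum(v >= cut for v in cases) / len(cases)
        spec = sum(v < cut for v in controls) / len(controls)
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cut, sens, spec)
    return best[1:]


# Hypothetical toy measurements, for illustration only.
toy_cases = [3, 4, 5, 6]
toy_controls = [1, 2, 3, 4]
auc = empirical_auc(toy_cases, toy_controls)
cutoff, sens, spec = youden_cutoff(toy_cases, toy_controls)
```

A cutoff obtained this way describes only the pre-selected positives it was computed on, as the question already notes.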

2. ## Re: ROC curve and comparison of data from serial tests.

You have a sample and build a ROC curve to determine an optimal categorical cutoff between disease/no disease. Then you take two samples from the positives (I'm guessing "positive" is the new category that represents the disease predictor), correct?

Now you say you test these two groups. What do you mean by this, and what exactly are you doing with these subsamples? Are you looking at the continuous value again and creating a new cut-off? Please describe this in more detail, along with what your purpose would be.

3. ## Re: ROC curve and comparison of data from serial tests.

I have n=100 (case) and n=200 (control), already dichotomized into case/control by standard diagnostic processes (clinical evaluation). I am trying to determine if the presence of a particular serum protein can be used as a disease indicator (it's a continuous value since I get a spectrophotometric measurement that's proportional to the absolute quantity of that molecule per ml of serum: more protein = more severity, but it's just one of many concurring factors). Testing this directly in my population hasn't given me spectacular values: *** t-test, good AUC (.810) but poor specificity (45% of the ctrls are above the optimal cutoff).

I then chose positive samples from both cases and controls (the negatives don't have the protein and therefore are of no interest for my biological purposes) and tested them for two further variables on the same protein (continuous values that indicate two biological properties of the protein). I was able to add up to 40 cases/40 controls with the same freq dist in terms of serum levels of the protein. I ROC'd these and obtained two further cut-offs for positivity to these parameters.

I am trying to understand how to put these three together. I understand that I should use the IF/AND rule if I want to compare the first test and one of the two serial tests (IF > cutoff for protein quantity AND > cutoff for activity of protein = true pos), but I can't wrap my head around how I could correlate the two serial tests (IF > cutoff, is activity 1 just as good a predictor as activity 2?). I've found tons of literature on the comparison (via ROC, in parallel) of multiple direct tests for several variables on normally distributed populations, but nothing about doing the same when you don't have a normal population and you're analyzing samples which are already "forced" under another variable.

Ugh, I can't make sense of this even in my own head.
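The IF/AND rule described in this post can be written out as a small Python sketch. Everything here is hypothetical: the function name, the cutoffs and the toy records. It also assumes that samples negative on the first test stay in the denominators as rule-negatives, with `None` marking a second test that was never run.

```python
def and_rule_performance(samples, cut_quantity, cut_activity):
    """Sensitivity and specificity of the serial rule
    'quantity > cut_quantity AND activity > cut_activity'.

    samples: list of (quantity, activity, is_case) tuples; activity is
    None when the second test was skipped after a negative first test.
    """
    tp = fp = tn = fn = 0
    for quantity, activity, is_case in samples:
        first_positive = quantity > cut_quantity
        second_positive = activity is not None and activity > cut_activity
        called_positive = first_positive and second_positive
        if is_case:
            if called_positive:
                tp += 1
            else:
                fn += 1
        else:
            if called_positive:
                fp += 1
            else:
                tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy records: (protein quantity, activity, is_case).
toy_samples = [
    (0.9, 0.8, True), (0.8, 0.3, True), (0.2, None, True), (0.7, 0.9, True),
    (0.6, 0.2, False), (0.1, None, False), (0.7, 0.7, False), (0.3, None, False),
]
sensitivity, specificity = and_rule_performance(toy_samples, 0.5, 0.5)
```

Running the same function once with activity 1 and once with activity 2 on the same samples gives two sensitivity/specificity pairs that can be set side by side.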

4. ## Re: ROC curve and comparison of data from serial tests.

No, it makes sense. Just one last clarification: I get the first ROC curve, which gives you some false positives and negatives based on the cutoff for the disease level.

Then you want to test a different marker on a sample that tested positive for the first marker. Why are you grabbing two samples for the examination of the other marker? Why two more samples, why not one big sample? You lost me with this.

In overview: see who tests positive for the first marker, then test a second marker. Maybe you need a Venn diagram, or to start looking at the predicted probability based on a logistic regression model (assuming you are using logistic regression for this).
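To make "predicted probability based on a logistic regression model" concrete, here is a from-scratch Python sketch that fits a two-marker logistic model by plain gradient descent. It only illustrates the idea; a statistics package would normally do the fitting. The data, the learning rate and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, x):
    """Predicted probability of disease; weights[0] is the intercept."""
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent on the
    mean negative log-likelihood."""
    n = len(rows)
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Hypothetical (marker 1, marker 2) values; label 1 = case, 0 = control.
rows = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.4), (0.4, 0.3), (0.5, 0.5),
        (0.5, 0.5), (0.6, 0.7), (0.7, 0.6), (0.8, 0.9), (0.9, 0.8)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

weights = fit_logistic(rows, labels)
p_high = predict_proba(weights, (0.9, 0.9))
p_low = predict_proba(weights, (0.1, 0.1))
```

The predicted probability can then be fed into a ROC curve like any single continuous marker, which is one way of combining several markers into one score.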

5. ## The Following User Says Thank You to hlsmith For This Useful Post:

zuzi (02-21-2013)

6. ## Re: ROC curve and comparison of data from serial tests.

I think we have different concepts of "sample". For me a sample is 1 serum from 1 person, which I am able to test for one variable at a time. I'm not taking two new samples to examine the second and third marker; I am taking the exact same biological sample I used in the first test and physically doing two more different experiments with it. I consider it "serial" because I chose that particular sample based on its performance in the first test. I could do all three tests simultaneously, but then there'd be a great chance (in the ctrls, for example, where 55% resulted negative for the first marker) that I'd be doing the second and third test uselessly (they're quite expensive) and obtaining false negatives (negative for the second and third marker because they are negative for the first marker and not because the protein has no function).

I did try to delve into the possibility of making a logistic regression (likelihood ratio test), but I don't really know how to navigate those. SPSS doesn't do multiple logistic regression, or am I wrong?

7. ## Re: ROC curve and comparison of data from serial tests.

I'm sure SPSS does logistic regression; I would be very surprised if it did not.

So to recap, you are testing samples with a continuous marker for predicting disease status and establishing a threshold, then testing all positives based on that threshold for another marker, then establishing a threshold and testing all of those new positives with another marker. This is pretty standard in the literature.
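As a rough illustration of what serial testing does to accuracy, the textbook net sensitivity and specificity of two tests applied in series can be sketched in Python. This rests on a strong assumption that may well not hold in this thread: the two tests are conditionally independent given disease status, which is doubtful when both markers come from the same protein. The function name is hypothetical, and apart from the 55% first-test specificity mentioned above, the inputs are made up.

```python
def serial_net_accuracy(sens1, spec1, sens2, spec2):
    """Net sensitivity and specificity when test 2 is run only on test-1
    positives and a sample counts as positive only if both are positive.
    ASSUMES conditional independence of the two tests."""
    net_sens = sens1 * sens2                   # must pass both tests
    net_spec = spec1 + (1.0 - spec1) * spec2   # cleared by test 1 or by test 2
    return net_sens, net_spec


# spec1 = 0.55 comes from the thread; the other three values are hypothetical.
net_sens, net_spec = serial_net_accuracy(0.90, 0.55, 0.80, 0.70)
```

Under these assumptions serial testing trades sensitivity for specificity, which is why it is usually considered when the first test produces many false positives.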

Do you want to rephrase what your question or dilemma is?
