Thread: Modeling and predicting pathology from multivariate clinical data

  1. #1

    Hello, I have a clinical data set consisting of 5 clinical measurements on each of thousands of tissue samples. Each sample also has a pathology diagnosis that is 1 of 5 possible diagnoses (all different types of tumors). I am interested in predicting which pathologic class future samples will belong to based on the 5 clinical measurements. I recognize the predictor can be built using machine learning, and I will apply both decision trees and a deep learning method to the data soon. However, I first want to explore simpler analyses that the machine learning findings can be compared against. An example of the structure of the data is below, averaged across all samples. The numbers are fake.

    Code: 
                          Tumor Type 1    Tumor Type 2    Tumor Type 3    Tumor Type 4    Tumor Type 5
    Blood Test 1          15.3 +/- 3.2    21.8 +/- 4.3    8.2 +/- 2.3     8.2 +/- 2.3     8.2 +/- 2.3
    Blood Test 2          13.4 +/- 3.8    15.9 +/- 3.2    22.8 +/- 11.1   8.2 +/- 2.3     8.2 +/- 2.3
    Biopsy Test 1         3.2 +/- 1.3     10.2 +/- 2.9    23.9 +/- 1.2    8.2 +/- 2.3     8.2 +/- 2.3
    Biopsy Test 2         3.2 +/- 1.3     10.2 +/- 2.9    23.9 +/- 1.2    8.2 +/- 2.3     8.2 +/- 2.3
    Imaging Test 1        3.2 +/- 1.3     10.2 +/- 2.9    23.9 +/- 1.2    8.2 +/- 2.3     8.2 +/- 2.3

    Is there a statistical approach that could indicate which clinical tests are "important" for which diagnoses? For example, one that could determine that Tumor Type 5 is best classified by Blood Test 2 > 15, Biopsy Test 1 < 13, and Imaging Test 1 > 2?

    Do any other analysis methods jump out at you besides machine learning that I should consider?

    Thanks,
    Jason

  2. #2
    rogojel (TS Contributor)

    Hi,
    Do you also have measurements for cases where there is no tumor? If you had negative cases as well, you could try a logistic regression for each tumor type.

    regards
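
    A minimal sketch of what one such per-tumor-type logistic regression could look like, written in plain Python so it runs without extra packages. Everything here is an assumption for illustration: the function names (fit_logistic, predict_proba) are hypothetical, the data are simulated, the two predictors are assumed to be standardized, and label 1 just means "the tumor type of interest" while 0 means "anything else" (negative cases, if you have them). In practice a statistics package would also give standard errors and p-values, which this sketch does not.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit a binary logistic regression by batch gradient descent.

    X: list of feature rows (assumed standardized), y: list of 0/1 labels.
    Returns (weights, intercept).
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus truth
            grad_b += err
            for j in range(p):
                grad_w[j] += err * row[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(w, b, row):
    """Predicted probability that a sample belongs to the class labeled 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, row))
    return 1.0 / (1.0 + math.exp(-z))

# Simulated data: the first measurement separates the classes, the second is noise.
random.seed(0)
positives = [[random.gauss(1.5, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
negatives = [[random.gauss(-1.5, 1.0), random.gauss(0.0, 1.0)] for _ in range(100)]
X = positives + negatives
y = [1] * 100 + [0] * 100

w, b = fit_logistic(X, y)
accuracy = sum((predict_proba(w, b, r) > 0.5) == bool(t) for r, t in zip(X, y)) / len(y)
print("weights:", w, "intercept:", b, "training accuracy:", accuracy)
```

    With standardized predictors, the relative size of the fitted weights gives a rough first look at which measurement carries the signal for that tumor type.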

  3. #3
    hlsmith (Omega Contributor)

    As you know, you have a classification problem here, and many machine learning algorithms work for that. Two questions: are the continuous lab tests bounded within 0-1.00? And are the continuous lab tests correlated with one another?


    Yeah, my first inkling was also logistic regression. You wouldn't need tumor-free patients if you did one-vs-rest regroupings: 1 vs. 2-5; 2 vs. 1 and 3-5; ...; 4 vs. 1-3 and 5; 5 vs. 1-4. But that is a lot of testing when you think about correcting for false discovery. Still, if it is just for fun, it would give you a glimpse into the relations. Another crude option would be to just run linear regression and treat the 1-5 outcomes as a continuous variable, though that imposes an arbitrary ordering on the tumor types. Another option is multinomial logistic regression, but you would need to set your reference group accordingly.
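
    To make the multiple-testing point concrete: five one-vs-rest models with five predictors each already means 25 coefficient tests. One common way to correct for false discovery is the Benjamini-Hochberg step-up procedure; choosing it here is an assumption, not something settled in this thread. A minimal sketch in plain Python, with a hypothetical function name and made-up p-values:

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Made-up p-values, e.g. one per (one-vs-rest model, clinical test) pair.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(pvals, alpha=0.05)
print(flags)
```

    In this made-up example only the two smallest p-values survive the correction, even though five of the ten are below 0.05 on their own.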


    Lastly, you could look at correlations; you could probably get away with Spearman rank correlation.
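
    For reference, Spearman rank correlation is just the Pearson correlation computed on the ranks of the data, with tied values sharing the average of their ranks. A small plain-Python sketch of that definition follows; the function names and the six sample values are made up for illustration, and a statistics library would normally do this for you.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up values for two clinical tests measured on the same six samples.
blood_test_1 = [15.3, 21.8, 8.2, 9.1, 12.4, 30.0]
biopsy_test_1 = [3.2, 10.2, 1.1, 1.5, 2.9, 55.0]  # increases whenever blood_test_1 does
rho = spearman(blood_test_1, biopsy_test_1)
print(rho)
```

    Because only the ordering matters, the outlying value of 55.0 does not pull the coefficient around the way it would with a Pearson correlation on the raw values.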


    P.S. I would be curious how a support vector machine or random forest would do. I have always wondered whether you could run a bunch of short trees, say one for each variable (if independent), pull the split points from each, and plot a histogram of the splits to make a decision from. Running full trees would also help to distinguish variable importance.
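
    One way the short-tree idea could be tried, as a rough sketch rather than an established method: fit a depth-1 tree (a decision stump) on a single variable, refit it on bootstrap resamples, and look at where the split points pile up. The Gini criterion, the bootstrap, the function names, and the simulated two-class data below are all assumptions made for illustration.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(values, labels):
    """Threshold on one variable that minimizes the weighted Gini impurity
    of the two sides (a depth-1 tree, i.e. a decision stump)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_thr = (pairs[i - 1][0] + pairs[i][0]) / 2.0
            best_score = score
    return best_thr

def bootstrap_splits(values, labels, n_boot=200, seed=0):
    """Refit the stump on bootstrap resamples and collect the split points."""
    rng = random.Random(seed)
    n = len(values)
    splits = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        thr = best_split([values[i] for i in idx], [labels[i] for i in idx])
        if thr is not None:
            splits.append(thr)
    return splits

# Simulated single measurement for two tumor types.
rng = random.Random(1)
values = [rng.gauss(8.0, 2.0) for _ in range(60)] + [rng.gauss(20.0, 2.0) for _ in range(60)]
labels = ["type 1"] * 60 + ["type 2"] * 60
splits = bootstrap_splits(values, labels)

# Crude text histogram of the split points, binned to the nearest integer.
for bin_value, count in sorted(Counter(round(s) for s in splits).items()):
    print(f"{bin_value:>3} {'#' * (count // 5)}")
```

    A tight cluster of split points suggests a stable cut-off for that variable; a wide or multi-peaked histogram suggests the variable does not separate the classes cleanly on its own.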