Thread: Recursive partitioning - CART analysis

  1. #1
    Location
    Manchester
    Posts
    12

    Are there any experts in CART (classification & regression trees) analysis out there?

    I'm attempting to derive a clinical decision rule that will enable the safe early hospital discharge of some patients with suspected heart attacks. I have around 20 predictive independent variables and one dichotomous dependent variable (i.e. did the patient have a heart attack or develop an unfavourable outcome within 6 months?)

    False positive diagnosis isn't such an issue - those patients will be investigated further in hospital as they usually would be. However, the costs of a false negative diagnosis are significant - we don't want to send patients home with reassurance if they are likely to have a poor prognosis.

    I therefore used the Gini splitting method and specified appropriately weighted costs for false positive and false negative diagnoses. However, when I grow the tree automatically, the computer likes to 'play it safe' and opts to send nobody home. Growing the tree manually, I can easily identify over 10% of the cohort who have a 0% event rate. However, I can't cross-validate a manually grown tree.

    Does anybody have any suggestions? Perhaps I could weight the cases so that patients with the most serious outcomes are counted more than once. Or perhaps I should persist with manual tree growth and forget about cross validating.
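
One way to picture the case-weighting idea: counting each event case several times changes the class proportions a node sees, and hence its Gini impurity. The sketch below is only an illustration in Python, not what any of the tree packages do internally. The function name `weighted_gini`, the counts and the weight of 9 are all made up for the example.

```python
def weighted_gini(n_events, n_non_events, event_weight=1.0):
    """Two-class Gini impurity when each event case counts `event_weight` times.

    Hypothetical helper for illustration; non-event cases keep a weight of 1.
    """
    weighted_events = n_events * event_weight
    total = weighted_events + n_non_events
    if total == 0:
        return 0.0
    p = weighted_events / total          # weighted proportion of events
    return 2.0 * p * (1.0 - p)           # Gini impurity for two classes


# Illustrative node: 10 events and 90 non-events (made-up counts).
print(weighted_gini(10, 90))                   # unweighted impurity
print(weighted_gini(10, 90, event_weight=9))   # events counted 9 times each
print(weighted_gini(0, 80, event_weight=71))   # a node with no events stays pure
```

Note that a node containing no events has zero impurity whatever weight is chosen, so weighting alone does not change which nodes are pure.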

    Any suggestions would be warmly welcomed!

  2. #2
    Location
    Oxford
    Posts
    44
    I think further investigation of false positive cases still carries a cost. I would simply enter the costs, and then there will be a balance. Every diagnostic test (including algorithms) has a sensitivity and a specificity. If you push sensitivity to an absolute 100%, you may lose all specificity. Loosening it a bit might give a big rise in specificity.
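
As a rough illustration of that trade-off, here is a minimal Python sketch that computes sensitivity and specificity from confusion-matrix counts. The counts (160 events, 640 non-events, and the second rule's results) are hypothetical and are not taken from the poster's data.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp/fn: patients with an event who were flagged / sent home.
    tn/fp: patients without an event who were sent home / flagged.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# "Send nobody home": every patient is flagged, so there are no false
# negatives but also no true negatives (hypothetical counts).
print(sensitivity_specificity(tp=160, fn=0, tn=0, fp=640))

# A hypothetical rule that sends 80 event-free patients home.
print(sensitivity_specificity(tp=160, fn=0, tn=80, fp=560))
```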

  3. #3
    Location
    Manchester
    Posts
    12
    Thanks very much for your helpful suggestions.

    I've weighted the costs in various ways. I set the cost of a false positive diagnosis to 1 and the cost of a false negative diagnosis to anything from 1 to 71 (the latter arose as a result of another calculation). Still, the software won't automatically assign anybody to the 'predict favourable outcome' group. Therefore it doesn't seem to be a case of accepting poorer sensitivity (which would definitely be a reasonable approach) - the software simply can't automatically figure out a useful rule at all!
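
For reference, the arithmetic behind cost-weighted labelling can be sketched as follows. If a terminal node is labelled by minimum expected misclassification cost, it is labelled 'favourable' only when its event rate p satisfies p × cost_FN < (1 − p) × cost_FP, i.e. p < cost_FP / (cost_FP + cost_FN). This is a minimal Python sketch under that assumption; the function names are hypothetical, and the packages mentioned in this thread may apply costs differently.

```python
def discharge_threshold(cost_fp, cost_fn):
    """Event rate below which labelling a node 'favourable' is cheaper.

    Assumes the node label is chosen by minimum expected cost.
    """
    return cost_fp / (cost_fp + cost_fn)


def cheapest_label(event_rate, cost_fp=1.0, cost_fn=71.0):
    """Label a node by comparing the expected cost of each prediction."""
    cost_if_favourable = event_rate * cost_fn            # missed events
    cost_if_unfavourable = (1.0 - event_rate) * cost_fp  # needless work-ups
    if cost_if_favourable < cost_if_unfavourable:
        return "favourable"
    return "unfavourable"


print(discharge_threshold(1, 1))    # equal costs
print(discharge_threshold(1, 71))   # false negatives 71 times as costly
print(cheapest_label(0.0))          # node with no events
print(cheapest_label(0.05))         # node with a 5% event rate
```

With a cost ratio of 71, only nodes with an event rate below 1/72 would be labelled 'favourable' under this assumption.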

    I can manually derive a rule quite easily but doing that means I can't cross validate. What do you think - should I just put up with the fact that I can't cross validate and hope that I haven't overfit the data? Are there reasons why a decision tree should be grown automatically and not manually?

    Thanks again,

    Rick.

  4. #4
    Location
    Oxford
    Posts
    44
    I just tried this in my SPSS. There is a setting called "minimum number of parent/child nodes." Is your number of participants smaller than the default setting?

  5. #5
    Location
    Manchester
    Posts
    12
    No, I have 800 cases. I set the minimum number of cases in the parent node at 50 and the minimum number in the child node at 20. The minimum improvement is set at the default level (0.0001) and I used the Gini splitting method.

    My tree hasn't actually met the stopping rules I specified, but for some reason the software stops growing the tree prematurely. I tried three software packages (Answer Tree, CART by Salford Systems and Knowledge Seeker by Angoss) but each does a similar thing! Perhaps manual tree growth is the only way forward?!
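
To make the stopping rules above concrete (minimum 50 cases in a parent node, 20 in a child node, minimum improvement 0.0001), here is a minimal Python sketch that checks whether a candidate split would pass them. It assumes improvement is the parent's Gini impurity minus the size-weighted impurity of the children; individual packages may define or scale improvement differently. The function names and all event counts are hypothetical.

```python
def gini(n_events, n_total):
    """Two-class Gini impurity, 2p(1-p), for n_events out of n_total cases."""
    p = n_events / n_total
    return 2.0 * p * (1.0 - p)


def split_allowed(parent, left, right,
                  min_parent=50, min_child=20, min_improvement=0.0001):
    """Check a candidate split against the stopping rules.

    Each node is an (n_events, n_total) tuple.
    """
    n_parent = parent[1]
    if n_parent < min_parent:
        return False                      # parent node too small to split
    if left[1] < min_child or right[1] < min_child:
        return False                      # a child node would be too small
    children = ((left[1] / n_parent) * gini(*left)
                + (right[1] / n_parent) * gini(*right))
    improvement = gini(*parent) - children
    return improvement >= min_improvement


# Made-up example: 800 cases with 160 events, splitting off 80 event-free cases.
print(split_allowed(parent=(160, 800), left=(0, 80), right=(160, 720)))
```

In this made-up example the split passes all three rules, which is consistent with the observation that the trees stop before these rules are reached.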

  6. #6
    Location
    Oxford
    Posts
    44
    Sorry, I have no experience with such a thing. Maybe it is something similar to cluster analysis. Does Gini aim to maximise the similarity between cases? Does that have any implications? For example:
    Try other splitting methods?
    Did you try running a cluster analysis? Maybe after deleting some outliers you might come up with a better solution. Or perhaps simply try selecting 400 cases and see what happens?
    Some other brainstorming ideas, without much theoretical backing:
    Did you actually run a regression to identify the factors? Can you try forcing the first factor into the model (maybe the one with the largest beta) and see what happens?
    What was the model improvement for your manual model?
    Good luck anyway.

  7. #7
    Location
    Manchester
    Posts
    12

    Thanks, this is still helpful.

    I did run a regression to identify the predictive variables and only entered those variables found to be significant predictors of outcome (P<0.05) into the CART analysis. I also tried both the Gini and twoing splitting methods. Gini performed better, as it has the advantage of being able to take the cost weightings into account.

    It sounds to me like the idea of forcing factors into the model is the way to go. I know what I want the decision tree to do and I can make it do it. Telling the computer exactly what I want from its automatic tree growth function appears to be impossible! I suppose the main disadvantage will be that forcing splits increases the chances of overfitting - but then it seems that there's little else I can do.

    Thanks once again for your help.
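
On the overfitting worry, one simple sanity check for a subgroup with a 0% observed event rate is the exact one-sided upper confidence bound on the true event rate. With zero events in n patients, it solves (1 − p)^n = 1 − confidence. The Python sketch below is only illustrative: the function name is hypothetical, and n = 80 is an assumed subgroup size (roughly 10% of 800 cases), not a figure reported in this thread. It also ignores the optimism that comes from choosing the subgroup on the same data.

```python
def zero_event_upper_bound(n, confidence=0.95):
    """Exact one-sided upper bound on the event rate given 0 events in n cases."""
    if n <= 0:
        raise ValueError("n must be a positive number of patients")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n)      # solves (1 - p)**n = alpha for p


# Assumed subgroup of 80 event-free patients (illustrative only).
print(zero_event_upper_bound(80))
# The familiar "rule of three" approximation, 3/n, for comparison.
print(3 / 80)
```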
