Thread: p-values in regression trees

  1. #1
    rogojel

    Hi,
    I am analyzing some fairly hopeless datasets where the link between the DVs and IVs is quite weak. I have observed, however, that when I take the two groups resulting from the first partition in the tree, I generally get a nicely low p-value with a t-test. I wonder whether this is a property of the trees. Is there a theorem pointing in this direction, or am I possibly detecting a weak signal?

    regards

  2. #2
    Dason

    I don't quite understand what you're asking.

  3. #3
    hlsmith

    Is the program reporting p-values at your splits (partitions), and are you finding that, within a certain subgroup, the outcome differs significantly between the groups at the second level?


    A pictorial example would be great. Is this coming from a single decision tree that you have run a couple of times?

  4. #4
    rogojel

    Hi,
    Yes, it is a single tree and a continuous DV. Imagine that I have the first partition, which gives two subsets: one where the partition condition is TRUE (e.g. Volume > 5) and one where it is FALSE. If I run a t-test comparing the two subsets, like

    t.test(dataset[condition,]$dv, dataset[!condition,]$dv)

    I always get a low p-value (< 0.05). My question is whether this is to be expected as the normal behavior of partitions, or whether it is something one might consider a signal.

    Now that I think of it, it looks like a case of multiple comparisons.
    Regards
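
The multiple-comparisons suspicion can be checked with a small simulation. The sketch below assumes pure noise (the DV is generated independently of the predictor), picks the first split by minimizing the within-group sum of squares, and then runs the same t-test on the two resulting groups. The helper `best_split_pvalue()`, the sample sizes, and the minimum group size are illustrative assumptions, not part of any package or of the original analysis.

```r
# Pure-noise simulation: dv is independent of x, yet the t-test on the
# best first split is "significant" far more often than the nominal 5%.
set.seed(1)  # arbitrary seed, for reproducibility only

# Hypothetical helper: p-value of a Welch t-test on the two groups
# produced by the SSE-minimising split of a single predictor x.
best_split_pvalue <- function(x, dv, min_size = 5) {
  dv   <- dv[order(x)]
  n    <- length(dv)
  cuts <- min_size:(n - min_size)          # left-group sizes to try
  sse  <- sapply(cuts, function(k) {
    left  <- dv[1:k]
    right <- dv[(k + 1):n]
    sum((left - mean(left))^2) + sum((right - mean(right))^2)
  })
  k <- cuts[which.min(sse)]
  t.test(dv[1:k], dv[(k + 1):n])$p.value
}

n_sim <- 200
p_values <- replicate(n_sim, {
  x  <- runif(100)
  dv <- rnorm(100)                         # no relationship with x at all
  best_split_pvalue(x, dv)
})

# Share of "significant" first splits when there is no signal
mean(p_values < 0.05)
```

If the share printed at the end is well above 0.05, a low p-value after the first split is, on its own, weak evidence of a real signal, because the split was selected to make the groups differ.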

  5. #5
    Dason

    Well, typically the tree wouldn't make the split if it didn't actually do anything. The split is also chosen so that you get pretty much the maximal difference between the two groups (that is, in essence, what the tree is attempting to do), so a low p-value from a t-test on those same two groups is to be expected. How it compares to an actual linear regression depends on the data itself.
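
To make the "maximal difference" point concrete, here is a minimal sketch under assumed data. It scans every cut point of one predictor and records both the within-group sum of squared errors (the usual criterion for a continuous DV) and the pooled two-sample t statistic. Because t² = (n − 2)(SST − SSE)/SSE, the cut with the smallest SSE is also the cut with the largest |t|. The data and variable names are invented for illustration.

```r
# Scan all cut points of one predictor; compare the SSE criterion with
# the pooled (equal-variance) two-sample t statistic at each cut.
set.seed(42)
x  <- runif(60)
dv <- 0.5 * (x > 0.6) + rnorm(60)    # weak step signal, assumed for the demo

dv   <- dv[order(x)]
n    <- length(dv)
cuts <- 2:(n - 2)                    # keep at least 2 points per side

res <- t(sapply(cuts, function(k) {
  left  <- dv[1:k]
  right <- dv[(k + 1):n]
  sse   <- sum((left - mean(left))^2) + sum((right - mean(right))^2)
  tval  <- unname(t.test(left, right, var.equal = TRUE)$statistic)
  c(k = k, sse = sse, abs_t = abs(tval))
}))

best_by_sse <- unname(res[which.min(res[, "sse"]), "k"])
best_by_t   <- unname(res[which.max(res[, "abs_t"]), "k"])

# Both criteria pick the same cut
c(best_by_sse = best_by_sse, best_by_t = best_by_t)
```

The identity holds for the pooled t-test; the default Welch test in `t.test()` uses a slightly different denominator, so its maximum can occasionally fall on a neighbouring cut.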

  7. #6
    rogojel

    I wonder if there could be a theorem (exercise) behind this, something like: if the number of data points is large enough and the points are different enough, then the first partition will result in two groups that are significantly different?
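
One way to separate "property of the tree" from "weak signal" without a theorem is a permutation test that repeats the split search on shuffled data. The sketch below is one possible approach under stated assumptions: a single predictor, a pooled t statistic as the split score, and simulated noise standing in for the real dataset. The helper `max_abs_t()` is hypothetical.

```r
set.seed(7)

# Hypothetical helper: largest |t| over all admissible cut points of x.
max_abs_t <- function(x, dv, min_size = 5) {
  dv <- dv[order(x)]
  n  <- length(dv)
  max(sapply(min_size:(n - min_size), function(k) {
    abs(t.test(dv[1:k], dv[(k + 1):n], var.equal = TRUE)$statistic)
  }))
}

x  <- runif(80)
dv <- rnorm(80)                      # simulated noise; swap in real data here

observed <- max_abs_t(x, dv)

# Shuffling dv breaks any link to x but keeps the split search in the loop,
# so the null distribution accounts for having picked the best cut.
null_max <- replicate(200, max_abs_t(x, sample(dv)))

# Permutation p-value with the usual +1 correction
p_perm <- (1 + sum(null_max >= observed)) / (1 + length(null_max))
p_perm
```

A small `p_perm` would suggest that the first split separates the groups more than the best split on shuffled data usually does, which is closer to what "signal" means here than the raw t-test p-value.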

  8. #7
    hlsmith

    Yes, absolutely there should be a reason behind the split. Usually it is something like entropy or the Gini index. What package and procedure are you using? It is like Dason said: it is looking for the split that maximizes the difference between the two groups.

  9. #8
    hlsmith

    The measure is called purity (or impurity), and this short article describes the 3 main measures (i.e., Gini, entropy, and accuracy).


    http://people.revoledu.com/kardi/tut...e-impurity.htm
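
For reference, the three measures can be written in a few lines of base R. This is a minimal sketch with illustrative function names (not taken from any package), and it applies to a categorical outcome. For a continuous DV like the one in this thread, tree software typically uses a sum-of-squares criterion instead (for example, the "anova" method in rpart).

```r
# Impurity of a node, given the class labels that fall into it.
class_props <- function(labels) as.numeric(table(labels)) / length(labels)

gini_impurity <- function(labels) 1 - sum(class_props(labels)^2)

entropy <- function(labels) {
  p <- class_props(labels)
  p <- p[p > 0]                      # treat 0 * log2(0) as 0
  -sum(p * log2(p))
}

misclass_error <- function(labels) 1 - max(class_props(labels))

node <- c("a", "a", "b", "b")        # 50/50 node: most impure two-class case
gini_impurity(node)                  # 0.5
entropy(node)                        # 1
misclass_error(node)                 # 0.5
```

All three are 0 for a node containing a single class, and a split is scored by how much it lowers the size-weighted impurity of the child nodes relative to the parent.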
