[Solved Myself, Worthless Community] Help with a decision tree when the data are continuous
Hello,
I have a dataset (y = true/false, x's = a mix of binary and continuous columns, n = 5000) that I am trying to build a decision tree for. I used an entropy calculation to find the 10 columns with the highest information gain, but they all turn out to be continuous.
The root of the tree (the column with the maximum information gain) contains about 3000 distinct values (out of 5000 rows), so I am unsure how to create the branches. There does not appear to be a basic rule (such as a cutoff value) that relates the column values to the true/false column. Logistic regression failed miserably. Are there any relatively easy clustering techniques (or other statistical methods) for getting from this root node to the next nodes in this situation? If it helps, I'm doing this in R and Excel.
Thanks!
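For reference, one common way to branch on a continuous column is to avoid one branch per distinct value and instead look for a single binary split of the form x <= t, scoring every candidate cutoff t by information gain. The Python sketch below illustrates that idea only; it is not taken from the post, the function names `entropy` and `best_threshold` are made up for the example, and it assumes y is a list of booleans. The same search can be applied recursively to each child node, and a weak gain at the root does not rule out useful splits further down.

```python
import math


def entropy(pos, total):
    """Binary entropy (in bits) of a node with `pos` True rows out of `total`."""
    if total == 0 or pos == 0 or pos == total:
        return 0.0
    p = pos / total
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def best_threshold(x, y):
    """Return (threshold, information_gain) for the best split `x <= threshold`.

    x: list of numbers (one continuous column)
    y: list of booleans (the true/false target)
    Returns (None, 0.0) if no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    n = len(pairs)
    total_pos = sum(1 for _, label in pairs if label)
    parent = entropy(total_pos, n)

    best_cut, best_gain = None, 0.0
    left_pos = 0
    for i in range(n - 1):
        if pairs[i][1]:
            left_pos += 1
        # A cutoff can only sit between two different values.
        if pairs[i][0] == pairs[i + 1][0]:
            continue
        left_n = i + 1
        right_n = n - left_n
        right_pos = total_pos - left_pos
        # Weighted average entropy of the two child nodes.
        children = (left_n / n) * entropy(left_pos, left_n) \
            + (right_n / n) * entropy(right_pos, right_n)
        gain = parent - children
        if gain > best_gain:
            best_gain = gain
            # Use the midpoint between neighbouring values as the cutoff.
            best_cut = (pairs[i][0] + pairs[i + 1][0]) / 2
    return best_cut, best_gain


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    cut, gain = best_threshold([1.0, 2.0, 3.0, 4.0], [False, False, True, True])
    print(cut, gain)
```

With 3000 distinct values there are at most 2999 candidate cutoffs to score, which is cheap for 5000 rows. In R, the rpart package performs this kind of threshold search on continuous predictors automatically, so it may not be necessary to code it by hand.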