Help with a decision tree when the data is continuous


Ambassador to the humans
I guess I'm not clear on what your question is. Also, you might not be able to use built-in packages for the final result, but most aren't too difficult to learn, and running one could give you an idea of how good a model you'll be able to fit.

The basic algorithm for fitting a decision tree is pretty simple albeit quite computationally intensive.
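
To make that concrete for continuous data, here is a minimal sketch of the core step: finding the best threshold on a single continuous feature. It assumes Gini impurity as the split criterion (chosen for brevity, not necessarily what any particular package uses), and the names `gini` and `best_split` are made up for illustration. A full tree would repeat this search over every feature and then recurse into each side of the chosen split.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(values, labels):
    """Find the best binary split `value <= threshold` on one continuous feature.

    Returns (threshold, weighted_impurity). The threshold is None when no
    split improves on the impurity of the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_imp = None, gini(labels)
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # equal values cannot be separated by a threshold
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Impurity of the two children, weighted by their sizes.
        imp = (len(left) * gini(left) + len(right) * gini(right)) / n
        if imp < best_imp:
            best_thr, best_imp = (lo + hi) / 2, imp  # midpoint between neighbours
    return best_thr, best_imp

# Hypothetical toy data: small values are 0, large values are 1.
print(best_split([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0, 0, 0, 1, 1, 1]))
```

This version recomputes the impurity from scratch at every candidate threshold, so it is quadratic in the number of rows per feature. That is part of why the fitting gets expensive; a more careful implementation would update class counts incrementally while sweeping through the sorted values.
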

I tried R's tree package, and on a small subset (about 5,000 rows) it worked great, finding rules that gave 11 TRUEs with no deviance. But when I took those rules and applied them to the entire training set, the results were pretty poor. The best I could get was a net score of -48.

Then I tried running tree on the entire training set and it wasn't able to find any clear rules. Deviance was very high for all end nodes and the predicted values of each end node were nowhere close to TRUE on average (I made the TRUE/FALSE column 1/0 for analysis).

Even if the larger tree had given me satisfactory results, the fact that I may not be able to apply it to additional data is bothersome (my training set is a subset of a much larger data set that my professor is in possession of, and which my end algorithm will be tested against).
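
One way to estimate how a set of rules will do on data it has not seen is to hold back part of the training set and score the rules on it. The sketch below is only an illustration under assumptions: `holdout_split` and `accuracy` are hypothetical helper names, rows are assumed to be `(features, label)` pairs, and plain accuracy stands in for whatever scoring the assignment actually uses.

```python
import random

def holdout_split(rows, holdout_fraction=0.3, seed=0):
    """Shuffle rows and split them into (train, holdout) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - n_holdout
    return shuffled[:cut], shuffled[cut:]

def accuracy(rule, rows):
    """Fraction of (features, label) rows where rule(features) == label."""
    if not rows:
        return 0.0
    return sum(rule(x) == y for x, y in rows) / len(rows)

# Hypothetical data: one continuous feature, label is True when it exceeds 0.8.
rng = random.Random(1)
rows = [((v,), v > 0.8) for v in (rng.random() for _ in range(1000))]
train, holdout = holdout_split(rows)

# Stand-in for rules extracted from a tree fitted on `train` only.
rule = lambda x: x[0] > 0.8

print("train accuracy:  ", accuracy(rule, train))
print("holdout accuracy:", accuracy(rule, holdout))
```

A large gap between the two numbers is the same symptom as rules that look perfect on 5,000 rows and then fall apart on the full training set.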


Ambassador to the humans
Common ways to make improvements are boosting, bagging, and/or pruning. You could give one or more of them a try to see how well they work for your data.
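
As a rough illustration of bagging, the sketch below fits many small trees on bootstrap resamples of the training rows and lets them vote. It is deliberately simplified and rests on a few assumptions: each "tree" is a single-split stump on one continuous feature, rows are `(value, label)` pairs, and every function name here is made up for the example rather than taken from a package.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Fit a one-split tree on (value, label) rows by minimising training errors.

    Returns (threshold, left_label, right_label); threshold is None when no
    split beats predicting the majority label everywhere.
    """
    values = sorted({v for v, _ in rows})
    majority = Counter(y for _, y in rows).most_common(1)[0][0]
    best = (None, majority, majority)
    best_err = sum(y != majority for _, y in rows)
    for lo, hi in zip(values, values[1:]):
        left = Counter(y for v, y in rows if v <= lo)
        right = Counter(y for v, y in rows if v > lo)
        l_lab = left.most_common(1)[0][0]
        r_lab = right.most_common(1)[0][0]
        # Errors are the rows on each side that disagree with that side's label.
        err = (sum(left.values()) - left[l_lab]) + (sum(right.values()) - right[r_lab])
        if err < best_err:
            best, best_err = ((lo + hi) / 2, l_lab, r_lab), err
    return best

def predict_stump(stump, value):
    """Predict a label for one value with a fitted stump."""
    thr, l_lab, r_lab = stump
    if thr is None:
        return l_lab
    return l_lab if value <= thr else r_lab

def bagged_stumps(rows, n_models=25, seed=0):
    """Fit one stump per bootstrap resample (sampling rows with replacement)."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]

def predict_bagged(models, value):
    """Majority vote over all fitted stumps."""
    votes = Counter(predict_stump(m, value) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical data: label flips from False to True at 50.
rows = [(float(i), i >= 50) for i in range(100)]
models = bagged_stumps(rows)
print(predict_bagged(models, 10.0), predict_bagged(models, 90.0))
```

Each resample gives a slightly different threshold, and the vote averages out that variation. With full-depth trees in place of stumps, the same loop is the core of bagging as it is usually described.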

If you're alright with losing the *incredibly nice* interpretation of decision tree models you might consider moving to a random forest instead.