Using taxonomic levels as factors in random forests: does it make sense? Is it needed?

Hi all,

I want to test the effect of a set of predictors (ecological and morphological factors) on a categorical response variable (an animal behaviour). As far as I've read, random forests make no assumptions about the independence of the data. Can I therefore include species in my analysis without accounting for their relatedness, that is, without using phylogenetic signal (which measures the tendency of related species to resemble each other more than species drawn at random from the same tree)? Is that correct? If I ignore relatedness, I could find, for example, that a behaviour is well explained by body size, not because body size has any real relationship with that behaviour, but because most species showing the behaviour belong to the same taxonomic family, and that family is mainly composed of similar-sized species.
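
To make the worry concrete, here is a minimal sketch in Python with made-up numbers. Everything in it is an assumption for illustration only: the three families, their mean body sizes, and the probabilities of showing the behaviour are hypothetical, and the behaviour is simulated to depend on family alone. A simple body-size rule still predicts the behaviour well overall, even though body size tells us nothing within any single family.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical toy data: behaviour depends ONLY on family.
# Body size differs between families but has no effect within a family.
FAMILIES = {
    "A": {"mean_size": 10.0, "p_behaviour": 0.9},
    "B": {"mean_size": 50.0, "p_behaviour": 0.1},
    "C": {"mean_size": 55.0, "p_behaviour": 0.1},
}

def simulate(n_per_family=200):
    """Return a list of (family, body_size, behaviour) tuples."""
    rows = []
    for fam, par in FAMILIES.items():
        for _ in range(n_per_family):
            size = random.gauss(par["mean_size"], 5.0)
            behaviour = 1 if random.random() < par["p_behaviour"] else 0
            rows.append((fam, size, behaviour))
    return rows

def size_rule_accuracy(rows, threshold=30.0):
    """Accuracy of the rule 'small-bodied species show the behaviour'."""
    return mean(1.0 if (size < threshold) == (b == 1) else 0.0
                for _, size, b in rows)

def within_family_size_gap(rows):
    """Mean body size of species with the behaviour minus those without,
    computed separately inside each family."""
    gaps = {}
    for fam in FAMILIES:
        with_b = [s for f, s, b in rows if f == fam and b == 1]
        without_b = [s for f, s, b in rows if f == fam and b == 0]
        gaps[fam] = mean(with_b) - mean(without_b)
    return gaps

rows = simulate()
overall_acc = size_rule_accuracy(rows)
gaps = within_family_size_gap(rows)

print("Accuracy of body-size rule across all species:", round(overall_acc, 3))
for fam, gap in gaps.items():
    print("Family", fam, "size difference (with - without behaviour):", round(gap, 2))
```

If the within-family differences are small compared with the differences between families, the apparent effect of body size is carried by family membership, which is exactly the situation I am afraid of.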

If this is the case, can I use a taxonomic level (e.g. genus, family or order) as an explanatory variable? Would it give me the real importance of relatedness for that behaviour? (E.g. if my taxonomic level comes out as the most important factor in the importance list, can I interpret this as the behaviour depending mostly on phylogeny rather than on eco-morphological factors?)
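
What I have in mind looks roughly like the sketch below, written with scikit-learn. It is only an illustration under stated assumptions: the data are simulated, the six families and their parameters are hypothetical, and family is integer-coded, which treats it as an ordered number (a simplification; one-hot encoding would be an alternative). It fits a random forest on body size plus family and reports permutation importance on held-out rows.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical toy data: six families, integer-coded 0..5.
# Families come in pairs of similar body size but opposite behaviour,
# so the behaviour is driven by family, not by size.
family_mean_size = np.array([10.0, 12.0, 30.0, 32.0, 50.0, 52.0])
family_p_behaviour = np.array([0.9, 0.1, 0.9, 0.1, 0.9, 0.1])

n_per_family = 300
family = np.repeat(np.arange(6), n_per_family)
body_size = rng.normal(family_mean_size[family], 5.0)
behaviour = (rng.random(family.size) < family_p_behaviour[family]).astype(int)

feature_names = ["body_size", "family"]
X = np.column_stack([body_size, family])
y = behaviour

# Hold out a quarter of the rows, keeping every family represented.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=family
)

forest = RandomForestClassifier(
    n_estimators=200, min_samples_leaf=5, random_state=0
)
forest.fit(X_train, y_train)
accuracy = forest.score(X_test, y_test)

# Permutation importance: drop in held-out accuracy when one column is shuffled.
result = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
)
importances = dict(zip(feature_names, result.importances_mean))

print("Held-out accuracy:", round(accuracy, 3))
for name, value in importances.items():
    print(name, "permutation importance:", round(float(value), 3))
```

My doubt is about how to read the output. Because family and body size are correlated, the forest can use either one as a stand-in for the other, so I am not sure the ranking separates phylogeny from eco-morphology. As far as I understand, the importance only says which column the fitted model relies on for prediction.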

What about classification trees in this regard?

I haven't found anything in papers or books on this. Thank you in advance for the help!


Super Moderator
Re: Using taxonomic levels as factors in random forests: does it make sense? Is it ne

First up, I haven't used random forests, but your idea sounds reasonable. Using taxonomic levels is really no different from scaling effects.