I have 1,600 sample plots describing ecological conditions in one of many ecological zones. The data are meant to characterize conditions within that single sampled zone. The design used stratified random sampling, with plots allocated in proportion to area.

Sampling produced 5 base categories and 12 sub-categories (Table 1). Each sub-category is made up of a group of conditions. There are 5 conditions, each made up of a list of attributes (Table 2). Most attributes occur in every category and sub-category; what typically changes between them is the proportion of sampled plots that have each attribute. Table 3 shows one example sub-category (A-CO-DEN). The same conditions occur in the sub-categories of the other base categories, just with different percentages. If I observe certain attributes, what is the likelihood that I am in a particular sub-category or category?

Currently, I build a matrix of all possible attribute combinations from Table 2 (1,920 combinations). For each sub-category, I multiply together the proportions of plots that fell within each attribute of a combination (including zeros). I then multiply this product by the frequency distribution of plots within the sub-category, and again by the frequency distribution of the base categories.
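A minimal Python sketch of the calculation described above, for illustration only. Every category name, attribute and proportion in it is hypothetical, because the contents of Tables 1 to 3 are not shown here. It also assumes that "frequency distribution of plots within the sub-category" means the share of a base category's plots that fall in each sub-category. Structurally, multiplying per-attribute proportions together and then by category frequencies is a naive Bayes calculation, which treats attributes as independent within a sub-category. The final division by the total, which turns the scores into probabilities that sum to 1, is an added step not described above.

```python
from itertools import product

# HYPOTHETICAL inputs: all names and numbers are made up to show the shape
# of the calculation; they are not the real Tables 1-3.

# Conditions and their possible attributes (stand-in for Table 2).
conditions = {
    "cond1": ["a", "b"],
    "cond2": ["x", "y", "z"],
}

# Share of all plots in each base category (assumed).
p_base = {"A": 0.6, "B": 0.4}

# Share of each base category's plots in each sub-category (assumed meaning).
p_sub_given_base = {
    ("A", "A-1"): 0.7,
    ("A", "A-2"): 0.3,
    ("B", "B-1"): 1.0,  # a base category with only one sub-category
}

# Proportion of each sub-category's plots having each attribute (stand-in for Table 3).
p_attr = {
    "A-1": {"cond1": {"a": 0.8, "b": 0.2}, "cond2": {"x": 0.5, "y": 0.5, "z": 0.0}},
    "A-2": {"cond1": {"a": 0.3, "b": 0.7}, "cond2": {"x": 0.2, "y": 0.3, "z": 0.5}},
    "B-1": {"cond1": {"a": 0.5, "b": 0.5}, "cond2": {"x": 0.1, "y": 0.1, "z": 0.8}},
}


def scores(combo):
    """Unnormalized score per sub-category for one attribute per condition."""
    out = {}
    for (base, sub), p_sub in p_sub_given_base.items():
        likelihood = 1.0
        for cond, attr in combo.items():
            # Zeros are kept, so one zero proportion zeroes the whole product.
            likelihood *= p_attr[sub][cond][attr]
        out[sub] = likelihood * p_sub * p_base[base]
    return out


def posterior(combo):
    """Added step: rescale scores so they sum to 1 across sub-categories."""
    s = scores(combo)
    total = sum(s.values())
    if total == 0:
        return s  # every sub-category scored 0; nothing to normalize
    return {sub: v / total for sub, v in s.items()}


# The "matrix" of every combination: one attribute chosen per condition.
names = list(conditions)
all_combos = [dict(zip(names, values)) for values in product(*conditions.values())]
```

With these made-up numbers there are 6 combinations, and posterior({"cond1": "a", "cond2": "z"}) gives "A-1" exactly 0 (one of its proportions is 0) and splits the remainder between "A-2" (about 0.14) and "B-1" (about 0.86).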

Is this the correct way to predict the most likely sub-category from the attributes encountered? In cases where there is only 1 sub-category, the resulting values seem highly biased (driven by the sub-category distribution). Is there a better, more robust method?