How to analyze a data set with < & > values?

RwM

New Member
#1
We are working on vernal pool botany. There are two types of pool: natural (control) and constructed (experiment).

Within each pool, over 8 years, we recorded the species, their number, and the percent of the pool covered by each species. I analyzed all of that data with one- and two-way ANOVAs and matched-pairs analysis. The data I'm having trouble with were collected by another researcher with a different data need, so it's not at all like what we have, but it does contain useful information.

The data are organized like this:


Species Group     Species   Natural   Constructed
V Pool            Sp A      <20       >20
V Pool            Sp B      0         <20
V Pool            Sp C      <20       >20
V Pool            Sp D      <20       >20
V Pool            Sp E      0         >20

NON-VP WETLAND    A         <20       >20
NON-VP WETLAND    B         0         <20
NON-VP WETLAND    C         0         >20
NON-VP WETLAND    D         <20       0
NON-VP WETLAND    E         0         >20

NON-WETLAND       F         <20       0
NON-WETLAND       G         0         >20
NON-WETLAND       H         >20       >20
NON-WETLAND       I         <20       0
NON-WETLAND       J         0         >20

There are 10 natural vernal pools and 24 constructed pools. What I need is a statistical test to determine whether there is a significant difference between the two types, and some way of showing such differences graphically, if possible.

We've never dealt with > and < data values before and are at a loss as to how to proceed, but I am sure the data contain information that will be useful for the study. Any assistance will be greatly appreciated.

I am aware of Cochran's Q for dichotomous data, but as you see above, I have 3 values, 0, <20 and >20. Seems to me assigning a '1' for the < and > data means loss of important information. Any suggestions on how to analyze such a data set?
 
#3
Hi RwM,

Not sure if you got your answer yet, but have you considered treating your data as ordinal, where 0 is 0, 1 is <20, and 2 is >20? Then you could conduct a Mann-Whitney U test (I think the two pool types are independent, right?). Another option is to treat the data as categorical and conduct Fisher's exact test on pool type (natural, constructed) and value (0, <20, >20).
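As a rough sketch of both suggestions, the Python below codes the values as 0/1/2, computes a Mann-Whitney U statistic, and runs an exact test on the 2 x 3 table of pool type by value (the Freeman-Halton extension of Fisher's exact test, done by enumerating every table with the same margins). It assumes each species row in the table is one independent observation per pool type, which may not hold since the same species appear in both columns. All function and variable names are made up for illustration.

```python
from collections import Counter
from math import comb

# Ordinal coding suggested above: 0 -> 0, <20 -> 1, >20 -> 2.
CODES = {"0": 0, "<20": 1, ">20": 2}

# Values transcribed row by row from the table in the question (15 species rows).
natural = ["<20", "0", "<20", "<20", "0",
           "<20", "0", "0", "<20", "0",
           "<20", "0", ">20", "<20", "0"]
constructed = [">20", "<20", ">20", ">20", ">20",
               ">20", "<20", ">20", "0", ">20",
               "0", ">20", ">20", "0", ">20"]


def category_counts(values):
    """Return [count coded 0, count coded 1, count coded 2]."""
    tally = Counter(CODES[v] for v in values)
    return [tally[0], tally[1], tally[2]]


def mann_whitney_u(x, y):
    """U statistic counting pairs where y exceeds x; ties count one half."""
    return sum(1.0 if b > a else 0.5 if b == a else 0.0 for a in x for b in y)


def table_probability(row, col_totals, row_total, grand_total):
    """Multivariate hypergeometric probability of the first row given fixed margins."""
    ways = 1
    for cell, col in zip(row, col_totals):
        ways *= comb(col, cell)
    return ways / comb(grand_total, row_total)


def exact_test_2x3(row1, row2):
    """Two-sided exact p-value: sum of all tables no more probable than the observed one."""
    cols = [a + b for a, b in zip(row1, row2)]
    n1, n = sum(row1), sum(cols)
    p_obs = table_probability(row1, cols, n1, n)
    p_value = 0.0
    for a0 in range(min(cols[0], n1) + 1):
        for a1 in range(min(cols[1], n1 - a0) + 1):
            a2 = n1 - a0 - a1
            if a2 <= cols[2]:
                p = table_probability((a0, a1, a2), cols, n1, n)
                if p <= p_obs * (1 + 1e-9):  # small tolerance for float ties
                    p_value += p
    return p_value


nat_counts = category_counts(natural)
con_counts = category_counts(constructed)
u_stat = mann_whitney_u([CODES[v] for v in natural],
                        [CODES[v] for v in constructed])
p_exact = exact_test_2x3(nat_counts, con_counts)
print("natural counts:", nat_counts, "constructed counts:", con_counts)
print("U =", u_stat, "exact p =", p_exact)
```

The U statistic here is only the raw count of pairs; with this many ties, a p-value for it would need a tie-corrected or permutation approach, which a statistics package would normally handle.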

A bar chart showing the number that fall within each value category could be displayed for each pool side-by-side.
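A minimal sketch of that side-by-side chart, drawn as plain text so it needs no plotting library. The counts are tallied from the 15 species rows in the question's table; the function name and layout are just for illustration, and a real figure would more likely come from a plotting package.

```python
CATEGORIES = ["0", "<20", ">20"]

# Number of species rows falling in each value category, per pool type,
# tallied from the table in the question.
counts = {"Natural": [7, 7, 1], "Constructed": [3, 2, 10]}


def text_bar_chart(counts, categories, symbol="#"):
    """Return one text bar per (category, pool type), grouped by category."""
    lines = []
    for cat_index, cat in enumerate(categories):
        for pool_type, tallies in counts.items():
            n = tallies[cat_index]
            lines.append(f"{cat:>4} {pool_type:<12} {symbol * n} {n}")
    return lines


for line in text_bar_chart(counts, CATEGORIES):
    print(line)
```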

Hope this helps and isn't too late
 

bugman

Super Moderator
#4
Dason is right, this is censored data (right and left). A general rule with left-censored water quality data is to use the midpoint, so for a value of <0.10 you would use 0.05.

However, this usually crops up as an artefact of, for example, values being below detection limits. In your case, it looks like the study was deliberately designed this way, so the data should, as mostater suggests, be treated as categorical. This could have been due to time or financial limitations when the project was set up.

Do not code < and > as 1; this is vital information. Look at log-linear modelling, which will allow you to fit interaction terms based on count data (i.e. count occurrences of >, < and 0 within each category). With this, you should also be able to plot your data as relative frequencies for each pool type across different years.
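For illustration only, the midpoint rule can be written as a small helper. The function name is hypothetical, and the rule is applied only to left-censored values: a right-censored value such as >20 has no upper bound to take a midpoint with, so the sketch refuses it rather than guess.

```python
def substitute_left_censored(value):
    """Replace a left-censored reading like '<0.10' with half its limit."""
    text = value.strip()
    if text.startswith("<"):
        return float(text[1:]) / 2.0  # midpoint between 0 and the limit
    if text.startswith(">"):
        # Assumption: no substitution rule is given for right-censored values.
        raise ValueError("no midpoint defined for right-censored value: " + text)
    return float(text)


print(substitute_left_censored("<0.10"))
```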
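To give a feel for the log-linear idea on the simplest case, here is a sketch for a single two-way table of pool type by value category. The independence model (main effects only, no interaction) has closed-form fitted counts, and the likelihood-ratio statistic G² against the saturated model tests the interaction term. The counts are tallied from the table in the question, treating each species row as one observation, which is an assumption; several expected counts are small, so the chi-square approximation is rough. A fuller analysis with species group or year would need a Poisson GLM in a statistics package.

```python
from math import exp, log


def independence_fit(table):
    """Fitted counts under the no-interaction log-linear model: row * col / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def g_squared(table, fitted):
    """Likelihood-ratio (deviance) statistic; empty cells contribute zero."""
    return 2.0 * sum(
        o * log(o / m)
        for obs_row, fit_row in zip(table, fitted)
        for o, m in zip(obs_row, fit_row)
        if o > 0
    )


# Rows: natural, constructed. Columns: 0, <20, >20.
table = [[7, 7, 1], [3, 2, 10]]
fitted = independence_fit(table)
g2 = g_squared(table, fitted)
df = (len(table) - 1) * (len(table[0]) - 1)

# With exactly 2 degrees of freedom the chi-square tail probability is exp(-x / 2).
# This shortcut is only valid for df == 2.
p_value = exp(-g2 / 2.0)
print("fitted:", fitted)
print("G^2 =", g2, "df =", df, "p =", p_value)
```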

HTH.

P