I am having trouble trying to find a way of estimating missing data from a project I have been working on and I hope that someone can point me in the right direction.
I am comparing the insect populations of 2 fields. The fields have different crops growing in them, and the aim of the project is to investigate the differences in the insect populations between the two fields. As each field is uniform in habitat with very little variation, it is expected that there will be little difference between the catches in each trap within a field within the same month (although in practice there is a small degree of variation). As the project is in a temperate climate, however, it is expected (and is the case in practice) that there will be variation between months (e.g. lots of species and lots of individuals in June, and few species and few individuals in December).
Sampling is undertaken on a monthly basis over several years. For each field I have 20 insect traps (samples). I need to compare Field 1 to Field 2 for each month of the year (e.g. Field 1 Jan 2011 to Field 2 Jan 2011), and I also need to compare data from a given month in one field to data from another month in the same field (e.g. Field 1 Jan 2011 to Field 1 Feb 2011). For this I need to have the same number of samples for each month. For statistical reasons my sample size has to be 20 per month in every case. For each sample I record the number of species and the number of individuals of each species (e.g. Species A - 10, Species B - 3 etc.). I can use this data for comparing species richness (number of species) and number of individuals between sites. I can also use this data to estimate the total biomass of each monthly catch by multiplying the average weight for a species by the total number of that species caught and adding the totals for species.
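To make the three measures concrete, here is a minimal Python sketch of how I derive them for one field in one month. The species names, counts and mean weights are made-up illustrative values, and `summarise_catch` is a hypothetical helper of my own, not part of any package.

```python
from collections import Counter

# Hypothetical mean weight per individual (mg) for each species; illustrative only.
MEAN_WEIGHT_MG = {"Species A": 2.5, "Species B": 10.0, "Species C": 0.5}


def summarise_catch(samples, mean_weight):
    """Summarise one field-month from a list of per-trap {species: count} dicts."""
    totals = Counter()
    for sample in samples:
        totals.update(sample)  # adds this trap's counts to the running totals
    # Species richness: number of species with at least one individual caught.
    richness = sum(1 for n in totals.values() if n > 0)
    # Abundance: total number of individuals across all species.
    abundance = sum(totals.values())
    # Biomass: mean weight of each species times its total count, summed over species.
    biomass = sum(mean_weight[sp] * n for sp, n in totals.items())
    return richness, abundance, biomass


samples = [
    {"Species A": 10, "Species B": 3},
    {"Species A": 4, "Species C": 6},
    {},  # trap intact, but nothing caught
]
print(summarise_catch(samples, MEAN_WEIGHT_MG))  # (3, 23, 68.0)
```

In the real data there would be 20 such dictionaries per field per month, one per trap.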
In summary, I need to compare the following between the two fields for each month:
Number of species (species richness)
Number of individual insects
Total biomass (weight of all insects in field, estimated as above)
Provided that I have 20 samples for both fields, comparative statistics and biomass calculation are not an issue.
Occasionally, one or more traps in a given month get destroyed by animals or farm machinery, resulting in missing data. So, for instance, I am left with a result for January 2011 where I have 20 samples for Field 1 and 18 samples for Field 2.
I need to find a way to estimate the missing data such that I can still compare the fields using statistics such as t-tests.
It is worth noting here that there are some zero values within my data for incidences where a sample was not destroyed, but no insects were caught.
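To show the distinction I mean between a destroyed trap and a genuine zero catch, here is a small Python sketch of how one field-month could be stored. The counts are invented for illustration, and using `None` for a destroyed trap is just my own convention here, not something required by any software.

```python
# Number of individuals per trap for one field in one month (20 traps).
# None marks a destroyed trap (missing data); 0 marks an intact trap that caught nothing.
field2_jan = [12, 0, 7, None, 9, 3, 0, 5, None, 8,
              6, 4, 11, 2, 0, 7, 5, 9, 3, 10]  # made-up values


def split_missing(counts):
    """Return the observed counts (zeros included) and the number of missing traps."""
    observed = [c for c in counts if c is not None]
    n_missing = len(counts) - len(observed)
    return observed, n_missing


observed, n_missing = split_missing(field2_jan)
# Genuine zeros stay in the data and pull the mean down; missing traps are excluded.
mean_observed = sum(observed) / len(observed)
print(len(observed), n_missing, round(mean_observed, 2))
```

Here the 18 observed values include three true zeros, while the two destroyed traps are the values I would need to estimate.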
The Search for a Solution (so far):
Immediately, I had to discount Complete Case Analysis (listwise deletion) as my sample size needs to be 20. I then researched imputation and quickly discovered that Multiple Imputation (MI) is more useful because it gives unbiased estimates of standard errors. I came to the conclusion that my missing data were ignorable and random. The examples I came across appeared to be relevant to studies on the population of a single species, but not to cases where there are multiple species to take into account within a sample. I came across a programme called TRIM (TRends and Indices for Monitoring data), which estimates missing data using loglinear Poisson regression and is claimed to be more appropriate than MI for missing monitoring data. However, TRIM again appears to be of use only where the population of a single species is concerned. Also, its FAQ mentions that it is only suitable for yearly data and not for monthly data. This has brought me back round to the idea of using MI somehow, but as yet I am unable to come up with a solution.
I would be grateful for any help!