Flight times of insects

#1
I'm researching the flight times of a wasp in the southeast. The trap catches occurred in 2009, 2010, 2011, and 2012. However, these were done by three different people, so trap collection times were different, trap placements were different, etc., so my numbers are very different, but the pattern looks to be the same. I want to look at trends over time (biweekly from October to December) and across regions (I have three: north, central, and south). I want to do this within years and between them. 2011 was only performed in the north region, so it will be left out of the regional data.

I think that working with proportions would be the best way since my count data are so different - as in, 75% of the total trap catch occurred in mid-November, or 50% occurred in the central region. I would expect to do something like a difference-of-proportions test and Fisher's exact test, or something along those lines. Possibly a Kolmogorov-Smirnov test? I did a one-way ANOVA on ranks, but my advisor was not happy with those results.

I've attached an example of my data. Am I approaching this the right way?
 

bugman

Super Moderator
#2
Have you tried standardising the counts by the amount of time the traps were deployed?

This is the first thing I would do. I would also add in collection date and location as random effects.
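A minimal sketch of that standardisation in R, assuming hypothetical column names (`count`, `date_set`, `date_collected`) - the idea is to convert raw catches into a rate per trap-day:

```r
## Sketch: convert raw trap catches into a catch rate per trap-day.
## Column names and values are invented for illustration.
traps <- data.frame(
  count          = c(11, 2, 30),
  date_set       = as.Date(c("2009-10-24", "2010-10-05", "2009-11-03")),
  date_collected = as.Date(c("2009-11-03", "2010-10-15", "2009-11-13"))
)
traps$days_deployed <- as.numeric(traps$date_collected - traps$date_set)
traps$per_trap_day  <- traps$count / traps$days_deployed
traps
```

Once every catch is on a per-trap-day scale, counts from traps deployed for 10 vs. 14 days become directly comparable.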

I don't think analysing proportions is going to answer your questions (what is the question, or questions, you are trying to answer?). They are not comparable across sites (i.e. 50% of 1,000 is 500 and 50% of 2 is 1, etc.), so if you are trying to compare actual numbers, this will be very misleading.

By collection times, do you mean collection dates? So did the length of time the traps were in the field vary? Or was it the same length of time but on different dates?
 
#3
Thank you for the response! I have not tried standardizations. The questions are: Do adults fly at different times in the various regions? Are there more adults flying in one region than another? Are there more insects in one year than another?

The collection times were about the same (i.e. collected at 10-day intervals), but the traps were placed, and taken down, at different times, so the actual dates are not the same. That's why I broke it up into bi-weekly time periods. But even these intervals are not always the same. For example, around Christmas there were 13-14 days between collection times because of mandatory off times with the USDA, etc.

Each of these regions has several sites in it; that's why I was working with regions, to account for any variability between individual sites - and that's why I wasn't including those as random variables. Each year the sites were in different places. We are working with a woodwasp that's attracted to pine, so our sites have to be dependent on host availability. Do you think that I should include the actual dates and locations anyway?

I am working with R so I will have to work out the code on the standardization.
 

bugman

Super Moderator
#4
Yeah, so I think you should have site nested within region and have time included as another source of random variation. So is your response (y) counts?

Maybe your R model would look something like this:

Code:
library(MASS)   ## glmmPQL fits a Poisson GLMM through nlme's lme(); plain lme() has no family argument
model1 <- glmmPQL(counts ~ region * year, random = ~ 1 | region/site, family = poisson, data = data)   ## if your data are counts, try Poisson first; site nested within region
summary(model1)   ## note: anova() is not available for PQL fits
There may be better ways to deal with this, but this is how I see it.
 
#5
You're amazing. I went through and standardized the data. I standardized data from each year, instead of doing all data combined. I'll play around with this and see what happens. I have a meeting with my advisor on Monday to go over the stats again. Hadn't talked about using any kind of lm yet so I'm glad you pointed that out. I'm sure I will end up running it about a dozen different ways.

Manuscripts are fun.

I really appreciate your help.
 

bugman

Super Moderator
#6
Good luck and get back to us if you need more help.

Depending on how your data look, you may need to look at the nlme function also...
 
#7
One more question: when nesting, don't you have to have an equal number of sites per "rep"? I have 3 sites in the Ozarks one year and added more the following year. Do I need to create dummy variables set at 0 for those years when the sites were not established?
 
#8
Is it south-east Asia? Or Australia? Maybe Argentina? And are those months, 9, 10, 11, 12? And wasps in December? (Looking out of the window at the remains of snow and ice.) Can't be! Must be down under.

Maybe you can use an offset for the varying deployment times of the traps, 10 days or 14 days. Poisson or negative binomial seems reasonable. But time period, that must be a fixed effect, isn't it? It is not really randomly chosen. Or are the periods really independent?
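The offset idea can be sketched like this (data and column names are invented; the offset of log(days) means the model estimates a catch rate even though deployment periods differ):

```r
## Sketch: Poisson GLM with log(days deployed) as an offset, so counts from
## 10-day and 14-day deployments are modelled on a common rate scale.
## Data are invented for illustration.
d <- data.frame(
  counts = c(4, 9, 2, 12, 0, 7),
  region = factor(c("N", "C", "S", "N", "C", "S")),
  days   = c(10, 14, 10, 13, 10, 14)
)
fit <- glm(counts ~ region + offset(log(days)), family = poisson, data = d)
coef(fit)   # coefficients are log catch rates relative to the baseline region
```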


Maybe you could make use of surveillance models for epidemic outbreaks, like influenza in the winter. An outbreak can start earlier or later depending on the previous period's weather. Maybe that is better than the above linear model.
 
#9
Oddly enough, this is the US - Arkansas, specifically. Traps are set in late September and removed the first week of January. The Poisson distribution does make sense. I would have assumed a fixed effect. Collection times are chosen, essentially, based on availability of people: if I had to wait an extra day because of classes or whatever, then I did. It was supposed to be every 10 days, but that wasn't always the case.

I was thinking of just labeling the collection dates as 1-8 (or 9 or whatever) based on the number of collections....but then the dates would be off. Hadn't thought about the model before. I don't think it needs to be that complicated though...?
 
#11
So I went through the last four years and found that in 2009, sites had either 2, 3, or 4 collection times. 2010 had either 4 or 5 collections, 2011 had 7, 8, or 9, and 2012 had 7 or 9 collections.

However, collection time 1 in 2009 isn't going to be the same as collection time 1 in 2012. Should I still nest these?

After reading some older literature I see the same kind of issue but the way they dealt with it was using a cumulative density function. Basically showing the end date (i.e. January 1) as 100% arrival and everything else leading up to that as a percent arrival of the insect. So it would be at 10% in early October, 50% in mid November etc... does that make sense? I think this would be a good way to visualize the data but maybe not to analyze it.
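That cumulative-arrival curve is easy to build for visualization; a sketch with invented counts:

```r
## Sketch: cumulative percent arrival over a season (for plotting, not analysis).
## Dates and counts are invented for illustration.
weekly <- data.frame(
  date  = as.Date(c("2009-10-05", "2009-10-19", "2009-11-02",
                    "2009-11-16", "2009-11-30", "2009-12-14")),
  count = c(3, 8, 15, 20, 10, 4)
)
weekly$cum_pct <- 100 * cumsum(weekly$count) / sum(weekly$count)
## plot(weekly$date, weekly$cum_pct, type = "s")  # step curve rising to 100%
```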
 

bugman

Super Moderator
#12
> After reading some older literature I see the same kind of issue but the way they dealt with it was using a cumulative density function. Basically showing the end date (i.e. January 1) as 100% arrival and everything else leading up to that as a percent arrival of the insect. So it would be at 10% in early October, 50% in mid November etc... does that make sense? I think this would be a good way to visualize the data but maybe not to analyze it.

Personally, I wouldn't analyse it this way. Might be interesting way to show certain patterns though.

> So I went through the last four years and found that in 2009 sites had either 2, 3, or 4 collection times. 2010 had either 4 or 5 collections, 2011 had 7, 8, or 9, and 2012 had 7 or 9 collections.
I would code collection times something like:

091, 092, 093, ... 101, 102, 103, 111, ... - treat them as random effects and nest them within years, using REML estimation.
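A sketch of that coding and nesting (data and variable names are invented; nlme ships with R, and REML is its default estimation method):

```r
## Sketch: collection times coded as year + sequence number, then nested
## within year as random effects, fitted by REML with nlme.
## Data are invented for illustration.
library(nlme)
d <- expand.grid(year = factor(c("09", "10")),
                 coll = factor(1:4),
                 site = factor(1:3))
d$coll_id <- interaction(d$year, d$coll)   # e.g. "09.1", "09.2", "10.1", ...
set.seed(1)
d$counts <- rpois(nrow(d), lambda = 5)
m <- lme(counts ~ 1, random = ~ 1 | year/coll, data = d, method = "REML")
```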
 
#13
Coding the collection times like that gives me a week or two of offset. For example, it took an additional month to obtain permits in one area, so that site's first collection is mid-November, while the other sites had already been collected 2-3 times prior. This is where the nesting becomes a problem, I think. I need to find a way to standardize the times, such as every 10 days or every two weeks, but if I do that, I sometimes get 2 collection times in a 10-day period and 0 in another just because of how the field work comes about. UGH.
 

bugman

Super Moderator
#14
Can you post an example please?

In terms of the overall objectives of your study, two to three weeks is not going to be an issue. Biologically it certainly isn't, since insect activity and movement is going to occur at a seasonal level.
In saying that, however, if there were big differences in your ambient temperatures between the two-to-three-week periods, this could influence your results. And that being the case, why not add temperature in as a covariate, if you are that concerned?

I am not sure how you are doing this, but as I suggested, adding the sample times (dates) in as random effects accounts for any random variation over and above your annual variation.
 
#15
Unfortunately, the woodwasp I work with only flies for a couple of months (October-December), so 2-3 weeks makes a difference in this case. I have looked at temperature and there isn't much of a difference over time - a steady decrease in average highs and lows, but emergence doesn't seem to match up with any specific event. And we were in a massive drought, so precipitation events are a non-issue as well.

An example would be:
In 2009, the first collection event in south AR occurred on November 3rd, collecting 11 adult females. In 2010, the first collection event occurred on October 15th, collecting 2 adult females. In this case the first collection event is off by more than 2 weeks.

If I were to nest these times it would appear that the first 2009 collection was significantly more than 2010, but that's probably only because of the late start. Does that make sense?
 

bugman

Super Moderator
#16
Yes, that makes sense. But as I said, sample time is a random source of variation over and above the higher levels in your model (region, year, and month?). Even if you sampled on exactly the same date and time in 2009 as you did in 2010, you would still find differences. Model sample time within year, or even within month if you want to make inferences about specific months. Don't get too caught up in the different sample dates or the slightly unbalanced nature of this design.
 
#17
So I added a column to my data with the number of traps at each site (some of the old sites had 3 traps and newer ones only have 2) to get an average Sirex per trap - this way I'm not comparing 30 Sirex from 3 traps to 30 Sirex from 1 or 2. This converted my data into continuous responses, so I performed a glm.

> fit<-glm(Average~Geo.Loc+Year, family=gaussian, data=total)
> summary(fit)

Call:
glm(formula = Average ~ Geo.Loc + Year, family = gaussian, data = total)

Deviance Residuals:
     Min       1Q   Median       3Q      Max
 -2.3714  -1.3835  -0.7539   0.6006   7.6006

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -29.59235  261.91135  -0.113 0.910132
Geo.LocOZ    -0.97200    0.32126  -3.026 0.002739 **
Geo.LocSAR   -1.57775    0.41916  -3.764 0.000208 ***
Year          0.01589    0.13024   0.122 0.903014
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for gaussian family taken to be 4.150306)

Null deviance: 1109.4 on 255 degrees of freedom
Residual deviance: 1045.9 on 252 degrees of freedom
AIC: 1096.8

Number of Fisher Scoring iterations: 2

This gives me significance between regions but not years, which is what we expected. However, my data are highly overdispersed, but I have a lot of 0s and 1s so I guess that's not so surprising. Does this also seem like a good way to look at it? These patterns make sense, and I'm going to do the same with collection dates within each year.
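Given the overdispersion, one alternative (in line with the earlier Poisson/negative binomial suggestion) is to keep the raw counts and use the number of traps as an offset instead of pre-averaging; a sketch with invented data (region names other than OZ and SAR are hypothetical, and MASS ships with R):

```r
## Sketch: negative binomial GLM on raw counts, with log(number of traps)
## as an offset instead of averaging counts per trap beforehand.
## Data and the "CAR" region label are invented for illustration.
library(MASS)
d <- data.frame(
  counts  = c(0, 1, 0, 11, 2, 30, 0, 5, 1, 7, 0, 3),
  region  = factor(rep(c("CAR", "OZ", "SAR"), each = 4)),
  n_traps = rep(c(3, 2), times = 6)
)
fit_nb <- glm.nb(counts ~ region + offset(log(n_traps)), data = d)
summary(fit_nb)$coefficients
```

The negative binomial adds a dispersion parameter, so the many 0s and 1s alongside occasional large catches are handled more gracefully than under a Gaussian (or plain Poisson) fit.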