Sound propagation modeling (with lme, unequal sample size, and "drop out" problems?)


I conducted a sound propagation experiment in which 20 different recorded roar-barks (the long-distance vocalization of maned wolves) were played back at different sites (x3), hours (x6: 17h, 18h, 23h, 05h, 06h, 11h), and speaker positions (x2: straight forward and inclined upward 45°). All 20 roar-barks were played in almost all condition combinations (but see question 2), for a total of 840 roar-barks played back. Each playback was re-recorded by 7 recorders at 1.25 m, 20 m, 40 m, 80 m, 160 m, 320 m, and 640 m, which means there are potentially 5880 roar-bark recordings (840 x 7; but see question 3). For each recorded roar-bark I measured the intensity of the sound (in dB) and its correlation with the signal recorded at 1.25 m (an index from 0 to 1 that measures how well the sound structure is preserved).
My goal is to analyze the effect of site, hour, and speaker position on sound propagation quality (a smaller drop in intensity and correlation with distance means better propagation).
So far my approach has been a linear mixed-effects model that goes like this (in R, using nlme): lme(intensity ~ distance + site + hour + position + noise.level + temperature + humidity + wind.speed + site*distance + hour*distance + position*distance + noise.level*distance + temperature*distance + humidity*distance + wind.speed*distance, random = ~ distance | roar.bark). I then plan to build a model average from the best models, selected by AIC, using the functions "dredge" and "model.avg" from the MuMIn package.
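For readers who want something to run, here is a minimal sketch of that model structure on simulated data. It is not my real data set: every number in the simulation is invented, only one covariate (position) is kept, and the column names (`dist.m`, `doublings`, `roar.bark`) are placeholders. Distance is expressed as the number of doublings from the 1.25 m recorder, which is one possible coding and an assumption of this sketch.

```r
library(nlme)  # provides lme(); nlme ships with standard R installations

set.seed(1)

# Simulated stand-in data (NOT the real measurements): 20 roar-barks,
# 7 recorder distances, 2 speaker positions.
dist.m <- c(1.25, 20, 40, 80, 160, 320, 640)
d <- expand.grid(roar.bark = factor(1:20),
                 dist.m    = dist.m,
                 position  = factor(c("forward", "upward")))
d$doublings <- log2(d$dist.m / 1.25)  # distance doublings from 1.25 m

# Invented per-call random intercepts and slopes, plus residual noise
b0 <- rnorm(20, sd = 3)
b1 <- rnorm(20, sd = 1)
id <- as.integer(d$roar.bark)
d$intensity <- 90 + b0[id] +
  (-6 + b1[id]) * d$doublings +
  ifelse(d$position == "upward", 2 - 0.5 * d$doublings, 0) +
  rnorm(nrow(d), sd = 2)

# `a * b` in an R formula already expands to a + b + a:b, so the main
# effects do not have to be listed separately.
# method = "ML" (not the default REML) so that AIC is comparable across
# models that differ in their fixed effects.
fit <- lme(intensity ~ doublings * position,
           random = ~ doublings | roar.bark,
           data = d, method = "ML")

fixef(fit)  # fixed-effect estimates, including the doublings:position term
```

The full model would add the remaining covariates and their interactions with distance in the same way.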
My questions are:
  • 1) Is there a better approach for my goal?
  • 2) At sites A and B we conducted the playbacks only at 18h, 23h, and 05h, while at site C we conducted them at all six hours. Is that a problem when using linear mixed models? Should I model only the 3 hours for which we have observations at all sites?
  • 3) As the roar-bark propagates it gets fainter and fainter and eventually becomes undetectable (when its intensity drops below or near the background noise level, which is around 20-30 dB). At 160 m around 95% of roar-barks are detectable, but at 320 m only around 45%, and at 640 m only around 20%. How do I deal with this missing data in my response/outcome variable (intensity)? It is obviously not missing at random, and the missingness pattern itself carries information about the conditions under which sound propagates better. Should I use the data only up to 160 m, where almost all roar-barks were detected? Should (or can) I use multiple imputation to complete the outcomes, for instance with the Hmisc package? Should I simply fit the model to the outcomes I have (with an unequal number of observations at each distance)?
  • 4) Temperature and humidity are correlated (r = -0.7 in this data set; other data suggest that over a longer measurement period the correlation is stronger than -0.9). Should I use models that include only one of those variables (or neither) when looking for the best model?
  • 5) How do I interpret the interactions (all variables versus distance)? For instance, what does it mean if the coefficient of “position2” is 2.83514 but the coefficient for “position2 * distance” is -0.18735? I have seen similar sound propagation models without the interactions with distance, but I expect at least site, hour, and speaker position will modify the effect of distance on the intensity. That’s basically the study goal. So I think including the interactions with distance is the correct approach. Am I interpreting this wrong?
  • 6) For the intensity level I have reasons to believe the model is linear (or at least can be modeled as linear). Physics predicts a 6 dB drop for each doubling of distance (in addition to other sources of loss, such as absorption, refraction, and reflection by obstacles). So if the variable distance is coded as "1", "2", "3", etc., the model becomes linear because I recorded at doubling distances. For the correlation, however, I have no prior expectation about the shape of the relationship. How can I select the best model type?
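To make questions 5 and 6 concrete, here is a small arithmetic sketch in base R. It assumes the default treatment contrasts with position 1 as the reference level, and it leaves the units of `x` as whatever units distance was coded in when the model was fitted. The names `position.gap` and `crossover` are hypothetical helpers introduced only for this illustration.

```r
# Recorder distances from the post, in metres
dist.m <- c(1.25, 20, 40, 80, 160, 320, 640)

# Number of distance doublings relative to the 1.25 m reference recorder
doublings <- log2(dist.m / 1.25)
doublings
# 0 4 5 6 7 8 9  (1.25 m to 20 m spans four doublings, not one)

# Spherical spreading alone: roughly -6 dB per doubling of distance
spreading.loss.db <- -6 * doublings

# Coefficients quoted in question 5 (treatment contrasts assumed)
b.position2      <- 2.83514
b.position2.dist <- -0.18735

# Predicted difference (position 2 minus position 1) at coded distance x:
# the main effect is the gap at x = 0, and the interaction is how much
# that gap changes per unit of distance.
position.gap <- function(x) b.position2 + b.position2.dist * x

position.gap(0)                                # gap at distance 0
crossover <- -b.position2 / b.position2.dist   # distance where the gap is 0
crossover
```

Under these assumptions, position 2 would start about 2.8 dB louder at distance 0 but lose roughly 0.19 dB more per unit of distance, so the two positions would be predicted equal at a coded distance of about 15 and position 2 would be quieter beyond that. The first part also shows that an index coding of 1 to 7 is not the same as the doubling count, because the first recorder step covers four doublings.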
Thank you all for any help!
P.S.: I judged my problem to be more statistical than related to the R software, which is why I posted here.