I'm a biology student trying to analyse data from a project investigating habitat selection by a species of bird. My hypothesis is that the birds prefer more disturbed habitats.

The data I've collected takes the following format:

Transect  Habitat  Density of birds
1         1        0.004
1         2        0.003
1         3        0
1         4        0.0003
1         5        0.01
2         1
2         2.... etc.

Where 1-5 are broad-scale habitat types, ranging from 1 = most disturbed to 5 = least disturbed.

I want to test for a difference in bird density between habitats, and my supervisor has told me to just use a Kruskal-Wallis test. My worry is that this doesn't control for transect number. From reading my statistics textbook and having a look online, I suspect I need an analysis with habitat nested within transect, but this is where I become a bit lost! I was hoping someone might be able to give me some advice: am I at least thinking along the right lines, or should I just obey my supervisor?
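To make the worry concrete, here is a minimal sketch of how the Kruskal-Wallis H statistic is computed, using only the Python standard library. The function names (`average_ranks`, `kruskal_wallis_h`) and the density values are made up for illustration and are not my real data. Note that the calculation pools every observation for a habitat into one group, so the transect each observation came from never enters it.

```python
from collections import Counter, defaultdict


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H for a dict {group label: [values]}."""
    values = [v for g in groups.values() for v in g]
    labels = [name for name, g in groups.items() for _ in g]
    n = len(values)
    ranks = average_ranks(values)

    rank_sums = defaultdict(float)
    for label, rank in zip(labels, ranks):
        rank_sums[label] += rank

    h = 12 / (n * (n + 1)) * sum(
        rank_sums[name] ** 2 / len(g) for name, g in groups.items()
    ) - 3 * (n + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    tie_term = sum(t ** 3 - t for t in Counter(values).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Illustrative densities only (NOT real data), keyed by habitat type.
# Transect identity is discarded here, which is exactly the concern.
densities_by_habitat = {
    1: [0.004, 0.006, 0.005],
    2: [0.003, 0.002, 0.004],
    3: [0.0, 0.001, 0.0],
}
h_statistic = kruskal_wallis_h(densities_by_habitat)
print(f"H = {h_statistic:.3f}")
```

H is then compared against a chi-square distribution with (number of habitats - 1) degrees of freedom. If SciPy is available, `scipy.stats.kruskal` should give the statistic and p-value directly.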




After grabbing hold of another tutor this afternoon, he suggested I try a linear mixed model with 'Transect' added as a random effect. I've had a go and this seems to work, but I've now identified another potential problem: my data are unbalanced. Because not every habitat occurred on every transect, I have more samples of some habitats than others (e.g. habitat 1 occurred 16 times but habitat 2 only 12). This analysis is turning into something of a nightmare!


How "unbalanced" is it? If we're talking somewhere between 10 and 20 samples for each habitat, then it shouldn't be a huge deal. If it's more like 12 in one, 250 in another, etc., that might be a problem.
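A quick way to answer that is to tabulate how many samples each habitat has and look at the ratio of the largest to the smallest count. The sketch below uses only the Python standard library. The helper names (`habitat_counts`, `imbalance_ratio`) and the (transect, habitat) pairs are hypothetical placeholders, so substitute your own records.

```python
from collections import Counter


def habitat_counts(observations):
    """Count samples per habitat from (transect, habitat) pairs."""
    return Counter(habitat for _transect, habitat in observations)


def imbalance_ratio(counts):
    """Largest group size divided by smallest group size."""
    return max(counts.values()) / min(counts.values())


# Hypothetical (transect, habitat) pairs, NOT real data.
# Habitat 2 is missing from transect 2 and habitat 3 from transect 3.
observations = [
    (1, 1), (1, 2), (1, 3),
    (2, 1), (2, 3),
    (3, 1), (3, 2),
]

counts = habitat_counts(observations)
for habitat in sorted(counts):
    print(f"habitat {habitat}: {counts[habitat]} samples")
print(f"largest / smallest = {imbalance_ratio(counts):.2f}")
```

With the counts mentioned above (16 for habitat 1 and 12 for habitat 2), the ratio for those two habitats would be 16/12, roughly 1.3, which is a long way from the 12-versus-250 situation.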