Problem with qualitative, categorical data analysis

#1
Hi all,

I'm struggling to analyse a data set that consists predominantly of qualitative, categorical data. Each entry records the presence of three different mosquito species (yes/no, coded 1/0), together with the season, year, month, locality, collection method and sex associated with it. I don't know if that makes sense, so I've attached some sample data below that will hopefully clarify. I have tried making some of the data numeric by converting it to count data.

[Attached image: sample of the data]
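The conversion to counts mentioned above can be sketched as follows: collapse the one-row-per-entry 1/0 layout into one count per species for each combination of season, year, collection method and locality. This is a minimal stdlib-only Python sketch; the column names (`sp1`, `sp2`, `sp3`, `season`, `year`, `method`, `locality`, `sex`) and the rows are hypothetical stand-ins for the attached spreadsheet, and the choice of grouping fields is an assumption to adjust per research question.

```python
from collections import Counter

# Hypothetical rows mirroring the layout described in the post: one row per
# entry, a 1/0 presence flag per species, plus categorical fields.
# Column names and values are illustrative, not taken from the real data.
rows = [
    {"sp1": 1, "sp2": 0, "sp3": 0, "season": "Summer", "year": 2012,
     "method": "trap", "locality": "A", "sex": "F"},
    {"sp1": 1, "sp2": 1, "sp3": 0, "season": "Summer", "year": 2012,
     "method": "trap", "locality": "A", "sex": "M"},
    {"sp1": 0, "sp2": 0, "sp3": 1, "season": "Winter", "year": 2012,
     "method": "trap", "locality": "A", "sex": "F"},
    {"sp1": 0, "sp2": 1, "sp3": 0, "season": "Summer", "year": 2013,
     "method": "aspirator", "locality": "B", "sex": "F"},
]

SPECIES = ("sp1", "sp2", "sp3")
# "sex" is carried along but not grouped here; add it to split counts by sex.
GROUP_BY = ("season", "year", "method", "locality")


def presence_to_counts(rows, species=SPECIES, group_by=GROUP_BY):
    """Sum each species' 1/0 flags within every observed combination of group_by.

    Returns a Counter keyed by (species, *group values). A species absent from
    an observed combination gets an explicit count of 0 (zeros matter for
    count models); combinations that were never sampled get no key at all.
    """
    counts = Counter()
    for row in rows:
        key = tuple(row[field] for field in group_by)
        for sp in species:
            counts[(sp,) + key] += row[sp]
    return counts


counts = presence_to_counts(rows)
for key in sorted(counts):
    print(key, counts[key])
```

Keeping explicit zeros for sampled-but-absent species is deliberate: dropping them would inflate apparent abundance in any later count analysis.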

Because of the nature of the data, parametric tests aren't appropriate, and a non-parametric test such as Kruskal-Wallis doesn't work either. A friend suggested I try correlation tests, so I tried Spearman's rank correlation, as well as Wilcoxon signed-rank tests. I got some statistical results from those, but I'm worried that the assumptions of the tests are not met: according to Shapiro-Wilk and Levene's tests, my data are neither normally distributed nor homogeneous in variance. If possible, I would like to run multivariable tests with multiple comparisons. The Wilcoxon signed-rank test gave me a significant p-value, but I don't know which post-hoc test to run next.
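On the post-hoc question: whatever omnibus test or model is used, follow-up pairwise comparisons (for example Species 1 vs 2, 1 vs 3 and 2 vs 3) are normally corrected for multiple testing. Below is a minimal sketch of one common correction, the Holm-Bonferroni step-down adjustment. The p-values are placeholders, not results from this data set, and Holm is only one option (plain Bonferroni is simpler but more conservative).

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The (rank+1)-th smallest p-value is multiplied by (m - rank), capped at 1 ...
        candidate = min(1.0, (m - rank) * pvalues[i])
        # ... and adjusted values are kept non-decreasing in that sorted order.
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Placeholder p-values for three pairwise species comparisons (not real results).
comparisons = ["sp1 vs sp2", "sp1 vs sp3", "sp2 vs sp3"]
raw = [0.01, 0.04, 0.03]
for name, p, p_adj in zip(comparisons, raw, holm_adjust(raw)):
    print(f"{name}: raw p = {p:.3f}, Holm-adjusted p = {p_adj:.3f}")
```

A comparison is then called significant only if its adjusted p-value falls below the chosen alpha.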

My research questions are as follows:
1) Is there a significant difference in species abundance across seasons? I.e. Species 1's abundance in summer, winter, spring and fall vs. that of Species 2 and Species 3.
2) Is there a significant difference in species abundance across years? I.e. Species 1's abundance in Year 1, Year 2, etc. vs. that of Species 2 and Species 3.
3) Is there a significant difference in species abundance between collection methods?
4) Is there a significant difference in species abundance between locations?

Any advice on how to analyse the data would be greatly appreciated!
 

Buckeye

#2
It sounds to me like you want to use some sort of count regression model with season, year, collection method, and location as predictor variables. Potentially a hierarchical model. How much data do you have?
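To make the suggestion concrete, here is a minimal pure-Python sketch of the simplest count regression model, a Poisson GLM with a log link. Categorical predictors such as season enter through dummy (treatment) coding, and the coefficients are found by iteratively reweighted least squares (IRLS). The field names (`season`, `count`) and the numbers are hypothetical; the sketch covers fixed effects only, has no overdispersion handling, and is not the hierarchical model mentioned above. For a real analysis an established implementation (R's `glm(..., family = poisson)` or `statsmodels.api.GLM` with a Poisson family in Python) would be the usual choice, and both ecosystems also offer negative binomial models for overdispersed counts.

```python
import math


def dummy_code(records, factors):
    """Treatment coding: an intercept plus one 0/1 column per non-reference level.

    The reference level of each factor is its first level in sorted order.
    """
    levels = {f: sorted({r[f] for r in records}) for f in factors}
    names = ["(Intercept)"] + [f"{f}={lv}" for f in factors for lv in levels[f][1:]]
    X = []
    for r in records:
        row = [1.0]
        for f in factors:
            row += [1.0 if r[f] == lv else 0.0 for lv in levels[f][1:]]
        X.append(row)
    return names, X


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [A[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def linear_predictor(X, beta):
    return [sum(xj * bj for xj, bj in zip(row, beta)) for row in X]


def fit_poisson(X, y, max_iter=100, tol=1e-10):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(max_iter):
        eta = linear_predictor(X, beta)
        mu = [math.exp(e) for e in eta]
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]  # working response
        # Weighted normal equations (X^T W X) beta = X^T W z with W = diag(mu).
        XtWX = [[sum(row[i] * row[j] * m for row, m in zip(X, mu)) for j in range(p)]
                for i in range(p)]
        XtWz = [sum(row[i] * m * zi for row, m, zi in zip(X, mu, z)) for i in range(p)]
        new_beta = solve(XtWX, XtWz)
        done = max(abs(nb - ob) for nb, ob in zip(new_beta, beta)) < tol
        beta = new_beta
        if done:
            break
    return beta


def poisson_deviance(y, mu):
    """Residual deviance; a y = 0 term contributes only 2 * mu."""
    return 2.0 * sum((yi * math.log(yi / m) if yi > 0 else 0.0) - (yi - m)
                     for yi, m in zip(y, mu))


# Hypothetical aggregated counts, one record per sampling unit.
records = [
    {"season": "Fall", "count": 2}, {"season": "Fall", "count": 4},
    {"season": "Spring", "count": 6}, {"season": "Spring", "count": 6},
    {"season": "Summer", "count": 9}, {"season": "Summer", "count": 11},
    {"season": "Winter", "count": 1}, {"season": "Winter", "count": 0},
]
names, X = dummy_code(records, ["season"])
y = [r["count"] for r in records]
beta = fit_poisson(X, y)
for name, b in zip(names, beta):
    # exp(coefficient) is a rate ratio relative to the reference level (Fall here).
    print(f"{name}: coef = {b:.3f}, exp(coef) = {math.exp(b):.3f}")
mu = [math.exp(e) for e in linear_predictor(X, beta)]
print("residual deviance:", round(poisson_deviance(y, mu), 3))
```

With the full set of factors, e.g. `dummy_code(records, ["season", "year", "method", "locality"])`, each exponentiated coefficient becomes a rate ratio against that factor's reference level, holding the other factors fixed. A level whose counts are all zero has no finite estimate in this model.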
 
#3
Quite a lot: around 10 years' worth of data, roughly 7,800 rows in Excel. I hadn't considered count regression models, thanks for the tip!
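Following on from the count regression suggestion in #2: each of the four research questions in #1 (season, year, collection method, location) can be framed as a comparison between nested models, one with and one without that factor. For likelihood-based models such as a Poisson GLM, a common comparison is the likelihood-ratio (deviance) test: the drop in deviance is referred to a chi-square distribution with degrees of freedom equal to the number of coefficients removed. Below is a minimal pure-Python sketch of that final step. The deviances are hypothetical placeholders, and `chi2_sf` is a hand-rolled incomplete-gamma evaluation standing in for a library routine such as SciPy's `scipy.stats.chi2.sf`. If the counts are overdispersed, this Poisson-based test tends to give p-values that are too small.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df).

    Evaluates the regularized upper incomplete gamma Q(df/2, x/2): a power
    series when x/2 < df/2 + 1, otherwise a Lentz continued fraction.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1.0:
        # Series for the lower regularized gamma P(a, z); return 1 - P.
        term = total = 1.0 / a
        n = 1
        while term > 1e-16 * total and n < 10_000:
            term *= z / (a + n)
            total += term
            n += 1
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, z) directly.
    tiny = 1e-300
    b = z + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h * math.exp(log_prefactor)


def likelihood_ratio_test(deviance_reduced, deviance_full, df_removed):
    """Compare nested models: deviance drop and its chi-square p-value."""
    statistic = deviance_reduced - deviance_full
    return statistic, chi2_sf(statistic, df_removed)


# Hypothetical deviances: a model without season vs. the same model with
# season added (4 seasons -> 3 extra dummy coefficients).
stat, p = likelihood_ratio_test(deviance_reduced=520.0, deviance_full=505.0, df_removed=3)
print(f"LR statistic = {stat:.1f} on 3 df, p = {p:.4f}")
```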