# Alternatives to logistic regression when the number of cases is low

#### kimwg

##### New Member
Hi All

I got involved in my first foray into logistic regression, but have come to realize that I probably can't use it. Let me tell you about my data.

I am investigating the consumption of 3 different insect pests by bats in an agricultural area. The response variable is binary for each of the three insect pests (the bat did or did not eat the insect pest). A multinomial coding is possible though perhaps not appropriate in that a bat could potentially eat more than one pest at a time.

A number of IVs could affect the probability of consuming a pest:

Season: insects have variable abundance across the year.
Habitat type: forest fragment or agricultural area. This can be expressed as a forest/agriculture dichotomous nominal variable or as a continuous variable that summarizes various measures of agricultural intensification.
Bat foraging strategy: do the bats hunt insects on the wing or pluck them off vegetation? Different pests would be differentially vulnerable to each predation strategy.
Degree of vegetation clutter where bats can hunt: some can hunt only in open space, some can hunt in dense vegetation. There is some overlap/correlation with the foraging strategy variable.

Each insect has unique ecology and habitats, so it is reasonable to expect that the probability of consumption will respond differently to each IV.

Anyway, you are probably thinking "hey dummy, run a logistic regression" and you would be right EXCEPT that my number of cases is low. Of 217 samples analyzed, only 22 came up positive for any pest insect (8 positives for insect pest 1, 5 for insect 2, and 9 for insect 3). The most basic "rules of thumb" of 10 cases per IV suggest that I just do not have the sample size necessary to run a binomial logistic for each insect pest. From what I understand, multinomial models need even more cases. Due to the ridiculous amount of effort needed to catch these bats and get their poop to see if they ate the pests, it just is not reasonable to collect more data.
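To make the rule-of-thumb arithmetic concrete, here is a minimal sketch using only the counts quoted above. The helper name `max_predictors` is made up for illustration, and each IV is treated as a single parameter, which is optimistic for a multi-level variable such as season.

```python
def max_predictors(events, non_events, events_per_iv=10):
    """Largest number of IVs the rule of thumb allows for one binary outcome."""
    # The limiting count is the rarer of the two outcomes.
    return min(events, non_events) // events_per_iv

n_samples = 217
positives = {"insect 1": 8, "insect 2": 5, "insect 3": 9}  # counts from the post
n_candidate_ivs = 4  # season, habitat, foraging strategy, clutter

for pest, events in positives.items():
    allowed = max_predictors(events, n_samples - events)
    epv = events / n_candidate_ivs
    print(f"{pest}: rule allows {allowed} IV(s); "
          f"all 4 IVs would give {epv:.2f} events per IV")
```

Under this rule, none of the three pests supports even a single predictor.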

So if I don't do logistic regression, I could, for example, do a chi-square or Fisher's exact for each DV (insect 1, 2, 3), and each IV (4 of these) which leaves me doing 12 tests. Seems inelegant and more tests means more false positives.

Any thoughts on alternatives I could pursue? Ways to deal with 12 individual tests? Is adjusting p-values (e.g., sequential Bonferroni style, as done in multiple pairwise comparisons) an approach to use in this context if I do use chi-square tests?
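For reference, the sequential Bonferroni (Holm) adjustment mentioned above can be sketched in a few lines of plain Python. The twelve p-values below are invented placeholders, not results from this study, and `holm_adjust` is a hypothetical helper name. `statsmodels.stats.multitest.multipletests` with `method="holm"` should give the same adjustment if that package is available.

```python
def holm_adjust(p_values):
    """Return Holm (sequential Bonferroni) adjusted p-values, in input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity so adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted

# PLACEHOLDER p-values for 12 tests (3 pests x 4 IVs); purely illustrative.
placeholder_p = [0.004, 0.03, 0.20, 0.51, 0.012, 0.08,
                 0.33, 0.74, 0.045, 0.15, 0.62, 0.90]

for raw, adj in zip(placeholder_p, holm_adjust(placeholder_p)):
    print(f"raw p = {raw:.3f}  ->  Holm-adjusted p = {adj:.3f}")
```

Each adjusted value is compared with the usual threshold (e.g. 0.05). Holm controls the family-wise error rate like plain Bonferroni but is never less powerful.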

#### noetsi

##### Loves R
I think Fisher's exact test requires a lot less data, so you might go with that. With chi-square, having cells with fewer than 5 cases creates problems, and I think having a cell with 0 will keep the test from running correctly (I'm not sure of the latter, but you want most of your cells to have at least 5 cases). I don't think you will be able to test, for example, whether consumption of insect 2 (five cases) was influenced by a given variable: even if that variable has only 2 levels, you probably won't have enough cases per cell for the statistic to run. Not for chi-square anyway. It might work for Fisher's, although I tend to doubt it. Even if it does run, I would think your power will be incredibly low, so even a reasonably large effect may not come out statistically significant.

That is, you won't really know whether a non-significant result means there is no effect or just too little power to detect one. I also wonder whether it makes substantive sense to test if a variable is influencing consumption of these insects when the bats don't seem to be eating them. But of course I don't know your research, so that might be a silly observation.
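On the question of whether Fisher's exact test will even run with so few positives: it conditions on the table margins, so it can be computed for any 2x2 table, including one with an empty cell. Below is a minimal pure-Python sketch. The split of the five insect 2 positives across habitats and the 108/109 split of samples are hypothetical, chosen only for illustration. `scipy.stats.fisher_exact` does the same job if SciPy is installed.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables at least as unlikely as the observed
    # one; the small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# HYPOTHETICAL tables: rows = forest / agriculture, columns = ate pest / did not.
print(fisher_exact_two_sided(4, 104, 1, 108))  # 4 vs 1 positives
print(fisher_exact_two_sided(5, 103, 0, 109))  # a zero cell still runs
```

Running is not the same as having power, of course; with five positives the set of attainable p-values is very coarse.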

For chi-square, the relevant assumptions are:

Sample size (whole table) – A sample with a sufficiently large size is assumed. If a chi squared test is conducted on a sample with a smaller size, then the chi squared test will yield an inaccurate inference. The researcher, by using chi squared test on small samples, might end up committing a Type II error.
Expected cell count – Adequate expected cell counts. Some require 5 or more, and others require 10 or more. A common rule is 5 or more in all cells of a 2-by-2 table, and 5 or more in 80% of cells in larger tables, but no cells with zero expected count. When this assumption is not met, Yates's Correction is applied.
http://en.wikipedia.org/wiki/Pearson's_chi-squared_test
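The expected-count rule can be checked directly, since each expected cell is (row total x column total) / N. The sketch below uses a hypothetical habitat-by-consumption table for insect 2; only the 5 positives and 217 samples come from the thread, and the 108/109 habitat split is an assumption.

```python
def expected_counts(table):
    """Expected cell counts under independence for a table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# HYPOTHETICAL table: rows = forest / agriculture, columns = ate pest / did not.
observed = [[4, 104],
            [1, 108]]

for row in expected_counts(observed):
    print([round(cell, 2) for cell in row])
```

Whatever the real split turns out to be, the expected counts in the "ate pest" column must sum to the number of positives (5, 8, or 9), so in a 2-by-2 table at least one of them falls below 5 for every pest.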

I have no idea what the minimum sample size is for Fisher's exact test, but you should at least do a power analysis for it. The page below discusses how to do that in G*Power, which is a very good tool for estimating power.

http://udel.edu/~mcdonald/statfishers.html
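As a rough cross-check on a G*Power calculation, power for Fisher's exact test can also be estimated by simulation. This is only a sketch under stated assumptions: the group sizes (108 and 109) and the two consumption probabilities are hypothetical, and the function names are made up for illustration.

```python
import random
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)
    probs = [comb(row1, x) * comb(row2, col1 - x) / total
             for x in range(max(0, col1 - row2), min(row1, col1) + 1)]
    p_obs = comb(row1, a) * comb(row2, col1 - a) / total
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))

def simulated_power(n1, n2, p1, p2, alpha=0.05, reps=2000, seed=1):
    """Share of simulated studies in which Fisher's test rejects at alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Draw the number of positive samples in each group.
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        if fisher_p(x1, n1 - x1, x2, n2 - x2) < alpha:
            rejections += 1
    return rejections / reps

# HYPOTHETICAL scenario: 4% of forest samples vs 1% of agricultural samples
# contain the pest, with 108 and 109 samples per habitat.
print(simulated_power(108, 109, 0.04, 0.01))
```

Trying a few plausible pairs of probabilities shows how large a difference would be needed before the test has a reasonable chance of detecting it.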

#### kimwg

##### New Member
Everything you have said makes perfect sense -- thank you for the reality check! I appreciate it greatly and already have G*Power, so it should be a snap. Thank you kindly!

#### kimwg

##### New Member
> That is you won't really know if you should have rejected the null when you did not because of power issues. I also wonder if it makes substantive sense to test if a variable is influencing consumption of insects when in fact the bats dont seem to be eating insects. But of course I don't know your research so that might be a silly observation

Ah, and the other thing I should mention is that this is a tropical system with thousands of insect species, so they are actually eating these three insect pests quite a lot!

#### noetsi

##### Loves R
> Ah, and the other thing I should mention is that this is a tropical system with thousands of insect species, so they are actually eating these three insect pests quite a lot!

That is really curious to me. Do you have a theory about why you found them so rarely if the bats are eating them a lot? Or do you mean that five is a lot with so many species to choose from?

Sounds like fascinating research. Good luck.