Unbalanced design in contingency table analysis: is it a restriction?

#1
I conducted a study to elucidate the biology of an insect seed predator that infests the seeds of a palm species in cloud forest in Venezuela. I collected seeds directly from the palms and seeds picked up from the forest floor in two localities. I want to know whether the localities differ with respect to where the insects lay their eggs (seeds on the plant or seeds on the forest floor).
I tried to answer this question with a contingency table analysis, but I have two questions:
1) A reviewer criticized my sampling design because it is not balanced (26 vs 176 seeds at the Avila site). I think that an unbalanced design is not a restriction in contingency table analysis (but small expected values are). Is that correct?
2) I will test for independence between place of oviposition and infestation in each locality with a chi-squared analysis. But could I compare the probabilities of insect infestation on the plant or on the forest floor between sites in a single analysis? Would logistic regression be a valid method? Odds ratios?

View attachment 6641
 

maartenbuis

TS Contributor
#2
The unbalanced design is the source of your low expected values, combined with the (fortunately) low chance of being infested. The 26 samples on plant in Avila lead to the very low expected value in the Avila plant infested cell. Since infestation is so rare, you will need a large sample to get acceptable power, as we discussed recently on this forum: http://www.talkstats.com/showthread.php/69607-Simulating-a-logistic-regressio-scary-results. A sample of 365 is probably not large enough. Anyhow, if you were to perform a logit analysis, then here are the results when you constrain the effect of place to be the same in both localities (the analysis was done in Stata):

Code:
. clear

. input loc place infested freq

           loc      place   infested       freq
  1. 1 1 1   0
  2. 1 1 0  80
  3. 1 0 1   8
  4. 1 0 0  65
  5. 0 1 1  10
  6. 0 1 0 176
  7. 0 0 1   0
  8. 0 0 0  26
  9. end

. 
. label define loc 1 "Macarao" 0 "Avila"

. label value loc loc

. label define place 1 "forest floor" 0 "plant"

. label value place place

. label define infested 1 "yes" 0 "no"

. label value infested infested

. 
. logit infested i.loc i.place [fw=freq], or nolog

Logistic regression                             Number of obs     =        365
                                                LR chi2(2)        =       2.86
                                                Prob > chi2       =     0.2398
Log likelihood = -70.291995                     Pseudo R2         =     0.0199

-------------------------------------------------------------------------------
     infested | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
          loc |
     Macarao  |      .7669   .4174429    -0.49   0.626     .2638812    2.228789
              |
        place |
forest floor  |   .3954149   .2157548    -1.70   0.089      .135707    1.152136
        _cons |   .1062894   .0560687    -4.25   0.000     .0377984     .298887
-------------------------------------------------------------------------------
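
To make the point about low expected values concrete, here is a minimal sketch in Python (not part of the Stata session above) that computes the expected cell counts under independence of place and infestation within each locality. The counts are copied from the `input` block above; the `counts` dictionary and the `expected_counts` helper are names made up for this illustration.

```python
# Counts copied from the Stata input above:
# (locality, place) -> (infested, not infested)
counts = {
    ("Avila", "plant"): (0, 26),
    ("Avila", "forest floor"): (10, 176),
    ("Macarao", "plant"): (8, 65),
    ("Macarao", "forest floor"): (0, 80),
}


def expected_counts(locality):
    """Expected counts under independence of place and infestation in one locality.

    Each expected count is (row total * column total) / table total.
    """
    rows = {place: cells for (loc, place), cells in counts.items() if loc == locality}
    n = sum(sum(cells) for cells in rows.values())
    col_totals = [sum(cells[j] for cells in rows.values()) for j in range(2)]
    return {
        place: tuple(sum(cells) * col_totals[j] / n for j in range(2))
        for place, cells in rows.items()
    }


for locality in ("Avila", "Macarao"):
    for place, (exp_inf, exp_not) in expected_counts(locality).items():
        print(f"{locality:8s} {place:12s} "
              f"expected infested = {exp_inf:.2f}, not infested = {exp_not:.2f}")
```

With these counts the expected number of infested seeds in the Avila plant cell is 26 * 10 / 212, roughly 1.23, well below the common rule of thumb of 5. That is the product of a small row total (26 seeds) and a rare outcome (10 infested seeds out of 212).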