How to take into account different treatments, doses and starting health

#1
Hello all, first thread so please be kind! I have rudimentary statistics knowledge of ANOVA and regression, but I am piling through how-to videos on GLMs to no particular avail regarding the comparisons I want to make.

I am analysing a simple bioassay experiment with nematodes, whereby I exposed them to different chemicals and doses, and measured the proportion alive after a certain exposure period.

The variables are as follows:

  1. Nematode (three species)
  2. Chemical (four chemicals plus control)
  3. Dose (three doses... sadly the control was a negative control so dose entered as "0")
  4. Alive nematodes
  5. Total nematodes
  6. (Dead nematodes)
  7. (Proportion alive)
  8. (Proportion dead)

For each nematode - dose - chemical combination, n = 15 (three biological replicates of 5 each).

The average total number of nematodes per datum was 12.6 (SD 4.1).


What I am wanting to investigate is the effect of chemical and dose upon mortality, and to see what interaction the species has.


Problem 1:

Chemical type (four chemicals and control) and dose (three doses) are separate variables. I could split the analysis into separate tests for each chemical, but would it be possible to link the variables of dose and chemical together?

I understand a GLM can test multiple interactions, but would it be possible to compute into the test itself that datum X is not just a 10μl dose AND chemical A, but rather a 10μl dose OF chemical A?
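One way to encode "a dose OF chemical A" is to nest dose within chemical, which amounts to treating every chemical-dose pair as a single treatment level, with the control as its own level. The snippet below is only a minimal sketch of that recoding; the chemical names, dose labels and the `treatment_label` helper are made up for illustration and are not from the actual data set.

```python
from itertools import product

chemicals = ["A", "B", "C", "D"]   # hypothetical names for the four chemicals
doses = ["low", "mid", "high"]     # hypothetical labels for the three doses


def treatment_label(chemical, dose):
    """Collapse chemical and dose into one treatment level.

    The negative control has no meaningful dose, so it gets its own level.
    """
    if chemical == "control":
        return "control"
    return f"{chemical}:{dose}"


# One level per chemical-dose pair, plus the control.
treatments = ["control"] + [treatment_label(c, d) for c, d in product(chemicals, doses)]

print(len(treatments))   # 13
print(treatments[:4])    # ['control', 'A:low', 'A:mid', 'A:high']
```

With this coding the control no longer needs a dummy dose of "0"; it is simply one of thirteen treatment levels that the others can be compared against.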


Problem 2:

Comparing the nematodes brings up a problem in that the controls for each had vastly different mortality rates, with mortality rate seemingly correlating with susceptibility to chemicals.

There is only a sample size of three different nematodes (testing one batch of each), but would it be possible to differentiate if the efficacy of the chemicals is altered by just the starting control mortality, or if the species could also play a role?

I could take control mortality into account by inputting it as an extra variable for each species, but that feels incredibly clumsy and wrong.



Sorry for the long first post but any help, even partial answers, would be greatly appreciated!
 

hlsmith

Not a robit
#2
Whew. Don't accidentally proliferate resistant nematodes; parasitic organisms mess with my head.


Well, this is a case of needing an analytic protocol before conducting the study. You have so many variables it is hard to count them all. Secondary to that problem, you will have a huge number of possible comparisons, which should require correcting your alpha level due to the risk of familywise errors. Lastly, you have a dinky sample size and a binary outcome (dead Y/N), which can be a little less forgiving than a continuous one, where big treatment effects are easier to show.


What was your plan on the front side to analyze these data, which will likely require a logistic regression? You can cherry pick a comparison, but would have to disclose your initial intention. In analyses things also get tricky when dealing with multiple versions of an intervention. I am not sure how to direct you. You really have a daunting path ahead of you and after making corrections for pairwise comparisons, you will need striking results to maintain significance. I have not run such an analytic project with so many subgroups; it may need to be addressed using multilevel logistic regression, but power will be low no matter what you do. I would look to your field's literature to see if you can find a comparable study and what analyses they may have conducted. Though remember, just because it was published doesn't mean someone else's approach was correct either! It is easy to find poorly conducted studies in journals.
 
#3
Thanks for the reply. The initial plan was not to undertake regression, but rather simply to test individual chemical + concentration combinations for a significant effect. Since the partner study didn't turn up any synergy between nematode and chemical, this study was put on the shelf. A year and a half later, the group is interested in the results but from a different perspective, so the aim is to gain as much comparative data as possible from this study.

N = 15 sounds low, but with an average value of 12.6 and low variation, significance was not hard to come by. Doing a quick ANOVA + Tukey test showed significant differences between most groups (even when not necessarily wanted... definite risk of Type I errors).

It would be fine to split the data by nematode to reduce the number of comparisons. The main difficulty is combining chemical and dose combinations in a single test (is there an established protocol for this?).
 
#5
Maybe (or probably) I have missed something, but it seems to me that you have three explanatory variables:
- dose (5 levels including the value of 0)
- chemical type (4 levels)
- nematode species (3 levels).

And you have one dependent variable:
- the number of dead (call it "y") out of n=15 individuals.

It does not matter if you prefer to look at the number of alive individuals out of 15. (But for me it is easier to think of the proportion dead increasing as the concentration increases.) (I believe that R A Fisher created PROBIT when an insect researcher (Bliss) used increasingly larger poison concentrations.)

Given dose, chemical and species, it is natural to assume that Y (out of n=15) is binomially distributed with parameter "p". This leads to logit (or probit or complementary log-log).

"p" is the proportion dead, given dose, chemical and species.

One can imagine a table with 5 concentrations as columns and 4 chemicals as rows and the proportion dead in each cell. Then there would be 3 such tables, one for each species. (I guess you did that long ago.) I recommend that you make plots of this.

So I would suggest the following logit model:

log(p/(1-p)) = alpha + dose_j + chem_i + (dose_chem)_ij + spec_k + (spec_chem)_ik + (spec_dose)_kj + (dose_chem_spec)_ijk

with hopefully obvious nomenclature for main effects, two-factor-interactions (2fi) and one three-factor-interaction (3fi).
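To show how the terms of such a logit model combine, here is a small sketch that adds effects on the log-odds scale and converts the result back to a probability. All the effect sizes are invented purely for illustration (they are not estimates from any data), and the function names are my own.

```python
import math


def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(eta):
    """Probability corresponding to a log-odds value eta."""
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical effects on the log-odds scale for ONE cell (i, j, k).
alpha = -2.0
effects = {
    "dose_j": 1.2,
    "chem_i": 0.5,
    "dose_chem_ij": 0.3,
    "spec_k": -0.4,
    "spec_chem_ik": 0.1,
    "spec_dose_kj": 0.2,
    "dose_chem_spec_ijk": 0.0,
}

# The linear predictor is a plain sum; the non-linearity lives in inv_logit.
eta = alpha + sum(effects.values())
p_dead = inv_logit(eta)

print(round(eta, 3), round(p_dead, 3))
```

The point is only that everything is additive on the logit scale, so an interaction term means "the effect of dose on the log-odds differs between chemicals (or species)".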

If one or several of the 3fi or 2fi can be deleted (i.e. is not significant), then the precision will be increased.

Maybe the dose variable can be used as a "quantitative" regression-type variable (i.e. as a covariate). If the effect of dose is linear, then you will only need to estimate one regression parameter instead of 4. That would also improve precision.
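A rough way to see the saving is to count parameters. The sketch below assumes the layout described in this post (5 dose levels, 4 chemicals, 3 species) and uses the usual rule that a term costs the product of (levels - 1) over its factors, with a linear covariate costing 1. The helper names are hypothetical.

```python
from math import prod


def term_df(term, df):
    """Parameters needed for one model term (product over its factors)."""
    return prod(df[factor] for factor in term)


def model_df(terms, df):
    """Total parameters: one intercept plus all terms."""
    return 1 + sum(term_df(t, df) for t in terms)


# Main effects, two-factor interactions and the three-factor interaction.
terms = [
    ("dose",), ("chem",), ("spec",),
    ("dose", "chem"), ("spec", "chem"), ("spec", "dose"),
    ("dose", "chem", "spec"),
]

dose_as_factor = {"dose": 4, "chem": 3, "spec": 2}      # (levels - 1) each
dose_as_covariate = {"dose": 1, "chem": 3, "spec": 2}   # one slope for dose

print(model_df(terms, dose_as_factor))     # 60, one per cell (saturated)
print(model_df(terms, dose_as_covariate))  # 24
```

So under these assumptions the full model drops from 60 to 24 parameters when dose enters linearly, because every interaction involving dose shrinks as well, not just the main effect.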

I believe that the comparisons you wanted to make can be done within this framework.

But probably I have missed something. :)
 

hlsmith

Not a robit
#6
Well laid out response Greta. I was also envisioning such structures. Though, as a person who uses a lot of binary outcomes, it gets difficult from a binomial standpoint when the outcomes can take on only so many values and you have to account for sampling variability given the sample size. I usually liken it to coin flips and the law of large numbers.


I will also mention that testing an interaction for significance and then throwing it out if not significant may be frowned upon since the final results are conditional on the entire model building process. This is why many try to use training and testing datasets or correct alpha given the process.


But once again the OP should be happy that you gave such a well structured response.
 
#7
As I said, I might have missed something. Maybe hlsmith and I expect different kinds of data.

I believed that these data were from a balanced designed experiment, so that with 5 doses, 4 chemicals and 3 species there would be 60 cells. And with 15 individuals in each cell that would make 900 nematodes (5*4*3*15). That is not a very small sample.

Note that OP reported significance with anova and Tukey hsd (they are not really correct since they are based on the normal distribution, but they are not completely wrong).

But maybe hlsmith expected an observational study with lots of multicollinearity and also fewer observations. Then of course it would be more uncertain.

If the model building is such that the higher interaction effects are removed first, I believe that it is acceptable.

We will see what the original poster (OP) says.

(I fear that the biggest weakness is that the study was not randomized. Or was it?)
 

ondansetron

TS Contributor
#8
> If the model building is such that the higher interaction effects are removed first, I believe that it is acceptable.
I learned this is an acceptable method. If the higher order is significant, don't test lower order interactions of those variables, for example, because those variables are important by definition. It's no different than X1*X2 interaction having a significant t-test (or chi-squared, etc) and then deciding not to test X1 or X2 individually since the significant interaction necessitates the utility of X1 and X2 in the model. This relies on logic and appeals to judicious testing policies.
 

hlsmith

Not a robit
#9
Would 60 cells mean 59+58+57+...+3+2+1 pairwise tests? Bonferroni correction: 0.05/3540 = 0.000014.

Yeah so if one group has near null kill rate and another has near total kill rate sure some should come out fine given chance and true effect. I am just trying to point out the proximal dosing or kill rates will require power.

No I wasn't thinking of multicollinearity, OP hasn't mentioned that. Randomization should negate other imbalances.

PS, insignificant interactions may just mean a failure to reject the null, not that there isn't an effect.
 
#10
> Would 60 cells mean 59+58+57+...+3+2+1 pairwise tests? Bonferroni correction: 0.05/3540 = 0.000014.
No, not all pairs need to be tested. (I am forgetting, but isn't the number of pairs 60*59/2 = 1770?)
But if the 3 species are evaluated separately, then each table has 4*5 = 20 cells, so 3*20*19/2 = 570 comparisons.

But once it is concluded that there is an effect of dose, there is no need for pairwise comparisons.


> Yeah so if one group has near null kill rate and another has near total kill rate sure some should come out fine given chance and true effect. I am just trying to point out the proximal dosing or kill rates will require power.
Yes, maybe there will be a wide interval for the dose at, for example, LD50.

> PS, insignificant interactions may just mean a failure to reject the null, not that there isn't an effect.
Yes of course. I would have preferred a bioequivalence test for this, but the standard textbook method seems to be to drop insignificant 3fis and 2fis.
 

ondansetron

TS Contributor
#11
> PS, insignificant interactions may just mean a failure to reject the null, not that there isn't an effect.
I should have clarified that what I learned was for exploratory kinds of modeling/pure model building rather than something that has a theoretical justification (i.e. you're taking more guidance from testing and follow up validation than from prior knowledge). If something has theoretical justification or prior research that causes us to believe an interaction exists, there's no reason to test that interaction term in my mind because the theory justifies it. Just put the term in the model. We can reduce our exposure to Type I and Type II errors by avoiding unnecessary tests.
 
#12
Yeah, I listed permutations, while combinations would be fine for the total number of unique pairwise comparisons, since order should not matter.
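For anyone following along, the counts discussed above can be checked in a few lines. This is just the arithmetic from the thread (60 cells, or 3 tables of 20 cells) with a plain Bonferroni division; the variable names are mine.

```python
from math import comb, perm

alpha = 0.05
cells = 60

ordered_pairs = perm(cells, 2)     # order matters: 60 * 59 = 3540
unordered_pairs = comb(cells, 2)   # order ignored: 60 * 59 / 2 = 1770
per_species = 3 * comb(20, 2)      # 3 tables of 20 cells: 3 * 190 = 570

for label, m in [("ordered pairs", ordered_pairs),
                 ("unordered pairs", unordered_pairs),
                 ("within species", per_species)]:
    # Bonferroni: divide the familywise alpha by the number of tests.
    print(f"{label}: {m} tests, per-test alpha = {alpha / m:.2e}")
```

Either way the per-test threshold is tiny, which is the practical point about power made earlier.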
 
#13
Thank you for the detailed replies! There has been a little confusion over the size of the experiment: it tested four chemicals at three doses each, plus a control, for a total of thirteen different treatments. With three nematode species investigated, the total number of combinations was 39. For each combination, 15 counts were taken of nematodes within a "drop", averaging 12.8 nematodes per drop. The mean number counted per treatment was thus 169.2, and just for fun there were 6600 nematodes in the whole experiment (lots of counting).
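Spelled out as a quick check, using only the figures quoted in this post (the variable names are just for illustration):

```python
n_chemicals = 4
n_doses = 3
n_species = 3
total_nematodes = 6600   # total quoted above

# 12 chemical-dose pairs plus one shared negative control.
treatments = n_chemicals * n_doses + 1
# Every treatment was applied to every species.
combinations = treatments * n_species
mean_per_combination = total_nematodes / combinations

print(treatments, combinations, round(mean_per_combination, 1))  # 13 39 169.2
```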

Since there is no fixed n = 15 nematodes per treatment from which the death rate is measured, would a logistic model still work? Another problem may be that the response is not linear, as the chemical concentrations have diminishing returns (would non-linearity even be measurable with only three dose values?).

Forgive the cluelessness, but other than the binary outcome, is there a structural difference in how a logit model works compared to other types of regression? Are they better at modelling interacting independent variables? (I was expecting to go down the GLM route for this).

> (I fear that the biggest weakness is that the study was not randomized. Or was it?)
I am sorry, could you clarify this?
 
#14
With the logistic you are dealing with the binomial distribution for the outcome. So each trial is an independent Bernoulli trial. Kind of like a coin flip.
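As a rough illustration of why the number of nematodes per drop does not have to be fixed: each drop contributes "alive out of total" to a binomial likelihood, and the totals can differ from drop to drop. The counts below are made up, and the sketch assumes one common survival probability for all drops in a group, which is the simplest possible case.

```python
import math


def binom_loglik(p, alive, total):
    """Binomial log-likelihood of survival probability p over several drops."""
    ll = 0.0
    for a, n in zip(alive, total):
        ll += math.log(math.comb(n, a)) + a * math.log(p) + (n - a) * math.log(1.0 - p)
    return ll


# Hypothetical counts for five drops from one treatment (not real data).
alive = [9, 12, 7, 10, 11]
total = [12, 15, 9, 13, 14]

# With a single common p, the maximum-likelihood estimate is the pooled proportion.
p_hat = sum(alive) / sum(total)

for p in (0.6, p_hat, 0.9):
    print(round(p, 3), round(binom_loglik(p, alive, total), 3))
```

A logistic regression does the same thing, except that p is allowed to depend on the explanatory variables through the logit link.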

I still don't follow the 12.8 calculation, can you show exactly how you get to that number?
 
#15
> With the logistic you are dealing with the binomial distribution for the outcome. So each trial is an independent Bernoulli trial. Kind of like a coin flip.
>
> I still don't follow the 12.8 calculation, can you show exactly how you get to that number?
I see, thanks, so does that make it a more favourable regression for grouped independent variables?

12.8 is just the average value per datum. 15 drops counted per nematode + treatment combination, each one averaging 12.8 nematodes.
 
#16
> 12.8 is just the average value per datum. 15 drops counted per nematode + treatment combination, each one averaging 12.8 nematodes.
I don't understand this. What are you measuring? 15 drops of what? Did you, for each treatment combination, take 15 drops, measure the number of nematodes, and check how many of these were alive and how many dead?
 
#17
> I don't understand this. What are you measuring? 15 drops of what? Did you for each treatment combination take 15 drops and measured number of nematodes and checked how many of these were alive and how many dead?
Yes exactly; the nematodes were in a liquid medium from which they were counted in 20 microlitre drops. Both the total number of nematodes and the number of moving nematodes were counted.
 
#18
Let's see now. The treatment combinations were:

3 doses,
4 chemicals.

That makes a balanced factorial experimental design of 3*4 = 12 experimental conditions.
In addition 1 control with zero dose and thus no chemical. That makes 12 +1 = 13 experimental conditions.

You also had 3 species but they are not really treatments. (I don't know how to think about them but they will triple the number of nematodes.)

If you had randomized then you would have taken an experimental unit and assigned it to one of the 13 treatments. (To randomize means to do a lottery for where it should go.)

If you had picked one nematode individual, randomly assigned it to one of the 13 treatments, and checked whether it was dead or alive (a Bernoulli experiment), and continued like that until you had 15 nematodes per treatment, then that would have been 13 binomial experiments and you could have evaluated it with logit (as hlsmith wrote above).

But I don't think you did it that way. Maybe you had a large group of nematodes in each of 13 test tubes. If you had randomized each test tube to one of the 13 treatments, it would have been a randomized experiment. But then you would have got an extra random term.

This is like exposing pupils to 3 doses of exercise. If you do not assign individuals to the training programs, but instead randomly assign a whole school class to each program, then the experimental unit is not the individual but the school class. So instead of having, say, 15 children times 3 classes = n = 45, you just have n = 3 experimental units. If you randomize classes (test tubes), you will miss the extra variation that exists between the school classes (test tubes).
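The school-class example can be put into numbers with the usual design-effect approximation for equal-sized clusters, 1 + (m - 1) * ICC, where m is the cluster size and ICC is the intraclass correlation. This is a standard survey-sampling rule of thumb rather than anything specific to this experiment, and the ICC values below are hypothetical.

```python
def design_effect(cluster_size, icc):
    """Variance inflation from clustering, assuming equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc


def effective_n(n_clusters, cluster_size, icc):
    """Number of independent observations the clustered sample is 'worth'."""
    return n_clusters * cluster_size / design_effect(cluster_size, icc)


# 3 classes (or tubes) of 15 individuals each, as in the example above.
for icc in (0.0, 0.1, 0.5, 1.0):
    print(icc, round(effective_n(3, 15, icc), 1))
```

With no within-class correlation the 45 children count as 45 observations; with perfect correlation they count as only 3, one per class. Real data fall somewhere in between, which is what a random effect for class (or tube) is meant to capture.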

Now I believe that the experiment can be evaluated as a mixed model (= multilevel model) with dose and chemical as fixed explanatory factors and the test tube as a random effect.

I hope the other ones will comment on this so that it will be a little bit more understandable.

Edit: Ricky, please inform us about what is correct and what is not correct of what I said above.
 
#19
Thanks for the reply, I have been learning various things about mixed models in SPSS. Still very confusing to understand all the different options but it feels like things are starting to come together!

You are correct in that there were separate tubes for the different treatments, in fact there were three biological replicates per treatment (one per five drops) for a total of 39 tubes per nematode species. I have included them as an extra variable (thankfully I inputted the data chronologically!)

The nematodes were taken from the same source though so there theoretically shouldn't be any difference (other than the chemical) between the tubes. Would tube still count as a random effect?

I have run a mixed model and included dose as a nested effect within chemical, although I am not entirely sure about whether the single "dose(chemical)" term is sufficient for the model. I have also included tube number as a random effect. I am not sure at this stage if I have missed anything from the statistical test, would there be critical parameters or functions that I have missed?

The interactions in the output are quite confusing. There are different significance values for all of the parameters, but I am not sure what interactions they are referring to. Ideally I would be finding how the chemicals compare against each other and the control.

The fixed effects output is below:
 

CowboyBear

Super Moderator
#20
Sorry for the delay releasing your posts - they were caught in the spam filter. I've deleted the two first ones and kept the last, since it looks like you tried three times to post without success. Sorry about that - the filter sometimes flags posts from new members that include attachments.