# what is the effect of changing number of replicates in logistic regression?

#### germinator

##### New Member
Hi, I am running a trial that was set up 2 years ago. I am trying to detect if there is an effect of time on germination % and viability of seeds. I started with 3 replicates (petri dishes) of 50 (150 seeds) and collect data once a year. However, since starting the trial I feel that this number of seeds may not accurately represent the true variation within the collections I am testing (bulk seed collections). I would prefer to test 8 replicates of 50 seeds (400 seeds). In reality there is pseudoreplication happening, as I have the replicates all within one germination cabinet (I don't have spare cabinets to use as actual replicates), so essentially I could analyse the data as binomial and treat the seeds as individual replicates.

Can someone please let me know:
(a) Should I keep running the trial using the original number of seeds, i.e. 150, or can I increase the number to 400 midway through the trial?

(b) Should I analyse the data using the replicates of 50 seeds (% data), or with seeds as individual replicates with binomial data (1 = healthy, 0 = dead)?

Any help would be appreciated!!!

#### GretaGarbo

##### Human
> I am trying to detect if there is an effect of time on germination % and viability of seeds. I started with 3 replicates (petri dishes) of 50 (150 seeds) and collect data once a year.

Do you have 50 seeds in one petri dish, or do you have one seed per petri dish, thus 150 petri dishes?
Is there any other treatment so that there is a difference between the dishes?

When you "collect data once a year", do you then check how many have germinated? So you have not recorded the time until each seed germinated (e.g. by checking every day or week)?

Sorry I am not so familiar with your experimental conditions.

Why do you want to increase it to 8 "replicates"?

#### germinator

##### New Member
Thanks for your reply. It is 50 seeds per petri dish, as that is the maximum I can fit in the dishes. The seeds are taken from a bulk collection (i.e. handfuls from the top, middle and bottom of the bag), then mixed well before taking a random subsample to use in the dishes. So there should not be any systematic difference between the dishes.

The "collect data once a year" refers to removing seeds from storage once a year and checking to see if the germination and viability alters from the previous year/s. I do record germination once a week but the analyses are done on the total final germination recorded at the end of the experiments.

I would like to increase the number of seeds being tested because some of the species in the trial have >100,000 or even >200,000 seeds/kg, with >100 kg in storage, so I do not think that testing only 150 seeds (3 replicates of 50) would give a true representation of the variation that may occur. This is backed up by analyses of additional data that should not change over time, where the results came back as significantly different for some species (ANOVA on means of 3 replicates of 50 seeds). The increase to 400 seeds (8 replicates of 50) may still give me some variation, but I can't really increase the number further due to the extra effort required across all collections.

#### GretaGarbo

##### Human
Aha, so the purpose is to see what the effect of storage in a granary is. (At first I thought it was about some treatment in the germination process.)

I can't see anything wrong with increasing the number of replicates.

The usual approach is to treat the sample size n as a fixed constant. If you increase the number of replicates to 8, it would still be fixed. I looked for what a reasonable germination rate could be (it seems to be more than 90%) and I found this. In that link they suggest, as one possibility, that if the germination rate is lower than 90% you take another sample. In that case the sample size would not be fixed but random, since it would depend on what was observed. The statistical inference then becomes much more complicated: it is sequential statistical testing. Frankly, on the spot, I don't remember how to evaluate such a result. (If you treat the sample size as fixed, but take another sample whenever you get a value that you "don't like", hoping the new one suits you better, that would be like cheating. There are, however, objective sequential statistical methods for that situation.)
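To make the "cheating" point concrete, here is a small R sketch. It is not a sequential method, only an illustration of what happens if a test is repeated whenever the first result falls below a threshold and the retest is reported instead. The true rate of 0.93, the 150 seeds per test and the 90% threshold are assumed values chosen for illustration.

```r
# Sketch: bias from retesting only when the first result looks "too low".
# Assumed values (illustrative only): true rate 0.93, 150 seeds, 90% threshold.
set.seed(1)
p         <- 0.93
n_seeds   <- 150
threshold <- 0.90
simul     <- 10000

# Germination proportion in a first test and in a possible retest
first  <- rbinom(simul, size = n_seeds, prob = p) / n_seeds
second <- rbinom(simul, size = n_seeds, prob = p) / n_seeds

# Rule: keep the first result unless it is below the threshold,
# in which case it is discarded and replaced by the retest.
reported <- ifelse(first < threshold, second, first)

mean(first)              # fixed sample size: close to the true rate
mean(reported)           # optional retest: shifted upwards
mean(first < threshold)  # how often a retest is triggered
```

The reported value is biased upwards because low results are the only ones that get a second chance, which is why the fixed-n formulas no longer apply.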

I wrote an R program so that you can simulate how the results vary when increasing from 3 to 8 replicates. The precision increases, but not by that much. It is based on the assumption that the germination rate is constant and the same across the replicates, so that the counts are binomially distributed with the same parameter. R and RStudio are free to download, and you can run the code one line at a time.

Code:
```r
# Germination test

p      <- 0.93  # probability that one individual seed germinates
n_seed <- 50    # number of seeds per petri dish

repl <- 3  # number of replicates (petri dishes per year)

# rbinom(n, size, prob): each draw is the germinated count in one dish
y <- rbinom(n = repl, size = n_seed, prob = p)
y
p_hat <- (sum(y) / repl) / n_seed
p_hat

set.seed(170510)  # a different kind of seed: the start value for the random numbers

#### Simulation with 3 petri dishes and 10,000 "years"
repl  <- 3      # number of replicates (petri dishes per year)
simul <- 10000  # number of simulations (number of "years")
p_hat <- numeric(simul)

for (i in 1:simul) {
  y <- rbinom(n = repl, size = n_seed, prob = p)
  p_hat[i] <- (sum(y) / repl) / n_seed
}

mean(p_hat)
sd(p_hat)
mean(p_hat) + c(-1, 1) * 1.96 * sd(p_hat)
# 0.8888088 0.9707419, so it can easily vary from about 88.9% to 97.1%

hist(p_hat)

#### Simulation with 8 petri dishes
repl  <- 8      # number of replicates (petri dishes per year)
simul <- 10000  # number of simulations (number of "years")
p_hat <- numeric(simul)

for (i in 1:simul) {
  y <- rbinom(n = repl, size = n_seed, prob = p)
  p_hat[i] <- (sum(y) / repl) / n_seed
}

mean(p_hat)
sd(p_hat)
mean(p_hat) + c(-1, 1) * 1.96 * sd(p_hat)
# [1] 0.9043768 0.9557072 (most values from about 90.4% to 95.6%)

hist(p_hat)
```
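
The simulated spread can be cross-checked against the usual binomial standard error, sqrt(p*(1-p)/n), under the same assumption as the simulation (independent seeds with a common p = 0.93). This is only a sketch of that check, not part of the original program:

```r
# Analytic check of the simulated spread,
# assuming independent seeds with a common germination probability p.
p      <- 0.93
n_seed <- 50
repl   <- c(3, 8)

n_total <- repl * n_seed                # 150 and 400 seeds
se      <- sqrt(p * (1 - p) / n_total)  # binomial standard error of p_hat
lower   <- p - 1.96 * se
upper   <- p + 1.96 * se

round(data.frame(repl, n_total, se, lower, upper), 4)

se[2] / se[1]  # equals sqrt(3/8), about 0.61
```

Going from 3 to 8 dishes therefore shrinks the standard error by a factor of about 0.61, not by 3/8, because precision improves with the square root of the sample size.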

#### germinator

##### New Member
Thanks, I am running the trial to verify that some species do not survive in controlled storage conditions, hence I expect the germination % and viability (as described in your link) to decrease with storage time (germination rate does not stay constant).
So are you saying that there is no real need to increase the number of reps to 8...and should I be using the dishes as the reps or individual seeds with binomial regression?

#### GretaGarbo

##### Human
> So are you saying that there is no real need to increase the number of reps to 8

I did not say that. It is up to you what precision you want. The larger the sample size, the smaller the standard error.

But please remember that the size of the population does not matter for the sample size you need. You need the same sample size for a poll in Iceland (300,000 inhabitants) as in the USA (300 million inhabitants).
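
One way to see this is the finite population correction, sqrt((N - n)/(N - 1)), which multiplies the standard error when sampling without replacement from a population of size N. The sketch below uses an assumed poll of 1000 people with p = 0.5. The seed lot size of 10 million is a rough figure derived from the ">100,000 seeds/kg" and ">100 kg" mentioned earlier, and `se_prop` is a helper name made up for this example.

```r
# Standard error of a sample proportion with the finite population correction.
# se_prop is a hypothetical helper written for this illustration.
se_prop <- function(p, n, N) {
  sqrt(p * (1 - p) / n) * sqrt((N - n) / (N - 1))
}

# Poll example: assumed p = 0.5 and a sample of 1000 people
n_poll     <- 1000
se_iceland <- se_prop(0.5, n_poll, N = 3e5)  # about 300,000 inhabitants
se_usa     <- se_prop(0.5, n_poll, N = 3e8)  # about 300 million inhabitants

c(iceland = se_iceland, usa = se_usa)
se_iceland / se_usa  # very close to 1

# Seed lot example: 150 seeds from an assumed lot of about 10 million seeds
fpc_seed <- sqrt((1e7 - 150) / (1e7 - 1))
fpc_seed             # practically 1, so the lot size can be ignored
```

The correction only starts to matter when the sample is a noticeable fraction of the population, which is not the case for 150 or 400 seeds out of millions.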

What matters most is how many years you can extend the investigation over.

> ...and should I be using the dishes as the reps or individual seeds with binomial regression?

If the probability of germination (p) is the same in all 3 or 8 dishes, it does not matter whether you treat them as 3 samples with 50 seeds in each or as 1 sample of 150.
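
As a rough illustration in R, a logistic regression fitted to dish counts and one fitted to the same data expanded to one row per seed give the same coefficients and standard errors. The data below are simulated, and the yearly germination probabilities are invented for the example:

```r
set.seed(2)
n_seed <- 50
years  <- 0:3
repl   <- 3
# Hypothetical germination probabilities that fall with storage time
p_true <- c(0.90, 0.70, 0.50, 0.30)

# One row per dish: storage time and number of germinated seeds
dish <- data.frame(time = rep(years, each = repl))
dish$germ <- rbinom(nrow(dish), size = n_seed, prob = rep(p_true, each = repl))

# (1) Dish-level fit: successes and failures per dish
fit_dish <- glm(cbind(germ, n_seed - germ) ~ time, family = binomial, data = dish)

# (2) Seed-level fit: one row per seed, 1 = germinated, 0 = not
seed <- data.frame(
  time = rep(dish$time, each = n_seed),
  y    = unlist(lapply(dish$germ, function(g) rep(c(1, 0), c(g, n_seed - g))))
)
fit_seed <- glm(y ~ time, family = binomial, data = seed)

rbind(dish = coef(fit_dish), seed = coef(fit_seed))
rbind(dish = summary(fit_dish)$coefficients[, "Std. Error"],
      seed = summary(fit_seed)$coefficients[, "Std. Error"])
```

The residual deviance does differ between the two forms, and only the dish-level version lets you check whether the dishes vary more than the binomial model expects.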

But there might be a small environmental difference between the dishes, for example a difference of about 1%, so that each dish has probability p + epsilon, where epsilon is normally distributed with zero mean and a standard deviation of 1 percentage point. The data could then be evaluated with logistic regression allowing for overdispersion (quasi-likelihood), which is available in the standard packages. The variance would then be slightly larger than the usual binomial variance, p*(1-p)/n.
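
In R this can be sketched with `glm` and the `quasibinomial` family. The data are simulated with assumed values, and the dish-level standard deviation is set to 0.05 rather than 0.01 so that the extra variation is easier to see in a small example:

```r
set.seed(3)
n_seed <- 50
repl   <- 8
years  <- 0:4
dish   <- data.frame(time = rep(years, each = repl))

# Hypothetical mean germination probability per year, plus dish-level noise.
# sd = 0.05 is an assumed value, larger than the 1% mentioned above.
p_mean <- plogis(2 - 0.8 * dish$time)
p_dish <- pmin(pmax(p_mean + rnorm(nrow(dish), mean = 0, sd = 0.05), 0.01), 0.99)
dish$germ <- rbinom(nrow(dish), size = n_seed, prob = p_dish)

fit_bin   <- glm(cbind(germ, n_seed - germ) ~ time,
                 family = binomial, data = dish)
fit_quasi <- glm(cbind(germ, n_seed - germ) ~ time,
                 family = quasibinomial, data = dish)

# Estimated dispersion: values clearly above 1 suggest overdispersion
disp <- summary(fit_quasi)$dispersion
disp

# Same coefficients; quasi standard errors are scaled by sqrt(dispersion)
cbind(binomial = summary(fit_bin)$coefficients[, "Std. Error"],
      quasi    = summary(fit_quasi)$coefficients[, "Std. Error"])
```

The point estimates are identical in the two fits. Only the standard errors (and hence the tests) change, by a factor of the square root of the estimated dispersion.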

> Thanks, I am running the trial to verify that some species do not survive in controlled storage conditions, hence I expect the germination % and viability (as described in your link) to decrease with storage time (germination rate does not stay constant).

So you want to verify that some species do NOT survive?

Then you believe that the germination rate falls quite fast? For example, that it falls like an exponential curve:

p = a*exp(-k*time)
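
To get a feel for this curve, here is a small R sketch. The values a = 0.93 (fresh seed) and k = log(2) (germination halving every year) are assumptions chosen only for illustration, not estimates from any data:

```r
# Germination probability under p = a * exp(-k * time).
# Assumed values: a = 0.93 for fresh seed, and a halving per year, so k = log(2).
a    <- 0.93
k    <- log(2)
time <- 0:5  # years in storage

p            <- a * exp(-k * time)
expected_150 <- 150 * p  # expected number of germinated seeds out of 150

data.frame(time, p = round(p, 3), expected_150 = round(expected_150, 1))

half_life <- log(2) / k  # years until p has halved
half_life
```

On the log scale the model is a straight line, log(p) = log(a) - k*time, so k is the slope of log germination rate against storage time.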

#### germinator

##### New Member
Hi again,

I will allow for overdispersion in the logistic regression in case there are any differences between the dishes.

Yes, I am expecting that some species do NOT survive, and that it occurs within a few years. I intend to run the trial for up to 5 years, but it may be shorter for some species, given that between t0 (fresh seed) and t1 (1 year old seed) there was a 50% decrease.

I have now set up this year's tests and continued with the 150 seeds to make things simpler when writing it up for my client.

Cheers.