I am examining populations (cultures) of yeast cells and scoring them for a particular trait. The data are categorical: within a population, a yeast cell either has the trait or it does not. I wish to compare two populations -- two genetically different strains of yeast -- to see whether they differ in the proportion of cells with the trait.

First question: would a two-proportion z test be the way to go here?

Second question: If so, all of the examples I’ve seen take their samples from a single “replicate” of each population. Working with yeast, I can set up multiple independent populations (cultures) of each strain. In my experiment I sampled four independent cultures for each of the two strains, scoring several million cells in each sample. Obviously such “replicates” are not possible in humans or many other populations, but for me they are. So my question is this: is it valid to pool the data from all four replicates of each strain? Or would it be better to show the calculation for one “representative” culture of each?
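
To make the pooling option concrete, here is a minimal sketch of the calculation I have in mind: sum the counts across the four cultures of each strain, then run a standard pooled two-proportion z test on the totals. The counts below are invented purely for illustration (they are not my data), and the function names are my own. It is written in Python only for readability; the same arithmetic can be done in a spreadsheet. Note that this treats every scored cell as an independent observation and ignores culture-to-culture variation, which is exactly the assumption I am unsure about.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z test.

    x1, x2: number of cells with the trait; n1, n2: number of cells scored.
    Returns (z, two-sided p-value) under the normal approximation.
    """
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)  # common proportion under the null hypothesis
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

def pool(cultures):
    """Sum (cells with trait, cells scored) pairs across replicate cultures."""
    return sum(x for x, _ in cultures), sum(n for _, n in cultures)

# Hypothetical counts: (cells with trait, cells scored) for each of four cultures.
strain_a = [(120, 1_000_000), (135, 1_000_000), (110, 1_000_000), (128, 1_000_000)]
strain_b = [(180, 1_000_000), (165, 1_000_000), (190, 1_000_000), (172, 1_000_000)]

xa, na = pool(strain_a)
xb, nb = pool(strain_b)
z, p = two_proportion_z(xa, na, xb, nb)
print(f"pooled proportions: {xa / na:.6f} vs {xb / nb:.6f}")
print(f"z = {z:.3f}, two-sided p = {p:.3g}")
```

The "representative culture" alternative would be the same function applied to a single pair of cultures instead of the pooled totals.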

I am not a statistician, nor do I have good tools beyond Excel for these calculations, so I am looking for a robust yet manageable way to analyze my data. I can handle a z test if that is the way to go, but if it is not a valid approach I would love to hear alternatives!

Thank you in advance!