I hope that this is the right place for such a question.
Which test should I use to detect whether a selected sample is biased?
We have a protein that naturally binds certain short, known DNA sequences (let's say 5 nucleotides each). What I want to show is that, out of all possible 5-nucleotide sequences, this protein selected these particular ones because of how often they occur in gene coding regions.
So I counted how many times each 5-nucleotide sequence appears across all genes, then compared the mean count over all sequences to the mean count of the sequences bound by the protein, to suggest that the protein favored these because of how often they appear. The difference is significant.
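To make the counting step concrete, here is a minimal sketch of what I mean, assuming overlapping 5-mers counted on one strand only. The function name `count_kmers` and the two toy sequences are made up for illustration; they are not my real data.

```python
from collections import Counter
from itertools import product


def count_kmers(sequences, k=5):
    """Count overlapping k-mers over all sequences.

    Every possible A/C/G/T k-mer starts at 0, so sequences that never
    occur still appear in the result. Windows containing any other
    character (e.g. N) are skipped.
    """
    counts = Counter({"".join(p): 0 for p in product("ACGT", repeat=k)})
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if kmer in counts:
                counts[kmer] += 1
    return counts


# Hypothetical toy "genes", only to show the shape of the result.
genes = ["ATGGCGTACGT", "ATGCGTACGTTAA"]
counts = count_kmers(genes)
print(len(counts), counts["CGTAC"])
```

With k = 5 this gives 4^5 = 1024 counts, one per possible sequence; the protein's sequences are a small subset of those 1024.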
I used ANOVA (specifically Welch's ANOVA, because the sample and the population are unbalanced in size and their variances differ: the population variance is 15.0842 and the sample variance is 181.3176). The p-value was extremely small, which means the null hypothesis of equal means is rejected. Is this correct, or should I look into a different measure?
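For reference, with only two groups Welch's ANOVA is equivalent to Welch's t-test (F = t²), so the comparison boils down to something like the sketch below. The numbers are invented placeholders, not my actual counts, and `welch_t` is just a helper name I chose here. It only computes the statistic and the Welch-Satterthwaite degrees of freedom; the p-value would come from a t distribution with that many degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Uses sample variances (n - 1 denominator) and does not assume
    equal variances or equal group sizes.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    se2 = va / na + vb / nb  # squared standard error of the mean difference
    t = (mean(a) - mean(b)) / sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df


# Placeholder data: counts for the bound 5-mers vs. counts for all 5-mers.
bound_counts = [30, 45, 12, 60, 25]
all_counts = [3, 5, 8, 2, 10, 7, 4, 6, 9, 1]

t, df = welch_t(bound_counts, all_counts)
print(f"t = {t:.3f}, df = {df:.2f}, F = t^2 = {t * t:.3f}")
```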
So, is my approach correct or not?
Thank you for your input and time.