Hi all,

I have the frequency of each of the 16 possible nucleotide pairs at each position in a sequence. I want to test whether any of these frequencies is significantly higher than the others at a given position.

So I have positions 1 to N (I was planning to stop at N = 10, but I could go up to N = 40), with the frequency of each nucleotide pair (AA, AT, AG, AC, etc.) at each position.

I want to find out whether, say, the frequency of AA at position 1 (0.98) is significantly higher than any of the other 15 frequencies at position 1.
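One way to make "AA vs. the other 15" concrete is sketched below. It assumes the raw counts behind each frequency are still available (a frequency like 0.98 cannot be tested without knowing how many sequences it came from), and it swaps in a conditional exact binomial test: if the top pair and the runner-up were equally likely, the top count given their combined total would follow Binomial(a + b, 0.5). Beating the runner-up implies beating all 15 others. The counts and the helper names `binom_upper_tail` and `top_vs_runner_up` are hypothetical.

```python
from itertools import product
from math import comb


def binom_upper_tail(k: int, n: int, p: float = 0.5) -> float:
    """P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def top_vs_runner_up(counts: dict[str, int]) -> tuple[str, str, float]:
    """One-sided exact test: is the most common pair more common than the second?

    Under H0 (both pairs equally likely), the top count given the combined
    count a + b follows Binomial(a + b, 0.5).
    """
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    (top, a), (second, b) = ranked[0], ranked[1]
    return top, second, binom_upper_tail(a, a + b)


# Hypothetical raw counts for position 1 (100 sequences); replace with real data.
position_1 = {"".join(pair): 0 for pair in product("ACGT", repeat=2)}
position_1.update({"AA": 98, "AT": 1, "AG": 1})

top, second, p = top_vs_runner_up(position_1)
print(top, second, p)
```

This gives one p-value per position, so the p-values across positions would still need a multiple-comparison correction.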

I think my confusion comes from not being sure whether I have one dependent variable (position) and one independent variable with multiple groups (i.e., IV = nucleotide frequency, with the particular pairs as its levels), or 16 independent variables, or something else entirely. So I'm having trouble figuring out which test to use. I only tagged SPSS because that's the program I have available at the moment.

I did look at the Handbook of Biological Statistics, but I'm still confused about the variables. I was trying to test the frequencies at all positions at once (using some kind of analysis of variance), because I wasn't sure that testing each of the 10 positions separately was a valid approach (isn't that like running multiple t-tests on the same data set?).
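Testing each position separately is a common approach, provided the resulting family of p-values is corrected for multiple comparisons. A minimal sketch of the Holm-Bonferroni step-down adjustment in plain Python; the function name `holm_adjust` and the p-values are hypothetical placeholders for one test per position.

```python
def holm_adjust(pvalues: list[float]) -> list[float]:
    """Holm-Bonferroni adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Scale the rank-th smallest p-value by (m - rank), keep the sequence
        # non-decreasing, and cap at 1.
        running_max = max(running_max, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted


# Hypothetical raw p-values, one per position (only 4 positions shown).
raw = [0.001, 0.04, 0.03, 0.2]
adj = holm_adjust(raw)
print(adj)
```

A position counts as significant if its adjusted p-value is below the chosen alpha. The correction gets stricter as the number of tests grows, so going to N = 40 instead of N = 10 costs some power.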

I was also wondering whether this affects the choice of test: I can't assume that the frequencies of a nucleotide pair at the first position are entirely independent of the nucleotides at the second position, because there may be a biological reason for such a dependence.
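Whether the pair at position 1 is associated with the pair at position 2 can itself be checked, as long as the per-sequence pairs are available rather than only the summary frequencies. Below is a sketch of a permutation test of independence that uses the Pearson chi-square statistic as its test statistic (standard library only). The input format (two parallel lists, one entry per sequence), the helper names, and the data are hypothetical.

```python
import random
from collections import Counter


def chi_square_stat(xs: list[str], ys: list[str]) -> float:
    """Pearson chi-square statistic of the xs-by-ys contingency table."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            stat += (cells[(r, c)] - expected) ** 2 / expected
    return stat


def permutation_independence_p(xs, ys, n_perm=2000, seed=0):
    """Permutation p-value for H0: the pair at one position is independent of the other.

    Shuffling ys breaks any association while keeping both marginal counts fixed.
    """
    rng = random.Random(seed)
    observed = chi_square_stat(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties count as "at least as extreme".
        if chi_square_stat(xs, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical per-sequence data: the pair at position 1 and at position 2.
pos1 = ["AA"] * 20 + ["CG"] * 20
pos2 = ["TT"] * 20 + ["GC"] * 20
print(permutation_independence_p(pos1, pos2))
```

With 16 × 16 possible pair combinations many table cells will be sparse, which is why a permutation p-value is used here instead of the chi-square approximation. If positions do turn out to be dependent, a correction such as Holm-Bonferroni remains valid, since it does not assume independent tests.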

Any help would be greatly appreciated. Thanks a lot!