Comparison between two sets of Log2 values

#1
Hello,

I am trying to compare two sets of data on a log2 scale. Each group (A and B) has 8 biological replicates, and the value for each replicate is the geometric mean of two technical replicates. My usual go-to would be a t-test, but should I apply a different test if the values are logarithmic?


A
1024
1448
4096
2048
181
1448
1448
2896

B
4096
4096
2896
2896
1024
1448
1448
1448

Thanks,
04niceck
 

obh

#2
Hi NK,

I didn't understand the following: "each rep is the geometric mean off of two technical replicates."

How is log(x) distributed?
Or what about a rank test?
 
#3
Hi obh,

Thanks for the response. Let me try to clarify it a bit more. The values are not normally distributed, and they all fall between 4 and 4096. The values are always fixed on a log2 scale, so they can only be 4, 8, 16, 32, 64, 128, etc. The values in each group are the geometric mean of two technical replicates, so the 1448 comes from the geometric mean of the fixed values 1024 and 2048.
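For example, in R (just to illustrate the calculation):

sqrt(1024 * 2048)             # 1448.155, the geometric mean of 1024 and 2048
2^mean(log2(c(1024, 2048)))   # the same value: the midpoint on the log2 scale (2^10.5)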

Would I do Spearman's rank if the data are non-parametric?

Thanks,
Nick
 

obh

#4
Hi Nick :)

Why did you choose a geometric mean of two replicates? (I don't know biology.)

"The values are not normally distributed and the values all fall between 4 and 4096"
Might the logs be normally distributed?
Actually, I didn't see a big difference between the t-test on the values (p-value = 0.347197) and the t-test on the logs (0.279934).
I expected the Mann-Whitney U result to be similar, but got 0.441803.

Is the data you pasted an example or the real data?

Is there a dependency between the two groups, e.g. measuring the same subject before and after?

About the rank tests, I was thinking of the Mann-Whitney U test if the groups are independent, or the Wilcoxon signed-rank test if they are dependent.
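Roughly, in R, the three comparisons would look like this (just a sketch; the exact p-values depend on the options, e.g. Welch vs. pooled variance and exact vs. approximate ranks):

A <- c(1024, 1448, 4096, 2048, 181, 1448, 1448, 2896)
B <- c(4096, 4096, 2896, 2896, 1024, 1448, 1448, 1448)
t.test(A, B)              # t-test on the raw values
t.test(log2(A), log2(B))  # t-test on the log2 values
wilcox.test(A, B)         # Mann-Whitney U test (independent groups)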
 
#5
Hi Obh,

I think we use the geometric mean because they are values on a set scale: 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, so we just take the midpoint between two replicates on that scale. Other than that I'm not entirely sure why; it's just the procedure we follow.

This is real data, and I don't think the logs are normally distributed; the middle values are not the most common. The samples are independent of each other: the 1st sample in set A is a completely different sample from the 1st in set B. Should I try the Mann-Whitney U test then?

I hope this helps and thanks for the help,

04niceck
 
#7
The logged values look more or less normal in each group, so the t-test is OK. The t-test doesn't mind how you got the numbers, so long as the assumptions are true. Either way, no significant difference has been established.
 

obh

#8
Hi Katxt,

The Shapiro-Wilk test doesn't reject normality, but it is quite close (the p-values are 0.161191 and 0.101380) and the sample size is small (hence little power to reject normality).
So the data probably don't come from a normal distribution.

On the other hand, a reasonably symmetrical distribution is sufficient for the t-test.
(One group is potentially skewed (skewness -1.573933) and one is potentially symmetrical (skewness 0.0336383).)

Since the sample size is very small and the distribution is probably not normal, and maybe not symmetrical, I assume it is better to use the non-parametric Mann-Whitney U test (if the groups are independent).

Actually, I expected to get a similar p-value from both tests, but that is not the case.

@Karabiner what do you think?
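For reference, the checks in R would look something like this (a sketch; I've used a simple moment-based skewness here, so packaged skewness functions may give slightly different numbers, and the base of the log doesn't change the results):

A <- c(1024, 1448, 4096, 2048, 181, 1448, 1448, 2896)
B <- c(4096, 4096, 2896, 2896, 1024, 1448, 1448, 1448)
shapiro.test(log2(A))   # normality of the logs, group A
shapiro.test(log2(B))   # normality of the logs, group B
skew <- function(x) mean((x - mean(x))^3) / sd(x)^3   # simple skewness estimate
skew(log2(A))
skew(log2(B))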
 
#9
A permutation test gives p of about 0.36, so again no difference has been established. There may well be a difference, but the samples are just too small to establish this with any confidence using any test.
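A minimal sketch of that kind of permutation test in R (reshuffling the group labels at random, so the p-value varies slightly from run to run; it should land close to the ~0.36 quoted above):

set.seed(1)
A <- c(1024, 1448, 4096, 2048, 181, 1448, 1448, 2896)
B <- c(4096, 4096, 2896, 2896, 1024, 1448, 1448, 1448)
pooled <- c(A, B)
obs <- abs(mean(A) - mean(B))                 # observed difference in means
perm <- replicate(10000, {
  idx <- sample(length(pooled), length(A))    # random relabelling of the 16 values
  abs(mean(pooled[idx]) - mean(pooled[-idx]))
})
mean(perm >= obs)                             # two-sided permutation p-value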
 

obh

#10
Hi Katxt,

Clearly, both methods give non-significant results.
But the question is which test is more appropriate to use (what do you do if one is significant and the other is not...?).

In parallel, the power is very low ... with some assumptions, you would need a sample size of 53 per group for a medium effect size and a power of 0.8.

PS: Mann-Whitney gives 0.41 (if the groups are independent).
Or do you use a more accurate method (especially with ties)?


R results:

> A <- c(1024, 1448, 4096, 2048, 181, 1448, 1448, 2896)
> B <- c(4096, 4096, 2896, 2896, 1024, 1448, 1448, 1448)
> wilcox.test(A, B, paired = FALSE, alternative = "two.sided", mu = 0.0,
+             exact = TRUE, correct = TRUE, conf.int = TRUE, conf.level = 0.95)

Wilcoxon rank-sum test with continuity correction

data: A and B
W = 24, p-value = 0.4154
alternative hypothesis: true location shift is not equal to 0
95 percent confidence interval:
-2048 600
sample estimates:
difference in location
-424
 
#11
Thank you for everyone's input so far. I didn't understand when obh asked whether the logs themselves were normally distributed, because the data themselves are a serial dilution on a log2 scale. If I transform the values to log2 I get:

A
10
10.49984589
12
11
7.499845887
10.49984589
10.49984589
11.49984589

B
12
12
11.49
11.49
10
10.49
10.49
10.49

I just don't know whether I can assume a normal distribution now; I think I can. So if I do a parametric unpaired t-test I get:


Unpaired t test: Column B vs. Column A
P value 0.2749
P value summary ns
Significantly different (P < 0.05)? No
One- or two-tailed P value? Two-tailed
t, df t=1.136, df=14

How big is the difference?
Mean of column E 10.44
Mean of column F 11.06
Difference between means (F - E) ± SEM 0.6250 ± 0.5500
95% confidence interval -0.5546 to 1.805
R squared (eta squared) 0.08446

F test to compare variances
F, DFn, Dfd 3.015, 7, 7
P value 0.1687
P value summary ns
Significantly different (P < 0.05)? No

Data analyzed
Sample size, column E 8
Sample size, column F 8

Would this be the right way to do it? Either way, the difference is not significant. My stats knowledge is mediocre and the analyses mentioned above are probably more accurate, but I just wanted to throw this out there.
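For what it's worth, I think the same comparison in R would look something like this (a sketch, assuming the classic pooled-variance unpaired t-test to match the df = 14 reported above):

A <- c(1024, 1448, 4096, 2048, 181, 1448, 1448, 2896)
B <- c(4096, 4096, 2896, 2896, 1024, 1448, 1448, 1448)
t.test(log2(B), log2(A), var.equal = TRUE)   # unpaired t-test on the log2 values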

Thanks,
04niceck
 

obh

#12
Hi Nick,

"Thank you for everyone's input so far. I didn't understand when obh asked whether the logs themselves were normally distributed, because the data themselves are a serial dilution on a log2 scale. If I transform the values to log2 I get:"
I used base 10, but it doesn't really matter; the Shapiro-Wilk test gives the same results, and so does the t-test.
You can read above what I wrote about normality/symmetry.

I prefer Welch's t-test.
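(In R, t.test() uses Welch's unequal-variance version by default; you have to ask for the pooled-variance version explicitly. A sketch:)

A <- c(1024, 1448, 4096, 2048, 181, 1448, 1448, 2896)
B <- c(4096, 4096, 2896, 2896, 1024, 1448, 1448, 1448)
t.test(log2(A), log2(B))                     # Welch's t-test (the default, var.equal = FALSE)
t.test(log2(A), log2(B), var.equal = TRUE)   # classic pooled-variance t-test, for comparison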

I think the non-parametric is better, but you can argue about this.

Cheers,
 
#13
This is a little surprising to me. Using the original data as given, the t-test residuals are as normal as one could expect, so the t-test should be fine. [Attached image: residuals from the raw-data model.] I'm not sure about the 181. kat
 

obh

#14
Hi Kat,

Per the Shapiro-Wilk test, the original data is not normal (the p-value is 0.0368879).
You should also look at the histogram.

[Attached image: histogram of the data.]
 
#15
It isn't the whole data set that needs to be normal; it's that each group must be normal about its mean, and the SW p-values for those do not reject normality (I get roughly 0.5 and 0.2; I can't get the 0.036 above). However, the exercise is largely futile at sample sizes of 8. The fact is that you just can't tell whether the raw data come from a normal population or not. Come to that, you can't tell whether the assumptions for a non-parametric test are true or not.

A permutation test assumes little more than that the data are independent. If the numbers all come out of the same bag, how likely are we to pick two sets of 8 with means as far apart as we see in our data? In this case, it is quite likely: p = 0.36 or thereabouts. Conclusion - there may be a difference, but there is no real evidence of it here. kat
 

obh

#16
Hi Kat,

Surely it isn't the whole data set that needs to be normal together! I ran the SW test yesterday for each group separately (see above), but on the logs (the p-values are 0.161191 and 0.101380). I ran it on the pooled data just because I thought that was what you did when I saw your chart with 16 points ... just to understand what you did.

PS: the plain numbers give better normality and symmetry results than the logs ... :)

In such a small sample it is difficult to reject normality, so you need to go back to the logic: do you expect these data to come from a normal (or at least symmetrical) distribution? If the answer is yes, go with the t-test; otherwise, the MW U test.

"you can't tell if the assumptions for a non-parametric test are true or not"
There is no test that replaces good experimental practice ... GIGO.
But definitely, the non-parametric test has fewer assumptions than the parametric test (at the price of a reduction in power).

"Conclusion - there may be a difference, but there is no real evidence of it here."
Correct, but it seems that Nick didn't plan the sample size before doing the experiment.
The power to reject H0 is very low (less than 0.2 for a medium effect).

Before doing the experiment, you should decide what effect size you want to be able to detect, and what sample size will give you reasonable power (usually 0.8).
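As a rough sketch in R with the base power.t.test() function (I've plugged in a "medium" standardized effect of 0.5 purely as an illustration, so the n it returns depends entirely on these assumptions and won't necessarily match the 53 I mentioned earlier):

# a-priori sample size for a two-sample, two-sided t-test at alpha = 0.05
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.8,
             type = "two.sample", alternative = "two.sided")
# roughly 64 per group under these particular assumptions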
 
#17
OK. I think that this topic is probably pretty well exhausted. Your comments about the design of the experiment are all true and good advice.

(Incidentally, my chart with 16 points is not the data. It is of the residuals after the model has been fitted.)
 
#19
Thanks for all your input. The reason for the small sample size is that these are animal serum samples and they cost a ludicrous amount of money; 8 per group is the max.


Thanks,
Nick