Hi,
We commonly have a problem where we run an A/B test over a population of users, but the groups sometimes seem to differ even before the test starts.
So let's say I have the KPI per group (control & test) over multiple days. How would you test for the significance of the treatment?
This is like a two-way design, because I have both before/after the test start and control/test.
Data can be at user level or aggregated.
The data looks like this: kpi_group(t), where t < start is the period before the test and t > start is the period after it, kpi is the metric value, and group is either control or test.
In some cases the before and after periods contain the same users, so maybe a paired test is due?
Currently, what I do is:
1. If the data is paired (same users): I compute the difference avg_kpi(after) - avg_kpi(before) and then use a Mann–Whitney U test to check whether the difference in test is larger than in control.
2. If the data is not paired: I compute the daily difference between test and control, kpi_test(t) - kpi_control(t), and then run a Mann–Whitney U test on after vs. before.
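To make the two steps above concrete, here is a minimal sketch in plain Python. Several things are assumptions for illustration only: the function names (`mann_whitney_u`, `paired_case`, `unpaired_case`), the dict-based data layout, computing the paired difference per user, and dropping the day where t equals start. The p-value uses a one-sided normal approximation without tie correction, which is a simplification; in practice something like `scipy.stats.mannwhitneyu(x, y, alternative="greater")` would be the more usual choice.

```python
import math


def mann_whitney_u(x, y):
    """One-sided Mann-Whitney U test of 'x tends to be larger than y'.

    Returns (U, p). The p-value is a normal approximation with continuity
    correction and no tie correction, so it is only a rough guide for
    small or heavily tied samples.
    """
    # U counts pairs (xi, yj) with xi > yj; a tie counts as one half.
    u = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    n1, n2 = len(x), len(y)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu - 0.5) / sigma
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail, P(Z >= z)
    return u, p


def paired_case(before, after, group):
    """Step 1. before/after: dict user -> average KPI in that period.
    group: dict user -> 'control' or 'test' (assumed layout)."""
    # Per-user change, only for users seen in both periods.
    delta = {u: after[u] - before[u] for u in before if u in after}
    test = [d for u, d in delta.items() if group[u] == "test"]
    control = [d for u, d in delta.items() if group[u] == "control"]
    return mann_whitney_u(test, control)


def unpaired_case(kpi_test, kpi_control, start):
    """Step 2. kpi_test/kpi_control: dict day -> daily KPI (assumed layout)."""
    # Daily gap between the groups, on days where both have a value.
    gap = {t: kpi_test[t] - kpi_control[t] for t in kpi_test if t in kpi_control}
    after = [g for t, g in gap.items() if t > start]
    before = [g for t, g in gap.items() if t < start]  # day t == start is dropped
    return mann_whitney_u(after, before)
```

In both cases the pre-period acts as a baseline: the test asks whether the gap between groups changed after the start, not whether a gap exists at all.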
Maybe a Bayesian test is due? (Though I hate the fact that I would need to come up with arbitrary priors.)
See the screenshot, which shows that a Wilcoxon test on the period before the test start already finds a significant difference between control and test.