What alpha spending formula should I use for a chi-square test of independence?

#1
Hi all,

For a client's website CRO (conversion rate optimization) project, I'm trying to set up a way for them to sequentially check whether an A/B test has reached significant results (i.e., does website B lead to more transactions than website A?). Much like in clinical trials, data comes in over time, and the sooner a conclusion can be drawn, the bigger the impact. I've been reading some papers on this subject (e.g., "Interim Analysis: The Alpha Spending Function Approach" by DeMets and Lan, 1994). If I understand correctly, they give formulas for the alpha spending function that keep the type I error at 5% no matter how often you check the results along the way. However, it is unclear to me whether these formulas can also be used for the chi-square test of independence (the test statistic our client uses), since the rest of the paper works with the standardized normal statistic (i.e., a Z-value for the group sequential boundaries; I'm not sure whether this is the same statistic as in a t-test). I also see in the literature that the Z-test (which, I think, is not the same thing as the Z-value I mention above) is often used for this kind of question, but I'm not familiar with it.
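(Editor's note: one useful fact here is that for a 2×2 table the chi-square statistic with 1 degree of freedom is exactly the square of the two-proportion Z statistic, so the Z-based spending functions in DeMets and Lan carry over. A minimal sketch of their O'Brien-Fleming-type spending function, alpha(t) = 2(1 − Φ(z<sub>α/2</sub>/√t), using only the Python standard library; the information fractions below are illustration values, not a recommendation:)

```python
from statistics import NormalDist

def obf_spending(t, alpha=0.05):
    """O'Brien-Fleming-type alpha spending (Lan & DeMets):
    alpha(t) = 2 * (1 - Phi(z_{alpha/2} / sqrt(t)))
    where t in (0, 1] is the information fraction (e.g., fraction of
    planned sample size observed so far)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

# Cumulative alpha spent by each of four equally spaced looks
for t in (0.25, 0.5, 0.75, 1.0):
    print(f"t = {t:.2f}: cumulative alpha spent = {obf_spending(t):.5f}")
```

Note that the spending function only says how much alpha may be spent by each look; converting cumulative spend into exact per-look boundaries requires the joint distribution of the sequential test statistics, which is usually done numerically (e.g., with the R packages ldbounds or gsDesign).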
So, my main question is:

Is there an alpha spending function that can be used for chi square test of independence?

In addition, I was wondering:

When should you use the chi-square test of independence and when a Z-test? What are the benefits and downsides of each?
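(Editor's note: for a 2×2 table the two tests are equivalent — the Pearson chi-square statistic equals the square of the pooled two-proportion Z statistic, so their two-sided p-values are identical. A quick numerical check with made-up conversion counts:)

```python
import math

# Hypothetical counts: 200/1000 conversions on variant A, 250/1000 on B
conv_a, n_a = 200, 1000
conv_b, n_b = 250, 1000

# Pearson chi-square for the 2x2 table, no continuity correction
a, b = conv_a, n_a - conv_a
c, d = conv_b, n_b - conv_b
n = a + b + c + d
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Two-proportion Z-test with pooled variance
p1, p2 = conv_a / n_a, conv_b / n_b
p_pool = (conv_a + conv_b) / (n_a + n_b)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p2 - p1) / se

print(chi2, z ** 2)  # identical up to floating-point rounding
```

The practical difference is directionality: the Z-test naturally gives a one-sided test ("B is better than A"), while the chi-square test is inherently two-sided.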

Thank you all in advance! Any help is highly appreciated.
 

hlsmith

Less is more. Stay pure. Stay poor.
#2
Not my area, but many trials are based on risk differences. These can be calculated from the type of data you have, and they are far more interpretable. You could look for an alpha spending formula for that and use it. I'd be interested in what you find or deploy. The risk difference is just the difference between the prevalences in the two groups!
 
#3
Thanks for the comment! Do you mean the boundary at which the test statistic is significant? I would be happy to use that as well. I've been reading up on this a bit (here, for instance: https://blog.optimizely.com/2015/01...he-story-behind-optimizelys-new-stats-engine/ and in this whitepaper: https://www.optimizely.com/resources/stats-engine-whitepaper/), but so far I haven't been able to make it work. If I find something, I'll let you know.
 

hlsmith

#4
Yeah, I haven't found an example for you, but you usually state a difference that you think is contextually relevant. Say the group of interest has to be 10% better than the control. You can power that study and figure out the sample size, but you can also add an early stopping rule that takes into account both maintaining the alpha level (yes, the statistical level of significance) and the small early sample size. So after, say, 50% of the planned sample has been seen, you run the analysis early, but the result has to be even more significant in order to generalize and still hold beyond chance at the full sample size selected a priori.
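(Editor's note: this is exactly what group sequential boundaries formalize. As an illustration, the commonly tabulated O'Brien-Fleming boundaries for two equally spaced looks at overall two-sided alpha = 0.05 are roughly 2.797 at the halfway look and 1.977 at the final look; on the chi-square(1 df) scale these are their squares, about 7.82 and 3.91. A tiny sketch of the decision rule, with those values hard-coded as assumptions:)

```python
# Tabulated O'Brien-Fleming boundaries for two equally spaced looks,
# overall two-sided alpha = 0.05. Note the interim boundary (2.797) is
# much stricter than the fixed-sample critical value of 1.96.
BOUNDARIES = {0.5: 2.797, 1.0: 1.977}

def stop_for_efficacy(z_stat, information_fraction):
    """Stop early only if |Z| clears the boundary for this look."""
    return abs(z_stat) >= BOUNDARIES[information_fraction]

print(stop_for_efficacy(2.4, 0.5))  # |Z| = 2.4 is not enough at the halfway look
print(stop_for_efficacy(2.4, 1.0))  # but it would reject at the final look
```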
 
#5
That is exactly what I need. I found some alpha spending functions that work for t-tests (i.e., the type I error stays at 5% even if you repeatedly check the result), but it seems this does not hold for the chi-square test of independence: in my simulations I get around a 10% type I error with the chi-square test when I check each test 100 times. If I find something, I'll post it here!
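(Editor's note: the inflation is easy to reproduce. Below is a small pure-Python simulation under the null — both arms get the same made-up conversion rate — comparing an analyst who peeks with an unadjusted chi-square test after every batch against one who tests only once at the end. The peeking familywise error rate comes out well above 5%, while the single final test stays near 5%:)

```python
import random

def chi2_2x2(a, b, c, d):
    # Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction
    r1, r2, c1, c2 = a + b, c + d, a + c, b + d
    if min(r1, r2, c1, c2) == 0:
        return 0.0  # degenerate margin early in the data stream
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / (r1 * r2 * c1 * c2)

CRIT = 3.8415   # chi-square(1 df) critical value for alpha = 0.05

random.seed(42)
p = 0.10        # identical conversion rate in both arms, so the null is true
batch = 100     # visitors added to each arm between looks
looks = 10
sims = 1000

peeking_hits = final_hits = 0
for _ in range(sims):
    conv_a = conv_b = n = 0
    peeked_significant = False
    for _ in range(looks):
        conv_a += sum(random.random() < p for _ in range(batch))
        conv_b += sum(random.random() < p for _ in range(batch))
        n += batch
        if chi2_2x2(conv_a, n - conv_a, conv_b, n - conv_b) > CRIT:
            peeked_significant = True
    peeking_hits += peeked_significant
    final_hits += chi2_2x2(conv_a, n - conv_a, conv_b, n - conv_b) > CRIT

peeking_rate = peeking_hits / sims
final_rate = final_hits / sims
print(f"reject at any of {looks} looks: {peeking_rate:.3f}; single final test: {final_rate:.3f}")
```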
 

hlsmith

#6
I look forward to any link you post. Why are you focusing on the chi-square test, which isn't that interpretable? Why not a rate difference with a confidence interval?
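(Editor's note: for reference, the risk difference and its Wald 95% confidence interval take only a few lines; the counts below are made up:)

```python
import math

# Hypothetical data: conversions / visitors per variant
conv_a, n_a = 200, 1000
conv_b, n_b = 250, 1000

p_a, p_b = conv_a / n_a, conv_b / n_b
rd = p_b - p_a  # risk (rate) difference: how much better B converts than A

# Wald standard error (unpooled variance) and 95% confidence interval
se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lo, hi = rd - 1.96 * se, rd + 1.96 * se
print(f"risk difference = {rd:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A CI that excludes zero corresponds to a significant two-sided test at the 5% level, but the interval also shows the plausible size of the lift, which is what makes it more interpretable than a bare chi-square p-value. The same caveat about repeated looks applies to repeatedly checking whether the interval excludes zero.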