Somebody please help me

I posted a few times over the last month or so. I'm really hoping someone can at least leave a comment (to one or all three posts). We have data collected over the last year. Each month, we get some number of observations bucketed into a test group or a control group. Each month, we have been observing averages between the two groups (but no formal statistical test to compare the averages). Now, I have somewhere on the order of 60,000 observations in the control group and 4,500 observations in the test group. I want to compare the averages between groups, but the large difference in sample sizes scares me.


Less is more. Stay pure. Stay poor.
Would you want to do a bunch of cross-sectional t-tests, repeated tests, or some type of time series? Given the sample size, I don't think the difference in group sizes is a big concern. You could always run your analyses and then rerun them using a random subsample of the majority group equal in size to the minority group.
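A rough sketch of that "run it on everything, then rerun on a balanced subsample" check, using only the Python standard library. The data are synthetic lognormal draws made up for illustration (not the poster's numbers), `welch_test` is a hypothetical helper, and the p-value uses a normal approximation that is only reasonable because both groups are large.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def welch_test(a, b):
    """Return (difference in means, Welch t statistic, two-sided p-value).

    The p-value uses the normal approximation to the t distribution,
    which is reasonable only because both groups are large.
    """
    diff = fmean(a) - fmean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    t = diff / se
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return diff, t, p


rng = random.Random(42)

# Synthetic right-skewed data, made up for illustration only.
control = [rng.lognormvariate(3.00, 0.5) for _ in range(60_000)]
test = [rng.lognormvariate(2.95, 0.5) for _ in range(4_500)]

# 1) Use everything.
full_result = welch_test(control, test)

# 2) Rerun with a random subsample of the control group,
#    the same size as the test group.
control_sub = rng.sample(control, k=len(test))
balanced_result = welch_test(control_sub, test)

print("all data: diff=%.3f t=%.2f p=%.2g" % full_result)
print("balanced: diff=%.3f t=%.2f p=%.2g" % balanced_result)
```

The two runs should point the same way. The balanced run discards most of the control data, so its standard error is larger and its estimate noisier.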
@hlsmith Thanks for your reply. I'm thinking of a t-test of some sort. You brought up something that I've thought about. Essentially, we have all this data, which I believe is an honest view of the population over a one-year timeframe. If I plot histograms of the two groups, I get right-skewed distributions (similar to how income is distributed). I was thinking about randomly sampling a number of these observations from the two groups and running a two-sample t-test for the difference in means. I feel like as long as I document what I do and have a reason for it, it will be fine. Do you have any additional thoughts? One of my concerns was running a t-test on a sample of 60,000 against a sample of 4,500.


I'm not sure what issues exist with such imbalanced data in t-tests. What is the purpose? I might recommend using all the data and doing quantile regression, which lets you examine any part of the distributions. How do observations end up in groups? Are they assigned?
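With a single binary group indicator and no other covariates, the quantile regression coefficient at a given quantile reduces to the difference between the two groups' sample quantiles, so a first look needs no regression library. A minimal sketch on made-up lognormal data; `quantile_differences` is a hypothetical helper, and an analysis with covariates would need a proper quantile regression routine.

```python
import random
from statistics import quantiles


def quantile_differences(control, test, percents=(10, 25, 50, 75, 90)):
    """Test-minus-control difference at each requested percentile."""
    c = quantiles(control, n=100)  # 99 cut points; c[k - 1] is the k-th percentile
    t = quantiles(test, n=100)
    return {p: t[p - 1] - c[p - 1] for p in percents}


rng = random.Random(0)

# Synthetic right-skewed data, for illustration only.
control = [rng.lognormvariate(3.00, 0.5) for _ in range(60_000)]
test = [rng.lognormvariate(2.95, 0.5) for _ in range(4_500)]

for pct, diff in quantile_differences(control, test).items():
    print(f"{pct}th percentile: test - control = {diff:.2f}")
```

If the differences grow toward the upper percentiles, the groups differ mostly in the long right tail, which a comparison of means alone would not show.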
Something else you might want to consider. With samples this big, even tiny differences may produce statistically significant results - differences which are so small they may be quite unimportant in real life. Perhaps you might like to determine how large a difference needs to be to be considered "significant" in a practical situation.
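One way to act on this, sketched under assumptions: compute a confidence interval for the difference in means and compare it against a threshold chosen in advance. The names `diff_confidence_interval`, `practically_important`, and `min_important_diff` are hypothetical, the data are synthetic, and the threshold of 0.5 is an arbitrary placeholder that would have to come from the business.

```python
import math
import random
from statistics import NormalDist, fmean, variance


def diff_confidence_interval(a, b, confidence=0.95):
    """Large-sample (normal approximation) CI for mean(a) - mean(b)."""
    diff = fmean(a) - fmean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * se, diff + z * se


def practically_important(ci, min_important_diff):
    """True only if the whole interval lies beyond the threshold in one direction."""
    low, high = ci
    return low > min_important_diff or high < -min_important_diff


rng = random.Random(1)

# Synthetic right-skewed data, for illustration only.
control = [rng.lognormvariate(3.00, 0.5) for _ in range(60_000)]
test = [rng.lognormvariate(2.95, 0.5) for _ in range(4_500)]

ci = diff_confidence_interval(control, test)
print("95%% CI for control - test: (%.2f, %.2f)" % ci)
print("clears a threshold of 0.5:", practically_important(ci, min_important_diff=0.5))
```

A result can be statistically significant (the interval excludes zero) and still fail this check if the interval sits inside the range considered too small to matter.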
Yes, these are all good points. Some customers have access to an amenity (test group) while other customers do not use this amenity (control) and opt for a more traditional process. We observe that the amenity is helping improve efficiency of our business (measured by an average) relative to customers without this tool. Many observations are coming in real time (daily). Think of it like ordering pizza online and then picking it up versus going to the pizza shop, ordering, then driving back home after the shop makes the pizza.
The user has to sign up to use the service in the first place. I agree that there may be some special quality about these users. But, we've tried our best to make the treatment and control have the same group characteristics except for the use of this service. I think the way these observations come in can be thought of as a random sample from the population of all people that have this amenity. Soon, the service will be available to all of our customers. So, we are trying to show that the use of this service helps our employees work more efficiently.


I agree with @hlsmith: use all the data + t-test + quantile regression. Precisely because the group sizes are so big, the Central Limit Theorem makes the t-test on the means approximately valid even though the data are skewed (use the Welch version if the group variances differ). You may also run a two-sample Kolmogorov-Smirnov test to see whether the distribution in the test group is the same as that in the control group.
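For the Kolmogorov-Smirnov part, a hand-rolled sketch in plain Python, assuming the standard large-sample critical value rather than an exact p-value. `ks_statistic` and `ks_critical_value` are hypothetical helper names; a statistics package would normally be used instead.

```python
import math


def ks_statistic(a, b):
    """Largest vertical gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value: reject equal distributions if D exceeds it."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# With 60,000 and 4,500 observations the 5% threshold is only about 0.021,
# so even a small gap between the two empirical CDFs will be flagged.
print(round(ks_critical_value(60_000, 4_500), 4))
```

Calling `ks_statistic(control, test)` on the two groups and comparing the result with `ks_critical_value(len(control), len(test))` gives the test decision.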

If the difference in means is statistically significant, you may quantify the effect size with Cohen's d or R-squared. This would help in addressing the effect size issue that @katxt mentioned.
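Cohen's d is the difference in means divided by the pooled standard deviation. A minimal sketch with toy numbers chosen only to show the arithmetic; `cohens_d` is a hypothetical helper name.

```python
import math
from statistics import fmean, variance


def cohens_d(a, b):
    """Cohen's d: (mean(a) - mean(b)) divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (fmean(a) - fmean(b)) / math.sqrt(pooled_var)


# Toy numbers: means 4 and 3, both sample variances 2.5,
# so d = 1 / sqrt(2.5), roughly 0.63.
print(round(cohens_d([2, 3, 4, 5, 6], [1, 2, 3, 4, 5]), 2))
```

With groups of 60,000 and 4,500 the pooled standard deviation is dominated by the larger group, and with strongly right-skewed data d is best read alongside the raw difference in means rather than on its own.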
Yeah, we have calculated sample sizes but have yet to run a formal test. By the way, is anyone familiar with power.t.test in R and how it calculates sample size? I know there are a few approaches. I have an offshoot question related to calculating sample size for a one-sample t-test. I'm reading "Biostatistical Methods" by Lachin. My problem is not in a biostat framework; I'm just reading about the formula.

N = ((Z * sigma)/E)^2, where Z is the standard normal quantile for the given alpha level (e.g. 1.96 for a two-sided alpha of 0.05), sigma is the standard deviation, and E is the desired margin of error. This formula does not take power into account. Can someone elaborate on this formula?
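A sketch of how the margin-of-error formula relates to a power-based one, assuming the usual normal approximation for a one-sample test. The power version adds the quantile for the desired power to Z; setting power to 50% makes that extra quantile zero and recovers the formula above. The function names and the values sigma = 10 and E = 1 are illustrative assumptions. As far as I understand it, R's power.t.test instead solves for n numerically using the noncentral t distribution, so its answer is typically slightly larger than this approximation.

```python
import math
from statistics import NormalDist


def n_for_margin(sigma, margin, alpha=0.05):
    """N = (Z * sigma / E)^2, rounded up; controls CI half-width only."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)


def n_for_power(sigma, delta, alpha=0.05, power=0.80):
    """Normal-approximation n to detect a mean shift of delta with the given power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


print(n_for_margin(sigma=10, margin=1))            # 385
print(n_for_power(sigma=10, delta=1, power=0.80))  # 785
print(n_for_power(sigma=10, delta=1, power=0.50))  # 385, same as the margin formula
```

Read this way, sizing a study with the margin-of-error formula gives only about a coin-flip chance of detecting a true difference as small as E.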


What is the specific original question? I would still think quantile treatment effects would be a good approach, since they allow for propensity scores to address background imbalances.
So, I took over owning a project in which the prior person calculated sample sizes for a one-sample t-test (less of a background in stats than me). Now, I'm looking back at it and wondering why we aren't directly comparing the two groups with a difference in means. Think back to the pizza example. Is method A faster than method B from a statistical standpoint? That's essentially the question.