My test involves a sample of credit card customers who have defaulted, and I'm using a test vs. control design to see whether my treatment produces a statistically significant improvement in recovering the defaulted debt. From the calculator I cited above, I can tell what sample size I need to test a binary event, such as the percentage of payers (i.e. anyone who pays $1 or more). However, when I want to know what sample size I need to detect a statistically significant difference in a metric that is not binary, such as recovery rate (total dollars recovered / total defaulted debt), which quantity should I treat as my sample size?
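For reference, here is a minimal sketch of the binary case as I understand it. I'm assuming the calculator uses the standard normal-approximation formula for comparing two proportions with equal group sizes; the function name `accounts_per_group` and the payer rates are hypothetical placeholders, not my real numbers.

```python
from math import ceil
from statistics import NormalDist


def accounts_per_group(p_control, p_test, alpha=0.05, power=0.80):
    """Approximate accounts needed in EACH group for a two-sided
    two-proportion z-test, assuming equal group sizes."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_beta = NormalDist().inv_cdf(power)           # quantile for the desired power
    # Sum of the Bernoulli variances of the two groups
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_test - p_control) ** 2)


# Hypothetical payer rates (assumed for illustration): 5% control vs. 6% test
print(accounts_per_group(0.05, 0.06))
```

Here the unit being counted is unambiguous: each account either pays or does not. My actual split is unequal (roughly 10% test, 90% control), so this equal-size version is only a rough guide.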

Here is a numerical example to make the summary above clearer:

I have $1,000,000 of defaulted credit card debt in my test group and $9,000,000 of defaulted debt in my control group. Historically, I recover 1% of defaulted debt within the first month of default, which is how I expect my control group to perform. I want to measure whether a new strategy deployed on my test group produces an improvement of 0.1 percentage points, i.e. I want to be able to tell whether a 1.1% recovery rate in the test group is a statistically significant improvement over my control. Detecting such a slight difference means I need a relatively large sample size. The $1,000,000 of defaulted debt in my test group comes from only about 5,000 accounts, and the $9,000,000 in my control group comes from about 45,000 accounts. So, when I'm determining how big my sample should be to read significance accurately from my test, is the sample size of my test and control groups the number of accounts (5,000 and 45,000), or the number of dollars in each group ($1,000,000 and $9,000,000)?
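To spell out the arithmetic behind these figures, here is a quick Python sketch. The variable names are my own, and the numbers are exactly the ones quoted in the example.

```python
# Figures taken from the example
test_debt, control_debt = 1_000_000, 9_000_000
test_accounts, control_accounts = 5_000, 45_000
baseline_rate, target_rate = 0.010, 0.011

# Average defaulted balance per account in each group
avg_balance_test = test_debt / test_accounts
avg_balance_control = control_debt / control_accounts

# Dollars recovered in the test group at the baseline and target rates
test_dollars_baseline = test_debt * baseline_rate
test_dollars_target = test_debt * target_rate

# The effect I am trying to detect, in total dollars and per account
extra_dollars = test_dollars_target - test_dollars_baseline
extra_per_account = extra_dollars / test_accounts

print(f"Average balance: ${avg_balance_test:,.0f} (test), ${avg_balance_control:,.0f} (control)")
print(f"Test recovery: ${test_dollars_baseline:,.0f} at 1.0% vs. ${test_dollars_target:,.0f} at 1.1%")
print(f"Difference to detect: ${extra_dollars:,.0f} total, ${extra_per_account:.2f} per account")
```

In other words, the improvement I want to detect amounts to about $1,000 of extra recovery across the whole test group, or about $0.20 per account on an average balance of $200.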

Since recovery rate is based on total dollars, it seems strange to use the number of accounts to determine the minimum sample size. The metric captures not just the proportion of accounts that pay but also the magnitude of their payments, which is not a binary concept. Can I treat each dollar in my test and control groups as one unit of sample size?