Sample Size for Case-Control Study

We conducted a case-control study to answer the question:
  • Are newly diagnosed HIV+ personnel more likely than HIV- personnel to have had a meningoencephalitis (ME) diagnosis in the 2 years before their HIV+ test?
We planned a case-control study and used the sample size calculation in the attached file. Using alpha = 0.05, beta = 0.20, p1 - p2 = 0.003, and a control-to-case ratio of 1000:1, we came up with a sample size of 2,608 cases.
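The attached calculation isn't reproduced in the thread, so here is only a minimal Python sketch of one common textbook formula (Fleiss/Kelsey, no continuity correction) for the number of cases needed when comparing the exposure proportion among cases (p1) with that among controls (p2), using k controls per case. The post gives only the difference p1 - p2 = 0.003, so the individual proportions in the usage line are hypothetical placeholders, the helper name `cases_needed` is ours, and the result is not expected to reproduce the 2,608 figure.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(p_case: float, p_control: float, k: float,
                 alpha: float = 0.05, beta: float = 0.20) -> int:
    """Hypothetical helper: cases needed for a two-sided test of exposure
    proportions p_case vs p_control with k controls per case
    (Fleiss/Kelsey formula, no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_b = NormalDist().inv_cdf(1 - beta)       # power = 1 - beta
    p_bar = (p_case + k * p_control) / (1 + k)  # pooled exposure proportion
    term_null = z_a * sqrt((1 + 1 / k) * p_bar * (1 - p_bar))
    term_alt = z_b * sqrt(p_case * (1 - p_case)
                          + p_control * (1 - p_control) / k)
    return ceil((term_null + term_alt) ** 2 / (p_case - p_control) ** 2)


# Hypothetical proportions whose difference is 0.003, with 1000 controls per case.
print(cases_needed(p_case=0.0045, p_control=0.0015, k=1000))
```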

When we pulled the data, we had only 2,240 cases. We gave the customer a report with the raw numbers but said we couldn't perform the statistical analysis because we had not reached the required number of cases.

The customer then took the raw numbers and calculated a risk ratio (RR) as follows:

27 HIV+ cases among 3,869 people in the ME (exposed) group.
2,213 HIV+ cases among 2,587,299 people in the no-ME (unexposed) group.

RR 8.2 (95% CI 5.59 to 11.92), P < 0.0001
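For reference, a risk ratio and a log-scale (Katz) confidence interval can be computed from these four counts as below. This is a minimal sketch; the thread doesn't say which interval method the customer used, so the function name `risk_ratio` and the choice of the Katz interval are our assumptions. With these counts it should land close to the customer's figures.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def risk_ratio(a: int, n1: int, c: int, n0: int, conf: float = 0.95):
    """Risk ratio (a/n1) / (c/n0) with a Katz log-scale confidence interval.

    a = outcome events among the exposed,   n1 = total exposed
    c = outcome events among the unexposed, n0 = total unexposed
    """
    rr = (a / n1) / (c / n0)
    se_log_rr = sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)  # SE of log(RR)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)


# HIV+ is the outcome, ME diagnosis is the exposure.
rr, lo, hi = risk_ratio(a=27, n1=3869, c=2213, n0=2_587_299)
print(f"RR {rr:.1f} (95% CI {lo:.2f} to {hi:.2f})")
```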

Is it statistically kosher to make this RR calculation given that we didn't meet our sample size for the case-control study?

Thanks for any guidance!




Less is more. Stay pure. Stay poor.
Traditionally, odds ratios are reserved for retrospective designs such as case-control studies. A risk requires knowing the incidence in the sample. Do you really know the directionality? I would think ME would be the opportunistic infection in HIV, not the other way around.

Thanks for the reply! I'm really not a biostatistician; I just play one at work. The epi who wrote the analysis plan for this study noted that under the rare-disease assumption, the RR and OR are essentially the same. According to the data we collected, the directionality does seem to run from ME to HIV: 23 people had an ME dx in the two years before their HIV+ dx, while only 2 had it the other way around.

Would you have any reservations about the customer's calculation of RR?


Well, the OR can approximate the RR when the rare-outcome assumption holds, but that doesn't mean you can just report an RR. It only means the ORs will be close to the RRs.
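To see what the rare-outcome point means in practice, this minimal sketch computes both the OR and the RR from the same 2x2 counts the customer used. Because HIV is rare in both groups (under 1%), the two come out close to each other. This only illustrates the approximation; it does not settle whether either measure is appropriate for these data.

```python
# Counts from the customer's table (HIV+ = outcome, ME = exposure).
a, n1 = 27, 3869          # HIV+ among the ME-exposed, total ME-exposed
c, n0 = 2213, 2_587_299   # HIV+ among the unexposed, total unexposed
b, d = n1 - a, n0 - c     # HIV- in each group

risk_ratio = (a / n1) / (c / n0)   # ratio of risks (needs incidence)
odds_ratio = (a * d) / (b * c)     # cross-product ratio from the 2x2 table
print(f"RR = {risk_ratio:.2f}, OR = {odds_ratio:.2f}")
```

Here the two agree to within about 1%; with a common outcome they would drift apart, with the OR sitting farther from 1.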

I also think that, given your question, ORs are more appropriate, since they imply a possible association but not a directional risk. I say this because: can you be certain that these patients were not already HIV+ at the time of their ME diagnosis?
I was thinking HIV usually takes 10 years to present itself.
Thanks again, hlsmith!

The thinking is that an ME dx may be an early indicator of HIV infection, so the methods section says:

We isolated all HIV+ diagnoses, while noting the total HIV- diagnoses, from 2009–2017. To capture incident cases from 2011–2017, we established a washout period from 2009–2010 and removed HIV+ cases diagnosed during that period, leaving us with incident HIV+ cases that could be identified with our lab data. We classified as false HIV+ cases those diagnosed as positive but later diagnosed as negative, and recoded these individuals as HIV- for our analysis. We retained the latest relevant, accurate HIV+ date while keeping the earliest relevant, accurate HIV- date. We considered only those HIV+ diagnoses whose most recent HIV- test result preceded HIV+ seroconversion.
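The HIV+ case definition above can be sketched as a per-member rule. This is a minimal Python illustration under our own assumptions, not the study's actual code: the record format `(test_date, "POS"/"NEG")` and the name `classify_hiv` are hypothetical, and the earliest positive in the window stands in for the diagnosis date (the methods' rules about the "latest" HIV+ and "earliest" HIV- dates may intend something different). The washout, false-positive recoding, and prior-negative requirement follow the text.

```python
from datetime import date

# Study window and washout period taken from the methods text.
STUDY_START = date(2009, 1, 1)
WASHOUT_END = date(2010, 12, 31)
STUDY_END = date(2017, 12, 31)


def classify_hiv(tests):
    """Classify one member from a list of (test_date, result) tuples.

    `result` is "POS" or "NEG" (a hypothetical record format).
    Returns ("HIV+", dx_date), ("HIV-", None) or ("EXCLUDED", None).
    """
    tests = sorted(t for t in tests if STUDY_START <= t[0] <= STUDY_END)
    pos = [d for d, r in tests if r == "POS"]
    neg = [d for d, r in tests if r == "NEG"]
    if not pos:
        return ("HIV-", None)  # no positive test in the study window
    first_pos = pos[0]  # assumption: earliest positive used as the dx date
    if any(d > first_pos for d in neg):
        return ("HIV-", None)  # positive later contradicted -> false positive
    if first_pos <= WASHOUT_END:
        return ("EXCLUDED", None)  # positive during washout -> prevalent case
    if not any(d < first_pos for d in neg):
        return ("EXCLUDED", None)  # no prior negative -> incidence unconfirmed
    return ("HIV+", first_pos)


print(classify_hiv([(date(2012, 3, 1), "NEG"), (date(2014, 5, 1), "POS")]))
```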

We then flagged ME diagnoses from inpatient and outpatient settings using encounter data and claims. We coded an algorithm that counted an ME diagnosis as clinically relevant only if it occurred within 90 days of a previous ME diagnosis, and we retained only incident ME+ cases. If the initial ME diagnosis was documented in an inpatient setting, we did not require this 90-day confirmation for the individual to count as an ME+ case.
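The ME rule above (inpatient diagnoses count immediately; outpatient diagnoses need a second ME diagnosis within 90 days) could look roughly like this. It is a minimal sketch with a hypothetical record format and function name; treating the first diagnosis of a confirmed pair as the onset date, and ignoring same-day duplicates as confirmation, are our assumptions.

```python
from datetime import date, timedelta

CONFIRM_WINDOW = timedelta(days=90)


def first_qualifying_me(encounters):
    """Return the date of a member's first qualifying (incident) ME diagnosis.

    `encounters` is a list of (dx_date, setting) tuples with setting
    "inpatient" or "outpatient" (a hypothetical record format).
    """
    encounters = sorted(encounters)
    for i, (dx_date, setting) in enumerate(encounters):
        if setting == "inpatient":
            return dx_date  # inpatient dx needs no 90-day confirmation
        # Outpatient dx: look for another ME dx on a later date within 90 days.
        for later_date, _ in encounters[i + 1:]:
            if later_date - dx_date > CONFIRM_WINDOW:
                break  # list is sorted, so every remaining dx is even later
            if later_date > dx_date:
                return dx_date  # confirmed; assumption: onset = first dx date
    return None  # never met the case definition


print(first_qualifying_me([(date(2014, 1, 1), "outpatient"),
                           (date(2014, 2, 15), "outpatient")]))
```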

We linked the relevant HIV+ and ME+ data by individual member. We determined which individuals had comorbid ME+ and HIV+ diagnoses and whether each individual's ME+ diagnosis occurred before or after their HIV+ seroconversion. We also classified individuals whose ME diagnosis and HIV seroconversion occurred on the same day.
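Once each member has an incident ME date and an HIV+ date, the before/after/same-day classification is a simple comparison. The minimal sketch below also splits "before" using the 2-year lookback from the study question; the 730-day definition of "2 years", the category labels, and the example records are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta

LOOKBACK = timedelta(days=2 * 365)  # assumption: "2 years" taken as 730 days


def me_timing(me_date, hiv_date):
    """Classify a member's incident ME date relative to HIV seroconversion."""
    if me_date is None or hiv_date is None:
        return "not comorbid"
    if me_date == hiv_date:
        return "same day"
    if me_date > hiv_date:
        return "ME after HIV"
    if hiv_date - me_date <= LOOKBACK:
        return "ME within 2 years before HIV"
    return "ME more than 2 years before HIV"


# Hypothetical linked records: member_id -> (ME date, HIV+ date)
members = {
    "A": (date(2013, 4, 1), date(2014, 1, 10)),
    "B": (date(2016, 2, 1), date(2015, 7, 1)),
    "C": (None, date(2012, 9, 9)),
}
print(Counter(me_timing(me, hiv) for me, hiv in members.values()))
```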

And our conclusion was:

Because our original sample size calculation called for 2,608 HIV+ cases to provide enough power for meaningful results, we could not adequately answer the question of whether ME or another similar diagnosis was associated with early indication or presence of HIV using a case-control study design. Our sample size was too small to support statistical analysis or conclusions, and it also prevented us from calculating reliable epidemiological measures of association.

And my question is: is it statistically invalid to then use our data to calculate an RR as the customer did?


I think so. Don't hold me to that, but I feel that whenever you use a case-control design you are supposed to use ORs. Of note, something else that could be modeled is the time between diagnoses.