# Thread: Calculating control group in chi square test

I'm trying to compare the prevalence of autism in the offspring of mothers with and without gestational diabetes (GDM). I gathered data from 1000 women with GDM; 8 of their children have autism (prevalence 8/1000). Unfortunately, because we could not gather data from healthy women as a control group, I had to use epidemiological data for the general population in Poland, where the prevalence is 5,2-8,6/10000.
I want to perform a chi-square test, but I'm not sure how to calculate the prevalence in a legitimate way. Should I extrapolate my outcome to 80/10000, calculate the mean population prevalence (6,9/10000), round it up to 7/10000 and then put the values in the contingency table? Or should the control group be as big as my experimental group (1000), with the prevalence rounded up from 0,69/1000 to 1/1000? Or should the control group be as big as 1450, so that no rounding is needed (1000/0,69 ≈ 1450)?
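
To make the three options concrete, here is a minimal Python sketch. It assumes a plain Pearson chi-square test without continuity correction and takes the midpoint of the quoted population range (6.9 per 10000) as the control prevalence. The helper name `chi_square_2x2` and the control-group counts are hypothetical, constructed only for illustration; they are not observed data.

```python
import math

def chi_square_2x2(cases_a, n_a, cases_b, n_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Rows are the two groups, columns are cases / non-cases.
    Returns (statistic, p_value) with 1 degree of freedom.
    """
    table = [[cases_a, n_a - cases_a], [cases_b, n_b - cases_b]]
    total = n_a + n_b
    row_sums = [n_a, n_b]
    col_sums = [cases_a + cases_b, total - cases_a - cases_b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical tables: (cases_gdm, n_gdm, cases_control, n_control).
options = {
    "A": (80, 10000, 7, 10000),  # both groups scaled to 10000
    "B": (8, 1000, 1, 1000),     # control matched to 1000, 0.69 rounded to 1
    "C": (8, 1000, 1, 1450),     # control of 1450 so that 0.69/1000 gives 1 case
}

results = {label: chi_square_2x2(*counts) for label, counts in options.items()}

for label, (stat, p_value) in results.items():
    print(f"Option {label}: chi2 = {stat:.2f}, p = {p_value:.4g}")
```

The observed data are the same in all three tables, yet the statistic changes with the assumed size of the control group, which is exactly why the choice needs a justification.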

Is the exposed group Polish, and how do you know the true GDM status of the nonexposed group? This is a very ecological design, and it seems unlikely to provide any legitimate conclusions regardless of your approach.

Yes, patients with GDM are from Poland, too.
As for the prevalence of GDM in the unexposed group, I don't have any specific information about it; some estimates based on studies from the 1990s suggest that 3,4% of pregnant women have GDM.
I know the study design may have some limitations, but I'm still a student and we have limited access to documentation, time and workforce, so we were not able to survey another 1000 healthy pregnant women.
As for comparing the prevalences, I consulted a couple of doctors at my university (one of whom deals with statistics and research regularly), and they confirmed that it is acceptable under the circumstances. I just wanted to ask for advice on the size of the control group I should use in the chi-square test, because that is the dilemma I was left with after talking to them.

