Hello,

I would like to explain the chi² distribution to students in an intuitive way, but I don't know how to do it. All the videos that I have watched explain the chi² distribution mathematically. So my question is: how could I explain to students why the chi² distribution can be used to determine whether there is a significant difference between, for example, the number of boys and girls in a survey? Why does a sum of squared standard normal variables lead to that information?

I would really appreciate your help.

Thank you.

In my opinion, what you need to do is connect the discrete binomial distribution (boys and girls) to the normal distribution via the central limit theorem. After that, show empirically that the sum of squared independent standard normal numbers is chi-square distributed.

For example, if you square a single standard normal number, Z^2 is chi-square distributed on one degree of freedom (df). If you square and sum two independent standard normal numbers, then the sum X = Z1^2 + Z2^2 is chi-square distributed on two degrees of freedom, and so on. Show this empirically through simulation and provide histograms to the students so that they can see this. Give the descriptive statistics for some chi-square distributions, e.g., the mean is equal to the df; the variance is 2*df; the skew is Sqrt[8/df]; and the excess kurtosis is 12/df. So a chi-square distribution with one df will have a mean of 1; variance of 2 (std. dev. of Sqrt[2]); skew of Sqrt[8], which is about 2.83; and excess kurtosis of 12.
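The simulation suggested above could look something like the following. This is only a minimal sketch using Python's standard library; the function name simulate_chi_square, the number of draws, and the seed are my own illustrative choices. It checks the mean and variance against df and 2*df; the lists it returns could be passed to any plotting tool to make the histograms.

```python
import random
import statistics

def simulate_chi_square(df, n_draws=50_000, seed=0):
    """Return n_draws values of Z1^2 + ... + Zdf^2 for independent standard normal Z's."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
        for _ in range(n_draws)
    ]

for df in (1, 2, 5):
    draws = simulate_chi_square(df)
    sample_mean = statistics.fmean(draws)
    sample_var = statistics.pvariance(draws)
    # Theory says the mean should be near df and the variance near 2*df.
    print(f"df={df}: mean={sample_mean:.2f} (theory {df}), "
          f"variance={sample_var:.2f} (theory {2 * df})")
```

The printed values will wobble a little from run to run if the seed is changed, which is itself a useful point to make to students.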

Next, back to the boys and girls, which is binomial. Use the transformation to the standard normal distribution:

Z = (X - Mu) / Std.

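Invoke the central limit theorem, replacing Mu with the mean of the binomial distribution (n*p) and Std with its standard deviation (Sqrt[n*p*q]), where X is the number of boys among n students, p is the expected proportion of boys, and q = 1 - p: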

Chi-square (1df) = Z^2 = (X - Mu)^2/ (Std)^2 = (X - n*p)^2 / (n*p*q).

Because p + q = 1, we have 1/(p*q) = 1/p + 1/q, and (n - X) - n*q = -(X - n*p). Expanding, we therefore have:

Chi-square (1df) = (X - n*p)^2 / (n*p) + ((n-X) - n*q)^2 / (n*q).

Then, what is typically done is to write this in terms of observed (O) and expected (E) counts, with O1 = X, E1 = n*p, O2 = n - X and E2 = n*q:

Chi-square (1df) = (O1 - E1)^2 / E1 + (O2 - E2)^2 / E2.
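
To convince students that the squared z-score and the observed/expected form are the same number, a small check along these lines may help. It is just a sketch: the function names z_squared and pearson_statistic are made up for illustration, and the survey of 100 students with 60 boys is a hypothetical example, not real data.

```python
def z_squared(x, n, p):
    """Squared z-score of a binomial count x: (x - n*p)^2 / (n*p*q)."""
    q = 1.0 - p
    return (x - n * p) ** 2 / (n * p * q)

def pearson_statistic(x, n, p):
    """Sum of (O - E)^2 / E over the two categories (e.g. boys and girls)."""
    q = 1.0 - p
    observed = (x, n - x)
    expected = (n * p, n * q)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical survey: 60 boys out of 100 students, expecting a 50/50 split.
print(z_squared(60, 100, 0.5))          # 4.0
print(pearson_statistic(60, 100, 0.5))  # 4.0
```

Both forms give 4.0 here. Since the upper 5% point of the chi-square distribution with one df is about 3.84 (that is, 1.96^2), a 60/40 split in this made-up survey would be judged significant at the 5% level.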

Of course, this relies on the normal distribution being a good approximation to the binomial distribution via the central limit theorem. The usual rule of thumb is that n*p and n*q should both be at least 5.
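
One way to let students see how well the approximation works is to simulate many surveys in which the 50/50 hypothesis is actually true and count how often the statistic exceeds the 5% critical value. The sketch below assumes n = 100 and p = 0.5; the function names, the number of simulated surveys, and the seed are illustrative choices only.

```python
import random

# Upper 5% point of the chi-square distribution with 1 df (1.96^2, rounded).
CRITICAL_5_PERCENT = 3.841

def pearson_statistic(x, n, p):
    """(O1 - E1)^2 / E1 + (O2 - E2)^2 / E2 for a count x out of n."""
    q = 1.0 - p
    return (x - n * p) ** 2 / (n * p) + ((n - x) - n * q) ** 2 / (n * q)

def rejection_rate(n, p, n_surveys=10_000, seed=1):
    """Fraction of simulated surveys whose statistic exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_surveys):
        # Each student is a boy with probability p, independently of the others.
        boys = sum(rng.random() < p for _ in range(n))
        if pearson_statistic(boys, n, p) > CRITICAL_5_PERCENT:
            rejections += 1
    return rejections / n_surveys

print(rejection_rate(n=100, p=0.5))
```

The printed fraction should land near 0.05, though not exactly on it, because the binomial count is discrete and the chi-square curve is only an approximation. Rerunning with a small n (so that n*p or n*q falls below 5) lets students see the approximation get worse.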