Chi² distribution link with standard normal distribution?

#1
Hello,

The chi² distribution is the sum of n random standard normal variables. It's used, for example, to test whether a distribution fits the uniform distribution: for each value, you take the squared difference between the observed frequency and the expected frequency, divide it by the expected frequency, and sum these terms over all values. I assume this works because the difference between the observed frequency and the expected frequency must be standard normally distributed. My question is: why is that difference standard normally distributed? And if it is not standard normally distributed, why is the sum of the squared differences chi²-distributed?
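
To make the statistic I mean concrete, here is a quick Python sketch with made-up counts for a six-sided die that should be uniform (the counts are just for illustration, and I assume numpy and scipy are available):

```python
import numpy as np
from scipy import stats

# Hypothetical observed counts for a six-sided die rolled 60 times
observed = np.array([8, 12, 9, 11, 6, 14])
expected = np.full(6, observed.sum() / 6)   # uniform fit: every face expected 10 times

# The statistic described above: sum over all values of (O - E)^2 / E
chi2_stat = np.sum((observed - expected) ** 2 / expected)

# p-value from the chi-square distribution with 6 - 1 = 5 degrees of freedom
p_value = stats.chi2.sf(chi2_stat, df=5)
print(chi2_stat, p_value)

# scipy's built-in goodness-of-fit test gives the same numbers
print(stats.chisquare(observed))
```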

Thank you very much for your attention to this matter.
 

hlsmith

Omega Contributor
#2
I am totally guessing and I am naïve when it comes to these things, but isn't there a squared difference going on in the formula, since the chi-square is equal to a standard normal squared?
 

Dragan

Super Moderator
#3
This is a confusing topic to most students and others.

First, assume X~N(Mu, Sigma^2), then Z = (X - Mu)/Sigma is Standard Normal.

Second, the Chi-Square Distribution can be empirically derived by summing squared independent Z's, e.g., Z^2 is Chi-square on one df; Z1^2 + Z2^2 is Chi-square on two df, and so on.
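
A small simulation makes this empirical derivation concrete; the following is just a rough sketch (assuming numpy and scipy are available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Draw many pairs of independent standard normals and sum their squares
z = rng.standard_normal(size=(100_000, 2))
sums_of_squares = np.sum(z ** 2, axis=1)

# Chi-square with 2 df has mean 2 and variance 4; the simulated values agree
print(sums_of_squares.mean(), sums_of_squares.var())

# A Kolmogorov-Smirnov test against chi-square(2) should not reject
print(stats.kstest(sums_of_squares, "chi2", args=(2,)))
```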

Third, the discrete Binomial Distribution with N independent observations has a mean of N*p, where p is the probability of success, a variance of N*p*q, where q is the probability of failure, and a standard deviation of Sqrt[N*p*q].

Now, through the Central Limit Theorem (CLT) the Binomial Distribution will approximate the Normal distribution when N is large enough. A typical rule is N*p>=5 and N*q>=5.
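
A quick sketch of that approximation, with N and p picked only for illustration (again assuming numpy and scipy):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Parameters chosen so that N*p >= 5 and N*q >= 5
N, p = 50, 0.3
q = 1 - p

# Draw many binomial counts and standardize with mean N*p and sd Sqrt[N*p*q]
x = rng.binomial(N, p, size=100_000)
z = (x - N * p) / np.sqrt(N * p * q)

# The standardized counts should look roughly standard normal
print(z.mean(), z.std())                        # close to 0 and 1
print(np.mean((z > -1) & (z < 1)))              # empirical P(-1 < Z < 1)
print(stats.norm.cdf(1) - stats.norm.cdf(-1))   # theoretical value, about 0.683
```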

Next, "connect the dots." That is,

Chi-Square (1df) = Z^2 = (X - Mu)^2 / (Sigma^2).

Subsequently, impose the CLT using the binomial distribution parameters:

Chi-Square (1df) = Z^2 = (X - N*p)^2 / (N*p*q). (Note that X is the number of successes.)

And then through some not so obvious algebra we have:

Chi-Square (1df) = (X - N*p)^2 / (N*p) + ([N-X] - N*q)^2 / (N*q).

This is usually written as:

Chi-Square (1df) = (O_1 - E_1)^2 / (E_1) + (O_2 - E_2)^2 / (E_2).

Note that, to invoke the CLT, it is the expected frequencies that must be greater than (or equal to) 5, not the actual observed counts.
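
A numerical check of the "connect the dots" identity above, with N and p picked only for illustration (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 0.3
q = 1 - p

# Simulate many binomial "number of successes" X
x = rng.binomial(N, p, size=100_000)

# Z^2 form: (X - N*p)^2 / (N*p*q)
z_squared = (x - N * p) ** 2 / (N * p * q)

# Two-cell form, with O_1 = X, O_2 = N - X, E_1 = N*p, E_2 = N*q
two_cell = (x - N * p) ** 2 / (N * p) + ((N - x) - N * q) ** 2 / (N * q)

# The two forms are algebraically identical
print(np.allclose(z_squared, two_cell))   # True

# And both are approximately Chi-Square (1 df): mean about 1, variance about 2
print(z_squared.mean(), z_squared.var())
```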
 
#4
Hello,

The chi² distribution is the sum of n random standard normal variables.
The short answer is that you don't just add the numbers; you square the numbers before you add them. If you take, say, 5 standard normal numbers at random, square each of them, and add these five squares, the total you get is distributed chi-squared with 5 df.
The errors are normal, so the sum of the squared errors is chi-squared.
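
In code, that recipe looks like this (a rough sketch, assuming numpy and scipy are available):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Take 5 standard normal numbers at random, square each, and add the squares
five = rng.standard_normal(5)
print(np.sum(five ** 2))   # one draw from a chi-squared distribution with 5 df

# Repeating this many times reproduces the chi-squared(5) distribution
totals = np.sum(rng.standard_normal((100_000, 5)) ** 2, axis=1)
print(totals.mean(), totals.var())              # close to 5 and 10
print(stats.chi2.mean(5), stats.chi2.var(5))    # theoretical mean 5, variance 10
```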
 
#5
Third, the discrete Binomial Distribution with N independent observations has a mean of N*p, where p is the probability of success, a variance of N*p*q, where q is the probability of failure, and a standard deviation of Sqrt[N*p*q].

Chi-Square (1df) = (X - N*p)^2 / (N*p) + ([N-X] - N*q)^2 / (N*q).

This is usually written as:

Chi-Square (1df) = (O_1 - E_1)^2 / (E_1) + (O_2 - E_2)^2 / (E_2).
Thank you very much for the detailed answer. I just have two more questions based on the explanation. Why does the binomial distribution suddenly appear? It counts the number of successes, but what counts as a success when we want to test whether a distribution is uniform?

My second question is: why is X equal to the observed frequency? Is it because X is the number of successes of the binomial distribution? Does this also hold if we want to test whether a random variable has a uniform distribution?
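
To make my first question concrete, here is how I currently picture it with a hypothetical fair die (just a sketch, assuming numpy and scipy): the count of one particular value does seem to behave like a binomial, with "success" meaning the roll lands on that value - is that the right reading?

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Roll a fair six-sided die N times, in many repeated experiments
N, k = 600, 6
rolls = rng.integers(1, k + 1, size=(10_000, N))

# "Success" for the cell "face 1" = the roll lands on face 1;
# the observed frequency of face 1 is the number of such successes
count_face_1 = np.sum(rolls == 1, axis=1)

# These counts behave like Binomial(N, 1/6): mean N/6 = 100, variance N*(1/6)*(5/6)
print(count_face_1.mean(), count_face_1.var())
print(stats.binom.mean(N, 1 / 6), stats.binom.var(N, 1 / 6))
```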