I have two data sets, one with 1000 observations and the other with 50. Each observation is one of A, B, C or D. Using the first data set, I calculate the probabilities P(A) = (freq of A)/1000, and similarly for P(B), P(C) and P(D).

I now want to use these probabilities to model the second data set, i.e. the expected number of observations for A is P(A)*50. Then I calculate the Pearson chi-squared statistic.
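To make the setup concrete, here is a minimal sketch of the calculation in Python. The counts are made up purely for illustration (they are not my actual data), and the variable names are just placeholders:

```python
# Hypothetical counts, for illustration only (not the real data).
first_counts = {"A": 400, "B": 300, "C": 200, "D": 100}  # 1000 observations
second_counts = {"A": 18, "B": 17, "C": 9, "D": 6}       # 50 observations

n_first = sum(first_counts.values())
n_second = sum(second_counts.values())

# Probabilities estimated from the first data set: P(X) = freq(X) / n_first.
probs = {k: count / n_first for k, count in first_counts.items()}

# Expected counts in the second data set: E(X) = P(X) * n_second.
expected = {k: p * n_second for k, p in probs.items()}

# Pearson chi-squared statistic: sum over classes of (O - E)^2 / E.
chi_sq = sum(
    (second_counts[k] - expected[k]) ** 2 / expected[k] for k in expected
)

print("expected counts:", expected)
print("chi-squared statistic:", chi_sq)
```

This only computes the statistic; which degrees of freedom to compare it against is exactly what I'm unsure about.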

My problem now is that I don't know how many degrees of freedom I should use. I have 4 classes and I have estimated 4 parameters (but these weren't estimated from the data I'm testing).

Any suggestions on how many degrees of freedom I should use? Or other appropriate tests to measure the fit?

