chi-square test for nested models

#1
Consider 2 models A and B. A (5 free parameters) is nested within B (6 free parameters). Imagine that we do an experiment and acquire data from N = 21 participants. We fit models A and B to each individual dataset by minimizing Pearson's chi-square statistic. Chi-square values are generally lower for model B than model A, indicating a better goodness-of-fit for model B. I would like to test whether the improvement in goodness-of-fit for Model B is significant by doing a chi-square test for nested models.
I am not sure about how to perform this test. I think I have to compute the difference chi-sq (modelA) - chi-sq (modelB). Remember that the sample size N = 21, so I get 21 chi-square difference values. These values should be generally positive, because the goodness-of-fit is generally worse for model A. The difference should follow a chi-square distribution with a number of degrees of freedom df(diff) = df(model B) - df(model A) = 6-5 = 1. What can I do next?
 
#2
Can you be more specific about the type of models you're fitting? Typically (though not always) models are fit by minimizing the negative log-likelihood, and for discrete outcomes, Pearson's chi-square statistic is calculated afterwards from the observed vs. expected values as a measure of goodness-of-fit. This is not to be confused with the deviance, another goodness-of-fit measure, which is -2(log-likelihood of fitted model - log-likelihood of saturated model).
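To make the distinction concrete, here is a minimal sketch for categorical count data. The counts and the helper names (pearson_chi2, deviance_g2) are made up for illustration, and it assumes a multinomial likelihood, in which case the deviance against the saturated model reduces to 2 * sum(O * ln(O / E)).

```python
import math

def pearson_chi2(observed, expected):
    """Pearson's chi-square: sum over cells of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def deviance_g2(observed, expected):
    """Multinomial deviance against the saturated model: 2 * sum(O * ln(O / E)).

    Cells with an observed count of 0 contribute nothing to the sum.
    """
    return 2.0 * sum(o * math.log(o / e)
                     for o, e in zip(observed, expected) if o > 0)

# Hypothetical data: 100 observations in 4 categories, and the counts
# expected under some fitted model (both sum to 100).
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]

print("Pearson chi-square:", pearson_chi2(observed, expected))
print("Deviance (G^2):    ", deviance_g2(observed, expected))
```

The two statistics are usually close but not identical, which is why it matters which one is being minimized and which one is being differenced.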

Your statement "chi-sq (modelA) - chi-sq (modelB)" would be more accurate as "deviance (modelA) - deviance (modelB)", since the deviance of individual models is not typically chi-square distributed. However, provided the sample size is large, the difference in deviance between nested models will be approximately chi-square distributed under the null hypothesis that the extra parameter in model B is not needed, i.e. that model B fits no better than model A. This resulting chi-square statistic will have df(diff) = df(model B) - df(model A), where df here counts free parameters, as you correctly specified above.
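As a rough sketch of the test for a single dataset, assuming the deviance difference has exactly 1 degree of freedom as in your case: the deviance values below are hypothetical, and the function names are just for illustration. With 1 degree of freedom the chi-square upper-tail probability can be written with the complementary error function, so only the standard library is needed.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 df.

    With 1 df, X is the square of a standard normal, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))

def nested_deviance_test(deviance_a, deviance_b):
    """Deviance difference (A minus B) and its p-value, assuming df(diff) = 1."""
    diff = deviance_a - deviance_b
    return diff, chi2_sf_df1(diff)

# Hypothetical deviances for one participant: model A (5 parameters)
# and model B (6 parameters).
diff, p = nested_deviance_test(deviance_a=31.7, deviance_b=26.2)
print(f"deviance difference = {diff:.2f}, p = {p:.4f}")
```

For a difference with more than 1 degree of freedom, a general chi-square survival function (for example scipy.stats.chi2.sf) would be needed instead of the erfc shortcut.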

If you compute the p-value for your chi-square statistic, and conclude that it is sufficiently small to reject the null hypothesis, then that would be an argument for preferring model B, which describes the data better.