I'm currently analyzing items from a bank for K-12 education. I have a separate data set for each grade/subject combination (for example, 1st grade English is a set, 2nd grade English is a set, etc). Each data set contains 400-7000(!) items.

I would love to be able to run various DIF analyses, but missing data is a huge problem. Each student in the data set has answered only a fraction of the items. For example, in the 1st grade English data set, I have an N of 4000, but no one student has answered more than 50 percent of the 400 items, and the overall percentage of missing answers is above 90 percent. This is actually one of the more complete data sets I have. I expect the set with 7000 items to have a percentage missing in the 95 or above range.

My question: there's absolutely no way to perform Mantel-Haenszel, IRT-based, or any other form of DIF analysis on data with so many missing responses, right? Even if my program of choice (Stata) returned results (which it doesn't), the information would be unreliable, invalid, and otherwise useless?
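For reference, the Mantel-Haenszel common odds ratio for a single item is computed only from the students who actually answered that item, stratified by some matching score. Below is a minimal sketch in Python, assuming hypothetical records of `(group, stratum, correct)` for one item; the function name and record layout are illustrative and not taken from any package. How to build a defensible matching score from responses this sparse is exactly the open question, and this code does not address it.

```python
from collections import defaultdict

def mh_common_odds_ratio(records):
    """Mantel-Haenszel common odds ratio for ONE item.

    records: iterable of (group, stratum, correct), where group is
    'ref' or 'focal', stratum is the matching-score level (hypothetical),
    and correct is 0 or 1. Returns None if the ratio is undefined.
    """
    # Per stratum: [A, B, C, D] = ref correct, ref incorrect,
    #                             focal correct, focal incorrect
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, stratum, correct in records:
        idx = (0 if group == "ref" else 2) + (0 if correct else 1)
        tables[stratum][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d          # students in this stratum who saw the item
        num += a * d / n
        den += b * c / n
    return num / den if den > 0 else None
```

A value near 1 suggests similar odds of a correct answer for matched students in both groups; strata with very few respondents contribute little and make the estimate unstable.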

Also, assuming the above is the case, any suggestions for things I might try in order to measure differential functioning across groups in lieu of conventional DIF, even if they're not particularly sophisticated?

Thanks in advance,

Bryan

I want to check whether two samples could have been drawn from the same distribution. Can I do it with a chi-square test? What assumptions does it require?

If I can't use a chi-square test, what can I do instead?

notes about the data sets:

- discrete

- mostly increasing (in particular - not normally distributed)

- about 50-70 participants in each data set.
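To make the question concrete, here is a minimal sketch in Python of a chi-square test of homogeneity on two discrete samples, with the p-value obtained by permutation rather than from the chi-square table. The permutation version is an assumption on my part about what might suit samples of this size, since it does not rely on the usual rule of thumb about expected cell counts; it does assume the two samples are independent. The function names are hypothetical.

```python
import random
from collections import Counter

def chi_square_stat(x, y):
    """Pearson chi-square statistic for the 2 x K table of value counts."""
    cx, cy = Counter(x), Counter(y)
    n_x, n_y = len(x), len(y)
    n = n_x + n_y
    stat = 0.0
    for value in set(cx) | set(cy):
        col = cx[value] + cy[value]            # column total for this value
        for observed, row in ((cx[value], n_x), (cy[value], n_y)):
            expected = row * col / n           # expected count under homogeneity
            stat += (observed - expected) ** 2 / expected
    return stat

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Share of random relabelings whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = chi_square_stat(x, y)
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                    # reassign values to the two groups
        stat = chi_square_stat(pooled[:len(x)], pooled[len(x):])
        if stat >= observed - 1e-12:           # small tolerance for float rounding
            hits += 1
    return (hits + 1) / (n_perm + 1)           # add-one so p is never exactly 0
```

Calling `permutation_p_value(sample_a, sample_b)` on the two lists of observed values returns a p-value; a small value is evidence against the two samples sharing one distribution.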

thanks!! :)