I am attempting to find a citation for, or otherwise document, the use of a correction factor for an \(m \times n\) contingency table with sparse data (in most of the tables, 20% to 80% of the cells are 0).
I have been working with a statistician who insists that adding 0.5 to each value is an appropriate and common method of dealing with sparsity. The situation is as follows:
The data are part of a preliminary analysis to decide whether proposed projects are worthwhile. The table records the occurrences of haplotypes in different regions: 130 haplotypes and 42 geographically defined regions. The aim is to demonstrate whether or not there is variation in haplotype distribution across the regions. Due to the high genetic variability, many haplotypes may be observed only once in the data.
The statistician proposed a permutation test using the \(\chi^2\) score as the test statistic. Prior to computing the \(\chi^2\) for the observed data and for each permutation, 0.5 is added to every cell of the table.
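For concreteness, here is a minimal sketch of the procedure as I understand it (not the statistician's actual code), written in Python/NumPy with made-up toy data standing in for the real 130 \(\times\) 42 table: permute the region labels, rebuild the haplotype-by-region table, add 0.5 to every cell, and recompute the Pearson \(\chi^2\) score each time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one haplotype label and one region label per sampled individual.
n_hap, n_reg = 10, 5                          # stand-ins for 130 and 42
haplotypes = rng.integers(0, n_hap, size=200)
regions = rng.integers(0, n_reg, size=200)

def crosstab(haps, regs):
    # Build the haplotype-by-region contingency table.
    table = np.zeros((n_hap, n_reg))
    np.add.at(table, (haps, regs), 1)
    return table

def chi2_score(table):
    # Pearson chi-square after adding 0.5 to every cell (the step in question).
    table = table + 0.5
    expected = (table.sum(axis=1, keepdims=True)
                * table.sum(axis=0, keepdims=True) / table.sum())
    return ((table - expected) ** 2 / expected).sum()

observed = chi2_score(crosstab(haplotypes, regions))

# Permutation null: shuffle which region each observation came from.
n_perm = 999
perm = np.array([chi2_score(crosstab(haplotypes, rng.permutation(regions)))
                 for _ in range(n_perm)])
p_value = (1 + np.sum(perm >= observed)) / (n_perm + 1)
print(f"observed chi-square = {observed:.2f}, permutation p = {p_value:.3f}")
```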
I asked for references showing that adding 0.5 is appropriate, and he insists it is common enough that I should be able to find one myself. So far, the closest I have come is the Yates continuity correction, which the statistician insists is not what he means.
That's a long way of saying: does anyone know of a literature citation that introduces or supports adding 0.5 to an \(m \times n\) contingency table?