Why is there a difference in the confidence intervals calculated by SPSS compared to SAS and Epi-Info 7?

I am writing sample code for colleagues in developing countries who will use it to analyze a multi-stage, cluster-sample survey that measures the prevalence of infection with soil-transmitted intestinal worms. So far, I have used three of the software packages they might use: IBM SPSS Statistics (I have version 24, release 24.0.0.0, 64-bit edition), SAS (I’m using version 7.1.5.2) and Epi-Info 7 (a software package from CDC; I have version 7.1.5.2).

The survey produces equal-probability samples from each of the three target populations. The key parameter being estimated is the percentage of each population that is infected. Estimated population totals are not needed.

In running the code I’ve written, I’ve noticed a small but consistent difference in the 95% confidence interval between SAS and Epi-Info, on the one hand, and IBM SPSS on the other.

For simplicity, and because the finite population correction (FPC) is unlikely to have a meaningful impact on precision, I’ve done the analyses in SAS and Epi-Info supplying only the stratum and cluster variables as sample design information. I’m assuming the software uses the “ultimate cluster” approach to variance calculation. In SAS and Epi-Info it makes no difference whether I include a weight statement: results are the same (disregarding population totals) if I leave it out, set it equal to 1 for all records, or set it equal to 1,000 for all records.
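A minimal Python sketch of the ultimate-cluster (with-replacement) Taylor-linearization variance for a weighted proportion, with no FPC, shows why constant weights make no difference: a common weight cancels in both the ratio estimate and its linearized variance. The function name `cluster_proportion`, its record layout and the toy data are hypothetical, this is not the actual SAS or Epi-Info code, and strata with a single PSU are simply skipped in this sketch.

```python
import math
from collections import defaultdict


def cluster_proportion(records):
    """records: iterable of (stratum, psu, weight, y) with y in {0, 1}.

    Returns (p_hat, se) using Taylor linearization and the ultimate-cluster
    (with-replacement) variance approximation, without an FPC.
    """
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    p_hat = sum(w * y for _, _, w, y in records) / total_w

    # Linearized value for each PSU: sum of w * (y - p_hat) / W over its records.
    psu_totals = defaultdict(float)
    for stratum, psu, w, y in records:
        psu_totals[(stratum, psu)] += w * (y - p_hat) / total_w

    # Group the PSU totals by stratum.
    by_stratum = defaultdict(list)
    for (stratum, _), z in psu_totals.items():
        by_stratum[stratum].append(z)

    # Between-PSU variance within each stratum, summed over strata.
    var = 0.0
    for zs in by_stratum.values():
        n_h = len(zs)
        if n_h < 2:
            continue  # single-PSU stratum: skipped in this sketch
        mean = sum(zs) / n_h
        var += n_h / (n_h - 1) * sum((z - mean) ** 2 for z in zs)
    return p_hat, math.sqrt(var)


# Hypothetical toy data: (stratum, cluster, infected?) for 12 people.
rows = [("A", 1, 0), ("A", 1, 1), ("A", 1, 0), ("A", 2, 0), ("A", 2, 0),
        ("A", 3, 1), ("A", 3, 0), ("B", 4, 0), ("B", 4, 0), ("B", 5, 1),
        ("B", 5, 1), ("B", 6, 0)]

p1, se1 = cluster_proportion((s, c, 1.0, y) for s, c, y in rows)
p2, se2 = cluster_proportion((s, c, 1000.0, y) for s, c, y in rows)
print(p1, se1)
print(p2, se2)  # same estimate and SE as with weight 1
```

Scaling every weight by the same constant multiplies the numerator and denominator of each linearized value equally, so the proportion and its SE are unchanged; only estimated population totals would change.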

To conduct the same analysis with SPSS, I pressed “FINISH” after providing the analysis plan with the required design variables (stratum, cluster and weight). Unlike SAS and Epi-Info, SPSS requires a sample weight variable, and if weight is set to 1 for all records, no design effects are calculated. Therefore, I created a weight variable with the value 1000 for all records.

For each of the three datasets, the SEs and design effects with SPSS were the same as with Epi-Info 7 and SAS. However, the 95% confidence intervals with SPSS were shifted slightly upward for low proportions of dichotomous variables and slightly downward for high ones. Here’s a typical example, for the yes/no infection variable (point estimate 3.9%, for “yes”): 95% CI with SAS and Epi-Info 1.6%-6.2%; with SPSS 2.2%-6.9%. For “no,” point estimate 96.1%, 95% CI with SAS and Epi-Info 94.3%-98.0%, with SPSS 93.1%-97.8%.
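The SAS and Epi-Info intervals are symmetric around the estimate (3.9% ± 2.3%), while the SPSS ones are not, even though the SEs agree. One hypothesis consistent with these numbers, an assumption not confirmed in this thread, is that SPSS builds the interval on the logit scale and back-transforms it, whereas SAS and Epi-Info use a plain Wald interval. The sketch below compares the two, using the SE back-calculated from the SAS/Epi-Info interval and a normal critical value (the packages may instead use a t value with design degrees of freedom). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def wald_ci(p, se, level=0.95):
    """Symmetric interval: p +/- z * se."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return p - z * se, p + z * se


def expit(x):
    """Inverse of the logit function."""
    return 1 / (1 + math.exp(-x))


def logit_ci(p, se, level=0.95):
    """Interval on the logit scale, back-transformed to a proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    logit = math.log(p / (1 - p))
    se_logit = se / (p * (1 - p))  # delta-method SE of logit(p)
    return expit(logit - z * se_logit), expit(logit + z * se_logit)


p_yes = 0.039
se = (0.062 - 0.016) / (2 * 1.96)  # SE implied by the SAS/Epi-Info interval

print(wald_ci(p_yes, se))   # approximately (0.016, 0.062)
print(logit_ci(p_yes, se))  # approximately (0.0215, 0.0697)
```

The logit interval for 3.9% comes out near 2.2%-7.0%, close to what SPSS reported, and the interval for "no" is exactly one minus the interval for "yes", which matches the SPSS pattern (93.1%-97.8%).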

I’m wondering whether changing the analysis syntax in SPSS would make the CIs the same as with SAS and Epi-Info 7, or whether SPSS calculates them in a slightly different way. Thank you.
 

noetsi

Companies use different code and it can lead to very different results. Statisticians disagree on some issues, so the code will....
 

Miner

Also, some software offers multiple options for a given test and may offer a different default test.

For example, Minitab offers three options for the normality test: Anderson-Darling, Ryan-Joiner and Kolmogorov-Smirnov, with the A-D test as the default. If another software package set K-S as the default, it could potentially yield different results.