Discrepancies between SPSS and SAS

#1
Does anyone know why results differ when running a Kaplan-Meier survival analysis in SPSS and in SAS? The values do not match exactly, and the larger the sample, the wider the difference becomes.

Does anyone know what the difference in algorithms might be?
 
#3
Unfortunately I do not have hands-on experience with this at the moment. I have been told that when you use the two KM functions in SPSS and SAS, the survival time/duration they report differs. I was told it may be the way they censor the data. I need more details on this to write a report on the discrepancies.
 

Dason

Ambassador to the humans
#4
So you're writing a report on the way two programs differ and you haven't used either for the task of interest?
 
#5
I'm doing research to help write the report. I have skimmed SPSS, but not SAS; I have only read a bit about SAS. I was given some information, which is why I am trying to see if anyone here has experienced similar problems and has found the reasons behind them.
 

noetsi

Fortran must die
#6
For SPSS (go to the part that discusses this test, it starts on page 153)
http://www.hks.harvard.edu/fs/pnorris/Classes/A SPSS Manuals/SPSS Advanced Statistics 17.0.pdf

This may help with the SAS code
http://www.nesug.org/proceedings/nesug08/sa/sa20.pdf

This might be useful as a general overview
http://www.lexjansen.com/wuss/2004/hands_on_workshops/i_how_kaplan_meier_and_cox_p.pdf

The answer to your question may involve reading the SPSS and SAS documentation and thinking carefully about what each says in the context of this specific method. I don't know it well enough to comment beyond posting these links.
 

Mean Joe

TS Contributor
#8
When doing a Kaplan-Meier, there are a couple of options for estimating the survival function past the last observed age (which may be a censored value). One option is to keep the function at its last value, another is to set the function to 0, and an intermediate option is to use an exponential curve that takes the function from its value at the last observed age down toward 0.
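To make the three tail options concrete, here is a minimal Python sketch. The function names, the `tail` argument and the toy data are all made up for illustration. I am not claiming that SPSS or SAS implements any of these rules. The exponential tail used here, S(t) = S_last^(t / t_last), is one common textbook choice and is an assumption on my part.

```python
import math

def kaplan_meier(times, events):
    """Product-limit estimate: list of (t, S(t)) at each distinct event time.

    events[i] is 1 for an observed event and 0 for a censored observation.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve

def survival_at(curve, t, last_time, tail="flat"):
    """Evaluate S(t); `tail` picks the rule used for t > last_time (hypothetical API)."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    if t <= last_time or s == 0.0:
        return s
    if tail == "flat":          # keep the function at its last value
        return s
    if tail == "zero":          # drop the function to 0
        return 0.0
    if tail == "exponential":   # assumed form: S_last ** (t / last_time)
        return math.exp(t * math.log(s) / last_time)
    raise ValueError(f"unknown tail rule: {tail}")

# Toy data: the last observed time (10) is censored, so S never reaches 0.
times = [2, 3, 3, 5, 8, 10]
events = [1, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
last_time = max(times)
for rule in ("flat", "zero", "exponential"):
    print(rule, round(survival_at(curve, 20, last_time, tail=rule), 4))
```

With this toy data the three rules give quite different values at t = 20, so anything computed from the tail (a mean survival time, for example) would differ too.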

You talk about having a bigger sample size. With large data sets the observations are usually grouped into intervals, and assumptions are made about where values fall within each interval (you calculate the survival function at the endpoints of the intervals). One assumption that can be made is that all of the uncensored observations in an interval occur at the same value within that interval, say \(c_j\). Rather than place all the probability at the \(c_j\) values, you usually evaluate the distribution function at the given endpoints and then smooth it by interpolating between successive values, and here there is a choice about what kind of interpolation to use. I suspect SPSS and SAS are using different methods for the interpolation. My source says that linear interpolation is usual, though it seems that would not give the smooth curve people often like to see nowadays.
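As a rough illustration of why the interpolation choice matters, the sketch below evaluates the same grouped survival values two ways: held constant between endpoints (a step function) and linearly interpolated. The endpoints and survival values are invented numbers, and the helper names are hypothetical. This is not a description of what either package actually does.

```python
from bisect import bisect_right

def step_value(endpoints, surv, t):
    """Survival held at the value of the last endpoint <= t."""
    i = bisect_right(endpoints, t) - 1
    return surv[max(i, 0)]

def linear_value(endpoints, surv, t):
    """Survival linearly interpolated between the two surrounding endpoints."""
    i = bisect_right(endpoints, t) - 1
    if i < 0:
        return surv[0]
    if i >= len(endpoints) - 1:
        return surv[-1]
    t0, t1 = endpoints[i], endpoints[i + 1]
    w = (t - t0) / (t1 - t0)
    return surv[i] + w * (surv[i + 1] - surv[i])

# Invented interval endpoints (e.g. months) and survival at each endpoint.
endpoints = [0, 12, 24, 36]
surv = [1.0, 0.8, 0.5, 0.3]

t = 18  # halfway through the second interval
print("step:  ", step_value(endpoints, surv, t))
print("linear:", round(linear_value(endpoints, surv, t), 4))
```

The two rules agree at the endpoints and disagree in between, so any statistic read off the curve inside an interval (a median, say) can come out differently depending on the rule.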