Hi everybody,
I'm using multiple imputation with the fully conditional specification (fcs) method in SAS (proc mi, fcs statement) to replace missing values in 26 variables. My choice for fcs was motivated by the fact that the missing value pattern was arbitrary. All variables (except one) have missing values, with missing value percentages for the variables in the range 0.2%-36.4% (median 14.1%). This is the first time I'm conducting multiple imputation, and I assume those are a lot of values to impute. All variables are binary, so I used the logistic statement in fcs. The syntax goes as follows.
proc mi data = datain out = dataout nimpute = 5 seed = 123456;
class var1 var2 ... var25 var26;
fcs nbiter=100
logistic(var1/details)
logistic(var2/details)
...
logistic(var25/details)
logistic(var26/details);
var var1 var2 ... var25 var26;
run;
Variables in the var statement were sorted by % missing values (descending).
Although the procedure ran to completion, I got the warning message: "The maximum likelihood estimates for the FCS method logistic model for variable var1 in an iteration process may not exist. The resulting posterior predictive distribution of the parameters used in the imputation process is based on the maximum likelihood estimates in the last maximum likelihood iteration."
This message was shown for about 9 of the 26 variables. Those variables all have low sample proportion estimates in the non-imputed set, and after imputation their sample proportion estimates were considerably higher than in the non-imputed set. Although I do not understand what could have gone wrong in the statistics behind the fcs method, I presumed that variables with low sample proportions might not be appropriate to use as predictors in the imputation. For now, I constructed a specific model in the fcs statement for each variable for which I received a warning, each time leaving out all the other variables for which I received warnings.
Suppose the warning was shown for var1 through var8:
proc mi data = datain out = dataout nimpute = 5 seed = 123456;
class var1 var2 ...var25 var26;
fcs nbiter=100
logistic(var1 = var9 var10 var11 var12 ... var25 var26/details)
logistic(var2 = var9 var10 var11 var12 ... var25 var26/details)
...
logistic(var25/details)
logistic(var26/details);
var var1 var2 ... var25 var26;
run;
With these restricted models I no longer received any warning messages. Sample proportion estimates calculated from the imputed dataset were acceptable (although still somewhat inflated for the variables with low sample proportions).
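For reference, a comparison of sample proportions before and after imputation can be run roughly as follows. This is only a minimal sketch: it assumes the input and output data sets are named datain and dataout as in the first call above, and var1 and var2 merely stand in for the low-proportion variables. _Imputation_ is the index variable that proc mi adds to the output data set.

```sas
/* Observed proportions in the original data (missing values excluded by default) */
proc freq data = datain;
tables var1 var2;
run;

/* Proportions within each of the completed data sets */
proc freq data = dataout;
by _Imputation_;
tables var1 var2;
run;
```

The percentages from the first step can then be set against those from each imputation in the second step.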
My questions:
- Is this good practice, or am I trying to impute too many missing values?
- What is the exact problem causing the warning messages?
- Did I address the problem in an appropriate way, or should I do something else?
Thanks in advance for any reply or comment,
Kind regards,
Philippe