PROC SURVEYLOGISTIC

Hello everyone. I am trying to analyze some survey data. I am working with PROC SURVEYLOGISTIC and am also using survey weights. The goodness-of-fit results are incredibly high and the R-squared equals 1. Can anyone suggest a transformation or something else that could bring reasonable results? Thank you


Less is more. Stay pure. Stay poor.
R-sq = 1, hmmm. Do you have a variable in the model that is a piece of the outcome? Or is your model overparameterized? An R-sq of 1 generally means you are fully explaining the outcome; likewise, a high GoF means your expected and observed values for the outcome are the same or very close.

Neither of these being high is a bad thing in itself; however, the R-sq = 1 may signal an issue.
My colleague has the same problem. When sample weights are used in the model, the R-squared also equals 1, in a different field of study (same proc). It seems to be the weights that cause these problems...

Mean Joe

TS Contributor
Yeah, we would need a lot more information to give good advice.

...But can always give generic advice.

Maybe you have too many factors in your model. Remove a couple, then your R-Square should go down.
As mentioned, the problem is the weights... If I don't use them in the model, the model works quite fine. The problem with the high AIC and SC (over a million) and the R-squared appears when I add the weights... I reduced the explanatory variables to about 10 and nothing changed. I think I also tried it as an exercise with 2-3 variables and still got the same result...
You asked a question. Several people asked you for more information. You didn't provide it.

Why do you expect this to be a successful strategy?
I have problems with PROC SURVEYLOGISTIC (binary target variable); the model has about 10 explanatory variables. I am working with survey data that also come with survey WEIGHTS. Without weights, the model's R-squared is about 0.3; with weights, it is 1. AIC is about 1400 without weights and about 1 million with weights. My problem is with applying the weights. I am not sure whether any more information would help... I have heard that weights can cause such problems and that the R-squared in logistic regression can be calculated in different ways...
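The pattern described here (AIC jumping from about a thousand to about a million, R-squared going to 1) is what one would expect if the weights expand the sample to the population while the R-squared still divides by the number of sampled rows. The following is only a toy sketch in Python, not what SURVEYLOGISTIC computes: the 2x2 data, the weight of 5000 per row, and the use of the generalized R-squared 1 - exp(-LR/n) with n equal to the raw row count are all assumptions made for illustration.

```python
import math

def max_loglik(rows, use_x):
    """Maximized weighted Bernoulli log-likelihood.

    rows: list of (x, y, w) with binary x, binary y and weight w.
    use_x=True fits one event probability per x value, which matches a
    logistic model with an intercept and one binary predictor;
    use_x=False is the intercept-only model.
    """
    totals = {}
    for x, y, w in rows:
        key = x if use_x else 0
        events, weight = totals.get(key, (0.0, 0.0))
        totals[key] = (events + w * y, weight + w)
    loglik = 0.0
    for x, y, w in rows:
        events, weight = totals[x if use_x else 0]
        p = events / weight  # weighted proportion is the MLE
        loglik += w * math.log(p if y == 1 else 1.0 - p)
    return loglik

def fit_stats(rows):
    """Return (AIC, generalized R-squared) using the raw row count as n."""
    n = len(rows)  # assumption: n is the number of sampled rows
    ll0 = max_loglik(rows, use_x=False)
    ll1 = max_loglik(rows, use_x=True)
    aic = -2.0 * ll1 + 2 * 2  # two parameters: intercept and slope
    r2 = 1.0 - math.exp(-2.0 * (ll1 - ll0) / n)
    return aic, r2

def make_rows(weight):
    """Hypothetical sample of 200 rows given as (x, y, count) cells."""
    cells = [(0, 1, 30), (0, 0, 70), (1, 1, 60), (1, 0, 40)]
    return [(x, y, weight) for x, y, count in cells for _ in range(count)]

unweighted = make_rows(1.0)
expanded = make_rows(5000.0)  # each row "represents" 5000 people

# Rescale the expansion weights so that they sum to the sample size.
n = len(expanded)
total_w = sum(w for _, _, w in expanded)
normalized = [(x, y, w * n / total_w) for x, y, w in expanded]

aic_unw, r2_unw = fit_stats(unweighted)
aic_exp, r2_exp = fit_stats(expanded)
aic_norm, r2_norm = fit_stats(normalized)

print("unweighted:", aic_unw, r2_unw)
print("expansion weights:", aic_exp, r2_exp)
print("normalized weights:", aic_norm, r2_norm)
```

In this sketch, multiplying every weight by a constant multiplies the log-likelihood by the same constant, so the AIC and the likelihood-ratio statistic blow up while the fitted probabilities stay the same. Rescaling the weights to sum to the sample size brings the fit statistics back to the unweighted scale. Whether that rescaling is appropriate for a given survey design is a separate question.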

Mean Joe

TS Contributor
Ok, be forewarned that I'm no expert on SURVEYLOGISTIC. For example, I don't even know how to get AIC output.

But it sounds like your WEIGHTs may not be calculated correctly.

Or maybe you're missing some statement. Are you using a STRATUM statement? What about a TOTAL= option in the PROC statement?

Are the WEIGHTs within a STRATUM all the same?
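The question about weights within a stratum can be checked before fitting anything. Here is a minimal sketch in Python, assuming the data can be exported as (stratum, weight) pairs; the stratum labels and weights below are made up for illustration and do not come from the poster's data.

```python
from collections import defaultdict

def summarize_weights(records, tol=1e-9):
    """Summarize survey weights per stratum.

    records: iterable of (stratum, weight) pairs.
    Returns {stratum: {"n", "min", "max", "sum", "constant"}} where
    "constant" is True when all weights in the stratum are equal
    (within tol).
    """
    by_stratum = defaultdict(list)
    for stratum, weight in records:
        by_stratum[stratum].append(weight)
    summary = {}
    for stratum, weights in by_stratum.items():
        summary[stratum] = {
            "n": len(weights),
            "min": min(weights),
            "max": max(weights),
            "sum": sum(weights),
            "constant": max(weights) - min(weights) <= tol,
        }
    return summary

# Hypothetical records: stratum label and expansion weight.
records = [
    ("north", 4000.0), ("north", 4000.0), ("north", 4000.0),
    ("south", 6500.0), ("south", 7200.0),
]

summary = summarize_weights(records)
for stratum, stats in sorted(summary.items()):
    print(stratum, stats)

# If the weights are expansion weights, their grand total should be
# close to the population size the survey is meant to represent.
grand_total = sum(stats["sum"] for stats in summary.values())
print("sum of all weights:", grand_total)
```

If the grand total is close to the country's population, the weights are probably expansion weights, which would also explain fit statistics in the millions.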
That's the point... I don't know whether it is calculated incorrectly or correctly. The weights are supposed to scale the sample up to the whole population of the country, but I could not find any information on how they were calculated or what they mean. I was just told that I must use them in the analysis :/ As for the R-squared, there are supposedly different types that can be calculated. But I don't know whether this is a bad calculation by the software or just an inappropriate R-squared statistic...
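On the point about different types of R-squared: three common pseudo R-squared measures for logistic regression can be computed from the intercept-only and fitted log-likelihoods. The sketch below is in Python with hypothetical log-likelihood values and a hypothetical sample size of 200; it is not meant to reproduce any SAS output.

```python
import math

def mcfadden_r2(ll0, ll1):
    """McFadden: 1 - logL(model) / logL(intercept only)."""
    return 1.0 - ll1 / ll0

def cox_snell_r2(ll0, ll1, n):
    """Cox-Snell (generalized): 1 - exp(-2 * (ll1 - ll0) / n)."""
    return 1.0 - math.exp(-2.0 * (ll1 - ll0) / n)

def nagelkerke_r2(ll0, ll1, n):
    """Nagelkerke (max-rescaled): Cox-Snell divided by its maximum."""
    max_r2 = 1.0 - math.exp(2.0 * ll0 / n)
    return cox_snell_r2(ll0, ll1, n) / max_r2

# Hypothetical values for a sample of 200 observations.
ll0, ll1, n = -137.6, -128.4, 200

mcf = mcfadden_r2(ll0, ll1)
cs = cox_snell_r2(ll0, ll1, n)
nk = nagelkerke_r2(ll0, ll1, n)
print("McFadden:", mcf)
print("Cox-Snell:", cs)
print("Nagelkerke:", nk)

# Multiplying both log-likelihoods by a constant, as constant expansion
# weights would, leaves McFadden unchanged but pushes Cox-Snell toward 1
# when n stays at the sample size.
scale = 5000.0
mcf_scaled = mcfadden_r2(scale * ll0, scale * ll1)
cs_scaled = cox_snell_r2(scale * ll0, scale * ll1, n)
print("McFadden, scaled:", mcf_scaled)
print("Cox-Snell, scaled:", cs_scaled)
```

The measures that divide by n are sensitive to the scale of the weights, while a ratio of log-likelihoods is not, so a value of 1 may say more about the weight scale than about the model.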