# Thread: Logistic Regression - predicted probabilities opposite to actual percentages

I've asked this question elsewhere but haven't yet got a response that makes much sense to me...

I have conducted a logistic regression in order to identify whether student status (student/non-student), time period (time 1, 2 or 3), or condition (condition 1 or condition 2) predict a binary outcome (whether or not lunch was purchased).

I have plotted the predicted probabilities that are saved as a result of the logistic regression to visualise the data. These show a decrease in probability of lunch being bought between time 1 and time 2 for one of the conditions. However, when looking at the percentages of people who bought their lunch (rather than the predicted probabilities), there is an increase between time 1 and time 2.

Is it possible for there to be an increase in terms of percentages but a decrease in terms of probabilities, or does this indicate that something has gone wrong with the model?

I've been told that it could be Simpson's paradox, that it could be that the model is just a very bad fit, or that it means something has gone wrong along the line somewhere, but I don't know what I need to do to test any of these.

Is anyone able to help?

Re: Logistic Regression - predicted probabilities opposite to actual percentages

Was time entered into the model as three dummy variables, or as two dummy variables with a reference group? How exactly was it entered? Also, you plotted the predicted probabilities exported from the model, i.e. the scored data. I would imagine that looking at the actual coefficients would be more telling, since the predicted probabilities may be higher or lower due to other factors. Did you test for interactions?

Can you provide a description of the model, in the form y = b0 + b1x1 + ... + bkxk?
Can you provide the coefficients?

Simpson's paradox would be a change in the direction of an effect when the data are stratified.
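
As a rough illustration of that reversal, here is a minimal Python sketch. The counts are entirely hypothetical (they are not the poster's data) and the strata names are chosen only for the example. Within each stratum the purchase rate rises from time 1 to time 2, yet the pooled rate falls because the mix of strata changes between the two time points.

```python
# Hypothetical counts, NOT the poster's data: (number who bought, group size)
counts = {
    ("student", 1): (80, 100),
    ("student", 2): (18, 20),
    ("non-student", 1): (4, 20),
    ("non-student", 2): (30, 100),
}

# Purchase rate within each stratum at each time point
within = {key: bought / total for key, (bought, total) in counts.items()}

# Pooled purchase rate at each time point, ignoring the stratum
pooled = {}
for time in (1, 2):
    bought = sum(b for (_, t), (b, _) in counts.items() if t == time)
    total = sum(n for (_, t), (_, n) in counts.items() if t == time)
    pooled[time] = bought / total

for (group, time), rate in sorted(within.items()):
    print(f"{group:12s} time {time}: {rate:.2f}")
for time, rate in pooled.items():
    print(f"{'pooled':12s} time {time}: {rate:.2f}")
```

Comparing the stratified rates with the pooled rates in the same way on the real data would be one way to check whether this kind of reversal is present.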

Can you also provide the percentage values for the 3 groups?

I think we just need a little more info to help. Also can you post a link to your other posts elsewhere, so we can see what feedback you got and info you have provided. Thanks!

Re: Logistic Regression - predicted probabilities opposite to actual percentages

Hi, thanks so much for your response.

Originally Posted by hlsmith
Was time entered into the model as dummy variables (3) or 2 dummy variables with a reference group? How is it entered into the model.
I had one variable coded 0 (reference category), 1 and 2 - will this work, or should I have used dummy variables?
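
For reference, a minimal Python sketch of what "two dummy variables with a reference group" means for a variable coded 0, 1 and 2. The helper `dummy_code` is hypothetical and written only for this illustration; it is not something SPSS exposes. If the single 0/1/2 variable is declared categorical, the software builds indicator columns like these internally. If it is instead entered as a plain numeric covariate, the model assumes a linear trend in the log-odds across the three time periods, which is a different model.

```python
def dummy_code(level, levels, reference):
    """Return one 0/1 indicator per non-reference level (hypothetical helper)."""
    return [1 if level == other else 0 for other in levels if other != reference]

levels = [0, 1, 2]  # the poster's coding of time, with 0 as the reference
coded = {level: dummy_code(level, levels, reference=0) for level in levels}

for level, indicators in coded.items():
    print(f"time code {level} -> indicators {indicators}")
```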

Did you test for interactions?
Yes, that's mostly what's of interest for this study, the interaction between time and condition (and potentially, time, condition and student status). The model includes the main effects of time, condition and student status, and then the interaction terms of time x condition, and time x condition x student status.

Can you provide the coefficients.
Here are my coefficients - I've marked which variables are significant (this is a simplified model that doesn't include student status, but the pattern is the same):
Time (1) -0.361**
Time (2) -0.172
Condition(1) 0.734**
Condition(1) by Time(1) 0.111
Condition(1) by Time(2) 0.452*
Constant 0.248**
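
To make the link between these coefficients and predicted probabilities concrete, here is a minimal Python sketch that converts the values above into a predicted probability for each time-by-condition cell. It assumes indicator (dummy) coding in which the constant corresponds to the reference time and reference condition, Time(1) and Time(2) are the two non-reference time levels, and Condition(1) is the non-reference condition. Which actual time period or condition each label refers to depends on the reference category used in the software, so that mapping is an assumption here and worth checking against the output.

```python
import math

def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Coefficients copied from the post above (simplified model, no student status)
coef = {
    "const": 0.248,
    "time1": -0.361,
    "time2": -0.172,
    "cond1": 0.734,
    "cond1_time1": 0.111,
    "cond1_time2": 0.452,
}

def predicted_probability(time_level, cond_level):
    """time_level: 0 = reference, 1 = Time(1), 2 = Time(2).
    cond_level: 0 = reference, 1 = Condition(1).
    The level-to-label mapping is an assumption, see the text above."""
    log_odds = coef["const"]
    if time_level in (1, 2):
        log_odds += coef[f"time{time_level}"]
    if cond_level == 1:
        log_odds += coef["cond1"]
        if time_level in (1, 2):
            log_odds += coef[f"cond1_time{time_level}"]
    return inv_logit(log_odds)

table = {(t, c): predicted_probability(t, c) for c in (0, 1) for t in (0, 1, 2)}
for (t, c), p in sorted(table.items(), key=lambda item: (item[0][1], item[0][0])):
    print(f"condition level {c}, time level {t}: {p:.3f}")
```

Because this simplified model contains both main effects and the full time-by-condition interaction, it has one parameter per cell, so under these assumptions the predicted probability in each cell should match the observed proportion in that cell. A mismatch would point towards a coding or labelling issue rather than poor fit.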

Can you also provide the percentage values for the 3 groups.
I've attached a graph showing the percentage of people purchasing their lunch - the two clusters are the two conditions, with the different colours corresponding to time 1, time 2 and time 3. I've also attached the predicted probabilities for lunch being purchased so you can see what I mean about them being so different. (I have also checked that the outcome measure is coded correctly, and hasn't been flipped in the analysis.)

I think we just need a little more info to help. Also can you post a link to your other posts elsewhere, so we can see what feedback you got and info you have provided. Thanks!
No problem - here's the post I made on stackexchange: https://stats.stackexchange.com/ques...-with-percenta

Some of the responses sounded very useful but when trying to apply their advice I found it really difficult to follow.

I hope that helps.

Re: Logistic Regression - predicted probabilities opposite to actual percentages

Can we see your code syntax? And which group was your reference group: was it 3?

Re: Logistic Regression - predicted probabilities opposite to actual percentages

I've just been running it through SPSS so have no syntax I'm afraid - the reference groups are time 1, condition 1 and non-students, all coded as 0.

Re: Logistic Regression - predicted probabilities opposite to actual percentages

Do your graphs represent this? I would imagine that if blue is group one, then Time(1) in the model would actually be time 2, and that coefficient should be positive.

Can you better label your graphs?

Re: Logistic Regression - predicted probabilities opposite to actual percentages

The graphs should represent it - the first graph (the one in which time 2 is in orange) is the percentages (so the percentages increase between time 1 and time 2, and then decrease for time 3), and the second graph is the predicted probabilities, which decrease between time 1 and time 2.

I hope that makes sense!

