# Thread: How to run and interpret a quadratic variable in ordered logit

1. ## How to run and interpret a quadratic variable in ordered logit

Hello everyone,

I've run an ordered logit model (in Stata 12) with a quadratic age variable (8 age groups) on a dependent variable with 5 categories (self-reported health, poor to excellent). I've run the margins command as per the instructions Bukharin gave me a while back.

What I'm getting for outcome 1 (poor health) is this:

| _at | Margin | Delta-method Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| 1 | .0823576 | .0106364 | 7.74 | 0.000 | .0615106 | .1032046 |
| 2 | .0705586 | .0074272 | 9.50 | 0.000 | .0560015 | .0851157 |
| 3 | .0602709 | .0049174 | 12.26 | 0.000 | .0506329 | .0699089 |
| 4 | .0513465 | .0031103 | 16.51 | 0.000 | .0452505 | .0574425 |
| 5 | .0436402 | .0021147 | 20.64 | 0.000 | .0394955 | .0477849 |
| 6 | .0370132 | .0019898 | 18.60 | 0.000 | .0331131 | .0409132 |
| 7 | .0313351 | .0023151 | 13.54 | 0.000 | .0267976 | .0358726 |
| 8 | .0264858 | .0026728 | 9.91 | 0.000 | .0212472 | .0317244 |

I'm not sure how to interpret this: the youngest group (45-49 years of age) has the highest probability, whereas the oldest (80+) has the lowest. This seems backwards to me, but it's probably because I'm not clear on how to interpret it. Perhaps they are cumulative probabilities; if that's the case, I'm also not sure how to interpret them.

Not sure if this could be an issue but the age variable also includes group 0 (younger than 45) - but this was not included in the regression, nor the marginal analysis (asked for categories 1/8 [not 0]).

Anywho,
Thanks for your help
Sean
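
The exact commands behind the output above are not shown in the thread. As a minimal sketch only, assuming the variable names that appear later in the thread (`XSelfReportedHealth` coded 1 to 5, `XAge` coded 0 to 8) and leaving out the survey/bootstrap settings, a quadratic in age can be entered with factor-variable notation so that -margins- knows the squared term moves together with age:

```stata
* Sketch under the assumptions stated above; not the poster's actual commands
* c.XAge##c.XAge expands to XAge and XAge squared
ologit XSelfReportedHealth c.XAge##c.XAge if XAge > 0

* Predicted probability of outcome 1 (poor) at each age group
margins, predict(outcome(1)) at(XAge=(1(1)8))

* Predicted probability of outcome 5 (excellent) at each age group
margins, predict(outcome(5)) at(XAge=(1(1)8))
```

If the squared term is instead generated as a separate variable, `at(XAge=...)` changes only the linear term and leaves the squared variable at its observed values, so the margins would not trace out the fitted curve.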

2. ## Re: How to run and interpret a quadratic variable in ordered logit

I would start by cross-tabulating age group vs self rated health, and then running a simple model with age as the only predictor. Do the results agree (more or less)?
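
A rough sketch of those two checks, assuming the variable names and the `XAge > 0` restriction that appear later in the thread:

```stata
* Row percentages show how the health distribution shifts across age groups
tabulate XAge XSelfReportedHealth if XAge > 0, row

* Simple model with age (linear) as the only predictor
ologit XSelfReportedHealth XAge if XAge > 0
```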

3. ## Re: How to run and interpret a quadratic variable in ordered logit

> Originally posted by bukharin:
> I would start by cross-tabulating age group vs self rated health, and then running a simple model with age as the only predictor. Do the results agree (more or less)?

Here are the cross-tab results. I would have thought that the margins results would go the other way. The ones I posted earlier were for poor health status. I think my problem is that I'm not sure what the margins are telling me, so I don't know if I'm interpreting them correctly. Would you be able to do me a huge favour and walk me through an example interpretation of the margins output?

Age group of the respondent (rows) by Self-Reported Health (columns):

| Age group | ...poor? | ...fair? | ...good? | ...very g | ...excell | Total |
|---|---|---|---|---|---|---|
| 45 to 49 | 69 | 245 | 653 | 590 | 381 | 1,938 |
| 50 to 54 | 84 | 294 | 643 | 601 | 359 | 1,981 |
| 55 to 59 | 114 | 299 | 628 | 578 | 356 | 1,975 |
| 60 to 64 | 91 | 274 | 529 | 526 | 347 | 1,767 |
| 65 to 69 | 69 | 234 | 462 | 393 | 227 | 1,385 |
| 70 to 74 | 50 | 194 | 380 | 281 | 155 | 1,060 |
| 75 to 80 | 65 | 206 | 313 | 206 | 91 | 881 |
| 80 years and older | 82 | 261 | 374 | 238 | 90 | 1,045 |
| Total | 624 | 2,007 | 3,982 | 3,413 | 2,006 | 12,032 |

For another example, here are the margins for the highest category, excellent health status. (The probabilities began going the other way for Very good and Excellent health.)

| _at | Margin | Delta-method Std. Err. | z | P>\|z\| | [95% Conf. | Interval] |
|---|---|---|---|---|---|---|
| 1 | .135346 | .0077956 | 17.36 | 0.000 | .120067 | .150625 |
| 2 | .1561369 | .0052982 | 29.47 | 0.000 | .1457527 | .1665211 |
| 3 | .1792657 | .0043016 | 41.67 | 0.000 | .1708349 | .1876966 |
| 4 | .204763 | .0073369 | 27.91 | 0.000 | .1903831 | .219143 |
| 5 | .2326016 | .0126392 | 18.40 | 0.000 | .2078291 | .257374 |
| 6 | .2626901 | .0190511 | 13.79 | 0.000 | .2253506 | .3000297 |
| 7 | .2948708 | .0262087 | 11.25 | 0.000 | .2435026 | .346239 |
| 8 | .3289206 | .0338698 | 9.71 | 0.000 | .2625369 | .3953042 |

Here's a regression with just age (linear) on self reported health status:

Survey: Ordered logistic regression Number of obs = 12032
Population size = 12032.46
Replications = 500
Wald chi2(1) = 166.67
Prob > chi2 = 0.0000

-------------------------------------------------------------------------------------
| Observed Bstrap *
XSelfReportedHealth | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
XAge | .8870974 .0082319 -12.91 0.000 .8711089 .9033793
--------------------+----------------------------------------------------------------
/cut1 | -3.466362 .0655787 -52.86 0.000 -3.594894 -3.33783
/cut2 | -1.824878 .045673 -39.96 0.000 -1.914395 -1.735361
/cut3 | -.2673545 .0422428 -6.33 0.000 -.3501488 -.1845601
/cut4 | 1.107109 .044474 24.89 0.000 1.019942 1.194277
-------------------------------------------------------------------------------------

Thanks,
Sean

4. ## Re: How to run and interpret a quadratic variable in ordered logit

It looks to me like it's working really nicely with these data. Here is what I get from running a simple linear model with your data:
Code:
```
clear
set more off

* cell counts from the cross-tab; age is the lower bound of each age group
input age count1 count2 count3 count4 count5
45 69 245 653 590 381
50 84 294 643 601 359
55 114 299 628 578 356
60 91 274 529 526 347
65 69 234 462 393 227
70 50 194 380 281 155
75 65 206 313 206 91
80 82 261 374 238 90
end

* empirically observed proportions
egen total = rowtotal(count*)
foreach cat of numlist 1/5 {
    gen obs`cat' = count`cat' / total
}
tempfile observed
save `observed'

* now reshape for analysis
reshape long count, i(age) j(health)

lab define health 1 "poor" 2 "fair" 3 "good" 4 "very good" 5 "excellent"
lab val health health
tab age health [fw=count], row

* ordinal logit model
ologit health age [fw=count]
estimates store mymodel

* obtain predicted probabilities of each level of health by age
* (parmest is user-written: ssc install parmest)
tempfile predicted

foreach health of numlist 1/5 {
    * margins, post overwrites e(), so restore the model each time
    estimates restore mymodel
    margins, predict(outcome(`health')) at(age=(45(5)80)) post
    preserve
    parmest, norestore
    gen health = `health'
    gen age = 5 * _n + 40
    * the file does not exist on the first pass, hence capture
    capture append using `predicted'
    save `predicted', replace
    restore
}

* now plot predicted and observed probabilities against age
use `predicted', clear
keep health age estimate
reshape wide estimate, i(age) j(health)

* merge in observed probabilities
merge 1:1 age using `observed'

twoway scatter obs* age, mstyle(p1 p2 p3 p4 p5) || ///
    line estimate* age, sort lstyle(p1 p2 p3 p4 p5) ///
    title(Observed vs predicted probability of health status) ///
    legend(title(Health status) ///
    order(1 "poor" 2 "fair" 3 "good" 4 "very good" 5 "excellent")) ///
    xtitle(Age) ytitle(Probability)
```

5. ## Re: How to run and interpret a quadratic variable in ordered logit

Thanks Bukharin. I'm still not sure how to interpret the margins, though. In the first margins example (poor health status, outcome 1), the 45-49 age group (1) has a probability of .0823576 while the 80+ age group (8) has a probability of .0264858. How would I interpret this? Is it that the 45-49 age group has a higher probability of being in the next group (fair health)?

If that's correct, then for the highest outcome (5, excellent health status) the probability for the 45-49 age group (1) is .135346 while the probability for the 80+ age group (8) is .3289206. I'm not sure what that means, though.

Take care,
Sean
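
For reference, here is a sketch of what -margins- with `predict(outcome(k))` is estimating after -ologit-. The symbols are generic notation rather than anything taken from the output: $\kappa_1,\dots,\kappa_4$ are the cut points and $x\beta$ is the linear predictor (with a quadratic, $x\beta = \beta_1\,\text{age} + \beta_2\,\text{age}^2$ plus any other covariates).

$$
\Pr(y \le j \mid x) = \frac{1}{1 + \exp\{-(\kappa_j - x\beta)\}}, \qquad j = 1,\dots,4
$$

$$
\Pr(y = 1 \mid x) = \Pr(y \le 1 \mid x), \qquad
\Pr(y = j \mid x) = \Pr(y \le j \mid x) - \Pr(y \le j-1 \mid x), \qquad
\Pr(y = 5 \mid x) = 1 - \Pr(y \le 4 \mid x)
$$

So each row is the predicted probability of being in that one category at the given age value, not a cumulative probability and not the probability of moving to the next category. When $x\beta$ falls with age, $\Pr(y = 1)$ rises and $\Pr(y = 5)$ falls; when $x\beta$ rises with age, the reverse happens.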

6. ## Re: How to run and interpret a quadratic variable in ordered logit

Your interpretation of the -margins- output is correct but I worry that you have some problem with your model. You can see that the coefficient for XAge in your model is positive - so as people get older they tend to move up health categories. When I ran the simple linear model I got a negative coefficient for age which is more what you'd expect:
Code:
```
. ologit health age [fw=count], nolog

Ordered logistic regression                       Number of obs   =      12032
                                                  LR chi2(1)      =     199.98
                                                  Prob > chi2     =     0.0000
Log likelihood = -17638.075                       Pseudo R2       =     0.0056

------------------------------------------------------------------------------
      health |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0211823   .0015016   -14.11   0.000    -.0241254   -.0182391
-------------+----------------------------------------------------------------
       /cut1 |   -4.18985   .1005756                     -4.386974   -3.992725
       /cut2 |  -2.546833   .0934131                     -2.729919   -2.363747
       /cut3 |  -1.055638   .0907687                     -1.233542   -.8777349
       /cut4 |    .368005   .0909177                      .1898095    .5462004
------------------------------------------------------------------------------
```
I'm a little puzzled by your "Population size = 12032.46" - is the model a little more complex than you've described?

7. ## Re: How to run and interpret a quadratic variable in ordered logit

Hey Bukharin,

I think it's positive because it's an odds ratio and not the coefficient. The population size is off because of the bootstrapped estimates. Here's what I'm getting without bootstraps or odds ratios:

ologit XSelfReportedHealth XAge if XAge>0

Iteration 0: log likelihood = -17738.065
Iteration 1: log likelihood = -17638.188
Iteration 2: log likelihood = -17638.075
Iteration 3: log likelihood = -17638.075

Ordered logistic regression Number of obs = 12032
LR chi2(1) = 199.98
Prob > chi2 = 0.0000
Log likelihood = -17638.075 Pseudo R2 = 0.0056

-------------------------------------------------------------------------------------
XSelfReportedHealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
XAge | -.1059113 .0075081 -14.11 0.000 -.120627 -.0911956
--------------------+----------------------------------------------------------------
/cut1 | -3.34256 .0519516 -3.444383 -3.240736
/cut2 | -1.699542 .0378382 -1.773704 -1.625381
/cut3 | -.2083479 .0341976 -.275374 -.1413219
/cut4 | 1.215295 .0368326 1.143105 1.287486
-------------------------------------------------------------------------------------
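
As a quick check on the odds ratio versus coefficient point, the two scales can be converted by hand. This sketch only reuses numbers already posted in the thread:

```stata
* Coefficient from the unweighted model above; exp() turns it into an odds ratio
display exp(-.1059113)

* Odds ratio from the bootstrapped survey model; ln() recovers its coefficient
display ln(.8870974)

* Bukharin's per-year coefficient scaled to a 5-year age group
display 5*(-.0211823)
```

The first value is roughly 0.90 and the second roughly -0.12, so an odds ratio below 1 and a negative coefficient are telling the same story. The third reproduces the -.1059 coefficient for the 1-to-8 age-group coding.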

8. ## Re: How to run and interpret a quadratic variable in ordered logit

Sorry, you're right - I didn't see that you'd requested odds ratios.

In any case that simple model should definitely show people shifting to lower health categories as they get older - what do you get when running -margins- directly after the above model? Please post both your -margins- command and its output.
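
One way to see the direction the simple model implies, independent of -margins-, is to plug the coefficients posted above into the ordered logit formula by hand. This is only a sketch: the scalar names are made up for illustration, and the numbers are copied from the unweighted output with `XAge` coded 1 to 8.

```stata
* Values copied from the unweighted ologit output above
scalar b_age = -.1059113
scalar k1    = -3.34256
scalar k4    = 1.215295

* Pr(poor) = invlogit(cut1 - xb); Pr(excellent) = 1 - invlogit(cut4 - xb)
display "XAge=1: Pr(poor) = " %6.4f invlogit(k1 - b_age*1)
display "XAge=8: Pr(poor) = " %6.4f invlogit(k1 - b_age*8)
display "XAge=1: Pr(excellent) = " %6.4f (1 - invlogit(k4 - b_age*1))
display "XAge=8: Pr(excellent) = " %6.4f (1 - invlogit(k4 - b_age*8))
```

By hand calculation, Pr(poor) goes from roughly 0.038 to roughly 0.076 and Pr(excellent) from roughly 0.21 to roughly 0.11, which is the same direction as the cross-tab (69/1,938 versus 82/1,045 poor). Output from -margins- after this model should show the same pattern.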
