Hello everyone,
I've run an ordered logit model (in Stata 12) with a quadratic age variable (8 age groups) on a dependent variable with 5 categories (self-reported health, poor to excellent). I then ran the margins command following the instructions Bukharin gave me a while back:
http://www.talkstats.com/showthread....e-Binomial-Reg.
What I'm getting for outcome 1 (poor health) is this:
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | .0823576 .0106364 7.74 0.000 .0615106 .1032046
2 | .0705586 .0074272 9.50 0.000 .0560015 .0851157
3 | .0602709 .0049174 12.26 0.000 .0506329 .0699089
4 | .0513465 .0031103 16.51 0.000 .0452505 .0574425
5 | .0436402 .0021147 20.64 0.000 .0394955 .0477849
6 | .0370132 .0019898 18.60 0.000 .0331131 .0409132
7 | .0313351 .0023151 13.54 0.000 .0267976 .0358726
8 | .0264858 .0026728 9.91 0.000 .0212472 .0317244
I'm not sure how to interpret this: the youngest group (45-49 years of age) has the highest probability, whereas the oldest (80+) has the lowest. This seems backwards to me, but it's probably because I'm not clear on how to interpret it. Perhaps they are cumulative probabilities? If so, I'm also not sure how to interpret them.
I'm not sure whether this matters, but the age variable also includes group 0 (younger than 45). That group was not included in the regression or in the margins analysis (I asked for categories 1/8, not 0).
Anywho,
Thanks for your help
Sean
Last edited by seandb; 10-20-2012 at 12:23 PM. Reason: More details
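On the "cumulative probabilities" question: in Stata, `margins, predict(outcome(#))` after `ologit` reports the predicted probability of that particular category, not a cumulative one. A minimal sketch of how those category probabilities come out of an ordered logit: the cumulative probability is P(y <= k) = invlogit(cut_k - xb), and each category's probability is the difference between adjacent cumulative probabilities. The numbers plugged in are the unweighted age-only estimates Sean posts later in the thread (XAge coefficient -.1059113 and the four cutpoints); the function names are hypothetical helpers for illustration.

```python
import math

def invlogit(z):
    """Logistic function, the same as Stata's invlogit()."""
    return 1.0 / (1.0 + math.exp(-z))

def ologit_probs(xb, cuts):
    """Category probabilities P(y = 1..K) for an ordered logit.

    Cumulative P(y <= k) = invlogit(cut_k - xb); the last category's
    cumulative probability is 1. Each category probability is the
    difference between adjacent cumulative probabilities.
    """
    cumulative = [invlogit(c - xb) for c in cuts] + [1.0]
    probs = []
    previous = 0.0
    for cum in cumulative:
        probs.append(cum - previous)
        previous = cum
    return probs

# Age-only model from later in the thread (unweighted, XAge coded 1..8)
b_age = -0.1059113
cuts = [-3.34256, -1.699542, -0.2083479, 1.215295]

for group in (1, 8):
    p = ologit_probs(b_age * group, cuts)
    print(group, "poor=%.4f excellent=%.4f" % (p[0], p[4]))
```

Under that simple model the oldest group has roughly twice the predicted probability of poor health of the youngest (about 0.076 vs 0.038), which is the direction one would expect.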
I would start by cross-tabulating age group vs self-rated health, and then running a simple model with age as the only predictor. Do the results agree (more or less)?
Here are the cross-tab results. I would have thought the margins results would go the other way; the ones I posted earlier were for poor health status. I think my problem is that I'm not sure what the margins are telling me, so I don't know whether I'm interpreting them correctly. Would you be able to do me a huge favour and walk me through an example interpretation of the margins output?
Age group of the   |                 Self-reported health                  |
respondent         |     Poor      Fair      Good  Very good  Excellent  |     Total
-------------------+-------------------------------------------------------+----------
45 to 49 | 69 245 653 590 381 | 1,938
50 to 54 | 84 294 643 601 359 | 1,981
55 to 59 | 114 299 628 578 356 | 1,975
60 to 64 | 91 274 529 526 347 | 1,767
65 to 69 | 69 234 462 393 227 | 1,385
70 to 74 | 50 194 380 281 155 | 1,060
75 to 80 | 65 206 313 206 91 | 881
80 years and older | 82 261 374 238 90 | 1,045
-------------------+-------------------------------------------------------+----------
Total | 624 2,007 3,982 3,413 2,006 | 12,032
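A quick way to compare the cross-tab with the margins is to turn the counts into row proportions, i.e. the observed share of each age group in the lowest and highest health category. A minimal sketch, assuming the unweighted counts above (survey weights are not applied here):

```python
# Counts copied from the cross-tab above, in the order
# poor, fair, good, very good, excellent (unweighted)
counts = {
    "45 to 49": [69, 245, 653, 590, 381],
    "50 to 54": [84, 294, 643, 601, 359],
    "55 to 59": [114, 299, 628, 578, 356],
    "60 to 64": [91, 274, 529, 526, 347],
    "65 to 69": [69, 234, 462, 393, 227],
    "70 to 74": [50, 194, 380, 281, 155],
    "75 to 80": [65, 206, 313, 206, 91],
    "80 years and older": [82, 261, 374, 238, 90],
}

# Row proportions: share of each age group in the lowest and highest category
for group, row in counts.items():
    total = sum(row)
    print(f"{group:>18}  poor = {row[0] / total:.3f}  excellent = {row[4] / total:.3f}")
```

The raw proportions point the opposite way from the margins posted above: the share in poor health is about 0.036 among 45-49 year olds and 0.078 among those 80 and older, while the share in excellent health falls from about 0.197 to 0.086.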
For another example, here are the margins for the highest category, excellent health status (the probabilities began going the other way for very good and excellent health):
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_at |
1 | .135346 .0077956 17.36 0.000 .120067 .150625
2 | .1561369 .0052982 29.47 0.000 .1457527 .1665211
3 | .1792657 .0043016 41.67 0.000 .1708349 .1876966
4 | .204763 .0073369 27.91 0.000 .1903831 .219143
5 | .2326016 .0126392 18.40 0.000 .2078291 .257374
6 | .2626901 .0190511 13.79 0.000 .2253506 .3000297
7 | .2948708 .0262087 11.25 0.000 .2435026 .346239
8 | .3289206 .0338698 9.71 0.000 .2625369 .3953042
------------------------------------------------------------------------------
Here's a regression with just age (linear) on self-reported health status:
Survey: Ordered logistic regression Number of obs = 12032
Population size = 12032.46
Replications = 500
Wald chi2(1) = 166.67
Prob > chi2 = 0.0000
-------------------------------------------------------------------------------------
| Observed Bstrap *
XSelfReportedHealth | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
XAge | .8870974 .0082319 -12.91 0.000 .8711089 .9033793
--------------------+----------------------------------------------------------------
/cut1 | -3.466362 .0655787 -52.86 0.000 -3.594894 -3.33783
/cut2 | -1.824878 .045673 -39.96 0.000 -1.914395 -1.735361
/cut3 | -.2673545 .0422428 -6.33 0.000 -.3501488 -.1845601
/cut4 | 1.107109 .044474 24.89 0.000 1.019942 1.194277
-------------------------------------------------------------------------------------
Thanks,
Sean
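The table above reports odds ratios, and an ologit coefficient is the natural log of the odds ratio, so an odds ratio below 1 corresponds to a negative coefficient. A minimal sketch converting the posted XAge odds ratio and its confidence limits back to the log-odds scale:

```python
import math

# XAge odds ratio and 95% CI from the bootstrapped survey model above
odds_ratio = 0.8870974
ci_low, ci_high = 0.8711089, 0.9033793

# ologit coefficients are log odds ratios, so take the natural log
coef = math.log(odds_ratio)
ci_coef = (math.log(ci_low), math.log(ci_high))

print(f"coefficient = {coef:.4f}, 95% CI = ({ci_coef[0]:.4f}, {ci_coef[1]:.4f})")
```

The coefficient is about -0.12: each step up in age group multiplies the odds of being in a higher health category by roughly 0.89, i.e. older groups tend to report worse health.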
It looks to me like it's working really nicely with these data. Here is what I get from running a simple linear model with your data:
Code:
clear
set more off
input age count1 count2 count3 count4 count5
45 69 245 653 590 381
50 84 294 643 601 359
55 114 299 628 578 356
60 91 274 529 526 347
65 69 234 462 393 227
70 50 194 380 281 155
75 65 206 313 206 91
80 82 261 374 238 90
end

* empirically observed proportions
egen total=rowtotal(count*)
foreach cat of numlist 1/5 {
    gen obs`cat'=count`cat' / total
}
tempfile observed
save `observed'

* now reshape for analysis
reshape long count, i(age) j(health)
lab define health 1 "poor" 2 "fair" 3 "good" 4 "very good" 5 "excellent"
lab val health health
tab age health [fw=count], row

* ordinal logit model
ologit health age [fw=count]
estimates store mymodel

* obtain adjusted probabilities of each level of health by age
* (parmest is user-written: ssc install parmest)
tempfile predicted
foreach health of numlist 1/5 {
    estimates restore mymodel
    margins, predict(outcome(`health')) at(age=(45(5)80)) post
    preserve
    parmest, norestore
    gen health=`health'
    gen age=5 * _n + 40
    capture append using `predicted'
    save `predicted', replace
    restore
}

* now plot predicted and observed probabilities against age
use `predicted', clear
keep health age estimate
reshape wide estimate, i(age) j(health)
* merge in observed probabilities
merge 1:1 age using `observed'
twoway scatter obs* age, mstyle(p1 p2 p3 p4 p5) || ///
    line estimate* age, sort lstyle(p1 p2 p3 p4 p5) ///
    title(Observed vs predicted probability of health status) ///
    legend(title(Health status) ///
    order(1 "poor" 2 "fair" 3 "good" 4 "very good" 5 "excellent")) ///
    xtitle(Age) ytitle(Probability)
Thanks Bukharin. I'm still not sure how to interpret the margins, though. Take the first margins example (poor health status, outcome 1): the 45-49 age group (1) has a probability of .0823576, while the 80+ age group (8) has a probability of .0264858. How would I interpret this? Does it mean the 45-49 age group has a higher probability of being in the next group (fair health)?
If that's correct, then for the highest outcome (5, excellent health status) the probability for the 45-49 age group (1) is .135346, while the probability for the 80+ age group (8) is .3289206. I'm not sure what that means, though.
Take care,
Sean
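In Stata, `margins, predict(outcome(#))` reports the (average) predicted probability of the outcome named, here Pr(health = poor) or Pr(health = excellent); it is neither cumulative nor the probability of moving to the next category. A minimal sketch of the arithmetic for reading the posted margins as percentages and as a difference between the youngest and oldest groups:

```python
# Margins posted above: age group 1 (45-49) and age group 8 (80+)
margins = {
    "poor (outcome 1)": (0.0823576, 0.0264858),
    "excellent (outcome 5)": (0.135346, 0.3289206),
}

for outcome, (youngest, oldest) in margins.items():
    diff_pp = (oldest - youngest) * 100  # change in percentage points
    print(f"{outcome}: 45-49 = {youngest:.1%}, 80+ = {oldest:.1%}, "
          f"difference = {diff_pp:+.1f} percentage points")
```

Read literally, the model behind these margins predicts about 8.2% of 45-49 year olds and 2.6% of those 80+ in poor health, and about 13.5% versus 32.9% in excellent health, which is the reverse of what the cross-tab shows.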
Your interpretation of the -margins- output is correct, but I worry that there is a problem with your model. You can see that the coefficient for XAge in your model is positive, so as people get older they tend to move up health categories. When I ran the simple linear model I got a negative coefficient for age, which is more what you'd expect:
Code:
. ologit health age [fw=count], nolog
Ordered logistic regression                       Number of obs   =      12032
                                                  LR chi2(1)      =     199.98
                                                  Prob > chi2     =     0.0000
Log likelihood = -17638.075                       Pseudo R2       =     0.0056
------------------------------------------------------------------------------
      health |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |  -.0211823   .0015016   -14.11   0.000    -.0241254   -.0182391
-------------+----------------------------------------------------------------
       /cut1 |   -4.18985   .1005756                     -4.386974   -3.992725
       /cut2 |  -2.546833   .0934131                     -2.729919   -2.363747
       /cut3 |  -1.055638   .0907687                     -1.233542   -.8777349
       /cut4 |    .368005   .0909177                      .1898095    .5462004
------------------------------------------------------------------------------
I'm a little puzzled by your "Population size = 12032.46": is the model a little more complex than you've described?
Hey Bukharin,
I think it looks positive because it's an odds ratio, not the coefficient. The population size is off because of the bootstrapped estimates. Here's what I get without the bootstrap or odds ratios:
ologit XSelfReportedHealth XAge if XAge>0
Iteration 0: log likelihood = -17738.065
Iteration 1: log likelihood = -17638.188
Iteration 2: log likelihood = -17638.075
Iteration 3: log likelihood = -17638.075
Ordered logistic regression Number of obs = 12032
LR chi2(1) = 199.98
Prob > chi2 = 0.0000
Log likelihood = -17638.075 Pseudo R2 = 0.0056
-------------------------------------------------------------------------------------
XSelfReportedHealth | Coef. Std. Err. z P>|z| [95% Conf. Interval]
--------------------+----------------------------------------------------------------
XAge | -.1059113 .0075081 -14.11 0.000 -.120627 -.0911956
--------------------+----------------------------------------------------------------
/cut1 | -3.34256 .0519516 -3.444383 -3.240736
/cut2 | -1.699542 .0378382 -1.773704 -1.625381
/cut3 | -.2083479 .0341976 -.275374 -.1413219
/cut4 | 1.215295 .0368326 1.143105 1.287486
-------------------------------------------------------------------------------------
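The unweighted XAge model above and Bukharin's model with age in years look like the same fit on two different age scales (both report a log likelihood of -17638.075). A minimal sketch checking that, under the assumption (inferred from the group labels, not stated in the thread) that age = 5 * XAge + 40: substituting into invlogit(cut_k - b*age) gives an XAge coefficient of 5*b and cutpoints of cut_k - 40*b.

```python
# Bukharin's fit: age in years (45, 50, ..., 80)
b_year = -0.0211823
cuts_year = [-4.18985, -2.546833, -1.055638, 0.368005]

# Sean's fit: XAge coded 1..8
b_group = -0.1059113
cuts_group = [-3.34256, -1.699542, -0.2083479, 1.215295]

# Assumed coding: age = 5 * XAge + 40, so
# cut_k - b_year * age = (cut_k - 40 * b_year) - (5 * b_year) * XAge
implied_b = 5 * b_year
implied_cuts = [c - 40 * b_year for c in cuts_year]

print("implied XAge coefficient:", round(implied_b, 7), "reported:", b_group)
for implied, reported in zip(implied_cuts, cuts_group):
    print(f"implied cut {implied:.6f}  reported cut {reported:.6f}")
```

The implied values match the reported ones to rounding error, so both simple models say the same thing: health shifts toward lower categories with age.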
Sorry, you're right - I didn't see that you'd requested odds ratios.
In any case, that simple model should definitely show people shifting to lower health categories as they get older. What do you get when running -margins- directly after the above model? Please post both your -margins- command and its output.