# Thread: Proper way to execute Logistic regression?

1. ## Proper way to execute Logistic regression?

I ran `logit job age i.gender i.school i.company` separately for states A and B to find out what makes students get a skill-related job (job = 1) or not (job = 0). Here gender = 0 (boy) or 1 (girl), school = school type (1 to 5), and company = employer type (1 to 3).
However, I got a comment that the results for the two states were not comparable and that the analysis was not done properly. How can I tell this from the results, and how should I run the analysis properly?
Thanks.

2. ## Re: Proper way to execute Logistic regression?

Please elaborate on your problem. Include the commands exactly as you entered them and the full text of any error message. Also use a font like Lucida Console at around size 8 so that your results are easier to read.

3. ## Re: Proper way to execute Logistic regression?

The variables are job (0 = not skill-related, 1 = skill-related), age (numeric), gender (0 = boy, 1 = girl), school (school type: 1 = type A, 2 = type B, 3 = type C, 4 = type D, 5 = type E), and company (employer type: 1 = type A, 2 = type B, 3 = type C).
I am trying to find out who gets a skill-related job at a company after finishing school, and to compare states A and B.
After coding all unknowns as missing values, I ran
`logit job age i.gender i.school i.company`

I obtained the following results.

Someone briefly commented that the results were not comparable and that the analysis may not have been done properly.
I had no chance to ask him why. My questions are:

How can you tell from the results that they are not comparable?
How can I check whether the analysis was done properly?

[STATE A]

Logistic regression: Number of obs = 2010, LR chi2(11) = 179.89, Prob > chi2 = 0.0000, Log likelihood = -822.46643, Pseudo R2 = 0.0979

| skill_job | Coef. | Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| age | -.0204837 | .0121626 | -1.68 | 0.092 | -.0443219 | .0033546 |
| 1.gender | -.8376907 | .1278955 | -6.55 | 0.000 | -1.088361 | -.5870201 |
| school = 2 | .0551259 | .1678677 | 0.33 | 0.743 | -.2738888 | .3841405 |
| school = 3 | .0283689 | .193962 | 0.15 | 0.884 | -.3517896 | .4085274 |
| school = 4 | -.245789 | .2276207 | -1.08 | 0.280 | -.6919173 | .2003393 |
| school = 5 | -.014502 | .2319636 | -0.06 | 0.950 | -.4691423 | .4401383 |
| company = 2 | -.2985257 | .186702 | -1.60 | 0.110 | -.6644549 | .0674035 |
| company = 3 | 1.199048 | .1579964 | 7.59 | 0.000 | .8893808 | 1.508715 |
[STATE B]

Logistic regression: Number of obs = 5000, LR chi2(10) = 1022.33, Prob > chi2 = 0.0000, Log likelihood = -2100.9994, Pseudo R2 = 0.2022

| skill_job | Coef. | Std. Err. | z | P>\|z\| | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| age | -.0079074 | .0084048 | -0.94 | 0.347 | -.0243806 | .0085657 |
| 1.gender | -.0957489 | .076279 | -1.26 | 0.209 | -.2452531 | .0537553 |
| school = 2 | -.0454449 | .1600483 | -0.28 | 0.776 | -.3591338 | .2682439 |
| school = 3 | -1.086746 | .104779 | -10.37 | 0.000 | -1.292109 | -.8813828 |
| school = 4 | -.2759944 | .2146616 | -1.29 | 0.199 | -.6967235 | .1447347 |
| school = 5 | -.5501513 | .1114829 | -4.93 | 0.000 | -.7686538 | -.3316487 |
| company = 2 | .9866781 | .1121615 | 8.80 | 0.000 | .7668457 | 1.206511 |
| company = 3 | 2.825782 | .1236482 | 22.85 | 0.000 | 2.583436 | 3.068128 |

4. ## Re: Proper way to execute Logistic regression?

So, why are you running two separate logit regressions, one for each state? Why not pool the data and include a state indicator variable in your regression instead? It seems to me that this would resolve your problem.
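If the two states' data could be combined, the pooled approach can be sketched as a single model. This is only an illustration in notation, assuming a hypothetical indicator $s$ (0 = state A, 1 = state B) and writing $\mathbf{x}$ for the covariates (age and the gender, school and company dummies):

$$
\operatorname{logit} \Pr(\text{job}=1 \mid \mathbf{x}, s) = \beta_0 + \mathbf{x}^\top\boldsymbol{\beta} + \delta\, s + s\,\mathbf{x}^\top\boldsymbol{\gamma}
$$

For state A ($s = 0$) the slopes are $\boldsymbol{\beta}$, and for state B ($s = 1$) they are $\boldsymbol{\beta} + \boldsymbol{\gamma}$. Because every covariate is interacted with $s$, the pooled maximum-likelihood estimates reproduce the two separate fits, and a Wald or likelihood-ratio test of $\gamma_k = 0$ tests whether effect $k$ differs between the states. A model with only the main effect $\delta$ instead forces all other effects to be equal in both states.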

5. ## Re: Proper way to execute Logistic regression?

I asked another person who has access to the state B data to run the same do-file.
We cannot put the two datasets together. So is there no way to run the do-file separately on each dataset and compare the results?
I was told that one variable was causing a problem. I would just like to know how to identify which variables need to be removed, or whether the results are simply not comparable.

6. ## Re: Proper way to execute Logistic regression?

Tom,

You can compare them in the sense that you can say "For group A, the effect of schooling level on job skill is ..., and for group B, the effect of schooling is ...". But unless you include both groups in your regression, the effects you report are conditional on the group you're analyzing. You wouldn't be estimating the overall return of a policy, for example, only the return for each group individually. The exception is if the errors of the regression in the two groups are correlated with one another, in which case you absolutely must analyze the groups at the same time.
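When the data cannot be pooled, one common way to compare a single coefficient across two separately estimated models is a z-test on the difference, z = (b_A - b_B) / sqrt(se_A^2 + se_B^2). The sketch below assumes the two samples are independent (different students in different states, so no correlated errors) and uses the coefficients and standard errors from the output posted above; the function name `compare_coefficients` is hypothetical. A known caveat: logit coefficients are scaled by unobserved residual variation, so a difference between states can partly reflect different amounts of unexplained variation rather than a genuinely different effect. The same arithmetic can be done with Stata's `display` command.

```python
import math


def compare_coefficients(b_a, se_a, b_b, se_b):
    """z-test for the difference between one coefficient estimated in two
    independent samples. Returns (difference, z, two-sided p-value)."""
    diff = b_a - b_b
    # With independent samples the covariance term is zero.
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    z = diff / se_diff
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, z, p


# (coefficient, std. err.) for state A and state B, copied from the posted output
estimates = {
    "age": ((-0.0204837, 0.0121626), (-0.0079074, 0.0084048)),
    "1.gender": ((-0.8376907, 0.1278955), (-0.0957489, 0.0762790)),
    "3.company": ((1.199048, 0.1579964), (2.825782, 0.1236482)),
}

if __name__ == "__main__":
    for name, ((b_a, se_a), (b_b, se_b)) in estimates.items():
        diff, z, p = compare_coefficients(b_a, se_a, b_b, se_b)
        print(f"{name:10s} diff = {diff:8.4f}  z = {z:6.2f}  p = {p:.4f}")
```

With these numbers the gender difference gives z of about -4.98, while the age difference is far from significant (p around 0.4).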

Your two models appear to contain the same variables, so I don't see why you'd need to remove one, and I don't see anything that would obviously cause a problem.

7. ## The Following User Says Thank You to eyesack_kn For This Useful Post:

tom2012 (06-29-2012)
