I am carrying out research using the Labour Force Survey on the marginal effects of categorical variables on a binary outcome: employed or unemployed.
I have used a logit glm as a basic tool for analysing the data, but am now having trouble with the marginal effects. I just want to create base categories for each variable, and then model the marginal effect of a change in each categorical variable individually.
An example variable is the government office. There are 8 government offices (coded in R as the numbers 1-8), so if I keep all the base categories constant, what would the change be if the individual were to live in each of the other districts?
Hopefully this makes some sort of sense; I am mainly just after direction in terms of packages and methodology. Thanks.
If I understand you correctly, part of your question is about the interpretation of marginal effects of categorical variables in the logit model. Generally the "marginal effects" (if we agree to call a change in a categorical variable a marginal effect even though the variable is discrete) will depend on the values of the other independent variables. You probably have some quantitative variables in your model; one strategy is to hold these constant at their mean values when calculating the marginal effect of any other variable. For a categorical variable this might mean considering individuals who are half men, half women if your data set is 50% women and 50% men, so one consideration is whether you want this, or whether you want to calculate one marginal effect for women and one for men of the categorical variable under consideration.
It's hard to say anything more concrete without the specific model, your questions of interest and the estimation results.
Also, "8 government offices": is that a factor in R? If it is, I believe R creates the dummies automatically... but again I'm having trouble picturing the design of your model in specifics.
I will try and make it a bit clearer, sorry! Basically I have the (un)employment outcome 0/1. For each individual I then have a number of variables such as education, in which 1=degree, 2=A Level, 3=GCSE, 4=Other, 5=None, or age bands, 1=18-24, 2=25-34, etc. I am looking for a way to create base categories, so I could measure the impact that, say, being in a higher age bracket would have on a person's chances of being unemployed. There is an example of this in http://www.ons.gov.uk/ons/rel/elmr/e...010/index.html on page 3 of the 'Exits from unemployment in the UK 2006-2009'; I am basically trying to fit a very similar model.
Assume you have a model y ~ b1 + b2x2 + b3x3 where y = P(employment). You use the logit, so
y = exp(b1 + b2x2 + b3x3) / (1 + exp(b1 + b2x2 + b3x3)). Assume x2 = age (as a quantitative variable).
To find the marginal effect of age you differentiate: dy/dx2 = b2*y*(1 - y). The resulting expression is a function of both x2 and x3;
let's call it dy(x2,x3). To calculate the marginal effect you have to select a value for x3 (if there are more variables in your
model you have to select values for x3, x4, ...). The standard solution given in statistics books is to insert the mean of x3 as
the selected value. But you may have research interests that dictate another choice. For example, x3 could be a dummy for men and women,
and maybe you want to compare the marginal effect of age for men with the marginal effect of age for women. Then, assuming the dummy = 1
for men, you could calculate the marginal effect for men as dy(x2,1) and for women as dy(x2,0).
The marginal effect depends on age x2, so, as in the model you linked to, different age groups (here different values of x2) will give different marginal effects.
(So one strategy is simply to drop the age groups and calculate marginal effects for age 18, 19, 20, ... or maybe at two-year intervals for fewer values: 18, 20, 22.)
The study you refer to reports its selection of values in table 1, page 39.
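To make the point about dy(x2,x3) concrete, here is a minimal sketch in Python (the same arithmetic carries over to R). The coefficient values are made up purely for illustration and do not come from any fitted model; x3 is assumed to be a dummy with 1 = men, as in the example above.

```python
import math

def logistic(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_effect_age(b1, b2, b3, x2, x3):
    """dy/dx2 for y = logistic(b1 + b2*x2 + b3*x3), i.e. b2 * y * (1 - y)."""
    y = logistic(b1 + b2 * x2 + b3 * x3)
    return b2 * y * (1.0 - y)

# Hypothetical coefficients, NOT estimates from real data.
b1, b2, b3 = -1.0, 0.05, 0.4

# Evaluate the marginal effect of age at a few ages, separately for
# women (x3 = 0) and men (x3 = 1) instead of at the mean of x3.
for age in (18, 20, 22):
    me_women = marginal_effect_age(b1, b2, b3, age, 0)
    me_men = marginal_effect_age(b1, b2, b3, age, 1)
    print(age, round(me_women, 4), round(me_men, 4))
```

The loop shows that the same coefficient b2 gives a different marginal effect at every combination of age and sex, which is why you have to pick the evaluation point.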
If you insist on calculating marginal effects for age groups, the modelling is going to change... consider this:
If you want age groups you need to choose the base group.
The base group of the study you refer to is in table 1, page 39, as said. But let's pretend we use the age group [0,18] as the reference group (a very bad choice, but it serves the purpose of illustration).
You have age = x2 and cut it into intervals. People have only one age, so no observation falls in two intervals. Then make dummies:
d1=1 if age in [19,24] and otherwise 0 (so the base group [0,18] has all dummies equal to 0)
d2=1 if age in [25,31] and otherwise 0
d3=1 if age in [32,38] and otherwise 0
Estimate the model
y ~ b1 + b2d1 + b3d2 + ...
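As a small illustration of the dummy coding, here is a sketch in Python. The function name age_dummies is hypothetical, and the bands are the illustrative ones from this post, not those of the ONS study. In R you would normally not do this by hand: a factor in the glm formula is expanded into dummies with its first level as the base, and relevel() changes which level that is.

```python
def age_dummies(age):
    """Return (d1, d2, d3) for the illustrative age bands.

    The base group [0,18] is the case where all three dummies are 0.
    """
    d1 = 1 if 19 <= age <= 24 else 0
    d2 = 1 if 25 <= age <= 31 else 0
    d3 = 1 if 32 <= age <= 38 else 0
    return d1, d2, d3

# Each person falls in at most one band, so at most one dummy is 1.
for age in (17, 20, 28, 35):
    print(age, age_dummies(age))
```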
The "marginal" effect of the age dummies can be calculated as:
y1 = exp(b1 + b2(d1=0) + b3(d2=0)) / (1 + exp(b1 + b2(d1=0) + b3(d2=0))) = exp(b1)/(1+exp(b1))
y2 = exp(b1 + b2(d1=1) + b3(d2=0)) / (1 + exp(b1 + b2(d1=1) + b3(d2=0))) = exp(b1 + b2)/(1+exp(b1 + b2))
y1 gives you the probability of employment for age group [0,18] and y2 the probability of employment for age group [19,24]; the difference in probability, y2 - y1, is the "marginal effect"
(if you truly want a marginal effect, keep age ungrouped as a quantitative variable).
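The y1/y2 calculation above can be sketched in a few lines of Python. The coefficients are again made-up numbers for illustration only; b1 is the intercept (the base group [0,18]), b2 belongs to d1 and b3 to d2.

```python
import math

def logistic(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, NOT estimates from real data.
b1, b2, b3 = 0.2, 0.6, 0.9

y1 = logistic(b1)        # base group [0,18]: d1 = 0, d2 = 0
y2 = logistic(b1 + b2)   # group [19,24]:     d1 = 1, d2 = 0
y3 = logistic(b1 + b3)   # group [25,31]:     d1 = 0, d2 = 1

# Discrete "marginal effects" relative to the base group.
effect_19_24 = y2 - y1
effect_25_31 = y3 - y1

print(round(effect_19_24, 4), round(effect_25_31, 4))
```

Comparing two non-base groups with each other is just the difference of their predicted probabilities, y3 - y2, which equals effect_25_31 - effect_19_24.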
Next you have to decide what to do with the group d2=1, age [25,31]: do you want to compare it with the "base group" [0,18] or with the group [19,24]? The study you refer to has, again,
chosen an age group that it keeps constant...
Also notice they use Female as the reference. Some use averages of dummies, but one might argue that it doesn't make sense to compare something to an individual who is, e.g., half man, half woman, since
this type of individual does not exist (I guess). I mentioned this in the initial reply.
Notice that the study gives a marginal effect for men, that is, for changing the sex while keeping age constant, so in reality the study is comparing women aged 35-49 with men aged 35-49. Again
this can be calculated, assuming sex=0 for female:
y1 = exp(b1 + b2(sex=0) + b3([35-49]=1)) / (1 + exp(b1 + b2(sex=0) + b3([35-49]=1))) = exp(b1 + b3)/(1+exp(b1 + b3))
y2 = exp(b1 + b2(sex=1) + b3([35-49]=1)) / (1 + exp(b1 + b2(sex=1) + b3([35-49]=1))) = exp(b1 + b2 + b3)/(1+exp(b1 + b2 + b3))
Of course, when calculating the difference you need to include all variables of the model and change the value of one variable (here sex), keeping all others constant at the chosen reference group,
represented by a vector of values for all independent variables.
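The "reference vector" idea can be sketched like this in Python. Everything here is hypothetical: the variable names (male, age_35_49, degree), the coefficient values and the helper functions predict and discrete_effect are only for illustration and are not taken from the study or from any package.

```python
import math

def logistic(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(coefs, profile):
    """Predicted probability for one individual described by `profile`."""
    z = coefs["intercept"] + sum(coefs[name] * value for name, value in profile.items())
    return logistic(z)

def discrete_effect(coefs, reference, name, new_value):
    """Change in probability when only `name` moves away from the reference."""
    changed = dict(reference)   # copy, so the reference itself is untouched
    changed[name] = new_value
    return predict(coefs, changed) - predict(coefs, reference)

# Hypothetical coefficients, NOT estimates from real data.
coefs = {"intercept": 0.3, "male": -0.2, "age_35_49": 0.5, "degree": 0.8}

# Reference individual: female, aged 35-49, no degree.
reference = {"male": 0, "age_35_49": 1, "degree": 0}

effect_male = discrete_effect(coefs, reference, "male", 1)
print(round(effect_male, 4))
```

Calling discrete_effect once per dummy, always starting from the same reference, gives one effect per category of the kind reported in the study's table.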