dummy variables in logistic regression

#1
Hello everybody

I have a logistic regression:

  • The dependent variable is dichotomous and was measured in four different countries (A, B, C, D)
  • There are several explanatory variables. One of them is the country where the dependent variable was measured.
When I use "country" as an explanatory variable, R dummy-codes it. There is one reference category (e.g. A),
and regression coefficients are computed for the three remaining categories (B, C, D).

However, in my case there is no reason to define a reference category, as all countries are equally important. I am not
interested in whether there is, for example, a difference between country "B" and country "A", but in whether there is a
difference between A (or B, C, D) and the other three.

Are there any reasons against using four dichotomous variables (A: yes/no; B: yes/no; C: yes/no; D: yes/no) instead?

Thanks for your advice!
 
#2
Yes, the problem is that your model can't be specified uniquely, so the regression coefficients can't be calculated. If you include all four dummy variables together with an intercept, the fourth dummy can be expressed as a function of the other three (D = 1 - A - B - C). That makes it impossible to determine the regression coefficients, since the regression equation has multiple possible solutions (just like a single equation with more than one unknown can't be solved uniquely).
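
To make the non-uniqueness concrete, here is a minimal sketch in plain Python (not R, only so that it runs without any packages). It assumes a model with an intercept plus all four country dummies; the coefficient values and the helper name linear_predictor are made up for illustration.

```python
def linear_predictor(intercept, country_coefs, country):
    """Log-odds for one observation whose only predictor is its country dummy."""
    return intercept + country_coefs[country]

# Hypothetical coefficients for a model with an intercept AND all four dummies.
intercept_1 = 0.5
coefs_1 = {"A": 0.0, "B": 1.0, "C": -0.5, "D": 0.25}

# Shift the intercept up by any constant k and every dummy coefficient down by k.
k = 2.0
intercept_2 = intercept_1 + k
coefs_2 = {c: b - k for c, b in coefs_1.items()}

# Both coefficient sets give the same log-odds for every country,
# so the data cannot tell them apart.
for c in "ABCD":
    eta_1 = linear_predictor(intercept_1, coefs_1, c)
    eta_2 = linear_predictor(intercept_2, coefs_2, c)
    print(c, eta_1, eta_2)
```

Since k can be any number, there are infinitely many coefficient sets with exactly the same fit. Dropping one dummy (the reference category) removes that freedom.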
 

noetsi

Fortran must die
#3
I think you can write contrast statements to compare the means, for example to test that the mean of A = 1/3 B + 1/3 C + 1/3 D, or that A = B = C = D. But it has been a very long time since I worked with those.
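
A rough sketch of what such a contrast could look like, assuming the model was fitted with A as the reference so that the country coefficients are log-odds differences from A. The coefficient values, the covariance matrix, and the function name linear_contrast are all hypothetical; in practice they would come from the fitted model.

```python
import math

# Hypothetical log-odds coefficients for B, C, D relative to reference A.
beta = [0.40, -0.10, 0.25]
# Hypothetical covariance matrix of those three estimates.
cov = [
    [0.040, 0.015, 0.015],
    [0.015, 0.050, 0.015],
    [0.015, 0.015, 0.045],
]
# Weights for "A minus the average of B, C, D" on the log-odds scale.
# A is the reference (its own coefficient is 0), so only B, C, D carry weights.
weights = [-1 / 3, -1 / 3, -1 / 3]

def linear_contrast(weights, beta, cov):
    """Return the estimate and standard error of sum(w_i * beta_i)."""
    estimate = sum(w * b for w, b in zip(weights, beta))
    n = len(weights)
    variance = sum(
        weights[i] * cov[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return estimate, math.sqrt(variance)

estimate, se = linear_contrast(weights, beta, cov)
z = estimate / se  # Wald z statistic for "A equals the average of B, C, D"
print(estimate, se, z)
```

The same function can be reused for B, C, or D against the other three by changing the weights, without refitting the model.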
 

obh

Active Member
#4
If you want to estimate probabilities using the multinomial logistic regression it doesn't really matter which value (A,B,C,D) is the reference.
So the "problem" is only with the interpretation.

After defining A as the reference you get all the results as comparisons to A, but you can easily translate them into any other comparison, such as the odds of B relative to C or the odds of C relative to D.

See the following example. If you run the model with A as the reference and want to calculate C compared to B:

Odds when all predictors are zero:
C to B = C to A / B to A = 2.7723/3.9961 = 0.6938
Odds multiplier for a one-unit increase in X1:
C to B = C to A / B to A = 0.9668/0.8780 = 1.1012 (up to rounding)
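
The division above can be checked with a few lines of Python. This is only a sketch: the function name change_reference is made up, and the inputs are the rounded A-reference figures quoted in this post.

```python
def change_reference(value_vs_a, new_ref_vs_a):
    """Re-express a quantity measured against A as one measured against a new reference."""
    return value_vs_a / new_ref_vs_a

# Odds when all predictors are zero, with A as the reference.
odds_b_vs_a = 3.9961
odds_c_vs_a = 2.7723
# Odds multipliers for a one-unit increase in X1, with A as the reference.
or_x1_b_vs_a = 0.8780
or_x1_c_vs_a = 0.9668

odds_c_vs_b = change_reference(odds_c_vs_a, odds_b_vs_a)
or_x1_c_vs_b = change_reference(or_x1_c_vs_a, or_x1_b_vs_a)

print(round(odds_c_vs_b, 4))   # 0.6938
print(round(or_x1_c_vs_b, 4))
```

Dividing the rounded figures gives about 1.1011 for the X1 multiplier rather than 1.1012; the small gap comes from rounding the inputs to four decimals.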


Example with A, B, C when A is the reference

Interpretation
When all the values of the predictors (Xj) are zero:
The odds of B in comparison to A is: 3.9961
The odds of C in comparison to A is: 2.7723

One unit increase in X1:
Will decrease the odds of B in comparison to A by 12.2% (a.k.a. the odds will be multiplied by 0.8780).
Will decrease the odds of C in comparison to A by 3.3% (a.k.a. the odds will be multiplied by 0.9668).


Example with A, B, C when B is the reference

When all the values of the predictors (Xj) are zero:
The odds of C in comparison to B is: 0.6938
The odds of A in comparison to B is: 0.2502

One unit increase in X1:
Will increase the odds of C in comparison to B by 10.1% (a.k.a. the odds will be multiplied by 1.1012).
Will increase the odds of A in comparison to B by 13.9% (a.k.a. the odds will be multiplied by 1.1389).
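
As a check on the point that the choice of reference does not change the estimated probabilities, here is a small Python sketch. It takes the A-reference odds at Xj = 0 from the example, re-expresses them with B as the reference by division, and computes the category probabilities both ways. The function name probabilities is an illustrative choice, not part of any package.

```python
def probabilities(odds_vs_ref, ref):
    """Category probabilities from odds measured against a reference category."""
    total = 1.0 + sum(odds_vs_ref.values())
    probs = {cat: odds / total for cat, odds in odds_vs_ref.items()}
    probs[ref] = 1.0 / total  # the reference has odds 1 against itself
    return probs

# Odds when all predictors are zero, with A as the reference.
odds_vs_a = {"B": 3.9961, "C": 2.7723}
# Same model with B as the reference: divide everything by the odds of B vs A.
odds_vs_b = {"A": 1.0 / odds_vs_a["B"], "C": odds_vs_a["C"] / odds_vs_a["B"]}

p_from_a = probabilities(odds_vs_a, "A")
p_from_b = probabilities(odds_vs_b, "B")

# The two columns agree for every category.
for cat in "ABC":
    print(cat, round(p_from_a[cat], 4), round(p_from_b[cat], 4))
```

Only the way the coefficients are expressed depends on the reference; the fitted probabilities are the same.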

Since it is a bit confusing to interpret the multinomial regression, I used the following interpretation calculator: http://www.statskingdom.com/430logistic_regression.html