# Hierarchical regression with partially nested variables

#### giordano

##### New Member
I would like to fit a logistic regression of vaccination coverage, for example for HPV. Let's assume that we have 50 states (S). 30 states have both an urban part and a country part (urbanity variable U); the other 20 have only a country area. 10 of the states have some areas where individuals speak only Spanish and other areas where they speak only English (language variable L). The linguistic region correlates only slightly with urbanity. For example:

S(1)-U(urban)-L(engl)
S(1)-U(country)-L(engl)
S(1)-U(urban)-L(spanish)
S(1)-U(country)-L(spanish)

S(2)-U(urban)-L(engl)
S(2)-U(urban)-L(spanish)

S(3)-U(urban)-L(engl)
S(3)-U(country)-L(engl)

S(4)-U(urban)-L(engl)
S(4)-U(country)-L(spanish)

where the number indicates the state.
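To make the partial nesting concrete, the example rows above can be tabulated. This is only a minimal Python sketch of the four example states; the names `rows`, `u_levels`, `l_levels`, `su_groups` and `sl_groups` are hypothetical and the data are just the rows listed above, not real coverage data. It lists which levels of U and L are observed in each state and which state-by-level combinations exist at all. As far as I understand, a grouping term such as S:U can only have levels for combinations that actually occur in the data.

```python
from collections import defaultdict

# Example rows from the post: (state, urbanity, language)
rows = [
    (1, "urban", "engl"), (1, "country", "engl"),
    (1, "urban", "spanish"), (1, "country", "spanish"),
    (2, "urban", "engl"), (2, "urban", "spanish"),
    (3, "urban", "engl"), (3, "country", "engl"),
    (4, "urban", "engl"), (4, "country", "spanish"),
]

# Levels of U and L observed within each state
u_levels = defaultdict(set)
l_levels = defaultdict(set)
for state, urbanity, language in rows:
    u_levels[state].add(urbanity)
    l_levels[state].add(language)

# Observed state-by-level combinations (candidate groups for S:U and S:L)
su_groups = sorted({(s, u) for s, u, _ in rows})
sl_groups = sorted({(s, l) for s, _, l in rows})

for state in sorted(u_levels):
    print(state, sorted(u_levels[state]), sorted(l_levels[state]))
print("S:U groups:", len(su_groups), "S:L groups:", len(sl_groups))
```

States that show only one level of U or L (state 2 for U, state 3 for L) are the ones where the S:U or S:L group coincides with the state itself.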

The situation is that U is nested in S, but not every S has both levels of U. The same holds for the linguistic variable L. If U and L were each completely nested in S, the model would look like this (in R terminology):
y ~ (1|S) +(1|S:U) + (1|S:L)
Is it possible to estimate odds ratios with this model even if the variables are not completely nested?
Or should another model be used, for example:
y ~ (1|S) + U + L
y: 0/1 indicator of vaccination
(1|S): random effect (random intercept per state)
U and L: fixed effects.
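Written out, I understand this second model as a random-intercept logistic regression. The symbols below ($\beta_0$, $\beta_U$, $\beta_L$, $b_s$, $\sigma_S^2$) are my own notation and an assumption about how the formula is interpreted, with $U$ and $L$ coded as 0/1 indicators for individual $i$ in state $s$:

$$
\operatorname{logit}\,\Pr(y_{is} = 1) = \beta_0 + \beta_U U_{is} + \beta_L L_{is} + b_s, \qquad b_s \sim \mathcal{N}(0, \sigma_S^2)
$$

Under this reading, the odds ratios I am after would be $\exp(\beta_U)$ and $\exp(\beta_L)$, each conditional on the state effect $b_s$. States with only one level of U or L would still contribute to $\beta_0$ and $\sigma_S^2$ but not directly to the within-state contrast for that variable.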

Or does it make more sense to use only random effects?
y ~ (1|S) + (1|U) + (1|L)

A problem that arises when using fixed effects: because of the high number of individuals, they would give significant results or narrow CIs.
I would be very grateful if someone could give me some hints on how to handle partially nested variables.
