[SPSS] Multivariable logistic regression - how to transform to achieve linearity?

#1
Hi,

I am doing a project to identify variables that can predict rehospitalisations. There are a few categorical and continuous variables that I am considering. How do I transform the continuous variables so that they have a linear relationship with the logit?

I am more adept at using SPSS, but if advice is given for another statistical package, I am still willing to give it a try.

Thanks for the help:D.
 

noetsi

Fortran must die
#2
First you should test whether they are non-linear. :p If you find them to be non-linear, a series of transformations using roots and powers can be used to make the relationship more linear. Tukey's ladder is the easiest place to start.

http://onlinestatbook.com/2/transformations/tukey.html

Whatever software you use, the process is the same. You create new variables from Y (most commonly, although sometimes you transform X), applying power transformations until the relationship is linear. It is essentially a trial-and-error process: you start with the less extreme transformations and move to the more extreme ones until the relationship is linear.

Note that this process won't always work. Some relationships are inherently non-linear. In some cases this can be addressed, but I don't remember the process (which is complex at best).
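A minimal Python sketch of the ladder itself (the OP uses SPSS, so treat this as illustration only). It assumes the variable being transformed is a strictly positive continuous predictor, and it uses one common convention, -(x^λ) for negative λ so that the order of the values is preserved; check the linked page for the exact definition you want. The variable name `los_days` and its values are hypothetical.

```python
import math

# Powers on Tukey's ladder; lam = 1 leaves x unchanged, lam = 0 stands for the log.
LADDER = (-2, -1, -0.5, 0, 0.5, 1, 2)


def tukey_power(values, lam):
    """Tukey ladder-of-powers transform of strictly positive values.

    lam == 0 means ln(x); for lam < 0 the result is negated (-(x**lam)) so the
    transformed values keep the same order as the originals.
    """
    if any(v <= 0 for v in values):
        raise ValueError("ladder transformations need strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    sign = -1.0 if lam < 0 else 1.0
    return [sign * v ** lam for v in values]


los_days = [1.0, 2.0, 4.0, 8.0, 16.0]  # hypothetical continuous predictor
for lam in LADDER:
    print(lam, [round(t, 3) for t in tukey_power(los_days, lam)])
```

In practice you would refit the model with each transformed version in turn and keep the mildest power for which the linearity check no longer flags a problem.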
 
#3
I don't understand the question. The logit model (because of the link function) is inherently non-linear to begin with, so why should you check linearity?
 

noetsi

#4
The OP's comment:

How do I transform the continuous variables such that the it has a linear relationship with the logit?
The logit model is non-linear between the dichotomous Y and the predictor. It is linear between the logit and the predictor, which is why you test for linearity in logistic regression.

Or so I have always been taught. If not, that is a shock to me....
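To see directly what "linear between the logit and the predictor" means, one option is to sort cases by x, split them into groups, and compute an empirical log-odds for each group; if the (mean x, log-odds) points fall roughly on a straight line, linearity is plausible. This Python sketch rests on choices made for this example rather than anything from the thread: equal-count bins, and 0.5 added to both counts so a group of all 0s or all 1s does not produce log(0). The data are made up.

```python
import math


def binned_empirical_logits(x, y, n_bins=5):
    """Group cases by x into roughly equal-sized bins and return a list of
    (mean x, empirical logit) pairs, one per non-empty bin.

    0.5 is added to both the event and non-event counts so that a bin with
    all 0s or all 1s does not give log(0).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    pairs = sorted(zip(x, y))
    n = len(pairs)
    result = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue
        events = sum(yy for _, yy in chunk)
        non_events = len(chunk) - events
        mean_x = sum(xx for xx, _ in chunk) / len(chunk)
        result.append((mean_x, math.log((events + 0.5) / (non_events + 0.5))))
    return result


x = list(range(10))                   # made-up predictor
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]    # made-up 0/1 outcome
for mean_x, elogit in binned_empirical_logits(x, y, n_bins=2):
    print(mean_x, round(elogit, 3))
```

Plotting the returned pairs in any package gives a quick visual check.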
 
#6
This is interesting as I've never used these ideas in applied research.

I suppose what you are saying is that you look at a plot of x and y and see if you find a straight line because the relationship should be linear? Is this correct?
 

noetsi

#7
Actually, I don't use plots to detect non-linearity. I specify an interaction term between the predictor and its square. If this interaction term is significant for a given predictor, non-linearity is indicated. In practice this rarely comes up in my (also purely applied) analyses, because virtually all of my predictors are dummy variables, and those are always linear. Only continuous predictors (which I almost never have) can be non-linear.

Correcting it would involve transforming the variable (either X or Y) until the term is no longer significant, although you could also use the old-fashioned method of looking at a graph to see when the relationship becomes linear.
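A self-contained Python sketch of the squared-term check, written without a statistics library so every step is visible. Two assumptions to flag: it adds x² (the squared-term version mentioned later in the thread; note that (x*x)*x taken literally is x³), and it judges the extra term with a likelihood-ratio test comparing the models with and without x², rather than the Wald p value SPSS prints for each coefficient. Both test whether the x² coefficient is zero. The Newton-Raphson fit has no safeguards against separation or overflow, and the data are made up.

```python
import math


def _solve(a, b):
    """Solve the linear system a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression via Newton-Raphson.

    rows: one list of predictor values per case (an intercept column is added).
    Returns (betas, log_likelihood).
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # X'WX, the information matrix
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * xi[a]
                for c in range(k):
                    info[a][c] += w * xi[a] * xi[c]
        step = _solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        ll += yi * eta - math.log1p(math.exp(eta))  # Bernoulli log-likelihood
    return beta, ll


def squared_term_lr_test(x, y):
    """Compare logit = b0 + b1*x with logit = b0 + b1*x + b2*x^2.

    Returns (LR statistic, p value) on 1 degree of freedom.
    """
    _, ll_linear = fit_logistic([[v] for v in x], y)
    _, ll_quad = fit_logistic([[v, v * v] for v in x], y)
    lr = max(0.0, 2.0 * (ll_quad - ll_linear))
    p_value = math.erfc(math.sqrt(lr / 2.0))  # upper tail of chi-square(1)
    return lr, p_value


x = list(range(1, 21))                                                # made-up predictor
y = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1]    # made-up 0/1 outcome
lr, p = squared_term_lr_test(x, y)
print(f"LR = {lr:.3f}, p = {p:.4f}")
```

A small p value here plays the same role as a significant squared term in the SPSS output.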
 

hlsmith

Omega Contributor
#8
To paraphrase and confirm that I followed:

In logistic regression, you should check linearity between a continuous independent variable and the logit.
You can do this by introducing into the model an interaction term consisting of the variable multiplied by its square ((continuous variable * continuous variable) * continuous variable). If it is significant, you have non-linearity.

I wanted to verify the continuous*(continuous^2) term, and whether you would also include the continuous variable itself in the model at the same time.
 

noetsi

#9
Note that the formal test of this creates an interaction between a continuous variable and its log (so the term is log(x)*x). This is the Box-Tidwell test. I was told you could use the square of the continuous variable instead of its log in the interaction term.
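Building the Box-Tidwell predictor is just a computed column; a short Python sketch follows (in SPSS the same column would come from a COMPUTE statement). The function names `box_tidwell_term` and `squared_term` and the example values are hypothetical. The natural log requires strictly positive x; shifting x by a constant first is a common workaround, but that choice is an assumption not discussed in the thread.

```python
import math


def box_tidwell_term(x):
    """Return x * ln(x) for each value: the extra Box-Tidwell predictor.

    Enter it in the model alongside x itself; a significant coefficient on
    this term suggests the logit is not linear in x. Requires x > 0.
    """
    if any(v <= 0 for v in x):
        raise ValueError("Box-Tidwell term needs strictly positive x (consider shifting x first)")
    return [v * math.log(v) for v in x]


def squared_term(x):
    """The simpler substitute mentioned in the thread: x squared."""
    return [v * v for v in x]


ages = [23.0, 41.0, 67.0, 85.0]  # hypothetical predictor values
print(box_tidwell_term(ages))
print(squared_term(ages))
```

Each new column is entered together with the original x; only the coefficient on the new column is the linearity test.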
 

noetsi

#11
I modified what I said considerably. You should probably look there instead of at what I wrote originally.

T&F is Tabachnick and Fidell. I thought we had used this abbreviation before. :p

You enter the original variables and their interaction terms as predictors. For the test of linearity, however, it is the significance of the interaction terms that matters. T&F confuse this considerably in their comments on pp. 474-475, where they seem to be referring to the p value of the continuous variable itself. Page 443 and the comments I linked in my last post clarify what you are really supposed to do: look at the interaction term.

Note also their adjustment of the alpha value to control familywise (FW) error, discussed on p. 474.
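The exact rule T&F use is not quoted in the thread, so the Python sketch below assumes a plain Bonferroni split, which is one common way to control familywise error: with m interaction terms tested, each is judged at alpha / m. The term names and p values are hypothetical.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test alpha that keeps the familywise error rate at or below `alpha`."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def flag_nonlinear(p_values, alpha=0.05):
    """Return names of interaction terms significant at the Bonferroni-adjusted alpha."""
    cutoff = bonferroni_alpha(alpha, len(p_values))
    return sorted(name for name, p in p_values.items() if p < cutoff)


# hypothetical p values for three Box-Tidwell terms
print(flag_nonlinear({"age_x_ln_age": 0.004, "los_x_ln_los": 0.03, "visits_x_ln_visits": 0.4}))
# → ['age_x_ln_age']
```

With three terms the per-term cutoff is about 0.0167, so a term at p = 0.03 that would pass an unadjusted 0.05 test is not flagged.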
 
#12
Thanks for the help! Using your method, I have managed to transform the logit equation so that it is linear. However, I am still unclear about what I should do now to include the changes in my logistic regression. Could you help me with it?


 

noetsi

#13
I am not sure I understand the question. If you transformed a variable to make it linear, you replace the non-linear variable with the new transformed one and then run the model as you would with any variable. Note that in interpreting the results you need to refer to the transformed measurement (for example, to the logged variable rather than the original scale). Alternatively, you can transform back to the original scale after running the model (which many authors highly recommend, but which I have never done myself, so I cannot provide details of the resulting interpretation).
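For a logged predictor, transforming back is a one-line calculation. A Python sketch assuming the natural log was used: if logit(p) = b0 + beta*ln(x), multiplying x by a factor k adds beta*ln(k) to the logit, so the odds are multiplied by k^beta. The coefficient value is hypothetical.

```python
import math


def odds_ratio_log_predictor(beta, factor):
    """Odds ratio for multiplying x by `factor` when the model contains ln(x).

    logit(p) = b0 + beta * ln(x), so moving from x to factor * x adds
    beta * ln(factor) to the logit, i.e. multiplies the odds by factor ** beta.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return math.exp(beta * math.log(factor))


beta_ln_x = 0.8  # hypothetical coefficient on ln(x)
print(odds_ratio_log_predictor(beta_ln_x, 2.0))  # odds multiplier when x doubles
print(odds_ratio_log_predictor(beta_ln_x, 1.1))  # odds multiplier when x rises by 10%
```

This is why a logged predictor is usually described in terms of proportional changes in x rather than one-unit changes.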
 
#14
Thanks for the reply. But in binary logistic regression, I am transforming the logit, ln(p/(1-p)), am I not? It is not y, which is 1 or 0, that I use to run the logistic regression.

 

noetsi

#15
All the transformations I have seen have been of the raw data, not of a calculated logit. Even with Y you are transforming the 0s and 1s (that is, the original data). But there could well be transformations I have not worked with.
 
#16
In linear regression models and in linear logistic regression, the “linearity” means that the model is linear in the parameters, often called “betas”.

Let LP be a linear predictor:

LP= beta0 + beta1*x1+beta2*x2

It doesn't matter if the x's are non-linear transformations, like x1 = log(x01) and x2 = (x02)^2.

Replace the betas with z's and the x's with k's if that makes it clearer:
LP = z0 + z1*k1 + z2*k2

(When searching for the least-squares or maximum-likelihood estimates, the x's are treated as observed constants (the k's) and the betas (the z's) are varied to find the minimum or maximum.)
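A small Python sketch of the point being made here: the x's are computed first (they can be any non-linear functions of the raw measurements), and LP is then an ordinary weighted sum, which is linear in the betas. The raw values and coefficients are hypothetical.

```python
import math


def linear_predictor(betas, xs):
    """LP = beta0 + beta1*x1 + beta2*x2 + ...; xs are the (possibly transformed) x values."""
    if len(betas) != len(xs) + 1:
        raise ValueError("need one beta per x plus an intercept")
    return betas[0] + sum(b * x for b, x in zip(betas[1:], xs))


x01, x02 = 5.0, 3.0              # hypothetical raw measurements
xs = [math.log(x01), x02 ** 2]   # x1 = log(x01), x2 = (x02)^2, as in the post
betas = [-1.0, 0.5, 0.2]         # hypothetical coefficients

lp = linear_predictor(betas, xs)
# Linear in the betas: doubling every beta doubles LP,
# even though LP is not a straight-line function of x01 or x02.
print(lp, linear_predictor([2 * b for b in betas], xs))
```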

In linear regression: E(Y) = mu = LP

In linear logistic regression (Y= 1 or 0):

E(Y) =p and

log(p/(1-p)) = LP

which can be solved for p, giving the non-linear inverse of the logit link:
p = exp(LP)/(1+exp(LP))

This is an S-shaped function.
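The two directions of the link can be written as a pair of Python functions; this sketch checks that logit(p) gives back LP. The rearrangement in `inv_logit` is algebraically the same formula, chosen only so that very large or very small LP values do not overflow.

```python
import math


def inv_logit(lp):
    """p = exp(LP) / (1 + exp(LP)), rearranged so large |LP| cannot overflow."""
    if lp >= 0:
        return 1.0 / (1.0 + math.exp(-lp))
    e = math.exp(lp)
    return e / (1.0 + e)


def logit(p):
    """log(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))


for lp in (-4, -2, 0, 2, 4):
    p = inv_logit(lp)
    print(lp, round(p, 3), round(logit(p), 6))  # logit(p) recovers LP
```

The printed p values trace out the S shape: close to 0 for very negative LP, 0.5 at LP = 0, and close to 1 for large LP.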

- - -

Of course the model must fit! Maybe the original LP does not fit, and a squared term needs to be included:
LP2 = beta0 + beta1*x1+beta2*x2 + beta3*(x2)^2
But it will still be a linear model since it is linear in its parameters – the betas.
- - -

When you run this in software, you just declare that you want to run a logit (logistic regression) model and specify which variable is the 0/1 variable. So you absolutely do not transform the dependent variable. You then also declare which variables are your independent (x) variables.

Transforming a continuous variable by classifying it into “high” and “low” throws away information.