Thought experiment (standardized binary variable)

hlsmith

Not a robit
#1
OK, I am working on a corrected LASSO logistic model, which accounts for the model-building dependence (i.e., the variables were not declared a priori but selected through the modeling process). All candidate covariates were standardized to put them on a common scale before entering the model.

But for this question, I believe we can just call this a logistic regression question. So I standardize all candidate variables entered into the model (3 are continuous and the remaining ~10 are binary). We are ignoring the continuous variables going forward in this post, since their interpretation is straightforward. During standardization, each binary variable's value (0 or 1) had the mean (its prevalence) subtracted and was then divided by the variable's standard deviation. So one standardized binary variable ends up taking the values -3.10 or 0.32, while another takes -0.72 or 1.38.
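
To make the arithmetic concrete, here is a minimal sketch of that standardization, assuming the population standard deviation sqrt(p * (1 - p)) was used (software that divides by n - 1 will give slightly different numbers). The function names are made up for illustration. Running it backwards on the two example pairs suggests prevalences of roughly 0.91 and 0.34.

```python
import math


def standardize_binary(prevalence):
    """Return the two values a 0/1 variable takes after z-scoring.

    Assumption: the population standard deviation sqrt(p * (1 - p)) is used.
    """
    sd = math.sqrt(prevalence * (1.0 - prevalence))
    return (0.0 - prevalence) / sd, (1.0 - prevalence) / sd


def prevalence_from_values(low, high):
    """Recover the prevalence from the two standardized values.

    Since low = -p / sd and high = (1 - p) / sd, we have high - low = 1 / sd,
    and therefore p = -low / (high - low).
    """
    return -low / (high - low)


# The two example pairs quoted in the post.
for low, high in [(-3.10, 0.32), (-0.72, 1.38)]:
    p = prevalence_from_values(low, high)
    sd = 1.0 / (high - low)
    print(f"values ({low}, {high}) -> prevalence ~ {p:.3f}, sd ~ {sd:.3f}")
```

Note that the gap between the two standardized values is always 1/sd, which is what matters when converting back to a 0-to-1 comparison.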

So, a complaint about doing this is that standardized binary variables are much more difficult to interpret, though it was felt necessary to unify the candidate variable scales given the modeling approach. So my question/comment: when interpreting the output log odds converted to odds ratios, am I now just saying the odds of the outcome are XX times greater for a 1 standard deviation increase in the prevalence of the binary variable?

I will happily entertain any feedback or comments - Thanks!
 
#2
In "usual" linear regression (a multiple regression estimated by least squares, LS) the scale doesn't matter. If we change the scale from cm to dm (dividing "x" by 10), the corresponding regression coefficient will just be 10 times larger and the fitted values stay the same. So LS linear regression is scale invariant (in contrast to PCA, where scales matter), and it doesn't matter whether we standardize in LS regression.
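
A small sketch of the cm-to-dm point, using made-up data and a hand-rolled one-predictor least-squares fit (the helper name `ols_fit` is just for illustration):

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up illustration data: a length in cm and some response y.
x_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
y = [50.0, 58.0, 71.0, 79.0, 92.0]
x_dm = [v / 10.0 for v in x_cm]  # same lengths expressed in dm

a_cm, b_cm = ols_fit(x_cm, y)
a_dm, b_dm = ols_fit(x_dm, y)

# The slope absorbs the change of units; the predictions do not move.
fitted_cm = [a_cm + b_cm * v for v in x_cm]
fitted_dm = [a_dm + b_dm * v for v in x_dm]
print("slope per cm:", b_cm, "slope per dm:", b_dm)
```

The slope per dm comes out 10 times the slope per cm, and the two sets of fitted values agree.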

But is LASSO regression (or ridge regression) scale invariant? I don't know. I didn't find anything in a quick search.
 

hlsmith

Not a robit
#3
I believe we can get away with just calling this logistic regression (but there is a penalty). I am out of the office, but tomorrow I may try running it and plain logistic regression with different scalings. In LASSO etc., though, the variables need to be on the same scale, much like in nearest neighbors, where variables that are not on similar scales end up with outsized influence on the distance calculations.
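
For what it's worth, the scale dependence can be seen in a toy case. The sketch below is a linear (not logistic) LASSO with one predictor and an unpenalized intercept, where the solution has a closed form via soft-thresholding. The data, the penalty value, and the function names are all made up for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam, returning 0 inside [-lam, lam]."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_slope(x, y, lam):
    """Slope minimizing (1 / (2n)) * sum((y - a - b * x)^2) + lam * |b|.

    The intercept a is unpenalized, so x and y are centered first.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    rho = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    return soft_threshold(rho, lam) / var_x


# Made-up illustration data: the same predictor in cm and in dm.
x_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
y = [50.0, 58.0, 71.0, 79.0, 92.0]
x_dm = [v / 10.0 for v in x_cm]

lam = 5.0  # arbitrary penalty, chosen only for illustration
b_cm = lasso_slope(x_cm, y, lam)
b_dm = lasso_slope(x_dm, y, lam)

# If the fit were scale invariant, b_dm / 10 would equal b_cm.
print("per-cm slope from cm fit:", b_cm)
print("per-cm slope implied by dm fit:", b_dm / 10.0)
```

With the penalty set to 0 the two fits agree, as in ordinary least squares. With a positive penalty the variable measured in the larger unit is shrunk harder, which is the usual argument for standardizing before a penalized fit.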
 

hlsmith

Not a robit
#4
@GretaGarbo et al. I am revisiting this topic. To rephrase my original question: "How would you interpret a standardized binary variable's estimated odds ratio (OR) from a logistic regression model?"

UPDATE: The standardization process took the 0 or 1 value of the binary variable, subtracted the mean (AKA the prevalence of the positive binary value), then divided that difference by the standard deviation of the binary variable, i.e., (0 - mean)/std and (1 - mean)/std.

So I am reading a 1 unit increase in the logistic regression output as a one standard deviation increase in the prevalence of the binary variable. So, say, an OR of 1.5 would be interpreted as:

"A one standard deviation increase in the prevalence of the binary variable is associated with 1.5 times greater odds of the outcome, given the fitted model."
 

hlsmith

Not a robit
#6
Thanks for some feedback @Dason

I was just looking for some feedback/confirmation. That, and a standard deviation change in a continuous variable is a little more palatable than thinking about a transformed binary variable.
 

Dason

Ambassador to the humans
#7
True. But if it helps conceptually, you can just think of the original binary variable as a continuous variable (forget that it only takes two values; they're both numbers, right?)

Now in terms of applying the interpretation, it probably makes more sense to think in terms of going from the 0 level to the 1 level, right? If that's the case, you probably do want to look at something other than a 1-unit change on the scaled variable when looking at your odds ratios. Going from 0 to 1 on the original variable is a change of 1/sd* on the scaled variable (where sd* is the sd of the original binary variable), so if you convert things so you're looking at the odds ratio for a 1/sd* change on the scaled variable, that will get you the odds ratio for going from 0 to 1.

But all of this is to say that there fundamentally isn't really anything different in how the regression treats a binary and a continuous variable. If you're interpreting an odds ratio for a 1 unit change on the scaled version, that means you're looking at the odds ratio for a 1 sd change on the unscaled version. For a continuous variable that probably makes sense to actually interpret. For a binary it probably doesn't make as much sense to think about what happens when you go from a 0 on the binary scale to a 0.37 on the binary scale (or whatever that sd ends up being).
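
As a rough sketch of the conversion: the scaled variable is (x - mean)/sd, so going from 0 to 1 on the original variable spans 1/sd scaled units, and the coefficient on the original scale is the scaled coefficient divided by sd. The function name is made up, and the numbers are just the OR of 1.5 and the sd of 0.37 mentioned in this thread, not results from a fitted model.

```python
import math


def or_zero_to_one(or_per_scaled_unit, sd_binary):
    """Convert an OR per 1 unit of the standardized variable into the
    OR for moving the original binary variable from 0 to 1.

    Assumption: the variable was scaled as (x - mean) / sd_binary, so
    beta_raw = beta_scaled / sd_binary.
    """
    beta_scaled = math.log(or_per_scaled_unit)
    beta_raw = beta_scaled / sd_binary
    return math.exp(beta_raw)


# Illustrative inputs taken from the discussion: OR = 1.5, sd = 0.37.
print(round(or_zero_to_one(1.5, 0.37), 2))  # roughly 2.99
```

Equivalently, the 0-to-1 odds ratio is the per-sd odds ratio raised to the power 1/sd, so the rarer (or more common) the binary variable, the more the two numbers differ.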