Logistic regression, mean or median of independent variable.

Greetings great minded statisticians!

I have an issue with categorizing my continuous variable. I have split a continuous variable with a negative skew into three categories. I am using logistic regression and I want to see whether my three categories of omega-3 fatty acids are associated with a lower risk of being in the "psychological distress" category (my dependent variable is 0 = no distress, 1 = psychological distress).

I suspect that when calculating odds ratios for the differences between the categories, Stata uses the means of the three categories, compares them against each other, and bases the odds ratio on the relative differences between those means. This would give me somewhat inflated differences because of the rather extreme values in category three, so I am wondering whether I should program Stata to use the medians of the three categories instead of the means. Could someone help me get around this?


I'm not a Stata person, but I'm sure you can look in the program documentation to see how it runs this. Most likely the odds ratios are calculated by comparing the odds in each group with the odds in the reference group, while controlling for any covariates. So the procedure groups all of the values in a category together and gives that category the same probability. I'm not really sure what you are trying to do differently; if you do not like the large groupings, you can always recategorize them or enter the variable as a continuous variable.
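
To make the reference-group comparison concrete, here is a minimal Python sketch (not Stata, and not necessarily how Stata computes it internally). The data, the cut points and the function names are all made up for illustration. With no other covariates in the model, the odds ratio for a category equals the ratio of odds taken straight from the outcome counts, so replacing one value in category three with an extreme value changes nothing as long as it stays in the same category.

```python
def categorize(x, cut1, cut2):
    """Assign a value to category 0, 1 or 2 using two hypothetical cut points."""
    if x < cut1:
        return 0
    if x < cut2:
        return 1
    return 2


def odds_ratios(values, outcomes, cut1, cut2):
    """Odds ratios of categories 1 and 2 versus reference category 0.

    counts[k] holds [number without distress, number with distress].
    """
    counts = [[0, 0], [0, 0], [0, 0]]
    for x, y in zip(values, outcomes):
        counts[categorize(x, cut1, cut2)][y] += 1
    ref_odds = counts[0][1] / counts[0][0]
    return [(c[1] / c[0]) / ref_odds for c in counts[1:]]


# Made-up example data: 12 people, outcome 1 = psychological distress.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
outcomes = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0]

# Same data, but the largest value in category three is now extreme.
extreme = values[:-1] + [500]

or_original = odds_ratios(values, outcomes, cut1=5, cut2=9)
or_extreme = odds_ratios(extreme, outcomes, cut1=5, cut2=9)

print(or_original)
print(or_extreme)
```

Both calls give identical odds ratios, because only the category membership and the outcome enter the calculation. With covariates in the model the odds ratios are adjusted, but the within-category means or medians of the original variable still play no role.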


hlsmith is right - it has nothing to do with means or medians - by categorising your continuous variable you've thrown away all other information about that variable.

Generally it's best not to categorise continuous variables - use a flexible modeling approach such as splines or fractional polynomials instead. See for example this link
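
As a rough idea of what the spline approach means in practice, here is a minimal Python sketch of the simplest case, a linear spline basis. The knot positions and the function name are hypothetical, and in a real analysis you would more likely use your software's own spline or fractional polynomial tools. Each observation keeps its actual value, plus extra terms that let the slope change at each knot.

```python
def linear_spline_basis(x, knots):
    """Return [x, max(0, x - k1), max(0, x - k2), ...] for one observation."""
    return [x] + [max(0.0, x - k) for k in knots]


knots = [5.0, 9.0]  # hypothetical knot positions, chosen only for illustration

for x in (2.0, 7.0, 12.0):
    print(x, linear_spline_basis(x, knots))
```

These columns would be entered into the logistic regression in place of the category indicators, so the model uses the full information in the continuous variable while still allowing a non-linear relationship.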