- Thread starter alibumay3

g(BMI) = B0 + B1 * X1 + B2 * X2 + .... + Bp * Xp,

where g(x) is some linear or non-linear transformation. Experiment with the following values of g(x):

g(x) = x,

g(x) = log(x + A), for some parameter A,

g(x) = x^m, for some parameter m,

g(x) = exp{x}.

See what insights this gives you. Choose the best g(x) using BIC or AIC... In general, discretizing a continuous variable into just a few categories is a loss of information. And information is money.
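Here is a rough sketch of how such a comparison could be set up in Python. Everything in it is an assumption made for illustration: the data are simulated (not real BMI measurements), the covariates `education` and `age` are made up, and the values A = 0 and m = 2 are arbitrary picks. One detail that is easy to miss: AIC values from models with differently transformed responses are only comparable if the likelihood is expressed on the original BMI scale, so the log of the Jacobian |g'(BMI)| is added to each log-likelihood.

```python
import numpy as np

def gaussian_aic(y_t, X, log_jacobian):
    """AIC of an OLS fit to the transformed response y_t = g(BMI),
    with the likelihood moved back to the original BMI scale."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y_t, rcond=None)
    resid = y_t - X @ beta
    sigma2 = resid @ resid / n                       # ML estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1) + log_jacobian
    return 2 * (k + 1) - 2 * loglik                  # k coefficients + 1 variance

# Simulated data, for illustration only (log-normal BMI by construction).
rng = np.random.default_rng(0)
n = 1000
education = rng.integers(8, 21, size=n).astype(float)   # hypothetical years of schooling
age = rng.uniform(20, 70, size=n)                        # hypothetical age
X = np.column_stack([np.ones(n), education, age])
bmi = np.exp(3.4 - 0.01 * education + 0.002 * age + rng.normal(0, 0.2, size=n))

A, m = 0.0, 2.0   # arbitrary illustrative parameter values
# Each entry: (g(BMI), sum of log|g'(BMI)|)
candidates = {
    "identity": (bmi, 0.0),
    "log": (np.log(bmi + A), -np.sum(np.log(bmi + A))),
    "power": (bmi ** m, np.sum(np.log(abs(m)) + (m - 1) * np.log(bmi))),
    "exp": (np.exp(bmi), np.sum(bmi)),
}

aic = {name: gaussian_aic(y_t, X, lj) for name, (y_t, lj) in candidates.items()}
for name, value in sorted(aic.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} AIC = {value:.1f}")
```

Lower AIC is better. For BIC, replace the penalty `2 * (k + 1)` with `np.log(n) * (k + 1)`. In practice A and m would be varied over a grid rather than fixed.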

I was just wondering if the bimodal or multinomial regression

But you should use none of them. Use BMI on its original scale and model it with the normal distribution in an ordinary regression model. By categorizing you will lose information.

But which do you think is easier to understand: the difference between a BMI of 32 and a BMI of 23, or an odds ratio? Most people do not know what an odds ratio is.

But let's talk about stats. It should be obvious to most people that you have lost some information if you just classify people as 'obese' or 'not obese' instead of saying that this person has a BMI of 32 and that person has a BMI of 23. The latter statement contains more information.

In more technical statistical terms, a binomial model will be less efficient (its estimates have higher variance) and will have less power than a model based on the actual BMI and the normal distribution. So with a binomial model you will essentially be throwing away information. As a rule of thumb you will throw away about 30 percent of your data. (I saw a paper where it was shown that 38 percent of the data were lost.) You can go and tell the people who took the measurements that you are now going to throw away 30 percent of their data, and then see how happy they will be.
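If you want to see the loss of power for yourself, a small simulation is enough. The sketch below is only an illustration under made-up assumptions: two education groups of 100 people each, group means of 28.5 and 27.0, a common standard deviation of 4, and a cutoff at BMI = 30. It compares a test on the actual BMI values with a test on the 'obese' / 'not obese' proportions, both using the normal approximation.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_group, n_sims = 100, 2000
mean_low, mean_high, sd = 28.5, 27.0, 4.0   # assumed group means and SD (illustrative)
cutoff, z_crit = 30.0, 1.96                 # obesity cutoff, two-sided 5% critical value

reject_continuous = 0
reject_dichotomized = 0
for _ in range(n_sims):
    low = rng.normal(mean_low, sd, n_per_group)     # BMI, lower-education group
    high = rng.normal(mean_high, sd, n_per_group)   # BMI, higher-education group

    # Test on the original scale: difference in mean BMI.
    se = np.sqrt(low.var(ddof=1) / n_per_group + high.var(ddof=1) / n_per_group)
    z_cont = (low.mean() - high.mean()) / se

    # Test after dichotomizing: difference in the proportion above the cutoff.
    p_low, p_high = (low > cutoff).mean(), (high > cutoff).mean()
    pooled = (p_low + p_high) / 2                   # equal group sizes
    se_bin = np.sqrt(2 * pooled * (1 - pooled) / n_per_group)
    z_bin = (p_low - p_high) / se_bin if se_bin > 0 else 0.0

    reject_continuous += abs(z_cont) > z_crit
    reject_dichotomized += abs(z_bin) > z_crit

power_continuous = reject_continuous / n_sims
power_dichotomized = reject_dichotomized / n_sims
print(f"power using actual BMI:      {power_continuous:.2f}")
print(f"power using obese/not obese: {power_dichotomized:.2f}")
```

How much power is lost depends on where the cutoff sits relative to the distribution, so the numbers from this toy setup should not be read as the 30 percent rule of thumb.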

(You also risk angry shouts from the statistician Frank Harrell: "Don't dichotomize!" Read his blog and Twitter.)

So do a model like this:

BMI = a + b*education + other variables + epsilon

where epsilon is assumed to be normally distributed. (Here you do not throw away information.)

Then, if you absolutely must show an odds ratio, use the model estimated above: it can be used to estimate the fraction that will be above BMI = 30, that is, the model estimates the probability of being above 30. From those probabilities you get the odds ratio.
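A minimal sketch of that last step, assuming the model has already been fitted. The numbers for a, b and the residual standard deviation are hypothetical placeholders, not estimates from any data, and the "other variables" are left out to keep it short. Under the normal model, P(BMI > 30) = 1 - Phi((30 - mean) / sigma), where Phi is the standard normal distribution function.

```python
from math import erf, sqrt

def prob_above(threshold, mean, sigma):
    """P(BMI > threshold) when BMI ~ Normal(mean, sigma^2)."""
    z = (threshold - mean) / sigma
    return 0.5 * (1.0 - erf(z / sqrt(2.0)))

def odds(p):
    return p / (1.0 - p)

# Hypothetical fitted values: BMI = a + b*education + epsilon
a, b, sigma = 31.0, -0.25, 4.0

mean_12 = a + b * 12   # predicted mean BMI at 12 years of education
mean_16 = a + b * 16   # predicted mean BMI at 16 years of education

p_12 = prob_above(30.0, mean_12, sigma)
p_16 = prob_above(30.0, mean_16, sigma)
odds_ratio = odds(p_16) / odds(p_12)

print(f"P(BMI > 30 | 12 years) = {p_12:.3f}")
print(f"P(BMI > 30 | 16 years) = {p_16:.3f}")
print(f"odds ratio (16 vs 12 years) = {odds_ratio:.3f}")
```

Note that an odds ratio obtained this way depends on which two education levels (and which values of the other variables) you compare, unlike the single constant odds ratio a logistic model reports. The sketch also ignores the uncertainty in the estimated a, b and sigma.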