Is Multivariate Analysis possible?

#1
Hi, I am a medical student working on my final graduation paper.
My work is about two cohorts of patients: the first reactivated a viral infection during chemotherapy courses and the second did not.
I identified 4 potential risk factors for viral reactivation: the kind of chemo (only 2 types in play, so it's a binomial variable), the blood level of specific antiviral antibodies, the total antibodies, and the blood level of lymphocytes (the last 3 are continuous variables). Then I compared the two cohorts.
I performed Fisher's exact test for the first variable and the Mann-Whitney U test on the other 3 variables. All 4 variables were statistically significant and associated with viral reactivation in this univariate setting.
Since I want to strengthen my work, I would like to perform some kind of multivariate analysis, but I don't know if it's possible (given that I have both a categorical variable and continuous variables to consider) and I don't know how to proceed.
Thank you very much for your answers!
 

j58

Active Member
#2
So, the outcome is binary: reactivation or no reactivation? And the predictors are a mix of continuous and categorical variables? This is a classic logistic regression problem. Get the book Logistic Regression: A Self-Learning Text by Kleinbaum.
 
#3
Thank you j58, I'll check the book for sure!
I was wondering, though, whether I could transform the continuous variables into binomial ones by choosing a cut-off level for each variable (e.g. 400 lymphocytes, with each patient being above or below that level) and confirm their statistical significance with Fisher's tests.
That way I would have a table of 1 binomial outcome and 4 binomial predictors.
Is that viable, and does it make any sense?
 

j58

Active Member
#4
You can do it, but you shouldn't. Significance tests are binary. At best, they tell you whether you have enough information to conclude that there is an effect, but they don't tell you how large (or small) the effect is. You should always estimate the effect size with a point estimate and a confidence interval. Regress the outcome (which is binary, right?) on all the predictors (and their interactions; this will be explained in the book). Leave the continuous predictors on their original scale. From the regression coefficients you can make statements like "a 10-unit increase in lymphocytes predicts a 5% increase in the odds of reactivation." You can't get that information from a mere significance test, nor from a regression on a dichotomized predictor. You also get a confidence interval for the effect estimate and a p-value, so you get the significance test for free.
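
To make the step from coefficient to effect statement concrete, here is a minimal Python sketch. The intercept and slope are made-up numbers chosen only for illustration, not estimates from any data in this thread, and `predicted_probability` and `odds` are hypothetical helper names. It shows that a logistic coefficient acts multiplicatively on the odds, so the same 10-unit increase gives the same odds ratio everywhere but a different change in probability depending on the starting value.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to real data).
INTERCEPT = -2.0   # log-odds of the outcome when the predictor is 0
SLOPE = 0.005      # change in log-odds per 1-unit increase in the predictor


def predicted_probability(x, intercept=INTERCEPT, slope=SLOPE):
    """Probability of the outcome under a one-predictor logistic model."""
    log_odds = intercept + slope * x
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


# Odds ratio for a 10-unit increase: exp(10 * slope), the same at every x.
odds_ratio_10 = math.exp(10 * SLOPE)

for x in (100, 400, 700):
    p0 = predicted_probability(x)
    p1 = predicted_probability(x + 10)
    print(f"x={x}: p={p0:.3f} -> {p1:.3f}, "
          f"odds ratio={odds(p1) / odds(p0):.4f}")
```

The printed odds ratio is constant across the three starting values, while the probabilities move by different amounts. A dichotomized predictor would collapse all of this into a single above/below comparison.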

BTW, the correct term is "binary" or "dichotomous," not "binomial."
 
#5
OK, I really appreciate your kindness and I will follow your advice. I am sorry for being imprecise, but I only know basic statistics and English is not my native language. Thanks again.
 
#6
Hi, I am sorry for bothering you again. I performed the logistic regression, but I got these unexpected results and I have trouble interpreting them.

                                                              95% C.I. for Exp(B)
                   B      S.E.     Wald    df    Sig.    Exp(B)    Lower     Upper
Protocol(1)       ,383    1,333     ,083    1    ,774    1,467      ,108    19,999
IgG_anti_CMV      ,000     ,014     ,001    1    ,981    1,000      ,973     1,029
Nadir_Lymph       ,014     ,007    4,753    1    ,029    1,014     1,001     1,028
Nadir_IgG         ,005     ,003    2,023    1    ,155    1,005      ,998     1,011
Constant        -7,187    3,611    3,961    1    ,047     ,001

I am fine with only the Nadir_Lymph variable maintaining statistical significance in this multivariate setting. What I don't understand is its Exp(B), which is very low. Is it possible that the model is off because I have only 33 observations?
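
The columns of this output are tied together by simple formulas, so it can be sanity-checked by hand. Below is a minimal Python sketch, assuming the usual Wald conventions: Wald = (B / S.E.)^2, Exp(B) = exp(B), and a 95% interval of exp(B ± 1.96 × S.E.). That the software used exactly these conventions is an assumption, and `recompute` is a hypothetical helper name. Because B and S.E. are displayed rounded to three decimals, the recomputed values can only match the reported ones approximately, most closely for the row with the largest coefficient.

```python
import math

# Values copied from the output above, with decimal commas written as points:
# name -> (B, S.E., reported Wald, reported Exp(B), reported lower, reported upper)
ROWS = {
    "Protocol(1)":  (0.383, 1.333, 0.083, 1.467, 0.108, 19.999),
    "IgG_anti_CMV": (0.000, 0.014, 0.001, 1.000, 0.973, 1.029),
    "Nadir_Lymph":  (0.014, 0.007, 4.753, 1.014, 1.001, 1.028),
    "Nadir_IgG":    (0.005, 0.003, 2.023, 1.005, 0.998, 1.011),
}

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def recompute(b, se):
    """Return (Wald, Exp(B), lower, upper) from a coefficient and its S.E."""
    wald = (b / se) ** 2
    return wald, math.exp(b), math.exp(b - Z_95 * se), math.exp(b + Z_95 * se)


for name, (b, se, rep_wald, rep_or, rep_lo, rep_hi) in ROWS.items():
    wald, odds_ratio, lower, upper = recompute(b, se)
    print(f"{name:<13} Wald {wald:6.3f} (reported {rep_wald:.3f})  "
          f"Exp(B) {odds_ratio:.3f} (reported {rep_or:.3f})  "
          f"CI {lower:.3f} to {upper:.3f} "
          f"(reported {rep_lo:.3f} to {rep_hi:.3f})")
```

The point of the check is that Exp(B) is just the exponential of B, so a coefficient close to 0 necessarily gives an Exp(B) close to 1.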
 

j58

Active Member
#9
There's a rule of thumb (which in my experience is a good one) that the ratio of observations to predictors should be at least 10 to avoid overfitting the data. You should, therefore, probably try to drop one predictor. I would look at the correlation matrix of the predictors. If any two are highly correlated, you can probably drop one of them, since it gives you essentially the same information as the other. Additionally, when two highly correlated predictors are both in the model, both can be non-significant even though each is significant individually. This can occur because, with both in the model, neither may explain much unique variance in the outcome.
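
The two checks above can be sketched in a few lines of Python. The sample size (33) and the four predictor names come from post #6. The measurement values, the 0.8 correlation threshold, and the helper names `pearson_r` and `highly_correlated_pairs` are all hypothetical, invented purely to show the mechanics, and say nothing about the real data.

```python
import math
from itertools import combinations

N_OBSERVATIONS = 33   # sample size stated in post #6
N_PREDICTORS = 4      # Protocol, IgG_anti_CMV, Nadir_Lymph, Nadir_IgG
MIN_RATIO = 10        # the rule-of-thumb threshold discussed above

ratio = N_OBSERVATIONS / N_PREDICTORS          # observations per predictor
max_predictors = N_OBSERVATIONS // MIN_RATIO   # predictors the rule allows
print(f"{ratio:.2f} observations per predictor; "
      f"the rule of thumb allows at most {max_predictors} predictors")


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for predictor pairs with |r| >= threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical measurements for six patients (NOT the real data).
example_columns = {
    "Nadir_Lymph":  [200, 350, 400, 520, 610, 700],
    "Nadir_IgG":    [410, 640, 790, 1010, 1190, 1420],
    "IgG_anti_CMV": [80, 20, 65, 30, 90, 45],
}

for name_a, name_b, r in highly_correlated_pairs(example_columns):
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.2f})")
```

A pair flagged this way is a candidate for dropping one of its members, but which one to drop is a clinical judgment rather than something the correlation itself decides.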

Regarding the "smallness" of the one significant exp(B), the units that the predictors are measured in affect the magnitude of the regression coefficients. For a predictor X, Exp(B_X) is the odds ratio for a 1-unit increase in X. Whether this is "small" or not, from a clinical perspective, depends on the units of X. If a 1-unit change in X is not clinically significant, but a 10-unit change is, then B_X isn't so small, clinically speaking.
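
As a worked illustration of the units point, here is a minimal Python sketch that rescales the Nadir_Lymph coefficient from post #6 (B = 0.014, S.E. = 0.007, both rounded in the output) to larger increments. The helper name `odds_ratio_for_change` is hypothetical, the interval is a Wald-type interval assumed for illustration, and the measurement units of Nadir_Lymph are not stated in the thread, so what counts as a clinically meaningful increment is left open.

```python
import math

B_NADIR_LYMPH = 0.014    # log-odds per 1-unit increase, from the table in post #6
SE_NADIR_LYMPH = 0.007   # its standard error, from the same table
Z_95 = 1.96              # normal quantile for a two-sided 95% interval


def odds_ratio_for_change(b, se, units):
    """Odds ratio and 95% Wald interval for an increase of `units` (> 0) in X."""
    point = math.exp(units * b)
    lower = math.exp(units * (b - Z_95 * se))
    upper = math.exp(units * (b + Z_95 * se))
    return point, lower, upper


for units in (1, 10, 100):
    point, lower, upper = odds_ratio_for_change(
        B_NADIR_LYMPH, SE_NADIR_LYMPH, units
    )
    print(f"+{units:>3} units: odds ratio {point:.3f} "
          f"(95% CI {lower:.3f} to {upper:.3f})")
```

With these rounded inputs, the odds ratio for a 10-unit increase is exp(0.14), roughly 1.15, compared with 1.014 for a single unit. The coefficient is the same; only the size of the increment being described has changed.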
 
#10
"There's a rule of thumb (which in my experience is a good one) that the ratio of observations to predictors should be at least 10 to avoid overfitting the data."
I agree with most of what j58 has said, but not with this statement. I believe that it comes from one publication in which the sample size happened to be 10 times the number of explanatory variables. In reality the two have no relation.
Compare the Plackett-Burman design, where you have 11 variables in just 12 experimental runs. Or think about investigating a very small (but dangerous) side effect in tens of thousands of patients.

It is just a false internet story.

Also, I don't agree here about the model building strategy.
 

Dason

Ambassador to the humans
#12
Greta, this has a binary response and the results are asymptotic. 10 observations per result is reasonable. I'm not saying it's required, but if you're gonna use classical results it's a decent rule of thumb. I don't know anybody that would suggest a Plackett-Burman design for a binary response.