# Thread: Logistic regression versus chi square test for RCT data

1. ## Logistic regression versus chi square test for RCT data

Hi

My understanding of statistics is basic and mostly on an applied level, but now I have gotten into a discussion with my project leader and I'm forced to try to understand a bit more. So my use of terms might not be 100% accurate, and English is not my first language, but I hope I will be able to state my questions clearly enough...

The questions concern the chi-square test versus logistic regression, and how these differ theoretically and/or on a more applied level.

We are analyzing data from a large (N>4000) two-arm randomized controlled trial on a public health intervention. The main outcome is dichotomous.

We originally planned to test for differences between the two treatment arms with a regular chi-square test. We will also calculate the OR with a corresponding 95% CI.

Reading a lot of published papers within the same field, I see that many of them use logistic regression instead of a chi-square test. Some use a simple, univariate analysis with treatment allocation (group A or B) as the only predictor; others use multivariate models that include other baseline characteristics.

I asked the project leader why we didn't just do a simple logistic regression instead of the tedious process of separate chi-square tests and OR calculations. With the regression approach we immediately get the ORs with CIs.

The project leader said that didn't make any sense: we were not trying to "predict" anything, and logistic regression was not to be used in RCTs.

I know that a chi-square test is in theory more of a descriptive test, and that regression is about estimating a predictive model. But I also know that a chi-square test and a simple logistic regression yield basically the same results. And while a chi-square test might not be predictive in nature, the conclusions we draw when looking at the P-values together with the reported ORs are predictive in nature, in the same way we would use the results from a regression analysis. In addition, there is an overwhelming number of published papers in high-ranking journals that make my project leader's position look a bit strange.
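For what it's worth, that near-equivalence can be checked with a few lines of Python. A minimal sketch (the 2x2 counts below are invented for illustration): the Pearson chi-square statistic and the OR with a Woolf 95% CI come straight from the table, and a univariate logistic regression with treatment as the only predictor reports exactly this OR and Wald CI:

```python
import math

# Hypothetical 2x2 table from a two-arm trial (all counts invented):
#                 outcome yes   outcome no
# treatment A          300          1700
# treatment B          240          1760
a, b = 300, 1700  # arm A: events, non-events
c, d = 240, 1760  # arm B: events, non-events
n = a + b + c + d

# Pearson chi-square statistic for a 2x2 table
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Odds ratio with a Woolf 95% CI -- identical to the OR and Wald CI that a
# univariate logistic regression with treatment as the only predictor gives
or_ = (a * d) / (b * c)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(or_) - 1.96 * se_log_or)
hi = math.exp(math.log(or_) + 1.96 * se_log_or)

print(round(chi2, 2), round(or_, 3), round(lo, 3), round(hi, 3))
# → 7.71 1.294 1.078 1.553
```

Only the P values differ slightly (Pearson versus Wald or likelihood-ratio tests), and with N > 4000 that difference is negligible.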

So what am I missing here?

We also had a dispute about doing adjusted analysis (multivariate logistic regression) in an RCT. Again, my project leader says it does not make sense. But now my project leader said that you COULD do it if, and only if, there was a big difference in one or several of the baseline characteristics between the two groups. Well, to me it makes perfect sense to do it in such a situation... but it kind of contradicts what was said about prediction earlier. And after a quick search I found lots of papers advocating the use of adjusted analysis on RCT data, even when perfect randomization had occurred. I have noticed that it is debated whether it is a sound strategy, but it certainly does not look like a big no-no.

Kind regards
O.

2. ## Re: Logistic regression versus chi square test for RCT data

I am not a purist or theoretical person at all; I'm applied, much like yourself. Completely on board with your comments. Modern software allows you to do quite a bit with logistic regression, and as you stated it can be used for multivariable regression. If I read an article, I may be inclined to think less of analyses done solely with chi-square (however biased or ignorant that statement may be).

It may come down to whether, in the logistic model, you want to talk about the beta coefficients or not. In addition, in logit models you get global null hypothesis tests that use the chi-square distribution (e.g., likelihood ratio, score, and Wald).

Would be interested to see what everyone else writes.

4. ## Re: Logistic regression versus chi square test for RCT data

Like hlsmith, I am not a statistician, and have learned some basics (unlike hlsmith, who is advanced) only when faced with problems in my own research. Still, I guarantee that logistic regression, especially a multivariate one, is much better than a plain chi-square. The difference is that it takes other independent variables into account when assessing each independent variable. Therefore, it can control for confounders (as far as it has the data on those other independent variables) and give you standardized coefficients, where the association between the dependent variable and each independent variable is estimated while the effects of all other independent variables are held constant.

Logistic regression can be used for prediction but is not limited to prediction. It gives the standardized coefficients and ORs with other variables controlled for. So it is absolutely better.

Originally Posted by pirflax
We also had a dispute about doing adjusted analysis (multivariate logistic regression) in an RCT. Again, my project leader says it does not make sense. But now my project leader said that you COULD do it if, and only if, there was a big difference in one or several of the baseline characteristics between the two groups.
It is a common approach to enter into the regression only those independent variables whose bivariate chi-square comparisons are significant or close to significant. For example, in this common approach, only those independent variables are added to the regression model whose bivariate chi-square comparisons led to P values smaller than 0.15 or 0.10. But such a P value does not necessarily require a big difference, and this common approach is not the only accepted one (so "if, and only if" is not right, I think).
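As a rough sketch of that screening step (the covariate names and counts below are invented), one can compute a bivariate chi-square P value for each candidate covariate against the outcome and keep those under the screening threshold:

```python
import math

def chi2_p_2x2(a, b, c, d):
    """P value of the Pearson chi-square test (1 df) for a 2x2 table."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return math.erfc(math.sqrt(chi2 / 2))  # chi-square(1 df) survival function

# Hypothetical bivariate tables, covariate level vs outcome (counts invented):
# (events in exposed, non-events in exposed, events in unexposed, non-events)
candidates = {
    "smoking": (120, 480, 90, 510),
    "sex":     (150, 450, 148, 452),
}

screen_alpha = 0.15  # a commonly used screening threshold
selected = [name for name, table in candidates.items()
            if chi2_p_2x2(*table) < screen_alpha]
print(selected)
# → ['smoking']
```

Note that P-value screening of covariates is itself debated; the sketch only illustrates the approach described above.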

Originally Posted by pirflax
And, after a quick search I found lots of papers advocating the use of adjusted analysis on RCT data, even when perfect randomization had occurred.
Perfect randomization cannot eliminate many confounders. Multivariate approaches are a great help in adjusting for some of the remaining confounding variables (or many of them, depending on the case).

6. ## Re: Logistic regression versus chi square test for RCT data

Thank you both!

At least I can conclude that what my project leader says is wrong. But she kind of contradicts herself anyway, saying "you absolutely cannot use logistic regression on RCT data, we are not trying to predict anything" (I could argue that we are, but that is a different discussion) and then "you might use it if you had an imbalance in baseline characteristics between the two groups" (as if it is suddenly OK to use logistic regression even though we are not "predicting").

We are talking about an internationally renowned professor here, so I sometimes have to pinch my arm to check that I'm awake when I hear these things. And I trusted her so much that I started having real doubts about whether I was missing something and/or unable to interpret the evidence I had at hand (tons of papers and colleagues).

Originally Posted by victorxstc
Perfect randomization cannot eliminate many confounders. Multivariate approaches are a great help in adjusting some of other remaining confounding variables (or many of them, depending on the case).
Didn't quite understand this. If a confounding variable is perfectly balanced between the two groups, it will not affect a test for between-group differences, or...? There might be other reasons to still want to adjust for confounders, but perfect randomization would eliminate one of those reasons, if I have understood it correctly.

7. ## Re: Logistic regression versus chi square test for RCT data

Originally Posted by pirflax
Thank you both!

At least I can conclude that what my project leader says is wrong. But she kind of contradicts herself anyway, saying "you absolutely cannot use logistic regression on RCT data, we are not trying to predict anything" (I could argue that we are, but that is a different discussion) and then "you might use it if you had an imbalance in baseline characteristics between the two groups" (as if it is suddenly OK to use logistic regression even though we are not "predicting").

We are talking about an internationally renowned professor here, so I sometimes have to pinch my arm to check that I'm awake when I hear these things. And I trusted her so much that I started having real doubts about whether I was missing something and/or unable to interpret the evidence I had at hand (tons of papers and colleagues).
In my initial post, before editing some parts out, I talked about an accredited so-called statistician who knows almost nothing about stats and yet runs many research centers!

Many professors make mistakes, and many do not have the courage to admit it. So they escape by contradicting themselves, hoping that their students do not catch them and dent their pride... I quite sympathize with you. If she simply admitted that she had made a mistake (something all humans have the right to do) and then corrected herself, she would look much more credible in her students' eyes than when trying to say the correct thing without admitting the previous mistake.

Originally Posted by pirflax
Didn't quite understand this. If a confounding variable is perfectly balanced between the two groups, it will not affect a test for between-group differences, or...? There might be other reasons to still want to adjust for confounders, but perfect randomization would eliminate one of those reasons, if I have understood it correctly.
Uh, maybe I misunderstood your point about the term "perfect randomization". First, I think we don't randomize the confounding variables; we randomize the treatment. Besides, there are many such confounders, not only one. So if you meant balancing them by randomizing them, that seems impossible to me.

Off topic:
Let me explain my understanding of the term "perfect randomization" this way:

Perfect randomization might mean a randomization done perfectly, meaning that human intervention in assigning the treatment was excluded and the randomization is not biased... This is close to what I thought the term meant.

That term can also mean a randomization that gives us two groups that are perfectly "balanced" in terms of all the confounding variables. I think this is what you were referring to by the word "perfect"... right? Well, I think this is neither possible nor a fully random state. It sounds like a desirable randomization, not necessarily a perfect one, since under perfect randomization we cannot know or expect any specific result (if we can, it is not perfectly [or purely] random but a specific type of random outcome that we want [unless our initial sample is infinite]). [End of off topic!]

OK, you have an N = 4000 sample. Let's assume you randomly choose 2000 people to receive a treatment. Your randomly generated group might have, for example, 54% females, 22% smokers, and an average age of 27 +/- 8 years, etc. The other randomly selected group might have, for example, 47% females, 28% smokers, and an average age of 25 +/- 7. You compare each of these characteristics and see that, for example, age is significantly different between the two groups. Or maybe gender has a huge effect on the specific treatment you are testing, and the existing 7% difference in the male/female ratios of the two groups can affect your results... Or maybe smoking can affect your results... Remember that we have talked about only 3 confounders so far. There might be numerous other confounding variables, and no randomization can give a sample balanced in terms of some of them, let alone all of them.
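A quick simulation shows this chance imbalance directly (the 50% female and 25% smoker population rates below are invented): even a perfectly fair coin-flip randomization of 4000 people leaves small differences between the arms:

```python
import random

random.seed(1)
n = 4000

# Hypothetical population: each person gets a sex and a smoking status
# (the 50% / 25% population rates are invented)
people = [{"female": random.random() < 0.50,
           "smoker": random.random() < 0.25} for _ in range(n)]

# A perfectly fair randomization of the treatment: shuffle and split in half
random.shuffle(people)
arm_a, arm_b = people[:n // 2], people[n // 2:]

def pct(group, key):
    """Percentage of the group for which the trait is present."""
    return 100 * sum(p[key] for p in group) / len(group)

# Even unbiased randomization leaves small chance imbalances between arms
print(round(pct(arm_a, "female") - pct(arm_b, "female"), 2))
print(round(pct(arm_a, "smoker") - pct(arm_b, "smoker"), 2))
```

Rerunning with different seeds gives different (usually small, occasionally sizable) imbalances, which is exactly the point: randomization balances covariates only on average, not in any one trial.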

So what can we do now? One option is to use more sophisticated sampling methods. For example, first make sure that the numbers of females and males in the two groups are similar, then randomize the treatment within them. This is an accepted method of matching, but it becomes prohibitively difficult when we have to match the two groups on 3 or 4 traits (gender, age, smoking, ethnicity, etc.), or when our sample size needs to be large...

Another solution is that (besides trying our best to balance the sample) we can enter the confounders, such as the age, gender, and smoking status (and other characteristics) of the participants, into a multivariate regression analysis, and control for their effects when assessing the effect of the treatment on the dependent variable. The more confounding variables we can collect and document for each patient, the more accurate our model will be (and the better it can rule out confounding effects).
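As a rough sketch of such an adjusted analysis (the data below are simulated and every coefficient is invented), a logistic regression with treatment plus one confounder can be fitted by Newton-Raphson, and the adjusted treatment OR is read off the treatment coefficient:

```python
import math
import random

random.seed(7)
n = 4000

# Simulated trial (every number invented): treatment is randomized, and
# smoking raises the outcome risk independently of the treatment
rows = []
for _ in range(n):
    treat = 1.0 if random.random() < 0.5 else 0.0
    smoker = 1.0 if random.random() < 0.25 else 0.0
    logit = -1.0 + 0.4 * treat + 0.8 * smoker     # true log odds
    y = 1.0 if random.random() < 1 / (1 + math.exp(-logit)) else 0.0
    rows.append(([1.0, treat, smoker], y))        # [intercept, treat, smoker]

def solve(a, b):
    """Gauss-Jordan elimination for a small linear system a x = b."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    k = len(m)
    for i in range(k):
        piv = m[i][i]
        m[i] = [v / piv for v in m[i]]
        for r in range(k):
            if r != i:
                f = m[r][i]
                m[r] = [v - f * w for v, w in zip(m[r], m[i])]
    return [row[-1] for row in m]

def fit_logistic(rows, iters=25):
    """Newton-Raphson maximum likelihood fit of a logistic regression."""
    k = len(rows[0][0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, y in rows:
            p = 1 / (1 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(k):
                grad[i] += (y - p) * x[i]
                for j in range(k):
                    hess[i][j] += p * (1 - p) * x[i] * x[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

beta = fit_logistic(rows)
adjusted_or = math.exp(beta[1])   # treatment OR, adjusted for smoking
print(round(adjusted_or, 2))
```

In practice one would of course use a statistics package rather than hand-rolled Newton steps; the point is only that the treatment OR is estimated with smoking held in the model.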

8. ## Re: Logistic regression versus chi square test for RCT data

Originally Posted by victorxstc
Uh, maybe I misunderstood your point about the term "perfect randomization". First, I think we don't randomize the confounding variables; we randomize the treatment. Besides, there are many such confounders, not only one. So if you meant balancing them by randomizing them, that seems impossible to me.

....

Perfect randomization might mean a randomization done perfectly, meaning that human intervention in assigning the treatment was excluded and the randomization is not biased... This is close to what I thought the term meant.

That term can also mean a randomization that gives us two groups that are perfectly "balanced" in terms of all the confounding variables. I think this is what you were referring to by the word "perfect"... right? Well, I think this is neither possible nor a fully random state. It sounds like a desirable randomization, not necessarily a perfect one, since under perfect randomization we cannot know or expect any specific result (if we can, it is not perfectly [or purely] random but a specific type of random outcome that we want [unless our initial sample is infinite]). [End of off topic!]
Yup, by "perfect randomization" I meant perfectly balanced groups. And I agree that reading "perfect randomization" as perfectly balanced groups does not sound right - probably more like a contradiction in terms. I would be very suspicious if a standard, unbiased randomization turned out two perfectly balanced groups across 20+ covariates.

9. ## Re: Logistic regression versus chi square test for RCT data

I really don't know much about this, but to say that regression is only for prediction does not make much sense in my understanding. Regression will give you much more information.

10. ## Re: Logistic regression versus chi square test for RCT data

Originally Posted by victorxstc
The more confounding variables we can collect and document from each patient, the more accurate our model will be
I just want to make a note regarding this. The more variables we put into the model, the better the model will fit our sample, but too many independent variables in a regression may lead to bad out-of-sample predictions. We often want a model to be as parsimonious as possible.
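The first half of that note is easy to demonstrate. In the sketch below (using ordinary least squares purely to keep the arithmetic simple; the same logic applies to the logistic likelihood), adding pure-noise predictors to a model can never lower the in-sample R^2, even though they carry no information:

```python
import random

random.seed(3)
n = 30

# y depends only on x1; the seven extra predictors are pure noise
x1 = [random.gauss(0, 1) for _ in range(n)]
noise_cols = [[random.gauss(0, 1) for _ in range(n)] for _ in range(7)]
y = [2 * v + random.gauss(0, 1) for v in x1]

def solve(a, b):
    """Gauss-Jordan elimination for a small linear system a x = b."""
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    k = len(m)
    for i in range(k):
        piv = m[i][i]
        m[i] = [v / piv for v in m[i]]
        for r in range(k):
            if r != i:
                f = m[r][i]
                m[r] = [v - f * w for v, w in zip(m[r], m[i])]
    return [row[-1] for row in m]

def r_squared(cols, y):
    """In-sample R^2 of an OLS fit (with intercept) on the given columns."""
    X = [[1.0] + [c[i] for c in cols] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    ybar = sum(y) / len(y)
    sse = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, yi in zip(X, y))
    sst = sum((yi - ybar) ** 2 for yi in y)
    return 1 - sse / sst

# In-sample R^2 with 1, 2, ..., 8 predictors: it never goes down,
# even though predictors 2-8 are pure noise
r2 = [r_squared([x1] + noise_cols[:k], y) for k in range(8)]
print([round(v, 3) for v in r2])
```

The out-of-sample half of the argument is exactly why that monotone in-sample improvement is misleading.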

11. ## Re: Logistic regression versus chi square test for RCT data

Originally Posted by Englund
I just want to make a note regarding this. The more variables we put into the model, the better the model will fit our sample, but too many independent variables in a regression may lead to bad out-of-sample predictions. We often want a model to be as parsimonious as possible.
Agree on that. But then again, one wants to predict with as much accuracy as possible.

12. ## Re: Logistic regression versus chi square test for RCT data

Originally Posted by Englund
Too many independent variables in a regression may lead to bad out of sample predictions. We often want a model to be as parsimonious as possible.
"Too many" and "parsimonious" are rather general words. How many exactly, and on what basis? Multicollinearity, for example? OK, but I could not go into every detail in a single post responding to a question about perfect randomization! I still think that if those variables are confounders, the more of them included, the better.

13. ## Re: Logistic regression versus chi square test for RCT data

Agreed. We can at least state that we want a model to be as simple as possible, but not simpler than necessary.
