# Question regarding best statistical analysis for future study!

#### AK12345

##### New Member
Hello!
First post here. I have a question regarding the optimal choice of statistical analysis for a future study.

Overview (simplified): Patients can receive treatment A or B. Each hospital has a preferred treatment depending on local tradition, and the choice of A or B is not affected by patient factors. Treatment A or B can be given in different dosages. We have a dichotomous outcome, remission: yes or no.
For the choice of dose there are a few important confounders between dosage and remission, for example age.

Our hypothesis is that treatment A is superior to treatment B in terms of increasing remission. We also hypothesize that lower dosages of A and B are favourable in terms of remission.
How do I best handle the statistical analysis?

Comparing A vs B
Considering that patient factors do not influence the choice of A or B, can I do a simple chi-square test to see if there is a significant difference? Or should I do a logistic regression and adjust for potential differences in the populations, even though from my viewpoint there are no clear confounders? If I did a logistic regression with both choice of treatment and hospital, there would be a very strong relationship between these two variables, since they are linked.
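For reference, the unadjusted comparison asked about here can be sketched in a few lines. This is only an illustration: the counts are made up, `chi_square_2x2` is a hypothetical helper name, and the p-value uses the fact that a chi-square variable with 1 degree of freedom has upper-tail probability `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts (not from the study): rows are treatments A and B,
# columns are remission yes / no.
stat, p_value = chi_square_2x2(220, 180, 180, 220)
print(f"chi-square = {stat:.2f}, p = {p_value:.4f}")
```

Note that this test treats all patients as independent observations and makes no adjustment for differences between the hospital populations.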

Dosage
The dosage is measured in mg and is a continuous variable. I can make it into a categorical variable. My assumption is that I should do a logistic regression and adjust for confounders? Would the hospital be considered a confounder or an instrumental variable? (The hospital does not directly influence the outcome, but it can affect it through mediators. The hospital also affects dosage through mediators such as staff experience.)

Thanks for any help!

#### hlsmith

##### Less is more. Stay pure. Stay poor.
Welcome to the forum!

The biggest issue I see is that patients are not randomized to hospital. So you say there are no confounders per hospital and treatment - but the background characteristics of patients will differ between hospitals and will impact the effectiveness of treatment. You should create propensity scores for which hospital the patient selected and incorporate those scores / weights into your outcome model.
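A minimal sketch of the weighting idea, under strong simplifying assumptions: a single categorical background characteristic (age group), two hospitals, and made-up counts in which the treatment has no effect within either age group. The propensity score is estimated as a simple proportion rather than from a fitted model, and all names (`counts`, `propensity`, `ipw_rate`) are hypothetical.

```python
# Hypothetical counts: (age_group, hospital, n_patients, n_remission).
# Within each age group the remission rate is the same at both hospitals
# (0.6 for young, 0.4 for old), but hospital 1 treats mostly young patients.
counts = [
    ("young", 1, 300, 180),
    ("old",   1, 100,  40),
    ("young", 2, 100,  60),
    ("old",   2, 300, 120),
]

def crude_rate(hospital):
    """Unadjusted remission rate at one hospital."""
    n = sum(c[2] for c in counts if c[1] == hospital)
    r = sum(c[3] for c in counts if c[1] == hospital)
    return r / n

def propensity(age_group):
    """P(hospital == 1 | age_group), estimated as a simple proportion."""
    total = sum(c[2] for c in counts if c[0] == age_group)
    in_h1 = sum(c[2] for c in counts if c[0] == age_group and c[1] == 1)
    return in_h1 / total

def ipw_rate(hospital):
    """Remission rate after inverse-probability-of-treatment weighting."""
    num = den = 0.0
    for age, hosp, n, rem in counts:
        if hosp != hospital:
            continue
        ps = propensity(age)
        weight = 1 / ps if hosp == 1 else 1 / (1 - ps)
        num += weight * rem
        den += weight * n
    return num / den

crude_diff = crude_rate(1) - crude_rate(2)
ipw_diff = ipw_rate(1) - ipw_rate(2)
print(f"crude difference: {crude_diff:.3f}, weighted difference: {ipw_diff:.3f}")
```

In this constructed example the crude difference comes entirely from the different age mix at the two hospitals, and it disappears after weighting. With real data the propensity score would normally be estimated from a model with several covariates.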

Also, how many patients are we dealing with? Multiple treatment values can impact things especially if the groups are not approximately balanced. Lots and lots of data here would be great.

Also, how can you be confident that a patient was not admitted somewhere else, rather than to one of these two hospitals? Say they get admitted to hospital 3, or to a completely different hospital while out of town. And lastly, what about competing risks, and whether they are differential between hospitals? Say patient #1 from Hospital A leaves and gets hit by a car. We will never know if they were going to have a readmission or not, and if this happens more at hospital #2 it will impact the results.

#### AK12345

##### New Member
Regarding getting treatment elsewhere: without going into too much detail, the medication given is tied to a certain procedure that you can't get outside of the hospitals we have data from. It's very unlikely that a patient in our study will get treatment elsewhere. We also only include the first time the patient gets this procedure in the study period; most receive the treatment only once, but a few receive it repeatedly. Also, due to how care is organized in my country, it is unlikely that the patient can actively choose a hospital: you get the treatment at your local hospital in 99% of cases. My assumption is that neither the doctor nor the patient can choose treatment A or B, and that it is decided only by where you live in 99% of cases.

I could also add that you can't have procedure X without medication A or B, so there is no control group that only gets the procedure and not medication A or B.

We have 7000 patients before some exclusions; I expect 4000-5000 to be in the final statistical analysis.

I think I have to read up on propensity scores. Regarding dosage and outcome, does it seem correct to do a logistic regression with dosage as an independent variable, outcome as the dependent variable, and potential confounders added as further independent variables?

#### hlsmith

##### Less is more. Stay pure. Stay poor.
When I teach on this topic, I provide a simulation example where the proportion of events (readmission) is the same between two groups at, say, 1 year. However, when you run a proportional hazards model you see that one group has a bunch of early readmissions and the other doesn't have readmissions until closer to 1 year. So when using logistic regression you naively think the drug used on one group does not carry an additional risk. However, when you examine time, it is shown that the drug results in many more early readmissions.

You run the risk of missing such an issue when using logistic regression and not accounting for time. Something to think about.
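A toy version of that kind of example, with made-up and fully deterministic event times rather than a real simulation or a fitted proportional hazards model. It assumes no censoring before day 365, and the names (`early`, `late`, `proportion_with_event`, `mean_event_free_days`) are hypothetical.

```python
HORIZON = 365  # follow-up in days

# Hypothetical readmission days for 100 patients per group;
# None means no readmission within the follow-up period.
early = [10 + i for i in range(30)] + [None] * 70   # events on days 10-39
late = [330 + i for i in range(30)] + [None] * 70   # events on days 330-359

def proportion_with_event(times, by_day):
    """Share of patients readmitted on or before `by_day`."""
    return sum(1 for t in times if t is not None and t <= by_day) / len(times)

def mean_event_free_days(times, horizon=HORIZON):
    """Average number of readmission-free days up to the horizon."""
    return sum(horizon if t is None else min(t, horizon) for t in times) / len(times)

for name, times in (("early", early), ("late", late)):
    print(
        name,
        proportion_with_event(times, HORIZON),  # what a 1-year binary outcome sees
        proportion_with_event(times, 90),       # what happens in the first 90 days
        mean_event_free_days(times),
    )
```

Both groups have the same proportion of events at one year, so a binary outcome at that time point cannot tell them apart, while the 90-day proportion and the event-free time differ clearly.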

#### AK12345

##### New Member
Thanks for your input! I'll try to expand a bit more on the setting.

In this case we don't expect treatment A or B to have any long-term effects on rehospitalisation, due to the nature of the procedure and of treatments A and B. Our outcome is a self-rating scale administered the day after the procedure. This self-rating scale is categorized into a dichotomous outcome (this is standard practice for this scale in several studies in our field), and we hypothesize that treatment A would improve the scores over treatment B. (Treatment A or B is administered just before the procedure.)

#### hlsmith

##### Less is more. Stay pure. Stay poor.

Collapsing data into a binary response loses information. You may want to question that process; the fact that people have done it before doesn't mean it is best practice.
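A small made-up illustration of the information loss, assuming a hypothetical scale where lower scores are better and a score of 10 or less counts as remission (the cutoff and the scores are invented for the example):

```python
from statistics import mean

CUTOFF = 10  # hypothetical threshold: score <= 10 counts as remission

# Hypothetical self-rating scores for five patients per treatment.
scores_a = [2, 3, 4, 12, 13]
scores_b = [9, 10, 10, 11, 12]

def remission_rate(scores, cutoff=CUTOFF):
    """Proportion of patients at or below the cutoff."""
    return sum(1 for s in scores if s <= cutoff) / len(scores)

print("remission rate:", remission_rate(scores_a), remission_rate(scores_b))
print("mean score:    ", mean(scores_a), mean(scores_b))
```

The two groups have identical remission rates after dichotomizing, although their mean scores differ by several points. An analysis of the original score would see that difference; the binary outcome cannot.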

You don't have to use propensity scores if you address potential confounders in the model. However, using PS means you will generate marginal estimates, whereas multiple logistic regression generates estimates that are conditional on the confounders you control for.
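To make the marginal versus conditional distinction concrete, here is a sketch with made-up counts. The age groups are equally represented in both treatment arms, so there is no confounding by age, yet the odds ratio within each age group still differs from the odds ratio in the pooled data. The names `odds_ratio` and `strata` are hypothetical.

```python
def odds_ratio(events_a, n_a, events_b, n_b):
    """Odds ratio of remission for treatment A versus treatment B."""
    odds_a = events_a / (n_a - events_a)
    odds_b = events_b / (n_b - events_b)
    return odds_a / odds_b

# Hypothetical counts per age group: treatment -> (n_remission, n_patients).
strata = {
    "younger": {"A": (75, 100), "B": (50, 100)},
    "older":   {"A": (25, 100), "B": (10, 100)},
}

# Conditional odds ratios: one per age group.
conditional = {
    name: odds_ratio(*cell["A"], *cell["B"]) for name, cell in strata.items()
}

# Marginal odds ratio: pool the age groups, then compare A with B.
pooled = {
    arm: tuple(sum(strata[s][arm][i] for s in strata) for i in (0, 1))
    for arm in ("A", "B")
}
marginal = odds_ratio(*pooled["A"], *pooled["B"])

print("conditional:", conditional)
print("marginal:   ", round(marginal, 3))
```

Here the odds ratio is 3 within each age group but about 2.33 in the pooled data. Neither number is wrong; they answer different questions, which is why the choice between weighting and regression adjustment affects how the result should be interpreted.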