Help with the statistical design of a cardiovascular research protocol, please - thoughts and suggestions appreciated, as I am out of my depth

Here is the big picture - read on, it's interesting.

People take anticoagulant drugs (i.e. blood thinners such as warfarin or NOACs) to reduce their chance of forming blood clots, which can cause strokes, when they are diagnosed with a common heart arrhythmia called atrial fibrillation (AF). This is a lifelong treatment. As you can imagine, thinning one's blood also makes one more prone to bleeding (especially intracerebral bleeding, i.e. a bleed inside the head that can be fatal).

In current clinical practice, doctors use a risk stratification score called HAS-BLED, which predicts a patient's risk of bleeding while on those blood thinners. It has a c-index of 0.62.

For my project, I am going to perform a "blood thinness" test on patients who take these drugs and measure how thin their blood is while on them. The idea is that the blood test result can then be incorporated into a new risk stratification score that will hopefully be more accurate in predicting which patients are at high risk of bleeding (i.e. the score will have a higher c-index; anything above 0.70 is acceptable).
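
For readers unfamiliar with the c-index mentioned above, here is a minimal sketch of what it measures for a simple bled / did-not-bleed outcome: the fraction of (bleeder, non-bleeder) pairs in which the bleeder received the higher risk score, with ties counted as half. The function name `c_index` and the six patients are made up purely for illustration. With follow-up data and censoring, a time-to-event version such as Harrell's C would normally be used instead; this binary version is only meant to show the idea.

```python
from itertools import product

def c_index(scores, bled):
    """Concordance for a binary outcome.

    Fraction of (bleeder, non-bleeder) pairs in which the bleeder has the
    higher score; tied scores count as half a concordant pair.
    """
    cases = [s for s, b in zip(scores, bled) if b]         # patients who bled
    controls = [s for s, b in zip(scores, bled) if not b]  # patients who did not
    if not cases or not controls:
        raise ValueError("need at least one patient with and one without a bleed")
    concordant = 0.0
    for case, control in product(cases, controls):
        if case > control:
            concordant += 1.0
        elif case == control:
            concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Made-up scores for six hypothetical patients (NOT real data).
scores = [1, 2, 2, 3, 4, 5]
bled = [0, 0, 1, 0, 1, 1]
print(round(c_index(scores, bled), 3))
```

A value of 0.5 means the score ranks patients no better than chance, and 1.0 means every bleeder was scored above every non-bleeder.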

What I already know
1. The bleeding event rate for people on blood thinners is 2.3% per year (i.e. if you monitor 100 people taking the drug for an entire year, on average 2.3 of them will have a bleed within 12 months).
2. I can follow people up for up to 3 years
3. The aim is to achieve a margin of error of 5% with 99% power
4. The blood thinness test results are normally distributed in those patients
5. I can allow for up to 6 confounding variables (all normally distributed) to improve the R²
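
As a rough illustration of what points 1 and 2 imply, the sketch below converts the 2.3% annual rate into a 3-year cumulative risk and then into expected event counts. It assumes the same independent risk in each year and ignores drop-out, censoring and death from other causes, so it is back-of-envelope arithmetic and not a sample size calculation. The function names and the target of 70 events are hypothetical (70 would correspond, for example, to a rule of thumb of about 10 events for each of 7 predictors, which is itself only an assumption).

```python
import math

ANNUAL_BLEED_RISK = 0.023  # from the post: 2.3% per year
FOLLOW_UP_YEARS = 3        # from the post: up to 3 years of follow-up

def cumulative_risk(annual_risk, years):
    """Chance of at least one bleed over `years`, assuming a constant,
    independent risk in each year (a simplifying assumption)."""
    return 1.0 - (1.0 - annual_risk) ** years

def expected_events(n_patients, annual_risk, years):
    """Expected number of patients who bleed during follow-up."""
    return n_patients * cumulative_risk(annual_risk, years)

def patients_needed(target_events, annual_risk, years):
    """Smallest cohort expected to yield at least `target_events` bleeds."""
    return math.ceil(target_events / cumulative_risk(annual_risk, years))

risk_3y = cumulative_risk(ANNUAL_BLEED_RISK, FOLLOW_UP_YEARS)
print(f"3-year cumulative risk: {risk_3y:.3f}")
print(f"expected bleeds per 1000 patients: "
      f"{expected_events(1000, ANNUAL_BLEED_RISK, FOLLOW_UP_YEARS):.1f}")

# Hypothetical target of 70 events, chosen only for illustration.
print(f"patients needed for 70 expected bleeds: "
      f"{patients_needed(70, ANNUAL_BLEED_RISK, FOLLOW_UP_YEARS)}")
```

Under these assumptions the 3-year risk is roughly 6.7%, so even a cohort of 1000 patients would be expected to produce fewer than 70 bleeding events.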

What I do NOT know (these are my questions)
1. The blood thinness test value that I will use as a cut-off to dichotomise the patient population into low risk vs high risk of bleeding. Alternatively, the patient population can be divided into 3 groups (low, moderate and high risk of bleeding) [I don't know if this makes it easier from a statistical proof point of view]
2. How many patients I will need to enrol in the study to achieve statistical significance as above (sample size calculation)

A. Any ideas of what the best/most efficient statistical method would be to approach this issue, given the limitations above?

B. Can someone walk me through the study sample size calculation and the best way to statistically analyse my results? (Chi-square, t-test, Wilcoxon rank-sum test, analysis of variance (ANOVA), Kruskal-Wallis test, Pearson's and Spearman's for correlation analysis)

Any help and explanation on the above would be much appreciated. Thank you for taking a moment to look into this.


Less is more. Stay pure. Stay poor.
No offense, but given your description you are out in the ocean with no land in sight. How about this: when I am done providing advice, you can help me become a cardiologist? ST segment decompression, ECGs, arrhythmias, occlusions, stents; just help me fill in the rest and I should be fine. Then, given my newly minted credentials, I will ask human subjects to let me collect their protected health information and come in to have their blood drawn, which an insurance company or the government can pay for, right?

If you are at an academic institution, please consult your biostats department and bring on a sub-I with experience. This likely isn't something you will be able to manage.

Welcome to the forum and feel free to post more questions!