I TOO have a question about surgical outcomes and data analysis!

Hey everyone
So I’m new here and I have a few questions about this new project I’ve been assigned to. I’ve worked with statistical analysis to some degree before but it was mainly just assembling data and not really analyzing it.
So I am on this new project where we are analyzing the effects of Virtual Surgical Planning (exactly what it sounds like) pre-operatively and the effect it has on the patient’s outcome.
We have about 19 patients in the study and at first glance this seems like our top priority independent variable that we are examining. But the patient either received VSP or they didn’t so it’s a binary variable basically.
Here is what my boss sent me: “The two groups are relatively small but I’d like to do some statistical analysis regarding demographics (i.e. whether the two populations are statistically equivalent), as well as some sort of regression analysis using virtual planning as the variable we are trying to evaluate across all the data points we have collected.”
So to me it seems we have these as independent variables:
- VSP (did the patient have it or not before their surgery)(the focus of our study)
I’m not sure if THESE are independent variables per se but they are other factors:
- Chemotherapy: did the patient have this prior to surgery? (the surgery encompasses removing the jaw and replacing it with bone harvested from the leg which has been 3D modeled to show where the self-transplant should be taken from and how to bend it and shape it by computer modeling…. A lot of these jaw resections were due to cancer, so that’s why this is a factor)
- Pre-OP XRT: (XRT is radiation therapy for cancer)
- Age at surgery
- Patient gender
Dependent variables I’m assuming:
- Surgery time (measured in hours)
- Length of stay (postoperatively) (measured in hours)
- Complications: this one is confusing to me because they range from blood clots to skin rejection. Some patients had no complications. This is one of the more important dependent variables. Since this is basically a qualitative variable, should I treat it as binary? Either they had complications or they did not?
And finally there are just some odd data points I don’t know what to do with:
- Recipient artery for the skin flap
- Recipient vein for the skin flap
- Recipient artery for the bone
- Recipient vein for the bone
- Comorbidities: does the patient drink? Smoke? Have high blood pressure? Depression? Anxiety? I’m puzzled about how to deal with this category
- Diagnosis (why is this patient having this surgery in the first place?)

So there are a lot of things we are trying to look at and I don’t know what tests to do to tie it all together?
I can’t really do a regression for all of these categories, right? Surely I will have to examine some data points separately from others?

ANY type of help would be appreciated

I have attached the data if any of you would like to look at it… my post is long but there really is not much data to be scoured over



TS Contributor
I think a different data organization would have been a lot better: one row for each patient, with columns like age, gender, complication, length of stay, etc. Can you assume that the patient in the first row of the age data, for instance, is the same patient as the one in the first row of the length-of-stay data?
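To make the layout concrete, here is a minimal sketch in Python of what one row per patient could look like. The column names and every value in it are made up for illustration; they are not taken from the attached data.

```python
import csv
import io

# Hypothetical tidy layout: one row per patient, one column per variable.
# All values are invented for illustration only.
TIDY_CSV = """patient_id,vsp,age,gender,surgery_hours,stay_hours,complication
1,1,54,F,9.5,192,0
2,0,61,M,11.0,240,1
3,1,47,M,8.75,168,0
"""

def load_patients(text):
    """Parse the CSV text into a list of dicts with typed values."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        rows.append({
            "patient_id": int(rec["patient_id"]),
            "vsp": bool(int(rec["vsp"])),
            "age": int(rec["age"]),
            "gender": rec["gender"],
            "surgery_hours": float(rec["surgery_hours"]),
            "stay_hours": float(rec["stay_hours"]),
            "complication": bool(int(rec["complication"])),
        })
    return rows

patients = load_patients(TIDY_CSV)

# Because every value is tied to a patient_id, splitting any variable
# by group does not depend on rows happening to be in the same order.
vsp_ages = [p["age"] for p in patients if p["vsp"]]
print(vsp_ages)
```

With this layout each measurement stays attached to its patient, so the question of whether row 1 in one list matches row 1 in another never comes up.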

A different point would be the number of sample points. The logic of "I know the sample size is low but I would like to do a statistical analysis" is faulty, because a low sample size will generally give you low power. That makes the analysis close to useless: when you cannot reject the null, you would not know whether that is because the null is true or because the power of the test is too low to detect a practically important difference.

E.g. if you run a t-test on the age distribution of the two groups, the sample size would permit you to detect a difference of 17 years (!! :) ) or greater between the average ages of the two groups, anything less would very probably be missed by the test.
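As a rough sketch of how a figure like this can be computed, the Python below uses the normal approximation for a two-sided, two-sample comparison of means. The group sizes (10 and 9), the standard deviation of 12 years, the 5% significance level and the 80% power are all assumptions chosen for illustration, not values from the attached data, so the result will not necessarily match the number quoted above.

```python
from math import sqrt
from statistics import NormalDist

def min_detectable_diff(sd, n1, n2, alpha=0.05, power=0.80):
    """Smallest true difference in means detectable with the given power.

    Normal approximation to the two-sample t-test; it is slightly
    optimistic for groups this small.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # quantile for the target power
    return (z_alpha + z_power) * sd * sqrt(1 / n1 + 1 / n2)

# Assumed inputs: SD of age = 12 years, groups of 10 and 9 patients.
mdd = min_detectable_diff(sd=12.0, n1=10, n2=9)
print(round(mdd, 1))
```

Plugging in the actual group sizes and the observed standard deviation of age would give the figure relevant to this study.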

But at least you have a nice list of questions, so you could actually start to plan a data collection that is able to answer them by thinking about practically important effect sizes, sample sizes, random sampling and all the rest.
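For the planning step, a minimal sketch of the sample size calculation for comparing two means, again using the normal approximation. The target difference of 5 and the standard deviation of 12 are placeholders I made up; you would replace them with a difference that matters clinically and an SD estimated from your data or the literature.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients needed in EACH group to detect a mean
    difference of `delta` with a two-sided test (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_sum * sd / delta) ** 2)

# Assumed inputs for illustration only.
n = n_per_group(delta=5.0, sd=12.0)
print(n)
```

The required number grows with the square of sd/delta, so halving the difference you want to detect roughly quadruples the sample size.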



Super Moderator
Yes I believe it was random assignment
I would suggest trying to confirm this. By random assignment I mean using random number tables, a random number generator, or something like that; not just haphazard assignment to one group or the other. The distinction is pretty crucial. E.g. if you did use random assignment, then you have much less need to worry about or statistically control for pre-existing differences between the participants in the treatment and control groups, as random assignment balances these differences on average (although with only 19 patients, chance imbalances can still occur).
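For reference, here is a minimal sketch of what assignment by a random number generator could look like in Python. The patient IDs 1 to 19, the group labels and the seed are hypothetical; this shows the general idea, not how your study was actually run.

```python
import random

def randomize(patient_ids, seed=None):
    """Shuffle the patients and split them into two groups.

    A fixed seed makes the allocation reproducible and auditable.
    """
    rng = random.Random(seed)
    ids = list(patient_ids)
    rng.shuffle(ids)
    half = (len(ids) + 1) // 2  # first group gets the extra patient if odd
    return {"VSP": sorted(ids[:half]), "control": sorted(ids[half:])}

# Hypothetical: 19 patients numbered 1..19, arbitrary seed.
groups = randomize(range(1, 20), seed=2024)
print(groups)
```

The key property is that nobody's judgment about the patient (age, diagnosis, how complex the case looks) can influence which group they end up in.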