Which field of statistics does this fall under? I am trying to learn this.

#1
Hello all. Please see the excerpt below from an article I was reading. I am familiar with R and basic statistical analyses and would like to learn the techniques it describes. I presume they are part of predictive modeling, but is there a particular name for this kind of analysis, so that I can read more about it, learn how it is performed, and eventually replicate it?
Thanks.

"Using the preintervention cohort, a multivariate logistic regression model was constructed to generate variables that predicted the need for intraoperative or postoperative blood product transfusions. This predictive model was then applied to the postintervention cohort to identify a subgroup of patients who were predicted to receive transfusion, but did not. This subgroup is defined as a “misclassified” population that was predicted to receive transfusion, did not actually receive transfusions, and did not have any difference in clinical outcomes. The area under the receiver-operating characteristics curve was used to assess the accuracy of the models.
Using the postintervention cohort, a decision tree was built to forecast the likelihood of perioperative blood transfusion. Specifically, three separate multivariate logistic regression models—one for demographics (D), one for preoperative comorbidities (C), and one for operative factors (O)—were built and used to generate scores D, C, and O for each patient. Then the three scores were included in the final combined model. Three models were fitted to allow the use of this score at different time point in the course of patient care. For example, operative information is not available in the preoperative setting, but using the preoperative category of the score would provide the likelihood of being transfused. To simplify the computation of the scoring system, the regression coefficients were uniformly rescaled to make the maximum total score 200. The area under the receiver-operating characteristics curves of the models generated by the simplified scoring systems were identical to those derived from the original regression coefficients. Calibration was tested using the Hosmer-Lemeshow goodness-of-fit test. A classification and regression tree was used to define cutoffs for each component using the new scores. Each node split decision in the tree was chosen from the possible cutoffs for all components according to Gini’s coefficient impurity measure. The node and depth of tree were set manually."
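The calibration check named in the excerpt, the Hosmer-Lemeshow test, can be sketched in a few lines. This is a minimal illustration, not the article's code: patients are sorted by predicted risk, split into roughly equal-sized groups (10 by default, an assumption; implementations differ in how they form the groups), and observed events are compared with expected events in each group. The statistic is usually compared with a chi-square distribution on (groups - 2) degrees of freedom, e.g. `scipy.stats.chi2.sf(stat, df)` in Python or `pchisq(stat, df, lower.tail = FALSE)` in R. The simulated data below are made up.

```python
import random

def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow statistic for 0/1 outcomes y and predicted risks p.
    Returns (statistic, degrees of freedom)."""
    pairs = sorted(zip(p, y))  # sort patients by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(pi for pi, _ in chunk)   # expected number of events
        observed = sum(yi for _, yi in chunk)   # observed number of events
        mean_p = expected / size
        denom = size * mean_p * (1 - mean_p)
        if denom > 0:  # skip groups whose risks are all exactly 0 or 1
            stat += (observed - expected) ** 2 / denom
    return stat, groups - 2

if __name__ == "__main__":
    # Simulated, well-calibrated predictions (hypothetical data)
    random.seed(1)
    p = [random.random() for _ in range(1000)]
    y = [1 if random.random() < pi else 0 for pi in p]
    stat, df = hosmer_lemeshow(y, p)
    print(f"HL statistic = {stat:.2f} on {df} df")
```

A small, non-significant statistic means the predicted risks agree with the observed event rates across risk groups; it says nothing about discrimination, which is what the AUC measures.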
 
#2
Quote: "but is there some particular name for this kind of statistical analysis"
The names are:
a multivariate logistic regression model
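but it should be called a multiple logistic regression model, since there is one outcome (transfused or not) and several predictors. "Multivariate" usually means several outcome variables.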
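A minimal sketch of what fitting a multiple logistic regression involves, written in Python with NumPy and simulated data. The cohort names, "true" coefficients and the 0.5 cutoff are all made up for illustration, and both cohorts are simulated from the same model, so there is no intervention effect here. The model is fitted on one cohort by Newton-Raphson (the same maximum-likelihood fit that R's `glm(..., family = binomial)` performs via IRLS) and then applied to a second cohort to flag patients who were predicted to be transfused but were not, which is roughly the first step in the excerpt.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25, tol=1e-8):
    """Multiple logistic regression by Newton-Raphson (IRLS).
    X: (n, k) predictors without intercept; y: (n,) 0/1 outcome."""
    Xd = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xd @ beta))
        W = p * (1 - p)                       # binomial variance weights
        grad = Xd.T @ (y - p)                 # score vector
        hess = Xd.T @ (Xd * W[:, None])       # observed information
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def predict_prob(beta, X):
    Xd = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xd @ beta))

rng = np.random.default_rng(0)
true_beta = np.array([-1.0, 0.8, -0.5, 1.2])  # intercept + 3 predictors (made up)

def simulate(n):
    X = rng.normal(size=(n, 3))
    y = (rng.random(n) < predict_prob(true_beta, X)).astype(float)
    return X, y

X_pre, y_pre = simulate(2000)    # hypothetical "preintervention" cohort
X_post, y_post = simulate(1000)  # hypothetical "postintervention" cohort

beta_hat = fit_logistic(X_pre, y_pre)
p_post = predict_prob(beta_hat, X_post)
# Predicted to be transfused (p >= 0.5, an assumed cutoff) but not actually transfused
misclassified = (p_post >= 0.5) & (y_post == 0)
print(beta_hat.round(2), int(misclassified.sum()))
```

In R the same fit is a single `glm()` call followed by `predict(fit, newdata = ..., type = "response")`.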

A classification and regression tree
I believe the last two terms in the text, "decision tree" and "classification and regression tree", refer to the same thing.
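To make the Gini part of the excerpt concrete, here is a sketch of a single CART-style split on one score, under simplifying assumptions: one component only, 0/1 outcome, and made-up scores. It tries every midpoint between neighbouring distinct scores and keeps the cutoff that gives the lowest size-weighted Gini impurity in the two child nodes. A full tree (e.g. R's `rpart` package) searches all variables at every node, recurses, and prunes or stops at a chosen depth; the excerpt says the depth was set manually.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p1^2 - p0^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2

def best_cutoff(scores, labels):
    """Return (cutoff, weighted impurity) for the best single split on one score.
    cutoff is None if no split beats leaving the node unsplit."""
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    best = (None, gini(labels))  # impurity with no split at all
    for i in range(1, n):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # tied scores cannot be separated
        cut = (pairs[i][0] + pairs[i - 1][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best[1]:
            best = (cut, impurity)
    return best

print(best_cutoff([10, 20, 30, 40, 50, 60], [0, 0, 0, 1, 1, 1]))  # (35.0, 0.0)
```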

Here is another method:
the receiver operating characteristic (ROC) curve
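A small sketch of what the ROC curve and its area (AUC) are, in plain Python with made-up scores. The curve is traced by lowering the classification threshold through every distinct score and recording the false-positive and true-positive rates. The AUC equals the probability that a randomly chosen patient with the outcome gets a higher score than a randomly chosen patient without it (ties count one half), so it can be computed either way. In R, packages such as `pROC` do this for you.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs as the threshold is lowered through each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

def auc(scores, labels):
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels))                          # 0.75
print(trapezoid_area(roc_points(scores, labels)))  # 0.75
```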
- - - -

But I don't understand the way they combine these methods. It seems like they have made a "home-made" analysis.


This makes me worried:
To simplify the computation of the scoring system, the regression coefficients were uniformly rescaled to make the maximum total score 200.

So I would not trust them so much.
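For what the rescaling step probably amounts to, here is a sketch under stated assumptions: a hypothetical fitted model with five 0/1 risk factors, all with positive coefficients (the intercept and coefficients are invented). Every coefficient is multiplied by the same positive factor so that a patient with all risk factors gets 200 points. Because each score is a positive multiple of (linear predictor - intercept), patients keep the same ordering, which is why the excerpt can report identical AUCs for the original and the simplified scores.

```python
# Hypothetical fitted model: intercept and coefficients for five 0/1 risk factors
intercept = -3.0
coefs = [0.9, 0.4, 1.3, 0.6, 0.8]

def rescale_to_points(coefs, max_total=200.0):
    """Multiply every coefficient by the same positive factor so that a patient
    with all risk factors present scores max_total points.
    Assumes 0/1 predictors with positive coefficients."""
    factor = max_total / sum(coefs)
    return [b * factor for b in coefs], factor

def linear_predictor(x):
    """Log-odds from the original (unscaled) model."""
    return intercept + sum(b * xi for b, xi in zip(coefs, x))

def total_score(points, x):
    """Simplified point score: sum of points for the risk factors present."""
    return sum(p * xi for p, xi in zip(points, x))

points, factor = rescale_to_points(coefs)
patients = [[0, 0, 0, 0, 0], [1, 0, 1, 0, 0], [0, 1, 0, 1, 1], [1, 1, 1, 1, 1]]
for x in patients:
    # score == factor * (lp - intercept): same ranking of patients as the log-odds
    print(x, round(linear_predictor(x), 2), round(total_score(points, x), 1))
```

The rescaling itself does not lose information as long as the factor and intercept are kept, since the log-odds can be recovered as intercept + score / factor. Rounding points to whole numbers, which clinical scores often do, can create ties and change the AUC slightly.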
 
#3
Thank you for your reply. Is there a particular field within biostatistics that teaches how to combine multiple regression with classification and decision trees, as well as receiver operating characteristic (ROC) curve analysis, as in the example above? Personally, I am familiar with simple linear, logistic, ordinal, and Poisson regression models. How do they tie in with decision trees? I'd love to see a step-by-step example of that; any suggestions, please?
Again, thanks so much for your reply.
 
#4
I have tried to read the text again, and I still don't understand it, so I don't trust it. Sometimes people are a little too creative.

What is the title of the paper and where is it published?
 

hlsmith

#5
Yes, this would just fall under biostatistics; it has both diagnostic and prediction modeling components. As @GretaGarbo mentioned, the writing and approach seem a little more tedious than necessary. If you took the first three courses in biostatistics (introductory biostatistics, linear regression, and logistic regression), you would be close to having the skill set to run these analyses. Or you could do a lot of self-study with applications.