Development and validation of a machine learning predictive model for perioperative myocardial injury in cardiac surgery with cardiopulmonary bypass

Background Perioperative myocardial injury (PMI) defined by different cut-off values has been shown to carry different prognostic implications after cardiac surgery. Machine learning (ML) methods have been widely used for perioperative risk prediction in cardiac surgery. However, the use of ML for predicting PMI has not yet been studied. Therefore, we sought to develop and validate ML models for PMI with different cut-off values in cardiac surgery with cardiopulmonary bypass (CPB). Methods This was a secondary analysis of a multicenter clinical trial (OPTIMAL), and the requirement for written informed consent was waived due to the retrospective design. Patients aged 18–70 years undergoing elective cardiac surgery with CPB from December 2018 to April 2021 were enrolled in China. The models were developed using data from Fuwai Hospital and externally validated using data from three other cardiac centres. Traditional logistic regression (LR) and eleven ML models were constructed. The primary outcome was PMI, defined as a postoperative maximum cardiac troponin I exceeding different multiples of the upper reference limit (URL; 40x, 70x, 100x, 130x). Model performance was measured by the area under the receiver operating characteristic curve (AUROC), the area under the precision-recall curve (AUPRC), and the Brier score for calibration. Results A total of 2983 eligible patients were included in model development (n = 2420) and external validation (n = 563). The CatboostClassifier and RandomForestClassifier emerged as potential alternatives to the LR model for predicting PMI. The AUROC showed an upward trend across the four cutoffs, peaking at 100x URL in the testing dataset and at 70x URL in the external validation dataset. However, the AUPRC decreased with each cutoff increment. Additionally, the Brier score decreased as the cutoffs increased, reaching its lowest value of 0.16 at the 130x URL cutoff.
Moreover, extended CPB time, extended aortic clamp time, elevated preoperative N-terminal pro-brain natriuretic peptide (NT-proBNP), reduced preoperative neutrophil count, higher body mass index, and increased high-sensitivity C-reactive protein (Hs-CRP) levels were identified as risk factors for PMI across all four cutoff values. Conclusions The CatboostClassifier and RandomForestClassifier algorithms could be alternatives to LR for predicting PMI. Furthermore, higher preoperative NT-proBNP and lower preoperative Hs-CRP were strong risk factors for PMI; the underlying mechanisms require further investigation. Supplementary Information The online version contains supplementary material available at 10.1186/s13019-024-02856-y.


Introduction
Annually, between 1 and 1.25 million cardiac surgeries are performed worldwide [1]. However, cardiac surgical procedures may induce flow disturbances during cardiopulmonary bypass (CPB), which can lead to perioperative myocardial injury (PMI) [2,3]. Additionally, temporary ischemic episodes, cardioplegia reperfusion, and varying vasopressor and inotrope doses can exacerbate myocardial damage [4]. Myocardial injury leads to the release of specific biomarkers such as cardiac troponin (cTn) I and T, as well as creatine kinase myocardial band (CK-MB). Elevated cTn levels above the 99th percentile upper reference limit (URL) indicate PMI [5]. Recommendations on the optimal biomarker cut-off for PMI differ significantly because of the wide variation in biomarker kinetics and assay kits [5,6]. PMI defined by different cutoff values affects the prognosis differently [7,8]. However, there is no relevant research on how the cutoff value affects risk prediction ability.
Machine learning (ML), which is thriving in medicine, has been validated as an efficacious data preprocessing approach [9][10][11][12]. However, the performance of ML in predicting PMI remains unknown.
Hence, in this study, we hypothesized that ML models, alongside traditional logistic regression, would perform effectively in estimating the risk of PMI from patient-specific variables across various types of cardiovascular surgery involving CPB. Model performance was evaluated using data from four cardiac centers in China and four different cTn cutoff values (40x, 70x, 100x, 130x URL).

Study design and participants
This study was a secondary analysis of a multi-center randomized clinical trial (OPTIMAL, conducted at four cardiac centers in China) [13]. It was approved by the Ethics Committee of Fuwai Hospital in Beijing (2018-1055), and the requirement for written informed consent was waived due to the retrospective design. An overview of the experimental design is presented in Fig. 1.
Patient inclusion criteria were as follows: (1) male or female adult patients aged 18-70 years, (2) patients who underwent elective surgery with CPB at our institution from December 2018 to April 2021. Patients without a record of cTnI were excluded. Preoperative and intraoperative variables, including demographic characteristics, baseline laboratory values, medical history, medication history, surgery time, CPB time, aortic clamp time, and surgery type, were extracted.
The present study adheres to the applicable Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guidelines [14].

Model development and evaluation
The data from Fuwai Hospital (Beijing) were randomly split into a training dataset (80%) and a testing dataset (20%). The data collected from three other cardiac centres in China (Fuwai Yunnan Cardiovascular Hospital, the First Affiliated Hospital of Wenzhou Medical University, and Fuwai Central China Cardiovascular Hospital) were used for external validation. Missing values in the development and validation datasets were imputed separately, using the mean for continuous variables and the most frequent category for categorical variables. In addition, the data were normalized with the standard scaler technique. A total of sixty available variables were captured to construct the predictive models. Features were selected in the training dataset using the least absolute shrinkage and selection operator (LASSO). In the LASSO model, the coefficients of uninformative variables are shrunk to exactly zero, which eliminates those variables from the model.
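The preprocessing and feature selection steps described above can be illustrated with a minimal sketch. It runs on synthetic data and is not the study's code: the sample size, the number of features, the penalty strength `C`, and the use of an L1-penalized logistic regression as the LASSO-type selector are all assumptions made for illustration. The imputer and scaler are fitted on the training split only, which is one common convention and may differ from the procedure actually used.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for a development cohort (assumption): 500 patients,
# 10 continuous features, of which only the first two drive the outcome.
n_patients, n_features = 500, 10
X = rng.normal(size=(n_patients, n_features))
logits = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(n_patients) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

# Remove about 5% of the values to mimic missing laboratory results.
X[rng.random(X.shape) < 0.05] = np.nan

# 80/20 split into training and testing datasets, stratified on the outcome.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Mean imputation followed by standard scaling, both fitted on training data.
imputer = SimpleImputer(strategy="mean").fit(X_train)
scaler = StandardScaler().fit(imputer.transform(X_train))
X_train_s = scaler.transform(imputer.transform(X_train))
X_test_s = scaler.transform(imputer.transform(X_test))

# L1-penalized logistic regression: coefficients shrunk to exactly zero
# correspond to features dropped from the model.
lasso = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
lasso.fit(X_train_s, y_train)
selected = np.flatnonzero(lasso.coef_.ravel() != 0.0)
print("selected feature indices:", selected)
```

A smaller `C` strengthens the penalty and removes more features; in practice it would be chosen by cross-validation on the training dataset.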
The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to assess model discrimination. The Brier score and calibration curve were used to assess model calibration. The accuracy, precision, recall, and F1 score were also calculated to evaluate the models comprehensively. Furthermore, decision curve analysis was conducted. We selected the best-performing model based on a combination of three metrics in the following order of priority: the highest AUROC, the highest AUPRC, and a well-calibrated calibration curve. In addition, all features were visualized, along with their ranked importance, using the SHapley Additive exPlanations (SHAP) interpreter [15]. The methods were in accordance with our previously published paper [16].
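As an illustration of these metrics, the sketch below computes AUROC, AUPRC, the Brier score, and the net benefit used in decision curve analysis for a toy set of predictions. The helper `evaluate` and the four example predictions are hypothetical. AUPRC is estimated here by average precision, which is one common estimator and not necessarily the one used in the study.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    brier_score_loss,
    roc_auc_score,
)


def evaluate(y_true, y_prob, threshold=0.5):
    """Return discrimination, calibration and net-benefit summaries."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    y_pred = y_prob >= threshold
    n = y_true.size
    true_pos = np.sum(y_pred & (y_true == 1))
    false_pos = np.sum(y_pred & (y_true == 0))
    # Net benefit at threshold probability pt: TP/n - FP/n * pt / (1 - pt).
    net_benefit = true_pos / n - false_pos / n * threshold / (1.0 - threshold)
    return {
        "auroc": roc_auc_score(y_true, y_prob),
        "auprc": average_precision_score(y_true, y_prob),
        "brier": brier_score_loss(y_true, y_prob),
        "net_benefit": net_benefit,
    }


# Toy example: four patients, two with the outcome.
metrics = evaluate([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A decision curve is obtained by repeating the net-benefit calculation over a range of threshold probabilities and comparing the model with the treat-all and treat-none strategies.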

Statistical analysis
Categorical variables were presented as numbers and percentages, and continuous variables were presented as mean with standard deviation (SD) or median (Q1, Q3).
A normality test was conducted on the continuous variables. The χ2 test and Fisher's exact test were used for categorical variables (p < 0.05 indicated statistical significance). Student's t-test and the Mann-Whitney U test were applied for continuous variables.
The Python programming language (Python Software Foundation, version 3.9.7, with the Jupyter Notebook 1.1.0 integrated development environment) and SPSS software version 26.0 (IBM Corp., Armonk, New York, USA) were used in our analysis.
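The choice between these tests can be sketched in Python with SciPy as follows. This is an illustration on simulated values, not the study's analysis: the Shapiro-Wilk test as the normality check, the expected-count rule of 5 for switching to Fisher's exact test, and the helper names `compare_continuous` and `compare_categorical` are all assumptions.

```python
import numpy as np
from scipy import stats

ALPHA = 0.05  # significance level stated in the text


def compare_continuous(a, b):
    """Student's t-test if both groups look normal, else Mann-Whitney U."""
    _, p_a = stats.shapiro(a)
    _, p_b = stats.shapiro(b)
    if p_a > ALPHA and p_b > ALPHA:
        return "t-test", stats.ttest_ind(a, b).pvalue
    result = stats.mannwhitneyu(a, b, alternative="two-sided")
    return "Mann-Whitney U", result.pvalue


def compare_categorical(table):
    """Fisher's exact test for sparse 2x2 tables, chi-square test otherwise."""
    table = np.asarray(table)
    expected = stats.contingency.expected_freq(table)
    if table.shape == (2, 2) and (expected < 5).any():
        _, p_value = stats.fisher_exact(table)
        return "Fisher", p_value
    _, p_value, _, _ = stats.chi2_contingency(table)
    return "chi-square", p_value


# Simulated CPB times in minutes for patients without and with PMI.
rng = np.random.default_rng(1)
cpb_no_pmi = rng.normal(100, 20, size=80)
cpb_pmi = rng.normal(130, 20, size=60)
print(compare_continuous(cpb_no_pmi, cpb_pmi))

# Rows: exposure yes/no; columns: PMI yes/no (made-up counts).
print(compare_categorical([[30, 10], [10, 30]]))
print(compare_categorical([[3, 1], [1, 3]]))
```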

Patient characteristics
A total of 2983 eligible patients were included in this study: 2420 from Fuwai Hospital (Beijing) for model development (1936 for the training dataset and 484 for the testing dataset) and 563 from three other cardiac centres for external validation. Four different cut-off values of cTn (40x, 70x, 100x, 130x URL) were used to define PMI. The demographics of the development and validation datasets are described in Table 1.
Fig. 1 Overview of the study
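The outcome definition can be made concrete with a short sketch that derives one binary PMI label per cutoff from each patient's peak postoperative cTnI. The URL value and the peak values below are hypothetical and chosen only for illustration, since the URL depends on the assay; the helper `pmi_labels` is likewise not from the study.

```python
import numpy as np

CUTOFF_MULTIPLES = (40, 70, 100, 130)


def pmi_labels(peak_ctni, url):
    """Label PMI as peak cTnI strictly above k x URL for each multiple k."""
    peak = np.asarray(peak_ctni, dtype=float)
    return {k: (peak > k * url).astype(int) for k in CUTOFF_MULTIPLES}


url = 0.02  # hypothetical assay URL in ng/mL
peak_ctni = [0.5, 1.0, 1.5, 2.5, 3.0]  # hypothetical peak values in ng/mL

labels = pmi_labels(peak_ctni, url)
for k, y in labels.items():
    print(f"{k}x URL: labels={y.tolist()}, prevalence={y.mean():.1f}")
```

Because the same patients are relabelled at each cutoff, the proportion of positive cases falls as the cutoff rises, which should be kept in mind when comparing prevalence-sensitive metrics such as AUPRC across cutoffs.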

Model construction
We constructed the eleven ML models and the LR model based on the approaches mentioned above. The LASSO selected the following features to enter the final models: preoperative variables including age; sex; body

Model performance of different PMI cut-off values
The model performances with different PMI cutoff values were calculated across the twelve models (eleven ML algorithms and LR). The AUCs with varying cutoffs are summarized in Fig. 2.
In the external validation dataset, the LR model achieved AUPRCs of 0.67, 0.70, 0.65, and 0.65, with   -3, supplementary Tables 1-4. Furthermore, the decision curves for the four PMI cutoffs are presented in Fig. 3.

SHAP interpreter for the models
A SHapley Additive exPlanations (SHAP) summary plot was used to illustrate the feature importance of the predictive model. High SHAP values indicate an increased risk of PMI. According to the CAT classifier model, in the testing dataset, the top five features with a 40x URL were coronary artery bypass graft (CABG) surgery type, Hs-CRP, body temperature, hemoglobin at the end of CPB, and neutrophil count; the top five features with a 70x URL were CABG surgery, NT-proBNP, CPB time, aortic time, and Hs-CRP; the top five with a 100x URL were CPB time, aortic time, NT-proBNP, surgery time, and Hs-CRP; and the top five with a 130x URL were CPB time, aortic time, NT-proBNP, surgery time, and CABG surgery.
In the external validation dataset, the top five features with a 40x URL were Hs-CRP, CABG surgery type, neutrophil count, body temperature, and prothrombin time; the top five features with a 70x URL were CPB time, NT-proBNP, CABG surgery, aortic time, and surgery time; the top five with a 100x URL were CPB time, aortic time, surgery time, Hs-CRP, and NT-proBNP; and the top five with a 130x URL were CPB time, aortic time, surgery time, NT-proBNP, and Hs-CRP. The SHAP values for the different cutoffs are presented in Supplementary Fig. 4.

Discussion
In this retrospective cohort study, we developed and externally validated eleven ML models and the traditional LR method based on four different PMI cutoffs. The ML models, especially the CAT and RF models, exhibited better discrimination and calibration compared with the LR model. Additionally, the top five risk factors across all four cutoffs were prolonged CPB time, aortic clamp time, surgery time, elevated preoperative NT-proBNP, and decreased preoperative Hs-CRP. These findings highlight the potential use of CAT and RF models in estimating PMI risk and guiding clinical decision-making in cardiac surgery.
To our knowledge, this is the first study to focus on establishing an ML predictive model for PMI with a large sample size. In this study, across various cardiovascular surgery types and URL cutoffs, the CAT model emerged as a potential candidate for forecasting PMI risk among the eleven ML algorithms. The CAT algorithm, a gradient boosting method built on binary decision trees, can yield convincing results with limited training data and computational power by reducing the computation time, the chance of overfitting, and the hyperparameter-tuning burden [17,18]. PMI, a common complication after cardiac surgery, has been associated with substantial short- and long-term mortality [19][20][21]. However, differences in cTnI values between assays and manufacturers may influence the cutoff for PMI, and most studies have focused mainly on CABG and percutaneous coronary intervention (PCI) [22,23]. Thus, we investigated a wide range of cardiovascular surgery types for the potential risk of PMI. Moreover, we explored the predictive model with a wide range of URL cut-offs for PMI [6][7][8]. The AUROC exhibited an upward trend across the four cutoffs, reaching its peak at 100x URL in the testing dataset and at 70x URL in the external validation dataset. However, it is important to note that the AUPRC decreased with each increment in cutoff. Furthermore, the Brier score decreased as the cutoffs increased, reaching its lowest value of 0.16 at the 130x URL cutoff.
Furthermore, we explored the potential risk factors for PMI with four different cutoffs. A previous study reported that preoperative high-dose statin loading played an essential role in preventing PMI by downregulating the release kinetics of cardiac biomarkers such as cTnI, CK-MB, and NT-proBNP [24]. In addition, hypotension and transcription orchestration played a crucial role in PMI [25,26]. However, the above studies merely analyzed the potential perioperative risk factors for PMI during cardiac surgery with CPB. In this study, CPB time and aortic clamp time were strongly positively correlated with PMI. Two plausible explanations exist. First, the activation of the systemic inflammatory response mediated by the CPB circuit upregulates inflammatory cytokines and small molecules such as interleukin-8, interleukin-10, and tumor necrosis factor α, which could exacerbate myocardial injury [27]. More importantly, a longer duration of CPB is associated with increased plasma levels of soluble syndecan-1, a signal of endothelial glycocalyx degradation, which could precipitate neutrophil egress from the bone marrow, contributing to and amplifying the systemic inflammatory response [28]. Second, prolonged CPB time and aortic clamp time were significantly associated with endotoxin levels. The intestinal mucosa is especially vulnerable to hypoperfusion during CPB, and endotoxin can be dispersed into the circulating blood, exacerbating the myocardial injury [29]. It is imperative to enhance patient management during cardiac surgery and reduce the incidence of severe complications, especially PMI, which is largely clinically silent and only ascertained by routine troponin screening. Of note, over 90% of patients with elevated troponin have no ischemia-related electrocardiographic or echocardiographic evidence. Thus, they cannot be diagnosed with myocardial infarction as defined in the 4th Universal Definition [5].
A previous double-blind, randomized controlled trial showed that higher NT-proBNP levels could predict PMI following elective vascular operations [30]. Consistent with this, our study confirmed that preoperative NT-proBNP was positively correlated with PMI. Although the NT-proBNP level is a marker of volume overload and is used to guide outpatient therapy among patients with heart failure [31], a previous study found an additional mechanism of its release, with the potential to modify oxidant stress in the heart [30]. Furthermore, we also confirmed that patients undergoing valvular surgery were more prone to PMI [32]. Intriguingly, preoperative neutrophil count, BMI, and Hs-CRP were negatively correlated with PMI, which needs further prospective investigation.
There are several limitations. First, our study had a retrospective design, which may introduce unmeasured confounding. Although we conducted an external validation, further validation is needed before our models can be confidently applied to other populations, institutions, and regions. Second, our risk model is tailored to patients undergoing cardiac surgery with CPB and may not be applicable to other surgery types. Third, the population in our study mainly underwent valve surgery, which may limit the use of the model in other types of surgery. Fourth, the optimal cutoff needs more extensive and detailed investigation.

Conclusions
The CAT and RF algorithms could be alternatives to LR for predicting PMI. Furthermore, higher preoperative NT-proBNP and lower preoperative Hs-CRP were strong risk factors for PMI; the underlying mechanisms require further investigation.

Fig. 2 (A) The ROC-AUC of logistic regression and the highest AUROC among the eleven machine learning algorithms with 40x, 70x, 100x, and 130x URL in the development dataset. (B) The ROC-AUC of logistic regression and the highest AUROC among the eleven machine learning algorithms with 40x, 70x, 100x, and 130x URL in the external validation dataset. (C) The PR-AUC of logistic regression and the highest AUPRC among the eleven machine learning algorithms with 40x, 70x, 100x, and 130x URL in the development dataset. (D) The PR-AUC of logistic regression and the highest AUPRC among the eleven machine learning algorithms with 40x, 70x, 100x, and 130x URL in the external validation dataset. SVM: support vector machine; AB: AdaBoostClassifier; RF: RandomForestClassifier; CAT: CatboostClassifier; EX: ExtraTreeClassifier; LGBM: LGBMClassifier; GB: GradientBoostingClassifier

Table 1
Demographics of the development and external validation datasets