A comparative study of four intensive care outcome prediction models in cardiac surgery patients
© Doerr et al; licensee BioMed Central Ltd. 2011
Received: 31 October 2010
Accepted: 1 March 2011
Published: 1 March 2011
Outcome prediction scoring systems are increasingly used in intensive care medicine, but most were not developed for use in cardiac surgery patients. We compared the performance of four intensive care outcome prediction scoring systems (Acute Physiology and Chronic Health Evaluation II [APACHE II], Simplified Acute Physiology Score II [SAPS II], Sequential Organ Failure Assessment [SOFA], and Cardiac Surgery Score [CASUS]) in patients after open heart surgery.
We prospectively included all consecutive adult patients who underwent open heart surgery and were admitted to the intensive care unit (ICU) between January 1st 2007 and December 31st 2008. Scores were calculated daily from ICU admission until discharge. The outcome measure was ICU mortality. The performance of the four scores was assessed by calibration and discrimination statistics. Derived variables (Mean- and Max- scores) were also evaluated.
During the study period, 2801 patients (29.6% female) were included. Mean age was 66.9 ± 10.7 years and the ICU mortality rate was 5.2%. Calibration tests for SOFA and CASUS were reliable throughout (p-value not < 0.05), but there were significant differences between predicted and observed outcome for SAPS II (days 1, 2, 3 and 5) and APACHE II (days 2 and 3). CASUS, and its mean- and maximum-derivatives, discriminated better between survivors and non-survivors than the other scores throughout the study (area under curve ≥ 0.90). In order of best discrimination, CASUS was followed by SOFA, then SAPS II, and finally APACHE II. SAPS II and APACHE II derivatives had discrimination results that were superior to those of the SOFA derivatives.
CASUS and SOFA are reliable ICU mortality risk stratification models for cardiac surgery patients. SAPS II and APACHE II did not perform well in terms of calibration and discrimination statistics.
Summary of variables included in the different postoperative scoring systems
NYHA IV (cardiac)
Patient dependence on respirator
Central nervous system
Therapeutic low immunity
This study involved an evaluation of prospectively collected data from all consecutive adult patients admitted to our ICU after cardiac surgery. Patients admitted between January 1st 2007 and December 31st 2008 were included and the study was approved by the Institutional Review Board of Friedrich Schiller University Hospital (approval no.: 2809-05/10). Only the first admission was considered for patients who were readmitted to the ICU during the study period. Data were collected from the quality control system QUIMS 2.0b (University Hospital of Muenster, Germany) and from the intensive care information system COPRA 5.2 (COPRASYSTEM GmbH, Sasbachwalden, Germany), which is interfaced with patient monitors (Philips IntelliVue MP70, Amsterdam, Netherlands), ventilators (Draeger Evita IV, Luebeck, Germany and Hamilton Galileo, Bonaduz, Swizerland), blood gas analyzing devices (ABL 800Flex Radiometer, Copenhagen, Denmark) and the central laboratories.
The attending physician collected the study data of all scores for the first postoperative week. Two assigned medical clerks validated the data collection daily. A senior consultant performed a second periodical validation. Inconsistency between the raters was resolved by consensus. There were no missing data. Outcome was defined as ICU mortality. The scores were calculated using the most abnormal value for each variable per day. The maximum derivative of any scoring system (Max-score) was defined as the worst daily score throughout the whole ICU stay. Mean-score was calculated by dividing the sum of all daily values during the ICU stay by the ICU length of stay (ICULOS) in days.
Statistical analyses were performed with SPSS software version 18 (SPSS Inc, Chicago, IL). Graphics were drawn using Microsoft Excel software. Continuous scale data are presented as mean ± standard deviation (SD) and were analyzed using the two-tailed Student's t-test for independent samples. The Kolmogorov-Smirnov test showed a normal distribution of the continuous data. A p value of < 0.05 was considered as significant. Calibration was performed using the Hosmer-Lemeshow (HL) test (goodness-of-fit-test) to insure the absence of a significant discrepancy between predicted and observed mortality. Calibration was considered good when there was a low χ2 value and a high p value (>0.05). Discrimination (ability of a scoring model to differentiate between survival and death) was evaluated with receiver-operating-characteristic (ROC) curves; the area under the curve (AUC) indicates the discriminative ability of the scores, i.e., the ability to discriminate survivors from non-survivors. AUCs enable direct comparison of different scoring systems: An AUC of 0.5 (a diagonal line) is equivalent to random chance, AUC >0.7 indicates a moderate prognostic model, and AUC >0.8 (a bulbous curve) indicates a good prognostic model. The overall correct classification (OCC) (the ratio of number of correctly predicted survivors and non-survivors to the total number of patients) values of the scores were calculated. The risk of mortality is given as odds ratios for all scores with 95%-confidence intervals. All statistical analyses were performed from ICU day 1 (n = 2801) (operative day) to day 6 (n = 431 patients) only, in order to obtain accurate statistical results and to avoid a small number of patients. The preoperative logistic and additive EuroSCORE were also statistically tested.
Type of surgery in the study population
Isolated valve surgery
Combined CABG & valve surgery
Ascending aorta and aortic arch surgery
Combined ascending aorta & valve surgery
Combined ascending aorta & coronary surgery
Congenital, cardiac tumors, pulmonary embolectomy, Assist device implantation
Day 1-6: Logistic regression, OCC, calibration (HL), discrimination (ROC) for EuroSCORE, CASUS, SOFA, SAPSII, APACHEII
ICU-Day 1 (2801)
ICU-Day 2 (2769)
ICU-Day 3 (1234)
ICU-Day 4 (815)
ICU-Day 5 (566)
ICU-Day 6 (430)
Logistic regression/odds ratio, OCC, calibration (HL), discrimination (ROC) for CASUS, SOFA, SAPSII, APACHEII derivatives
Derivative of the Scoring model
Patients undergoing cardiac surgery show temporary pathophysiological effects related to the heart-lung-machine [5, 6] that can influence the values of the postoperative scoring systems  and may make them unreliable in this population. These effects include the relatively long mechanical ventilation time needed to stabilize these patients [8, 9] and the postoperative sedation that limits the role of the Glasgow Coma Scale (GCS) as a prognostic parameter . Electrolyte- and blood glucose imbalances are also frequent . All these factors are temporary and have a limited effect on prognosis. In addition, most currently used scoring systems ignore some of the parameters that can influence outcomes in these patients. The most common examples of this are the use of intra-aortic balloon pumps (IABP) and ventricular assist devices (VAD), and the presence of postoperative low cardiac output syndrome (LCOS) [5, 6, 8, 11]. In 2005, CASUS  was suggested as a specialized cardiac surgery scoring system that took into account the special circumstances encountered in the ICU after cardiac surgery. However, many ICUs are still using the general postoperative risk stratification models for cardiac surgery patients, notably, in central Europe, the SOFA, APACHE II and SAPS II scores. Postoperative risk stratification is increasingly used, especially in cardiac surgery, and we believed it was important to compare these widely used scoring systems with the relatively new model (CASUS) to try and identify the optimal tool in this field.
The APACHE II model , published in 1985, was developed to simplify the original APACHE model and has become the most frequently used general mortality prediction model. APACHE II has been extensively validated, and despite being the oldest system, it still performs well . More recent versions (APACHE III and IV) have not been widely adopted. All the APACHE models are based on the most abnormal values registered during the first 24 h after ICU admission. However, because several studies [13, 14] have supported serial daily usage of postoperative risk stratification models, we chose to evaluate APACHE II on all ICU days. In our study, APACHE II had the worst discrimination of the four models studied but its calibration was better than that of SAPS II.
SAPS II was developed in 1994  based on a European/North American database, which included 13,152 patients. Logistic regression analysis was used to select variables, and for weighting and conversion of the score to give the probability of hospital mortality for ICU patients over the age of 18. Although cardiac surgery patients were originally excluded from the score's target, it is used in many cardiac ICUs. SAPS II has been extensively studied and validated. There seems to be quite convincing evidence of the ability to maintain good discrimination across different populations, but calibration is often poor [15, 16]. Our study in cardiac surgery patients, confirmed the poor calibration of SAPS II and its discrimination was worse than that of SOFA and CASUS. SAPS III  was introduced in 2005 in an attempt to overcome shortcomings related to different case-mixes and lead-time bias of SAPS II. However, its calibration and discrimination set were shown to vary widely around the world  so that many centers in central Europe still use the older version.
The SOFA was originally developed in 1996 as a morbidity risk stratification model for patients with sepsis . Because of its good performance and reliability, SOFA is widely used as a scoring model for ICU patients not only for morbidity but also for mortality prediction . In 2003, Ceriani et al.  suggested the use of SOFA in cardiac surgery patients. Based on the good results they obtained in 218 patients, they concluded that SOFA was applicable in cardiac surgery without any need for specific modifications. SOFA comprises separate daily scores for respiratory, renal, cardiovascular, central nervous, coagulation, and hepatic systems. The scores can be used in several ways, as individual scores (for each organ), as the sum of scores on a single ICU day, or as the sum of the worst scores during the ICU stay.
CASUS was developed based on retrospective analyses to identify descriptors of mortality and multiorgan dysfunction in postoperative cardiac surgical patients. It was then evaluated prospectively in 3230 patients in a single center study . The main goal was to develop a scoring model that was specific to this type of patient and had a minimum number of descriptors. CASUS is, therefore, a compact score index with only ten, readily available descriptors. This scoring system has not yet been externally validated in multicenter studies, and accordingly, has not yet gained much popularity.
The ideal scoring system should not only be simple and reproducible but also reliable. This reliability can be assessed using calibration and discrimination tests, considered by the European Society of Intensive Care Medicine (ESICM) to be the best methods to validate score systems and prognostic parameters . It has been argued that perfect discrimination is important in order to evaluate an individual patient's risk using a scoring system, whereas for clinical trials or comparison of care between ICUs better calibration is needed . Accordingly, validations of scoring systems in the literature have frequently been achieved using good discrimination tests, although the HL test has often resulted in unreliable calibration. The HL test is very sensitive to the size of the study population with large numbers of patients resulting in unreliable calibration . This fact is applicable to study populations larger than 5000 patients , which was not the case in our study. In other words, if the HL-test, in studies with more than 5000 patients, is significant this does not necessarily mean that the scoring systems are not useful or are unreliable .
However, our study, with a more optimal size of study population, showed that APACHE II and SAPS II are not suitable for use in cardiac surgery patients. CASUS and SOFA had an acceptable performance with the HL-test compared to the other two scores. CASUS was clearly superior in its ability to discriminate between survival and death on all days. This predictive property allows complications to be anticipated in individual patients and should alert residents, especially those with relatively little experience, to ask for help. The OCC (the ratio of correctly predicted number of survivors and non-survivors to the total number of patients) was also better in CASUS than in the other scores. We decided not to compare the different scores using odds ratios, because conclusions from such analyses can be distorted, as the maximum points in the different scoring systems vary significantly. Nevertheless odds ratios are useful tools to estimate the risk of mortality. Hence, for example, results can be influenced by different inotropic regimes or fluid replacement strategies in different hospitals. The assessment of the central nervous system may also affect results because the GCS is affected by sedation, anesthesia and paralysis [10, 21, 22], and calculation requires clinical evaluation, which may be biased by subjective interpretation [6, 10, 22]. CASUS is not affected by these problems. Its simple variable, 'neurologic state', can be calculated in less than one minute per patient per day. The parameters included in any scoring system influence its usefulness in different populations of patients. It is, therefore, perhaps not surprising that CASUS, which was specifically constructed for cardiac surgery patients, is superior to general severity systems in this group.
Mean- and max-score derivatives were introduced for SOFA by Moreno et al.  in 1999 and Ferreira et al.  in 2001. These methods were extended by Ceriani et al.  in 2003. We chose to calculate mean- and max-values for all four scores. However, it should be remembered that calculating the mean- and max-values adds some degree of selectivity to the model. The mean-derivative of a model reflects the overall average, whereas the max-derivative highlights the peak of organ dysfunction during the postoperative ICU stay; both are associated with the ICULOS, and thus allow a defined outcome prediction. The mean- and max-derivatives of all scores demonstrated better calibration, discrimination and OCC than the original models.
Similar to other studies, we detected a severe decrease in the study population on the third day because uncomplicated cases had been transferred to the general floor (Table 3). It is therefore important that a score is reliable during the first two days so that patients at risk are not discharged too early potentially leading to ICU-readmission and/or prolonged hospital stay, both of which are associated with higher mortality rates [25, 26]. The good prognostic abilities of SOFA and CASUS in this study suggest they could be used to identify high-risk patients, enabling certain precautions to be put into place, such as daily monitoring of physiological dysfunction , and allowing prognoses and therapeutic choices, including withdrawal of therapy, to be discussed and reconsidered . Nevertheless, no scoring system can replace clinical evaluation at a patient's bedside; they can only serve as an objective tool in decision making. Although scoring systems may provide an indication of disease severity and prognosis in individual patients and assist in overall patient assessment along with full clinical evaluation and other available parameters, they are designed for use in groups of patients and should never be the sole basis for therapeutic decisions .
SOFA and CASUS are reliable tools for risk stratification in cardiac surgery patients. CASUS is more accurate than SOFA in mortality prediction. In contrast, APACHE II and SAPS II are not the tools of choice for this group of patients.
We thank Dr. Tobias Berg of Friedrich-Schiller-University, Jena, Germany for his substantial technical and statistical support, and for the realization of the online CASUS calculation, which can be found on the following homepages http://www.cardiac-icu.org (English version) and http://www.cardiac-icu.de (German version). Furthermore an App for iPhone, iPad and iPod touch is available for free on the iTunes App store: http://itunes.apple.com/us/app/cardiac-icu/id389965786?mt=8.
- Knaus W, Draper E, Wagner D, Zimmerman J: APACHE II: a severity of disease classification system. Crit Care Med. 1985, 13: 818-829. 10.1097/00003246-198510000-00009.View ArticlePubMedGoogle Scholar
- Le Gall JR, Lemeshow S, Saulnier F: A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study. JAMA. 1993, 270: 2957-2963. 10.1001/jama.270.24.2957.View ArticlePubMedGoogle Scholar
- Vincent JL, Moreno R, Takala J, Willatts S, De Mendonca A, Bruining H, Reinhart K, Suter PM, Thijs LG: The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 1996, 22: 707-710. 10.1007/BF01709751.View ArticlePubMedGoogle Scholar
- Hekmat K, Kroener A, Stuetzer H, Schwinger RH, Kampe S, Bennink GB, Mehlhorn U: Daily assessment of organ dysfunction and survival in intensive care unit cardiac surgical patients. Ann Thorac Surg. 2005, 79: 1555-1562. 10.1016/j.athoracsur.2004.10.017.View ArticlePubMedGoogle Scholar
- Ryan TA, Rady MY, Bashour CA, Leventhal M, Lytle B, Starr NJ: Predictors of outcome in cardiac surgical patients with prolonged intensive care stay. Chest. 1997, 112: 1035-1042. 10.1378/chest.112.4.1035.View ArticlePubMedGoogle Scholar
- Turner JS, Morgan CJ, Thakrar B, Pepper JR: Difficulties in predicting outcome in cardiac surgery patients. Crit Care Med. 1995, 23: 1843-1850. 10.1097/00003246-199511000-00010.View ArticlePubMedGoogle Scholar
- Weiss YG, Merin G, Koganov E, Ribo A, Oppenheim-Eden A, Medalion B, Peruanski M, Reider E, Bar-Ziv J, Hanson WC, Pizov R: Postcardiopulmonary bypass hypoxemia: a prospective study on incidence, risk factors, and clinical significance. J Cardiothorac Vasc Anesth. 2000, 14: 506-513. 10.1053/jcan.2000.9488.View ArticlePubMedGoogle Scholar
- Kollef MH, Wragge T, Pasque C: Determinants of mortality and multiorgan dysfunction in cardiac surgery patients requiring prolonged mechanical ventilation. Chest. 1995, 107: 1395-1401. 10.1378/chest.107.5.1395.View ArticlePubMedGoogle Scholar
- Rady MY, Ryan T, Starr NJ: Perioperative determinants of morbidity and mortality in elderly patients undergoing cardiac surgery. Crit Care Med. 1998, 26: 225-235. 10.1097/00003246-199802000-00016.View ArticlePubMedGoogle Scholar
- Marik PE, Varon J: Severity scoring and outcome assessment. Computerized predictive models and scoring systems. Crit Care Clin. 1999, 15: 633-646. 10.1016/S0749-0704(05)70076-2.View ArticlePubMedGoogle Scholar
- Higgins TL, Estafanous FG, Loop FD, Beck GJ, Lee JC, Starr NJ, Knaus WA, Cosgrove DM: ICU admission score for predicting morbidity and mortality risk after coronary artery bypass grafting. Ann Thorac Surg. 1997, 64: 1050-1058. 10.1016/S0003-4975(97)00553-5.View ArticlePubMedGoogle Scholar
- Strand K, Flaatten H: Severity scoring in the ICU: a review. Acta Anaesthesiol Scand. 2008, 52: 467-478. 10.1111/j.1399-6576.2008.01586.x.View ArticlePubMedGoogle Scholar
- Badreldin AM, Kroener A, Heldwein MB, Doerr F, Vogt H, Ismail MM, Bossert T, Hekmat K: Prognostic value of daily cardiac surgery score (CASUS) and its derivatives in cardiac surgery patients. Thorac Cardiovasc Surg. 2010, 58: 1-6. 10.1055/s-0030-1250080.View ArticleGoogle Scholar
- Ceriani R, Mazzoni M, Bortone F, Gandini S, Solinas C, Susini G, Parodi O: Application of the sequential organ failure assessment score to cardiac surgical patients. Chest. 2003, 123: 1229-1239. 10.1378/chest.123.4.1229.View ArticlePubMedGoogle Scholar
- Harrison DA, Brady AR, Parry GJ: Recalibration of risk prediction models in a large multicenter cohort of admissions to adult, general critical care units in the United Kingdom. Crit Care Med. 2006, 34: 1378-1388. 10.1097/01.CCM.0000216702.94014.75.View ArticlePubMedGoogle Scholar
- Aegerter P, Boumendil A, Retbi A: SAPS II revisited. Intensive Care Med. 2005, 31: 416-423. 10.1007/s00134-005-2557-9.View ArticlePubMedGoogle Scholar
- Moreno RP, Metnitz PG, Almeida E: SAPS 3 Investigators. SAPS 3 - from evaluation of the patient to evaluation of the intensive care unit. Part 2: development of a prognostic model for hospital mortality at ICU admission. Intensive Care Med. 2005, 31: 1345-1355. 10.1007/s00134-005-2763-5.View ArticlePubMedPubMed CentralGoogle Scholar
- 2nd European Consensus Conference in Intensive Care Medicine: Predicting outcome in ICU patients. Intensive Care Med. 1994, 20: 390-397. 10.1007/BF01720917.
- Sakr Y, Krauss C, Amaral A, Réa-Neto A, Specht M, Reinhart K, Marx G: Comparison of the performance of SAPS II, SAPS 3, APACHE II, and their customized prognostic models in a surgical intensive care unit. Br J Anaesth. 2008, 101: 798-803. 10.1093/bja/aen291.View ArticlePubMedGoogle Scholar
- Kramer AA, Zimmerman JE: Assessing the calibration of mortality benchmarks in critical care: The Hosmer-Lemeshow test revisited. Crit Care Med. 2007, 35: 2212-2213. 10.1097/01.CCM.0000281522.70992.EF.View ArticleGoogle Scholar
- Vincent JL, Ferreira F, Moreno R: Scoring systems for assessing organ dysfunction and survival. Crit Care Clin. 2000, 16: 353-366. 10.1016/S0749-0704(05)70114-7.View ArticlePubMedGoogle Scholar
- Marshall JC, Cook DJ, Christou NV, Bernard GR, Sprung CL, Sibbald WJ: Multiple organ dysfunction score: a reliable descriptor of a complex clinical outcome. Crit Care Med. 1995, 23: 1638-1652. 10.1097/00003246-199510000-00007.View ArticlePubMedGoogle Scholar
- Moreno R, Vincent JL, Matos R, Mendonca A, Cantraine F, Thijs L, Takala J, Sprung C, Antonelli M, Bruining H, Willatts S: The use ofmaximum SOFA score to quantify organ dysfunction/failure in intensive care. Results of a prospective, multicentre study. Working Group on Sepsis Related Problems of the ESICM. Intensive Care Med. 1999, 25: 686-696. 10.1007/s001340050931.View ArticlePubMedGoogle Scholar
- Ferreira FL, Bota DP, Bross A, Melot C, Vincent JL: Serial evaluation of the SOFA score to predict outcome in critically ill patients. JAMA. 2001, 286: 1754-1758. 10.1001/jama.286.14.1754.View ArticlePubMedGoogle Scholar
- Chung DA, Sharples LD, Nashef SAM: A case-control analysis of readmissions to the cardiac surgical intensive care unit. Eur J Cardiothorac Surg. 2002, 22: 282-286. 10.1016/S1010-7940(02)00303-2.View ArticlePubMedGoogle Scholar
- Michalopoulos A, Stavridis G, Geroulanos S: Severe sepsis in cardiac surgical patients. Eur J Surg. 1998, 164: 217-222. 10.1080/110241598750004670.View ArticlePubMedGoogle Scholar
- Hutchinson C, Craig S, Ridley S: Sequential organ scoring as a measure of effectiveness of critical care. Anaesthesia. 2000, 55: 1149-1154. 10.1046/j.1365-2044.2000.01608.x.View ArticlePubMedGoogle Scholar
- Pintor PP, Colangelo S, Bobbio M: Evolution of case-mix in heart surgery: from mortality risk to complication risk. Eur J Cardiothorac Surg. 2002, 22: 927-933. 10.1016/S1010-7940(02)00566-3.View ArticlePubMedGoogle Scholar
- Heijmans JH, Maessen JG, Roekaerts PMHJ: Risk stratification for adverse outcome in cardiac surgery. Eur J Anaesthesiol. 2003, 20: 515-527. 10.1097/00003643-200307000-00002.View ArticlePubMedGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.