Long-term mortality prediction after operations for type A ascending aortic dissection

Background There are few long-term mortality prediction studies after acute aortic dissection (AAD) Type A and none were performed using new models such as neural networks (NN) or support vector machines (SVM) which may show a higher discriminatory potency than standard multivariable models. Methods We used 32 risk factors identified by Literature search and previously assessed in short-term outcome investigations. Models were trained (50%) and validated (50%) on 2 random samples from a consecutive 235-patient cohort. NN were run only on patients with complete data for all included variables (N = 211); SVM on the overall group. Discrimination was assessed by receiver operating characteristic area under the curve (AUC) and Gini's coefficients along with classification performance. Results There were 84 deaths (36%) occurring at 564 ± 48 days (95%CI from 470 to 658 days). Patients with complete variables had a slightly lower death rate (60 of 211, 28%). NN classified 44 of 60 (73%) dead patients and 147 of 151 (97%) long-term survivors using 5 covariates: immediate post-operative chronic renal failure, circulatory arrest time, the type of surgery on ascending aorta plus hemi-arch, extracorporeal circulation time and the presence of Marfan habitus. Global accuracies of training and validation NN were excellent with AUC respectively 0.871 and 0.870 but classification errors were high among patients who died. Training SVM, using a larger number of covariates, showed no false negative or false positive cases among 118 randomly selected patients (error = 0%, AUC 1.0) whereas validation SVM, among 117 patients, provided 5 false negative and 11 false positive cases (error = 22%, AUC 0.821, p < 0.01 versus NN results). An html file was produced to adopt and manipulate the selected parameters for practical predictive purposes. Conclusions Both NN and SVM accurately selected a few operative and immediate post-operative factors and the Marfan habitus as long-term mortality predictors in AAD Type A. Although these factors were not new per se, their combination may be used in practice to index death risk post-operatively with good accuracy.


Background
Type A acute aortic dissection (AAD) requires emergency replacement of the ascending aorta and/or the aortic arch with or without aortic valve replacement and inhospital mortality ranges from 7 to 30% in recent series [1,2]. Among 526 patients enrolled from 1996 to 2001 by the International Registry of AAD investigators, 30-day mortality was 25.1% on average [1]. A large list of pre-, intra-and immediate post-operative factors may inde-pendently contribute to increase the mortality risk at short-term (see [2] for extensive review). These include: history of aortic valve replacement, migrating chest pain, hypotension and/or shock, cardiac tamponade, limb ischemia, the length of extracorporeal circulation and chronic renal failure. There has also been an effort to investigate whether surgical techniques may contribute to modify the risk; however inconsistent results were obtained as to the role of retrograde, anterograde or selective cerebral perfusion after circulatory arrest [1,2]. More recently, anatomo-surgical parameters [3] and biological indexes, such as D-dimer values above a given threshold [4], were assessed as diagnostic tools, but no study was performed to clarify their potential predictive role. On the other hand, it is largely unknown whether the assessed short-term risk factors may also predict long-term (say 1-to 2-year) mortality in Type A AAD patients.
Aim of the study was therefore to see whether selected risk factors assessed previously for prediction of 30-day mortality risk among Type A AAD patients [1,2], may also contribute to index long-term prediction using neural networks known to have a larger global accuracy as compared to standard models such as logistic regression [2,5]. In addition, to improve discrimination between cases and non cases [6], which is essential once new risk equations are tested in general and in cardiac surgical outcome studies [7][8][9][10] in particular, support vector machines (SVM) were also used [11,12] for the first time on this material.
For each patient there were 32 potential predictors including demographic characteristics and pre-, operative and immediate post-operative variables including dummies (see Additional File 1 for the definition of mathematical, computational or statistical technicalities) constructed in order to index operative techniques and related complications. These were selected based on a Literature review of studies performed to assess the role of relatively short-term potential predictors [2]. Thus, year of surgery, hospital localization, age, sex and presence of clinically diagnosed high blood pressure and Marfan habitus were considered. Among AAD onset symptoms we coded shock and whether intubation was present at arrival or neurological deficits were present. Previous cardiac surgery was also coded. Among intraoperative variables there were: cross-clamping and total circulatory arrest times in min after extracorporeal perfusion started along with operative techniques (whether ascending aorta plus arch or hemi-arch or plus aortic valve and whether by Bentall or Cabrol, all as dummies versus ascending aorta alone). We also coded whether cerebral perfusion was anterograde, retrograde or both. Immediate post-operative complications were noted for each patient and included: total bleeding in ml, limb ischemia, by clinical and CT documentation, renal complica-tions, including oligo-anuria and continuous hemodialysis, gastrointestinal complications such as bleeding and ischemia, and other complications requiring medical or surgical treatment and cerebral accidents, neurological deficits and coma, by clinical and CT documentation. For the definition of the analysed variables we followed those reported in previous studies [1,2].
Follow-up was performed by periodic visits and/or telephone contacts. Death certificates and all pertinent records were reviewed: time and causes of death were considered and patients alive were censored. For the purpose of the study we concentrate here on all-cause mortality.

Statistical Analysis
Data are expressed as means ± SD or SE (when appropriate). The selection of potential predictors was done a priori based on previous knowledge [2,5,13]. Linear correlation with the outcome variable and information value (that is the relative importance of each covariate) were considered. Follow-up data were investigated by modelling the presence (coded 1) or absence (coded 0) of post-operative mortality using Tiberius Data Mining © software (version 6.1.5; see http://www.tiberius.biz) to obtain multilayer perceptron (MLP) neural network solutions. These were from a 3-layer network, including the hidden unit containing 2 neurons (one linear and the second non-linear), with 32 input nodes (corresponding to the 32 potential risk factors selected) and one output unit, modelling the dichotomous risk outcome [2,5]. MLP were trained on a randomly selected sub sample (50% of all patients included), preventing over-fitting [14,15]. Validation was performed on the remaining 50%. Gini's coefficient and graph [16] were produced. Receiver operating characteristic (ROC) areas under the curve (AUC) were compared [17,18] between solutions using MedCalc software (version 9.6.3.0; see http://www.medcalcsoftware.com). To run SVM [11] cSVM (version 3.1.0; see http://www.smartlab.dibe.unige.it) was used with optimal C search on 50% of the overall sample. There are similarities between neural networks and SVM since an SVM with a sigmoid kernel is equivalent to a neural network with a sigmoid activation function and one hidden unit, the difference being only the number of neurons, automatically selected by a SVM [12]. A value of p < 0.05 was considered statistically significant in all cases.

Univariate contributors
The univariate contribution of the 32 potential risk factors for AAD Type A is shown in Additional File 2, Table  S1 among the 235 patients studied (see Additional File 2). These patients were from 2 Cardiac Surgical Centres, one in central and the other in southern Italy, and were fol-lowed-up from 8 months to 7 years post operation. There were 84 deaths (36%): 81 (95%) of these were of cardiac origin, whereas the remaining 4 (5%) presented mixed causes, from accidents to cancer and suicide. Deaths occurred at 564 ± 48 (mean ± SE) days (95%CI from 470 to 658 days). To index the relative discrimination between cases and non cases (variable = Status) provided individually by these factors, the table shows the information value, Gini's coefficient and linear correlation. A good information value (> 0.5) is provided by chronic renal failure, bleeding in the first post-operative 24 hours, extracorporeal circulation and circulatory arrest times, age, and dummies for post-operative neurological coma and immediate post-operative dialysis in continuous. Apart bleeding in the first post-operative 24 hours, the other variables present a high linear correlation and a large Gini's coefficient.

Multivariable contribution by NN
There were 211 (90%) patients who had all variables for analysis whereas missing data were seen from 0.4 to 9.4%, depending on the examined variable (Additional File 2, Table S1). There was a slightly lower, not significantly different death rate among patients with complete information for all variables (60 of 211, 28%), than among the overall studied patients. Neural network model classified 44 of 60 (73%) dead patients and 147 of 151 (97%) longterm survivors using 5 covariates: immediate post-operative chronic renal failure, circulatory arrest time, the type of surgery on ascending aorta plus hemi-arch, extracorporeal circulation time and the presence of Marfan habitus. Figure 1 shows a semi-quantitative graphic presentation of these risk factors for training and validation models. The proportions of dead patients identified by neural network were slightly lower in training and validation runs (respectively 69 and 64%) than in the overall study. However, much similar proportions were correctly identified among long-term survivors (respectively 97 and 100%). Of note that global accuracies (as detected by ROC AUC) were extremely high (respectively 0.871 and 0.870). Figure 2 shows the results of the SVM run on the overall study group (N = 235), since by this method there is no limitation to confine the analysis to patients with complete data for all variables, as with neural networks. A somewhat different picture is provided by SVM as compared to neural network. First, SVM make use of a larger number of covariates, some of which provided little if any information, yet globally enabled to obtain a Gini's coefficient of 1.00 (using 27 of 32 covariates) with no false negative or false positive cases identified among 118 randomly selected AAD Type A patients (error = 0%).

Multivariable contribution by SVM
Second, when validation SVM were run on the remaining 117 patients, the Gini's coefficient was 0.642 (with an ROC AUC = 0.821), a statistically lower (p < 0.01) result as compared to those obtained by neural network model. There were 15 false negative and 11 false positive cases (error = 22%) identified. Third, validation and training SVM used different covariates to predict outcome and there was a relatively different ranked importance.

Variables selected in common by NN and SVM
There were 4 covariates (circulatory arrest time, immediate post-operative chronic renal failure, the type of surgery on ascending aorta plus hemi-arch, and the presence of Marfan habitus) selected in common by neural network models and both training and validation SVM. It is important to consider that a high correlation (r = 0.31) exists between circulatory arrest and extracorporeal circulation times (results not shown).

Discussion
This is the first investigation to adopt neural networks and support vector machines to assess the relatively longterm predictive role of a quite large series of potential risk factors including pre-operative, operative and immediately post-operative variables in AAD Type A patients. The presence of Marfan habitus, the length of circulatory arrest, an intervention on the ascending aorta plus hemiarch and immediate post-operative chronic renal failure were the risk factors selected in common by these methods with a very high global accuracy (ROC AUC > 0.82). Although the factors selected were not new, their combination might be used in practice to enable the construction of risk charts whereby levels of risk might be defined. However, it is clear that the corresponding cells of these charts need to contain a sufficient number of cases and non cases, which is presumably possible only after large multi-centre and/or multinational cooperative efforts will be undertaken. The evidence presented here might contribute to stimulate cooperation to reach this aim.
The presented rules provided very good predictive and discrimination properties, however only Marfan habitus was a parameter that could be used pre-operatively. Determination a priori about which patients are not candidates for surgery is therefore not possible using the evidence of this investigation. Nevertheless, as there were 2 operative parameters contributing to increase long-term mortality risk, it is important that attention is paid to keep the length of circulatory arrest at the minimal level and to consider that an intervention on the aorta plus hemi-arch conveys an independent risk of lower survival. On the other hand, all efforts should be done to reduce the incidence of post-operative chronic renal failure.
The incidence of AAD Type A has been estimated at from 5 to 30 per million people per year in the United States, which is 880 to 147 times less than the incidence of acute myocardial infarction, but still provides an important clinical problem and sometimes a dilemma for the differentiating difficulties between these presentations [1][2][3]. Although biological thresholds of plasma molecules such as D-dimer are actively looked for in order to improve diagnosis [4], this may not have an impact on prediction before the results of larger studies are obtained. Therefore, risk profiling remains crucial. Based on results obtained by the IRAD investigators, short-term mortality could be reduced from as high as 58% in medically treated patients to the current average figure of 25.1% (and sometimes less) when surgery is performed [1]. Risk factors may contribute to better management and a more defined risk assessment [1,2]: in-hospital mortality was as high as 31.4% in unstable patients presenting with cardiac tamponade, shock, congestive heart failure, cerebro-vascular accident, stroke, coma, acute myocardial and/or mesenteric ischemia and acute renal failure at the time of operation, whereas stable patients may present with a mortality as low as 16.7%.
In a previous report we investigated 30-day mortality among 208 patients from 2 Italian Centres [2] using a series of demographic, pre-operative, operative and postoperative characteristics, selected from 37 such variables considered in the Literature as potential predictors of short-term mortality after AAD Type A. When logistic or neural network models were produced in one Centre and applied to the data from the second Centre, for external validation [13][14][15], there were predictors which were  Table S1. Keep = 1 means that covariate may stay in the model. Note that Gini's coefficients are practically identical for training and validation neural network models, respectively 0.742 and 0.741 (ROC AUC: 0.871 and 0.870, respectively). Therefore, training and validation neural network models have a very high, yet similar, accuracy and define a set of 5 predictive covariates useful to index long-term mortality in patients operated for Type A ascending aorta dissection.

Variable
When the performance and/or reliability of predictive models is limited, or of low sensitivity and specificity, their capability may be hampered to identify high risk subjects who deserve individualized treatment [13]. The neural network method stems [14,15] from its potential for improved predictive performance by exploring, hidden layers to find nonlinearities, interactions and nonlinear interactions among predictors. The attraction of neural networks is quite evident from the impressive growth of results published [15]. However, there are relatively few comparative reports on the performance and accuracy of neural networks, which was assessed only versus multiple logistic function, to predict events in clinical [9] or epidemiological [5,18] cardiovascular studies.
There has been some controversy as to whether new risk predictors, or series of old and newer ones, can add to the prediction of events, including mortality, in terms of clinical utility, impact or discrimination [6]. Although in clinical and epidemiological experiences discrimination metrics (such as ROC AUC) are quite well established methods [2,5,18,20,21], it has been pointed out that ROC AUC are insensitive in comparing models [6], which may be circumvented however by making comparisons with fixed number of covariates [5]. To evaluate and compare predictive risk models there have been therefore new methods to be proposed, based primarily on stratification into clinical categories on the basis of risk and attempts to assess the ability of new models to more accurately reclassify individuals into higher or lower risk strata [22,23]. Risk reclassification for single factors can be then examined by using models with and without each risk factor in turn or measuring the net reclassification improvement, that is the difference in proportions moving up and down risk strata among case patients versus control participants [6,23]. Whatever reclassification method is selected it is important to understand that when length of follow-up differs (as in the present series) among individuals and/or the cohort is relatively small it may be impossible to apply them [6]. Moreover and more importantly, reclassification methods depend on the particular categories used [6]: in our case it is far from established if a 5%, 10%, 20%, 30% or more are adequate categories of long-term risk of AAD Type A. To compare with established experiences in preventive cardiology [20,24] or coronary by-pass surgery [25], the sensitivity and specificity of the abovementioned thresholds should be accurately assessed, which again calls for large amount of data being collected and therefore improved multicentre collaboration.

Conclusions
The classification provided by neural network models and related SVM may represent a compromise to cope with the necessity to assess the clinical relevance of variables used for predictive purposes in AAD Type A patients, but also in different areas of research. These methods may also go beyond the classical contention of standard predictive models, namely that only predictors that are statistically significant are typically used [6]. Indeed, with SVM a high discrimination is obtained by using a large number of variables, most with little informative content if used alone. As we have shown, however, it is extremely important not only to train but to validate these methods, which demands further study and the accumulation of very large data sets. Our results may well stimulate these efforts.
An important take-home message for clinicians should be that with neural networks and SVM, by concentrating on a few risk factors such as those described here, it is possible to predict long-term mortality in AAD Type A patients with a global good accuracy. We produced an html tool (see Additional File 3) based on the neural network solution reported here, whereby it is easy to appreciate that increasing from 60 to 80 min the circulatory arrest time, the patient long-term risk category evolves from false (survival) to true (dead) at an assessment strength (roughly the degree of certitude) of 1/3. By further increasing circulatory arrest times to 120 and 180 min, the assessment strengths become 2/3 and almost 1, respectively. Although Surgeons know well and from decades that this is a hardly steerable variable in the clinical practice, a dimensional outcome predictive assessment might be obtained using our tool immediately after the operation is finished, which may have an impact for further clinical decision making. The other variables described in the present study might also be used for predictive assessments so that a very large combination of clinical presentations could be easily modeled.