Decision tree algorithm in locally advanced rectal cancer: an example of over-interpretation and misuse of a machine learning approach

01 Journal publication
De Felice F., Crocetti D., Parisi M., Maiuri V., Moscarelli E., Caiazzo R., Bulzonetti N., Musio D., Tombolini V.
ISSN: 0171-5216

Purpose: To analyse the classification performance of a decision tree method applied to predictors of survival in patients with locally advanced rectal cancer (LARC). The aim was to offer a critical analysis that helps apply tree-based approaches in clinical practice and improves their interpretation.

Materials and methods: Data concerning patients with histologically proven LARC between 2007 and 2014 were reviewed. All patients were treated with a trimodality approach with curative intent. The Kaplan–Meier method was used to estimate overall survival (OS). Decision tree methods were used to select the important variables for outcome prediction.

Results: A total of 100 patients were included. The 5-year and 7-year OS rates were 76.4% and 71.3%, respectively. Age, co-morbidities, tumor size, clinical tumor classification (cT) and clinical node classification (cN) were the important predictor variables in the tree's construction. Overall, 13 distinct groups of patients were defined. Patients aged < 65 years with cT3 disease and elderly patients with a tumor size < 5 cm seemed to have the highest survival rates. However, the process over-fitted the data, leading to poor algorithm performance.

Conclusion: We proposed a decision tree algorithm to identify known and new pre-treatment clinical predictors of survival in LARC. Our analysis confirmed that tree-based machine learning methods, especially classification trees, can be easily interpreted even by a non-expert in the field, but controlling cross-validation error is mandatory to capture their statistical power. The classification error trend must be analysed carefully when choosing the important predictor variables, especially in small datasets. Machine learning approaches should be considered a new, unexplored frontier in LARC. Based on large datasets, decision trees represent an opportunity to improve the decision-making process in clinical practice.
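
The caution about over-fitting and cross-validation error can be illustrated with a minimal sketch. The Python below is not the authors' analysis. It assumes a hypothetical, randomly generated cohort of 100 records with two made-up predictors (age and tumor size), a small hand-written Gini-based classification tree, and 5-fold cross-validation; every function name, parameter value and probability is an assumption chosen only for illustration.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, max_depth):
    """Greedy binary tree: a leaf is an int class, a node is (feature, threshold, left, right)."""
    majority = int(2 * sum(labels) >= len(labels))
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    best = None
    for j in range(len(rows[0])):
        for thr in sorted({r[j] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[j] <= thr]
            right = [i for i, r in enumerate(rows) if r[j] > thr]
            if not right:  # threshold at the maximum value splits nothing
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, j, thr, left, right)
    if best is None:
        return majority
    _, j, thr, left, right = best
    return (j, thr,
            build_tree([rows[i] for i in left], [labels[i] for i in left], max_depth - 1),
            build_tree([rows[i] for i in right], [labels[i] for i in right], max_depth - 1))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's class."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if row[j] <= thr else right
    return tree

def error_rate(tree, rows, labels):
    """Fraction of records the tree misclassifies."""
    wrong = sum(predict(tree, r) != y for r, y in zip(rows, labels))
    return wrong / len(rows)

def cv_error(rows, labels, max_depth, k=5):
    """Mean held-out classification error over k folds (fold = index modulo k)."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        test = [i for i in range(len(rows)) if i % k == fold]
        tree = build_tree([rows[i] for i in train], [labels[i] for i in train], max_depth)
        errors.append(error_rate(tree, [rows[i] for i in test], [labels[i] for i in test]))
    return sum(errors) / k

# Hypothetical synthetic cohort: (age in years, tumor size in cm); label 1 = survivor.
# The probabilities 0.8 and 0.6 are arbitrary and do not come from the study.
rng = random.Random(0)
rows = [(rng.uniform(40, 85), rng.uniform(1, 10)) for _ in range(100)]
labels = [int(rng.random() < (0.8 if age < 65 else 0.6)) for age, _ in rows]

results = {}
for depth in (1, 2, 4, 8):
    full_tree = build_tree(rows, labels, depth)
    results[depth] = (error_rate(full_tree, rows, labels), cv_error(rows, labels, depth))
    print(f"max_depth={depth}: training error={results[depth][0]:.2f}, "
          f"5-fold CV error={results[depth][1]:.2f}")
```

In this sketch the training error cannot increase as the tree deepens, because each deeper tree refines the leaves of the shallower one. Comparing it against the cross-validation error at each depth is one way to see whether added splits generalize or merely memorise a small sample.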

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma