The development of innovative analytic tools for data-driven models is one of the most important issues in modern healthcare. The amount of data available, whether from a single patient or from a population of subjects, can be difficult to interpret and can slow down diagnosis and treatment. Machine Learning (ML) is at the forefront of this data-driven revolution. However, descriptive and prescriptive models in healthcare need to be easy to interpret and assess. This is not the case for cutting-edge ML algorithms such as Deep Neural Networks (DNNs), Support Vector Machines (SVMs) or Random Forests, and the lack of explanations for their decisions is a barrier to adopting these methods. Decision Trees, on the other hand, are highly interpretable but generalize poorly, and generalization is arguably the most important property of an ML model.
In this project, we propose to use Mixed Integer Optimization (MIO) to build optimal decision trees. At each node, these trees branch on hyperplane splits, or on even more complex splits, that combine multiple features. The MIO framework also allows further constraints to be imposed on the characteristics of the final tree. The optimal values of the tree parameters will be found using specialized exact algorithms.
The new ML model will be applied to two different problems in postural and rehabilitation medicine. On each problem, it will be compared with Deep Neural Networks, Support Vector Machines and Random Forests.
The project will comprise three connected threads:
1. Design of new ML models and optimization algorithms;
2. Collection of postural data and definition of an ML model that classifies subjects as healthy or unhealthy;
3. Collection of data from medical records and definition of an ML model that predicts the outcome of rehabilitation.
The research aims to investigate new optimization-based methods for Machine Learning and to apply them to real-world healthcare problems. The team brings together experts in optimization, machine learning and healthcare, who will jointly address the heterogeneous challenges of predictive healthcare. For each thread of the project, several considerations support the potential impact of the proposed approach.
Algorithmic improvements
Optimization algorithms have been both theoretical and practical pillars of ML-based disciplines. Their role in training DNNs and SVMs is well known. So far, however, the design of optimization algorithms for training classification trees has received much less attention, mainly because of the high computational complexity of the underlying integer optimization problems. Over the last twenty years, great improvements in both algorithms and hardware have made it feasible to solve the MIO problems that arise in ML applications. In particular, [Bertsimas & Dunn, 2017] proposed one of the first mixed integer linear formulations for optimal classification trees, opening new perspectives for interpretable ML, including in healthcare.
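For reference, the objective of the Optimal Classification Tree (OCT) formulation in [Bertsimas & Dunn, 2017] combines a misclassification term with a complexity penalty. The block below gives that objective as we recall it from the paper. It is a summary for orientation only and is not the new formulation proposed in this project. The first line applies to axis-aligned splits and the second to hyperplane splits.

```latex
\begin{aligned}
\min \quad & \frac{1}{\hat{L}} \sum_{t \in \mathcal{T}_L} L_t \;+\; \alpha \sum_{t \in \mathcal{T}_B} d_t
  && \text{(axis-aligned splits)} \\
\min \quad & \frac{1}{\hat{L}} \sum_{t \in \mathcal{T}_L} L_t \;+\; \alpha \sum_{t \in \mathcal{T}_B} \sum_{j=1}^{p} s_{jt}
  && \text{(hyperplane splits)}
\end{aligned}
```

The symbols are defined as follows:
- T_B and T_L are the branch and leaf nodes of a tree of fixed maximum depth.
- L_t is the number of misclassified training points in leaf t.
- L-hat is the misclassification count of the baseline that always predicts the most frequent class.
- d_t is a binary indicator of whether branch node t applies a split.
- s_jt is a binary indicator of whether feature j (out of p) enters the split at node t.
- alpha >= 0 is the complexity weight.

Larger values of alpha favor smaller and sparser trees. Dividing by L-hat is meant to make alpha comparable across datasets.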
Concerning innovative algorithms for classification trees, we plan to pursue several lines of research. The first consists of modifying the classical MIO formulation to reduce its prohibitive computational burden on large datasets. Several improvements are possible, starting with a new objective that combines two terms: one measuring the classification error and one measuring the complexity of the tree. Tree complexity can be measured with different criteria. We propose to measure it with a non-linear function whose linearization yields a new MIO problem with fewer integer variables and less symmetry, and therefore better suited to branch-and-cut algorithms. Moreover, ad-hoc Chvátal-Gomory cuts can be added to strengthen the formulation. These improvements could drastically reduce the computational effort needed to find an optimal classification tree, finally making interpretable ML practical for healthcare.
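To illustrate what a Chvátal-Gomory cut is, the following minimal Python sketch derives one from a generic system A x <= b over non-negative integer x. Given non-negative multipliers u, the cut floor(u^T A) x <= floor(u^T b) is valid for every integer feasible point, yet it can remove fractional points of the LP relaxation. The function name `chvatal_gomory_cut` and the toy system are hypothetical illustrations. The ad-hoc cuts mentioned above would be derived from the specific constraints of the tree formulation, which are not shown here.

```python
from fractions import Fraction
from math import floor
from typing import List, Sequence, Tuple


def chvatal_gomory_cut(
    A: Sequence[Sequence[int]],
    b: Sequence[int],
    u: Sequence[Fraction],
) -> Tuple[List[int], int]:
    """Return (coeffs, rhs) of the cut floor(u^T A) x <= floor(u^T b).

    Valid for every integer x >= 0 satisfying A x <= b, provided u >= 0.
    Fractions keep the arithmetic exact, so floor() is not disturbed by rounding.
    """
    m = len(A)
    if len(b) != m or len(u) != m:
        raise ValueError("A, b and u must have the same number of rows")
    if any(ui < 0 for ui in u):
        raise ValueError("multipliers u must be non-negative")
    n = len(A[0])
    # A non-negative combination of the rows gives the valid inequality (u^T A) x <= u^T b.
    combo = [sum((Fraction(u[i]) * A[i][j] for i in range(m)), Fraction(0)) for j in range(n)]
    rhs = sum((Fraction(u[i]) * b[i] for i in range(m)), Fraction(0))
    # Since x >= 0, rounding the coefficients down keeps the inequality valid;
    # for integer x the left-hand side is then an integer, so the right-hand
    # side can be rounded down as well.
    return [floor(c) for c in combo], floor(rhs)


# Toy system: 2*x1 + 2*x2 <= 3 with x1, x2 >= 0 integer.
# Its LP relaxation admits the point (1.5, 0); the cut below removes it.
coeffs, rhs = chvatal_gomory_cut([[2, 2]], [3], [Fraction(1, 2)])
print(coeffs, rhs)  # [1, 1] 1
```

In a branch-and-cut solver, cuts of this kind are typically generated only when they are violated by the current LP solution. Choosing good multipliers u for the tree formulation is part of the proposed work.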
Postural analysis based on non-invasive clinical data
Applying ML models in healthcare aims to bridge the gap between the personal experience of doctors and the available data. In postural analysis in particular, non-invasive clinical tests performed on patients yield hundreds of measurements that are indiscernible to the human eye. ML tools can uncover previously unknown relations among these values. Optimal classification trees are intended to overcome both the lack of explainability of the model and the uncertainty in certifying the procedure. This research aims to produce a tree-based model that uses non-invasive clinical information to predict posture-related information without harm to the patient, while consuming fewer resources and reducing the economic burden.
Prediction of the effectiveness of the rehabilitation procedure based on clinical data
Predicting the outcome of a rehabilitation procedure relies on a wide variety of parameters. Some of these are influenced by the personal perception of the physician who records them, which makes the data biased and noisy. Furthermore, traditional statistical tools have proved unreliable for many of these problems, because the underlying distributions and relations are more complex than classical models can capture. Interpretable ML techniques can overcome these issues and give physicians insight that has not been available so far. The potential benefits are twofold. On the one hand, organizational costs and negative externalities can be reduced. On the other, patient improvements can be tracked in real time. To this end, online algorithms may also be introduced to incorporate daily data iteratively and adapt to new conditions.
- Bertsimas, D., & Dunn, J. (2017). Optimal classification trees. Machine Learning, 106(7), 1039-1082.