Coronavirus disease 2019 (COVID-19) has caused a pandemic with unprecedented mortality, morbidity and economic implications. Policymakers and epidemiologists are striving to predict aggregate-level trends and inform decision-makers, while physicians and patients look for novel approaches to individual risk prognostication and avoidance. Yet limited evidence is available to support accurate estimation of the COVID-19 burden, forecasting of future trends, or prediction of individual risk.
One of the key challenges in analyzing aggregate COVID-19 data is the multidimensional interplay between aggregate variables, individual variables, time series features, repeated measurements, moderators, and dependent variables. This challenge is also an opportunity to improve forecasting, if tackled with modern self-learning big data science approaches. It is also evident that standard analytical approaches, limited to a few independent variables and one or two dependent variables, cannot accurately inform on future trends at the aggregate level, nor on individual risk prediction. Indeed, clinical risk prediction models developed to date for COVID-19 are limited in scope and accuracy. Finally, to date no model has been able to inferentially test the effectiveness of proposed interventions (e.g. hard lockdown) or define their risk-benefit profile (e.g. survival vs unemployment).
The present research project will have important implications at both the aggregate level (global, national, regional and local) and the individual level. First, it will compare different data sources and weigh their accuracy and informativeness in a comprehensive fashion. Second, it will provide predictions and forecasting analyses at a very detailed level, including estimates for clusters of outcomes of interest (clinical, societal, economic). Accordingly, it will inform on the most appropriate course of action to mitigate and contain risks due to COVID-19, as well as similar biological threats to humanity. Third, it will provide personalized predictions to individuals, capitalizing on aggregate predictive models as well as individual risk prediction scores, concisely summarized in an intelligible fashion. All these activities will be self-learning, in the sense that the devised models will adapt and adjust their parameters with appropriate fine-tuning algorithms as soon as new data are collected.
The project will yield several benefits in terms of societal interest and scope of activities. First, the data source search and selection phase will identify which datasets on COVID-19 and related information are most reliable, comprehensive and informative, and such guidance will prove crucial even for research or business projects beyond the scope of the present one. Second, the application of automated machine learning and deep learning tools, their comparison, and their fine-tuning across repeated cycles of forecasting, data collection and model rebuilding will show which analytical strategy proves most effective and useful for aggregate risk forecasting in terms of complexity, accuracy and stability. Such details will prove very useful for other research groups aiming to identify the most valuable models for aggregate forecasting of COVID-19 risks.
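As a purely illustrative aid, the cycle of forecasting, data collection and model rebuilding described above can be sketched as a rolling-origin evaluation, in which each candidate model is refitted on all data seen so far and scored on the next observation. This is a minimal sketch under stated assumptions, not the project's actual pipeline: the two toy forecasters, the function names, the mean absolute error metric and the synthetic series are all hypothetical stand-ins for the machine learning and deep learning models to be compared.

```python
from statistics import mean


def naive_forecast(history):
    """Hypothetical baseline: predict that the next value equals the last one."""
    return history[-1]


def moving_average_forecast(history, window=3):
    """Hypothetical baseline: predict the mean of the last `window` values."""
    return mean(history[-window:])


def rolling_origin_mae(series, forecaster, min_train=3):
    """Score a forecaster over repeated forecast/collect/rebuild cycles.

    At each step the forecaster sees only the data collected so far,
    predicts the next point, and is then scored against the true value.
    Returns the mean absolute error across all cycles.
    """
    errors = []
    for t in range(min_train, len(series)):
        prediction = forecaster(series[:t])  # rebuild on data seen so far
        errors.append(abs(series[t] - prediction))  # score on new data
    return mean(errors)


# Synthetic toy counts for illustration only (not real COVID-19 data).
toy_series = [10, 12, 15, 19, 24, 30, 37, 45]

scores = {
    "naive": rolling_origin_mae(toy_series, naive_forecast),
    "moving_average": rolling_origin_mae(toy_series, moving_average_forecast),
}

# Select the candidate with the lowest error over all cycles.
best_model = min(scores, key=scores.get)
print(scores, best_model)
```

In a real comparison the same loop would wrap far richer models, and the selection criterion would also need to weigh complexity and stability, not accuracy alone.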
In addition, all models, from the early ones to the final chosen ones, will be provided for free in a dedicated web repository (github.com) to ensure transparency and foster further use, including by low-income and limited-resource organizations and countries. Such refined forecasting models will also inform on the best management strategies (e.g. lockdown, drug therapy, vaccination) to implement and their most plausible consequences at the aggregate and individual level. Finally, the final models for aggregate data forecasting will be integrated with individual risk scores for COVID-19 in a dedicated website and app, made available for free to all interested parties and individuals, in order to improve global, national, regional, local, and individual risk prediction, decision making, and risk minimization. In particular, the findings of our work will enable quantitative appraisal of global, national, regional and local risks of adverse outcomes, in terms of individual health as well as economic growth or crisis. Accordingly, the models resulting from the machine learning and deep learning algorithms will inform on the most appropriate societal, organizational and individual strategies for risk avoidance and mitigation (e.g. lockdown, isolation, use of personal protective equipment). Risk quantification and its counterpart in economic terms will also help refine, when appropriate and required, insurance cost and risk estimation.