In a protocol for automatic contact tracing, devices in a distributed environment (e.g., smartphones) exchange information that makes it possible to keep track of each person's recent contacts in an automatic and transparent manner. According to epidemiologists, this is an effective way to slow down the spread of a virus during an epidemic (such as the ongoing COVID-19 pandemic): whenever a person tests positive for the virus, all of their recent contacts can be notified and immediately quarantined.
A crucial property of automatic contact tracing protocols is their resilience to attacks that try to pollute the collected information to the point that it becomes useless. Another essential requirement is that authorities or third parties must not be able to abuse such protocols to violate the privacy of citizens.
The project SPECTRA will lay the foundations of *secure* and *privacy-preserving* protocols for automatic contact tracing. In particular, we will:
- Characterize the security properties of such protocols in a precise manner, which covers all known and future attacks, and analyze current proposals in this respect.
- Put forward new efficient protocols with strong security and privacy guarantees under minimal assumptions.
- Investigate new algorithms based on adversarial machine learning and process mining for exploiting the collected data effectively even in the presence of malicious inputs, as well as decentralized learning techniques (i.e., federated learning) to preserve users' privacy.
- Develop a prototype implementation that builds on already developed and widespread infrastructures (such as distributed ledger technologies).
SPECTRA is framed within an EU initiative for the development of a common toolbox for Member States, which sets out the various relevant parameters to enable a coordinated development and use of officially recognized contact tracing applications.
SPECTRA aims at laying the foundations of secure and privacy-preserving automatic contact tracing, which is an effective complementary measure for battling infectious diseases for which vaccines are not yet available. In particular, the project will bring:
1) The first solutions to the problem satisfying strong security and privacy guarantees in a well-defined and general model, along with an efficient prototype implementation.
2) New machine learning algorithms, combined with process mining, to exploit data collected within automatic contact tracing systems in a privacy-preserving and useful manner.
We elaborate on the innovative aspects of each of these contributions in more detail below.
PROVABLE SECURITY
Both centralized and decentralized solutions have been shown to be prone to a number of attacks. These attacks range from tracing or de-anonymizing infected users [12,14], to proving contact with an infected user [13,14], to forcing opponents into quarantine [14]. Many of these attacks stem from the fact that current designs do not consider the possibility of the attacker colluding with the server, or of the server colluding with the authority. Additionally, none of the existing solutions is receipt-free, in the sense that they all allow reporting users to be rewarded. All of these aspects will be part of our security model, and thus our protocols will be resilient to such attacks.
Furthermore, the advantage of our approach is that it will provide security and privacy by design. In particular, breaking the security or privacy of our solutions will be as hard as solving well-studied computational problems that mathematicians have long tried, and so far failed, to solve efficiently. This offers a win-win situation: either automatic contact tracing is secure, or an attack yields a major breakthrough in computational complexity. Moreover, security in the real-ideal paradigm ensures that our protocols will be as secure as having a trusted authority running the contact tracing system on behalf of the users, which is the best we can hope for.
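As a rough illustration of what security in the real-ideal paradigm usually means, the requirement can be written as below. The notation is an assumption made here for illustration and is not fixed by the project: \Pi denotes the contact tracing protocol, \mathcal{F} the ideal functionality played by the trusted authority, \mathcal{A} an efficient attacker against the real protocol, \mathcal{S} an efficient simulator interacting with \mathcal{F}, \lambda the security parameter, and \approx_c computational indistinguishability.

```latex
\forall\, \text{PPT } \mathcal{A} \;\; \exists\, \text{PPT } \mathcal{S} : \quad
\mathrm{REAL}_{\Pi,\mathcal{A}}(\lambda) \;\approx_c\; \mathrm{IDEAL}_{\mathcal{F},\mathcal{S}}(\lambda)
```

Informally, whatever an attacker can achieve against the real protocol could also be achieved against the trusted authority itself, so the protocol leaks nothing beyond what the ideal system reveals by design.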
Finally, we will leverage blockchains as a means to store transactions in a distributed, tamper-proof manner. As transactions are cryptographically signed by the sender and stored only upon distributed validation and the achievement of decentralized consensus, distributed ledgers will serve as a non-repudiable audit trail of the conducted operations [23].
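The tamper-evidence idea behind such an audit trail can be sketched in a few lines. This is a minimal, single-machine illustration and not the project's design: the entry layout and the names `append_entry` and `verify_log` are hypothetical, an HMAC stands in for the sender's digital signature (an HMAC is symmetric and therefore does not provide non-repudiation), and distributed validation and consensus are not modeled at all.

```python
import hashlib
import hmac
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first entry


def _digest(body: dict) -> str:
    """Hash a canonical JSON encoding of an entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def _tag(body: dict, key: bytes) -> str:
    """Authenticate a body. Stand-in for a signature; NOT non-repudiable."""
    return hmac.new(key, _digest(body).encode("utf-8"), hashlib.sha256).hexdigest()


def append_entry(log: list, sender: str, payload: str, sender_key: bytes) -> dict:
    """Append an authenticated entry that commits to the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS_HASH
    body = {"sender": sender, "payload": payload, "prev_hash": prev_hash}
    tag = _tag(body, sender_key)
    entry = dict(body, tag=tag, hash=_digest(dict(body, tag=tag)))
    log.append(entry)
    return entry


def verify_log(log: list, keys: dict) -> bool:
    """Check the hash chain and every authentication tag, in order."""
    prev_hash = GENESIS_HASH
    for entry in log:
        body = {
            "sender": entry["sender"],
            "payload": entry["payload"],
            "prev_hash": entry["prev_hash"],
        }
        key = keys.get(entry["sender"])
        if key is None or entry["prev_hash"] != prev_hash:
            return False
        if not hmac.compare_digest(_tag(body, key), entry["tag"]):
            return False
        if entry["hash"] != _digest(dict(body, tag=entry["tag"])):
            return False
        prev_hash = entry["hash"]
    return True
```

Because every entry commits to the hash of its predecessor, changing any stored payload invalidates that entry and, implicitly, everything after it. A real deployment would replace the HMAC with a public-key signature scheme so that entries are verifiable by anyone and attributable to their sender.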
ROBUST AND PRIVACY-PRESERVING LEARNING FROM CONTACT-TRACING DATA
Contact-tracing data collected by each individual device represents a potential gold mine, which machine learning, process mining, and data analytics solutions (henceforth collectively referred to as learning) can exploit to extract knowledge. Just to name a few examples, we can envisage effective methods to:
- learn more accurate models of the spread of infectious diseases;
- predict geographical areas which can turn into possible clusters of infected people;
- identify which user habits and behaviors may catalyze the diffusion of the infection.
Unfortunately, standard learning techniques may suffer from two key issues:
- models are easily fooled by carefully-crafted inputs [24];
- model training must be performed on data that is stored on a single machine.
In response to the first issue above, adversarial learning (AL) [25, 26] is a prominent research area whose main goal is to make machine learning (ML) models robust against adversarial examples, most notably through the development of novel attacker-aware learning algorithms [27].
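To make the notion of a carefully crafted input concrete, the following sketch perturbs an input to a toy logistic model in the style of the fast gradient sign method introduced in [24]. The model, its weights, the input, and the perturbation budget `epsilon` are all hypothetical values chosen for illustration; none of them comes from the project.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of class 1 under a logistic model."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def sign(value):
    """Return -1, 0, or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_perturb(weights, bias, x, label, epsilon):
    """Shift each feature by epsilon in the direction that increases the loss."""
    # For the logistic (cross-entropy) loss, d(loss)/dx_i = (p - label) * w_i.
    error = predict_proba(weights, bias, x) - label
    return [xi + epsilon * sign(error * w) for xi, w in zip(x, weights)]


# Hypothetical model and input, correctly classified as class 1.
weights, bias = [2.0, -1.0], 0.0
clean = [0.5, 0.2]
adversarial = fgsm_perturb(weights, bias, clean, label=1, epsilon=0.6)
```

With these values the clean input is assigned to class 1, while the perturbed input crosses the decision boundary and is assigned to class 0, even though no feature moved by more than `epsilon`. Attacker-aware training methods aim to make models stable under exactly this kind of bounded perturbation.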
To address the second issue, federated learning (FL) and federated analytics (FA) [28,29] make it possible to train machine learning models and perform data analysis across many devices without centralized data collection, ensuring that only the user has a copy of their data and that such data never leaves the device where it was originally collected and stored.
In this project, we plan to investigate learning solutions that make use of contact-tracing data and are designed according to both AL and FL/FA principles, so that they are robust to attacks by malicious adversaries and privacy-preserving by design.
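The basic training loop behind federated learning can be sketched as follows, using weighted averaging of locally trained models. This is a simplified illustration under stated assumptions: the one-dimensional linear model, the function names, and the hyperparameters are hypothetical, and the sketch omits secure aggregation [29], so the server here sees each client's model update in the clear.

```python
def local_update(weights, data, lr=0.1, epochs=20):
    """Gradient descent on y = w*x + b using one client's local data only."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)


def federated_average(client_weights, client_sizes):
    """Average client models, weighting each by its number of local samples."""
    total = sum(client_sizes)
    w = sum(cw[0] * n for cw, n in zip(client_weights, client_sizes)) / total
    b = sum(cw[1] * n for cw, n in zip(client_weights, client_sizes)) / total
    return (w, b)


def federated_training(clients, rounds=50):
    """Alternate local training and server-side averaging for several rounds."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        # Only model parameters are exchanged; raw (x, y) pairs stay local.
        updates = [local_update(global_weights, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_weights = federated_average(updates, sizes)
    return global_weights


# Two hypothetical devices holding samples of the same relation y = 2x + 1.
clients = [
    [(0.0, 1.0), (1.0, 3.0)],
    [(2.0, 5.0), (3.0, 7.0), (-1.0, -1.0)],
]
model = federated_training(clients)
```

In this toy setting the global model approaches the shared relation w = 2, b = 1 although neither device ever reveals its samples. Model updates can still leak information about local data, which is why techniques such as secure aggregation are typically layered on top.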
FURTHER REFERENCES
[23] A. Sutton, R. Samavi. Blockchain Enabled Privacy Audit Logs. In ISWC (1) 2017, pp. 645-660.
[24] I. J. Goodfellow et al. Explaining and harnessing adversarial examples. CoRR abs/1412.6572, 2014.
[25] L. Huang et al. Adversarial machine learning. In AISec 2011, pp. 43-58.
[26] A. Kurakin et al. Adversarial Machine Learning at Scale. arXiv preprint arXiv:1611.01236, 2017.
[27] B. Biggio et al. Support vector machines under adversarial label noise. In ACML 2011, pp. 97-112.
[28] K. Bonawitz et al. Towards federated learning at scale: System design. arXiv preprint arXiv:1902.01046, 2019.
[29] K. Bonawitz et al. Practical secure aggregation for privacy-preserving machine learning. In ACM SIGSAC 2017, pp. 1175-1191.
[30] Q. Yang et al. Federated Machine Learning: Concept and Applications. ACM TIST, 10(2), pp. 1-19, 2019.