Semantic technologies combine knowledge representation and artificial intelligence techniques to manage enterprise knowledge bases and databases more effectively. In this context, Ontology-based Data Management (OBDM) has established itself as a paradigm for integrating, sharing and governing data. It is based on a three-tier architecture in which an ontology, i.e., a conceptual formalization of the business domain, is connected to autonomous data sources through declarative mappings.
In the presence of sensitive information, data access needs to be properly regulated. However, state-of-the-art OBDM techniques and systems do not provide any support for the protection of confidential data, even though they have proved to be well suited for data sharing and distribution.
In this project we aim to fill this gap by developing methods and tools for data privacy and security in OBDM. To this end, we will revisit the Controlled Query Evaluation (CQE) framework and adapt it to OBDM. In CQE, confidential data are protected through a policy specifying the information that cannot be disclosed, and (optimal) censors (minimally) alter the answers to user queries in order to preserve the secrets. By virtue of the declarative, logic-based nature of both frameworks, we believe that their marriage is natural and effective. At the same time, it is also challenging, since CQE has so far been studied mainly in the context of databases, and very few works have considered it in the presence of ontologies. As a result, a clear, systematic view of the CQE problem over ontologies, and a fortiori in OBDM, is still missing. Our specific objectives are therefore: studying fundamental research issues and developing effective algorithms for CQE over ontologies and in OBDM; implementing these algorithms in tools; and testing them on real-world use cases characterized by the presence of highly sensitive information.
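To make the notions of policy and optimal censor concrete, the following toy sketch may help. It is an illustration under strong simplifying assumptions, not the definition or the algorithms the project will develop: the ontology is empty (so entailment reduces to set membership), data are ground facts written as strings, each secret in the policy is a set of facts that must not be disclosed together, and a censor is simply a subset of the data that contains no secret in full. All function names and example facts are hypothetical. Optimal censors are computed by brute force as the inclusion-maximal safe subsets, and a fact is disclosed only if every optimal censor retains it.

```python
from itertools import combinations


def is_safe(facts, policy):
    """Return True if `facts` does not fully contain any secret of the policy."""
    return not any(secret <= facts for secret in policy)


def optimal_censors(data, policy):
    """Enumerate the inclusion-maximal safe subsets of `data` by brute force.

    Assumption (toy setting): no ontology, so a censor is just a subset of
    the data that reveals no secret in full.
    """
    data = frozenset(data)
    found = []
    # Visit subsets from largest to smallest: any safe proper superset of a
    # candidate is then already covered by some element of `found`.
    for size in range(len(data), -1, -1):
        for subset in combinations(sorted(data), size):
            candidate = frozenset(subset)
            if is_safe(candidate, policy) and not any(candidate < c for c in found):
                found.append(candidate)
    return found


def disclosed_facts(data, policy):
    """Return the facts retained by every optimal censor."""
    censors = optimal_censors(data, policy)
    if not censors:  # only possible if the policy contains an empty secret
        return frozenset()
    return frozenset.intersection(*censors)


# Hypothetical example: the link between a patient and a diagnosis is secret.
DATA = {"patient(ann)", "hasDiagnosis(ann, cancer)", "livesIn(ann, rome)"}
POLICY = [frozenset({"patient(ann)", "hasDiagnosis(ann, cancer)"})]

if __name__ == "__main__":
    for censor in optimal_censors(DATA, POLICY):
        print(sorted(censor))
    print(sorted(disclosed_facts(DATA, POLICY)))
```

In this example there are two optimal censors, one hiding the patient fact and one hiding the diagnosis, and only the fact retained by both is disclosed. The enumeration is exponential in the number of facts, so it is suitable only for illustrating the idea on very small inputs.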
We describe the innovation of the research and the advancement of the state of the art we aim to pursue by discussing the main project contributions according to the five research objectives introduced in the previous section:
- Developing foundations of Controlled Query Evaluation over ontologies (O1).
Research in this area is still in its initial stages. First of all, a systematic view of CQE over ontologies is still missing. Our first contribution will be a clear picture of the features of CQE over ontologies. As previously done for propositional databases [BB04a, BB04b], we will clarify which parameters characterize the problem, e.g., awareness of the attacker, possible availability of external knowledge, interpretation assumption (CWA vs. OWA), enforcement method (lying vs. refusal), and the languages for the ontology, the policy and the queries. We will then provide a new definition of query answering under CQE over ontologies, in line with our recent work [LRS19], as a form of reasoning over all optimal censors. We will also provide new computational complexity results for query answering, for various practical combinations of the above parameters. In particular, we will give exact complexity characterizations of the problem for ontologies specified in the logics commonly used for OBDM, i.e., those at the basis of the OWL tractable profiles: DL-Lite [CD*07], EL [BBC05], and RL [KZ14]. We will also provide complexity results for more expressive DLs. Finally, we will investigate and clarify the relationship with other problems and forms of reasoning studied over ontologies, e.g., Consistent Query Answering [BB16] or abduction [B08]. This will allow us to transfer results from these other frameworks to CQE and to base our techniques on well-established reasoning services over ontologies.
- Developing foundations of Controlled Query Evaluation in OBDM (O2).
With very few exceptions (see [BCK18], mentioned in the previous section), there is basically no literature on CQE in OBDM, or on data privacy in OBDM in general. We believe that CQE, by virtue of its declarative and logic-based nature, is well suited to OBDM, and that instantiating and adapting it to OBDM will allow us to obtain important advancements on data privacy in ontology-based information systems. We aim at efficient algorithms, so we will identify tractable settings and/or devise optimization methods that make our approach realizable in practice. With this project we will show the potential of declarative approaches for data privacy and security in OBDM.
- Developing tools for Controlled Query Evaluation in OBDM (O3).
On completion of this project, Mastro will be the first OBDM system offering services to protect data confidentiality. This will promote the use of Mastro, and of OBDM in general, for effective ontology-based data sharing and integration. Indeed, Mastro will make it possible to realize seamless interoperation environments, where data protection specifications, privacy considerations and related legislation (e.g., GDPR) can be declared, implemented and enforced at the conceptual level.
- Realizing a privacy preserving OBDM application for the management of cancer patient data (O4).
This will demonstrate that semantic technologies, which have already proved to be well suited for data sharing and integration, can also be used effectively in contexts characterized by highly sensitive information, thanks to our OBDM techniques and tools enriched with CQE capabilities.
- Realizing a privacy preserving OBDM application over statistical data (O5).
We will test the CQE approach in OBDM over statistical data, as a novel declarative method for protecting aggregated information. We will also compare our approach with other methods, such as differential privacy.
[CD*07] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. JAR 2007.
[BBC05] F. Baader, S. Brandt, C. Lutz: Pushing the EL envelope. IJCAI 2005.
[KZ14] R. Kontchakov, M. Zakharyaschev: An introduction to description logics and query rewriting. In RW 2014.
[BB16] M. Bienvenu, C. Bourgaux: Inconsistency-tolerant querying of description logic knowledge bases. In RW 2016.
[B08] M. Bienvenu: Complexity of Abduction in the EL Family of Lightweight Description Logics. KR 2008.