information extraction

Ontology Mediated Information Extraction in Financial Domain with Mastro System-T

Information extraction (IE) is the task of turning text documents into a structured form, so that the information they contain can be processed automatically. Ontology Mediated Information Extraction (OMIE) is a new paradigm for IE that seeks to exploit the semantic knowledge expressed in ontologies to improve query answering over unstructured data (more precisely, raw text).

Ontology Mediated Information Extraction with MASTRO SYSTEM-T

In several data-centric application domains, the need arises to extract valuable information from unstructured text documents. The recent paradigm of Ontology Mediated Information Extraction (OMIE) addresses this problem by taking into account the knowledge expressed by a domain ontology and reasoning over it to improve the quality of the extracted data. MASTRO SYSTEM-T is a novel tool for OMIE, developed by Sapienza University and IBM Almaden Research. In this work, we demonstrate its use for information extraction over real-world financial text documents from the U.S. EDGAR system.

Ontology-based Document Spanning Systems for Information Extraction

Information Extraction (IE) is the task of automatically organizing data extracted from free-text documents into a structured form. In several contexts, it is desirable that the extracted data are then organized according to an ontology, which provides a formal and conceptual representation of the domain of interest. Ontologies allow for a better interpretation of the data, as well as for the semantic integration of the data with other information, as in Ontology-based Data Access (OBDA), a popular declarative framework for data management in which an ontology is connected to a data layer through mappings.

Ontology population for open-source intelligence: A GATE-based solution

Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast number of documents available, automatic mechanisms are needed to populate them from the crawled text.
