AraCorPy (Arabish Corpus with Python)
| Member | Role |
|---|---|
| Arianna D'Ottone | Supervising tutor |
The project aims to establish an international collaboration between the applicant and the GETALP research group (Study Group for Machine Translation and Automated Processing of Languages and Speech) at the Laboratoire d'Informatique de Grenoble (LIG), Université Grenoble Alpes.
The collaboration aims to make the linguistic data from my PhD research on the Tunisian dialect available in Open Access, through a web interface that lets users interact with the data. The system is intended to provide the following information: POS tagging, stemming, lemmatization, glossing, transliteration, and diatopic and diachronic information. This is an initial set; further levels of interaction with the data could be added in the future.

Users will be able to enter search strings in Italian, English or Tunisian, with Tunisian accepted in any of the writing systems in which it occurs: scholarly romanization, Arabic script, or Arabish (Tunisian written with Roman letters and digits, as used by native speakers on social networks). By design, the system does not provide an automatic translation of the strings. Instead, it offers a concise apparatus of essential information that allows users to reconstruct for themselves a translation of the entered text into English or Italian, one that can potentially be fully accurate.
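The proposal does not specify how the search engine would tell apart the writing systems it accepts. As a minimal sketch, assuming a purely character-based heuristic, a query could be routed by script as shown here. The function `detect_script`, the regular expressions and the digit set are hypothetical illustrations, not part of the project's design; the digits follow a common Arabish convention (e.g. 3 for ʿayn, 7 for ḥāʾ, 9 for qāf).

```python
import re

# Hypothetical heuristic, not the project's actual method.
# Any character from the basic Arabic Unicode block (U+0600 to U+06FF) marks Arabic script.
ARABIC_RE = re.compile(r"[\u0600-\u06FF]")

# Assumed Arabish convention: digits standing for Arabic consonants
# (2 = hamza, 3 = ʿayn, 5 = khāʾ, 7 = ḥāʾ, 9 = qāf) written inside or
# at the end of a word made of Latin letters.
ARABISH_RE = re.compile(r"[A-Za-z]*[23579][A-Za-z]+|[A-Za-z]+[23579]")


def detect_script(query: str) -> str:
    """Classify a search string as 'arabic', 'arabish' or 'latin' (heuristic)."""
    if ARABIC_RE.search(query):
        return "arabic"
    if ARABISH_RE.search(query):
        return "arabish"
    # Italian, English and scholarly romanization all end up here.
    return "latin"


if __name__ == "__main__":
    for q in ["عندي", "3andi", "ma7lek", "I have", "2024"]:
        print(q, "->", detect_script(q))
```

Italian, English and scholarly romanization all fall into the `"latin"` bucket, so a real system would need a further step (for example, a dictionary lookup) to separate them, and a rule this simple also misfires on tokens such as `mp3`.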
To support the collaboration with GETALP, the applicant will in parallel be trained in computational linguistics, following the training path outlined in the relevant sections.