The DREAM project (Data Recording Entry Alternative Multi-script) aims to create a system for cataloguing and researching bibliographic resources in non-Latin scripts (Arabic, Chinese, Hebrew, Hindi, Japanese, Korean, Persian, Russian and Sanskrit). It stems from the need to bring Italy in line with what has already been happening in other European countries for decades, in order to allow Italian and foreign researchers to carry out their research more effectively and increase the international visibility of an inestimable heritage that can only be glimpsed from transliterations not always regulated by uniform criteria.
The project envisions the creation of a research tool (meta-opac) for non-Latin scripts, which several national institutions could join over time. Thanks to its compliance with international standards, this new cataloguing system will be able to manage the descriptions of any national or international institution with similar requirements and needs. Another major goal of the project is to update Sapienza's online catalogue.
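As a rough illustration of how a meta-opac might gather descriptions from the catalogues that join it, the sketch below builds harvesting requests for the Open Archives Initiative Protocol for Metadata Harvesting, which is listed among the references. That the meta-opac would rely on this protocol is an assumption made here for illustration only; the endpoint addresses, the set name and the helper `list_records_url` are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     set_spec: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for one catalogue endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting of a single set of records.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoints of institutions taking part in the meta-opac.
endpoints = [
    "https://catalogue-a.example.org/oai",
    "https://catalogue-b.example.org/oai",
]

# One harvesting request per participating catalogue.
harvest_urls = [list_records_url(url) for url in endpoints]

# Hypothetical set name restricting the harvest to non-Latin script records.
subset_url = list_records_url(endpoints[0], set_spec="non-latin")
```

A harvester would then fetch each URL and merge the returned descriptions into a single index, so that one query reaches all participating catalogues.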
The project, proposed by the "Italian Institute of Oriental Studies" Department, has a strongly multidisciplinary and interdepartmental approach: it envisages collaboration among several academic units of Sapienza (i.e. the Sapienza Library System and Departments) and synergy, in terms of academic and technical skills, between the University's teaching and technical staff. The project has remarkable potential, since it aims at the eventual migration of these data, in their original script, into the collective catalogue of the National Library Service (SBN). The results of the project will be shared as open-access data and through a conference on the subject. The project plan is staged over 36 months, the intended overall duration of the project.
Since Italy has no catalogue specifically designed to manage the cataloguing of resources in non-Latin scripts, the project strives to create such a model and to stimulate an update of the national reference catalogue used to search for bibliographic materials.
The National Library Service (Servizio Bibliotecario Nazionale, SBN) has already taken a first step in this direction, but the new implementations are still at an initial stage, and there is currently no attempt to change the cataloguing process at the national level. The DREAM project, based on non-proprietary (open-source) principles, could constitute either a theoretical model or a concrete phase in the development of a national interface, thus starting to align the Italian Library Service with European and international standards for bibliographic records in non-Latin scripts (see for instance Shaker 2002, Peng et al. 2013, and Campbell and Belew 2018).
The proposed project is expected to yield significant benefits at several levels:
• Through the meta-opac, researchers in Italy and abroad will have a single access point for research on the holdings of Italian institutions that own library resources in languages such as Arabic, Chinese, Korean, Japanese, Russian, etc.
• Librarians will be able to start cataloguing bibliographic records in non-Latin scripts and see their descriptions used in their local catalogue as well as in the meta-opac. Locally, the system allows librarians and personnel in research facilities to create concurrent descriptions for the same resource, one in the original script and the other romanized, even according to different standards and uses. A record link (implemented at the Sebina software level) will allow these records to be linked to each other or mirrored.
• Once the reception of UTF-8 encoded data is implemented, the National Library Service will automatically be able to take advantage of these ready-made records, which will be transferred to the Index database.
• The project will support the development of optical character recognition (OCR) tools for non-Latin scripts and will open up a concrete possibility of full-text search on digital documents in such languages.
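The paired descriptions and the UTF-8 exchange mentioned in the list above can be pictured with a small sketch. It is only an illustration under stated assumptions: the `Record` class, its field names and the `link_pair` helper are hypothetical and do not reflect the Sebina data model or any SBN format, and the romanized title uses a simplified romanization chosen for the example. The sketch links two descriptions of the same resource, one in the original script and one romanized, and checks that the original-script title survives a UTF-8 round trip.

```python
import unicodedata
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    """Hypothetical minimal description (not a real Sebina or SBN schema)."""
    record_id: str
    title: str
    script: str                      # ISO 15924 script code, e.g. "Cyrl", "Latn"
    linked_id: Optional[str] = None  # id of the parallel description, if any


def link_pair(original: Record, romanized: Record) -> None:
    """Link two descriptions of the same resource to each other."""
    original.linked_id = romanized.record_id
    romanized.linked_id = original.record_id


def to_utf8(text: str) -> bytes:
    """Normalize to Unicode NFC, then encode as UTF-8 for exchange."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


# Two concurrent descriptions of the same resource.
original = Record("rec-001", "Война и мир", "Cyrl")
romanized = Record("rec-002", "Voina i mir", "Latn")
link_pair(original, romanized)

# The original-script title is exchanged as UTF-8 bytes and decoded again.
payload = to_utf8(original.title)
round_trip = payload.decode("utf-8")
```

Normalizing before encoding is a design choice of this sketch: it keeps visually identical titles byte-identical, which matters when records from different institutions are compared or merged.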
Once the migration to the SBN Index (i.e. its management catalogue) of the newly created data recorded in the original language is complete, researchers will benefit from an unparalleled improvement in the software for searching our national bibliographic catalogue; the pilot project for this update, hosted at Sapienza, would greatly enrich the electronic resources platform of its Library System.
For some essential bibliographic references see:
Ali Kamal Shaker (2002). Bibliographic access to non-roman scripts in library opacs: a study of selected ARL Academic libraries in the United States. PhD Thesis. http://d-scholarship.pitt.edu/10331/1/akshaker12-2002.pdf
Joan M. Aliprand (2005). The structure and content of MARC 21 records in the Unicode environment. Information Technology and Libraries, 24(4), pp. 170-179.
Lyle Campbell, Anna Belew, eds. (2018). Cataloguing the World's Endangered Languages. Routledge.
MARC 21 Format for Bibliographic Data. https://www.loc.gov/marc/bibliographic/
Open Archives Initiative Protocol for Metadata Harvesting. Version 2 (2002). http://www.openarchives.org/OAI/openarchivesprotocol.html
PCC Guidelines for Creating Bibliographic Records in Multiple Character Sets (Sept. 7, 2017). https://www.loc.gov/aba/pcc/bibco/documents/PCCNonLatinGuidelines.pdf
Sharma S., Sharma A. K., Gupta J. P. (2010). Exploring OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting. International Journal of Advanced Research in Computer Science, Vol. 1, No. 2, July-August 2010, pp. 165-175.
Unicode Implementation at the Library of Congress. Cataloging Policy Position (2006). https://www.loc.gov/catdir/cpso/unicode.pdf
UNIMARC formats and related documentation. https://www.ifla.org/publications/unimarc-formats-and-related-documentation
Xujun Peng, Huaigu Cao, Srirangaraj Setlur, Venu Govindaraju, and Prem Natarajan (2013). Multilingual OCR research and applications: An overview. ACM International Conference Proceeding Series. DOI: 10.1145/2505377.2509977