The present research project is centred on the study of Mandarin Chinese right dislocations (henceforth RDs). RDs are a linguistic phenomenon frequently observed in spoken language that involves a syntactic shift, i.e. the placement of a constituent at the end of the sentence instead of in its canonical sentence-internal position, for particular pragmatic purposes and within specific linguistic and extra-linguistic contexts.
The short-term goal of the project is the creation of an integrated dataset of the syntactic, pragmatic and prosodic cues characterizing RDs in Mandarin Chinese. The dataset will be based on an extensive analysis of spoken Mandarin Chinese data using advanced quantitative and statistical methods such as conditional inference trees, random forests and correspondence analysis. The main findings of the analysis are expected to contribute to theoretical linguistics by fostering a deeper understanding of right dislocations and, more generally, of information structure categories in Mandarin Chinese. They are also expected to provide corpus linguistics with an innovative tool for the annotation of spoken Mandarin corpora.
The long-term goal of the project is the creation of a speech processing tool capable of integrating pragma-syntactic factors and prosodic cues for the correct recognition of RDs in Mandarin Chinese. Beyond its applications in natural language processing, the tool could hopefully be employed for the automatic extraction of RDs from larger corpora, enabling larger-scale research on the phenomenon in the future.
Given the current state of the art in research on Mandarin Chinese RDs, three main aspects of this project can be considered innovative and likely to advance the study of Mandarin Chinese information structure: the analysis of naturally occurring spoken language, the integrated approach to communicative events, and the use of advanced statistical methods.
First of all, although in recent years work on spoken language has paved the way for a deeper understanding of human language, communication and cognition, only a few researchers in the Sinosphere, and even fewer in European or Italian academia, have undertaken this challenging task. Two examples worth mentioning are the work of my PhD supervisor Prof. Chiara Romagnoli in Italy on Chinese discourse markers and the work of Vittorio Tantucci in the UK on Chinese modal expressions and intersubjectivity. However, to the best of our knowledge, no one has yet studied naturally occurring spoken Chinese from an information structure perspective, let alone the specific case of Chinese RDs.
Second, the intended analysis aims to describe RDs within a conceptualization of language reminiscent of the famous definition given by Saussure at the end of the 19th century: a "system where everything holds together". Although there is no shortage of literature on Mandarin Chinese RDs, many of their aspects have been investigated in isolation, without accounting for the multiple relations that each construction holds across the different levels of language. An integrated description of RDs, accounting not only for their syntactic, pragmatic and prosodic properties separately but also for the ways in which these interact in ongoing spoken language to fit the needs of the interlocutors at the specific moment and in the specific environment of utterance, is likely to offer the most adequate account of such a complex phenomenon.
Lastly, when studying an ever-evolving system such as language, it is crucial that research be based on, and produce, results that hold not only for the specific cases analysed but that are also generalizable to the wider population. Given the rapid advances in the statistical and computational sciences over the last decades, it would be unwise not to exploit them in order to obtain the most accurate and unbiased linguistic descriptions possible. Hopefully, the advanced and rigorous statistical methods employed in this research will not only provide a solid, evidence-based description of RDs, but will also serve as a model to be extended to other linguistic phenomena and to other languages of the world.
As outlined above, the proposed research may yield multiple benefits. Accounting for the ways in which the different levels of language are interrelated in the functioning of RDs means providing a tool not only for a better understanding of this specific construction, which has not yet been investigated to a satisfactory extent, but also for a better understanding of Mandarin Chinese information structure, which lies at the centre of an extremely active and ongoing academic debate. In turn, this would contribute to a better comprehension of the cognitive processes guiding speakers' choices when coding information at each linguistic level, making them comparable with competing processes in other languages. In addition, the findings will hopefully provide a pioneering description of the phenomenon for the teaching of Chinese as a Foreign Language (CFL), a relatively young discipline currently experiencing a surge of contributions in China and in many Western countries, Italy being among the most prominent.
Another field that could greatly benefit from this research is natural language processing (NLP), where Chinese information structure has wide applications for both academic and commercial purposes, China being one of the main actors in the global NLP market. Given the fragmented state of theoretical research on Chinese RDs, practical applications fail to recognize the essentially different nature of RDs and of other structures that happen to display similar formal features, which are generally treated under umbrella labels such as "increments" or "TCU extensions". An essential missing factor for disambiguating between these phenomena is prosody, which is fundamentally related to information structure. Providing a dataset of integrated linguistic features and a related algorithm for the automatic recognition of RDs could therefore help enhance existing tools not only for Mandarin discourse processing but also for Mandarin speech processing, another rapidly developing field of research.