
Circular RNAs (circRNAs) are RNA species produced by a non-canonical splicing event called back-splicing, in which a downstream 5' splice site is covalently joined to an upstream 3' splice site, yielding a closed circular molecule.
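To show how back-splicing defines the circRNA sequence, the following minimal Python sketch assembles a circRNA from the genomic coordinates of its circularized exons and extracts the sequence across the back-splice junction. The function names, the 0-based half-open coordinate convention and the restriction to plus-strand genes are illustrative assumptions (minus-strand circRNAs would also need reverse-complementing).

```python
from typing import List, Tuple


def circrna_sequence(genome: str, exons: List[Tuple[int, int]]) -> str:
    """Concatenate the circularized exons into the mature circRNA sequence.

    Assumes 0-based, half-open (start, end) coordinates, sorted 5'->3',
    on the plus strand (minus-strand circRNAs would need reverse-complementing).
    """
    return "".join(genome[start:end] for start, end in exons)


def backsplice_junction(circ_seq: str, flank: int) -> str:
    """Return `flank` nt on each side of the back-splice junction.

    Back-splicing joins the 3' end of the last exon (downstream 5' splice site)
    to the 5' end of the first exon (upstream 3' splice site), so in the
    linearized sequence the junction lies between the last and first nucleotide.
    """
    if not 0 < flank <= len(circ_seq):
        raise ValueError("flank must be between 1 and the circRNA length")
    return circ_seq[-flank:] + circ_seq[:flank]


# Toy example: two exons separated by an intron (N = non-exonic sequence).
genome = "NNNN" + "AAACCC" + "NNNN" + "GGGTTT" + "NNNN"
circ = circrna_sequence(genome, [(4, 10), (14, 20)])
print(circ)                          # AAACCCGGGTTT
print(backsplice_junction(circ, 3))  # TTTAAA
```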
The rate at which new circRNAs are being discovered in different cellular systems has not been matched by an adequate characterization of their function. Important clues about the function of an RNA molecule can be deduced from its secondary structure. Although bioinformatic tools are available to predict the secondary structure of circular RNA molecules, they have not yet been applied on a large scale to study the global and local properties of circRNA secondary structure. In the first part of the proposed project, a large set of human and murine circRNAs originating from pre-mRNAs will be folded in silico to assess whether they have a particularly high or low secondary structure content. As a rough indication, a low level of secondary structure could suggest protein-coding activity, whereas structured RNAs are more likely to exert their function directly, without being translated. Since we do not expect all circRNAs to play the same role, we will also evaluate whether differences in secondary structure content correlate with the region of the mRNA (5' UTR, coding sequence, 3' UTR) from which they originate.
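One simple way to quantify secondary structure content is the fraction of nucleotides that are base-paired in a predicted structure. The sketch below computes it from a dot-bracket string, such as the minimum free energy structure produced by ViennaRNA's RNAfold, which can fold a sequence as a circular molecule with its `--circ` option. Using the paired fraction as the metric, and the helper names, are illustrative assumptions; length-normalized folding free energy is a common alternative.

```python
from typing import List, Tuple


def base_pairs(dot_bracket: str) -> List[Tuple[int, int]]:
    """Return the (i, j) base pairs encoded in a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)                  # opening partner, waiting for its match
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))   # close the most recent open bracket
        elif ch != ".":
            # Pseudoknot brackets ([], {}) are not handled in this sketch.
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def paired_fraction(dot_bracket: str) -> float:
    """Fraction of nucleotides in a base pair: 0 = unstructured, 1 = fully paired."""
    if not dot_bracket:
        raise ValueError("empty structure")
    return 2 * len(base_pairs(dot_bracket)) / len(dot_bracket)


print(paired_fraction("((..))...."))  # 0.4
```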
CircRNAs will also be scanned for recurrent sequence and structure motifs. Special attention will be paid to patterns in the region surrounding the back-splicing junction, whose sequence is the feature that distinguishes a circRNA from its linear host transcript. The involvement of the identified motifs in mediating circRNA-protein interactions will be investigated by checking whether they occur within the protein binding sites identified in publicly available CLIP-Seq data.
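Because a circRNA has no ends, a motif scan must also report matches that wrap around the back-splice junction. The following sketch, assuming fixed-length sequence motifs written with IUPAC degenerate codes and a circRNA given as its linearized sequence (junction between the last and first nucleotide), finds every circular occurrence and flags those spanning the junction. The function name is hypothetical, and the sketch covers sequence motifs only, not structure motifs.

```python
import re
from typing import List, Tuple

# IUPAC nucleotide codes translated to RNA regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U", "T": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]", "K": "[GU]",
    "M": "[AC]", "B": "[CGU]", "D": "[AGU]", "H": "[ACU]", "V": "[ACG]",
    "N": "[ACGU]",
}


def find_circular_motif(circ_seq: str, motif: str) -> List[Tuple[int, bool]]:
    """Return (start, spans_junction) for every occurrence of `motif` in a circRNA.

    `circ_seq` is the linearized circRNA, with the back-splice junction between
    its last and first nucleotide; `motif` is a fixed-length IUPAC string.
    """
    seq = circ_seq.upper().replace("T", "U")
    k = len(motif)
    if not 0 < k <= len(seq):
        raise ValueError("motif must be non-empty and not longer than the circRNA")
    pattern = "".join(IUPAC[c] for c in motif.upper())
    # Appending the first k-1 nt exposes matches that wrap around the junction;
    # every match start then falls within the original sequence.
    extended = seq + seq[: k - 1]
    hits = []
    # A zero-width lookahead also reports overlapping occurrences.
    for m in re.finditer(f"(?={pattern})", extended):
        start = m.start()
        hits.append((start, start + k > len(seq)))
    return hits


print(find_circular_motif("GGACCCCUA", "UAGG"))  # [(7, True)]
```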
As a final step, we will develop an algorithm that predicts circRNA-protein interactions by exploiting the inferred secondary structure of circRNAs.
The proposed project will be the first systematic study aimed at characterizing the global secondary structure of circRNAs and at identifying elements shared within this class of molecules. CircRNAs originate mainly from the coding regions of pre-mRNAs. Since secondary structure may interfere with translation, coding regions are significantly less structured than untranslated regions (Shabalina et al., 2006; Ding et al., 2014); nevertheless, they appear to have a higher secondary structure content than random sequences (Seffens and Digby, 1999), with a significant bias in favor of local RNA structures (Katz and Burge, 2003; Meyer and Miklos, 2005). By comparing the predicted secondary structure content of circRNAs with that of control circularized exons, we will be able to determine whether the folding of these molecules is subject to some kind of constraint, which may be linked to their possible translation or to their function as RNA molecules.
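The comparison between circRNAs and control circularized exons could be framed as a two-sample test on a per-molecule structure score, for example the fraction of paired nucleotides. The sketch below uses a two-sided permutation test on the difference of means; this is one reasonable choice among several (a Mann-Whitney U test is another), and the function name and default parameters are illustrative assumptions.

```python
import random
from typing import Sequence, Tuple


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def permutation_test(circ: Sequence[float], control: Sequence[float],
                     n_perm: int = 10_000, seed: int = 0) -> Tuple[float, float]:
    """Two-sided permutation test on the difference of means (circ - control).

    Returns the observed difference and a p-value with an add-one correction,
    so that the p-value is never exactly zero.
    """
    if not circ or not control:
        raise ValueError("both groups must be non-empty")
    rng = random.Random(seed)            # fixed seed for reproducibility
    observed = _mean(circ) - _mean(control)
    pooled = list(circ) + list(control)
    n = len(circ)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # random relabeling of the two groups
        diff = _mean(pooled[:n]) - _mean(pooled[n:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)
```

Shuffling the pooled scores simulates the null hypothesis that group labels (circRNA or control) are unrelated to structure content.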
The identification of sequence and structural motifs will be helpful for understanding the functional roles of circRNAs. Several known structural elements exist, including IRESs, which are involved in specific functions. IRESs are RNA elements able to interact with the translational machinery and mediate the cap-independent translation of an RNA molecule (Jang et al., 1990). The best way to infer the presence of an IRES in an RNA is through a comparison with known IRES secondary structures (Hong et al., 2013). Our circRNA dataset will be scanned in search of BEAR-encoded known IRES secondary structures, then we will evaluate whether circRNAs are enriched in IRESs compared to controls. The presence of IRESs in circRNAs have already been studied in previous works (Chen et al., 2016; Dudekula et al., 2016), but no enrichment analysis has ever been made.
In neurons, circRNAs have been found to be enriched in synaptosomes and neuropil relative to their linear host transcripts (You et al., 2015). This suggests the existence of a mechanism that transports circRNAs from the soma to the synapse. It is plausible that synapse-enriched circRNAs carry a motif, possibly in the region around the back-splicing junction, that is responsible for their particular subcellular localization. Such a motif could be identified by using BEAM to retrieve patterns that are over-represented in synapse-enriched circRNAs compared with soma-enriched ones.
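BEAM searches for structure motifs in BEAR-encoded sequences. As a much simpler, sequence-only stand-in (not BEAM), the sketch below ranks k-mers by how much more often they occur in a foreground set, such as the junction regions of synapse-enriched circRNAs, than in a background set, such as those of soma-enriched circRNAs. The function names and the pseudocount are assumptions; in practice the top-ranked k-mers would still need a significance test and multiple-testing correction.

```python
from collections import Counter
from math import log2
from typing import List, Tuple


def kmer_presence(seqs: List[str], k: int) -> Counter:
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts: Counter = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def enriched_kmers(fg: List[str], bg: List[str], k: int,
                   pseudo: float = 1.0) -> List[Tuple[str, float]]:
    """Rank k-mers by log2 ratio of foreground vs background sequence fractions."""
    fg_counts, bg_counts = kmer_presence(fg, k), kmer_presence(bg, k)
    scores = {}
    for kmer in fg_counts.keys() | bg_counts.keys():
        # Pseudocounts avoid division by zero for k-mers absent from one set.
        f = (fg_counts[kmer] + pseudo) / (len(fg) + 2 * pseudo)
        b = (bg_counts[kmer] + pseudo) / (len(bg) + 2 * pseudo)
        scores[kmer] = log2(f / b)
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```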
CircRNA localization could also be studied through a global analysis of the RNAs bound by proteins involved in transport. mRNAs are transported from the soma to the synapse in compact ribonucleoprotein particles known as RNA granules (Knowles et al., 1996; El Fatimy et al., 2016). One of the proteins contained in these granules is FMRP, an RNA-binding protein (RBP) involved in the control of local protein synthesis (Khandjian et al., 2004; Stefani et al., 2004). Linear RNAs interacting with FMRP have previously been identified through PAR-CLIP (Ascano et al., 2016); we plan to reanalyze these data to determine whether FMRP binds circRNAs and, if so, whether the bound circRNAs share a common motif.
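Reads that support FMRP binding to a circRNA, rather than to its linear host transcript, must span the back-splice junction, since that is the only sequence unique to the circular form. PAR-CLIP crosslinking with 4-thiouridine typically leaves T-to-C transitions in the sequenced reads. The sketch below places a read on a junction sequence (last `flank` nt of the circRNA followed by its first `flank` nt) by exhaustive ungapped scanning, tolerating only T-to-C mismatches and requiring a minimum anchor on each side of the junction. The function names, the `min_anchor` default and the ungapped, plus-strand, DNA-alphabet setting are illustrative assumptions; a real reanalysis would rely on a dedicated aligner.

```python
from typing import List, Optional, Tuple


def tc_tolerant_match(read: str, ref: str) -> Optional[int]:
    """Number of T->C mismatches if `read` matches `ref` with only T->C changes, else None."""
    tc = 0
    for r, g in zip(read, ref):
        if r == g:
            continue
        if g == "T" and r == "C":    # PAR-CLIP-type conversion, tolerated
            tc += 1
        else:                        # any other mismatch rejects the placement
            return None
    return tc


def junction_spanning_hits(read: str, junction_ref: str, flank: int,
                           min_anchor: int = 5) -> List[Tuple[int, int]]:
    """Ungapped placements of `read` that cross the back-splice junction.

    `junction_ref` is the last `flank` nt of the circRNA followed by its first
    `flank` nt, so the junction lies between positions flank - 1 and flank.
    Returns (offset, n_TC) for placements with at least `min_anchor` nt on
    each side of the junction.
    """
    hits = []
    for off in range(len(junction_ref) - len(read) + 1):
        left = flank - off            # read nt upstream of the junction
        right = len(read) - left      # read nt downstream of the junction
        if left < min_anchor or right < min_anchor:
            continue
        tc = tc_tolerant_match(read, junction_ref[off:off + len(read)])
        if tc is not None:
            hits.append((off, tc))
    return hits


ref = "ACGTACGTAC" + "TTGCATGCAA"
print(junction_spanning_hits("CGCACTTGCA", ref, flank=10))  # [(5, 1)]
```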
More generally, RNA-protein interaction data from high-throughput experiments deposited in public databases are a valuable source of information for identifying the sequence and structure patterns that allow circRNAs to interact with their protein partners. These data will also be used to test the performance of the circRNA-protein interaction predictor we plan to develop. At present, the only way to predict these interactions with existing tools is to provide them with an "extended" linear sequence, in which the first N nucleotides are repeated at the end to represent the back-splicing junction. Although this approach roughly simulates the circularity of the input molecule, the secondary structure that the software uses to predict the interaction is still computed on a linear molecule, and is therefore neither correctly predicted nor correctly represented. A proper description of the structural features of circRNAs should make these programs better at detecting the protein interactors of circRNAs.
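The "extended sequence" workaround is straightforward to reproduce. The sketch below, with hypothetical function names, builds the extended sequence and maps positions predicted on it (for example, the bounds of a binding site) back onto the circRNA. It also makes the limitation visible: a folding tool treats the copied nucleotides as independent positions, so a nucleotide and its copy can be assigned different, mutually incompatible pairing partners.

```python
def extended_sequence(circ_seq: str, n: int) -> str:
    """Linear stand-in for a circRNA: its first `n` nt are repeated at the 3' end."""
    if not 0 < n <= len(circ_seq):
        raise ValueError("n must be between 1 and the circRNA length")
    return circ_seq + circ_seq[:n]


def to_circular_position(pos: int, circ_len: int) -> int:
    """Map a 0-based position on the extended sequence back onto the circRNA."""
    return pos % circ_len


def spans_junction(start: int, end: int, circ_len: int) -> bool:
    """True if a half-open site [start, end) on the extended sequence crosses the junction."""
    return start < circ_len < end


ext = extended_sequence("ACGUACGG", 3)
print(ext)                          # ACGUACGGACG
print(to_circular_position(9, 8))   # 1
```

Mapping positions back onto the circle also reveals duplicate predictions, since a site lying entirely within the copied tail corresponds to one at the start of the circRNA.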
The proposed project is purely bioinformatic; however, the methods we plan to develop will be extensively tested and used in our molecular biology laboratory. The functionality of the identified sequence and structure motifs will be evaluated through mutagenesis experiments. The motif discovery pipeline will also be used to find patterns shared among the RNA interactors of the RBPs we are studying, such as FUS. Finally, the circRNA-protein interaction predictor will be used to find the proteins that interact with the circRNAs studied in our laboratory, and mass spectrometry experiments will be performed to corroborate the predictions and assess the effectiveness of the program.
Beyond our laboratory, we strongly believe that anyone who studies circRNAs will benefit from these methods.