Name and title of the project proponent: 
sb_p_2044290
Year: 
2020
Abstract: 

The etiological agent of COVID-19 is a coronavirus named SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2). The viral genome is a single-stranded, positive-sense RNA about 30 kb long. It codes for two large overlapping polyproteins that are processed by intracellular proteolysis to produce non-structural proteins involved in virus replication and assembly. The virus evolves and adapts to the host by accumulating mutations generated by several mechanisms, some of them linked to the activity of the viral RNA-dependent RNA polymerase (RdRp). Many initiatives are currently under way worldwide to develop an effective vaccine, or to find or reposition drug candidates able to prevent virus infection and/or replication.
In this context, it is important to understand the evolutionary dynamics of the virus and to study how its proteome changes. Indeed, changes in viral proteins considered promising therapeutic targets may jeopardize much of this effort; moreover, even single mutations in specific proteins can change the pathogenicity or contagiousness of the virus.
In this project, we propose a systematic screening of SARS-CoV-2 genome isolates to determine the position, type and prevalence of mutations in each of the proteins expressed by the virus. This activity will require the development of a software workflow able to carry out all the necessary steps on a large amount of genomic data.
SARS-CoV-2 genome sequences will be taken from the GISAID repository. Tools from the BLAST and EMBOSS suites will be used to analyse and translate the nucleotide sequences. Sequence redundancy will be removed by clustering techniques. Multiple sequence alignments between each reference protein and its cognate variants will be analysed by R scripts that collect various statistics. Whenever possible, we will attempt to correlate the prevalence of protein variants with structural and functional properties.
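
As an illustration of the kind of statistics that could be collected from the alignments, the following is a minimal sketch, not the project's actual workflow. It is written in Python for illustration only (the proposal plans R scripts), and it assumes that each variant is already aligned to the reference protein, with '-' marking gaps. The function names, the mutation labels and the toy sequences are all hypothetical.

```python
from __future__ import annotations

from collections import Counter


def call_mutations(ref_aln: str, var_aln: str) -> list[str]:
    """List the differences between two aligned protein sequences.

    Both strings must have the same length and use '-' for gaps.
    Positions are 1-based and refer to the ungapped reference.
    """
    if len(ref_aln) != len(var_aln):
        raise ValueError("aligned sequences must have equal length")
    mutations = []
    ref_pos = 0  # position of the last reference residue seen so far
    for r, v in zip(ref_aln, var_aln):
        if r != "-":
            ref_pos += 1
        if r == v:
            continue  # identical residue, or a gap in both rows
        if r == "-":
            mutations.append(f"ins{ref_pos}{v}")  # insertion after ref_pos
        elif v == "-":
            mutations.append(f"{r}{ref_pos}del")  # deletion of a reference residue
        else:
            mutations.append(f"{r}{ref_pos}{v}")  # substitution
    return mutations


def mutation_prevalence(ref_aln: str, variants: list[str]) -> dict[str, float]:
    """Return the fraction of variants that carry each mutation."""
    counts = Counter()
    for var in variants:
        counts.update(set(call_mutations(ref_aln, var)))
    return {mut: n / len(variants) for mut, n in counts.items()}


# Toy alignment (invented sequences, not real SARS-CoV-2 data).
reference = "MKT-LV"
variants = ["MKTALV", "MRT-LV", "MRT-L-"]
prevalence = mutation_prevalence(reference, variants)
```

In this sketch ambiguous residues such as 'X' would be counted as substitutions; a real workflow would probably need to filter them out before computing prevalence.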

ERC: 
LS2_12
LS6_5
LS2_13
Research group members: 
sb_cp_is_2676604
Innovativeness: 

COVID-19 has dramatically captured the attention of the WHO and of most of the biomedical community. Unfortunately, a second epidemic wave is predicted in the near future, as are new epidemics of zoonotic origin later on. Environmental disruption and the consequent promiscuity between species will make it easier for other viruses to cross the species barrier and become human pathogens. This prospect makes it urgent to gather as much knowledge as possible from the COVID-19 pandemic, knowledge that will pave the way for strategies against the expected future threats. In the case of the proposed project, we believe that the systematic scrutiny of SARS-CoV-2 genomes will provide valuable information at several levels:

- A detailed picture of the mutability of each protein coded by the SARS-CoV-2 genome will be drawn.
- Mapping the mutated sites onto the sequence and the structure of each protein will indicate the most variable and the most conserved regions. This information will be very useful to researchers developing vaccines or therapeutic protocols. Indeed, it is well known that even point mutations in specific proteins may dramatically change the response to drugs and enable the virus to escape the host immune response. Point mutations may also significantly change the biology and pathogenesis of the virus.
- Likewise, mapping point mutations and analysing the co-presence of variants of different proteins within the same virus isolate may provide clues about a close functional correlation between those proteins. In addition, this may contribute to the characterization of the virus proteins whose role and function are not yet completely clear. Once again, the new information may have a direct or indirect impact on the development of new therapeutic strategies.
- The association of protein variants with the collection date and geographical origin of the SARS-CoV-2 isolates may be tentatively correlated with the severity of COVID-19. The results may hint at the proteins that matter most in determining the severity of symptoms and the prognosis of the disease.
- From the methodological point of view, the research will provide a general-purpose software workflow. In the case of SARS-CoV-2, it can be used to monitor the evolution of the virus as more and more genomes are deposited in GISAID. Since the tool uses the FASTA format for sequences, it can equally be applied to genomes extracted from GenBank. For the same reason, the workflow can be adapted with few changes to monitor the evolution of other viruses.
- In perspective, once the tool is consolidated, it might be used to build and maintain a publicly available data bank of mutations of SARS-CoV-2 proteins.
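
The co-presence analysis mentioned in the list above can be illustrated with a minimal sketch, again in Python and under stated assumptions. It assumes the mutations of each isolate have already been called and labelled as "protein:mutation"; the isolate names, protein names and labels below are hypothetical placeholders, not real data. The sketch only counts how often two mutations in different proteins appear in the same isolate. Whether a count reflects a functional correlation would require a proper statistical test, which is not shown.

```python
from __future__ import annotations

from collections import Counter
from itertools import combinations


def cooccurrence(isolates: dict[str, set[str]]) -> Counter:
    """Count isolates in which two mutations of different proteins co-occur.

    Each mutation label is assumed to have the form "protein:mutation".
    Keys of the result are alphabetically sorted pairs of labels.
    """
    pairs = Counter()
    for mutations in isolates.values():
        for a, b in combinations(sorted(mutations), 2):
            # Keep only pairs that span two different proteins.
            if a.split(":")[0] != b.split(":")[0]:
                pairs[(a, b)] += 1
    return pairs


# Toy input: hypothetical isolates and mutation labels.
isolates = {
    "iso1": {"protA:K2R", "protB:V5del"},
    "iso2": {"protA:K2R", "protA:T3A", "protB:V5del"},
    "iso3": {"protA:K2R"},
}
pair_counts = cooccurrence(isolates)
```

Sorting the labels before pairing gives every pair a single canonical key, so the same two mutations are never counted under two different orderings.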

The computational part of the work will be carried out on personal computers running Linux (Ubuntu), an operating system well suited to scientific computing. Personal computers give easy access to computational resources and make it possible to quickly tailor the software to specific tasks. Moreover, current personal computers equipped with multicore processors provide enough computing power for many tasks. However, if the amount of accumulating genome sequence data exceeds the processing capability of a personal machine, the workflow can be easily ported to TeraStat, the supercomputing facility hosted by the Department of Statistics at Sapienza.

Call code: 
2044290

© Università degli Studi di Roma "La Sapienza" - Piazzale Aldo Moro 5, 00185 Roma