bioinformatics | Ricerc@Sapienza

Massive NGS data analysis reveals hundreds of potential novel gene fusions in human cell lines

Background:

Spatial Organization and Molecular Correlation of Tumor-Infiltrating Lymphocytes Using Deep Learning on Pathology Images

Beyond sample curation and basic pathologic characterization, the digitized H&E-stained images of TCGA samples remain underutilized. To highlight this resource, we present mappings of tumor-infiltrating lymphocytes (TILs) based on H&E images from 13 TCGA tumor types. These TIL maps are derived through computational staining using a convolutional neural network trained to classify patches of images. Affinity propagation revealed local spatial structure in TIL patterns and

Gene co-expression in the interactome: moving from correlation toward causation via an integrated approach to disease module discovery

In this study, we integrate the outcomes of co-expression network analysis with the human interactome network to predict novel putative disease genes and modules. We first apply the SWItch Miner (SWIM) methodology, which predicts important (switch) genes within the co-expression network that regulate disease state transitions, then map them to the human protein–protein interaction network (PPI, or interactome) to predict novel disease–disease relationships (i.e., a SWIM-informed diseasome).
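As a rough illustration of the integration idea described above (and explicitly not the SWIM algorithm itself, whose switch-gene criteria are not given in this snippet), the sketch below builds a small co-expression network by thresholding pairwise Pearson correlation and then keeps only the co-expressed pairs that are also present in a protein–protein interaction edge list. The gene names, expression values, the 0.8 threshold and the function names are all hypothetical, chosen only for the example.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)

def coexpression_edges(expr, threshold=0.8):
    """Return gene pairs whose absolute correlation reaches the threshold.

    expr maps a gene name to its expression values across samples.
    The threshold is an assumption for illustration only.
    """
    genes = sorted(expr)
    edges = set()
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            if abs(pearson(expr[g], expr[h])) >= threshold:
                edges.add((g, h))
    return edges

def supported_by_interactome(coexp_edges, ppi_edges):
    """Keep co-expression edges that also appear in the PPI edge list."""
    ppi = {frozenset(e) for e in ppi_edges}
    return {e for e in coexp_edges if frozenset(e) in ppi}

# Hypothetical toy data: four genes measured in four samples.
toy_expr = {
    "A": [1, 2, 3, 4],
    "B": [2, 4, 6, 8.1],
    "C": [4, 3, 2, 1],
    "D": [1, 3, 2, 3],
}
toy_ppi = [("B", "A"), ("C", "D")]  # undirected interactions

toy_coexp = coexpression_edges(toy_expr)
toy_supported = supported_by_interactome(toy_coexp, toy_ppi)
```

Treating PPI edges as unordered pairs means an interaction listed as ("B", "A") still matches the co-expression edge ("A", "B"). A real analysis would use a curated interactome and a statistically justified correlation cut-off rather than a fixed value.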

Ancient plant DNA in lake sediments

Recent advances in sequencing technologies now permit the analyses of plant DNA from fossil samples (ancient plant DNA, plant aDNA), and thus enable the molecular reconstruction of palaeofloras. Hitherto, ancient frozen soils have proved excellent in preserving DNA molecules, and have thus been the most commonly used source of plant aDNA. However, DNA from soil mainly represents taxa growing a few metres from the sampling point.

BioWebEngine: a generation environment for bioinformatics research

With technologies for massively parallel genome sequencing available, bioinformatics has entered the “big data” era. Developing applications in this field involves collaboration of domain experts with IT specialists to specify programs able to query several sources, obtain data in several formats, search them for significant patterns and present the obtained results according to several types of visualisation.

Recent trends and analytical challenges in plant bioactive peptide separation, identification and validation

Interest in research into bioactive peptides (BPs) is growing because of their health-promoting ability. Several bioactivities have been ascribed to peptides, including antioxidant, antihypertensive and antimicrobial properties. As they can be produced from precursor proteins, the investigation of BPs in foods is becoming increasingly popular. For the same reason, production of BPs from by-products has also emerged as a possible means of reducing waste and recovering value-added compounds suitable for functional food production and supplements.

Integrated transcriptomic correlation network analysis identifies COPD molecular determinants

Chronic obstructive pulmonary disease (COPD) is a complex and heterogeneous syndrome. Network-based analysis implemented by the SWIM software can be exploited to identify key molecular switches, called “switch genes”, for the disease. Genes contributing to common biological processes or defining given cell types are usually co-regulated and co-expressed, forming expression network modules.

Interpreting and integrating big data in non-coding RNA research

In the last two decades, we have witnessed an impressive crescendo of non-coding RNA studies, due to both the development of high-throughput RNA-sequencing strategies and an ever-increasing awareness of the involvement of newly discovered ncRNA classes in complex regulatory networks.

Long-read annotation: automated eukaryotic genome annotation based on long-read cDNA sequencing

Single-molecule full-length complementary DNA (cDNA) sequencing can aid genome annotation by revealing transcript structure and alternative splice forms, yet current annotation pipelines do not incorporate such information. Here we present long-read annotation (LoReAn) software, an automated annotation pipeline utilizing short- and long-read cDNA sequencing, protein evidence, and ab initio prediction to generate accurate genome annotations.