Skoltech | Seminar: “Challenges of Big Data in Understanding RNA Biogenesis and Function”

We are glad to invite you to a seminar by Dr. Dmitri Pervouchine, titled “Challenges of Big Data in Understanding RNA Biogenesis and Function”.

Seminar abstract

High-throughput sequencing technology has dramatically enhanced our ability to look at the molecular parts of living cells. Genomes, transcriptomes, chromatin, transcription factor binding, cellular localization: this is a far from complete list of objects and events that can now be studied at an unprecedented level of detail. Since the technology is still labor-intensive and costly, many research groups have coalesced into large consortia to generate comprehensive compendia of data sets, each with a particular biological focus.

A typical goal of consortium work is to identify novel genomic elements or to quantify their abundance and associations with each other. This talk will focus mainly on the data generated by the human and mouse ENCODE projects, the Genotype-Tissue Expression (GTEx) project, and the GENCODE reference gene annotations. I will explain seemingly conflicting published results on whether organ-specific transcriptional patterns dominate over species-specific patterns or vice versa. Next, I will describe IPSA, a pipeline for splicing analysis that was used to identify brain-specific micro-alterations of splicing. Besides consortium work, I will present a few hypotheses on the biogenesis of circular RNAs and on the role of long-range RNA-RNA interactions in the regulation of splicing and polyadenylation, and share some thoughts about prospective projects at the interface between genomics and neuroscience.

The consortium format has been increasingly criticized for its inability to produce integrated data sets and for its higher cost compared to small-scale projects. In light of lessons learned from working in large collaborative environments, I will share my view of the current challenges in management and integration of big data, offer perspectives on where the technology will develop next, and discuss what we can do to survive the anticipated in silico data tsunami.

Speaker introduction

Dmitri Pervouchine was born in 1974 in Moscow, Russia. He graduated from Moscow State University with two degrees, in Mathematics and in Chemistry, and continued his education in the Bioinformatics program at Boston University. In 2002 he defended his PhD in Mathematics (Algebra) under the supervision of Dr. Ernest B. Vinberg and joined the group of Dr. Nancy Kopell at the Center for BioDynamics and Department of Mathematics and Statistics at Boston University, working on neuronal dynamics, memory, and learning. In 2005 he joined the group of Drs. Mikhail Gelfand and Andrei Mironov, where he worked on computational RNA structure prediction and evolution. Dmitri found a class of RNA structures in introns of eukaryotic genes that impact splicing of pre-mRNAs. In 2011 Dmitri joined transcriptomics research in the group of Dr. Roderic Guigo at the Center for Genomic Regulation in Barcelona. Dmitri works in close collaboration with Dr. Thomas Gingeras of Cold Spring Harbor Laboratory. In particular, he was involved in the mouse ENCODE consortium, where he and colleagues identified a group of genes with constrained expression across mammalian evolution and described the distinct properties of this group.