Blavatnik School of Computer Science, TAU
Edmond J. Safra Center for Bioinformatics
Prof. Pavel Pevzner
Department of Computer Science and Engineering, University of California at San Diego
"Metagenomic Assembly: from metaSPAdes to metaFlye"
Sunday, April 26, 2020, at 11:10
(Refreshments from 11:00)
School of Computer Science, Check Point building, room 420
Host: Prof. Ron Shamir, Computer Science
I will briefly summarize recent progress on the metagenomic modules of the SPAdes assembly toolkit (metaSPAdes, metaplasmidSPAdes, metaviralSPAdes, cloudSPAdes, biosyntheticSPAdes, and toxinSPAdes), but the focus of my talk will be on long-read rather than short-read assembly. Long-read assemblies have improved on short-read assemblies because long reads are better able to disambiguate genomic and metagenomic repeats. However, most long-read assembly algorithms construct contiguous genomic segments (contigs) but do not provide the repeat characterization (the repeat graph) needed to produce optimal assemblies. We present the Flye algorithm for assembling long reads (Kolmogorov et al., Nature Biotech 2019). Rather than constructing contigs at the initial assembly stage, Flye generates arbitrary paths (disjointigs) in the unknown repeat graph and then constructs the repeat graph from these error-riddled disjointigs. Counter-intuitively, this seemingly reckless approach yields an accurate repeat graph and improves on state-of-the-art long-read assemblers in both contiguity and speed. We further describe the metaFlye long-read assembler and its applications to various metagenomic datasets.
This is joint work with Dmitry Antipov, Anton Bankevich, Mikhail Kolmogorov, Anton Korobeynikov, Alla Lapidus, Yu Lin, Dmitry Meleshko, Sergey Nurk, Mikhail Rayko, Tatiana Dvorkina, Ivan Tolstoganov, and Jeffrey Yuan.
Pavel Pevzner is Ronald R. Taylor Professor of Computer Science and Engineering and Director of the NIH Center for Computational Mass Spectrometry at the University of California, San Diego. He holds a Ph.D. from the Moscow Institute of Physics and Technology, Russia. He was named a Howard Hughes Medical Institute Professor in 2006. He was elected a Fellow of the Association for Computing Machinery in 2010, a Fellow of the International Society for Computational Biology in 2012, a member of the Academia Europaea in 2016, and a Fellow of the American Association for the Advancement of Science (AAAS) in 2018. He received an honorary doctorate (Honoris Causa, 2011) from Simon Fraser University in Vancouver, the Senior Scientist Award (2017) from the International Society for Computational Biology, and the Kanellakis Theory and Practice Award from the Association for Computing Machinery. Dr. Pevzner authored the textbooks "Computational Molecular Biology: An Algorithmic Approach", "An Introduction to Bioinformatics Algorithms" (with Neil Jones), and "Bioinformatics Algorithms: An Active Learning Approach" (with Phillip Compeau). He co-developed the Bioinformatics and the Data Structures and Algorithms online specializations on Coursera, as well as the Algorithms MicroMasters program on edX.