Distinguished Speaker Series: Metagenomic Assembly: from metaSPAdes to metaFlye
Prof. Pavel Pevzner, University of California at San Diego
Distinguished speaker talk of the Edmond J. Safra Center for Bioinformatics
Prof. Pavel Pevzner
Department of Computer Science and Engineering, University of California at San Diego
"Metagenomic Assembly: from metaSPAdes to metaFlye"
Thursday, November 5, 2020, at 16:00 (Israel time)
Host: Prof. Ron Shamir, Computer Science
I will briefly summarize recent progress with the metagenomic modules of the SPAdes assembly toolkit (metaSPAdes, metaplasmidSPAdes, metaviralSPAdes, cloudSPAdes, biosyntheticSPAdes, and toxinSPAdes), but the focus of my talk will be on long-read rather than short-read assembly. Long-read assemblies have improved on short-read assemblies because long reads are better at disambiguating genomic and metagenomic repeats. However, most long-read assembly algorithms construct contiguous genomic segments (contigs) but do not provide the repeat characterization (the repeat graph) needed to produce optimal assemblies. We present the Flye algorithm for long-read assembly (Kolmogorov et al., Nature Biotechnology 2019). Flye does not attempt to construct contigs at the initial assembly stage. Instead, it generates arbitrary paths (disjointigs) in the unknown repeat graph and then constructs the repeat graph from these error-riddled disjointigs. Counterintuitively, this seemingly reckless approach yields an accurate repeat graph and improves on state-of-the-art long-read assemblers in both contiguity and speed. We also describe the metaFlye long-read assembler and its applications to various metagenomic datasets.
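The claim that arbitrary paths can still yield the correct repeat graph can be illustrated with a toy example. The sketch below is not Flye's implementation. It swaps in a simpler technique: it builds a de Bruijn graph by gluing identical k-mers, and it assumes error-free sequences and exact repeats. Flye itself has to handle approximate repeats in error-prone disjointigs, which this sketch does not model. The genome, the two "disjointigs", k = 4, and the function names (`kmer_edges`, `de_bruijn`, `branching_nodes`) are all hypothetical. The first disjointig is a chimeric path: it enters the repeat `ACGATC` from its first copy and leaves through the flank of the second copy. It is therefore not a substring of the genome, yet it is still a valid path in the graph.

```python
from collections import defaultdict


def kmer_edges(strings, k):
    """Return the set of distinct k-mers (the graph's edges) over all strings."""
    edges = set()
    for s in strings:
        for i in range(len(s) - k + 1):
            edges.add(s[i:i + k])
    return edges


def de_bruijn(strings, k):
    """Glue identical k-mers: nodes are (k-1)-mers, each distinct k-mer is an edge."""
    graph = defaultdict(set)
    for kmer in kmer_edges(strings, k):
        graph[kmer[:-1]].add(kmer[1:])
    return graph


def branching_nodes(graph):
    """Nodes with in-degree or out-degree > 1 mark the boundaries of repeats."""
    indeg = defaultdict(int)
    for succs in graph.values():
        for v in succs:
            indeg[v] += 1
    nodes = set(graph) | set(indeg)
    return sorted(v for v in nodes if len(graph.get(v, ())) > 1 or indeg[v] > 1)


if __name__ == "__main__":
    # Hypothetical genome containing two copies of the repeat ACGATC.
    genome = "GGTACGATCTTGACGATCAAC"
    # Hypothetical disjointigs: the first one is chimeric (not a genome substring).
    disjointigs = ["GGTACGATCAAC", "ACGATCTTGACGATC"]
    k = 4
    print("GGTACGATCAAC" in genome)                                  # False
    print(de_bruijn(disjointigs, k) == de_bruijn([genome], k))       # True
    print(branching_nodes(de_bruijn(disjointigs, k)))                # ['ACG', 'ATC']
```

Because gluing depends only on which k-mers occur, a set of paths that together covers every edge produces the same graph as the genome. This holds no matter how those paths wander through the repeat. In this toy case, the repeat shows up as the segment between the branching nodes `ACG` and `ATC`.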