Distinguished Speaker Series of the Faculty of Exact Sciences

Distinguished Speaker Series: Data structures to represent sets of k-mers

Prof. Paul Medvedev, Associate Professor in the Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology, Director of the Center for Computational Biology and Bioinformatics, Pennsylvania State University

18 June 2019, 12:00

Schreiber 006, Computer Science, TAU

Abstract: The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying k-mer sets has emerged as a shared underlying component, and many specialized data structures have been developed to represent them. In this talk, I will describe the applications of k-mer sets in bioinformatics and motivate the need for specialized data structures. I will give an overview of known approaches and lower bounds, with a focus on unitig-based representations. Finally, I will describe a data structure for representing sets of k-mer sets, called the HowDe Sequence Bloom Tree.

Host: Prof. Ron Shamir, Computer Science School, TAU.