Distinguished Speaker Series: Data structures to represent sets of k-mers

Prof. Paul Medvedev, Associate Professor in the Department of Computer Science and Engineering and the Department of Biochemistry and Molecular Biology, Director of the Center for Computational Biology and Bioinformatics, Pennsylvania State University
18 June 2019, 12:00 
Schreiber 006, Computer Science, TAU 

Abstract: The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying k-mer sets has emerged as a shared underlying component, and many specialized data structures have been developed to represent them. In this talk, I will describe the applications of k-mer sets in bioinformatics and motivate the need for specialized data structures. I will give an overview of known approaches and lower bounds, with a focus on unitig-based representations. Finally, I will describe a data structure for representing sets of k-mer sets, called the HowDe Sequence Bloom Tree.


Host: Prof. Ron Shamir, Computer Science School, TAU.



