succinct data structures
Recently Published Documents



2021 ◽  
Author(s):  
Jarno Alanko ◽  
Ilya Slizovskiy ◽  
Daniel Lokshtanov ◽  
Travis Gagie ◽  
Noelle Noyes ◽  
...  

Bait-enriched sequencing is a relatively new sequencing protocol that is becoming increasingly common, as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes ("baits") is designed, manufactured, and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. This effectively enriches the DNA for which the probes were designed. Most recently, Metsky et al. (Nature Biotech 2019) demonstrated that bait enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples. In this work, we formalize the problem of designing baits by defining the Minimum Bait Cover problem, which aims to find the smallest possible set of bait sequences that cover every position of a set of reference sequences under an approximate matching model. We show that the problem is NP-hard, and that it remains NP-hard under very restrictive assumptions. This indicates that no polynomial-time exact algorithm exists for the problem unless P = NP, and that the problem remains intractable even for small and deceptively simple inputs. In light of this, we design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti shows linear scaling in practice, running at least an order of magnitude faster than state-of-the-art methods, including the recent method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 minutes to design baits for a dataset comprising 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even a subset of 8% of the data in 24 hours.
Our implementation is publicly available at https://github.com/jnalanko/syotti.
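
To make the covering objective concrete, the following is a minimal brute-force sketch of a greedy bait cover. It is not Syotti's implementation: it assumes Hamming distance as the approximate matching model, ignores reverse complements, and replaces the succinct index with a quadratic scan. The names `hamming`, `greedy_bait_cover`, `bait_len`, and `max_mismatches` are hypothetical and chosen only for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(seqs: list[str], bait_len: int, max_mismatches: int) -> list[str]:
    """Scan left to right; whenever a position is uncovered, cut a new bait there.

    Assumption: a bait covers a window if their Hamming distance is at most
    max_mismatches. Sequences shorter than bait_len are skipped (left uncovered).
    """
    covered = [[False] * len(s) for s in seqs]
    baits: list[str] = []
    for i, s in enumerate(seqs):
        if len(s) < bait_len:
            continue
        for pos in range(len(s)):
            if covered[i][pos]:
                continue
            # Start the bait at the uncovered position, shifted left near the end
            # of the sequence so that it still has full length.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            # Brute-force stand-in for an index lookup: mark every window in
            # every sequence that the new bait matches approximately.
            for j, t in enumerate(seqs):
                for w in range(len(t) - bait_len + 1):
                    if hamming(bait, t[w:w + bait_len]) <= max_mismatches:
                        for k in range(w, w + bait_len):
                            covered[j][k] = True
    return baits
```

For example, `greedy_bait_cover(["ACGTACGT"], 4, 0)` returns `["ACGT"]`, because the single bait matches both halves of the sequence exactly. A greedy scan like this gives no optimality guarantee, which is consistent with the NP-hardness result above.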


2021 ◽  
Vol 7 (1) ◽  
pp. 29
Author(s):  
Nieves R. Brisaboa ◽  
Pablo Gutiérrez-Asorey ◽  
Miguel R. Luaces ◽  
Tirso V. Rodeiro

Geographic Information Systems (GIS) have spread all over our technological environment in the last decade. The inclusion of GPS technologies in everyday portable devices along with the creation of massive shareable geographical data banks has boosted the rise of geoinformatics. Despite the technological maturity of this field, there are still relevant research challenges concerning efficient information storage and representation. One of the most powerful techniques to tackle these issues is designing new Succinct Data Structures (SDS). These structures are defined by three main characteristics: they use a compact representation of the data, they have self-index properties and, as a consequence, they do not need decompression to process the enclosed information. Thus, SDS are not only capable of storing geographical data using as little space as possible, but they can also solve queries efficiently without any previous decompression. This work introduces how SDS can be successfully applied in the GIS context through several novel approaches and practical use cases.


Author(s):  
Sankardeep Chakraborty ◽  
Seungbum Jo ◽  
Kunihiko Sadakane ◽  
Srinivasa Rao Satti

2021 ◽  
Author(s):  
Taher Mun ◽  
Nae-Chyun Chen ◽  
Ben Langmead

Motivation: As more population genetics datasets and population-specific references become available, the task of translating (“lifting”) read alignments from one reference coordinate system to another is becoming more common. Existing tools generally require a chain file, whereas VCF files are the more common way to represent variation. Existing tools also do not make effective use of threads, creating a post-alignment bottleneck.
Results: LevioSAM is a tool for lifting SAM/BAM alignments from one reference to another using a VCF file containing population variants. LevioSAM uses succinct data structures and scales efficiently to many threads. When run downstream of a read aligner, levioSAM completes in less than 13% of the time required by an aligner when both are run with 16 threads.
Availability: https://github.com/alshai/[email protected], [email protected]
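
As an illustration of what lifting a coordinate involves, here is a small sketch that maps a position on one reference to the corresponding position on another, given a sorted list of indel-like variants. It is not levioSAM's code or data layout, which the abstract does not describe. The class name `CoordinateLifter` and the `(ref_start, ref_len, alt_len)` tuple convention are assumptions made for this example; a real tool would derive such records from a VCF file.

```python
from bisect import bisect_right
from itertools import accumulate


class CoordinateLifter:
    """Lift 0-based positions from a reference to an alternate coordinate system.

    Assumption: `variants` is a sorted, non-overlapping list of
    (ref_start, ref_len, alt_len) tuples, meaning that ref_len reference bases
    starting at ref_start are replaced by alt_len bases in the alternate.
    """

    def __init__(self, variants):
        self.starts = [start for start, _, _ in variants]
        self.ends = [start + ref_len for start, ref_len, _ in variants]
        # cum_shift[i] is the total length change caused by the first i variants.
        self.cum_shift = [0] + list(
            accumulate(alt_len - ref_len for _, ref_len, alt_len in variants)
        )

    def lift(self, ref_pos: int) -> int:
        # Number of variants that end at or before ref_pos.
        i = bisect_right(self.ends, ref_pos)
        if i < len(self.starts) and self.starts[i] <= ref_pos:
            # ref_pos falls inside variant i: clamp to the variant's start.
            return self.starts[i] + self.cum_shift[i]
        return ref_pos + self.cum_shift[i]
```

Each query is one binary search plus one table lookup, so lifting many alignments is independent per read, which is the kind of workload that parallelizes well across threads.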


2019 ◽  
Vol 13 (2) ◽  
pp. 227-236
Author(s):  
Tetsuo Shibuya

A data structure is called succinct if its asymptotic space requirement matches the original data size up to lower-order terms. The development of succinct data structures is important for dealing with explosively growing big data. Moreover, a wider variety of big data has recently been produced in various fields, and there is a substantial need for more application-specific succinct data structures. In this study, we review recently proposed application-oriented succinct data structures motivated by big data applications in three different fields: privacy-preserving computation in cryptography, genome assembly in bioinformatics, and work space reduction for compressed communications.
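
A common building block behind such structures is a bit vector with rank and select support. The sketch below only illustrates the layout (the raw bits plus a small table of sampled counts); a Python list is of course not succinct in memory, and the class name `RankBitVector` and the `block` parameter are hypothetical. It is not taken from the reviewed paper.

```python
class RankBitVector:
    """Bit vector with sampled prefix counts for rank and select queries."""

    def __init__(self, bits, block: int = 64):
        self.bits = list(bits)
        self.block = block
        # block_ranks[b] is the number of 1s before block b.
        self.block_ranks = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i: int) -> int:
        """Number of 1s in bits[0:i]."""
        b = i // self.block
        # One table lookup plus a scan of at most one block.
        return self.block_ranks[b] + sum(self.bits[b * self.block:i])

    def select1(self, k: int) -> int:
        """Index of the k-th 1 (k is 1-based), found by binary search on rank1."""
        if k < 1 or k > self.rank1(len(self.bits)):
            raise ValueError("k out of range")
        lo, hi = 1, len(self.bits)
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid) >= k:
                hi = mid
            else:
                lo = mid + 1
        return lo - 1
```

The sampled table adds one counter per block on top of the n stored bits, which is the sense in which the extra space is a lower-order term; queries read the stored bits directly rather than decompressing anything first.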

