Succinct Data Structures: Latest Research Papers

Bait-enriched sequencing is a relatively new sequencing protocol that is becoming increasingly common, as it has been shown to successfully amplify regions of interest in metagenomic samples. In this method, a set of synthetic probes ("baits") is designed, manufactured, and applied to fragmented metagenomic DNA. The probes bind to the fragmented DNA, and any unbound DNA is rinsed away, leaving the bound fragments to be amplified for sequencing. This effectively enriches the DNA for which the probes were designed. Most recently, Metsky et al. (Nature Biotech 2019) demonstrated that bait-enrichment is capable of detecting a large number of human viral pathogens within metagenomic samples. In this work, we formalize the problem of designing baits by defining the Minimum Bait Cover problem, which aims to find the smallest possible set of bait sequences that cover every position of a set of reference sequences under an approximate matching model. We show that the problem is NP-hard, and that it remains NP-hard under very restrictive assumptions. This indicates that no polynomial-time exact algorithm exists for the problem unless P = NP, and that the problem remains hard even for deceptively simple inputs. In light of this, we design an efficient heuristic that takes advantage of succinct data structures. We refer to our method as Syotti. The running time of Syotti scales linearly in practice, and it runs at least an order of magnitude faster than state-of-the-art methods, including the recent method of Metsky et al. At the same time, our method produces bait sets that are smaller than the ones produced by the competing methods, while also leaving fewer positions uncovered. Lastly, we show that Syotti requires only 25 minutes to design baits for a dataset comprising 3 billion nucleotides from 1000 related bacterial substrains, whereas the method of Metsky et al. shows clearly super-linear running time and fails to process even an 8% subset of the data in 24 hours.
Our implementation is publicly available at https://github.com/jnalanko/syotti.
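
To make the Minimum Bait Cover problem concrete, here is a minimal greedy sketch in Python. It is an illustration of the problem statement only, not Syotti's implementation: it assumes Hamming distance as the approximate matching model, ignores reverse complements, and scans every window naively instead of using a succinct index. The names `greedy_bait_cover`, `bait_len`, and `max_mismatches` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching characters between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def greedy_bait_cover(seqs, bait_len, max_mismatches):
    """Greedily pick baits until every position of every sequence is covered.

    A bait covers a window of length bait_len if the two differ in at most
    max_mismatches positions (an assumed matching model for this sketch).
    The result is a valid cover, but not necessarily a minimum one.
    """
    if any(len(s) < bait_len for s in seqs):
        raise ValueError("every sequence must be at least bait_len long")

    covered = [[False] * len(s) for s in seqs]
    baits = []
    for i, s in enumerate(seqs):
        for pos in range(len(s)):
            if covered[i][pos]:
                continue
            # Take the window starting at the first uncovered position,
            # shifted left if it would run past the end of the sequence.
            start = min(pos, len(s) - bait_len)
            bait = s[start:start + bait_len]
            baits.append(bait)
            # Naive scan: mark every window the new bait matches.
            for j, t in enumerate(seqs):
                for k in range(len(t) - bait_len + 1):
                    if hamming(bait, t[k:k + bait_len]) <= max_mismatches:
                        for p in range(k, k + bait_len):
                            covered[j][p] = True
    return baits
```

The naive scan costs time proportional to the total input length for every bait, which is the kind of cost an index over the references is meant to avoid.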

Succinct Data Structures in the Realm of GIS

Engineering Proceedings ◽ 10.3390/engproc2021007029 ◽ 2021 ◽ Vol 7 (1) ◽ pp. 29

Author(s): Nieves R. Brisaboa ◽ Pablo Gutiérrez-Asorey ◽ Miguel R. Luaces ◽ Tirso V. Rodeiro

Keyword(s): Data Structures ◽ Information Storage ◽ Compact Representation ◽ Portable Devices ◽ Relevant Research ◽ Succinct Data Structures ◽ Technological Environment ◽ Geographical Data ◽ Efficient Information ◽ Data Banks

Geographic Information Systems (GIS) have spread all over our technological environment in the last decade. The inclusion of GPS technologies in everyday portable devices along with the creation of massive shareable geographical data banks has boosted the rise of geoinformatics. Despite the technological maturity of this field, there are still relevant research challenges concerning efficient information storage and representation. One of the most powerful techniques to tackle these issues is designing new Succinct Data Structures (SDS). These structures are defined by three main characteristics: they use a compact representation of the data, they have self-index properties and, as a consequence, they do not need decompression to process the enclosed information. Thus, SDS are not only capable of storing geographical data using as little space as possible, but they can also solve queries efficiently without any previous decompression. This work introduces how SDS can be successfully applied in the GIS context through several novel approaches and practical use cases.
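
As a small illustration of answering queries directly on a compact representation, the sketch below implements a `rank1` query on a bitvector using precomputed per-block counts. It is a generic textbook-style example, not taken from the paper; the class name `RankBitVector` and the `block` parameter are hypothetical, and a Python list is used for readability, so the sketch shows the query mechanism rather than real space savings.

```python
class RankBitVector:
    """Bitvector with rank support via cumulative per-block popcounts."""

    def __init__(self, bits, block=64):
        self.bits = list(bits)
        self.block = block
        # block_ranks[b] = number of 1 bits in bits[0 : b * block]
        self.block_ranks = [0]
        for start in range(0, len(self.bits), block):
            ones = sum(self.bits[start:start + block])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Return the number of 1 bits in bits[0:i]."""
        if not 0 <= i <= len(self.bits):
            raise IndexError("rank position out of range")
        b = i // self.block
        # Jump to the enclosing block, then count only inside that block.
        return self.block_ranks[b] + sum(self.bits[b * self.block:i])


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0], block=4)
print(bv.rank1(5))
```

A query touches one stored counter and at most one block of bits, regardless of how long the bitvector is.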

Succinct Data Structures for Small Clique-Width Graphs

2021 Data Compression Conference (DCC) ◽ 10.1109/dcc50243.2021.00021 ◽ 2021

Author(s): Sankardeep Chakraborty ◽ Seungbum Jo ◽ Kunihiko Sadakane ◽ Srinivasa Rao Satti

Keyword(s): Data Structures ◽ Succinct Data Structures

LevioSAM: Fast lift-over of alternate reference alignments

10.1101/2021.02.05.429867 ◽ 2021

Author(s): Taher Mun ◽ Nae-Chyun Chen ◽ Ben Langmead

Keyword(s): Population Genetics ◽ Coordinate System ◽ Data Structures ◽ Succinct Data Structures ◽ Reference Coordinate System ◽ Link Type ◽ A Chain ◽ Time Required ◽ Effective Use

Abstract Motivation: As more population genetics datasets and population-specific references become available, the task of translating ("lifting") read alignments from one reference coordinate system to another is becoming more common. Existing tools generally require a chain file, whereas VCF files are the more common way to represent variation. Existing tools also do not make effective use of threads, creating a post-alignment bottleneck. Results: LevioSAM is a tool for lifting SAM/BAM alignments from one reference to another using a VCF file containing population variants. LevioSAM uses succinct data structures and scales efficiently to many threads. When run downstream of a read aligner, levioSAM completes in less than 13% of the time required by an aligner when both are run with 16 threads. Availability: https://github.com/alshai/[email protected], [email protected]
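
The idea of lifting a coordinate across indels can be sketched as follows. This is a simplified illustration under stated assumptions, not levioSAM's code: variants are given as hypothetical `(position, length_change)` pairs rather than parsed from a VCF file, a plain sorted list with binary search stands in for the succinct data structures the abstract mentions, and positions inside deleted regions as well as CIGAR updates are ignored.

```python
from bisect import bisect_left
from itertools import accumulate


def make_lifter(indels):
    """Build a function mapping source coordinates to target coordinates.

    indels: iterable of (position, length_change) pairs in source
    coordinates; length_change is positive for an insertion and negative
    for a deletion. Each indel shifts every position strictly after it.
    """
    indels = sorted(indels)
    positions = [pos for pos, _ in indels]
    # shifts[k] = total length change of the first k + 1 indels
    shifts = list(accumulate(delta for _, delta in indels))

    def lift(pos):
        k = bisect_left(positions, pos)  # number of indels strictly before pos
        return pos + (shifts[k - 1] if k else 0)

    return lift


lift = make_lifter([(10, 2), (20, -3)])
print(lift(15), lift(25))
```

Because each lookup is independent of the others, lifting many alignments with a structure like this parallelizes naturally across threads.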

Succinct Data Structures for Series-Parallel, Block-Cactus and 3-Leaf Power Graphs

Combinatorial Optimization and Applications - Lecture Notes in Computer Science ◽ 10.1007/978-3-030-92681-6_33 ◽ 2021 ◽ pp. 416-430

Author(s): Sankardeep Chakraborty ◽ Seungbum Jo ◽ Kunihiko Sadakane ◽ Srinivasa Rao Satti

Keyword(s): Data Structures ◽ Succinct Data Structures

Compact and succinct data structures for multidimensional orthogonal range searching

Information and Computation ◽ 10.1016/j.ic.2020.104519 ◽ 2020 ◽ Vol 273 ◽ pp. 104519

Author(s): Kazuki Ishiyama ◽ Kunihiko Sadakane

Keyword(s): Data Structures ◽ Range Searching ◽ Succinct Data Structures ◽ Orthogonal Range Searching

Compendious and Succinct Data Structures for Big Data

Advances in Intelligent Systems and Computing - Advances in Computational Intelligence and Communication Technology ◽ 10.1007/978-981-15-1275-9_37 ◽ 2020 ◽ pp. 457-467

Author(s): Vinesh Kumar ◽ Akhilesh Kumar Singh ◽ Sharad Pratap Singh

Keyword(s): Big Data ◽ Data Structures ◽ Succinct Data Structures

Leveraging Succinct Data Structures for DNA Sequence Mapping on FPGA

2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) ◽ 10.1109/ipdpsw50202.2020.00035 ◽ 2020

Author(s): Guido Walter Di Donato ◽ Alberto Zeni ◽ Lorenzo Di Tucci ◽ Marco D. Santambrogio

Keyword(s): DNA Sequence ◽ Data Structures ◽ Succinct Data Structures ◽ Sequence Mapping

Application-Oriented Succinct Data Structures for Big Data

The Review of Socionetwork Strategies ◽ 10.1007/s12626-019-00045-1 ◽ 2019 ◽ Vol 13 (2) ◽ pp. 227-236

Author(s): Tetsuo Shibuya

Keyword(s): Big Data ◽ Data Structure ◽ Data Structures ◽ Genome Assembly ◽ Original Data ◽ Succinct Data Structures ◽ Space Requirement ◽ Space Reduction ◽ Big Data Applications ◽ Application Specific

Abstract A data structure is called succinct if its asymptotic space requirement matches the original data size. The development of succinct data structures is important for dealing with explosively growing big data. Moreover, a wider variety of big data has recently been produced in various fields, and there is a substantial need for more application-specific succinct data structures. In this study, we review recently proposed application-oriented succinct data structures motivated by big data applications in three different fields: privacy-preserving computation in cryptography, genome assembly in bioinformatics, and work space reduction for compressed communications.
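
The definition in the first sentence is commonly stated as follows (a standard formulation, not quoted from the paper; the symbols $D$, $Z$ and $\mathcal{U}$ are introduced here for illustration). Let $\mathcal{U}$ be the set of all possible inputs of a given size and $Z$ the information-theoretic minimum number of bits needed to distinguish them. A data structure $D$ is succinct if

```latex
\[
  \operatorname{space}(D) = Z + o(Z) \ \text{bits},
  \qquad Z = \lceil \log_2 |\mathcal{U}| \rceil .
\]
```

For example, for bitvectors of length $n$ we have $|\mathcal{U}| = 2^n$ and $Z = n$, so a succinct bitvector with rank and select support uses $n + o(n)$ bits.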
