Effective BST Approach to Find Underflow Condition in Interval Trees Using Augmented Data Structure

Augmented Interval List: a novel data structure for efficient genomic interval search

Bioinformatics ◽

10.1093/bioinformatics/btz407 ◽

2019 ◽

Vol 35 (23) ◽

pp. 4907-4911 ◽

Cited By ~ 8

Author(s):

Jianglin Feng ◽

Aakrosh Ratan ◽

Nathan C Sheffield

Keyword(s):

Data Structure ◽

High Performance ◽

Genomic Analysis ◽

Genomic Data ◽

Interval Data ◽

Supplementary Information ◽

Genomic Interval ◽

Interval Trees ◽

Running Maximum ◽

Scalable Methods

Abstract. Motivation: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary. Results: We present a new data structure, the Augmented Interval List (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂ N + n + m), where n is the number of overlaps between R and q, N is the number of intervals in the set R, and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5–18 times faster than standard high-performance code based on augmented interval-trees, nested containment lists or R-trees (BEDTools). For large datasets, the memory usage for AIList is 4–60% of other methods. The AIList data structure, therefore, provides a significantly improved fundamental operation for highly scalable genomic data analysis. Availability and implementation: An implementation of the AIList data structure with both construction and search algorithms is available at http://ailist.databio.org. Supplementary information: Supplementary data are available at Bioinformatics online.
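The construction and query described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it keeps a single sorted list and omits the decomposition into sublists, so it shows only the sort-by-start and running-maximum-end ideas. The function names `build` and `query`, the tuple layout, and the half-open interval convention [start, end) are all assumptions made for illustration.

```python
from bisect import bisect_left


def build(intervals):
    """Sort (start, end) pairs by start and record the running maximum end.

    Simplification (assumption): one list only, no sublist decomposition.
    """
    items = sorted(intervals)
    starts = [s for s, _ in items]
    max_end = []
    running = float("-inf")
    for _, e in items:
        running = max(running, e)
        max_end.append(running)
    return items, starts, max_end


def query(index, q_start, q_end):
    """Return stored intervals overlapping the half-open query [q_start, q_end)."""
    items, starts, max_end = index
    # Binary search: last position whose start lies strictly before q_end.
    i = bisect_left(starts, q_end) - 1
    hits = []
    # Scan backwards; once the running maximum end is <= q_start, no earlier
    # interval can reach the query, so the scan stops.
    while i >= 0 and max_end[i] > q_start:
        s, e = items[i]
        if e > q_start:
            hits.append((s, e))
        i -= 1
    return hits


index = build([(1, 5), (2, 3), (4, 10), (12, 15), (13, 14)])
print(sorted(query(index, 6, 13)))
```

In this sketch the binary search accounts for the logarithmic term, the reported hits for n, and the intervals visited but rejected during the backward scan for the extra comparisons m. A long interval that contains many short ones inflates that last cost, which is the situation the sublist decomposition in the abstract is meant to limit.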


Augmented Interval List: a novel data structure for efficient genomic interval search

10.1101/593657 ◽

2019 ◽

Cited By ~ 1

Author(s):

Jianglin Feng ◽

Aakrosh Ratan ◽

Nathan C. Sheffield

Keyword(s):

Data Structure ◽

High Performance ◽

Genomic Analysis ◽

Genomic Data ◽

Interval Data ◽

Genomic Interval ◽

Interval Trees ◽

Running Maximum ◽

Genomic Data Analysis ◽

Scalable Methods

Abstract. Motivation: Genomic data is frequently stored as segments or intervals. Because this data type is so common, interval-based comparisons are fundamental to genomic analysis. As the volume of available genomic data grows, developing efficient and scalable methods for searching interval data is necessary. Results: We present a new data structure, the augmented interval list (AIList), to enumerate intersections between a query interval q and an interval set R. An AIList is constructed by first sorting R as a list by the interval start coordinate, then decomposing it into a few approximately flattened components (sublists), and then augmenting each sublist with the running maximum interval end. The query time for AIList is O(log₂ N + n + m), where n is the number of overlaps between R and q, N is the number of intervals in the set R, and m is the average number of extra comparisons required to find the n overlaps. Tested on real genomic interval datasets, AIList code runs 5–18 times faster than standard high-performance code based on augmented interval-trees (AITree), nested containment lists (NCList), or R-trees (BEDTools). For large datasets, the memory usage for AIList is 4%–60% of other methods. The AIList data structure, therefore, provides a significantly improved fundamental operation for highly scalable genomic data analysis. Availability: An implementation of the AIList data structure with both construction and search algorithms is available at code.databio.org/AIList.


COMPACT INTERVAL TREES: A DATA STRUCTURE FOR CONVEX HULLS

International Journal of Computational Geometry & Applications ◽

10.1142/s0218195991000025 ◽

1991 ◽

Vol 01 (01) ◽

pp. 1-22 ◽

Cited By ~ 19

Author(s):

LEONIDAS GUIBAS ◽

JOHN HERSHBERGER ◽

JACK SNOEYINK

Keyword(s):

Data Structure ◽

Convex Hull ◽

Compact Interval ◽

Common Tangent ◽

Convex Polygons ◽

Fixed Set ◽

Interval Tree ◽

Interval Trees ◽

Arrangements Of Lines ◽

The Common

In this paper, we investigate the problem of finding the common tangents of two convex polygons that intersect in two (unknown) points. First, we give a Θ(log² n) bound for algorithms that store the polygons in independent arrays. Second, we show how to beat the lower bound if the vertices of the convex polygons are drawn from a fixed set of n points. We introduce a data structure called a compact interval tree that supports common tangent computations, as well as the standard binary-search-based queries, in O(log n) time apiece. Third, we apply compact interval trees to solve the subpath hull query problem: given a simple path, preprocess it so that we can find the convex hull of a query subpath quickly. With O(n log n) preprocessing, we can assemble a compact interval tree that represents the convex hull of a query subpath in O(log n) time. In order to represent arrangements of lines implicitly, Edelsbrunner et al. used a less efficient structure, called bridge trees, to solve the subpath hull query problem. Our compact interval trees improve their results by a factor of O(log n). Thus, the present paper replaces the paper on bridge trees referred to by Edelsbrunner et al.


A Data Structure for Disjoint Sets

Genomic Perl ◽

10.1017/cbo9781139164764.021 ◽

2002 ◽

pp. 313-317

Keyword(s):

Data Structure ◽

Disjoint Sets


Latent psychological data structure of secondary driving tasks

PsycEXTRA Dataset ◽

10.1037/e577182012-012 ◽

2004 ◽

Author(s):

Reates Curry

Keyword(s):

Data Structure ◽

Psychological Data


Distributed Modeling Of Building Systems Through Cyber-Physical Integration

Promyshlennoe i Grazhdanskoe Stroitel'stvo ◽

10.33622/0869-7019.2019.09.12-17 ◽

2019 ◽

pp. 12-17

Keyword(s):

Data Structure ◽

Operating Conditions ◽

Physical Models ◽

Distributed Model ◽

Physical Processes ◽

Distributed Modeling ◽

Distributed Models ◽

Operating Modes ◽

Building Systems ◽

Important Practical Application

This article describes approaches to creating distributed models that can, with a given accuracy and under given restrictions, replace classical physical models of construction objects. These approaches are made possible by the cyber-physical integration of building systems. The article presents principles for forming the data structure of designed objects and distributed models; these principles make it possible to identify elements uniquely and to increase the level of detail of such a model. The data structure diagram for distributed modeling includes, among other things, the level at which signals about physical processes inside cyber-physical building systems are formed and transmitted. The article also presents a high-level algorithm for creating the structure of the distributed model. The algorithm covers developing a data structure, formalizing requirements for the parameters of a design object and its operating modes (both normal operating conditions and extreme conditions such as natural disasters), and selecting the objects that form a complete group for distributed modeling. Finally, the article formulates the main approaches to an important practical application of the cyber-physical integration of building systems, namely forming distributed physical models of designed construction objects, and outlines directions for further research.
