An interactive SQL relational interface for querying main-memory data structures

Abstract. Background: Design of valid high-quality primers is essential for qPCR experiments. MRPrimer is a powerful pipeline based on MapReduce that combines both primer design for target sequences and homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Because primers designed by MRPrimer are effective in qPCR analysis, it has been widely used for developing online design tools and building primer databases. However, MRPrimer is too slow to keep up with sequence DBs whose sizes are growing exponentially, so its speed must be improved. Results: We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of a total of seven MapReduce steps, two of which are very time-consuming. GPrimer significantly improves the speed of those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access on the GPU and for workload balancing among GPU threads, and it copies the data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer on a single machine with 4 GPUs achieves a speedup of 57 times across all steps and a speedup of 557 times for the most time-consuming step, compared with MRPrimer running on a cluster of six machines. Conclusions: We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once, without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.

Compact Data Structures to Represent and Query Data Warehouses into Main Memory

IEEE Latin America Transactions ◽

10.1109/tla.2018.8789552 ◽

2018 ◽

Vol 16 (9) ◽

pp. 2328-2335 ◽

Cited By ~ 2

Author(s):

Cristian Vallejos ◽

Monica Caniupan ◽

Gilberto Gutierrez

Keyword(s):

Data Structures ◽

Main Memory ◽

Data Warehouses ◽

Compact Data Structures

Storing Set Families More Compactly with Top ZDDs

Algorithms ◽

10.3390/a14060172 ◽

2021 ◽

Vol 14 (6) ◽

pp. 172

Author(s):

Kotaro Matsuda ◽

Shuhei Denzumi ◽

Kunihiko Sadakane

Keyword(s):

Data Structures ◽

Binary Decision Diagrams ◽

Real Data ◽

Directed Acyclic Graphs ◽

Main Memory ◽

Large Set ◽

Decision Diagrams ◽

Binary Decision ◽

Acyclic Graphs

Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in the ZDD size. In some cases, however, the ZDDs representing large set families become too large to store in main memory. This paper proposes the top ZDD, a novel representation of ZDDs which uses less space than existing ones. The top ZDD is an extension of the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be done in time poly-logarithmic in the ZDD size, and show that there exist set families for which the size of the top ZDD is exponentially smaller than that of the ZDD. We also show experimentally that our top ZDDs have smaller sizes than ZDDs for real data.
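
To make the notion of a ZDD concrete, here is a minimal sketch of a plain ZDD (not the top ZDD proposed in the paper). It shows the zero-suppression rule and the sharing of identical subgraphs through a unique table. The names `node`, `count`, `enumerate_sets`, and the tuple encoding of nodes are assumptions chosen for illustration only.

```python
from functools import lru_cache

# Terminals: BOT is the empty family {}, TOP is the family {{}} (only the empty set).
BOT, TOP = "BOT", "TOP"
_unique = {}  # unique table: structurally identical nodes are stored once


def node(var, lo, hi):
    """Return the node (var, lo, hi); lo = sets without var, hi = sets with var."""
    if hi == BOT:  # zero-suppression rule: drop nodes whose 1-edge leads to BOT
        return lo
    key = (var, lo, hi)
    return _unique.setdefault(key, key)


@lru_cache(maxsize=None)
def count(z):
    """Number of sets in the family represented by z."""
    if z == BOT:
        return 0
    if z == TOP:
        return 1
    _, lo, hi = z
    return count(lo) + count(hi)


def enumerate_sets(z):
    """List every set in the family (only sensible for small families)."""
    if z == BOT:
        return []
    if z == TOP:
        return [frozenset()]
    var, lo, hi = z
    return enumerate_sets(lo) + [s | {var} for s in enumerate_sets(hi)]


# The family {{2}, {1, 2}} with variable order 1 < 2: both branches share one node.
only_two = node(2, BOT, TOP)
family = node(1, only_two, only_two)
```

Because both edges of the root point to the same `only_two` node, the diagram stores two nodes for two sets; this sharing is the kind of redundancy that DAG compression exploits further.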

CoroBase

Proceedings of the VLDB Endowment ◽

10.14778/3430915.3430932 ◽

2020 ◽

Vol 14 (3) ◽

pp. 431-444

Author(s):

Yongjun He ◽

Jiacheng Lu ◽

Tianzheng Wang

Keyword(s):

Data Structures ◽

Main Memory ◽

Data Prefetching ◽

Backward Compatibility ◽

Transaction Models ◽

Main Memory Database ◽

Hide Data ◽

Rich Data ◽

Software Prefetching ◽

Database Engine

Data stalls are a major overhead in main-memory database engines due to the use of pointer-rich data structures. Lightweight coroutines ease the implementation of software prefetching to hide data stalls by overlapping computation and asynchronous data prefetching. Prior solutions, however, mainly focused on (1) individual components and operations and (2) intra-transaction batching that requires interface changes, breaking backward compatibility. It was not clear how they apply to a full database engine and how much end-to-end benefit they bring under various workloads. This paper presents CoroBase, a main-memory database engine that tackles these challenges with a new coroutine-to-transaction paradigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes but retaining the benefits of prefetching. We show that on a 48-core server, CoroBase can perform close to 2x better for read-intensive workloads and remain competitive for workloads that inherently do not benefit from software prefetching.
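
The idea of modelling transactions as coroutines and batching across transactions can be sketched with Python generators. This is only a control-flow illustration under stated assumptions: Python cannot issue hardware prefetches, so each `yield` merely marks the point where an engine would prefetch and switch to another transaction. The names `transaction`, `run_batch`, `store`, and `log` are hypothetical and are not taken from CoroBase.

```python
from collections import deque


def transaction(txn_id, keys, store, log):
    """A read-only transaction written as a coroutine (generator)."""
    total = 0
    for key in keys:
        log.append((txn_id, "prefetch", key))
        yield                # suspend where a real engine would issue a prefetch
        total += store[key]  # resumed later: the data is assumed to be cached now
    return total


def run_batch(transactions):
    """Round-robin scheduler: switch to another transaction at every yield."""
    ready = deque(enumerate(transactions))
    results = {}
    while ready:
        idx, txn = ready.popleft()
        try:
            next(txn)                  # run until the next suspension point
            ready.append((idx, txn))   # not finished: put it back in the queue
        except StopIteration as done:
            results[idx] = done.value  # the generator's return value
    return [results[i] for i in range(len(transactions))]


store = {"a": 1, "b": 2, "c": 3, "d": 4}
log = []
batch = [
    transaction(0, ["a", "b"], store, log),
    transaction(1, ["c", "d"], store, log),
]
totals = run_batch(batch)
```

The transaction bodies contain no batching logic of their own; the interleaving comes entirely from the scheduler, which is why this style needs no change to the application-facing interface.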

Optimization of T-Tree Index of Main Memory Database in Critical Application

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.40-41.206 ◽

2010 ◽

Vol 40-41 ◽

pp. 206-211

Author(s):

Zhi Lin Zhu

Keyword(s):

Data Structures ◽

High Performance ◽

Main Memory ◽

Ongoing Study ◽

Index Structure ◽

Index Structures ◽

Main Memory Database ◽

Data Structures And Algorithms ◽

Tree Index ◽

Overall Performance

One approach to achieving high performance in a DBMS for critical applications is to store the database in main memory rather than on disk. One can then design new data structures and algorithms oriented towards increasing the efficiency of the main-memory database (MMDB). In this paper we present some results on index structures from an ongoing study of MMDBs. We propose a new index structure, the T-tail Tree. We give the main algorithms of the T-tail Tree and report their performance. Our results indicate that the T-tail Tree provides good overall performance in main memory.

COMPRESSION OF HIGH-RESOLUTION VOXEL PHANTOMS BY MEANS OF B+ TREE

International Journal of Research -GRANTHAALAYAH ◽

10.29121/granthaalayah.v7.i12.2019.312 ◽

2020 ◽

Vol 7 (12) ◽

pp. 199-208

Author(s):

A. Kavinilavu ◽

S. Neelavathy Pari

Keyword(s):

Data Structures ◽

Hierarchical Structures ◽

Main Memory ◽

Memory Access ◽

Scientific Models ◽

Access Time ◽

Structural Representation ◽

Access To Data ◽

Tree Leaf ◽

Voxel Phantoms

Data structures are chosen to save space and to grant fast access to data by its key for a particular structural representation. The data structures surveyed are linear lists, hierarchical structures, and graph structures. A B+ tree is an extension of the B-tree data structure that allows efficient insertion, deletion, and search operations. It is used to store large amounts of data that cannot fit in main memory. B+ tree leaf nodes are connected in a singly linked list to make search queries more efficient. The drawback of binary tree geometry is that the decrease in memory use comes at the expense of more frequent memory access, which might slow down simulations in which memory access constitutes a significant part of the execution time. The work concerns the processing and compression of voxel phantoms without loss of quality. Voxels are often used in the visualization and analysis of medical and scientific information. Voxel phantoms, which comprise a set of small volume elements, appeared towards the end of the 1980s and improved on the first scientific models of the human body. These phantoms are an extremely exact representation. The aim is to fetch records in an equal number of disk accesses and to reduce access time by reducing the height of the tree and increasing the number of branches per node.
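
The benefit of linking B+ tree leaves can be illustrated with a small range scan. This is a minimal sketch, assuming a simplified layout: the internal nodes are replaced by a one-level list of each leaf's smallest key, and the names `Leaf`, `build_leaves`, and `range_scan` are hypothetical rather than taken from the paper.

```python
from bisect import bisect_right


class Leaf:
    def __init__(self, keys):
        self.keys = keys  # sorted keys stored in this leaf
        self.next = None  # link to the right sibling leaf


def build_leaves(sorted_keys, capacity):
    """Pack sorted keys into leaves and chain them into a singly linked list."""
    leaves = [Leaf(sorted_keys[i:i + capacity])
              for i in range(0, len(sorted_keys), capacity)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    return leaves


def range_scan(leaves, lo, hi):
    """Return all keys k with lo <= k <= hi."""
    if not leaves:
        return []
    # Stand-in for the root-to-leaf descent: find the leaf that may contain lo.
    firsts = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(firsts, lo) - 1, 0)
    out = []
    leaf = leaves[start]
    while leaf is not None:      # follow sibling links, never going back up
        for k in leaf.keys:
            if k > hi:
                return out
            if k >= lo:
                out.append(k)
        leaf = leaf.next
    return out


leaves = build_leaves(list(range(0, 20, 2)), capacity=3)
found = range_scan(leaves, 5, 13)
```

After the single descent to the starting leaf, the scan touches only the leaves that overlap the requested range, which is the reason the sibling links help range queries.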

Optimal Solutions to the Sort Algorithms of Database Structure

European Journal of Education ◽

10.26417/ejed.v3i1.p40-50 ◽

2020 ◽

Vol 3 (1) ◽

pp. 40

Author(s):

Rifat Osmanaj ◽

Hysen Binjaku

Keyword(s):

Data Structures ◽

Main Memory ◽

Sorting Algorithm ◽

Massive Data ◽

Optimal Solutions ◽

Tabular Data ◽

Data Set ◽

Advantages And Disadvantages ◽

Database Structure ◽

Selection Of

Sorting is widely used in massive-data applications such as insurance systems, education, health, and business. Sorting arranges the data in the desired order and thereby gives quick access to the required data. Typically, sorted data are organized in strings, as file elements, or in tables. The most common case is when tabular data is processed in the main memory of the computer. The paper presents the algorithms currently used for sorting objects held in static and dynamic data structures. It then selects the data set on which particular algorithms are applied and examines the advantages and disadvantages of each algorithm. Finally, it determines the efficiency of each sorting algorithm and considers what is decisive when selecting an appropriate sorting algorithm.

Data Representation in Big data via succinct data structures

GBAMS- Vidushi ◽

10.26829/vidushi.v9i02.12288 ◽

2017 ◽

Vol 9 (02) ◽

Author(s):

Vinesh Kumar ◽

Jayant Shekhar ◽

Sunil Kumar

Keyword(s):

Big Data ◽

Data Structures ◽

Time Complexity ◽

Data Representation ◽

Complex Problem ◽

Main Memory ◽

Succinct Data Structures ◽

Efficient Data ◽

Tree Data ◽

Day By Day

Data representation in memory is one of the tasks in big data. Data representation includes several types of tree data structures through which the system can access data accurately and efficiently. Succinct data structures can play an important role in data representation when big data is processed in main memory. Data representation is a very complex problem in big data. We propose some solutions to problems of data representation in big data. Data processing in big data can be used to make decisions in data mining. We know the functions and rules for query processing. We have to either change the method of processing or change the way of representation. In this paper, different kinds of tree data structures are presented for data representation in the main memory of a computer system for big data, using succinct data structures. We first compare all the data structures in a table. Each method has different space and time complexity. Big data information services are increasing day by day, so succinct data structures, with their low space complexity, are becoming very popular in practice.
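
Succinct tree representations are commonly built on a bit vector that supports `rank` queries, so a small sketch of that primitive may help. It is an illustration under stated assumptions, not a method from the paper: the class name `RankBitVector`, the block size of 8, and the use of a Python list (which is not itself space-efficient) are all choices made for readability.

```python
class RankBitVector:
    BLOCK = 8  # bits per block; real implementations tune this to the word size

    def __init__(self, bits):
        self.bits = list(bits)
        # block_ranks[b] = number of 1 bits before block b (a small auxiliary index)
        self.block_ranks = [0]
        for i in range(0, len(self.bits), self.BLOCK):
            ones = sum(self.bits[i:i + self.BLOCK])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Number of 1 bits in bits[0:i], for 0 <= i <= len(bits)."""
        block, offset = divmod(i, self.BLOCK)
        start = block * self.BLOCK
        # Precomputed count up to the block, plus a short scan inside the block.
        return self.block_ranks[block] + sum(self.bits[start:start + offset])


bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
```

Each query reads one precomputed counter and scans at most one block, so the cost is bounded by the block size rather than by the length of the bit vector.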

Structural Equations Modeling With Nonhierarchically Dependent Data Structures

PsycEXTRA Dataset ◽

10.1037/e658982011-001 ◽

2011 ◽

Author(s):

Paras D. Mehta ◽

Steven M. Boker ◽

Michael C. Neale

Keyword(s):

Data Structures ◽

Structural Equations ◽

Dependent Data ◽

Structural Equations Modeling
