Compact Data Structures to Represent and Query Data Warehouses into Main Memory

2018 ◽  
Vol 16 (9) ◽  
pp. 2328-2335 ◽  
Author(s):  
Cristian Vallejos ◽  
Monica Caniupan ◽  
Gilberto Gutierrez

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Jeongmin Bae ◽  
Hajin Jeon ◽  
Min-Soo Kim

Abstract. Background: Design of valid, high-quality primers is essential for qPCR experiments. MRPrimer is a powerful MapReduce-based pipeline that combines primer design for target sequences with homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB. Because the primers designed by MRPrimer are effective in qPCR analysis, it has been widely used to develop many online design tools and to build primer databases. However, MRPrimer is too slow to keep up with sequence DBs whose sizes grow exponentially, so its speed must be improved. Results: We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of seven MapReduce steps in total, two of which are very time-consuming. GPrimer significantly speeds up those two steps by exploiting the computational power of GPUs. In particular, it designs data structures that enable coalesced memory access on the GPU and workload balancing among GPU threads, and it copies these data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer achieves a 57-fold speedup over all steps and a 557-fold speedup for the most time-consuming step using a single machine with 4 GPUs, compared with MRPrimer running on a cluster of six machines. Conclusions: We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs existing in the DB at once, without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.
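The abstract mentions workload balancing among GPU threads without giving details. As a generic illustration only (not GPrimer's actual data structures), the Python sketch below uses prefix sums to split variable-cost work items into contiguous ranges of roughly equal total cost, one range per thread. The function name `balance_ranges` and the idea of a per-sequence cost are assumptions made for this example.

```python
from bisect import bisect_left
from itertools import accumulate


def balance_ranges(workloads, num_threads):
    """Split items 0..n-1 into `num_threads` contiguous, half-open ranges
    whose total workloads are roughly equal.

    workloads[i] is an assumed non-negative cost for item i (for example,
    the number of candidate primers taken from one sequence; hypothetical).
    """
    n = len(workloads)
    prefix = list(accumulate(workloads))  # prefix[i] = workloads[0] + ... + workloads[i]
    total = prefix[-1] if prefix else 0
    ranges = []
    start = 0
    for t in range(1, num_threads + 1):
        if t == num_threads:
            end = n  # the last thread takes whatever remains
        else:
            # first index whose cumulative cost reaches t/num_threads of the total
            target = total * t / num_threads
            end = bisect_left(prefix, target) + 1
            end = min(max(end, start), n)  # keep ranges ordered and in bounds
        ranges.append((start, end))
        start = end
    return ranges


if __name__ == "__main__":
    print(balance_ranges([5, 1, 1, 1, 1, 1], 2))
```

Here `balance_ranges([5, 1, 1, 1, 1, 1], 2)` returns `[(0, 1), (1, 6)]`, so each of the two threads receives 5 units of work. Because the ranges are contiguous, each thread still scans a consecutive block of the input, which keeps the partitioning cheap (one binary search per thread over the prefix sums).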


Author(s):  
Zheng Li ◽  
Diego Seco ◽  
Jose Fuentes-Sepulveda

2013 ◽  
Vol 38 (2) ◽  
pp. 131-142 ◽  
Author(s):  
Artur Wojciechowski

Abstract. Data warehouses integrate external data sources (EDSs), which very often change their data structures (schemas). In many cases, such changes cause an already deployed ETL workflow to execute erroneously. Because structural changes of EDSs are frequent, automatically repairing an ETL workflow after such changes is highly important. This paper presents a framework, called E-ETL, for handling the evolution of an ETL layer. When changes in EDSs are detected, the fragment of the ETL workflow that interacts with the changed EDSs is repaired. The proposed framework was developed as a module external to a standard commercial or open-source ETL engine, accessing the engine by means of its API. The innovation of this framework consists in (1) the algorithms for semi-automatic repair of an ETL workflow and (2) its ability to interact with various ETL engines that provide an API.
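To make the detect-then-repair idea concrete, here is a minimal Python sketch under strong simplifying assumptions: a schema is a list of column names, an ETL step is a dict mapping source columns to target columns, and renames are supplied by a user (standing in for the semi-automatic part). The functions `detect_changes` and `repair_mapping` are hypothetical; they do not reproduce the E-ETL algorithms or any ETL engine's API.

```python
def detect_changes(old_schema, new_schema):
    """Compare two versions of a source schema (lists of column names)."""
    old, new = set(old_schema), set(new_schema)
    return {"removed": sorted(old - new), "added": sorted(new - old)}


def repair_mapping(mapping, changes, renames=None):
    """Repair one ETL step's {source_column: target_column} mapping.

    `renames` ({old_name: new_name}) stands in for user input in a
    semi-automatic repair; removed columns without a valid rename are dropped.
    """
    renames = renames or {}
    repaired = {}
    for src, dst in mapping.items():
        if src not in changes["removed"]:
            repaired[src] = dst                # column unaffected: keep as-is
        elif renames.get(src) in changes["added"]:
            repaired[renames[src]] = dst       # renamed column: redirect the mapping
        # otherwise the source column vanished: drop this mapping entry
    return repaired


if __name__ == "__main__":
    changes = detect_changes(["id", "name", "city"], ["id", "full_name", "city", "zip"])
    step = {"id": "customer_id", "name": "customer_name", "city": "city"}
    print(changes)
    print(repair_mapping(step, changes, {"name": "full_name"}))
```

Only the mapping entries that touch changed columns are rewritten, mirroring the abstract's point that repair is limited to the workflow fragment interacting with the changed EDS.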


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 172
Author(s):  
Kotaro Matsuda ◽  
Shuhei Denzumi ◽  
Kunihiko Sadakane

Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many useful operations on set families can be performed in time polynomial in the ZDD size. In some cases, however, ZDDs representing large set families become too large to store in main memory. This paper proposes the top ZDD, a novel representation of ZDDs that uses less space than existing ones. The top ZDD extends the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be performed in time poly-logarithmic in the ZDD size, and show that there exist set families for which the top ZDD is exponentially smaller than the ZDD. We also show experimentally that top ZDDs are smaller than ZDDs on real data.
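For readers unfamiliar with ZDDs, the Python sketch below shows a plain ZDD (not the top ZDD proposed in the paper). The zero-suppression rule drops any node whose 1-edge points to the 0-terminal, and a unique table makes identical subgraphs shared rather than duplicated. The class and method names are illustrative assumptions; variables are integers `0..nvars-1`, ordered from the root downward.

```python
class ZDD:
    """Minimal zero-suppressed decision diagram (plain ZDD, not a top ZDD).

    Node ids 0 and 1 are terminals: 0 is the empty family, 1 is {{}}.
    An internal node (var, lo, hi) denotes lo's sets (without var) plus
    hi's sets with var added.
    """

    def __init__(self):
        self.nodes = {}    # id -> (var, lo, hi)
        self.unique = {}   # (var, lo, hi) -> id; shares identical subgraphs
        self.next_id = 2

    def node(self, var, lo, hi):
        if hi == 0:                    # zero-suppression rule
            return lo
        key = (var, lo, hi)
        if key not in self.unique:     # create each distinct node only once
            self.unique[key] = self.next_id
            self.nodes[self.next_id] = key
            self.next_id += 1
        return self.unique[key]

    def build(self, family, nvars, var=0):
        """Build a ZDD for a family of sets over variables 0..nvars-1."""
        family = {frozenset(s) for s in family}
        if var == nvars:
            return 1 if family else 0
        lo = self.build([s for s in family if var not in s], nvars, var + 1)
        hi = self.build([s - {var} for s in family if var in s], nvars, var + 1)
        return self.node(var, lo, hi)

    def count(self, u, memo=None):
        """Number of sets in the family rooted at node u."""
        if u < 2:
            return u
        memo = {} if memo is None else memo
        if u not in memo:
            _, lo, hi = self.nodes[u]
            memo[u] = self.count(lo, memo) + self.count(hi, memo)
        return memo[u]


if __name__ == "__main__":
    z = ZDD()
    root = z.build([{0, 2}, {1, 2}, {2}], nvars=3)
    print(z.count(root), len(z.nodes))
```

For the family {{0, 2}, {1, 2}, {2}} over three variables, `build` creates 3 internal nodes (the node for variable 2 is shared by both branches) and `count` returns 3. The top ZDD goes further by also sharing repeated larger subgraph patterns, which this sketch does not attempt.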


2020 ◽  
Vol 14 (3) ◽  
pp. 431-444
Author(s):  
Yongjun He ◽  
Jiacheng Lu ◽  
Tianzheng Wang

Data stalls are a major overhead in main-memory database engines due to the use of pointer-rich data structures. Lightweight coroutines ease the implementation of software prefetching to hide data stalls by overlapping computation and asynchronous data prefetching. Prior solutions, however, mainly focused on (1) individual components and operations and (2) intra-transaction batching that requires interface changes, breaking backward compatibility. It was not clear how they apply to a full database engine and how much end-to-end benefit they bring under various workloads. This paper presents CoroBase, a main-memory database engine that tackles these challenges with a new coroutine-to-transaction paradigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes but retaining the benefits of prefetching. We show that on a 48-core server, CoroBase can perform close to 2x better for read-intensive workloads and remain competitive for workloads that inherently do not benefit from software prefetching.
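A rough Python illustration of the coroutine-to-transaction idea: each transaction is written as an ordinary generator that yields where a real engine would issue a prefetch and suspend, and a round-robin scheduler interleaves a batch of such transactions so that one transaction's prefetch overlaps with another's work. Python cannot issue hardware prefetches, so the `yield` only marks the suspension point; `lookup_txn` and `run_batch` are hypothetical names, not CoroBase's API.

```python
from collections import deque


def lookup_txn(txn_id, keys, table, log):
    """Hypothetical read-only transaction written as a coroutine (generator)."""
    total = 0
    for key in keys:
        # A real engine would issue a software prefetch for `key` here...
        log.append(("prefetch", txn_id, key))
        # ...then suspend so other transactions run while the data is fetched.
        yield
        # On resumption the record is (ideally) already in cache.
        log.append(("read", txn_id, key))
        total += table[key]
    return total


def run_batch(txns):
    """Round-robin scheduler interleaving a batch of transaction coroutines.

    Returns {batch index: transaction result}.
    """
    ready = deque(enumerate(txns))
    results = {}
    while ready:
        i, coro = ready.popleft()
        try:
            next(coro)               # run until the next suspension point
            ready.append((i, coro))  # not finished: requeue behind the others
        except StopIteration as stop:
            results[i] = stop.value  # the generator's return value
    return results


if __name__ == "__main__":
    table = {"a": 1, "b": 2, "c": 10}
    log = []
    txns = [lookup_txn(0, ["a", "b"], table, log),
            lookup_txn(1, ["c"], table, log)]
    print(run_batch(txns))
    for event in log:
        print(event)
```

In this run, the log shows transaction 1's prefetch issued before transaction 0's first read, which is the interleaving that inter-transaction batching relies on. Note that the transaction body itself needs no batching-aware interface; only the scheduler changes.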

