GPrimer: a fast GPU-based pipeline for primer design for qPCR experiments

Abstract. Background: Designing valid, high-quality primers is essential for qPCR experiments. MRPrimer is a powerful MapReduce-based pipeline that combines primer design for target sequences with homology tests on off-target sequences. It takes an entire sequence DB as input and returns all feasible and valid primer pairs in the DB. Because primers designed by MRPrimer are effective in qPCR analysis, it has been widely used to develop online design tools and to build primer databases. However, MRPrimer is too slow to handle sequence DBs whose sizes grow exponentially, so its speed must be improved. Results: We develop a fast GPU-based pipeline for primer design (GPrimer) that takes the same input and returns the same output as MRPrimer. MRPrimer consists of seven MapReduce steps, two of which are very time-consuming. GPrimer significantly speeds up those two steps by exploiting the computational power of GPUs. In particular, it designs data structures for coalesced memory access on the GPU and for workload balancing among GPU threads, and it copies the data structures between main memory and GPU memory in a streaming fashion. For the human RefSeq DB, GPrimer achieves a 57-fold speedup over the entire pipeline and a 557-fold speedup for the most time-consuming step using a single machine with 4 GPUs, compared with MRPrimer running on a cluster of six machines. Conclusions: We propose a GPU-based pipeline for primer design that takes an entire sequence DB as input and returns all feasible and valid primer pairs in the DB at once, without an additional step using BLAST-like tools. The software is available at https://github.com/qhtjrmin/GPrimer.git.


Robust Performance of Main Memory Data Structures by Configuration

Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data ◽

10.1145/3318464.3389725 ◽

2020 ◽

Cited By ~ 1

Author(s):

Tiemo Bang ◽

Ismail Oukid ◽

Norman May ◽

Ilia Petrov ◽

Carsten Binnig

Keyword(s):

Data Structures ◽

Main Memory ◽

Robust Performance


Compact Data Structures to Represent and Query Data Warehouses into Main Memory

IEEE Latin America Transactions ◽

10.1109/tla.2018.8789552 ◽

2018 ◽

Vol 16 (9) ◽

pp. 2328-2335 ◽

Cited By ~ 2

Author(s):

Cristian Vallejos ◽

Monica Caniupan ◽

Gilberto Gutierrez

Keyword(s):

Data Structures ◽

Main Memory ◽

Data Warehouses ◽

Compact Data Structures


Metamodeling of NURBS Surfaces for Visual Representation of Hydrodynamic Solutions

10.5957/fast-2015-003 ◽

2015 ◽

Author(s):

Brian J. Cuneo ◽

Michael J. Sypniewski ◽

Zensho Heshiki ◽

Benjamin J. Rosenthal

Keyword(s):

Design Tool ◽

Decision Makers ◽

Optimal Designs ◽

Background Information ◽

Distributed Data ◽

Design Tools ◽

Total Resistance ◽

3 Dimensional ◽

Computational Speed ◽

Nurbs Surfaces

Throughout the early stages of ship design, numerous design variable values must be considered to gather information in an effort to steer decision makers towards optimal designs. The computational speed of design tools often leads engineers to make tradeoffs between solution fidelity and the time it takes to analyze multiple designs with the same tool. Metamodeling allows for instantaneous estimation of higher-fidelity solutions for tools with a single value as an output, such as total resistance. However, these methods are less useful when the solution of a design tool is visual in nature, such as the distribution of pressures on a hull. This paper outlines a new method of metamodeling applied to distributed data. While current metamodeling methods are useful in automated design environments, such as optimization algorithms, the method described in this paper brings the designer back into the design process by providing immediate estimates of design tool outputs that are best interpreted visually. The method works by using Non-Uniform Rational B-Spline (NURBS) curves and surfaces to represent 2- and 3-dimensional outputs of design tools and applying conventional metamodeling methods to the NURBS coefficients. The remainder of the paper is organized as follows: first, background information is presented; then the NURBS metamodeling method is explained in detail; next, two example applications are demonstrated; finally, some conclusions are drawn.


Storing Set Families More Compactly with Top ZDDs

Algorithms ◽

10.3390/a14060172 ◽

2021 ◽

Vol 14 (6) ◽

pp. 172

Author(s):

Kotaro Matsuda ◽

Shuhei Denzumi ◽

Kunihiko Sadakane

Keyword(s):

Data Structures ◽

Binary Decision Diagrams ◽

Real Data ◽

Directed Acyclic Graphs ◽

Main Memory ◽

Large Set ◽

Decision Diagrams ◽

Binary Decision ◽

Acyclic Graphs

Zero-suppressed Binary Decision Diagrams (ZDDs) are data structures for representing set families in a compressed form. With ZDDs, many valuable operations on set families can be done in time polynomial in the ZDD size. In some cases, however, the ZDDs representing large set families become too large to store in main memory. This paper proposes the top ZDD, a novel representation of ZDDs that uses less space than existing ones. The top ZDD extends the top tree, which compresses trees, to compress directed acyclic graphs by sharing identical subgraphs. We prove that navigational operations on ZDDs can be done in time poly-logarithmic in the ZDD size, and show that there exist set families for which the size of the top ZDD is exponentially smaller than that of the ZDD. We also show experimentally that our top ZDDs have smaller sizes than ZDDs for real data.
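
To make the idea of a plain ZDD concrete (not the top ZDD proposed in the paper), the following is a minimal Python sketch. The class name `ZDD`, its methods, and the example family are assumptions made for illustration. It shows the two standard reduction rules: a node whose 1-edge leads to the empty family is dropped, and structurally identical nodes are shared.

```python
# Minimal ZDD sketch (illustrative only; names are hypothetical).
# BOT is the empty family {}; TOP is the family {{}} holding only the empty set.
BOT, TOP = "BOT", "TOP"


class ZDD:
    def __init__(self):
        # Unique table: maps (var, lo, hi) to the single shared node for it.
        self.unique = {}

    def node(self, var, lo, hi):
        # Zero-suppression rule: if taking `var` leads to no set at all,
        # the node is redundant and is replaced by its 0-child.
        if hi == BOT:
            return lo
        key = (var, lo, hi)
        # Sharing rule: identical sub-diagrams are stored only once.
        return self.unique.setdefault(key, key)

    def build(self, family, variables):
        """Build a ZDD for `family` (a set of frozensets) over ordered `variables`."""
        if not family:
            return BOT
        if not variables:
            return TOP  # only the empty set can remain here
        v, rest = variables[0], variables[1:]
        lo = frozenset(s for s in family if v not in s)
        hi = frozenset(s - {v} for s in family if v in s)
        return self.node(v, self.build(lo, rest), self.build(hi, rest))

    def count(self, n):
        """Number of sets in the family represented by node `n`."""
        if n == BOT:
            return 0
        if n == TOP:
            return 1
        _, lo, hi = n
        return self.count(lo) + self.count(hi)

    def sets(self, n):
        """Enumerate the sets in the family represented by node `n`."""
        if n == BOT:
            return
        if n == TOP:
            yield frozenset()
            return
        var, lo, hi = n
        yield from self.sets(lo)
        for s in self.sets(hi):
            yield s | {var}


family = {frozenset({1, 3}), frozenset({2, 3}), frozenset({3})}
zdd = ZDD()
root = zdd.build(frozenset(family), [1, 2, 3])
```

In this example the sub-diagram for the family {{3}} is reached along several paths but stored once, so the three sets need only three internal nodes.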


CoroBase

Proceedings of the VLDB Endowment ◽

10.14778/3430915.3430932 ◽

2020 ◽

Vol 14 (3) ◽

pp. 431-444

Author(s):

Yongjun He ◽

Jiacheng Lu ◽

Tianzheng Wang

Keyword(s):

Data Structures ◽

Main Memory ◽

Data Prefetching ◽

Backward Compatibility ◽

Transaction Models ◽

Main Memory Database ◽

Hide Data ◽

Rich Data ◽

Software Prefetching ◽

Database Engine

Data stalls are a major overhead in main-memory database engines due to the use of pointer-rich data structures. Lightweight coroutines ease the implementation of software prefetching to hide data stalls by overlapping computation and asynchronous data prefetching. Prior solutions, however, mainly focused on (1) individual components and operations and (2) intra-transaction batching that requires interface changes, breaking backward compatibility. It was not clear how they apply to a full database engine and how much end-to-end benefit they bring under various workloads. This paper presents CoroBase, a main-memory database engine that tackles these challenges with a new coroutine-to-transaction paradigm. Coroutine-to-transaction models transactions as coroutines and thus enables inter-transaction batching, avoiding application changes but retaining the benefits of prefetching. We show that on a 48-core server, CoroBase can perform close to 2x better for read-intensive workloads and remain competitive for workloads that inherently do not benefit from software prefetching.
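
The inter-transaction batching described above can be illustrated with a small Python sketch. This is not CoroBase's implementation: Python generators stand in for the lightweight coroutines, the `yield` marks the point where a real engine would issue an asynchronous prefetch, and the names `transaction`, `run_batch`, `store`, and `log` are hypothetical.

```python
from collections import deque


def transaction(txn_id, keys, store, log):
    """A transaction modeled as a coroutine: it suspends before every read."""
    total = 0
    for key in keys:
        log.append((txn_id, "prefetch", key))
        yield  # suspend here; a real engine would have a prefetch in flight
        total += store[key]
        log.append((txn_id, "read", key))
    return total


def run_batch(txns):
    """Round-robin scheduler: while one transaction waits, the others run."""
    results = {}
    ready = deque(txns.items())
    while ready:
        txn_id, coro = ready.popleft()
        try:
            next(coro)                    # run until the next suspension point
            ready.append((txn_id, coro))  # not finished: requeue it
        except StopIteration as done:
            results[txn_id] = done.value  # the transaction's return value
    return results


store = {"a": 1, "b": 2, "c": 3, "d": 4}
log = []
results = run_batch({
    "T1": transaction("T1", ["a", "b"], store, log),
    "T2": transaction("T2", ["c", "d"], store, log),
})
```

The transaction body contains no batching logic of its own, which mirrors the abstract's point that batching across transactions avoids application changes. The `log` list records how the reads of the two transactions interleave.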


An interactive SQL relational interface for querying main-memory data structures

Computing ◽

10.1007/s00607-015-0452-y ◽

2015 ◽

Vol 97 (12) ◽

pp. 1141-1164 ◽

Cited By ~ 1

Author(s):

Marios Fragkoulis ◽

Diomidis Spinellis ◽

Panos Louridas

Keyword(s):

Data Structures ◽

Main Memory


Optimization of T-Tree Index of Main Memory Database in Critical Application

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.40-41.206 ◽

2010 ◽

Vol 40-41 ◽

pp. 206-211

Author(s):

Zhi Lin Zhu

Keyword(s):

Data Structures ◽

High Performance ◽

Main Memory ◽

Ongoing Study ◽

Index Structure ◽

Index Structures ◽

Main Memory Database ◽

Data Structures And Algorithms ◽

Tree Index ◽

Overall Performance

One approach to achieving high performance in a DBMS for critical applications is to store the database in main memory rather than on disk. One can then design new data structures and algorithms oriented towards increasing the efficiency of the main memory database (MMDB). In this paper we present some results on index structures from an ongoing study of MMDBs. We propose a new index structure, the T-tail Tree. We give the main algorithms of the T-tail Tree and report their performance. Our results indicate that the T-tail Tree provides good overall performance in main memory.


MetaFunPrimer: primer design for targeting genes observed in metagenomes

10.1101/2020.07.01.183509 ◽

2020 ◽

Author(s):

Jia Liu ◽

Paul Villanueva ◽

Jinlyung Choi ◽

Santosh Gunturu ◽

Yang Ouyang ◽

...

Keyword(s):

High Throughput ◽

Primer Design ◽

Functional Genes ◽

Ammonia Oxidizing Bacteria ◽

Design Tools ◽

Ammonia Monooxygenase ◽

Bioinformatic Pipeline ◽

Subunit A ◽

Gene Characterization ◽

Qpcr Primer

Abstract: High throughput primer design is needed to simultaneously design primers for multiple genes of interest, such as a group of functional genes. We have developed MetaFunPrimer, a bioinformatic pipeline to design primer targets for genes of interest, with a prioritization based on ranking the presence of gene targets in references, such as metagenomes. MetaFunPrimer takes as input protein and nucleotide sequences for gene targets of interest, accompanied by a set of reference metagenomes or genomes for determining genes of interest. Its output is a set of primers that may be used to amplify genes of interest. To demonstrate the usage and benefits of MetaFunPrimer, a total of 78 HT-qPCR primer pairs were designed to target observed ammonia monooxygenase subunit A (amoA) genes of ammonia-oxidizing bacteria (AOB) in 1,550 soil metagenomes. We demonstrate that these primers can significantly improve targeting of amoA-AOB genes in soil metagenomes compared to previously published primers. Importance: Amplification-based gene characterization allows for sensitive and specific quantification of functional genes. Often, there is a large diversity of genes represented for a function of interest, and multiple primers may be necessary to target associated genes. Current primer design tools are limited to designing primers for only a few genes of interest. MetaFunPrimer allows for high throughput primer design for functional genes of interest and also allows for ranking gene targets by their presence and abundance in environmental datasets. This tool enables high throughput qPCR approaches for characterizing functional genes.


COMPRESSION OF HIGH-RESOLUTION VOXEL PHANTOMS BY MEANS OF B+ TREE

International Journal of Research -GRANTHAALAYAH ◽

10.29121/granthaalayah.v7.i12.2019.312 ◽

2020 ◽

Vol 7 (12) ◽

pp. 199-208

Author(s):

A. Kavinilavu ◽

S. Neelavathy Pari

Keyword(s):

Data Structures ◽

Hierarchical Structures ◽

Main Memory ◽

Memory Access ◽

Scientific Models ◽

Access Time ◽

Structural Representation ◽

Access To Data ◽

Tree Leaf ◽

Voxel Phantoms

Data structures are chosen to save space and to grant fast access to data by its key for a particular structural representation. The data structures surveyed are linear lists, hierarchical structures, and graph structures. The B+ tree is an extension of the B-tree data structure that allows efficient insertion, deletion, and search operations. It is used to store large amounts of data that cannot be held in main memory. B+ tree leaf nodes are connected in a singly linked list to make search queries more efficient. The drawback of binary tree geometry is that the decrease in memory use comes at the expense of more frequent memory access, which might slow down simulations in which frequent memory access constitutes a significant part of the execution time. The work addresses the processing and compression of voxel phantoms without loss of quality. Voxels are often used in the visualization and analysis of medical and scientific information. Voxel phantoms, which comprise a set of small volume components, appeared towards the end of the 1980s and improved on the first scientific models of the human body. These phantoms are an extremely exact representation. The aim is to fetch records in an equal number of disk accesses and to reduce the access time by reducing the height of the tree and increasing the number of branches per node.
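
The role of the linked leaf nodes can be sketched in Python as follows. This is a simplified illustration, not the paper's method: it uses a single index level (the first key of each leaf) instead of a full multi-level B+ tree, and the names `Leaf`, `build_leaves`, and `range_scan` are hypothetical.

```python
from bisect import bisect_right


class Leaf:
    """A B+ tree leaf: sorted keys, their values, and a link to the next leaf."""

    def __init__(self, keys, values):
        self.keys = keys
        self.values = values
        self.next = None


def build_leaves(items, capacity):
    """Pack sorted (key, value) pairs into leaves chained as a singly linked list."""
    items = sorted(items)
    leaves = []
    for i in range(0, len(items), capacity):
        chunk = items[i:i + capacity]
        leaf = Leaf([k for k, _ in chunk], [v for _, v in chunk])
        if leaves:
            leaves[-1].next = leaf
        leaves.append(leaf)
    return leaves


def range_scan(leaves, lo, hi):
    """Return all (key, value) pairs with lo <= key <= hi."""
    if not leaves:
        return []
    # One-level index: the first key of each leaf acts as a separator.
    separators = [leaf.keys[0] for leaf in leaves]
    start = max(bisect_right(separators, lo) - 1, 0)
    out = []
    leaf = leaves[start]
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return out
            if k >= lo:
                out.append((k, v))
        leaf = leaf.next  # follow the leaf chain instead of re-descending
    return out


leaves = build_leaves([(k, k * k) for k in range(1, 11)], capacity=3)
scan = range_scan(leaves, 5, 8)
```

After the starting leaf is located, the scan only follows `next` pointers, which is why chaining the leaves makes range queries cheaper than repeated searches from the root.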


Optimal Solutions to the Sort Algorithms of Database Structure

European Journal of Education ◽

10.26417/ejed.v3i1.p40-50 ◽

2020 ◽

Vol 3 (1) ◽

pp. 40

Author(s):

Rifat Osmanaj ◽

Hysen Binjaku

Keyword(s):

Data Structures ◽

Main Memory ◽

Sorting Algorithm ◽

Massive Data ◽

Optimal Solutions ◽

Tabular Data ◽

Data Set ◽

Advantages And Disadvantages ◽

Database Structure ◽

Selection Of

Sorting is widely used in massive data applications, insurance systems, education, health, business, etc. The sorting operation orders the data as desired, which gives quick access to the required data. Typically, sorted data are organized in strings, as file elements, or in tables. The most common case is when tabular data is processed in the main memory of the computer. The paper presents the algorithms currently used for sorting objects in static and dynamic data structures. It then selects the data sets to which particular algorithms are applied and examines the advantages and disadvantages of each algorithm. Finally, it determines the efficiency of the sorting algorithms and considers what is decisive when selecting an appropriate sorting algorithm.
