Faster sequence alignment through GPU-accelerated restriction of the seed-and-extend search space

Motivation: In computing pairwise alignments of biological sequences, software implementations employ a variety of heuristics that decrease the computational effort involved in computing potential alignments. A key element in achieving high processing throughput is to identify and prioritize potential alignments where high-scoring mappings can be expected. These tasks involve list-processing operations that can be efficiently performed on GPU hardware. Results: We implemented a read aligner called A21 that exploits GPU-based parallel sort and reduction techniques to restrict the number of locations where potential alignments may be found. When compared with other high-throughput aligners, this approach finds more high-scoring mappings without sacrificing speed or accuracy. A21 running on a single GPU is about 10 times faster than comparable CPU-based tools; it is also faster and more sensitive in comparison with other recent GPU-based aligners.

Download Full-text

High throughput TCR sequence alignment using multi-GPU with inter-task parallelization

2012 IEEE-EMBS Conference on Biomedical Engineering and Sciences ◽

10.1109/iecbes.2012.6498184 ◽

2012 ◽

Cited By ~ 2

Author(s):

Guoli Ji ◽

Qiang Li ◽

Mingcheng Wu ◽

Jingyi Fu ◽

Xiaorong Hu ◽

...

Keyword(s):

Sequence Alignment ◽

High Throughput

Download Full-text

Searching novel complex solid solution electrocatalysts in unconventional element combinations

Nano Research ◽

10.1007/s12274-021-3637-z ◽

2021 ◽

Author(s):

Olga A. Krysiak ◽

Simon Schumacher ◽

Alan Savan ◽

Wolfgang Schuhmann ◽

Alfred Ludwig ◽

...

Keyword(s):

High Throughput ◽

High Throughput Screening ◽

Active Sites ◽

Search Space ◽

Reduction Reaction ◽

Film Material ◽

X Ray ◽

High Entropy ◽

Catalyst Composition ◽

Scanning Transmission

AbstractDespite outstanding accomplishments in catalyst discovery, finding new, more efficient, environmentally neutral, and noble metal-free catalysts remains challenging and unsolved. Recently, complex solid solutions consisting of at least five different elements and often named as high-entropy alloys have emerged as a new class of electrocatalysts for a variety of reactions. The multicomponent combinations of elements facilitate tuning of active sites and catalytic properties. Predicting optimal catalyst composition remains difficult, making testing of a very high number of them indispensable. We present the high-throughput screening of the electrochemical activity of thin film material libraries prepared by combinatorial co-sputtering of metals which are commonly used in catalysis (Pd, Cu, Ni) combined with metals which are not commonly used in catalysis (Ti, Hf, Zr). Introducing unusual elements in the search space allows discovery of catalytic activity for hitherto unknown compositions. Material libraries with very similar composition spreads can show different activities vs. composition trends for different reactions. In order to address the inherent challenge of the huge combinatorial material space and the inability to predict active electrocatalyst compositions, we developed a high-throughput process based on co-sputtered material libraries, and performed high-throughput characterization using energy dispersive X-ray spectroscopy (EDS), scanning transmission electron microscopy (SEM), X-ray diffraction (XRD) and conductivity measurements followed by electrochemical screening by means of a scanning droplet cell. The results show surprising material compositions with increased activity for the oxygen reduction reaction and the hydrogen evolution reaction. Such data are important input data for future data-driven materials prediction.

Download Full-text

Arioc: high-throughput read alignment with GPU-accelerated exploration of the seed-and-extend search space

PeerJ ◽

10.7717/peerj.808 ◽

2015 ◽

Vol 3 ◽

pp. e808 ◽

Cited By ~ 16

Author(s):

Richard Wilton ◽

Tamas Budavari ◽

Ben Langmead ◽

Sarah J. Wheelan ◽

Steven L. Salzberg ◽

...

Keyword(s):

High Throughput ◽

Search Space ◽

Read Alignment

Download Full-text

SMT-Based Contention-Free Task Mapping and Scheduling on 2D/3D SMART NoC with Mixed Dimension-Order Routing

ACM Transactions on Architecture and Code Optimization ◽

10.1145/3487018 ◽

2022 ◽

Vol 19 (1) ◽

pp. 1-21

Author(s):

Daeyeal Lee ◽

Bill Lin ◽

Chung-Kuan Cheng

Keyword(s):

System Performance ◽

Search Space ◽

Satisfiability Modulo Theories ◽

Low Latency ◽

Task Mapping ◽

Single Cycle ◽

Space Reduction ◽

Reduction Techniques ◽

2D And 3D ◽

Mixed Dimension

SMART NoCs achieve ultra-low latency by enabling single-cycle multiple-hop transmission via bypass channels. However, contention along bypass channels can seriously degrade the performance of SMART NoCs by breaking the bypass paths. Therefore, contention-free task mapping and scheduling are essential for optimal system performance. In this article, we propose an SMT (Satisfiability Modulo Theories)-based framework to find optimal contention-free task mappings with minimum application schedule lengths on 2D/3D SMART NoCs with mixed dimension-order routing. On top of SMT’s fast reasoning capability for conditional constraints, we develop efficient search-space reduction techniques to achieve practical scalability. Experiments demonstrate that our SMT framework achieves 10× higher scalability than ILP (Integer Linear Programming) with 931.1× (ranges from 2.2× to 1532.1×) and 1237.1× (ranges from 4× to 4373.8×) faster average runtimes for finding optimum solutions on 2D and 3D SMART NoCs and our 2D and 3D extensions of the SMT framework with mixed dimension-order routing also maintain the improved scalability with the extended and diversified routing paths, resulting in reduced application schedule lengths throughout various application benchmarks.

Download Full-text

Improving rule-based classification using Harmony Search

PeerJ Computer Science ◽

10.7717/peerj-cs.188 ◽

2019 ◽

Vol 5 ◽

pp. e188 ◽

Cited By ~ 2

Author(s):

Hesam Hasanpour ◽

Ramak Ghavamizadeh Meibodi ◽

Keivan Navi

Keyword(s):

Harmony Search ◽

Search Space ◽

Classification Model ◽

Classification Algorithms ◽

Apriori Algorithm ◽

Rule Mining ◽

Rule Based ◽

Rule Based Classifier ◽

High Processing ◽

Fundamental Limitation

Classification and associative rule mining are two substantial areas in data mining. Some scientists attempt to integrate these two field called rule-based classifiers. Rule-based classifiers can play a very important role in applications such as fraud detection, medical diagnosis, etc. Numerous previous studies have shown that this type of classifier achieves a higher classification accuracy than traditional classification algorithms. However, they still suffer from a fundamental limitation. Many rule-based classifiers used various greedy techniques to prune the redundant rules that lead to missing some important rules. Another challenge that must be considered is related to the enormous set of mined rules that result in high processing overhead. The result of these approaches is that the final selected rules may not be the global best rules. These algorithms are not successful at exploiting search space effectively in order to select the best subset of candidate rules. We merged the Apriori algorithm, Harmony Search, and classification-based association rules (CBA) algorithm in order to build a rule-based classifier. We applied a modified version of the Apriori algorithm with multiple minimum support for extracting useful rules for each class in the dataset. Instead of using a large number of candidate rules, binary Harmony Search was utilized for selecting the best subset of rules that appropriate for building a classification model. We applied the proposed method on a seventeen benchmark dataset and compared its result with traditional association rule classification algorithms. The statistical results show that our proposed method outperformed other rule-based approaches.

Download Full-text

Damage Identification of Multimember Structure using Improved Neural Networks

International Journal of Manufacturing Materials and Mechanical Engineering ◽

10.4018/ijmmme.2013070104 ◽

2013 ◽

Vol 3 (3) ◽

pp. 57-75

Author(s):

M. Rajendra ◽

K. Shankar

Keyword(s):

Damage Identification ◽

Latin Hypercube Sampling ◽

Search Space ◽

Computational Effort ◽

Frequency Change ◽

The Novel ◽

Damage Prediction ◽

Rbf Network ◽

Accuracy And Precision ◽

Two Stages

A novel two stage Improved Radial Basis Function (IRBF) neural network for the damage identification of a multimember structure in the frequency domain is presented. The improvement of the proposed IRBF network is carried out in two stages. Conventional RBF network is used in the first stage for preliminary damage prediction and in the second stage reduced search space moving technique is used to minimize the prediction error. The network is trained with fractional frequency change ratios (FFCs) and damage signature indices (DSIs) as effective input patterns and the corresponding damage severity values as output patterns. The patterns are searched at different damage levels by Latin hypercube sampling (LHS) technique. The performance of the novel IRBF method is compared with the conventional RBF and Genetic algorithm (GA) methods and it is found to be a good multiple member damage identification strategy in terms of accuracy and precision with less computational effort.

Download Full-text

Human body-fluid proteome: quantitative profiling and computational prediction

Briefings in Bioinformatics ◽

10.1093/bib/bbz160 ◽

2020 ◽

Author(s):

Lan Huang ◽

Dan Shao ◽

Yan Wang ◽

Xueteng Cui ◽

Yufei Li ◽

...

Keyword(s):

High Throughput ◽

Human Body ◽

Body Fluid ◽

Biomarker Discovery ◽

Interaction Network ◽

Computational Prediction ◽

Computational Effort ◽

The Body ◽

Support Vector ◽

Learning Approaches

Abstract Empowered by the advancement of high-throughput bio technologies, recent research on body-fluid proteomes has led to the discoveries of numerous novel disease biomarkers and therapeutic drugs. In the meantime, a tremendous progress in disclosing the body-fluid proteomes was made, resulting in a collection of over 15 000 different proteins detected in major human body fluids. However, common challenges remain with current proteomics technologies about how to effectively handle the large variety of protein modifications in those fluids. To this end, computational effort utilizing statistical and machine-learning approaches has shown early successes in identifying biomarker proteins in specific human diseases. In this article, we first summarized the experimental progresses using a combination of conventional and high-throughput technologies, along with the major discoveries, and focused on current research status of 16 types of body-fluid proteins. Next, the emerging computational work on protein prediction based on support vector machine, ranking algorithm, and protein–protein interaction network were also surveyed, followed by algorithm and application discussion. At last, we discuss additional critical concerns about these topics and close the review by providing future perspectives especially toward the realization of clinical disease biomarker discovery.

Download Full-text

Two-Level Parallel Alignment Based on Sequence Parallel Vectorization

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.490-491.757 ◽

2014 ◽

Vol 490-491 ◽

pp. 757-762

Author(s):

Guo Li Ji ◽

Long Teng Chen ◽

Liang Liang Chen

Keyword(s):

T Cell ◽

T Cell Receptor ◽

Sequence Alignment ◽

High Throughput ◽

Sequence Data ◽

Cell Receptor ◽

Alignment Accuracy ◽

Gpu Acceleration ◽

Alignment Algorithm ◽

Alignment Result

This paper proposed a way of two-level parallel alignment based on sequence parallel vectorization with GPU acceleration on the Fermi architecture, which integrates sequence parallel vectorization, parallel k-means clustering approximate alignment and parallel Smith-Waterman algorithm. The method converts sequence alignment into vector alignment by first. Then it uses k-means alignment to divide sequences into several groups and reduce the size of sequence data. The expected accurate alignment result is achieved using parallel Smith-Waterman algorithm. The high-throughput mouse T-cell receptor (TCR) sequences were used to validate the proposed method. Under the same hardware condition, comparing to serial Smith-Waterman algorithm and CUDASW++2.0 algorithm, our method is the most efficient alignment algorithm with high alignment accuracy.

Download Full-text

Model order reduction techniques applied to magnetodynamic T-Ω-formulation

COMPEL The International Journal for Computation and Mathematics in Electrical and Electronic Engineering ◽

10.1108/compel-01-2020-0025 ◽

2020 ◽

Vol 39 (5) ◽

pp. 1057-1069

Author(s):

Fabian Müller ◽

Lucas Crampen ◽

Thomas Henneron ◽

Stephane Clénet ◽

Kay Hameyer

Keyword(s):

Eddy Current ◽

Model Order Reduction ◽

Scalar Potential ◽

Order Reduction ◽

Computational Effort ◽

Transient Field ◽

Model Order ◽

Field Problem ◽

Content Type ◽

Reduction Techniques

Purpose The purpose of this paper is to use different model order reduction techniques to cope with the computational effort of solving large systems of equations. By appropriate decomposition of the electromagnetic field problem, the number of degrees of freedom (DOF) can be efficiently reduced. In this contribution, the Proper Generalized Decomposition (PGD) and the Proper Orthogonal Decomposition (POD) are used in the frame of the T-Ω-formulation, and the feasibility is elaborated. Design/methodology/approach The POD and the PGD are two methods to reduce the model order. Particularly in the context of eddy current problems, conventional time-stepping algorithms can lead to many numerical simulations of the studied problem. To simulate the transient field, the T-Ω-formulation is used which couples the magnetic scalar potential and the electric vector potential. In this paper, both methods are studied on an academic example of an induction furnace in terms of accuracy and computational effort. Findings Using the proposed reduction techniques significantly reduces the DOF and subsequently the computational effort. Further, the feasibility of the combination of both methods with the T-Ω-formulation is given, and a fundamental step toward fast simulation of eddy current problems is shown. Originality/value In this paper, the PGD is combined for the first time with the T-Ω-formulation. The application of the PGD and POD and the following comparison illustrate the great potential of these techniques in combination with the T-Ω-formulation in context of eddy current problems.

Download Full-text

PLANiTS: a curated sequence reference dataset for plant ITS DNA metabarcoding

Database ◽

10.1093/database/baz155 ◽

2020 ◽

Vol 2020 ◽

Cited By ~ 6

Author(s):

Elisa Banchi ◽

Claudio G Ametrano ◽

Samuele Greco ◽

David Stanković ◽

Lucia Muggia ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Its Region ◽

Computational Effort ◽

Its Sequences ◽

Reference Dataset ◽

Bioinformatic Pipeline ◽

Taxonomic Level ◽

Dna Metabarcoding ◽

Reference Databases

Abstract DNA metabarcoding combines DNA barcoding with high-throughput sequencing to identify different taxa within environmental communities. The ITS has already been proposed and widely used as universal barcode marker for plants, but a comprehensive, updated and accurate reference dataset of plant ITS sequences has not been available so far. Here, we constructed reference datasets of Viridiplantae ITS1, ITS2 and entire ITS sequences including both Chlorophyta and Streptophyta. The sequences were retrieved from NCBI, and the ITS region was extracted. The sequences underwent identity check to remove misidentified records and were clustered at 99% identity to reduce redundancy and computational effort. For this step, we developed a script called ‘better clustering for QIIME’ (bc4q) to ensure that the representative sequences are chosen according to the composition of the cluster at a different taxonomic level. The three datasets obtained with the bc4q script are PLANiTS1 (100 224 sequences), PLANiTS2 (96 771 sequences) and PLANiTS (97 550 sequences), and all are pre-formatted for QIIME, being this the most used bioinformatic pipeline for metabarcoding analysis. Being curated and updated reference databases, PLANiTS1, PLANiTS2 and PLANiTS are proposed as a reliable, pivotal first step for a general standardization of plant DNA metabarcoding studies. The bc4q script is presented as a new tool useful in each research dealing with sequences clustering. Database URL: https://github.com/apallavicini/bc4q; https://github.com/apallavicini/PLANiTS.

Download Full-text