GPU Accelerated Adaptive Banded Event Alignment for Rapid Comparative Nanopore Signal Analysis

Abstract: Nanopore sequencing has the potential to revolutionise genomics by realising portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these applications requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. For instance, comparing raw nanopore signals to a biological reference sequence is a computationally complex task despite leveraging a dynamic programming algorithm for Adaptive Banded Event Alignment (ABEA), a commonly used approach to polish sequencing data and identify non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. By optimising memory, compute and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than the original implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c.
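To illustrate why banding reduces the cost of alignment by dynamic programming, the following is a minimal Python sketch of a fixed-band edit distance between two strings. It is a simplification and not the f5c or Nanopolish implementation: ABEA aligns signal events to reference k-mers and moves the band adaptively, whereas this sketch keeps the band centred on the main diagonal. The function name `banded_edit_distance` and the `band` parameter are hypothetical.

```python
from typing import Optional


def banded_edit_distance(a: str, b: str, band: int) -> Optional[int]:
    """Edit distance restricted to cells (i, j) with |i - j| <= band.

    Returns None when the length difference makes the end cell
    unreachable inside the band.
    """
    n, m = len(a), len(b)
    if abs(n - m) > band:
        return None
    inf = float("inf")
    # Row 0: aligning the empty prefix of `a` costs j insertions.
    prev = {j: j for j in range(min(m, band) + 1)}
    for i in range(1, n + 1):
        cur = {}
        # Only columns inside the band are computed for this row.
        for j in range(max(0, i - band), min(m, i + band) + 1):
            best = inf
            if j > 0 and (j - 1) in prev:   # match or substitution
                best = min(best, prev[j - 1] + (a[i - 1] != b[j - 1]))
            if j in prev:                   # deletion from `a`
                best = min(best, prev[j] + 1)
            if j > 0 and (j - 1) in cur:    # insertion into `a`
                best = min(best, cur[j - 1] + 1)
            cur[j] = best
        prev = cur
    return prev[m]
```

Each row touches at most 2 * band + 1 cells, so the work grows with len(a) * band rather than len(a) * len(b). The price is that a band that is too narrow can miss the optimal alignment: for "abcdefgh" against "habcdefg", a band of 0 forces eight substitutions, while a band of 1 finds the two-edit alignment.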

Accelerating Sequence Alignment to Graphs

10.1101/651638 ◽

2019 ◽

Cited By ~ 6

Author(s):

Chirag Jain ◽

Alexander Dilthey ◽

Sanchit Misra ◽

Haowen Zhang ◽

Srinivas Aluru

Keyword(s):

Dna Sequences ◽

Query Sequence ◽

Dynamic Programming Algorithm ◽

Reference Sequence ◽

Programming Algorithm ◽

Peak Performance ◽

Task Parallelism ◽

Sequencing Data ◽

Link Type ◽

String Graph

Abstract: Aligning DNA sequences to an annotated reference is a key step for genotyping in biology. Recent scientific studies have demonstrated improved inference by aligning reads to a variation graph, i.e., a reference sequence augmented with known genetic variations. Given a variation graph in the form of a directed acyclic string graph, the sequence to graph alignment problem seeks to find the best matching path in the graph for an input query sequence. Solving this problem exactly using a sequential dynamic programming algorithm takes quadratic time in terms of the graph size and query length, making it difficult to scale to high throughput DNA sequencing data. In this work, we propose the first parallel algorithm for computing sequence to graph alignments that leverages multiple cores and single-instruction multiple-data (SIMD) operations. We take advantage of the available inter-task parallelism, and provide a novel blocked approach to compute the score matrix while ensuring high memory locality. Using a 48-core Intel Xeon Skylake processor, the proposed algorithm achieves peak performance of 317 billion cell updates per second (GCUPS), and demonstrates near linear weak and strong scaling on up to 48 cores. It delivers significant performance gains compared to existing algorithms, and results in run-time reduction from multiple days to three hours for the problem of optimally aligning high coverage long (PacBio/ONT) or short (Illumina) DNA reads to an MHC human variation graph containing 10 million vertices. Availability: The implementation of our algorithm is available at https://github.com/ParBLiSS/PaSGAL. Data sets used for evaluation are accessible using https://alurulab.cc.gatech.edu/PaSGAL.
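The sequential dynamic programming that the abstract refers to can be sketched as follows. This is a minimal, hedged illustration and not the PaSGAL implementation: it assumes one character per vertex, vertices numbered in topological order, unit edit costs, and a path that may start and end at any vertex. The names `align_to_dag`, `labels` and `preds` are hypothetical.

```python
from typing import List


def align_to_dag(labels: str, preds: List[List[int]], query: str) -> int:
    """Minimum edit distance between `query` and any path in a DAG.

    labels[v] is the character on vertex v; preds[v] lists the
    predecessors of v, all of which must have smaller indices.
    """
    m = len(query)
    # Virtual start row: an empty path leaves j query characters unmatched.
    start = list(range(m + 1))
    rows: List[List[int]] = []
    best = start[m]
    for v, label in enumerate(labels):
        # Column-wise best over all predecessors; including the start row
        # lets a path begin at v.
        prev = start[:]
        for u in preds[v]:
            prev = [min(x, y) for x, y in zip(prev, rows[u])]
        row = [prev[0] + 1]  # vertex v left unmatched against an empty prefix
        for j in range(1, m + 1):
            row.append(min(
                prev[j - 1] + (label != query[j - 1]),  # match or substitution
                prev[j] + 1,                            # skip vertex v
                row[j - 1] + 1,                         # skip query[j - 1]
            ))
        rows.append(row)
        best = min(best, row[m])  # a path may end at any vertex
    return best
```

For a small bubble graph A -> {C, G} -> T, encoded as `labels = "ACGT"` and `preds = [[], [0], [0], [1, 2]]`, both "ACT" and "AGT" align with cost 0 because each follows one branch of the bubble. Every vertex and edge is combined with every query position once, which is the quadratic cost the abstract describes.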

Optimal scheduling algorithms of system chip power density based on network on chip

Izvestiya vysshikh uchebnykh zavedenii. Fizika ◽

10.17223/00213411/64/9/120 ◽

2021 ◽

pp. 120-127

Author(s):

Jiashen Li ◽

Yun Pan ◽

Keyword(s):

Dynamic Programming ◽

Power Density ◽

Simulated Annealing Algorithm ◽

Dynamic Programming Algorithm ◽

Network On Chip ◽

Thermal Design ◽

Optimal Scheduling ◽

Programming Algorithm ◽

System Throughput ◽

On Chip

Rising chip integration increases the power density of system chips, which causes them to overheat. When scheduling the power density of a system chip, some working modules are selectively switched off so that not all modules on the chip are on at the same time, which avoids overheating. Taking a 2D grid network-on-chip (NoC) as the research object, an optimal scheduling algorithm for system-on-chip power density based on the NoC is proposed. Under the thermal design power (TDP) and system constraints, and for a given application set on the NoC, a bottom-up dynamic programming algorithm solves for the throughput-optimal allocation of processor count and frequency level to each application. On this basis, a simulated annealing algorithm completes the application mapping, targeting heat dissipation and communication delay, and determines the layout of open and closed processors. After the layout results are obtained, the TDP is adjusted: the maximum TDP constraint is searched iteratively using a feedback loop on the system's overheated spots, and the power density scheduling performance of the system chip is maximized under this constraint, so that chip throughput is maintained while the problem of chip overheating is effectively solved. The experimental results show that the proposed algorithm increases system chip throughput by about 11%, reduces system throughput loss, and achieves a balance between system chip power consumption and scheduling time.
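The abstract does not give the exact formulation of its dynamic program, so the following Python sketch shows only one plausible reading: a bottom-up multiple-choice knapsack in which every application picks exactly one configuration (a processor count and frequency level, summarised here by its power draw and throughput) and the total power must stay within a budget that stands in for the TDP. The function name `allocate`, the integer power units and the example numbers are all hypothetical.

```python
from typing import List, Optional, Tuple

Config = Tuple[int, float]  # (power units, throughput) for one configuration


def allocate(apps: List[List[Config]],
             budget: int) -> Optional[Tuple[float, List[int]]]:
    """Choose one configuration per application to maximise throughput.

    Returns (best throughput, chosen configuration index per application),
    or None when no combination fits in the power budget.
    """
    neg = float("-inf")
    dp = [0.0] * (budget + 1)  # no applications placed yet
    picks: List[List[Optional[int]]] = []
    for configs in apps:
        new = [neg] * (budget + 1)
        pick: List[Optional[int]] = [None] * (budget + 1)
        for p in range(budget + 1):
            for idx, (power, throughput) in enumerate(configs):
                if power <= p and dp[p - power] != neg:
                    cand = dp[p - power] + throughput
                    if cand > new[p]:
                        new[p], pick[p] = cand, idx
        dp = new
        picks.append(pick)
    if dp[budget] == neg:
        return None
    # Walk back through the stored choices to recover the plan.
    plan: List[int] = []
    p = budget
    for configs, pick in zip(reversed(apps), reversed(picks)):
        idx = pick[p]
        plan.append(idx)
        p -= configs[idx][0]
    plan.reverse()
    return dp[budget], plan
```

With two applications offering `[(1, 2.0), (3, 5.0)]` and `[(2, 3.0), (4, 7.0)]` and a budget of 5 power units, the sketch selects the low-power configuration for the first application and the high-power one for the second, for a total throughput of 9.0. The table has one entry per application and power level, so the run time grows with the number of applications, the budget and the number of configurations per application.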

Hardware-Assisted Security Monitoring Unit for Real-Time Ensuring Secure Instruction Execution and Data Processing in Embedded Systems

Micromachines ◽

10.3390/mi12121450 ◽

2021 ◽

Vol 12 (12) ◽

pp. 1450

Author(s):

Xiang Wang ◽

Zhun Zhang ◽

Qiang Hao ◽

Dongdong Xu ◽

Jiqing Wang ◽

...

Keyword(s):

Embedded Systems ◽

Data Processing ◽

Embedded System ◽

High Performance ◽

Program Execution ◽

Security Monitoring ◽

Railway Systems ◽

On Chip ◽

Capability Evaluation ◽

Security Capability

The hardware security of embedded systems is raising more and more concerns in numerous safety-critical applications, such as automotive, aerospace, avionic, and railway systems. Embedded systems are gaining popularity in these safety-sensitive sectors because their high performance, low power, and great reliability make them ideal control platforms for instruction execution and data processing. However, modern embedded systems still expose many potential hardware vulnerabilities to malicious attacks, including software-level and hardware-level attacks; these can cause program execution failure and confidential data leakage. For this reason, this paper presents a novel embedded system that integrates a hardware-assisted security monitoring unit (SMU) to achieve a reinforced system-on-chip (SoC) that ensures the security of program execution and data processing. This architecture design was implemented and evaluated on a Xilinx Virtex-5 FPGA development board. Based on the evaluation of the SMU hardware implementation in terms of performance overhead, security capability, and resource consumption, the experimental results indicate that the SMU does not cause significant speed degradation of the processor while executing different benchmarks, and its average performance overhead falls to 2.18% with typical 8-KB I/D-Caches. Security capability evaluation confirms the monitoring effectiveness of the SMU against both instruction and data tampering attacks. Meanwhile, the SoC achieves a good balance between high security and resource overhead.

Adaptivity in high-performance embedded systems: a reactive control model for reliable and flexible design

The Knowledge Engineering Review ◽

10.1017/s0269888914000150 ◽

2014 ◽

Vol 29 (4) ◽

pp. 433-451

Author(s):

Huafeng Yu ◽

Abdoulaye Gamatié ◽

Éric Rutten ◽

Jean-Luc Dekeyser

Keyword(s):

Embedded Systems ◽

Embedded System ◽

Code Generation ◽

High Performance ◽

Control Model ◽

Multimedia System ◽

Reactive Control ◽

Data Intensive ◽

On Chip ◽

Automatic Code

Abstract: System adaptivity is increasingly demanded in high-performance embedded systems, particularly in multimedia system-on-chip (SoC), owing to growing quality-of-service requirements. This paper presents a reactive control model that has been introduced in Gaspard, our framework dedicated to SoC hardware/software co-design. This model aims at expressing adaptivity as well as reconfigurability in systems performing data-intensive computations. It is generic enough to be used for description in the different parts of an embedded system, for example, specification of how different data-intensive algorithms can be chosen according to some computation modes at the functional level; and expression of how hardware components can be selected via the usage of a library of intellectual properties according to execution performances. The transformation of this model toward synchronous languages is also presented, in order to allow an automatic code generation usable for formal verification, based on techniques such as model checking and controller synthesis, as illustrated in the paper. This work, based on Model-Driven Engineering and the standard UML MARTE profile, has been implemented in Gaspard.

Effective Machine-Learning Assembly For Next-Generation Sequencing With Very Low Coverage

10.1101/393116 ◽

2018 ◽

Author(s):

Louis Ranjard ◽

Thomas K. F. Wong ◽

Allen G. Rodrigo

Keyword(s):

Dynamic Programming Algorithm ◽

Reference Sequence ◽

Programming Algorithm ◽

Original Sequence ◽

Insertions And Deletions ◽

Limiting Error ◽

Grey Kangaroo ◽

Western Grey Kangaroo ◽

Sequence Reconstruction ◽

Low Coverage

Abstract: In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the reads can be difficult because of the occurrence of uncovered gaps. Reference guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping approaches aim at modifying the reference according to the reads dynamically. Such approaches can significantly improve the alignment rate of the reads onto distant references but the processing of insertions and deletions remains challenging. Here, we introduce a dynamic programming algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach to assemble a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that this method produces assemblies of length comparable to the truth while limiting error rate when classic approaches fail to recover the correct length. Our method allows us to assemble the first full mitochondrial genome for the western-grey kangaroo. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences.

MethPanel: a parallel pipeline and interactive analysis tool for multiplex bisulphite PCR sequencing to assess DNA methylation biomarker panels for disease detection

10.1101/2020.02.09.941013 ◽

2020 ◽

Author(s):

Phuc-Loi Luu ◽

Phuc-Thinh Ong ◽

Tran Thai Huu Loc ◽

Dilys Lam ◽

Ruth Pidsley ◽

...

Keyword(s):

Dna Methylation ◽

High Performance ◽

Complete Analysis ◽

Disease Detection ◽

Analysis Tool ◽

Clinical Implementation ◽

Sequencing Data ◽

Fresh Tissue ◽

Interactive Analysis ◽

Parallel Pipeline

Abstract: Background: Multiplex bisulphite PCR sequencing is a convenient and scalable method to comprehensively profile DNA methylation at selected loci. The method is useful for validation of methylation biomarker panels on large clinical cohorts, as it can be applied to DNA isolated from fresh tissue, archival formalin fixed paraffin embedded tissue (FFPET) or circulating cell free DNA in plasma. However, successful clinical implementation of DNA methylation biomarkers for disease detection using multiplex bisulphite PCR sequencing requires user-friendly sample analysis methods and a diversity of visualisation options, which are not met by current tools. Results: We have developed a computational pipeline with an interactive graphical interface, called MethPanel, in order to rapidly analyse multiplex bisulphite PCR sequencing data. MethPanel comprises a complete analysis workflow from genomic alignment to DNA methylation calling and supports an unlimited number of PCR amplicons and input samples. Moreover, MethPanel offers important and unique features, such as calculation of a polymorphism score and bisulphite PCR bias correction capabilities. MethPanel is designed so that the methylation data from all samples can be run in parallel on either a personal computer or a high performance computer. The outputs are also automatically forwarded to a shinyApp for convenient display, visualisation and sharing data with collaborators and clinicians. Importantly the data is centralised in one location, which aids storage management. Availability and Implementation: MethPanel is freely available at https://github.com/thinhong/MethPanel. Conclusion: MethPanel provides a novel parallel pipeline and interactive analysis tool for multiplex bisulphite PCR sequencing to assess DNA methylation marker panels for disease detection.

Implementation of AMBA Based AHB2APB Bridge

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.e6908.038620 ◽

2020 ◽

Vol 8 (6) ◽

pp. 1033-1037

Keyword(s):

Embedded System ◽

High Speed ◽

High Performance ◽

Data Loss ◽

Verilog Hdl ◽

Main Target ◽

Functional Blocks ◽

Timing Simulation ◽

On Chip ◽

Bus Architecture

The Advanced Microcontroller Bus Architecture (AMBA) bus protocol is used to build high-performance system-on-chip (SoC) designs. It achieves communication by connecting different functional blocks (or IP). By using multiple controllers and peripherals, it makes it possible to develop multiprocessor units. It provides reusability of IP across the different AMBA buses, which can reduce the communication gap between high-performance buses and low-speed buses. Performing high-speed pipelined data transfers with the different bus signals supported by AMBA makes an AMBA-based embedded system analytically demanding. The main target of this work is to synthesize and simulate the bridge that connects the Advanced High-performance Bus (AHB) and the Advanced Peripheral Bus (APB), known as the AHB2APB bridge, with no data loss during transfer. The bridge module is designed in Verilog HDL, and its functional and timing simulations are done on a Xilinx platform.

Inkjet-printed point-of-care immunoassay on a nanoscale polymer brush enables subpicomolar detection of analytes in blood

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1703200114 ◽

2017 ◽

Vol 114 (34) ◽

pp. E7054-E7062 ◽

Cited By ~ 41

Author(s):

Daniel Y. Joh ◽

Angus M. Hucknall ◽

Qingshan Wei ◽

Kelly A. Mason ◽

Margaret L. Lund ◽

...

Keyword(s):

Polymer Brushes ◽

High Performance ◽

Point Of Care ◽

Polymer Brush ◽

Quantitative Detection ◽

Figures Of Merit ◽

Point Of Care Test ◽

Resource Limited ◽

Multiple Analytes ◽

On Chip

The ELISA is the mainstay for sensitive and quantitative detection of protein analytes. Despite its utility, ELISA is time-consuming, resource-intensive, and infrastructure-dependent, limiting its availability in resource-limited regions. Here, we describe a self-contained immunoassay platform (the “D4 assay”) that converts the sandwich immunoassay into a point-of-care test (POCT). The D4 assay is fabricated by inkjet printing assay reagents as microarrays on nanoscale polymer brushes on glass chips, so that all reagents are “on-chip,” and these chips show durable storage stability without cold storage. The D4 assay can interrogate multiple analytes from a drop of blood, is compatible with a smartphone detector, and displays analytical figures of merit that are comparable to standard laboratory-based ELISA in whole blood. These attributes of the D4 POCT have the potential to democratize access to high-performance immunoassays in resource-limited settings without sacrificing their performance.

Multiple-Choice Hardware/Software Partitioning for Tree Task-Graph on MPSoC

The Computer Journal ◽

10.1093/comjnl/bxy140 ◽

2019 ◽

Vol 63 (5) ◽

pp. 688-700 ◽

Cited By ~ 1

Author(s):

Wenjun Shi ◽

Jigang Wu ◽

Guiyuan Jiang ◽

Siew-kei Lam

Keyword(s):

Dynamic Programming ◽

Tabu Search ◽

Embedded System ◽

Exact Solutions ◽

Heuristic Algorithm ◽

Search Algorithm ◽

Dynamic Programming Algorithm ◽

Programming Algorithm ◽

Task Graph ◽

Tabu Search Algorithm

Abstract: Hardware/software (HW/SW) partitioning, which decides which components of an application are implemented in hardware and which in software, is a crucial step in embedded system design. On modern heterogeneous embedded system platforms, each component of an application can typically have multiple feasible configurations/implementations, trading off quality aspects (e.g. energy consumption, completion time) against usage of various types of resources. This provides new opportunities for further improving the overall system performance, but few works explore this opportunity by incorporating the multiple choices of hardware implementation in the partitioning process. This paper proposes three algorithms for multiple-choice HW/SW partitioning of tree-shaped task graphs on a multiprocessor system on chip (MPSoC) with the objective of minimizing execution time while meeting an area constraint. Firstly, an efficient heuristic algorithm is proposed to rapidly generate an approximate solution. The solution produced by the first algorithm is then further refined by a customized Tabu search algorithm. We also propose a dynamic programming algorithm to calculate the exact solutions for relatively small-scale instances. Simulation results show that the proposed heuristic algorithm is able to quickly generate good approximate solutions, and that the solutions become very close to the exact solutions produced by the dynamic programming algorithm after being refined by the proposed Tabu search algorithm.

In-Situ Integration of 3D C-MEMS Microelectrodes with Bipolar Exfoliated Graphene for Label-Free Electrochemical Cancer Biomarkers Aptasensor

Micromachines ◽

10.3390/mi13010104 ◽

2022 ◽

Vol 13 (1) ◽

pp. 104

Author(s):

Shahrzad Forouzanfar ◽

Nezih Pala ◽

Chunlei Wang

Keyword(s):

High Performance ◽

Microelectromechanical Systems ◽

Point Of Care ◽

High Sensitivity ◽

Label Free ◽

Lab On Chip ◽

Electrochemical Label ◽

Mems Technology ◽

On Chip

The electrochemical label-free aptamer-based biosensors (also known as aptasensors) are highly suitable for point-of-care applications. The well-established C-MEMS (carbon microelectromechanical systems) platforms have distinguishing features which are highly suitable for biosensing applications such as low background noise, high capacitance, high stability when exposed to different physical/chemical treatments, biocompatibility, and good electrical conductivity. This study investigates the integration of bipolar exfoliated (BPE) reduced graphene oxide (rGO) with 3D C-MEMS microelectrodes for developing PDGF-BB (platelet-derived growth factor-BB) label-free aptasensors. A simple setup has been used for exfoliation, reduction, and deposition of rGO on the 3D C-MEMS microelectrodes based on the principle of bipolar electrochemistry of graphite in deionized water. The electrochemical bipolar exfoliation of rGO resolves the drawbacks of commonly applied methods for synthesis and deposition of rGO, such as requiring complicated and costly processes, excessive use of harsh chemicals, and complex subsequent deposition procedures. The PDGF-BB affinity aptamers were covalently immobilized by binding amino-tag terminated aptamers and rGO surfaces. The turn-off sensing strategy was implemented by measuring the areal capacitance from CV plots. The aptasensor showed a wide linear range of 1 pM–10 nM, high sensitivity of 3.09 mF cm−2 Logc−1 (unit of c, pM), and a low detection limit of 0.75 pM. This study demonstrated the successful and novel in-situ deposition of BPE-rGO on 3D C-MEMS microelectrodes. Considering the BPE technique’s simplicity and efficiency, along with the high potential of C-MEMS technology, this novel procedure is highly promising for developing high-performance graphene-based viable lab-on-chip and point-of-care cancer diagnosis technologies.
