PipeMEM: A Framework to Speed Up BWA-MEM in Spark with Low Overhead

Genes ◽  
2019 ◽  
Vol 10 (11) ◽  
pp. 886 ◽  
Author(s):  
Lingqi Zhang ◽  
Cheng Liu ◽  
Shoubin Dong

(1) Background: DNA sequence alignment is an essential step in genome analysis. BWA-MEM has been the prevalent single-node alignment tool because of its high speed and accuracy, but the exponential growth of sequencing data now demands multi-node solutions that can handle large data volumes. Spark is a ubiquitous big data platform that has been exploited to help genome alignment meet this challenge; nonetheless, existing approaches that use Spark to scale out BWA-MEM suffer from high overhead. (2) Methods: In this paper, we present PipeMEM, a framework that accelerates BWA-MEM with low overhead by means of the pipe operation in Spark. We additionally propose a pipeline structure and in-memory computation to accelerate PipeMEM further. (3) Results: Our experiments showed that our framework had low overhead on paired-end alignment tasks. In a multi-node environment, it was on average 2.27× faster than BWASpark (an alignment tool in the Genome Analysis Toolkit (GATK)) and 2.33× faster than SparkBWA. (4) Conclusions: PipeMEM accelerates BWA-MEM in the Spark environment with high performance and low overhead.
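The pipe-based design can be pictured with a short PySpark sketch (not PipeMEM's actual code): partitions of interleaved paired-end FASTQ records are streamed through an external `bwa mem` process via Spark's RDD pipe() operation, and the resulting SAM lines are kept in memory for downstream steps. The file paths, partition count, thread count, and the use of `-` to make bwa read from stdin are illustrative assumptions.

```python
# Minimal sketch of the Spark pipe() pattern PipeMEM builds on (illustrative, not the authors' code).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("pipe-bwa-mem-sketch").getOrCreate()
sc = spark.sparkContext

# Assumed inputs: an interleaved paired-end FASTQ file on HDFS and a BWA index available
# at the same local path on every worker node.
reads = sc.textFile("hdfs:///data/sample_interleaved.fastq", minPartitions=32)

# Re-group the flat lines into 8-line records (two 4-line FASTQ entries per read pair) so a
# record is never split across partitions; zipWithIndex supplies a stable line number.
records = (reads.zipWithIndex()
                .map(lambda kv: (kv[1] // 8, (kv[1] % 8, kv[0])))
                .groupByKey()
                .map(lambda kv: "\n".join(line for _, line in sorted(kv[1]))))

# pipe() forks the command once per partition and feeds each record to its stdin;
# `bwa mem -p` treats the stream as interleaved paired-end reads.
sam_lines = records.pipe("bwa mem -p -t 4 /ref/hg38.fa -")

# Persisting the SAM output mirrors the in-memory-computation idea: downstream steps can
# consume it without writing an intermediate file first.
sam_lines.persist()
sam_lines.saveAsTextFile("hdfs:///out/sample_aligned_sam")
```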

2020 ◽  
Author(s):  
Marcus H. Hansen ◽  
Anita T. Simonsen ◽  
Hans B. Ommen ◽  
Charlotte G. Nyvold

Abstract
Background: Rapid and practical DNA-sequencing processing has become essential for modern biomedical laboratories, especially in the fields of cancer, pathology and genetics. While sequencing turnaround time has been, and still is, a bottleneck in research and diagnostics, the field of bioinformatics is moving at a rapid pace, both in terms of hardware and software development. Here, we benchmarked the local performance of three of the most important Spark-enabled Genome Analysis Toolkit 4 (GATK4) tools in a targeted sequencing workflow: duplicate marking, base quality score recalibration (BQSR) and variant calling on targeted DNA sequencing, using a modest hyperthreading 12-core single CPU and a high-speed PCI Express solid-state drive.
Results: Compared to the previous GATK version, the Spark-enabled BQSR and HaplotypeCaller make more efficient use of the available CPU cores and outperform the earlier GATK3.8 version with an order-of-magnitude reduction in processing time to analysis-ready variants, whereas MarkDuplicatesSpark was found to be three times as fast. Furthermore, HaplotypeCallerSpark and BQSRPipelineSpark were significantly faster than the equivalent GATK4 standard tools, with a combined ∼86% reduction in execution time, reaching a median rate of ten million processed bases per second, and duplicate-marking time was reduced by ∼42%. The called variants were in close agreement between the Spark and non-Spark versions, with an overall concordance of 98%. In this setup, the tools were also highly efficient when compared with execution on a small 72-virtual-CPU/18-node Google Cloud cluster.
Conclusion: GATK4 offers practical parallelization possibilities for DNA sequence processing, and the Spark-enabled tools optimize performance and utilization of local CPUs. Spark-enabled GATK variant calling is several times faster than previous GATK3.8 multithreading on the same multi-core, single-CPU configuration. The improved opportunities for parallel computation hold implications not only for high-performance clusters, but also for modest laboratory or research workstations performing targeted sequencing analysis, such as exome, panel or amplicon sequencing.
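As a rough illustration of the workflow benchmarked here, the sketch below chains the three Spark-enabled GATK4 tools on a single 12-core workstation through a thin Python wrapper. The tool names are GATK4's own; the flags follow the documented GATK4 pattern but should be verified against the installed version, and all file paths are placeholders.

```python
# Hedged sketch of the three Spark-enabled GATK4 steps on one 12-core CPU (paths illustrative).
import subprocess

SPARK_MASTER = "local[12]"   # run the Spark tools on all 12 hyperthreaded cores locally
REF   = "ref/hg38.fasta"
BAM   = "sample.sorted.bam"
SITES = "ref/dbsnp.vcf.gz"

def gatk(tool, *args):
    """Run one GATK4 Spark tool and fail loudly if it exits non-zero."""
    cmd = ["gatk", tool, *args, "--spark-master", SPARK_MASTER]
    print(" ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Duplicate marking (reported about three times faster than non-Spark MarkDuplicates here).
gatk("MarkDuplicatesSpark", "-I", BAM, "-O", "sample.md.bam")

# 2. Base quality score recalibration as a single fused Spark pipeline.
gatk("BQSRPipelineSpark", "-R", REF, "-I", "sample.md.bam",
     "--known-sites", SITES, "-O", "sample.md.bqsr.bam")

# 3. Spark-enabled variant calling to analysis-ready variants.
gatk("HaplotypeCallerSpark", "-R", REF, "-I", "sample.md.bqsr.bam",
     "-O", "sample.vcf.gz")
```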


2019 ◽  
Vol 16 (8) ◽  
pp. 3419-3427
Author(s):  
Shishir K. Shandilya ◽  
S. Sountharrajan ◽  
Smita Shandilya ◽  
E. Suganya

Big data technologies have become well accepted in recent years in biomedical and genome informatics. They are capable of processing gigantic, heterogeneous genome data with good precision and recall. With rapid advances in computation and storage technologies, the cost of acquiring and processing genomic data has decreased significantly. Upcoming sequencing platforms will produce vast amounts of data, which will imperatively require high-performance systems for on-demand analysis with time-bound efficiency. Recent bioinformatics tools can exploit the novel features of Hadoop in a flexible way; in particular, big data technologies such as MapReduce and Hive provide a high-speed computational environment for the analysis of petabyte-scale datasets. This has drawn the attention of bio-scientists towards using big data applications to automate the entire genome analysis. The proposed framework is designed over MapReduce and Java on an extended Hadoop platform to achieve parallelism of big data analysis. It will assist the bioinformatics community by providing a comprehensive solution for descriptive, comparative, exploratory, inferential, predictive and causal analysis of genome data. The proposed framework is user-friendly, fully customizable, scalable, and fit for comprehensive real-time genome analysis from data acquisition through predictive sequence analysis.
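The framework itself is implemented in Java over Hadoop; purely as an illustration of the MapReduce split it builds on, here is a minimal Hadoop-Streaming-style mapper/reducer in Python for one simple genome statistic (k-mer counting). The streaming invocation in the comment is the generic Hadoop pattern, not the paper's own tooling.

```python
#!/usr/bin/env python3
# Minimal MapReduce sketch: count 8-mers in FASTA sequence lines.
# Typical Hadoop Streaming invocation (generic pattern, paths illustrative):
#   hadoop jar hadoop-streaming.jar -mapper "kmer.py map" -reducer "kmer.py reduce" \
#       -input /genomes/in -output /genomes/kmer_counts -file kmer.py
import sys

K = 8

def mapper(stream):
    # Emit (kmer, 1) for every k-mer in each sequence line; skip FASTA header lines.
    for line in stream:
        line = line.strip().upper()
        if not line or line.startswith(">"):
            continue
        for i in range(len(line) - K + 1):
            print(f"{line[i:i + K]}\t1")

def reducer(stream):
    # Hadoop delivers mapper output sorted by key, so equal k-mers arrive contiguously.
    current, count = None, 0
    for line in stream:
        kmer, n = line.rstrip("\n").split("\t")
        if kmer != current:
            if current is not None:
                print(f"{current}\t{count}")
            current, count = kmer, 0
        count += int(n)
    if current is not None:
        print(f"{current}\t{count}")

if __name__ == "__main__":
    (mapper if sys.argv[1] == "map" else reducer)(sys.stdin)
```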


2016 ◽  
Vol 20 (06) ◽  
pp. 45-53

APTAR PHARMA Provides Unit-Dose Nasal Spray Technology for Treatment of Opioid Overdose
Cloudera, Broad Institute Collaborate on the Next Generation of the Genome Analysis Toolkit
Singapore-based Luye Medical Group Completes Acquisition of Healthe Care, Australia's Third Largest Private Healthcare Group
FEI Launches Apreo – Industry-Leading Versatile, High-Performance SEM
BOGE Publishes New Guide on Specifying Compressed Air for Healthcare
Takara Bio USA, Inc. and Integrated DNA Technologies Announce Collaboration to Support Targeted RNA Sequencing
Pelican BioThermal Announces Launch of New Asia Headquarters in Singapore
A Faster Way to Separate Proteins with Electrophoresis
Biosensors Announces Strategic Agreement with Cardinal Health
BGI and Clearbridge BioMedics Partner to Develop China CTC Liquid Biopsy Market towards Precision Medicine


2021 ◽  
Author(s):  
Qian Zhang ◽  
Hao Liu ◽  
Fengxiao Bu

Rapid advances in next-generation sequencing (NGS) have enabled ultralarge population and cohort studies that use whole-genome sequencing (WGS) to identify DNA variants that may impact gene function. Massive sequencing data require highly efficient bioinformatics tools to complete read alignment and variant calling as the fundamental analysis. Multiple software and hardware acceleration strategies have been developed to boost analysis speed. This study comprehensively evaluated the germline variant calling of a GPU-based acceleration tool, BaseNumber, using WGS datasets from several sources, including gold-standard samples from the Genome in a Bottle (GIAB) project and the Golden Standard of China Genome (GSCG) project, resequenced GSCG samples, and 100 in-house samples from the China Deafness Genetics Consortium (CDGC) project. Sequencing data were analyzed on a GPU server using BaseNumber, and the variant calling outputs were compared to the reference VCF or to the results generated by the Burrows-Wheeler Aligner (BWA) + Genome Analysis Toolkit (GATK) pipeline on a generic CPU server. BaseNumber demonstrated high precision (99.32%) and recall (99.86%) in variant calls compared to the standard reference. The variant calling outputs of the BaseNumber and GATK pipelines were very similar, with a mean F1 of 99.69%. Additionally, BaseNumber took only 23 minutes on average to analyze a 48× WGS sample, roughly 1/215 of the time required by the GATK workflow. The GPU-based BaseNumber provides highly accurate and ultrafast variant calling, significantly improving WGS analysis efficiency, facilitating time-sensitive tests such as clinical WGS genetic diagnosis, and shedding light on GPU-based acceleration of other omics data analyses.
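The concordance figures quoted above (precision, recall, F1) follow the standard definitions; the sketch below makes them concrete with a naive exact-match comparison of variant keys between two VCFs. Real benchmarks typically rely on dedicated comparison tools such as hap.py or vcfeval, which also handle representation differences, so this is only a simplified illustration with made-up file names.

```python
# Simplified variant-concordance sketch: precision, recall and F1 from exact-match keys.

def load_variant_keys(vcf_path):
    """Collect (chrom, pos, ref, alt) keys from an uncompressed VCF file."""
    keys = set()
    with open(vcf_path) as fh:
        for line in fh:
            if line.startswith("#"):
                continue
            chrom, pos, _vid, ref, alt = line.split("\t")[:5]
            for allele in alt.split(","):          # split multi-allelic records
                keys.add((chrom, int(pos), ref, allele))
    return keys

def concordance(test_vcf, truth_vcf):
    test, truth = load_variant_keys(test_vcf), load_variant_keys(truth_vcf)
    tp = len(test & truth)
    precision = tp / len(test) if test else 0.0     # TP / (TP + FP)
    recall = tp / len(truth) if truth else 0.0      # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# Usage (illustrative file names):
# p, r, f1 = concordance("basenumber_calls.vcf", "giab_truth.vcf")
# print(f"precision={p:.2%}  recall={r:.2%}  F1={f1:.2%}")
```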


Author(s):  
N. Yoshimura ◽  
K. Shirota ◽  
T. Etoh

One of the most important requirements for a high-performance EM, especially an analytical EM using a fine beam probe, is to prevent specimen contamination by providing a clean high vacuum in the vicinity of the specimen. However, in almost all commercial EMs, the pressure in the vicinity of the specimen under observation is usually more than ten times higher than the pressure measured at the pumping line. The EM column inevitably requires the use of greased Viton O-rings for fine movement, specimens and films need to be exchanged frequently, and several attachments may also be exchanged. For these reasons, a high-speed pumping system, as well as a clean vacuum system, is now required. A newly developed electron microscope, the JEM-100CX, features a clean high vacuum in the vicinity of the specimen, realized by the use of a CASCADE-type diffusion pump system that has been substantially improved over its predecessor employed on the JEM-100C.


Author(s):  
Marc H. Peeters ◽  
Max T. Otten

Over the past decades, the combination of energy-dispersive analysis of X-rays and scanning electron microscopy has proved to be a powerful tool for fast and reliable elemental characterization of a large variety of specimens. The technique has evolved rapidly from a purely qualitative characterization method to a reliable quantitative way of analysis. In the last five years, an increasing need for automation has been observed, whereby energy-dispersive analysers control the beam and stage movement of the scanning electron microscope in order to collect digital X-ray images and perform unattended point analysis over multiple locations. The Philips High-speed Analysis of X-rays system (PHAX-Scan) makes use of the high-performance dual-processor structure of the EDAX PV9900 analyser and the data-bus structure of the Philips series 500 scanning electron microscope to provide a highly automated, user-friendly and extremely fast microanalysis system. The software that runs on this hardware was specifically designed to provide the ultimate attainable speed on the system.
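The unattended multi-location point-analysis loop described above can be pictured with the following Python sketch. The `Stage`, `Beam` and `EDXAnalyser` interfaces are entirely hypothetical placeholders invented for illustration; they are not the PHAX-Scan or PV9900 programming interface, which is not documented here.

```python
# Illustrative loop for unattended point analysis over multiple stage locations.
# All hardware interfaces below are hypothetical placeholders, not a real vendor API.
from dataclasses import dataclass

@dataclass
class Point:
    x_um: float            # stage X coordinate, micrometres
    y_um: float            # stage Y coordinate, micrometres
    dwell_s: float = 30.0  # acquisition live-time per spectrum, seconds

def run_point_analyses(stage, beam, analyser, points):
    """Drive the stage and beam to each location, acquire a spectrum, and quantify it."""
    results = []
    for p in points:
        stage.move_to(p.x_um, p.y_um)                 # hypothetical stage-control call
        beam.park(0.0, 0.0)                           # hypothetical: spot mode at field centre
        spectrum = analyser.acquire(p.dwell_s)        # hypothetical spectrum acquisition
        results.append(analyser.quantify(spectrum))   # e.g. wt% per detected element
    return results
```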


Author(s):  
M. T. Postek ◽  
A. E. Vladar

One of the major advancements applied to scanning electron microscopy (SEM) during the past 10 years has been the development and application of digital imaging technology. Advancements in technology, notably the availability of less expensive, high-density memory chips and the development of high-speed analog-to-digital converters, mass storage and high-performance central processing units, have fostered this revolution. Today, most modern SEM instruments have digital electronics as a standard feature. These instruments generally have 8-bit (256 gray level) imaging with at least 512 × 512 pixel density operating at TV rate. In addition, current slow-scan commercial frame-grabber cards, directly applicable to the SEM, can have upwards of 12-14 bit lateral resolution, permitting image acquisition at 4096 × 4096 resolution or greater. The two major categories of SEM systems to which digital technology has been applied are described below. In the analog SEM system, the scan generator is normally operated in an analog manner and the image is displayed in an analog or "slow scan" mode.
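The figures in this paragraph follow directly from the bit depths involved; the short calculation below shows how gray levels, addressable scan positions and raw frame size are obtained, under the assumption of whole-byte pixel packing.

```python
# Quick arithmetic behind the figures quoted above (whole-byte pixel packing assumed).

def gray_levels(intensity_bits):
    return 2 ** intensity_bits                 # 8 bits -> 256 gray levels

def scan_positions(lateral_bits):
    return 2 ** lateral_bits                   # 12 bits -> 4096 addressable positions per axis

def frame_bytes(pixels_per_side, intensity_bits):
    bytes_per_pixel = (intensity_bits + 7) // 8    # pack each pixel into whole bytes
    return pixels_per_side ** 2 * bytes_per_pixel

print(gray_levels(8))                  # 256
print(scan_positions(12))              # 4096
print(frame_bytes(512, 8) / 2**20)     # 0.25 MiB for a 512 x 512, 8-bit frame
print(frame_bytes(4096, 8) / 2**20)    # 16.0 MiB for a 4096 x 4096, 8-bit frame
```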


Author(s):  
Sai Venkatramana Prasada G.S ◽  
G. Seshikala ◽  
S. Niranjana

Background: This paper presents a comparative study of the power dissipation, delay and power-delay product (PDP) of different full adder and multiplier designs. Methods: The full adder is a fundamental building block of processors, DSP architectures and VLSI systems. Here, ten different full adder structures were analyzed for their performance using a Mentor Graphics tool with 180 nm technology. Results: From the analysis, the best-performing full adder was extracted for use in higher-level designs. The 8T full adder exhibits high speed, low power dissipation, low delay and a low power-delay product, and hence it was used to construct four different multiplier designs: an array multiplier, a Baugh-Wooley multiplier, a Braun multiplier and a Wallace tree multiplier. These multiplier structures were designed using the 8T full adder and simulated with the Mentor Graphics tool at a constant W/L aspect ratio. Conclusion: From the analysis, it is concluded that the Wallace tree multiplier is the fastest multiplier but dissipates comparatively high power, while the Baugh-Wooley multiplier dissipates less power but exhibits a longer delay and a low PDP.
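The power-delay product used to rank these designs is simply average power dissipation multiplied by propagation delay; the snippet below computes it for two example designs whose figures are made-up placeholders, not measurements from the paper.

```python
# Power-delay product (PDP): average power times propagation delay.

def power_delay_product(power_uw, delay_ns):
    """PDP in femtojoules, from power in microwatts and delay in nanoseconds."""
    return power_uw * delay_ns  # 1 uW * 1 ns = 1e-6 W * 1e-9 s = 1 fJ

# Placeholder figures for illustration only.
designs = {
    "8T full adder (hypothetical figures)": (1.2, 0.35),
    "28T CMOS full adder (hypothetical figures)": (2.9, 0.48),
}
for name, (p_uw, d_ns) in designs.items():
    print(f"{name}: PDP = {power_delay_product(p_uw, d_ns):.2f} fJ")
```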

