Optimization of data-intensive next generation sequencing in high performance computing

High Performance Computing (HPC) best practices offer an opportunity to bring lessons learned in fields such as computational chemistry and physics to genomics workflows, specifically Next-Generation Sequencing (NGS) workflows. In this study we briefly describe how distributed-memory parallelism can substantially improve the performance and resource utilization of NGS workflows. We illustrate this with results from parallelizing the Inchworm module of the Trinity RNA-Seq pipeline for de novo transcriptome assembly, and show that applications of this type can scale to thousands of cores. We discuss both time scaling and memory scaling at length, using two RNA-Seq datasets from Mus musculus (mouse) and the axolotl (Mexican salamander). We also detail the efficient MPI communication scheme and its impact on performance. We aim to demonstrate that this parallelization approach can be extended to most types of bioinformatics workflows, with substantial benefits: an efficient distributed-memory parallel implementation eliminates memory bottlenecks and dramatically accelerates NGS analysis. Finally, we summarize the programming paradigms available to the bioinformatics community, such as C++/MPI.
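
The abstract does not spell out how the work is divided among MPI ranks. As an illustration only, the sketch below shows one common distributed-memory pattern for assembly steps that build a k-mer table: hash-partitioning, where every k-mer has exactly one owning rank, so each rank stores roughly 1/nranks of the table instead of the whole thing. It is an assumption, not the paper's implementation; it simulates the ranks inside a single Python process, and the names `owner` and `distributed_kmer_count` are hypothetical.

```python
from collections import Counter
import zlib


def kmers(seq, k):
    """Yield every overlapping k-mer of a read."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def owner(kmer, nranks):
    """Deterministically map a k-mer to the rank that stores it.

    zlib.crc32 is used rather than the built-in hash(), whose string
    hashing is randomized per process and would disagree across real ranks.
    """
    return zlib.crc32(kmer.encode("ascii")) % nranks


def distributed_kmer_count(reads, k, nranks):
    """Simulate hash-partitioned k-mer counting on `nranks` ranks in one process."""
    # 1. Each rank reads its own slice of the input (round-robin split here).
    local_reads = [reads[r::nranks] for r in range(nranks)]

    # 2. Each rank buckets its k-mers by owning rank. With MPI, these buckets
    #    would be the send buffers of one all-to-all exchange (e.g. MPI_Alltoallv).
    outbox = [[[] for _ in range(nranks)] for _ in range(nranks)]
    for src in range(nranks):
        for read in local_reads[src]:
            for km in kmers(read, k):
                outbox[src][owner(km, nranks)].append(km)

    # 3. Each rank counts only the k-mers it owns, so no rank holds the full table.
    tables = []
    for dst in range(nranks):
        table = Counter()
        for src in range(nranks):
            table.update(outbox[src][dst])
        tables.append(table)
    return tables
```

For example, `distributed_kmer_count(["ACGTACGT", "CGTACGTT", "TTACGTAC"], k=4, nranks=3)` returns three disjoint tables whose union equals a serial k-mer count. Batching all k-mers for a given destination into one collective exchange, rather than sending one message per k-mer, is what typically keeps communication overhead manageable at large core counts.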

Leveraging the Power of High Performance Computing for Next Generation Sequencing Data Analysis: Tricks and Twists from a High Throughput Exome Workflow

PLoS ONE, Vol 10 (5), pp. e0126321 (2015). DOI: 10.1371/journal.pone.0126321. Cited By ~ 29

Author(s): Amit Kawalia, Susanne Motameny, Stephan Wonczak, Holger Thiele, Lech Nieroda, ...

Keyword(s): Data Analysis, Next Generation Sequencing, High Performance Computing, High Throughput, High Performance, Next Generation Sequencing Data, Sequencing Data, Performance Computing, Generation Sequencing, Sequencing Data Analysis

Innovative Research and Applications in Next-Generation High Performance Computing

2016. DOI: 10.4018/978-1-5225-0287-6. Cited By ~ 3

Keyword(s): High Performance Computing, High Performance, Next Generation, Innovative Research, Performance Computing

Comparison between two different next generation sequencing platforms for clinical relevant gene mutation test in solid tumours

Journal of Clinical Pathology, Vol 73 (9), pp. 602-604 (2020). DOI: 10.1136/jclinpath-2019-206422

Author(s): Silvia Bessi, Francesco Pepe, Marco Ottaviantonio, Pasquale Pisapia, Umberto Malapelle, ...

Keyword(s): Next Generation Sequencing, High Performance, Solid Tumours, Next Generation, Formalin Fixed Paraffin, FFPE Samples, Thermo Fisher Scientific, Formalin Fixed Paraffin Embedded, Sequencing Platforms, Generation Sequencing

In the present study, we analysed 44 formalin-fixed paraffin-embedded (FFPE) samples from different solid tumours using two different next generation sequencing platforms: GeneReader (QIAGEN, Hilden, Germany) and Ion Torrent (Thermo Fisher Scientific, Waltham, Massachusetts, USA). We found 100% concordance between the platforms. In addition, for variant detection, we observed very good agreement between the two tests (Cohen’s kappa = 0.84), and when the variant allele fraction of each variant was taken into account, concordance was very high (Pearson’s r = 0.94). Our results underline the high performance of GeneReader on FFPE samples and its suitability for routine molecular predictive practice.
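
The abstract reports agreement as Cohen's kappa and variant allele fraction concordance as Pearson's r. The sketch below shows how these two statistics are conventionally computed from paired per-sample results; it is a generic illustration, not the study's analysis code, and the call labels (`"mut"`, `"wt"`) and the function names are hypothetical.

```python
import math
from collections import Counter


def cohens_kappa(calls_a, calls_b):
    """Cohen's kappa for two platforms' categorical calls on the same samples."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("need two non-empty call lists of equal length")
    n = len(calls_a)
    # Observed agreement p_o: fraction of samples with identical calls.
    p_o = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement p_e from each platform's marginal label frequencies.
    freq_a, freq_b = Counter(calls_a), Counter(calls_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1:
        # Both platforms used one identical label everywhere; kappa is
        # undefined, and we return 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def pearson_r(x, y):
    """Pearson correlation, e.g. between paired variant allele fractions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two values")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

With toy calls `["mut", "mut", "wt", "wt"]` versus `["mut", "wt", "wt", "wt"]`, observed agreement is 0.75 and chance agreement is 0.5, so kappa is 0.5. This shows why kappa is lower than raw percent agreement: it discounts the agreement expected by chance from each platform's label frequencies.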

Predictive Resource Management for Next-Generation High-Performance Computing Heterogeneous Platforms

Lecture Notes in Computer Science - Embedded Computer Systems: Architectures, Modeling, and Simulation, pp. 470-483 (2019). DOI: 10.1007/978-3-030-27562-4_34. Cited By ~ 2

Author(s): Giuseppe Massari, Anna Pupykina, Giovanni Agosta, William Fornaciari

Keyword(s): Resource Management, High Performance Computing, High Performance, Next Generation, Heterogeneous Platforms, Performance Computing

Inspiring the Next Generation of HPC Engineers with Reconfigurable, Multi-Tenant Resources for Teaching and Research

Sustainability, Vol 13 (21), pp. 11782 (2021). DOI: 10.3390/su132111782

Author(s): Taha Al-Jody, Hamza Aagela, Violeta Holmes

Keyword(s): Open Source, High Performance Computing, Systems Engineering, High Performance, Bare Metal, Next Generation, Teaching And Research, Exascale Computing, Secure Access, Performance Computing

There is a tradition of teaching and research in High Performance Computing (HPC) systems engineering at our university. With exascale computing on the horizon and a shortage of HPC talent, new specialists are needed to secure the future of research computing. Whilst many institutions provide research computing training for users within their particular domain, few offer courses on HPC engineering and infrastructure, making it difficult for students to acquire these skills. This paper outlines how and why we train students in HPC systems engineering, including the technologies we use to achieve this goal. We demonstrate the potential of a multi-tenant HPC system for education and research, using a novel container- and cloud-based architecture. This work builds on our previously published work, which uses the latest open-source technologies to create sustainable, fast and flexible turn-key HPC environments with secure access via an HPC portal. The proposed multi-tenant HPC resources can be deployed on “bare metal” infrastructure or in the cloud. We evaluate our activities over the last five years in terms of recruitment metrics, skills-audit feedback from students, and research outputs enabled by multi-tenant use of the resource.

A next generation sequence processing and analysis platform with integrated cloud-storage and high performance computing resources

Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB '12 (2012). DOI: 10.1145/2382936.2383033

Author(s): Jeremy C. Morgan, Robert W. Chapman, Paul E. Anderson

Keyword(s): High Performance Computing, Cloud Storage, High Performance, Next Generation, Sequence Processing, Analysis Platform, Performance Computing

On Construction of Cluster and Grid Computing Platforms for Parallel Bioinformatics Applications

Grid and Cloud Computing, pp. 841-861 (2012). DOI: 10.4018/978-1-4666-0879-5.ch405

Author(s): Chao-Tung Yang, Wen-Chung Shih

Keyword(s): Grid Computing, High Performance Computing, High Speed, High Performance, Cluster Computing, Structural Features, Performance Technology, Data Intensive, Computing Platforms, Performance Computing

Biological databases are diverse and massive, and researchers routinely need to compare each sequence against vast numbers of other sequences. Such comparison, whether of structural features or of protein sequences, is vital in bioinformatics. These activities require high-speed, high-performance computing power to search and analyze large amounts of data, together with industrial-strength databases to perform a range of data-intensive computing functions. Grid and cluster computing meet these requirements. Biological data are spread across various web services that help biologists search for and extract useful information, but the data formats they produce are heterogeneous, so powerful tools are needed for the complex and difficult task of integrating the data. This paper reviews the relevant technologies and presents an approach to this problem based on cluster and grid computing. The authors implement an experimental distributed computing application for bioinformatics consisting of basic high-performance computing environments (Grid and PC cluster systems), multiple user-portal interfaces that provide graphical front ends so that biologists can benefit directly from high-performance technology, and a translation tool for converting biological data into XML format.
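
The abstract mentions a translation tool that converts biological data into XML but does not describe its schema. Purely as an illustration of the idea, the sketch below converts FASTA-formatted sequences into XML with Python's standard `xml.etree.ElementTree`; the element and attribute names (`sequences`, `sequence`, `id`, `length`, `description`, `residues`) and the function names are hypothetical and are not the authors' format.

```python
import xml.etree.ElementTree as ET


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA text.

    Sequence lines that appear before the first '>' header are ignored.
    """
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_xml(text):
    """Convert FASTA records into a hypothetical <sequences> XML document."""
    root = ET.Element("sequences")
    for header, seq in parse_fasta(text):
        # Treat the first word of the header as the ID, the rest as a description.
        ident, _, desc = header.partition(" ")
        rec = ET.SubElement(root, "sequence", id=ident, length=str(len(seq)))
        if desc:
            ET.SubElement(rec, "description").text = desc
        ET.SubElement(rec, "residues").text = seq
    return ET.tostring(root, encoding="unicode")
```

Multi-line FASTA records are joined into a single `residues` element, so downstream tools can read each sequence without knowing the original line wrapping. A real integration tool would add one parser per source format while emitting the same XML structure.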
