A next generation sequence processing and analysis platform with integrated cloud-storage and high performance computing resources

Author(s):  
Jeremy C. Morgan ◽  
Robert W. Chapman ◽  
Paul E. Anderson

2021 ◽  
Vol 13 (21) ◽  
pp. 11782
Author(s):  
Taha Al-Jody ◽  
Hamza Aagela ◽  
Violeta Holmes

There is a tradition at our university of teaching and research in High Performance Computing (HPC) systems engineering. With exascale computing on the horizon and a shortage of HPC talent, there is a need for new specialists to secure the future of research computing. Whilst many institutions provide research computing training for users within their particular domain, few offer HPC engineering and infrastructure-related courses, making it difficult for students to acquire these skills. This paper outlines how and why we are training students in HPC systems engineering, including the technologies used to deliver this goal. We demonstrate the potential of a multi-tenant HPC system for education and research, using a novel container- and cloud-based architecture. This work builds on our previously published work, which uses the latest open-source technologies to create sustainable, fast, and flexible turn-key HPC environments with secure access via an HPC portal. The proposed multi-tenant HPC resources can be deployed on “bare metal” infrastructure or in the cloud. We evaluate our activities over the last five years in terms of recruitment metrics, skills-audit feedback from students, and research outputs enabled by multi-tenant usage of the resource.
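The abstract does not detail the portal's interface, but the idea of secure, multi-tenant access to turn-key HPC resources can be illustrated with a short sketch. The Python below submits a batch script through a hypothetical portal REST endpoint; the URL, token variable, and payload fields are all assumptions for illustration, not the authors' actual API.

```python
# Hypothetical sketch of submitting a job via a multi-tenant HPC portal's
# REST API; the endpoint, token handling, and field names are assumptions,
# not the portal described in the paper.
import os
import requests

PORTAL_URL = "https://hpc-portal.example.edu/api/v1/jobs"  # assumed endpoint

def submit_job(script_path: str, tenant: str, cores: int = 4) -> str:
    """Submit a batch script on behalf of one tenant; return the job ID."""
    with open(script_path) as f:
        script = f.read()
    resp = requests.post(
        PORTAL_URL,
        headers={"Authorization": f"Bearer {os.environ['PORTAL_TOKEN']}"},
        json={"tenant": tenant, "script": script, "cores": cores},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]  # assumed response field

if __name__ == "__main__":
    print(submit_job("hello.slurm", tenant="teaching"))
```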


Author(s):  
Joseph Marshall ◽  
Dale Rickard ◽  
Danielle Sova ◽  
Hubert Miller ◽  
Robert Lapihuska ◽  
...  

2015 ◽  
Author(s):  
Pierre Carrier ◽  
Bill Long ◽  
Richard Walsh ◽  
Jef Dawson ◽  
Carlos P. Sosa ◽  
...  

High Performance Computing (HPC) best practice offers opportunities to apply lessons learned in areas such as computational chemistry and physics to genomics workflows, specifically Next-Generation Sequencing (NGS) workflows. In this study we briefly describe how distributed-memory parallelism can be an important enhancement to the performance and resource utilization of NGS workflows. We illustrate this point with results on the parallelization of the Inchworm module of the Trinity RNA-Seq pipeline for de novo transcriptome assembly, and show that these types of applications can scale to thousands of cores. Time scaling as well as memory scaling are discussed at length using two RNA-Seq datasets, one from Mus musculus (mouse) and one from the axolotl (Mexican salamander). Details of the efficient MPI communication and its impact on performance are also presented. We hope to demonstrate that this type of parallelization approach can be extended to most types of bioinformatics workflows, with substantial benefits. The efficient, distributed-memory parallel implementation eliminates memory bottlenecks and dramatically accelerates NGS analysis. We further include a summary of programming paradigms available to the bioinformatics community, such as C++/MPI.
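As a rough illustration of the distributed-memory idea behind the Inchworm parallelization (though not the actual Trinity code, which is written in C++/MPI), the Python/mpi4py sketch below partitions a k-mer count table across MPI ranks by hashing each k-mer to an owner rank, so that no single node has to hold the whole table in memory.

```python
# Illustrative sketch, not the Trinity/Inchworm source: partition a k-mer
# count table across MPI ranks so no single node must hold all of it.
import zlib
from collections import Counter
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
K = 25  # a typical Inchworm-style k-mer length

def kmers(read):
    """Yield all overlapping k-mers of a read."""
    return (read[i:i + K] for i in range(len(read) - K + 1))

# Each rank would normally read its own shard of the sequencing input;
# here we use a toy in-memory dataset so the sketch runs anywhere.
local_reads = ["ACGTACGTAC" * 10, "TTGCATTGCA" * 10]

# Route every k-mer to an owner rank chosen by a deterministic hash
# (zlib.crc32, since Python's built-in hash() is salted per process).
# This is the core distributed-memory idea: the global k-mer table is
# spread over all nodes instead of filling one node's RAM.
outbound = [[] for _ in range(size)]
for read in local_reads:
    for kmer in kmers(read):
        outbound[zlib.crc32(kmer.encode()) % size].append(kmer)

inbound = comm.alltoall(outbound)  # all-to-all k-mer exchange

# Each rank now counts only its shard of the global k-mer space.
local_table = Counter(k for bucket in inbound for k in bucket)
total = comm.reduce(sum(local_table.values()), op=MPI.SUM, root=0)
if rank == 0:
    print(f"{total} k-mers counted across {size} rank(s)")
```

Run under an MPI launcher, e.g. `mpirun -n 4 python kmer_count.py`; the deterministic hash guarantees that identical k-mers from different ranks land on the same owner, so per-rank counts can be merged without a global table.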


2020 ◽  
Vol 245 ◽  
pp. 09011
Author(s):  
Michael Hildreth ◽  
Kenyi Paolo Hurtado Anampa ◽  
Cody Kankel ◽  
Scott Hampton ◽  
Paul Brenner ◽  
...  

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and it uses Kubernetes both to schedule and distribute container-based workloads across a cluster of available machines and to instantiate and monitor the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed, to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we adapted REANA to work with a number of different workload managers, covering both high-performance and high-throughput computing, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level.
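A minimal sketch of the backend abstraction described above, using hypothetical helper names: the same containerized workflow step is dispatched either to Kubernetes (the high-throughput path) or to a Slurm-managed HPC system with no Kubernetes on the workers, where it runs under Singularity/Apptainer instead. This illustrates the pattern only; it is not the SCAILFIN/REANA implementation.

```python
# Hypothetical sketch: one containerized workflow step, two backends.
# Not the SCAILFIN/REANA code, just an illustration of abstracting the
# workload manager behind a common interface.
import subprocess

def run_step_kubernetes(image: str, command: str, job_name: str) -> None:
    # High-throughput path: let Kubernetes schedule the container directly.
    subprocess.run(
        ["kubectl", "run", job_name, f"--image={image}",
         "--restart=Never", "--", "sh", "-c", command],
        check=True,
    )

def run_step_slurm(image: str, command: str, job_name: str) -> None:
    # HPC path: no Kubernetes on the workers, so wrap the same container
    # in a Slurm batch script and run it under Singularity/Apptainer.
    script = (
        "#!/bin/bash\n"
        f"#SBATCH --job-name={job_name}\n"
        f"singularity exec docker://{image} sh -c '{command}'\n"
    )
    subprocess.run(["sbatch"], input=script, text=True, check=True)

BACKENDS = {"kubernetes": run_step_kubernetes, "slurm": run_step_slurm}

def run_step(backend: str, image: str, command: str, job_name: str) -> None:
    """Dispatch one workflow step to the selected workload manager."""
    BACKENDS[backend](image, command, job_name)

if __name__ == "__main__":
    run_step("slurm", "python:3.11-slim", "python -c 'print(42)'", "demo")
```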

