A next generation sequence processing and analysis platform with integrated cloud-storage and high performance computing resources

Author(s):  
Jeremy C. Morgan ◽  
Robert W. Chapman ◽  
Paul E. Anderson

2021 ◽  
Vol 13 (21) ◽  
pp. 11782
Author(s):  
Taha Al-Jody ◽  
Hamza Aagela ◽  
Violeta Holmes

There is a tradition at our university of teaching and research in High Performance Computing (HPC) systems engineering. With exascale computing on the horizon and a shortage of HPC talent, there is a need for new specialists to secure the future of research computing. Whilst many institutions provide research computing training for users within their particular domain, few offer HPC engineering and infrastructure-related courses, making it difficult for students to acquire these skills. This paper outlines how and why we are training students in HPC systems engineering, including the technologies used to deliver this goal. We demonstrate the potential of a multi-tenant HPC system for education and research, using a novel container- and cloud-based architecture. This work builds on our previously published work, which uses the latest open-source technologies to create sustainable, fast, and flexible turn-key HPC environments with secure access via an HPC portal. The proposed multi-tenant HPC resources can be deployed on “bare metal” infrastructure or in the cloud. We evaluate our activities over the last five years in terms of recruitment metrics, skills-audit feedback from students, and research outputs enabled by multi-tenant usage of the resource.
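The abstract does not detail the portal's interface, but the idea of secure, multi-tenant access to turn-key HPC resources can be illustrated with a short sketch. The Python below submits a batch script through a hypothetical portal REST endpoint; the URL, token variable, and payload fields are all assumptions for illustration, not the authors' actual API.

```python
# Hypothetical sketch of submitting a job via a multi-tenant HPC portal's
# REST API; the endpoint, token handling, and field names are assumptions,
# not the portal described in the paper.
import os
import requests

PORTAL_URL = "https://hpc-portal.example.edu/api/v1/jobs"  # assumed endpoint

def submit_job(script_path: str, tenant: str, cores: int = 4) -> str:
    """Submit a batch script on behalf of one tenant; return the job ID."""
    with open(script_path) as f:
        script = f.read()
    resp = requests.post(
        PORTAL_URL,
        headers={"Authorization": f"Bearer {os.environ['PORTAL_TOKEN']}"},
        json={"tenant": tenant, "script": script, "cores": cores},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["job_id"]  # assumed response field

if __name__ == "__main__":
    print(submit_job("hello.slurm", tenant="teaching"))
```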


Author(s):  
Joseph Marshall ◽  
Dale Rickard ◽  
Danielle Sova ◽  
Hubert Miller ◽  
Robert Lapihuska ◽  
...  

2015 ◽  
Author(s):  
Pierre Carrier ◽  
Bill Long ◽  
Richard Walsh ◽  
Jef Dawson ◽  
Carlos P. Sosa ◽  
...  

High Performance Computing (HPC) best practice offers opportunities to apply lessons learned in areas such as computational chemistry and physics to genomics workflows, specifically Next-Generation Sequencing (NGS) workflows. In this study we briefly describe how distributed-memory parallelism can be an important enhancement to the performance and resource utilization of NGS workflows. We illustrate this point with results on the parallelization of the Inchworm module of the Trinity RNA-Seq pipeline for de novo transcriptome assembly, and show that these types of applications can scale to thousands of cores. Time scaling as well as memory scaling are discussed at length using two RNA-Seq datasets, one from Mus musculus (mouse) and one from the axolotl (Mexican salamander). Details of the efficient MPI communication and its impact on performance are also presented. We hope to demonstrate that this type of parallelization approach can be extended to most types of bioinformatics workflows, with substantial benefits. The efficient, distributed-memory parallel implementation eliminates memory bottlenecks and dramatically accelerates NGS analysis. We further include a summary of programming paradigms available to the bioinformatics community, such as C++/MPI.
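As a rough illustration of the distributed-memory idea behind the Inchworm parallelization (though not the actual Trinity code, which is written in C++/MPI), the Python/mpi4py sketch below partitions a k-mer count table across MPI ranks by hashing each k-mer to an owner rank, so that no single node has to hold the whole table in memory.

```python
# Illustrative sketch, not the Trinity/Inchworm source: partition a k-mer
# count table across MPI ranks so no single node must hold all of it.
import zlib
from collections import Counter
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
K = 25  # a typical Inchworm-style k-mer length

def kmers(read):
    """Yield all overlapping k-mers of a read."""
    return (read[i:i + K] for i in range(len(read) - K + 1))

# Each rank would normally read its own shard of the sequencing input;
# here we use a toy in-memory dataset so the sketch runs anywhere.
local_reads = ["ACGTACGTAC" * 10, "TTGCATTGCA" * 10]

# Route every k-mer to an owner rank chosen by a deterministic hash
# (zlib.crc32, since Python's built-in hash() is salted per process).
# This is the core distributed-memory idea: the global k-mer table is
# spread over all nodes instead of filling one node's RAM.
outbound = [[] for _ in range(size)]
for read in local_reads:
    for kmer in kmers(read):
        outbound[zlib.crc32(kmer.encode()) % size].append(kmer)

inbound = comm.alltoall(outbound)  # all-to-all k-mer exchange

# Each rank now counts only its shard of the global k-mer space.
local_table = Counter(k for bucket in inbound for k in bucket)
total = comm.reduce(sum(local_table.values()), op=MPI.SUM, root=0)
if rank == 0:
    print(f"{total} k-mers counted across {size} rank(s)")
```

Run under an MPI launcher, e.g. `mpirun -n 4 python kmer_count.py`; the deterministic hash guarantees that identical k-mers from different ranks land on the same owner, so per-rank counts can be merged without a global table.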


2020 ◽  
Vol 245 ◽  
pp. 09011
Author(s):  
Michael Hildreth ◽  
Kenyi Paolo Hurtado Anampa ◽  
Cody Kankel ◽  
Scott Hampton ◽  
Paul Brenner ◽  
...  

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and it uses Kubernetes both to schedule and distribute container-based workloads across a cluster of available machines and to instantiate and monitor the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA, and the components that were developed, to enable large-scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we adapted REANA to work with a number of different workload managers, covering both high-performance and high-throughput computing, while simultaneously removing REANA’s dependence on Kubernetes support at the worker level.
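A minimal sketch of the backend abstraction described above, using hypothetical helper names: the same containerized workflow step is dispatched either to Kubernetes (the high-throughput path) or to a Slurm-managed HPC system with no Kubernetes on the workers, where it runs under Singularity/Apptainer instead. This illustrates the pattern only; it is not the SCAILFIN/REANA implementation.

```python
# Hypothetical sketch: one containerized workflow step, two backends.
# Not the SCAILFIN/REANA code, just an illustration of abstracting the
# workload manager behind a common interface.
import subprocess

def run_step_kubernetes(image: str, command: str, job_name: str) -> None:
    # High-throughput path: let Kubernetes schedule the container directly.
    subprocess.run(
        ["kubectl", "run", job_name, f"--image={image}",
         "--restart=Never", "--", "sh", "-c", command],
        check=True,
    )

def run_step_slurm(image: str, command: str, job_name: str) -> None:
    # HPC path: no Kubernetes on the workers, so wrap the same container
    # in a Slurm batch script and run it under Singularity/Apptainer.
    script = (
        "#!/bin/bash\n"
        f"#SBATCH --job-name={job_name}\n"
        f"singularity exec docker://{image} sh -c '{command}'\n"
    )
    subprocess.run(["sbatch"], input=script, text=True, check=True)

BACKENDS = {"kubernetes": run_step_kubernetes, "slurm": run_step_slurm}

def run_step(backend: str, image: str, command: str, job_name: str) -> None:
    """Dispatch one workflow step to the selected workload manager."""
    BACKENDS[backend](image, command, job_name)

if __name__ == "__main__":
    run_step("slurm", "python:3.11-slim", "python -c 'print(42)'", "demo")
```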

