High performance computing for computational biology

In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNA-seq data using pseudo-alignment.

Parallelism in computational biology

The International Journal of High Performance Computing Applications, 2016, Vol 32 (3), pp. 317-320. DOI: 10.1177/1094342016677599. Cited by: ~2

Author(s): Miguel A Vega-Rodríguez, Álvaro Rubio-Largo

Keyword(s): Computational Biology, High Performance Computing, High Performance, State Of The Art, International Workshop, Special Issue, High Quality, The Third, Points Of View, Performance Computing

Computational biology allows and encourages the application of many different parallelism-based technologies. This special issue brings together high-quality, state-of-the-art contributions on parallelism-based technologies in computational biology from different perspectives, that is, from diverse high-performance computing applications. It collects considerably extended and improved versions of the best papers accepted and presented at PBio 2015 (the Third International Workshop on Parallelism in Bioinformatics, part of IEEE ISPA 2015). The domains and topics covered in these seven papers are timely and important, and the authors have done an excellent job of presenting the material.

CGAT-core: a python framework for building scalable, reproducible computational biology workflows

2019. DOI: 10.1101/581009

Author(s): Adam Cribbs, Sebastian Luna-Valero, Charlotte George, Ian M Sudbery, Antonio J Berlanga-Taylor, ...

Keyword(s): Computational Biology, High Performance Computing, High Performance, Large Data, Database Integration, Scientific Rigour, Rapid Construction, Rnaseq Data, Performance Computing, Python Package

In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNA-seq data using pseudo-alignment.

CGAT-core: a python framework for building scalable, reproducible computational biology workflows

F1000Research, 2019, Vol 8, pp. 377. DOI: 10.12688/f1000research.18674.2. Cited by: ~3

Author(s): Adam P. Cribbs, Sebastian Luna-Valero, Charlotte George, Ian M. Sudbery, Antonio J. Berlanga-Taylor, ...

Keyword(s): Computational Biology, High Performance Computing, High Performance, Large Data, Database Integration, Scientific Rigour, Rapid Construction, Rnaseq Data, Performance Computing, Python Package

In the genomics era, computational biologists regularly need to process, analyse and integrate large and complex biomedical datasets. Analysis inevitably involves multiple dependent steps, resulting in complex pipelines or workflows, often with several branches. Large data volumes mean that processing needs to be quick and efficient, and scientific rigour requires that analysis be consistent and fully reproducible. We have developed CGAT-core, a Python package for the rapid construction of complex computational workflows. CGAT-core seamlessly handles parallelisation across high-performance computing clusters, integration of Conda environments, full parameterisation, database integration and logging. To illustrate our workflow framework, we present a pipeline for the analysis of RNA-seq data using pseudo-alignment.
