Xi-cam: a versatile interface for data visualization and analysis

Xi-cam is an extensible platform for data management, analysis and visualization. Xi-cam aims to provide a flexible and extensible approach to synchrotron data treatment as a solution to rising demands for high-volume/high-throughput processing pipelines. The core of Xi-cam is an extensible plugin-based graphical user interface platform which provides users with an interactive interface to processing algorithms. Plugins are available for SAXS/WAXS/GISAXS/GIWAXS, tomography and NEXAFS data. With Xi-cam's `advanced' mode, data processing steps are designed as a graph-based workflow, which can be executed live, locally or remotely. Remote execution utilizes high-performance computing or de-localized resources, allowing for the effective reduction of high-throughput data. Xi-cam's plugin-based architecture targets cross-facility and cross-technique collaborative development, in support of multi-modal analysis. Xi-cam is open-source and cross-platform, and available for download on GitHub.

Download Full-text

High-performance computing service for bioinformatics and data science

Journal of the Medical Library Association JMLA ◽

10.5195/jmla.2018.512 ◽

2018 ◽

Vol 106 (4) ◽

Author(s):

Jean-Paul Courneya ◽

Alexa Mayo

Keyword(s):

Open Source ◽

High Throughput ◽

High Performance ◽

Large Scale ◽

Data Science ◽

Wet Work ◽

High Throughput Data ◽

Guided Learning ◽

Computational Resources ◽

Performance Computing

Despite having an ideal setup in their labs for wet work, researchers often lack the computational infrastructure to analyze the magnitude of data that result from “-omics” experiments. In this innovative project, the library supports analysis of high-throughput data from global molecular profiling experiments by offering a high-performance computer with open source software along with expert bioinformationist support. The audience for this new service is faculty, staff, and students for whom using the university’s large scale, CORE computational resources is not warranted because these resources exceed the needs of smaller projects. In the library’s approach, users are empowered to analyze high-throughput data that they otherwise would not be able to on their own computers. To develop the project, the library’s bioinformationist identified the ideal computing hardware and a group of open source bioinformatics software to provide analysis options for experimental data such as scientific images, sequence reads, and flow cytometry files. To close the loop between learning and practice, the bioinformationist developed self-guided learning materials and workshops or consultations on topics such as the National Center for Biotechnology Information’s BLAST, Bioinformatics on the Cloud, and ImageJ. Researchers apply the data analysis techniques that they learned in the classroom in an ideal computing environment.

Download Full-text

Using Managed High Performance Computing Systems for High-Throughput Computing

Conquering Big Data with High Performance Computing ◽

10.1007/978-3-319-33742-5_4 ◽

2016 ◽

pp. 61-79 ◽

Cited By ~ 2

Author(s):

Lucas A. Wilson

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

Computing Systems ◽

High Throughput Computing ◽

Performance Computing

Download Full-text

High-Performance Computing In High-Throughput Sequencing

Biological Knowledge Discovery Handbook ◽

10.1002/9781118617151.ch43 ◽

2013 ◽

pp. 981-1002 ◽

Cited By ~ 1

Author(s):

Kamer Kaya ◽

Ayat Hatem ◽

Hatice Gülçin Özer ◽

Kun Huang ◽

Ümit V. Çatalyürek

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

High Throughput Sequencing ◽

Performance Computing

Download Full-text

An Introduction to Big Data, High Performance Computing, High-Throughput Computing, and Hadoop

Conquering Big Data with High Performance Computing ◽

10.1007/978-3-319-33742-5_1 ◽

2016 ◽

pp. 1-12 ◽

Cited By ~ 1

Author(s):

Ritu Arora

Keyword(s):

Big Data ◽

High Performance Computing ◽

High Throughput ◽

High Performance ◽

High Throughput Computing ◽

Performance Computing

Download Full-text

Improvements of common open Grid standards to increase High Throughput and High Performance Computing effectiveness on large-scale Grid and e-science infrastructures

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) ◽

10.1109/ipdpsw.2010.5470916 ◽

2010 ◽

Cited By ~ 2

Author(s):

M. Riedel ◽

M.S. Memon ◽

A.S. Memon ◽

A. Streit ◽

F. Wolf ◽

...

Keyword(s):

High Performance Computing ◽

High Throughput ◽

High Performance ◽

Large Scale ◽

Large Scale Grid ◽

Scale Grid ◽

Performance Computing

Download Full-text

Bioinformatics Clouds for High-Throughput Technologies

Advances in Data Mining and Database Management - Handbook of Research on Cloud Infrastructures for Big Data Analytics ◽

10.4018/978-1-4666-5864-6.ch020 ◽

2014 ◽

pp. 489-507 ◽

Cited By ~ 2

Author(s):

Claudia Cava ◽

Francesca Gallivanone ◽

Christian Salvatore ◽

Pasquale Anthony Della Rosa ◽

Isabella Castiglioni

Keyword(s):

Cloud Computing ◽

Big Data ◽

High Throughput ◽

High Performance ◽

Cellular Level ◽

High Dimensionality ◽

Computational Approaches ◽

New Information ◽

Wide Scale ◽

Performance Computing

Bioinformatics traditionally deals with computational approaches to the analysis of big data from high-throughput technologies as genomics, proteomics, and sequencing. Bioinformatics analysis allows extraction of new information from big data that might help to better assess the biological details at a molecular and cellular level. The wide-scale and high-dimensionality of Bioinformatics data has led to an increasing need of high performance computing and repository. In this chapter, the authors demonstrate the advantages of cloud computing in Bioinformatics research for high-throughput technologies.

Download Full-text

A High Performance Computing Platform for Performing High-Volume Studies with Windows-based Power Grid Tools

IFAC Proceedings Volumes ◽

10.3182/20140824-6-za-1003.00839 ◽

2014 ◽

Vol 47 (3) ◽

pp. 10772-10777 ◽

Cited By ~ 2

Author(s):

Yousu Chen ◽

Zhenyu Huang

Keyword(s):

High Performance Computing ◽

High Performance ◽

Power Grid ◽

High Volume ◽

Computing Platform ◽

Performance Computing

Download Full-text

MSA: reproducible mutational signature attribution with confidence based on simulations

BMC Bioinformatics ◽

10.1186/s12859-021-04450-8 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Sergey Senkin

Keyword(s):

High Performance Computing ◽

High Performance ◽

De Novo ◽

Least Squares Method ◽

Parametric Bootstrap ◽

Mutational Signatures ◽

Mutational Signature ◽

Cross Platform ◽

Computing Environments ◽

Performance Computing

Abstract Background Mutational signatures proved to be a useful tool for identifying patterns of mutations in genomes, often providing valuable insights about mutagenic processes or normal DNA damage. De novo extraction of signatures is commonly performed using Non-Negative Matrix Factorisation methods, however, accurate attribution of these signatures to individual samples is a distinct problem requiring uncertainty estimation, particularly in noisy scenarios or when the acting signatures have similar shapes. Whilst many packages for signature attribution exist, a few provide accuracy measures, and most are not easily reproducible nor scalable in high-performance computing environments. Results We present Mutational Signature Attribution (MSA), a reproducible pipeline designed to assign signatures of different mutation types on a single-sample basis, using Non-Negative Least Squares method with optimisation based on configurable simulations. Parametric bootstrap is proposed as a way to measure statistical uncertainties of signature attribution. Supported mutation types include single and doublet base substitutions, indels and structural variants. Results are validated using simulations with reference COSMIC signatures, as well as randomly generated signatures. Conclusions MSA is a tool for optimised mutational signature attribution based on simulations, providing confidence intervals using parametric bootstrap. It comprises a set of Python scripts unified in a single Nextflow pipeline with containerisation for cross-platform reproducibility and scalability in high-performance computing environments. The tool is publicly available from https://gitlab.com/s.senkin/MSA.

Download Full-text

NeuroPM toolbox: integrating Molecular, Neuroimaging and Clinical data for Characterizing Neuropathological Progression and Individual Therapeutic Needs

10.1101/2020.09.24.20200964 ◽

2020 ◽

Author(s):

Yasser Iturria-Medina ◽

Felix Carbonell ◽

Atoussa Assadi ◽

Quadri Adewale ◽

Ahmed F. Khan ◽

...

Keyword(s):

Open Access ◽

High Performance ◽

Large Scale ◽

Synergistic Interactions ◽

Academic Researchers ◽

Cross Platform ◽

Therapeutic Needs ◽

User Friendly ◽

Performance Computing

There is a critical need for a better multiscale and multifactorial understanding of neurological disorders, covering from genes to neuroimaging to clinical factors and treatments effects. Here we present NeuroPM-box, a cross-platform, user-friendly and open-access software for characterizing multiscale and multifactorial brain pathological mechanisms and identifying individual therapeutic needs. The implemented methods have been extensively tested and validated in the neurodegenerative context, but there is not restriction in the kind of disorders that can be analyzed. By using advanced analytic modeling of molecular, neuroimaging and/or cognitive/behavioral data, this framework allows multiple applications, including characterization of: (i) the series of sequential states (e.g. transcriptomic, imaging or clinical alterations) covering decades of disease progression, (ii) intra-brain spreading of pathological factors (e.g. amyloid and tau misfolded proteins), (iii) synergistic interactions between multiple brain biological factors (e.g. direct tau effects on vascular and structural properties), and (iv) biologically-defined patients stratification based on therapeutic needs (i.e. optimum treatments for each patient). All models outputs are biologically interpretable. A 4D-viewer allows visualization of spatiotemporal brain (dis)organization. Originally implemented in MATLAB, NeuroPM-box is compiled as standalone application for Windows, Linux and Mac environments: neuropm-lab.com/software. In a regular workstation, it can analyze over 150 subjects per day, reducing the need for using clusters or High-Performance Computing (HPC) for large-scale datasets. This open-access tool for academic researchers may significantly contribute to a better understanding of complex brain processes and to accelerating the implementation of Precision Medicine (PM) in neurology.

Download Full-text

Large-scale HPC deployment of Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN)

EPJ Web of Conferences ◽

10.1051/epjconf/202024509011 ◽

2020 ◽

Vol 245 ◽

pp. 09011

Author(s):

Michael Hildreth ◽

Kenyi Paolo Hurtado Anampa ◽

Cody Kankel ◽

Scott Hampton ◽

Paul Brenner ◽

...

Keyword(s):

Artificial Intelligence ◽

Data Analysis ◽

High Performance Computing ◽

High Throughput ◽

High Performance ◽

Large Scale ◽

Starting Point ◽

Virtual Clusters ◽

Analysis Platform ◽

Performance Computing

The NSF-funded Scalable CyberInfrastructure for Artificial Intelligence and Likelihood Free Inference (SCAILFIN) project aims to develop and deploy artificial intelligence (AI) and likelihood-free inference (LFI) techniques and software using scalable cyberinfrastructure (CI) built on top of existing CI elements. Specifically, the project has extended the CERN-based REANA framework, a cloud-based data analysis platform deployed on top of Kubernetes clusters that was originally designed to enable analysis reusability and reproducibility. REANA is capable of orchestrating extremely complicated multi-step workflows, and uses Kubernetes clusters both for scheduling and distributing container-based workloads across a cluster of available machines, as well as instantiating and monitoring the concrete workloads themselves. This work describes the challenges and development efforts involved in extending REANA and the components that were developed in order to enable large scale deployment on High Performance Computing (HPC) resources. Using the Virtual Clusters for Community Computation (VC3) infrastructure as a starting point, we implemented REANA to work with a number of differing workload managers, including both high performance and high throughput, while simultaneously removing REANA’s dependence on Kubernetes support at the workers level.

Download Full-text