MS-PyCloud: An open-source, cloud computing-based pipeline for LC-MS/MS data analysis

Rapid development and wide adoption of mass spectrometry-based proteomics technologies have empowered scientists to study proteins and their modifications in complex samples on a large scale. This progress has also created unprecedented challenges for individual labs to store, manage and analyze proteomics data, both in the cost of proprietary software and high-performance computing and in the long processing time that discourages on-the-fly changes of data processing settings required in explorative and discovery analysis. We developed an open-source, cloud computing-based pipeline, MS-PyCloud, with graphical user interface (GUI) support, for LC-MS/MS data analysis. The major components of this pipeline include data file integrity validation, MS/MS database search for spectral assignment, false discovery rate estimation, protein inference, determination of protein post-translational modifications, and quantitation of specific (modified) peptides and proteins. To ensure the transparency and reproducibility of data analysis, MS-PyCloud includes open-source software tools with comprehensive testing and versioning for spectrum assignments. Leveraging public cloud computing infrastructure via Amazon Web Services (AWS), MS-PyCloud scales seamlessly based on analysis demand to achieve fast and efficient performance. Application of the pipeline to the analysis of large-scale iTRAQ/TMT LC-MS/MS data sets demonstrated the effectiveness and high performance of MS-PyCloud. The software can be downloaded at: https://bitbucket.org/mschnau/ms-pycloud/downloads/

Cloud Computing Enabled Big Multi-Omics Data Analytics

Bioinformatics and Biology Insights ◽

10.1177/11779322211035921 ◽

2021 ◽

Vol 15 ◽

pp. 117793222110359

Author(s):

Saraswati Koppad ◽

Annappa B ◽

Georgios V Gkoutos ◽

Animesh Acharjee

Keyword(s):

Cloud Computing ◽

Data Analytics ◽

Large Scale ◽

Low Cost ◽

Data Sets ◽

Omics Data ◽

Proteomics Data ◽

Phenotypic Data ◽

Multifactorial Diseases ◽

Big Data Technologies

High-throughput experiments enable researchers to explore complex multifactorial diseases through large-scale analysis of omics data. Challenges for such high-dimensional data sets include storage, analysis, and sharing. Recent innovations in computational technologies and approaches, especially in cloud computing, offer a promising, low-cost, and highly flexible solution in the bioinformatics domain. Cloud computing is proving increasingly useful in molecular modeling, in omics data analytics (eg, RNA sequencing, metabolomics, or proteomics data sets), and in the integration, analysis, and interpretation of phenotypic data. We review the adoption of advanced cloud-based and big data technologies for processing and analyzing omics data and provide insights into state-of-the-art cloud bioinformatics applications.

The Open Cloud Testbed: Supporting Open Source Cloud Computing Systems Based on Large Scale High Performance, Dynamic Network Services

Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering - Networks for Grid Applications ◽

10.1007/978-3-642-11733-6_10 ◽

2010 ◽

pp. 89-97

Author(s):

Robert Grossman ◽

Yunhong Gu ◽

Michal Sabala ◽

Colin Bennet ◽

Jonathan Seidman ◽

...

Keyword(s):

Cloud Computing ◽

Open Source ◽

High Performance ◽

Large Scale ◽

Dynamic Network ◽

Network Services ◽

Computing Systems

Low Cost, Scalable Proteomics Data Analysis Using Amazon’s Cloud Computing Services and Open Source Search Algorithms

Journal of Proteome Research ◽

10.1021/pr800970z ◽

2009 ◽

Vol 8 (6) ◽

pp. 3148-3153 ◽

Cited By ~ 50

Author(s):

Brian D. Halligan ◽

Joey F. Geiger ◽

Andrew K. Vallejos ◽

Andrew S. Greene ◽

Simon N. Twigger

Keyword(s):

Cloud Computing ◽

Data Analysis ◽

Open Source ◽

Low Cost ◽

Search Algorithms ◽

Proteomics Data ◽

Computing Services ◽

Cloud Computing Services ◽

Proteomics Data Analysis

Cloud-based intelligent self-diagnosis and department recommendation service using Chinese medical BERT

Journal of Cloud Computing: Advances, Systems and Applications ◽

10.1186/s13677-020-00218-2 ◽

2021 ◽

Vol 10 (1) ◽

Author(s):

Junshu Wang ◽

Guoming Zhang ◽

Wei Wang ◽

Ka Zhang ◽

Yehua Sheng

Keyword(s):

Cloud Computing ◽

Large Scale ◽

Medical Service ◽

Rapid Development ◽

Medical Knowledge ◽

Language Models ◽

Computing Environment ◽

Computing Power ◽

Cloud Computing Environment ◽

Proposed Model

With the rapid development of hospital informatization and Internet medical services in recent years, most hospitals have launched online appointment registration systems to eliminate patient queues and improve the efficiency of medical services. However, most patients lack professional medical knowledge and do not know which department to choose when registering. To guide patients in seeking medical care and registering effectively, we proposed CIDRS, an intelligent self-diagnosis and department recommendation framework based on Chinese medical Bidirectional Encoder Representations from Transformers (BERT) in the cloud computing environment. We also established a Chinese BERT model (CHMBERT) trained on a large-scale Chinese medical text corpus. This model was used to optimize self-diagnosis and department recommendation tasks. To address the limited computing power of terminals, we deployed the proposed framework in a cloud computing environment based on container and micro-service technologies. Real-world medical datasets from hospitals were used in the experiments, and results showed that the proposed model was superior to the traditional deep learning models and other pre-trained language models in terms of performance.

High Performance Computational Analysis of Large-scale Proteome Data Sets to Assess Incremental Contribution to Coverage of the Human Genome

Journal of Proteome Research ◽

10.1021/pr400181q ◽

2013 ◽

Vol 12 (6) ◽

pp. 2858-2868 ◽

Cited By ~ 29

Author(s):

Nadin Neuhauser ◽

Nagarjuna Nagaraj ◽

Peter McHardy ◽

Sara Zanivan ◽

Richard Scheltema ◽

...

Keyword(s):

Human Genome ◽

High Performance ◽

Large Scale ◽

Computational Analysis ◽

Data Sets

Massive Image Treatment System Based on Cloud Computing Platform

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.687-691.3733 ◽

2014 ◽

Vol 687-691 ◽

pp. 3733-3737

Author(s):

Dan Wu ◽

Ming Quan Zhou ◽

Rong Fang Bie

Keyword(s):

Image Processing ◽

Cloud Computing ◽

High Performance ◽

Large Scale ◽

Processing System ◽

Virtual Space ◽

Image Processing System ◽

Computing Platform ◽

Simulation Calculation ◽

Computer Resources

Massive image processing places high demands on processor and memory, requiring a high-performance processor and large-capacity memory. Single-processor or single-core processing and traditional memory cannot satisfy these demands. This paper introduces cloud computing into the massive image processing system. Cloud computing expands the virtual space of the system, saves computer resources, and improves the efficiency of image processing. The system uses a multi-core DSP parallel processor, and the visualization parameter-setting window and the result output are developed with VC software. Simulation yields the image processing speed curve and the system image adaptive curve. The work provides a technical reference for the design of large-scale image processing systems.

Large Scale Field Development Optimization Using High Performance Parallel Simulation and Cloud Computing Technology

10.2118/191728-ms ◽

2018 ◽

Cited By ~ 4

Author(s):

Shusei Tanaka ◽

Zhenzhen Wang ◽

Kaveh Dehghani ◽

Jincong He ◽

Baskar Velusamy ◽

...

Keyword(s):

Cloud Computing ◽

High Performance ◽

Large Scale ◽

Parallel Simulation ◽

Computing Technology ◽

Field Development ◽

Scale Field ◽

Large Scale Field

Accelerating Large-Scale Data Analysis by Offloading to High-Performance Computing Libraries using Alchemist

Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining ◽

10.1145/3219819.3219927 ◽

2018 ◽

Cited By ~ 2

Author(s):

Alex Gittens ◽

Kai Rothauge ◽

Shusen Wang ◽

Michael W. Mahoney ◽

Lisa Gerhardt ◽

...

Keyword(s):

Data Analysis ◽

High Performance Computing ◽

High Performance ◽

Large Scale ◽

Large Scale Data ◽

Performance Computing ◽

Scale Data

Denoising large-scale biological data using network filters

10.21203/rs.3.rs-66071/v2 ◽

2021 ◽

Author(s):

Andrew J Kavran ◽

Aaron Clauset

Keyword(s):

Large Scale ◽

Synthetic Data ◽

Interaction Network ◽

Learning Task ◽

Biological Data ◽

Data Sets ◽

Proteomics Data ◽

Life History Variation ◽

Wide Range ◽

Underlying Processes

Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy up to 43% compared to using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogenous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology.
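
The idea of combining correlated measurements over an interaction network can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a simple "mean" filter in which every node's value is replaced by the average of its own value and those of its network neighbors, and it ignores anti-correlated edges and module decomposition. The function name `mean_network_filter` and the toy data are hypothetical.

```python
from statistics import mean


def mean_network_filter(values, neighbors):
    """Smooth each node's value using its network neighborhood.

    Assumption for illustration: all edges connect positively correlated
    measurements, so a plain average of a node and its neighbors is used.
    `values` maps node -> measurement; `neighbors` maps node -> list of nodes.
    """
    filtered = {}
    for node, value in values.items():
        # Collect the node's own value plus the values of its neighbors.
        group = [value] + [values[n] for n in neighbors.get(node, [])]
        filtered[node] = mean(group)
    return filtered


# Toy example: A, B, C form a triangle; D has no neighbors.
values = {"A": 1.0, "B": 3.0, "C": 2.0, "D": 10.0}
neighbors = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"], "D": []}

filtered = mean_network_filter(values, neighbors)
print(filtered)
```

In this toy case the three connected nodes are pulled toward their shared average, while the isolated node keeps its original value, since it has no neighbors to average with.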

Usage and Scaling of an Open-Source Spiking Multi-Area Model of Monkey Cortex

Lecture Notes in Computer Science - Brain-Inspired Computing ◽

10.1007/978-3-030-82427-3_4 ◽

2021 ◽

pp. 47-59

Author(s):

Sacha J. van Albada ◽

Jari Pronold ◽

Alexander van Meegen ◽

Markus Diesmann

Keyword(s):

Open Source ◽

Large Scale ◽

Network Models ◽

Macaque Monkey ◽

Source Model ◽

Model Specification ◽

Data Sets ◽

Neural Network Models ◽

Wide Range ◽

ICT Infrastructure

We are entering an age of ‘big’ computational neuroscience, in which neural network models are increasing in size and in numbers of underlying data sets. Consolidating the zoo of models into large-scale models simultaneously consistent with a wide range of data is only possible through the effort of large teams, which can be spread across multiple research institutions. To ensure that computational neuroscientists can build on each other’s work, it is important to make models publicly available as well-documented code. This chapter describes such an open-source model, which relates the connectivity structure of all vision-related cortical areas of the macaque monkey with their resting-state dynamics. We give a brief overview of how to use the executable model specification, which employs NEST as the simulation engine, and show its runtime scaling. The solutions found serve as an example for organizing the workflow of future models from the raw experimental data to the visualization of the results, expose the challenges, and give guidance for the construction of an ICT infrastructure for neuroscience.
