scholarly journals Cloud-based DIA data analysis module for signal refinement improves accuracy and throughput of large datasets

2021 ◽  
Author(s):  
Karen E. Christianson ◽  
Jacob. D. Jaffe ◽  
Steven A. Carr ◽  
Alvaro Sebastian Vaca Jacome

AbstractData-independent acquisition (DIA) is a powerful mass spectrometry method that promises higher coverage, reproducibility, and throughput than traditional quantitative proteomics approaches. However, the complexity of DIA data caused by fragmentation of co-isolating peptides presents significant challenges for confident assignment of identity and quantity, information that is essential for deriving meaningful biological insight from the data. To overcome this problem, we previously developed Avant-garde, a tool for automated signal refinement of DIA and other targeted mass spectrometry data. AvG is designed to work alongside existing tools for peptide detection to address the reliability and quantitative suitability of signals extracted for the identified peptides. While its use is straightforward and offers efficient refinement for small datasets, the execution of AvG for large DIA datasets is time-consuming, especially if run with limited computational resources. To overcome these limitations, we present here an improved, cloud-based implementation of the AvG algorithm deployed on Terra, a user-friendly cloud-based platform for large-scale data analysis and sharing, as an accessible and standardized resource to the wider community.

2017 ◽  
Author(s):  
Payam Emami Khoonsari ◽  
Pablo Moreno ◽  
Sven Bergmann ◽  
Joachim Burman ◽  
Marco Capuccini ◽  
...  

Developing a robust and performant data analysis workflow that integrates all necessary components whilst still being able to scale over multiple compute nodes is a challenging task. We introduce a generic method based on the microservice architecture, where software tools are encapsulated as Docker containers that can be connected into scientific workflows and executed in parallel using the Kubernetes container orchestrator. The access point is a virtual research environment which can be launched on-demand on cloud resources and desktop computers. IT-expertise requirements on the user side are kept to a minimum, and established workflows can be re-used effortlessly by any novice user. We validate our method in the field of metabolomics on two mass spectrometry studies, one nuclear magnetic resonance spectroscopy study and one fluxomics study, showing that the method scales dynamically with increasing availability of computational resources. We achieved a complete integration of the major software suites resulting in the first turn-key workflow encompassing all steps for mass-spectrometry-based metabolomics including preprocessing, multivariate statistics, and metabolite identification. Microservices is a generic methodology that can serve any scientific discipline and opens up for new types of large-scale integrative science.


2020 ◽  
Author(s):  
Leon Bichmann ◽  
Shubham Gupta ◽  
George Rosenberger ◽  
Leon Kuchenbecker ◽  
Timo Sachsenberg ◽  
...  

ABSTRACTData-independent acquisition (DIA) is becoming a leading analysis method in biomedical mass spectrometry. Main advantages include greater reproducibility, sensitivity and dynamic range compared to data-dependent acquisition (DDA). However, data analysis is complex and often requires expert knowledge when dealing with large-scale data sets. Here we present DIAproteomics a multi-functional, automated high-throughput pipeline implemented in Nextflow that allows to easily process proteomics and peptidomics DIA datasets on diverse compute infrastructures. Central components are well-established tools such as the OpenSwathWorkflow for DIA spectral library search and PyProphet for false discovery rate assessment. In addition, it provides options to generate spectral libraries from existing DDA data and carry out retention time and chromatogram alignment. The output includes annotated tables and diagnostic visualizations from statistical post-processing and computation of fold-changes across pairwise conditions, predefined in an experimental design. DIAproteomics is open-source software and available under a permissive license to the scientific community at https://www.openms.de/diaproteomics/.


1983 ◽  
Vol 38 ◽  
pp. 1-9
Author(s):  
Herbert F. Weisberg

We are now entering a new era of computing in political science. The first era was marked by punched-card technology. Initially, the most sophisticated analyses possible were frequency counts and tables produced on a counter-sorter, a machine that specialized in chewing up data cards. By the early 1960s, batch processing on large mainframe computers became the predominant mode of data analysis, with turnaround time of up to a week. By the late 1960s, turnaround time was cut down to a matter of a few minutes and OSIRIS and then SPSS (and more recently SAS) were developed as general-purpose data analysis packages for the social sciences. Even today, use of these packages in batch mode remains one of the most efficient means of processing large-scale data analysis.


2017 ◽  
Vol 16 (7) ◽  
pp. 2645-2652 ◽  
Author(s):  
Mathieu Courcelles ◽  
Jasmin Coulombe-Huntington ◽  
Émilie Cossette ◽  
Anne-Claude Gingras ◽  
Pierre Thibault ◽  
...  

2021 ◽  
Author(s):  
Scott A. Jarmusch ◽  
Justin J. J. van der Hooft ◽  
Pieter C. Dorrestein ◽  
Alan K. Jarmusch

This review covers the current and potential use of mass spectrometry-based metabolomics data mining in natural products. Public data, metadata, databases and data analysis tools are critical. The value and success of data mining rely on community participation.


Informatica ◽  
2011 ◽  
Vol 22 (1) ◽  
pp. 1-10 ◽  
Author(s):  
Gintautas Dzemyda ◽  
Leonidas Sakalauskas

Sign in / Sign up

Export Citation Format

Share Document