Metabolomics Data Processing and Data Analysis—Current Best Practices

Abstract

Nuclear Magnetic Resonance (NMR) spectroscopy is, together with liquid chromatography-mass spectrometry (LC-MS), the most established platform to perform metabolomics. In contrast to LC-MS, however, NMR data are predominantly processed with commercial software. As a result, NMR data processing remains tedious and dependent on user intervention. As a follow-up to speaq, a previously released workflow for NMR spectral alignment and quantitation, we present speaq 2.0. This completely revised framework to automatically analyze 1D NMR spectra uses wavelets to efficiently summarize the raw spectra with minimal information loss or user interaction. The tool offers a fast and easy workflow that starts with the common approach of peak-picking, followed by grouping. This yields a matrix consisting of features, samples and peak values that can be conveniently processed either by using included multivariate statistical functions or by using many other recently developed methods for NMR data analysis. speaq 2.0 facilitates robust and high-throughput metabolomics based on 1D NMR but is also compatible with other NMR frameworks or complementary LC-MS workflows. The methods are benchmarked using two publicly available datasets. speaq 2.0 is distributed through the existing speaq R package to provide a complete solution for NMR data processing. The package and the code for the presented case studies are freely available on CRAN (https://cran.r-project.org/package=speaq) and GitHub (https://github.com/beirnaert/speaq).

Author summary

We present speaq 2.0: a user-friendly workflow for processing NMR spectra quickly and easily. By limiting the need for user interaction and allowing the construction of workflows by combining R functions, metabolomics data analysis becomes fully reproducible and shareable. Such advances are critical for the future of the metabolomics field as it needs to move towards a fully open-science approach.
This is no trivial goal as many researchers are still using black-box commercial software that often requires manually doing several steps, thus hampering reproducibility. To encourage the shift towards open source, we deliberately made our method usable for anyone with the most basic of R experience, something that is easily acquired. speaq 2.0 allows a stand-alone analysis from spectra to statistical analysis. In addition, the package can be combined with existing tools to improve performance, as it provides a superior peak picking method compared to the standard binning approach.
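
The peak-picking and grouping steps described above can be illustrated with a minimal Python sketch. This is not the speaq 2.0 algorithm: speaq is an R package and uses wavelets, whereas the sketch below swaps in a plain local-maximum detector and a simple position-tolerance grouping. The function names (`pick_peaks`, `group_peaks`, `feature_matrix`), the threshold and the tolerance are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (position in ppm, peak height)


def pick_peaks(ppm: List[float], intensity: List[float], threshold: float) -> List[Peak]:
    """Return every local maximum above `threshold` (a stand-in for wavelet peak picking)."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        is_max = intensity[i] > intensity[i - 1] and intensity[i] >= intensity[i + 1]
        if is_max and intensity[i] > threshold:
            peaks.append((ppm[i], intensity[i]))
    return peaks


def group_peaks(peaks_per_sample: List[List[Peak]], tol: float):
    """Group peaks from all samples; a peak joins the current group if it lies
    within `tol` ppm of the previous peak in position order."""
    tagged = sorted(
        (pos, height, sample)
        for sample, peaks in enumerate(peaks_per_sample)
        for pos, height in peaks
    )
    groups = []
    for peak in tagged:
        if groups and peak[0] - groups[-1][-1][0] <= tol:
            groups[-1].append(peak)
        else:
            groups.append([peak])
    return groups


def feature_matrix(groups, n_samples: int):
    """Build a samples x features matrix of peak heights (0.0 where a sample has no peak)."""
    matrix = [[0.0] * len(groups) for _ in range(n_samples)]
    for j, group in enumerate(groups):
        for _pos, height, sample in group:
            matrix[sample][j] = max(matrix[sample][j], height)
    return matrix


# Two tiny synthetic "spectra" on a shared ppm axis (made-up numbers).
ppm_axis = [1.00, 1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08]
spectra = [
    [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 3.0, 1.0, 0.0],
    [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 2.0, 0.0, 0.0],
]
picked = [pick_peaks(ppm_axis, s, threshold=0.5) for s in spectra]
features = feature_matrix(group_peaks(picked, tol=0.015), n_samples=len(spectra))
```

Here the slightly shifted peaks at 1.02 and 1.03 ppm end up in the same feature column, which is the purpose of the grouping step: each row of `features` is a sample and each column a grouped peak.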

With Guide of Spike-in Experiment for Optimizing Workflow of LC-MS data Processing in Metabolomics

Natural Product Communications ◽ 10.1177/1934578x1701200837 ◽ 2017 ◽ Vol 12 (8) ◽ pp. 1934578X1701200

Author(s): Bing-Peng Yan ◽ Chun-Mei Cao ◽ Jin-Jun Hou ◽ Qi-Rui Bi ◽ Min Yang ◽ ...

Keyword(s): Data Analysis ◽ Data Processing ◽ Multivariate Statistics ◽ Design Of Experiment ◽ Data Preprocessing ◽ Accuracy Evaluation ◽ Metabolomics Data ◽ Scaling Methods ◽ Positive Rate

A systematic study was performed to investigate the processing workflow for LC-MS-based metabolomics data by optimizing parameter settings in XCMS software and comparing different preprocessing methods. A spike-in experiment was combined with design of experiment (DoE) approaches to optimize the XCMS parameters. A trusted index, based on accuracy evaluation of the spike-in data, was used to assess the optimization process. After optimizing the XCMS settings, the trusted index improved from 3.67 to 30 and the positive rate of spike-in standards increased from 20% to 100%. Different data preprocessing methods, such as normalization and various scaling methods, were also investigated on the spike-in data, since they were found to affect the outcome of the data analysis and the identification of ion features. Accordingly, UN-normalization and Pareto scaling were chosen as appropriate preprocessing methods for LC-MS data, based on evaluation of a match index (mainly applying multivariate statistical methods). Finally, the optimized workflow was applied to experimental samples that were acquired in a metabolomics experiment and analyzed in random order together with the spike-in sample, which indicated better applicability in a formal metabolomics experiment. It is concluded that the proposed data processing workflow can serve as a feasible approach for improving the quality of LC-MS-based metabolomics data and, to a certain extent, helps ensure the veracity of metabolite identification during data processing.
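
For readers unfamiliar with Pareto scaling, the following Python sketch shows the usual definition: each variable (column) is mean-centered and divided by the square root of its standard deviation. It is an illustration only and not the authors' code; the function name `pareto_scale`, the use of the sample standard deviation, and the handling of constant columns are assumptions made here.

```python
import math
import statistics
from typing import List


def pareto_scale(matrix: List[List[float]]) -> List[List[float]]:
    """Pareto-scale a samples x features matrix column by column.

    Each value becomes (x - column mean) / sqrt(column standard deviation).
    Assumption: the sample standard deviation (n - 1 denominator) is used,
    and constant columns are only centered, to avoid division by zero.
    """
    scaled_columns = []
    for column in zip(*matrix):
        mean = statistics.fmean(column)
        sd = statistics.stdev(column)
        denom = math.sqrt(sd) if sd > 0 else 1.0
        scaled_columns.append([(x - mean) / denom for x in column])
    # Transpose back to samples x features.
    return [list(row) for row in zip(*scaled_columns)]


# Made-up intensities: 3 samples, 2 features (the second feature is constant).
example = [[1.0, 10.0], [3.0, 10.0], [5.0, 10.0]]
scaled = pareto_scale(example)
```

Compared with unit-variance scaling, which divides by the standard deviation itself, dividing by its square root reduces the dominance of high-intensity features while inflating noise in low-intensity features less.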

Empowering users - self-service metabolomics data analysis for everyone?

Authorea ◽ 10.22541/au.151326160.02842606 ◽ 2017

Author(s): Jianguo Xia ◽ Backstories Admin

Keyword(s): Data Analysis ◽ Metabolomics Data ◽ Self Service

Advancements in capturing and mining mass spectrometry data are transforming natural products research

Natural Product Reports ◽ 10.1039/d1np00040c ◽ 2021

Author(s): Scott A. Jarmusch ◽ Justin J. J. van der Hooft ◽ Pieter C. Dorrestein ◽ Alan K. Jarmusch

Keyword(s): Mass Spectrometry ◽ Data Mining ◽ Natural Products ◽ Data Analysis ◽ Community Participation ◽ Mass Spectrometry Data ◽ Metabolomics Data ◽ Analysis Tools ◽ Public Data ◽ Potential Use

This review covers the current and potential use of mass spectrometry-based metabolomics data mining in natural products. Public data, metadata, databases and data analysis tools are critical. The value and success of data mining rely on community participation.

A Review on Metabolomics Data Analysis for Cancer Applications

Practical Applications of Computational Biology and Bioinformatics, 12th International Conference - Advances in Intelligent Systems and Computing ◽ 10.1007/978-3-319-98702-6_19 ◽ 2018 ◽ pp. 157-165

Author(s): Sara Cardoso ◽ Delora Baptista ◽ Rebeca Santos ◽ Miguel Rocha

Keyword(s): Data Analysis ◽ Metabolomics Data

SAR Speckle Filtering and Agriculture Field Size: Development of SAR Data Processing Best Practices for the JECAM SAR Inter-Comparison Experiment

IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium ◽ 10.1109/igarss.2018.8519299 ◽ 2018

Author(s): Laura Dingle Robertson ◽ Andrew Davidson ◽ Heather McNairn ◽ Mehdi Hosseini ◽ Scott Mitchell ◽ ...

Keyword(s): Best Practices ◽ Data Processing ◽ Field Size ◽ Comparison Experiment ◽ Sar Data ◽ Speckle Filtering

Corrigendum to “Comprehensive evaluation of untargeted metabolomics data processing software in feature detection, quantification and discriminating marker selection” [ACA 1029, (2018) 50–57]

Analytica Chimica Acta ◽ 10.1016/j.aca.2018.10.029 ◽ 2018 ◽ Vol 1044 ◽ pp. 199

Author(s): Zhucui Li ◽ Yan Lu ◽ Yufeng Guo ◽ Haijie Cao ◽ Qinhong Wang ◽ ...

Keyword(s): Data Processing ◽ Feature Detection ◽ Comprehensive Evaluation ◽ Untargeted Metabolomics ◽ Metabolomics Data ◽ Marker Selection ◽ Data Processing Software ◽ Processing Software

Extension of the sasCIF format and its applications for data processing and deposition

Journal of Applied Crystallography ◽ 10.1107/s1600576715024942 ◽ 2016 ◽ Vol 49 (1) ◽ pp. 302-310 ◽ Cited By ~ 8

Author(s): Michael Kachala ◽ John Westbrook ◽ Dmitri Svergun

Keyword(s): Data Analysis ◽ Data Processing ◽ Data Exchange ◽ Hybrid Methods ◽ Data Bank ◽ Relevant Information ◽ Experimental Information ◽ Biological Data ◽ Task Forces ◽ Software Modules

Recent advances in small-angle scattering (SAS) experimental facilities and data analysis methods have prompted a dramatic increase in the number of users and of projects conducted, causing an upsurge in the number of objects studied, experimental data available and structural models generated. To organize the data and models and make them accessible to the community, the Task Forces on SAS and hybrid methods for the International Union of Crystallography and the Worldwide Protein Data Bank envisage developing a federated approach to SAS data and model archiving. Within the framework of this approach, the existing databases may exchange information and provide independent but synchronized entries to users. At present, ways of exchanging information between the various SAS databases are not established, leading to possible duplication and incompatibility of entries, and limiting the opportunities for data-driven research for SAS users. In this work, a solution is developed to resolve these issues and provide a universal exchange format for the community, based on the use of the widely adopted crystallographic information framework (CIF). The previous version of the sasCIF format, implemented as an extension of the core CIF dictionary, has been available since 2000 to facilitate SAS data exchange between laboratories. The sasCIF format has now been extended to describe comprehensively the necessary experimental information, results and models, including relevant metadata for SAS data analysis and for deposition into a database. Processing tools for these files (sasCIFtools) have been developed, and these are available both as standalone open-source programs and integrated into the SAS Biological Data Bank, allowing the export and import of data entries as sasCIF files. Software modules to save the relevant information directly from beamline data-processing pipelines in sasCIF format are also developed. 
This update of sasCIF and the relevant tools are an important step in the standardization of the way SAS data are presented and exchanged, to make the results easily accessible to users and to promote further the application of SAS in the structural biology community.

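Pengaruh Reliability, Responsiveness, Assurance, Empathy dan Tangibles Terhadap Kepuasan Konsumen GrabBike [The Influence of Reliability, Responsiveness, Assurance, Empathy and Tangibles on GrabBike Consumer Satisfaction]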

JEKPEND Jurnal Ekonomi dan Pendidikan ◽ 10.26858/jekpend.v3i2.14307 ◽ 2020 ◽ Vol 3 (2) ◽ pp. 34

Author(s): Lusiana Lusiana ◽ Salamun Pasda ◽ Mustari Mustari ◽ Muhammad Ihsan Said Ahmad ◽ Muhammad Hasan

Keyword(s): Regression Analysis ◽ Data Analysis ◽ Data Processing ◽ Significant Influence ◽ Random Sampling ◽ Consumer Satisfaction ◽ Quantitative Approach ◽ Research Results ◽ Transport Services ◽ Analysis Technique

The purpose of this research is to examine how reliability, responsiveness, assurance, empathy and tangibles affect the satisfaction of users of online transport services among UNM Economics students. The research used a survey method with a quantitative approach. The population is all students from 2016-2019 who use GrabBike online transport, a total of 206 people. A sample of 67 people was drawn by random sampling. The instruments used to collect data were a questionnaire, observation, and documentation. The data analysis techniques used were regression analysis and hypothesis testing with t and F tests. Data processing used SPSS version 21 for Windows. The results show that the variables reliability, responsiveness, assurance, empathy, and tangibles each have a partial positive and significant effect on the satisfaction of GrabBike online transport users. Simultaneously, reliability, responsiveness, assurance, empathy and tangibles have a positive and significant influence on the satisfaction of consumers of GrabBike online transport services.

The Effect of the Use of Canva Application Learning Media on the Creativity of Students in Language Studio Extracurricular Activities at SMPN 1 Tanjung Emas

Spektrum: Jurnal Pendidikan Luar Sekolah (PLS) ◽ 10.24036/spektrumpls.v9i4.113842 ◽ 2021 ◽ Vol 9 (4) ◽ pp. 506

Author(s): Deni Putri Sartika ◽ Vevi Sunarti

Keyword(s): Data Analysis ◽ Data Processing ◽ Experimental Method ◽ Extracurricular Activities ◽ School Principals ◽ Paired Sample ◽ Significant Difference ◽ Quasi Experimental ◽ Using Data ◽ Efficient Learning

This research is motivated by the low creativity of students in language studio extracurricular activities at SMPN 1 Tanjung Emas. One factor thought to strongly influence this low creativity is the use of learning media that are not effective and efficient. This study aims to examine the creativity of students when the Canva application is used as learning media in language studio extracurricular activities at SMPN 1 Tanjung Emas. The research takes a quantitative approach with a quasi-experimental method. The population comprised the 25 students in the language studio extracurricular activities at SMPN 1 Tanjung Emas, all of whom were included in the research sample. The data were analyzed with a paired sample t-test. From the data processing, it can be concluded that there is a significant difference between the creativity of students before the treatment (pretest) and after the treatment (posttest) using the Canva application learning media. The study suggests that educators further increase the creativity of students in using the Canva application learning media, and that the school principal at SMPN 1 Tanjung Emas, as leader of the school, further increase the availability of facilities and infrastructure, especially effective and efficient learning media, to support the creativity of students.
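
As a brief illustration of the paired sample t-test mentioned above, the Python sketch below computes the t statistic and degrees of freedom from pretest and posttest scores. The scores are made-up numbers, not data from the study, and the function name `paired_t_statistic` is hypothetical. Obtaining a p-value would additionally require the t distribution, which this standard-library sketch leaves out.

```python
import math
import statistics
from typing import List, Tuple


def paired_t_statistic(before: List[float], after: List[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for a paired sample t-test.

    t = mean(d) / (sd(d) / sqrt(n)), where d are the per-subject differences
    (after - before) and sd is the sample standard deviation.
    """
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical creativity scores for four students (illustration only).
pretest = [60.0, 65.0, 70.0, 55.0]
posttest = [66.0, 70.0, 73.0, 63.0]
t_value, dof = paired_t_statistic(pretest, posttest)
```

The resulting t value is compared against the critical value of the t distribution with `dof` degrees of freedom to decide whether the pretest and posttest means differ significantly.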
