WaveICA 2.0: a novel batch effect removal method for untargeted metabolomics data without using batch information

Metabolomics ◽  
2021 ◽  
Vol 17 (10) ◽  
Author(s):  
Kui Deng ◽  
Falin Zhao ◽  
Zhiwei Rong ◽  
Lei Cao ◽  
Liuchao Zhang ◽  
...  
Metabolites ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 8
Author(s):  
Michiel Bongaerts ◽  
Ramon Bonte ◽  
Serwet Demirdas ◽  
Edwin H. Jacobs ◽  
Esmee Oussoren ◽  
...  

Untargeted metabolomics is an emerging technology in the laboratory diagnosis of inborn errors of metabolism (IEM). Analysis of a large number of reference samples is crucial for correcting variations in metabolite concentrations that result from factors such as diet, age, and gender, in order to judge whether metabolite levels are abnormal. However, a large number of reference samples requires the use of out-of-batch samples, which is hampered by the semi-quantitative nature of untargeted metabolomics data, i.e., technical variations between batches. Methods to merge and accurately normalize data from multiple batches are urgently needed. Using six metrics, we compared existing normalization methods on their ability to reduce batch effects across nine independently processed batches. Many of these showed only marginal performance, which motivated us to develop Metchalizer, a normalization method that uses 10 stable isotope-labeled internal standards and a mixed effect model. In addition, we propose a regression model with age and sex as covariates, fitted on reference samples obtained from all nine batches. Metchalizer applied to log-transformed data showed the most promising performance in batch effect removal, as well as in the detection of 195 known biomarkers across 49 IEM patient samples, and performed at least as well as an approach utilizing 15 within-batch reference samples. Furthermore, our regression model indicates that 6.5–37% of the considered features showed significant age-dependent variations. Our comprehensive comparison of normalization methods showed that our Log-Metchalizer approach enables the use of out-of-batch reference samples to establish clinically relevant reference values for metabolite concentrations. These findings open the possibility of using large-scale out-of-batch reference samples in a clinical setting, increasing throughput and detection accuracy.
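The covariate regression idea described above (fit a model with age and sex on log-transformed reference data, then judge whether a patient's level is abnormal) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses ordinary least squares via NumPy instead of Metchalizer's internal-standard mixed effect model, codes sex as 0/1, and expresses abnormality as a z-score of the residual. The function names `fit_reference_model` and `z_score` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_reference_model(log_values, age, sex):
    """Fit log_values ~ b0 + b1*age + b2*sex by ordinary least squares.

    sex is assumed to be coded 0/1 (a hypothetical coding). Returns the
    coefficient vector and the residual standard deviation.
    """
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex, dtype=float)
    X = np.column_stack([np.ones_like(age), age, sex])
    coef, *_ = np.linalg.lstsq(X, log_values, rcond=None)
    residuals = log_values - X @ coef
    # Residual SD: divide by n minus the number of fitted parameters.
    sd = np.sqrt(residuals @ residuals / (len(log_values) - X.shape[1]))
    return coef, sd

def z_score(log_value, age, sex, coef, sd):
    """Number of residual SDs a sample lies from its covariate-adjusted expectation."""
    expected = coef[0] + coef[1] * age + coef[2] * sex
    return (log_value - expected) / sd

# Synthetic reference cohort (illustrative data only, not from the study).
rng = np.random.default_rng(0)
ages = rng.uniform(1, 60, size=200)
sexes = rng.integers(0, 2, size=200)
intensities = np.exp(2.0 + 0.01 * ages + 0.2 * sexes + rng.normal(0, 0.1, size=200))

coef, sd = fit_reference_model(np.log(intensities), ages, sexes)
patient_z = z_score(np.log(40.0), age=10, sex=1, coef=coef, sd=sd)
print(f"patient z-score: {patient_z:.1f}")
```

Fitting on the log scale reflects the abstract's finding that normalization worked best on log-transformed data; any threshold for calling a z-score "abnormal" would be a separate, assumed choice.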


2021 ◽  
Author(s):  
Li Chen ◽  
Wenyun Lu ◽  
Lin Wang ◽  
Xi Xing ◽  
Xin Teng ◽  
...  

A primary goal of metabolomics is to identify all biologically important metabolites. One powerful approach is liquid chromatography-high resolution mass spectrometry (LC-MS), yet most LC-MS peaks remain unidentified. Here, we present a global network optimization approach, NetID, to annotate untargeted LC-MS metabolomics data. We consider all experimentally observed ion peaks together and assign annotations to all of them simultaneously, so as to maximize a score that considers properties of peaks (known masses, retention times, MS/MS fragmentation patterns) as well as network constraints that arise from mass differences between peaks. Global optimization results in accurate peak assignment and trackable peak-peak relationships. Applying this approach to yeast and mouse data, we identify a half-dozen novel metabolites, including thiamine and taurine derivatives. Isotope tracer studies indicate active flux through these metabolites. Thus, NetID applies existing metabolomic knowledge and global optimization to annotate untargeted metabolomics data, revealing novel metabolites.
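The network constraints mentioned above come from mass differences between peaks. A minimal sketch of that one ingredient is shown below: it links pairs of peaks whose m/z difference matches a known transformation within a ppm tolerance. It does not reproduce NetID's scoring or its global optimization over candidate annotations. The three-entry transformation table, the 5 ppm tolerance, the assumption that both ions are singly charged with the same adduct, and the name `find_edges` are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical subset of biotransformations: name -> monoisotopic mass shift (Da).
TRANSFORMS = {
    "+H2": 2.015650,    # hydrogenation
    "+CH2": 14.015650,  # methylation
    "+O": 15.994915,    # oxidation
}

def find_edges(peaks, ppm=5.0):
    """Link peak pairs whose m/z difference matches a known transformation.

    peaks: dict mapping peak id -> observed m/z (assumed singly charged,
    same adduct, so the m/z difference equals the neutral mass difference).
    Returns a list of (lighter_id, heavier_id, transform_name) tuples.
    """
    edges = []
    for (id_a, mz_a), (id_b, mz_b) in combinations(peaks.items(), 2):
        (mz_lo, id_lo), (mz_hi, id_hi) = sorted([(mz_a, id_a), (mz_b, id_b)])
        diff = mz_hi - mz_lo
        tol = mz_hi * ppm * 1e-6  # ppm tolerance relative to the heavier ion
        for name, shift in TRANSFORMS.items():
            if abs(diff - shift) <= tol:
                edges.append((id_lo, id_hi, name))
    return edges

peaks = {"p1": 180.0634, "p2": 196.0583, "p3": 194.0790}
print(find_edges(peaks))  # [('p1', 'p2', '+O'), ('p1', 'p3', '+CH2')]
```

In a full annotation approach, edges like these would constrain which annotations are mutually consistent; choosing the best joint assignment is the optimization step that this sketch leaves out.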


Metabolites ◽  
2020 ◽  
Vol 10 (10) ◽  
pp. 416
Author(s):  
Gabriel Riquelme ◽  
Nicolás Zabalegui ◽  
Pablo Marchi ◽  
Christina M. Jones ◽  
María Eugenia Monge

Preprocessing data in a reproducible and robust way is one of the current challenges in untargeted metabolomics workflows. Data curation in liquid chromatography–mass spectrometry (LC–MS) involves the removal of biologically non-relevant features (retention time, m/z pairs) to retain only high-quality data for subsequent analysis and interpretation. The present work introduces TidyMS, a package for the Python programming language for preprocessing LC–MS data for quality control (QC) procedures in untargeted metabolomics workflows. It is a versatile tool that can be customized to fit the purpose of a specific metabolomics application. It supports quality control procedures that ensure accuracy and reliability in LC–MS measurements, and it preprocesses metabolomics data to obtain cleaned matrices for subsequent statistical analysis. The capabilities of the package are shown with pipelines for an LC–MS system suitability check, system conditioning, signal drift evaluation, and data curation. These applications were implemented to preprocess data corresponding to a new suite of candidate plasma reference materials developed by the National Institute of Standards and Technology (NIST; hypertriglyceridemic, diabetic, and African-American plasma pools) to be used in untargeted metabolomics studies in addition to NIST SRM 1950 Metabolites in Frozen Human Plasma. The package offers a rapid and reproducible workflow that can be used in an automated or semi-automated fashion, and it is an open and free tool available to all users.
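Data curation of the kind described above typically removes features that are unstable across pooled QC injections or that are dominated by signal in process blanks. The sketch below illustrates that general idea with plain NumPy. It is not the TidyMS API: the function `curate_features`, the 30% coefficient-of-variation cutoff, and the 3x QC-to-blank ratio are hypothetical choices for illustration.

```python
import numpy as np

def curate_features(data, is_qc, is_blank, max_cv=0.3, min_qc_to_blank=3.0):
    """Return a boolean mask of features (columns) to keep.

    data:      samples x features intensity matrix
    is_qc:     boolean mask of rows that are pooled QC injections
    is_blank:  boolean mask of rows that are process blanks
    A feature is kept if its QC coefficient of variation is at most max_cv
    and its mean QC intensity is at least min_qc_to_blank times the blank mean.
    """
    qc = data[is_qc]
    blank = data[is_blank]
    qc_mean = qc.mean(axis=0)
    with np.errstate(divide="ignore", invalid="ignore"):
        qc_cv = qc.std(axis=0, ddof=1) / qc_mean  # NaN/inf -> feature dropped
    blank_mean = blank.mean(axis=0)
    return (qc_cv <= max_cv) & (qc_mean >= min_qc_to_blank * blank_mean)

# Toy matrix: columns are features f0 (stable), f1 (variable in QC), f2 (blank-dominated).
data = np.array([
    [100.0,  50.0, 10.0],  # QC
    [102.0, 150.0, 10.0],  # QC
    [ 98.0, 100.0, 10.0],  # QC
    [  1.0,   1.0,  8.0],  # blank
    [  1.0,   1.0,  8.0],  # blank
    [120.0,  80.0, 12.0],  # study sample
])
is_qc = np.array([True, True, True, False, False, False])
is_blank = np.array([False, False, False, True, True, False])

keep = curate_features(data, is_qc, is_blank)  # [True, False, False]
curated = data[:, keep]
```

Feature f1 is dropped for a QC CV of 0.5, and f2 is dropped because its QC mean (10) is below three times its blank mean (24).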


2015 ◽  
Vol 377 ◽  
pp. 719-727 ◽  
Author(s):  
Neha Garg ◽  
Clifford A. Kapono ◽  
Yan Wei Lim ◽  
Nobuhiro Koyama ◽  
Mark J.A. Vermeij ◽  
...  

Metabolites ◽  
2021 ◽  
Vol 11 (9) ◽  
pp. 568
Author(s):  
Brechtje Hoegen ◽  
Alan Zammit ◽  
Albert Gerritsen ◽  
Udo F. H. Engelke ◽  
Steven Castelein ◽  
...  

Inborn errors of metabolism (IEM) are inherited conditions caused by genetic defects in enzymes or cofactors. These defects result in a specific metabolic fingerprint in patient body fluids, showing accumulation of substrate or lack of an end-product of the defective enzymatic step. Untargeted metabolomics has evolved as a high-throughput methodology offering a comprehensive readout of this metabolic fingerprint. This makes it a promising tool for diagnostic screening of IEM patients. However, the size and complexity of metabolomics data have posed a challenge in translating this avalanche of information into knowledge, particularly for clinical application. We have previously established next-generation metabolic screening (NGMS) as a metabolomics-based diagnostic tool for analyzing plasma of individual IEM-suspected patients. To fully exploit the clinical potential of NGMS, we present a computational pipeline to streamline the analysis of untargeted metabolomics data. This pipeline allows for time-efficient and reproducible data analysis, compatible with ISO 15189 accredited clinical diagnostics. The pipeline implements a combination of tools embedded in a workflow environment for large-scale clinical metabolomics data analysis. The accompanying graphical user interface aids end-users from a diagnostic laboratory in efficient data interpretation and reporting. We also demonstrate the application of this pipeline with a case study and discuss future prospects.


2020 ◽  
Author(s):  
Bruno L. Santos-Lobato ◽  
Luiz Gustavo Gardinassi ◽  
Mariza Bortolanza ◽  
Ana Paula Ferranti Peti ◽  
Ângela V. Pimentel ◽  
...  

Background: The existence of few biomarkers and the lack of a better understanding of the pathophysiology of levodopa-induced dyskinesia (LID) in Parkinson’s disease (PD) call for new approaches, such as metabolomic analysis, to enable discoveries.

Objectives: We aimed to identify a metabolic profile associated with LID in patients with PD in an original cohort, and to confirm the results in an external cohort (BioFIND).

Methods: In the original cohort, plasma and CSF were collected from 20 healthy controls, 23 patients with PD without LID, and 24 patients with PD with LID. Untargeted metabolomics was performed using LC-MS/MS and metabolomics data analysis. Untargeted metabolomics data from the BioFIND cohort were also analyzed.

Results: We identified a metabolic profile associated with LID in PD, composed of multiple metabolic pathways. In particular, dysregulation of the glycosphingolipid metabolic pathway was more strongly related to LID and was strongly associated with the severity of dyskinetic movements. Further, metabolites from the bile acid biosynthesis and C21-steroid hormone biosynthesis pathways, found in both plasma and CSF, distinguished patients with LID from other participants. Levels of cortisol and cortisone were reduced in patients with PD and LID compared to patients with PD without LID. Data from the BioFIND cohort confirmed dysregulation in plasma metabolites from the bile acid biosynthesis and C21-steroid hormone biosynthesis pathways.

Conclusion: There is a distinct metabolic profile associated with LID in PD, both in plasma and CSF, which may be associated with the dysregulation of lipid metabolism and neuroinflammation.


Metabolites ◽  
2019 ◽  
Vol 9 (3) ◽  
pp. 54 ◽  
Author(s):  
Charlie Beirnaert ◽  
Laura Peeters ◽  
Pieter Meysman ◽  
Wout Bittremieux ◽  
Kenn Foubert ◽  
...  

Data analysis for metabolomics is undergoing rapid progress thanks to the proliferation of novel tools and the standardization of existing workflows. As untargeted metabolomics datasets and experiments continue to increase in size and complexity, standardized workflows are often not sufficiently sophisticated. In addition, the ground truth for untargeted metabolomics experiments is intrinsically unknown and the performance of tools is difficult to evaluate. Here, the problem of dynamic multi-class metabolomics experiments was investigated using a simulated dataset with a known ground truth. This simulated dataset was used to evaluate the performance of tinderesting, a new and intuitive tool based on gathering expert knowledge to be used in machine learning. The results were compared to EDGE, a statistical method for time series data. This paper presents three novel outcomes. The first is a way to simulate dynamic metabolomics data with a known ground truth based on ordinary differential equations. This method is made available through the MetaboLouise R package. Second, the EDGE tool, originally developed for genomics data analysis, is highly performant in analyzing dynamic case vs. control metabolomics data. Third, the tinderesting method is introduced to analyse more complex dynamic metabolomics experiments. This tool consists of a Shiny app for collecting expert knowledge, which in turn is used to train a machine learning model to emulate the decision process of the expert. This approach does not replace traditional data analysis workflows for metabolomics, but can provide additional information, improved performance or easier interpretation of results. The advantage is that the tool is agnostic to the complexity of the experiment, and thus is easier to use in advanced setups. All code for the presented analysis, MetaboLouise and tinderesting are freely available.
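The idea of simulating dynamic metabolomics data with a known ground truth from ordinary differential equations can be illustrated with a toy model. The sketch below is not MetaboLouise (which is an R package whose model and interface are not reproduced here): it integrates a hypothetical three-metabolite linear pathway with first-order kinetics using forward Euler, perturbs one rate constant in the "case" group so that the affected metabolites are known in advance, and samples noisy measurements at chosen time points. All names, rate constants, and the noise model are assumptions.

```python
import numpy as np

def simulate_chain(k, influx, x0, t_end=10.0, dt=0.01):
    """Forward-Euler integration of a toy linear pathway:

        influx -> A --k[0]--> B --k[1]--> C --k[2]--> (removed)

    with first-order (mass-action) kinetics. Returns the time grid and an
    array of concentrations with one row per time point and columns A, B, C.
    """
    n_steps = int(round(t_end / dt))
    x = np.empty((n_steps + 1, 3))
    x[0] = x0
    for i in range(n_steps):
        a, b, c = x[i]
        dxdt = np.array([
            influx - k[0] * a,
            k[0] * a - k[1] * b,
            k[1] * b - k[2] * c,
        ])
        x[i + 1] = x[i] + dt * dxdt
    t = np.arange(n_steps + 1) * dt
    return t, x

def sample_timecourse(t, x, times, rng, noise_sd=0.05):
    """Take the grid points nearest to `times` and add log-normal noise."""
    idx = np.abs(t[:, None] - np.asarray(times, dtype=float)[None, :]).argmin(axis=0)
    return x[idx] * np.exp(rng.normal(0.0, noise_sd, size=(len(idx), x.shape[1])))

# Ground truth: the case group has a slower B -> C step (k[1] halved).
rng = np.random.default_rng(1)
t, control = simulate_chain(k=[1.0, 1.0, 1.0], influx=1.0, x0=[0.0, 0.0, 0.0], t_end=20.0)
_, case = simulate_chain(k=[1.0, 0.5, 1.0], influx=1.0, x0=[0.0, 0.0, 0.0], t_end=20.0)
times = [0.5, 1, 2, 5, 10, 20]
control_obs = sample_timecourse(t, control, times, rng)
case_obs = sample_timecourse(t, case, times, rng)
```

Because the perturbed parameter is chosen explicitly, the metabolites whose trajectories truly differ between groups are known, which is what makes such simulated data useful for benchmarking tools. In this toy chain, B approaches influx/k[1] at steady state (1 in controls, 2 in cases), while C differs only transiently. A stiff or larger network would call for a proper ODE solver rather than fixed-step Euler.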

