Missing value imputation for LC-MS metabolomics data by incorporating metabolic network and adduct ion relations

Zhuxuan Jin; Jian Kang; Tianwei Yu

doi:10.1093/bioinformatics/btx816

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

10.1101/171967 ◽

2017 ◽

Cited By ~ 1

Author(s):

Runmin Wei ◽

Jingye Wang ◽

Mingming Su ◽

Erik Jia ◽

Tianlu Chen ◽

...

Keyword(s):

Mass Spectrometry ◽

Missing Values ◽

Pearson Correlation ◽

Imputation Accuracy ◽

Metabolomics Data ◽

Missing Value ◽

Sample Distribution ◽

Imputation Methods ◽

Missing Value Imputation ◽

Squared Error

Introduction: Missing values are widespread in mass spectrometry (MS)-based metabolomics data. Various methods have been applied to handle missing values, but the choice of method can significantly affect subsequent data analyses and interpretations. By definition, there are three types of missing values: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). Objectives: The aim of this study was to comprehensively compare common imputation methods for different types of missing values using two separate metabolomics data sets (977 and 198 serum samples, respectively) and to propose a strategy for dealing with missing values in metabolomics studies. Methods: Imputation methods included zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC). Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate the imputation accuracy for MCAR/MAR and MNAR, respectively. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes sum of squared error was used to evaluate the overall sample distribution. Student’s t-test followed by Pearson correlation analysis was conducted to evaluate the effect of imputation on univariate statistical analysis. Results: Our findings demonstrated that RF imputation performed best for MCAR/MAR and that QRILC was the favored method for MNAR. Conclusion: Combining these findings with the “modified 80% rule”, we proposed a comprehensive strategy and developed a publicly accessible web tool for missing value imputation in metabolomics data.
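
The evaluation idea in this abstract (mask known values, impute them, score the result with NRMSE) can be illustrated with a small sketch. This is not the authors' code: the data are synthetic, the helper names `nrmse` and `impute_columns` are hypothetical, and the NRMSE normalization (dividing the mean squared error by the variance of the true values) is one common convention assumed here, not taken from the paper. Only the three simplest methods from the list (mean, median, half minimum) are shown.

```python
import numpy as np

def nrmse(true_vals, imputed_vals):
    """RMSE normalized by the variance of the true values (assumed convention)."""
    true_vals = np.asarray(true_vals, dtype=float)
    imputed_vals = np.asarray(imputed_vals, dtype=float)
    return np.sqrt(np.mean((true_vals - imputed_vals) ** 2) / np.var(true_vals))

def impute_columns(X, how):
    """Fill NaNs per column with the column mean, median, or half the column minimum."""
    X = np.array(X, dtype=float)  # copy so the input is left untouched
    fill = {
        "mean": np.nanmean(X, axis=0),
        "median": np.nanmedian(X, axis=0),
        "half_min": np.nanmin(X, axis=0) / 2.0,
    }[how]
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[cols]
    return X

# Synthetic samples x metabolites matrix (illustrative only).
rng = np.random.default_rng(0)
complete = rng.lognormal(mean=5.0, sigma=0.5, size=(50, 8))

# MCAR: every cell is dropped with the same probability, independent of its value.
mask = rng.random(complete.shape) < 0.10
with_missing = complete.copy()
with_missing[mask] = np.nan

# Score each method only on the cells that were masked.
scores = {
    how: nrmse(complete[mask], impute_columns(with_missing, how)[mask])
    for how in ("mean", "median", "half_min")
}
```

Half-minimum imputation targets left-censored (MNAR) values, so on MCAR masking like this it would be expected to score worse than mean or median imputation; lower NRMSE means a closer match to the masked values.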

Missing value imputation strategies for metabolomics data

Electrophoresis ◽

10.1002/elps.201500352 ◽

2015 ◽

Vol 36 (24) ◽

pp. 3050-3060 ◽

Cited By ~ 55

Author(s):

Emily Grace Armitage ◽

Joanna Godzien ◽

Vanesa Alonso-Herranz ◽

Ángeles López-Gonzálvez ◽

Coral Barbas

Keyword(s):

Metabolomics Data ◽

Missing Value ◽

Missing Value Imputation

Missing Value Imputation Approach for Mass Spectrometry-based Metabolomics Data

Scientific Reports ◽

10.1038/s41598-017-19120-0 ◽

2018 ◽

Vol 8 (1) ◽

Cited By ~ 93

Author(s):

Runmin Wei ◽

Jingye Wang ◽

Mingming Su ◽

Erik Jia ◽

Shaoqiu Chen ◽

...

Keyword(s):

Mass Spectrometry ◽

Metabolomics Data ◽

Missing Value ◽

Missing Value Imputation ◽

Imputation Approach

A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis

Current Bioinformatics ◽

10.2174/1574893612666171121154655 ◽

2018 ◽

Vol 14 (1) ◽

pp. 43-52 ◽

Cited By ~ 3

Author(s):

Nishith Kumar ◽

Md. Aminul Hoque ◽

Md. Shahjaman ◽

S.M. Shahinul Islam ◽

Md. Nurul Haque Mollah

Keyword(s):

Mass Spectrometry ◽

Data Analysis ◽

Missing Values ◽

Real Data ◽

Gas Chromatography Mass Spectrometry ◽

Data Generation ◽

Metabolomics Data ◽

Missing Value ◽

Missing Value Imputation ◽

Chromatography Mass Spectrometry

Background: Metabolomics data generation and quantification differ from those of other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS)-based metabolomics data (gas chromatography mass spectrometry (GC-MS), liquid chromatography mass spectrometry (LC-MS), etc.) frequently contain missing values that complicate some quantitative analyses. Typically, metabolomics datasets contain 10% to 20% missing values, which arise for several reasons, including analytical, computational, and biological ones. Imputation of missing values is therefore an important issue for further metabolomics data analysis. Objective: This paper introduces a new algorithm for missing value imputation in the presence of outliers for metabolomics data analysis. Method: Currently, the best-known missing value imputation techniques for metabolomics data are k-nearest neighbours (kNN), random forest (RF), and zero imputation. However, these techniques are sensitive to outliers. In this paper, we propose an outlier-robust missing value imputation technique that minimizes a two-way empirical mean absolute error (MAE) loss function. Results: We investigated the performance of the proposed technique in comparison with other traditional imputation techniques using both simulated and real data, in the absence and presence of outliers. Conclusion: Results of both simulated and real data analyses show that the proposed outlier-robust imputation technique performs better than the traditional imputation methods in both the absence and presence of outliers.
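
Why an absolute-error loss helps with outliers can be seen in a toy case. The sketch below is not the paper's two-way algorithm: it only uses the standard fact that the median minimizes mean absolute error while the mean minimizes squared error, and compares the two as fill values for a single column containing one gross outlier. The data and the helper names `fill_missing` and `mae` are hypothetical.

```python
import numpy as np

def fill_missing(X, center):
    """Replace NaNs in each column with center(column), computed on observed values."""
    X = np.array(X, dtype=float)  # copy so the input is left untouched
    fill = center(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[cols]
    return X

# One metabolite: typical intensities near 10, one gross outlier, one missing cell.
column = np.array([[9.8], [10.1], [10.0], [9.9], [10.2], [1000.0], [np.nan]])

mean_filled = fill_missing(column, np.nanmean)      # squared-error minimizer
median_filled = fill_missing(column, np.nanmedian)  # absolute-error minimizer

observed = column[~np.isnan(column)]

def mae(c):
    """Mean absolute error of a constant fill value c against the observed cells."""
    return np.mean(np.abs(observed - c))

mean_fill = mean_filled[-1, 0]
median_fill = median_filled[-1, 0]
```

Here the mean-based fill is dragged far from the bulk of the data by the single outlier, while the median-based fill stays near 10, and `mae(median_fill)` is smaller than `mae(mean_fill)`.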

rMisbeta: A Robust Missing Value Imputation Approach in Transcriptomics and Metabolomics Data

Computers in Biology and Medicine ◽

10.1016/j.compbiomed.2021.104911 ◽

2021 ◽

pp. 104911

Author(s):

Md. Shahjaman ◽

Md. Rezanur Rahman ◽

Tania Islam ◽

Md. Rabiul Auwul ◽

Mohammad Ali Moni ◽

...

Keyword(s):

Metabolomics Data ◽

Missing Value ◽

Missing Value Imputation ◽

Imputation Approach

Metabolomic Biomarker Identification in Presence of Outliers and Missing Values

BioMed Research International ◽

10.1155/2017/2437608 ◽

2017 ◽

Vol 2017 ◽

pp. 1-11 ◽

Cited By ~ 9

Author(s):

Nishith Kumar ◽

Md. Aminul Hoque ◽

Md. Shahjaman ◽

S. M. Shahinul Islam ◽

Md. Nurul Haque Mollah

Keyword(s):

Data Analysis ◽

High Throughput ◽

Missing Values ◽

Real Data ◽

Data Matrix ◽

Metabolomics Data ◽

Missing Value ◽

Biomarker Identification ◽

Missing Value Imputation ◽

High Throughput Technology

Metabolomics is a sophisticated, high-throughput technology based on the entire set of metabolites, which is known as the connector between genotypes and phenotypes. For any phenotypic change, identifying potential metabolites (biomarkers) is very important because it provides diagnostic as well as prognostic markers and can help to develop new biomolecular therapies. Biomarker identification from metabolomics data is hampered by the fact that high-throughput technology produces a high-dimensional data matrix that contains missing values as well as outliers. Missing value imputation and outlier handling techniques therefore play an important role in identifying biomarkers correctly. Although several missing value imputation techniques are available, outliers reduce the accuracy of imputation as well as the accuracy of biomarker identification. Therefore, in this paper we propose a new biomarker identification technique that combines groupwise robust singular value decomposition, the t-test, and a fold-change approach, and that can identify biomarkers more accurately from metabolomics datasets. We also compared the performance of the proposed technique with that of other traditional techniques for biomarker identification using both simulated and real data, in the absence and presence of outliers. Applying the proposed method to a hepatocellular carcinoma (HCC) dataset, we identified four upregulated and two downregulated metabolites as potential metabolomic biomarkers for HCC.
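
The t-test plus fold-change step named in this abstract can be sketched as a simple filter on a complete two-group matrix. This is only an illustration under stated assumptions: it omits the groupwise robust singular value decomposition stage entirely, uses Welch's t statistic with a hypothetical cutoff on the statistic itself (no p-values), and runs on synthetic data. The names `welch_t`, `log2_fold_change`, `T_CUT`, and `LFC_CUT` are not from the paper.

```python
import numpy as np

def welch_t(a, b):
    """Welch's t statistic per column for two (samples x metabolites) groups."""
    va = a.var(axis=0, ddof=1)
    vb = b.var(axis=0, ddof=1)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(va / len(a) + vb / len(b))

def log2_fold_change(a, b):
    """log2 ratio of group means per column (positive means higher in a)."""
    return np.log2(a.mean(axis=0) / b.mean(axis=0))

# Synthetic intensities: 20 samples per group, 6 metabolites.
rng = np.random.default_rng(1)
control = rng.normal(100.0, 5.0, size=(20, 6))
case = rng.normal(100.0, 5.0, size=(20, 6))
case[:, 0] *= 4.0  # metabolite 0 made up-regulated in cases
case[:, 1] /= 4.0  # metabolite 1 made down-regulated in cases

t_stat = welch_t(case, control)
lfc = log2_fold_change(case, control)

# Hypothetical cutoffs: a metabolite must pass both the t and the fold-change filter.
T_CUT, LFC_CUT = 4.0, 1.0
up = np.where((t_stat > T_CUT) & (lfc > LFC_CUT))[0]
down = np.where((t_stat < -T_CUT) & (lfc < -LFC_CUT))[0]
```

Requiring both criteria keeps metabolites whose difference is both statistically distinguishable and large in magnitude. Because group means and variances are not robust statistics, a filter like this is sensitive to outliers, which is the weakness the abstract sets out to address.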

Enriching Integrated Statistical Open City Data by Combining Equational Knowledge and Missing Value Imputation

SSRN Electronic Journal ◽

10.2139/ssrn.3199313 ◽

2018 ◽

Author(s):

Stefan Bischof ◽

Andreas Harth ◽

Benedikt Kämpgen ◽

Axel Polleres ◽

Patrik Schneider

Keyword(s):

Missing Value ◽

Missing Value Imputation

Effective Missing Value Imputation Methods for Building Monitoring Data

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378230 ◽

2020 ◽

Author(s):

Brian Cho ◽

Teresa Dayrit ◽

Yuan Gao ◽

Zhe Wang ◽

Tianzhen Hong ◽

...

Keyword(s):

Monitoring Data ◽

Missing Value ◽

Imputation Methods ◽

Missing Value Imputation ◽

Building Monitoring

Kernel weighted least square approach for imputing missing values of metabolomics data

Scientific Reports ◽

10.1038/s41598-021-90654-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Nishith Kumar ◽

Md. Aminul Hoque ◽

Masahiro Sugimoto

Keyword(s):

Missing Data ◽

Large Scale ◽

Missing Values ◽

Kernel Weight ◽

Least Square ◽

Data Matrix ◽

Data Imputation ◽

Metabolomics Data ◽

Missing Value ◽

Missing Data Imputation

Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a large, high-dimensional matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, both of which arise for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, all conventional techniques address only the missing value problem and do not mitigate the effect of outliers. As a result, outliers in the dataset decrease the accuracy of the imputation. We developed a new kernel weight function-based missing data imputation technique that addresses both missing values and outliers. We evaluated the performance of the proposed method and of other conventional and recently developed imputation techniques using both artificially generated data and experimentally measured data, in the absence and presence of different rates of outliers. Results on both artificial and real metabolomics data indicate the superiority of the proposed kernel weight-based missing data imputation technique over the existing alternatives. For user convenience, an R package implementing the proposed technique was developed and is available at https://github.com/NishithPaul/tWLSA.

IFGAN: Missing Value Imputation using Feature-specific Generative Adversarial Networks

2020 IEEE International Conference on Big Data (Big Data) ◽

10.1109/bigdata50022.2020.9378240 ◽

2020 ◽

Author(s):

Wei Qiu ◽

Yangsibo Huang ◽

Quanzheng Li

Keyword(s):

Generative Adversarial Networks ◽

Missing Value ◽

Missing Value Imputation ◽

Adversarial Networks
