Comparison of missing value imputation methods in time series: the case of Turkish meteorological data

Abstract

Introduction: Missing values exist widely in mass-spectrometry (MS) based metabolomics data. Various methods have been applied for handling missing values, but the choice of method can significantly affect subsequent data analyses and interpretations. By definition, there are three types of missing values: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR).

Objectives: The aim of this study was to comprehensively compare common imputation methods for different types of missing values using two separate metabolomics data sets (977 and 198 serum samples, respectively) and to propose a strategy for dealing with missing values in metabolomics studies.

Methods: Imputation methods included zero, half minimum (HM), mean, median, random forest (RF), singular value decomposition (SVD), k-nearest neighbors (kNN), and quantile regression imputation of left-censored data (QRILC). Normalized root mean squared error (NRMSE) and NRMSE-based sum of ranks (SOR) were applied to evaluate the imputation accuracy for MCAR/MAR and MNAR, respectively. Principal component analysis (PCA)/partial least squares (PLS)-Procrustes sum of squared error was used to evaluate the overall sample distribution. Student's t-test followed by Pearson correlation analysis was conducted to evaluate the effect of imputation on univariate statistical analysis.

Results: Our findings demonstrated that RF imputation performed the best for MCAR/MAR and that QRILC was the favored method for MNAR.

Conclusion: Combining these findings with the "modified 80% rule", we proposed a comprehensive strategy and developed a publicly accessible web tool for missing value imputation in metabolomics data.
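The NRMSE-based comparison described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' pipeline. It covers only the four simplest imputers named in the abstract (zero, half minimum, mean, median), uses synthetic log-normal data in place of the serum data sets, and simulates MCAR by deleting entries uniformly at random. The NRMSE definition used here (squared error over the deleted entries, normalized by the variance of their true values) is one common convention and is an assumption, since the abstract does not spell out the formula. The helper names `mask_mcar`, `impute_columns`, and `nrmse` are hypothetical.

```python
import numpy as np


def mask_mcar(x, frac, rng):
    """Return a copy of x with a fraction of entries set to NaN uniformly at random."""
    x_miss = x.copy()
    n_missing = int(round(frac * x.size))
    idx = rng.choice(x.size, size=n_missing, replace=False)
    x_miss.flat[idx] = np.nan
    return x_miss


def impute_columns(x_miss, method):
    """Fill NaNs column by column with a simple per-feature statistic."""
    fillers = {
        "zero": lambda col: 0.0,
        "half_min": lambda col: 0.5 * np.nanmin(col),
        "mean": np.nanmean,
        "median": np.nanmedian,
    }
    filled = x_miss.copy()
    for j in range(filled.shape[1]):
        col = filled[:, j]  # view into `filled`, so assignment edits it in place
        fill_value = fillers[method](col)
        col[np.isnan(col)] = fill_value
    return filled


def nrmse(x_true, x_imputed, missing_mask):
    """NRMSE over the masked entries (assumed definition, see lead-in)."""
    truth = x_true[missing_mask]
    diff = truth - x_imputed[missing_mask]
    return float(np.sqrt(np.mean(diff ** 2) / np.var(truth)))


rng = np.random.default_rng(0)
# Synthetic stand-in for an intensity matrix: 200 samples x 5 features.
x = rng.lognormal(mean=3.0, sigma=0.5, size=(200, 5))
x_miss = mask_mcar(x, frac=0.10, rng=rng)
mask = np.isnan(x_miss)

scores = {m: nrmse(x, impute_columns(x_miss, m), mask)
          for m in ("zero", "half_min", "mean", "median")}
for method, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{method:>8s}  NRMSE = {score:.3f}")
```

Lower NRMSE means the imputed values sit closer to the deleted ground truth. Model-based imputers such as RF, kNN, or SVD could be scored with the same `nrmse` helper by swapping in a different imputation function.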

Evaluation of missing value imputation methods for wireless soil datasets

Personal and Ubiquitous Computing, 2016, Vol 21 (1), pp. 113-123. DOI: 10.1007/s00779-016-0978-9. Cited By ~ 5

Author(s): Jia Shao, Wei Meng, Guodong Sun

Keyword(s): Missing Value, Imputation Methods, Missing Value Imputation

Missing value imputation in time series using Singular Spectrum Analysis

International Journal of Energy and Statistics, 2016, Vol 04 (01), pp. 1650005. DOI: 10.1142/s2335680416500058. Cited By ~ 12

Author(s): Rahim Mahmoudvand, Paulo Canas Rodrigues

Keyword(s): Time Series, Spectrum Analysis, Singular Spectrum Analysis, Missing Value, Missing Value Imputation, Singular Spectrum

Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks

International Journal of Environmental Research and Public Health, 2021, Vol 18 (22), pp. 12213. DOI: 10.3390/ijerph182212213

Author(s): Taesung Kim, Jinhee Kim, Wonho Yang, Hunjoo Lee, Jaegul Choo

Keyword(s): Time Series, Deep Learning, Air Quality, Time Series Data, Quality Data, Series Data, Missing Value, Missing Value Imputation, Spatio Temporal, Air Quality Data

To prevent severe air pollution, it is important to analyze time-series air quality data, but this is often challenging because the data are usually partially missing, especially when collected from multiple locations simultaneously. To address this problem, various deep-learning-based missing value imputation models have been proposed. However, they are often barely interpretable, which makes the imputed data difficult to analyze. We therefore propose a novel deep-learning-based imputation model that achieves high interpretability while also performing strongly on missing value imputation for spatio-temporal data. We verify the effectiveness of our method through quantitative and qualitative results on a publicly available air-quality dataset.
