Missing value imputation on multidimensional time series

We present DeepMVI, a deep learning method for missing value imputation in multidimensional time-series datasets. Missing values are commonplace in decision support platforms that aggregate data over long time stretches from disparate sources, yet reliable data analytics calls for careful handling of missing data. One strategy is to impute the missing values, and a wide variety of algorithms exist, spanning simple interpolation, matrix factorization methods like SVD, statistical models like Kalman filters, and recent deep learning methods. We show that these often provide worse results on aggregate analytics than simply excluding the missing data. DeepMVI expresses the distribution of each missing value conditioned on coarse and fine-grained signals along a time series, and on signals from correlated series at the same time. Instead of resorting to the linearity assumptions of conventional matrix factorization methods, DeepMVI harnesses a flexible deep network to extract and combine these signals in an end-to-end manner. To prevent over-fitting with high-capacity neural networks, we design a robust parameter training procedure that uses labeled data created from synthetic missing blocks around available indices. Our neural network uses a modular design with a novel temporal transformer with convolutional features, and kernel regression with learned embeddings. Experiments across ten real datasets and five different missing scenarios, comparing seven conventional and three deep learning methods, show that DeepMVI is significantly more accurate, reducing error by more than 50% in more than half the cases compared to the best existing method. Although DeepMVI is slower than simpler matrix factorization methods, we justify the increased time overhead by showing that it provides significantly more accurate imputation, which ultimately affects the quality of downstream analytics.
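The training idea mentioned in the abstract above (labeled data created from synthetic missing blocks around available indices) can be sketched roughly as follows. This is a minimal illustration, not DeepMVI's code: the function name `sample_synthetic_block`, the centred block placement, the fixed block length, and the zero-filling of hidden cells are all assumptions made for the example.

```python
import numpy as np

def sample_synthetic_block(values, available, block_len, rng):
    """Hide one block around a randomly chosen observed cell (hypothetical helper).

    values:    (n_series, n_time) array; unobserved cells may hold NaN.
    available: boolean array of the same shape, True where a value is observed.
    Returns the network input (hidden cells zero-filled), the visibility mask,
    and a mask marking the hidden cells whose true values can serve as labels.
    """
    n_series, n_time = values.shape
    observed = np.argwhere(available)
    series, center = observed[rng.integers(len(observed))]
    start = max(0, int(center) - block_len // 2)
    end = min(n_time, start + block_len)

    visible = available.copy()
    visible[series, start:end] = False          # pretend the block is missing

    label_mask = np.zeros_like(available)
    label_mask[series, start:end] = available[series, start:end]

    inputs = np.where(visible, values, 0.0)     # assumption: zero-fill hidden cells
    return inputs, visible, label_mask

rng = np.random.default_rng(0)
values = rng.normal(size=(3, 50))
available = rng.random((3, 50)) > 0.2           # roughly 20% genuinely missing
inputs, visible, label_mask = sample_synthetic_block(values, available, 8, rng)
```

Only cells that were truly observed are used as labels, so the loss is never computed against values that were missing to begin with.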


A data-driven missing value imputation approach for longitudinal datasets

Artificial Intelligence Review ◽

10.1007/s10462-021-09963-5 ◽

2021 ◽

Author(s):

Caio Ribeiro ◽

Alex A. Freitas

Keyword(s):

Missing Data ◽

Longitudinal Data ◽

Missing Values ◽

Error Rates ◽

Imputation Method ◽

Data Driven ◽

Missing Value ◽

Missing Value Imputation ◽

Human Ageing ◽

Imputation Approach

Longitudinal datasets of human ageing studies usually have a high volume of missing data, and one way to handle missing values in a dataset is to replace them with estimates. However, there are many methods for estimating missing values, and no single method is the best for all datasets. In this article, we propose a data-driven missing value imputation approach that performs a feature-wise selection of the best imputation method, using known information in the dataset to rank the five methods we selected by their estimation error rates. We evaluated the proposed approach in two sets of experiments: a classifier-independent scenario, where we compared the applicability and error rates of each imputation method; and a classifier-dependent scenario, where we compared the predictive accuracy of Random Forest classifiers generated with datasets prepared using each imputation method and a baseline approach of doing no imputation (letting the classification algorithm handle the missing values internally). Based on our results from both sets of experiments, we concluded that the proposed data-driven missing value imputation approach generally resulted in models with more accurate estimations for missing data and better performing classifiers in longitudinal datasets of human ageing. We also observed that imputation methods devised specifically for longitudinal data had very accurate estimations. This reinforces the idea that using the temporal information intrinsic to longitudinal data is a worthwhile endeavour for machine learning applications, and that this can be achieved through the proposed data-driven approach.
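The feature-wise selection described above can be sketched as: hide some known values of a feature, let each candidate method estimate them, and keep the method with the lowest error. The abstract does not name the five methods, so the three candidates below (mean, median, linear interpolation over row order) are stand-ins, and `select_method`, the hold-out fraction, and the mean-absolute-error criterion are assumptions for illustration.

```python
import numpy as np

def impute_mean(col):
    out = col.copy()
    out[np.isnan(out)] = np.nanmean(col)
    return out

def impute_median(col):
    out = col.copy()
    out[np.isnan(out)] = np.nanmedian(col)
    return out

def impute_interp(col):
    # Linear interpolation over row order; assumes rows are ordered observations.
    idx = np.arange(len(col))
    known = ~np.isnan(col)
    out = col.copy()
    out[~known] = np.interp(idx[~known], idx[known], col[known])
    return out

def select_method(col, methods, holdout_frac, rng):
    """Rank candidate methods on one feature by error on artificially hidden values."""
    known = np.flatnonzero(~np.isnan(col))
    n_hide = max(1, int(holdout_frac * len(known)))
    hidden = rng.choice(known, size=n_hide, replace=False)
    trial = col.copy()
    trial[hidden] = np.nan
    errors = {}
    for name, method in methods.items():
        estimate = method(trial)
        errors[name] = float(np.mean(np.abs(estimate[hidden] - col[hidden])))
    best = min(errors, key=errors.get)
    return best, errors

methods = {"mean": impute_mean, "median": impute_median, "interp": impute_interp}
rng = np.random.default_rng(0)
feature = np.arange(40, dtype=float)           # toy feature with a linear trend
feature[[3, 17, 30]] = np.nan                  # genuinely missing entries
best, errors = select_method(feature, methods, holdout_frac=0.2, rng=rng)
completed = methods[best](feature)
```

Running the selection once per feature lets different columns of the same dataset end up with different imputation methods.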


Detection of Threats in Cyberphysical Systems Based on Deep Learning Methods Using Multidimensional Time Series

Automatic Control and Computer Sciences ◽

10.3103/s0146411618080151 ◽

2018 ◽

Vol 52 (8) ◽

pp. 912-917 ◽

Cited By ~ 8

Author(s):

M. O. Kalinin ◽

D. S. Lavrova ◽

A. V. Yarmak

Keyword(s):

Time Series ◽

Deep Learning ◽

Cyberphysical Systems ◽

Learning Methods ◽

Multidimensional Time Series


Missing Value Imputation of Time-Series Air-Quality Data via Deep Neural Networks

International Journal of Environmental Research and Public Health ◽

10.3390/ijerph182212213 ◽

2021 ◽

Vol 18 (22) ◽

pp. 12213

Author(s):

Taesung Kim ◽

Jinhee Kim ◽

Wonho Yang ◽

Hunjoo Lee ◽

Jaegul Choo

Keyword(s):

Time Series ◽

Deep Learning ◽

Air Quality ◽

Time Series Data ◽

Quality Data ◽

Series Data ◽

Missing Value ◽

Missing Value Imputation ◽

Spatio Temporal ◽

Air Quality Data

To prevent severe air pollution, it is important to analyze time-series air quality data, but this is often challenging because the data are usually partially missing, especially when collected from multiple locations simultaneously. To solve this problem, various deep-learning-based missing value imputation models have been proposed. However, they are often barely interpretable, which makes it difficult to analyze the imputed data. We therefore propose a novel deep-learning-based imputation model that achieves high interpretability while also performing well in missing value imputation for spatio-temporal data. We verify the effectiveness of our method through quantitative and qualitative results on a publicly available air-quality dataset.


Optimization and expansion of non-negative matrix factorization

BMC Bioinformatics ◽

10.1186/s12859-019-3312-5 ◽

2020 ◽

Vol 21 (1) ◽

Cited By ~ 7

Author(s):

Xihui Lin ◽

Paul C. Boutros

Keyword(s):

Prior Knowledge ◽

Matrix Factorization ◽

Missing Values ◽

Complexity Analysis ◽

R Package ◽

Missing Value ◽

Missing Value Imputation ◽

Tuning Method ◽

Novel Applications ◽

Non Negative Matrix Factorization

Background: Non-negative matrix factorization (NMF) is a technique widely used in various fields, including artificial intelligence (AI), signal processing and bioinformatics. However, existing algorithms and R packages cannot be applied to large matrices, due to their slow convergence, or to matrices with missing entries. In addition, most NMF research focuses only on blind decompositions, that is, decompositions that do not use prior knowledge. Finally, the lack of a well-validated methodology for choosing the rank hyperparameter also raises concerns about derived results.

Results: We apply the idea of sequential coordinate-wise descent to NMF to increase the convergence rate. We demonstrate that NMF can handle missing values naturally and that this property leads to a novel method for determining the rank hyperparameter. Further, we demonstrate some novel applications of NMF and show how to use masking to inject prior knowledge and desirable properties to achieve a more meaningful decomposition.

Conclusions: We show through complexity analysis and experiments that our implementation converges faster than well-known methods. We also show that using NMF for tumour content deconvolution can achieve results similar to existing methods like ISOpure. Our proposed missing value imputation is more accurate than conventional methods like multiple imputation and comparable to missForest, while achieving significantly better computational efficiency. Finally, we argue that the suggested rank tuning method based on missing value imputation is theoretically superior to existing methods. All algorithms are implemented in the R package NNLM, which is freely available on CRAN and GitHub.
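As a rough illustration of how NMF can ignore missing entries and how held-out entries can guide the choice of rank, the sketch below uses standard masked multiplicative updates. It is not the NNLM implementation: the paper uses sequential coordinate-wise descent, whereas multiplicative updates are swapped in here for brevity, and `masked_nmf`, `select_rank`, the iteration count, and the hold-out fraction are assumptions for the example.

```python
import numpy as np

def masked_nmf(X, mask, rank, n_iter=500, seed=0, eps=1e-9):
    """Fit X ~ W @ H on entries where mask is True (masked multiplicative updates)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    Xz = np.where(mask, X, 0.0)                 # unobserved cells contribute nothing
    for _ in range(n_iter):
        WH = (W @ H) * mask
        W *= (Xz @ H.T) / (WH @ H.T + eps)
        WH = (W @ H) * mask
        H *= (W.T @ Xz) / (W.T @ WH + eps)
    return W, H

def select_rank(X, observed, ranks, holdout_frac=0.1, seed=0):
    """Pick the rank whose fit best predicts observed entries hidden during fitting."""
    rng = np.random.default_rng(seed)
    holdout = observed & (rng.random(X.shape) < holdout_frac)
    train = observed & ~holdout
    errors = {}
    for r in ranks:
        W, H = masked_nmf(X, train, r, seed=seed)
        resid = (W @ H - X)[holdout]
        errors[r] = float(np.mean(resid ** 2))
    return min(errors, key=errors.get), errors

rng = np.random.default_rng(1)
truth = rng.random((30, 2)) @ rng.random((2, 20))      # toy non-negative rank-2 data
observed = rng.random(truth.shape) > 0.1
X = np.where(observed, truth, np.nan)                  # about 10% missing entries
best_rank, errors = select_rank(X, observed, ranks=[1, 2, 3, 4])
W, H = masked_nmf(X, observed, best_rank)
imputed = np.where(observed, X, W @ H)                 # fill gaps from the factorization
```

The same masked fit serves both purposes: predictions at unobserved cells are the imputed values, and error on deliberately hidden cells acts as a validation score for each candidate rank.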


Missing Value Imputation Based on Gaussian Mixture Model for the Internet of Things

Mathematical Problems in Engineering ◽

10.1155/2015/548605 ◽

2015 ◽

Vol 2015 ◽

pp. 1-8 ◽

Cited By ~ 16

Author(s):

Xiaobo Yan ◽

Weiqing Xiong ◽

Liang Hu ◽

Feng Wang ◽

Kuo Zhao

Keyword(s):

Missing Data ◽

Internet Of Things ◽

Gaussian Mixture Model ◽

Mixture Model ◽

Missing Values ◽

Gaussian Mixture ◽

The Internet ◽

Missing Value ◽

Missing Value Imputation ◽

The Internet Of Things

This paper addresses missing value imputation for the Internet of Things (IoT). The IoT is now widely used in a variety of domains, such as transportation and logistics, and healthcare. However, missing values are very common in the IoT for a variety of reasons, which leaves the experimental data incomplete. As a result, some work that depends on IoT data cannot be carried out normally, and the accuracy and reliability of data analysis results are reduced. Based on the characteristics of the data itself and the features of missing data in the IoT, this paper divides the missing data into three types and defines three corresponding missing value imputation problems. We then propose three new models to solve the corresponding problems: a model of missing value imputation based on context and linear mean (MCL), a model of missing value imputation based on binary search (MBS), and a model of missing value imputation based on Gaussian mixture model (MGI). Experimental results showed that the three models can greatly and effectively improve the accuracy, reliability, and stability of missing value imputation.


Kernel weighted least square approach for imputing missing values of metabolomics data

Scientific Reports ◽

10.1038/s41598-021-90654-0 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Nishith Kumar ◽

Md. Aminul Hoque ◽

Masahiro Sugimoto

Keyword(s):

Missing Data ◽

Large Scale ◽

Missing Values ◽

Kernel Weight ◽

Least Square ◽

Data Matrix ◽

Data Imputation ◽

Metabolomics Data ◽

Missing Value ◽

Missing Data Imputation

Mass spectrometry is a modern and sophisticated high-throughput analytical technique that enables large-scale metabolomic analyses. It yields a high-dimensional, large-scale matrix (samples × metabolites) of quantified data that often contains missing cells as well as outliers, which arise for several reasons, including technical and biological sources. Although several missing data imputation techniques are described in the literature, the conventional techniques address only the missing values and do not mitigate the effect of outliers. Outliers in the dataset therefore decrease the accuracy of the imputation. We developed a new missing data imputation technique, based on a kernel weight function, that addresses both missing values and outliers. We evaluated the performance of the proposed method and of other conventional and recently developed missing data imputation techniques, using both artificially generated data and experimentally measured data, in the absence and presence of different rates of outliers. Performance on both artificial data and real metabolomics data indicates the superiority of our proposed kernel weight-based missing data imputation technique over the existing alternatives. For user convenience, an R package of the proposed kernel weight-based missing value imputation technique was developed, which is available at https://github.com/NishithPaul/tWLSA.


Spatial and temporal deep learning methods for deriving land-use following deforestation: A pan-tropical case study using Landsat time series

Remote Sensing of Environment ◽

10.1016/j.rse.2021.112600 ◽

2021 ◽

Vol 264 ◽

pp. 112600

Author(s):

Robert N. Masolele ◽

Veronique De Sy ◽

Martin Herold ◽

Diego Marcos Gonzalez ◽

Jan Verbesselt ◽

...

Keyword(s):

Land Use ◽

Time Series ◽

Deep Learning ◽

Learning Methods


Deep Learning Based Classification of Time Series of Chaotic Systems over Graphic Images

10.21203/rs.3.rs-1138927/v1 ◽

2021 ◽

Author(s):

Süleyman UZUN ◽

Sezgin KAÇAR ◽

Burak ARICIOĞLU

Keyword(s):

Time Series ◽

Deep Learning ◽

Transfer Learning ◽

Chaotic Systems ◽

Initial Conditions ◽

Step Size ◽

Learning Methods ◽

Data Set ◽

Graphic Images ◽

Different Chaotic Systems

This study aims, for the first time in the literature, to identify different chaotic systems by classifying graphic images of their time series with deep learning methods. For this purpose, a data set is generated that consists of graphic images of the time series of the three best-known chaotic systems: the Lorenz, Chen, and Rossler systems. The time series are obtained for different parameter values, initial conditions, step sizes and time lengths. After the data set is generated, a high-accuracy classification is performed using the transfer learning method. The study employs the most widely accepted deep learning models used in transfer learning: SqueezeNet, VGG-19, AlexNet, ResNet50, ResNet101, DenseNet201, ShuffleNet and GoogLeNet. The resulting classification accuracy is between 96% and 97%, depending on the problem. Thus, this study makes it possible to associate real-time random signals with a mathematical system.
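To make the data generation step concrete, the sketch below produces a Lorenz time series for a given set of parameters, initial conditions, step size and length. The abstract does not say which integrator or parameter ranges were used, so the fourth-order Runge-Kutta scheme, the classic parameter values, and the name `simulate_lorenz` are assumptions for illustration; turning the series into graphic images is left out.

```python
import numpy as np

def lorenz_rhs(state, sigma, rho, beta):
    """Right-hand side of the Lorenz equations."""
    x, y, z = state
    return np.array([sigma * (y - x), x * (rho - z) - y, x * y - beta * z])

def simulate_lorenz(initial, step, n_steps, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Integrate the Lorenz system with a fixed-step fourth-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = initial
    for i in range(n_steps):
        s = traj[i]
        k1 = lorenz_rhs(s, sigma, rho, beta)
        k2 = lorenz_rhs(s + 0.5 * step * k1, sigma, rho, beta)
        k3 = lorenz_rhs(s + 0.5 * step * k2, sigma, rho, beta)
        k4 = lorenz_rhs(s + step * k3, sigma, rho, beta)
        traj[i + 1] = s + (step / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return traj

# One series per combination of initial condition and step size.
traj = simulate_lorenz(initial=[1.0, 1.0, 1.0], step=0.01, n_steps=2000)
x_series = traj[:, 0]
```

Varying `initial`, `step`, `n_steps` and the three parameters yields the kind of diverse time series collection that the abstract describes; the Chen and Rossler systems would need their own right-hand-side functions.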
