Multilevel and time-series missing value imputation for combined survey and longitudinal context data

To prevent severe air pollution, it is important to analyze time-series air quality data, but this is often challenging because the data are usually partially missing, especially when collected from multiple locations simultaneously. To address this problem, various deep-learning-based missing value imputation models have been proposed. However, they are often barely interpretable, which makes the imputed data difficult to analyze. We therefore propose a novel deep-learning-based imputation model that achieves both high interpretability and strong imputation performance on spatio-temporal data. We verify the effectiveness of our method through quantitative and qualitative results on a publicly available air-quality dataset.

Missing value imputation in multivariate time series with end-to-end generative adversarial networks

Information Sciences ◽ 10.1016/j.ins.2020.11.035 ◽ 2021 ◽ Vol 551 ◽ pp. 67-82

Author(s): Ying Zhang ◽ Baohang Zhou ◽ Xiangrui Cai ◽ Wenya Guo ◽ Xiaoke Ding ◽ ...

Keyword(s): Time Series ◽ Multivariate Time Series ◽ Generative Adversarial Networks ◽ Missing Value ◽ Missing Value Imputation ◽ Adversarial Networks ◽ End To End

Missing value imputation on multidimensional time series

Proceedings of the VLDB Endowment ◽ 10.14778/3476249.3476300 ◽ 2021 ◽ Vol 14 (11) ◽ pp. 2533-2545

Author(s): Parikshit Bansal ◽ Prathamesh Deshpande ◽ Sunita Sarawagi

Keyword(s): Time Series ◽ Deep Learning ◽ Missing Data ◽ Matrix Factorization ◽ Missing Values ◽ Learning Methods ◽ Missing Value ◽ Missing Value Imputation ◽ Multidimensional Time Series ◽ Factorization Methods

We present DeepMVI, a deep learning method for missing value imputation in multidimensional time-series datasets. Missing values are commonplace in decision support platforms that aggregate data over long time stretches from disparate sources, whereas reliable data analytics calls for careful handling of missing data. One strategy is to impute the missing values, and a wide variety of algorithms exist, spanning simple interpolation, matrix factorization methods like SVD, statistical models like Kalman filters, and recent deep learning methods. We show that these often provide worse results on aggregate analytics than simply excluding the missing data. DeepMVI expresses the distribution of each missing value conditioned on coarse and fine-grained signals along a time series, and on signals from correlated series at the same time. Instead of resorting to the linearity assumptions of conventional matrix factorization methods, DeepMVI harnesses a flexible deep network to extract and combine these signals in an end-to-end manner. To prevent over-fitting with high-capacity neural networks, we design a robust parameter training procedure with labeled data created using synthetic missing blocks around available indices. Our neural network uses a modular design with a novel temporal transformer with convolutional features, and kernel regression with learned embeddings. Experiments across ten real datasets and five different missing scenarios, comparing seven conventional and three deep learning methods, show that DeepMVI is significantly more accurate, reducing error by more than 50% in more than half the cases compared to the best existing method. Although DeepMVI is slower than simpler matrix factorization methods, we justify the increased time overheads by showing that it provides significantly more accurate imputation that ultimately impacts the quality of downstream analytics.
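
The idea of creating labeled data from "synthetic missing blocks around available indices" can be illustrated with a minimal sketch. This is not the DeepMVI implementation: it only shows how hiding contiguous blocks of observed values yields ground truth for scoring an imputer, and it uses plain linear interpolation (one of the simple baselines the abstract mentions) as a stand-in model. The function names, block length, and toy data are all assumptions made for illustration.

```python
import numpy as np

def make_synthetic_block(mask, block_len, rng, max_tries=100):
    """Pick one contiguous block that lies entirely on observed entries.

    mask: boolean array (n_series, n_steps), True where a value is observed.
    Returns (series_index, start, end) or None if no block was found.
    """
    n_series, n_steps = mask.shape
    for _ in range(max_tries):
        s = rng.integers(n_series)
        t = rng.integers(0, n_steps - block_len + 1)
        if mask[s, t:t + block_len].all():
            return int(s), int(t), int(t + block_len)
    return None

def interpolate_series(values, observed):
    """Stand-in imputer (assumption): linear interpolation along time."""
    idx = np.arange(values.size)
    return np.interp(idx, idx[observed], values[observed])

def synthetic_block_mae(data, mask, block_len=5, n_blocks=20, seed=0):
    """Hide synthetic blocks, impute them, and report mean absolute error."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_blocks):
        block = make_synthetic_block(mask, block_len, rng)
        if block is None:
            continue
        s, start, end = block
        train_mask = mask[s].copy()
        train_mask[start:end] = False      # hide values we actually know
        if not train_mask.any():
            continue
        filled = interpolate_series(data[s], train_mask)
        errors.append(np.abs(filled[start:end] - data[s, start:end]).mean())
    return float(np.mean(errors)) if errors else float("nan")

# Toy data (hypothetical): two smooth series, one with a truly missing gap.
t = np.linspace(0, 4 * np.pi, 200)
data = np.stack([np.sin(t), np.cos(t)])
mask = np.ones_like(data, dtype=bool)
mask[0, 50:60] = False
print(synthetic_block_mae(data, mask))
```

Because the hidden blocks are drawn only from positions that were actually observed, the error is measured against known values. In a learned model, the same masking would supply training targets rather than only an evaluation score.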
