Evaluating the state-of-the-art in missing data imputation for clinical data (Preprint)

Mapping Intimacies ◽

10.2196/preprints.28008 ◽

2021 ◽

Author(s):

Yuan Luo

Keyword(s):

Time Series ◽

Missing Data ◽

Clinical Data ◽

Data Analytics ◽

State Of The Art ◽

Ground Truth ◽

The State ◽

Data Imputation ◽

Missing Data Imputation ◽

Clinical Dataset

UNSTRUCTURED The Data Analytics Challenge on Missing data Imputation (DACMI) presented a shared clinical dataset with ground truth for evaluating and advancing the state-of-the-art in imputing missing data for clinical time series. The challenge attracted 12 international teams spanning three continents across multiple industries and academia. The challenge participating systems practically advanced the state-of-the-art with considerable margins, and their designing principles will inform future efforts to better model clinical missing data.

Download Full-text

Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling

Computers ◽

10.3390/computers9020037 ◽

2020 ◽

Vol 9 (2) ◽

pp. 37 ◽

Cited By ~ 1

Author(s):

Luca Cappelletti ◽

Tommaso Fontana ◽

Guido Walter Di Donato ◽

Lorenzo Di Tucci ◽

Elena Casiraghi ◽

...

Keyword(s):

Deep Learning ◽

Missing Data ◽

State Of The Art ◽

The State ◽

Complex Data ◽

Data Imputation ◽

Genome Sequences ◽

Missing Data Imputation ◽

The Past ◽

Learning Techniques

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have been presented to propose novel, interesting solutions that have been applied in a variety of fields. In the past decade, the successful results achieved by deep learning techniques have opened the way to their application for solving difficult problems where human skill is not able to provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques have not been designed to tackle “complex data”, that is high dimensional data belonging to datasets with huge cardinality and describing complex problems. Precisely, they often need critical parameters to be manually set or exploit complex architecture and/or training phases that make their computational load impracticable. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.

Download Full-text

Reviewing Autoencoders for Missing Data Imputation: Technical Trends, Applications and Outcomes

Journal of Artificial Intelligence Research ◽

10.1613/jair.1.12312 ◽

2020 ◽

Vol 69 ◽

pp. 1255-1285

Author(s):

Ricardo Cardoso Pereira ◽

Miriam Seoane Santos ◽

Pedro Pereira Rodrigues ◽

Pedro Henriques Abreu

Keyword(s):

Missing Data ◽

Missing Values ◽

State Of The Art ◽

Data Imputation ◽

Tabular Data ◽

Missing Data Imputation ◽

Learning Techniques ◽

Real World Datasets ◽

And Training ◽

Machine Learning Models

Missing data is a problem often found in real-world datasets and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, and one of them is the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new ones to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis is mainly focused on discussing patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the often used statistical methods.

Download Full-text

Missing data imputation of high‐resolution temporal climate time series data

Meteorological Applications ◽

10.1002/met.1873 ◽

2020 ◽

Vol 27 (1) ◽

Author(s):

E Afrifa‐Yamoah ◽

U. A. Mueller ◽

S. M. Taylor ◽

A. J. Fisher

Keyword(s):

Time Series ◽

Missing Data ◽

High Resolution ◽

Time Series Data ◽

Series Data ◽

Data Imputation ◽

Missing Data Imputation ◽

Climate Time Series

Download Full-text

Missing Data Imputation for Real Time-series Data in a Steel Industry using Generative Adversarial Networks

10.1109/iecon48115.2021.9589716 ◽

2021 ◽

Author(s):

Kisan Sarda ◽

Amol Yerudkar ◽

Carmen Del Vecchio

Keyword(s):

Time Series ◽

Missing Data ◽

Real Time ◽

Steel Industry ◽

Time Series Data ◽

Series Data ◽

Generative Adversarial Networks ◽

Data Imputation ◽

Missing Data Imputation ◽

Adversarial Networks

Download Full-text

Missing Data Imputation in Time Series by Evolutionary Algorithms

Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence - Lecture Notes in Computer Science ◽

10.1007/978-3-540-85984-0_34 ◽

2008 ◽

pp. 275-283 ◽

Cited By ~ 4

Author(s):

Juan C. Figueroa García ◽

Dusko Kalenatic ◽

Cesar Amilcar Lopez Bello

Keyword(s):

Time Series ◽

Missing Data ◽

Evolutionary Algorithms ◽

Data Imputation ◽

Missing Data Imputation

Download Full-text

Missing data imputation of climate datasets: implications to modeling extreme drought events

Revista Brasileira de Meteorologia ◽

10.1590/s0102-77862014000100003 ◽

2014 ◽

Vol 29 (1) ◽

pp. 21-28 ◽

Cited By ~ 13

Author(s):

Gláucia Tatiana Ferrari ◽

Vitor Ozaki

Keyword(s):

Quality Control ◽

Missing Data ◽

Value Theory ◽

The State ◽

Agricultural Drought ◽

Data Imputation ◽

Missing Data Imputation ◽

Control Procedures ◽

Weather Stations ◽

Agricultural Regions

Time series from weather stations in Brazil have several missing data, outliers and spurious zeroes. In order to use this dataset in risk and meteorological studies, one should take into account alternative methodologies to deal with these problems. This article describes the statistical imputation and quality control procedures applied to a database of daily precipitation from meteorological stations located in the State of Parana, Brazil. After imputation, the data went through a process of quality control to identify possible errors, such as: identical precipitation over seven consecutive days and precipitation values that differ significantly from the values in neighboring weather stations. Next, we used the extreme value theory to model agricultural drought, considering the maximum number of consecutive days with precipitation below 7 mm for the period between January and February, in the main soybean agricultural regions in the State of Parana.

Download Full-text

Traffic prediction, data compression, abnormal data detection and missing data imputation: An integrated study based on the decomposition of traffic time series

17th International IEEE Conference on Intelligent Transportation Systems (ITSC) ◽

10.1109/itsc.2014.6957705 ◽

2014 ◽

Cited By ~ 4

Author(s):

Li Li ◽

Xiaonan Su ◽

Yi Zhang ◽

Jianming Hu ◽

Zhiheng Li

Keyword(s):

Time Series ◽

Missing Data ◽

Data Compression ◽

Traffic Prediction ◽

Data Detection ◽

Data Imputation ◽

Missing Data Imputation ◽

Traffic Time Series ◽

Integrated Study

Download Full-text

Use Case and Performance Analyses for Missing Data Imputation Methods in Big Data Analytics

Proceedings of 2020 the 6th International Conference on Computing and Data Engineering ◽

10.1145/3379247.3379270 ◽

2020 ◽

Author(s):

Lan Yang ◽

Jason Amaro Chiang

Keyword(s):

Big Data ◽

Missing Data ◽

Data Analytics ◽

Big Data Analytics ◽

Use Case ◽

Data Imputation ◽

Missing Data Imputation ◽

Imputation Methods ◽

Performance Analyses ◽

And Performance

Download Full-text

Wind Power Time Series Missing Data Imputation Based on Generative Adversarial Network

10.1109/cieec50170.2021.9510923 ◽

2021 ◽

Author(s):

Hang Fan ◽

Xuemin Zhang ◽

Shengwei Mei

Keyword(s):

Time Series ◽

Missing Data ◽

Wind Power ◽

Data Imputation ◽

Generative Adversarial Network ◽

Missing Data Imputation ◽

Adversarial Network

Download Full-text

Missing Data Imputation in Time Series of Air Pollution

Epidemiology ◽

10.1097/01.ede.0000362970.08869.60 ◽

2009 ◽

Vol 20 ◽

pp. S87 ◽

Cited By ~ 4

Author(s):

Washington Junger ◽

Antonio Ponce de Leon

Keyword(s):

Air Pollution ◽

Time Series ◽

Missing Data ◽

Data Imputation ◽

Missing Data Imputation

Download Full-text