Complex Data Imputation by Auto-Encoders and Convolutional Neural Networks—A Case Study on Genome Gap-Filling

Computers ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 37 ◽  
Author(s):  
Luca Cappelletti ◽  
Tommaso Fontana ◽  
Guido Walter Di Donato ◽  
Lorenzo Di Tucci ◽  
Elena Casiraghi ◽  
...  

Missing data imputation has been a hot topic in the past decade, and many state-of-the-art works have proposed novel, interesting solutions that have been applied in a variety of fields. Over the same period, the successful results achieved by deep learning techniques have opened the way to their application to difficult problems where human skill cannot provide a reliable solution. Not surprisingly, some deep learners, mainly exploiting encoder-decoder architectures, have also been designed and applied to the task of missing data imputation. However, most of the proposed imputation techniques have not been designed to tackle “complex data”, that is, high-dimensional data belonging to datasets with huge cardinality and describing complex problems. Specifically, they often require critical parameters to be set manually, or rely on complex architectures and/or training phases whose computational load is impractical. In this paper, after clustering the state-of-the-art imputation techniques into three broad categories, we briefly review the most representative methods and then describe our data imputation proposals, which exploit deep learning techniques specifically designed to handle complex data. Comparative tests on genome sequences show that our deep learning imputers outperform the state-of-the-art KNN-imputation method when filling gaps in human genome sequences.
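For reference, a minimal sketch of the KNN-imputation baseline mentioned above is shown below. It is illustrative only: the paper's exact sequence encoding and pipeline are not described here, and the toy matrix and parameters are assumptions.

```python
# Illustrative KNN-imputation baseline (not the paper's exact pipeline).
# Assumes gapped sequence windows have been numerically encoded, with gaps as NaN.
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix: rows are samples (e.g., encoded sequence windows), NaN marks a gap.
X = np.array([
    [0.0, 1.0, 0.0, 1.0],
    [0.0, np.nan, 0.0, 1.0],
    [1.0, 0.0, np.nan, 0.0],
    [1.0, 0.0, 1.0, 0.0],
])

imputer = KNNImputer(n_neighbors=2, weights="distance")
X_filled = imputer.fit_transform(X)  # gaps replaced by weighted neighbour averages
print(X_filled)
```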

2020 ◽  
Vol 69 ◽  
pp. 1255-1285
Author(s):  
Ricardo Cardoso Pereira ◽  
Miriam Seoane Santos ◽  
Pedro Pereira Rodrigues ◽  
Pedro Henriques Abreu

Missing data is a problem often found in real-world datasets, and it can degrade the performance of most machine learning models. Several deep learning techniques have been used to address this issue, among them the Autoencoder and its Denoising and Variational variants. These models are able to learn a representation of the data with missing values and generate plausible new values to replace them. This study surveys the use of Autoencoders for the imputation of tabular data and considers 26 works published between 2014 and 2020. The analysis mainly focuses on patterns and recommendations for the architecture, hyperparameters and training settings of the network, while providing a detailed discussion of the results obtained by Autoencoders when compared to other state-of-the-art methods, and of the data contexts where they have been applied. The conclusions include a set of recommendations for the technical settings of the network, and show that Denoising Autoencoders outperform their competitors, particularly the frequently used statistical methods.
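To make the idea concrete, a minimal denoising-autoencoder imputation sketch in PyTorch follows. The layer sizes, corruption rate and masked loss are illustrative assumptions, not settings recommended by the survey.

```python
# Minimal denoising-autoencoder imputation sketch (illustrative only).
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features, hidden=32, code=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, hidden), nn.ReLU(),
                                     nn.Linear(hidden, code), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(code, hidden), nn.ReLU(),
                                     nn.Linear(hidden, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, x, observed_mask, corruption=0.2):
    """One step: corrupt observed entries, reconstruct, score only observed cells."""
    noise_mask = (torch.rand_like(x) > corruption).float()
    x_corrupted = x * noise_mask                      # randomly zero out some entries
    x_hat = model(x_corrupted)
    loss = ((x_hat - x) ** 2 * observed_mask).sum() / observed_mask.sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()

# After training, missing cells can be replaced by the reconstruction:
# x_imputed = torch.where(observed_mask.bool(), x, model(x).detach())
```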


2021 ◽  
Author(s):  
Yuan Luo

The Data Analytics Challenge on Missing data Imputation (DACMI) presented a shared clinical dataset with ground truth for evaluating and advancing the state of the art in imputing missing data for clinical time series. The challenge attracted 12 international teams spanning three continents, from both industry and academia. The participating systems advanced the state of the art by considerable margins, and their design principles will inform future efforts to better model clinical missing data.


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4486
Author(s):  
Niall O’Mahony ◽  
Sean Campbell ◽  
Lenka Krpalkova ◽  
Anderson Carvalho ◽  
Joseph Walsh ◽  
...  

Fine-grained change detection in sensor data is very challenging for artificial intelligence though it is critically important in practice. It is the process of identifying differences in the state of an object or phenomenon where the differences are class-specific and are difficult to generalise. As a result, many recent technologies that leverage big data and deep learning struggle with this task. This review focuses on the state-of-the-art methods, applications, and challenges of representation learning for fine-grained change detection. Our research focuses on methods of harnessing the latent metric space of representation learning techniques as an interim output for hybrid human-machine intelligence. We review methods for transforming and projecting embedding space such that significant changes can be communicated more effectively and a more comprehensive interpretation of underlying relationships in sensor data is facilitated. We conduct this research in our work towards developing a method for aligning the axes of latent embedding space with meaningful real-world metrics so that the reasoning behind the detection of change in relation to past observations may be revealed and adjusted. This is an important topic in many fields concerned with producing more meaningful and explainable outputs from deep learning and also for providing means for knowledge injection and model calibration in order to maintain user confidence.
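As a simplified illustration of aligning a latent axis with a real-world metric, the sketch below fits a linear probe from embeddings to a known sensor metric and projects onto the resulting direction. The data, probe and projection are assumptions for illustration, not the methods covered by the review.

```python
# Illustrative alignment of one embedding axis with a real-world metric via a linear probe.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 16))                 # latent vectors from some encoder
metric = embeddings @ rng.normal(size=16) + 0.1 * rng.normal(size=200)  # known sensor metric

probe = LinearRegression().fit(embeddings, metric)
axis = probe.coef_ / np.linalg.norm(probe.coef_)        # latent direction tied to the metric

# Projecting embeddings onto this axis yields a coordinate that tracks the metric,
# so detected changes along it can be reported in interpretable, metric-like units.
projected = embeddings @ axis
```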


2019 ◽  
Vol 4 (4) ◽  
pp. 828-849 ◽  
Author(s):  
Daniel C. Elton ◽  
Zois Boukouvalas ◽  
Mark D. Fuge ◽  
Peter W. Chung

We review a recent groundswell of work which uses deep learning techniques to generate and optimize molecules.


2014 ◽  
Vol 29 (1) ◽  
pp. 21-28 ◽  
Author(s):  
Gláucia Tatiana Ferrari ◽  
Vitor Ozaki

Time series from weather stations in Brazil contain a substantial amount of missing data, outliers and spurious zeroes. In order to use this dataset in risk and meteorological studies, one should consider alternative methodologies to deal with these problems. This article describes the statistical imputation and quality-control procedures applied to a database of daily precipitation from meteorological stations located in the State of Parana, Brazil. After imputation, the data went through a quality-control process to identify possible errors, such as identical precipitation over seven consecutive days or precipitation values that differ significantly from those at neighboring weather stations. Next, we used extreme value theory to model agricultural drought, considering the maximum number of consecutive days with precipitation below 7 mm between January and February, in the main soybean agricultural regions of the State of Parana.
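A minimal sketch of the drought index described above (the longest run of consecutive days with precipitation below 7 mm) might look as follows, assuming a daily precipitation series already restricted to January and February.

```python
# Longest run of consecutive days with precipitation below a threshold (default 7 mm).
def max_dry_spell(daily_precip_mm, threshold=7.0):
    longest = current = 0
    for p in daily_precip_mm:
        current = current + 1 if p < threshold else 0
        longest = max(longest, current)
    return longest

# Example: the longest spell below 7 mm here lasts 4 days.
print(max_dry_spell([0.0, 2.5, 6.9, 1.0, 12.0, 3.0, 0.0]))  # -> 4
```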


Traffic data plays a major role in transport-related applications, and missing data can greatly degrade the performance of intelligent transportation systems (ITS). In this work, missing traffic data are imputed by exploiting spatio-temporal information to obtain high-precision results under various missing rates. A deep-learning-based stacked denoising autoencoder with the efficient ELU activation function is proposed to remove noise and impute the missing values. The imputed values are then used in the analysis and prediction of vehicle traffic. The results show that the proposed method outperforms state-of-the-art approaches.


2019 ◽  
Vol 11 (12) ◽  
pp. 1499 ◽  
Author(s):  
David Griffiths ◽  
Jan Boehm

Over the past decade, deep learning has driven progress in 2D image understanding. Despite these advancements, techniques for automatically understanding 3D sensed data, such as point clouds, are comparatively immature. However, with a range of important applications from indoor robotics navigation to national-scale remote sensing, there is a high demand for algorithms that can learn to automatically understand and classify 3D sensed data. In this paper we review the current state-of-the-art deep learning architectures for processing unstructured Euclidean data. We begin by addressing the background concepts and traditional methodologies. We review the current main approaches, including RGB-D, multi-view, volumetric and fully end-to-end architecture designs. Datasets for each category are documented and explained. Finally, we give a detailed discussion of the future of deep learning for 3D sensed data, using the literature to justify the areas where future research would be most valuable.


2021 ◽  
Vol 11 (8) ◽  
pp. 3719
Author(s):  
Sun-Young Ihm ◽  
Shin-Eun Lee ◽  
Young-Ho Park ◽  
Aziz Nasridinov ◽  
Miyeon Kim ◽  
...  

Collaborative filtering (CF) is a recommendation technique that analyzes the behavior of various users and recommends the items preferred by users with similar preferences. However, CF methods suffer from poor recommendation accuracy when the user preference data used in the recommendation process is sparse. Data imputation can alleviate the data sparsity problem by filling in a portion of the missing user preferences with virtual values. In this paper, we propose a k-recursive reliability-based imputation (k-RRI) method that first selects data with high reliability and then recursively imputes additional selected data while gradually lowering the reliability criterion. We also propose a new similarity measure that weights common interests and indifferences between users and items. The proposed method overcomes the tendency of existing methods to disregard the importance of missing data and addresses their poor imputation quality. The experimental results demonstrate that the proposed approach significantly improves recommendation accuracy compared with state-of-the-art methods while requiring lower computational complexity.
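A heavily simplified sketch of the recursive, reliability-thresholded imputation loop described above follows. The reliability scores, threshold schedule and per-cell imputation rule are placeholders and do not reproduce the paper's k-RRI definitions.

```python
# Simplified recursive imputation with a gradually relaxed reliability criterion.
import numpy as np

def recursive_reliability_impute(ratings, reliability, start=0.9, step=0.1, floor=0.5):
    """ratings: user-item matrix with np.nan marking missing entries.
    reliability: same-shaped matrix of scores in [0, 1] for candidate imputations (assumed given)."""
    R = ratings.copy()
    threshold = start
    while threshold >= floor:
        missing = np.isnan(R)
        trusted = missing & (reliability >= threshold)
        # Placeholder imputation rule: item mean over currently observed ratings.
        item_means = np.nanmean(R, axis=0)
        rows, cols = np.where(trusted)
        R[rows, cols] = item_means[cols]
        threshold -= step            # gradually relax the reliability criterion
    return R
```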

