Adaptive data augmentation for supervised learning over missing data

Real-world data is dirty, which causes serious problems in (supervised) machine learning (ML). The widely used practice in such scenario is to first repair the labeled source (a.k.a. train) data using rule-, statistical- or ML-based methods and then use the "repaired" source to train an ML model. During production, unlabeled target (a.k.a. test) data will also be repaired, and is then fed in the trained ML model for prediction. However, this process often causes a performance degradation when the source and target datasets are dirty with different noise patterns , which is common in practice. In this paper, we propose an adaptive data augmentation approach, for handling missing data in supervised ML. The approach extracts noise patterns from target data, and adapts the source data with the extracted target noise patterns while still preserving supervision signals in the source. Then, it patches the ML model by retraining it on the adapted data, in order to better serve the target. To effectively support adaptive data augmentation, we propose a novel generative adversarial network (GAN) based framework, called DAGAN, which works in an unsupervised fashion. DAGAN consists of two connected GAN networks. The first GAN learns the noise pattern from the target, for target mask generation. The second GAN uses the learned target mask to augment the source data, for source data adaptation. The augmented source data is used to retrain the ML model. Extensive experiments show that our method significantly improves the ML model performance and is more robust than the state-of-the-art missing data imputation solutions for handling datasets with different missing value patterns.

Download Full-text

Data Augmentation Using Generative Adversarial Network for Automatic Machine Fault Detection Based on Vibration Signals

Applied Sciences ◽

10.3390/app11052166 ◽

2021 ◽

Vol 11 (5) ◽

pp. 2166

Author(s):

Van Bui ◽

Tung Lam Pham ◽

Huy Nguyen ◽

Yeong Min Jang

Keyword(s):

Fault Detection ◽

Data Augmentation ◽

Model Performance ◽

Original Data ◽

Fault Classification ◽

Training Process ◽

Generative Adversarial Network ◽

Data Set ◽

Adversarial Network ◽

Machine Fault

In the last decade, predictive maintenance has attracted a lot of attention in industrial factories because of its wide use of the Internet of Things and artificial intelligence algorithms for data management. However, in the early phases where the abnormal and faulty machines rarely appeared in factories, there were limited sets of machine fault samples. With limited fault samples, it is difficult to perform a training process for fault classification due to the imbalance of input data. Therefore, data augmentation was required to increase the accuracy of the learning model. However, there were limited methods to generate and evaluate the data applied for data analysis. In this paper, we introduce a method of using the generative adversarial network as the fault signal augmentation method to enrich the dataset. The enhanced data set could increase the accuracy of the machine fault detection model in the training process. We also performed fault detection using a variety of preprocessing approaches and classified the models to evaluate the similarities between the generated data and authentic data. The generated fault data has high similarity with the original data and it significantly improves the accuracy of the model. The accuracy of fault machine detection reaches 99.41% with 20% original fault machine data set and 93.1% with 0% original fault machine data set (only use generate data only). Based on this, we concluded that the generated data could be used to mix with original data and improve the model performance.

Download Full-text

Continuous missing data imputation with incomplete dataset by generative adversarial networks–based unsupervised learning for long-term bridge health monitoring

Structural Health Monitoring ◽

10.1177/14759217211021942 ◽

2021 ◽

pp. 147592172110219

Author(s):

Huachen Jiang ◽

Chunfeng Wan ◽

Kang Yang ◽

Youliang Ding ◽

Songtao Xue

Keyword(s):

Missing Data ◽

Health Monitoring ◽

Signal Transmission ◽

Imputation Accuracy ◽

Generative Adversarial Networks ◽

Data Imputation ◽

Sensor Failure ◽

Generative Adversarial Network ◽

Missing Data Imputation ◽

Adversarial Network

Wireless sensors are the key components of structural health monitoring systems. During the signal transmission, sensor failure is inevitable, among which, data loss is the most common type. Missing data problem poses a huge challenge to the consequent damage detection and condition assessment, and therefore, great importance should be attached. Conventional missing data imputation basically adopts the correlation-based method, especially for strain monitoring data. However, such methods often require delicate model selection, and the correlations for vehicle-induced strains are much harder to be captured compared with temperature-induced strains. In this article, a novel data-driven generative adversarial network (GAN) for imputing missing strain response is proposed. As opposed to traditional ways where correlations for inter-strains are explicitly modeled, the proposed method directly imputes the missing data considering the spatial–temporal relationships with other strain sensors based on the remaining observed data. Furthermore, the intact and complete dataset is not even necessary during the training process, which shows another great superiority over the model-based imputation method. The proposed method is implemented and verified on a real concrete bridge. In order to demonstrate the applicability and robustness of the GAN, imputation for single and multiple sensors is studied. Results show the proposed method provides an excellent performance of imputation accuracy and efficiency.

Download Full-text

Urban Intersection Classification: A Comparative Analysis

Sensors ◽

10.3390/s21186269 ◽

2021 ◽

Vol 21 (18) ◽

pp. 6269

Author(s):

Augusto Luis Ballardini ◽

Álvaro Hernández Saz ◽

Sandra Carrasco Limeros ◽

Javier Lorenzo ◽

Ignacio Parra Alonso ◽

...

Keyword(s):

Ad Hoc ◽

Data Augmentation ◽

Comprehensive Evaluation ◽

Temporal Integration ◽

Individual Performance ◽

Training Data ◽

Real World Data ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Integration Schemes

Understanding the scene in front of a vehicle is crucial for self-driving vehicles and Advanced Driver Assistance Systems, and in urban scenarios, intersection areas are one of the most critical, concentrating between 20% to 25% of road fatalities. This research presents a thorough investigation on the detection and classification of urban intersections as seen from onboard front-facing cameras. Different methodologies aimed at classifying intersection geometries have been assessed to provide a comprehensive evaluation of state-of-the-art techniques based on Deep Neural Network (DNN) approaches, including single-frame approaches and temporal integration schemes. A detailed analysis of most popular datasets previously used for the application together with a comparison with ad hoc recorded sequences revealed that the performances strongly depend on the field of view of the camera rather than other characteristics or temporal-integrating techniques. Due to the scarcity of training data, a new dataset is created by performing data augmentation from real-world data through a Generative Adversarial Network (GAN) to increase generalizability as well as to test the influence of data quality. Despite being in the relatively early stages, mainly due to the lack of intersection datasets oriented to the problem, an extensive experimental activity has been performed to analyze the individual performance of each proposed systems.

Download Full-text

Alleviating Class Imbalance in Actuarial Applications Using Generative Adversarial Networks

Risks ◽

10.3390/risks9030049 ◽

2021 ◽

Vol 9 (3) ◽

pp. 49

Author(s):

Kwanda Sydwell Ngwenduna ◽

Rendani Mbuvha

Keyword(s):

Predictive Models ◽

Data Augmentation ◽

Class Imbalance ◽

Machine Learning Techniques ◽

Quality Data ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Missing Data Imputation ◽

Adversarial Network ◽

Imbalanced Classes

To build adequate predictive models, a substantial amount of data is desirable. However, when expanding to new or unexplored territories, this required level of information is rarely always available. To build such models, actuaries often have to: procure data from local providers, use limited unsuitable industry and public research, or rely on extrapolations from other better-known markets. Another common pathology when applying machine learning techniques in actuarial domains is the prevalence of imbalanced classes where risk events of interest, such as mortality and fraud, are under-represented in data. In this work, we show how an implicit model using the Generative Adversarial Network (GAN) can alleviate these problems through the generation of adequate quality data from very limited or highly imbalanced samples. We provide an introduction to GANs and how they are used to synthesize data that accurately enhance the data resolution of very infrequent events and improve model robustness. Overall, we show a significant superiority of GANs for boosting predictive models when compared to competing approaches on benchmark data sets. This work offers numerous of contributions to actuaries with applications to inter alia new sample creation, data augmentation, boosting predictive models, anomaly detection, and missing data imputation.

Download Full-text

Wind Power Time Series Missing Data Imputation Based on Generative Adversarial Network

10.1109/cieec50170.2021.9510923 ◽

2021 ◽

Author(s):

Hang Fan ◽

Xuemin Zhang ◽

Shengwei Mei

Keyword(s):

Time Series ◽

Missing Data ◽

Wind Power ◽

Data Imputation ◽

Generative Adversarial Network ◽

Missing Data Imputation ◽

Adversarial Network

Download Full-text

Wildfire-Detection Method Using DenseNet and CycleGAN Data Augmentation-Based Remote Camera Imagery

Remote Sensing ◽

10.3390/rs12223715 ◽

2020 ◽

Vol 12 (22) ◽

pp. 3715 ◽

Cited By ~ 1

Author(s):

Minsoo Park ◽

Dai Quoc Tran ◽

Daekyo Jung ◽

Seunghee Park

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Model Performance ◽

Image Data ◽

Detection Accuracy ◽

Generative Adversarial Network ◽

Data Imbalance ◽

Surveillance Camera ◽

Convolutional Networks ◽

Adversarial Network

To minimize the damage caused by wildfires, a deep learning-based wildfire-detection technology that extracts features and patterns from surveillance camera images was developed. However, many studies related to wildfire-image classification based on deep learning have highlighted the problem of data imbalance between wildfire-image data and forest-image data. This data imbalance causes model performance degradation. In this study, wildfire images were generated using a cycle-consistent generative adversarial network (CycleGAN) to eliminate data imbalances. In addition, a densely-connected-convolutional-networks-based (DenseNet-based) framework was proposed and its performance was compared with pre-trained models. While training with a train set containing an image generated by a GAN in the proposed DenseNet-based model, the best performance result value was realized among the models with an accuracy of 98.27% and an F1 score of 98.16, obtained using the test dataset. Finally, this trained model was applied to high-quality drone images of wildfires. The experimental results showed that the proposed framework demonstrated high wildfire-detection accuracy.

Download Full-text

Satellite Imagery Noising With Generative Adversarial Networks

International Journal of Cognitive Informatics and Natural Intelligence ◽

10.4018/ijcini.2021010102 ◽

2021 ◽

Vol 15 (1) ◽

pp. 16-25

Author(s):

Mohamed Akram Zaytar ◽

Chaker El Amrani

Keyword(s):

Remote Sensing ◽

Satellite Imagery ◽

Data Augmentation ◽

Remote Sensing Data ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Sensing Data ◽

Adversarial Network

Using satellite imagery and remote sensing data for supervised and self-supervised learning problems can be quite challenging when parts of the underlying datasets are missing due to natural phenomena (clouds, fog, haze, mist, etc.). Solving this problem will improve remote sensing data augmentation and make use of it in a world where satellite imagery represents a great resource to exploit in any big data pipeline setup. In this paper, the authors present a generative adversarial network (GANs) model that can generate natural atmospheric noise that serves as a data augmentation preprocessing tool to produce input to supervised machine learning algorithms.

Download Full-text

Classification of Space Objects by Using Deep Learning with Micro-Doppler Signature Images

Sensors ◽

10.3390/s21134365 ◽

2021 ◽

Vol 21 (13) ◽

pp. 4365

Author(s):

Kwangyong Jung ◽

Jae-In Lee ◽

Nammoon Kim ◽

Sunjin Oh ◽

Dong-Wook Seo

Keyword(s):

Transfer Learning ◽

Data Augmentation ◽

Doppler Frequency ◽

Extraction Methods ◽

Classification Performance ◽

Reconstruction Method ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Space Objects ◽

Target Characteristics

Radar target classification is an important task in the missile defense system. State-of-the-art studies using micro-doppler frequency have been conducted to classify the space object targets. However, existing studies rely highly on feature extraction methods. Therefore, the generalization performance of the classifier is limited and there is room for improvement. Recently, to improve the classification performance, the popular approaches are to build a convolutional neural network (CNN) architecture with the help of transfer learning and use the generative adversarial network (GAN) to increase the training datasets. However, these methods still have drawbacks. First, they use only one feature to train the network. Therefore, the existing methods cannot guarantee that the classifier learns more robust target characteristics. Second, it is difficult to obtain large amounts of data that accurately mimic real-world target features by performing data augmentation via GAN instead of simulation. To mitigate the above problem, we propose a transfer learning-based parallel network with the spectrogram and the cadence velocity diagram (CVD) as the inputs. In addition, we obtain an EM simulation-based dataset. The radar-received signal is simulated according to a variety of dynamics using the concept of shooting and bouncing rays with relative aspect angles rather than the scattering center reconstruction method. Our proposed model is evaluated on our generated dataset. The proposed method achieved about 0.01 to 0.39% higher accuracy than the pre-trained networks with a single input feature.

Download Full-text

An Imbalanced Image Classification Method for the Cell Cycle Phase

Information ◽

10.3390/info12060249 ◽

2021 ◽

Vol 12 (6) ◽

pp. 249

Author(s):

Xin Jin ◽

Yuanwen Zou ◽

Zhongbing Huang

Keyword(s):

Cell Cycle ◽

Deep Learning ◽

Image Classification ◽

Classification Accuracy ◽

Data Augmentation ◽

Cycle Phase ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Cellular Life

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented, and their features need to be extracted. During feature extraction, some important information may be lost, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. In order to solve the problems surrounding insufficient numbers of original images and the imbalanced distribution of original images, we used the Wasserstein generative adversarial network-gradient penalty (WGAN-GP) for data augmentation. At the same time, a residual network (ResNet) was used for image classification. ResNet is one of the most used deep learning classification networks. The classification accuracy of cell cycle images was achieved more effectively with our method, reaching 83.88%. Compared with an accuracy of 79.40% in previous experiments, our accuracy increased by 4.48%. Another dataset was used to verify the effect of our model and, compared with the accuracy from previous results, our accuracy increased by 12.52%. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially solve the low classification accuracy in biomedical images caused by insufficient numbers of original images and the imbalanced distribution of original images.

Download Full-text

Underwater Acoustic Target Recognition Based on Generative Adversarial Network Data Augmentation

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-2737 ◽

2021 ◽

Vol 263 (2) ◽

pp. 4558-4564

Author(s):

Minghong Zhang ◽

Xinwei Luo

Keyword(s):

Data Augmentation ◽

Target Recognition ◽

Training Data ◽

Small Samples ◽

Generative Adversarial Network ◽

Data Set ◽

Underwater Acoustic ◽

Adversarial Network ◽

Acoustic Target ◽

The Impact

Underwater acoustic target recognition is an important aspect of underwater acoustic research. In recent years, machine learning has been developed continuously, which is widely and effectively applied in underwater acoustic target recognition. In order to acquire good recognition results and reduce the problem of overfitting, Adequate data sets are essential. However, underwater acoustic samples are relatively rare, which has a certain impact on recognition accuracy. In this paper, in addition of the traditional audio data augmentation method, a new method of data augmentation using generative adversarial network is proposed, which uses generator and discriminator to learn the characteristics of underwater acoustic samples, so as to generate reliable underwater acoustic signals to expand the training data set. The expanded data set is input into the deep neural network, and the transfer learning method is applied to further reduce the impact caused by small samples by fixing part of the pre-trained parameters. The experimental results show that the recognition result of this method is better than the general underwater acoustic recognition method, and the effectiveness of this method is verified.

Download Full-text