Data Augmentation for Audio–Visual Emotion Recognition with an Efficient Multimodal Conditional GAN

Audio–visual emotion recognition is the research of identifying human emotional states by combining the audio modality and the visual modality simultaneously, which plays an important role in intelligent human–machine interactions. With the help of deep learning, previous works have made great progress for audio–visual emotion recognition. However, these deep learning methods often require a large amount of data for training. In reality, data acquisition is difficult and expensive, especially for the multimodal data with different modalities. As a result, the training data may be in the low-data regime, which cannot be effectively used for deep learning. In addition, class imbalance may occur in the emotional data, which can further degrade the performance of audio–visual emotion recognition. To address these problems, we propose an efficient data augmentation framework by designing a multimodal conditional generative adversarial network (GAN) for audio–visual emotion recognition. Specifically, we design generators and discriminators for audio and visual modalities. The category information is used as their shared input to make sure our GAN can generate fake data of different categories. In addition, the high dependence between the audio modality and the visual modality in the generated multimodal data is modeled based on Hirschfeld–Gebelein–Re´nyi (HGR) maximal correlation. In this way, we relate different modalities in the generated data to approximate the real data. Then, the generated data are used to augment our data manifold. We further apply our approach to deal with the problem of class imbalance. To the best of our knowledge, this is the first work to propose a data augmentation strategy with a multimodal conditional GAN for audio–visual emotion recognition. We conduct a series of experiments on three public multimodal datasets, including eNTERFACE’05, RAVDESS, and CMEW. The results indicate that our multimodal conditional GAN has high effectiveness for data augmentation of audio–visual emotion recognition.

Download Full-text

A deep learning segmentation strategy that minimizes the amount of manually annotated images

F1000Research ◽

10.12688/f1000research.52026.1 ◽

2021 ◽

Vol 10 ◽

pp. 256

Author(s):

Thierry Pécot ◽

Alexander Alekseyenko ◽

Kristin Wallace

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Training Data ◽

Artificial Data ◽

Deep Convolutional Neural Networks ◽

Generative Adversarial Network ◽

Data Set ◽

Segmentation Strategy ◽

Adversarial Network ◽

The Impact

Deep learning has revolutionized the automatic processing of images. While deep convolutional neural networks have demonstrated astonishing segmentation results for many biological objects acquired with microscopy, this technology's good performance relies on large training datasets. In this paper, we present a strategy to minimize the amount of time spent in manually annotating images for segmentation. It involves using an efficient and open source annotation tool, the artificial increase of the training data set with data augmentation, the creation of an artificial data set with a conditional generative adversarial network and the combination of semantic and instance segmentations. We evaluate the impact of each of these approaches for the segmentation of nuclei in 2D widefield images of human precancerous polyp biopsies in order to define an optimal strategy.

Download Full-text

4-Class MI-EEG Signal Generation and Recognition with CVAE-GAN

Applied Sciences ◽

10.3390/app11041798 ◽

2021 ◽

Vol 11 (4) ◽

pp. 1798

Author(s):

Jun Yang ◽

Huijuan Yu ◽

Tao Shen ◽

Yaolian Song ◽

Zhuangfei Chen

Keyword(s):

Deep Learning ◽

Data Augmentation ◽

Feature Matching ◽

Continuous Optimization ◽

Training Data ◽

Small Scale ◽

Eeg Signal ◽

Generative Adversarial Network ◽

The Real ◽

Adversarial Network

As the capability of an electroencephalogram’s (EEG) measurement of the real-time electrodynamics of the human brain is known to all, signal processing techniques, particularly deep learning, could either provide a novel solution for learning but also optimize robust representations from EEG signals. Considering the limited data collection and inadequate concentration of during subjects testing, it becomes essential to obtain sufficient training data and useful features with a potential end-user of a brain–computer interface (BCI) system. In this paper, we combined a conditional variational auto-encoder network (CVAE) with a generative adversarial network (GAN) for learning latent representations from EEG brain signals. By updating the fine-tuned parameter fed into the resulting generative model, we could synthetize the EEG signal under a specific category. We employed an encoder network to obtain the distributed samples of the EEG signal, and applied an adversarial learning mechanism to continuous optimization of the parameters of the generator, discriminator and classifier. The CVAE was adopted to adjust the synthetics more approximately to the real sample class. Finally, we demonstrated our approach take advantages of both statistic and feature matching to make the training process converge faster and more stable and address the problem of small-scale datasets in deep learning applications for motor imagery tasks through data augmentation. The augmented training datasets produced by our proposed CVAE-GAN method significantly enhance the performance of MI-EEG recognition.

Download Full-text

EEG data augmentation for emotion recognition with a multiple generator conditional Wasserstein GAN

Complex & Intelligent Systems ◽

10.1007/s40747-021-00336-7 ◽

2021 ◽

Author(s):

Aiming Zhang ◽

Lei Su ◽

Yin Zhang ◽

Yunfa Fu ◽

Liping Wu ◽

...

Keyword(s):

Emotion Recognition ◽

Data Augmentation ◽

Real Data ◽

Training Data ◽

Quality Data ◽

Emotion Classification ◽

High Quality ◽

Proposed Model ◽

Eeg Data ◽

Substantial Progress

AbstractEEG-based emotion recognition has attracted substantial attention from researchers due to its extensive application prospects, and substantial progress has been made in feature extraction and classification modelling from EEG data. However, insufficient high-quality training data are available for building EEG-based emotion recognition models via machine learning or deep learning methods. The artificial generation of high-quality data is an effective approach for overcoming this problem. In this paper, a multi-generator conditional Wasserstein GAN method is proposed for the generation of high-quality artificial that covers a more comprehensive distribution of real data through the use of various generators. Experimental results demonstrate that the artificial data that are generated by the proposed model can effectively improve the performance of emotion classification models that are based on EEG.

Download Full-text

Deep Learning-Based Differentiation between Mucinous Cystic Neoplasm and Serous Cystic Neoplasm in the Pancreas Using Endoscopic Ultrasonography

Diagnostics ◽

10.3390/diagnostics11061052 ◽

2021 ◽

Vol 11 (6) ◽

pp. 1052

Author(s):

Leang Sim Nguon ◽

Kangwon Seo ◽

Jung-Hyun Lim ◽

Tae-Jun Song ◽

Sung-Hyun Cho ◽

...

Keyword(s):

Decision Making ◽

Deep Learning ◽

Network Model ◽

Endoscopic Ultrasonography ◽

Data Augmentation ◽

Clinical Information ◽

Training Data ◽

Fine Tuning ◽

Cystic Neoplasm ◽

Cystic Neoplasms

Mucinous cystic neoplasms (MCN) and serous cystic neoplasms (SCN) account for a large portion of solitary pancreatic cystic neoplasms (PCN). In this study we implemented a convolutional neural network (CNN) model using ResNet50 to differentiate between MCN and SCN. The training data were collected retrospectively from 59 MCN and 49 SCN patients from two different hospitals. Data augmentation was used to enhance the size and quality of training datasets. Fine-tuning training approaches were utilized by adopting the pre-trained model from transfer learning while training selected layers. Testing of the network was conducted by varying the endoscopic ultrasonography (EUS) image sizes and positions to evaluate the network performance for differentiation. The proposed network model achieved up to 82.75% accuracy and a 0.88 (95% CI: 0.817–0.930) area under curve (AUC) score. The performance of the implemented deep learning networks in decision-making using only EUS images is comparable to that of traditional manual decision-making using EUS images along with supporting clinical information. Gradient-weighted class activation mapping (Grad-CAM) confirmed that the network model learned the features from the cyst region accurately. This study proves the feasibility of diagnosing MCN and SCN using a deep learning network model. Further improvement using more datasets is needed.

Download Full-text

An Imbalanced Image Classification Method for the Cell Cycle Phase

Information ◽

10.3390/info12060249 ◽

2021 ◽

Vol 12 (6) ◽

pp. 249

Author(s):

Xin Jin ◽

Yuanwen Zou ◽

Zhongbing Huang

Keyword(s):

Cell Cycle ◽

Deep Learning ◽

Image Classification ◽

Classification Accuracy ◽

Data Augmentation ◽

Cycle Phase ◽

Generative Adversarial Network ◽

Adversarial Network ◽

Cellular Life

The cell cycle is an important process in cellular life. In recent years, some image processing methods have been developed to determine the cell cycle stages of individual cells. However, in most of these methods, cells have to be segmented, and their features need to be extracted. During feature extraction, some important information may be lost, resulting in lower classification accuracy. Thus, we used a deep learning method to retain all cell features. In order to solve the problems surrounding insufficient numbers of original images and the imbalanced distribution of original images, we used the Wasserstein generative adversarial network-gradient penalty (WGAN-GP) for data augmentation. At the same time, a residual network (ResNet) was used for image classification. ResNet is one of the most used deep learning classification networks. The classification accuracy of cell cycle images was achieved more effectively with our method, reaching 83.88%. Compared with an accuracy of 79.40% in previous experiments, our accuracy increased by 4.48%. Another dataset was used to verify the effect of our model and, compared with the accuracy from previous results, our accuracy increased by 12.52%. The results showed that our new cell cycle image classification system based on WGAN-GP and ResNet is useful for the classification of imbalanced images. Moreover, our method could potentially solve the low classification accuracy in biomedical images caused by insufficient numbers of original images and the imbalanced distribution of original images.

Download Full-text

Robust Approach to Supervised Deep Neural Network Training for Real-Time Object Classification in Cluttered Indoor Environment

Applied Sciences ◽

10.3390/app11157148 ◽

2021 ◽

Vol 11 (15) ◽

pp. 7148

Author(s):

Bedada Endale ◽

Abera Tullu ◽

Hayoung Shi ◽

Beom-Soo Kang

Keyword(s):

Neural Network ◽

Deep Learning ◽

Real Time ◽

Network Architecture ◽

Input Data ◽

Deep Neural Network ◽

Data Augmentation ◽

Object Classification ◽

Training Data ◽

Gradient Descent Algorithm

Unmanned aerial vehicles (UAVs) are being widely utilized for various missions: in both civilian and military sectors. Many of these missions demand UAVs to acquire artificial intelligence about the environments they are navigating in. This perception can be realized by training a computing machine to classify objects in the environment. One of the well known machine training approaches is supervised deep learning, which enables a machine to classify objects. However, supervised deep learning comes with huge sacrifice in terms of time and computational resources. Collecting big input data, pre-training processes, such as labeling training data, and the need for a high performance computer for training are some of the challenges that supervised deep learning poses. To address these setbacks, this study proposes mission specific input data augmentation techniques and the design of light-weight deep neural network architecture that is capable of real-time object classification. Semi-direct visual odometry (SVO) data of augmented images are used to train the network for object classification. Ten classes of 10,000 different images in each class were used as input data where 80% were for training the network and the remaining 20% were used for network validation. For the optimization of the designed deep neural network, a sequential gradient descent algorithm was implemented. This algorithm has the advantage of handling redundancy in the data more efficiently than other algorithms.

Download Full-text

Underwater Acoustic Target Recognition Based on Generative Adversarial Network Data Augmentation

INTER-NOISE and NOISE-CON Congress and Conference Proceedings ◽

10.3397/in-2021-2737 ◽

2021 ◽

Vol 263 (2) ◽

pp. 4558-4564

Author(s):

Minghong Zhang ◽

Xinwei Luo

Keyword(s):

Data Augmentation ◽

Target Recognition ◽

Training Data ◽

Small Samples ◽

Generative Adversarial Network ◽

Data Set ◽

Underwater Acoustic ◽

Adversarial Network ◽

Acoustic Target ◽

The Impact

Underwater acoustic target recognition is an important aspect of underwater acoustic research. In recent years, machine learning has been developed continuously, which is widely and effectively applied in underwater acoustic target recognition. In order to acquire good recognition results and reduce the problem of overfitting, Adequate data sets are essential. However, underwater acoustic samples are relatively rare, which has a certain impact on recognition accuracy. In this paper, in addition of the traditional audio data augmentation method, a new method of data augmentation using generative adversarial network is proposed, which uses generator and discriminator to learn the characteristics of underwater acoustic samples, so as to generate reliable underwater acoustic signals to expand the training data set. The expanded data set is input into the deep neural network, and the transfer learning method is applied to further reduce the impact caused by small samples by fixing part of the pre-trained parameters. The experimental results show that the recognition result of this method is better than the general underwater acoustic recognition method, and the effectiveness of this method is verified.

Download Full-text

INFRASTRUCTURE DEGRADATION AND POST-DISASTER DAMAGE DETECTION USING ANOMALY DETECTING GENERATIVE ADVERSARIAL NETWORKS

ISPRS Annals of Photogrammetry Remote Sensing and Spatial Information Sciences ◽

10.5194/isprs-annals-v-2-2020-573-2020 ◽

2020 ◽

Vol V-2-2020 ◽

pp. 573-582 ◽

Cited By ~ 1

Author(s):

S. M. Tilon ◽

F. Nex ◽

D. Duarte ◽

N. Kerle ◽

G. Vosselman

Keyword(s):

Deep Learning ◽

Damage Detection ◽

Training Data ◽

Generative Adversarial Networks ◽

Generative Adversarial Network ◽

Essential Information ◽

Urban Scenes ◽

Adversarial Network ◽

Collapsed Buildings ◽

Post Disaster

Abstract. Degradation and damage detection provides essential information to maintenance workers in routine monitoring and to first responders in post-disaster scenarios. Despite advance in Earth Observation (EO), image analysis and deep learning techniques, the quality and quantity of training data for deep learning is still limited. As a result, no robust method has been found yet that can transfer and generalize well over a variety of geographic locations and typologies of damages. Since damages can be seen as anomalies, occurring sparingly over time and space, we propose to use an anomaly detecting Generative Adversarial Network (GAN) to detect damages. The main advantages of using GANs are that only healthy unannotated images are needed, and that a variety of damages, including the never before seen damage, can be detected. In this study we aimed to investigate 1) the ability of anomaly detecting GANs to detect degradation (potholes and cracks) in asphalt road infrastructures using Mobile Mapper imagery and building damage (collapsed buildings, rubble piles) using post-disaster aerial imagery, and 2) the sensitivity of this method against various types of pre-processing. Our results show that we can detect damages in urban scenes at satisfying levels but not on asphalt roads. Future work will investigate how to further classify the found damages and how to improve damage detection for asphalt roads.

Download Full-text

Super-Resolution Based on Generative Adversarial Network for HRTEM Images

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001421540276 ◽

2021 ◽

pp. 2154027

Author(s):

Fuqi Mao ◽

Xiaohan Guan ◽

Ruoyu Wang ◽

Wen Yue

Keyword(s):

Deep Learning ◽

High Resolution ◽

Information Structure ◽

Rapid Development ◽

Super Resolution ◽

Training Data ◽

Learning Technology ◽

Hrtem Image ◽

Generative Adversarial Network ◽

Dual Regression

As an important tool to study the microstructure and properties of materials, High Resolution Transmission Electron Microscope (HRTEM) images can obtain the lattice fringe image (reflecting the crystal plane spacing information), structure image and individual atom image (which reflects the configuration of atoms or atomic groups in crystal structure). Despite the rapid development of HTTEM devices, HRTEM images still have limited achievable resolution for human visual system. With the rapid development of deep learning technology in recent years, researchers are actively exploring the Super-resolution (SR) model based on deep learning, and the model has reached the current best level in various SR benchmarks. Using SR to reconstruct high-resolution HRTEM image is helpful to the material science research. However, there is one core issue that has not been resolved: most of these super-resolution methods require the training data to exist in pairs. In actual scenarios, especially for HRTEM images, there are no corresponding HR images. To reconstruct high quality HRTEM image, a novel Super-Resolution architecture for HRTEM images is proposed in this paper. Borrowing the idea from Dual Regression Networks (DRN), we introduce an additional dual regression structure to ESRGAN, by training the model with unpaired HRTEM images and paired nature images. Results of extensive benchmark experiments demonstrate that the proposed method achieves better performance than the most resent SISR methods with both quantitative and visual results.

Download Full-text

Oversampling Based on Data Augmentation in Convolutional Neural Network for Silicon Wafer Defect Classification

Knowledge Innovation Through Intelligent Software Methodologies, Tools and Techniques - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200547 ◽

2020 ◽

Author(s):

Uzma Batool ◽

Mohd Ibrahim Shapiai ◽

Nordinah Ismail ◽

Hilman Fauzi ◽

Syahrizal Salleh

Keyword(s):

Neural Network ◽

Deep Learning ◽

Convolutional Neural Network ◽

Silicon Wafer ◽

Data Augmentation ◽

Imbalanced Data ◽

Training Data ◽

Defect Classification ◽

Learning Method ◽

Test Set

Silicon wafer defect data collected from fabrication facilities is intrinsically imbalanced because of the variable frequencies of defect types. Frequently occurring types will have more influence on the classification predictions if a model gets trained on such skewed data. A fair classifier for such imbalanced data requires a mechanism to deal with type imbalance in order to avoid biased results. This study has proposed a convolutional neural network for wafer map defect classification, employing oversampling as an imbalance addressing technique. To have an equal participation of all classes in the classifier’s training, data augmentation has been employed, generating more samples in minor classes. The proposed deep learning method has been evaluated on a real wafer map defect dataset and its classification results on the test set returned a 97.91% accuracy. The results were compared with another deep learning based auto-encoder model demonstrating the proposed method, a potential approach for silicon wafer defect classification that needs to be investigated further for its robustness.

Download Full-text