Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition

Yaman Kumar; Dhruva Sahrawat; Shubham Maheshwari; Debanjan Mahata; Amanda Stent; Yifang Yin; Rajiv Ratn Shah; Roger Zimmermann

doi:10.1609/aaai.v34i03.5649

Proceedings of the AAAI Conference on Artificial Intelligence

2020

Vol 34 (03)

pp. 2645-2652

Cited By ~ 2

Keyword(s):

Speech Recognition

Classification Problem

Visual Speech

Training Data

Generative Adversarial Networks

Adversarial Networks

Novel Approach

Visual Speech Recognition

Training Samples

English Training

Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine-learning-based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracy in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning that generates new classes using Generative Adversarial Networks (GANs), and show that adding unseen-class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and can therefore seamlessly generate videos for a new language (Hindi) using English training data. To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.


Investigating the Performance of Generative Adversarial Networks for Prostate Tissue Detection and Segmentation

Journal of Imaging

10.3390/jimaging6090083

2020

Vol 6 (9)

pp. 83

Cited By ~ 1

Author(s):

Ufuk Cem Birbiri

Azam Hamidinekoo

Amélie Grall

Paul Malcolm

Reyer Zwiggelaar

Keyword(s):

Region Of Interest

Training Data

Prostate Tissue

Generative Adversarial Networks

Correct Identification

Adversarial Networks

Training Samples

MRI Scans

Magnetic Resonance Imaging (MRI)

Public Datasets

Manual delineation of the region of interest (RoI) in 3D magnetic resonance imaging (MRI) of the prostate is time-consuming and subjective. Correct identification of prostate tissue helps define a precise RoI for use in CAD systems in clinical practice during diagnostic imaging, radiotherapy and disease monitoring. This study investigates the performance of conditional GAN (cGAN), CycleGAN and U-Net models for the detection and segmentation of prostate tissue in 3D multi-parametric MRI scans. The models were trained and evaluated on MRI data from 40 patients with biopsy-proven prostate cancer. Because the available training data were limited, three augmentation schemes were proposed to artificially increase the number of training samples. The models were tested on a clinical dataset annotated for this study and on a public dataset (PROMISE12). The cGAN model outperformed the U-Net and CycleGAN models owing to the inclusion of paired image supervision, achieving Dice scores of 0.78 on the private dataset and 0.75 on the PROMISE12 public dataset.
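The Dice score reported above measures the overlap between a predicted mask A and a reference mask B as 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The following is a minimal Python sketch of that formula, assuming binary masks stored as nested lists (2D or 3D); the function name `dice_score` is illustrative and not taken from the paper's code.

```python
def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks of equal size.

    Masks are nested lists of 0/1 (or truthy/falsy) values; any nesting depth
    (2D slices, 3D volumes) is flattened before comparison.
    """
    def flatten(mask):
        for item in mask:
            if isinstance(item, (list, tuple)):
                yield from flatten(item)
            else:
                yield item

    p = [1 if v else 0 for v in flatten(pred)]
    t = [1 if v else 0 for v in flatten(truth)]
    if len(p) != len(t):
        raise ValueError("masks must contain the same number of voxels")
    intersection = sum(a & b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    # Convention assumed here: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * intersection / total


pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
print(round(dice_score(pred, truth), 4))  # 0.6667 (2 shared voxels, 3 + 3 foreground)
```

Treating two empty masks as a perfect match is one common convention; some evaluation scripts instead skip such cases, which changes averaged scores.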


Automated Segmentation of Epithelial Tissue Using Cycle-Consistent Generative Adversarial Networks

10.1101/311373

2018

Cited By ~ 6

Author(s):

Matthias Häring

Jörg Großhans

Fred Wolf

Stephan Eule

Keyword(s):

Ground Truth

Training Data

Generative Adversarial Networks

Epithelial Tissue

Automated Segmentation

Segmentation Method

Adversarial Networks

Training Samples

Segmentation Of Images

Image Mask

A central problem in biomedical imaging is the automated segmentation of images for further quantitative analysis. Recently, fully convolutional neural networks such as the U-Net have been applied successfully to a variety of segmentation tasks. A downside of this approach is that it requires a large number of well-prepared training samples, each consisting of an image and its ground-truth mask. Since such training data must be created by hand for each experiment, this task can be very costly and time-consuming. Here, we present a segmentation method based on cycle-consistent generative adversarial networks, which can be trained even in the absence of prepared image-mask pairs. We show that it successfully segments samples with substantial defects and even generalizes well to different tissue types.
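Cycle-consistent GANs avoid the need for paired data by learning two mappings, assumed here to be $G: X \to Y$ from images to masks and $F: Y \to X$ from masks back to images, and by penalizing the error of each round trip. For reference, the standard CycleGAN objective is shown below; the abstract does not say whether the authors modified it, so treat this as the generic formulation rather than their exact loss.

```latex
$$
\mathcal{L}_{\mathrm{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\left[\lVert F(G(x)) - x \rVert_1\right]
  + \mathbb{E}_{y \sim p_{\mathrm{data}}(y)}\left[\lVert G(F(y)) - y \rVert_1\right]
$$

$$
\mathcal{L}(G, F, D_X, D_Y)
  = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda \, \mathcal{L}_{\mathrm{cyc}}(G, F)
$$
```

Here $D_X$ and $D_Y$ are the discriminators for each domain and $\lambda$ weights the reconstruction term against the two adversarial terms. Because neither term compares an image with its own annotated mask, the images and masks used for training need not be paired.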


GANDaLF: GAN for Data-Limited Fingerprinting

Proceedings on Privacy Enhancing Technologies

10.2478/popets-2021-0029

2021

Vol 2021 (2)

pp. 305-322

Author(s):

Se Eun Oh

Nate Mathews

Mohammad Saidur Rahman

Matthew Wright

Nicholas Hopper

Keyword(s):

Deep Learning

Training Data

Generative Adversarial Networks

Large Set

Generative Adversarial Network

Adversarial Network

Adversarial Networks

Closed World

Training Samples

A Site

We introduce Generative Adversarial Networks for Data-Limited Fingerprinting (GANDaLF), a new deep-learning-based technique for Website Fingerprinting (WF) on Tor traffic. In contrast to most earlier work on deep learning for WF, GANDaLF is intended to work with few training samples. It achieves this by using a Generative Adversarial Network to generate a large set of “fake” data that helps train a deep neural network to distinguish between the classes of actual training data. We evaluate GANDaLF in low-data scenarios with as few as 10 training instances per site, and in multiple settings, including fingerprinting of website index pages and fingerprinting of non-index pages within a site. GANDaLF achieves a closed-world accuracy of 87% with just 20 instances per site (and 100 sites) in standard WF settings. In particular, GANDaLF can outperform Var-CNN and Triplet Fingerprinting (TF) in all subpage-fingerprinting settings. For example, with training sets of 20 instances per site, GANDaLF outperforms TF by a 29% margin and Var-CNN by a 38% margin.
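The abstract says only that generated "fake" traces help the classifier separate the real classes. One common way to realize this, assumed here rather than confirmed by the abstract, is the semi-supervised GAN setup in which the discriminator is a (K+1)-way classifier: K outputs for the monitored sites plus one extra output for generated samples. The pure-Python sketch below computes that discriminator loss from raw logits; the function names (`log_softmax`, `k_plus_one_discriminator_loss`, `predict_site`) are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def k_plus_one_discriminator_loss(real_batch, fake_batch, num_classes):
    """Mean cross-entropy of a (K+1)-way discriminator, with K = num_classes.

    real_batch: list of (logits, label) pairs for real traces, 0 <= label < K.
    fake_batch: list of logits for generator samples.
    Every logits list has K + 1 entries; index K is the extra "fake" class.
    """
    fake_label = num_classes
    # Real traces should be assigned to their true site.
    real_nll = -sum(log_softmax(logits)[label] for logits, label in real_batch)
    # Generated traces should be assigned to the extra "fake" class.
    fake_nll = -sum(log_softmax(logits)[fake_label] for logits in fake_batch)
    return (real_nll + fake_nll) / (len(real_batch) + len(fake_batch))


def predict_site(logits, num_classes):
    """At test time, ignore the fake class and pick the most likely real site."""
    return max(range(num_classes), key=lambda k: logits[k])


uniform = [0.0, 0.0, 0.0]  # K = 2 sites plus the fake class
loss = k_plus_one_discriminator_loss([(uniform, 0)], [uniform], num_classes=2)
print(round(loss, 4))  # 1.0986, i.e. ln 3 for a uniform 3-way prediction
print(predict_site([1.0, 3.0, 10.0], num_classes=2))  # 1
```

The extra class is what lets unlimited generated data shape the decision boundaries: the classifier must learn features that separate real traces from plausible-but-fake ones, not only one site from another.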


Semi-Supervised Learning-Based Live Fish Identification in Aquaculture Using Modified Deep Convolutional Generative Adversarial Networks

Transactions of the ASABE

10.13031/trans.12684

2018

Vol 61 (2)

pp. 699-710

Cited By ~ 5

Author(s):

Jian Zhao

Yihao Li

Fengdeng Zhang

Songming Zhu

Ying Liu

...

Keyword(s):

Supervised Learning

Ground Truth

Training Data

Generative Adversarial Networks

Low Resolution

Adversarial Networks

Live Fish

Training Samples

Spatial Pyramid Pooling

Spatial Pyramid

Aiming at live fish identification in aquaculture, this study proposes a practical and efficient semi-supervised learning model based on modified deep convolutional generative adversarial networks (DCGANs). Thanks to the modified DCGAN structure, the model can be trained effectively using relatively few labeled training samples. To cope with the complex poses of fish and the low resolution of images sampled in aquaculture, spatial pyramid pooling and several improved techniques tailored to the model were used to make it more robust. Finally, tests on two preprocessed and challenging datasets (with 5%, 10%, and 15% labeled training data in the fish recognition ground-truth dataset and 25%, 50%, and 75% labeled training data in the Croatian fish dataset) demonstrated the feasibility and reliability of the model for live fish identification, with accuracies of 80.52%, 81.66%, and 83.07% on the ground-truth dataset and 65.13%, 78.72%, and 82.95% on the Croatian fish dataset, respectively.

Keywords: Aquaculture, Deep convolutional generative adversarial networks, Few labeled training samples, Live fish identification, Semi-supervised learning, Spatial pyramid pooling.


Generative Adversarial Networks for Data Augmentation in Structural Adhesive Inspection

Applied Sciences

10.3390/app11073086

2021

Vol 11 (7)

pp. 3086

Author(s):

Ricardo Silva Peres

Miguel Azevedo

Sara Oleiro Araújo

Magno Guedes

Fábio Miranda

...

Keyword(s):

Deep Learning

Data Augmentation

Manufacturing Sector

Data Availability

Training Data

Generative Adversarial Networks

Learning Approaches

Adversarial Networks

Novel Approach

Structural Adhesive

The technological advances brought forth by the Industry 4.0 paradigm have renewed the disruptive potential of artificial intelligence in the manufacturing sector, building the data-driven era on top of concepts such as Cyber–Physical Systems and the Internet of Things. However, data availability remains a major challenge for these solutions, particularly those based on deep learning. In the quality inspection of structural adhesive applications, which are common in the automotive domain, defect data of sufficient variety, volume and quality are generally costly, time-consuming and inefficient to obtain, and this scarcity jeopardizes the viability of such approaches. To mitigate this, we propose a novel approach to generating synthetic training data for this application, leveraging recent breakthroughs in training generative adversarial networks with limited data to improve the performance of deep-learning-based automated inspection methods, especially for imbalanced datasets. Preliminary results in a real automotive pilot cell are promising: the approach generates realistic adhesive bead images, and object detection models trained on the augmented dataset show improved mean average precision at different thresholds. For reproducibility, the model weights, configurations and data used in this study are publicly available.


Asynchrony modeling for audio-visual speech recognition

10.3115/1289189.1289244

2002

Cited By ~ 31

Author(s):

Guillaume Gravier

Gerasimos Potamianos

Chalapathy Neti

Keyword(s):

Speech Recognition

Visual Speech

Visual Speech Recognition


Measuring the effect of high-speed video data on the audio-visual speech recognition accuracy

Information and Control Systems

10.31799/1684-8853-2019-2-26-34

2019

pp. 26-34

Author(s):

D. V. Ivanko

D. A. Ryumin

A. A. Karpov

M. Zelezny

Keyword(s):

Speech Recognition

High Speed

Recognition Accuracy

Video Data

Visual Speech

High Speed Video

Visual Speech Recognition


Visual speech recognition for small scale dataset using VGG16 convolution neural network

Multimedia Tools and Applications

10.1007/s11042-021-11119-0

2021

Author(s):

Shashidhar R

Sudarshan Patilkulkarni

Keyword(s):

Neural Network

Speech Recognition

Convolution Neural Network

Visual Speech

Small Scale

Visual Speech Recognition


Improved SinGAN Integrated with an Attentional Mechanism for Remote Sensing Image Classification

Remote Sensing

10.3390/rs13091713

2021

Vol 13 (9)

pp. 1713

Author(s):

Songwei Gu

Rui Zhang

Hongxia Luo

Mengyao Li

Huamei Feng

...

Keyword(s):

Remote Sensing

Real Life

Attention Mechanism

Training Data

Generative Adversarial Networks

Natural Image

Remote Sensing Images

Training Time

Adversarial Networks

Remote Sensing Image Classification

Deep learning is an important research method in the remote sensing field. However, remote sensing image samples are relatively scarce in practice, and labeled samples are scarcer still. Neural networks such as Generative Adversarial Networks (GANs) can learn from real samples to generate pseudosamples, whereas traditional methods often require more time and manpower to obtain samples. However, the generated pseudosamples often lack realism and cannot be reliably used as the basis for analyses and applications in remote sensing. To address these problems, this work proposes a pseudolabeled sample generation method and applies it to scene classification of remote sensing images. An improved version of SinGAN (an unconditional generative model that can be learned from a single natural image) integrated with an attention mechanism, referred to as Improved SinGAN, can effectively generate enough pseudolabeled samples from a single remote sensing scene image. Pseudosamples generated by the Improved SinGAN model are more realistic and require relatively less training time, and their extracted features are easily recognized by the classification network. Compared with the original network, the Improved SinGAN better identifies subjects in images with complex ground scenes, which solves the problem of geographic errors in the generated pseudosamples. This study incorporated the generated pseudosamples into the training data for a classification experiment. The results showed that integrating the attention mechanism into SinGAN better guarantees feature extraction from the training data. Thus, the quality of the generated samples is improved, and the classification accuracy and stability of the classification network are also enhanced.


Fine-Tuning of Pre-Trained End-to-End Speech Recognition with Generative Adversarial Networks

ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

10.1109/icassp39728.2021.9413703

2021

Author(s):

Md. Akmal Haidar

Mehdi Rezagholizadeh

Keyword(s):

Speech Recognition

Fine Tuning

Generative Adversarial Networks

Adversarial Networks

End To End
