Harnessing GANs for Zero-Shot Learning of New Classes in Visual Speech Recognition

2020 ◽  
Vol 34 (03) ◽  
pp. 2645-2652 ◽  
Author(s):  
Yaman Kumar ◽  
Dhruva Sahrawat ◽  
Shubham Maheshwari ◽  
Debanjan Mahata ◽  
Amanda Stent ◽  
...  

Visual Speech Recognition (VSR) is the process of recognizing or interpreting speech by watching the lip movements of the speaker. Recent machine-learning-based approaches model VSR as a classification problem; however, the scarcity of training data leads to error-prone systems with very low accuracy in predicting unseen classes. To solve this problem, we present a novel approach to zero-shot learning that generates new classes using Generative Adversarial Networks (GANs), and show how the addition of unseen class samples increases the accuracy of a VSR system by a significant margin of 27% and allows it to handle speaker-independent out-of-vocabulary phrases. We also show that our models are language agnostic and therefore capable of seamlessly generating, using English training data, videos for a new language (Hindi). To the best of our knowledge, this is the first work to show empirical evidence of the use of GANs for generating training samples of unseen classes in the domain of VSR, hence facilitating zero-shot learning. We make the added videos for new classes publicly available along with our code.

2020 ◽  
Vol 6 (9) ◽  
pp. 83 ◽  
Author(s):  
Ufuk Cem Birbiri ◽  
Azam Hamidinekoo ◽  
Amélie Grall ◽  
Paul Malcolm ◽  
Reyer Zwiggelaar

The manual delineation of the region of interest (RoI) in 3D magnetic resonance imaging (MRI) of the prostate is time-consuming and subjective. Correct identification of prostate tissue helps to define a precise RoI for use in CAD systems in clinical practice during diagnostic imaging, radiotherapy and monitoring of disease progression. Conditional GAN (cGAN), cycleGAN and U-Net models were studied, and their performance compared, for the detection and segmentation of prostate tissue in 3D multi-parametric MRI scans. These models were trained and evaluated on MRI data from 40 patients with biopsy-proven prostate cancer. Due to the limited amount of available training data, three augmentation schemes were proposed to artificially increase the number of training samples. The models were tested on a clinical dataset annotated for this study and on a public dataset (PROMISE12). The cGAN model outperformed the U-Net and cycleGAN predictions owing to the inclusion of paired image supervision. Based on our quantitative results, cGAN achieved Dice scores of 0.78 and 0.75 on the private and the PROMISE12 public datasets, respectively.
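For readers unfamiliar with the metric quoted above, the Dice score between a predicted binary mask A and a reference mask B is 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that general definition in Python with NumPy; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np


def dice_score(pred, target):
    """Dice coefficient between two binary masks of the same shape.

    Hypothetical helper for illustration only; not the authors' code.
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    if pred.shape != target.shape:
        raise ValueError("masks must have the same shape")

    # Number of voxels labelled as foreground in both masks.
    intersection = np.logical_and(pred, target).sum()
    # Sum of the foreground sizes of the two masks.
    total = pred.sum() + target.sum()

    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return float(2.0 * intersection / total)


if __name__ == "__main__":
    predicted = [[1, 1, 0], [0, 1, 0]]
    reference = [[1, 0, 0], [0, 1, 1]]
    print(dice_score(predicted, reference))
```

In the toy example, the two masks share 2 foreground pixels and each has 3, so the score is 2·2 / (3 + 3) ≈ 0.67. The same function works unchanged on 3D arrays.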


2018 ◽  
Author(s):  
Matthias Häring ◽  
Jörg Großhans ◽  
Fred Wolf ◽  
Stephan Eule

A central problem in biomedical imaging is the automated segmentation of images for further quantitative analysis. Recently, fully convolutional neural networks, such as the U-Net, have been applied successfully in a variety of segmentation tasks. A downside of this approach is the requirement for a large number of well-prepared training samples, consisting of pairs of images and ground-truth masks. Since training data must be created by hand for each experiment, this task can be very costly and time-consuming. Here, we present a segmentation method based on cycle-consistent generative adversarial networks, which can be trained even in the absence of prepared image-mask pairs. We show that it successfully performs image segmentation tasks on samples with substantial defects and even generalizes well to different tissue types.


2021 ◽  
Vol 2021 (2) ◽  
pp. 305-322
Author(s):  
Se Eun Oh ◽  
Nate Mathews ◽  
Mohammad Saidur Rahman ◽  
Matthew Wright ◽  
Nicholas Hopper

We introduce Generative Adversarial Networks for Data-Limited Fingerprinting (GANDaLF), a new deep-learning-based technique to perform Website Fingerprinting (WF) on Tor traffic. In contrast to most earlier work on deep learning for WF, GANDaLF is intended to work with few training samples, and achieves this goal through the use of a Generative Adversarial Network to generate a large set of “fake” data that helps to train a deep neural network in distinguishing between classes of actual training data. We evaluate GANDaLF in low-data scenarios including as few as 10 training instances per site, and in multiple settings, including fingerprinting of website index pages and fingerprinting of non-index pages within a site. GANDaLF achieves closed-world accuracy of 87% with just 20 instances per site (and 100 sites) in standard WF settings. In particular, GANDaLF can outperform Var-CNN and Triplet Fingerprinting (TF) across all settings in subpage fingerprinting. For example, GANDaLF outperforms TF by a 29% margin and Var-CNN by 38% for training sets using 20 instances per site.


2018 ◽  
Vol 61 (2) ◽  
pp. 699-710 ◽  
Author(s):  
Jian Zhao ◽  
Yihao Li ◽  
Fengdeng Zhang ◽  
Songming Zhu ◽  
Ying Liu ◽  
...  

For live fish identification in aquaculture, this study proposes a practical and efficient semi-supervised learning model based on modified deep convolutional generative adversarial networks (DCGANs). Benefiting from the modified DCGAN structure, the presented model can be trained effectively using relatively few labeled training samples. Given the complex poses of fish and the low resolution of sampled images in aquaculture, spatial pyramid pooling and several improved techniques specific to the presented model were used to make the model more robust. Finally, in tests with two preprocessed and challenging datasets (with 5%, 10%, and 15% labeled training data in the fish recognition ground-truth dataset and 25%, 50%, and 75% labeled training data in the Croatian fish dataset), the feasibility and reliability of the presented model for live fish identification were demonstrated, with respective accuracies of 80.52%, 81.66%, and 83.07% for the ground-truth dataset and 65.13%, 78.72%, and 82.95% for the Croatian fish dataset.

Keywords: Aquaculture, Deep convolutional generative adversarial networks, Few labeled training samples, Live fish identification, Semi-supervised learning, Spatial pyramid pooling.


2021 ◽  
Vol 11 (7) ◽  
pp. 3086
Author(s):  
Ricardo Silva Peres ◽  
Miguel Azevedo ◽  
Sara Oleiro Araújo ◽  
Magno Guedes ◽  
Fábio Miranda ◽  
...  

The technological advances brought forth by the Industry 4.0 paradigm have renewed the disruptive potential of artificial intelligence in the manufacturing sector, building the data-driven era on top of concepts such as Cyber–Physical Systems and the Internet of Things. However, data availability remains a major challenge for the success of these solutions, particularly concerning those based on deep learning approaches. Specifically in the quality inspection of structural adhesive applications, found commonly in the automotive domain, defect data with sufficient variety, volume and quality is generally costly, time-consuming and inefficient to obtain, jeopardizing the viability of such approaches due to data scarcity. To mitigate this, we propose a novel approach to generate synthetic training data for this application, leveraging recent breakthroughs in training generative adversarial networks with limited data to improve the performance of automated inspection methods based on deep learning, especially for imbalanced datasets. Preliminary results in a real automotive pilot cell show promise in this direction, with the approach being able to generate realistic adhesive bead images and consequently object detection models showing improved mean average precision at different thresholds when trained on the augmented dataset. For reproducibility purposes, the model weights, configurations and data encompassed in this study are made publicly available.


Author(s):  
Guillaume Gravier ◽  
Gerasimos Potamianos ◽  
Chalapathy Neti

2021 ◽  
Vol 13 (9) ◽  
pp. 1713
Author(s):  
Songwei Gu ◽  
Rui Zhang ◽  
Hongxia Luo ◽  
Mengyao Li ◽  
Huamei Feng ◽  
...  

Deep learning is an important research method in the remote sensing field. However, samples of remote sensing images are relatively few in real life, and annotated ones are scarce. Many neural networks, represented by Generative Adversarial Networks (GANs), can learn from real samples to generate pseudosamples, unlike traditional methods, which often require more time and manpower to obtain samples. However, the generated pseudosamples often have poor realism and cannot be reliably used as the basis for various analyses and applications in the field of remote sensing. To address these problems, a pseudolabeled sample generation method is proposed in this work and applied to scene classification of remote sensing images. The improved unconditional generative model that can be learned from a single natural image (Improved SinGAN) with an attention mechanism can effectively generate enough pseudolabeled samples from a single remote sensing scene image sample. Pseudosamples generated by the improved SinGAN model have stronger realism and require relatively less training time, and the extracted features are easily recognized in the classification network. The improved SinGAN can better identify subjects from images with complex ground scenes compared with the original network. This mechanism solves the problem of geographic errors in generated pseudosamples. This study incorporated the generated pseudosamples into the training data for the classification experiment. The results showed that the SinGAN model with the integrated attention mechanism can better guarantee feature extraction from the training data. Thus, the quality of the generated samples is improved, and the classification accuracy and stability of the classification network are also enhanced.

