Web-Supervised Network with Softly Update-Drop Training for Fine-Grained Visual Classification
2020, Vol 34 (07), pp. 12781-12788
Author(s): Chuanyi Zhang, Yazhou Yao, Huafeng Liu, Guo-Sen Xie, Xiangbo Shu, ...

Labeling objects at the subordinate level typically requires expert knowledge, which is not always available from a random annotator. Accordingly, learning directly from web images for fine-grained visual classification (FGVC) has attracted broad attention. However, the noise present in web images is a major obstacle to training robust deep neural networks. In this paper, we propose a novel approach that removes irrelevant samples from real-world web images during training and uses only useful images to update the networks. Our network thus alleviates the harmful effects of irrelevant noisy web images and achieves better performance. Extensive experiments on three commonly used fine-grained datasets demonstrate that our approach substantially outperforms state-of-the-art webly supervised methods. The data and source code of this work have been made anonymously available at: https://github.com/z337-408/WSNFGVC.
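As a rough illustration of this kind of noise-robust training, the sketch below drops the highest-loss samples in each batch before updating, on the premise that networks tend to fit clean labels before noisy ones. The selection rule, keep ratio, and all names are illustrative assumptions, not the paper's actual Softly Update-Drop procedure.

```python
# Minimal sketch of dropping likely-noisy web samples during training.
# The small-loss selection rule and all names here are illustrative
# assumptions, not the paper's exact Softly Update-Drop procedure.
import torch
import torch.nn.functional as F

def train_step(model, optimizer, images, labels, keep_ratio=0.7):
    """One update that ignores the highest-loss (likely noisy) samples."""
    logits = model(images)
    per_sample_loss = F.cross_entropy(logits, labels, reduction="none")
    # Keep the fraction of samples with the smallest loss; high-loss web
    # images are treated as irrelevant and excluded from this update.
    n_keep = max(1, int(keep_ratio * images.size(0)))
    keep_idx = torch.argsort(per_sample_loss)[:n_keep]
    loss = per_sample_loss[keep_idx].mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```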

2020, Vol 34 (04), pp. 4272-4279
Author(s): Ayush Jaiswal, Daniel Moyer, Greg Ver Steeg, Wael AbdAlmageed, Premkumar Natarajan

We propose a novel approach to achieving invariance in deep neural networks by inducing amnesia to unwanted factors of the data through a new adversarial forgetting mechanism. We show that the forgetting mechanism serves as an information bottleneck, which is manipulated by the adversarial training to learn invariance to unwanted factors. Empirical results show that the proposed framework achieves state-of-the-art performance at learning invariance in both nuisance and bias settings on a diverse collection of datasets and tasks.
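A common way to realize this kind of adversarial invariance is a gradient-reversal setup, sketched below: a discriminator head tries to recover the unwanted factor from the representation, while reversed gradients push the encoder to discard it. This is a generic stand-in under stated assumptions, not the paper's exact forgetting mechanism.

```python
# Sketch of adversarial invariance via gradient reversal. All module
# sizes and names are illustrative assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()
    @staticmethod
    def backward(ctx, grad):
        return -grad  # flip gradients flowing back into the encoder

encoder = nn.Sequential(nn.Linear(32, 16), nn.ReLU())
task_head = nn.Linear(16, 10)      # predicts the target label
nuisance_head = nn.Linear(16, 5)   # tries to predict the unwanted factor

x = torch.randn(8, 32)
z = encoder(x)
task_logits = task_head(z)
# The reversed gradient trains the encoder to defeat the nuisance head,
# i.e., to "forget" the unwanted factor.
nuisance_logits = nuisance_head(GradReverse.apply(z))
```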


Author(s): Saad Sadiq, Mei-Ling Shyu, Daniel J. Feaster

Deep Neural Networks (DNNs) are best known for being the state-of-the-art in artificial intelligence (AI) applications including natural language processing (NLP), speech processing, computer vision, etc. In spite of all the recent achievements of deep learning, it has yet to achieve the semantic learning required to reason about data. This lack of reasoning is partially attributed to rote memorization of patterns from millions of training samples while ignoring spatiotemporal relationships. The proposed framework puts forward a novel approach based on variational autoencoders (VAEs), using the potential outcomes model to develop counterfactual autoencoders. The proposed framework transforms arbitrary multimedia input distributions into a meaningful latent space while giving more control over how the latent space is created. This allows us to model data that is better suited to answering inference-based queries, which is very valuable in reasoning-based AI applications.
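For context, the sketch below shows the standard VAE machinery such a framework builds on: an encoder mapping inputs to a latent distribution, the reparameterization trick, and the KL term that keeps the latent space well behaved. Dimensions and names are illustrative assumptions; the potential-outcomes and counterfactual components are not reproduced here.

```python
# Minimal VAE sketch: encode to a latent distribution, sample via the
# reparameterization trick, decode. Illustrative only.
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, x_dim=784, z_dim=16):
        super().__init__()
        self.enc = nn.Linear(x_dim, 2 * z_dim)  # outputs mean and log-variance
        self.dec = nn.Linear(z_dim, x_dim)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(z)
        # KL term regularizes the latent space toward a standard normal,
        # which is what makes the latent space smooth and controllable.
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return recon, kl
```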


2019, Vol 39 (2-3), pp. 183-201
Author(s): Douglas Morrison, Peter Corke, Jürgen Leitner

We present a novel approach to perform object-independent grasp synthesis from depth images via deep neural networks. Our generative grasping convolutional neural network (GG-CNN) predicts a pixel-wise grasp quality that can be deployed in closed-loop grasping scenarios. GG-CNN overcomes shortcomings in existing techniques, namely discrete sampling of grasp candidates and long computation times. The network is orders of magnitude smaller than other state-of-the-art approaches while achieving better performance, particularly in clutter. We run a suite of real-world tests, during which we achieve an 84% grasp success rate on a set of previously unseen objects with adversarial geometry and 94% on household items. The network's lightweight nature enables closed-loop control at up to 50 Hz, with which we observed 88% grasp success on a set of household objects that are moved during the grasp attempt. We further propose a method combining our GG-CNN with a multi-view approach, which improves the overall grasp success rate in clutter by 10%. Code is provided at https://github.com/dougsm/ggcnn
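A minimal sketch of the general idea, assuming a toy architecture rather than the published one: a small fully convolutional network maps a depth image to pixel-wise quality, angle, and width maps, and the best grasp is read off as the argmax of the quality map.

```python
# Toy fully convolutional grasp network in the spirit of GG-CNN.
# Layer sizes are illustrative, not the published architecture.
import torch
import torch.nn as nn

class TinyGraspNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, 5, padding=2), nn.ReLU(),
            nn.Conv2d(8, 8, 5, padding=2), nn.ReLU(),
        )
        self.quality = nn.Conv2d(8, 1, 1)  # grasp quality per pixel
        self.angle = nn.Conv2d(8, 1, 1)    # grasp angle per pixel
        self.width = nn.Conv2d(8, 1, 1)    # gripper width per pixel

    def forward(self, depth):
        h = self.features(depth)
        return self.quality(h), self.angle(h), self.width(h)

# The best grasp is simply the argmax of the quality map:
q, a, w = TinyGraspNet()(torch.randn(1, 1, 300, 300))
best_pixel = q.flatten().argmax()
```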


IEEE Access, 2019, Vol 7, pp. 122740-122757
Author(s): Yang-Yang Zheng, Jian-Lei Kong, Xue-Bo Jin, Xiao-Yi Wang, Ting-Li Su, ...

2019, Vol 11 (5), pp. 544
Author(s): Kun Fu, Wei Dai, Yue Zhang, Zhirui Wang, Menglong Yan, ...

Aircraft recognition in remote sensing images has long been a meaningful topic. Most related methods treat entire images as a whole and do not concentrate on the features of parts. In fact, a variety of aircraft types have small interclass variance, and the main evidence for classifying subcategories is related to some discriminative object parts. In this paper, we introduce the idea of fine-grained visual classification (FGVC) and attempt to make full use of the features from discriminative object parts. First, multiple class activation mapping (MultiCAM) is proposed to extract the discriminative parts of aircraft of different categories. Second, we present a mask filter (MF) strategy to enhance the discriminative object parts and filter out background interference from the original images. Third, a selective connected feature fusion method is proposed to fuse the features extracted by two networks that focus on the original images and the MF results, respectively. Compared with the single prediction category in class activation mapping (CAM), MultiCAM makes full use of the predictions of all categories to avoid the incorrect discriminative parts that a single wrong prediction can produce. Additionally, the designed MF preserves the object scale information and helps the network concentrate on the object itself rather than the interfering background. Experiments on a challenging dataset prove that our method can achieve state-of-the-art performance.
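The sketch below illustrates the underlying CAM computation and one plausible way to fuse per-class maps; the prediction-weighted fusion shown is an assumption for illustration, not necessarily how MultiCAM combines categories.

```python
# Sketch of class activation mapping (CAM) for all classes, plus an
# assumed prediction-weighted fusion as a stand-in for MultiCAM.
import torch

def cam_all_classes(features, fc_weight):
    """features: (C, H, W) final conv maps; fc_weight: (num_classes, C).
    Returns one CAM per class, shape (num_classes, H, W)."""
    return torch.einsum("kc,chw->khw", fc_weight, features)

features = torch.randn(64, 7, 7)    # stand-in final conv feature maps
fc_weight = torch.randn(10, 64)     # stand-in classifier weights
cams = cam_all_classes(features, fc_weight)
probs = torch.softmax(torch.randn(10), dim=0)   # stand-in class scores
# Fusing across all classes reduces reliance on one possibly wrong class.
fused_cam = (probs[:, None, None] * cams).sum(0)
```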


Author(s): Pieter Van Molle, Tim Verbelen, Bert Vankeirsbilck, Jonas De Vylder, Bart Diricx, ...

Modern deep learning models achieve state-of-the-art results for many tasks in computer vision, such as image classification and segmentation. However, their adoption into high-risk applications, e.g., automated medical diagnosis systems, is proceeding at a slow pace. One of the main reasons for this is that regular neural networks do not capture uncertainty. To assess uncertainty in classification, several techniques have been proposed that cast neural network approaches in a Bayesian setting. Amongst these techniques, Monte Carlo dropout is by far the most popular. This particular technique estimates the moments of the output distribution through sampling with different dropout masks. The output uncertainty of a neural network is then approximated as the sample variance. In this paper, we highlight the limitations of such a variance-based uncertainty metric and propose a novel approach. Our approach is based on the overlap between output distributions of different classes. We show that our technique leads to a better approximation of the inter-class output confusion. We illustrate the advantages of our method using benchmark datasets. In addition, we apply our metric to skin lesion classification, a real-world use case, and show that this yields promising results.
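The sketch below shows Monte Carlo dropout as described: dropout is kept active at test time, several stochastic forward passes are collected, and their spread is summarized. The variance metric is the standard one the paper critiques; the disagreement score at the end is only a crude stand-in for the authors' distribution-overlap metric.

```python
# Sketch of Monte Carlo dropout uncertainty. The model and the final
# "disagreement" proxy are illustrative assumptions.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 3)
)
model.train()  # keep dropout stochastic at inference time

x = torch.randn(1, 20)
# T stochastic forward passes with different dropout masks.
samples = torch.stack([model(x).softmax(-1) for _ in range(50)])
variance = samples.var(dim=0)  # classic variance-based uncertainty
# Crude overlap proxy: how often the prediction flips across passes.
preds = samples.squeeze(1).argmax(-1)
disagreement = (preds != preds.mode().values).float().mean()
```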


Author(s): Nicola Messina, Giuseppe Amato, Andrea Esuli, Fabrizio Falchi, Claudio Gennaro, ...

Despite the evolution of deep-learning-based visual-textual processing systems, precise multi-modal matching remains a challenging task. In this work, we tackle the task of cross-modal retrieval through image-sentence matching based on word-region alignments, using supervision only at the global image-sentence level. Specifically, we present a novel approach called Transformer Encoder Reasoning and Alignment Network (TERAN). TERAN enforces a fine-grained match between the underlying components of images and sentences (i.e., image regions and words, respectively) to preserve the informative richness of both modalities. TERAN obtains state-of-the-art results on the image retrieval task on both the MS-COCO and Flickr30k datasets. Moreover, on MS-COCO, it also outperforms current approaches on the sentence retrieval task. Focusing on scalable cross-modal information retrieval, TERAN is designed to keep the visual and textual data pipelines well separated. Cross-attention links would preclude the separate extraction of the visual and textual features needed for the online search and offline indexing steps in large-scale retrieval systems. In this respect, TERAN merges the information from the two domains only during the final alignment phase, immediately before the loss computation. We argue that the fine-grained alignments produced by TERAN pave the way toward effective and efficient methods for large-scale cross-modal information retrieval. We compare the effectiveness of our approach against relevant state-of-the-art methods. On the MS-COCO 1K test set, we obtain an improvement of 5.7% and 3.5% on the image and sentence retrieval tasks, respectively, on the Recall@1 metric. The code used for the experiments is publicly available on GitHub at https://github.com/mesnico/TERAN
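A minimal sketch of such a late-fusion alignment score, assuming max-over-regions, mean-over-words pooling (an illustrative choice, not necessarily TERAN's exact pooling): region and word features from the two separate pipelines meet only in a final similarity matrix.

```python
# Sketch of fine-grained region-word alignment scoring with late fusion.
# Pooling strategy and dimensions are illustrative assumptions.
import torch
import torch.nn.functional as F

def image_sentence_score(regions, words):
    """regions: (R, D) image-region features; words: (W, D) word features.
    The two feature sets can be computed and indexed independently;
    they interact only in this final similarity matrix."""
    r = F.normalize(regions, dim=-1)
    w = F.normalize(words, dim=-1)
    sim = w @ r.t()                       # (W, R) word-region cosine similarities
    return sim.max(dim=-1).values.mean()  # best region per word, averaged

score = image_sentence_score(torch.randn(36, 256), torch.randn(12, 256))
```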


2020, Vol 34 (07), pp. 10451-10459
Author(s): Kyungjune Baek, Minhyun Lee, Hyunjung Shim

Existing co-localization techniques lag significantly behind weakly or fully supervised methods in both accuracy and inference time. In this paper, we overcome common drawbacks of co-localization techniques by utilizing a self-supervised learning approach. The major technical contributions of the proposed method are two-fold. 1) We devise a new geometric transformation, namely the point symmetric transformation, and utilize its parameters as an artificial label for self-supervised learning. This new transformation can also play the role of region-drop-based regularization. 2) We suggest a heat map extraction method, namely class-agnostic activation mapping, that computes the heat map from the self-supervised network via its spatial attention map. Based on extensive evaluations, we observe that the proposed method records new state-of-the-art performance on three fine-grained datasets for unsupervised object localization. Moreover, we show that the idea of the proposed method can be adopted in a modified manner to solve the weakly supervised object localization task. As a result, we outperform the current state-of-the-art technique in weakly supervised object localization by a significant gap.
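As a loose illustration of self-supervision from a geometric transformation, the sketch below point-reflects one randomly chosen patch of an image and uses the patch index as a free label for the network to predict. The patch grid and labeling scheme are assumptions for illustration, not the paper's exact point symmetric transformation.

```python
# Sketch of a self-supervised label from a point-symmetric (180-degree)
# patch flip. Grid layout and labeling are illustrative assumptions.
import torch

def transform_with_label(img, grid=2):
    """img: (C, H, W). Point-reflect one grid cell; return image and cell id."""
    c, h, w = img.shape
    cell = torch.randint(grid * grid, (1,)).item()
    i, j = divmod(cell, grid)
    hs, ws = h // grid, w // grid
    patch = img[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws]
    out = img.clone()
    # Flipping both spatial axes of the patch is a point reflection
    # about the patch center.
    out[:, i*hs:(i+1)*hs, j*ws:(j+1)*ws] = torch.flip(patch, dims=(1, 2))
    return out, cell  # the cell index is the free self-supervision label

x, label = transform_with_label(torch.randn(3, 224, 224))
```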


1995, Vol 38 (5), pp. 1126-1142
Author(s): Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals, it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).


1986
Author(s): Simon S. Kim, Mary Lou Maher, Raymond E. Levitt, Martin F. Rooney, Thomas J. Siller
