Filtration and Distillation: Enhancing Region Attention for Fine-Grained Visual Categorization

2020 ◽  
Vol 34 (07) ◽  
pp. 11555-11562 ◽  
Author(s):  
Chuanbin Liu ◽  
Hongtao Xie ◽  
Zheng-Jun Zha ◽  
Lingfeng Ma ◽  
Lingyun Yu ◽  
...  

Delicate attention to discriminative regions plays a critical role in Fine-Grained Visual Categorization (FGVC). Unfortunately, most existing attention models perform poorly in FGVC because of two pivotal limitations in discriminative region proposal and region-based feature learning. 1) The discriminative regions are predominantly located based on the filter responses over the images, which cannot be directly optimized with a performance metric. 2) Existing methods train the region-based feature extractor as an individual one-hot classification task, neglecting the knowledge from the entire object. To address these issues, we propose a novel “Filtration and Distillation Learning” (FDL) model to enhance the region attention of discriminative parts for FGVC. Firstly, a Filtration Learning (FL) method is put forward for proposing discriminative part regions based on the matchability between proposing and predicting. Specifically, we utilize the proposing-predicting matchability as the performance metric of the Region Proposal Network (RPN), thus enabling direct optimization of the RPN to filtrate the most discriminative regions. Secondly, for distillation, the object-based feature learning and region-based feature learning are formulated as “teacher” and “student”, respectively, which can furnish better supervision for region-based feature learning. Accordingly, our FDL can enhance the region attention effectively, and the overall framework can be trained end-to-end with neither object nor part annotations. Extensive experiments verify that FDL yields state-of-the-art performance under the same backbone as the most competitive approaches on several FGVC tasks.
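The teacher/student formulation can be illustrated with a generic knowledge-distillation loss. The sketch below is not the FDL loss, which the abstract does not specify. It assumes the common temperature-softened form in which a hard-label cross-entropy term is blended with the KL divergence between teacher (object-based) and student (region-based) predictions. The names `distillation_loss`, `temperature` and `alpha` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic distillation loss (an assumption, not the paper's formula).

    Blends the student's cross-entropy on the hard label with
    KL(teacher || student) computed on temperature-softened outputs.
    """
    # Hard-label term: ordinary cross-entropy of the student.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft term: how far the student is from the teacher's softened output.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher_soft, student_soft) if t > 0)

    # The temperature**2 factor keeps the soft term's gradient scale
    # comparable across temperatures.
    return (1 - alpha) * cross_entropy + alpha * temperature ** 2 * kl
```

When the student reproduces the teacher's logits exactly, the KL term vanishes and only the weighted cross-entropy remains, so the soft term acts purely as extra supervision from the whole-object model.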

Author(s):  
Xiawu Zheng ◽  
Rongrong Ji ◽  
Xiaoshuai Sun ◽  
Yongjian Wu ◽  
Feiyue Huang ◽  
...  

Fine-grained object retrieval has attracted extensive research focus recently. Its state-of-the-art schemes are typically based upon convolutional neural network (CNN) features. Despite the extensive progress, two issues remain open. On one hand, the deep features are coarsely extracted at the image level rather than precisely at the object level, and are therefore disturbed by background clutter. On the other hand, training CNN features with a standard triplet loss is time consuming and incapable of learning discriminative features. In this paper, we present a novel fine-grained object retrieval scheme that conquers these issues in a unified framework. Firstly, we introduce a novel centralized ranking loss (CRL), which achieves very efficient (1,000 times training speedup compared to the triplet loss) and discriminative feature learning by a “centralized” global pooling. Secondly, a weakly supervised attractive feature extraction is proposed, which segments object contours with top-down saliency. Consequently, the contours are integrated into the CNN response map to precisely extract features “within” the target object. Interestingly, we have discovered that the combination of CRL and weakly supervised learning can reinforce each other. We evaluate the performance of the proposed scheme on widely used benchmarks including CUB200-2011 and CARS196. We report significant gains over the state-of-the-art schemes, e.g., 5.4% over SCDA [Wei et al., 2017] on CARS196, and 3.7% on CUB200-2011.
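For context on the baseline that CRL is compared against, the following is a minimal sketch of the standard triplet loss, with a helper that counts how many triplets a labelled training set can produce. It does not implement CRL, whose formulation the abstract does not give. The function names, the squared Euclidean distance and the default margin are assumptions chosen for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss on squared distances.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least `margin`.
    """
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(0.0, gap + margin)


def count_triplets(class_sizes):
    """Number of ordered (anchor, positive, negative) triplets.

    For a class with n samples there are n * (n - 1) anchor/positive
    pairs, each combinable with every sample from the other classes.
    """
    total = sum(class_sizes)
    return sum(n * (n - 1) * (total - n) for n in class_sizes)
```

The count grows roughly cubically with the dataset size, which is the usual reason exhaustive triplet training is slow and why practical pipelines rely on triplet sampling or on losses that avoid enumerating triplets.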


2020 ◽  
Vol 9 (5) ◽  
pp. 1882-1889
Author(s):  
Umar Akbar Khan ◽  
Saira Moin U. Din ◽  
Saima Anwar Lashari ◽  
Murtaja Ali Saare ◽  
Muhammad Ilyas

Fine-grained visual categorization (FGVC) deals with dividing objects that belong to one class into subclasses based on intra-class differences. FGVC is challenging because it is very difficult to collect enough training samples. This study presents a novel image dataset named Cowbree for FGVC. The Cowbree dataset contains 4000 images belonging to eight different cow breeds. With the help of experts, the images are categorized under different breed names (labels) based on texture and color features. Evidence shows that existing datasets are of low quality, targeting few breeds with a small number of images. To validate the dataset, three state-of-the-art classifiers were used: sequential minimal optimization (SMO), Multiclass classifier and J48. Their accuracies are 68.81%, 55.81% and 57.45%, respectively. SMO performed best, with 68.81% accuracy, 68.4% precision and 68.8% recall.
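The reported accuracy, precision and recall can all be derived from a multiclass confusion matrix. The sketch below assumes support-weighted averaging over classes; the abstract does not state which averaging was used, so this is only one plausible reading. The function name and the matrix layout are hypothetical.

```python
def classification_metrics(confusion):
    """Accuracy plus support-weighted precision and recall.

    confusion[i][j] is the number of samples whose true class is i and
    whose predicted class is j. Weighted averaging is an assumption.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(n_classes))
    accuracy = correct / total

    weighted_precision = 0.0
    weighted_recall = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])  # samples that truly belong to class c
        predicted = sum(confusion[i][c] for i in range(n_classes))
        precision_c = confusion[c][c] / predicted if predicted else 0.0
        recall_c = confusion[c][c] / support if support else 0.0
        weighted_precision += (support / total) * precision_c
        weighted_recall += (support / total) * recall_c
    return accuracy, weighted_precision, weighted_recall
```

Under this averaging, weighted recall is algebraically identical to accuracy, which would be consistent with the matching 68.81% accuracy and 68.8% recall figures reported for SMO.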


Author(s):  
Xiangteng He ◽  
Yuxin Peng ◽  
Junjie Zhao

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the subtle visual distinctions between them. There are two pivotal problems: discovering which region is discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems by relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture, which is driven by a semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which"), and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. The semantic reward function drives StackDRL to fully learn the discriminative and conceptual visual information by jointly combining the attention-based reward and category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor cost of labeling, and greatly strengthens the usability and scalability of our StackDRL approach. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.


Author(s):  
Yutao Hu ◽  
Xiaolong Jiang ◽  
Xuhui Liu ◽  
Xiaoyan Luo ◽  
Yao Hu ◽  
...  

1995 ◽  
Vol 38 (5) ◽  
pp. 1126-1142 ◽  
Author(s):  
Jeffrey W. Gilger

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).


Sensors ◽  
2021 ◽  
Vol 21 (13) ◽  
pp. 4486
Author(s):  
Niall O’Mahony ◽  
Sean Campbell ◽  
Lenka Krpalkova ◽  
Anderson Carvalho ◽  
Joseph Walsh ◽  
...  

Fine-grained change detection in sensor data is very challenging for artificial intelligence though it is critically important in practice. It is the process of identifying differences in the state of an object or phenomenon where the differences are class-specific and are difficult to generalise. As a result, many recent technologies that leverage big data and deep learning struggle with this task. This review focuses on the state-of-the-art methods, applications, and challenges of representation learning for fine-grained change detection. Our research focuses on methods of harnessing the latent metric space of representation learning techniques as an interim output for hybrid human-machine intelligence. We review methods for transforming and projecting embedding space such that significant changes can be communicated more effectively and a more comprehensive interpretation of underlying relationships in sensor data is facilitated. We conduct this research in our work towards developing a method for aligning the axes of latent embedding space with meaningful real-world metrics so that the reasoning behind the detection of change in relation to past observations may be revealed and adjusted. This is an important topic in many fields concerned with producing more meaningful and explainable outputs from deep learning and also for providing means for knowledge injection and model calibration in order to maintain user confidence.


2020 ◽  
Vol 153 (20) ◽  
pp. 201103
Author(s):  
Yoshifumi Noguchi ◽  
Miyabi Hiyama ◽  
Motoyuki Shiga ◽  
Hidefumi Akiyama ◽  
Osamu Sugino

Energies ◽  
2021 ◽  
Vol 14 (13) ◽  
pp. 3800
Author(s):  
Sebastian Krapf ◽  
Nils Kemmerzell ◽  
Syed Khawaja Haseeb Uddin ◽  
Manuel Hack Vázquez ◽  
Fabian Netzler ◽  
...  

Roof-mounted photovoltaic systems play a critical role in the global transition to renewable energy generation. An analysis of roof photovoltaic potential is an important tool for supporting decision-making and for accelerating new installations. The state of the art uses 3D data to conduct potential analyses with high spatial resolution, which limits the study area to places with available 3D data. Recent advances in deep learning allow the required roof information to be extracted from aerial images. Furthermore, most publications consider the technical photovoltaic potential, and only a few determine the economic photovoltaic potential. Therefore, this paper extends the state of the art by proposing and applying a methodology for scalable economic photovoltaic potential analysis using aerial images and deep learning. Two convolutional neural networks are trained for semantic segmentation of roof segments and superstructures and achieve Intersection over Union values of 0.84 and 0.64, respectively. We calculated the internal rate of return of each roof segment for 71 buildings in a small study area. A comparison of this paper’s methodology with a 3D-based analysis discusses its benefits and disadvantages. The proposed methodology uses only publicly available data and is potentially scalable to the global level. However, this poses a variety of research challenges and opportunities, which are summarized with a focus on the application of deep learning, economic photovoltaic potential analysis, and energy system analysis.
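Two quantities named in this abstract have compact definitions: Intersection over Union (IoU) for segmentation masks and the internal rate of return (IRR) of a cash-flow series. The sketch below illustrates both under stated assumptions. Masks are nested lists of 0/1, cash flows are yearly with the investment at year 0, and the IRR is found by bisection on the net present value. None of the function names, the search bounds or the cash-flow model come from the paper.

```python
def intersection_over_union(predicted, target):
    """IoU between two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for predicted_row, target_row in zip(predicted, target):
        for p, t in zip(predicted_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match (a convention).
    return intersection / union if union else 1.0


def net_present_value(rate, cash_flows):
    """NPV of yearly cash flows, with cash_flows[0] occurring at year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))


def internal_rate_of_return(cash_flows, low=-0.99, high=1.0, tolerance=1e-9):
    """Rate at which the NPV is zero, found by bisection.

    Assumes the NPV changes sign exactly once between `low` and `high`,
    as for an initial investment followed by positive returns.
    """
    npv_low = net_present_value(low, cash_flows)
    for _ in range(200):
        mid = (low + high) / 2
        npv_mid = net_present_value(mid, cash_flows)
        if abs(npv_mid) < tolerance:
            return mid
        if (npv_mid > 0) == (npv_low > 0):
            low, npv_low = mid, npv_mid  # root lies in the upper half
        else:
            high = mid  # root lies in the lower half
    return (low + high) / 2
```

As a usage example, an investment of 100 that returns 110 one year later has an IRR of 10%, which `internal_rate_of_return([-100, 110])` recovers to within the tolerance.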


Author(s):  
Anil S. Baslamisli ◽  
Partha Das ◽  
Hoang-An Le ◽  
Sezer Karaoglu ◽  
Theo Gevers

In general, intrinsic image decomposition algorithms interpret shading as one unified component including all photometric effects. As shading transitions are generally smoother than reflectance (albedo) changes, these methods may fail in distinguishing strong photometric effects from reflectance variations. Therefore, in this paper, we propose to decompose the shading component into direct (illumination) and indirect shading (ambient light and shadows) subcomponents. The aim is to distinguish strong photometric effects from reflectance variations. An end-to-end deep convolutional neural network (ShadingNet) is proposed that operates in a fine-to-coarse manner with a specialized fusion and refinement unit exploiting the fine-grained shading model. It is designed to learn specific reflectance cues separated from specific photometric effects to analyze the disentanglement capability. A large-scale dataset of scene-level synthetic images of outdoor natural environments is provided with fine-grained intrinsic image ground-truths. Large scale experiments show that our approach using fine-grained shading decompositions outperforms state-of-the-art algorithms utilizing unified shading on NED, MPI Sintel, GTA V, IIW, MIT Intrinsic Images, 3DRMS and SRD datasets.


Author(s):  
Jonas Austerjost ◽  
Robert Söldner ◽  
Christoffer Edlund ◽  
Johan Trygg ◽  
David Pollard ◽  
...  

Machine vision is a powerful technology that has become increasingly popular and accurate during the last decade due to rapid advances in the field of machine learning. The majority of machine vision applications are currently found in consumer electronics, automotive applications, and quality control, yet the potential for bioprocessing applications is tremendous. For instance, detecting and controlling foam emergence is important for all upstream bioprocesses, but the lack of robust foam sensing often leads to batch failures from foam-outs or overaddition of antifoam agents. Here, we report a new low-cost, flexible, and reliable foam sensor concept for bioreactor applications. The concept applies convolutional neural networks (CNNs), a state-of-the-art machine learning system for image processing. The implemented method shows high accuracy for both binary foam detection (foam/no foam) and fine-grained classification of foam levels.

