StackDRL: Stacked Deep Reinforcement Learning for Fine-grained Visual Categorization

Author(s):  
Xiangteng He ◽  
Yuxin Peng ◽  
Junjie Zhao

Fine-grained visual categorization (FGVC) is the discrimination of similar subcategories, whose main challenge is to localize the quite subtle visual distinctions between them. Two problems are pivotal: discovering which region is discriminative and representative, and determining how many discriminative regions are necessary to achieve the best performance. Existing methods generally solve these two problems by relying on prior knowledge or experimental validation, which severely restricts the usability and scalability of FGVC. To address the "which" and "how many" problems adaptively and intelligently, this paper proposes a stacked deep reinforcement learning approach (StackDRL). It adopts a two-stage learning architecture driven by a semantic reward function. Two-stage learning localizes the object and its parts in sequence ("which") and determines the number of discriminative regions adaptively ("how many"), which is quite appealing in FGVC. The semantic reward function drives StackDRL to fully learn discriminative and conceptual visual information by jointly combining an attention-based reward and a category-based reward. Furthermore, unsupervised discriminative localization avoids the heavy labor of annotation and greatly strengthens the usability and scalability of our StackDRL approach. Compared with ten state-of-the-art methods on the CUB-200-2011 dataset, our StackDRL approach achieves the best categorization accuracy.
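
To make the reward design concrete, here is a minimal sketch of how a semantic reward might jointly combine an attention-based term and a category-based term, as the abstract describes. The function name, the coverage/probability terms, and the `alpha` weighting are illustrative assumptions, not the authors' published formulation.

```python
import numpy as np

def semantic_reward(attention_map, region_mask, class_probs, true_label,
                    alpha=0.5):
    # Attention-based reward: fraction of total attention mass that falls
    # inside the selected region (how salient the region is).
    attn_reward = (attention_map * region_mask).sum() / (attention_map.sum() + 1e-8)
    # Category-based reward: probability the region's features assign to
    # the ground-truth class (how conceptually correct the region is).
    cat_reward = class_probs[true_label]
    # Jointly combine the two terms; alpha is an assumed trade-off weight.
    return alpha * attn_reward + (1.0 - alpha) * cat_reward

# Toy usage: a 4x4 attention map and a region covering the top-left quadrant.
attn = np.random.rand(4, 4)
mask = np.zeros((4, 4)); mask[:2, :2] = 1.0
probs = np.array([0.1, 0.7, 0.2])
print(semantic_reward(attn, mask, probs, true_label=1))
```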

2020 ◽  
Vol 34 (05) ◽  
pp. 8600-8607
Author(s):  
Haiyun Peng ◽  
Lu Xu ◽  
Lidong Bing ◽  
Fei Huang ◽  
Wei Lu ◽  
...  

Target-based sentiment analysis or aspect-based sentiment analysis (ABSA) refers to addressing various sentiment analysis tasks at a fine-grained level, including but not limited to aspect extraction, aspect sentiment classification, and opinion extraction. Many solvers exist for the individual subtasks above, or for a combination of two of them, and together they can tell a complete story: the discussed aspect, the sentiment on it, and the cause of that sentiment. However, no previous ABSA research has tried to provide a complete solution in one shot. In this paper, we introduce a new subtask under ABSA, named aspect sentiment triplet extraction (ASTE). In particular, a solver of this task needs to extract triplets (What, How, Why) from the input, which specify WHAT the targeted aspects are, HOW their sentiment polarities are, and WHY they have such polarities (i.e., the opinion reasons). For instance, one triplet from "Waiters are very friendly and the pasta is simply average" could be ('Waiters', positive, 'friendly'). We propose a two-stage framework to address this task. The first stage predicts what, how, and why in a unified model, and the second stage pairs up the predicted what (how) and why from the first stage to output triplets. In the experiments, our framework sets a benchmark performance for this novel triplet extraction task. Meanwhile, it outperforms a few strong baselines adapted from state-of-the-art related methods.
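
As a rough illustration of the second-stage pairing, the sketch below matches each predicted aspect (WHAT, with its HOW polarity) to a predicted opinion span (WHY) by simple character-offset proximity. The data layout and the distance heuristic are assumptions for illustration; the paper's framework learns this pairing rather than using a fixed rule.

```python
def pair_triplets(aspects, opinions):
    # aspects: (text, char_offset, sentiment); opinions: (text, char_offset).
    triplets = []
    for text, offset, sentiment in aspects:
        # Choose the opinion whose offset is closest to the aspect's offset.
        why, _ = min(opinions, key=lambda o: abs(o[1] - offset))
        triplets.append((text, sentiment, why))
    return triplets

# "Waiters are very friendly and the pasta is simply average"
aspects = [("Waiters", 0, "positive"), ("pasta", 34, "neutral")]
opinions = [("friendly", 17), ("average", 50)]
print(pair_triplets(aspects, opinions))
# [('Waiters', 'positive', 'friendly'), ('pasta', 'neutral', 'average')]
```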


2017 ◽  
Vol 61 (1) ◽  
Author(s):  
Lihua Guo ◽  
Chenggang Guo ◽  
Lei Li ◽  
Qinghua Huang ◽  
Yanshan Li ◽  
...  

2020 ◽  
Vol 9 (5) ◽  
pp. 1882-1889
Author(s):  
Umar Akbar Khan ◽  
Saira Moin U. Din ◽  
Saima Anwar Lashari ◽  
Murtaja Ali Saare ◽  
Muhammad Ilyas

Fine-grained visual categorization (FGVC) deals with objects belonging to one class whose intra-class differences split them into subclasses. FGVC is challenging because it is very difficult to collect enough training samples. This study presents a novel image dataset named Cowbree for FGVC. The Cowbree dataset contains 4000 images belonging to eight different cow breeds. Images are properly categorized under different breed names (labels) based on texture and color features with the help of experts, whereas evidence shows that existing datasets are of low quality, targeting few breeds with a small number of images. To validate the dataset, three state-of-the-art classifiers were used: sequential minimal optimization (SMO), a multiclass classifier, and J48. Their accuracies were 68.81%, 55.81%, and 57.45%, respectively, showing that SMO outperformed the others with 68.81% accuracy, 68.4% precision, and 68.8% recall.
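
For readers who want to reproduce this kind of validation, the sketch below runs scikit-learn analogues of two of the classifiers on synthetic features standing in for the texture/color features: SVC approximates the SMO-trained SVM and DecisionTreeClassifier approximates J48 (C4.5). The study itself presumably used Weka; this substitution, and the synthetic data, are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in: 4000 samples, 8 classes (one per breed).
X, y = make_classification(n_samples=4000, n_features=20, n_informative=10,
                           n_classes=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, clf in [("SMO-style SVM", SVC(kernel="linear")),
                  ("J48-style tree", DecisionTreeClassifier())]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```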


Author(s):  
Alberto Camacho ◽  
Rodrigo Toro Icarte ◽  
Toryn Q. Klassen ◽  
Richard Valenzano ◽  
Sheila A. McIlraith

In Reinforcement Learning (RL), an agent is guided by the rewards it receives from the reward function. Unfortunately, it may take many interactions with the environment to learn from sparse rewards, and it can be challenging to specify reward functions that reflect complex reward-worthy behavior. We propose using reward machines (RMs), which are automata-based representations that expose reward function structure, as a normal form representation for reward functions. We show how specifications of reward in various formal languages, including LTL and other regular languages, can be automatically translated into RMs, easing the burden of complex reward function specification. We then show how the exposed structure of the reward function can be exploited by tailored Q-learning algorithms and automated reward shaping techniques in order to improve the sample efficiency of reinforcement learning methods. Experiments show that these RM-tailored techniques significantly outperform state-of-the-art (deep) RL algorithms, solving problems that otherwise cannot reasonably be solved by existing approaches.
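
A reward machine can be sketched as a small automaton whose edges consume propositional events and emit rewards. The class below is a minimal illustration of that idea; the "coffee then office" task is a common illustrative example rather than one taken from this paper's experiments.

```python
class RewardMachine:
    """Minimal sketch of a reward machine: a finite automaton whose
    transitions fire on propositional events observed in the environment
    and whose edges emit rewards."""
    def __init__(self, transitions, initial_state):
        # transitions: (state, event) -> (next_state, reward)
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        # Events with no matching edge leave the RM state unchanged, reward 0.
        next_state, reward = self.transitions.get((self.state, event),
                                                  (self.state, 0.0))
        self.state = next_state
        return reward

# Task: first reach 'coffee', then reach 'office' for reward 1.
rm = RewardMachine({("u0", "coffee"): ("u1", 0.0),
                    ("u1", "office"): ("u_acc", 1.0)}, initial_state="u0")
print([rm.step(e) for e in ["office", "coffee", "office"]])  # [0.0, 0.0, 1.0]
```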


Author(s):  
Yu. V. Dubenko ◽  
Ye. Ye. Dyshkant ◽  
D. A. Gura

The paper evaluates the possibility of using robotic systems (intelligent agents) to monitor complex infrastructure objects, such as buildings, structures, bridges, roads, and other transport infrastructure. Methods and algorithms for implementing the behavioral strategies of robots, in particular search algorithms based on decision trees, are examined. The emphasis is placed on enabling robots to self-learn through reinforcement learning, which is associated with modeling the behavior of living creatures when they interact with unknown elements of the environment. The Q-learning method is considered as one type of reinforcement learning that introduces the concept of action value, together with the "hierarchical reinforcement learning" approach and its varieties: "Options Framework", "Feudal", and "MaxQ". In the segmentation of macro-actions, the paper identifies the problems of determining parameters such as the value and reward functions of the agents (mobile robots), as well as the mandatory presence of a machine-vision subsystem. Thus, segmenting macro-actions requires improving the methodological base by applying intelligent algorithms and methods, including deep clustering. The effectiveness of hierarchical reinforcement learning when mobile robots operate with a lack of information about the monitored object can be improved by transmitting visual information across a variety of states, which will also increase the transfer of experience between robots when they later perform tasks on various objects.
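
Since the paper centers on Q-learning's notion of action value, here is a minimal tabular Q-learning update for reference. The state/action encodings and hyperparameters are placeholders.

```python
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    # Q(s, a) is nudged toward the observed reward plus the discounted
    # value of the best action available in the next state.
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

# Toy usage on a two-state environment.
Q = defaultdict(float)
actions = ["left", "right"]
q_learning_update(Q, s=0, a="right", r=1.0, s_next=1, actions=actions)
print(Q[(0, "right")])  # 0.1
```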


Author(s):  
Qiming Fu ◽  
Quan Liu ◽  
Shan Zhong ◽  
Heng Luo ◽  
Hongjie Wu ◽  
...  

In reinforcement learning (RL), the exploration/exploitation (E/E) dilemma is a crucial issue: it can be described as the trade-off between exploring the environment to find more profitable actions and exploiting the best empirical actions for the current state. We focus on the single-trajectory RL problem, where an agent interacts with a partially unknown MDP over a single trajectory, and address the E/E dilemma in this setting. Given the reward function, we seek a good E/E strategy for MDPs drawn from some MDP distribution. This is achieved by selecting, from a large set of candidate strategies, the strategy that is best in the mean over a potential MDP distribution, which is done by exploiting single trajectories drawn from many MDPs. In this paper, we make the following contributions: (1) we discuss the strategy-selector algorithm based on formula sets and polynomial functions; (2) we provide a theoretical and experimental regret analysis of the learned strategy under a given MDP distribution; (3) we compare these methods experimentally with a state-of-the-art Bayesian RL method.
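
The core selection procedure can be sketched as follows: sample MDPs from the assumed distribution, score every candidate E/E strategy by its mean single-trajectory return, and keep the best. The bandit-style toy objective and the helper names `sample_mdp` and `run_trajectory` are assumptions for illustration, not the paper's formula-set machinery.

```python
import random

def select_strategy(candidates, sample_mdp, run_trajectory, n_mdps=100):
    # Draw MDPs once so every candidate is scored on the same sample.
    mdps = [sample_mdp() for _ in range(n_mdps)]
    def mean_return(strategy):
        # One trajectory per MDP, averaged over the sampled distribution.
        return sum(run_trajectory(strategy, mdp) for mdp in mdps) / n_mdps
    return max(candidates, key=mean_return)

# Toy usage: "strategies" are exploration rates; payoff decreases with
# exploration, so the least-exploratory strategy wins on this objective.
random.seed(0)
best = select_strategy(
    candidates=[0.2, 0.5, 0.8],
    sample_mdp=lambda: random.uniform(0.5, 1.5),
    run_trajectory=lambda eps, d: d * (1 - eps) + random.gauss(0, 0.1))
print(best)
```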


2019 ◽  
Vol 109 (3) ◽  
pp. 493-512 ◽  
Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function, called curiosity, which enables it to explore the environment in search of new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need to directly predict the future, and can operate in sequential decision scenarios. We formulate curiosity as the agent's ability to predict its own knowledge about the task. We base the prediction on the idea of skill learning, to incentivize the discovery of new skills and to guide exploration toward promising solutions. To further improve the data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We evaluate on a variety of sparse-reward tasks in MiniGrid, MuJoCo, and Atari games, comparing the performance of an agent augmented with our curiosity reward to state-of-the-art learners. The experimental evaluation exhibits higher performance than reinforcement learning models that learn only by maximizing extrinsic rewards.
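
As a rough sketch of the idea, the snippet below adds an intrinsic bonus, proportional to the agent's error in predicting its own knowledge about the task, to the extrinsic reward. The squared-error form and the `beta` scale are assumptions, not the authors' exact formulation.

```python
import numpy as np

def curiosity_bonus(predicted_knowledge, actual_knowledge, beta=0.5):
    # The agent's failure to predict its own knowledge signals novelty,
    # which becomes an exploration bonus.
    error = np.mean((predicted_knowledge - actual_knowledge) ** 2)
    return beta * error

def total_reward(extrinsic, predicted, actual):
    # The agent maximizes extrinsic reward plus the intrinsic curiosity term.
    return extrinsic + curiosity_bonus(predicted, actual)

# Toy usage: poorly-predicted knowledge yields a larger exploration bonus.
print(total_reward(0.0, np.array([0.1, 0.9]), np.array([0.8, 0.2])))
```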


2020 ◽  
Vol 34 (07) ◽  
pp. 11555-11562 ◽  
Author(s):  
Chuanbin Liu ◽  
Hongtao Xie ◽  
Zheng-Jun Zha ◽  
Lingfeng Ma ◽  
Lingyun Yu ◽  
...  

Delicate attention to discriminative regions plays a critical role in fine-grained visual categorization (FGVC). Unfortunately, most existing attention models perform poorly in FGVC, due to two pivotal limitations in discriminative region proposal and region-based feature learning. 1) The discriminative regions are predominantly located based on the filter responses over the images, which cannot be directly optimized with a performance metric. 2) Existing methods train the region-based feature extractor as an individual one-hot classification task, neglecting the knowledge from the entire object. To address these issues, we propose a novel "Filtration and Distillation Learning" (FDL) model to enhance the region attention on discriminative parts for FGVC. Firstly, a Filtration Learning (FL) method is put forward for proposing discriminative part regions based on the matchability between proposal and prediction. Specifically, we utilize the proposing-predicting matchability as the performance metric of the Region Proposal Network (RPN), thus enabling direct optimization of the RPN to filter out the most discriminative regions. Secondly, a Distillation Learning (DL) method supervises region-based feature learning with knowledge from the whole object: the object-based feature learning and region-based feature learning are formulated as "teacher" and "student", which furnishes better supervision for region-based feature learning. Accordingly, our FDL can enhance region attention effectively, and the overall framework can be trained end-to-end without either object or part annotations. Extensive experiments verify that FDL yields state-of-the-art performance with the same backbone as the most competitive approaches on several FGVC tasks.
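
The teacher-student supervision can be illustrated with a standard knowledge-distillation loss: the object-level (teacher) logits supervise the region-level (student) logits through a temperature-softened KL term. This is a generic distillation sketch, not the paper's exact loss; the temperature value is an assumption.

```python
import numpy as np

def softmax(z, t=1.0):
    z = z / t
    e = np.exp(z - z.max())
    return e / e.sum()

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # The student is pushed toward the teacher's softened class distribution.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student); small epsilon keeps the log finite.
    return float(np.sum(p_teacher * np.log((p_teacher + 1e-8) /
                                           (p_student + 1e-8))))

# Toy usage: object-level logits supervise a part-region's logits.
print(distillation_loss(np.array([2.0, 0.5, -1.0]),
                        np.array([1.0, 1.0, 0.0])))
```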


2021 ◽  
Author(s):  
Oren Kolodny ◽  
Roy Moyal ◽  
Shimon Edelman

Evolutionary accounts of feelings, and in particular of negative affect and of pain, assume that creatures that feel and care about the outcomes of their behavior outperform those that do not in terms of their evolutionary fitness. Such accounts, however, can only work if feelings can be shown to contribute to fitness-influencing outcomes. Simply assuming that a learner that feels and cares about outcomes is more strongly motivated than one that doesn't is not enough, if only because motivation can be tied directly to outcomes by incorporating an appropriate reward function, without leaving any apparent role to feelings (as it is done in state-of-the-art engineered systems based on reinforcement learning). Here, we propose a possible mechanism whereby feelings contribute to fitness: an actor-critic functional architecture for reinforcement learning, in which feelings reflect costs and gains that promote effective actor selection and help the system optimize learning and future behavior.
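
One way to read the proposal is that the temporal-difference error of an actor-critic learner plays the role of the valenced "feeling" that reflects costs and gains. The tabular sketch below is a speculative illustration under that reading, not the authors' implementation.

```python
import numpy as np

def actor_critic_step(V, prefs, s, a, r, s_next, alpha=0.1, beta=0.1, gamma=0.95):
    # TD error as valenced feedback: positive when things went better than
    # expected (a "gain"), negative when they went worse (a "cost").
    feeling = r + gamma * V[s_next] - V[s]
    V[s] += alpha * feeling          # critic: update the value estimate
    prefs[s, a] += beta * feeling    # actor: reinforce or punish the action
    return feeling

# Toy usage: two states, two actions.
V = np.zeros(2)
prefs = np.zeros((2, 2))
print(actor_critic_step(V, prefs, s=0, a=1, r=1.0, s_next=1))  # positive "feeling"
```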


Author(s):  
Huapeng Xu ◽  
Guilin Qi ◽  
Jingjing Li ◽  
Meng Wang ◽  
Kang Xu ◽  
...  

This paper investigates a challenging problem known as fine-grained image classification (FGIC). Different from conventional computer vision problems, FGIC suffers from large intra-class diversity and subtle inter-class differences. Existing FGIC approaches are limited to exploring only the visual information embedded in the images. In this paper, we present a novel approach that can use handy prior knowledge from either structured knowledge bases or unstructured text to facilitate FGIC. Specifically, we propose a visual-semantic embedding model which explores semantic embeddings from knowledge bases and text, and further trains a novel end-to-end CNN framework to linearly map image features to a rich semantic embedding space. Experimental results on the challenging large-scale Caltech-UCSD Birds-200-2011 dataset verify that our approach outperforms several state-of-the-art methods by a significant margin.
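
The mapping described above can be sketched as a learned linear projection from CNN features into the semantic space, with classification by nearest class embedding. Dimensions, the cosine-similarity rule, and all names below are illustrative assumptions.

```python
import numpy as np

def classify_by_embedding(image_feat, W, class_embeddings):
    # Linearly map the CNN image feature into the semantic space learned
    # from knowledge bases and text.
    z = W @ image_feat
    # Predict the class whose semantic embedding is most similar (cosine).
    sims = class_embeddings @ z / (
        np.linalg.norm(class_embeddings, axis=1) * np.linalg.norm(z) + 1e-8)
    return int(np.argmax(sims))

# Toy usage: 512-d CNN feature, 300-d semantic space, 200 bird classes.
rng = np.random.default_rng(0)
feat = rng.normal(size=512)
W = rng.normal(size=(300, 512))
classes = rng.normal(size=(200, 300))
print(classify_by_embedding(feat, W, classes))
```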

