Geometric Multi-Model Fitting by Deep Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.330110081 ◽

2019 ◽

Vol 33 ◽

pp. 10081-10082

Author(s):

Zongliang Zhang ◽

Hongbin Zeng ◽

Jonathan Li ◽

Yiping Chen ◽

Chenhui Yang ◽

...

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

State Of The Art ◽

Model Fitting ◽

Simulated Data ◽

Point Clouds ◽

Sequential Decision ◽

Optimal Decisions ◽

Best Fitting ◽

Point Set

This paper deals with geometric multi-model fitting from noisy, unstructured point set data (e.g., laser-scanned point clouds). We formulate the multi-model fitting problem as a sequential decision-making process. We then use a deep reinforcement learning algorithm to learn the optimal decisions towards the best fitting result. We compare our method against the state of the art on simulated data. The results demonstrate that our approach significantly reduces the number of fitting iterations.

Selective network discovery via deep reinforcement learning on embedded spaces

Applied Network Science ◽

10.1007/s41109-021-00365-8 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Peter Morales ◽

Rajmonda Sulo Caceres ◽

Tina Eliassi-Rad

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Sequential Decision ◽

Network Discovery ◽

Learning Tasks ◽

Partially Observed ◽

Decision Making Problem ◽

Resource Collection ◽

Improved Performance ◽

Discovery Algorithms

Complex networks are often either too large for full exploration, partially accessible, or partially observed. Downstream learning tasks on these incomplete networks can produce low quality results. In addition, reducing the incompleteness of the network can be costly and nontrivial. As a result, network discovery algorithms optimized for specific downstream learning tasks given resource collection constraints are of great interest. In this paper, we formulate the task-specific network discovery problem as a sequential decision-making problem. Our downstream task is selective harvesting, the optimal collection of vertices with a particular attribute. We propose a framework, called network actor critic (NAC), which learns a policy and notion of future reward in an offline setting via a deep reinforcement learning algorithm. The NAC paradigm utilizes a task-specific network embedding to reduce the state space complexity. A detailed comparative analysis of popular network embeddings is presented with respect to their role in supporting offline planning. Furthermore, a quantitative study is presented on various synthetic and real benchmarks using NAC and several baselines. We show that offline models of reward and network discovery policies lead to significantly improved performance when compared to competitive online discovery algorithms. Finally, we outline learning regimes where planning is critical in addressing sparse and changing reward signals.

A Unified Analysis of Value-Function-Based Reinforcement-Learning Algorithms

Neural Computation ◽

10.1162/089976699300016070 ◽

1999 ◽

Vol 11 (8) ◽

pp. 2017-2060 ◽

Cited By ~ 70

Author(s):

Csaba Szepesvári ◽

Michael L. Littman

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Learning Algorithm ◽

Learning Algorithms ◽

Sequential Decision ◽

Q Learning ◽

Markov Games ◽

Optimal Behavior ◽

Risk Sensitive ◽

Optimal Value

Reinforcement learning is the problem of generating optimal behavior in a sequential decision-making environment given the opportunity of interacting with it. Many algorithms for solving reinforcement-learning problems work by computing improved estimates of the optimal value function. We extend prior analyses of reinforcement-learning algorithms and present a powerful new theorem that can provide a unified analysis of such value-function-based reinforcement-learning algorithms. The usefulness of the theorem lies in how it allows the convergence of a complex asynchronous reinforcement-learning algorithm to be proved by verifying that a simpler synchronous algorithm converges. We illustrate the application of the theorem by analyzing the convergence of Q-learning, model-based reinforcement learning, Q-learning with multistate updates, Q-learning for Markov games, and risk-sensitive reinforcement learning.
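As an illustration of the value-function-based algorithms this abstract refers to, the following is a minimal sketch of the standard tabular Q-learning update. It is not taken from the paper and does not reproduce its theorem; the function name `q_learning_update`, the toy states, and the values of `alpha` and `gamma` are assumptions chosen for the example.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new estimate.

    q is a dict mapping state -> dict mapping action -> estimated value.
    alpha (step size) and gamma (discount) are illustrative defaults.
    """
    # Greedy estimate of the value of the next state (0.0 if it has no actions).
    best_next = max(q[next_state].values()) if q[next_state] else 0.0
    # Move the current estimate towards the one-step bootstrapped target.
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Toy table with two hypothetical states and two actions each.
q = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}
new_value = q_learning_update(q, "s0", "right", reward=1.0, next_state="s1")
```

Each call improves a single state-action estimate from one observed transition, which is the asynchronous style of update whose convergence the abstract says can be reduced to that of a simpler synchronous algorithm.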

Read, Watch, and Move: Reinforcement Learning for Temporally Grounding Natural Language Descriptions in Videos

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33018393 ◽

2019 ◽

Vol 33 ◽

pp. 8393-8400 ◽

Cited By ~ 8

Author(s):

Dongliang He ◽

Xiang Zhao ◽

Jizhou Huang ◽

Fu Li ◽

Xiao Liu ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Natural Language ◽

State Of The Art ◽

Sliding Window ◽

Sequential Decision Making ◽

Sequential Decision ◽

Boundary Information ◽

Performance Gains ◽

Steady Performance

The task of video grounding, which temporally localizes a natural language description in a video, plays an important role in understanding videos. Existing studies have adopted strategies of sliding a window over the entire video or exhaustively ranking all possible clip-sentence pairs in a presegmented video, which inevitably suffer from exhaustively enumerated candidates. To alleviate this problem, we formulate this task as a problem of sequential decision making by learning an agent which regulates the temporal grounding boundaries progressively based on its policy. Specifically, we propose a reinforcement-learning-based framework improved by multi-task learning, which shows steady performance gains by considering additional supervised boundary information during training. Our proposed framework achieves state-of-the-art performance on the ActivityNet’18 DenseCaption dataset (Krishna et al. 2017) and the Charades-STA dataset (Sigurdsson et al. 2016; Gao et al. 2017) while observing only 10 or fewer clips per video.

Pointwise Rotation-Invariant Network with Adaptive Sampling and 3D Spherical Voxel Convolution

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i07.6965 ◽

2020 ◽

Vol 34 (07) ◽

pp. 12717-12724

Author(s):

Yang You ◽

Yujing Lou ◽

Qi Liu ◽

Yu-Wing Tai ◽

Lizhuang Ma ◽

...

Keyword(s):

Adaptive Sampling ◽

Point Cloud ◽

Data Augmentation ◽

Feature Matching ◽

State Of The Art ◽

Point Clouds ◽

Rotation Invariant ◽

Learning Framework ◽

Point Set ◽

Part Segmentation

Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown. In this paper, we propose a brand new point-set learning framework, PRIN (Pointwise Rotation-Invariant Network), focusing on rotation-invariant feature extraction in point cloud analysis. We construct spherical signals by Density Aware Adaptive Sampling to deal with distorted point distributions in spherical space. In addition, we propose Spherical Voxel Convolution and Point Re-sampling to extract rotation-invariant features for each point. Our network can be applied to tasks ranging from object classification and part segmentation to 3D feature matching and label alignment. We show that, on the dataset with randomly rotated point clouds, PRIN demonstrates better performance than state-of-the-art methods without any data augmentation. We also provide theoretical analysis for the rotation invariance achieved by our methods.

A Novel Adaptive Sampling Strategy for Deep Reinforcement Learning

International Journal of Computational Intelligence and Applications ◽

10.1142/s1469026821500115 ◽

2021 ◽

Vol 20 (02) ◽

pp. 2150011

Author(s):

Xingxing Liang ◽

Li Chen ◽

Yanghe Feng ◽

Zhong Liu ◽

Yang Ma ◽

...

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Adaptive Sampling ◽

Learning Algorithm ◽

Sampling Strategy ◽

Sequential Decision ◽

Fixed Temperature ◽

Sample Distribution ◽

Intelligent Decision Making ◽

Experience Replay

Reinforcement learning, as an effective method to solve complex sequential decision-making problems, plays an important role in areas such as intelligent decision-making and behavioral cognition. It is well known that the experience replay mechanism contributes to the development of current deep reinforcement learning by reusing past samples to improve sample efficiency. However, the existing prioritized experience replay mechanism changes the sample distribution in the sample set due to the higher sampling frequency assigned to specific transitions, and it cannot be applied to actor-critic and other on-policy reinforcement learning algorithms. To address this, we propose an adaptive factor based on TD-error, which further increases sample utilization by giving more attention weight to samples with larger TD-error, and embed it flexibly into the original Deep Q Network and Advantage Actor-Critic algorithms to improve their performance. We then evaluate the proposed architecture on CartPole-V1 and on 6 Atari game environments. The results, obtained under both fixed and annealing temperature and compared with those produced by the vanilla DQN and the original A2C, highlight the advantages of the improved algorithms in cumulative rewards and climb speed.
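The idea of giving more weight to samples with larger TD-error can be sketched as below. The abstract does not give the paper's actual adaptive factor, so this example assumes a softmax over absolute TD-errors with a temperature parameter; the function name `td_error_weights` and the sample values are hypothetical.

```python
import math


def td_error_weights(td_errors, temperature=1.0):
    """Return per-sample weights that grow with |TD-error| and sum to 1.

    Assumption: a temperature-scaled softmax stands in for the paper's
    adaptive factor, which the abstract does not specify.
    """
    scaled = [abs(e) / temperature for e in td_errors]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [x / total for x in exps]


# Three hypothetical TD-errors from one mini-batch.
weights = td_error_weights([0.1, -2.0, 0.5], temperature=1.0)
# A higher temperature flattens the weights towards uniform.
flat_weights = td_error_weights([0.1, -2.0, 0.5], temperature=100.0)
```

Because the weights rescale each sample's contribution to the loss rather than changing which samples are drawn, a scheme of this kind leaves the sampling distribution untouched, which is the property the abstract contrasts with prioritized replay.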

A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments

Adaptive Behavior ◽

10.1177/1059712319869313 ◽

2019 ◽

Vol 28 (4) ◽

pp. 273-292 ◽

Cited By ~ 1

Author(s):

Sherif Abdelfattah ◽

Kathryn Kasmarik ◽

Jiankun Hu

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Markov Property ◽

Special Kind ◽

Optimization Techniques ◽

Optimization Approach ◽

Multi Objective Optimization ◽

Sequential Decision ◽

Major Drawback ◽

Multi Objective

Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this kind of problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity in order to evolve a coverage set of policies that can solve the problem. This article introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.

Deep Reinforcement Learning on HVAC Control

Information Technology and Management Science ◽

10.7250/itms-2018-0004 ◽

2018 ◽

Vol 21 ◽

pp. 29-36 ◽

Cited By ~ 1

Author(s):

Ivars Namatēvs

Keyword(s):

Reinforcement Learning ◽

Predictive Control ◽

Learning Algorithm ◽

State Of The Art ◽

Building Energy ◽

Computing Power ◽

Smart Building ◽

Q Learning ◽

Sensory Inputs ◽

Q Function

Due to the increase in computing power and innovative end-to-end reinforcement learning (RL) approaches that feed on data from high-dimensional sensory inputs, it is now plausible to combine RL and deep learning to build Smart Building Energy Control (SBEC) systems. Deep reinforcement learning (DRL) revolutionizes the existing Q-learning algorithm into Deep Q-learning (DQL), which profits from artificial neural networks. A Deep Neural Network (DNN) is trained to calculate the Q-function. To create a comprehensive SBEC system, it is crucial to choose an appropriate mathematical background and benchmark the best framework for model-based predictive control to manage the building heating, ventilation, and air conditioning (HVAC) system. The main contribution of this paper is to explore a state-of-the-art DRL methodology for smart building control.

Efficient Reinforcement Learning with Hierarchies of Machines by Leveraging Internal Transitions

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/196 ◽

2017 ◽

Cited By ~ 5

Author(s):

Aijun Bai ◽

Stuart Russell

Keyword(s):

Reinforcement Learning ◽

Hierarchical Structure ◽

Learning Algorithm ◽

State Of The Art ◽

State Machines ◽

Learning To Learn ◽

Hierarchical Reinforcement Learning ◽

Abstract Machines ◽

Finite State ◽

Q Values

In the context of hierarchical reinforcement learning, the idea of hierarchies of abstract machines (HAMs) is to write a partial policy as a set of hierarchical finite state machines with unspecified choice states, and use reinforcement learning to learn an optimal completion of this partial policy. Given a HAM with potentially deep hierarchical structure, there often exist many internal transitions where a machine calls another machine with the environment state unchanged. In this paper, we propose a new hierarchical reinforcement learning algorithm that discovers such internal transitions automatically, and short-circuits them recursively in the computation of Q values. The resulting HAMQ-INT algorithm outperforms the state of the art significantly on the benchmark Taxi domain and a much more complex RoboCup Keepaway domain.

Multi-Task Deep Reinforcement Learning with PopArt

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33013796 ◽

2019 ◽

Vol 33 ◽

pp. 3796-3803 ◽

Cited By ~ 21

Author(s):

Matteo Hessel ◽

Hubert Soyer ◽

Lasse Espeholt ◽

Wojciech Czarnecki ◽

Simon Schmitt ◽

...

Keyword(s):

Reinforcement Learning ◽

Human Performance ◽

Learning Algorithm ◽

State Of The Art ◽

Single Agent ◽

Learning System ◽

Learning Platform ◽

Art Performance ◽

The One ◽

First Time

The reinforcement learning (RL) community has made great strides in designing algorithms capable of exceeding human performance on specific tasks. These algorithms are mostly trained one task at a time, each new task requiring a brand new agent instance to be trained. This means the learning algorithm is general, but each solution is not; each agent can only solve the one task it was trained on. In this work, we study the problem of learning to master not one but multiple sequential decision tasks at once. A general issue in multi-task learning is that a balance must be found between the needs of multiple tasks competing for the limited resources of a single learning system. Many learning algorithms can get distracted by certain tasks in the set of tasks to solve. Such tasks appear more salient to the learning process, for instance because of the density or magnitude of the in-task rewards. This causes the algorithm to focus on those salient tasks at the expense of generality. We propose to automatically adapt the contribution of each task to the agent’s updates, so that all tasks have a similar impact on the learning dynamics. This resulted in state-of-the-art performance on learning to play all games in a set of 57 diverse Atari games. Excitingly, our method learned a single trained policy, with a single set of weights, that exceeds median human performance. To our knowledge, this was the first time a single agent surpassed human-level performance on this multi-task domain. The same approach also demonstrated state-of-the-art performance on a set of 30 tasks in the 3D reinforcement learning platform DeepMind Lab.
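The abstract does not spell out how each task's contribution is adapted. The following scalar sketch shows the output-preserving target normalization generally associated with the name PopArt: running statistics of the targets are updated, and the final linear layer is rescaled so that unnormalized predictions do not change. The class name `PopArtScalar`, the step size `beta`, and the single-weight "layer" are assumptions made for illustration, not the authors' implementation.

```python
import math


class PopArtScalar:
    """Scalar sketch of target normalization that preserves outputs.

    Assumption: one weight and one bias stand in for the last linear layer,
    and beta is an illustrative step size for the running statistics.
    """

    def __init__(self, beta=0.1):
        self.beta = beta
        self.mu = 0.0  # running mean of targets
        self.nu = 1.0  # running second moment of targets
        self.w = 1.0   # weight of the normalized output layer
        self.b = 0.0   # bias of the normalized output layer

    @property
    def sigma(self):
        # Standard deviation from the running moments, floored for stability.
        return math.sqrt(max(self.nu - self.mu ** 2, 1e-8))

    def unnormalized(self, h):
        # Map the normalized prediction back to the scale of the targets.
        return self.sigma * (self.w * h + self.b) + self.mu

    def update_stats(self, target):
        old_mu, old_sigma = self.mu, self.sigma
        self.mu = (1 - self.beta) * self.mu + self.beta * target
        self.nu = (1 - self.beta) * self.nu + self.beta * target ** 2
        new_sigma = self.sigma
        # Rescale the layer so unnormalized outputs are exactly unchanged.
        self.w = self.w * old_sigma / new_sigma
        self.b = (old_sigma * self.b + old_mu - self.mu) / new_sigma


head = PopArtScalar()
before = head.unnormalized(2.0)
head.update_stats(100.0)  # a large-magnitude target shifts the statistics
after = head.unnormalized(2.0)
```

In a multi-task setting one would presumably keep separate statistics per task, so that tasks with large reward magnitudes do not dominate the shared network's updates.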

Skill-based curiosity for intrinsically motivated reinforcement learning

Machine Learning ◽

10.1007/s10994-019-05845-8 ◽

2019 ◽

Vol 109 (3) ◽

pp. 493-512 ◽

Cited By ~ 2

Author(s):

Nicolas Bougie ◽

Ryutaro Ichise

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Skill Learning ◽

High Dimensional ◽

Sequential Decision ◽

Learning Methods ◽

Reward Function ◽

Intrinsic Reward ◽

Reinforcement Learning Models ◽

Data Efficiency

Reinforcement learning methods rely on rewards provided by the environment that are extrinsic to the agent. However, many real-world scenarios involve sparse or delayed rewards. In such cases, the agent can develop its own intrinsic reward function, called curiosity, to enable the agent to explore its environment in the quest for new skills. We propose a novel end-to-end curiosity mechanism for deep reinforcement learning methods that allows an agent to gradually acquire new skills. Our method scales to high-dimensional problems, avoids the need for directly predicting the future, and can perform in sequential decision scenarios. We formulate the curiosity as the ability of the agent to predict its own knowledge about the task. We base the prediction on the idea of skill learning to incentivize the discovery of new skills, and guide exploration towards promising solutions. To further improve data efficiency and generalization of the agent, we propose to learn a latent representation of the skills. We present a variety of sparse reward tasks in MiniGrid, MuJoCo, and Atari games. We compare the performance of an augmented agent that uses our curiosity reward to state-of-the-art learners. Experimental evaluation exhibits higher performance compared to reinforcement learning models that only learn by maximizing extrinsic rewards.
