Self-Supervised Mixture-of-Experts by Uncertainty Estimation

Author(s):  
Zhuobin Zheng ◽  
Chun Yuan ◽  
Xinrui Zhu ◽  
Zhihui Lin ◽  
Yangyang Cheng ◽  
...  

Learning related tasks in various domains and transferring the exploited knowledge to new situations is a significant challenge in Reinforcement Learning (RL). However, most RL algorithms are data-inefficient and fail to generalize in complex environments, limiting their adaptability and applicability in multi-task scenarios. In this paper, we propose Self-Supervised Mixture-of-Experts (SUM), an effective algorithm driven by predictive uncertainty estimation for multi-task RL. SUM utilizes a multi-head agent with shared parameters as experts to learn a series of related tasks simultaneously by Deep Deterministic Policy Gradient (DDPG). Each expert is extended with predictive uncertainty estimation on known and unknown states to enhance its Q-value evaluation capacity against overfitting and to improve overall generalization. This enables the agent to capture and diffuse common knowledge across different tasks, improving both the sample efficiency of each task and the effectiveness of expert scheduling across multiple tasks. Instead of the task-specific designs of common MoEs, a self-supervised gating network is adopted to determine a suitable expert to handle each interaction from unseen environments; it is calibrated entirely by the uncertainty feedback from the experts, without explicit supervision. To alleviate imbalanced expert utilization, the crux of MoE, optimization is accomplished via decayed-masked experience replay, which encourages both diversification and specialization of experts during different training periods. We demonstrate that our approach learns faster and achieves better performance through efficient transfer and robust generalization, outperforming several related methods on extended OpenAI Gym MuJoCo multi-task environments.
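As a rough illustration of the uncertainty-driven gating idea above, the sketch below routes each state-action pair to the expert head whose predictive uncertainty is lowest, using MC-dropout as a stand-in uncertainty estimator; the network sizes, dropout-based uncertainty, and all names are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch (not the authors' code): a shared trunk with one Q-head per
# expert, and a gating rule that picks the most certain expert.
import torch
import torch.nn as nn

class MultiHeadCritic(nn.Module):
    def __init__(self, state_dim, action_dim, n_experts=4, hidden=256):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim + action_dim, hidden), nn.ReLU())
        # One Q-head per expert, sharing the trunk parameters.
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Dropout(0.1), nn.Linear(hidden, 1)) for _ in range(n_experts)]
        )

    def forward(self, state, action):
        h = self.trunk(torch.cat([state, action], dim=-1))
        return torch.cat([head(h) for head in self.heads], dim=-1)  # (batch, n_experts)

def uncertainty_per_expert(critic, state, action, n_samples=10):
    """MC-dropout proxy for each expert's predictive uncertainty."""
    critic.train()  # keep dropout active during sampling
    with torch.no_grad():
        qs = torch.stack([critic(state, action) for _ in range(n_samples)])  # (S, B, E)
    return qs.std(dim=0)  # (batch, n_experts)

def select_expert(critic, state, action):
    """Self-supervised gating signal: route each interaction to the most certain expert."""
    return uncertainty_per_expert(critic, state, action).argmin(dim=-1)
```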

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Chaohai Kang ◽  
Chuiting Rong ◽  
Weijian Ren ◽  
Fengcai Huo ◽  
Pengyun Liu

2021 ◽  
Vol 6 (2) ◽  
pp. 951-957
Author(s):  
Ze Yang Ding ◽  
Junn Yong Loo ◽  
Vishnu Monn Baskaran ◽  
Surya Girinatha Nurzaman ◽  
Chee Pin Tan

Author(s):  
Julissa Villanueva Llerena

Tractable Deep Probabilistic Models (TPMs) are generative models based on arithmetic circuits that allow for exact marginal inference in linear time. These models have obtained promising results in several machine learning tasks. Like many other models, TPMs can produce over-confident, incorrect inferences, especially in regions with little statistical support. In this work, we will develop efficient estimators of predictive uncertainty that are robust to data scarcity and outliers. We investigate two approaches. The first measures the variability of the output under perturbations of the model weights. The second captures the variability of the prediction under changes in the model architecture. We will evaluate both approaches on challenging tasks such as image completion and multi-label classification.
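The first approach can be illustrated with a small sketch that jitters the model weights and measures the spread of the resulting predictions; the generic predict(weights, x) stand-in, the noise scale, and the sample count below are assumptions for illustration, not the authors' circuit code.

```python
# Hedged sketch: output variability under weight perturbation as an
# uncertainty estimate.
import numpy as np

def perturbation_uncertainty(predict, weights, x, noise_scale=0.05, n_samples=30, seed=0):
    """Std. dev. of the model output when the weights are jittered."""
    rng = np.random.default_rng(seed)
    outputs = []
    for _ in range(n_samples):
        noisy = [w + noise_scale * rng.standard_normal(np.shape(w)) for w in weights]
        outputs.append(predict(noisy, x))
    return float(np.std(outputs))

# Toy usage: a "circuit" that is just a weighted sum of two features.
weights = [np.array([0.4, 0.6])]
predict = lambda w, x: float(w[0] @ x)
print(perturbation_uncertainty(predict, weights, np.array([1.0, 2.0])))
```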


2019 ◽  
Vol 24 (6) ◽  
pp. 4307-4322 ◽  
Author(s):  
Sergio Hernández ◽  
Diego Vergara ◽  
Matías Valdenegro-Toro ◽  
Felipe Jorquera

Information ◽  
2020 ◽  
Vol 11 (2) ◽  
pp. 77 ◽  
Author(s):  
Juan Chen ◽  
Zhengxuan Xue ◽  
Daiqian Fan

In order to reduce the vehicle delay caused by stops at signalized intersections, this paper designs a micro-control method for a left-turning connected and automated vehicle (CAV) based on an improved deep deterministic policy gradient (DDPG). The method controls the whole process of a left-turning vehicle approaching, entering, and leaving a signalized intersection. In addition, to address the low sampling efficiency and the overestimation of the critic network in the DDPG algorithm, a positive-and-negative-reward experience replay buffer sampling mechanism and a multi-critic network structure are adopted. Finally, the effectiveness of the signal control method, six DDPG-based methods (DDPG, PNRERB-1C-DDPG, PNRERB-3C-DDPG, PNRERB-5C-DDPG, PNRERB-5CNG-DDPG, and PNRERB-7C-DDPG), and four DQN-based methods (DQN, Dueling DQN, Double DQN, and Prioritized Replay DQN) is verified at 0.2, 0.5, and 0.7 saturation degrees of left-turning vehicles at a signalized intersection in a VISSIM simulation environment. The results show that, compared with the traditional signal control method, the proposed deep reinforcement learning method achieves number-of-stops benefits ranging from 5% to 94%, stop-time benefits ranging from 1% to 99%, and delay benefits ranging from −17% to 93%.
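A minimal sketch of the positive-and-negative-reward replay idea follows: transitions are split by reward sign into two buffers and minibatches are drawn from both, so that rare positive experiences are not drowned out. The even 50/50 split and the buffer capacity are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of a positive/negative reward experience replay buffer
# (PNRERB-style sampling).
import random
from collections import deque

class PosNegReplayBuffer:
    def __init__(self, capacity=100_000):
        self.pos = deque(maxlen=capacity)  # transitions with reward >= 0
        self.neg = deque(maxlen=capacity)  # transitions with reward < 0

    def add(self, state, action, reward, next_state, done):
        buf = self.pos if reward >= 0 else self.neg
        buf.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Draw half the batch from each buffer so sparse positive experiences
        # are sampled as often as frequent negative ones.
        n_pos = min(batch_size // 2, len(self.pos))
        n_neg = min(batch_size - n_pos, len(self.neg))
        return random.sample(list(self.pos), n_pos) + random.sample(list(self.neg), n_neg)
```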


2020 ◽  
Vol 12 (22) ◽  
pp. 3789
Author(s):  
Bo Li ◽  
Zhigang Gan ◽  
Daqing Chen ◽  
Dyachenko Sergey Aleksandrovich

This paper combines deep reinforcement learning (DRL) with meta-learning and proposes a novel approach, named meta twin delayed deep deterministic policy gradient (Meta-TD3), to realize the control of an unmanned aerial vehicle (UAV), allowing the UAV to quickly track a target whose motion is uncertain. This approach can be applied to a variety of scenarios, such as wildlife protection, emergency aid, and remote sensing. We use a multi-task experience replay buffer to provide data for the multi-task learning of the DRL algorithm, and we combine it with meta-learning to develop a multi-task reinforcement learning update method that ensures the generalization capability of reinforcement learning. Experimental results show that, compared with the state-of-the-art algorithms deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3), the Meta-TD3 algorithm achieves a great improvement in both convergence value and convergence rate. In a UAV target-tracking problem, Meta-TD3 requires only a few training steps to enable the UAV to adapt quickly to a new target movement mode and maintain better tracking effectiveness.
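The multi-task replay buffer combined with a meta-style outer update can be sketched as below; the per-task buffers and the Reptile-style step standing in for the meta-update are assumptions for illustration, not the paper's exact procedure.

```python
# Minimal sketch: one replay buffer per task, plus a Reptile-style outer step
# that pulls shared parameters toward task-adapted parameters.
import copy
import random
from collections import defaultdict, deque

class MultiTaskReplayBuffer:
    def __init__(self, capacity=50_000):
        self.buffers = defaultdict(lambda: deque(maxlen=capacity))

    def add(self, task_id, transition):
        self.buffers[task_id].append(transition)

    def sample(self, task_id, batch_size):
        buf = list(self.buffers[task_id])
        return random.sample(buf, min(batch_size, len(buf)))

def meta_update(shared_params, task_params, meta_lr=0.1):
    """Reptile-style outer step: move shared params toward task-adapted params."""
    new_params = copy.deepcopy(shared_params)
    for name in new_params:
        new_params[name] = shared_params[name] + meta_lr * (task_params[name] - shared_params[name])
    return new_params
```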


Water ◽  
2018 ◽  
Vol 10 (4) ◽  
pp. 475 ◽  
Author(s):  
Amos Anele ◽  
Ezio Todini ◽  
Yskandar Hamam ◽  
Adnan Abu-Mahfouz

Symmetry ◽  
2019 ◽  
Vol 11 (11) ◽  
pp. 1352 ◽  
Author(s):  
Kim ◽  
Park

In deep reinforcement learning (RL), exploration is highly significant for achieving better generalization. In benchmark studies, ε-greedy random actions have been used to encourage exploration and prevent over-fitting, thereby improving generalization. Deep RL with random ε-greedy policies, such as deep Q-networks (DQNs), can demonstrate efficient exploration behavior. A random ε-greedy policy exploits additional replay buffers in environments with sparse, binary rewards, such as the real-time online detection of network security by verifying whether the network is “normal or anomalous.” Prior studies have shown that prioritized replay memory based on the temporal-difference error provides superior theoretical results. However, other implementations have shown that, in certain environments, prioritized replay memory is not superior to the randomly selected buffers of a random ε-greedy policy. Moreover, a key challenge of hindsight experience replay, namely its use of additional buffers corresponding to each different goal, inspires our objective. Therefore, we attempt to exploit multiple random ε-greedy buffers to enhance exploration toward near-perfect generalization with one original goal in off-policy RL. We demonstrate the benefit of off-policy learning with our method through an experimental comparison of DQN and a deep deterministic policy gradient in terms of discrete action, as well as continuous control, for complete symmetric environments.
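A rough sketch of maintaining several ε-greedy replay buffers and mixing minibatches across them is shown below; the particular ε values and the even mixing ratio are illustrative assumptions, not the paper's settings.

```python
# Hedged sketch: multiple epsilon-greedy replay buffers, each fed by a policy
# with its own exploration rate, with minibatches mixed across buffers.
import random
from collections import deque

class MultiEpsilonBuffers:
    def __init__(self, epsilons=(0.05, 0.1, 0.3), capacity=50_000):
        self.epsilons = epsilons
        self.buffers = [deque(maxlen=capacity) for _ in epsilons]

    def act(self, buffer_idx, greedy_action, n_actions):
        # Epsilon-greedy action for the policy that feeds this buffer.
        if random.random() < self.epsilons[buffer_idx]:
            return random.randrange(n_actions)
        return greedy_action

    def add(self, buffer_idx, transition):
        self.buffers[buffer_idx].append(transition)

    def sample(self, batch_size):
        # Mix the minibatch evenly across all buffers.
        per_buf = max(1, batch_size // len(self.buffers))
        batch = []
        for buf in self.buffers:
            batch += random.sample(list(buf), min(per_buf, len(buf)))
        return batch
```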


2021 ◽  
Vol 285 ◽  
pp. 116386
Author(s):  
Jiawen Li ◽  
Tao Yu ◽  
Xiaoshun Zhang ◽  
Fusheng Li ◽  
Dan Lin ◽  
...  

Water ◽  
2016 ◽  
Vol 8 (10) ◽  
pp. 463 ◽  
Author(s):  
Silvia Barbetta ◽  
Gabriele Coccia ◽  
Tommaso Moramarco ◽  
Ezio Todini
