Hierarchical Reinforcement Learning
Recently Published Documents

TOTAL DOCUMENTS: 267 (FIVE YEARS: 126)
H-INDEX: 14 (FIVE YEARS: 5)

2021
Author(s): Daoming Lyu, Fangkai Yang, Hugh Kwon, Bo Liu, Wen Dong, ...

Human-robot interactive decision-making is becoming ubiquitous, and explainability is an influential factor in determining reliance on autonomy. However, it is not reasonable to trust systems beyond our comprehension, and typical machine learning and data-driven decision-making are black-box paradigms that impede explainability. It is therefore critical to establish computationally efficient decision-making mechanisms enhanced by explainability-aware strategies. To this end, we propose Trustworthy Decision-Making (TDM), an explainable neuro-symbolic approach that integrates symbolic planning into hierarchical reinforcement learning. The TDM framework enables subtask-level explainability through causally related and human-understandable subtasks. Moreover, TDM demonstrates the advantage of integrating symbolic planning with reinforcement learning, reaping the benefits of both worlds. Experimental results validate the effectiveness of the proposed method while improving explainability in the decision-making process.
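A minimal sketch of the kind of neuro-symbolic integration the abstract describes: a symbolic plan supplies an ordered, causally annotated list of subtasks (the source of subtask-level explainability), and a tabular Q-learning policy is learned per subtask. All names, subtasks, and annotations here are hypothetical illustrations, not the authors' implementation.

```python
import random

# Hypothetical symbolic plan: ordered subtasks with human-readable
# causal annotations, which is what makes decisions explainable.
PLAN = [
    {"subtask": "navigate_to_shelf", "because": "the item must be reachable"},
    {"subtask": "pick_item",         "because": "the item must be held before delivery"},
    {"subtask": "deliver_to_user",   "because": "the goal requires the item at the user"},
]

class SubtaskPolicy:
    """Tabular Q-learning policy for one symbolic subtask."""
    def __init__(self, actions, alpha=0.1, gamma=0.95, eps=0.1):
        self.q = {}
        self.actions, self.alpha, self.gamma, self.eps = actions, alpha, gamma, eps

    def act(self, state):
        # Epsilon-greedy action selection within the subtask.
        if random.random() < self.eps:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, s, a, r, s2):
        # Standard one-step Q-learning update.
        best_next = max(self.q.get((s2, a2), 0.0) for a2 in self.actions)
        old = self.q.get((s, a), 0.0)
        self.q[(s, a)] = old + self.alpha * (r + self.gamma * best_next - old)

def explain(step):
    # Each subtask decision is justified by its place in the symbolic plan.
    return f"Executing '{step['subtask']}' because {step['because']}."
```

In this reading, the planner fixes the subtask sequence and its rationale, while reinforcement learning only has to solve each small subtask, which is where the "benefits of both worlds" claim comes from.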


Author(s): Tulika Saha, Dhawal Gupta, Sriparna Saha, Pushpak Bhattacharyya

Building virtual agents capable of carrying out complex user queries involving multiple intents of a domain is quite a challenge, because it demands that the agent manage several subtasks simultaneously. This article presents a universal deep reinforcement learning framework that can synthesize dialogue managers for task-oriented dialogue systems encompassing various intents of a domain. The conversation between agent and user is broken down into hierarchies to segregate subtasks pertinent to different intents. The concept of hierarchical reinforcement learning, particularly options, is used to learn policies in different hierarchies that operate at distinct time scales to fulfill the user query successfully. The dialogue manager comprises a top-level intent meta-policy, which selects among subtasks or options, and a low-level controller policy, which picks primitive actions to communicate with the user and complete the subtask assigned by the top-level policy. The proposed dialogue management module is trained such that it can be reused for any language with little to no supervision. The developed system has been demonstrated for the "Air Travel" and "Restaurant" domains in English and Hindi. Empirical results demonstrate the robustness and efficacy of the learned dialogue policy, which outperforms several baselines and a state-of-the-art system.
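A minimal sketch of an options-style dialogue manager of the shape described above: a meta-policy chooses an intent (option), and that option's intra-option policy issues primitive dialogue acts until its termination condition fires. The intents, acts, and termination test are hypothetical stand-ins, not the paper's actual action set.

```python
import random

INTENTS = ["book_flight", "find_restaurant"]            # options (subtasks)
PRIMITIVE_ACTS = ["request_slot", "confirm", "inform", "close"]

class Option:
    """One intent: an intra-option policy plus a termination condition."""
    def __init__(self, intent):
        self.intent = intent
        self.q = {}                                      # (state, act) -> value

    def act(self, state, eps=0.1):
        if random.random() < eps:
            return random.choice(PRIMITIVE_ACTS)
        return max(PRIMITIVE_ACTS, key=lambda a: self.q.get((state, a), 0.0))

    def terminates(self, state):
        return state == "slots_filled"                   # subtask complete

class MetaPolicy:
    """Top-level policy that picks which intent (option) to pursue next."""
    def __init__(self):
        self.q = {}                                      # (state, intent) -> value

    def choose(self, state, eps=0.1):
        if random.random() < eps:
            return random.choice(INTENTS)
        return max(INTENTS, key=lambda i: self.q.get((state, i), 0.0))

def run_turn(meta, options, state, user_step):
    # The meta-policy commits to an option; the option then emits
    # primitive acts until it terminates, as in the options framework.
    intent = meta.choose(state)
    option = options[intent]
    while not option.terminates(state):
        act = option.act(state)
        state = user_step(state, act)                    # environment transition
    return state
```

Because the meta-policy and the controller exchange only abstract states, options, and acts, nothing in this control flow is language-specific, which is consistent with the reuse-across-languages claim.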


Sensors, 2021, Vol 21 (21), pp. 7053
Author(s): Motahareh Mobasheri, Yangwoo Kim, Woongsup Kim

With the number of Internet of Things (IoT) devices and the volume of network communications growing faster than available bandwidth, the resulting constraints must be overcome. Given the network complexity and the uncertainty of emergency-distribution parameters in smart environments, relying on predetermined rules is impractical. Reinforcement learning (RL), as a powerful machine learning approach, can handle such smart environments without a trainer or supervisor. Recently, we worked on bandwidth management in a smart environment with several fog fragments sharing limited bandwidth, where IoT devices may experience emergencies, uncertain in timing and sequence, that require additional bandwidth for higher-level communication. We introduced fog fragment cooperation using an RL approach under a predefined fixed-threshold constraint. In this study, we extend that approach by removing the fixed-threshold restriction through hierarchical reinforcement learning (HRL), completing the cooperation qualification. At the first level of the learning hierarchy, the best threshold level is learned over time; its results feed the second level, where the fog node learns which device can best help an emergency device by temporarily lending bandwidth. Although the adaptive threshold and the restriction of fog fragment cooperation make the learning procedure more difficult, the HRL approach increases the method's efficiency in both time and performance.
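A minimal two-level sketch of the hierarchy described above: the first level learns which cooperation threshold works best, and the second level, conditioned on that threshold, learns which device to borrow bandwidth from. The threshold grid, device names, and reward are hypothetical placeholders, not the paper's system model.

```python
import random

THRESHOLDS = [0.2, 0.4, 0.6, 0.8]       # candidate bandwidth thresholds (level 1)
DEVICES = ["dev_a", "dev_b", "dev_c"]   # devices that may lend bandwidth (level 2)

q_threshold = {t: 0.0 for t in THRESHOLDS}
q_lender = {(t, d): 0.0 for t in THRESHOLDS for d in DEVICES}

def pick(q, keys, eps=0.1):
    # Epsilon-greedy selection over a list of keys into a Q-table.
    if random.random() < eps:
        return random.choice(keys)
    return max(keys, key=lambda k: q[k])

def handle_emergency(simulate_lend, alpha=0.1):
    # Level 1: choose the threshold that gates fog fragment cooperation.
    t = pick(q_threshold, THRESHOLDS)
    # Level 2: given that threshold, choose the best lending device.
    d = pick(q_lender, [(t, x) for x in DEVICES])[1]
    # Hypothetical reward, e.g. throughput gained minus added delay.
    reward = simulate_lend(t, d)
    q_lender[(t, d)] += alpha * (reward - q_lender[(t, d)])
    q_threshold[t] += alpha * (reward - q_threshold[t])
    return t, d, reward
```

The point of the split is that the threshold choice changes slowly while device selection must react per emergency, so learning them at separate levels keeps both problems small.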


Author(s): Mitsuo Kawato, Aurelio Cortese

Abstract: In several papers published in Biological Cybernetics in the 1980s and 1990s, Kawato and colleagues proposed computational models explaining how internal models are acquired in the cerebellum. These models were later supported by neurophysiological experiments using monkeys and neuroimaging experiments involving humans. These early studies influenced neuroscience from basic sensory-motor control to higher cognitive functions. One of the most perplexing enigmas related to internal models is understanding the neural mechanisms that enable animals to learn large-dimensional problems with so few trials. Consciousness and metacognition, the ability to monitor one's own thoughts, may be part of the solution to this enigma. Based on literature reviews of the past 20 years, here we propose a computational neuroscience model of metacognition. The model comprises a modular hierarchical reinforcement-learning architecture of parallel and layered generative-inverse model pairs. In the prefrontal cortex, a distributed executive network called the "cognitive reality monitoring network" (CRMN) orchestrates conscious involvement of generative-inverse model pairs in perception and action. Based on mismatches between computations by generative and inverse models, as well as reward prediction errors, the CRMN computes a "responsibility signal" that gates the selection and learning of pairs in perception, action, and reinforcement learning. A high responsibility signal is given to the pairs that best capture the external world, that are competent in movements (small mismatch), and that are capable of reinforcement learning (small reward-prediction error). The CRMN selects pairs with higher responsibility signals as objects of metacognition, and consciousness is determined by the entropy of responsibility signals across all pairs. This model could lead to a new generation of AI that exhibits metacognition, consciousness, dimension reduction, selection of modules and corresponding representations, and learning from small samples. It may also lead to a new scientific paradigm that enables the causal study of consciousness by combining the CRMN and decoded neurofeedback.
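A minimal numeric sketch of the responsibility computation the abstract describes: each generative-inverse pair reports a prediction mismatch and a reward-prediction error, responsibilities are normalized so that pairs with small errors dominate, and the entropy of those responsibilities is the quantity the model ties to consciousness. The softmax form and all values are assumptions for illustration, not the authors' stated equations.

```python
import math

def responsibilities(mismatches, rpes, beta=1.0):
    """Softmax over negative errors: pairs with small generative-inverse
    mismatch and small reward-prediction error get high responsibility."""
    scores = [-beta * (m + r) for m, r in zip(mismatches, rpes)]
    z = max(scores)                              # shift for numerical stability
    exps = [math.exp(s - z) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Entropy of responsibility signals across all pairs; in the model,
    this quantity is proposed to determine the level of consciousness."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Example: pair 0 explains the world best, so it dominates and entropy is low.
r = responsibilities(mismatches=[0.1, 0.9, 1.2], rpes=[0.05, 0.4, 0.7])
print(r, entropy(r))
```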


2021, Vol 8 (10), pp. 1686-1696
Author(s): Chenghao Liu, Fei Zhu, Quan Liu, Yuchen Fu
