Novelty search for deep reinforcement learning policy network weights by action sequence edit metric distance

Author(s):  
Ethan C. Jackson ◽  
Mark Daley

Author(s):  
Heecheol Kim ◽  
Masanori Yamada ◽  
Kosuke Miyoshi ◽  
Tomoharu Iwata ◽  
Hiroshi Yamakawa

Author(s):  
Peng Zhang ◽  
Jianye Hao ◽  
Weixun Wang ◽  
Hongyao Tang ◽  
Yi Ma ◽  
...  

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the subsequent learning process. Although the prior knowledge may not be fully applicable to the new task, it significantly speeds up learning: the initial policy ensures a quick start, and intermediate guidance allows the agent to avoid unnecessary exploration. Taking this inspiration, we propose the knowledge-guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy-rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
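As an illustration of the KoGuN idea, the sketch below combines a hand-written fuzzy rule (the suboptimal prior) with a learnable refine term in the policy logits. The pole-balancing rule, the dimensions, and the combine-in-logit-space scheme are hypothetical stand-ins, not the paper's actual design:

```python
import numpy as np

def fuzzy_prior(state):
    """Hypothetical fuzzy rule for a pole-balancing task:
    if the pole leans right (angle > 0), prefer pushing right."""
    angle = state[0]
    membership_right = 1.0 / (1.0 + np.exp(-10.0 * angle))  # soft "leaning right"
    return np.array([1.0 - membership_right, membership_right])  # prior over {left, right}

class RefineModule:
    """Learnable residual that fine-tunes the suboptimal prior logits."""
    def __init__(self, state_dim, n_actions):
        self.W = np.zeros((n_actions, state_dim))  # updated by the RL algorithm

    def __call__(self, state):
        return self.W @ state

def kogun_policy(state, refine):
    """Prior knowledge plus learned correction, normalized with a softmax."""
    logits = np.log(fuzzy_prior(state) + 1e-8) + refine(state)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

refine = RefineModule(state_dim=2, n_actions=2)
probs = kogun_policy(np.array([0.3, 0.0]), refine)  # pole leaning right
```

With the refine weights at zero, the policy reproduces the prior exactly; training then only has to learn a correction to it, which is what gives the quick start the abstract describes.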


2021 ◽  
Author(s):  
Shuzhen Luo ◽  
Ghaith Androwis ◽  
Sergei Adamovich ◽  
Erick Nunez ◽  
Hao Su ◽  
...  

Abstract Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges in developing such a robust controller is handling different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers are either patient-condition-specific or involve tuning many control parameters, and can behave unreliably or even fail to maintain balance. Methods: We present a novel and robust controller for an LLRE based on a decoupled deep reinforcement learning framework with three independent networks, which aims to provide reliable walking assistance against varied and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE's proprioceptive signals, including joint kinematic states, and predicts real-time position control targets for the actuated joints. To handle uncertain human-interaction forces, the control policy is intentionally trained with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected to the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy, we employ domain randomization during training that includes not only randomization of the exoskeleton's dynamic properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient's disability. Through this decoupled deep reinforcement learning framework, the trained controller is able to provide reliable walking assistance to users with different degrees of neuromuscular disorders.
Results and Conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegia), muscle weakness, or hemiplegic conditions. An ablation study demonstrates the strong robustness of the control policy under large ranges of exoskeleton dynamic properties and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer, since it uses only the LLRE's proprioceptive information (joint sensory state) as input. Furthermore, the controller is shown to handle different patient conditions without the need for patient-specific control parameter tuning.
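A minimal sketch of the decoupled structure described above, with single-layer stand-ins for the three networks and hypothetical dimensions (the actual architectures are not given in the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(in_dim, out_dim):
    """Single-layer stand-in for each of the three networks."""
    W = rng.normal(0, 0.1, size=(out_dim, in_dim))
    return lambda x: np.tanh(W @ x)

n_joints = 4  # hypothetical actuated-joint count
policy_net = mlp(2 * n_joints, n_joints)  # proprioception -> position targets
force_net  = mlp(2 * n_joints, n_joints)  # auxiliary: interaction-force prediction
muscle_net = mlp(2 * n_joints, 8)         # auxiliary: muscle-coordination prediction

def randomized_episode():
    # Domain randomization: scale simulated muscle strength per episode,
    # from 0.0 (fully passive muscles) up to full strength.
    muscle_strength = rng.uniform(0.0, 1.0)
    proprio = rng.normal(size=2 * n_joints)  # joint positions + velocities
    targets = policy_net(proprio)            # only proprioception reaches the policy
    return muscle_strength, targets

strength, targets = randomized_episode()
```

The property this sketch preserves is the decoupling: only proprioception feeds the policy network, so it can be isolated for sim-to-real transfer, while the auxiliary networks and the randomized muscle strength exist only on the training side.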


Author(s):  
Chunyi Wu ◽  
Gaochao Xu ◽  
Yan Ding ◽  
Jia Zhao

Large-scale task processing based on cloud computing has become crucial to big data analysis and disposal in recent years. Most previous work generally utilizes conventional methods and architectures designed for general-scale tasks to process massive numbers of tasks, which is limited by issues of computing capability, data transmission, etc. Based on this argument, a fat-tree-structure-based approach called LTDR (Large-scale Tasks processing using Deep network model and Reinforcement learning) is proposed in this work. Aiming to explore the optimal task-allocation scheme, a virtual network mapping algorithm based on a deep convolutional neural network and [Formula: see text]-learning is presented herein. After feature extraction, we design and implement a policy network to make node-mapping decisions. The link-mapping scheme is obtained by the designed distributed value-function-based reinforcement learning model. Eventually, tasks are allocated onto proper physical nodes and processed efficiently. Experimental results show that LTDR can significantly improve the utilization of physical resources and the long-term revenue while satisfying task requirements in big data.
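The value-function side of LTDR can be illustrated with a small tabular sketch. The state/node counts and the reward are invented, and a tabular Q-learning update (a plausible reading of the abstract's value-function-based learning) stands in for the paper's distributed deep variant:

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_nodes = 5, 3          # hypothetical: 5 task-feature states, 3 physical nodes
Q = np.zeros((n_states, n_nodes)) # value of mapping a task in state s onto node a
alpha, gamma = 0.5, 0.9

def reward(state, node):
    """Hypothetical long-term revenue signal: each task type has one
    best-suited physical node."""
    return 1.0 if node == state % n_nodes else 0.0

for _ in range(500):
    s = rng.integers(n_states)
    # epsilon-greedy mapping decision
    a = rng.integers(n_nodes) if rng.random() < 0.2 else int(Q[s].argmax())
    s_next = rng.integers(n_states)  # next arriving task, drawn at random
    # standard Q-learning update toward reward + discounted best next value
    Q[s, a] += alpha * (reward(s, a) + gamma * Q[s_next].max() - Q[s, a])
```

After training, `Q.argmax(axis=1)` gives a greedy mapping rule per task state; LTDR replaces both the table and the hand-written reward with learned networks over extracted features.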


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8331
Author(s):  
Thejus Pathmakumar ◽  
Mohan Rajesh Elara ◽  
Braulio Félix Gómez ◽  
Balakrishnan Ramalingam

Cleaning is one of the fundamental tasks of prime importance in our day-to-day life. Moreover, the importance of cleaning drives research efforts towards bringing leading-edge technologies, including robotics, into the cleaning domain. However, an effective method to assess the quality of cleaning is an equally important research problem to be addressed. The first step towards addressing the fundamental question of “How clean is clean?” is an autonomous cleaning-auditing robot that audits the cleanliness of a given area. This research work focuses on a novel reinforcement learning-based, experience-driven dirt exploration strategy for a cleaning-auditing robot. The proposed approach uses a proximal policy optimization (PPO)-based on-policy learning method to generate waypoints and sampling decisions to explore the probable dirt-accumulation regions in a given area. The policy network is trained in multiple environments with simulated dirt patterns. Experimental trials have been conducted to validate the trained policy in both simulated and real-world environments using an in-house-developed cleaning-audit robot called BELUGA.
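The core of PPO, the clipped surrogate objective, can be sketched directly. The batch values below are invented; a real implementation would compute the probabilities and advantages from rollouts of the waypoint policy:

```python
import numpy as np

def ppo_clip_loss(new_probs, old_probs, advantages, eps=0.2):
    """Clipped surrogate objective from PPO: limit how far the updated
    policy's probability ratio can move from the data-collecting policy."""
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # pessimistic bound; negated because optimizers minimize
    return -np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch: probability of each chosen waypoint under the new/old
# policy, with advantages estimated from discovered-dirt rewards.
old_p = np.array([0.25, 0.10, 0.40])
new_p = np.array([0.30, 0.05, 0.55])
adv   = np.array([ 1.0, -0.5,  2.0])
loss = ppo_clip_loss(new_p, old_p, adv)
```

The clipping is what makes PPO safe to run on-policy with multiple gradient steps per batch of waypoint rollouts.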


2021 ◽  
Vol 11 (18) ◽  
pp. 8419
Author(s):  
Jiang Zhao ◽  
Jiaming Sun ◽  
Zhihao Cai ◽  
Longhong Wang ◽  
Yingxun Wang

To achieve perception-based autonomous control of UAVs, schemes with onboard sensing and computing are popular in state-of-the-art work; these often consist of several separate modules, each with its own complicated algorithms. Most methods depend on handcrafted designs and prior models with little capacity for adaptation and generalization. Inspired by research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method that simplifies the separate modules of the traditional control pipeline into a single neural network. An image-based reinforcement learning framework is established, built on the design of the network architecture and the reward function. Training is performed with model-free algorithms developed for the specific mission, and the control policy network maps the input image directly to the continuous actuator control command. A simulation environment for a UAV landing scenario was built. The results under different typical cases, including both small and large initial lateral or heading-angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.
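A toy version of the end-to-end mapping: a single convolution plus a dense head with random weights and made-up sizes (the paper's architecture and training loop are not specified in the abstract):

```python
import numpy as np

rng = np.random.default_rng(3)

class EndToEndPolicy:
    """Maps a grayscale image directly to bounded continuous actuator
    commands; one conv layer + ReLU + dense head, sizes illustrative."""
    def __init__(self, img_size=8, n_actions=3):
        self.kernel = rng.normal(0, 0.1, size=(3, 3))
        feat_dim = (img_size - 2) ** 2  # valid convolution output, flattened
        self.W = rng.normal(0, 0.1, size=(n_actions, feat_dim))

    def __call__(self, img):
        h, w = img.shape
        # naive valid 3x3 convolution
        feat = np.array([[np.sum(img[i:i + 3, j:j + 3] * self.kernel)
                          for j in range(w - 2)] for i in range(h - 2)])
        # tanh keeps actuator commands in (-1, 1)
        return np.tanh(self.W @ np.maximum(feat, 0.0).ravel())

policy = EndToEndPolicy()
action = policy(rng.random((8, 8)))  # camera frame in, control command out
```

In the paper's setting the weights would be learned by a model-free RL algorithm against a landing reward, replacing the whole perception-planning-control pipeline with this one function.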


Author(s):  
Penghui Wei ◽  
Wenji Mao ◽  
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, yet existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address these two issues, in this paper we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning to identify stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data via off-policy reinforcement learning. The two networks are alternately optimized to improve each other’s performance. Experimental results demonstrate that our proposed model TARM outperforms state-of-the-art approaches.
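The instance-elimination policy can be sketched as a per-instance keep/drop decision trained with a policy-gradient update. The feature vector, reward signal, and REINFORCE-style rule here are illustrative, not TARM's exact off-policy procedure:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class InstanceSelector:
    """Policy network deciding, per auto-labeled instance, whether to
    keep it for training (action 1) or drop it as noise (action 0)."""
    def __init__(self, feat_dim):
        self.w = np.zeros(feat_dim)

    def keep_prob(self, x):
        return sigmoid(self.w @ x)

    def update(self, x, action, reward, lr=0.1):
        # policy-gradient step: raise the log-probability of actions
        # that improved the downstream detection network
        p = self.keep_prob(x)
        grad = (action - p) * x
        self.w += lr * reward * grad

selector = InstanceSelector(feat_dim=3)
x = np.array([1.0, -0.5, 0.2])        # hypothetical instance features
# Hypothetical signal: keeping this instance improved detector accuracy
selector.update(x, action=1, reward=1.0)
p_after = selector.keep_prob(x)
```

Alternating this update with retraining the detection network, and scoring rewards by the detector's validation performance, gives the mutual-improvement loop the abstract describes.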


2021 ◽  
Vol 11 (7) ◽  
pp. 2977
Author(s):  
Kyu Tae Park ◽  
Yoo Ho Son ◽  
Sang Wook Ko ◽  
Sang Do Noh

To achieve efficient personalized production at an affordable cost, a modular manufacturing system (MMS) can be utilized. An MMS enables restructuring of its configuration to accommodate product changes and is thus an efficient solution for reducing the costs involved in personalized production. A micro smart factory (MSF) is an MMS with heterogeneous production processes that enables personalized production. Like an MMS, an MSF allows its production configuration to be restructured; additionally, it comprises cyber-physical production systems (CPPSs) that help achieve resilience. However, MSFs need to overcome performance hurdles with respect to production control. Therefore, this paper proposes a digital twin (DT)- and reinforcement learning (RL)-based production control method. This method replaces the existing dispatching rule in the type and instance phases of the MSF. In this method, the RL policy network is learned and evaluated through coordination between the DT and RL. The DT provides virtual event logs that include states, actions, and rewards to support learning. These virtual event logs are returned based on vertical integration with the MSF. As a result, the proposed method provides a resilient solution within the CPPS architectural framework and achieves appropriate actions for dynamic situations in the MSF. Additionally, applying the DT with RL helps decide what-next/where-next in the production cycle. Moreover, the proposed concept can be extended to various manufacturing domains because the priority-rule concept is frequently applied.
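The DT-RL coordination can be sketched as a loop in which a (highly simplified, hypothetical) digital twin returns virtual event logs of states, actions, and rewards for a candidate dispatching policy:

```python
def digital_twin_step(state, action):
    """Hypothetical digital twin of the shop floor: the state is a queue of
    job processing times, the action picks which job to dispatch next, and
    the reward is the negative processing time (a stand-in for waiting cost)."""
    queue = state
    job = action if action < len(queue) else 0
    reward = -queue[job]
    next_state = queue[:job] + queue[job + 1:]
    return next_state, reward

def collect_virtual_log(policy, init_state):
    """Roll the policy out inside the twin to build the virtual event log
    of (state, action, reward) tuples that the RL side learns from."""
    log, state = [], init_state
    while state:
        action = policy(state)
        next_state, reward = digital_twin_step(state, action)
        log.append((tuple(state), action, reward))
        state = next_state
    return log

# A classic dispatching rule as the policy under evaluation:
spt = lambda q: q.index(min(q))  # shortest-processing-time first
log = collect_virtual_log(spt, [5, 2, 8])
```

In the proposed method, an RL policy network would take the place of the fixed `spt` rule, and the logged tuples, returned by the vertically integrated DT, would be its training data.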

