Novelty search for deep reinforcement learning policy network weights by action sequence edit metric distance

Author(s):  
Ethan C. Jackson ◽  
Mark Daley

Author(s):  
Heecheol Kim ◽  
Masanori Yamada ◽  
Kosuke Miyoshi ◽  
Tomoharu Iwata ◽  
Hiroshi Yamakawa

Author(s):  
Peng Zhang ◽  
Jianye Hao ◽  
Weixun Wang ◽  
Hongyao Tang ◽  
Yi Ma ◽  
...  

Reinforcement learning agents usually learn from scratch, which requires a large number of interactions with the environment. This is quite different from the learning process of humans. When faced with a new task, humans naturally draw on common sense and use prior knowledge to derive an initial policy and to guide the subsequent learning process. Although the prior knowledge may not be fully applicable to the new task, it significantly speeds up learning: the initial policy ensures a quick start, and intermediate guidance allows the agent to avoid unnecessary exploration. Taking this inspiration, we propose the knowledge-guided policy network (KoGuN), a novel framework that combines suboptimal human prior knowledge with reinforcement learning. Our framework consists of a fuzzy-rule controller to represent human knowledge and a refine module to fine-tune the suboptimal prior knowledge. The proposed framework is end-to-end and can be combined with existing policy-based reinforcement learning algorithms. We conduct experiments on several control tasks. The empirical results show that our approach, which combines suboptimal human knowledge and RL, significantly improves the learning efficiency of flat RL algorithms, even with very low-performance human prior knowledge.
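As an illustration of the KoGuN idea, the sketch below combines a hand-written fuzzy rule (the suboptimal prior) with a learnable refine term in the policy logits. The pole-balancing rule, the dimensions, and the combine-in-logit-space scheme are hypothetical stand-ins, not the paper's actual design:

```python
import numpy as np

def fuzzy_prior(state):
    """Hypothetical fuzzy rule for a pole-balancing task:
    if the pole leans right (angle > 0), prefer pushing right."""
    angle = state[0]
    membership_right = 1.0 / (1.0 + np.exp(-10.0 * angle))  # soft "leaning right"
    return np.array([1.0 - membership_right, membership_right])  # prior over {left, right}

class RefineModule:
    """Learnable residual that fine-tunes the suboptimal prior logits."""
    def __init__(self, state_dim, n_actions):
        self.W = np.zeros((n_actions, state_dim))  # updated by the RL algorithm

    def __call__(self, state):
        return self.W @ state

def kogun_policy(state, refine):
    """Prior knowledge plus learned correction, normalized with a softmax."""
    logits = np.log(fuzzy_prior(state) + 1e-8) + refine(state)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()

refine = RefineModule(state_dim=2, n_actions=2)
probs = kogun_policy(np.array([0.3, 0.0]), refine)  # pole leaning right
```

With the refine weights at zero, the policy reproduces the prior exactly; training then only has to learn a correction to it, which is what gives the quick start the abstract describes.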


2021 ◽  
Author(s):  
Shuzhen Luo ◽  
Ghaith Androwis ◽  
Sergei Adamovich ◽  
Erick Nunez ◽  
Hao Su ◽  
...  

Abstract Background: Few studies have systematically investigated robust controllers for lower limb rehabilitation exoskeletons (LLREs) that can safely and effectively assist users with a variety of neuromuscular disorders to walk with full autonomy. One of the key challenges in developing such a robust controller is handling different degrees of uncertain human-exoskeleton interaction forces from the patients. Consequently, conventional walking controllers are either patient-condition-specific or involve tuning many control parameters, and can behave unreliably or even fail to maintain balance. Methods: We present a novel and robust controller for an LLRE based on a decoupled deep reinforcement learning framework with three independent networks, which aims to provide reliable walking assistance against varied and uncertain human-exoskeleton interaction forces. The exoskeleton controller is driven by a neural network control policy that acts on a stream of the LLRE's proprioceptive signals, including joint kinematic states, and predicts real-time position control targets for the actuated joints. To handle uncertain human-interaction forces, the control policy is intentionally trained with an integrated human musculoskeletal model and realistic human-exoskeleton interaction forces. Two other neural networks are connected to the control policy network to predict the interaction forces and muscle coordination. To further increase the robustness of the control policy, we employ domain randomization during training that includes not only randomization of the exoskeleton's dynamic properties but, more importantly, randomization of human muscle strength to simulate the variability of the patient's disability. Through this decoupled deep reinforcement learning framework, the trained controller is able to provide reliable walking assistance to users with different degrees of neuromuscular disorders.
Results and Conclusion: A universal, RL-based walking controller is trained and virtually tested on an LLRE system to verify its effectiveness and robustness in assisting users with different disabilities such as passive muscles (quadriplegia), muscle weakness, or hemiplegic conditions. An ablation study demonstrates the strong robustness of the control policy under large ranges of exoskeleton dynamic properties and various human-exoskeleton interaction forces. The decoupled network structure allows us to isolate the LLRE control policy network for testing and sim-to-real transfer, since it uses only the LLRE's proprioceptive information (joint sensory state) as input. Furthermore, the controller is shown to handle different patient conditions without the need for patient-specific control parameter tuning.
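A minimal sketch of the decoupled structure described above, with single-layer stand-ins for the three networks and hypothetical dimensions (the actual architectures are not given in the abstract):

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(in_dim, out_dim):
    """Single-layer stand-in for each of the three networks."""
    W = rng.normal(0, 0.1, size=(out_dim, in_dim))
    return lambda x: np.tanh(W @ x)

n_joints = 4  # hypothetical actuated-joint count
policy_net = mlp(2 * n_joints, n_joints)  # proprioception -> position targets
force_net  = mlp(2 * n_joints, n_joints)  # auxiliary: interaction-force prediction
muscle_net = mlp(2 * n_joints, 8)         # auxiliary: muscle-coordination prediction

def randomized_episode():
    # Domain randomization: scale simulated muscle strength per episode,
    # from 0.0 (fully passive muscles) up to full strength.
    muscle_strength = rng.uniform(0.0, 1.0)
    proprio = rng.normal(size=2 * n_joints)  # joint positions + velocities
    targets = policy_net(proprio)            # only proprioception reaches the policy
    return muscle_strength, targets

strength, targets = randomized_episode()
```

The property this sketch preserves is the decoupling: only proprioception feeds the policy network, so it can be isolated for sim-to-real transfer, while the auxiliary networks and the randomized muscle strength exist only on the training side.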


Author(s):  
Chunyi Wu ◽  
Gaochao Xu ◽  
Yan Ding ◽  
Jia Zhao

Large-scale task processing based on cloud computing has become crucial to big data analysis and disposal in recent years. Most previous work generally utilizes conventional methods and architectures designed for general-scale tasks to process massive numbers of tasks, which is limited by issues of computing capability, data transmission, etc. Based on this argument, a fat-tree-structure-based approach called LTDR (Large-scale Tasks processing using Deep network model and Reinforcement learning) is proposed in this work. Aiming to explore the optimal task-allocation scheme, a virtual network mapping algorithm based on a deep convolutional neural network and [Formula: see text]-learning is presented herein. After feature extraction, we design and implement a policy network to make node-mapping decisions. The link-mapping scheme is obtained by the designed distributed value-function-based reinforcement learning model. Eventually, tasks are allocated onto proper physical nodes and processed efficiently. Experimental results show that LTDR can significantly improve the utilization of physical resources and the long-term revenue while satisfying task requirements in big data.
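The value-function side of LTDR can be illustrated with a small tabular sketch. The state/node counts and the reward are invented, and a tabular Q-learning update (a plausible reading of the abstract's value-function-based learning) stands in for the paper's distributed deep variant:

```python
import numpy as np

rng = np.random.default_rng(2)
n_states, n_nodes = 5, 3          # hypothetical: 5 task-feature states, 3 physical nodes
Q = np.zeros((n_states, n_nodes)) # value of mapping a task in state s onto node a
alpha, gamma = 0.5, 0.9

def reward(state, node):
    """Hypothetical long-term revenue signal: each task type has one
    best-suited physical node."""
    return 1.0 if node == state % n_nodes else 0.0

for _ in range(500):
    s = rng.integers(n_states)
    # epsilon-greedy mapping decision
    a = rng.integers(n_nodes) if rng.random() < 0.2 else int(Q[s].argmax())
    s_next = rng.integers(n_states)  # next arriving task, drawn at random
    # standard Q-learning update toward reward + discounted best next value
    Q[s, a] += alpha * (reward(s, a) + gamma * Q[s_next].max() - Q[s, a])
```

After training, `Q.argmax(axis=1)` gives a greedy mapping rule per task state; LTDR replaces both the table and the hand-written reward with learned networks over extracted features.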


Sensors ◽  
2021 ◽  
Vol 21 (24) ◽  
pp. 8331
Author(s):  
Thejus Pathmakumar ◽  
Mohan Rajesh Elara ◽  
Braulio Félix Gómez ◽  
Balakrishnan Ramalingam

Cleaning is one of the fundamental tasks of prime importance in our day-to-day life. Moreover, the importance of cleaning drives research efforts towards bringing leading-edge technologies, including robotics, into the cleaning domain. However, an effective method to assess the quality of cleaning is an equally important research problem to be addressed. The first step towards addressing the fundamental question of “How clean is clean?” is an autonomous cleaning-auditing robot that audits the cleanliness of a given area. This research work focuses on a novel reinforcement learning-based, experience-driven dirt exploration strategy for a cleaning-auditing robot. The proposed approach uses a proximal policy optimization (PPO)-based on-policy learning method to generate waypoints and sampling decisions to explore the probable dirt-accumulation regions in a given area. The policy network is trained in multiple environments with simulated dirt patterns. Experimental trials have been conducted to validate the trained policy in both simulated and real-world environments using an in-house-developed cleaning-audit robot called BELUGA.
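The core of PPO, the clipped surrogate objective, can be sketched directly. The batch values below are invented; a real implementation would compute the probabilities and advantages from rollouts of the waypoint policy:

```python
import numpy as np

def ppo_clip_loss(new_probs, old_probs, advantages, eps=0.2):
    """Clipped surrogate objective from PPO: limit how far the updated
    policy's probability ratio can move from the data-collecting policy."""
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # pessimistic bound; negated because optimizers minimize
    return -np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch: probability of each chosen waypoint under the new/old
# policy, with advantages estimated from discovered-dirt rewards.
old_p = np.array([0.25, 0.10, 0.40])
new_p = np.array([0.30, 0.05, 0.55])
adv   = np.array([ 1.0, -0.5,  2.0])
loss = ppo_clip_loss(new_p, old_p, adv)
```

The clipping is what makes PPO safe to run on-policy with multiple gradient steps per batch of waypoint rollouts.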


2021 ◽  
Vol 11 (18) ◽  
pp. 8419
Author(s):  
Jiang Zhao ◽  
Jiaming Sun ◽  
Zhihao Cai ◽  
Longhong Wang ◽  
Yingxun Wang

To achieve perception-based autonomous control of UAVs, schemes with onboard sensing and computing are popular in state-of-the-art work; these often consist of several separate modules, each with its own complicated algorithms. Most methods depend on handcrafted designs and prior models with little capacity for adaptation and generalization. Inspired by research on deep reinforcement learning, this paper proposes a new end-to-end autonomous control method that simplifies the separate modules of the traditional control pipeline into a single neural network. An image-based reinforcement learning framework is established, built on the design of the network architecture and the reward function. Training is performed with model-free algorithms developed for the specific mission, and the control policy network maps the input image directly to the continuous actuator control command. A simulation environment for a UAV landing scenario was built. The results under different typical cases, including both small and large initial lateral or heading-angle offsets, show that the proposed end-to-end method is feasible for perception-based autonomous control.
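A toy version of the end-to-end mapping: a single convolution plus a dense head with random weights and made-up sizes (the paper's architecture and training loop are not specified in the abstract):

```python
import numpy as np

rng = np.random.default_rng(3)

class EndToEndPolicy:
    """Maps a grayscale image directly to bounded continuous actuator
    commands; one conv layer + ReLU + dense head, sizes illustrative."""
    def __init__(self, img_size=8, n_actions=3):
        self.kernel = rng.normal(0, 0.1, size=(3, 3))
        feat_dim = (img_size - 2) ** 2  # valid convolution output, flattened
        self.W = rng.normal(0, 0.1, size=(n_actions, feat_dim))

    def __call__(self, img):
        h, w = img.shape
        # naive valid 3x3 convolution
        feat = np.array([[np.sum(img[i:i + 3, j:j + 3] * self.kernel)
                          for j in range(w - 2)] for i in range(h - 2)])
        # tanh keeps actuator commands in (-1, 1)
        return np.tanh(self.W @ np.maximum(feat, 0.0).ravel())

policy = EndToEndPolicy()
action = policy(rng.random((8, 8)))  # camera frame in, control command out
```

In the paper's setting the weights would be learned by a model-free RL algorithm against a landing reward, replacing the whole perception-planning-control pipeline with this one function.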


Author(s):  
Penghui Wei ◽  
Wenji Mao ◽  
Guandan Chen

Analyzing public attitudes plays an important role in opinion mining systems. Stance detection aims to determine from a text whether its author is in favor of, against, or neutral towards a given target. One challenge of this task is that a text may not explicitly express an attitude towards the target, yet existing approaches utilize target content alone to build models. Moreover, although weakly supervised approaches have been proposed to ease the burden of manually annotating large-scale training data, such approaches are confronted with the noisy labeling problem. To address these two issues, in this paper we propose a Topic-Aware Reinforced Model (TARM) for weakly supervised stance detection. Our model consists of two complementary components: (1) a detection network that incorporates target-related topic information into representation learning to identify stance effectively; (2) a policy network that learns to eliminate noisy instances from auto-labeled data via off-policy reinforcement learning. The two networks are alternately optimized to improve each other’s performance. Experimental results demonstrate that our proposed model TARM outperforms state-of-the-art approaches.
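The instance-elimination policy can be sketched as a per-instance keep/drop decision trained with a policy-gradient update. The feature vector, reward signal, and REINFORCE-style rule here are illustrative, not TARM's exact off-policy procedure:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class InstanceSelector:
    """Policy network deciding, per auto-labeled instance, whether to
    keep it for training (action 1) or drop it as noise (action 0)."""
    def __init__(self, feat_dim):
        self.w = np.zeros(feat_dim)

    def keep_prob(self, x):
        return sigmoid(self.w @ x)

    def update(self, x, action, reward, lr=0.1):
        # policy-gradient step: raise the log-probability of actions
        # that improved the downstream detection network
        p = self.keep_prob(x)
        grad = (action - p) * x
        self.w += lr * reward * grad

selector = InstanceSelector(feat_dim=3)
x = np.array([1.0, -0.5, 0.2])        # hypothetical instance features
# Hypothetical signal: keeping this instance improved detector accuracy
selector.update(x, action=1, reward=1.0)
p_after = selector.keep_prob(x)
```

Alternating this update with retraining the detection network, and scoring rewards by the detector's validation performance, gives the mutual-improvement loop the abstract describes.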


2021 ◽  
Vol 11 (7) ◽  
pp. 2977
Author(s):  
Kyu Tae Park ◽  
Yoo Ho Son ◽  
Sang Wook Ko ◽  
Sang Do Noh

To achieve efficient personalized production at an affordable cost, a modular manufacturing system (MMS) can be utilized. An MMS enables restructuring of its configuration to accommodate product changes and is thus an efficient solution for reducing the costs involved in personalized production. A micro smart factory (MSF) is an MMS with heterogeneous production processes that enables personalized production. Like an MMS, an MSF allows its production configuration to be restructured; additionally, it comprises cyber-physical production systems (CPPSs) that help achieve resilience. However, MSFs need to overcome performance hurdles with respect to production control. Therefore, this paper proposes a digital twin (DT)- and reinforcement learning (RL)-based production control method. This method replaces the existing dispatching rule in the type and instance phases of the MSF. In this method, the RL policy network is learned and evaluated through coordination between the DT and RL. The DT provides virtual event logs that include states, actions, and rewards to support learning. These virtual event logs are returned based on vertical integration with the MSF. As a result, the proposed method provides a resilient solution within the CPPS architectural framework and achieves appropriate actions for dynamic situations in the MSF. Additionally, applying the DT with RL helps decide what-next/where-next in the production cycle. Moreover, the proposed concept can be extended to various manufacturing domains because the priority-rule concept is frequently applied.
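The DT-RL coordination can be sketched as a loop in which a (highly simplified, hypothetical) digital twin returns virtual event logs of states, actions, and rewards for a candidate dispatching policy:

```python
def digital_twin_step(state, action):
    """Hypothetical digital twin of the shop floor: the state is a queue of
    job processing times, the action picks which job to dispatch next, and
    the reward is the negative processing time (a stand-in for waiting cost)."""
    queue = state
    job = action if action < len(queue) else 0
    reward = -queue[job]
    next_state = queue[:job] + queue[job + 1:]
    return next_state, reward

def collect_virtual_log(policy, init_state):
    """Roll the policy out inside the twin to build the virtual event log
    of (state, action, reward) tuples that the RL side learns from."""
    log, state = [], init_state
    while state:
        action = policy(state)
        next_state, reward = digital_twin_step(state, action)
        log.append((tuple(state), action, reward))
        state = next_state
    return log

# A classic dispatching rule as the policy under evaluation:
spt = lambda q: q.index(min(q))  # shortest-processing-time first
log = collect_virtual_log(spt, [5, 2, 8])
```

In the proposed method, an RL policy network would take the place of the fixed `spt` rule, and the logged tuples, returned by the vertically integrated DT, would be its training data.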

