Continuous reinforcement learning to adapt multi-objective optimization online for robot motion

This article introduces a continuous reinforcement learning framework to enable online adaptation of multi-objective optimization functions for guiding a mobile robot to move in changing dynamic environments. The robot with this framework can continuously learn from multiple or changing environments where it encounters different numbers of obstacles moving in unknown ways at different times. Using both planned trajectories from a real-time motion planner and already executed trajectories as feedback observations, our reinforcement learning agent enables the robot to adapt motion behaviors to environmental changes. The agent contains a Q network connected to a long short-term memory network. The proposed framework is tested in both simulations and real robot experiments over various, dynamically varied task environments. The results show the efficacy of online continuous reinforcement learning for quick adaption to different, unknown, and dynamic environments.

Download Full-text

Contracts for Difference: A Reinforcement Learning Approach

Journal of Risk and Financial Management ◽

10.3390/jrfm13040078 ◽

2020 ◽

Vol 13 (4) ◽

pp. 78

Author(s):

Nico Zengeler ◽

Uwe Handmann

Keyword(s):

Reinforcement Learning ◽

Short Term Memory ◽

Learning Agents ◽

Learning Framework ◽

Learning Agent ◽

Markov Decision ◽

Economic Trends ◽

Model Size ◽

Contracts For Difference ◽

Partially Observable

We present a deep reinforcement learning framework for an automatic trading of contracts for difference (CfD) on indices at a high frequency. Our contribution proves that reinforcement learning agents with recurrent long short-term memory (LSTM) networks can learn from recent market history and outperform the market. Usually, these approaches depend on a low latency. In a real-world example, we show that an increased model size may compensate for a higher latency. As the noisy nature of economic trends complicates predictions, especially in speculative assets, our approach does not predict courses but instead uses a reinforcement learning agent to learn an overall lucrative trading policy. Therefore, we simulate a virtual market environment, based on historical trading data. Our environment provides a partially observable Markov decision process (POMDP) to reinforcement learners and allows the training of various strategies.

Download Full-text

Cluster decomposing and multi-objective optimization based-ensemble learning framework for motor imagery-based brain-computer interfaces

Journal of Neural Engineering ◽

10.1088/1741-2552/abe20f ◽

2021 ◽

Author(s):

Cili Zuo ◽

Jing Jin ◽

Ren Xu ◽

Lianghong Wu ◽

Chang Liu ◽

...

Keyword(s):

Ensemble Learning ◽

Motor Imagery ◽

Brain Computer Interfaces ◽

Multi Objective Optimization ◽

Multi Objective ◽

Learning Framework ◽

Computer Interfaces

Download Full-text

Self-adaptive multi-objective optimization method design based on agent reinforcement learning for elevator group control systems

2010 8th World Congress on Intelligent Control and Automation ◽

10.1109/wcica.2010.5554696 ◽

2010 ◽

Cited By ~ 1

Author(s):

Fanlin Zeng ◽

Qun Zong ◽

Zhengya Sun ◽

Liqian Dou

Keyword(s):

Reinforcement Learning ◽

Control Systems ◽

Optimization Method ◽

Multi Objective Optimization ◽

Multi Objective ◽

Group Control ◽

Elevator Group Control ◽

Self Adaptive

Download Full-text

DrugEx v2: De Novo Design of Drug Molecule by Pareto-based Multi-Objective Reinforcement Learning in Polypharmacology

10.26434/chemrxiv.14474127 ◽

2021 ◽

Author(s):

Xuhan Liu ◽

Kai Ye ◽

Herman Van Vlijmen ◽

Michael T. M. Emmerich ◽

Adriaan P. IJzerman ◽

...

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Drug Target ◽

De Novo ◽

Multi Objective ◽

Learning Framework ◽

Drug Molecules ◽

Pareto Ranking ◽

Speed Up ◽

Deep Learning Model

In polypharmacology, ideal drugs are required to bind to multiple specific targets to enhance efficacy or to reduce resistance formation. Although deep learning has achieved breakthrough in drug discovery, most of its applications only focus on a single drug target to generate drug-like active molecules in spite of the reality that drug molecules often interact with more than one target which can have desired (polypharmacology) or undesired (toxicity) effects. In a previous study we proposed a new method named DrugEx that integrates an exploration strategy into RNN-based reinforcement learning to improve the diversity of the generated molecules. Here, we extended our DrugEx algorithm with multi-objective optimization to generate drug molecules towards more than one specific target (two adenosine receptors, A1AR and A2AAR, and the potassium ion channel hERG in this study). In our model, we applied an RNN as the agent and machine learning predictors as the environment, both of which were pre-trained in advance and then interplayed under the reinforcement learning framework. The concept of evolutionary algorithms was merged into our method such that crossover and mutation operations were implemented by the same deep learning model as the agent. During the training loop, the agent generates a batch of SMILES-based molecules. Subsequently scores for all objectives provided by the environment are used for constructing Pareto ranks of the generated molecules with non-dominated sorting and Tanimoto-based crowding distance algorithms. Here, we adopted GPU acceleration to speed up the process of Pareto optimization. The final reward of each molecule is calculated based on the Pareto ranking with the ranking selection algorithm. The agent is trained under the guidance of the reward to make sure it can generate more desired molecules after convergence of the training process. All in all we demonstrate generation of compounds with a diverse predicted selectivity profile toward multiple targets, offering the potential of high efficacy and lower toxicity.

Download Full-text

Ensemble just-in-time learning framework through evolutionary multi-objective optimization for soft sensor development of nonlinear industrial processes

Chemometrics and Intelligent Laboratory Systems ◽

10.1016/j.chemolab.2018.12.002 ◽

2019 ◽

Vol 184 ◽

pp. 153-166 ◽

Cited By ~ 12

Author(s):

Huaiping Jin ◽

Bei Pan ◽

Xiangguang Chen ◽

Bin Qian

Keyword(s):

Soft Sensor ◽

Industrial Processes ◽

Just In Time ◽

Multi Objective Optimization ◽

Sensor Development ◽

Multi Objective ◽

Learning Framework

Download Full-text

Parallel Processing for Multi-objective Optimization in Dynamic Environments

2007 IEEE International Parallel and Distributed Processing Symposium ◽

10.1109/ipdps.2007.370433 ◽

2007 ◽

Cited By ~ 17

Author(s):

Mario Camara ◽

Julio Ortega ◽

Francisco J. Toro

Keyword(s):

Parallel Processing ◽

Dynamic Environments ◽

Multi Objective Optimization ◽

Multi Objective

Download Full-text

A Dynamic Resource Allocation Strategy with Reinforcement Learning for Multimodal Multi-objective Optimization

Machine Intelligence Research ◽

10.1007/s11633-022-1314-7 ◽

2022 ◽

Author(s):

Qian-Long Dang ◽

Wei Xu ◽

Yang-Fei Yuan

Keyword(s):

Resource Allocation ◽

Reinforcement Learning ◽

Dynamic Resource Allocation ◽

Multi Objective Optimization ◽

Allocation Strategy ◽

Multi Objective ◽

Resource Allocation Strategy ◽

Dynamic Resource

Download Full-text

An Adaptation System in Unknown Environments Using a Mixture Probability Model and Clustering Distributions

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2012.p0733 ◽

2012 ◽

Vol 16 (6) ◽

pp. 733-740 ◽

Cited By ~ 3

Author(s):

Uthai Phommasak ◽

◽

Daisuke Kitakoshi ◽

Hiroyuki Shioya ◽

Keyword(s):

Reinforcement Learning ◽

Computational Complexity ◽

Bayesian Network ◽

Computer Simulations ◽

Environmental Changes ◽

Probability Model ◽

Dynamic Environments ◽

Learning System ◽

Clustering Method ◽

Processing Resources

Adaptation to dynamic environments is required in an agent system using Reinforcement Learning (RL). A mixture model of Bayesian network was introduced into the learning system for quickly adapting to such environments. This increases the computational complexity for training the parameters of the system. Therefore, reducing such complexities is necessary when there are limitations in the processing resources. In this paper, we introduce a mixture probability into RL for allowing an agent to adjust to environmental changes. We also introduce a new clustering method that enables one to select fewer elements of the mixture probability in order to reduce the computational complexity and simultaneously maintain the system’s performance. Computer simulations are presented to investigate the effectiveness of our proposed method.

Download Full-text