reward shaping Latest Research Papers

Deep Reinforcement Learning For Trading—A Critical Survey

Data ◽

10.3390/data6110119 ◽

2021 ◽

Vol 6 (11) ◽

pp. 119

Author(s):

Adrian Millea

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Financial Markets ◽

Risk Measures ◽

Google Scholar ◽

Critical Survey ◽

Problem Space ◽

World Model ◽

Short Survey ◽

Reward Shaping

Deep reinforcement learning (DRL) has achieved significant results in many machine learning (ML) benchmarks. In this short survey, we provide an overview of DRL applied to trading on financial markets with the purpose of unravelling common structures used in the trading community using DRL, as well as discovering common issues and limitations of such approaches. We include also a short corpus summarization using Google Scholar. Moreover, we discuss how one can use hierarchy for dividing the problem space, as well as using model-based RL to learn a world model of the trading environment which can be used for prediction. In addition, multiple risk measures are defined and discussed, which not only provide a way of quantifying the performance of various algorithms, but they can also act as (dense) reward-shaping mechanisms for the agent. We discuss in detail the various state representations used for financial markets, which we consider critical for the success and efficiency of such DRL agents. The market in focus for this survey is the cryptocurrency market; the results of this survey are two-fold: firstly, to find the most promising directions for further research and secondly, to show how a lack of consistency in the community can significantly impede research and the development of DRL agents for trading.

Download Full-text

Deep Reinforcement Learning For Trading - A Critical Survey

10.20944/preprints202111.0044.v1 ◽

2021 ◽

Author(s):

Adrian Millea

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Financial Markets ◽

Risk Measures ◽

Meta Analysis ◽

Google Scholar ◽

Critical Survey ◽

Problem Space ◽

World Model ◽

Reward Shaping

Deep reinforcement learning (DRL) has achieved significant results in many Machine Learning (ML) benchmarks. In this short survey we provide an overview of DRL applied to trading on financial markets, including a short meta-analysis using Google Scholar, with an emphasis on using hierarchy for dividing the problem space as well as using model-based RL to learn a world model of the trading environment which can be used for prediction. In addition, multiple risk measures are defined and discussed, which not only provide a way of quantifying the performance of various algorithms, but they can also act as (dense) reward-shaping mechanisms for the agent. We discuss in detail the various state representations used for financial markets, which we consider critical for the success and efficiency of such DRL agents. The market in focus for this survey is the cryptocurrency market.

Download Full-text

Reward Shaping to Learn Natural Object Manipulation With an Anthropomorphic Robotic Hand and Hand Pose Priors via On-Policy Reinforcement Learning

10.1109/ictc52510.2021.9620901 ◽

2021 ◽

Author(s):

Patricio Rivera ◽

Jiheon Oh ◽

Edwin Valarezo ◽

Gahyeon Ryu ◽

Hwanseok Jung ◽

...

Keyword(s):

Reinforcement Learning ◽

Object Manipulation ◽

Robotic Hand ◽

Natural Object ◽

Reward Shaping ◽

Hand Pose

Download Full-text

Traffic Signal Control via Reinforcement Learning for Reducing Global Vehicle Emission

Sustainability ◽

10.3390/su132011254 ◽

2021 ◽

Vol 13 (20) ◽

pp. 11254

Author(s):

Bálint Kővári ◽

Lászlo Szőke ◽

Tamás Bécsi ◽

Szilárd Aradi ◽

Péter Gáspár

Keyword(s):

Reinforcement Learning ◽

Queue Length ◽

Emission Reduction ◽

Control Method ◽

Traffic Signal ◽

Vehicle Emission ◽

Signal Control ◽

Traffic Signal Control ◽

The Core ◽

Reward Shaping

The traffic signal control problem is an extensively researched area providing different approaches, from classic methods to machine learning based ones. Different aspects can be considered to find an optima, from which this paper emphasises emission reduction. The core of our solution is a novel rewarding concept for deep reinforcement learning (DRL) which does not utilize any reward shaping, hence exposes new insights into the traffic signal control (TSC) problem. Despite the omission of the standard measures in the rewarding scheme, the proposed approach can outperform a modern actuated control method in classic performance measures such as waiting time and queue length. Moreover, the sustainability of the realized controls is also placed under investigation to evaluate their environmental impacts. Our results show that the proposed solution goes beyond the actuated control not just in the classic measures but in emission-related measures too.

Download Full-text

Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

European Journal of Operational Research ◽

10.1016/j.ejor.2021.10.045 ◽

2021 ◽

Author(s):

Bram J. De Moor ◽

Joren Gijsbrechts ◽

Robert N. Boute

Keyword(s):

Reinforcement Learning ◽

Inventory Management ◽

Perishable Inventory ◽

Reward Shaping

Download Full-text

Policy invariant explicit shaping: an efficient alternative to reward shaping

Neural Computing and Applications ◽

10.1007/s00521-021-06259-1 ◽

2021 ◽

Author(s):

Paniz Behboudian ◽

Yash Satsangi ◽

Matthew E. Taylor ◽

Anna Harutyunyan ◽

Michael Bowling

Keyword(s):

Potential Function ◽

Optimal Policy ◽

Correction Term ◽

Training Data ◽

Simple Method ◽

Powerful Learning ◽

Dynamic Potential ◽

Reward Shaping ◽

Good Advice ◽

Efficient Alternative

AbstractReinforcement learning (RL) is a powerful learning paradigm in which agents can learn to maximize sparse and delayed reward signals. Although RL has had many impressive successes in complex domains, learning can take hours, days, or even years of training data. A major challenge of contemporary RL research is to discover how to learn with less data. Previous work has shown that domain information can be successfully used to shape the reward; by adding additional reward information, the agent can learn with much less data. Furthermore, if the reward is constructed from a potential function, the optimal policy is guaranteed to be unaltered. While such potential-based reward shaping (PBRS) holds promise, it is limited by the need for a well-defined potential function. Ideally, we would like to be able to take arbitrary advice from a human or other agent and improve performance without affecting the optimal policy. The recently introduced dynamic potential-based advice (DPBA) was proposed to tackle this challenge by predicting the potential function values as part of the learning process. However, this article demonstrates theoretically and empirically that, while DPBA can facilitate learning with good advice, it does in fact alter the optimal policy. We further show that when adding the correction term to “fix” DPBA it no longer shows effective shaping with good advice. We then present a simple method called policy invariant explicit shaping (PIES) and show theoretically and empirically that PIES can use arbitrary advice, speed-up learning, and leave the optimal policy unchanged.

Download Full-text

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

10.1109/iros51168.2021.9636020 ◽

2021 ◽

Author(s):

Farzan Memarian ◽

Wonjoon Goo ◽

Rudolf Lioutikov ◽

Scott Niekum ◽

Ufuk Topcu

Keyword(s):

Reward Shaping

Download Full-text

Reinforcement Learning for Pick and Place Operations in Robotics: A Survey

Robotics ◽

10.3390/robotics10030105 ◽

2021 ◽

Vol 10 (3) ◽

pp. 105

Author(s):

Andrew Lobbezoo ◽

Yanjun Qian ◽

Hyock-Ju Kwon

Keyword(s):

Reinforcement Learning ◽

Critical Discussion ◽

Value Iteration ◽

Open Problems ◽

Work Related ◽

Model Generalization ◽

Pick And Place ◽

Reward Shaping ◽

Place Task ◽

Policy Optimization

The field of robotics has been rapidly developing in recent years, and the work related to training robotic agents with reinforcement learning has been a major focus of research. This survey reviews the application of reinforcement learning for pick-and-place operations, a task that a logistics robot can be trained to complete without support from a robotics engineer. To introduce this topic, we first review the fundamentals of reinforcement learning and various methods of policy optimization, such as value iteration and policy search. Next, factors which have an impact on the pick-and-place task, such as reward shaping, imitation learning, pose estimation, and simulation environment are examined. Following the review of the fundamentals and key factors for reinforcement learning, we present an extensive review of all methods implemented by researchers in the field to date. The strengths and weaknesses of each method from literature are discussed, and details about the contribution of each manuscript to the field are reviewed. The concluding critical discussion of the available literature, and the summary of open problems indicates that experiment validation, model generalization, and grasp pose selection are topics that require additional research.

Download Full-text

A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm

Sensors ◽

10.3390/s21165643 ◽

2021 ◽

Vol 21 (16) ◽

pp. 5643

Author(s):

Wenqiang Zu ◽

Hongyu Yang ◽

Renyu Liu ◽

Yulong Ji

Keyword(s):

Arrival Time ◽

Specific Pattern ◽

Reward Function ◽

The Neural Network ◽

Dubins Path ◽

Reward Shaping ◽

Neural Network Structure ◽

Simulation Results ◽

Guidance Problem ◽

Application Prospects

Guiding an aircraft to 4D waypoints at a certain heading is a multi-dimensional goal aircraft guidance problem. [d=Zu]In order to improve the performance and solve this problem, this paper proposes a multi-layer RL approach.To enhance the performance, in the present study, a multi-layer RL approach to solve the multi-dimensional goal aircraft guidance problem is proposed. The approach [d=Zu]enablesassists the autopilot in an ATC simulator to guide an aircraft to 4D waypoints at certain latitude, longitude, altitude, heading, and arrival time, respectively. To be specific, a multi-layer RL [d=Zu]approach is proposedmethod to simplify the neural network structure and reduce the state dimensions. A shaped reward function that involves the potential function and Dubins path method is applied. [d=Zu]Experimental and simulation results show that the proposed approachExperiments are conducted and the simulation results reveal that the proposed method can significantly improve the convergence efficiency and trajectory performance. [d=Zu]FurthermoreFurther, the results indicate possible application prospects in team aircraft guidance tasks, since the aircraft can directly approach a goal without waiting in a specific pattern, thereby overcoming the problem of current ATC simulators.

Download Full-text

Emergent Prosociality in Multi-Agent Games Through Gifting

Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2021/61 ◽

2021 ◽

Author(s):

Woodrow Z. Wang ◽

Mark Beliaev ◽

Erdem Bıyık ◽

Daniel A. Lazar ◽

Ramtin Pedarsani ◽

...

Keyword(s):

Dynamical System ◽

Numerical Analysis ◽

High Risk ◽

Multiple Equilibria ◽

State Of The Art ◽

Basins Of Attraction ◽

Coordination Games ◽

Strong Assumption ◽

Reward Shaping ◽

Multi Agent

Coordination is often critical to forming prosocial behaviors -- behaviors that increase the overall sum of rewards received by all agents in a multi-agent game. However, state of the art reinforcement learning algorithms often suffer from converging to socially less desirable equilibria when multiple equilibria exist. Previous works address this challenge with explicit reward shaping, which requires the strong assumption that agents can be forced to be prosocial. We propose using a less restrictive peer-rewarding mechanism, gifting, that guides the agents toward more socially desirable equilibria while allowing agents to remain selfish and decentralized. Gifting allows each agent to give some of their reward to other agents. We employ a theoretical framework that captures the benefit of gifting in converging to the prosocial equilibrium by characterizing the equilibria's basins of attraction in a dynamical system. With gifting, we demonstrate increased convergence of high risk, general-sum coordination games to the prosocial equilibrium both via numerical analysis and experiments.

Download Full-text

reward shaping
Recently Published Documents

TOTAL DOCUMENTS

H-INDEX

Deep Reinforcement Learning For Trading—A Critical Survey

Deep Reinforcement Learning For Trading - A Critical Survey

Reward Shaping to Learn Natural Object Manipulation With an Anthropomorphic Robotic Hand and Hand Pose Priors via On-Policy Reinforcement Learning

Traffic Signal Control via Reinforcement Learning for Reducing Global Vehicle Emission

Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

Policy invariant explicit shaping: an efficient alternative to reward shaping

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Reinforcement Learning for Pick and Place Operations in Robotics: A Survey

A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm

Emergent Prosociality in Multi-Agent Games Through Gifting

Export Citation Format

reward shapingRecently Published Documents

TOTAL DOCUMENTS

H-INDEX

Deep Reinforcement Learning For Trading—A Critical Survey

Deep Reinforcement Learning For Trading - A Critical Survey

Reward Shaping to Learn Natural Object Manipulation With an Anthropomorphic Robotic Hand and Hand Pose Priors via On-Policy Reinforcement Learning

Traffic Signal Control via Reinforcement Learning for Reducing Global Vehicle Emission

Reward shaping to improve the performance of deep reinforcement learning in perishable inventory management

Policy invariant explicit shaping: an efficient alternative to reward shaping

Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Reinforcement Learning for Pick and Place Operations in Robotics: A Survey

A Multi-Dimensional Goal Aircraft Guidance Approach Based on Reinforcement Learning with a Reward Shaping Algorithm

Emergent Prosociality in Multi-Agent Games Through Gifting

reward shaping
Recently Published Documents