Deep soccer analytics: learning an action-value function for evaluating soccer players

AbstractFor improving the response performance of engine, a novel aero-engine control method based on Deep Q Learning (DQL) is proposed. The engine controller based on DQL has been designed. The model free algorithm – Q learning, which can be performed online, is adopted to calculate the action value function. To improve the learning capacity of DQL, the deep learning algorithm – On Line Sliding Window Deep Neural Network (OL-SW-DNN), is adopted to estimate the action value function. For reducing the sensitivity to the noise of training data, OL-SW-DNN selects nearest point data of certain length as training data. Finally, the engine acceleration simulations of DQR and the Proportion Integration Differentiation (PID) which is the most commonly used as engine controller algorithm in industry are both conducted to verify the validity of the proposed method. The results show that the acceleration time of the proposed method decreased by 1.475 second while satisfied all of engine limits compared with the tradition controller.

Download Full-text

Inaccuracy of State-Action Value Function For Non-Optimal Actions in Adversarially Trained Deep Neural Policies

10.1109/cvprw53098.2021.00264 ◽

2021 ◽

Author(s):

Ezgi Korkmaz

Keyword(s):

Value Function ◽

State Action ◽

Action Value

Download Full-text

Reinforcement learning in the environment where optimal action value function is partly discontinuous

2016 55th Annual Conference of the Society of Instrument and Control Engineers of Japan (SICE) ◽

10.1109/sice.2016.7749277 ◽

2016 ◽

Author(s):

Shingo Shibusawa ◽

Takeshi Shibuya

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Optimal Action ◽

Action Value

Download Full-text

Reinforcement Learning for Optimizing Driving Policies on Cruising Taxis Services

Sustainability ◽

10.3390/su12218883 ◽

2020 ◽

Vol 12 (21) ◽

pp. 8883

Author(s):

Kun Jin ◽

Wei Wang ◽

Xuedong Hua ◽

Wei Zhou

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

State Action ◽

Future Reward ◽

Long Run ◽

Markov Decision ◽

Action Value ◽

Data Expansion ◽

Taking Action ◽

The Value Function

As the key element of urban transportation, taxis services significantly provide convenience and comfort for residents’ travel. However, the reality has not shown much efficiency. Previous researchers mainly aimed to optimize policies by order dispatch on ride-hailing services, which cannot be applied in cruising taxis services. This paper developed the reinforcement learning (RL) framework to optimize driving policies on cruising taxis services. Firstly, we formulated the drivers’ behaviours as the Markov decision process (MDP) progress, considering the influences after taking action in the long run. The RL framework using dynamic programming and data expansion was employed to calculate the state-action value function. Following the value function, drivers can determine the best choice and then quantify the expected future reward at a particular state. By utilizing historic orders data in Chengdu, we analysed the function value’s spatial distribution and demonstrated how the model could optimize the driving policies. Finally, the realistic simulation of the on-demand platform was built. Compared with other benchmark methods, the results verified that the new model performs better in increasing total revenue, answer rate and decreasing waiting time, with the relative percentages of 4.8%, 6.2% and −27.27% at most.

Download Full-text

Determinantal Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33014659 ◽

2019 ◽

Vol 33 ◽

pp. 4659-4666

Author(s):

Takayuki Osogami ◽

Rudy Raymond

Keyword(s):

Reinforcement Learning ◽

Value Function ◽

Positive Semidefinite ◽

Positive Semidefinite Matrix ◽

The Matrix ◽

Multi Agent ◽

Action Value ◽

The Individual ◽

Partially Observable ◽

Semidefinite Matrix

We study reinforcement learning for controlling multiple agents in a collaborative manner. In some of those tasks, it is insufficient for the individual agents to take relevant actions, but those actions should also have diversity. We propose the approach of using the determinant of a positive semidefinite matrix to approximate the action-value function in reinforcement learning, where we learn the matrix in a way that it represents the relevance and diversity of the actions. Experimental results show that the proposed approach allows the agents to learn a nearly optimal policy approximately ten times faster than baseline approaches in benchmark tasks of multi-agent reinforcement learning. The proposed approach is also shown to achieve the performance that cannot be achieved with conventional approaches in partially observable environment with exponentially large action space.

Download Full-text

A solution for the Elevators Group Dispatch by Multiagent Reinforcement Learning

10.5753/eniac.2019.9322 ◽

2019 ◽

Author(s):

Jordão Memória ◽

José Maia

Keyword(s):

Reinforcement Learning ◽

Function Approximation ◽

Value Function ◽

The State ◽

Evaluation Function ◽

State Action ◽

Traffic Pattern ◽

Multiagent Reinforcement Learning ◽

Multi Agent ◽

Action Value

In this work, a modeling and algorithm based on multiagent reinforcement learning is developed for the problem of elevator group dispatch. The main advantage is that, along with the function approximation, this multi-agent solution leads to reduction of the state space, allowing complex states to be addressed with a synthesizing evaluation function. Each elevator is considered an agent that have to decide about two actions: answer or ignore the new call. With some iterations, the agents learn the weights of an evaluation function which approximate the state-action value function. The performance of solution (average waiting time - AWT), shown varying the traffic pattern, flow of people, number of elevators and number of floors, is comparable to other current proposals reported in the literature.

Download Full-text

Design of Action-value Function in Motion Planning for Autonomous Blimp Robots

IEEJ Transactions on Electronics Information and Systems ◽

10.1541/ieejeiss.124.1930 ◽

2004 ◽

Vol 124 (10) ◽

pp. 1930-1937 ◽

Cited By ~ 3

Author(s):

Keiko Motoyama ◽

Hidenori Kawamura ◽

Masahito Yamamoto ◽

Azuma Ohuchi

Keyword(s):

Motion Planning ◽

Value Function ◽

Action Value

Download Full-text

Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.6055 ◽

2020 ◽

Vol 34 (04) ◽

pp. 5948-5955

Author(s):

Tian Tan ◽

Zhihan Xiong ◽

Vikranth R. Dwaracherla

Keyword(s):

Reinforcement Learning ◽

Network Architecture ◽

Value Function ◽

Point Of View ◽

Value Functions ◽

Computationally Efficient ◽

Computational Point ◽

Action Value ◽

Efficient Exploration ◽

Computational Resources

It is well known that quantifying uncertainty in the action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively computationally tractable way of doing this using randomized value functions. However, it still requires a huge amount of computational resources for complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm to learn parameterized indexed value function through a distributional version of temporal difference in a tabular setting and prove its regret bound. Then, in a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network to learn the indexed value function. Finally, we show the efficacy of PINs through computational experiments.

Download Full-text

Low-rank State-action Value-function Approximation

10.23919/eusipco54536.2021.9616008 ◽

2021 ◽

Author(s):

Sergio Rozada ◽

Victor Tenorio ◽

Antonio G. Marques

Keyword(s):

Function Approximation ◽

Value Function ◽

Low Rank ◽

Value Function Approximation ◽

State Action ◽

Action Value

Download Full-text