An Application of Reinforced Learning-Based Dynamic Pricing for Improvement of Ridesharing Platform Service in Seoul

Jaein Song; Yun Ji Cho; Min Hee Kang; Kee Yeon Hwang

doi:10.3390/electronics9111818

Electronics ◽

10.3390/electronics9111818 ◽

2020 ◽

Vol 9 (11) ◽

pp. 1818

Author(s):

Jaein Song ◽

Yun Ji Cho ◽

Min Hee Kang ◽

Kee Yeon Hwang

Keyword(s):

Reinforcement Learning ◽

Waiting Time ◽

Dynamic Pricing ◽

Learning Algorithm ◽

Residential Areas ◽

Private Companies ◽

Reward Function ◽

Time Period ◽

Reinforced Learning ◽

Centrality Analysis

As ridesharing services (including taxis) are often run by private companies, profitability is the top priority in operation. This leads drivers to refuse passengers bound for low-demand areas, where finding a subsequent passenger is difficult, which in turn extends the waiting time for passengers hailing a vehicle to these regions. To address this problem during the worst time period, the study used Seoul’s taxi data and a reinforcement learning algorithm to find appropriate surge rates for ridesharing services by region between 10:00 p.m. and 4:00 a.m. In the reinforcement learning, the outcome of centrality analysis was applied as a weight affecting drivers’ destination choice probability. Profit was used as the reward value, and the reward function was varied according to whether the value of passenger waiting time was included. By using a negative reward for the passenger waiting time, the study was able to identify a more appropriate surge level. Across the region, the surge averaged 1.6. More specifically, regions on the outskirts of the city and in residential areas showed a higher surge, while central areas had a lower surge. With this differentiated surge, drivers’ refusal to take passengers can be lessened and the passenger waiting time can be shortened. The supply of ridesharing services in low-demand regions can be increased by as much as 7.5%, reducing regional equity problems related to ridesharing services in Seoul.
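The reward design described in this abstract (profit as the reward, with an optional negative term for passenger waiting time) can be illustrated with a minimal sketch. The function name, its parameters, and the linear form of the penalty are assumptions made for illustration; the abstract does not give the paper's actual formula or units.

```python
def trip_reward(base_fare, surge, wait_minutes, wait_cost_per_minute=0.0):
    """Hypothetical reward: trip profit minus an optional waiting-time penalty.

    base_fare            -- fare before surge (assumed unit: currency)
    surge                -- surge multiplier applied to the fare
    wait_minutes         -- passenger waiting time for this trip
    wait_cost_per_minute -- assumed monetary value of one minute of waiting;
                            0.0 reproduces the profit-only reward
    """
    profit = base_fare * surge
    waiting_penalty = wait_cost_per_minute * wait_minutes
    return profit - waiting_penalty


# Profit-only reward versus reward with the negative waiting-time term.
profit_only = trip_reward(10000, 1.5, wait_minutes=4)
with_waiting = trip_reward(10000, 1.5, wait_minutes=4, wait_cost_per_minute=500)
```

With the penalty switched on, a surge level that shortens waiting time can score higher than one that only maximizes fare, which is the trade-off the abstract attributes to the adjusted reward function.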

Safety-aware Adversarial Inverse Reinforcement Learning (S-AIRL) for Highway Autonomous Driving

Journal of Autonomous Vehicles and Systems ◽

10.1115/1.4053427 ◽

2022 ◽

pp. 1-14

Author(s):

Fangjian Li ◽

John R Wagner ◽

Yue Wang

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Risky Behaviors ◽

Autonomous Driving ◽

Inverse Reinforcement Learning ◽

Safety Issues ◽

Reward Function ◽

Sampling Process ◽

Safety Awareness ◽

Driving Scenario

Inverse reinforcement learning (IRL) has been successfully applied in many robotics and autonomous driving studies without the need for hand-tuning a reward function. However, it suffers from safety issues. Compared with reinforcement learning (RL) algorithms, IRL is even more vulnerable to unsafe situations, as it can only infer the importance of safety from expert demonstrations. In this paper, we propose a safety-aware adversarial inverse reinforcement learning algorithm (S-AIRL). First, the control barrier function (CBF) is used to guide the training of a safety critic, which leverages the knowledge of system dynamics in the sampling process without training an additional guiding policy. The trained safety critic is then integrated into the discriminator to help distinguish the generated data from expert demonstrations from the standpoint of safety. Finally, to further improve safety awareness, a regulator is introduced in the loss function of the discriminator training to prevent the recovered reward function from assigning high rewards to risky behaviors. We tested S-AIRL in a highway autonomous driving scenario. Compared with the original AIRL algorithm, at the same level of imitation learning (IL) performance, the proposed S-AIRL can reduce the collision rate by 32.6%.

Driver-like decision-making method for vehicle longitudinal autonomous driving based on deep reinforcement learning

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/09544070211063081 ◽

2021 ◽

pp. 095440702110630

Author(s):

Zhenhai Gao ◽

Xiangtong Yan ◽

Fei Gao ◽

Lei He

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Learning Algorithm ◽

Autonomous Driving ◽

Decision Strategies ◽

Reward Function ◽

Human Driver ◽

Reward Functions

Decision-making is one of the key parts of research on vehicle longitudinal autonomous driving. Considering the behavior of human drivers when designing autonomous driving decision-making strategies is a current research hotspot. In longitudinal autonomous driving, traditional rule-based decision-making strategies are difficult to apply to complex scenarios. Current decision-making methods that use reinforcement learning and deep reinforcement learning construct reward functions designed around safety, comfort, and economy, yet the resulting decision strategies still differ substantially from those of human drivers. To address these problems, this paper uses driver behavior data to design the reward function of the deep reinforcement learning algorithm through BP neural network fitting, and uses the DQN and DDPG deep reinforcement learning algorithms to establish two driver-like longitudinal autonomous driving decision-making models. The simulation experiment compares the decisions of the two models with the driver curve. The results show that both algorithms can realize driver-like decision-making, and that the DDPG algorithm is more consistent with human driver behavior than the DQN algorithm and therefore performs better.

A Reward Optimization Method Based on Action Subrewards in Hierarchical Reinforcement Learning

The Scientific World JOURNAL ◽

10.1155/2014/120760 ◽

2014 ◽

Vol 2014 ◽

pp. 1-6

Author(s):

Yuchen Fu ◽

Quan Liu ◽

Xionghong Ling ◽

Zhiming Cui

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Optimization Method ◽

Curse Of Dimensionality ◽

Convergence Speed ◽

Learning Method ◽

Trial And Error ◽

State Spaces ◽

Reward Function ◽

Hierarchical Reinforcement Learning

Reinforcement learning (RL) is a kind of interactive learning method. Its main characteristics are “trial and error” and “related reward.” A hierarchical reinforcement learning method based on action subrewards is proposed to address the “curse of dimensionality,” in which the state space grows exponentially with the number of features, and the resulting low convergence speed. The method can greatly reduce the state space and choose actions purposefully and efficiently so as to optimize the reward function and enhance convergence speed. When the method is applied to online learning in the game Tetris, the experimental results show that combining a hierarchical reinforcement learning algorithm with action subrewards clearly improves convergence speed. The “curse of dimensionality” problem is also alleviated to a certain extent by the hierarchical method. The performance under different parameters is compared and analyzed as well.

A hierarchical reinforcement learning algorithm based on heuristic reward function

2010 2nd International Conference on Advanced Computer Control ◽

10.1109/icacc.2010.5486837 ◽

2010 ◽

Author(s):

Qicui Yan ◽

Quan Liu ◽

Daojing Hu

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Reward Function ◽

Hierarchical Reinforcement Learning ◽

Reinforcement Learning Algorithm

Reward-Free Reinforcement Learning Algorithm Using Prediction Network

Fuzzy Systems and Data Mining VI - Frontiers in Artificial Intelligence and Applications ◽

10.3233/faia200744 ◽

2020 ◽

Author(s):

Zhen Yu ◽

Yimin Feng ◽

Lijun Liu

Keyword(s):

Reinforcement Learning ◽

Learning Algorithm ◽

Value Functions ◽

Learning Method ◽

Reward Function ◽

Network Training ◽

Learning Tasks ◽

Reward Value ◽

Policy Gradient ◽

Reward Functions

In general reinforcement learning tasks, formulating the reward function is a very important step. In many systems, however, the reward function is not easy to formulate. Network training is sensitive to the reward function, and different reward value functions yield different results. For a class of systems that meet specific conditions, the traditional reinforcement learning method is improved. A state quantity function is designed to replace the reward function, which is more efficient than the traditional reward function. In addition, a prediction network link is designed so that the network can learn the value of general states from special states. The overall structure of the network is improved based on the Deep Deterministic Policy Gradient (DDPG) algorithm. Finally, the algorithm was successfully applied in the FrozenLake environment and achieved good performance. The experiment demonstrates the effectiveness of the algorithm and realizes reward-free reinforcement learning in a class of systems.

Deep reinforcement learning algorithm for dynamic pricing of express lanes with multiple access locations

Transportation Research Part C Emerging Technologies ◽

10.1016/j.trc.2020.102715 ◽

2020 ◽

Vol 119 ◽

pp. 102715 ◽

Cited By ~ 1

Author(s):

Venktesh Pandey ◽

Evana Wang ◽

Stephen D. Boyles

Keyword(s):

Reinforcement Learning ◽

Dynamic Pricing ◽

Multiple Access ◽

Learning Algorithm ◽

Express Lanes ◽

Reinforcement Learning Algorithm

Multi-Frame Star Image Denoising Algorithm Based on Deep Reinforcement Learning and Mixed Poisson–Gaussian Likelihood

Sensors ◽

10.3390/s20215983 ◽

2020 ◽

Vol 20 (21) ◽

pp. 5983

Author(s):

Ming Xie ◽

Zhenduo Zhang ◽

Wenbo Zheng ◽

Ying Li ◽

Kai Cao

Keyword(s):

Reinforcement Learning ◽

Image Denoising ◽

Gaussian Noise ◽

Likelihood Function ◽

Learning Algorithm ◽

Likelihood Estimation ◽

Star Image ◽

Reward Function ◽

Markov Decision ◽

Star Images

Mixed Poisson–Gaussian noise exists in star images and is difficult to suppress effectively with the maximum likelihood estimation (MLE) method because of its complicated likelihood function. In this article, the MLE method is combined with a state-of-the-art machine learning algorithm in order to achieve accurate restoration results. By applying the mixed Poisson–Gaussian likelihood function as the reward function of a reinforcement learning algorithm, an agent is able to form the restored image that maximizes the complex likelihood function through a Markov Decision Process (MDP). To provide appropriate parameter settings for the denoising model, the key hyperparameters of the model and their influence on denoising results are tested through simulated experiments. The model is then compared with two existing star image denoising methods to verify its performance. The experimental results indicate that this reinforcement learning-based algorithm suppresses the mixed Poisson–Gaussian noise in star images more accurately than the traditional MLE method and than the method based on a deep convolutional neural network (DCNN).
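To make the idea of a likelihood-based reward concrete, the sketch below evaluates a mixed Poisson–Gaussian log-likelihood in its textbook form: each observed pixel is a Poisson count with mean equal to the clean intensity, plus zero-mean Gaussian read noise. This form, the truncation of the Poisson sum at `k_max`, and all names are assumptions for illustration; the abstract does not state the exact expression or parameterization the paper uses.

```python
import math


def mixed_pg_log_likelihood(observed, clean, sigma, k_max=200):
    """Log-likelihood of observed pixels under an assumed Poisson-Gaussian model.

    For each pixel: p(y | lam) = sum_k Poisson(k; lam) * Normal(y; k, sigma^2),
    with the infinite sum truncated at k_max (an approximation).
    Requires every clean intensity lam > 0 and sigma > 0.
    """
    log_norm = math.log(sigma * math.sqrt(2.0 * math.pi))
    total = 0.0
    for y, lam in zip(observed, clean):
        pixel_likelihood = 0.0
        for k in range(k_max + 1):
            # log Poisson(k; lam), using lgamma(k + 1) = log(k!)
            log_poisson = k * math.log(lam) - lam - math.lgamma(k + 1)
            # log Normal(y; k, sigma^2)
            log_gauss = -0.5 * ((y - k) / sigma) ** 2 - log_norm
            pixel_likelihood += math.exp(log_poisson + log_gauss)
        total += math.log(pixel_likelihood)
    return total


# A candidate restoration close to the observation scores higher than a distant one.
near = mixed_pg_log_likelihood([10.0], [10.0], sigma=1.0)
far = mixed_pg_log_likelihood([10.0], [30.0], sigma=1.0)
```

Used as a reward, a quantity like this would rise as the agent's restored image becomes a more plausible source of the noisy observation. For observations very far from any plausible count the per-pixel sum can underflow to zero, so a production version would accumulate in log space.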

Decision-Making for the Autonomous Navigation of Maritime Autonomous Surface Ships Based on Scene Division and Deep Reinforcement Learning

Sensors ◽

10.3390/s19184055 ◽

2019 ◽

Vol 19 (18) ◽

pp. 4055 ◽

Cited By ~ 9

Author(s):

Zhang ◽

Wang ◽

Liu ◽

Chen

Keyword(s):

Decision Making ◽

Reinforcement Learning ◽

Collision Avoidance ◽

Autonomous Navigation ◽

Learning Algorithm ◽

Q Learning ◽

Reward Function ◽

International Regulations ◽

Convergence Trend ◽

Decision Making Model

This research focuses on the adaptive navigation of maritime autonomous surface ships (MASSs) in an uncertain environment. To achieve intelligent obstacle avoidance of MASSs in a port, an autonomous navigation decision-making model based on hierarchical deep reinforcement learning is proposed. The model is mainly composed of two layers: the scene division layer and an autonomous navigation decision-making layer. The scene division layer mainly quantifies the sub-scenarios according to the International Regulations for Preventing Collisions at Sea (COLREG). This research divides the navigational situation of a ship into entities and attributes based on the ontology model and Protégé language. In the decision-making layer, we designed a deep Q-learning algorithm utilizing the environmental model, ship motion space, reward function, and search strategy to learn the environmental state in a quantized sub-scenario to train the navigation strategy. Finally, two sets of verification experiments of the deep reinforcement learning (DRL) and improved DRL algorithms were designed with Rizhao port as a study case. Moreover, the experimental data were analyzed in terms of the convergence trend, iterative path, and collision avoidance effect. The results indicate that the improved DRL algorithm could effectively improve the navigation safety and collision avoidance.
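The Q-learning rule that underlies deep Q-learning can be sketched in tabular form. This is a generic textbook update, not the paper's network-based model: the state labels, the action names, and the values of the learning rate `alpha` and discount factor `gamma` are hypothetical.

```python
def q_learning_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    q is a dict of dicts: q[state][action] -> estimated return.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state example with two steering actions.
q_table = {
    "s0": {"port": 0.0, "starboard": 0.0},
    "s1": {"port": 1.0, "starboard": 2.0},
}
updated = q_learning_update(q_table, "s0", "port", reward=1.0,
                            next_state="s1", alpha=0.5, gamma=0.5)
```

A deep Q-learning variant replaces the table with a neural network that maps a state to the Q-values of all actions, while keeping the same target `reward + gamma * max Q(next_state, ·)`.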

A Study of Continuous Maximum Entropy Deep Inverse Reinforcement Learning

Mathematical Problems in Engineering ◽

10.1155/2019/4834516 ◽

2019 ◽

Vol 2019 ◽

pp. 1-8

Author(s):

Xi-liang Chen ◽

Lei Cao ◽

Zhi-xiong Xu ◽

Jun Lai ◽

Chen-xi Li

Keyword(s):

Reinforcement Learning ◽

State Space ◽

Maximum Entropy ◽

Learning Algorithm ◽

Action Space ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Continuous State Space ◽

Hot Start ◽

Continuous State

IRL assumes that the demonstrations act optimally in an environment. In the past, most work on IRL needed to calculate optimal policies for different reward functions. However, this requirement is difficult to satisfy in tasks with large or continuous state spaces, let alone continuous action spaces. We propose a continuous maximum entropy deep inverse reinforcement learning algorithm for continuous state and continuous action spaces, which achieves deep cognition of the environment model by reconstructing the reward function from the demonstrations, together with a hot start mechanism based on demonstrations that makes the training process faster and better. We compare this new approach with well-known algorithms such as Maximum Entropy IRL, DDPG, and hot start DDPG. Empirical results on the classical control environment MountainCarContinuous-v0 in OpenAI Gym show that our approach is able to learn policies faster and better.

Seizure Control in a Computational Model Using a Reinforcement Learning Stimulation Paradigm

International Journal of Neural Systems ◽

10.1142/s0129065717500125 ◽

2017 ◽

Vol 27 (07) ◽

pp. 1750012 ◽

Cited By ~ 8

Author(s):

Vivek Nagaraj ◽

Andrew Lamperski ◽

Theoden I Netoff

Keyword(s):

Reinforcement Learning ◽

Computational Model ◽

Therapeutic Efficacy ◽

Learning Algorithm ◽

Field Potential ◽

Stimulation Frequency ◽

Patient Specific ◽

Reward Function ◽

Wide Range ◽

Reinforcement Learning Algorithm

Neuromodulation technologies such as vagus nerve stimulation and deep brain stimulation have shown some efficacy in controlling seizures in medically intractable patients. However, inherent patient-to-patient variability of seizure disorders leads to a wide range of therapeutic efficacy. A patient-specific approach to determining stimulation parameters may lead to increased therapeutic efficacy while minimizing stimulation energy and side effects. This paper presents a reinforcement learning algorithm that optimizes stimulation frequency for controlling seizures with minimum stimulation energy. We apply our method to a computational model called the Epileptor, which simulates inter-ictal and ictal local field potential data. In order to apply reinforcement learning to the Epileptor, we introduce a specialized reward function and state-space discretization. With the reward function and discretization fixed, we test the effectiveness of the temporal difference reinforcement learning algorithm (TD(0)). For periodic pulsatile stimulation, we derive a relation that describes, for any stimulation frequency, the minimal pulse amplitude required to suppress seizures. The TD(0) algorithm is able to identify parameters that control seizures quickly. Additionally, our results show that the TD(0) algorithm refines the stimulation frequency to minimize stimulation energy, thereby converging to optimal parameters reliably. An advantage of the TD(0) algorithm is that it is adaptive, so it can track changes over time in the parameters necessary to control the seizures. We show that the algorithm can converge on the optimal solution in simulation with slow and fast inter-seizure intervals.
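For readers unfamiliar with TD(0), the core update can be sketched as follows. This is the standard textbook rule applied to a discretized state space; the state indices, the reward value, and the `alpha` and `gamma` settings are hypothetical and are not taken from the paper.

```python
def td0_update(values, state, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one TD(0) step to a table of state values and return the TD error.

    values maps each discretized state to its current value estimate.
    """
    td_error = reward + gamma * values[next_state] - values[state]
    values[state] += alpha * td_error
    return td_error


# Hypothetical two-state example.
state_values = {0: 0.0, 1: 1.0}
error = td0_update(state_values, state=0, reward=1.0, next_state=1,
                   alpha=0.5, gamma=0.5)
```

Because each update uses only the most recent transition, the value estimates keep adjusting as the underlying system drifts, which matches the adaptivity the abstract highlights.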
