A Modified Deep Deterministic Policy Gradient Algorithm for Data-Driven Inventory Management

A data-driven PEMFC output voltage control method is proposed. Moreover, an Improved deep deterministic policy gradient algorithm is proposed for this method. The algorithm introduces three techniques: Clipped multiple Q-learning, policy delay update, and policy smoothing to improve the robustness of the control policy. In this algorithm, the hydrogen controller is treated as an agent, which is pre-trained to fully interact with the environment and obtain the optimal control policy. The effectiveness of the proposed algorithm is demonstrated experimentally.

Download Full-text

Dynamic Online Ordering and Data-Driven Inventory Management System with SMS for Security

International Journal of Emerging Trends in Engineering Research ◽

10.30534/ijeter/2020/221892020 ◽

2020 ◽

Vol 8 (9) ◽

pp. 6286-6290

Keyword(s):

Inventory Management ◽

Management System ◽

Data Driven ◽

Inventory Management System

Download Full-text

Data-Driven Inventory Management Solution for Procurement and Supply Chain of Utility Company

SSRN Electronic Journal ◽

10.2139/ssrn.3582428 ◽

2020 ◽

Author(s):

NUR AZALIAH ABU BAKAR

Keyword(s):

Supply Chain ◽

Inventory Management ◽

Data Driven ◽

Utility Company

Download Full-text

Heuristic Gait Learning of Quadruped Robot Based on Deep Deterministic Policy Gradient Algorithm

2020 Chinese Automation Congress (CAC) ◽

10.1109/cac51589.2020.9326973 ◽

2020 ◽

Author(s):

Mingchao Wang ◽

Xiaogang Ruan ◽

Xiaoqing Zhu

Keyword(s):

Quadruped Robot ◽

Gradient Algorithm ◽

Policy Gradient ◽

Gait Learning

Download Full-text

Safe option-critic: learning safety in the option-critic architecture

The Knowledge Engineering Review ◽

10.1017/s0269888921000035 ◽

2021 ◽

Vol 36 ◽

Author(s):

Arushi Jain ◽

Khimya Khetarpal ◽

Doina Precup

Keyword(s):

Model Uncertainty ◽

Gradient Algorithm ◽

Intrinsic Variability ◽

Expected Return ◽

Practical Applications ◽

Hierarchical Reinforcement Learning ◽

Continuous State ◽

End Conditions ◽

Policy Gradient ◽

High Uncertainty

Abstract Designing hierarchical reinforcement learning algorithms that exhibit safe behaviour is not only vital for practical applications but also facilitates a better understanding of an agent’s decisions. We tackle this problem in the options framework (Sutton, Precup & Singh, 1999), a particular way to specify temporally abstract actions which allow an agent to use sub-policies with start and end conditions. We consider a behaviour as safe that avoids regions of state space with high uncertainty in the outcomes of actions. We propose an optimization objective that learns safe options by encouraging the agent to visit states with higher behavioural consistency. The proposed objective results in a trade-off between maximizing the standard expected return and minimizing the effect of model uncertainty in the return. We propose a policy gradient algorithm to optimize the constrained objective function. We examine the quantitative and qualitative behaviours of the proposed approach in a tabular grid world, continuous-state puddle world, and three games from the Arcade Learning Environment: Ms. Pacman, Amidar, and Q*Bert. Our approach achieves a reduction in the variance of return, boosts performance in environments with intrinsic variability in the reward structure, and compares favourably both with primitive actions and with risk-neutral options.

Download Full-text

Preceding vehicle following algorithm with human driving characteristics

Proceedings of the Institution of Mechanical Engineers Part D Journal of Automobile Engineering ◽

10.1177/0954407020981546 ◽

2021 ◽

pp. 095440702098154

Author(s):

Feng Pan ◽

Hong Bao

Keyword(s):

Reinforcement Learning ◽

Weight Vector ◽

Gradient Algorithm ◽

Inner Product ◽

Inverse Reinforcement Learning ◽

Reward Function ◽

Human Driver ◽

Policy Gradient ◽

Preceding Vehicle ◽

Action Spaces

This paper proposes a new approach of using reinforcement learning (RL) to train an agent to perform the task of vehicle following with human driving characteristics. We refer to the ideal of inverse reinforcement learning to design the reward function of the RL model. The factors that need to be weighed in vehicle following were vectorized into reward vectors, and the reward function was defined as the inner product of the reward vector and weights. Driving data of human drivers was collected and analyzed to obtain the true reward function. The RL model was trained with the deterministic policy gradient algorithm because the state and action spaces are continuous. We adjusted the weight vector of the reward function so that the value vector of the RL model could continuously approach that of a human driver. After dozens of rounds of training, we selected the policy with the nearest value vector to that of a human driver and tested it in the PanoSim simulation environment. The results showed the desired performance for the task of an agent following the preceding vehicle safely and smoothly.

Download Full-text

Data-Driven Inventory Management: Integrated Estimation and Optimization

SSRN Electronic Journal ◽

10.2139/ssrn.3256643 ◽

2018 ◽

Author(s):

Fabian Taigel ◽

Jan Meller

Keyword(s):

Inventory Management ◽

Data Driven ◽

Integrated Estimation

Download Full-text

Deterministic Value-Policy Gradients

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i04.5732 ◽

2020 ◽

Vol 34 (04) ◽

pp. 3316-3323

Author(s):

Qingpeng Cai ◽

Ling Pan ◽

Pingzhong Tang

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

Learning Algorithms ◽

Infinite Horizon ◽

Gradient Algorithm ◽

Continuous Control ◽

Model Bias ◽

Model Free ◽

Policy Gradient ◽

Analytical Gradients

Reinforcement learning algorithms such as the deep deterministic policy gradient algorithm (DDPG) has been widely used in continuous control tasks. However, the model-free DDPG algorithm suffers from high sample complexity. In this paper we consider the deterministic value gradients to improve the sample efficiency of deep reinforcement learning algorithms. Previous works consider deterministic value gradients with the finite horizon, but it is too myopic compared with infinite horizon. We firstly give a theoretical guarantee of the existence of the value gradients in this infinite setting. Based on this theoretical guarantee, we propose a class of the deterministic value gradient algorithm (DVG) with infinite horizon, and different rollout steps of the analytical gradients by the learned model trade off between the variance of the value gradients and the model bias. Furthermore, to better combine the model-based deterministic value gradient estimators with the model-free deterministic policy gradient estimator, we propose the deterministic value-policy gradient (DVPG) algorithm. We finally conduct extensive experiments comparing DVPG with state-of-the-art methods on several standard continuous control benchmarks. Results demonstrate that DVPG substantially outperforms other baselines.

Download Full-text

A Novel Solution to JSPs Based on Long Short-Term Memory and Policy Gradient Algorithm

International Journal of Simulation Modelling ◽

10.2507/ijsimm19-1-co4 ◽

2020 ◽

Vol 19 (1) ◽

pp. 157-168

Author(s):

J. F. Ren ◽

C. M. Ye ◽

F. Yang

Keyword(s):

Short Term Memory ◽

Gradient Algorithm ◽

Short Term ◽

Term Memory ◽

Policy Gradient ◽

Long Short Term Memory

Download Full-text

End-to-end optimization of goal-driven and visually grounded dialogue systems

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/385 ◽

2017 ◽

Cited By ~ 16

Author(s):

Florian Strub ◽

Harm de Vries ◽

Jérémie Mary ◽

Bilal Piot ◽

Aaron Courville ◽

...

Keyword(s):

Gradient Algorithm ◽

Planning Problem ◽

Dialogue Systems ◽

Generation Task ◽

Policy Gradient ◽

Specific Object ◽

Complex Picture ◽

History Of ◽

End To End ◽

Task Oriented

End-to-end design of dialogue systems has recently become a popular research topic thanks to powerful tools such as encoder-decoder architectures for sequence-to-sequence learning. Yet, most current approaches cast human-machine dialogue management as a supervised learning problem, aiming at predicting the next utterance of a participant given the full history of the dialogue. This vision may fail to correctly render the planning problem inherent to dialogue as well as its contextual and grounded nature. In this paper, we introduce a Deep Reinforcement Learning method to optimize visually grounded task-oriented dialogues, based on the policy gradient algorithm. This approach is tested on the question generation task from the dataset GuessWhat?! containing 120k dialogues and provides encouraging results at solving both the problem of generating natural dialogues and the task of discovering a specific object in a complex picture.

Download Full-text