Data-Driven Reinforcement-Learning-Based Automatic Bucket-Filling for Wheel Loaders

2021 ◽  
Vol 11 (19) ◽  
pp. 9191
Author(s):  
Jianfei Huang ◽  
Dewen Kong ◽  
Guangzong Gao ◽  
Xinchun Cheng ◽  
Jinshi Chen

Automation of bucket-filling is of crucial significance to fully automated wheel-loader systems. Most previous work is based on a physical model, which cannot adapt to changeable, complicated working environments. Thus, this paper proposes a data-driven reinforcement-learning (RL)-based approach to automatic bucket-filling. An automatic bucket-filling algorithm based on Q-learning is developed to enhance the adaptability of the autonomous scooping system. A nonlinear, non-parametric statistical model is also built to approximate the real working environment using actual data obtained from tests; this statistical model predicts the state of the wheel loader during the bucket-filling process, and the proposed algorithm is trained on it. The training results confirm that the proposed algorithm performs well in adaptability, convergence, and fuel consumption in the absence of a physical model. The results also demonstrate the transfer-learning capability of the approach, which can be applied to different machine-pile environments.
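
The paper's implementation is not given in the abstract; the sketch below illustrates the core idea in miniature, assuming a k-nearest-neighbours regressor as a stand-in for the non-parametric statistical model, a one-dimensional bucket-fill state, and a fill-progress reward. All names, dynamics, and constants are illustrative assumptions, not the authors' code.

```python
import numpy as np
from collections import defaultdict
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
N_ACTIONS, N_BINS = 3, 10        # assumed action set and state discretisation

# Stand-in for the paper's non-parametric model: predict the next (scalar)
# bucket-fill state from (state, action), fitted on synthetic "logged" data.
states = rng.uniform(0.0, 1.0, 500)
actions = rng.integers(0, N_ACTIONS, 500)
next_states = np.clip(states + 0.05 * (actions + 1), 0.0, 1.0)  # toy dynamics
model = KNeighborsRegressor(n_neighbors=5).fit(
    np.column_stack([states, actions]), next_states)

Q = defaultdict(lambda: np.zeros(N_ACTIONS))
alpha, gamma, eps = 0.1, 0.95, 0.1
bin_of = lambda s: min(int(s * N_BINS), N_BINS - 1)

for episode in range(200):
    s = 0.0                                      # e.g. an empty bucket
    for _ in range(30):
        k = bin_of(s)
        a = int(rng.integers(N_ACTIONS)) if rng.random() < eps \
            else int(np.argmax(Q[k]))
        s2 = float(model.predict([[s, a]])[0])   # model-predicted next state
        r = s2 - s                               # assumed reward: fill progress
        # Standard Q-learning update, run entirely on the learned model
        Q[k][a] += alpha * (r + gamma * Q[bin_of(s2)].max() - Q[k][a])
        s = s2
```

Because the agent is trained against the fitted model rather than a physical one, the same loop can be re-run on data logged from a different machine-pile environment, which mirrors the transfer setting the abstract describes.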

2019 ◽  
Author(s):  
Jimut Bahan Pal ◽  
Debadri Chatterjee ◽  
Sounak Modak

Reinforcement Learning (RL) is a model-free machine learning technique in which an agent learns its behaviour by interacting directly with the environment. This is preferable to an offline planner because it is almost impossible to simulate the real world fully in a computer: through interaction, the agent learns features that can only be acquired in a real-world environment, giving it a learning capability akin to that of living organisms. Since the RL agent gets its feedback from the environment, it can automatically determine the behaviours that are considered ideal within a specified context. Reinforcement learning is deemed important in the field of artificial intelligence as it continues to produce breakthroughs and benchmarks in various industrial applications. In previous work we analysed the Pacman game with a reflex agent; here, we make the Pacman agent smarter by successfully applying an RL technique, namely Q-learning.
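
As a concrete illustration, here is a minimal sketch of feature-based approximate Q-learning, a standard way to apply Q-learning when the full Pacman state space is too large for a table. The one-dimensional corridor, the two features, and all constants below are hypothetical stand-ins for the real game.

```python
import random

random.seed(0)
ALPHA, GAMMA, EPS = 0.2, 0.9, 0.1
ACTIONS = (-1, +1)                 # move left / right in a toy corridor

def feats(pos, food, a):
    """Hypothetical features: closeness to the food after the move, plus a bias."""
    nxt = max(0, min(9, pos + a))
    return {"closeness": 1.0 - abs(food - nxt) / 9.0, "bias": 1.0}

w = {"closeness": 0.0, "bias": 0.0}
q = lambda pos, food, a: sum(w[f] * v for f, v in feats(pos, food, a).items())

for episode in range(300):
    pos, food, steps = 0, 9, 0
    while pos != food and steps < 200:
        steps += 1
        a = random.choice(ACTIONS) if random.random() < EPS \
            else max(ACTIONS, key=lambda x: q(pos, food, x))
        nxt = max(0, min(9, pos + a))
        r = 10.0 if nxt == food else -0.1      # pellet reward / living cost
        best_next = 0.0 if nxt == food else max(q(nxt, food, x) for x in ACTIONS)
        diff = (r + GAMMA * best_next) - q(pos, food, a)
        for f, v in feats(pos, food, a).items():
            w[f] += ALPHA * diff * v           # gradient-style weight update
        pos = nxt
```

Learning weights over features rather than a table of raw states is what lets the agent generalise to board positions it has never visited.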


Author(s):  
Faxin Qi ◽  
Xiangrong Tong ◽  
Lei Yu ◽  
Yingjie Wang

With the development of the Internet and the progress of human-centered computing (HCC), human-machine collaborative work has become increasingly common. Valuable information on the Internet, such as user behavior and social labels, is often provided by users. Trust-based recommendation is an important human-computer interaction application in social networks. However, previous studies generally assume that the trust value between users is static and so cannot respond to dynamic changes in user trust and preferences in a timely manner. In fact, after receiving a recommendation, there is a difference between the actual evaluation and the expected evaluation, and this difference is correlated with the trust value. Based on the dynamics of trust and the process by which trust between users changes, this paper proposes a trust-boosting method built on reinforcement learning. A recursive least squares (RLS) algorithm is used to learn the dynamic impact of the evaluation difference on a user's trust. In addition, a reinforcement learning method, Deep Q-Learning (DQN), is studied to simulate the process of learning a user's preferences and boosting the trust value. Experiments indicate that our method, applied to recommendation systems, responds quickly to changes in user preferences. Compared with other methods, it achieves better recommendation accuracy.
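
The RLS component can be illustrated with the textbook recursive least squares update. In this sketch the input vector is assumed to encode the difference between actual and expected evaluation (plus a bias term), and the target is the observed change in trust; these roles and all constants are assumptions, not the paper's exact formulation.

```python
import numpy as np

d = 2                    # hypothetical feature dimension
theta = np.zeros(d)      # learned impact of evaluation difference on trust
P = np.eye(d) * 1000.0   # inverse input covariance; large = high initial uncertainty
lam = 0.98               # forgetting factor, so old interactions fade out

def rls_update(x, y):
    """One recursive least squares step on an (input, target) pair."""
    global P, theta
    x = np.asarray(x, dtype=float)
    Px = P @ x
    k = Px / (lam + x @ Px)              # gain vector
    theta = theta + k * (y - theta @ x)  # correct by the prediction error
    P = (P - np.outer(k, Px)) / lam      # update the inverse covariance

# e.g. evaluation fell 0.3 short of expectation and trust dropped by 0.05
rls_update([-0.3, 1.0], -0.05)
print(theta)   # current estimate of how evaluation difference moves trust
```

The forgetting factor is what makes the estimate dynamic: recent evaluation differences dominate, so the learned impact tracks a user's changing behaviour.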


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck-dispatching policy approach that adapts to different mining-complex configurations in order to deliver the supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining-complex context. Several sources of operational uncertainty arising from the loading, hauling, and dumping activities can influence the dispatching strategy. Given a fixed extraction sequence of mining blocks provided by the short-term plan, a discrete-event simulator emulates the interactions arising from these mining operations. Repeated runs of this simulator, together with a reward function that assigns a score to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement-learning model. The model learns from past dispatching experience, so that when a new task is required, a well-informed decision can be taken quickly. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in production targets, metal production, and fleet management.
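
A skeleton of this training setup might look as follows: a stubbed one-step function stands in for the discrete-event simulator, and a generic deep Q-learning loop with a replay buffer (here in PyTorch) learns from the generated experiences. The dimensions, network, and reward below are illustrative assumptions, not the authors' implementation.

```python
import collections
import random
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS = 8, 4    # e.g. fleet/queue features, candidate destinations
GAMMA, EPS, BATCH = 0.99, 0.1, 64

qnet = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)
buffer = collections.deque(maxlen=10_000)

def simulate_step(state, action):
    """Stub for one tick of the discrete-event simulator."""
    next_state = torch.rand(STATE_DIM)
    reward = random.random()   # score the reward function assigns to the decision
    return next_state, reward

state = torch.rand(STATE_DIM)
for step in range(1_000):
    with torch.no_grad():
        action = random.randrange(N_ACTIONS) if random.random() < EPS \
            else int(qnet(state).argmax())
    next_state, reward = simulate_step(state, action)
    buffer.append((state, action, reward, next_state))
    state = next_state
    if len(buffer) >= BATCH:   # learn from past dispatching experience
        s, a, r, s2 = zip(*random.sample(buffer, BATCH))
        s, s2 = torch.stack(s), torch.stack(s2)
        a, r = torch.tensor(a), torch.tensor(r)
        with torch.no_grad():
            target = r + GAMMA * qnet(s2).max(dim=1).values
        pred = qnet(s).gather(1, a.unsqueeze(1)).squeeze(1)
        loss = nn.functional.mse_loss(pred, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
```

Once trained, answering a new dispatching request reduces to a single forward pass (`qnet(state).argmax()`), which is why a well-informed decision can be taken quickly.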


Aerospace ◽  
2021 ◽  
Vol 8 (4) ◽  
pp. 113
Author(s):  
Pedro Andrade ◽  
Catarina Silva ◽  
Bernardete Ribeiro ◽  
Bruno F. Santos

This paper presents a Reinforcement Learning (RL) approach to optimize the long-term scheduling of maintenance for an aircraft fleet. The problem considers fleet status, maintenance capacity, and other maintenance constraints to schedule hangar checks over a specified time horizon. Each check must be scheduled within an interval, and the goal is to schedule it as close as possible to its due date; in doing so, the number of checks is reduced and fleet availability increases. A Deep Q-learning algorithm is used to optimize the scheduling policy. The model is validated in a real scenario using maintenance data from 45 aircraft. The maintenance plan generated with our approach is compared with a previous study, which presented a Dynamic Programming (DP)-based approach, and with airline estimations for the same period. The results show a reduction in the number of checks scheduled, which indicates the potential of RL for this problem. The adaptability of RL is also tested by introducing small disturbances in the initial conditions. After training the model on these simulated scenarios, the results show the robustness of the RL approach and its ability to generate efficient maintenance plans in only a few seconds.
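
The "as close as possible to the due date" objective suggests a simple reward shape. The following is a hypothetical sketch with an assumed 30-day tolerance window, not the paper's actual reward function:

```python
def check_reward(scheduled_day: int, due_day: int) -> float:
    """Hypothetical reward: best on the due date, penalised for scheduling
    early, and strongly penalised if the check slips past its due date."""
    if scheduled_day > due_day:
        return -100.0                # infeasible: a check cannot be late
    days_early = due_day - scheduled_day
    return 1.0 - days_early / 30.0   # assumed 30-day tolerance window

print(check_reward(100, 100), check_reward(85, 100))   # 1.0 vs. 0.5
```

Scheduling near the due date uses more of each check interval, which is why the abstract notes that the number of checks falls and availability rises.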


Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps UAVs choose the correct action in each state according to a policy. In an unknown environment, hand-crafted rules for choosing actions are not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar-state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV following the policy learned by our algorithm has a greater probability of choosing the optimal action than one following the policy of the classic Q-learning algorithm. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail, and the experimental results show that it can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.
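
The similar-state-matching idea can be sketched as a Q-table lookup that falls back to the nearest previously visited state instead of a cold zero initialisation. The distance metric, state encoding, and seeding rule below are assumptions for illustration, not the paper's exact matching procedure.

```python
import numpy as np

N_ACTIONS = 4
Q = {}    # state tuple -> array of action values

def q_values(state):
    """Return Q-values for `state`, seeding unseen states from the most
    similar visited state (Euclidean distance, an assumed metric)."""
    key = tuple(np.round(state, 2))
    if key not in Q:
        if Q:   # match against previously visited states
            nearest = min(Q, key=lambda k: np.linalg.norm(np.subtract(k, state)))
            Q[key] = Q[nearest].copy()
        else:
            Q[key] = np.zeros(N_ACTIONS)
    return Q[key]

# e.g. a state 0.01 away from a visited one reuses that state's values,
# so the greedy action is more likely to be optimal than under a cold start
Q[(0.0, 0.0)] = np.array([0.0, 1.0, 0.0, 0.0])
print(q_values(np.array([0.01, 0.0])).argmax())   # -> 1
```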

