Categorization of material quality using a model-free reinforcement learning algorithm

2021 ◽  
Vol 2107 (1) ◽  
pp. 012027
Author(s):  
Annapoorni Mani ◽  
Shahriman Abu Bakar ◽  
Pranesh Krishnan ◽  
Sazali Yaacob

Abstract Reinforcement learning is among the most preferred algorithms for optimization problems in industrial automation. Model-free reinforcement learning algorithms optimize for rewards without knowledge of the environment dynamics and require less computation. Regulating the quality of the raw materials in the inbound inventory can improve the manufacturing process. In this paper, the raw materials arriving at the incoming inspection process are categorized and labeled based on their quality, as reflected in the path they travel. A model-free temporal difference learning approach is used to predict the acceptance and rejection paths of raw materials in the incoming inspection process. The algorithm identified eight paths that the raw materials could travel: four correspond to material acceptance, while the rest lead to material rejection. The materials are annotated using the total scores acquired in the incoming inspection process. Materials traveling on the ideal path (path A) receive the highest total score; among the remaining accepted materials, path B scores 7.37% lower, while paths C and D score 37.28% and 42.44% lower than the ideal path, respectively.
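
The abstract does not give implementation details, but a tabular TD(0) value prediction of the kind it names can be sketched as follows; the inspection stages, terminal rewards, and episode generator below are illustrative assumptions, not the authors' actual setup:

```python
import random

# Hypothetical inspection stages; the real paper's state space is not specified.
# Each episode walks a raw material through checks until it is accepted or rejected.
STATES = ["arrival", "visual_check", "dimension_check", "accept", "reject"]
TERMINAL = {"accept": 1.0, "reject": -1.0}   # assumed terminal rewards

ALPHA, GAMMA = 0.1, 0.9
V = {s: 0.0 for s in STATES}

def sample_episode():
    """Illustrative random walk through the inspection process."""
    path = ["arrival", "visual_check", "dimension_check"]
    path.append("accept" if random.random() < 0.7 else "reject")
    return path

for _ in range(5000):
    episode = sample_episode()
    for s, s_next in zip(episode, episode[1:]):
        r = TERMINAL.get(s_next, 0.0)                  # reward on reaching a terminal state
        v_next = 0.0 if s_next in TERMINAL else V[s_next]
        # TD(0) update: move V(s) toward the bootstrapped target r + gamma * V(s')
        V[s] += ALPHA * (r + GAMMA * v_next - V[s])

print({s: round(v, 3) for s, v in V.items()})
```

The learned state values then rank the stages by expected outcome, which is the sense in which a TD predictor can separate acceptance paths from rejection paths.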

2021 ◽  
Vol 2107 (1) ◽  
pp. 012026
Author(s):  
Annapoorni Mani ◽  
Shahriman Abu Bakar ◽  
Pranesh Krishnan ◽  
Sazali Yaacob

Abstract Reinforcement learning is one of the promising approaches for operations research problems. The incoming inspection process in any manufacturing plant aims to control quality, reduce manufacturing costs, eliminate scrap, and avoid process failure downtimes due to non-conforming raw materials. Predicting the raw material acceptance rate can guide raw material supplier selection and improve the manufacturing process by filtering out non-conformities. This paper presents a Markov model developed to estimate the probability of a raw material being accepted or rejected in an incoming inspection environment. The proposed forecasting model is further optimized for efficiency using two reinforcement learning algorithms (dynamic programming and temporal differencing). The results of the two optimized models are compared, and the findings are discussed.
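
As a rough illustration of how the two optimization routes differ, the sketch below estimates an acceptance probability on a toy absorbing Markov chain, first by a dynamic-programming fixed-point sweep and then by temporal-difference sampling; the chain and its transition probabilities are assumptions, not the paper's fitted model:

```python
import random

# Illustrative inspection chain with absorbing accept/reject states;
# the transition probabilities below are assumed, not the paper's values.
P = {
    "inspect": {"rework": 0.2, "accept": 0.6, "reject": 0.2},
    "rework":  {"inspect": 0.5, "reject": 0.5},
}

# --- Dynamic programming: iterate p(s) = sum_s' P(s, s') * p(s') to a fixed point
p = {"inspect": 0.0, "rework": 0.0, "accept": 1.0, "reject": 0.0}
for _ in range(100):
    for s, trans in P.items():
        p[s] = sum(prob * p[s2] for s2, prob in trans.items())
print("DP acceptance probability from 'inspect':", round(p["inspect"], 4))

# --- Temporal differencing: sample transitions and bootstrap the same quantity
V = {"inspect": 0.0, "rework": 0.0, "accept": 1.0, "reject": 0.0}
ALPHA = 0.05
for _ in range(20000):
    s = "inspect"
    while s in P:
        s2 = random.choices(list(P[s]), weights=list(P[s].values()))[0]
        V[s] += ALPHA * (V[s2] - V[s])   # reward 0, gamma 1: value = P(accept)
        s = s2
print("TD estimate from 'inspect':", round(V["inspect"], 4))
```

Both estimates converge to the same probability (2/3 here); DP needs the transition model, while TD needs only sampled transitions.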


2005 ◽  
Vol 24 ◽  
pp. 81-108 ◽  
Author(s):  
P. Geibel ◽  
F. Wysotzki

In this paper, we consider Markov Decision Processes (MDPs) with error states. Error states are states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function and the risk. The weight parameter is adapted in order to find a feasible solution for the constrained problem that performs well with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved under certain assumptions on the model to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
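
The abstract describes the algorithm only at a high level; one loose reading of "weighting the original value function and the risk" is sketched below, with separate value and risk estimates and a crude adaptation of the weight. The toy dynamics, threshold, and schedules are placeholders, not the authors' feed-tank task:

```python
import random
from collections import defaultdict

ACTIONS = ["a0", "a1"]
Q = defaultdict(float)          # expected return estimate
R = defaultdict(float)          # risk estimate: cumulative 0/1 "error entered" signal
ALPHA, GAMMA, EPS = 0.1, 0.95, 0.1
xi = 1.0                        # weight on risk, adapted toward the risk threshold
RISK_THRESHOLD = 0.08           # assumed user-specified bound

def greedy(s):
    # Act greedily on the weighted combination Q(s,a) - xi * R(s,a)
    return max(ACTIONS, key=lambda a: Q[(s, a)] - xi * R[(s, a)])

def step(s, a):
    """Placeholder dynamics: (next_state, reward, entered_error, done)."""
    if random.random() < (0.05 if a == "a0" else 0.15):
        return s, 0.0, True, True            # entered an error state
    if random.random() < 0.2:
        return s, 1.0, False, True           # safe termination with reward
    return s, 0.1, False, False

for _ in range(10000):
    s, done = "s0", False
    while not done:
        a = random.choice(ACTIONS) if random.random() < EPS else greedy(s)
        s2, r, err, done = step(s, a)
        q_next = 0.0 if done else max(Q[(s2, b)] for b in ACTIONS)
        r_next = 0.0 if done else R[(s2, greedy(s2))]
        Q[(s, a)] += ALPHA * (r + GAMMA * q_next - Q[(s, a)])
        R[(s, a)] += ALPHA * ((1.0 if err else 0.0) + r_next - R[(s, a)])
        s = s2
    # Crude weight adaptation: raise xi while the learned policy's risk
    # exceeds the threshold, lower it otherwise to recover value performance.
    rho = R[("s0", greedy("s0"))]
    xi = max(0.0, xi + (0.01 if rho > RISK_THRESHOLD else -0.01))
```

Note that the risk is learned as its own cumulative return (undiscounted 0/1 signal), independent of the reward, matching the paper's two-criteria formulation.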


Author(s):  
T.S. Morozova

A study into the failure causes of mixing and charging equipment confirms that the main factor affecting the probability of accidents is the use of raw materials that do not meet specifications and have unstable properties. The raw materials used to prepare explosives for mechanized charging of boreholes include components such as ammonium nitrate, emulsion phase, diesel fuel, and emulsifier, among others. The paper describes the application of various formulations with these components in specific types of mixing and charging machines manufactured by AZOTTECH LLC. The main properties that affect the quality of raw materials are summarised, and the incoming inspection of explosive components is described as part of the acceptance procedure at temporary storage sites at a hazardous production facility. The paper describes common types of equipment failures and maintenance procedures when substandard raw materials are used. The conclusion highlights the key practices for improving equipment uptime, as well as recommendations for incoming inspection and the use of high-quality explosive components.


2015 ◽  
Vol 12 (03) ◽  
pp. 1550028 ◽  
Author(s):  
Rok Vuga ◽  
Bojan Nemec ◽  
Aleš Ude

In this paper, we propose an integrated policy learning framework that fuses iterative learning control (ILC) and reinforcement learning. Integration is accomplished at the exploration level of the reinforcement learning algorithm. The proposed algorithm combines the fast convergence of iterative learning control with the robustness of reinforcement learning. This way, the advantages of both approaches are retained while their respective limitations are overcome. The proposed approach was verified in simulation and in real robot experiments on three challenging motion optimization problems.
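
The fusion mechanism is only summarized in the abstract, so the following is a speculative sketch of one way ILC and episodic RL exploration could be combined: the ILC correction proposes the next command, and reward-weighted exploration perturbs around it. The toy plant, gains, and cost are illustrative assumptions, not the paper's robot task:

```python
import numpy as np

T = 50
target = np.sin(np.linspace(0, np.pi, T))   # desired trajectory
u = np.zeros(T)                             # feedforward command (the policy)
L_GAIN, SIGMA, N_ROLLOUTS = 0.5, 0.05, 8

def plant(u):
    """Toy linear plant with a fixed disturbance (stand-in for the real system)."""
    return 0.9 * u + 0.05

for _ in range(100):
    error = target - plant(u)
    u_ilc = u + L_GAIN * error              # classic ILC update: u_{k+1} = u_k + L * e_k
    # RL-style exploration *around the ILC proposal*: sample perturbed rollouts
    candidates = [u_ilc + SIGMA * np.random.randn(T) for _ in range(N_ROLLOUTS)]
    costs = [np.sum((target - plant(c)) ** 2) for c in candidates]
    # Reward-weighted averaging over the perturbed rollouts (PI2-flavoured)
    w = np.exp(-np.array(costs) / (np.mean(costs) + 1e-9))
    w /= w.sum()
    u = sum(wi * ci for wi, ci in zip(w, candidates))

print("final tracking cost:", np.sum((target - plant(u)) ** 2))
```

The ILC term drives fast convergence when the error signal is informative, while the stochastic rollouts keep the search robust when it is not, which is the trade-off the abstract describes.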


Radiocarbon ◽  
2020 ◽  
pp. 1-8
Author(s):  
Mariaelena Fedi ◽  
Serena Barone ◽  
Luca Carraresi ◽  
Simona Dominici ◽  
Lucia Liccioli

ABSTRACT When dating documents by radiocarbon (14C), what we typically measure is the 14C concentration of the support (e.g., paper, parchment, or papyrus). This can, however, lead to a possible misinterpretation of results, because the support may be older than the writing itself. To minimize such ambiguity, the ideal approach would be direct dating of the ink or color (if organic). Here we propose a feasibility study to date carbon-based black inks deposited on papyrus, one of the most widespread writing supports used in the past. We prepared test samples using a commercial papyrus and a homemade black ink obtained by combining modern charcoal fragments and gum Arabic. Even though the ink binder might have been the best candidate for dating, we verified by FTIR that the molecular composition of its soluble fraction is very similar to that of the papyrus extractives, thus identifying the residual charcoal recovered after extraction as the most suitable material for the measurement. Enough charcoal material was extracted from the test samples and processed using our new setup optimized for microgram-size samples. The overall experimental procedure was found to be reproducible, and the measured 14C concentrations were consistent with the data obtained from larger samples and raw materials.


Author(s):  
László Orgován ◽  
Tamás Bécsi ◽  
Szilárd Aradi

Autonomous vehicles, or self-driving cars, are prevalent nowadays; many vehicle manufacturers and other tech companies are trying to develop them. One major goal of self-driving algorithms is to perform manoeuvres safely, even when some anomaly arises. To solve such complex issues, Artificial Intelligence and Machine Learning methods are used. One such motion planning problem occurs when the tires lose their grip on the road; an autonomous vehicle should be able to handle this situation. This paper therefore presents an autonomous drifting algorithm using Reinforcement Learning. The algorithm is based on a model-free learning algorithm, Twin Delayed Deep Deterministic Policy Gradient (TD3). The model is trained on six different tracks in CARLA, a simulator developed specifically for autonomous driving systems.
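
TD3 itself is a published algorithm, so its core update step can be sketched; everything environment-specific below (network sizes, observation/action dimensions, action bounds, the replay batch) is an illustrative assumption rather than the paper's CARLA setup:

```python
import torch
import torch.nn as nn

obs_dim, act_dim, act_limit = 8, 2, 1.0     # assumed dimensions

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, out_dim))

actor, actor_targ = mlp(obs_dim, act_dim), mlp(obs_dim, act_dim)
q1, q2 = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
q1_targ, q2_targ = mlp(obs_dim + act_dim, 1), mlp(obs_dim + act_dim, 1)
for targ, src in [(actor_targ, actor), (q1_targ, q1), (q2_targ, q2)]:
    targ.load_state_dict(src.state_dict())

pi_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)
q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=1e-3)
GAMMA, TAU, NOISE, CLIP, DELAY = 0.99, 0.005, 0.2, 0.5, 2

def td3_update(batch, step):
    o, a, r, o2, done = batch               # tensors sampled from a replay buffer
    with torch.no_grad():
        # Target policy smoothing: clipped noise on the target action
        eps = (torch.randn_like(a) * NOISE).clamp(-CLIP, CLIP)
        a2 = (torch.tanh(actor_targ(o2)) * act_limit + eps).clamp(-act_limit, act_limit)
        # Clipped double-Q target: minimum of the twin target critics
        q_in = torch.cat([o2, a2], 1)
        q_targ = torch.min(q1_targ(q_in), q2_targ(q_in)).squeeze(-1)
        backup = r + GAMMA * (1 - done) * q_targ
    q_loss = ((q1(torch.cat([o, a], 1)).squeeze(-1) - backup) ** 2).mean() + \
             ((q2(torch.cat([o, a], 1)).squeeze(-1) - backup) ** 2).mean()
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    if step % DELAY == 0:                   # delayed actor and target-network updates
        pi = torch.tanh(actor(o)) * act_limit
        pi_loss = -q1(torch.cat([o, pi], 1)).mean()
        pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
        with torch.no_grad():
            for targ, src in [(actor_targ, actor), (q1_targ, q1), (q2_targ, q2)]:
                for p_t, p in zip(targ.parameters(), src.parameters()):
                    p_t.mul_(1 - TAU).add_(TAU * p)     # Polyak averaging
```

The three TD3 ingredients (twin critics, target smoothing, delayed updates) all counteract Q-value overestimation, which matters in a high-speed drifting task where optimistic values would push the policy past the grip limit.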


2021 ◽  
Vol 2021 ◽  
pp. 1-12
Author(s):  
Qi Yongqiang ◽  
Yang Hailan ◽  
Rong Dan ◽  
Ke Yi ◽  
Lu Dongchen ◽  
...  

This paper proposes a goal-directed locomotion method for a snake-shaped robot in a 3D complex environment based on path-integral reinforcement learning. The method uses a model-free online Q-learning algorithm to evaluate action strategies and optimizes decision-making through repeated "exploration-learning-utilization" cycles, enabling the snake-shaped robot to complete goal-directed locomotion in the 3D complex environment. The appropriate locomotion control parameters, such as joint angles and screw-drive velocities, can be learned by path-integral reinforcement learning, and the learned parameters are successfully transferred to the snake-shaped robot. Simulation results show that the planned path avoids all obstacles and reaches the destination smoothly and swiftly.
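
The abstract names model-free online Q-learning and an "exploration-learning-utilization" cycle; the sketch below shows that cycle on a toy 2D grid with obstacles. The grid, rewards, and epsilon-greedy schedule are assumptions standing in for the snake robot's actual 3D control parameters:

```python
import random

SIZE, GOAL = 5, (4, 4)
OBSTACLES = {(1, 1), (2, 3), (3, 1)}        # assumed obstacle layout
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
Q = {((x, y), a): 0.0 for x in range(SIZE) for y in range(SIZE) for a in ACTIONS}
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2

def step(s, a):
    dx, dy = ACTIONS[a]
    s2 = (min(max(s[0] + dx, 0), SIZE - 1), min(max(s[1] + dy, 0), SIZE - 1))
    if s2 in OBSTACLES:
        return s, -1.0, False               # bump into obstacle: stay put, penalty
    return s2, (10.0 if s2 == GOAL else -0.1), s2 == GOAL

for _ in range(3000):
    s, done = (0, 0), False
    while not done:
        # exploration vs. utilization: epsilon-greedy action selection
        a = random.choice(list(ACTIONS)) if random.random() < EPS \
            else max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        # learning: off-policy Q-learning update toward the greedy target
        target = r + GAMMA * (0.0 if done else max(Q[(s2, b)] for b in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2
```

After training, following the greedy action from each cell traces an obstacle-free path to the goal, the discrete analogue of the smooth planned path reported in the simulation results.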

