Comparison of optimized Markov Decision Process using Dynamic Programming and Temporal Differencing – A reinforcement learning approach

2021 ◽  
Vol 2107 (1) ◽  
pp. 012026
Author(s):  
Annapoorni Mani ◽  
Shahriman Abu Bakar ◽  
Pranesh Krishnan ◽  
Sazali Yaacob

Abstract Reinforcement learning is one of the promising approaches for operations research problems. The incoming inspection process in any manufacturing plant aims to control quality, reduce manufacturing costs, eliminate scrap, and reduce process failure downtime due to non-conforming raw materials. Prediction of the raw material acceptance rate can regulate raw material supplier selection and improve the manufacturing process by filtering out non-conformities. This paper presents a Markov model developed to estimate the probability of the raw material being accepted or rejected in an incoming inspection environment. The proposed forecasting model is further optimized for efficiency using two reinforcement learning algorithms (dynamic programming and temporal differencing). The results of the two optimized models are compared, and the findings are discussed.
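As a rough illustration of the two optimization routes named in the abstract, the sketch below evaluates a toy accept/reject inspection chain, first by dynamic programming (repeated Bellman sweeps over the full transition model) and then by tabular TD(0) learning from sampled transitions. The states, probabilities, and rewards are assumptions chosen for illustration, not the authors' model or data.

```python
# Illustrative sketch only: a toy accept/reject inspection chain with made-up
# transition probabilities, evaluated by dynamic programming and by tabular TD(0).
import random

states = ["inspect", "accept", "reject"]        # hypothetical states
P = {"inspect": [("accept", 0.7), ("reject", 0.3)],
     "accept":  [("accept", 1.0)],              # absorbing
     "reject":  [("reject", 1.0)]}              # absorbing
R = {"inspect": 0.0, "accept": 1.0, "reject": -1.0}
gamma = 0.9

# Dynamic programming: sweep the Bellman equation until the value table converges.
V_dp = {s: 0.0 for s in states}
for _ in range(100):
    V_dp = {s: R[s] + gamma * sum(p * V_dp[s2] for s2, p in P[s]) for s in states}

# Temporal differencing (TD(0)): learn the same values from sampled transitions.
V_td = {s: 0.0 for s in states}
alpha = 0.1
for _ in range(5000):
    s = "inspect"
    for _ in range(10):
        s2 = random.choices([x for x, _ in P[s]], [p for _, p in P[s]])[0]
        V_td[s] += alpha * (R[s] + gamma * V_td[s2] - V_td[s])
        s = s2

print("DP estimate:", {k: round(v, 2) for k, v in V_dp.items()})
print("TD estimate:", {k: round(v, 2) for k, v in V_td.items()})
```

The two estimates should agree approximately: the DP sweep uses the full transition model, while TD(0) reaches a similar answer from sampled experience alone.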


2021 ◽  
Vol 2107 (1) ◽  
pp. 012025
Author(s):  
Annapoorni Mani ◽  
Shahriman Abu Bakar ◽  
Pranesh Krishnan ◽  
Sazali Yaacob

Abstract The incoming inspection process in any manufacturing plant aims to control quality, reduce manufacturing costs, eliminate scrap, and reduce process failure downtime due to defective raw materials. Prediction of the raw material acceptance rate can regulate raw material supplier selection and improve the manufacturing process by filtering out non-conformities. This paper presents a raw material acceptance prediction model (RMAP) developed based on Markov analysis. RFID tags are used to track the parts throughout the process, and a secondary dataset can be derived from the raw RFID data. In this study, a dataset is simulated to reflect a typical incoming inspection process consisting of six substations, including Packaging Inspection, Visual Inspection, Gauge Inspection, Rework 1, and Rework 2. The accepted parts are forwarded to the Pack and Store station and stored in the warehouse. The non-conforming parts are returned to the supplier. The proposed RMAP model estimates the probability of the raw material being accepted or rejected at each inspection station. The proposed model is evaluated using three test cases: case A (lower conformities), case B (higher conformities), and case C (equal chances of being accepted and rejected). The results are discussed based on the outcome of the limiting matrix for the three test cases. The steady-state matrix forecasts the probability of the raw material being in a random state. This prediction and forecasting ability of the proposed model enables industries to save time and cost.
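For readers unfamiliar with limiting matrices, the minimal sketch below shows how raising a Markov transition matrix to a high power approximates the steady-state forecast that the RMAP model relies on. The three-state transition matrix is hypothetical, not the paper's data.

```python
# Minimal sketch with made-up probabilities: approximate the limiting
# (steady-state) matrix of an inspection Markov chain by matrix powers.
import numpy as np

# States (illustrative): 0 = in inspection, 1 = accepted (Pack and Store), 2 = returned
P = np.array([[0.10, 0.60, 0.30],   # hypothetical transition probabilities
              [0.00, 1.00, 0.00],   # accepted is absorbing
              [0.00, 0.00, 1.00]])  # returned is absorbing

limiting = np.linalg.matrix_power(P, 200)   # P^n for large n
print(limiting.round(3))
# Row 0 now gives the long-run probability that a part starting in inspection
# is eventually accepted or returned to the supplier.
```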



2021 ◽  
Vol 2107 (1) ◽  
pp. 012027
Author(s):  
Annapoorni Mani ◽  
Shahriman Abu Bakar ◽  
Pranesh Krishnan ◽  
Sazali Yaacob

Abstract Reinforcement learning is one of the most preferred approaches for optimization problems in industrial automation. Model-free reinforcement learning algorithms optimize for rewards without knowledge of the environment dynamics and require less computation. Regulating the quality of the raw materials in the inbound inventory can improve the manufacturing process. In this paper, the raw materials arriving at the incoming inspection process are categorized and labeled based on their quality through the path traveled. A model-free temporal difference learning approach is used to predict the acceptance and rejection paths of raw materials in the incoming inspection process. The algorithm identified eight paths that the raw materials could travel. Four paths correspond to material acceptance, while the remaining four lead to material rejection. The materials are annotated using the total scores acquired in the incoming inspection process. The materials traveling on the ideal path (path A) get the highest total score. Among the other accepted materials, path B scores 7.37% lower, whereas paths C and D score 37.28% and 42.44% lower than the ideal path, respectively.
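A hedged illustration of model-free temporal difference learning on a hypothetical inspection-station graph follows; the station names, rewards, and transitions are assumptions, not the authors' environment. The direct acceptance route accumulates the highest learned value, mirroring how the ideal path earns the highest total score while rework detours lower it.

```python
# Illustrative sketch: tabular TD(0) estimating station values along
# hypothetical inspection routes; rework incurs a penalty, so the direct
# acceptance route ends with the highest learned value.
import random

# Hypothetical station graph: each entry maps a station to (next stations, step reward)
graph = {
    "packaging":  (["visual"], 1.0),
    "visual":     (["gauge", "rework"], 1.0),
    "gauge":      (["pack_store", "rework"], 1.0),
    "rework":     (["gauge"], -0.5),          # rework lowers the total score
    "pack_store": ([], 2.0),                  # accepted, terminal
}

V = {s: 0.0 for s in graph}
alpha, gamma = 0.1, 0.95

for _ in range(3000):
    s = "packaging"
    while True:
        nxt, r = graph[s]
        if not nxt:                            # terminal station
            V[s] += alpha * (r - V[s])
            break
        s2 = random.choice(nxt)
        V[s] += alpha * (r + gamma * V[s2] - V[s])
        s = s2

print({k: round(v, 2) for k, v in V.items()})
```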



Author(s):  
T.S. Morozova

A study into the failure causes of mixing and charging equipment confirms that the main factor affecting the probability of accidents is the use of raw materials that do not meet the specifications and have unstable properties. The raw materials used for explosives preparation in mechanized charging of boreholes include components such as ammonium nitrate, emulsion phase, diesel fuel, emulsifier, and others. The paper describes the application of various formulations with these components in specific types of mixing and charging machines manufactured by AZOTTECH LLC. The main properties that affect the quality of raw materials are summarised, and the incoming inspection of explosive components is described as part of the acceptance procedure at temporary storage sites at a hazardous production facility. The paper describes common types of equipment failures and maintenance procedures when using substandard raw materials. The conclusion highlights the key practices to improve equipment uptime, as well as recommendations for incoming inspection and the use of high-quality explosive components.



2021 ◽  
Vol 73 (09) ◽  
pp. 46-47
Author(s):  
Chris Carpenter

This article, written by JPT Technology Editor Chris Carpenter, contains highlights of paper SPE 201254, “Reinforcement Learning for Field-Development Policy Optimization,” by Giorgio De Paola, SPE, and Cristina Ibanez-Llano, Repsol, and Jesus Rios, IBM, et al., prepared for the 2020 SPE Annual Technical Conference and Exhibition, originally scheduled to be held in Denver, Colorado, 5–7 October. The paper has not been peer reviewed.

A field-development plan consists of a sequence of decisions. Each action taken affects the reservoir and conditions any future decision. The presence of uncertainty associated with this process, however, is undeniable. The novelty of the approach proposed by the authors in the complete paper is the consideration of the sequential nature of the decisions through the framework of dynamic programming (DP) and reinforcement learning (RL). This methodology allows moving the focus from a static field-development-plan optimization to a more dynamic framework that the authors call field-development policy optimization. This synopsis focuses on the methodology, while the complete paper also contains a real-field application of the methodology.

Methodology: Deep RL (DRL). RL is considered an important learning paradigm in artificial intelligence (AI) but differs from supervised and unsupervised learning, the most commonly known types currently studied in the field of machine learning. During the last decade, RL has attracted greater attention because of successes in applications related to games and self-driving cars, resulting from its combination with deep-learning architectures (DRL), which has allowed RL to scale to previously unsolvable problems and, therefore, to solve much larger sequential decision problems.

RL, also referred to as stochastic approximate dynamic programming, is a goal-directed paradigm of sequential learning from interaction. The learner, or agent, is not told what to do but instead has to learn which actions or decisions yield a maximum reward through interaction with an uncertain environment, without losing too much reward along the way. This learning from interaction to achieve a goal must balance the exploration and exploitation of possible actions. Another key characteristic of this type of problem is its sequential nature, where the actions taken by the agent affect the environment itself and, therefore, the subsequent data it receives and the subsequent actions to be taken. Mathematically, such problems are formulated in the framework of the Markov decision process (MDP), which primarily arises in the field of optimal control.

An RL problem consists of two principal parts: the agent, or decision-making engine, and the environment, the interactive world for the agent (in this case, the reservoir). Sequentially, at each timestep, the agent takes an action (e.g., changing control rates or deciding a well location) that makes the environment (reservoir) transition from one state to another. Next, the agent receives a reward (e.g., a cash flow) and an observation of the state of the environment (partial or total) before taking the next action. All relevant information informing the agent of the state of the system is assumed to be included in the last state observed by the agent (the Markov property). If the agent observes the full environment state once it has acted, the MDP is said to be fully observable; otherwise, a partially observable Markov decision process (POMDP) results. The agent’s objective is to learn a policy mapping from states (MDPs) or histories (POMDPs) to actions such that the agent’s cumulative (discounted) reward over the long run is maximized.
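The agent-environment loop described above can be sketched with tabular Q-learning on a made-up toy environment. The actual paper applies deep RL to a reservoir simulator, so everything below (states, the step function, the cash-flow-like reward) is illustrative only.

```python
# Schematic agent-environment loop: state -> action -> reward -> next state,
# using tabular Q-learning on a toy environment (illustrative, not the paper's setup).
import random

n_states, n_actions = 5, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Toy environment: a transition and a cash-flow-like reward (made up)."""
    next_state = min(state + 1, n_states - 1) if action == 1 else state
    reward = 1.0 if next_state == n_states - 1 else -0.1
    done = next_state == n_states - 1
    return next_state, reward, done

for _ in range(2000):                          # episodes
    s, done = 0, False
    while not done:
        # epsilon-greedy balance of exploration and exploitation
        a = random.randrange(n_actions) if random.random() < epsilon \
            else max(range(n_actions), key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])   # Q-learning update
        s = s2

print([[round(q, 2) for q in row] for row in Q])
```

In a POMDP setting the agent would condition on a history of observations instead of the raw state; here the toy state is fully observable, which is the simpler MDP case the synopsis describes first.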



2020 ◽  
pp. 87-94
Author(s):  
Rafał Kamprowski

This article examines the issue of ensuring raw material security as seen in the contemporary national security strategies of Poland. The temporal framework identified as “contemporary” indicates that the analysis embraced documents from the period 2003–2020, which mark a significant qualitative change in terms of defining and understanding security as an area which is not dominated by its hard, military aspect. The study aims to identify, discuss and compare various concepts of ensuring raw material security to Poland on the basis of the four strategies of Poland’s national security. The following research problems were formulated to achieve the research objectives defined in this way: to what extent is the issue of Poland’s raw material security present in the documents analyzed? What tools have the authors of the strategies employed to ensure security in Poland in terms of raw materials? What are the main difficulties in ensuring raw material security to Poland highlighted by the analyzed national security strategies? Are there any convergences in the visions of ensuring Poland’s raw material security presented in the analyzed strategies? The research questions formulated in this way served as the basis for the following research hypothesis: given an increase in non-military threats, the raw material-related dimension of security is increasingly emphasized in national security strategies. The research methods used in this article include the comparative method and source analysis. The technique of analysis was also used.



2015 ◽  
Vol 760 ◽  
pp. 659-664
Author(s):  
Dragoș Iliescu ◽  
Ion Diaconu ◽  
Ion Mateias ◽  
Marian Gheorghe

This paper explains a process improvement of the incoming inspection of materials used by a steelmaking enterprise. The applied methods are described with examples reflecting the needs of the improved process. The complexity of the process stems from the number of actors involved, who need to exchange information about the inspected materials. The results obtained by the improved process justify the effort started in 2012 for the incoming inspection process: a more consistent quality level of the materials, a reduction of the overall process duration, better exchange of materials information, and a convenient method for supplier surveillance and evaluation.



2019 ◽  
Vol 16 (3) ◽  
pp. 334-351
Author(s):  
A. S. Mavlyanov ◽  
E. K. Sardarbekova

Introduction. The objective of the research is to study the effect of the complex activation of the alumina raw material on the rheological properties of the ceramic mass. In addition, the authors investigate solutions for obtaining optimal coagulation structures based on loams and ash together with a plasticizer.

Materials and methods. The authors used local loess-like clay loams of the BashKarasu deposit, ash from the ash fields of the Bishkek Central Heating Centre (BTEC), and a plasticizer (sodium naphthenate obtained from alkaline wastes of chemical production) as raw materials. The technological properties of the raw materials were determined using standard laboratory methods in accordance with the current GOSTs.

Results. The researchers tested the plastic strength of variously prepared masses in order to select optimal compositions. The paper demonstrates the plastic strength of complexly activated compositions in comparison with non-activated and mechanically activated ones. The sensitivity coefficient of the clay loams increased after mechanical and complex activation, which predetermines the possibility of intensifying the drying of samples based on complexly activated masses.

Discussion and conclusions. Mechanical activation of the clay material reduces the relaxation period and increases the elasticity coefficient of the ceramic masses by 1.8–3.4 times, while decreasing elasticity, viscosity, and the conventional molding power, which generally worsens the molding properties of the masses. Complex activation of the ash-clay material decreases the relaxation period and increases the elasticity and plasticity of the ceramic masses by 46–47%, reduces viscosity by 1.5–2 times, and reduces the conventional molding power by 37–122% in comparison with mechanically activated (MA) clay loams. Ceramic masses based on complexly activated alumina raw materials belong to the SMT with improved rheological properties; products based on them pass through the mouthpiece in 5–7 seconds.



2018 ◽  
Vol 7 (2) ◽  
Author(s):  
Firman L. Sahwan

The organic materials generally used as raw material for organic fertilizer granules (POG) are natural organic materials that have been degraded, ground fine, and dried. One of the main raw materials, always used at a very high percentage, is manure. The manure potential in Indonesia is very high, amounting to 113.6 million tons per year, of which 64.7 million tons per year come from the island of Java. From this amount, the POG production potential is 17.5 million tons per year (for all of Indonesia) or 9.9 million tons per year for the island of Java. Meanwhile, the realistic prediction for POG production from raw manure is 2.5 million tons annually, a figure unable to meet the POG requirement of more than 4 million tons per year. Therefore, in producing POG, the potential of other organic materials should be maximized so that the use of manure can be reduced. Using a small amount of manure (a maximum of 30% for cow manure) would also help to avoid producing POG with a high Fe content.
Keywords: organic material, manure, granular organic fertilizer


