Analysis of a Method Improving Reinforcement Learning Agents’ Policies

Author(s):  
Daisuke Kitakoshi ◽  
Hiroyuki Shioya ◽  
Masahito Kurihara ◽  

Reinforcement learning (RL) is a branch of machine learning that aims to optimize agents' policies by adapting the agents to an environment according to rewards. In this paper, we propose a method for improving policies by using the stochastic knowledge that reinforcement learning agents obtain. We use a Bayesian network (BN), a stochastic model, as an agent's knowledge. Its structure is selected by the minimum description length (MDL) criterion, using series of the agent's inputs, outputs, and rewards as sample data. The BN constructed in our study represents stochastic dependencies between the input-output pairs and the rewards. In the proposed method, policies are improved by supervised learning that uses the structure of the BN (i.e., the stochastic knowledge). This improvement mechanism enables RL agents to acquire more effective policies. We carry out simulations on the pursuit problem to demonstrate the effectiveness of the proposed method.
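The MDL-based structure selection described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it scores one node of a Bayesian network under a candidate parent set as negative log-likelihood plus a (k/2) log N penalty on the free parameters, and the toy `action`/`reward` variables are assumed for illustration only.

```python
import math
from collections import Counter

def mdl_score(data, child, parents):
    """MDL of one node given its parent set: negative log-likelihood
    of the data plus a (k/2) * log(N) penalty on free parameters."""
    n = len(data)
    # Joint counts of (parent configuration, child value) and marginal
    # counts of each parent configuration.
    joint = Counter(tuple(row[p] for p in parents) + (row[child],) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    nll = -sum(c * math.log(c / parent_counts[key[:-1]]) for key, c in joint.items())
    child_states = len({row[child] for row in data})
    k = (child_states - 1) * len(parent_counts)  # free parameters
    return nll + 0.5 * k * math.log(n)

# Toy input-output/reward series in which the reward tracks the action:
# the structure with the action -> reward edge gets the lower (better) score.
data = [{"action": a, "reward": a} for a in (0, 1)] * 4
dep = mdl_score(data, "reward", ["action"])  # reward depends on action
ind = mdl_score(data, "reward", [])          # reward independent
```

Here `dep < ind`, so the criterion prefers the structure with the edge: the larger parameter penalty of the dependent model is outweighed by its better fit.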

Computers ◽  
2019 ◽  
Vol 8 (1) ◽  
pp. 8 ◽  
Author(s):  
Marcus Lim ◽  
Azween Abdullah ◽  
NZ Jhanjhi ◽  
Mahadevan Supramaniam

Criminal network activities, which are usually covert and stealthy, present certain difficulties for criminal network analysis (CNA) because complete datasets are rarely available. The data collected on activities in these networks tend to be incomplete and inconsistent, which is reflected structurally in the criminal network as missing nodes (actors) and links (relationships). Criminal networks are commonly analyzed using social network analysis (SNA) models. Most machine learning techniques that rely on the metrics of SNA models to develop hidden or missing link prediction models use supervised learning. However, supervised learning usually requires a large dataset to train the link prediction model to an optimal level of performance. This research therefore explores the application of deep reinforcement learning (DRL) to developing a hidden link prediction model for criminal networks from the reconstruction of a corrupted criminal network dataset. The experiment conducted on the model indicates that the dataset generated by the DRL model through self-play, or self-simulation, can be used to train the link prediction model. The DRL link prediction model outperforms a conventional supervised machine learning technique, such as the gradient boosting machine (GBM), trained with a relatively smaller domain dataset.
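The SNA-model metrics mentioned above can be illustrated with a classical baseline. The sketch below scores non-adjacent node pairs by their number of common neighbours, a standard link-prediction feature; it is not the paper's DRL method, and the toy network and node names are invented for illustration.

```python
from itertools import combinations

def common_neighbor_scores(adj):
    """Score every non-adjacent node pair by the number of shared
    neighbours -- a classic SNA metric used as a link-prediction feature."""
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v not in adj[u]:
            scores[(u, v)] = len(adj[u] & adj[v])
    return scores

# Toy network in which the A-D link is "missing" from the observed data.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
ranked = sorted(common_neighbor_scores(adj).items(), key=lambda kv: -kv[1])
```

The pair A-D shares two neighbours (B and C) and ranks first, so a predictor built on this metric would propose restoring that link before any other.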


2012 ◽  
pp. 695-703
Author(s):  
George Tzanis ◽  
Christos Berberidis ◽  
Ioannis Vlahavas

Machine learning is one of the oldest subfields of artificial intelligence and is concerned with the design and development of computational systems that can adapt themselves and learn. The most common machine learning algorithms can be either supervised or unsupervised. Supervised learning algorithms generate a function that maps inputs to desired outputs, based on a set of examples with known output (labeled examples). Unsupervised learning algorithms find patterns and relationships over a given set of inputs (unlabeled examples). Other categories of machine learning are semi-supervised learning, where an algorithm uses both labeled and unlabeled examples, and reinforcement learning, where an algorithm learns a policy of how to act given an observation of the world.
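The supervised/unsupervised contrast described above can be made concrete with a small sketch (assumed toy data, not drawn from any of the works listed here): a supervised learner uses the labels to compute class centroids, while unsupervised 2-means recovers similar centroids from the same inputs without ever seeing a label.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two well-separated 1-D groups of points.
x = np.concatenate([rng.normal(0.0, 0.3, 50), rng.normal(5.0, 0.3, 50)])
y = np.array([0] * 50 + [1] * 50)  # labels, used only by the supervised learner

# Supervised: learn a centroid per class from labeled examples.
centroids_sup = np.array([x[y == k].mean() for k in (0, 1)])

# Unsupervised: 2-means finds similar centroids from unlabeled inputs.
c = np.array([x.min(), x.max()])  # initial guesses at the extremes
for _ in range(10):
    assign = np.abs(x[:, None] - c).argmin(axis=1)   # nearest-centroid step
    c = np.array([x[assign == k].mean() for k in (0, 1)])  # update step
```

Both procedures end up near the true group centers (0 and 5); the difference is only in whether the labels were available during learning.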



Author(s):  
Ahmad Roihan ◽  
Po Abas Sunarya ◽  
Ageng Setiani Rafika

Abstract - Machine learning is a part of artificial intelligence that is widely used to solve a variety of problems. This article reviews problem solving in recent studies by classifying machine learning into three categories: supervised learning, unsupervised learning, and reinforcement learning. The review shows that all three categories remain applicable in recent cases and can be improved to reduce computational cost and accelerate performance in order to achieve high accuracy and precision. This review is intended to identify gaps and to serve as a guideline for future research.

Keywords: machine learning, reinforcement learning, supervised learning, unsupervised learning


Author(s):  
Antonio Serrano-Muñoz ◽  
Nestor Arana-Arexolaleiba ◽  
Dimitrios Chrysostomou ◽  
Simon Bøgh

Abstract: Remanufacturing automation must be designed to be flexible and robust enough to overcome the uncertainties, the condition of the products, and the complexities in the planning and operation of the processes. Machine learning methods, in particular reinforcement learning, are presented as techniques to learn, improve, and generalise the automation of many robotic manipulation tasks (most of them related to grasping, picking, or assembly). However, they have not been widely exploited in remanufacturing, in particular for disassembly tasks. This work presents the state of the art of contact-rich disassembly using reinforcement learning algorithms, together with a study of how object extraction skills generalise when applied to contact-rich disassembly tasks. The generalisation capabilities of two state-of-the-art reinforcement learning agents (trained in simulation) are tested and evaluated in simulation and in the real world while performing a disassembly task. Results show that at least one of the agents can generalise the contact-rich extraction skill. In addition, this work identifies key concepts and gaps for research on, and application of, reinforcement learning algorithms in disassembly tasks.


Improving the performance of link prediction plays a significant role in the evaluation of social networks. Link prediction is one of the primary tasks in recommender systems, bioinformatics, and the web. Most machine learning methods that depend on the metrics of SNA models use supervised learning to develop link prediction models. Supervised learning typically requires a large dataset to train the link prediction model in order to reach an optimal level of performance. In recent years, deep reinforcement learning (DRL) has achieved notable success in various domains, including SNA. In this paper, we present the use of DRL to improve the performance and accuracy of a link prediction model on the applied datasets. The experiments show that the dataset created by the DRL model through self-play, or auto-simulation, can be utilized to improve the link prediction model. We use three different datasets: JUNANES, MAMBO, and JAKE. Experimental results show that the proposed DRL method achieves an accuracy of 85% on JUNANES, 87% on MAMBO, and 78% on JAKE, outperforming the GBM's next-highest accuracies of 75% on JUNANES, 79% on MAMBO, and 71% on JAKE, each trained with 2,500 iterations, and also outperforms it in terms of AUC measures. The DRL model shows better efficiency than traditional machine learning strategies such as random forest and the gradient boosting machine (GBM).


2021 ◽  
Author(s):  
Antonio Serrano Muñoz ◽  
Nestor Arana-Arexolaleiba ◽  
Dimitrios Chrysostomou ◽  
Simon Bøgh

Abstract: Remanufacturing automation must be designed to be flexible and robust enough to overcome the uncertainties, the condition of the products, and the complexities in the process's planning and operation. Machine learning methods, particularly reinforcement learning, are presented as techniques to learn, improve, and generalise the automation of many robotic manipulation tasks (most of them related to grasping, picking, or assembly). However, they have not been widely exploited in remanufacturing, in particular for disassembly tasks. This work presents the state of the art of contact-rich disassembly using reinforcement learning algorithms and a study of the generalisation of object extraction skills when applied to contact-rich disassembly tasks. The generalisation capabilities of two state-of-the-art reinforcement learning agents (trained in simulation) are tested and evaluated in simulation and in the real world while performing a disassembly task. Results show that at least one of the agents can generalise the contact-rich extraction skill. This work also identifies key concepts and gaps for research on, and application of, reinforcement learning algorithms in disassembly tasks.

