Two-level Q-learning: learning from conflict demonstrations

2019 ◽  
Vol 34 ◽  
Author(s):  
Mao Li ◽  
Yi Wei ◽  
Daniel Kudenko

Abstract One way to address the low sample efficiency of reinforcement learning (RL) is to employ human expert demonstrations to speed up the RL process (RL from demonstration or RLfD). The research so far has focused on demonstrations from a single expert. However, little attention has been given to the case where demonstrations are collected from multiple experts, whose expertise may vary on different aspects of the task. In such scenarios, it is likely that the demonstrations will contain conflicting advice in many parts of the state space. We propose a two-level Q-learning algorithm, in which the RL agent not only learns the policy of deciding on the optimal action but also learns to select the most trustworthy expert according to the current state. Thus, our approach removes the traditional assumption that demonstrations come from one single source and are mostly conflict-free. We evaluate our technique on three different domains and the results show that the state-of-the-art RLfD baseline fails to converge or performs similarly to conventional Q-learning. In contrast, the performance level of our novel algorithm increases as more experts are involved in the learning process, and the proposed approach handles demonstration conflicts well.
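The two-level idea described in this abstract can be illustrated with a toy tabular sketch. This is not the authors' algorithm: the class name `TwoLevelQ`, the representation of experts as callables, the epsilon-greedy expert selection, and the choice to update both tables from the same reward are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class TwoLevelQ:
    """Toy two-level tabular Q-learner (illustrative assumption, not the paper's method)."""

    def __init__(self, experts, actions, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.experts = experts  # hypothetical: list of callables, state -> suggested action
        self.actions = actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q_expert = defaultdict(float)  # (state, expert index) -> learned trust value
        self.q_action = defaultdict(float)  # (state, action) -> learned action value

    def choose_expert(self, state):
        # Upper level: epsilon-greedy choice of which expert to follow in this state.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.experts))
        return max(range(len(self.experts)), key=lambda k: self.q_expert[(state, k)])

    def act(self, state):
        # Follow the advice of the expert selected for this state.
        k = self.choose_expert(state)
        return k, self.experts[k](state)

    def update(self, state, k, action, reward, next_state, done):
        # Standard Q-learning targets, one per level.
        next_a = 0.0 if done else max(self.q_action[(next_state, a)] for a in self.actions)
        next_k = 0.0 if done else max(
            self.q_expert[(next_state, j)] for j in range(len(self.experts))
        )
        qa = self.q_action[(state, action)]
        self.q_action[(state, action)] = qa + self.alpha * (reward + self.gamma * next_a - qa)
        qk = self.q_expert[(state, k)]
        self.q_expert[(state, k)] = qk + self.alpha * (reward + self.gamma * next_k - qk)
```

In this sketch an expert whose advice tends to earn reward in a given state accumulates a higher trust value there, so conflicting advice is resolved per state rather than globally.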

Entropy ◽  
2021 ◽  
Vol 23 (6) ◽  
pp. 737
Author(s):  
Fengjie Sun ◽  
Xianchang Wang ◽  
Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) for UAVs to help them choose the correct action in each state according to the policy. In an unknown environment, formulating rules by hand to help UAVs choose actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible solution. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that, in the agricultural plant protection environment, a UAV has a greater probability of choosing the optimal action under the policy learned by the proposed algorithm than under the policy learned by the classic Q-learning algorithm. The proposed algorithm is implemented and tested on datasets that are evenly distributed based on real UAV parameters and real farm information. The performance evaluation of the algorithm is discussed in detail. Experimental results show that the proposed algorithm can efficiently learn the optimal policy for UAVs in the agricultural plant protection environment.


Author(s):  
Usman Ahmed ◽  
Jerry Chun-Wei Lin ◽  
Gautam Srivastava

Deep learning methods have led to state-of-the-art medical applications, such as image classification and segmentation. Data-driven deep learning applications can help stakeholders collaborate. However, limited labelled data restricts the ability of a deep learning algorithm to generalize from one domain to another. To handle this problem, meta-learning helps to learn from a small set of data. We propose a meta-learning-based image segmentation model that combines the learning of state-of-the-art models and then uses it to achieve domain adaptation and high accuracy. We also propose a preprocessing algorithm to increase the usability of the segmented parts and remove noise from the new test image. The proposed model achieves 0.94 precision and 0.92 recall, an improvement of 3.3% over the state-of-the-art algorithms.


Author(s):  
Esteban Real ◽  
Alok Aggarwal ◽  
Yanping Huang ◽  
Quoc V. Le

The effort devoted to hand-crafting neural network image classifiers has motivated the use of architecture search to discover them automatically. Although evolutionary algorithms have been repeatedly applied to neural network topologies, the image classifiers thus discovered have remained inferior to human-crafted ones. Here, we evolve an image classifier, AmoebaNet-A, that surpasses hand-designs for the first time. To do this, we modify the tournament selection evolutionary algorithm by introducing an age property to favor the younger genotypes. Matching size, AmoebaNet-A has comparable accuracy to current state-of-the-art ImageNet models discovered with more complex architecture-search methods. Scaled to larger size, AmoebaNet-A sets a new state-of-the-art 83.9% top-1 / 96.6% top-5 ImageNet accuracy. In a controlled comparison against a well-known reinforcement learning algorithm, we give evidence that evolution can obtain results faster with the same hardware, especially at the earlier stages of the search. This is relevant when fewer compute resources are available. Evolution is, thus, a simple method to effectively discover high-quality architectures.
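Tournament selection with an age property, as described in this abstract, can be sketched on a toy problem. This is a minimal illustration under stated assumptions: the function name `aging_evolution`, its parameters, and the `init`, `mutate`, and `fitness` callables are hypothetical, and a real architecture search would replace `fitness` with training and evaluating a network.

```python
import random
from collections import deque


def aging_evolution(init, mutate, fitness, population_size=20, sample_size=5,
                    cycles=200, seed=0):
    """Tournament selection where the oldest individual is discarded each cycle."""
    rng = random.Random(seed)
    population = deque()  # left end holds the oldest individual, right end the youngest
    history = []          # every individual ever evaluated, as (genotype, fitness)
    for _ in range(population_size):
        genotype = init(rng)
        individual = (genotype, fitness(genotype))
        population.append(individual)
        history.append(individual)
    for _ in range(cycles):
        # Tournament: draw a random sample and take its fittest member as the parent.
        sample = rng.sample(list(population), sample_size)
        parent = max(sample, key=lambda ind: ind[1])
        child_genotype = mutate(parent[0], rng)
        child = (child_genotype, fitness(child_genotype))
        population.append(child)
        history.append(child)
        # Aging: remove the oldest individual rather than the least fit one.
        population.popleft()
    return max(history, key=lambda ind: ind[1])
```

Because removal is by age and not by fitness, even a strong genotype leaves the population after `population_size` cycles unless its descendants keep being selected, which is one way to read "favor the younger genotypes".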


2019 ◽  
Vol 11 (7) ◽  
pp. 2963-2986 ◽  
Author(s):  
Nikos Dipsis ◽  
Kostas Stathis

Abstract The numerous applications of the internet of things (IoT) and sensor networks, combined with the specialized devices used in each, have led to a proliferation of domain-specific middleware, which in turn creates interoperability issues between the corresponding architectures and the technologies used. But what if we wanted to apply a machine learning algorithm to an IoT application so that it adapts intelligently to changes in the environment, or enable a software agent to enrich with artificial intelligence (AI) a smart home consisting of multiple and possibly incompatible technologies? In this work we answer these questions by studying a framework that explores how to simplify the incorporation of AI capabilities into existing sensor-actuator networks or IoT infrastructures, making the services offered in such settings smarter. Towards this goal we present eVATAR+, a middleware that implements the interactions within the context of such integrations systematically and transparently from the developers’ perspective. It also provides a simple and easy-to-use interface for developers. eVATAR+ uses Java server technologies enhanced by mediator functionality providing interoperability, maintainability and heterogeneity support. We exemplify eVATAR+ with a concrete case study and we evaluate the relative merits of our approach by comparing our work with the current state of the art.


2020 ◽  
Vol 10 (21) ◽  
pp. 7780
Author(s):  
Dokyeong Kwon ◽  
Junseok Kwon

In this study, we present a novel tracking system, in which the tracking accuracy can be considerably enhanced by state prediction. Accordingly, we present a new Q-learning-based reinforcement method, augmented by Wang–Landau sampling. In the proposed method, reinforcement learning is used to predict a target configuration for the subsequent frame, while the Wang–Landau sampler balances the degrees of exploitation and exploration in the prediction. Our method can adaptively control the randomness of the policy, using statistics on the number of visits to a particular state. Thus, our method considerably enhances the performance of the conventional Q-learning algorithm, which in turn enhances visual tracking performance. Numerical results demonstrate that our method substantially outperforms other state-of-the-art visual trackers and runs in real time because it contains no complicated deep neural network architectures.


2021 ◽  
Author(s):  
Julian D. Richards ◽  
Ulf Jakobsson ◽  
David Novák ◽  
Benjamin Štular ◽  
Holly Wright

The articles in this special issue demonstrate significant differences in digital archiving capacity in different countries. In part these reflect differences in the history of archaeology in each country, its relationship to the state, whether it is centralised or decentralised, state-led or commercially driven. They also reflect some of the different attitudes to archaeology across the world, most recently explored in a survey conducted under the auspices of the NEARCH project. They reflect a snapshot in time, but our aim is to record the current state-of-the-art in each country, to inform knowledge, stimulate discussion, and to provoke change.


2021 ◽  
Vol 27 (3) ◽  
pp. 50-56
Author(s):  
Michal Prauzek ◽  
Jaromir Konecny

This research article presents the application of the Q-learning algorithm in the operational duty cycle control of solar-powered environmental wireless sensor network (EWSN) nodes. Those nodes are commonly implemented as embedded devices using low-power and low-cost microcontrollers. Therefore, there is a significant need for a machine learning (ML) algorithm that is effective and easy to implement in terms of computing performance. This approach uses a Q-learning-based policy implementing a sleep/run switching algorithm driven by the state of charge. The presented algorithm is based on two modes, daylight and nighttime, which is a suitable solution for solar-powered systems. The study includes the complete process of designing an EWSN node strategy with an optimal reward policy. The presented algorithm was tested and verified on an EWSN node model, and a 5-year data set of solar irradiance values was used for the learning process and its validation. As part of the study, we also present the validation in terms of Q-learning parameters, which include the learning rate and discount factor. The results section shows that the overall performance of the presented solution is more suitable for solar-powered EWSN than state-of-the-art studies. Both day/night experiments reached 828 203 measurement/transmission cycles, which is 12.7 % more than in the previous studies using the strategy defined by the state of energy storage.
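The roles of the learning rate and discount factor mentioned in this abstract can be seen in the standard tabular Q-learning update. The sketch below is an assumption-laden illustration, not the authors' implementation: the state encoding (a daylight flag plus a discretized state-of-charge level), the bin count, and the function names `encode_state` and `q_update` are hypothetical.

```python
def encode_state(soc, is_daylight, bins=10):
    """Map a state of charge in [0, 1] and a day/night flag to a discrete state.

    The encoding is an assumption made for this sketch.
    """
    level = min(int(soc * bins), bins - 1)  # clamp soc == 1.0 into the top bin
    return (is_daylight, level)


def q_update(q, state, action, reward, next_state, actions, alpha, gamma):
    """Apply one Q-learning step in place and return the new Q(state, action).

    alpha is the learning rate and gamma is the discount factor.
    Unvisited (state, action) pairs are treated as having value 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

A dictionary keyed by `(state, action)` keeps the table small, which matters on the low-power microcontrollers the abstract targets; on a real device a fixed-size array would be the more likely choice.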


2020 ◽  
Vol 50 (11) ◽  
pp. 3788-3807
Author(s):  
Jerry Chun-Wei Lin ◽  
Matin Pirouz ◽  
Youcef Djenouri ◽  
Chien-Fu Cheng ◽  
Usman Ahmed

Abstract High-utility itemset mining (HUIM) is considered an emerging approach to detect high-utility patterns in databases. Most existing HUIM algorithms only consider the itemset utility regardless of the itemset length. As a result, the utility tends to rise as the itemset size grows. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average utility for decision-making. Several algorithms have been presented to efficiently mine the set of high average-utility itemsets (HAUIs), but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem, but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept to HAUIM. The pre-large concept is used to speed up the mining performance: it ensures that if the total utility in the newly inserted transactions is within the safety bound, the small itemsets in the original database cannot become large ones after the database is updated. This, in turn, reduces the recurring database scans while still obtaining the correct HAUIs. Experiments demonstrate that PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms, in terms of runtime, memory, number of assessed patterns and scalability.
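The difference between utility and average utility can be made concrete with a small worked example. This sketch assumes the commonly used definition, in which the average utility of an itemset in a transaction is the sum of its item utilities divided by the number of items in the itemset. The function name `average_utility` and the data layout (one dict per transaction mapping item to utility) are hypothetical, and the code does not reproduce PRE-HAUIMI or its AUL structures.

```python
def average_utility(itemset, transactions):
    """Sum, over transactions containing the itemset, of its per-item mean utility.

    Each transaction is a dict mapping item -> utility of that item in the transaction.
    """
    items = set(itemset)
    if not items:
        raise ValueError("itemset must contain at least one item")
    total = 0.0
    for transaction in transactions:
        if items.issubset(transaction):  # every item of the itemset is present
            total += sum(transaction[item] for item in items) / len(items)
    return total


transactions = [{"a": 4, "b": 2}, {"a": 6, "c": 3}, {"b": 5}]
print(average_utility({"a"}, transactions))       # only the first two transactions count
print(average_utility({"a", "b"}, transactions))  # only the first transaction counts
```

Dividing by the itemset length means that adding items no longer raises the measure automatically, which is the length bias in plain utility that the abstract refers to.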


2020 ◽  
Vol 222 (3) ◽  
pp. 1750-1764 ◽  
Author(s):  
Yangkang Chen

SUMMARY Effective and efficient arrival picking plays an important role in microseismic and earthquake data processing and imaging. Widely used arrival picking algorithms based on the short-term-average to long-term-average ratio (STA/LTA) suffer from sensitivity to moderate-to-strong random ambient noise. To make the state-of-the-art arrival picking approaches effective, microseismic data need to be first pre-processed, for example by removing a sufficient amount of noise, and then analysed by arrival pickers. To overcome the noise issue in arrival picking for weak microseismic or earthquake events, I leverage machine learning techniques to help recognize seismic waveforms in microseismic or earthquake data. Because supervised machine learning algorithms depend on a large volume of well-designed training data, I utilize an unsupervised machine learning algorithm to help cluster the time samples into two groups, that is, waveform points and non-waveform points. The fuzzy clustering algorithm has been demonstrated to be effective for this purpose. A group of synthetic, real microseismic and earthquake data sets with different levels of complexity show that the proposed method is much more robust than the state-of-the-art STA/LTA method in picking microseismic events, even in the case of moderately strong background noise.
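For context, the STA/LTA baseline that this summary compares against can be sketched in a few lines. This is the classic ratio, not the fuzzy clustering method the summary proposes. The use of squared amplitude as the characteristic function, trailing windows that both end at the current sample, and the function name `sta_lta` are assumptions made for illustration.

```python
def sta_lta(signal, n_sta, n_lta):
    """Return the STA/LTA ratio per sample, using trailing windows of squared amplitude.

    Samples before the first full long-term window are left at 0.0.
    """
    if not 0 < n_sta <= n_lta:
        raise ValueError("require 0 < n_sta <= n_lta")
    # Prefix sums of energy make every window average an O(1) lookup.
    prefix = [0.0]
    for x in signal:
        prefix.append(prefix[-1] + x * x)
    ratios = [0.0] * len(signal)
    for i in range(n_lta - 1, len(signal)):
        sta = (prefix[i + 1] - prefix[i + 1 - n_sta]) / n_sta
        lta = (prefix[i + 1] - prefix[i + 1 - n_lta]) / n_lta
        ratios[i] = sta / lta if lta > 0 else 0.0
    return ratios
```

An arrival is typically declared where the ratio exceeds a chosen threshold. Strong ambient noise raises the long-term average, and random bursts raise the short-term one, which helps explain the noise sensitivity noted in the summary.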

