Parallel exploration via negatively correlated search

AbstractEffective exploration is key to a successful search process. The recently proposed negatively correlated search (NCS) tries to achieve this by coordinated parallel exploration, where a set of search processes are driven to be negatively correlated so that different promising areas of the search space can be visited simultaneously. Despite successful applications of NCS, the negatively correlated search behaviors were mostly devised by intuition, while deeper (e.g., mathematical) understanding is missing. In this paper, a more principled NCS, namely NCNES, is presented, showing that the parallel exploration is equivalent to a process of seeking probabilistic models that both lead to solutions of high quality and are distant from previous obtained probabilistic models. Reinforcement learning, for which exploration is of particular importance, are considered for empirical assessment. The proposed NCNES is applied to directly train a deep convolution network with 1.7 million connection weights for playing Atari games. Empirical results show that the significant advantages of NCNES, especially on games with uncertain and delayed rewards, can be highly owed to the effective parallel exploration ability.

Download Full-text

Robotic Grasping Training Using Deep Reinforcement Learning With Policy Guidance Mechanism

Volume 2: Manufacturing Processes; Manufacturing Systems; Nano/Micro/Meso Manufacturing; Quality and Reliability ◽

10.1115/msec2021-63974 ◽

2021 ◽

Author(s):

Junying Yao ◽

Yongkui Liu ◽

Tingyu Lin ◽

Xubin Ping ◽

He Xu ◽

...

Keyword(s):

Reinforcement Learning ◽

Success Rate ◽

Search Space ◽

Q Value ◽

High Quality ◽

Sample Quality ◽

Policy Model ◽

The Past ◽

Robot Grasping ◽

Policy Guidance

Abstract For the past few years, training robots to enable them to learn various manipulative skills using deep reinforcement learning (DRL) has arisen wide attention. However, large search space, low sample quality, and difficulties in network convergence pose great challenges to robot training. This paper deals with assembly-oriented robot grasping training and proposes a DRL algorithm with a new mechanism, namely, policy guidance mechanism (PGM). PGM can effectively transform useless or low-quality samples to useful or high-quality ones. Based on the improved Deep Q Network algorithm, an end-to-end policy model that takes images as input and outputs actions is established. Through continuous interactions with the environment, robots are able to learn how to optimally grasp objects according to the location of maximum Q value. A number of experiments for different scenarios using simulations and physical robots are conducted. Results indicate that the proposed DRL algorithm with PGM is effective in increasing the success rate of robot grasping, and moreover, is robust to changes of environment and objects.

Download Full-text

What Can You Do with a Rock? Affordance Extraction via Word Embeddings

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence ◽

10.24963/ijcai.2017/144 ◽

2017 ◽

Cited By ~ 9

Author(s):

Nancy Fulda ◽

Daniel Ricks ◽

Ben Murdoch ◽

David Wingate

Keyword(s):

Reinforcement Learning ◽

Computational Complexity ◽

Linear Algebra ◽

Autonomous Agents ◽

Common Knowledge ◽

Search Space ◽

Word Embeddings ◽

Knowledge Database ◽

Learning Agent ◽

Action Spaces

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.

Download Full-text

EXPRESS: Sources of Variation In Search and Foraging A Theoretical Perspective

Quarterly Journal of Experimental Psychology ◽

10.1177/17470218211050314 ◽

2021 ◽

pp. 174702182110503

Author(s):

Alastair David Smith ◽

Carlo De Lillo

Keyword(s):

Spatial Scales ◽

Theoretical Perspective ◽

Search Space ◽

Visual Exploration ◽

Two Dimensional ◽

Search Behaviour ◽

Wide Range ◽

Search Processes ◽

Sources Of Variation ◽

The Individual

Search – the problem of exploring a space of alternatives in order to identify target goals – is a fundamental behaviour for many species. Although its foundation lies in foraging, most studies of human search behaviour have been directed towards understanding the attentional mechanisms that underlie the efficient visual exploration of two-dimensional scenes. With this review, we aim to characterise how search behaviour can be explained across a wide range of contexts, environments, spatial scales, and populations, both typical and atypical. We first consider the generality of search processes across psychological domains. We then review studies of interspecies differences in search. Finally, we explore in detail the individual and contextual variables that affect visual search and related behaviours in established experimental psychology paradigms. Despite the heterogeneity of the findings discussed, we identify that variations in control processes, along with the ability to regulate behaviour as a function of the structure of search space and the sampling processes adopted, to be central to explanations of variations in search behaviour. We propose a tentative theoretical model aimed at integrating these notions and close by exploring questions that remain unaddressed.

Download Full-text

Development and research of a genetic method for the analysis and determination of the location of power grid objects

Artificial Intelligence ◽

10.15407/jai2020.01.020 ◽

2020 ◽

Vol 25 (1) ◽

pp. 20-42

Author(s):

Fedorchenko I. ◽

◽

Oliinyk A. ◽

Korniienko S. ◽

Kharchenko A. ◽

...

Keyword(s):

Power Supply ◽

Narrow Range ◽

Search Space ◽

Distribution Networks ◽

Power Supplies ◽

Search Process ◽

Stable Convergence ◽

Genetic Method

The problem of combinatorial optimization is considered in relation to the choice of the location of the location of power supplies when solving the problem of the development of urban distribution networks of power supply. Two methods have been developed for placing power supplies and assigning consumers to them to solve this problem. The first developed method consists in placing power supplies of the same standard sizes, and the second - of different standard sizes. The fundamental difference between the created methods and the existing ones is that the proposed methods take into account all the material of the problem and have specialized methods for coding possible solutions, modified operators of crossing and selection. The proposed methods effectively solve the problem of low inheritance, topological unfeasibility of the found solutions, as a result of which the execution time is significantly reduced and the accuracy of calculations is increased. In the developed methods, the lack of taking into account the restrictions on the placement of new power supplies is realized, which made it possible to solve the problem of applying the methods for a narrow range of problems. A comparative analysis of the results obtained by placing power supplies of the same standard sizes and known methods was carried out, and it was found that the developed method works faster than the known methods. It is shown that the proposed approach ensures stable convergence of the search process by an acceptable number of steps without artificial limitation of the search space and the use of additional expert information on the feasibility of possible solutions. The results obtained allow us to propose effective methods to improve the quality of decisions made on the choice of the location of power supply facilities in the design of urban electrical.

Download Full-text

Deep inverse reinforcement learning for structural evolution of small molecules

Briefings in Bioinformatics ◽

10.1093/bib/bbaa364 ◽

2020 ◽

Author(s):

Brighter Agyemang ◽

Wei-Ping Wu ◽

Daniel Addo ◽

Michael Y Kpiebaareh ◽

Ebenezer Nanor ◽

...

Keyword(s):

Reinforcement Learning ◽

High Throughput Screening ◽

Structural Evolution ◽

Search Space ◽

New Drugs ◽

Inverse Reinforcement Learning ◽

Generative Adversarial Network ◽

Entropy Maximization ◽

Reward Function ◽

Adversarial Network

Abstract The size and quality of chemical libraries to the drug discovery pipeline are crucial for developing new drugs or repurposing existing drugs. Existing techniques such as combinatorial organic synthesis and high-throughput screening usually make the process extraordinarily tough and complicated since the search space of synthetically feasible drugs is exorbitantly huge. While reinforcement learning has been mostly exploited in the literature for generating novel compounds, the requirement of designing a reward function that succinctly represents the learning objective could prove daunting in certain complex domains. Generative adversarial network-based methods also mostly discard the discriminator after training and could be hard to train. In this study, we propose a framework for training a compound generator and learn a transferable reward function based on the entropy maximization inverse reinforcement learning (IRL) paradigm. We show from our experiments that the IRL route offers a rational alternative for generating chemical compounds in domains where reward function engineering may be less appealing or impossible while data exhibiting the desired objective is readily available.

Download Full-text

Global search in single-solution-based metaheuristics

Data Technologies and Applications ◽

10.1108/dta-07-2019-0115 ◽

2020 ◽

Vol 54 (3) ◽

pp. 275-296 ◽

Cited By ~ 2

Author(s):

Najmeh Sadat Jaddi ◽

Salwani Abdullah

Keyword(s):

Search Space ◽

Population Based ◽

Global Optimum ◽

Global Search ◽

Hill Climbing ◽

Search Process ◽

Local Optimum ◽

Content Type ◽

Solution Algorithms ◽

Single Solution

PurposeMetaheuristic algorithms are classified into two categories namely: single-solution and population-based algorithms. Single-solution algorithms perform local search process by employing a single candidate solution trying to improve this solution in its neighborhood. In contrast, population-based algorithms guide the search process by maintaining multiple solutions located in different points of search space. However, the main drawback of single-solution algorithms is that the global optimum may not reach and it may get stuck in local optimum. On the other hand, population-based algorithms with several starting points that maintain the diversity of the solutions globally in the search space and results are of better exploration during the search process. In this paper more chance of finding global optimum is provided for single-solution-based algorithms by searching different regions of the search space.Design/methodology/approachIn this method, different starting points in initial step, searching locally in neighborhood of each solution, construct a global search in search space for the single-solution algorithm.FindingsThe proposed method was tested based on three single-solution algorithms involving hill-climbing (HC), simulated annealing (SA) and tabu search (TS) algorithms when they were applied on 25 benchmark test functions. The results of the basic version of these algorithms were then compared with the same algorithms integrated with the global search proposed in this paper. The statistical analysis of the results proves outperforming of the proposed method. Finally, 18 benchmark feature selection problems were used to test the algorithms and were compared with recent methods proposed in the literature.Originality/valueIn this paper more chance of finding global optimum is provided for single-solution-based algorithms by searching different regions of the search space.

Download Full-text

Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results

Recent Advances in Reinforcement Learning ◽

10.1007/978-0-585-33656-5_8 ◽

2007 ◽

pp. 159-195 ◽

Cited By ~ 3

Author(s):

Sridhar Mahadevan

Keyword(s):

Reinforcement Learning ◽

Average Reward ◽

Empirical Results

Download Full-text

Auditing, Trust, and the Social Contract

Controlling Corruption ◽

10.1093/oso/9780192894908.003.0007 ◽

2021 ◽

pp. 112-131

Author(s):

Bo Rothstein

Keyword(s):

Empirical Research ◽

Social Contract ◽

Quality System ◽

Representative Democracy ◽

High Quality ◽

Empirical Results ◽

The Social ◽

The Relationship

The relationship between trust and auditing can be described as a paradox. In the social contract that forms the basis of modern societies, extensive trust issues arise. How can citizens trust that what is promised in the contract will also be provided? Elections should work to put politicians who do not deliver according to the social contract to be voted out of their position. Empirical research shows that this often does not work, hence the need for an auditing body. Empirical results have shown that national auditing institutions work towards reducing corruption and other forms of malfeasance, and are thereby vital to creating a working social contract. A high-quality system for auditing also has a much stronger effect on reducing corruption than is the case for democracy. Auditing turns out to be an undervalued instrument that not only complements but in some ways proves even more effective than representative democracy.

Download Full-text

Dialogue Generation: From Imitation Learning to Inverse Reinforcement Learning

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v33i01.33016722 ◽

2019 ◽

Vol 33 ◽

pp. 6722-6729 ◽

Cited By ~ 4

Author(s):

Ziming Li ◽

Julia Kiseleva ◽

Maarten De Rijke

Keyword(s):

Reinforcement Learning ◽

State Of The Art ◽

The State ◽

Experimental Results ◽

Imitation Learning ◽

Local Optimum ◽

Inverse Reinforcement Learning ◽

High Quality ◽

Overall Performance

The performance of adversarial dialogue generation models relies on the quality of the reward signal produced by the discriminator. The reward signal from a poor discriminator can be very sparse and unstable, which may lead the generator to fall into a local optimum or to produce nonsense replies. To alleviate the first problem, we first extend a recently proposed adversarial dialogue generation method to an adversarial imitation learning solution. Then, in the framework of adversarial inverse reinforcement learning, we propose a new reward model for dialogue generation that can provide a more accurate and precise reward signal for generator training. We evaluate the performance of the resulting model with automatic metrics and human evaluations in two annotation settings. Our experimental results demonstrate that our model can generate more high-quality responses and achieve higher overall performance than the state-of-the-art.

Download Full-text

Acceleration of Reinforcement Learning with Incomplete Prior Information

Journal of Advanced Computational Intelligence and Intelligent Informatics ◽

10.20965/jaciii.2013.p0721 ◽

2013 ◽

Vol 17 (5) ◽

pp. 721-730 ◽

Cited By ~ 1

Author(s):

Kento Terashima ◽

◽

Hirotaka Takano ◽

Junichi Murata

Keyword(s):

Reinforcement Learning ◽

Prior Information ◽

Search Process ◽

Trial And Error ◽

Calculation Time ◽

Learning Speed ◽

Incomplete Prior Information

Reinforcement learning is applicable to complex or unknown problems because the solution search process is done by trial-and-error. However, the calculation time for the trial-and-error search becomes larger as the scale of the problem increases. Therefore, in order to decrease calculation time, some methods have been proposed using the prior information on the problem. This paper improves a previously proposed method utilizing options as prior information. In order to increase the learning speed even with wrong options, methods for option correction by forgetting the policy and extending initiation sets are proposed.

Download Full-text