Matching Qualitative Constraint Networks with Online Reinforcement Learning

10.29007/1g5q
2018
Author(s):  
Malumbo Chipofya

Local Compatibility Matrices (LCMs) are mechanisms for computing graph-matching heuristics that are particularly suited to matching qualitative constraint networks, enabling the transfer of qualitative spatial knowledge between qualitative reasoning systems or agents. A system of LCMs can be used during matching to compute a pre-move evaluation, which acts as an optimistic prior estimate of the value of matching a pair of nodes, and a post-move evaluation, which adjusts the prior estimate toward the true value once the move is completed. We present a metaheuristic method that uses reinforcement learning to improve the prior estimates based on the posterior evaluation. The learned values implicitly identify unprofitable regions of the search space. We also present data structures that allow a more compact implementation, limiting the space and time complexity of our algorithm.
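To make the pre-/post-move scheme concrete, here is a minimal Python sketch: a table of optimistic prior estimates guides which node pair to match next, and after each move the prior is nudged toward the value actually observed. The names (prior, posterior_value, alpha) and the toy compatibility score are illustrative assumptions, not the paper's LCM machinery.

```python
from collections import defaultdict

alpha = 0.1                       # learning rate for nudging priors (illustrative)
prior = defaultdict(float)        # optimistic prior estimate per candidate node pair

def posterior_value(pair, partial_match):
    """Stand-in for the post-move evaluation: a toy compatibility score of
    `pair` against the pairs already committed to in the partial match."""
    a, b = pair
    return sum(1 for (x, y) in partial_match if (x + y + a + b) % 2 == 0)

def choose_and_learn(candidates, partial_match):
    # Pre-move: rank candidates by the current (optimistic) prior estimate.
    pair = max(candidates, key=lambda p: prior[p])
    # Post-move: evaluate the realised value and move the prior toward it.
    value = posterior_value(pair, partial_match)
    prior[pair] += alpha * (value - prior[pair])
    return pair, value

nodes_a, nodes_b = range(4), range(4)
for episode in range(5):          # repeated constructions reuse the learned priors
    candidates = [(i, j) for i in nodes_a for j in nodes_b]
    match = []
    while candidates:
        pair, _ = choose_and_learn(candidates, match)
        match.append(pair)
        # commit the move: neither endpoint may be matched again
        candidates = [(i, j) for (i, j) in candidates
                      if i != pair[0] and j != pair[1]]
print(match, {p: round(v, 2) for p, v in sorted(prior.items()) if v})
```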

Author(s):  
Nancy Fulda ◽  
Daniel Ricks ◽  
Ben Murdoch ◽  
David Wingate

Autonomous agents must often detect affordances: the set of behaviors enabled by a situation. Affordance extraction is particularly helpful in domains with large action spaces, allowing the agent to prune its search space by avoiding futile behaviors. This paper presents a method for affordance extraction via word embeddings trained on a tagged Wikipedia corpus. The resulting word vectors are treated as a common knowledge database which can be queried using linear algebra. We apply this method to a reinforcement learning agent in a text-only environment and show that affordance-based action selection improves performance in most cases. Our method increases the computational complexity of each learning step but significantly reduces the total number of steps needed. In addition, the agent's action selections begin to resemble those a human would choose.
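As a rough illustration of querying word vectors with linear algebra for affordances, the sketch below ranks candidate verbs against a noun using cosine similarity and a simple object-to-action offset. The tiny random embeddings stand in for vectors trained on a tagged Wikipedia corpus, and the offset-style query is an assumption for illustration, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["sword", "swing", "key", "unlock", "door", "open", "eat", "apple"]
emb = {w: rng.normal(size=50) for w in vocab}   # stand-in word embeddings (assumption)

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def affordant_verbs(noun, verbs, k=2):
    """Rank candidate verbs by similarity to the noun vector plus a
    canonical object-to-action offset (here taken from sword -> swing)."""
    offset = emb["swing"] - emb["sword"]
    query = emb[noun] + offset
    return sorted(verbs, key=lambda v: cos(emb[v], query), reverse=True)[:k]

print(affordant_verbs("key", ["unlock", "open", "eat", "swing"]))
```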


Author(s):  
Brighter Agyemang ◽  
Wei-Ping Wu ◽  
Daniel Addo ◽  
Michael Y Kpiebaareh ◽  
Ebenezer Nanor ◽  
...  

Abstract The size and quality of the chemical libraries available to the drug discovery pipeline are crucial for developing new drugs or repurposing existing ones. Existing techniques such as combinatorial organic synthesis and high-throughput screening make the process slow and complicated, since the search space of synthetically feasible drugs is prohibitively large. While reinforcement learning has mostly been used in the literature to generate novel compounds, designing a reward function that succinctly represents the learning objective can be daunting in complex domains. Methods based on generative adversarial networks typically discard the discriminator after training and can be difficult to train. In this study, we propose a framework that trains a compound generator and learns a transferable reward function based on the entropy-maximization inverse reinforcement learning (IRL) paradigm. Our experiments show that the IRL route offers a sound alternative for generating chemical compounds in domains where reward-function engineering is unappealing or impossible but data exhibiting the desired objective are readily available.
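A compact way to picture the entropy-maximization IRL idea is a linear reward over hypothetical compound feature vectors whose weights are pushed toward the feature expectations of the desired compounds and away from those of the generator's samples. The toy sketch below makes that assumption explicit; the molecular featurization and the actual compound generator are placeholders.

```python
import numpy as np

rng = np.random.default_rng(1)
dim = 16                                  # size of a hypothetical compound fingerprint
w = np.zeros(dim)                         # linear reward weights to be learned

expert = rng.normal(loc=1.0, size=(100, dim))    # stand-in for desired compounds

def generator_samples(n):                 # stand-in for the compound generator's output
    return rng.normal(loc=0.0, size=(n, dim))

lr = 0.05
for step in range(200):
    gen = generator_samples(100)
    # MaxEnt-IRL-style update: move the reward toward the expert feature
    # expectations and away from the generator's.
    w += lr * (expert.mean(axis=0) - gen.mean(axis=0))

def reward(x):                            # the learned, transferable reward
    return float(w @ x)

print(round(reward(expert[0]), 2), round(reward(generator_samples(1)[0]), 2))
```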


2020
Vol 65 (2)
pp. 31
Author(s):  
T.V. Pricope

Many real-world applications can be described as large-scale games of imperfect information. Such games are considerably harder than deterministic ones because the search space is even larger. In this paper, I explore the power of reinforcement learning in such an environment by studying one of the most popular games of this type, no-limit Texas Hold'em Poker, which remains unsolved, developing multiple agents with different learning paradigms and techniques and then comparing their performance. When applied to no-limit Hold'em Poker, deep reinforcement learning agents clearly outperform agents with a more traditional approach. Moreover, while the latter agents rival a beginner-level human, the agents based on deep reinforcement learning play at the level of an amateur human player. The main algorithm uses Fictitious Play in combination with ANNs and some handcrafted metrics. We also applied the main algorithm to another, less complex game of imperfect information to demonstrate the scalability of this solution and its gain in performance when compared head-to-head with established classical approaches from the reinforcement learning literature.
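For intuition about the fictitious-play component, the sketch below runs tabular fictitious play on rock-paper-scissors: each player best-responds to the opponent's empirical average strategy, and the averages drift toward the mixed equilibrium. The ANN function approximation and poker-specific metrics used in the paper are abstracted away; this is only the core averaging idea.

```python
import numpy as np

payoff = np.array([[0, -1, 1],    # row player's payoff: Rock, Paper, Scissors
                   [1, 0, -1],
                   [-1, 1, 0]])

counts = [np.ones(3), np.ones(3)]          # empirical action counts per player

for t in range(20000):
    for i in (0, 1):
        opp = counts[1 - i] / counts[1 - i].sum()   # opponent's average strategy
        util = payoff @ opp if i == 0 else -(payoff.T @ opp)
        counts[i][int(np.argmax(util))] += 1        # best respond, update the average

print([np.round(c / c.sum(), 3) for c in counts])   # approaches (1/3, 1/3, 1/3)
```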


Author(s):  
Nicholay Topin ◽  
Manuela Veloso

Though reinforcement learning has greatly benefited from the incorporation of neural networks, the inability to verify the correctness of such systems limits their use. Current work in explainable deep learning focuses on explaining only a single decision in terms of input features, making it unsuitable for explaining a sequence of decisions. To address this need, we introduce Abstracted Policy Graphs, which are Markov chains of abstract states. This representation concisely summarizes a policy so that individual decisions can be explained in the context of expected future transitions. Additionally, we propose a method to generate these Abstracted Policy Graphs for deterministic policies given a learned value function and a set of observed transitions, potentially off-policy transitions used during training. Since no restrictions are placed on how the value function is generated, our method is compatible with many existing reinforcement learning methods. We prove that the worst-case time complexity of our method is quadratic in the number of features and linear in the number of provided transitions, O(|F|² · |tr_samples|). By applying our method to a family of domains, we show that our method scales well in practice and produces Abstracted Policy Graphs which reliably capture relationships within these domains.
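A rough sketch of the graph-building step: map concrete states to abstract states with a feature abstraction, then count transitions between abstract states under the policy and normalize them into a Markov chain. The coarse binning abstraction and toy policy below are assumptions for illustration; the paper's value-function-based feature importance test is not reproduced.

```python
from collections import defaultdict
import random

random.seed(0)

def abstract(state):
    """Map a concrete (x, y) state to an abstract state by coarse 5x5 binning."""
    x, y = state
    return (x // 5, y // 5)

def policy(state):
    """Deterministic toy policy: move right until the wall, then move up."""
    x, y = state
    return (x + 1, y) if x < 19 else (x, min(y + 1, 19))

# Gather transitions observed under the policy (these could equally be
# off-policy transitions logged during training).
transitions, s = [], (0, 0)
for _ in range(300):
    s2 = policy(s)
    transitions.append((s, s2))
    s = s2 if s2 != s else (random.randrange(20), random.randrange(20))

# Count moves between abstract states and normalise into a Markov chain.
counts = defaultdict(lambda: defaultdict(int))
for s, s2 in transitions:
    counts[abstract(s)][abstract(s2)] += 1

apg = {a: {b: n / sum(nxt.values()) for b, n in nxt.items()}
       for a, nxt in counts.items()}
for a, nxt in sorted(apg.items()):
    print(a, {b: round(p, 2) for b, p in nxt.items()})
```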


2021
Vol 70
pp. 1031-1116
Author(s):  
Daniel Furelos-Blanco ◽  
Mark Law ◽  
Anders Jonsson ◽  
Krysia Broda ◽  
Alessandra Russo

In this paper we present ISA, an approach for learning and exploiting subgoals in episodic reinforcement learning (RL) tasks. ISA interleaves reinforcement learning with the induction of a subgoal automaton, an automaton whose edges are labeled by the task's subgoals expressed as propositional logic formulas over a set of high-level events. A subgoal automaton also contains two special states: a state indicating the successful completion of the task, and a state indicating that the task has finished without succeeding. A state-of-the-art inductive logic programming system is used to learn a subgoal automaton that covers the traces of high-level events observed by the RL agent. When the currently exploited automaton does not correctly recognize a trace, the automaton learner induces a new automaton that covers that trace. The interleaving process guarantees the induction of automata with the minimum number of states, and applies a symmetry-breaking mechanism to shrink the search space whilst remaining complete. We evaluate ISA in several gridworld and continuous state space problems using different RL algorithms that leverage the automaton structures. We provide an in-depth empirical analysis of the automaton learning performance in terms of the traces, the symmetry breaking, and the specific restrictions imposed on the final learnable automaton. For each class of RL problem, we show that the learned automata can be successfully exploited to learn policies that reach the goal, achieving an average reward comparable to the case where the automata are not learned but handcrafted and given beforehand.
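The sketch below shows one plausible encoding of such an automaton: edges are guarded by sets of high-level events that must (and must not) be observed, with dedicated accept and reject states playing the roles of the success and failure states. The automaton, event names, and API are illustrative assumptions; the inductive logic programming learner is out of scope here.

```python
class SubgoalAutomaton:
    def __init__(self, edges, start="u0"):
        self.edges = edges              # {state: [(must, must_not, next_state)]}
        self.start = start

    def step(self, state, events):
        for must, must_not, nxt in self.edges.get(state, []):
            if must <= events and not (must_not & events):
                return nxt
        return state                    # no edge fires: stay in the same state

    def run(self, trace):
        state = self.start
        for events in trace:
            state = self.step(state, events)
        return state

# Toy task: reach the key, then the door; stepping into lava fails the task.
automaton = SubgoalAutomaton({
    "u0": [({"lava"}, set(), "reject"), ({"key"}, set(), "u1")],
    "u1": [({"lava"}, set(), "reject"), ({"door"}, set(), "accept")],
})

trace = [set(), {"key"}, set(), {"door"}]
print(automaton.run(trace))             # -> "accept"
```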


Author(s):  
Seyed Saeed Changiz Rezaei ◽  
Fred X. Han ◽  
Di Niu ◽  
Mohammad Salameh ◽  
Keith Mills ◽  
...  

Despite the empirical success of neural architecture search (NAS) in deep learning applications, the optimality, reproducibility and cost of NAS schemes remain hard to assess. In this paper, we propose Generative Adversarial NAS (GA-NAS) with theoretically provable convergence guarantees, promoting stability and reproducibility in neural architecture search. Inspired by importance sampling, GA-NAS iteratively fits a generator to previously discovered top architectures, thus increasingly focusing on important parts of a large search space. Furthermore, we propose an efficient adversarial learning approach, where the generator is trained by reinforcement learning based on rewards provided by a discriminator, and is thus able to explore the search space without evaluating a large number of architectures. Extensive experiments show that GA-NAS beats the best published results in several cases on three public NAS benchmarks. Moreover, GA-NAS can handle ad-hoc search constraints and search spaces. We show that GA-NAS can be used to improve already optimized baselines found by other NAS methods, including EfficientNet and ProxylessNAS, in terms of ImageNet accuracy or the number of parameters, within their original search spaces.
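As a very rough caricature of the adversarial loop, the toy below fits a frequency-based "discriminator" to the best architectures found so far and updates a categorical generator with a REINFORCE-style reward derived from it. The synthetic search space and proxy accuracy are assumptions for illustration and bear no relation to the paper's benchmarks.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos, n_ops = 6, 4                     # 6 decisions, 4 candidate ops each (toy space)
logits = np.zeros((n_pos, n_ops))       # generator parameters

def sample_arch():
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    return np.array([rng.choice(n_ops, p=p[i]) for i in range(n_pos)]), p

def proxy_accuracy(arch):               # synthetic benchmark: op 2 is "good"
    return (arch == 2).mean() + rng.normal(scale=0.05)

history = [sample_arch()[0] for _ in range(32)]

for it in range(30):
    top = sorted(history, key=proxy_accuracy, reverse=True)[:8]
    # "Discriminator": per-position frequency of each op among top architectures.
    freq = np.ones((n_pos, n_ops))
    for arch in top:
        for i, op in enumerate(arch):
            freq[i, op] += 1
    freq /= freq.sum(axis=1, keepdims=True)
    # Generator update: REINFORCE with a discriminator-derived reward.
    for _ in range(16):
        arch, p = sample_arch()
        reward = float(freq[np.arange(n_pos), arch].mean()) - 1.0 / n_ops
        for i, op in enumerate(arch):
            grad = -p[i]
            grad[op] += 1.0
            logits[i] += 0.1 * reward * grad
        history.append(arch)

best = max(history, key=proxy_accuracy)
print(best, round(proxy_accuracy(best), 3))
```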


2013
Vol 48
pp. 841-883
Author(s):  
A. Guez ◽  
D. Silver ◽  
P. Dayan

Bayesian planning is a formally elegant approach to learning optimal behaviour under model uncertainty, trading off exploration and exploitation in an ideal way. Unfortunately, planning optimally in the face of uncertainty is notoriously taxing, since the search space is enormous. In this paper we introduce a tractable, sample-based method for approximate Bayes-optimal planning which exploits Monte-Carlo tree search. Our approach avoids expensive applications of Bayes' rule within the search tree by sampling models from current beliefs, and furthermore performs this sampling lazily. This enables it to outperform previous Bayesian model-based reinforcement learning algorithms by a significant margin on several well-known benchmark problems. As we show, our approach can even work in problems with an infinite state space that lie qualitatively out of reach of almost all previous work in Bayesian exploration.
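The core trick can be sketched as root sampling: each simulation draws one MDP from the current posterior and plans in it, so Bayes' rule is never applied inside the tree. The sketch below is a simplified stand-in under assumptions: a synthetic five-state MDP with a Dirichlet posterior over its dynamics, and depth-limited random rollouts instead of full Monte-Carlo tree search.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 2, 0.95
true_P = rng.dirichlet(np.ones(nS), size=(nS, nA))   # hidden dynamics of the real MDP
reward = np.zeros(nS); reward[nS - 1] = 1.0          # reward for reaching the last state

counts = np.ones((nS, nA, nS))                       # Dirichlet posterior over dynamics

def plan(state, n_sims=20, depth=15):
    """Root sampling: draw one model per simulation and evaluate each action in it."""
    q = np.zeros(nA)
    for _ in range(n_sims):
        P = np.array([[rng.dirichlet(counts[s, a]) for a in range(nA)]
                      for s in range(nS)])           # one model sampled from beliefs
        for a0 in range(nA):
            s, a, ret, disc = state, a0, 0.0, 1.0
            for _ in range(depth):                   # depth-limited random rollout
                s = rng.choice(nS, p=P[s, a])
                ret += disc * reward[s]
                disc *= gamma
                a = int(rng.integers(nA))
            q[a0] += ret / n_sims
    return int(np.argmax(q))

s = 0
for t in range(50):                                  # act in the real MDP, update beliefs
    a = plan(s)
    s2 = int(rng.choice(nS, p=true_P[s, a]))
    counts[s, a, s2] += 1                            # posterior update from real data
    s = s2
print("observed transitions into the goal state:",
      int(counts[:, :, nS - 1].sum() - nS * nA))
```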

