Reinforcement-Learning-Based Quantum Adiabatic Algorithm Design for Integer Programming

SPIN
2021
Author(s):
Jiawei Zhu

Adiabatic quantum computing (AQC) is a computation protocol that exploits quantum advantage to solve difficult problems and is directly applicable to optimization. In performing AQC, different configurations of the Hamiltonian path can lead to dramatic differences in computational efficiency, so it is crucial to configure the path to optimize the performance of AQC. Here we apply a reinforcement learning approach to configure AQC for integer programming, and we find that the learning process automatically converges to a quantum algorithm exhibiting a scaling advantage over trivial AQC with a linear Hamiltonian path. Because of its built-in flexibility, this reinforcement-learning-based approach to quantum adiabatic algorithm design for integer programming can be adapted to the quantum resources available on different quantum computation devices.
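As a rough illustration of the idea, the sketch below parameterizes the Hamiltonian path s(t) of a toy single-qubit AQC instance by a few control points and improves it with a simple hill-climbing loop standing in for the paper's reinforcement learning agent. The Hamiltonians, schedule parameterization, and reward (final ground-state fidelity) are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the authors' code): search over a parameterized
# Hamiltonian path s(t) for a toy single-qubit AQC instance.
# Assumptions: piecewise-linear schedule through a few control points;
# "reward" is the final ground-state fidelity of the problem Hamiltonian.
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)   # driver term
SZ = np.array([[1, 0], [0, -1]], dtype=complex)  # problem term (ground state |1>)

def evolve(controls, T=10.0, steps=200):
    """Simulate H(s) = (1 - s) * (-SX) + s * SZ along the path given by controls."""
    s_path = np.interp(np.linspace(0, 1, steps),
                       np.linspace(0, 1, len(controls)), controls)
    psi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # ground state of -SX
    dt = T / steps
    for s in s_path:
        H = (1 - s) * (-SX) + s * SZ
        evals, evecs = np.linalg.eigh(H)
        U = evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T
        psi = U @ psi
    target = np.array([0, 1], dtype=complex)            # ground state of SZ
    return abs(target.conj() @ psi) ** 2                # fidelity = reward

def hill_climb(n_controls=6, iters=300, sigma=0.05, seed=0):
    """Crude stand-in for the RL loop: perturb the schedule, keep improvements."""
    rng = np.random.default_rng(seed)
    controls = np.linspace(0, 1, n_controls)             # start from the linear path
    best = evolve(controls)
    for _ in range(iters):
        trial = np.clip(controls + sigma * rng.normal(size=n_controls), 0, 1)
        trial[0], trial[-1] = 0.0, 1.0                    # fixed endpoints s(0)=0, s(T)=1
        r = evolve(trial)
        if r > best:
            controls, best = trial, r
    return controls, best

if __name__ == "__main__":
    print(f"linear path fidelity:  {evolve(np.linspace(0, 1, 6)):.4f}")
    learned, learned_fidelity = hill_climb()
    print(f"learned path fidelity: {learned_fidelity:.4f}")
```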

2020
Vol 10 (16)
pp. 5574
Author(s):
Ithan Moreira
Javier Rivas
Francisco Cruz
Richard Dazeley
Angel Ayala
...

Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform increasingly complex tasks and, therefore, to be able to acquire experience from different sources as quickly as possible. A plausible approach to address this issue is interactive feedback, where a trainer advises a learner on which actions should be taken from specific states to speed up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn the environment and acquire new skills autonomously. However, an open issue when using deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human–robot scenario. We compare three learning methods using a simulated robotic arm on the task of organizing different objects: (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that the interactive approaches provide advantages for the learning process. The obtained results show that a learner agent using either agent–IDeepRL or human–IDeepRL completes the given task earlier and makes fewer mistakes than the autonomous DeepRL approach.
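The sketch below is a minimal illustration of the interactive-feedback idea, not the paper's simulated robot-arm setup with raw images: a tabular Q-learner on a toy navigation task occasionally receives an advisor's action, playing the role of agent–IDeepRL or human–IDeepRL, and can be compared against the purely autonomous learner. The task, advisor, and hyperparameters are assumptions made for illustration.

```python
# Minimal sketch of interactive RL with an advisor: tabular Q-learning on a
# toy 1-D task where an advisor, queried with some probability, overrides the
# learner's exploratory action. Sizes and hyperparameters are illustrative.
import random

N_STATES, GOAL = 8, 7          # states 0..7, reach state 7
ACTIONS = [-1, +1]             # move left / right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def advisor(state):
    """Stand-in for a trained agent or a human: always points toward the goal."""
    return +1 if state < GOAL else -1

def train(use_advisor, feedback_prob=0.3, episodes=200, alpha=0.5, gamma=0.95, eps=0.2):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            if use_advisor and random.random() < feedback_prob:
                action = advisor(state)                      # interactive feedback
            elif random.random() < eps:
                action = random.choice(ACTIONS)              # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

if __name__ == "__main__":
    random.seed(0)
    plain = train(use_advisor=False)
    interactive = train(use_advisor=True)
    print("mean steps, autonomous RL: ", sum(plain) / len(plain))
    print("mean steps, interactive RL:", sum(interactive) / len(interactive))
```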


2020
pp. 91-110
Author(s):
John M. McNamara
Olof Leimar

The chapter introduces reinforcement learning in game-theory models. A distinction is made between small-worlds models with Bayesian updating and large-worlds models that implement specific behavioural mechanisms. The actor–critic learning approach is introduced and illustrated with simple examples of learning in a coordination game and in the Hawk–Dove game. Simple versions of a game of investments with joint benefits and a social dominance game are presented, and these games are further developed in Chapter 8. The idea that parameters of the learning process, such as learning rates, can evolve is put forward. For the game examples it is shown that with slow learning over many rounds the outcome can approximate an ESS of a one-shot game, but for higher rates of learning and fewer rounds this need not be the case. The chapter ends with an overview of learning approaches in game theory, including the originally proposed relative-payoff-sum learning rule for games in biology.
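A minimal actor–critic sketch of the Hawk–Dove example follows, under illustrative assumptions (the payoff values, learning rates, and number of rounds are not taken from the chapter): each player keeps a logit for playing Hawk (the actor) and a running estimate of its average payoff (the critic). Consistent with the chapter's point, slow learning over many rounds can bring the Hawk frequencies near the mixed ESS p* = V/C of the one-shot game, while faster learning or fewer rounds need not do so.

```python
# Minimal sketch (not the chapter's model) of actor-critic learning in the
# Hawk-Dove game. Each player: a Hawk-logit (actor) and a payoff estimate
# (critic). With V < C the mixed ESS plays Hawk with probability V / C.
import math
import random

V, C = 2.0, 4.0                     # resource value and fight cost; ESS p* = 0.5

def payoff(me, other):
    """Hawk-Dove payoffs; action 1 = Hawk, 0 = Dove."""
    if me and other:
        return (V - C) / 2.0
    if me and not other:
        return V
    if not me and other:
        return 0.0
    return V / 2.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def actor_critic_hawk_dove(rounds=100_000, alpha=0.01, beta=0.05, seed=1):
    random.seed(seed)
    theta = [0.0, 0.0]              # actors: logits of playing Hawk
    value = [0.0, 0.0]              # critics: estimated average payoff
    for _ in range(rounds):
        probs = [sigmoid(t) for t in theta]
        acts = [1 if random.random() < p else 0 for p in probs]
        for i in (0, 1):
            r = payoff(acts[i], acts[1 - i])
            delta = r - value[i]                              # prediction error
            value[i] += beta * delta                          # critic update
            theta[i] += alpha * delta * (acts[i] - probs[i])  # policy-gradient step
    return [sigmoid(t) for t in theta]

if __name__ == "__main__":
    # With slow learning the Hawk frequencies can approach p* = V / C; with
    # faster learning the pair may instead settle into a Hawk/Dove asymmetry.
    p_hawk = actor_critic_hawk_dove()
    print("learned Hawk probabilities:", [round(p, 3) for p in p_hawk], "ESS p*:", V / C)
```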


Author(s):  
Stuart Armstrong
Jan Leike
Laurent Orseau
Shane Legg

In some agent designs, such as inverse reinforcement learning, an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual ("one life") learning approach in which the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one direction, refusing to learn, "learning" facts already known to the agent, and making decisions that are strictly dominated (for all relevant reward functions). We formally introduce two desirable properties: the first is "unriggability", which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise. The second is "uninfluenceability", whereby the reward-function learning process operates by learning facts about the environment. We show that an uninfluenceable process is automatically unriggable, and if the set of possible environments is sufficiently large, the converse is true too.
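The toy sketch below illustrates the distinction, under assumptions that are not the paper's formal definitions: there are two candidate reward functions R0 and R1, with R1 easier to optimise. Under an uninfluenceable process the learned reward depends only on an environment fact; under a riggable process one action forces the process to conclude R1, and an agent maximising the expected learned reward will choose that action.

```python
# Illustrative sketch (not the paper's formalism) of a riggable versus an
# uninfluenceable reward-learning process. All environments, actions, and
# numbers are toy assumptions.
ACTIONS = ["ask", "push"]          # "push" steers the riggable learning process
HIDDEN = [0, 1]                    # environment fact that fixes the true reward

def achievable_value(reward_index):
    """How much reward the agent can collect later under each reward function."""
    return {0: 1.0, 1: 5.0}[reward_index]   # R1 is much easier to optimise

def uninfluenceable_posterior(action, hidden):
    """Learning depends only on the environment fact, not on the action."""
    return {0: 1.0, 1: 0.0} if hidden == 0 else {0: 0.0, 1: 1.0}

def riggable_posterior(action, hidden):
    """'push' forces the process to conclude R1 regardless of the facts."""
    if action == "push":
        return {0: 0.0, 1: 1.0}
    return uninfluenceable_posterior(action, hidden)

def best_action(posterior_fn):
    """Pick the action maximising expected value under the learned reward."""
    def expected_value(action):
        total = 0.0
        for hidden in HIDDEN:                      # uniform prior over the fact
            post = posterior_fn(action, hidden)
            total += 0.5 * sum(p * achievable_value(r) for r, p in post.items())
        return total
    return max(ACTIONS, key=expected_value)

if __name__ == "__main__":
    # Uninfluenceable: both actions look the same. Riggable: the agent "pushes"
    # the learning process toward the reward it finds easier to optimise.
    print("uninfluenceable process, chosen action:", best_action(uninfluenceable_posterior))
    print("riggable process, chosen action:       ", best_action(riggable_posterior))
```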


2020
Vol 17 (10)
pp. 129-141
Author(s):
Yiwen Nie
Junhui Zhao
Jun Liu
Jing Jiang
Ruijin Ding

2016
Author(s):
Dario di Nocera
Alberto Finzi
Silvia Rossi
Mariacarla Staffa
