Reinforcement-Learning-Based Quantum Adiabatic Algorithm Design for Integer Programming

SPIN
2021
Author(s):
Jiawei Zhu

Adiabatic quantum computing (AQC) is a computation protocol that exploits quantum advantage to solve difficult problems and is directly applicable to optimization. In performing AQC, different configurations of the Hamiltonian path can lead to dramatic differences in computational efficiency, so it is crucial to configure the path to optimize the performance of AQC. Here we apply a reinforcement learning approach to configure AQC for integer programming, and we find that the learning process automatically converges to a quantum algorithm exhibiting a scaling advantage over trivial AQC with a linear Hamiltonian path. Because of its built-in flexibility, this reinforcement-learning-based approach to quantum adiabatic algorithm design for integer programming can be adapted to the quantum resources available on different quantum computation devices.
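As a rough illustration of the idea, the sketch below parameterizes the Hamiltonian path s(t) of a toy single-qubit AQC instance by a few control points and improves it with a simple hill-climbing loop standing in for the paper's reinforcement learning agent. The Hamiltonians, schedule parameterization, and reward (final ground-state fidelity) are illustrative assumptions, not the authors' setup.

```python
# Minimal sketch (not the authors' code): search over a parameterized
# Hamiltonian path s(t) for a toy single-qubit AQC instance.
# Assumptions: piecewise-linear schedule through a few control points;
# "reward" is the final ground-state fidelity of the problem Hamiltonian.
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)   # driver term
SZ = np.array([[1, 0], [0, -1]], dtype=complex)  # problem term (ground state |1>)

def evolve(controls, T=10.0, steps=200):
    """Simulate H(s) = (1 - s) * (-SX) + s * SZ along the path given by controls."""
    s_path = np.interp(np.linspace(0, 1, steps),
                       np.linspace(0, 1, len(controls)), controls)
    psi = np.array([1, 1], dtype=complex) / np.sqrt(2)  # ground state of -SX
    dt = T / steps
    for s in s_path:
        H = (1 - s) * (-SX) + s * SZ
        evals, evecs = np.linalg.eigh(H)
        U = evecs @ np.diag(np.exp(-1j * evals * dt)) @ evecs.conj().T
        psi = U @ psi
    target = np.array([0, 1], dtype=complex)            # ground state of SZ
    return abs(target.conj() @ psi) ** 2                # fidelity = reward

def hill_climb(n_controls=6, iters=300, sigma=0.05, seed=0):
    """Crude stand-in for the RL loop: perturb the schedule, keep improvements."""
    rng = np.random.default_rng(seed)
    controls = np.linspace(0, 1, n_controls)             # start from the linear path
    best = evolve(controls)
    for _ in range(iters):
        trial = np.clip(controls + sigma * rng.normal(size=n_controls), 0, 1)
        trial[0], trial[-1] = 0.0, 1.0                    # fixed endpoints s(0)=0, s(T)=1
        r = evolve(trial)
        if r > best:
            controls, best = trial, r
    return controls, best

if __name__ == "__main__":
    print(f"linear path fidelity:  {evolve(np.linspace(0, 1, 6)):.4f}")
    learned, learned_fidelity = hill_climb()
    print(f"learned path fidelity: {learned_fidelity:.4f}")
```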

2020
Vol 10 (16)
pp. 5574
Author(s):
Ithan Moreira
Javier Rivas
Francisco Cruz
Richard Dazeley
Angel Ayala
...

Robots are extending their presence in domestic environments every day, and it is increasingly common to see them carrying out tasks in home scenarios. In the future, robots are expected to perform increasingly complex tasks and, therefore, to be able to acquire experience from different sources as quickly as possible. A plausible approach to address this issue is interactive feedback, where a trainer advises a learner on which actions should be taken from specific states to speed up the learning process. Moreover, deep reinforcement learning has recently been widely used in robotics to learn the environment and acquire new skills autonomously. However, an open issue when using deep reinforcement learning is the excessive time needed to learn a task from raw input images. In this work, we propose a deep reinforcement learning approach with interactive feedback to learn a domestic task in a human–robot scenario. We compare three learning methods using a simulated robotic arm on the task of organizing different objects: (i) deep reinforcement learning (DeepRL); (ii) interactive deep reinforcement learning using a previously trained artificial agent as an advisor (agent–IDeepRL); and (iii) interactive deep reinforcement learning using a human advisor (human–IDeepRL). We demonstrate that the interactive approaches provide advantages for the learning process. The obtained results show that a learner agent using either agent–IDeepRL or human–IDeepRL completes the given task earlier and makes fewer mistakes than the autonomous DeepRL approach.
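The sketch below is a minimal illustration of the interactive-feedback idea, not the paper's simulated robot-arm setup with raw images: a tabular Q-learner on a toy navigation task occasionally receives an advisor's action, playing the role of agent–IDeepRL or human–IDeepRL, and can be compared against the purely autonomous learner. The task, advisor, and hyperparameters are assumptions made for illustration.

```python
# Minimal sketch of interactive RL with an advisor: tabular Q-learning on a
# toy 1-D task where an advisor, queried with some probability, overrides the
# learner's exploratory action. Sizes and hyperparameters are illustrative.
import random

N_STATES, GOAL = 8, 7          # states 0..7, reach state 7
ACTIONS = [-1, +1]             # move left / right

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def advisor(state):
    """Stand-in for a trained agent or a human: always points toward the goal."""
    return +1 if state < GOAL else -1

def train(use_advisor, feedback_prob=0.3, episodes=200, alpha=0.5, gamma=0.95, eps=0.2):
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    steps_per_episode = []
    for _ in range(episodes):
        state, done, steps = 0, False, 0
        while not done and steps < 100:
            if use_advisor and random.random() < feedback_prob:
                action = advisor(state)                      # interactive feedback
            elif random.random() < eps:
                action = random.choice(ACTIONS)              # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state, steps = nxt, steps + 1
        steps_per_episode.append(steps)
    return steps_per_episode

if __name__ == "__main__":
    random.seed(0)
    plain = train(use_advisor=False)
    interactive = train(use_advisor=True)
    print("mean steps, autonomous RL: ", sum(plain) / len(plain))
    print("mean steps, interactive RL:", sum(interactive) / len(interactive))
```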


2020
pp. 91-110
Author(s):
John M. McNamara
Olof Leimar

The chapter introduces reinforcement learning in game-theory models. A distinction is made between small-worlds models with Bayesian updating and large-worlds models that implement specific behavioural mechanisms. The actor–critic learning approach is introduced and illustrated with simple examples of learning in a coordination game and in the Hawk–Dove game. Simple versions of a game of investments with joint benefits and a social dominance game are presented, and these games are further developed in Chapter 8. The idea that parameters of the learning process, such as learning rates, can evolve is put forward. For the game examples it is shown that with slow learning over many rounds the outcome can approximate an ESS of a one-shot game, but for higher rates of learning and fewer rounds this need not be the case. The chapter ends with an overview of learning approaches in game theory, including the originally proposed relative-payoff-sum learning rule for games in biology.
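A minimal actor–critic sketch of the Hawk–Dove example follows, under illustrative assumptions (the payoff values, learning rates, and number of rounds are not taken from the chapter): each player keeps a logit for playing Hawk (the actor) and a running estimate of its average payoff (the critic). Consistent with the chapter's point, slow learning over many rounds can bring the Hawk frequencies near the mixed ESS p* = V/C of the one-shot game, while faster learning or fewer rounds need not do so.

```python
# Minimal sketch (not the chapter's model) of actor-critic learning in the
# Hawk-Dove game. Each player: a Hawk-logit (actor) and a payoff estimate
# (critic). With V < C the mixed ESS plays Hawk with probability V / C.
import math
import random

V, C = 2.0, 4.0                     # resource value and fight cost; ESS p* = 0.5

def payoff(me, other):
    """Hawk-Dove payoffs; action 1 = Hawk, 0 = Dove."""
    if me and other:
        return (V - C) / 2.0
    if me and not other:
        return V
    if not me and other:
        return 0.0
    return V / 2.0

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def actor_critic_hawk_dove(rounds=100_000, alpha=0.01, beta=0.05, seed=1):
    random.seed(seed)
    theta = [0.0, 0.0]              # actors: logits of playing Hawk
    value = [0.0, 0.0]              # critics: estimated average payoff
    for _ in range(rounds):
        probs = [sigmoid(t) for t in theta]
        acts = [1 if random.random() < p else 0 for p in probs]
        for i in (0, 1):
            r = payoff(acts[i], acts[1 - i])
            delta = r - value[i]                              # prediction error
            value[i] += beta * delta                          # critic update
            theta[i] += alpha * delta * (acts[i] - probs[i])  # policy-gradient step
    return [sigmoid(t) for t in theta]

if __name__ == "__main__":
    # With slow learning the Hawk frequencies can approach p* = V / C; with
    # faster learning the pair may instead settle into a Hawk/Dove asymmetry.
    p_hawk = actor_critic_hawk_dove()
    print("learned Hawk probabilities:", [round(p, 3) for p in p_hawk], "ESS p*:", V / C)
```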


Author(s):  
Stuart Armstrong
Jan Leike
Laurent Orseau
Shane Legg

In some agent designs, such as inverse reinforcement learning, an agent needs to learn its own reward function. Learning the reward function and optimising for it are typically two different processes, usually performed at different stages. We consider a continual ("one life") learning approach in which the agent both learns the reward function and optimises for it at the same time. We show that this comes with a number of pitfalls, such as deliberately manipulating the learning process in one direction, refusing to learn, "learning" facts already known to the agent, and making decisions that are strictly dominated (for all relevant reward functions). We formally introduce two desirable properties: the first is "unriggability", which prevents the agent from steering the learning process in the direction of a reward function that is easier to optimise. The second is "uninfluenceability", whereby the reward-function learning process operates by learning facts about the environment. We show that an uninfluenceable process is automatically unriggable, and if the set of possible environments is sufficiently large, the converse is true too.
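The toy sketch below illustrates the distinction, under assumptions that are not the paper's formal definitions: there are two candidate reward functions R0 and R1, with R1 easier to optimise. Under an uninfluenceable process the learned reward depends only on an environment fact; under a riggable process one action forces the process to conclude R1, and an agent maximising the expected learned reward will choose that action.

```python
# Illustrative sketch (not the paper's formalism) of a riggable versus an
# uninfluenceable reward-learning process. All environments, actions, and
# numbers are toy assumptions.
ACTIONS = ["ask", "push"]          # "push" steers the riggable learning process
HIDDEN = [0, 1]                    # environment fact that fixes the true reward

def achievable_value(reward_index):
    """How much reward the agent can collect later under each reward function."""
    return {0: 1.0, 1: 5.0}[reward_index]   # R1 is much easier to optimise

def uninfluenceable_posterior(action, hidden):
    """Learning depends only on the environment fact, not on the action."""
    return {0: 1.0, 1: 0.0} if hidden == 0 else {0: 0.0, 1: 1.0}

def riggable_posterior(action, hidden):
    """'push' forces the process to conclude R1 regardless of the facts."""
    if action == "push":
        return {0: 0.0, 1: 1.0}
    return uninfluenceable_posterior(action, hidden)

def best_action(posterior_fn):
    """Pick the action maximising expected value under the learned reward."""
    def expected_value(action):
        total = 0.0
        for hidden in HIDDEN:                      # uniform prior over the fact
            post = posterior_fn(action, hidden)
            total += 0.5 * sum(p * achievable_value(r) for r, p in post.items())
        return total
    return max(ACTIONS, key=expected_value)

if __name__ == "__main__":
    # Uninfluenceable: both actions look the same. Riggable: the agent "pushes"
    # the learning process toward the reward it finds easier to optimise.
    print("uninfluenceable process, chosen action:", best_action(uninfluenceable_posterior))
    print("riggable process, chosen action:       ", best_action(riggable_posterior))
```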


2020
Vol 17 (10)
pp. 129-141
Author(s):
Yiwen Nie
Junhui Zhao
Jun Liu
Jing Jiang
Ruijin Ding

2016
Author(s):
Dario di Nocera
Alberto Finzi
Silvia Rossi
Mariacarla Staffa
