reinforcement learning models
Recently Published Documents


TOTAL DOCUMENTS

109
(FIVE YEARS 57)

H-INDEX

16
(FIVE YEARS 4)

2021 ◽  
Author(s):  
Haifei Zhang ◽  
Xu Jian ◽  
Liting Lei ◽  
Fang Wu ◽  
Lanmei Qian ◽  
...  

Abstract Focusing on the motion control problem of a two-link manipulator, a manipulator control approach based on deep deterministic policy gradient with parameter noise is proposed. Firstly, the manipulator simulation environment is built. Then, three deep reinforcement learning models, deep deterministic policy gradient (DDPG), asynchronous advantage actor-critic (A3C), and distributed proximal policy optimization (DPPO), are established and trained according to the target setting, state variables, and reward and punishment mechanism of the environment model. Finally, the motion control of the two-link manipulator is realized. After comparing and analyzing the three models, the DDPG approach based on parameter noise is selected for further research to improve its applicability, shorten the debugging time of the manipulator model, and reach the goal smoothly. The experimental results indicate that the DDPG approach based on parameter noise controls the motion of the two-link manipulator effectively: the convergence speed of the control model is significantly improved, as is its stability after convergence. Compared with the traditional control approach, the DDPG control approach based on parameter noise has higher efficiency and stronger applicability.
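The abstract gives no implementation details, but the core idea of parameter noise, perturbing the actor's weights rather than its output actions, can be sketched as follows. This is a minimal NumPy illustration; the network sizes, noise scale, and adaptive-scaling rule are assumptions for the example, not the authors' settings.

```python
import numpy as np

def actor(params, state):
    """Tiny deterministic policy: one hidden layer, tanh output in [-1, 1]."""
    W1, b1, W2, b2 = params
    h = np.tanh(state @ W1 + b1)
    return np.tanh(h @ W2 + b2)

def perturb(params, sigma, rng):
    """Parameter-space noise: add Gaussian noise to every weight."""
    return [p + rng.normal(0.0, sigma, p.shape) for p in params]

rng = np.random.default_rng(0)
state_dim, hidden, action_dim = 6, 32, 2     # e.g. joint angles/velocities -> joint torques (assumed)
params = [rng.normal(0, 0.1, (state_dim, hidden)), np.zeros(hidden),
          rng.normal(0, 0.1, (hidden, action_dim)), np.zeros(action_dim)]

sigma, target_dist = 0.05, 0.1               # illustrative noise scale and target action distance
state = rng.normal(size=state_dim)

noisy_params = perturb(params, sigma, rng)
a_clean, a_noisy = actor(params, state), actor(noisy_params, state)

# Adaptive scaling of the noise: grow or shrink sigma so the induced
# action perturbation stays near a target distance (one common heuristic).
dist = np.sqrt(np.mean((a_clean - a_noisy) ** 2))
sigma = sigma * 1.01 if dist < target_dist else sigma / 1.01
print(a_noisy, sigma)
```

In a full DDPG loop, the perturbed copy of the actor would be used to collect experience while the unperturbed actor and critic are trained as usual.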


2021 ◽  
Vol 11 (22) ◽  
pp. 10870
Author(s):  
Abdikarim Mohamed Ibrahim ◽  
Kok-Lim Alvin Yau ◽  
Yung-Wey Chong ◽  
Celimuge Wu

Recent advancements in deep reinforcement learning (DRL) have led to its application in multi-agent scenarios to solve complex real-world problems, such as network resource allocation and sharing, network routing, and traffic signal control. Multi-agent DRL (MADRL) enables multiple agents to interact with each other and with their operating environment, and to learn without the need for external critics (or teachers), thereby solving complex problems. Significant performance enhancements brought about by the use of MADRL have been reported in multi-agent domains; for instance, it has been shown to provide higher quality of service (QoS) in network resource allocation and sharing. This paper presents a survey of MADRL models proposed for various kinds of multi-agent domains, using a taxonomic approach that highlights their objectives, characteristics, challenges, applications, and performance measures. Furthermore, we present open issues and future directions for MADRL.
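To make the decentralized setting the survey covers concrete, here is a deliberately tiny sketch: two independent tabular learners repeatedly pick one of two channels and are rewarded only when they avoid colliding, a toy stand-in for the resource-sharing problems mentioned above. The game, reward values, and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions, alpha, eps = 2, 0.1, 0.1
Q = [np.zeros(n_actions), np.zeros(n_actions)]   # one value table per agent

for step in range(5000):
    acts = [rng.integers(n_actions) if rng.random() < eps else int(np.argmax(q))
            for q in Q]
    reward = 1.0 if acts[0] != acts[1] else 0.0   # shared reward: no collision on the channel
    for i in range(2):                            # each agent updates only its own table
        Q[i][acts[i]] += alpha * (reward - Q[i][acts[i]])

print(Q[0], Q[1])   # the two agents typically settle on different channels
```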


2021 ◽  
Author(s):  
Jessica Katherine Bone ◽  
Alexandra Claire Pike ◽  
Gemma Lewis ◽  
Glyn Lewis ◽  
Sarah-Jayne Blakemore ◽  
...  

There is a sharp increase in depression in adolescence, but why this occurs is not well understood. We investigated how adolescents learn about social evaluation and whether learning is associated with depressive symptoms. In a cross-sectional school-based study, 598 adolescents (aged 11-15 years) completed a social evaluation learning task and the short Mood and Feelings Questionnaire. We developed and validated reinforcement learning models, formalising the processes hypothesised to underlie learning about social evaluation. Adolescents started the learning task with a positive expectation that they and others would be liked, and this positive bias was larger for the self than others. Expectations about the self were more resistant to feedback than expectations about others. Only initial expectations were associated with depressive symptoms; adolescents whose expectations were less positive had more severe symptoms. Consistent with cognitive theories, prior beliefs about social evaluation may be a risk factor for depressive symptoms.
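The abstract describes learners who start from a positively biased prior about being liked and update it from social feedback, with expectations about the self more resistant to change. A hedged sketch of that kind of model is given below; the priors, learning rates, and feedback probabilities are illustrative assumptions, not the fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(prior, alpha, p_like, n_trials=60):
    """prior: initial expectation of being liked (0-1); alpha: learning rate."""
    v = prior
    history = []
    for _ in range(n_trials):
        feedback = float(rng.random() < p_like)   # 1 = "likes you", 0 = "dislikes you"
        v += alpha * (feedback - v)               # prediction-error (Rescorla-Wagner) update
        history.append(v)
    return np.array(history)

self_traj  = simulate(prior=0.85, alpha=0.05, p_like=0.5)  # strong positive prior, sluggish updating
other_traj = simulate(prior=0.70, alpha=0.20, p_like=0.5)  # weaker prior, faster updating
print(self_traj[-1], other_traj[-1])
```

Under these assumed settings, self-related expectations stay closer to the initial positive belief despite identical feedback, mirroring the asymmetry the study reports.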


2021 ◽  
Author(s):  
Lena Esther Ptasczynski ◽  
Isa Steinecker ◽  
Philipp Sterzer ◽  
Matthias Guggenmos

Reinforcement learning algorithms have a long-standing success story in explaining the dynamics of instrumental conditioning in humans and other species. While normative reinforcement learning models are critically dependent on external feedback, recent findings in the field of perceptual learning point to a crucial role of internally-generated reinforcement signals based on subjective confidence, when external feedback is not available. Here, we investigated the existence of such confidence-based learning signals in a key domain of reinforcement-based learning: instrumental conditioning. We conducted a value-based decision making experiment which included phases with and without external feedback and in which participants reported their confidence in addition to choices. Behaviorally, we found signatures of self-reinforcement in phases without feedback, reflected in an increase of subjective confidence and choice consistency. To clarify the mechanistic role of confidence in value-based learning, we compared a family of confidence-based learning models with more standard models predicting either no change in value estimates or a devaluation over time when no external reward is provided. We found that confidence-based models indeed outperformed these reference models, whereby the learning signal of the winning model was based on the prediction error between current confidence and a stimulus-unspecific average of previous confidence levels. Interestingly, individuals with more volatile reward-based value updates in the presence of feedback also showed more volatile confidence-based value updates when feedback was not available. Together, our results provide evidence that confidence-based learning signals affect instrumentally learned subjective values in the absence of external feedback.
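The winning model described here replaces external reward with a confidence prediction error: the difference between current confidence and a stimulus-unspecific running average of previous confidence. A rough sketch of that update for the no-feedback phase follows; the learning rates, starting values, and confidence readout are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha_v, alpha_c = 0.2, 0.1            # value and confidence-average learning rates (assumed)
values = np.array([0.6, 0.4])          # values carried over from a hypothetical feedback phase
conf_bar = 0.5                         # running, stimulus-unspecific average of confidence

for trial in range(200):
    choice = int(np.argmax(values + rng.normal(0, 0.1, 2)))     # noisy greedy choice
    confidence = 1 / (1 + np.exp(-values[choice])) + rng.normal(0, 0.02)  # toy confidence readout
    pe = confidence - conf_bar                                   # confidence prediction error
    values[choice] += alpha_v * pe                               # self-reinforcement of the chosen option
    conf_bar += alpha_c * (confidence - conf_bar)                # update the running confidence average

print(values, conf_bar)
```

Run forward, this toy loop reproduces the behavioural signature described: confidence and choice consistency drift upward even though no external reward is delivered.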


2021 ◽  
Vol 11 (16) ◽  
pp. 7240
Author(s):  
Yalew Zelalem Jembre ◽  
Yuniarto Wimbo Nugroho ◽  
Muhammad Toaha Raza Khan ◽  
Muhammad Attique ◽  
Rajib Paul ◽  
...  

Unmanned Aerial Vehicles (UAVs) are rapidly becoming a part of society, a trend that is expected to grow even further. The quadrotor is one of the drone technologies applicable in many sectors, in both military and civilian activities, and some applications require autonomous flight. However, stability, path planning, and control remain significant challenges in autonomous quadrotor flight. Traditional control algorithms, such as proportional-integral-derivative (PID) control, have deficiencies, especially in tuning. Recently, machine learning has received great attention as a way to fly UAVs to desired positions autonomously. In this work, we configure the quadrotor to fly autonomously by using agents (the machine learning schemes that fly the quadrotor) to learn about a virtual physical environment. The quadrotor flies from an initial position to a desired position; when the agent brings the quadrotor closer to the desired position it is rewarded, and otherwise it is punished. Two reinforcement learning models, Q-learning and SARSA, and a deep Q-network (DQN) are used as agents. The simulation is conducted by integrating the Robot Operating System (ROS) and Gazebo, which allowed for the implementation of the learning algorithms and the physical environment, respectively. The results show that the DQN agent with the Adadelta optimizer is the best setting for flying the quadrotor from the initial to the desired position.
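The reward scheme described (reward for moving closer to the target, punishment otherwise) pairs naturally with the tabular agents mentioned. Below is a toy 1-D stand-in for the "fly to the desired position" task using a SARSA update, with a comment noting how the Q-learning target would differ; the environment, reward values, and hyperparameters are simplified assumptions, not the paper's ROS/Gazebo setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pos, goal = 10, 9
alpha, gamma, eps = 0.1, 0.95, 0.1
Q = np.zeros((n_pos, 2))                          # actions: 0 = move left, 1 = move right

def step(pos, act):
    new = int(np.clip(pos + (1 if act == 1 else -1), 0, n_pos - 1))
    reward = 1.0 if abs(goal - new) < abs(goal - pos) else -1.0   # closer -> reward, else punish
    return new, reward, new == goal

def policy(pos):
    return rng.integers(2) if rng.random() < eps else int(np.argmax(Q[pos]))

for episode in range(300):
    pos, act = 0, policy(0)
    done = False
    while not done:
        new, r, done = step(pos, act)
        next_act = policy(new)
        # SARSA (on-policy) target uses the action actually taken next;
        # the Q-learning (off-policy) target would use Q[new].max() instead.
        Q[pos, act] += alpha * (r + gamma * Q[new, next_act] * (not done) - Q[pos, act])
        pos, act = new, next_act

print(np.argmax(Q, axis=1))   # learned greedy action per position (mostly "move right")
```

A DQN agent replaces the table Q with a neural network trained on the same targets, which is where an optimizer choice such as Adadelta enters.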


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Jo Cutler ◽  
Marco K. Wittmann ◽  
Ayat Abdurahman ◽  
Luca D. Hargitai ◽  
Daniel Drew ◽  
...  

Abstract Reinforcement learning is a fundamental mechanism displayed by many species. However, adaptive behaviour depends not only on learning about actions and outcomes that affect ourselves, but also on those that affect others. Using computational reinforcement learning models, we tested whether young (age 18–36) and older (age 60–80; total n = 152) adults learn to gain rewards for themselves, another person (prosocial), or neither individual (control). Detailed model comparison showed that a model with separate learning rates for each recipient best explained behaviour. Young adults learned faster when their actions benefitted themselves, compared to others. Compared to young adults, older adults showed reduced self-relevant learning rates but preserved prosocial learning. Moreover, levels of subclinical self-reported psychopathic traits (including lack of concern for others) were lower in older adults, and the core affective-interpersonal component of this measure negatively correlated with prosocial learning. These findings suggest that learning to benefit others is preserved across the lifespan, with implications for reinforcement learning and theories of healthy ageing.
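The winning model class, a Rescorla-Wagner learner with a separate learning rate for each recipient condition, can be sketched as follows. The specific rates, softmax temperature, and reward probabilities here are illustrative assumptions, not the fitted estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
alphas = {"self": 0.30, "other": 0.20, "no one": 0.15}   # recipient-specific learning rates (assumed)
beta = 5.0                                                # softmax inverse temperature
p_reward = [0.75, 0.25]                                   # "good" vs "bad" option

for recipient, alpha in alphas.items():
    v = np.zeros(2)
    correct = 0
    for t in range(100):
        p = np.exp(beta * v) / np.exp(beta * v).sum()     # softmax choice probabilities
        choice = rng.choice(2, p=p)
        reward = float(rng.random() < p_reward[choice])
        v[choice] += alpha * (reward - v[choice])         # prediction-error update for this recipient
        correct += choice == 0
    print(recipient, correct / 100)
```

With a higher learning rate for the "self" condition, the simulated learner identifies the better option faster when it benefits itself, which is the qualitative pattern reported for young adults.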


2021 ◽  
Vol 17 (7) ◽  
pp. e1008524
Author(s):  
Liyu Xia ◽  
Sarah L. Master ◽  
Maria K. Eckstein ◽  
Beth Baribault ◽  
Ronald E. Dahl ◽  
...  

In the real world, many relationships between events are uncertain and probabilistic. Uncertainty is also likely to be a more common feature of daily experience for youth because they have less experience to draw from than adults. Some studies suggest probabilistic learning may be less efficient in youths than in adults, while others suggest it may be more efficient in mid-adolescence. Here we used a probabilistic reinforcement learning task to test how youth aged 8-17 (N = 187) and adults aged 18-30 (N = 110) learn about stable probabilistic contingencies. Performance increased with age through the early twenties, then stabilized. Using hierarchical Bayesian methods to fit computational reinforcement learning models, we show that all participants' performance was better explained by models in which negative outcomes had minimal to no impact on learning. The performance increase over age was driven by (1) an increase in learning rate (i.e., a decrease in integration time scale) and (2) a decrease in noisy/exploratory choices. In mid-adolescence (ages 13-15), salivary testosterone and learning rate were positively related. We discuss our findings in the context of other studies and hypotheses about adolescent brain development.
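The asymmetric-update idea favoured by the model comparison, that positive outcomes drive learning while negative outcomes have little or no effect, can be illustrated with a short sketch. The parameter values and the epsilon-greedy choice rule are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_pos, alpha_neg, eps = 0.4, 0.02, 0.1   # strong learning from gains, near-zero from losses (assumed)
p_reward = [0.8, 0.2]                        # stable probabilistic contingencies
v = np.full(2, 0.5)

for t in range(200):
    choice = rng.integers(2) if rng.random() < eps else int(np.argmax(v))
    outcome = 1.0 if rng.random() < p_reward[choice] else 0.0
    pe = outcome - v[choice]
    alpha = alpha_pos if pe > 0 else alpha_neg   # negative outcomes barely count
    v[choice] += alpha * pe

print(v)   # the value of the better option ends up inflated relative to its true probability
```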


2021 ◽  
Vol 8 (7) ◽  
pp. 202159
Author(s):  
Jonathan Yi ◽  
Philip Pärnamets ◽  
Andreas Olsson

Responding appropriately to others' facial expressions is key to successful social functioning. Despite the large body of work on face perception and spontaneous responses to static faces, little is known about responses to faces in dynamic, naturalistic situations, and no study has investigated how goal-directed responses to faces are influenced by learning during dyadic interactions. To experimentally model such situations, we developed a novel method based on online integration of electromyography signals from the participants' face (corrugator supercilii and zygomaticus major) during facial expression exchange with dynamic faces displaying happy and angry expressions. Fifty-eight participants learned by trial and error to avoid aversive stimulation by either reciprocating (congruent) or responding opposite to (incongruent) the expression of the target face. Our results validated the method, showing that participants learned to optimize their facial behaviour, and replicated earlier findings of faster and more accurate responses in congruent versus incongruent conditions. Moreover, participants performed better on trials with smiling faces than with frowning faces, suggesting it might be easier to adapt facial responses to positively associated expressions. Finally, we applied drift diffusion and reinforcement learning models to provide a mechanistic account of our findings, clarifying the decision-making processes underlying our experimental manipulation. Our results introduce a new method for studying learning and decision-making in facial expression exchange, in which facial expression selection must gradually adapt to both social and non-social reinforcements.
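One common way to combine the two model families mentioned is to let learned action values set the drift rate of an evidence-accumulation process, yielding both a choice and a response time on every trial. The following sketch shows that coupling under assumed parameter values; it is not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, drift_scale, threshold, dt, noise = 0.2, 2.0, 1.0, 0.01, 1.0
v = np.zeros(2)                               # values of the two facial responses

def ddm_trial(drift):
    """Accumulate noisy evidence until a boundary is crossed; return (choice, RT)."""
    x, t = 0.0, 0.0
    while abs(x) < threshold:
        x += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return (0 if x > 0 else 1), t

for trial in range(100):
    drift = drift_scale * (v[0] - v[1])       # value difference drives the drift rate
    choice, rt = ddm_trial(drift)
    reward = 1.0 if choice == 0 else 0.0      # say response 0 avoids the aversive outcome (assumed)
    v[choice] += alpha * (reward - v[choice]) # standard prediction-error update

print(v, rt)
```

As the value difference grows over trials, choices become both more accurate and faster, which is the kind of joint pattern such combined models are used to explain.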


2021 ◽  
Author(s):  
Sebastian Bruch ◽  
Patrick McClure ◽  
Jingfeng Zhou ◽  
Geoffrey Schoenbaum ◽  
Francisco Pereira

Deep Reinforcement Learning (Deep RL) agents have in recent years emerged as successful models of animal behavior in a variety of complex learning tasks, as exemplified by Song et al. [2017]. As agents are typically trained to mimic an animal subject, the emphasis in past studies on behavior as a means of evaluating the fitness of models to experimental data is only natural. But the true power of Deep RL agents lies in their ability to learn neural computations and codes that generate a particular behavior, factors that are also of great relevance and interest to computational neuroscience. On that basis, we believe that model evaluation should include an examination of neural representations and validation against neural recordings from animal subjects. In this paper, we introduce a procedure for testing hypotheses about the relationship between the internal representations of Deep RL agents and those in animal neural recordings. Taking a sequential learning task as a running example, we apply our method and show that the geometry of the representations learnt by artificial agents is similar to that of the biological subjects, and that such similarities are driven by shared information in some latent space. Our method is applicable to any Deep RL agent that learns a Markov Decision Process, and as such it enables researchers to assess the suitability of more advanced Deep Learning modules, to map hierarchies of representations to different parts of a circuit in the brain, and to help shed light on their function. To demonstrate that point, we conduct an ablation study and deduce that, in the sequential task under consideration, temporal information plays a key role in molding a correct representation of the task.
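One standard way to quantify the kind of representational similarity tested here is linear centered kernel alignment (CKA) between an agent's hidden activations and neural recordings over matched task conditions. The sketch below uses random placeholder data and is only an illustration of the metric; the paper's actual procedure may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear_cka(X, Y):
    """Linear CKA between X: (conditions, agent units) and Y: (conditions, recorded neurons)."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

conditions = 40
agent_act = rng.normal(size=(conditions, 64))               # hidden layer of a Deep RL agent (placeholder)
shared = agent_act @ rng.normal(size=(64, 30))               # latent structure shared with the "neurons"
neural = shared + rng.normal(scale=0.5, size=shared.shape)   # synthetic neural recordings
control = rng.normal(size=neural.shape)                      # unrelated control data

print(linear_cka(agent_act, neural), linear_cka(agent_act, control))
```

The similarity score is high when the agent's activations and the recordings share latent structure and near zero for the unrelated control, which is the contrast such analyses rely on.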

