Rational models of conditioning

2009 ◽  
Vol 32 (2) ◽  
pp. 204-205 ◽  
Author(s):  
Nick Chater

Abstract: Mitchell et al. argue that conditioning phenomena may be better explained by high-level, rational processes rather than by non-cognitive associative mechanisms. This commentary argues that this viewpoint is compatible with neuroscientific data, may extend to nonhuman animals, and casts computational models of reinforcement learning in a new light.

2020 ◽  
Author(s):  
Clay B. Holroyd ◽  
Tom Verguts

Despite continual debate over the past thirty years about the function of anterior cingulate cortex (ACC), its key contribution to neurocognition remains unknown. Here we review computational models that illustrate three core principles of ACC function (related to hierarchy, world models, and cost), as well as four constraints on the neural implementation of these principles (related to modularity, binding, encoding, and learning and regulation). These observations suggest a role for ACC in model-based hierarchical reinforcement learning, which instantiates a mechanism for motivating the execution of high-level plans.
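
A minimal sketch (not from the review itself) of the kind of model-based hierarchical reinforcement learning invoked here: a high-level controller selects an extended option (plan) by weighing its modelled value against its execution cost, then motivates persistence with that plan until it terminates. The class names, toy options, and cost model are illustrative assumptions.

    import random

    # Toy hierarchical model-based RL: a high-level controller picks an option
    # (an extended plan) and commits to it, trading off modelled value against cost.

    class Option:
        def __init__(self, name, expected_reward, cost, duration):
            self.name = name
            self.expected_reward = expected_reward  # value predicted by the world model
            self.cost = cost                        # effort cost of executing the plan
            self.duration = duration                # number of low-level steps

    def select_option(options):
        # High-level decision: maximise modelled reward minus execution cost.
        return max(options, key=lambda o: o.expected_reward - o.cost)

    def execute(option):
        # Low-level execution: persist with the chosen plan for its full duration.
        for step in range(option.duration):
            print(f"executing {option.name}, step {step + 1}/{option.duration}")
        return option.expected_reward + random.gauss(0, 0.1)  # noisy realised outcome

    if __name__ == "__main__":
        options = [
            Option("write_report", expected_reward=5.0, cost=3.0, duration=3),
            Option("check_email", expected_reward=1.0, cost=0.2, duration=1),
        ]
        chosen = select_option(options)
        outcome = execute(chosen)
        print(f"chose {chosen.name}, realised outcome {outcome:.2f}")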


2021 ◽  
Vol 11 (3) ◽  
pp. 1291
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku-playing AIs have been developed with various artificial-intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built on AlphaGo's algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning other correct answers for duplicated board situations. However, the accuracy of the tree search algorithm drops because its classification criteria are set manually. In this paper, we propose an improved reinforcement-learning-based high-level decision approach using convolutional neural networks (CNNs). The proposed algorithm expresses each state as a one-hot-encoded vector and determines the state of the Gomoku board by combining similar one-hot-encoded states. For cases where the stone selected by the CNN has already been placed or cannot be placed, we also suggest a method for selecting an alternative move. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
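
A minimal sketch of how a Gomoku board state might be one-hot encoded as CNN input, as the abstract describes. The 15x15 board size, the three-channel layout (empty/black/white), and the example stones are assumed conventions for illustration, not details taken from the paper.

    import numpy as np

    BOARD_SIZE = 15  # standard Gomoku board; an assumed convention here

    def one_hot_board(board):
        """Encode a board (0=empty, 1=black, 2=white) as a 3-channel one-hot tensor."""
        encoded = np.zeros((3, BOARD_SIZE, BOARD_SIZE), dtype=np.float32)
        for value in range(3):
            encoded[value][board == value] = 1.0
        return encoded  # shape (3, 15, 15), suitable as CNN input

    if __name__ == "__main__":
        board = np.zeros((BOARD_SIZE, BOARD_SIZE), dtype=np.int64)
        board[7, 7] = 1   # black stone in the centre
        board[7, 8] = 2   # white reply
        x = one_hot_board(board)
        print(x.shape, x.sum(axis=(1, 2)))  # per-channel counts: empty, black, white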


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Bo-yong Park ◽  
Seok-Jun Hong ◽  
Sofie L. Valk ◽  
Casey Paquola ◽  
Oualid Benkarim ◽  
...  

Abstract: The pathophysiology of autism has been suggested to involve a combination of both macroscale connectome miswiring and microcircuit anomalies. Here, we combine connectome-wide manifold learning with biophysical simulation models to understand associations between global network perturbations and microcircuit dysfunctions in autism. We studied neuroimaging and phenotypic data in 47 individuals with autism and 37 typically developing controls obtained from the Autism Brain Imaging Data Exchange initiative. Our analysis establishes significant differences in structural connectome organization in individuals with autism relative to controls, with strong between-group effects in low-level somatosensory regions and moderate effects in high-level association cortices. Computational models reveal that the degree of macroscale anomalies is related to atypical increases of recurrent excitation/inhibition, as well as subcortical inputs into cortical microcircuits, especially in sensory and motor areas. Transcriptomic association analysis based on postmortem datasets identifies genes expressed in cortical and thalamic areas from childhood to young adulthood. Finally, supervised machine learning finds that the macroscale perturbations are associated with symptom severity scores on the Autism Diagnostic Observation Schedule. Together, our analyses suggest that atypical subcortico-cortical interactions are associated with both microcircuit and macroscale connectome differences in autism.
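
A minimal sketch, on synthetic data, of the connectome-wide manifold learning step mentioned above: a structural connectivity matrix is normalized and embedded into low-dimensional "gradients" via a diffusion-map-style spectral decomposition. The matrix size, kernel normalization, and number of components are illustrative assumptions, not the study's actual pipeline.

    import numpy as np

    def connectome_gradients(conn, n_components=3, alpha=0.5):
        """Diffusion-map-style embedding of a symmetric, non-negative connectivity matrix."""
        affinity = conn / conn.sum()
        d = affinity.sum(axis=1)
        # Anisotropic normalisation followed by row-stochastic (Markov) normalisation.
        norm = affinity / np.outer(d ** alpha, d ** alpha)
        markov = norm / norm.sum(axis=1, keepdims=True)
        eigvals, eigvecs = np.linalg.eig(markov)
        order = np.argsort(-eigvals.real)
        # Skip the trivial constant eigenvector; keep the leading gradients.
        return eigvecs.real[:, order[1:n_components + 1]]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        sc = rng.random((100, 100))
        sc = (sc + sc.T) / 2           # symmetric toy structural connectome
        g = connectome_gradients(sc)
        print(g.shape)                 # (100, 3): three gradients across 100 regions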


2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, when even naively modelled in WiseMove, provide an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy does, having learned that its measurement is unreliable.
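
A minimal sketch of the domain-randomization idea described above, applied to perception noise: the sensor-error level is resampled every episode, so the learned policy cannot over-rely on any one noise level. The environment interface, noise model, and parameter ranges are illustrative assumptions, not WiseMove's actual API.

    import random

    class NoisyPerceptionEnv:
        """Toy driving-style environment whose observations carry randomized sensor noise."""

        def __init__(self):
            self.noise_std = 0.0
            self.true_velocity = 0.0

        def reset(self):
            # Domain randomization: resample the perception-error level every episode.
            self.noise_std = random.uniform(0.0, 2.0)
            self.true_velocity = random.uniform(0.0, 30.0)
            return self._observe()

        def step(self, action):
            self.true_velocity = max(0.0, self.true_velocity + action)
            reward = -abs(self.true_velocity - 20.0)   # toy objective: hold ~20 m/s
            return self._observe(), reward, False, {}

        def _observe(self):
            # The policy only ever sees a noisy velocity estimate.
            return self.true_velocity + random.gauss(0.0, self.noise_std)

    if __name__ == "__main__":
        env = NoisyPerceptionEnv()
        for episode in range(3):
            obs = env.reset()
            obs, reward, done, info = env.step(action=1.0)
            print(f"episode {episode}: noise_std={env.noise_std:.2f}, "
                  f"obs={obs:.2f}, reward={reward:.2f}")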


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2534
Author(s):  
Oualid Doukhi ◽  
Deok-Jin Lee

Autonomous navigation and collision avoidance missions represent a significant challenge for robotic systems, as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge is even greater for micro aerial vehicles (MAVs) because of their limited size and computational power. This paper presents a novel approach for enabling a micro aerial vehicle equipped with a laser range finder to autonomously navigate among obstacles and reach a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor-critic reinforcement learning technique to train the aerial robot in a Gazebo simulator to perform a point-goal navigation task by directly mapping the MAV's noisy state and laser scan measurements to continuous motion controls. The obtained policy can perform collision-free flight in the real world despite being trained entirely in a 3D simulator. Intensive simulations and real-time experiments were conducted and compared against a nonlinear model predictive control technique to demonstrate generalization to new, unseen environments and robustness against localization noise. The obtained results demonstrate the system's effectiveness in flying safely and reaching the desired points by planning smooth forward linear velocities and heading rates.
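
A minimal sketch of the kind of actor network implied above: laser-scan readings and goal-relative state are mapped directly to continuous forward-velocity and heading-rate commands. The layer sizes, scan length, and action limits are illustrative assumptions, not the authors' architecture.

    import torch
    import torch.nn as nn

    class MavActor(nn.Module):
        """Map (laser scan + goal-relative state) to [forward velocity, heading rate]."""

        def __init__(self, n_beams=36, n_state=4):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(n_beams + n_state, 128), nn.ReLU(),
                nn.Linear(128, 128), nn.ReLU(),
                nn.Linear(128, 2), nn.Tanh(),    # bounded raw actions in [-1, 1]
            )
            # Scale to physically meaningful ranges (assumed limits).
            self.max_forward = 1.0   # m/s
            self.max_yaw_rate = 0.5  # rad/s

        def forward(self, scan, state):
            raw = self.net(torch.cat([scan, state], dim=-1))
            v = (raw[..., 0] + 1.0) / 2.0 * self.max_forward   # forward velocity >= 0
            yaw = raw[..., 1] * self.max_yaw_rate
            return torch.stack([v, yaw], dim=-1)

    if __name__ == "__main__":
        actor = MavActor()
        scan = torch.rand(1, 36)     # normalized laser ranges
        state = torch.rand(1, 4)     # e.g. distance/bearing to goal, current velocity
        print(actor(scan, state))    # continuous commands, shape (1, 2)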


2019 ◽  
Author(s):  
Allison Letkiewicz ◽  
Amy L. Cochran ◽  
Josh M. Cisler

Trauma and trauma-related disorders are characterized by altered learning styles. Two learning processes that have been delineated using computational modeling are model-free and model-based reinforcement learning (RL), characterized by trial-and-error learning and goal-driven, rule-based learning, respectively. Prior research suggests that model-free RL is disrupted among individuals with a history of assaultive trauma and may contribute to altered fear responding. It is currently unclear whether model-based RL, which involves building abstract and nuanced representations of stimulus-outcome relationships to prospectively predict action-related outcomes, is also impaired among individuals who have experienced trauma. The present study tested the hypothesis that model-based RL is impaired among adolescent females exposed to assaultive trauma. Participants (n=60) completed a three-arm bandit RL task during fMRI acquisition. Two computational models compared the degree to which each participant's task behavior fit a model-free versus a model-based RL strategy. Overall, a greater proportion of participants' behavior was better captured by the model-based than by the model-free RL model. Although assaultive trauma did not predict learning strategy use, greater sexual abuse severity predicted less use of model-based relative to model-free RL. Additionally, severe sexual abuse predicted less left frontoparietal network encoding of model-based RL updates, which was not accounted for by PTSD. Given the significant impact that sexual trauma has on mental health and other aspects of functioning, it is plausible that altered model-based RL is an important route through which clinical impairment emerges.
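
A minimal sketch of the model-free side of the comparison described above: a delta-rule (Q-learning-style) value update with softmax choice on a three-arm bandit. The learning rate, inverse temperature, and reward probabilities are illustrative assumptions, not the parameters estimated in the study.

    import math
    import random

    def softmax_choice(q, beta):
        """Choose an arm with probability proportional to exp(beta * Q)."""
        weights = [math.exp(beta * v) for v in q]
        r, cum = random.random() * sum(weights), 0.0
        for arm, w in enumerate(weights):
            cum += w
            if r <= cum:
                return arm
        return len(q) - 1

    def run_model_free(reward_probs, alpha=0.3, beta=5.0, n_trials=200):
        q = [0.0] * len(reward_probs)
        for _ in range(n_trials):
            arm = softmax_choice(q, beta)
            reward = 1.0 if random.random() < reward_probs[arm] else 0.0
            q[arm] += alpha * (reward - q[arm])   # model-free prediction-error update
        return q

    if __name__ == "__main__":
        # Learned Q-values should roughly track the arms' reward probabilities.
        print(run_model_free([0.2, 0.5, 0.8]))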


2021 ◽  
Author(s):  
Shi Pui Donald Li ◽  
Michael F. Bonner

The scene-preferring portion of the human ventral visual stream, known as the parahippocampal place area (PPA), responds to scenes and landmark objects, which tend to be large in real-world size, fixed in location, and inanimate. However, the PPA also exhibits preferences for low-level contour statistics, including rectilinearity and cardinal orientations, that are not directly predicted by theories of scene- and landmark-selectivity. It is unknown whether these divergent findings of both low- and high-level selectivity in the PPA can be explained by a unified computational theory. To address this issue, we fit hierarchical computational models of mid-level tuning to the image-evoked fMRI responses of the PPA, and we performed a series of high-throughput experiments on these models. Our findings show that hierarchical encoding models of the PPA exhibit emergent selectivity across multiple levels of complexity, giving rise to high-level preferences along dimensions of real-world size, fixedness, and naturalness/animacy as well as low-level preferences for rectilinear shapes and cardinal orientations. These results reconcile disparate theories of PPA function in a unified model of mid-level visual representation, and they demonstrate how multifaceted selectivity profiles naturally emerge from the hierarchical computations of visual cortex and the natural statistics of images.
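
A minimal sketch of the encoding-model approach referenced above: features from an assumed hierarchical model are mapped to voxel responses with ridge regression, and held-out predictions quantify how well the model captures the region's tuning. The synthetic features, responses, and ridge penalty are illustrative assumptions, not the study's pipeline.

    import numpy as np

    def fit_ridge(features, responses, penalty=1.0):
        """Closed-form ridge regression from model features to voxel responses."""
        n_features = features.shape[1]
        gram = features.T @ features + penalty * np.eye(n_features)
        return np.linalg.solve(gram, features.T @ responses)

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Synthetic stand-ins: 200 images x 50 model features, 10 voxels.
        train_feats = rng.standard_normal((200, 50))
        true_weights = rng.standard_normal((50, 10))
        train_resp = train_feats @ true_weights + 0.5 * rng.standard_normal((200, 10))

        weights = fit_ridge(train_feats, train_resp)

        # Held-out evaluation: correlate predicted and observed responses per voxel.
        test_feats = rng.standard_normal((100, 50))
        test_resp = test_feats @ true_weights + 0.5 * rng.standard_normal((100, 10))
        pred = test_feats @ weights
        r = [np.corrcoef(pred[:, v], test_resp[:, v])[0, 1] for v in range(10)]
        print(np.round(r, 2))   # prediction accuracy per simulated voxel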


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which means that exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While these hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes exploration strategies over long time horizons. We formulate curiosity as the error in an agent's ability to reconstruct its observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
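
A minimal sketch of the two-timescale curiosity bonus described above: a fast reconstruction-error signal drives local exploration, a slowly updated baseline rewards states that remain surprising over long horizons, and both are added to the extrinsic reward. The moving-average formulation and weights are illustrative assumptions, not the authors' exact formulation.

    import random

    class TwoTimescaleCuriosity:
        """Combine fast and slow intrinsic rewards with the extrinsic reward."""

        def __init__(self, fast_weight=1.0, slow_weight=0.5, slow_decay=0.999):
            self.fast_weight = fast_weight
            self.slow_weight = slow_weight
            self.slow_decay = slow_decay
            self.slow_baseline = 0.0   # slowly moving estimate of typical surprise

        def reward(self, extrinsic, reconstruction_error):
            fast = reconstruction_error                                   # local novelty
            slow = max(0.0, reconstruction_error - self.slow_baseline)    # novelty beyond the long-run norm
            self.slow_baseline = (self.slow_decay * self.slow_baseline
                                  + (1.0 - self.slow_decay) * reconstruction_error)
            return extrinsic + self.fast_weight * fast + self.slow_weight * slow

    if __name__ == "__main__":
        curiosity = TwoTimescaleCuriosity()
        for step in range(5):
            err = random.random()    # stand-in for an observation-reconstruction error
            total = curiosity.reward(extrinsic=0.0, reconstruction_error=err)
            print(f"step {step}: reconstruction_error={err:.2f}, shaped_reward={total:.2f}")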

