On the Impact of Gravity Compensation on Reinforcement Learning in Goal-Reaching Tasks for Robotic Manipulators

Robotics ◽  
2021 ◽  
Vol 10 (1) ◽  
pp. 46
Author(s):  
Jonathan Fugal ◽  
Jihye Bae ◽  
Hasan A. Poonawala

Advances in machine learning technologies in recent years have facilitated developments in autonomous robotic systems. Designing these autonomous systems typically requires manually specified models of the robotic system and world when using classical control-based strategies, or time-consuming and computationally expensive data-driven training when using learning-based strategies. Combining classical control with learning-based strategies may mitigate both requirements. However, the performance of the combined control system is not obvious, given that it involves two separate controllers. This paper focuses on one such combination, which uses gravity compensation together with reinforcement learning (RL). We present a study of the effects of gravity compensation on the performance of two reinforcement learning algorithms when solving reaching tasks using a simulated seven-degree-of-freedom robotic arm. The results of our study demonstrate that gravity compensation coupled with RL can reduce the training required in reaching tasks involving elevated target locations, but not all target locations.
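As a rough sketch of how such a combination can be structured, a model-based gravity term G(q) can be added to the torque proposed by the learned policy. The additive composition and the two-link gravity model below are illustrative assumptions, since the abstract does not specify the controllers' interface:

```python
import numpy as np

def gravity_torques_2link(q, m=(1.0, 1.0), l=(0.5, 0.5), g=9.81):
    """Gravity torque vector G(q) for a planar two-link arm with point
    masses at the link ends (standard textbook model, not the paper's
    seven-DoF arm)."""
    m1, m2 = m
    l1, l2 = l
    q1, q2 = q
    g1 = (m1 + m2) * g * l1 * np.cos(q1) + m2 * g * l2 * np.cos(q1 + q2)
    g2 = m2 * g * l2 * np.cos(q1 + q2)
    return np.array([g1, g2])

q = np.array([0.3, -0.5])          # joint angles (rad)
rl_torque = np.array([0.1, -0.2])  # torque proposed by the RL policy
tau = rl_torque + gravity_torques_2link(q)  # gravity-compensated command
print(tau)
```

Under this composition the policy only has to learn the residual torques needed for the reaching motion, which is one plausible explanation for the reduced training the authors report for elevated targets.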

2019 ◽  
Author(s):  
Jennifer R Sadler ◽  
Grace Elisabeth Shearrer ◽  
Nichollette Acosta ◽  
Kyle Stanley Burger

BACKGROUND: Dietary restraint represents an individual's intent to limit their food intake and has been associated with impaired passive food reinforcement learning. However, the impact of dietary restraint on active, response-dependent learning is poorly understood. In this study, we tested the relationship between dietary restraint and food reinforcement learning using an active, instrumental conditioning task. METHODS: A sample of ninety adults completed a response-dependent instrumental conditioning task with reward and punishment using sweet and bitter tastes. Brain response was measured via functional MRI during the task. Participants also completed anthropometric measures, reward/motivation-related questionnaires, and a working memory task. Dietary restraint was assessed via the Dutch Restrained Eating Scale. RESULTS: Two groups were selected from the sample: high restraint (n=29; score >2.5) and low restraint (n=30; score <1.85). High restraint was associated with significantly higher BMI (p=0.003) and lower N-back accuracy (p=0.045). The high-restraint group was also marginally better at the instrumental conditioning task (p=0.066, r=0.37). High restraint was also associated with significantly greater brain response in the intracalcarine cortex (MNI: 15, -69, 12; k=35, p_FWE < 0.05) to bitter taste, compared to neutral taste. CONCLUSIONS: High restraint was associated with improved performance on an instrumental task testing how individuals learn from reward and punishment. This may be mediated by greater brain response in the primary visual cortex, which has been associated with mental representation. Results suggest that dietary restraint does not impair response-dependent reinforcement learning.
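For readers unfamiliar with response-dependent conditioning, a minimal action-value simulation of such a task might look like the following; the softmax choice rule, learning rate, and outcome probabilities are illustrative assumptions, not the study's fitted model:

```python
import math
import random

def simulate_instrumental_task(p_reward=0.8, alpha=0.2, beta=3.0,
                               n_trials=100, seed=0):
    """Minimal action-value model of a response-dependent conditioning
    task with rewarding (+1, 'sweet') and punishing (-1, 'bitter')
    outcomes. Parameter values are illustrative only."""
    rng = random.Random(seed)
    q = [0.0, 0.0]                         # learned value of each response
    hits = 0
    for _ in range(n_trials):
        # softmax choice between the two responses
        w0 = math.exp(beta * q[0])
        w1 = math.exp(beta * q[1])
        a = 0 if rng.random() < w0 / (w0 + w1) else 1
        # response 0 is mostly rewarded, response 1 mostly punished
        lucky = rng.random() < p_reward
        r = 1.0 if (a == 0) == lucky else -1.0
        q[a] += alpha * (r - q[a])         # prediction-error update
        hits += (a == 0)
    return hits / n_trials                 # fraction of 'correct' responses

print(simulate_instrumental_task())
```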


Biomimetics ◽  
2021 ◽  
Vol 6 (1) ◽  
pp. 13
Author(s):  
Adam Bignold ◽  
Francisco Cruz ◽  
Richard Dazeley ◽  
Peter Vamplew ◽  
Cameron Foale

Interactive reinforcement learning methods utilise an external information source to evaluate decisions and accelerate learning. Previous work has shown that human advice can significantly improve learning agents' performance. When evaluating reinforcement learning algorithms, it is common to repeat experiments as parameters are altered or to gain a sufficient sample size. In this regard, requiring human interaction every time an experiment is restarted is undesirable, particularly when the expense of doing so can be considerable. Additionally, reusing the same people for the experiment introduces bias, as they will learn the behaviour of the agent and the dynamics of the environment. This paper presents a methodology for evaluating interactive reinforcement learning agents by employing simulated users. Simulated users allow human knowledge, bias, and interaction to be simulated. The use of simulated users allows the development and testing of reinforcement learning agents, and can provide indicative results of agent performance under defined human constraints. While simulated users are no replacement for actual humans, they do offer an affordable and fast alternative for evaluating assisted agents. We introduce a method for performing a preliminary evaluation utilising simulated users to show how performance changes depending on the type of user assisting the agent. Moreover, we describe how human interaction may be simulated, and present an experiment illustrating the applicability of simulating users in evaluating agent performance when assisted by different types of trainers. Experimental results show that the use of this methodology allows for greater insight into the performance of interactive reinforcement learning agents when advised by different users. The use of simulated users with varying characteristics allows for evaluation of the impact of those characteristics on the behaviour of the learning agent.
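A minimal sketch of such a simulated trainer is given below; the parameter names (availability, accuracy) are our own shorthand for the kinds of user characteristics the paper varies, not its actual interface:

```python
import random

class SimulatedUser:
    """Simulated trainer for interactive RL (a sketch, not the paper's
    implementation). `availability` is the chance the user offers advice
    on a given step; `accuracy` is the chance that advice matches the
    oracle (optimal) action."""
    def __init__(self, availability=0.3, accuracy=0.9, seed=0):
        self.availability = availability
        self.accuracy = accuracy
        self.rng = random.Random(seed)

    def advise(self, oracle_action, n_actions):
        if self.rng.random() > self.availability:
            return None                       # stays silent this step
        if self.rng.random() < self.accuracy:
            return oracle_action              # correct advice
        return self.rng.randrange(n_actions)  # mistaken advice

# Typical use inside a training loop (agent/policy names hypothetical):
# advice = user.advise(oracle_action, n_actions)
# action = advice if advice is not None else agent.select_action(state)
```

Sweeping availability and accuracy then yields repeatable experiments over "trainer types" without recruiting new human participants for every run.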


2021 ◽  
Vol 11 (4) ◽  
pp. 1514 ◽  
Author(s):  
Quang-Duy Tran ◽  
Sang-Hoon Bae

To reduce the impact of congestion, it is necessary to improve our overall understanding of the influence of autonomous vehicles. Recently, deep reinforcement learning has become an effective means of solving complex control tasks. Accordingly, we present an advanced deep reinforcement learning approach that investigates how leading autonomous vehicles affect the urban network in a mixed-traffic environment. We also suggest a set of hyperparameters for achieving better performance. Firstly, we feed a set of hyperparameters into our deep reinforcement learning agents. Secondly, we investigate the leading autonomous vehicle experiment in the urban network with different autonomous vehicle penetration rates. Thirdly, the advantage of leading autonomous vehicles is evaluated against entire-manual-vehicle and leading-manual-vehicle experiments. Finally, proximal policy optimization with a clipped objective is compared to proximal policy optimization with an adaptive Kullback–Leibler penalty to verify the superiority of the proposed hyperparameter settings. We demonstrate that fully automated traffic increased the average speed by a factor of 1.27 compared with the entire-manual-vehicle experiment. Our proposed method becomes significantly more effective at higher autonomous vehicle penetration rates. Furthermore, leading autonomous vehicles can help to mitigate traffic congestion.
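The two PPO variants being compared differ only in their surrogate objective; a minimal sketch follows (array shapes and the schedule for adapting the penalty weight beta are assumptions not given in the abstract):

```python
import numpy as np

def ppo_clip_loss(ratio, advantage, eps=0.2):
    """Clipped surrogate objective: take the pessimistic (minimum) of
    the unclipped and clipped policy-gradient terms.
    ratio = pi_new(a|s) / pi_old(a|s), elementwise over a batch."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return -np.minimum(unclipped, clipped).mean()

def ppo_kl_penalty_loss(ratio, advantage, kl, beta):
    """Adaptive-KL variant: unclipped surrogate plus a KL penalty whose
    weight beta is re-tuned between updates to track a target KL."""
    return -(ratio * advantage).mean() + beta * kl.mean()
```

In the clipped variant, the probability ratio cannot move the policy far in a single update; in the KL variant, beta is typically increased when the measured KL exceeds its target and decreased when it falls below.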


2020 ◽  
Vol 10 (1) ◽  
Author(s):  
Abu Quwsar Ohi ◽  
M. F. Mridha ◽  
Muhammad Mostafa Monowar ◽  
Md. Abdul Hamid

A pandemic is the global outbreak of a disease with a high transmission rate. The impact of a pandemic situation can be lessened by restricting the movement of the population; however, one of its concomitant circumstances is an economic crisis. In this article, we demonstrate what actions an agent (trained using reinforcement learning) may take in different possible scenarios of a pandemic, depending on the spread of the disease and economic factors. To train the agent, we design a virtual pandemic scenario closely related to the present COVID-19 crisis. Then, we apply reinforcement learning, a branch of artificial intelligence that deals with how an individual (human or machine) should interact with an environment (real or virtual) to achieve a desired goal. Finally, we demonstrate what optimal actions the agent performs to reduce the spread of disease while considering the economic factors. In our experiment, we let the agent find an optimal solution without providing any prior knowledge. After training, we observed that the agent places a long lockdown to reduce the first surge of a disease. Furthermore, the agent places a combination of cyclic lockdowns and short lockdowns to halt the resurgence of the disease. Analyzing the agent's performed actions, we discover that the agent decides movement restrictions not only based on the size of the infectious population but also considering the reproduction rate of the disease. The estimation and policy of the agent may improve the human strategy of imposing lockdowns so that an economic crisis may be avoided while mitigating an infectious disease.
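A reward of the kind such an agent optimizes could be sketched as follows; the linear trade-off and the weights are assumptions for illustration, as the abstract does not state the paper's reward function:

```python
def pandemic_reward(new_infections, lockdown_level,
                    w_health=1.0, w_economy=0.5):
    """Illustrative reward trading off disease spread against the
    economic cost of restrictions. `lockdown_level` in [0, 1] stands
    for the severity of movement restrictions; both weights are
    hypothetical."""
    health_cost = w_health * new_infections     # epidemic burden
    economic_cost = w_economy * lockdown_level  # cost of restrictions
    return -(health_cost + economic_cost)

print(pandemic_reward(new_infections=120, lockdown_level=0.8))
```

Because both terms are penalized, the trained agent has an incentive to find schedules (such as the cyclic lockdowns reported here) that suppress infections without keeping restrictions permanently at their maximum.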


Author(s):  
Shan Li ◽  
Ying Gao ◽  
Tao Ba ◽  
Wei Zhao

In many countries, energy saving and emissions mitigation for urban travel and public transportation are important for smart city developments. It is essential to understand the impact of smart transportation (ST) on public transportation in the context of energy savings in smart cities. General strategies and significant ideas for developing ST for smart cities, focusing on deep learning technologies, simulation experiments, and simultaneous formulation, are in progress. This study hence presents a simultaneous transportation monitoring and management framework (STMF), which has the potential to be extended to the next generation of smart transportation infrastructure. The proposed framework consists of community signal and community traffic, ST platforms and applications, agent-based traffic control, and transportation expertise augmentation. Experimental outcomes exhibit better quality metrics for the proposed STMF technique in energy saving and emissions mitigation for urban travel and public transportation than other conventional approaches. The deployed system improves accuracy, consistency, and F1 measure by 27.50%, 28.81%, and 31.12%, respectively, and reduces the error rate by 75.35%.


2013 ◽  
Vol 5 (4) ◽  
Author(s):  
Nick Eckenstein ◽  
Mark Yim

Two new designs for gravity-compensated modular robotic systems are presented and analyzed. The gravity compensation relies on zero-free-length springs approximated by a cable and pulley system. Simple yet powerful parallel four-bar modules enable low-profile, self-contained modules with sequential gravity compensation using the spring method for motion in a vertical plane. A second module, formed as a parallel six-bar mechanism, adds a horizontal motion to the previous system and also yields a complete decoupling of position and orientation of the distal end of a serial chain. Additionally, we introduce the concept of vanishing effort, whereby as the number of modules comprising an articulated serial chain increases, the actuation authority required at any joint decreases. Essentially, this results in a method for distributing actuation along the length of an articulated chain. Prototypes were designed and constructed, validating the analysis and accomplishing the functions of a general serial-type manipulator arm.
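The balancing property of zero-free-length springs follows from a short potential-energy argument; the single-link derivation below uses our own notation, not the paper's:

```latex
% Single link pivoting in a vertical plane; \theta measured from the
% upward vertical; mass m at distance r along the link; a zero-free-length
% spring of stiffness k runs from a point a above the pivot to a point b
% along the link, so the spring length l obeys
% l^2 = a^2 + b^2 - 2ab\cos\theta.
V_{\mathrm{total}}
  = m g r \cos\theta + \tfrac{1}{2} k \left(a^2 + b^2 - 2 a b \cos\theta\right)
  = \tfrac{1}{2} k \left(a^2 + b^2\right) + \left(m g r - k a b\right)\cos\theta
% Choosing k a b = m g r makes V_{\mathrm{total}} independent of \theta,
% so the link is statically balanced in every configuration.
```

The cable-and-pulley arrangement mentioned in the abstract is one standard way to realize the zero-free-length behavior, since physical coil springs cannot have zero rest length on their own.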


2021 ◽  
Author(s):  
André Forster ◽  
Johannes Hewig ◽  
John JB Allen ◽  
Johannes Rodrigues ◽  
Philipp Ziebell ◽  
...  

Being able to control inner and environmental states is a basic need of living creatures. Control perception (CP) itself may be neurally computed as the subjective ratio of outcome probabilities given the presence and the absence of behavior. If behavior increases the perceived probability of a given outcome, action-outcome contingency is met, and CP may emerge. Nonetheless, with regard to this model, not much is known about how the brain computes CP from this information. This study uses low-intensity transcranial focused ultrasound neuromodulation in a randomized, controlled, double-blind cross-over design to investigate the impact of the right inferior frontal gyrus on this process. Forty healthy participants visited the laboratory twice (once in a sham condition, once in a neuromodulation condition) and rated their control perception in a classical control-illusion task. EEG alpha and theta power density were analyzed in a hierarchical, single-trial-based mixed-modeling approach. Results indicate that the right lateral PFC modulates action-outcome learning by providing stochastic information about the situation, with increased alpha responses during low-control situations (in which the ratio of probabilities is zero). Furthermore, this alpha response was found to modulate mid-frontal theta by altering its relationship with self-reported effort and worrying. These data provide evidence for right lateral PFC-mediated probabilistic stimulus processing during the emergence of CP.
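Written out, the contingency model the abstract describes takes the ratio form below; the ΔP difference is the common alternative formulation in the contingency-learning literature (our rendering, not the paper's equations):

```latex
% O = outcome, A = action (behavior present), \neg A = behavior absent.
\mathrm{CP} \;\propto\; \frac{P(O \mid A)}{P(O \mid \neg A)},
\qquad
\Delta P = P(O \mid A) - P(O \mid \neg A)
% Behavior that raises the outcome probability (ratio > 1, \Delta P > 0)
% satisfies action-outcome contingency, and CP may emerge.
```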


Complexity ◽  
2021 ◽  
Vol 2021 ◽  
pp. 1-7
Author(s):  
Xiaoyi Long ◽  
Zheng He ◽  
Zhongyuan Wang

This paper suggests an online solution for the optimal tracking control of robotic systems based on a single critic neural network (NN)-based reinforcement learning (RL) method. To this end, we rewrite the robotic system model in state-space form, which facilitates the realization of optimal tracking control synthesis. To maintain the tracking response, a steady-state control is designed, and then an adaptive optimal tracking control is used to ensure that the tracking error converges in an optimal sense. To solve the obtained optimal control problem via the framework of adaptive dynamic programming (ADP), the command trajectory to be tracked and the modified tracking Hamilton-Jacobi-Bellman (HJB) equation are formulated. An online RL algorithm is then developed to solve the HJB equation using a critic NN with an online learning algorithm. Simulation results are given to verify the effectiveness of the proposed method.
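For context, a standard ADP tracking formulation of the kind the abstract references can be written as follows; the notation and the quadratic cost are our assumptions, since the paper's exact augmented dynamics are not given here:

```latex
% Tracking-error dynamics \dot{e} = f(e) + g(e) u with quadratic cost
J(e) = \int_{t}^{\infty} \left( e^{\top} Q e + u^{\top} R u \right) \mathrm{d}\tau
% The optimal value function V^{*} satisfies the tracking HJB equation
0 = \min_{u} \left[ e^{\top} Q e + u^{\top} R u
      + \left(\nabla V^{*}(e)\right)^{\top} \bigl( f(e) + g(e) u \bigr) \right]
% whose minimizer gives the optimal feedback
u^{*}(e) = -\tfrac{1}{2} R^{-1} g(e)^{\top} \nabla V^{*}(e)
% A critic NN approximates V^{*} and is tuned online from measured data.
```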


2020 ◽  
Author(s):  
Than Le

In this chapter, we address how competent autonomous vehicles should have the ability to analyze structured and unstructured environments and then localize themselves relative to surrounding objects where GPS, RFID, or other similar means cannot give enough information about the location. Reliable SLAM is the most basic prerequisite for any further artificial intelligence tasks of an autonomous mobile robot. The goal of this paper is to simulate a SLAM process using advanced development software. The model represents the system itself, whereas the simulation represents the operation of the system over time, and the software architecture helps us focus our work with the least trivial effort. It is an open-source meta-operating system, which provides tremendous tools for robotics-related problems.

Specifically, we address how advanced vehicles should have the ability to analyze structured and unstructured environments by solving search-based planning problems, and we then discuss reinforcement learning-based models for trajectory optimization applied to autonomous systems.

