Improving Reinforcement Learning with Human Input

Author(s):  
Matthew E. Taylor

Reinforcement learning (RL) has had many successes when learning autonomously. This paper and accompanying talk consider how to make use of a non-technical human participant, when available. In particular, we consider the case where a human could 1) provide demonstrations of good behavior, 2) provide online evaluative feedback, or 3) define a curriculum of tasks for the agent to learn on. In all cases, our work has shown such information can be effectively leveraged. After giving a high-level overview of this work, we will highlight a set of open questions and suggest where future work could be usefully focused.

2021 ◽  
Vol 11 (3) ◽  
pp. 1291
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku has been implemented with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with Alpha-Go’s algorithm, defines all possible situations on the Gomoku board using Monte Carlo tree search (MCTS) and minimizes the probability of learning other correct answers in duplicated board situations. However, the accuracy of tree search algorithms drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNNs). The proposed algorithm expresses each state as one-hot encoded vectors and determines the state of the Gomoku board by combining similar states of these vectors. For cases where the position chosen by the CNN is already occupied or a stone cannot be placed there, we suggest a method for selecting an alternative. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
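The abstract above does not say how the board is encoded or how the alternative move is chosen, so the following is only a minimal sketch under stated assumptions: cells are coded as empty, black, or white, each cell becomes a three-element one-hot vector, and the fallback is the highest-scoring empty cell. The names `one_hot_board` and `select_move` and the score grid are hypothetical and are not taken from the paper.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical cell codes, not from the paper


def one_hot_board(board):
    """Encode each cell as a one-hot vector ordered [empty, black, white]."""
    return [
        [[1.0 if cell == code else 0.0 for code in (EMPTY, BLACK, WHITE)]
         for cell in row]
        for row in board
    ]


def select_move(scores, board):
    """Return (row, col) of the highest-scoring cell that is still empty.

    `scores` stands in for the per-cell output of a policy network.
    Occupied cells are skipped, so the next-best empty cell is chosen
    whenever the top-scoring cell already holds a stone.
    """
    best, best_score = None, float("-inf")
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            if cell == EMPTY and scores[r][c] > best_score:
                best, best_score = (r, c), scores[r][c]
    if best is None:
        raise ValueError("no legal move: the board is full")
    return best


board = [[0, 0, 0],
         [0, 1, 0],
         [0, 0, 2]]
scores = [[0.1, 0.6, 0.0],
          [0.0, 0.9, 0.0],   # the top score sits on an occupied cell
          [0.0, 0.0, 0.2]]
move = select_move(scores, board)
```

Here the network's first choice, the centre cell, is occupied, so the sketch falls back to the best remaining empty cell. Masking illegal moves in this way is one common option; the paper's own selection rule may differ.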


2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, when even naively modelled in WiseMove, provide an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy has significantly less reliance on velocity compared to the rule-based policy, having learned that its measurement is unreliable.
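As a rough illustration of domain randomization applied to perception errors, the sketch below redraws the noise parameters at the start of every training episode, so the policy never sees one fixed error model. Everything specific here is an assumption made for illustration: the `PerceptionNoise` fields, the sampling ranges, and the `perceive` helper are hypothetical and are not part of WiseMove or of the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class PerceptionNoise:
    velocity_std: float   # std. dev. of Gaussian noise on measured velocity
    dropout_prob: float   # probability that a detected object is missed


def sample_noise(rng):
    """Draw fresh noise parameters for one episode (ranges are made up)."""
    return PerceptionNoise(
        velocity_std=rng.uniform(0.0, 2.0),
        dropout_prob=rng.uniform(0.0, 0.3),
    )


def perceive(true_velocity, detected, noise, rng):
    """Return a corrupted (velocity, detected) observation."""
    noisy_velocity = true_velocity + rng.gauss(0.0, noise.velocity_std)
    still_detected = detected and rng.random() >= noise.dropout_prob
    return noisy_velocity, still_detected


rng = random.Random(0)
episodes = []
for _ in range(3):
    noise = sample_noise(rng)   # re-randomize at every episode start
    observation = perceive(10.0, True, noise, rng)
    episodes.append((noise, observation))
```

In a real training loop the policy would be updated on the corrupted observations. The point of the sketch is only that the error model itself varies between episodes.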


Sensors ◽  
2021 ◽  
Vol 21 (7) ◽  
pp. 2534
Author(s):  
Oualid Doukhi ◽  
Deok-Jin Lee

Autonomous navigation and collision avoidance missions represent a significant challenge for robotics systems as they generally operate in dynamic environments that require a high level of autonomy and flexible decision-making capabilities. This challenge becomes more applicable in micro aerial vehicles (MAVs) due to their limited size and computational power. This paper presents a novel approach for enabling a micro aerial vehicle system equipped with a laser range finder to autonomously navigate among obstacles and achieve a user-specified goal location in a GPS-denied environment, without the need for mapping or path planning. The proposed system uses an actor–critic-based reinforcement learning technique to train the aerial robot in a Gazebo simulator to perform a point-goal navigation task by directly mapping the noisy MAV’s state and laser scan measurements to continuous motion control. The obtained policy can perform collision-free flight in the real world while being trained entirely on a 3D simulator. Intensive simulations and real-time experiments were conducted and compared with a nonlinear model predictive control technique to show the generalization capabilities to new unseen environments, and robustness against localization noise. The obtained results demonstrate our system’s effectiveness in flying safely and reaching the desired points by planning smooth forward linear velocity and heading rates.


EvoDevo ◽  
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Alice Laciny

As social insects, ants represent extremely interaction-rich biological systems shaped by tightly integrated social structures and constant mutual exchange with a multitude of internal and external environmental factors. Due to this high level of ecological interconnection, ant colonies can harbour a diverse array of parasites and pathogens, many of which are known to interfere with the delicate processes of ontogeny and caste differentiation and induce phenotypic changes in their hosts. Despite their often striking nature, parasite-induced changes to host development and morphology have hitherto been largely overlooked in the context of ecological evolutionary developmental biology (EcoEvoDevo). Parasitogenic morphologies in ants can, however, serve as “natural experiments” that may shed light on mechanisms and pathways relevant to host development, plasticity or robustness under environmental perturbations, colony-level effects and caste evolution. By assessing case studies of parasites causing morphological changes in their ant hosts, from the eighteenth century to current research, this review article presents a first overview of relevant host and parasite taxa. Hypotheses about the underlying developmental and evolutionary mechanisms, and open questions for further research are discussed. This will contribute towards highlighting the importance of parasites of social insects for both biological theory and empirical research and facilitate future interdisciplinary work at the interface of myrmecology, parasitology, and the EcoEvoDevo framework.


2016 ◽  
Vol 138 (09) ◽  
pp. S8-S13 ◽  
Author(s):  
Thiago Marinho ◽  
Christopher Widdowson ◽  
Amy Oetting ◽  
Arun Lakshmanan ◽  
Hang Cui ◽  
...  

This article demonstrates a multidisciplinary approach that proposes to augment future caregiving by prolonging the independence of older adults. The human–robot system allows the elderly to cooperate with small flying robots through an appropriate interface. ASPIRE provides a platform where high-level controllers can be designed to provide a layer of abstraction between the high-level task requests, the perceptual needs of the users, and the physical demands of the robotic platforms. With a robust framework that has the capability to account for human perception and comfort level, one can provide perceived safety for older adults and, further, add expressivity that facilitates communication and interaction continuously throughout the stimulation. The proposed framework relies on an iterative process of low-level controller design through experimental data collected from psychological trials. Future work includes the exploration of multiple carebots that cooperatively assist in caregiving tasks, based on a human-centered design approach.


2021 ◽  
pp. 114-124
Author(s):  
Татьяна Вячеславовна Кошкина

The importance of physical culture and sports in human life, both for the health of the nation as a whole and for the health and working capacity of each individual, is realized through physical education in educational institutions, including universities. A high level of physical fitness will enable students to perform their future work to a high standard. To assess students' physical fitness within physical education at the university, the standards of the "Ready for Labour and Defence" (GTO) complex can be used as a universal evaluation mechanism that identifies the most physically developed members of this generation. The purpose of the study is to determine how far the physical fitness of present-day students in non-physical-education specialties complies with the GTO standards and, on this basis, to identify ways to improve students' physical training in a modern university. The material for the study was theoretical and empirical data obtained through theoretical analysis of specialized literature and advanced pedagogical experience, a pedagogical experiment, control tests, and mathematical and statistical methods of data processing and analysis. A review of the pedagogical experience accumulated in our country since the revived GTO standards were introduced in 2014, as reflected in relevant publications and in the scientific and methodological literature on physical education, showed that students' physical fitness does not always meet the requirements of the GTO standards. This was confirmed experimentally through control tests conducted at Mari State University. The results indicate that additional work on students' physical training is required, and they identify the fitness indicators most in need of development. Methodological recommendations were formulated for improving students' physical training in accordance with the GTO standards. The expediency of using the GTO standards as a system for assessing students' physical fitness was justified theoretically and demonstrated empirically.


Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which entails that exploration remains one of the key challenges of DRL. Instead of solely relying on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While they hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-time horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
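The abstract does not give the exact reward formula, so the following is only a sketch of the general idea under loudly stated assumptions: curiosity is measured as a mean squared reconstruction error, and a fast and a slow curiosity term are added to the environment reward with hypothetical weights `beta_fast` and `beta_slow`. How the paper actually computes and combines the two terms may differ.

```python
def reconstruction_error(observation, reconstruction):
    """Mean squared error between an observation and its reconstruction."""
    if not observation or len(observation) != len(reconstruction):
        raise ValueError("vectors must be non-empty and of equal length")
    squared = [(o - r) ** 2 for o, r in zip(observation, reconstruction)]
    return sum(squared) / len(squared)


def total_reward(extrinsic, fast_error, slow_error,
                 beta_fast=0.5, beta_slow=0.5):
    """Add weighted fast and slow curiosity bonuses to the extrinsic reward.

    The weights are illustrative assumptions, not values from the paper.
    """
    return extrinsic + beta_fast * fast_error + beta_slow * slow_error


# A poorly reconstructed observation yields a larger curiosity bonus.
fast = reconstruction_error([1.0, 2.0], [1.0, 4.0])
slow = reconstruction_error([0.0, 0.0], [2.0, 2.0])
reward = total_reward(1.0, fast, slow)
```

The larger the reconstruction error, the more novel the observation is taken to be, and the larger the bonus the agent receives for reaching it.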


2021 ◽  
Vol 14 (1) ◽  
pp. 104-114
Author(s):  
Edina-Tímea OPRIȘ ◽  
Éva BÁLINT-SVELLA ◽  
Iuliana ZSOLDOS-MARCHIȘ

Gamification is a rather new method in education and unfortunately is not widely known among Hungarian primary school teachers in Romania. This paper presents the knowledge and opinions of pre-service preschool and primary school teachers about gamification and its use in education. In this study, 81 Primary and Preschool Pedagogy students from Babeș-Bolyai University participated, 80 of them female and 1 male. Of these, 40 students are in the first year and 41 in the second year of their studies. The research was carried out during February-March 2020 at Babeș-Bolyai University, Romania. To learn about their views on and knowledge of gamification, the authors developed an online questionnaire. The obtained data were analyzed quantitatively (closed questions) and qualitatively (open questions). According to the results, half of the students think that there is no difference between gamification and game-based learning, and three quarters find it difficult to see the differences. This is surprising, as students were taught about gamification before filling in the questionnaire. Students perceive a high level of utility of gamification in education. The benefits most frequently mentioned by the participants are that gamification motivates and actively involves students. Although participants listed many advantages of integrating gamification in education, the biggest disadvantage is related to the time needed to prepare a gamified lesson and to the time allocation during the lesson. As obstacles to using gamification, they mentioned the negative attitude and/or lack of methodological knowledge of some teachers and the constraints of the curriculum. Most of the pre-service teachers prefer both paper-and-pencil-based and technology-aided gamification. They consider solving exercises the most suitable activity for gamification.


Product reviews are valuable to prospective customers in helping them make choices. To this end, numerous opinion mining techniques have been proposed, in which judging the orientation of a review sentence (e.g. positive or negative) is one of the key challenges. Recently, deep learning has emerged as a powerful technique for solving sentiment classification problems. A neural network intrinsically learns a useful representation automatically, without human effort. However, the success of deep learning relies heavily on the availability of large-scale training data. We propose a novel deep learning framework for product review sentiment classification that uses widely available ratings as weak supervision signals. The framework consists of two steps: (1) learning a high-level representation (an embedding space) that captures the general sentiment distribution of sentences through rating information; (2) adding a classification layer on top of the embedding layer and using labelled sentences for supervised fine-tuning. We explore two kinds of low-level network structure for modelling review sentences, namely convolutional feature extractors and long short-term memory. To evaluate the proposed framework, we construct a data set containing 1.1M weakly labelled review sentences and 11,754 labelled review sentences from Amazon. Experimental results show the efficacy of the proposed framework and its superiority over baselines. Future work is to detect fake reviews posted by robots or by malicious people, possibly in return for payment: some companies hire people to push their product's ranking higher with fake ratings, and such people or robots post a continuous stream of ratings or reviews for the product. Such fake ratings could be detected by analysing the ratings and then removed, so that users see only genuine reviews.

