Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

2020 ◽  
Vol 34 (04) ◽  
pp. 5826-5833
Author(s):  
Yuhang Song ◽  
Jianyi Wang ◽  
Thomas Lukasiewicz ◽  
Zhenghua Xu ◽  
Shangtong Zhang ◽  
...  

Intrinsic rewards were introduced to simulate how human intelligence works; they are usually evaluated by intrinsically-motivated play, i.e., playing games without extrinsic rewards but evaluated with extrinsic rewards. However, none of the existing intrinsic reward approaches can achieve human-level performance under this very challenging setting of intrinsically-motivated play. In this work, we propose a novel megalomania-driven intrinsic reward (called mega-reward), which, to our knowledge, is the first approach that achieves human-level performance in intrinsically-motivated play. Intuitively, mega-reward comes from the observation that infants' intelligence develops when they try to gain more control over entities in an environment; therefore, mega-reward aims to maximize the control capabilities of agents over given entities in a given environment. To formalize mega-reward, a relational transition model is proposed to bridge the gaps between direct and latent control. Experimental studies show that mega-reward (i) can greatly outperform all state-of-the-art intrinsic reward approaches, (ii) generally achieves the same level of performance as Ex-PPO and professional human-level scores, and (iii) also achieves superior performance when it is combined with extrinsic rewards.
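The evaluation setting described in this abstract (learning from intrinsic rewards only, with extrinsic rewards recorded purely for scoring) can be illustrated with a minimal sketch. Everything named here is a hypothetical stand-in: `run_episode`, the toy `env_step`, the fixed `policy`, and the `state_change` signal are assumptions for illustration, and `state_change` is not the mega-reward defined in the paper.

```python
def run_episode(env_step, policy, intrinsic_reward, initial_state, num_steps):
    """Roll out one episode of intrinsically-motivated play.

    The learner would only ever see `training_signal` (intrinsic rewards);
    `extrinsic_score` is accumulated separately and used for evaluation only.
    """
    state = initial_state
    training_signal = []
    extrinsic_score = 0.0
    for _ in range(num_steps):
        action = policy(state)
        next_state, extrinsic = env_step(state, action)
        # Intrinsic reward is computed from the transition alone.
        training_signal.append(intrinsic_reward(state, action, next_state))
        # Extrinsic reward is logged but never passed to the learner.
        extrinsic_score += extrinsic
        state = next_state
    return training_signal, extrinsic_score


# Toy stand-ins (assumptions, not from the paper).
def env_step(state, action):
    next_state = state + action
    extrinsic = 1.0 if next_state == 3 else 0.0  # game score for reaching 3
    return next_state, extrinsic


def policy(state):
    return 1  # always move one step to the right


def state_change(state, action, next_state):
    return float(abs(next_state - state))  # crude proxy for "caused a change"


signal, score = run_episode(env_step, policy, state_change, 0, 3)
print(signal, score)
```

The point of the split return value is that any learning update would consume only `signal`, while `score` is what gets compared against human-level play.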

Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which means that exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While they hold the promise of better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration in reinforcement learning. Our curiosity signal is driven by a fast reward that deals with local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent’s ability to reconstruct the observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
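As a rough illustration of curiosity defined as reconstruction error, the sketch below scores an observation against two reconstructions and mixes the two errors. The abstract does not say how the fast and slow rewards are computed or combined, so the mean squared error, the weighted sum, and the names `reconstruction_error`, `curiosity_reward`, `fast_weight`, and `slow_weight` are all assumptions.

```python
def reconstruction_error(observation, reconstruction):
    """Mean squared error between an observation and its reconstruction."""
    if len(observation) != len(reconstruction):
        raise ValueError("observation and reconstruction must have equal length")
    squared = [(o - r) ** 2 for o, r in zip(observation, reconstruction)]
    return sum(squared) / len(squared)


def curiosity_reward(observation, fast_reconstruction, slow_reconstruction,
                     fast_weight=0.5, slow_weight=0.5):
    """Hypothetical mix of a fast (local) and a slow (long-horizon) signal.

    Each signal is the reconstruction error of a separate model; the
    weighted sum is an assumption made for this sketch.
    """
    fast = reconstruction_error(observation, fast_reconstruction)
    slow = reconstruction_error(observation, slow_reconstruction)
    return fast_weight * fast + slow_weight * slow


# The fast model reconstructs poorly (novel locally); the slow model is exact.
obs = [0.0, 2.0]
reward = curiosity_reward(obs, [0.0, 0.0], [0.0, 2.0])
print(reward)
```

A poorly reconstructed observation yields a larger reward, which is what pushes the agent toward states its models have not yet captured.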


2016 ◽  
Author(s):  
Muhammad Yousefnezhad ◽  
Daoqiang Zhang

Multivariate Pattern (MVP) classification can map different cognitive states to the brain tasks. One of the main challenges in MVP analysis is validating the generated results across subjects. However, analyzing multi-subject fMRI data requires accurate functional alignments between neuronal activities of different subjects, which can rapidly increase the performance and robustness of the final results. Hyperalignment (HA) is one of the most effective functional alignment methods, which can be mathematically formulated by the Canonical Correlation Analysis (CCA) methods. Since HA mostly uses unsupervised CCA techniques, its solution may not be optimized for MVP analysis. By incorporating the idea of Local Discriminant Analysis (LDA) into CCA, this paper proposes Local Discriminant Hyperalignment (LDHA) as a novel supervised HA method, which can provide better functional alignment for MVP analysis. Indeed, the locality is defined based on the stimuli categories in the training set, where the correlation between all stimuli in the same category is maximized and the correlation between distinct categories of stimuli approaches zero. Experimental studies on multi-subject MVP analysis confirm that the LDHA method achieves superior performance to other state-of-the-art HA algorithms.


2009 ◽  
Vol 13 (3) ◽  
pp. 26
Author(s):  
Dione Fagundes Nunes Gomes ◽  
Maria Cristina Sanches Amorim

Many theories of leadership converge on the importance of motivation as an attribute of the leader. There are two models of motivation: intrinsic and extrinsic. Although both deal with rewards, they are not applied in the same way. The aim of this study is to analyze the limits and possibilities of the reward programs used by leadership teams within organizations. According to some non-behaviorist authors, extrinsic rewards amount to manipulation or bribery that favors those who hold power. To some behaviorists, organizing extrinsic and intrinsic reward systems is the method of choice for motivating people, and the leader should use it. Our reflections point to the possibility of a balanced and planned use of extrinsic and intrinsic rewards, taking into consideration the context, the goals, and the duration of the group. Our methodology was a study of authors widely read in business administration courses, with the aim of contributing to a critical reading by this audience.


2019 ◽  
Author(s):  
Benjamin Chew ◽  
Bastien Blain ◽  
Raymond J Dolan ◽  
Robb B Rutledge

Standard economic indicators provide an incomplete picture of what we value both as individuals and as a society. Furthermore, canonical macroeconomic measures, such as GDP, do not account for non-market activities (e.g., cooking, childcare) that nevertheless impact well-being. Here, we introduce a computational tool that measures the subjective reward value of experiences (e.g., playing a musical instrument without errors). We go on to validate this tool with neural data, using fMRI to measure neural activity in subjects performing a reinforcement learning task that incorporated periodic ratings of subjective affective state. Learning performance determined level of payment (i.e., extrinsic reward). Crucially, the task also incorporated a skilled performance component (i.e., intrinsic reward) which did not influence payment. Both extrinsic and intrinsic rewards influenced affective dynamics, and their relative influence could be captured in our computational model. Individuals for whom intrinsic rewards had a greater influence on affective state than extrinsic rewards had greater ventromedial prefrontal cortex (vmPFC) activity for intrinsic than extrinsic rewards. Thus, we show that computational modelling of affective dynamics can index the subjective value of intrinsic relative to extrinsic rewards, a ‘computational hedonometer’ that reflects both behavior and neural activity and quantifies the subjective reward value of experience.


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0243744
Author(s):  
Jeeyoon Kim ◽  
Younghan Lee ◽  
Mi-Lyang Kim

This study posits that Fear of Missing Out (FOMO) can function as an extrinsic motive stimulating sport event consumption by inducing consumers to overcome leisure constraints. Also, FOMO-driven consumption is proposed to affect the consumption experience because it is grounded in extrinsic rather than intrinsic rewards. In Study 1, the moderating effect of FOMO on the relationship between intrapersonal and structural constraints and sport media viewing intention is tested. In Study 2, the relations among FOMO-driven consumption, intrinsic rewards (i.e., enjoyment), extrinsic rewards (i.e., social adherence), and consumer satisfaction are assessed. Study 1 results support the notion that FOMO can boost sport media viewing intention through two mechanisms: by directly stimulating intention and by lifting the negative effect of constraints on intention. In Study 2, FOMO-driven consumption shows a stronger link to extrinsic than intrinsic rewards, extrinsic reward is marginally but negatively associated with intrinsic reward, and intrinsic reward is a stronger predictor of satisfaction. Overall, FOMO is identified as a meaningful extrinsic motive for sport event consumption, though its effects on consumer satisfaction are arguable. Implications for FOMO-driven marketing are discussed.


2020 ◽  
Vol 3 (1) ◽  

The aim of this study is to investigate the relationship between extrinsic and intrinsic rewards and retention among Gen Y employees in Malaysian manufacturing companies. The data were collected by questionnaire from 113 respondents working in manufacturing companies located in Seri Kembangan, Selangor. Multiple regression analysis was conducted to test the hypotheses. The results showed that both extrinsic and intrinsic rewards influence the retention of Gen Y employees in manufacturing companies. The analysis, the limitations of the study, recommendations for future research, and the conclusion are discussed at the end of the study. In a nutshell, both extrinsic and intrinsic rewards were found to contribute to the retention of Gen Y employees.


Materials ◽  
2021 ◽  
Vol 14 (15) ◽  
pp. 4171
Author(s):  
Rabia Ikram ◽  
Badrul Mohamed Jan ◽  
Akhmal Sidek ◽  
George Kenanakis

An important aspect of hydrocarbon drilling is the usage of drilling fluids, which remove drill cuttings and stabilize the wellbore to provide better filtration. To stabilize these properties, several additives are used in drilling fluids that provide satisfactory rheological and filtration properties. However, commonly used additives are environmentally hazardous; when drilling fluids are disposed of after drilling operations, they are discarded with the drill cuttings and additives into water sources and cause unwanted pollution. Therefore, these additives should be substituted with additives that are environmentally friendly and provide superior performance. In this regard, biodegradable additives are required for future research. This review investigates the role of various bio-wastes as potential additives to be used in water-based drilling fluids. Furthermore, utilization of these waste-derived nanomaterials is summarized for rheology and lubricity tests. Finally, sufficient rheological and filtration examinations were carried out on water-based drilling fluids to evaluate the effect of wastes as additives on the performance of drilling fluids.


Author(s):  
Marius Wolf ◽  
Sergey Solovyev ◽  
Fatemi Arshia

In this paper, analytical equations for the central film thickness in slender elliptic contacts are investigated. A comparison of state-of-the-art formulas with simulation results of a multilevel elastohydrodynamic lubrication solver is conducted and shows considerable deviation. Therefore, a new film thickness formula for slender elliptic contacts with variable ellipticity is derived. It incorporates asymptotic solutions, which results in validity over a large parameter domain. It captures the behaviour of increasing film thickness with increasing load for specific very slender contacts. The new formula proves to be significantly more accurate than current equations. Experimental studies and discussions on minimum film thickness will be presented in a subsequent publication.


2022 ◽  
pp. 1-12
Author(s):  
Shuailong Li ◽  
Wei Zhang ◽  
Huiwen Zhang ◽  
Xin Zhang ◽  
Yuquan Leng

Model-free reinforcement learning methods have successfully been applied to practical applications such as decision-making problems in Atari games. However, these methods have inherent shortcomings, such as high variance and low sample efficiency. To improve the policy performance and sample efficiency of model-free reinforcement learning, we propose proximal policy optimization with model-based methods (PPOMM), a fusion method of both model-based and model-free reinforcement learning. PPOMM considers not only the information of past experience but also the prediction information of the future state. PPOMM adds the information of the next state to the objective function of the proximal policy optimization (PPO) algorithm through a model-based method. This method uses two components to optimize the policy: the error of PPO and the error of model-based reinforcement learning. We use the latter to optimize a latent transition model and predict the information of the next state. For most games, this method outperforms the state-of-the-art PPO algorithm when we evaluate across 49 Atari games in the Arcade Learning Environment (ALE). The experimental results show that PPOMM performs better than or the same as the original algorithm in 33 games.
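The two-component objective mentioned in this abstract can be sketched as the standard PPO clipped surrogate loss plus a prediction error for the next state. The clipped surrogate is the usual PPO formulation; how PPOMM actually weights or combines the two terms is not stated here, so the weighted sum, the mean squared error, and the names `model_error`, `combined_loss`, and `model_coef` are assumptions made for illustration.

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for a single sample."""
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def model_error(predicted_next, actual_next):
    """Mean squared error of a next-state prediction (assumed error measure)."""
    if len(predicted_next) != len(actual_next):
        raise ValueError("prediction and target must have equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted_next, actual_next)]
    return sum(squared) / len(squared)


def combined_loss(ratios, advantages, predicted_next, actual_next,
                  model_coef=1.0, clip_eps=0.2):
    """Hypothetical fusion: PPO loss plus a weighted model-prediction error."""
    surrogates = [clipped_surrogate(r, a, clip_eps)
                  for r, a in zip(ratios, advantages)]
    ppo_loss = -sum(surrogates) / len(surrogates)  # maximize the surrogate
    return ppo_loss + model_coef * model_error(predicted_next, actual_next)


# One sample with a perfect next-state prediction: only the PPO term remains.
loss = combined_loss([1.0], [2.0], [1.0, 0.0], [1.0, 0.0])
print(loss)
```

With `model_coef=0.0` the expression reduces to the plain PPO policy loss, which makes the contribution of the model term easy to isolate in a sketch like this.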


2016 ◽  
Vol 42 (1) ◽  
Author(s):  
Michelle Renard ◽  
Robin J. Snelgar

Orientation: Intrinsic rewards are personal, psychological responses to the work that employees perform, which stem from the manner in which their work is designed. Research purpose: This study sought to discover in what ways non-profit employees are psychologically rewarded by the nature of their work tasks. The use of a qualitative approach to data collection and analysis ensured that in-depth responses from participants were gained. Motivation for the study: Intrinsic rewards are of particular importance to non-profit employees, who tend to earn below-market salaries. This implies that their motivation originates predominantly from intrinsic as opposed to extrinsic rewards; yet, research into this area of rewards is lacking. Research approach, design and method: In-depth, semi-structured interviews were conducted using a sample of 15 extrinsically rewarded non-profit employees working within South Africa. Thematic analysis was utilised in order to generate codes which led to the formation of five intrinsic rewards categories. Main findings: Intrinsic rewards were classified into five categories, namely (1) Meaningful Work, (2) Flexible Work, (3) Challenging Work, (4) Varied Work and (5) Enjoyable Work. These rewards each comprise various subcategories, which provide insight into why such work is rewarding to non-profit employees. Practical/managerial implications: Traditional performance management systems should be re-evaluated in the non-profit sector to shift focus towards intrinsic rewards, as opposed to focusing only on the use of extrinsic rewards such as incentives to motivate employees. Contribution/value-add: The study provides a qualitative understanding of how extrinsically rewarded non-profit employees perceive their work to be intrinsically rewarding, which bridges the empirical gap pertaining to intrinsic rewards within this sector.

