A Reinforcement Learning Approach to Understanding Procrastination: Does Inaccurate Value Approximation Cause Irrational Postponing of a Task?

2021, Vol 15
Author(s): Zheyu Feng, Asako Mitsuto Nagase, Kenji Morita

Procrastination is the voluntary but irrational postponing of a task despite being aware that the delay can lead to worse consequences. It has been extensively studied in the field of psychology, from contributing factors to theoretical models. From the perspective of value-based decision making and reinforcement learning (RL), procrastination has been suggested to be caused by non-optimal choice resulting from cognitive limitations. Exactly what sort of cognitive limitations are involved, however, remains elusive. In the current study, we examined whether a particular type of cognitive limitation, namely, inaccurate valuation resulting from inadequate state representation, could cause procrastination. Recent work has suggested that humans may adopt a particular type of state representation called the successor representation (SR) and that humans can learn to represent states by relatively low-dimensional features. Combining these suggestions, we assumed a dimension-reduced version of the SR. We modeled a series of behaviors of a “student” doing assignments during the school term, when putting off the assignments (i.e., procrastination) is not allowed, and during the vacation, when whether or not to procrastinate can be freely chosen. We assumed that the “student” had acquired a rigid reduced SR of each state, corresponding to each step in completing an assignment, under the policy without procrastination. Through temporal-difference (TD) learning, the “student” learned an approximated value of each state, computed as a linear function of the state's features in the rigid reduced SR. During the vacation, the “student” decided at each time step whether to procrastinate based on these approximated values. Simulation results showed that the reduced SR-based RL model generated procrastination behavior, which worsened across episodes. According to the values approximated by the “student,” procrastinating was the better choice, whereas not procrastinating was mostly better according to the true values. Thus, the current model generated procrastination behavior caused by inaccurate value approximation, which resulted from adopting the reduced SR as the state representation. These findings indicate that the reduced SR, or more generally, dimension reduction in state representation, is a potential form of cognitive limitation that leads to procrastination.
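
As a concrete illustration of the pipeline this abstract describes, here is a minimal Python sketch (not the authors' code) of TD(0) learning with a linear value function over dimension-reduced SR features; the deterministic task, reward placement, and reduction rank are all illustrative assumptions.

# A minimal sketch of TD(0) with a linear value function over
# dimension-reduced successor-representation (SR) features; the task
# structure and parameters below are assumptions, not the paper's.
import numpy as np

n_states = 10   # steps toward completing an assignment (assumption)
gamma = 0.9     # discount factor
alpha = 0.1     # TD learning rate
k = 3           # reduced feature dimension (assumption)

# Deterministic "no procrastination" policy: always advance one step;
# the final state is absorbing.
P = np.zeros((n_states, n_states))
for s in range(n_states - 1):
    P[s, s + 1] = 1.0
P[-1, -1] = 1.0

# Successor representation under that policy: M = (I - gamma * P)^-1.
M = np.linalg.inv(np.eye(n_states) - gamma * P)

# Dimension reduction: keep the top-k singular components of M as features.
U, S, _ = np.linalg.svd(M)
phi = U[:, :k] * S[:k]   # feature vector phi[s] for each state s

# Reward only upon completing the assignment (assumption).
r = np.zeros(n_states)
r[-1] = 1.0

# TD(0) with linear value approximation V(s) = w . phi[s].
w = np.zeros(k)
for episode in range(200):
    s = 0
    while s < n_states - 1:
        s_next = s + 1
        delta = r[s_next] + gamma * w @ phi[s_next] - w @ phi[s]
        w += alpha * delta * phi[s]
        s = s_next

print("approximated values:", phi @ w)

Comparing the printed values with the exact ones under the same policy (M @ r) exposes how much the rank-k compression distorts the valuation, which is the kind of error the paper identifies as a driver of procrastination.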

PLoS ONE, 2021, Vol 16 (6), pp. e0253241
Author(s): Hyun Dong Lee, Seongmin Lee, U. Kang

How can we effectively regularize BERT? Although BERT has proven effective on various NLP tasks, it often overfits when there are only a small number of training instances. A promising direction for regularizing BERT is to prune its attention heads using a proxy score for head importance. However, such methods are usually suboptimal because they rely on arbitrarily chosen numbers of attention heads to prune and do not directly aim at improving performance. To overcome this limitation, we propose AUBER, an automated BERT regularization method that leverages reinforcement learning to automatically prune the appropriate attention heads from BERT. We also minimize the model complexity and the action search space by proposing a low-dimensional state representation and a dually-greedy approach for training. Experimental results show that AUBER outperforms existing pruning methods, achieving up to 9.58% better performance. In addition, an ablation study demonstrates the effectiveness of AUBER's design choices.
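
For readers unfamiliar with head pruning, the following is a minimal sketch using the HuggingFace transformers library; the layer and head indices are arbitrary placeholders, not heads AUBER would actually select.

# A minimal sketch of attention-head pruning in BERT; the selected
# layers/heads are hypothetical placeholders for illustration only.
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

# An RL agent such as AUBER would learn which heads to remove; here a
# hypothetical selection is hard-coded.
heads_to_prune = {
    0: [2, 5],   # layer 0: drop heads 2 and 5
    3: [7],      # layer 3: drop head 7
}
model.prune_heads(heads_to_prune)

# The pruned model can then be fine-tuned and evaluated on the target
# task to check whether removing these heads improves generalization.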


Author(s): Ritesh Noothigattu, Djallel Bouneffouf, Nicholas Mattei, Rachita Chandra, Piyush Madan, ...

Autonomous cyber-physical agents play an increasingly large role in our lives. To ensure that they behave in ways aligned with the values of society, we must develop techniques that allow these agents not only to maximize their reward in an environment, but also to learn and follow the implicit constraints of society. We detail a novel approach that uses inverse reinforcement learning to learn a set of unspecified constraints from demonstrations, and reinforcement learning to learn to maximize environmental rewards. A contextual-bandit-based orchestrator then picks between the two policies: constraint-based and environment-reward-based. The orchestrator allows the agent to mix the policies in novel ways, taking the best actions from either the reward-maximizing or the constrained policy. In addition, the orchestrator is transparent about which policy is being employed at each time step. We test our algorithms using Pac-Man and show that the agent is able to learn to act optimally, to act within the demonstrated constraints, and to mix these two functions in complex ways.
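
As a rough illustration of the orchestration idea, here is a minimal sketch of a contextual bandit choosing between two policies; the linear per-arm reward model, epsilon-greedy exploration, and the synthetic feedback rule are assumptions, not details from the paper.

# A minimal contextual-bandit sketch: arm 0 stands for the
# reward-maximizing policy, arm 1 for the constraint-following policy.
import numpy as np

rng = np.random.default_rng(0)
d = 4            # context dimension (assumption)
epsilon = 0.1    # exploration rate

# One linear reward estimator per arm, updated ridge-regression style.
A = [np.eye(d) for _ in range(2)]    # per-arm Gram matrices
b = [np.zeros(d) for _ in range(2)]

def pick_arm(context):
    """Epsilon-greedy choice between the two policies given the context."""
    if rng.random() < epsilon:
        return int(rng.integers(2))
    estimates = [np.linalg.solve(A[a], b[a]) @ context for a in range(2)]
    return int(np.argmax(estimates))

def update(arm, context, reward):
    """Update the chosen arm's linear reward model."""
    A[arm] += np.outer(context, context)
    b[arm] += reward * context

# Toy interaction loop with a synthetic environment signal.
for t in range(1000):
    context = rng.normal(size=d)
    arm = pick_arm(context)
    # Hypothetical feedback: the constrained policy pays off when context[0] > 0.
    reward = context[0] if arm == 1 else -0.1 * context[0]
    update(arm, context, reward)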


Author(s): Nicolo Botteghi, Ruben Obbink, Daan Geijs, Mannes Poel, Beril Sirmacek, ...

2006, Vol 17 (11), pp. 1527-1549
Author(s): J. N. Corcoran, U. Schneider, H.-B. Schüttler

We describe a new application of an existing perfect sampling technique of Corcoran and Tweedie to estimate the self-energy of an interacting fermion model via Monte Carlo summation. Simulations suggest that, in this context, the algorithm converges extremely rapidly, and the results compare favorably to true values obtained by brute-force computations for low-dimensional toy problems. We also give a variant of the perfect sampling scheme that improves the accuracy of the Monte Carlo sum for small samples.
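
As a toy illustration of Monte Carlo summation (on an arbitrary summand, not the fermion self-energy), the sketch below estimates a large sum by sampling terms from a proposal distribution and averaging the importance-weighted terms, then compares the estimate to the brute-force sum, as the abstract does for its toy problems.

# A minimal Monte Carlo summation sketch: estimate S = sum_x f(x) by
# sampling indices x ~ p and averaging f(x)/p(x); the summand is a
# stand-in chosen only for illustration.
import numpy as np

rng = np.random.default_rng(1)
xs = np.arange(1, 1001)
f = 1.0 / xs**2              # toy summand

# Proposal distribution: a rough guess at the summand's shape (assumption).
p = 1.0 / xs
p = p / p.sum()

samples = rng.choice(len(xs), size=5000, p=p)
estimate = np.mean(f[samples] / p[samples])

print("Monte Carlo estimate:", estimate)
print("brute-force sum:     ", f.sum())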


2019, Vol 12 (2), pp. 357-374
Author(s): Sanaa Ashour

Purpose
Theoretical models of attrition have failed to address the interwoven factors, from the perspective of undergraduate students, that influence their decision to drop out. The purpose of this paper is to unravel these complexities using a qualitative phenomenological approach to gain systematic descriptions of the experience of non-completion.

Design/methodology/approach
Tinto’s (2004) and Bean and Metzner’s (1985) models serve as the theoretical construct for the study’s design and analysis. In-depth interviews were conducted with 41 students who discontinued their studies at universities in the United Arab Emirates, to understand the situations that led them to drop out of university and how they experienced this event in their lives.

Findings
Several issues were identified as contributing factors for dropping out that are consistent with those found in the international literature. Additional issues were more gender or culture specific and, to some extent, reflected the differences that signal a social development in a transitional stage. The findings revealed that institutional factors, poor pre-college preparation, environmental factors (work-education conflict), early marriage responsibilities, well-paid job opportunities and financial concerns were the most influential.

Research limitations/implications
Despite the limitations of relying on a small sample to generalize findings, the rich detail of this inductive study has added to the understanding of the dropout phenomenon in a new context.

Practical implications
The paper recommends both remedial and early intervention strategies to be undertaken by the Ministry of Education and universities. Remedial strategies include re-examining the desired standard of English as a condition for admission and adjusting the grading system. Early intervention measures that accommodate the needs of at-risk students are also proposed. At local, regional and international levels, higher education should be freed from commodification and inflated fees.

Originality/value
The paper presents a significant departure from the largely North American and European literature on university dropout, by offering a broader knowledge of this phenomenon in another regional and national context.


2020, Vol 34 (05), pp. 9410-9417
Author(s): Min Yang, Chengming Li, Fei Sun, Zhou Zhao, Ying Shen, ...

Real-time event summarization is an essential task in natural language processing and information retrieval. Despite the progress of previous work, generating relevant, non-redundant, and timely event summaries remains challenging in practice. In this paper, we propose a Deep Reinforcement learning framework for real-time Event Summarization (DRES), which shows promising performance in resolving all three challenges (i.e., relevance, non-redundancy, and timeliness) in a unified framework. Specifically, we (i) devise a hierarchical cross-attention network with intra- and inter-document attention to integrate important semantic features within and between the query and the input document for better text matching; in addition, relevance prediction is leveraged as an auxiliary task to strengthen document modeling and help extract relevant documents; (ii) propose a multi-topic dynamic memory network to capture the sequential patterns of the different topics belonging to the event of interest and to temporally memorize the input facts from the evolving document stream, avoiding the extraction of redundant information at each time step; and (iii) exploit reinforcement learning to account for both the historical dependencies and the future uncertainty of the document stream when generating relevant and timely summaries. Experimental results on two real-world datasets demonstrate the advantages of the DRES model, with significant improvements in generating relevant, non-redundant, and timely event summaries over state-of-the-art methods.
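
As a rough illustration of the cross-attention ingredient in (i), below is a minimal PyTorch sketch of a query attending over a document; the dimensions, random stand-in encodings, and single-layer setup are assumptions, far simpler than the hierarchical network described in the paper.

# A minimal cross-attention sketch: each query token attends over the
# document's tokens; the attention weights indicate matching spans.
import torch
import torch.nn as nn

d_model, n_heads = 64, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

# Random tensors stand in for encoder outputs (assumption).
query_tokens = torch.randn(1, 8, d_model)    # encoded query, 8 tokens
doc_tokens = torch.randn(1, 50, d_model)     # encoded document, 50 tokens

matched, attn_weights = cross_attn(query_tokens, doc_tokens, doc_tokens)

# A simple relevance score (the auxiliary task) could pool the attended
# representation; the random projection here is a placeholder.
relevance_logit = matched.mean(dim=1) @ torch.randn(d_model)
print(relevance_logit.shape)   # torch.Size([1])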


Algorithms, 2020, Vol 13 (11), pp. 307
Author(s): Luca Pasqualini, Maurizio Parton

A Pseudo-Random Number Generator (PRNG) is any algorithm generating a sequence of numbers approximating the properties of random numbers. Such numbers are widely employed in mid-level cryptography and in software applications. Test suites are used to evaluate the quality of PRNGs by checking statistical properties of the generated sequences, which are commonly represented bit by bit. This paper proposes a Reinforcement Learning (RL) approach to the task of generating PRNGs from scratch by learning a policy to solve a partially observable Markov Decision Process (MDP), where the full state is the period of the generated sequence and the observation at each time step is the last sequence of bits appended to that state. We use a Long Short-Term Memory (LSTM) architecture to model the temporal relationship between observations at different time steps, tasking the LSTM memory with extracting significant features of the hidden portion of the MDP's states. We show that modeling a PRNG with a partially observable MDP and an LSTM architecture largely improves the results of the fully observable feedforward RL approach introduced in previous work.
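
As a rough illustration of the observation-to-action mapping, here is a minimal PyTorch sketch of an LSTM policy that turns the last emitted bits into a distribution over the next bits; the bit width, network sizes, and Bernoulli action head are assumptions rather than the paper's configuration.

# A minimal LSTM policy sketch: the recurrent state summarizes the
# history of observed bits (the hidden portion of the MDP state).
import torch
import torch.nn as nn

n_bits = 8     # bits appended to the sequence per time step (assumption)
hidden = 128   # LSTM hidden size (assumption)

class LstmPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(input_size=n_bits, hidden_size=hidden,
                            batch_first=True)
        self.head = nn.Linear(hidden, n_bits)   # per-bit Bernoulli logits

    def forward(self, obs, state=None):
        out, state = self.lstm(obs, state)   # memory carries past observations
        logits = self.head(out[:, -1])       # act on the latest observation
        return torch.distributions.Bernoulli(logits=logits), state

policy = LstmPolicy()
obs = torch.zeros(1, 1, n_bits)      # initial observation
dist, state = policy(obs)
action = dist.sample()               # the next n_bits of the sequence
print(action)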

