accumulated reward
Recently Published Documents

TOTAL DOCUMENTS: 14 (five years: 5)
H-INDEX: 5 (five years: 1)

Sensors ◽ 2021 ◽ Vol 21 (4) ◽ pp. 1363
Author(s): Hailuo Song, Ao Li, Tong Wang, Minghui Wang

It is an essential capability of indoor mobile robots to avoid various kinds of obstacles. Recently, multimodal deep reinforcement learning (DRL) methods have demonstrated great capability for learning control policies in robotics by using different sensors. However, due to the complexity of indoor environments and the heterogeneity of sensor modalities, it remains an open challenge to obtain reliable and robust multimodal information for obstacle avoidance. In this work, we propose a novel multimodal DRL method with an auxiliary task (MDRLAT) for obstacle avoidance of indoor mobile robots. In MDRLAT, a powerful bilinear fusion module is proposed to fully capture the complementary information from two-dimensional (2D) laser range findings and depth images, and the generated multimodal representation is subsequently fed into a dueling double deep Q-network to output control commands for the mobile robot. In addition, an auxiliary task of velocity estimation is introduced to further improve representation learning in DRL. Experimental results show that MDRLAT achieves remarkable performance in terms of average accumulated reward, convergence speed, and success rate. Moreover, experiments in both virtual and real-world testing environments further demonstrate the outstanding generalization capability of our method.
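As a rough illustration of the kind of architecture this abstract describes, the sketch below (in PyTorch) wires a bilinear fusion of laser-range and depth-image features into a dueling Q-network head with an auxiliary velocity-estimation output. The encoders, layer sizes, and the five-action output are assumptions for the sketch, not the published MDRLAT design.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of a bilinear-fusion dueling Q-network with an
# auxiliary velocity head; sizes and encoders are assumed, not the
# paper's exact MDRLAT architecture.
class MultimodalDuelingQNet(nn.Module):
    def __init__(self, laser_dim: int = 360, num_actions: int = 5):
        super().__init__()
        # simple placeholder encoders for the two modalities
        self.laser_enc = nn.Sequential(nn.Linear(laser_dim, 128), nn.ReLU())
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten(),
            nn.Linear(16 * 4 * 4, 128), nn.ReLU(),
        )
        # bilinear fusion of the two 128-d modality embeddings
        self.fusion = nn.Bilinear(128, 128, 256)
        # dueling heads: state value V(s) and action advantages A(s, a)
        self.value = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 1))
        self.advantage = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, num_actions))
        # auxiliary head: linear and angular velocity estimation
        self.velocity = nn.Linear(256, 2)

    def forward(self, laser: torch.Tensor, depth: torch.Tensor):
        z = torch.relu(self.fusion(self.laser_enc(laser), self.depth_enc(depth)))
        v, a = self.value(z), self.advantage(z)
        q = v + a - a.mean(dim=1, keepdim=True)   # dueling aggregation
        return q, self.velocity(z)                # Q-values and auxiliary prediction

# Usage with dummy inputs: a batch of 360-beam laser scans and 64x64 depth images.
if __name__ == "__main__":
    net = MultimodalDuelingQNet()
    q, vel = net(torch.randn(8, 360), torch.randn(8, 1, 64, 64))
    print(q.shape, vel.shape)   # torch.Size([8, 5]) torch.Size([8, 2])
```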


Author(s): Antonio Sánchez Herguedas, Adolfo Crespo Márquez, Francisco Rodrigo Muñoz

Abstract: This paper describes the optimization of preventive maintenance (PM) over a finite planning horizon in a semi-Markov framework. In this framework, the asset may be operating, and providing income for the asset owner, or not operating and undergoing PM, or not operating and undergoing corrective maintenance following failure. PM is triggered when the asset has been operating for τ time units. A number m of transitions specifies the finite horizon. This system is described with a set of recurrence relations, and their z-transform is used to determine the value of τ that maximizes the average accumulated reward over the horizon. We study under what conditions a solution can be found, and for those specific cases the solution τ* is calculated. Despite the complexity of the mathematical solution, the result obtained allows the analyst to provide a quick and easy-to-use tool for practical application in many real-world cases. To demonstrate this, the method has been implemented for a case study, and its accuracy and practical implementation were tested using Monte Carlo simulation and direct calculation.
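The paper derives τ* analytically from the z-transform of the recurrence relations; as a rough illustration of the quantity being optimized, the sketch below instead grid-searches τ by simulating the accumulated reward of a three-state asset over a finite number of operating-maintenance cycles. All rates, costs, and the Weibull failure law are placeholder assumptions, not the paper's case-study values.

```python
import random

# Hypothetical simulation sketch (not the paper's z-transform derivation):
# estimate the average accumulated reward for a given PM trigger tau and
# pick the best tau by brute-force grid search.
INCOME_RATE = 10.0                         # income per unit of operating time (assumed)
PM_COST, CM_COST = 50.0, 300.0             # preventive vs. corrective maintenance cost (assumed)
WEIBULL_SCALE, WEIBULL_SHAPE = 30.0, 2.0   # assumed time-to-failure distribution
M_CYCLES = 40                              # finite horizon, in operating-maintenance cycles

def accumulated_reward(tau: float, rng: random.Random) -> float:
    """One sample path of the accumulated reward over M_CYCLES."""
    reward = 0.0
    for _ in range(M_CYCLES):
        time_to_failure = rng.weibullvariate(WEIBULL_SCALE, WEIBULL_SHAPE)
        if time_to_failure > tau:   # PM is triggered after tau operating time units
            reward += INCOME_RATE * tau - PM_COST
        else:                       # failure occurs first -> corrective maintenance
            reward += INCOME_RATE * time_to_failure - CM_COST
    return reward

def average_reward(tau: float, runs: int = 2000, seed: int = 0) -> float:
    rng = random.Random(seed)
    return sum(accumulated_reward(tau, rng) for _ in range(runs)) / runs

if __name__ == "__main__":
    candidates = [5, 10, 15, 20, 25, 30, 35, 40]
    best_tau = max(candidates, key=average_reward)
    print("estimated tau* =", best_tau, "average reward =", round(average_reward(best_tau), 1))
```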


Electronics ◽ 2020 ◽ Vol 9 (10) ◽ pp. 1720
Author(s): Rashid Ali, Muhammad Sohail, Alaa Omran Almagrabi, Arslan Musaddiq, Byung-Seo Kim

Wireless local area networks (WLANs) have gained wide acceptance in day-to-day communication devices such as handheld smartphones, tablets, and laptops. Energy preservation plays a vital role in WLAN communication, and the efficient use of energy remains one of the most substantial challenges for WLAN devices. Industrial and academic researchers have proposed several approaches to save energy and reduce the overall power consumption of WLAN devices, focusing on static or adaptive energy-saving methods. However, most of these approaches save energy at the cost of throughput degradation, due to either increased sleep time or a reduced number of transmissions. In this paper, we recognize the potential of reinforcement learning (RL) techniques, such as Q-learning (QL), to enhance a WLAN's channel reliability for energy saving. QL is an RL technique that utilizes the accumulated reward of the actions performed in a state-action model. We propose a QL-based energy-saving MAC protocol, named greenMAC. The proposed greenMAC protocol reduces energy consumption by using the accumulated reward value to optimize channel reliability, which reduces the collision probability of the network. We use the degree of channel congestion, expressed as collision probability, as the reward function for our QL-based greenMAC protocol. The comparative results show that greenMAC achieves enhanced system throughput with additional energy savings compared to existing energy-saving mechanisms in WLANs.
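A minimal sketch of the kind of Q-learning loop the abstract describes is given below; the state discretization, the sleep/transmit action set, and the collision-probability-based reward shaping are assumptions for illustration, not the published greenMAC design.

```python
import random

# Hypothetical Q-learning sketch: states, actions and reward shaping are assumed.
ACTIONS = ["sleep", "transmit"]          # assumed MAC-level actions
STATES = range(5)                        # assumed discretised congestion levels
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1    # standard Q-learning hyperparameters

q_table = {(s, a): 0.0 for s in STATES for a in ACTIONS}

def reward(collision_prob: float, action: str) -> float:
    """Small positive reward for sleeping (energy saved); transmissions are
    rewarded when the channel is reliable and penalised when collisions are likely."""
    if action == "sleep":
        return 0.2
    return 1.0 - 2.0 * collision_prob

def choose_action(state: int) -> str:
    if random.random() < EPSILON:                            # explore
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q_table[(state, a)])   # exploit

def q_update(s: int, a: str, r: float, s_next: int) -> None:
    best_next = max(q_table[(s_next, a2)] for a2 in ACTIONS)
    q_table[(s, a)] += ALPHA * (r + GAMMA * best_next - q_table[(s, a)])

def step(state: int, action: str) -> tuple[int, float]:
    """Toy environment: collision probability grows with the congestion level."""
    p_collision = state / (len(STATES) + 1)
    next_state = random.choice(list(STATES))   # assumed random congestion drift
    return next_state, reward(p_collision, action)

if __name__ == "__main__":
    state = 0
    for _ in range(10_000):
        action = choose_action(state)
        next_state, r = step(state, action)
        q_update(state, action, r, next_state)
        state = next_state
    # greedy policy learned per congestion level
    print({s: max(ACTIONS, key=lambda a: q_table[(s, a)]) for s in STATES})
```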


Mathematics ◽ 2020 ◽ Vol 8 (8) ◽ pp. 1254
Author(s): Cheng-Hung Chen, Shiou-Yun Jeng, Cheng-Jian Lin

In this study, a fuzzy logic controller with a reinforcement improved differential search algorithm (FLC_R-IDS) is proposed for solving a mobile robot wall-following control problem. The study uses the reward and punishment mechanisms of reinforcement learning to train the wall-following controller. The proposed improved differential search algorithm uses parameter adaptation to adjust the control parameters. To improve the exploration of the algorithm, the number of superorganisms is varied as they move toward a stopover site. Reinforcement learning is used to guide the behavior of the robot: when the mobile robot satisfies three reward conditions, it receives a reward of +1. The accumulated reward value is used to evaluate the controller and to guide the next round of controller training. Experimental results show that, compared with the traditional differential search algorithm and the chaos differential search algorithm, the average error of the proposed FLC_R-IDS in the three experimental environments is reduced by 12.44%, 22.54%, and 25.98%, respectively. Finally, the experimental results also show that a real mobile robot using the proposed method can effectively implement wall-following control.
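As a hedged illustration of how an accumulated reward can score a candidate wall-following controller, the sketch below sums a per-step +1 reward whenever three conditions hold; the specific conditions (distance band, forward progress, no collision) are assumptions for the sketch, not the paper's exact reward conditions.

```python
# Hypothetical per-step reward: +1 only when all three assumed conditions are met.
def step_reward(wall_distance: float, forward_speed: float, collided: bool,
                target: float = 0.5, band: float = 0.1) -> int:
    in_band = abs(wall_distance - target) <= band   # stays near the wall
    moving_forward = forward_speed > 0.0            # makes forward progress
    return 1 if (in_band and moving_forward and not collided) else 0

# Toy usage: score a short recorded trajectory (values are made up).
trajectory = [(0.48, 0.2, False), (0.62, 0.2, False), (0.50, 0.0, False), (0.53, 0.3, True)]
accumulated = sum(step_reward(d, v, c) for d, v, c in trajectory)
print(accumulated)  # -> 1 (only the first sample satisfies all three conditions)
```

A higher accumulated reward would mark a better candidate controller for the next training round.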


2020 ◽ Vol 34 (04) ◽ pp. 4328-4336
Author(s): Vishal Jain, William Fedus, Hugo Larochelle, Doina Precup, Marc G. Bellemare

Text-based games are a natural challenge domain for deep reinforcement learning algorithms. Their state and action spaces are combinatorially large, their reward function is sparse, and they are partially observable: the agent is informed of the consequences of its actions through textual feedback. In this paper we emphasize this latter point and consider the design of a deep reinforcement learning agent that can play from feedback alone. Our design recognizes and takes advantage of the structural characteristics of text-based games. We first propose a contextualisation mechanism, based on accumulated reward, which simplifies the learning problem and mitigates partial observability. We then study different methods that rely on the notion that most actions are ineffectual in any given situation, following Zahavy et al.'s idea of an admissible action. We evaluate these techniques in a series of text-based games of increasing difficulty based on the TextWorld framework, as well as the iconic game Zork. Empirically, we find that these techniques improve the performance of a baseline deep reinforcement learning agent applied to text-based games.
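A minimal sketch of a score-based contextualisation is shown below: the textual observation is paired with a coarse bucket of the accumulated reward, so that identical observations reached at different progress levels index different states. The bucketing scheme is an assumption for illustration, not the paper's exact mechanism.

```python
from dataclasses import dataclass

# Hypothetical accumulated-reward contextualisation of text observations.
SCORE_BUCKETS = (0, 5, 10, 20, 50)   # assumed progress thresholds

def score_bucket(accumulated_reward: int) -> int:
    """Map the raw accumulated reward to a coarse progress index."""
    return sum(accumulated_reward >= t for t in SCORE_BUCKETS)

@dataclass(frozen=True)
class ContextualisedState:
    observation: str     # textual feedback from the game
    context: int         # coarse accumulated-reward bucket

def contextualise(observation: str, accumulated_reward: int) -> ContextualisedState:
    return ContextualisedState(observation, score_bucket(accumulated_reward))

# Usage: two identical observations at different scores map to different states,
# which mitigates partial observability when indexing a policy or value function.
s1 = contextualise("You are in a dark room.", 0)
s2 = contextualise("You are in a dark room.", 12)
print(s1 == s2)   # False
```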


2012 ◽ Vol 6
Author(s): Karl Friston, Rick Adams, Read Montague

2011 ◽ pp. 136-152
Author(s): Charatdao Intratat

This investigation of popular computer games in comparison with language learning games was designed to offer an insight into the potential of games for the field of self-access learning. The study surveyed and analyzed common characteristics of popular computer games and then compared them with the characteristics of several language learning games. It also investigated the characteristics that participants recommended for computer games used to learn English. The data were collected from undergraduate students at King Mongkut's University of Technology Thonburi, Bangkok, Thailand. The results showed that the most conducive characteristics for attractive language learning games included animation, variety, planning strategy, virtual background, challenging action, and accumulated reward.


Author(s): D Bruneo, A Puliafito, M Scarpa

Wireless sensor networks (WSNs) are composed of a large number of tiny sensor nodes randomly distributed over a geographical region. In order to reduce power consumption, battery-operated sensors undergo cycles of sleeping and active periods that reduce their ability to send and receive data. Starting from Markov reward model theory, this paper presents a dependability model to analyse the reliability of a sensor node. A new dependability parameter is also introduced, referred to as producibility, which captures the capability of a sensor to accomplish its mission. Two different model solution techniques are proposed: one based on the evaluation of the accumulated reward distribution, and the other on an equivalent non-Markovian stochastic Petri net model. The obtained results are used to investigate the dependability of a whole WSN, taking into account the presence of redundant nodes. Topological aspects are also considered, providing a quantitative comparison among three typical network topologies: star, tree, and mesh. Numerical results are provided to highlight the advantages of the proposed technique and to demonstrate the equivalence of the proposed approaches.
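As a hedged illustration of the first solution technique (evaluating the accumulated reward distribution), the sketch below simulates a single node as a two-state Markov reward process with alternating active and sleeping periods; the rates and reward values are placeholders, and Monte Carlo sampling stands in for the paper's analytical and Petri-net solutions.

```python
import random

# Hypothetical two-state (active/sleeping) Markov reward model of a sensor node;
# all rates and reward values below are assumed placeholders.
RATE_TO_SLEEP = 1.0      # transitions per hour out of the active state (assumed)
RATE_TO_ACTIVE = 4.0     # transitions per hour out of the sleeping state (assumed)
REWARD_RATE = {"active": 1.0, "sleeping": 0.0}   # useful work accrued per hour in each state

def accumulated_reward(mission_time: float, rng: random.Random) -> float:
    """One sample of the reward accumulated over the mission time."""
    state, t, reward = "active", 0.0, 0.0
    while t < mission_time:
        rate = RATE_TO_SLEEP if state == "active" else RATE_TO_ACTIVE
        sojourn = min(rng.expovariate(rate), mission_time - t)
        reward += REWARD_RATE[state] * sojourn
        t += sojourn
        state = "sleeping" if state == "active" else "active"
    return reward

def reward_distribution(mission_time: float, runs: int = 10_000, seed: int = 1):
    rng = random.Random(seed)
    return sorted(accumulated_reward(mission_time, rng) for _ in range(runs))

if __name__ == "__main__":
    samples = reward_distribution(mission_time=24.0)
    # e.g. estimated probability that the node accrues at least 18 hours of useful work
    print(sum(s >= 18.0 for s in samples) / len(samples))
```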


2009 ◽ Vol 101 (1) ◽ pp. 437-447
Author(s): Takafumi Minamimoto, Giancarlo La Camera, Barry J. Richmond

Motivation is usually inferred from the likelihood or the intensity with which behavior is carried out. It is sensitive to external factors (e.g., the identity, amount, and timing of a rewarding outcome) and internal factors (e.g., hunger or thirst). We trained macaque monkeys to perform a nonchoice instrumental task (a sequential red-green color discrimination) while manipulating two external factors: reward size and delay-to-reward. We also inferred the state of one internal factor, level of satiation, by monitoring the accumulated reward. A visual cue indicated the forthcoming reward size and delay-to-reward in each trial. The fraction of trials completed correctly by the monkeys increased linearly with reward size and was hyperbolically discounted by delay-to-reward duration, relations that are similar to those found in free operant and choice tasks. The fraction of correct trials also decreased progressively as a function of the satiation level. Similar (albeit noisier) relations were obtained for reaction times. The combined effect of reward size, delay-to-reward, and satiation level on the proportion of correct trials is well described as a multiplication of the effects measured when each factor is examined alone. These results provide a quantitative account of the interaction of external and internal factors on instrumental behavior, and allow us to extend the concept of subjective value of a rewarding outcome, usually confined to external factors, to account also for slow changes in the internal drive of the subject.
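A minimal sketch of the multiplicative model described above is given below: predicted performance scales roughly linearly with reward size, is hyperbolically discounted by delay, and declines with satiation. The coefficients are illustrative placeholders, not fitted values from the study.

```python
# Hypothetical multiplicative model of the proportion of correct trials;
# coefficients a, k, c are placeholders, not the study's fitted parameters.
def p_correct(reward_size: float, delay: float, satiation: float,
              a: float = 0.2, k: float = 0.5, c: float = 0.8) -> float:
    """Predicted fraction of correctly completed trials (clipped to [0, 1])."""
    size_term = a * reward_size              # approximately linear in reward size
    delay_term = 1.0 / (1.0 + k * delay)     # hyperbolic delay discounting
    satiation_term = 1.0 - c * satiation     # progressive decline with satiation (0..1)
    return max(0.0, min(1.0, size_term * delay_term * satiation_term))

# Usage: a large, immediate reward early in the session vs. a small, delayed
# reward late in the session.
print(p_correct(reward_size=4, delay=0.0, satiation=0.1))   # high (~0.74)
print(p_correct(reward_size=1, delay=6.0, satiation=0.8))   # low (~0.02)
```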

