Q-Learning based Routing Protocol to Enhance Network Lifetime in WSNs

2021, Vol 13 (2), pp. 57-80
Author(s): Arunita Kundaliya, D.K. Lobiyal

In resource-constrained Wireless Sensor Networks (WSNs), enhancing network lifetime has been one of the most challenging issues for researchers. Researchers have been exploiting machine learning techniques, in particular reinforcement learning, to achieve efficient solutions in the WSN domain. The objective of this paper is to apply Q-learning, a reinforcement learning technique, to enhance network lifetime by developing distributed routing protocols. Q-learning is an attractive choice for routing due to its low computational and memory requirements. To enable the agent running at each node to take an optimal action, the approach considers a node's residual energy, hop length to the sink, and transmission power. The parameters residual energy and hop length are used to calculate the Q-value, which in turn is used to decide the optimal next hop for routing. The proposed protocols' performance is evaluated through NS3 simulations and compared with the AODV protocol in terms of network lifetime, throughput, and end-to-end delay.
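
As a rough illustration of the kind of update described here, the sketch below maintains per-neighbor Q-values at a node and mixes residual energy with hop length in the reward. The weights, learning rate, discount factor, and reward shape are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of Q-learning for next-hop selection, assuming a reward
# that mixes a neighbor's residual energy with its hop distance to the sink.
# W_E, W_H, ALPHA, and GAMMA are illustrative values, not those of the paper.
import random

ALPHA, GAMMA = 0.5, 0.8      # learning rate and discount factor (assumed)
W_E, W_H = 0.6, 0.4          # energy vs. hop-length weights (assumed)

def reward(residual_energy, hops_to_sink, max_energy, max_hops):
    """Higher residual energy and fewer hops to the sink give a larger reward."""
    return W_E * (residual_energy / max_energy) - W_H * (hops_to_sink / max_hops)

def update_q(q, node, neighbor, r, neighbor_next_hops):
    """Q-learning backup for the routing-table entry (node, neighbor)."""
    best_next = max((q.get((neighbor, n), 0.0) for n in neighbor_next_hops),
                    default=0.0)
    old = q.get((node, neighbor), 0.0)
    q[(node, neighbor)] = old + ALPHA * (r + GAMMA * best_next - old)

def choose_next_hop(q, node, neighbors, epsilon=0.1):
    """Epsilon-greedy choice of the next hop toward the sink."""
    if random.random() < epsilon:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: q.get((node, n), 0.0))
```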

Entropy, 2021, Vol 23 (6), pp. 737
Author(s): Fengjie Sun, Xianchang Wang, Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps UAVs choose the correct action in each state according to a policy. In an unknown environment, hand-crafting rules to guide a UAV's choice of actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV has a greater probability of choosing the optimal action under the policy learned by our algorithm than under one learned by the classic Q-learning algorithm in this environment. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information, and its performance is discussed in detail. Experimental results show that the proposed algorithm efficiently learns the optimal policy for UAVs in the agricultural plant protection environment.
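
A minimal sketch of the similar-state-matching idea as described above: when the UAV encounters a state with no learned Q-values, it borrows the values of the most similar previously visited state. The Euclidean metric and threshold below are assumptions for illustration; the paper's similarity definition may differ.

```python
# Sketch: fall back on the Q-values of the most similar visited state when the
# current state is unseen. States are numeric tuples; math.dist and the
# threshold are assumed stand-ins for the paper's similarity measure.
import math

def most_similar_state(state, visited_states, threshold=1.0):
    """Return the closest visited state within `threshold`, else None."""
    best, best_dist = None, float("inf")
    for s in visited_states:
        d = math.dist(state, s)
        if d < best_dist:
            best, best_dist = s, d
    return best if best_dist <= threshold else None

def q_values_for(state, q_table, actions):
    """Look up Q-values, borrowing from a similar state when none exist."""
    if state in q_table:
        return q_table[state]
    match = most_similar_state(state, q_table.keys())
    if match is not None:
        return q_table[match]          # reuse the similar state's values
    return {a: 0.0 for a in actions}   # otherwise start from scratch
```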


Author(s): Abdelghafour Harraz, Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that explore and learn environments and thereby derive policies to control them in real time with no human intervention. Through its Reinforcement Learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing, dispatching load dynamically across a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
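
As one concrete instance of the frameworks named above, here is a minimal tabular SARSA update for a load-dispatching agent. The state encoding, the action space (which server receives the next request), and the reward (e.g., negative response time) are assumptions made only to keep the sketch self-contained.

```python
# Minimal on-policy SARSA sketch for load balancing: the action is the server
# chosen for the next request. Hyperparameters and reward are assumed.
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def epsilon_greedy(q, state, servers):
    """Pick a server, mostly greedily, sometimes at random to keep exploring."""
    if random.random() < EPSILON:
        return random.choice(servers)
    return max(servers, key=lambda s: q.get((state, s), 0.0))

def sarsa_update(q, state, action, reward, next_state, next_action):
    """SARSA backs up the action actually taken next, not the max (as Q-learning does)."""
    old = q.get((state, action), 0.0)
    target = reward + GAMMA * q.get((next_state, next_action), 0.0)
    q[(state, action)] = old + ALPHA * (target - old)
```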


Author(s): Jonathan Becker, Aveek Purohit, Zheng Sun

The USARSim group at NIST developed a simulated robot that operates in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in Matlab and ran it in the UT3 environment via a TCP/IP link between Matlab and UT3.
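
For reference, a minimal discrete-time PID controller of the general kind being replaced; the gains and time step are placeholders, not the values NIST used.

```python
# Generic discrete PID controller sketch; kp, ki, kd, and dt are placeholders.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """Return the control output for one time step."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```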


Mathematics, 2020, Vol 8 (9), pp. 1479
Author(s): Francisco Martinez-Gil, Miguel Lozano, Ignacio García-Fernández, Pau Romero, Dolors Serra, ...

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, the learned policy sometimes fails to meet expectations, and the task of authoring it is difficult and unsafe because modifying one value or parameter in the learned value function has unpredictable consequences across the space of policies it represents. This rules out direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces into the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
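
The soft Q-learning backup mentioned above replaces the hard max over next-state values with a log-sum-exp "soft" value, which is where the maximum-entropy principle enters. A tabular sketch follows; the temperature and learning parameters are illustrative, and the actual MARL-Ped experiments use function approximation rather than a table.

```python
# Tabular soft Q-learning sketch: V(s) = TEMP * log(sum_a exp(Q(s,a)/TEMP)).
# LR, GAMMA, and TEMP are illustrative values, not the paper's settings.
import math

LR, GAMMA, TEMP = 0.1, 0.95, 1.0

def soft_value(q, state, actions):
    """Soft (log-sum-exp) state value; approaches max-Q as TEMP -> 0."""
    return TEMP * math.log(sum(math.exp(q.get((state, a), 0.0) / TEMP)
                               for a in actions))

def soft_q_update(q, state, action, reward, next_state, actions):
    """Bellman backup against the soft value of the next state."""
    old = q.get((state, action), 0.0)
    target = reward + GAMMA * soft_value(q, next_state, actions)
    q[(state, action)] = old + LR * (target - old)
```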


2003, Vol 06 (03), pp. 405-426
Author(s): Paul Darbyshire

Distillations utilize multi-agent modeling and simulation techniques to study warfare as a complex adaptive system at the conceptual level. The focus is placed on interactions between agents to facilitate the study of cause and effect between individual interactions and overall system behavior. Current distillations do not utilize machine-learning techniques to model the cognitive abilities of individual combatants, but instead employ agent control paradigms that represent agents as highly instinctual entities. For a team of agents implementing a reinforcement-learning paradigm, the rate of learning is not sufficient for the agents to adapt to this hostile environment. However, by allowing the agents to communicate their respective rewards for actions performed as the simulation progresses, the rate of learning can be increased enough to significantly improve the team's chances of survival. This paper presents the results of trials measuring the success of a team-based approach to the reinforcement-learning problem in a distillation, using reward communication to increase learning rates.
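
A minimal sketch of the reward-communication idea as we read it: after each step, the acting agent's experience is broadcast so that every teammate applies the same Q-update, effectively multiplying the team's learning rate. What exactly is shared, and how often, is an assumption made for illustration.

```python
# Sketch: each teammate holds its own Q-table (a dict) and learns from every
# broadcast experience, not just its own. ALPHA and GAMMA are assumed values.
ALPHA, GAMMA = 0.2, 0.9

def q_update(q, state, action, reward, next_state, actions):
    """Standard Q-learning backup on one agent's table."""
    best_next = max((q.get((next_state, a), 0.0) for a in actions), default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def broadcast_and_learn(team_q_tables, experience, actions):
    """Apply one agent's (s, a, r, s') experience to every teammate's table."""
    state, action, reward, next_state = experience
    for q in team_q_tables:
        q_update(q, state, action, reward, next_state, actions)
```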


2020
Author(s): Ao Chen, Taresh Dewan, Manva Trivedi, Danning Jiang, Aloukik Aditya, ...

This paper provides a comparative analysis of the Deep Q Network (DQN) and Double Deep Q Network (DDQN) algorithms based on their hit rate, with DDQN proving better for the Breakout game. DQN is chosen over basic Q-learning because its neural network learns a policy suited to complex environments, and DDQN is chosen because it solves the overestimation problem of basic Q-learning, in which the agent chooses a non-optimal action for a state simply because it has the maximum Q-value.
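
The core difference between the two targets can be shown in a few lines. DQN selects and evaluates the next action with the same network, which biases targets upward; DDQN lets the online network select the action and the target network evaluate it. The sketch below assumes the next-state Q-values are already available as arrays.

```python
# DQN vs. DDQN targets, given next-state Q-value arrays from the two networks.
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99):
    # Max over the target network's own estimates -> overestimation bias.
    return reward + gamma * np.max(q_target_next)

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    # Online network picks the action; target network scores it.
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]
```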


Telecom, 2021, Vol 2 (3), pp. 255-270
Author(s): Saeid Pourroostaei Ardakani, Ali Cheshmehzangi

UAV path planning for remote sensing aims to find the best-fitted routes to complete a data collection mission. UAVs plan routes and move through them to remotely collect environmental data from particular target zones using sensory devices such as cameras. Route planning may utilize machine learning techniques to autonomously find/select cost-effective and/or best-fitted routes and achieve optimized results, including minimized data collection delay, reduced UAV power consumption, decreased flight distance, and a maximized number of collected data samples. This paper utilizes a reinforcement learning technique (location- and energy-aware Q-learning) to plan UAV routes for remote sensing in smart farms. Through this, the UAV avoids moving heuristically or blindly through a farm and instead takes advantage of exploration–exploitation to explore the farm and find the shortest, most cost-effective paths to target locations with interesting data samples to collect. According to the simulation results, the Q-learning technique increases data collection robustness and reduces UAV resource consumption (e.g., power), traversed path length, and remote sensing latency compared with two well-known benchmarks, IEMF and TBID, especially when the target locations are dense and crowded in a farm.
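
A minimal sketch of what a location- and energy-aware reward could look like for such a planner: movement costs energy in proportion to distance, and arriving at a target cell with data to collect pays a bonus. The coefficients and grid representation are assumptions, not the paper's model.

```python
# Illustrative location/energy-aware reward for UAV route planning; the
# ENERGY_COST and TARGET_BONUS coefficients are assumed values.
import math

ENERGY_COST, TARGET_BONUS = 0.5, 10.0

def reward(position, next_position, target_cells):
    """Penalize energy spent moving; reward reaching a data-collection target."""
    r = -ENERGY_COST * math.dist(position, next_position)
    if next_position in target_cells:
        r += TARGET_BONUS
    return r
```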


2020, Vol 11 (4)
Author(s): Leandro Vian, Marcelo De Gomensoro Malheiros

In recent years, Machine Learning techniques have become the driving force behind the worldwide emergence of Artificial Intelligence, producing cost-effective and precise tools for pattern recognition and data analysis. A particular approach to training neural networks, Reinforcement Learning (RL), achieved prominence by creating almost unbeatable artificial opponents in board games like Chess or Go, and also in video games. This paper gives an overview of Reinforcement Learning and tests the approach on a very popular real-time strategy game, Starcraft II. Our goal is to examine the tools and algorithms readily available for RL, also addressing different scenarios in which a neural network can be linked to Starcraft II to learn by itself. This work describes both the technical issues involved and the preliminary results obtained by applying two specific training strategies, A2C and DQN.
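
For orientation, a framework-agnostic sketch of the A2C losses mentioned above, computed over one rollout; the real experiments would use a deep network and the Starcraft II interface, which this sketch does not reproduce.

```python
# Generic A2C loss computation for a single rollout, using numpy only.
import numpy as np

def a2c_losses(rewards, values, log_probs, gamma=0.99):
    """Actor loss from advantages, critic loss from squared advantage."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted returns, back to front
        g = r + gamma * g
        returns.append(g)
    returns = np.array(returns[::-1])
    advantages = returns - np.asarray(values)   # A(s,a) = R - V(s)
    actor_loss = -np.mean(np.asarray(log_probs) * advantages)
    critic_loss = np.mean(advantages ** 2)
    return actor_loss, critic_loss
```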


Author(s): Yufei Wei, Xiaotong Nie, Motoaki Hiraga, Kazuhiro Ohkura, Zlatan Car, ...

In this study, the use of a popular deep reinforcement learning algorithm, deep Q-learning, in developing end-to-end control policies for robotic swarms is explored. Robots have only limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies for robotic swarms directly from high-dimensional raw camera pixel inputs.
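
A minimal convolutional Q-network of the general kind used to map raw camera pixels to per-action values, sketched in PyTorch; the layer sizes follow common DQN practice and are not the paper's architecture.

```python
# Illustrative pixel-to-Q-values network in PyTorch; sizes are assumptions.
import torch
import torch.nn as nn

class PixelQNetwork(nn.Module):
    def __init__(self, n_actions, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.hidden = nn.LazyLinear(512)   # infers the flattened feature size
        self.out = nn.Linear(512, n_actions)

    def forward(self, pixels):             # pixels: (batch, C, H, W) in [0, 1]
        x = self.features(pixels)
        x = torch.relu(self.hidden(x))
        return self.out(x)                 # one Q-value per discrete action
```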

