Q-Learning based Routing Protocol to Enhance Network Lifetime in WSNs

2021, Vol 13 (2), pp. 57-80
Author(s): Arunita Kundaliya, D.K. Lobiyal

In resource-constrained Wireless Sensor Networks (WSNs), enhancing network lifetime has been one of the most challenging issues for researchers. Researchers have been exploiting machine learning techniques, in particular reinforcement learning, to achieve efficient solutions in the WSN domain. The objective of this paper is to apply Q-learning, a reinforcement learning technique, to enhance network lifetime by developing distributed routing protocols. Q-learning is an attractive choice for routing due to its low computational and memory requirements. To enable the agent running at each node to take an optimal action, the approach considers a node's residual energy, hop length to the sink, and transmission power. The parameters residual energy and hop length are used to calculate the Q-value, which in turn is used to decide the optimal next hop for routing. The proposed protocols' performance is evaluated through NS3 simulations and compared with the AODV protocol in terms of network lifetime, throughput, and end-to-end delay.
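
As a rough illustration of the kind of update described here, the sketch below maintains per-neighbor Q-values at a node and mixes residual energy with hop length in the reward. The weights, learning rate, discount factor, and reward shape are illustrative assumptions, not the paper's exact formulation.

```python
# A minimal sketch of Q-learning for next-hop selection, assuming a reward
# that mixes a neighbor's residual energy with its hop distance to the sink.
# W_E, W_H, ALPHA, and GAMMA are illustrative values, not those of the paper.
import random

ALPHA, GAMMA = 0.5, 0.8      # learning rate and discount factor (assumed)
W_E, W_H = 0.6, 0.4          # energy vs. hop-length weights (assumed)

def reward(residual_energy, hops_to_sink, max_energy, max_hops):
    """Higher residual energy and fewer hops to the sink give a larger reward."""
    return W_E * (residual_energy / max_energy) - W_H * (hops_to_sink / max_hops)

def update_q(q, node, neighbor, r, neighbor_next_hops):
    """Q-learning backup for the routing-table entry (node, neighbor)."""
    best_next = max((q.get((neighbor, n), 0.0) for n in neighbor_next_hops),
                    default=0.0)
    old = q.get((node, neighbor), 0.0)
    q[(node, neighbor)] = old + ALPHA * (r + GAMMA * best_next - old)

def choose_next_hop(q, node, neighbors, epsilon=0.1):
    """Epsilon-greedy choice of the next hop toward the sink."""
    if random.random() < epsilon:
        return random.choice(neighbors)
    return max(neighbors, key=lambda n: q.get((node, n), 0.0))
```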

Entropy, 2021, Vol 23 (6), pp. 737
Author(s): Fengjie Sun, Xianchang Wang, Rui Zhang

An Unmanned Aerial Vehicle (UAV) can greatly reduce manpower in agricultural plant protection tasks such as watering, sowing, and pesticide spraying. It is essential to develop a Decision-making Support System (DSS) that helps UAVs choose the correct action in each state according to a policy. In an unknown environment, hand-crafting rules to guide a UAV's choice of actions is not applicable, and obtaining the optimal policy through reinforcement learning is a feasible alternative. However, experiments show that existing reinforcement learning algorithms cannot obtain the optimal policy for a UAV in the agricultural plant protection environment. In this work we propose an improved Q-learning algorithm based on similar state matching, and we prove theoretically that a UAV has a greater probability of choosing the optimal action under the policy learned by our algorithm than under one learned by the classic Q-learning algorithm in this environment. The proposed algorithm is implemented and tested on evenly distributed datasets built from real UAV parameters and real farm information, and its performance is discussed in detail. Experimental results show that the proposed algorithm efficiently learns the optimal policy for UAVs in the agricultural plant protection environment.
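
A minimal sketch of the similar-state-matching idea as described above: when the UAV encounters a state with no learned Q-values, it borrows the values of the most similar previously visited state. The Euclidean metric and threshold below are assumptions for illustration; the paper's similarity definition may differ.

```python
# Sketch: fall back on the Q-values of the most similar visited state when the
# current state is unseen. States are numeric tuples; math.dist and the
# threshold are assumed stand-ins for the paper's similarity measure.
import math

def most_similar_state(state, visited_states, threshold=1.0):
    """Return the closest visited state within `threshold`, else None."""
    best, best_dist = None, float("inf")
    for s in visited_states:
        d = math.dist(state, s)
        if d < best_dist:
            best, best_dist = s, d
    return best if best_dist <= threshold else None

def q_values_for(state, q_table, actions):
    """Look up Q-values, borrowing from a similar state when none exist."""
    if state in q_table:
        return q_table[state]
    match = most_similar_state(state, q_table.keys())
    if match is not None:
        return q_table[match]          # reuse the similar state's values
    return {a: 0.0 for a in actions}   # otherwise start from scratch
```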


Author(s): Abdelghafour Harraz, Mostapha Zbakh

Artificial Intelligence makes it possible to create engines that explore and learn environments and thereby derive policies to control them in real time with no human intervention. Through its Reinforcement Learning component, using frameworks such as temporal differences, State-Action-Reward-State-Action (SARSA), and Q-learning, to name a few, it can be applied to any system that can be perceived as a Markov Decision Process. This opens the door to applying Reinforcement Learning to Cloud Load Balancing, dispatching load dynamically across a given Cloud System. The authors describe different techniques that can be used to implement a Reinforcement Learning based engine in a cloud system.
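
As one concrete instance of the frameworks named above, here is a minimal tabular SARSA update for a load-dispatching agent. The state encoding, the action space (which server receives the next request), and the reward (e.g., negative response time) are assumptions made only to keep the sketch self-contained.

```python
# Minimal on-policy SARSA sketch for load balancing: the action is the server
# chosen for the next request. Hyperparameters and reward are assumed.
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def epsilon_greedy(q, state, servers):
    """Pick a server, mostly greedily, sometimes at random to keep exploring."""
    if random.random() < EPSILON:
        return random.choice(servers)
    return max(servers, key=lambda s: q.get((state, s), 0.0))

def sarsa_update(q, state, action, reward, next_state, next_action):
    """SARSA backs up the action actually taken next, not the max (as Q-learning does)."""
    old = q.get((state, action), 0.0)
    target = reward + GAMMA * q.get((next_state, next_action), 0.0)
    q[(state, action)] = old + ALPHA * (target - old)
```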


Author(s): Jonathan Becker, Aveek Purohit, Zheng Sun

The USARSim group at NIST developed a simulated robot that operates in the Unreal Tournament 3 (UT3) gaming environment. They used a software PID controller to control the robot in UT3 worlds. Unfortunately, the PID controller did not work well, so NIST asked us to develop a better controller using machine learning techniques. In the process, we characterized the software PID controller and the robot's behavior in UT3 worlds. Using data collected from our simulations, we compared different machine learning techniques, including linear regression and reinforcement learning (RL). Finally, we implemented an RL-based controller in Matlab and ran it in the UT3 environment via a TCP/IP link between Matlab and UT3.
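
For reference, a minimal discrete-time PID controller of the general kind being replaced; the gains and time step are placeholders, not the values NIST used.

```python
# Generic discrete PID controller sketch; kp, ki, kd, and dt are placeholders.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, setpoint, measurement):
        """Return the control output for one time step."""
        error = setpoint - measurement
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```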


Mathematics, 2020, Vol 8 (9), pp. 1479
Author(s): Francisco Martinez-Gil, Miguel Lozano, Ignacio García-Fernández, Pau Romero, Dolors Serra, ...

Reinforcement learning is one of the most promising machine learning techniques for obtaining intelligent behaviors for embodied agents in simulations. The output of the classic Temporal Difference family of Reinforcement Learning algorithms takes the form of a value function expressed as a numeric table or a function approximator. The learned behavior is then derived using a greedy policy with respect to this value function. Nevertheless, the learned policy sometimes fails to meet expectations, and the task of authoring it is difficult and unsafe because modifying one value or parameter in the learned value function has unpredictable consequences across the space of policies it represents. This rules out direct manipulation of the learned value function as a method to modify the derived behaviors. In this paper, we propose the use of Inverse Reinforcement Learning to incorporate real behavior traces into the learning process to shape the learned behaviors, thus increasing their trustworthiness (in terms of conformance to reality). To do so, we adapt the Inverse Reinforcement Learning framework to the navigation problem domain. Specifically, we use Soft Q-learning, an algorithm based on the maximum causal entropy principle, with MARL-Ped (a Reinforcement Learning-based pedestrian simulator) to include information from trajectories of real pedestrians in the process of learning how to navigate inside a virtual 3D space that represents the real environment. A comparison with behaviors learned using a classic Reinforcement Learning algorithm (Sarsa(λ)) shows that the Inverse Reinforcement Learning behaviors fit the real trajectories significantly better.
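
The soft Q-learning backup mentioned above replaces the hard max over next-state values with a log-sum-exp "soft" value, which is where the maximum-entropy principle enters. A tabular sketch follows; the temperature and learning parameters are illustrative, and the actual MARL-Ped experiments use function approximation rather than a table.

```python
# Tabular soft Q-learning sketch: V(s) = TEMP * log(sum_a exp(Q(s,a)/TEMP)).
# LR, GAMMA, and TEMP are illustrative values, not the paper's settings.
import math

LR, GAMMA, TEMP = 0.1, 0.95, 1.0

def soft_value(q, state, actions):
    """Soft (log-sum-exp) state value; approaches max-Q as TEMP -> 0."""
    return TEMP * math.log(sum(math.exp(q.get((state, a), 0.0) / TEMP)
                               for a in actions))

def soft_q_update(q, state, action, reward, next_state, actions):
    """Bellman backup against the soft value of the next state."""
    old = q.get((state, action), 0.0)
    target = reward + GAMMA * soft_value(q, next_state, actions)
    q[(state, action)] = old + LR * (target - old)
```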


2003, Vol 06 (03), pp. 405-426
Author(s): Paul Darbyshire

Distillations utilize multi-agent modeling and simulation techniques to study warfare as a complex adaptive system at the conceptual level. The focus is placed on interactions between agents to facilitate the study of cause and effect between individual interactions and overall system behavior. Current distillations do not utilize machine-learning techniques to model the cognitive abilities of individual combatants, but instead employ agent control paradigms that represent agents as highly instinctual entities. For a team of agents implementing a reinforcement-learning paradigm, the rate of learning is not sufficient for the agents to adapt to this hostile environment. However, by allowing the agents to communicate their respective rewards for actions performed as the simulation progresses, the rate of learning can be increased enough to significantly improve the team's chances of survival. This paper presents the results of trials measuring the success of a team-based approach to the reinforcement-learning problem in a distillation, using reward communication to increase learning rates.
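
A minimal sketch of the reward-communication idea as we read it: after each step, the acting agent's experience is broadcast so that every teammate applies the same Q-update, effectively multiplying the team's learning rate. What exactly is shared, and how often, is an assumption made for illustration.

```python
# Sketch: each teammate holds its own Q-table (a dict) and learns from every
# broadcast experience, not just its own. ALPHA and GAMMA are assumed values.
ALPHA, GAMMA = 0.2, 0.9

def q_update(q, state, action, reward, next_state, actions):
    """Standard Q-learning backup on one agent's table."""
    best_next = max((q.get((next_state, a), 0.0) for a in actions), default=0.0)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)

def broadcast_and_learn(team_q_tables, experience, actions):
    """Apply one agent's (s, a, r, s') experience to every teammate's table."""
    state, action, reward, next_state = experience
    for q in team_q_tables:
        q_update(q, state, action, reward, next_state, actions)
```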


2020
Author(s): Ao Chen, Taresh Dewan, Manva Trivedi, Danning Jiang, Aloukik Aditya, ...

This paper provides a comparative analysis of the Deep Q Network (DQN) and Double Deep Q Network (DDQN) algorithms based on their hit rate, with DDQN proving better for the Breakout game. DQN is chosen over basic Q-learning because its neural network learns a policy suited to complex environments, and DDQN is chosen because it solves the overestimation problem of basic Q-learning, in which the agent chooses a non-optimal action for a state simply because it has the maximum Q-value.
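
The core difference between the two targets can be shown in a few lines. DQN selects and evaluates the next action with the same network, which biases targets upward; DDQN lets the online network select the action and the target network evaluate it. The sketch below assumes the next-state Q-values are already available as arrays.

```python
# DQN vs. DDQN targets, given next-state Q-value arrays from the two networks.
import numpy as np

def dqn_target(reward, q_target_next, gamma=0.99):
    # Max over the target network's own estimates -> overestimation bias.
    return reward + gamma * np.max(q_target_next)

def ddqn_target(reward, q_online_next, q_target_next, gamma=0.99):
    # Online network picks the action; target network scores it.
    a_star = int(np.argmax(q_online_next))
    return reward + gamma * q_target_next[a_star]
```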


Telecom, 2021, Vol 2 (3), pp. 255-270
Author(s): Saeid Pourroostaei Ardakani, Ali Cheshmehzangi

UAV path planning for remote sensing aims to find the best-fitted routes to complete a data collection mission. UAVs plan routes and move through them to remotely collect environmental data from particular target zones using sensory devices such as cameras. Route planning may utilize machine learning techniques to autonomously find/select cost-effective and/or best-fitted routes and achieve optimized results, including minimized data collection delay, reduced UAV power consumption, decreased flight distance, and a maximized number of collected data samples. This paper utilizes a reinforcement learning technique (location- and energy-aware Q-learning) to plan UAV routes for remote sensing in smart farms. Through this, the UAV avoids moving heuristically or blindly through a farm and instead takes advantage of exploration–exploitation to explore the farm and find the shortest, most cost-effective paths to target locations with interesting data samples to collect. According to the simulation results, the Q-learning technique increases data collection robustness and reduces UAV resource consumption (e.g., power), traversed path length, and remote sensing latency compared with two well-known benchmarks, IEMF and TBID, especially when the target locations are dense and crowded in a farm.
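
A minimal sketch of what a location- and energy-aware reward could look like for such a planner: movement costs energy in proportion to distance, and arriving at a target cell with data to collect pays a bonus. The coefficients and grid representation are assumptions, not the paper's model.

```python
# Illustrative location/energy-aware reward for UAV route planning; the
# ENERGY_COST and TARGET_BONUS coefficients are assumed values.
import math

ENERGY_COST, TARGET_BONUS = 0.5, 10.0

def reward(position, next_position, target_cells):
    """Penalize energy spent moving; reward reaching a data-collection target."""
    r = -ENERGY_COST * math.dist(position, next_position)
    if next_position in target_cells:
        r += TARGET_BONUS
    return r
```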


2020, Vol 11 (4)
Author(s): Leandro Vian, Marcelo De Gomensoro Malheiros

In recent years, Machine Learning techniques have become the driving force behind the worldwide emergence of Artificial Intelligence, producing cost-effective and precise tools for pattern recognition and data analysis. A particular approach to training neural networks, Reinforcement Learning (RL), achieved prominence by creating almost unbeatable artificial opponents in board games like Chess or Go, and also in video games. This paper gives an overview of Reinforcement Learning and tests the approach on a very popular real-time strategy game, Starcraft II. Our goal is to examine the tools and algorithms readily available for RL, also addressing different scenarios in which a neural network can be linked to Starcraft II to learn by itself. This work describes both the technical issues involved and the preliminary results obtained by applying two specific training strategies, A2C and DQN.
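
For orientation, a framework-agnostic sketch of the A2C losses mentioned above, computed over one rollout; the real experiments would use a deep network and the Starcraft II interface, which this sketch does not reproduce.

```python
# Generic A2C loss computation for a single rollout, using numpy only.
import numpy as np

def a2c_losses(rewards, values, log_probs, gamma=0.99):
    """Actor loss from advantages, critic loss from squared advantage."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted returns, back to front
        g = r + gamma * g
        returns.append(g)
    returns = np.array(returns[::-1])
    advantages = returns - np.asarray(values)   # A(s,a) = R - V(s)
    actor_loss = -np.mean(np.asarray(log_probs) * advantages)
    critic_loss = np.mean(advantages ** 2)
    return actor_loss, critic_loss
```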


Author(s): Yufei Wei, Xiaotong Nie, Motoaki Hiraga, Kazuhiro Ohkura, Zlatan Car, ...

In this study, the use of a popular deep reinforcement learning algorithm, deep Q-learning, in developing end-to-end control policies for robotic swarms is explored. Robots have only limited local sensory capabilities; however, in a swarm, they can accomplish collective tasks beyond the capability of a single robot. Compared with most automatic design approaches proposed so far, which belong to the field of evolutionary robotics, deep reinforcement learning techniques provide two advantages: (i) they enable researchers to develop control policies in an end-to-end fashion; and (ii) they require fewer computational resources, especially when the control policy to be developed has a large parameter space. The proposed approach is evaluated in a round-trip task, where the robots are required to travel between two destinations as many times as possible. Simulation results show that the proposed approach can learn control policies for robotic swarms directly from high-dimensional raw camera pixel inputs.
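
A minimal convolutional Q-network of the general kind used to map raw camera pixels to per-action values, sketched in PyTorch; the layer sizes follow common DQN practice and are not the paper's architecture.

```python
# Illustrative pixel-to-Q-values network in PyTorch; sizes are assumptions.
import torch
import torch.nn as nn

class PixelQNetwork(nn.Module):
    def __init__(self, n_actions, in_channels=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.hidden = nn.LazyLinear(512)   # infers the flattened feature size
        self.out = nn.Linear(512, n_actions)

    def forward(self, pixels):             # pixels: (batch, C, H, W) in [0, 1]
        x = self.features(pixels)
        x = torch.relu(self.hidden(x))
        return self.out(x)                 # one Q-value per discrete action
```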

