Effective Routing in Vehicular Adhoc Network (VANET) using a Bio-inspired Algorithm: Enhanced Deep Reinforcement Learning (EDRL) for Secure Wireless Communication

Author(s):  
K Ravikumar ◽  
R Thiyagarajan ◽  
Saravanan M ◽  
Parthasarathy P

Abstract: To improve the performance of city-wide lane networks through optimized signal control, we propose a framework for the Vehicular Adhoc Network (VANET). The framework identifies critical nodes, i.e., nodes that drastically reduce traffic efficiency. A tripartite graph built from vehicle trajectories is used to identify critical nodes from a network-wide viewpoint. An Enhanced Deep Reinforcement Learning (EDRL) method is introduced to control the traffic signals and to make routing decisions for data sent from a Road Side Unit (RSU) to intermediate or destination nodes. Experiments with the proposed model show considerable improvements in delay and node travel time in the VANET.
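A minimal sketch in Python of the routing half of this idea: a value-based agent choosing the next hop for data leaving an RSU. This is not the authors' EDRL implementation; the class name `NextHopQTable`, the tabular stand-in for the deep Q-function, and the reward signal are assumptions for illustration.

```python
import random

class NextHopQTable:
    """Tabular stand-in for a deep Q-function: Q[(state, neighbour)] estimates route quality."""
    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.q = {}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def value(self, state, hop):
        return self.q.get((state, hop), 0.0)

    def choose(self, state, neighbours):
        # epsilon-greedy choice among the RSU's current neighbours
        if random.random() < self.epsilon:
            return random.choice(neighbours)
        return max(neighbours, key=lambda h: self.value(state, h))

    def update(self, state, hop, reward, next_state, next_neighbours):
        # standard Q-learning backup; the reward could encode delay or delivery success
        best_next = max((self.value(next_state, h) for h in next_neighbours), default=0.0)
        td_error = reward + self.gamma * best_next - self.value(state, hop)
        self.q[(state, hop)] = self.value(state, hop) + self.alpha * td_error
```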

2015 ◽  
Vol 25 (3) ◽  
pp. 471-482 ◽  
Author(s):  
Bartłomiej Śnieżyński

Abstract: In this paper we propose a classification-based strategy learning model for autonomous agents. In the literature, the most commonly used learning method in agent-based systems is reinforcement learning. In our opinion, classification can be considered a good alternative. This type of supervised learning can be used to generate a classifier that allows the agent to choose an appropriate action to execute. Experimental results show that this model can be successfully applied to strategy generation even if rewards are delayed. We compare the efficiency of the proposed model and reinforcement learning using the farmer-pest domain with configurations of various complexity. In complex environments, supervised learning can improve the performance of agents much faster than reinforcement learning. If an appropriate knowledge representation is used, the learned knowledge can be analyzed by humans, which allows the learning process to be tracked.
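A minimal sketch of the core idea, assuming scikit-learn is available: a classifier trained on logged (state, action) examples replaces the reward-driven policy at decision time. The feature values, action labels, and the decision-tree choice below are hypothetical, not taken from the farmer-pest experiments; a tree is used because it keeps the learned knowledge human-readable.

```python
from sklearn.tree import DecisionTreeClassifier

# hypothetical training data: rows are state features, labels are the actions taken
states = [[0, 1, 3], [2, 0, 1], [1, 1, 0], [3, 2, 2]]
actions = ["move", "attack", "wait", "attack"]

policy = DecisionTreeClassifier(max_depth=3).fit(states, actions)

def act(state_features):
    # the agent queries the learned classifier instead of a Q-table
    return policy.predict([state_features])[0]

print(act([2, 1, 1]))
```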


2021 ◽  
Author(s):  
Areej Salaymeh ◽  
Loren Schwiebert ◽  
Stephen Remias

Designing efficient transportation systems is crucial to save time and money for drivers and for the economy as a whole. Traffic signals are among the most important components of traffic systems. Currently, most traffic signal systems are configured using fixed timing plans, which are based on limited vehicle count data. Past research has introduced and designed intelligent traffic signals; however, machine learning and deep learning have only recently been used in systems that aim to optimize traffic signal timing in order to reduce travel time. Reinforcement learning (RL) is a promising, data-driven branch of Artificial Intelligence that has shown good results in optimizing traffic signal timing plans to reduce congestion. However, model-based and centralized methods are impractical here due to the high-dimensional state-action space of complex urban traffic networks. In this paper, a model-free approach is used to optimize signal timing for complicated multiple four-phase signalized intersections. We propose a multi-agent deep reinforcement learning framework that aims to optimize traffic flow using data within traffic signal intersections and data coming from other intersections in a Multi-Agent Environment, in what is called Multi-Agent Reinforcement Learning (MARL). The proposed model combines state-of-the-art techniques such as the Double Deep Q-Network and Hindsight Experience Replay (HER). This research uses HER to allow our framework to learn quickly in sparse-reward settings. We tested and evaluated our proposed model using the Simulation of Urban MObility (SUMO) simulator. Our results show that the proposed method is effective in reducing congestion in both peak and off-peak times.
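The two named techniques can be summarised in a short sketch, assuming PyTorch and an already-collected replay buffer; the function names and the goal-conditioned reward below are illustrative, not the paper's exact implementation.

```python
import torch

def double_dqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    # Double DQN: select the next action with the online network,
    # but evaluate its value with the target network
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_value = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_value

def her_relabel(transition, achieved_goal, reward_fn):
    # HER: rewrite a transition's goal to one the agent actually reached,
    # so sparse-reward episodes still produce useful learning signal
    state, action, _, next_state, goal = transition
    new_reward = reward_fn(achieved_goal, next_state)
    return (state, action, new_reward, next_state, achieved_goal)
```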


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Ning Yu ◽  
Lin Nan ◽  
Tao Ku

Purpose: How to make accurate action decisions based on visual information is one of the important research directions for industrial robots. The purpose of this paper is to design a highly optimized hand-eye coordination model that improves a robot's on-site decision-making ability. Design/methodology/approach: Combining an inverse reinforcement learning (IRL) algorithm with a generative adversarial network effectively reduces the dependence on expert samples, and the robot can achieve decision-making performance whose degree of optimization is no lower than, and may even exceed, that of the expert samples. Findings: The performance of the proposed model is verified in a simulation environment and in a real scene. By monitoring the reward distribution of the reward function and the trajectory of the robot, the proposed model is compared with other existing methods. The experimental results show that the proposed model makes better decisions when less expert data is available. Originality/value: A robot hand-eye cooperation model based on improved IRL is proposed and verified. Empirical investigations on real experiments reveal that, overall, the proposed approach tends to improve real efficiency by more than 10% compared to alternative hand-eye cooperation methods.
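A generic sketch of the adversarial-IRL idea referenced above (a GAIL-style loop), not the authors' hand-eye model: a discriminator scores (state, action) pairs, and its output is reused as a learned reward for the policy. PyTorch is assumed, and the input dimension of 8 is a placeholder for the concatenated state-action features.

```python
import torch
import torch.nn as nn

discriminator = nn.Sequential(nn.Linear(8, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def discriminator_step(expert_sa, policy_sa):
    # expert pairs labelled 1, policy-generated pairs labelled 0
    logits_e = discriminator(expert_sa)
    logits_p = discriminator(policy_sa)
    loss = bce(logits_e, torch.ones_like(logits_e)) + bce(logits_p, torch.zeros_like(logits_p))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def learned_reward(policy_sa):
    # reward shaped from the discriminator; higher when the policy looks "expert-like"
    with torch.no_grad():
        return -torch.log(1.0 - torch.sigmoid(discriminator(policy_sa)) + 1e-8)
```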


2019 ◽  
Vol 9 (3) ◽  
pp. 502 ◽  
Author(s):  
Cristyan Gil ◽  
Hiram Calvo ◽  
Humberto Sossa

Programming robots to perform different activities requires calculating sequences of joint values while taking into account many factors, such as stability and efficiency, at the same time. Particularly for walking, state-of-the-art techniques for approximating these sequences are based on reinforcement learning (RL). In this work we propose a multi-level system, where the same RL method is used first to learn the configurations of robot joints (poses) that allow it to stand with stability, and then, in the second level, to find the sequence of poses that lets it reach the furthest distance in the shortest time while avoiding falling down and keeping a straight path. To evaluate this, we measure the time it takes for the robot to travel a certain distance. To our knowledge, this is the first work focusing on both speed and precision of the trajectory at the same time. We implement our model in a simulated environment using Q-learning. We compare our approach with the built-in walking modes of a NAO robot, improving on the normal-speed mode and providing greater robustness than the fast-speed mode. The proposed model can be extended to other tasks and is independent of a particular robot model.
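A minimal tabular Q-learning sketch of the update reused at both levels described above (first over joint configurations, then over sequences of poses); the state and action encodings are hypothetical and the simulated robot setting is not reproduced here.

```python
import random
from collections import defaultdict

Q = defaultdict(float)
alpha, gamma, epsilon = 0.1, 0.95, 0.1

def select(state, actions):
    # epsilon-greedy action selection over the current level's action set
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, next_actions):
    # one-step Q-learning backup; the reward could penalise falls and reward forward progress
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```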


Symmetry ◽  
2020 ◽  
Vol 12 (10) ◽  
pp. 1685 ◽  
Author(s):  
Chayoung Kim

Owing to the complexity involved in training an agent in a real-time environment, e.g., one using the Internet of Things (IoT), reinforcement learning (RL) with a deep neural network, i.e., deep reinforcement learning (DRL), has been widely adopted in an online setting without prior knowledge or complicated reward functions. DRL can handle a symmetrical balance between bias and variance, which indicates that RL agents can be competently trained in real-world applications. The proposed approach considers combinations of basic RL algorithms, used online and offline, based on the empirical bias–variance balance. We therefore exploit the balance between the offline Monte Carlo (MC) technique and online temporal-difference (TD) learning, with an on-policy method (state–action–reward–state–action, Sarsa) and an off-policy method (Q-learning), within a DRL setting. The proposed balance of MC (offline) and TD (online) use, which is simple and applicable without a well-designed reward, is suitable for real-time online learning. We demonstrate that, for a simple control task, balancing online and offline use alone, without mixing on- and off-policy methods, gives satisfactory results. However, in complex tasks, the results clearly indicate the effectiveness of the combined method in improving the convergence speed and performance of a deep Q-network.
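The four ingredients the abstract combines can be written down as targets; this sketch shows only the standard forms (offline MC return, online TD targets in their on-policy and off-policy variants) and does not reproduce the paper's specific blending schedule.

```python
def monte_carlo_return(rewards, gamma=0.99):
    # offline: the full-episode return, computed after the episode ends
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def sarsa_target(reward, q_next, next_action, gamma=0.99):
    # on-policy TD: bootstrap on the action the behaviour policy actually takes next
    return reward + gamma * q_next[next_action]

def q_learning_target(reward, q_next, gamma=0.99):
    # off-policy TD: bootstrap on the greedy action regardless of what is taken
    return reward + gamma * max(q_next)
```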


Complexity ◽  
2019 ◽  
Vol 2019 ◽  
pp. 1-20 ◽  
Author(s):  
Taewook Kim ◽  
Ha Young Kim

Many researchers have tried to optimize pairs trading as the number of opportunities for arbitrage profit has gradually decreased. Pairs trading is a market-neutral strategy; it profits if the given condition is satisfied within a given trading window, and if not, there is a risk of loss. In this study, we propose an optimized pairs-trading strategy using deep reinforcement learning, in particular the deep Q-network, utilizing various trading and stop-loss boundaries. More specifically, if spreads hit trading thresholds and revert to the mean, the agent receives a positive reward. However, if spreads hit stop-loss thresholds or fail to revert to the mean after hitting the trading thresholds, the agent receives a negative reward. The agent is trained to select the optimum level of discretized trading and stop-loss boundaries given a spread to maximize the expected sum of discounted future profits. Pairs are selected from stocks on the S&P 500 Index using a cointegration test. We compared our proposed method with traditional pairs-trading strategies which use constant trading and stop-loss boundaries. We find that our proposed model is trained well and outperforms traditional pairs-trading strategies.
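A minimal sketch of the reward logic described above; the ±1 reward values, the z-scored spread convention, and the boundary handling are assumptions for illustration, not the paper's exact settings.

```python
def pairs_trade_reward(spread_path, trade_bound, stop_bound):
    """Walk one trading window of z-scored spreads and return the agent's reward."""
    entry_sign = 0
    for spread in spread_path:
        if entry_sign == 0 and abs(spread) >= trade_bound:
            entry_sign = 1 if spread > 0 else -1   # position opened at the trading threshold
        elif entry_sign != 0 and abs(spread) >= stop_bound:
            return -1.0                            # stop-loss threshold hit
        elif entry_sign != 0 and spread * entry_sign <= 0:
            return +1.0                            # spread reverted through the mean
    return -1.0 if entry_sign != 0 else 0.0        # opened but never reverted within the window
```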


2018 ◽  
Vol 30 (7) ◽  
pp. 1983-2004 ◽  
Author(s):  
Yazhou Hu ◽  
Bailu Si

We propose a neural network model for reinforcement learning to control a robotic manipulator with unknown parameters and dead zones. The model is composed of three networks. The state of the robotic manipulator is predicted by the state network of the model, the action policy is learned by the action network, and the performance index of the action policy is estimated by a critic network. The three networks work together to optimize the performance index based on the reinforcement learning control scheme. The convergence of the learning methods is analyzed. Application of the proposed model on a simulated two-link robotic manipulator demonstrates the effectiveness and the stability of the model.
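A schematic sketch of the three-network arrangement described above, assuming PyTorch; the layer sizes and the manipulator's state/action dimensions are placeholders, and the learning rules themselves are omitted.

```python
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2   # e.g. two joint angles and two joint velocities (assumed)

# state network: predicts the next state of the manipulator from (state, action)
state_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, state_dim))
# action network: the policy mapping state to joint commands
action_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, action_dim), nn.Tanh())
# critic network: estimates the performance index of the current action policy
critic_net = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(), nn.Linear(64, 1))

def rollout_step(state):
    action = action_net(state)
    predicted_next = state_net(torch.cat([state, action], dim=-1))
    value = critic_net(torch.cat([state, action], dim=-1))
    return action, predicted_next, value
```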


2014 ◽  
Vol 2014 ◽  
pp. 1-22 ◽  
Author(s):  
Hasan A. A. Al-Rawi ◽  
Kok-Lim Alvin Yau ◽  
Hafizal Mohamad ◽  
Nordin Ramli ◽  
Wahidah Hashim

Cognitive radio (CR) enables unlicensed users (or secondary users, SUs) to sense for and exploit underutilized licensed spectrum owned by the licensed users (or primary users, PUs). Reinforcement learning (RL) is an artificial intelligence approach that enables a node to observe, learn, and make appropriate decisions on action selection in order to maximize network performance. Routing enables a source node to search for a least-cost route to its destination node. While there have been increasing efforts to enhance the traditional RL approach for routing in wireless networks, this research area remains largely unexplored in the domain of routing in CR networks. This paper applies RL to routing and investigates the effects of various features of RL (i.e., reward function, exploitation and exploration, as well as learning rate) through simulation. New approaches and recommendations are proposed to enhance these features in order to improve the network performance that RL brings to routing. Simulation results show that the RL parameters of the reward function, exploitation and exploration, as well as the learning rate, must be well regulated, and that the new approaches proposed in this paper improve SUs' network performance without significantly jeopardizing PUs' network performance, specifically SUs' interference to PUs.
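A minimal Q-routing sketch that exposes the three knobs the abstract studies (reward function, exploration rate, learning rate); CR-specific spectrum sensing is abstracted away and the per-hop cost is a placeholder, so this is an illustration rather than the paper's scheme.

```python
import random
from collections import defaultdict

class QRouter:
    def __init__(self, learning_rate=0.5, epsilon=0.1, gamma=1.0):
        self.q = defaultdict(float)            # Q[(destination, neighbour)] ~ estimated route cost
        self.lr, self.epsilon, self.gamma = learning_rate, epsilon, gamma

    def next_hop(self, destination, neighbours):
        if random.random() < self.epsilon:     # exploration
            return random.choice(neighbours)
        return min(neighbours, key=lambda n: self.q[(destination, n)])  # exploitation: least estimated cost

    def feedback(self, destination, hop, link_cost, neighbour_estimate):
        # reward/cost function: per-hop cost, e.g. delay plus a penalty for interfering with PUs
        target = link_cost + self.gamma * neighbour_estimate
        self.q[(destination, hop)] += self.lr * (target - self.q[(destination, hop)])
```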


2021 ◽  
Vol 14 (1) ◽  
pp. 107
Author(s):  
Qiming Ye ◽  
Yuxiang Feng ◽  
Eduardo Candela ◽  
Jose Escribano Macias ◽  
Marc Stettler ◽  
...  

The complete streets scheme makes seminal contributions to securing the basic public right-of-way (ROW), improving road safety, and maintaining high traffic efficiency for all modes of commute. However, such a popular street design paradigm also faces endogenous pressures, such as the appeal for a more balanced ROW for non-vehicular users. In addition, the deployment of Autonomous Vehicle (AV) mobility is likely to challenge the conventional use of street space as well as this scheme. Previous studies have invented automated control techniques for specific road management issues, such as traffic light control and lane management, whereas models and algorithms that dynamically calibrate the ROW of road space according to travel demands and place-making requirements remain a research gap. This study proposes a novel optimal control method that decides, in real time, the ROW of road space assigned to driveways and sidewalks. To solve this optimal control task, a reinforcement learning method is introduced that employs a microscopic traffic simulator, namely SUMO, as its environment. The model was trained for 150 episodes using a four-legged intersection and joint AV-pedestrian travel demands over one day. The results demonstrate the effectiveness of the model in both symmetric and asymmetric road settings. After being trained for 150 episodes, our proposed model significantly increased its comprehensive reward, which covers both pedestrian and vehicular traffic efficiency and the sidewalk ratio, by 10.39%. Decisions on the balanced ROW are optimised: 90.16% of the edges reduce the driveway supply and raise sidewalk shares by approximately 9%. Moreover, during 18.22% of the tested time slots, a lane-width-equivalent space is shifted from driveways to sidewalks, minimising the travel costs for both an AV fleet and pedestrians. Our study primarily contributes to the modelling architecture and algorithms for centralised, real-time ROW management. Prospective applications of this method are likely to facilitate AV mobility-oriented road management and pedestrian-friendly street space design in the near future.
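A schematic training loop for a right-of-way controller of this kind, sketched with tabular Q-learning; `env` is a hypothetical wrapper around a SUMO scenario (it is not part of SUMO's API), and the action set and reward composition are assumptions, not the study's actual model.

```python
import random

def train_row_controller(env, episodes=150, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning loop over a hypothetical SUMO-backed environment."""
    q = {}
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            actions = env.available_actions(state)        # e.g. widen sidewalk / keep / widen driveway
            if random.random() < epsilon:
                action = random.choice(actions)
            else:
                action = max(actions, key=lambda a: q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)   # reward mixes pedestrian and AV travel costs
            best_next = max((q.get((next_state, a), 0.0) for a in actions), default=0.0)
            old = q.get((state, action), 0.0)
            q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
            state = next_state
    return q
```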


2020 ◽  
Author(s):  
Philipp Weidel ◽  
Renato Duarte ◽  
Abigail Morrison

Abstract: Reinforcement learning is a learning paradigm that can account for how organisms learn to adapt their behavior in complex environments with sparse rewards. However, implementations in spiking neuronal networks typically rely on input architectures involving place cells or receptive fields. This is problematic, as such approaches either scale badly as the environment grows in size or complexity, or presuppose knowledge of how the environment should be partitioned. Here, we propose a learning architecture that combines unsupervised learning on the input projections with clustered connectivity within the representation layer. This combination allows input features to be mapped to clusters; thus the network self-organizes to produce task-relevant activity patterns that can serve as the basis for reinforcement learning on the output projections. On the basis of the MNIST and Mountain Car tasks, we show that our proposed model performs better than either a comparable unclustered network or a clustered network with static input projections. We conclude that the combination of unsupervised learning and clustered connectivity provides a generic representational substrate suitable for further computation.
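A rate-based caricature of the architecture above, assuming NumPy and scikit-learn; the spiking dynamics are not modelled, the inputs are random placeholders, and the reward-modulated readout rule is a generic stand-in rather than the paper's plasticity rule. It only illustrates the division of labour: clusters are learned without labels, and reinforcement learning adjusts just the output projections.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 16))                        # hypothetical input features
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)  # unsupervised input mapping

n_actions = 3
W = np.zeros((8, n_actions))                          # readout: cluster activity -> action preference

def act_and_learn(x, reward_fn, lr=0.05):
    h = np.zeros(8)
    h[clusters.predict(x.reshape(1, -1))[0]] = 1.0    # one-hot cluster activity
    action = int(np.argmax(h @ W + rng.normal(scale=0.01, size=n_actions)))
    r = reward_fn(action)
    W[:, action] += lr * r * h                        # reward-modulated update on output projections
    return action, r
```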

