large state space
Recently Published Documents


TOTAL DOCUMENTS: 38 (FIVE YEARS: 9)

H-INDEX: 7 (FIVE YEARS: 1)

2021 ◽  
Author(s):  
Zikai Feng ◽  
Yuanyuan Wu ◽  
Mengxing Huang ◽  
Di Wu

Abstract In order to protect ground users' downlink communications from malicious jamming by an intelligent unmanned aerial vehicle (UAV), a new anti-UAV jamming strategy based on multi-agent deep reinforcement learning is studied in this paper. In this method, ground users aim to learn the best mobile strategies to escape the UAV's jamming. The problem is modeled as a Stackelberg game that describes the competitive interaction between the UAV jammer (leader) and the ground users (followers). To reduce the computational cost of solving for the equilibrium of this complex game with a large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed that decouples the hybrid game into several sub-Markov games and updates the actor and critic networks of the UAV jammer and the ground users at different time scales. Simulation results suggest that the HMAPPO-based anti-jamming strategy achieves performance comparable to the benchmark strategies at lower time complexity. The well-trained HMAPPO can obtain the optimal jamming strategy and the optimal anti-jamming strategies, which approximate the Stackelberg equilibrium (SE).
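A minimal sketch may make the two-time-scale idea concrete. Everything below is a hypothetical stand-in (the agent stubs, the update period, the omitted environment), not the paper's HMAPPO implementation; it shows only the schedule on which the leader and followers would be updated.

```python
# Two-time-scale update schedule: followers update every step,
# the leader updates LEADER_PERIOD times more slowly (all values assumed).
import random

LEADER_PERIOD = 10  # hypothetical: leader (UAV jammer) updates 10x slower

class PPOAgentStub:
    """Placeholder for a PPO actor-critic pair."""
    def __init__(self, name):
        self.name = name
        self.updates = 0
    def act(self, obs):
        return random.choice([-1, 0, 1])         # dummy discrete move
    def update(self, batch):
        self.updates += 1                        # stands in for a PPO step

leader = PPOAgentStub("uav_jammer")
followers = [PPOAgentStub(f"user_{i}") for i in range(3)]

buffer = []
for t in range(100):
    obs = None                                   # environment state omitted
    actions = {a.name: a.act(obs) for a in [leader] + followers}
    buffer.append(actions)                       # trajectory collection
    for f in followers:                          # fast time scale: followers
        f.update(buffer)
    if t % LEADER_PERIOD == 0:                   # slow time scale: leader
        leader.update(buffer)

print(leader.updates, followers[0].updates)      # 10 vs. 100 updates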


Mobile edge computing (MEC) can provide computing services for mobile users (MUs) by offloading computing tasks to edge clouds through wireless access networks. Unmanned aerial vehicles (UAVs) can be deployed as supplementary edge clouds to provide effective MEC services to MUs with poor wireless communication conditions. In this paper, a joint task offloading and power allocation (TOPA) optimization problem is investigated in a UAV-assisted MEC system. Since the joint TOPA problem is strongly non-convex, a method based on deep reinforcement learning is proposed. Specifically, the joint TOPA problem is modeled as a Markov decision process. Then, given the large state space and continuous action space, a twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. Simulation results show that the proposed scheme achieves a lower smoothed training cost than other optimization methods.
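The TD3 algorithm mentioned above rests on three ingredients: twin critics with a clipped double-Q target, target-policy smoothing, and delayed actor updates. The numpy sketch below illustrates those ingredients with linear stand-in critics and toy transitions; all names, sizes, and rates are illustrative assumptions, not the paper's network.

```python
# Linear stand-ins for TD3's twin critics, plus its smoothed target
# actions and delayed actor updates (all dimensions/rates assumed).
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2
W1 = rng.normal(size=(obs_dim + act_dim,))       # twin critic 1
W2 = rng.normal(size=(obs_dim + act_dim,))       # twin critic 2
gamma, noise_clip, policy_delay = 0.99, 0.5, 2

def critic(w, s, a):
    return w @ np.concatenate([s, a])            # linear Q(s, a)

def target_action(s_next):
    a = np.tanh(s_next[:act_dim])                # stand-in target policy
    eps = np.clip(rng.normal(scale=0.2, size=act_dim), -noise_clip, noise_clip)
    return np.clip(a + eps, -1.0, 1.0)           # smoothed, clipped action

for step in range(10):
    s = rng.normal(size=obs_dim)                 # toy transition (s, a, r, s')
    a = rng.uniform(-1, 1, act_dim)
    r, s_next = 1.0, rng.normal(size=obs_dim)
    a_next = target_action(s_next)
    # clipped double-Q target: minimum over the twin critics
    y = r + gamma * min(critic(W1, s_next, a_next), critic(W2, s_next, a_next))
    for W in (W1, W2):                           # TD step on both critics
        W += 1e-3 * (y - critic(W, s, a)) * np.concatenate([s, a])
    if step % policy_delay == 0:
        pass                                     # delayed actor update goes here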


2021 ◽  
Vol 11 (4) ◽  
pp. 1884
Author(s):  
Shuai Liu ◽  
Jing He ◽  
Jiayun Wu

Dynamic spectrum access (DSA) has been considered a promising technology for addressing spectrum scarcity and improving spectrum utilization. In practice, channels are correlated with one another, and collisions inevitably arise when multiple primary users (PUs) or multiple secondary users (SUs) communicate in a real DSA environment. Considering these factors, deep multi-user reinforcement learning (DMRL) is proposed by introducing a cooperative strategy into a dueling deep Q-network (DDQN). Requiring no prior information about the system dynamics, the DDQN can efficiently learn the correlations between channels and reduce the computational complexity in the large state space of the multi-user environment. To reduce conflicts and further maximize the network utility, a cooperative channel strategy is explored that uses acknowledgment (ACK) signals rather than exchanging spectrum information. In each time slot, each user selects a channel and transmits a packet with a certain probability; after transmission, the ACK signal indicates whether the transmission succeeded. Compared with other popular models, simulation results show that the proposed DMRL achieves better performance, effectively enhancing spectrum utilization and reducing the conflict rate in dynamic cooperative spectrum sensing.
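At the core of a dueling DQN is the decomposition Q(s, a) = V(s) + A(s, a) − mean over a′ of A(s, a′). A minimal sketch, with random weights standing in for the two network streams and a recent ACK history standing in for the state (both assumptions, not the paper's architecture):

```python
# Dueling aggregation of a value stream and an advantage stream,
# followed by greedy channel selection and ACK feedback (all assumed).
import numpy as np

rng = np.random.default_rng(1)
n_channels = 5                                   # actions = channel choices
state = rng.normal(size=8)                       # e.g. recent ACK history

Wv = rng.normal(size=(8,))                       # value stream (scalar V)
Wa = rng.normal(size=(n_channels, 8))            # advantage stream

V = Wv @ state
A = Wa @ state
Q = V + A - A.mean()                             # dueling aggregation
channel = int(np.argmax(Q))                      # greedy channel selection

# ACK feedback: after transmitting on `channel`, a success/failure bit
# updates the state without any explicit spectrum-information exchange.
ack = rng.random() < 0.7                         # hypothetical success prob.
print(channel, ack)
```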


2021 ◽  
Vol 6 (1) ◽  
pp. 48-54
Author(s):  
Jezuina Koroveshi ◽  
Ana Ktona

Target tracking is a process with applications in different domains such as video surveillance, robot navigation, and human-computer interaction. In this work we consider the problem of tracking a moving object in a multi-agent environment. The environment is a rectangular space bounded by walls. The first agent is the target, and it moves randomly in the space. The second agent should follow the target, keeping as close as possible without crashing into it. It uses sensors to detect the position of the target; the sensor readings give the distance and the angle to the target. We use reinforcement learning to train the tracker to detect any change in the movement of the target and stay within a certain range of it. Reinforcement learning is a form of machine learning in which the agent learns by interacting with the environment: for each action taken, the agent receives a reward from the environment, which signals positive or negative behaviour. The goal of the agent is to maximise the total reward received during the interaction. This form of machine learning has applications in different areas, such as game playing (the best-known example being AlphaGo), robotics (for designing hard-to-engineer behaviours), traffic light control, and personalized recommendations. The sensor readings may take continuous values, yielding a very large state space. We approximate the value function using neural networks and use different reward functions for learning the best policy.
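As one concrete but hypothetical example of the kind of reward function compared in such a setup, a shaping that penalizes crashing, penalizes losing the target, and otherwise rewards staying close and aligned might look like this; the range limits and coefficients are assumptions, not the paper's values:

```python
# One plausible reward shaping over (distance, angle) sensor readings;
# CRASH_DIST, MAX_DIST, and the coefficients are assumed, not from the paper.
import math

CRASH_DIST, MAX_DIST = 0.5, 5.0                  # assumed range limits

def reward(distance, angle):
    if distance < CRASH_DIST:
        return -10.0                             # collision penalty
    if distance > MAX_DIST:
        return -1.0                              # lost the target
    # inside the desired band: closer and better-aligned is better
    return 1.0 - distance / MAX_DIST - 0.1 * abs(angle) / math.pi

print(reward(2.0, 0.3))                          # small positive reward
```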


2020 ◽  
Vol 25 (6) ◽  
pp. 548-557
Author(s):  
A.V. Garashchenko ◽  
L.G. Gagarina

Verifying the cache memory hierarchy in a modern SoC requires a huge number of complex tests because of its large state space; this is the main problem for functional verification. To cover the entire state space, a graph model of the cache memory hierarchy, together with methods for generating test sequences from this model, is proposed. The vertices of the graph model are the set of states (tags, values, etc.) of each hierarchy level, and the edges are the set of transitions between states (read and write instructions). A graph model describing all states of the cache memory hierarchy has been developed, in which each edge corresponds to a separate check sequence. Non-deterministic situations, such as the choice of a channel (port) in a multichannel cache memory, cannot be resolved at the level of the graph model, since the channel choice depends on many factors not considered within the model's framework; it is therefore proposed to create a separate instance of the subgraph for each channel. In verifying the multiport cache memory hierarchy of a newly developed core with a vector VLIW DSP architecture, the described approach revealed several architectural and functional errors. The approach can be used to test other processor cores and their blocks.
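The edge-as-check-sequence idea can be sketched in a few lines: each edge (source state, instruction, destination state) becomes one test that drives the device into the source state, applies the instruction, and checks the destination. The states and instruction labels below are illustrative, not the actual cache model:

```python
# Toy graph model: vertices are cache states, edges are labeled
# transitions, and every edge yields one check sequence (all names assumed).
from collections import defaultdict

graph = defaultdict(list)                        # state -> [(instr, next_state)]
graph["invalid"].append(("read_miss", "clean"))
graph["clean"].append(("write_hit", "dirty"))
graph["dirty"].append(("evict", "invalid"))

def test_sequences(g):
    """One test per edge: drive the DUT into src, apply instr, check dst."""
    for src, edges in g.items():
        for instr, dst in edges:
            yield (src, instr, dst)

for src, instr, dst in test_sequences(graph):
    print(f"setup:{src} apply:{instr} expect:{dst}")

# For a multiport cache, the nondeterministic channel choice is handled
# by instantiating one copy of the subgraph per channel, as described above.
channels = {ch: dict(graph) for ch in range(2)}  # per-channel subgraph copies
```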


2019 ◽  
Author(s):  
Julia A. Palacios ◽  
Amandine Véber ◽  
Lorenzo Cappello ◽  
Zhangyuan Wang ◽  
John Wakeley ◽  
...  

Abstract The large state space of gene genealogies is a major hurdle for inference methods based on Kingman's coalescent. Here, we present a new Bayesian approach for inferring past population sizes which relies on a lower-resolution coalescent process that we refer to as "Tajima's coalescent". Tajima's coalescent has a drastically smaller state space, and hence is a computationally more efficient model, than the standard Kingman coalescent. We provide a new algorithm for efficient and exact likelihood calculations for data without recombination, which exploits a directed acyclic graph and a correspondingly tailored Markov chain Monte Carlo method. We compare the performance of our Bayesian Estimation of population size changes by Sampling Tajima's Trees (BESTT) with a popular implementation of coalescent-based inference in BEAST using simulated and human data. We empirically demonstrate that BESTT can accurately infer effective population sizes and provides an efficient alternative to Kingman's coalescent. The algorithms described here are implemented in the R package phylodyn, available for download at https://github.com/JuliaPalacios/phylodyn.
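The state-space gap driving this work can be illustrated numerically, assuming the standard counts: Kingman's coalescent ranges over labeled histories, of which there are n!(n−1)!/2^(n−1) for n samples, while Tajima's coalescent ranges over ranked tree shapes, whose counts grow like the Euler zigzag numbers. A small sketch under those assumptions:

```python
# Compare labeled-history counts (Kingman) with ranked-tree-shape counts
# (Tajima), the latter via the Euler zigzag numbers (OEIS A000111).
from math import factorial

def kingman_histories(n):
    # number of ranked, labeled genealogies for n samples
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)

def zigzag_numbers(k):
    # first k Euler zigzag numbers via the boustrophedon recurrence
    out, row = [1], [1]
    for _ in range(k - 1):
        new = [0]
        for v in reversed(row):
            new.append(new[-1] + v)
        row = new
        out.append(row[-1])
    return out

for n in (5, 10, 20):
    tajima = zigzag_numbers(n)[n - 1]            # ranked tree shapes
    print(n, kingman_histories(n), tajima)       # e.g. n=10: ~2.6e9 vs 7936
```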


2019 ◽  
Author(s):  
Michael A. Boemo ◽  
Luca Cardelli ◽  
Conrad A. Nieduszynski

Abstract Biological systems are made up of components that change their actions (and interactions) over time and coordinate with other components nearby. Together with a large state space, the complexity of this behaviour can make it difficult to create concise mathematical models that can be easily extended or modified. This paper introduces the Beacon Calculus, a process algebra designed to simplify the task of modelling interacting biological components. Its breadth is demonstrated by creating models of DNA replication dynamics, gene expression dynamics in response to DNA methylation damage, and a multisite phosphorylation switch. The flexibility of these models is shown by adapting the DNA replication model to include two further topics of interest from the literature: cooperative origin firing and replication fork barriers. The Beacon Calculus is supported with the open-source simulator bcs (https://github.com/MBoemo/bcs.git) to allow users to develop and simulate their own models.

Author summary: Simulating a model of a biological system can suggest ideas for future experiments and help ensure that conclusions about a mechanism are consistent with data. The Beacon Calculus is a new language that makes modelling simple by allowing users to simulate a biological system in only a few lines of code. This simplicity is critical, as it gives users the freedom to come up with new ideas and rapidly test them. Models written in the Beacon Calculus are also easy to modify and extend, allowing users to add new features to a model or incorporate it into a larger biological system. We demonstrate the breadth of applications in this paper by applying the Beacon Calculus to DNA replication and DNA damage repair, both of which have implications for genome stability and cancer, as well as to multisite phosphorylation, which is important for cellular signalling. To enable users to create their own models, we created the open-source Beacon Calculus simulator bcs (https://github.com/MBoemo/bcs.git), which is easy to install and well-supported by documentation and examples.
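The Beacon Calculus has its own syntax (see the bcs repository), which is not reproduced here. As a language-neutral stand-in, the following Gillespie-style sketch simulates one of the kinds of systems described above, two replication forks converging on a chromosome; the rates, sizes, and overall setup are hypothetical, not a model from the paper:

```python
# Gillespie-style simulation of two replication forks converging on a
# 100-site chromosome (all rates and positions assumed for illustration).
import random

random.seed(0)
left, right = 0, 99                              # fork positions (sites)
rate_left, rate_right, t = 1.0, 1.2, 0.0

while left < right:
    total = rate_left + rate_right
    t += random.expovariate(total)               # time to next event
    if random.random() < rate_left / total:
        left += 1                                # left fork steps rightward
    else:
        right -= 1                               # right fork steps leftward

print(f"forks met at site {left} at time {t:.2f}")
```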


2018 ◽  
Vol 45 (8) ◽  
pp. 690-702 ◽  
Author(s):  
Mohammad Aslani ◽  
Stefan Seipel ◽  
Marco Wiering

Traffic signal control can naturally be regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to this challenge is to control traffic signals using continuous reinforcement learning. Although such methods have been successful in traffic signal control, they may become unstable and fail to converge to near-optimal solutions. We develop adaptive traffic signal controllers based on continuous residual reinforcement learning (CRL-TSC), which is more stable. The effect of three feature functions is empirically investigated in a microscopic traffic simulation, and the effects of departing streets, additional actions, and the use of the spatial distribution of vehicles on the performance of CRL-TSCs are assessed. The results show that the best CRL-TSC setup reduces average travel time by 15% compared with an optimized fixed-time controller.
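The residual-gradient idea that gives residual reinforcement learning its stability can be sketched with linear features: unlike plain TD learning, the update descends on the squared Bellman residual through both V(s) and V(s′). The features, rates, and toy transitions below are illustrative assumptions, not the paper's setup:

```python
# Residual-gradient value update with linear features; the gradient of the
# squared Bellman residual includes BOTH phi(s) and gamma*phi(s') terms.
import numpy as np

rng = np.random.default_rng(2)
w = np.zeros(3)
alpha, gamma = 0.05, 0.95

def phi(state):
    # hypothetical feature function, e.g. queue length, phase, elapsed time
    return np.array(state, dtype=float)

for _ in range(100):
    s  = rng.uniform(0, 1, 3)                    # toy traffic state
    s2 = rng.uniform(0, 1, 3)                    # toy next state
    r  = -s[0]                                   # e.g. negative queue length
    delta = r + gamma * w @ phi(s2) - w @ phi(s) # Bellman residual
    # residual gradient: descend on delta^2 w.r.t. both value estimates
    w -= alpha * delta * (gamma * phi(s2) - phi(s))

print(w)                                         # learned feature weights
```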

