large state space
Recently Published Documents


TOTAL DOCUMENTS: 38 (FIVE YEARS: 9)

H-INDEX: 7 (FIVE YEARS: 1)

2021 ◽  
Author(s):  
Zikai Feng ◽  
Yuanyuan Wu ◽  
Mengxing Huang ◽  
Di Wu

Abstract In order to protect ground users' downlink communications from malicious jamming by an intelligent unmanned aerial vehicle (UAV), a new anti-UAV jamming strategy based on multi-agent deep reinforcement learning is studied in this paper. In this method, ground users aim to learn the best mobile strategies to escape the UAV's jamming. The problem is modeled as a Stackelberg game that describes the competitive interaction between the UAV jammer (leader) and the ground users (followers). To reduce the computational cost of solving for the equilibrium of this complex game with a large state space, a hierarchical multi-agent proximal policy optimization (HMAPPO) algorithm is proposed that decouples the hybrid game into several sub-Markov games and updates the actor and critic networks of the UAV jammer and the ground users at different time scales. Simulation results suggest that the HMAPPO-based anti-jamming strategy achieves performance comparable to the benchmark strategies at lower time complexity. The well-trained HMAPPO can obtain the optimal jamming strategy and the optimal anti-jamming strategies, which approximate the Stackelberg equilibrium (SE).
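A minimal sketch may make the two-time-scale idea concrete. Everything below is a hypothetical stand-in (the agent stubs, the update period, the omitted environment), not the paper's HMAPPO implementation; it shows only the schedule on which the leader and followers would be updated.

```python
# Two-time-scale update schedule: followers update every step,
# the leader updates LEADER_PERIOD times more slowly (all values assumed).
import random

LEADER_PERIOD = 10  # hypothetical: leader (UAV jammer) updates 10x slower

class PPOAgentStub:
    """Placeholder for a PPO actor-critic pair."""
    def __init__(self, name):
        self.name = name
        self.updates = 0
    def act(self, obs):
        return random.choice([-1, 0, 1])         # dummy discrete move
    def update(self, batch):
        self.updates += 1                        # stands in for a PPO step

leader = PPOAgentStub("uav_jammer")
followers = [PPOAgentStub(f"user_{i}") for i in range(3)]

buffer = []
for t in range(100):
    obs = None                                   # environment state omitted
    actions = {a.name: a.act(obs) for a in [leader] + followers}
    buffer.append(actions)                       # trajectory collection
    for f in followers:                          # fast time scale: followers
        f.update(buffer)
    if t % LEADER_PERIOD == 0:                   # slow time scale: leader
        leader.update(buffer)

print(leader.updates, followers[0].updates)      # 10 vs. 100 updates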


Mobile edge computing (MEC) can provide computing services for mobile users (MUs) by offloading computing tasks to edge clouds through wireless access networks. Unmanned aerial vehicles (UAVs) can be deployed as supplementary edge clouds to provide effective MEC services to MUs with poor wireless communication conditions. In this paper, a joint task offloading and power allocation (TOPA) optimization problem is investigated in a UAV-assisted MEC system. Since the joint TOPA problem is strongly non-convex, a method based on deep reinforcement learning is proposed. Specifically, the joint TOPA problem is modeled as a Markov decision process. Then, given the large state space and continuous action space, a twin delayed deep deterministic policy gradient (TD3) algorithm is proposed. Simulation results show that the proposed scheme achieves a lower smoothed training cost than other optimization methods.
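The TD3 algorithm mentioned above rests on three ingredients: twin critics with a clipped double-Q target, target-policy smoothing, and delayed actor updates. The numpy sketch below illustrates those ingredients with linear stand-in critics and toy transitions; all names, sizes, and rates are illustrative assumptions, not the paper's network.

```python
# Linear stand-ins for TD3's twin critics, plus its smoothed target
# actions and delayed actor updates (all dimensions/rates assumed).
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2
W1 = rng.normal(size=(obs_dim + act_dim,))       # twin critic 1
W2 = rng.normal(size=(obs_dim + act_dim,))       # twin critic 2
gamma, noise_clip, policy_delay = 0.99, 0.5, 2

def critic(w, s, a):
    return w @ np.concatenate([s, a])            # linear Q(s, a)

def target_action(s_next):
    a = np.tanh(s_next[:act_dim])                # stand-in target policy
    eps = np.clip(rng.normal(scale=0.2, size=act_dim), -noise_clip, noise_clip)
    return np.clip(a + eps, -1.0, 1.0)           # smoothed, clipped action

for step in range(10):
    s = rng.normal(size=obs_dim)                 # toy transition (s, a, r, s')
    a = rng.uniform(-1, 1, act_dim)
    r, s_next = 1.0, rng.normal(size=obs_dim)
    a_next = target_action(s_next)
    # clipped double-Q target: minimum over the twin critics
    y = r + gamma * min(critic(W1, s_next, a_next), critic(W2, s_next, a_next))
    for W in (W1, W2):                           # TD step on both critics
        W += 1e-3 * (y - critic(W, s, a)) * np.concatenate([s, a])
    if step % policy_delay == 0:
        pass                                     # delayed actor update goes here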


2021 ◽  
Vol 11 (4) ◽  
pp. 1884
Author(s):  
Shuai Liu ◽  
Jing He ◽  
Jiayun Wu

Dynamic spectrum access (DSA) has been considered a promising technology for addressing spectrum scarcity and improving spectrum utilization. In practice, channels are correlated with one another, and collisions inevitably arise when multiple primary users (PUs) or multiple secondary users (SUs) communicate in a real DSA environment. Considering these factors, deep multi-user reinforcement learning (DMRL) is proposed by introducing a cooperative strategy into a dueling deep Q-network (DDQN). Requiring no prior information about the system dynamics, the DDQN can efficiently learn the correlations between channels and reduce the computational complexity in the large state space of the multi-user environment. To reduce conflicts and further maximize the network utility, a cooperative channel strategy is explored that uses acknowledgment (ACK) signals rather than exchanging spectrum information. In each time slot, each user selects a channel and transmits a packet with a certain probability; after transmission, the ACK signal indicates whether the transmission succeeded. Compared with other popular models, simulation results show that the proposed DMRL achieves better performance, effectively enhancing spectrum utilization and reducing the conflict rate in dynamic cooperative spectrum sensing.
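At the core of a dueling DQN is the decomposition Q(s, a) = V(s) + A(s, a) − mean over a′ of A(s, a′). A minimal sketch, with random weights standing in for the two network streams and a recent ACK history standing in for the state (both assumptions, not the paper's architecture):

```python
# Dueling aggregation of a value stream and an advantage stream,
# followed by greedy channel selection and ACK feedback (all assumed).
import numpy as np

rng = np.random.default_rng(1)
n_channels = 5                                   # actions = channel choices
state = rng.normal(size=8)                       # e.g. recent ACK history

Wv = rng.normal(size=(8,))                       # value stream (scalar V)
Wa = rng.normal(size=(n_channels, 8))            # advantage stream

V = Wv @ state
A = Wa @ state
Q = V + A - A.mean()                             # dueling aggregation
channel = int(np.argmax(Q))                      # greedy channel selection

# ACK feedback: after transmitting on `channel`, a success/failure bit
# updates the state without any explicit spectrum-information exchange.
ack = rng.random() < 0.7                         # hypothetical success prob.
print(channel, ack)
```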


2021 ◽  
Vol 6 (1) ◽  
pp. 48-54
Author(s):  
Jezuina Koroveshi ◽  
Ana Ktona

Target tracking is a process with applications in different domains such as video surveillance, robot navigation, and human-computer interaction. In this work we consider the problem of tracking a moving object in a multi-agent environment. The environment is a rectangular space bounded by walls. The first agent is the target, and it moves randomly in the space. The second agent should follow the target, keeping as close as possible without crashing into it. It uses sensors to detect the position of the target; the sensor readings give the distance and the angle to the target. We use reinforcement learning to train the tracker to detect any change in the movement of the target and stay within a certain range of it. Reinforcement learning is a form of machine learning in which the agent learns by interacting with the environment: for each action taken, the agent receives a reward from the environment, which signals positive or negative behaviour. The goal of the agent is to maximise the total reward received during the interaction. This form of machine learning has applications in different areas, such as game playing (the best-known example being AlphaGo), robotics (for designing hard-to-engineer behaviours), traffic light control, and personalized recommendations. The sensor readings may take continuous values, yielding a very large state space. We approximate the value function using neural networks and use different reward functions for learning the best policy.
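As one concrete but hypothetical example of the kind of reward function compared in such a setup, a shaping that penalizes crashing, penalizes losing the target, and otherwise rewards staying close and aligned might look like this; the range limits and coefficients are assumptions, not the paper's values:

```python
# One plausible reward shaping over (distance, angle) sensor readings;
# CRASH_DIST, MAX_DIST, and the coefficients are assumed, not from the paper.
import math

CRASH_DIST, MAX_DIST = 0.5, 5.0                  # assumed range limits

def reward(distance, angle):
    if distance < CRASH_DIST:
        return -10.0                             # collision penalty
    if distance > MAX_DIST:
        return -1.0                              # lost the target
    # inside the desired band: closer and better-aligned is better
    return 1.0 - distance / MAX_DIST - 0.1 * abs(angle) / math.pi

print(reward(2.0, 0.3))                          # small positive reward
```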


2020 ◽  
Vol 25 (6) ◽  
pp. 548-557
Author(s):  
A.V. Garashchenko ◽  
L.G. Gagarina

Verifying the cache memory hierarchy in a modern SoC requires a huge number of complex tests because of its large state space; this is the main problem for functional verification. To cover the entire state space, a graph model of the cache memory hierarchy, together with methods for generating test sequences from this model, is proposed. The vertices of the graph model are the set of states (tags, values, etc.) of each hierarchy level, and the edges are the set of transitions between states (read and write instructions). A graph model describing all states of the cache memory hierarchy has been developed, in which each edge corresponds to a separate check sequence. Non-deterministic situations, such as the choice of a channel (port) in a multichannel cache memory, cannot be resolved at the level of the graph model, since the channel choice depends on many factors not considered within the model's framework; it is therefore proposed to create a separate instance of the subgraph for each channel. In verifying the multiport cache memory hierarchy of a newly developed core with a vector VLIW DSP architecture, the described approach revealed several architectural and functional errors. The approach can be used to test other processor cores and their blocks.
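The edge-as-check-sequence idea can be sketched in a few lines: each edge (source state, instruction, destination state) becomes one test that drives the device into the source state, applies the instruction, and checks the destination. The states and instruction labels below are illustrative, not the actual cache model:

```python
# Toy graph model: vertices are cache states, edges are labeled
# transitions, and every edge yields one check sequence (all names assumed).
from collections import defaultdict

graph = defaultdict(list)                        # state -> [(instr, next_state)]
graph["invalid"].append(("read_miss", "clean"))
graph["clean"].append(("write_hit", "dirty"))
graph["dirty"].append(("evict", "invalid"))

def test_sequences(g):
    """One test per edge: drive the DUT into src, apply instr, check dst."""
    for src, edges in g.items():
        for instr, dst in edges:
            yield (src, instr, dst)

for src, instr, dst in test_sequences(graph):
    print(f"setup:{src} apply:{instr} expect:{dst}")

# For a multiport cache, the nondeterministic channel choice is handled
# by instantiating one copy of the subgraph per channel, as described above.
channels = {ch: dict(graph) for ch in range(2)}  # per-channel subgraph copies
```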


2019 ◽  
Author(s):  
Julia A. Palacios ◽  
Amandine Véber ◽  
Lorenzo Cappello ◽  
Zhangyuan Wang ◽  
John Wakeley ◽  
...  

Abstract The large state space of gene genealogies is a major hurdle for inference methods based on Kingman's coalescent. Here, we present a new Bayesian approach for inferring past population sizes which relies on a lower-resolution coalescent process that we refer to as "Tajima's coalescent". Tajima's coalescent has a drastically smaller state space, and hence is a computationally more efficient model, than the standard Kingman coalescent. We provide a new algorithm for efficient and exact likelihood calculations for data without recombination, which exploits a directed acyclic graph and a correspondingly tailored Markov chain Monte Carlo method. We compare the performance of our Bayesian Estimation of population size changes by Sampling Tajima's Trees (BESTT) with a popular implementation of coalescent-based inference in BEAST using simulated and human data. We empirically demonstrate that BESTT can accurately infer effective population sizes and provides an efficient alternative to Kingman's coalescent. The algorithms described here are implemented in the R package phylodyn, available for download at https://github.com/JuliaPalacios/phylodyn.
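The state-space gap driving this work can be illustrated numerically, assuming the standard counts: Kingman's coalescent ranges over labeled histories, of which there are n!(n−1)!/2^(n−1) for n samples, while Tajima's coalescent ranges over ranked tree shapes, whose counts grow like the Euler zigzag numbers. A small sketch under those assumptions:

```python
# Compare labeled-history counts (Kingman) with ranked-tree-shape counts
# (Tajima), the latter via the Euler zigzag numbers (OEIS A000111).
from math import factorial

def kingman_histories(n):
    # number of ranked, labeled genealogies for n samples
    return factorial(n) * factorial(n - 1) // 2 ** (n - 1)

def zigzag_numbers(k):
    # first k Euler zigzag numbers via the boustrophedon recurrence
    out, row = [1], [1]
    for _ in range(k - 1):
        new = [0]
        for v in reversed(row):
            new.append(new[-1] + v)
        row = new
        out.append(row[-1])
    return out

for n in (5, 10, 20):
    tajima = zigzag_numbers(n)[n - 1]            # ranked tree shapes
    print(n, kingman_histories(n), tajima)       # e.g. n=10: ~2.6e9 vs 7936
```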


2019 ◽  
Author(s):  
Michael A. Boemo ◽  
Luca Cardelli ◽  
Conrad A. Nieduszynski

Abstract Biological systems are made up of components that change their actions (and interactions) over time and coordinate with other components nearby. Together with a large state space, the complexity of this behaviour can make it difficult to create concise mathematical models that can be easily extended or modified. This paper introduces the Beacon Calculus, a process algebra designed to simplify the task of modelling interacting biological components. Its breadth is demonstrated by creating models of DNA replication dynamics, gene expression dynamics in response to DNA methylation damage, and a multisite phosphorylation switch. The flexibility of these models is shown by adapting the DNA replication model to include two further topics of interest from the literature: cooperative origin firing and replication fork barriers. The Beacon Calculus is supported with the open-source simulator bcs (https://github.com/MBoemo/bcs.git) to allow users to develop and simulate their own models.

Author summary: Simulating a model of a biological system can suggest ideas for future experiments and help ensure that conclusions about a mechanism are consistent with data. The Beacon Calculus is a new language that makes modelling simple by allowing users to simulate a biological system in only a few lines of code. This simplicity is critical, as it gives users the freedom to come up with new ideas and rapidly test them. Models written in the Beacon Calculus are also easy to modify and extend, allowing users to add new features to a model or incorporate it into a larger biological system. We demonstrate the breadth of applications in this paper by applying the Beacon Calculus to DNA replication and DNA damage repair, both of which have implications for genome stability and cancer, as well as to multisite phosphorylation, which is important for cellular signalling. To enable users to create their own models, we created the open-source Beacon Calculus simulator bcs (https://github.com/MBoemo/bcs.git), which is easy to install and well-supported by documentation and examples.
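The Beacon Calculus has its own syntax (see the bcs repository), which is not reproduced here. As a language-neutral stand-in, the following Gillespie-style sketch simulates one of the kinds of systems described above, two replication forks converging on a chromosome; the rates, sizes, and overall setup are hypothetical, not a model from the paper:

```python
# Gillespie-style simulation of two replication forks converging on a
# 100-site chromosome (all rates and positions assumed for illustration).
import random

random.seed(0)
left, right = 0, 99                              # fork positions (sites)
rate_left, rate_right, t = 1.0, 1.2, 0.0

while left < right:
    total = rate_left + rate_right
    t += random.expovariate(total)               # time to next event
    if random.random() < rate_left / total:
        left += 1                                # left fork steps rightward
    else:
        right -= 1                               # right fork steps leftward

print(f"forks met at site {left} at time {t:.2f}")
```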


2018 ◽  
Vol 45 (8) ◽  
pp. 690-702 ◽  
Author(s):  
Mohammad Aslani ◽  
Stefan Seipel ◽  
Marco Wiering

Traffic signal control can naturally be regarded as a reinforcement learning problem. Unfortunately, it is one of the most difficult classes of reinforcement learning problems owing to its large state space. A straightforward approach to this challenge is to control traffic signals using continuous reinforcement learning. Although such methods have been successful in traffic signal control, they may become unstable and fail to converge to near-optimal solutions. We develop adaptive traffic signal controllers based on continuous residual reinforcement learning (CRL-TSC), which is more stable. The effect of three feature functions is empirically investigated in a microscopic traffic simulation, and the effects of departing streets, additional actions, and the use of the spatial distribution of vehicles on the performance of CRL-TSCs are assessed. The results show that the best CRL-TSC setup reduces average travel time by 15% compared with an optimized fixed-time controller.
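The residual-gradient idea that gives residual reinforcement learning its stability can be sketched with linear features: unlike plain TD learning, the update descends on the squared Bellman residual through both V(s) and V(s′). The features, rates, and toy transitions below are illustrative assumptions, not the paper's setup:

```python
# Residual-gradient value update with linear features; the gradient of the
# squared Bellman residual includes BOTH phi(s) and gamma*phi(s') terms.
import numpy as np

rng = np.random.default_rng(2)
w = np.zeros(3)
alpha, gamma = 0.05, 0.95

def phi(state):
    # hypothetical feature function, e.g. queue length, phase, elapsed time
    return np.array(state, dtype=float)

for _ in range(100):
    s  = rng.uniform(0, 1, 3)                    # toy traffic state
    s2 = rng.uniform(0, 1, 3)                    # toy next state
    r  = -s[0]                                   # e.g. negative queue length
    delta = r + gamma * w @ phi(s2) - w @ phi(s) # Bellman residual
    # residual gradient: descend on delta^2 w.r.t. both value estimates
    w -= alpha * delta * (gamma * phi(s2) - phi(s))

print(w)                                         # learned feature weights
```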

