Trajectory Optimization for Autonomous Flying Base Station via Reinforcement Learning

Author(s):  
Harald Bayerlein ◽  
Paul De Kerret ◽  
David Gesbert
2021 ◽  
Vol 10 (1) ◽  
pp. 21
Author(s):  
Omar Nassef ◽  
Toktam Mahmoodi ◽  
Foivos Michelinakis ◽  
Kashif Mahmood ◽  
Ahmed Elmokashfi

This paper presents a data-driven framework for performance optimisation of Narrow-Band IoT user equipment. The proposed framework is an edge micro-service that suggests one-time configurations to user equipment communicating with a base station. Suggested configurations are delivered from a Configuration Advocate to improve energy consumption, delay, throughput, or a combination of these metrics, depending on the user-end device and the application. Reinforcement learning, employing both gradient descent and a genetic algorithm, is adopted in tandem with machine learning and deep learning algorithms to predict the environmental states and suggest an optimal configuration. The results highlight the adaptability of the deep neural network in predicting intermediary environmental states; they also show the superior optimisation performance of the genetic reinforcement learning algorithm.
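A minimal sketch of the genetic configuration search described above, assuming an illustrative NB-IoT configuration space and a fitness that blends the three target metrics; the predict_state hook stands in for the paper's state-prediction models, and all names, value sets, and weights are hypothetical rather than the paper's actual implementation:

import random

CONFIG_SPACE = {
    "tx_power_dbm": [0, 7, 14, 23],       # assumed NB-IoT power levels
    "repetitions": [1, 2, 4, 8, 16],      # assumed coverage-enhancement repetitions
    "psm_timer_s": [60, 300, 600, 1200],  # assumed power-saving-mode timer
}

def random_config():
    return {k: random.choice(v) for k, v in CONFIG_SPACE.items()}

def fitness(cfg, predict_state):
    # predict_state stands in for the learned models that estimate the
    # intermediary environmental state for a candidate configuration.
    energy, delay, throughput = predict_state(cfg)
    # Weighted objective; weights depend on the device and application.
    return 0.4 * throughput - 0.3 * energy - 0.3 * delay

def evolve(predict_state, pop_size=20, generations=50, mut_prob=0.1):
    population = [random_config() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, predict_state), reverse=True)
        parents = population[: pop_size // 2]     # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)      # uniform crossover
            child = {k: random.choice([a[k], b[k]]) for k in CONFIG_SPACE}
            if random.random() < mut_prob:        # point mutation
                k = random.choice(list(CONFIG_SPACE))
                child[k] = random.choice(CONFIG_SPACE[k])
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(c, predict_state))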


Author(s):  
Akindele Segun Afolabi ◽  
Shehu Ahmed ◽  
Olubunmi Adewale Akinola

Due to the increased demand for scarce wireless bandwidth, it has become insufficient to serve network user equipment using macrocell base stations alone. Network densification through the addition of low-power nodes (picocells) to conventional high-power nodes addresses the bandwidth dearth, but unfortunately introduces unwanted interference into the network, which reduces throughput. This paper developed a reinforcement learning model that assisted in coordinating interference in a heterogeneous network comprising macrocell and picocell base stations. The learning mechanism was derived from Q-learning and consisted of agent, state, action, and reward. The base station was modeled as the agent, while the state represented the condition of the user equipment in terms of signal-to-interference-plus-noise ratio (SINR). The action was represented by the transmission power level, and the reward was given in terms of throughput. Simulation results showed that the proposed Q-learning scheme improved average user equipment throughput in the network. In particular, multi-agent systems with a normal learning rate increased the throughput of associated user equipment by a whopping 212.5% compared to a macrocell-only scheme.
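The abstract's agent/state/action/reward mapping translates directly into a tabular Q-learning update. The sketch below follows that mapping, assuming illustrative power levels and SINR bin edges; the environment step (measuring SINR and throughput after a power change) is left abstract:

import random
from collections import defaultdict

POWER_LEVELS_DBM = [10, 20, 30, 40]  # action set: transmit power levels (assumed)
SINR_BINS = [-5, 0, 5, 10, 15]       # state discretization in dB (assumed)

def sinr_state(sinr_db):
    # Quantize a measured SINR into a bin index 0..len(SINR_BINS).
    return sum(sinr_db >= edge for edge in SINR_BINS)

Q = defaultdict(float)  # (state, action_index) -> estimated value

def choose_action(state, epsilon=0.1):
    if random.random() < epsilon:                      # explore
        return random.randrange(len(POWER_LEVELS_DBM))
    return max(range(len(POWER_LEVELS_DBM)),           # exploit
               key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Standard Q-learning: reward here is the observed throughput.
    best_next = max(Q[(next_state, a)] for a in range(len(POWER_LEVELS_DBM)))
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

In the multi-agent setting the abstract evaluates, each base station would run its own copy of this learner; alpha corresponds to the "normal learning rate" mentioned above.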


Aerospace ◽  
2021 ◽  
Vol 8 (10) ◽  
pp. 299
Author(s):  
Bin Yang ◽  
Pengxuan Liu ◽  
Jinglang Feng ◽  
Shuang Li

This paper presents a novel and robust two-stage pursuit strategy for incomplete-information impulsive space pursuit-evasion missions that considers the J2 perturbation. The strategy first decomposes the impulsive pursuit-evasion game into a far-distance rendezvous stage and a close-distance game stage according to the perception range of the evader. The far-distance rendezvous stage is transformed into a rendezvous trajectory optimization problem, for which a new objective function is proposed to obtain the pursuit trajectory with the optimal terminal pursuit capability. For the close-distance game stage, a closed-loop pursuit approach is proposed using a reinforcement learning algorithm, the deep deterministic policy gradient (DDPG) algorithm, to solve and update the pursuit trajectory under incomplete information. The feasibility of this strategy, and its robustness to different initial states of the pursuer and evader and to different evasion strategies, are demonstrated for sun-synchronous orbit pursuit-evasion scenarios. Monte Carlo tests show that the successful pursuit ratio of the proposed method exceeds 91% in all the given scenarios.
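For reference, a condensed DDPG update step, sketched in PyTorch. State and action dimensions, network shapes, and hyperparameters are placeholders, and the J2-perturbed orbital dynamics and reward shaping of the paper are not reproduced here; this only shows the algorithmic core named in the abstract:

import torch
import torch.nn as nn

state_dim, action_dim = 12, 3  # e.g. relative orbital state, impulsive delta-v (assumed)

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                         nn.Linear(64, out_dim))

actor, actor_tgt = mlp(state_dim, action_dim), mlp(state_dim, action_dim)
critic, critic_tgt = mlp(state_dim + action_dim, 1), mlp(state_dim + action_dim, 1)
actor_tgt.load_state_dict(actor.state_dict())
critic_tgt.load_state_dict(critic.state_dict())
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(batch, gamma=0.99, tau=0.005):
    s, a, r, s2, done = batch  # tensors sampled from a replay buffer
    with torch.no_grad():      # TD target from slowly-updated target nets
        q_next = critic_tgt(torch.cat([s2, actor_tgt(s2)], dim=1))
        target = r + gamma * (1 - done) * q_next
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, target)
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    # Deterministic policy gradient: ascend the critic w.r.t. the actor.
    actor_loss = -critic(torch.cat([s, actor(s)], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    # Polyak-average the target networks toward the online networks.
    for net, tgt in ((actor, actor_tgt), (critic, critic_tgt)):
        for p, pt in zip(net.parameters(), tgt.parameters()):
            pt.data.mul_(1 - tau).add_(tau * p.data)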


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Fitsum Debebe Tilahun ◽  
Chung G. Kang

Enhanced licensed-assisted access (eLAA) is an operational mode that allows use of the unlicensed band to support long-term evolution (LTE) service via carrier aggregation. The additional bandwidth helps meet the demands of growing mobile traffic. In uplink eLAA, which is prone to unexpected interference from WiFi access points, having the base station schedule resources and then requiring users to perform a listen-before-talk (LBT) procedure can seriously degrade resource utilization. In this paper, we present a decentralized deep reinforcement learning (DRL)-based approach in which each user independently learns a dynamic band selection strategy that maximizes its own rate. Through extensive simulations, we show that the proposed DRL-based band selection scheme improves resource utilization while supporting a certain minimum quality of service (QoS).
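The decentralized structure can be illustrated with a deliberately simplified, tabular stand-in for the paper's deep RL agents: each user keeps its own running rate estimate per band and chooses epsilon-greedily. The band names, moving-average step, and agent count are assumptions, and the deep network is replaced by a two-entry value table:

import random

BANDS = ["licensed", "unlicensed"]

class UserAgent:
    """Each UE learns independently from its own observed rate."""
    def __init__(self, epsilon=0.1, step=0.1):
        self.value = {b: 0.0 for b in BANDS}  # running rate estimate per band
        self.epsilon, self.step = epsilon, step

    def select_band(self):
        if random.random() < self.epsilon:        # explore
            return random.choice(BANDS)
        return max(BANDS, key=self.value.get)     # exploit

    def observe(self, band, achieved_rate):
        # Exponential moving average tracks the non-stationary channel,
        # e.g. the unlicensed-band rate dropping when WiFi interference
        # causes LBT to block a scheduled transmission.
        self.value[band] += self.step * (achieved_rate - self.value[band])

agents = [UserAgent() for _ in range(8)]  # e.g. 8 UEs, each learning on its own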


IEEE Access ◽  
2020 ◽  
Vol 8 ◽  
pp. 67625-67634
Author(s):  
Yingqian Huang ◽  
Miao Cui ◽  
Guangchi Zhang ◽  
Wei Chen

2014 ◽  
Vol 10 (2) ◽  
pp. 173-196 ◽  
Author(s):  
M. Louta ◽  
P. Sarigiannidis ◽  
S. Misra ◽  
P. Nicopolitidis ◽  
G. Papadimitriou

WiMAX (Worldwide Interoperability for Microwave Access) is a candidate networking technology for realizing the 4G vision. By adopting the Orthogonal Frequency Division Multiple Access (OFDMA) technique, the latest IEEE 802.16x amendments manage to provide QoS-aware access services with full mobility support. A number of interesting scheduling and mapping schemes have been proposed in the research literature. However, they neglect a considerable asset of OFDMA-based wireless systems: the dynamic adjustment of the downlink-to-uplink width ratio. In order to fully exploit the supported mobile WiMAX features, we design, develop, and evaluate a rigorous adaptive model that inherits its main aspects from the reinforcement learning field. The proposed model aims to efficiently determine the downlink-to-uplink width ratio on a frame-by-frame basis, taking into account both the downlink and uplink traffic at the Base Station (BS). Extensive evaluation results indicate that the proposed model provides quite accurate estimations, keeping the average error rate below 15% with respect to the optimal sub-frame configurations. It also outperforms other learning methods (e.g., learning automata) and shows notable improvements over static schemes that maintain a fixed predefined ratio, in terms of service ratio and resource utilization.
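As a simple point of comparison, the sketch below shows a linear reward-inaction learning automaton, one of the baseline learning methods the abstract mentions, choosing among candidate downlink-to-uplink splits frame by frame. The ratio set and the reward definition are assumptions, and this is the baseline family rather than the paper's own model:

import random

RATIOS = [(35, 12), (29, 18), (26, 21), (23, 24)]  # assumed DL:UL symbol splits
probs = [1.0 / len(RATIOS)] * len(RATIOS)          # one probability per split

def pick_ratio():
    # Sample a sub-frame configuration for the coming frame.
    return random.choices(range(len(RATIOS)), weights=probs)[0]

def reinforce(choice, reward, rate=0.05):
    # reward in [0, 1], e.g. served traffic vs. queued DL+UL demand this
    # frame. Linear reward-inaction: shift probability mass toward the
    # chosen split in proportion to the reward; probabilities stay summed to 1.
    for i in range(len(probs)):
        if i == choice:
            probs[i] += rate * reward * (1.0 - probs[i])
        else:
            probs[i] -= rate * reward * probs[i]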

