scholarly journals Ensemble Consensus-based Representation Deep Reinforcement Learning for Hybrid FSO/RF Communication Systems

Author(s):  
Shagufta Henna

<div>Hybrid FSO/RF system requires an efficient FSO and RF link switching mechanism to improve the system capacity by realizing the complementary benefits of both the links. The dynamics of network conditions, such as fog, dust, and sand storms compound the link switching problem and control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching of hybrid FSO/RF systems. Specifically, in this work, we focus on actor-critic called Actor/Critic-FSO/RF and Deep-Q network (DQN) called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulences. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift this problem to ensemble consensus-based representation learning for deep reinforcement called DQNEnsemble-FSO/RF. The proposed novel DQNEnsemble-FSO/RF DRL approach uses consensus learned features representations based on an ensemble of asynchronous threads to update the deployed policy. Experimental results corroborate that the proposed DQNEnsemble-FSO/RF’s consensus learned features switching achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.</div>

2021 ◽  
Author(s):  
Shagufta Henna

<div>Hybrid FSO/RF system requires an efficient FSO and RF link switching mechanism to improve the system capacity by realizing the complementary benefits of both the links. The dynamics of network conditions, such as fog, dust, and sand storms compound the link switching problem and control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching of hybrid FSO/RF systems. Specifically, in this work, we focus on actor-critic called Actor/Critic-FSO/RF and Deep-Q network (DQN) called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulences. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift this problem to ensemble consensus-based representation learning for deep reinforcement called DQNEnsemble-FSO/RF. The proposed novel DQNEnsemble-FSO/RF DRL approach uses consensus learned features representations based on an ensemble of asynchronous threads to update the deployed policy. Experimental results corroborate that the proposed DQNEnsemble-FSO/RF’s consensus learned features switching achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.</div>


2021 ◽  
Vol 11 (1) ◽  
pp. 104-113
Author(s):  
Walead Kaled Seaman ◽  
Sırma Yavuz

Compared with traditional motion planners and deep reinforcement learning DRL has been applied more and more widely to achieving sequential behaviors control of movement robots in internal environment. There are two addressed issues of deep learning. The inability to generalize to achieve set of goals. The data inefficiency, that is, the model requires, many trial and error loops (often costly). Applied can impact a few key areas of medicine and explore how to build end-to-end systems. Our discussion of computer vision focuses largely on medical imaging. In this paper, we address these two issues and apply the proposed model to visual navigation in conformity with generalizing in conformity with obtaining new goals (target-driven). To tackle the first issue, we advise an actor-critic mannequin whose coverage is a feature of the intention as much properly namely the present day state, which approves higher generalization. To tackle the second issue, we advocate the 3D scenes in environment indoor simulation is AI2-THOR framework, who provides a surrounding including tremendous with high-quality 3D scenes and a physics engine. Our framework allows agents according to receive actions and have interaction with objects. Hence, we are able to accumulate an enormous number of training samples successfully with sequential decision making based totally on the RL framework. Particularly, Healthcare and medicine stand to benefit immensely from deep learning because of the sheer volume of data being generated we used the behavioral cloning approach, who enables the active agent to storeroom an expert (or mentor) policy except for the utilization of reward function stability or generalizes across targets.


2020 ◽  
Vol 34 (10) ◽  
pp. 13949-13950
Author(s):  
Wang Qisheng ◽  
Wang Qichao ◽  
Li Xiao

Exploration efficiency challenges for multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the interaction among agents. Less informative reward also restricts the learning speed of MARL in comparison with the informative label in supervised learning. This paper proposes a novel communication method which helps agents focus on different exploration subarea to guide MARL to accelerate exploration. We propose a predictive network to forecast the reward of current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to help agents better take advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.


2009 ◽  
Vol 129 (4) ◽  
pp. 363-367
Author(s):  
Tomoyuki Maeda ◽  
Makishi Nakayama ◽  
Hiroshi Narazaki ◽  
Akira Kitamura

Author(s):  
Michail Yu. Maslov ◽  
Yuri M. Spodobaev

Telecommunications industry evolution shows the highest rates of transition to high-tech systems and is accompanied by a trend of deep mutual penetration of technologies - convergence. The dominant telecommunication technologies have become wireless communication systems. The widespread use of modern wireless technologies has led to the saturation of the environment with technological electromagnetic fields and the actualization of the problems of protecting the population from them. This fundamental restructuring has led to a uniform dense placement of radiating fragments of network technologies in the mudflow areas. The changed parameters of the emitted fields became the reason for the revision of the regulatory and methodological support of electromagnetic safety. A fragmented structural, functional and parametric analysis of the problem of protecting the population from the technological fields of network technologies revealed uncertainty in the interpretation of real situations, vulnerability, weakness and groundlessness of the methodological basis of sanitary-hygienic approaches. It is shown that this applies to all stages of the electromagnetic examination of the emitting fragments of network technologies. Distrust arises on the part of specialists and the population in not only the system of sanitary-hygienic control, but also the safety of modern network technologies is being called into question. Growing social tensions and radio phobia are everywhere accompanying the development of wireless communication technologies. The basis for solving almost all problems of protecting the population can be the transfer of subjective methods and means of monitoring and sanitary-hygienic control of electromagnetic fields into the field of IT.


Author(s):  
Ivan Herreros

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.


2021 ◽  
Author(s):  
Stav Belogolovsky ◽  
Philip Korsunsky ◽  
Shie Mannor ◽  
Chen Tessler ◽  
Tom Zahavy

AbstractWe consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.


2021 ◽  
Author(s):  
Amarildo Likmeta ◽  
Alberto Maria Metelli ◽  
Giorgia Ramponi ◽  
Andrea Tirinzoni ◽  
Matteo Giuliani ◽  
...  

AbstractIn real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understand how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and we present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, and (2) inferring the user preferences in a social network (Twitter), and (3) the management of the water release in the Como Lake. For each of these scenarios, we provide formalization, experiments and a discussion to interpret the obtained results.


Minerals ◽  
2021 ◽  
Vol 11 (6) ◽  
pp. 587
Author(s):  
Joao Pedro de Carvalho ◽  
Roussos Dimitrakopoulos

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.


Nanophotonics ◽  
2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Wei Shi ◽  
Ye Tian ◽  
Antoine Gervais

AbstractThe tremendous growth of data traffic has spurred a rapid evolution of optical communications for a higher data transmission capacity. Next-generation fiber-optic communication systems will require dramatically increased complexity that cannot be obtained using discrete components. In this context, silicon photonics is quickly maturing. Capable of manipulating electrons and photons on the same platform, this disruptive technology promises to cram more complexity on a single chip, leading to orders-of-magnitude reduction of integrated photonic systems in size, energy, and cost. This paper provides a system perspective and reviews recent progress in silicon photonics probing all dimensions of light to scale the capacity of fiber-optic networks toward terabits-per-second per optical interface and petabits-per-second per transmission link. Firstly, we overview fundamentals and the evolving trends of silicon photonic fabrication process. Then, we focus on recent progress in silicon coherent optical transceivers. Further scaling the system capacity requires multiplexing techniques in all the dimensions of light: wavelength, polarization, and space, for which we have seen impressive demonstrations of on-chip functionalities such as polarization diversity circuits and wavelength- and space-division multiplexers. Despite these advances, large-scale silicon photonic integrated circuits incorporating a variety of active and passive functionalities still face considerable challenges, many of which will eventually be addressed as the technology continues evolving with the entire ecosystem at a fast pace.


Sign in / Sign up

Export Citation Format

Share Document