Ensemble Consensus-based Representation Deep Reinforcement Learning for Hybrid FSO/RF Communication Systems


10.36227/techrxiv.15109434.v1 ◽

2021 ◽

Author(s):

Shagufta Henna

Keyword(s):

Reinforcement Learning ◽

Communication Systems ◽

Representation Learning ◽

System Capacity ◽

Rf System ◽

State Action ◽

Rf Systems ◽

Reward Function ◽

And Control ◽

Learned Features

A hybrid FSO/RF system requires an efficient FSO and RF link switching mechanism to improve system capacity by realizing the complementary benefits of both links. Dynamic network conditions, such as fog, dust, and sand storms, compound the link switching problem and its control complexity. To address this problem, we initiate the study of deep reinforcement learning (DRL) for link switching in hybrid FSO/RF systems. Specifically, in this work, we focus on an actor-critic method called Actor/Critic-FSO/RF and a deep Q-network (DQN) method called DQN-FSO/RF for FSO/RF link switching under atmospheric turbulence. To formulate the problem, we define the state, action, and reward function of a hybrid FSO/RF system. DQN-FSO/RF frequently updates the deployed policy that interacts with the environment in a hybrid FSO/RF system, resulting in high switching costs. To overcome this, we lift the problem to ensemble consensus-based representation learning for deep reinforcement learning, called DQNEnsemble-FSO/RF. The proposed DQNEnsemble-FSO/RF DRL approach uses consensus-learned feature representations based on an ensemble of asynchronous threads to update the deployed policy. Experimental results corroborate that the consensus-learned feature switching of DQNEnsemble-FSO/RF achieves better performance than Actor/Critic-FSO/RF, DQN-FSO/RF, and MyOpic for FSO/RF link switching while keeping the switching cost significantly low.
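To make the idea of a state, action, and reward with a switching cost more concrete, the following is a minimal tabular Q-learning sketch. It is not the paper's formulation: the two-condition weather model, the capacity values, the switching cost, and all function names are hypothetical choices made for illustration, and the paper uses deep networks rather than a lookup table.

```python
import random

LINKS = ("FSO", "RF")        # actions: which link to use next
WEATHER = ("clear", "fog")   # hypothetical channel conditions
# Hypothetical capacities (arbitrary units) per (weather, link).
CAPACITY = {("clear", "FSO"): 10.0, ("clear", "RF"): 3.0,
            ("fog", "FSO"): 0.5, ("fog", "RF"): 3.0}
SWITCH_COST = 1.0            # hypothetical penalty for changing links


def reward(weather, current_link, chosen_link):
    """Capacity of the chosen link minus a cost if the link changes."""
    penalty = SWITCH_COST if chosen_link != current_link else 0.0
    return CAPACITY[(weather, chosen_link)] - penalty


def train(steps=20000, alpha=0.05, gamma=0.5, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning on the toy link-switching problem."""
    rng = random.Random(seed)
    # State = (weather, link currently in use).
    q = {((w, l), a): 0.0 for w in WEATHER for l in LINKS for a in LINKS}
    state = ("clear", "FSO")
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(LINKS)
        else:
            action = max(LINKS, key=lambda a: q[(state, a)])
        r = reward(state[0], state[1], action)
        # Assumed dynamics: weather persists with probability 0.9, else flips.
        if rng.random() < 0.9:
            next_weather = state[0]
        else:
            next_weather = WEATHER[1 - WEATHER.index(state[0])]
        next_state = (next_weather, action)
        best_next = max(q[(next_state, a)] for a in LINKS)
        q[(state, action)] += alpha * (r + gamma * best_next - q[(state, action)])
        state = next_state
    return q


def greedy_policy(q):
    """Best action per state according to the learned Q-table."""
    states = {key[0] for key in q}
    return {s: max(LINKS, key=lambda a: q[(s, a)]) for s in states}
```

Calling `greedy_policy(train())` returns a mapping from each (weather, current link) state to the preferred link. Raising `SWITCH_COST` in this toy model makes the learned policy more reluctant to change links, which is the trade-off the abstract refers to.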



Image Visual Sensor Used in Health-Care Navigation in Indoor Scenes Using Deep Reinforcement Learning (DRL) and Control Sensor Robot for Patients Data Health Information

Journal of Medical Imaging and Health Informatics ◽

10.1166/jmihi.2021.3283 ◽

2021 ◽

Vol 11 (1) ◽

pp. 104-113

Author(s):

Walead Kaled Seaman ◽

Sırma Yavuz

Keyword(s):

Deep Learning ◽

Reinforcement Learning ◽

Active Agent ◽

Visual Navigation ◽

Sequential Decision ◽

Visual Sensor ◽

Reward Function ◽

Training Samples ◽

Indoor Scenes ◽

And Control

Compared with traditional motion planners, deep reinforcement learning (DRL) has been applied more and more widely to sequential behavior control of mobile robots in indoor environments. Deep learning has two known issues here: the inability to generalize to a new set of goals, and data inefficiency, that is, the model requires many (often costly) trial-and-error loops. Deep learning can impact a few key areas of medicine, and we explore how to build end-to-end systems; our discussion of computer vision focuses largely on medical imaging. In this paper, we address these two issues and apply the proposed model to target-driven visual navigation, with the aim of generalizing to new goals. To tackle the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. To tackle the second issue, we use the AI2-THOR framework for indoor simulation, which provides an environment with high-quality 3D scenes and a physics engine. The framework allows agents to take actions and interact with objects, so we can collect a large number of training samples efficiently with sequential decision making based on the RL framework. Healthcare and medicine stand to benefit immensely from deep learning because of the sheer volume of data being generated. In particular, we used the behavioral cloning approach, which enables the active agent to reproduce an expert (or mentor) policy without relying on a reward function and to generalize across targets.


Optimal Exploration Algorithm of Multi-Agent Reinforcement Learning Methods (Student Abstract)

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i10.7247 ◽

2020 ◽

Vol 34 (10) ◽

pp. 13949-13950

Author(s):

Wang Qisheng ◽

Wang Qichao ◽

Li Xiao

Keyword(s):

Reinforcement Learning ◽

Supervised Learning ◽

Experimental Results ◽

State Action ◽

Reward Function ◽

Current State ◽

Learning Speed ◽

Communication Method ◽

Experience Replay ◽

Multi Agent

Exploration efficiency is a challenge for multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the interaction among agents. A less informative reward also restricts the learning speed of MARL compared with the informative labels of supervised learning. This paper proposes a novel communication method that helps agents focus on different exploration subareas, guiding MARL to accelerate exploration. We propose a predictive network to forecast the reward of the current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to help agents take better advantage of the different knowledge learned by different agents. Experimental results demonstrate that the proposed algorithm outperforms existing methods in cooperative multi-agent environments.
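For readers unfamiliar with prioritized experience replay, the sketch below shows the standard proportional variant, in which transition i is sampled with probability p_i^alpha / sum_k p_k^alpha. This is the baseline technique only, not the improved version the abstract mentions; the function names and the default `alpha` are assumptions made for illustration.

```python
import random


def sampling_probabilities(priorities, alpha=0.6):
    """Proportional prioritization: P(i) = p_i**alpha / sum_k p_k**alpha.

    alpha = 0 gives uniform sampling; alpha = 1 samples in direct
    proportion to the priorities (for example, absolute TD errors).
    """
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    return [s / total for s in scaled]


def sample_indices(priorities, batch_size, alpha=0.6, seed=None):
    """Draw replay-buffer indices according to the prioritized distribution."""
    rng = random.Random(seed)
    probs = sampling_probabilities(priorities, alpha)
    return rng.choices(range(len(priorities)), weights=probs, k=batch_size)
```

In a multi-agent setting, one possible use is to keep a shared buffer and let each agent draw batches with `sample_indices`, so that transitions with high priority, whichever agent produced them, are replayed more often.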


Modeling and Control for Plant Dynamics Based on Reinforcement Learning

IEEJ Transactions on Industry Applications ◽

10.1541/ieejias.129.363 ◽

2009 ◽

Vol 129 (4) ◽

pp. 363-367

Author(s):

Tomoyuki Maeda ◽

Makishi Nakayama ◽

Hiroshi Narazaki ◽

Akira Kitamura

Keyword(s):

Reinforcement Learning ◽

Modeling And Control ◽

Plant Dynamics ◽

And Control


Convergence in norming and control of modern wireless technologies electromagnetic fields

Russian Journal of Occupational Health and Industrial Ecology ◽

10.31089/1026-9428-2020-60-9-610-613 ◽

2020 ◽

Vol 60 (9) ◽

pp. 610-613

Author(s):

Michail Yu. Maslov ◽

Yuri M. Spodobaev

Keyword(s):

Wireless Communication ◽

Electromagnetic Fields ◽

Communication Systems ◽

Communication Technologies ◽

Telecommunications Industry ◽

High Tech ◽

Wireless Technologies ◽

Almost All ◽

Network Technologies ◽

And Control

The evolution of the telecommunications industry shows the highest rates of transition to high-tech systems and is accompanied by a trend of deep mutual penetration of technologies, that is, convergence. Wireless communication systems have become the dominant telecommunication technologies. The widespread use of modern wireless technologies has saturated the environment with technological electromagnetic fields and made the problem of protecting the population from them more pressing. This fundamental restructuring has led to a uniform, dense placement of radiating fragments of network technologies in residential areas. The changed parameters of the emitted fields prompted a revision of the regulatory and methodological support of electromagnetic safety. A fragmented structural, functional, and parametric analysis of the problem of protecting the population from the technological fields of network technologies revealed uncertainty in the interpretation of real situations, as well as the vulnerability, weakness, and groundlessness of the methodological basis of sanitary-hygienic approaches. It is shown that this applies to all stages of the electromagnetic examination of the emitting fragments of network technologies. Specialists and the population distrust not only the system of sanitary-hygienic control; the safety of modern network technologies itself is being called into question. Growing social tension and radiophobia accompany the development of wireless communication technologies everywhere. The basis for solving almost all problems of protecting the population can be the transfer of subjective methods and means of monitoring and sanitary-hygienic control of electromagnetic fields into the field of IT.


Learning and control

10.1093/oso/9780199674923.003.0026 ◽

2018 ◽

Author(s):

Ivan Herreros

Keyword(s):

Machine Learning ◽

Reinforcement Learning ◽

Brain Function ◽

Control Strategies ◽

Learning Problems ◽

Animal Learning ◽

Feed Forward Control ◽

Machine Learning Applications ◽

And Control

This chapter discusses basic concepts from control theory and machine learning to facilitate a formal understanding of animal learning and motor control. It first distinguishes between feedback and feed-forward control strategies, and later introduces the classification of machine learning applications into supervised, unsupervised, and reinforcement learning problems. Next, it links these concepts with their counterparts in the domain of the psychology of animal learning, highlighting the analogies between supervised learning and classical conditioning, reinforcement learning and operant conditioning, and between unsupervised and perceptual learning. Additionally, it interprets innate and acquired actions from the standpoint of feedback vs anticipatory and adaptive control. Finally, it argues how this framework of translating knowledge between formal and biological disciplines can serve us to not only structure and advance our understanding of brain function but also enrich engineering solutions at the level of robot learning and control with insights coming from biology.
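The distinction between feedback and feed-forward control can be illustrated with a toy simulation. The plant below (`x` moves by the control input plus a constant disturbance at each step), the gain, and the function names are all assumptions made for this sketch; none of them come from the chapter.

```python
def simulate(controller, steps=50, target=1.0, x0=1.0, disturbance=-0.2):
    """Run the toy plant x <- x + u + disturbance and return the final error."""
    x = x0
    for _ in range(steps):
        u = controller(target, x)
        x = x + u + disturbance
    return target - x


def feedback(kp=0.5):
    """Proportional feedback: reacts to the error it currently observes."""
    return lambda target, x: kp * (target - x)


def feed_forward(disturbance_estimate):
    """Feed-forward: cancels the predicted disturbance and ignores the error."""
    return lambda target, x: -disturbance_estimate
```

With these settings, `simulate(feedback())` settles at a residual error near 0.4, because pure proportional feedback only acts once an error exists. `simulate(feed_forward(-0.2))` keeps the error near zero because its disturbance estimate is exact, while `simulate(feed_forward(-0.1))` drifts steadily away from the target because nothing corrects the mismatch between the estimate and the real disturbance. That sensitivity to the accuracy of the internal prediction is one way to see why anticipatory control is often paired with learning or adaptation.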


Inverse reinforcement learning in contextual MDPs

Machine Learning ◽

10.1007/s10994-021-05984-x ◽

2021 ◽

Author(s):

Stav Belogolovsky ◽

Philip Korsunsky ◽

Shie Mannor ◽

Chen Tessler ◽

Tom Zahavy

Keyword(s):

Reinforcement Learning ◽

Optimization Problem ◽

Decision Processes ◽

Inverse Reinforcement Learning ◽

Convex Optimization Problem ◽

Reward Function ◽

Dynamic Treatment Regime ◽

Markov Decision ◽

Dynamic Treatment ◽

Recorded Data

We consider the task of Inverse Reinforcement Learning in Contextual Markov Decision Processes (MDPs). In this setting, contexts, which define the reward and transition kernel, are sampled from a distribution. In addition, although the reward is a function of the context, it is not provided to the agent. Instead, the agent observes demonstrations from an optimal policy. The goal is to learn the reward mapping, such that the agent will act optimally even when encountering previously unseen contexts, also known as zero-shot transfer. We formulate this problem as a non-differential convex optimization problem and propose a novel algorithm to compute its subgradients. Based on this scheme, we analyze several methods both theoretically, where we compare the sample complexity and scalability, and empirically. Most importantly, we show both theoretically and empirically that our algorithms perform zero-shot transfer (generalize to new and unseen contexts). Specifically, we present empirical experiments in a dynamic treatment regime, where the goal is to learn a reward function which explains the behavior of expert physicians based on recorded data of them treating patients diagnosed with sepsis.


Dealing with multiple experts and non-stationarity in inverse reinforcement learning: an application to real-life problems

Machine Learning ◽

10.1007/s10994-020-05939-8 ◽

2021 ◽

Author(s):

Amarildo Likmeta ◽

Alberto Maria Metelli ◽

Giorgia Ramponi ◽

Andrea Tirinzoni ◽

Matteo Giuliani ◽

...

Keyword(s):

Reinforcement Learning ◽

Real World ◽

Real Life ◽

User Preferences ◽

Inverse Reinforcement Learning ◽

Water Release ◽

Reward Function ◽

Model Free ◽

Conflicting Objectives ◽

Multiple Experts

In real-world applications, inferring the intentions of expert agents (e.g., human operators) can be fundamental to understanding how possibly conflicting objectives are managed, helping to interpret the demonstrated behavior. In this paper, we discuss how inverse reinforcement learning (IRL) can be employed to retrieve the reward function implicitly optimized by expert agents acting in real applications. Scaling IRL to real-world cases has proved challenging, as typically only a fixed dataset of demonstrations is available and further interactions with the environment are not allowed. For this reason, we resort to a class of truly batch model-free IRL algorithms and present three application scenarios: (1) the high-level decision-making problem in the highway driving scenario, (2) inferring user preferences in a social network (Twitter), and (3) the management of water release from Lake Como. For each of these scenarios, we provide a formalization, experiments, and a discussion to interpret the obtained results.


Integrating Production Planning with Truck-Dispatching Decisions through Reinforcement Learning While Managing Uncertainty

Minerals ◽

10.3390/min11060587 ◽

2021 ◽

Vol 11 (6) ◽

pp. 587

Author(s):

Joao Pedro de Carvalho ◽

Roussos Dimitrakopoulos

Keyword(s):

Reinforcement Learning ◽

Discrete Event ◽

Mining Operations ◽

Fixed Sequence ◽

Q Learning ◽

Reward Function ◽

Copper Gold ◽

Mining Complex ◽

Learning Reinforcement ◽

Operational Plan

This paper presents a new truck dispatching policy approach that is adaptive given different mining complex configurations in order to deliver supply material extracted by the shovels to the processors. The method aims to improve adherence to the operational plan and fleet utilization in a mining complex context. Several sources of operational uncertainty arising from the loading, hauling and dumping activities can influence the dispatching strategy. Given a fixed sequence of extraction of the mining blocks provided by the short-term plan, a discrete event simulator model emulates the interaction arising from these mining operations. The continuous repetition of this simulator and a reward function, associating a score value to each dispatching decision, generate sample experiences to train a deep Q-learning reinforcement learning model. The model learns from past dispatching experience, such that when a new task is required, a well-informed decision can be quickly taken. The approach is tested at a copper–gold mining complex, characterized by uncertainties in equipment performance and geological attributes, and the results show improvements in terms of production targets, metal production, and fleet management.


Scaling capacity of fiber-optic transmission systems via silicon photonics

Nanophotonics ◽

10.1515/nanoph-2020-0309 ◽

2020 ◽

Vol 0 (0) ◽

Author(s):

Wei Shi ◽

Ye Tian ◽

Antoine Gervais

Keyword(s):

Silicon Photonics ◽

Communication Systems ◽

Large Scale ◽

Recent Progress ◽

Fiber Optic ◽

Rapid Evolution ◽

System Capacity ◽

Disruptive Technology ◽

System Perspective ◽

Silicon Photonic

The tremendous growth of data traffic has spurred a rapid evolution of optical communications for a higher data transmission capacity. Next-generation fiber-optic communication systems will require dramatically increased complexity that cannot be obtained using discrete components. In this context, silicon photonics is quickly maturing. Capable of manipulating electrons and photons on the same platform, this disruptive technology promises to cram more complexity on a single chip, leading to orders-of-magnitude reduction of integrated photonic systems in size, energy, and cost. This paper provides a system perspective and reviews recent progress in silicon photonics probing all dimensions of light to scale the capacity of fiber-optic networks toward terabits-per-second per optical interface and petabits-per-second per transmission link. Firstly, we overview fundamentals and the evolving trends of silicon photonic fabrication process. Then, we focus on recent progress in silicon coherent optical transceivers. Further scaling the system capacity requires multiplexing techniques in all the dimensions of light: wavelength, polarization, and space, for which we have seen impressive demonstrations of on-chip functionalities such as polarization diversity circuits and wavelength- and space-division multiplexers. Despite these advances, large-scale silicon photonic integrated circuits incorporating a variety of active and passive functionalities still face considerable challenges, many of which will eventually be addressed as the technology continues evolving with the entire ecosystem at a fast pace.
