Feature Reinforcement Learning: Part II. Structured MDPs

2021 ◽  
Vol 12 (1) ◽  
pp. 71-86
Author(s):  
Marcus Hutter

The Feature Markov Decision Process (ΦMDP) model developed in Part I (Hutter, 2009b) is well-suited for learning agents in general environments. Nevertheless, unstructured (Φ)MDPs are limited to relatively simple environments. Structured MDPs like Dynamic Bayesian Networks (DBNs) are used for large-scale real-world problems. In this article I extend ΦMDP to ΦDBN. The primary contribution is to derive a cost criterion that allows the most relevant features to be extracted automatically from the environment, leading to the “best” DBN representation. I discuss all building blocks required for a complete general learning algorithm, and compare the novel ΦDBN model to the prevalent POMDP approach.

2010 ◽  
Vol 44-47 ◽  
pp. 3611-3615 ◽  
Author(s):  
Zhi Cong Zhang ◽  
Kai Shun Hu ◽  
Hui Yu Huang ◽  
Shuai Li ◽  
Shao Yong Zhao

Reinforcement learning (RL) is a machine learning method based on state or action values that approximately solves large-scale Markov Decision Processes (MDPs) or Semi-Markov Decision Processes (SMDPs). A multi-step RL algorithm called Sarsa(λ,k) is proposed, which is a compromise between Sarsa and Sarsa(λ). It is equivalent to Sarsa if k is 1 and to Sarsa(λ) if k is infinite. Sarsa(λ,k) adjusts its performance through the value of k. Two forms of Sarsa(λ,k), forward-view Sarsa(λ,k) and backward-view Sarsa(λ,k), are constructed and proved equivalent under off-line updating.
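The abstract does not give the update rule, so the following is only a sketch of one plausible forward-view target: a λ-return truncated after k steps. The weighting scheme, the function names, and the data layout (`rewards[i]` is the reward after step i, `q_values[i]` is the current estimate of Q(s_i, a_i), the episode ends after the last reward) are assumptions made for illustration, not the authors' definitions. The sketch does reproduce the two limiting cases the abstract states: the one-step Sarsa target for k = 1 and the full λ-return once k reaches the end of the episode.

```python
def n_step_return(rewards, q_values, t, n, gamma):
    """n-step Sarsa return from time t (assumed layout, see lead-in)."""
    T = len(rewards)                  # episode terminates after T steps
    steps = min(n, T - t)             # never look past the end of the episode
    g = sum(gamma ** i * rewards[t + i] for i in range(steps))
    if t + n < T:                     # bootstrap only from a non-terminal step
        g += gamma ** n * q_values[t + n]
    return g


def truncated_lambda_return(rewards, q_values, t, k, lam, gamma):
    """Lambda-weighted mix of the 1..k step returns; weights sum to 1."""
    g = 0.0
    for n in range(1, k):
        g += (1 - lam) * lam ** (n - 1) * n_step_return(rewards, q_values, t, n, gamma)
    # The remaining weight lam**(k-1) goes to the longest (k-step) return.
    g += lam ** (k - 1) * n_step_return(rewards, q_values, t, k, gamma)
    return g


# Hypothetical three-step episode.
rewards = [1.0, 2.0, 3.0]
q_values = [0.5, 1.0, 1.5]
one_step = truncated_lambda_return(rewards, q_values, t=0, k=1, lam=0.8, gamma=0.9)
```

With k = 1 the loop body never runs and the target is simply r_t + γ·Q(s_{t+1}, a_{t+1}); intermediate values of k trade off between that and the full λ-return.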


Author(s):  
Du Zhang ◽  
Meiliu Lu

One of the long-term research goals in machine learning is how to build never-ending learners. The state of the practice in the field of machine learning thus far is still dominated by the one-time learner paradigm: some learning algorithm is applied to data sets to produce a certain model or target function, and then the learner is put away and the model or function is put to work. Such a learn-once-apply-next (or LOAN) approach may not be adequate for dealing with many real-world problems and is in sharp contrast with the human lifelong learning process. On the other hand, learning can often be brought about by overcoming inconsistent circumstances. This paper proposes a framework for perpetual learning agents that are capable of continuously refining or augmenting their knowledge by overcoming inconsistencies encountered during their problem-solving episodes. The never-ending nature of a perpetual learning agent is embodied in the framework as the agent’s continuous inconsistency-induced belief revision process. The framework hinges on the agents recognizing inconsistency in data, information, knowledge, or meta-knowledge, identifying the cause of the inconsistency, and revising or augmenting beliefs to explain, resolve, or accommodate it. The authors believe that inconsistency can serve as one of the important learning stimuli toward building perpetual learning agents that incrementally improve their performance over time.


2021 ◽  
Vol 2021 ◽  
pp. 1-19
Author(s):  
Ning Wang ◽  
Jiahui Guo

The fusion of electricity, automation, and sharing is forming a new Autonomous Mobility-on-Demand (AMoD) system in current urban transportation, in which Shared Autonomous Electric Vehicles (SAEVs) form a fleet that executes delivery, parking, recharging, and repositioning tasks automatically. To model the decision-making process of the AMoD system and optimize multiaction dynamic dispatching of SAEVs over a long horizon, the dispatching problem of SAEVs is first modeled as a Markov Decision Process (MDP). Then two optimization models, one short-sighted and one farsighted, are built on combinatorial optimization theory. The former focuses on the instant, single-step reward, while the latter aims at the accumulative, multistep return. After that, the Kuhn–Munkres algorithm is set as the baseline method to solve the first model and obtain optimal multiaction allocation instructions for SAEVs, and a combination of the deep Q-learning algorithm and the Kuhn–Munkres algorithm is designed to solve the second model and realize global optimization. Finally, a toy example, a macrosimulation of 1 month, and a microsimulation of 6 hours based on actual historical operation data are conducted. Results show that (1) the Kuhn–Munkres algorithm ensures computational effectiveness in the large-scale real-time application of the AMoD system; (2) the second optimization model, which considers long-term return, can decrease average user waiting time and achieve a 2.78% increase in total revenue compared with the first model; and (3) integrating combinatorial optimization theory with reinforcement learning theory is a perfect package for solving the multiaction dynamic dispatching problem of SAEVs.
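As an illustration of the assignment step, here is a minimal sketch of the Kuhn–Munkres (Hungarian) algorithm in its standard O(n²m) potential-based form. It is not the authors' implementation: the function name, the cost matrix, and the reading of rows as vehicles and columns as tasks are hypothetical, and the sketch minimizes cost, whereas a reward-maximizing model would negate the entries first.

```python
def kuhn_munkres(cost):
    """Minimum-cost assignment of each row to a distinct column.

    Assumes len(cost) <= len(cost[0]). Returns (assignment, total), where
    assignment[i] is the column chosen for row i.
    """
    n, m = len(cost), len(cost[0])
    INF = float("inf")
    u = [0] * (n + 1)        # row potentials (1-indexed)
    v = [0] * (m + 1)        # column potentials (1-indexed)
    p = [0] * (m + 1)        # p[j] = row currently matched to column j
    way = [0] * (m + 1)      # previous column on the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]  # reduced cost
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):   # shift potentials by the smallest slack
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:           # reached a free column: path is complete
                break
        while j0 != 0:               # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    total = sum(cost[i][assignment[i]] for i in range(n))
    return assignment, total


# Hypothetical costs: rows are vehicles, columns are tasks.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment, total = kuhn_munkres(cost)
```

For this small matrix the cheapest assignment sends vehicle 0 to task 1, vehicle 1 to task 0, and vehicle 2 to task 2, for a total cost of 5, which can be checked by enumerating all six permutations.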


2019 ◽  
Vol 34 ◽  
Author(s):  
Alper Demir ◽  
Erkin Çilden ◽  
Faruk Polat

In the reinforcement learning context, a landmark is a compact piece of information that is uniquely coupled with a state, for problems with hidden states. Landmarks have been shown to support finding good memoryless policies for Partially Observable Markov Decision Processes (POMDPs) that contain at least one landmark. SarsaLandmark, an adaptation of Sarsa(λ), is known to promise better learning performance under the assumption that all landmarks of the problem are known in advance. In this paper, we propose a framework built upon SarsaLandmark that is able to identify landmarks automatically during learning, without sacrificing quality and without requiring prior information about the problem structure. For this purpose, the framework fuses SarsaLandmark with a well-known multiple-instance learning algorithm, namely Diverse Density (DD). Through further experimentation, we also provide deeper insight into our concept filtering heuristic for accelerating DD, abbreviated as DDCF (Diverse Density with Concept Filtering), which proves to be suitable for POMDPs with landmarks. DDCF outperforms its antecedent in terms of computation speed and solution quality without loss of generality. The methods are empirically shown to be effective via extensive experimentation on a number of known and newly introduced problems with hidden state, and the results are discussed.


2019 ◽  
Author(s):  
Mingguang Chen ◽  
Wangxiang Li ◽  
Anshuman Kumar ◽  
Guanghui Li ◽  
Mikhail Itkis ◽  
...  

Interconnecting the surfaces of nanomaterials without compromising their outstanding mechanical, thermal, and electronic properties is critical in the design of advanced bulk structures that still preserve the novel properties of their nanoscale constituents. As such, bridging the π-conjugated carbon surfaces of single-walled carbon nanotubes (SWNTs) has special implications in next-generation electronics. This study presents a rational path towards improvement of the electrical transport in aligned semiconducting SWNT films by deposition of metal atoms. The formation of conducting Cr-mediated pathways between the parallel SWNTs increases the transverse (intertube) conductance, while having negligible effect on the parallel (intratube) transport. In contrast, doping with Li has a predominant effect on the intratube electrical transport of aligned SWNT films. Large-scale first-principles calculations of electrical transport on aligned SWNTs show good agreement with the experimental electrical measurements and provide insight into the changes that different metal atoms exert on the density of states near the Fermi level of the SWNTs and the formation of transport channels.


IoT ◽  
2021 ◽  
Vol 2 (1) ◽  
pp. 140-162
Author(s):  
Hung Nguyen-An ◽  
Thomas Silverston ◽  
Taku Yamazaki ◽  
Takumi Miyoshi

We now use the Internet of Things (IoT) in our everyday lives. Novel IoT devices collect cyber–physical data and provide information on the environment. Hence, IoT traffic will account for a major part of Internet traffic; however, its impact on the network is still largely unknown. IoT devices are prone to cyberattacks because of constrained resources or misconfigurations. It is essential to characterize IoT traffic and identify each device in order to monitor the IoT network and discriminate between legitimate and anomalous IoT traffic. In this study, we deployed a smart-home testbed comprising several IoT devices to study IoT traffic. We performed extensive measurement experiments using a novel IoT traffic generator tool called IoTTGen. This tool can generate traffic from multiple devices, emulating large-scale scenarios with different devices under different network conditions. We analyzed the IoT traffic properties by computing the entropy value of traffic parameters and visually observing the traffic on behavior shape graphs. We propose a new entropy-based method for identifying devices from their traffic, which computes the entropy values of traffic features and relies on machine learning to classify the traffic. The proposed method succeeded in identifying devices with an accuracy of up to 94% and is robust to unpredictable network behavior with traffic anomalies spreading in the network.
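To make the entropy computation concrete, the sketch below computes the Shannon entropy of one traffic feature over a window of packets. The choice of feature (destination port), the sample values, and the function name are hypothetical; the abstract does not say which features or windowing the authors use.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical destination ports seen in one observation window.
ports = [443, 443, 80, 80]
entropy_bits = shannon_entropy(ports)   # two equally likely values: 1 bit
```

A device that always contacts the same port yields an entropy of 0 for this feature, while one that spreads traffic across many ports yields a higher value; a vector of such per-feature entropies is the kind of input a classifier could be trained on.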


Healthcare ◽  
2021 ◽  
Vol 9 (2) ◽  
pp. 126
Author(s):  
Hai-Feng Ling ◽  
Zheng-Lian Su ◽  
Xun-Lin Jiang ◽  
Yu-Jun Zheng

In a large-scale epidemic, such as the novel coronavirus pneumonia (COVID-19), there is huge demand for a variety of medical supplies, such as medical masks, ventilators, and sickbeds. Resources from civilian medical services are often not sufficient for fully satisfying all of these demands. Resources from military medical services, which are normally reserved for military use, can be an effective supplement to these demands. In this paper, we formulate a problem of integrated civilian-military scheduling of medical supplies for epidemic prevention and control, the aim of which is to simultaneously maximize the overall satisfaction rate of the medical supplies and minimize the total scheduling cost, while keeping a minimum ratio of medical supplies reservation for military use. We propose a multi-objective water wave optimization (WWO) algorithm in order to efficiently solve this problem. Computational results on a set of problem instances constructed based on real COVID-19 data demonstrate the effectiveness of the proposed method.


Author(s):  
Anna Lavecchia ◽  
Matteo Chiara ◽  
Caterina De Virgilio ◽  
Caterina Manzari ◽  
Carlo Pazzani ◽  
...  

Staphylococcus cohnii (SC), a coagulase-negative bacterium, was first isolated in 1975 from human skin. Early phenotypic analyses led to the delineation of two subspecies (subsp.), Staphylococcus cohnii subsp. cohnii (SCC) and Staphylococcus cohnii subsp. urealyticus (SCU). SCC was considered to be specific to humans whereas SCU apparently demonstrated a wider host range, from lower primates to humans. The type strains ATCC 29974 and ATCC 49330 have been designated for SCC and SCU, respectively. Comparative analysis of 66 complete genome sequences (including a novel SC isolate) revealed unexpected patterns within the SC complex, both in terms of genomic sequence identity and gene content, highlighting the presence of 3 phylogenetically distinct groups. Based on our observations, and on the current guidelines for taxonomic classification for bacterial species, we propose a revision of the SC species complex. We suggest that SCC and SCU should be regarded as two distinct species: SC and SU (Staphylococcus urealyticus), and that two distinct subspecies, SCC and SCB (SC subsp. barensis, represented by the novel strain isolated in Bari) should be recognized within SC. Furthermore, since large-scale comparative genomics studies recurrently suggest inconsistencies or conflicts in taxonomic assignments of bacterial species, we believe that the approach proposed here might be considered for more general application.


2021 ◽  
Vol 40 (5) ◽  
pp. 10043-10061
Author(s):  
Xiaoping Shi ◽  
Shiqi Zou ◽  
Shenmin Song ◽  
Rui Guo

The asset-based weapon target assignment (ABWTA) problem is one of the important branches of the weapon target assignment (WTA) problem. In the current large-scale battlefield environment, the ABWTA problem is a multi-objective optimization problem (MOP) with strong constraints and large-scale, sparse properties. A novel model of the ABWTA problem with an operation error parameter is established. An evolutionary algorithm for large-scale sparse problems (SparseEA) is introduced as the main framework for solving the large-scale sparse ABWTA problem. In the proposed framework (SparseEA-ABWTA), a problem-specific initialization method and genetic operators with a reward strategy generate solutions efficiently by exploiting the sparsity of the variables, and an improved non-dominated solution selection method is presented to handle the constraints. With large-scale cases constructed by a specific case generator, two numerical experiments against four prominent multi-objective evolutionary algorithms (MOEAs) show that the runtime of SparseEA-ABWTA is nearly 50% faster than the others at the same level of convergence, and that the gap between SparseEA-ABWTA and the MOEAs improved with its mechanism is reduced to nearly 20% in convergence and distribution.
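For readers unfamiliar with non-dominated selection, the sketch below shows the plain, unconstrained version of the idea: keep every candidate that no other candidate beats on all objectives. It is not the improved, constraint-handling selection method the abstract refers to; the function names and the objective values are hypothetical, and both objectives are assumed to be minimized.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that no other point dominates (the Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (objective_1, objective_2) values for four candidate solutions.
candidates = [(1, 5), (2, 3), (3, 4), (4, 1)]
front = non_dominated(candidates)   # (3, 4) is dominated by (2, 3)
```

This pairwise check costs O(n²) comparisons per call, which is acceptable for a sketch; evolutionary frameworks typically use faster sorting schemes for large populations.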

