Independent Skill Transfer for Deep Reinforcement Learning

Author(s):  
Qiangxing Tian ◽  
Guanchu Wang ◽  
Jinxin Liu ◽  
Donglin Wang ◽  
Yachen Kang

Recently, diverse primitive skills have been learned by adopting entropy as an intrinsic reward, and it has further been shown that new practical skills can be produced by combining a variety of primitive skills. This is essentially skill transfer: very useful for learning high-level skills, but quite challenging due to the low efficiency of transferring primitive skills. In this paper, we propose a novel, efficient skill transfer method in which we learn independent skills and transfer only the independent components of skills instead of the whole set. More concretely, the independent components of skills are obtained through independent component analysis (ICA), and there are always fewer of them (i.e., they have lower dimension) than their observed mixtures. With this lower dimension, independent skill transfer (IST) learns a given task more efficiently. Extensive experiments on three robotic tasks demonstrate the effectiveness and high efficiency of the proposed IST method in comparison to direct primitive-skill transfer and conventional reinforcement learning.
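
The abstract does not include an implementation, but the core ICA step can be sketched in a few lines. The snippet below is a minimal illustration assuming each primitive skill can be summarized as a latent vector; scikit-learn's FastICA stands in for whatever ICA variant the authors actually use, and all dimensions are invented for the example.

```python
# Minimal sketch of the ICA step behind independent skill transfer (IST).
# Assumption: each learned primitive skill is summarized by a latent vector
# (rows of `skill_matrix`); the paper's actual skill representation may differ.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# Toy setup: 8 observed "primitive skills", each a mixture of 3 independent sources.
sources = rng.laplace(size=(200, 3))   # independent components (non-Gaussian)
mixing = rng.normal(size=(3, 8))       # unknown mixing matrix
skill_matrix = sources @ mixing        # observed skills: 200 samples x 8 dims

# ICA recovers a lower-dimensional set of independent components:
# transferring these 3 components is cheaper than transferring all 8 mixtures.
ica = FastICA(n_components=3, random_state=0)
independent_skills = ica.fit_transform(skill_matrix)   # shape (200, 3)
print(independent_skills.shape)
```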

Author(s):  
Nicolas Bougie ◽  
Ryutaro Ichise

Deep reinforcement learning (DRL) methods traditionally struggle with tasks where environment rewards are sparse or delayed, which means that exploration remains one of the key challenges of DRL. Instead of relying solely on extrinsic rewards, many state-of-the-art methods use intrinsic curiosity as an exploration signal. While these methods hold the promise of better local exploration, discovering global exploration strategies is beyond their reach. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our curiosity signal is driven by a fast reward that handles local exploration and a slow reward that incentivizes long-horizon exploration strategies. We formulate curiosity as the error in an agent's ability to reconstruct observations given their contexts. Experimental results show that this high-level exploration enables our agents to outperform prior work in several Atari games.
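
To make the fast/slow decomposition concrete, here is a hedged PyTorch sketch. The two reconstruction models, the dimensions, and the beta weighting are invented for the example and are not the authors' architecture; only the general idea (curiosity as reconstruction error of an observation given its context, with one fast-adapting and one slow-adapting model) comes from the abstract.

```python
# Illustrative sketch of a combined fast/slow intrinsic reward (not the
# authors' exact architecture). `fast_model` reconstructs the current
# observation from a short context; `slow_model` is updated less often,
# so its error decays slowly and keeps rewarding long-horizon novelty.
import torch
import torch.nn as nn

obs_dim, ctx_dim = 64, 128

fast_model = nn.Sequential(nn.Linear(ctx_dim, 256), nn.ReLU(), nn.Linear(256, obs_dim))
slow_model = nn.Sequential(nn.Linear(ctx_dim, 256), nn.ReLU(), nn.Linear(256, obs_dim))

def intrinsic_reward(obs, ctx, beta=0.5):
    """Curiosity = error in reconstructing obs from its context."""
    with torch.no_grad():
        fast_err = (fast_model(ctx) - obs).pow(2).mean()   # local novelty
        slow_err = (slow_model(ctx) - obs).pow(2).mean()   # long-horizon novelty
    return fast_err + beta * slow_err

# During training, fast_model would be updated every step and slow_model
# only every k steps (or with a smaller learning rate), e.g.:
#   if step % k == 0: update(slow_model)
```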


Author(s):  
Khoo Zhi Yion ◽  
Ab Al-Hadi Ab Rahman

This paper presents a design space exploration of hardware-based inverse fixed-point integer transforms for High Efficiency Video Coding (HEVC). The designs are specified at a high level in the CAL dataflow language and automatically synthesized to HDL for FPGA implementation. Several parallel design alternatives are proposed, with trade-offs between performance and resource usage. The HEVC transform consists of several independent components, from the 4x4 to 32x32 discrete cosine transforms to the 4x4 discrete sine transform, and this work explores strategies for computing them efficiently by applying data parallelism across the different components. Results show that an intermediate degree of parallelism, in which the 4x4 and 8x8 transforms are merged together and the 16x16 and 32x32 transforms are merged together, gives the best trade-off between performance and resource usage. The results also give insight into how the HEVC transform can be designed efficiently in parallel for hardware implementation.
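
For readers unfamiliar with the transform components being parallelized, the following is a minimal bit-accurate Python model of one of them, the 4x4 inverse DST stage. The basis matrix and shift amounts follow the HEVC specification for 8-bit video; the paper's actual designs are dataflow actors in CAL synthesized to HDL, so this software sketch only illustrates the arithmetic.

```python
# Bit-accurate sketch of the HEVC 4x4 inverse DST (one of the independent
# transform components discussed above), in plain Python rather than CAL/HDL.
# Shift values assume 8-bit video; hardware versions pipeline the two stages.
import numpy as np

DST4 = np.array([[29,  55,  74,  84],
                 [74,  74,   0, -74],
                 [84, -29, -74,  55],
                 [55, -84,  74, -29]], dtype=np.int64)

def inverse_dst4(coeffs):
    """coeffs: 4x4 integer array of transform coefficients."""
    # Stage 1 (columns): multiply by the transposed matrix, round, shift by 7.
    tmp = (DST4.T @ coeffs + 64) >> 7
    tmp = np.clip(tmp, -32768, 32767)        # 16-bit clip between stages
    # Stage 2 (rows): same matrix on the other side, round, shift by 12.
    return (tmp @ DST4 + 2048) >> 12

print(inverse_dst4(np.zeros((4, 4), dtype=np.int64)))
```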


Author(s):  
Shihong Song ◽  
Jiayi Weng ◽  
Hang Su ◽  
Dong Yan ◽  
Haosheng Zou ◽  
...  

Learning rational behaviors in first-person shooter (FPS) games is a challenging task for reinforcement learning (RL), the primary difficulties being the huge action space and insufficient exploration. To address this, we propose a hierarchical agent based on combined options with intrinsic rewards to drive exploration. Specifically, we present a hierarchical model that works in a manager-worker fashion over two levels of hierarchy: the high-level manager learns a policy over options, and the low-level workers, motivated by intrinsic reward, learn to execute the options. Performance is further improved by appropriately harnessing environmental signals. Extensive experiments demonstrate that our trained bot significantly outperforms alternative RL-based models on FPS games requiring maze solving, combat skills, etc. Notably, we achieved first place in VDAIC 2018 Track 1.
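
The manager-worker control flow described above can be sketched as follows. Option names, the policies, and the step limits are placeholders invented for illustration, not the authors' trained networks; only the two-level structure (manager picks an option, worker executes it until termination) comes from the abstract.

```python
# Schematic two-level option hierarchy: the manager picks an option; a worker
# executes it until its termination condition fires or a step limit is hit.
import random

OPTIONS = ["navigate", "attack", "collect"]   # hypothetical option set

def manager_policy(state):
    # Placeholder for the learned high-level policy over options.
    return random.choice(OPTIONS)

def worker_step(option, state):
    # Placeholder for the option's low-level policy; returns (action, done).
    return "noop", random.random() < 0.1

def run_episode(env_step, init_state, max_option_steps=50, max_options=100):
    state = init_state
    for _ in range(max_options):              # high level: one option per iteration
        option = manager_policy(state)
        for _ in range(max_option_steps):     # low level: execute the option
            action, terminated = worker_step(option, state)
            state, env_done = env_step(state, action)
            if terminated or env_done:
                break
        if env_done:
            break
    return state
```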


2020 ◽  
Vol 10 (10) ◽  
pp. 3356 ◽  
Author(s):  
Jose J. Valero-Mas ◽  
Francisco J. Castellanos

Within the pattern recognition field, two representations are generally considered for encoding data: statistical codifications, which describe elements as feature vectors, and structural representations, which encode elements as high-level symbolic data structures such as strings, trees, or graphs. While the vast majority of classifiers can address statistical spaces, only certain methods are suitable for structural representations. The kNN classifier is one of the few algorithms capable of tackling both statistical and structural spaces. This method is based on computing the dissimilarity between all samples of the set, which is the main reason for its high versatility but also for its low efficiency. Prototype Generation is one way to palliate this issue: such mechanisms generate a reduced version of the initial dataset by applying data transformation and aggregation processes to the initial collection. Nevertheless, these generation processes depend heavily on the data representation considered and are generally not well defined for structural data. In this work we present an adaptation of the generation-based reduction algorithm Reduction through Homogeneous Clusters to string data. This algorithm performs the reduction by partitioning the space into class-homogeneous clusters and then generating a representative prototype as the median value of each group; the main issue to tackle is therefore the retrieval of the median element of a set of strings. Our comprehensive experimentation comparatively assesses the performance of this algorithm in both statistical and string-based spaces. Results prove the relevance of our approach by showing a competitive compromise between classification rate and data reduction.
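
Since the key operation is retrieving the median of a set of strings, a useful reference point is the *set median*: the member string minimizing the summed edit distance to the rest of the cluster. The sketch below computes it with a textbook Levenshtein distance; the paper's generalized-median computation may differ, so treat this as a simplified stand-in.

```python
# Sketch of the median-string step: the *set median* (the member string
# minimizing the summed edit distance to the cluster) is used here as a
# simple stand-in for the generalized median discussed in the paper.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def set_median(cluster):
    """Return the cluster member with minimal total edit distance."""
    return min(cluster, key=lambda s: sum(levenshtein(s, t) for t in cluster))

print(set_median(["abcd", "abdd", "abce", "xbcd"]))  # -> "abcd"
```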


2021 ◽  
Vol 11 (3) ◽  
pp. 1291
Author(s):  
Bonwoo Gu ◽  
Yunsick Sung

Gomoku is a two-player board game that originated in ancient China. Gomoku AIs have been developed using various approaches, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built on AlphaGo's algorithm, uses Monte Carlo tree search (MCTS) to define all possible situations on the Gomoku board and minimizes the probability of learning other correct answers for duplicated board situations. However, the accuracy of the tree search algorithm drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement-learning-based high-level decision approach using convolutional neural networks (CNNs). The proposed algorithm expresses each state as a one-hot-encoded vector and determines the state of the Gomoku board by combining similar one-hot-encoded vectors. For cases where the stone selected by the CNN has already been placed or cannot be placed, we suggest a method for selecting an alternative. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
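
The two ideas that lend themselves to a short sketch are the one-hot board encoding and the fallback when the CNN's chosen cell is illegal. The snippet below illustrates both; the 15x15 board size, the plane layout, and the function names are assumptions for the example, not the paper's implementation.

```python
# Sketch of the board encoding and the alternative-move selection.
# Each board becomes a stack of one-hot planes; if the CNN's top-scoring
# cell is already occupied, the next-best legal cell is chosen instead.
import numpy as np

SIZE = 15
EMPTY, BLACK, WHITE = 0, 1, 2

def one_hot_planes(board):
    """board: (15, 15) ints -> (3, 15, 15) one-hot planes."""
    return np.stack([(board == v).astype(np.float32)
                     for v in (EMPTY, BLACK, WHITE)])

def pick_move(scores, board):
    """scores: (15, 15) CNN output; returns the best *legal* (row, col)."""
    for idx in scores.flatten().argsort()[::-1]:   # cells by descending score
        r, c = divmod(int(idx), SIZE)
        if board[r, c] == EMPTY:                   # skip occupied cells
            return r, c
    return None                                    # board is full
```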


Genetics ◽  
1975 ◽  
Vol 80 (4) ◽  
pp. 667-678
Author(s):  
Mary Lee S Ledbetter ◽  
Rollin D Hotchkiss

ABSTRACT A sulfonamide-resistant mutant of pneumococcus, sulr-c, displays a genetic instability, regularly segregating to wild type. DNA extracts of derivatives of the strain possess transforming activities for both the mutant and wild-type alleles, establishing that the strain is a partial diploid. The linkage of sulr-c to strr-61, a stable chromosomal marker, was established, thus defining a chromosomal locus for sulr-c. DNA isolated from sulr-c cells transforms two mutant recipient strains at the same low efficiency as it does a wild-type recipient, although the mutant property of these strains makes them capable of integrating classical "low-efficiency" donor markers as efficiently as "high-efficiency" markers. Hence sulr-c must have a different basis for its low efficiency than classical low-efficiency point mutations. We suggest that the DNA in the region of the sulr-c mutation has a structural abnormality that leads both to its frequent segregation during growth and to its difficulty in efficiently mediating genetic transformation.


Materials ◽  
2021 ◽  
Vol 14 (4) ◽  
pp. 788
Author(s):  
Jinlin Mei ◽  
Aijun Duan ◽  
Xilong Wang

The traditional hydrothermal method for preparing zeolites inevitably uses a large amount of water as solvent, which leads to high autogenous pressure, low efficiency, and wastewater pollution. The solvent-free method can synthesize various types of zeolites by mechanically mixing, grinding, and heating solid raw materials, and it exhibits the clear advantages of high yield, low pollution, and high efficiency. This review introduces the development of solvent-free synthesis, the preparation of hierarchical zeolites, morphology control, the synthesis mechanism, and the applications of solvent-free methods. We believe that solvent-free methods will become a research focus and hold enormous potential for industrial application.


2002 ◽  
Vol 70 (9) ◽  
pp. 4880-4891 ◽  
Author(s):  
Julia Eitel ◽  
Petra Dersch

ABSTRACT The YadA protein is a major adhesin of Yersinia pseudotuberculosis that promotes tight adhesion to mammalian cells by binding to extracellular matrix proteins. In this study, we first addressed the possibility of competitive interference between YadA and the major invasion factor invasin and found that expression of YadA in the presence of invasin affected neither the export nor the function of invasin in the outer membrane. Furthermore, expression of YadA promoted both bacterial adhesion and high-efficiency invasion entirely independently of invasin. Antibodies against fibronectin and β1 integrins blocked invasion, indicating that invasion occurs via extracellular-matrix-dependent bridging between YadA and the host cell β1 integrin receptors. Inhibitor studies also demonstrated that tyrosine and Ser/Thr kinases, as well as phosphatidylinositol 3-kinase, are involved in the uptake process. Further expression studies revealed that yadA is regulated in response to several environmental parameters, including temperature, ion and nutrient concentrations, and the bacterial growth phase. In complex medium, YadA production was generally repressed but could be induced by addition of Mg2+. Maximal expression of yadA was obtained in exponential-phase cells grown in minimal medium at 37°C, conditions under which the invasin gene is repressed. These results suggest that YadA of Y. pseudotuberculosis constitutes another independent high-level uptake pathway that might complement other cell entry mechanisms (e.g., invasin) at certain sites or stages during the infection process.


2021 ◽  
Vol 31 (3) ◽  
pp. 1-26
Author(s):  
Aravind Balakrishnan ◽  
Jaeyoung Lee ◽  
Ashish Gaurav ◽  
Krzysztof Czarnecki ◽  
Sean Sedwards

Reinforcement learning (RL) is an attractive way to implement high-level decision-making policies for autonomous driving, but learning directly from a real vehicle or a high-fidelity simulator is variously infeasible. We therefore consider the problem of transfer reinforcement learning and study how a policy learned in a simple environment using WiseMove can be transferred to our high-fidelity simulator, WiseSim. WiseMove is a framework to study safety and other aspects of RL for autonomous driving. WiseSim accurately reproduces the dynamics and software stack of our real vehicle. We find that the accurately modelled perception errors in WiseSim contribute the most to the transfer problem. These errors, even when naively modelled in WiseMove, yield an RL policy that performs better in WiseSim than a hand-crafted rule-based policy. Applying domain randomization to the environment in WiseMove yields an even better policy. The final RL policy reduces the failures due to perception errors from 10% to 2.75%. We also observe that the RL policy relies significantly less on velocity than the rule-based policy does, having learned that its measurement is unreliable.
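
The domain-randomization idea, sampling perception-error parameters per episode so the policy cannot over-rely on any single sensor, can be illustrated with a gym-style observation wrapper. Everything in the sketch below (class name, the velocity index, the noise and dropout ranges) is invented for illustration and is not taken from the WiseMove codebase.

```python
# Illustrative gym-style wrapper for the domain-randomization idea:
# perception errors are re-sampled each episode, so the policy learns
# not to over-trust any one sensor reading (e.g., velocity).
import numpy as np

class PerceptionNoiseWrapper:
    def __init__(self, env, vel_index=2):
        self.env = env
        self.vel_index = vel_index   # index of velocity in the observation

    def reset(self):
        # Re-sample error parameters each episode (domain randomization).
        self.noise_std = np.random.uniform(0.0, 0.3)
        self.dropout_p = np.random.uniform(0.0, 0.2)
        return self._corrupt(self.env.reset())

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        return self._corrupt(obs), reward, done, info

    def _corrupt(self, obs):
        obs = obs + np.random.normal(0.0, self.noise_std, size=obs.shape)
        if np.random.rand() < self.dropout_p:    # simulated sensor dropout
            obs[self.vel_index] = 0.0
        return obs
```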

