Behavioral Cloning from Observation

Author(s):  
Faraz Torabi ◽  
Garrett Warnell ◽  
Peter Stone

Humans often learn how to perform tasks via imitation: they observe others perform a task, and then very quickly infer the appropriate actions to take based on their observations. While extending this paradigm to autonomous agents is a well-studied problem in general, there are two particular aspects that have largely been overlooked: (1) that the learning is done from observation only (i.e., without explicit action information), and (2) that the learning is typically done very quickly. In this work, we propose a two-phase, autonomous imitation learning technique called behavioral cloning from observation (BCO) that aims to provide improved performance with respect to both of these aspects. First, we allow the agent to acquire experience in a self-supervised fashion. This experience is used to develop a model, which is then utilized to learn a particular task by observing an expert perform that task without knowledge of the specific actions taken. We experimentally compare BCO to imitation learning methods, including the state-of-the-art generative adversarial imitation learning (GAIL) technique, and we show comparable task performance in several different simulation domains while exhibiting increased learning speed after expert trajectories become available.
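The two phases described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: the one-dimensional toy dynamics in `step`, the scripted `expert_policy`, the nearest-mean inverse dynamics model, and the nearest-centroid classifier that stands in for a learned policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def step(state, action):
    """Hypothetical toy dynamics: action 1 moves right, action 0 moves left."""
    direction = 1.0 if action == 1 else -1.0
    return state + direction + rng.normal(scale=0.05)

# Phase 1: the agent gathers its own experience with a random policy.
states = rng.uniform(-5.0, 5.0, size=500)
actions = rng.integers(0, 2, size=500)
next_states = np.array([step(s, a) for s, a in zip(states, actions)])

# Inverse dynamics model (assumed form): the mean state change per action.
deltas = next_states - states
delta_means = np.array([deltas[actions == a].mean() for a in (0, 1)])

def infer_action(s, s_next):
    """Return the action whose mean state change best explains s -> s_next."""
    return int(np.argmin(np.abs((s_next - s) - delta_means)))

def expert_policy(s):
    """Hypothetical expert that drives the state toward zero."""
    return 1 if s < 0 else 0

# Phase 2: expert demonstrations are observed as state transitions only.
demo_states, demo_next, hidden_actions = [], [], []
for start in rng.uniform(-5.0, 5.0, size=50):
    s = start
    for _ in range(10):
        a = expert_policy(s)
        s_next = step(s, a)
        demo_states.append(s)
        demo_next.append(s_next)
        hidden_actions.append(a)  # kept only to check the inference below
        s = s_next

demo_states = np.array(demo_states)
inferred = np.array([infer_action(s, sn) for s, sn in zip(demo_states, demo_next)])

# Behavioral cloning on (state, inferred action) pairs.
centroids = np.array([demo_states[inferred == a].mean() for a in (0, 1)])

def cloned_policy(s):
    """Pick the action whose demonstration states lie closest to s."""
    return int(np.argmin(np.abs(s - centroids)))

print(cloned_policy(-3.0), cloned_policy(3.0))
```

The learner never reads `hidden_actions`; the list exists only so that the inferred labels can be compared against what the expert actually did.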

1997 ◽  
Vol 31 (3-4) ◽  
pp. 157-166 ◽  
Author(s):  
Z. Wanfang ◽  
H. S. Wheater ◽  
P. M. Johnston

2018 ◽  
Author(s):  
John-William Sidhom ◽  
Drew Pardoll ◽  
Alexander Baras

Motivation: The immune system has the potential to present a wide variety of peptides to itself as a means of surveillance for pathogenic invaders. This means of surveillance allows the immune system to detect peptides derived from bacterial, viral, and even oncologic sources. However, given the breadth of the epitope repertoire, investigators studying immune responses to these epitopes have relied on in-silico prediction algorithms to help narrow down the list of candidate epitopes, and current methods still leave much room for improvement. Results: We present Allele-Integrated MHC (AI-MHC), a deep learning architecture with improved performance over the current state-of-the-art algorithms in human Class I and Class II MHC binding prediction. Our architecture utilizes a convolutional neural network that improves prediction accuracy by (1) allowing one neural network to be trained on all peptides for all alleles of a given class of MHC molecules by making the allele an input to the net and (2) introducing a global max pooling operation with an optimized kernel size that allows the architecture to achieve translational invariance in MHC-peptide binding analysis, making it suitable for sequence analytics where a frame of interest needs to be learned in a longer, variable-length sequence. We assess AI-MHC against internal independent test sets and compare against all algorithms in the IEDB automated server benchmarks, demonstrating that our algorithm achieves state-of-the-art performance for both Class I and Class II prediction. Availability and Implementation: AI-MHC can be used via web interface at baras.pathology.jhu.edu/[email protected]
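To illustrate why convolution followed by global max pooling yields a response that does not depend on where a motif sits or on the sequence length, here is a minimal NumPy sketch. It is not the AI-MHC architecture: the single filter is hand-set to the hypothetical motif "KLM" rather than learned, and the allele input is omitted.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def one_hot(peptide):
    """Encode a peptide string as a (length, 20) one-hot matrix."""
    encoded = np.zeros((len(peptide), len(AMINO_ACIDS)))
    for position, residue in enumerate(peptide):
        encoded[position, AMINO_ACIDS.index(residue)] = 1.0
    return encoded

def conv_global_max(encoded, kernel):
    """Slide the kernel along the sequence and keep only the largest response."""
    width = kernel.shape[0]
    responses = [
        float(np.sum(encoded[i:i + width] * kernel))
        for i in range(encoded.shape[0] - width + 1)
    ]
    return max(responses)

# Hand-set filter (an assumption for illustration): responds to the motif "KLM".
motif_kernel = one_hot("KLM")

at_start = conv_global_max(one_hot("KLMAAAAAA"), motif_kernel)
shifted = conv_global_max(one_hot("AAAAKLMAAAAAA"), motif_kernel)
absent = conv_global_max(one_hot("AAAAAAAAA"), motif_kernel)

print(at_start, shifted, absent)
```

The first two peptides differ in length and in motif position yet produce the same pooled value, while the peptide without the motif produces a lower one.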


Author(s):  
Ali Fakhry

Deep Q-Networks (DQNs) are applied throughout reinforcement learning, a large subfield of machine learning. Using CarRacing-v0, a classic 2D car racing environment from OpenAI, alongside a custom modification of that environment, a DQN was created to solve both the classic and custom environments. The environments are tested using custom-made CNN architectures and transfer learning from ResNet18. While DQNs were state of the art years ago, using them for CarRacing-v0 appears somewhat unappealing and less effective than other reinforcement learning techniques. Overall, while the model did train and the agent learned various parts of the environment, reaching the environment's reward threshold with this technique seems difficult, and other techniques would likely be more useful.


Author(s):  
Cong Fei ◽  
Bin Wang ◽  
Yuzheng Zhuang ◽  
Zongzhang Zhang ◽  
Jianye Hao ◽  
...  

Generative adversarial imitation learning (GAIL) has shown promising results by taking advantage of generative adversarial nets, especially in the field of robot learning. However, the requirement of isolated single-modal demonstrations limits the scalability of the approach to real-world scenarios such as autonomous vehicles' demand for a proper understanding of human drivers' behavior. In this paper, we propose a novel multi-modal GAIL framework, named Triple-GAIL, that introduces an auxiliary selector and is able to learn skill selection and imitation jointly from both expert demonstrations and continuously generated experiences, the latter serving as data augmentation. We provide theoretical guarantees on the convergence to optima for both the generator and the selector. Experiments on real driver trajectories and real-time strategy game datasets demonstrate that Triple-GAIL can better fit multi-modal behaviors close to the demonstrators and outperforms state-of-the-art methods.


2017 ◽  
Vol 108 (1) ◽  
pp. 307-318 ◽  
Author(s):  
Eleftherios Avramidis

A deeper analysis of Comparative Quality Estimation is presented by extending the state-of-the-art methods with adequacy and grammatical features from other Quality Estimation tasks. The previously used linear method, unable to cope with the augmented features, is replaced with a boosting classifier assisted by feature selection. The methods indicated show improved performance for 6 language pairs when applied to the output of MT systems developed over 7 years. The improved models compete better with reference-aware metrics. Notable conclusions are reached through the examination of the contribution of the features in the models, whereby it is possible to identify common MT errors that are captured by the features. Many grammatical/fluency features make a good contribution, few adequacy features make some contribution, whereas source complexity features are of no use. The importance of many fluency and adequacy features is language-specific.


2018 ◽  
Vol 37 (13-14) ◽  
pp. 1632-1672 ◽  
Author(s):  
Sanjiban Choudhury ◽  
Mohak Bhardwaj ◽  
Sankalp Arora ◽  
Ashish Kapoor ◽  
Gireeja Ranade ◽  
...  

Robot planning is the process of selecting a sequence of actions that optimize for a task-specific objective. For instance, the objective for a navigation task would be to find collision-free paths, whereas the objective for an exploration task would be to map unknown areas. The optimal solutions to such tasks are heavily influenced by the implicit structure in the environment, i.e. the configuration of objects in the world. State-of-the-art planning approaches, however, do not exploit this structure, thereby expending valuable effort searching the action space instead of focusing on potentially good actions. In this paper, we address the problem of enabling planners to adapt their search strategies by inferring such good actions in an efficient manner using only the information uncovered by the search up until that time. We formulate this as a problem of sequential decision making under uncertainty where at a given iteration a planning policy must map the state of the search to a planning action. Unfortunately, the training process for such partial-information-based policies is slow to converge and susceptible to poor local minima. Our key insight is that if we could fully observe the underlying world map, we would easily be able to disambiguate between good and bad actions. We hence present a novel data-driven imitation learning framework to efficiently train planning policies by imitating a clairvoyant oracle: an oracle that at train time has full knowledge about the world map and can compute optimal decisions. We leverage the fact that for planning problems, such oracles can be efficiently computed and derive performance guarantees for the learnt policy. We examine two important domains that rely on partial-information-based policies: informative path planning and search-based motion planning.
We validate the approach on a spectrum of environments for both problem domains, including experiments on a real UAV, and show that the learnt policy consistently outperforms state-of-the-art algorithms. Our framework is able to train policies that achieve up to [Formula: see text] more reward than state-of-the-art information-gathering heuristics and a [Formula: see text] speedup as compared with A* on search-based planning problems. Our approach paves the way forward for applying data-driven techniques to other such problem domains under the umbrella of robot planning.


Author(s):  
Vaishali S. Tidake ◽  
Shirish S. Sane

The use of feature similarity is expected when nearest neighbors are to be explored. Examples in multi-label datasets are associated with multiple labels. Hence, the use of label dissimilarity accompanied by feature similarity may reveal better neighbors. Information extracted from such neighbors is explored by the devised MLFLD and MLFLD-MAXP algorithms. Among the three distance metrics used to compute label dissimilarity, Hamming distance showed the most improved performance and is hence used for further evaluation. The performance of the implemented algorithms is compared with the state-of-the-art MLkNN algorithm; they showed an improvement for some datasets only. This chapter introduces the parameters MLE and skew. MLE and skew, along with the outlier parameter, help to analyze the multi-label and imbalanced nature of datasets. Investigation of datasets for various parameters, together with experimentation, revealed the need for data preprocessing to remove outliers. This preprocessing improved the performance of the implemented algorithms for all measures, and its effectiveness is empirically validated.
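The two quantities this abstract relies on can be computed as in the sketch below. The abstract does not say how MLFLD or MLFLD-MAXP combine them, so the sketch stops at computing each one. The function names are hypothetical, the Hamming distance is the normalized variant (fraction of differing labels), and Euclidean distance is assumed as the feature measure.

```python
import numpy as np

def hamming_label_dissimilarity(labels_a, labels_b):
    """Fraction of label positions at which two binary label vectors differ."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    return float(np.mean(labels_a != labels_b))

def euclidean_feature_distance(features_a, features_b):
    """Euclidean distance between two feature vectors (assumed feature measure)."""
    difference = np.asarray(features_a, dtype=float) - np.asarray(features_b, dtype=float)
    return float(np.linalg.norm(difference))

# Two hypothetical examples with four possible labels and two features each.
label_gap = hamming_label_dissimilarity([1, 0, 1, 0], [1, 1, 0, 0])
feature_gap = euclidean_feature_distance([0.0, 0.0], [3.0, 4.0])

print(label_gap, feature_gap)
```

Here the label vectors differ in two of four positions, and the feature vectors are the endpoints of a 3-4-5 right triangle.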


Water ◽  
2020 ◽  
Vol 12 (4) ◽  
pp. 1002 ◽  
Author(s):  
Xuan Khoa Bui ◽  
Malvin S. Marlim ◽  
Doosun Kang

A water distribution network (WDN) is an indispensable element of civil infrastructure that provides fresh water for domestic use, industrial development, and fire-fighting. However, in a large and complex network, operation and management (O&M) can be challenging. As a technical initiative to improve O&M efficiency, the paradigm of “divide and conquer” can divide an original WDN into multiple subnetworks. Each subnetwork is controlled by boundary pipes installed with gate valves or flow meters that control the water volume entering and leaving what are known as district metered areas (DMAs). Many approaches to creating DMAs are formulated as two-phase procedures, clustering and sectorizing, and are called water network partitioning (WNP) in general. To assess the benefits and drawbacks of DMAs in a WDN, we provide a comprehensive review of various state-of-the-art approaches, which can be broadly classified as: (1) Clustering algorithms, which focus on defining the optimal configuration of DMAs; and (2) sectorization procedures, which physically decompose the network by selecting pipes for installing flow meters or gate valves. We also provide an overview of emerging problems that need to be studied.


Author(s):  
Harald Ewolds ◽  
Laura Broeker ◽  
Rita F. de Oliveira ◽  
Markus Raab ◽  
Stefan Künzell

This study examined the effect of instructions and feedback on the integration of two tasks. Task integration of covarying tasks is thought to help dual-task performance. With complete task integration of covarying dual tasks, a dual task becomes more like a single task and dual-task costs should be reduced as it is no longer conceptualized as a dual task. In the current study we tried to manipulate the extent to which tasks are integrated. We covaried a tracking task with an auditory go/no-go task and tried to manipulate the extent of task integration by using two different sets of instructions and feedback. A group receiving task-integration-promoting instructions and feedback (N = 18) and a group receiving task-separation instructions and feedback (N = 20) trained on a continuous tracking task. The tracking task covaried with the auditory go/no-go reaction time task because high-pitch sounds always occurred 250 ms before turns, which has been demonstrated to foster task integration. The tracking task further contained a repeating segment to investigate implicit learning. Results showed that instructions, feedback, or participants’ conceptualization of performing a single task versus a dual task did not significantly affect task integration. However, the covariation manipulation improved performance in both the tracking and the go/no-go task, exceeding performance in non-covarying and single tasks. We concluded that task integration between covarying motor tasks is a robust phenomenon that is not influenced by instructions or feedback.

