Concept2Robot: Learning manipulation concepts from instructions and human demonstrations

2021 ◽  
pp. 027836492110462
Author(s):  
Lin Shao ◽  
Toki Migimatsu ◽  
Qiang Zhang ◽  
Karen Yang ◽  
Jeannette Bohg

We aim to endow a robot with the ability to learn manipulation concepts that link natural language instructions to motor skills. Our goal is to learn a single multi-task policy that takes as input a natural language instruction and an image of the initial scene and outputs a robot motion trajectory to achieve the specified task. This policy has to generalize over different instructions and environments. Our insight is that we can approach this problem through learning from demonstration by leveraging large-scale video datasets of humans performing manipulation actions. Thereby, we avoid more time-consuming processes such as teleoperation or kinesthetic teaching. We also avoid having to manually design task-specific rewards. We propose a two-stage learning process where we first learn single-task policies through reinforcement learning. The reward is provided by scoring how well the robot visually appears to perform the task. This score is given by a video-based action classifier trained on a large-scale human activity dataset. In the second stage, we train a multi-task policy through imitation learning to imitate all the single-task policies. In extensive simulation experiments, we show that the multi-task policy learns to perform a large percentage of the 78 different manipulation tasks on which it was trained. The tasks are of greater variety and complexity than previously considered robot manipulation tasks. We show that the policy generalizes over variations of the environment. We also show examples of successful generalization over novel but similar instructions.

Author(s):  
Pauline Jacobson

This chapter examines the currently fashionable notion of ‘experimental semantics’, and argues that most work in natural language semantics has always been experimental. The oft-cited dichotomy between ‘theoretical’ (or ‘armchair’) and ‘experimental’ is bogus and should be dropped from the discourse. The same holds for dichotomies like ‘intuition-based’ (or ‘thought experiments’) vs. ‘empirical’ work (and ‘real experiments’). The so-called new ‘empirical’ methods are often nothing more than collecting large-scale ‘intuitions’ or doing multiple thought experiments. Of course, the use of multiple subjects could well allow for a better experiment than the more traditional single- or few-subject methodologies. But whether or not this is the case depends entirely on the question at hand. In fact, the chapter considers several multiple-subject studies and shows that the particular methodology in those cases does not necessarily provide important insights, and the chapter argues that some of its claimed benefits are incorrect.


2020 ◽  
Vol 60 (4) ◽  
pp. 612-622
Author(s):  
Rosina Lozano

The twenty-first century has seen a surge in scholarship on Latino educational history and a new nonbinary umbrella term, Latinx, that a younger generation prefers. Many of historian Victoria-María MacDonald's astute observations in 2001 presaged the growth of the field. Focus has increased on Spanish-surnamed teachers and discussions have grown about the Latino experience in higher education, especially around student activism on campus. Great strides are being made in studying the history of Spanish-speaking regions with long ties to the United States, either as colonies or as sites of large-scale immigration, including Puerto Rico, Cuba, and the Philippines. Historical inquiry into the place of Latinos in the US educational system has also developed in ways that MacDonald did not anticipate. The growth of the comparative race and ethnicity field in and of itself has encouraged cross-ethnic and cross-racial studies, which often also tie together larger themes of colonialism, language instruction, legal cases, and civil rights or activism.


Sensors ◽  
2021 ◽  
Vol 21 (3) ◽  
pp. 1012
Author(s):  
Jisu Hwang ◽  
Incheol Kim

Due to the development of computer vision and natural language processing technologies in recent years, there has been a growing interest in multimodal intelligent tasks that require the ability to concurrently understand various forms of input data such as images and text. Vision-and-language navigation (VLN) requires the alignment and grounding of multimodal input data to enable real-time perception of the task status on panoramic images and natural language instruction. This study proposes a novel deep neural network model (JMEBS) with joint multimodal embedding and backtracking search for VLN tasks. The proposed JMEBS model uses a transformer-based joint multimodal embedding module. JMEBS uses both multimodal context and temporal context. It also employs backtracking-enabled greedy local search (BGLS), a novel algorithm with a backtracking feature designed to improve the task success rate and optimize the navigation path, based on the local and global scores related to candidate actions. A novel global scoring method is also used for performance improvement by comparing the partial trajectories searched thus far with a plurality of natural language instructions. The performance of the proposed model on various operations was then experimentally demonstrated and compared with other models using the Matterport3D Simulator and room-to-room (R2R) benchmark datasets.
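The idea of a greedy local search that can backtrack can be illustrated with a minimal sketch. This is not the BGLS algorithm from the paper: the abstract does not give its details, and the local and global scores it describes are replaced here by a single hypothetical `score` function. The function name `backtracking_greedy_search`, the adjacency-dict graph, and the `max_steps` limit are all assumptions made for illustration.

```python
def backtracking_greedy_search(graph, start, goal, score, max_steps=50):
    """Greedy search over a graph that backtracks at dead ends.

    graph: dict mapping a node to a list of neighbour nodes (assumed format).
    score: hypothetical callable giving a desirability score per node;
           it stands in for the paper's local/global action scores.
    Returns the path found as a list of nodes, or None.
    """
    path = [start]
    visited = {start}
    steps = 0
    while path and steps < max_steps:
        node = path[-1]
        if node == goal:
            return path
        # Unvisited neighbours are the candidate actions at this step.
        candidates = [n for n in graph.get(node, []) if n not in visited]
        if candidates:
            # Greedy step: move to the highest-scoring candidate.
            best = max(candidates, key=score)
            visited.add(best)
            path.append(best)
        else:
            # Dead end: backtrack to the previous node on the path.
            path.pop()
        steps += 1
    # The goal may have been reached on the very last allowed step.
    return path if path and path[-1] == goal else None


# Toy example: the greedy choice "B" is a dead end, so the search backtracks.
toy_graph = {"A": ["B", "C"], "B": [], "C": ["D"], "D": []}
toy_scores = {"A": 0.0, "B": 0.9, "C": 0.5, "D": 1.0}
toy_path = backtracking_greedy_search(toy_graph, "A", "D", toy_scores.get)
```

In the toy graph, the search first moves to the higher-scoring node `B`, finds no way forward, returns to `A`, and then reaches `D` through `C`.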


Author(s):  
Lu Chen ◽  
Handing Wang ◽  
Wenping Ma

Real-world optimization applications in complex systems always contain multiple factors to be optimized, which can be formulated as multi-objective optimization problems. These problems have been solved by many evolutionary algorithms like MOEA/D, NSGA-III, and KnEA. However, when the numbers of decision variables and objectives increase, the computation costs of those algorithms become unaffordable. To reduce the high computation cost on large-scale many-objective optimization problems, we propose a two-stage framework. The first stage of the proposed algorithm combines a multi-tasking optimization strategy with a bi-directional search strategy, where the original problem is reformulated as a multi-tasking optimization problem in the decision space to enhance convergence. To improve diversity, in the second stage, the proposed algorithm applies multi-tasking optimization to a number of sub-problems based on reference points in the objective space. To show the effectiveness of the proposed algorithm, we test it on the DTLZ and LSMOP problems and compare it with existing algorithms; it outperforms the compared algorithms in most cases and shows advantages in both convergence and diversity.


Author(s):  
Siva Reddy ◽  
Mirella Lapata ◽  
Mark Steedman

In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.


2001 ◽  
Vol 16 (5) ◽  
pp. 38-45
Author(s):  
S. Lauria ◽  
G. Bugmann ◽  
T. Kyriacou ◽  
J. Bos ◽  
A. Klein

2021 ◽  
Author(s):  
Xinxu Shen ◽  
Troy Houser ◽  
David Victor Smith ◽  
Vishnu P. Murty

The use of naturalistic stimuli, such as narrative movies, is gaining popularity in many fields characterizing memory, affect, and decision-making. Narrative recall paradigms are often used to capture the complexity and richness of memory for naturalistic events. However, scoring narrative recalls is time-consuming and prone to human biases. Here, we show the validity and reliability of using a natural language processing tool, the Universal Sentence Encoder (USE), to automatically score narrative recall. We compared the reliability of scoring between two independent raters (i.e., hand-scored) and between our automated algorithm and individual raters (i.e., automated) on trial-unique video clips of magic tricks. Study 1 showed that our automated segmentation approaches yielded high reliability and reflected measures yielded by hand-scoring, and further that the results using USE outperformed another popular natural language processing tool, GloVe. In Study 2, we tested whether our automated approach remained valid when testing individuals varying on clinically relevant dimensions that influence episodic memory: age and anxiety. We found that our automated approach was equally reliable across both age groups and anxiety groups, which shows the efficacy of our approach for assessing narrative recall in large-scale individual difference analysis. In sum, these findings suggest that machine learning approaches implementing USE are a promising tool for scoring large-scale narrative recalls and performing individual difference analysis for research using naturalistic stimuli.
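One common way to turn sentence embeddings into a recall score is cosine similarity between the embedding of a recalled sentence and the embeddings of reference descriptions. The abstract does not state which similarity measure the authors use, so the sketch below is an assumption, and the short vectors are toy stand-ins for real USE embeddings. The function names `cosine_similarity` and `score_recall` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_recall(recall_vec, reference_vecs):
    """Score a recalled sentence as its best match against the reference segments.

    Taking the maximum is an illustrative choice, not the paper's stated rule.
    """
    return max(cosine_similarity(recall_vec, ref) for ref in reference_vecs)


# Toy vectors standing in for sentence embeddings of a recall and two references.
recall_embedding = [1.0, 1.0]
reference_embeddings = [[1.0, 0.0], [1.0, 1.0]]
recall_score = score_recall(recall_embedding, reference_embeddings)
```

With real embeddings, each vector would come from an encoder applied to the recalled and reference sentences; the scoring step itself stays the same.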


Author(s):  
Rui Qiu ◽  
Yongtu Liang

Currently, unmanned aerial vehicles (UAVs) provide the possibility of comprehensive coverage and multi-dimensional visualization of pipeline monitoring. Encouraged by industry policy, research on UAV path planning in pipeline network inspection has emerged. The difficulties of this issue lie in strict operational requirements, variable flight missions, and unified optimization for UAV deployment and real-time path planning. Meanwhile, the intricate structure and large scale of the pipeline network further complicate this issue. At present, there is still room to improve the practicality and applicability of the mathematical model and solution strategy. To address this problem, this paper proposes a novel two-stage optimization approach for UAV path planning in pipeline network inspection. The first stage is conventional pre-flight planning, where optimality matters more than calculation time. Therefore, a mixed integer linear programming (MILP) model is established and solved by a commercial solver to obtain the optimal UAV number, take-off location and detailed flight path. The second stage is re-planning during the flight, taking into account frequent pipeline accidents (e.g. leaks and cracks). In this stage, the flight path must be rescheduled in a timely manner to identify specific hazardous locations. Thus, calculation time matters more than optimality, and a genetic algorithm is used to satisfy the timeliness of decision-making. Finally, the proposed method is applied to the UAV inspection of a branched oil and gas transmission pipeline network with 36 nodes and the results are analyzed in detail in terms of computational performance. In the first stage, compared to manpower inspection, the total cost and time of UAV inspection are decreased by 54% and 56% respectively. In the second stage, it takes less than 1 minute to obtain a suboptimal solution, verifying the applicability and superiority of the method.

