Jointly Improving Parsing and Perception for Natural Language Commands through Human-Robot Dialog

In this work, we present methods for using human-robot dialog to improve language understanding for a mobile robot agent. The agent parses natural language to underlying semantic meanings and uses robotic sensors to create multi-modal models of perceptual concepts like red and heavy. The agent can be used for showing navigation routes, delivering objects to people, and relocating objects from one location to another. We use dialog clarification questions both to understand commands and to generate additional parsing training data. The agent employs opportunistic active learning to select questions about how words relate to objects, improving its understanding of perceptual concepts. We evaluated this agent on Amazon Mechanical Turk. After training on data induced from conversations, the agent reduced the number of dialog questions it asked while receiving higher usability ratings. Additionally, we demonstrated the agent on a robotic platform, where it learned new perceptual concepts on the fly while completing a real-world task.

Pre-Trained Transformer-Based Language Models for Sundanese

10.21203/rs.3.rs-907893/v1 ◽

2021 ◽

Author(s):

Wilson Wongso ◽

Henry Lucky ◽

Derwin Suhartono

Keyword(s):

Natural Language ◽

Text Classification ◽

Training Data ◽

Language Models ◽

Classification Task ◽

Language Understanding ◽

Training Corpus ◽

Low Resource ◽

Corpus Size ◽

Fine Tune

Abstract The Sundanese language has over 32 million speakers worldwide, but the language has reaped little to no benefits from the recent advances in natural language understanding. Like other low-resource languages, the only alternative is to fine-tune existing multilingual models. In this paper, we pre-trained three monolingual Transformer-based language models on Sundanese data. When evaluated on a downstream text classification task, we found that most of our monolingual models outperformed larger multilingual models despite the smaller overall pre-training data. In the subsequent analyses, our models benefited strongly from the Sundanese pre-training corpus size and do not exhibit socially biased behavior. We released our models for other researchers and practitioners to use.

A Multitask Active Learning Framework for Natural Language Understanding

10.18653/v1/2020.coling-main.430 ◽

2020 ◽

Author(s):

Hua Zhu ◽

Wu Ye ◽

Sihan Luo ◽

Xidong Zhang

Keyword(s):

Active Learning ◽

Natural Language ◽

Natural Language Understanding ◽

Language Understanding ◽

Learning Framework

Features Selection for Entity Resolution in Prostitution on Twitter

International Journal of Advances in Data and Information Systems ◽

10.25008/ijadis.v2i1.1214 ◽

2021 ◽

Vol 2 (1) ◽

pp. 53-61

Author(s):

Reisa Permatasari ◽

Nur Aini Rakhmawati

Keyword(s):

Logistic Regression ◽

Active Learning ◽

Real World ◽

Entity Resolution ◽

Training Data ◽

Features Selection ◽

Selection For ◽

Maximum Similarity

Entity resolution is the process of determining whether two references to real-world objects refer to the same entity or to different ones. This study applies entity resolution to a Twitter prostitution dataset in two ways: a feature-based approach using regularized logistic regression training with active learning in Dedupe, and a graph-based approach using Neo4j and Node2Vec. This study found that the maximum similarity is 1 when the set of features (personal, location and bio specifications) is complete. The minimum similarity is 0.025662627 when the amount of harmful training data. The most influential similarity feature is the cellphone number, with the lowest starting range from 0.997678459 to 0.999993523. The length-of-walk-per-source parameter yields the best similarity accuracy, reaching 71.4% (prediction 14 and yield 10).

Natural language processing for structuring clinical text data on depression using UK-CRIS

Evidence-Based Mental Health ◽

10.1136/ebmental-2019-300134 ◽

2020 ◽

Vol 23 (1) ◽

pp. 21-26 ◽

Cited By ~ 6

Author(s):

Nemanja Vaci ◽

Qiang Liu ◽

Andrey Kormilitzin ◽

Franco De Crescenzo ◽

Ayse Kurtulmus ◽

...

Keyword(s):

Natural Language Processing ◽

Active Learning ◽

Electronic Health Records ◽

Natural Language ◽

Language Processing ◽

Real World ◽

Statistical Models ◽

Health Records ◽

Clinical Text ◽

Electronic Health

Background: Utilisation of routinely collected electronic health records from secondary care offers unprecedented possibilities for medical science research but can also present difficulties. One key issue is that medical information is presented as free-form text and, therefore, requires time commitment from clinicians to manually extract salient information. Natural language processing (NLP) methods can be used to automatically extract clinically relevant information. Objective: Our aim is to use natural language processing (NLP) to capture real-world data on individuals with depression from the Clinical Record Interactive Search (CRIS) clinical text to foster the use of electronic healthcare data in mental health research. Methods: We used a combination of methods to extract salient information from electronic health records. First, clinical experts define the information of interest and subsequently build the training and testing corpora for statistical models. Second, we built and fine-tuned the statistical models using active learning procedures. Findings: Results show a high degree of accuracy in the extraction of drug-related information. Contrastingly, a much lower degree of accuracy is demonstrated in relation to auxiliary variables. In combination with state-of-the-art active learning paradigms, the performance of the model increases considerably. Conclusions: This study illustrates the feasibility of using the natural language processing models and proposes a research pipeline to be used for accurately extracting information from electronic health records. Clinical implications: Real-world, individual patient data are an invaluable source of information, which can be used to better personalise treatment.

Asian Candidates in America

Political Research Quarterly ◽

10.1177/1065912916674273 ◽

2016 ◽

Vol 70 (1) ◽

pp. 68-81 ◽

Cited By ~ 11

Author(s):

Neil Visalvanich

Keyword(s):

African American ◽

Asian Americans ◽

Asian American ◽

Real World ◽

Congressional Elections ◽

Racial Stereotypes ◽

Mechanical Turk ◽

Amazon Mechanical Turk ◽

Foreign Born ◽

Better Than

Racial stereotyping has been found to handicap African American and Latino candidates in negative ways. It is less clear how racial stereotypes may change the fortunes of Asian candidates. This paper explores the candidacies of Asian Americans with an experiment run through Amazon Mechanical Turk as well as real-world evaluations of Asian American candidates using the Cooperative Congressional Elections Study. In my experiments, I find that Asian candidates do significantly better than white candidates across different biographical scenarios (conservative, liberal, and foreign). I find that, contrary to expectations, Asian candidates are not significantly disadvantaged from being immigrant and foreign born. My experimental results mirror my observational results, which show that Asian Democrats are significantly advantaged even when compared with whites. These results indicate that Asian candidates in America face a set of racial-political stereotypes that are unique to their racial subgroup.

Use of Crowdsourcing to Assess the Ecological Validity of Perceptual-Training Paradigms in Dysarthria

American Journal of Speech-Language Pathology ◽

10.1044/2015_ajslp-15-0059 ◽

2016 ◽

Vol 25 (2) ◽

pp. 233-239 ◽

Cited By ~ 19

Author(s):

Kaitlin L. Lansford ◽

Stephanie A. Borrie ◽

Lukas Bystricky

Keyword(s):

Treatment Approach ◽

Ecological Validity ◽

Training Data ◽

Mechanical Turk ◽

Targeted Treatment ◽

Amazon Mechanical Turk ◽

Perceptual Training ◽

Main Effect ◽

Computer Based ◽

Dysarthric Speech

Purpose It has been documented in laboratory settings that familiarizing listeners with dysarthric speech improves intelligibility of that speech. If these findings can be replicated in real-world settings, the ability to improve communicative function by focusing on communication partners has major implications for extending clinical practice in dysarthria rehabilitation. An important step toward development of a listener-targeted treatment approach requires establishment of its ecological validity. To this end, the present study leveraged the mechanism of crowdsourcing to determine whether perceptual-training benefits achieved by listeners in the laboratory could be elicited in an at-home computer-based scenario. Method Perceptual-training data (i.e., intelligibility scores from a posttraining transcription task) were collected from listeners in 2 settings—the laboratory and the crowdsourcing website Amazon Mechanical Turk. Results Consistent with previous findings, results revealed a main effect of training condition (training vs. control) on intelligibility scores. There was, however, no effect of training setting (Mechanical Turk vs. laboratory). Thus, the perceptual benefit achieved via Mechanical Turk was comparable to that achieved in the laboratory. Conclusion This study provides evidence regarding the ecological validity of perceptual-training paradigms designed to improve intelligibility of dysarthric speech, thereby supporting their continued advancement as a listener-targeted treatment option.
