Learning to Few-Shot Learn Across Diverse Natural Language Classification Tasks

Author(s):  
Trapit Bansal ◽  
Rishikesh Jha ◽  
Andrew McCallum
2020 ◽  
Author(s):  
Trapit Bansal ◽  
Rishikesh Jha ◽  
Tsendsuren Munkhdalai ◽  
Andrew McCallum

2019 ◽  
Vol 26 (11) ◽  
pp. 1272-1278 ◽  
Author(s):  
Dmitriy Dligach ◽  
Majid Afshar ◽  
Timothy Miller

Abstract Objective Our objective is to develop algorithms for encoding clinical text into representations that can be used for a variety of phenotyping tasks. Materials and Methods Obtaining large datasets to take advantage of highly expressive deep learning methods is difficult in clinical natural language processing (NLP). We address this difficulty by pretraining a clinical text encoder on billing code data, which is typically available in abundance. We explore several neural encoder architectures and deploy the text representations obtained from these encoders in the context of clinical text classification tasks. While our ultimate goal is learning a universal clinical text encoder, we also experiment with training a phenotype-specific encoder. A universal encoder would be more practical, but a phenotype-specific encoder could perform better for a specific task. Results We successfully train several clinical text encoders, establish a new state-of-the-art on comorbidity data, and observe good performance gains on substance misuse data. Discussion We find that pretraining using billing codes is a promising research direction. The representations generated by this type of pretraining have universal properties, as they are highly beneficial for many phenotyping tasks. Phenotype-specific pretraining is a viable route for trading the generality of the pretrained encoder for better performance on a specific phenotyping task. Conclusions We successfully applied our approach to many phenotyping tasks. We conclude by discussing potential limitations of our approach.
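The pretrain-then-transfer recipe this abstract describes (pretrain a text encoder to predict billing codes, then reuse its representations for phenotype classification) can be sketched end to end on synthetic data. The sketch below is an illustrative assumption, not the paper's actual architecture: notes are toy bag-of-words vectors, the "encoder" is a single tanh layer, and the phenotype is derived from the codes themselves.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

# Toy corpus: 200 notes as bag-of-words over 50 terms, each note
# tagged with a subset of 10 billing codes (purely synthetic).
X = rng.random((200, 50))
Y_codes = (X @ rng.normal(size=(50, 10)) > 2.5).astype(float)

# Encoder: one tanh layer, a stand-in for the neural encoders the
# paper explores; the pretraining head predicts billing codes.
W_enc = rng.normal(scale=0.1, size=(50, 16))
W_head = rng.normal(scale=0.1, size=(16, 10))

lr = 0.1
for _ in range(300):                                 # multi-label pretraining
    H = np.tanh(X @ W_enc)                           # note representations
    G = (sigmoid(H @ W_head) - Y_codes) / len(X)     # BCE gradient wrt logits
    g_head = H.T @ G
    g_enc = X.T @ ((G @ W_head.T) * (1.0 - H ** 2))  # backprop through tanh
    W_head -= lr * g_head
    W_enc -= lr * g_enc

# Downstream phenotyping: freeze the encoder, fit a small logistic
# classifier (with a bias column) on the pretrained representations.
H = np.tanh(X @ W_enc)
Hb = np.hstack([H, np.ones((len(H), 1))])
y = (Y_codes[:, :2].sum(axis=1) > 0).astype(float)   # toy phenotype label
w = np.zeros(Hb.shape[1])
for _ in range(300):
    w -= lr * Hb.T @ (sigmoid(Hb @ w) - y) / len(Hb)

acc = float(((sigmoid(Hb @ w) > 0.5) == (y > 0.5)).mean())
```

A phenotype-specific encoder, in this picture, would continue updating `W_enc` on the downstream task instead of freezing it, trading generality for task fit.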


2021 ◽  
Author(s):  
Hojae Han ◽  
Seungtaek Choi ◽  
Myeongho Jeong ◽  
Jin-woo Park ◽  
Seung-won Hwang

Author(s):  
Santhi Selvaraj ◽  
Raja Sekar J. ◽  
Amutha S.

The main objective is to recognize the language of social media chat presented as speech by using a deep belief network (DBN). Currently, language classification is one of the main applications of natural language processing, artificial intelligence, and deep learning. Language classification is the process of ascertaining which natural language information is being presented in, and of recognizing a language from an audio signal. Presently, most language recognition systems are based on hidden Markov models and Gaussian mixture models, which support both acoustic and sequential modeling. This chapter presents a DBN-based recognition system for three different languages, namely English, Hindi, and Tamil. The languages are evaluated on a self-built recorded database, from which mel-frequency cepstral coefficient (MFCC) features are extracted from the speech. These features are fed into the DBN with a backpropagation learning algorithm for the recognition process. Recognition accuracy is high for the chosen languages, and system performance is assessed on the three languages.
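The MFCC front end mentioned above can be sketched in plain NumPy. The frame sizes, filter counts, and the synthetic test tone below are common defaults chosen for illustration, not necessarily the settings used in the chapter:

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, n_mels=26, n_ceps=13,
         frame_len=400, hop=160):
    """Minimal MFCC sketch (illustrative, not a tuned front end)."""
    # Pre-emphasis boosts high frequencies.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len) + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)
    # Power spectrum of each (zero-padded) frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Triangular mel filterbank.
    def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
    def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    log_energy = np.log(power @ fbank.T + 1e-10)
    # DCT-II decorrelates the log filterbank energies.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2 * n_mels)))
    return log_energy @ dct.T

# Example: MFCCs of one second of a synthetic 440 Hz tone.
t = np.arange(16000) / 16000
feats = mfcc(np.sin(2 * np.pi * 440 * t))
```

Each utterance becomes a sequence of 13-dimensional cepstral vectors (here 98 frames), which is the kind of feature matrix fed to the DBN for language classification.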


10.29007/8h3z ◽  
2018 ◽  
Author(s):  
Sai Prabhakar Pandi Selvaraj ◽  
Manuela Veloso ◽  
Stephanie Rosenthal

Advances in state-of-the-art techniques, including convolutional neural networks (CNNs), have led to improved perception in autonomous robots. However, these new techniques make a robot's decision-making process obscure, even for experts. Our goal is to automatically generate natural language explanations of a robot's perception-based inferences in order to help people understand which features contribute to these classification predictions. Generating natural language explanations is particularly challenging for perception and other high-dimensional classification tasks because 1) we lack a mapping from features to language and 2) there are a large number of features which could be explained. We present a novel approach to generating explanations that first finds the important features that most affect the classification prediction, and then utilizes a secondary detector, capable of identifying and labeling multiple parts of the features, to label only those important features. These labels serve as the natural language groundings that we use in our explanations. We demonstrate our explanation algorithm's ability on the floor-identification classifier of our mobile service robot.
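The two-stage recipe (find the features that most affect the prediction, then label only those via a secondary detector) can be illustrated with a toy occlusion-based importance map. Everything here is a hypothetical stand-in, not the paper's CNN pipeline: the "classifier" is a brightness heuristic, and `detect_label` plays the role of the secondary detector.

```python
import numpy as np

rng = np.random.default_rng(1)

def classifier_score(img):
    """Stand-in for a CNN's class score: responds only to brightness
    in the image's top-left 8x8 region (purely illustrative)."""
    return img[:8, :8].mean()

def occlusion_importance(img, patch=4):
    """Score drop when each patch is masked = that patch's importance."""
    base = classifier_score(img)
    heat = np.zeros((img.shape[0] // patch, img.shape[1] // patch))
    for i in range(heat.shape[0]):
        for j in range(heat.shape[1]):
            masked = img.copy()
            masked[i * patch:(i + 1) * patch,
                   j * patch:(j + 1) * patch] = 0.0
            heat[i, j] = base - classifier_score(masked)
    return heat

def detect_label(i, j):
    """Hypothetical secondary detector: names the region a patch lies in."""
    return "railing" if i < 2 and j < 2 else "floor tile"

img = rng.random((16, 16))
img[:8, :8] += 1.0                       # bright region drives the score
heat = occlusion_importance(img)
i, j = np.unravel_index(heat.argmax(), heat.shape)
explanation = f"The prediction relied most on the {detect_label(i, j)} region."
```

Only the most important patch is sent to the detector for labeling, which mirrors the paper's motivation: with too many features to explain, label just the ones that matter.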


AI Magazine ◽  
2011 ◽  
Vol 32 (2) ◽  
pp. 42 ◽  
Author(s):  
Anton Leuski ◽  
David Traum

NPCEditor is a system for building a natural language processing component for virtual humans capable of engaging a user in spoken dialog on a limited domain. It uses statistical language classification technology for mapping from a user’s text input to system responses. NPCEditor provides a user-friendly editor for creating effective virtual humans quickly. It has been deployed as a part of various virtual human systems in several applications.
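NPCEditor's actual relevance model is not reproduced here, but the basic mapping it performs, from a user's text input to the best authored response with an off-domain fallback, can be sketched with cosine similarity over token counts. The question/answer pairs and threshold below are illustrative assumptions:

```python
from collections import Counter
import math

# Toy Q/A pairs a virtual-human author might write (hypothetical domain).
pairs = [
    ("what is your name", "I'm Sergeant Star."),
    ("where are you from", "I was built at the lab."),
    ("what do you do", "I answer questions about the Army."),
]

def tokens(text):
    return text.lower().split()

def score(query, question):
    """Cosine similarity over token counts -- a simple stand-in for
    NPCEditor's statistical language classification."""
    q, d = Counter(tokens(query)), Counter(tokens(question))
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0

def respond(query, threshold=0.2):
    """Return the answer whose question best matches the query."""
    best = max(pairs, key=lambda p: score(query, p[0]))
    if score(query, best[0]) < threshold:
        return "Sorry, I can't answer that."   # off-domain fallback
    return best[1]

reply = respond("tell me your name")
```

Classifying into a fixed set of authored responses, rather than generating text, is what keeps such limited-domain virtual humans both quick to author and robust to paraphrased input.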


2011 ◽  
Vol 37 (4) ◽  
pp. 689-698 ◽  
Author(s):  
Simon J. Greenhill

The Levenshtein distance is a simple distance metric derived from the number of edit operations needed to transform one string into another. This metric has received recent attention as a means of automatically classifying languages into genealogical subgroups. In this article I test the performance of the Levenshtein distance for classifying languages by subsampling three language subsets from a large database of Austronesian languages. Comparing the classification proposed by the Levenshtein distance to that of the comparative method shows that the Levenshtein classification is correct only 40% of the time. Standardizing the orthography increases the performance, but only to a maximum of 65% accuracy within language subgroups. The accuracy of the Levenshtein classification decreases rapidly with phylogenetic distance, failing to discriminate homology from chance similarity across distantly related languages. This poor performance suggests the need for more linguistically nuanced methods for automated language classification tasks.
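The metric under test is easy to state precisely. A minimal sketch follows, with the length-normalized variant commonly used when comparing word lists across languages; the example word forms are illustrative, not drawn from the article's Austronesian database:

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def normalized(a, b):
    """Length-normalized variant, so long words aren't penalized."""
    return levenshtein(a, b) / max(len(a), len(b), 1)

# A cognate-looking pair vs. an unrelated form (illustrative only).
d1 = normalized("lima", "rima")   # one substitution: distance 0.25
d2 = normalized("lima", "kettu")  # no shared material: distance 1.0
```

The article's point is visible even here: the metric counts surface character edits, so it cannot distinguish a regular sound correspondence from an accidental resemblance, which is why accuracy collapses at greater phylogenetic distances.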


2021 ◽  
Vol 24 (2) ◽  
pp. 1740-1747
Author(s):  
Anton Leuski ◽  
David Traum

NPCEditor is a system for building a natural language processing component for virtual humans capable of engaging a user in spoken dialog on a limited domain. It uses statistical language classification technology for mapping from a user's text input to system responses. NPCEditor provides a user-friendly editor for creating effective virtual humans quickly. It has been deployed as a part of various virtual human systems in several applications.

