Computational Linguistics

On Learning Interpreted Languages with Recurrent Models

Computational Linguistics ◽

10.1162/coli_a_00431 ◽

2022 ◽

pp. 1-13

Author(s):

Denis Paperno

Keyword(s):

Natural Language ◽

Data Processing ◽

Syntactic Structure ◽

Neural Nets ◽

Training Data ◽

Sequential Data ◽

Extensive Training ◽

Formal Syntax ◽

Compositional Interpretation

Abstract Can recurrent neural nets, inspired by human sequential data processing, learn to understand language? We construct simplified datasets reflecting core properties of natural language as modeled in formal syntax and semantics: recursive syntactic structure and compositionality. We find LSTM and GRU networks to generalise to compositional interpretation well, but only in the most favorable learning settings, with a well-paced curriculum, extensive training data, and left-to-right (but not right-to-left) composition.

Download Full-text

Deep Learning for Text Style Transfer: A Survey

Computational Linguistics ◽

10.1162/coli_a_00426 ◽

2021 ◽

pp. 1-51

Author(s):

Di Jin ◽

Zhijing Jin ◽

Zhiting Hu ◽

Olga Vechtomova ◽

Rada Mihalcea

Keyword(s):

Deep Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Neural Models ◽

Style Transfer ◽

Language Generation ◽

Parallel Data ◽

The Rich ◽

Significant Attention

Abstract Text style transfer is an important task in natural language generation, which aims to control certain attributes in the generated text, such as politeness, emotion, humor, and many others. It has a long history in the field of natural language processing, and recently has re-gained significant attention thanks to the promising performance brought by deep neural models. In this paper, we present a systematic survey of the research on neural text style transfer, spanning over 100 representative articles since the first neural text style transfer work in 2017. We discuss the task formulation, existing datasets and subtasks, evaluation, as well as the rich methodologies in the presence of parallel and non-parallel data. We also provide discussions on a variety of important topics regarding the future development of this task.

Download Full-text

Linguistic Parameters of Spontaneous Speech for identifying Mild Cognitive Impairment and Alzheimer’s Disease

Computational Linguistics ◽

10.1162/coli_a_00428 ◽

2021 ◽

pp. 1-34

Author(s):

Veronika Vincze ◽

Martina Katalin Szabó ◽

Ildikó Hoffmann ◽

László Tóth ◽

Magdolna Pákáski ◽

...

Keyword(s):

Alzheimer’S Disease ◽

Alzheimer's Disease ◽

Cognitive Impairment ◽

Mild Cognitive Impairment ◽

Spontaneous Speech ◽

Healthy Controls ◽

Demographic Information ◽

Linguistic Features ◽

Significance Level ◽

Speech Transcripts

Abstract In this paper, we seek to automatically identify Hungarian patients suffering from mild cognitive impairment (MCI) or mild Alzheimer’s Disease (mAD) based on their speech transcripts, focusing only on linguistic features. In addition to the features examined in our earlier study, we introduce syntactic, semantic and pragmatic features of spontaneous speech that might affect the detection of dementia. In order to ascertain the most useful features for distinguishing healthy controls, MCI patients and mAD patients, we will carry out a statistical analysis of the data and investigate the significance level of the extracted features among various speaker group pairs and for various speaking tasks. In the second part of the paper, we use this rich feature set as a basis for an effective discrimination among the three speaker groups. In our machine learning experiments, we will analyze the efficacy of each feature group separately. Our model which uses all the features achieves competitive scores, either with or without demographic information (3-class accuracy values: 68–70%, 2-class accuracy values: 77.3–80%). We also analyze how different data recording scenarios affect linguistic features and how they can be productively used when distinguishing MCI patients from healthy controls.

Download Full-text

Novelty Detection: A Perspective from Natural Language Processing

Computational Linguistics ◽

10.1162/coli_a_00429 ◽

2021 ◽

pp. 1-42

Author(s):

Tirthankar Ghosal ◽

Tanik Saikh ◽

Tameesh Biswas ◽

Asif Ekbal ◽

Pushpak Bhattacharyya

Keyword(s):

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Novelty Detection ◽

Multiple Source ◽

Semantic Level ◽

New Information ◽

Textual Entailment ◽

Document Level ◽

The Web

Abstract The quest for new information is an inborn human trait and has always been quintessential for human survival and progress. Novelty drives curiosity, which in turn drives innovation. In Natural Language Processing (NLP), Novelty Detection refers to finding text that has some new information to offer with respect to whatever is earlier seen or known. With the exponential growth of information all across the web, there is an accompanying menace of redundancy. A considerable portion of the web contents are duplicates, and we need efficient mechanisms to retain new information and filter out redundant ones. However, detecting redundancy at the semantic level and identifying novel text is not straightforward because the text may have less lexical overlap yet convey the same information. On top of that, non-novel/redundant information in a document may have assimilated from multiple source documents, not just one. The problem surmounts when the subject of the discourse is documents, and numerous prior documents need to be processed to ascertain the novelty/non-novelty of the current one in concern. In this work, we build upon our earlier investigations for document-level novelty detection and present a comprehensive account of our efforts towards the problem. We explore the role of pre-trained Textual Entailment (TE) models to deal with multiple source contexts and present the outcome of our current investigations. We argue that a multi-premise entailment task is one close approximation towards identifying semantic-level non-novelty. Our recent approach either performs comparably or achieves significant improvement over the latest reported results on several datasets and across several related tasks (paraphrasing, plagiarism, rewrite). We critically analyze our performance with respect to the existing state-of-the-art and show the superiority and promise of our approach for future investigations. We also present our enhanced dataset TAP-DLND 2.0 and several baselines to the community for further researchon document-level novelty detection.

Download Full-text

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

Computational Linguistics ◽

10.1162/coli_a_00430 ◽

2021 ◽

pp. 1-12

Author(s):

Manaal Faruqui ◽

Dilek Hakkani-Tür

Keyword(s):

Speech Recognition ◽

Natural Language ◽

Automatic Speech Recognition ◽

Daily Life ◽

Natural Language Understanding ◽

Speech Understanding ◽

Language Understanding ◽

Dialog Systems ◽

Research Areas ◽

The World

Abstract As more users across the world are interacting with dialog agents in their daily life, there is a need for better speech understanding that calls for renewed attention to the dynamics between research in automatic speech recognition (ASR) and natural language understanding (NLU). We briefly review these research areas and lay out the current relationship between them. In light of the observations we make in this paper, we argue that (1) NLU should be cognizant of the presence of ASR models being used upstream in a dialog system’s pipeline, (2) ASR should be able to learn from errors found in NLU, (3) there is a need for end-to-end datasets that provide semantic annotations on spoken input, (4) there should be stronger collaboration between ASR and NLU research communities.

Download Full-text

Obituary: Martin Kay

Computational Linguistics ◽

10.1162/coli_a_00424 ◽

2021 ◽

pp. 1-4

Author(s):

Ronald M. Kaplan ◽

Hans Uszkoreit

Download Full-text

Improved N-Best Extraction with an Evaluation on Language Data

Computational Linguistics ◽

10.1162/coli_a_00427 ◽

2021 ◽

pp. 1-35

Author(s):

Johanna Björklund ◽

Frank Drewes ◽

Anna Jonsson

Keyword(s):

Language Processing ◽

State Of The Art ◽

Search Space ◽

Data Sets ◽

Weighted Tree ◽

Original Algorithm ◽

Software Toolkit ◽

Minimal Weight ◽

Language Data ◽

Memory Efficient

Abstract We show that a previously proposed algorithm for the N-best trees problem can be made more efficient by changing how it arranges and explores the search space. Given an integer N and a weighted tree automaton (wta) M over the tropical semiring, the algorithm computes N trees of minimal weight with respect to M. Compared to the original algorithm, the modifications increase the laziness of the evaluation strategy, which makes the new algorithm asymptotically more efficient than its predecessor. The algorithm is implemented in the software Betty, and compared to the state-of-the-art algorithm for extracting the N best runs, implemented in the software toolkit Tiburon. The data sets used in the experiments are wtas resulting from real-world natural language processing tasks, as well as artificially created wtas with varying degrees of nondeterminism. We find that Betty outperforms Tiburon on all tested data sets with respect to running time, while Tiburon seems to be the more memory-efficient choice.

Download Full-text

To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Computational Linguistics ◽

10.1162/coli_a_00425 ◽

2021 ◽

pp. 1-38

Author(s):

Gözde Gül Şahin

Keyword(s):

Language Models ◽

Semantic Role ◽

Semantic Role Labeling ◽

Dependency Parsing ◽

Low Resource ◽

Part Of Speech Tagging ◽

High Resource ◽

Part Of Speech ◽

Augmentation Techniques ◽

Speech Tagging

Abstract Data-hungry deep neural networks have established themselves as the defacto standard for many NLP tasks including the traditional sequence tagging ones. Despite their state-of-the-art performance on high-resource languages, they still fall behind of their statistical counter-parts in low-resource scenarios. One methodology to counter attack this problem is text augmentation, i.e., generating new synthetic training data points from existing data. Although NLP has recently witnessed a load of textual augmentation techniques, the field still lacks a systematic performance analysis on a diverse set of languages and sequence tagging tasks. To fill this gap, we investigate three categories of text augmentation methodologies which perform changes on the syntax (e.g., cropping sub-sentences), token (e.g., random word insertion) and character (e.g., character swapping) levels.We systematically compare the methods on part-of-speech tagging, dependency parsing and semantic role labeling for a diverse set of language families using various models including the architectures that rely on pretrained multilingual contextualized language models such as mBERT. Augmentation most significantly improves dependency parsing, followed by part-of-speech tagging and semantic role labeling. We find the experimented techniques to be effective on morphologically rich languages in general rather than analytic languages such as Vietnamese. Our results suggest that the augmentation techniques can further improve over strong baselines based on mBERT, especially for dependency parsing. We identify the character-level methods as the most consistent performers, while synonym replacement and syntactic augmenters provide inconsistent improvements. Finally, we discuss that the results most heavily depend on the task, language pair (e.g., syntactic-level techniques mostly benefit higher-level tasks and morphologically richer languages), and the model type (e.g., token-level augmentation provide significant improvements for BPE, while character-level ones give generally higher scores for char and mBERT based models).

Download Full-text

Probing Classifiers: Promises, Shortcomings, and Advances

Computational Linguistics ◽

10.1162/coli_a_00422 ◽

2021 ◽

pp. 1-12

Author(s):

Yonatan Belinkov

Keyword(s):

Neural Network ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing ◽

Deep Neural Network ◽

Network Models ◽

Neural Network Models ◽

Linguistic Property

Abstract Probing classifiers have emerged as one of the prominent methodologies for interpreting and analyzing deep neural network models of natural language processing. The basic idea is simple —a classifier is trained to predict some linguistic property from a model's representations—and has been used to examine a wide variety of models and properties. However, recent studies have demonstrated various methodological limitations of this approach. This article critically reviews the probing classifiers framework, highlighting their promises, shortcomings, and advances.

Download Full-text

Natural Language Processing. A Machine Learning Perspective

Computational Linguistics ◽

10.1162/coli_r_00423 ◽

2021 ◽

pp. 1-4

Author(s):

Julia Ive

Keyword(s):

Machine Learning ◽

Natural Language Processing ◽

Natural Language ◽

Language Processing

Download Full-text

Computational Linguistics
Latest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Mit Press

On Learning Interpreted Languages with Recurrent Models

Deep Learning for Text Style Transfer: A Survey

Linguistic Parameters of Spontaneous Speech for identifying Mild Cognitive Impairment and Alzheimer’s Disease

Novelty Detection: A Perspective from Natural Language Processing

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

Obituary: Martin Kay

Improved N-Best Extraction with an Evaluation on Language Data

To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Probing Classifiers: Promises, Shortcomings, and Advances

Natural Language Processing. A Machine Learning Perspective

Export Citation Format

Computational LinguisticsLatest Publications

TOTAL DOCUMENTS

H-INDEX

Published By Mit Press

On Learning Interpreted Languages with Recurrent Models

Deep Learning for Text Style Transfer: A Survey

Linguistic Parameters of Spontaneous Speech for identifying Mild Cognitive Impairment and Alzheimer’s Disease

Novelty Detection: A Perspective from Natural Language Processing

Revisiting the Boundary between ASR and NLU in the Age of Conversational Dialog Systems

Obituary: Martin Kay

Improved N-Best Extraction with an Evaluation on Language Data

To Augment or Not to Augment? A Comparative Study on Text Augmentation Techniques for Low-Resource NLP

Probing Classifiers: Promises, Shortcomings, and Advances

Natural Language Processing. A Machine Learning Perspective

Computational Linguistics
Latest Publications