Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

2021 · Vol 9 · pp. 160-175
Author(s): Yanai Elazar, Shauli Ravfogel, Alon Jacovi, Yoav Goldberg

Abstract: A growing body of work makes use of probing to investigate the workings of neural models, which are often considered black boxes. Recently, an ongoing debate has emerged about the limitations of the probing paradigm. In this work, we point out the inability to infer behavioral conclusions from probing results, and offer an alternative method that focuses on how the information is being used, rather than on what information is encoded. Our method, Amnesic Probing, follows the intuition that the utility of a property for a given task can be assessed by measuring the influence of a causal intervention that removes it from the representation. Equipped with this new analysis tool, we can ask questions that were not possible before; for example, is part-of-speech information important for word prediction? We perform a series of analyses on BERT to answer these types of questions. Our findings demonstrate that conventional probing performance is not correlated with task importance, and we call for increased scrutiny of claims that draw behavioral or causal conclusions from probing results.
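The intervention at the heart of the method can be sketched with a linear probe: train a classifier for the property on the representations, project the representations onto the probe's nullspace so the property is no longer linearly decodable, and compare downstream behavior before and after. Below is a minimal single-round sketch in Python; the paper's actual removal operation, INLP, iterates this step until the probe is at chance, and the function names here are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def nullspace_projector(W):
    """Return P such that x @ P removes the component of x lying in
    the row space of W (the directions the probe relies on)."""
    _, s, vt = np.linalg.svd(W, full_matrices=False)
    basis = vt[s > 1e-10]          # orthonormal basis of the row space
    return np.eye(W.shape[1]) - basis.T @ basis

def amnesic_intervention(reps, property_labels):
    """One round of property removal: fit a linear probe for the
    property, then project the representations onto its nullspace."""
    probe = LogisticRegression(max_iter=1000).fit(reps, property_labels)
    return reps @ nullspace_projector(probe.coef_)

# Usage sketch: compare a model's word-prediction accuracy on `reps`
# versus on `amnesic_intervention(reps, pos_tags)`; a large drop
# suggests the property is actually used, not merely encoded.
```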

2021 · Vol 72 · pp. 1385-1470
Author(s): Alexandra N. Uma, Tommaso Fornaciari, Dirk Hovy, Silviu Paun, Barbara Plank, ...

Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer evidence that humans disagree, from objective tasks such as part-of-speech tagging to more subjective tasks such as classifying an image or deciding whether a proposition follows from certain premises. While most learning in artificial intelligence (AI) still relies on the assumption that a single (gold) interpretation exists for each item, a growing body of research aims to develop learning methods that do not rely on this assumption. In this survey, we review the evidence for disagreement on NLP and CV tasks, focusing on tasks for which substantial datasets containing this information have been created. We discuss the most popular approaches to training models from datasets containing multiple judgments that are potentially in disagreement. We systematically compare these approaches by training them on each of the available datasets, considering several ways to evaluate the resulting models. Finally, we discuss the results in depth, focusing on four key research questions, and assess how the type of evaluation and the characteristics of a dataset determine the answers to these questions. Our results suggest, first of all, that even if we abandon the assumption of a gold standard, it is still essential to reach a consensus on how to evaluate models, because the relative performance of the various training methods is critically affected by the chosen form of evaluation. Second, we observed a strong dataset effect: with substantial datasets that provide many judgments by high-quality coders for each item, training directly with soft labels achieved better results than training from aggregated or even gold labels, under both hard and soft evaluation. When those conditions do not hold, however, leveraging both gold and soft labels generally achieved the best results under hard evaluation. All datasets and models employed in this paper are freely available as supplementary materials.
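The "training directly with soft labels" option amounts to minimizing cross-entropy against each item's empirical distribution of annotator judgments instead of a single aggregated label. A minimal PyTorch sketch with hypothetical tensors:

```python
import torch
import torch.nn.functional as F

def soft_label_loss(logits, judgment_counts):
    """Cross-entropy against the empirical distribution of annotator
    judgments rather than a single aggregated (gold) label.

    logits:          (batch, n_classes) model outputs
    judgment_counts: (batch, n_classes) raw annotator counts per class
    """
    soft_targets = judgment_counts / judgment_counts.sum(dim=1, keepdim=True)
    log_probs = F.log_softmax(logits, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()

# Example: 3 annotators split 2-1 on item 0 and agree 3-0 on item 1.
logits = torch.randn(2, 2, requires_grad=True)
counts = torch.tensor([[2.0, 1.0], [3.0, 0.0]])
loss = soft_label_loss(logits, counts)
loss.backward()
```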


2017 · pp. 35-46
Author(s): Irene Doval

This paper reviews the author's experience of tokenizing and POS tagging a bilingual parallel corpus, the PaGeS Corpus, consisting mostly of German and Spanish fictional texts, as part of an ongoing process of annotating the corpus with part-of-speech information. This study discusses the specific problems encountered so far: on the one hand, tagging performance degrades significantly when taggers are applied to fictional data; on the other, pre-existing annotation schemes are all language-specific. To further improve accuracy during post-editing, the author has developed a common tagset and identified major error patterns.
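A common tagset of this kind amounts to a mapping from each language-specific tag inventory into shared coarse categories. The sketch below is purely illustrative: the tag names (STTS-style for German, EAGLES-style for Spanish) and the mapping are hypothetical simplifications, not the PaGeS scheme itself.

```python
# Hypothetical mappings into a shared coarse tagset; the actual PaGeS
# scheme and tag names may differ.
STTS_TO_COMMON = {        # German (STTS-style tags)
    "NN": "NOUN", "NE": "PROPN", "VVFIN": "VERB", "ADJA": "ADJ",
}
SPANISH_TO_COMMON = {     # Spanish (EAGLES-style tags)
    "NC": "NOUN", "NP": "PROPN", "VMI": "VERB", "AQ": "ADJ",
}

def to_common(tag, lang):
    table = STTS_TO_COMMON if lang == "de" else SPANISH_TO_COMMON
    return table.get(tag, "X")    # "X" flags tags needing manual review

assert to_common("NN", "de") == to_common("NC", "es") == "NOUN"
```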


2016 · Vol 105 (1) · pp. 63-76
Author(s): Theresa Guinard

Abstract: Morphological analysis (finding the component morphemes of a word and tagging the morphemes with part-of-speech information) is a useful preprocessing step in many natural language processing applications, especially for synthetic languages. Compound words in the constructed language Esperanto are formed by straightforward agglutination, but for many words there is more than one possible sequence of component morphemes; usually, one segmentation is more semantically probable than the others. This paper presents a modified n-gram Markov model that finds the most probable segmentation of any Esperanto word, where the model's states represent morpheme part-of-speech and semantic classes. The overall segmentation accuracy was over 98% on a set of presegmented dictionary words.
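The underlying dynamic program can be sketched as a Viterbi search over all splits of a word into known morphemes, with transitions scored by n-gram probabilities over morpheme classes. A simplified bigram version with a tiny hypothetical lexicon (the paper's model also distinguishes semantic classes and uses richer statistics):

```python
import math

# Hypothetical morpheme lexicon and class-bigram probabilities; the
# paper's classes combine part-of-speech and semantic information.
LEXICON = {"hund": "ROOT", "o": "NOUN_END", "ej": "SUFFIX", "mal": "PREFIX"}
BIGRAM = {("<s>", "ROOT"): 0.5, ("ROOT", "NOUN_END"): 0.6,
          ("ROOT", "SUFFIX"): 0.3, ("SUFFIX", "NOUN_END"): 0.7,
          ("<s>", "PREFIX"): 0.2, ("PREFIX", "ROOT"): 0.8}

def best_segmentation(word):
    """Viterbi over all splits of `word` into lexicon morphemes,
    scored by class-bigram log-probabilities."""
    # best[i] = (log-prob, segmentation, last class) for word[:i]
    best = {0: (0.0, [], "<s>")}
    for i in range(1, len(word) + 1):
        for j in range(i):
            if j not in best or word[j:i] not in LEXICON:
                continue
            lp, seg, prev = best[j]
            cls = LEXICON[word[j:i]]
            p = BIGRAM.get((prev, cls), 1e-6)   # smooth unseen transitions
            cand = (lp + math.log(p), seg + [word[j:i]], cls)
            if i not in best or cand[0] > best[i][0]:
                best[i] = cand
    return best.get(len(word), (float("-inf"), None, None))[1]

print(best_segmentation("hundejo"))   # -> ['hund', 'ej', 'o']
```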


2021 · Vol 2021 · pp. 1-11
Author(s): Xiaoqiang Chi, Yang Xiang

Paraphrase generation is an essential yet challenging task in natural language processing. Neural-network-based approaches to paraphrase generation have achieved remarkable success in recent years, but previous approaches ignore linguistic knowledge such as part-of-speech information, regardless of its availability. The underlying assumption is that neural nets can learn such information implicitly when given sufficient data; however, it is difficult for them to do so when data are scarce. In this work, we probe the efficacy of explicit part-of-speech information for paraphrase generation in low-resource scenarios. To this end, we devise three mechanisms for fusing part-of-speech information within the sequence-to-sequence framework. We demonstrate the utility of part-of-speech information in low-resource paraphrase generation through extensive experiments on multiple datasets of varying sizes and genres.
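One way to realize such a fusion mechanism is to embed POS tags separately and concatenate them with the word embeddings at the encoder input. The sketch below shows only this input-level variant; the dimensions and layer choices are hypothetical, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PosFusedEncoder(nn.Module):
    """Encoder that fuses POS information at the input layer by
    concatenating word and POS embeddings (one of several possible
    fusion points in a sequence-to-sequence model)."""
    def __init__(self, vocab_size, pos_size, word_dim=256, pos_dim=32,
                 hidden=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(pos_size, pos_dim)
        self.rnn = nn.GRU(word_dim + pos_dim, hidden, batch_first=True)

    def forward(self, word_ids, pos_ids):
        fused = torch.cat([self.word_emb(word_ids),
                           self.pos_emb(pos_ids)], dim=-1)
        return self.rnn(fused)   # (outputs, final hidden) for the decoder

# Usage sketch with hypothetical sizes:
enc = PosFusedEncoder(vocab_size=10000, pos_size=46)
out, h = enc(torch.randint(0, 10000, (8, 20)), torch.randint(0, 46, (8, 20)))
```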


2002 · Vol 8 (2-3) · pp. 193-207
Author(s): Tokunaga Takenobu, Kimura Kenji, Ogibayashi Hironori, Tanaka Hozumi

This paper explores the effectiveness of index terms more complex than the single words used in conventional information retrieval systems. Retrieval is done in two phases: in the first, a conventional retrieval method (the Okapi system) is used; in the second, complex index terms, such as syntactic relations and single words with part-of-speech information, are introduced to rerank the results of the first phase. We evaluated the effectiveness of the different types of index terms in experiments using the TREC-7 test collection and 50 queries; retrieval effectiveness improved for 32 of the 50 queries. Based on this investigation, we then introduce a method to select effective index terms using a decision tree. Further experiments with the same test collection showed that retrieval effectiveness improved for 25 of the 50 queries.
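The two-phase design is easy to sketch: phase one returns candidates with Okapi (BM25-style) scores, and phase two rescores each candidate by its overlap with the complex index terms extracted from the query. The extraction function below is a crude stand-in for real syntactic-relation and POS-tagged terms; everything here is hypothetical.

```python
def extract_complex_terms(text):
    """Placeholder for complex index-term extraction, e.g. syntactic
    head-dependent pairs and (word, POS) tokens; hypothetical here."""
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))   # crude word-pair stand-in

def rerank(query, first_phase, weight=0.5):
    """Rerank phase-one results (doc_id, text, bm25_score) by mixing
    the original score with complex-term overlap against the query."""
    q_terms = extract_complex_terms(query)
    rescored = []
    for doc_id, text, bm25 in first_phase:
        overlap = len(q_terms & extract_complex_terms(text))
        rescored.append((bm25 + weight * overlap, doc_id))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in rescored]
```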


Author(s): Ting Lu, Yan Xiang, Junge Liang, Li Zhang, Mingfang Zhang

The grand challenge of cross-domain sentiment analysis is that classifiers trained in a specific domain are very sensitive to the discrepancy between domains: a sentiment classifier trained on the source domain usually performs poorly on the target domain. One of the main strategies for solving this problem is the pivot-based strategy, in which the feature representation is an important component. However, previous pivot-based models did not use part-of-speech information to guide the learning of feature representations and feature mappings. We therefore present a fused part-of-speech vector and attention-based model (FAM). In our model, we fuse part-of-speech vectors and feature word embeddings as the representation of features, giving deep semantics to the mapped features, and we adopt a multi-head attention mechanism to train the cross-domain sentiment classifier and capture the connections between different features. The results of 12 groups of comparative experiments on the Amazon dataset demonstrate that our model outperforms all baseline models in this paper.
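The model's two main ingredients can be sketched directly: concatenate part-of-speech vectors with word embeddings to form the feature representation, then apply multi-head self-attention before classification. A hypothetical PyTorch sketch (dimensions, pooling, and layer choices are illustrative, not the authors' exact architecture):

```python
import torch
import torch.nn as nn

class FAMSketch(nn.Module):
    """Fused POS-and-attention classifier sketch: concatenate word and
    POS embeddings, apply multi-head self-attention, pool, classify."""
    def __init__(self, vocab, n_pos, word_dim=96, pos_dim=32, heads=4):
        super().__init__()
        dim = word_dim + pos_dim                  # 128, divisible by heads
        self.word_emb = nn.Embedding(vocab, word_dim)
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.clf = nn.Linear(dim, 2)              # positive / negative

    def forward(self, word_ids, pos_ids):
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], -1)
        attended, _ = self.attn(x, x, x)          # self-attention over tokens
        return self.clf(attended.mean(dim=1))     # pooled sentence logits

# Usage sketch with hypothetical sizes:
model = FAMSketch(vocab=5000, n_pos=46)
logits = model(torch.randint(0, 5000, (4, 30)), torch.randint(0, 46, (4, 30)))
```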

