Transactions of the Association for Computational Linguistics

Supertagging the Long Tail with Tree-Structured Decoding of Complex Categories

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00364 ◽

2021 ◽

Vol 9 ◽

pp. 243-260

Author(s):

Jakob Prange ◽

Nathan Schneider ◽

Vivek Srikumar

Keyword(s):

Internal Structure ◽

State Of The Art ◽

High Accuracy ◽

Structured Prediction ◽

Long Tail ◽

Constructive Models ◽

Sizeable Fraction ◽

The Many ◽

Syntactic Derivation ◽

Prior State

Abstract Although current CCG supertaggers achieve high accuracy on the standard WSJ test set, few systems make use of the categories’ internal structure that will drive the syntactic derivation during parsing. The tagset is traditionally truncated, discarding the many rare and complex category types in the long tail. However, supertags are themselves trees. Rather than give up on rare tags, we investigate constructive models that account for their internal structure, including novel methods for tree-structured prediction. Our best tagger is capable of recovering a sizeable fraction of the long-tail supertags and even generates CCG categories that have never been seen in training, while approximating the prior state of the art in overall tag accuracy with fewer parameters. We further investigate how well different approaches generalize to out-of-domain evaluation sets.


On the Relationships Between the Grammatical Genders of Inanimate Nouns and Their Co-Occurring Adjectives and Verbs

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00355 ◽

2021 ◽

Vol 9 ◽

pp. 139-159

Author(s):

Adina Williams ◽

Ryan Cotterell ◽

Lawrence Wolf-Sonkin ◽

Damián Blasi ◽

Hanna Wallach

Keyword(s):

Information Theory ◽

Significant Relationship ◽

Large Scale ◽

Significant Relationships ◽

Direct Objects ◽

Future Work

Abstract We use large-scale corpora in six different gendered languages, along with tools from NLP and information theory, to test whether there is a relationship between the grammatical genders of inanimate nouns and the adjectives used to describe those nouns. For all six languages, we find that there is a statistically significant relationship. We also find that there are statistically significant relationships between the grammatical genders of inanimate nouns and the verbs that take those nouns as direct objects, as indirect objects, and as subjects. We defer deeper investigation of these relationships for future work.


Morphology Matters: A Multilingual Language Modeling Analysis

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00365 ◽

2021 ◽

Vol 9 ◽

pp. 261-276

Author(s):

Hyunji Hayley Park ◽

Katherine J. Zhang ◽

Coleman Haley ◽

Kenneth Steimel ◽

Han Liu ◽

...

Keyword(s):

Language Modeling ◽

Inflectional Morphology ◽

Morphological Complexity ◽

Bible Translations ◽

Finite State Transducers ◽

Modeling Analysis ◽

Finite State ◽

The Impact

Abstract Prior studies in multilingual language modeling (e.g., Cotterell et al., 2018; Mielke et al., 2019) disagree on whether or not inflectional morphology makes languages harder to model. We attempt to resolve the disagreement and extend those studies. We compile a larger corpus of 145 Bible translations in 92 languages and a larger number of typological features. We fill in missing typological data for several languages and consider corpus-based measures of morphological complexity in addition to expert-produced typological features. We find that several morphological measures are significantly associated with higher surprisal when LSTM models are trained with BPE-segmented data. We also investigate linguistically motivated subword segmentation strategies like Morfessor and Finite-State Transducers (FSTs) and find that these segmentation strategies yield better performance and reduce the impact of a language’s morphology on language modeling.


WikiAsp: A Dataset for Multi-domain Aspect-based Summarization

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00362 ◽

2021 ◽

Vol 9 ◽

pp. 211-225

Author(s):

Hiroaki Hayashi ◽

Prashant Budania ◽

Peng Wang ◽

Chris Ackerson ◽

Raj Neervannan ◽

...

Keyword(s):

Large Scale ◽

Open Domain ◽

Domain Specific ◽

Product Features ◽

Points Of Interest ◽

Large Scale Dataset

Abstract Aspect-based summarization is the task of generating focused summaries based on specific points of interest. Such summaries aid efficient analysis of text, such as quickly understanding reviews or opinions from different angles. However, due to large differences in the type of aspects for different domains (e.g., sentiment, product features), the development of previous models has tended to be domain-specific. In this paper, we propose WikiAsp, a large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization. Specifically, we build the dataset using Wikipedia articles from 20 different domains, using the section titles and boundaries of each article as a proxy for aspect annotation. We propose several straightforward baseline models for this task and conduct experiments on the dataset. Results highlight key challenges that existing summarization models face in this setting, such as proper pronoun handling of quoted sources and consistent explanation of time-sensitive events.


Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00359 ◽

2021 ◽

Vol 9 ◽

pp. 160-175

Author(s):

Yanai Elazar ◽

Shauli Ravfogel ◽

Alon Jacovi ◽

Yoav Goldberg

Keyword(s):

Analysis Tool ◽

Neural Models ◽

Word Prediction ◽

Ongoing Debate ◽

Part Of Speech ◽

Alternative Method ◽

Black Boxes ◽

Growing Body ◽

Speech Information ◽

Causal Intervention

Abstract A growing body of work makes use of probing in order to investigate the working of neural models, often considered black boxes. Recently, an ongoing debate emerged surrounding the limitations of the probing paradigm. In this work, we point out the inability to infer behavioral conclusions from probing results, and offer an alternative method that focuses on how the information is being used, rather than on what information is encoded. Our method, Amnesic Probing, follows the intuition that the utility of a property for a given task can be assessed by measuring the influence of a causal intervention that removes it from the representation. Equipped with this new analysis tool, we can ask questions that were not possible before, for example, is part-of-speech information important for word prediction? We perform a series of analyses on BERT to answer these types of questions. Our findings demonstrate that conventional probing performance is not correlated to task importance, and we call for increased scrutiny of claims that draw behavioral or causal conclusions from probing results.
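The abstract does not say how a property is removed from a representation. As a rough illustration only, one simple way to erase a linearly encoded property is to project each vector onto the subspace orthogonal to a direction that predicts the property. The sketch below assumes a single such direction `w` is already known; the function names `remove_direction` and `dot` are hypothetical and this is not the authors' implementation.

```python
def dot(a, b):
    """Plain dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def remove_direction(vectors, w):
    """Project every vector onto the subspace orthogonal to w.

    Assumption: w is a direction along which some property
    (e.g. a part-of-speech distinction) is linearly readable.
    After projection, no linear readout along w remains.
    """
    norm_sq = dot(w, w)
    if norm_sq == 0:
        raise ValueError("w must be a non-zero vector")
    projected = []
    for v in vectors:
        coeff = dot(v, w) / norm_sq  # component of v along w
        projected.append([vi - coeff * wi for vi, wi in zip(v, w)])
    return projected


# Toy representations and a toy property direction (both made up).
reps = [[1.0, 2.0, 3.0], [0.5, -1.0, 4.0]]
w = [1.0, 1.0, 0.0]
amnesic = remove_direction(reps, w)
```

Comparing a model's task behavior on `reps` and on `amnesic` is the kind of before/after measurement the abstract describes; a real study would have to repeat the removal until the property is no longer recoverable and would need control interventions.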


KEPLER: A Unified Model for Knowledge Embedding and Pre-trained Language Representation

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00360 ◽

2021 ◽

Vol 9 ◽

pp. 176-194

Author(s):

Xiaozhi Wang ◽

Tianyu Gao ◽

Zhaocheng Zhu ◽

Zhengyan Zhang ◽

Zhiyuan Liu ◽

...

Keyword(s):

Link Prediction ◽

Large Scale ◽

State Of The Art ◽

Source Code ◽

Language Modeling ◽

Unified Model ◽

Factual Knowledge ◽

Textual Information ◽

Language Representation ◽

Knowledge Graphs

Abstract Pre-trained language representation models (PLMs) cannot well capture factual knowledge from text. In contrast, knowledge embedding (KE) methods can effectively represent the relational facts in knowledge graphs (KGs) with informative entity embeddings, but conventional KE models cannot take full advantage of the abundant textual information. In this paper, we propose a unified model for Knowledge Embedding and Pre-trained LanguagE Representation (KEPLER), which can not only better integrate factual knowledge into PLMs but also produce effective text-enhanced KE with the strong PLMs. In KEPLER, we encode textual entity descriptions with a PLM as their embeddings, and then jointly optimize the KE and language modeling objectives. Experimental results show that KEPLER achieves state-of-the-art performances on various NLP tasks, and also works remarkably well as an inductive KE model on KG link prediction. Furthermore, for pre-training and evaluating KEPLER, we construct Wikidata5M, a large-scale KG dataset with aligned entity descriptions, and benchmark state-of-the-art KE methods on it. It shall serve as a new KE benchmark and facilitate the research on large KG, inductive KE, and KG with text. The source code can be obtained from https://github.com/THU-KEG/KEPLER.


Reducing Confusion in Active Learning for Part-Of-Speech Tagging

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00350 ◽

2021 ◽

Vol 9 ◽

pp. 1-16

Author(s):

Aditi Chaudhary ◽

Antonios Anastasopoulos ◽

Zaid Sheikh ◽

Graham Neubig

Keyword(s):

Active Learning ◽

Empirical Study ◽

Data Distribution ◽

Data Selection ◽

Surprising Result ◽

Selection Algorithm ◽

Part Of Speech Tagging ◽

Part Of Speech ◽

Training Samples ◽

Speech Tagging

Active learning (AL) uses a data selection algorithm to select useful training samples to minimize annotation cost. This is now an essential tool for building low-resource syntactic analyzers such as part-of-speech (POS) taggers. Existing AL heuristics are generally designed on the principle of selecting uncertain yet representative training instances, where annotating these instances may reduce a large number of errors. However, in an empirical study across six typologically diverse languages (German, Swedish, Galician, North Sami, Persian, and Ukrainian), we found the surprising result that even in an oracle scenario where we know the true uncertainty of predictions, these current heuristics are far from optimal. Based on this analysis, we pose the problem of AL as selecting instances that maximally reduce the confusion between particular pairs of output tags. Extensive experimentation on the aforementioned languages shows that our proposed AL strategy outperforms other AL strategies by a significant margin. We also present auxiliary results demonstrating the importance of proper calibration of models, which we ensure through cross-view training, and analysis demonstrating how our proposed strategy selects examples that more closely follow the oracle data distribution. The code is publicly released.


Efficient Content-Based Sparse Attention with Routing Transformers

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00353 ◽

2021 ◽

Vol 9 ◽

pp. 53-68

Author(s):

Aurko Roy ◽

Mohammad Saffar ◽

Ashish Vaswani ◽

David Grangier

Keyword(s):

State Of The Art ◽

Language Modeling ◽

Sequence Length ◽

Image Generation ◽

Data Set ◽

Sliding Windows ◽

Sequence Modeling ◽

Wide Range ◽

Small Set ◽

Transformer Model

Self-attention has recently been adopted for a wide range of sequence modeling problems. Despite its effectiveness, self-attention suffers from quadratic computation and memory requirements with respect to sequence length. Successful approaches to reduce this complexity focused on attending to local sliding windows or a small set of locations independent of content. Our work proposes to learn dynamic sparse attention patterns that avoid allocating computation and memory to attend to content unrelated to the query of interest. This work builds upon two lines of research: It combines the modeling flexibility of prior work on content-based sparse attention with the efficiency gains from approaches based on local, temporal sparse attention. Our model, the Routing Transformer, endows self-attention with a sparse routing module based on online k-means while reducing the overall complexity of attention to $O(n^{1.5} d)$ from $O(n^2 d)$ for sequence length n and hidden dimension d. We show that our model outperforms comparable sparse attention models on language modeling on Wikitext-103 (15.8 vs 18.3 perplexity), as well as on image generation on ImageNet-64 (3.43 vs 3.44 bits/dim) while using fewer self-attention layers. Additionally, we set a new state-of-the-art on the newly released PG-19 dataset, obtaining a test perplexity of 33.2 with a 22-layer Routing Transformer model trained on sequences of length 8192. We open-source the code for Routing Transformer in TensorFlow.
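The abstract states the complexity reduction without showing where the exponent 1.5 comes from. A short sketch of the usual argument follows, under the assumption (not spelled out in the abstract) that the n positions are routed into k roughly equal-sized clusters and each query attends only within its own cluster; the symbol k and the cost function C(k) are introduced here for illustration.

```latex
% Cost of assigning n vectors of dimension d to k centroids: O(nkd).
% Cost of attention inside clusters of size n/k: n queries, each
% attending to n/k keys of dimension d, i.e. O(n^2 d / k).
\[
  C(k) \;=\; \underbrace{nkd}_{\text{routing}}
        \;+\; \underbrace{\frac{n^{2} d}{k}}_{\text{within-cluster attention}}
\]
% The two terms balance when nkd = n^2 d / k, i.e. k = \sqrt{n}:
\[
  k = \sqrt{n}
  \quad\Longrightarrow\quad
  C(\sqrt{n}) \;=\; n^{1.5} d + n^{1.5} d \;=\; O\!\left(n^{1.5} d\right)
\]
```

For the sequence length of 8192 mentioned in the abstract, n^2 is about 6.7 x 10^7 while n^1.5 is about 7.4 x 10^5, roughly a 90-fold reduction in the n-dependent factor under these assumptions.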


Conversation Graph: Data Augmentation, Training, and Evaluation for Non-Deterministic Dialogue Management

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00352 ◽

2021 ◽

Vol 9 ◽

pp. 36-52

Author(s):

Milan Gritta ◽

Gerasimos Lampouras ◽

Ignacio Iacobacci

Keyword(s):

Data Augmentation ◽

Training Data ◽

Dialogue Systems ◽

Success Rates ◽

Agent Behavior ◽

Graph Data ◽

Dialogue Management ◽

Data Volume ◽

Task Oriented ◽

Training Signal

Task-oriented dialogue systems typically rely on large amounts of high-quality training data or require complex handcrafted rules. However, existing datasets are often limited in size considering the complexity of the dialogues. Additionally, conventional training signal inference is not suitable for non-deterministic agent behavior, namely, considering multiple actions as valid in identical dialogue states. We propose the Conversation Graph (ConvGraph), a graph-based representation of dialogues that can be exploited for data augmentation, multi-reference training and evaluation of non-deterministic agents. ConvGraph generates novel dialogue paths to augment data volume and diversity. Intrinsic and extrinsic evaluation across three datasets shows that data augmentation and/or multi-reference training with ConvGraph can improve dialogue success rates by up to 6.4%.


Deciphering Undersegmented Ancient Scripts Using Phonetic Prior

Transactions of the Association for Computational Linguistics ◽

10.1162/tacl_a_00354 ◽

2021 ◽

Vol 9 ◽

pp. 69-81

Author(s):

Jiaming Luo ◽

Frederik Hartmann ◽

Enrico Santus ◽

Regina Barzilay ◽

Yuan Cao

Keyword(s):

Strong Evidence ◽

Word Segmentation ◽

Sound Change ◽

International Phonetic Alphabet ◽

Phonological Constraints ◽

Related Language ◽

Linguistic Constraints

Most undeciphered lost languages exhibit two characteristics that pose significant decipherment challenges: (1) the scripts are not fully segmented into words; (2) the closest known language is not determined. We propose a decipherment model that handles both of these challenges by building on rich linguistic constraints reflecting consistent patterns in historical sound change. We capture the natural phonological geometry by learning character embeddings based on the International Phonetic Alphabet (IPA). The resulting generative framework jointly models word segmentation and cognate alignment, informed by phonological constraints. We evaluate the model on both deciphered languages (Gothic, Ugaritic) and an undeciphered one (Iberian). The experiments show that incorporating phonetic geometry leads to clear and consistent gains. Additionally, we propose a measure for language closeness which correctly identifies related languages for Gothic and Ugaritic. For Iberian, the method does not show strong evidence supporting Basque as a related language, concurring with the favored position by the current scholarship.


Published by MIT Press
