Erratum: Measuring and Improving Consistency in Pretrained Language Models

2021 ◽  
Vol 9 ◽  
pp. 1407-1407
Author(s):  
Yanai Elazar ◽  
Nora Kassner ◽  
Shauli Ravfogel ◽  
Abhilasha Ravichander ◽  
Eduard Hovy ◽  
...  

Abstract During production of this paper, an error was introduced to the formula on the bottom of the right column of page 1020. In the last two terms of the formula, the n and m subscripts were swapped. The correct formula is:

$$\mathcal{L}_c \;=\; \sum_{n=1}^{k}\sum_{m=n+1}^{k} D_{\mathrm{KL}}\!\left(Q_{n}^{r_i}\,\big\|\,Q_{m}^{r_i}\right) \;+\; D_{\mathrm{KL}}\!\left(Q_{m}^{r_i}\,\big\|\,Q_{n}^{r_i}\right)$$

The paper has been updated.
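The loss sums a symmetric KL divergence over all pairs of answer distributions for paraphrases of the same relation. A minimal sketch, using plain-Python probability lists rather than the paper's model outputs:

```python
import math

def kl(p, q):
    # D_KL(p || q) for discrete distributions (assumes q > 0 wherever p > 0)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def consistency_loss(distributions):
    # Pairwise symmetric-KL consistency loss over k distributions
    # Q_1 .. Q_k, one per paraphrase of the same relation
    k = len(distributions)
    return sum(kl(distributions[n], distributions[m]) +
               kl(distributions[m], distributions[n])
               for n in range(k) for m in range(n + 1, k))
```

The loss is zero exactly when all k distributions agree, and grows as the paraphrases' answer distributions diverge.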

2020 ◽  
Vol 8 ◽  
pp. 621-633
Author(s):  
Lifu Tu ◽  
Garima Lalwani ◽  
Spandana Gella ◽  
He He

Recent work has shown that pre-trained language models such as BERT improve robustness to spurious correlations in the dataset. Intrigued by these results, we find that the key to their success is generalization from a small amount of counterexamples where the spurious correlations do not hold. When such minority examples are scarce, pre-trained models perform as poorly as models trained from scratch. In the case of extreme minority, we propose to use multi-task learning (MTL) to improve generalization. Our experiments on natural language inference and paraphrase identification show that MTL with the right auxiliary tasks significantly improves performance on challenging examples without hurting the in-distribution performance. Further, we show that the gain from MTL mainly comes from improved generalization from the minority examples. Our results highlight the importance of data diversity for overcoming spurious correlations.


2014 ◽  
Vol 40 (1) ◽  
pp. 85-120 ◽  
Author(s):  
Fei Huang ◽  
Arun Ahuja ◽  
Doug Downey ◽  
Yi Yang ◽  
Yuhong Guo ◽  
...  

Finding the right representations for words is critical for building accurate NLP systems when domain-specific labeled data for the task is scarce. This article investigates novel techniques for extracting features from n-gram models, Hidden Markov Models, and other statistical language models, including a novel Partial Lattice Markov Random Field model. Experiments on part-of-speech tagging and information extraction, among other tasks, indicate that features taken from statistical language models, in combination with more traditional features, outperform traditional representations alone, and that graphical model representations outperform n-gram models, especially on sparse and polysemous words.


Author(s):  
Arne Zeschel

Abstract Construction-based language models assume that grammar is meaningful and learnable from experience. Focusing on five of the most elementary argument structure constructions of English, a large-scale corpus study of child-directed speech (CDS) investigates exactly which meanings/functions are associated with these patterns in CDS, and whether they are indeed specially indicated to children by their caretakers (as suggested by previous research, cf. Goldberg, Casenhiser and Sethuraman 2004). Collostructional analysis (Stefanowitsch and Gries 2003) is employed to uncover significantly attracted verb-construction combinations, and attracted pairs are classified semantically in order to systematise the attested usage patterns of the target constructions. The results indicate that the structure of the input may aid learners in making the right generalisations about constructional usage patterns, but such scaffolding is not strictly necessary for construction learning: not all argument structure constructions are coherently semanticised to the same extent (in the sense that they designate a single schematic event type of the kind envisioned in Goldberg’s [1995] ‘scene encoding hypothesis’), and they also differ in the extent to which individual semantic subtypes predominate in learners’ input.
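Collostructional analysis scores verb–construction attraction with a Fisher–Yates exact test on a 2×2 frequency table. A minimal sketch of the right-tail (attraction) p-value under the hypergeometric null; all counts and the function name are hypothetical, not taken from the study:

```python
from math import comb

def attraction_p(k, verb_total, cx_total, corpus_total):
    # Right-tail Fisher exact p-value: probability of observing k or more
    # tokens of the verb inside the construction under the hypergeometric
    # null, given verb_total occurrences of the verb, cx_total construction
    # slots, and corpus_total verb tokens overall.
    upper = min(verb_total, cx_total)
    return sum(comb(verb_total, x) * comb(corpus_total - verb_total, cx_total - x)
               for x in range(k, upper + 1)) / comb(corpus_total, cx_total)
```

Smaller p-values indicate stronger attraction of the verb to the construction; ranking verbs by this value yields the significantly attracted pairs the study classifies semantically.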


2021 ◽  
Vol 12 (1) ◽  
pp. 111
Author(s):  
Sia Gholami ◽  
Mehdi Noori

Open-book question answering is a subset of question answering (QA) tasks where the system aims to find answers in a given set of documents (open-book) and common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions have a yes–no–none answer and a text answer which can be short (a few words) or long (a few sentences). We present a two-step, retriever–extractor architecture in which a retriever finds the right documents and an extractor finds the answers in the retrieved documents. To test our solution, we introduce a new dataset for open-book QA based on real customer questions on AWS technical documentation. In this paper, we conducted experiments on several information retrieval systems and extractive language models, attempting to find the yes–no–none answers and text answers in the same pass. Our custom-built extractor model is created from a pretrained language model and fine-tuned on the Stanford Question Answering Dataset (SQuAD) and Natural Questions datasets. We were able to achieve 42% F1 and 39% exact match score (EM) end-to-end with no domain-specific training.
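The two-step retriever–extractor flow can be sketched minimally. Everything below (lexical-overlap scoring, a sentence-level extractor) is an illustrative stand-in, not the authors' system, which uses real IR components and a fine-tuned extractive model:

```python
def retrieve(question, documents, top_k=3):
    # Toy lexical-overlap retriever: rank documents by shared question words
    # (a stand-in for BM25 or a dense retriever).
    q_words = set(question.lower().split())
    return sorted(documents,
                  key=lambda d: len(q_words & set(d.lower().split())),
                  reverse=True)[:top_k]

def extract(question, documents):
    # Toy extractor: return the sentence with the most question-word overlap.
    # A real system would run a fine-tuned extractive QA model here.
    q_words = set(question.lower().split())
    sentences = [s.strip() for d in documents for s in d.split(".") if s.strip()]
    return max(sentences, key=lambda s: len(q_words & set(s.lower().split())))

def answer(question, corpus):
    # Two-step pipeline: retrieve candidate documents, then extract a span.
    return extract(question, retrieve(question, corpus))
```

The split matters because the retriever narrows a large corpus to a handful of candidates, so the (expensive) extractor only reads the retrieved documents.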


Author(s):  
J. Anthony VanDuzer

Summary Recently, there has been a proliferation of international agreements imposing minimum standards on states in respect of their treatment of foreign investors and allowing investors to initiate dispute settlement proceedings where a state violates these standards. Of greatest significance to Canada is Chapter 11 of the North American Free Trade Agreement, which provides both standards for state behaviour and the right to initiate binding arbitration. Since 1996, four cases have been brought under Chapter 11. This note describes the Chapter 11 process and suggests some of the issues that may arise as it is increasingly resorted to by investors.


2019 ◽  
Vol 42 ◽  
Author(s):  
Guido Gainotti

Abstract The target article carefully describes the memory system, centered on the temporal lobe that builds specific memory traces. It does not, however, mention the laterality effects that exist within this system. This commentary briefly surveys evidence showing that clear asymmetries exist within the temporal lobe structures subserving the core system and that the right temporal structures mainly underpin face familiarity feelings.


Author(s):  
J. Taftø

It is well known that for reflections corresponding to large interplanar spacings (i.e., sin θ/λ small), the electron scattering amplitude, f, is sensitive to the ionicity and to the charge distribution around the atoms. We have used this in order to obtain information about the charge distribution in FeTi, which is a candidate for storage of hydrogen. Our goal is to study the changes in electron distribution in the presence of hydrogen, and also the ionicity of hydrogen in metals, but so far our study has been limited to pure FeTi. FeTi has the CsCl structure and thus Fe and Ti scatter with a phase difference of π into the 100-reflections. Because Fe (Z = 26) is higher in the periodic system than Ti (Z = 22), an immediate “guess” would be that Fe has a larger scattering amplitude than Ti. However, relativistic Hartree-Fock calculations show that the opposite is the case for the 100-reflection. An explanation for this may be sought in the stronger localization of the d-electrons of the first row transition elements when moving to the right in the periodic table. The tabulated difference between fTi(100) and fFe(100) is small, however, and based on the values of the scattering amplitude for isolated atoms, the kinematical intensity of the 100-reflection is only 5×10⁻⁴ of the intensity of the 200-reflection.
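The quoted intensity ratio follows from the kinematical structure factors of the CsCl lattice (one atom type at (0,0,0), the other at (1/2,1/2,1/2)); as a consistency check on the numbers in the text:

```latex
F(hkl) = f_{\mathrm{Fe}} + f_{\mathrm{Ti}}\, e^{i\pi(h+k+l)}
\quad\Rightarrow\quad
F(100) = f_{\mathrm{Fe}} - f_{\mathrm{Ti}}, \qquad
F(200) = f_{\mathrm{Fe}} + f_{\mathrm{Ti}}

\frac{I(100)}{I(200)}
= \left(\frac{f_{\mathrm{Ti}} - f_{\mathrm{Fe}}}{f_{\mathrm{Ti}} + f_{\mathrm{Fe}}}\right)^{2}
\approx 5\times 10^{-4}
\quad\Longrightarrow\quad
\frac{\lvert f_{\mathrm{Ti}} - f_{\mathrm{Fe}}\rvert}{f_{\mathrm{Ti}} + f_{\mathrm{Fe}}}
\approx 0.022
```

So a roughly 2% relative difference between the two scattering amplitudes is enough to account for the very weak 100-reflection, which is why it is such a sensitive probe of the charge distribution.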


Author(s):  
Russell L. Steere ◽  
Michael Moseley

A redesigned specimen holder and cap have made possible the freeze-etching of both fracture surfaces of a frozen fractured specimen. In principle, the procedure involves freezing a specimen between two specimen holders (as shown in A, Fig. 1, and the left side of Fig. 2). The aluminum specimen holders and brass cap are constructed so that the upper specimen holder can be forced loose, turned over, and pressed down firmly against the specimen stage to a position represented by B, Fig. 1, and the right side of Fig. 2.

