Injecting Event Knowledge into Pre-Trained Language Models for Event Extraction

2020 ◽  
Author(s):  
Zining Yang ◽  
Siyu Zhan ◽  
Mengshu Hou ◽  
Xiaoyang Zeng ◽  
Hao Zhu

Recent pre-trained language models have achieved great success in many NLP tasks. In this paper, we propose an event extraction system based on the pre-trained language model BERT to extract both event triggers and arguments. As with any deep-learning-based method, the size of the training dataset has a crucial impact on performance. To address the lack of training data for event extraction, we further train the pre-trained language model on a carefully constructed in-domain corpus, injecting event knowledge into our event extraction system with minimal effort. Empirical evaluation on the ACE2005 dataset shows that injecting event knowledge significantly improves event extraction performance.
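
A minimal sketch of the continued masked-language-model pre-training step described in this abstract, assuming the HuggingFace transformers and datasets libraries; the corpus file name, model checkpoint and hyperparameters are illustrative placeholders rather than the authors' configuration.

```python
# Continued MLM pre-training of BERT on a hypothetical in-domain event corpus
# ("event_corpus.txt" is a placeholder, one document or sentence per line).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForMaskedLM,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

corpus = load_dataset("text", data_files={"train": "event_corpus.txt"})["train"]
corpus = corpus.map(lambda ex: tok(ex["text"], truncation=True, max_length=128),
                    batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-event", num_train_epochs=1,
                           per_device_train_batch_size=16),
    train_dataset=corpus,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15),
)
trainer.train()
model.save_pretrained("bert-event")  # starting point for the trigger/argument extractor
```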

2021 ◽  
pp. 1-55
Author(s):  
Daniel Loureiro ◽  
Kiamehr Rezaee ◽  
Mohammad Taher Pilehvar ◽  
Jose Camacho-Collados

Abstract Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data. In fact, the simple feature-extraction strategy of averaging contextualized embeddings proves robust even when only three training sentences per word sense are used, with minimal improvements obtained by increasing the size of this training data.
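
The feature-extraction strategy mentioned above (averaging contextualized embeddings per sense) can be sketched roughly as follows. This is an illustrative approximation, not the authors' code; the model name, example sentences and the word_embedding helper are assumptions for the demo.

```python
# Average BERT embeddings of a target word over a few sentences per sense, then
# disambiguate a new occurrence by cosine similarity to the sense prototypes.
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_embedding(sentence: str, word: str) -> torch.Tensor:
    """Contextual embedding of `word`: mean of its subword vectors in `sentence`."""
    enc = tok(sentence, return_tensors="pt")
    piece_ids = tok(word, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    for i in range(len(ids) - len(piece_ids) + 1):   # first occurrence of the word
        if ids[i:i + len(piece_ids)] == piece_ids:
            span = list(range(i, i + len(piece_ids)))
            break
    else:
        raise ValueError(f"{word!r} not found in sentence")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    return hidden[span].mean(dim=0)

# three training sentences per sense, matching the low-data setting discussed above
senses = {
    "bank_financial": ["She deposited the cheque at the bank.",
                       "The bank approved the loan.",
                       "He works as a teller in a bank."],
    "bank_river":     ["They had a picnic on the bank of the river.",
                       "The boat drifted towards the muddy bank.",
                       "Reeds grew along the bank."],
}
prototypes = {s: torch.stack([word_embedding(x, "bank") for x in xs]).mean(dim=0)
              for s, xs in senses.items()}

query = word_embedding("Fishermen lined the bank at dawn.", "bank")
pred = max(prototypes, key=lambda s: torch.cosine_similarity(query, prototypes[s], dim=0))
print(pred)  # typically predicts "bank_river"
```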


2021 ◽  
Vol 15 (1) ◽  
pp. 31-45
Author(s):  
Arjit Jain ◽  
Sunita Sarawagi ◽  
Prithviraj Sen

Given two large lists of records, the task in entity resolution (ER) is to find the pairs from the Cartesian product of the lists that correspond to the same real-world entity. Typically, passive learning methods on such tasks require large amounts of labeled data to yield useful models. Active learning is a promising approach for ER in low-resource settings. However, the search space for finding informative samples for the user to label grows quadratically for instance-pair tasks, making active learning hard to scale. Previous works in this setting rely on hand-crafted predicates, pre-trained language model embeddings, or rule learning to prune away unlikely pairs from the Cartesian product. This blocking step can miss important regions of the product space, leading to low recall. We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize recall for blocking and accuracy for matching blocked pairs. DIAL uses an Index-By-Committee framework, where each committee member learns representations based on powerful pre-trained transformer language models. We highlight surprising differences between the matcher and the blocker in the creation of the training data and the objective used to train their parameters. Experiments on five benchmark datasets and a multilingual record matching dataset show the effectiveness of our approach in terms of precision, recall and running time.
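
A rough, self-contained illustration of the index-by-committee blocking idea, not the DIAL implementation: random projections stand in for the committee's transformer encoders, each member retrieves top-k neighbours, the union forms the candidate pairs, and pairs the members disagree on are routed to the labeller.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_left, n_right, k, committee_size = 64, 200, 300, 5, 3

# stand-in record features (in DIAL these would be learned transformer embeddings)
left = rng.normal(size=(n_left, dim))
right = rng.normal(size=(n_right, dim))

def normalise(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

candidates, votes = set(), {}
for seed in range(committee_size):
    proj = np.random.default_rng(100 + seed).normal(size=(dim, dim))  # one committee member
    L, R = normalise(left @ proj), normalise(right @ proj)
    topk = np.argsort(-(L @ R.T), axis=1)[:, :k]        # top-k right neighbours per left record
    for i, js in enumerate(topk):
        for j in js:
            pair = (i, int(j))
            candidates.add(pair)
            votes[pair] = votes.get(pair, 0) + 1

# pairs retrieved by only some members are where the committee disagrees most:
# these are the informative examples to route to the human labeller.
to_label = [p for p in candidates if votes[p] < committee_size]
print(len(candidates), "candidate pairs;", len(to_label), "sent for labelling")
```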


Author(s):  
A. Evtushenko

Machine learning language models are combinations of algorithms and neural networks for processing text written in natural language (Natural Language Processing, NLP). In 2020, the artificial intelligence research company OpenAI released GPT-3, its largest language model to date, with up to 175 billion parameters. This more-than-hundredfold increase in parameterization made it possible to improve the quality of generated text to a level that is hard to distinguish from human-written text. Notably, the model was trained on a dataset collected mainly from open sources on the Internet, whose volume is estimated at 570 GB. This article discusses the problem of memorizing critical information, in particular the personal data of individuals, during the training of large language models (GPT-2/3 and derivatives). It also describes an algorithmic approach to this problem, which consists of additional preprocessing of the training dataset and refinement of model inference so that pseudo-personal data are generated and embedded into the outputs of summarization, text generation, question answering, and other seq2seq tasks.
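
One simple way to picture the preprocessing step discussed here is a rule-based scrubber that replaces personal-data spans with placeholder tokens before training; the patterns and placeholders below are illustrative examples, not the article's algorithm.

```python
# Replace obviously personal fields in the training corpus with placeholder tokens,
# which could later be filled with generated pseudo-personal data at inference time.
import re

PATTERNS = {
    "[EMAIL]": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "[PHONE]": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def scrub(text: str) -> str:
    """Mask personal-data spans before the text enters the training set."""
    for placeholder, pattern in PATTERNS.items():
        text = pattern.sub(placeholder, text)
    return text

print(scrub("Contact John at john.doe@example.com or +1 (555) 123-4567."))
# Contact John at [EMAIL] or [PHONE].
```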


2020 ◽  
Vol 34 (05) ◽  
pp. 7634-7642
Author(s):  
David Demeter ◽  
Doug Downey

Neural network language models (NNLMs) have achieved ever-improving accuracy due to more sophisticated architectures and increasing amounts of training data. However, the inductive bias of these models (formed by the distributional hypothesis of language), while ideally suited to modeling most running text, results in key limitations for today's models. In particular, the models often struggle to learn certain spatial, temporal, or quantitative relationships, which are commonplace in text and are second nature for human readers. Yet, in many cases, these relationships can be encoded with simple mathematical or logical expressions. How can we augment today's neural models with such encodings? In this paper, we propose a general methodology to enhance the inductive bias of NNLMs by incorporating simple functions into a neural architecture to form a hierarchical neural-symbolic language model (NSLM). These functions explicitly encode symbolic deterministic relationships to form probability distributions over words. We explore the effectiveness of this approach on numbers and geographic locations, and show that NSLMs significantly reduce perplexity in small-corpus language modeling, and that the performance improvement persists for rare tokens even on much larger corpora. The approach is simple and general, and we discuss how it can be applied to other word classes beyond numbers and geography.
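
A toy numerical sketch of the hierarchical neural-symbolic idea: the neural model supplies P(class | context) and the within-class distribution for ordinary words, while a simple symbolic function supplies the within-class distribution for numbers. All values below are invented for illustration and do not come from the paper.

```python
import numpy as np

vocab_words = ["the", "temperature", "was", "degrees"]
vocab_numbers = [str(n) for n in range(0, 101)]

# neural component (placeholder values): P(class | context), e.g. after "temperature was"
p_class = {"word": 0.2, "number": 0.8}
p_word = np.full(len(vocab_words), 1.0 / len(vocab_words))   # placeholder neural softmax

# symbolic component: numbers near a contextually salient value are more likely
def number_distribution(reference: float, width: float = 3.0) -> np.ndarray:
    vals = np.array([float(n) for n in vocab_numbers])
    scores = np.exp(-((vals - reference) ** 2) / (2 * width ** 2))
    return scores / scores.sum()

p_number = number_distribution(reference=21.0)               # e.g. a salient temperature

# full distribution: P(w) = P(class) * P(w | class)
p_full = np.concatenate([p_class["word"] * p_word,
                         p_class["number"] * p_number])
tokens = vocab_words + vocab_numbers
print(tokens[int(p_full.argmax())])   # a number close to 21
```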


2014 ◽  
Vol 102 (1) ◽  
pp. 81-92 ◽  
Author(s):  
Baltescu Paul ◽  
Blunsom Phil ◽  
Hoang Hieu

Abstract This paper presents an open-source implementation of a neural language model for machine translation. Neural language models deal with the problem of data sparsity by learning distributed representations for words in a continuous vector space. The language modelling probabilities are estimated by projecting a word's context into the same space as the word representations and by assigning probabilities according to the distance between the words and the context's projection. Neural language models are notoriously slow to train and test. Our framework is designed with scalability in mind and provides two optional techniques for reducing the computational cost: the so-called class decomposition trick and a training algorithm based on noise contrastive estimation. Our models may be extended to incorporate direct n-gram features that learn weights for every n-gram in the training data. Our framework comes with wrappers for the cdec and Moses translation toolkits, allowing our language models to be incorporated as normalized features in their decoders (inside the beam search).
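
The class decomposition trick mentioned above can be sketched as a class-factored output layer; the snippet below is a compact PyTorch illustration under assumed class assignments, not the toolkit's actual implementation.

```python
# Factor the softmax over a large vocabulary into a softmax over word classes and a
# softmax over the words inside the chosen class, so each prediction touches roughly
# |C| + |V|/|C| outputs instead of |V|.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ClassFactoredSoftmax(nn.Module):
    def __init__(self, hidden: int, word_to_class: list, class_sizes: list):
        super().__init__()
        self.word_to_class = torch.tensor(word_to_class)
        self.class_logits = nn.Linear(hidden, len(class_sizes))
        # one small output layer per class, covering only the words in that class
        self.word_logits = nn.ModuleList([nn.Linear(hidden, s) for s in class_sizes])

    def log_prob(self, h: torch.Tensor, word_id: int, within_class_id: int) -> torch.Tensor:
        """log P(w | h) = log P(class(w) | h) + log P(w | class(w), h)."""
        c = int(self.word_to_class[word_id])
        log_p_class = F.log_softmax(self.class_logits(h), dim=-1)[c]
        log_p_word = F.log_softmax(self.word_logits[c](h), dim=-1)[within_class_id]
        return log_p_class + log_p_word

# toy usage: 6-word vocabulary split into 2 classes of 3 words each
layer = ClassFactoredSoftmax(hidden=16, word_to_class=[0, 0, 0, 1, 1, 1],
                             class_sizes=[3, 3])
h = torch.randn(16)                                   # context vector from the LM
print(layer.log_prob(h, word_id=4, within_class_id=1))  # word 4 is the 2nd word of class 1
```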


2021 ◽  
Vol 33 (3) ◽  
pp. 199-222
Author(s):  
Nicolay Leonidovich Rusnachenko

Large texts can convey various forms of sentiment information, including the author's position, positive or negative effects of certain events, and attitudes of mentioned entities towards each other. In this paper, we experiment with BERT-based language models for extracting sentiment attitudes between named entities. Given a mass-media article and a list of mentioned named entities, the task is to extract positive or negative attitudes between them. The efficiency of language-model methods depends on the amount of training data. To enrich the training data, we adopt a distant supervision method, which provides automatic annotation of unlabeled texts using an additional lexical resource. The proposed approach is subdivided into two stages: (1) sentiment pair list completion (PAIR-BASED), and (2) document annotation using PAIR-BASED and FRAME-BASED factors. Applied to a large news collection, the method produces RuAttitudes2017, an automatically annotated collection. We evaluate the approach on RuSentRel-1.0, which consists of mass-media articles written in Russian. Adopting RuAttitudes2017 in the training process yields a 10-13% improvement in F1-measure over supervised learning and a 25% improvement over the best neural network based models.
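
The distant-supervision stage can be pictured as projecting a pre-compiled list of sentiment pairs onto unlabeled sentences that mention both entities; the entities, labels and sentences below are invented for illustration, and the snippet is not the paper's pipeline.

```python
# Automatically annotate unlabeled sentences with attitude labels taken from a
# pre-compiled sentiment pair list (the PAIR-BASED resource described above).
sentiment_pairs = {
    ("CountryA", "CountryB"): "neg",
    ("CompanyX", "CompanyY"): "pos",
}

sentences = [
    "CountryA imposed new sanctions against CountryB on Tuesday.",
    "CompanyX announced a joint venture with CompanyY.",
    "CountryA hosted a trade summit last week.",
]

annotated = []
for sent in sentences:
    for (e1, e2), label in sentiment_pairs.items():
        if e1 in sent and e2 in sent:
            annotated.append({"text": sent, "source": e1, "target": e2, "label": label})

for ex in annotated:
    print(ex)
# two automatically labeled attitude examples; the third sentence stays unlabeled
```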


Author(s):  
Linshu Ouyang ◽  
Yongzheng Zhang ◽  
Hui Liu ◽  
Yige Chen ◽  
Yipeng Wang

Authorship verification is an important problem that has many applications. The state-of-the-art deep authorship verification methods typically leverage character-level language models to encode author-specific writing styles. However, they often fail to capture syntactic-level patterns, leading to sub-optimal accuracy in cross-topic scenarios. Also, due to imperfect cross-author parameter sharing, it is difficult for them to distinguish author-specific writing style from common patterns, leading to data-inefficient learning. This paper introduces a novel POS-level (part-of-speech) gated RNN based language model to effectively learn author-specific syntactic styles. The author-agnostic syntactic information obtained from a POS tagger pre-trained on large external datasets greatly reduces the number of effective parameters of our model, enabling the model to learn accurate author-specific syntactic styles with limited training data. We also utilize a gated architecture to learn common syntactic writing styles with a small set of shared parameters and let the author-specific parameters focus on each author's special syntactic styles. Extensive experimental results show that our method achieves significantly better accuracy than state-of-the-art competing methods, especially in cross-topic scenarios (over 5% in terms of AUC-ROC).
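
A minimal PyTorch sketch of a POS-level gated model in the spirit described above: a shared GRU over POS-tag sequences, a shared head, and a small per-author head combined through a learned gate. The architecture details are assumptions for illustration, not the authors' exact model.

```python
import torch
import torch.nn as nn

class POSGatedVerifier(nn.Module):
    """Shared GRU over POS-tag sequences plus a small per-author head,
    gated against a shared head, scoring whether a text matches an author."""
    def __init__(self, n_pos_tags: int, n_authors: int, emb: int = 32, hid: int = 64):
        super().__init__()
        self.pos_emb = nn.Embedding(n_pos_tags, emb)
        self.gru = nn.GRU(emb, hid, batch_first=True)        # shared across authors
        self.shared_head = nn.Linear(hid, 1)
        self.author_head = nn.ModuleList([nn.Linear(hid, 1) for _ in range(n_authors)])
        self.gate = nn.Linear(hid, 1)

    def forward(self, pos_ids: torch.Tensor, author: int) -> torch.Tensor:
        h, _ = self.gru(self.pos_emb(pos_ids))               # (batch, seq, hid)
        summary = h.mean(dim=1)
        g = torch.sigmoid(self.gate(summary))                # author-specific vs. shared mix
        score = g * self.author_head[author](summary) + (1 - g) * self.shared_head(summary)
        return torch.sigmoid(score).squeeze(-1)              # P(text written by `author`)

model = POSGatedVerifier(n_pos_tags=17, n_authors=10)        # 17 = Universal POS tag set
batch = torch.randint(0, 17, (4, 50))                        # POS-tag ids from an external tagger
print(model(batch, author=3).shape)                          # torch.Size([4])
```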


Author(s):  
Hao Fei ◽  
Yafeng Ren ◽  
Yue Zhang ◽  
Donghong Ji ◽  
Xiaohui Liang

Abstract Biomedical information extraction (BioIE) is an important task. The aim is to analyze biomedical texts and extract structured information such as named entities and the semantic relations between them. In recent years, pre-trained language models have largely improved the performance of BioIE. However, they neglect to incorporate external structural knowledge, which can provide rich factual information to support the underlying understanding and reasoning required for biomedical information extraction. In this paper, we first evaluate current extraction methods, including vanilla neural networks, general language models and pre-trained contextualized language models, on biomedical information extraction tasks, including named entity recognition, relation extraction and event extraction. We then propose to enrich a contextualized language model by integrating large-scale biomedical knowledge graphs (the resulting model is called BioKGLM). In order to effectively encode knowledge, we explore a three-stage training procedure and introduce different fusion strategies to facilitate knowledge injection. Experimental results on multiple tasks show that BioKGLM consistently outperforms state-of-the-art extraction models. A further analysis shows that BioKGLM can capture the underlying relations between biomedical knowledge concepts, which are crucial for BioIE.
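
One very simple form of knowledge injection, shown below for illustration only, is to verbalize knowledge-graph triples as sentences and mix them into a further pre-training corpus; this is not BioKGLM's three-stage procedure or its fusion strategies, just a minimal sketch of the general idea. The triples, templates and file name are invented.

```python
# Turn KG triples into natural-language sentences that can be appended to the
# corpus used for continued pre-training of a contextualized language model.
triples = [
    ("aspirin", "treats", "headache"),
    ("BRCA1", "associated_with", "breast cancer"),
]

templates = {
    "treats": "{h} is a drug used to treat {t}.",
    "associated_with": "{h} is associated with {t}.",
}

def verbalize(head: str, relation: str, tail: str) -> str:
    return templates[relation].format(h=head, t=tail)

with open("kg_sentences.txt", "w") as f:
    for h, r, t in triples:
        f.write(verbalize(h, r, t) + "\n")
# the resulting file can be mixed into the pre-training corpus
```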


2021 ◽  
Vol 9 ◽  
pp. 1408-1424
Author(s):  
Timo Schick ◽  
Sahana Udupa ◽  
Hinrich Schütze

Abstract ⚠ This paper contains prompts and model outputs that are offensive in nature. When trained on large, unfiltered crawls from the Internet, language models pick up and reproduce all kinds of undesirable biases that can be found in the data: They often generate racist, sexist, violent, or otherwise toxic language. As large models require millions of training examples to achieve good performance, it is difficult to completely prevent them from being exposed to such content. In this paper, we first demonstrate a surprising finding: Pretrained language models recognize, to a considerable degree, their undesirable biases and the toxicity of the content they produce. We refer to this capability as self-diagnosis. Based on this finding, we then propose a decoding algorithm that, given only a textual description of the undesired behavior, reduces the probability of a language model producing problematic text. We refer to this approach as self-debiasing. Self-debiasing does not rely on manually curated word lists, nor does it require any training data or changes to the model’s parameters. While we by no means eliminate the issue of language models generating biased text, we believe our approach to be an important step in this direction.
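
A simplified sketch of the self-debiasing intuition, not the authors' exact decoding algorithm: compare the next-token distribution with and without a textual prefix describing the undesired behaviour, and suppress tokens whose probability rises under that prefix. The model checkpoint, prefix wording and decay constant are illustrative assumptions.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The protesters outside the stadium were"
prefix = "The following text contains rude, disrespectful or toxic language:\n"

def next_token_probs(text: str) -> torch.Tensor:
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]
    return torch.softmax(logits, dim=-1)

p_plain = next_token_probs(prompt)
p_biased = next_token_probs(prefix + prompt)

# scale down tokens that the undesired-behaviour framing makes more likely
decay = 50.0
penalty = torch.clamp(p_biased - p_plain, min=0.0)
p_debiased = p_plain * torch.exp(-decay * penalty)
p_debiased = p_debiased / p_debiased.sum()

print(tok.decode([int(p_debiased.argmax())]))   # most likely next token after debiasing
```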

