Just Add Functions: A Neural-Symbolic Language Model

2020 ◽  
Vol 34 (05) ◽  
pp. 7634-7642
Author(s):  
David Demeter ◽  
Doug Downey

Neural network language models (NNLMs) have achieved ever-improving accuracy due to more sophisticated architectures and increasing amounts of training data. However, the inductive bias of these models (formed by the distributional hypothesis of language), while ideally suited to modeling most running text, results in key limitations for today's models. In particular, the models often struggle to learn certain spatial, temporal, or quantitative relationships, which are commonplace in text and are second-nature for human readers. Yet, in many cases, these relationships can be encoded with simple mathematical or logical expressions. How can we augment today's neural models with such encodings? In this paper, we propose a general methodology to enhance the inductive bias of NNLMs by incorporating simple functions into a neural architecture to form a hierarchical neural-symbolic language model (NSLM). These functions explicitly encode symbolic deterministic relationships to form probability distributions over words. We explore the effectiveness of this approach on numbers and geographic locations, and show that NSLMs significantly reduce perplexity in small-corpus language modeling, and that the performance improvement persists for rare tokens even on much larger corpora. The approach is simple and general, and we discuss how it can be applied to other word classes beyond numbers and geography.
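As an illustration of the hierarchical factorization described above, the sketch below lets a neural LM assign probability to a number class while a simple symbolic function spreads that mass over concrete number tokens. This is an illustrative sketch, not the authors' code: the Laplace-style scoring function, the class probability, and all values are assumptions.

```python
# Toy sketch of the NSLM factorization: P(w | context) = P(class | context) * P(w | class, context),
# where the within-class distribution comes from a simple symbolic function.
import math

def p_number_given_context(prev_number: float, candidate: float, scale: float = 1.0) -> float:
    """Toy symbolic within-class score: numbers close to a previously seen
    number get more mass (unnormalized Laplace-style kernel)."""
    return math.exp(-abs(candidate - prev_number) / scale)

def nslm_probability(p_class_from_nnlm: float, prev_number: float,
                     candidate: float, vocabulary_numbers: list[float]) -> float:
    """Combine the neural class probability with the normalized symbolic
    within-class probability of the candidate number token."""
    scores = [p_number_given_context(prev_number, v) for v in vocabulary_numbers]
    within_class = p_number_given_context(prev_number, candidate) / sum(scores)
    return p_class_from_nnlm * within_class

# Example: the neural LM gives the number class 0.2 probability, and the
# symbolic model prefers values close to the previously mentioned 9.
print(nslm_probability(0.2, 9.0, 10.0, [8.0, 9.0, 10.0, 11.0, 12.0]))
```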

Author(s):  
NUR ZALIKHA MAT RADZI ◽  
NASIRIN ABDILLAH ◽  
DAENG HALIZA DAENG JAMAL

Hatimu Aisyah is a work by the 13th National Laureate, Zurinah Hassan, who is also a recipient of the Southeast Asian Writers Award (SEA Write Award) in 2004. Her string of successes has made her a focus for researchers examining aspects of women's authorship. Hatimu Aisyah is the first novel produced by Zurinah Hassan, and it emphasizes the customary practices of earlier generations as they are overtaken by the tide of modernization. The novel foregrounds women who put custom first in the course of communal life. This study of Zurinah Hassan's work draws on Elaine Showalter's model of women's language, from the gynocritic perspective, to examine the female characters. The discussion focuses on symbolic language and on language as an expression of women's consciousness. The overall findings show that Zurinah Hassan uses language consistent with Showalter's model of language, though somewhat less prominently, owing to limits on language use in keeping with the sociocultural norms of Malay society. The study's findings on the model of women's language can be seen in symbolic language and in language as an expression of women's consciousness. Looking ahead, the study shows that women voice protest and criticism through the patterns of their writing, even while remaining restrained.


2020 ◽  
Author(s):  
Zining Yang ◽  
Siyu Zhan ◽  
Mengshu Hou ◽  
Xiaoyang Zeng ◽  
Hao Zhu

Recent pre-trained language models have achieved great success in many NLP tasks. In this paper, we propose an event extraction system based on the pre-trained language model BERT to extract both event triggers and arguments. For a deep-learning-based method, the size of the training dataset has a crucial impact on performance. To address the problem of insufficient training data for event extraction, we further train the pre-trained language model on a carefully constructed in-domain corpus to inject event knowledge into our event extraction system with minimal effort. Empirical evaluation on the ACE2005 dataset shows that injecting event knowledge can significantly improve the performance of event extraction.
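The in-domain further pre-training step could look roughly like the following sketch, assuming the HuggingFace transformers library; the corpus, checkpoint, and hyper-parameters are illustrative placeholders, not the authors' actual setup.

```python
# Continue masked-LM pre-training of BERT on an in-domain corpus before
# fine-tuning an event-extraction head (illustrative sketch).
from transformers import (BertForMaskedLM, BertTokenizerFast,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# In-domain sentences rich in event mentions (toy examples standing in for the
# carefully constructed corpus described above).
corpus = [
    "The company announced the acquisition of its rival on Monday.",
    "Protesters marched through the capital after the election results.",
]
encodings = tokenizer(corpus, truncation=True, padding=True, max_length=128)
train_dataset = [
    {"input_ids": ids, "attention_mask": mask}
    for ids, mask in zip(encodings["input_ids"], encodings["attention_mask"])
]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-event-domain",
                         num_train_epochs=1,
                         per_device_train_batch_size=8)
Trainer(model=model, args=args,
        train_dataset=train_dataset, data_collator=collator).train()
# The further pre-trained checkpoint would then be fine-tuned for trigger and
# argument classification.
```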


2021 ◽  
pp. 1-55
Author(s):  
Daniel Loureiro ◽  
Kiamehr Rezaee ◽  
Mohammad Taher Pilehvar ◽  
Jose Camacho-Collados

Abstract Transformer-based language models have taken many fields in NLP by storm. BERT and its derivatives dominate most of the existing evaluation benchmarks, including those for Word Sense Disambiguation (WSD), thanks to their ability to capture context-sensitive semantic nuances. However, there is still little knowledge about their capabilities and potential limitations in encoding and recovering word senses. In this article, we provide an in-depth quantitative and qualitative analysis of the celebrated BERT model with respect to lexical ambiguity. One of the main conclusions of our analysis is that BERT can accurately capture high-level sense distinctions, even when a limited number of examples is available for each word sense. Our analysis also reveals that in some cases language models come close to solving coarse-grained noun disambiguation under ideal conditions in terms of availability of training data and computing resources. However, this scenario rarely occurs in real-world settings and, hence, many practical challenges remain even in the coarse-grained setting. We also perform an in-depth comparison of the two main language-model-based WSD strategies, i.e., fine-tuning and feature extraction, finding that the latter approach is more robust with respect to sense bias and can better exploit limited available training data. In fact, the simple feature-extraction strategy of averaging contextualized embeddings proves robust even when using only three training sentences per word sense, with minimal improvements obtained by increasing the size of this training data.
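The feature-extraction strategy of averaging contextualized embeddings can be sketched as follows, assuming HuggingFace transformers and PyTorch; the sense inventory and the three training sentences per sense are illustrative, not taken from any WSD benchmark.

```python
# Build one prototype vector per sense by averaging the target word's
# contextualized embeddings, then disambiguate new occurrences by nearest neighbour.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def target_embedding(sentence: str, target: str) -> torch.Tensor:
    """Mean of the final-layer vectors for the word pieces of `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]
    pieces = set(tokenizer.tokenize(target))
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
    idx = [i for i, t in enumerate(tokens) if t in pieces]
    return hidden[idx].mean(dim=0)

# Three training sentences per sense, mirroring the low-resource setting above.
senses = {
    "bank#institution": ["She deposited cash at the bank.",
                         "The bank approved the loan.",
                         "He works for an investment bank."],
    "bank#river": ["They picnicked on the bank of the river.",
                   "The boat drifted toward the muddy bank.",
                   "Reeds grew along the bank."],
}
prototypes = {s: torch.stack([target_embedding(x, "bank") for x in xs]).mean(0)
              for s, xs in senses.items()}

query = target_embedding("Fishermen sat on the bank all afternoon.", "bank")
pred = max(prototypes, key=lambda s: torch.cosine_similarity(query, prototypes[s], dim=0))
print(pred)  # expected: bank#river
```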


2021 ◽  
Vol 15 (1) ◽  
pp. 31-45
Author(s):  
Arjit Jain ◽  
Sunita Sarawagi ◽  
Prithviraj Sen

Given two large lists of records, the task in entity resolution (ER) is to find the pairs from the Cartesian product of the lists that correspond to the same real-world entity. Typically, passive learning methods on such tasks require large amounts of labeled data to yield useful models. Active learning is a promising approach for ER in low-resource settings. However, the search space for finding informative samples for the user to label grows quadratically for instance-pair tasks, making active learning hard to scale. Previous works in this setting rely on hand-crafted predicates, pre-trained language model embeddings, or rule learning to prune away unlikely pairs from the Cartesian product. This blocking step can miss important regions in the product space, leading to low recall. We propose DIAL, a scalable active learning approach that jointly learns embeddings to maximize recall for blocking and accuracy for matching blocked pairs. DIAL uses an Index-By-Committee framework, where each committee member learns representations based on powerful pre-trained transformer language models. We highlight surprising differences between the matcher and the blocker in the creation of the training data and the objective used to train their parameters. Experiments on five benchmark datasets and a multilingual record matching dataset show the effectiveness of our approach in terms of precision, recall and running time.
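The sketch below is a deliberately simplified illustration of the index-by-committee idea, not the DIAL implementation: random projections stand in for the learned transformer encoders, blocking is nearest-neighbour retrieval per committee member, and pairs the committee disagrees on most are selected for labeling.

```python
# Illustrative index-by-committee sketch for active entity resolution.
import numpy as np

rng = np.random.default_rng(0)
records_a = rng.normal(size=(100, 32))   # stand-in embeddings for list A
records_b = rng.normal(size=(120, 32))   # stand-in embeddings for list B

def committee_member(dim_out: int):
    """One committee member = one projection of the shared representation."""
    w = rng.normal(size=(32, dim_out))
    return lambda x: (x @ w) / np.linalg.norm(x @ w, axis=-1, keepdims=True)

members = [committee_member(16) for _ in range(4)]

def block(member, k: int = 5):
    """Top-k neighbours in B for every record in A under this member."""
    ea, eb = member(records_a), member(records_b)
    sims = ea @ eb.T
    return {i: set(np.argsort(-sims[i])[:k]) for i in range(len(records_a))}

blocks = [block(m) for m in members]
candidates = {(i, j) for b in blocks for i, js in b.items() for j in js}

def votes(pair):
    """How many committee members retrieved this pair."""
    i, j = pair
    return sum(j in b[i] for b in blocks)

# Disagreement-based selection: pairs retrieved by roughly half the committee
# are the most informative ones to send to the user.
to_label = sorted(candidates, key=lambda p: abs(votes(p) - len(members) / 2))[:10]
print(to_label)
```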


2020 ◽  
Vol 10 (4) ◽  
pp. 1340
Author(s):  
Heewoong Park ◽  
Jonghun Park

The task of sentence completion, which aims to infer the missing text of a given sentence, has been used to assess the reading comprehension level of machines as well as humans. In this work, we conducted a comprehensive study of various approaches to sentence completion based on neural language models, which have advanced in recent years. First, we revisited the recurrent neural network language model (RNN LM), achieving highly competitive results with an appropriate network structure and hyper-parameters. This paper presents a bidirectional version of the RNN LM, which surpassed the previous best results on the Microsoft Research (MSR) Sentence Completion Challenge and the Scholastic Aptitude Test (SAT) sentence completion questions. In parallel with directly applying the RNN LM to sentence completion, we also employed a supervised learning framework that fine-tunes a large pre-trained transformer-based LM with a few sentence-completion examples. By fine-tuning a pre-trained BERT model, this work established state-of-the-art results on the MSR and SAT sets. Furthermore, we performed similar experiments on newly collected cloze-style questions in the Korean language. The experimental results reveal that simply applying the multilingual BERT models to the Korean dataset was not satisfactory, which leaves room for further research.
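A rough sketch of the candidate-scoring setup follows. It uses a pre-trained masked LM rather than the paper's bidirectional RNN LM, and the question and answer options are invented single-word-piece candidates; it only illustrates how a cloze item can be answered by comparing candidate probabilities at the blank.

```python
# Score each candidate for the blank with a masked LM and pick the most probable one.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased").eval()

question = "The explanation was so [MASK] that nobody doubted it."
candidates = ["convincing", "angry", "purple", "tiny"]  # assumed to be single word pieces

enc = tokenizer(question.replace("[MASK]", tokenizer.mask_token), return_tensors="pt")
mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
with torch.no_grad():
    logits = model(**enc).logits[0, mask_pos]
probs = logits.softmax(dim=-1)

scores = {c: probs[tokenizer.convert_tokens_to_ids(c)].item() for c in candidates}
print(max(scores, key=scores.get))  # likely: convincing
```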


Author(s):  
Atro Voutilainen

This article outlines recently used methods for designing part-of-speech taggers: computer programs for assigning contextually appropriate grammatical descriptors to words in texts. It begins with a description of the general architecture and task setting, gives an overview of the history of tagging, and describes the central approaches to tagging. These approaches are: taggers based on handwritten local rules, taggers based on n-grams automatically derived from text corpora, taggers based on hidden Markov models, taggers using automatically generated symbolic language models derived with machine learning methods, taggers based on handwritten global rules, and hybrid taggers, which combine the advantages of handwritten and automatically generated taggers. The article focuses on handwritten tagging rules. Well-tagged training corpora are a valuable resource for testing and improving the language model, and the text corpus reminds the grammarian of any oversights made while designing a rule.
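A toy sketch of a handwritten local disambiguation rule of the kind surveyed above, in the constraint-grammar style: each token starts with all lexically possible tags and contextual rules remove readings. The lexicon and the single rule are invented for illustration, not taken from any published tagger.

```python
# Each word starts with its set of possible tags; local rules discard readings.
AMBIGUITY = {"the": {"DET"}, "plans": {"NOUN", "VERB"}, "fail": {"VERB", "NOUN"}}

def tag(sentence: list[str]) -> list[set[str]]:
    readings = [set(AMBIGUITY.get(w, {"NOUN"})) for w in sentence]
    for i in range(len(sentence)):
        # Handwritten local rule: remove a VERB reading immediately after a determiner.
        if i > 0 and readings[i - 1] == {"DET"} and len(readings[i]) > 1:
            readings[i].discard("VERB")
    return readings

print(tag(["the", "plans", "fail"]))  # e.g. [{'DET'}, {'NOUN'}, {'NOUN', 'VERB'}]
```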


2014 ◽  
Vol 102 (1) ◽  
pp. 81-92 ◽  
Author(s):  
Paul Baltescu ◽  
Phil Blunsom ◽  
Hieu Hoang

Abstract This paper presents an open source implementation of a neural language model for machine translation. Neural language models address the problem of data sparsity by learning distributed representations for words in a continuous vector space. The language modelling probabilities are estimated by projecting a word's context into the same space as the word representations and by assigning word probabilities proportional to the similarity between the words and the context's projection. Neural language models are notoriously slow to train and test. Our framework is designed with scalability in mind and provides two optional techniques for reducing the computational cost: the so-called class decomposition trick and a training algorithm based on noise contrastive estimation. Our models may be extended to incorporate direct n-gram features, learning weights for every n-gram in the training data. Our framework comes with wrappers for the cdec and Moses translation toolkits, allowing our language models to be incorporated as normalized features in their decoders (inside the beam search).
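The class decomposition trick factorizes the softmax as P(w|h) = P(c(w)|h) · P(w|c(w),h), so only the class distribution and the words sharing w's class need to be normalized instead of the full vocabulary. The numpy sketch below uses random weights and a random class assignment purely for illustration; it is not the paper's implementation.

```python
# Class-factored log-probability: normalize over C classes, then over ~V/C words.
import numpy as np

rng = np.random.default_rng(0)
V, C, D = 10_000, 100, 64                  # vocabulary size, classes, hidden size
word_class = rng.integers(0, C, size=V)    # stand-in for a frequency-based clustering

class_weights = rng.normal(size=(C, D)) * 0.01
word_weights = rng.normal(size=(V, D)) * 0.01

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def log_prob(word: int, context: np.ndarray) -> float:
    c = word_class[word]
    p_class = softmax(class_weights @ context)[c]
    in_class = np.flatnonzero(word_class == c)          # only the words in w's class
    p_word = softmax(word_weights[in_class] @ context)[np.where(in_class == word)[0][0]]
    return float(np.log(p_class) + np.log(p_word))

print(log_prob(42, rng.normal(size=D)))
```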


2020 ◽  
Author(s):  
Alireza Roshanzamir ◽  
Hamid Aghajan ◽  
Mahdieh Soleymani Baghshah

Abstract Background: We developed transformer-based deep learning models using natural language processing for early diagnosis of Alzheimer's disease from the picture description test. Methods: The lack of large datasets poses the most important limitation for using complex models that do not require feature engineering. Transformer-based pre-trained deep language models have recently made a large leap in NLP research and application. These models are pre-trained on available large datasets to understand natural language texts appropriately, and have been shown to subsequently perform well on classification tasks with small training sets. The overall classification model is a simple classifier on top of the pre-trained deep language model. Results: The models are evaluated on picture description test transcripts of the Pitt corpus, which contains data from 170 AD patients with 257 interviews and 99 healthy controls with 243 interviews. The large bidirectional encoder representations from transformers (BERT-Large) embedding with a logistic regression classifier achieves a classification accuracy of 88.08%, which improves the state of the art by 2.48%. Conclusions: Using pre-trained language models can improve AD prediction. This not only addresses the problem of insufficiently large datasets, but also reduces the need for expert-defined features.
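The overall pipeline, a frozen pre-trained encoder with a simple classifier on top, can be sketched as follows, assuming HuggingFace transformers and scikit-learn; the transcripts and labels are toy stand-ins, not the Pitt corpus, and mean pooling is used here as one plausible way to obtain a transcript embedding.

```python
# Frozen BERT-Large embeddings of transcripts + logistic regression on top.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-large-uncased")
encoder = AutoModel.from_pretrained("bert-large-uncased").eval()

def embed(transcript: str) -> list[float]:
    """Mean-pooled final-layer embedding of a picture-description transcript."""
    enc = tokenizer(transcript, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]
    return hidden.mean(dim=0).tolist()

# Toy transcripts standing in for interview data (0 = healthy control, 1 = AD).
transcripts = ["the boy is on the stool reaching for the cookie jar",
               "there is a a the the thing up there and water"]
labels = [0, 1]

clf = LogisticRegression(max_iter=1000).fit([embed(t) for t in transcripts], labels)
print(clf.predict([embed("the sink is overflowing and the girl is laughing")]))
```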


2021 ◽  
Vol 33 (3) ◽  
pp. 199-222
Author(s):  
Nicolay Leonidovich Rusnachenko

Large texts can convey various forms of sentiment information, including the author's position, positive or negative effects of some events, and attitudes of mentioned entities towards each other. In this paper, we experiment with BERT-based language models for extracting sentiment attitudes between named entities. Given a mass media article and a list of mentioned named entities, the task is to extract positive or negative attitudes between them. The efficiency of language model methods depends on the amount of training data. To enrich the training data, we adopt a distant supervision method, which provides automatic annotation of unlabeled texts using an additional lexical resource. The proposed approach is subdivided into two stages: (1) sentiment pairs list completion (PAIR-BASED), (2) document annotation using PAIR-BASED and FRAME-BASED factors. Applied to a large news collection, the method generates the RuAttitudes2017 automatically annotated collection. We evaluate the approach on RuSentRel-1.0, which consists of mass media articles written in Russian. Adopting RuAttitudes2017 in the training process results in a 10-13% quality improvement in F1-measure over supervised learning and a 25% improvement over the best neural-network-based model results.
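The pair-based annotation stage of the distant-supervision step can be illustrated with the toy sketch below; the sentiment-pair list, entities, and sentence are invented, and this is not the RuAttitudes construction code.

```python
# Distant supervision (pair-based): sentences mentioning a known opinion pair
# are automatically annotated with that pair's attitude label.
SENTIMENT_PAIRS = {("CompanyA", "CompanyB"): "neg",
                   ("CountryX", "CountryY"): "pos"}

def annotate(sentence: str, entities: list[str]):
    """Yield (source, target, label) for every known pair mentioned together."""
    mentioned = [e for e in entities if e in sentence]
    for (src, tgt), label in SENTIMENT_PAIRS.items():
        if src in mentioned and tgt in mentioned:
            yield (src, tgt, label)

news = "CompanyA filed a lawsuit against CompanyB over patent infringement."
print(list(annotate(news, ["CompanyA", "CompanyB", "CountryX"])))
# [('CompanyA', 'CompanyB', 'neg')]
```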

