Comparing gated and simple recurrent neural network architectures as models of human sentence processing

2018
Author(s): Christoph Aurnhammer, Stefan L. Frank

The Simple Recurrent Network (SRN) has a long tradition in cognitive models of language processing. More recently, gated recurrent networks have been proposed that often outperform the SRN on natural language processing tasks. Here, we investigate whether two types of gated networks perform better as cognitive models of sentence reading than SRNs, beyond their advantage as language models. This will reveal whether the filtering mechanism implemented in gated networks corresponds to an aspect of human sentence processing. We train a series of language models differing only in the cell types of their recurrent layers. We then compute word surprisal values for stimuli used in self-paced reading, eye-tracking, and electroencephalography experiments, and quantify the surprisal values' fit to experimental measures that indicate human sentence reading effort. While the gated networks provide better language models, they do not outperform their SRN counterpart as cognitive models when language model quality is equal across network types. Our results suggest that the different architectures are equally valid as models of human sentence processing.
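
For reference, the word surprisal used here is the negative log-probability of a word given its preceding context, and it can be read off a trained network in the same way regardless of cell type (SRN, GRU, or LSTM). Below is a minimal sketch of that computation for a PyTorch-style recurrent language model; the model interface is an assumption for illustration, not the authors' code.

    import math
    import torch
    import torch.nn.functional as F

    def word_surprisals(model, token_ids):
        """Per-word surprisal (in bits) for one tokenised sentence.

        `model` is assumed to be any trained recurrent language model that
        maps a prefix of token ids (batch, length) to next-word logits
        (batch, length, vocab); `token_ids` is a 1-D tensor of indices.
        """
        model.eval()
        surprisals = []
        with torch.no_grad():
            for t in range(1, len(token_ids)):
                logits = model(token_ids[:t].unsqueeze(0))
                log_probs = F.log_softmax(logits[0, -1], dim=-1)
                # Convert natural log to bits.
                surprisals.append(-log_probs[token_ids[t]].item() / math.log(2))
        return surprisals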

AI, 2021, Vol 2 (1), pp. 1-16
Author(s): Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martin-Fernandez, Ismael Faro

In recent years, the use of deep learning in language models has gained much attention. Some research projects claim that they can generate text that can be interpreted as human writing, enabling new possibilities in many application areas. Among the different areas related to language processing, one of the most notable in applying this type of modeling is programming languages. For years, the machine learning community has been researching this software engineering area, pursuing goals like applying different approaches to auto-complete, generate, fix, or evaluate code programmed by humans. Considering the increasing popularity of the deep-learning-enabled language model approach, we found a lack of empirical papers that compare different deep learning architectures for creating and using language models based on programming code. This paper compares neural network architectures such as Average Stochastic Gradient Descent (ASGD) Weight-Dropped LSTMs (AWD-LSTMs), AWD Quasi-Recurrent Neural Networks (QRNNs), and Transformers, combined with transfer learning and different forms of tokenization, to see how they behave when building language models from a Python code dataset for code generation and fill-mask tasks. Considering the results, we discuss each approach's different strengths and weaknesses and what gaps we found for evaluating the language models or applying them in a real programming context.
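
As a concrete illustration of the fill-mask task mentioned above, the sketch below uses the Hugging Face transformers fill-mask pipeline on a Python snippet; the checkpoint name is a placeholder, not one of the models trained in the paper.

    from transformers import pipeline

    # Placeholder checkpoint: substitute any masked language model that has
    # been trained or fine-tuned on Python source code.
    fill_mask = pipeline("fill-mask", model="some-org/python-code-mlm")

    # Ask the model to fill in the masked operator.
    snippet = "def add(a, b):\n    return a " + fill_mask.tokenizer.mask_token + " b"
    for candidate in fill_mask(snippet, top_k=3):
        print(candidate["token_str"], candidate["score"])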


2020, Vol 14 (4), pp. 471-484
Author(s): Suraj Shetiya, Saravanan Thirumuruganathan, Nick Koudas, Gautam Das

Accurate selectivity estimation for string predicates is a long-standing research challenge in databases. Supporting pattern matching on strings (such as prefix, substring, and suffix) makes this problem much more challenging, thereby necessitating a dedicated study. Traditional approaches often build pruned summary data structures such as tries, followed by selectivity estimation using statistical correlations. However, this produces insufficiently accurate cardinality estimates, resulting in the selection of sub-optimal plans by the query optimizer. Recently proposed deep-learning-based approaches leverage techniques from natural language processing, such as embeddings, to encode the strings and use them to train a model. While this is an improvement over traditional approaches, there remains substantial room for improvement. We propose Astrid, a framework for string selectivity estimation that synthesizes ideas from traditional and deep-learning-based approaches. We make two complementary contributions. First, we propose an embedding algorithm that is query-type (prefix, substring, and suffix) and selectivity aware. Consider three strings 'ab', 'abc', and 'abd' whose prefix frequencies are 1000, 800, and 100, respectively. Our approach ensures that the embedding for 'ab' is closer to 'abc' than to 'abd'. Second, we describe how neural language models can be used for selectivity estimation. While they work well for prefix queries, their performance for substring queries is sub-optimal. We modify the objective function of the neural language model so that it can be used for estimating the selectivities of pattern matching queries. We also propose a novel and efficient algorithm for optimizing the new objective function. We conduct extensive experiments over benchmark datasets and show that our proposed approaches achieve state-of-the-art results.
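
The 'ab'/'abc'/'abd' example can be made concrete with a toy calculation (this is only the intuition behind selectivity-aware embeddings, not the paper's actual training objective): if the gap in log-scale prefix selectivity is treated as the target embedding distance, 'ab' ends up much closer to 'abc' than to 'abd'.

    import math

    prefix_freq = {"ab": 1000, "abc": 800, "abd": 100}

    def target_distance(s1, s2):
        """Toy target: separate embeddings roughly in proportion to the gap
        between the strings' log-scale prefix selectivities."""
        return abs(math.log10(prefix_freq[s1]) - math.log10(prefix_freq[s2]))

    print(target_distance("ab", "abc"))  # ~0.10 -> 'ab' should sit close to 'abc'
    print(target_distance("ab", "abd"))  # 1.00  -> 'ab' should sit far from 'abd'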


2019, Vol 9 (18), pp. 3648
Author(s): Casper S. Shikali, Zhou Sijie, Liu Qihe, Refuoe Mokhosi

Deep learning has been used extensively in natural language processing, with sub-word representation vectors playing a critical role. However, this cannot be said of Swahili, a low-resource but widely spoken language in East and Central Africa. This study proposes novel word embeddings from syllable embeddings (WEFSE) for Swahili to address the concern of word representation for agglutinative and syllabic-based languages. Inspired by how Swahili is taught in beginner classes, we encoded the syllables of words instead of characters, character n-grams, or morphemes, and generated quality word embeddings using a convolutional neural network. The quality of WEFSE is demonstrated by the state-of-the-art results of the syllable-aware language model on both the small dataset (31.229 perplexity) and the medium dataset (45.859 perplexity), outperforming character-aware language models. We further evaluated the word embeddings using a word analogy task. To the best of our knowledge, syllabic alphabets have not previously been used to compose word representation vectors. The main contributions of the study are therefore a syllabic alphabet, WEFSE, a syllable-aware language model, and a word analogy dataset for Swahili.
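
To illustrate the general idea of composing a word vector from syllable embeddings with a convolutional neural network, here is a minimal PyTorch sketch; the hyperparameters, vocabulary size, and example ids are illustrative assumptions, not the configuration reported for WEFSE.

    import torch
    import torch.nn as nn

    class SyllableCNNWordEncoder(nn.Module):
        """Sketch: compose a word vector from syllable embeddings with a CNN."""
        def __init__(self, num_syllables, syl_dim=32, word_dim=128, kernel_size=3):
            super().__init__()
            self.syllable_emb = nn.Embedding(num_syllables, syl_dim, padding_idx=0)
            self.conv = nn.Conv1d(syl_dim, word_dim, kernel_size, padding=1)

        def forward(self, syllable_ids):            # (batch, max_syllables)
            x = self.syllable_emb(syllable_ids)     # (batch, max_syllables, syl_dim)
            x = self.conv(x.transpose(1, 2))        # (batch, word_dim, max_syllables)
            return torch.relu(x).max(dim=2).values  # (batch, word_dim) word vector

    # e.g. "watoto" segmented into syllables ["wa", "to", "to"] -> ids [5, 9, 9]
    encoder = SyllableCNNWordEncoder(num_syllables=300)
    word_vec = encoder(torch.tensor([[5, 9, 9]]))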


2019
Author(s): Stefan L. Frank

Although computational models can simulate aspects of human sentence processing, research on this topic has remained almost exclusively limited to the single language case. The current review presents an overview of the state of the art in computational cognitive models of sentence processing, and discusses how recent sentence-processing models can be used to study bi- and multilingualism. Recent results from cognitive modelling and computational linguistics suggest that phenomena specific to bilingualism can emerge from systems that have no dedicated components for handling multiple languages. Hence, accounting for human bi-/multilingualism may not require models that are much more sophisticated than those for the monolingual case.


Author(s): A. Evtushenko

Machine learning language models combine algorithms and neural networks to process text written in natural language (Natural Language Processing, NLP). In 2020, the artificial intelligence research company OpenAI released its largest language model, GPT-3, with up to 175 billion parameters. Increasing the model's parameter count by more than 100 times made it possible to improve the quality of generated text to a level that is hard to distinguish from human-written text. Notably, the model was trained on a dataset estimated at 570 GB, collected mainly from open sources on the Internet. This article discusses the problem of memorization of critical information, in particular individuals' personal data, during the training of large language models (GPT-2/3 and derivatives). It also describes an algorithmic approach to this problem that consists of additional preprocessing of the training dataset and refinement of model inference, so that pseudo-personal data are generated and embedded in the outputs of summarization, text generation, question answering, and other seq2seq tasks.
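
A minimal sketch of the data-preprocessing half of such an approach: detect fragments that look like personal data and replace them with typed placeholders before training, so the model memorizes pseudo-personal data instead of real values. The patterns below are illustrative assumptions and cover only two categories; a production pipeline would handle many more data types and locale-specific formats.

    import re

    # Illustrative patterns only, not the article's full rule set.
    PATTERNS = {
        "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
        "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
    }

    def pseudonymize(text: str) -> str:
        """Replace detected personal data with typed placeholder tokens."""
        for label, pattern in PATTERNS.items():
            text = pattern.sub(f"<{label}>", text)
        return text

    print(pseudonymize("Contact Jane at jane.doe@example.com or +1 555 123 4567."))
    # Contact Jane at <EMAIL> or <PHONE>.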


2021, Vol 8 (1)
Author(s): Hidayaturrahman, Emmanuel Dave, Derwin Suhartono, Aniati Murni Arymurthy

Arguments help humans to deliver their ideas. The outcome of a discussion heavily relies on the validity of the arguments; if an argument is well composed, its core idea is easier to grasp. To grade an argument, machines can be utilized to decompose it into semantically labeled components. In natural language processing, multiple language models are available to perform this task; they can be divided into context-free and contextual models. The majority of previous studies used hand-crafted features to perform argument component classification, while state-of-the-art language models utilize machine learning. Most of these language models, however, ignore the context of an argument. This paper analyzes whether including context in the classification process improves the accuracy of the language model and thereby enhances the argumentation mining process as well. The same document corpus is fed into several language models: Word2Vec and GloVe represent the context-free models, while BERT and ELMo serve as context-sensitive language models. The accuracy and runtime of each model are then compared to determine the importance of context. The results show that contextual language models boost classification accuracy by approximately 20%. However, this comes at a cost: contextual models require longer training and prediction times. The benefit of the increased accuracy outweighs the time burden. Thus, argumentation mining, as a contextual task, should use contextual models, where context must be included to achieve promising results.
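
The context-free versus contextual distinction can be seen directly in code: a Word2Vec or GloVe table returns one fixed vector per word, whereas a contextual encoder such as BERT returns a different vector for the same word in different sentences. A minimal sketch with the Hugging Face transformers library; the example sentences are invented, and the word of interest is assumed to be a single WordPiece token.

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def contextual_vector(word, sentence):
        """Return the BERT hidden state of `word` inside `sentence`."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        position = inputs.input_ids[0].tolist().index(tokenizer.convert_tokens_to_ids(word))
        return hidden[position]

    v1 = contextual_vector("bank", "The bank approved the loan application.")
    v2 = contextual_vector("bank", "They had a picnic on the river bank.")
    print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: context changes the vector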


2020
Author(s): Suhas Arehalli, Tal Linzen

The number of the subject in English must match the number of the corresponding verb (dog runs but dogs run). Yet in real-time language production and comprehension, speakers often mistakenly compute agreement between the verb and a grammatically irrelevant non-subject noun phrase instead. This phenomenon, referred to as agreement attraction, is modulated by a wide range of factors; any complete computational model of grammatical planning and comprehension would be expected to derive this rich empirical picture. Recent developments in Natural Language Processing have shown that neural networks trained only on word prediction over large corpora are capable of capturing subject-verb agreement dependencies to a significant extent, but with occasional errors. The goal of this paper is to evaluate the potential of such neural word prediction models as a foundation for a cognitive model of real-time grammatical processing. We simulate six experiments taken from the agreement attraction literature with LSTMs, one common type of neural language model. The LSTMs captured the critical human behavior in three of them, indicating that (1) some agreement attraction phenomena can be captured by a generic sequence processing model, but (2) capturing the other phenomena may require models with more language-specific mechanisms.
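
One common way to simulate such experiments is to compare the language model's surprisal for a singular versus a plural verb after a preamble containing an attractor noun ("The key to the cabinets ..."). The sketch below assumes a trained LSTM language model and tokenizer supplied by the user; neither the interface nor the items are taken from the paper.

    import math
    import torch
    import torch.nn.functional as F

    def verb_surprisals(model, tokenizer, preamble, singular="is", plural="are"):
        """Surprisal (in bits) of a singular vs. plural verb after `preamble`.

        `model` is assumed to map token ids (batch, length) to next-word
        logits (batch, length, vocab); `tokenizer.encode` returns id lists.
        """
        ids = torch.tensor([tokenizer.encode(preamble)])
        with torch.no_grad():
            log_probs = F.log_softmax(model(ids)[0, -1], dim=-1)
        bits = lambda word: -log_probs[tokenizer.encode(word)[0]].item() / math.log(2)
        return bits(singular), bits(plural)

    # Agreement attraction predicts unusually low surprisal for the
    # (ungrammatical) plural verb when the attractor "cabinets" is plural:
    # verb_surprisals(model, tokenizer, "The key to the cabinets")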


2017
Author(s): Matthew Lowder, Wonil Choi, Fernanda Ferreira, John Henderson

What are the effects of word-by-word predictability on sentence processing times during the natural reading of a text? Although information-complexity metrics such as surprisal and entropy reduction have been useful in addressing this question, these metrics tend to be estimated using computational language models, which require some degree of commitment to a particular theory of language processing. Taking a different approach, the current study implemented a large-scale cumulative cloze task to collect word-by-word predictability data for 40 passages and compute surprisal and entropy reduction values in a theory-neutral manner. A separate group of participants read the same texts while their eye movements were recorded. Results showed that increases in surprisal and entropy reduction were both associated with increases in reading times. Further, these effects did not depend on the global difficulty of the text. The findings suggest that surprisal and entropy reduction independently contribute to variation in reading times, as these metrics seem to capture different aspects of lexical predictability.
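
Concretely, once the cumulative cloze task yields response counts at each word position, surprisal and entropy reduction follow directly from the cloze distribution. A minimal sketch; the response counts are invented, and clipping entropy reduction at zero is one common convention rather than a detail taken from the study.

    import math

    def cloze_distribution(counts):
        total = sum(counts.values())
        return {w: c / total for w, c in counts.items()}

    def surprisal(counts, word):
        """-log2 p_cloze(word): surprisal of the word actually presented."""
        return -math.log2(cloze_distribution(counts)[word])

    def entropy(counts):
        return -sum(p * math.log2(p) for p in cloze_distribution(counts).values())

    def entropy_reduction(counts_before, counts_after):
        """Drop in uncertainty about the upcoming material, clipped at zero."""
        return max(0.0, entropy(counts_before) - entropy(counts_after))

    # e.g. 60 of 100 participants continued with "coffee", 40 with "tea":
    print(surprisal({"coffee": 60, "tea": 40}, "coffee"))  # ~0.74 bits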


2020
Author(s): Charlotte Caucheteux, Jean-Rémi King

Deep learning has recently allowed substantial progress in language tasks such as translation and completion. Do such models process language similarly to humans, and is this similarity driven by systematic structural, functional, and learning principles? To address these issues, we tested whether the activations of 7,400 artificial neural networks trained on image, word, and sentence processing linearly map onto the hierarchy of human brain responses elicited during a reading task, using source-localized magneto-encephalography (MEG) recordings of one hundred and four subjects. Our results confirm that visual, word, and language models sequentially correlate with distinct areas of the left-lateralized cortical hierarchy of reading. However, only specific subsets of these models converge towards brain-like representations during their training. Specifically, when the algorithms are trained on language modeling, their middle layers become increasingly similar to the late responses of the language network in the brain. By contrast, input and output word embedding layers often diverge away from brain activity during training. These differences are primarily rooted in the sustained and bilateral responses of the temporal and frontal cortices. Together, these results suggest that the compositional, but not the lexical, representations of modern language models converge to a brain-like solution.
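
The "linear mapping" in such analyses is typically a regularized linear regression from network activations to brain responses, scored by cross-validation. A minimal sketch with scikit-learn, using random placeholder data and illustrative dimensions rather than anything from the study.

    import numpy as np
    from sklearn.linear_model import RidgeCV
    from sklearn.model_selection import cross_val_score

    # Placeholder shapes and data: per-word network activations vs. MEG responses.
    n_words, n_units, n_channels = 2000, 650, 300
    rng = np.random.default_rng(0)
    activations = rng.standard_normal((n_words, n_units))    # model layer activations
    meg = rng.standard_normal((n_words, n_channels))          # brain response per word

    encoder = RidgeCV(alphas=np.logspace(-2, 4, 7))
    # A full analysis would score every channel/source; one channel shown here.
    scores = cross_val_score(encoder, activations, meg[:, 0], cv=5, scoring="r2")
    print(scores.mean())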


2020, Vol 34 (05), pp. 7456-7463
Author(s): Zied Bouraoui, Jose Camacho-Collados, Steven Schockaert

One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation. We then use a subset of these extracted sentences as templates. Finally, we fine-tune a language model to predict whether a given word pair is likely to be an instance of some relation, when given an instantiated template for that relation as input.
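
A minimal sketch of the final step, classifying whether a word pair instantiates a relation given an instantiated template: the checkpoint is a generic placeholder whose classification head would still need to be fine-tuned as described above, and the template is an invented example rather than one mined by the method.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Placeholder checkpoint; after fine-tuning, label 1 would mean
    # "this pair is a likely instance of the relation".
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

    def relation_score(template, head, tail):
        """Probability that (head, tail) instantiates the templated relation."""
        text = template.format(head=head, tail=tail)
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            logits = model(**inputs).logits
        return torch.softmax(logits, dim=-1)[0, 1].item()

    print(relation_score("{head} is the capital of {tail}.", "Paris", "France"))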

