Lexical predictability during natural reading: Effects of surprisal and entropy reduction

2017 ◽  
Author(s):  
Matthew Lowder ◽  
Wonil Choi ◽  
Fernanda Ferreira ◽  
John Henderson

What are the effects of word-by-word predictability on sentence processing times during the natural reading of a text? Although information-complexity metrics such as surprisal and entropy reduction have been useful in addressing this question, these metrics tend to be estimated using computational language models, which require some degree of commitment to a particular theory of language processing. Taking a different approach, the current study implemented a large-scale cumulative cloze task to collect word-by-word predictability data for 40 passages and compute surprisal and entropy reduction values in a theory-neutral manner. A separate group of participants read the same texts while their eye movements were recorded. Results showed that increases in surprisal and entropy reduction were both associated with increases in reading times. Further, these effects did not depend on the global difficulty of the text. The findings suggest that surprisal and entropy reduction independently contribute to variation in reading times, as these metrics seem to capture different aspects of lexical predictability.
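As a rough illustration of how such theory-neutral metrics can be derived from cumulative cloze data, the sketch below computes surprisal and entropy reduction from raw cloze completions. The function names, the probability floor for unattested words, and the toy responses are illustrative assumptions, not the authors' exact procedure.

```python
import math
from collections import Counter

def cloze_distribution(responses):
    """Convert raw cloze completions for one word position into a probability distribution."""
    counts = Counter(responses)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

def surprisal(word, dist, eps=1e-6):
    """Surprisal in bits: -log2 of the cloze probability of the word actually presented.
    Words never produced in the cloze task get a small floor probability (eps)."""
    return -math.log2(dist.get(word, eps))

def entropy(dist):
    """Shannon entropy in bits of the distribution over cloze responses."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def entropy_reduction(prev_dist, curr_dist):
    """Drop in uncertainty after reading the word (floored at zero, as in Hale's metric)."""
    return max(0.0, entropy(prev_dist) - entropy(curr_dist))

# Toy example: cloze responses collected for two adjacent word positions.
pos1 = ["dog", "dog", "cat", "dog", "bird"]     # responses for position t
pos2 = ["barked", "barked", "barked", "ran"]    # responses for position t+1
d1, d2 = cloze_distribution(pos1), cloze_distribution(pos2)
print(surprisal("dog", d1))          # surprisal of the word actually presented
print(entropy_reduction(d1, d2))     # entropy reduction from position t to t+1
```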

2018 ◽  
Author(s):  
Christoph Aurnhammer ◽  
Stefan L. Frank

The Simple Recurrent Network (SRN) has a long tradition in cognitive models of language processing. More recently, gated recurrent networks have been proposed that often outperform the SRN on natural language processing tasks. Here, we investigate whether two types of gated networks perform better as cognitive models of sentence reading than SRNs, beyond their advantage as language models. This will reveal whether the filtering mechanism implemented in gated networks corresponds to an aspect of human sentence processing. We train a series of language models differing only in the cell types of their recurrent layers. We then compute word surprisal values for stimuli used in self-paced reading, eye-tracking, and electroencephalography experiments, and quantify the surprisal values' fit to experimental measures that indicate human sentence reading effort. While the gated networks provide better language models, they do not outperform their SRN counterpart as cognitive models when language model quality is equal across network types. Our results suggest that the different architectures are equally valid as models of human sentence processing.
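A minimal sketch of the kind of setup the abstract describes: language models that differ only in their recurrent cell type, and a routine that turns their next-word probabilities into per-word surprisal in bits. Layer sizes, vocabulary size, and the class and function names are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentLM(nn.Module):
    """Word-level language model whose recurrent layer can be an SRN, GRU, or LSTM."""
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256, cell="srn"):
        super().__init__()
        rnn_cls = {"srn": nn.RNN, "gru": nn.GRU, "lstm": nn.LSTM}[cell]
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = rnn_cls(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        hidden, _ = self.rnn(self.embed(token_ids))
        return self.out(hidden)  # logits over the vocabulary at every position

@torch.no_grad()
def word_surprisals(model, token_ids):
    """Surprisal (in bits) of each word given its left context: -log2 P(w_t | w_<t)."""
    logits = model(token_ids[:, :-1])                      # predict token t from tokens < t
    log_probs = torch.log_softmax(logits, dim=-1)
    targets = token_ids[:, 1:]
    nats = -log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return nats / torch.log(torch.tensor(2.0))             # convert nats to bits

# Models differing only in their recurrent cell type (vocabulary size is illustrative).
srn_lm = RecurrentLM(vocab_size=10_000, cell="srn")
gru_lm = RecurrentLM(vocab_size=10_000, cell="gru")
sentence = torch.randint(0, 10_000, (1, 12))               # placeholder token ids
print(word_surprisals(srn_lm, sentence))
```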


Author(s):  
Jos J. A. van Berkum

This chapter on language comprehension, emotion, and sociality presents a theory of language processing that goes beyond the usual focus on constructing representations of what is said and meant, and that explicitly models how such construction processes mesh with emotion. It starts by asking why research on the interface between language and emotion is relatively marginal in psycholinguistics, and subsequently reviews current ideas on the nature and function of emotion (covering short-lived emotions, evaluations, and mood). Next, it presents the Affective Language Comprehension or ALC model, a wide-scope processing model that combines insights from the psycholinguistics of word and sentence processing, the pragmatic analysis of communication, and emotion science. The model accommodates verbal and non-verbal (e.g. emoji) signing, and provides a principled take on word valence. By examining how linguistic and other signs actually move people, it also adds to our understanding of the relation between language and human sociality.


2020 ◽  
Author(s):  
Charlotte Caucheteux ◽  
Jean-Rémi King

Abstract Deep learning has recently allowed substantial progress in language tasks such as translation and completion. Do such models process language similarly to humans, and is this similarity driven by systematic structural, functional and learning principles? To address these issues, we tested whether the activations of 7,400 artificial neural networks trained on image, word and sentence processing linearly map onto the hierarchy of human brain responses elicited during a reading task, using source-localized magneto-encephalography (MEG) recordings of one hundred and four subjects. Our results confirm that visual, word and language models sequentially correlate with distinct areas of the left-lateralized cortical hierarchy of reading. However, only specific subsets of these models converge towards brain-like representations during their training. Specifically, when the algorithms are trained on language modeling, their middle layers become increasingly similar to the late responses of the language network in the brain. By contrast, input and output word embedding layers often diverge away from brain activity during training. These differences are primarily rooted in the sustained and bilateral responses of the temporal and frontal cortices. Together, these results suggest that the compositional - but not the lexical - representations of modern language models converge to a brain-like solution.
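The abstract describes linearly mapping network activations onto brain responses. The sketch below shows one common way to implement such a mapping with cross-validated ridge regression and a correlation-based "brain score"; the estimator, fold count, and toy data are assumptions, not the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_score(activations, meg_responses, alphas=np.logspace(-2, 4, 7)):
    """Cross-validated linear mapping from network activations (n_words x n_units)
    to MEG responses (n_words x n_sources); returns the mean Pearson correlation
    between predicted and held-out responses, a common 'brain score' style metric."""
    scores = []
    for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(activations):
        model = RidgeCV(alphas=alphas).fit(activations[train], meg_responses[train])
        pred = model.predict(activations[test])
        # Correlation per output dimension, averaged.
        r = [np.corrcoef(pred[:, i], meg_responses[test][:, i])[0, 1]
             for i in range(meg_responses.shape[1])]
        scores.append(np.nanmean(r))
    return float(np.mean(scores))

# Toy data: 500 words, 300 hidden units, 20 source-localized MEG channels.
rng = np.random.default_rng(0)
acts = rng.standard_normal((500, 300))
meg = acts @ rng.standard_normal((300, 20)) + rng.standard_normal((500, 20))
print(brain_score(acts, meg))
```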


2020 ◽  
Author(s):  
Shoya Wada ◽  
Toshihiro Takeda ◽  
Shiro Manabe ◽  
Shozo Konishi ◽  
Jun Kamohara ◽  
...  

Abstract Background: Pre-training large-scale neural language models on raw texts has been shown to make a significant contribution to a strategy for transfer learning in natural language processing (NLP). With the introduction of transformer-based language models, such as Bidirectional Encoder Representations from Transformers (BERT), the performance of information extraction from free text by NLP has significantly improved for both the general domain and the medical domain; however, for languages with few large, high-quality, publicly available medical databases, it is difficult to train medical BERT models that perform well. Method: We introduce a method to train a BERT model using a small medical corpus both in English and in Japanese. Our proposed method consists of two interventions: simultaneous pre-training, which is intended to ensure that masked language modeling and next-sentence prediction are applied to the small medical corpus throughout pre-training, and amplified vocabulary, which tailors the customized byte-pair-encoding vocabulary to the small medical corpus. Moreover, we used whole PubMed abstracts and developed a high-performance BERT model, Bidirectional Encoder Representations from Transformers for Biomedical Text Mining by Osaka University (ouBioBERT), in English via our method. We then evaluated the performance of our BERT models against publicly available baselines. Results: We confirmed that our Japanese medical BERT outperforms conventional baselines and the other BERT models on the medical-document classification task, and that our English BERT pre-trained using both the general and medical domain corpora performs sufficiently well for practical use on the biomedical language understanding evaluation (BLUE) benchmark. Moreover, the total BLUE score of ouBioBERT is 1.1 points above that of BioBERT and 0.3 points above that of the ablation model trained without our proposed method. Conclusions: Our proposed method makes it feasible to construct a practical medical BERT model in both Japanese and English, and it has the potential to produce higher-performing models for biomedical shared tasks.
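The sketch below illustrates one way the "simultaneous pre-training" idea could be realised: every mini-batch mixes examples from the large general corpus with examples from the small medical corpus, so the masked-LM and next-sentence-prediction objectives see medical text at every step. The mixing ratio, batch size, and function name are assumptions, not the authors' exact recipe.

```python
import itertools
import random

def simultaneous_batches(general_corpus, medical_corpus, batch_size=32, medical_fraction=0.5):
    """Yield mixed mini-batches so that the small medical corpus appears in every
    pre-training step alongside the large general corpus (a sketch of the
    'simultaneous pre-training' idea; the mixing ratio is an assumption)."""
    n_medical = int(batch_size * medical_fraction)
    medical_cycle = itertools.cycle(medical_corpus)     # small corpus is revisited repeatedly
    general_iter = iter(general_corpus)
    while True:
        try:
            general_part = [next(general_iter) for _ in range(batch_size - n_medical)]
        except StopIteration:
            return
        medical_part = [next(medical_cycle) for _ in range(n_medical)]
        batch = general_part + medical_part
        random.shuffle(batch)
        yield batch   # each batch then feeds the masked-LM / next-sentence-prediction objectives

general = [f"general sentence pair {i}" for i in range(1000)]
medical = [f"medical sentence pair {i}" for i in range(50)]
first_batch = next(simultaneous_batches(general, medical))
print(len(first_batch), sum("medical" in s for s in first_batch))
```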


2021 ◽  
Vol 4 ◽  
Author(s):  
Nikolai Ilinykh ◽  
Simon Dobnik

Neural networks have proven to be very successful in automatically capturing the composition of language and different structures across a range of multi-modal tasks. Thus, an important question to investigate is how neural networks learn and organise such structures. Numerous studies have examined the knowledge captured by language models (LSTMs, transformers) and vision architectures (CNNs, vision transformers) for respective uni-modal tasks. However, very few have explored what structures are acquired by multi-modal transformers, where linguistic and visual features are combined. It is critical to understand the representations learned by each modality, their respective interplay, and the task's effect on these representations in large-scale architectures. In this paper, we take a multi-modal transformer trained for image captioning and examine the structure of the self-attention patterns extracted from the visual stream. Our results indicate that the information about different relations between objects in the visual stream is hierarchical and varies from a local to a global object-level understanding of the image. In particular, while visual representations in the first layers encode the knowledge of relations between semantically similar object detections, often constituting neighbouring objects, deeper layers expand their attention across more distant objects and learn global relations between them. We also show that globally attended objects in deeper layers can be linked with entities described in image descriptions, indicating a critical finding: the indirect effect of language on visual representations. In addition, we highlight how object-based input representations affect the structure of learned visual knowledge and guide the model towards more accurate image descriptions. A parallel question that we investigate is whether the insights from cognitive science echo the structure of representations that the current neural architecture learns. The proposed analysis of the inner workings of multi-modal transformers can be used to better understand and improve on such problems as pre-training of large-scale multi-modal architectures, multi-modal information fusion, and probing of attention weights. In general, we contribute to explainable multi-modal natural language processing and to the currently shallow understanding of how the input representations and the structure of the multi-modal transformer affect visual representations.
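As a small illustration of the kind of analysis the abstract describes, the sketch below summarises how "local" or "global" a layer's visual self-attention is by measuring the attention-weighted distance between detected object regions. The metric, array shapes, and toy data are assumptions for illustration, not the authors' exact method.

```python
import numpy as np

def mean_attention_distance(attn, box_centres):
    """For one layer of a multi-modal transformer's visual stream, compute how far
    (in normalised image coordinates) each object's attention reaches on average.

    attn:        (n_heads, n_objects, n_objects) self-attention weights, rows sum to 1
    box_centres: (n_objects, 2) centre (x, y) of each detected object's bounding box
    Returns the attention-weighted mean pairwise distance, averaged over heads and queries.
    """
    diffs = box_centres[:, None, :] - box_centres[None, :, :]       # pairwise offsets
    dists = np.linalg.norm(diffs, axis=-1)                          # (n_objects, n_objects)
    per_head = (attn * dists[None]).sum(axis=-1)                    # expected distance per query
    return float(per_head.mean())

# Toy example: 3 heads attending over 5 detected objects.
rng = np.random.default_rng(0)
attn = rng.random((3, 5, 5))
attn /= attn.sum(axis=-1, keepdims=True)                            # normalise attention rows
centres = rng.random((5, 2))                                        # normalised box centres
print(mean_attention_distance(attn, centres))   # small = local attention, large = global
```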


2021 ◽  
Vol 8 (2) ◽  
pp. 205395172110477
Author(s):  
Dieuwertje Luitse ◽  
Wiebke Denkena

In recent years, AI research has become more and more computationally demanding. In natural language processing (NLP), this tendency is reflected in the emergence of large language models (LLMs) like GPT-3. These powerful neural network-based models can be used for a range of NLP tasks and their language generation capacities have become so sophisticated that it can be very difficult to distinguish their outputs from human language. LLMs have raised concerns over their demonstrable biases, heavy environmental footprints, and future social ramifications. In December 2020, critical research on LLMs led Google to fire Timnit Gebru, co-lead of the company’s AI Ethics team, which sparked a major public controversy around LLMs and the growing corporate influence over AI research. This article explores the role LLMs play in the political economy of AI as infrastructural components for AI research and development. Retracing the technical developments that have led to the emergence of LLMs, we point out how they are intertwined with the business model of big tech companies and further shift power relations in their favour. This becomes visible through the Transformer, which is the underlying architecture of most LLMs today and started the race for ever bigger models when it was introduced by Google in 2017. Using the example of GPT-3, we shed light on recent corporate efforts to commodify LLMs through paid API access and exclusive licensing, raising questions around monopolization and dependency in a field that is increasingly divided by access to large-scale computing power.


Author(s):  
Juntao Li ◽  
Ruidan He ◽  
Hai Ye ◽  
Hwee Tou Ng ◽  
Lidong Bing ◽  
...  

Recent research indicates that pretraining cross-lingual language models on large-scale unlabeled texts yields significant performance improvements over various cross-lingual and low-resource tasks. Through training on one hundred languages and terabytes of texts, cross-lingual language models have proven to be effective in leveraging high-resource languages to enhance low-resource language processing and outperform monolingual models. In this paper, we further investigate the cross-lingual and cross-domain (CLCD) setting when a pretrained cross-lingual language model needs to adapt to new domains. Specifically, we propose a novel unsupervised feature decomposition method that can automatically extract domain-specific features and domain-invariant features from the entangled pretrained cross-lingual representations, given unlabeled raw texts in the source language. Our proposed model leverages mutual information estimation to decompose the representations computed by a cross-lingual model into domain-invariant and domain-specific parts. Experimental results show that our proposed method achieves significant performance improvements over the state-of-the-art pretrained cross-lingual language model in the CLCD setting.
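A minimal sketch of the general idea of decomposing pretrained cross-lingual representations into domain-invariant and domain-specific parts while estimating the mutual information between them. The abstract does not specify the estimator, so a MINE-style Donsker-Varadhan bound is used here purely as an illustration; layer sizes and all names are assumptions.

```python
import torch
import torch.nn as nn

class FeatureDecomposer(nn.Module):
    """Split a pooled cross-lingual sentence representation into a domain-invariant
    part and a domain-specific part (a sketch; the paper's exact architecture and
    losses are not given in the abstract)."""
    def __init__(self, dim=768, part_dim=256):
        super().__init__()
        self.invariant = nn.Linear(dim, part_dim)
        self.specific = nn.Linear(dim, part_dim)

    def forward(self, h):
        return self.invariant(h), self.specific(h)

class MINEEstimator(nn.Module):
    """Donsker-Varadhan lower bound on the mutual information between the two parts;
    minimising this bound pushes them towards independence."""
    def __init__(self, part_dim=256, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * part_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, a, b):
        joint = self.net(torch.cat([a, b], dim=-1)).mean()
        b_shuffled = b[torch.randperm(b.size(0))]          # samples from the product of marginals
        marginal = torch.logsumexp(self.net(torch.cat([a, b_shuffled], dim=-1)), dim=0) \
                   - torch.log(torch.tensor(float(b.size(0))))
        return joint - marginal                            # mutual-information lower-bound estimate

# Toy usage: pooled representations from a pretrained cross-lingual encoder (placeholder data).
h = torch.randn(64, 768)
decomposer, mi = FeatureDecomposer(), MINEEstimator()
inv, spec = decomposer(h)
mi_estimate = mi(inv, spec)   # added (with opposite signs) to the task loss and the estimator loss
print(mi_estimate.item())
```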


2021 ◽  
Author(s):  
Dezhou Shen

Abstract Recent work in language modeling has shown that training large-scale Transformer models has driven the latest developments in natural language processing applications. However, there is very little work on unifying the current effective models. In this work, we use currently effective model structures and the most mainstream technology to launch a set of models that we expect to serve as basic models in the future. For Chinese, using the GPT-2 [9] model, a 10.3-billion-parameter language model was trained on a Chinese dataset and, in particular, a 2.9-billion-parameter language model was trained on dialogue data; a BERT model with 495 million parameters was trained on a Chinese dataset; and a Transformer model was trained as a language model with 5.6 billion parameters on a Chinese dataset. Corresponding training was also done for English. Using the GPT-2 model, a language model with 6.4 billion parameters was trained on an English dataset; the BERT [3] model was trained as a language model with 1.24 billion parameters on an English dataset and, in particular, a 688-million-parameter language model was trained using single-card training technology; and a Transformer model was trained as a language model with 5.6 billion parameters on an English dataset. On the TNEWS classification task evaluated by CLUE [13], the BERT-C model reached an accuracy of 59.99%, exceeding the 59.46% accuracy of ALBERT-xxlarge by 0.53%. On the QQP classification task evaluated by GLUE [11], its accuracy of 78.95% surpassed the 72.1% accuracy of BERT-Large, an increase of 6.85%, and exceeded by 3.75% the 75.2% accuracy of ERNIE, currently in first place in the GLUE evaluation.


PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0257230
Author(s):  
Huijiong Yan ◽  
Tao Qian ◽  
Liang Xie ◽  
Shanguang Chen

Named entity recognition (NER) is one fundamental task in the natural language processing (NLP) community. Supervised neural network models based on contextualized word representations can achieve highly-competitive performance, but they require a large-scale manually-annotated corpus for training. For resource-scarce languages, the construction of such a corpus is expensive and time-consuming, so unsupervised cross-lingual transfer is one good solution to address the problem. In this work, we investigate unsupervised cross-lingual NER with model transfer based on contextualized word representations, which greatly advances cross-lingual NER performance. We study several model transfer settings of unsupervised cross-lingual NER, including (1) different types of pretrained transformer-based language models as input, (2) exploration strategies for the multilingual contextualized word representations, and (3) multi-source adaptation. In particular, we propose an adapter-based word representation method combined with a parameter generation network (PGN) to better capture the relationship between the source and target languages. We conduct experiments on the benchmark CoNLL dataset involving four languages to simulate the cross-lingual setting. Results show that we can obtain highly-competitive performance by cross-lingual model transfer. In particular, our proposed adapter-based PGN model can lead to significant improvements for cross-lingual NER.
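A minimal sketch of zero-shot cross-lingual model transfer for NER with a pretrained multilingual encoder: the token-classification head is fine-tuned on the source language (English) and then applied unchanged to target-language text. The checkpoint name, label set, and helper function are illustrative assumptions, not the authors' configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Multilingual encoder; after fine-tuning its token-classification head on English
# CoNLL NER data, the same weights are applied unchanged to other languages
# (zero-shot model transfer). Checkpoint name and label set are illustrative.
MODEL_NAME = "xlm-roberta-base"
LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG", "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))
# ... fine-tune `model` on the English (source-language) training split here ...

@torch.no_grad()
def tag(sentence):
    """Predict NER tags for a target-language sentence with the source-trained model."""
    enc = tokenizer(sentence, return_tensors="pt")
    pred_ids = model(**enc).logits.argmax(dim=-1)[0]
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return [(tok, LABELS[i]) for tok, i in zip(tokens, pred_ids.tolist())]

print(tag("Angela Merkel besuchte Berlin."))   # German sentence, unseen during fine-tuning
```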


Author(s):  
David Biesner ◽  
Rajkumar Ramamurthy ◽  
Robin Stenzel ◽  
Max Lübbering ◽  
Lars Hillebrand ◽  
...  

Abstract The automatization and digitalization of business processes have led to an increased need for efficient information extraction from business documents. However, financial and legal documents are often not utilized effectively by text processing or machine learning systems, partly due to the presence of sensitive information in these documents, which restricts their usage beyond authorized parties and purposes. To overcome this limitation, we develop an anonymization method for German financial and legal documents using state-of-the-art natural language processing methods based on recurrent neural nets and transformer architectures. We present a web-based application to anonymize financial documents and a large-scale evaluation of different deep learning techniques.
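As a small illustration of the final anonymization step (not the authors' system), the sketch below replaces entity spans detected by a sequence-tagging model with typed placeholders; the placeholder format, labels, and example spans are assumptions.

```python
def anonymize(text, entities):
    """Replace detected sensitive spans with typed placeholders.
    `entities` is a list of (start, end, label) character spans, e.g. as produced by a
    recurrent or transformer-based sequence tagger; the label names are illustrative."""
    counters, out, last = {}, [], 0
    for start, end, label in sorted(entities):
        counters[label] = counters.get(label, 0) + 1
        out.append(text[last:start])
        out.append(f"[{label}_{counters[label]}]")     # e.g. [PERSON_1], [ORG_1]
        last = end
    out.append(text[last:])
    return "".join(out)

doc = "Herr Max Mustermann überwies am 3. Mai 5.000 EUR an die Beispielbank AG."
# Spans would normally come from the NER model; here they are located by hand for illustration.
spans = [(doc.index("Max"), doc.index("Max") + len("Max Mustermann"), "PERSON"),
         (doc.index("Beispielbank"), doc.index("Beispielbank") + len("Beispielbank AG"), "ORG")]
print(anonymize(doc, spans))   # -> "Herr [PERSON_1] überwies am 3. Mai 5.000 EUR an die [ORG_1]."
```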

