Fine-tuning of deep language models as a computational framework of modeling listeners' perspective during language comprehension

2021 ◽  
Author(s):  
Refael Tikochinski ◽  
Ariel Goldstein ◽  
Yaara Yeshurun ◽  
Uri Hasson ◽  
Roi Reichart

Computational Deep Language Models (DLMs) have been shown to be effective in predicting neural responses during natural language processing. This study introduces a novel computational framework, based on the concept of fine-tuning (Hinton, 2007), for modeling differences in the interpretation of narratives that arise from the listeners' perspective (i.e., their prior knowledge, thoughts, and beliefs). We draw on an fMRI experiment conducted by Yeshurun et al. (2017), in which two groups of listeners heard the same narrative but with two different perspectives (cheating versus paranoia). We collected a dedicated dataset of ~3000 stories and used it to create two modified (fine-tuned) versions of a pre-trained DLM, each representing the perspective of a different group of listeners. Information extracted from each of the two fine-tuned models provided a better fit to the neural responses of the corresponding group of listeners. Furthermore, we show that the degree of difference between the listeners' interpretations of the story, as measured both neurally and behaviorally, can be approximated using the distances between the representations of the story extracted from the two fine-tuned models. These model-brain associations were expressed in many language-related brain areas, as well as in several higher-order areas related to the default-mode and mentalizing networks, implying that computational fine-tuning reliably captures relevant aspects of human language comprehension across different levels of cognitive processing.
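
A minimal sketch of the representational-distance idea described above, assuming two perspective-specific checkpoints of a pretrained causal language model; the checkpoint paths and story segments below are placeholders, not the study's materials or exact pipeline:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Stand-ins: in practice these would point at two copies of the same pretrained DLM,
# each fine-tuned on stories written from one perspective (e.g., cheating vs. paranoia).
CHECKPOINT_A = "gpt2"
CHECKPOINT_B = "gpt2"

story_segments = [
    "He checked his phone again before leaving the house.",
    "She said she was working late at the office tonight.",
]

def segment_embeddings(checkpoint, segments):
    """One vector per segment: mean-pool the last hidden layer over tokens."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()
    vectors = []
    with torch.no_grad():
        for text in segments:
            inputs = tokenizer(text, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
            vectors.append(hidden.mean(dim=1).squeeze(0))
    return torch.stack(vectors)

emb_a = segment_embeddings(CHECKPOINT_A, story_segments)
emb_b = segment_embeddings(CHECKPOINT_B, story_segments)

# Per-segment distance between the two perspective-specific representations; larger
# distances would be expected to track larger neural/behavioral divergence between groups.
distance = 1 - torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=1)
for segment, d in zip(story_segments, distance.tolist()):
    print(f"{d:.3f}  {segment}")
```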

2020 ◽  
Author(s):  
Kun Sun

Expectations or predictions about upcoming content play an important role during language comprehension and processing. One important aspect of recent studies of language comprehension and processing concerns the estimation of upcoming words in a sentence or discourse. Many studies have used eye-tracking data to explore computational and cognitive models of contextual word prediction and word processing. Eye-tracking data have previously been widely used to investigate the factors that influence word prediction. However, these studies are problematic on several levels, including the stimuli, the corpora, and the statistical tools they applied. Although various computational models have been proposed for simulating contextual word predictions, past studies have usually relied on a single computational model, which often cannot give an adequate account of cognitive processing in language comprehension. To avoid these problems, this study uses a large, natural, and coherent discourse as the stimulus for collecting reading-time data. It trains two state-of-the-art computational models, surprisal and semantic (dis)similarity derived from word vectors by linear discriminative learning (LDL), which measure knowledge of the syntagmatic and paradigmatic structure of language, respectively. We develop a `dynamic approach' to computing semantic (dis)similarity; this is the first time these two computational models have been combined. The models are evaluated using advanced statistical methods. In addition, to test the efficiency of our approach, a recently developed cosine method of computing semantic (dis)similarity from word vectors is compared with our `dynamic' approach. The two computational models and the fixed-effects statistical models can be used to cross-verify the findings, ensuring that the results are reliable. All results support the conclusion that surprisal and semantic similarity make opposing contributions to predicting the reading time of words, although both are good predictors. Additionally, our `dynamic' approach performs better than the popular cosine method. The findings of this study therefore contribute to a better understanding of how humans process words in real-world contexts and how they make predictions in language cognition and processing.
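
As a rough, self-contained illustration of the two predictor families discussed above (not the paper's LDL implementation), the sketch below computes per-word surprisal from an off-the-shelf causal language model together with a simple cosine (dis)similarity of each word to its preceding context. The model choice and toy sentence are assumptions, and the cosine term uses the model's hidden states rather than the static word vectors of the original cosine method:

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The librarian quietly reshelved the heavy dictionary."
enc = tokenizer(sentence, return_tensors="pt")
ids = enc["input_ids"][0]

with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

log_probs = torch.log_softmax(out.logits, dim=-1)   # (1, seq_len, vocab)
hidden = out.hidden_states[-1][0]                   # (seq_len, dim) last-layer states

for i in range(1, ids.size(0)):
    token = tokenizer.decode(ids[i])
    # Surprisal in bits: -log2 P(token_i | tokens_<i)
    surprisal = -log_probs[0, i - 1, ids[i]].item() / math.log(2)
    # Cosine dissimilarity between the token and the mean of its preceding context
    context = hidden[:i].mean(dim=0)
    dissim = 1 - torch.nn.functional.cosine_similarity(hidden[i], context, dim=0).item()
    print(f"{token!r:>16}  surprisal={surprisal:5.2f} bits  dissimilarity={dissim:.3f}")
```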


2021 ◽  
Author(s):  
Oscar Nils Erik Kjell ◽  
H. Andrew Schwartz ◽  
Salvatore Giorgi

The language that individuals use to express themselves contains rich psychological information. Recent significant advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language, such as machine translation. However, these state-of-the-art methods have not yet been made easily accessible to psychology researchers, nor have they been designed to be optimal for human-level analyses. This tutorial introduces text (www.r-text.org), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. Text is both a modular solution for accessing state-of-the-art language models and an end-to-end solution catered to human-level analyses. Hence, text provides user-friendly functions tailored to testing hypotheses in the social sciences, for both relatively small and large datasets. This tutorial describes useful methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf, as well as a framework that advanced users can build on for novel techniques and analysis pipelines. The reader learns about six methods: 1) textEmbed: to transform text into traditional or modern transformer-based word embeddings (i.e., numeric representations of words); 2) textTrain: to examine relationships between text and numeric/categorical variables; 3) textSimilarity and 4) textSimilarityTest: to compute semantic similarity scores between texts and to test the significance of differences in meaning between two sets of texts; and 5) textProjection and 6) textProjectionPlot: to examine and visualize text within the embedding space according to latent or specified construct dimensions (e.g., low to high rating scale scores).
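
The text package itself is written in R; as a rough Python analogue of its textEmbed and textTrain steps (an assumption for illustration, not the package's own API), the sketch below embeds short responses with a transformer and relates the embeddings to a numeric rating via cross-validated ridge regression:

```python
import numpy as np
import torch
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from transformers import AutoModel, AutoTokenizer

texts = [
    "I feel calm and content today.",
    "Everything feels overwhelming and hopeless.",
    "Work was fine, nothing special happened.",
    "I am excited about the weekend plans.",
]
ratings = np.array([2.0, 9.0, 4.0, 1.0])   # toy self-report scores, e.g., distress

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    """Mean-pooled last-layer embedding of one text (analogue of textEmbed)."""
    with torch.no_grad():
        out = model(**tokenizer(text, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

X = np.stack([embed(t) for t in texts])

# Analogue of textTrain: relate embeddings to the numeric variable. Real analyses
# need far more observations; 2-fold CV here only keeps the toy example runnable.
scores = cross_val_score(Ridge(alpha=1.0), X, ratings, cv=2)
print("cross-validated R^2 per fold:", scores)
```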


2017 ◽  
Vol 4 (2) ◽  
pp. 58-66
Author(s):  
Роман Тарабань ◽  
Бандара Ахінта

In 2002, Hauser, Chomsky, and Fitch published an article in which they introduced a distinction between properties of language that are exclusively part of human communication (i.e., the FLN) and those properties that might be shared with other species (i.e., the FLB). The sole property proposed for the FLN was recursion. Hauser et al. provided evidence for their position based on evolutionary considerations. The question of the required properties of human language is central to developing theories of language processing and acquisition. In the present critique of Hauser et al. we consider two examples from non-English languages that argue against the suggestion that recursion is the sole property within the human language faculty. These are i) agreement of inflectional morphemes across sentence constructions, and ii) synthetic one-word constructions.

References
Adger, D. (2003). Core Syntax: A Minimalist Approach. Oxford: Oxford University Press.
Bates, E., & MacWhinney, B. (1989). Functionalism and the Competition Model. In B. MacWhinney & E. Bates (Eds.), The Crosslinguistic Study of Sentence Processing (pp. 3–76). New York: Cambridge University Press.
Bickerton, D. (2009). Recursion: core of complexity or artifact of analysis? In T. Givón & M. Shibatani (Eds.), Syntactic Complexity: Diachrony, Acquisition, Neuro-Cognition, Evolution (pp. 531–543). Amsterdam: John Benjamins.
Chomsky, N. (1957). Syntactic Structures (2nd edition published in 2002). Berlin: Mouton.
Chomsky, N. (1959). On certain formal properties of grammars. Information and Control, 2, 137–167.
Chomsky, N. (1995). The Minimalist Program for Linguistic Theory. Cambridge, MA: MIT Press.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579.
Luuk, E., & Luuk, H. (2011). The redundancy of recursion and infinity for natural language. Cognitive Processing, 12, 1–11.
Marantz, A. (1997). No escape from syntax: Don't try morphological analysis in the privacy of your own lexicon. In A. Dimitriadis, L. Siegel, et al. (Eds.), University of Pennsylvania Working Papers in Linguistics, 4(2), 201–225.
MacWhinney, B., & O'Grady, W. (Eds.). (2015). Handbook of Language Emergence. New York: Wiley.
Nevins, A., Pesetsky, D., & Rodrigues, C. (2009). Pirahã exceptionality: A reassessment. Language, 85(2), 355–404.
Ott, D. (2009). The evolution of I-language: Lexicalization as the key evolutionary novelty. Biolinguistics, 3, 255–269.
Sauerland, U., & Trotzke, A. (2011). Biolinguistic perspectives on recursion: Introduction to the special issue. Biolinguistics, 5, 1–9.
Trotzke, A., Bader, M., & Frazier, L. (2013). Third factors and the performance interface in language design. Biolinguistics, 7, 1–34.


2019 ◽  
Author(s):  
Salomi S. Asaridou ◽  
Ö. Ece Demir-Lira ◽  
Julia Uddén ◽  
Susan Goldin-Meadow ◽  
Steven L. Small

Adolescence is a developmental period in which social interactions become increasingly important. Successful social interactions rely heavily on pragmatic competence, the appropriate use of language in different social contexts, a skill that is still developing in adolescence. In the present study, we used fMRI to characterize the brain networks underlying pragmatic language processing in typically developing adolescents. We used an indirect speech paradigm whereby participants were presented with question/answer dialogues in which the meaning of the answer had to be inferred from the context, in this case the preceding question. Participants were presented with three types of answers: (1) direct replies, i.e., simple answers to open-ended questions, (2) indirect informative replies, i.e., answers in which the speaker’s intention was to add more information to a yes/no question, and (3) indirect affective replies, i.e., answers in which the speaker’s intention was to express polite refusals, negative opinions or to save face in response to an emotionally charged question. We found that indirect affective replies elicited the strongest response in brain areas associated with language comprehension (superior temporal gyri), theory of mind (medial prefrontal cortex, temporo-parietal junction, and precuneus), and attention/working memory (inferior frontal gyri). The increased activation to indirect affective as opposed to indirect informative and direct replies potentially reflects the high salience of opinions and perspectives of others in adolescence. Our results add to previous findings on socio-cognitive processing in adolescents and extend them to pragmatic language comprehension.


2021 ◽  
Author(s):  
Arousha Haghighian Roudsari ◽  
Jafar Afshar ◽  
Wookey Lee ◽  
Suan Lee

Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way that conveys knowledge efficiently. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents and for facilitating reliable search, retrieval, and further patent analysis. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, on the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, using various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for the experiments. We conclude that fine-tuning the pre-trained language models on patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, coverage error, and LRAP.
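
A condensed sketch of the kind of fine-tuning the paper evaluates, assuming a pretrained encoder with a multi-label classification head trained under a sigmoid/BCE objective; the label subset, example text, and hyperparameters are placeholders rather than the paper's USPTO-2M configuration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["A61", "B60", "G06", "H04"]  # toy subset of patent classification codes

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # BCE-with-logits loss under the hood
)

text = "A battery management circuit for an electric vehicle drivetrain."
target = torch.tensor([[0.0, 1.0, 0.0, 0.0]])   # multi-hot: document belongs to B60

inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**inputs, labels=target).loss       # one gradient step as illustration
loss.backward()
optimizer.step()

model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]
print({label: round(p.item(), 3) for label, p in zip(LABELS, probs)})
```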


2020 ◽  
Vol 34 (10) ◽  
pp. 13917-13918
Author(s):  
Dean L. Slack ◽  
Mariann Hardey ◽  
Noura Al Moubayed

Contextual word embeddings produced by neural language models, such as BERT or ELMo, have seen widespread application and performance gains across many Natural Language Processing tasks, suggesting that rich linguistic features are encoded in their representations. This work investigates the extent to which hierarchical linguistic information is encoded in a single contextual embedding. Using labelled constituency trees, we train simple linear classifiers on top of single contextualised word representations for ancestor sentiment analysis tasks at multiple constituency levels of a sentence. To assess the presence of hierarchical information throughout the networks, the linear classifiers are trained on representations produced by each intermediate layer of BERT and ELMo variants. We show that with no fine-tuning, a single contextualised representation encodes enough syntactic and semantic sentence-level information to significantly outperform a non-contextual baseline in classifying the 5-class sentiment of its ancestor constituents at multiple levels of the constituency tree. Additionally, we show that both LSTM and transformer architectures trained on similarly sized datasets achieve similar levels of performance on these tasks. Future work will expand the analysis to a wider range of NLP tasks and contextualisers.
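
A toy version of the probing setup, assuming mean-pooled sentence representations and a tiny binary-sentiment stand-in for the paper's 5-class constituent labels: freeze the contextualiser, take each layer's representation, and fit a simple linear classifier on top of it.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentences = [
    "The film was an absolute delight.",
    "The plot was dull and the acting worse.",
    "A warm, generous and funny picture.",
    "A tedious, joyless two hours.",
]
labels = np.array([1, 0, 1, 0])   # toy binary sentiment labels

def layer_features(text):
    """Mean-pool tokens within each layer (embeddings + 12 transformer layers)."""
    with torch.no_grad():
        hidden_states = model(**tokenizer(text, return_tensors="pt")).hidden_states
    return [h.mean(dim=1).squeeze(0).numpy() for h in hidden_states]

# Group features by layer: one probe per layer, trained on the frozen representations.
features_per_layer = list(zip(*[layer_features(s) for s in sentences]))

for layer, feats in enumerate(features_per_layer):
    X = np.stack(feats)
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {layer:2d}: training accuracy = {probe.score(X, labels):.2f}")
```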


2020 ◽  
Vol 34 (05) ◽  
pp. 8766-8774 ◽  
Author(s):  
Timo Schick ◽  
Hinrich Schütze

Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Taking BERT, a recently proposed architecture of this kind, as an example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method designed to explicitly learn embeddings for rare words, to deep language models. To make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT substantially improves its understanding of rare words.
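
For illustration only, the sketch below shows the problem being addressed: a rare word is split into several subword pieces and therefore has no single embedding of its own. The averaged-piece vector used here is a naive stand-in, not Attentive Mimicking or the one-token approximation proposed in the paper:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
embeddings = model.get_input_embeddings()        # static wordpiece embedding matrix

rare, common = "pellucid", "clear"
rare_pieces = tokenizer.tokenize(rare)           # e.g. several wordpieces, not one token
print(f"{rare!r} -> {rare_pieces}")

rare_ids = torch.tensor(tokenizer.convert_tokens_to_ids(rare_pieces))
rare_vec = embeddings(rare_ids).mean(dim=0)      # naive single vector for the rare word

common_id = torch.tensor(tokenizer.convert_tokens_to_ids([common]))
common_vec = embeddings(common_id).squeeze(0)

similarity = torch.nn.functional.cosine_similarity(rare_vec, common_vec, dim=0)
print(f"cosine({rare!r} avg-of-pieces, {common!r}) = {similarity.item():.3f}")
```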


2021 ◽  
Vol 5 ◽  
Author(s):  
Elke Teich ◽  
Peter Fankhauser ◽  
Stefania Degaetano-Ortlieb ◽  
Yuri Bizzoni

We present empirical evidence of the communicative utility of conventionalization, i.e., convergence in linguistic usage over time, and diversification, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015), and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined, pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation, showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and that allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e., the reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability, we use entropy, which measures, in bits of information, the contribution of linguistic units (e.g., words) to predicting linguistic choice. Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces, which in turn shows cognitive reflexes in certain language processing tasks (Mitchell et al., 2008; Auguste et al., 2017). In terms of domain, we focus on science, looking at the diachronic development of scientific English from the 17th century to modern times. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification, shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to the introduction of new technical terminology.
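
A rough operationalization of the paradigm-as-neighbourhood idea (an assumption for illustration, not the authors' exact entropy measure): treat a word's nearest neighbours in an embedding space as its paradigm, turn their similarity scores into a probability distribution, and compute its entropy in bits, where lower entropy indicates fewer strongly competing options, i.e., reduced paradigmatic variability.

```python
import math
from gensim.models import Word2Vec

# Toy corpus as a placeholder for a diachronic slice of scientific English.
corpus = [
    "we measure the rate of the reaction under controlled conditions".split(),
    "we observe the rate of change in the measured quantity".split(),
    "the experiment shows a constant rate across repeated trials".split(),
    "results indicate the rate depends on temperature and pressure".split(),
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

def paradigmatic_entropy(word, topn=5):
    """Entropy (in bits) of the normalized similarity distribution over neighbours."""
    neighbours = model.wv.most_similar(word, topn=topn)     # [(word, cosine), ...]
    weights = [max(sim, 1e-9) for _, sim in neighbours]     # clip negative similarities
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs)

print(f"entropy of the neighbourhood of 'rate': {paradigmatic_entropy('rate'):.2f} bits")
```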


Author(s):  
Junyi Li ◽  
Tianyi Tang ◽  
Wayne Xin Zhao ◽  
Ji-Rong Wen

Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper, we present an overview of the major advances achieved in the topic of PLMs for text generation. As preliminaries, we present the general task definition and briefly describe the mainstream architectures of PLMs for text generation. As the core content, we discuss how to adapt existing PLMs to model different kinds of input data and to satisfy special properties required of the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude the paper. Our survey aims to provide text generation researchers with a synthesis of, and pointers to, related research.
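
A minimal sketch of the paradigm the survey covers: loading a pretrained language model and decoding a continuation from a prompt. The model, prompt, and decoding settings are arbitrary illustrative choices; the survey itself reviews adaptation and fine-tuning strategies that go well beyond plain decoding.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Pretrained language models have changed text generation because"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling; switch do_sample=False for greedy decoding.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```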


Author(s):  
Rui P. Chaves ◽  
Michael T. Putnam

This chapter compares movement-based conceptions of grammar and of unbounded dependency constructions with their construction-based, non-movement-based antithesis. In particular, the focus of this chapter is on how unification- and construction-based grammar provides a better handle on the phenomena than the Minimalist Program (MP), not only from a linguistic perspective but also from a psycholinguistic point of view. The flexibility of non-movement-based accounts allows a much wider and more complex array of unbounded dependency patterns because it rejects the basic idea that extracted phrases start out embedded in sentence structure and instead views the propagation of all information in sentence structure as a local and distributed (featural) process. The grammatical theory discussed in this chapter is also more consistent with extant models of human language processing than the MP, and demonstrably allows for efficient incremental and probabilistic language models of both comprehension and production.

