Fine-tuning of deep language models as a computational framework of modeling listeners' perspective during language comprehension

2021 ◽  
Author(s):  
Refael Tikochinski ◽  
Ariel Goldstein ◽  
Yaara Yeshurun ◽  
Uri Hasson ◽  
Roi Reichart

Computational Deep Language Models (DLMs) have been shown to be effective in predicting neural responses during natural language processing. This study introduces a novel computational framework, based on the concept of fine-tuning (Hinton, 2007), for modeling differences in the interpretation of narratives that arise from the listeners' perspective (i.e., their prior knowledge, thoughts, and beliefs). We draw on an fMRI experiment conducted by Yeshurun et al. (2017), in which two groups of listeners heard the same narrative but with two different perspectives (cheating versus paranoia). We collected a dedicated dataset of ~3000 stories and used it to create two modified (fine-tuned) versions of a pre-trained DLM, each representing the perspective of a different group of listeners. Information extracted from each of the two fine-tuned models provided a better fit to the neural responses of the corresponding group of listeners. Furthermore, we show that the degree of difference between the listeners' interpretations of the story, as measured both neurally and behaviorally, can be approximated using the distances between the representations of the story extracted from the two fine-tuned models. These model-brain associations were expressed in many language-related brain areas, as well as in several higher-order areas related to the default-mode and mentalizing networks, implying that computational fine-tuning reliably captures relevant aspects of human language comprehension across different levels of cognitive processing.
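
A minimal sketch of the representational-distance idea described above, assuming two perspective-specific checkpoints of a pretrained causal language model; the checkpoint paths and story segments below are placeholders, not the study's materials or exact pipeline:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Stand-ins: in practice these would point at two copies of the same pretrained DLM,
# each fine-tuned on stories written from one perspective (e.g., cheating vs. paranoia).
CHECKPOINT_A = "gpt2"
CHECKPOINT_B = "gpt2"

story_segments = [
    "He checked his phone again before leaving the house.",
    "She said she was working late at the office tonight.",
]

def segment_embeddings(checkpoint, segments):
    """One vector per segment: mean-pool the last hidden layer over tokens."""
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    model.eval()
    vectors = []
    with torch.no_grad():
        for text in segments:
            inputs = tokenizer(text, return_tensors="pt")
            hidden = model(**inputs).last_hidden_state   # (1, seq_len, dim)
            vectors.append(hidden.mean(dim=1).squeeze(0))
    return torch.stack(vectors)

emb_a = segment_embeddings(CHECKPOINT_A, story_segments)
emb_b = segment_embeddings(CHECKPOINT_B, story_segments)

# Per-segment distance between the two perspective-specific representations; larger
# distances would be expected to track larger neural/behavioral divergence between groups.
distance = 1 - torch.nn.functional.cosine_similarity(emb_a, emb_b, dim=1)
for segment, d in zip(story_segments, distance.tolist()):
    print(f"{d:.3f}  {segment}")
```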

2020 ◽  
Author(s):  
Kun Sun

Expectations or predictions about upcoming content play an important role during language comprehension and processing. One important aspect of recent studies of language comprehension and processing concerns the estimation of upcoming words in a sentence or discourse. Many studies have used eye-tracking data to explore computational and cognitive models of contextual word prediction and word processing. Eye-tracking data have previously been widely used to investigate the factors that influence word prediction. However, these studies are problematic on several levels, including the stimuli, the corpora, and the statistical tools they applied. Although various computational models have been proposed for simulating contextual word predictions, past studies have usually relied on a single computational model, which often cannot give an adequate account of cognitive processing in language comprehension. To avoid these problems, this study uses a large, natural, and coherent discourse as the stimulus for collecting reading-time data. It trains two state-of-the-art computational models, surprisal and semantic (dis)similarity derived from word vectors by linear discriminative learning (LDL), which measure knowledge of the syntagmatic and paradigmatic structure of language, respectively. We develop a `dynamic approach' to computing semantic (dis)similarity; this is the first time these two computational models have been combined. The models are evaluated using advanced statistical methods. In addition, to test the efficiency of our approach, a recently developed cosine method of computing semantic (dis)similarity from word vectors is compared with our `dynamic' approach. The two computational models and the fixed-effects statistical models can be used to cross-verify the findings, ensuring that the results are reliable. All results support the conclusion that surprisal and semantic similarity make opposing contributions to predicting the reading time of words, although both are good predictors. Additionally, our `dynamic' approach performs better than the popular cosine method. The findings of this study therefore contribute to a better understanding of how humans process words in real-world contexts and how they make predictions in language cognition and processing.
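
As a rough, self-contained illustration of the two predictor families discussed above (not the paper's LDL implementation), the sketch below computes per-word surprisal from an off-the-shelf causal language model together with a simple cosine (dis)similarity of each word to its preceding context. The model choice and toy sentence are assumptions, and the cosine term uses the model's hidden states rather than the static word vectors of the original cosine method:

```python
import math
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

sentence = "The librarian quietly reshelved the heavy dictionary."
enc = tokenizer(sentence, return_tensors="pt")
ids = enc["input_ids"][0]

with torch.no_grad():
    out = model(**enc, output_hidden_states=True)

log_probs = torch.log_softmax(out.logits, dim=-1)   # (1, seq_len, vocab)
hidden = out.hidden_states[-1][0]                   # (seq_len, dim) last-layer states

for i in range(1, ids.size(0)):
    token = tokenizer.decode(ids[i])
    # Surprisal in bits: -log2 P(token_i | tokens_<i)
    surprisal = -log_probs[0, i - 1, ids[i]].item() / math.log(2)
    # Cosine dissimilarity between the token and the mean of its preceding context
    context = hidden[:i].mean(dim=0)
    dissim = 1 - torch.nn.functional.cosine_similarity(hidden[i], context, dim=0).item()
    print(f"{token!r:>16}  surprisal={surprisal:5.2f} bits  dissimilarity={dissim:.3f}")
```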


2021 ◽  
Author(s):  
Oscar Nils Erik Kjell ◽  
H. Andrew Schwartz ◽  
Salvatore Giorgi

The language that individuals use to express themselves contains rich psychological information. Recent significant advances in Natural Language Processing (NLP) and Deep Learning (DL), namely transformers, have resulted in large performance gains in tasks related to understanding natural language, such as machine translation. However, these state-of-the-art methods have not yet been made easily accessible to psychology researchers, nor have they been designed to be optimal for human-level analyses. This tutorial introduces text (www.r-text.org), a new R-package for analyzing and visualizing human language using transformers, the latest techniques from NLP and DL. Text is both a modular solution for accessing state-of-the-art language models and an end-to-end solution catered to human-level analyses. Hence, text provides user-friendly functions tailored to testing hypotheses in the social sciences, for both relatively small and large datasets. This tutorial describes useful methods for analyzing text, providing functions with reliable defaults that can be used off-the-shelf, as well as a framework that advanced users can build on for novel techniques and analysis pipelines. The reader learns about six methods: 1) textEmbed: to transform text into traditional or modern transformer-based word embeddings (i.e., numeric representations of words); 2) textTrain: to examine relationships between text and numeric/categorical variables; 3) textSimilarity and 4) textSimilarityTest: to compute semantic similarity scores between texts and to test the significance of differences in meaning between two sets of texts; and 5) textProjection and 6) textProjectionPlot: to examine and visualize text within the embedding space according to latent or specified construct dimensions (e.g., low to high rating scale scores).
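
The text package itself is written in R; as a rough Python analogue of its textEmbed and textTrain steps (an assumption for illustration, not the package's own API), the sketch below embeds short responses with a transformer and relates the embeddings to a numeric rating via cross-validated ridge regression:

```python
import numpy as np
import torch
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from transformers import AutoModel, AutoTokenizer

texts = [
    "I feel calm and content today.",
    "Everything feels overwhelming and hopeless.",
    "Work was fine, nothing special happened.",
    "I am excited about the weekend plans.",
]
ratings = np.array([2.0, 9.0, 4.0, 1.0])   # toy self-report scores, e.g., distress

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text):
    """Mean-pooled last-layer embedding of one text (analogue of textEmbed)."""
    with torch.no_grad():
        out = model(**tokenizer(text, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1).squeeze(0).numpy()

X = np.stack([embed(t) for t in texts])

# Analogue of textTrain: relate embeddings to the numeric variable. Real analyses
# need far more observations; 2-fold CV here only keeps the toy example runnable.
scores = cross_val_score(Ridge(alpha=1.0), X, ratings, cv=2)
print("cross-validated R^2 per fold:", scores)
```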


2017 ◽  
Vol 4 (2) ◽  
pp. 58-66
Author(s):  
Роман Тарабань ◽  
Бандара Ахінта

In 2002, Hauser, Chomsky, and Fitch published an article in which they introduced a distinction between properties of language that are exclusively part of human communication (i.e., the FLN) and those properties that might be shared with other species (i.e., the FLB). The sole property proposed for the FLN was recursion. Hauser et al. provided evidence for their position based on evolutionary considerations. The question of the required properties of human language is central to developing theories of language processing and acquisition. In the present critique of Hauser et al. we consider two examples from non-English languages that argue against the suggestion that recursion is the sole property within the human language faculty. These are i) agreement of inflectional morphemes across sentence constructions, and ii) synthetic one-word constructions.

References
Adger, D. (2003). Core Syntax: A Minimalist Approach. Oxford: Oxford University Press.
Bates, E., & MacWhinney, B. (1989). Functionalism and the Competition Model. In B. MacWhinney & E. Bates (Eds.), The Crosslinguistic Study of Sentence Processing (pp. 3–76). New York: Cambridge University Press.
Bickerton, D. (2009). Recursion: core of complexity or artifact of analysis? In T. Givón & M. Shibatani (Eds.), Syntactic Complexity: Diachrony, Acquisition, Neuro-Cognition, Evolution (pp. 531–543). Amsterdam: John Benjamins.
Chomsky, N. (1957). Syntactic Structures (2nd edition published in 2002). Berlin: Mouton.
Chomsky, N. (1959). On certain formal properties of grammars. Information and Control, 2, 137–167.
Chomsky, N. (1995). The Minimalist Program for Linguistic Theory. Cambridge, MA: MIT Press.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569–1579.
Luuk, E., & Luuk, H. (2011). The redundancy of recursion and infinity for natural language. Cognitive Processing, 12, 1–11.
Marantz, A. (1997). No escape from syntax: Don't try morphological analysis in the privacy of your own lexicon. In A. Dimitriadis, L. Siegel, et al. (Eds.), University of Pennsylvania Working Papers in Linguistics, 4(2), 201–225.
MacWhinney, B., & O'Grady, W. (Eds.). (2015). Handbook of Language Emergence. New York: Wiley.
Nevins, A., Pesetsky, D., & Rodrigues, C. (2009). Pirahã exceptionality: A reassessment. Language, 85(2), 355–404.
Ott, D. (2009). The evolution of I-language: Lexicalization as the key evolutionary novelty. Biolinguistics, 3, 255–269.
Sauerland, U., & Trotzke, A. (2011). Biolinguistic perspectives on recursion: Introduction to the special issue. Biolinguistics, 5, 1–9.
Trotzke, A., Bader, M., & Frazier, L. (2013). Third factors and the performance interface in language design. Biolinguistics, 7, 1–34.


2019 ◽  
Author(s):  
Salomi S. Asaridou ◽  
Ö. Ece Demir-Lira ◽  
Julia Uddén ◽  
Susan Goldin-Meadow ◽  
Steven L. Small

Adolescence is a developmental period in which social interactions become increasingly important. Successful social interactions rely heavily on pragmatic competence, the appropriate use of language in different social contexts, a skill that is still developing in adolescence. In the present study, we used fMRI to characterize the brain networks underlying pragmatic language processing in typically developing adolescents. We used an indirect speech paradigm whereby participants were presented with question/answer dialogues in which the meaning of the answer had to be inferred from the context, in this case the preceding question. Participants were presented with three types of answers: (1) direct replies, i.e., simple answers to open-ended questions, (2) indirect informative replies, i.e., answers in which the speaker’s intention was to add more information to a yes/no question, and (3) indirect affective replies, i.e., answers in which the speaker’s intention was to express polite refusals, negative opinions or to save face in response to an emotionally charged question. We found that indirect affective replies elicited the strongest response in brain areas associated with language comprehension (superior temporal gyri), theory of mind (medial prefrontal cortex, temporo-parietal junction, and precuneus), and attention/working memory (inferior frontal gyri). The increased activation to indirect affective as opposed to indirect informative and direct replies potentially reflects the high salience of opinions and perspectives of others in adolescence. Our results add to previous findings on socio-cognitive processing in adolescents and extend them to pragmatic language comprehension.


2021 ◽  
Author(s):  
Arousha Haghighian Roudsari ◽  
Jafar Afshar ◽  
Wookey Lee ◽  
Suan Lee

Patent classification is an expensive and time-consuming task that has conventionally been performed by domain experts. However, the increase in the number of filed patents and the complexity of the documents make the classification task challenging. The text used in patent documents is not always written in a way that conveys knowledge efficiently. Moreover, patent classification is a multi-label classification task with a large number of labels, which makes the problem even more complicated. Hence, automating this expensive and laborious task is essential for assisting domain experts in managing patent documents and for facilitating reliable search, retrieval, and further patent analysis. Transfer learning and pre-trained language models have recently achieved state-of-the-art results in many Natural Language Processing tasks. In this work, we investigate the effect of fine-tuning pre-trained language models, namely BERT, XLNet, RoBERTa, and ELECTRA, on the essential task of multi-label patent classification. We compare these models with the baseline deep-learning approaches used for patent classification, using various word embeddings to enhance the performance of the baseline models. The publicly available USPTO-2M patent classification benchmark and M-patent datasets are used for the experiments. We conclude that fine-tuning the pre-trained language models on patent text improves multi-label patent classification performance. Our findings indicate that XLNet performs best and achieves a new state-of-the-art classification performance with respect to precision, recall, F1 measure, coverage error, and LRAP.
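
A condensed sketch of the kind of fine-tuning the paper evaluates, assuming a pretrained encoder with a multi-label classification head trained under a sigmoid/BCE objective; the label subset, example text, and hyperparameters are placeholders rather than the paper's USPTO-2M configuration:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["A61", "B60", "G06", "H04"]  # toy subset of patent classification codes

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=len(LABELS),
    problem_type="multi_label_classification",  # BCE-with-logits loss under the hood
)

text = "A battery management circuit for an electric vehicle drivetrain."
target = torch.tensor([[0.0, 1.0, 0.0, 0.0]])   # multi-hot: document belongs to B60

inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
loss = model(**inputs, labels=target).loss       # one gradient step as illustration
loss.backward()
optimizer.step()

model.eval()
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]
print({label: round(p.item(), 3) for label, p in zip(LABELS, probs)})
```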


2020 ◽  
Vol 34 (10) ◽  
pp. 13917-13918
Author(s):  
Dean L. Slack ◽  
Mariann Hardey ◽  
Noura Al Moubayed

Contextual word embeddings produced by neural language models, such as BERT or ELMo, have seen widespread application and performance gains across many Natural Language Processing tasks, suggesting that rich linguistic features are encoded in their representations. This work investigates the extent to which hierarchical linguistic information is encoded in a single contextual embedding. Using labelled constituency trees, we train simple linear classifiers on top of single contextualised word representations for ancestor sentiment analysis tasks at multiple constituency levels of a sentence. To assess the presence of hierarchical information throughout the networks, the linear classifiers are trained on representations produced by each intermediate layer of BERT and ELMo variants. We show that with no fine-tuning, a single contextualised representation encodes enough syntactic and semantic sentence-level information to significantly outperform a non-contextual baseline in classifying the 5-class sentiment of its ancestor constituents at multiple levels of the constituency tree. Additionally, we show that both LSTM and transformer architectures trained on similarly sized datasets achieve similar levels of performance on these tasks. Future work will expand the analysis to a wider range of NLP tasks and contextualisers.
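
A toy version of the probing setup, assuming mean-pooled sentence representations and a tiny binary-sentiment stand-in for the paper's 5-class constituent labels: freeze the contextualiser, take each layer's representation, and fit a simple linear classifier on top of it.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_hidden_states=True)
model.eval()

sentences = [
    "The film was an absolute delight.",
    "The plot was dull and the acting worse.",
    "A warm, generous and funny picture.",
    "A tedious, joyless two hours.",
]
labels = np.array([1, 0, 1, 0])   # toy binary sentiment labels

def layer_features(text):
    """Mean-pool tokens within each layer (embeddings + 12 transformer layers)."""
    with torch.no_grad():
        hidden_states = model(**tokenizer(text, return_tensors="pt")).hidden_states
    return [h.mean(dim=1).squeeze(0).numpy() for h in hidden_states]

# Group features by layer: one probe per layer, trained on the frozen representations.
features_per_layer = list(zip(*[layer_features(s) for s in sentences]))

for layer, feats in enumerate(features_per_layer):
    X = np.stack(feats)
    probe = LogisticRegression(max_iter=1000).fit(X, labels)
    print(f"layer {layer:2d}: training accuracy = {probe.score(X, labels):.2f}")
```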


2020 ◽  
Vol 34 (05) ◽  
pp. 8766-8774 ◽  
Author(s):  
Timo Schick ◽  
Hinrich Schütze

Pretraining deep neural network architectures with a language modeling objective has brought large improvements for many natural language processing tasks. Taking BERT, a recently proposed architecture of this kind, as an example, we demonstrate that despite being trained on huge amounts of data, deep language models still struggle to understand rare words. To fix this problem, we adapt Attentive Mimicking, a method designed to explicitly learn embeddings for rare words, to deep language models. To make this possible, we introduce one-token approximation, a procedure that enables us to use Attentive Mimicking even when the underlying language model uses subword-based tokenization, i.e., does not assign embeddings to all words. To evaluate our method, we create a novel dataset that tests the ability of language models to capture semantic properties of words without any task-specific fine-tuning. Using this dataset, we show that adding our adapted version of Attentive Mimicking to BERT substantially improves its understanding of rare words.
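
For illustration only, the sketch below shows the problem being addressed: a rare word is split into several subword pieces and therefore has no single embedding of its own. The averaged-piece vector used here is a naive stand-in, not Attentive Mimicking or the one-token approximation proposed in the paper:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
embeddings = model.get_input_embeddings()        # static wordpiece embedding matrix

rare, common = "pellucid", "clear"
rare_pieces = tokenizer.tokenize(rare)           # e.g. several wordpieces, not one token
print(f"{rare!r} -> {rare_pieces}")

rare_ids = torch.tensor(tokenizer.convert_tokens_to_ids(rare_pieces))
rare_vec = embeddings(rare_ids).mean(dim=0)      # naive single vector for the rare word

common_id = torch.tensor(tokenizer.convert_tokens_to_ids([common]))
common_vec = embeddings(common_id).squeeze(0)

similarity = torch.nn.functional.cosine_similarity(rare_vec, common_vec, dim=0)
print(f"cosine({rare!r} avg-of-pieces, {common!r}) = {similarity.item():.3f}")
```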


2021 ◽  
Vol 5 ◽  
Author(s):  
Elke Teich ◽  
Peter Fankhauser ◽  
Stefania Degaetano-Ortlieb ◽  
Yuri Bizzoni

We present empirical evidence of the communicative utility of conventionalization, i.e., convergence in linguistic usage over time, and diversification, i.e., linguistic items acquiring different, more specific usages/meanings. From a diachronic perspective, conventionalization plays a crucial role in language change as a condition for innovation and grammaticalization (Bybee, 2010; Schmid, 2015), and diversification is a cornerstone in the formation of sublanguages/registers, i.e., functional linguistic varieties (Halliday, 1988; Harris, 1991). While it is widely acknowledged that change in language use is primarily socio-culturally determined, pushing towards greater linguistic expressivity, we here highlight the limiting function of communicative factors on diachronic linguistic variation, showing that conventionalization and diversification are associated with a reduction of linguistic variability. To be able to observe effects of linguistic variability reduction, we first need a well-defined notion of choice in context. Linguistically, this implies the paradigmatic axis of linguistic organization, i.e., the sets of linguistic options available in given or similar syntagmatic contexts. Here, we draw on word embeddings, weakly neural distributional language models that have recently been employed to model lexical-semantic change and that allow us to approximate the notion of paradigm by neighbourhood in vector space. Second, we need to capture changes in paradigmatic variability, i.e., the reduction/expansion of linguistic options in a given context. As a formal index of paradigmatic variability, we use entropy, which measures, in bits of information, the contribution of linguistic units (e.g., words) to predicting linguistic choice. Using entropy provides us with a link to a communicative interpretation, as it is a well-established measure of communicative efficiency with implications for cognitive processing (Linzen and Jaeger, 2016; Venhuizen et al., 2019); also, entropy is negatively correlated with distance in (word embedding) spaces, which in turn shows cognitive reflexes in certain language processing tasks (Mitchell et al., 2008; Auguste et al., 2017). In terms of domain, we focus on science, looking at the diachronic development of scientific English from the 17th century to modern times. This provides us with a fairly constrained yet dynamic domain of discourse that has witnessed a powerful systematization throughout the centuries and developed specific linguistic conventions geared towards efficient communication. Overall, our study confirms the assumed trends of conventionalization and diversification, shown by diachronically decreasing entropy, interspersed with local, temporary entropy highs pointing to phases of linguistic expansion pertaining primarily to the introduction of new technical terminology.
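
A rough operationalization of the paradigm-as-neighbourhood idea (an assumption for illustration, not the authors' exact entropy measure): treat a word's nearest neighbours in an embedding space as its paradigm, turn their similarity scores into a probability distribution, and compute its entropy in bits, where lower entropy indicates fewer strongly competing options, i.e., reduced paradigmatic variability.

```python
import math
from gensim.models import Word2Vec

# Toy corpus as a placeholder for a diachronic slice of scientific English.
corpus = [
    "we measure the rate of the reaction under controlled conditions".split(),
    "we observe the rate of change in the measured quantity".split(),
    "the experiment shows a constant rate across repeated trials".split(),
    "results indicate the rate depends on temperature and pressure".split(),
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, seed=1)

def paradigmatic_entropy(word, topn=5):
    """Entropy (in bits) of the normalized similarity distribution over neighbours."""
    neighbours = model.wv.most_similar(word, topn=topn)     # [(word, cosine), ...]
    weights = [max(sim, 1e-9) for _, sim in neighbours]     # clip negative similarities
    total = sum(weights)
    probs = [w / total for w in weights]
    return -sum(p * math.log2(p) for p in probs)

print(f"entropy of the neighbourhood of 'rate': {paradigmatic_entropy('rate'):.2f} bits")
```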


Author(s):  
Junyi Li ◽  
Tianyi Tang ◽  
Wayne Xin Zhao ◽  
Ji-Rong Wen

Text generation has become one of the most important yet challenging tasks in natural language processing (NLP). The resurgence of deep learning has greatly advanced this field through neural generation models, especially the paradigm of pretrained language models (PLMs). In this paper, we present an overview of the major advances achieved in the topic of PLMs for text generation. As preliminaries, we present the general task definition and briefly describe the mainstream architectures of PLMs for text generation. As the core content, we discuss how to adapt existing PLMs to model different kinds of input data and to satisfy special properties required of the generated text. We further summarize several important fine-tuning strategies for text generation. Finally, we present several future directions and conclude the paper. Our survey aims to provide text generation researchers with a synthesis of, and pointers to, related research.
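
A minimal sketch of the paradigm the survey covers: loading a pretrained language model and decoding a continuation from a prompt. The model, prompt, and decoding settings are arbitrary illustrative choices; the survey itself reviews adaptation and fine-tuning strategies that go well beyond plain decoding.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Pretrained language models have changed text generation because"
inputs = tokenizer(prompt, return_tensors="pt")

# Nucleus sampling; switch do_sample=False for greedy decoding.
outputs = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id,   # GPT-2 has no dedicated pad token
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```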


Author(s):  
Rui P. Chaves ◽  
Michael T. Putnam

This chapter compares movement-based conceptions of grammar and of unbounded dependency constructions with their construction-based, non-movement-based antithesis. In particular, the focus of this chapter is on how unification- and construction-based grammar provides a better handle on the phenomena than the Minimalist Program (MP), not only from a linguistic perspective but also from a psycholinguistic point of view. The flexibility of non-movement-based accounts allows a much wider and more complex array of unbounded dependency patterns because it rejects the basic idea that extracted phrases start out embedded in sentence structure and instead views the propagation of all information in sentence structure as a local and distributed (featural) process. The grammatical theory discussed in this chapter is also more consistent with extant models of human language processing than the MP, and demonstrably allows for efficient incremental and probabilistic language models of both comprehension and production.

