Modelling General Properties of Nouns by Selectively Averaging Contextualised Embeddings

Author(s):  
Na Li ◽  
Zied Bouraoui ◽  
Jose Camacho-Collados ◽  
Luis Espinosa-Anke ◽  
Qing Gu ◽  
...  

While the success of pre-trained language models has largely eliminated the need for high-quality static word vectors in many NLP applications, static word vectors continue to play an important role in tasks where word meaning needs to be modelled in the absence of linguistic context. In this paper, we explore how the contextualised embeddings predicted by BERT can be used to produce high-quality word vectors for such domains, in particular related to knowledge base completion, where our focus is on capturing the semantic properties of nouns. We find that a simple strategy of averaging the contextualised embeddings of masked word mentions leads to vectors that outperform the static word vectors learned by BERT, as well as those from standard word embedding models, in property induction tasks. We notice in particular that masking target words is critical to achieve this strong performance, as the resulting vectors focus less on idiosyncratic properties and more on general semantic properties. Inspired by this view, we propose a filtering strategy which is aimed at removing the most idiosyncratic mention vectors, allowing us to obtain further performance gains in property induction.
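As a rough illustration of the averaging strategy described above (a minimal sketch, not the authors' released code), the snippet below builds a static vector for a noun by masking each mention and averaging the contextualised embeddings BERT produces at the mask position. The choice of bert-base-uncased and the example sentences are assumptions for illustration only.

```python
# Sketch: average the contextualised embeddings of masked mentions of a noun.
# Model name and example sentences are placeholders, not the paper's setup.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def masked_mention_vector(sentence: str, target: str) -> torch.Tensor:
    # Replace the target word with [MASK] so the vector reflects the context,
    # not the word form itself.
    masked = sentence.replace(target, tokenizer.mask_token, 1)
    inputs = tokenizer(masked, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # (seq_len, dim)
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0, 0]
    return hidden[mask_pos]                              # (dim,)

sentences = [
    "The banjo is a stringed instrument with a round body.",
    "She tuned her banjo before the bluegrass set.",
]
vectors = [masked_mention_vector(s, "banjo") for s in sentences]
noun_vector = torch.stack(vectors).mean(dim=0)  # averaged masked-mention vector
```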

1982 ◽  
Vol 9 (1) ◽  
pp. 139-150 ◽  
Author(s):  
Stephen Wilcox ◽  
David S. Palermo

ABSTRACT: Two experiments are reported, in each of which eighty children between two and six years of age were given a series of commands containing relational terms, along with similar commands in which the relational terms were replaced by nonsense. The results indicated that children are able to use information from a number of sources to help them interpret such commands. Younger children, in particular, seemed to rely relatively little on word meaning per se. Evidence is offered that the children's responses were constrained by the non-linguistic context, by prior repetition of commands, and by information available from the linguistic context.


2014 ◽  
Vol 2 ◽  
pp. 181-192 ◽  
Author(s):  
Dani Yogatama ◽  
Chong Wang ◽  
Bryan R. Routledge ◽  
Noah A. Smith ◽  
Eric P. Xing

We present a probabilistic language model that captures temporal dynamics and conditions on arbitrary non-linguistic context features. These context features serve as important indicators of language changes that are otherwise difficult to capture using text data by itself. We learn our model in an efficient online fashion that is scalable for large, streaming data. With five streaming datasets from two different genres—economics news articles and social media—we evaluate our model on the task of sequential language modeling. Our model consistently outperforms competing models.
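The abstract does not reproduce the model's equations, so the toy sketch below is only a loose illustration of the two ingredients it names: conditioning word probabilities on non-linguistic context features and updating the model online over a stream. It is a simplification, not the probabilistic model proposed in the paper.

```python
# Toy illustration: a log-linear unigram LM whose word scores depend on
# non-linguistic context features, updated online over streaming data.
import numpy as np

class ContextConditionedLM:
    def __init__(self, vocab_size: int, n_features: int, lr: float = 0.1):
        self.W = np.zeros((n_features, vocab_size))  # feature-to-word weights
        self.b = np.zeros(vocab_size)                # global word bias
        self.lr = lr

    def probs(self, features: np.ndarray) -> np.ndarray:
        scores = features @ self.W + self.b
        scores -= scores.max()                       # numerical stability
        exp = np.exp(scores)
        return exp / exp.sum()

    def online_update(self, features: np.ndarray, word_counts: np.ndarray):
        # One stochastic gradient step on the multinomial log-likelihood
        # of the observed word counts at this time step.
        p = self.probs(features)
        grad = word_counts - word_counts.sum() * p   # d log-lik / d scores
        self.b += self.lr * grad
        self.W += self.lr * np.outer(features, grad)

# Streaming usage: at each step, observe context features (e.g. market
# indicators alongside economics news) and that step's word counts.
lm = ContextConditionedLM(vocab_size=5, n_features=3)
lm.online_update(np.array([1.0, 0.0, 0.5]), np.array([2, 0, 1, 0, 0]))
```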


Entropy ◽  
2021 ◽  
Vol 23 (9) ◽  
pp. 1148
Author(s):  
Łukasz Dębowski

We present a hypothetical argument against finite-state processes in statistical language modeling that is based on semantics rather than syntax. In this theoretical model, we suppose that the semantic properties of texts in a natural language could be approximately captured by a recently introduced concept of a perigraphic process. Perigraphic processes are a class of stochastic processes that satisfy a Zipf-law accumulation of a subset of factual knowledge, which is time-independent, compressed, and effectively inferrable from the process. We show that the classes of finite-state processes and of perigraphic processes are disjoint, and we present a new simple example of perigraphic processes over a finite alphabet called Oracle processes. The disjointness result makes use of the Hilberg condition, i.e., the almost sure power-law growth of algorithmic mutual information. Using a strongly consistent estimator of the number of hidden states, we show that finite-state processes do not satisfy the Hilberg condition whereas Oracle processes satisfy the Hilberg condition via the data-processing inequality. We discuss the relevance of these mathematical results for theoretical and computational linguistics.
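For readers unfamiliar with the Hilberg condition mentioned above, the following is a hedged sketch of its usual statement: almost sure power-law growth of the algorithmic mutual information between adjacent blocks of the process. The constants and notation are illustrative, not quoted from the paper.

```latex
% Illustrative statement of the Hilberg condition, where
% J(u;v) = K(u) + K(v) - K(u,v) is algorithmic mutual information
% defined via prefix Kolmogorov complexity K(.).
\[
  J\bigl(X_{1:n};\, X_{n+1:2n}\bigr) \;\geq\; c\, n^{\beta}
  \quad \text{almost surely, for all sufficiently large } n,
\]
\[
  \text{for some constants } c > 0 \text{ and } \beta \in (0,1).
\]
```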


Author(s):  
Bianca Wühr ◽  
Peter Wühr

Abstract: The FAIR-2 ('Frankfurter Aufmerksamkeitsinventar') is a pen-and-paper test of visual attention in which participants have to search for targets among distractors. For similar pen-and-paper tests of attention (e.g., the d2), repeating the test causes large improvements in performance that threaten both its (retest) reliability and its validity. We investigated the size and possible sources of practice effects in the FAIR-2 in three experiments. In Experiments 1 and 2, participants were tested twice using the original FAIR-2. We compared how performance changed after 2 weeks (Experiment 1) or 3 months (Experiment 2) when the test was repeated (complete repetition) or when targets and distractors changed their roles (test reversal). For Experiment 3, we used self-constructed versions of the FAIR that allowed for a third, neutral condition (complete alternation) without any stimulus overlap between the two tests. The complete-repetition condition produced strong performance gains (25–35%) that persisted for 3 months. In the complete-alternation condition, we observed small to moderate improvements, suggesting that stimulus-independent learning had occurred in session 1. Finally, performance did not differ between test reversal and complete alternation, suggesting that improvements in target processing caused the large gains in the complete-repetition condition.


CrystEngComm ◽  
2019 ◽  
Vol 21 (27) ◽  
pp. 4072-4078 ◽  
Author(s):  
Yi Zhang ◽  
Hanling Long ◽  
Jun Zhang ◽  
Bo Tan ◽  
Qian Chen ◽  
...  

A simple strategy for the mass production of high-quality AlN epilayers on flat sapphire by utilizing a dislocation filtering layer.


2018 ◽  
Author(s):  
Simon De Deyne ◽  
Danielle Navarro ◽  
Guillem Collell ◽  
Amy Perfors

One of the main limitations in natural language-based approaches to meaning is that they are not grounded. In this study, we evaluate how well different kinds of models account for people's representations of both concrete and abstract concepts. The models are both unimodal (language-based only) models and multimodal distributional semantic models (which additionally incorporate perceptual and/or affective information). The language-based models include both external (based on text corpora) and internal (derived from word associations) language. We present two new studies and a re-analysis of a series of previous studies demonstrating that the unimodal performance is substantially higher for internal models, especially when comparisons at the basic level are considered. For multimodal models, our findings suggest that additional visual and affective features lead to only slightly more accurate mental representations of word meaning than what is already encoded in internal language models; however, for abstract concepts, visual and affective features improve the predictions of external text-based models. Our work presents new evidence that the grounding problem includes abstract words as well and is therefore more widespread than previously suggested. Implications for both embodied and distributional views are discussed.
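As an illustration of the kind of comparison the study reports (not its actual data, feature sets, or models), the sketch below scores a unimodal text-only model against a multimodal model that concatenates visual features, by correlating cosine similarities with human relatedness judgements. All vectors and ratings here are random placeholders.

```python
# Sketch: compare unimodal vs. multimodal representations by how well their
# pairwise similarities correlate with human judgements (placeholder data).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
pairs = 50
text_a, text_b = rng.normal(size=(pairs, 300)), rng.normal(size=(pairs, 300))
vis_a, vis_b = rng.normal(size=(pairs, 128)), rng.normal(size=(pairs, 128))
human_ratings = rng.uniform(1, 7, size=pairs)   # placeholder judgements

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cosine(a, b):
    return np.sum(a * b, axis=1) / (np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))

unimodal_sims = cosine(text_a, text_b)
multimodal_a = np.hstack([l2_normalize(text_a), l2_normalize(vis_a)])
multimodal_b = np.hstack([l2_normalize(text_b), l2_normalize(vis_b)])
multimodal_sims = cosine(multimodal_a, multimodal_b)

print("unimodal rho:  ", spearmanr(unimodal_sims, human_ratings)[0])
print("multimodal rho:", spearmanr(multimodal_sims, human_ratings)[0])
```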


2020 ◽  
Vol 34 (05) ◽  
pp. 7456-7463 ◽  
Author(s):  
Zied Bouraoui ◽  
Jose Camacho-Collados ◽  
Steven Schockaert

One of the most remarkable properties of word embeddings is the fact that they capture certain types of semantic and syntactic relationships. Recently, pre-trained language models such as BERT have achieved groundbreaking results across a wide range of Natural Language Processing tasks. However, it is unclear to what extent such models capture relational knowledge beyond what is already captured by standard word embeddings. To explore this question, we propose a methodology for distilling relational knowledge from a pre-trained language model. Starting from a few seed instances of a given relation, we first use a large text corpus to find sentences that are likely to express this relation. We then use a subset of these extracted sentences as templates. Finally, we fine-tune a language model to predict whether a given word pair is likely to be an instance of some relation, when given an instantiated template for that relation as input.
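A minimal sketch of the final fine-tuning step described above is shown below. It is not the authors' pipeline: the template, the word pairs, the labels, and the choice of bert-base-uncased are placeholders, and the corpus-based template extraction step is omitted.

```python
# Sketch: fine-tune a BERT classifier to decide whether a word pair
# instantiates a relation, given a template filled with that pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

template = "The capital of {tail} is {head}."         # placeholder template
train_pairs = [("Paris", "France", 1), ("Berlin", "Spain", 0)]

model.train()
for head, tail, label in train_pairs:
    inputs = tokenizer(template.format(head=head, tail=tail), return_tensors="pt")
    loss = model(**inputs, labels=torch.tensor([label])).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Prediction: score a candidate pair with the same instantiated template.
model.eval()
with torch.no_grad():
    probe = tokenizer(template.format(head="Madrid", tail="Spain"), return_tensors="pt")
    is_instance_prob = model(**probe).logits.softmax(dim=-1)[0, 1].item()
```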


2019 ◽  
Vol 34 (1) ◽  
pp. 794-800 ◽  
Author(s):  
Margarita Norambuena ◽  
Jose Rodriguez ◽  
Zhenbin Zhang ◽  
Fengxiang Wang ◽  
Cristian Garcia ◽  
...  

2021 ◽  
Vol 9 ◽  
pp. 1012-1031
Author(s):  
Yanai Elazar ◽  
Nora Kassner ◽  
Shauli Ravfogel ◽  
Abhilasha Ravichander ◽  
Eduard Hovy ◽  
...  

Abstract: Consistency of a model—that is, the invariance of its behavior under meaning-preserving alternations in its input—is a highly desirable property in natural language processing. In this paper we study the question: Are Pretrained Language Models (PLMs) consistent with respect to factual knowledge? To this end, we create ParaRel🤘, a high-quality resource of cloze-style English query paraphrases. It contains a total of 328 paraphrases for 38 relations. Using ParaRel🤘, we show that the consistency of all PLMs we experiment with is poor, though with high variance between relations. Our analysis of the representational spaces of PLMs suggests that they have a poor structure and are currently not suitable for representing knowledge robustly. Finally, we propose a method for improving model consistency and experimentally demonstrate its effectiveness.
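The consistency check can be illustrated with a short sketch: query a masked language model with two paraphrases of the same cloze-style factual query and test whether the top predictions agree. The paraphrases below are invented examples, not entries from ParaRel, and the model choice is an assumption.

```python
# Sketch: check whether a masked LM gives the same answer to two paraphrases
# of the same factual cloze query (illustrative queries, not ParaRel data).
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

paraphrases = [
    "Dante was born in [MASK].",
    "Dante is a native of [MASK].",
]
top_predictions = [fill(p)[0]["token_str"].strip() for p in paraphrases]
consistent = len(set(top_predictions)) == 1
print(top_predictions, "consistent:", consistent)
```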

