BUILDING SEMANTIC NETWORKS FROM PLAIN TEXT AND WIKIPEDIA WITH APPLICATION TO SEMANTIC RELATEDNESS AND NOUN COMPOUND PARAPHRASING

2012 ◽  
Vol 06 (01) ◽  
pp. 67-91 ◽  
Author(s):  
PIA-RAMONA WOJTINNEK ◽  
STEPHEN PULMAN ◽  
JOHANNA VÖLKER

The construction of suitable and scalable representations of semantic knowledge is a core challenge in Semantic Computing. Manually created resources such as WordNet have been shown to be useful for many AI and NLP tasks, but they are inherently restricted in their coverage and scalability. In addition, they have been challenged by simple distributional models on very large corpora, questioning the advantage of structured knowledge representations. We present a framework for building large-scale semantic networks automatically from plain text and Wikipedia articles using only linguistic analysis tools. Our constructed resources cover up to 2 million concepts and were built in less than 6 days. Using the task of measuring semantic relatedness, we show that we achieve results comparable to the best WordNet-based methods as well as the best distributional methods, while using a corpus several orders of magnitude smaller. In addition, we show that we can outperform both types of methods by combining the results of our two network variants. Initial experiments on noun compound paraphrasing show similar results, underlining both the quality and the flexibility of our constructed resources.
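
To give a concrete, if simplified, picture of graph-based relatedness of the kind this abstract describes, the sketch below builds a tiny hand-written semantic network and scores relatedness as inverse shortest-path length. The network, its edges, and the scoring function are illustrative assumptions only; the paper's own networks are extracted automatically from parsed text and Wikipedia.

```python
# Minimal sketch (not the authors' pipeline): semantic relatedness as inverse
# shortest-path length over a small hand-built semantic network.
from collections import deque

# Toy undirected network: concept -> neighbouring concepts (invented edges).
network = {
    "car":       {"vehicle", "wheel", "driver"},
    "vehicle":   {"car", "bicycle", "transport"},
    "bicycle":   {"vehicle", "wheel"},
    "wheel":     {"car", "bicycle"},
    "driver":    {"car", "person"},
    "person":    {"driver"},
    "transport": {"vehicle"},
}

def path_length(source, target):
    """Breadth-first search for the shortest path between two concepts."""
    if source == target:
        return 0
    seen, frontier = {source}, deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for neighbour in network.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, dist + 1))
    return None  # disconnected concepts

def relatedness(a, b):
    """Map path length to a (0, 1] score; 0 if the concepts are unreachable."""
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)

print(relatedness("car", "bicycle"))      # close neighbourhood, higher score
print(relatedness("person", "transport")) # more distant, lower score
```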

2021 ◽  
Author(s):  
Dirk U. Wulff ◽  
Simon De Deyne ◽  
Samuel Aeschbach ◽  
Rui Mata

People undergo many idiosyncratic experiences throughout their lives that may contribute to individual differences in the size and structure of their knowledge representations. Ultimately, these can have important implications for individuals' cognitive performance. We review evidence suggesting a relationship between individual experiences, the size and structure of semantic representations, and individual and age differences in cognitive performance. We conclude that the extent to which experience-dependent changes in semantic representations contribute to individual differences in cognitive aging remains unclear. To help fill this gap, we outline an empirical agenda involving the concurrent assessment of large-scale semantic networks and cognitive performance in younger and older adults, and present preliminary data to establish the feasibility and limitations of such empirical approaches.


2021 ◽  
Author(s):  
Yue Feng

Semantic analysis is the process of shifting the understanding of text from the level of phrases, clauses, and sentences to the level of semantic meaning. Two of the most important semantic analysis tasks are 1) semantic relatedness measurement and 2) entity linking. The semantic relatedness measurement task aims to quantitatively identify the relationship between two words or concepts based on the similarity or closeness of their semantic meaning, whereas the entity linking task focuses on linking plain text to structured knowledge resources, e.g. Wikipedia, to provide semantic annotation of texts. A limitation of current semantic analysis approaches is that they are built for traditional documents that are well structured and written in formal English, e.g. news articles. With the emergence of social networks, however, enormous volumes of information can be extracted from posts that are short, often grammatically incorrect, and may contain special characters or newly invented words, e.g. LOL, BRB. Traditional semantic analysis approaches may therefore not perform well on social network posts. In this thesis, we build semantic analysis techniques specifically for Twitter content. We build a semantic relatedness model to calculate semantic relatedness between any two words obtained from tweets, and, using the proposed model, we semantically annotate tweets by linking them to Wikipedia entries. Comparisons with state-of-the-art semantic relatedness and entity linking methods show promising results.
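
The sketch below illustrates the general flavour of the two tasks on a toy tweet collection: co-occurrence-based relatedness between words, then linking an ambiguous mention to a Wikipedia entry via that relatedness. The tweets, candidate titles, and scoring are hypothetical and do not reproduce the thesis's model.

```python
# Illustrative sketch only: co-occurrence relatedness over toy tweets, used to
# pick a Wikipedia entry for the ambiguous mention "apple".
import math
from collections import Counter, defaultdict

tweets = [
    "apple releases new iphone lol",
    "eating an apple and a banana",
    "new iphone camera is great",
    "banana bread recipe",
]

# Word-by-word co-occurrence counts within each tweet.
cooc = defaultdict(Counter)
for t in tweets:
    words = t.split()
    for w in words:
        for v in words:
            if v != w:
                cooc[w][v] += 1

def relatedness(a, b):
    """Cosine similarity between the co-occurrence vectors of two words."""
    va, vb = cooc[a], cooc[b]
    dot = sum(va[k] * vb[k] for k in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def link(context_word, candidates):
    """Pick the candidate (title, anchor concept) whose anchor is most
    related to a context word drawn from the tweet."""
    return max(candidates, key=lambda c: relatedness(context_word, c[1]))[0]

# Hypothetical candidate entries for the mention "apple".
candidates = [("Apple Inc.", "iphone"), ("Apple (fruit)", "banana")]
print(link("iphone", candidates))   # -> Apple Inc.
print(link("banana", candidates))   # -> Apple (fruit)
```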


2002 ◽  
Vol 8 (2-3) ◽  
pp. 209-233 ◽  
Author(s):  
OLIVIER FERRET ◽  
BRIGITTE GRAU

Topic analysis is important for many applications dealing with texts, such as text summarization or information extraction. However, it can be done with great precision only if it relies on structured knowledge, which is difficult to produce on a large scale. In this paper, we propose using bootstrapping to solve this problem: a first topic analysis based on a weakly structured source of knowledge, a collocation network, is used for learning explicit topic representations that then support a more precise and reliable topic analysis.
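
As a rough illustration of what a collocation network provides as weakly structured knowledge, the sketch below counts windowed co-occurrences in a toy corpus and scores how topically cohesive a text segment is by how many of its word pairs are linked. The corpus, window size, and threshold are assumptions, not the authors' settings.

```python
# Minimal sketch (not Ferret & Grau's system): a collocation network from
# windowed co-occurrence, used to score the topical cohesion of a segment.
from collections import Counter
from itertools import combinations

corpus = [
    "the bank approved the loan for the mortgage",
    "interest rates on the loan rose at the bank",
    "the river bank was covered in reeds and mud",
]

WINDOW = 4       # collocations counted within a 4-word forward window
THRESHOLD = 1    # minimum co-occurrence count to keep an edge

collocations = Counter()
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for v in words[i + 1:i + 1 + WINDOW]:
            if v != w:
                collocations[frozenset((w, v))] += 1

edges = {pair for pair, count in collocations.items() if count >= THRESHOLD}

def cohesion(segment):
    """Fraction of word pairs in the segment linked in the collocation network."""
    pairs = list(combinations(set(segment.split()), 2))
    if not pairs:
        return 0.0
    linked = sum(frozenset(p) in edges for p in pairs)
    return linked / len(pairs)

print(cohesion("bank loan interest"))   # topically coherent, higher score
print(cohesion("bank reeds interest"))  # mixes topics, lower score
```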


2013 ◽  
Vol 1 (2) ◽  
pp. 96-114 ◽  
Author(s):  
Isabelle Buchstaller ◽  
Seraphim Alvanides

The aims of this paper are twofold. First, we identify the most effective human geographical methods for sampling across space in large-scale dialectological projects. We propose two geographical concepts as a basis for sampling decisions: geo-demographic classification, a multidimensional method for the socio-economic grouping of areas, and an updated version of functional regions that we develop for use in sociolinguistic research. We then report on the results of a pilot project that applies these models to collect data on the acceptability of vernacular morphosyntactic forms in the North East of England. Following the method of natural breaks advocated for dialectology by Horvath & Horvath (2002), we interpret breaks in the probabilistic patterns as areas of dialect transition. This study contributes to the debate about the role and limitations of spatiality in linguistic analysis and aims to broaden our knowledge of the interfaces between human geography and dialectology.
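
A much simplified, gap-based reading of natural breaks is sketched below: hypothetical acceptability scores for a vernacular form are split where the sorted values jump most, suggesting candidate transition zones. This is not the full Jenks optimisation used in GIS, and the locations and scores are invented for illustration.

```python
# Simplified "natural breaks" by largest gaps; hypothetical data, not the paper's.
scores = {
    "Berwick": 0.81, "Alnwick": 0.78, "Morpeth": 0.74,
    "Newcastle": 0.52, "Gateshead": 0.49,
    "Durham": 0.23, "Darlington": 0.20,
}

def natural_breaks(values, n_classes):
    """Place n_classes - 1 breaks at the largest gaps between sorted values."""
    ordered = sorted(values)
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    cut_indices = sorted(i for _, i in sorted(gaps, reverse=True)[:n_classes - 1])
    groups, start = [], 0
    for i in cut_indices:
        groups.append(ordered[start:i + 1])
        start = i + 1
    groups.append(ordered[start:])
    return groups

for group in natural_breaks(list(scores.values()), 3):
    members = [place for place, s in scores.items() if s in group]
    print(members, group)   # three clusters separated by the two largest gaps
```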


2020 ◽  
Vol 34 (05) ◽  
pp. 7554-7561
Author(s):  
Pengxiang Cheng ◽  
Katrin Erk

Recent progress in NLP has witnessed the development of large-scale pre-trained language models (GPT, BERT, XLNet, etc.) based on the Transformer (Vaswani et al. 2017), and in a range of end tasks, such models have achieved state-of-the-art results, approaching human performance. This clearly demonstrates the power of the stacked self-attention architecture when paired with a sufficient number of layers and a large amount of pre-training data. However, on tasks that require complex and long-distance reasoning where surface-level cues are not enough, there is still a large gap between the pre-trained models and human performance. Strubell et al. (2018) recently showed that it is possible to inject knowledge of syntactic structure into a model through supervised self-attention. We conjecture that a similar injection of semantic knowledge, in particular coreference information, into an existing model would improve performance on such complex problems. On the LAMBADA (Paperno et al. 2016) task, we show that a model trained from scratch with coreference as auxiliary supervision for self-attention outperforms the largest GPT-2 model, setting a new state of the art, while containing only a tiny fraction of the parameters of GPT-2. We also conduct a thorough analysis of different variants of model architectures and supervision configurations, suggesting future directions for applying similar techniques to other problems.
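
The sketch below shows one plausible form such auxiliary supervision could take, loosely following the supervised self-attention idea: the raw attention scores of one head are trained with a cross-entropy loss toward each token's gold antecedent. The toy sentence, tensor shapes, and gold antecedents are assumptions, not the paper's implementation.

```python
# Toy sketch of coreference-supervised self-attention; NOT the paper's model.
import torch
import torch.nn.functional as F

# tokens: 0 Mary, 1 said, 2 that, 3 yesterday, 4 she, 5 left
seq_len, d_model = 6, 16
torch.manual_seed(0)

x = torch.randn(1, seq_len, d_model)        # one toy sentence, random embeddings
w_q = torch.nn.Linear(d_model, d_model)     # query projection of the supervised head
w_k = torch.nn.Linear(d_model, d_model)     # key projection of the supervised head

# Raw attention scores of the supervised head: (batch, seq_len, seq_len).
scores = w_q(x) @ w_k(x).transpose(1, 2) / d_model ** 0.5

# Gold antecedents: "she" (token 4) corefers with "Mary" (token 0); every other
# token points to itself, a common convention for non-anaphoric tokens.
gold_antecedent = torch.tensor([[0, 1, 2, 3, 0, 5]])

# Auxiliary loss: cross-entropy between each token's attention distribution
# (softmax is applied inside cross_entropy) and its gold antecedent. In a full
# model this term would be added to the main language-modelling loss.
aux_loss = F.cross_entropy(scores.view(-1, seq_len), gold_antecedent.view(-1))
print(float(aux_loss))
```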


2020 ◽  
pp. 1085-1114
Author(s):  
Youngseok Choi ◽  
Jungsuk Oh ◽  
Jinsoo Park

This research proposes a novel method of measuring the dynamics of semantic relatedness. Research on semantic relatedness has a long history in the fields of computational linguistics, psychology, computer science, and information systems. Computing semantic relatedness has played a critical role in various situations, such as data integration and keyword recommendation. Many researchers have tried to propose more sophisticated techniques to measure semantic relatedness, but little research has considered how semantic relatedness changes with the flow of time and the occurrence of events. The proposed method is validated on actual corpus data collected from a particular context over a specific period of time. We test its feasibility by constructing semantic networks from corpora collected during different periods of time. The experimental results show that the method can detect and track changes in semantic relatedness between concepts. Based on these results, we discuss the need for a dynamic semantic relatedness paradigm.
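
A minimal sketch of the underlying intuition follows: compute a relatedness score per time slice of a corpus and compare the slices, so that a shift in relatedness becomes observable. The documents, the Dice-style score, and the target word pair are illustrative only, not the authors' method.

```python
# Minimal sketch (not the authors' method): per-slice relatedness of a word pair.
slices = {
    "2019": ["remote work is rare",
             "office work every day",
             "work in the office"],
    "2020": ["remote work from home",
             "home office setup for remote work",
             "work stays remote"],
}

def relatedness(docs, a, b):
    """Dice-style overlap: how often a and b occur in the same document."""
    both = sum(1 for d in docs if a in d.split() and b in d.split())
    with_a = sum(1 for d in docs if a in d.split())
    with_b = sum(1 for d in docs if b in d.split())
    denom = with_a + with_b
    return 2 * both / denom if denom else 0.0

for period, docs in slices.items():
    print(period, relatedness(docs, "remote", "work"))
# 2019 -> 0.5, 2020 -> 1.0: the pair becomes more strongly related over time,
# which is the kind of shift a dynamic relatedness measure should detect.
```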


2010 ◽  
Vol 2 (4) ◽  
pp. 12-30 ◽  
Author(s):  
Athena Eftychiou ◽  
Bogdan Vrusias ◽  
Nick Antonopoulos

The increasing amount of online information demands effective, scalable, and accurate mechanisms to manage and search this information. Distributed semantic-enabled architectures, which employ semantic web technologies for resource discovery, could satisfy these requirements. In this paper, a semantic-driven adaptive architecture is presented that improves existing resource discovery processes. The P2P network is organised in a two-layered super-peer architecture. The formation of the super-peer layer is a conceptual representation of the network's knowledge, shaped from the information provided by the nodes using collective intelligence methods. The authors focus on the creation of a dynamic, hierarchical, semantic-driven P2P topology using the network's collective intelligence. Otherwise unmanageable amounts of data are distilled into a repository of semantic knowledge, turning the network into an ontology of conceptually related entities of information collected from the resources located by peers. The proposed architecture is evaluated through a simulation-based case study.
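
The sketch below illustrates the general idea of a semantically organised super-peer layer, under invented profiles: peers join the super-peer whose concept set is most similar to their own, and queries are routed the same way. It is not the authors' architecture or their collective-intelligence method.

```python
# Illustrative sketch only: semantic assignment of peers to super-peers and
# query routing by concept overlap. Profiles, peers, and query are hypothetical.
super_peers = {
    "SP-medicine": {"disease", "treatment", "hospital", "drug"},
    "SP-finance":  {"bank", "loan", "market", "investment"},
}

peers = {
    "peer-1": {"drug", "trial", "treatment"},
    "peer-2": {"loan", "mortgage", "bank"},
    "peer-3": {"market", "stock", "investment"},
}

def jaccard(a, b):
    """Overlap between two concept sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def assign(concepts):
    """Attach a peer (or route a query) to the semantically closest super-peer."""
    return max(super_peers, key=lambda sp: jaccard(super_peers[sp], concepts))

topology = {p: assign(c) for p, c in peers.items()}
print(topology)                          # peer-1 -> SP-medicine, others -> SP-finance
print(assign({"hospital", "treatment"})) # a query about treatment routes to SP-medicine
```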

