BUILDING SEMANTIC NETWORKS FROM PLAIN TEXT AND WIKIPEDIA WITH APPLICATION TO SEMANTIC RELATEDNESS AND NOUN COMPOUND PARAPHRASING
The construction of suitable and scalable representations of semantic knowledge is a core challenge in Semantic Computing. Manually created resources such as WordNet have proven useful for many AI and NLP tasks, but they are inherently limited in coverage and scalability. Moreover, they have been challenged by simple distributional models trained on very large corpora, which calls the advantage of structured knowledge representations into question. We present a framework for building large-scale semantic networks automatically from plain text and Wikipedia articles using only linguistic analysis tools. Our constructed resources cover up to 2 million concepts and were built in less than 6 days. Using the task of measuring semantic relatedness, we show that our networks achieve results comparable to the best WordNet-based methods as well as the best distributional methods, while using a corpus several orders of magnitude smaller. In addition, we show that we can outperform both types of methods by combining the scores of our two network variants. Initial experiments on noun compound paraphrasing show similar results, underlining both the quality and the flexibility of the constructed resources.
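
To make the combination step concrete, the following is a minimal illustrative sketch in Python, assuming min-max normalization of each network's relatedness scores followed by simple averaging. The abstract does not specify the actual combination method, so the normalization scheme, the averaging rule, all function names, and the example scores are assumptions, not the paper's implementation.

    def normalize(scores):
        """Min-max normalize a list of raw relatedness scores to [0, 1]."""
        lo, hi = min(scores), max(scores)
        if hi == lo:
            return [0.0 for _ in scores]
        return [(s - lo) / (hi - lo) for s in scores]

    def combine(scores_a, scores_b):
        """Combine two score lists for the same word pairs by averaging
        their normalized values (one hypothetical combination rule)."""
        na, nb = normalize(scores_a), normalize(scores_b)
        return [(a + b) / 2.0 for a, b in zip(na, nb)]

    if __name__ == "__main__":
        # Hypothetical relatedness scores for three word pairs, one list
        # from the plain-text network and one from the Wikipedia network.
        plain_text_scores = [0.12, 0.85, 0.40]
        wikipedia_scores = [3.1, 9.7, 5.0]
        print(combine(plain_text_scores, wikipedia_scores))

Normalizing before averaging matters here because the two networks may produce scores on very different scales; averaging raw values would let the network with the larger range dominate the combined ranking.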