semantic lexicons
Recently Published Documents

TOTAL DOCUMENTS: 32 (FIVE YEARS: 6)
H-INDEX: 7 (FIVE YEARS: 0)
2021 ◽  
Author(s):  
E. Elakiya ◽  
R. Kanagaraj ◽  
N. Rajkumar

At every moment, a huge volume of data and information is communicated through social networks. Analyzing huge amounts of text data is tedious, time-consuming, and expensive, and manual sorting leads to mistakes and inconsistency. The document-processing phase is still not capable of extracting data the way a human reader can. Furthermore, the significance of content in a text may differ from one reader to another. The proposed Multiple Spider Hunting Algorithm is used to reduce time complexity compared with a single spider by moving multiple spiders; the construction of spiders is dynamic and depends on the volume of the corpus. In some cases tokens may relate to more than one topic, so there is a need to detect topics semantically. The Multiple Semantic Spider Hunting Algorithm is therefore proposed based on the semantics among terms, with associations between words drawn using semantic lexicons. Topics or lists of opinions are generated from the knowledge graph. News articles were gathered from five dissimilar topics: sports, business, education, tourism, and media. The usefulness of the proposed algorithms was evaluated in terms of precision, recall, F-measure, accuracy, true positives, false positives, and topic-detection percentage. The Multiple Semantic Spider Hunting Algorithm produced good results. The topic-detection percentage of the Spider Hunting Algorithm was compared with that of other algorithms: Naïve Bayes, neural network, decision tree, and particle swarm optimization. The Spider Hunting Algorithm detected topics and subtopics with more than 90% precision.
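The evaluation factors listed above are the standard detection metrics. Purely as an illustration (this is not the authors' code, and the counts below are hypothetical), deriving precision, recall, and F-measure from true-positive, false-positive, and false-negative counts looks like this:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard detection metrics from true-positive, false-positive,
    and false-negative counts (illustrative; not the paper's code)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts for a single topic (e.g. "sports"):
p, r, f = precision_recall_f1(tp=90, fp=8, fn=10)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```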


Author(s):  
Haoyan Liu ◽  
Lei Fang ◽  
Jian-Guang Lou ◽  
Zhoujun Li

Much recent work focuses on leveraging semantic lexicons like WordNet to enhance word representation learning (WRL), achieving promising performance on many NLP tasks. However, most existing methods are limited in that they require high-quality, manually created semantic lexicons or linguistic structures. In this paper, we propose to leverage semantic knowledge automatically mined from web structured data to enhance WRL. We first construct a semantic similarity graph, referred to as semantic knowledge, from a large collection of semantic lists extracted from the web using several pre-defined HTML tag patterns. We then introduce an efficient joint word representation learning model to capture semantics from both the semantic knowledge and text corpora. Compared with recent work on improving WRL with semantic resources, our approach is more general and scales easily with no additional effort. Extensive experimental results show that our approach outperforms state-of-the-art methods on word similarity, word sense disambiguation, text classification, and textual similarity tasks.
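The paper's actual tag patterns and weighting scheme are not given in the abstract. As a rough sketch of the general idea only, assuming that HTML list items (`<li>`) are one of the pre-defined patterns and that edge weights simply count co-occurrences in lists, building such a similarity graph might look like:

```python
# Illustrative sketch: build a word-level similarity graph from "semantic
# lists" mined with one assumed HTML tag pattern (<li>...</li>). The
# pattern choice and the co-occurrence weighting are assumptions, not
# the paper's published method.
import re
from collections import defaultdict
from itertools import combinations

LI_PATTERN = re.compile(r"<li[^>]*>(.*?)</li>", re.IGNORECASE | re.DOTALL)

def extract_lists(html_pages):
    """Treat the <li> items of each page as one semantic list."""
    for page in html_pages:
        items = [re.sub(r"<[^>]+>", "", m).strip().lower()
                 for m in LI_PATTERN.findall(page)]
        if len(items) > 1:
            yield items

def build_similarity_graph(html_pages):
    """Edge weight = number of lists in which two terms co-occur."""
    graph = defaultdict(int)
    for items in extract_lists(html_pages):
        for a, b in combinations(sorted(set(items)), 2):
            graph[(a, b)] += 1
    return graph

pages = ["<ul><li>Paris</li><li>London</li><li>Berlin</li></ul>"]
print(build_similarity_graph(pages))
```

Terms that appear together in many independent web lists end up with heavily weighted edges, which is what lets the graph act as a semantic resource without manual curation.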


2019 ◽  
Vol 34 (Supplement_1) ◽  
pp. i110-i122
Author(s):  
Susan Leavy ◽  
Mark T Keane ◽  
Emilie Pine

Abstract: Industrial Memories is a digital humanities initiative to supplement close readings of a government report with new distant readings, using text analytics techniques. The Ryan Report (2009), the official report of the Commission to Inquire into Child Abuse (CICA), details the systematic abuse of thousands of children from 1936 to 1999 in residential institutions run by religious orders and funded and overseen by the Irish State. Arguably, the sheer size of the Ryan Report (over 1 million words) warrants a new approach that blends close readings to witness its findings with distant readings that help surface system-wide findings embedded in the Report. Although CICA has been lauded internationally for its work, many have critiqued the narrative form of the Ryan Report for obfuscating key findings and providing poor systemic, statistical summaries that are crucial to evaluating the political and cultural context in which the abuse took place (Keenan, 2013, Child Sexual Abuse and the Catholic Church: Gender, Power, and Organizational Culture. Oxford University Press). In this article, we concentrate on describing the distant-reading methodology we adopted, using machine learning and text-analytic methods, and report on what they surfaced from the Report. The contribution of this work is threefold: (i) it shows how text analytics can be used to surface new patterns, summaries, and results that were not apparent via close reading; (ii) it demonstrates how machine learning can be used to annotate text by using word embeddings to compile domain-specific semantic lexicons for feature extraction; and (iii) it demonstrates how digital humanities methods can be applied to an official state inquiry with social justice impact.
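Point (ii) above, compiling domain-specific semantic lexicons with word embeddings, is a well-known general technique: train embeddings on the corpus, then grow a seed lexicon by taking each seed's nearest neighbours. A minimal sketch of that technique (using gensim's word2vec; the corpus snippet, seed terms, and thresholds are hypothetical, not taken from the project) could be:

```python
# Illustrative sketch of expanding a seed lexicon via embedding
# neighbours; all data and parameters here are assumptions, not the
# Industrial Memories implementation.
from gensim.models import Word2Vec

# In practice the sentences would be tokenised text of the full report.
sentences = [["the", "institution", "housed", "many", "children"],
             ["the", "school", "was", "run", "by", "a", "religious", "order"]]

model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=50)

def expand_lexicon(seeds, topn=5, threshold=0.3):
    """Grow a lexicon by adding embedding neighbours of each seed term."""
    lexicon = set(seeds)
    for seed in seeds:
        if seed in model.wv:
            for word, score in model.wv.most_similar(seed, topn=topn):
                if score >= threshold:
                    lexicon.add(word)
    return lexicon

print(expand_lexicon(["institution"]))
```

The expanded lexicon can then serve as a feature dictionary for annotating passages, which is the role the article describes for it.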


2019 ◽  
Vol 8 (1) ◽  
pp. 32-35
Author(s):  
A. Uma Maheswari ◽  
N. Revathy

Semantic drift is a common problem in iterative information extraction. Unsupervised bagging combined with distributional similarity is used to reduce semantic drift in iterative bootstrapping algorithms, particularly when extracting large semantic lexicons. In this research work, a method is proposed to minimize semantic drift by identifying drifting points (DPs) and removing the effect they introduce. Previous methods for identifying drifting errors can be roughly divided into two categories, (1) multi-class based and (2) single-class based, according to the settings of the information extraction systems that adopt them. Compared to previous approaches, which usually incur a substantial loss in recall, the DP-based cleaning method can effectively clean a large proportion of semantic drift errors while keeping recall high.
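The DP-cleaning method itself is only summarised in the abstract. As a hedged illustration of the underlying idea (flagging bootstrapped candidates whose distributional similarity to the seed lexicon falls off, then removing them), a sketch might be:

```python
# Illustrative sketch of drift filtering in bootstrapped lexicon
# extraction: candidates whose distributional similarity to the seed
# centroid drops below a cutoff are treated as drifting points (DPs)
# and removed. The vectors, cutoff, and scoring are assumptions, not
# the paper's actual DP definition.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def filter_drifting_points(candidates, vectors, seed_terms, cutoff=0.4):
    """Keep candidates close to the mean vector of the seed lexicon."""
    centroid = np.mean([vectors[t] for t in seed_terms], axis=0)
    kept, dps = [], []
    for term in candidates:
        (kept if cosine(vectors[term], centroid) >= cutoff else dps).append(term)
    return kept, dps

# Hypothetical 2-d "distributional" vectors for demonstration only.
vecs = {"influenza": np.array([0.9, 0.1]), "measles": np.array([0.8, 0.2]),
        "hospital": np.array([0.2, 0.9])}
kept, dps = filter_drifting_points(["measles", "hospital"], vecs,
                                   seed_terms=["influenza"])
print("kept:", kept, "drifting:", dps)
```

Because only the flagged DPs are discarded rather than whole extraction rounds, a filter of this shape can remove drift errors without the large recall loss the abstract attributes to earlier approaches.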


2017 ◽  
Vol 10 (3) ◽  
pp. 501-565 ◽  
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran ◽  
Vo Thi Ngoc Chau ◽  
Dat Nguyen Duy ◽  
Khanh Ly Doan Duy
Keyword(s):  

2016 ◽  
Author(s):  
Manpreet Kaur ◽  
Nishu Kumari ◽  
Anil Kumar Singh ◽  
Rajeev Sangal
