semantic lexicons
Recently Published Documents

TOTAL DOCUMENTS: 32 (FIVE YEARS: 6)
H-INDEX: 7 (FIVE YEARS: 0)
2021 ◽  
Author(s):  
E. Elakiya ◽  
R. Kanagaraj ◽  
N. Rajkumar

At every moment, a huge volume of data and information is communicated through social networks. Analyzing huge amounts of text data is tedious, time-consuming, and expensive, and manual sorting leads to mistakes and inconsistency. The document-processing phase is still not capable of extracting data the way a human reader can. Furthermore, the significance of content in a text may differ from one reader to another. The proposed Multiple Spider Hunting Algorithm is used to reduce time complexity compared with a single spider by moving multiple spiders; the construction of spiders is dynamic and depends on the volume of the corpus. In some cases tokens may relate to more than one topic, so there is a need to detect topics semantically. The Multiple Semantic Spider Hunting Algorithm is therefore proposed based on the semantics among terms, with associations between words drawn using semantic lexicons. Topics or lists of opinions are generated from the knowledge graph. News articles were gathered from five dissimilar topics: sports, business, education, tourism, and media. The usefulness of the proposed algorithms was evaluated in terms of precision, recall, F-measure, accuracy, true positives, false positives, and topic-detection percentage. The Multiple Semantic Spider Hunting Algorithm produced good results. The topic-detection percentage of the Spider Hunting Algorithm was compared with that of other algorithms: Naïve Bayes, neural network, decision tree, and particle swarm optimization. The Spider Hunting Algorithm detected topics and subtopics with more than 90% precision.
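The evaluation factors listed above are the standard detection metrics. Purely as an illustration (this is not the authors' code, and the counts below are hypothetical), deriving precision, recall, and F-measure from true-positive, false-positive, and false-negative counts looks like this:

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard detection metrics from true-positive, false-positive,
    and false-negative counts (illustrative; not the paper's code)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Hypothetical counts for a single topic (e.g. "sports"):
p, r, f = precision_recall_f1(tp=90, fp=8, fn=10)
print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```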


Author(s):  
Haoyan Liu ◽  
Lei Fang ◽  
Jian-Guang Lou ◽  
Zhoujun Li

Much recent work focuses on leveraging semantic lexicons like WordNet to enhance word representation learning (WRL), achieving promising performance on many NLP tasks. However, most existing methods are limited in that they require high-quality, manually created semantic lexicons or linguistic structures. In this paper, we propose to leverage semantic knowledge automatically mined from web structured data to enhance WRL. We first construct a semantic similarity graph, referred to as semantic knowledge, from a large collection of semantic lists extracted from the web using several pre-defined HTML tag patterns. We then introduce an efficient joint word representation learning model to capture semantics from both the semantic knowledge and text corpora. Compared with recent work on improving WRL with semantic resources, our approach is more general and scales easily with no additional effort. Extensive experimental results show that our approach outperforms state-of-the-art methods on word similarity, word sense disambiguation, text classification, and textual similarity tasks.
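The paper's actual tag patterns and weighting scheme are not given in the abstract. As a rough sketch of the general idea only, assuming that HTML list items (`<li>`) are one of the pre-defined patterns and that edge weights simply count co-occurrences in lists, building such a similarity graph might look like:

```python
# Illustrative sketch: build a word-level similarity graph from "semantic
# lists" mined with one assumed HTML tag pattern (<li>...</li>). The
# pattern choice and the co-occurrence weighting are assumptions, not
# the paper's published method.
import re
from collections import defaultdict
from itertools import combinations

LI_PATTERN = re.compile(r"<li[^>]*>(.*?)</li>", re.IGNORECASE | re.DOTALL)

def extract_lists(html_pages):
    """Treat the <li> items of each page as one semantic list."""
    for page in html_pages:
        items = [re.sub(r"<[^>]+>", "", m).strip().lower()
                 for m in LI_PATTERN.findall(page)]
        if len(items) > 1:
            yield items

def build_similarity_graph(html_pages):
    """Edge weight = number of lists in which two terms co-occur."""
    graph = defaultdict(int)
    for items in extract_lists(html_pages):
        for a, b in combinations(sorted(set(items)), 2):
            graph[(a, b)] += 1
    return graph

pages = ["<ul><li>Paris</li><li>London</li><li>Berlin</li></ul>"]
print(build_similarity_graph(pages))
```

Terms that appear together in many independent web lists end up with heavily weighted edges, which is what lets the graph act as a semantic resource without manual curation.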


2019 ◽  
Vol 34 (Supplement_1) ◽  
pp. i110-i122
Author(s):  
Susan Leavy ◽  
Mark T Keane ◽  
Emilie Pine

Abstract: Industrial Memories is a digital humanities initiative to supplement close readings of a government report with new distant readings, using text analytics techniques. The Ryan Report (2009), the official report of the Commission to Inquire into Child Abuse (CICA), details the systematic abuse of thousands of children from 1936 to 1999 in residential institutions run by religious orders and funded and overseen by the Irish State. Arguably, the sheer size of the Ryan Report (over 1 million words) warrants a new approach that blends close readings to witness its findings with distant readings that help surface system-wide findings embedded in the Report. Although CICA has been lauded internationally for its work, many have critiqued the narrative form of the Ryan Report for obfuscating key findings and providing poor systemic, statistical summaries that are crucial to evaluating the political and cultural context in which the abuse took place (Keenan, 2013, Child Sexual Abuse and the Catholic Church: Gender, Power, and Organizational Culture. Oxford University Press). In this article, we concentrate on describing the distant-reading methodology we adopted, using machine learning and text-analytic methods, and report on what they surfaced from the Report. The contribution of this work is threefold: (i) it shows how text analytics can be used to surface new patterns, summaries, and results that were not apparent via close reading; (ii) it demonstrates how machine learning can be used to annotate text by using word embeddings to compile domain-specific semantic lexicons for feature extraction; and (iii) it demonstrates how digital humanities methods can be applied to an official state inquiry with social justice impact.
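Point (ii) above, compiling domain-specific semantic lexicons with word embeddings, is a well-known general technique: train embeddings on the corpus, then grow a seed lexicon by taking each seed's nearest neighbours. A minimal sketch of that technique (using gensim's word2vec; the corpus snippet, seed terms, and thresholds are hypothetical, not taken from the project) could be:

```python
# Illustrative sketch of expanding a seed lexicon via embedding
# neighbours; all data and parameters here are assumptions, not the
# Industrial Memories implementation.
from gensim.models import Word2Vec

# In practice the sentences would be tokenised text of the full report.
sentences = [["the", "institution", "housed", "many", "children"],
             ["the", "school", "was", "run", "by", "a", "religious", "order"]]

model = Word2Vec(sentences, vector_size=50, window=5, min_count=1, epochs=50)

def expand_lexicon(seeds, topn=5, threshold=0.3):
    """Grow a lexicon by adding embedding neighbours of each seed term."""
    lexicon = set(seeds)
    for seed in seeds:
        if seed in model.wv:
            for word, score in model.wv.most_similar(seed, topn=topn):
                if score >= threshold:
                    lexicon.add(word)
    return lexicon

print(expand_lexicon(["institution"]))
```

The expanded lexicon can then serve as a feature dictionary for annotating passages, which is the role the article describes for it.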


2019 ◽  
Vol 8 (1) ◽  
pp. 32-35
Author(s):  
A. Uma Maheswari ◽  
N. Revathy

Semantic drift is a common problem in iterative information extraction. Unsupervised bagging combined with distributional similarity is used to reduce semantic drift in iterative bootstrapping algorithms, particularly when extracting large semantic lexicons. In this research work, a method is proposed to minimize semantic drift by identifying drifting points (DPs) and removing the effect they introduce. Previous methods for identifying drifting errors can be roughly divided into two categories, (1) multi-class based and (2) single-class based, according to the settings of the information extraction systems that adopt them. Compared to previous approaches, which usually incur a substantial loss in recall, the DP-based cleaning method can effectively clean a large proportion of semantic drift errors while keeping recall high.
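The DP-cleaning method itself is only summarised in the abstract. As a hedged illustration of the underlying idea (flagging bootstrapped candidates whose distributional similarity to the seed lexicon falls off, then removing them), a sketch might be:

```python
# Illustrative sketch of drift filtering in bootstrapped lexicon
# extraction: candidates whose distributional similarity to the seed
# centroid drops below a cutoff are treated as drifting points (DPs)
# and removed. The vectors, cutoff, and scoring are assumptions, not
# the paper's actual DP definition.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def filter_drifting_points(candidates, vectors, seed_terms, cutoff=0.4):
    """Keep candidates close to the mean vector of the seed lexicon."""
    centroid = np.mean([vectors[t] for t in seed_terms], axis=0)
    kept, dps = [], []
    for term in candidates:
        (kept if cosine(vectors[term], centroid) >= cutoff else dps).append(term)
    return kept, dps

# Hypothetical 2-d "distributional" vectors for demonstration only.
vecs = {"influenza": np.array([0.9, 0.1]), "measles": np.array([0.8, 0.2]),
        "hospital": np.array([0.2, 0.9])}
kept, dps = filter_drifting_points(["measles", "hospital"], vecs,
                                   seed_terms=["influenza"])
print("kept:", kept, "drifting:", dps)
```

Because only the flagged DPs are discarded rather than whole extraction rounds, a filter of this shape can remove drift errors without the large recall loss the abstract attributes to earlier approaches.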


2017 ◽  
Vol 10 (3) ◽  
pp. 501-565 ◽  
Author(s):  
Vo Ngoc Phu ◽  
Vo Thi Ngoc Tran ◽  
Vo Thi Ngoc Chau ◽  
Dat Nguyen Duy ◽  
Khanh Ly Doan Duy
Keyword(s):  

2016 ◽  
Author(s):  
Manpreet Kaur ◽  
Nishu Kumari ◽  
Anil Kumar Singh ◽  
Rajeev Sangal
