Web Document Categorization Using Knowledge Graph and Semantic Textual Topic Detection

2021 ◽  
pp. 40-51
Author(s):  
Antonio M. Rinaldi ◽  
Cristiano Russo ◽  
Cristian Tommasino
2020 ◽  
Vol 17 (6) ◽  
pp. 2730-2736
Author(s):  
Bhavana ◽  
Neeraj Raheja

Textual data on the web is growing at an exponential rate, and millions of categories of data are available on the web. In this big data environment, finding accurate information among these millions of categories is a challenging task. Web document categorization helps classify web pages into one or more predefined categories. This paper proposes a hybrid text document categorization approach. The approach improves categorization by combining feature extraction with feature selection, so that filtering for relevant features is done twice. A Deep Belief Network (DBN) is used for feature extraction (DBN-FE), a Binary Genetic Approach is used for feature selection (BGA-FS), and three different classifiers are used to evaluate the results of the proposed categorization scheme. The BGA model gives promising results for the proposed approach in comparison with other approaches.
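The feature selection step described in this abstract can be illustrated with a minimal sketch of a binary genetic algorithm, in which each chromosome is a 0/1 mask over the features. The abstract does not give the operators or parameters used, so everything here is an assumption: the function name `binary_ga_select`, tournament selection, single-point crossover, bit-flip mutation, elitism, and the default rates. The `toy_fitness` function and its `RELEVANCE` scores are hypothetical stand-ins for the classifier accuracy that a real pipeline would measure on DBN-extracted features.

```python
import random


def binary_ga_select(fitness, n_features, pop_size=20, generations=40,
                     crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Return the best 0/1 feature mask found by a simple binary GA.

    Assumed design (not taken from the paper): tournament selection,
    single-point crossover, bit-flip mutation, and elitism.
    """
    rng = random.Random(seed)
    # Each chromosome is a list of bits; bit i == 1 keeps feature i.
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)

    def tournament(pop):
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        # Elitism: the best mask so far always survives unchanged.
        next_population = [best[:]]
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(population), tournament(population)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)
                child = parent1[:cut] + parent2[cut:]
            else:
                child = parent1[:]
            # Flip each bit independently with probability mutation_rate.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best


# Hypothetical per-feature relevance scores, used only for illustration.
RELEVANCE = [0.9, 0.1, 0.8, 0.05, 0.7, 0.02]


def toy_fitness(mask):
    """Reward relevant features and charge a fixed cost of 0.3 per kept feature."""
    return sum(r - 0.3 for r, bit in zip(RELEVANCE, mask) if bit)


mask = binary_ga_select(toy_fitness, n_features=len(RELEVANCE))
print(mask, toy_fitness(mask))
```

Because of elitism, the fitness of the returned mask never falls below that of the best mask in the initial population. In practice the fitness function would train and score a classifier on the selected columns, which is far more expensive than this toy score.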


2021 ◽  
Author(s):  
E. Elakiya ◽  
R. Kanagaraj ◽  
N. Rajkumar

At every moment, a huge volume of data and information is communicated through social networks. Analyzing huge amounts of text data is tedious, time consuming, and expensive, and manual sorting leads to mistakes and inconsistency. The document processing phase is still not capable of extracting information the way a human reader does. Furthermore, the significance of the content in a text may differ from one reader to another. The proposed Multiple Spider Hunting Algorithm is used to reduce time complexity by moving multiple spiders rather than a single spider. The construction of spiders is dynamic and depends on the volume of the corpus. In some cases tokens may be related to more than one topic, and topics need to be detected semantically. A Multiple Semantic Spider Hunting Algorithm is therefore proposed, based on the semantics among terms, in which associations between words are drawn using semantic lexicons. Topics or lists of opinions are generated from the knowledge graph. News articles are gathered from five different topics: sports, business, education, tourism, and media. The usefulness of the proposed algorithms has been evaluated in terms of precision, recall, F-measure, accuracy, true positives, false positives, and topic detection percentage. The Multiple Semantic Spider Hunting Algorithm produced good results. The topic detection percentage of the Spider Hunting Algorithm has been compared with that of other algorithms: Naïve Bayes, Neural Network, Decision Tree, and Particle Swarm Optimization. The Spider Hunting Algorithm achieved more than 90% precision in detecting topics and subtopics.
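The evaluation measures named in this abstract follow standard definitions, which can be sketched for a single topic as below. The function name `classification_metrics` and the example counts are hypothetical; the abstract does not say how scores are averaged across the five topics, so this covers only the per-topic case.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts for one topic.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives. Undefined ratios (zero denominator) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for one topic, for illustration only.
print(classification_metrics(tp=8, fp=2, fn=2, tn=8))
```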


1999 ◽  
Vol 27 (3) ◽  
pp. 329-341 ◽  
Author(s):  
Daniel Boley ◽  
Maria Gini ◽  
Robert Gross ◽  
Eui-Hong (Sam) Han ◽  
Kyle Hastings ◽  
...  

Author(s):  
Antonio M. Rinaldi ◽  
Cristiano Russo

The synthesis process of document content and its visualization play a basic role in the context of knowledge representation and retrieval. Existing methods for tag-cloud generation are mostly based on the text content of documents; others also consider statistical or semantic information to enrich the document summary, while precious information deriving from multimedia content is often neglected. In this paper we present a document summarization and visualization technique based on both statistical and semantic analysis of textual and visual contents. The result of our framework is a Visual Semantic Tag Cloud based on the highlighting of relevant terms in a document using some features (font size, color, etc.) showing the importance of a term compared to other ones. The semantic information is derived from a knowledge base where concepts are represented through several multimedia items. The Visual Semantic Tag Cloud can be used not only to synthesize a document but also to represent a set of documents grouped by categories using a topic detection technique based on textual and visual analysis of multimedia features. Our work aims at demonstrating that with the help of semantic analysis and the combination of textual and visual features it is possible to improve the user knowledge acquisition by means of a synthesized visualization. The whole strategy has been evaluated by means of a ground truth and compared with similar approaches. Experimental results show the effectiveness of our approach, which outperforms state-of-the-art algorithms in topic detection combining both visual and semantic information.
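The idea of using font size to convey a term's importance relative to other terms can be sketched as a linear mapping from term weights to point sizes. This is only an illustration under stated assumptions: the function name `font_sizes`, the linear scaling, the 10 to 36 point range, and the example weights are all hypothetical, and the abstract does not describe how the authors compute term weights or map them to visual features.

```python
def font_sizes(weights, min_pt=10, max_pt=36):
    """Map term weights to font sizes by linear scaling (assumed scheme).

    weights: dict of term -> importance score (any real numbers).
    Returns a dict of term -> integer point size in [min_pt, max_pt].
    """
    if not weights:
        return {}
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    sizes = {}
    for term, weight in weights.items():
        # If all weights are equal, every term gets the largest size.
        scale = (weight - lo) / span if span else 1.0
        sizes[term] = round(min_pt + scale * (max_pt - min_pt))
    return sizes


# Hypothetical term weights, for illustration only.
print(font_sizes({"semantic": 3.0, "visual": 2.0, "cloud": 1.0}))
```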

