Web Document Categorization Using Knowledge Graph and Semantic Textual Topic Detection

2020 ◽

Vol 17 (6) ◽

pp. 2730-2736

Author(s):

Bhavana ◽

Neeraj Raheja

Keyword(s):

Feature Extraction ◽

Promising Result ◽

Hybrid Approach ◽

Accurate Information ◽

Text Document ◽

Web Document ◽

Feature Extraction And Selection ◽

Binary Genetic Algorithm ◽

Document Categorization ◽

The Web

Textual data on the web is growing at an exponential rate, and millions of categories of data are available on the web. In this big data environment, finding accurate information among these millions of categories is a challenging task. Web document categorization helps classify web pages into one or more predefined categories. This paper proposes a hybrid text document categorization approach. Categorization is improved by combining feature extraction with feature selection, so that filtering for relevant features is done twice. A Deep Belief Network (DBN) is used for feature extraction (DBN-FE), a Binary Genetic Approach is used for feature selection (BGA-FS), and three different classifiers are used to evaluate the results of the proposed categorization scheme. The BGA model gives promising results for the proposed approach in comparison to other approaches.
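The feature selection step described in this abstract can be illustrated with a generic binary genetic algorithm, in which each candidate is a 0/1 mask over the features. This is only a minimal sketch under stated assumptions, not the authors' BGA-FS: the function names, the operators (tournament selection, one-point crossover, bit-flip mutation, elitism), the parameter values, and the toy fitness function are all hypothetical. In practice the fitness would more plausibly be a classifier's validation score on the selected features.

```python
import random

def binary_ga_select(fitness, n_features, pop_size=20, generations=30,
                     crossover_rate=0.9, mutation_rate=0.05, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` (hypothetical helper)."""
    rng = random.Random(seed)
    # Each individual is a list of 0/1 flags, one flag per feature.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = list(max(pop, key=fitness))

    def tournament():
        # Pick two distinct individuals and keep the fitter one.
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        children = [list(best)]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation: each flag flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
        candidate = max(pop, key=fitness)
        if fitness(candidate) > fitness(best):
            best = list(candidate)
    return best, fitness(best)

# Toy fitness (an assumption for illustration): reward three "relevant"
# feature indices and penalize every other selected feature.
RELEVANT = {0, 3, 5}

def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in RELEVANT)
    extras = sum(1 for i, g in enumerate(mask) if g and i not in RELEVANT)
    return hits - 0.5 * extras

mask, score = binary_ga_select(toy_fitness, n_features=8)
print(mask, score)
```

Because the best mask is copied into every new generation, the returned score never decreases from one generation to the next; whether it reaches the optimum depends on the seed and parameters.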


Web document categorization by Support Vector Clustering

2008 IEEE International Conference on Systems, Man and Cybernetics ◽

10.1109/icsmc.2008.4811495 ◽

2008 ◽

Author(s):

Daming Shi ◽

Ming Hei Tsui ◽

Jigang Liu

Keyword(s):

Support Vector ◽

Support Vector Clustering ◽

Web Document ◽

Document Categorization ◽

Vector Clustering


Topic Detection Using Multiple Semantic Spider Hunting Algorithm

10.3233/apc210072 ◽

2021 ◽

Author(s):

E. Elakiya ◽

R. Kanagaraj ◽

N. Rajkumar

Keyword(s):

Business Education ◽

Time Complexity ◽

Knowledge Graph ◽

Topic Detection ◽

True Positive ◽

Text Data ◽

Swarm Optimization ◽

Manual Sorting ◽

Semantic Lexicons ◽

F Measure

At every moment, a huge volume of data and information is communicated through social networks. Analyzing huge amounts of text data is tedious, time-consuming, and expensive, and manual sorting leads to mistakes and inconsistency. Document processing is still not capable of extracting data the way a human reader does. Furthermore, the significance of content in a text may differ from one reader to another. The proposed Multiple Spider Hunting Algorithm reduces time complexity by moving multiple spiders rather than a single spider. The construction of spiders is dynamic and depends on the volume of the corpus. In some cases a token may relate to more than one topic, so topics need to be detected semantically. The Multiple Semantic Spider Hunting Algorithm is proposed based on the semantics among terms, with associations between words drawn using semantic lexicons. Topics or lists of opinions are generated from the knowledge graph. News articles are gathered from five different topics: sports, business, education, tourism, and media. The usefulness of the proposed algorithms is measured by precision, recall, F-measure, accuracy, true positives, false positives, and topic detection percentage. The Multiple Semantic Spider Hunting Algorithm produced good results. The topic detection percentage of the Spider Hunting Algorithm is compared with that of other algorithms: Naïve Bayes, Neural Network, Decision Tree, and Particle Swarm Optimization. The Spider Hunting Algorithm achieved more than 90% precise detection of topics and subtopics.
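The evaluation measures named in this abstract follow standard definitions, which can be sketched from confusion-matrix counts. The function name and the counts below are made up for illustration and are not results from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts (hypothetical helper)."""
    total = tp + fp + fn + tn
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure: harmonic mean of precision and recall.
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    # Accuracy: share of all decisions that are correct.
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }

# Illustrative counts only, e.g. for one topic class treated as "positive".
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(metrics)
```

For multi-topic detection, these per-class values would typically be computed once per topic and then averaged; which averaging scheme the paper uses is not stated in the abstract.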


Genetic-based fuzzy clustering for automatic Web document categorization

Proceedings of the 2001 ACM symposium on Applied computing - SAC '01 ◽

10.1145/372202.372421 ◽

2001 ◽

Cited By ~ 1

Author(s):

Vincenzo Loia ◽

Paola Luongo

Keyword(s):

Fuzzy Clustering ◽

Web Document ◽

Document Categorization


Partitioning-based clustering for Web document categorization

Decision Support Systems ◽

10.1016/s0167-9236(99)00055-x ◽

1999 ◽

Vol 27 (3) ◽

pp. 329-341 ◽

Cited By ~ 138

Author(s):

Daniel Boley ◽

Maria Gini ◽

Robert Gross ◽

Eui-Hong (Sam) Han ◽

Kyle Hastings ◽

...

Keyword(s):

Web Document ◽

Document Categorization


Using a multimedia semantic graph for web document visualization and summarization

Multimedia Tools and Applications ◽

10.1007/s11042-020-09761-1 ◽

2020 ◽

Author(s):

Antonio M. Rinaldi ◽

Cristiano Russo

Keyword(s):

Semantic Information ◽

Semantic Analysis ◽

Visual Analysis ◽

Ground Truth ◽

Synthesis Process ◽

Topic Detection ◽

Tag Cloud ◽

Web Document ◽

Text Content ◽

User Knowledge

The synthesis of document content and its visualization play a basic role in knowledge representation and retrieval. Existing methods for tag cloud generation are mostly based on the text content of documents; others also consider statistical or semantic information to enrich the document summary, while valuable information deriving from multimedia content is often neglected. In this paper we present a document summarization and visualization technique based on both statistical and semantic analysis of textual and visual contents. The result of our framework is a Visual Semantic Tag Cloud that highlights relevant terms in a document, using features such as font size and color to show the importance of a term compared to other terms. The semantic information is derived from a knowledge base where concepts are represented through several multimedia items. The Visual Semantic Tag Cloud can be used not only to synthesize a document but also to represent a set of documents grouped by categories, using a topic detection technique based on textual and visual analysis of multimedia features. Our work aims to demonstrate that, with the help of semantic analysis and the combination of textual and visual features, it is possible to improve user knowledge acquisition by means of a synthesized visualization. The whole strategy has been evaluated against a ground truth and compared with similar approaches. Experimental results show the effectiveness of our approach, which outperforms state-of-the-art algorithms in topic detection by combining both visual and semantic information.
