Optimization of Document Clustering Using UNL Document Vector Generation and Swarm Intelligence

Medical Document Clustering Using Ontology-Based Term Similarity Measures

Source: Medical Informatics
DOI: 10.4018/978-1-60566-050-9.ch169
Year: 2011
Pages: 2232-2243

Author(s): Xiaodan Zhang, Liping Jing, Xiaohua Hu, Michael Ng, Jiali Xia, ...

Keyword(s): Semantic Similarity, Domain Knowledge, Document Clustering, Similarity Measures, Concept Hierarchy, Term Similarity, Feature Based, Document Vector, Real World Datasets, Medical Document

Recent research shows that an ontology used as background knowledge can improve document clustering quality through its concept hierarchy. Previous studies use term semantic similarity as an important measure for incorporating domain knowledge into the clustering process, for example in clustering initialization and term re-weighting. However, few studies have examined how different types of term similarity measures affect clustering performance for a given domain. In this article, we conduct a comparative study of how different term semantic similarity measures, including path-based, information-content-based and feature-based measures, affect document clustering. Term re-weighting of the document vector is an important method for integrating a domain ontology into the clustering process: the weight of a term is augmented by the weights of its co-occurring concepts. Spherical k-means is used to evaluate document vector re-weighting on two real-world datasets, Disease10 and OHSUMED23. Experimental results on nine different semantic measures show that: (1) no single type of similarity measure significantly outperforms the others; (2) several similarity measures perform more stably than the others; (3) term re-weighting has a positive effect on medical document clustering, but the effect may not be significant when documents contain few terms.
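
The re-weighting idea in this abstract can be illustrated with a small sketch. It assumes that each term's weight is increased by the similarity-scaled weights of the other terms in the same document; the abstract does not give the exact formula, so that rule, the function names (`reweight`, `cosine`) and the toy similarity values are all hypothetical. The cosine function reflects the fact that spherical k-means compares unit-length vectors.

```python
import math

def reweight(doc, sim):
    """Augment each term weight with similarity-scaled weights of co-occurring terms.

    doc: dict mapping term -> weight (e.g. TF-IDF).
    sim: dict mapping frozenset({term_a, term_b}) -> similarity in [0, 1].
    Assumed rule (not taken from the paper): w'[t] = w[t] + sum_u sim(t, u) * w[u].
    """
    out = {}
    for t in doc:
        boost = sum(sim.get(frozenset((t, u)), 0.0) * doc[u] for u in doc if u != t)
        out[t] = doc[t] + boost
    return out

def normalize(vec):
    """Scale a sparse vector to unit length, as spherical k-means expects."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    a, b = normalize(a), normalize(b)
    return sum(w * b.get(t, 0.0) for t, w in a.items())

# Toy document and a made-up ontology similarity between two of its terms.
doc = {"diabetes": 2.0, "insulin": 1.0, "fracture": 1.0}
sim = {frozenset(("diabetes", "insulin")): 0.8}
print(reweight(doc, sim))
```

In this toy case the two related terms reinforce each other while the unrelated term keeps its original weight, which is the effect the abstract attributes to re-weighting.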

Swarm Intelligence in Text Document Clustering

Source: Handbook of Research on Text and Web Mining Technologies
DOI: 10.4018/978-1-59904-990-8.ch010
Year: 2010
Pages: 165-180
Cited By ~ 2

Author(s): Xiaohui Cui

Keyword(s): Swarm Intelligence, Clustering Analysis, Information Society, Clustering Algorithms, Document Clustering, High Quality, Fish Schools, Text Document, Self Organized, Swarm Algorithms

In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. A major challenge in today's information society is that users are overwhelmed with information on any topic they search for. Fast, high-quality document clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this overwhelming amount of information. The swarm intelligence clustering algorithms use stochastic and heuristic principles discovered by observing bird flocks, fish schools, and ant foraging. Compared with traditional clustering algorithms, swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make swarm algorithms suitable for solving complex problems such as document clustering.
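
As a rough illustration of the bird-flock family of methods mentioned above, the following is a generic particle swarm optimization (PSO) sketch for clustering. It is not the chapter's algorithm, which this abstract does not describe. Each particle is assumed to hold a candidate set of centroids, and the fitness function, parameter values and the name `pso_cluster` are illustrative choices only.

```python
import random

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(centroids, points):
    # Mean squared distance from each point to its nearest centroid (lower is better).
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)

def pso_cluster(points, k, n_particles=10, iters=50, w=0.7, c1=1.5, c2=1.5, seed=0):
    rng = random.Random(seed)
    dim = len(points[0])
    # Each particle is a candidate set of k centroids, started at random data points.
    pos = [[list(rng.choice(points)) for _ in range(k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_fit = [fitness(p, points) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Inertia plus pulls toward the particle's own best and the swarm's best.
                    vel[i][j][d] = (w * vel[i][j][d]
                                    + c1 * r1 * (pbest[i][j][d] - pos[i][j][d])
                                    + c2 * r2 * (gbest[j][d] - pos[i][j][d]))
                    pos[i][j][d] += vel[i][j][d]
            f = fitness(pos[i], points)
            if f < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], f
                if f < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], f
    return gbest, gbest_fit

# Toy 2-D "document vectors" forming two loose groups.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
centroids, score = pso_cluster(points, k=2)
print(centroids, score)
```

No particle is directed by a central controller: each one moves using only its own memory and the best position shared by the swarm, which is the decentralized, self-organized behaviour the abstract refers to.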

Text Augmentation Techniques for Document Vector Generation from Russian News Articles

Source: Communications in Computer and Information Science - Information and Software Technologies
DOI: 10.1007/978-3-319-99972-2_47
Year: 2018
Pages: 571-586

Author(s): Christoffer Aminoff, Aleksei Romanenko, Onni Kosomaa, Jouko Vankka

Keyword(s): Vector Generation, Augmentation Techniques, Document Vector

CSIM: a document clustering algorithm based on swarm intelligence

Source: Proceedings of the 2002 Congress on Evolutionary Computation. CEC'02 (Cat. No.02TH8600)
DOI: 10.1109/cec.2002.1006281
Year: 2003
Cited By ~ 2

Author(s): Wu Bin, Zheng Yi, Liu Shaohui, Shi Zhongzhi

Keyword(s): Swarm Intelligence, Clustering Algorithm, Document Clustering

Document vector compression and its application in document clustering

Source: Canadian Conference on Electrical and Computer Engineering, 2005.
DOI: 10.1109/ccece.2005.1557384
Year: 2006
Cited By ~ 1

Author(s): T.W. Fox

Keyword(s): Document Clustering, Document Vector

Swarm Intelligence - Volume 1: Principles, current algorithms and methods

DOI: 10.1049/pbce119f
Year: 2018

Keyword(s): Swarm Intelligence

A Recursive Method for Vector Generation in Non-increasing Order of Its Likelihood for All Binary Vectors and Its Application for Linear Block Code Decodings

Source: IEICE Transactions on Fundamentals of Electronics Communications and Computer Sciences
DOI: 10.1587/transfun.e95.a.801
Year: 2012
Volume: E95-A (4)
Pages: 801-810
Cited By ~ 1

Author(s): Takuya KUSAKA, Ryuhei YOKOYAMA, Toru FUJIWARA

Keyword(s): Block Code, Recursive Method, Binary Vectors, Vector Generation, Linear Block Code, Linear Block

Performance Analysis of TBC-ACO Routing Protocol with Existing Routing Protocols of Wireless Sensor Networks

Source: International Journal of Advanced Research in Computer Science and Software Engineering
DOI: 10.23956/ijarcsse.v7i8.46
Year: 2017
Volume: 7 (8)
Pages: 169

Author(s): A. Radhika, D. Haritha

Keyword(s): Energy Efficiency, Wireless Sensor Networks, Sensor Networks, Swarm Intelligence, Routing Protocol, Routing Protocols, Energy Efficient, Sensor Nodes, Wireless Sensor, Congestion Aware

Wireless Sensor Networks (WSNs) have seen significant research progress in areas such as routing, security, localization, deployment and, above all, energy efficiency. Congestion is an important problem in resource-constrained WSNs, especially large networks, where traffic loads exceed the available capacity of the resources. Sensor nodes are prone to failure, and the misbehaviour of faulty nodes creates further congestion. The result is degraded network performance, additional computation and increased energy consumption, which in turn shortens network lifetime. Hence, a data packet routing algorithm should treat congestion as one of its parameters and account for the role of faulty nodes, rather than considering energy efficiency alone. Integrating Swarm Intelligence based techniques into WSNs is now a major focus of attention. Computational Swarm Intelligence techniques have improved WSNs in terms of efficiency, performance, robustness and scalability. The main objective of this paper is to propose a congestion-aware, energy-efficient routing approach that uses Ant Colony Optimization and isolates faulty nodes by means of the concept of trust. We then compare the performance of existing routing protocols (AODV, DSDV, DSR and an ACO-based routing protocol) with Trust Based Congestion aware ACO Based Routing (TBC-ACO) in terms of end-to-end delay, packet delivery rate, routing overhead, throughput and energy efficiency. Simulation results and data analysis show that, overall, TBC-ACO is 150% more efficient than the other existing routing protocols for Wireless Sensor Networks.
