Optimization of Document Clustering Using UNL Document Vector Generation and Swarm Intelligence

Author(s): Vishakha A. Metre, Shraddha K. Popat, Pramod B. Deshmukh
Author(s): Zhang Xiaodan, Jing Liping, Hu Xiaohua, Ng Michael, Xia Jiali, ...

Recent research shows that using an ontology as background knowledge can improve document clustering quality through its concept hierarchy. Previous studies use term semantic similarity as an important means of incorporating domain knowledge into the clustering process, for example in cluster initialization and term re-weighting. However, few studies have focused on how different types of term similarity measures affect clustering performance in a particular domain. In this article, we conduct a comparative study of how different term semantic similarity measures, including path-based, information-content-based, and feature-based measures, affect document clustering. Term re-weighting of document vectors is an important method for integrating a domain ontology into the clustering process: the weight of a term is augmented by the weights of its co-occurring concepts. Spherical k-means is used to evaluate document vector re-weighting on two real-world datasets, Disease10 and OHSUMED23. Experimental results on nine different semantic similarity measures show that: (1) no type of similarity measure significantly outperforms the others; (2) several similarity measures perform more stably than the others; and (3) term re-weighting has a positive effect on medical document clustering, but the effect may not be significant when documents contain few terms.
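The term re-weighting and spherical k-means steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact augmentation formula, and the similarity matrix `S`, the `threshold` cut-off, and the helper names `reweight`, `normalize_rows`, and `spherical_kmeans` are hypothetical. Here a term's weight in a document is increased by the similarity-weighted weights of the other terms occurring in the same document, and the documents are then clustered by cosine similarity on the unit sphere.

```python
import numpy as np


def reweight(X, S, threshold=0.5):
    """Augment term weights with the weights of co-occurring, similar terms.

    X: (n_docs, n_terms) term-weight matrix, e.g. tf-idf.
    S: (n_terms, n_terms) symmetric term similarity matrix in [0, 1]
       (hypothetical input; any semantic similarity measure could fill it).
    threshold: similarities below this value are ignored (assumed parameter).
    """
    S = np.where(S >= threshold, S, 0.0)  # copy of S with weak similarities zeroed
    np.fill_diagonal(S, 0.0)              # a term does not augment itself
    # (X @ S)[d, i] = sum_j X[d, j] * S[j, i]; terms absent from document d
    # have X[d, j] == 0, so only co-occurring terms j contribute.
    boost = X @ S
    present = X > 0                       # re-weight only terms already in the document
    return X + np.where(present, boost, 0.0)


def normalize_rows(X):
    """Scale each row to unit L2 norm (zero rows are left unchanged)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return X / norms


def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Cluster rows of X by cosine similarity; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    X = normalize_rows(X)
    C = X[rng.choice(len(X), size=k, replace=False)]  # random documents as initial centroids
    for _ in range(n_iter):
        labels = np.argmax(X @ C.T, axis=1)  # dot product equals cosine for unit vectors
        new_C = C.copy()                     # empty clusters keep their old centroid
        for c in range(k):
            members = X[labels == c]
            if len(members):
                new_C[c] = members.sum(axis=0)  # concept vector before normalization
        new_C = normalize_rows(new_C)
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels, C


# Toy example: 4 documents over 4 terms, two obvious topics.
X = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.9, 0.1, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.8, 0.2]])
S = np.eye(4)
S[0, 1] = S[1, 0] = 0.8  # terms 0 and 1 are assumed to be semantically close
S[2, 3] = S[3, 2] = 0.7  # so are terms 2 and 3
labels, _ = spherical_kmeans(reweight(X, S), k=2)
print(labels)
```

Because spherical k-means works on unit-length vectors, the assignment step compares directions only; re-weighting changes those directions before normalization, which is how ontology-derived similarity can influence the resulting clusters.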


2011, pp. 2232-2243
Author(s): Xiaodan Zhang, Liping Jing, Xiaohua Hu, Michael Ng, Jiali Xia, ...



Author(s): Xiaohui Cui

In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. A major challenge of today’s information society is that users are overwhelmed with information on any topic they search for. Fast, high-quality document clustering algorithms play an important role in helping users effectively navigate, summarize, and organize this overwhelming volume of information. Swarm intelligence clustering algorithms use stochastic and heuristic principles derived from observing bird flocks, fish schools, and ant foraging. Compared with traditional clustering algorithms, swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make them suitable for solving complex problems such as document clustering.
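The ant-inspired side of swarm clustering can be illustrated with the classic pick-up/drop decision rule. The following Python sketch assumes the pick-up and drop probabilities of Deneubourg et al. combined with a Lumer-Faieta style local similarity over an s x s grid neighborhood, using cosine distance between document vectors. The chapter does not state which formulas its three approaches use, so the function names and the parameters `alpha`, `s`, `k1`, and `k2` are illustrative assumptions, and only the decision rule (not a full grid simulation) is shown.

```python
import numpy as np


def cosine_distance(a, b):
    """1 - cosine similarity between two document vectors (1.0 if either is zero)."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - float(np.dot(a, b)) / (na * nb)


def local_density(doc, neighbors, alpha=0.5, s=3):
    """Lumer-Faieta style similarity of `doc` to the documents in its s x s grid cell.

    f = max(0, (1 / s^2) * sum_j (1 - d(doc, n_j) / alpha)); an empty neighborhood gives 0.
    """
    total = sum(1.0 - cosine_distance(doc, n) / alpha for n in neighbors)
    return max(0.0, total / (s * s))


def pick_probability(f, k1=0.1):
    """An unladen ant picks up a document in a dissimilar region (low f) with high probability."""
    return (k1 / (k1 + f)) ** 2


def drop_probability(f, k2=0.15):
    """A laden ant drops its document among similar ones (high f) with high probability."""
    return (f / (k2 + f)) ** 2


doc = np.array([1.0, 0.0])
similar = [np.array([1.0, 0.0]), np.array([0.9, 0.1])]
dissimilar = [np.array([0.0, 1.0])]
f_sim = local_density(doc, similar)
f_dis = local_density(doc, dissimilar)
print(f_sim, pick_probability(f_sim), drop_probability(f_sim))
print(f_dis, pick_probability(f_dis), drop_probability(f_dis))
```

Repeating this rule while ants move randomly over a grid tends to gather similar documents into the same regions, which is the decentralized, self-organizing behavior the chapter refers to; no central process assigns documents to clusters.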


Author(s): A. Radhika, D. Haritha

Research on Wireless Sensor Networks (WSNs) has advanced significantly across areas such as routing, security, localization, deployment, and, above all, energy efficiency. Congestion is an important problem in resource-constrained WSNs, especially in large networks where traffic loads exceed the available capacity of the resources. Sensor nodes are prone to failure, and the misbehaviour of these faulty nodes creates further congestion. The result is degraded network performance, additional computation, and increased energy consumption, which in turn shortens network lifetime. Hence, a data packet routing algorithm should consider congestion as one of its parameters, along with the role of faulty nodes, rather than being merely energy efficient. Integrating Swarm Intelligence (SI) based techniques into WSNs has recently become a major focus of attention, and computational SI techniques have improved WSNs in terms of efficiency, performance, robustness, and scalability. The main objective of this research paper is to propose a congestion-aware, energy-efficient routing approach based on Ant Colony Optimization (ACO), in which faulty nodes are isolated using the concept of trust. We further compare the performance of existing routing protocols (AODV, DSDV, DSR, and an ACO-based routing protocol) with the proposed Trust Based Congestion aware ACO Based Routing (TBC-ACO) in terms of end-to-end delay, packet delivery rate, routing overhead, throughput, and energy efficiency. Simulation results and data analysis show that, overall, TBC-ACO is 150% more efficient than the other existing routing protocols for WSNs.
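The abstract does not give the routing equations, so the following Python sketch only illustrates one plausible way to combine the three ingredients it names: ACO next-hop selection, congestion awareness, and trust-based isolation of faulty nodes. The `Neighbor` fields, the heuristic (residual energy × free buffer fraction × trust), the trust threshold, and the parameters `alpha`, `beta`, and `rho` are hypothetical assumptions, not the TBC-ACO definitions. Untrusted neighbors are excluded outright, and the remaining ones are chosen with the standard ACO transition rule p ∝ τ^α η^β, followed by the usual evaporate-then-reinforce pheromone update.

```python
import random
from dataclasses import dataclass


@dataclass
class Neighbor:
    node_id: str
    pheromone: float        # tau on the link to this neighbor
    residual_energy: float  # normalized to [0, 1]
    queue_occupancy: float  # fraction of the buffer in use, [0, 1]; congestion proxy
    trust: float            # [0, 1]; hypothetical trust score from observed forwarding


def heuristic(n):
    """Desirability eta: prefer energetic, uncongested, trusted neighbors (assumed form)."""
    return n.residual_energy * (1.0 - n.queue_occupancy) * n.trust


def next_hop_probabilities(neighbors, alpha=1.0, beta=2.0, trust_threshold=0.5):
    """ACO rule p_j = tau_j^alpha * eta_j^beta / sum_k(...), over trusted neighbors only."""
    trusted = [n for n in neighbors if n.trust >= trust_threshold]  # isolate faulty nodes
    scores = {n.node_id: (n.pheromone ** alpha) * (heuristic(n) ** beta) for n in trusted}
    total = sum(scores.values())
    if total == 0:
        return {}
    return {node_id: score / total for node_id, score in scores.items()}


def choose_next_hop(neighbors, rng=random, **params):
    """Sample a next hop from the transition probabilities (None if no usable neighbor)."""
    probs = next_hop_probabilities(neighbors, **params)
    if not probs:
        return None
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


def update_pheromone(neighbors, used_id, deposit, rho=0.1):
    """Evaporate pheromone on all links, then reinforce the link a successful ant used."""
    for n in neighbors:
        n.pheromone *= 1.0 - rho
        if n.node_id == used_id:
            n.pheromone += deposit


neighbors = [
    Neighbor("A", pheromone=1.0, residual_energy=0.9, queue_occupancy=0.1, trust=0.9),
    Neighbor("B", pheromone=1.0, residual_energy=0.9, queue_occupancy=0.9, trust=0.9),
    Neighbor("C", pheromone=1.0, residual_energy=1.0, queue_occupancy=0.0, trust=0.2),
]
probs = next_hop_probabilities(neighbors)
hop = choose_next_hop(neighbors, rng=random.Random(42))
if hop is not None:
    update_pheromone(neighbors, hop, deposit=0.5)
print(probs, hop)
```

In this sketch, neighbor C is never chosen despite its full energy and empty buffer, because its trust is below the threshold, while the congestion term makes the heavily loaded neighbor B far less likely than A.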

