Comparison of Different Distance Measure Methods in Text Document Clustering

Background: Given the increasing volume of text documents on Internet pages, handling such a large amount of information has become highly complex. Text clustering is commonly formulated as an optimization problem that organizes a large amount of text information into a set of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, β-hill climbing, for the text document clustering problem, modeling the technique so that similar documents are partitioned into the same cluster. Methods: The β parameter is the primary innovation in the β-hill climbing technique. It was introduced to balance local and global search. Local search methods, such as the k-medoid and k-means techniques, have been successfully applied to text document clustering. Results: Experiments were conducted on eight standard benchmark text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results show that the proposed β-hill climbing achieved better results than the original hill climbing technique on the text clustering problem. Conclusion: The performance of text clustering improves when the β operator is added to hill climbing.
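The abstract does not spell out the algorithm, so the following is only a minimal sketch of how a β operator might be layered on plain hill climbing for clustering. It assumes documents are already numeric vectors, uses the within-cluster sum of squared distances as the cost, and treats β as the per-label probability of a random reset. The function name, parameters, and default values are illustrative assumptions, not the authors' implementation.

```python
import random

def beta_hill_climbing(docs, k, beta=0.05, iters=2000, seed=0):
    """Cluster equal-length numeric vectors; returns (labels, cost).

    Illustrative sketch only: cost function and defaults are assumptions.
    """
    rng = random.Random(seed)

    def cost(labels):
        # Sum of squared distances from each document to its cluster centroid.
        total = 0.0
        for c in range(k):
            members = [d for d, l in zip(docs, labels) if l == c]
            if not members:
                continue
            centroid = [sum(col) / len(members) for col in zip(*members)]
            total += sum(
                sum((x - m) ** 2 for x, m in zip(d, centroid)) for d in members
            )
        return total

    current = [rng.randrange(k) for _ in docs]
    current_cost = cost(current)
    for _ in range(iters):
        candidate = current[:]
        # Neighbourhood move (local search): reassign one random document.
        candidate[rng.randrange(len(docs))] = rng.randrange(k)
        # Beta operator (global search): reset each label with probability beta.
        for i in range(len(candidate)):
            if rng.random() < beta:
                candidate[i] = rng.randrange(k)
        candidate_cost = cost(candidate)
        # Selection: keep the candidate only if it is no worse.
        if candidate_cost <= current_cost:
            current, current_cost = candidate, candidate_cost
    return current, current_cost

# Toy usage: two well-separated groups of 2-D "document vectors".
docs = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, final_cost = beta_hill_climbing(docs, k=2)
```

Setting beta=0 in this sketch reduces it to ordinary hill climbing with single-document reassignments, which is one way to compare the two variants on the same data.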


Similarity Measure Approaches Applied in Text Document Clustering for Information Retrieval

2020 Sixth International Conference on Parallel, Distributed and Grid Computing (PDGC), 2020. DOI: 10.1109/pdgc50313.2020.9315851

Author(s): Naveen Kumar, Sanjay Kumar Yadav, Divakar Singh Yadav

Keyword(s): Information Retrieval, Similarity Measure, Document Clustering, Text Document


Text document clustering using statistical integrated graph based sentence sensitivity ranking algorithm

IOP Conference Series Materials Science and Engineering, 2021, Vol 1070 (1), pp. 012069. DOI: 10.1088/1757-899x/1070/1/012069

Author(s): G Kannan, R Nagarajan

Keyword(s): Document Clustering, Ranking Algorithm, Text Document


Text Document Clustering Using Community Discovery Approach

Distributed Computing and Internet Technology - Lecture Notes in Computer Science, 2019, pp. 336-346. DOI: 10.1007/978-3-030-36987-3_22

Author(s): Anu Beniwal, Gourav Roy, S. Durga Bhavani

Keyword(s): Document Clustering, Community Discovery, Text Document


A semantic approach for text document clustering using frequent itemsets and WordNet

International Journal of Engineering & Technology, 2018, Vol 7 (2.18), pp. 102. DOI: 10.14419/ijet.v7i2.9.10220

Author(s): Harsha Patil, Ramjeevan Singh Thakur

Keyword(s): Clustering Algorithms, Document Clustering, Knowledge Bases, Experimental Result, Semantic Approach, Text Document, Clustering Quality, Membership Function, Membership Score, Specific Cluster

Document clustering is an unsupervised method for grouping documents into clusters on the basis of their similarity. A document is placed in a specific cluster on the basis of its membership score, which is calculated by a membership function. Many traditional clustering algorithms, however, rely only on the bag-of-words (BOW) model, which ignores the semantic similarity between a document and a cluster. In this research we consider the semantic association between cluster and text document when calculating the membership score of a document for a specific cluster. Several researchers are working on semantic aspects of document clustering to improve clustering performance, and many external knowledge bases, such as WordNet, Wikipedia, and Lucene, are utilized for this purpose. The proposed approach exploits WordNet to improve the cluster membership function. The experimental results show that clustering quality improves significantly with the proposed semantic framework.
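The abstract gives no formula for the membership score, so the sketch below only illustrates the general idea of comparing a document with a cluster at the concept level rather than the word level. The small `TOY_CONCEPTS` table is a hypothetical stand-in for WordNet lookups, and the Jaccard overlap is an assumed scoring choice, not the authors' membership function.

```python
# Hypothetical stand-in for WordNet: each word maps to a shared concept label.
TOY_CONCEPTS = {
    "car": "vehicle", "automobile": "vehicle", "truck": "vehicle",
    "doctor": "medic", "physician": "medic",
}

def to_concepts(tokens):
    """Map each token to its concept, falling back to the token itself."""
    return {TOY_CONCEPTS.get(t, t) for t in tokens}

def jaccard(a, b):
    """Overlap of two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def membership_score(doc_tokens, cluster_tokens, semantic=True):
    """Score how well a document fits a cluster (assumed Jaccard scoring)."""
    if semantic:
        return jaccard(to_concepts(doc_tokens), to_concepts(cluster_tokens))
    # Plain bag-of-words comparison: synonyms count as different words.
    return jaccard(set(doc_tokens), set(cluster_tokens))

doc = ["automobile", "repair"]
cluster = ["car", "repair", "garage"]
bow_score = membership_score(doc, cluster, semantic=False)
semantic_score = membership_score(doc, cluster, semantic=True)
```

In this toy case the bag-of-words score misses the match between "automobile" and "car", while the concept-level score counts it, which is the kind of gap the semantic approach is meant to close.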
