document clustering Latest Research Papers

A Hybrid Algorithm for Document Clustering Using Optimized Kernel Matrix and Unsupervised Constraints

EAI/Springer Innovations in Communication and Computing - 3rd EAI International Conference on Big Data Innovation for Sustainable Cognitive Computing ◽

10.1007/978-3-030-78750-9_1 ◽

2022 ◽

pp. 1-20

Author(s):

S. Siamala Devi ◽

M. Deva Priya ◽

P. Anitha Rajakumari ◽

R. Kanmani ◽

G. Poorani ◽

...

Keyword(s):

Hybrid Algorithm ◽

Document Clustering ◽

Kernel Matrix

Max stable set problem to found the initial centroids in clustering problem

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v25.i1.pp569-579 ◽

2022 ◽

Vol 25 (1) ◽

pp. 569-579

Author(s):

Awatif Karim ◽

Chakir Loqman ◽

Youssef Hami ◽

Jaouad Boumhidi

Keyword(s):

Document Clustering ◽

Large Data ◽

Hopfield Network ◽

Large Data Sets ◽

Stable Set ◽

Data Sets ◽

Clustering Problem ◽

Text Document ◽

Stable Set Problem

In this paper, we propose a new approach to document clustering using the K-Means algorithm, which is sensitive to the random selection of the k cluster centroids in the initialization phase. To evaluate the quality of K-Means clustering, we model the text document clustering problem as the max stable set problem (MSSP) and use a continuous Hopfield network to solve the MSSP and obtain the initial centroids. The idea is inspired by the fact that MSSP and clustering share the same principle: MSSP consists of finding the largest set of completely disconnected nodes in a graph, and clustering divides all objects into disjoint clusters. Simulation results demonstrate that the proposed K-Means improved by MSSP (KM_MSSP) is efficient on large data sets, is much optimized in terms of time, and provides better clustering quality than other methods.
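The seeding idea in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not say how the graph is built, so the cosine-similarity threshold is an assumption, and a simple greedy heuristic stands in for the continuous Hopfield network. The function names `greedy_stable_set` and `kmeans` are hypothetical, and using every stable-set member as a seed (so that k equals the size of the set) is also an assumption.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def greedy_stable_set(vectors, threshold):
    """Greedy stand-in for the Hopfield solver (assumption, not the paper's method).

    Assumed graph: documents i and j are adjacent when their cosine
    similarity is at least `threshold`. The result is a set of pairwise
    non-adjacent documents, i.e. a stable (independent) set.
    """
    n = len(vectors)
    adj = [
        {j for j in range(n) if j != i and cosine(vectors[i], vectors[j]) >= threshold}
        for i in range(n)
    ]
    chosen = []
    # Visit low-degree nodes first, a common greedy rule for stable sets.
    for i in sorted(range(n), key=lambda i: (len(adj[i]), i)):
        if all(i not in adj[c] for c in chosen):
            chosen.append(i)
    return chosen


def kmeans(vectors, centroids, iterations=20):
    """Plain Lloyd iterations (squared Euclidean) from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(v, centroids[k])),
            )
            for v in vectors
        ]
        for k in range(len(centroids)):
            members = [v for v, label in zip(vectors, labels) if label == k]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy document vectors: three pairs of near-duplicates.
docs = [
    [1.0, 0.1, 0.0], [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1], [0.1, 0.9, 0.0],
    [0.0, 0.1, 1.0], [0.0, 0.0, 0.9],
]
seeds = greedy_stable_set(docs, threshold=0.8)
labels, _ = kmeans(docs, [docs[i] for i in seeds])
```

Because the seeds are mutually dissimilar, each one starts in a different region of the data, which is the property the abstract relies on to replace random initialization.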

Grey wolf optimization algorithm for hierarchical document clustering

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i3.pp1744-1758 ◽

2021 ◽

Vol 24 (3) ◽

pp. 1744-1758

Author(s):

Ayad Mohammed Jabbar ◽

Ku Ruhana Ku-Mahamud

Keyword(s):

Document Clustering ◽

Text Clustering ◽

Search Space ◽

Learning Approaches ◽

Grey Wolf ◽

External Evaluation ◽

Grey Wolf Optimization ◽

Noise Data ◽

F Measure ◽

Better Than

In data mining, the grey wolf optimization (GWO) algorithm has been used in several learning approaches because of its simplicity in adapting to different application domains. Most recent works on unsupervised learning have focused on text clustering, where the GWO algorithm shows promising results. Although GWO has great potential in performing text clustering, it has limitations in dealing with outlier documents and noisy data. This research introduces the medoid GWO (M-GWO) algorithm, which incorporates a medoid recalculation process to share the information of medoids among the three best wolves and the rest of the population. This improvement aims to find the best set of medoids during the algorithm run and increases the exploitation search to find more local regions in the search space. Experimental results were obtained by comparing against well-known algorithms, such as genetic, firefly, GWO, and k-means algorithms, on four benchmarks. The results of external evaluation metrics, such as Rand, purity, F-measure, and entropy, indicate that the proposed M-GWO algorithm achieves better document clustering than all other algorithms (i.e., 75% better using the Rand metric, 50% better based on the purity metric, 75% better using the F-measure metric, and 100% better based on the entropy metric).
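The abstract does not spell out the medoid recalculation step, so the following is only a generic sketch of what recalculating medoids usually means: each cluster's representative becomes the member with the smallest total distance to the other members. The wolf population and the sharing among the three best wolves are not modeled here, and the names `assign_to_medoids` and `recalculate_medoids` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two dense vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def assign_to_medoids(points, medoid_ids):
    """Label each point with the position of its nearest medoid."""
    return [
        min(range(len(medoid_ids)), key=lambda k: euclidean(p, points[medoid_ids[k]]))
        for p in points
    ]


def recalculate_medoids(points, medoid_ids):
    """Replace each medoid by the cluster member with the least total distance."""
    labels = assign_to_medoids(points, medoid_ids)
    new_ids = []
    for k, old_id in enumerate(medoid_ids):
        members = [i for i, label in enumerate(labels) if label == k]
        if not members:  # keep the previous medoid for an empty cluster
            new_ids.append(old_id)
            continue
        best = min(
            members,
            key=lambda i: sum(euclidean(points[i], points[j]) for j in members),
        )
        new_ids.append(best)
    return new_ids


# One cluster with an outlier at 100.0; the initial medoid is point 0.
points = [[0.0], [1.0], [2.0], [3.0], [100.0]]
new_medoids = recalculate_medoids(points, [0])
mean = sum(p[0] for p in points) / len(points)
```

Here the recalculated medoid is the point at 2.0, while the arithmetic mean is pulled to 21.2 by the outlier. That difference is the usual reason medoids are preferred when outlier documents and noisy data are a concern.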

Document Clustering Analysis Based on Hybrid Cuckoo Search and K-means Algorithm

10.1109/iemcon53756.2021.9623204 ◽

2021 ◽

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Noureddine Brakta

Keyword(s):

Clustering Analysis ◽

Document Clustering ◽

Cuckoo Search

Biomedical Document Clustering Based on Accelerated Symbiotic Organisms Search Algorithm

International Journal of Swarm Intelligence Research ◽

10.4018/ijsir.2021100109 ◽

2021 ◽

Vol 12 (4) ◽

pp. 169-185

Author(s):

Saida Ishak Boushaki ◽

Omar Bendjeghaba ◽

Nadjet Kamel

Keyword(s):

Clustering Algorithm ◽

Search Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Latent Semantic Indexing ◽

Research Area ◽

Semantic Indexing ◽

Local Optima ◽

Symbiotic Organisms Search ◽

Symbiotic Organisms

Clustering is an important unsupervised analysis technique for big data mining. It finds application in several domains, including the biomedical documents of the MEDLINE database. Document clustering algorithms based on metaheuristics are an active research area. However, these algorithms tend to get trapped in local optima, require many parameters to be adjusted, and need the documents to be indexed by a high-dimensional matrix using the traditional vector space model. To overcome these limitations, this paper proposes a new parameter-free document clustering algorithm (ASOS-LSI). It is based on the recent symbiotic organisms search (SOS) metaheuristic and enhanced by an acceleration technique. Furthermore, the documents are represented by semantic indexing based on the well-known latent semantic indexing (LSI). Experiments conducted on well-known biomedical document datasets show the significant superiority of ASOS-LSI over five well-known algorithms in terms of compactness, F-measure, purity, misclassified documents, entropy, and runtime.
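To make the dimensionality concern concrete, here is a small sketch of the traditional vector space model that the abstract contrasts with LSI. It is illustrative only: the whitespace tokenizer, the raw term counts, and the function name `build_document_term_matrix` are assumptions, and the three sample sentences are invented.

```python
from collections import Counter


def build_document_term_matrix(documents):
    """Return (vocabulary, rows), one raw term-count row per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Counter returns 0 for terms that do not occur in this document.
        rows.append([counts[term] for term in vocabulary])
    return vocabulary, rows


docs = [
    "gene expression in tumor cells",
    "tumor suppressor gene mutation",
    "protein folding kinetics",
]
vocab, matrix = build_document_term_matrix(docs)

# Fraction of zero entries: every row is as wide as the whole vocabulary.
zeros = sum(row.count(0) for row in matrix)
sparsity = zeros / (len(matrix) * len(vocab))
```

Each document vector has one component per distinct term in the collection, so its width grows with the vocabulary and most entries are zero. LSI addresses this by projecting the same matrix onto a much smaller number of latent dimensions.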

An Efficient Document Clustering Method using Space Transformation based on LDA and WMD

Journal of KIISE ◽

10.5626/jok.2021.48.9.1052 ◽

2021 ◽

Vol 48 (9) ◽

pp. 1052-1060

Author(s):

Yongdam Kim ◽

Sungwon Jung

Keyword(s):

Document Clustering ◽

Clustering Method ◽

Space Transformation

Semantic Conceptual Relational Similarity Based Web Document Clustering for Efficient Information Retrieval Using Semantic Ontology

KSII Transactions on Internet and Information Systems ◽

10.3837/tiis.2021.09.001 ◽

2021 ◽

Vol 15 (9) ◽

Keyword(s):

Information Retrieval ◽

Document Clustering ◽

Web Document ◽

Relational Similarity ◽

Efficient Information ◽

Web Document Clustering ◽

Semantic Ontology

Text Document Clustering with Negative Noun Attributes

Bioscience Biotechnology Research Communications ◽

10.21786/bbrc/14.9.52 ◽

2021 ◽

Vol 14 (9) ◽

pp. 277-284

Author(s):

S. Vijayalakshmi

Keyword(s):

Document Clustering ◽

Text Document

Document Clustering and Topic Classification Using Latent Dirichlet Allocation

10.1109/icses52305.2021.9633830 ◽

2021 ◽

Author(s):

Meenu Gupta ◽

Abdul Wasi ◽

Ankit Verma ◽

Somesh Awasthi

Keyword(s):

Latent Dirichlet Allocation ◽

Document Clustering ◽

Dirichlet Allocation

Similarity Measure Algorithm for Text Document Clustering, Using Singular Value Decomposition

Current Journal of Applied Science and Technology ◽

10.9734/cjast/2021/v40i2231475 ◽

2021 ◽

pp. 8-25

Author(s):

Valentina Adu ◽

Michael Donkor Adane ◽

Kwadwo Asante

Keyword(s):

Data Mining ◽

Singular Value Decomposition ◽

Similarity Measure ◽

Document Clustering ◽

Singular Value ◽

Data Mining Algorithm ◽

Original Matrix ◽

Text Documents ◽

Text Document ◽

Value Decomposition

We examined a similarity measure for text document clustering. Data mining is a challenging field with many research and application areas. Text document clustering, a subset of data mining, groups and organizes a large quantity of unstructured text documents into a small number of meaningful clusters. An algorithm that calculates the degree of closeness of documents from their document matrix was used to query the terms/words in each document. We also determined whether the documents in a given set are similar to or different from one another when these terms are queried. We found that the ability to rank and approximate documents using a matrix allows the use of Singular Value Decomposition (SVD) as an enhanced text data mining algorithm. Also, applying SVD to a high-dimensional matrix yields a lower-dimensional matrix that exposes the relationships in the original matrix by ordering them from the most variant to the least.
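The ordering and dimension reduction mentioned in this abstract correspond to the standard truncated SVD, summarized below. The symbols are notation introduced here for illustration (A for the term-document matrix, r for its rank, k for the number of retained dimensions) and do not come from the paper.

```latex
% Full SVD of the term-document matrix A, with singular values in decreasing order
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i \, u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0 .

% Rank-k truncation (k < r) keeps only the k largest singular values
A_k = \sum_{i=1}^{k} \sigma_i \, u_i v_i^{\top} .

% The squared Frobenius error equals the energy of the discarded singular values
\lVert A - A_k \rVert_F^{2} = \sum_{i=k+1}^{r} \sigma_i^{2} .
```

Because the singular values are sorted, the leading terms capture the directions of greatest variation, which is why keeping only the first k of them preserves the strongest relationships in the original matrix.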
