Swarm Intelligence in Text Document Clustering

Handbook of Research on Text and Web Mining Technologies ◽

10.4018/978-1-59904-990-8.ch010 ◽

2010 ◽

pp. 165-180 ◽

Cited By ~ 2

Author(s):

Xiaohui Cui

Keyword(s):

Swarm Intelligence ◽

Clustering Analysis ◽

Information Society ◽

Clustering Algorithms ◽

Document Clustering ◽

High Quality ◽

Fish Schools ◽

Text Document ◽

Self Organized ◽

Swarm Algorithms

In this chapter, we introduce three nature-inspired swarm intelligence clustering approaches for document clustering analysis. A major challenge for today’s information society is that users are overwhelmed with information on any topic they search for. Fast and high-quality document clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize this overwhelming volume of information. The swarm intelligence clustering algorithms use stochastic and heuristic principles discovered from observing bird flocks, fish schools, and ant foraging. Compared to traditional clustering algorithms, the swarm algorithms are usually flexible, robust, decentralized, and self-organized. These characteristics make the swarm algorithms suitable for solving complex problems, such as document clustering.

A semantic approach for text document clustering using frequent itemsets and WordNet

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i2.9.10220 ◽

2018 ◽

Vol 7 (2.18) ◽

pp. 102

Author(s):

Harsha Patil ◽

Ramjeevan Singh Thakur

Keyword(s):

Clustering Algorithms ◽

Document Clustering ◽

Knowledge Bases ◽

Experimental Result ◽

Semantic Approach ◽

Text Document ◽

Clustering Quality ◽

Membership Function ◽

Membership Score ◽

Specific Cluster

Document clustering is an unsupervised method for grouping documents into clusters on the basis of their similarity. Each document is assigned to a specific cluster on the basis of a membership score, which is calculated through a membership function. However, many traditional clustering algorithms are based only on the bag-of-words (BOW) model, which ignores the semantic similarity between document and cluster. In this research we consider the semantic association between cluster and text document when calculating the membership score of a document for a specific cluster. Several researchers are working on semantic aspects of document clustering to improve clustering performance. Many external knowledge bases, such as WordNet, Wikipedia, and Lucene, are utilized for this purpose. The proposed approach exploits WordNet to improve the cluster membership function. The experimental results show that clustering quality improves significantly with the proposed semantic framework.

Generalized swarm intelligence algorithms with domain-specific heuristics

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v10.i1.pp157-165 ◽

2021 ◽

Vol 10 (1) ◽

pp. 157

Author(s):

P. Matrenin ◽

V. Myasnichenko ◽

N. Sdobnyakov ◽

D. Sokolov ◽

S. Fidanova ◽

...

Keyword(s):

Swarm Intelligence ◽

Minimum Distance ◽

Heuristic Algorithms ◽

Population Based ◽

Scheduling Problem ◽

Domain Specific ◽

Hybrid Approaches ◽

Self Organized ◽

Swarm Algorithms ◽

Tasks Scheduling

In recent years, hybrid approaches built on population-based algorithms have been applied more often in industrial settings. In this paper, we present an approach that combines universal, problem-free Swarm Intelligence (SI) algorithms with simple deterministic domain-specific heuristic algorithms. The approach focuses on improving efficiency by sharing the advantages of domain-specific heuristic and swarm algorithms. A heuristic algorithm helps take into account the specifics of the problem and effectively translate the positions of agents (particle, ant, bee) into the problem's solution. The swarm algorithm, in turn, increases the adaptability and efficiency of the approach thanks to its stochastic and self-organized properties. We demonstrate this approach on two non-trivial optimization tasks: a scheduling problem and finding the minimum distance between 3D isomers.

Comparative Study of Document Clustering Algorithms

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i4.11.20816 ◽

2018 ◽

Vol 7 (4.11) ◽

pp. 246

Author(s):

N. M. Ariff ◽

M. A. A. Bakar ◽

M. I. Rahmad

Keyword(s):

Data Mining ◽

Hierarchical Clustering ◽

Clustering Analysis ◽

Clustering Algorithms ◽

Document Clustering ◽

Text Clustering ◽

Data Mining Technique ◽

Mining Technique ◽

Meaningful Result ◽

Different Types

Text clustering is a data mining technique that is becoming more important in present studies. Document clustering makes use of text clustering to divide documents according to their topics. The choice of words in document clustering is important to ensure that the documents can be classified correctly. Three clustering methods, namely hierarchical clustering, k-means and k-medoids, are used and compared in this study in order to identify the method which produces the best result in document clustering. The three methods are applied to 60 sports articles involving four different types of sports. K-medoids clustering produced the worst result, while k-means clustering was found to be more sensitive to general words. Therefore, hierarchical clustering is deemed more stable in producing a meaningful result in document clustering analysis.

An improved ant algorithm with LDA-based representation for text document clustering

Journal of Information Science ◽

10.1177/0165551516638784 ◽

2016 ◽

Vol 43 (2) ◽

pp. 275-292 ◽

Cited By ~ 24

Author(s):

Aytug Onan ◽

Hasan Bulut ◽

Serdar Korukoglu

Keyword(s):

Clustering Algorithm ◽

Latent Dirichlet Allocation ◽

Clustering Algorithms ◽

Document Clustering ◽

Clustering Methods ◽

Initial Value ◽

Text Document ◽

Clustering Quality ◽

Text Features

Document clustering can be applied in document organisation and browsing, document summarisation and classification. The identification of an appropriate representation for textual documents is extremely important for the performance of clustering or classification algorithms. Textual documents suffer from the high dimensionality and irrelevancy of text features. Besides, conventional clustering algorithms suffer from several shortcomings, such as slow convergence and sensitivity to the initial value. To tackle the problems of conventional clustering algorithms, metaheuristic algorithms are frequently applied to clustering. In this paper, an improved ant clustering algorithm is presented, where two novel heuristic methods are proposed to enhance the clustering quality of ant-based clustering. In addition, the latent Dirichlet allocation (LDA) is used to represent textual documents in a compact and efficient way. The clustering quality of the proposed ant clustering algorithm is compared to the conventional clustering algorithms using 25 text benchmarks in terms of F-measure values. The experimental results indicate that the proposed clustering scheme outperforms the compared conventional and metaheuristic clustering methods for textual documents.

A Brief Review of Metaheuristics for Document or Text Clustering

Intelligent Techniques for Data Analysis in Diverse Settings - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0075-9.ch012 ◽

2016 ◽

pp. 252-264 ◽

Cited By ~ 1

Author(s):

Sinem Büyüksaatçı ◽

Alp Baray

Keyword(s):

Language Processing ◽

Clustering Algorithms ◽

Document Clustering ◽

Text Clustering ◽

Metaheuristic Algorithms ◽

Research Papers ◽

High Quality ◽

Topic Extraction ◽

Clustering Problem ◽

Research Areas

Document clustering, which involves concepts from the fields of information retrieval, automatic topic extraction, natural language processing, and machine learning, is one of the most popular research areas in data mining. Due to the large amount of information in electronic form, fast and high-quality cluster analysis plays an important role in helping users to effectively navigate, summarize and organise this information for useful data. There are a number of techniques in the literature, which efficiently provide solutions for document clustering. However, during the last decade, researchers started to use metaheuristic algorithms for the document clustering problem because of the limitations of the existing traditional clustering algorithms. In this chapter, the authors will give a brief review of various research papers that present the area of document or text clustering approaches with different metaheuristic algorithms.

Text Document Clustering using K-Means and Dbscan by using Machine Learning

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2040.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6327-6330

Keyword(s):

Machine Learning ◽

Social Networking ◽

Social Networking Sites ◽

Document Clustering ◽

Similar Type ◽

Data Sets ◽

Text Data ◽

Text Document ◽

Self Organized ◽

Density Based Clustering

With the growth of today’s world, text data created by different media such as social networking sites, the web, and other information sources is also increasing. Clustering is an important part of data mining. Clustering is the procedure of dividing a large body of text so that items of a similar type fall into the same group. Clustering is used in many applications, such as medicine, biology, and signal processing. Traditional clustering algorithms include hierarchical clustering, density-based clustering and self-organized map clustering. Using k-means features and DBSCAN, we are able to cluster documents. DBSCAN a part of clustering shows to a number of standard. The data sets are evaluated automatically, part by part, using DBSCAN and k-means, which shows the clustering power of the data. A document may consist of multiple topics. Document clustering demands the context of signifier and form ancestry. Descriptors are the expressions used to describe the content inside a cluster.

Metaheuristics Based Clustering Algorithms on Document Clustering

Journal of Intelligent Systems with Applications ◽

10.54856/jiswa.201905059 ◽

2019 ◽

pp. 39-45

Author(s):

Aytug Onan

Keyword(s):

Cluster Analysis ◽

Optimization Problems ◽

Clustering Algorithms ◽

Document Clustering ◽

Cuckoo Search ◽

Text Documents ◽

Text Document ◽

Analysis Technique ◽

Clustering Quality ◽

Exploratory Data

Cluster analysis is an important exploratory data analysis technique which divides data into groups based on their similarity. Document clustering is the process of employing clustering algorithms on textual data so that text documents can be retrieved, organized, navigated and summarized in an efficient way. Document clustering can be utilized in the organization, summarization and classification of text documents. Metaheuristic algorithms have been successfully utilized to deal with complex optimization problems, including cluster analysis. In this paper, we analyze the clustering quality of five metaheuristic clustering algorithms (namely, particle swarm optimization, genetic algorithm, cuckoo search, firefly algorithm and bat algorithm) on fifteen text collections in terms of F-measure. In the empirical analysis, two conventional clustering algorithms (K-means and bisecting K-means) are also considered. The experimental analysis indicates that swarm-based clustering algorithms outperform conventional clustering algorithms on text document clustering.
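
The abstract above reports clustering quality as an F-measure but does not say which definition is used. As a rough illustration only, the sketch below implements one commonly used clustering F-measure: for each true class, take the best F1 score over all clusters, then average those scores weighted by class size. The function name `clustering_f_measure` and its arguments are hypothetical and are not taken from the paper.

```python
from collections import Counter


def clustering_f_measure(labels_true, labels_pred):
    """Class-size-weighted best-match F-measure (one common definition).

    ASSUMPTION: the paper may use a different variant; this is a sketch.
    """
    n = len(labels_true)
    class_sizes = Counter(labels_true)      # documents per true class
    cluster_sizes = Counter(labels_pred)    # documents per predicted cluster
    overlap = Counter(zip(labels_true, labels_pred))  # documents per (class, cluster) pair

    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_ij = overlap.get((cls, clu), 0)
            if n_ij == 0:
                continue  # no shared documents, so F1 is zero for this pair
            precision = n_ij / n_clu
            recall = n_ij / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total
```

Under this definition a clustering that reproduces the true classes exactly scores 1.0 regardless of how the cluster IDs are numbered, while lumping all documents into a single cluster is penalised through low precision.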

DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering

Journal of Information Science ◽

10.1177/0165551518816302 ◽

2018 ◽

Vol 45 (6) ◽

pp. 818-832 ◽

Cited By ~ 4

Author(s):

R Lakshmi ◽

S Baskar

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Weighted Average ◽

Minimum Standard ◽

Data Set ◽

Text Document ◽

Selection For ◽

Number Of Classes ◽

Better Than

In this article, a new initial centroid selection for a K-means document clustering algorithm, namely, Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means (DIC-DOC-K-means), to improve the performance of text document clustering is proposed. The first centroid is the document having the minimum standard deviation of its term frequency. Each of the other subsequent centroids is selected based on the dissimilarities of the previously selected centroids. For comparing the performance of the proposed DIC-DOC-K-means algorithm, the results of the K-means, K-means++ and weighted average of terms-based initial centroid selection + K-means (Weight_Avg_Initials + K-means) clustering algorithms are considered. The results show that the proposed DIC-DOC-K-means algorithm performs significantly better than the K-means, K-means++ and Weight_Avg_Initials + K-means clustering algorithms for Reuters-21578 and WebKB with respect to purity, entropy and F-measure for most of the cluster sizes. The cluster sizes used for Reuters-8 are 8, 16, 24 and 32 and those for WebKB are 4, 8, 12 and 16. The results of the proposed DIC-DOC-K-means give a better performance for the number of clusters that are equal to the number of classes in the data set.
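
The seeding idea described in this abstract can be sketched roughly as follows. The first centroid follows the abstract directly: the document whose term-frequency vector has the smallest standard deviation. The abstract does not give the exact rule for the remaining centroids, so this sketch assumes a farthest-first rule with cosine dissimilarity; that rule, and the names `cosine_dissimilarity` and `select_initial_centroids`, are illustrative assumptions rather than the authors' method.

```python
import math
from statistics import pstdev


def cosine_dissimilarity(a, b):
    """1 - cosine similarity; zero vectors are treated as maximally dissimilar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_initial_centroids(tf_vectors, k):
    """Return the indices of k documents to use as initial K-means centroids."""
    indices = range(len(tf_vectors))
    # From the abstract: first centroid = minimum standard deviation of term frequency.
    chosen = [min(indices, key=lambda i: pstdev(tf_vectors[i]))]
    while len(chosen) < k:
        candidates = [i for i in indices if i not in chosen]
        # ASSUMPTION (not from the paper): pick the document whose nearest
        # already-chosen centroid is farthest away (farthest-first traversal).
        nxt = max(
            candidates,
            key=lambda i: min(
                cosine_dissimilarity(tf_vectors[i], tf_vectors[j]) for j in chosen
            ),
        )
        chosen.append(nxt)
    return chosen
```

For example, with the toy term-frequency vectors `[[2, 2, 2, 2], [5, 0, 0, 0], [4, 1, 0, 0], [0, 0, 1, 6]]` and `k = 3`, the perfectly uniform first document is picked first, and the near-duplicate third document is skipped in favour of more dissimilar ones.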

A combination of objective functions and hybrid Krill herd algorithm for text document clustering analysis

Engineering Applications of Artificial Intelligence ◽

10.1016/j.engappai.2018.05.003 ◽

2018 ◽

Vol 73 ◽

pp. 111-125 ◽

Cited By ~ 151

Author(s):

Laith Mohammad Abualigah ◽

Ahamad Tajudin Khader ◽

Essam Said Hanandeh

Keyword(s):

Clustering Analysis ◽

Document Clustering ◽

Objective Functions ◽

Krill Herd Algorithm ◽

Text Document ◽

Krill Herd

Evaluation of text document clustering approach based on particle swarm optimization

Open Computer Science ◽

10.2478/s13537-013-0104-2 ◽

2013 ◽

Vol 3 (2) ◽

Cited By ~ 18

Author(s):

Stuti Karol ◽

Veenu Mangat

Keyword(s):

Data Mining ◽

Particle Swarm Optimization ◽

Clustering Algorithms ◽

Hybrid Approach ◽

Document Clustering ◽

Particle Swarm ◽

Pso Algorithm ◽

Swarm Optimization ◽

Fuzzy C Means ◽

Text Document

Clustering, an extremely important technique in data mining, is an automatic learning technique aimed at grouping a set of objects into subsets or clusters. The goal is to create clusters that are coherent internally, but substantially different from each other. Text document clustering refers to the clustering of related text documents into groups based upon their content. It is a fundamental operation used in unsupervised document organization, text data mining, automatic topic extraction, and information retrieval. Fast and high-quality document clustering algorithms play an important role in effectively navigating, summarizing, and organizing information. The documents to be clustered can be web news articles, abstracts of research papers, etc. This paper proposes two techniques for efficient document clustering that apply a soft computing approach in the form of an intelligent hybrid PSO algorithm. The proposed approach involves the partitioning algorithms Fuzzy C-Means and K-Means, each hybridized with Particle Swarm Optimization (PSO). The performance of these hybrid algorithms has been evaluated against traditional partitioning techniques (K-Means and Fuzzy C-Means).
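
To make the K-Means/PSO hybrid mentioned in this abstract more concrete, here is a minimal sketch of one common way such a hybrid is arranged: PSO searches globally for a good set of centroids, and K-means then refines them locally. The abstract does not specify the ordering, the fitness function, or the parameters, so everything here is an assumption: the fitness (mean squared distance to the nearest centroid), the inertia and acceleration values `w`, `c1`, `c2`, and all function names are illustrative, not the authors' implementation.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]


def fitness(points, centroids):
    """ASSUMED fitness: mean squared distance to the nearest centroid (lower is better)."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points) / len(points)


def pso_centroids(points, k, n_particles=10, iters=30, w=0.72, c1=1.49, c2=1.49, rng=None):
    """Global search: each particle encodes a full set of k centroids."""
    rng = rng or random.Random(0)
    dim = len(points[0])
    pos = [[list(p) for p in rng.sample(points, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in particle] for particle in pos]
    pbest_fit = [fitness(points, particle) for particle in pos]
    g = min(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [c[:] for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for c in range(k):
                for d in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard velocity update: inertia + cognitive + social terms.
                    vel[i][c][d] = (w * vel[i][c][d]
                                    + c1 * r1 * (pbest[i][c][d] - pos[i][c][d])
                                    + c2 * r2 * (gbest[c][d] - pos[i][c][d]))
                    pos[i][c][d] += vel[i][c][d]
            fit = fitness(points, pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = [c[:] for c in pos[i]], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = [c[:] for c in pos[i]], fit
    return gbest


def kmeans_refine(points, centroids, iters=20):
    """Local refinement: plain K-means started from the PSO result."""
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == c]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return centroids, assign(points, centroids)
```

A typical call would be `centroids, labels = kmeans_refine(vectors, pso_centroids(vectors, k))`, where `vectors` is a list of document feature vectors (for instance TF-IDF weights). A Fuzzy C-Means variant would replace the hard assignment in `kmeans_refine` with soft memberships.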
