A hybrid approach for text document clustering using Jaya optimization algorithm

2021 ◽  
Vol 178 ◽  
pp. 115040
Author(s):  
Karpagalingam Thirumoorthy ◽  
Karuppaiah Muneeswaran
Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1929
Author(s):  
Timea Bezdan ◽  
Catalin Stoean ◽  
Ahmed Al Naamany ◽  
Nebojsa Bacanin ◽  
Tarik A. Rashid ◽  
...  

The fast-growing Internet results in massive amounts of text data. Due to the large volume of the unstructured format of text data, extracting relevant information and its analysis becomes very challenging. Text document clustering is a text-mining process that partitions the set of text-based documents into mutually exclusive clusters in such a way that documents within the same group are similar to each other, while documents from different clusters differ based on the content. One of the biggest challenges in text clustering is partitioning the collection of text data by measuring the relevance of the content in the documents. Addressing this issue, in this work a hybrid swarm intelligence algorithm with a K-means algorithm is proposed for text clustering. First, the hybrid fruit-fly optimization algorithm is tested on ten unconstrained CEC2019 benchmark functions. Next, the proposed method is evaluated on six standard benchmark text datasets. The experimental evaluation on the unconstrained functions, as well as on text-based documents, indicated that the proposed approach is robust and superior to other state-of-the-art methods.


2021 ◽  
Author(s):  
Gugulothu Venkanna ◽  
Dr K F Bharati

Abstract Owing to scientific development, a variety of challenges present in the field of information retrieval. These challenges are because of the increased usage of large volumes of data. These huge amounts of data are presented from large-scale distributed networks. Centralization of these data to carry out analysis is tricky. There exists a requirement for novel text document clustering algorithms, which overcomes challenges in clustering. The two most important challenges in clustering are clustering accuracy and quality. For this reason, this paper intends to present an ideal clustering model for text document using term frequency–inverse document frequency, which is considered as feature sets. Here, the initial centroid selection is much concentrated which can automatically cluster the text using weighted similarity measure in the proposed clustering process. In fact, the weighted similarity function involves the inter-cluster, and intra-cluster similarity of both ordered and unordered documents, which is used to minimize weighted similarity among the documents. An advanced model for clustering is proposed by the hybrid optimization algorithm, which is the combination of the Jaya Algorithm (JA) and Grey Wolf Algorithm (GWO), and so the proposed algorithm is termed as JA-based GWO. Finally, the performance of the proposed model is verified through a comparative analysis with the state-of-the-art models. The performance analysis exhibits that the proposed model is 96.56% better than genetic algorithm, 99.46% better than particle swarm optimization, 97.09% superior to Dragonfly algorithm, and 96.21% better than JA for the similarity index. Therefore, the proposed model has confirmed its efficiency through valuable analysis.


2013 ◽  
Vol 3 (2) ◽  
Author(s):  
Stuti Karol ◽  
Veenu Mangat

AbstractClustering, an extremely important technique in Data Mining is an automatic learning technique aimed at grouping a set of objects into subsets or clusters. The goal is to create clusters that are coherent internally, but substantially different from each other. Text Document Clustering refers to the clustering of related text documents into groups based upon their content. It is a fundamental operation used in unsupervised document organization, text data mining, automatic topic extraction, and information retrieval. Fast and high-quality document clustering algorithms play an important role in effectively navigating, summarizing, and organizing information. The documents to be clustered can be web news articles, abstracts of research papers etc. This paper proposes two techniques for efficient document clustering involving the application of soft computing approach as an intelligent hybrid approach PSO algorithm. The proposed approach involves partitioning Fuzzy C-Means algorithm and K-Means algorithm each hybridized with Particle Swarm Optimization (PSO). The performance of these hybrid algorithms has been evaluated against traditional partitioning techniques (K-Means and Fuzzy C Means).


Author(s):  
Laith Mohammad Abualigah ◽  
Essam Said Hanandeh ◽  
Ahamad Tajudin Khader ◽  
Mohammed Abdallh Otair ◽  
Shishir Kumar Shandilya

Background: Considering the increasing volume of text document information on Internet pages, dealing with such a tremendous amount of knowledge becomes totally complex due to its large size. Text clustering is a common optimization problem used to manage a large amount of text information into a subset of comparable and coherent clusters. Aims: This paper presents a novel local clustering technique, namely, β-hill climbing, to solve the problem of the text document clustering through modeling the β-hill climbing technique for partitioning the similar documents into the same cluster. Methods: The β parameter is the primary innovation in β-hill climbing technique. It has been introduced in order to perform a balance between local and global search. Local search methods are successfully applied to solve the problem of the text document clustering such as; k-medoid and kmean techniques. Results: Experiments were conducted on eight benchmark standard text datasets with different characteristics taken from the Laboratory of Computational Intelligence (LABIC). The results proved that the proposed β-hill climbing achieved better results in comparison with the original hill climbing technique in solving the text clustering problem. Conclusion: The performance of the text clustering is useful by adding the β operator to the hill climbing.


Atmosphere ◽  
2021 ◽  
Vol 12 (1) ◽  
pp. 64
Author(s):  
Feng Jiang ◽  
Yaqian Qiao ◽  
Xuchu Jiang ◽  
Tianhai Tian

The randomness, nonstationarity and irregularity of air pollutant data bring difficulties to forecasting. To improve the forecast accuracy, we propose a novel hybrid approach based on two-stage decomposition embedded sample entropy, group teaching optimization algorithm (GTOA), and extreme learning machine (ELM) to forecast the concentration of particulate matter (PM10 and PM2.5). First, the improvement complementary ensemble empirical mode decomposition with adaptive noise (ICEEMDAN) is employed to decompose the concentration data of PM10 and PM2.5 into a set of intrinsic mode functions (IMFs) with different frequencies. In addition, wavelet transform (WT) is utilized to decompose the IMFs with high frequency based on sample entropy values. Then the GTOA algorithm is used to optimize ELM. Furthermore, the GTOA-ELM is utilized to predict all the subseries. The final forecast result is obtained by ensemble of the forecast results of all subseries. To further prove the predictable performance of the hybrid approach on air pollutants, the hourly concentration data of PM2.5 and PM10 are used to make one-step-, two-step- and three-step-ahead predictions. The empirical results demonstrate that the hybrid ICEEMDAN-WT-GTOA-ELM approach has superior forecasting performance and stability over other methods. This novel method also provides an effective and efficient approach to make predictions for nonlinear, nonstationary and irregular data.


Sign in / Sign up

Export Citation Format

Share Document