Optimal Text Document Clustering Enabled by Weighed Similarity Oriented Jaya With Grey Wolf Optimization Algorithm

The Computer Journal ◽

10.1093/comjnl/bxab013 ◽

2021 ◽

Author(s):

Gugulothu Venkanna ◽

Dr K F Bharati

Keyword(s):

Optimization Algorithm ◽

Document Clustering ◽

Similarity Index ◽

Scientific Development ◽

Similarity Function ◽

Grey Wolf ◽

Text Document ◽

Clustering Model ◽

Proposed Model ◽

Better Than

Abstract: Owing to scientific development, a variety of challenges have arisen in the field of information retrieval. These challenges stem from the increased usage of large volumes of data, which originate from large-scale distributed networks. Centralizing these data for analysis is difficult. Novel text document clustering algorithms are therefore required to overcome the challenges in clustering, the two most important of which are clustering accuracy and quality. For this reason, this paper presents an optimal clustering model for text documents that uses term frequency–inverse document frequency as the feature set. Particular attention is given to the initial centroid selection, so that the proposed clustering process can cluster the text automatically using a weighted similarity measure. The weighted similarity function involves the inter-cluster and intra-cluster similarity of both ordered and unordered documents, and the weighted similarity among the documents is minimized. The clustering model is optimized by a hybrid algorithm that combines the Jaya Algorithm (JA) and Grey Wolf Algorithm (GWO), termed JA-based GWO. Finally, the performance of the proposed model is verified through a comparative analysis with state-of-the-art models. The performance analysis shows that, for the similarity index, the proposed model is 96.56% better than the genetic algorithm, 99.46% better than particle swarm optimization, 97.09% better than the Dragonfly algorithm, and 96.21% better than JA. The analysis thus confirms the efficiency of the proposed model.
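
The abstract does not spell out how the JA and GWO updates are combined, so the hybrid itself cannot be reproduced from this listing. As a rough illustration of one ingredient only, the sketch below shows the standard Jaya update (move toward the best solution, away from the worst, keep the candidate only if it improves). The `sphere` objective, the bounds, and all parameter values are assumptions made for the example; the paper minimizes a weighted similarity measure instead.

```python
import random

def sphere(x):
    """Toy objective (an assumption; not the paper's weighted similarity)."""
    return sum(v * v for v in x)

def jaya(objective, dim=3, pop_size=10, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Minimize `objective` with the standard Jaya update rule."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iters):
        scores = [objective(p) for p in pop]
        best = pop[scores.index(min(scores))]
        worst = pop[scores.index(max(scores))]
        new_pop = []
        for p, s in zip(pop, scores):
            cand = []
            for j in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Move toward the best solution and away from the worst one.
                v = p[j] + r1 * (best[j] - abs(p[j])) - r2 * (worst[j] - abs(p[j]))
                cand.append(min(hi, max(lo, v)))  # clip to the search bounds
            # Greedy acceptance: keep the candidate only if it is better.
            new_pop.append(cand if objective(cand) < s else p)
        pop = new_pop
    return min(pop, key=objective)

solution = jaya(sphere)
print(solution, sphere(solution))
```

Because of the greedy acceptance step, the best score in the population never gets worse from one iteration to the next.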

Grey wolf optimization algorithm for hierarchical document clustering

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v24.i3.pp1744-1758 ◽

2021 ◽

Vol 24 (3) ◽

pp. 1744

Author(s):

Ayad Mohammed Jabbar ◽

Ku Ruhana Ku-Mahamud

Keyword(s):

Document Clustering ◽

Text Clustering ◽

Search Space ◽

Learning Approaches ◽

Grey Wolf ◽

External Evaluation ◽

Grey Wolf Optimization ◽

Noise Data ◽

F Measure ◽

Better Than

In data mining, the grey wolf optimization (GWO) algorithm has been used in several learning approaches because of its simplicity in adapting to different application domains. Most recent works on unsupervised learning have focused on text clustering, where the GWO algorithm shows promising results. Although GWO has great potential in performing text clustering, it has limitations in dealing with outlier documents and noisy data. This research introduces the medoid GWO (M-GWO) algorithm, which incorporates a medoid recalculation process to share the information of medoids among the three best wolves and the rest of the population. This improvement aims to find the best set of medoids during the algorithm run and increases the exploitation search to find more local regions in the search space. Experiments compare M-GWO with well-known algorithms, such as genetic, firefly, GWO, and k-means algorithms, on four benchmarks. The results of external evaluation metrics, such as Rand, purity, F-measure, and entropy, indicate that the proposed M-GWO algorithm achieves better document clustering than all other algorithms (i.e., 75% better using the Rand metric, 50% better based on the purity metric, 75% better using the F-measure metric, and 100% better based on the entropy metric).
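
For orientation, the sketch below shows the standard GWO position update in which every wolf moves toward an average of positions suggested by the three best wolves (alpha, beta, delta). It is a minimal sketch of plain GWO on a continuous toy problem, not of M-GWO: the medoid recalculation step described in the abstract is not shown, and the `sphere` objective, bounds, and parameter values are assumptions chosen for the example.

```python
import random

def sphere(x):
    """Toy objective (an assumption; not a clustering objective)."""
    return sum(v * v for v in x)

def gwo(objective, dim=3, wolves=12, iters=100, lo=-5.0, hi=5.0, seed=0):
    """Minimize `objective` with the standard grey wolf optimizer."""
    rng = random.Random(seed)
    pack = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(wolves)]
    best = min(pack, key=objective)
    for t in range(iters):
        leaders = sorted(pack, key=objective)[:3]  # alpha, beta, delta
        a = 2.0 * (1 - t / iters)  # decreases linearly from 2 toward 0
        new_pack = []
        for x in pack:
            pos = []
            for j in range(dim):
                estimate = 0.0
                for lead in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    D = abs(C * lead[j] - x[j])
                    estimate += lead[j] - A * D
                # New coordinate is the mean of the three leader-guided moves.
                pos.append(min(hi, max(lo, estimate / 3)))
            new_pack.append(pos)
        pack = new_pack
        cand = min(pack, key=objective)
        if objective(cand) < objective(best):
            best = cand  # remember the best solution seen so far
    return best

solution = gwo(sphere)
print(solution, sphere(solution))
```

A medoid-based variant would presumably restrict positions to actual documents rather than free points in the space, which is what makes it less sensitive to outliers.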

Hybridization of a Social Spider Optimization Algorithm with Differential Evolution for Text Document Clustering Using Single Cluster Approach

Journal of Advanced Research in Dynamical and Control Systems ◽

10.5373/jardcs/v11sp10/20192853 ◽

2019 ◽

Vol 11 (10-SPECIAL ISSUE) ◽

pp. 642-646

Author(s):

Aasheesh Shukla ◽

Vishal Goyal

Keyword(s):

Differential Evolution ◽

Optimization Algorithm ◽

Document Clustering ◽

Cluster Approach ◽

Social Spider ◽

Text Document ◽

Social Spider Optimization ◽

Single Cluster

Data Mining K-Means Document Clustering using TFIDF and Word Frequency Count

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.b1718.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 2542-2549

Keyword(s):

Data Mining ◽

Feature Vector ◽

Rapid Development ◽

Document Clustering ◽

Similarity Measures ◽

Second Step ◽

Text Data ◽

Text Document ◽

Proposed Model ◽

Clustering Approach

With the rapid development of the WWW, the number of documents in use is increasing rapidly, producing gigabytes of text documents to process. For indexing and retrieving the required text documents, efficient algorithms perform better by achieving good accuracy. The algorithms available in the field of data mining also provide a variety of innovations, which has increased the interest of researchers in developing models for text data mining. The proposed model is a two-step text document clustering approach using the K-means algorithm. The first step is preprocessing and the second step is clustering. For preprocessing, the method performs tokenization. The distinct words are identified, and their frequency of occurrence and TFIDF weights are calculated separately to form a document feature vector. In the clustering phase, the feature vectors are clustered with the K-means algorithm using various similarity measures.
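
The two steps described above can be sketched in a few lines. This is a minimal sketch under several assumptions not stated in the abstract: tokens are lowercase alphabetic runs, the weight is raw term count times log(N/df), cosine similarity is the only similarity measure shown, and the first k documents serve as initial centroids so the run is deterministic. The sample documents are invented for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Preprocessing step: lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Build one TF-IDF feature vector per document."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    df = {t: sum(1 for c in counts if t in c) for t in vocab}
    return [[c[t] * math.log(n / df[t]) for t in vocab] for c in counts]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def kmeans(vectors, k, iters=20):
    """Clustering step: K-means that assigns by highest cosine similarity."""
    centroids = [list(v) for v in vectors[:k]]  # assumption: first k docs
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

docs = [
    "grey wolf optimizer hunts",
    "electric bus energy battery",
    "wolf pack optimizer search",
    "bus battery charging energy",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

Swapping `cosine` for another measure (for example, negated Euclidean distance) is the natural place to compare the "various similarity measures" the abstract mentions.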

Hybrid Fruit-Fly Optimization Algorithm with K-Means for Text Document Clustering

Mathematics ◽

10.3390/math9161929 ◽

2021 ◽

Vol 9 (16) ◽

pp. 1929

Author(s):

Timea Bezdan ◽

Catalin Stoean ◽

Ahmed Al Naamany ◽

Nebojsa Bacanin ◽

Tarik A. Rashid ◽

...

Keyword(s):

Optimization Algorithm ◽

Document Clustering ◽

Fruit Fly ◽

Text Clustering ◽

Relevant Information ◽

Fruit Fly Optimization Algorithm ◽

Hybrid Swarm ◽

Text Data ◽

Fruit Fly Optimization ◽

Text Document

The fast-growing Internet results in massive amounts of text data. Due to the large volume of the unstructured format of text data, extracting relevant information and its analysis becomes very challenging. Text document clustering is a text-mining process that partitions the set of text-based documents into mutually exclusive clusters in such a way that documents within the same group are similar to each other, while documents from different clusters differ based on the content. One of the biggest challenges in text clustering is partitioning the collection of text data by measuring the relevance of the content in the documents. Addressing this issue, in this work a hybrid swarm intelligence algorithm with a K-means algorithm is proposed for text clustering. First, the hybrid fruit-fly optimization algorithm is tested on ten unconstrained CEC2019 benchmark functions. Next, the proposed method is evaluated on six standard benchmark text datasets. The experimental evaluation on the unconstrained functions, as well as on text-based documents, indicated that the proposed approach is robust and superior to other state-of-the-art methods.
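
The clustering goal stated above (similar documents inside a cluster, dissimilar documents across clusters) can be checked numerically. The sketch below, which assumes documents are already represented as numeric vectors and that cosine similarity is the measure of interest, reports the mean pairwise similarity within clusters and across clusters; the function name and sample vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def intra_inter_similarity(vectors, labels):
    """Return (mean similarity within clusters, mean similarity across clusters)."""
    intra, inter = [], []
    for i, j in combinations(range(len(vectors)), 2):
        sim = cosine(vectors[i], vectors[j])
        (intra if labels[i] == labels[j] else inter).append(sim)

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(intra), mean(inter)

vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
intra, inter = intra_inter_similarity(vectors, [0, 0, 1, 1])
print(intra, inter)
```

A good partition should show a clearly higher first value than second value.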

Biogeography optimization algorithm based next web page prediction using weblog and web content features

IAES International Journal of Artificial Intelligence (IJ-AI) ◽

10.11591/ijai.v9.i2.pp327-335 ◽

2020 ◽

Vol 9 (2) ◽

pp. 327

Author(s):

Roshan Anant Gangurde ◽

Binod Kumar

Keyword(s):

Optimization Algorithm ◽

User Behavior ◽

Web Content ◽

Web Page ◽

Search Queries ◽

Proposed Model ◽

Fitness Value ◽

Immigration And Emigration ◽

User Actions ◽

Better Than

Recommendation of web pages according to users’ interests is a broad and important area of research. Researchers derive user behavior from actions recorded in cookies, logs and search queries. This paper utilizes a prior webpage fetching model based on web page prediction. For this purpose, web content in the form of text and weblog features are analyzed. To follow dynamic user behavior, the proposed model LWPP-BOA (Logistic Web Page Prediction By Biogeography Optimization Algorithm) predicts the page using a genetic algorithm. Based on user actions, weblog features are developed in the form of association rules, while web content gives a set of relevant text patterns. Page prediction under random user behavior is enhanced by means of the Biogeography Optimization Algorithm, where the crossover operation is performed according to immigration and emigration values. Here, the population update depends on chromosome parameters other than the fitness value. Experiments are conducted on a real dataset containing web content and weblogs. Results are compared using precision, coverage, M-Metric, MAE and RMSE, and they indicate that the proposed work is better than other approaches already in use.

DIC-DOC-K-means: Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means for improving the effectiveness of text document clustering

Journal of Information Science ◽

10.1177/0165551518816302 ◽

2018 ◽

Vol 45 (6) ◽

pp. 818-832 ◽

Cited By ~ 4

Author(s):

R Lakshmi ◽

S Baskar

Keyword(s):

Clustering Algorithm ◽

Clustering Algorithms ◽

Document Clustering ◽

Weighted Average ◽

Minimum Standard ◽

Data Set ◽

Text Document ◽

Selection For ◽

Number Of Classes ◽

Better Than

In this article, a new initial centroid selection for a K-means document clustering algorithm, namely, Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means (DIC-DOC-K-means), to improve the performance of text document clustering is proposed. The first centroid is the document having the minimum standard deviation of its term frequency. Each of the other subsequent centroids is selected based on the dissimilarities of the previously selected centroids. For comparing the performance of the proposed DIC-DOC-K-means algorithm, the results of the K-means, K-means++ and weighted average of terms-based initial centroid selection + K-means (Weight_Avg_Initials + K-means) clustering algorithms are considered. The results show that the proposed DIC-DOC-K-means algorithm performs significantly better than the K-means, K-means++ and Weight_Avg_Initials + K-means clustering algorithms for Reuters-21578 and WebKB with respect to purity, entropy and F-measure for most of the cluster sizes. The cluster sizes used for Reuters-8 are 8, 16, 24 and 32 and those for WebKB are 4, 8, 12 and 16. The results of the proposed DIC-DOC-K-means give a better performance for the number of clusters that are equal to the number of classes in the data set.
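
The selection rule described above can be sketched as follows. The first centroid follows the abstract directly (minimum standard deviation of term frequency). The abstract does not define how dissimilarity to earlier centroids is turned into a choice, so the sketch assumes a farthest-first rule: pick the document whose smallest cosine dissimilarity to the already chosen centroids is largest. The function name and the sample term-frequency vectors are hypothetical.

```python
import math
from statistics import pstdev

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_initial_centroids(tf_vectors, k):
    """Return the indices of k documents to use as initial centroids (k <= len)."""
    # First centroid: document with the minimum standard deviation of
    # its term frequencies, as stated in the abstract.
    first = min(range(len(tf_vectors)), key=lambda i: pstdev(tf_vectors[i]))
    chosen = [first]
    while len(chosen) < k:
        remaining = [i for i in range(len(tf_vectors)) if i not in chosen]
        # Assumed rule: maximize the minimum dissimilarity (1 - cosine)
        # to every centroid selected so far.
        nxt = max(
            remaining,
            key=lambda i: min(1 - cosine(tf_vectors[i], tf_vectors[c])
                              for c in chosen),
        )
        chosen.append(nxt)
    return chosen

tf_vectors = [[5, 0, 0], [1, 1, 1], [0, 3, 3], [2, 1, 0]]
print(select_initial_centroids(tf_vectors, 3))
```

The returned indices would then seed an ordinary K-means run in place of random initial centroids.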

Patch antenna design optimization using opposition based grey wolf optimizer and map-reduce framework

Data Technologies and Applications ◽

10.1108/dta-06-2019-0084 ◽

2020 ◽

Vol 54 (1) ◽

pp. 103-120

Author(s):

Ramakrishna Guttula ◽

Venkateswara Rao Nandanavanam

Keyword(s):

Patch Antenna ◽

Solution Space ◽

Microstrip Patch Antenna ◽

Superior Performance ◽

Grey Wolf Optimizer ◽

Grey Wolf ◽

Content Type ◽

Microstrip Patch ◽

Proposed Model ◽

Better Than

Purpose: The microstrip patch antenna is generally used for several communication purposes, particularly in military and civilian applications. Even though several techniques have made numerous achievements in several fields, some systems require additional improvements to meet a few challenges, and application-specific improvement is still required for optimally designing microstrip patch antennas. The paper aims to discuss these issues. Design/methodology/approach: This paper adopts an advanced meta-heuristic search algorithm called grey wolf optimization (GWO), which is inspired by the hunting behaviour of grey wolves, for the design of patch antenna parameters. The search for the optimal antenna design is sped up using opposition-based solution search. Moreover, the proposed model derives a nonlinear objective model to aid the design of the solution space of antenna parameters. After executing the simulation model, this paper compares the performance of the proposed GWO-based microstrip patch antenna with several conventional models. Findings: The gain of the proposed model is 27.05 per cent better than WOAD, 2.07 per cent better than AAD, 15.80 per cent better than GAD, 17.49 per cent better than PSAD and 3.77 per cent better than the GWAD model. Thus, the proposed antenna model attains high gain, leading to superior performance. Originality/value: This paper presents a technique for designing the microstrip patch antenna using the proposed GWO algorithm. This is the first work that utilizes GWO-based optimization for microstrip patch antenna design.
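
Opposition-based search is commonly implemented by pairing each candidate x with its opposite point, lower bound + upper bound - x, and keeping the fitter of the two. The sketch below applies that idea to population initialization; it assumes this common definition, uses a toy objective, and does not model the paper's nonlinear antenna objective or its parameters. All names and bounds are hypothetical.

```python
import random

def opposite(x, lo, hi):
    """Opposite point of x inside the box [lo, hi], taken per dimension."""
    return [l + h - v for v, l, h in zip(x, lo, hi)]

def opposition_init(objective, lo, hi, pop_size=6, seed=0):
    """Sample a population, add every opposite point, keep the fittest half."""
    rng = random.Random(seed)
    pop = [[rng.uniform(l, h) for l, h in zip(lo, hi)] for _ in range(pop_size)]
    merged = pop + [opposite(p, lo, hi) for p in pop]
    return sorted(merged, key=objective)[:pop_size]

def toy_objective(x):
    """Stand-in for a design objective (an assumption), minimized at (2, 1)."""
    return (x[0] - 2.0) ** 2 + (x[1] - 1.0) ** 2

lo, hi = [0.0, 0.0], [10.0, 5.0]
print(opposite([1.0, 4.0], lo, hi))
print(opposition_init(toy_objective, lo, hi))
```

The same pairing can be applied after each optimizer iteration rather than only at initialization; the abstract does not say which variant the paper uses.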

A hybrid approach for text document clustering using Jaya optimization algorithm

Expert Systems with Applications ◽

10.1016/j.eswa.2021.115040 ◽

2021 ◽

Vol 178 ◽

pp. 115040

Author(s):

Karpagalingam Thirumoorthy ◽

Karuppaiah Muneeswaran

Keyword(s):

Optimization Algorithm ◽

Hybrid Approach ◽

Document Clustering ◽

Text Document

Energy Consumption Estimation of the Electric Bus Based on Grey Wolf Optimization Algorithm and Support Vector Machine Regression

Sustainability ◽

10.3390/su13094689 ◽

2021 ◽

Vol 13 (9) ◽

pp. 4689

Author(s):

Wei Qin ◽

Linhong Wang ◽

Yuhan Liu ◽

Cheng Xu

Keyword(s):

Support Vector Machine ◽

Energy Consumption ◽

Optimization Algorithm ◽

Support Vector ◽

Grey Wolf ◽

Grey Wolf Optimization ◽

Support Vector Machine Regression ◽

Electric Bus ◽

Proposed Model ◽

The Mean

Electric buses have many significant advantages, such as zero emissions and low noise and energy consumption, which give them an important role in reducing the operating costs of bus companies and urban traffic pollution. Therefore, in recent years, many cities around the world have been promoting the electrification of public transport vehicles. However, due to the limitation of on-board battery capacity, the driving range of electric buses is relatively short. Accurate estimation of energy consumption on electric bus routes is the premise for bus scheduling and for optimizing the layout of charging facilities. This study collected the actual operation data of three electric bus routes in Meihekou City, China, and established a support vector machine regression (SVR) model by taking the state of charge (SOC), trip travel time, mean environment temperature and air-conditioning operation time as the independent variables, while the energy consumption of the route operations served as the dependent variable. Furthermore, the grey wolf optimization (GWO) algorithm was adopted to select the optimal parameters of the proposed model. Finally, a support vector machine regression model based on the grey wolf optimization algorithm (GWO-SVR) is proposed. Three real bus lines were taken as examples to validate the model. The results show that the mean average percentage error is 14.47% and the mean average error is 0.7776. In addition, the estimation accuracy and training time of the proposed model are superior to the genetic algorithm-back propagation neural network model and grid-search support vector machine regression model.
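
The two reported error figures are not defined in the abstract. Assuming they correspond to the usual mean absolute percentage error (MAPE) and mean absolute error (MAE), they could be computed as in the sketch below. The sample values are invented for illustration and are not data from the study.

```python
def mean_absolute_error(actual, predicted):
    """MAE: average magnitude of the prediction errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent; assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical energy readings, not data from the study.
actual = [10.0, 20.0]
predicted = [9.0, 22.0]
print(mean_absolute_error(actual, predicted))
print(mean_absolute_percentage_error(actual, predicted))
```

MAPE is scale-free, which makes it convenient for comparing routes with different typical consumption, while MAE stays in the units of the measured quantity.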

Comparision of Different Distance Measure Methods in Text Document Clustering

INTERNATIONAL JOURNAL OF RESEARCH AND ENGINEERING ◽

10.21276/ijre.2018.5.7.2 ◽

2018 ◽

Vol 5 (7) ◽

Author(s):

Yin Min Tun ◽

Keyword(s):

Distance Measure ◽

Document Clustering ◽

Text Document ◽

Measure Methods
