Optimal Text Document Clustering Enabled by Weighed Similarity Oriented Jaya With Grey Wolf Optimization Algorithm

2021 ◽  
Author(s):  
Gugulothu Venkanna ◽  
Dr K F Bharati

Abstract: Owing to scientific development, a variety of challenges are present in the field of information retrieval. These challenges stem from the increasing use of large volumes of data generated by large-scale distributed networks, and centralizing these data for analysis is difficult. Novel text document clustering algorithms are therefore required to overcome these challenges, the two most important of which are clustering accuracy and quality. For this reason, this paper presents a clustering model for text documents that uses term frequency–inverse document frequency (TF-IDF) values as feature sets. The proposed clustering process concentrates on initial centroid selection and clusters the text automatically using a weighted similarity measure. The weighted similarity function involves the inter-cluster and intra-cluster similarity of both ordered and unordered documents, and is used to minimize weighted similarity among the documents. Clustering is driven by a hybrid optimization algorithm that combines the Jaya Algorithm (JA) and the Grey Wolf Optimization (GWO) algorithm, termed JA-based GWO. Finally, the performance of the proposed model is verified through a comparative analysis with state-of-the-art models. For the similarity index, the proposed model is 96.56% better than the genetic algorithm, 99.46% better than particle swarm optimization, 97.09% better than the Dragonfly algorithm, and 96.21% better than JA. These results confirm the efficiency of the proposed model.
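The abstract gives neither the JA-based GWO update equations nor the exact weighted similarity function, so the following is only a minimal sketch of two standard building blocks it names: TF-IDF feature vectors and the update rule of the original Jaya algorithm. The function names `tfidf` and `jaya_step`, the raw-count TF with idf = log(N / df), and the greedy replacement rule are assumptions of this sketch, not the paper's method.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """TF-IDF vectors (dicts term -> weight) for a list of tokenized documents.

    ASSUMPTION: raw term counts for TF and idf = log(N / df); other variants exist.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]


def jaya_step(population, fitness, rng=random):
    """One iteration of the original Jaya update for minimization.

    Per dimension: x' = x + r1 * (best - |x|) - r2 * (worst - |x|), with fresh
    uniform r1, r2 in [0, 1). A candidate replaces x only if it is not worse.
    """
    scores = [fitness(x) for x in population]
    best = population[scores.index(min(scores))]
    worst = population[scores.index(max(scores))]
    new_population = []
    for x, score in zip(population, scores):
        candidate = [
            xi + rng.random() * (b - abs(xi)) - rng.random() * (w - abs(xi))
            for xi, b, w in zip(x, best, worst)
        ]
        new_population.append(candidate if fitness(candidate) <= score else x)
    return new_population
```

Because of the greedy replacement, the best fitness in the population can never get worse from one step to the next; in a clustering setting each solution would encode candidate centroids and `fitness` would be a clustering objective such as the paper's weighted similarity.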

Author(s):  
Ayad Mohammed Jabbar ◽  
Ku Ruhana Ku-Mahamud

In data mining, the grey wolf optimization (GWO) algorithm has been applied in several learning approaches because it adapts easily to different application domains. Most recent work on unsupervised learning has focused on text clustering, where the GWO algorithm shows promising results. Although GWO has great potential for text clustering, it has limitations in dealing with outlier documents and noisy data. This research introduces the medoid GWO (M-GWO) algorithm, which incorporates a medoid recalculation process to share medoid information between the three best wolves and the rest of the population. This improvement aims to find the best set of medoids during the algorithm run and strengthens exploitation so that more local regions of the search space are explored. M-GWO was compared experimentally with well-known algorithms, including the genetic, firefly, GWO, and k-means algorithms, on four benchmarks. The results of external evaluation metrics, such as Rand, purity, F-measure, and entropy, indicate that the proposed M-GWO algorithm achieves better document clustering than all other algorithms (i.e., 75% better on the Rand metric, 50% better than all algorithms on the purity metric, 75% better than all algorithms on the F-measure metric, and 100% better on the entropy metric).
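The abstract does not specify how M-GWO recalculates and shares medoids, so the sketch below shows only the standard continuous GWO position update that medoid-based variants build on. The names `gwo_step` and `gwo`, the search bounds, and the linear schedule for `a` (2 down to 0, as in the usual formulation) are assumptions for illustration; a clustering version would encode candidate medoids in each wolf and use a clustering cost as `fitness`.

```python
import random


def gwo_step(wolves, fitness, a, rng=random):
    """One position update of the standard grey wolf optimizer (minimization).

    The three best wolves act as alpha, beta and delta; every wolf moves to the
    mean of the three positions they suggest. Requires at least three wolves.
    """
    leaders = sorted(wolves, key=fitness)[:3]  # alpha, beta, delta
    new_wolves = []
    for x in wolves:
        new_x = []
        for j, xj in enumerate(x):
            suggested = []
            for leader in leaders:
                A = 2 * a * rng.random() - a  # |A| > 1 favours exploration
                C = 2 * rng.random()
                D = abs(C * leader[j] - xj)  # distance to the leader
                suggested.append(leader[j] - A * D)
            new_x.append(sum(suggested) / 3)
        new_wolves.append(new_x)
    return new_wolves


def gwo(fitness, dim, n_wolves=10, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Run GWO with `a` decreasing linearly from 2 to 0; return the best position seen."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_wolves)]
    best = min(wolves, key=fitness)
    for t in range(iters):
        a = 2 - 2 * t / iters
        wolves = gwo_step(wolves, fitness, a, rng)
        wolves = [[min(max(v, lb), ub) for v in w] for w in wolves]  # keep within bounds
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate
    return best
```

For example, `gwo(lambda x: sum(v * v for v in x), dim=2)` searches for the minimum of a toy sphere function; the test function is a stand-in, not a text clustering objective.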


2019 ◽  
Vol 8 (2) ◽  
pp. 2542-2549

With the rapid development of the World Wide Web (WWW), the number of documents is increasing rapidly, producing gigabytes of text documents to process. Efficient algorithms that achieve good accuracy are needed for indexing and retrieving the required text documents. Algorithms in the field of data mining also offer a variety of new innovations, which has increased researchers' interest in developing essential models for text data mining. The proposed model is a two-step text document clustering approach based on the K-means algorithm. The first step is pre-processing and the second step is clustering. For pre-processing, the method performs tokenization. The distinct words are identified, and their frequencies of occurrence and TF-IDF weights are calculated to form a separate feature vector for each document. In the clustering phase, the feature vectors are clustered with the K-means algorithm using various similarity measures.
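The clustering phase described above can be sketched as K-means with a pluggable similarity function. This is a minimal illustration, assuming dense feature vectors (for example, TF-IDF weights over a shared vocabulary) and caller-supplied initial centroids; `kmeans`, `cosine` and `neg_euclidean` are hypothetical names, and the paper's actual choice of similarity measures is not stated.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def neg_euclidean(u, v):
    """Euclidean distance turned into a similarity (larger means closer)."""
    return -math.dist(u, v)


def kmeans(vectors, init_centroids, similarity=cosine, max_iter=100):
    """Assign each vector to its most similar centroid, then recompute centroids as means."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            max(range(len(centroids)), key=lambda k: similarity(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:  # keep the old centroid if a cluster empties
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Swapping `similarity=neg_euclidean` for the default `cosine` is one way to compare measures on the same feature vectors.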


Mathematics ◽  
2021 ◽  
Vol 9 (16) ◽  
pp. 1929
Author(s):  
Timea Bezdan ◽  
Catalin Stoean ◽  
Ahmed Al Naamany ◽  
Nebojsa Bacanin ◽  
Tarik A. Rashid ◽  
...  

The fast-growing Internet produces massive amounts of text data. Because of the large volume and unstructured format of text data, extracting and analyzing relevant information is very challenging. Text document clustering is a text-mining process that partitions a set of text-based documents into mutually exclusive clusters so that documents within the same group are similar to each other, while documents from different clusters differ in content. One of the biggest challenges in text clustering is partitioning a collection of text data by measuring the relevance of the content in the documents. To address this issue, this work proposes a hybrid swarm intelligence algorithm combined with the K-means algorithm for text clustering. First, the hybrid fruit-fly optimization algorithm is tested on ten unconstrained CEC2019 benchmark functions. Next, the proposed method is evaluated on six standard benchmark text datasets. The experimental evaluation on both the unconstrained functions and the text documents indicates that the proposed approach is robust and superior to other state-of-the-art methods.
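The abstract does not describe how the fruit-fly algorithm is hybridized, so the sketch below shows only a simplified, multi-dimensional fruit fly optimization (FOA) loop of the kind such hybrids start from: a smell phase of random search around the swarm location, then a vision phase that moves the swarm to the best fly. The name `fruit_fly_optimize`, the linearly shrinking search radius, and the box bounds are assumptions of this sketch.

```python
import random


def fruit_fly_optimize(fitness, dim, n_flies=20, iters=200, radius=1.0,
                       lb=-5.0, ub=5.0, seed=0):
    """Minimize `fitness` with a simplified multi-dimensional fruit fly search."""
    rng = random.Random(seed)
    swarm = [rng.uniform(lb, ub) for _ in range(dim)]  # swarm location
    swarm_fit = fitness(swarm)
    for t in range(iters):
        r = radius * (1 - t / iters)  # ASSUMPTION: linearly shrinking search radius
        # Smell (osphresis) phase: each fly searches randomly around the swarm.
        flies = [[min(max(s + r * rng.uniform(-1, 1), lb), ub) for s in swarm]
                 for _ in range(n_flies)]
        best = min(flies, key=fitness)
        best_fit = fitness(best)
        # Vision phase: the swarm flies to the best fly only if it is better.
        if best_fit < swarm_fit:
            swarm, swarm_fit = best, best_fit
    return swarm, swarm_fit
```

In a K-means hybrid, the position vector would typically hold flattened cluster centroids and `fitness` would be a clustering cost; those details are not given in the abstract.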


Author(s):  
Roshan Anant Gangurde ◽  
Binod Kumar

Recommendation of web pages according to users' interests is a broad and important area of research. Researchers model user behavior from actions recorded in cookies, logs and search queries. This paper uses a model that fetches web pages in advance based on web page prediction. For this purpose, web content in the form of text and weblog features are analyzed. To follow dynamic user behavior, the proposed model, LWPP-BOA (Logistic Web Page Prediction by Biogeography Optimization Algorithm), predicts pages using a genetic algorithm. Based on user actions, weblog features are developed in the form of association rules, while web content yields a set of relevant text patterns. Page prediction under random user behavior is enhanced by the Biogeography Optimization Algorithm, in which the crossover operation is performed according to immigration and emigration values. Here, population updating depends on chromosome parameters other than the fitness value. Experiments are conducted on a real dataset containing web content and weblogs. Results are compared using precision, coverage, M-Metric, MAE and RMSE, and indicate that the proposed work performs better than existing approaches.
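The crossover "according to immigration and emigration values" corresponds to the migration step of biogeography-based optimization. The sketch below shows the classic linear migration model, in which rates follow fitness rank; note that the paper states its population update depends on chromosome parameters other than fitness, so this standard version is an assumption for illustration. `migration_rates` and `migrate` are hypothetical names.

```python
import random


def migration_rates(n, I=1.0, E=1.0):
    """Linear BBO migration model for n habitats sorted best-first.

    The best habitat gets the highest emigration rate (mu) and the lowest
    immigration rate (lam); I and E are the maximum rates.
    """
    lam, mu = [], []
    for i in range(n):
        k = n - i  # species count: n for the best habitat, 1 for the worst
        lam.append(I * (1 - k / n))
        mu.append(E * k / n)
    return lam, mu


def migrate(population, fitness, rng=random):
    """Return a new population (sorted best-first) after one migration step."""
    pop = sorted(population, key=fitness)  # minimization: lowest cost first
    lam, mu = migration_rates(len(pop))
    new_pop = []
    for i, habitat in enumerate(pop):
        child = list(habitat)
        for j in range(len(child)):
            if rng.random() < lam[i]:
                # Roulette-wheel choice of the emigrating habitat, weighted by mu;
                # picking the habitat itself leaves the feature unchanged.
                src = rng.choices(range(len(pop)), weights=mu)[0]
                child[j] = pop[src][j]
        new_pop.append(child)
    return new_pop
```

Since the best habitat has an immigration rate of zero, it always survives migration unchanged, while poor habitats mostly import features from good ones.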


2018 ◽  
Vol 45 (6) ◽  
pp. 818-832 ◽  
Author(s):  
R Lakshmi ◽  
S Baskar

In this article, a new initial centroid selection method for the K-means document clustering algorithm, namely Dissimilarity-based Initial Centroid selection for DOCument clustering using K-means (DIC-DOC-K-means), is proposed to improve the performance of text document clustering. The first centroid is the document with the minimum standard deviation of its term frequencies. Each subsequent centroid is selected based on its dissimilarity to the previously selected centroids. To evaluate the proposed DIC-DOC-K-means algorithm, its results are compared with those of the K-means, K-means++ and weighted average of terms-based initial centroid selection + K-means (Weight_Avg_Initials + K-means) clustering algorithms. The results show that the proposed DIC-DOC-K-means algorithm performs significantly better than the K-means, K-means++ and Weight_Avg_Initials + K-means clustering algorithms on Reuters-21578 and WebKB with respect to purity, entropy and F-measure for most cluster sizes. The cluster sizes used for Reuters-8 are 8, 16, 24 and 32, and those for WebKB are 4, 8, 12 and 16. The proposed DIC-DOC-K-means performs better when the number of clusters equals the number of classes in the data set.
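The two selection rules above can be sketched as follows. The first rule (minimum standard deviation of term frequencies) follows the abstract; the rule for later centroids is an assumption, since the abstract only says they depend on dissimilarities to the centroids already chosen. Here the next centroid is taken to be the unselected document with the largest summed cosine dissimilarity (1 - cosine) to them. `select_initial_centroids` is a hypothetical name, and the standard deviation is computed over each document's full term-frequency vector, zeros included.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def select_initial_centroids(tf_vectors, k):
    """Return indices of k documents to use as initial K-means centroids."""
    # First centroid: document whose term-frequency vector has the minimum standard deviation.
    first = min(range(len(tf_vectors)), key=lambda i: statistics.pstdev(tf_vectors[i]))
    chosen = [first]
    while len(chosen) < k:
        remaining = [i for i in range(len(tf_vectors)) if i not in chosen]
        # ASSUMPTION: maximize summed dissimilarity to the centroids chosen so far.
        nxt = max(remaining,
                  key=lambda i: sum(1 - cosine(tf_vectors[i], tf_vectors[c]) for c in chosen))
        chosen.append(nxt)
    return chosen
```

For instance, with term-frequency vectors `[2, 2, 2]`, `[3, 0, 0]`, `[0, 0, 3]` and `[2, 2, 1]`, asking for three centroids selects documents 0, 1 and 2: the flat vector first, then the two documents least similar to what has been chosen.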


2020 ◽  
Vol 54 (1) ◽  
pp. 103-120
Author(s):  
Ramakrishna Guttula ◽  
Venkateswara Rao Nandanavanam

Purpose Microstrip patch antennas are generally used for several communication purposes, particularly in military and civilian applications. Although various techniques have achieved numerous successes in several fields, some systems still require further improvement to meet certain challenges; in particular, optimally designing a microstrip patch antenna requires application-specific improvement. The paper aims to discuss these issues. Design/methodology/approach This paper adopts an advanced meta-heuristic search algorithm called grey wolf optimization (GWO), which is inspired by the hunting behaviour of grey wolves, to design the patch antenna parameters. The search for the optimal antenna design is accelerated using opposition-based solution search. Moreover, the proposed model derives a nonlinear objective model to aid the design of the solution space of antenna parameters. After running the simulation model, this paper compares the performance of the proposed GWO-based microstrip patch antenna with several conventional models. Findings The gain of the proposed model is 27.05 per cent better than WOAD, 2.07 per cent better than AAD, 15.80 per cent better than GAD, 17.49 per cent better than PSAD and 3.77 per cent better than the GWAD model. Thus, the proposed antenna model attains high gain, leading to superior performance. Originality/value This paper presents a technique for designing the microstrip patch antenna using the proposed GWO algorithm. This is the first work that utilizes GWO-based optimization for microstrip patch antennas.
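The abstract does not detail its opposition-based solution search or the nonlinear antenna objective, so the following is a minimal sketch of standard opposition-based learning applied at initialization: for a point x in the box [lb, ub], its opposite is lb + ub - x, and the fitter half of the random points and their opposites is kept. `opposite`, `opposition_based_init` and the toy objective are hypothetical; the returned population could seed any GWO implementation.

```python
import random


def opposite(x, lb, ub):
    """Opposite point of x inside the box [lb, ub]: x_opp = lb + ub - x."""
    return [lo + hi - xi for xi, lo, hi in zip(x, lb, ub)]


def opposition_based_init(fitness, n, lb, ub, rng=random):
    """Draw n random solutions, add their opposites, keep the n fittest (minimization)."""
    population = [[rng.uniform(lo, hi) for lo, hi in zip(lb, ub)] for _ in range(n)]
    pool = population + [opposite(x, lb, ub) for x in population]
    return sorted(pool, key=fitness)[:n]
```

Evaluating both a point and its opposite costs twice the fitness evaluations at initialization, in exchange for a starting population that is more likely to lie near the optimum.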


2021 ◽  
Vol 13 (9) ◽  
pp. 4689
Author(s):  
Wei Qin ◽  
Linhong Wang ◽  
Yuhan Liu ◽  
Cheng Xu

Electric buses have many significant advantages, such as zero emissions, low noise and low energy consumption, which give them an important role in reducing bus companies' operating costs and urban traffic pollution emissions. Therefore, in recent years, many cities around the world have been promoting the electrification of public transport vehicles. However, because of limited on-board battery capacity, the driving range of electric buses is relatively short. Accurate estimation of energy consumption on electric bus routes is a prerequisite for bus scheduling and for optimizing the layout of charging facilities. This study collected actual operation data from three electric bus routes in Meihekou City, China, and established a support vector machine regression (SVR) model that takes the state of charge (SOC), trip travel time, mean environment temperature and air-conditioning operation time as independent variables, with the energy consumption of route operations as the dependent variable. The grey wolf optimization (GWO) algorithm was then adopted to select the optimal parameters of the model, yielding a support vector machine regression model based on the grey wolf optimization algorithm (GWO-SVR). Three real bus lines were taken as examples to validate the model. The results show that the mean absolute percentage error is 14.47% and the mean absolute error is 0.7776. In addition, the estimation accuracy and training time of the proposed model are superior to those of the genetic algorithm-back propagation neural network model and the grid-search support vector machine regression model.
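Under the standard reading, the two error figures reported above are the mean absolute percentage error (MAPE) and the mean absolute error (MAE). A minimal sketch of both metrics follows; the function names are hypothetical, and no data from the study is used.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1/n) * sum(|y - y_hat|), in the units of the target."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent = (100/n) * sum(|(y - y_hat) / y|); actual values must be nonzero."""
    if any(y == 0 for y in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100 * sum(abs((y - p) / y) for y, p in zip(actual, predicted)) / len(actual)
```

For example, hypothetical actual values 10 and 20 with predictions 12 and 18 give an MAE of 2.0 and a MAPE of 15%. MAE is scale-dependent while MAPE is relative, which is why the two are often reported together.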

