An Improved Genetic Algorithm for Document Clustering on the Cloud

Author(s):  
Ruksana Akter ◽  
Yoojin Chung

This article presents a modified genetic algorithm for text document clustering on the cloud. Traditional genetic-algorithm approaches to document clustering represent chromosomes based on cluster centroids and do not divide a cluster centroid during crossover operations. This limits the variation the algorithm can introduce into the population, so it tends to become trapped in local minima. In the proposed approach, a crossover point may be selected even at a position inside a cluster centroid, which allows some cluster centroids to be modified. This helps the algorithm escape local minima and find better solutions than the traditional approaches. Moreover, instead of running a single genetic algorithm as the traditional approaches do, the article partitions the population and runs a genetic algorithm on each partition. This makes it possible to run different parts of the algorithm simultaneously on different virtual machines in a cloud environment. Experimental results show that the accuracy of the proposed approach is at least 4% higher than that of the other approaches.
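
The difference between the two crossover styles can be illustrated with a minimal sketch. It assumes a chromosome is a flat list of k centroids with d genes each, which the abstract does not specify. The function names, the single-point crossover, and the round-robin partitioning are likewise hypothetical illustrations, not the authors' implementation.

```python
import random
from typing import List, Tuple

# Assumption: a chromosome is k centroids of dimension d, flattened to k*d genes.
Chromosome = List[float]


def one_point_crossover(a: Chromosome, b: Chromosome,
                        cut: int) -> Tuple[Chromosome, Chromosome]:
    """Swap the tails of two equal-length parents at gene index `cut`."""
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def boundary_cut(k: int, d: int, rng: random.Random) -> int:
    """Traditional style: the cut falls only between centroids (needs k >= 2)."""
    return rng.randrange(1, k) * d


def gene_level_cut(k: int, d: int, rng: random.Random) -> int:
    """Modified style: the cut may fall anywhere, including inside a centroid."""
    return rng.randrange(1, k * d)


def partition_population(population: list, parts: int) -> List[list]:
    """Hypothetical round-robin split so each part can evolve separately."""
    return [population[i::parts] for i in range(parts)]


if __name__ == "__main__":
    rng = random.Random(0)
    k, d = 2, 3
    parent_a = [0.0] * (k * d)
    parent_b = [1.0] * (k * d)
    # A cut at index 4 lies inside the second centroid (genes 3..5),
    # so that centroid in each child mixes genes from both parents.
    print(one_point_crossover(parent_a, parent_b, 4))
    print(boundary_cut(k, d, rng), gene_level_cut(k, d, rng))
    print(partition_population(list(range(5)), 2))
```

With a boundary cut, every centroid in a child is copied whole from one parent. With a gene-level cut, the centroid containing the cut becomes a new combination, which is the extra variation the abstract describes.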

2018 ◽  
Vol 8 (4) ◽  
pp. 20-28


2014 ◽  
Vol 2014 ◽  
pp. 1-11
Author(s):  
Hedong Xu ◽  
Jing Zheng ◽  
Ziwei Zhuang ◽  
Suohai Fan

The reconstruction of destroyed paper documents has attracted growing interest in recent years. The topic is relevant to forensics, investigative sciences, and archeology. Previous research on the reconstruction of cross-cut shredded text documents (RCCSTD) has mainly relied on likelihood-based methods and traditional heuristic algorithms. This paper presents a feature-matching algorithm based on character recognition, supported by a database of letters, that reconstructs the shredded document by row clustering, intrarow splicing, and interrow splicing. Row clustering is performed by a clustering algorithm on the clustering vectors of the fragments. Intrarow splicing is treated as a travelling salesman problem and solved with an improved genetic algorithm. Finally, the document is reconstructed by interrow splicing according to the line spacing and the proximity of the fragments. Computational experiments suggest that the presented algorithm achieves high precision and efficiency, and that it may be useful for cross-cut shredded text documents of different sizes.


2019 ◽  
Vol 32 (6) ◽  
pp. 1531-1541
Author(s):  
Zhou Zhou ◽  
Fangmin Li ◽  
Huaxi Zhu ◽  
Houliang Xie ◽  
Jemal H. Abawajy ◽  
...  

Author(s):  
Sandeep Kumar Bothra ◽  
Sunita Singhal ◽  
Hemlata Goyal

Resource scheduling in a cloud computing environment is important for executing scientific workflows cost-effectively under a deadline constraint. Although researchers have proposed various heuristic and meta-heuristic approaches to this problem, none meets strict deadline conditions while keeping the load balanced among machines. This article proposes an improved genetic algorithm that initializes the population with a greedy strategy. The greedy strategy assigns each task to an underloaded virtual machine instead of assigning tasks to machines at random. In general workflow scheduling, task dependency is tested after each crossover and mutation operation of the genetic algorithm; here the authors test it only after the mutation operation, which yields better results. The proposed model also considers the booting time and performance variation of virtual machines. The authors compared the algorithm with previously developed heuristics and metaheuristics and found that it increases the hit rate and load balance. It also reduces execution time and cost.
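
The greedy initialization can be sketched roughly as follows. The abstract says only that tasks go to an underloaded virtual machine, so the rule used here (always pick the machine with the smallest accumulated load) is an assumption. The function name `greedy_assign` and the use of task length as the load measure are also hypothetical. The sketch ignores task dependencies, booting time, and performance variation.

```python
import heapq
from typing import List


def greedy_assign(task_lengths: List[float], num_vms: int) -> List[int]:
    """Return, for each task in order, the index of the VM it is assigned to.

    Assumption: "underloaded" means the VM with the smallest total load so far.
    """
    # Min-heap of (current load, vm index); ties resolve to the lower index.
    heap = [(0.0, vm) for vm in range(num_vms)]
    heapq.heapify(heap)
    assignment = []
    for length in task_lengths:
        load, vm = heapq.heappop(heap)       # least-loaded VM right now
        assignment.append(vm)
        heapq.heappush(heap, (load + length, vm))
    return assignment


if __name__ == "__main__":
    # Four tasks on two VMs; both VMs end with a total load of 5.
    print(greedy_assign([4, 2, 3, 1], 2))  # [0, 1, 1, 0]
```

An individual built this way could seed the initial population in place of a purely random assignment. How the authors combine greedy and random individuals is not stated in the abstract.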

