MGKA: A genetic algorithm-based clustering technique for genomic data

Abstract: Fuzzy clustering is one way of organizing data that exhibit particular patterns, grouping items algorithmically according to their degree of similarity. In this study, K-means and a genetic algorithm are applied to cluster joint data from the Babakoohi Anticline, located north of Shiraz. K-means is easy to implement and fast; however, because it is sensitive to the initial cluster centers, it can become trapped in a local optimum and fail to return the optimal answer. It also has basic disadvantages, such as being unsuitable for clusters of complicated shape and producing a final result that depends on the initial clusters. Therefore, to perform the study more accurately and obtain more reliable results, a genetic algorithm is used to categorize the joint data of the study area; it is an effective way of escaping local optima. Clustering the data with both techniques yields two clusters in the Babakoohi Anticline. To validate the results, several mathematical and statistical measures, including ICC, Vw, VMPC, and VPMBF, are applied; they support the similarity of the results and of the clustering process across the two algorithms.
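The local-optimum problem described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the four-point data set, the function names (`sse`, `kmeans`, `ga_cluster`), and all parameters (population size, mutation width, seed) are assumptions chosen for illustration. K-means started from poorly placed centers settles on a bad split, while a simple genetic search over candidate center sets, scored by the same sum of squared errors, finds a better one.

```python
import random

# Hypothetical toy data: two tight groups, one on the left and one on the right.
POINTS = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centers)
               for px, py in points)

def kmeans(points, centers, iters=20):
    """Plain K-means iterations starting from the given initial centers."""
    centers = list(centers)
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            d = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centers]
            groups[d.index(min(d))].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

def ga_cluster(points, k=2, pop_size=20, generations=30, seed=0):
    """Genetic search: each chromosome is a list of k candidate centers."""
    rng = random.Random(seed)
    pop = [rng.sample(points, k) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda ch: sse(points, ch))      # lower SSE is fitter
        parents = pop[: pop_size // 2]                # truncation selection
        children = [pop[0]]                           # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]   # uniform crossover
            child = [(x + rng.gauss(0, 0.2), y + rng.gauss(0, 0.2))
                     for x, y in child]                        # Gaussian mutation
            children.append(child)
        pop = children
    return min(pop, key=lambda ch: sse(points, ch))

# K-means from badly placed initial centers splits the data top/bottom and stays there.
stuck = kmeans(POINTS, [(2.0, 0.0), (2.0, 1.0)])
best = ga_cluster(POINTS)
print("K-means SSE from poor start:", sse(POINTS, stuck))
print("GA SSE:", sse(POINTS, best))
```

The genetic search avoids the trap here because it evaluates many candidate center sets at once rather than refining a single starting guess; a practical design would typically combine the two, for example by running a few K-means steps on each chromosome.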

A Clustering Genetic Algorithm for Genomic Data Mining

Source: Studies in Computational Intelligence - Foundations of Computational Intelligence Volume 4
DOI: 10.1007/978-3-642-01088-0_11
Year: 2009
Pages: 249-275
Cited by: ~6
Author(s): José Juan Tapia, Enrique Morett, Edgar E. Vallejo
Keyword(s): Data Mining, Genetic Algorithm, Genomic Data, Genomic Data Mining

A Review on Optimizing Clustering Technique for Data Stream using Genetic Algorithm

Source: International Journal of Computer Sciences and Engineering
DOI: 10.26438/ijcse/v6i9.635637
Year: 2018
Volume: 6 (9)
Pages: 635-637
Cited by: ~1
Author(s): Neha Sharma, Pawan Makhija
Keyword(s): Genetic Algorithm, Data Stream, Clustering Technique

Effective Fuzzy Ontology Based Distributed Document Using Non-Dominated Ranked Genetic Algorithm

Source: Organizational Efficiency through Intelligent Information Technologies
DOI: 10.4018/978-1-4666-2047-6.ch015
Year: 2012
Pages: 243-264
Author(s): M. Thangamani, P. Thangaraj
Keyword(s): Artificial Intelligence, Data Mining, Genetic Algorithm, Document Clustering, Distributed Environment, Data Set, Clustering Technique, Machine Readable, Readable Format, Machine Readable Format

The growing number of documents has made it harder to classify them according to specific needs. Clustering analysis in a distributed environment is an active area of artificial intelligence and data mining. Its fundamental task is to use the characteristics of objects to compute how closely they are related, and to classify them automatically without prior knowledge. Document clustering applies a clustering technique to group highly similar documents together by computing their resemblance. Recent studies have shown that ontologies are useful in improving the performance of document clustering. An ontology conceptualizes a domain in an individually identifiable, machine-readable format containing entities, attributes, relationships, and axioms. After analyzing several types of document clustering techniques, a better clustering technique based on the Genetic Algorithm (GA) is identified. The Non-Dominated Ranked Genetic Algorithm (NRGA) is used in this paper for clustering, as it is capable of providing a better classification result. The experiment is conducted on the 20 Newsgroups data set to evaluate the proposed technique. The results show that the proposed approach is very effective in clustering documents in the distributed environment.
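The idea of grouping documents by computed resemblance can be made concrete with a minimal sketch. The abstract does not say which similarity measure the authors use, so the cosine similarity over raw term counts shown here is an assumption, as are the function names (`tf_vector`, `cosine`) and the three example documents. It illustrates only the resemblance computation, not the ontology or NRGA steps.

```python
import math
from collections import Counter

def tf_vector(text):
    """Term-frequency vector: lowercase word -> count (whitespace tokenization)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical documents, chosen only to show high and zero resemblance.
docs = [
    "genetic algorithm clustering",
    "clustering with genetic algorithm",
    "stock market news",
]
vectors = [tf_vector(d) for d in docs]

# Pairwise resemblance; a clustering method would group pairs that score highly.
for i in range(len(docs)):
    for j in range(i + 1, len(docs)):
        print(i, j, round(cosine(vectors[i], vectors[j]), 3))
```

A clustering algorithm, genetic or otherwise, can then use these pairwise scores (or distances derived from them) as the basis of its fitness function.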