Clustering Mixed Datasets Using K-Prototype Algorithm Based on Crow-Search Optimization

Author(s):  
Lakshmi K. ◽  
Karthikeyani Visalakshi N. ◽  
Shanthi S. ◽  
Parvathavarthini S.

Data mining techniques are useful for discovering interesting knowledge from large collections of data objects. Clustering is one such technique: it is an unsupervised learning method that analyses data objects without knowing their class labels. The k-prototype algorithm is the most widely used partitional clustering algorithm for data objects with mixed numeric and categorical attributes. Because it selects its initial prototypes at random, the algorithm tends to converge to a local optimum. Recently, a number of optimization algorithms have been introduced to obtain the global optimum solution. The Crow Search algorithm is one of the recently developed population-based meta-heuristic optimization algorithms, and it is based on the intelligent behavior of crows. In this paper, the k-prototype clustering algorithm is integrated with the Crow Search optimization algorithm to produce the global optimum solution.
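
As a rough illustration of what a k-prototype objective looks like, the sketch below uses the commonly cited mixed dissimilarity: squared Euclidean distance on the numeric attributes plus a weighted count of categorical mismatches. The abstract does not give the formula, so the function names, the weight `gamma`, and the data layout are assumptions made for this example. A population-based optimizer such as Crow Search could, in principle, use `total_cost` as the fitness of a candidate set of prototypes.

```python
def mixed_dissimilarity(x_num, x_cat, p_num, p_cat, gamma=1.0):
    """Dissimilarity between an object and a prototype with mixed attributes.

    x_num, p_num: numeric attribute values; x_cat, p_cat: categorical values.
    gamma (an assumed parameter name) weights the categorical part.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical_part = sum(1 for a, b in zip(x_cat, p_cat) if a != b)
    return numeric_part + gamma * categorical_part


def total_cost(objects, prototypes, gamma=1.0):
    """Sum of each object's dissimilarity to its nearest prototype.

    objects and prototypes are lists of (numeric_values, categorical_values).
    """
    return sum(
        min(mixed_dissimilarity(o_num, o_cat, p_num, p_cat, gamma)
            for p_num, p_cat in prototypes)
        for o_num, o_cat in objects
    )
```

For example, `mixed_dissimilarity([1.0, 2.0], ["a", "b"], [1.0, 4.0], ["a", "c"], gamma=0.5)` combines a squared numeric difference of 4 with one categorical mismatch weighted by 0.5.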

Author(s):  
Mamta Mittal ◽  
R. K. Sharma ◽  
V.P. Singh ◽  
Lalit Mohan Goyal

Clustering is one of the data mining techniques that investigate data resources for hidden patterns. Many clustering algorithms are available in the literature. This chapter focuses on partitioning-based methods and is an attempt to develop clustering algorithms that can detect clusters efficiently. Among partitioning-based methods, k-means and single pass clustering (SPC) are popular, but they have several limitations. To overcome these limitations, a Modified Single Pass Clustering (MSPC) algorithm is proposed in this work. It revolves around a threshold similarity value. This is not a user-defined parameter; instead, it is a function of the data objects left to be clustered. In our experiments, this threshold similarity value is taken as the median of the pairwise distances of all data objects left to be clustered. To assess the performance of the MSPC algorithm, five experiments with the k-means, SPC and MSPC algorithms have been carried out on artificial and real datasets.
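
The threshold described above can be sketched in a few lines. This is only the threshold computation, not the MSPC algorithm itself; Euclidean distance, the function name, and the fallback of 0.0 for fewer than two objects are assumptions made for this example.

```python
import math
from itertools import combinations
from statistics import median


def median_pairwise_threshold(remaining):
    """Median of the pairwise Euclidean distances among the remaining objects.

    Returns 0.0 when fewer than two objects are left (an assumed convention).
    """
    distances = [math.dist(a, b) for a, b in combinations(remaining, 2)]
    return median(distances) if distances else 0.0
```

Because the threshold is recomputed from whatever objects are still unclustered, it adapts as the clustering proceeds instead of being fixed by the user.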


2017 ◽  
Vol 7 (1.3) ◽  
pp. 37
Author(s):  
Joy Christy A.

Data mining refers to the extraction of meaningful knowledge from large data sources, which may contain hidden facts of potential value. In general, data mining analysis can be either predictive or descriptive. Predictive analysis draws inferences from existing results in order to identify future outputs, while descriptive analysis interprets the intrinsic characteristics or nature of the data. Clustering is a descriptive analysis technique that groups objects of similar types in such a way that objects in a cluster are closer to each other than to the objects of other clusters. K-means is the most popular and widely used clustering algorithm. It starts by selecting k random initial centroids, where k is the number of clusters given by the user. It then computes the distance between the centroids and the remaining data objects and assigns each data object to the cluster whose centroid is at minimum distance. This process is repeated until there is no change in the cluster centroids or cluster members. Still, k-means suffers from several issues, such as choosing the optimum k, random initial centroids, an unknown number of iterations, reaching globally optimal clusters and, more importantly, creating meaningful clusters when analysing datasets from various domains. The accuracy of clustering should never be compromised. Thus, in this paper, a novel classification-via-clustering algorithm called Iterative Linear Regression Clustering with Percentage Split Distribution (ILRCPSD) is introduced as an alternative solution to the problems encountered in traditional clustering algorithms. The proposed algorithm is examined on an educational dataset to identify hidden groups of students having similar cognitive and competency skills.
The performance of the proposed algorithm is compared with the accuracy of traditional k-means clustering in terms of building meaningful clusters, to demonstrate its real-time usefulness.
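
The k-means loop described in the abstract above (random initial centroids, nearest-centroid assignment, repeat until memberships stop changing) can be sketched as follows. This is a minimal illustration of the baseline only, not of ILRCPSD; the centroid update by coordinate-wise mean, the `seed` and `max_iter` parameters, and the choice to leave an empty cluster's centroid unchanged are assumptions made for this example.

```python
import math
import random


def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k random initial centroids
    assignment = None
    for _ in range(max_iter):
        # Assign every point to the index of its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # memberships no longer change
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

The result depends on the random starting centroids, which is the sensitivity to initialization that the abstract lists among the weaknesses of k-means.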


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important part of the diet for meeting nutritional needs, and it is consumed by both children and adults. Indonesia has many producers of fresh milk, but their output is not sufficient to meet national demand. Data mining is a field of computer science that is widely used in research. One of the data mining techniques is clustering, a method that groups data. Clustering works better when a large amount of data is used. The data used are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study groups the provinces into 2 clusters, namely high milk-producing and low milk-producing regions. From 27 data on fresh milk production in Indonesia, two provinces fall into the high-level cluster, namely West Java and East Java. The remaining 25, along with 7 provinces that were not included in the K-Means Clustering Algorithm calculation, are placed in the low-level cluster.


2016 ◽  
pp. 450-475
Author(s):  
Dipti Singh ◽  
Kusum Deep

Because of their wide applicability and easy implementation, genetic algorithms (GAs) are often preferred over other techniques for solving optimization problems. When a local search (LS) is included in a genetic algorithm, the result is known as a memetic algorithm (MA). In this chapter, a new variant of a single-meme memetic algorithm is proposed to improve the efficiency of the GA. Although GAs are efficient at finding the global optimum of nonlinear optimization problems, they usually converge slowly and sometimes converge prematurely. LS algorithms, on the other hand, are fast but are poor global searchers. To exploit the good qualities of both techniques, they are combined so that the maximum benefit of both approaches is reaped. The proposed algorithm lets the population of individuals evolve using the GA and then applies LS to obtain the optimal solution. To validate these claims, it is tested on five benchmark problems of dimension 10, 30 and 50, and a comparison between GA and MA is made.
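
The "evolve with a GA, then refine with a local search" structure can be sketched as below. This is not the chapter's variant: the truncation selection, blend crossover, Gaussian mutation, random-perturbation hill climber, the sphere test function, and all parameter values are assumptions chosen to keep the example short.

```python
import random


def sphere(x):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def genetic_search(f, dim, rng, pop_size=20, generations=50, bound=5.0):
    """Tiny real-coded GA: keep the better half, refill with mutated blends."""
    pop = [[rng.uniform(-bound, bound) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)
        parents = pop[: pop_size // 2]  # truncation selection (also acts as elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()
            # Blend crossover followed by small Gaussian mutation.
            children.append([w * u + (1 - w) * v + rng.gauss(0, 0.1) for u, v in zip(a, b)])
        pop = parents + children
    return min(pop, key=f)


def local_search(f, x, rng, step=0.1, iterations=200):
    """Hill climbing: accept a random nearby point only if it improves f."""
    best = list(x)
    for _ in range(iterations):
        cand = [v + rng.uniform(-step, step) for v in best]
        if f(cand) < f(best):
            best = cand
    return best


def memetic(f, dim, seed=0):
    """Run the GA first, then refine its best individual with local search."""
    rng = random.Random(seed)
    ga_best = genetic_search(f, dim, rng)
    return ga_best, local_search(f, ga_best, rng)
```

Calling `memetic(sphere, 3)` returns both the GA's best individual and its refined version. Since the local search only accepts improvements, the refined point is never worse than the GA result.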


Author(s):  
Bo-Suk Yang

This chapter describes a hybrid artificial life optimization algorithm (ALRT) based on emergent colonization for solving global function optimization problems. In the ALRT, the emergent colony is the fundamental mechanism for searching for the optimum solution; it arises through the metabolism, movement and reproduction of artificial organisms, which appear at the optimum locations in the artificial world. Here, the optimum locations correspond to the optimum solutions of the optimization problem. Hence, the ALRT focuses its search for the optimum solution on the locations of emergent colonies and can achieve a more accurate global optimum. Optimization results for different types of test functions are presented to demonstrate that the described approach successfully achieves optimum performance. The algorithm is also applied to test function optimization and to the optimum design of a short journal bearing as a practical application. The optimized results are compared with those of a genetic algorithm and successive quadratic programming to assess its optimizing ability.


2015 ◽  
Vol 2015 ◽  
pp. 1-16 ◽  
Author(s):  
Lijin Wang ◽  
Yiwen Zhong ◽  
Yilong Yin ◽  
Wenting Zhao ◽  
Binqing Wang ◽  
...  

The backtracking search optimization algorithm (BSA) is a new nature-inspired method that possesses a memory, allowing it to use experience gained from the previous generation to guide the population toward the global optimum. BSA is capable of solving multimodal problems, but it converges slowly and exploits solutions poorly. The differential evolution (DE) algorithm is a robust evolutionary algorithm and converges quickly when it uses exploitive mutation strategies that utilize the information of the best solution found so far. In this paper, we propose a hybrid backtracking search optimization algorithm with differential evolution, called HBD. In HBD, DE with an exploitive strategy is used to accelerate convergence by optimizing one worse individual, selected according to its probability, at each iteration. A suite of 28 benchmark functions is employed to verify the performance of HBD, and the results show that hybridizing BSA and DE improves effectiveness and efficiency.
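
One widely used example of an exploitive DE mutation that draws on the best solution found so far is DE/best/1. The abstract does not say which strategy HBD adopts, so the sketch below is only an illustration of the general idea; the function name and the default `scale` value are assumptions.

```python
def de_best_1(best, r1, r2, scale=0.5):
    """DE/best/1 mutation: perturb the best vector by a scaled difference.

    best: best solution found so far; r1, r2: two other population members
    (normally chosen at random and distinct); scale: the DE scale factor F.
    """
    return [b + scale * (u - v) for b, u, v in zip(best, r1, r2)]
```

Because every mutant is centred on the current best vector, the search concentrates around it, which speeds up convergence at the cost of exploration.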


2017 ◽  
Vol 18 (4) ◽  
pp. 1484-1496 ◽  
Author(s):  
Afshin Mansouri ◽  
Babak Aminnejad ◽  
Hassan Ahmadi

In the current study, a modified version of the penguins search optimization algorithm (PeSOA) was introduced, and its use was assessed in the water resources field. In the modified version (MPeSOA), Gaussian exploration was added to the algorithm. The MPeSOA performance was evaluated in the optimal operation of a hypothetical four-reservoir system and of the Karun-4 reservoir as a real-world problem. A genetic algorithm (GA) was also used as a reference for evaluating the performance of PeSOA and MPeSOA. The results revealed that in the four-reservoir system problem, PeSOA performed much worse than the GA, whereas MPeSOA performed better than the GA. In this problem, PeSOA, GA, and MPeSOA reached 78.43, 97.46, and 98.30% of the global optimum, respectively. In the operation of the Karun-4 reservoir, although the gap between PeSOA and the two other algorithms was smaller than in the four-reservoir problem, its performance was not acceptable. The average values of the objective function in this case were 26.49, 23.84, and 21.48 for PeSOA, GA, and MPeSOA, respectively. According to the results obtained in the operation of the Karun-4 reservoir, MPeSOA, GA, and PeSOA ranked first to third in terms of efficiency, respectively.


Author(s):  
Masao Arakawa ◽  
Tomoyuki Miyashita ◽  
Hiroshi Ishikawa

In some cases of developing a new product, the response surface of an objective function is not single-peaked but multi-peaked. In that case, designers would like to have not only the global optimum solution but also as many local optimum and/or quasi-optimum solutions as possible, so that they can select one of them considering other conditions that were not taken into account prior to optimization. Although this information is quite useful, it is not easy to obtain with a single optimization run. In this study, we propose a screening of the fitness function in genetic algorithms (GA), which changes the fitness function during the search. The GA therefore needs higher flexibility in searching. Genetic Range Genetic Algorithms include a number of search ranges in a single generation, just as there are a number of species in wildlife. They can therefore be arranged to have both global and local search ranges with different fitness functions. In this paper, we demonstrate the effectiveness of the proposed method through simple benchmark test problems.


2015 ◽  
Vol 713-715 ◽  
pp. 1491-1494 ◽  
Author(s):  
Zhi Qiang Gao ◽  
Li Xia Liu ◽  
Wei Wei Kong ◽  
Xiao Hong Wang

A novel composite framework of the Cuckoo Search (CS) and Particle Swarm Optimization (PSO) algorithms, called CS-PSO, is proposed in this paper. In CS-PSO, initialization is replaced by a chaotic system, and the cuckoos then share optima in the global best solution pool with the particles in PSO to improve parallel cooperation and social interaction. Furthermore, the Cloud Model, known for transforming qualitative concepts into a set of quantitative numerical values, is adopted to exploit the surroundings of the local solutions obtained from the global best solution pool. Benchmark test results show that, compared with other algorithms, CS-PSO converges to the global optimum solution rapidly and accurately, especially in high-dimensional problems.


2021 ◽  
Vol 36 (1) ◽  
pp. 35-40
Author(s):  
Shanshan Tu ◽  
Obaid Rehman ◽  
Sadaqat Rehman ◽  
Shafi Khan ◽  
Muhammad Waqas ◽  
...  

The particle swarm optimizer (PSO) is a search-based stochastic technique whose weakness is that it can become trapped in local optima. To trade off local and global search and to avoid premature convergence in PSO, a new dynamic quantum-based particle swarm optimization (DQPSO) method is proposed in this work. In the proposed method, a beta probability distribution is used to mutate the particle with the global best position of the swarm. This helps particles escape from local optima and reach the global optimum solution more easily. In addition, to enhance the global search capability of the method, a dynamic update formula is proposed that keeps a good balance between local and global search. To evaluate the merit and efficiency of the proposed DQPSO method, it has been tested on some well-known mathematical test functions and on a standard benchmark problem known as Loney’s solenoid design.

