A Novel Chaotic Northern Bald Ibis Optimization Algorithm for Solving Different Cluster Problems [ICCICC18 #155]

Author(s):  
Ravi Kumar Saidala ◽  
Nagaraju Devarakonda

This article proposes a new optimal data clustering method that finds optimal clusters by incorporating chaotic maps into the standard NOA. NOA, a recently developed optimization technique, has been shown to generate optimal results at low solution cost. Incorporating chaotic maps into metaheuristics helps an algorithm balance the two phases of search, exploration and exploitation, and diversify the solution space. To make NOA more efficient and to avoid premature convergence, chaotic maps are incorporated in this work, yielding the chaotic NOAs (CNOAs). Ten different chaotic maps are incorporated individually into the standard NOA, and the optimization performance of each variant is tested. CNOA is first benchmarked on 23 standard functions. Secondly, the numerical performance of the new CNOA-based clustering method is tested by solving 10 UCI data clustering problems and 4 web document clustering problems. Comparisons are made using statistical and graphical results. The superiority of the proposed optimal clustering algorithm is evident from the simulations and comparisons.
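
As a rough, hedged illustration of the idea (not the authors' CNOA code, whose update equations are not reproduced in this abstract), the Python sketch below replaces the usual uniform random coefficient of a generic population-based search with a logistic chaotic map; the move rule itself is a placeholder.

```python
import numpy as np

def logistic_map(x):
    """One step of the logistic chaotic map on (0, 1)."""
    return 4.0 * x * (1.0 - x)

def chaotic_search(objective, dim, pop_size=30, iters=200, seed=1):
    """Generic population-based search where the usual uniform random
    coefficient is replaced by a chaotic sequence (illustrative only)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fitness = np.apply_along_axis(objective, 1, pop)
    best = pop[fitness.argmin()].copy()
    c = 0.7  # chaotic state, started away from the map's fixed points
    for _ in range(iters):
        c = logistic_map(c)                           # chaotic coefficient in (0, 1)
        pop += c * (best - pop)                       # placeholder move toward the best
        pop += 0.01 * rng.standard_normal(pop.shape)  # small random perturbation
        fitness = np.apply_along_axis(objective, 1, pop)
        if fitness.min() < objective(best):
            best = pop[fitness.argmin()].copy()
    return best

# Example: minimize the sphere function (F1 in the usual benchmark suite).
print(chaotic_search(lambda x: float(np.sum(x ** 2)), dim=5))
```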

Author(s):  
Ravi Kumar Saidala

Clustering, one of the most attractive data analysis concepts in data mining, is frequently used by researchers to analyse data from a variety of real-world applications. It is stated in the literature that traditional clustering methods become trapped in local optima and fail to obtain optimal clusters. This work presents the design and development of an advanced optimal clustering method for unmasking abnormal entries in a clinical dataset. The basis is NOA, a recently proposed algorithm that mimics the migration pattern of Northern Bald Ibis (Threskiornithidae) birds. First, we develop a variant of the standard NOA by replacing its C1 and C2 parameters with chaotic maps, yielding the VNOA. We then use the VNOA to design a new, advanced clustering method. The VNOA is first benchmarked on 7 unimodal (F1–F7) and 6 multimodal (F8–F13) mathematical functions. We then test the proposed VNOA-based clustering method on a clinical dataset and compare the resulting graphical and statistical results with well-known algorithms. The superiority of the presented clustering method is evident from the simulations and comparisons.
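
To make the optimizer-driven clustering idea concrete, the hedged sketch below shows the objective such a method minimizes: cluster centroids are encoded as one flat vector and scored by the within-cluster sum of squared distances. A plain random search stands in for the VNOA, whose update equations are not given in the abstract.

```python
import numpy as np

def clustering_cost(centroids_flat, data, k):
    """Within-cluster sum of squared distances: the objective a metaheuristic
    such as VNOA would minimize when centroids are encoded as one flat vector."""
    centroids = centroids_flat.reshape(k, data.shape[1])
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.sum(d.min(axis=1) ** 2))

def optimize_clusters(data, k, iters=500, seed=0):
    """Plain random-search stand-in for the metaheuristic: perturb the
    centroid vector and keep improvements (illustrative only)."""
    rng = np.random.default_rng(seed)
    best = data[rng.choice(len(data), size=k, replace=False)].ravel().astype(float)
    best_cost = clustering_cost(best, data, k)
    for _ in range(iters):
        cand = best + 0.1 * rng.standard_normal(best.shape)
        cost = clustering_cost(cand, data, k)
        if cost < best_cost:
            best, best_cost = cand, cost
    centroids = best.reshape(k, -1)
    labels = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return centroids, labels

rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = optimize_clusters(data, k=2)
print(np.round(centroids, 2))
```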


Author(s):  
Zhang Xiaodan ◽  
Hu Xiaohua ◽  
Xia Jiali ◽  
Zhou Xiaohua ◽  
Achananuparp Palakorn

In this article, we present a graph-based knowledge representation for biomedical digital library literature clustering. An efficient clustering method is developed to identify the ontology-enriched k-highest density term subgraphs that capture the core semantic relationship information about each document cluster. The distance between each document and the k term graph clusters is calculated. A document is then assigned to the closest term cluster. The extensive experimental results on two PubMed document sets (Disease10 and OHSUMED23) show that our approach is comparable to spherical k-means. The contributions of our approach are the following: (1) we provide two corpus-level graph representations to improve document clustering, a term co-occurrence graph and an abstract-title graph; (2) we develop an efficient and effective document clustering algorithm by identifying k distinguishable class-specific core term subgraphs using terms’ global and local importance information; and (3) the identified term clusters give a meaningful explanation for the document clustering results.
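
The following toy Python sketch illustrates the general pipeline under stated assumptions (simple co-occurrence counts, a greedy stand-in for the k highest-density term subgraphs, and overlap-based document assignment); it is not the ontology-enriched method of the paper.

```python
from collections import Counter
from itertools import combinations

docs = [
    "gene expression cancer cell pathway",
    "cancer cell tumor growth pathway",
    "heart disease blood pressure risk",
    "blood pressure risk hypertension patient",
]
tokenized = [d.split() for d in docs]

# 1) Term co-occurrence graph: edge weight = number of docs containing both terms.
cooc = Counter()
for toks in tokenized:
    for a, b in combinations(sorted(set(toks)), 2):
        cooc[(a, b)] += 1

# 2) Greedy stand-in for "k highest-density term subgraphs": seed each term
#    cluster with a heavy edge, then attach strongly co-occurring terms.
k = 2
term_clusters, used = [], set()
for (a, b), _ in cooc.most_common():
    if len(term_clusters) == k:
        break
    if a in used or b in used:
        continue
    cluster = {a, b}
    for (x, y), w in cooc.items():
        if w >= 2 and ({x, y} & cluster) and not ({x, y} & used):
            cluster |= {x, y}
    term_clusters.append(cluster)
    used |= cluster

# 3) Assign each document to the term cluster with the largest term overlap.
for toks, doc in zip(tokenized, docs):
    overlaps = [len(set(toks) & c) for c in term_clusters]
    print(doc, "->", overlaps.index(max(overlaps)))
```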


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important dietary intake for meeting nutritional needs, consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient to meet national demand. Data mining is a branch of computer science that is widely used in research, and one of its techniques is clustering. Clustering is a method of grouping data, and it performs better when applied to large amounts of data. The data used here are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The result of this study is a partition into two milk-producing groups: high-production and low-production regions. From the 27 fresh-milk production records for Indonesia, two provinces fall into the high-production cluster, namely West Java and East Java, while the remaining 25, together with 7 provinces that were not included in the K-Means clustering calculation, belong to the low-production cluster.
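
A minimal sketch of the K-Means step described above, assuming hypothetical production figures (the real study uses Central Statistics Agency data for 2000-2017, which is not included in this abstract):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical yearly fresh-milk production per province (tonnes); placeholder
# values only, not the study's data.
provinces = ["West Java", "East Java", "Central Java", "North Sumatra", "Bali"]
production = np.array([[280_000.0], [260_000.0], [40_000.0], [12_000.0], [5_000.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(production)
high = km.cluster_centers_.argmax()   # the cluster with the larger mean production
for name, label in zip(provinces, km.labels_):
    print(name, "high" if label == high else "low")
```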


Author(s):  
Ana Belén Ramos-Guajardo

A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals, which can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test, and a p-value matrix is finally obtained. The suggested clustering algorithm operates on this matrix, where each p-value can also be viewed as a degree of similarity between the corresponding random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed, and the approach is applied to two real-life situations.
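
As a hedged, simplified stand-in (not the authors' two-sample similarity bootstrap test for random intervals), the sketch below computes a pairwise bootstrap p-value matrix and groups variables whose pairwise p-values all exceed a threshold:

```python
import numpy as np

def bootstrap_pvalue(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for equality of means (a simplified
    stand-in for the two-sample similarity bootstrap test)."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_boot):
        xs = rng.choice(pooled, size=len(x), replace=True)
        ys = rng.choice(pooled, size=len(y), replace=True)
        count += abs(xs.mean() - ys.mean()) >= observed
    return count / n_boot

def cluster_by_pvalues(samples, alpha=0.05):
    """Group variables so that every pairwise p-value inside a cluster
    exceeds alpha (one simple pass over the p-value matrix)."""
    clusters = []
    for i, s in enumerate(samples):
        for cl in clusters:
            if all(bootstrap_pvalue(s, samples[j]) > alpha for j in cl):
                cl.append(i)
                break
        else:
            clusters.append([i])
    return clusters

rng = np.random.default_rng(1)
mids = [rng.normal(0, 1, 40), rng.normal(0.1, 1, 40), rng.normal(3, 1, 40)]
print(cluster_by_pvalues(mids))  # samples with similar means are typically grouped
```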


2020 ◽  
Vol 16 (4) ◽  
pp. 15-29
Author(s):  
Jayalakshmi D. ◽  
Dheeba J.

The incidence of skin cancer has been increasing in recent years, and it can become dangerous if not detected early. Computer-aided diagnosis systems can assist dermatologists in skin cancer detection by examining features more critically. In this article, a detailed review of pre-processing and segmentation methods for skin lesion images is carried out, investigating existing and prevalent segmentation methods for the diagnosis of skin cancer. The pre-processing stage is divided into two phases: in the first phase, a median filter is used to remove artifacts; in the second phase, an improved K-means clustering with outlier removal (KMOR) algorithm is suggested. The proposed method was tested on the publicly available Danderm database. The improved cluster-based algorithm gives an accuracy of 92.8%, with a sensitivity of 93%, a specificity of 90%, and an AUC of 0.90435. From the experimental results, it is evident that the clustering algorithm performs well in detecting the border of the lesion and is suitable for pre-processing dermoscopic images.
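
A hedged sketch of the two pre-processing phases, using a synthetic image, SciPy's median filter, and ordinary k-means with a simple distance-based outlier rule as a stand-in for the KMOR variant (whose exact formulation is not given in the abstract):

```python
import numpy as np
from scipy.ndimage import median_filter
from sklearn.cluster import KMeans

# Synthetic grayscale "lesion" image: a dark blob on a brighter background.
rng = np.random.default_rng(0)
img = np.full((64, 64), 0.8) + 0.05 * rng.standard_normal((64, 64))
yy, xx = np.mgrid[:64, :64]
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2] = 0.3   # lesion region

smoothed = median_filter(img, size=3)                  # phase 1: artifact/noise removal

# Phase 2: k-means on intensities; pixels far from their centroid are flagged
# as outliers (a simplified stand-in for the KMOR variant in the paper).
pixels = smoothed.reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
dist = np.abs(pixels - km.cluster_centers_[km.labels_]).ravel()
outlier = dist > dist.mean() + 3 * dist.std()

lesion_label = km.cluster_centers_.argmin()            # darker cluster = lesion
mask = (km.labels_ == lesion_label) & ~outlier
print("lesion pixels:", int(mask.sum()))
```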


2013 ◽  
Vol 321-324 ◽  
pp. 1939-1942
Author(s):  
Lei Gu

The locality sensitive k-means clustering method was presented recently. Although this approach can improve clustering accuracy, it often yields unstable clustering results because random samples are used as the initial centers. In this paper, an initialization method based on core clusters is used for locality sensitive k-means clustering. The core clusters are formed by constructing the σ-neighborhood graph, and their centers are taken as the initial centers of the locality sensitive k-means clustering. To investigate the effectiveness of this approach, several experiments are carried out on three datasets. Experimental results show that the proposed method improves clustering performance compared with the previous locality sensitive k-means clustering.
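
A minimal sketch of the core-cluster initialization, assuming plain k-means as a stand-in for the locality sensitive variant: points closer than σ are connected, connected components form the core clusters, and the k largest components supply the initial centers.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def core_cluster_centers(data, sigma, k):
    """Initial centers from the sigma-neighborhood graph: connect points
    closer than sigma, take connected components as core clusters, and
    return the centroids of the k largest components."""
    adj = cdist(data, data) < sigma
    _, labels = connected_components(adj, directed=False)
    sizes = np.bincount(labels)
    top = np.argsort(sizes)[::-1][:k]
    return np.array([data[labels == c].mean(axis=0) for c in top])

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (60, 2)),
                  rng.normal(3, 0.3, (60, 2)),
                  rng.normal((0, 3), 0.3, (60, 2))])

init = core_cluster_centers(data, sigma=0.5, k=3)
km = KMeans(n_clusters=3, init=init, n_init=1).fit(data)  # plain k-means as a stand-in
print(np.round(km.cluster_centers_, 2))
```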


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of existing algorithms under both sequential and parallel conditions.
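
The hedged sketch below gives one simple reading of the two phases: a covering pass that opens a new center whenever a point lies outside a fixed radius of every existing center (so k emerges from the data rather than being prespecified), followed by standard Lloyd iterations. The radius rule is an assumption; the paper's CA details are not reproduced here.

```python
import numpy as np

def covering_init(data, radius):
    """Covering-style pass: a point outside every existing cover starts a new
    center, so k is not prespecified (a simplified reading of the CA phase)."""
    centers = [data[0]]
    for x in data[1:]:
        if min(np.linalg.norm(x - c) for c in centers) > radius:
            centers.append(x)
    return np.array(centers)

def lloyd(data, centers, iters=20):
    """Standard Lloyd iterations starting from the covering centers."""
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return centers, labels

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(m, 0.4, (80, 2)) for m in [(0, 0), (4, 0), (2, 4)]])
centers = covering_init(data, radius=1.5)   # k is driven by the radius choice
centers, labels = lloyd(data, centers)
print("number of clusters discovered:", len(centers))
```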


2018 ◽  
Vol 35 (9) ◽  
pp. 2052-2079 ◽  
Author(s):  
Umamaheswari E. ◽  
Ganesan S. ◽  
Abirami M. ◽  
Subramanian S.

Purpose – Finding optimal maintenance schedules is the primary aim of the preventive maintenance scheduling (PMS) problem, which deals with the objectives of reliability, risk and cost. Most earlier works in the literature have focused on PMS with the objectives of leveling reserves, risk or cost independently. Nevertheless, very few publications in the current literature tackle the multi-objective PMS model with simultaneous optimization of reliability and economic perspectives. Since the PMS problem is highly nonlinear and complex in nature, an appropriate optimization technique is necessary to solve the problem at hand. The paper aims to discuss these issues.
Design/methodology/approach – The complexity of the PMS problem in power systems necessitates a simple and robust optimization tool. This paper employs a modern meta-heuristic algorithm, namely the Ant Lion Optimizer (ALO), to obtain optimal maintenance schedules for the PMS problem. In order to extract the best compromise solution in the multi-objective solution space (reliability, risk and cost), a fuzzy decision-making mechanism is incorporated with ALO (FDMALO) for solving PMS.
Findings – As a first attempt, the best feasible maintenance schedules are obtained for the PMS problem using FDMALO in the multi-objective solution space. Statistical measures are computed for the test systems and compared with various meta-heuristic algorithms. The applicability of the algorithm to the PMS problem is validated through a statistical t-test. The statistical comparison and the t-test results reveal the superiority of ALO in achieving improved solution quality. The numerical and statistical results are encouraging and indicate the viability of the proposed ALO technique.
Originality/value – As a maiden attempt, FDMALO is used to solve the multi-objective PMS problem. This paper fills a gap in the literature by solving the PMS problem in a multi-objective framework, with improved quality of the statistical indices.
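
A hedged sketch of the fuzzy best-compromise selection commonly paired with multi-objective metaheuristics (not necessarily the exact FDMALO formulation): each objective value on the Pareto set is mapped to a membership degree, and the solution with the highest normalized membership sum is chosen. The Pareto points below are hypothetical.

```python
import numpy as np

def best_compromise(objectives):
    """Fuzzy best-compromise selection: each objective value is mapped to a
    membership degree in [0, 1] (1 = best seen, 0 = worst seen), and the
    solution with the highest normalized membership sum is returned.
    All objectives are assumed to be minimized."""
    obj = np.asarray(objectives, dtype=float)
    f_min, f_max = obj.min(axis=0), obj.max(axis=0)
    mu = (f_max - obj) / np.where(f_max > f_min, f_max - f_min, 1.0)
    scores = mu.sum(axis=1) / mu.sum()
    return int(scores.argmax()), scores

# Hypothetical Pareto front for (cost, risk, unreliability), all minimized.
pareto = [(120.0, 0.30, 0.040),
          (135.0, 0.22, 0.035),
          (150.0, 0.18, 0.033),
          (170.0, 0.15, 0.032)]
idx, scores = best_compromise(pareto)
print("best compromise solution:", pareto[idx])
```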


Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

How to establish effective methods for analysing large spatio-temporal geographic data and quickly and accurately uncover the hidden value behind geographic information has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can mine knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important clustering methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome: the two important parameters, the Eps neighborhood and the MinPts density, need to be set manually, and the guiding principles for parameter setting in traditional DBSCAN do not always yield suitable parameters or accurate clustering results. To solve the problems of misclassification and density sparsity caused by unreasonable parameter selection in the DBSCAN clustering algorithm, this paper proposes an efficient DBSCAN-based density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by cycling through k-clustering in turn, and the optimal solution is selected; the optimal k-value in k-clustering is then used to cluster the samples. Through mathematical and physical analysis, the appropriate Eps and MinPts parameters can be determined, and the final clustering results are obtained by DBSCAN. Experiments show that this method selects parameters reasonably for DBSCAN clustering, which demonstrates the superiority of the method described in this paper.
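
The paper's Optimal Distance criterion is not reproduced in this abstract; as a hedged stand-in, the sketch below uses the common k-distance heuristic to set MinPts and Eps before running scikit-learn's DBSCAN.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def estimate_dbscan_params(data, k=4):
    """Common k-distance heuristic (a stand-in for the paper's Optimal
    Distance criterion): MinPts = k, Eps taken from the upper part of the
    sorted k-th nearest-neighbour distance curve."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    dist, _ = nn.kneighbors(data)
    kth = np.sort(dist[:, k])            # k-th neighbour distance per point (self excluded)
    eps = float(np.quantile(kth, 0.95))  # crude proxy for the curve's "elbow"
    return eps, k

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in [(0, 0), (3, 3)]])
eps, min_pts = estimate_dbscan_params(data)
labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(data)
print("eps =", round(eps, 3), "clusters found:", len(set(labels) - {-1}))
```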


Author(s):  
Muhamad Alias Md. Jedi ◽  
Robiah Adnan

TCLUST is a statistical clustering technique based on a modification of the trimmed k-means clustering algorithm. It is called a "crisp" clustering approach because each observation is either eliminated (trimmed) or assigned to a single group. TCLUST strengthens the group assignment by placing constraints on the cluster scatter matrices. The emphasis in this paper is on restricting the eigenvalues λ of the scatter matrices. The idea behind imposing constraints is to maximize the log-likelihood function of the spurious-outlier model. A review of different robust clustering approaches is presented as a comparison to the TCLUST method. This paper discusses the nature of the TCLUST algorithm, how to determine the number of clusters properly, and how to measure the strength of group assignments. At the end of this paper, the R package for TCLUST is used to implement the different types of scatter restrictions, making the algorithm more flexible in choosing the number of clusters and the trimming proportion.
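
A small hedged Python sketch (not the tclust R package) of the kind of eigenvalue restriction TCLUST imposes: cluster scatter matrices are rebuilt after clipping their eigenvalues so that the largest-to-smallest ratio across clusters stays below a constant c.

```python
import numpy as np

def restrict_eigenvalues(scatter_matrices, c=4.0):
    """Illustrative eigenvalue-ratio restriction in the spirit of TCLUST:
    clip all eigenvalues into [max_eig / c, max_eig] so that the ratio of the
    largest to the smallest eigenvalue across clusters is at most c.
    (The actual TCLUST implementation uses a more refined optimal truncation.)"""
    eigvals, eigvecs = zip(*(np.linalg.eigh(S) for S in scatter_matrices))
    upper = np.concatenate(eigvals).max()
    lower = upper / c
    restricted = []
    for vals, vecs in zip(eigvals, eigvecs):
        clipped = np.clip(vals, lower, upper)
        restricted.append(vecs @ np.diag(clipped) @ vecs.T)
    return restricted

# Two toy scatter matrices: one nearly spherical, one very elongated.
S1 = np.array([[1.0, 0.0], [0.0, 0.9]])
S2 = np.array([[10.0, 0.0], [0.0, 0.05]])
for S in restrict_eigenvalues([S1, S2], c=4.0):
    print(np.round(np.linalg.eigvalsh(S), 3))
```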

