A Novel Chaotic Northern Bald Ibis Optimization Algorithm for Solving Different Cluster Problems [ICCICC18 #155]

Author(s):  
Ravi Kumar Saidala ◽  
Nagaraju Devarakonda

This article proposes a new optimal data clustering method that finds optimal clusters by incorporating chaotic maps into the standard NOA. NOA, a recently developed optimization technique, has been shown to generate optimal results at low solution cost. Incorporating chaotic maps into metaheuristics helps an algorithm balance the two phases of search, exploration and exploitation, and diversify the solution space. To make NOA more efficient and to avoid premature convergence, chaotic maps are incorporated in this work, yielding the chaotic NOAs (CNOAs). Ten different chaotic maps are incorporated individually into the standard NOA, and the optimization performance of each variant is tested. CNOA is first benchmarked on 23 standard functions. Secondly, the numerical performance of the new CNOA-based clustering method is tested by solving 10 UCI data clustering problems and 4 web document clustering problems. Comparisons are made using statistical and graphical results. The superiority of the proposed optimal clustering algorithm is evident from the simulations and comparisons.
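
As a rough, hedged illustration of the idea (not the authors' CNOA code, whose update equations are not reproduced in this abstract), the Python sketch below replaces the usual uniform random coefficient of a generic population-based search with a logistic chaotic map; the move rule itself is a placeholder.

```python
import numpy as np

def logistic_map(x):
    """One step of the logistic chaotic map on (0, 1)."""
    return 4.0 * x * (1.0 - x)

def chaotic_search(objective, dim, pop_size=30, iters=200, seed=1):
    """Generic population-based search where the usual uniform random
    coefficient is replaced by a chaotic sequence (illustrative only)."""
    rng = np.random.default_rng(seed)
    pop = rng.uniform(-5, 5, size=(pop_size, dim))
    fitness = np.apply_along_axis(objective, 1, pop)
    best = pop[fitness.argmin()].copy()
    c = 0.7  # chaotic state, started away from the map's fixed points
    for _ in range(iters):
        c = logistic_map(c)                           # chaotic coefficient in (0, 1)
        pop += c * (best - pop)                       # placeholder move toward the best
        pop += 0.01 * rng.standard_normal(pop.shape)  # small random perturbation
        fitness = np.apply_along_axis(objective, 1, pop)
        if fitness.min() < objective(best):
            best = pop[fitness.argmin()].copy()
    return best

# Example: minimize the sphere function (F1 in the usual benchmark suite).
print(chaotic_search(lambda x: float(np.sum(x ** 2)), dim=5))
```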

Author(s):  
Ravi Kumar Saidala

Clustering, one of the most attractive data analysis concepts in data mining, is frequently used by researchers to analyse data from a variety of real-world applications. It is stated in the literature that traditional clustering methods become trapped in local optima and fail to obtain optimal clusters. This work presents the design and development of an advanced optimal clustering method for unmasking abnormal entries in a clinical dataset. The basis is NOA, a recently proposed algorithm that mimics the migration pattern of Northern Bald Ibis (Threskiornithidae) birds. First, we develop a variant of the standard NOA by replacing its C1 and C2 parameters with chaotic maps, yielding the VNOA. We then use the VNOA to design a new, advanced clustering method. The VNOA is first benchmarked on 7 unimodal (F1–F7) and 6 multimodal (F8–F13) mathematical functions. We then test the proposed VNOA-based clustering method on a clinical dataset and compare the resulting graphical and statistical results with well-known algorithms. The superiority of the presented clustering method is evident from the simulations and comparisons.
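
To make the optimizer-driven clustering idea concrete, the hedged sketch below shows the objective such a method minimizes: cluster centroids are encoded as one flat vector and scored by the within-cluster sum of squared distances. A plain random search stands in for the VNOA, whose update equations are not given in the abstract.

```python
import numpy as np

def clustering_cost(centroids_flat, data, k):
    """Within-cluster sum of squared distances: the objective a metaheuristic
    such as VNOA would minimize when centroids are encoded as one flat vector."""
    centroids = centroids_flat.reshape(k, data.shape[1])
    d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    return float(np.sum(d.min(axis=1) ** 2))

def optimize_clusters(data, k, iters=500, seed=0):
    """Plain random-search stand-in for the metaheuristic: perturb the
    centroid vector and keep improvements (illustrative only)."""
    rng = np.random.default_rng(seed)
    best = data[rng.choice(len(data), size=k, replace=False)].ravel().astype(float)
    best_cost = clustering_cost(best, data, k)
    for _ in range(iters):
        cand = best + 0.1 * rng.standard_normal(best.shape)
        cost = clustering_cost(cand, data, k)
        if cost < best_cost:
            best, best_cost = cand, cost
    centroids = best.reshape(k, -1)
    labels = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return centroids, labels

rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])
centroids, labels = optimize_clusters(data, k=2)
print(np.round(centroids, 2))
```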


Author(s):  
Zhang Xiaodan ◽  
Hu Xiaohua ◽  
Xia Jiali ◽  
Zhou Xiaohua ◽  
Achananuparp Palakorn

In this article, we present a graph-based knowledge representation for biomedical digital library literature clustering. An efficient clustering method is developed to identify the ontology-enriched k-highest density term subgraphs that capture the core semantic relationship information about each document cluster. The distance between each document and the k term graph clusters is calculated. A document is then assigned to the closest term cluster. The extensive experimental results on two PubMed document sets (Disease10 and OHSUMED23) show that our approach is comparable to spherical k-means. The contributions of our approach are the following: (1) we provide two corpus-level graph representations to improve document clustering, a term co-occurrence graph and an abstract-title graph; (2) we develop an efficient and effective document clustering algorithm by identifying k distinguishable class-specific core term subgraphs using terms’ global and local importance information; and (3) the identified term clusters give a meaningful explanation for the document clustering results.
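
The following toy Python sketch illustrates the general pipeline under stated assumptions (simple co-occurrence counts, a greedy stand-in for the k highest-density term subgraphs, and overlap-based document assignment); it is not the ontology-enriched method of the paper.

```python
from collections import Counter
from itertools import combinations

docs = [
    "gene expression cancer cell pathway",
    "cancer cell tumor growth pathway",
    "heart disease blood pressure risk",
    "blood pressure risk hypertension patient",
]
tokenized = [d.split() for d in docs]

# 1) Term co-occurrence graph: edge weight = number of docs containing both terms.
cooc = Counter()
for toks in tokenized:
    for a, b in combinations(sorted(set(toks)), 2):
        cooc[(a, b)] += 1

# 2) Greedy stand-in for "k highest-density term subgraphs": seed each term
#    cluster with a heavy edge, then attach strongly co-occurring terms.
k = 2
term_clusters, used = [], set()
for (a, b), _ in cooc.most_common():
    if len(term_clusters) == k:
        break
    if a in used or b in used:
        continue
    cluster = {a, b}
    for (x, y), w in cooc.items():
        if w >= 2 and ({x, y} & cluster) and not ({x, y} & used):
            cluster |= {x, y}
    term_clusters.append(cluster)
    used |= cluster

# 3) Assign each document to the term cluster with the largest term overlap.
for toks, doc in zip(tokenized, docs):
    overlaps = [len(set(toks) & c) for c in term_clusters]
    print(doc, "->", overlaps.index(max(overlaps)))
```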


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important dietary intake for meeting nutritional needs, consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient to meet national demand. Data mining is a branch of computer science that is widely used in research, and one of its techniques is clustering. Clustering is a method of grouping data, and it performs better when applied to large amounts of data. The data used here are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The result of this study is a partition into two milk-producing groups: high-production and low-production regions. From the 27 fresh-milk production records for Indonesia, two provinces fall into the high-production cluster, namely West Java and East Java, while the remaining 25, together with 7 provinces that were not included in the K-Means clustering calculation, belong to the low-production cluster.
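
A minimal sketch of the K-Means step described above, assuming hypothetical production figures (the real study uses Central Statistics Agency data for 2000-2017, which is not included in this abstract):

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical yearly fresh-milk production per province (tonnes); placeholder
# values only, not the study's data.
provinces = ["West Java", "East Java", "Central Java", "North Sumatra", "Bali"]
production = np.array([[280_000.0], [260_000.0], [40_000.0], [12_000.0], [5_000.0]])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(production)
high = km.cluster_centers_.argmax()   # the cluster with the larger mean production
for name, label in zip(provinces, km.labels_):
    print(name, "high" if label == high else "low")
```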


Author(s):  
Ana Belén Ramos-Guajardo

A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals, which can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test, and a p-value matrix is finally obtained. The suggested clustering algorithm operates on this matrix, where each p-value can also be viewed as a degree of similarity between the corresponding random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed, and the approach is applied to two real-life situations.
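
As a hedged, simplified stand-in (not the authors' two-sample similarity bootstrap test for random intervals), the sketch below computes a pairwise bootstrap p-value matrix and groups variables whose pairwise p-values all exceed a threshold:

```python
import numpy as np

def bootstrap_pvalue(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for equality of means (a simplified
    stand-in for the two-sample similarity bootstrap test)."""
    rng = np.random.default_rng(seed)
    observed = abs(x.mean() - y.mean())
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_boot):
        xs = rng.choice(pooled, size=len(x), replace=True)
        ys = rng.choice(pooled, size=len(y), replace=True)
        count += abs(xs.mean() - ys.mean()) >= observed
    return count / n_boot

def cluster_by_pvalues(samples, alpha=0.05):
    """Group variables so that every pairwise p-value inside a cluster
    exceeds alpha (one simple pass over the p-value matrix)."""
    clusters = []
    for i, s in enumerate(samples):
        for cl in clusters:
            if all(bootstrap_pvalue(s, samples[j]) > alpha for j in cl):
                cl.append(i)
                break
        else:
            clusters.append([i])
    return clusters

rng = np.random.default_rng(1)
mids = [rng.normal(0, 1, 40), rng.normal(0.1, 1, 40), rng.normal(3, 1, 40)]
print(cluster_by_pvalues(mids))  # samples with similar means are typically grouped
```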


2020 ◽  
Vol 16 (4) ◽  
pp. 15-29
Author(s):  
Jayalakshmi D. ◽  
Dheeba J.

The incidence of skin cancer has been increasing in recent years, and it can become dangerous if not detected early. Computer-aided diagnosis systems can assist dermatologists in skin cancer detection by examining features more critically. In this article, a detailed review of pre-processing and segmentation methods for skin lesion images is carried out, investigating existing and prevalent segmentation methods for the diagnosis of skin cancer. The pre-processing stage is divided into two phases: in the first phase, a median filter is used to remove artifacts; in the second phase, an improved K-means clustering with outlier removal (KMOR) algorithm is suggested. The proposed method was tested on the publicly available Danderm database. The improved cluster-based algorithm gives an accuracy of 92.8%, with a sensitivity of 93%, a specificity of 90%, and an AUC of 0.90435. From the experimental results, it is evident that the clustering algorithm performs well in detecting the border of the lesion and is suitable for pre-processing dermoscopic images.
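
A hedged sketch of the two pre-processing phases, using a synthetic image, SciPy's median filter, and ordinary k-means with a simple distance-based outlier rule as a stand-in for the KMOR variant (whose exact formulation is not given in the abstract):

```python
import numpy as np
from scipy.ndimage import median_filter
from sklearn.cluster import KMeans

# Synthetic grayscale "lesion" image: a dark blob on a brighter background.
rng = np.random.default_rng(0)
img = np.full((64, 64), 0.8) + 0.05 * rng.standard_normal((64, 64))
yy, xx = np.mgrid[:64, :64]
img[(yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2] = 0.3   # lesion region

smoothed = median_filter(img, size=3)                  # phase 1: artifact/noise removal

# Phase 2: k-means on intensities; pixels far from their centroid are flagged
# as outliers (a simplified stand-in for the KMOR variant in the paper).
pixels = smoothed.reshape(-1, 1)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
dist = np.abs(pixels - km.cluster_centers_[km.labels_]).ravel()
outlier = dist > dist.mean() + 3 * dist.std()

lesion_label = km.cluster_centers_.argmin()            # darker cluster = lesion
mask = (km.labels_ == lesion_label) & ~outlier
print("lesion pixels:", int(mask.sum()))
```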


2013 ◽  
Vol 321-324 ◽  
pp. 1939-1942
Author(s):  
Lei Gu

The locality sensitive k-means clustering method was presented recently. Although this approach can improve clustering accuracy, it often yields unstable clustering results because random samples are used as the initial centers. In this paper, an initialization method based on core clusters is used for locality sensitive k-means clustering. The core clusters are formed by constructing the σ-neighborhood graph, and their centers are taken as the initial centers of the locality sensitive k-means clustering. To investigate the effectiveness of this approach, several experiments are carried out on three datasets. Experimental results show that the proposed method improves clustering performance compared with the previous locality sensitive k-means clustering.
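
A minimal sketch of the core-cluster initialization, assuming plain k-means as a stand-in for the locality sensitive variant: points closer than σ are connected, connected components form the core clusters, and the k largest components supply the initial centers.

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def core_cluster_centers(data, sigma, k):
    """Initial centers from the sigma-neighborhood graph: connect points
    closer than sigma, take connected components as core clusters, and
    return the centroids of the k largest components."""
    adj = cdist(data, data) < sigma
    _, labels = connected_components(adj, directed=False)
    sizes = np.bincount(labels)
    top = np.argsort(sizes)[::-1][:k]
    return np.array([data[labels == c].mean(axis=0) for c in top])

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (60, 2)),
                  rng.normal(3, 0.3, (60, 2)),
                  rng.normal((0, 3), 0.3, (60, 2))])

init = core_cluster_centers(data, sigma=0.5, k=3)
km = KMeans(n_clusters=3, init=init, n_init=1).fit(data)  # plain k-means as a stand-in
print(np.round(km.cluster_centers_, 2))
```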


Complexity ◽  
2018 ◽  
Vol 2018 ◽  
pp. 1-16 ◽  
Author(s):  
Yiwen Zhang ◽  
Yuanyuan Zhou ◽  
Xing Guo ◽  
Jintao Wu ◽  
Qiang He ◽  
...  

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm not only acquires efficient and accurate clustering results but also self-adaptively provides a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of K-means. The first phase executes the CA, which self-organizes and recognizes the number of clusters k based on the similarities in the data; it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. It therefore has a "blind" feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments carried out on the Spark platform verify the good scalability of the C-K-means algorithm, which can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the accuracy and efficiency of the C-K-means algorithm outperform those of existing algorithms under both sequential and parallel conditions.
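
The hedged sketch below gives one simple reading of the two phases: a covering pass that opens a new center whenever a point lies outside a fixed radius of every existing center (so k emerges from the data rather than being prespecified), followed by standard Lloyd iterations. The radius rule is an assumption; the paper's CA details are not reproduced here.

```python
import numpy as np

def covering_init(data, radius):
    """Covering-style pass: a point outside every existing cover starts a new
    center, so k is not prespecified (a simplified reading of the CA phase)."""
    centers = [data[0]]
    for x in data[1:]:
        if min(np.linalg.norm(x - c) for c in centers) > radius:
            centers.append(x)
    return np.array(centers)

def lloyd(data, centers, iters=20):
    """Standard Lloyd iterations starting from the covering centers."""
    for _ in range(iters):
        d = np.linalg.norm(data[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        centers = np.array([data[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(len(centers))])
    return centers, labels

rng = np.random.default_rng(2)
data = np.vstack([rng.normal(m, 0.4, (80, 2)) for m in [(0, 0), (4, 0), (2, 4)]])
centers = covering_init(data, radius=1.5)   # k is driven by the radius choice
centers, labels = lloyd(data, centers)
print("number of clusters discovered:", len(centers))
```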


2018 ◽  
Vol 35 (9) ◽  
pp. 2052-2079 ◽  
Author(s):  
Umamaheswari E. ◽  
Ganesan S. ◽  
Abirami M. ◽  
Subramanian S.

Purpose – Finding optimal maintenance schedules is the primary aim of the preventive maintenance scheduling (PMS) problem, which deals with the objectives of reliability, risk and cost. Most earlier works in the literature have focused on PMS with the objectives of leveling reserves, risk or cost independently. Nevertheless, very few publications in the current literature tackle the multi-objective PMS model with simultaneous optimization of reliability and economic perspectives. Since the PMS problem is highly nonlinear and complex in nature, an appropriate optimization technique is necessary to solve the problem at hand. The paper aims to discuss these issues.
Design/methodology/approach – The complexity of the PMS problem in power systems necessitates a simple and robust optimization tool. This paper employs a modern meta-heuristic algorithm, namely the Ant Lion Optimizer (ALO), to obtain optimal maintenance schedules for the PMS problem. In order to extract the best compromise solution in the multi-objective solution space (reliability, risk and cost), a fuzzy decision-making mechanism is incorporated with ALO (FDMALO) for solving PMS.
Findings – As a first attempt, the best feasible maintenance schedules are obtained for the PMS problem using FDMALO in the multi-objective solution space. Statistical measures are computed for the test systems and compared with various meta-heuristic algorithms. The applicability of the algorithm to the PMS problem is validated through a statistical t-test. The statistical comparison and the t-test results reveal the superiority of ALO in achieving improved solution quality. The numerical and statistical results are encouraging and indicate the viability of the proposed ALO technique.
Originality/value – As a maiden attempt, FDMALO is used to solve the multi-objective PMS problem. This paper fills a gap in the literature by solving the PMS problem in a multi-objective framework, with improved quality of the statistical indices.
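
A hedged sketch of the fuzzy best-compromise selection commonly paired with multi-objective metaheuristics (not necessarily the exact FDMALO formulation): each objective value on the Pareto set is mapped to a membership degree, and the solution with the highest normalized membership sum is chosen. The Pareto points below are hypothetical.

```python
import numpy as np

def best_compromise(objectives):
    """Fuzzy best-compromise selection: each objective value is mapped to a
    membership degree in [0, 1] (1 = best seen, 0 = worst seen), and the
    solution with the highest normalized membership sum is returned.
    All objectives are assumed to be minimized."""
    obj = np.asarray(objectives, dtype=float)
    f_min, f_max = obj.min(axis=0), obj.max(axis=0)
    mu = (f_max - obj) / np.where(f_max > f_min, f_max - f_min, 1.0)
    scores = mu.sum(axis=1) / mu.sum()
    return int(scores.argmax()), scores

# Hypothetical Pareto front for (cost, risk, unreliability), all minimized.
pareto = [(120.0, 0.30, 0.040),
          (135.0, 0.22, 0.035),
          (150.0, 0.18, 0.033),
          (170.0, 0.15, 0.032)]
idx, scores = best_compromise(pareto)
print("best compromise solution:", pareto[idx])
```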


Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

How to establish effective methods for analysing large spatio-temporal geographic data and quickly and accurately uncover the hidden value behind geographic information has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can mine knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important clustering methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome: the two important parameters, the Eps neighborhood and the MinPts density, need to be set manually, and the guiding principles for parameter setting in traditional DBSCAN do not always yield suitable parameters or accurate clustering results. To solve the problems of misclassification and density sparsity caused by unreasonable parameter selection in the DBSCAN clustering algorithm, this paper proposes an efficient DBSCAN-based density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by cycling through k-clustering in turn, and the optimal solution is selected; the optimal k-value in k-clustering is then used to cluster the samples. Through mathematical and physical analysis, the appropriate Eps and MinPts parameters can be determined, and the final clustering results are obtained by DBSCAN. Experiments show that this method selects parameters reasonably for DBSCAN clustering, which demonstrates the superiority of the method described in this paper.
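
The paper's Optimal Distance criterion is not reproduced in this abstract; as a hedged stand-in, the sketch below uses the common k-distance heuristic to set MinPts and Eps before running scikit-learn's DBSCAN.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.neighbors import NearestNeighbors

def estimate_dbscan_params(data, k=4):
    """Common k-distance heuristic (a stand-in for the paper's Optimal
    Distance criterion): MinPts = k, Eps taken from the upper part of the
    sorted k-th nearest-neighbour distance curve."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(data)
    dist, _ = nn.kneighbors(data)
    kth = np.sort(dist[:, k])            # k-th neighbour distance per point (self excluded)
    eps = float(np.quantile(kth, 0.95))  # crude proxy for the curve's "elbow"
    return eps, k

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(m, 0.3, (100, 2)) for m in [(0, 0), (3, 3)]])
eps, min_pts = estimate_dbscan_params(data)
labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(data)
print("eps =", round(eps, 3), "clusters found:", len(set(labels) - {-1}))
```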


Author(s):  
Muhamad Alias Md. Jedi ◽  
Robiah Adnan

TCLUST is a statistical clustering technique based on a modification of the trimmed k-means clustering algorithm. It is called a "crisp" clustering approach because each observation is either eliminated (trimmed) or assigned to a single group. TCLUST strengthens the group assignment by placing constraints on the cluster scatter matrices. The emphasis in this paper is on restricting the eigenvalues λ of the scatter matrices. The idea behind imposing constraints is to maximize the log-likelihood function of the spurious-outlier model. A review of different robust clustering approaches is presented as a comparison to the TCLUST method. This paper discusses the nature of the TCLUST algorithm, how to determine the number of clusters properly, and how to measure the strength of group assignments. At the end of this paper, the R package for TCLUST is used to implement the different types of scatter restrictions, making the algorithm more flexible in choosing the number of clusters and the trimming proportion.
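
A small hedged Python sketch (not the tclust R package) of the kind of eigenvalue restriction TCLUST imposes: cluster scatter matrices are rebuilt after clipping their eigenvalues so that the largest-to-smallest ratio across clusters stays below a constant c.

```python
import numpy as np

def restrict_eigenvalues(scatter_matrices, c=4.0):
    """Illustrative eigenvalue-ratio restriction in the spirit of TCLUST:
    clip all eigenvalues into [max_eig / c, max_eig] so that the ratio of the
    largest to the smallest eigenvalue across clusters is at most c.
    (The actual TCLUST implementation uses a more refined optimal truncation.)"""
    eigvals, eigvecs = zip(*(np.linalg.eigh(S) for S in scatter_matrices))
    upper = np.concatenate(eigvals).max()
    lower = upper / c
    restricted = []
    for vals, vecs in zip(eigvals, eigvecs):
        clipped = np.clip(vals, lower, upper)
        restricted.append(vecs @ np.diag(clipped) @ vecs.T)
    return restricted

# Two toy scatter matrices: one nearly spherical, one very elongated.
S1 = np.array([[1.0, 0.0], [0.0, 0.9]])
S2 = np.array([[10.0, 0.0], [0.0, 0.05]])
for S in restrict_eigenvalues([S1, S2], c=4.0):
    print(np.round(np.linalg.eigvalsh(S), 3))
```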

