Variant of Northern Bald Ibis Algorithm for Unmasking Outliers

Author(s):  
Ravi Kumar Saidala

Clustering, one of the most attractive data analysis concepts in data mining, are frequently used by many researchers for analysing data of variety of real-world applications. It is stated in the literature that traditional clustering methods are trapped in local optima and fail to obtain optimal clusters. This research work gives the design and development of an advanced optimum clustering method for unmasking abnormal entries in the clinical dataset. The basis is the NOA, a recently proposed algorithm, driven by mimicking the migration pattern of Northern Bald Ibis (Threskiornithidae) birds. First, we developed the variant of the standard NOA by replacing C1 and C2 parameters of NOA with chaotic maps turning it into the VNOA. Later, we utilized the VNOA in the design of a new and advanced clustering method. VNOA is first benchmarked on a 7 unimodal (F1–F7) and 6 multimodal (F8–F13) mathematical functions. We tested the numerical complexity of proposed VNOA-based clustering methods on a clinical dataset. We then compared the obtained graphical and statistical results with well-known algorithms. The superiority of the presented clustering method is evidenced from the simulations and comparisons.

Author(s):  
Ravi Kumar Saidala ◽  
Nagaraju Devarakonda

This article proposes a new optimal data clustering method for finding optimal clusters of data by incorporating chaotic maps into the standard NOA. NOA, a newly developed optimization technique, has been shown to be efficient in generating optimal results with lowest solution cost. The incorporation of chaotic maps into metaheuristics enables algorithms to diversify the solution space into two phases: explore and exploit more. To make the NOA more efficient and avoid premature convergence, chaotic maps are incorporated in this work, termed as CNOAs. Ten different chaotic maps are incorporated individually into standard NOA for testing the optimization performance. The CNOA is first benchmarked on 23 standard functions. Secondly, testing was done on the numerical complexity of the new clustering method which utilizes CNOA, by solving 10 UCI data cluster problems and 4 web document cluster problems. The comparisons have been made with the help of obtaining statistical and graphical results. The superiority of the proposed optimal clustering algorithm is evident from the simulations and comparisons.


2021 ◽  
Vol 10 (3) ◽  
pp. 161
Author(s):  
Hao-xuan Chen ◽  
Fei Tao ◽  
Pei-long Ma ◽  
Li-na Gao ◽  
Tong Zhou

Spatial analysis is an important means of mining floating car trajectory information, and clustering method and density analysis are common methods among them. The choice of the clustering method affects the accuracy and time efficiency of the analysis results. Therefore, clarifying the principles and characteristics of each method is the primary prerequisite for problem solving. Taking four representative spatial analysis methods—KMeans, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), Clustering by Fast Search and Find of Density Peaks (CFSFDP), and Kernel Density Estimation (KDE)—as examples, combined with the hotspot spatiotemporal mining problem of taxi trajectory, through quantitative analysis and experimental verification, it is found that DBSCAN and KDE algorithms have strong hotspot discovery capabilities, but the heat regions’ shape of DBSCAN is found to be relatively more robust. DBSCAN and CFSFDP can achieve high spatial accuracy in calculating the entrance and exit position of a Point of Interest (POI). KDE and DBSCAN are more suitable for the classification of heat index. When the dataset scale is similar, KMeans has the highest operating efficiency, while CFSFDP and KDE are inferior. This paper resolves to a certain extent the lack of scientific basis for selecting spatial analysis methods in current research. The conclusions drawn in this paper can provide technical support and act as a reference for the selection of methods to solve the taxi trajectory mining problem.


2015 ◽  
Vol 17 (5) ◽  
pp. 719-732
Author(s):  
Dulakshi Santhusitha Kumari Karunasingha ◽  
Shie-Yui Liong

A simple clustering method is proposed for extracting representative subsets from lengthy data sets. The main purpose of the extracted subset of data is to use it to build prediction models (of the form of approximating functional relationships) instead of using the entire large data set. Such smaller subsets of data are often required in exploratory analysis stages of studies that involve resource consuming investigations. A few recent studies have used a subtractive clustering method (SCM) for such data extraction, in the absence of clustering methods for function approximation. SCM, however, requires several parameters to be specified. This study proposes a clustering method, which requires only a single parameter to be specified, yet it is shown to be as effective as the SCM. A method to find suitable values for the parameter is also proposed. Due to having only a single parameter, using the proposed clustering method is shown to be orders of magnitudes more efficient than using SCM. The effectiveness of the proposed method is demonstrated on phase space prediction of three univariate time series and prediction of two multivariate data sets. Some drawbacks of SCM when applied for data extraction are identified, and the proposed method is shown to be a solution for them.


2018 ◽  
Vol 6 (2) ◽  
Author(s):  
Elly Muningsih - AMIK BSI Yogyakarta

Abstract ~ The K-Means method is one of the clustering methods that is widely used in data clustering research. While the K-Medoids method is an efficient method used for processing small data. This study aims to compare two clustering methods by grouping customers into 3 clusters according to their characteristics, namely very potential (loyal) customers, potential customers and non potential customers. The method used in this study is the K-Means clustering method and the K-Medoids method. The data used is online sales transaction. The clustering method testing is done by using a Fuzzy RFM (Recency, Frequenty and Monetary) model where the average (mean) of the third value is taken. From the data testing is known that the K-Means method is better than the K-Medoids method with an accuracy value of 90.47%. Whereas from the data processing carried out is known that cluster 1 has 16 members (customers), cluster 2 has 11 members and cluster 3 has 15 members. Keywords : clustering, K-Means method, K-Medoids method, customer, Fuzzy RFM model. Abstrak ~ Metode K-Means merupakan salah satu metode clustering yang banyak digunakan dalam penelitian pengelompokan data. Sedangkan metode K-Medoids merupakan metode yang efisien digunakan untuk pengolahan data yang kecil. Penelitian ini bertujuan untuk membandingkan atau mengkomparasi dua metode clustering dengan cara mengelompokkan pelanggan menjadi 3 cluster sesuai dengan karakteristiknya, yaitu pelanggan sangat potensial (loyal), pelanggan potensial dan pelanggan kurang (tidak) potensial. Metode yang digunakan dalam penelitian ini adalah metode clustering K-Means dan metode K-Medoids. Data yang digunakan adalah data transaksi penjualan online. Pengujian metode clustering yang dilakukan adalah dengan menggunakan model Fuzzy RFM (Recency, Frequenty dan Monetary) dimana diambil rata-rata (mean) dari nilai ketiga tersebut. Dari pengujian data diketahui bahwa metode K-Means lebih baik dari metode K-Medoids dengan nilai akurasi 90,47%. Sedangkan dari pengolahan data yang dilakukan diketahui bahwa cluster 1 memiliki 16 anggota (pelanggan), cluster 2 memiliki 11 anggota dan cluster 3 memiliki 15 anggota. Kata kunci : clustering, metode K-Means, metode K-Medoids, pelanggan, model Fuzzy RFM.


2021 ◽  
Author(s):  
◽  
Daniel Wayne Crabtree

<p>This thesis investigates the refinement of web search results with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering methods, performing clustering effectively, and for performing query refinement. The thesis identifies different types of query, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering meet all of the properties of good evaluation measures. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabulary, but still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters. The new clustering method significantly improves performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications — the thesis describes methods for two of them. The first method improves the search results for hard queries with underrepresented aspects by automatically expanding the query using semantically orthogonal keywords related to the underrepresented aspects. The second method helps users refine hard ambiguous queries by identifying the different query interpretations using a clustering of a diverse set of refinements. Both methods significantly outperform existing methods.</p>


2021 ◽  
Author(s):  
◽  
Daniel Wayne Crabtree

<p>This thesis investigates the refinement of web search results with a special focus on the use of clustering and the role of queries. It presents a collection of new methods for evaluating clustering methods, performing clustering effectively, and for performing query refinement. The thesis identifies different types of query, the situations where refinement is necessary, and the factors affecting search difficulty. It then analyses hard searches and argues that many of them fail because users and search engines have different query models. The thesis identifies best practice for evaluating web search results and search refinement methods. It finds that none of the commonly used evaluation measures for clustering meet all of the properties of good evaluation measures. It then presents new quality and coverage measures that satisfy all the desired properties and that rank clusterings correctly in all web page clustering situations. The thesis argues that current web page clustering methods work well when different interpretations of the query have distinct vocabulary, but still have several limitations and often produce incomprehensible clusters. It then presents a new clustering method that uses the query to guide the construction of semantically meaningful clusters. The new clustering method significantly improves performance. Finally, the thesis explores how searches and queries are composed of different aspects and shows how to use aspects to reduce the distance between the query models of search engines and users. It then presents fully automatic methods that identify query aspects, identify underrepresented aspects, and predict query difficulty. Used in combination, these methods have many applications — the thesis describes methods for two of them. The first method improves the search results for hard queries with underrepresented aspects by automatically expanding the query using semantically orthogonal keywords related to the underrepresented aspects. The second method helps users refine hard ambiguous queries by identifying the different query interpretations using a clustering of a diverse set of refinements. Both methods significantly outperform existing methods.</p>


Author(s):  
Nurshazwani Muhamad Mahfuz ◽  
Marina Yusoff ◽  
Zakiah Ahmad

<div style="’text-align: justify;">Clustering provides a prime important role as an unsupervised learning method in data analytics to assist many real-world problems such as image segmentation, object recognition or information retrieval. It is often an issue of difficulty for traditional clustering technique due to non-optimal result exist because of the presence of outliers and noise data.  This review paper provides a review of single clustering methods that were applied in various domains.  The aim is to see the potential suitable applications and aspect of improvement of the methods. Three categories of single clustering methods were suggested, and it would be beneficial to the researcher to see the clustering aspects as well as to determine the requirement for clustering method for an employment based on the state of the art of the previous research findings.</div>


Author(s):  
ZD Zhou ◽  
YQ Xie ◽  
DT Pham ◽  
S Kamsani ◽  
M Castellani

The aim of multimodal optimisation is to find significant optima of a multimodal objective function including its global optimum. Many real-world applications are multimodal optimisation problems requiring multiple optimal solutions. The Bees Algorithm is a global optimisation procedure inspired by the foraging behaviour of honeybees. In this paper, several procedures are introduced to enhance the algorithm’s capability to find multiple optima in multimodal optimisation problems. In the proposed Bees Algorithm for multimodal optimisation, dynamic colony size is permitted to automatically adapt the search effort to different objective functions. A local search approach called balanced search technique is also proposed to speed up the algorithm. In addition, two procedures of radius estimation and optima elitism are added, to respectively enhance the Bees Algorithm’s ability to locate unevenly distributed optima, and eliminate insignificant local optima. The performance of the modified Bees Algorithm is evaluated on well-known benchmark problems, and the results are compared with those obtained by several other state-of-the-art algorithms. The results indicate that the proposed algorithm inherits excellent properties from the standard Bees Algorithm, obtaining notable efficiency for solving multimodal optimisation problems due to the introduced modifications.


Author(s):  
Ming Cao ◽  
Qinke Peng ◽  
Ze-Gang Wei ◽  
Fei Liu ◽  
Yi-Fan Hou

The development of high-throughput technologies has produced increasing amounts of sequence data and an increasing need for efficient clustering algorithms that can process massive volumes of sequencing data for downstream analysis. Heuristic clustering methods are widely applied for sequence clustering because of their low computational complexity. Although numerous heuristic clustering methods have been developed, they suffer from two limitations: overestimation of inferred clusters and low clustering sensitivity. To address these issues, we present a new sequence clustering method (edClust) based on Edlib, a C/C[Formula: see text] library for fast, exact semi-global sequence alignment to group similar sequences. The new method edClust was tested on three large-scale sequence databases, and we compared edClust to several classic heuristic clustering methods, such as UCLUST, CD-HIT, and VSEARCH. Evaluations based on the metrics of cluster number and seed sensitivity (SS) demonstrate that edClust can produce fewer clusters than other methods and that its SS is higher than that of other methods. The source codes of edClust are available from https://github.com/zhang134/EdClust.git under the GNU GPL license.


Author(s):  
Akbar Esmaeilzadeh ◽  
Kourosh Shahriar

Detecting of joint sets (clusters) is one of the most important processes in determining properties of fractures. Joints clustering and consequently, determination of the mean value representing each cluster is applicable to most rock mass studies. It is clear that the accuracy of the clustering process plays a key role in analyzing stability of infrastructures such as dams and tunnels and so on. Hence, in this paper, by reviewing several methods proposed for clustering fractures and considering their advantages and disadvantages, a three-stage hybrid method is developed which contains Fuzzy c-means, Fuzzy covariance and Fuzzy maximum likelihood estimation that by utilizing the modified orientation matrix had been optimized. This method is optimized by the Differential Evolutionary algorithm using a new and strong cost function which is defined as the computation core. In addition, using three clustering quality comparing criteria, the new developed method of differential evolutionary optimized of fuzzy cmeans - fuzzy covariance - fuzzy maximum likelihood estimation clustering method (DEF3) is compared with other base and common methods using field data. After doing the calculations, the developed method by giving the best values for all the criteria provided the best results and good stability in meeting different criteria. The DEF3 method was validated using actual field data which mapped in Rudbar Lorestan dam site. The results revealed that DEF3 acquired the best rank among the other method by getting the value of 0.5721 of Davis-Bouldin criterion, 1403.1 of Calinski-Harabasz criterion, and 0.83482 of Silihotte as comparing criteria of clustering methods.


Sign in / Sign up

Export Citation Format

Share Document