PxEBCA: Proximity Expansion Based Clustering Algorithm

Author(s):  
Bhumika S. Arora, Dr. Vijay Chavda, Dr. Bhadresh R. Pandya

Cluster analysis is one of the main techniques for analysing data: it detects groups of similar objects without any pre-specified grouping criteria. Detecting clusters is challenging when they vary in size, density and shape. DBSCAN can find arbitrarily shaped clusters along with outliers, but it cannot handle clusters of differing density. This paper presents PxEBCA, a new density-based method that discovers clusters of arbitrary shape on datasets of varying density. The effectiveness and efficiency of PxEBCA were evaluated experimentally on synthetic data; the results demonstrate that PxEBCA is significantly more effective at discovering arbitrarily shaped clusters with varying densities.
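The abstract does not detail PxEBCA itself, but the single-global-ε limitation of DBSCAN that it targets is easy to see in a minimal sketch (this is standard DBSCAN, not the proposed algorithm): with ε tuned to a dense cluster, an equally coherent sparse cluster is discarded as noise.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one global eps, so clusters of very
    different densities cannot both be recovered."""
    labels = [None] * len(points)          # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # noise (may later become a border point)
            continue
        cid += 1
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid            # noise reclaimed as border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:       # core point: keep expanding
                queue.extend(nbrs)
    return labels
```

With `eps=0.5` a tight cluster (spacing ≈ 0.1) is found, while a sparse cluster (spacing 2.0) is entirely labelled noise; a varying-density method such as PxEBCA is meant to recover both.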

2019 ◽  
Author(s):  
William Wong ◽  
Naotsugu Tsuchiya

Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data with arbitrary shapes and numbers of clusters. Here, we present a variant of EAC aimed at better clustering data with a large number of features, many of which may be uninformative. Our new method builds on the existing EAC algorithm by populating the clustering ensemble with clusterings, each built from a subset of the original features. Our method also calls for prewhitening the recombined data and weighting the influence of each individual clustering by an estimate of its informativeness. We provide an example implementation of the algorithm in Matlab and demonstrate its effectiveness relative to ordinary evidence accumulation clustering on synthetic data.
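The paper's implementation is in Matlab; the ensemble idea can be sketched in Python under simplifying assumptions: single-feature ensemble members instead of multi-feature combinations, no prewhitening, and a crude centroid-gap-over-spread ratio standing in for the paper's informativeness weight.

```python
import random

def two_means_1d(vals, iters=10):
    """2-means on one feature, initialized at the extremes."""
    c1, c2 = min(vals), max(vals)
    for _ in range(iters):
        g1 = [v for v in vals if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in vals if abs(v - c1) > abs(v - c2)]
        if not g1 or not g2:
            break
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    labels = [0 if abs(v - c1) <= abs(v - c2) else 1 for v in vals]
    return labels, c1, c2

def separation_weight(vals, labels, c1, c2):
    """Crude informativeness score: centroid gap over within-cluster spread."""
    within = [abs(v - (c1 if l == 0 else c2)) for v, l in zip(vals, labels)]
    spread = sum(within) / len(within) + 1e-9
    return abs(c1 - c2) / spread

def weighted_eac(X, n_runs=40, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    co = [[0.0] * n for _ in range(n)]     # weighted co-association matrix
    total = 0.0
    for _ in range(n_runs):
        f = rng.randrange(d)               # feature subsample (size 1 here)
        vals = [x[f] for x in X]
        labels, c1, c2 = two_means_1d(vals)
        w = separation_weight(vals, labels, c1, c2)
        total += w
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w
    # final partition: connected components where co-association > half the mass
    out = [None] * n
    cid = 0
    for i in range(n):
        if out[i] is None:
            cid += 1
            stack = [i]
            while stack:
                u = stack.pop()
                if out[u] is None:
                    out[u] = cid
                    stack.extend(v for v in range(n)
                                 if out[v] is None and co[u][v] > 0.5 * total)
    return out
```

Because members built on well-separated (informative) features receive much larger weights than members built on noise features, the weighted co-association matrix recovers the true groups even when half the features are uninformative.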


2016 ◽  
Vol 11 (1) ◽  
pp. 93-99 ◽  
Author(s):  
Na Su ◽  
Jimin Liu ◽  
Changqing Yan ◽  
Taian Liu ◽  
Xinjun An

This paper proposes VDStream, a new and effective method for discovering arbitrarily shaped clusters over variable-density data streams. The algorithm reduces the influence of historical data and effectively eliminates the interference of noisy data. When the density of the data stream changes, VDStream dynamically adjusts its density parameters to find precise clusters. Experiments demonstrate the effectiveness and efficiency of VDStream.


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using the k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest-neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term "k-NN graph" and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it to subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy: "upward" clustering, which assembles one cluster after another. To date, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
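The authors' "upward" assembly strategy is not specified in detail here; a generic graph-based baseline illustrates the data structure involved: build a mutual k-NN graph and read clusters off its connected components.

```python
import math

def knn_graph_clusters(points, k):
    """Cluster by linking each point to its k nearest neighbors (keeping
    mutual links only) and taking connected components of the graph."""
    n = len(points)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        nbrs.append(set(order[:k]))
    # mutual k-NN adjacency: i-j linked only if each is among the other's k-NN
    adj = [[j for j in nbrs[i] if i in nbrs[j]] for i in range(n)]
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is None:
            cid += 1
            stack = [i]
            while stack:
                u = stack.pop()
                if labels[u] is None:
                    labels[u] = cid
                    stack.extend(adj[u])
    return labels
```

The mutual-link restriction prevents a single point on the fringe of one cluster from bridging it to another, which is why k-NN graphs are attractive for relevé data with outliers.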


Author(s):  
N. P. Szabó ◽  
B. A. Braun ◽  
M. M. G. Abdelrahman ◽  
M. Dobróka

Abstract The identification of lithology, fluid types, and total organic carbon content is of high priority in the exploration of unconventional hydrocarbons. As a new alternative, a further-developed K-means-type clustering method is suggested for the evaluation of shale gas formations. The traditional approach to cluster analysis is mainly based on the Euclidean distance for grouping the objects of multivariate observations into different clusters. The high sensitivity of the L2 norm to non-Gaussian-distributed measurement noise is well known; it can be reduced by selecting a more suitable norm as the distance metric. To suppress the harmful effect of non-systematic errors and outlying data, the Most Frequent Value method, a robust statistical estimator, is combined with the K-means clustering algorithm. The Cauchy-Steiner weights calculated by the Most Frequent Value procedure are applied to measure the weighted distance between the objects, which improves the performance of cluster analysis compared to the Euclidean norm. At the same time, the centroids are also calculated as weighted averages (using the Most Frequent Value method) instead of arithmetic means. The suggested statistical method is tested on synthetic datasets as well as observed wireline logs, mud-logging data, and core samples collected from the Barnett Shale Formation, USA. The synthetic experiment using extremely noisy well logs demonstrates that the newly developed robust clustering procedure is able to separate the geological-lithological units in hydrocarbon formations and provide additional information to standard well-log analysis. It is also shown that Cauchy-Steiner weighted cluster analysis is less affected by outliers, which allows more efficient processing of poor-quality wireline logs and an improved evaluation of shale gas reservoirs.
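As a sketch of the idea (not the authors' exact MFV procedure, which also iterates the scale parameter), K-means can be made robust by computing centroids as Cauchy-weighted means with a fixed scale ε, so outlying samples barely move the centroid.

```python
import math

def cauchy_weight(r, eps):
    """Cauchy-type weight: residuals much larger than eps are downweighted."""
    return eps ** 2 / (eps ** 2 + r ** 2)

def robust_kmeans(X, k, eps=1.0, iters=25, init=None):
    cents = list(init) if init is not None else list(X[:k])
    dim = len(X[0])
    for _ in range(iters):
        # assignment step: plain Euclidean nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
        new = []
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if not members:
                new.append(cents[c])
                continue
            # update step: Cauchy-weighted mean instead of arithmetic mean
            w = [cauchy_weight(math.dist(x, cents[c]), eps) for x in members]
            s = sum(w)
            new.append(tuple(sum(wi * x[d] for wi, x in zip(w, members)) / s
                             for d in range(dim)))
        cents = new
    labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
    return labels, cents
```

In the test below, an outlier at (4, 0) would drag an arithmetic-mean centroid of the origin cluster out to x ≈ 0.86; the Cauchy-weighted centroid stays near the origin.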


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5697
Author(s):  
Chang Sun ◽  
Shihong Yue ◽  
Qi Li ◽  
Huaxiang Wang

Component fraction (CF) is one of the most important parameters in multiphase flow. Due to the complexity of solid–liquid two-phase flow, CF estimation has long remained unsolved in both scientific research and industrial application. Electrical resistance tomography (ERT) is an advanced conductivity detection technique owing to its low-cost, fast-response, non-invasive, and non-radiative characteristics. However, when the existing ERT method is used to measure the CF value in solid–liquid two-phase flow in dredging engineering, there are at least three problems: (1) dependence on a reference distribution whose CF value is zero; (2) the detected objects may be too small to be found by ERT; and (3) there is no efficient way to estimate the effect of artifacts in ERT. In this paper, we propose a method based on the clustering technique, in which a fast fuzzy clustering algorithm partitions the ERT image into three clusters corresponding to the liquid phase, the solid phase, and their mixtures together with artifacts, respectively. The clustering algorithm needs no reference distribution for CF estimation. In the case of small solid objects or artifacts, the CF value can still be computed effectively using prior information. To validate the new method, a group of typical CF estimations in dredging engineering was implemented. Results show that the new method effectively overcomes the limitations of the existing method and provides a practical and more accurate way to estimate CF.
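The paper's fast fuzzy algorithm is not given in the abstract; plain fuzzy c-means on 1-D pixel values conveys the idea of a three-way partition with graded memberships (the c = 3 initialisation and the phase reading of the clusters are illustrative):

```python
def fuzzy_cmeans_1d(vals, c=3, m=2.0, iters=40):
    """Minimal 1-D fuzzy c-means. For ERT this would run on pixel
    conductivities, with the three clusters read as liquid, solid,
    and the mixed/artifact class."""
    # simple spread-out initial centres (assumes c == 3)
    cents = [min(vals), sum(vals) / len(vals), max(vals)]
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = []
        for v in vals:
            d = [abs(v - ck) + 1e-9 for ck in cents]
            U.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                      for k in range(c)])
        # centre update: fuzzily weighted means
        cents = [sum(U[i][k] ** m * vals[i] for i in range(len(vals))) /
                 sum(U[i][k] ** m for i in range(len(vals)))
                 for k in range(c)]
    labels = [min(range(c), key=lambda k: abs(v - cents[k])) for v in vals]
    return labels, cents, U
```

Unlike hard clustering, the membership rows of `U` sum to one per pixel, which is what makes a fraction-style (CF-like) reading of mixed pixels possible.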


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

Abstract We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.


Author(s):  
Galina Merkuryeva ◽  
Vitaly Bolshakov ◽  
Maksims Kornevs

An Integrated Approach to Product Delivery Planning and Scheduling

Product delivery planning and scheduling is a high-priority task in transport logistics. In distribution centres this task concerns deliveries of various types of goods within predefined time windows. In real-life applications the problem has different stochastic performance criteria and conditions. Optimisation of schedules is itself time-consuming and requires expert knowledge. In this paper an integrated approach to product delivery planning and scheduling is proposed. It is based on cluster analysis of store demand data to identify typical dynamic demand patterns and product delivery tactical plans, and on simulation optimisation to find optimal parameters of transportation or vehicle schedules. Here, a cluster analysis of the demand data using the K-means clustering algorithm and mean silhouette values is performed, and an NBTree-based classification model is built. To find an optimal grouping of stores into regions based on their geographical locations, with the total demand distributed uniformly over regions, a multiobjective optimisation problem is formulated and solved with the NSGA-II algorithm.
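The demand-clustering step, K-means scored by mean silhouette values to choose the number of clusters, can be sketched as follows (the farthest-point seeding is an assumption added for determinism, not taken from the paper):

```python
import math

def kmeans(X, k, iters=30):
    # deterministic farthest-point seeding, then standard Lloyd iterations
    cents = [X[0]]
    while len(cents) < k:
        cents.append(max(X, key=lambda x: min(math.dist(x, c) for c in cents)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
        for c in range(k):
            g = [x for x, l in zip(X, labels) if l == c]
            if g:
                cents[c] = tuple(sum(col) / len(g) for col in zip(*g))
    return labels

def mean_silhouette(X, labels):
    def avg_dist(x, idx):
        return sum(math.dist(x, X[j]) for j in idx) / len(idx)
    total = 0.0
    for i, x in enumerate(X):
        own = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
        if not own:
            continue                        # singleton: silhouette taken as 0
        a = avg_dist(x, own)                # mean distance within own cluster
        b = min(avg_dist(x, [j for j, l in enumerate(labels) if l == c])
                for c in set(labels) if c != labels[i])
        total += (b - a) / max(a, b)
    return total / len(X)

def best_k(X, k_range=range(2, 6)):
    def score(k):
        labels = kmeans(X, k)
        return mean_silhouette(X, labels) if len(set(labels)) > 1 else -1.0
    return max(k_range, key=score)
```

For well-separated demand patterns the mean silhouette peaks at the true number of groups, which is the criterion the paper uses to fix the number of typical demand patterns.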


The proposed research work aims to perform cluster analysis in the field of precision agriculture. The k-means technique is implemented to cluster the agriculture data. Selecting the K value plays a major role in the k-means algorithm, and different techniques are used to identify the number of clusters (the k-value). Identifying suitable initial centroids also has an important role in k-means; in general they are selected randomly. In the proposed work, to obtain stable results, Hybrid K-Means clustering is used to identify the initial centroids. Since the initial cluster centers are well defined, Hybrid K-Means acts as a stable clustering technique.
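The abstract does not state which hybrid seeding is used; one simple deterministic scheme shows why well-defined initial centroids make repeated runs stable (the sort-by-distance-from-mean chunking here is illustrative, not the paper's method):

```python
import math

def deterministic_init(X, k):
    """Deterministic seeding: sort points by distance from the data mean,
    split into k chunks, use the chunk means as initial centroids."""
    mean = tuple(sum(col) / len(X) for col in zip(*X))
    order = sorted(X, key=lambda x: math.dist(x, mean))
    chunk = len(X) // k
    cents = []
    for c in range(k):
        g = order[c * chunk:] if c == k - 1 else order[c * chunk:(c + 1) * chunk]
        cents.append(tuple(sum(col) / len(g) for col in zip(*g)))
    return cents
```

Because there is no random draw, every run of k-means started from these centroids follows the same trajectory, which is the stability property the abstract attributes to Hybrid K-Means.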


2021 ◽  
Vol 263 (3) ◽  
pp. 3407-3416
Author(s):  
Tyler Dare

Measuring the forces that excite a structure into vibration is an important tool in modeling the system and investigating ways to reduce the vibration. However, determining the forces that have been applied to a vibrating structure can be a challenging inverse problem, even when the structure is instrumented with a large number of sensors. Previously, an artificial neural network was developed to identify the location of an impulsive force on a rectangular plate. In this research, the techniques were extended to plates of arbitrary shape. The principal challenge of arbitrary shapes is that some combinations of network outputs (x- and y-coordinates) are invalid. For example, for a plate with a hole in the middle, the network should not output that the force was applied in the center of the hole. Different methods of accommodating arbitrary shapes were investigated, including output space quantization and selecting the closest valid region.
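The "closest valid region" post-processing can be sketched directly: keep a list of valid plate locations (here a hypothetical unit plate with a central hole) and project any raw (x, y) network output onto the nearest one.

```python
import math

def plate_with_hole(nx=20, ny=20, hole_center=(0.5, 0.5), hole_r=0.2):
    """Grid of candidate force locations on a unit plate, minus a hole."""
    pts = [(i / (nx - 1), j / (ny - 1)) for i in range(nx) for j in range(ny)]
    return [p for p in pts if math.dist(p, hole_center) > hole_r]

def snap_to_valid(xy, valid_points):
    """Post-process a raw network (x, y) output by projecting it onto the
    nearest valid location."""
    return min(valid_points, key=lambda p: math.dist(xy, p))
```

A prediction that falls inside the hole is moved to the nearest point on the plate, while predictions already on valid material move at most half a grid spacing.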

