PxEBCA: Proximity Expansion Based Clustering Algorithm

Author(s):  
Bhumika S. Arora, Dr. Vijay Chavda, Dr. Bhadresh R. Pandya

Cluster analysis is one of the main techniques for analysing data: it detects groups of similar objects without any pre-specified grouping criteria. Detecting clusters is challenging when they vary in size, density and shape. DBSCAN can find arbitrarily shaped clusters along with outliers, but it cannot handle clusters of differing density. This paper presents PxEBCA, a new density-based method that discovers clusters of arbitrary shape on datasets of varying density. The effectiveness and efficiency of PxEBCA were evaluated experimentally on synthetic data; the results demonstrate that PxEBCA is significantly more effective at discovering arbitrarily shaped clusters with varying densities.
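The abstract does not detail PxEBCA itself, but the single-global-ε limitation of DBSCAN that it targets is easy to see in a minimal sketch (this is standard DBSCAN, not the proposed algorithm): with ε tuned to a dense cluster, an equally coherent sparse cluster is discarded as noise.

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: one global eps, so clusters of very
    different densities cannot both be recovered."""
    labels = [None] * len(points)          # None = unvisited, -1 = noise

    def neighbors(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1                 # noise (may later become a border point)
            continue
        cid += 1
        labels[i] = cid
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid            # noise reclaimed as border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:       # core point: keep expanding
                queue.extend(nbrs)
    return labels
```

With `eps=0.5` a tight cluster (spacing ≈ 0.1) is found, while a sparse cluster (spacing 2.0) is entirely labelled noise; a varying-density method such as PxEBCA is meant to recover both.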

2019 ◽  
Author(s):  
William Wong ◽  
Naotsugu Tsuchiya

Evidence accumulation clustering (EAC) is an ensemble clustering algorithm that can cluster data with arbitrary shapes and numbers of clusters. Here, we present a variant of EAC aimed at better clustering data with a large number of features, many of which may be uninformative. Our new method builds on the existing EAC algorithm by populating the clustering ensemble with clusterings, each built from a subset of the original features. Our method also calls for prewhitening the recombined data and weighting the influence of each individual clustering by an estimate of its informativeness. We provide an example implementation of the algorithm in Matlab and demonstrate its effectiveness relative to ordinary evidence accumulation clustering on synthetic data.
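The paper's implementation is in Matlab; the ensemble idea can be sketched in Python under simplifying assumptions: single-feature ensemble members instead of multi-feature combinations, no prewhitening, and a crude centroid-gap-over-spread ratio standing in for the paper's informativeness weight.

```python
import random

def two_means_1d(vals, iters=10):
    """2-means on one feature, initialized at the extremes."""
    c1, c2 = min(vals), max(vals)
    for _ in range(iters):
        g1 = [v for v in vals if abs(v - c1) <= abs(v - c2)]
        g2 = [v for v in vals if abs(v - c1) > abs(v - c2)]
        if not g1 or not g2:
            break
        c1, c2 = sum(g1) / len(g1), sum(g2) / len(g2)
    labels = [0 if abs(v - c1) <= abs(v - c2) else 1 for v in vals]
    return labels, c1, c2

def separation_weight(vals, labels, c1, c2):
    """Crude informativeness score: centroid gap over within-cluster spread."""
    within = [abs(v - (c1 if l == 0 else c2)) for v, l in zip(vals, labels)]
    spread = sum(within) / len(within) + 1e-9
    return abs(c1 - c2) / spread

def weighted_eac(X, n_runs=40, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    co = [[0.0] * n for _ in range(n)]     # weighted co-association matrix
    total = 0.0
    for _ in range(n_runs):
        f = rng.randrange(d)               # feature subsample (size 1 here)
        vals = [x[f] for x in X]
        labels, c1, c2 = two_means_1d(vals)
        w = separation_weight(vals, labels, c1, c2)
        total += w
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += w
    # final partition: connected components where co-association > half the mass
    out = [None] * n
    cid = 0
    for i in range(n):
        if out[i] is None:
            cid += 1
            stack = [i]
            while stack:
                u = stack.pop()
                if out[u] is None:
                    out[u] = cid
                    stack.extend(v for v in range(n)
                                 if out[v] is None and co[u][v] > 0.5 * total)
    return out
```

Because members built on well-separated (informative) features receive much larger weights than members built on noise features, the weighted co-association matrix recovers the true groups even when half the features are uninformative.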


2016 ◽  
Vol 11 (1) ◽  
pp. 93-99 ◽  
Author(s):  
Na Su ◽  
Jimin Liu ◽  
Changqing Yan ◽  
Taian Liu ◽  
Xinjun An

This paper proposes VDStream, a new and effective method for discovering arbitrarily shaped clusters over variable-density data streams. The algorithm reduces the influence of historical data and effectively eliminates the interference of noisy data. When the density of the data stream changes, VDStream dynamically adjusts its density parameters to find precise clusters. Experiments demonstrate the effectiveness and efficiency of VDStream.


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using the k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest-neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term "k-NN graph" and several k-NN clustering algorithms appeared (Cover, Hart, 1967; Brito et al., 1997). In biology, k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it to subgraphs by partitioning and coarsening the hypergraph. We developed a different strategy: "upward" clustering, which assembles one cluster after another. To date, graph-based cluster analysis has not been applied to the classification of vegetation datasets.
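The authors' "upward" assembly strategy is not specified in detail here; a generic graph-based baseline illustrates the data structure involved: build a mutual k-NN graph and read clusters off its connected components.

```python
import math

def knn_graph_clusters(points, k):
    """Cluster by linking each point to its k nearest neighbors (keeping
    mutual links only) and taking connected components of the graph."""
    n = len(points)
    nbrs = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        nbrs.append(set(order[:k]))
    # mutual k-NN adjacency: i-j linked only if each is among the other's k-NN
    adj = [[j for j in nbrs[i] if i in nbrs[j]] for i in range(n)]
    labels = [None] * n
    cid = 0
    for i in range(n):
        if labels[i] is None:
            cid += 1
            stack = [i]
            while stack:
                u = stack.pop()
                if labels[u] is None:
                    labels[u] = cid
                    stack.extend(adj[u])
    return labels
```

The mutual-link restriction prevents a single point on the fringe of one cluster from bridging it to another, which is why k-NN graphs are attractive for relevé data with outliers.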


Author(s):  
N. P. Szabó ◽  
B. A. Braun ◽  
M. M. G. Abdelrahman ◽  
M. Dobróka

Abstract The identification of lithology, fluid types, and total organic carbon content is of high priority in the exploration of unconventional hydrocarbons. As a new alternative, a further-developed K-means-type clustering method is suggested for the evaluation of shale gas formations. The traditional approach to cluster analysis is mainly based on the Euclidean distance for grouping the objects of multivariate observations into different clusters. The high sensitivity of the L2 norm to non-Gaussian-distributed measurement noise is well known; it can be reduced by selecting a more suitable norm as the distance metric. To suppress the harmful effect of non-systematic errors and outlying data, the Most Frequent Value method, a robust statistical estimator, is combined with the K-means clustering algorithm. The Cauchy-Steiner weights calculated by the Most Frequent Value procedure are applied to measure the weighted distance between the objects, which improves the performance of cluster analysis compared to the Euclidean norm. At the same time, the centroids are also calculated as weighted averages (using the Most Frequent Value method) instead of arithmetic means. The suggested statistical method is tested on synthetic datasets as well as observed wireline logs, mud-logging data, and core samples collected from the Barnett Shale Formation, USA. The synthetic experiment using extremely noisy well logs demonstrates that the newly developed robust clustering procedure is able to separate the geological-lithological units in hydrocarbon formations and provide additional information to standard well-log analysis. It is also shown that Cauchy-Steiner weighted cluster analysis is less affected by outliers, which allows more efficient processing of poor-quality wireline logs and an improved evaluation of shale gas reservoirs.
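As a sketch of the idea (not the authors' exact MFV procedure, which also iterates the scale parameter), K-means can be made robust by computing centroids as Cauchy-weighted means with a fixed scale ε, so outlying samples barely move the centroid.

```python
import math

def cauchy_weight(r, eps):
    """Cauchy-type weight: residuals much larger than eps are downweighted."""
    return eps ** 2 / (eps ** 2 + r ** 2)

def robust_kmeans(X, k, eps=1.0, iters=25, init=None):
    cents = list(init) if init is not None else list(X[:k])
    dim = len(X[0])
    for _ in range(iters):
        # assignment step: plain Euclidean nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
        new = []
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if not members:
                new.append(cents[c])
                continue
            # update step: Cauchy-weighted mean instead of arithmetic mean
            w = [cauchy_weight(math.dist(x, cents[c]), eps) for x in members]
            s = sum(w)
            new.append(tuple(sum(wi * x[d] for wi, x in zip(w, members)) / s
                             for d in range(dim)))
        cents = new
    labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
    return labels, cents
```

In the test below, an outlier at (4, 0) would drag an arithmetic-mean centroid of the origin cluster out to x ≈ 0.86; the Cauchy-weighted centroid stays near the origin.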


Sensors ◽  
2020 ◽  
Vol 20 (19) ◽  
pp. 5697
Author(s):  
Chang Sun ◽  
Shihong Yue ◽  
Qi Li ◽  
Huaxiang Wang

Component fraction (CF) is one of the most important parameters in multiphase flow. Due to the complexity of solid–liquid two-phase flow, CF estimation has long remained unsolved in both scientific research and industrial application. Electrical resistance tomography (ERT) is an advanced conductivity detection technique owing to its low-cost, fast-response, non-invasive, and non-radiative characteristics. However, when the existing ERT method is used to measure the CF value in solid–liquid two-phase flow in dredging engineering, there are at least three problems: (1) dependence on a reference distribution whose CF value is zero; (2) the detected objects may be too small to be found by ERT; and (3) there is no efficient way to estimate the effect of artifacts in ERT. In this paper, we propose a method based on the clustering technique, in which a fast fuzzy clustering algorithm partitions the ERT image into three clusters corresponding to the liquid phase, the solid phase, and their mixtures together with artifacts, respectively. The clustering algorithm needs no reference distribution for CF estimation. In the case of small solid objects or artifacts, the CF value can still be computed effectively using prior information. To validate the new method, a group of typical CF estimations in dredging engineering was implemented. Results show that the new method effectively overcomes the limitations of the existing method and provides a practical and more accurate way to estimate CF.
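The paper's fast fuzzy algorithm is not given in the abstract; plain fuzzy c-means on 1-D pixel values conveys the idea of a three-way partition with graded memberships (the c = 3 initialisation and the phase reading of the clusters are illustrative):

```python
def fuzzy_cmeans_1d(vals, c=3, m=2.0, iters=40):
    """Minimal 1-D fuzzy c-means. For ERT this would run on pixel
    conductivities, with the three clusters read as liquid, solid,
    and the mixed/artifact class."""
    # simple spread-out initial centres (assumes c == 3)
    cents = [min(vals), sum(vals) / len(vals), max(vals)]
    for _ in range(iters):
        # membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1))
        U = []
        for v in vals:
            d = [abs(v - ck) + 1e-9 for ck in cents]
            U.append([1.0 / sum((d[k] / d[j]) ** (2.0 / (m - 1.0))
                                for j in range(c))
                      for k in range(c)])
        # centre update: fuzzily weighted means
        cents = [sum(U[i][k] ** m * vals[i] for i in range(len(vals))) /
                 sum(U[i][k] ** m for i in range(len(vals)))
                 for k in range(c)]
    labels = [min(range(c), key=lambda k: abs(v - cents[k])) for v in vals]
    return labels, cents, U
```

Unlike hard clustering, the membership rows of `U` sum to one per pixel, which is what makes a fraction-style (CF-like) reading of mixed pixels possible.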


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

Abstract We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.


Author(s):  
Galina Merkuryeva ◽  
Vitaly Bolshakov ◽  
Maksims Kornevs

An Integrated Approach to Product Delivery Planning and Scheduling

Product delivery planning and scheduling is a high-priority task in transport logistics. In distribution centres this task concerns deliveries of various types of goods within predefined time windows. In real-life applications the problem has different stochastic performance criteria and conditions. Optimisation of schedules is itself time-consuming and requires expert knowledge. In this paper an integrated approach to product delivery planning and scheduling is proposed. It is based on cluster analysis of store demand data to identify typical dynamic demand patterns and product delivery tactical plans, and on simulation optimisation to find optimal parameters of transportation or vehicle schedules. Here, a cluster analysis of the demand data using the K-means clustering algorithm and mean silhouette values is performed, and an NBTree-based classification model is built. To find an optimal grouping of stores into regions based on their geographical locations, with the total demand distributed uniformly over regions, a multiobjective optimisation problem is formulated and solved with the NSGA-II algorithm.
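The demand-clustering step, K-means scored by mean silhouette values to choose the number of clusters, can be sketched as follows (the farthest-point seeding is an assumption added for determinism, not taken from the paper):

```python
import math

def kmeans(X, k, iters=30):
    # deterministic farthest-point seeding, then standard Lloyd iterations
    cents = [X[0]]
    while len(cents) < k:
        cents.append(max(X, key=lambda x: min(math.dist(x, c) for c in cents)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(x, cents[c])) for x in X]
        for c in range(k):
            g = [x for x, l in zip(X, labels) if l == c]
            if g:
                cents[c] = tuple(sum(col) / len(g) for col in zip(*g))
    return labels

def mean_silhouette(X, labels):
    def avg_dist(x, idx):
        return sum(math.dist(x, X[j]) for j in idx) / len(idx)
    total = 0.0
    for i, x in enumerate(X):
        own = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
        if not own:
            continue                        # singleton: silhouette taken as 0
        a = avg_dist(x, own)                # mean distance within own cluster
        b = min(avg_dist(x, [j for j, l in enumerate(labels) if l == c])
                for c in set(labels) if c != labels[i])
        total += (b - a) / max(a, b)
    return total / len(X)

def best_k(X, k_range=range(2, 6)):
    def score(k):
        labels = kmeans(X, k)
        return mean_silhouette(X, labels) if len(set(labels)) > 1 else -1.0
    return max(k_range, key=score)
```

For well-separated demand patterns the mean silhouette peaks at the true number of groups, which is the criterion the paper uses to fix the number of typical demand patterns.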


The proposed research work aims to perform cluster analysis in the field of precision agriculture. The k-means technique is implemented to cluster the agriculture data. Selecting the K value plays a major role in the k-means algorithm, and different techniques are used to identify the number of clusters (the k-value). Identifying suitable initial centroids also has an important role in k-means; in general they are selected randomly. In the proposed work, to obtain stable results, Hybrid K-Means clustering is used to identify the initial centroids. Since the initial cluster centers are well defined, Hybrid K-Means acts as a stable clustering technique.
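The abstract does not state which hybrid seeding is used; one simple deterministic scheme shows why well-defined initial centroids make repeated runs stable (the sort-by-distance-from-mean chunking here is illustrative, not the paper's method):

```python
import math

def deterministic_init(X, k):
    """Deterministic seeding: sort points by distance from the data mean,
    split into k chunks, use the chunk means as initial centroids."""
    mean = tuple(sum(col) / len(X) for col in zip(*X))
    order = sorted(X, key=lambda x: math.dist(x, mean))
    chunk = len(X) // k
    cents = []
    for c in range(k):
        g = order[c * chunk:] if c == k - 1 else order[c * chunk:(c + 1) * chunk]
        cents.append(tuple(sum(col) / len(g) for col in zip(*g)))
    return cents
```

Because there is no random draw, every run of k-means started from these centroids follows the same trajectory, which is the stability property the abstract attributes to Hybrid K-Means.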


2021 ◽  
Vol 263 (3) ◽  
pp. 3407-3416
Author(s):  
Tyler Dare

Measuring the forces that excite a structure into vibration is an important tool in modeling the system and investigating ways to reduce the vibration. However, determining the forces that have been applied to a vibrating structure can be a challenging inverse problem, even when the structure is instrumented with a large number of sensors. Previously, an artificial neural network was developed to identify the location of an impulsive force on a rectangular plate. In this research, the techniques were extended to plates of arbitrary shape. The principal challenge of arbitrary shapes is that some combinations of network outputs (x- and y-coordinates) are invalid. For example, for a plate with a hole in the middle, the network should not output that the force was applied in the center of the hole. Different methods of accommodating arbitrary shapes were investigated, including output space quantization and selecting the closest valid region.
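The "closest valid region" post-processing can be sketched directly: keep a list of valid plate locations (here a hypothetical unit plate with a central hole) and project any raw (x, y) network output onto the nearest one.

```python
import math

def plate_with_hole(nx=20, ny=20, hole_center=(0.5, 0.5), hole_r=0.2):
    """Grid of candidate force locations on a unit plate, minus a hole."""
    pts = [(i / (nx - 1), j / (ny - 1)) for i in range(nx) for j in range(ny)]
    return [p for p in pts if math.dist(p, hole_center) > hole_r]

def snap_to_valid(xy, valid_points):
    """Post-process a raw network (x, y) output by projecting it onto the
    nearest valid location."""
    return min(valid_points, key=lambda p: math.dist(xy, p))
```

A prediction that falls inside the hole is moved to the nearest point on the plate, while predictions already on valid material move at most half a grid spacing.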

