AN EFFICIENT CLUSTERING METHOD FOR DBSCAN GEOGRAPHIC SPATIO-TEMPORAL LARGE DATA WITH IMPROVED PARAMETER OPTIMIZATION

Abstract. How to establish an effective method for analyzing large geographic spatio-temporal data, and how to quickly and accurately find the hidden value behind geographic information, has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can effectively mine the knowledge and information hidden in complex, massive spatio-temporal data, and density-based clustering is one of the most important clustering methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome. For example, its two important parameters, the Eps neighborhood and the MinPts density, must be set manually. The guiding principles for parameter setting in the traditional DBSCAN algorithm do not reliably lead to suitable parameters, and with unsuitable parameters the algorithm cannot produce accurate clustering results. To address the misclassification and density sparsity caused by unreasonable parameter selection in DBSCAN, this paper proposes an efficient DBSCAN-based density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by running k-clustering for successive values of k, and the optimal solution is selected. The optimal k value is then used to cluster the samples. Through mathematical and physical analysis, appropriate values of Eps and MinPts are determined. Finally, the clustering results are obtained by DBSCAN clustering. Experiments show that this method selects reasonable parameters for DBSCAN clustering, which supports the superiority of the proposed method.
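
To make the role of the two parameters concrete, the following is a minimal sketch of textbook DBSCAN in pure Python. It does not reproduce the paper's parameter-selection procedure, which the abstract only outlines; the function names `region_query` and `dbscan` and the toy `points` are illustrative assumptions.

```python
from math import dist

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1  # points[i] is a core point: start a new cluster
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
    return labels

# Hypothetical toy data: two tight groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]

print(dbscan(points, eps=1.5, min_pts=3))  # two clusters plus one noise point
print(dbscan(points, eps=0.5, min_pts=3))  # Eps too small: everything is noise
print(dbscan(points, eps=100, min_pts=3))  # Eps too large: a single cluster
```

On this toy data the same points yield two clusters, all noise, or one merged cluster depending only on Eps, which is the sensitivity the abstract refers to.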

Center-based l1–clustering method

International Journal of Applied Mathematics and Computer Science ◽

10.2478/amcs-2014-0012 ◽

2014 ◽

Vol 24 (1) ◽

pp. 151-163 ◽

Cited By ~ 7

Author(s):

Kristian Sabo

Keyword(s):

Incomplete Data ◽

Optimization Problem ◽

Clustering Algorithm ◽

Optimal Solution ◽

Clustering Methods ◽

Global Minima ◽

Clustering Method ◽

Clustering Problem ◽

Point Set ◽

Locally Optimal Solution

Abstract In this paper, we consider the l1-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function does not have to be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved and the corresponding algorithm is given. The method is illustrated by and compared with some other clustering methods, especially with the l2-clustering method, which is also known in the literature as a smooth k-means method, on a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show in this case that the proposed l1-clustering algorithm is faster and gives significantly better results than the l2-clustering algorithm.
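
As a rough illustration of l1-clustering, the sketch below alternates nearest-center assignment under the l1 distance with a component-wise median update (commonly called k-medians). This is a standard scheme swapped in for illustration and is not claimed to be the algorithm proposed in the paper; the names `l1_distance`, `l1_center`, `k_medians` and the sample data are assumptions.

```python
from statistics import median

def l1_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def l1_center(cluster):
    """Component-wise median: a minimizer of the summed l1 distances."""
    return tuple(median(coord) for coord in zip(*cluster))

def k_medians(points, centers, max_iter=100):
    """Alternate assignment and median update until the centers stop moving."""
    centers = list(centers)
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda c: l1_distance(p, centers[c]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster happens to be empty.
        new_centers = [l1_center(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

# Hypothetical data: two groups, the second containing the outlier (100, 100).
points = [(1, 1), (2, 1), (1, 2),
          (10, 10), (11, 10), (10, 11), (12, 11), (100, 100)]
centers, clusters = k_medians(points, [(0, 0), (12, 12)])
print(centers)  # [(1, 1), (11, 11)]
```

The outlier is assigned to the second cluster but its median center stays at (11, 11), whereas the mean of the same five x-coordinates would be 28.6. This reflects the robustness to outliers mentioned in the abstract.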

Computational analysis of incremental clustering approaches for Large Data

International Journal of Computers and Communications ◽

10.46300/91013.2021.15.3 ◽

2021 ◽

Vol 15 ◽

pp. 14-18

Author(s):

Arun Pratap Singh Kushwah ◽

Shailesh Jaloree ◽

Ramjeevan Singh Thakur

Keyword(s):

Data Mining ◽

Clustering Algorithm ◽

Computational Analysis ◽

Large Data ◽

Distance Functions ◽

Spatial Density ◽

Incremental Clustering ◽

Clustering Method ◽

Density Method ◽

Incremental Approach

Clustering is a data mining approach that helps us find the underlying hidden structure in a dataset. K-means is a clustering method that uses distance functions to find the similarities or dissimilarities between instances. DBSCAN is a clustering algorithm that discovers clusters of arbitrary shapes and sizes in huge volumes of data using a spatial density method. These two approaches are classical methods for efficient clustering, but they underperform when the data in the database is updated frequently, so incremental or gradual clustering approaches are preferred in such environments. In this paper, an incremental approach for clustering is introduced using K-means and DBSCAN to handle new data that is dynamically added to the database at intervals.
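
The abstract does not specify the incremental update rule. As a hedged sketch of the general idea, the code below uses the well-known sequential (online) k-means update, in which each arriving point moves only its nearest center by a running-mean step. The class name `IncrementalKMeans` and the choice to count each seed center as one observation are assumptions.

```python
from math import dist

class IncrementalKMeans:
    """Sequential k-means: centers are updated one point at a time."""

    def __init__(self, initial_centers):
        self.centers = [list(c) for c in initial_centers]
        # Assumption: each seed center counts as one already-seen observation.
        self.counts = [1] * len(self.centers)

    def update(self, point):
        """Assign the point to its nearest center and shift that center."""
        idx = min(range(len(self.centers)),
                  key=lambda c: dist(point, self.centers[c]))
        self.counts[idx] += 1
        n = self.counts[idx]
        center = self.centers[idx]
        for d, x in enumerate(point):
            center[d] += (x - center[d]) / n  # running-mean update
        return idx

model = IncrementalKMeans([(0.0, 0.0), (10.0, 10.0)])
model.update((2.0, 0.0))    # nearest to the first center
model.update((10.0, 12.0))  # nearest to the second center
print(model.centers)        # [[1.0, 0.0], [10.0, 11.0]]
```

Because each update touches a single center, newly arriving records can be absorbed without re-clustering the whole database.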

A Stochastic Disturbance of Particle Swarm Optimization for K-Means Clustering Method

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.268-270.10 ◽

2011 ◽

Vol 268-270 ◽

pp. 10-15

Author(s):

Jun Yan Chen

Keyword(s):

Particle Swarm Optimization ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Computational Cost ◽

Particle Swarm ◽

Optimal Solution ◽

Clustering Method ◽

Swarm Optimization ◽

Stochastic Disturbance ◽

Searching Ability

This paper presents a hybrid-clustering algorithm that is a stochastic disturbance of particle swarm optimization (PSO) for K-means clustering method (SDPSO-K). The proposed algorithm can improve the particle global searching ability in PSO to avoid the K-means disadvantage of being easily trapped in a local optimal solution and to save the expensive computational cost of PSO clustering. The performance of the SDPSO-K, compared with three recently developed modified PSO techniques and related clustering algorithms for six datasets, indicates that the SDPSO-K algorithm is clearly and consistently superior in terms of precision and robustness.

A generalized Spatio-Temporal Threshold Clustering method for identification of extreme event patterns

Advances in Statistical Climatology Meteorology and Oceanography ◽

10.5194/ascmo-7-35-2021 ◽

2021 ◽

Vol 7 (1) ◽

pp. 35-52

Author(s):

Vitaly Kholodovsky ◽

Xin-Zhong Liang

Keyword(s):

Spatial Patterns ◽

Extreme Events ◽

Clustering Algorithm ◽

Climate Models ◽

Extreme Event ◽

Heat Waves ◽

Threshold Selection ◽

Forecast Verification ◽

Clustering Method ◽

Spatio Temporal

Abstract. Extreme weather and climate events such as floods, droughts, and heat waves can cause extensive societal damages. While various statistical and climate models have been developed for the purpose of simulating extremes, a consistent definition of extreme events is still lacking. Furthermore, to better assess the performance of the climate models, a variety of spatial forecast verification measures have been developed. However, in most cases, the spatial verification measures that are widely used to compare mean states do not have sufficient theoretical justification to benchmark extreme events. In order to alleviate inconsistencies when defining extreme events within different scientific communities, we propose a new generalized Spatio-Temporal Threshold Clustering method for the identification of extreme event episodes, which uses machine learning techniques to couple existing pattern recognition indices with high or low threshold choices. The method consists of five main steps: (1) construction of essential field quantities; (2) dimension reduction; (3) spatial domain mapping; (4) time series clustering; and (5) threshold selection. We develop and apply this method using a gridded daily precipitation dataset derived from rain gauge stations over the contiguous United States. We observe changes in the distribution of conditional frequency of extreme precipitation from large-scale well-connected spatial patterns to smaller-scale more isolated rainfall clusters, possibly leading to more localized droughts and heat waves, especially during the summer months. The proposed method automates the threshold selection process through a clustering algorithm and can be directly applicable in conjunction with modeling and spatial forecast verification of extremes. 
Additionally, it allows for the identification of synoptic-scale spatial patterns that can be directly traced to the individual extreme episodes, and it offers users the flexibility to select an extreme threshold that is linked to the desired geometrical properties. The approach can be applied to broad scientific disciplines.

Teknik Data Mining Dalam Clustering Produksi Susu Segar Di Indonesia Dengan Algoritma K-Means

BRAHMANA: Jurnal Penerapan Kecerdasan Buatan ◽

10.30645/brahmana.v1i1.5 ◽

2019 ◽

Vol 1 (1) ◽

pp. 31-39

Author(s):

Ilham Safitra Damanik ◽

Sundari Retno Andani ◽

Dedi Sehendro

Keyword(s):

Data Mining ◽

Milk Production ◽

Clustering Algorithm ◽

Clustering Method ◽

Data Mining Techniques ◽

Low Level ◽

Fresh Milk ◽

Nutritional Needs ◽

High Level ◽

Level Cluster

Milk is an important intake for meeting nutritional needs and is consumed by both children and adults. Indonesia has many producers of fresh milk, but production is not sufficient for national milk needs. Data mining is a field of computer science that is widely used in research. One of the data mining techniques is clustering, a method of grouping data. The clustering method works better with a large amount of data. The data used are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study groups the regions into 2 clusters: high milk-producing regions and low milk-producing regions. From 27 data records on fresh milk production in Indonesia, two provinces fall in the high-level cluster, namely West Java and East Java. The remaining 25, along with 7 provinces that were not part of the K-Means clustering calculation, are placed in the low-level cluster.

A hierarchical clustering method for random intervals based on a similarity measure

Computational Statistics ◽

10.1007/s00180-021-01121-3 ◽

2021 ◽

Author(s):

Ana Belén Ramos-Guajardo

Keyword(s):

Hierarchical Clustering ◽

Similarity Measure ◽

Clustering Algorithm ◽

Real Life ◽

Stopping Criterion ◽

Clustering Method ◽

Bootstrap Test ◽

Empirical Performance ◽

Random Intervals ◽

Expected Values

Abstract. A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals that can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed and the approach is applied to two real-life situations.

Research on Anomaly Detection Method Based on DBSCAN Clustering Algorithm

2020 5th International Conference on Information Science, Computer Technology and Transportation (ISCTT) ◽

10.1109/isctt51595.2020.00083 ◽

2020 ◽

Author(s):

Dingsheng Deng

Keyword(s):

Anomaly Detection ◽

Clustering Algorithm ◽

Detection Method ◽

Dbscan Clustering

Resolvable Cluster Target Tracking Based on the DBSCAN Clustering Algorithm and Labeled RFS

IEEE Access ◽

10.1109/access.2021.3066629 ◽

2021 ◽

Vol 9 ◽

pp. 43364-43377

Author(s):

Xirui Xue ◽

Shucai Huang ◽

Jiahao Xie ◽

Jiashun Ma ◽

Ning Li

Keyword(s):

Target Tracking ◽

Clustering Algorithm ◽

Dbscan Clustering

A Novel Locality Sensitive K-Means Clustering Algorithm based on Core Clusters

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.321-324.1939 ◽

2013 ◽

Vol 321-324 ◽

pp. 1939-1942

Author(s):

Lei Gu

Keyword(s):

Clustering Algorithm ◽

Experimental Results ◽

Clustering Method ◽

Neighborhood Graph ◽

The Core ◽

Random Samples

The locality sensitive k-means clustering method was presented recently. Although this approach can improve clustering accuracy, it often yields unstable clustering results because random samples are used as the initial centers. In this paper, an initialization method based on core clusters is used for locality sensitive k-means clustering. The core clusters are formed by constructing the σ-neighborhood graph, and their centers are taken as the initial centers of the locality sensitive k-means clustering. To investigate the effectiveness of the approach, several experiments are carried out on three datasets. Experimental results show that the proposed method improves clustering performance compared with the previous locality sensitive k-means clustering.
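
The abstract does not say how core clusters are extracted from the σ-neighborhood graph. The sketch below makes one plausible assumption: points closer than σ are linked, the k largest connected components stand in for the core clusters, and their means serve as deterministic initial centers. The function names `sigma_components` and `initial_centers` and the sample data are hypothetical.

```python
from math import dist

def sigma_components(points, sigma):
    """Connected components of the graph linking points at distance <= sigma."""
    unseen = set(range(len(points)))
    components = []
    while unseen:
        start = unseen.pop()
        stack, component = [start], [start]
        while stack:
            i = stack.pop()
            linked = [j for j in unseen if dist(points[i], points[j]) <= sigma]
            unseen.difference_update(linked)
            stack.extend(linked)
            component.extend(linked)
        components.append(sorted(component))
    return components

def initial_centers(points, sigma, k):
    """Means of the k largest components (assumed stand-in for core clusters)."""
    largest = sorted(sigma_components(points, sigma), key=len, reverse=True)[:k]
    dims = range(len(points[0]))
    return [tuple(sum(points[i][d] for i in comp) / len(comp) for d in dims)
            for comp in largest]

# Hypothetical data: a 3-point chain, a 4-point square, and one isolated point.
points = [(0, 0), (1, 0), (2, 0),
          (5, 5), (5, 6), (6, 5), (6, 6),
          (20, 20)]
print(initial_centers(points, sigma=1.5, k=2))  # [(5.5, 5.5), (1.0, 0.0)]
```

Unlike random sampling, this initialization returns the same centers on every run for a fixed σ, and the isolated point does not become a seed.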

Simplification algorithm of denture point cloud based on feature preserving

Journal of Computational Methods in Sciences and Engineering ◽

10.3233/jcm-215541 ◽

2021 ◽

pp. 1-14

Author(s):

Shigang Wang ◽

Shuai Peng ◽

Jiawen He

Keyword(s):

Point Cloud ◽

Clustering Algorithm ◽

Characteristic Point ◽

Cloud Model ◽

Average Distance ◽

Normal Vector ◽

Sampling Point ◽

Characteristic Region ◽

K Value ◽

Feature Preserving

The point cloud of an oral-scan denture contains a large amount of data and many redundant points. A point cloud simplification algorithm based on feature preservation is proposed to solve the problems that feature preservation is incomplete when processing point cloud data and that cavities occur in relatively flat regions. First, the algorithm uses a kd-tree to construct the spatial topology of the point cloud and to search the k-neighborhood of each sampling point. On this basis, it calculates the curvature of each point, the angle between normal vectors, the distance from the point to the neighborhood centroid, and the standard deviation and average distance from the point to its neighborhood; the detailed features of the point cloud are then extracted by multi-feature extraction and threshold determination. For the non-characteristic region, the non-characteristic point cloud is spatially divided by an octree to obtain the K value of the K-means clustering algorithm and the initial cluster centers. The simplified results of the non-characteristic regions are obtained after further subdivision. Finally, the extracted detail features and the simplified result of the non-characteristic region are merged to obtain the final simplification result. The experimental results show that the algorithm better retains the characteristic information of the point cloud model and effectively avoids holes in the simplification process. The simplified results have better smoothness, simplicity and precision, and are of high practical value.
