AN EFFICIENT CLUSTERING METHOD FOR DBSCAN GEOGRAPHIC SPATIO-TEMPORAL LARGE DATA WITH IMPROVED PARAMETER OPTIMIZATION

Author(s):  
J. W. Li ◽  
X. Q. Han ◽  
J. W. Jiang ◽  
Y. Hu ◽  
L. Liu

Abstract. How to establish an effective method for analysing large geographic spatio-temporal data, and to quickly and accurately find the hidden value behind geographic information, has become a current research focus. Researchers have found that clustering analysis methods from the data mining field can effectively mine the knowledge and information hidden in complex and massive spatio-temporal data, and density-based clustering is one of the most important of these methods. However, the traditional DBSCAN clustering algorithm has drawbacks in parameter selection that are difficult to overcome. For example, its two important parameters, the Eps neighborhood and the MinPts density, must be set manually. The parameter-setting guidelines of the traditional DBSCAN algorithm do not reliably lead to parameters that give reasonable clustering results, so the algorithm cannot produce accurate clusterings. To solve the problems of misclassification and density sparsity caused by unreasonable parameter selection in DBSCAN, this paper proposes an efficient DBSCAN-based density clustering method with improved parameter optimization. Its evaluation index function (Optimal Distance) is obtained by cycling through k-clustering runs in turn, and the optimal solution is selected. The optimal k-value is then used to cluster the samples. Through mathematical and physical analysis, appropriate values of Eps and MinPts are determined. Finally, the clustering results are obtained by DBSCAN clustering. Experiments show that this method can select reasonable parameters for DBSCAN clustering, which demonstrates the superiority of the method described in this paper.
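The abstract does not spell out the Optimal Distance function, so the sketch below does not reproduce the authors' method. It illustrates a different, well-known way of choosing Eps for a given MinPts: the k-distance heuristic, where each point's distance to its k-th nearest neighbor is computed and Eps is read off the sorted list. The helper names, the quantile rule, and the toy points are assumptions made for illustration only.

```python
import math

def kth_neighbor_distances(points, k):
    """Distance from each point to its k-th nearest neighbor (brute force)."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return result

def suggest_eps(points, min_pts, quantile=0.9):
    """Hypothetical helper: take Eps as a quantile of the sorted k-distances."""
    k_dists = sorted(kth_neighbor_distances(points, min_pts))
    index = min(len(k_dists) - 1, int(quantile * len(k_dists)))
    return k_dists[index]

# Toy data (made up): two tight groups of four points and one far outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
eps = suggest_eps(points, min_pts=2, quantile=0.5)
```

With these toy points, every clustered point has a 2nd-neighbor distance of 1, while the outlier's is far larger, so a mid-range quantile yields an Eps that covers the dense groups and leaves the outlier as noise.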

2014 ◽  
Vol 24 (1) ◽  
pp. 151-163 ◽  
Author(s):  
Kristian Sabo

Abstract. In this paper, we consider the l1-clustering problem for a finite data-point set which should be partitioned into k disjoint nonempty subsets. In that case, the objective function need not be either convex or differentiable, and generally it may have many local or global minima. Therefore, it becomes a complex global optimization problem. A method of searching for a locally optimal solution is proposed in the paper, the convergence of the corresponding iterative process is proved, and the corresponding algorithm is given. The method is illustrated and compared with some other clustering methods, especially the l2-clustering method, which is also known in the literature as a smooth k-means method, in a few typical situations, such as the presence of outliers among the data and the clustering of incomplete data. Numerical experiments show that in these cases the proposed l1-clustering algorithm is faster and gives significantly better results than the l2-clustering algorithm.
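To make the outlier remark concrete, here is a minimal one-dimensional sketch, not the paper's algorithm. For a single cluster, the center minimizing the sum of absolute deviations (l1) is the median, while the center minimizing the sum of squared deviations (l2) is the mean. The data values are made up for illustration.

```python
from statistics import mean, median

def l1_cost(data, center):
    """Sum of absolute deviations from the center."""
    return sum(abs(x - center) for x in data)

def l2_cost(data, center):
    """Sum of squared deviations from the center."""
    return sum((x - center) ** 2 for x in data)

# Four nearby values and one outlier (illustrative numbers).
data = [1.0, 2.0, 3.0, 4.0, 100.0]

l1_center = median(data)  # stays among the bulk of the data
l2_center = mean(data)    # pulled toward the outlier
```

The single outlier drags the l2 center far from the other four values, while the l1 center stays inside them, which is the kind of robustness the abstract attributes to l1-clustering.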


2021 ◽  
Vol 15 ◽  
pp. 14-18
Author(s):  
Arun Pratap Singh Kushwah ◽  
Shailesh Jaloree ◽  
Ramjeevan Singh Thakur

Clustering is a data mining approach that helps us find the underlying hidden structure in a dataset. K-means is a clustering method that uses distance functions to find the similarities or dissimilarities between instances. DBSCAN is a clustering algorithm that discovers clusters of arbitrary shapes and sizes in huge volumes of data using a spatial density method. These two approaches are classical methods for efficient clustering, but they underperform when the data in the database is updated frequently, so incremental or gradual clustering approaches are preferred in this environment. In this paper, an incremental approach to clustering is introduced that uses K-means and DBSCAN to handle new data that is dynamically added to the database at intervals.
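The abstract does not describe the authors' incremental scheme, so the following is only a sketch of one standard way to update K-means without reclustering: the sequential update, where a new point is assigned to its nearest center and that center moves toward the point by 1/n of the gap. The function name and the starting values are hypothetical.

```python
import math

def add_point(centers, counts, point):
    """Assign point to its nearest center and update that center in place.

    centers: list of lists of floats; counts: points absorbed per center.
    Returns the index of the center that received the point.
    """
    nearest = min(range(len(centers)),
                  key=lambda i: math.dist(centers[i], point))
    counts[nearest] += 1
    step = 1.0 / counts[nearest]
    centers[nearest] = [c + step * (x - c)
                        for c, x in zip(centers[nearest], point)]
    return nearest

# Two existing clusters, each seeded by a single point (made-up values).
centers = [[0.0, 0.0], [10.0, 10.0]]
counts = [1, 1]

first = add_point(centers, counts, (2.0, 0.0))     # joins cluster 0
second = add_point(centers, counts, (10.0, 12.0))  # joins cluster 1
```

Each update costs one pass over the centers rather than a pass over the whole dataset, and the updated center equals the running mean of all points assigned to it so far.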


2011 ◽  
Vol 268-270 ◽  
pp. 10-15
Author(s):  
Jun Yan Chen

This paper presents a hybrid clustering algorithm that applies a stochastic disturbance of particle swarm optimization (PSO) to the K-means clustering method (SDPSO-K). The proposed algorithm can improve the global searching ability of the particles in PSO, avoiding the K-means disadvantage of being easily trapped in a locally optimal solution while saving the expensive computational cost of PSO clustering. The performance of SDPSO-K, compared with three recently developed modified PSO techniques and related clustering algorithms on six datasets, indicates that the SDPSO-K algorithm is clearly and consistently superior in terms of precision and robustness.


Author(s):  
Vitaly Kholodovsky ◽  
Xin-Zhong Liang

Abstract. Extreme weather and climate events such as floods, droughts, and heat waves can cause extensive societal damages. While various statistical and climate models have been developed for the purpose of simulating extremes, a consistent definition of extreme events is still lacking. Furthermore, to better assess the performance of the climate models, a variety of spatial forecast verification measures have been developed. However, in most cases, the spatial verification measures that are widely used to compare mean states do not have sufficient theoretical justification to benchmark extreme events. In order to alleviate inconsistencies when defining extreme events within different scientific communities, we propose a new generalized Spatio-Temporal Threshold Clustering method for the identification of extreme event episodes, which uses machine learning techniques to couple existing pattern recognition indices with high or low threshold choices. The method consists of five main steps: (1) construction of essential field quantities; (2) dimension reduction; (3) spatial domain mapping; (4) time series clustering; and (5) threshold selection. We develop and apply this method using a gridded daily precipitation dataset derived from rain gauge stations over the contiguous United States. We observe changes in the distribution of conditional frequency of extreme precipitation from large-scale well-connected spatial patterns to smaller-scale more isolated rainfall clusters, possibly leading to more localized droughts and heat waves, especially during the summer months. The proposed method automates the threshold selection process through a clustering algorithm and can be directly applicable in conjunction with modeling and spatial forecast verification of extremes. 
Additionally, it allows for the identification of synoptic-scale spatial patterns that can be directly traced to the individual extreme episodes, and it offers users the flexibility to select an extreme threshold that is linked to the desired geometrical properties. The approach can be applied to broad scientific disciplines.


2019 ◽  
Vol 1 (1) ◽  
pp. 31-39
Author(s):  
Ilham Safitra Damanik ◽  
Sundari Retno Andani ◽  
Dedi Sehendro

Milk is an important intake for meeting nutritional needs, and it is consumed by both children and adults. Indonesia has many producers of fresh milk, but their output is not sufficient for national milk needs. Data mining is a field of computer science that is widely used in research. One of the data mining techniques is clustering, a method that groups data. The clustering method works better when a large amount of data is used. The data used are provincial data for Indonesia from 2000 to 2017, obtained from the Central Statistics Agency. The study clusters the data into two groups of milk-producing regions: high producers and low producers. From 27 data records on fresh milk production in Indonesia, two provinces fall into the high-level cluster, namely West Java and East Java. The other 25 records, including 7 provinces that were not part of the K-Means Clustering Algorithm calculation, fall into the low-level cluster.
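The two-group split described above can be sketched with a one-dimensional K-means for k = 2. This is a minimal illustration, not the study's code, and the production figures below are made-up numbers, not Central Statistics Agency data. Seeding the two centers at the minimum and maximum value is an assumption that keeps the run deterministic.

```python
def kmeans_two_groups(values, max_iter=100):
    """Split 1-D values into a low and a high cluster with K-means (k = 2)."""
    low, high = min(values), max(values)  # deterministic seeding (assumption)
    low_group, high_group = list(values), []
    for _ in range(max_iter):
        low_group = [v for v in values if abs(v - low) <= abs(v - high)]
        high_group = [v for v in values if abs(v - low) > abs(v - high)]
        if not high_group:  # all values identical: nothing to split
            break
        new_low = sum(low_group) / len(low_group)
        new_high = sum(high_group) / len(high_group)
        if (new_low, new_high) == (low, high):  # centers stopped moving
            break
        low, high = new_low, new_high
    return low_group, high_group

# Hypothetical production values for six regions (not real data).
production = [1.0, 2.0, 3.0, 2.5, 90.0, 110.0]
low_group, high_group = kmeans_two_groups(production)
```

With a few very large producers and many small ones, the high cluster ends up small, which matches the shape of the result reported in the abstract.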


Author(s):  
Ana Belén Ramos-Guajardo

Abstract. A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals that can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed and the approach is applied to two real-life situations.
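The abstract does not give the merging rule or the exact stopping criterion, so the following is only a sketch of how a p-value matrix could drive an iterative clustering. It assumes complete linkage (the similarity of two clusters is the smallest pairwise p-value between their members) and stops when no pair of clusters exceeds a significance level alpha. The function name, the linkage rule, alpha, and the matrix values are all assumptions, and the bootstrap test itself is not implemented.

```python
def cluster_by_pvalues(pvalues, alpha=0.05):
    """Agglomerative clustering using p-values as similarities.

    pvalues: symmetric matrix (list of lists); pvalues[i][j] is the p-value
    of the similarity test between items i and j.
    """
    clusters = [[i] for i in range(len(pvalues))]

    def linkage(a, b):
        # Complete linkage: the weakest evidence of similarity decides.
        return min(pvalues[i][j] for i in a for j in b)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = linkage(clusters[x], clusters[y])
                if score > best:
                    best, pair = score, (x, y)
        if best <= alpha:  # no remaining pair is statistically similar
            break
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters

# Made-up p-values: items 0 and 1 are similar, as are items 2 and 3.
p = [[1.00, 0.80, 0.01, 0.01],
     [0.80, 1.00, 0.01, 0.01],
     [0.01, 0.01, 1.00, 0.60],
     [0.01, 0.01, 0.60, 1.00]]
groups = cluster_by_pvalues(p)
```

Using the significance level as the stopping point means the number of clusters is not fixed in advance, which is the role the abstract assigns to its objective stopping criterion.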


IEEE Access ◽  
2021 ◽  
Vol 9 ◽  
pp. 43364-43377
Author(s):  
Xirui Xue ◽  
Shucai Huang ◽  
Jiahao Xie ◽  
Jiashun Ma ◽  
Ning Li

2013 ◽  
Vol 321-324 ◽  
pp. 1939-1942
Author(s):  
Lei Gu

The locality sensitive k-means clustering method has been presented recently. Although this approach can improve clustering accuracy, it often yields unstable clustering results because random samples are used as the initial centers. In this paper, an initialization method based on core clusters is used for locality sensitive k-means clustering. The core clusters are formed by constructing the σ-neighborhood graph, and their centers are taken as the initial centers of the locality sensitive k-means clustering. To investigate the effectiveness of our approach, several experiments are carried out on three datasets. Experimental results show that our proposed method improves clustering performance compared to the previous locality sensitive k-means clustering.
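The abstract does not define how core clusters are extracted from the σ-neighborhood graph, so the sketch below makes an explicit assumption: two points are linked when their distance is at most σ, core clusters are the connected components of that graph, and the centroids of the k largest components serve as initial centers. The function name and the sample points are hypothetical.

```python
import math

def core_cluster_centers(points, sigma, k):
    """Centroids of the k largest connected components of the sigma-graph."""
    n = len(points)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:  # depth-first search over sigma-neighbors
            i = stack.pop()
            component.append(i)
            for j in range(n):
                if not seen[j] and math.dist(points[i], points[j]) <= sigma:
                    seen[j] = True
                    stack.append(j)
        components.append(component)
    components.sort(key=len, reverse=True)  # largest components first
    dim = len(points[0])
    return [tuple(sum(points[i][d] for i in comp) / len(comp)
                  for d in range(dim))
            for comp in components[:k]]

# Made-up points: a group of three, a group of two, and an isolated point.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]
initial_centers = core_cluster_centers(points, sigma=1.5, k=2)
```

Because the components depend only on the data and σ, repeated runs return the same initial centers, which is the stability benefit the abstract claims over random seeding.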


Author(s):  
Shigang Wang ◽  
Shuai Peng ◽  
Jiawen He

Point clouds of oral-scan dentures contain a large amount of data and many redundant points. A feature-preserving point cloud simplification algorithm is proposed to solve the problems that feature preservation is incomplete when processing point cloud data and that cavities occur in relatively flat regions. First, the algorithm uses a kd-tree to construct the spatial topology of the point cloud and to search the k-neighborhood of each sampling point. On this basis, it calculates the curvature of each point, the angle between normal vectors, the distance from the point to the neighborhood centroid, and the standard deviation and average of the distances from the point to its neighbors, so that the detailed features of the point cloud can be extracted by multi-feature extraction and threshold determination. For the non-characteristic region, the non-characteristic point cloud is spatially divided by an octree to obtain the K-value and the initial cluster centers for the K-means clustering algorithm. The simplified results for the non-characteristic regions are obtained after further subdivision. Finally, the extracted detail features and the simplified result for the non-characteristic region are merged to obtain the final simplification result. The experimental results show that the algorithm retains the characteristic information of the point cloud model better and effectively avoids holes in the simplification process. The simplified results have better smoothness, simplicity and precision, and are of high practical value.
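One of the features listed above, the distance from a point to the centroid of its k-neighborhood, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses a brute-force neighbor search instead of a kd-tree, considers only this one feature, and the threshold value and sample points are made up.

```python
import math

def k_nearest(points, i, k):
    """Indices of the k points closest to points[i] (brute force)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def centroid_distance(points, i, k):
    """Distance from points[i] to the centroid of its k nearest neighbors."""
    neighbors = k_nearest(points, i, k)
    dim = len(points[i])
    centroid = tuple(sum(points[j][d] for j in neighbors) / k
                     for d in range(dim))
    return math.dist(points[i], centroid)

def split_by_feature(points, k, threshold):
    """Indices of feature points (large offset) and of flat-region points."""
    feature, flat = [], []
    for i in range(len(points)):
        target = feature if centroid_distance(points, i, k) > threshold else flat
        target.append(i)
    return feature, flat

# A flat 3x3 grid with its middle point raised to form a spike (made up).
points = [(x, y, 5 if (x, y) == (1, 1) else 0)
          for x in range(3) for y in range(3)]
feature_idx, flat_idx = split_by_feature(points, k=4, threshold=2.0)
```

Points on a flat patch sit close to the centroid of their neighbors, while a point on a sharp feature is offset from it, so thresholding this distance separates the two groups. The flat group is the part that the abstract then passes to octree division and K-means.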

