Density-based adaptive spatial clustering algorithm for identifying local high-density areas in georeferenced documents

Calculating and monitoring customer churn metrics is important for companies that want to retain customers and increase profit. In this study, a churn prediction framework is developed using modified spectral clustering (SC). The similarity measure plays a crucial role in clustering, and therefore in predicting churn accurately from industrial data. The linear Euclidean distance used in traditional SC is replaced by the non-linear S-distance (Sd), which is derived from the concept of S-divergence (SD). Several properties of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI and two industrial databases, and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise and conventional SC) are also applied to these 15 databases. The empirical results show that the proposed clustering algorithm outperforms the three existing algorithms in terms of Jaccard index, F-score, recall, precision and accuracy. Finally, the significance of the clustering results is tested with the Wilcoxon signed-rank test, the Wilcoxon rank-sum test and the sign test. The comparative study shows that the results of the proposed algorithm are noteworthy, especially for clusters of arbitrary shape.
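The abstract does not state the formula for Sd. A minimal sketch, assuming the vector form of the S-divergence for strictly positive feature vectors, S(x, y) = Σ_i [log((x_i + y_i)/2) − ½ log(x_i y_i)], with Sd taken as √S. The function names `s_distance` and `gaussian_affinity` and the kernel width `sigma` are illustrative assumptions, not the authors' code; the affinity function only shows where Sd would replace the Euclidean distance when building a spectral-clustering similarity matrix.

```python
import math


def s_distance(x, y):
    """S-distance between two vectors with strictly positive entries.

    Assumed form (vector/diagonal case of the S-divergence):
        S(x, y) = sum_i [ log((x_i + y_i) / 2) - 0.5 * log(x_i * y_i) ]
        Sd(x, y) = sqrt(S(x, y))
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    total = 0.0
    for xi, yi in zip(x, y):
        if xi <= 0 or yi <= 0:
            raise ValueError("S-distance requires strictly positive entries")
        # log(arithmetic mean / geometric mean) >= 0 by the AM-GM inequality
        total += math.log((xi + yi) / 2.0) - 0.5 * math.log(xi * yi)
    # Clamp tiny negative values caused by floating-point rounding.
    return math.sqrt(max(total, 0.0))


def gaussian_affinity(points, sigma=1.0):
    """Spectral-clustering affinity matrix using Sd instead of Euclidean distance."""
    n = len(points)
    return [[math.exp(-s_distance(points[i], points[j]) ** 2 / (2.0 * sigma ** 2))
             for j in range(n)]
            for i in range(n)]
```

Each term is the log of an arithmetic-to-geometric mean ratio, so S is non-negative and zero only when x = y; features therefore have to be shifted or scaled to be strictly positive before use.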

Tree-ART2 Learning Model for Spatial Clustering in Second Dimension

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.543-547.1934 ◽

2014 ◽

Vol 543-547 ◽

pp. 1934-1938

Author(s):

Ming Xiao

Keyword(s):

Network Model ◽

Spatial Data ◽

Data Clustering ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Adaptive Resonance Theory ◽

Spatial Distance ◽

Resonance Theory ◽

Adaptive Resonance ◽

Vector Module

For clustering two-dimensional spatial data, Adaptive Resonance Theory (ART) not only suffers from pattern drift and the loss of vector modulus (magnitude) information, but also adapts poorly to irregularly distributed spatial data. To address these shortcomings, a Tree-ART2 network model is proposed. By learning and adjusting the long-term memory (LTM) patterns and the amplitude information of vectors, it retains the memory of old patterns while maintaining the spatial-distance constraint. In addition, introducing a tree structure into the model reduces the dependence on a subjectively chosen vigilance parameter and decreases the occurrence of pattern mixing. Comparative experiments show that the Tree-ART2 (TART2) network has higher plasticity and adaptability.

An environmental dependence of the physical and structural properties in the Hydra cluster galaxies

Monthly Notices of the Royal Astronomical Society ◽

10.1093/mnras/staa3326 ◽

2020 ◽

Vol 500 (1) ◽

pp. 1323-1339

Author(s):

Ciria Lima-Dias ◽

Antonela Monachesi ◽

Sergio Torres-Flores ◽

Arianna Cortesi ◽

Daniel Hernández-Lang ◽

...

Keyword(s):

Structural Properties ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Broad Band ◽

Visible Region ◽

Early Type ◽

Cluster Galaxies ◽

Using Data ◽

Stellar Masses ◽

Environmental Dependence

ABSTRACT The nearby Hydra cluster (∼50 Mpc) is an ideal laboratory to understand, in detail, the influence of the environment on the morphology and quenching of galaxies in dense environments. We study the Hydra cluster galaxies in the inner regions (≲1R200) of the cluster using data from the Southern Photometric Local Universe Survey, which uses 12 narrow and broad-band filters in the visible region of the spectrum. We analyse structural (Sérsic index, effective radius) and physical (colours, stellar masses, and star formation rates) properties. Based on this analysis, we find that ∼88 per cent of the Hydra cluster galaxies are quenched. Using the Dressler–Shectman test approach, we also find that the cluster shows possible substructures. Our analysis of the phase-space diagram together with a density-based spatial clustering algorithm indicates that Hydra shows an additional substructure that appears to be in front of the cluster centre and is still falling into it. Our results, thus, suggest that the Hydra cluster might not be relaxed. We analyse the median Sérsic index as a function of wavelength and find that for red [(u − r) ≥2.3] and early-type galaxies it displays a slight increase towards redder filters (13 and 18 per cent, for red and early type, respectively), whereas for blue + green [(u − r)<2.3] galaxies it remains constant. Late-type galaxies show a small decrease of the median Sérsic index towards redder filters. Also, the Sérsic index of galaxies, and thus their structural properties, does not vary significantly as a function of clustercentric distance or density within the cluster, regardless of the filter.
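The Dressler–Shectman test is only named in the abstract. A minimal sketch of its usual form, assuming projected positions on a flat sky patch and line-of-sight velocities: for each galaxy and its `n_local` nearest neighbours, δ_i² = (n_local + 1)/σ² [(v̄_local − v̄)² + (σ_local − σ)²], and Δ = Σ δ_i is compared with realisations in which the velocities are randomly reshuffled. The defaults (`n_local=10`, 200 shuffles, population standard deviations) and the function names are assumptions, not the choices made in the paper.

```python
import math
import random
import statistics


def ds_deltas(positions, velocities, n_local=10):
    """Per-galaxy Dressler-Shectman deviations delta_i.

    positions: list of projected (x, y) coordinates
    velocities: list of line-of-sight velocities in the same order
    """
    n = len(positions)
    v_mean = statistics.fmean(velocities)
    sigma = statistics.pstdev(velocities)
    if sigma == 0:
        raise ValueError("velocities must not all be equal")
    deltas = []
    for i in range(n):
        # Galaxy i plus its n_local nearest neighbours in projected position.
        nearest = sorted(range(n), key=lambda j: math.dist(positions[i], positions[j]))
        group = [velocities[j] for j in nearest[: n_local + 1]]
        v_loc = statistics.fmean(group)
        s_loc = statistics.pstdev(group)
        d2 = (n_local + 1) / sigma ** 2 * ((v_loc - v_mean) ** 2 + (s_loc - sigma) ** 2)
        deltas.append(math.sqrt(d2))
    return deltas


def ds_test(positions, velocities, n_local=10, n_shuffles=200, seed=0):
    """Return (Delta, fraction of shuffled realisations with Delta_sim >= Delta)."""
    observed = sum(ds_deltas(positions, velocities, n_local))
    rng = random.Random(seed)
    shuffled = list(velocities)
    exceed = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # destroy any position-velocity correlation
        if sum(ds_deltas(positions, shuffled, n_local)) >= observed:
            exceed += 1
    return observed, exceed / n_shuffles
```

A small Monte Carlo fraction means the observed Δ is rarely reached when velocities are randomly reassigned, which is the usual signature of substructure.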

Multi-zone prediction analysis of city-scale travel order demand

PLoS ONE ◽

10.1371/journal.pone.0248064 ◽

2021 ◽

Vol 16 (3) ◽

pp. e0248064

Author(s):

Pengshun Li ◽

Jiarui Chang ◽

Yi Zhang ◽

Yi Zhang

Keyword(s):

Prediction Model ◽

Clustering Algorithm ◽

Intelligent Transportation System ◽

Spatial Clustering ◽

Support Vector ◽

Nearest Neighbour ◽

Short Term ◽

Prediction Ability ◽

Demand Prediction ◽

Zone Division

Taxi order demand prediction is of great importance for the continuous upgrading of intelligent transportation systems towards city-scale, personalised services. An accurate short-term taxi demand prediction model that captures both spatial and temporal relations can help a city pre-allocate its resources and facilitate city-scale taxi operation management in a megacity. To address these problems, in this study, we propose a multi-zone order demand prediction model to predict short-term taxi order demand in different zones at city scale. A two-step methodology was developed, consisting of order zone division and multi-zone order prediction. For the zone division step, the K-means++ spatial clustering algorithm was used, and its parameter k was estimated with the between–within proportion index. For the prediction step, six methods (backpropagation neural network, support vector regression, random forest, average fusion-based method, weighted fusion-based method, and k-nearest neighbour fusion-based method) were compared. To evaluate performance, three multi-zone weighted accuracy indicators were proposed to measure order prediction ability at city scale. The models were implemented and validated on real-world taxi order demand data collected over three consecutive months in Shenzhen, China. Experiments on the city-scale taxi demand data showed that the multi-zone order demand prediction model with the k-nearest neighbour fusion-based method achieved the best prediction performance according to the proposed accuracy indicators.
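A minimal sketch of the zone-division step, assuming planar (already projected) pick-up coordinates, Euclidean distance, and the common definition of the between–within proportion (BWP) index: for sample i, b is the smallest mean distance to another cluster, w is the mean distance to the other members of its own cluster, and BWP_i = (b − w)/(b + w); the k with the largest mean BWP is kept. The restart count `n_init` and all function names are illustrative assumptions; this is not the authors' implementation.

```python
import math
import random


def kmeans_pp(points, k, seed=0, n_init=5, n_iter=100):
    """k-means with k-means++ seeding; returns labels of the lowest-inertia run."""
    best = None
    for run in range(n_init):
        rng = random.Random(seed + run)
        # k-means++ seeding: first centre uniform, then sample with probability ~ D(x)^2.
        centres = [rng.choice(points)]
        while len(centres) < k:
            d2 = [min(math.dist(p, c) ** 2 for c in centres) for p in points]
            centres.append(rng.choices(points, weights=d2)[0])
        # Lloyd iterations until the centres stop moving.
        for _ in range(n_iter):
            labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
            new = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                new.append(tuple(sum(v) / len(members) for v in zip(*members))
                           if members else centres[j])
            if new == centres:
                break
            centres = new
        inertia = sum(math.dist(p, centres[lab]) ** 2 for p, lab in zip(points, labels))
        if best is None or inertia < best[0]:
            best = (inertia, labels)
    return best[1]


def bwp_index(points, labels):
    """Mean between-within proportion over all samples (needs >= 2 clusters)."""
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_d = {}
        for c in clusters:
            ds = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            mean_d[c] = sum(ds) / len(ds) if ds else 0.0
        w = mean_d[labels[i]]                                   # within-cluster
        b = min(d for c, d in mean_d.items() if c != labels[i])  # nearest other cluster
        total += (b - w) / (b + w)
    return total / len(points)


def choose_k(points, k_values, seed=0):
    """Pick the k with the largest mean BWP; ties go to the first k listed."""
    scores = {k: bwp_index(points, kmeans_pp(points, k, seed)) for k in k_values}
    return max(scores, key=scores.get), scores
```

Candidate values of k must start at 2, because BWP needs at least one other cluster to compute b.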

A Machine Learning Approach to Delineating Neighborhoods from Geocoded Appraisal Data

ISPRS International Journal of Geo-Information ◽

10.3390/ijgi9070451 ◽

2020 ◽

Vol 9 (7) ◽

pp. 451

Author(s):

Rao Hamza Ali ◽

Josh Graves ◽

Stanley Wu ◽

Jenny Lee ◽

Erik Linstead

Keyword(s):

Real Estate ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Real Estate Market ◽

Spatial Filters ◽

Census Tracts ◽

Machine Learning Approach

Identification of neighborhoods is an important, financially driven topic in real estate. The real estate industry commonly uses ZIP (postal) codes and Census tracts as land demarcations to categorize properties by price. These boundaries are static, do not adapt to shifts in the real estate market, and fail to represent its dynamics, as in the case of an up-and-coming residential project. Delineated neighborhoods are also used in socioeconomic and demographic analyses where statistics are computed at the neighborhood level. Current practices for delineating neighborhoods have mostly ignored the information that can be extracted from property appraisals. This paper demonstrates the potential of using only the distance between subject properties and their comparable properties, identified in an appraisal, to delineate neighborhoods composed of properties with similar prices and features. Using spatial filters, we first identify the regions with the most appraisal activity and then, by applying a spatial clustering algorithm, generate neighborhoods composed of properties sharing similar characteristics. Through bootstrapped linear regression, we find that neighborhoods delineated from the geolocations of subjects and comparable properties explain more of the variation in a property’s features, such as valuation, square footage, and price per square foot, than ZIP codes or Census tracts. We also discuss how the neighborhoods grow and shrink over the years as each housing submarket shifts.

An Adaptive Ellipse Distance Density Peak Fuzzy Clustering Algorithm Based on the Multi-target Traffic Radar

Sensors ◽

10.3390/s20174920 ◽

2020 ◽

Vol 20 (17) ◽

pp. 4920

Author(s):

Lin Cao ◽

Xinyi Zhang ◽

Tao Wang ◽

Kangning Du ◽

Chong Fu

Keyword(s):

Euclidean Distance ◽

Clustering Algorithm ◽

Spatial Clustering ◽

Search Algorithm ◽

Measurement Data ◽

Data Sets ◽

Density Peak ◽

Close Range ◽

Decision Graph ◽

Membership Matrix

In multi-target traffic radar scenes, clustering accuracy is relatively low for vehicles driving close to one another. To address this problem, this paper proposes a new clustering algorithm, the adaptive ellipse distance density peak fuzzy (AEDDPF) clustering algorithm. First, the Euclidean distance is replaced by an adaptive ellipse distance, which more accurately describes the structure of the data obtained from radar measurements of vehicles. Second, an adaptive exponential function curve is introduced into the decision graph of the fast density peak search algorithm to select the density peak points accurately, which completes the initialization of the AEDDPF algorithm. Finally, the membership matrix and the cluster centers are computed through successive iterations to obtain the clustering result. The time complexity of the AEDDPF algorithm is also analyzed. Compared with the density-based spatial clustering of applications with noise (DBSCAN), k-means, fuzzy c-means (FCM), Gustafson-Kessel (GK), and adaptive Euclidean distance density peak fuzzy (Euclid-ADDPF) algorithms, the AEDDPF algorithm achieves higher clustering accuracy on real measurement data sets in certain scenarios. The experimental results also show that the proposed algorithm clusters better in some close-range vehicle scenarios. The generalization ability of the AEDDPF algorithm to other types of data is also analyzed.
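The abstract does not give enough detail to reproduce the AEDDPF-specific parts (the adaptive ellipse distance, the adaptive exponential curve on the decision graph and the fuzzy membership iterations). The sketch below shows only the underlying fast density peak search: local density ρ with a cut-off distance `d_c`, the distance δ to the nearest denser point, peak selection by the largest γ = ρδ (a simple stand-in for the paper's adaptive curve), and assignment of each remaining point to the cluster of its nearest denser neighbour. The `metric` parameter is where a non-Euclidean distance such as an ellipse distance could be plugged in; all names and defaults are assumptions.

```python
import math


def density_peak_cluster(points, d_c, n_peaks, metric=math.dist):
    """Fast density peak search with a cut-off kernel and gamma-based peak selection."""
    n = len(points)
    dist = [[metric(p, q) for q in points] for p in points]
    # Local density: number of other points strictly closer than d_c.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # Rank by decreasing density; the stable sort breaks ties by index.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    nearest_higher = [None] * n
    delta[order[0]] = max(dist[order[0]])  # densest point: largest distance by convention
    for r in range(1, n):
        i = order[r]
        j = min(order[:r], key=lambda m: dist[i][m])  # nearest higher-ranked point
        delta[i], nearest_higher[i] = dist[i][j], j
    # Decision graph: choose the points with the largest gamma = rho * delta.
    peaks = sorted(range(n), key=lambda i: rho[i] * delta[i], reverse=True)[:n_peaks]
    if order[0] not in peaks:
        peaks[-1] = order[0]  # guarantee every point can inherit a label
    labels = [None] * n
    for i in order:  # higher-ranked points are labelled first
        labels[i] = peaks.index(i) if i in peaks else labels[nearest_higher[i]]
    return peaks, labels
```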

Multi-Radius Density Clustering Algorithm Based on Outlier Factor

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.472.427 ◽

2014 ◽

Vol 472 ◽

pp. 427-431

Author(s):

Zong Lin Ye ◽

Hui Cao ◽

Li Xin Jia ◽

Yan Bin Zhang ◽

Gang Quan Si

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Similar Process ◽

The Core ◽

Dbscan Algorithm ◽

Proposed Model ◽

Density Clustering ◽

Relationship Of ◽

Core Points ◽

The Relationship

This paper proposes a novel multi-radius density clustering algorithm based on an outlier factor. The algorithm first calculates the density-similar-neighbor-based outlier factor (DSNOF) of each point in the dataset from the relationship between the density of the point and that of its neighbors, and treats each point whose DSNOF is smaller than 1 as a core point. Second, the core points are clustered by a process similar to density-based spatial clustering of applications with noise (DBSCAN), yielding a set of sub-clusters. Third, the algorithm merges these sub-clusters into clusters. Finally, the points whose DSNOF is larger than 1 are assigned to these clusters. Experiments on real datasets from the UCI Machine Learning Repository show that the proposed model is more effective than the DBSCAN and k-means algorithms and is not greatly affected by the parameter setting.
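For reference, a minimal pure-Python sketch of the standard DBSCAN expansion that the abstract builds on. Because the abstract does not define DSNOF, the hypothetical `core_mask` parameter simply lets precomputed core flags (for example, DSNOF < 1) replace the usual `min_pts` test. The sub-cluster merging step and the later assignment of non-core points are not reproduced; here non-core points reached during expansion are attached immediately as border points, as in plain DBSCAN.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts, core_mask=None):
    """Plain DBSCAN; core_mask (hypothetical) overrides the min_pts core test."""
    n = len(points)
    # eps-neighbourhoods; each point counts as its own neighbour, as in DBSCAN.
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    if core_mask is None:
        core_mask = [len(nb) >= min_pts for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not core_mask[i]:
            continue
        # Grow a new cluster from an unlabelled core point (depth-first expansion).
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster      # core or border point joins the cluster
                    if core_mask[q]:
                        stack.append(q)      # only core points expand further
        cluster += 1
    # Points never reached from any core point are noise.
    return [NOISE if lab is None else lab for lab in labels]
```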

UDSCA: Uniform Distribution Based Spatial Clustering Algorithm

Advances in Computing and Communications - Communications in Computer and Information Science ◽

10.1007/978-3-642-22709-7_63 ◽

2011 ◽

pp. 649-660 ◽

Cited By ~ 1

Author(s):

Animesh Tripathy ◽

Sumit Kumar Maji ◽

Prashanta Kumar Patra

Keyword(s):

Uniform Distribution ◽

Clustering Algorithm ◽

Spatial Clustering

Automatic spike sorting for high-density microelectrode arrays

Journal of Neurophysiology ◽

10.1152/jn.00803.2017 ◽

2018 ◽

Vol 120 (6) ◽

pp. 3155-3171 ◽

Cited By ~ 13

Author(s):

Roland Diggelmann ◽

Michele Fiscella ◽

Andreas Hierlemann ◽

Felix Franke

Keyword(s):

Principal Component Analysis ◽

Clustering Algorithm ◽

State Of The Art ◽

Microelectrode Arrays ◽

Principal Component ◽

Component Analysis ◽

High Density ◽

Sorting Algorithm ◽

Spike Sorting ◽

Recording Electrodes

High-density microelectrode arrays can be used to record extracellular action potentials from hundreds to thousands of neurons simultaneously. Efficient spike sorters must be developed to cope with such large data volumes. Most existing spike sorting methods for single electrodes or small multielectrodes, however, suffer from the “curse of dimensionality” and cannot be directly applied to recordings with hundreds of electrodes. This holds particularly true for the standard reference spike sorting algorithm, principal component analysis-based feature extraction, followed by k-means or expectation maximization clustering, against which most spike sorters are evaluated. We present a spike sorting algorithm that circumvents the dimensionality problem by sorting local groups of electrodes independently with classical spike sorting approaches. It is scalable to any number of recording electrodes and well suited for parallel computing. The combination of data prewhitening before the principal component analysis-based extraction and a parameter-free clustering algorithm obviated the need for parameter adjustments. We evaluated its performance using surrogate data in which we systematically varied spike amplitudes and spike rates and that were generated by inserting template spikes into the voltage traces of real recordings. In a direct comparison, our algorithm could compete with existing state-of-the-art spike sorters in terms of sensitivity and precision, while parameter adjustment or manual cluster curation was not required.

NEW & NOTEWORTHY We present an automatic spike sorting algorithm that combines three strategies to scale classical spike sorting techniques for high-density microelectrode arrays: 1) splitting the recording electrodes into small groups and sorting them independently; 2) clustering a subset of spikes and classifying the rest to limit computation time; and 3) prewhitening the spike waveforms to enable the use of parameter-free clustering. Finally, we combined these strategies into an automatic spike sorter that is competitive with state-of-the-art spike sorters.

Extraction of Revolving Channels of Drifters around Mesoscale Eddy Centers Based on Spatiotemporal Trajectory Clustering

Journal of Atmospheric and Oceanic Technology ◽

10.1175/jtech-d-19-0007.1 ◽

2019 ◽

Vol 36 (9) ◽

pp. 1903-1916

Author(s):

Chunyong Ma ◽

Siqing Li ◽

Yang Yang ◽

Jie Yang ◽

Ge Chen

Keyword(s):

Clustering Algorithm ◽

Spatial Clustering ◽

Mesoscale Eddy ◽

Mesoscale Eddies ◽

Trajectory Clustering ◽

Satellite Altimeter ◽

Principal Mode ◽

In Situ Data ◽

Anticyclonic Eddies

The global oceanic transports of energy, plankton, and other tracers by mesoscale eddies can be estimated by combining satellite altimetry and in situ data. However, the revolving channels of particles entrained by mesoscale eddies, which could help explain the dynamic process by which eddies entrain materials, are still unknown. In this study, satellite altimeter and drifter data from 1993 to 2016 are used, and the normalized trajectory clustering algorithm (N-TRACLUS) is proposed to extract the revolving channels of drifters. First, the drifter trajectories are normalized and clustered using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Next, the revolving channels of drifters around the eddy center are extracted. A ring or arc pattern appears in the middle of a normalized eddy when drifters are continuously entrained by eddies for more than 30 days. Moreover, the revolving channels of drifters in cyclonic eddies are closer to the eddy center than those in anticyclonic eddies. These revolving channels indicate the principal mode of continuous material motion inside eddies.
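The abstract does not specify how N-TRACLUS normalizes trajectories. A plausible minimal sketch, assuming each daily drifter fix has already been matched to an eddy with a known center and radius in a local planar frame: positions are shifted to the eddy center and divided by the eddy radius, so the eddy edge becomes the unit circle, and the longest run of consecutive fixes inside that circle measures how long a drifter stayed entrained. The function names, the planar frame and the daily sampling are assumptions; the DBSCAN clustering step is not repeated here.

```python
import math


def normalize_track(track, eddy_centers, eddy_radii):
    """Map each drifter fix into eddy-centered coordinates scaled by the eddy radius.

    track, eddy_centers: per-fix (x, y) positions in a local planar frame (e.g. km)
    eddy_radii: per-fix eddy radius in the same units
    """
    return [((x - cx) / r, (y - cy) / r)
            for (x, y), (cx, cy), r in zip(track, eddy_centers, eddy_radii)]


def longest_inside_run(normalized_track):
    """Longest run of consecutive fixes inside the normalized eddy (radius <= 1)."""
    best = run = 0
    for u, v in normalized_track:
        run = run + 1 if math.hypot(u, v) <= 1.0 else 0
        best = max(best, run)
    return best
```

With daily fixes, a run longer than 30 corresponds to the entrainment duration after which the abstract reports ring or arc patterns.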
