Spatial clustering of heavy rainfall in ERA-5 precipitation over Europe

Author(s):  
Philomène Le Gall ◽  
Pauline Rivoire ◽  
Anne-Catherine Favre ◽  
Philippe Naveau ◽  
Olivia Romppainen-Martius

Extreme precipitation often causes floods and leads to severe societal and economic damage. Rainfall is subject to local orographic features, and its intensity can be highly variable. In this context, identifying climatically coherent regions for extremes is paramount to understanding and analyzing rainfall at the correct spatial scale. We assume that the region of interest can be partitioned into homogeneous sub-regions, i.e. sub-regions sharing a common marginal distribution up to a scale factor. For example, treating extremes as block maxima or excesses over a threshold, a sub-region corresponds to a constant shape parameter. We develop a non-parametric clustering algorithm based on a ratio of probability weighted moments to identify these homogeneous regions and group weather stations. By construction, this ratio does not depend on the location and scale parameters of the generalized extreme value and generalized Pareto distributions. Our method has the advantage of relying only on raw precipitation data, not on station covariates.

A simulation study is performed based on the extended GPD, which captures low, moderate, and heavy rainfall intensities well. Sensitivity to the number of clusters is analyzed. The simulation results show that the method detects homogeneous regions. We apply our clustering algorithm to ERA-5 precipitation over Europe and obtain coherent homogeneous regions consistent with local orography. The marginal precipitation behaviour is analyzed through regional fitting of an extended GPD.
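The abstract does not spell out the exact PWM ratio the authors use. A standard quantity with the stated property is the sample L-skewness τ3 = λ3/λ2, a combination of probability weighted moments that is exactly invariant to location and scale shifts and is a function of the shape parameter alone for GEV and GPD margins. A minimal numpy sketch (an illustration, not the paper's implementation) of how each grid cell could be summarised before clustering:

```python
import numpy as np

def sample_pwm(x, r):
    """Unbiased sample estimate of the PWM b_r = E[X F(X)^r]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    # weights (i-1)(i-2)...(i-r) / ((n-1)(n-2)...(n-r))
    w = np.ones(n)
    for j in range(1, r + 1):
        w *= (i - j) / (n - j)
    return np.mean(w * x)

def l_skewness(x):
    """tau_3 = lambda_3 / lambda_2: location- and scale-free, and a
    function of the shape parameter alone for GEV/GPD margins."""
    b0, b1, b2 = (sample_pwm(x, r) for r in range(3))
    lam2 = 2 * b1 - b0
    lam3 = 6 * b2 - 6 * b1 + b0
    return lam3 / lam2
```

Cells with similar values of such a shape-only summary can then be grouped by any off-the-shelf clustering routine, which is the sense in which the method needs no station covariates.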

Symmetry ◽  
2021 ◽  
Vol 13 (4) ◽  
pp. 596
Author(s):  
Krishna Kumar Sharma ◽  
Ayan Seal ◽  
Enrique Herrera-Viedma ◽  
Ondrej Krejcar

Calculating and monitoring customer churn metrics is important for companies to retain customers and earn more profit in business. In this study, a churn prediction framework is developed using a modified spectral clustering (SC) method. The similarity measure plays an imperative role in clustering for predicting churn accurately from industrial data, so the linear Euclidean distance in traditional SC is replaced by the non-linear S-distance (Sd), which is deduced from the concept of S-divergence (SD). Several characteristics of Sd are discussed in this work. Experiments are conducted to validate the proposed clustering algorithm on four synthetic, eight UCI, two industrial, and one telecommunications database related to customer churn. Three existing clustering algorithms (k-means, density-based spatial clustering of applications with noise, and conventional SC) are also implemented on the above-mentioned 15 databases. The empirical outcomes show that the proposed clustering algorithm beats the three existing clustering algorithms in terms of Jaccard index, f-score, recall, precision and accuracy. Finally, we also test the significance of the clustering results with Wilcoxon's signed-rank test, Wilcoxon's rank-sum test, and sign tests. The comparative study shows that the outcomes of the proposed algorithm are interesting, especially in the case of clusters of arbitrary shape.
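The exact form of the S-distance is derived in the paper from the S-divergence and is not reproduced in the abstract. The sketch below shows only the surrounding machinery: a minimal two-way spectral clustering routine with a pluggable distance function, where plain Euclidean distance stands in for Sd.

```python
import numpy as np

def spectral_labels(X, dist, sigma=1.0):
    """Two-way spectral clustering with a pluggable distance function.
    The paper swaps Euclidean distance for the S-distance (Sd); its exact
    form is defined there, so `dist` is a placeholder hook here."""
    n = len(X)
    D = np.array([[dist(X[i], X[j]) for j in range(n)] for i in range(n)])
    A = np.exp(-D ** 2 / (2.0 * sigma ** 2))     # Gaussian affinity
    np.fill_diagonal(A, 0.0)
    deg = A.sum(axis=1)
    L = np.eye(n) - A / np.sqrt(np.outer(deg, deg))  # normalized Laplacian
    vals, vecs = np.linalg.eigh(L)
    # split on the sign of the Fiedler vector (second-smallest eigenvalue)
    return (vecs[:, 1] > 0).astype(int)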


2021 ◽  
pp. 1-14
Author(s):  
Yujia Qu ◽  
Yuanjun Wang

BACKGROUND: The corpus callosum in the midsagittal plane plays a crucial role in the early diagnosis of diseases. When the anisotropy of the diffusion tensor in the midsagittal plane is calculated, the anisotropy of the corpus callosum is close to that of the fornix, which blurs the boundary of the segmentation region. OBJECTIVE: To apply a fuzzy clustering algorithm combined with new spatial information to achieve accurate segmentation of the corpus callosum in the midsagittal plane in diffusion tensor images. METHODS: In this algorithm, a fixed region of interest is selected from the midsagittal plane, and a tensor-based anisotropic filtering algorithm is implemented by replacing the gradient direction of the structure tensor with an eigenvector, thus filtering the diffusion tensors of the region of interest. The iterative clustering center obtained from K-means clustering is then used as the initial clustering center of the tensor fuzzy clustering algorithm. Taking the filtered diffusion tensors as input data and different metrics as similarity measures, a neighborhood diffusion-tensor pixel calculation in the Log-Euclidean framework is introduced into the membership function calculation, and a tensor fuzzy clustering algorithm is proposed. In this study, MGH35 data from the Human Connectome Project (HCP) are tested, and the variance, accuracy and specificity of the experimental results are discussed. RESULTS: Segmentation results for three groups of subjects in the MGH35 data are reported. The average segmentation accuracy is 97.34%, and the average specificity is 98.43%. CONCLUSIONS: When segmenting the corpus callosum in diffusion tensor imaging, our method can not only effectively denoise images but also achieve high accuracy and specificity.
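The paper's algorithm runs on diffusion tensors with a Log-Euclidean distance and K-means-derived initial centers. As a sketch of the fuzzy clustering core only, here is plain fuzzy c-means on feature vectors; the Euclidean distance and the crude deterministic initialisation are simplifications, not the paper's choices.

```python
import numpy as np

def fuzzy_cmeans(X, c=2, m=2.0, iters=50):
    """Plain fuzzy c-means. The paper's variant uses tensor-valued data,
    a Log-Euclidean metric, and K-means-based seeding instead."""
    X = np.asarray(X, dtype=float)
    centers = X[:c].copy()                       # crude deterministic init
    for _ in range(iters):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
        u = d ** (-2.0 / (m - 1.0))              # membership ~ d^(-2/(m-1))
        u /= u.sum(axis=1, keepdims=True)
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return u, centers
```

The fuzzifier m > 1 controls how soft the memberships are; hard labels are recovered by taking the arg-max of each row of u.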


2020 ◽  
Vol 72 (2) ◽  
pp. 89-110
Author(s):  
Manoj Chacko ◽  
Shiny Mathew

In this article, the estimation of [Formula: see text] is considered when [Formula: see text] and [Formula: see text] are two independent generalized Pareto distributions. The maximum likelihood estimators and Bayes estimators of [Formula: see text] are obtained based on record values. The asymptotic distributions are also obtained, together with the corresponding confidence interval of [Formula: see text]. AMS 2000 subject classification: 90B25
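The estimand is elided above; in stress-strength settings of this kind it is typically R = P(X < Y). Assuming that reading (an assumption, since the formula is missing from the abstract), a Monte Carlo sanity check for two independent GPDs can be sketched with hypothetical helper names:

```python
import numpy as np

def gpd_sample(rng, n, xi, sigma):
    """Inverse-CDF draws from a zero-location GPD(shape xi, scale sigma)."""
    u = rng.uniform(size=n)
    if abs(xi) < 1e-12:                  # xi -> 0 is the exponential case
        return -sigma * np.log1p(-u)
    return sigma * ((1.0 - u) ** (-xi) - 1.0) / xi

def prob_x_less_y(rng, n, xi_x, sig_x, xi_y, sig_y):
    """Monte Carlo estimate of R = P(X < Y) for independent GPD X and Y."""
    x = gpd_sample(rng, n, xi_x, sig_x)
    y = gpd_sample(rng, n, xi_y, sig_y)
    return float(np.mean(x < y))
```

Such a simulation check complements, rather than replaces, the article's maximum likelihood and Bayes estimators based on record values.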


2014 ◽  
Vol 543-547 ◽  
pp. 1934-1938
Author(s):  
Ming Xiao

For clustering of two-dimensional spatial data, the Adaptive Resonance Theory network suffers not only from pattern drift and the loss of vector amplitude information, but also adapts poorly to spatial data with irregular distributions. A Tree-ART2 network model is proposed to address this situation. It retains the memory of old patterns and maintains the spatial-distance constraint by learning and adjusting the LTM pattern and the amplitude information of the vector. Meanwhile, introducing a tree structure into the model reduces the subjective requirement on the vigilance parameter and decreases the occurrence of pattern mixing. Comparative experiments show that the TART2 network has higher plasticity and adaptability.


2020 ◽  
Vol 500 (1) ◽  
pp. 1323-1339
Author(s):  
Ciria Lima-Dias ◽  
Antonela Monachesi ◽  
Sergio Torres-Flores ◽  
Arianna Cortesi ◽  
Daniel Hernández-Lang ◽  
...  

ABSTRACT The nearby Hydra cluster (∼50 Mpc) is an ideal laboratory to understand, in detail, the influence of a dense environment on the morphology and quenching of galaxies. We study the Hydra cluster galaxies in the inner regions (1R200) of the cluster using data from the Southern Photometric Local Universe Survey, which uses 12 narrow- and broad-band filters in the visible region of the spectrum. We analyse structural (Sérsic index, effective radius) and physical (colours, stellar masses, and star formation rates) properties. Based on this analysis, we find that ∼88 per cent of the Hydra cluster galaxies are quenched. Using the Dressler–Shectman test, we also find that the cluster shows possible substructures. Our analysis of the phase-space diagram, together with a density-based spatial clustering algorithm, indicates that Hydra has an additional substructure that appears to be in front of the cluster centre and is still falling into it. Our results thus suggest that the Hydra cluster might not be relaxed. We analyse the median Sérsic index as a function of wavelength and find that for red [(u − r) ≥ 2.3] and early-type galaxies it displays a slight increase towards redder filters (13 and 18 per cent for red and early-type galaxies, respectively), whereas for blue + green [(u − r) < 2.3] galaxies it remains constant. Late-type galaxies show a small decrease of the median Sérsic index towards redder filters. The Sérsic index of galaxies, and thus their structural properties, also does not vary significantly as a function of clustercentric distance or density within the cluster, regardless of the filter.
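The density-based spatial clustering algorithm applied to the phase-space data is presumably DBSCAN, which labels low-density points as noise rather than forcing them into a cluster. A minimal brute-force numpy sketch (an illustration, not the authors' code) for small samples:

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN: returns one label per point; -1 marks noise,
    clusters are numbered from 0. Brute-force O(n^2) distances."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=-1)
    neigh = [np.flatnonzero(d[i] <= eps) for i in range(n)]
    core = np.array([len(nb) >= min_pts for nb in neigh])
    labels = np.full(n, -1)
    cid = 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        stack, labels[i] = [i], cid          # grow a new cluster from a core point
        while stack:
            j = stack.pop()
            if not core[j]:
                continue                      # border points do not expand
            for q in neigh[j]:
                if labels[q] == -1:
                    labels[q] = cid
                    stack.append(q)
        cid += 1
    return labels
```

The noise label is what makes the method attractive for substructure detection: galaxies not belonging to any overdensity in the projected phase-space are simply left unassigned.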


PLoS ONE ◽  
2021 ◽  
Vol 16 (3) ◽  
pp. e0248064
Author(s):  
Pengshun Li ◽  
Jiarui Chang ◽  
Yi Zhang ◽  
Yi Zhang

Taxi order demand prediction is of tremendous importance for the continuous upgrading of an intelligent transportation system to realise city-scale and personalised services. An accurate short-term taxi demand prediction model in both the spatial and temporal dimensions can help a city pre-allocate its resources and facilitate city-scale taxi operation management in a megacity. To address these problems, in this study we propose a multi-zone order demand prediction model to predict short-term taxi order demand in different zones at city scale. A two-step methodology was developed, comprising order zone division and multi-zone order prediction. For the zone division step, the K-means++ spatial clustering algorithm was used, with its parameter k estimated by the between–within proportion index. For the prediction step, six methods (backpropagation neural network, support vector regression, random forest, average fusion-based method, weighted fusion-based method, and k-nearest neighbour fusion-based method) were compared. To evaluate performance, three multi-zone weighted accuracy indicators were proposed to assess the order prediction ability at city scale. These models were implemented and validated on real-world taxi order demand data collected over three consecutive months in Shenzhen, China. Experiments on the city-scale taxi demand data demonstrated the superior prediction performance of the multi-zone order demand prediction model with the k-nearest neighbour fusion-based method under the proposed accuracy indicators.
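K-means++ differs from plain k-means only in its seeding step: spreading the initial centers out makes the subsequent zone division far less sensitive to initialisation. A short numpy sketch of that seeding (the between-within proportion index used to pick k is not shown here):

```python
import numpy as np

def kmeans_pp_init(X, k, rng):
    """k-means++ seeding: after a uniform first center, each new center
    is drawn with probability proportional to its squared distance to
    the nearest center chosen so far."""
    X = np.asarray(X, dtype=float)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

The returned centers then seed ordinary Lloyd iterations; for zone division, X would hold the pick-up coordinates of the taxi orders.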


2020 ◽  
Vol 9 (7) ◽  
pp. 451
Author(s):  
Rao Hamza Ali ◽  
Josh Graves ◽  
Stanley Wu ◽  
Jenny Lee ◽  
Erik Linstead

Identification of neighborhoods is an important, financially-driven topic in real estate. It is known that the real estate industry uses ZIP (postal) codes and Census tracts as a source of land demarcation to categorize properties with respect to their price. These demarcated boundaries are static and are inflexible to the shift in the real estate market and fail to represent its dynamics, such as in the case of an up-and-coming residential project. Delineated neighborhoods are also used in socioeconomic and demographic analyses where statistics are computed at a neighborhood level. Current practices of delineating neighborhoods have mostly ignored the information that can be extracted from property appraisals. This paper demonstrates the potential of using only the distance between subjects and their comparable properties, identified in an appraisal, to delineate neighborhoods that are composed of properties with similar prices and features. Using spatial filters, we first identify regions with the most appraisal activity, and through the application of a spatial clustering algorithm, generate neighborhoods composed of properties sharing similar characteristics. Through an application of bootstrapped linear regression, we find that delineating neighborhoods using geolocation of subjects and comparable properties explains more variation in a property’s features, such as valuation, square footage, and price per square foot, than ZIP codes or Census tracts. We also discuss the ability of the neighborhoods to grow and shrink over the years, due to shifts in each housing submarket.
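The comparison of neighbourhood definitions rests on bootstrapped linear regression: resample properties, regress a property feature on the competing area indicators, and compare the resulting distributions of explained variance. A generic sketch of that procedure, with illustrative variable names rather than the paper's actual covariates:

```python
import numpy as np

def bootstrap_r2(X, y, n_boot=200, seed=0):
    """Bootstrap distribution of OLS R^2: how much variation in a
    property feature (y) a set of area indicators (X) explains."""
    rng = np.random.default_rng(seed)
    X = np.column_stack([np.ones(len(X)), X])    # add intercept column
    out = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), len(y))    # resample with replacement
        Xb, yb = X[idx], y[idx]
        beta, *_ = np.linalg.lstsq(Xb, yb, rcond=None)
        resid = yb - Xb @ beta
        out.append(1.0 - resid.var() / yb.var())
    return np.array(out)
```

Running this once with appraisal-derived neighbourhood dummies and once with ZIP or tract dummies gives two R-squared distributions whose separation supports the paper's claim.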


Mathematics ◽  
2020 ◽  
Vol 8 (7) ◽  
pp. 1176
Author(s):  
Lauren Sauer ◽  
Yuhlong Lio ◽  
Tzong-Ru Tsai

In this paper, the reliability of a k-component system, in which all components are subject to a common stress, is considered. The multicomponent system survives if the strengths of at least s out of k components exceed the common stress. The system reliability is investigated using the maximum likelihood estimator based on progressively type II censored samples from generalized Pareto distributions. The confidence interval of the system reliability can be obtained via asymptotic normality with the Fisher information matrix or via a bootstrap approximation. An intensive simulation study is conducted to evaluate the performance of the maximum likelihood estimators of the model parameters and the system reliability for a variety of cases. For the confidence interval of the system reliability, simulation results indicate that the bootstrap approximation outperforms the asymptotic normality approximation in terms of coverage probability.
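For i.i.d. component strengths and a fixed common stress, the "at least s of k" structure reduces to a binomial tail in p, the per-component probability that strength exceeds stress. A minimal sketch of that reduction (the paper's contribution is estimating p from progressively type II censored GPD samples, which is not shown here):

```python
from math import comb

def system_reliability(p, k, s):
    """Reliability of an s-out-of-k system with i.i.d. components:
    P(at least s of k strengths exceed the common stress), where
    p = P(single strength > stress)."""
    return sum(comb(k, j) * p ** j * (1 - p) ** (k - j) for j in range(s, k + 1))
```

Plugging the maximum likelihood estimate of p into this formula yields the point estimate of system reliability whose interval the paper studies.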

