A new validity clustering index-based on finding new centroid positions using the mean of clustered data to determine the optimum number of clusters

2021 ◽  
pp. 116329
Author(s):  
Ahmed Khaldoon Abdalameer ◽  
Mohammed Alswaitti ◽  
Ahmed Adnan Alsudani ◽  
Nor Ashidi Mat Isa
2011 ◽  
Vol 467-469 ◽  
pp. 894-899
Author(s):  
Hong Men ◽  
Hai Yan Liu ◽  
Lei Wang ◽  
Yun Peng Pan

This paper presents a method for optimizing a competitive neural network (CNN). During clustering analysis, the optimum number of output neurons was fixed according to the change in the DB value, and the connection weights were then adjusted through adding, splitting, and deleting operations. Each neuron's learning rate followed a different trend according to the change in that neuron's probability. The optimization made classification more accurate. Simulation results showed that the optimized network structure could adjust the number of clusters dynamically and produced good classification results.
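The abstract does not give the network's update rules. As a point of reference, the basic winner-take-all competitive learning step that such networks build on can be sketched as follows. This is a generic sketch, not the paper's optimized network: the shared, linearly decaying learning rate and the evenly spaced initialisation are assumptions, and the per-neuron learning rates and neuron adding, splitting, and deletion described above are not reproduced.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(weights, p):
    """Index of the neuron whose weight vector is closest to p (the winner)."""
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j], p))

def competitive_learning(points, n_neurons, epochs=50, lr0=0.5, seed=0):
    """Generic winner-take-all competitive learning: each input moves only
    the weight vector of the neuron that wins it."""
    rng = random.Random(seed)
    # Assumed initialisation: weights copied from evenly spaced inputs.
    step = len(points) // n_neurons
    weights = [list(points[i * step]) for i in range(n_neurons)]
    for epoch in range(epochs):
        # Assumed schedule: one learning rate for all neurons, decaying linearly.
        lr = lr0 * (1 - epoch / epochs)
        order = list(points)
        rng.shuffle(order)
        for p in order:
            w = weights[nearest(weights, p)]
            for d in range(len(w)):
                w[d] += lr * (p[d] - w[d])  # pull the winner toward the input
    return weights

# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
weights = competitive_learning(points, n_neurons=2)
labels = [nearest(weights, p) for p in points]
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each update is a convex step toward the input, so a neuron that starts inside a group stays inside it; the dynamic adjustment of the neuron count is what the paper adds on top of this basic rule.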


2018 ◽  
Vol 1 ◽  
pp. 1-5
Author(s):  
Fabian Bock ◽  
Karen Xia ◽  
Monika Sester

The search for a parking space is a severe and stressful problem for drivers in many cities. The provision of maps with parking space occupancy information assists drivers in avoiding the most crowded roads at certain times. Since parking occupancy reveals a repetitive pattern per day and per week, typical parking occupancy patterns can be extracted from historical data.

In this paper, we analyze city-wide parking meter data from Hannover, Germany, for a full year. We describe an approach of clustering these parking meters to reduce the complexity of this parking occupancy information and to reveal areas with similar parking behavior. The parking occupancy at every parking meter is derived from a timestamp of ticket payment and the validity period of the parking tickets. The similarity of the parking meters is computed as the mean-squared deviation of the average daily patterns in parking occupancy at the parking meters. Based on this similarity measure, a hierarchical clustering is applied. The number of clusters is determined with the Davies-Bouldin Index and the Silhouette Index.

Results show that, after extensive data cleansing, the clustering leads to three clusters representing typical parking occupancy day patterns. Those clusters differ mainly in the hour of the maximum occupancy. In addition, the locations of parking meter clusters, computed only based on temporal similarity, also show clear spatial distinctions from other clusters.
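The pipeline above (mean-squared deviation between daily profiles, hierarchical clustering, then an index to pick the number of clusters) can be sketched in a few lines. This is a minimal illustration under stated assumptions: the linkage rule is not given in the abstract, so average linkage is assumed; only the Silhouette Index is shown (computed from the precomputed dissimilarities); and the short 4-hour profiles are hypothetical stand-ins for real daily occupancy curves.

```python
def msd(a, b):
    """Mean-squared deviation between two equal-length daily profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def average_linkage(profiles, k):
    """Agglomerative clustering (average linkage, an assumed choice) on the
    MSD matrix; merges until k clusters remain."""
    n = len(profiles)
    d = [[msd(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Mean dissimilarity over all cross-cluster pairs.
                link = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    return clusters, d

def silhouette(clusters, d):
    """Mean silhouette width computed from the precomputed dissimilarities d."""
    scores = []
    for c, members in enumerate(clusters):
        for i in members:
            own = [j for j in members if j != i]
            if not own:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            a = sum(d[i][j] for j in own) / len(own)
            b = min(sum(d[i][j] for j in other) / len(other)
                    for cc, other in enumerate(clusters) if cc != c)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical 4-hour occupancy profiles: two morning, two evening, two midday peaks.
profiles = [[0.9, 0.5, 0.2, 0.1], [0.8, 0.6, 0.2, 0.1],
            [0.1, 0.2, 0.5, 0.9], [0.1, 0.3, 0.6, 0.8],
            [0.2, 0.9, 0.8, 0.2], [0.3, 0.8, 0.9, 0.2]]
scores = {k: silhouette(*average_linkage(profiles, k)) for k in range(2, 5)}
best_k = max(scores, key=scores.get)
print(best_k)  # 3
```

The Davies-Bouldin Index could be added alongside the silhouette in the same loop over k; a library implementation such as scikit-learn's would be the usual choice on real data.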


1983 ◽  
Vol 104 ◽  
pp. 185-186
Author(s):  
M. Kalinkov ◽  
K. Stavrev ◽  
I. Kuneva

An attempt is made to establish the membership of Abell clusters in superclusters of galaxies. The relation is used to calibrate the distances to the clusters of galaxies with two redshift estimates. One is $m_{10}$, the magnitude of the tenth-ranked galaxy, and the other is the "mean population" $P$, defined by […], where $p = 40, 65, 105, \ldots$ galaxies for richness groups $0, 1, 2, \ldots$, and $r$ is the apparent radius in degrees, given by […]. The first iteration for the redshift, $z_1$, is obtained from $m_{10}$ alone: […]. The standard deviation for Eq. (1) is 0.105, the number of clusters with known velocities is 342, and the correlation coefficient between observed and fitted values is 0.921. With $z_i$ from Eq. (1), we define Cartesian galactic coordinates $X_i = R_i h^{-1} \cos B_i \cos L_i$, $Y_i = R_i h^{-1} \cos B_i \sin L_i$, $Z_i = R_i h^{-1} \sin B_i$ for each Abell cluster, $i = 1, \ldots, 2712$, where $R_i$ is the distance to the cluster (Mpc) and $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$.
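The coordinate definition above is a standard spherical-to-Cartesian conversion scaled by $R_i h^{-1}$. A minimal sketch follows, assuming galactic latitude $B$ and longitude $L$ are given in degrees and the distance $R$ in Mpc; the redshift-to-distance step is not given in the abstract, so $R$ is taken as an input. The function name is hypothetical.

```python
import math

def galactic_cartesian(r_mpc, b_deg, l_deg, h=1.0):
    """Cartesian galactic coordinates from distance R (Mpc) and galactic
    latitude B / longitude L in degrees, following
    X = R h^-1 cos B cos L, Y = R h^-1 cos B sin L, Z = R h^-1 sin B."""
    b = math.radians(b_deg)
    l = math.radians(l_deg)
    scale = r_mpc / h  # the R_i h^-1 factor
    return (scale * math.cos(b) * math.cos(l),
            scale * math.cos(b) * math.sin(l),
            scale * math.sin(b))

# A cluster on the galactic plane at L = 90 degrees lies on the Y axis.
x, y, z = galactic_cartesian(100.0, 0.0, 90.0)
print(round(x, 6), round(y, 6), round(z, 6))  # 0.0 100.0 0.0
```

Because the three components come from a unit direction vector, $\sqrt{X_i^2 + Y_i^2 + Z_i^2} = R_i h^{-1}$ for every cluster, which is a quick sanity check on any implementation.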


2018 ◽  
Vol 7 (3) ◽  
pp. 541-559 ◽  
Author(s):  
Justin Esarey ◽  
Andrew Menger

Cluster-robust standard errors (as implemented by the eponymous cluster option in Stata) can produce misleading inferences when the number of clusters G is small, even if the model is consistent and there are many observations in each cluster. Nevertheless, political scientists commonly employ this method in data sets with few clusters. The contributions of this paper are: (a) developing new and easy-to-use Stata and R packages that implement alternative uncertainty measures robust to small G, and (b) explaining and providing evidence for the advantages of these alternatives, especially cluster-adjusted t-statistics based on Ibragimov and Müller. To illustrate these advantages, we reanalyze recent work where results are based on cluster-robust standard errors.
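The cluster-adjusted t-statistic attributed above to Ibragimov and Müller is usually described as follows: estimate the coefficient of interest separately within each of the G clusters, then run an ordinary one-sample t-test on those G estimates with G − 1 degrees of freedom. The sketch below illustrates that idea for a single-regressor slope; it is not the authors' Stata or R package, the function names and data are hypothetical, and the critical value would still have to be taken from Student's t distribution (for example via `scipy.stats.t.ppf`).

```python
import math
import statistics

def ols_slope(x, y):
    """Least-squares slope of y on x (with intercept) for one cluster."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx

def im_t_statistic(groups, beta0=0.0):
    """Cluster-adjusted t-statistic in the spirit of Ibragimov and Müller:
    estimate the slope separately in each of the G clusters, then run a
    one-sample t-test on those G estimates. Returns (t, degrees of freedom)."""
    estimates = [ols_slope(x, y) for x, y in groups]
    g = len(estimates)
    se = statistics.stdev(estimates) / math.sqrt(g)
    return (statistics.fmean(estimates) - beta0) / se, g - 1

# Hypothetical data: three clusters whose within-cluster slopes are 1, 2, 3.
groups = [([0, 1, 2], [0, 1, 2]), ([0, 1, 2], [0, 2, 4]), ([0, 1, 2], [0, 3, 6])]
t, df = im_t_statistic(groups)
print(round(t, 4), df)  # 3.4641 2
```

Because each cluster contributes exactly one estimate, the test's small-sample behaviour is governed by G rather than by the number of observations, which is why it suits the few-clusters setting discussed above.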


2019 ◽  
Vol 56 (8) ◽  
pp. 814-828 ◽  
Author(s):  
John T. Andrews

The goal of the paper is to ascertain whether there are significant regional variations in sediment mineral composition that might be used to elucidate ice sheet histories. The weight percentages of nonclay and clay minerals were determined by quantitative X-ray diffraction. Cluster analysis, an unsupervised learning approach, is used to group the sediment mineralogy of 263 seafloor/core top samples between ∼80°N and 62°N. The optimum number of clusters, based on 30 indexes, was three for the weight percentage data but varied with data transformations. Maps of the distribution of the three mineral clusters or facies indicate a significant difference in weight percentages between samples from the West Greenland and Baffin Island shelves. However, several indexes support a larger number of clusters, and similar analyses of the spatial distribution and defining minerals of nine mineral facies indicated a strong association with the original three clusters and with broad geographic designations (e.g., West Greenland shelf, Baffin Island fiords). Classification decision tree analysis indicates that this difference is primarily controlled by the percentages of plagioclase feldspars versus alkali feldspars.
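The abstract says the number of clusters was chosen "based on 30 indexes" without stating how the indexes were combined. One common way to combine many validity indexes is a simple majority vote over the number of clusters each index recommends; the sketch below shows that rule only as an assumption, with hypothetical index names and votes.

```python
from collections import Counter

def majority_k(recommendations):
    """recommendations: mapping from index name to the number of clusters that
    index favours. Returns the most frequent k and how many indexes chose it.
    Ties go to the k encountered first (Counter.most_common ordering)."""
    k, votes = Counter(recommendations.values()).most_common(1)[0]
    return k, votes

# Hypothetical index names and votes, for illustration only.
votes = {"index_a": 3, "index_b": 3, "index_c": 9, "index_d": 2, "index_e": 3}
print(majority_k(votes))  # (3, 3)
```

A vote like this also makes the paper's caveat visible: a minority of indexes (here `index_c`) can favour a much larger number of clusters, which is worth inspecting rather than discarding.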


2020 ◽  
Vol 6 (01) ◽  
pp. 1-8
Author(s):  
Muhammad Muhajir ◽  
Annisa Ayunda Permata Sari

The Indonesian film industry continues to grow, as seen in the number of films now showing in theaters, with the box office growing by 28 percent each year over the past four years. The Internet Movie Database (IMDb) is a website that provides information about films around the world, including the people involved in them, from actors, directors, and writers to makeup artists, as well as soundtracks. This study examines the characteristics of films and the factors that lead a film to be included in the IMDb Top 250. The data were scraped from the website. Two non-hierarchical clustering methods are used, k-means and DBSCAN: the DBSCAN algorithm is used to determine the optimum number of clusters, and the data are then grouped around centroids with the k-means algorithm. The analysis found that the factors that could influence a film's inclusion in the IMDb Top 250 were duration, number of votes, and direction by Rajkumar Hirani, and that the DBSCAN algorithm gave an optimal number of six clusters. With the improved k-means algorithm, the accuracy of the cluster results is 87.2%.
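The two-stage procedure above (DBSCAN to find how many clusters exist, then k-means to group the data around centroids) can be sketched in plain Python. This is an illustration under assumptions, not the authors' implementation: `eps` and `min_pts` are hypothetical settings, k-means is seeded with the DBSCAN cluster means (the abstract does not say how it was initialised), all points including DBSCAN noise are assigned by k-means, and at least one DBSCAN cluster is assumed to exist.

```python
import math

def dbscan(points, eps, min_pts):
    """Plain DBSCAN. Returns one label per point; -1 marks noise.
    A point's eps-neighbourhood includes the point itself."""
    n = len(points)
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(nbrs[i]) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(nbrs[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: join, but do not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(nbrs[j]) >= min_pts:  # core point: keep expanding
                queue.extend(nbrs[j])
    return labels

def mean(pts):
    """Component-wise mean; returns () for an empty list."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))

def dbscan_then_kmeans(points, eps, min_pts, iters=100):
    labels = dbscan(points, eps, min_pts)
    found = sorted({l for l in labels if l != -1})
    k = len(found)  # number of clusters suggested by DBSCAN
    # Assumption: seed k-means with the DBSCAN cluster means.
    centroids = [mean([p for p, l in zip(points, labels) if l == c]) for c in found]
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # An empty cluster keeps its old centroid (mean([]) is the falsy ()).
        new = [mean([p for p, a in zip(points, assign) if a == c]) or centroids[c]
               for c in range(k)]
        if new == centroids:
            break
        centroids = new
    return k, assign, centroids

# Hypothetical points: two tight groups plus one isolated point at (5, 5).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (5, 5)]
k, assign, centroids = dbscan_then_kmeans(points, eps=1.5, min_pts=2)
print(k, assign)  # 2 [0, 0, 0, 1, 1, 1, 0]
```

DBSCAN marks (5, 5) as noise, so it does not add a cluster, but k-means still assigns it to the nearest centroid; whether to keep or drop such points before the k-means stage is a design choice the abstract leaves open.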


Author(s):  
Arif Fajar Solikin ◽  
Kusrini Kusrini ◽  
Ferry Wahyu Wibowo

An intercomparison was conducted to determine the capability and performance of the laboratory. Intercomparison results are usually expressed as En ratio values (|En| ≤ 1), which express the equivalence of one laboratory with other laboratories. If a laboratory is declared nonequivalent, it needs to identify the source of the problem itself. Clustering, one of the data mining techniques, can make this easier. Clustering is done by applying a self-organizing map algorithm in the KNIME (Konstanz Information Miner) analytics tool. Several experiments were carried out, varying the layer size and whether the data were normalized from one experiment to another. The results were analyzed with the pseudo F statistic test and the icdrate test. The largest pseudo F statistic value was obtained from the 8th experiment (2x2 layer size without data normalization), with a pseudo F statistic of 167.53 for 1 kg artifacts and 104.86 for 200 g artifacts, where the optimum number of clusters is 4. The smallest icdrate value was obtained from the 5th experiment (2x3 layer size without data normalization), with an icdrate value of 0.0713 for 1 kg artifacts and 0.2889 for 200 g artifacts, with the best number of clusters being 6. The 12 laboratories can be grouped into 6 groups, where each group has the same identification. Groups 1, 3, and 6 have one member each, while groups 2, 4, and 5 have three members each.
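Two of the quantities above have compact definitions that are easy to check by hand. The En number, as commonly defined in proficiency testing, divides a laboratory's deviation from the reference value by the combined expanded uncertainties; the pseudo F statistic used to compare clusterings is the Calinski-Harabasz ratio of between-cluster to within-cluster dispersion. The sketch below computes both; the function names and all numbers are hypothetical, and the self-organizing map step in KNIME is not reproduced (any cluster labels can be passed in).

```python
import math

def en_number(x_lab, x_ref, u_lab, u_ref):
    """E_n = (x_lab - x_ref) / sqrt(U_lab^2 + U_ref^2), with U the expanded
    uncertainties; |E_n| <= 1 is conventionally judged satisfactory."""
    return (x_lab - x_ref) / math.sqrt(u_lab ** 2 + u_ref ** 2)

def pseudo_f(points, labels):
    """Pseudo F (Calinski-Harabasz): [SSB / (k - 1)] / [SSW / (n - k)]."""
    n, dim = len(points), len(points[0])
    groups = sorted(set(labels))
    k = len(groups)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    ssb = ssw = 0.0
    for g in groups:
        members = [p for p, l in zip(points, labels) if l == g]
        centre = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        # Between-cluster part: cluster size times squared centre offset.
        ssb += len(members) * sum((centre[d] - overall[d]) ** 2 for d in range(dim))
        # Within-cluster part: squared distances of members to their centre.
        ssw += sum((p[d] - centre[d]) ** 2 for p in members for d in range(dim))
    return (ssb / (k - 1)) / (ssw / (n - k))

print(abs(en_number(10.2, 10.0, 0.3, 0.4)) <= 1)  # True
print(pseudo_f([(0.0,), (2.0,), (10.0,), (12.0,)], [0, 0, 1, 1]))  # 50.0
```

A larger pseudo F means clusters that are far apart relative to their internal spread, which is why the abstract selects the experiment with the largest value.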
