CLUSTER ANALYSIS AND RADIATION MONITORING OF ENVIRONMENT

Author(s):  
O. Getmanets ◽  
A. Nekos ◽  
M. Pelikhatyi

Building a background radiation field on the ground from measurement data taken at a finite number of points is one of the most important tasks of radiation monitoring. Aim of the work: to study the possibility of applying cluster analysis to the tasks of radiation monitoring of the environment. Cluster analysis is a form of multidimensional statistical analysis. Its main purpose is to split the set of objects under study (observation points) into homogeneous groups, or clusters; that is, it solves the task of classifying data and identifying the corresponding structure in them. Methods of research: measurement of the ambient dose rate of continuous X-ray and gamma radiation on the terrain using the MKS-05 dosimeter "TERRA-0"; processing of the obtained data by cluster analysis methods using the computer program "Statistics-10", wherein each cluster point is characterized by three coordinates: two coordinates on the ground and the ambient dose rate at the given point; Euclidean distance was chosen as the distance between two points. Results: after processing the data with several clustering methods (Complete Linkage, Weighted Pair-Group Average and Ward's method), it was found that the results of the analysis practically coincide with one another, which confirms the reliability of cluster analysis for the tasks of radiation monitoring of the environment and mapping of radiation pollution. Conclusions: the concept of a "radiation cluster", combining coordinates on a plane with an ambient dose rate, was first formulated in this work; the possibility of using cluster analysis to construct a map of radiation pollution of the environment has been proved by sequential projection from more connected to less connected radiation clusters onto the plane of the controlled zone. In this sense, cluster analysis is similar to the operator approach to the construction of the radiation field. 
For further research, it is of some interest to study the issues of integration of cluster analysis with geographic information systems.
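
As a rough illustration of the three-coordinate "radiation cluster" idea, the following minimal Python sketch groups observation points with complete linkage and Euclidean distance. It is not the authors' workflow (they used a statistics package); the function name `complete_linkage`, the sample points and the target of two clusters are assumptions made for the example, and no rescaling of coordinates against dose rate is applied, although in practice that choice may matter.

```python
import math

def complete_linkage(points, n_clusters):
    """Agglomerative clustering with complete linkage and Euclidean distance."""
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Complete linkage: distance between the two farthest members.
        return max(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        _, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]

# Hypothetical observation points: (x, y, ambient dose rate).
points = [
    (0.0, 0.0, 0.10),
    (1.0, 0.0, 0.11),
    (0.0, 1.0, 0.12),
    (10.0, 10.0, 0.30),
    (11.0, 10.0, 0.31),
]
radiation_clusters = complete_linkage(points, 2)
print(radiation_clusters)
```

Swapping the body of `linkage` for an average or a Ward-style criterion would give the other methods named in the abstract; the sketch keeps only the complete-linkage case for brevity.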

10.12737/7483 ◽  
2014 ◽  
Vol 8 (7)
Author(s):  
Олег Сдвижков ◽  
Oleg Sdvizhkov

Cluster analysis [3] is a relatively new branch of mathematics that studies methods for partitioning a set of objects, described by a finite set of attributes, into homogeneous groups (clusters). Cluster analysis is widely used in psychology, sociology, economics (market segmentation), and many other areas in which objects must be classified according to their characteristics. Clustering methods are implemented in the STATISTICA [1] and SPSS [2] packages, which return the partition into clusters, clustering and dispersion statistics, and the dendrogram of hierarchical clustering algorithms. MS Excel macros for the main clustering methods, with application examples, are given in the monograph [5]. One of the central problems of cluster analysis is to define a criterion for the number of clusters, denoted K, into which a given set of objects is separated. There are several dozen approaches [4] to determining K. In particular, according to [6], K is the minimum number that satisfies a condition (not reproduced here) involving the minimum value of total dispersion for a partition into K clusters and N, the number of objects. The number of clusters can also be obtained automatically by the successive application of anomalous clusters [4]. In 2010, a method for obtaining K by applying a density function was proposed and experimentally validated [4]. The article offers two simple approaches to determining K in which each cluster has at least two objects. In the first, K is determined from the shortest Hamiltonian cycle; in the second, from the minimum spanning tree. Examples of clustering with detailed step-by-step solutions and graphic illustrations are given. The use of an Excel VBA macro that returns the minimum spanning tree is shown for clustering problems. The article contains the macro code, with comments on the main block.
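
The abstract does not state the rule by which K is read off the minimum spanning tree, so the following Python sketch only illustrates the general idea under an assumed rule: build the tree with Prim's algorithm, drop the single longest edge, and take the remaining connected components as clusters. The function names and the sample points are hypothetical, and this is not a translation of the article's VBA macro.

```python
import math

def minimum_spanning_tree(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j) edges."""
    in_tree = {0}
    edges = []
    while len(in_tree) < len(points):
        # Shortest edge leading from the tree to a node outside it.
        length, i, j = min(
            (math.dist(points[i], points[j]), i, j)
            for i in in_tree
            for j in range(len(points))
            if j not in in_tree
        )
        in_tree.add(j)
        edges.append((length, i, j))
    return edges

def components(n, edges):
    """Connected components of a graph on nodes 0..n-1, by merging labels."""
    label = list(range(n))
    for _, i, j in edges:
        old, new = label[j], label[i]
        label = [new if l == old else l for l in label]
    groups = {}
    for node, l in enumerate(label):
        groups.setdefault(l, []).append(node)
    return sorted(groups.values())

points = [(0, 0), (0, 1), (1, 0), (8, 8), (8, 9), (9, 8)]
mst = minimum_spanning_tree(points)
# Assumed rule for illustration only: remove the single longest tree edge.
kept = sorted(mst)[:-1]
clusters = components(len(points), kept)
print(clusters)
```

With these points the cut leaves two groups of three objects each, so the constraint that every cluster holds at least two objects happens to be met; a fuller version would have to check that constraint before accepting a cut.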


2016 ◽  
Author(s):  
Matthew J Vavrek

Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to non-Euclidean based k-means and the NERC method. Based on this analysis, Unweighted Pair Group Method with Arithmetic Mean and neighbor joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place.
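
The quantity that NERC is described as maximizing, average similarity within clusters, can be sketched in a few lines. The Python below is not the NERC implementation (which the abstract says is written in R); it assumes the Jaccard index as the non-Euclidean similarity, pools all within-cluster pairs into one average, and uses made-up presence lists, simply to show how one candidate partition can be scored against another.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of taxa (a non-Euclidean index)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def mean_within_similarity(samples, labels):
    """Average pairwise similarity over all pairs that share a cluster label."""
    total, pairs = 0.0, 0
    for i in range(len(samples)):
        for j in range(i + 1, len(samples)):
            if labels[i] == labels[j]:
                total += jaccard(samples[i], samples[j])
                pairs += 1
    return total / pairs if pairs else 0.0

# Hypothetical taxon presence lists for four localities.
samples = [{"A", "B", "C"}, {"A", "B"}, {"X", "Y"}, {"X", "Y", "Z"}]
good = mean_within_similarity(samples, [0, 0, 1, 1])
bad = mean_within_similarity(samples, [0, 1, 0, 1])
print(good, bad)
```

An iterative method in this spirit would move samples between groups and keep a move only when the score improves; the sketch stops at the scoring step.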


PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1720 ◽  
Author(s):  
Matthew J. Vavrek

Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to non-Euclidean based k-means and the NERC method. Based on this analysis, Unweighted Pair Group Method with Arithmetic Mean and neighbor joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place.


2007 ◽  
Vol 132 (3) ◽  
pp. 387-395 ◽  
Author(s):  
Guillermo Padilla ◽  
María Elena Cartea ◽  
Amando Ordás

Four clustering methods were compared for classification of a collection of 148 kale landraces (Brassica oleracea L. acephala group) from northwestern Spain based on morphologic characters: the unweighted pair group method using arithmetic averages (UPGMA) and the Ward method, hierarchical cluster algorithms, and the modified location model (MLM) applied to both the UPGMA and the Ward method (UPGMA-MLM and Ward-MLM, respectively). Comparisons were based on five criteria and on subjective considerations about the structure of each method and the characteristics of the material evaluated. Although the UPGMA-MLM was superior according to the objective criteria, its slight advantage with respect to the Ward-MLM strategy did not overcome the fact that the initial UPGMA cluster generated a classification with little value. The Ward-MLM strategy generated five homogeneous groups with defined morphologic characteristics. Moreover, the Ward-MLM strategy allowed the identification of redundant landraces, which would permit the number of accessions in further critical trials to be reduced.


2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Pooja Sengupta ◽  
Bhaswati Ganguli ◽  
Sugata SenRoy ◽  
Aditya Chatterjee

Abstract Background In this study we cluster the districts of India in terms of the spread of COVID-19 and related variables such as population density and the number of specialty hospitals. Simulation using a compartment model is used to provide insight into differences in response to public health interventions. Two case studies of interest from Nizamuddin and Dharavi provide contrasting pictures of the success in curbing spread. Methods A cluster analysis of the worst affected districts in India provides insight into the similarities between them. The effects of public health interventions in flattening the curve in their respective states are studied using the individual contact SEIQHRF model, a stochastic individual compartment model which simulates disease prevalence in the susceptible, infected, recovered and fatal compartments. Results The clustering of hotspot districts provides homogeneous groups that can be discriminated in terms of number of cases and related covariates. The cluster analysis reveals that the distribution of the number of COVID-19 hospitals in the districts does not correlate with the distribution of confirmed COVID-19 cases. From the SEIQHRF model for Nizamuddin we observe that, in the second phase, the number of infected individuals increased manifold in the states to which Nizamuddin attendees returned, increasing the risk of disease spread. However, the simulations reveal that implementing administrative interventions flattens the curve. In Dharavi, through tracing, tracking, testing and treating, a massive outbreak of COVID-19 was brought under control. Conclusions The cluster analysis performed on the districts reveals homogeneous groups of districts that can be ranked based on the burden placed on the healthcare system in terms of number of confirmed cases, population density and number of hospitals dedicated to COVID-19 treatment. 
The study concludes with two important case studies on Nizamuddin basti and Dharavi to illustrate the growth curve of COVID-19 in two very densely populated regions in India. In the case of Nizamuddin, the study showed that there was a manifold increase in the risk of infection. In contrast, there was a rapid decline in the number of cases in Dharavi within a span of about one month.


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Anna Magdalena Korzeniowska

Abstract Social expenditure plays an important role in European Union (EU) countries. It improves the lives of citizens whose welfare is endangered due to poverty or illness. However, social expenditure represents a considerable share of the budgets of EU member states. Despite evident similarities in their levels of development, EU countries show apparent differences in social expenditure levels. Therefore, this work aims to determine the similarities and differences between EU countries in this regard. The analysis uses clustering methods, such as hierarchical cluster analysis and the k-means, to divide countries into homogeneous groups. The research demonstrates significant differences between EU countries in the years 2008–2018, which resulted in a low number of objects (countries) in the identified groups. In the case of 6 out of 28 countries, it was not possible to assign them to any group. The research proves that EU countries should take more care when organising their social policy, taking into consideration cultural and social factors.


Genetics ◽  
2001 ◽  
Vol 159 (2) ◽  
pp. 699-713
Author(s):  
Noah A Rosenberg ◽  
Terry Burke ◽  
Kari Elo ◽  
Marcus W Feldman ◽  
Paul J Freidlin ◽  
...  

Abstract We tested the utility of genetic cluster analysis in ascertaining population structure of a large data set for which population structure was previously known. Each of 600 individuals representing 20 distinct chicken breeds was genotyped for 27 microsatellite loci, and individual multilocus genotypes were used to infer genetic clusters. Individuals from each breed were inferred to belong mostly to the same cluster. The clustering success rate, measuring the fraction of individuals that were properly inferred to belong to their correct breeds, was consistently ~98%. When markers of highest expected heterozygosity were used, genotypes that included at least 8–10 highly variable markers from among the 27 markers genotyped also achieved >95% clustering success. When 12–15 highly variable markers and only 15–20 of the 30 individuals per breed were used, clustering success was at least 90%. We suggest that in species for which population structure is of interest, databases of multilocus genotypes at highly variable markers should be compiled. These genotypes could then be used as training samples for genetic cluster analysis and to facilitate assignments of individuals of unknown origin to populations. The clustering algorithm has potential applications in defining the within-species genetic units that are useful in problems of conservation.
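
The "clustering success rate" is defined in the abstract as the fraction of individuals inferred to belong to their correct breed. A minimal Python sketch of that arithmetic follows; it assumes that each inferred cluster is matched to whichever breed is most common in it, which is a convention chosen for this example rather than the paper's stated procedure, and the labels are made up.

```python
from collections import Counter

def clustering_success(breeds, clusters):
    """Fraction of individuals whose cluster's majority breed matches their own."""
    by_cluster = {}
    for breed, cluster in zip(breeds, clusters):
        by_cluster.setdefault(cluster, Counter())[breed] += 1
    # In each cluster, the individuals of the majority breed count as correct.
    correct = sum(counts.most_common(1)[0][1] for counts in by_cluster.values())
    return correct / len(breeds)

# Hypothetical true breeds and inferred cluster labels for six individuals.
breeds = ["a", "a", "a", "b", "b", "b"]
clusters = [0, 0, 1, 1, 1, 1]
rate = clustering_success(breeds, clusters)
print(rate)
```

Here one individual of breed "a" lands in the cluster dominated by breed "b", so five of the six individuals count as correctly placed.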


2003 ◽  
Vol 01 (03) ◽  
pp. 447-458 ◽  
Author(s):  
Xiwei Wu ◽  
T. Gregory Dewey

Cluster analysis has proven to be a valuable statistical method for analyzing whole genome expression data. Although clustering methods have great utility, they do represent a lower level statistical analysis that is not directly tied to a specific model. To extend such methods and to allow for more sophisticated lines of inference, we use cluster analysis in conjunction with a specific model of gene expression dynamics. This model provides phenomenological dynamic parameters on both linear and non-linear responses of the system. This analysis determines the parameters of two different transition matrices (linear and nonlinear) that describe the influence of one gene expression level on another. Using yeast cell cycle microarray data as a test set, we calculated the transition matrices and used these dynamic parameters as a metric for cluster analysis. Hierarchical cluster analysis of this transition matrix reveals how a set of genes influences the expression of other genes activated during different cell cycle phases. Most strikingly, genes in different stages of the cell cycle preferentially activate or inactivate genes in other stages of the cell cycle, and this relationship can be readily visualized in a two-way clustering image. The observation is made prior to any knowledge of the chronological characteristics of the cell cycle process. This method shows the utility of using model parameters as a metric in cluster analysis.


2017 ◽  
Vol 34 (1) ◽  
pp. 123-133 ◽  
Author(s):  
Zeguang Yi ◽  
Nan Pan ◽  
Yi Liu ◽  
Yu Guo

Purpose This paper aims to reduce and eliminate the abnormal peaks caused by reflection during laser detection, so that further analysis becomes easier. Design/methodology/approach To solve this problem, an abnormal data correction algorithm based on a histogram, K-Means clustering and improved robust locally weighted scatter plot smoothing (LOWESS) is put forward. The proposed algorithm first performs section leveling for the shear plant, then applies a histogram to define the abnormal fluctuation data between neighboring points and uses K-Means clustering to eliminate the abnormal data. After that, the improved robust LOWESS method, which is based on Euclidean distance, is used to remove the noise interference and finally obtain the waveform characteristics for subsequent data processing. Findings The experimental results of liner tool mark laser test data correction demonstrate the accuracy and reliability of the proposed algorithm. Originality/value The study enables the following: automatic leveling of the detection signal; abnormal data identification and demarcation using K-Means clustering and a histogram; and data smoothing using LOWESS.
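
The abstract gives no implementation details, so the Python sketch below only illustrates the general idea of using K-Means to separate abnormal fluctuations between neighboring points from ordinary ones. It assumes a one-dimensional two-cluster K-Means on absolute point-to-point jumps and flags a sample when the jumps on both sides of it fall in the high cluster; the function name, the sample profile and that flagging rule are assumptions, and the histogram and LOWESS stages are left out.

```python
def two_means_1d(values, iterations=20):
    """Plain 1-D k-means with k=2, initialised at the minimum and maximum."""
    low, high = min(values), max(values)
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        if not near_high:
            break  # all values sit in one group; nothing to separate
        low = sum(near_low) / len(near_low)
        high = sum(near_high) / len(near_high)
    return low, high

# Hypothetical laser height profile with one reflection spike at index 4.
profile = [1.0, 1.1, 1.0, 1.2, 9.0, 1.1, 1.0, 1.1]
jumps = [abs(b - a) for a, b in zip(profile, profile[1:])]
low, high = two_means_1d(jumps)
# A jump is "big" when it lies closer to the high centre than to the low one.
is_big = [abs(j - high) < abs(j - low) for j in jumps]
# Flag interior samples that have a big jump on both sides.
abnormal = [i for i in range(1, len(profile) - 1) if is_big[i - 1] and is_big[i]]
print(abnormal)
```

Flagged samples could then be dropped or interpolated before smoothing; which of those the paper does is not stated in the abstract.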

