The method of dendrograms disclosure for evaluation of cluster analysis results in IoT domain

Author(s):  
Roman Kaminskyy ◽  
Nataliya Shakhovska

Background: The growing volume of information generated by smart city activity creates problems of accumulation and preprocessing. One type of data preprocessing is clustering. Cluster analysis is an objective method of classification. It supports an appropriate choice of further processing methods as well as the visualization and interpretation of the collected data, which are multidimensional objects. The most valuable feature of cluster analysis is the representation of the result as a dendrogram that reflects a particular hierarchy of relationships between the selected clusters and their objects. The aim of the paper is to develop a method of 3D visualization of hierarchical clustering for streaming and multidimensional data collected from IoT devices and open databases. Methods: It is suggested that a more detailed interpretation of the dendrogram is made by implementing the hypothesis given above. Testing this hypothesis means a procedure of visualizing and interpreting the result of a cluster analysis. The disclosed dendrogram allows full use of the association metric. Since this metric is derived from the calculation of the values of the proximity matrix in accordance with the chosen object pooling strategy, the use of the disclosed dendrogram is quite legitimate. In addition, the procedure for opening the dendrogram is specific and unambiguous. This method is built on the hierarchical clustering algorithm as the simplest and fastest one. The developed algorithm should make it impossible for clusters to intersect on a plane. It is also necessary to look for the distance not only between objects, but also between clusters, represented as complex geometric figures. This will allow the nature of the clusters to be explained. Results: The result of the research and verification of the proposed hypothesis is the dendrogram disclosure algorithm as an extension of classical methods of cluster analysis.
This extension is made by studying and disclosing the resulting image of the dendrogram. The dendrogram visualization thus obtained differs significantly from the classical results. Opening the dendrogram according to the developed algorithm allows 3D visualization of the analysis results, as well as calculation of the area and perimeter of the obtained clusters. Therefore, using analytical geometry methods, it is quite easy to isolate and calculate the parameters of minimum cluster coverage surfaces and the direct distances between any objects of one or different clusters, as well as between the objects of a given cluster. This, in turn, is a significant complement to cluster analysis. Conclusion: The disclosed dendrogram retains proportions in distances between objects. On the basis of these characteristics, it is possible to determine how closely the clusters are related by correlating the averaged quantitative values of their traits. Thus, opening the dendrogram makes it possible to clearly identify the set of clusters, each of which has its own distribution of the range of feature values. The quantitative characteristics of clusters on both dendrograms are quite simple. In addition, the mean values of the features of objects in a given cluster can be interpreted as generalized characteristics of this cluster, and the cluster itself can be represented as a single integral object.
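To make the link between a proximity calculation and a dendrogram concrete, here is a minimal sketch of agglomerative clustering that records each merge and its height, which is the information a dendrogram draws. It is not the authors' disclosure algorithm: the single-linkage pooling strategy, the function name `single_linkage`, and the sample points are assumptions chosen only for illustration.

```python
from itertools import combinations
from math import dist


def single_linkage(points):
    """Agglomerative clustering; returns the merge history as a list of
    (cluster_a, cluster_b, merge_distance, new_cluster_size) tuples."""
    # Each original point starts as its own cluster, keyed by an integer id.
    clusters = {i: [i] for i in range(len(points))}
    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        for a, b in combinations(sorted(clusters), 2):
            # Single linkage (assumed here): distance between the closest members.
            d = min(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Merge the closest pair under a fresh id and record the step.
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, d, len(clusters[next_id])))
        next_id += 1
    return merges


# Hypothetical 2D observations: two tight pairs that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
merges = single_linkage(points)
for a, b, height, size in merges:
    print(f"merge {a} + {b} at height {height:.2f} (size {size})")
```

The merge heights are the values a dendrogram plots on its vertical axis, so any layout of the tree, flat or "opened", can be derived from this list.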

1993 ◽  
Vol 12 (3) ◽  
pp. 98-109 ◽  
Author(s):  
P.L. Pelmear ◽  
R. Kusiak ◽  
B. Dembek

Three hundred and sixty-four patients exposed to hand-arm vibration at work were assessed in a clinical laboratory designated for this purpose in Toronto, Canada during the period 1989–92. The assessment included completion of a subjective history questionnaire; a medical examination of the upper torso, cardiovascular and central nervous systems; and multiple vascular, sensorineural and laboratory tests. The test results were used to assess the severity of Hand-arm Vibration Syndrome (HAVS) and grade the subjects according to the Stockholm stages. A statistical clustering algorithm was used to categorise the subjects according to the results of their diagnostic tests. One set of clusters was made from the vascular test results and another from the sensory test results. These clusters were compared with the Stockholm history (SH) and diagnostic (SD) stages. The clusters based upon the vascular test results (the vascular clusters) agreed well with the SD vascular stages (P = 1×10⁻⁹), and less well with the SD sensorineural stages (P = 0.04). The clusters based upon the sensory test results (the sensory clusters) agreed well with the SD sensorineural stages (P = 0.0003), and less well with the SD vascular stages (P = 0.03). The mean values for the diagnostic tests within the clusters were compared. The sensory test results differed between the sensory clusters while most vascular test results did not. Likewise, the vascular test results differed between the vascular clusters while most sensory tests did not. A comparison of the vascular and sensory clusters showed that while some men suffered severe sensory effects and others suffered severe vascular effects, few suffered both. Hence this analysis confirms that the severity grading of sensory and vascular components of HAVS must be evaluated separately as now practised.
This cluster analysis technique (SAS's FASTCLUS procedure) has proved to be useful for the objective analysis of the results from many diagnostic tests on a large group of individuals. The reference data of the tests within the cluster groupings provide a basis for the objective classification of the severity of HAVS in individual patients.
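The kind of grouping described above, in which subjects are assigned to the nearest cluster centre and the centres are then recomputed as within-cluster means, can be sketched with a plain k-means loop. This is an illustrative stand-in and not the SAS FASTCLUS procedure itself; the function name `kmeans`, the seed choice, and the two-test scores are hypothetical.

```python
from math import dist


def kmeans(points, seeds, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute centroids as cluster means, until nothing changes."""
    centroids = [tuple(s) for s in seeds]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        # Update step: mean of each cluster (an empty cluster keeps its centroid).
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: labels match the current centroids
        centroids = new_centroids
    return labels, centroids


# Hypothetical scores on two diagnostic tests for six subjects.
scores = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(scores, seeds=[scores[0], scores[3]])
print(labels)
print(centroids)
```

The returned centroids play the role of the per-cluster mean test values that the abstract uses as reference data for grading individual patients.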


Author(s):  
Mohana Priya K ◽  
Pooja Ragavi S ◽  
Krishna Priya G

Clustering is the process of grouping objects into subsets that have meaning in the context of a particular problem. It does not rely on predefined classes. It is referred to as an unsupervised learning method because no information is provided about the "right answer" for any of the objects. Many clustering algorithms have been proposed and are used in different applications. Sentence clustering is one of the best clustering techniques. A hierarchical clustering algorithm is applied at multiple levels for accuracy. A POS tagger and the Porter stemmer are used for tagging. The WordNet dictionary is used to determine similarity by invoking the Jiang-Conrath and cosine similarity measures. Grouping is performed with respect to the highest similarity value, with the mean used as a threshold. This paper incorporates many parameters for finding the similarity between words. To disambiguate words, sense identification is performed for the adjectives and the results are compared. The SemCor and machine learning datasets are employed. Compared with previous results for word sense disambiguation (WSD), our work shows a considerable improvement, reaching 91.2%.
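The cosine measure and the mean threshold mentioned above can be illustrated with a small bag-of-words sketch. It is an assumption-laden simplification: it omits the POS tagging, stemming, and WordNet-based Jiang-Conrath measure that the paper uses, and the function name `cosine_similarity` and the three sentences are invented for the example.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine_similarity(sentence_a, sentence_b):
    """Cosine of the angle between two word-count vectors."""
    va = Counter(sentence_a.lower().split())
    vb = Counter(sentence_b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


# Hypothetical sentences; "bank" is used in two different senses.
sentences = [
    "the bank approved the loan",
    "the bank rejected the loan",
    "the river bank was muddy",
]

# Pairwise similarities, then the mean similarity as the grouping threshold.
sims = {(i, j): cosine_similarity(sentences[i], sentences[j])
        for i, j in combinations(range(len(sentences)), 2)}
threshold = sum(sims.values()) / len(sims)
similar_pairs = sorted(pair for pair, s in sims.items() if s > threshold)
print(similar_pairs)
```

Only the pairs whose similarity exceeds the mean are grouped, which keeps the two financial sentences together and leaves the river sentence apart.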


2015 ◽  
pp. 125-138 ◽  
Author(s):  
I. V. Goncharenko

In this article we propose a new method of non-hierarchical cluster analysis using a k-nearest-neighbor graph and discuss it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classification was originally developed in 1951 (Fix, Hodges, 1951). Later the term "k-NN graph" and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology k-NN is used in the analysis of protein structures and genome sequences. Most k-NN clustering algorithms first build an "excessive" graph, a so-called hypergraph, and then truncate it into subgraphs by partitioning and coarsening the hypergraph. We developed another strategy, "upward" clustering, which forms (assembles sequentially) one cluster after another. Until now, graph-based cluster analysis has not been considered for the classification of vegetation datasets.
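As a rough illustration of what a k-NN graph is and how clusters can be read from it, the sketch below links every object to its k nearest neighbours and reports the connected components of the resulting undirected graph. This is a generic graph-based grouping, not the "upward" clustering proposed in the article; the function names and the coordinates (standing in for vegetation plots) are hypothetical.

```python
from math import dist


def knn_graph(points, k):
    """Map each point index to the set of its k nearest neighbours."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: dist(p, points[j]))
        graph[i] = set(others[:k])
    return graph


def connected_components(graph):
    """Treat k-NN links as undirected edges and return the components."""
    undirected = {i: set(nbrs) for i, nbrs in graph.items()}
    for i, nbrs in graph.items():
        for j in nbrs:
            undirected[j].add(i)
    seen, components = set(), []
    for start in sorted(undirected):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from start.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in undirected[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        components.append(sorted(component))
    return components


# Hypothetical plot coordinates in a 2D ordination space.
plots = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
graph = knn_graph(plots, k=2)
clusters = connected_components(graph)
print(clusters)
```

With a small k the graph stays sparse, so well-separated groups fall into separate components without any need to fix the number of clusters in advance.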


Author(s):  
Ana Belén Ramos-Guajardo

A new clustering method for random intervals that are measured in the same units over the same group of individuals is provided. It takes into account the similarity degree between the expected values of the random intervals that can be analyzed by means of a two-sample similarity bootstrap test. Thus, the expectations of each pair of random intervals are compared through that test and a p-value matrix is finally obtained. The suggested clustering algorithm considers such a matrix where each p-value can be seen at the same time as a kind of similarity between the random intervals. The algorithm is iterative and includes an objective stopping criterion that leads to statistically similar clusters that are different from each other. Some simulations to show the empirical performance of the proposal are developed and the approach is applied to two real-life situations.
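One way to picture clustering driven by a p-value matrix is the sketch below, which repeatedly merges the two groups whose weakest pairwise p-value is largest and stops once that value drops below a significance level. The merge rule, the `alpha` stopping criterion, the function name, and the matrix values are all assumptions made for illustration; the abstract does not specify these details, and no bootstrap test is run here.

```python
def cluster_by_pvalues(pvals, alpha=0.05):
    """Merge groups while every cross-group pair is still 'similar',
    i.e. the smallest p-value between the two groups is at least alpha."""
    clusters = [[i] for i in range(len(pvals))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Group similarity: the weakest pairwise p-value (assumed rule).
                sim = min(pvals[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or sim > best[0]:
                    best = (sim, a, b)
        sim, a, b = best
        if sim < alpha:
            break  # remaining groups differ significantly: stop merging
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]
    return clusters


# Made-up symmetric p-value matrix for four random intervals.
pvals = [
    [1.00, 0.80, 0.01, 0.02],
    [0.80, 1.00, 0.03, 0.01],
    [0.01, 0.03, 1.00, 0.60],
    [0.02, 0.01, 0.60, 1.00],
]
groups = cluster_by_pvalues(pvals)
print(groups)
```

Using the significance level as the stopping rule means the number of clusters is decided by the tests rather than chosen beforehand.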


Mathematics ◽  
2021 ◽  
Vol 9 (4) ◽  
pp. 370
Author(s):  
Shuangsheng Wu ◽  
Jie Lin ◽  
Zhenyu Zhang ◽  
Yushu Yang

The fuzzy clustering algorithm has become a research hotspot in many fields because of its better clustering effect and data expression ability. However, little research focuses on the clustering of hesitant fuzzy linguistic term sets (HFLTSs). To fill in the research gaps, we extend the data type of clustering to hesitant fuzzy linguistic information. A kind of hesitant fuzzy linguistic agglomerative hierarchical clustering algorithm is proposed. Furthermore, we propose a hesitant fuzzy linguistic Boole matrix clustering algorithm and compare the two clustering algorithms. The proposed clustering algorithms are applied in the field of judicial execution, which provides decision support for the executive judge to determine the focus of the investigation and the control. A clustering example verifies the clustering algorithm’s effectiveness in the context of hesitant fuzzy linguistic decision information.


Author(s):  
N. P. Szabó ◽  
B. A. Braun ◽  
M. M. G. Abdelrahman ◽  
M. Dobróka

The identification of lithology, fluid types, and total organic carbon content is of great priority in the exploration of unconventional hydrocarbons. As a new alternative, a further developed K-means type clustering method is suggested for the evaluation of shale gas formations. The traditional approach of cluster analysis is mainly based on the use of the Euclidean distance for grouping the objects of multivariate observations into different clusters. The high sensitivity of the L2 norm to non-Gaussian distributed measurement noise is well known, and it can be reduced by selecting a more suitable norm as the distance metric. To suppress the harmful effect of non-systematic errors and outlying data, the Most Frequent Value method as a robust statistical estimator is combined with the K-means clustering algorithm. The Cauchy-Steiner weights calculated by the Most Frequent Value procedure are applied to measure the weighted distance between the objects, which improves the performance of cluster analysis compared to the Euclidean norm. At the same time, the centroids are also calculated as a weighted average (using the Most Frequent Value method), instead of applying the arithmetic mean. The suggested statistical method is tested using synthetic datasets as well as observed wireline logs, mud-logging data and core samples collected from the Barnett Shale Formation, USA. The synthetic experiment using extremely noisy well logs demonstrates that the newly developed robust clustering procedure is able to separate the geological-lithological units in hydrocarbon formations and provide additional information to standard well log analysis. It is also shown that the Cauchy-Steiner weighted cluster analysis is affected less by outliers, which allows a more efficient processing of poor-quality wireline logs and an improved evaluation of shale gas reservoirs.
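The effect of replacing the arithmetic mean with a weighted average can be seen in a one-dimensional sketch. It iterates a Cauchy-type weighted mean in which each value receives the weight scale² / (scale² + (value − center)²), so distant outliers contribute almost nothing. This is a simplification under stated assumptions: the scale is a fixed, hypothetical parameter rather than being estimated as in the full Most Frequent Value procedure, and the function name and log readings are invented.

```python
def cauchy_weighted_mean(values, scale=1.0, tol=1e-9, max_iter=200):
    """Iteratively reweighted mean with Cauchy-type weights.
    The scale is held fixed here (an assumption of this sketch)."""
    center = sorted(values)[len(values) // 2]  # start near the middle of the data
    for _ in range(max_iter):
        # Values far from the current center get weights close to zero.
        weights = [scale ** 2 / (scale ** 2 + (v - center) ** 2) for v in values]
        new_center = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_center - center) < tol:
            return new_center
        center = new_center
    return center


# Hypothetical well-log readings with one gross outlier.
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 50.0]
plain_mean = sum(readings) / len(readings)
robust_center = cauchy_weighted_mean(readings)
print(f"arithmetic mean: {plain_mean:.3f}")
print(f"weighted center: {robust_center:.3f}")
```

In a K-means setting the same weighting would be applied per cluster when updating each centroid, which is why a single spurious reading pulls the centroid far less than it pulls the arithmetic mean.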


2019 ◽  
Vol 13 (4) ◽  
pp. 745-752 ◽  
Author(s):  
Habibolah Khazaie ◽  
Ali Zakiei ◽  
Saeid Komasi

Objective: The current study compares the measures of sleep quality and intensity of insomnia based on the clustering analysis of variables including dysfunctional beliefs and attitudes about sleep, experiential avoidance, personality traits of neuroticism, and complications with emotion regulation among the individuals struck by an earthquake in Kermanshah Province. Methods: This cross-sectional study was carried out among earthquake victims of Kermanshah Province (western Iran) in 2017. Data collection started 10 days after the earthquake and lasted for 2 weeks; of 1,200 standard questionnaires distributed, 1,001 responses were received, and the analysis was performed using 999 participants. The data analysis was carried out using cluster analysis (K-means method). Results: Two clusters were identified, and there is a significant difference between these two clusters in regard to all of the variables. The cluster with higher mean values for the selected variables shows a higher intensity of insomnia and a lower sleep quality. Conclusions: Considering the current results, it can be concluded that variables of dysfunctional attitudes and beliefs about sleep, experiential avoidance, the personality traits of neuroticism, and complications with emotion regulation are able to identify the clusters where there is a significant difference in regard to sleep quality and the intensity of insomnia. (Disaster Med Public Health Preparedness. 2019;13:745–752)

