cophenetic correlation
Recently Published Documents



Author(s):  
Jerry W. Sangma ◽  
Mekhla Sarkar ◽  
Vipin Pal ◽  
Amit Agrawal ◽  
Yogita

Over the past decade, a number of attempts have been made at data stream clustering, but most of this work falls under the clustering-by-example approach. A number of applications instead require the clustering-by-variable approach, which clusters multiple data streams rather than the data examples within a single stream. Furthermore, only a few works have addressed clustering multiple data streams, and these are applicable to numeric data streams only. This research gap motivated the present work, which proposes a hierarchical clustering technique to cluster multiple data streams where the data are nominal. To address concept changes in the data streams, clusters in the hierarchical structure are split and merged. The decision to split or merge is based on an entropy measure representing the cluster’s degree of disparity. The performance of the proposed technique has been analysed and compared with the Agglomerative Nesting clustering technique on synthetic data as well as a real-world dataset in terms of the Dunn Index, the Modified Hubert Γ statistic, the Cophenetic Correlation Coefficient, and Purity. The proposed technique outperforms the Agglomerative Nesting clustering technique for concept-evolving data streams. Furthermore, the effect of concept evolution on clustering structure and average entropy has been visualised for detailed analysis and understanding.
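
The abstract does not give the exact entropy measure used for the split and merge decision. As a minimal sketch, assuming plain Shannon entropy over the nominal values held by a cluster, the "degree of disparity" could be computed as follows. The function name and the example clusters are hypothetical.

```python
from collections import Counter
from math import log2

def shannon_entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    counts = Counter(values)
    total = len(values)
    # p * log2(1/p) summed over the distinct values
    return sum((c / total) * log2(total / c) for c in counts.values())

# Hypothetical cluster contents: each string is one nominal observation.
pure_cluster = ["a", "a", "a", "a"]    # no disparity
mixed_cluster = ["a", "b", "a", "b"]   # two equally frequent values

print(shannon_entropy(pure_cluster))   # 0.0
print(shannon_entropy(mixed_cluster))  # 1.0
```

A low value suggests a homogeneous cluster and a high value a mixed one. Any threshold that triggers a split or merge would be a further assumption not stated in the abstract.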


Author(s):  
Miriam Andrejiova ◽  
Miriama Pinosova ◽  
Miroslav Badida

The main objective of this article is to monitor the development of the number of occupational diseases related to selected physical factors in the working environment (noise, vibration and dust). Each region of Slovakia has its own specific social and economic conditions. Because several of the observed regional variables are strongly correlated, principal component analysis (PCA) was used to derive new variables. Cluster analysis was used to group regions with similar characteristics. A dendrogram illustrating the similarity of the regions studied was created using the average linkage method. The value of the cophenetic correlation coefficient (CC = 0.90) confirms the validity of the average linkage method. The result of the cluster analysis is the grouping of the eight regions into five homogeneous groups (clusters). An analysis of the data shows that Slovakia’s regional differences significantly influence the incidence of occupational diseases in individual regions. It is shown that, in Slovakia, the development of the number of occupational diseases has seen a favourable trend in the long term.
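
The cophenetic correlation coefficient quoted above is the Pearson correlation between the original pairwise distances and the dendrogram heights at which each pair of objects is first merged. The following is a minimal pure-Python sketch of that idea for average linkage. It is not the authors' code, the points are made-up data, and the function names are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt

def cophenetic_pairs(points):
    """Return (original, cophenetic) distance lists under average linkage."""
    n = len(points)
    original = {(i, j): dist(points[i], points[j])
                for i, j in combinations(range(n), 2)}

    def average_distance(a, b):
        # Average linkage (UPGMA): mean distance over all cross-cluster pairs.
        return sum(original[min(i, j), max(i, j)]
                   for i in a for j in b) / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    cophenetic = {}
    while len(clusters) > 1:
        # Merge the two closest clusters and record the merge height.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(*pair))
        height = average_distance(a, b)
        for i in a:
            for j in b:
                cophenetic[min(i, j), max(i, j)] = height
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)

    pairs = sorted(original)
    return [original[p] for p in pairs], [cophenetic[p] for p in pairs]

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Made-up data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
original, cophenetic = cophenetic_pairs(points)
ccc = pearson(original, cophenetic)
print(round(ccc, 3))
```

Values close to 1 indicate that the dendrogram preserves the original distances well. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.cophenet` would normally be used instead of this naive loop.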


Author(s):  
S. Bincader ◽  
R. Pongpisutta ◽  
C. Rattanakreetakul

Background: Anthracnose disease caused by the genus Colletotrichum is one of the crucial problems occurring in the field, along with postharvest diseases, and affects mango quality in Thailand. In particular, the Nam Dork Mai See Tong cultivar, which is highly susceptible to the disease, is an important export product. Methods: In this research, thirty-seven Colletotrichum isolates were obtained from anthracnose-diseased mango cv. Nam Dork Mai See Tong in three provinces in Thailand. The isolates were examined by morphological studies and by molecular techniques using species-specific primers; moreover, the diversity of the pathogens was analyzed using PCR amplification of inter simple sequence repeats (ISSRs) with 6 primers, together with pathogenicity tests. Results: Morphological studies and molecular detection with species-specific primers revealed that 32 isolates belonged to the C. gloeosporioides species complex and 5 isolates to the C. acutatum species complex. PCR amplification using 6 ISSR primers produced 35 polymorphic bands. These bands were used to construct a UPGMA dendrogram, in which cluster analysis divided the 37 isolates into 3 main groups and 8 subgroups at a Jaccard similarity coefficient of 61-73%, with cophenetic correlation (r) = 0.6781. The ISSR technique showed the greatest genetic variation among isolates collected from different locations. Hence, ISSR markers were useful for investigating the phylogenetic relationships within the genus Colletotrichum. Pathogenicity tests revealed that PC006 (Ca) and CS005 (Cg) showed the highest aggressiveness, with disease incidences of 84.74 and 80.90%, respectively. This study indicates that the diversity of pathogenic Colletotrichum species related to mango plantations in Thailand is increasing.
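
For readers unfamiliar with the Jaccard similarity coefficient used on the ISSR bands, here is a minimal sketch, assuming each isolate is scored as a 0/1 profile (band absent or present). The two profiles are invented for illustration and do not come from the study.

```python
def jaccard_similarity(a, b):
    """Jaccard coefficient of two equal-length 0/1 band profiles."""
    shared = sum(1 for x, y in zip(a, b) if x and y)   # bands present in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # bands present in at least one
    # Assumption: two profiles with no bands at all are treated as identical.
    return shared / either if either else 1.0

# Hypothetical band profiles for two isolates scored at six loci.
isolate_1 = [1, 1, 0, 1, 0, 1]
isolate_2 = [1, 0, 0, 1, 1, 1]

similarity = jaccard_similarity(isolate_1, isolate_2)
print(similarity)        # 0.6
print(1 - similarity)    # 0.4
```

Shared absences (0/0) are ignored by the coefficient. The dissimilarity `1 - similarity` is what a UPGMA clustering step would typically take as input.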


Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2840
Author(s):  
José M. Maisog ◽  
Andrew T. DeMarco ◽  
Karthik Devarajan ◽  
Stanley Young ◽  
Paul Fogel ◽  
...  

Non-negative matrix factorization is a relatively new method of matrix decomposition which factors an m × n data matrix X into an m × k matrix W and a k × n matrix H, so that X ≈ W × H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. The question arises: how does one choose k? In this paper, we first assess methods for estimating k in the context of NMF in synthetic data. Second, we examine the effect of normalization on this estimate’s accuracy in empirical data. In synthetic data with orthogonal underlying components, methods based on PCA and Brunet’s Cophenetic Correlation Coefficient achieved the highest accuracy. When evaluated on a well-known real dataset, normalization had an unpredictable effect on the estimate. For any given normalization method, the methods for estimating k gave widely varying results. We conclude that when estimating k, it is best not to apply normalization. If the underlying components are known to be orthogonal, then Velicer’s MAP or Minka’s Laplace-PCA method might be best. However, when the orthogonality of the underlying components is unknown, none of the methods seemed preferable.
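
To make the factorization X ≈ W × H concrete, the sketch below uses the classic Lee and Seung multiplicative update rules for the Frobenius norm, which may differ from the algorithms evaluated in the paper. It is written in pure Python for small matrices only; the helper names, the iteration count and the example matrix are all assumptions made for illustration.

```python
import random

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius_error(x, w, h):
    """Frobenius norm of X - W H."""
    wh = matmul(w, h)
    return sum((xv - yv) ** 2
               for rx, ry in zip(x, wh) for xv, yv in zip(rx, ry)) ** 0.5

def nmf(x, k, iters=200, seed=0, eps=1e-9):
    """Factor the m x n matrix x into non-negative W (m x k) and H (k x n)."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    h = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, x), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        # W <- W * (X H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(x, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return w, h

# Hypothetical non-negative 4 x 3 matrix built from two underlying components.
x = [[1, 0, 2], [2, 0, 4], [0, 3, 3], [0, 1, 1]]
w, h = nmf(x, k=2)
error = frobenius_error(x, w, h)
print(round(error, 4))
```

Because the updates only multiply by non-negative ratios, W and H stay non-negative throughout. Choosing k, the question the paper studies, is not addressed by this sketch.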


2021 ◽  
Vol 37 ◽  
pp. e37048
Author(s):  
Bruno Wagner Zago ◽  
Marco Antonio Aparecido Barelli ◽  
Valvenarg Pereira da Silva ◽  
Rafhael Felipin-Azevedo ◽  
Carla Lima Corrêa ◽  
...  

The aim of this research was to evaluate the genetic divergence between 164 genotypes of Manihot esculenta from the South-Central mesoregion of the State of Mato Grosso. The genotypes are from projects conducted by the Laboratory of Genetic Resources & Biotechnology of the University of the State of Mato Grosso, Cáceres-Mato Grosso (UNEMAT), and the Brazilian Public Agricultural Research Corporation - Agrosilvopastoral (EMBRAPA). The agronomic descriptors evaluated were plant height, height of first branching, branching levels, weight of the aerial part of the plant, total weight of the plant, number of roots per plant, average weight of roots per plant, yield of commercial roots, yield of non-commercial roots, number of rotten roots per plant and harvest index. Genetic divergence was analysed by multivariate analysis based on the standardized mean Euclidean distance, followed by the hierarchical UPGMA and Tocher optimization clustering methods. The degree of preservation of the genetic distances in the dendrogram was verified using the Cophenetic Correlation Coefficient. The Singh criterion was used to quantify the relative contribution of the characteristics to genetic divergence. The genotypes showed genetic dissimilarity for the evaluated characteristics. Based on the dissimilarity matrix and the groupings, crosses between the genotypes allocated to group II and the genotype allocated to group V are recommended for the development of segregating populations with high genetic variability.
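
The abstract does not spell out its distance formula. One common definition of the standardized mean Euclidean distance divides each trait difference by that trait's sample standard deviation and averages the squared terms over the p traits before taking the square root. The sketch below assumes that definition; the function name and the small data set are hypothetical.

```python
from math import sqrt
from statistics import stdev

def standardized_mean_euclidean(data):
    """Pairwise distance matrix for rows of `data` (genotypes x traits)."""
    n, p = len(data), len(data[0])
    # Sample standard deviation of each trait (column).
    sds = [stdev(column) for column in zip(*data)]
    return [[sqrt(sum(((data[i][k] - data[j][k]) / sds[k]) ** 2
                      for k in range(p)) / p)
             for j in range(n)]
            for i in range(n)]

# Hypothetical data: 3 genotypes measured on 2 traits with very different scales.
data = [[100, 2.0], [120, 3.0], [140, 4.0]]
distances = standardized_mean_euclidean(data)
print(distances[0][1])  # 1.0
print(distances[0][2])  # 2.0
```

Standardizing keeps traits measured in large units (such as plant height) from dominating the distance. The resulting matrix is the kind of input a UPGMA clustering step would use.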


2021 ◽  
Vol 5 (1) ◽  
pp. 117-129
Author(s):  
Zerlita Fahdha Pusdiktasari ◽  
Widiarni Ginta Sasmita ◽  
Wulaida Rizky Fitrilia ◽  
Rahma Fitriani ◽  
Suci Astutik

The Covid-19 pandemic has affected Indonesia since March 2020. The Indonesian government has issued several policies to reduce the spread of Covid-19. These policies have affected many areas of life, especially the various sectors of the economy. This study analyzes the grouping of provinces whose economies are at risk of being affected by Covid-19, based on several economic indicators, namely the unemployment rate, the percentage of poor people, the provincial minimum wage, and the hotel occupancy rate, using cluster analysis. Cluster analysis was performed using several hierarchical methods, namely Single, Complete, Average, and Centroid Linkage and Ward. The cophenetic correlation coefficient (rCoph) was used to determine the best method, while the number of clusters was determined based on the Dunn, Connectivity, and Silhouette indexes. The analysis shows that Average Linkage is the best method, with two clusters. The first cluster consists of all provinces in Indonesia except Papua; its economy is at high risk of being affected by Covid-19 and is characterized by a low percentage of poor people and a low provincial minimum wage, as well as high open unemployment and hotel occupancy rates. The second cluster consists of the Province of Papua alone, an economic group with a low risk of being affected by Covid-19. By looking at the impact of the Covid-19 disaster, the government can undertake recovery efforts and generalize economic recovery policies, since Covid-19 has affected the economy of almost all provinces in Indonesia.
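
Of the indexes named above, the Dunn index is the simplest to state: the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. Several variants exist, and the abstract does not say which one was used. The sketch below assumes this basic form with Euclidean distance and uses invented points.

```python
from itertools import combinations
from math import dist

def dunn_index(clusters):
    """Basic Dunn index; needs at least one cluster with two distinct points."""
    # Smallest distance between points belonging to different clusters.
    min_between = min(dist(p, q)
                      for a, b in combinations(clusters, 2)
                      for p in a for q in b)
    # Largest distance between points inside the same cluster (diameter).
    max_diameter = max(dist(p, q)
                       for c in clusters
                       for p, q in combinations(c, 2))
    return min_between / max_diameter

# Hypothetical two-cluster partition of four points in the plane.
clusters = [[(0, 0), (0, 1)], [(5, 0), (5, 1)]]
print(dunn_index(clusters))  # 5.0
```

Higher values indicate compact, well-separated clusters, so candidate numbers of clusters can be compared by the index each partition achieves.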


2021 ◽  
Vol 306 ◽  
pp. 01013
Author(s):  
Fyannita Perdhana ◽  
Tri Hastini ◽  
Iskandar Ishaq

Because local rice varieties play a very important role as a source of valuable traits for developing high-yielding varieties through plant breeding programs, they need to be characterized. Panicle branching characterization is one way to better understand the characteristics of local rice varieties. We observed thirteen panicle branching characters in 24 local rice varieties from West Java. Five panicles of each variety (accession) were observed and statistically analysed. Tukey’s Honest Significant Difference (HSD) test showed differences among accessions in all panicle branching characteristics observed. Based on Principal Component Analysis (PCA), the panicle branching characters generally pointed in the same direction, but they were not always correlated. Clustering with the Ward linkage method divided the accessions into two clusters, the first with 8 members and the second with 16. The cophenetic correlation coefficient was 0.60, indicating that the clustering of standardized values represented the original distances faithfully enough. The results of this research can inform breeders selecting rice genotypes with more seeds per panicle as parents in developing new high-yielding rice varieties.


2020 ◽  
Vol 7 (4) ◽  
pp. 225-234
Author(s):  
Miriam Andrejiova ◽  
Zuzana Kimakova

The development of the transport segment is currently an essential process which affects several other industries. The transport infrastructure and the services provided in this sector influence economic growth, the efforts aimed at increasing competitiveness, as well as prosperity of the society. One of the key problems Slovakia is facing is the long-term growth of differences between individual regions. The present article deals with the evaluation and comparison of selected transport infrastructure indicators in eight regions of Slovakia. The evaluation was carried out by applying basic statistical methods and multiple-criteria statistical methods. Every region was characterised by 20 selected variables describing its uniqueness (e.g. population, area, GDP per capita, road infrastructure etc.). The evaluation of similarities between individual regions in terms of selected variables was carried out by applying the Principal Component Analysis (PCA) and Hierarchical Cluster Analysis. Within the PCA, the original input variables were replaced with three principal components describing as much as 86.68% of the cumulative variance. The average linkage method, as one of the hierarchical methods, was applied to create a dendrogram representing the similarities between the regions of Slovakia. The cophenetic correlation coefficient value of CC=0.936 confirmed the proper selection of the average linkage method. The output of the cluster analysis was that 8 regions of Slovakia were divided into five similar homogenous clusters based on the examined variables. The final analysis indicated that the transport infrastructure and the development thereof significantly affect the differences between individual regions of Slovakia and, as a matter of fact, they belong to the factors creating such differences.


Author(s):  
T Negara ◽  
C Kusmana ◽  
I Mansur ◽  
N A Santi

This paper examines the identification of key indicators that could be used to measure the success of reclamation plants in post-exploration oil and gas mining areas. The main objective of this research was to find key indicators or variables for evaluating the level of success of reclamation in post-mining oil and gas areas. In this study, 44 environmental variables covering physical, biological, soil, water and air indicators were analyzed from 70 field plots at 6 reclamation and 2 natural forest sites. The analysis methods included (1) cluster analysis using Agglomerative Hierarchical Clustering with Ward's method, and (2) quadratic discriminant analysis. The results of the clustering analysis showed that there were some clusters due to variation in biomass, water, soil and air conditions. The three clusters developed based on water and/or air variables provided a high cophenetic correlation (0.80) with low within-cluster (14.5%) and high between-cluster variation (85.5%). Based on the multicollinearity analysis, average vector difference test, variance matrix variance test, unidimensional test of each variable and quadratic discriminant function, this study found that there were 3 key indicators determining variations in the quality of the reclamation plantations within the study sites, namely, the biological indicator of biomass volume (Bio_B); the soil indicators of P content in the soil (Tnh_P), base saturation of the soil (Tnh_Kb), manganese (Mn) content in the soil (Tnh_Mn), sulfur content in the soil (Tnh_S), percentage of ash in the soil (Tnh_Ab) and percentage of clay in the soil (Tnh_Li); and the water indicator of chloride content in the surface water (Air_Cl). The examination of four classes of reclamation quality showed that the classes were successfully classified, with an excellent cross-validation error matrix and an overall accuracy of more than 90%.
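
The within-cluster and between-cluster percentages quoted above (14.5% and 85.5%) sum to 100%, which is consistent with the standard sum-of-squares decomposition, although the abstract does not state how they were computed. Assuming that decomposition, a minimal sketch with invented one-dimensional data looks like this; the function name is hypothetical.

```python
def variation_shares(clusters):
    """Return (within %, between %) of the total sum of squares."""
    def centroid(points):
        return [sum(column) / len(points) for column in zip(*points)]

    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    grand = centroid([p for c in clusters for p in c])
    # Within: squared distances of points to their own cluster centroid.
    within = sum(squared_distance(p, centroid(c)) for c in clusters for p in c)
    # Between: size-weighted squared distances of centroids to the grand mean.
    between = sum(len(c) * squared_distance(centroid(c), grand) for c in clusters)
    total = within + between
    return 100 * within / total, 100 * between / total

# Hypothetical clusters of one-dimensional measurements.
clusters = [[(0,), (2,)], [(10,), (12,)]]
within_pct, between_pct = variation_shares(clusters)
print(round(within_pct, 2), round(between_pct, 2))
```

A small within-cluster share alongside a large between-cluster share indicates compact clusters that differ clearly from one another.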


Author(s):  
Esther Fobi Donkor ◽  
Remember Roger Adjei ◽  
Sober Ernest Boadu

Rice is one of the major staple foods consumed in all parts of the world, including Ghana. The study was conducted at the research field of CSIR in Sokwai to evaluate the performance of newly released rice varieties in the lowland ecology of Ghana. The data collected were subjected to analysis of variance using Statistical Tool for Agricultural Research Version 2.0.1. The analysis of variance revealed highly significant variation among most of the agronomic traits studied except for the panicle length, moisture content, total weight of 5 hills, the percent moisture content and the lodging index. The first five principal components (PCs) accounted for 71% of the total variation, with PC1, PC2 and PC3 contributing 20%, 17% and 13%, respectively. The cluster analysis placed the accessions into five main clusters. The cophenetic correlation coefficient (CCC) was 0.42.

