cophenetic correlation coefficient
Recently Published Documents



Mathematics ◽  
2021 ◽  
Vol 9 (22) ◽  
pp. 2840
Author(s):  
José M. Maisog ◽  
Andrew T. DeMarco ◽  
Karthik Devarajan ◽  
Stanley Young ◽  
Paul Fogel ◽  
...  

Non-negative matrix factorization is a relatively new method of matrix decomposition which factors an m × n data matrix X into an m × k matrix W and a k × n matrix H, so that X ≈ W × H. Importantly, all values in X, W, and H are constrained to be non-negative. NMF can be used for dimensionality reduction, since the k columns of W can be considered components into which X has been decomposed. The question arises: how does one choose k? In this paper, we first assess methods for estimating k in the context of NMF in synthetic data. Second, we examine the effect of normalization on this estimate’s accuracy in empirical data. In synthetic data with orthogonal underlying components, methods based on PCA and Brunet’s Cophenetic Correlation Coefficient achieved the highest accuracy. When evaluated on a well-known real dataset, normalization had an unpredictable effect on the estimate. For any given normalization method, the methods for estimating k gave widely varying results. We conclude that when estimating k, it is best not to apply normalization. If the underlying components are known to be orthogonal, then Velicer’s MAP or Minka’s Laplace-PCA method might be best. However, when the orthogonality of the underlying components is unknown, none of the methods seemed preferable.


2021 ◽  
Vol 5 (1) ◽  
pp. 117-129
Author(s):  
Zerlita Fahdha Pusdiktasari ◽  
Widiarni Ginta Sasmita ◽  
Wulaida Rizky Fitrilia ◽  
Rahma Fitriani ◽  
Suci Astutik

The Covid-19 pandemic has affected Indonesia since March 2020. The Indonesian government has issued several policies to reduce the spread of Covid-19. These policies have had an impact on various fields of life, especially across sectors of the economy. This study was conducted to analyze the grouping of provinces whose economies are at risk of being affected by Covid-19 based on several economic indicators, namely the unemployment rate, the percentage of poor people, the provincial minimum wage, and the hotel occupancy rate, using cluster analysis. Cluster analysis was performed using several hierarchical methods, namely Single, Complete, Average, and Centroid Linkage and Ward's method. The cophenetic correlation coefficient (rCoph) was used to determine the best method, while the number of clusters was determined based on the Dunn, Connectivity, and Silhouette indexes. The analysis shows that Average Linkage is the best method, with two clusters. The first cluster, whose economy is at high risk of being affected by Covid-19, consists of all provinces in Indonesia except Papua and is characterized by a low percentage of poor people and a low provincial minimum wage, as well as high levels of open unemployment and hotel occupancy. The second cluster consists of the Province of Papua, which forms an economic group with a low risk of being affected by Covid-19. By looking at the impact of the Covid-19 disaster, the government can make recovery efforts and generalize economic recovery policies, since the impact of Covid-19 on the economy extends to almost all provinces in Indonesia.


2021 ◽  
Vol 306 ◽  
pp. 01013
Author(s):  
Fyannita Perdhana ◽  
Tri Hastini ◽  
Iskandar Ishaq

Because local varieties of rice play a very important role as a source of valuable traits for developing high-yielding varieties through plant breeding programs, they need to be characterized. Panicle branching characterization is one way to better understand the characteristics of local rice varieties. We observed thirteen panicle branching characters in 24 local rice varieties from West Java. Five panicles of each variety (accession) were observed and statistically analysed. Tukey’s Honest Significant Difference (HSD) test showed differences among accessions in all panicle branching characteristics observed. Based on Principal Component Analysis (PCA), the panicle branching characters observed generally pointed in the same direction, but they were not always correlated with one another. In the clustering based on the Ward linkage method, the accessions were divided into two clusters. The first had 8 members, and the second had 16 members. The cophenetic correlation coefficient was 0.60, indicating that the clustering on standardized values was faithful enough to represent the original distances. The results of this research can inform breeders in selecting rice genotypes with more seeds per panicle as parents for developing new high-yielding rice varieties.
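For readers unfamiliar with the statistic reported above, the cophenetic correlation coefficient is the Pearson correlation between the original pairwise distances and the heights at which each pair is first joined in the dendrogram. The notation below is an assumption introduced here for illustration and is not taken from the abstract: d_ij is the original distance between accessions i and j, t_ij is their cophenetic (dendrogram) distance, and the bars denote means over all pairs i < j.

```latex
c = \frac{\sum_{i<j} \left(d_{ij} - \bar{d}\right)\left(t_{ij} - \bar{t}\right)}
         {\sqrt{\sum_{i<j} \left(d_{ij} - \bar{d}\right)^{2} \; \sum_{i<j} \left(t_{ij} - \bar{t}\right)^{2}}}
```

Values close to 1 mean the dendrogram preserves the original distances well; lower values mean more distortion.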


Author(s):  
Esther Fobi Donkor ◽  
Remember Roger Adjei ◽  
Sober Ernest Boadu

Rice is one of the major staple foods consumed in all parts of the world, including Ghana. The study was conducted at the research field of CSIR in Sokwai to evaluate the performance of the newly released rice varieties in lowland ecology in Ghana. The data collected were subjected to analysis of variance using Statistical Tool for Agricultural Research Version 2.0.1. The analysis of variance revealed a highly significant variation among most of the agronomic traits studied except for the panicle length, moisture content, total weight of 5 hills, the percent moisture content and the lodging index. The first five principal components (PCs) accounted for 71% of the total variation, with PC1, PC2 and PC3 contributing 20%, 17% and 13% respectively. The cluster analysis placed the accessions into five main clusters. The cophenetic correlation coefficient (CCC) was 0.42.


2019 ◽  
pp. 1625-1630
Author(s):  
Angela Vacaro de Souza ◽  
Fernando Ferrari Putti ◽  
Marcos Ribeiro da Silva Vieira ◽  
Rogério Lopes Vieites

The objective of this study was to investigate the relations between the content of anthocyanins and carotenoids and the antioxidant activity measured by the FRAP and TEAC methods. Pigments and coloration were measured in blackberry fruits harvested at 3 different collection points, in fruits stored in a refrigerated environment, and in the jelly made from them, which was preserved in hermetically sealed glass containers, protected from light, at a temperature of 25 ºC. To investigate the relations between the study variables (content of anthocyanins and carotenoids, antioxidant activity, and coloration measured with a digital colorimeter), Pearson’s correlation analysis was adopted, which indicates the existence of a positive or negative relation between two variables. A significance level of α = 5% was used to test the significance of the correlation coefficient. The generalized Mahalanobis distance (D2) was applied in the cluster analysis of groups of blackberry fruits and jellies using the mean linkage method. Furthermore, the cophenetic correlation coefficient and a multivariate analysis using principal components were applied to verify the grouping of the different responses of the harvesting points of fresh (in natura) blackberry fruits and jellies. The results showed a correlation between the content of anthocyanins and carotenoids in fruits (0.99*) and between the same parameters in jellies. However, this behavior was not clearly observed between the pigments and the antioxidant activity. There was a positive correlation between the color parameters (chroma, ‘L’, ‘a’, ‘b’ and ºHue) in fruits and jellies. Blackberry jellies proved to be good sources of anthocyanins and carotenoids.


2019 ◽  
Vol 32 (1) ◽  
pp. 81-91
Author(s):  
Jorge Xavier de Almeida Neto ◽  
Mailson Monteiro do Rêgo ◽  
Elizanilda Ramalho do Rêgo ◽  
Ana Paula Gomes da Silva

Brave bean (Capparis flexuosa L.) is a Caatinga species that is used as forage, mainly during the dry season when some plant species lose their leaves. The aim of this study was to assess genetic diversity within and among brave bean populations using Random Amplified Polymorphic DNA (RAPD) markers. Brave bean leaves were collected from 30 accessions in the following municipalities of Paraíba state, Brazil: Barra de Santa Rosa (BSR), Cuité (C), São João do Cariri (SJC), Damião (D), Baraúna (B), and Picuí (P). DNA extraction followed the standard methodology of CTAB with modifications. RAPD analyses were carried out using 18 primers, and polymorphism of the amplified DNA fragments was visualized using agarose gel electrophoresis. Data were used to calculate Jaccard Similarity Coefficient values, which were then used to group samples with the Unweighted Pair Group Method with Arithmetic Mean. Cophenetic Correlation Coefficient, Stress, and Distortion Coefficient values were also calculated from these analyses. Band polymorphism was generated with 14 primers, but the sampled populations showed low numbers of polymorphic loci (27 in BSR, 18 in C, 7 in SJC, 9 in D, and 0 in B and P). The highest polymorphic information content was found in samples from the BSR (9 groups), C (22 groups), SJC (7 groups), and D (6 groups) municipalities. In the interpopulation analysis, 34 groups were formed, the matrices of which showed high cophenetic correlations (0.95 to 0.98) and low stress (12.9 to 17.45%) and distortion (3.05%). Therefore, results showed that there was genetic variability both among and within brave bean populations.
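As an illustration of the similarity measure named in this abstract, the following is a minimal Python sketch of the Jaccard similarity coefficient for binary band data (1 = band present, 0 = band absent). The band profiles and accession labels are hypothetical and are not data from the study; returning 0.0 when neither profile has any band is also an assumption made here.

```python
def jaccard_similarity(u, v):
    """Jaccard similarity of two equal-length 0/1 band profiles.

    Shared absences (0, 0) are ignored: only bands present in at least
    one of the two profiles count toward the denominator.
    """
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Assumption: two profiles with no bands at all get similarity 0.0.
    return both / either if either else 0.0


# Hypothetical band profiles for three accessions (illustration only).
profiles = {
    "acc1": [1, 1, 0, 1, 0],
    "acc2": [1, 0, 0, 1, 0],
    "acc3": [0, 1, 1, 0, 1],
}

names = sorted(profiles)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        s = jaccard_similarity(profiles[a], profiles[b])
        print(f"{a} vs {b}: similarity={s:.3f}, dissimilarity={1 - s:.3f}")
```

The dissimilarity 1 − similarity is the quantity that would typically be passed to a hierarchical method such as UPGMA.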


Author(s):  
Priscilla Ramos Carvalho ◽  
Casimiro Sepúlveda Munita ◽  
André Luiz Lapolli

The literature presents many methods for partitioning a data set, and it is difficult to choose the most suitable one, since the various combinations of methods based on different measures of dissimilarity can lead to different grouping patterns and false interpretations. Nevertheless, little effort has been expended in evaluating these methods empirically using an archaeological data set. The objective of this work is therefore to make a comparative study of different cluster analysis methods and identify which is the most appropriate. For this, the study was carried out using a data set of 45 samples of ceramic fragments, analyzed by instrumental neutron activation analysis (INAA). The methods used for this study were: Single linkage, Complete linkage, Average linkage, Centroid and Ward. The validation was done using the cophenetic correlation coefficient; comparing these values, the average linkage method obtained the best results. A script with some functions was created in the statistical program R to obtain the cophenetic correlation. By means of these values it was possible to choose the most appropriate method for the data set.
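The comparison described in this abstract can be sketched as follows. This is not the authors' R script; it is a minimal pure-Python illustration, assuming Euclidean distances and covering only single, complete and average linkage (centroid and Ward are omitted for brevity). The function names and sample points are hypothetical.

```python
from itertools import combinations
from math import dist, sqrt


def cophenetic_distances(points, linkage="average"):
    """Return (original, cophenetic) pairwise distances in matching order."""
    n = len(points)
    pairs = list(combinations(range(n), 2))
    d = {p: dist(points[p[0]], points[p[1]]) for p in pairs}
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    between = dict(d)                      # distances between current clusters
    coph = {}
    next_id = n
    while len(clusters) > 1:
        # Merge the two closest clusters.
        (a, b), height = min(between.items(), key=lambda kv: kv[1])
        # Every pair split across a and b is first joined at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[(min(i, j), max(i, j))] = height
        merged = clusters.pop(a) + clusters.pop(b)
        between = {k: v for k, v in between.items()
                   if a not in k and b not in k}
        # Recompute distances from the merged cluster to the remaining ones.
        for c, members in clusters.items():
            cross = [d[(min(i, j), max(i, j))]
                     for i in merged for j in members]
            if linkage == "single":
                value = min(cross)
            elif linkage == "complete":
                value = max(cross)
            else:  # average linkage (UPGMA)
                value = sum(cross) / len(cross)
            between[(c, next_id)] = value
        clusters[next_id] = merged
        next_id += 1
    return [d[p] for p in pairs], [coph[p] for p in pairs]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical two-dimensional samples (illustration only).
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
for method in ("single", "complete", "average"):
    original, cophenetic = cophenetic_distances(samples, method)
    print(method, round(pearson(original, cophenetic), 4))
```

The linkage method with the highest coefficient is the one whose dendrogram distorts the original distances least, which is the selection criterion the abstract describes.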


2018 ◽  
Vol 15 (4) ◽  
pp. 949-955
Author(s):  
Madappa Machamada Bheemaiah ◽  
Bopaiah Ajikuttira Kushalappa ◽  
Grace Prabhakar

The plants in the Garcinia species are economically important. Phylogenetic investigation is needed for these tree species to boost breeding and conservation programmes. Six Garcinia species were investigated for their phylogenetic relationship using Random Amplified Polymorphic DNA (RAPD) markers. A standardised procedure was developed for isolation of DNA from the leaf samples of G. cambogia, G. indica, G. xanthochymus, G. morella, G. mangostana and G. livingstonei. The DNA samples were subjected to PCR using 8 random primers. 269 polymorphic bands were obtained and scored to develop the values for the genetic distance. The dendrogram was developed using the software dendroUPGMA, and a cophenetic correlation coefficient of 0.801 was obtained. G. cambogia and G. livingstonei are closely placed with a score of 24%, followed by G. morella. It had a 30% index score to G. cambogia and G. livingstonei but is followed by just 31% score with G. indica. G. mangostana is connected at 33.5% dissimilarity to the above groups, showing it is an introduced variety. G. xanthochymus is the last link, with a 37% score in the matrix. The data presented are the first of their kind for these species. This will help in further DNA related work in these species. The genetic relatedness among these species is reported and this can be utilised in marker analysis for other Garcinia species.


Blood ◽  
2018 ◽  
Vol 132 (Supplement 1) ◽  
pp. 106-106
Author(s):  
Brian M Reilly ◽  
Tiffany N Tanaka ◽  
Dinh Diep ◽  
Huwate Yeerna ◽  
Pablo Tamayo ◽  
...  

Abstract Background: Recurrent somatic mutations in patients with myelodysplastic syndromes (MDS) implicate several epigenetic regulators in the molecular pathobiology of these disorders. We hypothesized that identification of MDS subtypes defined by DNA methylation patterns could improve prognostication and enhance our understanding of MDS disease biology. Methylation assays that investigate regions of the genome beyond promoters may better inform which regulatory regions possess biologic importance in the development and progression of MDS. Additionally, computational methods that agnostically approach large methylation datasets are needed when attempting to classify clinically and genetically heterogeneous disease populations such as MDS. Methods: To measure DNA methylation, we used bisulfite padlock probes (BSPP), a targeted enrichment technique that captures approximately 500,000 unique CpGs in regions known to contain differentially methylated regulatory regions. BSPP was performed on bone marrow mononuclear DNA from 141 treatment-naïve patients with MDS that were genetically and clinically annotated in prior studies (Bejar et al. N Engl J Med 2011 and J Clin Oncol 2012). We used Onco-GPS, a computational method that identifies shared underlying cellular states among heterogeneous sample populations in an unsupervised manner, to identify DNA methylation subtypes of MDS (Kim et al. Cell Systems 2017). This method uses non-negative matrix factorization (NMF) followed by consensus clustering of the resulting NMF components. The optimal number of NMF components and consensus clusters were chosen to maximize the cophenetic correlation coefficient. A Fisher's Exact test was used to test for enrichment of genetic lesions in specific clusters. Univariate and multivariable models of overall survival (OS) were constructed using Cox proportional hazards regression.
Variables with univariate p-values <0.2 were evaluated for inclusion in the final multivariable model and the final model was constructed using a forward and backward stepwise procedure based on the Akaike Information Criterion. Results: We found that 5 NMF components and 5 consensus clusters maximized the cophenetic correlation coefficient, indicating the most stable partitioning of our dataset (Figures 1A-1B). The samples making up each cluster were enriched largely in a single NMF component, indicating that the underlying DNA methylation loci contributing to each cluster are distinct. Specific genetic and cytogenetic abnormalities were significantly enriched in different clusters, including enrichment of mutations in TP53 and U2AF1 in cluster 2; EZH2, ASXL1, RUNX1, SRSF2 in cluster 3; TET2 in cluster 4; SF3B1 in cluster 5; and abnormal and complex karyotypes enriched in clusters 1 and 2 (Figure 1C-1D). Kaplan-Meier curves revealed two major patterns of survival with clusters 2 and 3 displaying inferior OS when compared with clusters 1, 4, and 5. We combined clusters with similar median OS into "High Risk" and "Low Risk" cluster groups. In univariate Cox proportional hazards regression, High Risk and Low Risk cluster groups had significant differences in OS (HR, 1.95; 95% CI, 1.24-3.08; p=0.003). In a multivariable model including elements of the IPSS-R, cluster risk group remained a significant predictor of OS (HR, 2.02; 95% CI, 1.25-3.27; p=0.004). We determined cluster-specific differentially methylated regions (DMRs) and found divergent patterns of methylation among clusters with the majority of DMRs located at non-promoter regulatory regions with high enrichment at both active and bivalent enhancers (as defined in a reference epigenome from mobilized CD34+ cells).
Conclusion: Using BSPP and novel computational methods, we found that DNA methylation of MDS patients identifies clinically relevant subtypes not otherwise distinguished by current prognostic models. Our analysis also shows that while genetic abnormalities can be associated with specific DNA methylation changes, patients with diverse genetic lesions can converge on common DNA methylation states, reflecting shared pathogenic mechanisms and clinical outcomes. Ongoing work to annotate the biological differences between the DNA methylation subgroups identified will enhance our understanding of the role of DNAm in prognosis and the underlying biology of MDS. Disclosures Zhang: Singlera Genomics: Consultancy, Equity Ownership, Membership on an entity's Board of Directors or advisory committees. Bejar:AbbVie/Genentech: Consultancy, Honoraria; Modus Outcomes: Consultancy; Takeda: Research Funding; Celgene: Consultancy, Honoraria; Astex/Otsuka: Consultancy, Honoraria; Genoptix: Consultancy; Foundation Medicine: Consultancy.
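The consensus clustering step mentioned in the Methods can be illustrated with a short Python sketch. This is not the Onco-GPS implementation; it only shows, under simplifying assumptions, how a consensus matrix is commonly built from repeated clustering runs: entry (i, j) is the fraction of runs in which samples i and j received the same label. The label lists below are hypothetical.

```python
def consensus_matrix(runs):
    """Fraction of runs in which each pair of samples shares a cluster label.

    `runs` is a list of label lists, one per clustering run; every run
    labels the same samples in the same order. Label values need not be
    comparable across runs, since only co-membership within a run counts.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(runs) for c in row] for row in counts]


# Hypothetical labels from three runs on four samples (illustration only).
runs = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 1, 1, 1],
]
for row in consensus_matrix(runs):
    print([round(v, 2) for v in row])
```

A common way to use this matrix is to treat 1 − consensus as a distance, build a dendrogram from it, and compute the cophenetic correlation coefficient for each candidate number of components; a value near 1 suggests that cluster assignments are stable across runs.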


Genetika ◽  
2018 ◽  
Vol 50 (1) ◽  
pp. 231-242
Author(s):  
Maleki Hatami ◽  
Kiomars Rouhrazi ◽  
Gholam Khodakaramian ◽  
Naser Sabaghnia

The diversity and phylogeny of 30 rhizobia isolated from nodules of faba bean plants grown in 5 geographic regions of the East Azerbaijan province of Iran were examined using rep-PCR fingerprinting and sequence analysis of the 16S rRNA and nodC genes. Based on cluster analysis of rep-PCR fingerprints, faba bean rhizobia isolates were differentiated into five clusters (A to E) at the 80% similarity level. The cophenetic correlation coefficient for the dendrogram obtained from the combined dataset of BOX and ERIC primers was 0.942. The percentage of polymorphic loci was 59.2% using the BOX-PCR primer and 67.3% using the ERIC-PCR primers. The data obtained by rep-PCR fingerprinting showed a high apparent correlation between genetic diversity and geographical origin of the isolates. The phylogenetic analysis based on 16S rRNA and nodC sequences showed that representative isolates were closely related to R. leguminosarum bv. viciae and R. fabae. To the best of our knowledge, this is the first report of isolation and characterization of R. fabae from Iran.

