SOFTWARE ARCHITECTURE DECOMPOSITION USING ATTRIBUTES



Study on the Influence of Diversity and Quality in Entropy Based Collaborative Clustering

Entropy ◽

10.3390/e21100951 ◽

2019 ◽

Vol 21 (10) ◽

pp. 951

Author(s):

Jérémie Sublime ◽

Guénaël Cabanes ◽

Basarab Matei

Keyword(s):

Clustering Algorithms ◽

Mathematical Optimization ◽

Data Sets ◽

Distributed Data ◽

Clustering Methods ◽

Local Structures ◽

Collaborative Clustering ◽

Privacy Constraints ◽

The Stability ◽

The One

The aim of collaborative clustering is to enhance the performance of clustering algorithms by enabling them to work together and exchange information to tackle difficult data sets. The fundamental concept of collaboration is that clustering algorithms operate locally but collaborate by exchanging information about the local structures found by each algorithm. This kind of collaborative learning can benefit a wide range of tasks, including multi-view clustering, clustering of distributed data with privacy constraints, multi-expert clustering and multi-scale analysis. Within this context, the main difficulty of collaborative clustering is determining how to weight the influence of the different clustering methods so as to maximize the quality of the final results and minimize the risk of negative collaborations, where the results are worse after collaboration than before. In this paper, we study how the quality and diversity of the different collaborators, as well as the stability of the partitions, can influence the final results. We propose both a theoretical analysis based on mathematical optimization and a second study based on empirical results. Our findings show that, on the one hand, in the absence of a clear criterion to optimize, a low-diversity pool of solutions with high stability is the best option to ensure good performance. On the other hand, if there is a known criterion to maximize, it is best to rely on a higher-diversity pool of solutions with high quality on that criterion. While our approach focuses on entropy-based collaborative clustering, we believe that most of our results could be extended to other collaborative algorithms.


A comparison of clustering methods for biogeography with fossil datasets

PeerJ ◽

10.7717/peerj.1720 ◽

2016 ◽

Vol 4 ◽

pp. e1720 ◽

Cited By ~ 4

Author(s):

Matthew J. Vavrek

Keyword(s):

Euclidean Distance ◽

Group Method ◽

Neighbor Joining ◽

Clustering Methods ◽

Data Set ◽

Biogeographic Patterns ◽

Pair Group ◽

Linkage Methods ◽

Average Similarity ◽

Incomplete Datasets

Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to non-Euclidean based k-means and the NERC method. Based on this analysis, Unweighted Pair Group Method with Arithmetic Mean and neighbor joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place.
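
The grouping criterion described in this abstract (each data point is, on average, more similar to members of its own group than to members of any other group) can be checked for a given partition with a short script. The sketch below is not the NERC algorithm and is written in Python rather than R; it only tests the stated criterion. The function names, the use of the Jaccard index as the non-Euclidean similarity, and the toy site data are all assumptions made for illustration.

```python
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity between two sets of taxa (a non-Euclidean index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_similarity_by_cluster(sites, labels, i):
    """Average similarity of site i to the other members of each cluster."""
    sums, counts = defaultdict(float), defaultdict(int)
    for j, (site, label) in enumerate(zip(sites, labels)):
        if j == i:
            continue  # a site is not compared with itself
        sums[label] += jaccard(sites[i], site)
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


def satisfies_criterion(sites, labels):
    """True if every site is, on average, most similar to its own cluster."""
    for i, own in enumerate(labels):
        means = mean_similarity_by_cluster(sites, labels, i)
        if own not in means:
            continue  # singleton cluster: nothing to compare against
        if any(means[own] <= m for label, m in means.items() if label != own):
            return False
    return True


# Hypothetical presence data: each site is the set of taxa recorded there.
sites = [{"a", "b", "c"}, {"a", "b"}, {"x", "y"}, {"x", "y", "z"}]
good = satisfies_criterion(sites, [0, 0, 1, 1])  # faunally coherent split
bad = satisfies_criterion(sites, [0, 1, 0, 1])   # split that mixes the faunas
```

With this toy data the first partition meets the criterion and the second does not, because the mixed partition pairs each site with one that shares none of its taxa.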


Comparison of Several Clustering Methods in Grouping Kale Landraces

Journal of the American Society for Horticultural Science ◽

10.21273/jashs.132.3.387 ◽

2007 ◽

Vol 132 (3) ◽

pp. 387-395 ◽

Cited By ~ 14

Author(s):

Guillermo Padilla ◽

María Elena Cartea ◽

Amando Ordás

Keyword(s):

Hierarchical Cluster ◽

Group Method ◽

Clustering Methods ◽

Homogeneous Groups ◽

Pair Group ◽

Morphologic Characteristics ◽

Northwestern Spain ◽

Ward Method ◽

Slight Advantage

Four clustering methods were compared for classification of a collection of 148 kale landraces (Brassica oleracea L. acephala group) from northwestern Spain based on morphologic characters: the unweighted pair group method using arithmetic averages (UPGMA) and the Ward method, hierarchical cluster algorithms, and the modified location model (MLM) applied to both the UPGMA and the Ward method (UPGMA-MLM and Ward-MLM, respectively). Comparisons were based on five criteria and on subjective considerations about the structure of each method and the characteristics of the material evaluated. Although the UPGMA-MLM was superior according to the objective criteria, its slight advantage with respect to the Ward-MLM strategy did not overcome the fact that the initial UPGMA cluster generated a classification with little value. The Ward-MLM strategy generated five homogeneous groups with defined morphologic characteristics. Moreover, the Ward-MLM strategy allowed the identification of redundant landraces, which would permit the number of accessions in further critical trials to be reduced.


Genetic divergence of Heliconiaceae species in the Central West Brazil region

Agronomía Colombiana ◽

10.15446/agron.colomb.v35n3.67661 ◽

2017 ◽

Vol 35 (3) ◽

pp. 285-292

Author(s):

Cintia Graciele Da Silva ◽

Edneia Zullian Dalbosco ◽

Petterson Baptista Da Luz ◽

Willian Krause ◽

Vivian Loges ◽

...

Keyword(s):

Genetic Variability ◽

Genetic Divergence ◽

Quantitative Traits ◽

Group Method ◽

Clustering Methods ◽

Breeding Programs ◽

Mato Grosso ◽

Relative Contribution ◽

Pair Group ◽

Inflorescence Length

The purpose of this study was to describe morphological traits and estimate genetic divergence and parameters between accessions of the genus Heliconia sp. from different municipalities in the state of Mato Grosso, Brazil. A set of 25 traits, 15 quantitative and 10 qualitative, was evaluated. The genetic divergence was estimated based on Mahalanobis' distance, with the clustering method known as Unweighted Pair Group Method using Arithmetic Averages (UPGMA). Genetic variability was observed for all assessed quantitative traits, and the accessions were grouped in different classes. The traits with the highest relative contribution to variability were longevity of flower stems and inflorescence length. The results indicated the existence of genetic variability among accessions of the Heliconia sp. germplasm bank, which can be used in breeding programs.
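
UPGMA, the clustering method named in this abstract, repeatedly merges the two closest clusters, where the distance between clusters is the unweighted mean of all pairwise distances between their members. The following is a minimal Python sketch of that procedure on a precomputed distance matrix. It does not compute Mahalanobis distances, and the function name `upgma`, the accession names and the distance values are hypothetical, not taken from the study.

```python
def upgma(dist, names):
    """Average-linkage agglomeration; returns merges as (a, b, distance)."""
    # Each active cluster maps to the indices of the original items it holds.
    clusters = {name: [i] for i, name in enumerate(names)}
    merges = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = clusters[keys[x]], clusters[keys[y]]
                # UPGMA distance: mean over all cross-cluster item pairs.
                d = sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))
                if best is None or d < best[0]:
                    best = (d, keys[x], keys[y])
        d, ka, kb = best
        merges.append((ka, kb, d))
        # Replace the two merged clusters with their union.
        clusters[f"({ka},{kb})"] = clusters.pop(ka) + clusters.pop(kb)
    return merges


# Hypothetical symmetric distance matrix for four accessions.
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 4],
    [10, 10, 4, 0],
]
merges = upgma(dist, names)
```

The sketch recomputes averages from the original matrix at every step, which is simple to read but slow for large collections; library implementations update distances incrementally instead.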


A comparison of clustering methods for biogeography with fossil datasets

10.7287/peerj.preprints.1693 ◽

2016 ◽

Author(s):

Matthew J Vavrek

Keyword(s):

Euclidean Distance ◽

Group Method ◽

Neighbor Joining ◽

Clustering Methods ◽

Data Set ◽

Biogeographic Patterns ◽

Pair Group ◽

Linkage Methods ◽

Average Similarity ◽

Incomplete Datasets

Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to non-Euclidean based k-means and the NERC method. Based on this analysis, Unweighted Pair Group Method with Arithmetic Mean and neighbor joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place.


General practitioners and clinical pharmacology

Psychiatry and Psychobiology ◽

10.1017/s0767399x00002820 ◽

1989 ◽

Vol 4 (4) ◽

pp. 241-244

Author(s):

P. Lemoine

Keyword(s):

Clinical Pharmacology ◽

General Practitioners ◽

Field Studies ◽

Discussion Group ◽

Group Method ◽

Study Group ◽

Test Drug ◽

Monthly Basis ◽

One Year ◽

The One

Summary: It is difficult to undertake field studies with non-marketed psychotropic drugs because of two apparently contradictory conditions: on the one hand, the methodology has to be rigorously controlled, and on the other hand, such studies have to be carried out in their future environment by general practitioners (GPs). Bearing in mind the lack of training and experience regarding this kind of approach, the author adopted a discussion group method according to the techniques developed by M. Balint. The study group comprised five GPs, a clinical pharmacology expert and a doctor from the pharmaceutical laboratory which had developed the test drug. These persons met on a monthly basis over a one-year period. In the present paper, the author indicates the benefits of such a methodology, based on six years' experience and several trials, with special emphasis placed on the pedagogical aspects.


Genetic Diversity Among Some Walnut (Juglans regia L.) Genotypes by SSR Markers

Sustainability ◽

10.3390/su13126830 ◽

2021 ◽

Vol 13 (12) ◽

pp. 6830

Author(s):

Murat Guney ◽

Salih Kafkas ◽

Hakan Keles ◽

Mozhgan Zarifikhosroshahi ◽

Muhammet Ali Gundesli ◽

...

Keyword(s):

Genetic Diversity ◽

Plant Genetic Resources ◽

Juglans Regia ◽

Genetic Distances ◽

Mean Value ◽

Central Anatolia ◽

Group Method ◽

Arithmetic Average ◽

Pair Group ◽

Principal Coordinates

The food needs of a growing population, climatic change, urbanization and industrialization, along with the destruction of forests, are among the main challenges of modern life. It is therefore very important to evaluate plant genetic resources in order to cope with these problems. In this study, a set of ninety-one walnut (Juglans regia L.) accessions from the Central Anatolia region, composed of seventy-four accessions and eight commercial cultivars from Turkey and nine international reference cultivars, was analyzed using 45 SSR (Simple Sequence Repeats) markers to reveal the genetic diversity. SSR analysis identified 390 alleles for 91 accessions. The number of alleles per locus ranged from 3 to 19, with a mean value of 9 alleles per locus. Genetic dissimilarity coefficients ranged from 0.03 to 0.68. The highest number of alleles was obtained from the CUJRA212 locus (Na = 19). The values of polymorphism information content (PIC) ranged from 0.42 (JRHR222528) to 0.86 (CUJRA212), with a mean PIC value of 0.68. Genetic relationships were assessed using UPGMA (Unweighted Pair Group Method with Arithmetic Average), Principal Coordinates Analysis (PCoA), and Structure-based clustering. The UPGMA and Structure clustering of the accessions depicted five major clusters, supporting the PCoA results. The dendrogram revealed the similarities and dissimilarities among the accessions by identifying five major clusters. Based on this study, SSR analyses indicate that Yozgat province has an important genetic diversity pool and rich genetic variance of walnuts.
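
The polymorphism information content (PIC) values reported here are per-locus statistics derived from allele frequencies. The abstract does not say which formula the authors used; the Python sketch below assumes the widely used definition PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, and the function name `pic` and the example frequencies are hypothetical.

```python
def pic(freqs):
    """PIC for one locus from its allele frequencies (assumed formula)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    squares = [p * p for p in freqs]
    # Sum of squared frequencies (expected homozygosity).
    homozygosity = sum(squares)
    # Correction term summed over every unordered pair of distinct alleles.
    cross = sum(
        2 * squares[i] * squares[j]
        for i in range(len(squares))
        for j in range(i + 1, len(squares))
    )
    return 1.0 - homozygosity - cross


# Hypothetical loci: two and four equally frequent alleles.
pic_two = pic([0.5, 0.5])
pic_four = pic([0.25, 0.25, 0.25, 0.25])
```

Under this definition PIC rises with the number of alleles and with how evenly they are distributed, which is why loci with many alleles tend to score highest.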


Genome-Wide Identification and Development of LTR Retrotransposon-Based Molecular Markers for the Melilotus Genus

Plants ◽

10.3390/plants10050890 ◽

2021 ◽

Vol 10 (5) ◽

pp. 890

Author(s):

Zifeng Ouyang ◽

Yimeng Wang ◽

Tiantian Ma ◽

Gisele Kanzana ◽

Fan Wu ◽

...

Keyword(s):

Group Method ◽

LTR Retrotransposon ◽

LTR Retrotransposons ◽

Important Data ◽

Breeding Programs ◽

UPGMA Dendrogram ◽

Pair Group ◽

Medicinal Value ◽

Genome Wide ◽

Melilotus Albus

Melilotus is an important genus of legumes with industrial and medicinal value, partly due to the production of coumarin. To explore the genetic diversity and population structure of Melilotus, 40 accessions were analyzed using long terminal repeat (LTR) retrotransposon-based markers. A total of 585,894,349 bp of LTR retrotransposon sequences, accounting for 55.28% of the Melilotus genome, were identified using bioinformatics tools. A total of 181,040 LTR retrotransposons were identified and classified as Gypsy, Copia, or another type. A total of 350 pairs of primers were designed for assessing polymorphisms in 15 Melilotus albus accessions. Overall, 47 polymorphic primer pairs were screened for their availability and transferability in 18 Melilotus species. All the primer pairs were transferable, and 292 alleles were detected at 47 LTR retrotransposon loci. The average polymorphism information content (PIC) value was 0.66, which indicated that these markers were highly informative. Based on unweighted pair group method with arithmetic mean (UPGMA) dendrogram cluster analysis, the 18 Melilotus species were classified into three clusters. This study provides important data for future breeding programs and for implementing genetic improvements in the Melilotus genus.


Genetic diversity in table grapes based on RAPD and microsatellite markers

Pesquisa Agropecuária Brasileira ◽

10.1590/s0100-204x2011000900010 ◽

2011 ◽

Vol 46 (9) ◽

pp. 1035-1044 ◽

Cited By ~ 9

Author(s):

Patrícia Coelho de Souza Leão ◽

Sérgio Yoshimitsu Motoike

Keyword(s):

Genetic Diversity ◽

Microsatellite Markers ◽

Genetic Relationships ◽

Similarity Index ◽

Genetic Distances ◽

Table Grape ◽

Group Method ◽

Molecular Characteristics ◽

Table Grapes ◽

Pair Group

The objective of this work was to analyze the genetic diversity of 47 table grape accessions from the grapevine germplasm bank of Embrapa Semiárido, using 20 RAPD and seven microsatellite markers. Genetic distances between pairs of accessions were obtained based on Jaccard's similarity index for RAPD data and on the arithmetic complement of the weighted index for microsatellite data. The groups were formed according to Tocher's cluster analysis and to the unweighted pair‑group method with arithmetic mean (UPGMA). The microsatellite markers were more efficient than the RAPD ones in the identification of genetic relationships. Information on the genetic distance, based on molecular characteristics and coupled with the cultivar agronomic performance, allowed for the recommendation of parents for crossings, in order to obtain superior hybrids in segregating populations for the table grape breeding program of Embrapa Semiárido.
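
For dominant markers such as RAPD, each accession is typically scored as a 0/1 profile of band presence, and Jaccard's index compares two profiles while ignoring bands absent from both. The Python sketch below shows that calculation and takes the distance as one minus the similarity. Treating the distance as the simple complement of the index, along with the function names and the band profiles, is an assumption for illustration rather than a detail given in the abstract.

```python
def jaccard_similarity(x, y):
    """Jaccard index for two equal-length 0/1 band profiles."""
    if len(x) != len(y):
        raise ValueError("profiles must score the same bands")
    shared = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    # Bands absent from both profiles contribute to neither count.
    return shared / either if either else 1.0


def distance_matrix(profiles):
    """Pairwise distances, taken here as 1 - Jaccard similarity."""
    return [[1.0 - jaccard_similarity(p, q) for q in profiles] for p in profiles]


# Hypothetical band scores for three accessions across five bands.
profiles = [
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 0, 1],
]
distances = distance_matrix(profiles)
```

A matrix of this form is the usual input to a grouping step such as UPGMA.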
