A comparison of clustering methods for biogeography with fossil datasets

Author(s):  
Matthew J Vavrek

Cluster analysis is one of the most commonly used methods in palaeoecological studies, particularly in studies investigating biogeographic patterns. Although a number of different clustering methods are widely used, the approach and underlying assumptions of many of these methods are quite different. For example, methods may be hierarchical or non-hierarchical in their approaches, and may use Euclidean distance or non-Euclidean indices to cluster the data. In order to assess the effectiveness of the different clustering methods as compared to one another, a simulation was designed that could assess each method over a range of both cluster distinctiveness and sampling intensity. Additionally, a non-hierarchical, non-Euclidean, iterative clustering method implemented in the R Statistical Language is described. This method, Non-Euclidean Relational Clustering (NERC), creates distinct clusters by dividing the data set in order to maximize the average similarity within each cluster, identifying clusters in which each data point is on average more similar to those within its own group than to those in any other group. While all the methods performed well with clearly differentiated and well-sampled datasets, when data are less than ideal the linkage methods perform poorly compared to non-Euclidean based k-means and the NERC method. Based on this analysis, Unweighted Pair Group Method with Arithmetic Mean and neighbor joining methods are less reliable with incomplete datasets like those found in palaeobiological analyses, and the k-means and NERC methods should be used in their place.
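The idea of placing each data point in the group to which it is, on average, most similar can be illustrated with a short relocation loop. The following is only a minimal sketch of that general idea, not the NERC implementation described in the paper (which is written in R). The Jaccard index, the toy site lists, the starting labels, and the function names `jaccard`, `mean_similarity`, and `relocate` are all assumptions made for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of taxa (a non-Euclidean index)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_similarity(i, members, sim):
    """Average similarity of site i to the other members of a group."""
    others = [j for j in members if j != i]
    if not others:
        return 0.0
    return sum(sim[i][j] for j in others) / len(others)


def relocate(sim, labels, k, max_iter=100):
    """Move each site to the group it is most similar to on average,
    repeating until no site changes group (hypothetical helper)."""
    labels = list(labels)
    n = len(labels)
    for _ in range(max_iter):
        moved = False
        for i in range(n):
            groups = [[j for j in range(n) if labels[j] == g] for g in range(k)]
            scores = [mean_similarity(i, groups[g], sim) for g in range(k)]
            best = max(range(k), key=lambda g: scores[g])
            if scores[best] > scores[labels[i]]:
                labels[i] = best
                moved = True
        if not moved:
            break
    return labels


# Toy presence/absence data: six sites, each a set of taxon names (made up).
sites = [
    {"a", "b", "c"}, {"a", "b", "d"}, {"a", "c", "d"},
    {"x", "y", "z"}, {"x", "y", "w"}, {"y", "z", "w"},
]
sim = [[jaccard(a, b) for b in sites] for a in sites]

# Start from a deliberately imperfect partition and let the loop repair it.
labels = relocate(sim, [0, 0, 1, 1, 1, 0], k=2)
print(labels)
```

Because the loop works directly on a similarity matrix, any index can be substituted for Jaccard without changing the relocation step.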

PeerJ ◽  
2016 ◽  
Vol 4 ◽  
pp. e1720 ◽  




2017 ◽  
Vol 52 (9) ◽  
pp. 751-760 ◽  
Author(s):  
Angela Maria de Sousa ◽  
Maria do Socorro Padilha de Oliveira ◽  
João Tomé de Farias Neto

Abstract: The objective of this work was to quantify the genetic divergence among accessions of white-type acai palm, through morpho-agronomic characters. The accessions belong to the active acai palm germplasm bank of Embrapa Amazônia Oriental. Thirteen characters were evaluated in 26 accessions, originated from six municipalities in the state of Pará, Brazil. The data were subjected to deviance and multivariate analyses, based on the average Euclidean distance, and were grouped by Tocher’s method and the unweighted pair group method with arithmetic mean (UPGMA). The accessions differed for eight characters. The distances among accessions ranged from 0.64 to 2.62, with an average of 1.36, and four groups were formed by Tocher’s method and two by the UPGMA. Seven major components explained 88.03% of the variation, whose graphic dispersion showed the tendency of forming four groups. The characters weight of 100 fruits, number of rachillae per bunch, and fruit yield per bunch contributed the most to the divergence, and the accessions from the municipalities of Breves, Curralinho, and Limoeiro do Ajuru were the most divergent. Therefore, the accessions of white acai palm show strong divergence and variability, which favor the selection of desirable individuals.


Zuriat ◽  
2015 ◽  
Vol 16 (1) ◽  
Author(s):  
Rudhy Gustiano ◽  
Laurent Pouyaud

Pangasiid catfishes vary greatly in external morphology, and it is difficult to give a standard definition of their external appearance. The degree of similarity, assessed through phenetic analysis, is generally one of the criteria on which the recognition of taxa has been based. The objective of this phenetic analysis was to determine the degree of similarity among species. A measure of dissimilarity was computed from the coordinates of the first two axes of the Correspondence Analysis, using Euclidean distance. The distances were used to produce coefficients of similarity between each species pair. Distances are non-negative, and low values indicate similarity. Distance values for the matrix were then clustered with the unweighted pair-group method using the arithmetic average and summarized in a phenogram. The results showed that all species analysed aggregated within their proposed genera. Analysis of 28 species supports their placement in four genera.
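The workflow in this abstract (Euclidean distances between two-axis ordination scores, followed by unweighted pair-group clustering) can be sketched in a few lines. This is an illustrative sketch only: the four coordinate pairs are invented stand-ins for Correspondence Analysis scores, and `upgma` is a hypothetical helper name, not code from the study.

```python
import math


def upgma(points):
    """Average-linkage (UPGMA) clustering of 2-D points.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where distance is the mean Euclidean distance over all cross pairs.
    """
    clusters = {i: [i] for i in range(len(points))}

    def avg_dist(a, b):
        total = sum(math.dist(points[i], points[j]) for i in a for j in b)
        return total / (len(a) * len(b))

    merges = []
    next_id = len(points)
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Find the pair of current clusters with the smallest average distance.
        candidates = [
            (avg_dist(clusters[p], clusters[q]), p, q)
            for idx, p in enumerate(ids)
            for q in ids[idx + 1:]
        ]
        d, p, q = min(candidates)
        merges.append((clusters[p], clusters[q], d))
        clusters[next_id] = clusters.pop(p) + clusters.pop(q)
        next_id += 1
    return merges


# Invented scores on the first two ordination axes for four species.
scores = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
merges = upgma(scores)
for a, b, d in merges:
    print(a, b, round(d, 3))
```

The merge heights are what a phenogram draws: low heights join similar species, and the final merge joins the two most distinct groups.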


HortScience ◽  
2015 ◽  
Vol 50 (12) ◽  
pp. 1744-1750
Author(s):  
Kang Hee Cho ◽  
Jung Ho Noh ◽  
Seo Jun Park ◽  
Se Hee Kim ◽  
Dae-Hyun Kim ◽  
...  

Grapevine cultivars have traditionally been identified based on morphological characteristics, but the identification of closely related cultivars has been difficult because of their similar pedigree backgrounds. In this study, we developed DNA markers for genetic fingerprinting in 37 grapevine cultivars, including 20 cultivars bred in Korea. A total of 180 randomly amplified polymorphic DNA (RAPD) markers were obtained using 30 different primers. The number of polymorphic bands ranged from three (OPG-08 and OPU-19) to nine (OPV-01 and UBC116), with an average of six. RAPD markers were used in cluster analysis performed with the unweighted pair-group method of arithmetic averages (UPGMA). The average similarity value was 0.69 and the dendrogram clustered the 37 grapevine cultivars into five clusters. The relationships among the grapevine cultivars were consistent with the known pedigrees of the cultivars. The 50 RAPD fragments selected were sequenced for the development of sequence-characterized amplified region (SCAR) markers. As a result, 16 of the 50 fragments were successfully converted into SCAR markers. A single polymorphic band, the same size as the RAPD fragments or smaller, was amplified depending on the primer combinations in the 14 SCAR markers, and codominant polymorphisms were detected using the SCAR markers G119_412 and GB17_732. Among these markers, a combination of 11 SCAR markers, GG05_281, G116_319, G146_365, G119_412, GW04_463, G169_515, G116_539, GV04_618, GV01_678, GG05_689, and GB17_732, provided sufficient polymorphisms to distinguish the grapevine cultivars investigated in this study. These newly developed markers could be a fast and reliable tool for identifying grapevine cultivars.


Author(s):  
CHUNG-HORNG LUNG ◽  
XIA XU ◽  
MARZIA ZAMAN

Software architectural design has an enormous effect on downstream software artifacts. Decomposition of function for the final system is one of the critical steps in software architectural design. The process of decomposition is typically conducted by designers based on their intuition and past experiences, which may not always be robust. This paper presents a study of applying the clustering technique to support system decomposition based on requirements and their attributes. The approach can support the architectural design process by grouping closely related requirements to form a subsystem or module. In this paper, we demonstrate our experiments in applying the approach to an industrial communication protocol software system and comparing several clustering algorithms. The decomposition obtained from WPGMA (weighted pair-group method using arithmetic averages) resembles the one developed by the designer more closely than those obtained from the other clustering methods.


Author(s):  
O. Getmanets ◽  
A. Nekos ◽  
M. Pelikhatyi

Building a background radiation field on the ground from measurement data taken at a finite number of points is one of the most important tasks of radiation monitoring. The aim of the work: to study the possibility of applying cluster analysis to the tasks of radiation monitoring of the environment. Cluster analysis is a method of multivariate statistical analysis. Its main purpose is to split the set of objects under study (observation points) into homogeneous groups, or clusters; that is, it solves the task of classifying data and identifying the corresponding structure in them. Methods of research: measurement of the ambient dose rate of continuous X-ray and gamma radiation on the terrain using the MKS-05 dosimeter "TERRA-0"; processing of the obtained data by cluster analysis methods using the computer program "Statistics-10", wherein each cluster point is characterized by three coordinates: two coordinates on the ground and the ambient dose rate at that point; Euclidean distance was chosen as the distance between two points. Results: after processing the data with various clustering methods (Complete Linkage, Weighted pair-group average, and Ward's method), it was found that the results of the analyses practically coincide with each other, which proves the reliability of cluster analysis for the tasks of radiation monitoring of the environment and mapping of radiation pollution. Conclusions: the concept of a "radiation cluster", combining coordinates on a plane with an ambient dose rate, was first formulated in this work; the possibility of using cluster analysis to construct a map of radiation pollution of the environment has been proved by sequential projection, from more connected to less connected radiation clusters, onto the plane of the controlled zone. In this sense, cluster analysis is similar to the operator approach to the construction of the radiation field. For further research, it is of some interest to study the integration of cluster analysis with geographic information systems.


2010 ◽  
Vol 148 (2) ◽  
pp. 171-181 ◽  
Author(s):  
T. JHANG ◽  
M. KAUR ◽  
P. KALIA ◽  
T. R. SHARMA

Summary: Genetic variability in carrots is a consequence of allogamy, which leads to a high level of inbreeding depression, affecting the development of new varieties. To understand the extent of genetic variability in 40 elite indigenous breeding lines of subtropical carrots, 48 DNA markers consisting of 16 inter simple sequence repeats (ISSRs), 10 universal rice primers (URPs), 16 random amplification of polymorphic DNA (RAPD) and six simple sequence repeat (SSR) markers were used. These 48 markers amplified a total of 591 bands, of which 569 were polymorphic (0·96). Amplicon size ranged from 200 to 3500 base pairs (bp) in ISSR, RAPD and URP markers and from 100 to 300 bp in SSR markers. The ISSR marker system was found to be the most efficient, with (GT)n motifs as the most abundant SSR loci in the carrot genome. The unweighted pair group method with arithmetic mean (UPGMA) analysis of the combined data set of all the DNA markers obtained by the four marker systems classified the 40 genotypes into two groups at 0·45 genetic similarity, with a high Mantel matrix correlation (r=0·92). The principal component analysis (PCA) of marker data also explained 0·55 of the variation by the first three components. Molecular diversity was very high and non-structured in these open-pollinated genotypes. The study demonstrated for the first time that URPs can be used successfully in genetic diversity analysis of tropical carrots. In addition, an entirely new set of microsatellite markers, derived from the expressed sequence tag (EST) sequences of carrots, has been developed and utilized successfully.


2019 ◽  
Vol 20 (3) ◽  
pp. 629-635
Author(s):  
LAILANI A. MASUNGSONG ◽  
MARILYN M BELARMINO ◽  
INOCENCIO E BUOT JR

Abstract. Masungsong LA, Belarmino MM, Buot IEJr. 2019. Delineation of the selected Cucumis L. species and accessions using leaf architecture characters. Biodiversitas 20: 629-635. Despite several attempts in early and recent studies to separate the wild species of Cucumis from the cultivated ones, taxonomic confusion persists because of the morphological similarities within the genus. In a gene bank that stores many species and accessions of Cucumis, it is appropriate to delineate these numerous accessions to save time and resources. This study aims to delineate fifty selected Cucumis accessions based on leaf architecture. A cluster analysis of the fifty Cucumis accessions was done using the Unweighted Pair Group Method using Averages (UPGMA) and the Euclidean distance coefficient. A dendrogram with a cophenetic coefficient of 0.9606 supported the clustering of the Cucumis species and accessions. At a Euclidean distance of 1.5, two major clusters were formed on the basis of secondary vein spacing. Cucumis melo accessions separated from all the remaining accessions of C. myriocarpus, C. metuliferus, C. anguria and C. anguria var longaculeatus because their secondary vein spacing increases towards the base, while the rest have an irregular pattern of secondary vein spacing. Further sub-clusters of the remaining accessions, comprising four species, were delineated on the basis of tertiary vein (C. myriocarpus), tertiary vein angle to primary (C. metuliferus), and blade class (C. anguria and C. anguria var longaculeatus). Within species, laminar shape delineated C. myriocarpus accessions from each other, apex angle delineated C. metuliferus accessions, and primary vein size delineated C. melo accessions. The results imply that leaf architecture is a good tool for classifying the numerous accessions of Cucumis.
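The cophenetic coefficient quoted above measures how faithfully a dendrogram preserves the original pairwise distances: it is the Pearson correlation between each pair's original distance and the height at which the pair is first joined in the tree. The sketch below shows the calculation for UPGMA under stated assumptions; the four points are invented, and the helper names `upgma_cophenetic`, `pearson`, and `cophenetic_correlation` are hypothetical, not taken from the study.

```python
import math
from itertools import combinations


def upgma_cophenetic(dist):
    """Run UPGMA on a distance matrix and return the cophenetic matrix,
    i.e. the merge height at which each pair of items is first joined."""
    n = len(dist)
    clusters = [[i] for i in range(n)]
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every cross pair between the two merged clusters joins at height d.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return coph


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def cophenetic_correlation(dist):
    """Correlation between original distances and UPGMA merge heights."""
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    return pearson([dist[i][j] for i, j in pairs],
                   [coph[i][j] for i, j in pairs])


# Invented example: Euclidean distances among four points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 2.0)]
dist = [[math.dist(p, q) for q in points] for p in points]
r = cophenetic_correlation(dist)
print(round(r, 4))
```

A value close to 1 indicates that the tree distorts the original distances very little, which is the sense in which a coefficient such as 0.9606 is taken to support a clustering.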


2008 ◽  
Vol 5 (1) ◽  
pp. 33-41 ◽  
Author(s):  
Huang Wen-Kun ◽  
Guo Jian-Ying ◽  
Wan Fang-Hao ◽  
Gao Bi-Da ◽  
Xie Bing-Yan

Abstract: Eupatorium adenophorum (crofton weed) is one of the most widespread invasive species in China. Its genetic diversity and population structure in China were analysed by amplified fragment length polymorphism (AFLP). Three primer pairs were selected for the analysis and 490 bands were produced from 62 E. adenophorum populations selected from five major geographic areas. A total of 328 of the bands showed polymorphism [percentage of polymorphic bands (PPB)=59.4%]. Diversity levels of populations were relatively high (mean expected heterozygosity=0.154, mean Shannon index=0.241). At the regional level, the AMOVA indicated that about 70.25% of variation in the data set was from genotypic variations within populations, whereas 8.04% of the variation was due to regional differences, and the remaining 21.71% to differences among populations within the provincial regions. Cluster analysis based on the unweighted pair-group method with arithmetic averages (UPGMA) grouped the majority of E. adenophorum populations into four main clusters, which correspond to their geographic regions. It is concluded that E. adenophorum spread mainly by wind or water and that its genetic diversity level in newly invaded areas was lower than that in formerly colonized areas.

