Information-based summary statistics for spatial genetic structure inference

2021 ◽  
Author(s):  
Xinghu Qin ◽  
Oscar E. Gaggiotti

Abstract. Inference of spatial patterns of genetic structure often relies on parameter estimation and model evaluation using a set of summary statistics (SS) that summarise the information present in the data. An important subset of these SS is best described as diversity indices, which are based on information theory principles and can be classified into three different ‘families’ encompassing a spectrum of information measures, qH. These include the richness family of order q = 0, ArSS; the Shannon family of order q = 1, HSS; and the heterozygosity family of order q = 2, HeSS. Although commonly used by ecologists, the Shannon family has been rather neglected by population geneticists and evolutionary biologists. Recent population genetic studies have advocated its use, yet the power of these SS for spatial structure discrimination has not been systematically assessed. In this study, we performed a comprehensive assessment of the three families of SS, as well as a fourth family consisting of SS belonging to the Shannon family but expressed in terms of Hill numbers, for spatial structure inference using simulated microsatellite data under typical spatial scenarios. To give an unbiased evaluation, we used three machine learning methods, Kernel Local Fisher discriminant analysis (KLFDA), random forest classification (RFC), and deep neural network (DL), to test the performance of different SS in discriminating between spatial scenarios, and then identified the metrics with the greatest discriminatory power. Results showed that the SS family of order q = 1 expressed in terms of Hill numbers outperformed the other two families (ArSS, HeSS) as well as the untransformed Shannon entropy (HSS) family. Jaccard dissimilarity (J) and its Mantel’s r showed the highest power to discriminate between all spatial scenarios, followed by Shannon differentiation ΔD and its Mantel’s r. Information-based summary statistics, especially the diversity of order q = 1 and Shannon differentiation measures, can increase the power of spatial structure inference. In addition, different sets of SS provide complementary power for discriminating between spatial scenarios.
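The three families named in this abstract correspond to Hill numbers of order q = 0, 1 and 2. The sketch below is an illustration only, not the authors' code: the function name `hill_number` and the example allele frequencies are hypothetical, and it assumes the standard definition of a Hill number as (Σ pᵢ^q)^(1/(1−q)), with the q = 1 case taken as the exponential of Shannon entropy.

```python
import math


def hill_number(freqs, q):
    """Effective number of types (Hill number) of order q.

    freqs: non-negative relative abundances or allele frequencies
    (hypothetical input; normalised here so they sum to 1).
    """
    p = [x for x in freqs if x > 0]
    total = sum(p)
    p = [x / total for x in p]
    if q == 1:
        # The general formula is undefined at q = 1; its limit is
        # the exponential of Shannon entropy.
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1.0 / (1.0 - q))


# Hypothetical allele frequencies at a single locus.
allele_freqs = [0.5, 0.3, 0.2]

richness = hill_number(allele_freqs, 0)      # number of alleles present
shannon_hill = hill_number(allele_freqs, 1)  # exp(Shannon entropy)
simpson_hill = hill_number(allele_freqs, 2)  # 1 / sum(p_i ** 2)

# Untransformed counterparts of the q = 1 and q = 2 families.
shannon_entropy = math.log(shannon_hill)
expected_heterozygosity = 1.0 - 1.0 / simpson_hill
```

Expressing all three orders in the same units (effective number of alleles) is what makes the Hill-number form directly comparable across q, whereas entropy and heterozygosity sit on different scales.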

2015 ◽  
Vol 13 ◽  
pp. 17-30 ◽  
Author(s):  
Agnieszka Sutkowska ◽  
Kesara Anamthawat-Jónsson ◽  
Borgthór Magnússon ◽  
Wojciech Bąba ◽  
Józef R. Mitka

Prior to the present study there was limited knowledge about the genetic basis of plant colonization on the 50-year-old island of Surtsey, South Iceland. The aim here was to compare the genetic structure of two contrasting species, Festuca rubra (arctic fescue) and Empetrum nigrum (crowberry), which have colonized Surtsey since 1973 and 1993, respectively. Inter-simple sequence repeat (ISSR) markers were used to assess genetic diversity and population structure. Two census periods were compared: 1996-1997 and 2005-2006. Using six ISSR primers, we obtained 103 and 139 discernible DNA fragments from F. rubra and E. nigrum, respectively. Although the two species displayed similarly high genetic diversity indices (h = 0.238 and 0.235; I = 0.384 and 0.380, respectively), they differed significantly in their genetic profiles. Festuca was genetically structured at the subpopulation level (FST = 0.034, p = 0.007), whereas Empetrum showed a lack of genetic differentiation. A Bayesian STRUCTURE computation further revealed temporal and spatial genetic structure of the species. The early-arriving grass F. rubra has expanded from a local genepool. The population was, however, initially established from different sources, forming a genetic melting pot on Surtsey. On the other hand, the late-arriving shrub E. nigrum probably derived from a common source of immigrants.


2020 ◽  
Author(s):  
Jesse F. Abrams ◽  
Rahel Sollmann ◽  
Simon L. Mitchell ◽  
Matthew J. Struebig ◽  
Andreas Wilting

Abstract. Measuring the multidimensional diversity properties of a community is of great importance for ecologists, conservationists and stakeholders. Diversity profiles, a plotted series of Hill numbers, simultaneously capture all the common diversity indices. However, diversity metrics require information on species abundance. They often rely on raw count data without accounting for imperfect and varying detection, although detectability can vary between species and study sites. Hierarchical occupancy models explicitly account for variation in detectability, and Hill numbers have been expanded to allow estimation based on occupancy probability. But agreement between occupancy-based and abundance-based diversity profiles has not been investigated. Here, we fit community occupancy models to simulated animal communities to explore how well occupancy-based diversity profiles reflect true abundance-based diversity. Because we expect occupancy-based diversity to be overestimated, we further tested a novel occupancy thresholding approach to reduce potential biases in the estimated diversity profiles. Finally, we use empirical data from a megadiverse bird community to present how the framework can be extended to consider trait or phylogeny-based similarity when calculating diversity profiles. The simulation study showed that occupancy-based diversity profiles produced among-community patterns in diversity similar to true abundance diversity profiles, although within-community diversity was overestimated with the exception of richness. While applying an occupancy threshold reduced this positive bias, it resulted in negative bias in species richness estimates and slightly reduced the ability to reproduce true differences among the simulated communities. Application of our approach to a large bird dataset revealed differential diversity patterns in communities of different habitat types. Accounting for phylogenetic and ecological similarities between species reduced diversity and its variability among habitats. Our framework allows investigating the complexity of diversity for incidence data, while accounting for imperfect and varying detection probabilities, as well as species similarities. Visualizing results in the form of diversity profiles facilitates comparison of diversity between sites or across time. Therefore, our extension to the diversity profile framework will be a useful tool for studying and monitoring biodiversity.


CERNE ◽  
2011 ◽  
Vol 17 (2) ◽  
pp. 195-201 ◽  
Author(s):  
Mirian de Sousa Silva ◽  
Fábio de Almeida Vieira ◽  
Dulcinéia de Carvalho

Geonoma schottiana is an underbrush palm which is found in high densities in tropical forests. This species is known for having an asynchronous fruit producing pattern, over all seasons of the year, thus being an important food source for frugivores. This work aims to determine the diversity and spatial genetic structure of two natural populations, referred to as MC I and MC II, of which 60 individuals were sampled, in Poço Bonito Biological Reserve, Lavras, Minas Gerais state. Results of 10 polymorphic isozyme loci indicated a high genetic diversity for the species (Ĥe = 0.428 and Ĥo = 0.570), with a mean number of alleles per locus of 2.0. Estimates of Cockerham's coancestry coefficients indicated an absence of intrapopulation (coefficient = -0.343) and interpopulation inbreeding (coefficient = -0.161), suggesting that on average populations are not endogamous. A high genetic divergence was found between populations (13.5%), in comparison to most tropical species (<5%). Consequently, the estimated historical gene flow was low (Nm = 0.40). The analysis of spatial distribution of G. schottiana genotypes in MC I revealed a random distribution of genotypes. The high genetic diversity indices found suggest that the populations in question favor in situ genetic conservation, consequently favoring the conservation of riparian environments.
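The abstract links the reported divergence (13.5%) to a low gene-flow estimate (0.40) without stating the formula used. The sketch below shows one reading that happens to reproduce that value. It is an assumption, not the authors' documented method: it treats the divergence as an FST-type quantity, applies Wright's island-model relation Nm = (1/FST − 1)/4, and optionally divides by the finite-population correction α = (n/(n − 1))² attributed to Crow and Aoki for n sampled populations. The function name `gene_flow_from_fst` is hypothetical.

```python
def gene_flow_from_fst(fst, n_pops=None):
    """Indirect gene-flow estimate (Nm) from an FST-type divergence.

    Assumes Wright's island model. If n_pops is given, the estimate is
    divided by alpha = (n / (n - 1)) ** 2, a correction for a finite
    number of populations (an assumption about the method, see above).
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("fst must lie strictly between 0 and 1")
    if n_pops is not None and n_pops < 2:
        raise ValueError("n_pops must be at least 2")
    alpha = 1.0 if n_pops is None else (n_pops / (n_pops - 1)) ** 2
    return (1.0 / fst - 1.0) / (4.0 * alpha)


# Divergence reported in the abstract, read here as FST = 0.135.
nm_uncorrected = gene_flow_from_fst(0.135)         # about 1.60
nm_two_pops = gene_flow_from_fst(0.135, n_pops=2)  # about 0.40
```

Under these assumptions the two-population correction gives a value close to the reported 0.40, while the uncorrected relation gives roughly 1.60, so the choice of correction changes the estimate fourfold.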


Forests ◽  
2018 ◽  
Vol 9 (10) ◽  
pp. 622 ◽  
Author(s):  
Andrea Piotti ◽  
Matteo Garbarino ◽  
Camilla Avanzi ◽  
Roberta Berretti ◽  
Renzo Motta ◽  
...  

The tandem analysis of dendrochronological and genetic data is piquing forest ecologists’ interest and represents a promising approach for studying the temporal development of genetic structure in forest tree populations. Such a multidisciplinary approach can help elucidate to what extent different management practices have impacted the fine-scale spatial genetic structure of forest stands through time. In this study, we jointly analysed spatial, age and genetic data from three differently managed Norway spruce permanent plots to assess: (1) possible differences among plots in the spatial distribution of individuals and their genetic structure due to different management practices, and (2) whether modifications in the age structure influenced the fine-scale spatial genetic structure within each permanent plot. With these aims, we genetically characterized at five nuclear microsatellite markers a large subset (328) of all the trees for which spatial and age data were collected (1472). We found that different management practices determined a similar spatial structure in terms of trees’ ages (r < 25 m in all plots) and neutral genetic diversity (Sp ranging from 0.002 to 0.004). Hot spots and cold spots of trees’ age were not statistically different in terms of genetic diversity, and trees’ age was not statistically different among the genetic clusters detected. On the other hand, the spatial distribution of individuals was significantly clustered up to 22 m only in the wooded pasture plot. Our main findings show that forest land use and management can indeed determine markedly different spatial layouts of Norway spruce individuals but do not produce strong distortions in the spatial structure of age and genetic parameters.


2019 ◽  
Author(s):  
M. Crotti ◽  
C.E. Adams ◽  
K.R. Elmer

Summary. Epigenetics is increasingly recognised as an important molecular mechanism underlying phenotypic variation. To study DNA methylation in ecological and evolutionary contexts, epiRADseq is a cost-effective next-generation sequencing technique based on reduced representation sequencing of genomic regions surrounding non-/methylated sites. EpiRADseq for genome-wide methylation abundance and ddRADseq for genome-wide SNP genotyping follow very similar library and sequencing protocols, but to date these two types of dataset have been handled separately. Here we test the performance of using epiRADseq data to generate SNPs for population genomic analyses. We tested the robustness of using epiRADseq data for population genomics with two independent datasets: a newly generated single-end dataset for the European whitefish Coregonus lavaretus, and a re-analysis of publicly available, previously published paired-end data on corals. Using standard bioinformatic pipelines with a reference genome and without (i.e. de novo catalogue loci), we compared the number of SNPs retained, population genetic summary statistics, and population genetic structure between data drawn from ddRADseq and epiRADseq library preparations. We find that SNPs drawn from epiRADseq are similar in number to those drawn from ddRADseq, with 55-83% of SNPs being identified by both methods. Genotyping error rate was <5% in both approaches. For summary statistics such as heterozygosity and nucleotide diversity, there is a strong correlation between methods (Spearman’s rho > 0.88). Furthermore, identical patterns of population genetic structure were recovered using SNPs from epiRADseq and ddRADseq approaches. We show that SNPs obtained from epiRADseq are highly similar to those from ddRADseq and are equivalent for estimating genetic diversity and population structure. This finding is particularly relevant to researchers interested in genetics and epigenetics on the same individuals because using a single epigenomic approach to generate two datasets greatly reduces the time and financial costs compared to using these techniques separately. It also efficiently enables correction of epigenetic estimates with population genetic data. Many studies will benefit from a combinatorial approach with genetic and epigenetic markers and this demonstrates a single, efficient method to do so.


2005 ◽  
Vol 250 (3-4) ◽  
pp. 231-242 ◽  
Author(s):  
M. Y. Chung ◽  
K.-J. Kim ◽  
J.-H. Pak ◽  
C.-W. Park ◽  
B.-Y. Sun ◽  
...  

2021 ◽  
Vol 21 (1) ◽  
Author(s):  
Kelly B. Klingler ◽  
Joshua P. Jahner ◽  
Thomas L. Parchman ◽  
Chris Ray ◽  
Mary M. Peacock

Abstract. Background: Distributional responses by alpine taxa to repeated, glacial-interglacial cycles throughout the last two million years have significantly influenced the spatial genetic structure of populations. These effects have been exacerbated for the American pika (Ochotona princeps), a small alpine lagomorph constrained by thermal sensitivity and a limited dispersal capacity. As a species of conservation concern, long-term lack of gene flow has important consequences for landscape genetic structure and levels of diversity within populations. Here, we use reduced representation sequencing (ddRADseq) to provide a genome-wide perspective on patterns of genetic variation across pika populations representing distinct subspecies. To investigate how landscape and environmental features shape genetic variation, we collected genetic samples from distinct geographic regions as well as across finer spatial scales in two geographically proximate mountain ranges of eastern Nevada. Results: Our genome-wide analyses corroborate range-wide, mitochondrial subspecific designations and reveal pronounced fine-scale population structure between the Ruby Mountains and East Humboldt Range of eastern Nevada. Populations in Nevada were characterized by low genetic diversity (π = 0.0006–0.0009; θW = 0.0005–0.0007) relative to populations in California (π = 0.0014–0.0019; θW = 0.0011–0.0017) and the Rocky Mountains (π = 0.0025–0.0027; θW = 0.0021–0.0024), indicating substantial genetic drift in these isolated populations. Tajima’s D was positive for all sites (D = 0.240–0.811), consistent with recent contraction in population sizes range-wide. Conclusions: Substantial influences of geography, elevation and climate variables on genetic differentiation were also detected and may interact with the regional effects of anthropogenic climate change to force the loss of unique genetic lineages through continued population extirpations in the Great Basin and Sierra Nevada.

