Impact of reduced-representation sequencing protocols on detecting population structure in a threatened marsupial

2019 ◽  
Vol 46 (5) ◽  
pp. 5575-5580 ◽  
Author(s):  
B. R. Wright ◽  
C. E. Grueber ◽  
M. J. Lott ◽  
K. Belov ◽  
R. N. Johnson ◽  
...  
2020 ◽  
Author(s):  
D. Selechnik ◽  
M.F. Richardson ◽  
M.K. Hess ◽  
A.S. Hess ◽  
K.G. Dodds ◽  
...  

Abstract As technological advancements enhance our ability to study population genetics, we must understand how the intrinsic properties of our datasets influence the decisions we make when designing experiments. Filtering parameter thresholds, such as call rate and minimum minor allele frequency (MAF), are known to affect inferences of population structure in reduced representation sequencing (RRS) studies. However, it is unclear to what extent the impacts of these parameter choices vary across datasets. Here, we reviewed literature on filtering choices and levels of genetic differentiation across RRS studies on wild populations to highlight the diverse approaches that have been used. Next, we hypothesized that choices in filtering thresholds would have the greatest impact when analyzing datasets with low levels of genetic differentiation between populations. To test this hypothesis, we produced seven simulated RRS datasets with varying levels of population structure, and analyzed them using four different combinations of call rate and MAF. We performed the same analysis on two empirical RRS datasets (low or high population structure). Our simulated and empirical results suggest that the effects of filtering choices indeed vary based on inherent levels of differentiation: specifically, choosing stringent filtering choices was important to detect distinct populations that were slightly differentiated, but not those that were highly differentiated. As a result, experimental design and analysis choices need to consider attributes of each specific dataset. Based on our literature review and analyses, we recommend testing a range of filtering parameter choices, and presenting all results with clear justification for ultimate filtering decisions used in downstream analyses.
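The two filters named in this abstract, call rate and minimum MAF, can be illustrated with a minimal sketch. The abstract does not state which threshold values were tested, so the four combinations below, the toy genotype matrix, and the function names are all hypothetical. Genotypes are assumed to be diploid and coded as 0, 1 or 2 copies of the alternate allele, with `None` for a missing call.

```python
from typing import List, Optional

# Assumption: diploid genotypes coded as copies of the alternate allele;
# None marks a missing call.
Genotype = Optional[int]


def call_rate(snp: List[Genotype]) -> float:
    """Fraction of samples with a non-missing genotype at this SNP."""
    return sum(g is not None for g in snp) / len(snp)


def minor_allele_frequency(snp: List[Genotype]) -> float:
    """MAF computed from the called genotypes only."""
    called = [g for g in snp if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(snps: List[List[Genotype]],
                min_call_rate: float,
                min_maf: float) -> List[List[Genotype]]:
    """Keep SNPs that meet both thresholds."""
    return [
        s for s in snps
        if call_rate(s) >= min_call_rate
        and minor_allele_frequency(s) >= min_maf
    ]


# Toy matrix: 4 SNPs (rows) x 5 samples (columns); values are made up.
snps = [
    [0, 1, 2, 1, 0],        # fully called, common variant
    [0, 0, None, None, 1],  # two missing calls
    [0, 0, 0, 0, 1],        # fully called, rarer variant
    [2, 2, 2, None, 2],     # monomorphic among called samples
]

# Hypothetical threshold grid (lenient/stringent call rate x MAF).
for min_cr, min_maf in [(0.5, 0.0), (0.5, 0.05), (0.9, 0.0), (0.9, 0.05)]:
    kept = filter_snps(snps, min_cr, min_maf)
    print(f"call rate >= {min_cr}, MAF >= {min_maf}: {len(kept)} SNPs kept")
```

Running the same downstream analysis on each filtered set, as the abstract recommends, shows how sensitive an inference is to these choices.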


Author(s):  
Brandon T. Sinn ◽  
Sandra J. Simon ◽  
Mathilda V. Santee ◽  
Stephen P. DiFazio ◽  
Nicole M. Fama ◽  
...  

2018 ◽  
Vol 49 (6) ◽  
pp. 579-591 ◽  
Author(s):  
Zhe Zhang ◽  
Qianqian Zhang ◽  
Qian Xiao ◽  
Hao Sun ◽  
Hongding Gao ◽  
...  

2015 ◽  
Author(s):  
Thomas F Cooke ◽  
Muh-Ching Yee ◽  
Marina Muzzio ◽  
Alexandra Sockell ◽  
Ryan Bell ◽  
...  

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
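For reference, the Hardy-Weinberg equilibrium filter that this abstract uses as a comparison can be sketched as a conventional chi-square test on genotype counts. This is not the GBStools model; the function name and the example counts are hypothetical. The relevance to GBS is that allele dropout at a polymorphic restriction site can make heterozygotes appear homozygous, which shows up as a heterozygote deficit.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 degree of freedom) for deviation from
    Hardy-Weinberg proportions at a biallelic site.

    Returns (chi2, p_value)."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    # Skip classes with zero expectation (monomorphic sites).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Genotype counts in exact Hardy-Weinberg proportions.
print(hwe_chi_square(25, 50, 25))
# Complete heterozygote deficit, as extreme allele dropout might produce.
print(hwe_chi_square(50, 0, 50))
```

A filter of this kind would drop sites whose p-value falls below a chosen cutoff; the cutoff itself is a study-specific choice and is not given in the abstract.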


2019 ◽  
Vol 36 (1) ◽  
pp. 26-32
Author(s):  
Davoud Torkamaneh ◽  
Jérôme Laroche ◽  
Brian Boyle ◽  
François Belzile

Abstract Motivation Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step toward genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole genome sequencing to reduce costs and enable the analysis of many more individuals. Amongst these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid and cost-effective discovery of SNPs and for high-throughput genotyping in a wide range of species. Despite the extensive improvements of the RSAS methods in the last decade, the estimation of the number of reads (i.e. read depth) required per sample for an efficient and effective genotyping remains mostly based on trial and error. Results Herein we describe a bioinformatics tool, DepthFinder, designed to estimate the required read counts for RSAS methods. To illustrate its performance, we estimated required read counts in six different species (human, cattle, spruce budworm, salmon, barley and soybean) that cover a range of different biological (genome size, level of genome complexity, level of DNA methylation and ploidy) and technical (library preparation protocol and sequencing platform) factors. To assess the prediction accuracy of DepthFinder, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment. This analysis yielded estimated accuracies of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size selection interval in RSAS work. We conclude that DepthFinder constitutes an efficient, reliable and useful tool for a broad array of users in different research communities. Availability and implementation https://bitbucket.org/jerlar73/DepthFinder Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (17) ◽  
pp. 3160-3162
Author(s):  
Davoud Torkamaneh ◽  
Jérôme Laroche ◽  
Istvan Rajcan ◽  
François Belzile

Abstract Motivation Reduced-representation sequencing is a genome-wide scanning method for simultaneous discovery and genotyping of thousands to millions of single nucleotide polymorphisms that is used across a wide range of species. However, in this method a reproducible but very small fraction of the genome is captured for sequencing, while the resulting reads are typically aligned against the entire reference genome. Results Here we present a skinny reference genome approach in which a simplified reference genome is used to decrease computing time for data processing and to increase single nucleotide polymorphism counts and accuracy. A skinny reference genome can be integrated into any reduced-representation sequencing analytical pipeline. Availability and implementation https://bitbucket.org/jerlar73/SRG-Extractor. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Author(s):  
Cheng Jin ◽  
Huixia Kao ◽  
Shubin Dong

Abstract Background Studying the population genetic structure and gene flow of plant populations, and the factors that influence them, is crucial in conservation biology, especially for rare and endangered plants. Tetraena mongolica Maxim (TM), a member of the Zygophyllaceae family, is a rare and endangered plant with a narrow distribution. Excessive logging, urban expansion, industrial development and the development of scenic spots in recent decades have caused habitat fragmentation and decline. Results In this study, the genetic diversity, population genetic structure and gene flow of TM populations were evaluated using reduced representation sequencing technology; in total, more than 133.45 GB of high-quality clean reads and 38,097 high-quality SNPs were generated. Based on multiple analysis methods, we found that existing TM populations have moderate levels of genetic diversity, very low genetic differentiation and high levels of gene flow between populations. Population structure and principal coordinates analyses showed that the 8 TM populations can be divided into two groups, and a Mantel test detected no significant correlation between geographical distance and genetic distance across the whole sampling. The migration model indicated that historical gene flow followed a predominantly north-to-south pattern. Conclusions Our study demonstrates that the present genetic structure is mainly due to habitat fragmentation caused by urban sprawl, industrial development and coal mining. For conservation management, we recommend that all 8 populations be protected as a single population, rather than only those in the core area of the TM nature reserve; populations near the edge of the TM distribution, in cities and industrial areas, deserve special protection.
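The Mantel test mentioned in this abstract asks whether geographic and genetic distance matrices are correlated. A minimal permutation-based sketch follows, assuming symmetric distance matrices with populations in the same order and a one-sided test for positive correlation. The abstract does not say which implementation or settings the authors used; the function names, permutation count and toy matrices here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def upper_triangle(m: Matrix, order: Sequence[int]) -> List[float]:
    """Off-diagonal upper-triangle entries after reordering rows and columns."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(geo: Matrix, gen: Matrix,
           permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(geo)))
    x = upper_triangle(geo, identity)
    r_obs = pearson(x, upper_triangle(gen, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in one matrix only
        if pearson(x, upper_triangle(gen, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Toy example: 5 populations on a line; genetic distance is made
# proportional to geographic distance (perfect isolation by distance).
geo = [[abs(i - j) for j in range(5)] for i in range(5)]
gen = [[2.0 * abs(i - j) for j in range(5)] for i in range(5)]

r, p = mantel(geo, gen)
print(f"Mantel r = {r:.3f}, permutation p = {p:.3f}")
```

Rows and columns are permuted together because the entries of a distance matrix are not independent observations, which is why an ordinary correlation test on the flattened values would not be appropriate.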

