Impact of reduced-representation sequencing protocols on detecting population structure in a threatened marsupial

Abstract. As technological advancements enhance our ability to study population genetics, we must understand how the intrinsic properties of our datasets influence the decisions we make when designing experiments. Filtering parameter thresholds, such as call rate and minimum minor allele frequency (MAF), are known to affect inferences of population structure in reduced representation sequencing (RRS) studies. However, it is unclear to what extent the impacts of these parameter choices vary across datasets. Here, we reviewed literature on filtering choices and levels of genetic differentiation across RRS studies on wild populations to highlight the diverse approaches that have been used. Next, we hypothesized that choices in filtering thresholds would have the greatest impact when analyzing datasets with low levels of genetic differentiation between populations. To test this hypothesis, we produced seven simulated RRS datasets with varying levels of population structure, and analyzed each using four different combinations of call rate and MAF thresholds. We performed the same analysis on two empirical RRS datasets (one with low and one with high population structure). Our simulated and empirical results suggest that the effects of filtering choices indeed vary with the inherent level of differentiation: specifically, stringent filtering was important for detecting distinct populations that were only slightly differentiated, but not those that were highly differentiated. As a result, experimental design and analysis choices need to consider the attributes of each specific dataset. Based on our literature review and analyses, we recommend testing a range of filtering parameter choices, and presenting all results with clear justification for the filtering decisions ultimately used in downstream analyses.
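
The two filters named in this abstract, call rate and minimum MAF, can be illustrated with a minimal sketch. It assumes biallelic SNPs coded as the number of alternate-allele copies (0, 1, 2) with `None` for a missing call; the function names and the threshold values in the usage note are hypothetical and are not taken from the study.

```python
from typing import List, Optional

# 0, 1 or 2 copies of the alternate allele; None marks a missing genotype call.
Genotype = Optional[int]


def call_rate(site: List[Genotype]) -> float:
    """Fraction of individuals with a non-missing genotype at one site."""
    return sum(g is not None for g in site) / len(site)


def minor_allele_frequency(site: List[Genotype]) -> float:
    """Frequency of the rarer allele among called genotypes (diploid assumed)."""
    called = [g for g in site if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def filter_sites(sites: List[List[Genotype]],
                 min_call_rate: float,
                 min_maf: float) -> List[int]:
    """Return the indices of sites that pass both thresholds."""
    return [
        i for i, site in enumerate(sites)
        if call_rate(site) >= min_call_rate
        and minor_allele_frequency(site) >= min_maf
    ]
```

Running `filter_sites` on the same matrix with a lenient pair of thresholds (for example 0.5 and 0.01) and a stringent pair (for example 0.8 and 0.1) shows how many loci each combination retains, which is the kind of comparison the abstract recommends reporting.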

ISSRseq: an extensible method for reduced representation sequencing

Methods in Ecology and Evolution ◽ 10.1111/2041-210x.13784 ◽ 2021

Author(s): Brandon T. Sinn ◽ Sandra J. Simon ◽ Mathilda V. Santee ◽ Stephen P. DiFazio ◽ Nicole M. Fama ◽ ...

Keyword(s): Reduced Representation ◽ Reduced Representation Sequencing

Distribution of runs of homozygosity in Chinese and Western pig breeds evaluated by reduced-representation sequencing data

Animal Genetics ◽ 10.1111/age.12730 ◽ 2018 ◽ Vol 49 (6) ◽ pp. 579-591 ◽ Cited By ~ 9

Author(s): Zhe Zhang ◽ Qianqian Zhang ◽ Qian Xiao ◽ Hao Sun ◽ Hongding Gao ◽ ...

Keyword(s): Runs Of Homozygosity ◽ Sequencing Data ◽ Reduced Representation ◽ Pig Breeds ◽ Reduced Representation Sequencing

GBStools: A Unified Approach for Reduced Representation Sequencing and Genotyping

10.1101/030494 ◽ 2015

Author(s): Thomas F Cooke ◽ Muh-Ching Yee ◽ Marina Muzzio ◽ Alexandra Sockell ◽ Ryan Bell ◽ ...

Keyword(s): Restriction Site ◽ Variant Calling ◽ Simulated Data ◽ Error Rates ◽ Genomic Diversity ◽ Model Organisms ◽ Data Sets ◽ Reduced Representation ◽ Restriction Site Polymorphisms ◽ Reduced Representation Sequencing

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
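
For context, the baseline filter this abstract compares against, a Hardy-Weinberg equilibrium p-value, can be sketched as follows. This is not the GBStools model, which the abstract does not describe; it is a plain chi-square goodness-of-fit test with one degree of freedom, and the function name is hypothetical. A restriction site polymorphism can make heterozygotes look like homozygotes, and the resulting heterozygote deficit is what such a filter tends to flag.

```python
import math


def hwe_chi_square_p(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg proportions.

    Arguments are observed counts of the three genotypes at one biallelic site.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no called genotypes at this site")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic site: nothing to test
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))
```

A site with counts (25, 50, 25) matches the expected proportions exactly, whereas (50, 0, 50) has no heterozygotes at all and yields a very small p-value. The chi-square approximation is unreliable when expected counts are small, where an exact test is usually preferred.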

A population genomics approach to uncover the CNVs, and their evolutionary significance, hidden in reduced‐representation sequencing data sets

Molecular Ecology ◽ 10.1111/mec.15665 ◽ 2020 ◽ Vol 29 (24) ◽ pp. 4749-4753

Author(s): Anna Tigano

Keyword(s): Population Genomics ◽ Evolutionary Significance ◽ Data Sets ◽ Sequencing Data ◽ Reduced Representation ◽ Reduced Representation Sequencing

DepthFinder: a tool to determine the optimal read depth for reduced-representation sequencing

Bioinformatics ◽ 10.1093/bioinformatics/btz473 ◽ 2019 ◽ Vol 36 (1) ◽ pp. 26-32

Author(s): Davoud Torkamaneh ◽ Jérôme Laroche ◽ Brian Boyle ◽ François Belzile

Keyword(s): Cost Effective ◽ Read Depth ◽ Supplementary Information ◽ Nucleotide Polymorphisms ◽ Reduced Representation ◽ Sequencing Platform ◽ Genome Complexity ◽ Wide Range ◽ Broad Array ◽ Reduced Representation Sequencing

Abstract. Motivation: Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step toward genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole genome sequencing to reduce costs and enable the analysis of many more individuals. Amongst these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid and cost-effective discovery of SNPs and for high-throughput genotyping in a wide range of species. Despite the extensive improvements of the RSAS methods in the last decade, the estimation of the number of reads (i.e. read depth) required per sample for efficient and effective genotyping remains mostly based on trial and error. Results: Herein we describe a bioinformatics tool, DepthFinder, designed to estimate the required read counts for RSAS methods. To illustrate its performance, we estimated required read counts in six different species (human, cattle, spruce budworm, salmon, barley and soybean) that cover a range of different biological (genome size, level of genome complexity, level of DNA methylation and ploidy) and technical (library preparation protocol and sequencing platform) factors. To assess the prediction accuracy of DepthFinder, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment. This analysis yielded estimated accuracies of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size selection interval in RSAS work. We conclude that DepthFinder constitutes an efficient, reliable and useful tool for a broad array of users in different research communities. Availability and implementation: https://bitbucket.org/jerlar73/DepthFinder Supplementary information: Supplementary data are available at Bioinformatics online.
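
The abstract does not spell out how DepthFinder arrives at its estimate, so the following is only a back-of-the-envelope sketch of the general idea, under stated assumptions: digest a sequence in silico at a recognition motif, keep fragments inside a size-selection window, and multiply their count by a target per-locus depth. The function names, the motif and the numbers are hypothetical, and the sketch ignores methylation, ploidy, uneven coverage and library-specific effects.

```python
from typing import List


def cut_positions(sequence: str, site: str) -> List[int]:
    """Start index of every (possibly overlapping) occurrence of the motif."""
    positions = []
    pos = sequence.find(site)
    while pos != -1:
        positions.append(pos)
        pos = sequence.find(site, pos + 1)
    return positions


def fragment_lengths(sequence: str, site: str) -> List[int]:
    """Lengths of fragments flanked by a recognition site on both sides."""
    cuts = cut_positions(sequence, site)
    return [b - a for a, b in zip(cuts, cuts[1:])]


def required_reads(sequence: str, site: str,
                   min_len: int, max_len: int, target_depth: int) -> int:
    """Rough read count: size-selected fragments times the desired depth."""
    selected = [n for n in fragment_lengths(sequence, site)
                if min_len <= n <= max_len]
    return len(selected) * target_depth
```

For the toy sequence `"AAAAGATCTTTTTTGATCCCGATC"` and motif `"GATC"`, the flanked fragments have lengths 10 and 6, so a window of 8 to 12 bases at a target depth of 30 gives one fragment and 30 reads.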

SRG extractor: a skinny reference genome approach for reduced-representation sequencing

Bioinformatics ◽ 10.1093/bioinformatics/btz043 ◽ 2019 ◽ Vol 35 (17) ◽ pp. 3160-3162

Author(s): Davoud Torkamaneh ◽ Jérôme Laroche ◽ Istvan Rajcan ◽ François Belzile

Keyword(s): Reference Genome ◽ Computing Time ◽ Supplementary Information ◽ Nucleotide Polymorphisms ◽ Single Nucleotide ◽ Scanning Method ◽ Reduced Representation ◽ A Genome ◽ Wide Range ◽ Reduced Representation Sequencing

Abstract. Motivation: Reduced-representation sequencing is a genome-wide scanning method for simultaneous discovery and genotyping of thousands to millions of single nucleotide polymorphisms that is used across a wide range of species. However, in this method a reproducible but very small fraction of the genome is captured for sequencing, while the resulting reads are typically aligned against the entire reference genome. Results: Here we present a skinny reference genome approach in which a simplified reference genome is used to decrease computing time for data processing and to increase single nucleotide polymorphism counts and accuracy. A skinny reference genome can be integrated into any reduced-representation sequencing analytical pipeline. Availability and implementation: https://bitbucket.org/jerlar73/SRG-Extractor. Supplementary information: Supplementary data are available at Bioinformatics online.
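
The abstract does not say how the simplified reference is constructed. One plausible reading, offered purely as an assumption, is that regions covered by reads are merged and only those regions are kept. The sketch below shows that interval-merging idea with half-open `(start, end)` coordinates; the function names and the `padding` parameter are hypothetical and do not describe SRG Extractor's actual interface.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open: start inclusive, end exclusive


def merge_intervals(intervals: List[Interval], padding: int = 0) -> List[Interval]:
    """Pad each covered interval, then merge any that overlap or touch."""
    padded = sorted((max(0, s - padding), e + padding) for s, e in intervals)
    merged: List[Interval] = []
    for start, end in padded:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def skinny_reference(chromosome: str, covered: List[Interval],
                     padding: int = 0) -> List[Tuple[int, str]]:
    """Return (original start offset, subsequence) for each retained region."""
    return [(start, chromosome[start:end])
            for start, end in merge_intervals(covered, padding)]
```

Keeping the original start offset alongside each subsequence lets variant positions called on the reduced reference be mapped back to full-genome coordinates.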

Customize and get the most out of your reduced‐representation sequencing experiment with the new simulation software RADinitio

Molecular Ecology Resources ◽ 10.1111/1755-0998.13218 ◽ 2020

Author(s): Marvin Choquet

Keyword(s): Simulation Software ◽ Reduced Representation ◽ Sequencing Experiment ◽ Reduced Representation Sequencing

Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy

Molecular Ecology Resources ◽ 10.1111/1755-0998.12314 ◽ 2014 ◽ Vol 15 (2) ◽ pp. 329-336 ◽ Cited By ~ 32

Author(s): M. M. Y. Tin ◽ F. E. Rheindt ◽ E. Cros ◽ A. S. Mikheyev

Keyword(s): Sequencing Data ◽ Genotype Calling ◽ Reduced Representation ◽ Pcr Duplicates ◽ Reduced Representation Sequencing

Population genetic structure and gene flow of rare and endangered Tetraena mongolica Maxim revealed by reduced representation sequencing

10.21203/rs.3.rs-19749/v1 ◽ 2020

Author(s): Cheng Jin ◽ Huixia Kao ◽ Shubin Dong

Keyword(s): Genetic Diversity ◽ Gene Flow ◽ Genetic Structure ◽ Population Genetic Structure ◽ Population Genetic ◽ Industrial Development ◽ High Quality ◽ Reduced Representation ◽ Rare And Endangered ◽ Reduced Representation Sequencing

Abstract. Background: Studying the population genetic structure and gene flow of plant populations, and the factors that influence them, is crucial in conservation biology, especially for rare and endangered plants. Tetraena mongolica Maxim (TM), a member of the family Zygophyllaceae, is a rare and endangered plant with a narrow distribution. Excessive logging, urban expansion, industrial development and the development of scenic spots in recent decades have caused habitat fragmentation and decline. Results: In this study, the genetic diversity, population genetic structure and gene flow of TM populations were evaluated using reduced representation sequencing, which generated more than 133.45 GB of high-quality clean reads and 38,097 high-quality SNPs. Based on multiple analytical methods, we found that existing TM populations have moderate levels of genetic diversity, very low genetic differentiation and high levels of gene flow between populations. Population structure and principal coordinates analyses showed that the 8 TM populations can be divided into two groups. A Mantel test detected no significant correlation between geographical distance and genetic distance across the whole sampling. The migration model indicated that historical gene flow followed a predominantly north-to-south pattern. Conclusions: Our study demonstrates that the present genetic structure is mainly due to habitat fragmentation caused by urban sprawl, industrial development and coal mining. For conservation management, we recommend that all 8 populations be protected as a single population, rather than only those in the core area of the TM nature reserve; populations near the edge of the TM distribution, in cities and industrial areas, deserve special protection.
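
The Mantel test mentioned in the results can be sketched as a permutation test on the correlation between two distance matrices. This is a generic, one-sided version written for illustration; the function names, the number of permutations and the fixed seed are assumptions, and the study's own software and settings are not stated in the abstract.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def upper_triangle(matrix: Matrix, order: Sequence[int]) -> List[float]:
    """Pairwise distances above the diagonal, with rows/columns reordered."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo: Matrix, gen: Matrix,
           permutations: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    ids = list(range(len(geo)))
    geo_vec = upper_triangle(geo, ids)
    observed = pearson(geo_vec, upper_triangle(gen, ids))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(ids)  # relabel populations in the genetic matrix only
        if pearson(geo_vec, upper_triangle(gen, ids)) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)
```

Rows and columns of one matrix are permuted together because the pairwise distances are not independent observations; shuffling individual cells would break the matrix structure and give a misleading null distribution.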
