ISSRseq: an extensible method for reduced representation sequencing

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular due to restriction site polymorphisms, and few computational tools exist that specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it in several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and in two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets over 100 samples when coverage is 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations and reveal widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
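The abstract compares GBStools against Hardy-Weinberg equilibrium (HWE) p-value filtering. The sketch below shows that common baseline filter, not the GBStools model: the standard 1-degree-of-freedom chi-square HWE test for one biallelic SNP. The function name `hwe_chi2_pvalue` is hypothetical. Allele dropout at a polymorphic restriction site tends to make true heterozygotes look homozygous, and this test flags the resulting heterozygote deficit.

```python
import math


def hwe_chi2_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of the 1-d.f. chi-square test for Hardy-Weinberg equilibrium.

    n_aa, n_ab and n_bb are counts of the two homozygous genotypes and the
    heterozygote at one biallelic SNP (hypothetical helper, not GBStools).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotype calls")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    # Skip empty expected classes (monomorphic sites give chi2 = 0).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Chi-square survival function with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# Balanced genotypes fit HWE exactly.
print(hwe_chi2_pvalue(25, 50, 25))  # 1.0
# A heterozygote deficit, as allele dropout would produce, gives a tiny p-value.
print(hwe_chi2_pvalue(45, 10, 45))
```

A site is typically removed when its p-value falls below a chosen threshold. Because such a filter only sees genotype proportions, it cannot tell dropout from other causes of heterozygote deficit, which is the gap a model of restriction site polymorphisms targets.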


A population genomics approach to uncover the CNVs, and their evolutionary significance, hidden in reduced‐representation sequencing data sets

Molecular Ecology · 10.1111/mec.15665 · 2020 · Vol 29 (24) · pp. 4749-4753

Author(s): Anna Tigano

Keyword(s): Population Genomics, Evolutionary Significance, Data Sets, Sequencing Data, Reduced Representation, Reduced Representation Sequencing


DepthFinder: a tool to determine the optimal read depth for reduced-representation sequencing

Bioinformatics · 10.1093/bioinformatics/btz473 · 2019 · Vol 36 (1) · pp. 26-32

Author(s): Davoud Torkamaneh, Jérôme Laroche, Brian Boyle, François Belzile

Keyword(s): Cost Effective, Read Depth, Supplementary Information, Nucleotide Polymorphisms, Reduced Representation, Sequencing Platform, Genome Complexity, Wide Range, Broad Array, Reduced Representation Sequencing

Abstract Motivation Identification of DNA sequence variations such as single nucleotide polymorphisms (SNPs) is a fundamental step in genetic studies. Reduced-representation sequencing methods have been developed as alternatives to whole genome sequencing to reduce costs and enable the analysis of many more individuals. Among these methods, restriction site associated sequencing (RSAS) methodologies have been widely used for rapid and cost-effective discovery of SNPs and for high-throughput genotyping in a wide range of species. Despite the extensive improvements of RSAS methods in the last decade, the number of reads (i.e. read depth) required per sample for efficient and effective genotyping is still mostly estimated by trial and error. Results Herein we describe a bioinformatics tool, DepthFinder, designed to estimate the required read counts for RSAS methods. To illustrate its performance, we estimated required read counts in six different species (human, cattle, spruce budworm, salmon, barley and soybean) that cover a range of biological (genome size, level of genome complexity, level of DNA methylation and ploidy) and technical (library preparation protocol and sequencing platform) factors. To assess the prediction accuracy of DepthFinder, we compared DepthFinder-derived results with independent datasets obtained from an RSAS experiment. This analysis yielded an estimated accuracy of nearly 94%. Moreover, we present DepthFinder as a powerful tool to predict the most effective size-selection interval in RSAS work. We conclude that DepthFinder is an efficient, reliable and useful tool for a broad array of users in different research communities. Availability and implementation https://bitbucket.org/jerlar73/DepthFinder Supplementary information Supplementary data are available at Bioinformatics online.
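The abstract does not describe DepthFinder's algorithm. As a rough illustration of why the read budget depends on the enzyme and the size-selection interval, the sketch below assumes a simplified model: digest a sequence in silico with one enzyme, count the fragments whose length falls inside the size-selection window, and multiply by a target depth. It assumes one read per fragment copy, uniform sampling and no PCR duplicates. The function names are hypothetical, and PstI (CTGCA^G) is used only as an example enzyme.

```python
import re


def digest_fragment_lengths(seq: str, site_regex: str, cut_offset: int) -> list[int]:
    """Lengths of fragments from an in-silico single-enzyme digest.

    site_regex is the recognition site as a regular expression (IUPAC codes
    must be expanded, e.g. W -> [AT]); cut_offset is the cut position within
    the site on the forward strand. Hypothetical helper, not DepthFinder.
    """
    seq = seq.upper()
    # The lookahead also finds overlapping occurrences of the site.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site_regex})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:]) if end > start]


def reads_needed(lengths: list[int], min_len: int, max_len: int,
                 target_depth: float) -> tuple[int, float]:
    """Fragments kept by size selection, and reads to cover each at target_depth.

    Simplifying assumptions: one read per fragment copy, uniform sampling,
    no PCR duplicates or failed reads.
    """
    kept = sum(1 for length in lengths if min_len <= length <= max_len)
    return kept, kept * target_depth


# Toy sequence with two PstI sites (CTGCA^G, so the cut is 5 bases into the site).
toy = "AAAAA" + "CTGCAG" + "T" * 10 + "CTGCAG" + "AAAA"
frags = digest_fragment_lengths(toy, "CTGCAG", 5)
print(frags)                           # [10, 16, 5]
print(reads_needed(frags, 8, 20, 10))  # (2, 20)
```

Re-running `reads_needed` over several `(min_len, max_len)` windows on a real genome shows how widening or narrowing the size-selection interval changes the number of targeted fragments, and therefore the reads needed per sample.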


SRG extractor: a skinny reference genome approach for reduced-representation sequencing

Bioinformatics · 10.1093/bioinformatics/btz043 · 2019 · Vol 35 (17) · pp. 3160-3162

Author(s): Davoud Torkamaneh, Jérôme Laroche, Istvan Rajcan, François Belzile

Keyword(s): Reference Genome, Computing Time, Supplementary Information, Nucleotide Polymorphisms, Single Nucleotide, Scanning Method, Reduced Representation, A Genome, Wide Range, Reduced Representation Sequencing

Abstract Motivation Reduced-representation sequencing is a genome-wide scanning method for simultaneous discovery and genotyping of thousands to millions of single nucleotide polymorphisms that is used across a wide range of species. However, in this method a reproducible but very small fraction of the genome is captured for sequencing, while the resulting reads are typically aligned against the entire reference genome. Results Here we present a skinny reference genome approach in which a simplified reference genome is used to decrease computing time for data processing and to increase single nucleotide polymorphism counts and accuracy. A skinny reference genome can be integrated into any reduced-representation sequencing analytical pipeline. Availability and implementation https://bitbucket.org/jerlar73/SRG-Extractor. Supplementary information Supplementary data are available at Bioinformatics online.
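The abstract does not say how SRG extractor chooses which genomic regions to keep. The sketch below illustrates only the general idea of a skinny reference. It takes the target intervals as input (for example, regions predicted by in-silico digestion or observed to be covered by reads), pads and merges them, and keeps just those subsequences, so reads are aligned against a much smaller reference. The function names, the padding parameter and the `chrom:start-end` record naming are assumptions for illustration.

```python
def skinny_reference(genome: dict[str, str],
                     intervals: list[tuple[str, int, int]],
                     pad: int = 0) -> dict[str, str]:
    """Build a reduced reference from target intervals (0-based, half-open).

    Hypothetical helper, not SRG extractor. Each interval is widened by `pad`
    bases on both sides and clipped to the chromosome, overlapping intervals
    are merged, and each merged region becomes its own record named
    chrom:start-end so positions can be translated back to the full genome.
    """
    by_chrom: dict[str, list[tuple[int, int]]] = {}
    for chrom, start, end in intervals:
        length = len(genome[chrom])
        by_chrom.setdefault(chrom, []).append(
            (max(0, start - pad), min(length, end + pad)))
    records: dict[str, str] = {}
    for chrom, spans in by_chrom.items():
        spans.sort()
        merged = [list(spans[0])]
        for start, end in spans[1:]:
            if start <= merged[-1][1]:  # overlaps or touches the previous span
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        for start, end in merged:
            records[f"{chrom}:{start}-{end}"] = genome[chrom][start:end]
    return records


def to_genome_position(record_name: str, offset: int) -> tuple[str, int]:
    """Map a 0-based position within a skinny record back to the full genome."""
    chrom, span = record_name.rsplit(":", 1)
    start = int(span.split("-")[0])
    return chrom, start + offset


toy_genome = {"chr1": "ACGT" * 5}
skinny = skinny_reference(toy_genome, [("chr1", 2, 5), ("chr1", 4, 8), ("chr1", 14, 16)], pad=1)
print(skinny)  # {'chr1:1-9': 'CGTACGTA', 'chr1:13-17': 'CGTA'}
```

Encoding the original coordinates in each record name is one simple way to report variants called on the skinny reference in full-genome coordinates.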


Customize and get the most out of your reduced‐representation sequencing experiment with the new simulation software RADinitio

Molecular Ecology Resources · 10.1111/1755-0998.13218 · 2020

Author(s): Marvin Choquet

Keyword(s): Simulation Software, Reduced Representation, Sequencing Experiment, Reduced Representation Sequencing


Degenerate adaptor sequences for detecting PCR duplicates in reduced representation sequencing data improve genotype calling accuracy

Molecular Ecology Resources · 10.1111/1755-0998.12314 · 2014 · Vol 15 (2) · pp. 329-336 · Cited By ~ 32

Author(s): M. M. Y. Tin, F. E. Rheindt, E. Cros, A. S. Mikheyev

Keyword(s): Sequencing Data, Genotype Calling, Reduced Representation, PCR Duplicates, Reduced Representation Sequencing


Population genetic structure and gene flow of rare and endangered Tetraena mongolica Maxim revealed by reduced representation sequencing

10.21203/rs.3.rs-19749/v1 · 2020

Author(s): Cheng Jin, Huixia Kao, Shubin Dong

Keyword(s): Genetic Diversity, Gene Flow, Genetic Structure, Population Genetic Structure, Population Genetic, Industrial Development, High Quality, Reduced Representation, Rare And Endangered, Reduced Representation Sequencing

Abstract Background Studying the population genetic structure and gene flow of plant populations, and the factors that influence them, is crucial in conservation biology, especially for rare and endangered plants. Tetraena mongolica Maxim (TM), a member of the Zygophyllaceae family, is a rare and endangered plant with a narrow distribution. Over the last decades, excessive logging, urban expansion, industrial development and the development of scenic spots have caused habitat fragmentation and decline. Results In this study, the genetic diversity, population genetic structure and gene flow of TM populations were evaluated by reduced representation sequencing, which generated more than 133.45 GB of high-quality clean reads and 38,097 high-quality SNPs. Using multiple methods, we found that existing TM populations have moderate levels of genetic diversity, very low genetic differentiation and high levels of gene flow between populations. Population structure and principal coordinates analyses showed that the 8 TM populations can be divided into two groups, and a Mantel test detected no significant correlation between geographical distance and genetic distance across the whole sample. The migration model indicated that historical gene flow followed a mainly north-to-south pattern. Conclusions Our study demonstrates that the present genetic structure is mainly due to habitat fragmentation caused by urban sprawl, industrial development and coal mining. For conservation management, we recommend that all 8 populations be protected as a whole, rather than only those in the core area of the TM nature reserve; in particular, the populations near the edge of the TM distribution, in cities and industrial areas, deserve special protection.
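The abstract reports a Mantel test of geographic versus genetic distance among the 8 populations. The sketch below shows a generic permutation Mantel test to make the procedure concrete. It assumes both inputs are symmetric distance matrices given as nested lists in the same population order (for example, pairwise geographic distance and a pairwise genetic distance such as FST). It also assumes a one-tailed alternative of positive correlation (isolation by distance). The study's software, distance measures and number of permutations are not stated, and the function names are hypothetical.

```python
import random


def _pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("distance values must not all be equal")
    return sxy / (sxx * syy) ** 0.5


def mantel_test(d_geo, d_gen, permutations: int = 999, seed: int = 0):
    """One-tailed Mantel test for a positive correlation between two
    symmetric distance matrices (hypothetical helper).

    Returns (r, p): Pearson r over upper-triangle pairs and a permutation
    p-value obtained by randomly relabelling the populations of d_gen.
    """
    n = len(d_geo)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d_geo[i][j] for i, j in pairs]
    r_obs = _pearson(x, [d_gen[i][j] for i, j in pairs])
    rng = random.Random(seed)
    order = list(range(n))
    at_least_as_large = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns together so the result is still a distance matrix.
        y = [d_gen[order[i]][order[j]] for i, j in pairs]
        if _pearson(x, y) >= r_obs:
            at_least_as_large += 1
    # Count the observed arrangement itself, so p is never exactly 0.
    return r_obs, (at_least_as_large + 1) / (permutations + 1)


# Toy example: 6 populations on a line, genetic distance equal to geographic distance.
positions = [0, 1, 2, 3, 4, 5]
dist = [[abs(a - b) for b in positions] for a in positions]
print(mantel_test(dist, dist))
```

Permuting whole population labels, rather than individual matrix cells, keeps the non-independence of pairwise distances intact. That is why the Mantel test uses this scheme instead of an ordinary correlation test. A non-significant result, as reported for TM, means relabelling populations at random produces correlations as large as the observed one fairly often.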
