Peer Review #2 of "dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms (v0.1)"

Restriction-site associated DNA sequencing (RADseq) has become a powerful approach for population genomics. Currently, no software uses both reads of paired-end RADseq data to efficiently produce population-informative variant calls, especially for organisms that have large effective population sizes and high levels of genetic polymorphism but lack genomic resources. dDocent is an analysis pipeline with a user-friendly command-line interface, designed to process individually barcoded RADseq data (with double cut sites) into informative SNPs and Indels for population-level analyses. The pipeline, written in BASH, combines data reduction techniques with stand-alone software packages to perform quality trimming and adapter removal, de novo assembly of RAD loci, read mapping, SNP and Indel calling, and baseline data filtering. Double-digest RAD data from population pairings of three different marine fishes were used to compare dDocent with Stacks, the first generally available, widely used pipeline for analysis of RADseq data. dDocent consistently identified more SNPs, shared across greater numbers of individuals and with higher coverage. This is most likely because dDocent quality-trims reads instead of filtering them out, and incorporates both forward and reverse reads in assembly, mapping, and SNP calling, which allows reads containing Indel polymorphisms to be used. The pipeline and a comprehensive user guide are available at http://dDocent.wordpress.com.
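
The abstract attributes dDocent's higher SNP yield partly to quality trimming rather than whole-read filtering, and it mentions data reduction ahead of de novo assembly. The Python sketch below illustrates both ideas under stated assumptions; it is not dDocent's or Stacks' code. The trimming rule (drop trailing bases below a Phred threshold) and the whole-read filter are simplified stand-ins. The data-reduction step (count identical sequences within each individual, keep those seen at least `min_copies` times, then keep sequences retained in at least `min_individuals` individuals) is an assumed illustration of how such a reduction can work, and all function and parameter names are hypothetical. Each "read" is assumed to be a forward and reverse read already joined into one string.

```python
from __future__ import annotations

from collections import Counter


def trim_trailing(seq: str, quals: list[int], min_q: int = 20) -> str:
    """Quality trimming: drop bases from the 3' end while their Phred score < min_q."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end]


def passes_filter(quals: list[int], min_q: int = 20) -> bool:
    """Whole-read filtering: keep the read only if every base has Phred score >= min_q."""
    return all(q >= min_q for q in quals)


def reduce_reads(
    reads_by_individual: dict[str, list[str]],
    min_copies: int = 2,
    min_individuals: int = 2,
) -> set[str]:
    """Hypothetical data-reduction step run before de novo assembly.

    A sequence is kept if it occurs at least `min_copies` times within an
    individual, in at least `min_individuals` individuals.
    """
    individuals_with_seq = Counter()
    for reads in reads_by_individual.values():
        for seq, n in Counter(reads).items():
            if n >= min_copies:
                individuals_with_seq[seq] += 1
    return {seq for seq, k in individuals_with_seq.items() if k >= min_individuals}


if __name__ == "__main__":
    read, quals = "ACGTACGTAA", [30] * 8 + [10, 5]
    print(trim_trailing(read, quals))  # ACGTACGT: trimming keeps the good prefix
    print(passes_filter(quals))        # False: filtering would discard the whole read
    reads = {
        "ind1": ["AAA", "AAA", "CCC"],
        "ind2": ["AAA", "AAA", "CCC", "CCC"],
        "ind3": ["GGG", "GGG"],
    }
    print(reduce_reads(reads))         # {'AAA'}
```

In this toy example, trimming salvages eight high-quality bases that whole-read filtering would throw away, which is the intuition behind the abstract's explanation for the higher SNP counts.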


dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

DOI: 10.7287/peerj.preprints.314v1 (2014)

Author(s): Jonathan Puritz, Christopher M. Hollenbeck, John R. Gold

Keyword(s): Population Genomics, De Novo, Variant Calling, Population Level, Model Organisms, Effective Population, Reduction Techniques, Indel Polymorphisms, Indel Calling, Population Sizes


dDocent: a RADseq, variant-calling pipeline designed for population genomics of non-model organisms

PeerJ, Vol 2, pp. e431 (2014). DOI: 10.7717/peerj.431. Cited by ~159.

Author(s): Jonathan B. Puritz, Christopher M. Hollenbeck, John R. Gold

Keyword(s): Population Genomics, Variant Calling, Model Organisms


Genome sequencing and population genomics in non-model organisms

Trends in Ecology & Evolution, Vol 29 (1), pp. 51-63 (2014). DOI: 10.1016/j.tree.2013.09.008. Cited by ~374.

Author(s): Hans Ellegren

Keyword(s): Genome Sequencing, Population Genomics, Model Organisms


Peer Review #2 of "DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics (v0.1)"

DOI: 10.7287/peerj.9291v0.1/reviews/2 (2020)

Keyword(s): Peer Review, Population Genomics, De Novo


Peer Review #1 of "DiscoSnp-RAD: de novo detection of small variants for RAD-Seq population genomics (v0.1)"

DOI: 10.7287/peerj.9291v0.1/reviews/1 (2020)

Keyword(s): Peer Review, Population Genomics, De Novo


Peer Review #2 of "CircParser: a novel streamlined pipeline for circular RNA structure and host gene prediction in non-model organisms (v0.1)"

DOI: 10.7287/peerj.8757v0.1/reviews/2 (2020)

Author(s): A Sokolov

Keyword(s): Peer Review, RNA Structure, Gene Prediction, Circular RNA, Host Gene, Model Organisms


GBStools: A Unified Approach for Reduced Representation Sequencing and Genotyping

DOI: 10.1101/030494 (2015)

Author(s): Thomas F Cooke, Muh-Ching Yee, Marina Muzzio, Alexandra Sockell, Ryan Bell, ...

Keyword(s): Restriction Site, Variant Calling, Simulated Data, Error Rates, Genomic Diversity, Model Organisms, Data Sets, Reduced Representation, Restriction Site Polymorphisms, Reduced Representation Sequencing

Reduced representation sequencing methods such as genotyping-by-sequencing (GBS) enable low-cost measurement of genetic variation without the need for a reference genome assembly. These methods are widely used in genetic mapping and population genetics studies, especially with non-model organisms. Variant calling error rates, however, are higher in GBS than in standard sequencing, in particular because of restriction site polymorphisms, and few computational tools specifically model and correct these errors. We developed a statistical method to remove errors caused by restriction site polymorphisms, implemented in the software package GBStools. We evaluated it on several simulated data sets, varying in number of samples, mean coverage and population mutation rate, and on two empirical human data sets (N = 8 and N = 63 samples). In our simulations, GBStools improved genotype accuracy more than commonly used filters such as Hardy-Weinberg equilibrium p-values. GBStools is most effective at removing genotype errors in data sets of more than 100 samples with coverage of 40X or higher, and the improvement is most pronounced in species with high genomic diversity. We also demonstrate the utility of GBS and GBStools for human population genetic inference in Argentine populations, revealing widely varying individual ancestry proportions and an excess of singletons, consistent with recent population growth.
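
The abstract states that restriction site polymorphisms inflate GBS genotyping error and compares GBStools against Hardy-Weinberg equilibrium (HWE) p-value filters. The Python sketch below is not GBStools' statistical model. It only illustrates, under a deliberately simplified and hypothetical dropout model, why a mutation that destroys a restriction site produces a heterozygote deficit, and how a standard 1-degree-of-freedom chi-square HWE test can flag it. For simplicity, homozygous-alternate individuals are left unchanged (in reality they might yield no reads at all), and the function names are illustrative.

```python
import math


def hwe_chi2_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Chi-square test (1 df) of Hardy-Weinberg proportions at a biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference-allele frequency
    q = 1.0 - p
    observed = (n_hom_ref, n_het, n_hom_alt)
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def apply_allele_dropout(n_hom_ref, n_het, n_hom_alt, dropout_frac):
    """Hypothetical dropout model: a fraction of heterozygotes carry a mutation
    that destroys the restriction site on the chromosome bearing the alternate
    allele, so only reference-allele reads are sequenced and the individual is
    called homozygous reference."""
    lost = round(n_het * dropout_frac)
    return n_hom_ref + lost, n_het - lost, n_hom_alt


if __name__ == "__main__":
    true_counts = (25, 50, 25)
    observed = apply_allele_dropout(*true_counts, dropout_frac=0.5)
    print(observed)                       # (50, 25, 25)
    print(hwe_chi2_pvalue(*true_counts))  # 1.0
    print(hwe_chi2_pvalue(*observed))     # small p-value: heterozygote deficit
```

A heterozygote deficit can also arise from inbreeding or population structure, so an HWE p-value on its own does not identify the cause of the deviation.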
