Kmer2SNP: reference-free SNP calling from raw reads based on matching

Author(s):  
Yanbo Li ◽  
Hardip Patel ◽  
Yu Lin
Author(s):  
Fereshteh Shahoveisi ◽  
Atena Oladzad ◽  
Luis E. del Rio Mendoza ◽  
Seyedali Hosseinirad ◽  
Susan Ruud ◽  
...  

The polyploid nature of canola (Brassica napus) represents a challenge for the accurate identification of single nucleotide polymorphisms (SNPs) and the detection of quantitative trait loci (QTL). In this study, combinations of eight phenotyping scoring systems and six SNP calling and filtering parameters were evaluated for their efficiency in detection of QTL associated with response to Sclerotinia stem rot, caused by Sclerotinia sclerotiorum, in two doubled haploid (DH) canola mapping populations. Most QTL were detected in lesion length, relative areas under the disease progress curve (rAUDPC) for lesion length, and binomial-plant mortality data sets. Binomial data derived from lesion size were less efficient in QTL detection. Inclusion of additional phenotypic sets in the analysis increased the numbers of significant QTL by 2.3-fold; however, the continuous data sets were more efficient. Between two filtering parameters used to analyze genotyping by sequencing (GBS) data, imputation of missing data increased QTL detection in one population with a high level of missing data but not in the other. Inclusion of segregation-distorted SNPs increased QTL detection but did not impact their R² values significantly. Twelve of the 16 detected QTL were on chromosomes A02 and C01, and the rest were on A07, A09, and C03. Marker A02-7594120, associated with a QTL on chromosome A02, was detected in both populations. Results of this study suggest that the impact of genotypic variant calling and filtering parameters may be population dependent, while deriving additional phenotyping scoring systems such as rAUDPC and binary mortality data sets may improve QTL detection efficiency.


Author(s):  
Russ Jasper ◽  
Tegan Krista McDonald ◽  
Pooja Singh ◽  
Mengmeng Lu ◽  
Clément Rougeux ◽  
...  

The use of NGS datasets has increased dramatically over the last decade; however, there have been few systematic analyses quantifying the accuracy of the commonly used variant caller programs. Here we used a familial design consisting of diploid tissue from a single Pinus contorta parent and the maternally derived haploid tissue from 106 full-sibling offspring, where mismatches could only arise due to mutation or bioinformatic error. Given the rarity of mutation, we used the rate of mismatches between parent and offspring genotype calls to infer the SNP genotyping error rates of FreeBayes, HaplotypeCaller, SAMtools, UnifiedGenotyper, and VarScan. With baseline filtering, HaplotypeCaller and UnifiedGenotyper yielded one to two orders of magnitude larger numbers of SNPs and error rates, whereas FreeBayes, SAMtools and VarScan yielded lower numbers of SNPs and more modest error rates. To facilitate comparison between variant callers, we standardized each SNP set to the same number of SNPs using additional filtering, where UnifiedGenotyper consistently produced the smallest proportion of genotype errors, followed by HaplotypeCaller, VarScan, SAMtools, and FreeBayes. Additionally, we found that error rates were minimized for SNPs called by more than one variant caller. Finally, we evaluated the performance of various commonly used filtering metrics on SNP calling. Our analysis provides a quantitative assessment of the accuracy of five widely used variant calling programs and offers valuable insights into both the choice of variant caller program and the choice of filtering metrics, especially for researchers using non-model study systems.
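The parent-offspring mismatch idea in this abstract can be illustrated with a small Python sketch. This is not the authors' pipeline: the function name `mismatch_rate`, the data layout (one diploid genotype per site for the parent, one haploid allele per site for each offspring, `None` for a missing call), and the toy data are all assumptions made for illustration. A haploid call is counted as a mismatch when its allele does not appear in the parent's genotype at that site.

```python
from typing import Optional, Sequence, Tuple


def mismatch_rate(
    parent_genotypes: Sequence[Tuple[str, str]],
    offspring_calls: Sequence[Sequence[Optional[str]]],
) -> float:
    """Fraction of non-missing haploid offspring calls that are
    absent from the diploid parent's genotype at the same site."""
    mismatches = 0
    compared = 0
    for calls in offspring_calls:  # one row per offspring
        for parent_gt, allele in zip(parent_genotypes, calls):
            if allele is None:  # missing call: not counted either way
                continue
            compared += 1
            if allele not in parent_gt:
                mismatches += 1
    return mismatches / compared if compared else 0.0


# Hypothetical toy data: three sites, three haploid offspring.
parent = [("A", "G"), ("C", "C"), ("T", "A")]
offspring = [
    ["A", "C", "T"],
    ["G", "T", None],  # "T" at site 2 cannot come from a C/C parent
    ["G", "C", "A"],
]
rate = mismatch_rate(parent, offspring)
print(rate)
```

In this toy example, one of the eight non-missing calls is inconsistent with the parent. Under the abstract's reasoning, such mismatches are attributed mostly to genotyping error because mutation is rare.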


2018 ◽  
Author(s):  
Tristan Cumer ◽  
Charles Pouchon ◽  
Frédéric Boyer ◽  
Glenn Yannic ◽  
Delphine Rioux ◽  
...  

Next-generation sequencing technologies have opened a new era of research in genomics. Among these, restriction enzyme-based techniques such as restriction-site associated DNA sequencing (RADseq) or double-digest RAD-sequencing (ddRADseq) are now widely used in many population genomics fields. From DNA sampling to SNP calling, both wet and dry protocols have been discussed in the literature to identify key parameters for an optimal loci reconstruction. The impact of these parameters on downstream analyses and biological results drawn from RADseq or ddRADseq data has however not been fully explored yet. In this study, we tackled this issue by investigating the effects of ddRADseq laboratory (i.e. wet protocol) and bioinformatics (i.e. dry protocol) settings on loci reconstruction and inferred biological signal at two evolutionary scales using two systems: a complex of butterfly species (Coenonympha sp.) and populations of Common beech (Fagus sylvatica). Results suggest an impact of wet protocol parameters (DNA quantity, number of PCR cycles during library preparation) on the number of recovered reads and SNPs, the number of unique alleles and individual heterozygosity. We also found that bioinformatic settings (i.e. clustering and minimum coverage thresholds) impact loci reconstruction (e.g. number of loci, mean coverage) and SNP calling (e.g. number of SNPs, heterozygosity). We however do not detect an impact of parameter settings on three types of analysis performed with ddRADseq data: measure of genetic differentiation, estimation of individual admixture, and demographic inferences. In addition, our work demonstrates the high reproducibility and low rate of genotyping inconsistencies of the ddRADseq protocol. Thus, our study highlights the impact of wet parameters on ddRADseq protocol with strong consequences on experimental success and biological conclusions. Dry parameters affect loci reconstruction and descriptive statistics but not biological conclusions for the two studied systems. Overall, this study illustrates, with others, the relevance of ddRADseq for population and evolutionary genomics at the inter- or intraspecific scales.


2019 ◽  
Vol 11 (10) ◽  
pp. 2797-2806 ◽  
Author(s):  
Julie C Chow ◽  
Paul E Anderson ◽  
Andrew M Shedlock

In the era of genomics, single-nucleotide polymorphisms (SNPs) have become a preferred molecular marker to study signatures of selection and population structure and to enable improved population monitoring and conservation of vulnerable populations. We apply a SNP calling pipeline to assess population differentiation, visualize linkage disequilibrium, and identify loci with sex-specific genotypes of 45 loggerhead sea turtles (Caretta caretta) sampled from the southeastern coast of the United States, including 42 individuals experimentally confirmed for gonadal sex. By performing reference-based SNP calling in independent runs of Stacks, 3,901–6,998 SNPs and up to 30 potentially sex-specific genotypes were identified. Up to 68 pairs of loci were found to be in complete linkage disequilibrium, potentially indicating regions of natural selection and adaptive evolution. This study provides a valuable SNP diagnostic workflow and a large body of new biomarkers for guiding targeted studies of sea turtle genome evolution and for managing legally protected nonmodel iconic species that have high economic and ecological importance but limited genomic resources.
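As a rough illustration of what a pairwise linkage disequilibrium check can look like, the Python sketch below computes the squared Pearson correlation (r²) between genotypes at two loci, coded as 0, 1, or 2 copies of the alternate allele. This is a common genotype-based approximation and not necessarily the statistic used in the study. The function name `genotype_r2`, the coding scheme, and reading "complete" LD as r² = 1 are assumptions made for this example.

```python
from typing import Optional, Sequence


def genotype_r2(x: Sequence[int], y: Sequence[int]) -> Optional[float]:
    """Squared Pearson correlation between two loci whose genotypes
    are coded as alternate-allele counts (0, 1, 2).
    Returns None when either locus is monomorphic in the sample."""
    if len(x) != len(y) or not x:
        raise ValueError("loci must have the same non-zero number of samples")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) * (a - mean_x) for a in x)
    var_y = sum((b - mean_y) * (b - mean_y) for b in y)
    if var_x == 0 or var_y == 0:
        return None  # no variation, so correlation is undefined
    return (cov * cov) / (var_x * var_y)


# Hypothetical genotypes for five individuals at three loci.
locus_a = [0, 1, 2, 1, 0]
locus_b = [0, 1, 2, 1, 0]  # identical pattern to locus_a
locus_c = [1, 1, 1, 1, 1]  # monomorphic in this sample

r2_ab = genotype_r2(locus_a, locus_b)
r2_ac = genotype_r2(locus_a, locus_c)
print(r2_ab, r2_ac)
```

Pairs with r² at or very near 1 would be flagged as candidates for complete LD under this assumption. With small samples such as the 45 individuals here, chance alone can produce identical genotype patterns at unlinked loci.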


Author(s):  
Dan Yao ◽  
Hainan Wu ◽  
Yuhua Chen ◽  
Wenguo Yang ◽  
Hua Gao ◽  
...  

2012 ◽  
Vol 13 (7) ◽  
pp. R61 ◽  
Author(s):  
Yaping Liu ◽  
Kimberly D Siegmund ◽  
Peter W Laird ◽  
Benjamin P Berman

2015 ◽  
Vol 21 (A) ◽  
pp. 826
Author(s):  
Anatoliy Dimitrov ◽  
Milko Krachunov ◽  
Ognyan Kulev ◽  
Jérôme Salse ◽  
Irena Avdjieva ◽  
...  

2015 ◽  
pp. btv507 ◽  
Author(s):  
Shengjie Gao ◽  
Dan Zou ◽  
Likai Mao ◽  
Huayu Liu ◽  
Pengfei Song ◽  
...  