SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data

PLoS ONE ◽  
2012 ◽  
Vol 7 (7) ◽  
pp. e37558 ◽  
Author(s):  
Rasmus Nielsen ◽  
Thorfinn Korneliussen ◽  
Anders Albrechtsen ◽  
Yingrui Li ◽  
Jun Wang
2021 ◽  
Author(s):  
Michael Schneider ◽  
Asis Shrestha ◽  
Agim Ballvora ◽  
Jens Leon

Abstract Background The identification of environment-specific alleles and the observation of evolutionary processes are goals of conservation genomics. Generational changes of allele frequencies in populations allow questions about effective population size, gene flow, drift, and selection to be addressed. Observing such effects is often a trade-off between cost and resolution, because a sufficiently large sample of genotypes must be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele frequency estimation when high-coverage sequencing is used. Still, high-coverage pool sequencing of large genomes is expensive. Results Here we present a reliable method to estimate a barley population's allele frequency from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population's allele frequency. The accuracy and yield of allele frequency estimation were compared for three next-generation sequencing methods. To obtain accurate allele frequency estimates at low sequencing coverage, a haplotyping approach was applied. Low-coverage allele frequencies of positionally linked single polymorphisms were aggregated into a single haplotype allele frequency, resulting in 2 to 271 times higher depth and increased precision. We compared different haplotyping strategies, showing that gene-based and chip-marker-based haplotypes perform on par with or better than simple contig haplotype windows. Comparing multiple pool samples and referencing them against an individual sequencing approach showed that whole-genome pool resequencing had the highest correlation with individual genotyping (up to 0.97), while transcriptomics and genotyping by sequencing showed higher error rates and lower correlations. Conclusion The proposed method makes it possible to identify the allele frequency of populations with high accuracy at low cost. This is particularly interesting for conservation genomics in species with large genomes, such as barley or wheat. Whole-genome low-coverage resequencing at 10x coverage can deliver a highly accurate estimate of the allele frequency when a locus-based haplotyping approach is applied. Using annotated haplotypes makes it possible to capitalize on biological background and statistical robustness.
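The aggregation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each SNP has already been assigned to one haplotype block (for example a gene) and that the alternative allele at every SNP in a block tags the same parental haplotype, which is plausible for a backcross population but is an assumption here. The function name, the input layout, and the example counts are all hypothetical.

```python
from collections import defaultdict


def haplotype_allele_frequency(snp_counts, snp_to_haplotype):
    """Pool per-SNP read counts into one allele frequency per haplotype block.

    snp_counts: dict mapping snp_id -> (alt_reads, total_reads)
    snp_to_haplotype: dict mapping snp_id -> haplotype_id (hypothetical layout)
    Returns a dict mapping haplotype_id -> (allele_frequency, pooled_depth).
    """
    alt = defaultdict(int)
    depth = defaultdict(int)
    for snp_id, (alt_reads, total_reads) in snp_counts.items():
        hap = snp_to_haplotype.get(snp_id)
        # Skip SNPs without a block assignment or without any coverage.
        if hap is None or total_reads == 0:
            continue
        alt[hap] += alt_reads
        depth[hap] += total_reads
    # Pooling reads weights each SNP by its depth, so well-covered SNPs
    # contribute more to the block estimate than poorly covered ones.
    return {hap: (alt[hap] / depth[hap], depth[hap]) for hap in depth}


# Illustrative counts only, not data from the study.
example_counts = {"snp1": (2, 10), "snp2": (3, 10), "snp3": (1, 4)}
example_blocks = {"snp1": "geneA", "snp2": "geneA", "snp3": "geneB"}
example_result = haplotype_allele_frequency(example_counts, example_blocks)
```

Summing reads before dividing, rather than averaging per-SNP frequencies, is one simple design choice; it shows why the pooled depth of a block can be many times the depth of a single site.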


2016 ◽  
Vol 49 (1) ◽  
Author(s):  
Nicolas Berthet ◽  
Stéphane Descorps-Declère ◽  
Andriniaina Andy Nkili-Meyong ◽  
Emmanuel Nakouné ◽  
Antoine Gessain ◽  
...  

2013 ◽  
Vol 23 (11) ◽  
pp. 1852-1861 ◽  
Author(s):  
Filipe G. Vieira ◽  
Matteo Fumagalli ◽  
Anders Albrechtsen ◽  
Rasmus Nielsen

2019 ◽  
Vol 17 (02) ◽  
pp. 1950008 ◽  
Author(s):  
Sanjeev Kumar ◽  
Suneeta Agarwal ◽  
Ranvijay

New-generation sequencing machines such as Illumina and Solexa can generate millions of short reads from a given genome sequence in a single run. Alignment of these reads to a reference genome is a core step in next-generation sequencing data analysis, for example in genetic variation studies and genome re-sequencing. There is therefore a need for a new approach that aligns these large numbers of reads to the reference genome efficiently in both memory and time. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require a large amount of memory for whole-reference-genome indexing and read alignment. Gapped alignment versions of these techniques are also 20–40% slower than their respective normal versions. In this paper, WIT, an efficient approach for reference genome indexing and read alignment using the Burrows–Wheeler Transform (BWT) and the Wavelet Tree (WT), is proposed. It supports both exact and approximate alignment. Experiments show that WIT performs best for protein sequence indexing. For indexing, the space required by WIT is 0.6N (where N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25N and 5N. Experimentally, it is also observed that, even with such a small index, the alignment time of the proposed approach is comparable to that of BWA, Subread, Kart, and Minimap2. The other alignment parameters, accuracy and confidentiality, are also shown experimentally to be better than those of Minimap2. The source code of WIT is available at http://www.algorithm-skg.com/wit/home.html .
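For readers unfamiliar with BWT-based indexing, the following Python sketch shows the general idea of exact matching by backward search over a Burrows–Wheeler Transform. It is not the WIT code. The `rank` function here is a naive linear scan standing in for the rank query that a wavelet tree would answer efficiently, and the BWT is built by sorting all rotations, which is only practical for short toy strings. All function names are hypothetical.

```python
def bwt(text):
    """Return the Burrows-Wheeler Transform of text, with '$' as terminator."""
    text = text + "$"
    # Sorting all rotations is quadratic in memory; fine for a toy example only.
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


def build_first_occurrence(bwt_string):
    """Map each character to the number of characters strictly smaller than it."""
    counts = {}
    for ch in bwt_string:
        counts[ch] = counts.get(ch, 0) + 1
    first_occurrence = {}
    total = 0
    for ch in sorted(counts):
        first_occurrence[ch] = total
        total += counts[ch]
    return first_occurrence


def rank(bwt_string, ch, i):
    """Count occurrences of ch in bwt_string[:i].

    Naive linear scan; a wavelet tree would answer the same query
    without scanning the string.
    """
    return bwt_string[:i].count(ch)


def count_matches(bwt_string, first_occurrence, pattern):
    """Count exact occurrences of pattern in the original text by backward search."""
    lo, hi = 0, len(bwt_string)
    for ch in reversed(pattern):
        if ch not in first_occurrence:
            return 0
        lo = first_occurrence[ch] + rank(bwt_string, ch, lo)
        hi = first_occurrence[ch] + rank(bwt_string, ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


# Toy reference and reads, chosen only for illustration.
reference_bwt = bwt("ACGTACGA")
reference_first = build_first_occurrence(reference_bwt)
hits_acg = count_matches(reference_bwt, reference_first, "ACG")
```

The search touches the index once per pattern character, so the cost of a query depends on the read length and on how fast `rank` is, which is where a succinct structure such as a wavelet tree matters.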


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Takumi Miura ◽  
Satoshi Yasuda ◽  
Yoji Sato

Abstract Background Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. Particularly, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency. Results Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 G base pair (Gbp) sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, whose p-value decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with the sequencing data of 15 Gbp or more was estimated to be between 5 and 10%. Conclusions For properly interpreting the WES data of somatic genetic mutations, it is necessary to have a cutoff threshold of low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
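The LOD definition in the abstract (the allele frequency at which the relative standard deviation reaches 30% on a moving-average curve) can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' procedure: the window size, the use of the sample standard deviation, and the linear interpolation between neighbouring curve points are all assumptions, and every function name is hypothetical.

```python
import statistics


def relative_sd(values):
    """Relative standard deviation in percent (sample SD divided by mean)."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def moving_average(values, window):
    """Simple unweighted moving average over consecutive windows."""
    return [
        statistics.mean(values[i:i + window])
        for i in range(len(values) - window + 1)
    ]


def lod_from_curve(afs, rsds, window=3, rsd_threshold=30.0):
    """Find the allele frequency where the smoothed RSD crosses the threshold.

    afs and rsds must be sorted by increasing allele frequency.
    Returns None if the smoothed curve never crosses the threshold.
    """
    af_smooth = moving_average(afs, window)
    rsd_smooth = moving_average(rsds, window)
    # Scan from high to low allele frequency for the first point pair where
    # the RSD rises above the threshold, then interpolate linearly (assumption).
    for i in range(len(af_smooth) - 1, 0, -1):
        hi_af, hi_rsd = af_smooth[i], rsd_smooth[i]
        lo_af, lo_rsd = af_smooth[i - 1], rsd_smooth[i - 1]
        if hi_rsd <= rsd_threshold < lo_rsd:
            frac = (rsd_threshold - hi_rsd) / (lo_rsd - hi_rsd)
            return hi_af + frac * (lo_af - hi_af)
    return None


def estimate_lod(replicate_afs, window=3, rsd_threshold=30.0):
    """replicate_afs: one list per mutation with its allele frequency in each run."""
    points = sorted(
        (statistics.mean(values), relative_sd(values)) for values in replicate_afs
    )
    afs = [point[0] for point in points]
    rsds = [point[1] for point in points]
    return lod_from_curve(afs, rsds, window, rsd_threshold)
```

As a usage note, `estimate_lod` expects allele frequencies from repeated, independent runs of the same assay, mirroring the replicate design described above; the returned value is only as stable as the number of mutations and replicates allows.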


BMC Genomics ◽  
2012 ◽  
Vol 13 (1) ◽  
pp. 16 ◽  
Author(s):  
Michael P Mullen ◽  
Christopher J Creevey ◽  
Donagh P Berry ◽  
Matt S McCabe ◽  
David A Magee ◽  
...  

Author(s):  
А.Т. ДАУГАЛИЕВА ◽  
С.Т. ДАУГАЛИЕВА ◽  
Б.С. АРЫНГАЗИЕВ ◽  
Т.А. ЛАВРЕНТЬЕВА

The aim of the study was to determine the taxonomic structure of the intestinal microbiome of Aberdeen Angus cattle using next-generation sequencing technology. 16S metagenomic analysis made it possible to determine the microbial composition of the intestinal contents while bypassing the stage of cultivation on nutrient media. Genetic identification was carried out, and a taxonomic profile of all bacteria present, including non-cultivated forms, was obtained. Key words: microbiome, cattle, Aberdeen Angus, next generation sequencing.

