SNP Calling, Genotype Calling, and Sample Allele Frequency Estimation from New-Generation Sequencing Data

Rasmus Nielsen; Thorfinn Korneliussen; Anders Albrechtsen; Yingrui Li; Jun Wang

doi:10.1371/journal.pone.0037558

High throughput crop genome genotyping by a combination of pool next generation sequencing and haplotype-based data processing

10.21203/rs.3.rs-415602/v1 ◽

2021 ◽

Author(s):

Michael Schneider ◽

Asis Shrestha ◽

Agim Ballvora ◽

Jens Leon

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Frequency Estimation ◽

Whole Genome ◽

Next Generation ◽

Conservation Genomics ◽

High Coverage ◽

Allele Frequency Estimation ◽

Low Coverage ◽

Generation Sequencing

Abstract

Background: Conservation genomics aims to identify environment-specific alleles and to observe evolutionary processes. Tracking changes in allele frequencies across generations makes it possible to address questions about effective population size, gene flow, drift, and selection. Observing such effects often involves a trade-off between cost and resolution, because a sufficiently large sample of genotypes must be genotyped at many loci. Pool genotyping can estimate allele frequencies with high resolution and precision when high-coverage sequencing is used, but high-coverage pool sequencing of large genomes is expensive.

Results: Here we present a reliable method for estimating a barley population's allele frequencies from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the allele frequencies of the entire population. The accuracy and yield of allele frequency estimation were compared for three next-generation sequencing methods. To obtain accurate allele frequency estimates at low coverage, a haplotyping approach was applied: low-coverage allele frequencies of positionally connected single polymorphisms were aggregated into a single haplotype allele frequency, resulting in two to 271 times higher depth and increased precision. We compared different haplotyping strategies and show that gene-based and chip-marker-based haplotypes perform on par with or better than simple contig haplotype windows. Comparing multiple pool samples and referencing them against individual sequencing showed that whole-genome pool resequencing had the highest correlation with individual genotyping (up to 0.97), whereas transcriptomics and genotyping by sequencing showed higher error rates and lower correlations.

Conclusion: The proposed method identifies population allele frequencies with high accuracy at low cost. This is particularly interesting for conservation genomics in species with large genomes, such as barley or wheat. Whole-genome low-coverage resequencing at 10x coverage can deliver a highly accurate estimate of the allele frequency when a locus-based haplotyping approach is applied. Using annotated haplotypes makes it possible to capitalize on biological background and statistical robustness.
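The aggregation step described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes that aggregation means summing read counts over all SNPs assigned to the same haplotype, and that the counted allele is polarized consistently across those SNPs (for example, always the donor-parent allele). The function name, the haplotype labels, and the read counts are hypothetical.

```python
from collections import defaultdict


def haplotype_allele_frequency(snps):
    """Pool read counts of SNPs that share a haplotype label.

    snps: iterable of (haplotype_id, allele_reads, total_reads) tuples,
          where allele_reads counts the same (consistently polarized) allele
          at every SNP of a haplotype. This polarization is an assumption.
    Returns {haplotype_id: (pooled_frequency, pooled_depth)}.
    """
    allele = defaultdict(int)
    depth = defaultdict(int)
    for hap_id, allele_reads, total_reads in snps:
        allele[hap_id] += allele_reads
        depth[hap_id] += total_reads
    # Skip haplotypes without any covering reads to avoid division by zero.
    return {h: (allele[h] / depth[h], depth[h]) for h in depth if depth[h] > 0}


# Hypothetical low-coverage counts for two gene-based haplotypes.
example_snps = [
    ("gene_A", 3, 10),
    ("gene_A", 2, 8),
    ("gene_A", 4, 12),
    ("gene_B", 1, 9),
    ("gene_B", 0, 11),
]
pooled = haplotype_allele_frequency(example_snps)
for hap_id, (freq, pooled_depth) in sorted(pooled.items()):
    print(hap_id, round(freq, 3), pooled_depth)
```

Pooling counts this way raises the effective depth per estimate from that of a single SNP to that of the whole haplotype, which is the intuition behind the depth gain reported in the abstract. Whether counts are summed or per-SNP frequencies are averaged is a design choice this sketch does not settle.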

Improved assembly procedure of viral RNA genomes amplified with Phi29 polymerase from new generation sequencing data

Biological Research ◽

10.1186/s40659-016-0099-y ◽

2016 ◽

Vol 49 (1) ◽

Cited By ~ 6

Author(s):

Nicolas Berthet ◽

Stéphane Descorps-Declère ◽

Andriniaina Andy Nkili-Meyong ◽

Emmanuel Nakouné ◽

Antoine Gessain ◽

...

Keyword(s):

Viral RNA ◽

Sequencing Data ◽

New Generation Sequencing ◽

Assembly Procedure ◽

New Generation ◽

Generation Sequencing

A unified approach for allele frequency estimation, SNP detection and association studies based on pooled sequencing data using EM algorithms

BMC Genomics ◽

10.1186/1471-2164-14-s1-s1 ◽

2013 ◽

Vol 14 (Suppl 1) ◽

pp. S1 ◽

Cited By ~ 19

Author(s):

Quan Chen ◽

Fengzhu Sun

Keyword(s):

Allele Frequency ◽

Frequency Estimation ◽

Association Studies ◽

Sequencing Data ◽

Unified Approach ◽

SNP Detection ◽

EM Algorithms ◽

Allele Frequency Estimation ◽

Pooled Sequencing

Estimating inbreeding coefficients from NGS data: Impact on genotype calling and allele frequency estimation

Genome Research ◽

10.1101/gr.157388.113 ◽

2013 ◽

Vol 23 (11) ◽

pp. 1852-1861 ◽

Cited By ~ 51

Author(s):

Filipe G. Vieira ◽

Matteo Fumagalli ◽

Anders Albrechtsen ◽

Rasmus Nielsen

Keyword(s):

Allele Frequency ◽

Frequency Estimation ◽

Genotype Calling ◽

Allele Frequency Estimation ◽

Inbreeding Coefficients ◽

NGS Data

Biases and Errors on Allele Frequency Estimation and Disease Association Tests of Next-Generation Sequencing of Pooled Samples

Genetic Epidemiology ◽

10.1002/gepi.21648 ◽

2012 ◽

Vol 36 (6) ◽

pp. 549-560 ◽

Cited By ~ 17

Author(s):

Xiaowei Chen ◽

Jennifer B. Listman ◽

Frank J. Slack ◽

Joel Gelernter ◽

Hongyu Zhao

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Frequency Estimation ◽

Disease Association ◽

Next Generation ◽

Association Tests ◽

Allele Frequency Estimation ◽

Pooled Samples ◽

Generation Sequencing

An evaluation of allele frequency estimation accuracy using pooled sequencing data

International Journal of Computational Biology and Drug Design ◽

10.1504/ijcbdd.2013.056709 ◽

2013 ◽

Vol 6 (4) ◽

pp. 279 ◽

Cited By ~ 4

Author(s):

Yan Guo ◽

Qiuyin Cai ◽

Chun Li ◽

Jiang Li ◽

Chung I Li ◽

...

Keyword(s):

Allele Frequency ◽

Frequency Estimation ◽

Estimation Accuracy ◽

Sequencing Data ◽

Allele Frequency Estimation ◽

Pooled Sequencing

Fast and memory efficient approach for mapping NGS reads to a reference genome

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019500082 ◽

2019 ◽

Vol 17 (02) ◽

pp. 1950008 ◽

Cited By ~ 3

Author(s):

Sanjeev Kumar ◽

Suneeta Agarwal ◽

Ranvijay

Keyword(s):

Reference Genome ◽

Next Generation Sequencing Data ◽

Sequencing Data ◽

Efficient Approach ◽

Gapped Alignment ◽

New Generation Sequencing ◽

Genome Space ◽

New Generation ◽

Burrows Wheeler Transform ◽

Generation Sequencing

New-generation sequencing machines such as Illumina and Solexa can generate millions of short reads from a given genome sequence in a single run. Aligning these reads to a reference genome is a core step in next-generation sequencing data analysis, for example in genetic variation studies and genome re-sequencing. There is therefore a need for a new approach that is efficient in both memory and time for aligning this enormous number of reads to the reference genome. Existing techniques such as MAQ, Bowtie, BWA, BWBBLE, Subread, Kart, and Minimap2 require huge memory for whole-reference-genome indexing and read alignment. Gapped-alignment versions of these techniques are also 20–40% slower than their respective normal versions. This paper proposes WIT, an efficient approach to reference genome indexing and read alignment using the Burrows–Wheeler Transform (BWT) and Wavelet Tree (WT). It supports both exact and approximate alignment. Experiments show that WIT performs best for protein sequence indexing. For indexing, WIT requires 0.6N space (N is the size of the reference genome), whereas the existing techniques BWA, Subread, Kart, and Minimap2 require between 1.25N and 5N. Experiments also show that, even with such a small index, the alignment time of the proposed approach is comparable to that of BWA, Subread, Kart, and Minimap2. Other alignment parameters, accuracy and confidentiality, are also experimentally shown to be better than those of Minimap2. The source code of WIT is available at http://www.algorithm-skg.com/wit/home.html.
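For readers unfamiliar with BWT-based read mapping, the following Python sketch shows exact matching by backward search over a Burrows–Wheeler Transform. It is a generic textbook illustration, not the WIT implementation: the suffix array is built by naive sorting, and the rank query is a plain linear scan, which is the operation a wavelet tree would answer more efficiently. All function names and the example sequence are hypothetical.

```python
def build_index(reference):
    """Build a toy BWT index for a reference string (naive construction)."""
    text = reference + "$"  # "$" is a sentinel smaller than A, C, G, T
    # Naive suffix array: sort suffix start positions lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT: the character preceding each sorted suffix (wraps around for i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    # first[c]: number of characters in the text strictly smaller than c.
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    first, total = {}, 0
    for ch in sorted(counts):
        first[ch] = total
        total += counts[ch]
    return sa, bwt, first


def rank(bwt, ch, i):
    """Occurrences of ch in bwt[:i]; a wavelet tree would replace this scan."""
    return bwt.count(ch, 0, i)


def find(pattern, sa, bwt, first):
    """Return sorted start positions of exact matches of pattern."""
    lo, hi = 0, len(bwt)  # current suffix-array interval [lo, hi)
    for ch in reversed(pattern):  # backward search: last character first
        if ch not in first:
            return []
        lo = first[ch] + rank(bwt, ch, lo)
        hi = first[ch] + rank(bwt, ch, hi)
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])


# Hypothetical reference and read.
sa, bwt, first = build_index("ACGTACGA")
print(find("ACG", sa, bwt, first))
```

The search touches the index once per pattern character, so its cost depends on the read length and the speed of the rank query rather than on the reference length, which is why the choice of rank data structure dominates both memory and time in this family of aligners.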

A simple method to estimate the in-house limit of detection for genetic mutations with low allele frequencies in whole-exome sequencing analysis by next-generation sequencing

BMC Genomic Data ◽

10.1186/s12863-020-00956-x ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Takumi Miura ◽

Satoshi Yasuda ◽

Yoji Sato

Keyword(s):

Next Generation Sequencing ◽

Allele Frequency ◽

Somatic Mutations ◽

Limit Of Detection ◽

Allele Frequencies ◽

Genetic Mutations ◽

Sequencing Data ◽

Simple Method ◽

Whole Exome ◽

Generation Sequencing

Abstract

Background: Next-generation sequencing (NGS) has profoundly changed the approach to genetic/genomic research. In particular, the clinical utility of NGS in detecting mutations associated with disease risk has contributed to the development of effective therapeutic strategies. Recently, comprehensive analysis of somatic genetic mutations by NGS has also been used as a new approach for controlling the quality of cell substrates for manufacturing biopharmaceuticals. However, the quality evaluation of cell substrates by NGS largely depends on the limit of detection (LOD) for rare somatic mutations. The purpose of this study was to develop a simple method for evaluating the ability of whole-exome sequencing (WES) by NGS to detect mutations with low allele frequency. To estimate the LOD of WES for low-frequency somatic mutations, we repeatedly and independently performed WES of a reference genomic DNA using the same NGS platform and assay design. LOD was defined as the allele frequency with a relative standard deviation (RSD) value of 30% and was estimated by a moving average curve of the relation between RSD and allele frequency.

Results: Allele frequencies of 20 mutations in the reference material that had been pre-validated by droplet digital PCR (ddPCR) were obtained from 5, 15, 30, or 40 giga base pairs (Gbp) of sequencing data per run. There was a significant association between the allele frequencies measured by WES and those pre-validated by ddPCR, and the p-value of this association decreased as the sequencing data size increased. By this method, the LOD of allele frequency in WES with sequencing data of 15 Gbp or more was estimated to be between 5 and 10%.

Conclusions: To interpret WES data on somatic genetic mutations properly, it is necessary to have a cutoff threshold for low allele frequencies. The in-house LOD estimated by the simple method shown in this study provides a rationale for setting the cutoff.
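The LOD definition in this abstract (the allele frequency at which the RSD reaches 30% on a moving average curve) can be sketched in Python as follows. This is one possible reading, not the authors' procedure: it assumes the RSD is the sample standard deviation divided by the mean across replicate runs, that both axes are smoothed with the same simple moving average, and that the crossing point is found by linear interpolation. The function names, the window size, and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation (%) of replicate allele frequencies."""
    return 100.0 * stdev(values) / mean(values)


def moving_average(values, window):
    """Simple moving average over consecutive windows of the given size."""
    return [mean(values[i:i + window]) for i in range(len(values) - window + 1)]


def estimate_lod(replicates_by_mutation, threshold=30.0, window=3):
    """Allele frequency at which the smoothed RSD falls to the threshold.

    replicates_by_mutation: one list of replicate allele frequencies per
    mutation (same unit throughout, e.g. percent). Returns None if the
    smoothed curve never crosses the threshold from above.
    """
    # One (mean allele frequency, RSD) point per mutation, ordered by frequency.
    points = sorted((mean(v), rsd_percent(v)) for v in replicates_by_mutation)
    af = moving_average([p[0] for p in points], window)
    rsd = moving_average([p[1] for p in points], window)
    for i in range(len(af) - 1):
        r0, r1 = rsd[i], rsd[i + 1]
        if r0 >= threshold >= r1 and r0 != r1:
            # Linear interpolation between the two neighbouring curve points.
            return af[i] + (r0 - threshold) * (af[i + 1] - af[i]) / (r0 - r1)
    return None


# Hypothetical replicate measurements (allele frequency in percent).
runs = [[1, 2, 3], [4, 5, 7], [9, 10, 11], [19, 20, 22], [38, 40, 41]]
print(estimate_lod(runs, window=2))
```

Mutations whose mean frequency lies below the returned value have replicate-to-replicate scatter above the threshold in this toy setup, which is the sense in which the LOD can serve as a cutoff.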

Polymorphism discovery and allele frequency estimation using high-throughput DNA sequencing of target-enriched pooled DNA samples

BMC Genomics ◽

10.1186/1471-2164-13-16 ◽

2012 ◽

Vol 13 (1) ◽

pp. 16 ◽

Cited By ~ 12

Author(s):

Michael P Mullen ◽

Christopher J Creevey ◽

Donagh P Berry ◽

Matt S McCabe ◽

David A Magee ◽

...

Keyword(s):

DNA Sequencing ◽

Allele Frequency ◽

High Throughput ◽

Frequency Estimation ◽

Allele Frequency Estimation ◽

Polymorphism Discovery ◽

Pooled DNA ◽

High Throughput DNA Sequencing

RESEARCH OF THE MICROBIOM OF THE GASTROINTESTINAL TRACT OF THE ABERDIN-ANGUS BREED CATTLE

МИКРОБИОЛОГИЯ ЖӘНЕ ВИРУСОЛОГИЯ ◽

10.53729/mv-as.2021.03.02 ◽

2021 ◽

Author(s):

А.Т. ДАУГАЛИЕВА ◽

С.Т. ДАУГАЛИЕВА ◽

Б.С. АРЫНГАЗИЕВ ◽

Т.А. ЛАВРЕНТЬЕВА

Keyword(s):

Genetic Identification ◽

Intestinal Microbiome ◽

Taxonomic Structure ◽

Sequencing Technology ◽

Microbial Composition ◽

Angus Cattle ◽

Taxonomic Profile ◽

New Generation Sequencing ◽

New Generation ◽

Generation Sequencing

The aim of the study was to determine the taxonomic structure of the intestinal microbiome of Aberdeen Angus cattle using new-generation sequencing technology. 16S metagenomic analysis made it possible to determine the microbial composition of the intestinal contents, bypassing the stage of cultivation on nutrient media. Genetic identification was carried out, and a taxonomic profile of all bacteria present, including non-cultivated forms, was obtained. Key words: microbiome, cattle, Aberdeen Angus, next generation sequencing.
