Pool‐hmm: a Python program for estimating the allele frequency spectrum and detecting selective sweeps from next generation sequencing of pooled samples

2013 ◽  
Vol 13 (2) ◽  
pp. 337-340 ◽  
Author(s):  
Simon Boitard ◽  
Robert Kofler ◽  
Pierre Françoise ◽  
David Robelin ◽  
Christian Schlötterer ◽  
...  
2011 ◽  
Vol 32 (6) ◽  
pp. E2246-E2258 ◽  
Author(s):  
Paola Benaglio ◽  
Terri L. McGee ◽  
Leonardo P. Capelli ◽  
Shyana Harper ◽  
Eliot L. Berson ◽  
...  

2021 ◽  
Author(s):  
Michael Schneider ◽  
Asis Shrestha ◽  
Agim Ballvora ◽  
Jens Leon

Abstract Background: The identification of environment-specific alleles and the observation of evolutionary processes are goals of conservation genomics. Generational changes of allele frequencies in populations can be used to address questions regarding effective population size, gene flow, drift, and selection. Observing such effects often involves a trade-off between cost and resolution, because a sufficiently large sample of genotypes must be genotyped at many loci. Pool genotyping approaches can achieve high resolution and precision in allele frequency estimation when high-coverage sequencing is used. Still, high-coverage pool sequencing of large genomes is costly. Results: Here we present a reliable method to estimate a barley population’s allele frequency from low-coverage sequencing. Three hundred genotypes were sampled from a barley backcross population to estimate the entire population’s allele frequency. The allele frequency estimation accuracy and yield were compared for three next generation sequencing methods. To obtain accurate allele frequency estimates at low sequencing coverage, a haplotyping approach was used. Low-coverage allele frequencies of positionally connected single polymorphisms were aggregated into a single haplotype allele frequency, resulting in two to 271 times higher depth and increased precision. We compared different haplotyping tactics, showing that gene- and chip-marker-based haplotypes perform on par with or better than simple contig haplotype windows. The comparison of multiple pool samples, referenced against an individual sequencing approach, revealed that whole-genome pool resequencing had the highest correlation with individual genotyping (up to 0.97), while transcriptomics and genotyping by sequencing showed higher error rates and lower correlations. Conclusion: The proposed method allows the allele frequency of populations to be identified with high accuracy at low cost. This is particularly interesting for conservation genomics in species with big genomes, like barley or wheat. Whole-genome low-coverage resequencing at 10x coverage can deliver a highly accurate estimate of the allele frequency when a loci-based haplotyping approach is applied. Using annotated haplotypes makes it possible to capitalize on biological background and statistical robustness.
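
The aggregation step described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `haplotype_frequencies`, the tuple layout, and the block labels are hypothetical, and the sketch assumes that the alternative-allele count at every SNP in a block refers to the same parental haplotype, so that counts can simply be summed.

```python
from collections import defaultdict


def haplotype_frequencies(snps):
    """Pool per-SNP read counts into one frequency per haplotype block.

    snps: iterable of (block_id, alt_reads, total_reads) tuples.
    Assumption: alt_reads at every SNP in a block counts the same
    parental allele, so the counts are directly additive.
    Returns {block_id: (pooled_frequency, pooled_depth)}.
    """
    alt = defaultdict(int)
    depth = defaultdict(int)
    for block_id, alt_reads, total_reads in snps:
        alt[block_id] += alt_reads
        depth[block_id] += total_reads
    # Skip blocks without any reads to avoid division by zero.
    return {b: (alt[b] / depth[b], depth[b]) for b in depth if depth[b] > 0}


# Hypothetical low-coverage counts: three SNPs in one gene, one in another.
snps = [
    ("gene_A", 3, 10),
    ("gene_A", 2, 8),
    ("gene_A", 4, 12),
    ("gene_B", 1, 9),
]
result = haplotype_frequencies(snps)
```

In this toy input, the three SNPs of `gene_A` each have a depth of 8 to 12 reads, while the pooled block has a depth of 30, which is the kind of depth gain the abstract attributes to haplotype-level aggregation.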


2009 ◽  
Vol 70 ◽  
pp. S107
Author(s):  
Martha B. Ladner ◽  
Gordon Bentley ◽  
Damian Goodridge ◽  
Henry A. Erlich ◽  
Elizabeth Trachtenberg

PLoS ONE ◽  
2011 ◽  
Vol 6 (1) ◽  
pp. e15292 ◽  
Author(s):  
Quan Long ◽  
Daniel C. Jeffares ◽  
Qingrun Zhang ◽  
Kai Ye ◽  
Viktoria Nizhynska ◽  
...  

2014 ◽  
Vol 32 (3_suppl) ◽  
pp. 470-470
Author(s):  
Chloe Evelyn Atreya ◽  
James Watters ◽  
Steve Rowley ◽  
Joon Sang Lee ◽  
Oleg Iartchouk ◽  
...  

470 Background: A comprehensive molecular characterization of primary colorectal cancers (CRC) was recently reported. Less is known about mutation patterns in CRC metastases and their association with survival. Our sequencing analysis focused on CRC liver metastases with RAS/RAF mutations, representing a patient population with limited therapeutic options. Methods: DNA was extracted from formalin-fixed paraffin-embedded CRC liver metastases. Fifty tumors found by Sequenom MassARRAY to harbor KRAS, NRAS or BRAF mutations underwent next generation sequencing on the Ion AmpliSeq Comprehensive Cancer Panel of 409 genes. Co-investigators were blinded to Sequenom mutations identified at UCSF. Variants called by Strelka and VarScan were extensively filtered to control the false positive rate and find mutations occurring with > 5-10% variant allele frequency compared to normals. The dataset was evaluated for significant co-mutations, biclustering, and population probabilities of mutations. Results: Following sequencing, 37,744 variants were called in 409 genes with a median coverage depth of 1053x. After filtering to minimize false positives, 2,335 variants in 315 genes remained. ARID1A and PIK3R1 were the most significantly associated co-mutation pair, P < 3.5e-5. Biclustering showed no stratification of patients; genes stratified only by mutation frequency. Further filtering yielded 1,186 variants present at < 1% allele frequency within 1,000 Genomes, of which 131 variants in 24 genes are referenced in the Catalog of Somatic Mutations in Cancer. In addition to anticipated mutations in mismatch repair genes and the RTK/RAS/PI3K, Wnt, TP53, and TGF beta pathways, infrequent mutations were found in AKT1, MTOR, MET and PPP2R1A. After APC, TP53 was the most commonly mutated gene, in 44% of the tumors (95% CI: 31.1% - 57.8%). Survival was similar with mutation of RAS/RAF plus either TP53 or PIK3CA.
Conclusions: Next generation sequencing was used to characterize co-variants in RAS/RAF mutated CRC liver metastases. The complexity of our results is consistent with the clinical observation that targeting RAS/RAF mutated metastatic CRC is a formidable challenge. These analyses may nonetheless inform the design of future clinical trials.


BMC Genomics ◽  
2014 ◽  
Vol 15 (1) ◽  
pp. 685 ◽  
Author(s):  
Yu Liu ◽  
Mehmet Koyutürk ◽  
Sean Maxwell ◽  
Min Xiang ◽  
Martina Veigl ◽  
...  

2020 ◽  
Author(s):  
Susanne Gerber ◽  
Stephan Weißbach ◽  
Stanislav Jur’evic Sys ◽  
Charlotte Hewel ◽  
Hristo Todorov ◽  
...  

Abstract Background: Next Generation Sequencing (NGS) is the foundation of various studies providing insights into questions from biology and medicine. Nevertheless, integrating data from different experimental backgrounds can introduce strong biases. In order to methodically investigate the magnitude of systematic errors, we performed a cross-sectional observational study on a genomic cohort of 99 subjects, each sequenced via (i) Illumina HiSeq X, (ii) Illumina HiSeq and (iii) Complete Genomics. We then systematically analyzed the heterogeneity between the sequencing cohorts with respect to genomic annotation and common filter criteria like minimum allele frequency (MAF). Results: The number of detected variants/variant classes per individual was highly dependent on the sequencing technology. We observed a statistically significant overrepresentation of variants uniquely called by a single platform, which indicates potential systematic biases. These variants were enriched in low complexity genomic regions and simple repeats. Furthermore, estimates of allele frequency were highly discrepant for a subset of variants in pairwise comparisons between different sequencing platforms. Applying common filters, such as MAF 5% and HWE, greatly reduced the heterogeneity between cohorts but still left discrepancies of several thousand variants after filtering. Conclusion: We provide empirical evidence of systematic heterogeneity in variant calls between alternative experimental and data analysis setups. Our results highlight the potential benefit of reprocessing genomic data with harmonized pipelines when integrating data from different studies.

