Accurate, ultra-low coverage genome reconstruction and association studies in Hybrid Swarm mapping populations

2021 ◽  
Vol 11 (4) ◽  
Author(s):  
Cory A Weller ◽  
Susanne Tilk ◽  
Subhash Rajpurohit ◽  
Alan O Bergland

Abstract Genetic association studies seek to uncover the link between genotype and phenotype, and often utilize inbred reference panels as a replicable source of genetic variation. However, inbred reference panels can differ substantially from wild populations in their genotypic distribution, patterns of linkage disequilibrium, and nucleotide diversity. As a result, associations discovered using inbred reference panels may not reflect the genetic basis of phenotypic variation in natural populations. To address this problem, we evaluated a mapping population design where dozens to hundreds of inbred lines are outbred for a few generations, which we call the Hybrid Swarm. The Hybrid Swarm approach has likely remained underutilized relative to pre-sequenced inbred lines due to the costs of genome-wide genotyping. To reduce sequencing costs and make the Hybrid Swarm approach feasible, we developed a computational pipeline that reconstructs accurate whole genomes from ultra-low-coverage (0.05X) sequence data in Hybrid Swarm populations derived from ancestors with phased haplotypes. We evaluate reconstructions using genetic variation from the Drosophila Genetic Reference Panel as well as variation from neutral simulations. We compared the power and precision of Genome-Wide Association Studies using the Hybrid Swarm, inbred lines, recombinant inbred lines (RILs), and highly outbred populations across a range of allele frequencies, effect sizes, and genetic architectures. Our simulations show that these different mapping panels vary in their power and precision, largely depending on the architecture of the trait. The Hybrid Swarm and RILs outperform inbred lines for quantitative traits, but not for monogenic ones. Taken together, our results demonstrate the feasibility of the Hybrid Swarm as a cost-effective method of fine-scale genetic mapping.
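To give a sense of what "ultra-low-coverage (0.05X)" implies, the short sketch below estimates the fraction of genomic sites expected to be covered by at least one read. It is not part of the authors' pipeline: it assumes read depth per site follows a Poisson distribution with mean equal to the coverage (a common simplification), and the function name `fraction_covered` is hypothetical.

```python
import math


def fraction_covered(coverage: float) -> float:
    """Expected fraction of sites with at least one read.

    Assumption (not from the abstract): per-site read depth is
    Poisson-distributed with mean equal to `coverage`, so the chance
    of seeing zero reads at a site is exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # At 0.05X, only about 5% of sites are expected to carry any read.
    print(f"{fraction_covered(0.05):.4f}")
```

Under this assumption, roughly 95% of sites in each individual are unobserved at 0.05X, which is why the reconstruction relies on phased haplotypes from the founding lines rather than on direct genotype calls.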

2019 ◽  
Author(s):  
Cory A. Weller ◽  
Alan O. Bergland

Abstract Genetic association studies seek to uncover the link between genotype and phenotype, and often utilize inbred reference panels as a replicable source of genetic variation. However, inbred reference panels can differ substantially from wild populations in their genotypic distribution, patterns of linkage disequilibrium, and nucleotide diversity. As a result, associations discovered using inbred reference panels may not reflect the genetic basis of phenotypic variation in natural populations. To address this problem, we evaluated a mapping population design where dozens to hundreds of inbred lines are outbred for a few (e.g., five) generations, which we call the Hybrid Swarm. The Hybrid Swarm approach has likely remained underutilized relative to pre-sequenced inbred lines due to the costs of genome-wide genotyping. To reduce sequencing costs and make the Hybrid Swarm approach feasible, we developed a computational pipeline that reconstructs accurate whole genomes from ultra-low-coverage (0.05X) sequence data in Hybrid Swarm populations derived from ancestors with phased haplotypes, modeling genetic variation from the Drosophila Genetic Reference Panel as well as variation from neutral simulations encompassing a range of diversity levels. Next, we compared the power and precision of GWAS using the Hybrid Swarm, inbred lines, recombinant inbred lines (RILs), and highly outbred populations across a range of effect sizes and genetic architectures. Our simulations show that these different mapping panels vary in their power and precision, largely depending on the architecture of the trait. Notably, both the Hybrid Swarm and RILs outperform inbred lines for quantitative traits, but not for monogenic ones. Taken together, our results demonstrate the feasibility of the Hybrid Swarm as a cost-effective method of fine-scale genetic mapping.


Nature ◽  
2021 ◽  
Vol 590 (7845) ◽  
pp. 290-299 ◽  
Author(s):  
Daniel Taliun ◽  
Daniel N. Harris ◽  
Michael D. Kessler ◽  
Jedidiah Carlson ◽  
...  

Abstract The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.


2018 ◽  
Author(s):  
Zhou Shaoqun ◽  
Karl A. Kremling ◽  
Bandillo Nonoy ◽  
Richter Annett ◽  
Ying K. Zhang ◽  
...  

One Sentence Summary HPLC-MS metabolite profiling of maize seedlings, in combination with genome-wide association studies, identifies numerous quantitative trait loci that influence the accumulation of foliar metabolites.

Abstract Cultivated maize (Zea mays) retains much of the genetic and metabolic diversity of its wild ancestors. Non-targeted HPLC-MS metabolomics using a diverse panel of 264 maize inbred lines identified a bimodal distribution in the prevalence of foliar metabolites. Although 15% of the detected mass features were present in >90% of the inbred lines, the majority were found in <50% of the samples. Whereas leaf bases and tips were differentiated primarily by flavonoid abundance, maize varieties (stiff-stalk, non-stiff-stalk, tropical, sweet corn, and popcorn) were differentiated predominantly by benzoxazinoid metabolites. Genome-wide association studies (GWAS), performed for 3,991 mass features from the leaf tips and leaf bases, showed that 90% have multiple significantly associated loci scattered across the genome. Several quantitative trait locus hotspots in the maize genome regulate the abundance of multiple, often metabolically related mass features. The utility of maize metabolite GWAS was demonstrated by confirming known benzoxazinoid biosynthesis genes, as well as by mapping isomeric variation in the accumulation of phenylpropanoid hydroxycitric acid esters to a single linkage block in a citrate synthase-like gene. Similar to gene expression databases, this metabolomic GWAS dataset constitutes an important public resource for linking maize metabolites with biosynthetic and regulatory genes.


2021 ◽  
Author(s):  
Adam C. Naj ◽  
Ganna Leonenko ◽  
Xueqiu Jian ◽  
Benjamin Grenier-Boley ◽  
Maria Carolina Dalmasso ◽  
...  

Risk for late-onset Alzheimer's disease (LOAD) is driven by multiple loci primarily identified by genome-wide association studies, many of which are common variants with minor allele frequencies (MAF) > 0.01. To identify additional common and rare LOAD risk variants, we performed a GWAS on 25,170 LOAD subjects and 41,052 cognitively normal controls in 44 datasets from the International Genomics of Alzheimer's Project (IGAP). Existing genotype data were imputed using the dense, high-resolution Haplotype Reference Consortium (HRC) r1.1 reference panel. Stage 1 associations of P < 10⁻⁵ were meta-analyzed with the European Alzheimer's Disease Biobank (EADB) (n=20,301 cases; 21,839 controls) (stage 2 combined IGAP and EADB). An expanded meta-analysis was performed using a GWAS of parental AD/dementia history in the UK Biobank (UKBB) (n=35,214 cases; 180,791 controls) (stage 3 combined IGAP, EADB, and UKBB). Common variant (MAF ≥ 0.01) associations were identified for 29 loci in stage 2, including novel genome-wide significant associations at TSPAN14 (P=2.33×10⁻¹²), SHARPIN (P=1.56×10⁻⁹), and ATF5/SIGLEC11 (P=1.03×10⁻⁸), and newly significant associations without using AD proxy cases in MTSS1L/IL34 (P=1.80×10⁻⁸), APH1B (P=2.10×10⁻¹³), and CLNK (P=2.24×10⁻¹⁰). Rare variant (MAF < 0.01) associations with genome-wide significance in stage 2 included multiple variants in APOE and TREM2, and a novel association of a rare variant (rs143080277; MAF=0.0054; P=2.69×10⁻⁹) in NCK2, further strengthened with the inclusion of UKBB data in stage 3 (P=7.17×10⁻¹³). Single-nucleus sequence data show that NCK2 is highly expressed in amyloid-responsive microglial cells, suggesting a role in LOAD pathology.


2021 ◽  
Author(s):  
Aleksejs Sazonovs ◽  
Christine R Stevens ◽  
Guhan R Venkataraman ◽  
Kai Yuan ◽  
Brandon Avila ◽  
...  

Genome-wide association studies (GWAS) have identified hundreds of loci associated with Crohn's disease (CD); however, as with all complex diseases, deriving pathogenic mechanisms from these non-coding GWAS discoveries has been challenging. To complement GWAS and better define actionable biological targets, we analysed sequence data from more than 30,000 CD cases and 80,000 population controls. We observe rare coding variants in established CD susceptibility genes as well as ten genes where coding variation directly implicates the gene in disease risk for the first time.


2020 ◽  
Author(s):  
Ali Jalil Sarghale ◽  
Mohammad Moradi Shahrebabak ◽  
Hossein Moradi Shahrebabak ◽  
Ardeshir Nejati Javaremi ◽  
Mahdi Saatchi ◽  
...  

Abstract Background: Methane emission by ruminants has contributed considerably to global warming, and understanding the genomic architecture of methane production may help livestock producers reduce methane emissions from the livestock production system. The goal of our study was to identify genomic regions affecting the predicted methane emission (PME) from volatile fatty acid (VFA) indicators and VFA traits using imputed whole-genome sequence data in Iranian Holstein cattle. Results: Based on the significant-association threshold (p < 5 × 10⁻⁸), 33 single nucleotide polymorphisms (SNPs) were detected for PME per kg milk (n=2), PME per kg fat (n=14), and valeric acid (n=17). In addition, 69 genes located within 1 Mb of significant SNPs were identified for valeric acid (n=18), PME per kg milk (n=4) and PME per kg fat (n=47). Based on the gene ontology (GO) term analysis, six promising candidate genes were significantly clustered in organelle organization (GO:0004984, p = 3.9 × 10⁻²) for valeric acid, and 17 candidate genes were significantly clustered in olfactory receptor activity (GO:0004984, p = 4 × 10⁻¹⁰) for PME traits. Annotation results revealed 31 quantitative trait loci (QTLs) for milk yield and its components, body weight, and residual feed intake within 1 Mb of significant SNPs. Conclusions: Our results identified 33 SNPs associated with PME and valeric acid traits, as well as 17 olfactory receptor activity genes for PME traits related to food preference and feed intake. The SNPs identified in this study were close to 31 QTLs for milk yield and its components, body weight, and residual feed intake traits. In addition, these traits were highly correlated with the PME trait. Overall, our findings suggest that marker-assisted and genomic selection could be used to improve difficult and expensive-to-measure phenotypes such as PME. Moreover, prediction of methane emission by VFA indicators could be useful for increasing the size of the reference population required in genome-wide association studies and genomic selection.

