Genomic Resources to Guide Improvement of the Shea Tree

2021 ◽  
Vol 12 ◽  
Author(s):  
Iago Hale ◽  
Xiao Ma ◽  
Arthur T. O. Melo ◽  
Francis Kwame Padi ◽  
Prasad S. Hendre ◽  
...  

A defining component of agroforestry parklands across Sahelo-Sudanian Africa (SSA), the shea tree (Vitellaria paradoxa) is central to sustaining local livelihoods and the farming environments of rural communities. Despite its economic and cultural value, not to mention the ecological roles it plays as a dominant parkland species, shea remains semi-domesticated, with virtually no history of systematic genetic improvement. Shea’s extended juvenile period makes traditional breeding approaches untenable, but the opportunity for genome-assisted breeding is immense, provided the foundational resources are available. Here we report the development and public release of such resources. Using the FALCON-Phase workflow, 162.6 Gb of long-read PacBio sequence data were assembled into a 658.7 Mbp, chromosome-scale reference genome annotated with 38,505 coding genes. Whole genome duplication (WGD) analysis based on this gene space revealed clear signatures of two ancient WGD events in shea’s evolutionary past, one prior to the Asterid-Rosid divergence (116–126 Mya) and the other at the root of the order Ericales (65–90 Mya). In a first genome-wide look at the suite of fatty acid (FA) biosynthesis genes that likely govern stearin content, the primary determinant of shea butter quality, relatively high copy numbers of six key enzymes were found (KASI, KASIII, FATB, FAD2, FAD3, and FAX2), some likely originating in shea’s more recent WGD event. To help translate these findings into practical tools for characterization, selection, and genome-wide association studies (GWAS), resequencing data from a shea diversity panel were used to develop a database of more than 3.5 million functionally annotated, physically anchored SNPs. Two smaller, more curated sets of suggested SNPs, one for GWAS (104,211 SNPs) and the other targeting FA biosynthesis genes (90 SNPs), are also presented. 
With these resources, the hope is to support national programs across the shea belt in the strategic, genome-enabled conservation and long-term improvement of the shea tree for SSA.
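As a back-of-the-envelope check on the assembly figures above, dividing the raw long-read yield by the assembled genome size gives an approximate fold coverage. This minimal sketch assumes all 162.6 Gb of PacBio data count toward coverage of the 658.7 Mbp assembly and ignores read filtering; the resulting depth is derived arithmetic, not a figure reported in the abstract.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Approximate sequencing depth: total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


# Figures from the abstract: 162.6 Gb of PacBio reads, 658.7 Mbp assembly.
depth = fold_coverage(162.6e9, 658.7e6)
print(f"~{depth:.0f}x raw coverage")  # prints "~247x raw coverage"
```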

Nature ◽  
2021 ◽  
Vol 590 (7845) ◽  
pp. 290-299 ◽  
Author(s):  
Daniel Taliun ◽  
Daniel N. Harris ◽  
Michael D. Kessler ◽  
Jedidiah Carlson ◽  
...  

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes)¹. In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power of genome-wide association studies and extends their reach to variants with frequencies as low as approximately 0.01%.
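The two frequency classes quoted above (frequency below 1%, and singletons carried by a single individual) overlap: in a large sample every singleton is also rare. A minimal sketch of how such fractions could be tallied, assuming genotypes are available as per-individual alternate-allele dosages (0, 1 or 2); the function name and input layout are hypothetical and not part of any TOPMed tool.

```python
from typing import Sequence


def summarize_frequencies(variants: Sequence[Sequence[int]]) -> dict:
    """Fraction of variants that are rare (MAF < 1%) and that are singletons.

    Each variant is a sequence of diploid genotypes coded as alternate-allele
    dosages (0, 1 or 2), one entry per individual. A singleton here means the
    alternate allele is carried by exactly one individual.
    """
    if not variants:
        raise ValueError("need at least one variant")
    n_rare = 0
    n_singleton = 0
    for genotypes in variants:
        alt_freq = sum(genotypes) / (2 * len(genotypes))
        maf = min(alt_freq, 1.0 - alt_freq)  # minor-allele frequency
        carriers = sum(1 for g in genotypes if g > 0)
        if maf < 0.01:
            n_rare += 1
        if carriers == 1:
            n_singleton += 1
    total = len(variants)
    return {"rare_fraction": n_rare / total, "singleton_fraction": n_singleton / total}
```

For example, a heterozygote in one of 100 individuals (MAF 0.5%) counts as both rare and a singleton, whereas a homozygote in one of 60 individuals (MAF about 1.7%) is a singleton but not rare.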


2021 ◽  
Author(s):  
Adam C. Naj ◽  
Ganna Leonenko ◽  
Xueqiu Jian ◽  
Benjamin Grenier-Boley ◽  
Maria Carolina Dalmasso ◽  
...  

Risk for late-onset Alzheimer's disease (LOAD) is driven by multiple loci primarily identified by genome-wide association studies, many of which are common variants with minor allele frequencies (MAF) > 0.01. To identify additional common and rare LOAD risk variants, we performed a GWAS on 25,170 LOAD subjects and 41,052 cognitively normal controls in 44 datasets from the International Genomics of Alzheimer's Project (IGAP). Existing genotype data were imputed using the dense, high-resolution Haplotype Reference Consortium (HRC) r1.1 reference panel. Stage 1 associations of P < 10⁻⁵ were meta-analyzed with the European Alzheimer's Disease Biobank (EADB) (n = 20,301 cases; 21,839 controls) (stage 2 combined IGAP and EADB). An expanded meta-analysis was performed using a GWAS of parental AD/dementia history in the UK Biobank (UKBB) (n = 35,214 cases; 180,791 controls) (stage 3 combined IGAP, EADB, and UKBB). Common variant (MAF ≥ 0.01) associations were identified for 29 loci in stage 2, including novel genome-wide significant associations at TSPAN14 (P = 2.33 × 10⁻¹²), SHARPIN (P = 1.56 × 10⁻⁹), and ATF5/SIGLEC11 (P = 1.03 × 10⁻⁸), and newly significant associations without using AD proxy cases in MTSS1L/IL34 (P = 1.80 × 10⁻⁸), APH1B (P = 2.10 × 10⁻¹³), and CLNK (P = 2.24 × 10⁻¹⁰). Rare variant (MAF < 0.01) associations with genome-wide significance in stage 2 included multiple variants in APOE and TREM2, and a novel association of a rare variant (rs143080277; MAF = 0.0054; P = 2.69 × 10⁻⁹) in NCK2, further strengthened with the inclusion of UKBB data in stage 3 (P = 7.17 × 10⁻¹³). Single-nucleus sequence data show that NCK2 is highly expressed in amyloid-responsive microglial cells, suggesting a role in LOAD pathology.
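The abstract does not say which meta-analysis model combined the stages. A common choice for pooling GWAS summary statistics is the fixed-effect, inverse-variance-weighted model; the sketch below assumes that model and shows how per-study effect sizes and standard errors combine into one estimate and a two-sided P value. The function is hypothetical and is not the IGAP/EADB pipeline.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   their standard errors
    Returns (pooled_beta, pooled_se, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need matching, non-empty betas and ses")
    weights = [1.0 / se ** 2 for se in ses]  # more precise studies weigh more
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided P value under a standard normal: P = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p
```

Combining two studies that each report β = 0.1 with SE = 0.05 shrinks the pooled SE by a factor of √2 and gives z ≈ 2.83 (P ≈ 0.0047); genome-wide significance is conventionally taken as P < 5 × 10⁻⁸.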


2021 ◽  
Author(s):  
Aleksejs Sazonovs ◽  
Christine R Stevens ◽  
Guhan R Venkataraman ◽  
Kai Yuan ◽  
Brandon Avila ◽  
...  

Genome-wide association studies (GWAS) have identified hundreds of loci associated with Crohn's disease (CD); however, as with all complex diseases, deriving pathogenic mechanisms from these non-coding GWAS discoveries has been challenging. To complement GWAS and better define actionable biological targets, we analysed sequence data from more than 30,000 CD cases and 80,000 population controls. We observe rare coding variants in established CD susceptibility genes as well as ten genes where coding variation directly implicates the gene in disease risk for the first time.


2020 ◽  
Author(s):  
Ali Jalil Sarghale ◽  
Mohammad Moradi Shahrebabak ◽  
Hossein Moradi Shahrebabak ◽  
Ardeshir Nejati Javaremi ◽  
Mahdi Saatchi ◽  
...  

Background: Methane emission by ruminants has contributed considerably to global warming, and understanding the genomic architecture of methane production may help livestock producers reduce methane emissions from livestock production systems. The goal of our study was to identify genomic regions affecting predicted methane emission (PME) from volatile fatty acid (VFA) indicators and VFA traits using imputed whole-genome sequence data in Iranian Holstein cattle. Results: Based on the significance threshold for association (p < 5 × 10⁻⁸), 33 single nucleotide polymorphisms (SNPs) were detected for PME per kg milk (n = 2), PME per kg fat (n = 14), and valeric acid (n = 17). In addition, 69 genes located within 1 Mb of significant SNPs were identified for valeric acid (n = 18), PME per kg milk (n = 4) and PME per kg fat (n = 47). Based on gene ontology (GO) term analysis, six promising candidate genes were significantly clustered in organelle organization (GO:0004984, p = 3.9 × 10⁻²) for valeric acid, and 17 candidate genes were significantly clustered in olfactory receptor activity (GO:0004984, p = 4 × 10⁻¹⁰) for PME traits. Annotation revealed 31 quantitative trait loci (QTLs) for milk yield and its components, body weight, and residual feed intake within 1 Mb of significant SNPs. Conclusions: Our results identified 33 SNPs associated with PME and valeric acid traits, as well as 17 olfactory receptor activity genes for PME traits related to food preference and feed intake. SNPs identified in this study were close to 31 QTLs for milk yield and its components, body weight, and residual feed intake traits. In addition, these traits were highly correlated with the PME trait. Overall, our findings suggest that marker-assisted and genomic selection could be used to improve difficult- and expensive-to-measure phenotypes such as PME. 
Moreover, predicting methane emission from VFA indicators could be useful for increasing the size of the reference population required in genome-wide association studies and genomic selection.
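The rule "genes located within 1 Mb of significant SNPs" amounts to an interval-overlap test around each SNP that passes the p < 5 × 10⁻⁸ threshold. A minimal sketch, assuming SNP and gene coordinates on the same assembly; the `Snp`/`Gene` types, the helper, and any example coordinates are hypothetical and not taken from the study.

```python
from typing import List, NamedTuple, Sequence


class Snp(NamedTuple):
    chrom: str
    pos: int   # base-pair position
    p: float   # association P value


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


def genes_near_hits(snps: Sequence[Snp], genes: Sequence[Gene],
                    p_threshold: float = 5e-8, window: int = 1_000_000) -> List[str]:
    """Names of genes overlapping +/- `window` bp around any significant SNP."""
    hits = [s for s in snps if s.p < p_threshold]
    nearby = set()
    for gene in genes:
        for snp in hits:
            # Gene interval overlaps [pos - window, pos + window] on the same chromosome.
            if (gene.chrom == snp.chrom
                    and gene.start <= snp.pos + window
                    and gene.end >= snp.pos - window):
                nearby.add(gene.name)
                break
    return sorted(nearby)
```

The nested loop is fine for a few dozen hits; at genome scale, sorting genes by start position and binary-searching each window would be the usual refinement.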


2019 ◽  
Vol 6 (1) ◽  
Author(s):  
Alejandra Vergara-Lope ◽  
M. Reza Jabalameli ◽  
Clare Horscroft ◽  
Sarah Ennis ◽  
Andrew Collins ◽  
...  

Quantification of linkage disequilibrium (LD) patterns in the human genome is essential for genome-wide association studies, selection signature mapping and studies of recombination. Whole genome sequence (WGS) data provide optimal source data for this quantification, as they are free from biases introduced by the design of array genotyping platforms. The Malécot-Morton model of LD allows the creation of a cumulative map for each chromosome, analogous to an LD form of a linkage map. Here we report LD maps generated from WGS data for a large population of European ancestry, as well as populations of Baganda, Ethiopian and Zulu ancestry. We achieve high average genetic marker densities of 2.3–4.6/kb. These maps show good agreement with prior, low-resolution maps and are consistent between populations. Files are provided in BED format to allow researchers to readily utilise this resource.
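The abstract names the Malécot-Morton model without stating it. As it is usually written in the LD-map literature (given here from general knowledge, not from this abstract), the expected association between markers a distance d apart is ρ = (1 − L)·M·e^(−εd) + L, where M is the association at zero distance, L the residual association at large distance, and ε the rate of decline. Each marker interval then contributes εd LD units (LDU), and summing these along a chromosome gives the cumulative map. The sketch assumes per-interval ε values have already been estimated; all names are illustrative.

```python
import math
from typing import List, Sequence


def malecot_rho(d_kb: float, epsilon: float, M: float, L: float) -> float:
    """Expected association at distance d_kb: (1 - L) * M * exp(-epsilon * d) + L."""
    return (1.0 - L) * M * math.exp(-epsilon * d_kb) + L


def cumulative_ldu(positions_kb: Sequence[float], epsilons: Sequence[float]) -> List[float]:
    """Cumulative LD map (in LDU) from sorted marker positions and per-interval epsilons.

    Interval i, between markers i and i + 1, contributes epsilon_i * d_i LDU.
    """
    if len(epsilons) != len(positions_kb) - 1:
        raise ValueError("need one epsilon per marker interval")
    ldu = [0.0]
    for left, right, eps in zip(positions_kb, positions_kb[1:], epsilons):
        ldu.append(ldu[-1] + eps * (right - left))
    return ldu
```

Like centimorgans on a linkage map, LDU grow slowly across haplotype blocks (small ε) and jump at recombination hot spots (large ε).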


2019 ◽  
Author(s):  
M. Pérez-Enciso ◽  
L. C. Ramírez-Ayala ◽  
L.M. Zingaretti

Background: Genomic prediction (GP) is the procedure whereby molecular information is used to predict complex phenotypes. Although GP can significantly enhance predictive accuracy, it can be expensive and difficult to implement. To help in designing optimum experiments, including genome-wide association studies and genomic selection experiments, we have developed SeqBreed, a generic and flexible Python 3 forward simulator. Results: SeqBreed accommodates sex and mitochondrial chromosomes as well as autopolyploidy. It can simulate any number of complex phenotypes determined by any number of causal loci. SeqBreed implements several GP methods, including single-step GBLUP. We demonstrate its functionality with Drosophila Genetic Reference Panel (DGRP) sequence data and with tetraploid potato genotypes. Conclusions: SeqBreed is a flexible and easy-to-use tool appropriate for optimizing GP or genome-wide association studies. It incorporates some of the most popular GP methods and includes several visualization tools. Code is open and can be freely modified. Software, documentation and examples are available at https://github.com/miguelperezenciso/SeqBreed.
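SeqBreed's own API is not reproduced here. As a generic illustration of what "complex phenotypes determined by any number of causal loci" means in a simulator, the sketch below assumes a purely additive model: each individual's genetic value is the sum of dosage × effect over the causal loci, and environmental noise is scaled so that genetic variance accounts for a target heritability h². The function and its parameters are hypothetical.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def simulate_phenotypes(genotypes: Sequence[Sequence[int]], effects: Sequence[float],
                        h2: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Simulate an additive trait with target heritability h2.

    genotypes: one list of allele dosages (0/1/2) at the causal loci per individual.
    effects:   additive effect of each causal locus.
    Returns (phenotypes, genetic_values).
    """
    if not 0 < h2 <= 1:
        raise ValueError("h2 must be in (0, 1]")
    rng = random.Random(seed)  # seeded for reproducible replicates
    g = [sum(d * b for d, b in zip(ind, effects)) for ind in genotypes]
    var_g = statistics.pvariance(g)
    # Choose Var(e) so that Var(g) / (Var(g) + Var(e)) equals h2.
    sd_e = (var_g * (1 - h2) / h2) ** 0.5
    y = [gi + rng.gauss(0.0, sd_e) for gi in g]
    return y, g
```

With h2 = 1 the noise vanishes and phenotypes equal genetic values; lower h2 adds proportionally more environmental noise.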


2021 ◽  
Vol 11 (4) ◽  
Author(s):  
Cory A Weller ◽  
Susanne Tilk ◽  
Subhash Rajpurohit ◽  
Alan O Bergland

Genetic association studies seek to uncover the link between genotype and phenotype, and often utilize inbred reference panels as a replicable source of genetic variation. However, inbred reference panels can differ substantially from wild populations in their genotypic distribution, patterns of linkage disequilibrium, and nucleotide diversity. As a result, associations discovered using inbred reference panels may not reflect the genetic basis of phenotypic variation in natural populations. To address this problem, we evaluated a mapping population design where dozens to hundreds of inbred lines are outbred for a few generations, which we call the Hybrid Swarm. The Hybrid Swarm approach has likely remained underutilized relative to pre-sequenced inbred lines due to the costs of genome-wide genotyping. To reduce sequencing costs and make the Hybrid Swarm approach feasible, we developed a computational pipeline that reconstructs accurate whole genomes from ultra-low-coverage (0.05X) sequence data in Hybrid Swarm populations derived from ancestors with phased haplotypes. We evaluated reconstructions using genetic variation from the Drosophila Genetic Reference Panel as well as variation from neutral simulations. We compared the power and precision of genome-wide association studies using the Hybrid Swarm, inbred lines, recombinant inbred lines (RILs), and highly outbred populations across a range of allele frequencies, effect sizes, and genetic architectures. Our simulations show that these different mapping panels vary in their power and precision, largely depending on the architecture of the trait. The Hybrid Swarm and RILs outperform inbred lines for quantitative traits, but not for monogenic ones. Taken together, our results demonstrate the feasibility of the Hybrid Swarm as a cost-effective method of fine-scale genetic mapping.
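Genome reconstruction is needed at 0.05X because, under the standard Poisson (Lander-Waterman) approximation for read depth, a site is covered by at least one read with probability 1 − e^(−0.05), roughly 4.9%. Most sites are therefore unobserved in any one individual and must be inferred from the phased ancestral haplotypes. This idealized model ignores mapping and GC biases; the sketch illustrates the scale of the problem, not the authors' pipeline.

```python
import math


def fraction_covered(mean_depth: float, min_reads: int = 1) -> float:
    """P(depth >= min_reads) at a site, assuming Poisson-distributed read depth."""
    # P(X >= k) = 1 - sum_{i<k} exp(-lambda) * lambda**i / i!
    cdf = sum(math.exp(-mean_depth) * mean_depth ** i / math.factorial(i)
              for i in range(min_reads))
    return 1.0 - cdf


print(f"{fraction_covered(0.05):.4f}")  # prints 0.0488
```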


2018 ◽  
Author(s):  
Calvin McCarter ◽  
Judie Howrylak ◽  
Seyoung Kim

Recent technologies are generating an abundance of genome sequence data and molecular and clinical phenotype data, providing an opportunity to understand the genetic architecture and molecular mechanisms underlying diseases. Previous approaches have largely focused on the co-localization of single-nucleotide polymorphisms (SNPs) associated with clinical and expression traits, each identified from genome-wide association studies and expression quantitative trait locus (eQTL) mapping, and thus have provided only limited capabilities for uncovering the molecular mechanisms behind the SNPs influencing clinical phenotypes. Here we aim to extract rich information on the functional role of trait-perturbing SNPs that goes far beyond this simple co-localization. We introduce a computational framework called Perturb-Net for learning the gene network that modulates the influence of SNPs on phenotypes, using SNPs as naturally occurring perturbations of a biological system. Perturb-Net uses a probabilistic graphical model to directly model both the cascade of perturbation from SNPs to the gene network to the phenotype network and the network at each layer of molecular and clinical phenotypes. Perturb-Net learns the entire model by solving a single optimization problem with an extremely fast algorithm that can analyze human genome-wide data within a few hours. In our analysis of asthma data, for a locus that was previously implicated in asthma susceptibility but for which little is known about the molecular mechanism underlying the association, Perturb-Net revealed the gene network modules that mediate the influence of the SNP on asthma phenotypes. Many genes in this network module were well supported in the literature as asthma-related.


Circulation ◽  
2017 ◽  
Vol 135 (suppl_1) ◽  
Author(s):  
Changwei Li ◽  
Shengxu Li ◽  
James E Hixon

Background: Genome-wide association studies (GWASs) have identified multiple genomic loci associated with atherosclerotic diseases. However, specific genes underlying the observed associations are largely unknown. Objectives: We aimed to examine the associations between genes that harbor variants in high LD with index variants in GWAS-identified loci and pathologically determined atherosclerosis in major arteries from the Pathobiological Determinants of Atherosclerosis in Youth (PDAY) study. Methods and Results: Data for 1,938 single nucleotide polymorphisms (SNPs) from 28 genes were retrieved from whole exome sequence data. Atherosclerosis was confirmed by postmortem examination of major arteries from 1,005 young persons (aged 15-34 years) who died from non-cardiovascular causes. Logistic regression was used to evaluate associations between common SNPs and atherosclerosis, controlling for age and sex. Gene-based analysis was conducted using the Sequence Kernel Association Test (SKAT) method to test the combined effect of rare and common variants on atherosclerosis, controlling for age and sex. All analyses were performed separately in blacks and whites. Statistical significance was determined by the false discovery rate (FDR) method. In gene-based analyses, BUD13 (P = 1.11 × 10⁻²) and COL4A1 (P = 3.58 × 10⁻²) were associated with atherosclerosis among young blacks; none of the 28 genes was associated with atherosclerosis in whites. 
In single marker analysis of common SNPs, LRP1 missense variant rs7397167 (P = 8.50 × 10⁻³), COL4A1 variant rs16975492 (P = 4.60 × 10⁻³), STK32B variant rs168985 (P = 4.00 × 10⁻³), and SMARCA4 variant rs8104480 (P = 1.20 × 10⁻³) were associated with atherosclerosis in blacks; MIA3 variant rs17465637 (P = 8.00 × 10⁻³), DUS4L missense variant rs6957510 (P = 6.4 × 10⁻³), BOLL variant rs771018 (P = 6.2 × 10⁻³), BUD3 missense variant rs11820589 (P = 2.1 × 10⁻³), and COL4A1 variant rs1133219 (P = 1.8 × 10⁻³) were associated with atherosclerosis in whites. Conclusion: Genes in GWAS-identified loci may play a role in the development of atherosclerosis at a young age.
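The abstract does not name the FDR procedure. The Benjamini-Hochberg step-up procedure is the most widely used, and the sketch below assumes it; it is not necessarily what the PDAY analysis applied. P values are ranked, the largest rank k with p₍ₖ₎ ≤ (k/m)·α is found, and every hypothesis up to that rank is rejected.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """True where a hypothesis is rejected at false discovery rate `alpha`."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up: find the largest 1-based rank k with p_(k) <= k / m * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            max_k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

Because it is a step-up rule, a P value that misses its own threshold can still be rejected when a larger-ranked P value passes: for [0.02, 0.03, 0.035, 0.5] the first three are all rejected, even though 0.02 exceeds its rank-1 threshold of 0.0125.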

