The genome of oil-Camellia and population genomics analysis provide insights into seed oil domestication

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Ping Lin ◽  
Kailiang Wang ◽  
Yupeng Wang ◽  
Zhikang Hu ◽  
Chao Yan ◽  
...  

Abstract. Background: As a perennial crop, oil-Camellia has a long domestication history and produces high-quality seed oil that is beneficial to human health. Camellia oleifera Abel., a sister species of the tea plant, is extensively cultivated for edible oil production. However, understanding of the molecular mechanism of oil-Camellia domestication is still limited because of the lack of sufficient genomic information. Results: To elucidate the genetic and genomic basis of evolution and domestication, we report a chromosome-scale reference genome of wild oil-Camellia (2.95 Gb), together with transcriptome sequencing data of 221 cultivars. The oil-Camellia genome, assembled by an integrative approach of multiple sequencing technologies, contains a large proportion of repetitive elements (76.1%) and shows high heterozygosity (2.52%). We construct a genetic map of high-density corrected markers by sequencing the controlled-pollination hybrids. Genome-wide association studies reveal a subset of artificially selected genes that are involved in the oil biosynthesis and phytohormone pathways. In particular, we identify the elite alleles of genes encoding sugar-dependent triacylglycerol lipase 1, β-ketoacyl-acyl carrier protein synthase III, and stearoyl-acyl carrier protein desaturases; these alleles play important roles in enhancing the yield and quality of seed oil during oil-Camellia domestication. Conclusions: We generate a chromosome-scale reference genome for oil-Camellia plants and demonstrate that the artificial selection of elite alleles of genes involved in oil biosynthesis contributes to oil-Camellia domestication.

2019 ◽  
Vol 17 (06) ◽  
pp. 1940012
Author(s):  
Yuan Liu ◽  
Yongchao Ma ◽  
Evan Salsman ◽  
Frank A. Manthey ◽  
Elias M. Elias ◽  
...  

Mapping short reads to a reference genome is an essential step in many next-generation sequencing (NGS) analyses. In plants with large genomes, a large fraction of the reads can align to multiple locations of the genome with equally good alignment scores. How to map these ambiguous reads to the genome is a challenging problem with a large impact on downstream analysis. Traditionally, the default method is to assign an ambiguous read randomly to one of the many potential locations. In this study, we explore two alternative methods based on the hypothesis that the probability of an ambiguous read being generated by a location is proportional to the total number of reads produced by that location: (1) the enrichment method, which assigns an ambiguous read to the location that has produced the most reads among all the potential locations, and (2) the probability method, which assigns an ambiguous read to a location with a probability proportional to the number of reads the location produces. We systematically compared the performance of the proposed methods with that of the default random method. Our results showed that the enrichment method produced better results than the default random method and the probability method in the discovery of single nucleotide polymorphisms (SNPs). Not only did it produce more SNP markers, but it also produced SNP markers of better quality, which was demonstrated using multiple mainstay genomic analyses, including genome-wide association studies (GWAS), minor allele distribution, population structure, and genomic prediction.
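
The three assignment rules described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `assign_ambiguous_reads`, the dictionary-based inputs, and the use of uniquely mapped read counts as the stand-in for "reads produced by a location" are all assumptions made for illustration.

```python
import random


def assign_ambiguous_reads(location_counts, ambiguous_reads, method="enrichment", seed=0):
    """Assign each ambiguous read to one of its candidate locations.

    location_counts: dict mapping location -> read count (assumed here to be
        the number of uniquely mapped reads at that location).
    ambiguous_reads: dict mapping read id -> list of candidate locations.
    method: "random", "enrichment", or "probability".
    """
    if method not in ("random", "enrichment", "probability"):
        raise ValueError(f"unknown method: {method}")
    rng = random.Random(seed)
    assignment = {}
    for read_id, candidates in ambiguous_reads.items():
        weights = [location_counts.get(loc, 0) for loc in candidates]
        if method == "random" or sum(weights) == 0:
            # Default behaviour, also used as a fallback when no candidate
            # location has any reads: pick a candidate uniformly at random.
            assignment[read_id] = rng.choice(candidates)
        elif method == "enrichment":
            # Pick the candidate with the most reads (first one wins on ties).
            best = max(range(len(candidates)), key=lambda i: weights[i])
            assignment[read_id] = candidates[best]
        else:
            # Probability method: sample in proportion to the read counts.
            assignment[read_id] = rng.choices(candidates, weights=weights, k=1)[0]
    return assignment


# Hypothetical example data.
counts = {"chr1:100": 8, "chr2:500": 2}
reads = {"read_1": ["chr1:100", "chr2:500"]}
print(assign_ambiguous_reads(counts, reads, method="enrichment"))
```

Under the enrichment rule the read always goes to the better-covered location, whereas the probability rule would send it there only in proportion to the coverage (8 out of 10 draws on average in this example).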


2020 ◽  
Vol 2 (4) ◽  
Author(s):  
Gerard A Bouland ◽  
Joline W J Beulens ◽  
Joey Nap ◽  
Arno R van der Slik ◽  
Arnaud Zaldumbide ◽  
...  

Abstract. Numerous large genome-wide association studies have been performed to understand the influence of genetics on traits. Many identified risk loci are in non-coding and intergenic regions, which complicates understanding how genes and their downstream pathways are influenced. An integrative data approach is required to understand the mechanism and consequences of identified risk loci. Here, we developed the R package CONQUER. Data for SNPs of interest are acquired from static and dynamic repositories (build GRCh38/hg38), including GTExPortal, Epigenomics Project, 4D genome database and genome browsers. All visualizations are fully interactive so that the user can immediately access the underlying data. CONQUER is a user-friendly tool for performing an integrative analysis of multiple SNPs, in which risk loci are seen not as individual risk factors but as a network of risk factors.


2018 ◽  
Author(s):  
Afsheen Yousaf ◽  
Eftichia Duketis ◽  
Tomas Jarczok ◽  
Michael Sachse ◽  
Monica Biscaldi ◽  
...  

Abstract. Motivation: Complex neuropsychiatric conditions, including autism spectrum disorders, are among the most heritable neurodevelopmental disorders, with distinct profiles of neuropsychological traits. A variety of genetic factors modulate these traits (phenotypes) underlying clinical diagnoses. To explore the associations between genetic factors and phenotypes, genome-wide association studies are broadly applied. Stringent quality checks and thorough downstream analyses for in-depth interpretation of the associations are an indispensable prerequisite. However, in the area of neuropsychology no existing framework both performs association studies and relates genetic variants to the brain and gene network level within a single framework. Results: We present a novel bioinformatics approach in the field of neuropsychology that integrates current state-of-the-art tools, algorithms and brain transcriptome data to elaborate the association of phenotype and genotype data. The integration of transcriptome data gives an advantage over the existing pipelines by directly translating genetic associations to brain regions and developmental patterns. Based on our data integrative approach, we identified genetic variants associated with Intelligence Quotient (IQ) in an autism cohort and found their respective genes to be expressed in specific brain areas. Conclusion: Our data integrative approach revealed that IQ is related to early down-regulated and late up-regulated gene modules implicated in frontal cortex and striatum, respectively. Besides identifying new gene associations with IQ, we also provide a proof of concept, as several of the genes identified in our analysis are candidate genes related to intelligence in autism, intellectual disability, and Alzheimer’s disease. The framework provides a complete, extensive analysis, starting from phenotypic trait data and ending with its association to specific brain areas at vulnerable time points, within a timespan of four days. Availability and Implementation: Our framework is implemented in R and Python. It is available as an in-house script, which can be provided on request.


2021 ◽  
Vol 4 (4) ◽  
pp. e202000902 ◽  
Author(s):  
Robert A Player ◽  
Ellen R Forsyth ◽  
Kathleen J Verratti ◽  
David W Mohr ◽  
Alan F Scott ◽  
...  

Reference genome fidelity is critically important for genome-wide association studies, yet most reference genomes differ widely from the study population. A typical whole genome sequencing approach uses short-read technologies, resulting in fragmented assemblies with regions of ambiguity. Further information is lost by economic necessity when genotyping populations, as lower resolution technologies such as genotyping arrays are commonly used. Here, we present a phased reference genome for Canis lupus familiaris using high molecular weight DNA-sequencing technologies. We tested wet laboratory and bioinformatic approaches to demonstrate a minimum workflow to generate the 2.4 gigabase genome for a Labrador Retriever. The de novo assembly required eight Oxford Nanopore R9.4 flowcells (∼23X depth) and running a 10X Genomics library on the equivalent of one lane of an Illumina NovaSeq S1 flowcell (∼88X depth), bringing the cost of generating a nearly complete reference genome to less than $10K (USD). Mapping of short-read data from 10 Labrador Retrievers against this reference resulted in 1% more aligned reads versus the current reference (CanFam3.1, P < 0.001), and a 15% reduction of variant calls, increasing the chance of identifying true, low-effect size variants in genome-wide association studies. We believe that by incorporating the cost to produce a full genome assembly into any large-scale genotyping project, an investigator can improve study power, decrease costs, and optimize the overall scientific value of their study.


Genes ◽  
2021 ◽  
Vol 12 (2) ◽  
pp. 148
Author(s):  
Yu-Huang Liao ◽  
Leay-Kiaw Er ◽  
Semon Wu ◽  
Yu-Lin Ko ◽  
Ming-Sheng Teng

Hepatic lipase (encoded by LIPC) is a glycoprotein in the triacylglycerol lipase family that is mainly synthesized in and secreted from the liver. Previous studies demonstrated that hepatic lipase is crucial for reverse cholesterol transport and for modulating the metabolism and plasma levels of several lipoproteins. This study was conducted to investigate the suppression effect of high-density lipoprotein cholesterol (HDL-C) levels in a genome-wide association study and explore the possible mechanisms linking triglyceride (TG) to LIPC variants and HDL-C. Genome-wide association data for TG and HDL-C were available for 4657 Taiwan-biobank participants. The prevalence of haplotypes in the LIPC promoter region and their effects were calculated. The cloned constructs of the haplotypes were expressed transiently in HepG2 cells and evaluated in a luciferase reporter assay. Genome-wide association analysis revealed that HDL-C was significantly associated with variations in LIPC after adjusting for TG. Three haplotypes (H1: TCG, H2: CTA and H3: CCA) in LIPC were identified. H2: CTA was significantly associated with HDL-C levels and H1: TCG suppressed HDL-C levels when a third factor, TG, was included in mediation analysis. The luciferase reporter assay further showed that the H2: CTA haplotype significantly inhibited luciferase activity compared with the H1: TCG haplotype. In conclusion, we identified a suppressive role for TG in the genome-wide association between LIPC and HDL-C. A functional haplotype of hepatic lipase may reduce HDL-C levels and is suppressed by TG.


2019 ◽  
Author(s):  
Grazyella M. Yoshida ◽  
Agustín Barria ◽  
Katharina Correa ◽  
Giovanna Cáceres ◽  
Ana Jedlicki ◽  
...  

Abstract. Nile tilapia (Oreochromis niloticus) is one of the most produced farmed fish in the world and represents an important source of protein for human consumption. Farmed Nile tilapia populations are increasingly based on genetically improved stocks, which have been established from admixed populations. To date, there is scarce information about the population genomics of farmed Nile tilapia, assessed by dense single nucleotide polymorphism (SNP) panels. The patterns of linkage disequilibrium (LD) may affect the success of genome-wide association studies (GWAS) and genomic selection and can also provide key information about the demographic history of farmed Nile tilapia populations. The objectives of this study were to provide further knowledge about the population structure and LD patterns, as well as to estimate the effective population size (Ne), for three farmed Nile tilapia populations, one from Brazil (POP A) and two from Costa Rica (POP B and POP C). A total of 55, 56 and 57 individuals from POP A, POP B and POP C, respectively, were genotyped using a 50K SNP panel selected from a whole-genome sequencing (WGS) experiment. Two principal components explained about 20% of the total variation and clearly discriminated between the three populations. Population genetic structure analysis showed evidence of admixture, especially for POP C. The contemporary Ne values, calculated based on LD values, ranged from 71 to 141. No differences were observed in the LD decay among populations, with a rapid decrease of r² with increasing inter-marker distance. Average r² between adjacent SNP pairs ranged from 0.03 to 0.18, 0.03 to 0.17 and 0.03 to 0.16 for POP A, POP B and POP C, respectively. Based on the number of independent chromosome segments in the Nile tilapia genome, at least 4.2K SNPs are required for the implementation of GWAS and genomic selection in farmed Nile tilapia populations.
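
The abstract does not say how r² was computed. One common choice for unphased genotype data is the squared Pearson correlation between allele counts at two SNPs, and the sketch below assumes that definition; the function name `genotype_r2` and the 0/1/2 coding are illustrative assumptions, not details from the study.

```python
def genotype_r2(snp_a, snp_b):
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts.

    Each argument holds one value per individual, in the same order.
    """
    if len(snp_a) != len(snp_b) or len(snp_a) < 2:
        raise ValueError("need two equally long genotype vectors (n >= 2)")
    n = len(snp_a)
    mean_a = sum(snp_a) / n
    mean_b = sum(snp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(snp_a, snp_b))
    var_a = sum((a - mean_a) ** 2 for a in snp_a)
    var_b = sum((b - mean_b) ** 2 for b in snp_b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no LD information; report 0 by convention.
        return 0.0
    return (cov * cov) / (var_a * var_b)


# Hypothetical genotypes for four individuals at two SNPs.
print(genotype_r2([0, 1, 2, 0], [2, 1, 0, 2]))
```

Averaging this quantity over adjacent SNP pairs, or binning it by inter-marker distance, gives summaries of the kind reported above.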


2017 ◽  
Author(s):  
Kyoko Watanabe ◽  
Erdogan Taskesen ◽  
Arjen van Bochoven ◽  
Danielle Posthuma

Abstract. A main challenge in genome-wide association studies (GWAS) is to prioritize genetic variants and identify potential causal mechanisms of human diseases. Although multiple bioinformatics resources are available for functional annotation and prioritization, a standard, integrative approach is lacking. We developed FUMA: a web-based platform to facilitate functional annotation of GWAS results, prioritization of genes and interactive visualization of annotated results by incorporating information from multiple state-of-the-art biological databases.


2020 ◽  
Author(s):  
Zakaria Mehrab ◽  
Jaiaid Mobin ◽  
Ibrahim Asadullah Tahmid ◽  
Atif Rahman

Abstract. Genome-wide association studies (GWAS) attempt to map genotypes to phenotypes in organisms. This is typically performed by genotyping individuals using microarrays or by aligning whole genome sequencing reads to a reference genome. Both approaches require knowledge of a reference genome, which limits their application to organisms with no or incomplete reference genomes. This limitation can be removed using alignment-free association mapping methods based on k-mers from sequencing reads. Here we present an implementation of an alignment-free association mapping method [1] that improves its execution time and flexibility. We have tested our implementation on an E. coli ampicillin resistance dataset and observe an improvement in performance over the original implementation while maintaining accuracy in results. Finally, we demonstrate that the method can be applied to find sex-specific sequences.
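
The core idea of working from k-mers rather than alignments can be sketched as follows. This is a deliberately simplified stand-in and not the method of reference [1], which relies on statistical testing: here k-mers are only tallied by presence or absence per sample, and the helper names (`kmers`, `kmer_presence_by_group`, `group_specific_kmers`) and input layout are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_presence_by_group(reads_by_sample, is_case, k):
    """Count, for every k-mer, how many case and control samples contain it.

    reads_by_sample: dict mapping sample id -> list of read sequences.
    is_case: dict mapping sample id -> True (case) or False (control).
    Returns a dict mapping k-mer -> [n_case_samples, n_control_samples].
    """
    counts = defaultdict(lambda: [0, 0])
    for sample, reads in reads_by_sample.items():
        sample_kmers = set()
        for read in reads:
            sample_kmers |= kmers(read, k)
        column = 0 if is_case[sample] else 1
        for kmer in sample_kmers:
            counts[kmer][column] += 1
    return counts


def group_specific_kmers(counts, n_cases):
    """k-mers seen in every case sample and in no control sample."""
    return sorted(kmer for kmer, (cases, controls) in counts.items()
                  if cases == n_cases and controls == 0)


# Hypothetical toy data: two case samples and one control sample.
reads_by_sample = {"s1": ["ACGTA"], "s2": ["ACGTT"], "s3": ["ACGCC"]}
is_case = {"s1": True, "s2": True, "s3": False}
counts = kmer_presence_by_group(reads_by_sample, is_case, k=4)
print(group_specific_kmers(counts, n_cases=2))
```

No reference genome is needed at any step, which is the property the abstract highlights; a real analysis would replace the all-or-nothing filter with a statistical test on k-mer counts.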


2018 ◽  
Author(s):  
Brian P. Ward ◽  
Gina Brown-Guedira ◽  
Frederic L. Kolb ◽  
David A. Van Sanford ◽  
Priyanka Tyagi ◽  
...  

Abstract. Grain yield is a trait of paramount importance in the breeding of all cereals. In wheat (Triticum aestivum L.), yield has steadily increased since the Green Revolution, though the current rate of increase is not forecasted to keep pace with demand due to growing world population and affluence. While several genome-wide association studies (GWAS) on yield and related component traits have been performed in wheat, the previous lack of a reference genome has made comparisons between studies difficult. In this study, a GWAS for yield and yield-related traits was carried out on a population of 324 soft red winter wheat lines across a total of four rain-fed environments in the state of Virginia using single-nucleotide polymorphism (SNP) marker data generated by a genotyping-by-sequencing (GBS) protocol. Two separate mixed linear models were used to identify significant marker-trait associations (MTAs). The first was a single-locus model utilizing a leave-one-chromosome-out approach to estimating kinship. The second was a sub-setting kinship multi-locus method (FarmCPU). The single-locus model identified nine significant MTAs for various yield-related traits, while the FarmCPU model identified 74 significant MTAs. The availability of the wheat reference genome allowed for the description of MTAs in terms of both genetic and physical positions, and enabled more extensive post-GWAS characterization of significant MTAs. The results indicate promising avenues for increasing grain yield by exploiting variation in traits relating to the number of grains per unit area, as well as phenological traits influencing grain-filling duration of genotypes.
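
The leave-one-chromosome-out idea mentioned above can be sketched in a few lines: when testing markers on one chromosome, kinship is estimated only from markers on the other chromosomes. The abstract does not state which kinship estimator was used, so the unscaled mean-centered cross-product below is an assumption, as are the function names and the input layout.

```python
def centered_kinship(marker_dosages):
    """Kinship as the average cross-product of mean-centered marker dosages.

    marker_dosages: list of markers, each a list of 0/1/2 values per line.
    Returns an n x n matrix (list of lists) for the n lines.
    """
    n = len(marker_dosages[0])
    kinship = [[0.0] * n for _ in range(n)]
    for dosages in marker_dosages:
        mean = sum(dosages) / n
        centered = [g - mean for g in dosages]
        for i in range(n):
            for j in range(n):
                kinship[i][j] += centered[i] * centered[j]
    m = len(marker_dosages)
    return [[value / m for value in row] for row in kinship]


def loco_kinships(markers):
    """Build one kinship matrix per chromosome, leaving that chromosome out.

    markers: list of (chromosome, dosages) pairs.
    Returns a dict mapping chromosome -> kinship from all other chromosomes.
    """
    chromosomes = sorted({chrom for chrom, _ in markers})
    result = {}
    for chrom in chromosomes:
        others = [dosages for c, dosages in markers if c != chrom]
        if not others:
            raise ValueError("need markers on at least two chromosomes")
        result[chrom] = centered_kinship(others)
    return result


# Hypothetical toy data: three lines, markers on two chromosomes.
markers = [("1A", [0, 2, 2]), ("1B", [0, 0, 2]), ("1B", [2, 0, 0])]
kinships = loco_kinships(markers)
print(sorted(kinships))
```

Markers on chromosome 1B would then be tested with `kinships["1B"]`, which contains no information from 1B itself, so a causal marker is not partly absorbed by the kinship term.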

