Cohort-wide deep whole genome sequencing and the allelic architecture of complex traits

2018 ◽  
Author(s):  
Arthur Gilly ◽  
Daniel Suveges ◽  
Karoline Kuchenbaecker ◽  
Martin Pollard ◽  
Lorraine Southam ◽  
...  

The role of rare variants in complex traits remains uncharted. Here, we conduct deep whole genome sequencing of 1,457 individuals from an isolated population, and test for rare variant burdens across six cardiometabolic traits. We identify a role for rare regulatory variation, which has hitherto been missed. We find evidence of rare variant burdens overlapping with, and mostly independent of, established common variant signals (ADIPOQ and adiponectin, P=4.2×10⁻⁸; APOC3 and triglyceride levels, P=1.58×10⁻²⁶; GGT1 and gamma-glutamyltransferase, P=2.3×10⁻⁶; UGT1A9 and bilirubin, P=1.9×10⁻⁸), and identify replicating evidence for a burden associated with triglyceride levels in FAM189A (P=2.26×10⁻⁸), indicating a role for this gene in lipid metabolism.
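As a rough illustration of what a rare variant burden test does, the sketch below collapses the rare alleles in one gene into a per-individual score and regresses a trait on that score. It is not the pipeline used in the study above: the function names, the 0/1/2 genotype coding, the MAF threshold and the unadjusted linear regression are all assumptions made for the example, and a real analysis would also adjust for covariates and relatedness.

```python
from math import sqrt


def minor_allele_counts(genotypes):
    """Return (per-individual minor-allele counts, minor allele frequency).

    genotypes: alt-allele counts (0, 1 or 2) for one variant, one per individual.
    If the alt allele is the major allele, the counts are flipped.
    """
    n = len(genotypes)
    alt_freq = sum(genotypes) / (2 * n)
    if alt_freq > 0.5:
        return [2 - g for g in genotypes], 1 - alt_freq
    return list(genotypes), alt_freq


def burden_scores(variants, maf_threshold=0.01):
    """Sum minor alleles over all variants with 0 < MAF < maf_threshold.

    variants: list of variants in one gene or region, each a list of
    per-individual alt-allele counts. The threshold is an assumed default.
    """
    n = len(variants[0])
    scores = [0] * n
    for genotypes in variants:
        counts, maf = minor_allele_counts(genotypes)
        if 0 < maf < maf_threshold:
            for i, c in enumerate(counts):
                scores[i] += c
    return scores


def burden_association(scores, trait):
    """Simple linear regression of trait on burden score.

    Returns (slope, t_statistic). No covariates are included (assumption).
    """
    n = len(scores)
    mean_x = sum(scores) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in scores)
    if sxx == 0:
        raise ValueError("burden score does not vary across individuals")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(scores, trait))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((y - intercept - slope * x) ** 2 for x, y in zip(scores, trait))
    se = sqrt(rss / (n - 2) / sxx)
    return slope, slope / se
```

Collapsing variants in this way trades resolution for power: a single score per gene can reach significance when no individual rare variant has enough carriers to do so.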

2019 ◽  
Author(s):  
Zilin Li ◽  
Xihao Li ◽  
Yaowu Liu ◽  
Jincheng Shen ◽  
Han Chen ◽  
...  

Abstract Whole genome sequencing (WGS) studies are being widely conducted to identify rare variants associated with human diseases and disease-related traits. Classical single-marker association analyses for rare variants have limited power, and variant-set based analyses are commonly used to analyze rare variants. However, existing variant-set based approaches need to pre-specify genetic regions for analysis, and hence are not directly applicable to WGS data due to the large number of intergenic and intronic regions that consist of a massive number of non-coding variants. The commonly used sliding window method requires pre-specifying fixed window sizes, which are often unknown a priori, are difficult to specify in practice, and are subject to limitations given that genetic association region sizes are likely to vary across the genome and phenotypes. We propose a computationally efficient and dynamic scan statistic method (Scan the Genome (SCANG)) for analyzing WGS data that flexibly detects the sizes and the locations of rare-variant association regions without the need to specify a fixed window size a priori. The proposed method controls the genome-wise type I error rate and accounts for the linkage disequilibrium among genetic variants. It allows the detected rare-variant association region sizes to vary across the genome. Through extensive simulation studies that consider a wide variety of scenarios, we show that SCANG substantially outperforms several alternative rare-variant association detection methods while controlling for the genome-wise type I error rates. We illustrate SCANG by analyzing the WGS lipids data from the Atherosclerosis Risk in Communities (ARIC) study.
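The idea of scanning with windows of varying size, rather than one fixed size, can be sketched as follows. This is not the SCANG algorithm: the burden-style score statistic, the exhaustive search over window sizes, and the permutation-based p-value for the maximum statistic are stand-ins chosen for brevity, and all function and parameter names are hypothetical. Permuting the trait and taking the maximum over every window is one generic way to control the error rate across all windows while preserving the correlation between variants.

```python
import random


def score_statistic(burden, trait):
    """n times the squared correlation between burden and trait (0 if either is constant)."""
    n = len(trait)
    mean_b = sum(burden) / n
    mean_y = sum(trait) / n
    sbb = sum((b - mean_b) ** 2 for b in burden)
    syy = sum((y - mean_y) ** 2 for y in trait)
    if sbb == 0 or syy == 0:
        return 0.0
    sby = sum((b - mean_b) * (y - mean_y) for b, y in zip(burden, trait))
    return n * sby * sby / (sbb * syy)


def best_window(genotypes, trait, min_size, max_size):
    """Search every window of min_size..max_size consecutive variants.

    genotypes: variants ordered by position, each a list of per-individual
    rare-allele counts. Returns (max statistic, (start, end)) with end exclusive.
    """
    n_variants = len(genotypes)
    n_individuals = len(trait)
    best_stat, best_span = 0.0, None
    for size in range(min_size, max_size + 1):
        for start in range(0, n_variants - size + 1):
            burden = [
                sum(genotypes[v][i] for v in range(start, start + size))
                for i in range(n_individuals)
            ]
            stat = score_statistic(burden, trait)
            if stat > best_stat:
                best_stat, best_span = stat, (start, start + size)
    return best_stat, best_span


def scan_p_value(genotypes, trait, min_size, max_size, n_perm=200, seed=0):
    """Empirical p-value for the best window, based on the permutation
    distribution of the maximum statistic over all windows."""
    rng = random.Random(seed)
    observed, span = best_window(genotypes, trait, min_size, max_size)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_window(genotypes, shuffled, min_size, max_size)[0] >= observed:
            exceed += 1
    return span, observed, (1 + exceed) / (1 + n_perm)
```

The exhaustive double loop is only practical for small regions; it is meant to show why letting the window size vary can recover a signal region whose extent is not known in advance.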


2020 ◽  
Author(s):  
Prisca K. Thami ◽  
Wonderful Choga ◽  
Delesa D. Mulisa ◽  
Collet Dandara ◽  
Andrey K. Shevchenko ◽  
...  

Abstract Despite the high burden of HIV-1 in Botswana, the population of Botswana is significantly underrepresented in host genetics studies of HIV-1. Furthermore, the bulk of previous genomics studies evaluated common human genetic variations; however, there is increasing evidence of the influence of rare variants on disease outcomes, which may be uncovered by comprehensive, complete and deep genome sequencing. This research aimed to evaluate the role of rare variants in susceptibility to HIV-1 and its progression through whole genome sequencing. Whole genome sequences (WGS) of 265 HIV-1 positive and 125 HIV-1 negative unrelated individuals from Botswana were mapped to the human reference genome GRCh38. Population joint variant calling was performed using Genome Analysis Tool Kit (GATK) and BCFTools. Cumulative effects of rare variant sets on susceptibility to HIV-1 and progression (CD4+ T-cell decline) were determined with the optimized Sequence Kernel Association Test (SKAT-O). In silico functional analysis of the prioritized variants was performed through gene-set enrichment using databases in GeneMANIA and Enrichr. Novel rare variants within the ANKRD39 (8.48 × 10⁻⁸), LOC105378523 (7.45 × 10⁻⁷) and GTF3C3 (1.36 × 10⁻⁶) genes were significantly associated with HIV-1 progression. Functional analysis revealed that these genes are involved in viral translation and transcription. These findings highlight the significance of whole genome sequencing in pinpointing rare variants of clinical relevance. The research contributes towards a deeper understanding of the host genetics of HIV-1 and offers promise of population-specific interventions against HIV-1.


2017 ◽  
Author(s):  
Stephan J. Sanders ◽  
Benjamin M. Neale ◽  
Hailiang Huang ◽  
Donna M. Werling ◽  
Joon-Yong An ◽  
...  

Abstract As technology advances, whole genome sequencing (WGS) is likely to supersede other genotyping technologies. The rate of this change depends on its relative cost and utility. Variants identified uniquely through WGS may reveal novel biological pathways underlying complex disorders and provide high-resolution insight into when, where, and in which cell type these pathways are affected. Alternatively, cheaper and less computationally intensive approaches may yield equivalent insights. Understanding the role of rare variants in the noncoding gene-regulating genome, through pilot WGS projects, will be critical to determine which of these two extremes best represents reality. With large cohorts, well-defined risk loci, and a compelling need to understand the underlying biology, psychiatric disorders have a role to play in this preliminary WGS assessment. The WGSPD consortium will integrate data for 18,000 individuals with psychiatric disorders, beginning with autism spectrum disorder, schizophrenia, bipolar disorder, and major depressive disorder, along with over 150,000 controls.


Blood ◽  
2016 ◽  
Vol 128 (22) ◽  
pp. 2540-2540
Author(s):  
Ernest Turro ◽  
Nihr BioResource

Abstract Inherited bleeding, thrombotic and platelet disorders (BPDs) affect approximately 3M people worldwide and an appreciable portion have a disorder of megakaryopoiesis and the production and function of platelets, including the formation of granules. While genetic variants in 76 genes have been implicated in BPDs, many patients remain without a molecular diagnosis (Lentaigne et al, Simeoni et al). We hypothesised that some of these disorders may be caused by compound inheritance of variants in two different genes, a mode of inheritance thus far never implicated in BPDs. For the pilot phase of the 100,000 Genomes Project we have sequenced the whole genomes of 10,000 individuals consisting of probands with molecularly unexplained rare disorders and their close relatives, with 3,000 having inherited disorders of the blood and immune system. We searched for causal variants in known BPD-related genes and employed a new statistical method for Bayesian evaluation of rare variants in Mendelian disease (BeviMed) (Greene et al) to identify novel marginal associations between rare variants and disease status. Where the identified variants could not individually explain the phenotype in full within the pedigrees, we searched for additional variants affecting other BPD-related genes or novel genes identified using BeviMed. First, we have identified a large pedigree in which certain members with mild thrombocytopenia (lower 10th percentile of the population distribution) are affected by a single variant encoding a premature stop at residue 69 in the major isoform of the tropomyosin gene TPM4 expressed in megakaryocytes (Pleines et al). Other members of the family have distinctly more severe macrothrombocytopenia (with platelet counts as low as 24×10⁹/l) and this is due to inheritance of a second variant in the actinin gene ACTN1 leading to a Thr340Met substitution, demonstrating a compound additive effect of variants in two genes encoding cytoskeletal proteins important for actin polymerization and thereby causing inadequate platelet formation. It is noteworthy that both of these genes have been identified in a genome wide association study of the count and mean volume of platelets (Gieger et al). Second, we used BeviMed to identify genes containing variants that are marginally associated with a syndrome defined by a platelet granule defect combined with familial autism (Bijl et al). The strongest association is due to splice variants in a granule-related gene on Chr17p13 but in all four unrelated cases at least one additional variant is required to explain the observed segregation patterns. Two of these unrelated cases harbor variants in two other marginally associated genes at Chr9q32 and Chr4q31 and a third harbors a de novo copy number variant. These additional variants likely explain why the children in three of the families are affected by this syndrome while their parents are not. In conclusion, we have used whole genome sequencing, pedigree building, detailed platelet phenotyping and new association approaches to identify the first cases of digenic inheritance of BPDs. Our results illustrate the cooperative role of different cytoskeletal proteins in platelet formation and cement the role of granule biology in the function of both platelets and neurons.
References:
Lentaigne et al (2016) Inherited platelet disorders: towards DNA-based diagnosis. Blood 127(23) 2814-2823.
Simeoni et al (2016) A high-throughput sequencing test for diagnosing inherited bleeding, thrombotic, and platelet disorders. Blood 127(23) 2793-2803.
Greene et al (2016) Bayesian evaluation of variant involvement in Mendelian disease. http://cran.r-project.org/web/packages/BeviMed
Pleines et al (2016) Tropomyosins regulate platelet biogenesis. (Under submission).
Gieger et al (2011) New gene functions in megakaryopoiesis and platelet formation. Nature 480(7376) 201-208.
Bijl et al (2015) Platelet studies in autism spectrum disorder patients and first-degree relatives. Molecular Autism 6:57.
Disclosures: No relevant conflicts of interest to declare.


2021 ◽  
Author(s):  
Marcin Kierczak ◽  
Nima Rafati ◽  
Julia Höglund ◽  
Hadrien Gourle ◽  
Daniel Schmitz ◽  
...  

Abstract Despite the success of genome-wide association studies (GWAS) in identifying effects of common genetic variants, much of the genetic contribution to complex traits remains unexplained. Here, we analysed high-coverage whole-genome sequencing (WGS) data to evaluate the contribution of rare genetic variants to 414 plasma proteins. The frequency distribution of genetic variants was skewed towards the rare spectrum, and damaging variants were more often rare. However, only 2.24% of the heritability was estimated to be explained by rare variants. A gene-based approach, developed to also capture the effect of rare variants, identified associations for 249 of the proteins, which was 25% more than with a GWAS. Of those, 24 associations were driven by rare variants, clearly highlighting the capacity of aggregated tests and WGS data. We conclude that, while many rare variants have considerable phenotypic effects, their contribution to the missing heritability is limited by their low frequencies.


2018 ◽  
Author(s):  
Yiding Ma ◽  
Peng Wei

Abstract Despite ongoing large-scale population-based whole-genome sequencing (WGS) projects such as the NIH NHLBI TOPMed program and the NHGRI Genome Sequencing Program, WGS-based association analysis of complex traits remains a tremendous challenge due to the large number of rare variants, many of which are non-trait-associated neutral variants. External biological knowledge, such as functional annotations based on ENCODE, may be helpful in distinguishing causal rare variants from neutral ones; however, each functional annotation can only capture certain aspects of biological function. Our knowledge for selecting informative annotations a priori is limited, and incorporating non-informative annotations will introduce noise and reduce power. We propose FunSPU, a versatile test that incorporates multiple biological annotations and is adaptive at both the annotation and variant levels, and thus maintains high power even in the presence of non-informative annotations. In addition to extensive simulations, we illustrate our proposed test using the TWINSUK cohort (n=1,752) of UK10K WGS data based on six functional annotations: CADD, RegulomeDB, FunSeq, Funseq2, GERP++, and GenoSkyline. We identified genome-wide significant genetic loci on chromosome 19 near the genes TOMM40 and APOC4-APOC2 associated with low-density lipoprotein (LDL), which are replicated in the UK10K ALSPAC cohort (n=1,497). These replicated LDL-associated loci were missed by existing rare variant association tests that either ignore external biological information or rely on a single source of biological knowledge. We have implemented the proposed test in an R package “FunSPU”.


2021 ◽  
Author(s):  
Sheila M. Gaynor ◽  
Kenneth E. Westerman ◽  
Lea L. Ackovic ◽  
Xihao Li ◽  
Zilin Li ◽  
...  

Abstract Summary: We developed the STAAR WDL workflow to facilitate the analysis of rare variants in whole genome sequencing association studies. The open-access STAAR workflow, written in the Workflow Description Language (WDL), allows a user to perform rare variant testing for both gene-centric and genetic region approaches, enabling genome-wide, candidate, and conditional analyses. It incorporates functional annotations into the workflow as introduced in the STAAR method in order to boost the rare variant analysis power. This tool was specifically developed and optimized to be implemented on cloud-based platforms such as BioData Catalyst Powered by Terra. It provides easy-to-use functionality for rare variant analysis that can be incorporated into an exhaustive whole genome sequencing analysis pipeline. Availability and implementation: The workflow is freely available from https://dockstore.org/workflows/github.com/sheilagaynor/STAAR_workflow.


Pathogens ◽  
2021 ◽  
Vol 10 (3) ◽  
pp. 331
Author(s):  
Montserrat Palau ◽  
Núria Piqué ◽  
M. José Ramírez-Lázaro ◽  
Sergio Lario ◽  
Xavier Calvet ◽  
...  

Helicobacter pylori is a common pathogen associated with several severe digestive diseases. Although multiple virulence factors have been described, their role in H. pylori pathogenesis and disease progression is still unclear. Whole genome sequencing could help to find genetic markers of virulent strains. In this work, we analyzed three complete genomes from isolates obtained at the same point in time from the stomach of a patient with adenocarcinoma, using multiple available bioinformatics tools. The genome analysis of the strains B508A-S1, B508A-T2A and B508A-T4 revealed that they were cagA, babA and sabB/hopO negative. The differences among the three genomes were mainly related to outer membrane proteins, methylases, restriction modification systems and flagellar biosynthesis proteins. The strain B508A-T2A was the only one presenting the genotype vacA s1, and had the most distinct genome, with fewer shared genes, more unique genes, and more polymorphisms. With all the accumulated information, no significant differences were found among the isolates regarding virulence and origin of the isolates. Nevertheless, some B508A-T2A genome characteristics could be linked to the pathogenicity of H. pylori.


2019 ◽  
Vol 28 (9) ◽  
pp. 2192-2205 ◽  
Author(s):  
Liliana C. M. Salvador ◽  
Daniel J. O'Brien ◽  
Melinda K. Cosgrove ◽  
Tod P. Stuber ◽  
Angie M. Schooley ◽  
...  
