find-tfbs: a tool to identify functional non-coding variants associated with complex human traits using open chromatin maps and phased whole-genome sequences

2020 ◽  
Author(s):  
Sébastian Méric de Bellefon ◽  
Florian Thibord ◽  
Paul L. Auer ◽  
John Blangero ◽  
Zeynep H Coban-Akdemir ◽  
...  

Abstract. Motivation: Whole-genome DNA sequencing (WGS) enables the discovery of non-coding variants, but tools to prioritize the subset that functionally affects human phenotypes are lacking. DNA sequence variants that disrupt or create transcription factor binding sites (TFBSs) can modulate gene expression. find-tfbs efficiently scans phased WGS data in large cohorts to identify and count TFBSs in regulatory sequences. These counts can then be used in association testing to find putatively functional non-coding variants associated with complex human diseases or traits. Results: We applied find-tfbs to discover functional non-coding variants associated with hematological traits in the NHLBI Trans-Omics for Precision Medicine (TOPMed) WGS dataset (Nmax = 44,709). We identified >2,000 associations at P < 1 × 10^-9, implicating specific blood cell types, transcription factors and causal genes. The vast majority of these associations are captured by variants identified in large genome-wide association studies (GWAS) of blood-cell traits. find-tfbs is computationally efficient and robust, allowing rapid identification of non-coding variants associated with multiple human phenotypes in very large samples. Availability: https://github.com/Helkafen/find-tfbs and https://github.com/Helkafen/… Supplementary information: Supplementary data are available.
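
To make the scanning step concrete, here is a minimal Python sketch of counting TFBS matches on each haplotype of a phased genotype. It is not find-tfbs's implementation: it assumes a toy log-odds position weight matrix (PWM), a fixed score threshold, SNVs only, and forward-strand scanning, and the helper names (`consensus_pwm`, `apply_variants`, `count_sites`) are hypothetical.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"
PWM = List[Dict[str, float]]  # one {base: log-odds score} dict per motif position


def consensus_pwm(motif: str, match: float = 1.0, mismatch: float = -1.0) -> PWM:
    """Toy PWM (hypothetical): +match for the consensus base, mismatch for the other three."""
    return [{b: (match if b == m else mismatch) for b in BASES} for m in motif]


def apply_variants(ref: str, variants: List[Tuple[int, str, str]],
                   haplotype: Tuple[int, ...]) -> str:
    """Build one haplotype sequence from a reference window.

    variants: (0-based position, ref allele, alt allele) SNVs.
    haplotype: allele (0 = ref, 1 = alt) carried at each variant on this haplotype.
    """
    seq = list(ref)
    for (pos, ref_allele, alt_allele), allele in zip(variants, haplotype):
        if seq[pos] != ref_allele:
            raise ValueError(f"reference mismatch at position {pos}")
        if allele == 1:
            seq[pos] = alt_allele
    return "".join(seq)


def count_sites(seq: str, pwm: PWM, threshold: float) -> int:
    """Count forward-strand windows whose summed PWM score reaches the threshold."""
    width = len(pwm)
    hits = 0
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i][seq[start + i]] for i in range(width))
        if score >= threshold:
            hits += 1
    return hits


ref = "TTGATATT"            # toy regulatory window containing a GATA motif
variants = [(3, "A", "C")]  # hypothetical SNV inside the motif
pwm = consensus_pwm("GATA")
genotype = ((0,), (1,))     # phased 0|1: one reference and one alternate haplotype
per_haplotype = [count_sites(apply_variants(ref, variants, h), pwm, 4.0)
                 for h in genotype]
print(per_haplotype, sum(per_haplotype))  # [1, 0] 1
```

Summing the two haplotype counts gives a per-individual TFBS count (here 1 for a 0|1 genotype), which is the kind of quantity the abstract describes feeding into association tests. Real motif models would come from a curated PWM collection and would be scanned on both strands within open chromatin regions.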

2021 ◽  
Author(s):  
Marsha M. Wheeler ◽  
Adrienne M Stilp ◽  
Shuquan Rao ◽  
Bjarni V Halldorsson ◽  
Doruk V Beyter ◽  
...  

Genome-wide association studies (GWAS) have identified thousands of single nucleotide variants and small indels that contribute to the genetic architecture of hematologic traits. While structural variants (SVs) are known to cause rare blood or hematopoietic disorders, the genome-wide contribution of SVs to quantitative blood cell trait variation is unknown. Here we used SVs detected from whole-genome sequencing (WGS) in ancestrally diverse participants of the NHLBI TOPMed program (N = 50,675). Using single-variant tests, we assessed the association of common and rare SVs with red cell-, white cell-, and platelet-related quantitative traits. We identified 33 independent SVs (23 common and 10 rare) reaching genome-wide significance. Most significant association signals (N = 27) replicated in independent datasets from deCODE genetics and the UK Biobank. Moreover, most trait-associated SVs (N = 24) lie within 1 Mb of previously reported GWAS loci. The SV analyses also revealed an association between a complex structural variant on 17p11.2 and white blood cell-related phenotypes. Based on functional annotation, most significant SVs (N = 26) are located in non-coding regions and are predicted to affect regulatory elements and/or local chromatin domain boundaries in blood cells. We predict that several trait-associated SVs are the causal variants at their loci. Genome-editing experiments support this prediction: they provide evidence that a deletion associated with lower monocyte counts disrupts an S1PR3 monocyte enhancer and decreases S1PR3 expression.


2018 ◽  
Author(s):  
John A Lees ◽  
Marco Galardini ◽  
Stephen D Bentley ◽  
Jeffrey N Weiser ◽  
Jukka Corander

Abstract. Summary: Genome-wide association studies (GWAS) in microbes face different challenges from those in eukaryotes and have been addressed by a number of different methods. pyseer brings these techniques together in one package tailored to microbial GWAS, allows greater flexibility in the input data used, and adds new methods to interpret association results. Availability and implementation: pyseer is written in Python and is freely available at https://github.com/mgalardini/pyseer, or can be installed through pip. Documentation and a tutorial are available at http://… Supplementary information: Supplementary data are available online.


2020 ◽  
Vol 36 (16) ◽  
pp. 4440-4448 ◽  
Author(s):  
Zhenqin Wu ◽  
Nilah M Ioannidis ◽  
James Zou

Abstract. Summary: Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Non-coding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in genome-wide association studies (GWAS). Predicting the regulatory effects of non-coding variants on candidate genes is a key step in evaluating their clinical significance. Here, we develop a machine-learning algorithm, Inference of Connected expression quantitative trait loci (eQTLs) (IRT), to predict the regulatory targets of non-coding variants identified in eQTL studies. We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, the gene and the intermediate sequence. IRT achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 using a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally validated regulatory variants shows significant enrichment of true target genes among IRT predictions relative to negative controls. In gene-ranking experiments, IRT achieves a top-1 accuracy of 50% and a top-3 accuracy of 90%. Salient features, including GC content, histone modifications and Hi-C interactions, are further analyzed and visualized to illustrate their influence on predictions. IRT can be applied to any VUS of interest and each nearby candidate gene to output a score reflecting the likelihood of a regulatory effect on that gene's expression level. These scores can be used to prioritize variants and genes to assist patient diagnosis and GWAS follow-up studies. Availability and implementation: Code and data used in this work are available at https://github.com/miaecle/eQTL_Trees.
Supplementary information: Supplementary data are available at Bioinformatics online.
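
The two evaluation metrics quoted in this abstract can be stated precisely in code. The sketch below computes top-k accuracy for gene ranking and ROC-AUC as the probability that a randomly chosen positive pair outscores a randomly chosen negative one. The gene names and scores are invented, and this is not code from the eQTL_Trees repository.

```python
from typing import Dict, List, Sequence


def top_k_accuracy(gene_scores: List[Dict[str, float]], true_genes: List[str], k: int) -> float:
    """Fraction of variants whose true target gene is among the k highest-scoring candidates."""
    hits = 0
    for scores, truth in zip(gene_scores, true_genes):
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_genes)


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC-AUC as P(random positive scores above random negative); ties count as 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Invented scores for two variants, each with three hypothetical candidate genes.
gene_scores = [
    {"GENE_A": 0.9, "GENE_B": 0.4, "GENE_C": 0.1},
    {"GENE_D": 0.2, "GENE_E": 0.7, "GENE_F": 0.5},
]
true_genes = ["GENE_A", "GENE_F"]

print(top_k_accuracy(gene_scores, true_genes, k=1))  # 0.5
print(top_k_accuracy(gene_scores, true_genes, k=3))  # 1.0
# Positives are the true variant-gene pairs; negatives are all other pairs.
print(roc_auc([0.9, 0.5], [0.4, 0.1, 0.2, 0.7]))     # 0.875
```

The pairwise form of ROC-AUC is quadratic in the number of pairs; it is shown here because it makes the definition explicit, whereas a rank-based computation would be preferred for large evaluation sets.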


2018 ◽  
Vol 78 (4) ◽  
pp. 446-453 ◽  
Author(s):  
Yukinori Okada ◽  
Stephen Eyre ◽  
Akari Suzuki ◽  
Yuta Kochi ◽  
Kazuhiko Yamamoto

Study of the genetics of rheumatoid arthritis (RA) began about four decades ago with the discovery of HLA-DRB1. Since the beginning of this century, genome-wide association studies (GWAS) have identified a number of non-HLA risk loci, and over 100 loci are now known to be associated with RA risk. Because genetic associations imply a clear causal direction with respect to disease, they should be used to advance research into the pathogenesis of RA. However, only 20% of GWAS loci contain coding variants; the remaining variants lie in non-coding regions, so most causal genes and causal variants remain to be identified. Epigenetic studies, high-resolution mapping of open chromatin, chromosome conformation capture technologies and other approaches could identify many of the missing links between genetic risk variants and causal genetic components, expanding our understanding of RA genetics.


2019 ◽  
Author(s):  
Audrey Lemaçon ◽  
Marie-Pier Scott-Boyer ◽  
Penny Soucy ◽  
Régis Ongaro-Carcy ◽  
Jacques Simard ◽  
...  

Abstract. One of the most challenging tasks of the post-genome-wide association study (GWAS) era is identifying, for an observed GWAS signal, the functional variants among those associated with a trait. Several methods have been developed to evaluate the potential functional implications of genetic variants. Each tool has its own scoring system, which forces users to become acquainted with each approach to interpret the results. Aware of the amount of work needed to analyze and integrate results for a single locus, we propose a flexible and versatile approach that helps prioritize variants by aggregating predictions of their potential functional implications. This approach is available through a web interface called DSNetwork, which acts as a single point of entry to almost 60 reference predictors for both coding and non-coding variants and displays predictions in an easy-to-interpret visualization. We confirmed the usefulness of our methodology by successfully identifying functional variants in four breast cancer susceptibility loci. DSNetwork is an integrative web application implemented with the Shiny framework and available at: http://romix.genome.ulaval.ca/dsnetwork. Author summary: Over the past years, GWAS have enabled the identification of numerous susceptibility loci associated with complex traits (https://www.ebi.ac.uk/gwas/). However, many of those signals contain hundreds or even thousands of significantly associated variants, of which only a few are truly responsible for the phenotype. Substantial efforts have been made to develop prediction methods that prioritize variants within GWAS-associated regions, moving from statistical associations to the identification of functional variants that modulate gene expression and, ultimately, to insight into disease pathophysiology.
Unfortunately, these numerous prediction tools generate contradictory predictions, which makes the results challenging to interpret. Some tools, such as VEP [McLaren et al., 2016], report their scores using a color scheme, acknowledging the need to help users interpret predictor results. Nonetheless, the proliferation of approaches often produces an extensive amount of data that is hard to synthesize. Aware of the challenge of evaluating the potential deleteriousness of variants in the context of fine-mapping analyses, we created a customizable visualization approach and implemented it in a decision support tool called DSNetwork (Decision Support Network). This tool enables quick access to gold-standard and new predictors for both coding and non-coding variants through an easily interpretable visualization of these predictions for a set of variants.
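
The abstract does not specify how DSNetwork combines predictors whose scores live on incompatible scales. As one hypothetical illustration of the general idea of aggregation, the sketch below rank-normalizes each predictor to [0, 1] within the variant set, flips predictors for which a low score means deleterious, and averages. CADD and SIFT are real predictors (higher CADD and lower SIFT scores indicate greater predicted deleteriousness), but the input values are invented and this averaging scheme is an assumption, not DSNetwork's method.

```python
from typing import Dict, List


def rank_normalize(values: List[float], higher_is_deleterious: bool) -> List[float]:
    """Map raw scores to [0, 1] by rank within the variant set; tied values share their mean rank."""
    n = len(values)
    if n == 1:
        return [0.5]
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2  # average 0-based rank of the tie group
        i = j + 1
    scaled = [r / (n - 1) for r in ranks]
    return scaled if higher_is_deleterious else [1.0 - s for s in scaled]


def aggregate(predictions: Dict[str, List[float]],
              higher_is_deleterious: Dict[str, bool]) -> List[float]:
    """Average rank-normalized predictor scores per variant (1 = most deleterious in the set)."""
    normalized = [rank_normalize(vals, higher_is_deleterious[name])
                  for name, vals in predictions.items()]
    n_variants = len(normalized[0])
    return [sum(col[v] for col in normalized) / len(normalized) for v in range(n_variants)]


# Invented scores for three variants from two predictors with opposite score directions.
predictions = {"CADD": [25.0, 3.0, 12.0], "SIFT": [0.01, 0.80, 0.20]}
direction = {"CADD": True, "SIFT": False}
print(aggregate(predictions, direction))  # [1.0, 0.0, 0.5]
```

Rank normalization discards the magnitude of each score, which makes heterogeneous predictors comparable but also hides how extreme a score is; a visualization-first tool can sidestep this trade-off by showing each predictor separately.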


2019 ◽  
Author(s):  
Moli Huang ◽  
Yunpeng Wang ◽  
Manqiu Yang ◽  
Jun Yan ◽  
Henry Yang ◽  
...  

Abstract. Summary: Cancer hallmarks rely on specific transcriptional programs, which are dysregulated by multiple mechanisms, including genomic aberrations in DNA regulatory regions. Genome-wide association studies have shown that many variants lie within putative enhancer elements. To provide insights into the regulatory role of enhancer-associated non-coding variants in the cancer epigenome, and to facilitate the identification of functional non-coding mutations, we present dbInDel, a database in which we comprehensively analyzed enhancer-associated insertion and deletion (InDel) variants in both human and murine samples using ChIP-Seq data. We also identify and visualize upstream transcription factor (TF) binding motifs in InDel-containing enhancers. Downstream target genes are predicted and analyzed in the context of cancer biology. dbInDel promotes investigation of the functional contributions of non-coding variants in the cancer epigenome. Availability and implementation: The database, dbInDel, can be accessed at http://enhancer-indel.cam-su.org/. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Qiuming Yao ◽  
Paolo Ferragina ◽  
Yakir Reshef ◽  
Guillaume Lettre ◽  
Daniel E Bauer ◽  
...  

Abstract. Motivation: Genome-wide association studies (GWAS) have identified thousands of common trait-associated genetic variants, but interpreting their function remains challenging. These variants can overlap the binding sites of transcription factors (TFs) and could therefore alter gene expression. However, we currently lack a systematic understanding of how this mechanism contributes to phenotype. Results: We present Motif-Raptor, a TF-centric computational tool that integrates sequence-based predictive models, chromatin accessibility, gene expression datasets and GWAS summary statistics to systematically investigate how genetic variants affect TF function. Given trait-associated non-coding variants, Motif-Raptor can recover relevant cell types and critical TFs to drive hypotheses about their mechanism of action. We tested Motif-Raptor on complex traits such as rheumatoid arthritis and red blood cell count and demonstrated its ability to prioritize relevant cell types, potential regulatory TFs and non-coding SNPs that have been previously characterized and validated. Availability: Motif-Raptor is freely available as a Python package at https://github.com/pinellolab/MotifRaptor. Supplementary information: Supplementary data are available at Bioinformatics online.
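
A basic ingredient of TF-centric variant analysis is a per-variant motif disruption score. The sketch below finds the best PWM match overlapping a SNP on both strands, for the reference and the alternate allele, and reports the difference. Motif-Raptor uses its own sequence-based models; the toy consensus PWM and the function names here are hypothetical simplifications.

```python
from typing import Dict, List

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")
PWM = List[Dict[str, float]]  # one {base: log-odds score} dict per motif position


def consensus_pwm(motif: str, match: float = 1.0, mismatch: float = -1.0) -> PWM:
    """Toy PWM (hypothetical): +match for the consensus base, mismatch otherwise."""
    return [{b: (match if b == m else mismatch) for b in BASES} for m in motif]


def best_overlapping_score(seq: str, pwm: PWM, pos: int) -> float:
    """Best PWM score over all windows covering `pos`, scanning both strands."""
    width = len(pwm)
    best = float("-inf")
    for start in range(max(0, pos - width + 1), min(pos, len(seq) - width) + 1):
        window = seq[start:start + width]
        reverse_complement = window.translate(COMPLEMENT)[::-1]
        for strand_seq in (window, reverse_complement):
            best = max(best, sum(pwm[i][strand_seq[i]] for i in range(width)))
    return best


def motif_delta(ref_seq: str, pos: int, alt: str, pwm: PWM) -> float:
    """Alt minus ref best score: negative suggests a disrupted site, positive a created one."""
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    return best_overlapping_score(alt_seq, pwm, pos) - best_overlapping_score(ref_seq, pwm, pos)


print(motif_delta("TTGATATT", 3, "C", consensus_pwm("GATA")))  # -2.0
```

Restricting the scan to windows that cover the variant keeps the comparison local: motif matches elsewhere in the sequence cannot mask the effect of the allele being tested.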


2021 ◽  
pp. 1-10
Author(s):  
Sophie E. Legge ◽  
Marcos L. Santoro ◽  
Sathish Periyasamy ◽  
Adeniran Okewole ◽  
Arsalan Arsalan ◽  
...  

Abstract. Schizophrenia is a severe psychiatric disorder with high heritability. Consortium efforts and technological advances have led to a substantial increase in knowledge of the genetic architecture of schizophrenia over the past decade. In this article, we provide an overview of the current understanding of the genetics of schizophrenia, outline remaining challenges, and summarise future directions of research. Worldwide collaborations have resulted in genome-wide association studies (GWAS) of over 56 000 schizophrenia cases and 78 000 controls, which identified 176 distinct genetic loci. The latest GWAS from the Psychiatric Genomics Consortium, available as a preprint, indicates that 270 distinct common genetic loci have now been associated with schizophrenia. Polygenic risk scores currently explain around 7.7% of the variance in schizophrenia case-control status. Rare variant studies have implicated eight rare copy-number variants, and an increased burden of loss-of-function variants in SETD1A, in increased risk of schizophrenia. The latest exome sequencing study, available as a preprint, implicates a burden of rare coding variants in a further nine genes. Gene-set analyses have demonstrated significant enrichment of both common and rare genetic variants associated with schizophrenia in synaptic pathways. To address current challenges, future genetic studies of schizophrenia need larger sample sizes from more diverse populations. Continued expansion of international collaboration will likely identify new genetic regions, improve fine-mapping to identify causal variants, and increase our understanding of the biology and mechanisms of schizophrenia.


2021 ◽  
Vol 11 (3) ◽  
pp. 195
Author(s):  
Yitang Sun ◽  
Jingqi Zhou ◽  
Kaixiong Ye

Increasing evidence shows that white blood cells are associated with the risk of coronavirus disease 2019 (COVID-19), but the direction and causality of this association are unclear. To evaluate the causal associations between various white blood cell traits and COVID-19 susceptibility and severity, we conducted two-sample bidirectional Mendelian randomization (MR) analyses with summary statistics from the largest and most recent genome-wide association studies. Our MR results indicated causal protective effects of higher basophil count, basophil percentage of white blood cells, and myeloid white blood cell count on severe COVID-19, with odds ratios (OR) per standard deviation increment of 0.75 (95% CI: 0.60–0.95), 0.70 (95% CI: 0.54–0.92), and 0.85 (95% CI: 0.73–0.98), respectively. In the reverse MR analyses, neither COVID-19 severity nor susceptibility was associated with white blood cell traits. Genetically predicted higher basophil count, basophil percentage of white blood cells, and myeloid white blood cell count are associated with a lower risk of developing severe COVID-19. Individuals with a lower genetic capacity for basophil production are therefore likely at higher risk, and enhancing basophil production may be an effective therapeutic strategy.
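
The reported odds ratios per standard deviation are exponentiated causal log-odds estimates. The abstract does not say which MR estimator produced them; the sketch below assumes the common fixed-effect inverse-variance weighted (IVW) estimator, beta = Σ(bx·by/sy²) / Σ(bx²/sy²) with SE = sqrt(1 / Σ(bx²/sy²)), where bx are exposure effects in SD units and by, sy are outcome log-odds effects and their standard errors. The summary statistics are invented for illustration.

```python
import math
from typing import Sequence, Tuple


def ivw_estimate(beta_exp: Sequence[float], beta_out: Sequence[float],
                 se_out: Sequence[float]) -> Tuple[float, float]:
    """Fixed-effect IVW causal estimate and its standard error from per-SNP summary statistics."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_exp, beta_out, se_out))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_exp, se_out))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds estimate to an odds ratio with a Wald 95% confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Invented instruments: exposure effects (SD units), outcome log-odds effects and their SEs.
beta_exp = [0.10, 0.20, 0.05]
beta_out = [-0.02, -0.04, -0.01]
se_out = [0.01, 0.02, 0.01]

beta, se = ivw_estimate(beta_exp, beta_out, se_out)
odds, lo, hi = odds_ratio_ci(beta, se)
print(f"beta={beta:.3f} se={se:.4f} OR={odds:.2f} (95% CI {lo:.2f}-{hi:.2f})")
```

Because every invented instrument has the same ratio by/bx = -0.2, the IVW estimate is exactly -0.2 (OR of about 0.82 per SD). With real data, heterogeneity across instruments would motivate random-effects IVW and pleiotropy-robust sensitivity analyses.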


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Chao-Yu Guo ◽  
Reng-Hong Wang ◽  
Hsin-Chou Yang

Abstract. After the genome-wide association study (GWAS) era, whole-genome sequencing is widely used to identify associations between complex traits and rare variants. A score-based variance-component test has been proposed to identify common and rare genetic variants associated with complex traits while quickly adjusting for covariates. Such a kernel score statistic allows for familial dependencies and adjusts for random confounding effects. However, the etiology of complex traits may involve genetic and environmental factors as well as complex gene-environment interactions. In this research, we therefore propose a novel method to detect joint gene and gene-environment interaction effects in complex family-based association studies with various correlation structures. We also developed an R function for the Fast Gene-Environment Sequence Kernel Association Test (FGE-SKAT), freely available as supplementary material, for easy GWAS implementation to unveil such family-based joint effects. Simulation studies confirmed the validity of the new strategy and its superior statistical power. FGE-SKAT was applied to whole-genome sequence data provided by Genetic Analysis Workshop 18 (GAW18) and identified both concordant and discordant regions compared with methods that do not consider gene-environment interactions.
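
For readers unfamiliar with kernel association tests, the sketch below computes the SKAT-type score statistic Q = r'GWG'r = Σ_j w_j (Σ_i g_ij r_i)² from null-model residuals r, plus the same statistic on genotype-by-environment columns g_ij·e_i. It is a simplification under stated assumptions: unrelated samples (no family covariance, unlike FGE-SKAT), a continuous trait, a null model containing only an intercept and the environment (a real GxE test would also adjust for genetic main effects), user-supplied weights, and no p-value (which in SKAT comes from a mixture of chi-square distributions). Function names are hypothetical; this is not the FGE-SKAT R function.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = samples, columns = variants


def null_residuals(y: Sequence[float], env: Sequence[float]) -> List[float]:
    """Residuals of a continuous trait after OLS on an intercept and one environmental covariate."""
    n = len(y)
    mean_y, mean_e = sum(y) / n, sum(env) / n
    sxx = sum((e - mean_e) ** 2 for e in env)
    sxy = sum((e - mean_e) * (v - mean_y) for e, v in zip(env, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_e
    return [v - (intercept + slope * e) for e, v in zip(env, y)]


def skat_q(resid: Sequence[float], genotypes: Matrix, weights: Sequence[float]) -> float:
    """Q = r' G W G' r = sum_j w_j * (sum_i g_ij * r_i)^2 with W = diag(weights)."""
    q = 0.0
    for j, w in enumerate(weights):
        score_j = sum(row[j] * r for row, r in zip(genotypes, resid))
        q += w * score_j ** 2
    return q


def gxe_columns(genotypes: Matrix, env: Sequence[float]) -> Matrix:
    """Genotype-by-environment design: each dosage multiplied by the sample's exposure."""
    return [[g * e for g in row] for row, e in zip(genotypes, env)]


# Invented data: 4 unrelated samples, 1 variant, binary environment.
y = [1.0, 2.0, 3.0, 4.0]
env = [0.0, 0.0, 1.0, 1.0]
genotypes = [[0.0], [1.0], [0.0], [1.0]]
weights = [1.0]

r = null_residuals(y, env)
print(skat_q(r, genotypes, weights))                    # 1.0
print(skat_q(r, gxe_columns(genotypes, env), weights))  # 0.25
```

Aggregating squared per-variant scores is what lets this family of tests detect rare-variant sets whose effects point in different directions, where a simple burden sum would cancel out.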

