Intrinsic DNA topology as a prioritization metric in genomic fine-mapping studies

2020 ◽  
Vol 48 (20) ◽  
pp. 11304-11321
Author(s):  
Hannah C Ainsworth ◽  
Timothy D Howard ◽  
Carl D Langefeld

Abstract In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation resources can present challenges as many lack information for novel variants and/or may be uninformative for non-coding regions. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are well-intertwined, and as an intrinsic DNA property, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated an MGW census for 199 038 197 SNPs across the human genome. Summarizing a SNP’s change in MGW (ΔMGW) as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically-associated SNPs can be prioritized by ΔMGW. We tested this hypothesis in 116 regions analyzed by a Massively Parallel Reporter Assay and observed enrichment of large ΔMGW for functional polymorphisms (P = 0.0007). To illustrate application in fine-mapping studies, we applied our MGW-prioritization approach to three non-coding regions associated with systemic lupus erythematosus. Together, this study presents the first usage of sequence-dependent DNA topology as a prioritization metric in genomic association studies.
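
The abstract summarizes a SNP's change in minor groove width as a Euclidean distance. A minimal sketch of that summary follows, assuming each allele has already been given a per-position MGW profile by some prediction method (not shown here). The function name `delta_mgw`, the window length, and the numeric values are hypothetical and are not taken from the paper.

```python
import math


def delta_mgw(ref_profile, alt_profile):
    """Euclidean distance between two equal-length MGW profiles.

    Each profile is a sequence of predicted minor groove widths, one value
    per nucleotide position in a window around the SNP (assumed layout).
    """
    if len(ref_profile) != len(alt_profile):
        raise ValueError("profiles must cover the same positions")
    squared = sum((r - a) ** 2 for r, a in zip(ref_profile, alt_profile))
    return math.sqrt(squared)


# Hypothetical MGW predictions for the reference and alternate alleles.
ref = [5.0, 4.5, 4.0, 5.5, 6.0]
alt = [5.0, 4.5, 7.0, 1.5, 6.0]
print(delta_mgw(ref, alt))
```

A larger value means the two alleles produce more dissimilar shape profiles, which is the quantity the abstract proposes for ranking SNPs.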

2019 ◽  
Author(s):  
Hannah C. Ainsworth ◽  
Timothy D. Howard ◽  
Carl D. Langefeld

Abstract In genomic fine-mapping studies, some approaches leverage annotation data to prioritize likely functional polymorphisms. However, existing annotation sources often present challenges, as many lack data for novel variants, offer no context for noncoding regions, and/or are confounded with linkage disequilibrium. We propose a novel annotation source, sequence-dependent DNA topology, as a prioritization metric for fine-mapping. DNA topology and function are well-intertwined, and as an intrinsic DNA property, it is readily applicable to any genomic region. Here, we constructed and applied Minor Groove Width (MGW) as a prioritization metric. Using an established MGW-prediction method, we generated an MGW census for 199,038,197 SNPs across the human genome. Summarizing a SNP’s change in MGW (ΔMGW) as a Euclidean distance, ΔMGW exhibited a strongly right-skewed distribution, highlighting the infrequency of SNPs that generate dissimilar shape profiles. We hypothesized that phenotypically-associated SNPs can be prioritized by ΔMGW. We applied Bayesian and frequentist MGW-prioritization approaches to three non-coding regions associated with Systemic Lupus Erythematosus in multiple ancestries. In two regions, including ΔMGW resolved the association to a single, trans-ancestral SNP, corroborated by external functional data. Together, this study presents the first usage of sequence-dependent DNA topology as a prioritization metric in genomic association studies. Graphical Abstract We hypothesize that SNPs imposing dissimilar minor groove width profiles (ΔMGW) are more likely to alter function. ΔMGW was interrogated genome-wide and then used as a weighting metric for fine-mapping associations.
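
The abstract says ΔMGW was used as a weighting metric in a Bayesian prioritization, but it does not give the weighting scheme. The sketch below shows one generic way such a weight could enter, assuming each SNP has a Bayes factor from the association analysis and a non-negative prior weight derived from ΔMGW. The function name `weighted_posteriors` and all numbers are hypothetical, and this is not presented as the authors' method.

```python
def weighted_posteriors(bayes_factors, prior_weights):
    """Posterior probabilities proportional to Bayes factor times prior weight.

    Assumes a single causal SNP in the region and non-negative inputs.
    """
    if len(bayes_factors) != len(prior_weights):
        raise ValueError("one prior weight is needed per SNP")
    products = [bf * w for bf, w in zip(bayes_factors, prior_weights)]
    total = sum(products)
    if total <= 0:
        raise ValueError("at least one SNP needs positive support")
    return [p / total for p in products]


# Hypothetical example: three SNPs with similar association evidence,
# where the second SNP has the largest shape-based prior weight.
print(weighted_posteriors([10.0, 10.0, 5.0], [1.0, 3.0, 2.0]))
```

With equal weights the posteriors depend only on the Bayes factors, so under this assumption the weight matters mainly where linkage disequilibrium leaves several SNPs with near-identical association evidence.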


2021 ◽  
Vol 23 (Supplement_6) ◽  
pp. vi1-vi1
Author(s):  
Kristen Drucker ◽  
Connor Yanchus ◽  
Thomas Kollmeyer ◽  
Asma Ali ◽  
Decker Paul ◽  
...  

Abstract BACKGROUND Determination of the causation of germline single nucleotide polymorphisms (SNPs) located in non-coding regions of the genome is challenging. The genomic region of 8q24 has been identified as important in many kinds of cancer, linked to a topologically associated domain (TAD) encompassing MYC; this TAD contains a GWAS SNP (rs55705857) associated with IDH-mutant glioma. METHODS Germline genotyping data from 622 IDH-mutant glioma and 668 controls were used to fine map the rs55705857 locus by detailed haplotype analysis. Chromatin immunoprecipitation sequencing (ChIP-seq) of histone markers H3K4me1, H3K4me3, H3K27ac and H3K36me3 was performed on normal brain samples (n=8) and human glioma samples (n=11 IDH-wt and 52 IDH-mut). RNAseq from 9 normal and 83 brain tumors (n=26 IDH-wt and 55 IDH-mut) were used to assess differential gene expression. RESULTS Fine-mapping identified rs55705857 SNP as the most likely causative allele (OR=8.69; p<0.001) within 8q24 for the development of IDH-mutant glioma. At rs55705857, both H3K27ac and H3K4me1 in IDH-mutant vs IDH-wt tumors were increased 3.05- and 1.58-fold, respectively (DiffBind; p=5.81×10⁻⁷ and p=2.31×10⁻³). ChromHMM analysis of the marks indicated that promoter and enhancer functions were significantly increased, and the activity broadened at rs55705857 in IDH-mut gliomas compared to IDH-wt tumors and normal brain samples. This enhancement correlated with significantly increased MYC expression in IDH-mut gliomas (p=3.1×10⁻¹³), as well as alterations of Myc signaling targets. Publicly available ATACseq, ChIPseq and long-range DNA interaction data demonstrated that the rs55705857 locus is open and interacts with the MYC promoter. CONCLUSIONS Fine-mapping of the 8q24 locus provided strong evidence that rs55705857 is the causative 8q24 locus associated with IDH-mut glioma. Functional experiments suggest that IDH mutation facilitates rs55705857 interaction with MYC to alter downstream MYC targets.


2018 ◽  
Vol 77 (7) ◽  
pp. 1078-1084 ◽  
Author(s):  
Yong-Fei Wang ◽  
Yan Zhang ◽  
Zhengwei Zhu ◽  
Ting-You Wang ◽  
David L Morris ◽  
...  

Objectives Systemic lupus erythematosus (SLE) is a prototype autoimmune disease with a strong genetic component in its pathogenesis. Through genome-wide association studies (GWAS), we recently identified 10 novel loci associated with SLE and uncovered a number of suggestive loci requiring further validation. This study aimed to validate those loci in independent cohorts and evaluate the role of SLE genetics in drug repositioning. Methods We conducted GWAS and replication studies involving 12 280 SLE cases and 18 828 controls, and performed fine-mapping analyses to identify likely causal variants within the newly identified loci. We further scanned drug target databases to evaluate the role of SLE genetics in drug repositioning. Results We identified three novel loci that surpassed genome-wide significance, including ST3AGL4 (rs13238909, pmeta=4.40E-08), MFHAS1 (rs2428, pmeta=1.17E-08) and CSNK2A2 (rs2731783, pmeta=1.08E-09). We also confirmed the association of CD226 locus with SLE (rs763361, pmeta=2.45E-08). Fine-mapping and functional analyses indicated that the putative causal variants in CSNK2A2 locus reside in an enhancer and are associated with expression of CSNK2A2 in B-lymphocytes, suggesting a potential mechanism of association. In addition, we demonstrated that SLE risk genes were more likely to be interacting proteins with targets of approved SLE drugs (OR=2.41, p=1.50E-03) which supports the role of genetic studies to repurpose drugs approved for other diseases for the treatment of SLE. Conclusion This study identified three novel loci associated with SLE and demonstrated the role of SLE GWAS findings in drug repositioning.


2021 ◽  
Vol 12 ◽  
Author(s):  
Binglan Li ◽  
Marylyn D. Ritchie

Since their inception, genome-wide association studies (GWAS) have identified more than a hundred thousand single nucleotide polymorphism (SNP) loci that are associated with various complex human diseases or traits. The majority of GWAS discoveries are located in non-coding regions of the human genome and have unknown functions. The valley between non-coding GWAS discoveries and downstream affected genes hinders the investigation of complex disease mechanism and the utilization of human genetics for the improvement of clinical care. Meanwhile, advances in high-throughput sequencing technologies reveal important genomic regulatory roles that non-coding regions play in the transcriptional activities of genes. In this review, we focus on data integrative bioinformatics methods that combine GWAS with functional genomics knowledge to identify genetically regulated genes. We categorize and describe two types of data integrative methods. First, we describe fine-mapping methods. Fine-mapping is an exploratory approach that calibrates likely causal variants underneath GWAS signals. Fine-mapping methods connect GWAS signals to potentially causal genes through statistical methods and/or functional annotations. Second, we discuss gene-prioritization methods. These are hypothesis generating approaches that evaluate whether genetic variants regulate genes via certain genetic regulatory mechanisms to influence complex traits, including colocalization, mendelian randomization, and the transcriptome-wide association study (TWAS). TWAS is a gene-based association approach that investigates associations between genetically regulated gene expression and complex diseases or traits. TWAS has gained popularity over the years due to its ability to reduce multiple testing burden in comparison to other variant-based analytic approaches. Multiple types of TWAS methods have been developed with varied methodological designs and biological hypotheses over the past 5 years. 
We dive into discussions of how TWAS methods differ in many aspects and the challenges that different TWAS methods face. Overall, TWAS is a powerful tool for identifying complex trait-associated genes. With the advent of single-cell sequencing, chromosome conformation capture, gene editing technologies, and multiplexing reporter assays, we are expecting a more comprehensive understanding of genomic regulation and genetically regulated genes underlying complex human diseases and traits in the future.


2020 ◽  
Vol 36 (9) ◽  
pp. 2936-2937 ◽  
Author(s):  
Gareth Peat ◽  
William Jones ◽  
Michael Nuhn ◽  
José Carlos Marugán ◽  
William Newell ◽  
...  

Abstract Motivation Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


2020 ◽  
Vol 79 (Suppl 1) ◽  
pp. 221.1-222
Author(s):  
E. Eliopoulos ◽  
G. Goulielmos ◽  
M. Matalliotakis ◽  
D. Vlachakis ◽  
T. Niewold ◽  
...  

Background: Gene association studies and genome wide association studies (GWAS) have played a primary role in depicting genetic contributions to systemic lupus erythematosus (SLE) development, while accommodating the exonic polymorphisms on the protein structure level, when available, enhances our understanding of protein function modification or depletion. Linking human genetics with therapeutic targets requires the biological function of the causal gene and variant to be known. Objectives: To investigate recently identified SLE-associated functional gene polymorphisms, such as PARP1, ITGAM, TNFAIP3, NCF1, PON1, IFIH1, SH2B3 and TYK2 [1-4] by correlation to protein structure and function. Methods: Three-dimensional (3D) homology modeling and molecular mechanics/dynamics studies were applied for the localization of the polymorphisms under study on the respective proteins. The mutants were constructed using molecular modeling with the program Maestro (Schrodinger, LLC), which was also used to analyze the conformational changes caused by the mutation. All figures depicting 3D models were created using the molecular graphics program PyMOL V.2.2 [5]. Results: Modeling revealed that rs1136410 SNP encodes the less common polymorphism Val762Ala on PARP1 that reduces enzymatic activity of Poly(ADP-ribose) polymerase 1 (Figure 1), ITGAM polymorphism rs1143679 (Arg77His) on Integrin alpha M, component of the macrophage-1 antigen complex affects protein surface recognition, TNFAIP3 rs2230926 polymorphism encodes Cys instead of Phe at residue 127 of the ubiquitin editing A20 protein, while rs201802880 polymorphism of the neutrophil cytosolic factor 1 (NCF1) gene modifies the function of the cytosolic subunit of neutrophil NADPH oxidase with the mutation Arg90His. PON1 is involved in the oxidative stress process that cause tissue damage observed in SLE and anti-phospholipid syndrome (APS).
The PON1 Gln192Arg mutation (rs662 SNP) affects shape and recognition of the ligand recognition site as part of the evolutionary process, while IFIH1 (rs35667974) helicase C domain1 mutant I923V is located on an essential RNA beta loop interacting directly with the nucleic acid (Figure 2). Finally, the rs3184504 SNP of SH2B3 gene generates mutant Arg262Trp on SH2 adapter protein 3, acting as a signaling pathway involved in autoimmune disorders, while in TYK2 gene, one of the Janus kinases, the rs35018800 producing mutant Ala928Val modifies the ADP binding site. Figure 1. Details of the Val762 interaction where V762A mutation occurs in PARP1 protein. Figure 2. Nucleic acid interacting IFIH1 helicase beta-loop where I923V mutation occurs (in purple). Conclusion: Based on several examples, we have tried to define a rational link from SLE-associated gene polymorphisms to structure and to modified function, including metagenomic analysis of SNPs, protein crystallography, protein molecular modeling, molecular mechanics and dynamics. Locating, shaping and understanding the target protein interaction interface plays a decisive role in most cases and provides clues for further pharmacological or medical actions [6]. References: [1] Hur JW et al (2006). Rheumatology 45:711-7 [2] Maiti AK et al (2014). Hum Mol Genet 23:4161-76 [3] Shimane K et al (2010). Arthritis Rheum. 62:574-9 [4] Linge P et al (2019). Ann Rheum Dis. 2019 Nov 8. pii: annrheumdis-2019-215820 [5] Schrödinger LLC: The PyMOL Molecular Graphics System 2016 version 2.2. Available from: pymol.org/2/support.html [6] Plenge RM et al (2013). Nat Rev Drug Discov 12:581–94 Disclosure of Interests: None declared


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Kevin K. Esoh ◽  
Tobias O. Apinjoh ◽  
Steven G. Nyanjom ◽  
Ambroise Wonkam ◽  
Emile R. Chimusa ◽  
...  

Abstract Inferences from genetic association studies rely largely on the definition and description of the underlying populations that highlight their genetic similarities and differences. The clustering of human populations into subgroups (population structure) can significantly confound disease associations. This study investigated the fine-scale genetic structure within Cameroon that may underlie disparities observed with Cameroonian ethnicities in malaria genome-wide association studies in sub-Saharan Africa. Genotype data of 1073 individuals from three regions and three ethnic groups in Cameroon were analyzed using measures of genetic proximity to ascertain fine-scale genetic structure. Model-based clustering revealed distinct ancestral proportions among the Bantu, Semi-Bantu and Foulbe ethnic groups, while haplotype-based coancestry estimation revealed possible longstanding and ongoing sympatric differentiation among individuals of the Foulbe ethnic group, and their Bantu and Semi-Bantu counterparts. A genome scan found strong selection signatures in the HLA gene region, confirming longstanding knowledge of natural selection on this genomic region in African populations following immense disease pressure. Signatures of selection were also observed in the HBB gene cluster, a genomic region known to be under strong balancing selection in sub-Saharan Africa due to its co-evolution with malaria. This study further supports the role of evolution in shaping genomes of Cameroonian populations and reveals fine-scale hierarchical structure among and within Cameroonian ethnicities that may impact genetic association studies in the country.


2021 ◽  
Vol 80 (Suppl 1) ◽  
pp. 1041.1-1041
Author(s):  
V. Agarwal ◽  
S. Kakati ◽  
P. Debbaruah

Background: SNP rs7574865, located within the third intron of STAT4 gene at chromosome 2, has been associated with susceptibility to SLE among different ethnic groups [1,2]. Interestingly, we recently have documented an association between this gene and susceptibility to systemic lupus erythematosus (SLE) in Indian population [3]. Objectives: To determine whether the STAT4 (rs7574865) SNP is associated with clinical and immunological manifestations in SLE. Methods: The study was carried out on 100 unrelated SLE (SLICC criteria 2012) patients from North-East India. Genotyping of STAT4 rs7574865 SNP was done using Taqman probe and Real-Time Polymerase chain reaction. An association study was performed between the alleles and genotypes of STAT4 rs7574865 with the clinical and immunological manifestations included in the SLE SLICC classification criteria. For all analysis, the statistical significance was fixed at 5% level of significance (p < 0.05). Results: The mean duration of illness was 2.69±2.55 years. Cases and Controls remained in Hardy-Weinberg equilibrium. The occurrence of photosensitivity and hyperpigmentation was significantly higher in the TT genotype group (97.22% and 77.77%, respectively), with p < 0.001 in each case. SLE patients with nephritis (Albuminuria >500mg/24 hours) and elevated serum creatinine were both significantly higher in the TT genotype group as compared to GT and GG (p < 0.001 and p = 0.001 respectively). The anti-dsDNA antibody was significantly associated with the TT genotype (p < 0.001). Conclusion: Our study provides evidence that the STAT4 rs7574865 gene polymorphism is a risk factor for cutaneous manifestations, lupus nephritis and anti-dsDNA positivity in SLE. Our findings reinforce the need for further association studies, including prospective studies with larger cohorts, to replicate these findings. References: [1] Graham RR, Ph D, Hom G, Ph D, Behrens TW, Bakker PIW De, et al. 
and the Risk of Rheumatoid Arthritis and Systemic Lupus Erythematosus. N Engl J Med. 2007;357(10):977–86. [2] Yuan H, Feng JB, Pan HF, Qiu LX, Li LH, Zhang N, et al. A meta-analysis of the association of STAT4 polymorphism with systemic lupus erythematosus. Mod Rheumatol. 2010;20(3):257–62. [3] Gupta V, Kumar S, Pratap A, Singh R, Kumari R, Kumar S, et al. Association of ITGAM, TNFSF4, TNFAIP3 and STAT4 gene polymorphisms with risk of systemic lupus erythematosus in a North Indian population. Lupus. 2018;27(12):1973–9. Disclosure of Interests: None declared


Author(s):  
Jianhua Wang ◽  
Dandan Huang ◽  
Yao Zhou ◽  
Hongcheng Yao ◽  
Huanhuan Liu ◽  
...  

Abstract Genome-wide association studies (GWASs) have revolutionized the field of complex trait genetics over the past decade, yet for most of the significant genotype-phenotype associations the true causal variants remain unknown. Identifying and interpreting how causal genetic variants confer disease susceptibility is still a big challenge. Herein we introduce a new database, CAUSALdb, to integrate the most comprehensive GWAS summary statistics to date and identify credible sets of potential causal variants using uniformly processed fine-mapping. The database has six major features: it (i) curates 3052 high-quality, fine-mappable GWAS summary statistics across five human super-populations and 2629 unique traits; (ii) estimates causal probabilities of all genetic variants in GWAS significant loci using three state-of-the-art fine-mapping tools; (iii) maps the reported traits to a powerful ontology MeSH, making it simple for users to browse studies on the trait tree; (iv) incorporates highly interactive Manhattan and LocusZoom-like plots to allow visualization of credible sets in a single web page more efficiently; (v) enables online comparison of causal relations on variant-, gene- and trait-levels among studies with different sample sizes or populations and (vi) offers comprehensive variant annotations by integrating massive base-wise and allele-specific functional annotations. CAUSALdb is freely available at http://mulinlab.org/causaldb.
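
The abstract refers to credible sets built from per-variant causal probabilities. A common construction, sketched below, ranks variants by posterior probability and keeps adding them until a chosen coverage is reached. The function name `credible_set`, the 95% default, and the variant labels are hypothetical. The three fine-mapping tools used by CAUSALdb are not named in the abstract and may construct their sets differently.

```python
def credible_set(posteriors, coverage=0.95):
    """Smallest top-ranked set of variants whose posteriors sum to >= coverage.

    `posteriors` maps a variant label to its posterior causal probability.
    Returns the selected labels in descending order of probability.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    selected, cumulative = [], 0.0
    for label, prob in ranked:
        selected.append(label)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


# Hypothetical posteriors for four variants at one locus.
pips = {"snp_a": 0.60, "snp_b": 0.30, "snp_c": 0.07, "snp_d": 0.03}
print(credible_set(pips))
```

If the posteriors sum to less than the requested coverage, this sketch returns every variant rather than raising an error. A stricter implementation might flag that case.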

