Diverse Functions Associate With Non-Coding Trans-species Polymorphisms in Humans

Author(s):  
Keila Velazquez-Arcelay ◽  
Mary Lauren Benton ◽  
John A. Capra

Abstract Background: Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species, hereafter “trans-species polymorphisms” (TSPs), often result from LTBS due to host-pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred candidate TSPs have been identified in humans and chimpanzees; however, because many are in non-coding regions of the genome, the functions and adaptive roles for most TSPs remain unknown. Results: We integrated diverse genomic annotations, with a focus on non-coding regions, to explore the functions of 125 previously identified regions containing multiple TSPs in humans and chimpanzees. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS). We identify functional annotations for 119 TSP regions, including 71 with evidence of gene regulatory function from GTEx or genome-wide functional genomics data and 21 with evidence of trait association from GWAS and PheWAS. TSPs in humans associate with many immune system phenotypes, including response to pathogens, but we also find associations with a range of other phenotypes, including body mass, alcohol intake, urate levels, chronotype, and risk-taking behavior. Conclusions: The diversity of traits associated with non-coding human TSPs further supports previous hypotheses that functions beyond the immune system are subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypotheses about behavioral diversity in great ape populations, such as the importance of variation in sleep cycles and risk sensitivity.

2021 ◽  
Author(s):  
Keila Velázquez-Arcelay ◽  
Mary Lauren Benton ◽  
John A. Capra

Abstract: Long-term balancing selection (LTBS) can maintain allelic variation at a locus over millions of years and through speciation events. Variants shared between species, hereafter “trans-species polymorphisms” (TSPs), often result from LTBS due to host-pathogen interactions. For instance, the major histocompatibility complex (MHC) locus contains TSPs present across primates. Several hundred TSPs have been identified in humans and chimpanzees; however, because many are in non-coding regions of the genome, the functions and adaptive roles for most TSPs remain unknown. We integrated diverse genomic annotations to explore the functions of 125 previously identified non-coding TSPs that are likely under LTBS since the common ancestor of humans and chimpanzees. We analyzed genome-wide functional assays, expression quantitative trait loci (eQTL), genome-wide association studies (GWAS), and phenome-wide association studies (PheWAS). We identify functional annotations for 119 TSP regions, including 71 with evidence of gene regulatory function from GTEx or genome-wide functional genomics data and 21 with evidence of trait association from GWAS and PheWAS. TSPs in humans associate with many immune system phenotypes, including response to pathogens, but we also find associations with a range of other phenotypes, including body mass, alcohol intake, urate levels, chronotype, and risk-taking behavior. The diversity of traits associated with non-coding human TSPs suggests that functions beyond the immune system are often subject to LTBS. Furthermore, several of these trait associations provide support and candidate genetic loci for previous hypotheses about behavioral diversity in great ape populations, such as the importance of variation in sleep cycles and risk sensitivity.
Significance statement: Most genetic variants present in human populations are young (<100,000 years old); however, a few hundred are millions of years old, with origins before the divergence of humans and chimpanzees. These trans-species polymorphisms (TSPs) were likely maintained by balancing selection, that is, evolutionary pressure to maintain genetic diversity at a locus. However, the functions driving this selection, especially for non-coding TSPs, are largely unknown. We integrate genome-wide annotation strategies to demonstrate TSP associations with immune system function, behavior (addiction, cognition, risky behavior), uric acid metabolism, and many other phenotypes. These results substantially expand our understanding of the functions of TSPs and suggest a substantial role for balancing selection beyond the immune system.


2020 ◽  
Vol 36 (9) ◽  
pp. 2936-2937 ◽  
Author(s):  
Gareth Peat ◽  
William Jones ◽  
Michael Nuhn ◽  
José Carlos Marugán ◽  
William Newell ◽  
...  

Abstract Motivation: Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore requires crossing the genetics results with functional data. Results: We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation: The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.


2011 ◽  
Vol 26 (S2) ◽  
pp. 1346-1346
Author(s):  
D. Benmessaoud ◽  
A.-M. Lepagnol-Bestel ◽  
M. Delepine ◽  
J. Hager ◽  
J.-M. Moalic ◽  
...  

Genome-wide association studies (GWAS) of schizophrenia (SZ) patients have identified common variants in ten genes, including SMARCA2 (Koga et al., HMG, 2009). We found that the SZ-GWAS genes are part of an interacting network centered on SMARCA2 (Loe-Mie et al., HMG, 2010). Furthermore, SMARCA2 was found to be disrupted in SZ (Walsh et al., Science, 2008). SMARCA2 encodes the ATPase (BRM) of the SWI/SNF chromatin remodeling complex, which is at the interface of genome and environmental adaptation. Taking advantage of an Algerian trio cohort of one hundred SZ patients (Benmessaoud et al., BMC Psychiatry, 2008), we replicated the association of SNP rs2296212, located in exon 33, which was previously reported as associated in the Koga et al. study and results in a D1546E amino acid change in the SMARCA2 protein. We studied SMARCA2 codons and found that exon 33 displays a signature of positive evolution in the primate lineage. Our working hypothesis is that the coding regions displaying positive selection are targets of novel rare variants. To address this question, we sequenced two exons displaying positive evolution and one exon without evidence of positive evolution. We found (i) that rare variants are significantly in excess in SZ patients compared to their parents (p = 0.038, Fisher test) and (ii) a higher proportion of rare variants in the primate-accelerated exons than in the exon without evidence of positive evolution in SZ patients (p = 0.032, Fisher test). SMARCA2 exon sequencing and whole exome sequencing of patients harboring the common variant rs2296212 are in progress. Altogether, these results are expected to give new insights into the genetic architecture of SZ.
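The two comparisons above are reported as Fisher tests on counts of rare variants. As a rough illustration of how such a test works on a 2×2 table, here is a minimal pure-Python sketch of a two-sided Fisher exact test. The function name and the toy counts are hypothetical and are not data from the study, whose exact tables and test sidedness are not given in the abstract.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution; the p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )

# Toy table (made-up counts): rows = group 1 / group 2,
# columns = carries a rare variant / does not.
p_value = fisher_exact_two_sided(3, 0, 0, 3)
```

In practice a statistics library would normally be used; the sketch only makes the underlying hypergeometric calculation explicit.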


2019 ◽  
Vol 35 (22) ◽  
pp. 4724-4729 ◽  
Author(s):  
Wujuan Zhong ◽  
Cassandra N Spracklen ◽  
Karen L Mohlke ◽  
Xiaojing Zheng ◽  
Jason Fine ◽  
...  

Abstract Summary: Tens of thousands of reproducibly identified GWAS (Genome-Wide Association Studies) variants, the vast majority of which fall in non-coding regions and yield no protein product, call urgently for mechanistic interpretations. Although numerous methods exist, there are few, if any, methods for simultaneously testing the mediation effects of multiple correlated SNPs via some mediator (e.g. the expression of a gene in the neighborhood) on phenotypic outcome. We propose the multi-SNP mediation intersection-union test (SMUT) to fill this methodological gap. Our extensive simulations demonstrate the validity of SMUT as well as substantial power gains, up to 92%, over alternative methods. In addition, SMUT confirmed known mediators in a real dataset of Finns for plasma adiponectin level, which were missed by many alternative methods. We believe SMUT will become a useful tool to generate mechanistic hypotheses underlying GWAS variants, facilitating functional follow-up. Availability and implementation: The R package SMUT is publicly available from CRAN at https://CRAN.R-project.org/package=SMUT. Supplementary information: Supplementary data are available at Bioinformatics online.
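The general idea behind an intersection-union test can be sketched briefly: mediation requires both that the SNPs affect the mediator and that the mediator affects the outcome, so the composite null is rejected only when every component null is rejected, and the overall p-value is the largest component p-value. The sketch below shows only that combination rule. The function name and the two component p-values are hypothetical, and the abstract does not say how SMUT computes its component tests.

```python
def intersection_union_p(component_p_values):
    """Combine component p-values under the intersection-union principle.

    The composite null hypothesis (at least one component effect is zero)
    is rejected only if every component null is rejected, so the overall
    p-value is the maximum of the component p-values.
    """
    p_values = list(component_p_values)
    if not p_values:
        raise ValueError("need at least one component p-value")
    return max(p_values)

# Hypothetical component results (not from the paper):
p_snps_to_mediator = 0.004     # joint effect of the SNPs on the mediator
p_mediator_to_outcome = 0.03   # effect of the mediator on the outcome
p_mediation = intersection_union_p([p_snps_to_mediator, p_mediator_to_outcome])
```

Taking the maximum makes the test conservative: strong evidence for one path cannot compensate for weak evidence on the other.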


2017 ◽  
Author(s):  
Clare Bycroft ◽  
Colin Freeman ◽  
Desislava Petkova ◽  
Gavin Band ◽  
Lloyd T. Elliott ◽  
...  

Abstract: The UK Biobank project is a large prospective cohort study of ~500,000 individuals from across the United Kingdom, aged between 40-69 at recruitment. A rich variety of phenotypic and health-related information is available on each participant, making the resource unprecedented in its size and scope. Here we describe the genome-wide genotype data (~805,000 markers) collected on all individuals in the cohort and its quality control procedures. Genotype data on this scale offers novel opportunities for assessing quality issues, although the wide range of ancestries of the individuals in the cohort also creates particular challenges. We also conducted a set of analyses that reveal properties of the genetic data (such as population structure and relatedness) that can be important for downstream analyses. In addition, we phased and imputed genotypes into the dataset, using computationally efficient methods combined with the Haplotype Reference Consortium (HRC) and UK10K haplotype resource. This increases the number of testable variants by over 100-fold to ~96 million variants. We also imputed classical allelic variation at 11 human leukocyte antigen (HLA) genes, and as a quality control check of this imputation, we replicate signals of known associations between HLA alleles and many common diseases. We describe tools that allow efficient genome-wide association studies (GWAS) of multiple traits and fast phenome-wide association studies (PheWAS), which work together with a new compressed file format that has been used to distribute the dataset. As a further check of the genotyped and imputed datasets, we performed a test-case genome-wide association scan on a well-studied human trait, standing height.


2019 ◽  
Author(s):  
James Boocock ◽  
Megan Leask ◽  
Yukinori Okada ◽  
Hirotaka Matsuo ◽  
Yusuke Kawamura ◽  
...  

Abstract: Serum urate is the end-product of purine metabolism. Elevated serum urate is causal of gout and a predictor of renal disease, cardiovascular disease and other metabolic conditions. Genome-wide association studies (GWAS) have reported dozens of loci associated with serum urate control; however, there has been little progress in understanding the molecular basis of the associated loci. Here we employed trans-ancestral meta-analysis using data from European and East Asian populations to identify ten new loci for serum urate levels. Genome-wide colocalization with cis-expression quantitative trait loci (eQTL) identified a further five new loci. By cis- and trans-eQTL colocalization analysis we identified 24 and 20 genes, respectively, where the causal eQTL variant has a high likelihood of being shared with the serum urate-associated locus. One new locus identified was SLC22A9, which encodes organic anion transporter 7 (OAT7). We demonstrate that OAT7 is a very weak urate-butyrate exchanger. Newly implicated genes identified in the eQTL analysis include those encoding proteins that make up the dystrophin complex, a scaffold for signaling proteins and transporters at the cell membrane; MLXIP that, with the previously identified MLXIPL, is a transcription factor that may regulate serum urate via the pentose-phosphate pathway; and MRPS7 and IDH2 that encode proteins necessary for mitochondrial function. Trans-ancestral functional fine-mapping identified six loci (RREB1, INHBC, HLF, UBE2Q2, SFMBT1, HNF4G) with colocalized eQTL that contained putative causal SNPs (posterior probability of causality > 0.8). This systematic analysis of serum urate GWAS loci has identified candidate causal genes at 19 loci and a network of previously unidentified genes likely involved in control of serum urate levels, further illuminating the molecular mechanisms of urate control.
Author Summary: High serum urate is a prerequisite for gout and a risk factor for metabolic disease. Previous GWAS have identified numerous loci that are associated with serum urate control; however, only a small handful of these loci have known molecular consequences. The majority of loci are within the non-coding regions of the genome, and it is therefore difficult to ascertain how these variants might influence serum urate levels without tangible links to gene expression and/or protein function. We have applied a novel bioinformatic pipeline in which we combined population-specific GWAS data with gene expression and genome connectivity information to identify putative causal genes for serum urate-associated loci. Overall, we identified 15 novel serum urate loci and show that these loci, along with previously identified loci, are linked to the expression of 44 genes. We show that some of the variants within these loci have strong predicted regulatory function, which can be further tested in functional analyses. This study expands on previous GWAS by identifying further loci implicated in serum urate control and new causal mechanisms supported by gene expression changes.


eLife ◽  
2018 ◽  
Vol 7 ◽  
Author(s):  
Mushtak Kisko ◽  
Nadia Bouain ◽  
Alaeddine Safi ◽  
Anna Medici ◽  
Robert C Akkers ◽  
...  

All living organisms require a variety of essential elements for their basic biological functions. While the homeostasis of nutrients is highly intertwined, the molecular and genetic mechanisms of these dependencies remain poorly understood. Here, we report a discovery of a molecular pathway that controls phosphate (Pi) accumulation in plants under Zn deficiency. Using genome-wide association studies, we first identified allelic variation of the Lyso-PhosphatidylCholine (PC) AcylTransferase 1 (LPCAT1) gene as the key determinant of shoot Pi accumulation under Zn deficiency. We then show that regulatory variation at the LPCAT1 locus contributes significantly to this natural variation and we further demonstrate that the regulation of LPCAT1 expression involves bZIP23 TF, for which we identified a new binding site sequence. Finally, we show that in Zn deficient conditions loss of function of LPCAT1 increases the phospholipid Lyso-PhosphatidylCholine/PhosphatidylCholine ratio, the expression of the Pi transporter PHT1;1, and that this leads to shoot Pi accumulation.


2019 ◽  
Vol 78 (8) ◽  
pp. 1062-1069 ◽  
Author(s):  
Tomoki Motegi ◽  
Yuta Kochi ◽  
Koichi Matsuda ◽  
Michiaki Kubo ◽  
Kazuhiko Yamamoto ◽  
...  

Objective: Although genome-wide association studies (GWAS) have identified approximately 100 loci for rheumatoid arthritis (RA), the disease mechanisms are not completely understood. We evaluated the pathogenesis of RA by focusing on rare coding variants. Methods: The coding regions of 98 candidate genes identified by GWAS were sequenced in 2294 patients with RA and 4461 controls in Japan. An association analysis was performed using cases and controls for variants, genes and domains of TYK2. Cytokine responses for two associated variants (R231W, rs201917359; and R703W, rs55882956) in TYK2 as well as a previously reported risk variant (P1104A, rs34536443) for multiple autoimmune diseases were evaluated by reporter assays. Results: A variant in TYK2 (R703W) showed a suggestive association (p=5.47×10⁻⁸, OR=0.48). We observed more accumulation of rare coding variants in controls in TYK2 (p=3.94×10⁻¹², OR=0.56). The four-point-one, ezrin, radixin, moesin (FERM; 2.14×10⁻³, OR=0.66) and pseudokinase domains (1.63×10⁻⁸, OR=0.52) of TYK2 also showed enrichment of variants in controls. R231W in the FERM domain especially reduced interleukin (IL)-6 and interferon (IFN)-γ signalling, whereas P1104A in the kinase domain reduced IL-12, IL-23 and IFN-α signalling. R703W in the pseudokinase domain reduced cytokine signals similarly to P1104A, but the effects were weaker than those of P1104A. Conclusions: The FERM and pseudokinase domains in TYK2 were associated with the risk of RA in the Japanese population. Variants in TYK2 had different effects on cytokine signalling, suggesting that the regulation of selective cytokine signalling is a target for RA treatment.


2016 ◽  
Vol 2016 ◽  
pp. 1-6 ◽  
Author(s):  
Tianjiao Zhang ◽  
Yang Hu ◽  
Xiaoliang Wu ◽  
Rui Ma ◽  
Qinghua Jiang ◽  
...  

Many disease-related single nucleotide polymorphisms (SNPs) have been inferred from genome-wide association studies (GWAS) in recent years. Numerous studies have shown that some SNPs located in protein-coding regions are associated with numerous diseases by affecting gene expression. However, in noncoding regions, the mechanism of how SNPs contribute to disease susceptibility remains unclear. Enhancer elements are functional segments of DNA located in noncoding regions that play an important role in regulating gene expression. The SNPs located in enhancer elements may affect gene expression and lead to disease. We presented a method for identifying liver cancer-related enhancer SNPs through integrating GWAS and histone modification ChIP-seq data. We identified 22 liver cancer-related enhancer SNPs, 9 of which were regulatory SNPs involved in distal transcriptional regulation. The results highlight that these enhancer SNPs may play important roles in liver cancer.
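The core integration step described above, finding GWAS SNPs that fall inside enhancer regions marked by histone-modification ChIP-seq peaks, amounts to an interval lookup. The sketch below is one simple way to do it, not the authors' pipeline. The function name, the SNP identifiers and the coordinates are hypothetical, and it assumes the peaks on each chromosome are non-overlapping half-open intervals `[start, end)`.

```python
from bisect import bisect_right

def snps_in_peaks(snps, peaks):
    """Return IDs of SNPs whose position lies inside any peak.

    snps:  iterable of (snp_id, chrom, pos)
    peaks: iterable of (chrom, start, end), half-open and assumed to be
           non-overlapping within each chromosome.
    """
    intervals_by_chrom = {}
    for chrom, start, end in peaks:
        intervals_by_chrom.setdefault(chrom, []).append((start, end))

    starts_by_chrom = {}
    for chrom, intervals in intervals_by_chrom.items():
        intervals.sort()
        starts_by_chrom[chrom] = [start for start, _ in intervals]

    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in intervals_by_chrom:
            continue
        # Index of the last peak that starts at or before the SNP position.
        i = bisect_right(starts_by_chrom[chrom], pos) - 1
        if i >= 0 and pos < intervals_by_chrom[chrom][i][1]:
            hits.append(snp_id)
    return hits

# Made-up example data, for illustration only.
gwas_snps = [("snp_a", "chr1", 150), ("snp_b", "chr1", 900), ("snp_c", "chr2", 50)]
enhancer_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 10, 40)]
enhancer_snps = snps_in_peaks(gwas_snps, enhancer_peaks)
```

Sorting the peaks once and using binary search keeps each lookup logarithmic in the number of peaks per chromosome; overlapping peaks would need to be merged first for the lookup to be correct.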

