Integration of methylation QTL and enhancer–target gene maps with schizophrenia GWAS summary results identifies novel genes

2019 ◽  
Vol 35 (19) ◽  
pp. 3576-3583 ◽  
Author(s):  
Chong Wu ◽  
Wei Pan

Abstract. Motivation: Most trait-associated genetic variants identified in genome-wide association studies (GWASs) are located in non-coding regions of the genome and thought to act through their regulatory roles. Results: To account for enriched association signals in DNA regulatory elements, we propose a novel and general gene-based association testing strategy that integrates enhancer-target gene pairs and methylation quantitative trait locus data with GWAS summary results; it aims to both boost statistical power for new discoveries and enhance mechanistic interpretability of any new discovery. By reanalyzing two large-scale schizophrenia GWAS summary datasets, we demonstrate that the proposed method could identify some significant and novel genes (containing no genome-wide significant SNPs nearby) that would have been missed by other competing approaches, including the standard and some integrative gene-based association methods, such as one incorporating enhancer-target gene pairs and one integrating expression quantitative trait loci. Availability and implementation: Software: wuchong.org/egmethyl.html. Supplementary information: Supplementary data are available at Bioinformatics online.

2018 ◽  
Vol 35 (14) ◽  
pp. 2512-2514 ◽  
Author(s):  
Bongsong Kim ◽  
Xinbin Dai ◽  
Wenchao Zhang ◽  
Zhaohong Zhuang ◽  
Darlene L Sanchez ◽  
...  

Abstract. Summary: We present GWASpro, a high-performance web server for the analyses of large-scale genome-wide association studies (GWAS). GWASpro was developed to provide data analyses for large-scale molecular genetic data, coupled with complex replicated experimental designs such as found in plant science investigations and to overcome the steep learning curves of existing GWAS software tools. GWASpro supports building complex design matrices, by which complex experimental designs that may include replications, treatments, locations and times, can be accounted for in the linear mixed model. GWASpro is optimized to handle GWAS data that may consist of up to 10 million markers and 10 000 samples from replicable lines or hybrids. GWASpro provides an interface that significantly reduces the learning curve for new GWAS investigators. Availability and implementation: GWASpro is freely available at https://bioinfo.noble.org/GWASPRO. Supplementary information: Supplementary data are available at Bioinformatics online.


2021 ◽  
Vol 12 ◽  
Author(s):  
Minglong Cai ◽  
Tao Yuan ◽  
He Huang ◽  
Lan Gui ◽  
Li Zhang ◽  
...  

Vitiligo is a multifactorial polygenic disorder, characterized by acquired depigmented skin and overlying hair resulting from the destruction of melanocytes. Genome-wide association studies (GWASs) of vitiligo have identified approximately 100 genetic variants. However, the identification of functional genes and their regulatory elements remains a challenge. To prioritize putative functional genes and DNA methylation (DNAm) sites, we performed summary data-based Mendelian randomization (SMR) and heterogeneity in dependent instruments (HEIDI) tests to integrate omics summary statistics from large-sample GWAS, expression quantitative trait locus (eQTL), and methylation quantitative trait locus (meQTL) analyses. By integrating omics data, we identified two novel putative functional genes (SPATA2L and CDK10) associated with vitiligo and further validated CDK10 by qRT-PCR in independent samples. We also identified 17 vitiligo-associated DNAm sites in Chr16, of which cg05175606 was significantly associated with the expression of CDK10 and vitiligo. Colocalization analyses detected CDK10 transcripts in the blood and skin colocalizing with cg05175606 at single nucleotide polymorphism (SNP) rs77651727. Our findings revealed that a shared genetic variant, rs77651727, alters cg05175606 methylation, up-regulates gene expression of CDK10, and in turn decreases the risk of vitiligo.


BMC Cancer ◽  
2020 ◽  
Vol 20 (1) ◽  
Author(s):  
Stephen Cristiano ◽  
David McKean ◽  
Jacob Carey ◽  
Paige Bracci ◽  
Paul Brennan ◽  
...  

Abstract. Background: Germline copy number variants (CNVs) increase risk for many diseases, yet detection of CNVs and quantifying their contribution to disease risk in large-scale studies is challenging due to biological and technical sources of heterogeneity that vary across the genome within and between samples. Methods: We developed an approach called CNPBayes to identify latent batch effects in genome-wide association studies involving copy number, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. Results: Applying a hidden Markov model (HMM) to identify CNVs in a large multi-site Pancreatic Cancer Case Control study (PanC4) of 7598 participants, we found CNV inference was highly sensitive to technical noise that varied appreciably among participants. Applying CNPBayes to this dataset, we found that the major sources of technical variation were linked to sample processing by the centralized laboratory and not the individual study sites. Modeling the latent batch effects at each CNV region hierarchically, we developed probabilistic estimates of copy number that were directly incorporated in a Bayesian regression model for pancreatic cancer risk. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the tumor oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). Conclusions: Laboratory effects may not account for the major sources of technical variation in genome-wide association studies. This study provides a robust Bayesian inferential framework for identifying latent batch effects, estimating copy number, and evaluating the role of copy number in heritable diseases.


Genes ◽  
2021 ◽  
Vol 12 (8) ◽  
pp. 1175
Author(s):  
Amarni L. Thomas ◽  
Judith Marsman ◽  
Jisha Antony ◽  
William Schierding ◽  
Justin M. O’Sullivan ◽  
...  

The RUNX1/AML1 gene encodes a developmental transcription factor that is an important regulator of haematopoiesis in vertebrates. Genetic disruptions to the RUNX1 gene are frequently associated with acute myeloid leukaemia. Gene regulatory elements (REs), such as enhancers located in non-coding DNA, are likely to be important for Runx1 transcription. Non-coding elements that modulate Runx1 expression have been investigated over several decades, but how and when these REs function remains poorly understood. Here we used bioinformatic methods and functional data to characterise the regulatory landscape of vertebrate Runx1. We identified REs that are conserved between human and mouse, many of which produce enhancer RNAs in diverse tissues. Genome-wide association studies detected single nucleotide polymorphisms in REs, some of which correlate with gene expression quantitative trait loci in tissues in which the RE is active. Our analyses also suggest that REs can be variant in haematological malignancies. In summary, our analysis identifies features of the RUNX1 regulatory landscape that are likely to be important for the regulation of this gene in normal and malignant haematopoiesis.


Author(s):  
Le Wang ◽  
Fei Sun ◽  
Zi Yi Wan ◽  
Baoqing Ye ◽  
Yanfei Wen ◽  
...  

Abstract: Resolving the genomic basis underlying phenotypic variations is a question of great importance in evolutionary biology. However, understanding how genotypes determine the phenotypes is still challenging. Centuries of artificial selective breeding for beauty and aggression resulted in a plethora of colors, long fin varieties, and hyper-aggressive behavior in the air-breathing Siamese fighting fish (Betta splendens), supplying an excellent system for studying the genomic basis of phenotypic variations. Combining whole genome sequencing, QTL mapping, genome-wide association studies and genome editing, we investigated the genomic basis of huge morphological variation in fins and striking differences in coloration in the fighting fish. Results revealed that the double tail, elephant ear, albino and fin spot mutants were each determined by single major-effect loci. The elephant ear phenotype was likely related to differential expression of a potassium ion channel gene, kcnh8. The albinotic phenotype was likely linked to a cis-regulatory element acting on the mitfa gene, and the double tail mutant was suggested to be caused by a deletion in a zic1/zic4 co-enhancer. Our data highlight that major loci and cis-regulatory elements play important roles in bringing about phenotypic innovations and establish Bettas as a powerful new model to study the genomic basis of evolved changes.


2012 ◽  
Vol 215 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Georg Homuth ◽  
Alexander Teumer ◽  
Uwe Völker ◽  
Matthias Nauck

The metabolome, defined as the reflection of metabolic dynamics derived from parameters measured primarily in easily accessible body fluids such as serum, plasma, and urine, can be considered as the omics data pool that is closest to the phenotype because it integrates genetic influences as well as nongenetic factors. Metabolic traits can be related to genetic polymorphisms in genome-wide association studies, enabling the identification of underlying genetic factors, as well as to specific phenotypes, resulting in the identification of metabolome signatures primarily caused by nongenetic factors. Similarly, correlation of metabolome data with transcriptional and/or proteome profiles of blood cells also produces valuable data, by revealing associations between metabolic changes and mRNA and protein levels. In recent years, progress in correlating genetic variation with metabolome profiles has been especially impressive. This review will therefore try to summarize the most important of these studies and give an outlook on future developments.


2018 ◽  
Author(s):  
Doug Speed ◽  
David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First, we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next, we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.


2018 ◽  
Author(s):  
Zhou Shaoqun ◽  
Karl A. Kremling ◽  
Bandillo Nonoy ◽  
Richter Annett ◽  
Ying K. Zhang ◽  
...  

One Sentence Summary: HPLC-MS metabolite profiling of maize seedlings, in combination with genome-wide association studies, identifies numerous quantitative trait loci that influence the accumulation of foliar metabolites. Abstract: Cultivated maize (Zea mays) retains much of the genetic and metabolic diversity of its wild ancestors. Non-targeted HPLC-MS metabolomics using a diverse panel of 264 maize inbred lines identified a bimodal distribution in the prevalence of foliar metabolites. Although 15% of the detected mass features were present in >90% of the inbred lines, the majority were found in <50% of the samples. Whereas leaf bases and tips were differentiated primarily by flavonoid abundance, maize varieties (stiff-stalk, non-stiff-stalk, tropical, sweet corn, and popcorn) were differentiated predominantly by benzoxazinoid metabolites. Genome-wide association studies (GWAS), performed for 3,991 mass features from the leaf tips and leaf bases, showed that 90% have multiple significantly associated loci scattered across the genome. Several quantitative trait locus hotspots in the maize genome regulate the abundance of multiple, often metabolically related mass features. The utility of maize metabolite GWAS was demonstrated by confirming known benzoxazinoid biosynthesis genes, as well as by mapping isomeric variation in the accumulation of phenylpropanoid hydroxycitric acid esters to a single linkage block in a citrate synthase-like gene. Similar to gene expression databases, this metabolomic GWAS dataset constitutes an important public resource for linking maize metabolites with biosynthetic and regulatory genes.


PLoS Genetics ◽  
2021 ◽  
Vol 17 (1) ◽  
pp. e1009315
Author(s):  
Ardalan Naseri ◽  
Junjie Shi ◽  
Xihong Lin ◽  
Shaojie Zhang ◽  
Degui Zhi

Inference of relationships from whole-genome genetic data of a cohort is a crucial prerequisite for genome-wide association studies. Typically, relationships are inferred by computing the kinship coefficients (ϕ) and the genome-wide probability of zero IBD sharing (π0) among all pairs of individuals. Current leading methods are based on pairwise comparisons, which may not scale up to very large cohorts (e.g., sample size >1 million). Here, we propose an efficient relationship inference method, RAFFI. RAFFI first leverages the efficient RaPID method to call IBD segments, then estimates ϕ and π0 from the detected IBD segments. This inference is achieved by a data-driven approach that adjusts the estimation based on phasing quality and genotyping quality. Using simulations, we showed that RAFFI is robust against phasing/genotyping errors, admixture events, and varying marker densities, and achieves higher accuracy compared to KING, the current leading method, especially for more distant relatives. When applied to the phased UK Biobank data with ~500K individuals, RAFFI is approximately 18 times faster than KING. We expect RAFFI will offer fast and accurate relatedness inference for even larger cohorts.
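
As a rough illustration of how ϕ and π0 can be read off IBD segments, the sketch below uses the textbook relationships rather than RAFFI's own estimator, which additionally adjusts for phasing and genotyping quality. It assumes each segment is a haplotype-level IBD segment between one haplotype of each individual, given as (start, end) on a single concatenated genome coordinate; the function names and input layout are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def kinship_and_pi0(segments, genome_length):
    """Naive estimates of the kinship coefficient and zero-IBD proportion.

    Assumption: no correction for phasing or genotyping errors is applied.
    """
    # Each haplotype-level segment covers one of the four haplotype pairs,
    # so it contributes length / (4 * genome_length) to the kinship coefficient.
    total_shared = sum(end - start for start, end in segments)
    phi = total_shared / (4 * genome_length)
    # pi0 is the fraction of the genome not covered by any IBD segment.
    covered = sum(end - start for start, end in merge_intervals(segments))
    pi0 = 1 - covered / genome_length
    return phi, pi0


# Parent-child-like pair: one haplotype pair shared along the whole genome.
phi_pc, pi0_pc = kinship_and_pi0([(0, 100)], genome_length=100)
```

For the parent-child-like input, this gives ϕ = 0.25 and π0 = 0, the values expected for first-degree relatives who share exactly one haplotype everywhere.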


2020 ◽  
Author(s):  
Katherina C. Chua ◽  
Chenling Xiong ◽  
Carol Ho ◽  
Taisei Mushiroda ◽  
Chen Jiang ◽  
...  

Abstract: Microtubule targeting agents (MTAs) are anticancer therapies commonly prescribed for breast cancer and other solid tumors. Sensory peripheral neuropathy (PN) is the major dose-limiting toxicity for MTAs and can limit clinical efficacy. The current pharmacogenomic study aimed to identify genetic variations that explain patient susceptibility and drive mechanisms underlying development of MTA-induced PN. A meta-analysis of genome-wide association studies (GWAS) from two clinical cohorts treated with MTAs (CALGB 40502 and CALGB 40101) was conducted using a Cox regression model with cumulative dose to first instance of grade 2 or higher PN. Summary statistics from a GWAS of European subjects (n = 469) in CALGB 40502 that estimated cause-specific risk of PN were meta-analyzed with those from a previously published GWAS of European ancestry (n = 855) from CALGB 40101 that estimated the risk of PN. Novel single nucleotide polymorphisms in an enhancer region downstream of sphingosine-1-phosphate receptor 1 (S1PR1 encoding S1PR1; e.g., rs74497159, β_CALGB40101 per allele log hazard ratio (95% CI) = 0.591 (0.254 - 0.928), β_CALGB40502 per allele log hazard ratio (95% CI) = 0.693 (0.334 - 1.053); P_META = 3.62×10⁻⁷) were the most highly ranked associations based on P-values with risk of developing grade 2 and higher PN. In silico functional analysis identified multiple regulatory elements and potential enhancer activity for S1PR1 within this genomic region. Inhibition of S1PR1 function in iPSC-derived human sensory neurons shows partial protection against paclitaxel-induced neurite damage. These pharmacogenetic findings further support ongoing clinical evaluations to target S1PR1 as a therapeutic strategy for prevention and/or treatment of MTA-induced neuropathy.
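
The abstract does not state which meta-analysis model was used. As a minimal sketch, assuming a fixed-effect inverse-variance model and standard errors back-calculated from the reported 95% confidence intervals, the two per-allele log hazard ratios for rs74497159 can be combined as follows. The function names are illustrative and do not come from the study.

```python
import math


def se_from_ci(lower, upper, z=1.959964):
    """Back-calculate a standard error from a symmetric 95% confidence interval."""
    return (upper - lower) / (2 * z)


def fixed_effect_meta(estimates):
    """Inverse-variance fixed-effect meta-analysis of (beta, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1 / total)
    z = beta / se
    # Two-sided P-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return beta, se, z, p


# Per-allele log hazard ratios and 95% CIs as reported in the abstract.
cohorts = [
    (0.591, se_from_ci(0.254, 0.928)),  # CALGB 40101
    (0.693, se_from_ci(0.334, 1.053)),  # CALGB 40502
]
beta_meta, se_meta, z_meta, p_meta = fixed_effect_meta(cohorts)
print(f"beta = {beta_meta:.3f}, SE = {se_meta:.3f}, P = {p_meta:.2e}")
```

Under these assumptions the combined P-value falls in the same order of magnitude as the reported P_META; an exact match is not expected, since the confidence intervals are rounded and the study's actual model may differ.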

