Integrating co-expression networks with GWAS to prioritize causal genes in maize

Abstract Background Genome-wide association studies (GWAS) have identified thousands of loci linked to hundreds of traits in many different species. However, because linkage disequilibrium implicates a broad region surrounding each identified locus, the causal genes often remain unknown. This problem is especially pronounced in non-human, non-model species where functional annotations are sparse and there is frequently little information available for prioritizing candidate genes. Results To address this issue, we developed a computational approach called Camoco (Co-Analysis of Molecular Components) that systematically integrates loci identified by GWAS with gene co-expression networks to prioritize putative causal genes. We applied Camoco to prioritize candidate genes from a large-scale GWAS examining the accumulation of 17 different elements in maize seeds. Camoco identified statistically significant subnetworks for the majority of traits examined, producing a prioritized list of high-confidence causal genes for several agronomically important maize traits. Two candidate genes identified by our approach were validated through analysis of mutant phenotypes. Strikingly, we observed a strong dependence in the performance of our approach on the type of co-expression network used: expression variation across genetically diverse individuals in a relevant tissue context (in our case, maize roots) outperformed other alternatives. Conclusions Our study demonstrates that co-expression networks can provide a powerful basis for prioritizing candidate causal genes from GWAS loci, but suggests that the success of such strategies can depend strongly on the gene expression data context. Both the Camoco software and the lessons on integrating GWAS data with co-expression networks generalize to species beyond maize.

Integrating comprehensive functional annotations to boost power and accuracy in gene-based association analysis

PLoS Genetics ◽

10.1371/journal.pgen.1009060 ◽

2020 ◽

Vol 16 (12) ◽

pp. e1009060

Author(s):

Corbin Quick ◽

Xiaoquan Wen ◽

Gonçalo Abecasis ◽

Michael Boehnke ◽

Hyun Min Kang

Keyword(s):

Association Analysis ◽

Association Studies ◽

Early Gene ◽

Genome Wide Association Studies ◽

Functional Annotations ◽

Regulatory Variants ◽

Causal Genes ◽

And Performance ◽

The UK ◽

Coding Variants

Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.

From Genotype to Phenotype: Through Chromatin

Genes ◽

10.3390/genes10020076 ◽

2019 ◽

Vol 10 (2) ◽

pp. 76 ◽

Cited By ~ 7

Author(s):

Julia Romanowska ◽

Anagha Joshi

Keyword(s):

Large Scale ◽

Association Studies ◽

Genetic Diseases ◽

Future Research ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Sequencing Technologies ◽

Complementary Approach ◽

Rare Genetic Diseases ◽

Causal Genes

Advances in sequencing technologies have enabled the exploration of the genetic basis for several clinical disorders by allowing identification of causal mutations in rare genetic diseases. Sequencing technology has also facilitated genome-wide association studies to gather single nucleotide polymorphisms in common diseases including cancer and diabetes. Sequencing has therefore become common in the clinic for both prognostics and diagnostics. The success in follow-up steps, i.e., mapping mutations to causal genes and therapeutic targets to further the development of novel therapies, has nevertheless been very limited. This is because most mutations associated with diseases lie in inter-genic regions including the so-called regulatory genome. Additionally, no genetic causes are apparent for many diseases including neurodegenerative disorders. A complementary approach is therefore gaining interest, namely to focus on epigenetic control of the disease to generate more complete functional genomic maps. To this end, several recent studies have generated large-scale epigenetic datasets in a disease context to form a link between genotype and phenotype. We focus on DNA methylation and important histone marks, where recent advances have been made thanks to technology improvements, cost effectiveness, and large meta-scale epigenome consortia efforts. We summarize recent studies unravelling the mechanistic understanding of epigenetic processes in disease development and progression. Moreover, we show how methodology advancements enable causal relationships to be established, and we pinpoint the most important issues to be addressed by future research.

Integrating Comprehensive Functional Annotations to Boost Power and Accuracy in Gene-Based Association Analysis

10.1101/732404 ◽

2019 ◽

Author(s):

Corbin Quick ◽

Xiaoquan Wen ◽

Gonçalo Abecasis ◽

Michael Boehnke ◽

Hyun Min Kang

Keyword(s):

Association Analysis ◽

Association Studies ◽

Early Gene ◽

Genome Wide Association Studies ◽

Functional Annotations ◽

Regulatory Variants ◽

Causal Genes ◽

And Performance ◽

The UK ◽

Coding Variants

Abstract Gene-based association tests aggregate genotypes across multiple variants for each gene, providing an interpretable gene-level analysis framework for genome-wide association studies (GWAS). Early gene-based test applications often focused on rare coding variants; a more recent wave of gene-based methods, e.g. TWAS, use eQTLs to interrogate regulatory associations. Regulatory variants are expected to be particularly valuable for gene-based analysis, since most GWAS associations to date are non-coding. However, identifying causal genes from regulatory associations remains challenging and contentious. Here, we present a statistical framework and computational tool to integrate heterogeneous annotations with GWAS summary statistics for gene-based analysis, applied with comprehensive coding and tissue-specific regulatory annotations. We compare power and accuracy identifying causal genes across single-annotation, omnibus, and annotation-agnostic gene-based tests in simulation studies and an analysis of 128 traits from the UK Biobank, and find that incorporating heterogeneous annotations in gene-based association analysis increases power and performance identifying causal genes.

Prioritizing Parkinson’s Disease genes using population-scale transcriptomic data

10.1101/231001 ◽

2017 ◽

Cited By ~ 2

Author(s):

Yang I Li ◽

Garrett Wong ◽

Jack Humphrey ◽

Towfique Raj

Keyword(s):

Parkinson's Disease ◽

Large Scale ◽

Late Onset ◽

Association Studies ◽

Disease Genes ◽

Genome Wide Association Studies ◽

Gene Associations ◽

Causal Genes ◽

Underlying Mechanisms

Abstract Genome-wide association studies (GWAS) have identified over 41 susceptibility loci associated with late-onset Parkinson’s Disease (PD), but identifying putative causal genes and the underlying mechanisms remains challenging. To address this, we leveraged large-scale transcriptomic datasets to prioritize genes that are likely to affect PD. We found 29 gene associations in peripheral monocytes, and 44 gene associations whose expression or differential splicing in prefrontal cortex is associated with PD. This includes many novel genes but also known associations such as MAPT, for which we found that variation in exon 3 splicing explains the common genetic association. Genes identified in our analyses are more likely to interact physically with known PD genes and belong to the same or related pathways including lysosomal and innate immune function. Overall, our study provides a strong foundation for further mechanistic studies that will elucidate the molecular drivers of PD.

Large-scale transcriptome-wide association study identifies new prostate cancer risk regions

10.1101/345736 ◽

2018 ◽

Author(s):

Nicholas Mancuso ◽

Simon Gayther ◽

Alexander Gusev ◽

Wei Zheng ◽

Kathryn L. Penney ◽

...

Keyword(s):

Prostate Cancer ◽

Association Study ◽

Large Scale ◽

Association Studies ◽

Probabilistic Approach ◽

Prostate Cancer Risk ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Causal Genes ◽

Credible Set

Abstract Although genome-wide association studies (GWAS) for prostate cancer (PrCa) have identified more than 100 risk regions, most of the risk genes at these regions remain largely unknown. Here, we integrate the largest PrCa GWAS (N=142,392) with gene expression measured in 45 tissues (N=4,458), including normal and tumor prostate, to perform a multi-tissue transcriptome-wide association study (TWAS) for PrCa. We identify 235 genes at 87 independent 1Mb regions associated with PrCa risk, 9 of which are regions with no genome-wide significant SNP within 2Mb. 24 genes are significant in TWAS only for alternative splicing models in prostate tumor, thus supporting the hypothesis of splicing driving risk for continued oncogenesis. Finally, we use a Bayesian probabilistic approach to estimate credible sets of genes containing the causal gene at a pre-defined level; this reduced the list of 235 associations to 120 genes in the 90% credible set. Overall, our findings highlight the power of integrating expression with PrCa GWAS to identify novel risk loci and prioritize putative causal genes at known risk loci.
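
The credible-set step mentioned in this abstract can be illustrated with a minimal sketch. It assumes that each gene in a region already has a posterior probability of being causal and that these probabilities sum to roughly 1 within the region; the abstract does not say how the authors compute them, so the function name `credible_set`, the gene labels, and the numbers below are all hypothetical.

```python
from typing import Dict, List, Tuple


def credible_set(posteriors: Dict[str, float], level: float = 0.90) -> Tuple[List[str], float]:
    """Return the smallest set of top-ranked genes whose posterior mass reaches `level`.

    `posteriors` maps gene name -> posterior probability of being causal.
    How those posteriors are obtained is outside the scope of this sketch.
    """
    # Rank genes from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    chosen: List[str] = []
    total = 0.0
    for gene, prob in ranked:
        if total >= level:
            break  # the requested coverage is already reached
        chosen.append(gene)
        total += prob
    return chosen, total


# Hypothetical region with four candidate genes.
example = {"GENE_A": 0.60, "GENE_B": 0.25, "GENE_C": 0.10, "GENE_D": 0.05}
genes, mass = credible_set(example, level=0.90)
print(genes, round(mass, 2))
```

Ranking first and then accumulating keeps the set as small as possible for the chosen coverage level, which is why a long list of associations can shrink considerably once a credible set is formed.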

The open targets post-GWAS analysis pipeline

Bioinformatics ◽

10.1093/bioinformatics/btaa020 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2936-2937 ◽

Cited By ~ 4

Author(s):

Gareth Peat ◽

William Jones ◽

Michael Nuhn ◽

José Carlos Marugán ◽

William Newell ◽

...

Keyword(s):

Drug Targets ◽

Gene Expression Regulation ◽

Association Studies ◽

Genome Wide Association Studies ◽

Protein Coding ◽

Data Resource ◽

Coding Regions ◽

Genome Wide ◽

Causal Genes ◽

Interactive Data

Abstract Motivation Genome-wide association studies (GWAS) are a powerful method to detect even weak associations between variants and phenotypes; however, many of the identified associated variants are in non-coding regions, and presumably influence gene expression regulation. Identifying potential drug targets, i.e. causal protein-coding genes, therefore, requires crossing the genetics results with functional data. Results We present a novel data integration pipeline that analyses GWAS results in the light of experimental epigenetic and cis-regulatory datasets, such as ChIP-Seq, Promoter-Capture Hi-C or eQTL, and presents them in a single report, which can be used for inferring likely causal genes. This pipeline was then fed into an interactive data resource. Availability and implementation The analysis code is available at www.github.com/Ensembl/postgap and the interactive data browser at postgwas.opentargets.io.

A meta-analysis of genome-wide association studies for average daily gain and lean meat percentage in two Duroc pig populations

BMC Genomics ◽

10.1186/s12864-020-07288-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Shenping Zhou ◽

Rongrong Ding ◽

Fanming Meng ◽

Xingwang Wang ◽

Zhanwei Zhuang ◽

...

Keyword(s):

Candidate Genes ◽

Growth And Development ◽

Genetic Architecture ◽

Association Studies ◽

Meta Analysis ◽

Average Daily Gain ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Daily Gain

Abstract Background Average daily gain (ADG) and lean meat percentage (LMP) are the main production performance indicators of pigs. Nevertheless, the genetic architecture of ADG and LMP is still elusive. Here, we conducted genome-wide association studies (GWAS) and meta-analysis for ADG and LMP in 3770 American and 2090 Canadian Duroc pigs. Results In the American Duroc pigs, one novel pleiotropic quantitative trait locus (QTL) on Sus scrofa chromosome 1 (SSC1) was identified to be associated with ADG and LMP, which spans 2.53 Mb (from 159.66 to 162.19 Mb). In the Canadian Duroc pigs, two novel QTLs on SSC1 were detected for LMP, which were situated in 3.86 Mb (from 157.99 to 161.85 Mb) and 555 kb (from 37.63 to 38.19 Mb) regions. The meta-analysis identified ten and 20 additional SNPs for ADG and LMP, respectively. Finally, four genes (PHLPP1, STC1, DYRK1B, and PIK3C2A) were detected to be associated with ADG and/or LMP. Further bioinformatics analysis showed that the candidate genes for ADG are mainly involved in bone growth and development, whereas the candidate genes for LMP mainly participated in adipose tissue and muscle tissue growth and development. Conclusions We performed GWAS and meta-analysis for ADG and LMP based on a large sample size consisting of two Duroc pig populations. One pleiotropic QTL that shared a 2.19 Mb haplotype block from 159.66 to 161.85 Mb on SSC1 was found to affect ADG and LMP in the two Duroc pig populations. Furthermore, the combination of single-population and meta-analysis of GWAS improved the efficiency of detecting additional SNPs for the analyzed traits. Our results provide new insights into the genetic architecture of ADG and LMP traits in pigs. Moreover, some significant SNPs associated with ADG and/or LMP in this study may be useful for marker-assisted selection in pig breeding.

A gene co-association network regulating gut microbial communities in a Duroc pig population

Microbiome ◽

10.1186/s40168-020-00994-8 ◽

2021 ◽

Vol 9 (1) ◽

Author(s):

Antonio Reverter ◽

Maria Ballester ◽

Pamela A. Alexandre ◽

Emilio Mármol-Sánchez ◽

Antoni Dalmau ◽

...

Keyword(s):

Microbial Communities ◽

Candidate Genes ◽

Relative Abundance ◽

Association Studies ◽

Single Gene ◽

Host Genome ◽

Genome Wide Association Studies ◽

Vaccine Response ◽

Microbiome Composition ◽

Host Genetic

Abstract Background Analyses of gut microbiome composition in livestock species have shown its potential to contribute to the regulation of complex phenotypes. However, little is known about the host genetic control over the gut microbial communities. In pigs, previous studies are based on classical “single-gene-single-trait” approaches and have evaluated the role of host genome controlling gut prokaryote and eukaryote communities separately. Results In order to determine the ability of the host genome to control the diversity and composition of microbial communities in healthy pigs, we undertook genome-wide association studies (GWAS) for 39 microbial phenotypes that included 2 diversity indexes, and the relative abundance of 31 bacterial and six commensal protist genera in 390 pigs genotyped for 70 K SNPs. The GWAS results were processed through a 3-step analytical pipeline comprising (1) association weight matrix; (2) regulatory impact factor; and (3) partial correlation and information theory. The inferred gene regulatory network comprised 3561 genes (within a 5 kb distance from a relevant SNP, P < 0.05) and 738,913 connections (SNP-to-SNP co-associations). Our findings highlight the complexity and polygenic nature of the pig gut microbial ecosystem. Prominent within the network were 5 regulators, PRDM15, STAT1, ssc-mir-371, SOX9 and RUNX2, which gathered 942, 607, 588, 284 and 273 connections, respectively. PRDM15 modulates the transcription of upstream regulators of WNT and MAPK-ERK signaling to safeguard naive pluripotency and regulates the production of Th1- and Th2-type immune response. The signal transducer STAT1 has long been associated with immune processes and was recently identified as a potential regulator of vaccine response to porcine reproductive and respiratory syndrome. The list of regulators was enriched for immune-related pathways, and the list of predicted targets includes candidate genes previously reported as associated with microbiota profile in pigs, mice and humans, such as SLIT3, SLC39A8, NOS1, IL1R2, DAB1, TOX3, SPP1, THSD7B, ELF2, PIANP, A2ML1, and IFNAR1. Moreover, we show the existence of host-genetic variants jointly associated with the relative abundance of butyrate producer bacteria and host performance. Conclusions Taken together, our results identified regulators, candidate genes, and mechanisms linked with microbiome modulation by the host. They further highlight the value of the proposed analytical pipeline to exploit pleiotropy and the crosstalk between bacteria and protists as significant contributors to host-microbiome interactions and identify genetic markers and candidate genes that can be incorporated in breeding programs to improve host-performance and microbial traits.

Common genetic variants with fetal effects on birth weight are enriched for proximity to genes implicated in rare developmental disorders

Human Molecular Genetics ◽

10.1093/hmg/ddab060 ◽

2021 ◽

Author(s):

Robin N Beaumont ◽

Isabelle K Mayne ◽

Rachel M Freathy ◽

Caroline F Wright

Keyword(s):

Birth Weight ◽

Statistical Power ◽

Developmental Disorders ◽

Association Studies ◽

Later Life ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Genome Wide ◽

Common Genetic Variants ◽

Causal Genes

Abstract Birth weight is an important factor in newborn survival; both low and high birth weights are associated with adverse later-life health outcomes. Genome-wide association studies (GWAS) have identified 190 loci associated with maternal or fetal effects on birth weight. Knowledge of the underlying causal genes is crucial to understand how these loci influence birth weight and the links between infant and adult morbidity. Numerous monogenic developmental syndromes are associated with birth weights at the extreme ends of the distribution. Genes implicated in those syndromes may provide valuable information to prioritize candidate genes at the GWAS loci. We examined the proximity of genes implicated in developmental disorders (DDs) to birth weight GWAS loci using simulations to test whether they fall disproportionately close to the GWAS loci. We found birth weight GWAS single nucleotide polymorphisms (SNPs) fall closer to such genes than expected both when the DD gene is the nearest gene to the birth weight SNP and also when examining all genes within 258 kb of the SNP. This enrichment was driven by genes causing monogenic DDs with dominant modes of inheritance. We found examples of SNPs in the intron of one gene marking plausible effects via different nearby genes, highlighting the closest gene to the SNP not necessarily being the functionally relevant gene. This is the first application of this approach to birth weight, which has helped identify GWAS loci likely to have direct fetal effects on birth weight, which could not previously be classified as fetal or maternal owing to insufficient statistical power.
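
The idea of testing proximity by simulation can be sketched as follows. This is a deliberately simplified illustration, not the authors' procedure: it treats the genome as a single linear coordinate, draws null SNPs uniformly at random, and uses made-up positions. The function names `fraction_near` and `enrichment_pvalue` are hypothetical; only the 258 kb window is taken from the abstract.

```python
import bisect
import random
from typing import List, Sequence, Tuple


def fraction_near(snps: Sequence[int], gene_positions: Sequence[int], window: int) -> float:
    """Fraction of SNPs with at least one gene position within `window` bp."""
    genes = sorted(gene_positions)
    near = 0
    for s in snps:
        # First gene at or beyond the left edge of the window around this SNP.
        i = bisect.bisect_left(genes, s - window)
        if i < len(genes) and genes[i] <= s + window:
            near += 1
    return near / len(snps)


def enrichment_pvalue(snps: Sequence[int], gene_positions: Sequence[int],
                      genome_length: int, window: int,
                      n_sim: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Compare the observed fraction with fractions from uniformly placed SNPs."""
    rng = random.Random(seed)
    observed = fraction_near(snps, gene_positions, window)
    hits = 0
    for _ in range(n_sim):
        simulated: List[int] = [rng.randrange(genome_length) for _ in snps]
        if fraction_near(simulated, gene_positions, window) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_sim + 1)


# Made-up coordinates: four SNPs that all sit close to one of three genes.
dd_genes = [1_000_000, 5_000_000, 9_000_000]
gwas_snps = [1_000_100, 5_000_200, 8_999_900, 1_000_300]
obs, p = enrichment_pvalue(gwas_snps, dd_genes, genome_length=100_000_000,
                           window=258_000, n_sim=200)
print(obs, p)
```

A realistic null would usually match simulated SNPs on properties such as local gene density or allele frequency; uniform placement is used here only to keep the sketch short.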

Optimized permutation testing for information theoretic measures of multi-gene interactions

BMC Bioinformatics ◽

10.1186/s12859-021-04107-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

James M. Kunert-Graf ◽

Nikita A. Sakhanenko ◽

David J. Galas

Keyword(s):

Large Scale ◽

Permutation Test ◽

Association Studies ◽

Genome Wide Association Studies ◽

Permutation Testing ◽

Exact Test ◽

Information Theoretic ◽

Information Theoretic Measures ◽

Full Analysis ◽

Computational Bottleneck

Abstract Background Permutation testing is often considered the “gold standard” for multi-test significance analysis, as it is an exact test requiring few assumptions about the distribution being computed. However, it can be computationally very expensive, particularly in its naive form in which the full analysis pipeline is re-run after permuting the phenotype labels. This can become intractable in multi-locus genome-wide association studies (GWAS), in which the number of potential interactions to be tested is combinatorially large. Results In this paper, we develop an approach for permutation testing in multi-locus GWAS, specifically focusing on SNP–SNP-phenotype interactions using multivariable measures that can be computed from frequency count tables, such as those based in Information Theory. We find that the computational bottleneck in this process is the construction of the count tables themselves, and that this step can be eliminated at each iteration of the permutation testing by transforming the count tables directly. This leads to a speed-up by a factor of over 10^3 for a typical permutation test compared to the naive approach. Additionally, this approach is insensitive to the number of samples, making it suitable for datasets with large numbers of samples. Conclusions The proliferation of large-scale datasets with genotype data for hundreds of thousands of individuals enables new and more powerful approaches for the detection of multi-locus genotype-phenotype interactions. Our approach significantly improves the computational tractability of permutation testing for these studies. Moreover, our approach is insensitive to the large number of samples in these modern datasets. The code for performing these computations and replicating the figures in this paper is freely available at https://github.com/kunert/permute-counts.
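
For orientation, the naive baseline that this abstract describes (rebuild the count table and recompute the statistic after every shuffle of the phenotype labels) might look like the sketch below. It is not the authors' optimized method, which transforms the count tables directly, and it uses a single SNP rather than SNP–SNP pairs to stay short. The function names and the toy data are hypothetical.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence, Tuple


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in mutual information (in nats) computed from a frequency count table."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # the count table; rebuilding it dominates the cost
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # (c/n) * log( (c/n) / ((px/n) * (py/n)) )
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def naive_permutation_pvalue(genotypes: Sequence[int], phenotypes: Sequence[int],
                             n_perm: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Empirical p-value obtained by shuffling phenotype labels and recomputing MI."""
    rng = random.Random(seed)
    observed = mutual_information(genotypes, phenotypes)
    shuffled = list(phenotypes)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute labels, then rebuild the table from scratch
        if mutual_information(genotypes, shuffled) >= observed:
            hits += 1
    # Add-one correction so the empirical p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: the phenotype is fully determined by carrying at least one minor allele.
toy_genotypes = [0, 1, 2] * 20
toy_phenotypes = [1 if g > 0 else 0 for g in toy_genotypes]
mi, p = naive_permutation_pvalue(toy_genotypes, toy_phenotypes, n_perm=200)
print(round(mi, 3), p)
```

Every permutation here pays the full cost of recounting all samples, which is the bottleneck the abstract identifies.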
