cageminer: an R/Bioconductor package to prioritize candidate genes by integrating GWAS and gene coexpression networks

10.1101/2021.08.04.455037 ◽

2021 ◽

Author(s):

Fabricio Almeida-Silva ◽

Thiago M. Venancio

Keyword(s):

Candidate Genes ◽

Expression Profiles ◽

Association Studies ◽

Real Data ◽

R Package ◽

Genome Wide Association Studies ◽

High Confidence ◽

Data Set ◽

Gene Coexpression ◽

Coexpression Networks

Summary: Although genome-wide association studies (GWAS) identify variants associated with traits of interest, they often fail in identifying causative genes underlying a given phenotype. Integrating GWAS and gene coexpression networks can help prioritize high-confidence candidate genes, as the expression profiles of trait-associated genes can be used to mine novel candidates. Here, we present cageminer, the first R package to prioritize candidate genes through the integration of GWAS and coexpression networks. Genes are considered high-confidence candidates if they pass all three filtering criteria implemented in cageminer, namely physical proximity to SNPs, coexpression with known trait-associated genes, and significant changes in expression levels in conditions of interest. Prioritized candidates can also be scored and ranked to select targets for experimental validation. By applying cageminer to a real data set, we demonstrate that it can effectively prioritize candidates, leading to >99% reductions in candidate gene lists. Availability and implementation: The package is available at Bioconductor (http://bioconductor.org/packages/cageminer).
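The three filtering criteria named in the summary can be sketched as a small pipeline. This is a minimal illustration in Python, not cageminer's R API: the function names (`near_snp`, `pearson`, `prioritize`), the window size, the correlation cutoff, and the use of a precomputed set of differentially expressed genes are all assumptions made for the example, and the package itself may implement each step differently.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def near_snp(gene, snps, window):
    """True if any SNP on the gene's chromosome lies within `window` bp of the gene."""
    return any(
        chrom == gene.chrom and gene.start - window <= pos <= gene.end + window
        for chrom, pos in snps
    )


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def prioritize(genes, snps, expr, guides, de_genes, window=10_000, min_r=0.8):
    """Keep genes that pass all three filters (hypothetical, simplified stand-ins).

    1. physical proximity to a trait-associated SNP
    2. coexpression with at least one known trait-associated (guide) gene
    3. membership in a precomputed set of differentially expressed genes
    """
    kept = []
    for gene in genes:
        if not near_snp(gene, snps, window):
            continue
        if not any(
            pearson(expr[gene.name], expr[g]) >= min_r
            for g in guides
            if g != gene.name
        ):
            continue
        if gene.name not in de_genes:
            continue
        kept.append(gene.name)
    return kept
```

Each filter only ever shrinks the list, which is why a strict conjunction of the three can reduce a candidate set so sharply. Ranking the survivors would be a separate step.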

Comprehensive analysis and genome-wide association studies of biomass, chlorophyll, seed and salinity tolerance related traits in rice highlight genetic hotspots for crop improvement

10.1101/2020.12.24.424354 ◽

2020 ◽

Author(s):

Md Nafis Ul Alam ◽

G.M. Nurnabi Azad Jewel ◽

Tomalika Azim ◽

Zeba I. Seraj

Keyword(s):

Candidate Genes ◽

Sequence Data ◽

Expression Profiles ◽

Association Studies ◽

Crop Improvement ◽

Gene Expression Profiles ◽

Fixed Effect Model ◽

Specific Gene ◽

Genome Wide Association Studies ◽

Multiple Trait

Abstract: Farmland is on the decline and worldwide food security is at risk. Rice is the staple of choice for over half the Earth’s people. To sustain current demands and ensure a food-secure future, substandard farmland affected by abiotic stresses must be utilized. For rapid crop improvement, a broader understanding of polygenic traits like stress tolerance and crop yield is indispensable. To this end, the hidden diversity of resilient and neglected wild varieties must be traced back to their genetic roots. In this study, we separately assayed 15 phenotypes in a panel of 176 diverse accessions predominantly comprised of local landraces from Bangladesh. We compiled high-resolution sequence data for these accessions. We collectively studied the ties between the observed phenotypic differences and the examined additive genetic effects underlying these variations. We applied a sophisticated fixed effect model to associate phenotypes with genotypes on a genomic scale. Discovered QTLs were mapped to known genes. Candidate genes were sorted by tissue-specific gene expression profiles and protein-level consequences of existing polymorphisms. Our explorations yielded 17 QTLs related to various traits in multiple trait classes. Twelve of the identified QTLs were equivalent to findings from previous studies. Integrative analysis suggests novel functionality for 21 candidate genes on multiple evidence levels. These findings will open novel avenues for the bioengineering of high-yielding crops of the future fortified with genetic defenses against abiotic stressors.

Identification of candidate genes underlying nodulation-specific phenotypes in Medicago truncatula through integration of genome-wide association studies and co-expression networks

10.1101/392779 ◽

2018 ◽

Author(s):

Jean-Michel Michno ◽

Liana T. Burghardt ◽

Junqi Liu ◽

Joseph R. Jeffers ◽

Peter Tiffin ◽

...

Keyword(s):

Candidate Genes ◽

Medicago Truncatula ◽

Association Studies ◽

Expression Patterns ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

High Confidence ◽

Functional Relationships ◽

Genome Wide ◽

Mapping Information

Abstract: Genome-wide association studies (GWAS) have proven to be a valuable approach for identifying genetic intervals associated with phenotypic variation in Medicago truncatula. These intervals can vary in size, depending on the historical local recombination near each significant interval. Typically, significant intervals span numerous gene models, limiting the ability to resolve high-confidence candidate genes underlying the trait of interest. Additional genomic data, including gene co-expression networks, can be combined with the genetic mapping information to successfully identify candidate genes. Co-expression network analysis provides information about the functional relationships of each gene through its similarity of expression patterns to other well-defined clusters of genes. In this study, we integrated data from GWAS and co-expression networks to pinpoint candidate genes that may be associated with nodule-related phenotypes in Medicago truncatula. We further investigated a subset of these genes and confirmed that several had existing evidence linking them to nodulation, including MEDTR2G101090 (PEN3-like), a previously validated gene associated with nodule number.

Integrative analysis of genome-wide association studies and gene expression profiles identified candidate genes for osteoporosis in Kashin-Beck disease patients

Osteoporosis International ◽

10.1007/s00198-015-3364-y ◽

2015 ◽

Vol 27 (3) ◽

pp. 1041-1046 ◽

Cited By ~ 7

Author(s):

Y. Wen ◽

X. Guo ◽

J. Hao ◽

X. Xiao ◽

W. Wang ◽

...

Keyword(s):

Gene Expression ◽

Candidate Genes ◽

Expression Profiles ◽

Association Studies ◽

Gene Expression Profiles ◽

Integrative Analysis ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Beck Disease

deTS: tissue-specific enrichment analysis to decode tissue specificity

Bioinformatics ◽

10.1093/bioinformatics/btz138 ◽

2019 ◽

Vol 35 (19) ◽

pp. 3842-3845 ◽

Cited By ~ 8

Author(s):

Guangsheng Pei ◽

Yulin Dai ◽

Zhongming Zhao ◽

Peilin Jia

Keyword(s):

Expression Profiles ◽

Association Studies ◽

Gene Expression Profiles ◽

Enrichment Analysis ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Tissue Specific ◽

Genome Wide ◽

Specific Regulation

Abstract. Motivation: Diseases and traits are under dynamic tissue-specific regulation. However, heterogeneous tissues are often collected in biomedical studies, which reduces the power to identify disease-associated variants and gene expression profiles. Results: We present deTS, an R package to conduct tissue-specific enrichment analysis with two built-in reference panels. Statistical methods are developed and implemented for detecting tissue-specific genes and for enrichment tests of different forms of query data. Our applications using multi-trait genome-wide association studies data and cancer expression data showed that deTS could effectively identify the most relevant tissues for each query trait or sample, providing insights for future studies. Availability and implementation: https://github.com/bsml320/deTS and CRAN https://cran.r-project.org/web/packages/deTS/. Supplementary information: Supplementary data are available at Bioinformatics online.
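To make the idea of an enrichment test on a gene-list query concrete, here is a minimal Python sketch of a one-sided hypergeometric (Fisher-type) over-representation test. The abstract does not state which statistic deTS uses, so the test itself, the function names (`hypergeom_upper_tail`, `tissue_enrichment`), and the input layout are assumptions for illustration only.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K are tissue-specific."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def tissue_enrichment(query, tissue_sets, background):
    """Return {tissue: p-value} for over-representation of each tissue's genes in the query.

    query       -- set of query gene names
    tissue_sets -- dict mapping tissue name to its set of tissue-specific genes
    background  -- set of all genes considered (the universe)
    """
    query = query & background
    results = {}
    for tissue, genes in tissue_sets.items():
        genes = genes & background
        overlap = len(query & genes)
        results[tissue] = hypergeom_upper_tail(
            overlap, len(genes), len(query), len(background)
        )
    return results
```

The tissue with the smallest p-value is the one whose specific genes are most over-represented in the query. With many tissues, the p-values would normally be corrected for multiple testing.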

Interaction screening by Kendall’s partial correlation for ultrahigh-dimensional data with survival trait

Bioinformatics ◽

10.1093/bioinformatics/btaa017 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2763-2769

Author(s):

Jie-Huei Wang ◽

Yi-Hau Chen

Keyword(s):

Partial Correlation ◽

Association Studies ◽

B Cell Lymphoma ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Interaction Screening ◽

Relationship Of

Abstract. Motivation: In gene expression and genome-wide association studies, the identification of interaction effects is an important and challenging issue owing to its ultrahigh-dimensional nature. In particular, contaminated data and right-censored survival outcomes make the associated feature screening even more challenging. Results: In this article, we propose an inverse probability-of-censoring weighted Kendall’s tau statistic to measure association of a survival trait with biomarkers, as well as a Kendall’s partial correlation statistic to measure the relationship of a survival trait with an interaction variable conditional on the main effects. The Kendall’s partial correlation is then used to conduct interaction screening. Simulation studies under various scenarios are performed to compare the performance of our proposal with some commonly available methods. In the real data application, we utilize our proposed method to identify epistasis associated with the clinical survival outcomes of non-small-cell lung cancer, diffuse large B-cell lymphoma and lung adenocarcinoma patients. Both simulation and real data studies demonstrate that our method performs well and outperforms existing methods in identifying main and interaction biomarkers. Availability and implementation: R-package ‘IPCWK’ is available to implement this method, together with a reference manual describing how to use the ‘IPCWK’ package. Supplementary information: Supplementary data are available at Bioinformatics online.

Integrating Genome-Wide Association Analysis With Transcriptome Sequencing to Identify Candidate Genes Related to Blooming Time in Prunus mume

Frontiers in Plant Science ◽

10.3389/fpls.2021.690841 ◽

2021 ◽

Vol 12 ◽

Author(s):

Man Zhang ◽

Qingqing Yang ◽

Xi Yuan ◽

Xiaolan Yan ◽

Jia Wang ◽

...

Keyword(s):

Flowering Time ◽

Candidate Genes ◽

Expression Profiles ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Prunus Mume ◽

Woody Perennials ◽

Genome Wide ◽

Transcriptional Pattern

Prunus mume is one of the most important woody perennials for edible and ornamental use. Despite a substantial variation in the flowering phenology among the P. mume germplasm resources, the genetic control for flowering time remains to be elucidated. In this study, we examined five blooming time-related traits of 235 P. mume landraces for 2 years. Based on the phenotypic data, we performed genome-wide association studies, which included a combination of marker- and gene-based association tests, and identified 1,445 candidate genes that are consistently linked with flowering time across multiple years. Furthermore, we assessed the global transcriptome change of floral buds from the two P. mume cultivars exhibiting contrasting bloom dates and detected 617 associated genes that were differentially expressed during the flowering process. By integrating a co-expression network analysis, we screened out 191 gene candidates of conserved transcriptional pattern during blooming across cultivars. Finally, we validated the temporal expression profiles of these candidates and highlighted their putative roles in regulating floral bud break and blooming time in P. mume. Our findings are important to expand the understanding of flowering time control in woody perennials and will boost the molecular breeding of novel varieties in P. mume.

HiGwas: how to compute longitudinal GWAS data in population designs

Bioinformatics ◽

10.1093/bioinformatics/btaa294 ◽

2020 ◽

Vol 36 (14) ◽

pp. 4222-4224

Author(s):

Zhong Wang ◽

Nating Wang ◽

Zilu Wang ◽

Libo Jiang ◽

Yaqun Wang ◽

...

Keyword(s):

Data Analysis ◽

Complex Traits ◽

Association Studies ◽

Computer Software ◽

Real Data ◽

R Package ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Significance Level

Abstract. Summary: Genome-wide association studies (GWAS), particularly those designed with thousands and thousands of single-nucleotide polymorphisms (SNPs) (big p) genotyped on tens of thousands of subjects (small n), face a major challenge of p ≫ n. Although the integration of longitudinal information can significantly enhance a GWAS’s power to comprehend the genetic architecture of complex traits and diseases, an additional challenge is generated by an autocorrelative process. We have developed several statistical models for addressing these two challenges by implementing dimension reduction methods and longitudinal data analysis. To make these models computationally accessible to applied geneticists, we wrote an R package, HiGwas, designed to analyze longitudinal GWAS datasets. Functions in the package encompass single SNP analyses, significance-level adjustment, preconditioning and model selection for a high-dimensional set of SNPs. HiGwas provides the estimates of genetic parameters and the confidence intervals of these estimates. We demonstrate the features of HiGwas through real data analysis and a vignette document in the package. Availability and implementation: https://github.com/wzhy2000/higwas. Supplementary information: Supplementary data are available at Bioinformatics online.

Meta-QTLs, ortho-MQTLs and candidate genes for thermotolerance in wheat (Triticum aestivum L.)

10.21203/rs.3.rs-946920/v1 ◽

2021 ◽

Author(s):

Sourabh Kumar ◽

Vivudh Pratap Singh ◽

Dinesh Kumar Saini ◽

Hemant Sharma ◽

Gautam Saripalli ◽

...

Keyword(s):

Candidate Genes ◽

Association Studies ◽

Triticum Aestivum L ◽

Genome Wide Association Studies ◽

High Confidence ◽

Genome Wide ◽

Fold Reduction ◽

The Mean ◽

Golgi Protein ◽

Gdsl Lipase

Abstract: Meta-QTL analysis for thermotolerance in wheat was conducted to identify robust meta-QTLs (MQTLs). In this study, 441 QTLs related to 31 heat-responsive traits were projected on the consensus map saturated with 50,310 markers. This exercise resulted in the identification of 85 MQTLs with confidence intervals (CIs) ranging from 0.11 to 34.9 cM, with an average of 5.6 cM. This amounted to a 2.96-fold reduction relative to the mean CI (16.5 cM) of the QTLs used. Seventy-seven (77) of these MQTLs were also verified with the results of recent genome-wide association studies (GWAS). These MQTLs included seven MQTLs that are particularly useful for breeding purposes (we called them Breeders’ MQTLs). Seven ortho-MQTLs between wheat and rice genomes were also identified using synteny and collinearity. The MQTLs were used for the identification of 1,704 candidate genes (CGs). In silico expression analysis of these CGs permitted identification of 182 differentially expressed genes (DEGs), which included 36 high-confidence CGs with known functions previously reported to be important for thermotolerance. These high-confidence CGs encoded proteins belonging to the following families: protein kinase, WD40 repeat, glycosyltransferase, ribosomal protein, SNARE associated Golgi protein, GDSL lipase/esterase, SANT/Myb domain, K homology domain, etc. Thus, the present study resulted in the identification of MQTLs (including Breeders’ MQTLs), ortho-MQTLs, and underlying CGs, which could prove useful not only for molecular breeding for the development of thermotolerant wheat cultivars but also for future studies focused on understanding the molecular basis of thermotolerance.
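The fold reduction quoted in the abstract can be checked from the two rounded means it reports. This assumes the figure is the ratio of the mean QTL confidence interval to the mean MQTL confidence interval; the abstract does not spell out the calculation.

```latex
\[
\text{fold reduction}
  = \frac{\overline{\mathrm{CI}}_{\mathrm{QTL}}}{\overline{\mathrm{CI}}_{\mathrm{MQTL}}}
  \approx \frac{16.5\ \mathrm{cM}}{5.6\ \mathrm{cM}}
  \approx 2.95
\]
```

The small gap from the reported 2.96 is consistent with the authors having used unrounded means, although the abstract does not confirm this.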

Genetic Factors Associated With Nodulation and Nitrogen Derived From Atmosphere in a Middle American Common Bean Panel

Frontiers in Plant Science ◽

10.3389/fpls.2020.576078 ◽

2020 ◽

Vol 11 ◽

Author(s):

Atena Oladzad ◽

Abiezer González ◽

Raul Macchiavelli ◽

Consuelo Estevez de Jensen ◽

James Beaver ◽

...

Keyword(s):

Common Bean ◽

Candidate Genes ◽

Genetic Factors ◽

Association Studies ◽

Small Gtpase ◽

Grain Legume ◽

Genome Wide Association Studies ◽

Cellular Polarity ◽

Data Set ◽

Factors Associated

Among grain legume crops, common beans (Phaseolus vulgaris L.) are considered to have poor biological nitrogen (N2) fixation (BNF) capabilities, although variation in N2-fixing capabilities exists within the species. The availability of a genetic panel varying in BNF capacity and a large-scale single nucleotide polymorphism (SNP) data set for common bean provided an opportunity to discover genetic factors associated with N2 fixation among genotypes in the Middle American gene pool. Using nodulation and percentage of N2 derived from atmosphere (%NDFA) data collected from field trials, at least 11 genotypes with higher levels of BNF capacity were identified. Genome-wide association studies (GWASs) detected both major and minor effects that control these traits. A major nodulation interval at Pv06:28.0–28.27 Mbp was discovered. In this interval, the peak SNP was located within a small GTPase that positively regulates cellular polarity and growth of root hair tips. Located 20 kb upstream of this peak SNP is an auxin-responsive factor AUX/indole acetic auxin (IAA)-related gene involved in auxin transportation during root nodulation. For %NDFA, nitrate (NO3−) transporters, NRT1.2 and NRT1.7 (Pv02:8.64), squamosa promoter binding transcriptome factor (Pv08:28.42), and multi-antimicrobial extrusion protein (MATE) efflux family protein (Pv06:10.91) were identified as candidate genes. Three additional QTLs were identified on chromosomes Pv03:5.24, Pv09:25.89, and Pv11:32.89 Mbp. These key candidate genes from both traits were integrated with previous results on N2 fixation to describe a BNF pathway.

Identifying functional genes and pathways towards a unifying model for atrial fibrillation

10.1101/2021.09.20.21263861 ◽

2021 ◽

Author(s):

Sojin Youn Wass ◽

Erik J. Offerman ◽

Han Sun ◽

Jeffrey Hsu ◽

Julie H. Rennison ◽

...

Keyword(s):

Atrial Fibrillation ◽

Candidate Gene ◽

Candidate Genes ◽

Cell Injury ◽

Risk Model ◽

Association Studies ◽

Gene Set Enrichment Analysis ◽

Genome Wide Association Studies ◽

Regulatory Pathways ◽

Gene Coexpression

Abstract. Rationale: Genome-wide association studies (GWAS) have associated >100 genetic loci with atrial fibrillation (AF), yet the biological pathways of AF remain elusive. Objective: To determine candidate causal genes associated with AF risk loci and their coexpression partners, modules, biologic and mechanistic pathways. Methods and Results: Cis-expression quantitative trait loci (eQTLs) were identified for candidate genes near AF risk single nucleotide polymorphisms (SNPs) in human left atrial tissues. Genes were categorized into 3 sets according to likelihood of being a causative AF gene: 1) All Candidate Genes (with significant eQTLs or previously prioritized); 2) Any eQTL Genes (with ≥1 significant eQTL); and 3) Top GWAS SNP eQTL Genes (top SNP within the top 10 eQTL SNPs). Coexpression partners were identified for each candidate gene. Weighted gene coexpression network analysis (WGCNA) identified modules, including modules with overrepresentation of candidate AF genes. Ingenuity Pathway Analysis (IPA) was applied to the coexpression partners of each candidate gene, and IPA and gene set enrichment analysis (GSEA) to each WGCNA module. 166 AF-risk SNPs were located in 135 distinct loci. The All Candidate Genes group contained 233, the Any eQTL Genes group 131 (83 novel), and the Top GWAS SNP eQTL Genes group 37 genes. IPA identified mitochondrial dysfunction, oxidative stress, epithelial adherens junction signaling, and sirtuin signaling as the most frequent pathways. WGCNA characterized 64 modules; candidate AF genes were overrepresented in 8. Modules were represented by cell injury, death, stress, developmental, metabolic/mitochondrial, transcription/translation, and immune activation/inflammation regulatory pathways. Conclusions: AF candidate gene coexpression analyses suggest significant roles for cellular stress and remodeling in AF. We propose a dual risk model for AF: Genetic susceptibility to AF may not manifest until later in life, when cellular stressors overwhelm adaptive responses. These analyses provide a resource for further functional studies on potential causal AF genes.
