CoMM: A Collaborative Mixed Model That Integrates GWAS and eQTL Data Sets to Investigate the Genetic Architecture of Complex Traits

Genome-wide association study (GWAS) analyses have identified thousands of associations between genetic variants and complex traits. However, it is still a challenge to uncover the mechanisms underlying the association. With the growing availability of transcriptome data sets, it has become possible to perform statistical analyses targeted at identifying influential genes whose expression levels correlate with the phenotype. Methods such as PrediXcan and transcriptome-wide association study (TWAS) use the transcriptome data set to fit a predictive model for gene expression, with genetic variants as covariates. The gene expression levels for the GWAS data set are then ‘imputed’ using the prediction model, and the imputed expression levels are tested for their association with the phenotype. These methods fail to account for the uncertainty in the GWAS imputation step, and we propose a collaborative mixed model (CoMM) that addresses this limitation by jointly modelling the multiple analysis steps. We illustrate CoMM’s ability to identify relevant genes in the Northern Finland Birth Cohort 1966 data set and extend the model to handle the more widely available GWAS summary statistics.
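
The two-stage procedure that this abstract describes for PrediXcan and TWAS can be sketched in a few lines. This is a minimal illustration, not CoMM itself and not the code of either tool: ridge regression is swapped in here as the expression predictor purely for simplicity, and the names `ridge_fit`, `two_stage_test`, and the simulated effect sizes are all hypothetical. The sketch also shows where the uncertainty criticised above is lost: stage 2 treats the imputed expression as if it were observed.

```python
import math
import random


def ridge_fit(X, y, lam):
    """Solve (X'X + lam*I) w = X'y by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w


def predict(X, w):
    """Linear prediction: one value per row of X."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]


def simple_regression(x, y):
    """Slope and its standard error from regressing y on x with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    ssr = sum(((yi - my) - slope * (xi - mx)) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(ssr / (n - 2) / sxx)


def two_stage_test(eqtl_geno, eqtl_expr, gwas_geno, gwas_pheno, lam=1.0):
    # Stage 1: learn SNP weights for expression in the transcriptome data set.
    weights = ridge_fit(eqtl_geno, eqtl_expr, lam)
    # Stage 2: impute expression in the GWAS data set and test it against the
    # phenotype. The imputed values are treated as fixed, so the estimation
    # error in `weights` is ignored at this point.
    imputed = predict(gwas_geno, weights)
    slope, se = simple_regression(imputed, gwas_pheno)
    return weights, slope, slope / se


if __name__ == "__main__":
    rng = random.Random(0)
    true_w = [0.8, 0.0, -0.5]  # hypothetical SNP effects on expression

    def genotypes(n):
        return [[rng.choice([0, 1, 2]) for _ in true_w] for _ in range(n)]

    eqtl_geno, gwas_geno = genotypes(100), genotypes(500)
    eqtl_expr = [e + rng.gauss(0, 1) for e in predict(eqtl_geno, true_w)]
    gwas_pheno = [0.3 * e + rng.gauss(0, 1) for e in predict(gwas_geno, true_w)]
    weights, slope, z = two_stage_test(eqtl_geno, eqtl_expr, gwas_geno, gwas_pheno)
    print("weights:", weights, "slope:", slope, "z:", z)
```

A joint model such as the one proposed in the abstract would instead estimate the expression weights and the expression-phenotype effect together, so that the uncertainty in the weights propagates into the test.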

Genome-wide association study identifies 48 common genetic variants associated with handedness

10.1101/831321 ◽

2019 ◽

Author(s):

Gabriel Cuellar Partida ◽

Joyce Y Tung ◽

Nicholas Eriksson ◽

Eva Albrecht ◽

Fazil Aliev ◽

...

Keyword(s):

Association Study ◽

Genetic Variants ◽

Complex Traits ◽

Genome Wide Association Study ◽

Genetic Correlations ◽

Genome Wide Association ◽

Left Handedness ◽

Left Handed ◽

Genome Wide ◽

Common Genetic Variants

Abstract Handedness, a consistent asymmetry in skill or use of the hands, has been studied extensively because of its relationship with language and the over-representation of left-handers in some neurodevelopmental disorders. Using data from the UK Biobank, 23andMe and 32 studies from the International Handedness Consortium, we conducted the world’s largest genome-wide association study of handedness (1,534,836 right-handed, 194,198 (11.0%) left-handed and 37,637 (2.1%) ambidextrous individuals). We found 41 genetic loci associated with left-handedness and seven associated with ambidexterity at genome-wide levels of significance (P < 5×10⁻⁸). Tissue enrichment analysis implicated the central nervous system and brain tissues including the hippocampus and cerebrum in the etiology of left-handedness. Pathways including regulation of microtubules, neurogenesis, axonogenesis and hippocampus morphology were also highlighted. We found suggestive positive genetic correlations between being left-handed and some neuropsychiatric traits including schizophrenia and bipolar disorder. SNP heritability analyses indicated that additive genetic effects of genotyped variants explained 5.9% (95% CI = 5.8% – 6.0%) of the underlying liability of being left-handed, while the narrow sense heritability was estimated at 12% (95% CI = 7.2% – 17.7%). Further, we show that genetic correlation between left-handedness and ambidexterity is low (rg = 0.26; 95% CI = 0.08 – 0.43) implying that these traits are largely influenced by different genetic mechanisms. In conclusion, our findings suggest that handedness, like many other complex traits, is highly polygenic, and that the genetic variants that predispose to left-handedness may underlie part of the association with some psychiatric disorders that has been observed in multiple observational studies.

P1-229: Genome-Wide Association Study of Brain Gene Expression Levels (eGWAS)

Alzheimer's & Dementia ◽

10.1016/j.jalz.2011.05.509 ◽

2011 ◽

Vol 7 ◽

pp. S184-S184

Author(s):

Nilufer Ertekin-Taner ◽

Fanggeng Zou ◽

High Chai ◽

Curtis Younkin ◽

Julia Crook ◽

...

Keyword(s):

Gene Expression ◽

Association Study ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Expression Levels ◽

Brain Gene Expression ◽

Genome Wide ◽

Gene Expression Levels

Improved Prediction of Bacterial Genotype-Phenotype Associations Using Interpretable Pangenome-Spanning Regressions

mBio ◽

10.1128/mbio.01344-20 ◽

2020 ◽

Vol 11 (4) ◽

Cited By ~ 3

Author(s):

John A. Lees ◽

T. Tien Mai ◽

Marco Galardini ◽

Nicole E. Wheeler ◽

Samuel T. Horsfield ◽

...

Keyword(s):

Antibiotic Resistance ◽

Association Study ◽

Genetic Variants ◽

Genome Wide Association Study ◽

Joint Modeling ◽

Genome Wide Association ◽

Data Sets ◽

Modeling Framework ◽

Phenotype Prediction ◽

Genome Wide

ABSTRACT Discovery of genetic variants underlying bacterial phenotypes and the prediction of phenotypes such as antibiotic resistance are fundamental tasks in bacterial genomics. Genome-wide association study (GWAS) methods have been applied to study these relations, but the plastic nature of bacterial genomes and the clonal structure of bacterial populations create challenges. We introduce an alignment-free method which finds sets of loci associated with bacterial phenotypes, quantifies the total effect of genetics on the phenotype, and allows accurate phenotype prediction, all within a single computationally scalable joint modeling framework. Genetic variants covering the entire pangenome are compactly represented by extended DNA sequence words known as unitigs, and model fitting is achieved using elastic net penalization, an extension of standard multiple regression. Using an extensive set of state-of-the-art bacterial population genomic data sets, we demonstrate that our approach performs accurate phenotype prediction, comparable to popular machine learning methods, while retaining both interpretability and computational efficiency. Compared to those of previous approaches, which test each genotype-phenotype association separately for each variant and apply a significance threshold, the variants selected by our joint modeling approach overlap substantially.
IMPORTANCE Being able to identify the genetic variants responsible for specific bacterial phenotypes has been the goal of bacterial genetics since its inception and is fundamental to our current level of understanding of bacteria. This identification has been based primarily on painstaking experimentation, but the availability of large data sets of whole genomes with associated phenotype metadata promises to revolutionize this approach, not least for important clinical phenotypes that are not amenable to laboratory analysis. These models of phenotype-genotype association can in the future be used for rapid prediction of clinically important phenotypes such as antibiotic resistance and virulence by rapid-turnaround or point-of-care tests. However, despite much effort being put into adapting genome-wide association study (GWAS) approaches to cope with bacterium-specific problems, such as strong population structure and horizontal gene exchange, current approaches are not yet optimal. We describe a method that advances methodology for both association and generation of portable prediction models.
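
Elastic net penalization, which the abstract names as the model-fitting step, can be illustrated with a small coordinate-descent solver. This is a generic textbook sketch and not the authors' software: the objective assumed here is (1/2n)·||y − Xw||² + alpha·(l1_ratio·||w||₁ + ½·(1 − l1_ratio)·||w||²), there is no intercept, and the function names and the toy presence/absence matrix are hypothetical stand-ins for unitig data.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for elastic net regression without an intercept."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw; equals y because w starts at zero
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Correlation of column j with the residual that excludes its own fit.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                w[j] = new_w
    return w


if __name__ == "__main__":
    # Rows are isolates, columns are presence (1) or absence (0) of two
    # hypothetical sequence words; only the first tracks the phenotype.
    X = [[1, 0], [1, 1], [0, 0], [0, 1]]
    y = [1.0, 1.0, 0.0, 0.0]
    print(elastic_net(X, y))
```

The L1 part of the penalty sets the weight of the uninformative column exactly to zero, which is what lets a joint model of this kind select a sparse set of variants instead of testing each variant separately.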

Genome-wide association study identifies QTL for eight fruit traits in cultivated tomato (Solanum lycopersicum L.)

Horticulture Research ◽

10.1038/s41438-021-00638-4 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Minkyung Kim ◽

Thuy Tien Phan Nguyen ◽

Joon-Hyung Ahn ◽

Gi-Jun Kim ◽

Sung-Chur Sim

Keyword(s):

Association Study ◽

Mixed Model ◽

Genome Wide Association Study ◽

Tomato Fruit ◽

Fruit Weight ◽

Genome Wide Association ◽

Data Sets ◽

Fruit Traits ◽

Crop Species ◽

Genome Wide

Abstract Genome-wide association study (GWAS) is effective in identifying favorable alleles for traits of interest with high mapping resolution in crop species. In this study, we conducted GWAS to explore quantitative trait loci (QTL) for eight fruit traits using 162 tomato accessions with diverse genetic backgrounds. The eight traits included fruit weight, fruit width, fruit height, fruit shape index, pericarp thickness, locule number, fruit firmness, and brix. Phenotypic variations of these traits in the tomato collection were evaluated with three replicates in field trials over three years. We filtered 34,550 confident SNPs from the 51 K Axiom® tomato array based on < 10% of missing data and > 5% of minor allele frequency for association analysis. The 162 tomato accessions were divided into seven clusters and their membership coefficients were used to account for population structure along with a kinship matrix. To identify marker-trait associations (MTAs), four phenotypic data sets representing each of three years and combined were independently analyzed in the multilocus mixed model (MLMM). A total of 30 significant MTAs were detected across data sets for eight fruit traits at P < 0.0005. The number of MTAs per trait ranged from one (brix) to seven (fruit weight and fruit width). Two SNP markers on chromosomes 1 and 2 were significantly associated with multiple traits, suggesting pleiotropic effects of QTL. Furthermore, 16 of 30 MTAs suggest potential novel QTL for eight fruit traits. These results facilitate genetic dissection of tomato fruit traits and provide a useful resource to develop molecular tools for improving fruit traits via marker-assisted selection and genomic selection in tomato breeding programs.

CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies

10.1101/652263 ◽

2019 ◽

Cited By ~ 2

Author(s):

Yi Yang ◽

Xingjie Shi ◽

Yuling Jiao ◽

Jian Huang ◽

Min Chen ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variants ◽

Complex Traits ◽

Mixed Model ◽

Association Studies ◽

Gwas Data ◽

Supplementary Information ◽

Summary Statistics ◽

Individual Level ◽

The Relationship

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) [42] was proposed to jointly interrogate the genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data.
Availability and implementation The implementation of CoMM-S2 is included in the CoMM package that can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.
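
The distinction the abstract draws between individual-level GWAS data and GWAS summary statistics can be made concrete with a short sketch. It assumes the common convention that summary statistics are per-SNP marginal regression results (effect estimate, standard error, z-score); it does not reproduce the CoMM-S2 model, and the function name `summary_stats` and the SNP label are hypothetical.

```python
import math


def summary_stats(genotype_columns, phenotype):
    """Per-SNP marginal regression of the phenotype on genotype.

    genotype_columns maps a SNP name to one genotype value per individual.
    Returns, per SNP, the slope (beta), its standard error (se) and z = beta/se.
    """
    n = len(phenotype)
    mean_y = sum(phenotype) / n
    stats = {}
    for name, g in genotype_columns.items():
        mean_g = sum(g) / n
        sxx = sum((gi - mean_g) ** 2 for gi in g)
        beta = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, phenotype)) / sxx
        ssr = sum(((yi - mean_y) - beta * (gi - mean_g)) ** 2
                  for gi, yi in zip(g, phenotype))
        se = math.sqrt(ssr / (n - 2) / sxx)
        stats[name] = {"beta": beta, "se": se, "z": beta / se}
    return stats


if __name__ == "__main__":
    individual_level = {"snp_demo": [0, 1, 2, 0, 1, 2]}
    phenotype = [0.1, 1.0, 2.1, -0.1, 1.0, 1.9]
    print(summary_stats(individual_level, phenotype))
```

Only the returned dictionary would be shared in a summary-statistics setting; the genotype and phenotype vectors that produced it stay with the original study, which is why methods that need only these per-SNP quantities can be applied more widely.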

Prediction of Missing Values in Microarray and Use of Mixed Models to Evaluate the Predictors

Statistical Applications in Genetics and Molecular Biology ◽

10.2202/1544-6115.1120 ◽

2005 ◽

Vol 4 (1) ◽

Cited By ~ 5

Author(s):

Guri Feten ◽

Trygve Almøy ◽

Are H. Aastveit

Keyword(s):

Gene Expression ◽

Missing Values ◽

Mixed Model ◽

Linear Mixed Model ◽

Correlation Structure ◽

Data Sets ◽

Data Set ◽

Nearest Neighbours ◽

Complete Matrix ◽

The Ideal

Gene expression microarray experiments generate data sets with multiple missing expression values. In some cases, analysis of gene expression requires a complete matrix as input. Either genes with missing values can be removed, or the missing values can be replaced using prediction. We propose six imputation methods. A comparative study of the methods was performed on data from mice and data from the bacterium Enterococcus faecalis, and a linear mixed model was used to test for differences between the methods. The study showed that the predictive capability of each method depends on the data; hence the ideal choice of method and number of components differs for each data set. For data with correlation structure, methods based on K-nearest neighbours seemed to be best, while for data without correlation structure, using the average of the gene was preferable.
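
The two families of predictors contrasted at the end of this abstract, K-nearest neighbours and the per-gene average, can be sketched generically. These are textbook versions written for illustration and are not claimed to match any of the six methods the authors propose: the matrix is assumed to have genes as rows and arrays as columns with `None` for a missing value, and the function names are hypothetical.

```python
import math


def row_mean_impute(matrix):
    """Replace each missing value with the mean of the observed values of that gene."""
    out = []
    for row in matrix:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else None
        out.append([mean if v is None else v for v in row])
    return out


def knn_impute(matrix, k=2):
    """Replace each missing value with the average over the k most similar genes.

    Similarity is the root mean squared difference over the columns that
    both genes have observed; only genes observed in the target column
    can act as neighbours.
    """
    out = [list(row) for row in matrix]
    for g, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for h, other in enumerate(matrix):
                if h == g or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[c]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [val for _, val in candidates[:k]]
            if nearest:
                out[g][c] = sum(nearest) / len(nearest)
    return out


if __name__ == "__main__":
    expression = [
        [1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [1.1, 2.1, 3.2],
        [9.0, 9.0, 9.0],
    ]
    print(knn_impute(expression, k=2))
    print(row_mean_impute(expression))
```

In the toy matrix the first gene closely tracks the second and third, so the neighbour-based estimate borrows from them, while the row mean can only use the gene's own two observed values. This mirrors the abstract's point that neighbour-based methods help when genes are correlated.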

CoMM-S2: a collaborative mixed model using summary statistics in transcriptome-wide association studies

Bioinformatics ◽

10.1093/bioinformatics/btz880 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2009-2016 ◽

Cited By ~ 6

Author(s):

Yi Yang ◽

Xingjie Shi ◽

Yuling Jiao ◽

Jian Huang ◽

Min Chen ◽

...

Keyword(s):

Gene Expression ◽

Genetic Variants ◽

Complex Traits ◽

Mixed Model ◽

Association Studies ◽

Gwas Data ◽

Supplementary Information ◽

Summary Statistics ◽

Individual Level ◽

The Relationship

Abstract Motivation Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate the genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. Results In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data.
Availability and implementation The implementation of CoMM-S2 is included in the CoMM package that can be downloaded from https://github.com/gordonliu810822/CoMM. Supplementary information Supplementary data are available at Bioinformatics online.

Integrating genome-wide association study data with gene expression to understand complex traits and common diseases

10.14264/uql.2018.495 ◽

2018 ◽

Author(s):

Jennifer Marguerite Pavlides

Keyword(s):

Gene Expression ◽

Association Study ◽

Complex Traits ◽

Genome Wide Association Study ◽

Study Data ◽

Genome Wide Association ◽

Common Diseases ◽

Genome Wide

Underestimation of heritability using a mixed model with a polygenic covariance structure in a genome-wide association study for complex traits

European Journal of Human Genetics ◽

10.1038/ejhg.2013.236 ◽

2013 ◽

Vol 22 (6) ◽

pp. 851-854 ◽

Cited By ~ 11

Author(s):

Hyunju Ryoo ◽

Chaeyoung Lee

Keyword(s):

Association Study ◽

Complex Traits ◽

Mixed Model ◽

Genome Wide Association Study ◽

Covariance Structure ◽

Genome Wide Association ◽

Genome Wide ◽

A Genome

A statistical framework for cross-tissue transcriptome-wide association analysis

10.1101/286013 ◽

2018 ◽

Cited By ~ 4

Author(s):

Yiming Hu ◽

Mo Li ◽

Qiongshi Lu ◽

Haoyi Weng ◽

Jiawei Wang ◽

...

Keyword(s):

Gene Expression ◽

Association Analysis ◽

Complex Traits ◽

Genome Wide Association Study ◽

Late Onset ◽

Imputation Accuracy ◽

Genome Wide Association ◽

Expression Levels ◽

Genome Wide ◽

Specific Tissue

Abstract Transcriptome-wide association analysis is a powerful approach to studying the genetic architecture of complex traits. A key component of this approach is to build a model to predict (impute) gene expression levels from genotypes from samples with matched genotypes and expression levels in a specific tissue. However, it is challenging to develop robust and accurate imputation models with limited sample sizes for any single tissue. Here, we first introduce a multi-task learning approach to jointly impute gene expression in 44 human tissues. Compared with single-tissue methods, our approach achieved an average 39% improvement in imputation accuracy and generated effective imputation models for an average 120% (range 13%-339%) more genes in each tissue. We then describe a summary statistic-based testing framework that combines multiple single-tissue associations into a single powerful metric to quantify overall gene-trait association at the organism level. When our method, called UTMOST, was applied to analyze genome-wide association results for 50 complex traits (Ntotal=4.5 million), we were able to identify considerably more genes in tissues enriched for trait heritability, and cross-tissue analysis significantly outperformed single-tissue strategies (p=1.7e-8). Finally, we performed a cross-tissue genome-wide association study for late-onset Alzheimer’s disease (LOAD) and replicated our findings in two independent datasets (Ntotal=175,776). In total, we identified 69 significant genes, many of which are novel, leading to novel insights on LOAD etiologies.
