Estimation of Non-null SNP Effect Size Distributions Enables the Detection of Enriched Genes Underlying Complex Traits

Abstract: Traditional univariate genome-wide association studies generate false positives and negatives due to difficulties distinguishing associated variants from variants with spurious nonzero effects that do not directly influence the trait. Recent efforts have been directed at identifying genes or signaling pathways enriched for mutations in quantitative traits or case-control studies, but these can be computationally costly and hampered by strict model assumptions. Here, we present gene-ε, a new approach for identifying statistical associations between sets of variants and quantitative traits. Our key insight is that gene-level enrichment studies improve when we reformulate the genome-wide SNP-level null hypothesis to identify spurious small-to-intermediate SNP effects and classify them as non-causal. gene-ε efficiently identifies enriched genes under a variety of simulated genetic architectures, achieving a true positive rate greater than 90% at a 1% false positive rate for polygenic traits. Lastly, we apply gene-ε to summary statistics derived from six quantitative traits using European-ancestry individuals in the UK Biobank, and identify enriched genes that are in biologically relevant pathways.

Author Summary: Enrichment tests augment the standard univariate genome-wide association (GWA) framework by identifying groups of biologically interacting mutations that are enriched for associations with a trait of interest, beyond what is expected by chance. These analyses model local linkage disequilibrium (LD), allow many different mutations to be disease-causing across patients, and generate biologically interpretable hypotheses for disease mechanisms. However, existing enrichment analyses are hampered by high computational costs, and rely on GWA summary statistics despite the high false positive rate of the standard univariate GWA framework. Here, we present the gene-level association framework gene-ε (pronounced “genie”), an empirical Bayesian approach for identifying statistical associations between sets of mutations and quantitative traits. The central innovation of gene-ε is reformulating the GWA null model to distinguish between (i) mutations that are statistically associated with the disease but are unlikely to directly influence it, and (ii) mutations that are most strongly associated with a disease of interest. We find that, with our reformulated SNP-level null hypothesis, our gene-level enrichment model outperforms existing enrichment methods in simulation studies and scales well for application to emerging biobank datasets. We apply gene-ε to six quantitative traits in the UK Biobank and recover novel and functionally validated gene-level associations.


Reproducibility in the UK Biobank of Genome-Wide Significant Signals Discovered in Earlier Genome-wide Association Studies

10.1101/2020.06.24.20139576 ◽

2020 ◽

Cited By ~ 1

Author(s):

Jack W. O’Sullivan ◽

John P. A. Ioannidis

Keyword(s):

Effect Size ◽

Association Studies ◽

Genome Wide Association ◽

P Value ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Single Nucleotide ◽

Genome Wide ◽

The UK ◽

Open Question

Abstract: With the establishment of large biobanks, the discovery of single nucleotide polymorphisms (SNPs) associated with various phenotypes has accelerated. An open question is whether SNPs identified with genome-wide significance in earlier genome-wide association studies (GWAS) are also replicated in later GWAS conducted in biobanks. To address this question, the authors examined a publicly available GWAS database and identified two independent GWAS on the same phenotype (an earlier “discovery” GWAS and a later replication GWAS done in the UK Biobank). The analysis evaluated 136,318,924 SNPs (of which 6,289 had reached p<5e-8 in the discovery GWAS) from 4,397,962 participants across nine phenotypes. The overall replication rate was 85.0%, and it was lower for binary than for quantitative phenotypes (58.1% versus 94.8%, respectively). There was an 18.0% decrease in SNP effect size for binary phenotypes, but a 12.0% increase for quantitative phenotypes. Using the discovery SNP effect size, phenotype type (binary or quantitative), and discovery p-value, we built and validated a model that predicted SNP replication with an area under the receiver operating characteristic curve of 0.90. While non-replication may often reflect lack of power rather than genuine false-positive findings, these results provide insights about which discovered associations are likely to be seen again across subsequent GWAS.
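To make the reported area under the ROC curve concrete, the following is a minimal Python sketch of how such a value can be computed for any replication predictor. It is not the authors' model or data: the scores and replication labels are hypothetical toy values, and the function name `auc` is chosen here for illustration. The AUC equals the probability that a randomly chosen replicated SNP receives a higher predicted score than a randomly chosen non-replicated SNP, with ties counting one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: predicted probability of replication for each SNP
    labels: 1 if the SNP replicated, 0 otherwise
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one replicated and one non-replicated SNP")
    # Count pairs where the replicated SNP is ranked above the non-replicated one.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: five SNPs, three of which replicated.
scores = [0.9, 0.8, 0.7, 0.4, 0.3]
replicated = [1, 1, 0, 1, 0]
auc_value = auc(scores, replicated)
print(f"AUC = {auc_value:.3f}")
```

In this toy example 5 of the 6 replicated/non-replicated pairs are ordered correctly. The quadratic pairwise loop is only suitable for small inputs; a rank-based computation would be preferable for thousands of SNPs.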


Genome-Wide Association Meta-Analysis Supports Genes Involved in Valve and Cardiac Development to Associate With Mitral Valve Prolapse

Circulation Genomic and Precision Medicine ◽

10.1161/circgen.120.003148 ◽

2021 ◽

Author(s):

Mengyao Yu ◽

Sergiy Kyryachenko ◽

Stephanie Debette ◽

Philippe Amouyel ◽

Jean-Jacques Schott ◽

...

Keyword(s):

Mitral Valve ◽

Association Study ◽

Genome Wide Association Study ◽

Cardiac Development ◽

Meta Analysis ◽

Genome Wide Association ◽

UK Biobank ◽

Additional Risk ◽

Genome Wide ◽

The UK

Background: Mitral valve prolapse (MVP) is a common cardiac valve disease that affects 1 in 40 people in the general population. Previous genome-wide association studies have identified 6 risk loci for MVP, but these loci explain the genetic risk for MVP only partially. We aim to identify additional risk loci for MVP by adding a data set from the UK Biobank. Methods: We reanalyzed 1007/479 cases from the MVP-France study, 1469/862 controls from the MVP-Nantes study for reimputation of genotypes using the HRC and TOPMed panels. We also incorporated 434 MVP cases and 4527 controls from the UK Biobank for discovery analyses. Genetic association was conducted using SNPTEST and meta-analyses using METAL. We used FUMA for post-genome-wide association study annotations and MAGMA for gene-based and gene-set analyses. Results: We found TOPMed imputation to perform better in terms of accuracy in the lower ranges of minor allele frequency, below 0.1. Our updated meta-analysis included the UK Biobank study for ≈8 million common single-nucleotide polymorphisms (minor allele frequency >0.01) and replicated the association on Chr2 as the top association signal near TNS1. We identified an additional risk locus on Chr1 (SYT2) and 2 suggestive risk loci on Chr8 (MSRA) and Chr19 (FBXO46), all driven by common variants. Gene-based association using MAGMA revealed 6 risk genes for MVP with pronounced expression levels in cardiovascular tissues, especially the heart, and globally part of enriched GO terms related to cardiac development. Conclusions: We report an updated meta-analysis genome-wide association study for MVP using dense imputation coverage and an improved case-control sample. We describe several loci and genes associated with MVP, spanning biological mechanisms highly relevant to MVP, especially during valve and heart development.


P15 Cervical intraepithelial neoplasia and cervical cancer: a genome wide association study (GWAS) of the UK biobank cohort

10.1136/ijgc-2019-esgo.78 ◽

2019 ◽

Author(s):

S Bowden ◽

I Kalliala ◽

M Wielscher ◽

B Bodinier ◽

J Flanagan ◽

...

Keyword(s):

Cervical Cancer ◽

Association Study ◽

Cervical Intraepithelial Neoplasia ◽

Genome Wide Association Study ◽

Intraepithelial Neoplasia ◽

Genome Wide Association ◽

UK Biobank ◽

Genome Wide ◽

A Genome ◽

The UK


0014 Genome-wide Association Analysis Of Excessive Daytime Sleepiness In The UK Biobank Identifies 42 Novel Loci

SLEEP ◽

10.1093/sleep/zsy061.013 ◽

2018 ◽

Vol 41 (suppl_1) ◽

pp. A6-A6

Author(s):

H Wang ◽

J M Lane ◽

H S Dashti ◽

S Jones ◽

B E Cade ◽

...

Keyword(s):

Association Analysis ◽

Excessive Daytime Sleepiness ◽

Daytime Sleepiness ◽

Genome Wide Association ◽

UK Biobank ◽

Genome Wide Association Analysis ◽

Genome Wide ◽

The UK


A fast mrMLM algorithm for multi-locus genome-wide association studies

10.1101/341784 ◽

2018 ◽

Cited By ~ 23

Author(s):

Cox Lwaka Tamba ◽

Yuan-Ming Zhang

Keyword(s):

False Positive ◽

Statistical Power ◽

Association Studies ◽

False Positive Rate ◽

Real Data ◽

High Accuracy ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Positive Rate

Abstract

Background: Recent developments in technology result in the generation of big data. In genome-wide association studies (GWAS), tens of millions of SNPs may need to be tested for association with a trait of interest, which poses a great computational challenge. There is a need for fast algorithms in GWAS methodologies. These algorithms must ensure high power in QTN detection, high accuracy in QTN effect estimation and a low false positive rate. Results: Here, we accelerated the mrMLM algorithm by using ideas from GEMMA together with matrix transformations and identities. The target functions and derivatives in vector/matrix forms for each marker scan are transformed into simple forms that are easy and efficient to evaluate during each optimization step. All potentially associated QTNs with P-values ≤ 0.01 are evaluated in a multi-locus model by the LARS algorithm and/or EM-Empirical Bayes. We call the algorithm FASTmrMLM. Numerical simulation studies and real data analysis validated FASTmrMLM. FASTmrMLM reduces the running time of mrMLM by more than 50%. FASTmrMLM also shows high statistical power in QTN detection, high accuracy in QTN effect estimation and a low false positive rate as compared to GEMMA, FarmCPU and mrMLM. Real data analysis shows that FASTmrMLM was able to detect more previously reported genes than all the other methods: GEMMA/EMMA, FarmCPU and mrMLM. Conclusions: FASTmrMLM is a fast and reliable algorithm for multi-locus GWAS and ensures high statistical power, high accuracy of estimates and a low false positive rate.

Author Summary: Current developments in technology result in the generation of a vast amount of data. In genome-wide association studies, tens of millions of markers may need to be tested for association with a trait of interest. To address this computational challenge, we developed a fast algorithm for genome-wide association studies. Our approach is a two-stage method. In the first step, we used matrix transformations and identities to speed up the testing of each random marker effect. The target functions and derivatives, which are in vector/matrix forms for each marker scan, are transformed into simple forms that are easy and efficient to evaluate during each optimization step. In the second step, we selected all potentially associated SNPs and evaluated them in a multi-locus model. In simulation studies, our algorithm significantly reduces the computing time. The new method also shows high statistical power in detecting significant markers, high accuracy in marker effect estimation and a low false positive rate. We also used the new method to identify relevant genes in real data analysis. We recommend our approach as a fast and reliable method for carrying out a multi-locus genome-wide association study.
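The two-stage structure described above (single-marker scan, then a joint model over markers passing P ≤ 0.01) can be illustrated with a minimal Python sketch. This is not FASTmrMLM: stage 1 here uses ordinary simple linear regression with a normal approximation to the test statistic instead of the mixed-model random-marker scan, and stage 2 uses plain least squares as a stand-in for LARS or EM-Empirical Bayes. The simulated genotypes, effect sizes, and all function names are assumptions made for illustration.

```python
import math
import random


def marker_p_value(x, y):
    """Stage 1: two-sided p-value for the slope of y ~ x (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 1.0  # a monomorphic marker carries no information
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return math.erfc(abs(beta / se) / math.sqrt(2))


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [v] for row, v in zip(a, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def joint_fit(columns, y):
    """Stage 2 stand-in: least squares of y on an intercept plus the selected markers."""
    design = [[1.0] * len(y)] + columns
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    return solve(xtx, xty)[1:]  # drop the intercept


# Simulated toy data (hypothetical): 300 individuals, 40 markers, 2 true QTNs.
random.seed(1)
n_ind, n_markers = 300, 40
geno = [[random.choice((0, 1, 1, 2)) for _ in range(n_ind)] for _ in range(n_markers)]
true_effects = {3: 1.0, 17: -1.0}
y = [sum(b * geno[j][i] for j, b in true_effects.items()) + random.gauss(0, 1)
     for i in range(n_ind)]

# Stage 1: scan every marker and keep those with p <= 0.01.
selected = [j for j in range(n_markers) if marker_p_value(geno[j], y) <= 0.01]
# Stage 2: estimate the selected markers jointly.
estimates = dict(zip(selected, joint_fit([geno[j] for j in selected], y)))
print(selected, {j: round(b, 2) for j, b in estimates.items()})
```

The liberal stage 1 threshold lets some null markers through by chance; the joint model in stage 2 is what separates them from true QTNs, which is why a sparse or shrinkage estimator is the natural choice there in place of the plain least squares used in this sketch.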


The evolution of skin pigmentation associated variation in West Eurasia

10.1101/2020.05.08.085274 ◽

2020 ◽

Author(s):

Dan Ju ◽

Iain Mathieson

Keyword(s):

Genetic Variants ◽

Association Studies ◽

Skin Pigmentation ◽

Directional Selection ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Genome Wide ◽

Light Skin ◽

The UK

Abstract: Skin pigmentation is a classic example of a polygenic trait that has experienced directional selection in humans. Genome-wide association studies have identified well over a hundred pigmentation-associated loci, and genomic scans in present-day and ancient populations have identified selective sweeps for a small number of light pigmentation-associated alleles in Europeans. It is unclear whether selection has operated on all the genetic variation associated with skin pigmentation as opposed to just a small number of large-effect variants. Here, we address this question using ancient DNA from 1158 individuals from West Eurasia covering a period of 40,000 years combined with genome-wide association summary statistics from the UK Biobank. We find a robust signal of directional selection in ancient West Eurasians on skin pigmentation variants ascertained in the UK Biobank, but find this signal is driven mostly by a limited number of large-effect variants. Consistent with this observation, we find that a polygenic selection test in present-day populations fails to detect selection with the full set of variants; rather, only the top five show strong evidence of selection. Our data allow us to disentangle the effects of admixture and selection. Most notably, a large-effect variant at SLC24A5 was introduced to Europe by migrations of Neolithic farming populations but continued to be under selection post-admixture. This study shows that the response to selection for light skin pigmentation in West Eurasia was driven by a relatively small proportion of the variants that are associated with present-day phenotypic variation.

Significance: Some of the genes responsible for the evolution of light skin pigmentation in Europeans show signals of positive selection in present-day populations. Recently, genome-wide association studies have highlighted the highly polygenic nature of skin pigmentation. It is unclear whether selection has operated on all of these genetic variants or just a subset. By studying variation in over a thousand ancient genomes from West Eurasia covering 40,000 years, we are able to study both the aggregate behavior of pigmentation-associated variants and the evolutionary history of individual variants. We find that the evolution of light skin pigmentation in Europeans was driven by frequency changes in a relatively small fraction of the genetic variants that are associated with variation in the trait today.


Genome-wide association study of circulating liver enzymes reveals an expanded role for manganese transporter SLC30A10 in liver health

10.1101/2020.05.19.104570 ◽

2020 ◽

Author(s):

Lucas D. Ward ◽

Ho-Chou Tu ◽

Chelsea Quenneville ◽

Alexander O. Flynn-Carroll ◽

Margaret M. Parker ◽

...

Keyword(s):

Extrahepatic Bile Duct ◽

Association Studies ◽

Genome Wide Association ◽

Detectable Effect ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Extrahepatic Bile Duct Cancer ◽

Genome Wide ◽

Liver Health ◽

The UK

Abstract: To better understand molecular pathways underlying liver health and disease, we performed genome-wide association studies (GWAS) on circulating levels of alanine aminotransferase (ALT) and aspartate aminotransferase (AST) across 408,300 subjects from four ethnic groups in the UK Biobank, focusing on variants associating with both enzymes. Of these variants, the strongest effect is a rare (MAF in White British = 0.12%) missense variant in the gene encoding manganese efflux transporter SLC30A10, Thr95Ile (rs188273166), associating with a 5.9% increase in ALT and a 4.2% increase in AST. Carriers have higher prevalence of all-cause liver disease (OR = 1.70; 95% CI = 1.24 to 2.34) and higher prevalence of extrahepatic bile duct cancer (OR = 23.8; 95% CI = 9.1 to 62.1) compared to non-carriers. Over 4% of the cases of extrahepatic cholangiocarcinoma in the UK Biobank carry SLC30A10 Thr95Ile. Unlike variants in SLC30A10 known to cause the recessive syndrome hypermanganesemia with dystonia-1 (HMNDYT1), the Thr95Ile variant has a detectable effect even in the heterozygous state. Also unlike HMNDYT1-causing variants, Thr95Ile results in a protein that is properly trafficked to the plasma membrane when expressed in HeLa cells. These results suggest that coding variation in SLC30A10 impacts liver health in more individuals than the small population of HMNDYT1 patients.


PopCluster: an algorithm to identify genetic variants with ethnicity-dependent effects

Bioinformatics ◽

10.1093/bioinformatics/btz017 ◽

2019 ◽

Vol 35 (17) ◽

pp. 3046-3054 ◽

Cited By ~ 2

Author(s):

Anastasia Gurinovich ◽

Harold Bae ◽

John J Farrell ◽

Stacy L Andersen ◽

Stefano Monti ◽

...

Keyword(s):

Genetic Variants ◽

Association Studies ◽

False Positive Rate ◽

Principal Component ◽

True Positive Rate ◽

Genome Wide Association ◽

Supplementary Information ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Positive Rate

Abstract

Motivation: Over the last decade, more diverse populations have been included in genome-wide association studies. If a genetic variant has a varying effect on a phenotype in different populations, genome-wide association studies applied to a dataset as a whole may not pinpoint such differences. It is especially important to be able to identify population-specific effects of genetic variants in studies that would eventually lead to development of diagnostic tests or drug discovery. Results: In this paper, we propose PopCluster: an algorithm to automatically discover subsets of individuals in which the genetic effects of a variant are statistically different. PopCluster provides a simple framework to directly analyze genotype data without prior knowledge of subjects’ ethnicities. PopCluster combines logistic regression modeling, principal component analysis, hierarchical clustering and a recursive bottom-up tree parsing procedure. The evaluation of PopCluster suggests that the algorithm has a stable low false positive rate (∼4%) and high true positive rate (>80%) in simulations with large differences in allele frequencies between cases and controls. Application of PopCluster to data from genetic studies of longevity discovers ethnicity-dependent heterogeneity in the association of rs3764814 (USP42) with the phenotype. Availability and implementation: PopCluster was implemented using the R programming language, PLINK and Eigensoft software, and can be found at the following GitHub repository: https://github.com/gurinovich/PopCluster with instructions on its installation and usage. Supplementary information: Supplementary data are available at Bioinformatics online.


0055 Genome-Wide Association Analysis of Accelerometer-Derived Traits Reveals Novel Genetic Loci Associated with Rest-Activity Patterns in the UK Biobank

SLEEP ◽

10.1093/sleep/zsy061.054 ◽

2018 ◽

Vol 41 (suppl_1) ◽

pp. A22-A22

Author(s):

D R Mazzotti ◽

S E Jones ◽

V van Hees ◽

A I Pack ◽

T M Frayling ◽

...

Keyword(s):

Association Analysis ◽

Activity Patterns ◽

Genome Wide Association ◽

UK Biobank ◽

Genetic Loci ◽

Genome Wide Association Analysis ◽

Genome Wide ◽

The UK ◽

Rest Activity


Body size and composition and site-specific cancers in UK Biobank: a Mendelian randomisation study

10.1101/2020.02.28.970459 ◽

2020 ◽

Cited By ~ 1

Author(s):

Mathew Vithayathil ◽

Paul Carter ◽

Siddhartha Kar ◽

Amy M. Mason ◽

Stephen Burgess ◽

...

Keyword(s):

Instrumental Variables ◽

Association Studies ◽

Genome Wide Association ◽

Mendelian Randomisation ◽

Genome Wide Association Studies ◽

UK Biobank ◽

Site Specific ◽

Genome Wide ◽

Increased Risk ◽

The UK

Abstract

Objectives: To investigate the causal role of body mass index, body fat composition and height in cancer. Design: Two-stage Mendelian randomisation study. Setting: Previous genome-wide association studies and the UK Biobank. Participants: Genetic instrumental variables for body mass index (BMI), fat mass index (FMI), fat free mass index (FFMI) and height from previous genome-wide association studies and UK Biobank. Cancer outcomes from 367 586 participants of European descent from the UK Biobank. Main outcome measures: Overall cancer risk and risk of 22 site-specific cancers for genetic instrumental variables for BMI, FMI, FFMI and height. Results: Genetically predicted BMI (per 1 kg/m²) was not associated with overall cancer risk (OR 0.99; 95% confidence interval (CI) 0.98-1.00, p=0.105). Elevated BMI was associated with increased risk of stomach cancer (OR 1.15, 95% CI 1.05-1.26; p=0.003) and melanoma (OR 0.96, 95% CI 0.92-1.00; p=0.044). For sex-specific cancers, BMI was positively associated with uterine cancer (OR 1.08, 95% CI 1.01-1.14; p=0.015) but inversely associated with breast (OR 0.95, 95% CI 0.92-0.98; p=0.001), prostate (OR 0.95, 95% CI 0.92-0.99; p=0.007) and testicular cancer (OR 0.89, 95% CI 0.81-0.98; p=0.017). Elevated FMI (per 1 kg/m²) was associated with gastrointestinal cancer (stomach cancer OR 4.23, 95% CI 1.18-15.13, p=0.027; colorectal cancer OR 1.94, 95% CI 1.23-3.07; p=0.004). Increased height (per 1 standard deviation, approximately 6.5 cm) was associated with increased risk of overall cancer (OR 1.06; 95% CI 1.04-1.09; p = 2.97×10⁻⁸) and most site-specific cancers, with the strongest estimates for kidney, non-Hodgkin lymphoma, colorectal, lung, melanoma and breast cancer. Conclusions: There is little evidence for BMI as a causal risk factor for cancer. BMI may have a causal role for sex-specific cancers, although with inconsistent directions of effect, and FMI for gastrointestinal malignancies. Elevated height is a risk factor for overall cancer and multiple site cancers.
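The relationship between a reported odds ratio, its 95% confidence interval and its p-value can be checked with a short calculation. The sketch below is an approximate back-calculation, not the study's analysis code: it assumes the interval is a symmetric Wald interval on the log-odds scale, and the function name `p_from_or_ci` is chosen here for illustration. It is applied to the stomach cancer estimate for BMI quoted above (OR 1.15, 95% CI 1.05-1.26, p=0.003).

```python
import math


def p_from_or_ci(odds_ratio, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p-value from an odds ratio and its 95% CI.

    Assumes a Wald interval on the log scale: log(OR) +/- z_crit * SE.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(odds_ratio) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


# Stomach cancer estimate for BMI as quoted in the abstract.
p_stomach = p_from_or_ci(1.15, 1.05, 1.26)
print(f"approximate p = {p_stomach:.4f}")
```

Because the published odds ratio and interval bounds are rounded to two decimals, the recovered p-value only approximates the reported one; here it rounds to the same 0.003.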
