A summary-statistics-based approach to examine the role of serotonin transporter promoter tandem repeat polymorphism in psychiatric phenotypes

Abstract: In genetic studies of psychiatric disorders in the pre-genome-wide association study (GWAS) era, one of the most commonly studied loci was the serotonin transporter (SLC6A4) promoter polymorphism, a 43-base-pair insertion/deletion polymorphism in the promoter region (5-HTTLPR). The genetic association signals between 5-HTTLPR and psychiatric phenotypes, however, have been inconsistent across studies. Since the polymorphism cannot be tested via available SNP arrays, we previously proposed an efficient machine learning algorithm to predict 5-HTTLPR genotypes from the genotypes of eight nearby SNPs, which requires access to individual-level genotype and phenotype data. To take advantage of publicly available GWAS summary statistics obtained from studies with very large sample sizes, we develop a GWAS summary-statistics-based approach for testing variable number of tandem repeat (VNTR) associations with various phenotypes. We first cross-verify the accuracy of the summary-statistics-based approach for 61 phenotypes in the UK Biobank. Having observed a strong similarity between the predicted individual-level 5-HTTLPR genotype-based approach and the summary-statistics-based approach, we applied our method to neurobehavioral GWAS summary statistics obtained from large-scale GWAS. We found no genome-wide significant evidence for association between 5-HTTLPR and any of the neurobehavioral traits. We did observe, however, genome-wide significant evidence for association between this locus and human adult height, BMI, and total cholesterol. Our summary-statistics-based approach provides a systematic way to examine the role of VNTRs and related types of genetic polymorphisms in disease risk and trait susceptibility for phenotypes with available large-scale GWAS summary statistics.

Approximate conditional phenotype analysis based on genome wide association summary statistics

Scientific Reports ◽

10.1038/s41598-021-82000-1 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Peitao Wu ◽

Biqi Wang ◽

Steven A. Lubitz ◽

Emelia J. Benjamin ◽

James B. Meigs ◽

...

Keyword(s):

Large Scale ◽

Genome Wide Association Study ◽

Genome Wide Association ◽

Summary Statistics ◽

Phenotypic Data ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

A Genome ◽

Phenotype Analysis

Abstract: Because single genetic variants may have pleiotropic effects, one trait can be a confounder in a genome-wide association study (GWAS) that aims to identify loci associated with another trait. A typical approach to address this issue is to perform an additional analysis adjusting for the confounder. However, obtaining conditional results can be time-consuming. We propose an approximate conditional phenotype analysis based on GWAS summary statistics, the covariance between outcome and confounder, and the variant minor allele frequency (MAF). GWAS summary statistics and MAF are taken from GWAS meta-analysis results, while the trait covariance may be estimated by two strategies: (i) estimates from a subset of the phenotypic data; or (ii) estimates from published studies. We compare our two strategies with estimates using individual-level data from the full GWAS sample (gold standard). A simulation study for both binary and continuous traits demonstrates that our approximate approach is accurate. We apply our method to the Framingham Heart Study (FHS) GWAS and to large-scale cardiometabolic GWAS results. We observed a high consistency of genetic effect size estimates between our method and individual-level data analysis. Our approach provides an efficient way to perform approximate conditional analysis using large-scale GWAS summary statistics.
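
The ingredients named in the abstract above (marginal effect estimates, the outcome-confounder covariance, and MAF) are enough to recover an adjusted effect for a continuous trait through the standard two-predictor least-squares identity. The following is a minimal sketch under stated assumptions, not the authors' estimator: it assumes Hardy-Weinberg equilibrium so that the genotype variance is 2·MAF·(1−MAF), it ignores standard errors, and the function and parameter names are hypothetical.

```python
def approx_conditional_beta(beta_y, beta_c, maf, cov_yc, var_c):
    """Approximate the SNP effect on outcome Y adjusted for confounder C.

    beta_y : marginal SNP effect on Y (from GWAS summary statistics)
    beta_c : marginal SNP effect on C (from GWAS summary statistics)
    maf    : minor allele frequency of the SNP
    cov_yc : covariance between Y and C (phenotype subset or literature)
    var_c  : variance of C
    """
    # Assumption: additive genotype coding under Hardy-Weinberg equilibrium.
    var_g = 2.0 * maf * (1.0 - maf)
    # Residual variance of C after removing the SNP's contribution.
    denom = var_c - beta_c ** 2 * var_g
    if denom <= 0:
        raise ValueError("inputs imply a non-positive residual variance for C")
    # Coefficient on G in the regression Y ~ G + C, written with summary inputs.
    return (beta_y * var_c - beta_c * cov_yc) / denom


# Usage: marginal effect 0.4 on Y, 0.5 on C, MAF 0.3.
adjusted = approx_conditional_beta(
    beta_y=0.4, beta_c=0.5, maf=0.3, cov_yc=0.484, var_c=1.105
)
print(round(adjusted, 6))
```

When the SNP has no effect on the confounder (`beta_c = 0`), the adjusted estimate reduces to the marginal one, which is a quick sanity check on any implementation.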

Bayesian large-scale multiple regression with summary statistics from genome-wide association studies

10.1101/042457 ◽

2016 ◽

Cited By ~ 5

Author(s):

Xiang Zhu ◽

Matthew Stephens

Keyword(s):

Multiple Regression ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Individual Level ◽

Genome Wide ◽

Level Data ◽

Wide Range

Bayesian methods for large-scale multiple regression provide attractive approaches to the analysis of genome-wide association studies (GWAS). For example, they can estimate heritability of complex traits, allowing for both polygenic and sparse models; and by incorporating external genomic data into the priors they can increase power and yield new biological insights. However, these methods require access to individual genotypes and phenotypes, which are often not easily available. Here we provide a framework for performing these analyses without individual-level data. Specifically, we introduce a “Regression with Summary Statistics” (RSS) likelihood, which relates the multiple regression coefficients to univariate regression results that are often easily available. The RSS likelihood requires estimates of correlations among covariates (SNPs), which can also be obtained from public databases. We perform Bayesian multiple regression analysis by combining the RSS likelihood with previously proposed prior distributions, sampling posteriors by Markov chain Monte Carlo. In a wide range of simulations, RSS performs similarly to analyses using the individual-level data, both for estimating heritability and for detecting associations. We apply RSS to a GWAS of human height that contains 253,288 individuals typed at 1.06 million SNPs, for which analyses of individual-level data are practically impossible. Estimates of heritability (52%) are consistent with, but more precise than, previous results using subsets of these data. We also identify many previously unreported loci that show evidence for association with height in our analyses. Software is available at https://github.com/stephenslab/rss.
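
The abstract above does not write out the likelihood. As a hedged illustration, the sketch below assumes the form usually quoted for RSS, in which the univariate estimates are normally distributed with mean S R S⁻¹ β and covariance S R S, where S is the diagonal matrix of standard errors and R is the SNP correlation (LD) matrix. The function names are hypothetical and this is not the API of the linked software.

```python
def rss_expected_marginals(beta, se, ld):
    """Mean of the univariate estimates under the assumed RSS form: S R S^-1 beta.

    beta : multiple-regression (joint) effects, one per SNP
    se   : standard errors of the univariate estimates
    ld   : SNP correlation matrix as a list of rows
    """
    p = len(beta)
    return [
        se[j] * sum(ld[j][k] * beta[k] / se[k] for k in range(p))
        for j in range(p)
    ]


def rss_covariance(se, ld):
    """Covariance of the univariate estimates under the assumed RSS form: S R S."""
    p = len(se)
    return [[se[j] * ld[j][k] * se[k] for k in range(p)] for j in range(p)]


# Usage: SNP 1 has a joint effect, SNP 2 has none but is correlated (r = 0.5).
beta = [1.0, 0.0]
se = [0.1, 0.2]
ld = [[1.0, 0.5], [0.5, 1.0]]
print(rss_expected_marginals(beta, se, ld))
print(rss_covariance(se, ld))
```

The example shows why correlations among SNPs are needed: a SNP with zero joint effect still has a nonzero expected univariate estimate when it is in LD with a causal SNP.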

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽

2018 ◽

Cited By ~ 6

Author(s):

Doug Speed ◽

David J Balding

Keyword(s):

Complex Traits ◽

Genetic Architecture ◽

Large Scale ◽

Association Studies ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Confounding Bias ◽

Conserved Regions ◽

Genome Wide ◽

Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
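
To make the equal-heritability model criticized above concrete, the sketch below fits the basic LD score regression relationship, E[χ²ⱼ] = 1 + N·a + N·h²·ℓⱼ/M, by ordinary least squares. It is an illustration of the LDSC model only, not of SumHer, and it is simplified: the LDSC software uses weighted regression with block-jackknife standard errors, which are omitted here. The function name and the toy inputs are hypothetical.

```python
def ldsc_fit(chi2, ld_scores, n_samples, n_snps):
    """Least-squares fit of chi-square statistics on LD scores.

    Returns (h2, intercept): the slope rescaled by n_snps / n_samples gives
    the SNP heritability under the equal-contribution model, and an
    intercept above 1 is read as confounding.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx                      # estimates N * h2 / M
    intercept = mean_c - slope * mean_l    # estimates 1 + N * a
    return slope * n_snps / n_samples, intercept


# Usage with noise-free toy data generated from h2 = 0.5 and intercept 1.1.
ld = [1.0, 2.0, 3.0, 4.0]
chi2 = [1.1 + 125.0 * l for l in ld]       # N * h2 / M = 1000 * 0.5 / 4 = 125
h2, intercept = ldsc_fit(chi2, ld, n_samples=1000, n_snps=4)
print(round(h2, 6), round(intercept, 6))
```

Because every SNP shares the same per-SNP heritability h²/M in this model, any heritability that is unevenly distributed across SNPs can be absorbed by the intercept, which is the over-correction the abstract describes.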

Bayesian copy number detection and association in large-scale studies

10.1101/2020.01.24.918672 ◽

2020 ◽

Author(s):

Stephen Cristiano ◽

David McKean ◽

Jacob Carey ◽

Paige Bracci ◽

Paul Brennan ◽

...

Keyword(s):

Copy Number ◽

Large Scale ◽

Disease Risk ◽

Copy Number Variants ◽

Regulatory Elements ◽

Cancer Case ◽

Tumor Suppressor ◽

Increase Risk ◽

Control Study

Abstract: Germline copy number variants (CNVs) increase risk for many diseases, yet detecting CNVs and quantifying their contribution to disease risk in large-scale studies is challenging. We developed an approach called CNPBayes to identify latent batch effects, to provide probabilistic estimates of integer copy number across the estimated batches, and to fully integrate the copy number uncertainty in the association model for disease. We demonstrate this approach in a pancreatic cancer case-control study of 7,598 participants where the major sources of technical variation were not captured by study site and varied across the genome. Candidate associations aided by this approach include deletions of 8q24 near regulatory elements of the oncogene MYC and of Tumor Suppressor Candidate 3 (TUSC3). This study provides a robust Bayesian inferential framework for estimating copy number and evaluating the role of copy number in heritable diseases.

Identifying Nootropic Drug Targets via Large-Scale Cognitive GWAS and Transcriptomics

10.1101/2020.02.06.934752 ◽

2020 ◽

Author(s):

Max Lam ◽

Chen Chia-Yen ◽

Xia Yan ◽

W. David Hill ◽

Joey W. Trampush ◽

...

Keyword(s):

Cognitive Ability ◽

Drug Targets ◽

Large Scale ◽

Drug Repurposing ◽

Gene Identification ◽

Summary Statistics ◽

General Cognitive Ability ◽

Nootropic Drug ◽

Genome Wide ◽

Meta Analyses

Background: Cognitive traits demonstrate significant genetic correlations with many psychiatric disorders and other health-related traits. Many neuropsychiatric and neurodegenerative disorders are marked by cognitive deficits. Therefore, genome-wide association studies (GWAS) of general cognitive ability might suggest potential targets for nootropic drug repurposing. Our previous effort to identify “druggable genes” (i.e., GWAS-identified genes that produce proteins targeted by known small molecules) was modestly powered due to the small cognitive GWAS sample available at the time. Since then, two large cognitive GWAS meta-analyses have reported 148 and 205 genome-wide significant loci, respectively. Additionally, large-scale gene expression databases, derived from post-mortem human brain, have recently been made available for GWAS annotation. Here, we 1) reconcile results from these two cognitive GWAS meta-analyses to further enhance power for locus discovery; 2) employ several complementary transcriptomic methods to identify genes in these loci with variants that are credibly associated with cognition; and 3) further annotate the resulting genes to identify “druggable” targets.

Methods: GWAS summary statistics were harmonized and jointly analysed using Multi-Trait Analysis of GWAS [MTAG], which is optimized for handling sample overlaps. Downstream gene identification was carried out using MAGMA, S-PrediXcan/S-TissueXcan Transcriptome-Wide Analysis, and eQTL mapping, as well as more recently developed methods that integrate GWAS and eQTL data via Summary-statistics Mendelian Randomization [SMR] and linkage methods [HEIDI]. Available brain-specific eQTL databases included GTEXv7, BrainEAC, CommonMind, ROSMAP, and PsychENCODE. Intersecting credible genes were then annotated against multiple chemoinformatic databases [DGIdb, KI, and a published review on “druggability”].

Results: Using our meta-analytic data set (N = 373,617), we identified 241 independent cognition-associated loci (29 novel), and 76 genes were identified by 2 or more methods of gene identification. In total, 26 genes were associated with general cognitive ability via SMR, 16 genes via S-TissueXcan/S-PrediXcan, 47 genes via eQTL mapping, and 68 genes via MAGMA pathway analysis. The use of the HEIDI test permitted the exclusion of candidate genes that may have been artifactually associated with cognition due to linkage, rather than direct causal or indirect pleiotropic effects. Actin and chromatin binding gene sets were identified as novel pathways that could be targeted via drug repurposing. Leveraging our various transcriptome and pathway analyses, as well as available chemoinformatic databases, we identified 16 putative genes that may suggest drug targets with nootropic properties.

Discussion: Results converged on several categories of significant drug targets, including serotonergic and glutamatergic genes, voltage-gated ion channel genes, carbonic anhydrase genes, and phosphodiesterase genes. The current results represent the first efforts to apply a multi-method approach to integrate gene expression and SNP-level data to identify credible actionable genes for general cognitive ability.

Large-Scale Analysis of Apolipoprotein CIII Glycosylation by Ultrahigh Resolution Mass Spectrometry

Frontiers in Chemistry ◽

10.3389/fchem.2021.678883 ◽

2021 ◽

Vol 9 ◽

Author(s):

Daniel Demus ◽

Annemieke Naber ◽

Viktoria Dotz ◽

Bas C. Jansen ◽

Marco R. Bladergroen ◽

...

Keyword(s):

Cardiovascular Disease ◽

Lipid Metabolism ◽

Large Scale ◽

Disease Risk ◽

Cardiovascular Disease Risk ◽

Sinapinic Acid ◽

Chemical Oxidation ◽

Ultrahigh Resolution ◽

Apolipoprotein CIII

Apolipoprotein-CIII (apo-CIII) is a glycoprotein involved in lipid metabolism and its levels are associated with cardiovascular disease risk. Apo-CIII sialylation is associated with improved plasma triglyceride levels and its glycosylation may have an effect on the clearance of triglyceride-rich lipoproteins by directing these particles to different metabolic pathways. Large-scale sample cohort studies are required to fully elucidate the role of apo-CIII glycosylation in lipid metabolism and associated cardiovascular disease. In this study, we revisited a high-throughput workflow for the analysis of intact apo-CIII by ultrahigh-resolution MALDI FT-ICR MS. The workflow includes a chemical oxidation step to reduce methionine oxidation heterogeneity and spectrum complexity. Sinapinic acid matrix was used to minimize the loss of sialic acids upon MALDI. MassyTools software was used to standardize and automate MS data processing and quality control. This method was applied to 771 plasma samples from individuals without diabetes, allowing the expression levels of apo-CIII glycoforms to be evaluated against a panel of lipid biomarkers and demonstrating the validity of the method. Our study supports the hypothesis that triglyceride clearance may be regulated, or at least strongly influenced, by apo-CIII sialylation. Interestingly, the association of apo-CIII glycoforms with triglyceride levels was found to be largely independent of body mass index. Due to its precision and throughput, the new workflow will allow studying the role of apo-CIII in the regulation of lipid metabolism in various disease settings.

Trust in scientists in times of pandemic: Panel evidence from 12 countries

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.2108576118 ◽

2021 ◽

Vol 118 (40) ◽

pp. e2108576118

Author(s):

Yann Algan ◽

Daniel Cohen ◽

Eva Davoine ◽

Martial Foucault ◽

Stefanie Stantcheva

Keyword(s):

Large Scale ◽

Social Trust ◽

Critical Role ◽

Individual Level ◽

Individual Support ◽

Compliant Behavior ◽

Paradoxical Effects ◽

The Government ◽

Nonpharmaceutical Interventions

This article analyzes the specific and critical role of trust in scientists on both the support for and compliance with nonpharmaceutical interventions (NPIs) during the COVID-19 pandemic. We exploit large-scale, longitudinal, and representative surveys for 12 countries over the period from March to December 2020, and we complement the analysis with experimental data. We find that trust in scientists is the key driving force behind individual support for and compliance with NPIs and for favorable attitudes toward vaccination. The effect of trust in government is more ambiguous and tends to diminish support for and compliance with NPIs in countries where the recommendations from scientists and the government were not aligned. Trust in others also has seemingly paradoxical effects: in countries where social trust is high, the support for NPIs is low due to higher expectations that others will voluntarily social distance. Our individual-level longitudinal data also allow us to evaluate the effects of within-person changes in trust over the pandemic: we show that trust levels and, in particular, trust in scientists have changed dramatically for individuals and within countries, with important subsequent effects on compliant behavior and support for NPIs. Such findings point out the challenging but critical need to maintain trust in scientists during a lasting pandemic that strains citizens and governments.

The genomic footprint of coastal earthquake uplift

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2020.0712 ◽

2020 ◽

Vol 287 (1930) ◽

pp. 20200712 ◽

Cited By ~ 1

Author(s):

Elahe Parvizi ◽

Ceridwen I. Fraser ◽

Ludovic Dutoit ◽

Dave Craw ◽

Jonathan M. Waters

Keyword(s):

Large Scale ◽

Biological Evolution ◽

Habitat Choice ◽

Natural Experiments ◽

Ecological Constraints ◽

Dispersal Capacity ◽

Interspecific Differences ◽

Genome Wide ◽

Genetic Patterns

Theory suggests that catastrophic earth-history events can drive rapid biological evolution, but empirical evidence for such processes is scarce. Destructive geological events such as earthquakes can represent large-scale natural experiments for inferring such evolutionary processes. We capitalized on a major prehistoric (800 yr BP) geological uplift event affecting a southern New Zealand coastline to test for the lasting genomic impacts of disturbance. Genome-wide analyses of three co-distributed keystone kelp taxa revealed that post-earthquake recolonization drove the evolution of novel, large-scale intertidal spatial genetic ‘sectors’ which are tightly linked to geological fault boundaries. Demographic simulations confirmed that, following widespread extirpation, parallel expansions into newly vacant habitats rapidly restructured genome-wide diversity. Interspecific differences in recolonization mode and tempo reflect differing ecological constraints relating to habitat choice and dispersal capacity among taxa. This study highlights the rapid and enduring evolutionary effects of catastrophic ecosystem disturbance and reveals the key role of range expansion in reshaping spatial genetic patterns.

Secure large-scale genome-wide association studies using homomorphic encryption

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1918257117 ◽

2020 ◽

Vol 117 (21) ◽

pp. 11608-11613 ◽

Cited By ~ 1

Author(s):

Marcelo Blatt ◽

Alexander Gusev ◽

Yuriy Polyakov ◽

Shafi Goldwasser

Keyword(s):

Large Scale ◽

Homomorphic Encryption ◽

Association Studies ◽

Genome Wide Association ◽

Single Server ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

User Interactions ◽

Individual Level ◽

Genome Wide

Genome-wide association studies (GWASs) seek to identify genetic variants associated with a trait, and have been a powerful approach for understanding complex diseases. A critical challenge for GWASs has been the dependence on individual-level data that typically have strict privacy requirements, creating an urgent need for methods that preserve the individual-level privacy of participants. Here, we present a privacy-preserving framework based on several advances in homomorphic encryption and demonstrate that it can perform an accurate GWAS analysis for a real dataset of more than 25,000 individuals, keeping all individual data encrypted and requiring no user interactions. Our extrapolations show that it can evaluate GWASs of 100,000 individuals and 500,000 single-nucleotide polymorphisms (SNPs) in 5.6 h on a single server node (or in 11 min on 31 server nodes running in parallel). Our performance results are more than one order of magnitude faster than prior state-of-the-art results using secure multiparty computation, which requires continuous user interactions, with the accuracy of both solutions being similar. Our homomorphic encryption advances can also be applied to other domains where large-scale statistical analyses over encrypted data are needed.
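
The two extrapolated runtimes quoted above (5.6 h on one server node, 11 min on 31 nodes) can be checked against each other with a line of arithmetic. The sketch below is only a consistency check on the published figures; the function name is hypothetical, and the 11 min value is presumably rounded.

```python
def parallel_efficiency(single_node_hours, nodes, observed_minutes):
    """Compare an observed multi-node runtime with ideal linear scaling."""
    # Runtime if the work divided perfectly across the nodes.
    ideal_minutes = single_node_hours * 60.0 / nodes
    # Efficiency of 1.0 means perfectly linear scaling.
    return ideal_minutes, ideal_minutes / observed_minutes


ideal, efficiency = parallel_efficiency(5.6, 31, 11.0)
print(f"ideal: {ideal:.2f} min, efficiency: {efficiency:.1%}")
```

The ideal figure is about 10.8 min, so the reported 11 min corresponds to close to linear scaling across the 31 nodes.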

Sex Differences in Urate Handling

International Journal of Molecular Sciences ◽

10.3390/ijms21124269 ◽

2020 ◽

Vol 21 (12) ◽

pp. 4269 ◽

Cited By ~ 1

Author(s):

Victoria L. Halperin Kuhns ◽

Owen M. Woodward

Keyword(s):

Sex Differences ◽

Disease Risk ◽

Association Studies ◽

Elevated Serum ◽

Serum Urate ◽

Genome Wide Association Studies ◽

Renal Handling ◽

Physiological Regulation ◽

Genome Wide

Hyperuricemia, or elevated serum urate, causes urate kidney stones and gout and also increases the incidence of many other conditions including renal disease, cardiovascular disease, and metabolic syndrome. As we gain mechanistic insight into how urate contributes to human disease, a clear sex difference has emerged in the physiological regulation of urate homeostasis. This review summarizes our current understanding of urate as a disease risk factor and how being of the female sex appears protective. Further, we review the mechanisms of renal handling of urate and the significant contributions from powerful genome-wide association studies of serum urate. We also explore the role of sex in the regulation of specific renal urate transporters and the power of new animal models of hyperuricemia to inform on the role of sex and hyperuricemia in disease pathogenesis. Finally, we advocate the use of sex differences in urate handling as a potent tool for gaining a further understanding of the physiological regulation of urate homeostasis and for presenting new avenues for treating the constellation of urate-related pathologies.
