Analysis across Taiwan Biobank, Biobank Japan and UK Biobank identifies hundreds of novel loci for 36 quantitative traits

Abstract: Genome-wide association studies (GWAS) have identified tens of thousands of genetic loci associated with human complex traits and diseases [1,2]. However, the majority of GWAS were conducted in individuals of European ancestry [3]. Failure to capture global genetic diversity has limited biological discovery and impeded equitable delivery of genomic knowledge to diverse populations [4]. Here we report findings from 102,900 individuals across 36 human quantitative traits in the Taiwan Biobank (TWB), a major biobank effort that broadens the population diversity of genetic studies in East Asia (EAS). We identified 979 novel genetic loci, pinpointed novel causal variants through fine-mapping, compared the genetic architecture across TWB, Biobank Japan (BBJ) [5–7] and UK Biobank (UKBB) [8,9], and demonstrated the utility of cross-phenotype, cross-population polygenic risk scores (PRS) in disease risk prediction. We release all GWAS summary statistics, fine-mapping results, single nucleotide polymorphism (SNP) weights, and TWB-based PRS reference distributions for polygenic prediction (link to appear upon publication) to facilitate within-EAS and cross-population genetic research.
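
As a rough illustration of how released SNP weights and a PRS reference distribution are typically used, the sketch below computes a score as a weighted sum of effect-allele dosages and expresses it as a z-score against a reference sample. The SNP identifiers, weights, and function names are hypothetical and are not taken from the TWB release; real pipelines also handle allele alignment and missing genotypes, which are omitted here.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages for one individual.

    dosages: dict mapping SNP id -> effect-allele count (0, 1, 2 or imputed)
    weights: dict mapping SNP id -> per-allele effect size (e.g. a GWAS beta)
    SNPs missing from `dosages` are skipped rather than imputed.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


def standardize(score, reference_scores):
    """Express a raw score as a z-score against a reference distribution."""
    n = len(reference_scores)
    mean = sum(reference_scores) / n
    var = sum((s - mean) ** 2 for s in reference_scores) / (n - 1)
    return (score - mean) / var ** 0.5


# Hypothetical SNP ids and weights, for illustration only.
weights = {"rs_a": 0.5, "rs_b": -0.25, "rs_c": 1.0}
person = {"rs_a": 2, "rs_b": 1, "rs_c": 0}
raw = polygenic_score(person, weights)
```

Standardizing against a reference distribution from the same population is what makes a raw score interpretable for an individual, which is presumably why reference distributions are released alongside the weights.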

Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity

10.1101/2021.08.02.21261488 ◽ 2021

Author(s): Steven Gazal ◽ Omer Weissbrod ◽ Farhad Hormozdiari ◽ Kushal Dey ◽ Joseph Nasser ◽ ...

Keyword(s): Fine Mapping ◽ Complex Traits ◽ Target Genes ◽ Disease Risk ◽ Association Studies ◽ Common Disease ◽ Disease Genes ◽ Genome Wide Association Studies ◽ Functional Interpretation ◽ Genome Wide

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.
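
The omnigenicity summary above (the share of gene-linked heritability carried by the top 1% of ranked genes) can be sketched as a simple ranking calculation. This is only an illustration of the final summary statistic, assuming per-gene heritability estimates are already available and non-negative; the function name and the toy values are hypothetical, and the cS2G linking step itself is not reproduced.

```python
def top_gene_share(gene_h2, fraction=0.01):
    """Share of total gene-linked heritability carried by the top-ranked genes.

    gene_h2: dict mapping gene name -> heritability linked to that gene
    fraction: proportion of genes to treat as "top" (0.01 means the top 1%)
    """
    if not gene_h2:
        raise ValueError("gene_h2 is empty")
    ranked = sorted(gene_h2.values(), reverse=True)
    k = max(1, round(fraction * len(ranked)))  # always keep at least one gene
    total = sum(ranked)
    return sum(ranked[:k]) / total


# Toy example: one dominant gene and 99 small ones (made-up values).
toy = {"g0": 99.0}
toy.update({f"g{i}": 1.0 for i in range(1, 100)})
share = top_gene_share(toy, fraction=0.01)
```

A share near 0.5 for the top 1% of genes, as reported in the abstract, indicates a distribution that is concentrated but far from dominated by a handful of genes.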

Sexual differences in genetic architecture in UK Biobank

10.1101/2020.07.20.211813 ◽ 2020

Author(s): Elena Bernabeu ◽ Oriol Canela-Xandri ◽ Konrad Rawlik ◽ Andrea Talenti ◽ James Prendergast ◽ ...

Keyword(s): Sexual Dimorphism ◽ Complex Traits ◽ Genetic Architecture ◽ Molecular Mechanisms ◽ Association Studies ◽ Mammalian Species ◽ Genetic Correlations ◽ European Ancestry ◽ Genome Wide Association Studies ◽ UK Biobank

Abstract: Sex is arguably the most important differentiating characteristic in most mammalian species, separating populations into different groups, with varying behaviors, morphologies, and physiologies based on their complement of sex chromosomes. In humans, despite males and females sharing nearly identical genomes, there are differences between the sexes in complex traits and in the risk of a wide array of diseases. Gene by sex interactions (GxS) are thought to account for some of this sexual dimorphism. However, the extent and basis of these interactions are poorly understood. Here we provide insights into both the scope and mechanism of GxS across the genome of circa 450,000 individuals of European ancestry and 530 complex traits in the UK Biobank. We found small yet widespread differences in genetic architecture across traits through the calculation of sex-specific heritability, genetic correlations, and sex-stratified genome-wide association studies (GWAS). We also found that, in some cases, sex-agnostic GWAS efforts might be missing loci of interest, and looked into possible improvements in the prediction of high-level phenotypes. Finally, we studied the potential functional role of the dimorphism observed through sex-biased eQTL and gene-level analyses. This study marks a broad examination of the genetics of sexual dimorphism. Our findings parallel previous reports, suggesting sexual genetic heterogeneity of generally modest magnitude across complex traits. Our results suggest the need to consider sex-stratified analyses in future studies in order to shed light on possible sex-specific molecular mechanisms.
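
One common way to flag sex-dimorphic effects from sex-stratified GWAS output is a z-test on the difference between the male and female effect estimates. The sketch below shows that generic test; it is not confirmed to be the procedure used in this study, it assumes the two estimates come from independent samples, and the function name is hypothetical.

```python
import math


def sex_difference_test(beta_m, se_m, beta_f, se_f):
    """Two-sided z-test for a difference between male and female SNP effects.

    Assumes the two estimates are independent (no sample overlap and
    negligible relatedness between the male and female strata).
    """
    z = (beta_m - beta_f) / math.sqrt(se_m ** 2 + se_f ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p


# Made-up effect estimates for a single SNP.
z, p = sex_difference_test(beta_m=0.5, se_m=0.3, beta_f=0.1, se_f=0.4)
```

Applied genome-wide, such a test needs the same multiple-testing correction as an ordinary GWAS.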

A Genetic Map of the Modern Urban Society of Amsterdam

Frontiers in Genetics ◽ 10.3389/fgene.2021.727269 ◽ 2021 ◽ Vol 12

Author(s): Bart Ferwerda ◽ Abdel Abdellaoui ◽ Max Nieuwdorp ◽ Koos Zwinderman

Keyword(s): Genetic Variation ◽ Complex Traits ◽ Disease Risk ◽ Association Studies ◽ Urban Setting ◽ European Ancestry ◽ Joint Analysis ◽ Genome Wide Association Studies ◽ Comprehensive Overview ◽ Polygenic Scores

Genetic differences between individuals underlie susceptibility to many diseases. Genome-wide association studies (GWAS) have discovered many susceptibility genes but were often limited to cohorts of predominantly European ancestry. Genetic diversity between individuals due to different ancestries and evolutionary histories shows that this approach has limitations. In order to gain a better understanding of the associated genetic variation, we need a more global genomics approach that includes greater diversity. Here, we introduce the Healthy Life in an Urban Setting (HELIUS) cohort. The HELIUS cohort consists of participants living in Amsterdam, with a level of diversity that reflects the Dutch colonial and recent migration past. The current study includes 10,283 participants with genetic data available from seven groups of inhabitants, namely, Dutch, African Surinamese, South-Asian Surinamese, Turkish, Moroccan, Ghanaian, and Javanese Surinamese. First, we describe the genetic variation and admixture within the HELIUS cohort. Second, we show the challenges that arise during imputation in a genetically diverse cohort. Third, we conduct GWAS of body mass index (BMI) and height, in which we investigate the effects of a joint analysis of the entire cohort and of a meta-analysis approach across the different subgroups. Finally, we construct polygenic scores for BMI and height and compare their predictive power across the different ethnic groups. Overall, we give a comprehensive overview of a genetically diverse cohort from Amsterdam. Our study emphasizes the importance of a less biased and more realistic representation of urban populations for mapping genetic associations with complex traits and disease risk for all.
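
The meta-analysis route mentioned above combines per-subgroup GWAS estimates for each SNP. A minimal sketch of the standard inverse-variance-weighted fixed-effect combination follows; the abstract does not state which meta-analysis model was used, so this is an assumption, and the function name and numbers are illustrative.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance-weighted fixed-effect meta-analysis for one SNP.

    betas: per-subgroup effect estimates
    ses:   per-subgroup standard errors (same order as betas)
    Returns the pooled effect estimate and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]  # more precise groups weigh more
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    return beta, se


# Made-up estimates from two subgroups with equal precision.
pooled_beta, pooled_se = fixed_effect_meta([0.1, 0.3], [0.1, 0.1])
```

A fixed-effect model assumes the same true effect in every subgroup; when effects may differ by ancestry, a random-effects model or a heterogeneity test is the usual complement.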

Genetics of complex traits: prediction of phenotype, identification of causal polymorphisms and genetic architecture

Proceedings of The Royal Society B Biological Sciences ◽ 10.1098/rspb.2016.0569 ◽ 2016 ◽ Vol 283 (1835) ◽ pp. 20160569 ◽ Cited By ~ 52

Author(s): M. E. Goddard ◽ K. E. Kemper ◽ I. M. MacLeod ◽ A. J. Chamberlain ◽ B. J. Hayes

Keyword(s): Complex Traits ◽ Genetic Architecture ◽ Quantitative Traits ◽ Association Studies ◽ Genome Wide Association Studies ◽ Nucleotide Polymorphisms ◽ Crop Breeding ◽ Single Nucleotide ◽ Genome Wide ◽ Phenotype Identification

Complex or quantitative traits are important in medicine, agriculture and evolution, yet, until recently, few of the polymorphisms that cause variation in these traits were known. Genome-wide association studies (GWAS), based on the ability to assay thousands of single nucleotide polymorphisms (SNPs), have revolutionized our understanding of the genetics of complex traits. We advocate the analysis of GWAS data by a statistical method that fits all SNP effects simultaneously, assuming that these effects are drawn from a prior distribution. We illustrate how this method can be used to predict future phenotypes, to map and identify the causal mutations, and to study the genetic architecture of complex traits. The genetic architecture of complex traits is even more complex than previously thought: in almost every trait studied there are thousands of polymorphisms that explain genetic variation. Methods of predicting future phenotypes, collectively known as genomic selection or genomic prediction, have been widely adopted in livestock and crop breeding, leading to increased rates of genetic improvement.
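
The simplest instance of fitting all SNP effects simultaneously under a prior is ridge regression, which corresponds to a Gaussian prior on the effects (often called SNP-BLUP). The sketch below uses that special case purely for illustration; the authors' preferred prior may well differ (for example a mixture distribution), and the function names and toy data are hypothetical.

```python
def ridge_snp_effects(X, y, lam):
    """Solve (X'X + lam*I) b = X'y by Gaussian elimination.

    X:   list of rows (individuals) of centred genotype values
    y:   list of centred phenotypes
    lam: ridge penalty; under a Gaussian prior it equals the ratio of
         residual variance to per-SNP effect variance
    """
    n, p = len(X), len(X[0])
    # Build the augmented system [X'X + lam*I | X'y].
    A = [[sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
          for k in range(p)] + [sum(X[i][j] * y[i] for i in range(n))]
         for j in range(p)]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            f = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution.
    b = [0.0] * p
    for j in reversed(range(p)):
        b[j] = (A[j][p] - sum(A[j][k] * b[k] for k in range(j + 1, p))) / A[j][j]
    return b


def predict(X, b):
    """Predicted (centred) phenotype for each row of genotypes."""
    return [sum(x * w for x, w in zip(row, b)) for row in X]


# Toy data: four individuals, two SNPs, columns already centred.
X = [[1, 0], [0, 1], [-1, 0], [0, -1]]
y = [2, 1, -2, -1]
effects = ridge_snp_effects(X, y, lam=2.0)
```

With `lam=0` the same function returns the least-squares estimates, so the difference between the two fits shows how the prior shrinks every effect toward zero.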

GWAS of three molecular traits highlights core genes and pathways alongside a highly polygenic background

10.1101/2020.04.20.051631 ◽ 2020 ◽ Cited By ~ 6

Author(s): Nasa Sinnott-Armstrong ◽ Sahin Naqvi ◽ Manuel Rivas ◽ Jonathan K Pritchard

Keyword(s): Complex Traits ◽ Genetic Basis ◽ Association Studies ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Biological Processes ◽ UK Biobank ◽ The Core ◽ Genome Wide ◽ Core Genes

Summary: Genome-wide association studies (GWAS) have been used to study the genetic basis of a wide variety of complex diseases and other traits. However, for most traits it remains difficult to interpret what genes and biological processes are impacted by the top hits. Here, as a contrast, we describe UK Biobank GWAS results for three molecular traits (urate, IGF-1, and testosterone) that are biologically simpler than most diseases, and for which we know a great deal in advance about the core genes and pathways. Unlike most GWAS of complex traits, for all three traits we find that most top hits are readily interpretable. We observe huge enrichment of significant signals near genes involved in the relevant biosynthesis, transport, or signaling pathways. We show how GWAS data illuminate the biology of variation in each trait, including insights into differences in testosterone regulation between females and males. Meanwhile, in other respects the results are reminiscent of GWAS for more-complex traits. In particular, even these molecular traits are highly polygenic, with most of the variance coming not from core genes, but from thousands to tens of thousands of variants spread across most of the genome. Given that diseases are often impacted by many distinct biological processes, including these three, our results help to illustrate why so many variants can affect risk for any given disease.

Increasing Sample Diversity in Psychiatric Genetics – Introducing a new Cohort of Patients with Schizophrenia and Controls from Vietnam – Results from a Pilot Study

10.1101/2021.04.21.21255615 ◽ 2021

Author(s): VT Nguyen ◽ A Braun ◽ J Kraft ◽ TMT Ta ◽ GM Panagiotaropoulou ◽ ...

Keyword(s): Pilot Study ◽ Data Collection ◽ Predictive Power ◽ Association Studies ◽ East Asian ◽ Genetic Research ◽ European Ancestry ◽ Risk Scores ◽ Genome Wide Association Studies ◽ Genome Wide

Abstract. Objectives: Genome-Wide Association Studies (GWAS) of Schizophrenia (SCZ) have provided new biological insights; however, most cohorts are of European ancestry. As a result, derived polygenic risk scores (PRS) show decreased predictive power when applied to populations of different ancestries. We aimed to assess the feasibility of a large-scale data collection in Hanoi, Vietnam, contribute to international efforts to diversify ancestry in SCZ genetic research and examine the transferability of SCZ-PRS to individuals of Vietnamese Kinh ancestry. Methods: In a pilot study, 368 individuals (including 190 SCZ cases) were recruited at the Hanoi Medical University’s associated psychiatric hospitals and outpatient facilities. Data collection included sociodemographic data, baseline clinical data, clinical interviews assessing symptom severity and genome-wide SNP genotyping. SCZ-PRS were generated using different training data sets: i) European, ii) East-Asian and iii) trans-ancestry GWAS summary statistics from the latest SCZ GWAS meta-analysis. Results: SCZ-PRS significantly predicted case status in Vietnamese individuals using mixed-ancestry (liability-scale R² = 4.9%, p = 6.83 × 10^-8), East-Asian (liability-scale R² = 4.5%, p = 2.73 × 10^-7) and European (liability-scale R² = 3.8%, p = 1.79 × 10^-6) discovery samples. Discussion: Our results corroborate previous findings of reduced PRS predictive power across populations, highlighting the importance of ancestral diversity in GWA studies.

Multiplex Confounding Factor Correction for Genomic Association Mapping with Squared Sparse Linear Mixed Model

10.1101/228114 ◽ 2017

Author(s): Haohan Wang ◽ Xiang Liu ◽ Yunpeng Xiao ◽ Ming Xu ◽ Eric P. Xing

Keyword(s): Population Structure ◽ Association Mapping ◽ Complex Traits ◽ Association Studies ◽ Phenotypic Variability ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Confounding Factors ◽ Genetic Loci ◽ Genome Wide

Abstract: Genome-wide association studies offer a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, it remains difficult to explain complex traits associated with multifactorial genetic loci, especially given the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM²) model, aiming to jointly correct population and genetic confounding factors. We offer two strategies for using LMM² in association mapping: 1) It serves as an extension of the univariate LMM, which can effectively correct for population structure but considers each SNP in isolation. 2) It is integrated with a multivariate regression model to discover associations between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM²). Further, we extend LMM²/sLMM² by raising the power of our squared model, giving the LMMⁿ/sLMMⁿ model. We demonstrate the practical use of our model with synthetic phenotypes generated from genetic loci of Arabidopsis thaliana. The experiment shows that our method achieves more accurate and more significant predictions of the associations between traits and loci. We also evaluate our models on collected phenotypes and genotypes by the number of candidate genes that the models can discover. The results suggest that our method is promising for use in genome-wide association studies.

Case-control association mapping without cases

10.1101/045831 ◽ 2016 ◽ Cited By ~ 2

Author(s): Jimmy Z Liu ◽ Yaniv Erlich ◽ Joseph K Pickrell

Keyword(s): Association Mapping ◽ Complex Traits ◽ Disease Risk ◽ Association Studies ◽ Meta Analysis ◽ Large Population ◽ Case Control ◽ Genome Wide Association Studies ◽ The UK ◽ Control Association

Abstract: The case-control association study is a powerful method for identifying genetic variants that influence disease risk. However, the collection of cases can be time-consuming and expensive; if a disease occurs late in life or is rapidly lethal, it may be more practical to identify family members of cases. Here, we show that replacing cases with their first-degree relatives enables genome-wide association studies by proxy (GWAX). In randomly-ascertained cohorts, this approach enables previously infeasible studies of diseases that are absent (or nearly absent) in the cohort. As an illustration, we performed GWAX of 12 common diseases in 116,196 individuals from the UK Biobank. By combining these results with published GWAS summary statistics in a meta-analysis, we replicated established risk loci and identified 17 newly associated risk loci: four in Alzheimer’s disease, eight in coronary artery disease, and five in type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping using family history of disease as a phenotype to be mapped. We anticipate that this approach will prove useful in future genetic studies of complex traits in large population cohorts.

Polygenic transcriptome risk scores (PTRS) can improve portability of polygenic risk scores across ancestries

Genome Biology ◽ 10.1186/s13059-021-02591-w ◽ 2022 ◽ Vol 23 (1)

Author(s): Yanyu Liang ◽ Milton Pividori ◽ Ani Manichaikul ◽ Abraham A. Palmer ◽ Nancy J. Cox ◽ ...

Keyword(s): Association Studies ◽ Poor Performance ◽ Genome Wide Association ◽ European Ancestry ◽ Risk Scores ◽ Genome Wide Association Studies ◽ UK Biobank ◽ Transcript Levels ◽ Polygenic Risk ◽ Genome Wide

Abstract. Background: Polygenic risk scores (PRS) are valuable for translating the results of genome-wide association studies (GWAS) into clinical practice. To date, most GWAS have been based on individuals of European ancestry, leading to poor performance in populations of non-European ancestry. Results: We introduce the polygenic transcriptome risk score (PTRS), which is based on predicted transcript levels (rather than SNPs), and explore the portability of PTRS across populations using UK Biobank data. Conclusions: We show that PTRS has significantly higher portability (Wilcoxon p = 0.013) in the African-descent samples, where the loss of performance is most acute, with better performance than PRS alone when the two are used in combination.

Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes ◽ 10.3390/genes13010087 ◽ 2021 ◽ Vol 13 (1) ◽ pp. 87

Author(s): Sean M. Burnard ◽ Rodney A. Lea ◽ Miles Benton ◽ David Eccles ◽ Daniel W. Kennedy ◽ ...

Keyword(s): Multiple Sclerosis ◽ Complex Traits ◽ Multiple Testing ◽ Large Scale ◽ Disease Risk ◽ Association Studies ◽ Meta Analysis ◽ Elastic Net ◽ Genome Wide Association Studies ◽ Multiple Testing Correction

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever-increasing sample sizes are required, while potentially valuable information that is readily available in existing datasets is ignored. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds, while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10^-8) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.