Improved methods for multi-trait fine mapping of pleiotropic risk loci

Abstract: Genome-wide association studies (GWAS) have identified thousands of genomic regions containing genetic variants that increase risk for complex traits and diseases. However, the variants uncovered in GWAS are typically not biologically causal but are instead correlated with the true causal variant through linkage disequilibrium (LD). To discern the true causal variant(s), a variety of statistical fine-mapping methods have been proposed to prioritize variants for functional validation. In this work we introduce a new approach, fastPAINTOR, that leverages evidence across correlated traits, as well as functional annotation data, to improve fine-mapping accuracy at pleiotropic risk loci. To improve computational efficiency, we describe a new importance sampling scheme to perform model inference. First, we demonstrate in simulations that by leveraging functional annotation data, fastPAINTOR increases fine-mapping resolution relative to existing methods. Next, we show that jointly modeling pleiotropic risk regions improves fine-mapping resolution relative to standard single-trait and pleiotropic fine-mapping strategies. We report a reduction in the number of SNPs required for follow-up in order to capture 90% of the causal variants, from 23 SNPs per locus using a single trait to 12 SNPs when fine-mapping two traits simultaneously. Finally, we analyze summary association data from a large-scale GWAS of lipids and show that these improvements are largely sustained in real data.
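
The follow-up metric quoted above (SNPs needed to capture 90% of causal variants) is closely related to the per-locus credible set used in fine-mapping. The sketch below is a minimal illustration of that simpler per-locus quantity, not fastPAINTOR's own procedure: it ranks SNPs by posterior probability and keeps the smallest set that reaches the requested coverage. The function name `credible_set` and the posterior values are assumptions invented for illustration.

```python
from typing import List, Sequence


def credible_set(posteriors: Sequence[float], coverage: float = 0.9) -> List[int]:
    """Return indices of the smallest set of SNPs whose summed posterior
    probability reaches `coverage` of the total posterior mass at the locus."""
    total = sum(posteriors)
    if total <= 0:
        raise ValueError("posterior probabilities must sum to a positive value")
    # Rank SNPs from most to least likely to be causal.
    ranked = sorted(range(len(posteriors)), key=lambda i: posteriors[i], reverse=True)
    chosen: List[int] = []
    mass = 0.0
    for i in ranked:
        chosen.append(i)
        mass += posteriors[i]
        if mass >= coverage * total:
            break
    return chosen


# Hypothetical posteriors for one locus: diffuse evidence from a single trait
# versus sharper evidence after combining two traits.
single_trait = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
two_traits = [0.70, 0.15, 0.10, 0.03, 0.01, 0.01]
print(len(credible_set(single_trait)), len(credible_set(two_traits)))
```

A more concentrated posterior needs fewer SNPs to reach the same coverage, which is the sense in which joint modeling can reduce follow-up effort.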

Discovery and fine-mapping of kidney function loci in first genome-wide association study in Africans

10.1101/2020.06.09.142463 ◽ 2020

Author(s): Segun Fatumo ◽ Tinashe Chikowore ◽ Robert Kalyesubula ◽ Rebecca N Nsubuga ◽ Gershim Asiki ◽ ...

Keyword(s): Kidney Function ◽ Fine Mapping ◽ Large Scale ◽ Genome Wide Association Study ◽ Association Studies ◽ Causal Variant ◽ Genome Wide Association ◽ European Ancestry ◽ Genome Wide Association Studies ◽ Genome Wide

Abstract: Genome-wide association studies (GWAS) for kidney function have uncovered hundreds of risk loci, primarily in populations of European ancestry. We conducted the first GWAS of estimated glomerular filtration rate (eGFR) in Africa in 3288 Ugandans and replicated the findings in 8224 African Americans. We identified two loci associated with eGFR at genome-wide significance (p < 5×10⁻⁸). The most significantly associated variant (rs2433603, p = 2.4×10⁻⁹) in GATM was distinct from previously reported signals. A second association signal mapping near HBB (rs141845179, p = 3.0×10⁻⁸) was not significant after conditioning on a SNP previously reported for eGFR (rs334). However, fine-mapping analyses highlighted rs141845179 as the most likely causal variant at the HBB locus (posterior probability of 0.61). A trans-ethnic genetic risk score (GRS) for eGFR constructed from previously reported lead SNPs was not predictive in the Ugandan population, indicating that additional large-scale efforts in Africa are necessary to gain further insight into the genetic architecture of kidney disease.

Insights from complex trait fine-mapping across diverse populations

10.1101/2021.09.03.21262975 ◽ 2021

Author(s): Masahiro Kanai ◽ Jacob C Ulirsch ◽ Juha Karjalainen ◽ Mitja Kurki ◽ Konrad J Karczewski ◽ ...

Keyword(s): Fine Mapping ◽ Complex Traits ◽ Large Scale ◽ Association Studies ◽ Great Success ◽ Genome Wide Association Studies ◽ Diverse Populations ◽ High Confidence ◽ Causal Variants ◽ Coding Variants

Abstract: Despite the great success of genome-wide association studies (GWAS) in identifying genetic loci significantly associated with diseases, the vast majority of causal variants underlying disease-associated loci have not been identified [1–3]. To create an atlas of causal variants, we performed and integrated fine-mapping across 148 complex traits in three large-scale biobanks (BioBank Japan [4,5], FinnGen [6], and UK Biobank [7,8]; total n = 811,261), resulting in 4,518 variant-trait pairs with high posterior probability (> 0.9) of causality. Of these, we found 285 high-confidence variant-trait pairs replicated across multiple populations, and we characterized multiple contributors to the surprising lack of overlap among fine-mapping results from different biobanks. By studying the bottlenecked Finnish and Japanese populations, we identified 21 and 26 putative causal coding variants with extreme allele frequency enrichment (> 10-fold) in these two populations, respectively. Aggregating data across populations enabled identification of 1,492 unique fine-mapped coding variants and 176 genes in which multiple independent coding variants influence the same trait (i.e., with an allelic series of coding variants). Our results demonstrate that fine-mapping in diverse populations enables novel insights into the biology of complex traits by pinpointing high-confidence causal variants for further characterization.

Better estimation of SNP heritability from summary statistics provides a new understanding of the genetic architecture of complex traits

10.1101/284976 ◽ 2018 ◽ Cited By ~ 6

Author(s): Doug Speed ◽ David J Balding

Keyword(s): Complex Traits ◽ Genetic Architecture ◽ Large Scale ◽ Association Studies ◽ Genome Wide Association Studies ◽ Summary Statistics ◽ Confounding Bias ◽ Conserved Regions ◽ Genome Wide ◽ Variation Explained

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.
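
For context, the equal-contribution model criticized above gives LDSC its core regression: the expected association statistic of SNP j is E[χ²_j] = 1 + N·a + (N·h²/M)·ℓ_j, where ℓ_j is the LD score, N the sample size, M the number of SNPs, and a the confounding term. The sketch below fits that line by ordinary least squares on noise-free toy values. It is a simplified illustration only: the published LDSC software uses a weighted regression, SumHer's alternative heritability models are not reproduced here, and the function name `ldsc_fit` and all numbers are assumptions.

```python
from typing import Sequence, Tuple


def ldsc_fit(chi2: Sequence[float], ld_scores: Sequence[float],
             n_samples: int, n_snps: int) -> Tuple[float, float]:
    """Unweighted least-squares fit of chi2_j = intercept + slope * ld_score_j.

    Returns (h2_estimate, intercept), where h2 = slope * M / N.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    sxx = sum((l - mean_l) ** 2 for l in ld_scores)
    sxy = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_l
    return slope * n_snps / n_samples, intercept


# Toy, noise-free inputs (all values hypothetical).
N, M = 50_000, 1_000_000
h2_true = 0.4
inflation = 0.05  # stands in for the confounding term N*a
ld = [10.0, 50.0, 100.0, 200.0, 400.0]
chi2 = [1.0 + inflation + N * h2_true / M * l for l in ld]

h2_hat, intercept = ldsc_fit(chi2, ld, N, M)
print(round(h2_hat, 3), round(intercept, 3))
```

In this model an intercept above 1 is attributed to confounding, so a model that misstates how heritability varies across SNPs can shift signal between the slope and the intercept.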

Comparison of methods for estimating genetic correlation between complex traits using GWAS summary statistics

10.1101/2020.10.12.336867 ◽ 2020 ◽ Cited By ~ 1

Author(s): Yiliang Zhang ◽ Youshu Cheng ◽ Wei Jiang ◽ Yixuan Ye ◽ Qiongshi Lu ◽ ...

Keyword(s): Genetic Correlation ◽ Complex Traits ◽ Association Studies ◽ Genetic Correlations ◽ Real Data ◽ Estimation Methods ◽ Easy Access ◽ Genome Wide Association Studies ◽ Summary Statistics ◽ Correlation Estimation

Abstract: Genetic correlation is the correlation of additive genetic effects on two phenotypes. It is an informative metric for quantifying the overall genetic similarity between complex traits, which provides insights into their polygenic genetic architecture. Several methods have been proposed to estimate genetic correlations based on data collected from genome-wide association studies (GWAS). Because GWAS summary statistics are easy to access and computationally efficient to work with, methods requiring only GWAS summary statistics as input have become more popular than methods utilizing individual-level genotype data. Here, we present a benchmark study of different summary-statistics-based genetic correlation estimation methods through simulation and real data applications. We focus on two major technical challenges in estimating genetic correlation: marker dependency caused by linkage disequilibrium (LD) and sample overlap between different studies. To assess the performance of different methods in the presence of these two challenges, we first conducted comprehensive simulations with diverse LD patterns and sample overlaps. Then we applied these methods to real GWAS summary statistics for a wide spectrum of complex traits. Based on these experiments, we conclude that methods relying on accurate LD estimation are less robust in real data applications than other methods because of the imprecision of LD obtained from reference panels. Our findings offer guidance on how to choose an appropriate method for genetic correlation estimation in post-GWAS analysis and interpretation.
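
The definition in the first sentence can be made concrete with a toy calculation. The sketch below computes each individual's additive genetic value for two traits and then takes the Pearson correlation between them. It illustrates the definition only, using individual-level toy genotypes; the summary-statistics methods benchmarked in the study estimate the same quantity without such data and are not reproduced here. All genotypes, effect sizes, and function names are assumptions.

```python
import math
from typing import List, Sequence


def genetic_values(genotypes: Sequence[Sequence[float]],
                   effects: Sequence[float]) -> List[float]:
    """Additive genetic value per individual: sum of allele count * effect."""
    return [sum(g * b for g, b in zip(row, effects)) for row in genotypes]


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical allele counts (rows: individuals, columns: SNPs).
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 2, 1], [1, 0, 0]]
beta_trait1 = [0.5, -0.2, 0.1]
beta_same_direction = [1.0, -0.4, 0.2]   # twice the trait-1 effects
beta_opposite = [-0.5, 0.2, -0.1]        # sign-flipped trait-1 effects

g1 = genetic_values(genotypes, beta_trait1)
rg_same = correlation(g1, genetic_values(genotypes, beta_same_direction))
rg_opposite = correlation(g1, genetic_values(genotypes, beta_opposite))
print(round(rg_same, 3), round(rg_opposite, 3))
```

Scaling all effects by a constant leaves the genetic correlation at 1, and flipping their signs gives -1, which shows that the metric captures shared direction rather than effect magnitude.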

Barcoded Bulk QTL mapping reveals highly polygenic and epistatic architecture of complex traits in yeast

10.1101/2021.09.08.459513 ◽ 2021

Author(s): Alex N. Nguyen Ba ◽ Katherine R. Lawrence ◽ Artur Rego-Costa ◽ Shreyas Gopalakrishnan ◽ Daniel Temko ◽ ...

Keyword(s): Quantitative Trait Locus ◽ QTL Mapping ◽ Quantitative Trait ◽ Complex Traits ◽ Large Scale ◽ Genetic Basis ◽ Association Studies ◽ Model Organisms ◽ Genome Wide Association Studies ◽ Trait Locus

Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits.

Significance statement: Understanding the genetic basis of important phenotypes is a central goal of genetics. However, the highly polygenic architectures of complex traits inferred by large-scale genome-wide association studies (GWAS) in humans stand in contrast to the results of quantitative trait locus (QTL) mapping studies in model organisms. Here, we use a barcoding approach to conduct QTL mapping in budding yeast at a scale two orders of magnitude larger than the previous state of the art. The resulting increase in power reveals the polygenic nature of complex traits in yeast, and offers insight into widespread patterns of pleiotropy and epistasis. Our data and analysis methods offer opportunities for future work in systems biology, and have implications for large-scale GWAS in human populations.

Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes ◽ 10.3390/genes13010087 ◽ 2021 ◽ Vol 13 (1) ◽ pp. 87

Author(s): Sean M. Burnard ◽ Rodney A. Lea ◽ Miles Benton ◽ David Eccles ◽ Daniel W. Kennedy ◽ ...

Keyword(s): Multiple Sclerosis ◽ Complex Traits ◽ Multiple Testing ◽ Large Scale ◽ Disease Risk ◽ Association Studies ◽ Meta Analysis ◽ Elastic Net ◽ Genome Wide Association Studies ◽ Multiple Testing Correction

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever-increasing sample sizes are required, while potentially valuable information that is readily available in existing datasets is ignored. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds, while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10⁻⁸) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.
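
The combination of elastic net and stability selection described above can be sketched as follows: fit a penalised model on many random half-samples and report, for each predictor, the fraction of fits in which its coefficient is non-zero (the kind of figure quoted as "98.1% iterations"). This is a minimal illustration under stated assumptions, not the study's pipeline: it uses a linear model with a continuous outcome rather than case-control data, simulated standard-normal predictors as stand-ins for standardised genotypes, a plain coordinate-descent solver, and hypothetical function names and penalty settings.

```python
import random
from typing import List, Sequence


def soft_threshold(z: float, g: float) -> float:
    """Shrink z toward zero by g, returning exactly 0 inside [-g, g]."""
    if z > g:
        return z - g
    if z < -g:
        return z + g
    return 0.0


def elastic_net(X: Sequence[Sequence[float]], y: Sequence[float],
                lam: float, alpha: float, n_iter: int = 50) -> List[float]:
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(alpha*|b|_1 + (1-alpha)/2*|b|_2^2).

    No intercept is fitted, so predictors and outcome are assumed centred.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, starting from beta = 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


def stability_selection(X: Sequence[Sequence[float]], y: Sequence[float],
                        lam: float, alpha: float,
                        n_subsamples: int = 50, seed: int = 0) -> List[float]:
    """Fraction of random half-samples in which each predictor is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam, alpha)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_subsamples for c in counts]


# Simulated data: only the first of six predictors affects the outcome.
rng = random.Random(1)
n, p = 200, 6
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = [row[0] + rng.gauss(0.0, 0.5) for row in X]
freq = stability_selection(X, y, lam=0.4, alpha=0.5)
print([round(f, 2) for f in freq])
```

Ranking predictors by selection frequency rather than by per-SNP p-value is what lets this approach sidestep a per-test multiple-testing threshold, at the cost of having to choose the penalty strength and a frequency cut-off.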

circVAR database: genome-wide archive of genetic variants for human circular RNAs

10.21203/rs.3.rs-48904/v2 ◽ 2020

Author(s): Min Zhao ◽ Hong Qu

Keyword(s): Genetic Variants ◽ Complex Traits ◽ Large Scale ◽ RNA Binding ◽ RNA Binding Proteins ◽ Association Studies ◽ Chromosome 17 ◽ Circular RNAs ◽ Genome Wide Association Studies ◽ Genome Wide

Abstract Background: Circular RNAs (circRNAs) play important roles in regulating gene expression through binding miRNAs and RNA binding proteins. Genetic variation of circRNAs may affect complex traits/diseases by changing their binding efficiency to target miRNAs and proteins. There is a growing demand for investigations of the functions of genetic changes using large-scale experimental evidence. However, there is no online genetic resource for circRNA genes. Results: We performed extensive genetic annotation of 295,526 circRNAs integrated from circBase, circNet and circRNAdb. All pre-computed genetic variants are presented at our online resource, circVAR, with data browsing and search functionality. We explored the chromosome-based distribution of circRNAs and their associated variants. We found that, based on mapping to the 1000 Genomes and ClinVar databases, chromosome 17 has a relatively large number of circRNAs and associated common and health-related genetic variants. Following the annotation of genome-wide association study (GWAS)-based circRNA variants, we found many non-coding variants within circRNAs, suggesting novel mechanisms for common diseases reported from GWAS. For cancer-based somatic variants, we found that chromosome 7 has many highly complex mutations that have been overlooked in previous research. Conclusion: We used the circVAR database to collect SNPs and small insertions and deletions (INDELs) in putative circRNA regions and to identify their potential phenotypic information. To provide a reusable resource for the circRNA research community, we have published all the pre-computed genetic data concerning circRNAs and associated genes together with data query and browsing functions at http://soft.bioinfo-minzhao.org/circvar .

Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity

10.1101/2021.08.02.21261488 ◽ 2021

Author(s): Steven Gazal ◽ Omer Weissbrod ◽ Farhad Hormozdiari ◽ Kushal Dey ◽ Joseph Nasser ◽ ...

Keyword(s): Fine Mapping ◽ Complex Traits ◽ Target Genes ◽ Disease Risk ◽ Association Studies ◽ Common Disease ◽ Disease Genes ◽ Genome Wide Association Studies ◽ Functional Interpretation ◽ Genome Wide

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.

True causal effect size heterogeneity is not required to explain trans-ethnic differences in GWAS signals

10.1101/085092 ◽ 2016 ◽ Cited By ~ 3

Author(s): Daniela Zanetti ◽ Michael E. Weale

Keyword(s): Ethnic Differences ◽ Effect Size ◽ Complex Traits ◽ Causal Effect ◽ Association Studies ◽ African Ancestry ◽ Causal Variant ◽ Genome Wide Association Studies ◽ Population Differences ◽ Relative Risks

Abstract: Through genome-wide association studies (GWASs), researchers have identified hundreds of genetic variants associated with particular complex traits. Previous studies have compared the pattern of association signals across different populations in real data, and these have detected differences in the strength and sometimes even the direction of GWAS signals. These differences could be due to a combination of (1) lack of power (insufficient sample sizes); (2) minor allele frequency (MAF) differences (again affecting power); (3) linkage disequilibrium (LD) differences (affecting power to ‘tag’ the causal variant); and (4) true differences in causal variant effect sizes (defined by relative risks). In the present work, we sought to assess whether the first three of these reasons are sufficient on their own to explain the observed incidence of trans-ethnic differences in replications of GWAS signals, or whether the fourth reason is also required. We simulated case-control data of European, Asian and African ancestry, drawing on observed MAF and LD patterns seen in the 1000 Genomes reference dataset and assuming the true causal relative risks were the same in all three populations. We found that a combination of Euro-centric SNP selection and between-population differences in LD, accentuated by the lower SNP density typical of older GWAS panels, was sufficient to explain the rate of trans-ethnic differences previously reported, without the need to assume between-population differences in true causal SNP effect size. This suggests a cross-population consistency that has implications for our understanding of the interplay between genetics and environment in the aetiology of complex human diseases.
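
Reason (3) above can be illustrated numerically. The sketch below computes the LD measure r² between a causal variant and a genotyped tag SNP from haplotype frequencies, then applies the standard approximation that testing the tag is roughly as powerful as testing the causal variant in a sample r² times as large. This is not the paper's simulation procedure; the frequencies, the two population scenarios, and the function names are assumptions chosen for illustration.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """Squared correlation between two biallelic loci.

    p_ab is the frequency of the haplotype carrying both reference alleles;
    p_a and p_b are the allele frequencies at the two loci.
    """
    d = p_ab - p_a * p_b  # LD coefficient D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))


def effective_sample_size(n: int, r2: float) -> float:
    """Approximate sample size retained when testing a tag instead of the causal SNP."""
    return n * r2


# Same allele frequencies, different haplotype structure in two hypothetical populations.
r2_strong = ld_r2(p_ab=0.25, p_a=0.30, p_b=0.30)
r2_weak = ld_r2(p_ab=0.12, p_a=0.30, p_b=0.30)
print(round(r2_strong, 3), round(r2_weak, 3))
print(round(effective_sample_size(10_000, r2_strong)),
      round(effective_sample_size(10_000, r2_weak)))
```

With identical causal effects and sample sizes, the weakly tagged scenario retains only a small fraction of the power, which is how LD differences alone can produce apparent failures to replicate.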

Causal Haplotype Block Identification in Plant Genome-Wide Association Studies

10.1101/2021.10.28.466332 ◽ 2021

Author(s): Xing Wu ◽ Wei Jiang ◽ Christopher Fragoso ◽ Jing Huang ◽ Geyu Zhou ◽ ...

Keyword(s): Fine Mapping ◽ Complex Traits ◽ Haplotype Block ◽ Association Studies ◽ Crop Improvement ◽ Plant Genome ◽ Genome Wide Association ◽ Genome Wide Association Studies ◽ Haplotype Blocks ◽ Genome Wide

Genome-wide association studies (GWAS) can play an essential role in understanding the genetic basis of complex traits in plants and animals. Conventional SNP-based linear mixed models (LMMs), which are used in many GWAS and test single nucleotide polymorphisms (SNPs) marginally, have successfully identified many loci with major and minor effects. In plants, the relatively small population sizes in GWAS and the high genetic diversity found in many plant species can impede mapping efforts on complex traits. Here we present a novel haplotype-based trait fine-mapping framework, HapFM, to supplement current GWAS methods. HapFM uses genotype data to partition the genome into haplotype blocks, identifies haplotype clusters within each block, and then performs genome-wide haplotype fine-mapping to infer the causal haplotype blocks of a trait. We benchmarked HapFM, GEMMA, BSLMM, and GMMAT on both simulated and real plant GWAS datasets. HapFM consistently achieved higher mapping power than the other GWAS methods in simulations with high polygenicity. Moreover, it achieved higher mapping resolution, especially in regions of high LD, by identifying small causal blocks within the larger haplotype block. In the Arabidopsis flowering time (FT10) datasets, HapFM identified four novel loci compared to GEMMA results, and its average mapping interval was 9.6 times smaller than that of GEMMA. In conclusion, HapFM is tailored for plant GWAS, providing high mapping power on complex traits and improved mapping resolution to facilitate crop improvement.
