Case-control association mapping without cases

The case-control association study is a powerful method for identifying genetic variants that influence disease risk. However, the collection of cases can be time-consuming and expensive; if a disease occurs late in life or is rapidly lethal, it may be more practical to identify family members of cases. Here, we show that replacing cases with their first-degree relatives enables genome-wide association studies by proxy (GWAX). In randomly ascertained cohorts, this approach enables previously infeasible studies of diseases that are absent (or nearly absent) in the cohort. As an illustration, we performed GWAX of 12 common diseases in 116,196 individuals from the UK Biobank. By combining these results with published GWAS summary statistics in a meta-analysis, we replicated established risk loci and identified 17 newly associated risk loci: four in Alzheimer’s disease, eight in coronary artery disease, and five in type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping using family history of disease as a phenotype to be mapped. We anticipate that this approach will prove useful in future genetic studies of complex traits in large population cohorts.
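
The meta-analysis step described in this abstract can be illustrated with a fixed-effect, inverse-variance-weighted combination of per-SNP effect estimates. This is a minimal sketch and not the authors' pipeline: the function name `ivw_meta` and all numbers are hypothetical, and the sketch assumes both estimates are already on the same scale (effects estimated in proxy cases are expected to be attenuated relative to true cases, so rescaling would be needed in practice).

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors
    Returns (combined beta, combined SE, z-score, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value under a standard normal null distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


# Hypothetical estimates for a single SNP from two studies (illustrative only).
beta, se, z, p = ivw_meta(betas=[0.10, 0.08], ses=[0.02, 0.03])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

The more precise study receives the larger weight, so the combined estimate sits closer to it than a simple average would.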

Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes ◽

10.3390/genes13010087 ◽

2021 ◽

Vol 13 (1) ◽

pp. 87

Author(s):

Sean M. Burnard ◽

Rodney A. Lea ◽

Miles Benton ◽

David Eccles ◽

Daniel W. Kennedy ◽

...

Keyword(s):

Multiple Sclerosis ◽

Complex Traits ◽

Multiple Testing ◽

Large Scale ◽

Disease Risk ◽

Association Studies ◽

Meta Analysis ◽

Elastic Net ◽

Genome Wide Association Studies ◽

Multiple Testing Correction

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10^-8) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.
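
The idea of elastic net with stability selection by iterative subsampling can be sketched as follows. This is an illustrative toy, not the authors' analysis: it swaps in a linear (squared-error) elastic net fitted by coordinate descent on a simulated quantitative trait, whereas a case-control study would normally use a penalised logistic model. The function names, penalty values, and simulated data are all assumptions made for the example.

```python
import random


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, lam, alpha, n_sweeps=25):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1-alpha)/2 * ||b||^2)."""
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    means = [sum(row[j] for row in X) / n for j in range(p)]
    cols = [[row[j] - means[j] for row in X] for j in range(p)]  # centred columns
    residual = [yi - y_mean for yi in y]
    coefs = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col = cols[j]
            z = sum(v * v for v in col) / n
            # Covariance of feature j with the partial residual (feature j added back).
            rho = sum(v * (r + v * coefs[j]) for v, r in zip(col, residual)) / n
            new = soft_threshold(rho, lam * alpha) / (z + lam * (1.0 - alpha))
            delta = new - coefs[j]
            if delta != 0.0:
                residual = [r - v * delta for r, v in zip(residual, col)]
                coefs[j] = new
    return coefs


def selection_frequency(X, y, lam, alpha, n_iter=50, seed=0):
    """Fraction of half-sample subsamples in which each feature gets a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_iter):
        idx = rng.sample(range(n), n // 2)
        coefs = elastic_net([X[i] for i in idx], [y[i] for i in idx], lam, alpha)
        for j, c in enumerate(coefs):
            if c != 0.0:
                counts[j] += 1
    return [c / n_iter for c in counts]


# Simulated data: 5 SNPs coded 0/1/2 (allele frequency 0.3); only SNP 0 affects the trait.
rng = random.Random(1)
n_samples, n_snps = 120, 5
X = [[sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n_snps)]
     for _ in range(n_samples)]
y = [2.0 * row[0] + rng.gauss(0.0, 1.0) for row in X]

freq = selection_frequency(X, y, lam=0.4, alpha=0.5)
print([round(f, 2) for f in freq])
```

Ranking SNPs by how often they are selected across subsamples, rather than by a per-SNP p-value, is what allows a variant with an unremarkable marginal p-value to stand out, as in the rs2844482 example (selected in 98.1% of iterations).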

Genome-Wide Meta-Analysis of Late-Onset Alzheimer's Disease Using Rare Variant Imputation in 324,809 Subjects Identifies Novel Rare Variant Locus NCK2: The International Genomics of Alzheimer's Project (IGAP)

10.1101/2021.03.14.21253553 ◽

2021 ◽

Author(s):

Adam C. Naj ◽

Ganna Leonenko ◽

Xueqiu Jian ◽

Benjamin Grenier-Boley ◽

Maria Carolina Dalmasso ◽

...

Keyword(s):

Alzheimer's Disease ◽

Rare Variant ◽

Late Onset ◽

Sequence Data ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association Studies ◽

Genome Wide ◽

The UK

Risk for late-onset Alzheimer's disease (LOAD) is driven by multiple loci primarily identified by genome-wide association studies, many of which are common variants with minor allele frequencies (MAF)>0.01. To identify additional common and rare LOAD risk variants, we performed a GWAS on 25,170 LOAD subjects and 41,052 cognitively normal controls in 44 datasets from the International Genomics of Alzheimer's Project (IGAP). Existing genotype data were imputed using the dense, high-resolution Haplotype Reference Consortium (HRC) r1.1 reference panel. Stage 1 associations of P<10^-5 were meta-analyzed with the European Alzheimer's Disease Biobank (EADB) (n=20,301 cases; 21,839 controls) (stage 2 combined IGAP and EADB). An expanded meta-analysis was performed using a GWAS of parental AD/dementia history in the UK Biobank (UKBB) (n=35,214 cases; 180,791 controls) (stage 3 combined IGAP, EADB, and UKBB). Common variant (MAF≥0.01) associations were identified for 29 loci in stage 2, including novel genome-wide significant associations at TSPAN14 (P=2.33×10^-12), SHARPIN (P=1.56×10^-9), and ATF5/SIGLEC11 (P=1.03×10^-8), and newly significant associations without using AD proxy cases in MTSS1L/IL34 (P=1.80×10^-8), APH1B (P=2.10×10^-13), and CLNK (P=2.24×10^-10). Rare variant (MAF<0.01) associations with genome-wide significance in stage 2 included multiple variants in APOE and TREM2, and a novel association of a rare variant (rs143080277; MAF=0.0054; P=2.69×10^-9) in NCK2, further strengthened with the inclusion of UKBB data in stage 3 (P=7.17×10^-13). Single-nucleus sequence data show that NCK2 is highly expressed in amyloid-responsive microglial cells, suggesting a role in LOAD pathology.

Multiplex Confounding Factor Correction for Genomic Association Mapping with Squared Sparse Linear Mixed Model

10.1101/228114 ◽

2017 ◽

Author(s):

Haohan Wang ◽

Xiang Liu ◽

Yunpeng Xiao ◽

Ming Xu ◽

Eric P. Xing

Keyword(s):

Population Structure ◽

Association Mapping ◽

Complex Traits ◽

Association Studies ◽

Phenotypic Variability ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Confounding Factors ◽

Genetic Loci ◽

Genome Wide

Genome-wide association studies offer a promising way to understand the association between human genomes and complex traits. Many simple polymorphic loci have been shown to explain a significant fraction of phenotypic variability. However, explaining complex traits associated with multifactorial genetic loci remains challenging, especially given the confounding factors caused by population structure, family structure, and cryptic relatedness. In this paper, we propose a Squared-LMM (LMM^2) model that aims to jointly correct population and genetic confounding factors. We offer two strategies for using LMM^2 in association mapping: 1) as an extension of the univariate LMM, which can effectively correct for population structure but considers each SNP in isolation; 2) integrated with a multivariate regression model to discover associations between complex traits and multifactorial genetic loci. We refer to this second model as sparse Squared-LMM (sLMM^2). Further, we extend LMM^2/sLMM^2 by raising the power of our squared model, yielding the LMM^n/sLMM^n model. We demonstrate the practical use of our model with synthetic phenotypic variants generated from genetic loci of Arabidopsis thaliana. The experiment shows that our method achieves a more accurate and significant prediction on the association relationship between traits and loci. We also evaluate our models on collected phenotypes and genotypes with the number of candidate genes that the models could discover. The results suggest the potential and promising usage of our method in genome-wide association studies.

Population stratification in GWAS meta-analysis should be standardized to the best available reference datasets

10.1101/2020.09.03.281568 ◽

2020 ◽

Author(s):

Aliya Sarmanova ◽

Tim Morris ◽

Daniel John Lawson

Keyword(s):

Population Stratification ◽

Association Studies ◽

Meta Analysis ◽

Principal Component ◽

Underlying Structure ◽

Genome Wide Association Studies ◽

UK Biobank ◽

External Reference ◽

Major Disadvantage ◽

The UK

Population stratification has recently been demonstrated to bias genetic studies even in relatively homogeneous populations such as within the British Isles. A key component to correcting for stratification in genome-wide association studies (GWAS) is accurately identifying and controlling for the underlying structure present in the sample. Meta-analysis across cohorts is increasingly important for achieving very large sample sizes, but comes with the major disadvantage that each individual cohort corrects for different population stratification. Here we demonstrate that correcting for structure against an external reference adds significant value to meta-analysis. We treat the UK Biobank as a collection of smaller studies, each of which is geographically localised. We provide software to standardize an external dataset against a reference, provide the UK Biobank principal component loadings for this purpose, and demonstrate the value of this with an analysis of the geographically sampled ALSPAC cohort.
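
Standardizing an external dataset against reference principal component loadings can be sketched as a projection. This is an assumption-laden illustration, not the software the authors provide: the function name, the loadings, and the allele frequencies are hypothetical, and the sketch assumes genotypes are standardized as (g - 2f) / sqrt(2f(1 - f)) using reference allele frequencies f before being multiplied by the loadings.

```python
import math


def project_onto_reference(genotypes, ref_freqs, loadings):
    """Project samples onto principal components defined in a reference cohort.

    genotypes: list of samples, each a list of allele counts (0, 1 or 2) per SNP
    ref_freqs: reference allele frequency per SNP (strictly between 0 and 1)
    loadings:  per-SNP list of loadings, one value per principal component
    Returns one list of PC scores per sample.
    """
    n_pcs = len(loadings[0])
    scores = []
    for sample in genotypes:
        # Standardize with the REFERENCE frequencies, so that every cohort
        # ends up on the same axes regardless of its own allele frequencies.
        standardized = [
            (g - 2.0 * f) / math.sqrt(2.0 * f * (1.0 - f))
            for g, f in zip(sample, ref_freqs)
        ]
        scores.append([
            sum(s * snp_loadings[k] for s, snp_loadings in zip(standardized, loadings))
            for k in range(n_pcs)
        ])
    return scores


# Hypothetical reference data for three SNPs and two principal components.
ref_freqs = [0.5, 0.2, 0.35]
loadings = [[0.6, -0.1], [0.3, 0.8], [-0.2, 0.4]]
cohort = [[1, 0, 2], [2, 1, 0]]
for pcs in project_onto_reference(cohort, ref_freqs, loadings):
    print([round(v, 3) for v in pcs])
```

Because every cohort is projected with the same loadings and frequencies, the resulting covariates describe the same axes of ancestry in each study, which is the property the abstract argues is missing when cohorts compute their own components.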

A meta-analysis of the genome-wide association studies on two genetically correlated phenotypes (self-reported headache and self-reported migraine) identifies four new risk loci for headaches (N=397,385)

10.1101/2021.09.15.21263668 ◽

2021 ◽

Author(s):

Weihua Meng ◽

Parminder Reel ◽

Charvi Nangia ◽

Aravind Rajendrakumar ◽

Harry Hebert ◽

...

Keyword(s):

Association Studies ◽

Meta Analysis ◽

The Self ◽

Genome Wide Association ◽

P Value ◽

Clinical Settings ◽

Genome Wide Association Studies ◽

Genome Wide ◽

Genetic Mechanisms ◽

The UK

Headache is one of the commonest complaints that doctors need to address in clinical settings. The genetic mechanisms of different types of headache are not well understood. In this study, we performed a meta-analysis of genome-wide association studies (GWAS) on the self-reported headache phenotype from the UK Biobank cohort and the self-reported migraine phenotype from the 23andMe resource using metaUSAT for genetically correlated phenotypes (N=397,385). We identified 38 loci for headaches, of which 34 loci have been reported before and 4 loci were newly identified. The LRP1-STAT6-SDR9C7 region on chromosome 12 was the most significantly associated locus, with a leading P value of 1.24 × 10^-62 for rs11172113. The ONECUT2 gene locus on chromosome 18 was the strongest signal among the 4 new loci, with a P value of 1.29 × 10^-9 for rs673939. Our study demonstrated that the genetically correlated phenotypes of self-reported headache and self-reported migraine can be meta-analysed together in theory and in practice to boost study power to identify more new variants for headaches. This study has paved the way for a large GWAS meta-analysis study involving cohorts of different, though genetically correlated, headache phenotypes.

Signals of polygenic adaptation on height have been overestimated due to uncorrected population structure in genome-wide association studies

10.1101/355057 ◽

2018 ◽

Cited By ~ 19

Author(s):

Mashaal Sohail ◽

Robert M. Maier ◽

Andrea Ganna ◽

Alex Bloemendal ◽

Alicia R. Martin ◽

...

Keyword(s):

Population Structure ◽

Association Studies ◽

Meta Analysis ◽

Human Populations ◽

Genome Wide Association Studies ◽

Multiple Traits ◽

Large Numbers ◽

Genome Wide ◽

Polygenic Adaptation ◽

The UK

Genetic predictions of height differ among human populations and these differences are too large to be explained by genetic drift. This observation has been interpreted as evidence of polygenic adaptation. Differences across populations were detected using SNPs genome-wide significantly associated with height, and many studies also found that the signals grew stronger when large numbers of subsignificant SNPs were analyzed. This has led to excitement about the prospect of analyzing large fractions of the genome to detect subtle signals of selection and claims of polygenic adaptation for multiple traits. Polygenic adaptation studies of height have been based on SNP effect size measurements in the GIANT Consortium meta-analysis. Here we repeat the height analyses in the UK Biobank, a much more homogeneously designed study. Our results show that polygenic adaptation signals based on large numbers of SNPs below genome-wide significance are extremely sensitive to biases due to uncorrected population structure.

Pleiotropic Mapping and Annotation Selection in Genome-wide Association Studies with Penalized Gaussian Mixture Models

10.1101/256461 ◽

2018 ◽

Author(s):

Ping Zeng ◽

Xinjie Hao ◽

Xiang Zhou

Keyword(s):

Association Mapping ◽

Complex Traits ◽

Association Studies ◽

Penalized Regression ◽

Genome Wide Association ◽

Accurate Estimation ◽

Genome Wide Association Studies ◽

Multiple Traits ◽

SNP Association ◽

Genome Wide

Motivation: Genome-wide association studies (GWASs) have identified many genetic loci associated with complex traits. A substantial fraction of these identified loci are associated with multiple traits, a phenomenon known as pleiotropy. Identification of pleiotropic associations can help characterize the genetic relationship among complex traits and can facilitate our understanding of disease etiology. Effective pleiotropic association mapping requires the development of statistical methods that can jointly model multiple traits with genome-wide SNPs together. Results: We develop a joint modeling method, which we refer to as the integrative MApping of Pleiotropic association (iMAP). iMAP models summary statistics from GWASs, uses a multivariate Gaussian distribution to account for phenotypic correlation, simultaneously infers genome-wide SNP association pattern using mixture modeling, and has the potential to reveal causal relationships between traits. Importantly, iMAP integrates a large number of SNP functional annotations to substantially improve association mapping power, and, with a sparsity-inducing penalty, is capable of selecting informative annotations from a large, potentially noninformative set. To enable scalable inference of iMAP to association studies with hundreds of thousands of individuals and millions of SNPs, we develop an efficient expectation maximization algorithm based on an approximate penalized regression algorithm. With simulations and comparisons to existing methods, we illustrate the benefits of iMAP both in terms of high association mapping power and in terms of accurate estimation of genome-wide SNP association patterns. Finally, we apply iMAP to perform a joint analysis of 48 traits from 31 GWAS consortia together with 40 tissue-specific SNP annotations generated from the Roadmap Project. iMAP is freely available at www.xzlab.org/software.html.

Combining SNP-to-gene linking strategies to pinpoint disease genes and assess disease omnigenicity

10.1101/2021.08.02.21261488 ◽

2021 ◽

Author(s):

Steven Gazal ◽

Omer Weissbrod ◽

Farhad Hormozdiari ◽

Kushal Dey ◽

Joseph Nasser ◽

...

Keyword(s):

Fine Mapping ◽

Complex Traits ◽

Target Genes ◽

Disease Risk ◽

Association Studies ◽

Common Disease ◽

Disease Genes ◽

Genome Wide Association Studies ◽

Functional Interpretation ◽

Genome Wide

Although genome-wide association studies (GWAS) have identified thousands of disease-associated common SNPs, these SNPs generally do not implicate the underlying target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis, but it is unclear how these strategies should be applied in the context of interpreting common disease risk variants. We developed a framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk, leveraging polygenic analyses of disease heritability to define and estimate their precision and recall. We applied our framework to GWAS summary statistics for 63 diseases and complex traits (average N=314K), evaluating 50 S2G strategies. Our optimal combined S2G strategy (cS2G) included 7 constituent S2G strategies (Exon, Promoter, 2 fine-mapped cis-eQTL strategies, EpiMap enhancer-gene linking, Activity-By-Contact (ABC), and Cicero), and achieved a precision of 0.75 and a recall of 0.33, more than doubling the precision and/or recall of any individual strategy; this implies that 33% of SNP-heritability can be linked to causal genes with 75% confidence. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 7,111 causal SNP-gene-disease triplets (with S2G-derived functional interpretation) with high confidence. Finally, we applied cS2G to genome-wide fine-mapping results for these traits (not restricted to GWAS loci) to rank genes by the heritability linked to each gene, providing an empirical assessment of disease omnigenicity; averaging across traits, we determined that the top 200 (1%) of ranked genes explained roughly half of the heritability linked to all genes. 
Our results highlight the benefits of our cS2G strategy in providing functional interpretation of GWAS findings; we anticipate that precision and recall will increase further under our framework as improved functional assays lead to improved S2G strategies.

High Level of Uromodulin Increases the Risk of Hypertension: A Mendelian Randomization Study

Frontiers in Cardiovascular Medicine ◽

10.3389/fcvm.2021.736001 ◽

2021 ◽

Vol 8 ◽

Author(s):

Ruilian You ◽

Lanlan Chen ◽

Lubin Xu ◽

Dingding Zhang ◽

Haitao Li ◽

...

Keyword(s):

Blood Pressure ◽

Causal Relationship ◽

Mendelian Randomization ◽

Association Studies ◽

Meta Analysis ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Inverse Variance ◽

The UK ◽

High Level

Background: The association of uromodulin and hypertension has been observed in clinical studies, but a causal relationship has not been established. We conducted a two-sample Mendelian randomization (MR) analysis to investigate the causal relationship between uromodulin and blood pressure. Methods: We selected single nucleotide polymorphisms (SNPs) related to urinary uromodulin (uUMOD) and serum uromodulin (sUMOD) from a large Genome-Wide Association Studies (GWAS) meta-analysis study and research in PubMed. Six datasets based on the UK Biobank and the International Consortium for Blood Pressure (ICBP) served as outcomes with a large sample of hypertension (n = 46,188), systolic blood pressure (SBP, n = 1,194,020), and diastolic blood pressure (DBP, n = 1,194,020). The inverse variance weighted (IVW) method was performed in uUMOD MR analysis, while methods of IVW, MR-Egger, Weighted median, and Mendelian Randomization Pleiotropy RESidual Sum and Outlier (MR-PRESSO) were utilized on sUMOD MR analysis. Results: IVW MR analysis showed the odds ratio (OR) of the uUMOD to hypertension (“ukb-b-14057” and “ukb-b-14177”) is 1.04 (95% Confidence Interval (CI), 1.03-1.04, P < 0.001); the effect sizes of the uUMOD to SBP are 1.10 (Standard error (SE) = 0.25, P = 8.92E-06) and 0.03 (SE = 0.01, P = 2.70E-04) in “ieu-b-38” and “ukb-b-20175”, respectively. The β coefficient of the uUMOD to DBP is 0.88 (SE = 0.19, P = 4.38E-06) in “ieu-b-39” and 0.05 (SE = 0.01, P = 2.13E-10) in “ukb-b-7992”. As for the sUMOD, the OR of hypertension (“ukb-b-14057” and “ukb-b-14177”) is 1.01 (95% CI 1.01–1.02, all P < 0.001). The β coefficient of the SBP is 0.37 (SE = 0.07, P = 1.26E-07) in “ieu-b-38” and 0.01 (SE = 0.003, P = 1.04E-04) in “ukb-b-20175”. The sUMOD is causally associated with elevated DBP (“ieu-b-39”: β = 0.313, SE = 0.050, P = 3.43E-10; “ukb-b-7992”: β = 0.018, SE = 0.003, P = 8.41E-09). Conclusion: Our results indicated that high urinary and serum uromodulin levels are potentially detrimental in elevating blood pressure, and serve as a causal risk factor for hypertension.
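
The inverse variance weighted (IVW) estimator named in the Methods can be sketched in a few lines. This is a minimal illustration of the standard fixed-effect IVW formula with first-order weights, not the authors' analysis: the function name and the per-SNP numbers are hypothetical, and the sketch ignores uncertainty in the SNP-exposure estimates.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate.

    beta_exposure: per-SNP effects on the exposure (e.g. uromodulin level)
    beta_outcome:  per-SNP effects on the outcome (e.g. blood pressure)
    se_outcome:    standard errors of the per-SNP outcome effects
    Returns (causal estimate, standard error).
    """
    weights = [1.0 / se ** 2 for se in se_outcome]
    numerator = sum(w * bx * by
                    for w, bx, by in zip(weights, beta_exposure, beta_outcome))
    denominator = sum(w * bx * bx for w, bx in zip(weights, beta_exposure))
    return numerator / denominator, math.sqrt(1.0 / denominator)


# Hypothetical summary statistics for three instrument SNPs (illustrative only).
estimate, std_error = ivw_estimate(
    beta_exposure=[0.10, 0.20, 0.30],
    beta_outcome=[0.06, 0.09, 0.16],
    se_outcome=[0.01, 0.02, 0.01],
)
print(f"IVW estimate = {estimate:.3f} (SE = {std_error:.3f})")
```

The estimate equals the slope of a weighted regression of outcome effects on exposure effects through the origin, which is why methods such as MR-Egger, which relax the zero-intercept constraint, are commonly run alongside it as sensitivity analyses.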

Discovering patterns of pleiotropy in genome-wide association studies

10.1101/273540 ◽

2018 ◽

Cited By ~ 2

Author(s):

Jianan Zhana ◽

Jessica van Setten ◽

Jennifer Brody ◽

Brenton Swenson ◽

Anne M. Butler ◽

...

Keyword(s):

Gold Standard ◽

Disease Risk ◽

Association Studies ◽

Meta Analysis ◽

Linkage Disequilibrium Block ◽

Genome Wide Association ◽

Superior Performance ◽

Great Success ◽

Genome Wide Association Studies ◽

Genome Wide

Motivation: Genome-wide association studies have had great success in identifying human genetic variants associated with disease, disease risk factors, and other biomedical phenotypes. Many variants are associated with multiple traits, even after correction for trait-trait correlation. Discovering subsets of variants associated with a shared subset of phenotypes could help reveal disease mechanisms, suggest new therapeutic options, and increase the power to detect additional variants with similar patterns of association. Here we introduce two methods based on a Bayesian framework, SNP And Pleiotropic PHenotype Organization (SAPPHO), one modeling independent phenotypes (SAPPHO-I) and the other incorporating a full phenotype covariance structure (SAPPHO-C). These two methods learn patterns of pleiotropy from genotype and phenotype data, using identified associations to discover additional associations with shared patterns. Results: The SAPPHO methods, along with other recent approaches for pleiotropic association tests, were assessed using data from the Atherosclerosis Risk in Communities (ARIC) study of 8,000 individuals, whose gold-standard associations were provided by meta-analysis of 40,000 to 100,000 individuals from the CHARGE consortium. Using power to detect gold-standard associations at genome-wide significance (0.05 family-wise error rate) as a metric, SAPPHO performed best. The SAPPHO methods were also uniquely able to select the most significant variants in a parsimonious model, excluding other less likely variants within a linkage disequilibrium block. For meta-analysis, the SAPPHO methods implement summary modes that use sufficient statistics rather than full phenotype and genotype data. Meta-analysis applied to CHARGE detected 16 additional associations to the gold-standard loci, as well as 124 novel loci, at 0.05 false discovery rate. Reasons for the superior performance were explored by performing simulations over a range of scenarios describing different genetic architectures. With SAPPHO we were able to learn genetic structures that were hidden using the traditional univariate tests. Availability: https://bitbucket.org/baderlab/fast/wiki/Home. SAPPHO software is available under the GNU General Public License, v2.
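
The abstract reports results at two different error criteria: a 0.05 family-wise error rate and a 0.05 false discovery rate. The difference can be illustrated with generic procedures for each. This sketch uses Bonferroni correction and the Benjamini-Hochberg step-up procedure as stand-ins; the abstract does not say which procedures SAPPHO uses, and the p-values below are invented for the example.

```python
def bonferroni(pvalues, alpha=0.05):
    """Control the family-wise error rate: reject only if p <= alpha / m."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Control the false discovery rate with the Benjamini-Hochberg step-up rule."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= rank / m * alpha:
            largest_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[index] = True
    return rejected


# Invented p-values for five tests (illustrative only).
pvals = [0.001, 0.012, 0.025, 0.2, 0.6]
print("Bonferroni:        ", bonferroni(pvals))
print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

Controlling the false discovery rate tolerates a small expected proportion of false positives among the reported hits, so it typically yields more discoveries than family-wise control at the same nominal level.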
