Capturing SNP Association across the NK Receptor and HLA Gene Regions in Multiple Sclerosis by Targeted Penalised Regression Models

Genes ◽  
2021 ◽  
Vol 13 (1) ◽  
pp. 87
Author(s):  
Sean M. Burnard ◽  
Rodney A. Lea ◽  
Miles Benton ◽  
David Eccles ◽  
Daniel W. Kennedy ◽  
...  

Conventional genome-wide association studies (GWASs) of complex traits, such as Multiple Sclerosis (MS), are reliant on per-SNP p-values and are therefore heavily burdened by multiple testing correction. Thus, in order to detect more subtle alterations, ever-increasing sample sizes are required, while ignoring potentially valuable information that is readily available in existing datasets. To overcome this, we used penalised regression incorporating elastic net with a stability selection method by iterative subsampling to detect the potential interaction of loci with MS risk. Through re-analysis of the ANZgene dataset (1617 cases and 1988 controls) and an IMSGC dataset as a replication cohort (1313 cases and 1458 controls), we identified new association signals for MS predisposition, including SNPs above and below conventional significance thresholds while targeting two natural killer receptor loci and the well-established HLA loci. For example, rs2844482 (98.1% iterations), otherwise ignored by conventional statistics (p = 0.673) in the same dataset, was independently strongly associated with MS in another GWAS that required more than 40 times the number of cases (~45 K). Further comparison of our hits to those present in a large-scale meta-analysis confirmed that the majority of SNPs identified by the elastic net model reached conventional statistical GWAS thresholds (p < 5 × 10⁻⁸) in this much larger dataset. Moreover, we found that gene variants involved in oxidative stress, in addition to innate immunity, were associated with MS. Overall, this study highlights the benefit of using more advanced statistical methods to (re-)analyse subtle genetic variation among loci that have a biological basis for their contribution to disease risk.
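
To illustrate the stability selection idea in this abstract, here is a minimal sketch; it is not the authors' pipeline. It assumes a squared-error elastic net fitted by coordinate descent on 0/1-coded case status (a simplification, since the abstract does not state the loss function or software used), simulated genotypes, and hypothetical settings for `alpha`, `l1_ratio` and the subsample fraction. A SNP's selection frequency is the share of random subsamples in which its coefficient is non-zero, analogous to the "98.1% iterations" figure quoted for rs2844482.

```python
import numpy as np

def elastic_net_cd(X, y, alpha=0.1, l1_ratio=0.5, n_iter=50):
    """Elastic net on centred data by cyclic coordinate descent.

    Minimises (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1
              + 0.5*alpha*(1 - l1_ratio)*||b||^2.
    """
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.copy()
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # monomorphic SNP in this subsample
            # Covariance of SNP j with the partial residual (SNP j excluded).
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-threshold (L1 part), then shrink (L2 part).
            new = np.sign(rho) * max(abs(rho) - alpha * l1_ratio, 0.0)
            new /= col_sq[j] + alpha * (1.0 - l1_ratio)
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta

def selection_frequency(X, y, n_subsamples=100, frac=0.5, seed=0, **kwargs):
    """Fraction of random subsamples in which each SNP is selected."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean()
        counts += (elastic_net_cd(Xs, ys, **kwargs) != 0)
    return counts / n_subsamples

# Simulated data (hypothetical): 600 people, 20 SNPs, SNPs 0 and 1 causal.
rng = np.random.default_rng(1)
G = rng.binomial(2, 0.3, size=(600, 20)).astype(float)
logit = 1.5 * (G[:, 0] - 0.6) + 1.5 * (G[:, 1] - 0.6)
status = (rng.random(600) < 1 / (1 + np.exp(-logit))).astype(float)
freq = selection_frequency(G, status)
print(np.round(freq, 2))
```

Ranking SNPs by selection frequency rather than by a single p-value is what lets a variant with an unremarkable marginal p-value still stand out, provided it is selected consistently across subsamples.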

2018 ◽  
Author(s):  
David M. Howard ◽  
Mark J. Adams ◽  
Toni-Kim Clarke ◽  
Jonathan D. Hafferty ◽  
Jude Gibson ◽  
...  

Major depression is a debilitating psychiatric illness that is typically associated with low mood, anhedonia and a range of comorbidities. Depression has a heritable component that has remained difficult to elucidate with current sample sizes due to the polygenic nature of the disorder. To maximise sample size, we meta-analysed data on 807,553 individuals (246,363 cases and 561,190 controls) from the three largest genome-wide association studies of depression. We identified 102 independent variants, 269 genes, and 15 gene-sets associated with depression, including both genes and gene-pathways associated with synaptic structure and neurotransmission. Further evidence of the importance of prefrontal brain regions in depression was provided by an enrichment analysis. In an independent replication sample of 1,306,354 individuals (414,055 cases and 892,299 controls), 87 of the 102 associated variants were significant following multiple testing correction. Based on the putative genes associated with depression, this work also highlights several potential drug repositioning opportunities. These findings advance our understanding of the complex genetic architecture of depression and provide several future avenues for understanding aetiology and developing new treatment approaches.


2016 ◽  
Author(s):  
Jimmy Z Liu ◽  
Yaniv Erlich ◽  
Joseph K Pickrell

The case-control association study is a powerful method for identifying genetic variants that influence disease risk. However, the collection of cases can be time-consuming and expensive; if a disease occurs late in life or is rapidly lethal, it may be more practical to identify family members of cases. Here, we show that replacing cases with their first-degree relatives enables genome-wide association studies by proxy (GWAX). In randomly-ascertained cohorts, this approach enables previously infeasible studies of diseases that are absent (or nearly absent) in the cohort. As an illustration, we performed GWAX of 12 common diseases in 116,196 individuals from the UK Biobank. By combining these results with published GWAS summary statistics in a meta-analysis, we replicated established risk loci and identified 17 newly associated risk loci: four in Alzheimer’s disease, eight in coronary artery disease, and five in type 2 diabetes. In addition to informing disease biology, our results demonstrate the utility of association mapping using family history of disease as a phenotype to be mapped. We anticipate that this approach will prove useful in future genetic studies of complex traits in large population cohorts.


2021 ◽  
Author(s):  
Giulia Muzio ◽  
Leslie O'Bray ◽  
Laetitia Meng-Papaxanthos ◽  
Juliane Klatt ◽  
Karsten Borgwardt

While the search for associations between genetic markers and complex traits has discovered tens of thousands of trait-related genetic variants, the vast majority of these only explain a tiny fraction of observed phenotypic variation. One possible strategy to detect stronger associations is to aggregate the effects of several genetic markers and to test entire genes, pathways or (sub)networks of genes for association to a phenotype. The latter, network-based genome-wide association studies, in particular suffers from a huge search space and an inherent multiple testing problem. As a consequence, current approaches are either based on greedy feature selection, thereby risking that they miss relevant associations, and/or neglect doing a multiple testing correction, which can lead to an abundance of false positive findings. To address the shortcomings of current approaches of network-based genome-wide association studies, we propose networkGWAS, a computationally efficient and statistically sound approach to gene-based genome-wide association studies based on mixed models and neighborhood aggregation. It allows for population structure correction and for well-calibrated p-values, which we obtain through a block permutation scheme. networkGWAS successfully detects known or plausible associations on simulated rare variants from H. sapiens data as well as semi-simulated and real data with common variants from A. thaliana and enables the systematic combination of gene-based genome-wide association studies with biological network information.


2020 ◽  
Author(s):  
Amy R. Bentley ◽  
Guanjie Chen ◽  
Ayo P. Doumatey ◽  
Daniel Shriner ◽  
Karlijn Meeks ◽  
...  

Background: Serum lipids are biomarkers of cardiometabolic disease risk, and understanding the genomic factors contributing to their distribution has been of considerable interest. Large genome-wide association studies (GWAS) have identified over 150 lipids loci; however, GWAS of Africans (AF) are rare. Given the genomic diversity among those of African ancestry, it is expected that a GWAS in Africans could identify novel lipids loci. While GWAS have been conducted in African Americans (AA), such studies are not proxies for studies in continental Africans due to the drastically different environmental context. Therefore, we conducted a GWAS of 4,317 Africans enrolled in the Africa America Diabetes Mellitus study. Methods and Results: We used linear mixed models of the inverse normal transformations of covariate-adjusted residuals of high-density lipoprotein cholesterol (HDLC), low-density lipoprotein cholesterol (LDLC), total cholesterol (CHOL), triglycerides (TG), and TG/HDLC, with adjustment for three principal components and the random effect of relatedness. Replication of loci associated at p < 5 × 10⁻⁸ was attempted in 9,542 AA. Meta-analysis of AF and AA was also conducted. We also conducted analyses that excluded the relatively small number of East Africans. We evaluated known lipids loci in Africans using both exact replication and “local” replication, which accounts for interethnic differences in linkage disequilibrium. In our main analysis, we identified 23 novel associations in Africans. Of the 14 of these that were able to be tested in AA, two associations replicated (GPNMB-TG and ENPP1-TG). Two additional novel loci were discovered upon meta-analysis with AA (rs138282551-TG and TLL2-CHOL). Analyses considering only those with predominantly West African ancestry (Nigeria, Ghana, and AA) yielded new insights: ORC5-LDLC and chr20:60973327-CHOL. Conclusions: While functional work will be useful to confirm and understand the biological mechanisms underlying these associations, this study demonstrates the utility of conducting large-scale genomic analyses in Africans for discovering novel loci. The functional significance of some of these loci in relation to lipids remains to be elucidated, yet some have known connections to lipids pathways. For instance, rs147706369 (intronic, TLL2) alters a regulatory motif for sterol regulatory element-binding proteins (SREBPs), which are a family of transcription factors that control the expression of a range of enzymes involved in cholesterol, fatty acid, and triglyceride synthesis.
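
The inverse normal transformation mentioned in the methods can be sketched as follows. This assumes the common rank-based variant (the abstract does not say which variant was used) and that the covariate-adjusted residuals have already been computed. The offset `c` and the example residual values are hypothetical choices for illustration.

```python
from statistics import NormalDist

def inverse_normal_transform(values, c=0.5):
    """Rank-based inverse normal transformation.

    Each value is replaced by the standard-normal quantile of
    (rank - c) / (n - 2c + 1). Tied values share their average rank.
    c = 0.5 is used here; c = 3/8 gives the Blom offset.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the run
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]

# Hypothetical skewed residuals, e.g. from a triglyceride model.
residuals = [0.2, -0.1, 3.5, 0.4, -0.3, 7.9]
print([round(z, 3) for z in inverse_normal_transform(residuals)])
```

The transformation keeps the ordering of individuals but forces the trait onto a normal scale, which limits the influence of extreme lipid values on the mixed-model fit.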


Author(s):  
Jessica D Faul ◽  
Minjung Kho ◽  
Wei Zhao ◽  
Kalee E Rumfelt ◽  
Miao Yu ◽  
...  

Background: Later-life cognitive function is influenced by genetics as well as early- and later-life socioeconomic context. However, few studies have examined the interaction between genetics and early childhood factors. Methods: Using gene-based tests (interaction sequence kernel association test [iSKAT]/iSKAT optimal unified test), we examined whether common and/or rare exonic variants in 39 gene regions previously associated with cognitive performance, dementia, and related traits had an interaction with childhood socioeconomic context (parental education and financial strain) on memory performance or decline in European ancestry (EA, N = 10 468) and African ancestry (AA, N = 2 252) participants from the Health and Retirement Study. Results: Of the 39 genes, 22 in EA and 19 in AA had nominally significant interactions with at least one childhood socioeconomic measure on memory performance and/or decline; however, all but one (father’s education by solute carrier family 24 member 4 [SLC24A4] in AA) were not significant after multiple testing correction (false discovery rate [FDR] < .05). In trans-ethnic meta-analysis, 2 genes interacted with childhood socioeconomic context (FDR < .05): mother’s education by membrane-spanning 4-domains A4A (MS4A4A) on memory performance, and father’s education by SLC24A4 on memory decline. Both interactions remained significant (p < .05) after adjusting for respondent’s own educational attainment, apolipoprotein-ε4 allele (APOE ε4) status, lifestyle factors, body mass index, and comorbidities. For both interactions in EA and AA, the genetic effect was stronger in participants with low parental education. Conclusions: Examination of common and rare variants in genes discovered through genome-wide association studies shows that childhood context may interact with key gene regions to jointly impact later-life memory function and decline. Genetic effects may be more salient for those with lower childhood socioeconomic status.
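
The FDR thresholds in this abstract can be illustrated with a short sketch of the Benjamini-Hochberg adjustment. The abstract does not name the FDR procedure, so Benjamini-Hochberg is an assumption here, and the p-values below are made up for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values (q-values).

    The i-th smallest p-value is scaled by m / i, then a running
    minimum from the largest rank downwards enforces monotonicity.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical gene-level interaction p-values.
gene_pvalues = [0.01, 0.04, 0.03, 0.5]
q = benjamini_hochberg(gene_pvalues)
significant = [qi < 0.05 for qi in q]
print(q, significant)
```

A nominally significant result (p < .05) can fail the FDR threshold once the number of tests is taken into account, which is the pattern the abstract reports for most of the 39 genes.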


2018 ◽  
Author(s):  
Doug Speed ◽  
David J Balding

LD Score Regression (LDSC) has been widely applied to the results of genome-wide association studies. However, its estimates of SNP heritability are derived from an unrealistic model in which each SNP is expected to contribute equal heritability. As a consequence, LDSC tends to over-estimate confounding bias, under-estimate the total phenotypic variation explained by SNPs, and provide misleading estimates of the heritability enrichment of SNP categories. Therefore, we present SumHer, software for estimating SNP heritability from summary statistics using more realistic heritability models. After demonstrating its superiority over LDSC, we apply SumHer to the results of 24 large-scale association studies (average sample size 121 000). First we show that these studies have tended to substantially over-correct for confounding, and as a result the number of genome-wide significant loci has been under-reported by about 20%. Next we estimate enrichment for 24 categories of SNPs defined by functional annotations. A previous study using LDSC reported that conserved regions were 13-fold enriched, and found a further twelve categories with above 2-fold enrichment. By contrast, our analysis using SumHer finds that conserved regions are only 1.6-fold (SD 0.06) enriched, and that no category has enrichment above 1.7-fold. SumHer provides an improved understanding of the genetic architecture of complex traits, which enables more efficient analysis of future genetic data.


2020 ◽  
Vol 117 (26) ◽  
pp. 15028-15035 ◽  
Author(s):  
Ronald Yurko ◽  
Max G’Sell ◽  
Kathryn Roeder ◽  
Bernie Devlin

To correct for a large number of hypothesis tests, most researchers rely on simple multiple testing corrections. Yet, new methodologies of selective inference could potentially improve power while retaining statistical guarantees, especially those that enable exploration of test statistics using auxiliary information (covariates) to weight hypothesis tests for association. We explore one such method, adaptive P-value thresholding (AdaPT), in the framework of genome-wide association studies (GWAS) and gene expression/coexpression studies, with particular emphasis on schizophrenia (SCZ). Selected SCZ GWAS association P values play the role of the primary data for AdaPT; single-nucleotide polymorphisms (SNPs) are selected because they are gene expression quantitative trait loci (eQTLs). This natural pairing of SNPs and genes allows us to map the following covariate values to these pairs: GWAS statistics from genetically correlated bipolar disorder, the effect size of SNP genotypes on gene expression, and gene–gene coexpression, captured by subnetwork (module) membership. In all, 24 covariates per SNP/gene pair were included in the AdaPT analysis using flexible gradient boosted trees. We demonstrate a substantial increase in power to detect SCZ associations using gene expression information from the developing human prefrontal cortex. We interpret these results in light of recent theories about the polygenic nature of SCZ. Importantly, our entire process for identifying enrichment and creating features with independent complementary data sources can be implemented in many different high-throughput settings to ultimately improve power.


2018 ◽  
Vol 21 (2) ◽  
pp. 84-88 ◽  
Author(s):  
W. David Hill

Intelligence and educational attainment are strongly genetically correlated. This relationship can be exploited by Multi-Trait Analysis of GWAS (MTAG) to add power to Genome-wide Association Studies (GWAS) of intelligence. MTAG allows the user to meta-analyze GWASs of different phenotypes, based on their genetic correlations, to identify associations specific to the trait of choice. An MTAG analysis using GWAS data sets on intelligence and education was conducted by Lam et al. (2017). Lam et al. (2017) reported 70 loci that they described as ‘trait specific’ to intelligence. This article examines whether the analysis conducted by Lam et al. (2017) has resulted in genetic information about a phenotype that is more similar to education than intelligence.


2016 ◽  
Author(s):  
Alicia R. Martin ◽  
Christopher R. Gignoux ◽  
Raymond K. Walters ◽  
Genevieve L. Wojcik ◽  
Benjamin M. Neale ◽  
...  

The vast majority of genome-wide association studies are performed in Europeans, and their transferability to other populations is dependent on many factors (e.g. linkage disequilibrium, allele frequencies, genetic architecture). As medical genomics studies become increasingly large and diverse, gaining insights into population history and consequently the transferability of disease risk measurement is critical. Here, we disentangle recent population history in the widely-used 1000 Genomes Project reference panel, with an emphasis on populations underrepresented in medical studies. To examine the transferability of single-ancestry GWAS, we used published summary statistics to calculate polygenic risk scores for six well-studied traits and diseases. We identified directional inconsistencies in all scores; for example, height is predicted to decrease with genetic distance from Europeans, despite robust anthropological evidence that West Africans are as tall as Europeans on average. To gain deeper quantitative insights into GWAS transferability, we developed a complex trait coalescent-based simulation framework considering effects of polygenicity, causal allele frequency divergence, and heritability. As expected, correlations between true and inferred risk were typically highest in the population from which summary statistics were derived. We demonstrated that scores inferred from European GWAS were biased by genetic drift in other populations even when choosing the same causal variants, and that biases in any direction were possible and unpredictable. This work cautions that summarizing findings from large-scale GWAS may have limited portability to other populations using standard approaches, and highlights the need for generalized risk prediction methods and the inclusion of more diverse individuals in medical genomics.
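
For readers unfamiliar with polygenic risk scores, the basic calculation from summary statistics can be sketched as a weighted sum of effect-allele counts. This is a minimal illustration rather than the authors' scoring pipeline: the SNP identifiers, effect sizes and genotypes are hypothetical, and steps such as LD clumping and p-value thresholding are omitted.

```python
def polygenic_score(dosages, weights):
    """Sum of per-SNP effect sizes times effect-allele dosage (0, 1 or 2).

    SNPs missing from the summary statistics contribute nothing.
    """
    return sum(weights[snp] * dose
               for snp, dose in dosages.items()
               if snp in weights)

# Hypothetical GWAS effect sizes (per copy of the effect allele).
gwas_weights = {"snpA": 0.5, "snpB": -0.25, "snpC": 0.125}

# Hypothetical genotypes for two individuals.
person_1 = {"snpA": 2, "snpB": 1, "snpC": 0}
person_2 = {"snpA": 0, "snpB": 2, "snpD": 1}  # snpD has no weight

print(polygenic_score(person_1, gwas_weights))
print(polygenic_score(person_2, gwas_weights))
```

Because the weights are estimated in one population, a shift in allele frequencies or linkage disequilibrium elsewhere changes the score distribution even when the underlying biology is shared, which is the portability problem the abstract describes.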


2021 ◽  
Author(s):  
Alex N. Nguyen Ba ◽  
Katherine R. Lawrence ◽  
Artur Rego-Costa ◽  
Shreyas Gopalakrishnan ◽  
Daniel Temko ◽  
...  

Mapping the genetic basis of complex traits is critical to uncovering the biological mechanisms that underlie disease and other phenotypes. Genome-wide association studies (GWAS) in humans and quantitative trait locus (QTL) mapping in model organisms can now explain much of the observed heritability in many traits, allowing us to predict phenotype from genotype. However, constraints on power due to statistical confounders in large GWAS and smaller sample sizes in QTL studies still limit our ability to resolve numerous small-effect variants, map them to causal genes, identify pleiotropic effects across multiple traits, and infer non-additive interactions between loci (epistasis). Here, we introduce barcoded bulk quantitative trait locus (BB-QTL) mapping, which allows us to construct, genotype, and phenotype 100,000 offspring of a budding yeast cross, two orders of magnitude larger than the previous state of the art. We use this panel to map the genetic basis of eighteen complex traits, finding that the genetic architecture of these traits involves hundreds of small-effect loci densely spaced throughout the genome, many with widespread pleiotropic effects across multiple traits. Epistasis plays a central role, with thousands of interactions that provide insight into genetic networks. By dramatically increasing sample size, BB-QTL mapping demonstrates the potential of natural variants in high-powered QTL studies to reveal the highly polygenic, pleiotropic, and epistatic architecture of complex traits. Significance statement: Understanding the genetic basis of important phenotypes is a central goal of genetics. However, the highly polygenic architectures of complex traits inferred by large-scale genome-wide association studies (GWAS) in humans stand in contrast to the results of quantitative trait locus (QTL) mapping studies in model organisms. Here, we use a barcoding approach to conduct QTL mapping in budding yeast at a scale two orders of magnitude larger than the previous state of the art. The resulting increase in power reveals the polygenic nature of complex traits in yeast, and offers insight into widespread patterns of pleiotropy and epistasis. Our data and analysis methods offer opportunities for future work in systems biology, and have implications for large-scale GWAS in human populations.

