GenEpi: Gene-based Epistasis Discovery Using Machine Learning

Abstract Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer’s disease (AD). In this regard, this study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. Results on simulated data showed that GenEpi outperforms other widely used methods at detecting ground-truth epistasis. For real data, this study uses AD as an example to demonstrate the capability of GenEpi in finding disease-related variants and variant interactions that show both biological meaning and predictive power. Availability: GenEpi is an open-source Python package, available free of charge for non-commercial users only. The package can be downloaded from https://github.com/Chester75321/GenEpi, and has also been published on The Python Package Index.
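To make the phrase "two-element combinatorial encoding" more concrete, here is a minimal sketch of one plausible reading: each pair of SNPs is expanded into binary indicators, one per combination of their genotypes. This is not GenEpi's own code or API. The function name, the three-level genotype coding and the feature naming scheme are all assumptions made for illustration, and the subsequent L1-regularized regression with stability selection is not shown.

```python
from itertools import combinations, product

GENOTYPES = ("AA", "AB", "BB")  # assumed three-level genotype coding


def pairwise_genotype_features(sample, snp_names):
    """Return binary indicators for every genotype pair of every SNP pair.

    `sample` maps SNP name -> genotype string. The feature naming scheme
    is hypothetical and chosen only for readability.
    """
    features = {}
    for snp_a, snp_b in combinations(snp_names, 2):
        for geno_a, geno_b in product(GENOTYPES, repeat=2):
            name = f"{snp_a}_{geno_a}*{snp_b}_{geno_b}"
            # 1 only when the sample carries exactly this genotype combination
            features[name] = int(sample[snp_a] == geno_a and sample[snp_b] == geno_b)
    return features


# Hypothetical sample with three SNPs
sample = {"rs1": "AA", "rs2": "AB", "rs3": "BB"}
encoded = pairwise_genotype_features(sample, ["rs1", "rs2", "rs3"])
active = sorted(name for name, value in encoded.items() if value == 1)
print(len(encoded))  # 3 SNP pairs x 9 genotype combinations = 27
print(active)
```

Under this reading, exactly one indicator per SNP pair is active for any sample, so the feature matrix is sparse and a sparsity-inducing penalty such as L1 is a natural fit.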


Penalized partial least squares for pleiotropy

BMC Bioinformatics ◽

10.1186/s12859-021-03968-1 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Camilo Broc ◽

Therese Truong ◽

Benoit Liquet

Keyword(s):

Least Squares ◽

Partial Least Squares ◽

Association Studies ◽

A Priori ◽

Simulated Data ◽

Real Data ◽

Genome Wide Association Studies ◽

Genetic Associations ◽

Multiple Traits ◽

Application Fields

Abstract Background The increasing number of genome-wide association studies (GWAS) has revealed several loci that are associated with multiple distinct phenotypes, suggesting the existence of pleiotropic effects. Highlighting these cross-phenotype genetic associations could help to identify and understand common biological mechanisms underlying some diseases. Common approaches test the association between genetic variants and multiple traits at the SNP level. In this paper, we propose a novel gene- and pathway-level approach for the case where several independent GWAS on independent traits are available. The method is based on a generalization of the sparse group Partial Least Squares (sgPLS) to take into account groups of variables, and a Lasso penalization that links all independent data sets. This method, called joint-sgPLS, is able to convincingly detect signal at the variable level and at the group level. Results Our method has the advantage of proposing a global, readable model while coping with the architecture of the data. It can outperform traditional methods and provides wider insight in terms of a priori information. We compared the performance of the proposed method to other benchmark methods on simulated data and gave an example of application on real data with the aim of highlighting common susceptibility variants for breast and thyroid cancers. Conclusion The joint-sgPLS shows interesting properties for detecting a signal. As an extension of the PLS, the method is suited for data with a large number of variables. The choice of Lasso penalization copes with architectures of groups of variables and observation sets. Furthermore, although the method has been applied to a genetic study, its formulation is adapted to any data with a high number of variables and an exposed a priori architecture in other application fields.


Bias in two-sample Mendelian randomization when using heritable covariable-adjusted summary associations

International Journal of Epidemiology ◽

10.1093/ije/dyaa266 ◽

2021 ◽

Author(s):

Fernando Pires Hartwig ◽

Kate Tilling ◽

George Davey Smith ◽

Deborah A Lawlor ◽

Maria Carolina Borges

Keyword(s):

Waist Circumference ◽

Genetic Variants ◽

Mendelian Randomization ◽

Causal Effect ◽

Association Studies ◽

Real Data ◽

Sensitivity Analyses ◽

Effect Estimate ◽

Genome Wide Association Studies ◽

Residual Confounding

Abstract Background Two-sample Mendelian randomization (MR) allows the use of freely accessible summary association results from genome-wide association studies (GWAS) to estimate causal effects of modifiable exposures on outcomes. Some GWAS adjust for heritable covariables in an attempt to estimate direct effects of genetic variants on the trait of interest. One, both or neither of the exposure GWAS and outcome GWAS may have been adjusted for covariables. Methods We performed a simulation study comprising different scenarios that could motivate covariable adjustment in a GWAS and analysed real data to assess the influence of using covariable-adjusted summary association results in two-sample MR. Results In the absence of residual confounding between exposure and covariable, between exposure and outcome, and between covariable and outcome, using covariable-adjusted summary associations for two-sample MR eliminated bias due to horizontal pleiotropy. However, covariable adjustment led to bias in the presence of residual confounding (especially between the covariable and the outcome), even in the absence of horizontal pleiotropy (when the genetic variants would be valid instruments without covariable adjustment). In an analysis using real data from the Genetic Investigation of ANthropometric Traits (GIANT) consortium and UK Biobank, the causal effect estimate of waist circumference on blood pressure changed direction upon adjustment of waist circumference for body mass index. Conclusions Our findings indicate that using covariable-adjusted summary associations in MR should generally be avoided. When that is not possible, careful consideration of the causal relationships underlying the data (including potentially unmeasured confounders) is required to direct sensitivity analyses and interpret results with appropriate caution.


Assocplots: a python package for static and interactive visualization of multiple-group GWAS results

10.1101/062737 ◽

2016 ◽

Author(s):

Ekaterina A Khramtsova ◽

Barbara E. Stranger

Keyword(s):

Open Source ◽

Data Visualization ◽

Genetic Variants ◽

Association Studies ◽

Genome Wide Association ◽

Genome Wide Association Studies ◽

Multiple Group ◽

Genome Wide ◽

Quantile Plots ◽

Python Package

Abstract Summary Over the last decade, genome-wide association studies (GWAS) have generated vast amounts of analysis results, requiring the development of novel tools for data visualization. Quantile-quantile plots and Manhattan plots are classical tools which have been used to visually summarize GWAS results and identify genetic variants significantly associated with traits of interest. However, static visualizations limit the information that can be shown. Here we present Assocplots, a Python package for viewing and exploring GWAS results not only using classic static Manhattan and quantile-quantile plots, but also through a dynamic extension which allows data to be visualized interactively and the relationships between GWAS results from multiple cohorts or studies to be explored. Availability The Assocplots package is open source and distributed under the MIT license via GitHub (https://github.com/khramts/assocplots) along with examples, documentation and installation instructions.
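For readers unfamiliar with quantile-quantile plots of GWAS results, the following minimal sketch computes the points such a plot displays: observed -log10(p) values against the values expected under a uniform null. It does not use the Assocplots API. The function name and the midpoint plotting position (rank - 0.5) / n are assumptions chosen for illustration, and other plotting positions are also in common use.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) -log10(p) pairs for a quantile-quantile plot."""
    n = len(p_values)
    observed = sorted(p_values)  # smallest p-value first
    points = []
    for rank, p in enumerate(observed, start=1):
        # Midpoint plotting position under the uniform null (an assumed convention)
        expected_p = (rank - 0.5) / n
        points.append((-math.log10(expected_p), -math.log10(p)))
    return points


# Hypothetical p-values from four variants
points = qq_points([0.5, 0.01, 0.2, 0.9])
for expected, obs in points:
    print(f"{expected:.3f}\t{obs:.3f}")
```

Points lying well above the diagonal at the upper end indicate p-values smaller than the null would predict, which is the pattern such plots are typically used to reveal.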


MARS: leveraging allelic heterogeneity to increase power of association testing

Genome Biology ◽

10.1186/s13059-021-02353-8 ◽

2021 ◽

Vol 22 (1) ◽

Cited By ~ 1

Author(s):

Farhad Hormozdiari ◽

Junghyun Jung ◽

Eleazar Eskin ◽

Jong Wha J. Joo

Keyword(s):

Type I Error ◽

Association Studies ◽

Simulated Data ◽

Real Data ◽

Association Test ◽

Type I ◽

Genome Wide Association Studies ◽

Association Testing ◽

Causal Status ◽

Causal Variants

Abstract In standard genome-wide association studies (GWAS), the standard association test is underpowered to detect associations between loci with multiple causal variants with small effect sizes. We propose a statistical method, Model-based Association test Reflecting causal Status (MARS), that finds associations between variants in risk loci and a phenotype, considering the causal status of variants, only requiring the existing summary statistics to detect associated risk loci. Utilizing extensive simulated data and real data, we show that MARS increases the power of detecting true associated risk loci compared to previous approaches that consider multiple variants, while controlling the type I error.


RAISS: robust and accurate imputation from summary statistics

Bioinformatics ◽

10.1093/bioinformatics/btz466 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4837-4839 ◽

Cited By ~ 1

Author(s):

Hanna Julienne ◽

Huwenbo Shi ◽

Bogdan Pasaniuc ◽

Hugues Aschard

Keyword(s):

Effect Size ◽

Association Studies ◽

Real Data ◽

Supplementary Information ◽

P Value ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Genome Wide ◽

Small Effect Size ◽

Python Package

Abstract Motivation Multi-trait analyses using public summary statistics from genome-wide association studies (GWASs) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. Although methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves the existing imputation methods and reaches a precision suitable for multi-trait analyses. Results We fine-tuned parameters to obtain a very high accuracy imputation from summary statistics. We demonstrate this accuracy for variants of all effect sizes on real data of 28 GWAS. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWAS in parallel. Availability and implementation The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/. Supplementary information Supplementary data are available at Bioinformatics online.


Identifying novel genetic variants for brain amyloid deposition: a genome-wide association study in the Korean population

Alzheimer’s Research & Therapy ◽

10.1186/s13195-021-00854-z ◽

2021 ◽

Vol 13 (1) ◽

Author(s):

Hang-Rai Kim ◽

Sang-Hyuk Jung ◽

Jaeho Kim ◽

Hyemin Jang ◽

Sung Hoon Kang ◽

...

Keyword(s):

Genetic Variants ◽

Prediction Models ◽

Association Studies ◽

Genome Wide Association ◽

Korean Population ◽

European Ancestry ◽

Eqtl Analysis ◽

Genome Wide Association Studies ◽

Genome Wide ◽

A Genome

Abstract Background Genome-wide association studies (GWAS) have identified a number of genetic variants for Alzheimer’s disease (AD). However, most GWAS were conducted in individuals of European ancestry, and non-European populations are still underrepresented in genetic discovery efforts. Here, we performed GWAS to identify single nucleotide polymorphisms (SNPs) associated with amyloid β (Aβ) positivity using a large sample of the Korean population. Methods One thousand four hundred seventy-four participants of Korean ancestry were recruited from multiple centers in South Korea. The discovery dataset consisted of 1190 participants (383 cognitively unimpaired [CU], 330 with amnestic mild cognitive impairment [aMCI], and 477 with AD dementia [ADD]) and the replication dataset consisted of 284 participants (46 CU, 167 with aMCI, and 71 with ADD). GWAS was conducted to identify SNPs associated with Aβ positivity (measured by amyloid positron emission tomography). Aβ prediction models were developed using the identified SNPs. Furthermore, bioinformatics analysis was conducted for the identified SNPs. Results In addition to APOE, we identified nine SNPs on chromosome 7, which were associated with a decreased risk of Aβ positivity at a genome-wide suggestive level. Of these nine SNPs, four novel SNPs (rs73375428, rs2903923, rs3828947, and rs11983537) were associated with a decreased risk of Aβ positivity (p < 0.05) in the replication dataset. In a meta-analysis, two SNPs (rs73375428 and rs2903923) reached a genome-wide significant level (p < 5.0 × 10⁻⁸). Prediction performance for Aβ positivity increased when rs73375428 was incorporated (area under curve = 0.75; 95% CI = 0.74–0.76) in addition to clinical factors and APOE genotype. Cis-eQTL analysis demonstrated that rs73375428 was associated with decreased expression levels of FGL2 in the brain. Conclusion The novel genetic variants associated with FGL2 decreased the risk of Aβ positivity in the Korean population. This finding may provide a candidate therapeutic target for AD, highlighting the importance of genetic studies in diverse populations.


RAISS: Robust and Accurate imputation from Summary Statistics

10.1101/502880 ◽

2018 ◽

Cited By ~ 1

Author(s):

Hanna Julienne ◽

Huwenbo Shi ◽

Bogdan Pasaniuc ◽

Hugues Aschard

Keyword(s):

Effect Size ◽

Association Studies ◽

Real Data ◽

Statistical Genetics ◽

P Value ◽

Genome Wide Association Studies ◽

Summary Statistics ◽

Link Type ◽

Small Effect Size ◽

Python Package

Abstract Motivation Multi-trait analyses using public summary statistics from genome-wide association studies (GWAS) are becoming increasingly popular. A constraint of multi-trait methods is that they require complete summary data for all traits. While methods for the imputation of summary statistics exist, they lack precision for genetic variants with small effect size. This is benign for univariate analyses where only variants with large effect size are selected a posteriori. However, it can lead to strong p-value inflation in multi-trait testing. Here we present a new approach that improves the existing imputation methods and reaches a precision suitable for multi-trait analyses. Results We fine-tuned parameters to obtain a very high accuracy imputation from summary statistics. We demonstrate this accuracy for small size-effect variants on real data of 28 GWAS. We implemented the resulting methodology in a Python package specially designed to efficiently impute multiple GWAS in parallel. Availability The Python package is available at https://gitlab.pasteur.fr/statistical-genetics/raiss; its accompanying documentation is accessible at http://statistical-genetics.pages.pasteur.fr/raiss/.


Migraine: Genetic Variants and Clinical Phenotypes

Current Medicinal Chemistry ◽

10.2174/0929867325666180719120215 ◽

2019 ◽

Vol 26 (34) ◽

pp. 6207-6221 ◽

Cited By ~ 1

Author(s):

Innocenzo Rainero ◽

Alessandro Vacca ◽

Flora Govone ◽

Annalisa Gai ◽

Lorenzo Pinessi ◽

...

Keyword(s):

Genetic Variants ◽

Association Studies ◽

Mthfr Gene ◽

Genome Wide Association Studies ◽

Clinical Phenotypes ◽

Network Analyses ◽

Genome Wide ◽

Genomic Studies ◽

The Common ◽

The Relationship

Migraine is a common, chronic neurovascular disorder caused by a complex interaction between genetic and environmental risk factors. In the last two decades, molecular genetics of migraine have been intensively investigated. In a few cases, migraine is transmitted as a monogenic disorder, and the disease phenotype cosegregates with mutations in different genes like CACNA1A, ATP1A2, SCN1A, KCNK18, and NOTCH3. In the common forms of migraine, candidate genes as well as genome-wide association studies have shown that a large number of genetic variants may increase the risk of developing migraine. At present, few studies investigated the genotype-phenotype correlation in patients with migraine. The purpose of this review was to discuss recent studies investigating the relationship between different genetic variants and the clinical characteristics of migraine. Analysis of genotype-phenotype correlations in migraineurs is complicated by several confounding factors and, to date, only polymorphisms of the MTHFR gene have been shown to have an effect on migraine phenotype. Additional genomic studies and network analyses are needed to clarify the complex pathways underlying migraine and its clinical phenotypes.


A comprehensive evaluation of methods for Mendelian randomization using realistic simulations and an analysis of 38 biomarkers for risk of type 2 diabetes

International Journal of Epidemiology ◽

10.1093/ije/dyaa262 ◽

2021 ◽

Author(s):

Guanghao Qi ◽

Nilanjan Chatterjee

Keyword(s):

Type 2 Diabetes ◽

Mendelian Randomization ◽

Association Studies ◽

Real Data ◽

Causal Effects ◽

Type I ◽

Genome Wide Association Studies ◽

Simulation Studies ◽

Sample Sizes

Abstract Background Previous studies have often evaluated methods for Mendelian randomization (MR) analysis based on simulations that do not adequately reflect the data-generating mechanisms in genome-wide association studies (GWAS) and there are often discrepancies in the performance of MR methods in simulations and real data sets. Methods We use a simulation framework that generates data on full GWAS for two traits under a realistic model for effect-size distribution coherent with the heritability, co-heritability and polygenicity typically observed for complex traits. We further use recent data generated from GWAS of 38 biomarkers in the UK Biobank and perform down-sampling to investigate trends in estimates of causal effects of these biomarkers on the risk of type 2 diabetes (T2D). Results Simulation studies show that weighted mode and MRMix are the only two methods that maintain the correct type I error rate in a diverse set of scenarios. Between the two methods, MRMix tends to be more powerful for larger GWAS whereas the opposite is true for smaller sample sizes. Among the other methods, random-effect IVW (inverse-variance weighted method), MR-Robust and MR-RAPS (robust adjusted profile score) tend to perform best in maintaining a low mean-squared error when the InSIDE assumption is satisfied, but can produce large bias when InSIDE is violated. In real-data analysis, some biomarkers showed major heterogeneity in estimates of their causal effects on the risk of T2D across the different methods, and estimates from many methods trended in one direction with increasing sample size, with patterns similar to those observed in simulation studies. Conclusion The relative performance of different MR methods depends heavily on the sample sizes of the underlying GWAS, the proportion of valid instruments and the validity of the InSIDE assumption. Down-sampling analysis can be used in large GWAS for the possible detection of bias in the MR methods.
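As a point of reference for the inverse-variance weighted (IVW) method named above, the sketch below computes the simpler fixed-effect IVW estimate from per-variant summary statistics. It is not the code used in the study, and it is not the random-effect variant the abstract evaluates. The function name, argument names and example numbers are illustrative assumptions.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted (IVW) causal estimate.

    Each list holds one entry per genetic variant (instrument):
    variant-exposure effect, variant-outcome effect, and the standard
    error of the variant-outcome effect.
    """
    # Weight of each per-variant ratio estimate: beta_x^2 / se_y^2
    weights = [bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome)]
    # Per-variant ratio (Wald) estimates: beta_y / beta_x
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    total = sum(weights)
    estimate = sum(w * r for w, r in zip(weights, ratios)) / total
    standard_error = math.sqrt(1.0 / total)
    return estimate, standard_error


# Hypothetical summary statistics in which every variant implies a ratio of 0.5
estimate, standard_error = ivw_estimate(
    beta_exposure=[0.1, 0.2, 0.4],
    beta_outcome=[0.05, 0.1, 0.2],
    se_outcome=[0.01, 0.01, 0.01],
)
print(round(estimate, 6), round(standard_error, 6))
```

Because the estimate is a weighted average of per-variant ratios, a single invalid instrument with a large weight can shift it substantially, which is one reason the abstract compares IVW against more robust alternatives.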


Case–Parent Trio Studies in Cleft Lip and Palate

Global Medical Genetics ◽

10.1055/s-0040-1722097 ◽

2020 ◽

Vol 07 (03) ◽

pp. 075-079

Author(s):

Mahamad Irfanulla Khan ◽

Prashanth CS

Keyword(s):

Genetic Variants ◽

Cleft Lip ◽

Cleft Lip And Palate ◽

Association Studies ◽

Geographical Location ◽

Genetic Association Studies ◽

Population Based ◽

Genome Wide Association Studies ◽

Family Based ◽

Study Designs

Abstract Cleft lip with or without cleft palate (CL/P) is one of the most common congenital malformations in humans, involving various genetic and environmental risk factors. The prevalence of CL/P varies according to geographical location, ethnicity, race, gender, and socioeconomic status, affecting approximately 1 in 800 live births worldwide. Genetic studies aim to understand the mechanisms contributing to a phenotype by measuring the association among genetic variants and between genetic variants and the phenotype in a population. Genome-wide association studies are standard tools used to discover genetic loci related to a trait of interest. Genetic association studies are generally divided into two main design types: population-based studies and family-based studies. Population-based epidemiological studies comprise unrelated individuals and directly compare the frequency of genetic variants between (usually independent) cases and controls. The alternative to population-based studies (case–control designs) includes various family-based study designs that comprise related individuals. An example of such a study is a case–parent trio design, which is commonly employed in genetics to identify the variants underlying complex human disease by studying the transmission of alleles from parents to offspring. This article describes the fundamentals of the case–parent trio study, the trio design and its significance, statistical methods, and limitations of trio studies.
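One widely used statistic for case–parent trios is the transmission disequilibrium test (TDT), which compares how often heterozygous parents transmit a candidate allele to an affected child versus how often they do not. The abstract does not say which statistical methods the article covers, so the sketch below is only an illustration of the general idea. The function name and the counts are assumptions.

```python
def tdt_statistic(transmitted, untransmitted):
    """McNemar-type TDT chi-square statistic (1 degree of freedom).

    `transmitted` and `untransmitted` count how often heterozygous parents
    passed on, or did not pass on, the candidate allele to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no heterozygous parents to test")
    return (transmitted - untransmitted) ** 2 / total


# Hypothetical counts from 100 heterozygous parents
chi_square = tdt_statistic(transmitted=60, untransmitted=40)
print(chi_square)  # (60 - 40)^2 / 100
```

Because only transmissions within families are compared, a test of this form is not confounded by population stratification in the way an unmatched case–control comparison can be.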
