Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS

Abstract

Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited due to computational and statistical challenges. We addressed the computational burden encountered when detecting SNP interactions for survival analysis, such as age of disease-onset. To confront this problem, we developed a novel algorithm, called the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which used Martingale Residuals as the outcome parameter to estimate survival outcomes, and implemented the Quantitative Multifactor Dimensionality Reduction method to identify significant interactions associated with age of disease-onset.

Methods: To demonstrate efficacy, we evaluated this method on two simulation data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions using less computational workload and allowed for adjustment of covariates. We applied ES-MDR on the OncoArray-TRICL Consortium data with 14,935 cases and 12,787 controls for lung cancer (SNPs = 108,254) to search over all two-way interactions to identify genetic interactions associated with lung cancer age-of-onset. We tested the best model in an independent data set from the OncoArray-TRICL data.

Results: Our experiment on the OncoArray-TRICL data identified many one-way and two-way models with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵), as the top marker to predict age of lung cancer onset.

Conclusions: From the results of our extensive simulations and analysis of a large GWAS study, we demonstrated that our method is an efficient algorithm that identified genetic interactions to include in our models to predict survival outcomes.
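The two ingredients named in the abstract, martingale residuals as a quantitative outcome and a quantitative MDR scan over SNP pairs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the residuals come from a null model without covariates (observed event indicator minus the Nelson-Aalen cumulative hazard), genotypes are assumed coded 0/1/2, the score is a Welch t statistic between "high" and "low" genotype cells, and the function names `martingale_residuals`, `qmdr_score`, and `best_pair` are hypothetical.

```python
import numpy as np
from itertools import combinations


def martingale_residuals(time, event):
    """Null-model martingale residuals: event_i - H(t_i).

    H is the Nelson-Aalen cumulative hazard estimate. Assumption: no
    covariates are adjusted for in this sketch.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    uniq = np.unique(time)  # sorted distinct follow-up times
    at_risk = np.array([(time >= t).sum() for t in uniq])
    deaths = np.array([event[time == t].sum() for t in uniq])
    cum_haz = np.cumsum(deaths / at_risk)
    idx = np.searchsorted(uniq, time)  # position of each subject's time
    return event - cum_haz[idx]


def qmdr_score(g1, g2, y):
    """Score one SNP pair against a quantitative outcome y.

    Each two-locus genotype cell is labelled "high" if its mean outcome
    exceeds the overall mean; the score is a Welch t statistic comparing
    subjects in high cells with subjects in low cells.
    """
    g1, g2, y = np.asarray(g1), np.asarray(g2), np.asarray(y, dtype=float)
    cell = g1 * 3 + g2  # assumes genotypes coded 0, 1, 2
    overall = y.mean()
    high = np.zeros(len(y), dtype=bool)
    for c in np.unique(cell):
        members = cell == c
        high[members] = y[members].mean() > overall
    n1, n0 = high.sum(), (~high).sum()
    if n1 < 2 or n0 < 2:
        return 0.0
    se = np.sqrt(y[high].var(ddof=1) / n1 + y[~high].var(ddof=1) / n0)
    if se == 0:
        return 0.0
    return float((y[high].mean() - y[~high].mean()) / se)


def best_pair(G, y):
    """Exhaustively scan all SNP pairs (columns of G); return the top pair."""
    pairs = combinations(range(G.shape[1]), 2)
    return max(pairs, key=lambda p: qmdr_score(G[:, p[0]], G[:, p[1]], y))
```

Calling `best_pair(G, martingale_residuals(time, event))` on a genotype matrix `G` (subjects by SNPs) returns the index pair with the largest score. Because the survival model is fitted once and only the residuals are reused for every pair, the per-pair cost is a cell-mean computation rather than a model fit, which is one plausible reading of the reduced workload the abstract reports.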

Spatial rank-based multifactor dimensionality reduction to detect gene–gene interactions for multivariate phenotypes

BMC Bioinformatics, 2021, Vol 22 (1)
DOI: 10.1186/s12859-021-04395-y

Author(s): Mira Park, Hoe-Bin Jeong, Jong-Hyun Lee, Taesung Park

Keyword(s): Dimensionality Reduction, Multifactor Dimensionality Reduction, Nonparametric Statistics, Genome Wide Association, Gene Interactions, Genome Wide Association Studies, Simulation Studies, Skewed Distributions, Genome Wide, Evaluation Measure

Abstract

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling's T² statistic to evaluate interaction models, but it is well known that Hotelling's T² statistic is highly sensitive to heavily skewed distributions and outliers.

Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suitable for analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR utilizes fuzzy k-means clustering and classifies multi-locus genotypes into two groups. Then, MR-MDR calculates a spatial rank-sum statistic as an evaluation measure and selects the best interaction model with the largest statistic. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR.

Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that the performance of MR-MDR was outstanding for skewed distributions. Additionally, for symmetric distributions, MR-MDR showed comparable power. Therefore, we conclude that MR-MDR is a useful multivariate non-parametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.
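The spatial signs and ranks that the abstract builds on can be illustrated with a short sketch. The spatial sign of a vector is the vector scaled to unit length, and the spatial rank of an observation is the average spatial sign of its differences from all observations in the sample. The code below is an assumption-laden illustration, not the MR-MDR code from the linked repository: the fuzzy k-means grouping step is omitted (a two-group labelling is simply passed in), and `group_rank_stat` is a hypothetical, unstandardized stand-in for the paper's spatial rank-sum statistic, whose exact form is not given in the abstract.

```python
import numpy as np


def spatial_sign(x):
    """Scale each vector (last axis) to unit length; zero vectors stay zero."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return np.divide(x, norm, out=np.zeros_like(x), where=norm > 0)


def spatial_ranks(Y):
    """Spatial rank of each row of Y (n subjects by p phenotypes).

    R_i = (1/n) * sum_j sign(y_i - y_j). Every rank has norm below 1,
    so a single extreme phenotype value cannot dominate the result.
    """
    Y = np.asarray(Y, dtype=float)
    diffs = Y[:, None, :] - Y[None, :, :]  # shape (n, n, p)
    return spatial_sign(diffs).mean(axis=1)  # shape (n, p)


def group_rank_stat(Y, group):
    """Hypothetical illustration: squared norm of the summed spatial ranks
    in one group. Larger values suggest the two groups differ in location.
    """
    R = spatial_ranks(Y)
    group = np.asarray(group, dtype=bool)
    return float(np.sum(R[group].sum(axis=0) ** 2))
```

Because the spatial ranks of a sample sum to the zero vector, the summed ranks of one group are the negative of the other group's, so `group_rank_stat` gives the same value whichever group is passed in. The pairwise difference array uses memory proportional to n² × p, which is acceptable for a sketch but would need blocking for large cohorts.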

Abstract 2350: Efficient survival multifactor dimensionality reduction method for genome-wide association study

2018
DOI: 10.1158/1538-7445.am2018-2350

Author(s): Jiang Gui, Xuemei Ji, Christopher I. Amos

Keyword(s): Dimensionality Reduction, Association Study, Multifactor Dimensionality Reduction, Genome Wide Association Study, Reduction Method, Genome Wide Association, Genome Wide, Multifactor Dimensionality Reduction Method, Dimensionality Reduction Method

GWAS-GMDR: A program package for genome-wide scan of gene-gene interactions with covariate adjustment based on multifactor dimensionality reduction

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), 2011
DOI: 10.1109/bibmw.2011.6112456
Cited By ~ 1

Author(s): Min-Seok Kwon, Kyunga Kim, Sungyoung Lee, Wonil Chung, Sung-Gon Yi, ...

Keyword(s): Dimensionality Reduction, Multifactor Dimensionality Reduction, Program Package, Covariate Adjustment, Gene Interactions, Genome Wide, Genome Wide Scan

Genome-Wide Analysis of Epistasis Using Multifactor Dimensionality Reduction

Knowledge Discovery and Data Mining, 2011, pp. 17-30
DOI: 10.4018/978-1-59904-252-7.ch002
Cited By ~ 21

Author(s): Jason H. Moore

Keyword(s): Dimensionality Reduction, DNA Sequences, Multifactor Dimensionality Reduction, Human Genetics, Interindividual Variation, Human Populations, Genome Wide Analysis, Genome Wide, Sequence Variations, Using Data

Human genetics is an evolving discipline that is being driven by rapid advances in technologies that make it possible to measure enormous quantities of genetic information. An important goal of human genetics is to understand the mapping relationship between interindividual variation in DNA sequences (i.e., the genome) and variability in disease susceptibility (i.e., the phenotype). The focus of the present study is the detection and characterization of nonlinear interactions among DNA sequence variations in human populations using data mining and machine learning methods. We first review the concept difficulty and then review a multifactor dimensionality reduction (MDR) approach that was developed specifically for this domain. We then present some ideas about how to scale the MDR approach to datasets with thousands of attributes (i.e., genome-wide analysis). Finally, we end with some ideas about how nonlinear genetic models might be statistically interpreted to facilitate making biological inferences.

Accelerating Genome-Wide Association Studies Using CUDA Compatible Graphics Processing Units

2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing, 2009
DOI: 10.1109/ijcbs.2009.32
Cited By ~ 5

Author(s): Rui Jiang, Feng Zeng, Wangshu Zhang, Xuebing Wu, Zhihong Yu

Keyword(s): Graphics Processing Units, Association Studies, Genome Wide Association, Genome Wide Association Studies, Genome Wide, Graphics Processing
