Applications of Multifactor Dimensionality Reduction to Genome-Wide Data Using the R Package ‘MDR’

Data-adaptive multi-locus association testing in subjects with arbitrary genealogical relationships

Statistical Applications in Genetics and Molecular Biology, Vol 18 (3), 2019. DOI: 10.1515/sagmb-2018-0030

Cited By ~ 1

Author(s): Gail Gong, Wei Wang, Chih-Lin Hsieh, David J. Van Den Berg, Christopher Haiman, ...

Keyword(s): Prostate Cancer, R Package, Suppressor Gene, Test Statistic, Specific Data, Association Tests, Association Testing, Genome Wide, Genome Wide Data, Data Adaptive

Abstract

Genome-wide sequencing enables evaluation of associations between traits and combinations of variants in genes and pathways. Such evaluation, however, requires multi-locus association tests that have good power regardless of the variant and trait characteristics. Because analyzing families may yield more power than analyzing unrelated individuals, these tests should also apply to both related and unrelated individuals. Here we describe such tests and introduce SKAT-X, a new test statistic that uses genome-wide data from related or unrelated subjects to optimize power for the specific data at hand. Simulations show that (a) SKAT-X performs well regardless of variant and trait characteristics, and (b) for binary traits, analyzing affected relatives yields more power than analyzing unrelated individuals, consistent with previous findings for single-locus tests. We illustrate the methods by applying them to rare unclassified missense variants in the tumor suppressor gene BRCA2, using combined data from prostate cancer families and from unrelated prostate cancer cases and controls in the Multi-ethnic Cohort (MEC). The methods are implemented in open-source code, publicly available as the R package GATARS (Genetic Association Tests for Arbitrarily Related Subjects) <https://gailg.github.io/gatars/>.
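
The abstract does not give the SKAT-X formula. As background only, the sketch below computes the basic SKAT-style variance-component score statistic Q = (y − ȳ)ᵀ G W Gᵀ (y − ȳ), with W = diag(w_k²), that multi-locus tests of this family are built around. It assumes unrelated subjects, an intercept-only null model, and analyst-chosen weights; it does not compute p-values or model relatedness, and the function name `skat_like_q` is hypothetical, not part of GATARS.

```python
from typing import Sequence


def skat_like_q(genotypes: Sequence[Sequence[float]],
                trait: Sequence[float],
                weights: Sequence[float]) -> float:
    """Return Q = sum_k (w_k * S_k)^2, where S_k = sum_i g_ik * (y_i - mean(y)).

    Illustrative assumptions (this is not SKAT-X or the GATARS API):
    - subjects are unrelated and the null model has an intercept only,
      so the null residuals are simply y_i - mean(y);
    - genotypes[i][k] is the minor-allele count of variant k in subject i;
    - weights[k] is a per-variant weight chosen by the analyst.
    """
    n = len(trait)
    if n == 0 or len(genotypes) != n:
        raise ValueError("need one genotype row per subject")
    if any(len(row) != len(weights) for row in genotypes):
        raise ValueError("each genotype row needs one entry per weight")
    y_bar = sum(trait) / n
    residuals = [y - y_bar for y in trait]
    q = 0.0
    for k, w in enumerate(weights):
        # Per-variant score: genotype counts weighted by null residuals.
        score_k = sum(row[k] * r for row, r in zip(genotypes, residuals))
        q += (w * score_k) ** 2
    return q
```

For example, with genotype rows [0, 1], [1, 0], [2, 1], [0, 0], trait values 1, 0, 1, 0 and unit weights, the per-variant scores are 0.5 and 1.0, so Q = 1.25. Squaring each weighted score before summing is what lets variants with opposite effect directions both contribute, which is the usual motivation for quadratic-form tests over simple burden sums.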

Multifactor dimensionality reduction for graphics processing units enables genome-wide testing of epistasis in sporadic ALS

Bioinformatics, Vol 26 (5), pp. 694-695, 2010. DOI: 10.1093/bioinformatics/btq009

Cited By ~ 48

Author(s): Casey S. Greene, Nicholas A. Sinnott-Armstrong, Daniel S. Himmelstein, Paul J. Park, Jason H. Moore, ...

Keyword(s): Dimensionality Reduction, Multifactor Dimensionality Reduction, Graphics Processing Units, Genome Wide, Sporadic ALS, Graphics Processing, Wide Testing

A new efficient method to detect genetic interactions for lung cancer GWAS

BMC Medical Genomics, Vol 13 (1), 2020. DOI: 10.1186/s12920-020-00807-9

Author(s): Jennifer Luyapan, Xuemei Ji, Siting Li, Xiangjun Xiao, Dakai Zhu, ...

Keyword(s): Lung Cancer, Dimensionality Reduction, Multifactor Dimensionality Reduction, Disease Onset, Genetic Interactions, Survival Outcomes, Type I, Genome Wide Association Studies, Data Set, Genome Wide

Abstract

Background: Genome-wide association studies (GWAS) have proven successful in predicting genetic risk of disease using single-locus models; however, identifying single nucleotide polymorphism (SNP) interactions at the genome-wide scale is limited by computational and statistical challenges. We addressed the computational burden of detecting SNP interactions for survival outcomes such as age of disease onset. To do so, we developed a novel algorithm, the Efficient Survival Multifactor Dimensionality Reduction (ES-MDR) method, which uses martingale residuals as the outcome to represent survival outcomes and applies the Quantitative Multifactor Dimensionality Reduction (QMDR) method to identify significant interactions associated with age of disease onset.

Methods: To demonstrate efficacy, we evaluated this method on two simulated data sets to estimate the type I error rate and power. Simulations showed that ES-MDR identified interactions with less computational workload and allowed for adjustment of covariates. We applied ES-MDR to the OncoArray-TRICL Consortium data, with 14,935 cases and 12,787 controls for lung cancer (108,254 SNPs), searching over all two-way interactions to identify genetic interactions associated with lung cancer age of onset. We tested the best model in an independent data set from the OncoArray-TRICL data.

Results: Our analysis of the OncoArray-TRICL data identified many one-way and two-way models, with a single-base deletion in the noncoding region of BRCA1 (HR 1.24, P = 3.15 × 10⁻¹⁵) as the top marker for predicting age of lung cancer onset.

Conclusions: Our extensive simulations and the analysis of a large GWAS demonstrate that ES-MDR is an efficient algorithm for identifying genetic interactions to include in models that predict survival outcomes.
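
The two ES-MDR ingredients named in the abstract can be sketched in a few lines. The sketch below assumes no covariates, so each martingale residual is the event indicator minus the Nelson–Aalen cumulative hazard at that subject's time; with covariates one would instead take residuals from a fitted Cox model (for example `residuals(fit, type = "martingale")` in R's survival package). The QMDR step labels each two-locus genotype cell high or low by comparing its mean residual with the overall mean and contrasts the pooled groups with a Welch t-statistic; the exact t-test variant and cross-validation used by ES-MDR may differ, and both function names are hypothetical.

```python
from collections import defaultdict
from math import nan, sqrt
from typing import Dict, List, Sequence, Tuple


def null_martingale_residuals(times: Sequence[float],
                              events: Sequence[int]) -> List[float]:
    """Martingale residuals delta_i - Lambda(t_i) for a Cox model with no covariates.

    Lambda is the Nelson-Aalen cumulative hazard; events[i] is 1 for an
    observed onset and 0 for a censored subject.
    """
    event_times = sorted({t for t, d in zip(times, events) if d})
    cum_hazard: Dict[float, float] = {}
    running = 0.0
    for t in event_times:
        onsets = sum(1 for ti, di in zip(times, events) if di and ti == t)
        at_risk = sum(1 for ti in times if ti >= t)
        running += onsets / at_risk
        cum_hazard[t] = running
    residuals = []
    for ti, di in zip(times, events):
        # Cumulative hazard at t_i: value at the last event time <= t_i.
        past = [cum_hazard[t] for t in event_times if t <= ti]
        residuals.append(di - (past[-1] if past else 0.0))
    return residuals


def qmdr_two_locus_t(snp_a: Sequence[int], snp_b: Sequence[int],
                     outcome: Sequence[float]) -> float:
    """Label each genotype cell high/low against the overall mean outcome,
    then return a Welch t-statistic for high vs. low (nan if undefined)."""
    overall = sum(outcome) / len(outcome)
    cells: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for a, b, y in zip(snp_a, snp_b, outcome):
        cells[(a, b)].append(y)
    high: List[float] = []
    low: List[float] = []
    for values in cells.values():
        (high if sum(values) / len(values) > overall else low).extend(values)
    if len(high) < 2 or len(low) < 2:
        return nan  # no usable split for this SNP pair

    def mean_var(xs: List[float]) -> Tuple[float, float]:
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    (m_h, v_h), (m_l, v_l) = mean_var(high), mean_var(low)
    se = sqrt(v_h / len(high) + v_l / len(low))
    return (m_h - m_l) / se if se > 0 else nan
```

A pair is then scored as `qmdr_two_locus_t(snp_a, snp_b, null_martingale_residuals(times, events))`. Because the residuals are computed once and reused for every SNP pair, the survival model is fitted only once rather than once per interaction, which is the kind of saving that makes an exhaustive two-way scan tractable.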

Spatial rank-based multifactor dimensionality reduction to detect gene–gene interactions for multivariate phenotypes

BMC Bioinformatics, Vol 22 (1), 2021. DOI: 10.1186/s12859-021-04395-y

Author(s): Mira Park, Hoe-Bin Jeong, Jong-Hyun Lee, Taesung Park

Keyword(s): Dimensionality Reduction, Multifactor Dimensionality Reduction, Nonparametric Statistics, Genome Wide Association, Gene Interactions, Genome Wide Association Studies, Simulation Studies, Skewed Distributions, Genome Wide, Evaluation Measure

Abstract

Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies that aim to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions and has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling’s T² statistic to evaluate interaction models, but Hotelling’s T² statistic is well known to be highly sensitive to heavily skewed distributions and outliers.

Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is suited mainly to analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. It then calculates a spatial rank-sum statistic as the evaluation measure and selects the interaction model with the largest statistic as the best one. Our novel idea lies in adopting nonparametric statistics as an evaluation measure for robust inference. We use tenfold cross-validation to avoid overfitting. Extensive simulation studies compared the performance of MR-MDR with that of current methods. Applied to a real dataset from a Korean genome-wide association study, MR-MDR successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR.

Conclusions: Extensive simulation studies comparing MR-MDR with several current methods showed that MR-MDR performed outstandingly for skewed distributions and had comparable power for symmetric distributions. We therefore conclude that MR-MDR is a useful multivariate nonparametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and sample size.
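
The spatial sign and spatial rank that MR-MDR builds on have standard definitions: S(x) = x / ‖x‖ (zero at the origin) and R_i = (1/n) Σ_j S(y_i − y_j) for a sample of phenotype vectors y_1, …, y_n. The sketch below computes these and then a simple between-group statistic, the sum over the two genotype groups of n_g ‖mean rank in group g‖². That statistic is an illustrative assumption: MR-MDR's exact spatial rank-sum statistic and its normalization may differ, and the two-group split is taken as given here rather than produced by fuzzy k-means. Function names are hypothetical.

```python
from math import sqrt
from typing import List, Sequence

Vector = List[float]


def spatial_sign(x: Sequence[float]) -> Vector:
    """Unit vector x / ||x||, or the zero vector when x is zero."""
    norm = sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in x]


def spatial_ranks(data: Sequence[Sequence[float]]) -> List[Vector]:
    """Spatial rank of each observation: R_i = (1/n) * sum_j S(y_i - y_j)."""
    n = len(data)
    p = len(data[0])
    ranks: List[Vector] = []
    for yi in data:
        acc = [0.0] * p
        for yj in data:
            sign = spatial_sign([a - b for a, b in zip(yi, yj)])
            acc = [a + s for a, s in zip(acc, sign)]
        ranks.append([a / n for a in acc])
    return ranks


def between_group_rank_statistic(data: Sequence[Sequence[float]],
                                 is_high: Sequence[bool]) -> float:
    """Sum over the two groups of n_g * ||mean spatial rank in group g||^2."""
    ranks = spatial_ranks(data)
    total = 0.0
    for label in (True, False):
        members = [r for r, g in zip(ranks, is_high) if g == label]
        if not members:
            continue
        m = len(members)
        centroid = [sum(col) / m for col in zip(*members)]
        total += m * sum(c * c for c in centroid)
    return total
```

With one phenotype and values 1, 2, 3, 4, the spatial ranks are −0.75, −0.25, 0.25, 0.75 (the centered ordinary ranks divided by n), and splitting {1, 2} from {3, 4} gives a statistic of 1.0, larger than the 0.25 obtained by interleaving them. Because each observation enters only through unit vectors, a single extreme phenotype value cannot dominate the statistic, which is the robustness property the abstract appeals to.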

Abstract 2350: Efficient survival multifactor dimensionality reduction method for genome-wide association study

2018. DOI: 10.1158/1538-7445.am2018-2350

Author(s): Jiang Gui, Xuemei Ji, Christopher I. Amos

Keyword(s): Dimensionality Reduction, Association Study, Multifactor Dimensionality Reduction, Genome Wide Association Study, Reduction Method, Genome Wide Association, Genome Wide, Multifactor Dimensionality Reduction Method, Dimensionality Reduction Method

The R-package GenomicTools for multifactor dimensionality reduction and the analysis of (exploratory) Quantitative Trait Loci

Computer Methods and Programs in Biomedicine, Vol 151, pp. 171-177, 2017. DOI: 10.1016/j.cmpb.2017.08.012

Cited By ~ 2

Author(s): Daniel Fischer

Keyword(s): Quantitative Trait Loci, Dimensionality Reduction, Quantitative Trait, Multifactor Dimensionality Reduction, R Package, Trait Loci

GWAS-GMDR: A program package for genome-wide scan of gene-gene interactions with covariate adjustment based on multifactor dimensionality reduction

2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), 2011. DOI: 10.1109/bibmw.2011.6112456

Cited By ~ 1

Author(s): Min-Seok Kwon, Kyunga Kim, Sungyoung Lee, Wonil Chung, Sung-Gon Yi, ...

Keyword(s): Dimensionality Reduction, Multifactor Dimensionality Reduction, Program Package, Covariate Adjustment, Gene Interactions, Genome Wide, Genome Wide Scan

cuGWAM: Genome-wide Association Multifactor Dimensionality reduction using CUDA-enabled high-performance graphics processing unit

International Journal of Data Mining and Bioinformatics, Vol 6 (5), pp. 471, 2012. DOI: 10.1504/ijdmb.2012.049301

Cited By ~ 5

Author(s): Min Seok Kwon, Kyunga Kim, Sungyoung Lee, Taesung Park

Keyword(s): Dimensionality Reduction, Multifactor Dimensionality Reduction, High Performance, Graphics Processing Unit, Genome Wide Association, Processing Unit, Genome Wide, Graphics Processing

RIdeogram: drawing SVG graphics to visualize and map genome-wide data on the idiograms

PeerJ Computer Science, Vol 6, pp. e251, 2020. DOI: 10.7717/peerj-cs.251

Cited By ~ 17

Author(s): Zhaodong Hao, Dekang Lv, Ying Ge, Jisen Shi, Dolf Weijers, ...

Keyword(s): GC Content, R Package, Whole Genome, Data Mapping, Data Types, Model Species, Chromosomal Distribution, Whole Genome Analysis, Genome Wide, Genome Wide Data

Abstract

Background: Owing to rapid advances in DNA sequencing technologies, whole genomes of more and more species are becoming available at an increasing pace. For whole-genome analysis, idiograms provide a popular, intuitive, and effective way to map and visualize genome-wide information such as GC content, gene and repeat density, DNA methylation distribution, and genomic synteny. However, most available software programs and web servers support only a few model species, such as human, mouse, and fly, or have limited application scenarios. As more non-model species are sequenced and chromosome-level assemblies become available, tools that can generate idiograms for a broad range of species and visualize more data types are needed to help researchers better understand fundamental genome characteristics.

Results: The R package RIdeogram allows users to build high-quality idiograms of any species of interest. It can map continuous and discrete genome-wide data onto the idiograms and visualize them as a heat map and as track labels, respectively.

Conclusion: Visualizing genome-wide data mapping and comparisons lets users quickly form a clear impression of chromosomal distribution patterns, making RIdeogram a useful tool for any researcher working with omics data.
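
Continuous tracks such as GC content are usually summarized over fixed-size genomic windows before being drawn as a heat map on an idiogram. The sketch below shows that preparation step in Python; it is not part of RIdeogram, and the window size, 1-based inclusive coordinates, and exclusion of N bases from the denominator are assumptions. Adding a chromosome name to each (start, end, value) row yields the kind of table idiogram tools consume; check the RIdeogram documentation for the exact input columns it expects.

```python
from typing import List, Tuple


def gc_content_windows(sequence: str, window: int) -> List[Tuple[int, int, float]]:
    """Split a sequence into consecutive, non-overlapping windows and return
    (start, end, gc_fraction) per window with 1-based inclusive coordinates.

    Only A/C/G/T count toward the denominator, so runs of N do not dilute the
    GC fraction; a window with no called bases gets nan.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = sequence.upper()
    rows: List[Tuple[int, int, float]] = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        called = sum(chunk.count(base) for base in "ACGT")
        fraction = gc / called if called else float("nan")
        rows.append((start + 1, start + len(chunk), fraction))
    return rows
```

For instance, `gc_content_windows("GGCCaattNN", 4)` gives (1, 4, 1.0), (5, 8, 0.0) and a final (9, 10, nan) window that contains only N bases. Real chromosomes would normally be read from FASTA and summarized with windows of tens of kilobases or more so the heat map stays legible.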

Genome-Wide Analysis of Epistasis Using Multifactor Dimensionality Reduction

Knowledge Discovery and Data Mining, pp. 17-30, 2011. DOI: 10.4018/978-1-59904-252-7.ch002

Cited By ~ 21

Author(s): Jason H. Moore

Keyword(s): Dimensionality Reduction, DNA Sequences, Multifactor Dimensionality Reduction, Human Genetics, Interindividual Variation, Human Populations, Genome Wide Analysis, Genome Wide, Sequence Variations, Using Data

Abstract

Human genetics is an evolving discipline driven by rapid advances in technologies that make it possible to measure enormous quantities of genetic information. An important goal of human genetics is to understand the mapping between interindividual variation in DNA sequences (i.e., the genome) and variability in disease susceptibility (i.e., the phenotype). The focus of the present study is the detection and characterization of nonlinear interactions among DNA sequence variations in human populations using data mining and machine learning methods. We first review concept difficulty and then review the multifactor dimensionality reduction (MDR) approach, which was developed specifically for this domain. We then present some ideas about how to scale MDR to datasets with thousands of attributes (i.e., genome-wide analysis). Finally, we offer some ideas about how nonlinear genetic models might be statistically interpreted to facilitate biological inference.
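
The core MDR step pools multi-locus genotype cells into high-risk and low-risk groups, collapsing an interaction into a single binary attribute that is then scored as a classifier. The sketch below does this for two SNPs in a case-control sample, assuming genotypes coded 0/1/2 and the common convention of labeling a cell high risk when its case:control ratio strictly exceeds the overall case:control ratio. It is not the API of any MDR software package, the function name is hypothetical, and it reports training balanced accuracy only; full MDR analyses add cross-validation and permutation testing and search over all SNP combinations.

```python
from collections import Counter
from typing import Sequence, Set, Tuple

Cell = Tuple[int, int]


def mdr_two_locus(snp_a: Sequence[int], snp_b: Sequence[int],
                  status: Sequence[int]) -> Tuple[Set[Cell], float]:
    """Pool two-locus genotype cells into high/low risk and score the split.

    status[i] is 1 for a case and 0 for a control. A cell is labeled high risk
    when its case:control ratio strictly exceeds the overall case:control ratio.
    Returns the set of high-risk cells and the (training) balanced accuracy.
    """
    cases: Counter = Counter()
    controls: Counter = Counter()
    for a, b, s in zip(snp_a, snp_b, status):
        (cases if s == 1 else controls)[(a, b)] += 1
    n_case, n_ctrl = sum(cases.values()), sum(controls.values())
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need both cases and controls")
    threshold = n_case / n_ctrl
    high = {cell for cell in set(cases) | set(controls)
            # cases/controls > threshold, written without dividing by zero
            if cases[cell] > threshold * controls[cell]}
    sensitivity = sum(cases[c] for c in high) / n_case
    specificity = sum(n for c, n in controls.items() if c not in high) / n_ctrl
    return high, (sensitivity + specificity) / 2
```

Each call is cheap, but an exhaustive genome-wide search must repeat it for every SNP pair (or higher-order combination), so the number of models grows combinatorially with the number of attributes. That growth is the scaling problem the chapter discusses and the motivation for the GPU implementations listed above.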
