multifactor dimensionality reduction
Recently Published Documents


TOTAL DOCUMENTS: 173 (FIVE YEARS 39)

H-INDEX: 25 (FIVE YEARS 3)

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Mira Park ◽  
Hoe-Bin Jeong ◽  
Jong-Hyun Lee ◽  
Taesung Park

Abstract. Background: Identifying interaction effects between genes is one of the main tasks of genome-wide association studies aiming to shed light on the biological mechanisms underlying complex diseases. Multifactor dimensionality reduction (MDR) is a popular approach for detecting gene–gene interactions that has been extended in various forms to handle binary and continuous phenotypes. However, only a few multivariate MDR methods are available for multiple related phenotypes. Current approaches use Hotelling’s T² statistic to evaluate interaction models, but this statistic is well known to be highly sensitive to heavily skewed distributions and outliers. Results: We propose a robust approach based on nonparametric statistics such as spatial signs and ranks. The new multivariate rank-based MDR (MR-MDR) is mainly suited to analyzing multiple continuous phenotypes and is less sensitive to skewed distributions and outliers. MR-MDR uses fuzzy k-means clustering to classify multi-locus genotypes into two groups. It then calculates a spatial rank-sum statistic as an evaluation measure and selects the interaction model with the largest statistic as the best model. The novelty lies in adopting nonparametric statistics as an evaluation measure for robust inference. We adopt tenfold cross-validation to avoid overfitting. Intensive simulation studies were conducted to compare the performance of MR-MDR with current methods. Application of MR-MDR to a real dataset from a Korean genome-wide association study demonstrated that it successfully identified genetic interactions associated with four phenotypes related to kidney function. The R code for conducting MR-MDR is available at https://github.com/statpark/MR-MDR. Conclusions: Intensive simulation studies comparing MR-MDR with several current methods showed that MR-MDR performed best for skewed distributions and had comparable power for symmetric distributions. Therefore, we conclude that MR-MDR is a useful multivariate nonparametric approach that can be used regardless of the phenotype distribution, the correlations between phenotypes, and the sample size.
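
To make the "spatial signs and ranks" mentioned in this abstract concrete, here is a minimal Python sketch of the textbook definitions: the spatial sign of a vector is its direction (the vector divided by its Euclidean norm), and the spatial rank of an observation is the average spatial sign of its differences from all observations. This is not the authors' MR-MDR code (their R implementation is at the URL above); the function names and the toy lognormal data are assumptions made for illustration.

```python
import numpy as np

def spatial_signs(x: np.ndarray) -> np.ndarray:
    """Map each row to the unit vector in its direction; zero rows stay zero."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    safe = np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    return x / safe

def spatial_ranks(x: np.ndarray) -> np.ndarray:
    """Spatial rank of row i: the mean spatial sign of (x_i - x_j) over all j."""
    n, p = x.shape
    diffs = x[:, None, :] - x[None, :, :]  # shape (n, n, p)
    signs = spatial_signs(diffs.reshape(-1, p)).reshape(n, n, p)
    return signs.mean(axis=1)

# Toy data (hypothetical): two heavily skewed continuous phenotypes.
rng = np.random.default_rng(0)
phenotypes = rng.lognormal(size=(50, 2))
ranks = spatial_ranks(phenotypes)
```

Every spatial rank has norm below 1 regardless of how extreme the observation is, which is one reason rank-based statistics are less affected by outliers than mean-based ones such as Hotelling’s T².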


2021 ◽  
Vol 2021 ◽  
pp. 1-8
Author(s):  
Jingjing Chen ◽  
Yuanfang Xu ◽  
Hongyuan Hu ◽  
Tianbo Jin

Background. Cervical cancer (CC) is the second most common tumor in women worldwide. Studies have established that genetic variations play an important role in the development of CC. The aim of this study was to evaluate the impact of TCF7L1 variants on CC risk. Methods. A total of 508 patients with cervical cancer and 497 healthy subjects were recruited to determine the impact of TCF7L1 polymorphisms on CC susceptibility. The associations were investigated by computing odds ratios (ORs) and 95% confidence intervals. The effect of SNP-SNP interactions on CC risk was explored by multifactor dimensionality reduction analysis. Results. Our study showed that rs11904127 (OR 0.79, p = 0.010) and rs62162674 (OR 0.82, p = 0.044) of TCF7L1 were associated with significantly decreased cervical cancer risk. Stratified analysis indicated that rs11904127 and rs62162674 were associated with decreased susceptibility to CC in subjects aged > 51 years (OR 0.74, p = 0.019; OR 0.72, p = 0.014, respectively). Haplotype analyses revealed that the G(rs2366264)-T(rs11689667)-C(rs62162674) haplotype is associated with a lower risk of cervical cancer (OR = 0.43, p = 0.018). In addition, there is a strong interaction between rs11904127 and rs2366264. Conclusion. Rs11904127 and rs62162674 in TCF7L1 are related to cervical cancer. We suggest that these variants can be used as prognostic markers for assessing susceptibility to cervical cancer.
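
As a worked illustration of "odds ratios (ORs) and 95% confidence intervals", the sketch below computes the textbook unadjusted odds ratio from a 2x2 table together with a Wald interval on the log scale. The abstract does not say how its ORs were estimated (they may come from an adjusted model), so this is only the simplest version; the function name and the counts are hypothetical and are not taken from the study.

```python
import math

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: cases carrying the allele      b: cases not carrying it
    c: controls carrying the allele   d: controls not carrying it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only to show the arithmetic.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=40, d=60)
```

An interval that lies entirely below 1, as with these toy counts, corresponds to the "decreased risk" reading used in the abstract.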


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Jung Wun Lee ◽  
Seungyeoun Lee

Abstract. Background: For gene-gene interaction analysis, the multifactor dimensionality reduction (MDR) method has been widely employed to reduce multi-level gene-gene interactions to high- or low-risk groups using a binary attribute. For the survival phenotype, Surv-MDR was first proposed using a log-rank test statistic, and the Cox-MDR method was later proposed using a martingale residual of a Cox model. Recently, the KM-MDR method was proposed using the Kaplan-Meier median survival time as a classifier. All three methods use a cross-validation procedure to identify single nucleotide polymorphism (SNP) by SNP interactions among all possible SNP pairs. Furthermore, these methods require a permutation test to verify the significance of the selected SNP pairs. However, the unified model-based multifactor dimensionality reduction method (UM-MDR) overcomes this shortcoming of MDR by unifying the significance testing with the MDR algorithm within the framework of the regression model. Neither cross-validation nor permutation testing is required to identify SNP by SNP interactions in the UM-MDR method. The UM-MDR method comprises two steps: in the first step, multi-level genotypes are classified into high- or low-risk groups, and an indicator variable for the high-risk group is defined. In the second step, the significance of the indicator variable of the high-risk group is tested in a regression model that includes other adjusting covariates. The Cox-UMMDR method was recently proposed by combining Cox-MDR with UM-MDR to identify gene-gene interactions associated with the survival phenotype. In this study, we propose two simple methods: KM-UMMDR, which combines KM-MDR with UM-MDR, and Cox2-UMMDR, which modifies Cox-UMMDR by adjusting for the covariate effect in step 1 rather than in step 2. 
The KM-UMMDR method allows the covariate effect to be adjusted for in the regression model of step 2, although KM-MDR cannot adjust for the covariate effect in the classification procedure of step 1. In contrast, Cox2-UMMDR differs from Cox-UMMDR in that the martingale residuals are obtained from a Cox model that adjusts for the covariate effect in step 1, whereas Cox-UMMDR adjusts for the covariate effect in the regression model in step 2. We performed simulation studies to compare the power of KM-UMMDR, Cox-UMMDR, Cox2-UMMDR, Cox-MDR, and KM-MDR by considering the effect of covariates and the marginal effect of SNPs. We also analyzed a real dataset of Korean leukemia patients for illustration and provide a short discussion. Results: In the simulation study, two scenarios are considered: the first compares power with and without a covariate effect, and the second compares power with and without a main effect of SNPs. In the simulation results, Cox-UMMDR performs best across all scenarios, ahead of KM-UMMDR, Cox2-UMMDR, Cox-MDR and KM-MDR. As expected, both Cox-UMMDR and Cox-MDR perform better than KM-UMMDR and KM-MDR when a covariate effect exists, because the former adjust for the covariate effect but the latter cannot. However, Cox2-UMMDR behaves similarly to KM-UMMDR and KM-MDR even when there is a covariate effect. This implies that the covariate effect is adjusted for more efficiently in the regression model of the second step than in the classification procedure of the first step. When there is a main effect of any SNP, Cox-UMMDR, Cox2-UMMDR and KM-UMMDR perform better than Cox-MDR and KM-MDR, provided the main effects of SNPs are properly adjusted for in the regression model. 
Across the two simulation scenarios, Cox-UMMDR seems to be the most robust when there is either a covariate effect to adjust for or a SNP with a main effect on the survival phenotype. In addition, the power of all methods decreased as the censoring fraction increased from 0.1 to 0.3, as heritability increased. The power of all methods seems to be greater under MAF = 0.2 than under MAF = 0.4. For illustration, both KM-UMMDR and Cox2-UMMDR were applied to a real dataset of Korean leukemia patients to identify SNP by SNP interactions associated with the survival phenotype. Conclusion: Both KM-UMMDR and Cox2-UMMDR were easily implemented by combining KM-MDR and Cox-MDR with UM-MDR, respectively, to detect significant gene-gene interactions associated with survival time without cross-validation or permutation testing. The simulation results demonstrate the utility of KM-UMMDR, Cox2-UMMDR and Cox-UMMDR, which outperform Cox-MDR and KM-MDR when some SNPs with only marginal effects might mask the detection of causal epistasis. In addition, Cox-UMMDR, Cox2-UMMDR and Cox-MDR performed better than KM-UMMDR and KM-MDR when there were potentially confounding covariate effects.
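
The first UM-MDR step described in this abstract (classify multi-level genotypes into high- or low-risk groups and define an indicator for the high-risk group) can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each subject already has a numeric risk score (for example a residual from some survival model) and labels a two-SNP genotype cell high-risk when its mean score exceeds the overall mean. The function name, the threshold rule, and the toy data are all assumptions.

```python
from collections import defaultdict

def high_risk_indicator(snp1, snp2, score):
    """Return a 0/1 high-risk indicator per subject and the set of high-risk cells.

    snp1, snp2: genotype codes (0, 1 or 2) per subject for the two SNPs
    score: per-subject numeric risk score (hypothetical; larger = higher risk)
    """
    overall_mean = sum(score) / len(score)
    cells = defaultdict(list)
    for g1, g2, s in zip(snp1, snp2, score):
        cells[(g1, g2)].append(s)
    # Assumed rule: a cell is high-risk if its mean score exceeds the overall mean.
    high_cells = {
        cell for cell, values in cells.items()
        if sum(values) / len(values) > overall_mean
    }
    indicator = [1 if (g1, g2) in high_cells else 0 for g1, g2 in zip(snp1, snp2)]
    return indicator, high_cells

# Toy data (hypothetical): six subjects, two SNPs, one score each.
snp_a = [0, 0, 1, 1, 2, 2]
snp_b = [0, 0, 1, 1, 0, 0]
scores = [1.0, 1.2, -0.5, -0.7, 0.1, -0.1]
indicator, high_cells = high_risk_indicator(snp_a, snp_b, scores)
```

In the second step described in the abstract, this indicator would enter a regression model alongside the adjusting covariates, and its coefficient would be tested for significance.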


2021 ◽  
Vol 14 (1) ◽  
Author(s):  
Fentaw Abegaz ◽  
François Van Lishout ◽  
Jestinah M. Mahachie John ◽  
Kridsadakorn Chiachoompu ◽  
Archana Bhardwaj ◽  
...  

Abstract. Background: In genome-wide association studies, the extent and impact of confounding due to population structure have been well recognized. Inadequate handling of such confounding is likely to lead to spurious associations, hampering replication and the identification of causal variants. Several strategies have been developed for protecting associations against confounding; the most popular is based on Principal Component Analysis. In contrast, the extent and impact of confounding due to population structure in gene-gene interaction (epistasis) association studies are much less investigated and understood. In particular, the role of nonlinear genetic population substructure in epistasis detection is largely under-investigated, especially outside a regression framework. Methods: To identify causal variants in synergy and to improve the interpretability and replicability of epistasis results, we introduce three strategies based on a model-based multifactor dimensionality reduction approach for structured populations, namely MBMDR-PC, MBMDR-PG, and MBMDR-GC. Results: Simulation results comparing the performance of various approaches show that, in the presence of population structure, MBMDR-PC and MBMDR-PG control the type I error rate at the nominal level consistently better than MBMDR-GC. Moreover, our three proposed methods of population structure correction outperform MDR-SP in terms of statistical power. Conclusion: We demonstrate through extensive simulation studies the effect of various degrees of genetic population structure and relatedness on epistasis detection and propose appropriate remedial measures based on linear and nonlinear sample genetic similarity.
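
The Principal Component Analysis strategy mentioned in the Background can be illustrated with a generic sketch: compute the leading principal components of a centered genotype matrix and regress them out of the phenotype before testing. The abstract does not describe how MBMDR-PC, MBMDR-PG or MBMDR-GC work internally, so this is only the general PCA-adjustment idea; the function names and the simulated two-subpopulation data are assumptions.

```python
import numpy as np

def top_pcs(genotypes: np.ndarray, k: int) -> np.ndarray:
    """Scores of the first k principal components of a subjects-by-SNPs matrix."""
    centered = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :k] * s[:k]

def residualize(y: np.ndarray, pcs: np.ndarray) -> np.ndarray:
    """Remove the part of y explained linearly by an intercept and the PCs."""
    design = np.column_stack([np.ones(len(y)), pcs])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

# Toy data (hypothetical): two subpopulations with different allele frequencies.
rng = np.random.default_rng(1)
population = np.repeat([0, 1], 50)
allele_freq = np.where(population[:, None] == 0, 0.2, 0.6) * np.ones((1, 30))
genotypes = rng.binomial(2, allele_freq).astype(float)  # shape (100, 30)
phenotype = population + rng.normal(size=100)           # confounded by structure

pcs = top_pcs(genotypes, k=2)
adjusted = residualize(phenotype, pcs)
```

In a setup like this toy one, the first component will typically track the subpopulation label, so the adjusted phenotype carries less of the structure-driven signal. Linear PCs capture only linear structure, which is the limitation the abstract raises when it mentions nonlinear population substructure.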

