A novel nonparametric computational strategy for identifying differential methylation regions

2022 ◽  
Vol 23 (S1) ◽  
Author(s):  
Xifang Sun ◽  
Donglin Wang ◽  
Jiaqiang Zhu ◽  
Shiquan Sun

Abstract Background DNA methylation has long been known as an epigenetic gene silencing mechanism. As a motivating example, the methylomes of cancer and non-cancer cells show a number of methylation differences, indicating that certain characteristics of cancer cells may be related to methylation characteristics. Robust methods for detecting differentially methylated regions (DMRs) could help scientists narrow down genome regions and even find biologically important regions. Although some statistical methods have been developed for detecting DMRs, there is no default or strongest method. Fisher’s exact test is direct, but not suitable for data with multiple replications, while regression-based methods usually come with a large number of assumptions. More complicated methods have been proposed, but those methods are often difficult to interpret. Results In this paper, we propose a three-step nonparametric kernel smoothing method that is both flexible and straightforward to implement and interpret. The proposed method relies on local quadratic fitting to find the set of equilibrium points (points at which the first derivative is 0) and the corresponding set of confidence windows. Potential regions are further refined using biological criteria, and finally selected based on a Bonferroni-adjusted t-test cutoff. Using a comparison of three senescent and three proliferating cell lines to illustrate our method, we were able to identify a total of 1077 DMRs on chromosome 21. Conclusions We proposed a completely nonparametric, statistically straightforward, and interpretable method for detecting differentially methylated regions. Compared with existing methods, the non-reliance on model assumptions and the straightforward nature of our method make it a competitive alternative to the existing statistical methods for defining DMRs.
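
The first step described in the abstract (local quadratic fitting to locate points where the first derivative is 0) can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, the bandwidth h, the evaluation grid, and the function names local_quadratic and equilibrium_points are all assumptions made for illustration, and the confidence windows, biological refinement, and Bonferroni-adjusted t-test steps are not shown.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def local_quadratic(x, y, x0, h):
    """Kernel-weighted quadratic fit centred at x0.

    Returns (b0, b1, b2) for y ~ b0 + b1*(x - x0) + b2*(x - x0)**2,
    so b1 estimates the first derivative at x0.
    """
    s = [0.0] * 5  # s[k] = sum of w * d**k
    t = [0.0] * 3  # t[k] = sum of w * d**k * y
    for xi, yi in zip(x, y):
        d = xi - x0
        w = math.exp(-0.5 * (d / h) ** 2)  # Gaussian kernel (an assumption)
        for k in range(5):
            s[k] += w * d ** k
        for k in range(3):
            t[k] += w * d ** k * yi
    # Weighted normal equations A b = t with A[i][j] = s[i + j],
    # solved by Cramer's rule because the system is only 3x3.
    a = [[s[i + j] for j in range(3)] for i in range(3)]
    det = det3(a)
    coef = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = t[r]
        coef.append(det3(m) / det)
    return tuple(coef)


def equilibrium_points(x, y, grid, h):
    """Grid locations where the estimated first derivative crosses zero."""
    slopes = [local_quadratic(x, y, g, h)[1] for g in grid]
    roots = []
    for i in range(len(grid) - 1):
        s0, s1 = slopes[i], slopes[i + 1]
        if s0 == 0.0:
            roots.append(grid[i])
        elif s0 * s1 < 0:
            # Linear interpolation between the two bracketing grid points.
            roots.append(grid[i] - s0 * (grid[i + 1] - grid[i]) / (s1 - s0))
    return roots


# Synthetic curve with a single turning point (not methylation data).
xs = [0.5 * i for i in range(21)]
ys = [(v - 5.0) ** 2 for v in xs]
grid = [0.25 + 0.5 * k for k in range(20)]
print(equilibrium_points(xs, ys, grid, h=1.5))
```

In practice the bandwidth would need to be chosen with care, since it controls how many turning points the smoothed curve shows.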

2017 ◽  
Vol 01 (01) ◽  
pp. 1630013 ◽  
Author(s):  
Suneel Babu Chatla ◽  
Chun-Houh Chen ◽  
Galit Shmueli

The field of computational statistics refers to statistical methods or tools that are computationally intensive. Due to the recent advances in computing power, some of these methods have become prominent and central to modern data analysis. In this paper, we focus on several of the main methods including density estimation, kernel smoothing, smoothing splines, and additive models. While the field of computational statistics includes many more methods, this paper serves as a brief introduction to selected popular topics.


2007 ◽  
Vol 25 (18_suppl) ◽  
pp. 6027-6027 ◽  
Author(s):  
Z. Guo ◽  
Z. Chen ◽  
Z. Yang ◽  
L. Schumaker ◽  
K. J. Cullen

6027 Background: Resistance of cancer cells to cisplatin and its analogues is the major limitation in clinical application of cisplatin-based chemotherapy. The mechanisms by which cancer cells develop resistance to the drugs are still unclear, and there is no way currently to predict the drug resistance of individual tumors. By genome-wide scanning of hypermethylated genes on head and neck cancer cells, we identified glutathione peroxidase 3 (GPX3) as one of the strong candidates whose promoter hypermethylation may be associated with head and neck chemoresistance. In this study, we investigated the potential predictive value of GPX3 methylation for head and neck cancer chemoresistance and patient prognosis. Methods: Promoter methylation and expression of GPX3 gene in head and neck cancer cell lines were examined by plasmid cloning, bisulfite DNA sequencing, reverse transcription-PCR and Western blot. GPX3 methylation in primary cancer tissues was assessed by real-time methylation-specific PCR (MSP). Forty-six head and neck cancer cases, for which chemotherapy response and survival were known, were selected for analysis. Correlation of GPX3 methylation and chemoresistance was tested using two-sided Fisher’s Exact Test and its prediction for patient survival was assessed using Kaplan-Meier survival analysis. Results: Loss of GPX3 expression was observed in 4 of 8 head and neck cancer cell lines and was consistent with cisplatin resistance. Demethylating treatment of the cell lines negative for GPX3 expression significantly restored its expression. Bisulfite DNA sequencing showed that the 5’ flanking promoter region of GPX3 was heavily hypermethylated in all cell lines with expression-silencing of the gene. 
In the 46 head and neck cancer cases analyzed by MSP, 15 of 23 non-responding cases (65%) showed GPX3 methylation, while 4 of 23 complete and partial response cases (17%) contained low levels of GPX3 methylation (Relative Risk 3.343, two-sided Fisher’s exact test, P=0.002). Kaplan-Meier survival analysis showed a relative risk of death of 1.942 in patients with GPX3 methylation. Conclusions: Our findings suggest that GPX3 methylation is a strong candidate predictor for chemoresistance and prognosis of head and neck cancer patients. No significant financial relationships to disclose.
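
The two-sided Fisher's exact test on the counts above can be reproduced with a short sketch. It assumes the 2x2 table is 15 methylated and 8 unmethylated among the 23 non-responders versus 4 methylated and 19 unmethylated among the 23 responders, which is a reading of the abstract rather than something it states as a table. The function name fisher_exact_two_sided is hypothetical, and the two-sided p-value is taken as the sum of all table probabilities no larger than the observed one, a common convention that the abstract does not confirm.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left cell follows a hypergeometric
    distribution. Unnormalised integer weights are used so that the
    comparison with the observed table is exact.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def weight(k):
        # Number of ways to obtain k in the top-left cell.
        return comb(row1, k) * comb(row2, col1 - k)

    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    observed = weight(a)
    tail = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed)
    return tail / comb(n, col1)


# Assumed table: rows = non-responders, responders; columns = methylated, not.
p_value = fisher_exact_two_sided(15, 8, 4, 19)
print(f"two-sided p = {p_value:.4f}")
```

The printed value can be compared with the P=0.002 reported in the abstract.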


This paper analyzes the feature selection process for agricultural data using different statistical methods, namely Correlation, Gain Ratio, Information Gain, OneR, a Chi-square MapReduce model, and Fisher’s exact test. In the recent past, Fisher’s exact test was commonly used for feature selection. However, it is suitable only for small data sets. To handle large data sets, the Chi-square test, one of the most popular statistical methods, is used. However, it also selects irrelevant attributes, and thus the resulting accuracy is not as expected. As a novelty, Fisher’s exact test is combined with the MapReduce model to handle large data sets. In addition, the simulation outcome shows that the proposed Fisher’s exact test finds the significant attributes with higher accuracy and lower time complexity than the other existing methods.
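
The abstract does not describe how the test is combined with MapReduce. One plausible reading is that the contingency counts feeding the test are accumulated in a map step per data chunk and merged in a reduce step. The sketch below illustrates only that counting pattern on a single machine. The function names map_chunk, reduce_counts, and contingency_table, the binary encoding of feature and label, and the toy records are all assumptions.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk):
    """Map step: count (feature present, label positive) pairs in one chunk."""
    return Counter((bool(feature), bool(label)) for feature, label in chunk)


def reduce_counts(left, right):
    """Reduce step: merge two partial count tables."""
    return left + right


def contingency_table(chunks):
    """Build the 2x2 table [[a, b], [c, d]] from chunked (feature, label) records."""
    total = reduce(reduce_counts, map(map_chunk, chunks), Counter())
    return [
        [total[(True, True)], total[(True, False)]],
        [total[(False, True)], total[(False, False)]],
    ]


# Toy records split into two chunks, standing in for distributed partitions.
chunks = [
    [(1, 1), (1, 0), (0, 0)],
    [(1, 1), (0, 1), (0, 0)],
]
print(contingency_table(chunks))
```

Because the merged table is tiny regardless of how many records were counted, the exact test itself would run once per attribute on the reduced counts.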


2008 ◽  
Vol 2 ◽  
pp. BBI.S431 ◽  
Author(s):  
Angelica Lindlöf ◽  
Marcus Bräutigam ◽  
Aakash Chawade ◽  
Olof Olsson ◽  
Björn Olsson

The detection of differentially expressed genes from EST data is of importance for the discovery of potential biological or pharmaceutical targets, especially when studying biological processes in less characterized organisms and where large-scale microarrays are not an option. We present a comparison of five different statistical methods for identifying up-regulated genes through pairwise comparison of EST sets, where one of the sets is generated from a treatment and the other one serves as a control. In addition, we specifically address situations where the sets are relatively small (~2,000–10,000 ESTs) and may differ in size. The methods were tested on both simulated and experimentally derived data, and compared to a collection of cold-stress-induced genes identified by microarrays. We found that combining the method proposed by Audic and Claverie with Fisher's exact test and a method based on calculating the difference in relative frequency was the best combination for maximizing the detection of up-regulated genes. We also introduced the use of a flexible cutoff, which takes the size of the EST sets into consideration. This could be considered as an alternative to a static cutoff. Finally, the detected genes showed a low overlap with those identified by microarrays, which indicates, as in previous studies, low overall concordance between the two platforms.


Author(s):  
Ryo Okui ◽  
Takahide Yanagi

Abstract This paper proposes nonparametric kernel-smoothing estimation for panel data to examine the degree of heterogeneity across cross-sectional units. We first estimate the sample mean, autocovariances and autocorrelations for each unit and then apply kernel smoothing to compute their density functions. Because the kernel estimator depends on the bandwidth, asymptotic bias of very high order affects the required condition on the relative magnitudes of the cross-sectional sample size (N) and the time-series length (T). In particular, it makes the condition on N and T stronger and more complicated than those typically observed in the long-panel literature without kernel smoothing. We also consider a split-panel jackknife method for correcting bias and constructing confidence intervals. An empirical application illustrates our procedure.
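
The two-stage idea in this abstract (compute a statistic per unit, then smooth those statistics into a density) can be sketched for the unit-specific sample means. This is a minimal illustration under stated assumptions, not the authors' estimator: the Gaussian kernel, the fixed bandwidth h, the simulated panel, and the function names unit_means and gaussian_kde are all hypothetical, and the split-panel jackknife correction is not implemented.

```python
import math
import random


def unit_means(panel):
    """Stage 1: sample mean of each unit's time series."""
    return [sum(series) / len(series) for series in panel]


def gaussian_kde(points, h):
    """Stage 2: Gaussian kernel density estimator built from the unit statistics."""
    norm = len(points) * h * math.sqrt(2.0 * math.pi)

    def density(u):
        return sum(math.exp(-0.5 * ((u - p) / h) ** 2) for p in points) / norm

    return density


# Simulated panel: N = 50 units, T = 20 periods, heterogeneous unit means.
random.seed(0)
true_means = [random.gauss(0.0, 1.0) for _ in range(50)]
panel = [[random.gauss(mu, 1.0) for _ in range(20)] for mu in true_means]

density = gaussian_kde(unit_means(panel), h=0.4)  # bandwidth is an assumption
for u in (-2.0, 0.0, 2.0):
    print(u, round(density(u), 4))
```

With a short time series per unit, each sample mean is a noisy estimate of the unit's true mean, which is one source of the bias the abstract discusses.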


2021 ◽  
Author(s):  
Viivi Halla-aho ◽  
Harri Lähdesmäki

Background: cfMeDIP-seq is a low-cost method for determining the DNA methylation status of cell-free DNA and it has been successfully combined with statistical methods for accurate cancer diagnostics. We investigate the diagnostic classification aspect by applying statistical tests and dimension reduction techniques for feature selection and probabilistic modeling for the cancer type classification, and we also study the effect of sequencing depth. Methods: We experiment with a variety of statistical methods that use different feature selection and feature extraction methods as well as probabilistic classifiers for diagnostic decision making. We test the (moderated) t-tests and Fisher's exact test for feature selection, principal component analysis (PCA) as well as iterative supervised PCA (ISPCA) for feature generation, and GLMnet and logistic regression methods with sparsity-promoting priors for classification. The probabilistic programming language Stan is used to implement Bayesian inference for the probabilistic models. Results and conclusions: We compare overlaps of differentially methylated genomic regions as chosen by different feature selection methods, and evaluate probabilistic classifiers by computing the area under the receiver operating characteristic (AUROC) scores on discovery and validation cohorts. While we observe that many methods perform as well as, and occasionally considerably better than, GLMnet, which was originally proposed for cfMeDIP-seq based cancer classification, we also observe that the performance of different methods varies across sequencing depths, cancer types and study cohorts. Overall, methods that seem robust and promising include Fisher's exact test and ISPCA for feature selection as well as a simple logistic regression model with the number of hyper- and hypomethylated regions as features.

