Detecting Differential Variable microRNAs via Model-Based Clustering

2018 ◽  
Author(s):  
Xuan Li ◽  
Yuejiao Fu ◽  
Xiaogang Wang ◽  
Dawn L. DeMeo ◽  
Kelan Tantisira ◽  
...  

ABSTRACT: Identifying genomic probes (e.g., DNA methylation marks) is becoming a new approach to detect novel genomic risk factors for complex human diseases. The F test is the standard equal-variance test in statistics. For high-throughput genomic data, the probe-wise F test has been successfully used to detect biologically relevant DNA methylation marks that have different variances between two groups of subjects (e.g., cases vs. controls). In addition to DNA methylation, microRNA is another mechanism of epigenetics. However, to the best of our knowledge, no studies have identified differentially variable (DV) microRNAs. In this article, we proposed a novel model-based clustering method to improve the power of the probe-wise F test to detect DV microRNAs. We imposed special structures on covariance matrices for each cluster of microRNAs based on the prior information about the relationship between variance in cases and variance in controls and about the independence among cases and controls. To the best of our knowledge, the proposed method is the first clustering algorithm that aims to detect DV genomic probes. Simulation studies showed that the proposed method outperformed the probe-wise F test and had certain robustness to the violation of the normality assumption. Based on two real datasets about human hepatocellular carcinoma (HCC), we identified 7 DV-only microRNAs (hsa-miR-1826, hsa-miR-191, hsa-miR-194-star, hsa-miR-222, hsa-miR-502-3p, hsa-miR-93, and hsa-miR-99b) using the proposed method, one (hsa-miR-1826) of which has not yet been reported to relate to HCC in the literature.

2018 ◽  
Vol 2018 ◽  
pp. 1-9
Author(s):  
Xuan Li ◽  
Yuejiao Fu ◽  
Xiaogang Wang ◽  
Dawn L. DeMeo ◽  
Kelan Tantisira ◽  
...  

Identifying differentially variable (DV) genomic probes is becoming a new approach to detect novel genomic risk factors for complex human diseases. The F test is the standard equal-variance test in statistics. For high-throughput genomic data, the probe-wise F test has been successfully used to detect biologically relevant DNA methylation marks that have different variances between two groups of subjects (e.g., cases versus controls). In addition to DNA methylation, microRNA (miRNA) is another important mechanism of epigenetics. However, to the best of our knowledge, no studies have identified DV miRNAs. In this article, we proposed a novel model-based clustering method to improve the power of the probe-wise F test to detect DV miRNAs. We imposed special structures on covariance matrices for each cluster of miRNAs based on the prior information about the relationship between variances in cases and controls and about the independence among them. Simulation studies showed that the proposed method seems promising in detecting DV probes. Based on two real datasets about human hepatocellular carcinoma (HCC), we identified 7 DV-only miRNAs (hsa-miR-1826, hsa-miR-191, hsa-miR-194-star, hsa-miR-222, hsa-miR-502-3p, hsa-miR-93, and hsa-miR-99b) using the proposed method, one (hsa-miR-1826) of which has not yet been reported to be related to HCC in the literature.
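The probe-wise F test that the abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `probewise_f_test`, the simulated data, and the choice of a two-sided p-value (twice the smaller tail probability) are assumptions made for the example, and the proposed model-based clustering method itself is not reproduced here.

```python
import numpy as np
from scipy import stats


def probewise_f_test(cases, controls):
    """Two-sided F test for equal variances, one test per probe (row).

    cases:    array of shape (n_probes, n_cases)
    controls: array of shape (n_probes, n_controls)
    Returns (f_stat, p_value), each of shape (n_probes,).
    """
    cases = np.asarray(cases, dtype=float)
    controls = np.asarray(controls, dtype=float)
    # Unbiased sample variance of each probe within each group
    var_cases = cases.var(axis=1, ddof=1)
    var_controls = controls.var(axis=1, ddof=1)
    f_stat = var_cases / var_controls
    df1 = cases.shape[1] - 1
    df2 = controls.shape[1] - 1
    # Two-sided p-value: twice the smaller tail probability, capped at 1
    lower = stats.f.cdf(f_stat, df1, df2)
    upper = stats.f.sf(f_stat, df1, df2)
    p_value = np.minimum(1.0, 2.0 * np.minimum(lower, upper))
    return f_stat, p_value


# Simulated example (hypothetical data): 200 probes, 50 cases, 50 controls
rng = np.random.default_rng(0)
n_probes, n_cases, n_controls = 200, 50, 50
controls = rng.normal(0.0, 1.0, size=(n_probes, n_controls))
cases = rng.normal(0.0, 1.0, size=(n_probes, n_cases))
cases[:20] *= 3.0  # first 20 probes have a larger variance in cases

f_stat, p_value = probewise_f_test(cases, controls)
print("probes with p < 0.01:", int((p_value < 0.01).sum()))
```

In practice the per-probe p-values would also need a multiple-testing correction, and the F test is sensitive to departures from normality, which is one of the motivations the abstract gives for studying robustness.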


Author(s):  
Naohiko Kinoshita ◽  
Yasunori Endo ◽  
Akira Sugawara ◽  
...  

Clustering is a representative unsupervised classification technique. Many researchers have proposed clustering algorithms based on mathematical models, methods we call model-based clustering. Clustering techniques are very useful for determining data structures, but model-based clustering is difficult to apply correctly because a suitable method cannot be selected unless the data structure is at least partially known. The new clustering algorithm we propose introduces soft computing techniques such as fuzzy reasoning in what we call linguistic-based clustering, whose features do not depend on the data structure. We verify the method’s effectiveness through numerical examples.


2020 ◽  
Vol 11 (1) ◽  
pp. 45-60
Author(s):  
Billel Kenidra ◽  
Mohamed Benmohammed

Clustering is used to identify cancer subtypes from gene expression and DNA methylation datasets. Cancer subtype information is critically important for understanding tumor heterogeneity: detecting previously unknown clusters of biological samples, which are usually associated with unknown types of cancer, in turn makes it possible to prescribe more effective treatments for patients, because cancer subtypes often respond differently to the same treatment. Because DNA methylation datasets are extremely large, running time remains a major challenge. Traditional clustering algorithms are too slow to handle high-dimensional biological datasets and usually require large amounts of computational time. The proposed clustering algorithm substantially outperforms the other algorithms in terms of running time: it rapidly identifies a set of biologically relevant clusters in large-scale DNA methylation datasets.


2021 ◽  
Vol 16 ◽  
Author(s):  
Zhaoyang Liu ◽  
Hongsheng Yin ◽  
Shutao Chen ◽  
Hui Liu ◽  
Jia Meng ◽  
...  

Background: m6A methylation is a ubiquitous post-transcriptional modification in mammals. MeRIP-seq technology makes it possible to acquire transcriptome-wide m6A data under different conditions. Regulation by specific enzymes appears as co-methylation modules in m6A methylation-level data. Thus, mining co-methylation modules from these data can help to unveil the mechanism of m6A methylation modification and its role in the occurrence and development of complex diseases such as cancer. Objective: To develop a clustering algorithm that can effectively mine m6A co-methylation modules. Method: In this study, a novel beta mixture model-based clustering algorithm named MBMM was proposed. It is based on the EM framework and uses method-of-moments estimation in the M-step to handle high-dimensional, small-sample m6A data. Simulation studies were used to evaluate the clustering performance of the proposed algorithm, which was then applied to mine co-methylation modules from real data. Biological significance analysis was used to explore whether the clustering results are co-methylation modules. Results and Conclusion: Simulation studies demonstrated that MBMM outperformed other clustering algorithms. In real data, seven co-methylation modules were found by MBMM. Pathway-specific analysis of six m6A-related pathways showed that six co-methylation modules were enriched in these pathways and differed from one another. Substrate-specific analysis of five enzymes revealed that the seven co-methylation modules showed varying degrees of enrichment. Gene Ontology enrichment analysis indicated that these modules may be regulated by enzymes while having potential functional specificity.
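The idea of an EM algorithm for a beta mixture whose M-step uses method-of-moments estimates can be illustrated with a small univariate sketch. This is not the MBMM implementation: the function names (`beta_logpdf`, `moment_estimate`, `fit_beta_mixture`), the one-dimensional setting, the sorted-block initialisation, and the numerical guards are all assumptions chosen to keep the example short.

```python
import math
import numpy as np


def beta_logpdf(x, a, b):
    """Log density of Beta(a, b) evaluated at x (values strictly in (0, 1))."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x)


def moment_estimate(x, w):
    """Weighted method-of-moments estimate of Beta(a, b) parameters."""
    mean = np.sum(w * x) / np.sum(w)
    var = np.sum(w * (x - mean) ** 2) / np.sum(w)
    # Moment estimates require 0 < var < mean * (1 - mean)
    var = min(max(var, 1e-8), mean * (1 - mean) * 0.999)
    common = mean * (1 - mean) / var - 1
    return mean * common, (1 - mean) * common


def fit_beta_mixture(x, k=2, n_iter=100, eps=1e-6):
    """EM for a univariate beta mixture with a moment-based M-step."""
    x = np.clip(np.asarray(x, dtype=float), eps, 1 - eps)
    # Assumed initialisation: split the sorted data into k equal blocks
    resp = np.zeros((x.size, k))
    for j, idx in enumerate(np.array_split(np.argsort(x), k)):
        resp[idx, j] = 1.0
    for _ in range(n_iter):
        # M-step: mixing weights plus weighted moment estimates
        weights = resp.mean(axis=0)
        params = [moment_estimate(x, resp[:, j]) for j in range(k)]
        # E-step: posterior responsibilities, computed in log space
        log_p = np.column_stack(
            [np.log(weights[j]) + beta_logpdf(x, *params[j]) for j in range(k)]
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p) + 1e-12  # small floor keeps every component alive
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, params, resp


# Simulated example (hypothetical data): two well-separated beta components
rng = np.random.default_rng(1)
x = np.concatenate([rng.beta(2, 10, 300), rng.beta(10, 2, 300)])
weights, params, resp = fit_beta_mixture(x, k=2)
labels = resp.argmax(axis=1)
print("mixing weights:", np.round(weights, 3))
print("estimated (a, b):", [tuple(np.round(p, 2)) for p in params])
```

Replacing the maximum-likelihood update with moment estimates gives a closed-form M-step, which avoids an inner numerical optimisation; the cost is that the iterations are no longer guaranteed to increase the likelihood at every step.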

