Background: The RNA methylome has emerged as an important layer of gene regulation and
can be profiled directly with count-based measurements from high-throughput sequencing data. Although
the detailed regulatory circuitry of the epitranscriptome remains uncharted, clustering effects in methylation
status among different RNA methylation sites can be identified from transcriptome-wide RNA methylation
profiles and may reflect epitranscriptomic regulation. Count-based RNA methylation sequencing data
have unique features, such as low read coverage, which call for novel clustering approaches.
<P><P>
Objective: Besides handling low read coverage, a clustering analysis of count-based RNA methylation
sequencing data should also preserve the integer nature of the measurements.
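
To illustrate why the integer counts matter, the following minimal sketch (not taken from the DPBBM package) compares two hypothetical sites that share the same methylation ratio but differ in coverage. It assumes a uniform Beta(1, 1) prior and a binomial likelihood, both chosen here only for illustration; the function name and the example counts are likewise hypothetical.

```python
import math

def beta_posterior_sd(methylated, total, a=1.0, b=1.0):
    """Posterior standard deviation of the methylation level under a
    Beta(a, b) prior and a binomial likelihood (illustrative assumption)."""
    post_a = a + methylated
    post_b = b + (total - methylated)
    s = post_a + post_b
    variance = post_a * post_b / (s * s * (s + 1.0))
    return math.sqrt(variance)

# Two hypothetical sites with the same ratio (0.5) but different coverage.
low_cov = beta_posterior_sd(1, 2)      # 1 methylated read out of 2
high_cov = beta_posterior_sd(50, 100)  # 50 methylated reads out of 100

print(f"low-coverage site:  sd = {low_cov:.3f}")
print(f"high-coverage site: sd = {high_cov:.3f}")
```

The low-coverage site carries far more uncertainty about its methylation level, which is lost once the counts are collapsed into a single ratio.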
<P><P>
Method: We propose a nonparametric generative model, together with its Gibbs sampling solution, for
clustering analysis. The proposed approach implements a beta-binomial mixture model to capture the
clustering effect in methylation level using the original count-based measurements rather than an estimated
continuous methylation level. In addition, it adopts a nonparametric Dirichlet process to automatically
determine an optimal number of clusters, thereby avoiding the model selection problem common in clustering
analysis.
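
The two ingredients named above can be sketched as follows. This is a simplified illustration, not the DPBBM implementation: it treats a single sample per site, uses the Chinese restaurant process form of the Dirichlet process, and assumes fixed beta parameters for a newly opened cluster, whereas a full sampler would draw or integrate them from a base distribution. All function names, parameters, and numbers are hypothetical.

```python
import math
import random

def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def beta_binomial_logpmf(k, n, alpha, beta):
    """log P(k methylated reads out of n total | alpha, beta)."""
    log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                  - math.lgamma(n - k + 1))
    return log_choose + log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta)

def assignment_probs(k, n, clusters, concentration, new_cluster_params):
    """Probabilities of assigning one site to each existing cluster or to a
    new cluster, using Chinese restaurant process weights.

    clusters: list of (size, alpha, beta), where size excludes the site
              currently being reassigned.
    new_cluster_params: (alpha, beta) assumed for a new cluster
                        (a simplification for this sketch).
    """
    log_w = [math.log(size) + beta_binomial_logpmf(k, n, a, b)
             for size, a, b in clusters]
    a0, b0 = new_cluster_params
    log_w.append(math.log(concentration) + beta_binomial_logpmf(k, n, a0, b0))
    # Normalize in log space for numerical stability.
    m = max(log_w)
    w = [math.exp(x - m) for x in log_w]
    total = sum(w)
    return [x / total for x in w]

# Hypothetical state: a low-methylation and a high-methylation cluster.
clusters = [(30, 2.0, 18.0), (20, 15.0, 5.0)]
probs = assignment_probs(k=7, n=9, clusters=clusters,
                         concentration=1.0, new_cluster_params=(1.0, 1.0))

# One Gibbs-style draw of the cluster label for this site;
# the last index means "open a new cluster".
random.seed(0)
label = random.choices(range(len(probs)), weights=probs)[0]
print(probs, label)
```

Because a new cluster always receives non-zero weight, the number of clusters can grow or shrink during sampling instead of being fixed in advance.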
<P><P>
Results: When tested on simulated data, the method demonstrated improved clustering performance
over hierarchical clustering, K-means, MClust, NMF and EMclust. On a real dataset, it also revealed two novel
RNA N6-methyladenosine (m6A) co-methylation patterns that may be induced directly by METTL14 and
WTAP, two known regulatory components of the RNA m6A methyltransferase complex.
<P><P>
Conclusion: Our proposed DPBBM method not only properly handles count-based measurements of
RNA methylation data from sites with very low read coverage, but also adaptively learns an optimal number
of clusters from the data analyzed.
<P><P>
Availability: The source code and documentation of the DPBBM R package are freely available through the
Comprehensive R Archive Network (CRAN): https://cran.r-project.org/web/packages/DPBBM/.