DNN-m6A: A Cross-Species Method for Identifying RNA N6-Methyladenosine Sites Based on Deep Neural Network with Multi-Information Fusion

Genes ◽  
2021 ◽  
Vol 12 (3) ◽  
pp. 354
Author(s):  
Lu Zhang ◽  
Xinyi Qin ◽  
Min Liu ◽  
Ziwei Xu ◽  
Guangzhong Liu

As a prevalent post-transcriptional modification of RNA, N6-methyladenosine (m6A) plays a crucial role in various biological processes. To reveal its regulatory mechanism more thoroughly and provide new insights for drug design, accurate genome-wide identification of m6A sites is vital. Because traditional experimental methods are time-consuming and cost-prohibitive, a more efficient computational method for detecting m6A sites is needed. In this study, we propose DNN-m6A, a novel cross-species computational method based on a deep neural network (DNN), to identify m6A sites in multiple tissues of human, mouse and rat. First, binary encoding (BE), tri-nucleotide composition (TNC), enhanced nucleic acid composition (ENAC), K-spaced nucleotide pair frequencies (KSNPFs), nucleotide chemical property (NCP), pseudo dinucleotide composition (PseDNC), position-specific nucleotide propensity (PSNP) and position-specific dinucleotide propensity (PSDP) are employed to extract RNA sequence features, which are subsequently fused to construct the initial feature vector set. Second, we use the elastic net to eliminate redundant features and build the optimal feature subset. Finally, the hyper-parameters of the DNN are tuned with Bayesian hyper-parameter optimization based on the selected feature subset. Five-fold cross-validation on the training datasets shows that the proposed DNN-m6A method outperformed the state-of-the-art method for predicting m6A sites, with an accuracy (ACC) of 73.58%–83.38% and an area under the curve (AUC) of 81.39%–91.04%. Furthermore, on the independent datasets the method achieved an ACC of 72.95%–83.04% and an AUC of 80.79%–91.09%, which indicates excellent generalization ability.
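To make the feature-fusion step concrete, the following is a minimal sketch of two of the eight encodings named in the abstract, binary encoding and nucleotide chemical property (NCP), concatenated into one vector. The function names `encode` and `fuse`, the A/C/G/U bit order, and the example window are illustrative assumptions and are not taken from the DNN-m6A implementation; the NCP triples follow the commonly used (ring structure, functional group, hydrogen bonding) convention.

```python
# Illustrative encodings for an RNA sequence window. Names and bit order are
# assumptions for this sketch, not the authors' code.

# Binary (one-hot) encoding: one 4-bit indicator per nucleotide.
BINARY = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "U": (0, 0, 0, 1),
}

# NCP: (ring structure, functional group, hydrogen bonding), with
# purine = 1, amino group = 1, weak hydrogen bonding = 1.
NCP = {
    "A": (1, 1, 1),
    "C": (0, 1, 0),
    "G": (1, 0, 0),
    "U": (0, 0, 1),
}


def encode(sequence, table):
    """Concatenate the per-nucleotide codes from `table` along the sequence."""
    features = []
    for base in sequence.upper():
        features.extend(table[base])
    return features


def fuse(sequence):
    """Fuse the two encodings by simple concatenation."""
    return encode(sequence, BINARY) + encode(sequence, NCP)


window = "GGACU"  # hypothetical 5-nt window centred on the candidate adenosine
vector = fuse(window)
print(len(vector))  # 5 * 4 binary bits + 5 * 3 NCP bits
```

In a full pipeline the remaining encodings would be appended in the same way before feature selection is applied to the fused vector.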

2020 ◽  
Author(s):  
Xiao Chen ◽  
Yi Xiong ◽  
Yinbo Liu ◽  
Yuqing Chen ◽  
Shoudong Bi ◽  
...  

Abstract Background: As one of the most common post-transcriptional modifications (PTCM) in RNA, 5-cytosine-methylation plays important roles in many biological functions such as RNA metabolism and cell fate decision. Through accurate identification of 5-methylcytosine (m5C) sites on RNA, researchers can better understand the exact role of 5-cytosine-methylation in these biological functions. In recent years, computational methods for predicting m5C sites have attracted considerable interest because of their efficiency and low cost. However, both the accuracy and the efficiency of these methods are not yet satisfactory and need further improvement. Results: In this work, we developed a new computational method, m5CPred-SVM, to identify m5C sites in three species, H. sapiens, M. musculus and A. thaliana. To build this model, we first collected benchmark datasets following three recently published methods. Then, six types of sequence-based features were generated from RNA segments, and the sequential forward feature selection strategy was used to obtain the optimal feature subset. After that, the performance of models based on different learning algorithms was compared, and the model based on the support vector machine provided the highest prediction accuracy. Finally, our proposed method, m5CPred-SVM, was compared with several existing methods, and the results showed that m5CPred-SVM offered substantially higher prediction accuracy than previously published methods. It is expected that m5CPred-SVM can become a useful tool for accurate identification of m5C sites. Conclusion: In this study, by introducing position-specific propensity related features, we built a new model, m5CPred-SVM, to predict RNA m5C sites in three different species. The results show that our model outperformed the existing state-of-the-art models. Our model is available to users through a web server at http://zhulab.ahu.edu.cn/m5CPred-SVM.
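The sequential forward feature selection strategy mentioned above can be sketched as a greedy loop: start from an empty subset and repeatedly add the single feature that most improves a score, stopping when no addition helps. The `score` callable below is a hypothetical toy stand-in; in a setting like the one described it would presumably be a cross-validated accuracy of the classifier, which this sketch does not attempt to reproduce.

```python
def sequential_forward_selection(n_features, score, max_features=None):
    """Greedy forward selection over feature indices 0..n_features-1.

    `score(subset)` returns a number to maximise for a list of indices.
    """
    selected = []
    best_score = float("-inf")
    remaining = list(range(n_features))
    limit = n_features if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset.
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates)
        if top_score <= best_score:
            break  # no candidate improves the score, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score

    return selected, best_score


# Toy score (assumption): features 0, 2 and 3 carry some gain, and every
# selected feature costs a small penalty.
GAINS = {0: 0.30, 2: 0.20, 3: 0.01}


def toy_score(subset):
    return sum(GAINS.get(f, 0.0) for f in subset) - 0.02 * len(subset)


subset, value = sequential_forward_selection(5, toy_score)
print(subset, round(value, 2))
```

With this toy score, feature 3 is left out because its gain is smaller than the per-feature penalty, which is the behaviour a redundancy-aware selection is meant to show.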


Author(s):  
Nam D Nguyen ◽  
Ting Jin ◽  
Daifeng Wang

Abstract Summary Population studies such as genome-wide association studies have identified a variety of genomic variants associated with human diseases. To further understand potential mechanisms of disease variants, recent statistical methods associate functional omic data (e.g. gene expression) with genotype and phenotype and link variants to individual genes. However, how to interpret molecular mechanisms from such associations, especially across omics, is still challenging. To address this problem, we developed an interpretable deep learning method, Varmole, to simultaneously reveal genomic functions and mechanisms while predicting phenotype from genotype. In particular, Varmole embeds multi-omic networks into a deep neural network architecture and prioritizes variants, genes and regulatory linkages via biological drop-connect without needing prior feature selection. Availability and implementation Varmole is available as a Python tool on GitHub at https://github.com/daifengwanglab/Varmole. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (S15) ◽  
Author(s):  
Hongda Bu ◽  
Jiaqi Hao ◽  
Yanglan Gan ◽  
Shuigeng Zhou ◽  
Jihong Guan

Abstract Background Super-enhancers (SEs) are clusters of transcriptionally active enhancers, which dictate the expression of genes defining cell identity and play an important role in the development and progression of tumors and other diseases. Many key cancer oncogenes are driven by super-enhancers, and the mutations associated with common diseases such as Alzheimer’s disease are significantly enriched with super-enhancers. Super-enhancers have shown great potential for the identification of key oncogenes and the discovery of disease-associated mutational sites. Results In this paper, we propose a new computational method called DEEPSEN for predicting super-enhancers based on a convolutional neural network. The proposed method integrates 36 kinds of features. Compared with existing approaches, our method performs better and can be used for genome-wide prediction of super-enhancers. In addition, we screen important features for predicting super-enhancers. Conclusion The convolutional neural network is effective in boosting the performance of super-enhancer prediction.


2020 ◽  
Vol 36 (15) ◽  
pp. 4276-4282 ◽  
Author(s):  
Cangzhi Jia ◽  
Yue Bi ◽  
Jinxiang Chen ◽  
André Leier ◽  
Fuyi Li ◽  
...  

Abstract Motivation Different from traditional linear RNAs (containing 5′ and 3′ ends), circular RNAs (circRNAs) are a special type of RNAs that have a closed ring structure. Accumulating evidence has indicated that circRNAs can directly bind proteins and participate in a myriad of different biological processes. Results For identifying the interaction of circRNAs with 37 different types of circRNA-binding proteins (RBPs), we develop an ensemble neural network, termed PASSION, which is based on the concatenated artificial neural network (ANN) and hybrid deep neural network frameworks. Specifically, the input of the ANN is the optimal feature subset for each RBP, which has been selected from six types of feature encoding schemes through incremental feature selection and application of the XGBoost algorithm. In turn, the input of the hybrid deep neural network is a stacked codon-based scheme. Benchmarking experiments indicate that the ensemble neural network reaches the average best area under the curve (AUC) of 0.883 across the 37 circRNA datasets when compared with XGBoost, k-nearest neighbor, support vector machine, random forest, logistic regression and Naive Bayes. Moreover, each of the 37 RBP models is extensively tested by performing independent tests, with the varying sequence similarity thresholds of 0.8, 0.7, 0.6 and 0.5, respectively. The corresponding average AUC obtained are 0.883, 0.876, 0.868 and 0.883, respectively, highlighting the effectiveness and robustness of PASSION. Extensive benchmarking experiments demonstrate that PASSION achieves a competitive performance for identifying binding sites between circRNA and RBPs, when compared with several state-of-the-art methods. Availability and implementation A user-friendly web server of PASSION is publicly accessible at http://flagship.erc.monash.edu/PASSION/. Supplementary information Supplementary data are available at Bioinformatics online.
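Since the benchmarks above are reported as area under the curve (AUC), a short sketch of how AUC can be computed from predicted scores may help. It uses the rank-based definition: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name and the example labels and scores are illustrative assumptions, not data from PASSION.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical binding-site labels (1 = bound) and model scores.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
print(roc_auc(labels, scores))
```

The pairwise loop is quadratic in the number of samples; for large datasets a sort-based computation gives the same value more efficiently.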


2011 ◽  
Vol 2 (1) ◽  
pp. 18-33 ◽  
Author(s):  
Liau Heng Fui ◽  
Dino Isa

Feature selection aims to select an “optimized” subset of features from the original feature set based on a certain objective function. In general, feature selection removes redundant or irrelevant data while retaining classification accuracy. This paper proposes a feature selection algorithm that aims to minimize the area under the detection error trade-off (DET) curve. Particle swarm optimization (PSO) is employed to search for the optimal feature subset. The proposed method is implemented in face recognition and iris recognition systems. The results show that the proposed method is able to find an optimal subset of features that sufficiently describes iris and face images by removing unwanted and redundant features while improving the classification accuracy in terms of total error rate (TER).
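As a rough illustration of PSO-driven feature selection, the sketch below uses the standard binary PSO formulation, in which each particle is a 0/1 mask over features and a sigmoid of the velocity gives the probability of a bit being set. The cost function here is a toy stand-in; the paper's objective is the area under the DET curve, which this sketch does not compute. The parameter values (`w`, `c1`, `c2`, swarm size, iteration count) are assumptions chosen only for the example.

```python
import math
import random


def binary_pso(n_features, cost, n_particles=12, n_iters=40, seed=0):
    """Minimise `cost(mask)` over 0/1 feature masks with binary PSO."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed)

    positions = [[rng.randint(0, 1) for _ in range(n_features)]
                 for _ in range(n_particles)]
    velocities = [[0.0] * n_features for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_cost = [cost(p) for p in positions]
    g = min(range(n_particles), key=personal_cost.__getitem__)
    global_best, global_cost = personal_best[g][:], personal_cost[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(n_features):
                r1, r2 = rng.random(), rng.random()
                v = (w * velocities[i][d]
                     + c1 * r1 * (personal_best[i][d] - positions[i][d])
                     + c2 * r2 * (global_best[d] - positions[i][d]))
                v = max(-6.0, min(6.0, v))  # clamp to keep the sigmoid stable
                velocities[i][d] = v
                prob = 1.0 / (1.0 + math.exp(-v))
                positions[i][d] = 1 if rng.random() < prob else 0
            c = cost(positions[i])
            if c < personal_cost[i]:
                personal_best[i], personal_cost[i] = positions[i][:], c
                if c < global_cost:
                    global_best, global_cost = positions[i][:], c
    return global_best, global_cost


# Toy cost (assumption): missing an informative feature costs 1.0, and
# keeping an uninformative one costs 0.1.
INFORMATIVE = {0, 3, 5}


def toy_cost(mask):
    missed = sum(1 for f in INFORMATIVE if not mask[f])
    extra = sum(bit for f, bit in enumerate(mask) if f not in INFORMATIVE)
    return missed + 0.1 * extra


mask, best = binary_pso(8, toy_cost)
print(mask, best)
```

The search is stochastic, so the returned mask depends on the seed; only the best cost seen across all particles and iterations is guaranteed to be reported.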


Author(s):  
Songhee Cheon ◽  
Jungyoon Kim ◽  
Jihye Lim

The increase in stroke incidence with the aging of the Korean population will rapidly impose an economic burden on society. Timely treatment can improve stroke prognosis. Awareness of stroke warning signs and appropriate actions in the event of a stroke improve outcomes. Medical service use and health behavior data are easier to collect than medical imaging data. Here, we used a deep neural network to detect stroke using medical service use and health behavior data; we identified 15,099 patients with stroke. Principal component analysis (PCA) featuring quantile scaling was used to extract relevant background features from medical records; we used these to predict stroke. We compared our method (a scaled PCA/deep neural network [DNN] approach) to five other machine-learning methods. The area under the curve (AUC) value of our method was 83.48%; hence, it can be used by both patients and doctors to prescreen for possible stroke.


2019 ◽  
Vol 35 (16) ◽  
pp. 2796-2800 ◽  
Author(s):  
Wei Chen ◽  
Hao Lv ◽  
Fulei Nie ◽  
Hao Lin

Abstract Motivation DNA N6-methyladenine (6mA) is associated with a wide range of biological processes. Since the distribution of 6mA sites in the genome is non-random, accurate identification of 6mA sites is crucial for understanding their biological functions. Although experimental methods have been proposed for this purpose, they are still not cost-effective for detecting 6mA sites at genome-wide scope. Therefore, it is desirable to develop computational methods to facilitate the identification of 6mA sites. Results In this study, a computational method called i6mA-Pred was developed to identify 6mA sites in the rice genome, in which the optimal nucleotide chemical properties obtained by using a feature selection technique were used to encode the DNA sequences. i6mA-Pred yielded an accuracy of 83.13% in the jackknife test. Meanwhile, the performance of i6mA-Pred was also superior to that of other methods. Availability and implementation A user-friendly web server, i6mA-Pred, is freely accessible at http://lin-group.cn/server/i6mA-Pred.
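The jackknife test referred to above is leave-one-out cross-validation: each sample is held out in turn, the model is fitted on the rest, and the held-out sample is predicted. The sketch below shows the loop with a 1-nearest-neighbour classifier on Hamming distance as a deliberately simple stand-in; the classifier, the function names and the toy sequences are assumptions and do not reflect the i6mA-Pred model or data.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def nearest_neighbour_predict(train, query):
    """Return the label of the training sequence closest to `query`."""
    return min(train, key=lambda item: hamming(item[0], query))[1]


def jackknife_accuracy(samples, predict):
    """Leave-one-out accuracy over a list of (sequence, label) pairs."""
    correct = 0
    for i, (sequence, label) in enumerate(samples):
        held_in = samples[:i] + samples[i + 1:]  # everything but sample i
        if predict(held_in, sequence) == label:
            correct += 1
    return correct / len(samples)


# Toy DNA windows (assumption): label 1 = modified site, 0 = unmodified.
samples = [
    ("AAAA", 1), ("AAAT", 1), ("AATA", 1),
    ("CCCC", 0), ("CCCG", 0), ("CGCC", 0),
    ("CCCA", 1),  # resembles the negatives, so it is misclassified
]
print(jackknife_accuracy(samples, nearest_neighbour_predict))
```

Because every sample is predicted by a model that never saw it, the jackknife gives a single deterministic accuracy for a given dataset, at the price of one model fit per sample.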

