Deep6mA: A deep learning framework for exploring similar patterns in DNA N6-methyladenine sites across different species

2021 ◽  
Vol 17 (2) ◽  
pp. e1008767
Author(s):  
Zutan Li ◽  
Hangjin Jiang ◽  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Kun Lang ◽  
...  

N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. However, existing experimental techniques for detecting 6mA sites are not cost-effective, which creates a strong need for new computational methods. In this paper, we developed a deep learning framework named Deep6mA to identify DNA 6mA sites; it requires neither prior knowledge of 6mA nor manually crafted sequence features, and its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a rice benchmark dataset gives Deep6mA a sensitivity of 92.96% and a specificity of 95.06%, with an overall prediction accuracy of 94%. Importantly, we find that sequences containing 6mA sites share similar patterns across different species. The model trained on rice data predicts the 6mA sites of three other species (Arabidopsis thaliana, Fragaria vesca, and Rosa chinensis) with a prediction accuracy above 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; and (2) 6mA is enriched in the TATA box of the promoter, which may be the main way it regulates downstream gene expression.
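As a reading aid for the figures quoted above, the following minimal Python sketch shows how sensitivity, specificity, and accuracy are derived from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the paper. If the benchmark contains equal numbers of positive and negative samples (an assumption, not stated in the abstract), accuracy is the mean of the two rates, (92.96% + 95.06%) / 2 = 94.01%, which agrees with the reported 94%.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true 6mA sites that are recovered
    specificity = tn / (tn + fp)   # fraction of non-6mA sites correctly rejected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only (not from the paper).
sn, sp, acc = confusion_metrics(tp=90, fn=10, tn=95, fp=5)
print(f"Sn={sn:.2%} Sp={sp:.2%} Acc={acc:.2%}")
```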

2019 ◽  
Author(s):  
Zutan Li ◽  
Hangjin Jiang ◽  
Lingpeng Kong ◽  
Yuanyuan Chen ◽  
Liangyun Zhang ◽  
...  

Abstract N6-methyladenine (6mA) is an important DNA modification associated with a wide range of biological processes. Accurately identifying 6mA sites on a genomic scale is crucial for understanding 6mA’s biological functions. In this paper, we developed a deep learning framework named Deep6mA to identify DNA 6mA sites; it requires neither prior knowledge of 6mA nor manually crafted sequence features, and its performance is superior to that of other DNA 6mA prediction tools. Specifically, 5-fold cross-validation on a rice benchmark dataset gives Deep6mA a sensitivity of 92.96% and a specificity of 95.06%, with an overall prediction accuracy of 94%. Importantly, we find that sequences containing 6mA sites share similar patterns across different species. The model trained on rice data predicts the 6mA sites of three other species (Arabidopsis thaliana, Fragaria vesca, and Rosa chinensis) with a prediction accuracy above 90%. In addition, we find that (1) 6mA tends to occur at GAGG motifs, which suggests that the sequence near the 6mA site may be conserved; and (2) 6mA is enriched in the TATA box of the promoter, which may be the main way it regulates downstream gene expression.


2021 ◽  
Vol 11 (16) ◽  
pp. 7731
Author(s):  
Rao Zeng ◽  
Minghong Liao

DNA methylation is one of the most extensive epigenetic modifications. DNA N6-methyladenine (6mA) plays a key role in many biological regulation processes. Accurate and reliable genome-wide identification of 6mA sites is crucial for systematically understanding its biological functions. Some machine learning tools can identify 6mA sites, but their limited prediction accuracy and lack of robustness restrict their usability in epigenetic studies, which creates a strong need for new computational methods. In this paper, we developed a novel computational predictor, 6mAPred-MSFF, a deep learning framework based on a multi-scale feature fusion mechanism that identifies 6mA sites across different species. In the predictor, we integrate the inverted residual block and a multi-scale attention mechanism to build lightweight, deep neural networks. Compared with existing predictors based on traditional machine learning, our deep learning framework needs no prior knowledge of 6mA or manually crafted sequence features and better captures the characteristics of 6mA sites. In benchmark comparisons, our deep learning method outperforms the state-of-the-art methods in 5-fold cross-validation tests on seven datasets from six species, demonstrating that the proposed 6mAPred-MSFF is more effective and generic. Specifically, 6mAPred-MSFF achieves a sensitivity of 97.88% and a specificity of 94.64% in 5-fold cross-validation on the 6mA-rice-Lv dataset. Our model trained with the rice data predicts well the 6mA sites of five other species: Arabidopsis thaliana, Fragaria vesca, Rosa chinensis, Homo sapiens, and Drosophila melanogaster, with prediction accuracies of 98.51%, 93.02%, and 91.53%, respectively. Moreover, through experimental comparison, we explored how performance changes when the proposed model is trained and tested under different encoding schemes and feature descriptors.
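The abstract above does not say which encoding schemes were compared. As one common way of feeding a raw DNA sequence to a neural network without hand-crafted features, the minimal Python sketch below shows one-hot encoding. Its use here is an assumption for illustration, and the function name is hypothetical.

```python
def one_hot_encode(seq, alphabet="ACGT"):
    """Encode a DNA string as a list of 4-element one-hot rows (illustrative only)."""
    index = {base: i for i, base in enumerate(alphabet)}
    encoded = []
    for base in seq.upper():
        row = [0] * len(alphabet)
        if base in index:          # unknown bases such as N stay all-zero
            row[index[base]] = 1
        encoded.append(row)
    return encoded


# Hypothetical short sequence; real inputs would be fixed-length windows around a candidate site.
for row in one_hot_encode("GAGG"):
    print(row)
```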


2020 ◽  
Vol 21 (15) ◽  
pp. 5222 ◽  
Author(s):  
Xiao-Nan Fan ◽  
Shao-Wu Zhang ◽  
Song-Yao Zhang ◽  
Jin-Jie Ni

Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. In this study, we presented an alignment-free multimodal deep learning framework, lncRNA_Mdeep, to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporated three different input modalities, on which a multimodal deep learning framework was built to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. LncRNA_Mdeep achieved 98.73% prediction accuracy in a 10-fold cross-validation test on human data. Compared with eight other state-of-the-art methods, lncRNA_Mdeep achieved 93.12% prediction accuracy in an independent test on human data, which was 0.94% to 15.41% higher than that of the other eight methods. In addition, the results on 11 cross-species datasets showed that lncRNA_Mdeep is a powerful predictor of lncRNAs.


2020 ◽  
Author(s):  
Xiao-Nan Fan ◽  
Shao-Wu Zhang ◽  
Song-Yao Zhang ◽  
Jin-Jie Ni

Abstract Background: Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. Results: In this study, we present an alignment-free multimodal deep learning framework, lncRNA_Mdeep, to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporates three different input modalities (i.e., the OFH modality, the k-mer modality, and the sequence modality), on which a multimodal deep learning framework is built to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. Conclusions: LncRNA_Mdeep achieves 98.73% prediction accuracy in a 10-fold cross-validation test on human data. Compared with eight other state-of-the-art methods, lncRNA_Mdeep achieves 93.12% prediction accuracy in an independent test on human data, which is 0.94% to 15.41% higher than that of the other eight methods. In addition, the results on 11 cross-species datasets show that lncRNA_Mdeep is a powerful predictor for identifying lncRNAs. The source code can be downloaded from https://github.com/NWPU-903PR/lncRNA_Mdeep.
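The k-mer modality named above is not specified further in the abstract. A generic k-mer frequency vector, as commonly used for alignment-free sequence features, can be sketched in Python as follows. The function name, the choice k = 3, and the mapping of U to T are assumptions for illustration and may differ from what lncRNA_Mdeep actually does.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies in lexicographic k-mer order."""
    seq = seq.upper().replace("U", "T")        # treat RNA input as DNA letters
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)      # ignores k-mers with ambiguous bases
    return [counts[m] / total if total else 0.0 for m in kmers]


# Hypothetical transcript fragment; a 3-mer vector has 4**3 = 64 entries.
features = kmer_frequencies("AUGGCUAGCUAGGAUCC", k=3)
print(len(features), round(sum(features), 6))
```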


Author(s):  
Zhihao Ke ◽  
Xiaoning Liu ◽  
Yining Chen ◽  
Hongfu Shi ◽  
Zigang Deng

Abstract Owing to its self-stability and low energy consumption, high-temperature superconducting (HTS) maglev has the potential to become a novel mode of transportation. Guiding force, a key index for guaranteeing the lateral self-stability of HTS maglev, is strongly nonlinear and is determined by many factors, and these complexities impede further research. Compared with the traditional finite element and polynomial fitting methods, deep learning algorithms could provide another approach to guiding force prediction, but this approach has not yet been verified. Therefore, this paper establishes 5 different neural network models (RBF, DNN, CNN, RNN, LSTM) to predict HTS maglev guiding force and compares their prediction efficiency on 3720 pieces of collected data. In addition, two adaptive iterative algorithms for adjusting the parameter matrix and the learning rate are proposed, which effectively reduce computing time and unnecessary iterations. The results show that the DNN model gives the best goodness of fit, while the LSTM model gives the smoothest fitting curve for guiding force prediction. Based on this finding, the effects of the learning rate and the number of iterations on the prediction accuracy of the constructed DNN model are studied; the highest guiding force prediction accuracy is obtained at a learning rate of 0.00025 and 90000 iterations. Moreover, K-fold cross-validation is applied to this DNN model, and the result demonstrates its generalization and robustness. The necessity of K-fold cross-validation for ensuring the universality of the guiding force prediction model is likewise assessed. This paper is the first to combine HTS maglev guiding force prediction with deep learning algorithms while considering different field cooling heights, real-time magnetic flux density, liquid nitrogen temperature, and the motion direction of the bulk. Additionally, this paper gives a convenient and efficient method for HTS guiding force prediction and parameter optimization.
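K-fold cross-validation, as mentioned above, partitions the data into K disjoint folds and uses each fold once for validation while training on the rest. The minimal Python sketch below shows only the index bookkeeping. The function name, the value K = 5, and the shuffling seed are assumptions, since the abstract does not state them; the sample count 3720 is taken from the abstract.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]  # k disjoint, near-equal folds
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 3720 samples as in the abstract; k=5 is an assumed value.
for fold_id, (train, validation) in enumerate(k_fold_indices(3720, k=5)):
    print(fold_id, len(train), len(validation))
```

Each model would be trained on `train` and scored on `validation`, and the K scores averaged to estimate generalization.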


2021 ◽  
Author(s):  
Quynh C. Pham ◽  
Trung Q. Trinh ◽  
Lesley A. James

Abstract Knowing the minimum miscibility pressure (MMP) between different oil and gas compositions is important for predicting reservoir performance in gas-based injection, whether as a secondary gas flood or as a tertiary technique such as water alternating gas (WAG). Machine Learning (ML) has been used widely and has proven efficient in estimating these properties. In this work, the development of ML, as well as the algorithms commonly used in predicting bubble point pressure and oil formation volume factor, is reviewed. Only a few studies are found before 2000. From 2001 to 2010, the use of ML increased steadily, and a sharp increase in the number of articles is observed from 2011 to the present. Moreover, Artificial Neural Networks (ANN) are the most frequently employed algorithm, with 23 applications in the 38 papers studied. In addition, for the first time, a deep learning algorithm based on multiple fully connected networks is implemented to predict the MMP for oil and gas using 250 data sets covering a wide range of CO2 concentrations, from 0 to 100%, in the injected gas. This wide range of CO2 concentrations covers all modes of gas injection, from a pure CO2 flood to the injection of a sweet gas in which CO2 is negligibly present. The model is then optimized using Early Stopping and K-Fold Cross Validation techniques, the latter reporting the average result over the k data splits. The eight input parameters are as follows: reservoir temperature, oil characteristics (molecular weight, ratio of volatile components, and intermediate components), and gas characteristics (mole percentage of CO2, C1, N2, H2S, C2+). The proposed model is compared with other Machine Learning techniques such as Decision Tree and Random Forest Regression. The results show that reservoir temperature and the amounts of CO2 and C1 in the gas source are the parameters that affect MMP most significantly. The presence of CO2 in the gas stream lowers the MMP significantly. The Deep Learning model obtained an R2 of 0.96 and a Root Mean Square Error (RMSE) of 5.4%. With the Early Stopping technique, the proposed model reaches an R2 of 0.97 in 7 epochs. An R2 value of 0.954 was found using the K-Fold Cross Validation technique with five folds, indicating a good model. The model built with the Deep Learning algorithm was more accurate than those built with Decision Tree and Random Forest Regression, which had R2 values below 0.9 and RMSE larger than 10%. This work goes beyond prior research by adding a ‘stopping point’ concept, increasing the overall performance of the methods for general applications, and considering the full range of CO2 in the gas stream.
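For readers unfamiliar with the two reported metrics, the minimal Python sketch below computes the coefficient of determination (R2) and the root mean square error from paired observations and predictions. The function name and the sample values are hypothetical. The abstract reports RMSE as a percentage, so the authors presumably used a relative form whose normalization is not stated; the version here is the plain RMSE in the units of the target.

```python
import math


def r2_and_rmse(y_true, y_pred):
    """Return (R^2, RMSE) for paired observations and predictions."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical MMP values, for illustration only.
measured = [10.0, 12.5, 15.0, 20.0]
predicted = [10.5, 12.0, 15.5, 19.0]
r2, rmse = r2_and_rmse(measured, predicted)
print(f"R2={r2:.3f} RMSE={rmse:.3f}")
```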


2019 ◽  
Author(s):  
Xiao-Nan Fan ◽  
Shao-Wu Zhang ◽  
Song-Yao Zhang ◽  
Jin-Jie Ni

Abstract Background: Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. Results: In this study, we present an alignment-free multimodal deep learning framework, lncRNA_Mdeep, to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporates three different input modalities (i.e., the OFH modality, the k-mer modality, and the sequence modality), on which a multimodal deep learning framework is built to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. Conclusions: LncRNA_Mdeep achieves 98.73% prediction accuracy in a 10-fold cross-validation test on human data. Compared with eight other state-of-the-art methods, lncRNA_Mdeep achieves 93.12% prediction accuracy in an independent test on human data, which is 0.94% to 15.41% higher than that of the other eight methods. In addition, the results on 11 cross-species datasets show that lncRNA_Mdeep is a powerful predictor for identifying lncRNAs. The source code can be downloaded from https://github.com/NWPU-903PR/lncRNA_Mdeep.


Author(s):  
Nathan Swanson ◽  
Donald Koban ◽  
Patrick Brundage

Abstract Applying Google’s PageRank model to sports is a popular concept in contemporary sports ranking. However, there is limited evidence that rankings generated with PageRank models do well at predicting the winners of playoff series. In this paper, we use a PageRank model to predict the outcomes of the 2008–2016 NHL playoffs. Unlike previous studies that use a uniform personalization vector, we incorporate Corsi statistics into the personalization vector, use nine-fold cross-validation to identify tuning parameters, and evaluate the prediction accuracy of the tuned model. We found that our ratings had 70% accuracy in predicting the outcome of playoff series, outperforming the Colley, Massey, Bradley-Terry, Maher, and Generalized Markov models by 5%. The implication of our results is that fitting parameter values and adding a personalization vector can lead to improved performance when using PageRank models.
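To make the role of the personalization vector concrete, the minimal Python sketch below runs a power iteration for personalized PageRank on a small win/loss graph in which each loser "links" to the team that beat it. The team names, game counts, personalization weights, and the damping value 0.85 are all hypothetical; the paper derives its weights from Corsi statistics and tunes its parameters by cross-validation, and this sketch does not reproduce either step.

```python
def personalized_pagerank(wins, personalization, damping=0.85, iters=100):
    """wins[(loser, winner)] = games won by winner over loser.
    personalization[team] = non-negative teleport weight (normalized here)."""
    teams = sorted(personalization)
    total_p = sum(personalization.values())
    v = {t: personalization[t] / total_p for t in teams}
    out = {t: 0.0 for t in teams}
    for (loser, _winner), w in wins.items():
        out[loser] += w                        # total losses = outgoing weight
    rank = dict(v)
    for _ in range(iters):
        # Teams with no losses have no outgoing edges; their mass follows v.
        dangling = sum(rank[t] for t in teams if out[t] == 0)
        new = {t: (1 - damping) * v[t] + damping * dangling * v[t] for t in teams}
        for (loser, winner), w in wins.items():
            new[winner] += damping * rank[loser] * w / out[loser]
        rank = new
    return rank


# Hypothetical results: A beat B twice, A beat C once, B beat C once.
games = {("B", "A"): 2, ("C", "A"): 1, ("C", "B"): 1}
uniform = {"A": 1.0, "B": 1.0, "C": 1.0}
print(personalized_pagerank(games, uniform))
```

Replacing `uniform` with non-uniform weights shifts the teleport probability toward the favored teams, which is the mechanism the abstract refers to.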


Diagnostics ◽  
2021 ◽  
Vol 11 (10) ◽  
pp. 1875
Author(s):  
Yuchi Tian ◽  
Temitope Emmanuel Komolafe ◽  
Jian Zheng ◽  
Guofeng Zhou ◽  
Tao Chen ◽  
...  

This study assesses whether quantitative integrated deep learning and radiomics features can predict the PD-L1 expression level from preoperative MRI of hepatocellular carcinoma (HCC) patients. The data consist of 103 HCC patients who received immunotherapy at a single center. These patients were divided into a high PD-L1 expression group (30 patients) and a low PD-L1 expression group (73 patients). Both radiomics and deep learning features were extracted from their T2-WI MRI sequences and merged into an integrative feature space for machine learning prediction of PD-L1 expression. Five-fold cross-validation was adopted to validate the performance of the model, and the AUC was used to assess its predictive ability. Based on the five-fold cross-validation, the integrated model achieved the best prediction performance, with an AUC of 0.897 ± 0.084, followed by the deep learning-based model with an AUC of 0.852 ± 0.043 and the radiomics-based model with an AUC of 0.794 ± 0.035. The feature set integrating radiomics and deep learning features is more effective in predicting PD-L1 expression level than either feature type alone. The integrated model can achieve fast and accurate prediction of PD-L1 expression status from preoperative MRI of HCC patients.
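The AUC used above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The minimal Python sketch below computes it by direct pairwise comparison and then summarizes per-fold values as mean ± standard deviation. The function name, labels, scores, and fold AUCs are hypothetical, and the abstract does not say whether its ± values are sample or population standard deviations.

```python
import statistics


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Hypothetical labels (1 = high PD-L1) and model scores.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))

# Hypothetical per-fold AUCs from a five-fold cross-validation.
fold_aucs = [0.91, 0.80, 0.95, 0.88, 0.93]
print(f"{statistics.mean(fold_aucs):.3f} ± {statistics.stdev(fold_aucs):.3f}")
```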


SLEEP ◽  
2020 ◽  
Vol 43 (Supplement_1) ◽  
pp. A236-A236
Author(s):  
A Guillot ◽  
T Moutakanni ◽  
M Harris ◽  
P J Arnal ◽  
V Thorey

Abstract Introduction: Polysomnography (PSG) is the gold standard for diagnosing obstructive sleep apnea (OSA). OSA severity is defined by the apnea-hypopnea index (AHI), the number of apnea and hypopnea events measured per hour of sleep. The Dreem2 headband (DH) is a self-administered, easy-to-use device that measures EEG, breathing frequency, heart rate, and sound at home. In our study, we assessed the performance of the DH in automatically detecting OSA compared with the scoring of PSG by 3 sleep experts. Methods: 41 subjects (8 females, 42.6 ± 13.7 y.o.) with suspected OSA spent a night at home wearing both a PSG and the DH. Each PSG record was scored for apnea and hypopnea events by 3 independent trained sleep experts following AASM guidelines. The deep learning approach DOSED was trained on the DH signals using the manual apnea scoring. 10-fold cross-validation was used to provide predictions for each of the 41 subjects with the DH. Results: We observed an average expert-scored AHI of 13.6 ± 10.1 CI[10.5, 16.5] compared to 12.9 ± 10.3 CI[9.6, 15.8] for the DH. Both the correlation among the 3 scorers (r = 0.88, p < 0.001) and the correlation between the DH and the scorers (r = 0.79, p < 0.001) were significant. The specificity and sensitivity for detecting mild OSA (AHI ≤ 5) were 84.4% and 96.4% for the DH and 86.5% and 86.0% for the scorers. Conclusion: The results show that the DH using deep learning can detect OSA with an accuracy similar to that of the sleep experts. The use of the DH paves the way for longitudinal monitoring of patients with suspected OSA, and its accessibility could lead to better screening of the general population. Support: This study has been supported by Dreem sas.
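The AHI definition given in the abstract (apnea and hypopnea events per hour of sleep) reduces to a single division, sketched below in Python. The function name and the event counts are hypothetical and are not taken from the study.

```python
def apnea_hypopnea_index(n_apneas, n_hypopneas, sleep_hours):
    """Return the AHI: apnea plus hypopnea events per hour of sleep."""
    if sleep_hours <= 0:
        raise ValueError("sleep_hours must be positive")
    return (n_apneas + n_hypopneas) / sleep_hours


# Hypothetical night: 30 apneas and 60 hypopneas over 7.5 hours of sleep.
print(apnea_hypopnea_index(30, 60, 7.5))
```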

