Deep learning based DNA:RNA triplex forming potential prediction

2020 ◽  
Author(s):  
Yu ZHANG ◽  
Yahui Long ◽  
Chee Keong Kwoh

Abstract Background: Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from the base-pairing rules. However, these methods have two main limitations: i) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs suggests that not all of them form triplexes in practice, and ii) their predictions consider only the theoretical relationship and lack features drawn from experimentally verified data. Results: In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by convolutional neural networks. In 5-fold cross-validation, the average areas under the ROC and precision-recall (PRC) curves are 0.9649 and 0.9996 for the triplex-forming lncRNA dataset after redundancy removal at a threshold of 0.8, and 0.8705 and 0.9671 for triplex DNA site prediction, respectively. We also briefly summarise the cis and trans targeting of triplex-forming lncRNAs. Conclusions: TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, as well as the potential of a DNA site to form a triplex. It may provide insights into the exploration of lncRNA functions.
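The cross-validation figures quoted above are fold-averaged areas under the ROC and precision-recall curves. The sketch below shows one common way such averages can be computed from per-fold labels and classifier scores. It is an illustration only, not the TriplexFPP code: the function names (`roc_auc`, `average_precision`, `cross_validated_means`) and the toy scores are hypothetical, and the step-wise average-precision estimate is an assumption about how the PRC area might be summarised.

```python
from statistics import mean


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    # Fraction of positive/negative pairs ranked correctly; ties count as half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision)."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    # Rank examples from highest to lowest score (tied scores keep input order).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def cross_validated_means(folds):
    """Average both metrics over folds given as (labels, scores) pairs."""
    aucs = [roc_auc(y, s) for y, s in folds]
    aps = [average_precision(y, s) for y, s in folds]
    return mean(aucs), mean(aps)


if __name__ == "__main__":
    # Toy held-out predictions for two folds (hypothetical values).
    toy_folds = [
        ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),
        ([0, 1], [0.2, 0.9]),
    ]
    print(cross_validated_means(toy_folds))
```

With a heavily imbalanced positive class, the PRC area can sit close to 1 even when the ROC area is lower, so the two numbers are best read together.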

2020 ◽  
Vol 21 (1) ◽  
Author(s):  
Yu Zhang ◽  
Yahui Long ◽  
Chee Keong Kwoh

Abstract Background Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from the base-pairing rules. However, these methods have two main limitations: (1) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs suggests that not all of them form triplexes in practice, and (2) their predictions consider only the theoretical relationship and lack features drawn from experimentally verified data. Results In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by convolutional neural networks. In fivefold cross-validation, the average areas under the ROC and precision-recall (PRC) curves are 0.9649 and 0.9996 for the triplex-forming lncRNA dataset after redundancy removal at a threshold of 0.8, and 0.8705 and 0.9671 for triplex DNA site prediction, respectively. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs. Conclusions TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, as well as the potential of a DNA site to form a triplex. It may provide insights into the exploration of lncRNA functions.


2020 ◽  
Author(s):  
Yu ZHANG ◽  
Yahui Long ◽  
Chee Keong Kwoh

Abstract Background Long non-coding RNAs (lncRNAs) can exert their functions by forming triplexes with DNA. Current methods for predicting triplex formation rely mainly on mathematical statistics derived from the base-pairing rules. However, these methods have two main limitations: i) they identify a large number of triplex-forming lncRNAs, but the limited number of experimentally verified triplex-forming lncRNAs suggests that not all of them can form triplexes in practice, and ii) their predictions consider only the theoretical relationship and lack features drawn from experimentally verified data. Results In this work, we develop an integrated program named TriplexFPP (Triplex Forming Potential Prediction), which is the first machine learning model for DNA:RNA triplex prediction. TriplexFPP predicts the most likely triplex-forming lncRNAs and DNA sites based on experimentally verified data, with high-level features learned by deep neural networks. In 5-fold cross-validation, the average areas under the ROC and precision-recall (PRC) curves are 0.9949 and 0.9999 for triplex-forming lncRNA prediction and 0.8775 and 0.9692 for DNA site prediction, respectively. We also briefly summarize the cis and trans targeting of triplex-forming lncRNAs. Conclusions TriplexFPP can predict the most likely triplex-forming lncRNAs among all lncRNAs with computationally defined triplex-forming capacity, and can predict the potential of a DNA site to form a triplex. It may provide insights into the exploration of lncRNA functions.


2020 ◽  
Vol 48 (5) ◽  
pp. 030006052091922
Author(s):  
Qiao Yang ◽  
Xian Zhong Jiang ◽  
Yong Fen Zhu ◽  
Fang Fang Lv

Objective We aimed to analyze the risk factors for bloodstream infections (BSI) in patients with cirrhosis and to establish a predictive tool for their occurrence. Methods A total of 2888 patients with cirrhosis were retrospectively included. Risk factors for BSI were assessed by multivariate logistic regression, and the model was validated using five-fold cross-validation. Results Variables independently associated with a higher incidence of BSI were white blood cell count (odds ratio [OR] = 1.094, 95% confidence interval [CI] 1.063–1.127), C-reactive protein (OR = 1.005, 95% CI 1.002–1.008), total bilirubin (OR = 1.003, 95% CI 1.002–1.004), and previous antimicrobial exposure (OR = 4.556, 95% CI 3.369–6.160); albumin (OR = 0.904, 95% CI 0.883–0.926), platelet count (OR = 0.996, 95% CI 0.994–0.998), and serum creatinine (OR = 0.989, 95% CI 0.985–0.994) were associated with lower odds of BSI. The area under the receiver operating characteristic (ROC) curve of the risk assessment scale was 0.850, and its sensitivity and specificity were 0.762 and 0.801, respectively. There was no significant difference between the ROC curves of cross-validation and risk assessment. Conclusions We developed a predictive tool for BSI in patients with cirrhosis, which could help with early identification of such episodes at admission and improve outcomes in these patients.
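Odds ratios such as those reported above are per-unit multipliers on the odds in a logistic regression model, so an odds ratio close to 1 can still matter over a wide measurement range. The sketch below illustrates this arithmetic. It assumes each odds ratio refers to a one-unit increase in the variable (the abstract does not state the units), and the function names and the 20% baseline probability are hypothetical choices for illustration.

```python
import math


def odds_multiplier(odds_ratio_per_unit, delta):
    """Multiplicative change in odds for a change of `delta` units.

    In a logistic model the log-odds are linear in the predictor, so
    per-unit odds ratios compound multiplicatively: OR ** delta.
    """
    return math.exp(math.log(odds_ratio_per_unit) * delta)


def apply_odds_multiplier(baseline_probability, multiplier):
    """Convert a baseline probability to odds, scale it, and convert back."""
    odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = odds * multiplier
    return new_odds / (1.0 + new_odds)


if __name__ == "__main__":
    # A 10-unit rise in a variable with OR = 1.094 per unit (value from the abstract).
    print(odds_multiplier(1.094, 10))
    # Effect of OR = 4.556 on a hypothetical 20% baseline probability.
    print(apply_odds_multiplier(0.20, 4.556))
```

The same conversion explains why odds ratios below 1, such as the one reported for albumin, translate into a lower predicted probability as the variable increases.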


2020 ◽  
Vol 21 (15) ◽  
pp. 5222 ◽  
Author(s):  
Xiao-Nan Fan ◽  
Shao-Wu Zhang ◽  
Song-Yao Zhang ◽  
Jin-Jie Ni

Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. In this study, we presented an alignment-free multimodal deep learning framework (namely lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporated three different input modalities, on which a multimodal deep learning framework was built to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. LncRNA_Mdeep achieved 98.73% prediction accuracy in a 10-fold cross-validation test on human data. In an independent test on human data, lncRNA_Mdeep achieved 93.12% prediction accuracy, which was 0.94%~15.41% higher than that of eight other state-of-the-art methods. In addition, the results on 11 cross-species datasets showed that lncRNA_Mdeep was a powerful predictor of lncRNAs.


Author(s):  
Yuhong Huang ◽  
Wenben Chen ◽  
Xiaoling Zhang ◽  
Shaofu He ◽  
Nan Shao ◽  
...  

Aim: After neoadjuvant chemotherapy (NACT), tumor shrinkage pattern is a more reasonable outcome than pathological complete response (pCR) for deciding on possible breast-conserving surgery (BCS). The aim of this article was to establish a machine learning model combining radiomics features from multiparametric MRI (mpMRI) and clinicopathologic characteristics for early prediction of tumor shrinkage pattern prior to NACT in breast cancer. Materials and Methods: This study included 199 patients with breast cancer who successfully completed NACT and subsequently underwent breast surgery. For each patient, 4,198 radiomics features were extracted from the segmented 3D regions of interest (ROI) in mpMRI sequences such as T1-weighted dynamic contrast-enhanced imaging (T1-DCE), fat-suppressed T2-weighted imaging (T2WI), and apparent diffusion coefficient (ADC) map. Feature selection and supervised machine learning algorithms were used to identify the predictors correlated with tumor shrinkage pattern as follows: (1) reducing the feature dimension by using ANOVA and the least absolute shrinkage and selection operator (LASSO) with 10-fold cross-validation, (2) splitting the dataset into a training dataset and a testing dataset, and constructing prediction models using 12 classification algorithms, and (3) assessing model performance through the area under the curve (AUC), accuracy, sensitivity, and specificity. We also compared the most discriminative model in different molecular subtypes of breast cancer. Results: The Multilayer Perceptron (MLP) neural network achieved higher AUC and accuracy than the other classifiers. The radiomics model achieved a mean AUC of 0.975 (accuracy = 0.912) on the training dataset and 0.900 (accuracy = 0.828) on the testing dataset with 30-round 6-fold cross-validation. When incorporating clinicopathologic characteristics, the mean AUC was 0.985 (accuracy = 0.930) on the training dataset and 0.939 (accuracy = 0.870) on the testing dataset. The model further achieved good AUC on the testing dataset with 30-round 5-fold cross-validation in three molecular subtypes of breast cancer as follows: (1) HR+/HER2–: 0.901 (accuracy = 0.816), (2) HER2+: 0.940 (accuracy = 0.865), and (3) TN: 0.837 (accuracy = 0.811). Conclusions: Our machine learning model combining radiomics features and clinical characteristics could provide a potential tool to predict tumor shrinkage patterns prior to NACT. Our prediction model will be valuable in guiding NACT and surgical treatment in breast cancer.


2020 ◽  
Vol 8 (2) ◽  
Author(s):  
Yohana Tri Utami ◽  
Dewi Asiah Shofiana ◽  
Yunda Heningtyas

Telecommunication industries are experiencing substantial problems related to customer migration, driven by a large number of competing companies, dynamic circumstances, and the presence of many innovative and attractive offerings. The resulting high level of customer migration reduces company revenue. In this situation, customer churn analysis is a well-known approach that can help increase a company's revenue and reputation. To predict the reasons behind customer migration, this study proposed a data mining classification technique based on the C4.5 algorithm. The model was evaluated using 10-fold cross-validation, yielding an accuracy of 87%, a precision of 87.5%, and a recall of 97%. Based on this performance, it can be stated that the C4.5 algorithm succeeded in discovering several causes of the migration of telecommunication users, with price being the primary reason.
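The accuracy, precision, and recall quoted above are all derived from the confusion matrix of a binary churn classifier. The following sketch shows how these three metrics relate to the counts of true and false positives and negatives. It is a generic illustration with hypothetical labels and a hypothetical function name, not the evaluation code used in the study.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the customers predicted to churn, how many actually churned.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the customers who actually churned, how many were caught.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical labels: 1 = churned, 0 = stayed.
    actual = [1, 1, 1, 0, 0]
    predicted = [1, 1, 0, 1, 0]
    print(classification_metrics(actual, predicted))
```

A recall well above precision, as in the reported results, indicates a model that misses few churners at the cost of some false alarms.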


Author(s):  
Pierre O. Jacquet ◽  
Farid Pazhoohi ◽  
Charles Findling ◽  
Hugo Mell ◽  
Coralie Chevallier ◽  
...  

Abstract Why do moral religions exist? An influential psychological explanation is that religious beliefs in supernatural punishment are a cultural group adaptation that enhances prosocial attitudes and thereby large-scale cooperation. An alternative explanation is that religiosity is an individual strategy that results from high levels of mistrust and the need for individuals to control others’ behaviors through moralizing. Existing evidence is mixed, but most studies are limited by sample size and generalizability issues. The present study overcomes these limitations by applying k-fold cross-validation to multivariate modeling of data from >295,000 individuals in 108 countries of the World Values Survey and the European Values Study. First, this methodology reveals no evidence that European and non-European religious people invest more in collective actions or are more trustful of unrelated conspecifics. Instead, individuals’ level of religiosity is found to be weakly but positively associated with social mistrust and negatively associated with the production of behaviors that benefit unrelated members of the large-scale community. Second, our models show that individual variation in religiosity is well explained by the interaction of increased levels of social mistrust and increased needs to moralize other people’s sexual behaviors. Finally, stratified k-fold cross-validation demonstrates that the structures of these association patterns are robust to sampling variability and reliable enough to generalize to out-of-sample data.
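Stratified k-fold cross-validation, mentioned above, partitions the data so that every fold keeps roughly the same class (or stratum) proportions as the full sample. The sketch below is a minimal standard-library illustration of the idea. The function name, the seed parameter, and the round-robin assignment are assumptions made for the example and do not describe the study's actual pipeline.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Return k (train_indices, test_indices) pairs with balanced strata."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    # Shuffle within each stratum, then deal indices round-robin into folds
    # so that each fold receives a near-equal share of every stratum.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)

    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    # Hypothetical sample: 10 members of stratum 0 and 5 of stratum 1.
    strata = [0] * 10 + [1] * 5
    for train, test in stratified_k_fold(strata, k=5, seed=1):
        print(test)
```

Each observation appears in exactly one test fold, so out-of-sample performance can be estimated on the whole dataset without reusing training data for testing.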


2020 ◽  
Author(s):  
Xiao-Nan Fan ◽  
Shao-Wu Zhang ◽  
Song-Yao Zhang ◽  
Jin-Jie Ni

Abstract Background: Long non-coding RNAs (lncRNAs) play crucial roles in diverse biological processes and complex human diseases. Distinguishing lncRNAs from protein-coding transcripts is a fundamental step in analyzing lncRNA functional mechanisms. However, the experimental identification of lncRNAs is expensive and time-consuming. Results: In this study, we present an alignment-free multimodal deep learning framework (namely lncRNA_Mdeep) to distinguish lncRNAs from protein-coding transcripts. LncRNA_Mdeep incorporates three different input modalities (i.e. OFH modality, k-mer modality, and sequence modality), on which a multimodal deep learning framework is built to learn high-level abstract representations and predict the probability that a transcript is an lncRNA. Conclusions: LncRNA_Mdeep achieves 98.73% prediction accuracy in a 10-fold cross-validation test on human data. In an independent test on human data, lncRNA_Mdeep achieves 93.12% prediction accuracy, which is 0.94%~15.41% higher than that of eight other state-of-the-art methods. In addition, the results on 11 cross-species datasets show that lncRNA_Mdeep is a powerful predictor for identifying lncRNAs. The source code can be downloaded from https://github.com/NWPU-903PR/lncRNA_Mdeep.
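A k-mer modality, as named above, typically represents a transcript by the frequencies of its length-k subsequences, which requires no sequence alignment. The sketch below shows one simple way to build such a feature vector. The value of k, the normalisation, and the handling of ambiguous bases are assumptions for illustration; the actual lncRNA_Mdeep encoding may differ and should be checked against its published source code.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Map every possible k-mer to its relative frequency in `sequence`."""
    # Treat RNA and DNA alike by mapping U to T (an assumption of this sketch).
    seq = sequence.upper().replace("U", "T")
    counts = {"".join(letters): 0 for letters in product(alphabet, repeat=k)}
    total = 0
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return {kmer: 0.0 for kmer in counts}
    return {kmer: count / total for kmer, count in counts.items()}


if __name__ == "__main__":
    # Hypothetical short transcript fragment.
    features = kmer_frequencies("ACGUACGU", k=2)
    print(len(features), features["AC"])
```

The resulting dictionary has 4**k entries in a fixed order, so its values can be used directly as a fixed-length input vector regardless of transcript length.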


2020 ◽  
Vol 21 (16) ◽  
pp. 5710
Author(s):  
Xiao Wang ◽  
Yinping Jin ◽  
Qiuwen Zhang

Mitochondrial proteins are physiologically active in different compartments, and their abnormal location can trigger the pathogenesis of human mitochondrial pathologies. Correctly identifying submitochondrial locations can provide information for disease pathogenesis and drug design. A mitochondrion has four submitochondrial compartments, the matrix, the outer membrane, the inner membrane, and the intermembrane space, but many existing studies ignored the intermembrane space. The majority of researchers used traditional machine learning methods for predicting mitochondrial protein localization. Those predictors required expert-level knowledge of biology to be encoded as features rather than allowing the underlying predictor to extract features through a data-driven procedure. Besides, few researchers have considered the imbalance in datasets. In this paper, we propose a novel end-to-end predictor employing deep neural networks, DeepPred-SubMito, for protein submitochondrial location prediction. First, we utilize random over-sampling to decrease the influence of unbalanced datasets. Next, we train a multi-channel bilayer convolutional neural network on multiple subsequences to learn high-level features. Third, the prediction result is output through the fully connected layer. The performance of the predictor is measured by 10-fold cross-validation and 5-fold cross-validation on the SM424-18 dataset and the SubMitoPred dataset, respectively. Experimental results show that the predictor outperforms state-of-the-art predictors. In addition, the prediction results on the M983 dataset also confirm its effectiveness in predicting submitochondrial locations.
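Random over-sampling, the first step described above, balances a dataset by duplicating randomly chosen examples of the minority classes until every class matches the largest one. The sketch below illustrates the idea with the standard library only. The function name, the seed parameter, and the toy class sizes are hypothetical, and this is not the DeepPred-SubMito implementation.

```python
import random
from collections import defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    target = max(len(items) for items in by_class.values())
    balanced_samples, balanced_labels = [], []
    for label, items in by_class.items():
        # Keep every original sample, then draw extras with replacement.
        extras = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extras:
            balanced_samples.append(sample)
            balanced_labels.append(label)
    return balanced_samples, balanced_labels


if __name__ == "__main__":
    # Hypothetical protein identifiers with imbalanced compartment labels.
    proteins = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7"]
    compartments = ["matrix"] * 5 + ["inner membrane"] * 2 + ["outer membrane"]
    xs, ys = random_oversample(proteins, compartments, seed=42)
    print(len(xs), ys.count("outer membrane"))
```

When over-sampling is combined with cross-validation, it is usually applied to the training folds only, since duplicating samples before splitting can place copies of the same example in both training and test data.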

