Distinguish virulent and temperate phage-derived sequences in metavirome data with a deep learning approach

Abstract Background Prokaryotic viruses, referred to as phages, can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for understanding their roles in interactions with bacterial hosts and in the regulation of microbial communities. However, there is no experimental or computational approach to effectively classify these two kinds of sequences in culture-independent metavirome data. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage-derived fragment. Findings DeePhage uses a "one-hot" encoding form to give an overall and detailed representation of DNA sequences. Sequence signatures are detected via a deep learning algorithm, namely a convolutional neural network, to extract valuable local features. DeePhage performs better than the most closely related method, PHACTS. The accuracy of DeePhage in five-fold cross-validation reaches as high as 88%, nearly 30% higher than that of PHACTS. Evaluation on real metavirome data shows that DeePhage annotated 54.4% of reliable contigs, while PHACTS annotated 44.5%. When run on the same machine, DeePhage is 810 times faster than PHACTS. In addition, we propose a new strategy to explore phage transformations in the microbial community by directly detecting temperate viral fragments in metagenome and metavirome data. The detectable transformation of temperate phages provides new insight into potential treatments for human disease. Conclusions DeePhage is the first tool that can rapidly and efficiently identify the two kinds of phage fragments, especially for metagenomics analysis, with satisfactory performance. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
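To make the "one-hot" representation concrete, here is a minimal Python sketch of the general technique. It is not DeePhage's code: the column order A, C, G, T and the choice to map any other symbol (such as N) to an all-zero row are illustrative assumptions, and the function name `one_hot` is hypothetical.

```python
from typing import List

# Assumed column order; DeePhage's actual ordering is not stated in the abstract.
BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Return an L x 4 matrix with a single 1 per row for A/C/G/T.

    Assumption: unknown or ambiguous bases (N, IUPAC codes) become all-zero rows.
    """
    matrix = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        idx = BASES.find(base)
        if idx >= 0:  # known nucleotide: set its column to 1
            row[idx] = 1
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    for row in one_hot("ACGTN"):
        print(row)
```

Each row encodes one position of the read, so a read of length L becomes an L x 4 matrix that keeps the full positional detail of the sequence, which is what lets a downstream convolutional layer look for local patterns.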

DeePhage: distinguishing virulent and temperate phage-derived sequences in metavirome data with a deep learning approach

GigaScience ◽

10.1093/gigascience/giab056 ◽

2021 ◽

Vol 10 (9) ◽

Cited By ~ 1

Author(s):

Shufang Wu ◽

Zhencheng Fang ◽

Jie Tan ◽

Mo Li ◽

Chunhui Wang ◽

...

Keyword(s):

DNA Sequences ◽

Cross Validation ◽

Direct Detection ◽

Temperate Phage ◽

Computational Method ◽

New Strategy ◽

Culture Independent ◽

Fold Cross Validation ◽

Insight Into ◽

Metagenomics Analysis

Abstract Background Prokaryotic viruses referred to as phages can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage–derived sequences in metavirome data is important for elucidating their different roles in interactions with bacterial hosts and regulation of microbial communities. However, there is no experimental or computational approach to effectively classify their sequences in culture-independent metavirome. We present a new computational method, DeePhage, which can directly and rapidly judge each read or contig as a virulent or temperate phage–derived fragment. Findings DeePhage uses a "one-hot" encoding form to represent DNA sequences in detail. Sequence signatures are detected via a convolutional neural network to obtain valuable local features. The accuracy of DeePhage in 5-fold cross-validation reaches as high as 89%, nearly 10% and 30% higher than those of 2 similar tools, PhagePred and PHACTS. On real metavirome data, DeePhage correctly predicts the highest proportion of contigs when BLAST annotations are used as the reference, without apparent preferences. In addition, DeePhage runs 245 and 810 times faster than PhagePred and PHACTS, respectively, under the same computational configuration. By directly detecting temperate viral fragments in metagenome and metavirome data, we further propose a new strategy to explore phage transformations in the microbial community. The ability to detect such transformations provides new insight into potential treatments for human disease. Conclusions DeePhage is a novel tool developed to rapidly and efficiently identify 2 kinds of phage fragments, especially for metagenomics analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
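The idea of a convolutional filter extracting a local feature from a one-hot encoded sequence can be sketched in plain Python as below. This is an illustration under stated assumptions, not DeePhage's architecture: one hand-set filter built from the hypothetical motif "TTA", stride 1, no padding, a ReLU, and global max pooling. In a trained CNN, many filters are used and their weights are learned from data. All function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """L x 4 one-hot matrix (A, C, G, T); other bases become all-zero rows."""
    return [[1.0 if b == base else 0.0 for b in "ACGT"] for base in seq.upper()]


def conv1d_relu(x: Matrix, kernel: Matrix, bias: float = 0.0) -> List[float]:
    """Valid 1D convolution (cross-correlation, as in CNN layers; stride 1,
    no padding) over an L x 4 input, followed by ReLU."""
    width = len(kernel)
    out = []
    for start in range(len(x) - width + 1):
        s = bias
        for offset in range(width):
            for ch in range(4):
                s += x[start + offset][ch] * kernel[offset][ch]
        out.append(max(0.0, s))  # ReLU keeps only positive responses
    return out


def global_max_pool(activations: List[float]) -> float:
    """Keep the strongest response of the filter anywhere along the sequence."""
    return max(activations) if activations else 0.0


if __name__ == "__main__":
    kernel = one_hot("TTA")  # hypothetical hand-set filter rewarding "TTA"
    acts = conv1d_relu(one_hot("GATTACA"), kernel, bias=-2.0)
    print(acts)
    print(global_max_pool(acts))
```

For the read GATTACA, only the window that matches all three filter positions survives the ReLU, so global max pooling reports the filter's best local match regardless of where the motif occurs in the read.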

DNA sequences performs as natural language processing by exploiting deep learning algorithm for the identification of N4-methylcytosine

Scientific Reports ◽

10.1038/s41598-020-80430-x ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Abdul Wahab ◽

Hilal Tayara ◽

Zhenyu Xuan ◽

Kil To Chong

Keyword(s):

Deep Learning ◽

Language Processing ◽

DNA Sequences ◽

Area Under Curve ◽

Cross Validation ◽

Learning Algorithm ◽

State Of The Art ◽

Deep Learning Algorithm ◽

Fold Cross Validation ◽

Genome Dataset

Abstract N4-methylcytosine (4mC) is a biochemical modification of DNA that affects genetic processes such as gene expression, genomic imprinting, chromosome stability, and cell development without altering the DNA nucleotide sequence. In the proposed work, a computational model, 4mCNLP-Deep, uses a word embedding approach for vector formulation together with a deep learning-based CNN algorithm to predict 4mC and non-4mC sites in the C. elegans genome dataset. A range of experimental settings, such as the k-mer size of the corpus and the number of cross-validation folds, was explored to obtain the best-performing configuration. 4mCNLP-Deep outperforms the state-of-the-art predictor on five evaluation metrics: accuracy (ACC) of 0.9354, Matthews correlation coefficient (MCC) of 0.8608, specificity (Sp) of 0.89.96, sensitivity (Sn) of 0.9563, and area under the curve (AUC) of 0.9731, using a 3-mer word2vec corpus and 3-fold cross-validation, with improvements of 1.1%, 0.6%, 0.58%, 0.77%, and 4.89%, respectively. Finally, we developed an online web server, http://nsclbio.jbnu.ac.kr/tools/4mCNLP-Deep/, so that experimental researchers can obtain results easily.
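Two ingredients named in this abstract, a k-mer "word" corpus and the ACC/Sn/Sp/MCC metrics, can be sketched as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, the k-mer words would then be passed to a word-embedding model such as word2vec (not shown), and the metrics use the standard confusion-matrix definitions. Returning an MCC of 0 when its denominator is zero is a common convention assumed here.

```python
import math
from typing import Dict, List


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers (stride 1): the 'words' of a corpus."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def metrics(tp: int, tn: int, fp: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from a confusion matrix.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn)  # sensitivity: recall on positive (4mC) sites
    sp = tn / (tn + fp)  # specificity: recall on negative (non-4mC) sites
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed convention for denom == 0
    return {"ACC": acc, "Sn": sn, "Sp": sp, "MCC": mcc}


if __name__ == "__main__":
    print(kmer_words("ACGTAC"))  # ['ACG', 'CGT', 'GTA', 'TAC']
    print(metrics(tp=40, tn=45, fp=5, fn=10))
```

Because Sn and Sp each look at only one class, while MCC combines all four confusion-matrix cells, MCC is usually the stricter single summary when the numbers of positive and negative sites differ.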

Direct Detection of Pixel-Level Myocardial Infarction Areas via a Deep-Learning Algorithm

Medical Image Computing and Computer Assisted Intervention − MICCAI 2017 - Lecture Notes in Computer Science ◽

10.1007/978-3-319-66179-7_28 ◽

2017 ◽

pp. 240-249 ◽

Cited By ~ 9

Author(s):

Chenchu Xu ◽

Lei Xu ◽

Zhifan Gao ◽

Shen Zhao ◽

Heye Zhang ◽

...

Keyword(s):

Myocardial Infarction ◽

Deep Learning ◽

Learning Algorithm ◽

Direct Detection ◽

Deep Learning Algorithm

Predicting Protein Interactions Using a Deep Learning Method-Stacked Sparse Autoencoder Combined with a Probabilistic Classification Vector Machine

Complexity ◽

10.1155/2018/4216813 ◽

2018 ◽

Vol 2018 ◽

pp. 1-12 ◽

Cited By ~ 11

Author(s):

Yanbin Wang ◽

Zhuhong You ◽

Liping Li ◽

Li Cheng ◽

Xi Zhou ◽

...

Keyword(s):

Deep Learning ◽

Protein Interactions ◽

Learning Algorithm ◽

Computational Method ◽

Support Vector ◽

Sequence Information ◽

Deep Learning Algorithm ◽

Probabilistic Classification ◽

Sparse Autoencoder ◽

Stacked Sparse Autoencoder

Protein-protein interactions (PPIs), as an important molecular process within cells, are of pivotal importance to the biochemical function of cells. Although high-throughput experimental techniques have matured, enabling researchers to detect large numbers of PPIs, these techniques have unavoidable disadvantages, such as high cost and being time-consuming. Recent studies have demonstrated that PPIs can be efficiently detected by computational methods. Therefore, in this study, we propose a novel computational method to predict PPIs using only protein sequence information. This method combines a deep learning algorithm, the stacked sparse autoencoder (SSAE), with a Legendre moment (LM) feature extraction technique. A probabilistic classification vector machine (PCVM) classifier is then used to implement PPI prediction. The proposed method was evaluated on human, unbalanced-human, H. pylori, and S. cerevisiae datasets with 5-fold cross-validation and yielded high predictive accuracies of 98.58%, 97.71%, 93.76%, and 96.55%, respectively. To further evaluate its performance, we compared it with a support vector machine (SVM)-based method. The experimental results indicate that the PCVM-based method is clearly preferable to the SVM-based method. These results show that the proposed method is practical, effective, and robust.
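A 5-fold cross-validation split like the one used in this evaluation can be sketched as follows. This is a generic, non-stratified k-fold split that assumes a single shuffle with a fixed seed; the function name `k_fold_splits` is hypothetical, and the authors' exact splitting procedure is not described in the abstract.

```python
import random
from typing import List, Tuple


def k_fold_splits(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 once, cut them into k near-equal folds, and
    return (train, test) index lists in which each fold is the test set once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # fixed seed makes the split reproducible
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most 1
    splits = []
    for i, test in enumerate(folds):
        # training set = every index that is not in the held-out fold
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    for train, test in k_fold_splits(12, k=5):
        print(len(train), len(test))
```

A model is trained on each `train` list and scored on the matching `test` list; the reported cross-validation accuracy is then typically the mean of the k per-fold accuracies. For imbalanced data such as the unbalanced-human set, a stratified split that preserves class ratios in every fold is a common alternative.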