Distinguish virulent and temperate phage-derived sequences in metavirome data with a deep learning approach
ABSTRACT

Background
Prokaryotic viruses, referred to as phages, can be divided into virulent and temperate phages. Distinguishing virulent and temperate phage-derived sequences in metavirome data is important for understanding their roles in interactions with bacterial hosts and in the regulation of microbial communities. However, there is no experimental or computational approach that effectively classifies sequences of these two types in culture-independent metavirome data. We present DeePhage, a new computational method that can directly and rapidly classify each read or contig as a virulent or temperate phage-derived fragment.

Findings
DeePhage uses a "one-hot" encoding to obtain an overall and detailed representation of DNA sequences. Sequence signatures are detected with a deep learning algorithm, a convolutional neural network, which extracts valuable local features. DeePhage performs better than the most closely related method, PHACTS. In five-fold validation, the accuracy of DeePhage reaches as high as 88%, nearly 30% higher than that of PHACTS. Evaluation on real metavirome data shows that DeePhage annotated 54.4% of reliable contigs, whereas PHACTS annotated 44.5%. When run on the same machine, DeePhage is 810 times faster than PHACTS. In addition, we propose a new strategy for exploring phage transformations in a microbial community by directly detecting temperate viral fragments in metagenome and metavirome data. The detectable transformation of temperate phages provides new insight into potential treatments for human disease.

Conclusions
DeePhage is the first tool that can rapidly and efficiently identify both kinds of phage fragments with satisfactory performance, and it is designed especially for metagenomic analysis. DeePhage is freely available via http://cqb.pku.edu.cn/ZhuLab/DeePhage or https://github.com/shufangwu/DeePhage.
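
To make the "one-hot" encoding and the convolutional extraction of local features more concrete, the following minimal Python sketch may help. It is an illustration only and not the DeePhage implementation: the function names `one_hot_encode` and `conv1d_relu_maxpool`, the A/C/G/T column order, the choice to encode ambiguous bases such as N as all-zero rows, and the hand-built "ATG" filter are all assumptions made for this example. A trained network would learn its filters from data rather than use a fixed motif.

```python
import numpy as np

BASES = "ACGT"  # assumed column order for this sketch


def one_hot_encode(seq):
    """Encode a DNA string as a (len(seq), 4) matrix.

    Assumption: bases outside A/C/G/T (e.g. N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    mat = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        col = index.get(base)
        if col is not None:
            mat[pos, col] = 1.0
    return mat


def conv1d_relu_maxpool(x, kernels):
    """Slide each filter along the sequence, apply ReLU, then global max pooling.

    x:       (L, 4) one-hot matrix
    kernels: (n_filters, k, 4) filter weights
    returns: (n_filters,) one pooled activation per filter
    """
    n_filters, k, _ = kernels.shape
    n_windows = x.shape[0] - k + 1
    # Stack every length-k window of the sequence: shape (n_windows, k, 4).
    windows = np.stack([x[i:i + k] for i in range(n_windows)])
    # Dot product of each window with each filter: shape (n_windows, n_filters).
    scores = np.einsum("wkc,fkc->wf", windows, kernels)
    return np.maximum(scores, 0.0).max(axis=0)


if __name__ == "__main__":
    # A hypothetical filter that responds most strongly to the motif "ATG".
    motif_filter = one_hot_encode("ATG")[None, :, :]
    features = conv1d_relu_maxpool(one_hot_encode("CCATGCC"), motif_filter)
    print(features)
```

In this sketch, a filter responds most strongly wherever the sequence window matches its weights, which is the sense in which a convolutional layer detects local sequence signatures. A full classifier would follow such layers with further layers that map the pooled features to a virulent or temperate score.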