scholarly journals Identifying viruses from metagenomic data using deep learning

2020 ◽  
Vol 8 (1) ◽  
pp. 64-77 ◽  
Author(s):  
Jie Ren ◽  
Kai Song ◽  
Chao Deng ◽  
Nathan A. Ahlgren ◽  
Jed A. Fuhrman ◽  
...  
2018 ◽  
Vol 19 (S7) ◽  
Author(s):  
Antonino Fiannaca ◽  
Laura La Paglia ◽  
Massimo La Rosa ◽  
Giosue’ Lo Bosco ◽  
Giovanni Renda ◽  
...  

Microbiome ◽  
2018 ◽  
Vol 6 (1) ◽  
Author(s):  
Gustavo Arango-Argoty ◽  
Emily Garner ◽  
Amy Pruden ◽  
Lenwood S. Heath ◽  
Peter Vikesland ◽  
...  

2021 ◽  
Vol 1 (3) ◽  
pp. 138-165
Author(s):  
Thomas Krause ◽  
Jyotsna Talreja Wassan ◽  
Paul Mc Kevitt ◽  
Haiying Wang ◽  
Huiru Zheng ◽  
...  

Metagenomics promises to provide new valuable insights into the role of microbiomes in eukaryotic hosts such as humans. Due to the decreasing costs for sequencing, public and private repositories for human metagenomic datasets are growing fast. Metagenomic datasets can contain terabytes of raw data, which is a challenge for data processing but also an opportunity for advanced machine learning methods like deep learning that require large datasets. However, in contrast to classical machine learning algorithms, the use of deep learning in metagenomics is still an exception. Regardless of the algorithms used, they are usually not applied to raw data but require several preprocessing steps. Performing this preprocessing and the actual analysis in an automated, reproducible, and scalable way is another challenge. This and other challenges can be addressed by adjusting known big data methods and architectures to the needs of microbiome analysis and DNA sequence processing. A conceptual architecture for the use of machine learning and big data on metagenomic data sets was recently presented and initially validated to analyze the rumen microbiome. The same architecture can be used for clinical purposes as is discussed in this paper.


2017 ◽  
Author(s):  
G. A. Arango-Argoty ◽  
E. Garner ◽  
A. Pruden ◽  
L. S. Heath ◽  
P. Vikesland ◽  
...  

ABSTRACTGrowing concerns regarding increasing rates of antibiotic resistance call for global monitoring efforts. Monitoring of environmental media (e.g., wastewater, agricultural waste, food, and water) is of particular interest as these media can serve as sources of potential novel antibiotic resistance genes (ARGs), as hot spots for ARG exchange, and as pathways for the spread of ARGs and human exposure. Next-generation sequence-based monitoring has recently enabled direct access and profiling of the total metagenomic DNA pool, where ARGs are identified or predicted based on the “best hits” of homology searches against existing databases. Unfortunately, this approach tends to produce high rates of false negatives. To address such limitations, we propose here a deep leaning approach, taking into account a dissimilarity matrix created using all known categories of ARGs. Two models, deepARG-SS and deepARG-LS, were constructed for short read sequences and full gene length sequences, respectively. Performance evaluation of the deep learning models over 30 classes of antibiotics demonstrates that the deepARG models can predict ARGs with both high precision (>0.97) and recall (>0.90) for most of the antibiotic resistance categories. The models show advantage over the traditional best hit approach by having consistently much lower false negative rates and thus higher overall recall (>0.9). As more data become available for under-represented antibiotic resistance categories, the deepARG models’ performance can be expected to be further enhanced due to the nature of the underlying neural networks. The deepARG models are available both in command line version and via a Web server at http://bench.cs.vt.edu/deeparg. Our newly developed ARG database, deepARG-DB, containing predicted ARGs with high confidence and high degree of manual curation, greatly expands the current ARG repository. DeepARG-DB can be downloaded freely to benefit community research and future development of antibiotic resistance-related resources.AbbreviationsARGantibiotic resistance gene


2021 ◽  
Author(s):  
Jie Tan ◽  
Zhencheng Fang ◽  
Shufang Wu ◽  
Qian Guo ◽  
Xiaoqing Jiang ◽  
...  

AbstractPhages - viruses that infect bacteria and archaea - are dominant in the virosphere and play an important role in the microbial community. It is very important to identify the host of a given phage fragment from metavriome data for understanding the ecological impact of phage in a microbial community. State-of-the-art tools for host identification only present reliable results on long sequences within a narrow candidate host range, while there are a large number of short fragments in real metagenomic data and the taxonomic composition of a microbial community is often complicated. Here, we present a method, named HoPhage, to identify the host of a given phage fragment from metavirome data at the genus level. HoPhage integrates two modules using the deep learning algorithms and the Markov chain model, respectively. By testing on both the artificial benchmark dataset of phage contigs and the real virome data, HoPhage demonstrates a satisfactory performance on short fragments within a wide candidate host range at every taxonomic level. HoPhage is freely available at http://cqb.pku.edu.cn/ZhuLab/HoPhage/.


2021 ◽  
Author(s):  
Michał Karlicki ◽  
Stanisław Antonowicz ◽  
Anna Karnkowska

AbstractMotivationWith a large number of metagenomic datasets becoming available, the eukaryotic metagenomics emerged as a new challenge. The proper classification of eukaryotic nuclear and organellar genomes is an essential step towards the better understanding of eukaryotic diversity.ResultsWe developed Tiara, a deep-learning-based approach for identification of eukaryotic sequences in the metagenomic data sets. Its two-step classification process enables the classification of nuclear and organellar eukaryotic fractions and subsequently divides organellar sequences to plastidial and mitochondrial. Using test dataset, we have shown that Tiara performs similarly to EukRep for prokaryotes classification and outperformed it for eukaryotes classification with lower calculation time. Tiara is also the only available tool correctly classifying organellar sequences.Availability and implementationTiara is implemented in python 3.8, available at https://github.com/ibe-uw/tiara and tested on Unix-based systems. It is released under an open-source MIT license and documentation is available at https://ibe-uw.github.io/tiara. Version 1.0.1 of Tiara has been used for all benchmarks.


2020 ◽  
Vol 36 (10) ◽  
pp. 3239-3241 ◽  
Author(s):  
Zhencheng Fang ◽  
Jie Tan ◽  
Shufang Wu ◽  
Mo Li ◽  
Chunhui Wang ◽  
...  

Abstract Summary We present the first tool of gene prediction, PlasGUN, for plasmid metagenomic short-read data. The tool, developed based on deep learning algorithm of multiple input Convolutional Neural Network, demonstrates much better performance when tested on a benchmark dataset of artificial short reads and presents more reliable results for real plasmid metagenomic data than traditional gene prediction tools designed primarily for chromosome-derived short reads. Availability and implementation The PlasGUN software is available at http://cqb.pku.edu.cn/ZhuLab/PlasGUN/ or https://github.com/zhenchengfang/PlasGUN/. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document