CHEER: HierarCHical taxonomic classification for viral mEtagEnomic data via deep leaRning

Methods ◽  
2020 ◽  
Author(s):  
Jiayu Shang ◽  
Yanni Sun
2018 ◽  
Vol 19 (S7) ◽  
Author(s):  
Antonino Fiannaca ◽  
Laura La Paglia ◽  
Massimo La Rosa ◽  
Giosue’ Lo Bosco ◽  
Giovanni Renda ◽  
...  

2020 ◽  
Vol 8 (1) ◽  
pp. 64-77 ◽  
Author(s):  
Jie Ren ◽  
Kai Song ◽  
Chao Deng ◽  
Nathan A. Ahlgren ◽  
Jed A. Fuhrman ◽  
...  

2015 ◽  
Vol 15 (6) ◽  
pp. 1403-1414 ◽  
Author(s):  
Johan Bengtsson-Palme ◽  
Martin Hartmann ◽  
Karl Martin Eriksson ◽  
Chandan Pal ◽  
Kaisa Thorell ◽  
...  

2021 ◽  
Author(s):  
Gracielly G. F. Coutinho ◽  
Gabriel B. M. Câmara ◽  
Raquel de M. Barbosa ◽  
Marcelo A. C. Fernandes

Since December 2019, the world has been intensely affected by the COVID-19 pandemic, caused by the SARS-CoV-2 virus, first identified in Wuhan, China. When a novel virus is identified, early elucidation of the taxonomic classification and origin of its genomic sequence is essential for strategic planning, containment, and treatment. Deep learning techniques have been successfully used in many viral classification problems associated with the diagnosis of viral infections, metagenomics, and phylogenetic analysis. This work proposes an efficient viral genome classifier for the SARS-CoV-2 virus using a deep neural network (DNN) based on the stacked sparse autoencoder (SSAE) technique. We performed four different experiments to provide different levels of taxonomic classification of the SARS-CoV-2 virus. Confusion matrices are presented for the validation and test sets, and the ROC curve for the validation set. In all experiments, the SSAE technique performed well. In this work, we explored the use of image representations of complete genome sequences as the SSAE input to classify SARS-CoV-2. For that, a dataset based on a k-mer image representation, with k=6, was used. The results indicate that this deep learning technique is applicable to genome classification problems.
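
To make the k-mer image idea concrete, the following is a minimal sketch of one way a genome sequence could be turned into a 6-mer frequency image. The abstract does not describe the exact layout, so the lexicographic ordering of 6-mers, the 64 x 64 grid (4^6 = 4096 pixels), the skipping of ambiguous bases, and the function name `kmer_image` are all assumptions made for illustration, not the authors' procedure.

```python
from itertools import product

K = 6
ALPHABET = "ACGT"
# Assumed layout: every possible 6-mer gets a fixed lexicographic index
# (4**6 = 4096 entries in total).
KMER_INDEX = {"".join(p): i for i, p in enumerate(product(ALPHABET, repeat=K))}
SIDE = 64  # 64 * 64 = 4096, so each 6-mer maps to exactly one pixel


def kmer_image(sequence):
    """Return a SIDE x SIDE grid (list of rows) of 6-mer relative frequencies."""
    counts = [0] * len(KMER_INDEX)
    seq = sequence.upper()
    total = 0
    for start in range(len(seq) - K + 1):
        idx = KMER_INDEX.get(seq[start:start + K])
        if idx is not None:  # skip windows containing ambiguous bases such as N
            counts[idx] += 1
            total += 1
    freqs = [c / total if total else 0.0 for c in counts]
    return [freqs[row * SIDE:(row + 1) * SIDE] for row in range(SIDE)]


# Toy input only; a real use would pass a complete genome sequence.
image = kmer_image("ACGTACGTACGTNACGTACGT")
```

A grid like this could then be flattened or passed as an image to a classifier; how the SSAE in the paper consumes it is not specified in the abstract.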


2021 ◽  
Author(s):  
Seth Commichaux ◽  
Kiran Javkar ◽  
Harihara Subrahmaniam Muralidharan ◽  
Padmini Ramachandran ◽  
Andrea Ottesen ◽  
...  

Abstract

Background: Microbial eukaryotes are nearly ubiquitous in microbiomes on Earth and contribute to many integral ecological functions. Metagenomics is a proven tool for studying the microbial diversity, functions, and ecology of microbiomes, but has been underutilized for microeukaryotes due to the computational challenges they present. For taxonomic classification, the use of a eukaryotic marker gene database can improve the computational efficiency, precision and sensitivity. However, state-of-the-art tools which use marker gene databases implement universal thresholds for classification rather than dynamically learning the thresholds from the database structure, impacting the accuracy of the classification process.

Results: Here we introduce taxaTarget, a method for the taxonomic classification of microeukaryotes in metagenomic data. Using a database of eukaryotic marker genes and a supervised learning approach for training, we learned the discriminatory power and classification thresholds for each 20 amino acid region of each marker gene in our database. This approach provided improved sensitivity and precision compared to other state-of-the-art approaches, with rapid runtimes and low memory usage. Additionally, taxaTarget was better able to detect the presence of multiple closely related species as well as species with no representative sequences in the database. One of the greatest challenges faced during the development of taxaTarget was the general sparsity of available sequences for microeukaryotes. Several algorithms were implemented, including threshold padding, which effectively handled the missing training data and reduced classification errors. Using taxaTarget on metagenomes from human fecal microbiomes, a broader range of genera were detected, including multiple parasites that the other tested tools missed.

Conclusion: Data-driven methods for learning classification thresholds from the structure of an input database can provide granular information about the discriminatory power of the sequences and improve the sensitivity and precision of classification. These methods will help facilitate a more comprehensive analysis of metagenomic data and expand our knowledge about the diverse eukaryotes in microbial communities.
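
The contrast between a universal threshold and thresholds learned per 20 amino acid region can be illustrated with a small sketch. This is not taxaTarget's algorithm: the rule used here (a region's threshold is the highest score of any off-target training hit in that region), the tuple layout of the training data, and the names `learn_thresholds` and `accept_hit` are hypothetical choices made only to show the general idea. Threshold padding is not implemented, because the abstract does not describe how it works.

```python
REGION_LEN = 20  # region size in amino acids, as stated in the abstract


def learn_thresholds(training_hits):
    """Map each (gene, region) to the highest off-target score seen in training.

    training_hits: iterable of (gene, aa_position, score, is_correct_taxon).
    The "max off-target score" rule is an assumption for illustration.
    """
    thresholds = {}
    for gene, position, score, is_correct in training_hits:
        if is_correct:
            continue  # only off-target hits define how strict a region must be
        key = (gene, position // REGION_LEN)
        thresholds[key] = max(thresholds.get(key, 0.0), score)
    return thresholds


def accept_hit(thresholds, gene, position, score, default=float("inf")):
    """Accept a hit only if it beats the threshold learned for its region.

    Regions with no training data fall back to `default`, which rejects
    everything unless the caller supplies a looser value.
    """
    return score > thresholds.get((gene, position // REGION_LEN), default)


# Hypothetical training hits for a made-up marker gene.
hits = [
    ("geneA", 5, 40.0, False),
    ("geneA", 12, 55.0, False),
    ("geneA", 25, 30.0, False),
    ("geneA", 8, 90.0, True),
]
thresholds = learn_thresholds(hits)
```

With these toy numbers, a score of 50.0 is rejected in the first region of `geneA` but accepted in the second, which a single universal cutoff could not express.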


Microbiome ◽  
2018 ◽  
Vol 6 (1) ◽  
Author(s):  
Gustavo Arango-Argoty ◽  
Emily Garner ◽  
Amy Pruden ◽  
Lenwood S. Heath ◽  
Peter Vikesland ◽  
...  

2016 ◽  
pp. gkw1248 ◽  
Author(s):  
David Ainsworth ◽  
Michael J.E. Sternberg ◽  
Come Raczy ◽  
Sarah A. Butcher
