A Virtual Machine Platform for Non-Computer Professionals for Using Deep Learning to Classify Biological Sequences of Metagenomic Data

Author(s):  
Zhencheng Fang ◽  
Hongwei Zhou
2020 ◽  
Vol 8 (1) ◽  
pp. 64-77 ◽  
Author(s):  
Jie Ren ◽  
Kai Song ◽  
Chao Deng ◽  
Nathan A. Ahlgren ◽  
Jed A. Fuhrman ◽  
...  

2019 ◽  
Author(s):  
Eva Malta ◽  
Charles Rodamilans ◽  
Sandra Avila ◽  
Edson Borin

This paper analyzes the cost-benefit of using EC2 instances, specifically the p2 and p3 virtual machine types, which have GPU accelerators, to execute a machine learning algorithm. This analysis includes the runtime of convolutional neural network executions, and it takes into consideration the time necessary to stabilize the accuracy value with different batch sizes. Also, we measure the cost of using each machine type, and we define a relation between this cost and the execution time for each virtual machine. The results show that, although the price per hour of the p3 instance is three times higher, it is faster and costs almost the same as the p2 instance type to train the deep learning algorithm.
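
The relation between hourly price and execution time described in the abstract above can be illustrated with a minimal sketch. The prices and runtimes below are hypothetical placeholders, not figures from the paper, and `training_cost` is an illustrative helper name. The sketch assumes the total cost of a run is simply the hourly price multiplied by the runtime.

```python
def training_cost(price_per_hour: float, runtime_hours: float) -> float:
    """Total cost of one training run: hourly price times runtime."""
    return price_per_hour * runtime_hours


# Hypothetical values chosen only to illustrate the relation;
# they are NOT measurements or prices reported in the paper.
p2 = {"price_per_hour": 1.0, "runtime_hours": 6.0}
p3 = {"price_per_hour": 3.0, "runtime_hours": 2.0}

p2_cost = training_cost(p2["price_per_hour"], p2["runtime_hours"])
p3_cost = training_cost(p3["price_per_hour"], p3["runtime_hours"])

# Ratio of hourly prices and ratio of runtimes (speedup of p3 over p2).
price_ratio = p3["price_per_hour"] / p2["price_per_hour"]
speedup = p2["runtime_hours"] / p3["runtime_hours"]

print(f"p2 cost: {p2_cost:.2f}, p3 cost: {p3_cost:.2f}")
print(f"price ratio: {price_ratio:.1f}x, speedup: {speedup:.1f}x")
```

Under this assumption, an instance that costs three times more per hour breaks even on total cost when it finishes the same training roughly three times faster, which is consistent with the "almost the same" cost the abstract reports.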


2021 ◽  
Author(s):  
Quinn Dickinson ◽  
Jesse G. Meyer

Machine learning with artificial neural networks, also known as “deep learning”, accurately predicts biological phenomena such as disease diagnosis and protein structure. Despite the ability of deep learning to make accurate biological predictions, model interpretation remains a challenge, especially for recurrent neural network architectures because of their sequential input data. Here we train multi-output long short-term memory (LSTM) regression models to predict peptide binding affinity to five rhesus macaque major histocompatibility complex (MHC) I alleles. We adapt SHapley Additive exPlanations (SHAP) to generate positional model interpretations of which amino acids are important for peptide binding. These positional SHAP values reproduced known rhesus macaque MHC class I (Mamu-A1*001) peptide binding motifs and provided insights into inter-positional dependencies of peptide-MHC interactions. Positional SHAP should find widespread utility for interpreting a variety of models trained from biological sequences.


2018 ◽  
Vol 19 (S7) ◽  
Author(s):  
Antonino Fiannaca ◽  
Laura La Paglia ◽  
Massimo La Rosa ◽  
Giosuè Lo Bosco ◽  
Giovanni Renda ◽  
...  

Microbiome ◽  
2018 ◽  
Vol 6 (1) ◽  
Author(s):  
Gustavo Arango-Argoty ◽  
Emily Garner ◽  
Amy Pruden ◽  
Lenwood S. Heath ◽  
Peter Vikesland ◽  
...  

2021 ◽  
Vol 1 (3) ◽  
pp. 138-165
Author(s):  
Thomas Krause ◽  
Jyotsna Talreja Wassan ◽  
Paul Mc Kevitt ◽  
Haiying Wang ◽  
Huiru Zheng ◽  
...  

Metagenomics promises to provide new valuable insights into the role of microbiomes in eukaryotic hosts such as humans. Due to the decreasing costs for sequencing, public and private repositories for human metagenomic datasets are growing fast. Metagenomic datasets can contain terabytes of raw data, which is a challenge for data processing but also an opportunity for advanced machine learning methods like deep learning that require large datasets. However, in contrast to classical machine learning algorithms, the use of deep learning in metagenomics is still an exception. Regardless of the algorithms used, they are usually not applied to raw data but require several preprocessing steps. Performing this preprocessing and the actual analysis in an automated, reproducible, and scalable way is another challenge. This and other challenges can be addressed by adjusting known big data methods and architectures to the needs of microbiome analysis and DNA sequence processing. A conceptual architecture for the use of machine learning and big data on metagenomic data sets was recently presented and initially validated to analyze the rumen microbiome. The same architecture can be used for clinical purposes as is discussed in this paper.


2021 ◽  
Author(s):  
Zijun Zhang ◽  
Evan M. Cofer ◽  
Olga G. Troyanskaya

Convolutional neural networks (CNNs) have become a standard approach for modeling genomic sequences. CNNs can be effectively built by Neural Architecture Search (NAS), which trades computing power for accurate neural architectures. Yet, the consumption of immense computing power is a major practical, financial, and environmental issue for deep learning. Here, we present a novel NAS framework, AMBIENT, that generates highly accurate CNN architectures for biological sequences of diverse functions, while substantially reducing the computing cost of conventional NAS.

