Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies

Abstract Phylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification or insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex nonlinear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. Furthermore, when combined with the quartet puzzling algorithm, residual network predictors can be used to reconstruct trees with more than four taxa. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl; last accessed January 3, 2020).
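
The classification framing described in the abstract can be illustrated with a minimal sketch. Four taxa admit exactly three unrooted topologies, so a quartet predictor only has to choose among three classes. The function names and the score vector below are hypothetical illustrations, not PhyDL's actual interface; the scores stand in for the outputs of a trained network.

```python
def quartet_topologies(taxa):
    """Return the three unrooted topologies for four taxa.

    Each topology is written as a pair of sister groups,
    e.g. (("A", "B"), ("C", "D")) for the split AB|CD.
    """
    if len(set(taxa)) != 4:
        raise ValueError("exactly four distinct taxa are required")
    first = taxa[0]
    topologies = []
    # The first taxon must pair with one of the other three taxa;
    # the two remaining taxa form the other side of the split.
    for partner in taxa[1:]:
        left = (first, partner)
        right = tuple(t for t in taxa[1:] if t != partner)
        topologies.append((left, right))
    return topologies


def predict_topology(scores, taxa):
    """Pick the topology with the highest score.

    `scores` is a hypothetical length-3 vector, such as the class
    probabilities a trained classifier might output.
    """
    topologies = quartet_topologies(taxa)
    best = max(range(3), key=lambda i: scores[i])
    return topologies[best]


taxa = ["A", "B", "C", "D"]
print(quartet_topologies(taxa))
print(predict_topology([0.1, 0.7, 0.2], taxa))
```

Quartet puzzling, mentioned in the abstract, combines many such four-taxon decisions into a larger tree; that step is not shown here.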

Deep residual neural networks resolve quartet molecular phylogenies

10.1101/787168 ◽

2019 ◽

Cited By ~ 1

Author(s):

Zhengting Zou ◽

Hongjiu Zhang ◽

Yuanfang Guan ◽

Jianzhi Zhang

Keyword(s):

Neural Networks ◽

Sequence Data ◽

Model Misspecification ◽

Phylogenetic Reconstruction ◽

Likelihood Method ◽

Primary Data ◽

Sequence Evolution ◽

Residual Network ◽

Inference Problems ◽

Protein Sequence Data

Abstract Phylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification and insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex non-linear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl).

Hyperparameters optimization for ResNet and Xception in the purpose of diagnosing COVID-19

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-210925 ◽

2021 ◽

pp. 1-17

Author(s):

Hania H. Farag ◽

Lamiaa A. A. Said ◽

Mohamed R. M. Rizk ◽

Magdy Abd ElAzim Ahmed

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Network ◽

Random Search ◽

Learning Networks ◽

Residual Network ◽

Global Pandemic ◽

Search Optimization

COVID-19 has been considered a global pandemic. Recently, researchers have been using deep learning networks to diagnose diseases. Some of this research focuses on optimizing deep neural networks to improve their accuracy. Optimizing a Convolutional Neural Network involves testing various networks obtained by manually configuring their hyperparameters; the configuration with the highest accuracy is then implemented. Each time a different database is used, a different combination of hyperparameters is required. This paper introduces two COVID-19 diagnosis systems that use the Residual Network and the Xception Network, each optimized by random search, for the purpose of finding optimal models that give better diagnosis rates for COVID-19. The proposed systems showed that hyperparameter tuning for the ResNet and the Xception Net using random search optimization gives more accurate results than other techniques, with accuracies of 99.27536% and 100%, respectively. We conclude that hyperparameter tuning using random search optimization, for either the tuned Residual Network or the tuned Xception Network, gives better accuracies than other techniques for diagnosing COVID-19.
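
Random search over hyperparameters, as named in the abstract, can be sketched as follows. This is a minimal illustration under stated assumptions: the search space, the parameter names, and the toy objective are all hypothetical and are not taken from the paper. In practice the objective would train a network and return its validation accuracy.

```python
import random


def random_search(objective, space, n_trials=20, seed=0):
    """Sample configurations uniformly at random and keep the best one."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        # Draw one value per hyperparameter, independently of the others.
        config = {name: rng.choice(values) for name, values in space.items()}
        score = objective(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


# Hypothetical search space; the paper's actual ranges are not given here.
space = {
    "learning_rate": [1e-4, 1e-3, 1e-2],
    "batch_size": [16, 32, 64],
    "dropout": [0.2, 0.3, 0.5],
}


def toy_objective(config):
    """Stand-in for validation accuracy (higher is better)."""
    return -abs(config["learning_rate"] - 1e-3) - abs(config["dropout"] - 0.3)


best_config, best_score = random_search(toy_objective, space, n_trials=50)
print(best_config, best_score)
```

Unlike manual configuration, each trial here is independent of the others, so the trials could also be run in parallel.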

Learning, visualizing and exploring 16S rRNA structure using an attention-based deep neural network

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009345 ◽

2021 ◽

Vol 17 (9) ◽

pp. e1009345

Author(s):

Zhengqiao Zhao ◽

Stephen Woloszynek ◽

Felix Agbavor ◽

Joshua Chang Mell ◽

Bahrad A. Sokhansanj ◽

...

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Microbial Community ◽

Language Processing ◽

Recurrent Neural Networks ◽

Network Architecture ◽

Sequence Data ◽

Community Sample ◽

Marker Genes ◽

Link Type

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short- and long-term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a Python package) and https://github.com/EESI/seq2att (a command-line tool).
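
The way an attention layer highlights informative positions can be illustrated with a generic sketch. This is not the Read2Pheno implementation; the function names, the per-position scores, and the toy hidden states are assumptions for illustration. Each position receives a softmax weight, and the read-level representation is the weighted sum of the per-position vectors.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, scores):
    """Combine per-position vectors into one vector using attention weights.

    hidden_states: list of equal-length vectors, one per sequence position.
    scores: one unnormalized relevance score per position (hypothetical;
            a trained model would compute these from the hidden states).
    """
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [
        sum(w * h[d] for w, h in zip(weights, hidden_states))
        for d in range(dim)
    ]
    return pooled, weights


hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
pooled, weights = attention_pool(hidden, [0.0, 0.0, 3.0])
print(weights)  # the third position dominates
print(pooled)
```

Inspecting `weights` is what makes the layer interpretable: positions with large weights are the ones that contribute most to the pooled representation.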

Increasing of Thermal Images Resolution Using Deep Learning Neural Networks

Pomiary Automatyka Robotyka ◽

10.14313/par_241/31 ◽

2021 ◽

Vol 25 (3) ◽

pp. 31-35

Author(s):

Piotr Więcek ◽

Dominik Sankowski

Keyword(s):

Neural Network ◽

Neural Networks ◽

Deep Learning ◽

Execution Time ◽

High Accuracy ◽

New Method ◽

Residual Network ◽

Thermal Images ◽

The Neural Network

The article presents a new algorithm for increasing the resolution of thermal images. For this purpose, the residual network was integrated with the Kernel-Sharing Atrous Convolution (KSAC) image sub-sampling module. This significantly reduced the algorithm's complexity and shortened the execution time while maintaining high accuracy. The neural network was implemented in the PyTorch environment. Results of the proposed method are presented for thermal images of sizes 32 × 24, 160 × 120 and 640 × 480 and for scales up to 6.
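
The idea behind kernel sharing in atrous (dilated) convolution can be sketched in one dimension. This is a simplified illustration in plain Python rather than the article's PyTorch network, and it assumes, as the module's name suggests, that a single kernel is reused across several dilation rates. The function names are hypothetical.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1D atrous correlation: kernel taps are spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate  # distance covered by the dilated kernel
    return [
        sum(k * signal[i + j * rate] for j, k in enumerate(kernel))
        for i in range(len(signal) - span)
    ]


def shared_kernel_branches(signal, kernel, rates):
    """Apply the same kernel at several dilation rates.

    Reusing one set of weights across branches is what keeps the
    parameter count independent of the number of rates.
    """
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
branches = shared_kernel_branches(signal, kernel, rates=[1, 2])
print(branches)
```

A larger rate widens the receptive field without adding weights, which is consistent with the reduced complexity the abstract reports.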

Accurate deep learning off-target prediction with novel sgRNA-DNA sequence encoding in CRISPR-Cas9 gene editing

Bioinformatics ◽

10.1093/bioinformatics/btab112 ◽

2021 ◽

Author(s):

Jeremy Charlier ◽

Robert Nadon ◽

Vladimir Makarenkov

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Dna Sequence ◽

Dna Sequences ◽

Gene Editing ◽

Sequence Data ◽

Target Prediction ◽

Feedforward Neural Networks ◽

Strong Impact ◽

Sequence Encoding

Abstract Motivation: Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the field of prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The sgRNA-DNA (single guide RNA-DNA) sequence encoding for deep neural networks, however, has a strong impact on the prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. Results: In our experiments, we compare the proposed sgRNA-DNA sequence encoding applied in a deep learning prediction framework with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers. We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular CRISPOR and GUIDE-seq gene editing data sets. In all our experiments, the new encoding led to more accurate off-target prediction results, providing an improvement of the area under the Receiver Operating Characteristic (ROC) curve up to 35%. Availability: The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget
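
The evaluation metric named in the abstract, the area under the ROC curve, can be computed directly from labels and scores. The sketch below uses the pairwise-ranking definition (the probability that a randomly chosen positive is scored above a randomly chosen negative, with ties counted as one half). It is a generic illustration, not code from the linked repository, and the example labels and scores are made up.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Made-up example: 1 = off-target site, 0 = no off-target activity.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```

The quadratic loop is fine for small examples; for large data sets a rank-based implementation would be preferable.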

A Coalescent-Based Method for Detecting and Estimating Recombination From Gene Sequences

Genetics ◽

10.1093/genetics/160.3.1231 ◽

2002 ◽

Vol 160 (3) ◽

pp. 1231-1241 ◽

Cited By ~ 5

Author(s):

Gil McVean ◽

Philip Awadalla ◽

Paul Fearnhead

Keyword(s):

Recombination Rate ◽

Evolutionary Biology ◽

Sequence Data ◽

Recurrent Mutation ◽

Likelihood Method ◽

Viral Population ◽

Sequence Evolution ◽

Infinite Sites Model ◽

High Level ◽

Population Recombination

Abstract Determining the amount of recombination in the genealogical history of a sample of genes is important to both evolutionary biology and medical population genetics. However, recurrent mutation can produce patterns of genetic diversity similar to those generated by recombination and can bias estimates of the population recombination rate. Hudson (2001) has suggested an approximate-likelihood method based on coalescent theory to estimate the population recombination rate, 4N_e r, under an infinite-sites model of sequence evolution. Here we extend the method to the estimation of the recombination rate in genomes, such as those of many viruses and bacteria, where the rate of recurrent mutation is high. In addition, we develop a powerful permutation-based method for detecting recombination that is both more powerful than other permutation-based methods and robust to misspecification of the model of sequence evolution. We apply the method to sequence data from viruses, bacteria, and human mitochondrial DNA. The extremely high level of recombination detected in both HIV1 and HIV2 sequences demonstrates that recombination cannot be ignored in the analysis of viral population genetic data.
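
The general logic of a permutation test can be sketched as follows. This is a generic one-sided test on a Pearson correlation, not the statistic used by the authors: it shuffles which value is attached to which position and asks how often the shuffled data look at least as structured as the observed data. The function names and the toy data are assumptions for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_perm=999, seed=0):
    """One-sided permutation p-value for a large positive correlation."""
    rng = random.Random(seed)
    observed = pearson(xs, ys)
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any association with position
        if pearson(xs, shuffled) >= observed:
            hits += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


positions = list(range(10))
values = [2 * x + 1 for x in positions]  # toy data with a perfect trend
observed, p_value = permutation_p_value(positions, values)
print(observed, p_value)
```

Because the null distribution is built from the data themselves, a test of this kind does not depend on a parametric model, which is the sense in which permutation approaches can be robust to model misspecification.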

Levenshtein Augmentation Improves Performance of SMILES Based Deep-Learning Synthesis Prediction

10.26434/chemrxiv.12562121 ◽

2020 ◽

Author(s):

Dean Sumner ◽

Jiazhen He ◽

Amol Thakkar ◽

Ola Engkvist ◽

Esben Jannik Bjerrum

Keyword(s):

Neural Networks ◽

Pattern Recognition ◽

Deep Learning ◽

Recurrent Neural Networks ◽

Data Augmentation ◽

State Of The Art ◽

Sequence Similarity ◽

Learning Models ◽

Underlying Network

SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation”, which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state-of-the-art models: a transformer and a sequence-to-sequence recurrent neural network with attention. When used to train the baseline models, Levenshtein augmentation increased performance over both non-augmented data and data augmented by conventional SMILES randomization. Furthermore, Levenshtein augmentation seemingly results in what we define as “attentional gain”, an enhancement in the pattern recognition capabilities of the underlying network with respect to molecular motifs.
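
The string similarity measure that gives the method its name can be computed with the standard dynamic-programming recurrence. The sketch below implements plain Levenshtein (edit) distance on whole strings; how the authors apply it to local SMILES sub-sequences when building training pairs is not described in the abstract and is not reproduced here.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, or
    substitutions needed to turn string `a` into string `b`."""
    # prev[j] holds the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(
                prev[j] + 1,         # delete char_a
                curr[j - 1] + 1,     # insert char_b
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# Ethanol and acetaldehyde written as SMILES differ by one inserted character.
print(levenshtein("CCO", "CC=O"))  # 1
```

A small distance between a reactant string and a product string indicates that the two SMILES spellings share most of their characters in the same order.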

Deep Learning through Convolutional Neural Networks for Classification of Image A Novel Approach Using Hyper Filter

International Journal of Computer Sciences and Engineering ◽

10.26438/ijcse/v7i6.164168 ◽

2019 ◽

Vol 7 (6) ◽

pp. 164-168

Author(s):

Kshitij Tripathi ◽

Rajendra G. Vyas ◽

Anil K. Gupta

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Novel Approach

CONVOLUTIONAL NEURAL NETWORKS, ANALYTICAL ALGORITHMS, AND PERSONALIZED HEALTH CARE: EMBRACING THE MASSIVE DATA ANALYSIS CAPABILITIES OF DEEP LEARNING ARTIFICIAL INTELLIGENCE SYSTEMS TO COMPLEMENT AND IMPROVE MEDICAL SERVICES

American Journal of Medical Research ◽

10.22381/ajmr5220187 ◽

2018 ◽

Vol 5 (2) ◽

pp. 52 ◽

Cited By ~ 1

Keyword(s):

Artificial Intelligence ◽

Neural Networks ◽

Health Care ◽

Deep Learning ◽

Data Analysis ◽

Convolutional Neural Networks ◽

Massive Data ◽

Personalized Health ◽

Personalized Health Care ◽

Artificial Intelligence Systems

Deep convolutional neural networks for cardiovascular vulnerable plaque detection

MATEC Web of Conferences ◽

10.1051/matecconf/201927702024 ◽

2019 ◽

Vol 277 ◽

pp. 02024 ◽

Cited By ~ 1

Author(s):

Lincan Li ◽

Tong Jia ◽

Tianqi Meng ◽

Yizhe Liu

Keyword(s):

Neural Networks ◽

Deep Learning ◽

Convolutional Neural Networks ◽

Vulnerable Plaque ◽

Recall Rate ◽

Superior Performance ◽

Learning Approaches ◽

Deep Convolutional Neural Networks ◽

Vulnerable Plaques ◽

Plaque Detection

In this paper, an accurate two-stage deep learning method is proposed to detect vulnerable plaques in ultrasonic cardiovascular images. First, a Fully Convolutional Neural Network (FCN) named U-Net is used to segment the original Intravascular Optical Coherence Tomography (IVOCT) cardiovascular images. We experiment with different threshold values to find the best threshold for removing noise and background from the original images. Second, a modified Faster R-CNN is adopted for precise detection. The modified Faster R-CNN utilizes six-scale anchors (12², 16², 32², 64², 128², 256²) instead of the conventional one-scale or three-scale approaches. We first present three problems in cardiovascular vulnerable plaque diagnosis, then demonstrate how our method solves them. The proposed method applies deep convolutional neural networks to the whole diagnostic procedure. Test results show that the Recall rate, Precision rate, IoU (Intersection-over-Union) rate and Total score are 0.94, 0.885, 0.913 and 0.913, respectively, higher than those of the first-place team in the CCCV2017 Cardiovascular OCT Vulnerable Plaque Detection Challenge. The AP of the designed Faster R-CNN is 83.4%, higher than that of conventional approaches that use one-scale or three-scale anchors. These results demonstrate the superior performance of our proposed method and the power of deep learning approaches in diagnosing cardiovascular vulnerable plaques.
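
The Intersection-over-Union (IoU) measure reported in the abstract can be illustrated for axis-aligned boxes. This is a generic sketch; the `(x1, y1, x2, y2)` box format and the example coordinates are assumptions, and the challenge's exact scoring protocol is not reproduced here.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2),
    with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


predicted = (0, 0, 2, 2)
ground_truth = (1, 1, 3, 3)
print(iou(predicted, ground_truth))  # 1/7, about 0.143
```

An IoU of 1 means the predicted and reference regions coincide exactly, and 0 means they do not overlap at all.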
