scholarly journals Deep Residual Neural Networks Resolve Quartet Molecular Phylogenies

2019 ◽  
Vol 37 (5) ◽  
pp. 1495-1507 ◽  
Author(s):  
Zhengting Zou ◽  
Hongjiu Zhang ◽  
Yuanfang Guan ◽  
Jianzhi Zhang

Abstract Phylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification or insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex nonlinear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. Furthermore, when combined with the quartet puzzling algorithm, residual network predictors can be used to reconstruct trees with more than four taxa. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl; last accessed January 3, 2020).

2019 ◽  
Author(s):  
Zhengting Zou ◽  
Hongjiu Zhang ◽  
Yuanfang Guan ◽  
Jianzhi Zhang

ABSTRACTPhylogenetic inference is of fundamental importance to evolutionary as well as other fields of biology, and molecular sequences have emerged as the primary data for this task. Although many phylogenetic methods have been developed to explicitly take into account substitution models of sequence evolution, such methods could fail due to model misspecification and insufficiency, especially in the face of heterogeneities in substitution processes across sites and among lineages. In this study, we propose to infer topologies of four-taxon trees using deep residual neural networks, a machine learning approach needing no explicit modeling of the subject system and having a record of success in solving complex non-linear inference problems. We train residual networks on simulated protein sequence data with extensive amino acid substitution heterogeneities. We show that the well-trained residual network predictors can outperform existing state-of-the-art inference methods such as the maximum likelihood method on diverse simulated test data, especially under extensive substitution heterogeneities. Reassuringly, residual network predictors generally agree with existing methods in the trees inferred from real phylogenetic data with known or widely believed topologies. We conclude that deep learning represents a powerful new approach to phylogenetic reconstruction, especially when sequences evolve via heterogeneous substitution processes. We present our best trained predictor in a freely available program named Phylogenetics by Deep Learning (PhyDL, https://gitlab.com/ztzou/phydl).


2021 ◽  
pp. 1-17
Author(s):  
Hania H. Farag ◽  
Lamiaa A. A. Said ◽  
Mohamed R. M. Rizk ◽  
Magdy Abd ElAzim Ahmed

COVID-19 has been considered as a global pandemic. Recently, researchers are using deep learning networks for medical diseases’ diagnosis. Some of these researches focuses on optimizing deep learning neural networks for enhancing the network accuracy. Optimizing the Convolutional Neural Network includes testing various networks which are obtained through manually configuring their hyperparameters, then the configuration with the highest accuracy is implemented. Each time a different database is used, a different combination of the hyperparameters is required. This paper introduces two COVID-19 diagnosing systems using both Residual Network and Xception Network optimized by random search in the purpose of finding optimal models that give better diagnosis rates for COVID-19. The proposed systems showed that hyperparameters tuning for the ResNet and the Xception Net using random search optimization give more accurate results than other techniques with accuracies 99.27536% and 100 % respectively. We can conclude that hyperparameters tuning using random search optimization for either the tuned Residual Network or the tuned Xception Network gives better accuracies than other techniques diagnosing COVID-19.


2021 ◽  
Vol 17 (9) ◽  
pp. e1009345
Author(s):  
Zhengqiao Zhao ◽  
Stephen Woloszynek ◽  
Felix Agbavor ◽  
Joshua Chang Mell ◽  
Bahrad A. Sokhansanj ◽  
...  

Recurrent neural networks with memory and attention mechanisms are widely used in natural language processing because they can capture short and long term sequential information for diverse tasks. We propose an integrated deep learning model for microbial DNA sequence data, which exploits convolutional neural networks, recurrent neural networks, and attention mechanisms to predict taxonomic classifications and sample-associated attributes, such as the relationship between the microbiome and host phenotype, on the read/sequence level. In this paper, we develop this novel deep learning approach and evaluate its application to amplicon sequences. We apply our approach to short DNA reads and full sequences of 16S ribosomal RNA (rRNA) marker genes, which identify the heterogeneity of a microbial community sample. We demonstrate that our implementation of a novel attention-based deep network architecture, Read2Pheno, achieves read-level phenotypic prediction. Training Read2Pheno models will encode sequences (reads) into dense, meaningful representations: learned embedded vectors output from the intermediate layer of the network model, which can provide biological insight when visualized. The attention layer of Read2Pheno models can also automatically identify nucleotide regions in reads/sequences which are particularly informative for classification. As such, this novel approach can avoid pre/post-processing and manual interpretation required with conventional approaches to microbiome sequence classification. We further show, as proof-of-concept, that aggregating read-level information can robustly predict microbial community properties, host phenotype, and taxonomic classification, with performance at least comparable to conventional approaches. An implementation of the attention-based deep learning network is available at https://github.com/EESI/sequence_attention (a python package) and https://github.com/EESI/seq2att (a command line tool).


2021 ◽  
Vol 25 (3) ◽  
pp. 31-35
Author(s):  
Piotr Więcek ◽  
Dominik Sankowski

The article presents a new algorithm for increasing the resolution of thermal images. For this purpose, the residual network was integrated with the Kernel-Sharing Atrous Convolution (KSAC) image sub-sampling module. A significant reduction in the algorithm’s complexity and shortening the execution time while maintaining high accuracy were achieved. The neural network has been implemented in the PyTorch environment. The results of the proposed new method of increasing the resolution of thermal images with sizes 32 × 24, 160 × 120 and 640 × 480 for scales up to 6 are presented.


Author(s):  
Jeremy Charlier ◽  
Robert Nadon ◽  
Vladimir Makarenkov

Abstract Motivation Off-target predictions are crucial in gene editing research. Recently, significant progress has been made in the field of prediction of off-target mutations, particularly with CRISPR-Cas9 data, thanks to the use of deep learning. CRISPR-Cas9 is a gene editing technique which allows manipulation of DNA fragments. The sgRNA-DNA (single guide RNA-DNA) sequence encoding for deep neural networks, however, has a strong impact on the prediction accuracy. We propose a novel encoding of sgRNA-DNA sequences that aggregates sequence data with no loss of information. Results In our experiments, we compare the proposed sgRNA-DNA sequence encoding applied in a deep learning prediction framework with state-of-the-art encoding and prediction methods. We demonstrate the superior accuracy of our approach in a simulation study involving Feedforward Neural Networks (FNNs), Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) as well as the traditional Random Forest (RF), Naive Bayes (NB) and Logistic Regression (LR) classifiers.We highlight the quality of our results by building several FNNs, CNNs and RNNs with various layer depths and performing predictions on two popular CRISPOR and GUIDE-seq gene editing data sets. In all our experiments, the new encoding led to more accurate off-target prediction results, providing an improvement of the area under the Receiver Operating Characteristic (ROC) curve up to 35%. Availability The code and data used in this study are available at: https://github.com/dagrate/dl-offtarget


Genetics ◽  
2002 ◽  
Vol 160 (3) ◽  
pp. 1231-1241 ◽  
Author(s):  
Gil McVean ◽  
Philip Awadalla ◽  
Paul Fearnhead

Abstract Determining the amount of recombination in the genealogical history of a sample of genes is important to both evolutionary biology and medical population genetics. However, recurrent mutation can produce patterns of genetic diversity similar to those generated by recombination and can bias estimates of the population recombination rate. Hudson (2001) has suggested an approximate-likelihood method based on coalescent theory to estimate the population recombination rate, 4Ner, under an infinite-sites model of sequence evolution. Here we extend the method to the estimation of the recombination rate in genomes, such as those of many viruses and bacteria, where the rate of recurrent mutation is high. In addition, we develop a powerful permutation-based method for detecting recombination that is both more powerful than other permutation-based methods and robust to misspecification of the model of sequence evolution. We apply the method to sequence data from viruses, bacteria, and human mitochondrial DNA. The extremely high level of recombination detected in both HIV1 and HIV2 sequences demonstrates that recombination cannot be ignored in the analysis of viral population genetic data.


2020 ◽  
Author(s):  
Dean Sumner ◽  
Jiazhen He ◽  
Amol Thakkar ◽  
Ola Engkvist ◽  
Esben Jannik Bjerrum

<p>SMILES randomization, a form of data augmentation, has previously been shown to increase the performance of deep learning models compared to non-augmented baselines. Here, we propose a novel data augmentation method we call “Levenshtein augmentation” which considers local SMILES sub-sequence similarity between reactants and their respective products when creating training pairs. The performance of Levenshtein augmentation was tested using two state of the art models - transformer and sequence-to-sequence based recurrent neural networks with attention. Levenshtein augmentation demonstrated an increase performance over non-augmented, and conventionally SMILES randomization augmented data when used for training of baseline models. Furthermore, Levenshtein augmentation seemingly results in what we define as <i>attentional gain </i>– an enhancement in the pattern recognition capabilities of the underlying network to molecular motifs.</p>


2019 ◽  
Vol 277 ◽  
pp. 02024 ◽  
Author(s):  
Lincan Li ◽  
Tong Jia ◽  
Tianqi Meng ◽  
Yizhe Liu

In this paper, an accurate two-stage deep learning method is proposed to detect vulnerable plaques in ultrasonic images of cardiovascular. Firstly, a Fully Convonutional Neural Network (FCN) named U-Net is used to segment the original Intravascular Optical Coherence Tomography (IVOCT) cardiovascular images. We experiment on different threshold values to find the best threshold for removing noise and background in the original images. Secondly, a modified Faster RCNN is adopted to do precise detection. The modified Faster R-CNN utilize six-scale anchors (122,162,322,642,1282,2562) instead of the conventional one scale or three scale approaches. First, we present three problems in cardiovascular vulnerable plaque diagnosis, then we demonstrate how our method solve these problems. The proposed method in this paper apply deep convolutional neural networks to the whole diagnostic procedure. Test results show the Recall rate, Precision rate, IoU (Intersection-over-Union) rate and Total score are 0.94, 0.885, 0.913 and 0.913 respectively, higher than the 1st team of CCCV2017 Cardiovascular OCT Vulnerable Plaque Detection Challenge. AP of the designed Faster RCNN is 83.4%, higher than conventional approaches which use one-scale or three-scale anchors. These results demonstrate the superior performance of our proposed method and the power of deep learning approaches in diagnose cardiovascular vulnerable plaques.


Sign in / Sign up

Export Citation Format

Share Document