Mapping the glycosyltransferase fold landscape using deep learning

Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learned distinguishing features free of primary sequence alignment constraints and, unlike other models, is highly interpretable and helped identify common secondary structural features shared by divergent families. The model delineated sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies identify targets for future structural studies and expand the GT fold landscape.

Mapping the glycosyltransferase fold landscape using interpretable deep learning

Nature Communications ◽

10.1038/s41467-021-25975-9 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Rahil Taujale ◽

Zhongliang Zhou ◽

Wayland Yeung ◽

Kelley W. Moremen ◽

Sheng Li ◽

...

Keyword(s):

Deep Learning ◽

Secondary Structure ◽

Structural Features ◽

Functional Diversification ◽

Sequence Structure ◽

Cellular Processes ◽

And Function ◽

Deep Learning Model ◽

Fold Prediction ◽

Primary Sequence Alignment

Glycosyltransferases (GTs) play fundamental roles in nearly all cellular processes through the biosynthesis of complex carbohydrates and glycosylation of diverse protein and small molecule substrates. The extensive structural and functional diversification of GTs presents a major challenge in mapping the relationships connecting sequence, structure, fold and function using traditional bioinformatics approaches. Here, we present a convolutional neural network with attention (CNN-attention) based deep learning model that leverages simple secondary structure representations generated from primary sequences to provide GT fold prediction with high accuracy. The model learns distinguishing secondary structure features free of primary sequence alignment constraints and is highly interpretable. It delineates sequence and structural features characteristic of individual fold types, while classifying them into distinct clusters that group evolutionarily divergent families based on shared secondary structural features. We further extend our model to classify GT families of unknown folds and variants of known folds. By identifying families that are likely to adopt novel folds such as GT91, GT96 and GT97, our studies expand the GT fold landscape and prioritize targets for future structural studies.
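
To make the idea of a CNN with attention over secondary structure strings concrete, the following is a minimal sketch and not the authors' architecture. It assumes a three-state alphabet (H, E, C), a single hand-written convolution filter, and attention scores taken directly from the filter activations; all names (`one_hot`, `conv1d`, `attention_pool`, `BETA_ALPHA_BETA`) are hypothetical. A trained model would learn many filters and the attention parameters from data.

```python
import math

# Assumed 3-state secondary structure alphabet: H = helix, E = strand, C = coil.
STATES = "HEC"

def one_hot(ss):
    """Encode a secondary structure string as a list of one-hot vectors."""
    return [[1.0 if s == state else 0.0 for state in STATES] for s in ss]

def conv1d(x, kernel, bias=0.0):
    """Slide one filter over the sequence (no padding) and apply ReLU.

    x is L x 3, kernel is K x 3; the result has L - K + 1 activations.
    """
    k = len(kernel)
    out = []
    for i in range(len(x) - k + 1):
        total = bias
        for j in range(k):
            for c in range(len(STATES)):
                total += x[i + j][c] * kernel[j][c]
        out.append(max(0.0, total))
    return out

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def attention_pool(features):
    """Turn activations into attention weights and a weighted summary value."""
    weights = softmax(features)
    pooled = sum(w * f for w, f in zip(weights, features))
    return weights, pooled

# Illustrative filter that fires on a strand-helix-strand window (E, H, E).
# Rows are window positions; columns follow STATES order (H, E, C).
BETA_ALPHA_BETA = [[0, 1, 0], [1, 0, 0], [0, 1, 0]]

features = conv1d(one_hot("CCEHECCC"), BETA_ALPHA_BETA)
weights, pooled = attention_pool(features)
```

The attention weights are what make such a model inspectable: here the largest weight falls on the window starting at position 2, which is where the E-H-E pattern sits in the input string.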

Deep learning model for unstructured knowledge classification using structural features

Personal and Ubiquitous Computing ◽

10.1007/s00779-019-01244-x ◽

2019 ◽

Author(s):

Wonkyun Joo ◽

KiSeok Choi ◽

Young-Kuk Kim

Keyword(s):

Deep Learning ◽

Learning Model ◽

Structural Features ◽

Deep Learning Model ◽

Knowledge Classification

Verifying explainability of a deep learning tissue classifier trained on RNA-seq data

Scientific Reports ◽

10.1038/s41598-021-81773-9 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Melvyn Yap ◽

Rebecca L. Johnston ◽

Helena Foley ◽

Samual MacDonald ◽

Olga Kondrashova ◽

...

Keyword(s):

Deep Learning ◽

Differential Expression ◽

Expression Analysis ◽

Differential Expression Analysis ◽

Tissue Expression ◽

Superior Performance ◽

Rna Seq ◽

Widespread Acceptance ◽

And Function ◽

Deep Learning Model

For complex machine learning (ML) algorithms to gain widespread acceptance in decision making, we must be able to identify the features driving the predictions. Explainability models allow transparency of ML algorithms; however, their reliability within high-dimensional data is unclear. To test the reliability of the explainability model SHapley Additive exPlanations (SHAP), we developed a convolutional neural network to predict tissue classification from Genotype-Tissue Expression (GTEx) RNA-seq data representing 16,651 samples from 47 tissues. Our classifier achieved an average F1 score of 96.1% on held-out GTEx samples. Using SHAP values, we identified the 2423 most discriminatory genes, of which 98.6% were also identified by differential expression analysis across all tissues. The SHAP genes reflected expected biological processes involved in tissue differentiation and function. Moreover, SHAP genes clustered tissue types with superior performance when compared to all genes, genes detected by differential expression analysis, or random genes. We demonstrate the utility and reliability of SHAP to explain a deep learning model and highlight the strengths of applying ML to transcriptome data.
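
SHAP is built on Shapley values from cooperative game theory. As a rough illustration of what a per-gene attribution means, the sketch below computes exact Shapley values for a toy three-gene scoring function by averaging marginal contributions over every feature ordering. The function `toy_model`, the expression values and the all-zero baseline are assumptions made for this example; the paper's classifier has thousands of input genes, where exact enumeration is infeasible and approximation methods are used instead.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all orderings."""
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)      # start from the reference input
        prev = model(current)
        for i in order:
            current[i] = x[i]         # switch feature i to its observed value
            new = model(current)
            phi[i] += new - prev      # marginal contribution of feature i
            prev = new
    return [p / len(orderings) for p in phi]

def toy_model(expr):
    """Hypothetical tissue score: gene 0 acts alone, genes 1 and 2 interact."""
    return 2.0 * expr[0] + expr[1] * expr[2]

x = [1.0, 2.0, 3.0]            # assumed expression values for one sample
baseline = [0.0, 0.0, 0.0]     # assumed reference sample
phi = shapley_values(toy_model, x, baseline)
```

The attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split evenly between the two genes that produce it.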

LPI-DL: A recurrent deep learning model for plant lncRNA-protein interaction and function prediction with feature optimization

2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm49941.2020.9313431 ◽

2020 ◽

Author(s):

Jael Sanyanda Wekesa ◽

Yushi Luan ◽

Jun Meng

Keyword(s):

Deep Learning ◽

Protein Interaction ◽

Learning Model ◽

Function Prediction ◽

Feature Optimization ◽

And Function ◽

Deep Learning Model

ProteinBERT: A universal deep-learning model of protein sequence and function

10.1101/2021.05.24.445464 ◽

2021 ◽

Author(s):

Nadav Brandes ◽

Dan Ofer ◽

Yam Peleg ◽

Nadav Rappoport ◽

Michal Linial

Keyword(s):

Deep Learning ◽

Language Model ◽

Language Modeling ◽

Post Translational Modifications ◽

Go Annotation ◽

Architectural Elements ◽

Protein Properties ◽

And Function ◽

Deep Learning Model ◽

Biophysical Attributes

Self-supervised deep language modeling has shown unprecedented success across natural language tasks, and has recently been repurposed to biological sequences. However, existing models and pretraining methods are designed and optimized for text analysis. We introduce ProteinBERT, a deep language model specifically designed for proteins. Our pretraining scheme consists of masked language modeling combined with a novel task of Gene Ontology (GO) annotation prediction. We introduce novel architectural elements that make the model highly efficient and flexible to very large sequence lengths. The architecture of ProteinBERT consists of both local and global representations, allowing end-to-end processing of these types of inputs and outputs. ProteinBERT obtains state-of-the-art performance on multiple benchmarks covering diverse protein properties (including protein structure, post translational modifications and biophysical attributes), despite using a far smaller model than competing deep-learning methods. Overall, ProteinBERT provides an efficient framework for rapidly training protein predictors, even with limited labeled data. Code and pretrained model weights are available at https://github.com/nadavbra/protein_bert.
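
For readers unfamiliar with masked language modeling, the sketch below shows generic BERT-style input masking applied to an amino acid string. It is not ProteinBERT's actual pretraining corruption scheme and does not use the `protein_bert` package; the mask symbol, the 15% mask rate, the helper name `mask_sequence` and the example sequence are all assumptions chosen for illustration.

```python
import random

MASK = "?"  # hypothetical mask token; any symbol outside the amino acid alphabet works

def mask_sequence(seq, mask_rate=0.15, seed=0):
    """Hide a random subset of residues and return the masked string plus targets.

    The targets map each masked position to the residue the model must recover.
    """
    rng = random.Random(seed)                        # seeded for reproducibility
    n_mask = max(1, round(len(seq) * mask_rate))     # always mask at least one residue
    positions = sorted(rng.sample(range(len(seq)), n_mask))
    tokens = list(seq)
    targets = {}
    for p in positions:
        targets[p] = tokens[p]
        tokens[p] = MASK
    return "".join(tokens), targets

sequence = "MKTAYIAKQRQISFVKSHFS"   # made-up 20-residue example
masked, targets = mask_sequence(sequence)

# Restoring the targets recovers the original sequence.
restored = list(masked)
for pos, residue in targets.items():
    restored[pos] = residue
restored = "".join(restored)
```

During pretraining, a model receives the masked string as input and is scored on how well it predicts the residues stored in `targets`.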

A Deep Learning Model to Recognize and Quantitatively Analyze Cold Seep Substrates and the Dominant Associated Species

Frontiers in Marine Science ◽

10.3389/fmars.2021.775433 ◽

2021 ◽

Vol 8 ◽

Author(s):

Haining Wang ◽

Xiaoxue Fu ◽

Chengqian Zhao ◽

Zhendong Luan ◽

Chaolun Li

Keyword(s):

Deep Learning ◽

Large Scale ◽

Recognition Accuracy ◽

Learning Model ◽

Cold Seep ◽

Cold Seeps ◽

Promising Tool ◽

And Function ◽

Associated Species ◽

Deep Learning Model

Characterizing habitats and species distribution is important to understand the structure and function of cold seep ecosystems. This paper develops a deep learning model for the fast and accurate recognition and classification of substrates and the dominant associated species in cold seeps. Considering the dense distribution of the dominant associated species and small objects caused by overlap in cold seeps, the feature pyramid network (FPN) embedded into the faster region-convolutional neural network (R-CNN) was used to detect large-scale changes and small missing objects without increasing the number of calculations. We applied three classifiers (Faster R-CNN + FPN for mussel beds, lobster clusters and biological mixing, CNN for shell debris and exposed authigenic carbonates, and VGG16 for reduced sediments and muddy bottom) to improve the recognition accuracy of substrates. The model’s results were manually verified using images obtained in the Formosa cold seep during a 2016 cruise. The recognition accuracy for the two dominant species, Gigantidas platifrons and Munidopsidae, was 70.85% and 56.16%, respectively. Seven subcategories of substrates were also classified with a mean accuracy of 74.87%. The developed model is a promising tool for the fast and accurate characterization of substrates and epifauna in cold seeps, which is crucial for large-scale quantitative analyses.

Deep learning of material transport in complex neurite networks

Scientific Reports ◽

10.1038/s41598-021-90724-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Angran Li ◽

Amir Barati Farimani ◽

Yongjie Jessica Zhang

Keyword(s):

Deep Learning ◽

Transport Process ◽

Complex Geometry ◽

Biomedical Application ◽

Computation Time ◽

Average Error ◽

Material Transport ◽

Proposed Model ◽

And Function ◽

Deep Learning Model

Neurons exhibit complex geometry in their branched networks of neurites, which is essential to the function of individual neurons but also makes it challenging to transport a wide variety of essential materials throughout their neurite networks for their survival and function. While numerical methods like isogeometric analysis (IGA) have been used for modeling the material transport process via solving partial differential equations (PDEs), they require long computation time and huge computation resources to ensure accurate geometry representation and solution, thus limiting their biomedical application. Here we present a graph neural network (GNN)-based deep learning model to learn the IGA-based material transport simulation and provide fast material concentration prediction within neurite networks of any topology. Given input boundary conditions and geometry configurations, the well-trained model can predict the dynamical concentration change during the transport process with an average error of less than 10% and 120 to 330 times faster than IGA simulations. The effectiveness of the proposed model is demonstrated within several complex neurite networks.
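
The basic operation inside a GNN layer is neighbor aggregation over the graph. The sketch below shows one such round on a tiny branched graph, with a fixed mixing weight in place of learned parameters. It is only meant to show how information moves along graph edges; it is not the authors' model and not a transport solver. The graph `neurite`, the node names and `self_weight` are assumptions for this example.

```python
def message_passing_step(adjacency, features, self_weight=0.5):
    """One aggregation round: mix each node's value with the mean of its neighbors."""
    updated = {}
    for node, neighbors in adjacency.items():
        if neighbors:
            neighbor_mean = sum(features[n] for n in neighbors) / len(neighbors)
        else:
            neighbor_mean = features[node]   # isolated node keeps its own value
        updated[node] = (self_weight * features[node]
                         + (1.0 - self_weight) * neighbor_mean)
    return updated

# Hypothetical branched neurite: soma, one branch point, two tips.
neurite = {
    "soma": ["branch"],
    "branch": ["soma", "tip_a", "tip_b"],
    "tip_a": ["branch"],
    "tip_b": ["branch"],
}

# Assumed initial condition: material is present only at the soma.
concentration = {"soma": 1.0, "branch": 0.0, "tip_a": 0.0, "tip_b": 0.0}

after_one = message_passing_step(neurite, concentration)
after_two = message_passing_step(neurite, after_one)
```

After one round only the branch point has received signal from the soma; the tips see it after the second round. This is why stacked layers are needed for information to travel across a larger network, and why the same update rule applies to a graph of any topology.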

Deep Learning Model Selection of Suboptimal Complexity

Автоматика и телемеханика ◽

10.31857/s000523100001252-1 ◽

2018 ◽

pp. 129-147

Author(s):

Oleg Bakhteev ◽

Vadim Strijov

Keyword(s):

Deep Learning ◽

Model Selection ◽

Learning Model ◽

Deep Learning Model ◽

Selection Of

Improving Sentiment Analysis using Hybrid Deep Learning Model

Recent Advances in Computer Science and Communications ◽

10.2174/2213275912666190328200012 ◽

2020 ◽

Vol 13 (4) ◽

pp. 627-640 ◽

Cited By ~ 1

Author(s):

Avinash Chandra Pandey ◽

Dharmveer Singh Rajpoot

Keyword(s):

Neural Network ◽

Deep Learning ◽

Sentiment Analysis ◽

Classification Accuracy ◽

Short Term Memory ◽

Computational Cost ◽

Extraction Process ◽

Learning Model ◽

Sentiment Classification ◽

Deep Learning Model

Background: Sentiment analysis is the contextual mining of text to determine users' viewpoints on topics commonly discussed on social networking websites. Twitter is one of the social sites where people express their opinions about any topic in the form of tweets. These tweets can be examined using various sentiment classification methods to find the opinions of users. Traditional sentiment analysis methods use manually extracted features for opinion classification. The manual feature extraction process is a complicated task since it requires predefined sentiment lexicons. On the other hand, deep learning methods automatically extract relevant features from data; hence, they provide better performance and richer representation competency than the traditional methods. Objective: The main aim of this paper is to enhance the sentiment classification accuracy and to reduce the computational cost. Method: To achieve the objective, a hybrid deep learning model based on a convolutional neural network and a bi-directional long short-term memory neural network has been introduced. Results: The proposed sentiment classification method achieves the highest accuracy for most of the datasets. Further, the efficacy of the proposed method has been validated by statistical analysis. Conclusion: Sentiment classification accuracy can be improved by creating veracious hybrid models. Moreover, performance can also be enhanced by tuning the hyperparameters of deep learning models.

Deep Learning Model Comparison for Vision-Based Classification of Full/Empty-Load Trucks in Earthmoving Operations

Applied Sciences ◽

10.3390/app9224871 ◽

2019 ◽

Vol 9 (22) ◽

pp. 4871 ◽

Cited By ~ 4

Author(s):

Quan Liu ◽

Chen Feng ◽

Zida Song ◽

Joseph Louis ◽

Jian Zhou

Keyword(s):

Deep Learning ◽

Model Comparison ◽

Surveillance Systems ◽

Comparison Study ◽

Learning Models ◽

The Core ◽

Dump Trucks ◽

Deep Learning Model ◽

Contact Field

Earthmoving is an integral civil engineering operation of significance, and tracking its productivity requires statistics on the loads moved by dump trucks. Since current methods for gathering truck-load statistics are laborious, costly, and limited in application, this paper presents the framework of a novel, automated, non-contact field earthmoving quantity statistics (FEQS) for projects with large earthmoving demands that use uniform and uncovered trucks. The proposed FEQS framework utilizes field surveillance systems and adopts vision-based deep learning for full/empty-load truck classification as the core work. Since convolutional neural network (CNN) and its transfer learning (TL) forms are popular vision-based deep learning models and numerous in type, a comparison study is conducted to test the feasibility of the framework’s core work and evaluate the performance of different deep learning models in implementation. The comparison study involved 12 CNN or CNN-TL models in full/empty-load truck classification, and the results revealed that while several provided satisfactory performance, the VGG16-FineTune provided the optimal performance. This demonstrated the feasibility of the core work of the proposed FEQS framework. Further discussion suggests that CNN-TL models are more feasible than CNN prototypes, and that models adopting different TL methods have advantages in either working accuracy or speed for different tasks.
