SCLpred-EMS: subcellular localization prediction of endomembrane system and secretory pathway proteins by Deep N-to-1 Convolutional Neural Networks

Abstract Motivation: The subcellular location of a protein can provide useful information for protein function prediction and drug design. Experimentally determining the subcellular location of a protein is expensive and time-consuming. Therefore, various computer-based tools, mostly using machine learning algorithms, have been developed to predict the subcellular location of proteins. Results: Here, we present a neural network-based algorithm for protein subcellular location prediction. We introduce SCLpred-EMS, a subcellular localization predictor powered by an ensemble of Deep N-to-1 Convolutional Neural Networks. SCLpred-EMS classifies a protein into one of two classes, the endomembrane system and secretory pathway versus all others, with a Matthews correlation coefficient of 0.75–0.86, outperforming the other state-of-the-art web servers we tested. Availability and implementation: SCLpred-EMS is freely available for academic users at http://distilldeep.ucd.ie/SCLpred2/.
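The Matthews correlation coefficient reported in this abstract summarises a two-class confusion matrix in a single value between -1 and 1. A minimal sketch of the standard formula follows; the counts in the usage line are made-up illustration values, not results from the paper, and returning 0.0 for an undefined denominator is a common convention assumed here.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    tp/tn/fp/fn are counts of true positives, true negatives,
    false positives and false negatives.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when a row or column of the
    # confusion matrix is empty and the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts, e.g. "endomembrane/secretory" as the positive class.
example_mcc = matthews_corrcoef(tp=90, tn=85, fp=15, fn=10)
```

Unlike plain accuracy, the coefficient uses all four cells of the confusion matrix, which is why it is often preferred when the two classes differ in size.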

MirLocPredictor: A ConvNet-Based Multi-Label MicroRNA Subcellular Localization Predictor by Incorporating k-Mer Positional Information

Genes ◽

10.3390/genes11121475 ◽

2020 ◽

Vol 11 (12) ◽

pp. 1475

Author(s):

Muhammad Nabeel Asim ◽

Muhammad Imran Malik ◽

Christoph Zehe ◽

Johan Trygg ◽

Andreas Dengel ◽

...

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Convolutional Neural Networks ◽

Noncoding Rna ◽

State Of The Art ◽

Subcellular Location ◽

Positional Information ◽

Nucleotide Position ◽

Discriminative Power ◽

Rna Sequences

MicroRNAs (miRNAs) are small noncoding RNA sequences of about 22 nucleotides that are involved in the regulation of almost 60% of mammalian genes. Presently, there are very few approaches for visualizing miRNA locations inside cells to support the elucidation of the pathways and mechanisms behind miRNA function, transport, and biogenesis. MIRLocator, a state-of-the-art tool for predicting the subcellular localization of miRNAs, makes use of a sequence-to-sequence model along with pretrained k-mer embeddings. Existing methodologies for generating pretrained k-mer embeddings focus on extracting the semantics of k-mers. However, in RNA sequences, the positional information of nucleotides is more important because distinct positions of the four nucleotides define the function of an RNA molecule. Considering the importance of nucleotide position, we propose a novel approach (kmerPR2vec) that fuses positional information of k-mers with randomly initialized neural k-mer embeddings. In contrast to existing k-mer-based representations, the proposed kmerPR2vec representation is much richer in semantic information and has more discriminative power. Using the novel kmerPR2vec representation, we further present an end-to-end system (MirLocPredictor) that couples the discriminative power of kmerPR2vec with Convolutional Neural Networks (CNNs) for miRNA subcellular location prediction. The effectiveness of the proposed kmerPR2vec approach is evaluated with deep learning-based topologies (i.e., Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs)) and with 9 different evaluation measures. Analysis of the results reveals that MirLocPredictor outperforms state-of-the-art methods by a significant margin of 18% and 19% in terms of precision and recall.
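The idea of combining k-mer identity with k-mer position can be sketched as follows. This is not the kmerPR2vec implementation: the abstract does not say how the two sources are fused, so element-wise addition of a randomly initialised k-mer vector and a randomly initialised position vector is an assumption, and all function names are hypothetical.

```python
import random
from itertools import product


def positional_kmers(seq, k=3):
    """Return (position, k-mer) pairs for all overlapping k-mers."""
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    return [(i, seq[i:i + k]) for i in range(len(seq) - k + 1)]


def build_tables(k, max_len, dim, seed=0):
    """Randomly initialised lookup tables (untrained, illustration only)."""
    rng = random.Random(seed)
    kmer_table = {
        "".join(p): [rng.uniform(-1, 1) for _ in range(dim)]
        for p in product("ACGU", repeat=k)
    }
    pos_table = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(max_len)]
    return kmer_table, pos_table


def encode(seq, kmer_table, pos_table, k=3):
    """Fuse k-mer and position vectors by addition (assumed fusion rule)."""
    return [
        [a + b for a, b in zip(kmer_table[kmer], pos_table[i])]
        for i, kmer in positional_kmers(seq, k)
    ]


kmer_table, pos_table = build_tables(k=3, max_len=30, dim=4)
vectors = encode("UGAGGUAGUAGG", kmer_table, pos_table)
```

In this sketch the same k-mer occurring at two positions receives two different vectors, which is the property the abstract argues a purely k-mer-based embedding lacks. In a real model both tables would be trained jointly with the classifier, and sequences containing symbols other than A, C, G, U would need separate handling.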

Protein subcellular localization based on deep image features and criterion learning strategy

Briefings in Bioinformatics ◽

10.1093/bib/bbaa313 ◽

2020 ◽

Author(s):

Ran Su ◽

Linlin He ◽

Tianling Liu ◽

Xiaofeng Liu ◽

Leyi Wei

Keyword(s):

Neural Networks ◽

Subcellular Localization ◽

Learning Strategy ◽

Subcellular Location ◽

Image Features ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Protein Functions ◽

Deep Image ◽

Criterion Learning

Abstract The spatial distribution of the proteome at the subcellular level provides clues to protein function and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at the subcellular level. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is a challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. The criterion used to determine the final label set was obtained automatically during the learning procedure. We identified an optimal CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features yield more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.

Convolutional neural networks with image representation of amino acid sequences for protein function prediction

Computational Biology and Chemistry ◽

10.1016/j.compbiolchem.2021.107494 ◽

2021 ◽

Vol 92 ◽

pp. 107494

Author(s):

Samia Tasnim Sara ◽

Md Mehedi Hasan ◽

Ahsan Ahmad ◽

Swakkhar Shatabda

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Convolutional Neural Networks ◽

Protein Function ◽

Protein Function Prediction ◽

Image Representation ◽

Function Prediction ◽

Amino Acid Sequences

Predicting Human Protein Function with Multi-task Deep Neural Networks

10.1101/256420 ◽

2018 ◽

Cited By ~ 1

Author(s):

Rui Fa ◽

Domenico Cozzetto ◽

Cen Wan ◽

David T. Jones

Keyword(s):

Machine Learning ◽

Neural Networks ◽

Deep Learning ◽

Protein Function ◽

Deep Neural Networks ◽

Protein Function Prediction ◽

Function Prediction ◽

Machine Learning Algorithms ◽

Medium Size ◽

Prediction Ability

Abstract Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck that supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers on top of which are stacked, in parallel, as many independent modules (additional hidden layers with their own output units) as there are output GO terms (the tasks). MTDNN learns individual tasks partly from shared representations and partly from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN; medium-size models provide more improvement in our case. One of the advantages of MTDNN is that, given a set of features, it does not require a bootstrap feature selection procedure as traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance prediction ability.
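The layout described in this abstract, shared upstream layers feeding one independent module per GO term, can be sketched as a forward pass. This is an illustration under assumptions, not the authors' model: the layer sizes, the tanh and sigmoid activations, the random untrained weights, and the class name `MultiTaskNet` are all hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases for one dense layer."""
    weights = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(inputs, weights, biases, activation):
    """Fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


class MultiTaskNet:
    """One shared layer followed by an independent head per task."""

    def __init__(self, n_features, n_shared, n_task_hidden, n_tasks, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(rng, n_features, n_shared)
        # Each head has its own hidden layer and its own single output unit.
        self.heads = [
            (init_layer(rng, n_shared, n_task_hidden),
             init_layer(rng, n_task_hidden, 1))
            for _ in range(n_tasks)
        ]

    def forward(self, features):
        shared_repr = dense(features, *self.shared, math.tanh)
        outputs = []
        for hidden, out in self.heads:
            task_repr = dense(shared_repr, *hidden, math.tanh)
            outputs.append(dense(task_repr, *out, sigmoid)[0])
        return outputs  # one probability-like score per task (GO term)


net = MultiTaskNet(n_features=8, n_shared=4, n_task_hidden=3, n_tasks=5)
scores = net.forward([0.1] * 8)
```

Because every head reads the same `shared_repr`, gradients from all tasks would update the shared layer during training, while each head's own weights would be updated only by its own task.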

Advances in the Prediction of Protein Subcellular Locations with Machine Learning

Current Bioinformatics ◽

10.2174/1574893614666181217145156 ◽

2019 ◽

Vol 14 (5) ◽

pp. 406-421 ◽

Cited By ~ 3

Author(s):

Ting-He Zhang ◽

Shao-Wu Zhang

Keyword(s):

Machine Learning ◽

Feature Fusion ◽

Protein Sequences ◽

Subcellular Location ◽

Automated Analysis ◽

Cellular Level ◽

Machine Learning Algorithms ◽

Feature Representation ◽

Protein Subcellular Location ◽

Protein Subcellular Locations

Background: Revealing the subcellular location of a newly discovered protein can bring insight into its function and guide research at the cellular level. The experimental methods currently used to identify protein subcellular locations are both time-consuming and expensive. Thus, computational methods for identifying protein subcellular locations efficiently and effectively are highly desirable. In particular, the rapidly increasing number of protein sequences entering the genome databases calls for the development of automated analysis methods. Methods: In this review, we describe recent advances in predicting protein subcellular locations with machine learning from the following aspects: i) construction of protein subcellular location benchmark datasets, ii) protein feature representation and feature descriptors, iii) common machine learning algorithms, iv) cross-validation test methods and assessment metrics, v) web servers. Result & Conclusion: Given the large number of protein sequences generated by high-throughput technologies, four future directions for predicting protein subcellular locations with machine learning deserve attention. The first is the selection of novel and effective features (e.g., statistical, physicochemical, evolutionary) from the sequences and structures of proteins. The second is the feature fusion strategy. The third is the design of a powerful predictor, and the fourth is the prediction of proteins with multiple location sites.

Integrating Second-order Moving Average and Over-sampling Algorithm to Predict Apoptosis Protein Subcellular Localization

Current Bioinformatics ◽

10.2174/1574893614666190902155811 ◽

2020 ◽

Vol 15 (6) ◽

pp. 517-527

Author(s):

Yunyun Liang ◽

Shengli Zhang

Keyword(s):

Subcellular Localization ◽

Moving Average ◽

Subcellular Location ◽

Second Order ◽

Test Method ◽

Support Vector ◽

Protein Subcellular Localization ◽

Protein Subcellular Location ◽

Apoptosis Protein ◽

Leibler Divergence

Background: Apoptosis proteins have a key role in the development and homeostasis of the organism, and are very important for understanding the mechanism of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location by using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. The support vector machine is adopted as the classifier, and the bias-free jackknife test method is used to evaluate the accuracy. Results: Our prediction system achieves favorable and promising overall accuracy on the three datasets and also outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.

SCLpred‐MEM : subcellular localization prediction of membrane proteins by Deep N‐to‐1 Convolutional Neural Networks

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.26144 ◽

2021 ◽

Author(s):

Manaz Kaleel ◽

Liam Ellinger ◽

Clodagh Lalor ◽

Gianluca Pollastri ◽

Catherine Mooney

Keyword(s):

Neural Networks ◽

Membrane Proteins ◽

Subcellular Localization ◽

Convolutional Neural Networks ◽

Subcellular Localization Prediction ◽

Localization Prediction

Hierarchical multi-label classification for protein function prediction: A local approach based on neural networks

2011 11th International Conference on Intelligent Systems Design and Applications ◽

10.1109/isda.2011.6121678 ◽

2011 ◽

Cited By ~ 7

Author(s):

Ricardo Cerri ◽

Rodrigo C. Barros ◽

Andre C. P. L. F. de Carvalho

Keyword(s):

Neural Networks ◽

Protein Function ◽

Protein Function Prediction ◽

Function Prediction ◽

Local Approach

Incorporating label correlations into deep neural networks to classify protein subcellular location patterns in immunohistochemistry images

Proteins Structure Function and Bioinformatics ◽

10.1002/prot.26244 ◽

2021 ◽

Author(s):

Jin‐Xian Hu ◽

Yang Yang ◽

Ying‐Ying Xu ◽

Hong‐Bin Shen

Keyword(s):

Neural Networks ◽

Deep Neural Networks ◽

Subcellular Location ◽

Protein Subcellular Location ◽

Location Patterns ◽

Label Correlations

EnzyNet: enzyme classification using 3D convolutional neural networks on spatial representation

PeerJ ◽

10.7717/peerj.4750 ◽

2018 ◽

Vol 6 ◽

pp. e4750 ◽

Cited By ~ 24

Author(s):

Afshine Amidi ◽

Shervine Amidi ◽

Dimitrios Vlachakis ◽

Vasileios Megalooikonomou ◽

Nikos Paragios ◽

...

Keyword(s):

Neural Networks ◽

Amino Acid ◽

Convolutional Neural Networks ◽

Protein Function ◽

Enzyme Commission Number ◽

Data Bank ◽

Biochemical Properties ◽

Data Availability ◽

Binary Representation ◽

Enzymatic Function

During the past decade, with the significant progress of computational power as well as ever-rising data availability, deep learning techniques have become increasingly popular due to their excellent performance on computer vision problems. The size of the Protein Data Bank (PDB) has increased more than 15-fold since 1999, which has enabled the expansion of models that aim at predicting enzymatic function from amino acid composition. Amino acid sequence, however, is less conserved in nature than protein structure and is therefore considered a less reliable predictor of protein function. This paper presents EnzyNet, a novel 3D convolutional neural network classifier that predicts the Enzyme Commission number of enzymes based only on their voxel-based spatial structure. The spatial distribution of biochemical properties was also examined as complementary information. The two-layer architecture was investigated on a large dataset of 63,558 enzymes from the PDB and achieved an accuracy of 78.4% by exploiting only the binary representation of the protein shape. Code and datasets are available at https://github.com/shervinea/enzynet.
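A binary voxel representation of protein shape, as referred to in this abstract, can be sketched by mapping 3D coordinates onto a cubic occupancy grid. This is an illustration only: the centring and scaling rule, the grid size, and the function name `voxelize` are assumptions, and the preprocessing in the EnzyNet repository may differ.

```python
def voxelize(coords, grid_size=32):
    """Map 3D points to a grid_size^3 binary occupancy grid.

    Assumed scheme: centre the points on their centroid, then scale so the
    largest absolute coordinate reaches the edge of the grid.
    """
    n = len(coords)
    centroid = [sum(c[axis] for c in coords) / n for axis in range(3)]
    centred = [[c[axis] - centroid[axis] for axis in range(3)] for c in coords]
    # Fall back to 1.0 when all points coincide, to avoid dividing by zero.
    radius = max(max(abs(v) for v in point) for point in centred) or 1.0
    half = grid_size / 2
    grid = [[[0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    for point in centred:
        # Normalised coordinate in [-1, 1] -> voxel index in [0, grid_size - 1].
        idx = [min(grid_size - 1, int(v / radius * half + half)) for v in point]
        grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


# Hypothetical coordinates standing in for backbone atom positions.
toy_coords = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
toy_grid = voxelize(toy_coords, grid_size=8)
occupied = sum(v for plane in toy_grid for row in plane for v in row)
```

The resulting grid is the kind of input a 3D convolutional network consumes. Additional channels, for example per-voxel biochemical properties, could be stacked alongside the binary occupancy channel.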
