Design powerful predictor for mRNA subcellular location prediction in Homo sapiens

Author(s):  
Zhao-Yue Zhang ◽  
Yu-He Yang ◽  
Hui Ding ◽  
Dong Wang ◽  
Wei Chen ◽  
...  

Abstract Messenger RNAs (mRNAs) transmit the genetic code from DNA to discrete locations in the cytoplasm. The localization of mRNA may provide spatial and temporal regulation of mRNA and protein functions. In situ hybridization and quantitative transcriptomics analysis can provide detailed information about mRNA subcellular localization; however, they are time-consuming and expensive. Computational tools that predict mRNA subcellular location quickly and effectively are therefore highly desirable. In this work, the optimal nonamer composition for representing mRNA sequences was obtained by using the binomial distribution and one-way analysis of variance. Subsequently, a predictor based on a support vector machine was developed to identify mRNA subcellular localization. In 5-fold cross-validation, the accuracy was 90.12% for Homo sapiens (H. sapiens). The predictor may provide a reference for the study of mRNA localization mechanisms and mRNA translocation strategies. An online web server based on our models is available at http://lin-group.cn/server/iLoc-mRNA/.
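The nonamer composition mentioned in this abstract can be illustrated with a minimal sketch. It assumes the features are normalized frequencies of overlapping k-mers (k = 9 for nonamers); the function name `kmer_composition` is hypothetical, and the binomial-distribution and ANOVA feature selection described by the authors is not reproduced here.

```python
from collections import Counter


def kmer_composition(seq, k=9):
    """Return normalized frequencies of overlapping k-mers in a sequence.

    Illustrative only: this is not the authors' feature-selection pipeline.
    """
    seq = seq.upper()
    total = len(seq) - k + 1  # number of overlapping windows of length k
    if total <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(total))
    return {kmer: n / total for kmer, n in counts.items()}


# Small demonstration with k=3 so the output stays readable.
comp = kmer_composition("ACGACG", k=3)
```

In practice such a dictionary would be mapped onto a fixed-length vector (one entry per selected k-mer) before being passed to a classifier.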

2020 ◽  
Vol 15 (6) ◽  
pp. 517-527
Author(s):  
Yunyun Liang ◽  
Shengli Zhang

Background: Apoptosis proteins play a key role in the development and homeostasis of the organism and are very important for understanding the mechanism of cell proliferation and death. The function of an apoptosis protein is closely related to its subcellular location. Objective: Prediction of apoptosis protein subcellular localization is a meaningful task. Methods: In this study, we predict apoptosis protein subcellular location by using a PSSM-based second-order moving average descriptor, nonnegative matrix factorization based on Kullback-Leibler divergence, and over-sampling algorithms. The model is named SOMAPKLNMF-OS and is constructed on the ZD98, ZW225 and CL317 benchmark datasets. A support vector machine is adopted as the classifier, and the bias-free jackknife test is used to evaluate accuracy. Results: Our prediction system achieves promising overall accuracy on the three datasets and outperforms the other listed models. Conclusion: The results show that our model offers a high-throughput tool for the identification of apoptosis protein subcellular localization.


2010 ◽  
Vol 20 (01) ◽  
pp. 13-28 ◽  
Author(s):  
YANG YANG ◽  
BAO-LIANG LU

Prediction of protein subcellular localization is an important issue in computational biology because it provides important clues for the characterization of protein functions. Currently, much research has been dedicated to developing automatic prediction tools. Most, however, focus on mono-locational proteins, i.e., they assume that proteins exist in only one location. It should be noted that many proteins bear multi-locational characteristics and carry out crucial functions in biological processes. This work aims to develop a general pattern classifier for predicting multiple subcellular locations of proteins. We use an ensemble classifier, called the min-max modular support vector machine (M3-SVM), to solve protein subcellular multi-localization problems, and propose a module decomposition method based on gene ontology (GO) semantic information for M3-SVM. The amino acid composition with secondary structure and solvent accessibility information is adopted to represent features of protein sequences. We apply our method to two multi-locational protein data sets. The M3-SVMs show higher accuracy and efficiency than traditional SVMs using the same feature vectors. The GO decomposition also helps to improve prediction accuracy. Moreover, our method has a much higher rate of accuracy than existing subcellular localization predictors in predicting protein multi-localization.
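The combination step of a min-max modular classifier can be sketched as follows. The sketch assumes the usual formulation in which a two-class problem is split into sub-problems, each trained on one positive subset against one negative subset; module outputs sharing a positive subset are merged with MIN, and the results are merged with MAX. The function name and the score values are hypothetical, and neither the SVM training nor the GO-based decomposition is shown.

```python
def min_max_combine(scores):
    """Combine module outputs with the min-max rule.

    scores[i][j] is the output of the module trained on positive subset i
    versus negative subset j (hypothetical values in the demo below).
    """
    # MIN over negative subsets for each positive subset, then MAX over positives.
    return max(min(row) for row in scores)


# Two positive subsets and two negative subsets give four module outputs.
module_scores = [
    [0.9, 0.2],
    [0.7, 0.6],
]
combined = min_max_combine(module_scores)
```

A sample is accepted as positive only if, for at least one positive subset, every module involving that subset votes for it, which is what the nested `min` and `max` express.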


2016 ◽  
Vol 12 (8) ◽  
pp. 2572-2586 ◽  
Author(s):  
Anamika Thakur ◽  
Akanksha Rajput ◽  
Manoj Kumar

Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth.


BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hao Wang ◽  
Yijie Ding ◽  
Jijun Tang ◽  
Quan Zou ◽  
Fei Guo

Abstract Background: The biological functions of biomolecules depend on the cellular compartments in which they are located. RNAs are localized to specific regions of a cell, enabling the cell to carry out diverse biochemical processes concurrently. However, many existing RNA subcellular localization classifiers only solve the problem of single-label classification. It is of great practical significance to extend RNA subcellular localization to a multi-label classification problem. Results: In this study, we extract multi-label classification datasets of RNA-associated subcellular localizations for various types of RNAs, and then construct subcellular localization datasets for four RNA categories. In order to study Homo sapiens, we further establish human RNA subcellular localization datasets. Furthermore, we use different nucleotide property composition models to extract effective features that adequately represent the important information in nucleotide sequences. In the most critical step, we address the major challenge of fusing the multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion. The optimal combined kernel is then fed into an integrated support vector machine model for identifying multi-label RNA subcellular localizations. Our method obtained average precision values of 0.703, 0.757, 0.787, and 0.800 on the four RNA datasets, respectively. Conclusion: Our method outperforms other prediction tools on the new benchmark datasets. Moreover, we establish a user-friendly web server implementing our method.
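For orientation, the quantity underlying the Hilbert-Schmidt independence criterion can be sketched in a few lines. The sketch assumes the standard biased empirical estimator, trace(KHLH) / (n - 1)^2 with centering matrix H = I - (1/n)11^T, applied to two precomputed kernel matrices. It does not reproduce the authors' multiple kernel learning, which would optimize kernel weights on top of such a measure; the function names are hypothetical.

```python
def center(K):
    """Double-center a square kernel matrix, i.e. compute H K H."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + grand for j in range(n)]
            for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC between kernel matrices K and L."""
    n = len(K)
    Kc = center(K)
    # trace(K H L H) equals trace((H K H) L) by cyclic permutation.
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Linear kernel on the toy values x = [1, 2, 3].
x = [1.0, 2.0, 3.0]
K = [[a * b for b in x] for a in x]
dependence = hsic(K, K)
```

A larger value indicates stronger statistical dependence between the two kernel-induced representations; a constant kernel matrix yields zero.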


Author(s):  
Ran Su ◽  
Linlin He ◽  
Tianling Liu ◽  
Xiaofeng Liu ◽  
Leyi Wei

Abstract The spatial distribution of the proteome at subcellular levels provides clues to protein functions and is thus important to human biology and medicine. Imaging-based methods are among the most important approaches for predicting protein subcellular location. Although deep neural networks have shown impressive performance in a number of imaging tasks, their application to protein subcellular localization has not been sufficiently explored. In this study, we developed a deep imaging-based approach to localize proteins at subcellular levels. Based on deep image features extracted from convolutional neural networks (CNNs), both single-label and multi-label locations can be accurately predicted. Multi-label prediction in particular is a challenging task. Here we developed a criterion learning strategy to exploit the label–attribute relevancy and label–label relevancy. A criterion used to determine the final label set was obtained automatically during the learning procedure. We identified a CNN architecture that gave the best results. In addition, experiments show that, compared with hand-crafted features, the deep features yield more accurate predictions with fewer features. The implementation of the proposed method is available at https://github.com/RanSuLab/ProteinSubcellularLocation.


2021 ◽  
Vol 2021 ◽  
pp. 1-14
Author(s):  
Danyu Jin ◽  
Ping Zhu

The prediction of protein subcellular localization is not only important for the study of protein structure and function but can also facilitate the design and development of new drugs. In recent years, feature extraction methods based on protein evolution information have attracted much attention and made good progress. Based on the protein position-specific score matrix (PSSM) obtained by PSI-BLAST, the PSSM-GSD method is proposed according to the data distribution characteristics. In order to reflect the protein sequence information as fully as possible, the AAO, PSSM-AAO, and PSSM-GSD methods are fused together. Then, a conditional entropy-based classifier chain algorithm and a support vector machine are used to locate multilabel proteins. Finally, we test on the Gpos-mPLoc and Gneg-mPLoc datasets and, considering the severe imbalance of the data, use the SMOTE algorithm to expand the minority-class samples. The experiments show that the AAO + PSSM ∗ method in the paper achieves 83.1% and 86.8% overall accuracy, respectively. Experimental comparison with other methods shows that AAO + PSSM ∗ performs well and can effectively predict protein subcellular location.
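The SMOTE over-sampling step mentioned above can be sketched as follows. This is a simplified illustration of the general idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' configuration; the function name, parameter defaults, and toy data are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class vectors.

    Simplified sketch: requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by squared Euclidean distance.
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)),
        )[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


# Toy minority class with three two-dimensional samples.
minority_samples = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote(minority_samples, n_new=5)
```

Each synthetic point lies on the line segment between two existing minority samples, so the minority class is enlarged without duplicating samples exactly.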


2018 ◽  
Vol 1 (1) ◽  
pp. 120-130 ◽  
Author(s):  
Chunxiang Qian ◽  
Wence Kang ◽  
Hao Ling ◽  
Hua Dong ◽  
Chengyao Liang ◽  
...  

A support vector machine (SVM) model optimized by K-fold cross-validation was built to predict and evaluate the degradation of concrete strength in a complicated marine environment. Several other models, such as an artificial neural network (ANN) and a decision tree (DT), were also built and compared with the SVM to determine which could make the most accurate predictions. Both material factors and environmental factors that influence the results were considered. The material factors mainly involved the original concrete strength and the amounts of cement replaced by fly ash and slag. The environmental factors consisted of the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, temperature, and exposure time. The prediction results indicated that the optimized SVM model performed better than the other models in predicting concrete strength. Based on the SVM model, a simulation method of variables limitation was used to determine the sensitivity of the various factors and their degree of influence on the degradation of concrete strength.
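The K-fold cross-validation used for model selection here can be sketched as an index-splitting routine. The sketch assumes contiguous, unshuffled folds for clarity (real experiments usually shuffle first); the function name is hypothetical and no model is trained.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, test) index pairs."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Ten samples split into three folds.
folds = k_fold_indices(10, 3)
```

Each sample appears in exactly one test fold, and a candidate hyperparameter setting would be scored by averaging the model's error over the k test folds.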


2014 ◽  
Vol 26 (01) ◽  
pp. 1450002 ◽  
Author(s):  
Hanguang Xiao

Early detection and intervention of artery stenosis are very important for reducing the mortality of cardiovascular disease. A novel method for predicting artery stenosis was proposed that uses the input impedance of the systemic arterial tree and a support vector machine (SVM). Based on a transmission line model of a 55-segment systemic arterial tree, the input impedance of the arterial tree was calculated by using a recursive algorithm. A sample database of the input impedance was established by specifying different positions and degrees of artery stenosis. An SVM prediction model was trained by using the sample database. 10-fold cross-validation was used to evaluate the performance of the SVM. The effects of stenosis position and degree on the accuracy of the prediction were discussed. The results showed that the mean specificity, sensitivity and overall accuracy of the SVM are 80.2%, 98.2% and 89.2%, respectively, for the 50% threshold of stenosis degree. Increasing the threshold of the stenosis degree from 10% to 90% increases the overall accuracy from 82.2% to 97.4%. Increasing the distance of the stenosed artery from the heart gradually decreases the overall accuracy from 97.1% to 58%. The deterioration of the stenosis degree to 90% increases the prediction accuracy of the SVM to more than 90% for stenosis of a peripheral artery. The simulation demonstrated theoretically the feasibility of the proposed method for predicting artery stenosis via the input impedance of the systemic arterial tree and SVM.
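The three reported metrics are related in a simple way, which the sketch below illustrates. The confusion-matrix counts are hypothetical, chosen only so that sensitivity and specificity match the reported 98.2% and 80.2%; with equally sized positive and negative classes the overall accuracy is then their mean, 89.2%, which is consistent with the abstract, although the abstract does not state the class sizes.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)  # fraction of stenosis cases detected
    specificity = tn / (tn + fp)  # fraction of healthy cases recognised
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 1000 positive and 1000 negative samples.
sens, spec, acc = binary_metrics(tp=982, fn=18, tn=802, fp=198)
```

With unbalanced classes the accuracy would instead be a weighted mean of sensitivity and specificity, weighted by class prevalence.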


2007 ◽  
Vol 292 (5) ◽  
pp. C1971-C1981 ◽  
Author(s):  
Emily Cordas ◽  
Anikó Náray-Fejes-Tóth ◽  
Géza Fejes-Tóth

Serum- and glucocorticoid-induced kinase-1 (SGK1) is involved in aldosterone-induced Na+ reabsorption by increasing epithelial Na+ channel (ENaC) activity in cortical collecting duct (CCD) cells, but its exact mechanisms of action are unknown. Although several potential targets such as Nedd4-2 have been described in expression systems, endogenous substrates mediating SGK1's physiological effects remain to be identified. In addition, subcellular localization studies of SGK1 have provided controversial results. We determined the subcellular location of SGK1 using SGK1-autofluorescent protein (AFP) fusion proteins. Rabbit CCD (RCCT-28A) cells were transiently transfected with a construct encoding SGK1-AFP and were stained or cotransfected with markers for various subcellular compartments. In live cells, transiently expressed SGK1-AFP clearly colocalized with the mitochondrial marker rhodamine 123. Similarly, SGK1-AFP colocalized with the mitochondrial marker MitoTracker when stably expressed using a retroviral system in either RCCT-28A cells or the mammary epithelial cell line MCF10A. To determine which region of SGK1 is responsible for this subcellular localization, we generated RCCT-28A cell lines stably expressing SGK1 mutants. The results indicate that the NH2-terminal 60-amino acid region of SGK1 is necessary and sufficient for its subcellular localization. Localization of SGK1 to the mitochondria raises the possibility that SGK1 may play a role in regulating energy metabolism.

