Identify RNA-associated subcellular localizations based on multi-label learning using Chou’s 5-steps rule

BMC Genomics ◽  
2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hao Wang ◽  
Yijie Ding ◽  
Jijun Tang ◽  
Quan Zou ◽  
Fei Guo

Abstract Background: The biological functions of biomolecules depend on the cellular compartments in which they are located. In particular, RNAs are localized to specific regions of a cell, which allows the cell to carry out diverse biochemical processes concurrently. However, many existing RNA subcellular localization classifiers address only single-label classification. Extending RNA subcellular localization to a multi-label classification problem is therefore of great practical significance. Results: In this study, we extract multi-label classification datasets of RNA-associated subcellular localizations for various types of RNAs, and then construct subcellular localization datasets for four RNA categories. To study Homo sapiens, we further establish human RNA subcellular localization datasets. We then use different nucleotide property composition models to extract features that adequately represent the important information in nucleotide sequences. In the most critical step, we address the major challenge of fusing this multivariate information through multiple kernel learning based on the Hilbert-Schmidt independence criterion. The optimal combined kernel is then fed into an integration support vector machine model to identify multi-label RNA subcellular localizations. Our method obtained average precision values of 0.703, 0.757, 0.787, and 0.800 on the four RNA datasets, respectively. Conclusion: Our method outperforms other prediction tools on the novel benchmark datasets. Moreover, we establish a user-friendly web server implementing our method.
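As a rough illustration of the kernel-fusion idea in the abstract above, the sketch below computes the empirical Hilbert-Schmidt independence criterion (HSIC) between each candidate kernel and a label kernel, then normalizes the scores into combination weights. The weighting heuristic, the linear label kernel, the toy data, and all function names are assumptions made for illustration; the abstract does not describe the authors' actual optimization, so this should not be read as their method.

```python
def center(K):
    """Double-center a square kernel matrix, i.e. return H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic(K, L):
    """Biased empirical HSIC, trace(K H L H) / (n - 1)^2, for symmetric K and L."""
    n = len(K)
    Kc = center(K)
    return sum(Kc[i][j] * L[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def label_kernel(Y):
    """Linear kernel on binary label vectors (assumed choice): shared-label counts."""
    return [[sum(a * b for a, b in zip(yi, yj)) for yj in Y] for yi in Y]


def hsic_weights(kernels, Y):
    """Heuristic (not from the paper): weight each kernel by its HSIC with the labels."""
    L = label_kernel(Y)
    scores = [max(hsic(K, L), 0.0) for K in kernels]
    s = sum(scores)
    if s == 0:
        return [1.0 / len(kernels)] * len(kernels)
    return [x / s for x in scores]


def combine(kernels, weights):
    """Weighted sum of kernel matrices."""
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]


# Toy data: four samples, two labels; one informative kernel and one identity kernel.
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]
K_informative = label_kernel(Y)
K_identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
weights = hsic_weights([K_informative, K_identity], Y)
K_combined = combine([K_informative, K_identity], weights)
```

On this toy data the informative kernel receives the larger weight, and the combined matrix could then be passed to any SVM implementation that accepts a precomputed kernel.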

2019 ◽  
Vol 20 (S25) ◽  
Author(s):  
Zhengwei Li ◽  
Ru Nie ◽  
Zhuhong You ◽  
Chen Cao ◽  
Jiashu Li

Abstract Background: Interactions among proteins play crucial roles in most cellular processes. Despite the enormous effort devoted to identifying protein-protein interactions (PPIs) in a large number of organisms, existing experimental methods suffer from high cost, low efficiency, and high false-positive rates. In silico methods open new doors for predicting interactions among proteins and have attracted a great deal of attention in recent decades. Results: Here we present a novel computational model that combines our proposed Discriminative Vector Machine (DVM) model with a 2-Dimensional Principal Component Analysis (2DPCA) descriptor to identify candidate PPIs based only on protein sequences. More specifically, the 2DPCA descriptor captures discriminative feature information from the Position-Specific Scoring Matrix (PSSM) of amino acid sequences generated by PSI-BLAST. A robust and powerful DVM classifier is then employed to infer PPIs. When applied to the gold-standard benchmark datasets of Yeast and H. pylori, our model obtained mean prediction accuracies of 97.06% and 92.89%, respectively, a noticeable improvement over some state-of-the-art methods. Moreover, we constructed a Support Vector Machine (SVM) based predictive model and compared it with our model on a Human benchmark dataset. In addition, to further demonstrate the predictive reliability of our proposed method, we carried out extensive experiments to identify cross-species PPIs on five other species datasets. Conclusions: All the experimental results indicate that our method is very effective for identifying potential PPIs and could serve as a practical approach to aid biological experiments in proteomics research.


2010 ◽  
Vol 20 (01) ◽  
pp. 13-28 ◽  
Author(s):  
YANG YANG ◽  
BAO-LIANG LU

Prediction of protein subcellular localization is an important issue in computational biology because it provides important clues for the characterization of protein functions. Currently, much research has been dedicated to developing automatic prediction tools. Most, however, focus on mono-locational proteins, i.e., they assume that proteins exist in only one location. It should be noted that many proteins bear multi-locational characteristics and carry out crucial functions in biological processes. This work aims to develop a general pattern classifier for predicting multiple subcellular locations of proteins. We use an ensemble classifier, called the min-max modular support vector machine (M3-SVM), to solve protein subcellular multi-localization problems, and propose a module decomposition method for M3-SVM based on gene ontology (GO) semantic information. The amino acid composition, together with secondary structure and solvent accessibility information, is adopted to represent features of protein sequences. We apply our method to two multi-locational protein data sets. The M3-SVMs show higher accuracy and efficiency than traditional SVMs using the same feature vectors. The GO decomposition also helps to improve prediction accuracy. Moreover, our method is considerably more accurate than existing subcellular localization predictors in predicting protein multi-localization.
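The abstract above does not spell out how the modules of a min-max modular classifier are recombined. The sketch below assumes the usual min-max modular rule: for one two-class problem, the positive and negative training sets are each split into subsets, one module is trained per (positive subset, negative subset) pair, and the module outputs are merged with a MIN over negative subsets followed by a MAX over positive subsets. The function name and the scores are hypothetical.

```python
def min_max_combine(scores):
    """Combine module outputs for one test sample.

    scores[i][j] is the output of the module trained on positive subset i
    versus negative subset j. MIN is taken over j, then MAX over i.
    """
    return max(min(row) for row in scores)


# Hypothetical outputs from 2 positive subsets x 2 negative subsets.
module_scores = [
    [0.9, 0.2],  # positive subset 0 vs negative subsets 0 and 1
    [0.7, 0.6],  # positive subset 1 vs negative subsets 0 and 1
]
combined_score = min_max_combine(module_scores)
```

Under this rule a sample scores highly only if at least one positive subset "claims" it against every negative subset, which is what lets the small modules be trained independently.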


2022 ◽  
Vol 2022 ◽  
pp. 1-11
Author(s):  
Saiqiang Xia ◽  
Chaowei Zhang ◽  
Wanyong Cai ◽  
Jun Yang ◽  
Liangfa Hua ◽  
...  

For a conventional narrowband radar system, insufficient bandwidth usually limits the detectable information about the target, which makes it difficult for the radar to classify target types such as rotor helicopters, propeller aircraft, and jet aircraft. To address the classification problem for these three types of aircraft targets, this paper proposes a joint multifeature classification method based on the micro-Doppler effect that target micromotion induces in the echo. By analyzing the characteristics of simulated target echoes obtained from the target scattering point model, four clearly distinguishable features are extracted from the time domain and the frequency domain: flicker interval, fractal dimension, modulation bandwidth, and second central moment. A support vector machine model is then applied to classify the three types of aircraft. Compared with the conventional method, the proposed method has better classification performance and can significantly improve the classification probability of aircraft targets. Simulations are carried out to validate the effectiveness of the proposed method.


Author(s):  
Norsyela Muhammad Noor Mathivanan ◽  
Nor Azura Md.Ghani ◽  
Roziah Mohd Janor

Online business development through e-commerce platforms is a phenomenon that has changed how products are promoted and sold in the 21st century. Product title classification is an important task that helps retailers and sellers list a product in a suitable category. Product title classification is a part of the text classification problem, but the properties of product titles differ from those of general documents. This study aims to evaluate the performance of five supervised learning models on data sets consisting of e-commerce product titles, which are very short descriptions written as incomplete sentences. The supervised learning models involved in the study are Naïve Bayes, K-Nearest Neighbor (KNN), Decision Tree, Support Vector Machine (SVM) and Random Forest. The results show that the KNN model is the best model, with the highest accuracy and fastest computation time in classifying the data used in the study. Hence, the KNN model is a good approach for classifying e-commerce products.
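To make the KNN approach concrete for short product titles, here is a minimal sketch that represents each title as a bag of words and votes among the k most similar training titles by cosine similarity. The titles, categories, tokenization, and choice of k are all hypothetical; the study's actual features and settings are not given in the abstract.

```python
from collections import Counter
from math import sqrt


def vectorize(title):
    """Bag-of-words vector for a short title (lowercased, whitespace-tokenized)."""
    return Counter(title.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, title, k=3):
    """Majority vote among the k training titles most similar to `title`."""
    query = vectorize(title)
    ranked = sorted(train, key=lambda item: cosine(query, vectorize(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training titles with hypothetical categories.
train_titles = [
    ("wireless bluetooth headphones black", "electronics"),
    ("usb c charging cable 1m", "electronics"),
    ("bluetooth speaker portable", "electronics"),
    ("mens cotton t-shirt blue", "fashion"),
    ("womens running shoes size 7", "fashion"),
    ("cotton socks pack of 5", "fashion"),
]

prediction_a = knn_predict(train_titles, "portable bluetooth headphones")
prediction_b = knn_predict(train_titles, "blue cotton shoes")
```

Because titles are only a few tokens long, a single shared word can dominate the similarity, which is one reason short-title classification behaves differently from general document classification.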


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Hua Liu ◽  
Hua Yuan ◽  
Yongmei Wang ◽  
Weiwei Huang ◽  
Hui Xue ◽  
...  

Abstract Accumulating studies appear to suggest that the risk factors for venous thromboembolism (VTE) among young and middle-aged inpatients differ from those among elderly people. Therefore, current prediction models for VTE are not applicable to young and middle-aged inpatients. The aim of this study was to develop and externally validate a new prediction model for young and middle-aged people using machine learning methods. Clinical data sets from 167 inpatients with deep venous thrombosis (DVT) and/or pulmonary embolism (PE) and 406 patients without DVT or PE were compared and analysed with machine learning techniques. Five algorithms, namely logistic regression, decision tree, feed-forward neural network, support vector machine, and random forest, were used to train the models. The support vector machine model had the best performance, with AUC values of 0.806–0.944 (95% CI), 59% sensitivity, 99% specificity, and an accuracy of 87%. Although the top predictors of adverse outcomes differed between models, life-threatening illness, fibrinogen, RBCs, and PT were featured more consistently across the models. Clinical data sets of young and middle-aged inpatients can be used to accurately predict the risk of VTE with a support vector machine model.
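The relationship between the sensitivity, specificity, and accuracy figures quoted above can be checked with a small calculation. The confusion-matrix counts below are hypothetical: they are chosen only to match the reported cohort sizes (167 with VTE, 406 without), and the abstract does not state which data split the reported metrics were computed on.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Hypothetical counts: 99 + 68 = 167 positives, 402 + 4 = 406 negatives.
sens, spec, acc = binary_metrics(tp=99, fn=68, tn=402, fp=4)
```

With a negative class more than twice the size of the positive class, a model can reach high accuracy from specificity alone, which is why the modest sensitivity matters when reading the accuracy figure.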


2019 ◽  
Vol 37 (6) ◽  
pp. 1040-1058 ◽  
Author(s):  
Shuo Xu ◽  
Xin An

Purpose: Image classification is becoming a supporting technology in several image-processing tasks. Because of the rich semantic information contained in images, it is very common for an image to have several labels or tags. This paper aims to develop a novel multi-label classification approach with superior performance. Design/methodology/approach: Many multi-label classification problems share two main characteristics: label correlations and label imbalance. However, most current methods are devoted either to modeling label relationships or to dealing only with the imbalance problem using traditional single-label methods. In this paper, the multi-label classification problem is regarded as an unbalanced multi-task learning problem. The multi-task least-squares support vector machine (MTLS-SVM) is generalized for this problem and renamed multi-label LS-SVM (ML2S-SVM). Findings: Experimental results on the emotions, scene, yeast and bibtex data sets indicate that ML2S-SVM is competitive with state-of-the-art methods in terms of Hamming loss and instance-based F1 score. The values of the resulting parameters largely influence the performance of ML2S-SVM, so users need to identify proper parameters in advance. Originality/value: On the basis of MTLS-SVM, a novel multi-label classification approach, ML2S-SVM, is put forward. This method not only overcomes the imbalance problem but also explicitly models arbitrary-order correlations among labels by allowing multiple labels to share a subspace. In addition, the approach has a wide range of applications and is not limited to image classification.
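The two evaluation measures named in the findings above, Hamming loss and instance-based F1, can be sketched in a few lines. Labels are assumed to be binary indicator vectors, and scoring an instance with no true and no predicted labels as 1.0 is a convention chosen here, not something stated in the abstract.

```python
def hamming_loss(Y_true, Y_pred):
    """Fraction of instance-label pairs that are predicted incorrectly."""
    n, q = len(Y_true), len(Y_true[0])
    wrong = sum(t != p for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp))
    return wrong / (n * q)


def instance_f1(Y_true, Y_pred):
    """Mean over instances of 2|T ∩ P| / (|T| + |P|)."""
    total = 0.0
    for yt, yp in zip(Y_true, Y_pred):
        inter = sum(t * p for t, p in zip(yt, yp))
        denom = sum(yt) + sum(yp)
        # Assumed convention: empty truth and empty prediction count as a perfect match.
        total += 2 * inter / denom if denom else 1.0
    return total / len(Y_true)


# Toy example: two instances, three labels.
Y_true = [[1, 0, 1], [0, 1, 0]]
Y_pred = [[1, 0, 0], [0, 1, 1]]
hl = hamming_loss(Y_true, Y_pred)
f1 = instance_f1(Y_true, Y_pred)
```

Hamming loss treats every label slot equally, so on sparse label sets it can look small even when few labels are predicted correctly; instance-based F1 is less forgiving in that case.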


Author(s):  
ZHI-XIA YANG

In this paper, we propose two Laplacian nonparallel hyperplane proximal classifiers (LapNPPCs), one for semi-supervised and one for fully supervised classification problems, by adding manifold regularization terms. Because of the manifold regularization terms, our LapNPPCs are able to exploit the intrinsic structure of the patterns in the training set. Furthermore, our classifiers only need to solve two systems of linear equations rather than the two quadratic programming (QP) problems required by the Laplacian twin support vector machine (LapTSVM) (Z. Qi, Y. Tian and Y. Shi, Neural Netw. 35 (2012) 46–53). Numerical experiments on toy and UCI benchmark datasets show that the accuracy of our LapNPPCs is comparable with that of other classifiers, such as the standard SVM, TWSVM and LapTSVM. Moreover, based on our LapNPPCs, other TWSVM-type classifiers with manifold regularization can be constructed by choosing different norms and loss functions to deal with semi-supervised binary and multi-class classification problems.


Author(s):  
Süreyya Özöğür Akyüz ◽  
Gürkan Üstünkar ◽  
Gerhard Wilhelm Weber

The interplay of machine learning (ML) and optimization methods is an emerging field of artificial intelligence. Both ML and optimization are concerned with modeling systems related to real-world problems. Parameter selection for classification models is an important task for ML algorithms. In statistical learning theory, cross-validation (CV), the most well-known model selection method, can be very time consuming for large data sets. One of the recent model selection techniques developed for support vector machines (SVMs) is based on the observed test point margins. In this study, the observed margin strategy is integrated into our novel infinite kernel learning (IKL) algorithm together with the multi-local procedure (MLP), an optimization technique for finding a global solution. The experimental results show improvements in accuracy and speed compared with multiple kernel learning (MKL) and semi-infinite linear programming (SILP) with CV.


Author(s):  
Q. Wang ◽  
Y. Gu ◽  
T. Liu ◽  
H. Liu ◽  
X. Jin

In recent years, many studies on remote sensing image classification have shown that using multiple features from different data sources can effectively improve classification accuracy. As a very powerful means of learning, multiple kernel learning (MKL) can conveniently incorporate a variety of features. The conventional combined kernel learned by MKL can be regarded as a compromise among all basic kernels for all classes in the classification. It is the best overall, but not optimal for each specific class. To address this problem, this paper proposes a class-pair-guided MKL method to integrate the heterogeneous features (HFs) from multispectral image (MSI) and light detection and ranging (LiDAR) data. In particular, the "one-against-one" strategy is adopted, which converts the multiclass classification problem into a set of two-class classification problems. Then, for each class pair, we select the best kernel from a set of pre-constructed basic kernels by kernel alignment (KA) during classification. The advantage of the proposed method is that only the best kernel for classifying any two classes is retained, which greatly enhances discriminability. Experiments are conducted on two real data sets, and the results show that the proposed method achieves the best classification accuracy in integrating the HFs when compared with several state-of-the-art algorithms.
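The class-pair selection step described above can be sketched as follows. For every pair of classes, each candidate kernel is restricted to the samples of those two classes and scored by its alignment with an ideal target kernel yy^T, where y is +1 for one class and -1 for the other. The use of this target kernel, the toy features, and all names are assumptions for illustration; the abstract states only that kernel alignment is used to pick one kernel per class pair.

```python
from itertools import combinations
from math import sqrt


def frobenius(A, B):
    """Frobenius inner product of two equally sized matrices."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def alignment(K, T):
    """Kernel alignment: <K, T>_F / (||K||_F * ||T||_F)."""
    return frobenius(K, T) / sqrt(frobenius(K, K) * frobenius(T, T))


def best_kernel_per_pair(kernels, labels):
    """Map each class pair to the name of its best-aligned kernel.

    kernels: dict of name -> full n x n kernel matrix over all samples.
    labels:  list of n class labels.
    """
    best = {}
    for a, b in combinations(sorted(set(labels)), 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        y = [1 if labels[i] == a else -1 for i in idx]
        target = [[yi * yj for yj in y] for yi in y]  # assumed ideal kernel y y^T

        def sub(K):
            return [[K[i][j] for j in idx] for i in idx]

        best[(a, b)] = max(kernels, key=lambda name: alignment(sub(kernels[name]), target))
    return best


def linear_kernel(x):
    """Linear kernel matrix built from a 1-D feature."""
    return [[xi * xj for xj in x] for xi in x]


# Toy data: feature "x" separates a from b; feature "z" separates c from the rest.
labels = ["a", "a", "b", "b", "c", "c"]
kernels = {
    "x": linear_kernel([1, 1, -1, -1, 0.5, 0.5]),
    "z": linear_kernel([1, 1, 1, 1, -1, -1]),
}
selected = best_kernel_per_pair(kernels, labels)
```

Each one-against-one classifier would then be trained with its own selected kernel, rather than with a single combined kernel shared across all classes.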


Author(s):  
Carlotta Orsenigo ◽  
Carlo Vercellis

In the life sciences, predicting the folding structure of a protein plays an important role in investigating its function and discovering new drugs. Protein fold recognition can be naturally cast as a multicategory classification problem, which is challenging because of the large number of fold classes. Thus, in the last decade several supervised learning methods have been applied to discriminate between proteins characterized by different folds. Recently, discrete support vector machines have been introduced as an effective alternative to traditional support vector machines. Discrete SVMs have been shown to outperform other competing classification techniques on both binary and multicategory benchmark datasets. In this paper, we adopt discrete SVMs for protein fold classification. Computational tests performed on benchmark datasets empirically support the effectiveness of discrete SVMs, which achieve the highest prediction accuracy.

