VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost

2022 ◽  
Vol 12 ◽  
Author(s):  
Yue Gong ◽  
Benzhi Dong ◽  
Zixiao Zhang ◽  
Yixiao Zhai ◽  
Bo Gao ◽  
...  

Vesicular transport proteins are related to many human diseases, and pathological changes in them threaten human health. Protein function prediction has long been one of the most intensively studied topics in bioinformatics. In this work, we developed a tool to identify vesicular transport proteins. Our strategy is to extract transition probability composition, autocovariance transformation and other information from the position-specific scoring matrix (PSSM) as feature vectors. EditedNearestNeighbours (ENN) is used to address the imbalance of the data set, and the Max-Relevance-Max-Distance (MRMD) algorithm is adopted to reduce the dimension of the feature vector. We used 5-fold cross-validation and independent test sets to evaluate our model. On the test set, VTP-Identifier performed better than GRU. The accuracy, Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) were 83.6%, 0.531 and 0.873, respectively.
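To make the autocovariance transformation mentioned above more concrete, here is a minimal Python sketch. It uses the commonly cited definition, in which each PSSM column is centred on its mean and the products of positions a fixed lag apart are averaged. The abstract does not give the exact formula, the lag range, or any PSSM normalisation, so the function name `autocovariance_features`, the `max_lag` parameter, and the toy matrix are all assumptions made for illustration.

```python
from statistics import fmean


def autocovariance_features(pssm, max_lag):
    """Autocovariance features of a PSSM (hypothetical helper, not the authors' code).

    pssm    -- list of rows, one row per residue, one column per amino-acid type
    max_lag -- largest lag g to use; must satisfy 1 <= max_lag < len(pssm)

    For each column j and lag g the feature is
        1 / (L - g) * sum_i (P[i][j] - mean_j) * (P[i + g][j] - mean_j)
    """
    length = len(pssm)
    if not 1 <= max_lag < length:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(pssm)")

    features = []
    for j in range(len(pssm[0])):
        column = [row[j] for row in pssm]
        mean_j = fmean(column)
        for lag in range(1, max_lag + 1):
            total = sum(
                (column[i] - mean_j) * (column[i + lag] - mean_j)
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy 4-residue "PSSM" with only two columns (a real PSSM has 20 columns).
toy_pssm = [[1, 0], [3, 0], [1, 0], [3, 0]]
print(autocovariance_features(toy_pssm, max_lag=2))  # -> [-1.0, 1.0, 0.0, 0.0]
```

With a full 20-column PSSM this yields 20 × `max_lag` values per protein, regardless of sequence length, which is what makes such features usable as fixed-size classifier input.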

2019 ◽  
Author(s):  
Dominic Simm ◽  
Klas Hatje ◽  
Stephan Waack ◽  
Martin Kollmar

Abstract: Coiled-coil regions were among the first protein motifs described structurally and theoretically. The beauty and simplicity of the motif give hope of detecting coiled-coil regions with reasonable accuracy and precision in any protein sequence. Here, we re-evaluated the most commonly used coiled-coil prediction tools with respect to the most comprehensive reference data set available, the entire Protein Data Bank (PDB), down to each amino acid and its secondary structure. Apart from the thirtyfold difference in the number of predicted coiled-coils, the tools vary strongly in their predictions, both across structures and within structures. The evaluation of the false discovery rate and the Matthews correlation coefficient, a widely used performance metric for imbalanced data sets, suggests that the tested tools have only limited applicability for large data sets. Coiled-coil predictions strongly impact the functional characterization of proteins, are used for functional genome annotation, and should therefore be supported and validated by additional information.
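The two metrics named in this abstract can be computed directly from confusion-matrix counts. The sketch below uses the standard definitions of the false discovery rate and the Matthews correlation coefficient. The function names and the example counts are invented for illustration and are not taken from the study.

```python
import math


def false_discovery_rate(tp, fp):
    """FDR = FP / (TP + FP); returns 0.0 when nothing was predicted positive."""
    predicted_positive = tp + fp
    return fp / predicted_positive if predicted_positive else 0.0


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC from confusion counts; returns 0.0 when the denominator is zero."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


# Illustrative, made-up counts for a rare positive class (20 of 1000 residues).
tp, fp, tn, fn = 10, 30, 950, 10
accuracy = (tp + tn) / (tp + fp + tn + fn)
print(f"accuracy={accuracy:.3f}")
print(f"FDR={false_discovery_rate(tp, fp):.3f}")
print(f"MCC={matthews_corrcoef(tp, fp, tn, fn):.3f}")
```

In this made-up case the accuracy is 0.96 even though three out of four positive predictions are wrong, which illustrates why FDR and MCC are more informative than accuracy on imbalanced data.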


2021 ◽  
Author(s):  
Zihao Zhao ◽  
Hongwei Zhang ◽  
Minglei Hu ◽  
Ning Yang ◽  
Hui Wang ◽  
...  

Abstract Background: The function of a protein is directly related to its structure and plays a pivotal role in the entire life process. The protein interaction network controls almost all cellular processes while fulfilling most biological functions. Protein function prediction can be regarded as a multi-label classification problem that aims to close the gap between the huge number of protein sequences and the known functions. It is not only a key issue in related research fields, but also a long-standing challenge. Most Deep Neural Network (DNN) approaches to protein function prediction study small-scale protein data sets based on Gene Ontology (GO), and they usually mine relationships between protein features and function tags. Useful prediction approaches for large-scale protein data still need further study. Methods: This paper proposes a DNN-based protein function prediction approach (IGP-DNN) that uses the Grasshopper Optimization Algorithm (GOA), Intuitionistic Fuzzy c-Means (IFCM), Kernel Principal Component Analysis (KPCA) and a DNN. The features of protein function modules are extracted by combining GOA and IFCM. KPCA is used to reduce the dimensionality of the protein property features. Both feature sets are integrated to enrich the feature information, and the integrated features are fed into the DNN model. The protein function modules are classified to predict function by computation in the hidden layers of the DNN. Results and conclusion: IGP-DNN combines the advantages of IFCM-GOA and DNN. The combination of IFCM and GOA not only avoids falling into local optima when extracting function module features and reduces the over-sensitivity of IFCM to the cluster centers, but also improves the precision of protein function module feature extraction. This paper proposes a protein function prediction approach based on DNN. In the model, protein features are composed of the protein function module features extracted using IFCM-GOA and the protein property features whose dimensionality is reduced using KPCA, in order to address noise sensitivity and other problems in protein function prediction.


2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Zhiyu Tao ◽  
Yanjuan Li ◽  
Zhixia Teng ◽  
Yuming Zhao

With the development of computer technology, many machine learning algorithms have been applied to the field of biology, forming the discipline of bioinformatics. Protein function prediction is a classic research topic in this subject area. Although many scholars have made progress in identifying proteins with different algorithms, they often extract a large number of feature types and use very complex classification methods to obtain only a small improvement in classification performance, and this process is very time-consuming. In this research, we attempt to use as few features as possible to classify vesicular transport proteins while still obtaining a comparatively satisfactory classification result. We adopt CTDC, a submethod of the composition, transition, and distribution (CTD) method, to extract only 39 features from each sequence, and LibSVM is used as the classification method. We use the SMOTE method to deal with the problem of dataset imbalance. There are 11619 protein sequences in our dataset. We selected 4428 sequences to train our classification model and another 1832 sequences from our dataset to test the classification performance, and finally achieved an accuracy of 71.77%. After dimension reduction by MRMD, the accuracy is 72.16%.
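As a rough illustration of the composition descriptor (CTDC) used above, the following Python sketch computes the three composition values for a single physicochemical property. The abstract does not list the properties or groupings it uses. The hydrophobicity grouping below is one commonly cited three-class split and is an assumption, as are the names `HYDROPHOBICITY_GROUPS` and `ctdc_composition`. Three values per property matches the reported 39 features if 13 properties are used, which is an inference rather than something the abstract states.

```python
# Assumed three-class hydrophobicity grouping (not taken from the abstract).
HYDROPHOBICITY_GROUPS = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def ctdc_composition(sequence, groups=HYDROPHOBICITY_GROUPS):
    """Fraction of residues falling into each group (hypothetical helper).

    Returns one value per group, in the order the groups are defined.
    Residues outside all groups (for example 'X') count toward the length only.
    """
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return [
        sum(seq.count(residue) for residue in members) / len(seq)
        for members in groups.values()
    ]


# Two polar (R, K), two neutral (G, A) and two hydrophobic (C, L) residues.
print(ctdc_composition("RKGACL"))
```

Repeating this for each property and concatenating the results gives a short, fixed-length vector per sequence, which is the "few features" idea the abstract describes.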


Entropy ◽  
2021 ◽  
Vol 23 (1) ◽  
pp. 126
Author(s):  
Sharu Theresa Jose ◽  
Osvaldo Simeone

Meta-learning, or “learning to learn”, refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like model-agnostic meta-learning (MAML), or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and the corresponding data set to capture within-task uncertainty. Tighter bounds are then developed for the two classes via novel individual task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.
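For background, the conventional single-task result that MI-based bounds of this kind build on is usually stated as below. This is the well-known form for a loss that is σ-sub-Gaussian, recalled here only as context. The notation is assumed for this note: $S$ is a training set of $n$ i.i.d. samples, $W$ is the learner's output, $L_{\mu}(W)$ is the population loss and $L_{S}(W)$ is the empirical loss. The paper's own meta-learning bounds are not reproduced here.

```latex
\left| \mathbb{E}\!\left[ L_{\mu}(W) - L_{S}(W) \right] \right|
\;\le\; \sqrt{\frac{2\sigma^{2}}{n}\, I(S; W)}
```

The bound says that a learner whose output reveals little information about its training set cannot overfit much on average. The abstract describes applying the same idea at the level of tasks rather than individual samples.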

