Computational Identification and Analysis of Ubiquinone-Binding Proteins

Cells ◽  
2020 ◽  
Vol 9 (2) ◽  
pp. 520 ◽  
Author(s):  
Chang Lu ◽  
Wenjie Jiang ◽  
Hang Wang ◽  
Jinxiu Jiang ◽  
Zhiqiang Ma ◽  
...  

Ubiquinone is an important cofactor that plays vital and diverse roles in many biological processes. Ubiquinone-binding proteins (UBPs) are receptor proteins that dock with ubiquinones. Analyzing and identifying UBPs via a computational approach will provide insights into the pathways associated with ubiquinones. In this work, we were the first to propose a UBPs predictor (UBPs-Pred). The optimal feature subset selected from three categories of sequence-derived features was fed into the extreme gradient boosting (XGBoost) classifier, and the parameters of XGBoost were tuned by multi-objective particle swarm optimization (MOPSO). The experimental results over the independent validation demonstrated considerable prediction performance with a Matthews correlation coefficient (MCC) of 0.517. After that, we analyzed the UBPs using bioinformatics methods, including the statistics of the binding domain motifs and protein distribution, as well as an enrichment analysis of the gene ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway.
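For readers unfamiliar with the metric reported above, the Matthews correlation coefficient can be computed from the four cells of a binary confusion matrix. The following is a minimal sketch; the counts in the example call are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when any marginal total is zero,
    # since the coefficient is undefined in that case.
    return numerator / denominator if denominator else 0.0


# Hypothetical counts for illustration only.
example_mcc = matthews_corrcoef(tp=40, tn=45, fp=5, fn=10)
print(round(example_mcc, 3))
```

The value ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), which makes it more informative than accuracy on imbalanced datasets.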

2020 ◽  
Author(s):  
Qingmei Zhang ◽  
Peishun Liu ◽  
Yu Han ◽  
Yaqun Zhang ◽  
Xue Wang ◽  
...  

DNA binding proteins (DBPs) not only play an important role in all aspects of genetic activity, such as DNA replication, recombination, repair, and modification, but are also used as key components of antibiotics, steroids, and anticancer drugs in the field of drug discovery. Identifying DBPs has become one of the most challenging problems in proteomics research. Given the high cost and inefficiency of experimental methods, constructing a detailed DBP prediction model has become an urgent problem for researchers. In this paper, we propose a stacked ensemble classifier-based method for predicting DBPs called StackPDB. First, pseudo amino acid composition (PseAAC), pseudo position-specific scoring matrix (PsePSSM), position-specific scoring matrix-transition probability composition (PSSM-TPC), evolutionary distance transformation (EDT), and residue probing transformation (RPT) are applied to extract protein sequence features. Second, extreme gradient boosting-recursive feature elimination (XGB-RFE) is employed to obtain an excellent feature subset. Finally, the best features are applied to a stacked ensemble classifier composed of XGBoost, LightGBM, and SVM to construct StackPDB. Under leave-one-out cross-validation (LOOCV), StackPDB obtains an ACC of 93.44% and an MCC of 0.8687 on PDB1075. On the independent test datasets PDB186 and PDB180, the ACC is 84.41% and 90.00%, and the MCC is 0.6882 and 0.7997, respectively. The results on the training dataset and the independent test datasets show that StackPDB has a strong ability to predict DBPs.


2020 ◽  
Vol 39 (5) ◽  
pp. 7671-7691
Author(s):  
Xuning Liu ◽  
Guoying Zhang ◽  
Zixian Zhang

Selecting features among the factors that influence coal and gas outbursts is of great significance for identifying the most discriminative features and improving the prediction performance of a classifier. This paper presents an effective framework that combines hybrid feature selection with a modified outburst classifier, aimed at solving existing coal and gas outburst prediction problems. First, a measure based on the maximum information coefficient (MIC) is employed to identify the wide range of correlations between two variables. Second, based on a ranking procedure using the non-dominated sorting genetic algorithm (NSGA-II), the maximum relevance minimum redundancy (MRMR) algorithm is performed to find a candidate feature set whose members are highly related to the class label and uncorrelated with each other. Third, random forest (RF) is employed to search the candidate feature set for the optimal feature subset, that is, the subset that influences the classification performance for coal and gas outbursts. Finally, an improved classifier model is proposed that combines a gradient boosting decision tree (GBDT) and k-nearest neighbor (KNN) for outburst prediction. In the modified classifier model, the GBDT is used to assign different weights to features, and the weighted features are then input into the KNN to verify the effectiveness of the proposed method on a coal and gas outbursts dataset. The experimental results show that the proposed scheme is effective in terms of both the number of features and prediction accuracy when compared with other state-of-the-art prediction models based on feature selection for coal and gas outbursts.
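The last step, feeding weighted features into KNN, can be illustrated with a small sketch. It assumes the weights are already available (in the paper they come from the GBDT; here they are hand-picked stand-ins), and the feature values, labels, and function name are hypothetical.

```python
import math
from collections import Counter


def weighted_knn_predict(train_X, train_y, query, weights, k=3):
    """Majority vote among the k nearest training points, where every
    feature is multiplied by its weight before the Euclidean distance."""
    def distance(a, b):
        return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
train_X = [(1.0, 5.0), (1.1, 6.0), (3.0, 0.0), (3.1, 0.5)]
train_y = ["safe", "safe", "outburst", "outburst"]
query = (1.2, 0.2)

# Stand-in weights; a weight of 0.0 removes the noisy feature entirely.
with_weights = weighted_knn_predict(train_X, train_y, query, weights=(1.0, 0.0))
without_weights = weighted_knn_predict(train_X, train_y, query, weights=(1.0, 1.0))
print(with_weights, without_weights)
```

In this toy case the unweighted vote is pulled toward the wrong class by the noisy feature, while down-weighting that feature restores the expected label, which is the intuition behind weighting features before the distance computation.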


Entropy ◽  
2020 ◽  
Vol 22 (4) ◽  
pp. 478 ◽  
Author(s):  
Jiajin Qi ◽  
Xu Gao ◽  
Nantian Huang

Fault samples of high voltage circuit breakers are scarce and their vibration signals are complex; existing methods cannot extract the effective information in the features and suffer from problems such as overfitting and slow training. To improve the efficiency of feature extraction from circuit breaker vibration signals and the accuracy of circuit breaker state recognition, a Light Gradient Boosting Machine (LightGBM) method based on time-domain feature extraction with multi-type entropy features is proposed for mechanical fault diagnosis of high voltage circuit breakers. First, the original vibration signal of the high voltage circuit breaker is segmented in the time domain; then, 16 features, including 5 kinds of entropy features, are extracted directly from each segment of the original signal, and the original feature set is constructed. Second, the Split importance value of each feature is calculated, and the optimal feature subset is determined by forward feature selection, taking the classification accuracy of LightGBM as the decision variable. After that, the LightGBM classifier is constructed from the feature vector of the optimal feature subset, which can accurately distinguish the mechanical fault states of the high voltage circuit breaker. The experimental results show that the new method offers efficient feature extraction and high fault identification accuracy.
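One plausible reading of the selection step is: rank features by importance, grow the subset one feature at a time in that order, and keep the subset with the highest accuracy. The sketch below follows that reading. The importance values, feature names, and the `evaluate` callback are hypothetical; in practice `evaluate` would train and validate a classifier on the candidate subset.

```python
from typing import Callable, Dict, List, Sequence


def forward_select(importance: Dict[str, float],
                   evaluate: Callable[[Sequence[str]], float]) -> List[str]:
    """Add features in descending importance order and return the prefix
    of that ranking with the highest score (earliest prefix on ties)."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    best_subset: List[str] = []
    best_score = float("-inf")
    for end in range(1, len(ranked) + 1):
        candidate = ranked[:end]
        score = evaluate(candidate)
        if score > best_score:
            best_subset, best_score = candidate, score
    return best_subset


# Hypothetical importance scores and a stand-in accuracy lookup keyed by
# subset size; neither comes from the paper.
importance = {"kurtosis": 12.0, "sample_entropy": 30.0, "rms": 25.0, "peak": 3.0}
fake_accuracy = {1: 0.80, 2: 0.91, 3: 0.95, 4: 0.94}

selected = forward_select(importance, lambda subset: fake_accuracy[len(subset)])
print(selected)
```

Because accuracy drops once the least important feature is added, the search stops benefiting after the third feature and the three-feature subset is returned.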


Author(s):  
Hui Wang ◽  
Li Li Guo ◽  
Yun Lin

Automatic modulation recognition is very important for receiver design in broadband multimedia communication systems, and a sound signal feature extraction and selection algorithm is the key technology of digital multimedia signal recognition. In this paper, information entropy is used to extract four individual features: power spectrum entropy, wavelet energy spectrum entropy, singular spectrum entropy, and Renyi entropy. Then, feature selection using distance measurement and Sequential Feature Selection (SFS) is presented to select the optimal feature subset. Finally, a BP neural network is used to classify the signal modulation. The simulation results show that the four kinds of information entropy can be used to classify different signal modulations, and that the feature selection algorithm successfully chooses the optimal feature subset and achieves the best performance.
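As a rough illustration of entropy-based features, the sketch below computes Shannon and Renyi entropy of a normalized spectrum. Treating power spectrum entropy as the Shannon entropy of the normalized power spectrum is a common definition and is assumed here; the abstract does not give the exact formulas, and the spectrum values are hypothetical.

```python
import math
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Scale non-negative values so they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def shannon_entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability bins contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Renyi entropy of order alpha (alpha > 0 and alpha != 1), in nats."""
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)


# Hypothetical power spectra: one flat, one dominated by a single bin.
flat = normalize([1.0, 1.0, 1.0, 1.0])
peaked = normalize([10.0, 0.1, 0.1, 0.1])

print(shannon_entropy(flat), shannon_entropy(peaked))
print(renyi_entropy(flat, alpha=2.0), renyi_entropy(peaked, alpha=2.0))
```

A flat spectrum yields the maximum entropy and a spectrum concentrated in one bin yields a low value, which is why such measures can help separate signal types with different spectral shapes.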


2020 ◽  
Vol 2020 ◽  
pp. 1-10 ◽  
Author(s):  
Xiuzhi Sang ◽  
Wanyue Xiao ◽  
Huiwen Zheng ◽  
Yang Yang ◽  
Taigang Liu

Prediction of DNA-binding proteins (DBPs) has become a popular research topic in protein science because of the crucial role DBPs play in all aspects of biological activities. Even though considerable efforts have been devoted to developing powerful computational methods to solve this problem, it is still a challenging task in the field of bioinformatics. A hidden Markov model (HMM) profile has been shown to provide important clues for improving the prediction performance of DBPs. In this paper, we propose a method, called HMMPred, which extracts the features of amino acid composition and auto- and cross-covariance transformation from the HMM profiles, to help train a machine learning model for identification of DBPs. Then, a feature selection technique is performed based on the extreme gradient boosting (XGBoost) algorithm. Finally, the selected optimal features are fed into a support vector machine (SVM) classifier to predict DBPs. The experimental results tested on two benchmark datasets show that the proposed method is superior to most of the existing methods and could serve as an alternative tool to identify DBPs.
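The auto-covariance part of such a transformation can be sketched as follows. This uses a commonly cited definition (mean-centered products of a profile column with itself at a fixed lag, averaged over the sequence); the abstract does not spell out the exact formula, so the normalization and the tiny profile below are assumptions.

```python
from typing import List, Sequence


def auto_covariance(profile: Sequence[Sequence[float]], lag: int) -> List[float]:
    """Per-column auto-covariance at the given lag.

    `profile` holds one row per residue and one column per profile score,
    so the result has one value per column regardless of sequence length.
    """
    length = len(profile)
    if not 0 < lag < length:
        raise ValueError("lag must satisfy 0 < lag < number of rows")
    n_cols = len(profile[0])
    means = [sum(row[j] for row in profile) / length for j in range(n_cols)]
    return [
        sum((profile[i][j] - means[j]) * (profile[i + lag][j] - means[j])
            for i in range(length - lag)) / (length - lag)
        for j in range(n_cols)
    ]


# Hypothetical 4-residue profile with 2 columns (real profiles have 20).
toy_profile = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0]]
ac_lag1 = auto_covariance(toy_profile, lag=1)
print(ac_lag1)
```

The practical appeal is that proteins of different lengths map to feature vectors of the same size, which is what a downstream classifier such as an SVM requires.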


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Zhi Chen ◽  
Tao Lin ◽  
Ningjiu Tang ◽  
Xin Xia

The extensive applications of support vector machines (SVMs) require an efficient method of constructing an SVM classifier with high classification ability. The performance of an SVM depends crucially on whether the optimal feature subset and SVM parameters can be obtained efficiently. In this paper, a coarse-grained parallel genetic algorithm (CGPGA) is used to simultaneously optimize the feature subset and parameters for the SVM. The distributed topology and migration policy of CGPGA help find the optimal feature subset and parameters in significantly shorter time and increase the quality of the solution found. In addition, a new fitness function, which combines the classification accuracy obtained from the bootstrap method, the number of chosen features, and the number of support vectors, is proposed to guide the CGPGA search toward optimal generalization error. Experimental results on 12 benchmark datasets show that the proposed approach outperforms the genetic algorithm (GA) based method and the grid search method in terms of classification accuracy, number of chosen features, number of support vectors, and running time.
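The abstract names the three ingredients of the fitness function but not how they are combined. Purely as an illustration, the sketch below assumes a weighted sum that rewards accuracy and penalizes large feature subsets and many support vectors; the functional form, the weights, and all argument names are hypothetical and should not be read as the authors' formula.

```python
def fitness(accuracy: float, n_selected: int, n_features: int,
            n_support: int, n_samples: int,
            w_acc: float = 0.8, w_feat: float = 0.1, w_sv: float = 0.1) -> float:
    """Hypothetical weighted-sum fitness; higher is better.

    accuracy   : estimated classification accuracy in [0, 1]
    n_selected : number of features chosen out of n_features
    n_support  : number of support vectors out of n_samples training points
    """
    feature_term = 1.0 - n_selected / n_features   # fewer features scores higher
    sv_term = 1.0 - n_support / n_samples          # fewer support vectors scores higher
    return w_acc * accuracy + w_feat * feature_term + w_sv * sv_term


# Two hypothetical candidates with equal accuracy but different complexity.
compact = fitness(accuracy=0.9, n_selected=5, n_features=20, n_support=30, n_samples=100)
bulky = fitness(accuracy=0.9, n_selected=15, n_features=20, n_support=60, n_samples=100)
print(compact > bulky)
```

Under this assumed form, two candidates with the same accuracy are ranked by model compactness, which matches the stated goal of steering the search toward better generalization.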


Author(s):  
Pengchao Chen ◽  
Yongjun Cai ◽  
Dongjie Tan ◽  
Yi Sun ◽  
Muyang Ai ◽  
...  

This paper describes a new acoustic pre-warning system for pipelines, aimed at preventing third-party damage by monitoring pipeline acoustic signals. Many environmental factors, such as passing vehicles and pedestrians, can introduce background noise into the long-distance transmission of pipeline acoustic signals. As a result, conventional pipeline acoustic pre-warning systems have difficulty identifying abnormal events. In this work, statistical methods were applied to signal analysis in order to extract feature parameters of different events. The optimal feature subset was then obtained by a genetic algorithm to differentiate hazardous events from normal events effectively. PetroChina has applied the new pre-warning system to its long-distance transmission pipelines, and the system operates well.

