Prediction of Nitration Sites Based on FCBF Method and Stacking Ensemble Model

2021 ◽  
Vol 18 ◽  
Author(s):  
Min Liu ◽  
Lu Zhang ◽  
Xinyi Qin ◽  
Tao Huang ◽  
Ziwei Xu ◽  
...  

Background: Nitration is an important post-translational modification (PTM) that occurs on the tyrosine residues of proteins. The occurrence of protein tyrosine nitration under disease conditions is inevitable and represents a shift from the signal-transducing physiological actions of •NO to oxidative and potentially pathogenic pathways. Abnormal protein nitration can lead to serious human diseases, including neurodegenerative diseases, acute respiratory distress, organ transplant rejection and lung cancer. Objective: Identifying the nitration sites in protein sequences is therefore necessary. Predicting which tyrosine residues in a protein sequence are nitrated and which are not is of great significance for the study of the nitration mechanism and related diseases. Methods: In this study, a prediction model for nitration sites was proposed that combines an over- and under-sampling strategy, the FCBF method, stacking ensemble learning and the fusion of multiple features. First, each protein sequence sample was encoded with 2701-dimensional fusion features (PseAAC, PSSM, AAIndex, CKSAAP, Disorder). Second, a ranked feature set was generated by the FCBF method according to the symmetric uncertainty metric. Third, during model training, the over- and under-sampling technique was used to handle the imbalanced dataset. Finally, the Incremental Feature Selection (IFS) method was adopted to select an optimal classifier based on 10-fold cross-validation. Results and Conclusion: The results show that the model has significant performance advantages in indicators such as MCC, recall and F1-score, both when compared with other classifiers on the independent test set and when compared by cross-validation on the training set using single-type features or fusion features. 
By integrating the FCBF feature ranking method, the over- and under-sampling technique and a stacking model composed of multiple base classifiers, an effective prediction model for nitration PTM sites was built, which can achieve a better recall rate when the ratio of positive to negative samples is highly imbalanced.
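
The feature ranking step described above can be sketched as follows. This is a minimal illustration of ranking by symmetric uncertainty, SU(X, Y) = 2·I(X; Y) / (H(X) + H(Y)), and not the authors' implementation. It assumes the features are already discrete (continuous features such as PSSM scores would first need binning), it covers only the ranking and not the redundancy-removal stage of full FCBF, and the feature names and toy values are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)); 0 means independent, 1 means fully dependent."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    joint = entropy(list(zip(x, y)))      # H(X, Y)
    mutual_info = hx + hy - joint          # I(X; Y)
    return 2.0 * mutual_info / (hx + hy)


def rank_features(columns, labels):
    """Return feature names sorted by decreasing SU with the class label."""
    scores = {name: symmetric_uncertainty(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: 1 = nitrated site, 0 = non-nitrated site.
labels = [1, 1, 0, 0, 1, 0]
columns = {
    "f_const": [0, 0, 0, 0, 0, 0],   # carries no information
    "f_noise": [0, 1, 0, 1, 0, 1],   # weakly related to the label
    "f_copy": [1, 1, 0, 0, 1, 0],    # identical to the label
}
ranking = rank_features(columns, labels)
print(ranking)
```

A ranked list of this kind is what an incremental feature selection loop would consume, adding features in rank order and re-evaluating the classifier at each step.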

2020 ◽  
Vol 0 (0) ◽  
Author(s):  
Xuyang Pan ◽  
Laijun Sun ◽  
Guobing Sun ◽  
Panxiang Rong ◽  
Yuncai Lu ◽  
...  

Abstract Neutral detergent fiber (NDF) content is a critical indicator of fiber in corn stover. This study aimed to develop a prediction model to precisely measure NDF content in corn stover using the near-infrared spectroscopy (NIRS) technique. Spectral data ranging from 400 to 2500 nm were obtained by scanning 530 samples, and Monte Carlo Cross Validation and pretreatment were used to preprocess the original spectra. Moreover, interval partial least squares (iPLS) was employed to extract feature wavebands and reduce data computation. The PLSR model was built using two spectral regions and was evaluated with the coefficient of determination (R²) and the root mean square error of cross validation (RMSECV), which reached 0.97 and 0.65%, respectively. The overall results showed that the developed prediction model, coupled with spectral data analysis, provides a theoretical foundation for applying NIRS techniques to measure fiber content in corn stover.
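
The two evaluation metrics named above can be sketched in a few lines. This is an illustrative calculation only: it assumes R² is defined as 1 − SS_res/SS_tot (some chemometrics software reports the squared correlation instead), and the NDF values below are hypothetical, not data from the study.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination, defined here as 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmsecv(observed, cv_predicted):
    """Root mean square error over predictions made for held-out samples."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, cv_predicted)) / n)


# Hypothetical NDF contents (%) and their cross-validated predictions.
observed_ndf = [60.0, 62.0, 64.0, 66.0]
cv_predicted_ndf = [61.0, 61.0, 65.0, 65.0]

print(r_squared(observed_ndf, cv_predicted_ndf))
print(rmsecv(observed_ndf, cv_predicted_ndf))
```

RMSECV is expressed in the same unit as the measured quantity, which is why the abstract reports it as a percentage.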


IEEE Access ◽  
2019 ◽  
Vol 7 ◽  
pp. 48699-48714 ◽  
Author(s):  
S. M. Hasan Mahmud ◽  
Wenyu Chen ◽  
Hosney Jahan ◽  
Yongsheng Liu ◽  
Nasir Islam Sujan ◽  
...  

2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Yanjuan Li ◽  
Zitong Zhang ◽  
Zhixia Teng ◽  
Xiaoyan Liu

Amyloid is generally an aggregate of insoluble fibrin; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer’s disease and type II diabetes. Therefore, accurately identifying amyloid is necessary to understand its role in pathology. We proposed a machine learning-based prediction model called PredAmyl-MLP, which consists of the following three steps: feature extraction, feature selection, and classification. In the step of feature extraction, seven feature extraction algorithms and different combinations of them are investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) is selected according to the experimental results. In the step of feature selection, maximum relevant maximum distance (MRMD) and binomial distribution (BD) are, respectively, used to remove the redundant or noise features, and the appropriate features are selected according to the experimental results. In the step of classification, we employed multilayer perceptron (MLP) to train the prediction model. The 10-fold cross-validation results show that the overall accuracy of PredAmyl-MLP reached 91.59%, and the performance was better than the existing methods.
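
As an illustration of one of the feature encodings mentioned above, tripeptide composition (TPC) can be sketched as the relative frequency of each of the 20³ = 8000 possible tripeptides in a sequence. This is a minimal sketch under the usual definition of TPC, not the PredAmyl-MLP code; the handling of non-standard residues (windows containing them are skipped) and the example sequence are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# All 8000 tripeptides in a fixed order, and a lookup from tripeptide to vector index.
TRIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}


def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies."""
    vector = [0.0] * len(TRIPEPTIDES)
    windows = [sequence[i:i + 3] for i in range(len(sequence) - 2)]
    # Assumption: windows containing non-standard residues are ignored.
    valid = [w for w in windows if w in INDEX]
    if not valid:
        return vector
    for w in valid:
        vector[INDEX[w]] += 1.0 / len(valid)
    return vector


# Hypothetical short peptide; real inputs would be full protein sequences.
features = tripeptide_composition("ACDACD")
print(len(features), features[INDEX["ACD"]])
```

Because most of the 8000 entries are zero for any single protein, a feature selection step such as the one described in the abstract is a natural follow-up.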


2018 ◽  
Vol 21 (2) ◽  
pp. 595-608 ◽  
Author(s):  
Man Cao ◽  
Guodong Chen ◽  
Jialin Yu ◽  
Shaoping Shi

Abstract Protein phosphorylation is a reversible and ubiquitous post-translational modification that primarily occurs at serine, threonine and tyrosine residues and regulates a variety of biological processes. In this paper, we first briefly summarize the current progress in computational prediction of eukaryotic protein phosphorylation sites, which has mainly focused on animals and plants, especially human, and to a lesser extent on fungi. Since the number of identified fungal phosphorylation sites has greatly increased across a wide variety of organisms and their roles in pathological physiology remain largely unknown, more attention has been paid to the identification of fungi-specific phosphorylation. Here, experimental fungal phosphorylation site data were collected, and most of the sites were classified into different types, encoded with various features and trained via a two-step feature optimization method. A novel method for the prediction of species-specific fungi phosphorylation, PreSSFP, was developed, which can identify fungal phosphorylation in seven species for specific serine, threonine and tyrosine residues (http://computbiol.ncu.edu.cn/PreSSFP). We also critically evaluated the performance of PreSSFP and compared it with other existing tools. The results showed that PreSSFP is a robust predictor. Feature analyses showed that there are some significant differences among the seven species. Species-specific prediction using the two-step feature optimization method to mine important features for training could considerably improve prediction performance. We anticipate that our study provides a new lead for future computational analysis of fungal phosphorylation.


2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Qingyu Xiao ◽  
Benpeng Miao ◽  
Jie Bi ◽  
Zhen Wang ◽  
Yixue Li

Abstract Protein phosphorylation is an important type of post-translational modification that is involved in a variety of biological activities. Most phosphorylation events occur on serine, threonine and tyrosine residues in eukaryotes. In recent years, many phosphorylation sites have been identified as a result of advances in mass-spectrometric techniques. However, a large percentage of phosphorylation sites may be non-functional. Systematically prioritizing functional sites from a large number of phosphorylation sites will be increasingly important for the study of their biological roles. This study focused on exploring the intrinsic features of functional phosphorylation sites to predict whether a phosphosite is likely to be functional. We found significant differences in the distribution of evolutionary conservation, kinase association, disorder score, and secondary structure between known functional and background phosphorylation datasets. We built four different types of classifiers based on the most representative features and found that their performances were similar. We also prioritized 213,837 human phosphorylation sites from a variety of phosphorylation databases, which will be helpful for subsequent functional studies. All predicted results are available for query and download on our website (Predict Functional Phosphosites, PFP, http://pfp.biosino.org/pfp).


2015 ◽  
Vol 30 (1) ◽  
pp. 197-205 ◽  
Author(s):  
Baoqiang Tian ◽  
Ke Fan

Abstract A new statistical forecast scheme, referred to as scheme 1, is developed to predict the upcoming winter North Atlantic Oscillation (NAO) from Atlantic sea surface temperature (SST) and Eurasian snow cover observed in the preceding autumn, using the year-to-year increment prediction approach (i.e., the DY approach). Two predictors for the year-to-year increment are identified that are available in the preceding autumn. Cross-validation tests for the period 1950–2011 and independent hindcasts for the period 1990–2011 are performed to validate the prediction ability of the proposed technique. The cross-validation test results for 1950–2011 reveal a high correlation coefficient of 0.52 (0.58) between the predicted and observed NAO indices (DY of the NAO). The model also successfully predicts the independent hindcasts for the period 1990–2011 with a correlation coefficient of 0.55 (0.74). In addition, scheme 0 (i.e., the anomaly approach) is established using the SST and snow cover anomalies during the preceding autumn. Compared with scheme 0, the new prediction model has higher predictive skill in reproducing the interdecadal variability of the NAO. Therefore, this study provides an effective climate prediction scheme for the interannual and interdecadal variability of the NAO in boreal winter.
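
The year-to-year increment idea can be sketched as below. The sketch assumes the usual reading of the DY approach: the increment DY(t) = X(t) − X(t−1) is the quantity that the statistical model predicts, and the final forecast is obtained by adding the predicted increment to the previous year's observed value. The index values and the "predicted" increments are hypothetical stand-ins, not output from the scheme in the abstract.

```python
import math


def year_to_year_increment(series):
    """DY(t) = X(t) - X(t-1) for every year after the first."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def reconstruct_index(predicted_dy, previous_observed):
    """Forecast X(t) as the previous year's observation plus the predicted increment."""
    return [prev + dy for prev, dy in zip(previous_observed, predicted_dy)]


def pearson(a, b):
    """Pearson correlation coefficient between two equally long series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical winter NAO index values for five consecutive years.
nao = [0.5, -0.2, 1.0, 0.3, -0.6]
observed_dy = year_to_year_increment(nao)

# Hypothetical increments, standing in for the output of a regression on autumn predictors.
predicted_dy = [-0.5, 0.9, -0.4, -1.0]
predicted_nao = reconstruct_index(predicted_dy, nao[:-1])

# Skill can be scored on the increment itself and on the reconstructed index.
print(pearson(predicted_dy, observed_dy))
print(pearson(predicted_nao, nao[1:]))
```

Scoring both series separately mirrors the way the abstract reports two correlation coefficients, one for the NAO index and one, in parentheses, for its DY.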


2008 ◽  
Vol 104 (4) ◽  
pp. 1220-1231 ◽  
Author(s):  
Ishtiaq Ahmad ◽  
Daniel C. Hoessli ◽  
Wajahat M. Qazi ◽  
Ahmed Khurshid ◽  
Abid Mehmood ◽  
...  
