Efficient Classification of Hot Spots and Hub Protein Interfaces by Recursive Feature Elimination and Gradient Boosting

2020 ◽  
Vol 17 (5) ◽  
pp. 1525-1534
Author(s):  
Xiaoli Lin ◽  
Xiaolong Zhang ◽  
Xin Xu

Protein-protein interactions (PPIs) play a significant role in biological functions such as cell metabolism, immune response, and signal transduction. Hot spots are a small fraction of interface residues that contribute a substantial share of the binding energy in PPIs. Identifying hot spots is therefore important for discovering and analyzing molecular medicines and diseases. The current experimental strategy, alanine scanning, is not suited to large-scale applications because the technique is costly and tedious. Existing computational methods show poor classification performance and prediction accuracy, and they are concerned with the topological structure and gene expression of hub proteins. The proposed system focuses on hot spots of hub proteins, eliminating redundant and highly correlated features using the Pearson correlation coefficient and support vector machine-based feature elimination. Extreme Gradient Boosting and LightGBM are then used to combine a set of weak classifiers into a strong classifier. The proposed system shows better accuracy than the existing computational methods. The model can also be used to predict accurate molecular inhibitors for specific PPIs.
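
The correlation-based filtering step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 0.9 threshold, the greedy keep-first rule, and the feature names (`asa`, `asa_scaled`, `conservation`) are assumptions chosen for illustration, and the SVM-based elimination and boosting stages are not shown.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / sqrt(var_x * var_y)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if its |r| with every
    already-kept feature is at most `threshold`.

    `features` maps a feature name to its column of values.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy columns, not data from the paper.
toy = {
    "asa": [1.0, 2.0, 3.0, 4.0, 5.0],
    "asa_scaled": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of "asa"
    "conservation": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(toy)
print(selected)
```

Because the filter is greedy, the result depends on column order: the first feature of a correlated pair is the one that survives.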


2019 ◽  
Author(s):  
Amaurys Ibarra ◽  
Gail J. Bartlett ◽  
Zsofia Hegedus ◽  
Som Dutt ◽  
Fruzsina Hobor ◽  
...  

Here we describe a comparative analysis of multiple computational alanine scanning (CAS) methods, which highlights effective approaches to improve the accuracy of predicting hot-spot residues. Alongside this, we introduce a new method, BUDE Alanine Scanning, which can be applied to single structures from crystallography, and to structural ensembles from NMR or molecular dynamics data. The comparative analyses facilitate accurate prediction of hot-spots that we validate experimentally with three diverse targets: NOXA-B/MCL-1 (an α helix-mediated PPI), SIMS/SUMO and GKAP/SHANK-PDZ (both β strand-mediated interactions). Finally, the approach is applied to the accurate prediction of hot-residues at a topographically novel Affimer/BCL-xL protein-protein interface.


In pharmaceutical research, the traditional drug discovery process is time-consuming and expensive, because many compounds must be tested experimentally for their biological activity. A series of lab experiments is conducted to analyze a newly synthesized drug's pharmaceutical activity and its biological effects on humans. For each newly discovered drug, the required clinical properties can be estimated using machine learning models, which greatly reduces experimental cost. This paper explores parametric and non-parametric machine learning models to classify the administration properties and toxicity of drugs. The multinomial classification of drugs was based on their physicochemical and ADMET properties. Balanced data samples were drawn from ChEMBL and pre-processed. Features were reduced using Recursive Feature Elimination, and the attributes were ranked by importance to reduce highly correlated attributes. The performance of parametric and non-parametric machine learning models was analyzed on cheminformatics data that includes physicochemical, biological, and pharmaceutical properties of the drug molecules. Selecting a potent drug candidate along with its administration properties greatly reduces wet-lab experimental time and cost. Multiclass classification can be performed efficiently using a non-parametric machine learning model. Optimal feature engineering, hyperparameter tuning, and hybrid algorithms could yield more accurate predictions for cheminformatics data in future.


2021 ◽  
Author(s):  
Hanna Klimczak ◽  
Wojciech Kotłowski ◽  
Dagmara Oszkiewicz ◽  
Francesca DeMeo ◽  
Agnieszka Kryszczyńska ◽  
...  

The aim of the project is the classification of asteroids according to the most commonly used asteroid taxonomy (Bus-DeMeo et al. 2009) using various machine learning methods: Logistic Regression, Naive Bayes, Support Vector Machines, Gradient Boosting, and Multilayer Perceptrons. Different parameter sets are used for classification in order to compare the quality of prediction with a limited amount of data, namely the difference in performance between using the 0.45 μm to 2.45 μm spectral range and multiple spectral features, as well as performing Principal Component Analysis to reduce the dimensions of the spectral data. This work has been supported by grant No. 2017/25/B/ST9/00740 from the National Science Centre, Poland.
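
The dimensionality reduction step named in this abstract, Principal Component Analysis, can be sketched for its first component only, using power iteration on the sample covariance matrix. This is an illustrative sketch, not the project's pipeline: the function name `first_principal_component`, the iteration count, and the two-channel toy "spectra" are all assumptions.

```python
from math import sqrt


def first_principal_component(rows, iterations=200):
    """Return (direction, scores) for the first principal component.

    Uses power iteration on the sample covariance matrix. The all-ones
    start vector is an assumption; it fails only if it happens to be
    orthogonal to the leading eigenvector.
    """
    n, d = len(rows), len(rows[0])
    # Center each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance in the data
        v = [x / norm for x in w]
    # Project the centered rows onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Hypothetical two-channel "spectra" lying exactly on the line y = 2x.
spectra = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
direction, scores = first_principal_component(spectra)
print(direction, scores)
```

The resulting scores are a one-dimensional representation of each spectrum, which could then be passed to any of the classifiers listed above in place of the full spectral range.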


2012 ◽  
Vol 52 (8) ◽  
pp. 2236-2244 ◽  
Author(s):  
Brandon S. Zerbe ◽  
David R. Hall ◽  
Sandor Vajda ◽  
Adrian Whitty ◽  
Dima Kozakov

Symmetry ◽  
2019 ◽  
Vol 11 (2) ◽  
pp. 256 ◽  
Author(s):  
Jiangyong An ◽  
Wanyi Li ◽  
Maosong Li ◽  
Sanrong Cui ◽  
Huanran Yue

Drought stress seriously affects crop growth, development, and grain production. Existing machine learning methods have achieved great progress in drought stress detection and diagnosis. However, such methods are based on a hand-crafted feature extraction process, and their accuracy has much room to improve. In this paper, we propose the use of a deep convolutional neural network (DCNN) to identify and classify maize drought stress. Field drought stress experiments were conducted in 2014. The experiment was divided into three treatments: optimum moisture, light drought, and moderate drought stress. Maize images were obtained every two hours throughout the whole day by digital cameras. To benchmark the accuracy of the DCNN, a comparative experiment was conducted using traditional machine learning on the same dataset. The experimental results demonstrated an impressive performance of the proposed method. For the total dataset, the accuracy of the identification and classification of drought stress was 98.14% and 95.95%, respectively. High accuracy was also achieved on the sub-datasets of the seedling and jointing stages. The identification and classification accuracy levels of the color images were higher than those of the gray images. Furthermore, the comparison experiments on the same dataset demonstrated that the DCNN achieved better performance than the traditional machine learning method (Gradient Boosting Decision Tree, GBDT). Overall, our proposed deep learning-based approach is a very promising method for field maize drought identification and classification based on digital images.

