Toxicity prediction of small drug molecules of aryl hydrocarbon receptor using a proposed ensemble model

2019, Vol 27 (4), pp. 2883-2849
Author(s): Vishan Kumar Gupta, Prashant Singh Rana

2019, Vol 17 (05), pp. 1950033
Author(s): Vishan Kumar Gupta, Prashant Singh Rana

In this study, we develop a quantitative structure–activity relationship (QSAR)-based model for toxicity prediction, with the aim of reducing animal testing, time, and cost in the early stages of drug development. An efficient machine learning model is developed to predict the toxicity of drug molecules that bind to the androgen receptor (AR). Toxicity is predicted in terms of activity, activity score, potency, and efficacy, using various physicochemical properties of the molecules. We propose a multilevel ensemble model: the first level performs ensemble-based classification of activity, and the second level performs ensemble-based regression of activity score, potency, and efficacy for only those drug molecules classified as active at the first level. The AR dataset contains 10,273 drug molecules, of which 461 are active and 9812 are inactive, and each molecule has 1444 features. The dataset is therefore highly imbalanced and has a very large number of features. We first performed feature selection and then resolved the class imbalance problem. A [Formula: see text]-fold cross-validation is performed to measure the consistency of the model. Finally, the proposed multilevel ensemble model is validated and compared with some existing models.
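
The two-level flow described above can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: the abstract does not state how the base models within each level are combined, so majority voting at the classification level and simple averaging at the regression level are assumptions, and the function names, target keys, and toy models are hypothetical.

```python
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical types: a feature vector holds selected molecular descriptors,
# a classifier returns 1 (active) or 0 (inactive), a regressor returns a float.
Features = Sequence[float]
Classifier = Callable[[Features], int]
Regressor = Callable[[Features], float]


def majority_vote(classifiers: List[Classifier], x: Features) -> int:
    """Level 1 (assumed rule): x is active if more than half of the classifiers say so."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 > len(classifiers) else 0


def multilevel_predict(
    classifiers: List[Classifier],
    regressors: Dict[str, List[Regressor]],
    x: Features,
) -> Optional[Dict[str, float]]:
    """Level 2 runs only for molecules that level 1 labels active."""
    if majority_vote(classifiers, x) == 0:
        return None  # inactive: no activity score, potency or efficacy is predicted
    # Assumed rule: average the predictions of each target's regressors.
    return {target: mean(reg(x) for reg in regs) for target, regs in regressors.items()}


# Toy stand-ins for trained models (hypothetical, for illustration only).
clfs = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(sum(x) > 1.0),
]
regs = {
    "activity_score": [lambda x: 10 * x[0], lambda x: 10 * x[1]],
    "potency": [lambda x: x[0] + x[1]],
    "efficacy": [lambda x: 50.0, lambda x: 70.0],
}

print(multilevel_predict(clfs, regs, [0.1, 0.2]))  # None (classified inactive)
print(multilevel_predict(clfs, regs, [0.9, 0.8]))
```

Gating the regressors on the classifier's output means the regression models are only ever asked about molecules predicted to be active, which mirrors the second level being trained on active molecules only.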


2020
Author(s): Vishan Kumar Gupta, Prashant Singh Rana

In silico toxicity prediction techniques are useful for reducing rodent (in vivo) testing. The authors propose a computational (in silico) method for predicting the toxicity of small drug molecules that can bind to antioxidant response elements (AREs), using their various physicochemical properties (molecular descriptors). The PaDEL-Descriptor software is used to extract the features of the drug molecules. The ARE data set contains 7439 drug molecules in total, of which 1147 are active and 6292 are inactive, and each molecule has 1444 features. We propose a novel ensemble-based model that efficiently classifies the active (binding) and inactive (non-binding) compounds of the data set. We first performed feature selection using the random forest importance algorithm in R, and then resolved the class imbalance issue within the ensemble learning method itself by dividing the data set into five data frames, each with an almost equal number of active and inactive drug molecules. The proposed ensemble model, based on the votes of four base classifiers, achieves an accuracy of 97.14%. K-fold cross-validation is conducted to measure the consistency of the proposed ensemble model. Finally, the proposed ensemble model is validated on some new drug molecules and compared with some existing models.
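
One plausible reading of the five-data-frame scheme is to shuffle the 6292 inactive molecules, split them into five near-equal chunks, and pair each chunk with all 1147 active molecules, giving frames of roughly 1147 active versus 1258 inactive compounds. The Python sketch below illustrates that reading together with a vote over four base classifiers. It is an assumption-based sketch rather than the authors' R code; the function names, the fixed seed, and the rule that a 2-2 tie counts as active are all hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple

# (feature vector, label) with label 1 = active (binding), 0 = inactive.
Sample = Tuple[Sequence[float], int]
Classifier = Callable[[Sequence[float]], int]


def balanced_frames(data: List[Sample], n_frames: int = 5, seed: int = 0) -> List[List[Sample]]:
    """Pair every active sample with one of n_frames disjoint chunks of the inactives."""
    actives = [s for s in data if s[1] == 1]
    inactives = [s for s in data if s[1] == 0]
    random.Random(seed).shuffle(inactives)
    # Round-robin slicing yields chunks whose sizes differ by at most one.
    chunks = [inactives[i::n_frames] for i in range(n_frames)]
    return [actives + chunk for chunk in chunks]


def vote(classifiers: List[Classifier], x: Sequence[float]) -> int:
    """Hard vote; with four voters a 2-2 tie is counted as active (assumed rule)."""
    votes = sum(clf(x) for clf in classifiers)
    return 1 if votes * 2 >= len(classifiers) else 0


# Class counts from the ARE data set; the feature values are dummies.
data: List[Sample] = [([0.0], 1)] * 1147 + [([0.0], 0)] * 6292
frames = balanced_frames(data)
print([len(f) for f in frames])  # [2406, 2406, 2405, 2405, 2405]
```

Each frame holds the same 1147 actives and a different slice of about 1258 inactives, so a base model trained on any single frame sees a nearly balanced class ratio while, across all frames, every inactive molecule is still used once.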

