Toxicity prediction of small drug molecules of androgen receptor using multilevel ensemble model
In this study, we develop a quantitative structure–activity relationship (QSAR)-based model for predicting toxicity, with the aim of reducing animal testing, time, and cost in the early stages of drug development. An efficient machine learning model is developed to predict the toxicity of drug molecules that bind to the androgen receptor (AR). Toxicity is predicted in terms of activity, activity score, potency, and efficacy, using various physicochemical properties of the molecules as features. We propose a multilevel ensemble model: the first level performs ensemble-based classification of activity, and the second level performs ensemble-based regression of activity score, potency, and efficacy, but only for those drug molecules classified as active at the first level. The AR dataset contains 10,273 drug molecules, of which 461 are active and 9812 are inactive, and each molecule is described by 1444 features. The dataset is therefore highly imbalanced (roughly 21 inactive molecules for every active one) and very high-dimensional. We first performed feature selection and then resolved the class imbalance problem. A [Formula: see text]-fold cross-validation is performed to assess the consistency of the model. Finally, the proposed multilevel ensemble model is validated and compared with several existing models.
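The two-level prediction flow can be illustrated with the minimal sketch below. The abstract does not specify the base learners or how their outputs are combined, so this sketch makes several assumptions. Hard majority voting is used at the classification level and simple averaging at the regression level. The names `majority_vote`, `average_regression`, and `multilevel_predict` are hypothetical. The lambda "models" are toy stand-ins for trained classifiers and regressors. The key point it shows is the gating: the regression ensembles for activity score, potency, and efficacy run only when the first-level ensemble predicts a molecule as active.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical type aliases: a "base model" is any callable on a feature vector.
Features = Sequence[float]
Classifier = Callable[[Features], int]   # returns 1 (active) or 0 (inactive)
Regressor = Callable[[Features], float]  # returns a continuous estimate


def majority_vote(classifiers: List[Classifier], x: Features) -> int:
    """Level-1 combiner (assumed): hard majority vote over base classifiers."""
    votes = [clf(x) for clf in classifiers]
    return 1 if 2 * sum(votes) > len(votes) else 0


def average_regression(regressors: List[Regressor], x: Features) -> float:
    """Level-2 combiner (assumed): mean of the base regressors' outputs."""
    return mean(reg(x) for reg in regressors)


def multilevel_predict(
    x: Features,
    classifiers: List[Classifier],
    regressors_by_target: Dict[str, List[Regressor]],
) -> Tuple[bool, Dict[str, float]]:
    """Classify first; run the regression ensembles only for predicted actives."""
    if majority_vote(classifiers, x) == 0:
        return False, {}  # predicted inactive: level 2 is skipped entirely
    return True, {
        target: average_regression(regs, x)
        for target, regs in regressors_by_target.items()
    }


# Toy stand-ins for trained models (hypothetical, for illustration only).
classifiers = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[1] > 0.5),
    lambda x: int(x[0] + x[1] > 1.0),
]
regressors_by_target = {
    "activity_score": [lambda x: 100 * x[0], lambda x: 100 * x[1]],
    "potency": [lambda x: 10 * x[0], lambda x: 12 * x[0]],
    "efficacy": [lambda x: 50.0, lambda x: 70.0],
}

print(multilevel_predict((0.8, 0.6), classifiers, regressors_by_target))
print(multilevel_predict((0.2, 0.3), classifiers, regressors_by_target))
```

Skipping the second level for predicted inactives means the regressors can be trained and applied only on the (much smaller) active subset. In a real pipeline the lambdas would be replaced by fitted models, and the combiners by whichever ensembling scheme the study actually uses.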