applicability domain
Recently Published Documents


2022 ◽  
Author(s):  
Satoshi Endo

Polyparameter linear free energy relationships (PP-LFERs) are accurate and robust models to predict equilibrium partition coefficients (K) of organic chemicals. The accuracy of predictions by a PP-LFER depends on the composition of the respective calibration data set. It is generally expected that extrapolation outside the model calibration domain is less accurate than interpolation. In this study, the applicability domain (AD) of PP-LFERs is systematically evaluated by calculation of the leverage (h), a measure of distance from the calibration set in the descriptor space. Repeated simulations with experimental data show that the root mean squared error of predictions increases with h, and that large prediction errors (>3 SDtraining, the standard deviation of training data) occur more frequently when h exceeds the common threshold of 3 hmean, where hmean is the mean h of all training compounds. Nevertheless, analysis also shows that well-calibrated PP-LFERs with many (e.g., 100), diverse, and accurate training data are highly robust against extrapolation; extreme prediction errors (>5 SDtraining) are rare. For such PP-LFERs, 3 hmean may be too strict as the cutoff for AD. Evaluation of published PP-LFERs in terms of their AD using 25 chemically diverse, environmentally relevant chemicals as AD probes indicated that many reported PP-LFERs do not cover organosiloxanes, per- and polyfluorinated alkyl substances, highly polar chemicals, and/or highly hydrophobic chemicals in their AD. It is concluded that calculation of h is useful to identify model extrapolations as well as the strengths and weaknesses of the trained PP-LFERs.
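
The leverage check described in this abstract can be illustrated with a minimal sketch. It assumes the standard regression definition h = x^T (X^T X)^-1 x with an intercept column, which the abstract does not spell out; the one-descriptor toy data and the function names (solve, leverage, in_domain) are hypothetical and not taken from the study.

```python
def solve(a, b):
    """Solve a @ z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def leverage(train, query):
    """Leverage h = x^T (X^T X)^-1 x (assumption: intercept column included)."""
    X = [[1.0] + list(row) for row in train]
    x = [1.0] + list(query)
    p = len(x)
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    z = solve(xtx, x)  # z = (X^T X)^-1 x
    return sum(xi * zi for xi, zi in zip(x, z))


# Hypothetical calibration set with a single descriptor.
train = [[0.0], [1.0], [2.0], [3.0]]
h_train = [leverage(train, row) for row in train]
h_mean = sum(h_train) / len(h_train)  # mean leverage of the training compounds


def in_domain(query, factor=3.0):
    """True if the query leverage does not exceed factor * h_mean."""
    return leverage(train, query) <= factor * h_mean


print(in_domain([1.5]), in_domain([5.0]))
```

The factor of 3 mirrors the common 3 hmean threshold mentioned in the abstract; since the abstract notes this cutoff may be too strict for well-calibrated models, it is left as a parameter.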


Molecules ◽  
2021 ◽  
Vol 26 (24) ◽  
pp. 7548
Author(s):  
Myung-Gyun Kang ◽  
Nam Sook Kang

Drug-induced liver injury (DILI) is a major concern for drug developers, regulators, and clinicians. However, there is no adequate model system to assess drug-associated DILI risk in humans. In the big data era, computational models are expected to play a revolutionary role in this field. This study aimed to develop a deep neural network (DNN)-based model using extended connectivity fingerprints of diameter 4 (ECFP4) to predict DILI risk. Each data set for the predictive model was retrieved and curated from DILIrank, LiverTox, and other literature. The best model was constructed through ten iterations of stratified 10-fold cross-validation, and the applicability domain was defined based on integer ECFP4 bits of the training set which represented substructures. For the robustness test, we employed the concept of the endurance level. The best model showed an accuracy of 0.731, a sensitivity of 0.714, and a specificity of 0.750 on the validation data set in the complete applicability domain. The model was further evaluated with four external data sets and attained an accuracy of 0.867 on 15 drugs with DILI cases reported since 2019. Overall, the results suggested that the ECFP4-based DNN model represents a new tool to identify DILI risk for the evaluation of drug safety.
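
One way to picture a fingerprint-based applicability domain is as a coverage check against the substructure identifiers seen during training. The sketch below is only an illustration under that assumption: the abstract does not give the exact rule used, the integer identifiers are made up rather than real ECFP4 values, and bit_coverage is a hypothetical helper.

```python
def bit_coverage(train_fps, query_fp):
    """Fraction of the query's feature identifiers that occur in the training set."""
    known = set().union(*train_fps)  # every identifier seen in training
    query = set(query_fp)
    if not query:
        return 0.0
    return len(query & known) / len(query)


# Made-up integer identifiers standing in for ECFP4 features.
train_fps = [{11, 42, 97}, {42, 305}]

fully_covered = bit_coverage(train_fps, {11, 305})
half_covered = bit_coverage(train_fps, {11, 999})
print(fully_covered, half_covered)
```

A compound whose coverage is 1.0 contains only substructures the model has seen; lower values indicate unseen substructures, and where to place the cutoff is a design choice this sketch leaves open.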


2021 ◽  
Vol 16 (1) ◽  
pp. 251-265
Author(s):  
Afsar Jahan ◽  
Brij Kishore Sharma ◽  
Vishnu Dutt Sharma

A QSAR study has been carried out on the MMP-13 inhibitory activity of fused pyrimidine derivatives possessing a 1,2,4-triazol-3-yl group as a ZBG in 0D- to 2D-Dragon descriptors. The derived QSAR models have revealed that the number of sulfur atoms (descriptor nS), Balaban mean square distance index (descriptor MSD), molecular electrotopological variation (descriptor DELS), structural information content index of neighborhood symmetry of 2nd and 3rd order (descriptors SIC2 and SIC3), and average valence connectivity index chi-4 (descriptor X4Av), in addition to the 1st order Galvez topological charge index (descriptor JGI1) and global topological charge index (descriptor JGT), played a pivotal role in rationalizing the MMP-13 inhibition activity of the title compounds. Atomic properties such as mass and volume, in terms of the atomic-property-weighted descriptors MATS5m and MATS3v, and certain atom-centred fragments such as CH2RX (descriptor C-006), X--CX--X (descriptor C-044), H attached to heteroatom (descriptor H-050) and H attached to C0(sp3) with 1X attached to next C (descriptor H-052) are also predominant in explaining the MMP-13 inhibition actions of fused pyrimidines. PLS analysis has also corroborated the dominance of the CP-MLR identified descriptors. Applicability domain analysis revealed that the suggested model matches the high-quality parameters with good fitting power and the capability of assessing external data, and that all of the compounds were within the applicability domain of the proposed model and were evaluated correctly.


Author(s):  
Huaqiang Wen ◽  
Yang Su ◽  
Zihao Wang ◽  
Saimeng Jin ◽  
Jingzheng Ren ◽  
...  

Quantitative structure-property relationship (QSPR) studies based on deep neural networks (DNN) are receiving increasing attention due to their excellent performance. A systematic methodology coupling multiple machine learning technologies is proposed to solve vital problems including applicability domain and prediction uncertainty in DNN-based QSPRs. Key features are rapidly extracted from plentiful but chaotic descriptors by principal component analysis (PCA) and kernel PCA. Then, a detailed applicability domain (AD) is defined by the K-means algorithm to avoid unreliable predictions and discover its potential impact on uncertainty. Moreover, prediction uncertainty is analyzed with a dropout-embedded DNN by thousands of independent tests to assess the reliability of predictions. The prediction of flashpoint temperature is employed as a case study demonstrating that the model accuracy is remarkably improved compared with the reference model. More importantly, the proposed methodology breaks through difficulties in analyzing the uncertainty of DNN-based QSPRs and presents an AD correlated with the uncertainty.
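
A cluster-based applicability domain of the kind mentioned here can be sketched as follows. This is an assumption-laden illustration, not the published method: the PCA and kernel PCA steps are omitted, the rule "in domain if no farther from the nearest centroid than the farthest training member of that cluster" is one plausible choice, and the data and function names are hypothetical.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm, seeded with the first k points for determinism."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            d = [math.dist(p, c) for c in centroids]
            clusters[d.index(min(d))].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_radii(points, centroids):
    """Largest training distance to each centroid, used here as the AD radius."""
    radii = [0.0] * len(centroids)
    for p in points:
        d = [math.dist(p, c) for c in centroids]
        idx = d.index(min(d))
        radii[idx] = max(radii[idx], d[idx])
    return radii


def in_domain(query, centroids, radii):
    """True if the query lies within the radius of its nearest cluster."""
    d = [math.dist(query, c) for c in centroids]
    idx = d.index(min(d))
    return d[idx] <= radii[idx]


# Hypothetical two-feature training set forming two well-separated groups.
train = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centroids = kmeans(train, k=2)
radii = cluster_radii(train, centroids)
print(in_domain([0.3, 0.3], centroids, radii), in_domain([5, 5], centroids, radii))
```

A point between the two groups falls outside both radii and would be flagged as outside the domain, which is the behaviour a convex, single-region domain would miss.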


2021 ◽  
Author(s):  
Shunya Sugita ◽  
Masahito Ohue

In the pursuit of research and development of drug discovery, the computational prediction of the target affinity of a drug candidate is useful for screening compounds at an early stage and for verifying the binding potential to an unknown target. The chemogenomics-based method has attracted increased attention as it integrates information pertaining to the drug and target to predict drug-target affinity (DTA). However, the compound and target spaces are vast, and without sufficient training data, proper DTA prediction is not possible. If a DTA prediction is made in this situation, it will potentially lead to false predictions. In this study, we propose a DTA prediction method that can advise whether/when there are insufficient samples in the compound/target spaces based on the concept of the applicability domain (AD) and the data density of the training dataset. AD indicates a data region in which a machine learning model can make reliable predictions. By preclassifying the samples to be predicted by the constructed AD into those within (In-AD) and those outside the AD (Out-AD), we can determine whether a reasonable prediction can be made for these samples. The results of the evaluation experiments based on the use of three different public datasets showed that the AD constructed by the k-nearest neighbor (k-NN) method worked well, i.e., the prediction accuracy of the samples classified by the AD as Out-AD was low, while the prediction accuracy of the samples classified by the AD as In-AD was high.
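
The In-AD / Out-AD preclassification can be sketched with a distance-based k-NN rule. The details below are assumptions for illustration only: the score is the mean Euclidean distance to the k nearest training samples, and the threshold is the largest leave-one-out score among the training samples; the abstract does not state the features, metric, or threshold actually used.

```python
import math


def knn_score(train, query, k=2, skip_self=False):
    """Mean distance from query to its k nearest training samples."""
    dists = sorted(math.dist(query, row) for row in train)
    if skip_self:
        dists = dists[1:]  # drop the zero distance of a training sample to itself
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


# Hypothetical two-feature training set.
train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

# Assumed threshold: the largest leave-one-out score among training samples.
threshold = max(knn_score(train, row, skip_self=True) for row in train)


def classify(query):
    """Label a sample 'In-AD' or 'Out-AD' from its k-NN score."""
    return "In-AD" if knn_score(train, query) <= threshold else "Out-AD"


print(classify([0.5, 0.5]), classify([5.0, 5.0]))
```

Samples labeled Out-AD sit in sparsely populated regions of the training data, which is where the abstract reports lower prediction accuracy.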


Molecules ◽  
2021 ◽  
Vol 26 (7) ◽  
pp. 2098
Author(s):  
Angelica Mazzolari ◽  
Luca Sommaruga ◽  
Alessandro Pedretti ◽  
Giulio Vistoli

(1) Background: Data accuracy plays a key role in determining model performance, and the field of metabolism prediction suffers from the lack of truly reliable data. To enhance the accuracy of metabolic data, we recently proposed a manually curated database collected by a meta-analysis of the specialized literature (MetaQSAR). Here we aim to further increase data accuracy by focusing on publications reporting exhaustive metabolic trees. This selection should indeed reduce the number of false negative data. (2) Methods: A new metabolic database (MetaTREE) was thus collected and utilized to extract a dataset of metabolic data concerning glutathione conjugation (MT-dataset). After proper pre-processing, this dataset, along with the corresponding dataset extracted from MetaQSAR (MQ-dataset), was utilized to develop binary classification models using a random forest algorithm. (3) Results: The comparison of the models generated by the two collected datasets reveals the better performance reached by the MT-dataset (MCC raised from 0.63 to 0.67, sensitivity from 0.56 to 0.58). The analysis of the applicability domain also confirms that the model based on the MT-dataset shows a more robust predictive power with a larger applicability domain. (4) Conclusions: These results confirm that focusing on metabolic trees represents a convenient approach to increase data accuracy by reducing the false negative cases. The encouraging performance shown by the models developed with the MT-dataset invites the use of MetaTREE for predictive studies in the field of xenobiotic metabolism.
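
For readers less familiar with the metrics quoted here, the Matthews correlation coefficient (MCC) and sensitivity can be computed from a binary confusion matrix as sketched below. The counts are invented for illustration and are not the study's results; the function names are hypothetical.

```python
import math


def sensitivity(tp, fn):
    """True positive rate: share of actual positives that are predicted positive."""
    return tp / (tp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Invented counts for a binary classifier (not taken from the study).
tp, tn, fp, fn = 6, 9, 3, 2
print(sensitivity(tp, fn), mcc(tp, tn, fp, fn))
```

Because false negatives enter both formulas, reducing mislabeled negatives in the training data, as the abstract argues, can raise both sensitivity and MCC.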

