Retrained Classification of Tyrosinase Inhibitors and “In Silico” Potency Estimation by Using Atom-Type Linear Indices

Author(s):  
Gerardo M. Casañola-Martín ◽  
Mahmud Tareq Hassan Khan ◽  
Huong Le-Thi-Thu ◽  
Yovani Marrero-Ponce ◽  
Ramón García-Domenech ◽  
...  

In this paper, the authors present an effort to extend the applicability domain (AD) by retraining models on a database of 701 highly dissimilar molecules with anti-tyrosinase activity and 728 drugs with other uses. Atom-based linear indices and best-subset linear discriminant analysis (LDA) were used to develop individual classification models. Eighteen individual classification-based QSAR models for tyrosinase inhibitory activity were obtained, with global accuracy ranging from 88.15% to 91.60% in the training set and Matthews correlation coefficients (C) ranging from 0.76 to 0.82. In the external validation set, global classification accuracy was above 85.99% and C was above 0.72. All individual models were validated according to OECD principles. A brief analysis of the AD for the training set of 478 compounds and the new active compounds included in the retraining was carried out. Various multiclassifier systems assembled from the eighteen models using different selection criteria were obtained, which makes it possible to select the best strategy for a particular problem. The assembled multiclassifier systems were also used to estimate the potency of the compounds identified as active, using eighteen potency models validated according to OECD principles.
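The two headline metrics in this abstract, global accuracy and the Matthews correlation coefficient (C), can both be computed from a binary confusion matrix. The following is a minimal sketch using the standard definitions; the counts are hypothetical and are not taken from the study.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all compounds that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, for illustration only (not the paper's data).
tp, tn, fp, fn = 90, 85, 15, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
print(f"C (MCC)  = {matthews_cc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, C stays informative when the active and inactive classes are unbalanced, which is probably why the two are reported together.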



2019 ◽  
Vol 37 (15_suppl) ◽  
pp. e15718-e15718
Author(s):  
Shuichi Mitsunaga ◽  
Shogo Nomura ◽  
Kazuo Hara ◽  
Yukiko Takayama ◽  
Makoto Ueno ◽  
...  

e15718 Background: The diagnostic value of serum microRNAs (miRNA) measured on a highly sensitive microarray for pancreatobiliary cancer (PBca) has been demonstrated. This study attempted to build and validate a signature composed of multiple serum miRNA markers for discriminating PBca from healthy controls. Methods: A multicenter prospective study on the diagnostic performance of serum miRNAs was conducted. Patients (pts) with treatment-naïve PBca and healthy participants aged ≥60 years were enrolled. Clinical data and sera were collected. The target population was randomly divided into training and validation cohorts with an allocation ratio of 2:1. Twenty-nine serum miRNA markers from the microarray data were analyzed. Fisher’s linear discriminant analysis was performed for every combination of the markers, and the resulting sensitivity, specificity and AUC of the ROC curve for discriminating PBca from healthy controls were calculated for each combination. Marker combinations with a sensitivity/specificity (SN/SP) of ≥80%/90% and a high AUC in comparison with the AUC of CA19-9 were defined as diagnostic miRNA signatures and were selected in the training cohort. Next, the signatures that showed good reproducibility in the validation cohort were retained. As an independent external cohort, PBca pts and healthy participants with pooled frozen sera were enrolled, and the identified miRNA signatures were further validated. Results: A total of 546 participants (80 healthy and 223 PBca in the training set, 40 healthy and 104 PBca in the validation set, 49 healthy and 50 PBca in the external validation set) were analyzed in this study. Four serum miRNA combinations were identified as diagnostic miRNA signatures. In the training set, four miRNA signatures, comprising 10 miRNAs, were developed. For the best-performing miRNA signature, the SN/SP and AUC in the validation and external validation cohorts were 84/90% and 0.95 (CA19-9: 73/95% and 0.88) and 84/90% and 0.93 (CA19-9: 80/94% and 0.87), respectively. Conclusions: Diagnostic serum miRNA signatures for PBca were identified in this study.
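The selection rule in the Methods (SN ≥ 80%, SP ≥ 90%, and an AUC that compares favorably with CA19-9) can be sketched as a simple filter. The `Candidate` class, the `select_signatures` function, and all numbers below are hypothetical. Reading "high AUC in comparison with CA19-9" as "strictly greater than the CA19-9 AUC" is an assumption, not something the abstract states.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One marker combination with its training-cohort performance."""
    name: str
    sensitivity: float
    specificity: float
    auc: float


def select_signatures(candidates, reference_auc, min_sn=0.80, min_sp=0.90):
    """Keep combinations meeting the SN/SP thresholds and beating the reference AUC.

    Assumption: "high AUC in comparison with CA19-9" is read as auc > reference_auc.
    """
    return [
        c for c in candidates
        if c.sensitivity >= min_sn
        and c.specificity >= min_sp
        and c.auc > reference_auc
    ]


# Hypothetical candidates and reference AUC, for illustration only.
candidates = [
    Candidate("sig_A", 0.86, 0.93, 0.94),  # passes all three criteria
    Candidate("sig_B", 0.78, 0.95, 0.96),  # fails the sensitivity threshold
    Candidate("sig_C", 0.85, 0.92, 0.80),  # fails the AUC comparison
]
selected = select_signatures(candidates, reference_auc=0.85)
print([c.name for c in selected])
```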


Author(s):  
Shinjita Ghosh ◽  
Supratik Kar ◽  
Jerzy Leszczynski

Birds are important species in ecology and have been evaluated in an effort to understand the toxic effects of endocrine disruption. The ecotoxicity of 56 industrial chemicals classified as endocrine disruptors was modeled for an important avian species, Anas platyrhynchos, using classification- and regression-based quantitative structure-activity relationship (QSAR) models. The classification- and regression-based QSAR models were developed using linear discriminant analysis (LDA) and partial least squares (PLS) tools, respectively. All models were rigorously validated using internal and external validation metrics, followed by a randomization test, an applicability domain (AD) study, and intelligent consensus prediction across all individual models. Features such as topological distances of 1, 3, and 5 between the atom pairs O-P, C-P, and N-S, respectively, along with the CR3X fragment, can be responsible for an increase in toxicity. By contrast, the presence of S-Cl at topological distance 6 is associated with lower toxicity towards A. platyrhynchos. The developed chemometric models can offer significant evidence and guidance for virtual screening as well as for toxicity prediction of new and/or untested chemical libraries towards this specific avian species.


2019 ◽  
Vol 31 (5) ◽  
pp. 665-673 ◽  
Author(s):  
Maud Menard ◽  
Alexis Lecoindre ◽  
Jean-Luc Cadoré ◽  
Michèle Chevallier ◽  
Aurélie Pagnon ◽  
...  

Accurate staging of hepatic fibrosis (HF) is important for treatment and prognosis of canine chronic hepatitis. HF scores are used in human medicine to indirectly stage and monitor HF, decreasing the need for liver biopsy. We developed a canine HF score to screen for moderate or greater HF. We included 96 dogs in our study, including 5 healthy dogs. A liver biopsy for histologic examination and a biochemistry profile were performed on all dogs. The dogs were randomly split into a training set of 58 dogs and a validation set of 38 dogs. A HF score that included alanine aminotransferase, alkaline phosphatase, total bilirubin, potassium, and gamma-glutamyl transferase was developed in the training set. Model performance was confirmed using the internal validation set, and was similar to the performance in the training set. The overall sensitivity and specificity for the study group were 80% and 70% respectively, with an area under the curve of 0.80 (0.71–0.90). This HF score could be used for indirect diagnosis of canine HF when biochemistry panels are performed on the Konelab 30i (Thermo Scientific), using reagents as in our study. External validation is required to determine if the score is sufficiently robust to utilize biochemical results measured in other laboratories with different instruments and methodologies.


Molecules ◽  
2019 ◽  
Vol 24 (10) ◽  
pp. 2006 ◽  
Author(s):  
Liadys Mora Lagares ◽  
Nikola Minovski ◽  
Marjana Novič

P-glycoprotein (P-gp) is a transmembrane protein that actively transports a wide variety of chemically diverse compounds out of the cell. It is highly associated with the ADMET (absorption, distribution, metabolism, excretion and toxicity) properties of drugs/drug candidates and contributes to decreasing toxicity by eliminating compounds from cells, thereby preventing intracellular accumulation. Therefore, in the drug discovery and toxicological assessment process it is advisable to pay attention to whether a compound under development could be transported by P-gp or not. In this study, an in silico multiclass classification model capable of predicting the probability of a compound to interact with P-gp was developed using a counter-propagation artificial neural network (CP ANN) based on a set of 2D molecular descriptors, as well as an extensive dataset of 2512 compounds (1178 P-gp inhibitors, 477 P-gp substrates and 857 P-gp non-active compounds). The model provided a good classification performance, producing non error rate (NER) values of 0.93 for the training set and 0.85 for the test set, while the average precision (AvPr) was 0.93 for the training set and 0.87 for the test set. An external validation set of 385 compounds was used to challenge the model’s performance. On the external validation set the NER and AvPr values were 0.70 for both indices. We believe that this in silico classifier could be effectively used as a reliable virtual screening tool for identifying potential P-gp ligands.
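For a multiclass model such as this one, the two reported indices can be read off a confusion matrix. The sketch below assumes the usual chemometrics definitions: NER is the mean of the per-class sensitivities, and AvPr is the mean of the per-class precisions (not the ranking-based "average precision" used in information retrieval). The matrix is hypothetical, and the functions assume every class has at least one true and one predicted sample.

```python
def non_error_rate(cm):
    """Mean per-class sensitivity; cm[i][j] = samples of true class i predicted as j."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm))]
    return sum(recalls) / len(recalls)


def average_precision(cm):
    """Mean per-class precision (diagonal count over column total)."""
    n = len(cm)
    precisions = [cm[j][j] / sum(cm[i][j] for i in range(n)) for j in range(n)]
    return sum(precisions) / n


# Hypothetical 3-class confusion matrix (inhibitor, substrate, non-active).
cm = [
    [8, 1, 1],
    [2, 6, 2],
    [0, 1, 9],
]
print(f"NER  = {non_error_rate(cm):.3f}")
print(f"AvPr = {average_precision(cm):.3f}")
```

Because NER averages over classes rather than over samples, it is not inflated by the largest class (here, the 1178 inhibitors).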


2020 ◽  
Author(s):  
Ruyi Zhang ◽  
Mei Xu ◽  
Xiangxiang Liu ◽  
Miao Wang ◽  
Qiang Jia ◽  
...  

Abstract. Objectives: To develop a clinically predictive nomogram model that can maximize patients’ net benefit when predicting the prognosis of patients with thyroid carcinoma, based on the 8th edition of the AJCC Cancer Staging Manual. Methods: We selected 134,962 thyroid carcinoma patients diagnosed between 2004 and 2015 from the SEER database, with details of the 8th edition of the AJCC Cancer Staging Manual, and randomly separated those patients into two datasets. The first dataset, the training set, was used to build the nomogram model, accounting for 80% (94,474 cases), and the second dataset, the validation set, was used for external validation, accounting for 20% (40,488 cases). We then evaluated its clinical utility by decision curve analysis (DCA) and its accuracy by calculating the AUC and C-index and by examining the calibration plot. Results: Decision curve analysis showed that the final prediction model could maximize patients’ net benefit. In the training set and validation set, Harrell’s concordance indexes were 0.9450 and 0.9421, respectively. Sensitivity and specificity at the three predicted time points (12 months, 36 months and 60 months) in both datasets were all above 0.80, except that sensitivity at the 60-month time point in the validation set was 0.7662. AUCs at the three predicted time points were 0.9562, 0.9273 and 0.9009, respectively, for the training set, and 0.9645, 0.9329 and 0.8894, respectively, for the validation set. The calibration plot also showed that the nomogram model was well calibrated. Conclusion: The final nomogram model provided both excellent accuracy and clinical utility and should be able to predict patients’ survival probability visually and accurately.
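Harrell's concordance index, reported above, is the fraction of comparable patient pairs in which the patient with the higher predicted risk experiences the event first. The following is a minimal, quadratic-time sketch under simplifying assumptions: pairs with tied survival times are ignored, and tied risk scores count as half-concordant. The function name and the toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data.

    A pair (i, j) is comparable when subject i had an observed event
    and times[i] < times[j]. Tied times are skipped (simplification).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up in months, event flag (1 = death, 0 = censored), predicted risk.
times = [1, 2, 3, 4]
events = [1, 1, 0, 1]
print(harrell_c_index(times, events, [4, 3, 2, 1]))  # risk ordered exactly by time
print(harrell_c_index(times, events, [4, 1, 2, 3]))  # partly mis-ordered risk
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which puts the reported 0.9450 and 0.9421 in context.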


2021 ◽  
Vol 11 ◽  
Author(s):  
Aihua Wu ◽  
Zhigang Liang ◽  
Songbo Yuan ◽  
Shanshan Wang ◽  
Weidong Peng ◽  
...  

Background: The diagnostic value of clinical and laboratory features to differentiate between malignant pleural effusion (MPE) and benign pleural effusion (BPE) has not yet been established. Objectives: The present study aimed to develop and validate the diagnostic accuracy of a scoring system based on a nomogram to distinguish MPE from BPE. Methods: A total of 1,239 eligible patients with PE were recruited in this study and randomly divided into a training set and an internal validation set at a ratio of 7:3. Logistic regression analysis was performed in the training set, and a nomogram was developed using selected predictors. The diagnostic accuracy of an innovative scoring system based on the nomogram was established and validated in the training, internal validation, and external validation sets (n = 217). The discriminatory power and the calibration and clinical values of the prediction model were evaluated. Results: Seven variables [effusion carcinoembryonic antigen (CEA), effusion adenosine deaminase (ADA), erythrocyte sedimentation rate (ESR), PE/serum CEA ratio (CEA ratio), effusion carbohydrate antigen 19-9 (CA19-9), effusion cytokeratin 19 fragment (CYFRA 21-1), and serum lactate dehydrogenase (LDH)/effusion ADA ratio (cancer ratio, CR)] were validated and used to develop a nomogram. The prediction model showed both good discrimination and calibration capabilities for all sets. A scoring system was established based on the nomogram scores to distinguish MPE from BPE. The scoring system showed favorable diagnostic performance in the training set [area under the curve (AUC) = 0.955, 95% confidence interval (CI) = 0.942–0.968], the internal validation set (AUC = 0.952, 95% CI = 0.932–0.973), and the external validation set (AUC = 0.973, 95% CI = 0.956–0.990). In addition, the scoring system achieved satisfactory discriminative abilities at separating lung cancer-associated MPE from tuberculous pleurisy effusion (TPE) in the combined training and validation sets. Conclusions: The present study developed and validated a scoring system based on seven parameters. The scoring system exhibited a reliable diagnostic performance in distinguishing MPE from BPE and might guide clinical decision-making.
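The AUC values quoted for the scoring system have a direct probabilistic reading: the chance that a randomly chosen malignant case receives a higher score than a randomly chosen benign case, with ties counted as one half. The following is a minimal sketch of that pairwise definition; the function name and the scores are hypothetical, and this is not how the study computed its AUCs or confidence intervals.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    if not pos_scores or not neg_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical nomogram-based scores for illustration only.
mpe_scores = [0.9, 0.8, 0.4]  # malignant effusions
bpe_scores = [0.5, 0.3]       # benign effusions
print(f"AUC = {roc_auc(mpe_scores, bpe_scores):.3f}")
```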


PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0244693
Author(s):  
Lingchen Wang ◽  
Wenhua Wang ◽  
Shaopeng Zeng ◽  
Huilie Zheng ◽  
Quqin Lu

Breast cancer is the most common malignant disease in women. Metastasis is the foremost cause of death. Breast tumor cells have a proclivity to metastasize to specific organs. The lung is one of the most common sites of breast cancer metastasis. Therefore, we aimed to build a useful and convenient prediction tool based on several genes that may affect lung metastasis-free survival (LMFS). We preliminarily identified 319 genes associated with lung metastasis in the training set GSE5327 (n = 58). Enrichment analysis of GO functions and KEGG pathways was conducted based on these genes. The best genes for modeling were selected using a robust likelihood-based survival modeling approach: GOLGB1, TMEM158, CXCL8, MCM5, HIF1AN, and TSPAN31. A prognostic nomogram for predicting lung metastasis in breast cancer was developed based on these six genes. The effectiveness of the nomogram was evaluated in the training set GSE5327 and the validation set GSE2603. Both the internal validation and the external validation manifested the effectiveness of our 6-gene prognostic nomogram in predicting the lung metastasis risk of breast cancer patients. On the other hand, in the validation set GSE2603, we found that neither the six genes in the nomogram nor the risk predicted by the nomogram were associated with bone metastasis of breast cancer, preliminarily suggesting that these genes and nomogram were specifically associated with lung metastasis of breast cancer. What’s more, five genes in the nomogram were significantly differentially expressed between breast cancer and normal breast tissues in the TIMER database. In conclusion, we constructed a new and convenient prediction model based on 6 genes that showed practical value in predicting the lung metastasis risk for clinical breast cancer patients. In addition, some of these genes could be treated as potential metastasis biomarkers for antimetastatic therapy in breast cancer. 
The evolution of this nomogram will provide a good reference for the prediction of tumor metastasis to other specific organs.


Molecules ◽  
2020 ◽  
Vol 25 (10) ◽  
pp. 2332 ◽  
Author(s):  
Alessandra Biancolillo ◽  
Martina Foschi ◽  
Angelo Antonio D’Archivio

One-hundred and fourteen samples of saffron harvested in four different Italian areas (three in Central Italy and one in the South) were investigated by IR and UV-Vis spectroscopies. Two different multi-block strategies, Sequential and Orthogonalized Partial Least Squares Linear Discriminant Analysis (SO-PLS-LDA) and Sequential and Orthogonalized Covariance Selection Linear Discriminant Analysis (SO-CovSel-LDA), were used to simultaneously handle the two data blocks and classify samples according to their geographical origin. Both multi-block approaches provided very satisfying results. Each model was investigated in order to understand which spectral variables contribute the most to the discrimination of samples, i.e., to the characterization of saffron harvested in the four different areas. The most accurate solution was provided by SO-PLS-LDA, which only misclassified three test samples over 31 (in external validation).


2021 ◽  
Author(s):  
Jiejun Lin ◽  
Huang Su ◽  
Yaqi Guan ◽  
Qingjie Zhou ◽  
Jie Pan ◽  
...  

Abstract Background and Aim. Predicting the risk of gastric cancer (GC) is important for endoscopists because early detection of GC determines the selection of the best treatment strategy and the prognosis of patients. The aim of the study was to evaluate the utility of a predictive nomogram for GC based on the Kyoto classification of gastritis. Methods. This was a retrospective study that included 2639 patients who underwent esophagogastroduodenoscopy and a serum pepsinogen (PG) assay from January 2020 to November 2020 at the Endoscopy Center of the Department of Gastroenterology, Wenzhou Central Hospital. Routine biopsy was conducted to classify lesions as benign or malignant pathologically. All cases were randomly divided into the training set (70%) and the validation set (30%) by using the bootstrap method. A nomogram was formulated according to multivariate analysis of the training set. The predictive accuracy and discriminative ability of the nomogram were assessed by the concordance index (C-index), the area under the curve (AUC) of the receiver operating characteristic (ROC) curve, and the calibration curve, and were validated in the validation set. Results. Multivariate analysis indicated that age, sex, PG I/II ratio and Kyoto classification scores were independent predictive variables for GC. The C-index of the nomogram in the training set was 0.79 (95% CI: 0.74 to 0.84) and the AUC of the ROC curve was 0.79. The calibration curve of the nomogram demonstrated an optimal agreement between the predicted and observed probability of the risk of GC. In the validation set, the C-index was 0.86 (95% CI: 0.79 to 0.94), with a calibration curve showing better agreement. Conclusion. The nomogram was shown to have high predictive value for GC.

