The impact of compound library size on the performance of scoring functions for structure-based virtual screening

2020 ◽  
Author(s):  
Louison Fresnais ◽  
Pedro J. Ballester

Abstract Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with at least nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A three-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets. Contact: [email protected] Supplementary information: An online-only supplementary results file is enclosed. Biographical Note: L. Fresnais carried out a master's research project directly supervised by P.J. Ballester and will soon be starting a PhD. P.J. Ballester has been working on virtual screening for over 15 years. He is a group leader and research scientist at the cancer research centre of INSERM, the French National Institute of Health & Medical Research.

Author(s):  
Louison Fresnais ◽  
Pedro J Ballester

Abstract Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
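The hit-rate comparison, and the observation that classical and ML-based SFs tend to retrieve different actives, can be illustrated with a minimal sketch. The molecule IDs, scores and activity labels below are invented for illustration and are not data from the study. The helper names `top_n` and `hit_rate` are hypothetical, and the sketch assumes that a higher score means a better-ranked molecule.

```python
def top_n(scores, n):
    """Return the IDs of the n best-scored molecules (higher score = better)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n]


def hit_rate(selected, actives):
    """Fraction of the selected molecules that are true actives."""
    return sum(1 for mol in selected if mol in actives) / len(selected)


# Toy library of eight molecules; all values are invented for illustration.
actives = {"m1", "m4", "m7"}
classical_scores = {"m1": 9.0, "m2": 8.5, "m3": 8.0, "m4": 3.0,
                    "m5": 7.5, "m6": 2.0, "m7": 1.0, "m8": 6.0}
ml_scores = {"m1": 2.0, "m2": 1.0, "m3": 6.0, "m4": 9.0,
             "m5": 3.0, "m6": 5.0, "m7": 8.0, "m8": 4.0}

# Select the top four molecules under each scoring function.
top_classical = top_n(classical_scores, 4)
top_ml = top_n(ml_scores, 4)

# Actives retrieved by each SF, and by the two SFs combined.
found_classical = set(top_classical) & actives
found_ml = set(top_ml) & actives
found_combined = found_classical | found_ml

print("classical hit rate:", hit_rate(top_classical, actives))
print("ML hit rate:", hit_rate(top_ml, actives))
print("actives found by both SFs combined:", sorted(found_combined))
```

In this toy case the two rankings recover disjoint sets of actives, so taking the union of both selections recovers more actives than either SF alone. That is the kind of complementarity the abstract reports, although the sketch says nothing about how often it occurs in practice.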


2021 ◽  
Author(s):  
Sarah Hall-Swan ◽  
Dinler A. Antunes ◽  
Didier Devaurs ◽  
Mauricio M. Rigo ◽  
Lydia E. Kavraki ◽  
...  

Abstract Motivation: Recent efforts to computationally identify inhibitors for SARS-CoV-2 proteins have largely ignored the issue of receptor flexibility. We have implemented a computational tool for ensemble docking with the SARS-CoV-2 proteins, including the main protease (Mpro), papain-like protease (PLpro) and RNA-dependent RNA polymerase (RdRp). Results: Ensembles of other SARS-CoV-2 proteins are being prepared and made available through a user-friendly docking interface. Plausible binding modes between conformations of a selected ensemble and an uploaded ligand are generated by DINC, our parallelized meta-docking tool. Binding modes are scored with three scoring functions, and account for the flexibility of both the ligand and receptor. Additional details on our methods are provided in the supplementary material. Availability: dinc-covid.kavrakilab.org Supplementary information: Details on methods for ensemble generation and docking are provided as supplementary data. Contact: [email protected], [email protected]


2019 ◽  
Vol 35 (20) ◽  
pp. 3989-3995 ◽  
Author(s):  
Hongjian Li ◽  
Jiangjun Peng ◽  
Pavel Sidorov ◽  
Yee Leung ◽  
Kwong-Sak Leung ◽  
...  

Abstract Motivation Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. Results We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes to the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from dissimilar training complexes to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the rest of SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. Availability and implementation https://github.com/HongjianLi/MLSF Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (23) ◽  
pp. 4994-5002 ◽  
Author(s):  
Jiahua He ◽  
Huanyu Tao ◽  
Sheng-You Huang

Abstract Motivation: Given the importance of protein–ribonucleic acid (RNA) interactions in many biological processes, a variety of docking algorithms have been developed to predict the complex structure from individual protein and RNA partners in the past decade. However, due to the impact of molecular flexibility, the performance of current methods has hit a bottleneck in realistic unbound docking. Pushing the limit, we have proposed a protein-ensemble–RNA docking strategy to explicitly consider the protein flexibility in protein–RNA docking through an ensemble of multiple protein structures, which is referred to as MPRDock. Instead of taking conformations from MD simulations or experimental structures, we obtained the multiple structures of a protein by building models from its homologous templates in the Protein Data Bank (PDB). Results: Our approach can not only avoid the reliability issue of structures from MD simulations but also circumvent the limited number of experimental structures for a target protein in the PDB. Tested on 68 unbound–bound and 18 unbound–unbound protein–RNA complexes, our MPRDock/DITScorePR considerably improved the docking performance and achieved a significantly higher success rate than single-protein rigid docking whether pseudo-unbound templates are included or not. Similar improvements were also observed when combining our ensemble docking strategy with other scoring functions. The present homology model-based ensemble docking approach will have a general application in molecular docking for other interactions. Availability and implementation: http://huanglab.phys.hust.edu.cn/mprdock/ Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 12 ◽  
Author(s):  
Sai Akilesh M ◽  
Ashish Wadhwani

Infectious diseases have been prevalent for many decades, and viral pathogens have caused global health crises and economic meltdown on a devastating scale. The high occurrence of newer viral infections in recent years, in spite of the progress achieved in the pharmaceutical sciences, underlines the critical need for newer and more effective antiviral therapies and diagnostics. The incidence of multi-drug resistance and adverse effects due to the prolonged use of antiviral therapy is also a major concern. Nanotechnology offers a cutting-edge platform for the development of novel compounds and formulations for biomedical applications. The unique properties of nano-based materials can be attributed to the multi-fold increase in the surface-to-volume ratio at the nano-scale and to tunable surface properties such as charge and chemical moieties. Ideal pharmaceutical properties such as increased bioavailability and retention times, lower toxicity profiles, sustained-release formulations, lower dosage forms and, most importantly, targeted drug delivery can be achieved through nanotechnology. The most extensively researched nano-based materials are metal and polymeric nanoparticles, dendrimers and micelles, nano-drug delivery vesicles, liposomes and lipid-based nanoparticles. This review article outlines the impact of nanotechnology on the treatment of Human Immunodeficiency Virus (HIV) and Herpes Simplex Virus (HSV) infections during the last decade.


2020 ◽  
Vol 7 (04) ◽  
Author(s):  
A B Priyanshu ◽  
M K Singh ◽  
Mukesh Kumar ◽  
Vipin Kumar ◽  
Sunil Malik ◽  
...  

An experiment was conducted at the Horticultural Research Centre, SVP University of Agriculture and Technology, Meerut (UP), during the Rabi season of 2018-19 to assess the impact of different INM doses on the yield and quality parameters of garlic. Ten treatments combining inorganic fertilizers, organic fertilizers and bio-fertilizers were used in a Randomized Block Design and replicated thrice: T1 - Control; T2 - RDF (100:50:50 kg NPK ha⁻¹); T3 - RDF + 20 kg sulphur + FYM 20 ton ha⁻¹; T4 - RDF + 20 kg sulphur + VC 4 ton ha⁻¹; T5 - 75% RDF + 40 kg sulphur + 5 ton FYM ha⁻¹ + PSB 5 kg ha⁻¹; T6 - 75% RDF + 40 kg sulphur + 2 ton VC + Azotobacter 5 kg ha⁻¹; T7 - 75% RDF + 40 kg sulphur + FYM 3 ton + VC 1 ton + PSB 5 kg + Azotobacter 5 kg ha⁻¹; T8 - 50% RDF + 40 kg sulphur + FYM 5 ton + VC 2 ton + PSB 5 kg ha⁻¹; T9 - 50% RDF + 40 kg sulphur + FYM 5 ton + VC 2 ton + Azotobacter 5 kg ha⁻¹; and T10 - 50% RDF + 40 kg sulphur + FYM 5 ton + VC 2 ton + PSB 5 kg + Azotobacter 5 kg ha⁻¹. Of these, T7 was found to be significantly superior in terms of yield and yield-attributing parameters of garlic.


2020 ◽  
Vol 41 (S1) ◽  
pp. s111-s112
Author(s):  
Mohammed Alsuhaibani ◽  
Mohammed Alzunitan ◽  
Kyle Jenn ◽  
Daniel Diekema ◽  
Michael Edmond ◽  
...  

Background: Surveillance for surgical site infections (SSI) is recommended by the CDC. Currently, colon and abdominal hysterectomy SSI rates are publicly available and impact hospital reimbursement. However, the CDC NHSN allows surgical procedures to be abstracted based on International Classification of Diseases, Tenth Revision (ICD-10) or current procedural terminology (CPT) codes. We assessed the impact of using ICD and/or CPT codes on the number of cases abstracted and SSI rates. Methods: We retrieved administrative codes (ICD and/or CPT) for procedures performed at the University of Iowa Hospitals & Clinics over 1 year: October 2018–September 2019. We included 10 procedure types: colon, hysterectomy, cesarean section, breast, cardiac, craniotomy, spinal fusion, laminectomy, hip prosthesis, and knee prosthesis surgeries. We then calculated the number of procedures that would be abstracted if we used different permutations in administration codes: (1) ICD codes only, (2) CPT codes only, (3) both ICD and CPT codes, and (4) at least 1 code from either ICD or CPT. We then calculated the impact on SSI rates based on any of the 4 coding permutations. Results: In total, 9,583 surgical procedures and 180 SSIs were detected during the study period using the fourth method (ICD or CPT codes). Denominators varied according to procedure type and coding method used. The number of procedures abstracted for breast surgery had a >10-fold difference if reported based on ICD only versus ICD or CPT codes (104 vs 1,109). Hip prosthesis had the lowest variation (638 vs 767). For SSI rates, cesarean section showed almost a 3-fold increment (2.6% when using ICD only to 7.32% with both ICD & CPT), whereas abdominal hysterectomy showed nearly a 2-fold increase (1.14% when using CPT only to 2.22% with both ICD & CPT codes). 
However, SSI rates remained fairly similar for craniotomy (0.14% absolute difference), hip prosthesis (0.24% absolute difference), and colon (0.09% absolute difference) despite differences in the number of abstracted procedures and coding methods. Conclusions: Denominators and SSI rates vary depending on the coding method used. Variations in the number of procedures abstracted and their subsequent impact on SSI rates were not predictable. Variations in coding methods used by hospitals could impact interhospital comparisons and benchmarking, potentially leading to disparities in public reporting and hospital penalties. Funding: None. Disclosures: None.
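The way the four coding methods change both the denominator and the SSI rate can be sketched as follows. This is an illustrative sketch only: the `Procedure` record, the `ssi_rate` helper and the eight toy procedures are hypothetical and not taken from the study. It also assumes that "ICD codes only" means selecting every procedure that carries a qualifying ICD code, whether or not it also carries a CPT code, and likewise for "CPT codes only".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Procedure:
    has_icd: bool  # procedure matched a qualifying ICD-10 code
    has_cpt: bool  # procedure matched a qualifying CPT code
    ssi: bool      # a surgical site infection was recorded


# One selection rule per coding method described in the abstract.
SELECTORS = {
    "icd_only": lambda p: p.has_icd,
    "cpt_only": lambda p: p.has_cpt,
    "both": lambda p: p.has_icd and p.has_cpt,
    "either": lambda p: p.has_icd or p.has_cpt,
}


def ssi_rate(procedures, method):
    """Return (denominator, SSI rate in percent) for one coding method."""
    selected = [p for p in procedures if SELECTORS[method](p)]
    if not selected:
        return 0, None  # no procedures abstracted, so the rate is undefined
    infections = sum(1 for p in selected if p.ssi)
    return len(selected), 100.0 * infections / len(selected)


# Toy data, invented for illustration.
procedures = [
    Procedure(True, True, True),
    Procedure(True, True, False),
    Procedure(True, False, False),
    Procedure(True, False, False),
    Procedure(False, True, True),
    Procedure(False, True, False),
    Procedure(False, True, False),
    Procedure(False, True, False),
]

for method in SELECTORS:
    denominator, rate = ssi_rate(procedures, method)
    print(f"{method}: n={denominator}, SSI rate={rate:.2f}%")
```

Even in this small example the same set of procedures gives a different denominator and a different rate under each method, which is the effect the abstract describes for interhospital comparisons.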


2021 ◽  
Vol 14 (5) ◽  
pp. 453
Author(s):  
Gabriela Wiergowska ◽  
Dominika Ludowicz ◽  
Kamil Wdowiak ◽  
Andrzej Miklaszewski ◽  
Kornelia Lewandowska ◽  
...  

To improve the physicochemical properties of vardenafil hydrochloride (VAR), its amorphous form and combinations with the excipients hydroxypropyl methylcellulose (HPMC) and β-cyclodextrin (β-CD) were prepared. The impact of the modification on physicochemical properties was estimated by comparing amorphous mixtures of VAR to their crystalline form. The amorphous form of VAR was obtained by freeze-drying. The identity of the amorphous dispersion of VAR was confirmed by X-ray powder diffraction (PXRD) and differential scanning calorimetry (DSC), supported by Fourier-transform infrared spectroscopy (FT-IR) coupled with density functional theory (DFT) calculations. The amorphous mixtures of VAR increased its apparent solubility compared to the crystalline form. Moreover, a nearly 1.3-fold increase in the permeability of amorphous VAR through membranes simulating gastrointestinal epithelium was observed as a consequence of the changes in apparent solubility (Papp crystalline VAR = 6.83 × 10⁻⁶ cm/s vs. Papp amorphous VAR = 8.75 × 10⁻⁶ cm/s). The effect was strongest for its combination with β-CD in the ratio of 1:5, which gave a more than 1.5-fold increase (Papp amorphous VAR = 8.75 × 10⁻⁶ cm/s vs. Papp amorphous VAR:β-CD 1:5 = 13.43 × 10⁻⁶ cm/s). The stability of the amorphous VAR was confirmed for 7 months. HPMC and β-CD are effective modifiers of its apparent solubility and permeation through membranes simulating gastrointestinal epithelium, suggesting a possibility of a stronger pharmacological effect.
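The fold increases quoted above follow directly from the reported apparent permeability (Papp) values. The short check below reproduces that arithmetic. The dictionary keys and the `fold_change` helper are names chosen here for illustration. The ratio of the β-CD mixture to crystalline VAR is not stated in the abstract and is included only as a derived figure.

```python
def fold_change(after, before):
    """Ratio of two measurements expressed in the same units."""
    return after / before


# Reported Papp values, in units of 10^-6 cm/s.
papp = {
    "crystalline VAR": 6.83,
    "amorphous VAR": 8.75,
    "amorphous VAR:beta-CD 1:5": 13.43,
}

# Amorphous vs crystalline VAR ("nearly 1.3-fold" in the abstract).
amorphous_vs_crystalline = fold_change(papp["amorphous VAR"],
                                       papp["crystalline VAR"])

# beta-CD 1:5 mixture vs amorphous VAR ("more than 1.5-fold" in the abstract).
cd_vs_amorphous = fold_change(papp["amorphous VAR:beta-CD 1:5"],
                              papp["amorphous VAR"])

# Derived here, not stated in the abstract: beta-CD 1:5 mixture vs crystalline VAR.
cd_vs_crystalline = fold_change(papp["amorphous VAR:beta-CD 1:5"],
                                papp["crystalline VAR"])

print(round(amorphous_vs_crystalline, 2))  # 1.28
print(round(cd_vs_amorphous, 2))           # 1.53
print(round(cd_vs_crystalline, 2))         # 1.97
```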


Author(s):  
Marta Oliveira ◽  
Sílvia Capelas ◽  
Cristina Delerue-Matos ◽  
Simone Morais

Grilling activities release large amounts of hazardous pollutants, but information on restaurant grill workers’ exposure to polycyclic aromatic hydrocarbons (PAHs) is almost nonexistent. This study assessed the impact of grilling emissions on workers’ total exposure to PAHs by evaluating the concentrations of six urinary biomarkers of exposure (OHPAHs): naphthalene, acenaphthene, fluorene, phenanthrene, pyrene, and benzo(a)pyrene. Individual levels and excretion profiles of urinary OHPAHs were determined during working and nonworking periods. Urinary OHPAHs were quantified by high-performance liquid chromatography with fluorescence detection. Levels of total OHPAHs (∑OHPAHs) were significantly increased (about nine times; p ≤ 0.001) during working days compared with nonworking days. Urinary 1-hydroxynaphthalene + 1-hydroxyacenaphthene and 2-hydroxyfluorene presented the highest increments (ca. 23- and 6-fold increase, respectively), followed by 1-hydroxyphenanthrene (ca. 2.3 times) and 1-hydroxypyrene (ca. 1.8 times). Additionally, 1-hydroxypyrene levels were higher than the benchmark, 0.5 µmol/mol creatinine, in 5% of exposed workers. Moreover, 3-hydroxybenzo(a)pyrene, a biomarker of exposure to carcinogenic PAHs, was detected in 13% of exposed workers. Individual excretion profiles showed a cumulative increase in ∑OHPAHs during consecutive working days. A principal component analysis model partially discriminated workers’ exposure during working and nonworking periods, showing the impact of grilling activities. Urinary OHPAHs were increased in grill workers during working days.

