Using Machine Learning to Develop a High-Performance Virtual Screening Method for Drug Design

2014 ◽  
Vol 29 (1) ◽  
pp. 194-200
Author(s):  
Masato Okada ◽  
Katsutoshi Kanamori ◽  
Shin Aoki ◽  
Hayato Ohwada
RSC Advances ◽  
2020 ◽  
Vol 10 (13) ◽  
pp. 7609-7618
Author(s):  
Jin Li ◽  
WeiChao Liu ◽  
Yongping Song ◽  
JiYi Xia

Virtual screening has become a successful alternative and complementary technique to experimental high-throughput screening technologies for drug design. This paper proposes a target-specific virtual screening method based on ensemble learning, named ENS-VS.
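The abstract does not spell out the ensemble details of ENS-VS; purely as an illustration of the general idea, a soft-voting ensemble over molecular fingerprints could look like the sketch below (the feature matrix X, labels y, and choice of base learners are assumptions, not the authors' implementation).

```python
# Illustrative sketch only: a generic ensemble classifier for target-specific
# virtual screening. Assumes X is a matrix of molecular fingerprints/descriptors
# and y labels compounds as active (1) or inactive (0).
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def build_ensemble_screen(X, y):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, test_size=0.2, random_state=0)
    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
            ("svm", SVC(probability=True, random_state=0)),
            ("lr", LogisticRegression(max_iter=1000)),
        ],
        voting="soft")  # average predicted class probabilities
    ensemble.fit(X_train, y_train)
    # Rank held-out compounds by predicted probability of activity.
    scores = ensemble.predict_proba(X_test)[:, 1]
    return ensemble, scores
```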


Molecules ◽  
2019 ◽  
Vol 24 (13) ◽  
pp. 2414
Author(s):  
Weixing Dai ◽  
Dianjing Guo

Machine learning plays an important role in ligand-based virtual screening. However, conventional machine learning approaches tend to be inefficient on problems where the data are imbalanced and the features describing the chemical characteristics of ligands are high-dimensional. Here we describe a machine learning algorithm, LBS (local beta screening), for ligand-based virtual screening. The unique characteristic of LBS is that it quantifies the generalization ability of screening directly through a refined loss function, and thus can assess the risk of over-fitting accurately and efficiently for the imbalanced, high-dimensional data of ligand-based virtual screening without the help of resampling methods such as cross-validation. The robustness of LBS was demonstrated in a simulation study and in tests on real datasets, in which LBS outperformed conventional algorithms in terms of screening accuracy and model interpretation. LBS was then used to screen for potential activators of HIV-1 integrase multimerization in an independent compound library, and the virtual screening result was experimentally validated. Of the 25 compounds tested, six proved to be active. The most potent compound in the experimental validation showed an EC50 value of 0.71 µM.
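The LBS loss function itself is not given in the abstract. The sketch below shows only the kind of conventional, resampling-based baseline that LBS is contrasted against: a class-weighted linear model whose generalization is estimated by cross-validation (X, y and the scoring choice are assumptions; this is not the LBS algorithm).

```python
# Conventional baseline for imbalanced, high-dimensional ligand data:
# class weighting plus cross-validated AUC. This is the resampling-style
# approach that LBS is designed to avoid; it is not LBS itself.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

def baseline_screen_score(X, y):
    model = LogisticRegression(
        penalty="l2", class_weight="balanced", max_iter=5000)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
    # ROC AUC is less sensitive to the heavy class imbalance typical of
    # ligand-based screening datasets than plain accuracy.
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()
```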


Author(s):  
Michela Taufer ◽  
Trilce Estrada ◽  
Travis Johnston

This paper presents a survey of three algorithms that transform atomic-level molecular snapshots from molecular dynamics (MD) simulations into metadata representations suitable for in situ analytics based on machine learning methods. MD simulations, which study the classical time evolution of a molecular system at atomic resolution, are widely used in chemistry, materials science, molecular biology and drug design; they are among the most common simulations run on supercomputers. Next-generation supercomputers will have dramatically higher performance than current systems, generating more data that needs to be analysed (e.g. in terms of the number and length of MD trajectories). In the future, the coordination of data generation and analysis can no longer rely on manual, centralized analysis traditionally performed after the simulation is completed, or on current data representations that were defined for traditional visualization tools. Powerful data preparation phases (i.e. phases in which the original raw data is transformed into concise and still meaningful representations) will need to precede data analysis phases. Here, we discuss three algorithms for transforming traditionally used molecular representations into concise and meaningful metadata representations. The transformations can be performed locally. The new metadata can be fed into machine learning methods for runtime in situ analysis of larger MD trajectories supported by high-performance computing. In this paper, we provide an overview of the three algorithms and their use in three different applications: protein–ligand docking in drug design; protein folding simulations; and protein engineering based on analytics of protein functions depending on proteins' three-dimensional structures. This article is part of a discussion meeting issue ‘Numerical algorithms for high-performance computational science’.
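The three algorithms themselves are described in the paper rather than the abstract. As a minimal illustration of the general idea only (reducing a raw snapshot to a small, fixed-length representation that an ML model can consume in situ), one might histogram pairwise atomic distances per frame, as sketched below; the bin layout and any downstream learner are assumptions, not the authors' algorithms.

```python
# Minimal sketch: reduce an MD snapshot (N x 3 atomic coordinates) to a
# fixed-length metadata vector -- here, a histogram of pairwise distances --
# that can be streamed to an in situ machine learning analysis.
import numpy as np

def frame_to_metadata(coords, n_bins=32, r_max=20.0):
    """coords: (N, 3) array of atomic positions in angstroms."""
    diffs = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    iu = np.triu_indices(len(coords), k=1)      # unique atom pairs only
    hist, _ = np.histogram(dists[iu], bins=n_bins, range=(0.0, r_max))
    return hist / hist.sum()                    # normalized, length n_bins

# A trajectory then becomes an (n_frames, n_bins) matrix of concise metadata
# suitable for clustering or classification while the simulation is running.
```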


2019 ◽  
Vol 20 (5) ◽  
pp. 488-500 ◽  
Author(s):  
Yan Hu ◽  
Yi Lu ◽  
Shuo Wang ◽  
Mengying Zhang ◽  
Xiaosheng Qu ◽  
...  

Background: Globally, the numbers of cancer patients and deaths continue to increase each year, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, in order to examine the application of machine learning in predicting anticancer drug activity, several machine learning approaches, including Linear Discriminant Analysis (LDA), Principal Components Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes substantially to anticancer drug design, saving researchers time and cost; however, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction of anticancer drug activity are discussed, and anticancer drug research remains in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
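As a schematic example of how the listed methods are typically compared on an activity-prediction task (the descriptor matrix X and labels y are placeholders, not data from any study in this review), a cross-validated comparison might look like the following sketch.

```python
# Hedged sketch: cross-validated comparison of several classifiers commonly
# used for anticancer activity prediction. X holds molecular descriptors,
# y holds binary activity labels; both are assumed inputs.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def compare_classifiers(X, y):
    models = {
        "LDA": LinearDiscriminantAnalysis(),
        "SVM": SVC(kernel="rbf"),
        "RF": RandomForestClassifier(n_estimators=300, random_state=0),
        "kNN": KNeighborsClassifier(n_neighbors=5),
        "NB": GaussianNB(),
    }
    # Balanced accuracy is a reasonable metric when actives are rare.
    return {name: cross_val_score(m, X, y, cv=5,
                                  scoring="balanced_accuracy").mean()
            for name, m in models.items()}
```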


2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules are registered so far in the Chemical Abstracts Service. According to a recent estimate, there are around 10^60 possible new drug-like molecules, and the library of such molecules is now referred to as 'dark chemical space' or 'dark chemistry'. To explore these hidden molecules scientifically, a good number of live, regularly updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, genomics, proteomics and in-silico simulation, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge and must be treated in an efficient manner. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is one of the machine learning techniques used extensively in VS, and it is well known for high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) collecting chemo-genomics data from a database or the literature, (ii) calculating the right descriptors from the molecular representation, (iii) establishing a relationship (model) between biological activity and the selected descriptors, and (iv) applying the QSAR model to predict the biological property of new molecules. All hits obtained by the VS technique need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.
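Steps (ii)-(iv) of this workflow can be illustrated with a very small sketch: computing a few RDKit descriptors from SMILES strings and fitting a regression model that predicts an activity value for new molecules. The descriptor set, the random forest learner, and the activity target are illustrative assumptions, not a prescription from the review.

```python
# Sketch of QSAR steps (ii)-(iv): descriptors from SMILES, model fitting,
# prediction for new molecules. Descriptor set and learner are illustrative.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

def descriptors(smiles):
    # Step (ii): a handful of physicochemical descriptors per molecule.
    mol = Chem.MolFromSmiles(smiles)
    return [Descriptors.MolWt(mol), Descriptors.MolLogP(mol),
            Descriptors.TPSA(mol), Descriptors.NumHDonors(mol),
            Descriptors.NumHAcceptors(mol)]

def fit_qsar(train_smiles, activities):
    # Step (iii): relate descriptors to measured biological activity.
    X = np.array([descriptors(s) for s in train_smiles])
    model = RandomForestRegressor(n_estimators=200, random_state=0)
    model.fit(X, activities)
    return model

def predict_activity(model, smiles_list):
    # Step (iv): apply the model to unseen molecules (the VS prediction step).
    X = np.array([descriptors(s) for s in smiles_list])
    return model.predict(X)
```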


2018 ◽  
Vol 15 (1) ◽  
pp. 82-88 ◽  
Author(s):  
Md. Mostafijur Rahman ◽  
Md. Bayejid Hosen ◽  
M. Zakir Hossain Howlader ◽  
Yearul Kabir

Background: 3C-like protease, also called the main protease, is an essential enzyme for the completion of the life cycle of Middle East Respiratory Syndrome Coronavirus. In our study we predicted compounds that are capable of inhibiting 3C-like protease, and thus of inhibiting the life cycle of Middle East Respiratory Syndrome Coronavirus, using in silico methods. Methods: Lead-like compounds and drug molecules capable of inhibiting 3C-like protease were identified by structure-based and ligand-based virtual screening. The compounds were further validated through absorption, distribution, metabolism and excretion (ADME) filtering. Results: Based on binding energy, ADME properties, and toxicology analysis, we finally selected 3 compounds from structure-based virtual screening (ZINC IDs: 75121653, 41131653, and 67266079) with binding energies of -7.12, -7.1 and -7.08 kcal/mol, respectively, and 5 compounds from ligand-based virtual screening (ZINC IDs: 05576502, 47654332, 04829153, 86434515 and 25626324) with binding energies of -49.8, -54.9, -65.6, -61.1 and -66.7 kcal/mol, respectively. All these compounds have good ADME profiles and reduced toxicity. Among the eight compounds, one is soluble in water and the remaining 7 are highly soluble in water. All compounds have a bioavailability score of 0.55 on a scale of 0 to 1. Among the 5 compounds from structure-based virtual screening, 2 compounds showed lead-likeness. All the compounds showed no inhibition of cytochrome P450 enzymes, no blood-brain barrier permeability and no toxic structures in their medicinal chemistry profiles. None of the compounds is a substrate of P-glycoprotein. Our predicted compounds may be capable of inhibiting 3C-like protease but need further validation in the wet lab.
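As a purely illustrative sketch of the kind of post-screening triage described here, hits can be filtered on docking score and simple ADME/toxicity flags before any wet-lab follow-up; the field names and cut-offs below are assumptions, not those used in the study.

```python
# Hedged example: triaging virtual screening hits by binding energy and
# simple ADME/toxicity flags. Field names and thresholds are illustrative.
from dataclasses import dataclass

@dataclass
class Hit:
    zinc_id: str
    binding_energy: float   # kcal/mol; more negative = stronger predicted binding
    water_soluble: bool
    bbb_permeant: bool      # blood-brain barrier permeability flag
    cyp_inhibitor: bool     # cytochrome P450 inhibition flag
    toxic_alert: bool       # structural toxicity alert

def triage(hits, energy_cutoff=-7.0):
    """Keep hits that bind at least as tightly as the cutoff and pass ADME/tox flags."""
    return [h for h in hits
            if h.binding_energy <= energy_cutoff
            and h.water_soluble
            and not (h.bbb_permeant or h.cyp_inhibitor or h.toxic_alert)]
```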


2018 ◽  
Vol 15 (1) ◽  
pp. 6-28 ◽  
Author(s):  
Javier Pérez-Sianes ◽  
Horacio Pérez-Sánchez ◽  
Fernando Díaz

Background: Automated compound testing is currently the de facto standard method for drug screening, but it has not brought the great increase in the number of new drugs that was expected. Computer-aided compound search, known as virtual screening, has shown benefits to this field as a complement or even an alternative to robotic drug discovery. There are different methods and approaches to address this problem, and most of them fall within one of the main screening strategies. Machine learning, however, has established itself as a virtual screening methodology in its own right, and it may grow in popularity with the new trends in artificial intelligence. Objective: This paper attempts to provide a comprehensive and structured review that collects the most important proposals made so far in this area of research. Particular attention is given to recent developments in the machine learning field, in particular the deep learning approach, which is singled out as a future key player in the virtual screening landscape.


2016 ◽  
Vol 11 (4) ◽  
pp. 408-420 ◽  
Author(s):  
Cândida G. Silva ◽  
Carlos J.V. Simoes ◽  
Pedro Carreiras ◽  
Rui M.M. Brito

Author(s):  
Mark Endrei ◽  
Chao Jin ◽  
Minh Ngoc Dinh ◽  
David Abramson ◽  
Heidi Poxon ◽  
...  

Rising power costs and constraints are driving a growing focus on the energy efficiency of high performance computing systems. The unique characteristics of a particular system and workload, and their effect on performance and energy efficiency, are typically difficult for application users to assess and to control. Settings for optimum performance and energy efficiency can also diverge, so we need to identify trade-off options that guide a suitable balance between energy use and performance. We present statistical and machine learning models that require only a small number of runs to make accurate Pareto-optimal trade-off predictions using parameters that users can control. We study model training and validation using several parallel kernels and more complex workloads, including Algebraic Multigrid (AMG), the Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS), and Livermore Unstructured Lagrangian Explicit Shock Hydrodynamics (LULESH). We demonstrate that we can train the models using as few as 12 runs, with prediction error of less than 10%. Our AMG results identify trade-off options that provide up to 45% improvement in energy efficiency for around 10% performance loss. We reduce the sample measurement time required for AMG by 90%, from 13 h to 74 min.
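The trade-off models themselves are the statistical and machine learning fits described in the paper. Purely to illustrate the Pareto-optimality concept they predict, the sketch below extracts the non-dominated configurations from a set of (runtime, energy) measurements; the data layout and example values are assumptions.

```python
# Minimal sketch: find Pareto-optimal configurations from measured or
# predicted (runtime_seconds, energy_joules) pairs. A point is on the front
# if no other point is at least as good in both objectives and strictly
# better in at least one.
def pareto_front(points):
    """points: list of (runtime, energy) tuples; lower is better for both."""
    front = []
    for i, (r_i, e_i) in enumerate(points):
        dominated = any(
            (r_j <= r_i and e_j <= e_i) and (r_j < r_i or e_j < e_i)
            for j, (r_j, e_j) in enumerate(points) if j != i)
        if not dominated:
            front.append((r_i, e_i))
    return sorted(front)

# Hypothetical configurations trading performance for energy:
print(pareto_front([(10.0, 500.0), (12.0, 420.0), (11.0, 600.0), (9.5, 550.0)]))
# -> [(9.5, 550.0), (10.0, 500.0), (12.0, 420.0)]
```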

