A GPU-Accelerated Machine Learning Framework for Molecular Simulation: Hoomd-Blue with TensorFlow

Author(s):  
Rainier Barrett ◽  
Maghesree Chakraborty ◽  
Dilnoza Amirkulova ◽  
Heta Gandhi ◽  
Andrew White

We have designed and implemented software that integrates a scalable, GPU-accelerated molecular mechanics engine, HOOMD-blue, with the TensorFlow machine learning (ML) package. TensorFlow is a GPU-accelerated, scalable, graph-based package for building tensor computation models, and many recent innovations in deep learning and other ML tasks have been implemented in it. Tensor computation graphs allow robust, flexible, and easily replicated computational models to be specified for a variety of tasks. Our plugin leverages the generality and speed of computational tensor graphs in TensorFlow to enable four previously challenging tasks in molecular dynamics: (1) the calculation of arbitrary force fields, including neural-network-based, stochastic, and/or automatically generated force fields that are obtained by differentiating potential functions; (2) the efficient computation of arbitrary collective variables; (3) the biasing of simulations via automatic differentiation of collective variables, and consequently the implementation of many free energy biasing methods; (4) ML on any of the above tasks, including coarse-grained force fields, on-the-fly learned biases, and collective variable calculations. The TensorFlow models are constructed in Python and can be visualized or debugged using the rich set of tools implemented in the TensorFlow package. In this article, we present examples of the four major tasks this method can accomplish, report benchmark data, and describe the architecture of our implementation. This method should enable both the design of new models in computational chemistry research and reproducible model specification, without requiring recompilation or low-level code.
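Task (1) in the abstract above rests on one idea: a force is the negative derivative of a potential, so an automatic differentiation tool can produce forces from any potential function. The following is a minimal sketch of that idea, not the plugin's implementation. It assumes a Lennard-Jones pair potential in reduced units and swaps TensorFlow for a small hand-written dual-number class (forward-mode automatic differentiation) so that it runs with only the Python standard library. The names `Dual`, `lj_potential`, `lj_force`, and `lj_force_analytic` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (forward-mode autodiff)."""
    val: float
    der: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __sub__(self, other):
        other = _lift(other)
        return Dual(self.val - other.val, self.der - other.der)

    def __rsub__(self, other):
        return _lift(other) - self

    def __mul__(self, other):
        other = _lift(other)
        # Product rule.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = _lift(other)
        # Quotient rule.
        return Dual(self.val / other.val,
                    (self.der * other.val - self.val * other.der) / other.val ** 2)

    def __rtruediv__(self, other):
        return _lift(other) / self

    def __pow__(self, n):
        # Power rule; only constant (non-Dual) exponents are supported.
        return Dual(self.val ** n, n * self.val ** (n - 1) * self.der)


def _lift(x):
    """Wrap a plain number as a constant Dual (derivative zero)."""
    return x if isinstance(x, Dual) else Dual(float(x))


def lj_potential(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair potential U(r) = 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Radial force F = -dU/dr, obtained by differentiating the potential."""
    u = lj_potential(Dual(r, 1.0), epsilon, sigma)  # seed dr/dr = 1
    return -u.der


def lj_force_analytic(r, epsilon=1.0, sigma=1.0):
    """Hand-derived force, used only to check the autodiff result."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


if __name__ == "__main__":
    for r in (1.0, 2 ** (1 / 6), 1.5):
        print(r, lj_force(r), lj_force_analytic(r))
```

Only `lj_potential` encodes the physics; replacing it with another differentiable potential would yield the corresponding force with no hand-derived gradient, which is the property the abstract attributes to tensor computation graphs.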


2019 ◽  
Author(s):  
Rainier Barrett ◽  
Maghesree Chakraborty ◽  
Dilnoza Amirkulova ◽  
Heta Gandhi ◽  
Andrew White

As interest grows in applying machine learning force fields and methods to molecular simulation, there is a need for state-of-the-art inference methods to use trained models within efficient molecular simulation engines. We have designed and implemented software that integrates a scalable, GPU-accelerated molecular mechanics engine, HOOMD-blue, with the TensorFlow machine learning (ML) package. TensorFlow is a GPU-accelerated, scalable, graph-based package for building tensor computation models, and many recent innovations in deep learning and other ML tasks have been implemented in it. TensorFlow models are constructed in Python and can be visualized or debugged using the rich set of tools implemented in the TensorFlow package. In this article, we present four major examples of tasks this software can accomplish that would normally require multiple different tools: (1) we train a neural network to reproduce a force field of a Lennard-Jones simulation; (2) we perform online force matching of methanol; (3) we compute the maximum entropy bias of a Lennard-Jones collective variable; (4) we calculate the scattering profile of an ongoing TIP4P water molecular dynamics simulation. This work should accelerate both the design of new neural-network-based models in computational chemistry research and reproducible model specification by leveraging a widely used ML package.


2019 ◽  
Author(s):  
Qiannan Duan ◽  
Jianchao Lee ◽  
Jinhong Gao ◽  
Jiayuan Chen ◽  
Yachao Lian ◽  
...  

Machine learning (ML) has brought significant technological innovations in many fields, but to date it has not been widely embraced by most researchers in the natural sciences. Traditional approaches to understanding and promoting chemical analysis cannot meet the big-data requirements for running ML. Over the years, we have focused on building a more versatile and low-cost approach to acquiring the copious amounts of data contained in a chemical reaction. The data generated satisfy ML's demand for data when exploring the vast space of chemical effects. As a proof of concept in this study, we carried out an acute toxicity test through the whole routine, from model building and chip preparation to data collection and ML training. Such a strategy will probably play an important role in connecting ML with much research in natural science in the future.


2020 ◽  
Vol 27 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Camila Rizzotto ◽  
Walter Filgueira de Azevedo Junior

Background: Analysis of atomic coordinates of protein-ligand complexes can provide three-dimensional data to generate computational models to evaluate binding affinity and thermodynamic state functions. Application of machine learning techniques can create models to assess protein-ligand potential energy and binding affinity. These methods show superior predictive performance when compared with classical scoring functions available in docking programs. Objective: Our purpose here is to review the development and application of the program SAnDReS. We describe the creation of machine learning models to assess the binding affinity of protein-ligand complexes. Method: SAnDReS implements machine learning methods available in the scikit-learn library. This program is available for download at https://github.com/azevedolab/sandres. SAnDReS uses crystallographic structures, binding, and thermodynamic data to create targeted scoring functions. Results: Recent applications of the program SAnDReS to drug targets such as coagulation factor Xa, cyclin-dependent kinases, and HIV-1 protease created targeted scoring functions that predict inhibition of these proteins. These targeted models outperform classical scoring functions. Conclusion: Here, we reviewed the development of machine learning scoring functions to predict binding affinity through the application of the program SAnDReS. Our studies show the superior predictive performance of the SAnDReS-developed models when compared with classical scoring functions available in programs such as AutoDock4, Molegro Virtual Docker, and AutoDock Vina.


2020 ◽  
Vol 20 (14) ◽  
pp. 1375-1388 ◽  
Author(s):  
Patnala Ganga Raju Achary

Scientists and researchers around the globe generate a tremendous amount of information every day; for instance, more than 74 million molecules have so far been registered with the Chemical Abstracts Service. According to a recent study, there are around 10^60 molecules that are classified as new drug-like molecules. The library of such molecules is now considered ‘dark chemical space’ or ‘dark chemistry.’ To explore such hidden molecules scientifically, a good number of live and updated databases (protein, cell, tissue, structure, drug, etc.) are available today. The synchronization of three different sciences, ‘genomics’, ‘proteomics’, and ‘in-silico simulation’, will revolutionize the process of drug discovery. Screening a sizable number of drug-like molecules is a challenge that must be handled efficiently. Virtual screening (VS) is an important computational tool in the drug discovery process; however, experimental verification of the drugs is equally important for the drug development process. Quantitative structure-activity relationship (QSAR) analysis is a machine learning technique that is used extensively in VS. QSAR is well known for its high and fast throughput screening with a satisfactory hit rate. QSAR model building involves (i) collecting chemo-genomics data from a database or the literature; (ii) calculating the right descriptors from the molecular representation; (iii) establishing a relationship (model) between biological activity and the selected descriptors; and (iv) applying the QSAR model to predict the biological property of the molecules. All hits obtained by VS need to be experimentally verified. The present mini-review highlights web-based machine learning tools, the role of QSAR in VS techniques, successful applications of QSAR-based VS leading to drug discovery, and the advantages and challenges of QSAR.


Author(s):  
William B. Rouse

This book discusses the use of models and interactive visualizations to explore designs of systems and policies in determining whether such designs would be effective. Executives and senior managers are very interested in what “data analytics” can do for them and, quite recently, what the prospects are for artificial intelligence and machine learning. They want to understand and then invest wisely. They are reasonably skeptical, having experienced overselling and under-delivery. They ask about reasonable and realistic expectations. Their concern is with the futurity of decisions they are currently entertaining. They cannot fully address this concern empirically. Thus, they need some way to make predictions. The problem is that one rarely can predict exactly what will happen, only what might happen. To overcome this limitation, executives can be provided predictions of possible futures and the conditions under which each scenario is likely to emerge. Models can help them to understand these possible futures. Most executives find such candor refreshing, perhaps even liberating. Their job becomes one of imagining and designing a portfolio of possible futures, assisted by interactive computational models. Understanding and managing uncertainty is central to their job. Indeed, doing this better than competitors is a hallmark of success. This book is intended to help them understand what fundamentally needs to be done, why it needs to be done, and how to do it. The hope is that readers will discuss this book and develop a “shared mental model” of computational modeling in the process, which will greatly enhance their chances of success.


2020 ◽  
Vol 41 (S1) ◽  
pp. s521-s522
Author(s):  
Debarka Sengupta ◽  
Vaibhav Singh ◽  
Seema Singh ◽  
Dinesh Tewari ◽  
Mudit Kapoor ◽  
...  

Background: The rising trend of antibiotic resistance imposes a heavy burden on healthcare both clinically and economically (US$55 billion), with 23,000 estimated annual deaths in the United States as well as increased length of stay and morbidity. Machine-learning–based methods have recently been used to leverage patients’ clinical history and demographic information to predict antimicrobial resistance. We developed a machine-learning model ensemble that maximizes the accuracy of such a drug-sensitivity versus resistivity classification system compared to the existing best-practice methods. Methods: We first performed a comprehensive analysis of the association between infecting bacterial species and patient factors, including patient demographics, comorbidities, and certain healthcare-specific features. We leveraged the predictable nature of these complex associations to infer patient-specific antibiotic sensitivities. Various base learners, including k-NN (k-nearest neighbors) and gradient boosting machine (GBM), were used to train an ensemble model for confident prediction of antimicrobial susceptibilities. Base learner selection and model performance evaluation were performed carefully using a variety of standard metrics, namely accuracy, precision, recall, F1 score, and Cohen κ. Results: We validated performance on the MIMIC-III database, which harbors deidentified clinical data of 53,423 distinct patient admissions between 2001 and 2012 in the intensive care units (ICUs) of the Beth Israel Deaconess Medical Center in Boston, Massachusetts. From ~11,000 positive cultures, we used 4 major specimen types, namely urine, sputum, blood, and pus swab, to evaluate model performance. Figure 1 shows the receiver operating characteristic (ROC) curves obtained for bloodstream infection cases upon model building and prediction on a 70:30 split of the data. We obtained area under the curve (AUC) values of 0.88, 0.92, 0.92, and 0.94 for urine, sputum, blood, and pus swab samples, respectively. Figure 2 shows the comparative performance of our proposed method as well as some off-the-shelf classification algorithms. Conclusions: Highly accurate, patient-specific predictive antibiogram (PSPA) data can aid clinicians significantly in antibiotic recommendation in the ICU, thereby accelerating patient recovery and curbing antimicrobial resistance. Funding: This study was supported by Circle of Life Healthcare Pvt. Ltd. Disclosures: None
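The evaluation metrics named in the Methods above (accuracy, precision, recall, F1 score, and Cohen κ) all follow from the four counts of a binary confusion matrix. The sketch below shows those standard definitions using only the Python standard library. It is an illustration, not the study's pipeline: the function name `binary_metrics` is hypothetical, and the labels in the usage lines are made-up toy values (1 = resistant, 0 = sensitive), not data from MIMIC-III.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall, F1, and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one label pair is required")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e,
    # where p_e comes from the marginal label frequencies.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    guess = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    for name, value in binary_metrics(truth, guess).items():
        print(f"{name}: {value:.3f}")
```

Cohen κ is reported alongside accuracy because it discounts the agreement expected by chance, which matters when one class (for example, sensitive isolates) dominates the data.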


Author(s):  
Mythili K. ◽  
Manish Narwaria

Quality assessment of audiovisual (AV) signals is important from the perspective of system design, optimization, and management of a modern multimedia communication system. However, automatic prediction of AV quality via computational models remains challenging. In this context, machine learning (ML) appears to be an attractive alternative to the traditional approaches. This is especially so when such assessment needs to be made in a no-reference fashion (i.e., when the original signal is unavailable). While the development of ML-based quality predictors is desirable, we argue that proper assessment and validation of such predictors is also crucial before they can be deployed in practice. To this end, we raise some fundamental questions about the current approach to ML-based model development for AV quality assessment and for signal processing in multimedia communication in general. We also identify specific limitations of the current validation strategy that have implications for the analysis and comparison of ML-based quality predictors. These include a lack of consideration of: (a) data uncertainty, (b) domain knowledge, (c) explicit learning ability of the trained model, and (d) interpretability of the resultant model. Therefore, the primary goal of this article is to shed some light on these factors. Our analysis and proposed recommendations are of particular importance in light of the significant interest in ML methods for multimedia signal processing (specifically in cases where human-labeled data is used) and the lack of discussion of these issues in the existing literature.

