TorsionNet: A Deep Neural Network to Rapidly Predict Small Molecule Torsion Energy Profiles with the Accuracy of Quantum Mechanics

2020
Author(s): Brajesh Rai, Vishnu Sresht, Qingyi Yang, Rayomond J. Unwalla, Meihua Tu, ...

Brajesh K. Rai*¹, Vishnu Sresht¹, Qingyi Yang², Ray Unwalla², Meihua Tu², Alan M. Mathiowetz², and Gregory A. Bakken³
¹Simulation and Modeling Sciences and ²Medicine Design, Pfizer Worldwide Research, Development and Medical, 610 Main Street, Cambridge, Massachusetts 02139, United States
³Digital, Pfizer, Eastern Point Road, Groton, Connecticut 06340, United States

ABSTRACT
Fast and accurate assessment of small molecule dihedral energetics is crucial for molecular design and optimization in medicinal chemistry. Yet accurate prediction of torsion energy profiles remains challenging: current molecular mechanics methods are limited by insufficient coverage of druglike chemical space, and accurate quantum mechanical (QM) methods are too expensive. To address this limitation, we introduce TorsionNet, a deep neural network (DNN) model specifically developed to predict small molecule torsion energy profiles with QM-level accuracy. We applied active learning to identify nearly 50k fragments (containing the elements H, C, N, O, F, S, and Cl) that maximized the coverage of our corporate library, and leveraged massively parallel cloud computing resources to perform density functional theory (DFT) torsion scans of these fragments, generating a training dataset of 1.2 million DFT energies. Trained on this dataset, TorsionNet rapidly predicts the torsion energy profiles of typical druglike fragments with DFT-level accuracy. Importantly, our method also provides a direct estimate of the uncertainty in the predicted profiles without any additional calculations.

In this report, we show that TorsionNet reliably identifies the preferred dihedral geometries observed in crystal structures. We also present practical applications showing that accounting for DNN-based strain energy substantially improves existing lead discovery and design workflows. We have created a benchmark dataset (TorsionNet500) of 500 chemically diverse fragments with DFT torsion profiles (12k DFT-optimized geometries and energies), which is freely available.
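The strain-energy idea in this abstract can be illustrated with a short sketch: given a torsion energy profile, the strain of an observed dihedral is its energy relative to the profile's global minimum. The profile below is a toy function, not TorsionNet output, and `torsion_strain` is a hypothetical helper, not part of TorsionNet. We assume a uniform grid of 24 points in 15° steps starting at -180°. That step size is our assumption, suggested by the reported counts: 1.2 million energies for about 50k fragments, and 12k geometries for 500 fragments, both work out to roughly 24 points per profile.

```python
import numpy as np


def torsion_strain(profile, phi_deg, start_deg=-180.0, step_deg=15.0):
    """Energy at dihedral phi_deg relative to the global minimum of a torsion profile.

    Hypothetical helper: `profile` holds energies (e.g. kcal/mol) on a uniform,
    periodic dihedral grid start_deg, start_deg + step_deg, ... (assumed 15 degree steps).
    """
    energies = np.asarray(profile, dtype=float)
    relative = energies - energies.min()  # strain is measured from the lowest-energy dihedral
    grid = start_deg + step_deg * np.arange(len(relative))
    # period=360 lets the interpolation wrap from the last grid point back to the first
    return float(np.interp(phi_deg, grid, relative, period=360.0))


# Toy twofold profile on the assumed 24-point grid (illustrative values, not DFT data)
grid = -180.0 + 15.0 * np.arange(24)
toy_profile = 1.5 * (1.0 + np.cos(np.deg2rad(2.0 * grid)))

for phi in (90.0, 0.0, 172.5):
    print(f"phi = {phi:6.1f} deg -> strain = {torsion_strain(toy_profile, phi):.3f}")
```

Linear interpolation between grid points is the simplest choice. A smoother periodic fit, such as a short Fourier series, would be a natural refinement.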


2021
Author(s): Noor Ahmad, Muhammad Aminu, Mohd Halim Mohd Noor

Deep learning approaches have attracted a lot of attention for the automatic detection of Covid-19, and transfer learning is the most common approach. However, most pre-trained models are trained on color images, which can cause inefficiencies when they are fine-tuned on Covid-19 images, since these are often grayscale. To address this issue, we propose a deep learning architecture called CovidNet that requires relatively few parameters. CovidNet accepts grayscale images as inputs and is suitable for training with limited training data. Experimental results show that CovidNet outperforms other state-of-the-art deep learning models for Covid-19 detection.
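The channel mismatch described above can be made concrete. Feeding a grayscale image to a color-pretrained network usually means copying the single channel three times. For a first convolutional layer, that is mathematically equivalent to one single-channel convolution whose filters are the RGB filters summed over the channel axis, so two thirds of that layer's input multiply-adds are redundant. The sketch below checks this equivalence with NumPy, using a naive hypothetical `conv2d_valid` helper and random weights. It illustrates the problem only and is not CovidNet's architecture.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x, w):
    """Naive multi-channel 2D cross-correlation with 'valid' padding.

    x: (C, H, W) image, w: (O, C, k, k) filters -> (O, H - k + 1, W - k + 1).
    """
    k = w.shape[-1]
    patches = sliding_window_view(x, (k, k), axis=(1, 2))  # (C, H', W', k, k)
    return np.einsum("chwij,ocij->ohw", patches, w)


rng = np.random.default_rng(0)
gray = rng.random((1, 8, 8))                # one grayscale channel
w_rgb = rng.standard_normal((4, 3, 3, 3))   # stand-in for color-pretrained first-layer filters

# Common workaround: replicate the gray channel to fake an RGB input
out_replicated = conv2d_valid(np.repeat(gray, 3, axis=0), w_rgb)

# Equivalent single-channel layer: sum the filters over the input-channel axis
w_gray = w_rgb.sum(axis=1, keepdims=True)   # (4, 1, 3, 3)
out_single = conv2d_valid(gray, w_gray)

print(np.allclose(out_replicated, out_single))  # the two outputs agree
```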


2021, Vol 15 (58), pp. 308-318
Author(s): Tran-Hieu Nguyen, Anh-Tuan Vu

In this paper, a machine learning-based framework is developed to quickly evaluate the structural safety of trusses. Three numerical examples, a 10-bar truss, a 25-bar truss, and a 47-bar truss, illustrate the proposed framework. First, truss cases with different cross-sectional areas are generated using Latin Hypercube Sampling. Stresses in the truss members and nodal displacements are determined through finite element analyses, and the obtained values are compared with the design constraints. Based on this constraint check, each case is labeled safe or unsafe. The members' sectional areas and the safety state are stored as the inputs and outputs of the training dataset, respectively. Three popular machine learning classifiers, Support Vector Machine, Deep Neural Network, and Adaptive Boosting, are used to evaluate structural safety, and they are compared on two metrics: accuracy and the area under the ROC curve. For the first two examples, all three classifiers achieve more than 90% accuracy. For the 47-bar truss, the accuracies of the Support Vector Machine and Deep Neural Network models fall below 70%, whereas the Adaptive Boosting model retains a high accuracy of approximately 98%. The area under the ROC curve gives similar comparative results. Overall, the Adaptive Boosting model outperforms the other models. In addition, an investigation is carried out to show how the Adaptive Boosting model's parameters influence its performance.
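The data-generation step can be sketched as follows. Latin Hypercube Sampling splits each variable's range into equal strata and places exactly one sample in each stratum. Each sampled design is then labeled safe or unsafe from a stress check. To keep the sketch runnable, we replace the finite element analysis with a hypothetical fixed set of member forces, which is valid only for statically determinate trusses, and we omit the displacement constraints. The area bounds, forces, and allowable stress are illustrative values, not the paper's.

```python
import numpy as np


def latin_hypercube(n_samples, n_dims, rng):
    """Unit-hypercube LHS: each dimension gets exactly one point per stratum [k/n, (k+1)/n)."""
    u = np.empty((n_samples, n_dims))
    for d in range(n_dims):
        strata = rng.permutation(n_samples)                        # random stratum order per dimension
        u[:, d] = (strata + rng.random(n_samples)) / n_samples     # jitter inside each stratum
    return u


def label_safety(areas, member_forces, allowable_stress):
    """Return 1 (safe) if every |N_i / A_i| <= allowable, else 0 (unsafe).

    Assumes member forces do not depend on the areas (statically determinate case);
    the paper instead obtains stresses and displacements from finite element analysis.
    """
    stresses = np.abs(member_forces) / areas
    return np.all(stresses <= allowable_stress, axis=1).astype(int)


rng = np.random.default_rng(42)
a_min, a_max = 0.1, 35.0                                   # hypothetical area bounds
areas = a_min + (a_max - a_min) * latin_hypercube(200, 10, rng)
forces = rng.uniform(-100.0, 100.0, size=10)               # hypothetical member forces
labels = label_safety(areas, forces, allowable_stress=25.0)
print(areas.shape, labels.mean())                          # inputs and fraction of safe cases
```

The pairs `(areas, labels)` play the role of the training inputs and outputs described above, ready for any classifier.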


2016, Vol 2016, pp. 1-11
Author(s): Wei Zheng, Desheng Hu, Jing Wang

As software grows in scale and complexity, software failure is inevitable. Although many software fault localization methods have been proposed to date, each with its own achievements, they also have limitations. In particular, the machine learning-based fault localization models in the literature all use shallow architectures. Because these shallow models have a restricted ability to express complex functions from limited sample data and generalize poorly to intricate problems, they cannot analyze faults accurately. To address this, we propose a fault localization method based on a deep neural network (DNN). By learning a deep nonlinear network structure, this approach can approximate complex functions and learn distributed representations of the input data. It also shows a strong ability to learn representations from a small training dataset. Our DNN-based model is trained using the coverage data and test case results as input, and we then locate faults by running the trained model on a virtual test suite. We conduct experiments on the Siemens suite and the Space program. The results demonstrate that our DNN-based fault localization technique outperforms other fault localization methods such as BPNN and Tarantula.
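The virtual-test-suite step works in two stages. First, train a model that maps a test case's statement coverage to its pass/fail outcome. Then feed it one virtual test per statement, each covering only that statement, and rank statements by predicted failure probability. To keep the sketch dependency-free, a single-layer logistic model trained by gradient descent stands in for the paper's DNN. The toy coverage data and all function names are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_logistic(coverage, failed, lr=0.5, l2=1e-3, epochs=2000):
    """Fit P(fail | coverage) = sigmoid(coverage @ w + b) by batch gradient descent.

    A single-layer stand-in for the paper's DNN.
    """
    w = np.zeros(coverage.shape[1])
    b = 0.0
    for _ in range(epochs):
        err = sigmoid(coverage @ w + b) - failed      # gradient of the log-loss w.r.t. the logits
        w -= lr * (coverage.T @ err / len(failed) + l2 * w)
        b -= lr * err.mean()
    return w, b


def rank_statements(w, b):
    """Virtual test suite: test i covers statement i only; higher output = more suspicious."""
    virtual_suite = np.eye(len(w))
    suspiciousness = sigmoid(virtual_suite @ w + b)
    return np.argsort(-suspiciousness), suspiciousness


# Toy coverage matrix (rows = test cases, columns = statements) and outcomes (1 = failed)
coverage = np.array([
    [1, 1, 0, 1],
    [1, 0, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
], dtype=float)
failed = np.array([0, 1, 1, 0, 0, 1], dtype=float)

w, b = train_logistic(coverage, failed)
ranking, scores = rank_statements(w, b)
print("statements from most to least suspicious:", ranking)
```

In this toy data, statement 2 is covered by every failing test and by no passing test, so it should rank first.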


Molecules, 2020, Vol 25 (11), pp. 2715
Author(s): Marwah M.M. Madkhali, Conor D. Rankine, Thomas J. Penfold

An important consideration when developing a deep neural network (DNN) to predict molecular properties is the representation of chemical space. Herein we explore how the representation affects the performance of our DNN engineered to predict Fe K-edge X-ray absorption near-edge structure (XANES) spectra, and address the question: how important is the choice of representation for the local environment around an arbitrary Fe absorption site? Using two popular representations of chemical space, the Coulomb matrix (CM) and the pair-distribution/radial distribution curve (RDC), we investigate how this choice affects the performance of our DNN. Although CM and RDC featurisation are both demonstrably robust descriptors, RDC featurisation can achieve a smaller mean squared error (MSE) between the target and estimated XANES spectra, and converges to this state (a) faster and (b) with fewer data samples. This is advantageous for future extension of our DNN to other X-ray absorption edges, and for reoptimisation of our DNN to reproduce results from higher levels of theory. In the latter case, dataset sizes will be limited more strongly by the resource-intensive nature of the underlying theoretical calculations.
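The two featurisations compared above can be sketched in NumPy. The Coulomb matrix follows its standard definition: diagonal entries 0.5·Z_i^2.4 and off-diagonal entries Z_i·Z_j/|R_i - R_j|. For the RDC we assume a common form, a Gaussian-broadened, charge-weighted sum over pair distances, with a hypothetical width parameter `alpha`. The exact RDC definition, grid, and local-environment cutoff used in this work may differ, and neither function is the authors' code.

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix: M_ii = 0.5 * Z_i**2.4, M_ij = Z_i * Z_j / |R_i - R_j|."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):  # the diagonal divides by zero; it is overwritten below
        M = np.outer(Z, Z) / dist
    np.fill_diagonal(M, 0.5 * Z ** 2.4)
    return M


def radial_distribution_curve(Z, R, r_grid, alpha=10.0):
    """Assumed RDC form: sum over atom pairs of Z_i * Z_j * exp(-alpha * (r - R_ij)**2)."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    i, j = np.triu_indices(len(Z), k=1)           # each unordered atom pair once
    r_ij = np.linalg.norm(R[i] - R[j], axis=1)
    weights = Z[i] * Z[j]
    gauss = np.exp(-alpha * (r_grid[:, None] - r_ij[None, :]) ** 2)
    return gauss @ weights


# H2-like toy geometry (distances in arbitrary units)
Z = [1, 1]
R = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
r_grid = np.linspace(0.0, 3.0, 301)

print(coulomb_matrix(Z, R))
print(r_grid[np.argmax(radial_distribution_curve(Z, R, r_grid))])
```

One practical difference is visible in the shapes. The CM grows with the number of atoms, while the RDC always has one value per grid point, which gives the network a fixed-length input.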


2021, Vol 11 (1)
Author(s): Neda Emami, Reza Ferdousi

Aptamers are short oligonucleotides (DNA/RNA) or peptide molecules that bind their targets with high specificity and affinity. As a powerful new class of ligands, aptamers have high potential in biosensing, therapeutics, and diagnostics. Here, we present AptaNet, a new deep neural network that predicts aptamer–protein interaction pairs by integrating features derived from both the aptamers and the target proteins. Aptamers were encoded with two strategies: k-mer frequency and reverse complement k-mer frequency. Amino acid composition (AAC) and pseudo amino acid composition (PseAAC) were applied to represent the targets, using 24 physicochemical and conformational properties of the proteins. To handle class imbalance in the data, we applied a neighborhood cleaning algorithm. The predictor was built on a deep neural network, and optimal features were selected using the random forest algorithm. AptaNet achieved 99.79% accuracy on the training dataset and 91.38% accuracy on the testing dataset, showing high performance on our constructed aptamer–protein benchmark dataset. The results indicate that AptaNet can help identify novel aptamer–protein interacting pairs and provide deeper insight into the relationship between aptamers and proteins. Our benchmark dataset and the source code for AptaNet are available at https://github.com/nedaemami/AptaNet.
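The two aptamer encodings can be sketched as follows. Plain k-mer frequency counts every overlapping length-k substring. The reverse-complement variant assumed here merges each k-mer with its reverse complement into one canonical feature. This is one common definition and may differ from AptaNet's exact implementation. The sketch assumes uppercase DNA sequences over ACGT (RNA would need U mapped to T first), and the function names are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, reverse_complement(kmer))


def all_kmers(k):
    return ["".join(p) for p in product("ACGT", repeat=k)]


def kmer_frequencies(seq, k):
    """Frequency of each of the 4**k possible k-mers among the overlapping windows of seq."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)
    return {km: counts[km] / total for km in all_kmers(k)}


def rc_kmer_frequencies(seq, k):
    """Same, but a k-mer and its reverse complement share one (canonical) feature."""
    windows = [canonical(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)
    features = sorted({canonical(km) for km in all_kmers(k)})
    return {km: counts[km] / total for km in features}


aptamer = "ACGTTGCA"  # hypothetical sequence
print(len(kmer_frequencies(aptamer, 2)), len(rc_kmer_frequencies(aptamer, 2)))  # 16 10
```

Merging reverse complements shrinks the 16 possible 2-mers to 10 features. For longer k this roughly halves the feature vector before feature selection.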


2022
Author(s): Sumit Tewari, Sahar Yousefi, Andrew G Webb

We combine a CNN-based encoder with an analytical forward map to solve inverse problems, an approach we call the encoder-analytic (EA) hybrid model. It does not require a dedicated training dataset; instead, it trains itself directly through the connected forward map. A separate regularization term is also unnecessary, since the forward map acts as a regularizer. Because it is not a generalization model, it does not suffer from overfitting. We also show that the model can be customized to find either a specific target solution or one that follows a given heuristic. As an example, we apply this approach to the design of a multi-element surface magnet for low-field magnetic resonance imaging (MRI), where the EA model outperforms the benchmark genetic algorithm model currently used for magnet design in MRI, obtaining results almost 10 times better.
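The "no training dataset" idea can be illustrated with a deliberately tiny stand-in. A linear "encoder" W maps a fixed latent input z to design parameters x = W z. A known linear forward map A turns x into an observable, and W is fitted by gradient descent so that A W z matches the target y. The CNN encoder and the magnet's analytical forward map are replaced by these linear placeholders. A, z, y, the learning rate, and the step count are all hypothetical. The point is only that the gradient flows through the forward map, so no (input, solution) training pairs are needed.

```python
import numpy as np


def fit_through_forward_map(A, z, y, lr=0.05, steps=2000):
    """Minimise ||A @ (W @ z) - y||^2 over the encoder weights W by gradient descent.

    No training pairs are used: the analytical forward map A supplies the gradient.
    """
    W = np.zeros((A.shape[1], z.shape[0]))
    for _ in range(steps):
        residual = A @ (W @ z) - y                 # mismatch in observable space
        grad = 2.0 * np.outer(A.T @ residual, z)    # d/dW of the squared residual norm
        W -= lr * grad
    return W


A = np.array([[1.0, 0.5, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])   # hypothetical analytical forward map
z = np.ones(2)                     # fixed latent input to the encoder
y = np.array([1.0, -2.0, 0.5])     # target observable

W = fit_through_forward_map(A, z, y)
x = W @ z                          # recovered design parameters
print(x, np.linalg.norm(A @ x - y))
```

With a nonlinear CNN encoder and forward map, the same loop would normally be written with an automatic-differentiation library instead of a hand-derived gradient.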


2021
Author(s): Lachlan Webb, Minna Kauppila, James A Roberts, Sampsa Vanhatalo, Nathan Stevenson

Background and Objective: To develop a computational algorithm that detects and identifies different artefact types in neonatal electroencephalography (EEG) signals.

Methods: As part of a larger algorithm, we trained a Residual Deep Neural Network on expert human annotations of EEG recordings from 79 term infants in a neonatal intensive care unit (112 h of 18-channel recording). The network was trained using 10-fold cross-validation in MATLAB. Artefact types included device interference, EMG, movement, electrode pop, and non-cortical biological rhythms. Performance was assessed by prediction statistics and further validated on a separate, independent dataset of 13 term infants (143 h of 3-channel recording). The algorithm also included EEG pre-processing and post-processing steps, such as averaging the predicted probability over a temporal window.

Results: The Residual Deep Neural Network distinguished periods of clean, artefact-free EEG from any kind of artefact with high accuracy (95%), with a median per-patient accuracy of 91% (IQR: 81%-96%). Accuracy in identifying the five artefact types ranged from 57% to 92%, with electrode pop the hardest to detect and EMG the easiest, reflecting the proportion of each artefact type available in the training dataset. Misclassification as clean was low for every artefact type, ranging from 1% to 11%. Detection accuracy was lower on the validation set (87%). Using the algorithm, we showed that EEG channels located near the vertex were the least susceptible to artefact.

Conclusion: Artefacts can be accurately and reliably identified in neonatal EEG using a deep learning algorithm. Artefact detection algorithms can provide continuous bedside quality assessment and support EEG review by clinicians or analysis algorithms.
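The post-processing step of averaging probability over a temporal window can be sketched as a centred moving average of per-epoch artefact probabilities followed by a threshold. The window length of 3 epochs and the 0.5 threshold are hypothetical, since the abstract gives neither. The helper names are ours, and the sketch assumes an odd window no longer than the signal.

```python
import numpy as np


def smooth_probabilities(p, window=3):
    """Centred moving average of per-epoch artefact probabilities.

    Near the edges, the average is taken over the samples that exist.
    Assumes an odd window no longer than len(p).
    """
    p = np.asarray(p, dtype=float)
    kernel = np.ones(window)
    sums = np.convolve(p, kernel, mode="same")
    counts = np.convolve(np.ones_like(p), kernel, mode="same")  # fewer samples at the edges
    return sums / counts


def flag_artefact(p, window=3, threshold=0.5):
    """Flag epochs whose smoothed probability exceeds the threshold."""
    return smooth_probabilities(p, window) > threshold


spike = [0.0, 0.0, 1.0, 0.0, 0.0]   # isolated high-probability epoch
run = [0.0, 1.0, 1.0, 1.0, 0.0]     # sustained artefact
print(flag_artefact(spike), flag_artefact(run))
```

Smoothing suppresses isolated single-epoch detections but keeps sustained runs. This is one plausible reason for averaging before deciding, though the study's own rationale is not stated here.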


2021
Author(s): Callum Newman, Jon Petzing, Yee Mey Goh, Laura Justham

Artificial intelligence in computer vision has focused on improving test performance through techniques and architectures based on deep neural networks. However, improvements can also be achieved by carefully selecting the training dataset images. Environmental factors such as light intensity affect an image's appearance, and choosing optimal levels of these factors can improve the neural network's performance. However, little research has addressed processes that help identify these optimal levels. This research presents a case study that applies a process for developing an optimised dataset to train an object detection neural network. Images are gathered under controlled conditions, varying multiple factors to construct various training datasets. Each dataset is used to train the same neural network, and the test performances are compared to identify the optimal factors. The study also introduces the use of synthetic images, which offer many advantages, including generating images when real-world images are unavailable and controlling factors more easily.
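The dataset-selection process can be sketched as a factorial experiment. Enumerate every combination of environmental factor levels, build one training dataset per combination, train the same detector on each, and keep the configuration with the best test performance. Full-factorial enumeration is one simple way to do this and is our assumption. The factor names and levels are hypothetical, and `toy_evaluate` is an invented placeholder for training and testing the network, not data from the study.

```python
from itertools import product

# Hypothetical factors and levels; the study's actual factors and levels are not listed here.
FACTORS = {
    "light_intensity": ["low", "medium", "high"],
    "background": ["plain", "cluttered"],
    "image_source": ["real", "synthetic"],
}


def design_points(factors):
    """Full-factorial design: one training-dataset configuration per combination of levels."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*(factors[n] for n in names))]


def select_best(configs, evaluate):
    """Score each configuration with `evaluate` (train + test) and return the best one."""
    scores = [evaluate(cfg) for cfg in configs]
    best = max(range(len(configs)), key=lambda i: scores[i])
    return configs[best], scores[best]


def toy_evaluate(cfg):
    """Placeholder for 'train the same detector on this dataset and return its test score'."""
    score = 0.5
    score += 0.2 if cfg["light_intensity"] == "medium" else 0.0
    score += 0.1 if cfg["background"] == "cluttered" else 0.0
    score += 0.05 if cfg["image_source"] == "real" else 0.0
    return score


configs = design_points(FACTORS)
best_cfg, best_score = select_best(configs, toy_evaluate)
print(len(configs), best_cfg, round(best_score, 3))
```

A full factorial grows multiplicatively with the number of levels. With many factors, a fractional factorial or other designed-experiment approach would keep the number of training runs manageable.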

