Multi-fidelity prediction of molecular optical peaks with deep learning

Author(s):  
Kevin Greenman ◽  
William Green ◽  
Rafael Gómez-Bombarelli

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction, each with a trade-off between accuracy, generality, and cost. Existing theoretical methods such as time-dependent density functional theory (TD-DFT) are generalizable across chemical space because of their robust physics-based foundations but still exhibit random and systematic errors with respect to experiment despite their high computational cost. Statistical methods can achieve high accuracy at a lower cost, but data sparsity and unoptimized molecule and solvent representations often limit their ability to generalize. Here, we utilize directed message passing neural networks (D-MPNNs) to represent both dye molecules and solvents for predictions of molecular absorption peaks in solution. Additionally, we demonstrate a multi-fidelity approach based on an auxiliary model trained on over 28,000 TD-DFT calculations that further improves accuracy and generalizability, as shown through rigorous splitting strategies. Combining several openly-available experimental datasets, we benchmark these methods against a state-of-the-art regression tree algorithm and compare the D-MPNN solvent representation to several alternatives. Finally, we explore the interpretability of the learned representations using dimensionality reduction and evaluate the use of ensemble variance as an estimator of the epistemic uncertainty in our predictions of molecular peak absorption in solution. The prediction methods proposed herein can be integrated with active learning, generative modeling, and experimental workflows to enable the more rapid design of molecules with targeted optical properties.
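The use of ensemble variance as an uncertainty estimate, mentioned in the abstract above, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `ensemble_prediction` and the five peak-wavelength values are hypothetical, and each value stands in for the output of one independently trained model.

```python
from statistics import mean, pvariance


def ensemble_prediction(member_predictions):
    """Return (mean, variance) over the predictions of all ensemble members.

    The mean serves as the ensemble's point prediction; the spread between
    members is used as a rough proxy for epistemic uncertainty.
    """
    return mean(member_predictions), pvariance(member_predictions)


# Hypothetical absorption-peak predictions (nm) for one dye/solvent pair,
# one value per ensemble member. These numbers are illustrative only.
member_predictions_nm = [512.0, 508.0, 515.0, 510.0, 505.0]

peak_nm, variance_nm2 = ensemble_prediction(member_predictions_nm)
print(f"predicted peak: {peak_nm:.1f} nm, ensemble variance: {variance_nm2:.1f} nm^2")
```

A large variance flags a molecule on which the ensemble members disagree, which is the kind of signal an active-learning loop might use to pick the next measurement.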

2021 ◽  
Author(s):  
Kenneth Atz ◽  
Clemens Isert ◽  
Markus N. A. Böcker ◽  
José Jiménez-Luna ◽  
Gisbert Schneider

Many molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like molecules currently renders large-scale applications of quantum chemistry challenging. Aiming to mitigate this problem, we developed DelFTa, an open-source toolbox for the prediction of electronic properties of drug-like molecules at the density functional (DFT) level of theory, using Δ-machine-learning. Δ-Learning corrects the prediction error (Δ) of a fast but inaccurate property calculation. DelFTa employs state-of-the-art three-dimensional message-passing neural networks trained on a large dataset of QM properties. It provides access to a wide array of quantum observables on the molecular, atomic and bond levels by predicting approximations to DFT values from a low-cost semiempirical baseline. Δ-Learning outperformed its direct-learning counterpart for most of the considered QM endpoints. The results suggest that predictions for non-covalent intra- and intermolecular interactions can be extrapolated to larger biomolecular systems. The software is fully open-sourced and features documented command-line and Python APIs.
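The Δ-learning idea described above (learn the correction to a cheap baseline rather than the target itself) can be sketched with a toy example. This is an assumption-laden illustration, not DelFTa's API: the correction model here is a one-variable least-squares line fitted on the baseline value, whereas the abstract describes message-passing neural networks, and all numbers and names (`fit_linear`, `predict_high_level`) are hypothetical.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy training data: a cheap baseline value and an expensive reference value
# for the same four molecules (arbitrary units, illustrative only).
low_level = [1.0, 2.0, 3.0, 4.0]
high_level = [1.5, 2.7, 3.9, 5.1]

# The learning target is the error of the baseline, not the property itself.
deltas = [high - low for high, low in zip(high_level, low_level)]
slope, intercept = fit_linear(low_level, deltas)


def predict_high_level(low_value):
    """Baseline value plus the learned correction."""
    return low_value + (slope * low_value + intercept)


print(predict_high_level(5.0))
```

The design point is that the correction is usually smoother and smaller in magnitude than the property itself, so it tends to be easier to learn from limited data.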


2021 ◽  
Author(s):  
Richard Kleingeld

Spectroscopy is the science of using light to obtain information about a molecule or system of molecules. Specifically, the absorption, emission, and scattering of different wavelengths of light can provide data about bond strength, bond order, vibrational frequency, and excitation energy [1, 2]. As the wavelength, and therefore the energy, of the incident photons can be set by the instrument, the exact energies of absorbance or emission of the molecule can be measured. These data can be gathered experimentally using specialised equipment; however, some molecules resist synthesis, and so a wealth of data about many theoretically possible species eludes us. We may also want to study the molecule isolated in “empty space”, but gas-phase measurements are not always possible. This is one place where computational chemistry comes to the fore. Using an appropriate computational method such as density functional theory (DFT), data can be calculated for many interesting areas of chemistry. DFT is a computational method based on the finding of Hohenberg and Kohn in 1964 that the ground-state electronic energy of a system is determined completely by the electron density [3-6]. It is therefore considerably more efficient than the wave function approach: the number of variables in the wave function grows with the size of the system, whereas the electron density has the same number of variables regardless of system size [7]. The choice of an appropriate functional to connect the electron density to the energy is one of the vital decisions in using this method; a well-chosen functional can provide good results at a much lower computational cost than other methods, while still accounting for electron correlation effects [8]. DFT has become a very popular method due to its versatility and generally good accuracy with relatively low computational expense when compared to ab initio methods [9].


2020 ◽  
Author(s):  
Srilok Srinivasan ◽  
Rohit Batra ◽  
Henry Chan ◽  
Ganesh Kamath ◽  
Mathew J. Cherukara ◽  
...  

An extensive search for active therapeutic agents against SARS-CoV-2 is being conducted across the globe. Computational docking simulations have traditionally been used for in silico ligand design and remain a popular method of choice for high-throughput screening of therapeutic agents in the fight against COVID-19. Despite the vast chemical space (millions to billions of biomolecules) that can potentially be explored as therapeutic agents, we remain severely limited in the search for candidate compounds owing to the high computational cost of the ensemble docking simulations employed in traditional in silico ligand design. Here, we present a de novo molecular design strategy that leverages artificial intelligence to discover new therapeutic biomolecules against SARS-CoV-2. A Monte Carlo Tree Search algorithm, combined with a multi-task neural network (MTNN) surrogate model for expensive docking simulations and recurrent neural networks (RNN) for rollouts, is used to sample the exhaustive SMILES space of candidate biomolecules. Using Vina scores as the target objective to measure binding of therapeutic molecules to either the isolated spike protein (S-protein) of SARS-CoV-2 at its host receptor region or to the S-protein:angiotensin-converting enzyme 2 (ACE2) receptor interface, we generate several (~100s) new biomolecules that outperform FDA (~1000s) and non-FDA biomolecules (~million) from existing databases. A transfer learning strategy is deployed to retrain the MTNN surrogate as new candidate molecules are identified; this iterative search-and-retrain strategy is shown to accelerate the discovery of desired candidates. We perform detailed analysis using Lipinski's rules and also analyze the structural similarities between the various top-performing candidates. We split the molecules using a molecular fragmenting algorithm and identify the common chemical fragments and patterns; such information is important for identifying moieties that are responsible for improved performance. Although we focus on therapeutic biomolecules, our AI strategy is broadly applicable to the accelerated design and discovery of any chemical molecules with user-desired functionality.
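The Lipinski analysis mentioned above can be illustrated with a small rule-of-five check. This is a sketch under stated assumptions, not the authors' workflow: the abstract does not say how the rules were applied, the `Descriptors` class and both candidates are hypothetical, and the descriptor values are made up rather than computed from real structures. The thresholds are the standard rule-of-five limits (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors).

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed descriptors for one candidate molecule (hypothetical container)."""
    molecular_weight: float  # in Da
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four rule-of-five limits the molecule exceeds."""
    checks = [
        d.molecular_weight > 500,
        d.log_p > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


# Illustrative descriptor values, not taken from the paper.
small_candidate = Descriptors(320.4, 2.1, 2, 5)
large_candidate = Descriptors(650.0, 6.2, 3, 8)

print(lipinski_violations(small_candidate))  # no limits exceeded
print(lipinski_violations(large_candidate))  # exceeds the weight and logP limits
```

In practice the descriptors would come from a cheminformatics toolkit applied to each generated SMILES string; a common convention is to flag compounds with more than one violation.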


2017 ◽  
Vol 41 (5) ◽  
pp. 2020-2028 ◽  
Author(s):  
Njemuwa Nwaji ◽  
John Mack ◽  
Jonathan Britton ◽  
Tebello Nyokong

Ball-type phthalocyanines containing heavy central metals show enhanced nonlinear optical behaviour in solution or when embedded in polymer thin films. Time-dependent density functional theory (TD-DFT) calculations were used to explain the spectra.


2021 ◽  
Author(s):  
Kenneth Atz ◽  
Clemens Isert ◽  
Markus N. A. Böcker ◽  
José Jiménez-Luna ◽  
Gisbert Schneider

Certain molecular design tasks benefit from fast and accurate calculations of quantum-mechanical (QM) properties. However, the computational cost of QM methods applied to drug-like compounds currently makes large-scale applications of quantum chemistry challenging. In order to mitigate this problem, we developed DelFTa, an open-source toolbox for predicting small-molecule electronic properties at the density functional (DFT) level of theory, using the Δ-machine learning principle. DelFTa employs state-of-the-art E(3)-equivariant graph neural networks that were trained on the QMugs dataset of QM properties. It provides access to a wide array of quantum observables by predicting approximations to ωB97X-D/def2-SVP values from a GFN2-xTB semiempirical baseline. Δ-learning with DelFTa was shown to outperform direct DFT learning for most of the considered QM endpoints. The software is provided as open-source code with fully-documented command-line and Python APIs.


2012 ◽  
Vol 1414 ◽  
Author(s):  
G. Jones ◽  
M. Elliott ◽  
C. C. Matthai

In recent years, first-principles electronic structure calculations have been carried out to investigate such diverse phenomena as charge transport in molecular wires, the optical properties of quantum structures, and photonics. However, at this time the prohibitive computational cost does not allow such calculations to be easily carried out on nano-scale device structures comprising thousands of atoms. In addition, there are issues relating to the applicability of these approaches to describing the excitations that ought to be involved in charge transport. Self-consistent extended Hückel theory (SC-EHT) has proved very effective in describing the band alignment at semiconductor interfaces and the optical properties of partially covered surfaces, as well as being employed in studying the electronic states of large molecules. We have developed a non-equilibrium Green's function (NEGF) SC-EHT code that may be applied to study charge transport through molecular wires. We study the transmission of a porphyrin molecule attached via thiol linkers to gold electrodes and compare our results with those obtained from density functional theory (DFT). We have studied the influence of the thiol position on the Au substrate on the conduction and the dependence of the electron transmission on the molecular conformation. In addition, we report the results of some preliminary investigations of the influence of water on the conduction pathways.


2022 ◽  
Author(s):  
Kevin P Greenman ◽  
William H. Green ◽  
Rafael Gomez-Bombarelli

Optical properties are central to molecular design for many applications, including solar cells and biomedical imaging. A variety of ab initio and statistical methods have been developed for their prediction,...


2016 ◽  
Vol 64 (1) ◽  
pp. 77-81 ◽  
Author(s):  
M Alauddin ◽  
MM Islam ◽  
MK Hasan ◽  
T Bredow ◽  
MA Aziz

The structural, spectroscopic (IR, NMR and UV-Vis) and optical properties of adenine (6-aminopurine, C5H5N5) are investigated theoretically using the HF/DFT hybrid approach B3LYP. The calculated results are compared with available experimental data. The optimized bond distances and bond angles agree with the experimental values to within ±0.01 Å and ±0.8°, respectively. The investigation of the 1H NMR chemical shifts of the aromatic C-H protons shows that the maximum deviation of the calculated chemical shift is ~0.53 ppm compared to the experimental data. Analysis of the calculated vibrational spectra shows four distinct IR-active modes of vibration, which are assigned as the scissoring vibration of –NH2, the symmetric stretching vibration of –NH2, the free –NH vibration and the anti-symmetric stretching vibration of –NH2, respectively. The electronic and optical properties are calculated by the time-dependent density functional theory (TD-DFT) approach. Reasonable agreement is obtained between the calculated optical absorption energy and the experimental value. Dhaka Univ. J. Sci. 64(1): 77-81, 2016 (January)

