scholarly journals Outsmarting Quantum Chemistry Through Transfer Learning

Author(s):  
Justin S Smith ◽  
Benjamin T. Nebgen ◽  
Roman Zubatyuk ◽  
Nicholas Lubbers ◽  
Christian Devereux ◽  
...  

<div>Computer simulations are foundational to theoretical chemistry. Quantum-mechanical (QM) methods provide the highest accuracy for simulating molecules but have difficulty scaling to large systems. Empirical interatomic potentials (classical force fields) are scalable, but lack transferability to new systems and are hard to systematically improve. Automated, data-driven machine learning is close to achieving the best of both approaches. Here we use transfer learning to retrain a general purpose neural network potential, ANI-1x, on a dataset of gold standard QM calculations (CCSD(T)/CBS level) that is relatively small but designed to optimally span chemical space. The resulting potential, ANI-1ccx, approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. ANI-1ccx is broadly applicable to materials science, biology and chemistry, and billions of times faster than the parent CCSD(T)/CBS calculations.</div>

Author(s):  
Justin S Smith ◽  
Benjamin T. Nebgen ◽  
Roman Zubatyuk ◽  
Nicholas Lubbers ◽  
Christian Devereux ◽  
...  

<p>Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology and chemistry, and billions of times faster<i></i>than CCSD(T)/CBS calculations. </p>


Author(s):  
Justin S Smith ◽  
Benjamin T. Nebgen ◽  
Roman Zubatyuk ◽  
Nicholas Lubbers ◽  
Christian Devereux ◽  
...  

<p>Computational modeling of chemical and biological systems at atomic resolution is a crucial tool in the chemist's toolset. The use of computer simulations requires a balance between cost and accuracy: quantum-mechanical methods provide high accuracy but are computationally expensive and scale poorly to large systems, while classical force fields are cheap and scalable, but lack transferability to new systems. Machine learning can be used to achieve the best of both approaches. Here we train a general-purpose neural network potential (ANI-1ccx) that approaches CCSD(T)/CBS accuracy on benchmarks for reaction thermochemistry, isomerization, and drug-like molecular torsions. This is achieved by training a network to DFT data then using transfer learning techniques to retrain on a dataset of gold standard QM calculations (CCSD(T)/CBS) that optimally spans chemical space. The resulting potential is broadly applicable to materials science, biology and chemistry, and billions of times faster<i></i>than CCSD(T)/CBS calculations. </p>


2019 ◽  
Author(s):  
Qi Yuan ◽  
Alejandro Santana-Bonilla ◽  
Martijn Zwijnenburg ◽  
Kim Jelfs

<p>The chemical space for novel electronic donor-acceptor oligomers with targeted properties was explored using deep generative models and transfer learning. A General Recurrent Neural Network model was trained from the ChEMBL database to generate chemically valid SMILES strings. The parameters of the General Recurrent Neural Network were fine-tuned via transfer learning using the electronic donor-acceptor database from the Computational Material Repository to generate novel donor-acceptor oligomers. Six different transfer learning models were developed with different subsets of the donor-acceptor database as training sets. We concluded that electronic properties such as HOMO-LUMO gaps and dipole moments of the training sets can be learned using the SMILES representation with deep generative models, and that the chemical space of the training sets can be efficiently explored. This approach identified approximately 1700 new molecules that have promising electronic properties (HOMO-LUMO gap <2 eV and dipole moment <2 Debye), 6-times more than in the original database. Amongst the molecular transformations, the deep generative model has learned how to produce novel molecules by trading off between selected atomic substitutions (such as halogenation or methylation) and molecular features such as the spatial extension of the oligomer. The method can be extended as a plausible source of new chemical combinations to effectively explore the chemical space for targeted properties.</p>


2021 ◽  
Author(s):  
Tom Young ◽  
Tristan Johnston-Wood ◽  
Volker L. Deringer ◽  
Fernanda Duarte

Predictive molecular simulations require fast, accurate and reactive interatomic potentials. Machine learning offers a promising approach to construct such potentials by fitting energies and forces to high-level quantum-mechanical data, but...


2021 ◽  
Vol 18 (4) ◽  
pp. 366-368
Author(s):  
Benjamin Buchfink ◽  
Klaus Reuter ◽  
Hajk-Georg Drost

AbstractWe are at the beginning of a genomic revolution in which all known species are planned to be sequenced. Accessing such data for comparative analyses is crucial in this new age of data-driven biology. Here, we introduce an improved version of DIAMOND that greatly exceeds previous search performances and harnesses supercomputing to perform tree-of-life scale protein alignments in hours, while matching the sensitivity of the gold standard BLASTP.


Tetrahedron ◽  
1993 ◽  
Vol 49 (34) ◽  
pp. 7393 ◽  
Author(s):  
Michael J.S. Dewar ◽  
Caoxian Jie ◽  
Jianguo Yu

2021 ◽  
Author(s):  
Airat Kotliar-Shapirov ◽  
Fedor S. Fedorov ◽  
Henni Ouerdane ◽  
Stanislav Evlashin ◽  
Albert G. Nasibulin ◽  
...  

In our manuscript, we present our protocol for data processing to mitigate the effects of interfering analytes on the identification of the chemical species detected by sensors. Considering NO2 and CO2, we designed electrochemical sensors whose response yielded the cyclic voltammetry data that we analyzed to classify single-species components and their mixtures using a data-driven approach to generate a chemical space where their mixtures can be deconvoluted.<br>


2021 ◽  
Author(s):  
Adarsh Kalikadien ◽  
Evgeny A. Pidko ◽  
Vivek Sinha

<div>Local chemical space exploration of an experimentally synthesized material can be done by making slight structural</div><div>variations of the synthesized material. This generation of many molecular structures with reasonable quality,</div><div>that resemble an existing (chemical) purposeful material, is needed for high-throughput screening purposes in</div><div>material design. Large databases of geometry and chemical properties of transition metal complexes are not</div><div>readily available, although these complexes are widely used in homogeneous catalysis. A Python-based workflow,</div><div>ChemSpaX, that is aimed at automating local chemical space exploration for any type of molecule, is introduced.</div><div>The overall computational workflow of ChemSpaX is explained in more detail. ChemSpaX uses 3D information,</div><div>to place functional groups on an input structure. For example, the input structure can be a catalyst for which one</div><div>wants to use high-throughput screening to investigate if the catalytic activity can be improved. The newly placed</div><div>substituents are optimized using a computationally cheap force-field optimization method. After placement of</div><div>new substituents, higher level optimizations using xTB or DFT instead of force-field optimization are also possible</div><div>in the current workflow. In representative applications of ChemSpaX, it is shown that the structures generated by</div><div>ChemSpaX have a reasonable quality for usage in high-throughput screening applications. Representative applications</div><div>of ChemSpaX are shown by investigating various adducts on functionalized Mn-based pincer complexes,</div><div>hydrogenation of Ru-based pincer complexes, functionalization of cobalt porphyrin complexes and functionalization</div><div>of a bipyridyl functionalized cobalt-porphyrin trapped in a M2L4 type cage complex. Descriptors such as</div><div>the Gibbs free energy of reaction and HOMO-LUMO gap, that can be used in data-driven design and discovery</div><div>of catalysts, were selected and studied in more detail for the selected use cases. The relatively fast GFN2-xTB</div><div>method was used to calculate these descriptors and a comparison was done against DFT calculated descriptors.</div><div>ChemSpaX is open-source and aims to bolster the efforts of the scientific community towards data-driven material</div><div>discovery.</div>


Sign in / Sign up

Export Citation Format

Share Document