Topological representations of crystalline compounds for the machine-learning prediction of materials properties

2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Yi Jiang ◽  
Dong Chen ◽  
Xin Chen ◽  
Tangyi Li ◽  
Guo-Wei Wei ◽  
...  

Abstract Accurate theoretical predictions of desired properties of materials play an important role in materials research and development. Machine learning (ML) can accelerate materials design by building a model from input data. For complex datasets, such as those of crystalline compounds, a vital issue is how to construct low-dimensional representations of input crystal structures that carry chemical insight. In this work, we introduce an algebraic topology-based method, called atom-specific persistent homology (ASPH), as a unique representation of crystal structures. ASPH can capture both pairwise and many-body interactions and reveal the topology-property relationship of a group of atoms at various scales. Combined with composition-based attributes, the ASPH-based ML model provides a highly accurate prediction of the formation energy calculated by density functional theory (DFT). After training with more than 30,000 different structure types and compositions, our model achieves a mean absolute error of 61 meV/atom in cross-validation, which outperforms previous approaches such as the Voronoi tessellation and Coulomb matrix methods using the same ML algorithm and datasets. Our results indicate that the proposed topology-based method provides a powerful computational tool for predicting materials properties compared to previous works.

2021 ◽  
Author(s):  
Onur Çaylak ◽  
Björn Baumeier

We present a ∆-Machine Learning approach for the prediction of GW quasiparticle energies (∆MLQP) and photoelectron spectra of molecules and clusters, using orbital-sensitive graph-based representations in supervised learning based on kernel ridge regression. Coulomb matrix, Bag-of-Bonds, and Bonds-Angles-Torsions representations are made orbital-sensitive by augmenting them with atom-centered orbital charges and Kohn–Sham orbital energies, which are both readily available from baseline calculations on the level of density-functional theory (DFT). We first illustrate the effects of different constructions of the orbital-sensitive representations (OSR) on the prediction of frontier orbital energies of 22K molecules of the QM8 dataset, and show that it is possible to predict the full photoelectron spectrum of molecules within the dataset using a single model with a mean absolute error below 0.1 eV. We further demonstrate that the OSR-based ∆MLQP captures the effects of intra- and intermolecular conformations in application to water monomers and dimers. Finally, we show that the approach can be embedded in multiscale simulation workflows, by studying the solvatochromic shifts of quasiparticle and electron-hole excitation energies of solvated acetone in a setup combining Molecular Dynamics, DFT, the GW approximation and the Bethe–Salpeter Equation. Our findings suggest that the ∆MLQP model allows the prediction of quasiparticle energies and photoelectron spectra of molecules and clusters with GW accuracy at DFT cost.
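As an illustration of the simplest representation named above, the following is a minimal sketch of the standard (unaugmented) Coulomb matrix, with diagonal entries 0.5·Z^2.4 and off-diagonal entries Z_i·Z_j/|R_i − R_j|. It assumes positions in atomic units (bohr). The function name `coulomb_matrix` is hypothetical, and the orbital-sensitive augmentation described in the abstract is not reproduced here.

```python
import math


def coulomb_matrix(charges, coords):
    """Return the standard Coulomb matrix as a list of lists.

    charges: nuclear charges Z_i
    coords:  Cartesian positions in bohr (assumed unit), one 3-tuple per atom
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Diagonal: fitted self-interaction term 0.5 * Z^2.4
        matrix[i][i] = 0.5 * charges[i] ** 2.4
        for j in range(i + 1, n):
            # Off-diagonal: nuclear Coulomb repulsion Z_i * Z_j / |R_i - R_j|
            value = charges[i] * charges[j] / math.dist(coords[i], coords[j])
            matrix[i][j] = value
            matrix[j][i] = value  # mirror to keep the matrix symmetric
    return matrix


# Illustrative input only: two hydrogen atoms 1.4 bohr apart.
h2 = coulomb_matrix([1, 1], [(0.0, 0.0, 0.0), (0.0, 0.0, 1.4)])
```

In a kernel ridge regression setting such a matrix would typically be flattened (and sorted or padded to a fixed size) before being passed to the kernel; that step is left out of this sketch.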


2021 ◽  
Author(s):  
Alain Beaudelaire Tchagang ◽  
Ahmed H. Tewfik ◽  
Julio J. Valdés

Abstract Accumulation of molecular data obtained from quantum mechanics (QM) theories such as density functional theory (DFTQM) makes it possible for machine learning (ML) to accelerate the discovery of new molecules, drugs, and materials. Models that combine QM with ML (QM↔ML) have been very effective in delivering the precision of QM at the high speed of ML. In this study, we show that by integrating well-known signal processing (SP) techniques (i.e. short-time Fourier transform, continuous wavelet analysis and Wigner-Ville distribution) in the QM↔ML pipeline, we obtain a powerful machinery (QM↔SP↔ML) that can be used for representation, visualization and forward design of molecules. More precisely, we show that the time-frequency-like representation of molecules encodes their structural, geometric, energetic, electronic and thermodynamic properties. This is demonstrated by using the new representation in the forward design loop as input to a deep convolutional neural network trained on DFTQM calculations, which outputs the properties of the molecules. Tested on the QM9 dataset (composed of 133,855 molecules and 16 properties), the new QM↔SP↔ML model is able to predict the properties of molecules with a mean absolute error (MAE) below acceptable chemical accuracy (i.e. MAE < 1 kcal/mol for total energies and MAE < 0.1 eV for orbital energies). Furthermore, the new approach performs similarly to or better than other state-of-the-art ML techniques described in the literature. Overall, we show that the new QM↔SP↔ML model represents a powerful technique for molecular forward design. All the codes and data generated and used in this study are available as supporting materials. The QM↔SP↔ML is also housed at the following website: https://github.com/TABeau/QM-SP-ML.
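The accuracy thresholds quoted above are stated as mean absolute errors in two different units. A minimal sketch of how such a check could be computed is given below; the function name, the constant name and the sample numbers are illustrative assumptions and are not taken from the study.

```python
# 1 kcal/mol expressed in eV (1 eV is about 23.06 kcal/mol).
KCAL_PER_MOL_IN_EV = 0.0433641


def mean_absolute_error(predicted, reference):
    """Average of |prediction - reference| over paired values."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Made-up total energies in eV, used only to show the unit handling.
predicted_ev = [1.00, 2.00, 4.00]
reference_ev = [1.01, 2.00, 3.98]

mae_ev = mean_absolute_error(predicted_ev, reference_ev)
# Compare against the 1 kcal/mol threshold after converting it to eV.
within_chemical_accuracy = mae_ev < 1.0 * KCAL_PER_MOL_IN_EV
```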


2021 ◽  
Author(s):  
Hillary Pan ◽  
Alex Ganose ◽  
Matthew Horton ◽  
Muratahan Aykol ◽  
Kristin Persson ◽  
...  

Coordination numbers and geometries form a theoretical framework for understanding and predicting materials properties. Algorithms to determine coordination numbers automatically are increasingly used for machine learning and automatic structural analysis. In this work, we introduce MaterialsCoord, a benchmark suite containing 56 experimentally-derived crystal structures (spanning elements, binaries, and ternary compounds) and their corresponding coordination environments as described in the research literature. We also describe CrystalNN, a novel algorithm for determining near neighbors. We compare CrystalNN against 7 existing near-neighbor algorithms on the MaterialsCoord benchmark, finding CrystalNN to perform similarly to several well-established algorithms. For each algorithm, we also assess computational demand and sensitivity towards small perturbations that mimic thermal motion. Finally, we investigate the similarity between bonding algorithms when applied to the Materials Project database. We expect that this work will aid the development of coordination prediction algorithms as well as improve structural descriptors for machine learning and other applications.
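For orientation, the simplest kind of near-neighbor algorithm counts all atoms within a fixed distance of a site. The sketch below implements that fixed-cutoff baseline for a periodic cell; it is not CrystalNN, and the function name `coordination_numbers` is hypothetical. It assumes lattice vectors given as rows and a cutoff short enough that only the 26 adjacent periodic images need to be searched.

```python
import itertools
import math


def coordination_numbers(lattice, frac_coords, cutoff):
    """Count neighbors within `cutoff` of each site, including periodic images.

    lattice:     three lattice vectors as rows (same length unit as cutoff)
    frac_coords: fractional coordinates of the sites in the cell
    cutoff:      fixed neighbor distance; assumed smaller than the cell size,
                 so that searching the adjacent image cells is sufficient
    """
    def to_cartesian(frac):
        return [sum(frac[k] * lattice[k][d] for k in range(3)) for d in range(3)]

    shifts = list(itertools.product((-1, 0, 1), repeat=3))
    counts = []
    for i, frac_i in enumerate(frac_coords):
        r_i = to_cartesian(frac_i)
        neighbors = 0
        for j, frac_j in enumerate(frac_coords):
            for shift in shifts:
                if i == j and shift == (0, 0, 0):
                    continue  # skip the site itself, but keep its images
                r_j = to_cartesian([frac_j[k] + shift[k] for k in range(3)])
                if math.dist(r_i, r_j) <= cutoff:
                    neighbors += 1
        counts.append(neighbors)
    return counts


# Idealized cubic cells with unit lattice constant (illustrative input only).
cubic = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
simple_cubic = coordination_numbers(cubic, [(0.0, 0.0, 0.0)], cutoff=1.1)
body_centered = coordination_numbers(
    cubic, [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)], cutoff=0.9
)
```

A fixed cutoff gives 6 neighbors for the simple cubic cell and 8 for each site of the body-centered cell, but the result depends entirely on the chosen cutoff, which is the kind of sensitivity a benchmark of near-neighbor algorithms is meant to expose.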


2021 ◽  
Author(s):  
Sheena Agarwal ◽  
Kavita Joshi

Abstract Identifying factors that influence interactions at the surface is still an active area of research. In this study, we present the importance of analyzing bond-length activation, while interpreting Density Functional Theory (DFT) results, as yet another crucial indicator for catalytic activity. We studied the adsorption of small molecules, such as O₂, N₂, CO, and CO₂, on seven face-centered cubic (fcc) transition metal surfaces (M = Ag, Au, Cu, Ir, Rh, Pt, and Pd) and their commonly studied facets (100, 110, and 111). Through our DFT investigations, we highlight the absence of a linear correlation between adsorption energies (E_ads) and bond-length activation (BL_act). Our study indicates the importance of evaluating both to develop a better understanding of adsorption at surfaces. We also developed a Machine Learning (ML) model trained on simple periodic table properties to predict both E_ads and BL_act. Our ML model reaches a Mean Absolute Error (MAE) of ∼0.2 eV for E_ads predictions and 0.02 Å for BL_act predictions. The systematic study of the ML features that affect E_ads and BL_act further reinforces the importance of looking beyond adsorption energies to get a full picture of surface interactions with DFT.


2021 ◽  
Author(s):  
Cheng-Wei Ju ◽  
Ethan French ◽  
Nadav Geva ◽  
Alexander Kohn ◽  
Zhou Lin

High-throughput virtual materials and drug discovery based on density functional theory has achieved tremendous success in recent decades, but its power on organic semiconducting molecules suffered catastrophically from the self-interaction error until the optimally tuned range-separated hybrid (OT-RSH) exchange-correlation functionals were developed. The accurate but expensive first-principles OT-RSH transitions from a short-range (semi-)local functional to a long-range Hartree-Fock exchange at a distance characterized by the inverse of a molecule-specific, non-empirically-determined range-separation parameter (ω). In the present study, we propose a promising stacked ensemble machine learning (SEML) model that provides an accelerated alternative to OT-RSH based on system-dependent structural and electronic configurations. We trained ML-ωPBE, the first functional in our series, using a database of 1,970 organic semiconducting molecules with sufficient structural diversity, and assessed its accuracy and efficiency using another 1,956 molecules. Compared with the first-principles OT-ωPBE, our ML-ωPBE reached a mean absolute error of 0.00504 a_0^{-1} for the optimal value of ω, reduced the computational cost for the test set by 2.66 orders of magnitude, and achieved comparable predictive power for various optical properties.


2020 ◽  
Author(s):  
Hillary Pan ◽  
Alex Ganose ◽  
Matthew Horton ◽  
Muratahan Aykol ◽  
Kristin Persson ◽  
...  

Coordination numbers and geometries form a theoretical framework for understanding and predicting materials properties. Algorithms to determine coordination numbers automatically are increasingly used for machine learning and automatic structural analysis. In this work, we introduce MaterialsCoord, a benchmark suite containing 56 experimentally-derived crystal structures (spanning elements, binaries, and ternary compounds) and their corresponding coordination environments as described in the research literature. We also describe CrystalNN, a novel algorithm for determining near neighbors. We compare CrystalNN against 7 existing near-neighbor algorithms on the MaterialsCoord benchmark, finding CrystalNN to be the most accurate overall. For each algorithm, we also assess computational demand and sensitivity towards small perturbations that mimic thermal motion. Finally, we investigate the similarity between bonding algorithms when applied to the Materials Project database. We expect that this work will aid the development of coordination prediction algorithms and improve the accuracy of structural descriptors for machine learning and other applications.


2020 ◽  
Author(s):  
Olga Egorova ◽  
Roohollah Hafizi ◽  
David C. Woods ◽  
Graeme Day

The prediction of crystal structures from first principles requires highly accurate energies for large numbers of putative crystal structures. The accuracy of solid state density functional theory (DFT) calculations is often required, but hundreds or more structures can be present in the low energy region of interest, so that the associated computational costs are prohibitive. Here, we apply statistical machine learning to predict expensive hybrid functional DFT (PBE0) calculations using a multi-fidelity approach to re-evaluate the energies of crystal structures predicted with an inexpensive force field. The method uses an autoregressive Gaussian process, making use of less expensive GGA DFT (PBE) calculations to bridge the gap between the force field and PBE0 energies. The method is benchmarked on the crystal structure landscapes of three small, hydrogen bonding organic molecules and shown to produce accurate predictions of energies and crystal structure ranking using small numbers of the most expensive calculations; the PBE0 energies can be predicted with errors of less than 1 kJ/mol at between 4.2% and 6.8% of the cost of the full calculations. As the model that we have developed is probabilistic, we discuss how the uncertainties in predicted energies affect the assessment of the energetic ranking of crystal structures.
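The core idea of an autoregressive multi-fidelity scheme is that the expensive energy is modeled as a scaled copy of the cheap energy plus a correction. The sketch below is a drastically simplified stand-in, assuming a constant correction fitted by ordinary least squares instead of the Gaussian process used in the study. All function names and numbers are hypothetical and serve only to show the structure high ≈ ρ · low + δ.

```python
def fit_linear_correction(low, high):
    """Least-squares fit of high ≈ rho * low + delta from paired samples."""
    n = len(low)
    mean_low = sum(low) / n
    mean_high = sum(high) / n
    covariance = sum((l - mean_low) * (h - mean_high) for l, h in zip(low, high))
    variance = sum((l - mean_low) ** 2 for l in low)
    rho = covariance / variance          # scale between the two fidelities
    delta = mean_high - rho * mean_low   # constant offset (a GP in the real method)
    return rho, delta


def predict_high(low_values, rho, delta):
    """Estimate high-fidelity values from low-fidelity ones."""
    return [rho * value + delta for value in low_values]


# Toy relative energies (arbitrary units), not data from the study:
# a few structures have both cheap and expensive energies ...
low_train = [0.0, 1.0, 2.0]
high_train = [1.0, 3.0, 5.0]
rho, delta = fit_linear_correction(low_train, high_train)

# ... and the remaining structures are re-evaluated from the cheap energy alone.
estimated = predict_high([3.0], rho, delta)
```

Unlike this sketch, a Gaussian process correction also returns an uncertainty for each prediction, which is what allows the ranking of structures to be assessed probabilistically.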


2021 ◽  
Vol 7 (1) ◽  
Author(s):  
Gyoung S. Na ◽  
Seunghun Jang ◽  
Hyunju Chang

Abstract Dopants play an important role in synthesizing materials to improve target materials properties or stabilize the materials. In particular, dopants are essential for improving the thermoelectric performance of materials. However, existing machine learning methods cannot accurately predict the materials properties of doped materials because of the severely nonlinear relation between doped compositions and their materials properties. Here, we propose a unified architecture of neural networks, called DopNet, to accurately predict the materials properties of doped materials. DopNet identifies the effects of the dopants by explicitly and independently embedding the host materials and the dopants. In our evaluations, DopNet outperformed existing machine learning methods in predicting experimentally measured thermoelectric properties, with a mean absolute error of 0.06 in predicting the figure of merit (ZT). In particular, DopNet was significantly effective in an extrapolation problem that predicts ZTs of unknown materials, which is a key task in discovering novel thermoelectric materials.

