Materials Precursor Score: Modelling Chemists' Intuition for the Synthetic Accessibility of Porous Organic Cages

2021 ◽  
Author(s):  
Steven Bennett ◽  
Filip Szczypiński ◽  
Lukas Turcani ◽  
Michael Briggs ◽  
Rebecca L. Greenaway ◽  
...  

Computation is increasingly being used to accelerate the discovery of new materials. One example is porous molecular materials, specifically porous organic cages, where the porosity of the materials predominantly comes from the internal cavities of the molecules themselves. The computational discovery of novel structures with useful properties is currently hindered by the difficulty in transitioning from a computational prediction to synthetic realisation. Attempts at experimental validation are often time-consuming and expensive, and are frequently the key bottleneck of material discovery. In this work, we developed a computational screening workflow for porous molecules that includes consideration of the synthetic difficulty of material precursors, aimed at easing the transition between computational prediction and experimental realisation. We trained a machine learning model by first collecting data on 12,553 molecules categorised as either 'easy-to-synthesise' or 'difficult-to-synthesise' by expert chemists with years of experience in organic synthesis. We used an approach to address the class imbalance present in our dataset, producing a binary classifier able to categorise easy-to-synthesise molecules with few false positives. We then used our model during computational screening for porous organic molecules to bias towards precursors whose easier synthesis requirements would make them promising candidates for experimental realisation and material development. We found that even when limiting precursors to those that are easier to synthesise, we are still able to identify cages with favourable, and even some rare, properties.
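The screening step described in this abstract, biasing towards precursors classified as easy to synthesise, can be illustrated with a minimal sketch. The abstract does not give the workflow's implementation, so everything here is an assumption made for illustration: the `Cage` record, the `screen` helper, the 0.5 threshold, the cavity diameter used as the ranked property, and the made-up scores standing in for classifier output.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Cage:
    """Hypothetical record for a candidate cage (not from the paper)."""
    name: str
    precursors: Tuple[str, ...]   # identifiers of the building blocks
    cavity_diameter: float        # illustrative property, in angstroms


def screen(cages: List[Cage], p_easy: Dict[str, float],
           threshold: float = 0.5) -> List[Cage]:
    """Keep cages whose precursors all score at or above the threshold,
    then rank the survivors by the property of interest (largest first)."""
    kept = [c for c in cages
            if all(p_easy[p] >= threshold for p in c.precursors)]
    return sorted(kept, key=lambda c: c.cavity_diameter, reverse=True)


# Made-up "easy-to-synthesise" probabilities standing in for model output.
p_easy = {"amine_A": 0.92, "aldehyde_B": 0.81,
          "aldehyde_C": 0.12, "amine_D": 0.67}

cages = [
    Cage("cage_1", ("amine_A", "aldehyde_B"), 7.5),
    Cage("cage_2", ("amine_A", "aldehyde_C"), 9.1),  # one hard precursor
    Cage("cage_3", ("amine_D", "aldehyde_B"), 8.2),
]

shortlist = screen(cages, p_easy)
print([c.name for c in shortlist])  # ['cage_3', 'cage_1']
```

In this toy data, `cage_2` has the largest cavity but is dropped because one precursor scores below the threshold, which is the trade-off the abstract describes.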


2017 ◽  
Vol 121 (28) ◽  
pp. 15211-15222 ◽  
Author(s):  
Marcin Miklitz ◽  
Shan Jiang ◽  
Rob Clowes ◽  
Michael E. Briggs ◽  
Andrew I. Cooper ◽  
...  

2016 ◽  
Vol 7 (2) ◽  
pp. 43-71 ◽  
Author(s):  
Sangeeta Lal ◽  
Neetu Sardana ◽  
Ashish Sureka

Logging is an important yet difficult decision for OSS developers. Machine-learning models are useful in improving several steps of OSS development, including logging. Several recent studies propose machine-learning models to predict logged code constructs. The prediction performance of these models is limited by the class-imbalance problem, since the number of logged code constructs is small compared to the number of non-logged code constructs. No previous study analyzes the class-imbalance problem for logged code construct prediction. The authors first analyze the performance of J48, RF, and SVM classifiers for predicting logged catch-blocks and if-blocks on imbalanced datasets. Second, the authors propose LogIm, an ensemble and threshold-based machine-learning model. Third, the authors evaluate the performance of LogIm on three open-source projects. On average, the LogIm model improves the performance of the baseline classifiers J48, RF, and SVM by 7.38%, 9.24%, and 4.6% for catch-block logging prediction, and by 12.11%, 14.95%, and 19.13% for if-block logging prediction.
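The abstract does not describe how LogIm combines its ensemble and threshold components, so the following is only a generic sketch of the two ideas it names: averaging the scores of several base classifiers, then lowering the decision threshold so that more of the rare positive class is recovered. The function `ensemble_predict`, the score values, and both thresholds are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def ensemble_predict(scores_per_model: Sequence[Sequence[float]],
                     threshold: float) -> List[bool]:
    """Average the per-model scores for each sample, then flag a sample
    as positive when the averaged score reaches the threshold."""
    averaged = [mean(sample) for sample in zip(*scores_per_model)]
    return [score >= threshold for score in averaged]


# Made-up probabilities that each of four code blocks should be logged,
# one row per stand-in base classifier.
scores = [
    [0.10, 0.40, 0.70, 0.30],
    [0.20, 0.35, 0.60, 0.45],
    [0.05, 0.45, 0.80, 0.30],
]

default = ensemble_predict(scores, threshold=0.5)
lowered = ensemble_predict(scores, threshold=0.3)
print(default)  # [False, False, True, False]
print(lowered)  # [False, True, True, True]
```

Lowering the threshold trades precision for recall on the minority class; in practice the threshold would be chosen on held-out data rather than fixed by hand.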


2014 ◽  
Vol 5 (4) ◽  
pp. 1493-1505 ◽  
Author(s):  
Christopher Collins ◽  
Matthew S. Dyer ◽  
Antoine Demont ◽  
Philip A. Chater ◽  
Michael F. Thomas ◽  
...  

Computational screening of potential substitution species and sites in YBa2Fe3−xMxO8 predicted that Mn substitution at x = 1 should be possible. Experimental synthesis and characterization of Y1.175Ba1.825Fe2MnO8 confirm this prediction.


2020 ◽  
Author(s):  
Prashun Gorai ◽  
Alex Ganose ◽  
Alireza Faghaninia ◽  
Anubhav Jain ◽  
Vladan Stevanovic

Computational prediction of good thermoelectric (TE) performance in several n-type doped Zintl phases, combined with successful experimental realization, has sparked interest in discovering new n-type dopable members of this family of materials. However, most known Zintls are typically only p-type dopable; prior successes in finding n-type Zintl phases have been largely serendipitous. Here, we go beyond previously synthesized Zintl phases and perform chemical substitutions in known n-type dopable ABX Zintl phases to discover new ones. We use first-principles calculations to predict their stability, potential for TE performance as well as their n-type dopability. Using this approach, we find 17 new ABX Zintl phases in the KSnSb structure type that are predicted to be stable. Several of these newly predicted phases (KSnBi, RbSnBi, NaGeP) are predicted to exhibit promising n-type TE performance and are n-type dopable. We propose these compounds for further experimental studies, especially KSnBi and RbSnBi, which are both predicted to be good TE materials with high electron concentrations due to self-doping by native defects, when grown under alkali-rich conditions.


2021 ◽  
Author(s):  
Ruimin Ma ◽  
Hanfeng Zhang ◽  
Tengfei Luo

Developing amorphous polymers with desirable thermal conductivity has significant implications, as they are ubiquitous in applications where thermal transport is critical. Conventional Edisonian approaches are slow and offer no guarantee of success in material development. In this work, using a reinforcement learning scheme, we design polymers with thermal conductivity above 0.4 W/m-K. We leverage a machine learning model trained against 469 thermal conductivity values calculated from high-throughput molecular dynamics (MD) simulations as the surrogate for thermal conductivity prediction, and we use a recurrent neural network trained with around one million virtual polymer structures as a polymer generator. For all newly generated polymers with thermal conductivity > 0.400 W/m-K, we have evaluated their synthesizability by calculating the synthesis accessibility score and validated the thermal conductivity of selected polymers using MD simulations. The best thermally conductive polymer designed has an MD-calculated thermal conductivity of 0.693 W/m-K and is also estimated to be easily synthesizable. Our demonstrated inverse design scheme based on reinforcement learning may advance polymer development with target properties, and the scheme can also be generalized to other materials development tasks for different applications.


2022 ◽  
Vol 9 (1) ◽  
pp. 0-0

This article investigates the impact of data-complexity and team-specific characteristics on machine learning competition scores. Data from five real-world binary classification competitions hosted on Kaggle.com were analyzed. The data-complexity characteristics were measured in four aspects: standard measures, sparsity measures, class imbalance measures, and feature-based measures. The results showed that the higher the level of the data-complexity characteristics, the lower the predictive ability of the machine learning model. Our empirical evidence revealed that the imbalance ratio of the target variable was the most important factor and exhibited a nonlinear relationship with the model’s predictive abilities. The imbalance ratio adversely affected the predictive performance once it reached a certain level. However, mixed results were found for the impact of team-specific characteristics, measured by team size, team expertise, and the number of submissions, on team performance. For high-performing teams, these factors had no impact on team score.
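The abstract does not define its imbalance ratio. A common convention for a binary target, assumed here, is the number of majority-class samples divided by the number of minority-class samples. The helper `imbalance_ratio` and the example labels are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def imbalance_ratio(labels: Sequence[Hashable]) -> float:
    """Majority-class count divided by minority-class count
    for a binary target (assumed definition, not from the article)."""
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    return max(counts.values()) / min(counts.values())


# Made-up target with 90 negatives and 10 positives.
labels = [0] * 90 + [1] * 10
ratio = imbalance_ratio(labels)
print(ratio)  # 9.0
```

A ratio of 1.0 means perfectly balanced classes; larger values mean a rarer minority class.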


2020 ◽  
Author(s):  
Ryther Anderson ◽  
Diego Gómez-Gualdrón

Metal-organic frameworks (MOFs) have captivated the research community due to a modular crystal structure that is tailorable for many applications. However, with millions of possible MOFs to be considered, it is challenging to identify the ideal MOF for the application of choice. Although computational screening of MOF databases has provided a fast way to evaluate MOF properties, validation experiments on predicted “exceptional” MOFs are not common due to uncertainties about the synthetic likelihood of computationally constructed MOFs, hence hindering material discovery. Aiming to leverage the perspective provided by large datasets, here we created and screened a topologically diverse database of 8,500 MOFs to interrogate whether thermodynamic stability metrics such as free energy could be used to generally predict the synthetic likelihood of computationally constructed MOFs. To this end, we first evaluated the suitability of two methods and three force fields to calculate free energies in MOFs at large scale, settling on the Frenkel-Ladd path thermodynamic integration method and the UFF4MOF force field. Upon defining a relative free energy, Δ<sub>LM</sub>F<sub>FL</sub>, that corrects for some force field artifacts specific to MOF nodes, we found that previously synthesized MOFs tended to cluster in a region below Δ<sub>LM</sub>F<sub>FL</sub> = 4.4 kJ/mol per atom, suggesting a general first filter to discriminate between synthetically likely and unlikely MOFs. However, a second filter is needed when several MOF isomorphs are below the Δ<sub>LM</sub>F<sub>FL</sub> threshold. In 84% of the cases, the synthetically accessible MOF within an isomorphic series presented the lowest predicted free energy. The present work suggests that crystal free energies could be key to understanding synthetic likelihood for MOFs in computational databases (and MOFs in general), and that the thermodynamic stability of the fully assembled MOF often determines synthetic accessibility.
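The two filters described in this abstract can be sketched as follows. Only the 4.4 kJ/mol per atom threshold comes from the abstract; the function `likely_synthesisable`, the tuple layout, the MOF names, the series labels, and the energies are all made up for illustration. The second filter is a heuristic: according to the abstract, the lowest-energy isomorph matched the synthesised MOF in 84% of cases, not all.

```python
from typing import Dict, Iterable, Tuple

THRESHOLD = 4.4  # kJ/mol per atom, the value quoted in the abstract


def likely_synthesisable(mofs: Iterable[Tuple[str, str, float]],
                         threshold: float = THRESHOLD) -> Dict[str, str]:
    """Map each isomorphic series to its lowest-energy MOF, considering
    only MOFs whose relative free energy lies below the threshold."""
    best: Dict[str, Tuple[str, float]] = {}
    for name, series, energy in mofs:
        if energy >= threshold:
            continue  # first filter: drop MOFs at or above the threshold
        # second filter: keep the lowest-energy isomorph seen so far
        if series not in best or energy < best[series][1]:
            best[series] = (name, energy)
    return {series: name for series, (name, _) in best.items()}


# Made-up (name, isomorphic series, relative free energy) entries.
mofs = [
    ("mof_a", "series_1", 3.1),
    ("mof_b", "series_1", 2.4),
    ("mof_c", "series_1", 5.0),
    ("mof_d", "series_2", 4.9),
    ("mof_e", "series_3", 4.0),
]

picks = likely_synthesisable(mofs)
print(picks)  # {'series_1': 'mof_b', 'series_3': 'mof_e'}
```

Here `series_2` disappears entirely because its only member fails the first filter, while `series_1` needs the second filter to choose between two isomorphs that both pass.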

