Computational methods for training set selection and error assessment applied to catalyst design: guidelines for deciding which reactions to run first and which to run next

Author(s):  
Andrew F. Zahrt ◽  
Brennan T. Rose ◽  
William T. Darrow ◽  
Jeremy J. Henle ◽  
Scott E. Denmark

Different subset selection methods are examined to guide catalyst selection in optimization campaigns. Error assessment methods are used to quantitatively inform selection of new catalyst candidates from in silico libraries of catalyst structures.

2020 ◽  
Author(s):  
Scott Denmark ◽  
Andrew Zahrt ◽  
William Darrow ◽  
Brennan Rose ◽  
Jeremy Henle

The application of machine learning (ML) to problems in homogeneous catalysis has emerged as a promising avenue for catalyst optimization. An important aspect of such optimization campaigns is determining which reactions to run at the outset of experimentation and which future predictions are the most reliable. Herein, we explore methods for these two tasks in the context of our previously developed chemoinformatics workflow. First, different methods for training set selection are compared, including algorithmic selection and selection informed by unsupervised learning methods. Next, an array of different metrics for assessment of prediction confidence are examined in multiple catalyst manifolds. These approaches will inform future computer-guided studies to accelerate catalyst selection and reaction optimization. Finally, this work demonstrates the generality of the Average Steric Occupancy (ASO) and Average Electronic Indicator Field (AEIF) descriptors in their application to transition metal catalysts for the first time.
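The abstract above does not name the selection algorithms it compares. As a rough illustration of what "algorithmic" training set selection can look like, the sketch below uses a generic greedy farthest-point (max-min) rule over descriptor vectors. This is a swapped-in textbook technique, not the authors' workflow, and the function name `maxmin_select` and the two-dimensional descriptor values are hypothetical.

```python
import math


def maxmin_select(descriptors, k, start=0):
    """Greedy farthest-point selection of k candidates (illustrative only)."""
    n = len(descriptors)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of candidates")
    selected = [start]
    chosen = {start}
    # Distance from each candidate to its nearest already-selected candidate.
    nearest = [math.dist(d, descriptors[start]) for d in descriptors]
    while len(selected) < k:
        # Pick the unselected candidate farthest from everything chosen so far.
        nxt = max((i for i in range(n) if i not in chosen), key=lambda i: nearest[i])
        selected.append(nxt)
        chosen.add(nxt)
        for i, d in enumerate(descriptors):
            nearest[i] = min(nearest[i], math.dist(d, descriptors[nxt]))
    return selected


# Hypothetical 2-D descriptor vectors for six candidate catalysts.
library = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0), (2.5, 2.5)]
picked = maxmin_select(library, k=3)
print(picked)
```

A rule of this kind spreads the initial experiments across descriptor space instead of clustering them around near-identical structures; whether that is the best choice for a given campaign is the kind of question the paper investigates.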


2012 ◽  
Vol 2012 ◽  
pp. 1-7 ◽  
Author(s):  
Ignacio Fernández Anitzine ◽  
Juan Antonio Romo Argota ◽  
Fernando Pérez Fontán

This paper analyzes the use of artificial neural networks (ANNs) for predicting the received power/path loss in both outdoor and indoor links. The approach combines ANNs with ray-tracing, the latter allowing the identification and parameterization of the so-called dominant path. A complete description of the process for creating and training an ANN-based model is presented, with special emphasis on the training process. More specifically, we discuss various techniques for arriving at valid predictions, focusing on an optimum selection of the training set. A quantitative analysis based on results from two narrowband measurement campaigns, one outdoors and the other indoors, is also presented.


Author(s):  
Apurva Patel ◽  
Patrick Andrews ◽  
Joshua D. Summers

Artificial neural networks (ANNs) have been used to predict assembly time and market value from assembly models. This was done by converting the assembly models into bipartite graphs and extracting 29 graph complexity metrics, which were used to train the ANN prediction models. This paper presents the use of subassembly models instead of the entire assembly model to predict assembly quality defects at an automotive OEM. The size of the training set, order of the bipartite graph, selection of training set, and defect type were experimentally studied. With a training size of 28 parts, an interpolation-focused training set selection, and second-order graph seeding, over 70% of the predictions were within 100% of the target value. The study shows that with an increase in training size and careful selection of training sets, assembly defects can be predicted reliably from subassembly complexity data.


2004 ◽  
Vol 12 (2) ◽  
pp. 223-242 ◽  
Author(s):  
Christian W.G. Lasarczyk ◽  
Peter Dittrich ◽  
Wolfgang Banzhaf

A large training set of fitness cases can critically slow down genetic programming if no appropriate subset selection method is applied. Such a method allows an individual to be evaluated on a smaller subset of fitness cases. In this paper we suggest a new subset selection method that takes the problem structure into account while remaining problem independent. To achieve this, information about the problem structure is acquired during evolutionary search by creating a topology (relationship) on the set of fitness cases. The topology is induced by individuals of the evolving population: the strength of the relation between two fitness cases is increased if an individual of the population is able to solve both of them. Our new topology-based subset selection method chooses a subset such that the fitness cases in it are as distantly related as possible with respect to the induced topology. We compare topology-based selection of fitness cases with dynamic subset selection and stochastic subset sampling on four different problems. On average, runs with topology-based selection show faster progress than the others.
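The two steps described in the abstract above (strengthening the relation between fitness cases that one individual solves together, then choosing weakly related cases) can be sketched as follows. This is a minimal deterministic illustration under stated assumptions, not the authors' implementation: the greedy selection rule, the function names `update_topology` and `select_subset`, and the toy population are all hypothetical, and the published method may well select cases differently (for example, stochastically).

```python
from itertools import combinations


def update_topology(strength, solved_cases):
    """Strengthen the relation between every pair of cases one individual solved."""
    for a, b in combinations(sorted(solved_cases), 2):
        strength[a][b] += 1
        strength[b][a] += 1


def select_subset(strength, size):
    """Greedily pick cases that are weakly related to those already chosen."""
    n = len(strength)
    if not 0 < size <= n:
        raise ValueError("size must be between 1 and the number of fitness cases")
    # Seed with the case that has the weakest total relation to all others.
    chosen = [min(range(n), key=lambda i: sum(strength[i]))]
    while len(chosen) < size:
        remaining = [i for i in range(n) if i not in chosen]
        # Add the case whose summed relation strength to the chosen set is lowest.
        chosen.append(min(remaining, key=lambda i: sum(strength[i][j] for j in chosen)))
    return chosen


n_cases = 5
strength = [[0] * n_cases for _ in range(n_cases)]
# Hypothetical population: each set lists the fitness cases one individual solves.
population = [{0, 1}, {0, 1, 2}, {3, 4}, {0, 1}]
for solved in population:
    update_topology(strength, solved)

subset = select_subset(strength, size=3)
print(subset)
```

In this toy run, cases 0 and 1 are solved together by three individuals, so they end up strongly related and the selection avoids placing both in the same subset.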


Author(s):  
Apurva Patel ◽  
Patrick Andrews ◽  
Joshua D. Summers ◽  
Erin Harrison ◽  
Joerg Schulte ◽  
...  

This paper presents the use of subassembly models instead of the entire assembly model to predict assembly quality defects at an automotive original equipment manufacturer (OEM). Specifically, artificial neural networks (ANNs) were used to predict assembly time and market value from assembly models. These models were converted into bipartite graphs from which 29 graph complexity metrics were extracted to train 18,900 ANN prediction models. The size of the training set, order of the bipartite graph, selection of training set, and defect type were experimentally studied. With a training size of 28 parts, an interpolation-focused training set selection with second-order graph seeding ensured that 70% of all predictions were within 100% of the target value. The study shows that with an increase in training size and careful selection of training sets, assembly defects can be predicted reliably from subassembly complexity data.
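The abstract does not define how "within 100% of the target value" is computed. One plausible reading, shown here purely as an assumption, is the share of predictions whose absolute error is no larger than the target itself. The function name `fraction_within` and the sample numbers are hypothetical and are not data from the study.

```python
def fraction_within(predictions, targets, tolerance=1.0):
    """Share of predictions whose absolute error is at most tolerance * |target|."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    hits = sum(
        1 for p, t in zip(predictions, targets) if abs(p - t) <= tolerance * abs(t)
    )
    return hits / len(targets)


# Made-up defect counts, for illustration only.
predicted = [3.0, 10.0, 0.5, 9.0]
observed = [2.0, 4.0, 1.0, 4.0]
share = fraction_within(predicted, observed)
print(share)
```

Under this reading, a tolerance of 1.0 corresponds to the "within 100%" threshold quoted in the abstract, and tighter thresholds can be checked by lowering `tolerance`.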

