Multi-fidelity Sequential Learning for Accelerated Materials Discovery

2021 ◽  
Author(s):  
Aini Palizhati ◽  
Muratahan Aykol ◽  
Santosh Suram ◽  
Jens Strabo Hummelshøj ◽  
Joseph H. Montoya

We introduce a new agent-based framework for materials discovery that combines multi-fidelity modeling and sequential learning to reduce the number of expensive data acquisitions while maximizing discovery. We demonstrate the framework's capability by simulating a materials discovery campaign using experimental and DFT band gap data. Using these simulations, we determine how different machine learning models and acquisition strategies influence the overall rate of materials discovery per experiment. The framework demonstrates that including lower-fidelity (DFT) data, whether as a priori knowledge or through in-tandem acquisition, increases the discovery rate of materials suitable for solar photoabsorption. We also show that the performance of a given agent depends on data size, model selection, and acquisition strategy. As such, our framework provides a tool that enables materials scientists to test various acquisition and model hyperparameters to maximize the discovery rate of their own multi-fidelity sequential learning campaigns.
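To make the idea of a simulated campaign concrete, the following is a minimal sketch, not the authors' framework. It assumes a synthetic dataset (no real band gap data), a hypothetical target window of 1.0 to 1.8 eV, and the "a priori" scenario in which a cheap DFT-like value is known for every candidate before any experiment is run. The names `make_candidates`, `fit_line`, and `run_campaign` are illustrative only.

```python
import random


def make_candidates(n, seed=0):
    """Synthetic stand-in for a materials dataset (assumption: not real data)."""
    rng = random.Random(seed)
    candidates = []
    for _ in range(n):
        exp_gap = rng.uniform(0.0, 4.0)                # hidden high-fidelity value (eV)
        dft_gap = 0.6 * exp_gap + rng.gauss(0.0, 0.2)  # cheap, biased, noisy proxy
        candidates.append((dft_gap, exp_gap))
    return candidates


def fit_line(xs, ys):
    """Least-squares fit y = a*x + b; identity map until two distinct points exist."""
    if len(xs) < 2:
        return 1.0, 0.0
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        return 1.0, 0.0
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return a, mean_y - a * mean_x


def run_campaign(candidates, budget, use_dft, window=(1.0, 1.8), seed=1):
    """Run one simulated campaign and return the number of in-window discoveries."""
    rng = random.Random(seed)
    target = 0.5 * (window[0] + window[1])
    unmeasured = list(range(len(candidates)))
    seen_dft, seen_exp = [], []
    discoveries = 0
    for _ in range(min(budget, len(candidates))):
        if use_dft:
            # Correct the low-fidelity values with what the experiments have shown
            # so far, then greedily pick the candidate predicted closest to target.
            a, b = fit_line(seen_dft, seen_exp)
            pick = min(unmeasured,
                       key=lambda i: abs(a * candidates[i][0] + b - target))
        else:
            # Baseline agent: no low-fidelity information, random acquisition.
            pick = rng.choice(unmeasured)
        unmeasured.remove(pick)
        dft_gap, exp_gap = candidates[pick]  # "running the experiment"
        seen_dft.append(dft_gap)
        seen_exp.append(exp_gap)
        if window[0] <= exp_gap <= window[1]:
            discoveries += 1
    return discoveries


if __name__ == "__main__":
    pool = make_candidates(200)
    print("with DFT prior:", run_campaign(pool, 30, use_dft=True))
    print("random baseline:", run_campaign(pool, 30, use_dft=False))
```

Comparing the two counts for the same budget gives a crude discoveries-per-experiment figure. A fuller study would swap in other models and acquisition rules, which is the kind of comparison the abstract describes.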


2021 ◽  
Author(s):  
Hua-Liang Wei ◽  
Stephen A Billings

Since the outbreak of COVID-19, an astronomical number of publications on the pandemic dynamics have appeared in the literature, many of which use the susceptible-infected-removed (SIR) and susceptible-exposed-infected-removed (SEIR) models, or their variants, to simulate and study the spread of the coronavirus. SIR and SEIR are continuous-time models that belong to the class of initial value problems (IVPs) of ordinary differential equations (ODEs). Discrete-time models such as regression and machine learning have also been applied to analyze COVID-19 pandemic data (e.g., predicting infection cases), but most of these methods either use simplified models involving a small number of input variables pre-selected based on a priori knowledge, or use very complicated models (e.g., deep learning) that focus purely on specific prediction purposes and pay little attention to model interpretability. Relatively few studies have investigated the inherent time-lagged or time-delayed relationships, e.g., between the reproduction number (R number), infection cases, and deaths, or analyzed the pandemic spread from a systems thinking and dynamic perspective. The present study, for the first time, proposes using a systems engineering and system identification approach to build transparent, interpretable, parsimonious and simulatable (TIPS) dynamic machine learning models, establishing links between the R number, the infection cases and deaths caused by COVID-19. The TIPS models are developed based on the well-known NARMAX (Nonlinear AutoRegressive Moving Average with eXogenous inputs) model, which can help better understand the COVID-19 pandemic dynamics. A case study on the UK COVID-19 data is carried out, and new findings are detailed. The proposed method and the associated new findings are useful for better understanding the spread dynamics of the COVID-19 pandemic.
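As an illustration of what "SIR as an initial value problem" means in practice, the sketch below integrates the standard SIR equations (dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI) with a simple forward-Euler scheme. It is not the TIPS/NARMAX method proposed in the study. The function name `simulate_sir` and the parameter values are assumptions chosen only for demonstration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Forward-Euler integration of the basic SIR model.

    beta  : transmission rate per day (illustrative value, not fitted to data)
    gamma : removal rate per day (illustrative value, not fitted to data)
    Returns a list of (time, S, I, R) tuples.
    """
    n = s0 + i0 + r0                       # total population, constant over time
    s, i, r = float(s0), float(i0), float(r0)
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt  # flow S -> I during this step
        new_removals = gamma * i * dt           # flow I -> R during this step
        s -= new_infections
        i += new_infections - new_removals
        r += new_removals
        trajectory.append((k * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    run = simulate_sir(990, 10, 0, beta=0.3, gamma=0.1, days=100)
    peak_time, _, peak_infected, _ = max(run, key=lambda row: row[2])
    print(f"peak infections of about {peak_infected:.0f} near day {peak_time:.0f}")
```

The initial state (S, I, R at time zero) together with the two rates fully determines the trajectory, which is what makes this an initial value problem. A production analysis would normally use an adaptive ODE solver rather than a fixed Euler step.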


Author(s):  
Christoph Völker ◽  
Rafia Firdous ◽  
Dietmar Stephan ◽  
Sabine Kruschwitz

Alkali-activated binders (AAB) can provide a clean alternative to conventional cement in terms of CO2 emissions. However, as yet there are no sufficiently accurate material models to effectively predict the AAB properties, thus making optimal mix design highly costly and reducing the attractiveness of such binders. This work adopts sequential learning (SL) in high-dimensional material spaces (consisting of composition and processing data) to find AABs that exhibit desired properties. The SL approach combines machine learning models and feedback from real experiments. For this purpose, 131 data points were collected from different publications. The data sources are described in detail, and the differences between the binders are discussed. The sought-after target property is the compressive strength of the binders after 28 days. The success is benchmarked in terms of the number of experiments required to find materials with the desired strength. The influence of some constraints was systematically analyzed, e.g., the possibility to parallelize the experiments, the influence of the chosen algorithm and the size of the training data set. The results show the advantage of SL, i.e., the amount of data required can potentially be reduced by at least one order of magnitude compared to traditional machine learning models, while at the same time exploiting highly complex information. This brings applications in laboratory practice within reach.


2020 ◽  
Vol 2 (1) ◽  
pp. 3-6
Author(s):  
Eric Holloway

Imagination Sampling is the use of a person as an oracle for generating or improving machine learning models. Previous work demonstrated a general system for using Imagination Sampling to obtain multibox models. Here, the possibility of importing such models as the starting point for further automatic enhancement is explored.


2021 ◽  
Author(s):  
Norberto Sánchez-Cruz ◽  
Jose L. Medina-Franco

Epigenetic targets are a significant focus for drug discovery research, as demonstrated by the eight approved epigenetic drugs for treatment of cancer and the increasing availability of chemogenomic data related to epigenetics. These data represent a large number of structure-activity relationships that have not been exploited thus far for the development of predictive models to support medicinal chemistry efforts. Herein, we report the first large-scale study of 26318 compounds with a quantitative measure of biological activity for 55 protein targets with epigenetic activity. Through a systematic comparison of machine learning models trained on molecular fingerprints of different design, we built predictive models with high accuracy for the epigenetic target profiling of small molecules. The models were thoroughly validated, showing mean precisions up to 0.952 for the epigenetic target prediction task. Our results indicate that the models reported herein have considerable potential to identify small molecules with epigenetic activity. Therefore, our results were implemented as a freely accessible and easy-to-use web application.

