Number Density Descriptor on Extended-Connectivity Fingerprints Combined with Machine Learning Approaches for Predicting Polymer Properties

MRS Advances ◽  
2018 ◽  
Vol 3 (49) ◽  
pp. 2975-2980
Author(s):  
Takuya Minami ◽  
Yoshishige Okuno

We developed a new type of polymer descriptor based on Extended Connectivity Fingerprints. The descriptor uses number densities, that is, substructure counts divided by the number of atoms in the polymer model. We found that this approach predicts the properties of linear polymers more accurately than the conventional approach, in which the raw substructure counts are used as descriptors. In addition, dimension reduction and multiple replication of the repeat unit were found to improve prediction accuracy. As a result, the novel descriptor, combined with machine learning approaches, predicted the refractive indices of linear polymers with an accuracy comparable to that of ab initio density functional theory. Although process-dependent properties such as mechanical properties were difficult to predict, the present approach was found to be applicable to substructure-dependent properties such as optical properties and thermal stabilities.
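The number-density idea described above can be sketched in a few lines. This is only an illustration under stated assumptions: the substructure identifiers (`sub_A`, `sub_B`, `sub_C`), the counts, and the atom numbers are hypothetical, and the function `number_density` is not taken from the paper. In practice the counts would come from a fingerprinting tool applied to the polymer model.

```python
from collections import Counter


def number_density(substructure_counts, n_atoms):
    """Divide each substructure count by the number of atoms in the polymer model."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return {key: count / n_atoms for key, count in substructure_counts.items()}


# Hypothetical counts for a model built from one repeat unit (6 atoms).
monomer_counts = Counter({"sub_A": 2, "sub_B": 1})

# Hypothetical counts for a model built from three repeat units (18 atoms).
# "sub_C" stands for a substructure that only appears at the two junctions
# between neighbouring repeat units.
trimer_counts = Counter({"sub_A": 6, "sub_B": 3, "sub_C": 2})

monomer_density = number_density(monomer_counts, n_atoms=6)
trimer_density = number_density(trimer_counts, n_atoms=18)

# Raw counts grow with the size of the model, while the densities of the
# intra-unit substructures stay the same.
print(monomer_counts["sub_A"], trimer_counts["sub_A"])
print(monomer_density["sub_A"], trimer_density["sub_A"])
```

In this toy case the junction substructure is only visible in the replicated model, which is one plausible reason why replicating the repeat unit could help, although the abstract does not state the mechanism.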

2018 ◽  
Author(s):  
Sherif Tawfik ◽  
Olexandr Isayev ◽  
Catherine Stampfl ◽  
Joseph Shapter ◽  
David Winkler ◽  
...  

There are now, in principle, a limitless number of hybrid van der Waals heterostructures that can be built from the rapidly growing number of two-dimensional layers. The key question is how to explore this vast parameter space in a practical way. Computational methods can guide experimental work; however, even the most efficient electronic structure methods, such as density functional theory (DFT), are too time-consuming to explore more than a tiny fraction of all possible hybrid 2D materials. Here we demonstrate that a combination of DFT and machine learning techniques provides a practical method for exploring this parameter space much more efficiently than DFT or experiment alone. As a proof of concept, we applied this methodology to predict the interlayer distance and band gap of bilayer heterostructures. Our methods quickly and accurately predicted these important properties for a large number of hybrid 2D materials. This work paves the way for rapid computational screening of the vast parameter space of van der Waals heterostructures to identify new hybrid materials with useful and interesting properties.


Author(s):  
Elif Ertekin ◽  
Joshua A. Schiller

It is challenging to evaluate machine learning approaches developed for accelerating materials search and discovery in a realistic way. Machine learning approaches to materials stability prediction are typically assessed by their ability to reproduce results from direct physical modeling, whereas ideally both machine learning and direct physical modeling should be assessed by their ability to reproduce reality. Additionally, traditional evaluation metrics do not directly reflect the experience of an experimental search for unknown compounds in a large candidate phase space, and often result in overly optimistic assessments. Here, we (i) present a framework that combines density functional theory and traditional supervised machine learning methods (ML/DFT), and (ii) introduce the concepts of search completeness (the fraction of discoverable compounds found relative to the fraction of search space explored) and search efficiency (the rate of discovery relative to the fraction of search space explored) to evaluate it. The ML/DFT framework is an iterative approach to predict stable chemistries of a fixed crystal structure (here, spinels) that uses DFT to generate a training set of unstable compounds. The training set of stable compounds is given by experimentally known spinels. The method is carried out using random forest, LASSO, and ridge regression to predict as-yet-undiscovered spinel chemistries. TreeSHAP analysis is used to determine the features that contribute most to stability/instability classification. While no single feature dominates, several emerge that align with chemical intuition. To estimate the efficacy of ML/DFT compared to pure DFT, we introduce a Bayesian description of the DFT distribution of energies for stable and unstable spinels. The Bayesian model enables quantifying the search completeness and search efficiency of DFT, which are then compared to those of ML/DFT.
ML/DFT achieves search completeness and efficiency on par with pure DFT, despite requiring fewer DFT simulations (∼300 vs. 14,200). More importantly, by quantitatively assessing ML approaches in ways that better reflect how they would be used in materials discovery experiments, we obtain key insights into the challenges that need to be overcome by such methods: namely, that the small number of stable compounds to be found in a search space orders of magnitude larger places stringent demands on model accuracy to achieve good search efficiency. Finally, we report the top candidates of our spinel search, which may be of interest for synthesis experiments.
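The two evaluation concepts can be illustrated with a small sketch. The abstract does not give exact formulas, so the definitions below are one possible reading and should be treated as assumptions: completeness is taken as the share of all stable compounds found after exploring the top-ranked candidates, and efficiency as the share of explored candidates that turned out to be stable. The function name `search_metrics` and the toy ranking are hypothetical.

```python
def search_metrics(ranked_is_stable, n_explored):
    """Return (fraction_explored, completeness, efficiency) for a ranked search.

    ranked_is_stable: list of booleans, ordered from the most to the least
    promising candidate according to some model (assumed, not from the paper).
    n_explored: how many candidates from the top of the ranking were examined.
    """
    n_total = len(ranked_is_stable)
    if not 0 < n_explored <= n_total:
        raise ValueError("n_explored must be between 1 and the number of candidates")
    n_stable = sum(ranked_is_stable)
    found = sum(ranked_is_stable[:n_explored])
    fraction_explored = n_explored / n_total
    # Assumed reading: share of all discoverable (stable) compounds found so far.
    completeness = found / n_stable if n_stable else 0.0
    # Assumed reading: discoveries per candidate examined.
    efficiency = found / n_explored
    return fraction_explored, completeness, efficiency


# Toy ranking of 10 candidates, 3 of which are stable.
ranking = [True, True, False, False, True, False, False, False, False, False]
fraction, completeness, efficiency = search_metrics(ranking, n_explored=4)
print(fraction, completeness, efficiency)
```

With few stable compounds in a large space, even a small drop in ranking quality lowers the efficiency sharply, which is consistent with the abstract's point about stringent demands on model accuracy.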


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Jina Kim ◽  
Yeonju Jang ◽  
Kunwoo Bae ◽  
Soyoung Oh ◽  
Nam Jeong Jeong ◽  
...  

Purpose: Understanding customers' revisiting behavior is a prominent topic in the service industry, and the emergence of online communities has enabled customers to share their prior experiences. The purpose of this study is therefore to investigate customers' reviews on an online hotel reservation platform and to explore their post-behaviors from those reviews.
Design/methodology/approach: The authors employ two different approaches and compare their accuracy in predicting customers' post-behavior: (1) using several machine learning classifiers based on sentiment dimensions of customers' reviews and (2) conducting an experiment consisting of two subsections. In the experiment, the first subsection asks participants to predict whether customers who wrote reviews would visit the hotel again (referred to as Prediction), while the second subsection examines whether participants want to visit a particular hotel after reading other customers' reviews (referred to as Decision).
Findings: The accuracy of the machine learning approaches (73.23%) is higher than that of the experimental approach (Prediction: 58.96% and Decision: 64.79%). The key reasons for users' predictions and decisions are identified through qualitative analyses.
Originality/value: The findings reveal that machine learning approaches achieve higher accuracy in predicting customers' repeat visits using only the employed sentiment features. With the novel approach of integrating customers' decision processes and machine learning classifiers, the authors provide valuable insights for researchers and providers of hospitality services.


Entropy ◽  
2022 ◽  
Vol 24 (1) ◽  
pp. 107
Author(s):  
Santosh Manicka ◽  
Michael Levin

What information-processing strategies and general principles are sufficient to enable self-organized morphogenesis in embryogenesis and regeneration? We designed and analyzed a minimal model of self-scaling axial patterning consisting of a cellular network that develops activity patterns within implicitly set bounds. The properties of the cells are determined by internal ‘genetic’ networks with an architecture shared across all cells. We used machine-learning to identify models that enable this virtual mini-embryo to pattern a typical axial gradient while simultaneously sensing the set boundaries within which to develop it from homogeneous conditions—a setting that captures the essence of early embryogenesis. Interestingly, the model revealed several features (such as planar polarity and regenerative re-scaling capacity) for which it was not directly selected, showing how these common biological design principles can emerge as a consequence of simple patterning modes. A novel “causal network” analysis of the best model furthermore revealed that the originally symmetric model dynamically integrates into intercellular causal networks characterized by broken-symmetry, long-range influence and modularity, offering an interpretable macroscale-circuit-based explanation for phenotypic patterning. This work shows how computation could occur in biological development and how machine learning approaches can generate hypotheses and deepen our understanding of how featureless tissues might develop sophisticated patterns—an essential step towards predictive control of morphogenesis in regenerative medicine or synthetic bioengineering contexts. The tools developed here also have the potential to benefit machine learning via new forms of backpropagation and by leveraging the novel distributed self-representation mechanisms to improve robustness and generalization.

