Can Machine Learning Find Extraordinary Materials?

Author(s):  
Steven Kauwe ◽  
Jake Graser ◽  
Ryan Murdock ◽  
Taylor Sparks

One of the most common criticisms of machine learning is an assumed inability for models to extrapolate, i.e. to identify extraordinary materials with properties beyond those present in the training data set. To test this assumption, this work uses properties calculated with density functional theory (bulk modulus, shear modulus, thermal conductivity, thermal expansion, band gap and Debye temperature) to examine whether machine learning can predict materials with properties that extend beyond previously seen values. We refer to these materials as extraordinary, meaning they represent the top 1% of values in the available data set. Interestingly, we show that even when a model is trained on only a fraction of the bottom 99%, it can consistently identify 3/4 of the highest-performing compositions for all considered properties, with a precision that is typically above 0.5. Moreover, we investigate a few different modeling choices and demonstrate that a classification approach can identify an equivalent number of extraordinary compounds with significantly fewer false positives than a regression approach. Finally, we discuss cautions and potential limitations in implementing such an approach to discover new record-breaking materials.
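
The evaluation described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `top_fraction_threshold` and `precision_recall`, the toy property values, and the set of flagged compounds are all hypothetical. The sketch labels the top 1% of values as extraordinary and scores a set of model-flagged candidates by precision and recall.

```python
from typing import Sequence, Tuple


def top_fraction_threshold(values: Sequence[float], fraction: float = 0.01) -> float:
    """Return the smallest value that still falls in the top `fraction` of `values`."""
    ranked = sorted(values, reverse=True)
    n_top = max(1, round(len(ranked) * fraction))
    return ranked[n_top - 1]


def precision_recall(
    true_values: Sequence[float], flagged: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Score flagged candidates against the 'extraordinary' label (value >= threshold)."""
    is_extraordinary = [v >= threshold for v in true_values]
    tp = sum(1 for e, f in zip(is_extraordinary, flagged) if e and f)
    fp = sum(1 for e, f in zip(is_extraordinary, flagged) if not e and f)
    fn = sum(1 for e, f in zip(is_extraordinary, flagged) if e and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data (hypothetical): 400 property values, so the top 1% is 4 compounds.
values = [float(v) for v in range(1, 401)]
threshold = top_fraction_threshold(values)

# Hypothetical model output: three true top-1% compounds and two ordinary ones are flagged.
flagged_values = {398.0, 399.0, 400.0, 100.0, 200.0}
flagged = [v in flagged_values for v in values]

precision, recall = precision_recall(values, flagged, threshold)
print(threshold, precision, recall)
```

With these toy numbers the precision is 0.6 and the recall is 0.75. The figures are chosen only to illustrate the two metrics and are not results from the paper.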


2019 ◽  
Author(s):  
Mihail Bogojeski ◽  
Leslie Vogt-Maranto ◽  
Mark E. Tuckerman ◽  
Klaus-Robert Mueller ◽  
Kieron Burke

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal/mol with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal/mol) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting "on the fly" DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Mihail Bogojeski ◽  
Leslie Vogt-Maranto ◽  
Mark E. Tuckerman ◽  
Klaus-Robert Müller ◽  
Kieron Burke

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal·mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal·mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
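
The idea of Δ-learning, fitting only the correction to a DFT energy rather than the full coupled-cluster energy, can be illustrated with a small sketch. This is not the method of the paper, which learns from DFT densities. Here a single scalar descriptor and an ordinary least-squares line stand in for the density representation and the regression model, and all names (`fit_line`, `fit_delta_model`, `predict_cc`) and energies are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_delta_model(
    descriptors: Sequence[float], e_dft: Sequence[float], e_cc: Sequence[float]
) -> Tuple[float, float]:
    """Fit the correction E_CC - E_DFT, not the total energy."""
    deltas = [cc - dft for cc, dft in zip(e_cc, e_dft)]
    return fit_line(descriptors, deltas)


def predict_cc(descriptor: float, e_dft: float, model: Tuple[float, float]) -> float:
    """Predicted coupled-cluster energy = DFT energy + learned correction."""
    slope, intercept = model
    return e_dft + slope * descriptor + intercept


# Toy training set (hypothetical numbers, arbitrary energy units).
descriptors = [0.0, 1.0, 2.0, 3.0]
e_dft = [-10.0, -11.0, -12.0, -13.0]
e_cc = [-9.0, -9.5, -10.0, -10.5]

model = fit_delta_model(descriptors, e_dft, e_cc)
print(predict_cc(4.0, -14.0, model))
```

The design point is that the model only has to capture the difference between the two levels of theory, which is the reason the abstract gives for the reduced training-data requirement.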


Author(s):  
Ritu Khandelwal ◽  
Hemlata Goyal ◽  
Rajveer Singh Shekhawat

Introduction: Machine learning acts as a bridge between business and data science: data science techniques are used to extract valuable insights from available data in support of business goals. Bollywood, a multi-million-dollar industry, makes up a large part of Indian cinema. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop by applying machine learning techniques (classification and prediction). Building a classifier or prediction model starts with a learning stage, in which a training data set is supplied to a technique or algorithm; the rules generated during training form the model, which can then be used to predict future trends in different types of organizations.
Methods: Classification and prediction techniques such as Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost, and KNN will be applied to find efficient and effective results. All of these can be applied through GUI-based workflows, available in categories such as Data, Visualize, Model, and Evaluate.
Result: The models are built in a learning stage, in which the training data set is used to train each technique or algorithm, and the rules generated form a model that predicts future trends in different types of organizations.
Conclusion: This paper focuses on a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best possible model for predicting movie success. Using the predicted success rate, production houses can plan their advertising and choose the best time to release a movie to gain higher benefits.
Discussion: Data mining is the process of discovering patterns and relationships in large data sets in order to solve business problems and predict forthcoming trends. Such predictions can help production houses plan their advertising and their costs, and thereby make a movie more profitable.


2019 ◽  
Vol 9 (6) ◽  
pp. 1128 ◽  
Author(s):  
Yundong Li ◽  
Wei Hu ◽  
Han Dong ◽  
Xueyan Zhang

Aerial cameras, satellite remote sensing, or unmanned aerial vehicles (UAVs) equipped with cameras can facilitate search and rescue tasks after disasters. The traditional manual interpretation of huge aerial images is inefficient and could be replaced by machine learning-based methods combined with image processing techniques. With the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Some target detection methods based on deep learning, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the impressive performance of machine learning-based methods depends on large numbers of labeled samples. Given the complexity of post-disaster scenarios, obtaining many samples in the aftermath of disasters is difficult. To address this issue, the current study proposes a damaged building assessment method using SSD with pretraining and data augmentation, and highlights the following aspects. (1) Objects can be detected and classified into undamaged buildings, damaged buildings, and ruins. (2) A convolution auto-encoder (CAE) that consists of VGG16 is constructed and trained using unlabeled post-disaster images. As a transfer learning strategy, the weights of the SSD model are initialized using the weights of the CAE counterpart. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur, and Gaussian noise processing, are utilized to augment the training data set. As a case study, aerial images of Hurricane Sandy in 2012 were used to validate the proposed method’s effectiveness. Experiments show that the pretraining strategy yields an improvement of 10% in overall accuracy compared with the SSD trained from scratch. These experiments also demonstrate that using data augmentation strategies can improve mAP and mF1 by 72% and 20%, respectively. Finally, the experiment is further verified on another dataset, from Hurricane Irma, and it is concluded that the proposed method is feasible.
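
Three of the augmentation strategies named in this abstract (mirroring, rotation, and Gaussian noise) can be sketched on a small grayscale image stored as nested lists. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `sigma` value, and the toy image are hypothetical, and Gaussian blur is omitted for brevity.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0-255 pixel values


def mirror(img: Image) -> Image:
    """Flip the image left to right."""
    return [row[::-1] for row in img]


def rotate90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def add_gaussian_noise(img: Image, sigma: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise to every pixel and clamp to the 0-255 range."""
    return [
        [min(255, max(0, round(p + rng.gauss(0.0, sigma)))) for p in row]
        for row in img
    ]


def augment(img: Image, sigma: float = 5.0, seed: int = 0) -> List[Image]:
    """Return the original image plus one mirrored, one rotated and one noisy copy."""
    rng = random.Random(seed)
    return [img, mirror(img), rotate90(img), add_gaussian_noise(img, sigma, rng)]


# Toy 2x3 image (hypothetical pixel values).
tile = [[0, 50, 100], [150, 200, 250]]
variants = augment(tile)
print(len(variants))
```

For an object detector such as SSD, the bounding-box labels would need the same geometric transform as the image, a step this sketch leaves out.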


2021 ◽  
Vol 13 (1) ◽  
Author(s):  
Carl E. Belle ◽  
Vural Aksakalli ◽  
Salvy P. Russo

For photovoltaic materials, properties such as band gap $E_{g}$ are critical indicators of the material’s suitability to perform a desired function. Calculating $E_{g}$ is often performed using Density Functional Theory (DFT) methods, although more accurate calculations are performed using methods such as the GW approximation. DFT software often used to compute electronic properties includes applications such as VASP, CRYSTAL, CASTEP or Quantum Espresso. Depending on the unit cell size and symmetry of the material, these calculations can be computationally expensive. In this study, we present a new machine learning platform for the accurate prediction of properties such as $E_{g}$ of a wide range of materials.


2021 ◽  
Author(s):  
Eva van der Kooij ◽  
Marc Schleiss ◽  
Riccardo Taormina ◽  
Francesco Fioranelli ◽  
Dorien Lugt ◽  
...  

Accurate short-term forecasts, also known as nowcasts, of heavy precipitation are desirable for creating early warning systems for extreme weather and its consequences, e.g. urban flooding. In this research, we explore the use of machine learning for short-term prediction of heavy rainfall showers in the Netherlands.
We assess the performance of a recurrent, convolutional neural network (TrajGRU) with lead times of 0 to 2 hours. The network is trained on a 13-year archive of radar images with 5-min temporal and 1-km spatial resolution from the precipitation radars of the Royal Netherlands Meteorological Institute (KNMI). We aim to train the model to predict the formation and dissipation of dynamic, heavy, localized rain events, a task for which traditional Lagrangian nowcasting methods still come up short.
We report on different ways to optimize predictive performance for heavy rainfall intensities through several experiments. The large dataset available provides many possible configurations for training. To focus on heavy rainfall intensities, we use different subsets of this dataset by applying different conditions for event selection, by varying the ratio of light and heavy precipitation events present in the training data set, and by changing the loss function used to train the model.
To assess the performance of the model, we compare our method to current state-of-the-art Lagrangian nowcasting systems from the pySTEPS library, such as S-PROG, a deterministic approximation of an ensemble mean forecast. The results of the experiments are used to discuss the pros and cons of machine-learning based methods for precipitation nowcasting and possible ways to further increase performance.
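
One way to vary the ratio of light and heavy precipitation events in a training set is to keep every heavy event and subsample the light ones. The sketch below illustrates that idea only. It is an assumption about how such a subset could be built, not the selection procedure used in this work, and the names `rebalance`, `heavy_threshold` and `heavy_fraction`, as well as the toy intensities, are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Event = Tuple[str, float]  # (event id, peak rainfall intensity)


def rebalance(
    events: Sequence[Event],
    heavy_threshold: float,
    heavy_fraction: float,
    seed: int = 0,
) -> List[Event]:
    """Keep all heavy events and subsample light ones to reach `heavy_fraction`."""
    if not 0.0 < heavy_fraction <= 1.0:
        raise ValueError("heavy_fraction must be in (0, 1]")
    heavy = [e for e in events if e[1] >= heavy_threshold]
    light = [e for e in events if e[1] < heavy_threshold]
    # Number of light events needed so that heavy events make up the target share.
    wanted_light = round(len(heavy) * (1.0 - heavy_fraction) / heavy_fraction)
    n_light = min(len(light), wanted_light)
    rng = random.Random(seed)
    return heavy + rng.sample(light, n_light)


# Toy archive (hypothetical): 100 events with intensities 0..99, heavy means >= 90.
archive = [(f"event-{i}", float(i)) for i in range(100)]
balanced = rebalance(archive, heavy_threshold=90.0, heavy_fraction=0.5)
print(len(balanced))
```

Changing `heavy_fraction` gives training subsets with different light-to-heavy ratios, which is the kind of configuration the experiments in the abstract compare.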

