Can Machine Learning Find Extraordinary Materials?

10.26434/chemrxiv.9396623.v1 · 2019 · Cited By ~ 1

Author(s): Steven Kauwe, Jake Graser, Ryan Murdock, Taylor Sparks

Keyword(s): Machine Learning, Thermal Conductivity, New Record, Density Functional, Training Data, Equivalent Amount, Data Set, Functional Theory, Classification Approach, Regression Approach

One of the most common criticisms of machine learning is an assumed inability of models to extrapolate, i.e. to identify extraordinary materials with properties beyond those present in the training data set. To test whether this is indeed the case, this work uses properties calculated with density functional theory (bulk modulus, shear modulus, thermal conductivity, thermal expansion, band gap and Debye temperature) to investigate whether machine learning can truly predict materials with properties that extend beyond previously seen values. We refer to these materials as extraordinary, meaning that they fall in the top 1% of values in the available data set. Interestingly, we show that even when the model is trained on only a fraction of the bottom 99%, we can consistently identify 3/4 of the highest-performing compositions for all considered properties, with a precision that is typically above 0.5. Moreover, we investigate a few different modeling choices and demonstrate that a classification approach can identify an equivalent number of extraordinary compounds with significantly fewer false positives than a regression approach. Finally, we discuss cautions and potential limitations in implementing such an approach to discover new record-breaking materials.
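The evaluation setup described above (hold out the top 1% of a property, train only on the remainder, then score how many held-out compounds a model flags) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names are hypothetical, the toy values stand in for a DFT-calculated property, and the `flagged` list stands in for the output of whatever classifier or regressor is being evaluated.

```python
def extraordinary_threshold(values, top_fraction=0.01):
    """Return the smallest value that lies in the top `top_fraction` of the data."""
    if not 0 < top_fraction < 1:
        raise ValueError("top_fraction must be in (0, 1)")
    ranked = sorted(values)
    n_top = max(1, round(len(ranked) * top_fraction))  # at least one extraordinary entry
    return ranked[-n_top]


def split_by_threshold(values, threshold):
    """Separate training data (ordinary) from the held-out extraordinary values."""
    ordinary = [v for v in values if v < threshold]
    extraordinary = [v for v in values if v >= threshold]
    return ordinary, extraordinary


def precision_recall(truth, flagged):
    """Precision and recall of boolean 'is extraordinary' flags."""
    pairs = list(zip(truth, flagged))
    tp = sum(1 for t, f in pairs if t and f)
    fp = sum(1 for t, f in pairs if not t and f)
    fn = sum(1 for t, f in pairs if t and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    values = list(range(1, 201))            # toy property values (hypothetical)
    thr = extraordinary_threshold(values)   # top 1% of 200 values -> 2 entries
    train, held_out = split_by_threshold(values, thr)
    truth = [v >= thr for v in values]
    flagged = [v >= 198 for v in values]    # stand-in for a model's predictions
    print(thr, held_out, precision_recall(truth, flagged))
```

In this framing, false positives lower precision while missed extraordinary compounds lower recall, which is the trade-off the abstract uses to compare the classification and regression approaches.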

Quantum chemical accuracy from density functional approximations via machine learning

10.26434/chemrxiv.8079917 · 2019

Author(s): Mihail Bogojeski, Leslie Vogt-Maranto, Mark E. Tuckerman, Klaus-Robert Mueller, Kieron Burke

Keyword(s): Machine Learning, Quantum Chemical, Density Functional, MD Simulations, Training Data, Coupled Cluster, Functional Theory, Computational Costs, Standard Tool, Cluster Accuracy

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal/mol with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal/mol) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting "on the fly" DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.

Quantum chemical accuracy from density functional approximations via machine learning

Nature Communications · 10.1038/s41467-020-19093-1 · 2020 · Vol 11 (1) · Cited By ~ 2

Author(s): Mihail Bogojeski, Leslie Vogt-Maranto, Mark E. Tuckerman, Klaus-Robert Müller, Kieron Burke

Keyword(s): Machine Learning, Quantum Chemical, Density Functional, MD Simulations, Training Data, Coupled Cluster, Functional Theory, Computational Costs, Standard Tool, Cluster Accuracy

Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal·mol⁻¹ with presently-available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal·mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
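The Δ-learning idea (fit only the correction between a cheap baseline and an expensive reference, then add it back to the baseline) can be sketched with a small kernel ridge regression in NumPy. This is a hedged illustration, not the authors' method: the paper learns from DFT densities, whereas here a generic 1-D descriptor, a synthetic baseline `sin(3x)` and a synthetic smooth correction stand in for the DFT energy and the coupled-cluster correction; `DeltaKRR`, `length_scale` and `reg` are hypothetical names chosen for this sketch.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and b."""
    sq_dist = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale**2))


class DeltaKRR:
    """Kernel ridge regression on the correction e_target - e_baseline."""

    def __init__(self, length_scale=1.0, reg=1e-6):
        self.length_scale = length_scale
        self.reg = reg

    def fit(self, x, e_baseline, e_target):
        self.x_train = np.asarray(x, dtype=float)
        delta = np.asarray(e_target, dtype=float) - np.asarray(e_baseline, dtype=float)
        k = rbf_kernel(self.x_train, self.x_train, self.length_scale)
        # Solve (K + reg * I) alpha = delta for the dual coefficients.
        self.alpha = np.linalg.solve(k + self.reg * np.eye(len(k)), delta)
        return self

    def predict(self, x, e_baseline):
        k = rbf_kernel(np.asarray(x, dtype=float), self.x_train, self.length_scale)
        # Prediction = cheap baseline + learned correction.
        return np.asarray(e_baseline, dtype=float) + k @ self.alpha


if __name__ == "__main__":
    def cheap(x):  # synthetic stand-in for the DFT energy
        return np.sin(3.0 * x[:, 0])

    def exact(x):  # synthetic stand-in for the coupled-cluster energy
        return cheap(x) + 0.1 * x[:, 0] ** 2

    x_train = np.linspace(0.0, 3.0, 20).reshape(-1, 1)
    model = DeltaKRR(length_scale=0.5).fit(x_train, cheap(x_train), exact(x_train))
    x_test = np.linspace(0.1, 2.9, 7).reshape(-1, 1)
    err = np.max(np.abs(model.predict(x_test, cheap(x_test)) - exact(x_test)))
    print(f"max abs error of baseline + correction: {err:.2e}")
```

Because the model only has to learn the smooth difference rather than the full energy surface, it can get by with fewer training points, which is the intuition behind the reduced training-data requirement reported above.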

Lattice thermal conductivity of half-Heuslers with density functional theory and machine learning: Enhancing predictivity by active sampling with principal component analysis

Computational Materials Science · 10.1016/j.commatsci.2021.110938 · 2022 · Vol 202 · pp. 110938

Author(s): Rasmus Tranås, Ole Martin Løvvik, Oliver Tomic, Kristian Berland

Keyword(s): Machine Learning, Thermal Conductivity, Density Functional Theory, Principal Component Analysis, Density Functional, Lattice Thermal Conductivity, Principal Component, Component Analysis, Active Sampling, Functional Theory

Comparative Analysis of Machine Learning Techniques Using Predictive Modeling

Recent Advances in Computer Science and Communications · 10.2174/2666255813999200904164539 · 2020 · Vol 13

Author(s): Ritu Khandelwal, Hemlata Goyal, Rajveer Singh Shekhawat

Keyword(s): Machine Learning, Comparative Analysis, Data Science, Training Data, Machine Learning Techniques, Future Trends, Data Set, Learning Stage, Learning Techniques, Different Types

Introduction: Machine learning is an intelligent technology that acts as a bridge between business and data science. With data science, the business goal is to extract valuable insights from the available data. Bollywood, which makes up a large part of Indian cinema, is a multi-million-dollar industry. This paper attempts to predict whether an upcoming Bollywood movie will be a Blockbuster, Superhit, Hit, Average or Flop by applying machine learning techniques (classification and prediction). The first step in building a classifier or prediction model is the learning stage, in which a training data set is used to train the model with some technique or algorithm; the rules generated in this stage form the model, which can then predict future trends in different types of organizations. Methods: Classification and prediction techniques, namely Support Vector Machine (SVM), Random Forest, Decision Tree, Naïve Bayes, Logistic Regression, AdaBoost and KNN, are applied in order to find efficient and effective results. All of these can be run through GUI-based workflows organized into categories such as Data, Visualize, Model and Evaluate. Conclusion: This paper presents a comparative analysis based on parameters such as accuracy and the confusion matrix to identify the best model for predicting movie success. Using the predicted success rate, production houses can plan their advertising and choose the best release time to increase profits.
Discussion: Data mining is the process of discovering patterns in large data sets, together with the relationships among them, in order to solve business problems and predict forthcoming trends. Such predictions can help production houses plan their advertising and their costs, and thereby make a movie more profitable.
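The comparison described above (rank several classifiers by accuracy, and use the confusion matrix to see which success categories they confuse) can be sketched in plain Python. The five class labels come from the abstract; the model names and predictions in the usage example are made up for illustration and do not reproduce the paper's results, its classifiers, or its GUI-based workflow.

```python
CLASSES = ["Blockbuster", "Superhit", "Hit", "Average", "Flop"]


def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def compare_models(y_true, predictions):
    """Score each model's predictions and return (best model name, all scores)."""
    scores = {
        name: accuracy(confusion_matrix(y_true, y_pred))
        for name, y_pred in predictions.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    y_true = ["Hit", "Flop", "Flop", "Average", "Blockbuster"]
    predictions = {  # made-up outputs of two hypothetical classifiers
        "model_a": ["Hit", "Flop", "Average", "Average", "Superhit"],
        "model_b": ["Hit", "Flop", "Flop", "Average", "Hit"],
    }
    best, scores = compare_models(y_true, predictions)
    print(best, scores)
    for row in confusion_matrix(y_true, predictions[best]):
        print(row)
```

Accuracy alone hides which categories are confused; the off-diagonal cells of the confusion matrix show, for example, whether a model tends to under-rate Blockbusters as Hits.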

Building Damage Detection from Post-Event Aerial Imagery Using Single Shot Multibox Detector

Applied Sciences · 10.3390/app9061128 · 2019 · Vol 9 (6) · pp. 1128 · Cited By ~ 12

Author(s): Yundong Li, Wei Hu, Han Dong, Xueyan Zhang

Keyword(s): Machine Learning, Data Augmentation, Hurricane Sandy, Training Data, Aerial Images, Detection Methods, Single Shot, Data Set, Augmentation Strategies, Post Disaster

Aerial cameras, satellite remote sensing and camera-equipped unmanned aerial vehicles (UAVs) can facilitate search and rescue tasks after disasters. Manual interpretation of large volumes of aerial imagery is inefficient and could be replaced by machine-learning-based methods combined with image processing techniques. With the development of machine learning, researchers have found that convolutional neural networks can effectively extract features from images. Deep-learning-based object detection methods, such as the single-shot multibox detector (SSD) algorithm, can achieve better results than traditional methods. However, the strong performance of machine-learning-based methods depends on large numbers of labeled samples, and given the complexity of post-disaster scenes, such samples are difficult to obtain in the aftermath of a disaster. To address this issue, this study proposes a damaged-building assessment method using SSD with pretraining and data augmentation, with the following features. (1) Objects are detected and classified as undamaged buildings, damaged buildings or ruins. (2) A convolutional auto-encoder (CAE) based on VGG16 is constructed and trained on unlabeled post-disaster images; as a transfer learning strategy, the weights of the SSD model are initialized from the corresponding CAE weights. (3) Data augmentation strategies, such as image mirroring, rotation, Gaussian blur and Gaussian noise, are used to augment the training data set. As a case study, aerial images of Hurricane Sandy (2012) were used to validate the effectiveness of the proposed method. Experiments show that the pretraining strategy improves overall accuracy by 10% compared with SSD trained from scratch, and that the data augmentation strategies improve mAP and mF1 by 72% and 20%, respectively.
Finally, the method is further verified on another data set from Hurricane Irma, and the results indicate that the proposed method is feasible.
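The four augmentation strategies listed above (mirroring, rotation, Gaussian blur, Gaussian noise) can be sketched with NumPy. For object detection, geometric transforms must move the bounding boxes along with the pixels, so the sketch updates boxes as well. The following are assumptions, not details from the paper: boxes are `(xmin, ymin, xmax, ymax)` in pixel coordinates, rotation is by 90 degrees, the blur `sigma` and noise `std` values are illustrative, and all function names are hypothetical.

```python
import numpy as np


def mirror(image, boxes):
    """Flip left-right; boxes are (xmin, ymin, xmax, ymax) in pixels."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    new_boxes = [(w - x1, y0, w - x0, y1) for x0, y0, x1, y1 in boxes]
    return flipped, new_boxes


def rotate90(image, boxes):
    """Rotate 90 degrees counter-clockwise, following np.rot90's convention."""
    w = image.shape[1]
    rotated = np.rot90(image).copy()
    # A pixel at (row=y, col=x) moves to (row=w-1-x, col=y).
    new_boxes = [(y0, w - x1, y1, w - x0) for x0, y0, x1, y1 in boxes]
    return rotated, new_boxes


def gaussian_blur(image, sigma=1.0):
    """Separable Gaussian blur over rows and columns (zero padding at borders).

    Assumes each image side is at least 2 * radius + 1 pixels long.
    """
    radius = max(1, int(3 * sigma))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t**2) / (2.0 * sigma**2))
    kernel /= kernel.sum()
    out = image.astype(float)
    for axis in (0, 1):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, out
        )
    return out


def gaussian_noise(image, std=5.0, rng=None):
    """Add zero-mean Gaussian noise and clip to the 8-bit range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = image.astype(float) + rng.normal(0.0, std, size=image.shape)
    return np.clip(noisy, 0.0, 255.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64, 3)).astype(float)
    boxes = [(10, 20, 30, 40)]
    augmented = [
        mirror(image, boxes),
        rotate90(image, boxes),
        (gaussian_blur(image, sigma=1.0), boxes),  # photometric: boxes unchanged
        (gaussian_noise(image, std=5.0, rng=rng), boxes),
    ]
    print([img.shape for img, _ in augmented])
```

Blur and noise are photometric changes, so the boxes pass through unchanged; only mirroring and rotation need the coordinate updates.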

Predicting Density Functional Theory-Quality Nuclear Magnetic Resonance Chemical Shifts via Δ-Machine Learning

Journal of Chemical Theory and Computation · 10.1021/acs.jctc.0c00979 · 2021 · Vol 17 (2) · pp. 826-840

Author(s): Pablo A. Unzueta, Chandler S. Greenwell, Gregory J. O. Beran

Keyword(s): Machine Learning, Density Functional Theory, Nuclear Magnetic Resonance, Magnetic Resonance, Density Functional, Chemical Shifts, Functional Theory, Nuclear Magnetic

A machine learning platform for the discovery of materials

Journal of Cheminformatics · 10.1186/s13321-021-00518-y · 2021 · Vol 13 (1)

Author(s): Carl E. Belle, Vural Aksakalli, Salvy P. Russo

Keyword(s): Machine Learning, Density Functional, GW Approximation, DFT Methods, Functional Theory, Unit Cell Size, Learning Platform, Photovoltaic Materials, Wide Range, Quantum Espresso

For photovoltaic materials, properties such as the band gap $E_g$ are critical indicators of a material's suitability to perform a desired function. $E_g$ is often calculated using density functional theory (DFT) methods, although more accurate calculations are performed using methods such as the GW approximation. DFT software often used to compute electronic properties includes VASP, CRYSTAL, CASTEP and Quantum Espresso. Depending on the unit cell size and symmetry of the material, these calculations can be computationally expensive. In this study, we present a new machine learning platform for the accurate prediction of properties such as $E_g$ for a wide range of materials.

A density-functional-theory-based and machine-learning-accelerated hybrid method for intricate system catalysis

Materials Reports: Energy · 10.1016/j.matre.2021.100046 · 2021 · pp. 100046

Author(s): Xuhao Wan, Zhaofu Zhang, Wei Yu, Yuzheng Guo

Keyword(s): Machine Learning, Density Functional Theory, Hybrid Method, Density Functional, Functional Theory

Nowcasting heavy precipitation over the Netherlands using a 13-year radar archive: a machine learning approach

10.5194/egusphere-egu21-12814 · 2021

Author(s): Eva van der Kooij, Marc Schleiss, Riccardo Taormina, Francesco Fioranelli, Dorien Lugt, ...

Keyword(s): Machine Learning, The Netherlands, Heavy Rainfall, Predictive Performance, Heavy Precipitation, Early Warning Systems, Training Data, Short Term, Data Set, Radar Images

Accurate short-term forecasts of heavy precipitation, also known as nowcasts, are desirable for creating early warning systems for extreme weather and its consequences, e.g. urban flooding. In this research, we explore the use of machine learning for short-term prediction of heavy rain showers in the Netherlands. We assess the performance of a recurrent convolutional neural network (TrajGRU) at lead times of 0 to 2 hours. The network is trained on a 13-year archive of radar images with 5-min temporal and 1-km spatial resolution from the precipitation radars of the Royal Netherlands Meteorological Institute (KNMI). We aim to train the model to predict the formation and dissipation of dynamic, heavy, localized rain events, a task for which traditional Lagrangian nowcasting methods still fall short. We report on several experiments to optimize predictive performance for heavy rainfall intensities. The large dataset allows many possible training configurations. To focus on heavy rainfall intensities, we select different subsets of the dataset by applying different event-selection conditions and varying the ratio of light to heavy precipitation events in the training data set, and we change the loss function used to train the model. To assess the performance of the model, we compare our method with a current state-of-the-art Lagrangian nowcasting system from the pySTEPS library, S-PROG, a deterministic approximation of an ensemble mean forecast. The results of the experiments are used to discuss the pros and cons of machine-learning-based methods for precipitation nowcasting and possible ways to further increase performance.
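The two training levers mentioned above, changing the ratio of light to heavy events in the training set and changing the loss function, can be sketched as follows. This is an illustration under stated assumptions, not the authors' configuration: the event records, the heavy-rain predicate, and the rain-rate thresholds (mm/h) and weights of the intensity-weighted MSE are all hypothetical values chosen for the example.

```python
import random

import numpy as np


def sample_training_events(events, is_heavy, heavy_fraction, n, seed=0):
    """Draw n events so that round(n * heavy_fraction) of them are heavy."""
    rng = random.Random(seed)
    heavy = [e for e in events if is_heavy(e)]
    light = [e for e in events if not is_heavy(e)]
    n_heavy = round(n * heavy_fraction)
    n_light = n - n_heavy
    if n_heavy > len(heavy) or n_light > len(light):
        raise ValueError("not enough events for the requested ratio")
    chosen = rng.sample(heavy, n_heavy) + rng.sample(light, n_light)
    rng.shuffle(chosen)
    return chosen


def weighted_mse(pred, target, thresholds=(1.0, 5.0, 10.0), weights=(1.0, 2.0, 5.0, 10.0)):
    """MSE in which each pixel is weighted by the rain-rate bin of the target.

    np.digitize maps rates below thresholds[0] to bin 0 and rates at or above
    thresholds[-1] to bin len(thresholds), so len(weights) == len(thresholds) + 1.
    """
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    w = np.asarray(weights, dtype=float)[np.digitize(target, thresholds)]
    return float(np.mean(w * (pred - target) ** 2))


if __name__ == "__main__":
    # Hypothetical event records: one peak rain rate (mm/h) per event.
    events = [{"id": i, "max_rate": float(i)} for i in range(20)]
    subset = sample_training_events(
        events, lambda e: e["max_rate"] >= 10.0, heavy_fraction=0.3, n=10
    )
    print(sorted(e["id"] for e in subset))
    print(weighted_mse([1.5, 10.0], [0.5, 12.0]))
```

With these weights, an error on a heavy-rain pixel costs ten times as much as the same error on a near-dry pixel, which pushes training toward the high intensities that an unweighted MSE tends to smooth away.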
