Targeting Productive Composition Space Through Machine-Learning Directed Inorganic Synthesis

2020 ◽  
Author(s):  
Sogol Lotfi ◽  
Ziyan Zhang ◽  
Gayatri Viswanathan ◽  
Kaitlyn Fortenberry ◽  
Aria Mansouri Tehrani ◽  
...  

This work presents an approach to aid the discovery of novel inorganic solids by highlighting regions of underexplored, yet likely productive composition space using machine learning. A support vector regression (SVR) algorithm was constructed first to determine a compound’s formation energy (Δ<i>E</i><sub>f,SVR</sub>) based solely on chemical composition using data from 313,965 high-throughput density functional theory calculations. The resulting predicted formation energies (r<sup>2</sup> = 0.94; MAE = 85 meV/atom) were then used to construct zero-kelvin convex hull diagrams and identify compositions immediately on the hull, as well as those +50 meV above the convex hull, to capture potential compounds that are considered energetically unfavorable but that are still experimentally accessible. Using this methodology, four ternary composition diagrams, Y−Ag−<i>Tr</i> (<i>Tr</i> = B, Al, Ga, In), were explored owing to the diversity of chemistries as a function of triel element to provide experimental validation for the predictions. A particularly promising but unexplored region in the Y−Ag−In diagram was identified, and the ensuing solid-state high-temperature synthesis produced YAg<sub>0.65</sub>In<sub>1.35</sub>, which has not been reported. First-principles calculations were finally used to determine the ordering of Ag and In as well as confirm the crystal structure solution. Our combination of machine learning, inorganic synthesis, and computational modeling describes a new avenue where data-centric models and computation play a critical role in supporting the experimental examination of unexplored phase diagrams.
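The hull-screening step described in this abstract can be illustrated with a minimal sketch. It assumes a binary A-B system rather than the ternary diagrams studied in the paper, uses invented formation energies in eV/atom, and every function and variable name is hypothetical. It builds the lower convex hull of (composition, formation energy) points and keeps the compositions that lie on the hull or no more than 50 meV/atom above it.

```python
# Illustrative only: binary A-B system with invented formation energies (eV/atom).
# The paper screens ternary diagrams using SVR-predicted energies; this sketch
# only shows the "on the hull or within 50 meV of it" selection idea.

def lower_hull(points):
    """Lower convex hull of (x, energy) points (Andrew's monotone chain)."""
    best = {}
    for x, e in points:  # keep only the lowest energy at each composition
        best[x] = min(e, best.get(x, e))
    hull = []
    for p in sorted(best.items()):
        # Drop the previous vertex while it lies on or above the line to p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross > 0:
                break
            hull.pop()
        hull.append(p)
    return hull


def hull_energy(hull, x):
    """Hull energy at composition x, linearly interpolated between vertices."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("composition lies outside the hull range")


def screen(candidates, window_ev=0.050):
    """Return {name: energy above hull} for entries within the window."""
    hull = lower_hull(candidates.values())
    kept = {}
    for name, (x, e) in candidates.items():
        above = e - hull_energy(hull, x)
        if above <= window_ev + 1e-12:  # small tolerance for float rounding
            kept[name] = above
    return kept


# name -> (fraction of B, formation energy in eV/atom); all values invented
candidates = {
    "A": (0.0, 0.0),
    "A3B": (0.25, -0.10),
    "AB": (0.5, -0.40),
    "AB3": (0.75, -0.17),
    "B": (1.0, 0.0),
}
accessible = screen(candidates)
for name, above in accessible.items():
    print(f"{name}: {above * 1000:.0f} meV/atom above the hull")
```

In this toy data set AB3 sits about 30 meV/atom above the hull and is kept, while A3B sits about 100 meV/atom above it and is discarded. A ternary version would need a three-dimensional hull routine, which this sketch deliberately leaves out.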


2019 ◽  
Author(s):  
Seoin Back ◽  
Kevin Tran ◽  
Zachary Ulissi

Developing active and stable oxygen evolution catalysts is key to enabling various future energy technologies, and the state-of-the-art catalysts are Ir-containing oxide materials. Understanding oxygen chemistry on oxide materials is significantly more complicated than studying transition metal catalysts for two reasons: the most stable surface coverage under reaction conditions is extremely important but difficult to understand without many detailed calculations, and there are many possible active sites and configurations on O* or OH* covered surfaces. We have developed an automated and high-throughput approach to solve this problem and predict OER overpotentials for arbitrary oxide surfaces. We demonstrate this for a number of previously unstudied IrO<sub>2</sub> and IrO<sub>3</sub> polymorphs and their facets. We discovered that low-index surfaces of IrO<sub>2</sub> other than rutile (110) are more active than the most stable rutile (110), and we identified promising active sites of IrO<sub>2</sub> and IrO<sub>3</sub> that outperform rutile (110) by 0.2 V in theoretical overpotential. Based on findings from DFT calculations, we provide catalyst design strategies to improve the catalytic activity of Ir-based catalysts and demonstrate a machine learning model capable of predicting surface coverages and site activity. This work highlights the importance of investigating unexplored chemical space to design promising catalysts.
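For readers unfamiliar with the term, a theoretical OER overpotential is commonly computed from the free-energy changes of the four proton-electron transfer steps. The abstract does not state which convention the authors used, so the sketch below assumes the widely used four-step picture (OH*, O*, OOH*, O<sub>2</sub>) and an equilibrium potential of 1.23 V; the step energies are invented and the function name is hypothetical.

```python
# Illustrative only: invented step free energies (eV) for the four
# proton-electron transfer steps of the oxygen evolution reaction.
EQUILIBRIUM_POTENTIAL_V = 1.23  # O2/H2O equilibrium potential


def theoretical_overpotential(step_free_energies_ev):
    """Largest step free energy (eV per electron) minus 1.23 V."""
    if len(step_free_energies_ev) != 4:
        raise ValueError("expected four reaction-step free energies")
    # One electron is transferred per step, so eV maps directly onto volts.
    return max(step_free_energies_ev) - EQUILIBRIUM_POTENTIAL_V


steps = [1.60, 1.50, 1.30, 0.52]  # sums to 4.92 eV = 4 x 1.23 eV
eta = theoretical_overpotential(steps)
print(f"theoretical overpotential: {eta:.2f} V")
```

Under this convention an ideal catalyst would have all four steps at 1.23 eV and an overpotential of zero, so a 0.2 V improvement corresponds to lowering the largest step by 0.2 eV.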


2021 ◽  
Author(s):  
Sebastian Johannes Fritsch ◽  
Konstantin Sharafutdinov ◽  
Moein Einollahzadeh Samadi ◽  
Gernot Marx ◽  
Andreas Schuppert ◽  
...  

BACKGROUND: During the course of the COVID-19 pandemic, a variety of machine learning models were developed to predict different aspects of the disease, such as long-term causes, organ dysfunction or ICU mortality. The number of training datasets used has increased significantly over time. However, these data now come from different waves of the pandemic, with therapeutic approaches that were not always the same over time and outcomes that changed between waves. The impact of these changes on model development has not yet been studied.

OBJECTIVE: The aim of the investigation was to examine the predictive performance of several models trained on data from one wave when predicting the other wave's data, and the impact of pooling these datasets. Finally, a method for comparing different datasets for heterogeneity is introduced.

METHODS: We used two datasets, from waves one and two, to develop several predictive models for patient mortality. Four classification algorithms were used: logistic regression (LR), support vector machine (SVM), random forest classifier (RF) and AdaBoost classifier (ADA). We also performed a mutual prediction on the data of the wave that was not used for training. Then, we compared the performance of models when a pooled dataset from the two waves was used. The populations from the different waves were checked for heterogeneity using a convex hull analysis.

RESULTS: 63 patients from wave one (03-06/2020) and 54 from wave two (08/2020-01/2021) were evaluated. For both waves separately, we found models reaching sufficient accuracies, up to 0.79 AUROC (95%-CI 0.76-0.81) for SVM on the first wave and up to 0.88 AUROC (95%-CI 0.86-0.89) for RF on the second wave. After the pooling of the data, the AUROC decreased appreciably. In the mutual prediction, models trained on the second wave's data showed, when applied to the first wave's data, a good prediction for non-survivors but an insufficient classification for survivors. The opposite situation (training: first wave, test: second wave) revealed the inverse behaviour, with models correctly classifying survivors and incorrectly predicting non-survivors. The convex hull analysis for the first and second wave populations showed a more inhomogeneous distribution of underlying data when compared to randomly selected sets of patients of the same size.

CONCLUSIONS: Our work demonstrates that a larger dataset is not a universal solution to all machine learning problems in clinical settings. Rather, it shows that inhomogeneous data used to develop models can lead to serious problems. With the convex hull analysis, we offer a solution for this problem. The outcome of such an analysis can indicate whether pooling different datasets would introduce inhomogeneous patterns that prevent a better predictive performance.
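The AUROC figures quoted in this abstract can be read as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor. The following sketch computes AUROC directly from that pairwise definition; the labels and scores are invented stand-ins for a model trained on one wave and applied to the other, and the function name is hypothetical.

```python
# Illustrative only: invented labels (1 = non-survivor) and model scores.

def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores from a model trained on wave one, evaluated on wave two.
wave_two_labels = [1, 1, 0, 0, 1, 0]
wave_two_scores = [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]
cross_wave_auroc = auroc(wave_two_labels, wave_two_scores)
print(f"cross-wave AUROC: {cross_wave_auroc:.3f}")
```

Here eight of the nine positive-negative pairs are ordered correctly. With cohorts as small as those in the study, each misordered pair moves the value noticeably, which is one reason confidence intervals are reported alongside it.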


Author(s):  
Yuta Maeda ◽  
Yoshiko Yamanaka ◽  
Takeo Ito ◽  
Shinichiro Horikawa

Summary: We propose a new algorithm, focusing on spatial amplitude patterns, to automatically detect volcano seismic events from continuous waveforms. Candidate seismic events are detected based on signal-to-noise ratios. The algorithm then uses supervised machine learning to classify the candidate events into true and false categories. The input learning data are the ratios of the number of time samples with amplitudes greater than the background noise level at 1 s intervals (large amplitude ratios) given at every station site, and a manual classification table in which ‘true’ or ‘false’ flags are assigned to candidate events. A two-step approach is implemented in our procedure. First, using the large amplitude ratios at all stations, a neural network model representing a continuous spatial distribution of large amplitude probabilities is investigated at 1 s intervals. Second, several features are extracted from these spatial distributions, and a relation between the features and the classification into true and false events is learned by a support vector machine. This two-step approach is essential to account for temporal loss of data, or station installation, movement, or removal. We evaluated the algorithm using data from Mt. Ontake, Japan, during the first ten days of a dense observation trial in the summit region (November 1–10, 2017). Results showed a classification accuracy of more than 97 per cent.
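The "large amplitude ratio" input feature lends itself to a short sketch. The code below assumes a single station, an integer sampling rate, and a known background noise level, none of which are specified in the abstract; the function name and the sample values are hypothetical.

```python
# Illustrative only: invented waveform samples from one hypothetical station.

def large_amplitude_ratios(samples, sampling_rate_hz, noise_level):
    """Fraction of samples per 1 s window whose |amplitude| exceeds the noise.

    A trailing window shorter than one second is ignored.
    """
    window = int(sampling_rate_hz)
    ratios = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        exceeding = sum(1 for a in chunk if abs(a) > noise_level)
        ratios.append(exceeding / window)
    return ratios


# 4 Hz toy signal: a quiet second, an active second, then an incomplete one.
waveform = [0.1, -0.2, 0.1, 0.0, 0.9, -1.2, 0.3, 0.1, 0.2, 0.1]
ratios = large_amplitude_ratios(waveform, sampling_rate_hz=4, noise_level=0.5)
print(ratios)
```

In the described algorithm, one such ratio per station and per second would feed the neural network that models the spatial distribution of large amplitude probabilities.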


2015 ◽  
Vol 25 (09n10) ◽  
pp. 1699-1702 ◽  
Author(s):  
Theresia Ratih Dewi Saputri ◽  
Seok-Won Lee

National happiness has been actively studied in recent years. The factors behind happiness vary with differing human perspectives. The factors used in this work cover both the physical and the mental needs of humanity, for example, the educational factor. This work identified more than 90 features that can be used to predict a country's happiness. With so many features, it is unwise to rely on manual analysis to predict national happiness. Therefore, this work used a machine learning technique called Support Vector Machine (SVM) to learn and predict country happiness. To improve the prediction accuracy, a dimensionality reduction technique, information gain, was also used. This technique was chosen for its ability to explore the interrelationships among a set of variables. Using data on 187 countries from the UN Development Project, this work is able to identify which factors a given country needs to improve to increase the happiness of its citizens.
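Information gain, used in this work to reduce the number of features, measures how much knowing a feature reduces the entropy of the target label. The sketch below ranks two invented categorical features against an invented happiness label; the feature names, values, and function names are hypothetical and are not taken from the study's data.

```python
# Illustrative only: invented categorical features for four fictional countries.
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


happiness = ["high", "high", "low", "low"]
features = {
    "education": ["good", "good", "poor", "poor"],  # separates the classes
    "hemisphere": ["north", "south", "north", "south"],  # carries no signal
}
gains = {name: information_gain(vals, happiness) for name, vals in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked, gains)
```

Features with the lowest gain would be dropped before training the SVM. Continuous indicators would first need to be binned, a step this sketch does not cover.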


2021 ◽  
Author(s):  
Zheng Cheng ◽  
Jiahui Du ◽  
Lei Zhang ◽  
Jing Ma ◽  
Wei Li ◽  
...  

Molecular dynamics (MD) simulation plays an essential role in understanding protein functions at the atomic level. At present, MD simulations of proteins are mainly based on classical force fields. However, the accuracy of classical force fields for proteins is still insufficient for accurate descriptions of their structures and dynamical properties. Here we present a novel protocol to construct a machine learning force field (MLFF) for a given protein with full quantum mechanics (QM) accuracy. In this protocol, the energy of the target system is obtained by fitting the energies of its various subsystems constructed with the generalized energy-based fragmentation (GEBF) approach. To facilitate the construction of MLFFs for various proteins, a protein data library is created to store all data of subsystems generated from trained proteins. With this library, only the subsystems of a new protein that have new topological types are required for the construction of the corresponding MLFF. This protocol is illustrated with two polypeptides, 4ZNN and a 1XQ8 segment, as examples. The energies and forces predicted from this MLFF are in good agreement with those from density functional theory calculations, and dihedral angle distributions from GEBF-MLFF MD simulations also reproduce well those from <i>ab initio</i> MD simulations. Therefore, this GEBF-ML protocol is expected to be an efficient and systematic way to build force fields for proteins and other biological systems with QM accuracy.


Drones ◽  
2020 ◽  
Vol 4 (3) ◽  
pp. 45 ◽  
Author(s):  
Maria Angela Musci ◽  
Luigi Mazzara ◽  
Andrea Maria Lingua

Aircraft ground de-icing operations play a critical role in flight safety. However, aircraft de-icing commonly consumes a considerable quantity of de-icing fluid. Moreover, some pre-flight inspections are carried out with engines running; thus, a large amount of fuel is wasted, and CO<sub>2</sub> is emitted. This implies substantial economic and environmental impacts. In this context, the European project SEI (Spectral Evidence of Ice) (reference call: MANUNET III 2018, project code: MNET18/ICT-3438) aims to provide innovative tools to identify ice on aircraft and improve the efficiency of the de-icing process. The project includes the design of a low-cost UAV (uncrewed aerial vehicle) platform and the development of a quasi-real-time ice detection methodology to ensure a faster and semi-automatic activity with a reduction of operating time and de-icing fluids. The purpose of this work, developed within the activities of the project, is to define and test the most suitable sensor using a radiometric approach and machine learning algorithms. The adopted methodology consists of classifying ice through spectral imagery collected by two different sensors: a multispectral and a hyperspectral camera. Since the UAV prototype is under construction, the experimental analysis was performed with a simulation dataset acquired on the ground. A comparison between the two approaches, and their related image-processing algorithms (random forest and support vector machine), is presented: practical results show that it is possible to identify the ice in both cases. Nonetheless, the hyperspectral camera provides a more reliable solution, reaching a higher level of accuracy in classifying iced surfaces.


2021 ◽  
Author(s):  
Tomoya Inoue ◽  
Yujin Nakagawa ◽  
Ryota Wada ◽  
Keisuke Miyoshi ◽  
Shungo Abe ◽  
...  

Abstract: The early detection of a stuck pipe during drilling operations is challenging and crucial. Some studies on stuck detection have adopted supervised machine learning approaches with ordinal support vector machines or neural networks using datasets for “stuck” and “normal”. However, for early detection before sticking occurs, the application of ordinal supervised machine learning raises several concerns, such as limited stuck data, the lack of an exact “stuck sign” before it occurs, and the various mechanisms involved in pipe sticking. This study acquires surface drilling data from various wells belonging to several agencies, examines the effectiveness of multiple learning models, and discusses the possibility of detecting pipe sticking before it occurs. Unsupervised machine learning using data on normal activities is a possible advanced method for early stuck detection, and it is adopted in this study. In addition, to address a further concern, namely that even normal activities involve various operations, we apply unsupervised learning with multiple learning models.
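The idea of learning only from normal operation and flagging departures from it can be sketched with a deliberately simple detector. The abstract does not name the unsupervised models used, so the per-channel z-score below is a stand-in technique chosen for brevity, not the authors' method; the channel meanings, values, threshold, and function names are all hypothetical.

```python
# Illustrative only: a z-score detector fitted on invented "normal" drilling
# data with two hypothetical channels (e.g. hook load and torque).
from statistics import mean, stdev


def fit_normal_model(normal_rows):
    """Per-channel (mean, standard deviation) estimated from normal data only.

    Assumes every channel varies, so no standard deviation is zero.
    """
    return [(mean(col), stdev(col)) for col in zip(*normal_rows)]


def anomaly_score(model, row):
    """Largest absolute z-score across channels for one observation."""
    return max(abs(value - mu) / sigma for value, (mu, sigma) in zip(row, model))


normal_rows = [
    (100.0, 10.0),
    (102.0, 11.0),
    (98.0, 9.0),
    (101.0, 10.5),
    (99.0, 9.5),
]
model = fit_normal_model(normal_rows)

THRESHOLD = 3.0  # hypothetical alert level
normal_like = (100.5, 10.2)
suspicious = (115.0, 16.0)
for row in (normal_like, suspicious):
    score = anomaly_score(model, row)
    print(row, f"score={score:.2f}", "ALERT" if score > THRESHOLD else "ok")
```

Because normal drilling covers several distinct operations, one model fitted per operation type would be closer in spirit to the multiple-model approach the abstract mentions.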

