Prediction of Thermal Properties of Zeolites through Machine Learning

Author(s):  
Maxime Ducamp ◽  
François-Xavier Coudert

The use of machine learning to predict the physical and chemical properties of crystals from their structure alone is currently an area of intense research in computational materials science. In this work, we studied the possibility of using machine-learning models to predict the thermal properties of siliceous zeolite frameworks. As training data, we used the thermal properties of 120 zeolites calculated at the DFT level within the quasi-harmonic approximation. We compared the statistical accuracy of trained models (based on gradient boosting regression) using different types of descriptors, including ad hoc geometrical features, topology, pore space, and general geometric descriptors. While geometric descriptors were found to perform best, we also identified limitations on the accuracy of the predictions, especially for a small group of materials with strongly negative thermal expansion coefficients. We then studied the generalizability of the technique, demonstrating that the predictions are not sensitive to the refinement of framework structures at a high level of theory. The models are therefore suitable for the exploration and screening of large-scale databases of hypothetical frameworks, which we illustrate on the PCOD2 database of zeolites, containing around 600,000 hypothetical structures.
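A minimal sketch of the descriptor-based gradient boosting regression workflow described above, assuming a feature matrix of geometric descriptors and DFT-derived thermal properties as targets; the placeholder data, feature count, and hyperparameters are illustrative, not the study's actual descriptors or settings.

```python
# Minimal sketch: gradient-boosted regression of a thermal property from
# geometric zeolite descriptors (placeholder data, illustrative feature count).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))                  # ~120 frameworks x 6 geometric descriptors
y = rng.normal(loc=-5.0, scale=3.0, size=120)  # e.g. thermal expansion coefficient (1e-6/K)

model = GradientBoostingRegressor(n_estimators=500, learning_rate=0.05, max_depth=3)

# Cross-validated error is the kind of statistical accuracy compared between
# descriptor sets in the study.
mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("cross-validated MAE:", mae.mean())

model.fit(X, y)
print("descriptor importances:", model.feature_importances_)
```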


2021 ◽  
Author(s):  
Maxime Ducamp ◽  
François-Xavier Coudert

The use of machine learning to predict the physical and chemical properties of crystals from their structure alone is currently an area of intense research in computational materials science. In this work, we studied the possibility of using machine-learning models to predict the thermal properties of siliceous zeolite frameworks. As training data, we used the thermal properties of 134 zeolites calculated at the DFT level within the quasi-harmonic approximation. We compared the statistical accuracy of trained models (based on gradient boosting regression) using different types of descriptors, including ad hoc geometrical features, topology, pore space, and general geometric descriptors. While geometric descriptors were found to perform best, we also identified limitations on the accuracy of the predictions, especially for a small group of materials with strongly negative thermal expansion coefficients. We then studied the generalizability of the technique, demonstrating that the predictions are not sensitive to the refinement of framework structures at a high level of theory. The models are therefore suitable for the exploration and screening of large-scale databases of hypothetical frameworks, which we illustrate on the PCOD2 database of zeolites, containing around 600,000 hypothetical structures.


2019 ◽  
Vol 10 (1) ◽  
Author(s):  
K. T. Schütt ◽  
M. Gastegger ◽  
A. Tkatchenko ◽  
K.-R. Müller ◽  
R. J. Maurer

Machine learning advances chemistry and materials science by enabling large-scale exploration of chemical space based on quantum chemical calculations. While these models supply fast and accurate predictions of atomistic chemical properties, they do not explicitly capture the electronic degrees of freedom of a molecule, which limits their applicability for reactive chemistry and chemical analysis. Here we present a deep learning framework for the prediction of the quantum mechanical wavefunction in a local basis of atomic orbitals, from which all other ground-state properties can be derived. This approach retains full access to the electronic structure via the wavefunction at force-field-like efficiency and captures quantum mechanics in an analytically differentiable representation. On several examples, we demonstrate that this opens promising avenues for the inverse design of molecular structures targeting electronic property optimisation, and a clear path towards increased synergy between machine learning and quantum chemistry.
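To make the idea concrete, a toy sketch follows: a small network predicts a symmetric Hamiltonian-like matrix in a local atomic-orbital basis from a structural descriptor, and diagonalization then yields orbital energies and coefficients. This is an illustration of the concept only; the basis size, descriptor, and architecture are placeholders, not the framework presented in the paper.

```python
# Toy sketch: predict a symmetric Hamiltonian-like matrix in a small local
# atomic-orbital basis from a structural descriptor, then diagonalize it to
# obtain orbital energies and coefficients. All sizes are placeholders.
import torch
import torch.nn as nn

N_BASIS = 10        # assumed size of the local AO basis
N_DESCRIPTOR = 32   # assumed size of the structural descriptor

class HamiltonianNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(N_DESCRIPTOR, 128), nn.SiLU(),
            nn.Linear(128, N_BASIS * N_BASIS),
        )

    def forward(self, descriptor):
        h = self.mlp(descriptor).view(-1, N_BASIS, N_BASIS)
        return 0.5 * (h + h.transpose(-1, -2))   # enforce symmetry

model = HamiltonianNet()
descriptor = torch.randn(1, N_DESCRIPTOR)   # stand-in for a learned molecular representation
H = model(descriptor)

# Ground-state quantities follow from the predicted matrix: eigenvalues play the
# role of orbital energies, eigenvectors of orbital coefficients, and the whole
# pipeline stays differentiable, which is what enables inverse design.
energies, coefficients = torch.linalg.eigh(H)
print(energies.shape, coefficients.shape)
```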


2020 ◽  
Author(s):  
Jin Soo Lim ◽  
Jonathan Vandermause ◽  
Matthijs A. van Spronsen ◽  
Albert Musaelian ◽  
Christopher R. O’Connor ◽  
...  

Restructuring of interfaces plays a crucial role in materials science and heterogeneous catalysis. Bimetallic systems, in particular, often adopt compositions and morphologies at their surfaces that differ markedly from the bulk. For the first time, we reveal a detailed atomistic picture of the long-timescale restructuring of Pd deposited on Ag, using microscopy, spectroscopy, and novel simulation methods. Encapsulation of Pd by Ag always precedes layer-by-layer dissolution of Pd, resulting in significant Ag migration out of the surface and extensive vacancy pits. These metastable structures are of vital catalytic importance, as Ag-encapsulated Pd remains much more accessible to reactants than bulk-dissolved Pd. The underlying mechanisms are uncovered by performing fast, large-scale machine-learning molecular dynamics, followed by our newly developed method for complete characterization of atomic surface restructuring events. Our approach is broadly applicable to other multimetallic systems of interest and enables the previously impractical mechanistic investigation of restructuring dynamics.
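As an illustration of what machine-learning molecular dynamics means operationally, the sketch below runs a plain velocity-Verlet loop in which the forces come from a learned potential; the ml_forces stub, atom count, and units are placeholders, not the force field actually trained in this work.

```python
# Schematic velocity-Verlet MD loop with forces from a learned potential.
# ml_forces() is a placeholder stub, not the trained force field of this study.
import numpy as np

def ml_forces(positions):
    """Stand-in for an ML potential evaluated on the current configuration."""
    return -0.1 * positions   # toy harmonic restoring forces

n_atoms, dt, mass = 64, 1.0e-3, 106.42   # illustrative units (mass ~ Pd in amu)
rng = np.random.default_rng(0)
pos = rng.normal(scale=2.0, size=(n_atoms, 3))
vel = np.zeros((n_atoms, 3))
forces = ml_forces(pos)

for step in range(1000):
    vel += 0.5 * dt * forces / mass
    pos += dt * vel
    forces = ml_forces(pos)
    vel += 0.5 * dt * forces / mass
# Restructuring events (encapsulation, vacancy formation, dissolution) would be
# identified by post-processing long trajectories generated in this manner.
```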


2021 ◽  
Vol 11 (2) ◽  
pp. 472
Author(s):  
Hyeongmin Cho ◽  
Sangkyun Lee

Machine learning has proven effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training datasets, many datasets are being disclosed and published online. From a data consumer's or manager's point of view, measuring data quality is an important first step in the learning process: we need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially for large-scale, high-dimensional data such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping, with statistical benefits on large-scale, high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale, high-dimensional datasets.
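A minimal sketch of how such measures could be computed, assuming simple definitions (between-class vs. within-class distances for separability, bootstrapped within-class spread for variability) on randomly projected data; the exact definitions and estimators in the paper may differ.

```python
# Illustrative class-separability / in-class-variability estimates computed on
# randomly projected data; the paper's exact definitions may differ.
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

def separability(X, y):
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    between = np.mean([np.linalg.norm(a - b)
                       for i, a in enumerate(centroids) for b in centroids[i + 1:]])
    within = np.mean([np.linalg.norm(X[y == c] - X[y == c].mean(axis=0), axis=1).mean()
                      for c in classes])
    return between / within   # larger = classes easier to separate

def in_class_variability(X, y, n_boot=100, seed=0):
    rng = np.random.default_rng(seed)
    vals = []
    for _ in range(n_boot):                      # bootstrap resampling
        idx = rng.integers(0, len(X), len(X))
        Xb, yb = X[idx], y[idx]
        vals.append(np.mean([Xb[yb == c].std(axis=0).mean() for c in np.unique(yb)]))
    return float(np.mean(vals))

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4096))                # stand-in for high-dimensional images
y = rng.integers(0, 10, 2000)

Xp = GaussianRandomProjection(n_components=128, random_state=0).fit_transform(X)
print(separability(Xp, y), in_class_variability(Xp, y))
```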


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A62-A62
Author(s):  
Dattatreya Mellacheruvu ◽  
Rachel Pyke ◽  
Charles Abbott ◽  
Nick Phillips ◽  
Sejal Desai ◽  
...  

Background: Accurately identified neoantigens can be effective therapeutic agents in both adjuvant and neoadjuvant settings. A key challenge for neoantigen discovery has been the availability of accurate prediction models for MHC peptide presentation. We have shown previously that our proprietary model based on (i) large-scale, in-house mono-allelic data, (ii) custom features that model antigen processing, and (iii) advanced machine learning algorithms has strong performance. We have extended this work by systematically integrating large quantities of high-quality, publicly available data, implementing new modelling algorithms, and rigorously testing our models. These extensions lead to substantial improvements in performance and generalizability. Our algorithm, named Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), is integrated into the ImmunoID NeXT Platform®, our immuno-genomics and transcriptomics platform specifically designed to enable the development of immunotherapies. Methods: In-house immunopeptidomic data was generated using stably transfected HLA-null K562 cell lines that express a single HLA allele of interest, followed by immunoprecipitation using the W6/32 antibody and LC-MS/MS. Public immunopeptidomics data was downloaded from repositories such as MassIVE and processed uniformly using in-house pipelines to generate peptide lists filtered at a 1% false discovery rate. Other metrics (features) were either extracted from source data or generated internally by re-processing samples using the ImmunoID NeXT Platform. Results: We generated large-scale, high-quality immunopeptidomics data from approximately 60 mono-allelic cell lines that unambiguously assign peptides to their presenting alleles, and used these data to create our primary models. Briefly, our primary 'binding' model captures MHC-peptide binding using peptide and binding-pocket features, while our primary 'presentation' model uses additional features to model antigen processing and presentation. Both primary models have significantly higher precision across all recall values in multiple test datasets, including mono-allelic cell lines and multi-allelic tissue samples. To further improve the performance of our model, we expanded the diversity of our training set using high-quality, publicly available mono-allelic immunopeptidomics data. Furthermore, multi-allelic data was integrated by resolving peptide-to-allele mappings using our primary models. We then trained a new model using the expanded training data and a new composite machine learning architecture. The resulting secondary model further improves performance and generalizability across several tissue samples. Conclusions: Improving technologies for neoantigen discovery is critical for many therapeutic applications, including personalized neoantigen vaccines and neoantigen-based biomarkers for immunotherapies. Our new and improved algorithm (SHERPA) has significantly higher performance than a state-of-the-art public algorithm and furthers this objective.
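The headline comparison (higher precision across all recall values on held-out data) corresponds to a precision-recall analysis; a hedged sketch on synthetic presented/decoy peptide scores is shown below, standing in for, rather than reproducing, the paper's evaluation.

```python
# Sketch of a precision-vs-recall comparison between two presentation models,
# on synthetic presented/decoy peptide scores (placeholders, not real data).
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, 5000)                                            # presented (1) vs decoy (0)
score_a = y_true * rng.normal(1.5, 1.0, 5000) + rng.normal(0, 1.0, 5000)     # "stronger" model
score_b = y_true * rng.normal(0.7, 1.0, 5000) + rng.normal(0, 1.0, 5000)     # "weaker" model

for name, s in [("model A", score_a), ("model B", score_b)]:
    precision, recall, _ = precision_recall_curve(y_true, s)
    ap = average_precision_score(y_true, s)
    p90 = precision[recall >= 0.9].max()   # best precision reachable at >= 90% recall
    print(f"{name}: average precision {ap:.3f}, precision@recall>=0.9 {p90:.3f}")
```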


Author(s):  
Mehdi Bouslama ◽  
Leonardo Pisani ◽  
Diogo Haussen ◽  
Raul Nogueira

Introduction: Prognostication is an integral part of clinical decision-making in stroke care. Machine learning (ML) methods have gained increasing popularity in the medical field due to their flexibility and high performance. Using a large comprehensive stroke center registry, we sought to apply various ML techniques to 90-day stroke outcome prediction after thrombectomy. Methods: We used individual patient data from our prospectively collected thrombectomy database between 09/2010 and 03/2020. Patients with anterior circulation strokes (internal carotid artery; middle cerebral artery M1, M2, or M3 segments; or anterior cerebral artery) and complete records were included. Our primary outcome was 90-day functional independence (defined as a modified Rankin Scale score of 0–2). Pre- and post-procedure models were developed. Four established ML algorithms (support vector machine, random forest, gradient boosting, and artificial neural network) were implemented using a 70/30 training-test data split and 10-fold cross-validation on the training data for model calibration. Discriminative performance was evaluated using the area under the receiver operating characteristic curve (AUC). Results: Among 1248 patients with anterior circulation large vessel occlusion stroke undergoing thrombectomy during the study period, 1020 had complete records and were included in the analysis. In the training data (n = 714), 49.3% of the patients achieved independence at 90 days. Fifteen baseline clinical, laboratory, and neuroimaging features were used to develop the pre-procedural models, with four additional parameters included in the post-procedure models. For the pre-procedural models, the highest AUC was 0.797 (95% CI 0.75–0.85), achieved by the gradient boosting model. The same ML technique also performed best on post-procedural data, with improved discriminative performance compared to the pre-procedure model (AUC 0.82, 95% CI 0.77–0.87). Conclusions: Our pre- and post-procedural models reliably estimated outcomes in stroke patients undergoing thrombectomy. They represent a step toward simple and efficient prognostication tools to aid treatment decision-making. A web-based platform and related mobile app are under development.
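A minimal sketch of the described workflow, assuming placeholder data for the 15 pre-procedural features: a 70/30 train-test split, 10-fold cross-validation on the training set, a gradient boosting classifier, and AUC as the discrimination metric.

```python
# Sketch of the modelling workflow: 70/30 split, 10-fold CV on the training
# set, gradient boosting, AUC on held-out data. Features are placeholders for
# the 15 pre-procedural clinical, laboratory and neuroimaging variables.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1020, 15))
y = rng.integers(0, 2, 1020)            # 90-day functional independence (mRS 0-2)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

model = GradientBoostingClassifier()
cv_auc = cross_val_score(model, X_tr, y_tr, cv=10, scoring="roc_auc")   # calibration / model selection
model.fit(X_tr, y_tr)
test_auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"CV AUC {cv_auc.mean():.3f}, held-out AUC {test_auc:.3f}")
```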


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: we bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as high-RI organic molecules for applications in opto-electronics.
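The "metric popular in web search engines" is plausibly an NDCG-style ranking score (an assumption here, not stated in the abstract); the sketch below shows how a ranking of high-RI candidates could be scored that way, with placeholder true and predicted refractive indices.

```python
# Sketch: score how well predicted refractive indices rank the true top
# candidates, using NDCG@k on placeholder values.
import numpy as np
from sklearn.metrics import ndcg_score

rng = np.random.default_rng(0)
true_ri = rng.normal(1.6, 0.15, 10_000)            # placeholder "true" RIs
pred_ri = true_ri + rng.normal(0, 0.05, 10_000)    # placeholder model predictions

# Relevance is the true RI itself; k=1000 focuses the score on the top 1,000
# candidates, mirroring the ranking question posed in the study.
print("NDCG@1000:", ndcg_score([true_ri], [pred_ri], k=1000))
```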


Energies ◽  
2021 ◽  
Vol 14 (23) ◽  
pp. 7834
Author(s):  
Christopher Hecht ◽  
Jan Figgener ◽  
Dirk Uwe Sauer

Electric vehicles may reduce greenhouse gas emissions from individual mobility. Due to the long charging times, accurate planning is necessary, for which the availability of charging infrastructure must be known. In this paper, we show how the occupation status of charging infrastructure can be predicted for the next day using machine learning models, namely the Gradient Boosting Classifier and the Random Forest Classifier. Since both are ensemble models, binary training data (occupied vs. available) can be used to provide a certainty measure for predictions. The prediction may be used to adapt prices in a high-load scenario, predict grid stress, or forecast available power for smart or bidirectional charging. The models were chosen based on an evaluation of 13 different, commonly used machine learning models. We show that it is necessary to know past charging station usage in order to predict future usage; other features such as traffic density or weather have only a limited effect. We show that a Gradient Boosting Classifier achieves 94.8% accuracy and a Matthews correlation coefficient of 0.838, making ensemble models a suitable tool. We further demonstrate how a model trained on binary data can produce non-binary predictions, in categories ranging from “low likelihood” to “high likelihood”.
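A minimal sketch of the setup described: a gradient boosting classifier trained on binary occupied/available labels, scored with accuracy and the Matthews correlation coefficient, and with its predicted probabilities binned into likelihood categories. The features and data below are placeholders, not the charging-infrastructure dataset used in the paper.

```python
# Sketch: gradient boosting on binary occupied/available labels, scored with
# accuracy and MCC, and predicted probabilities binned into likelihood classes.
# Features (hour, weekday, past occupancy) and data are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = np.column_stack([
    rng.integers(0, 24, 20_000),     # hour of day
    rng.integers(0, 7, 20_000),      # day of week
    rng.random(20_000),              # past occupancy rate of the station
])
y = (X[:, 2] + 0.1 * rng.normal(size=20_000) > 0.5).astype(int)   # occupied next day?

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier().fit(X_tr, y_tr)

pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "MCC:", matthews_corrcoef(y_te, pred))

# Non-binary output: bin the occupation probability into likelihood categories.
proba = clf.predict_proba(X_te)[:, 1]
labels = ["low likelihood", "medium-low", "medium-high", "high likelihood"]
categories = np.digitize(proba, [0.25, 0.5, 0.75])
print([labels[c] for c in categories[:5]])
```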


2021 ◽  
Author(s):  
Aurore Lafond ◽  
Maurice Ringer ◽  
Florian Le Blay ◽  
Jiaxu Liu ◽  
Ekaterina Millan ◽  
...  

Abnormal surface pressure is typically the first indicator of a number of problematic events, including kicks, losses, washouts, and stuck pipe. These events account for 60–70% of all drilling-related non-productive time, so their early and accurate detection has the potential to save the industry billions of dollars. Detecting these events today requires an expert user watching multiple curves, which can be costly and subject to human error. The solution presented in this paper aims to augment traditional models with machine learning techniques that detect these events automatically and support the monitoring of the drilling well. Today's real-time monitoring systems employ complex physical models to estimate surface standpipe pressure while drilling. These require many inputs and are difficult to calibrate. Machine learning is an alternative method for predicting pump pressure, but on its own it needs significant labelled training data, which is often lacking in the drilling world. The new system combines these approaches: a machine learning framework enables automated learning, while the physical models compensate for any gaps in the training data. The system uses only standard surface measurements, is fully automated, and is continuously retrained while drilling to ensure the most accurate pressure prediction. In addition, a stochastic (Bayesian) machine learning technique is used, which provides not only a prediction of the pressure but also the uncertainty and confidence of this prediction. Finally, the new system includes a data quality control workflow: it discards periods of low data quality from the pressure anomaly detection and enables smarter real-time event analysis. The new system has been tested on historical wells using a new test and validation framework. The framework runs the system automatically on large volumes of both historical and simulated data, to enable cross-referencing of the results with observations. In this paper, we show the results of the automated test framework as well as the capabilities of the new system in two specific case studies, one on land and one offshore. Moreover, large-scale statistics highlight the reliability and efficiency of this new detection workflow. The new system builds on the trend in our industry to better capture and utilize digital data for optimizing drilling.
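One way to read the hybrid physics-plus-ML design with uncertainty is: a physical baseline predicts standpipe pressure and a probabilistic model learns the residual, returning both a correction and its confidence. The Gaussian-process sketch below is a stand-in for the (unspecified) stochastic Bayesian technique; the toy hydraulic baseline and all numbers are invented for illustration.

```python
# Sketch: physics baseline + Gaussian-process residual model, giving a pressure
# prediction with uncertainty. The toy hydraulic model and the GP stand in for
# the paper's physical models and stochastic (Bayesian) ML technique.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

def physics_baseline(flow_rate):
    """Toy hydraulic model: standpipe pressure roughly quadratic in flow rate."""
    return 2.0e-3 * flow_rate**2

rng = np.random.default_rng(0)
flow = rng.uniform(500, 3000, 300)                                   # pump flow rate (L/min)
pressure = 1.08 * physics_baseline(flow) + rng.normal(0, 150, 300)   # "measured" SPP (synthetic)

# The GP learns the residual between measurements and the physical model.
residual = pressure - physics_baseline(flow)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=500.0) + WhiteKernel(), normalize_y=True)
gp.fit(flow.reshape(-1, 1), residual)

flow_new = np.array([[1800.0]])
corr, std = gp.predict(flow_new, return_std=True)
mean = physics_baseline(flow_new.ravel()) + corr
print(f"predicted pressure {mean[0]:.0f} +/- {std[0]:.0f}")   # uncertainty supports anomaly flagging
```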

