Adsorption Isotherm Predictions for Multiple Molecules in MOFs Using the Same Deep Learning Model

Author(s):  
Ryther Anderson ◽  
Achay Biong ◽  
Diego Gómez-Gualdrón

<div>Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations. Namely, argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF (e.g. geometric properties and chemical moieties), and adsorbate (e.g. forcefield parameters and geometry) descriptors. Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.</div>

2019 ◽  
Author(s):  
Ryther Anderson ◽  
Achay Biong ◽  
Diego Gómez-Gualdrón

<div>Tailoring the structure and chemistry of metal-organic frameworks (MOFs) enables the manipulation of their adsorption properties to suit specific energy and environmental applications. As there are millions of possible MOFs (with tens of thousands already synthesized), molecular simulation, such as grand canonical Monte Carlo (GCMC), has frequently been used to rapidly evaluate the adsorption performance of a large set of MOFs. This allows subsequent experiments to focus only on a small subset of the most promising MOFs. In many instances, however, even molecular simulation becomes prohibitively time consuming, underscoring the need for alternative screening methods, such as machine learning, to precede molecular simulation efforts. In this study, as a proof of concept, we trained a neural network as the first example of a machine learning model capable of predicting full adsorption isotherms of different molecules not included in the training of the model. To achieve this, we trained our neural network only on alchemical species, represented only by their geometry and force field parameters, and used this neural network to predict the loadings of real adsorbates. We focused on predicting room temperature adsorption of small (one- and two-atom) molecules relevant to chemical separations. Namely, argon, krypton, xenon, methane, ethane, and nitrogen. However, we also observed surprisingly promising predictions for more complex molecules, whose properties are outside the range spanned by the alchemical adsorbates. Prediction accuracies suitable for large-scale screening were achieved using simple MOF (e.g. geometric properties and chemical moieties), and adsorbate (e.g. forcefield parameters and geometry) descriptors. Our results illustrate a new philosophy of training that opens the path towards development of machine learning models that can predict the adsorption loading of any new adsorbate at any new operating conditions in any new MOF.</div>


2021 ◽  
pp. 1-18
Author(s):  
Gisela Vanegas ◽  
John Nejedlik ◽  
Pascale Neff ◽  
Torsten Clemens

Summary Forecasting production from hydrocarbon fields is challenging because of the large number of uncertain model parameters and the multitude of observed data that are measured. The large number of model parameters leads to uncertainty in the production forecast from hydrocarbon fields. Changing operating conditions [e.g., implementation of improved oil recovery or enhanced oil recovery (EOR)] results in model parameters becoming sensitive in the forecast that were not sensitive during the production history. Hence, simulation approaches need to be able to address uncertainty in model parameters as well as conditioning numerical models to a multitude of different observed data. Sampling from distributions of various geological and dynamic parameters allows for the generation of an ensemble of numerical models that could be falsified using principal-component analysis (PCA) for different observed data. If the numerical models are not falsified, machine-learning (ML) approaches can be used to generate a large set of parameter combinations that can be conditioned to the different observed data. The data conditioning is followed by a final step ensuring that parameter interactions are covered. The methodology was applied to a sandstone oil reservoir with more than 70 years of production history containing dozens of wells. The resulting ensemble of numerical models is conditioned to all observed data. Furthermore, the resulting posterior-model parameter distributions are only modified from the prior-model parameter distributions if the observed data are informative for the model parameters. Hence, changes in operating conditions can be forecast under uncertainty, which is essential if nonsensitive parameters in the history are sensitive in the forecast.


2019 ◽  
Author(s):  
Mojtaba Haghighatlari ◽  
Gaurav Vishwakarma ◽  
Mohammad Atif Faiz Afzal ◽  
Johannes Hachmann

<div><div><div><p>We present a multitask, physics-infused deep learning model to accurately and efficiently predict refractive indices (RIs) of organic molecules, and we apply it to a library of 1.5 million compounds. We show that it outperforms earlier machine learning models by a significant margin, and that incorporating known physics into data-derived models provides valuable guardrails. Using a transfer learning approach, we augment the model to reproduce results consistent with higher-level computational chemistry training data, but with a considerably reduced number of corresponding calculations. Prediction errors of machine learning models are typically smallest for commonly observed target property values, consistent with the distribution of the training data. However, since our goal is to identify candidates with unusually large RI values, we propose a strategy to boost the performance of our model in the remoter areas of the RI distribution: We bias the model with respect to the under-represented classes of molecules that have values in the high-RI regime. By adopting a metric popular in web search engines, we evaluate our effectiveness in ranking top candidates. We confirm that the models developed in this study can reliably predict the RIs of the top 1,000 compounds, and are thus able to capture their ranking. We believe that this is the first study to develop a data-derived model that ensures the reliability of RI predictions by model augmentation in the extrapolation region on such a large scale. These results underscore the tremendous potential of machine learning in facilitating molecular (hyper)screening approaches on a massive scale and in accelerating the discovery of new compounds and materials, such as organic molecules with high-RI for applications in opto-electronics.</p></div></div></div>


10.2196/18142 ◽  
2020 ◽  
Vol 8 (9) ◽  
pp. e18142
Author(s):  
Ramin Mohammadi ◽  
Mursal Atif ◽  
Amanda Jayne Centi ◽  
Stephen Agboola ◽  
Kamal Jethwani ◽  
...  

Background It is well established that lack of physical activity is detrimental to the overall health of an individual. Modern-day activity trackers enable individuals to monitor their daily activities to meet and maintain targets. This is expected to promote activity encouraging behavior, but the benefits of activity trackers attenuate over time due to waning adherence. One of the key approaches to improving adherence to goals is to motivate individuals to improve on their historic performance metrics. Objective The aim of this work was to build a machine learning model to predict an achievable weekly activity target by considering (1) patterns in the user’s activity tracker data in the previous week and (2) behavior and environment characteristics. By setting realistic goals, ones that are neither too easy nor too difficult to achieve, activity tracker users can be encouraged to continue to meet these goals, and at the same time, to find utility in their activity tracker. Methods We built a neural network model that prescribes a weekly activity target for an individual that can be realistically achieved. The inputs to the model were user-specific personal, social, and environmental factors, daily step count from the previous 7 days, and an entropy measure that characterized the pattern of daily step count. Data for training and evaluating the machine learning model were collected over a duration of 9 weeks. Results Of 30 individuals who were enrolled, data from 20 participants were used. The model predicted target daily count with a mean absolute error of 1545 (95% CI 1383-1706) steps for an 8-week period. Conclusions Artificial intelligence applied to physical activity data combined with behavioral data can be used to set personalized goals in accordance with the individual’s level of activity and thereby improve adherence to a fitness tracker; this could be used to increase engagement with activity trackers. A follow-up prospective study is ongoing to determine the performance of the engagement algorithm.


Symmetry ◽  
2021 ◽  
Vol 13 (12) ◽  
pp. 2293
Author(s):  
Zixiang Yue ◽  
Youliang Ding ◽  
Hanwei Zhao ◽  
Zhiwen Wang

A cable-stayed bridge is a typical symmetrical structure, and symmetry affects the deformation characteristics of such bridges. The main girder of a cable-stayed bridge will produce obvious deflection under the inducement of temperature. The regression model of temperature-induced deflection is hoped to provide a comparison value for bridge evaluation. Based on the temperature and deflection data obtained by the health monitoring system of a bridge, establishing the correlation model between temperature and temperature-induced deflection is meaningful. It is difficult to complete a high-quality model only by the girder temperature. The temperature features based on prior knowledge from the mechanical mechanism are used as the input information in this paper. At the same time, to strengthen the nonlinear ability of the model, this paper selects an independent recurrent neural network (IndRNN) for modeling. The deep learning neural network is compared with machine learning neural networks to prove the advancement of deep learning. When only the average temperature of the main girder is input, the calculation accuracy is not high regardless of whether the deep learning network or the machine learning network is used. When the temperature information extracted by the prior knowledge is input, the average error of IndRNN model is only 2.53%, less than those of BPNN model and traditional RNN. Combining knowledge with deep learning is undoubtedly the best modeling scheme. The deep learning model can provide a comparison value of bridge deformation for bridge management.


Water ◽  
2021 ◽  
Vol 13 (19) ◽  
pp. 2664
Author(s):  
Sunil Saha ◽  
Jagabandhu Roy ◽  
Tusar Kanti Hembram ◽  
Biswajeet Pradhan ◽  
Abhirup Dikshit ◽  
...  

The efficiency of deep learning and tree-based machine learning approaches has gained immense popularity in various fields. One deep learning model viz. convolution neural network (CNN), artificial neural network (ANN) and four tree-based machine learning models, namely, alternative decision tree (ADTree), classification and regression tree (CART), functional tree and logistic model tree (LMT), were used for landslide susceptibility mapping in the East Sikkim Himalaya region of India, and the results were compared. Landslide areas were delimited and mapped as landslide inventory (LIM) after gathering information from historical records and periodic field investigations. In LIM, 91 landslides were plotted and classified into training (64 landslides) and testing (27 landslides) subsets randomly to train and validate the models. A total of 21 landslide conditioning factors (LCFs) were considered as model inputs, and the results of each model were categorised under five susceptibility classes. The receiver operating characteristics curve and 21 statistical measures were used to evaluate and prioritise the models. The CNN deep learning model achieved the priority rank 1 with area under the curve of 0.918 and 0.933 by using the training and testing data, quantifying 23.02% and 14.40% area as very high and highly susceptible followed by ANN, ADtree, CART, FTree and LMT models. This research might be useful in landslide studies, especially in locations with comparable geophysical and climatological characteristics, to aid in decision making for land use planning.


2017 ◽  
Vol 24 (1) ◽  
pp. 3-37 ◽  
Author(s):  
SANDRA KÜBLER ◽  
CAN LIU ◽  
ZEESHAN ALI SAYYED

AbstractWe investigate feature selection methods for machine learning approaches in sentiment analysis. More specifically, we use data from the cooking platform Epicurious and attempt to predict ratings for recipes based on user reviews. In machine learning approaches to such tasks, it is a common approach to use word or part-of-speech n-grams. This results in a large set of features, out of which only a small subset may be good indicators for the sentiment. One of the questions we investigate concerns the extension of feature selection methods from a binary classification setting to a multi-class problem. We show that an inherently multi-class approach, multi-class information gain, outperforms ensembles of binary methods. We also investigate how to mitigate the effects of extreme skewing in our data set by making our features more robust and by using review and recipe sampling. We show that over-sampling is the best method for boosting performance on the minority classes, but it also results in a severe drop in overall accuracy of at least 6 per cent points.


Sign in / Sign up

Export Citation Format

Share Document