Applying Machine Learning Techniques to Predict the Properties of Energetic Materials

2018 ◽  
Author(s):  
Daniel Elton ◽  
Zois Boukouvalas ◽  
Mark S. Butrico ◽  
Mark D. Fuge ◽  
Peter W. Chung

We present a proof of concept that machine learning techniques can be used to predict the properties of CNOHF energetic molecules from their molecular structures. We focus on a small but diverse dataset consisting of 109 molecular structures spread across ten compound classes. Until now, candidate molecules for energetic materials have been screened using predictions from expensive quantum simulations and thermochemical codes. We present a comprehensive comparison of machine learning models and several molecular featurization methods: sum over bonds, custom descriptors, Coulomb matrices, bag of bonds, and fingerprints. The best featurization was sum over bonds (bond counting), and the best model was kernel ridge regression. Despite the small dataset, we obtain acceptable errors and Pearson correlations for the out-of-sample prediction of detonation pressure, detonation velocity, explosive energy, heat of formation, density, and other properties. By including another dataset with 309 additional molecules in our training, we show how the error can be pushed lower, although the convergence with the number of molecules is slow. Our work paves the way for future applications of machine learning in this domain, including automated lead generation and interpreting machine learning models to obtain novel chemical insights.
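
As a rough illustration of the best-performing combination named in the abstract, the sketch below pairs a sum-over-bonds (bond counting) feature vector with kernel ridge regression. It is a minimal sketch under stated assumptions, not the authors' implementation: the bond vocabulary, the toy molecules, the target values, the Gaussian kernel, and the hyperparameters `alpha` and `gamma` are all hypothetical placeholders chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical bond vocabulary and toy molecules (placeholders, not paper data).
# Each molecule is a list of bond labels plus a made-up target property value.
BOND_TYPES = ["C-H", "C-N", "N-O", "N=O"]
TOY_MOLECULES = [
    (["C-H", "C-H", "C-N", "N-O", "N=O"], 1.0),
    (["C-H", "C-N", "C-N", "N=O", "N=O", "N=O"], 2.0),
    (["C-H", "C-H", "C-H", "C-H", "C-N"], 0.5),
]

def sum_over_bonds(bonds, vocabulary=BOND_TYPES):
    """Bond counting: one feature per bond type, equal to its count."""
    counts = Counter(bonds)
    return [float(counts[b]) for b in vocabulary]

def gaussian_kernel(u, v, gamma):
    """k(u, v) = exp(-gamma * ||u - v||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def krr_fit(X, y, alpha=1e-6, gamma=0.1):
    """Kernel ridge regression: solve (K + alpha * I) c = y for the weights c."""
    K = [[gaussian_kernel(u, v, gamma) + (alpha if i == j else 0.0)
          for j, v in enumerate(X)] for i, u in enumerate(X)]
    return solve(K, y)

def krr_predict(X_train, coef, x, gamma=0.1):
    """Prediction is a kernel-weighted sum over the training molecules."""
    return sum(c * gaussian_kernel(u, x, gamma) for c, u in zip(coef, X_train))

X_train = [sum_over_bonds(bonds) for bonds, _ in TOY_MOLECULES]
y_train = [target for _, target in TOY_MOLECULES]
coef = krr_fit(X_train, y_train)
train_predictions = [krr_predict(X_train, coef, x) for x in X_train]
```

With a very small ridge term the model nearly interpolates the training targets, so an honest error estimate requires held-out molecules, as the abstract's out-of-sample evaluation implies.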


2020 ◽  
Vol 28 (2) ◽  
pp. 253-265 ◽  
Author(s):  
Gabriela Bitencourt-Ferreira ◽  
Amauri Duarte da Silva ◽  
Walter Filgueira de Azevedo

Background: The elucidation of the structure of cyclin-dependent kinase 2 (CDK2) made it possible to develop targeted scoring functions for virtual screening aimed at identifying new inhibitors for this enzyme. CDK2 is a protein target for the development of drugs intended to modulate cell-cycle progression and control. Such drugs have potential anticancer activities. Objective: Our goal here is to review recent applications of machine learning methods to predict ligand-binding affinity for protein targets. To assess the predictive performance of classical scoring functions and targeted scoring functions, we focused our analysis on CDK2 structures. Methods: We have experimental structural data for hundreds of binary complexes of CDK2 with different ligands, many of them with inhibition constant information. We investigate here computational methods to calculate the binding affinity of CDK2 through classical scoring functions and machine-learning models. Results: Analysis of the predictive performance of classical scoring functions available in docking programs such as Molegro Virtual Docker, AutoDock4, and AutoDock Vina indicated that these methods failed to predict binding affinity with significant correlation with experimental data. Targeted scoring functions developed through supervised machine learning techniques showed a significant correlation with experimental data. Conclusion: Here, we described the application of supervised machine learning techniques to generate a scoring function to predict binding affinity. Machine learning models showed superior predictive performance when compared with classical scoring functions. Analysis of the computational models obtained through machine learning could capture essential structural features responsible for binding affinity against CDK2.


2021 ◽  
Vol 11 (3) ◽  
pp. 1323
Author(s):  
Medard Edmund Mswahili ◽  
Min-Jeong Lee ◽  
Gati Lother Martin ◽  
Junghyun Kim ◽  
Paul Kim ◽  
...  

Cocrystals are of much interest in industrial applications as well as academic research, and screening of suitable coformers for active pharmaceutical ingredients is the most crucial and challenging step in cocrystal development. Recently, machine learning techniques have been attracting researchers in many fields, including pharmaceutical research such as quantitative structure-activity/property relationships. In this paper, we develop machine learning models to predict cocrystal formation. We extract descriptor values from the simplified molecular-input line-entry system (SMILES) representations of compounds and compare the machine learning models in experiments on our collected data of 1476 instances. We found that an artificial neural network shows great potential, as it has the best accuracy, sensitivity, and F1 score. We also found that the model achieved comparable performance with about half of the descriptors chosen by feature selection algorithms. We believe that this will contribute to faster and more accurate cocrystal development.
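
The three metrics named in the abstract (accuracy, sensitivity, and F1 score) can all be computed from confusion-matrix counts. The sketch below is illustrative only: the function name and the toy labels are hypothetical, with 1 standing for "forms a cocrystal" and 0 for "does not", and none of the numbers come from the paper.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity (recall), precision, and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators when a class is never present/predicted.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "f1": f1}

# Hypothetical toy labels, not data from the paper.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```

Sensitivity matters in this setting because a missed coformer (a false negative) is a lost candidate, whereas accuracy alone can look good on imbalanced data.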


Author(s):  
Pratik Vyas ◽  
Diptangshu Pandit

The use of machine learning techniques in predictive health care is on the rise, with minimal data used for training machine-learning models to derive high-accuracy predictions. In this paper, we propose such a system, which utilizes Heart Rate Variability (HRV) as features for training machine learning models. This paper further benchmarks the usefulness of HRV features calculated from basic heart-rate data using a window-shifting method. The benchmarking has been conducted using different machine-learning classifiers such as artificial neural network, decision tree, k-nearest neighbour, and naive Bayes classifiers. Empirical results on the MIT-BIH Arrhythmia database show that the proposed system can be used for highly efficient prediction of abnormality in heartbeat data series.
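
One plausible reading of the window-shifting method is sketched below: slide a fixed-size window over a series of beat-to-beat (RR) intervals and compute standard time-domain HRV statistics in each window. The abstract does not say which HRV measures, window size, or step the authors used, so the choice of mean RR, SDNN, and RMSSD, the parameter values, and the toy interval series are all assumptions.

```python
import math
import statistics

def sliding_windows(rr_ms, size, step):
    """Split a list of RR intervals into overlapping windows of `size` beats."""
    return [rr_ms[i:i + size] for i in range(0, len(rr_ms) - size + 1, step)]

def hrv_features(window):
    """Time-domain HRV features for one window of RR intervals (milliseconds)."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return {
        "mean_rr": statistics.mean(window),
        # SDNN: sample standard deviation of the RR intervals.
        "sdnn": statistics.stdev(window),
        # RMSSD: root mean square of successive RR differences.
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
    }

# Hypothetical RR series in milliseconds, with one irregular stretch.
rr_series = [800, 810, 790, 805, 795, 1200, 600, 800]
features = [hrv_features(w) for w in sliding_windows(rr_series, size=4, step=2)]
```

Each feature dictionary would then form one row of the training table passed to a classifier; windows covering the irregular stretch show much larger SDNN and RMSSD values than the regular ones.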


2020 ◽  
Vol 2 (2) ◽  
pp. 106-119
Author(s):  
Subasish Das ◽  
Minh Le ◽  
Boya Dai

Crash occurrence is a complex phenomenon, and crashes associated with pedestrians and bicyclists are even more complex. Furthermore, pedestrian- and bicyclist-involved crashes are typically not reported in detail in state or national crash databases. To address this issue, developers created the Pedestrian and Bicycle Crash Analysis Tool (PBCAT). However, it is labour-intensive to manually identify the types of pedestrian and bicycle crashes from crash-narrative reports and to classify different crash attributes from the textual content of police reports. Therefore, there is a need for a supporting tool that can assist practitioners in using PBCAT more efficiently and accurately. The objective of this study is to develop a framework for applying machine-learning models to classify crash types from unstructured textual content. In this study, the research team collected pedestrian crash-typing data from two locations in Texas. The XGBoost model was found to be the best classifier. The high prediction power of the XGBoost classifiers indicates that this machine-learning technique was able to classify pedestrian crash types with the highest accuracy rate (up to 77% for training data and 72% for test data). The findings demonstrate that advanced machine-learning models can extract underlying patterns and trends of crash mechanisms. This provides the basis for applying machine-learning techniques in addressing the crash typing issues associated with non-motorist crashes.


2020 ◽  
Vol 98 (Supplement_4) ◽  
pp. 44-45
Author(s):  
Dan Tulpan

This is a hands-on workshop offered as a pre-conference training opportunity for researchers interested in applying machine learning techniques to animal science datasets with the purpose of classifying, clustering, performing linear and non-linear regressions, or selecting a subset of features relevant to further studies. The objective of this workshop is to provide the audience with a way to formulate a problem such that it will be solvable by machine learning techniques and to apply an exploratory analysis of various machine learning techniques on different datasets. The workshop is structured in a hands-on format and includes a brief overview of basic notions about machine learning and a description of relevant models and evaluation metrics, followed by a practical session. The practical session requires each attendee to bring their own laptop and to have already installed the Waikato Environment for Knowledge Analysis (Weka) workbench for machine learning, available from https://www.cs.waikato.ac.nz/ml/weka/, and all freely available machine learning models. The Weka installation of freely available machine learning models can be achieved by using the Weka Package Manager, available from the Tools menu in the main application. Detailed information will be provided 2 weeks before the beginning of the workshop (week of July 5, 2020) at the following URL: http://animalbiosciences.uoguelph.ca/~dtulpan/conferences/asas2020_mlworkshop/


Author(s):  
Antonio Bella ◽  
Cèsar Ferri ◽  
José Hernández-Orallo ◽  
María José Ramírez-Quintana

The evaluation of machine learning models is a crucial step before their application because it is essential to assess how well a model will behave for every single case. In many real applications, not only is it important to know the “total” or the “average” error of the model, it is also important to know how this error is distributed and how well confidence or probability estimations are made. Many current machine learning techniques are good in overall results but have a bad distribution assessment of the error. For these cases, calibration techniques have been developed as postprocessing techniques in order to improve the probability estimation or the error distribution of an existing model. This chapter presents the most common calibration techniques and calibration measures. Both classification and regression are covered, and a taxonomy of calibration techniques is established. Special attention is given to probabilistic classifier calibration.
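
To make the idea of post-processing calibration concrete, the sketch below pairs one common calibration measure (the Brier score) with one simple calibration technique (equal-width histogram binning) for a binary probabilistic classifier. It is an illustrative sketch, not a summary of the chapter's taxonomy: the function names, the number of bins, and the toy scores are hypothetical, and the calibrator is fitted and evaluated on the same toy data only to keep the example short.

```python
def brier_score(probs, labels):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)

def bin_index(p, n_bins):
    """Index of the equal-width bin on [0, 1] that contains probability p."""
    return min(int(p * n_bins), n_bins - 1)

def fit_histogram_binning(probs, labels, n_bins=5):
    """Map each bin to the observed fraction of positives that fall in it."""
    totals = [0] * n_bins
    positives = [0] * n_bins
    for p, y in zip(probs, labels):
        b = bin_index(p, n_bins)
        totals[b] += 1
        positives[b] += y
    # Empty bins fall back to their midpoint (an arbitrary, assumed default).
    return [positives[b] / totals[b] if totals[b] else (b + 0.5) / n_bins
            for b in range(n_bins)]

def calibrate(p, bin_values):
    """Replace a raw probability with the calibrated value of its bin."""
    return bin_values[bin_index(p, len(bin_values))]

# Hypothetical overconfident classifier: it says 0.9 or 0.1 but is right 75% of the time.
raw_probs = [0.9, 0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0, 1]
bin_values = fit_histogram_binning(raw_probs, labels)
calibrated = [calibrate(p, bin_values) for p in raw_probs]
brier_before = brier_score(raw_probs, labels)
brier_after = brier_score(calibrated, labels)
```

In this toy case the classification decisions are unchanged, yet the Brier score improves because the probabilities now match the observed frequencies; in practice the bins would be fitted on a separate held-out set.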


2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Talal S. Qaid ◽  
Hussein Mazaar ◽  
Mohammad Yahya H. Al-Shamri ◽  
Mohammed S. Alqahtani ◽  
Abeer A. Raweh ◽  
...  

The COVID-19 pandemic has had a significant impact on public life and health worldwide, putting the world’s healthcare systems at risk. The first step in stopping this outbreak is to detect the infection in its early stages, which will relieve the risk, control the outbreak’s spread, and restore full functionality to the world’s healthcare systems. Currently, PCR is the most prevalent diagnosis tool for COVID-19. However, chest X-ray images may play an essential role in detecting this disease, as they are successful for many other viral pneumonia diseases. Unfortunately, there are common features between COVID-19 and other viral pneumonia, and hence manual differentiation between them seems to be a critical problem and needs the aid of artificial intelligence. This research employs deep- and transfer-learning techniques to develop accurate, general, and robust models for detecting COVID-19. The developed models utilize either convolutional neural networks or transfer-learning models or hybridize them with powerful machine-learning techniques to exploit their full potential. For experimentation, we applied the proposed models to two data sets: the COVID-19 Radiography Database from Kaggle and a local data set from Asir Hospital, Abha, Saudi Arabia. The proposed models achieved promising results in detecting COVID-19 cases and discriminating them from normal and other viral pneumonia with excellent accuracy. The hybrid models extracted features from the flatten layer or the first hidden layer of the neural network and then fed these features into a classification algorithm. This approach enhanced the results further to full accuracy for binary COVID-19 classification and 97.8% for multiclass classification.


2021 ◽  
Author(s):  
Salman Sadeg Deumah ◽  
Wahib Ali Yahya ◽  
Abbas Mohamed Al-khudafi ◽  
Khaled Saeed Ba-Jaalah ◽  
Waleed Tawfeeq Al-Absi

Gas viscosity is an important physical property that controls and influences the flow of gas through porous media and pipe networks. An accurate gas viscosity model is essential for use with reservoir and process simulators. The objective of this study is to assess the predictability of gas viscosity of Yemeni gas fields using machine learning techniques. The performance of several machine learning techniques in the prediction of gas viscosity is investigated in this work. The techniques include K-nearest neighbors (KNN), Random Forest (RF), Multiple Linear Regression (MLR), and Decision Tree (DT). About 440 data points collected from different Yemeni gas fields were used to develop the machine-learning model. The input data used in the training include pressure, temperature, gas density, specific gravity, gas formation volume factor, gas deviation factor, gas molecular weight, pseudo-reduced temperature and pressure, pseudo-critical temperature and pressure, and non-hydrocarbon gas components (N2, CO2, and H2S). Part of the data (75%) was used to train the developed models, while the other part (25%) was used to test the viscosity predictions. Trained machine learning models were constructed using the Python programming language. The performance and accuracy of the machine learning models were tested and compared on four different functional input datasets. The study found that the DT model predicted the gas viscosity with higher accuracy than the other models for the input parameters of datasets (A) and (B). This was evidenced by a lower root mean square error (0.000832), a lower mean absolute percent relative error (0.042%), and a higher coefficient of determination (R2 = 0.9465). The proposed approach provides an accurate and inexpensive model for estimating the viscosity of gases as a function of all input parameters of dataset (A). Overall, the relative effects of the different input parameters show that gas viscosity is most strongly related to gas density and specific gravity, which have the highest percentage (51%).
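
The evaluation protocol described in the abstract (a 75%/25% split followed by root mean square error, mean absolute percent relative error, and coefficient of determination) can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the random seed, the use of a shuffled split, and the toy values are assumptions, and the standard textbook definitions of the three metrics are assumed to match the ones used in the study.

```python
import math
import random

def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of the rows and cut it into training and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mean_abs_percent_error(y_true, y_pred):
    """Mean absolute percent relative error; assumes no true value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# 440 placeholder row indices standing in for the collected data points.
train_rows, test_rows = train_test_split(list(range(440)))

# Hypothetical measured vs. predicted values, not results from the study.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = {
    "rmse": rmse(measured, predicted),
    "mape_percent": mean_abs_percent_error(measured, predicted),
    "r2": r_squared(measured, predicted),
}
```

Reporting all three together is useful because RMSE is scale-dependent, the percent error is not, and the coefficient of determination compares the model against simply predicting the mean.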

