A Deep Neural Network Model for Packing Density Predictions and its Application in the Study of 1.5 Million Organic Molecules

<pre>The process of developing new compounds and materials is increasingly driven by computational modeling and simulation, which allow us to characterize candidates before pursuing them in the laboratory. One of the non-trivial properties of interest for organic materials is their packing in the bulk, which is highly dependent on their molecular structure. By controlling the latter, we can realize materials with a desired density (as well as other target properties). Molecular dynamics simulations are a popular and reasonably accurate way to compute the bulk density of molecules; however, since these calculations are computationally intensive, they are not a practically viable option for high-throughput screening studies that assess material candidates on a massive scale. In this work, we employ machine learning to develop a data-derived prediction model that is an alternative to physics-based simulations, and we utilize it for the hyperscreening of 1.5 million small organic molecules as well as to gain insights into the relationship between structural makeup and packing density. We also use this study to analyze the learning curve of the employed neural network approach and gain empirical data on the dependence of model performance on training data size, which will inform future investigations.</pre>
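
The learning-curve analysis mentioned in the abstract relates model error to training set size. The following is a minimal sketch of one common way to summarize such a curve, assuming the error follows a power law, error = a * n^(-b). The power-law form, the function name `fit_power_law`, and the numbers are illustrative assumptions and are not taken from the paper.

```python
import math


def fit_power_law(train_sizes, errors):
    """Fit error = a * n**(-b) by least squares in log-log space.

    Returns the tuple (a, b). The power-law form is an assumption.
    """
    xs = [math.log(n) for n in train_sizes]
    ys = [math.log(e) for e in errors]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Ordinary least-squares slope and intercept of log(error) vs log(n).
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic points generated from a known power law (not results from the paper).
sizes = [1000, 2000, 4000, 8000, 16000]
errors = [5.0 * n ** -0.5 for n in sizes]
a, b = fit_power_law(sizes, errors)
```

With measured errors in place of the synthetic ones, the fitted exponent `b` would indicate how quickly additional training data reduces the error, under the stated assumption.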

10.26434/chemrxiv.8217758.v2 ◽

2019 ◽

Author(s):

Mohammad Atif Faiz Afzal ◽

Aditya Sonpal ◽

Mojtaba Haghighatlari ◽

Andrew J. Schultz ◽

Johannes Hachmann

Keyword(s):

Neural Network ◽

High Throughput Screening ◽

Organic Molecules ◽

Model Performance ◽

Training Data ◽

Neural Network Approach ◽

Massive Scale ◽

Computationally Intensive ◽

Computational Modeling And Simulation ◽

Dynamics Simulations

Performance Evaluation of Deep CNN-Based Crack Detection and Localization Techniques for Concrete Structures

Sensors ◽

10.3390/s21051688 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1688

Author(s):

Luqman Ali ◽

Fady Alnajjar ◽

Hamad Al Jassmi ◽

Munkhjargal Gochoo ◽

Wasif Khan ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Crack Detection ◽

Concrete Structures ◽

Model Performance ◽

Training Data ◽

Computational Time ◽

Data Heterogeneity ◽

Public Datasets ◽

Detection And Localization

This paper proposes a customized convolutional neural network (CNN) for crack detection in concrete structures. The proposed method is compared to four existing deep learning methods with respect to training data size, data heterogeneity, network complexity, and the number of epochs. The performance of the proposed CNN model is evaluated and compared to that of pretrained networks, i.e., the VGG-16, VGG-19, ResNet-50, and Inception V3 models, on eight datasets of different sizes created from two public datasets. For each model, the evaluation considered computational time, crack localization results, and classification measures, e.g., accuracy, precision, recall, and F1-score. Experimental results demonstrated that training data size and heterogeneity among data samples significantly affect model performance. All models demonstrated promising performance on a limited amount of diverse training data; however, increasing the training data size and reducing diversity reduced generalization performance and led to overfitting. The proposed customized CNN and VGG-16 models outperformed the other methods in terms of classification, localization, and computational time on a small amount of data, and the results indicate that these two models demonstrate superior crack detection and localization for concrete structures.
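
The classification measures named in the abstract (accuracy, precision, recall, F1-score) can be computed from the counts of true and false positives and negatives. The following is a minimal sketch for binary labels; the function name `classification_measures` and the example labels are hypothetical and are not data from the paper.

```python
def classification_measures(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels: 1 = crack, 0 = no crack.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
measures = classification_measures(y_true, y_pred)
```

In this made-up example there are three true positives, one false positive, one false negative, and three true negatives, so all four measures equal 0.75.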

A deep neural network model for packing density predictions and its application in the study of 1.5 million organic molecules

Chemical Science ◽

10.1039/c9sc02677k ◽

2019 ◽

Vol 10 (36) ◽

pp. 8374-8383 ◽

Cited By ~ 1

Author(s):

Mohammad Atif Faiz Afzal ◽

Aditya Sonpal ◽

Mojtaba Haghighatlari ◽

Andrew J. Schultz ◽

Johannes Hachmann

Keyword(s):

Neural Network ◽

Machine Learning ◽

Refractive Index ◽

High Throughput ◽

Neural Network Model ◽

High Throughput Screening ◽

Deep Neural Network ◽

Organic Molecules ◽

High Refractive Index ◽

Computational Pipeline

Computational pipeline for the accelerated discovery of organic materials with high refractive index via high-throughput screening and machine learning.

DANNP: an efficient artificial neural network pruning tool

PeerJ Computer Science ◽

10.7717/peerj-cs.137 ◽

2017 ◽

Vol 3 ◽

pp. e137 ◽

Cited By ~ 7

Author(s):

Mona Alshahrani ◽

Othman Soufan ◽

Arturo Magana-Mora ◽

Vladimir B. Bajic

Keyword(s):

Neural Network ◽

State Of The Art ◽

Model Performance ◽

Training Data ◽

Classification Problems ◽

Link Type ◽

On Line ◽

Pruning Algorithms ◽

Artificial Neural ◽

The Impact

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of an ANN is not trivial, as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to cope efficiently with the intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software, implemented in C++, to considerably reduce the running time of the ANN pruning algorithms we implemented. In addition to evaluating the performance of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up ANN pruning by up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of ANN pruning by the DANNP tool, we used 16 datasets from different domains. In eight of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99% while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to that obtained by classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows users to identify the most discriminant features of the problem at hand.
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available online tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.

Object Detection in Ground-Penetrating Radar Images Using a Deep Convolutional Neural Network and Image Set Preparation by Migration

International Journal of Geophysics ◽

10.1155/2018/9365184 ◽

2018 ◽

Vol 2018 ◽

pp. 1-8 ◽

Cited By ~ 3

Author(s):

Kazuya Ishitsuka ◽

Shinichiro Iso ◽

Kyosuke Onishi ◽

Toshifumi Matsuoka

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Ground Penetrating Radar ◽

Deep Convolutional Neural Network ◽

Training Data ◽

Neural Network Approach ◽

Radar Images ◽

Image Set ◽

Ground Penetrating ◽

Better Than

Ground-penetrating radar allows the acquisition of many images for investigating the pavement interior and shallow geological structures. Accordingly, an efficient methodology for detecting objects, such as pipes, reinforcing steel bars, and internal voids, in ground-penetrating radar images is an emerging technology. In this paper, we propose using a deep convolutional neural network to detect characteristic hyperbolic signatures from embedded objects. As a first step, we developed a migration-based method to collect a large amount of training data and created 53,510 categorized images. We then examined the accuracy of the deep convolutional neural network in detecting the signatures. The classification accuracy was 0.945–0.979 (94.5%–97.9%) when using several thousand training images and was much better than the accuracy of the conventional neural network approach. Our results demonstrate the effectiveness of the deep convolutional neural network in detecting characteristic events in ground-penetrating radar images.

THE STAG OIL FIELD FORMATION EVALUATION: A NEURAL NETWORK APPROACH

The APPEA Journal ◽

10.1071/aj98026 ◽

1999 ◽

Vol 39 (1) ◽

pp. 451 ◽

Cited By ~ 5

Author(s):

H. Crocker ◽

C.C. Fung ◽

K.W. Wong

Keyword(s):

Neural Network ◽

Back Propagation ◽

Back Propagation Neural Network ◽

Oil Field ◽

Training Data ◽

ANN Model ◽

Neural Network Approach ◽

Data Set ◽

Formation Evaluation ◽

Core Data

The producing M. australis Sandstone of the Stag Oil Field is a bioturbated glauconitic sandstone that is difficult to evaluate using conventional methods. Well log and core data are available for the Stag Field and for the nearby Centaur–1 well. Eight wells have log data; six also have core data. In the past few years, artificial intelligence has been applied to formation evaluation. In particular, artificial neural networks (ANN) used to match log and core data have been studied. The ANN approach has been used to analyse the producing Stag Field sands. In this paper, new ways of applying the ANN are reported. Results from a simple ANN approach are unsatisfactory. An integrated ANN approach comprising the unsupervised Self-Organising Map (SOM) and the supervised Back Propagation Neural Network (BPNN) appears to give a more reasonable analysis. In this case study, the mineralogical and petrophysical characteristics of a cored well are predicted from the 'training' data set of the other cored wells in the field. The prediction from the ANN model is then compared with the known core data. In this manner, the accuracy of the prediction is determined and a prediction qualifier computed. This new approach to formation evaluation should provide a match between log and core data that may be used to predict the characteristics of a similar uncored interval. Although the results for the Stag Field are satisfactory, further study applying the method to other fields is required.

Modeling of methane emissions using artificial neural network approach

Journal of the Serbian Chemical Society ◽

10.2298/jsc020414110s ◽

2015 ◽

Vol 80 (3) ◽

pp. 421-433 ◽

Cited By ~ 4

Author(s):

Lidija Stamenkovic ◽

Davor Antanasijevic ◽

Mirjana Ristic ◽

Aleksandra Peric-Grujic ◽

Viktor Pocajt

Keyword(s):

Neural Network ◽

National Level ◽

Model Performance ◽

General Regression Neural Network ◽

Annual Data ◽

ANN Model ◽

Neural Network Approach ◽

CH4 Emissions ◽

Artificial Neural ◽

MLR Model

The aim of this study was to develop a model for forecasting CH4 emissions at the national level, using Artificial Neural Networks (ANN) with broadly available sustainability, economic, and industrial indicators as their inputs. ANN modeling was performed using two different types of architecture: a Backpropagation Neural Network (BPNN) and a General Regression Neural Network (GRNN). A conventional multiple linear regression (MLR) model was also developed in order to compare model performance and assess which model provides the best results. ANN and MLR models were developed and tested using the same annual data for 20 European countries. The ANN model demonstrated very good performance, significantly better than the MLR model. It was shown that a forecast of CH4 emissions at the national level using the ANN model can be made successfully and accurately for a future period of up to two years, which opens the possibility of applying this modeling technique to support the implementation of sustainable development strategies and environmental management policies.

Development of a deep neural network for predicting 6-hour average PM<sub>2.5</sub> concentrations up to two subsequent days using various training data

10.5194/gmd-2021-356 ◽

2021 ◽

Author(s):

Jeong-Beom Lee ◽

Jae-Bum Lee ◽

Youn-Seo Koo ◽

Hee-Yong Kwon ◽

Min-Hyeok Choi ◽

...

Keyword(s):

Neural Network ◽

Air Quality ◽

Deep Neural Network ◽

Model Performance ◽

Prediction Performance ◽

Training Data ◽

Forecasting Model ◽

Observation Data ◽

Forecast Data ◽

Air Quality Forecasting

Abstract. This study aims to develop a deep neural network (DNN) model as an artificial neural network (ANN) for the prediction of 6-hour average fine particulate matter (PM2.5) concentrations for a three-day period, namely the day of prediction (D+0), one day after prediction (D+1), and two days after prediction (D+2), using observation data and forecast data obtained via numerical models. The performance of the DNN model was evaluated against that of the currently operational Community Multiscale Air Quality (CMAQ) modelling system for air quality forecasting in South Korea. In addition, the effect of using different training data on the predictive performance of the DNN model was analyzed. For the D+0 forecast, the DNN model performance was superior to that of the CMAQ model, and there was no significant dependence on the training data. For the D+1 and D+2 forecasts, the DNN model that used the observation and forecast data (DNN-ALL) outperformed the CMAQ model. The root-mean-squared error (RMSE) of DNN-ALL was lower than that of the CMAQ model by 2.2 μg m−3 and 3.0 μg m−3 for the D+1 and D+2 forecasts, respectively, because the overprediction of higher concentrations was curtailed. An IOA increase of 0.46 for the D+1 prediction and 0.59 for the D+2 prediction was observed for the DNN-ALL model compared to the IOA of the DNN model that used only observation data (DNN-OBS). In addition, an RMSE decrease of 7.2 μg m−3 for the D+1 prediction and 6.3 μg m−3 for the D+2 prediction was observed for the DNN-ALL model compared to the RMSE of DNN-OBS, indicating that the inclusion of forecast data in the training data greatly affected the DNN model performance. Considering the prediction of the 6-hour average PM2.5 concentration, the 8.8 μg m−3 RMSE of the DNN-ALL model was 2.7 μg m−3 lower than that of the CMAQ model, indicating the superior prediction performance of the former.
These results suggest that the DNN model could be utilized as a better-performing air quality forecasting model than the CMAQ, and that observation data plays an important role in determining the prediction performance of the DNN model for D+0 forecasting, while prediction data does the same for D+1 and D+2 forecasting. The use of the proposed DNN model as a forecasting model may reduce the economic losses caused by pollution-mitigation policies and aid better protection of public health.
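
The abstract compares models by RMSE and IOA. The following is a minimal sketch of how these two scores can be computed, assuming that IOA refers to Willmott's index of agreement (the abstract does not expand the acronym). The function names and the concentration values are hypothetical and are not data from the study.

```python
import math


def rmse(observed, predicted):
    """Root-mean-squared error between paired observations and predictions."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)


def index_of_agreement(observed, predicted):
    """Willmott's index of agreement (assumed meaning of IOA); 1.0 is a perfect match."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2
        for o, p in zip(observed, predicted)
    )
    if potential_error == 0:
        return float("nan")  # undefined when all values equal the observed mean
    return 1.0 - squared_error / potential_error


# Made-up 6-hour average PM2.5 values in μg m−3, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
error = rmse(observed, predicted)
agreement = index_of_agreement(observed, predicted)
```

A lower RMSE and an IOA closer to 1 both indicate predictions that track the observations more closely, which is the sense in which the abstract reports RMSE decreases and IOA increases.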

Graph Neural Networks Bootstrapped for Synthetic Selection and Validation of Small Molecule Immunomodulators

10.33774/chemrxiv-2021-r4xnx-v2 ◽

2021 ◽

Author(s):

Prageeth R. Wijewardhane ◽

Krupal P. Jethava ◽

Jonathan A Fine ◽

Gaurav Chopra

Keyword(s):

Neural Network ◽

Machine Learning ◽

Small Molecule ◽

Model Performance ◽

Cost Effective ◽

Bioactive Compound ◽

Binding Pocket ◽

Chemical Diversity ◽

Training Data ◽

Kappa Score

The Programmed Cell Death Protein 1/Programmed Death-Ligand 1 (PD-1/PD-L1) interaction is an immune checkpoint utilized by cancer cells to enhance immune suppression. There is a pressing need to develop small molecule drugs that are fast acting, cost effective, and readily bioavailable compared to antibodies. Unfortunately, synthesizing and validating large libraries of small molecules to inhibit the PD-1/PD-L1 interaction in a blind manner is both time-consuming and expensive. To improve this drug discovery pipeline, we have developed a machine learning methodology trained on patent data to identify, synthesize, and validate PD-1/PD-L1 small molecule inhibitors. Our model incorporates two features: docking scores to represent the energy of binding (E) as a global feature, and sub-graph features through a graph neural network (GNN) of molecular topology to represent local features. This interaction energy-based Graph Neural Network (EGNN) model outperforms traditional machine learning methods and a simple GNN, with an F1 score of 0.9524 and a Cohen’s kappa score of 0.8861 for the hold-out test set, suggesting that the topology of the small molecule, the structural interaction in the binding pocket, and the chemical diversity of the training data are all important considerations for enhancing model performance. A bootstrapped EGNN model was used to select compounds with predicted high and low potency to inhibit the PD-1/PD-L1 interaction for synthesis and experimental validation. The potent inhibitor, (4-((3-(2,3-dihydrobenzo[b][1,4]dioxin-6-yl)-2-methylbenzyl)oxy)-2,6-dimethoxybenzyl)-D-serine, is a hybrid of two known bioactive scaffolds, with an IC50 of 339.9 nM that is comparatively better than the known bioactive compound. We conclude that our bootstrapped EGNN model will be useful for identifying target-specific high-potency molecules designed by scaffold hopping, a well-known medicinal chemistry technique.
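
Cohen's kappa, reported in the abstract alongside the F1 score, corrects the observed agreement between predicted and true labels for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that calculation; the function name `cohens_kappa` and the labels are hypothetical and are not data from the study.

```python
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Cohen's kappa: chance-corrected agreement between two label sequences."""
    n = len(y_true)
    # p_o: fraction of items on which the two sequences agree.
    observed = sum(1 for t, p in zip(y_true, y_pred) if t == p) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # p_e: agreement expected if labels were assigned independently
    # with the same marginal frequencies.
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return float("nan")  # undefined when chance agreement is already total
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels: 1 = active inhibitor, 0 = inactive.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
kappa = cohens_kappa(y_true, y_pred)
```

In this made-up example the observed agreement is 0.75 and the chance agreement is 0.5, giving a kappa of 0.5. A value of 1 would mean perfect agreement and 0 would mean agreement no better than chance.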
