Graph Neural Networks Bootstrapped for Synthetic Selection and Validation of Small Molecule Immunomodulators

Combined Molecular Graph Neural Network and Structural Docking Selects Potent Programmable Cell Death Protein 1/Programmable Death-Ligand 1 (PD-1/PD-L1) Small Molecule Inhibitors

10.26434/chemrxiv.12083907.v1 ◽

2020 ◽

Cited By ~ 1

Author(s):

Prageeth R. Wijewardhane ◽

Krupal P. Jethava ◽

Jonathan A Fine ◽

Gaurav Chopra

Keyword(s):

Neural Network ◽

Machine Learning ◽

Cell Death ◽

Small Molecule ◽

Small Molecule Inhibitors ◽

Molecular Graph ◽

Model Performance ◽

Chemical Diversity ◽

Training Data ◽

Cell Death Protein

The Programmable Cell Death Protein 1/Programmable Death-Ligand 1 (PD-1/PD-L1) interaction is an immune checkpoint utilized by cancer cells to enhance immune suppression. There is a great need to develop small-molecule drugs that are fast acting, cheap, and readily bioavailable compared to antibodies. Unfortunately, synthesizing and validating large libraries of small molecules to inhibit the PD-1/PD-L1 interaction in a blind manner is both time-consuming and expensive. To improve this drug discovery pipeline, we have developed a machine learning methodology trained on patent data to identify, synthesize, and validate PD-1/PD-L1 small molecule inhibitors. Our model incorporates two features: docking scores to represent the energy of binding (E) as a global feature, and sub-graph features through a graph neural network (GNN) to represent local features. This Energy-Graph Neural Network (EGNN) model outperforms traditional machine learning methods as well as a simple GNN, with an average F1 score of 0.997 (± 0.004), suggesting that the topology of the small molecule, the structural interaction in the binding pocket, and the chemical diversity of the training data are all important considerations for enhancing model performance. A bootstrapped EGNN model was used to select compounds with predicted high and low potency to inhibit the PD-1/PD-L1 interaction for synthesis and experimental validation. The new potent inhibitor, (4-((3-(2,3-dihydrobenzo[b][1,4]dioxin-6-yl)-2-methylbenzyl)oxy)-2,6-dimethoxybenzyl)-D-serine, is a hybrid of two known bioactive scaffolds and has an IC50 value of 339.9 nM, which is comparatively better than that of the known bioactive compound. We conclude that our EGNN model can identify active molecules designed by scaffold hopping, a well-known medicinal chemistry technique, and will be useful for identifying new potent small molecule inhibitors for specific targets.
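The idea of pairing a global docking-energy feature with local graph features can be illustrated with a minimal sketch. This is not the authors' EGNN architecture: the single round of sum aggregation, the mean-pool readout, the function names, and the toy three-atom graph are all assumptions chosen for illustration.

```python
from typing import Dict, List


def message_pass(features: Dict[int, List[float]],
                 adjacency: Dict[int, List[int]]) -> Dict[int, List[float]]:
    """One round of sum aggregation: each atom adds its neighbours' features to its own."""
    updated = {}
    for atom, own in features.items():
        total = list(own)
        for neighbour in adjacency.get(atom, []):
            total = [t + f for t, f in zip(total, features[neighbour])]
        updated[atom] = total
    return updated


def readout(features: Dict[int, List[float]]) -> List[float]:
    """Mean-pool per-atom vectors into one graph-level (local feature) vector."""
    n_atoms = len(features)
    dim = len(next(iter(features.values())))
    return [sum(vec[i] for vec in features.values()) / n_atoms for i in range(dim)]


def energy_graph_vector(features: Dict[int, List[float]],
                        adjacency: Dict[int, List[int]],
                        docking_score: float) -> List[float]:
    """Concatenate pooled graph features (local) with a docking score (global)."""
    return readout(message_pass(features, adjacency)) + [docking_score]


# Hypothetical toy molecule: a C-C-O chain with one-hot features [is_carbon, is_oxygen].
toy_features = {0: [1.0, 0.0], 1: [1.0, 0.0], 2: [0.0, 1.0]}
toy_adjacency = {0: [1], 1: [0, 2], 2: [1]}
toy_vector = energy_graph_vector(toy_features, toy_adjacency, docking_score=-7.5)
```

In a trained model the combined vector would feed a classifier; here it only shows where the two kinds of feature meet.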

Machine Learning Parameterization of Mature Tropical Cyclone Boundary Layer

10.5194/egusphere-egu21-9333 ◽

2021 ◽

Author(s):

Le-Yi Wang ◽

Zhe-Min Tan

Keyword(s):

Neural Network ◽

Machine Learning ◽

Boundary Layer ◽

Tropical Cyclone ◽

Field Experiments ◽

Mesoscale Model ◽

Model Performance ◽

Training Data ◽

Extreme Condition ◽

Resolution Data

Tropical cyclones (TCs) are among the most destructive weather phenomena on Earth, and their structure and intensity are strongly modulated by the TC boundary layer. Mesoscale models used for TC research and prediction must rely on boundary layer parameterization because of their low spatial resolution. These boundary layer schemes are mostly developed from field experiments under moderate wind speeds, and they often underestimate the influence of shear-driven rolls and turbulence. When they are applied under extreme conditions such as the TC boundary layer, significant bias is unavoidable. In this study, a novel machine learning model, a one-dimensional convolutional neural network (1D-CNN), is proposed to tackle the TC boundary layer parameterization dilemma. The 1D-CNN saves about half of the learnable parameters and achieves a steady improvement compared to a fully connected neural network. TC large eddy simulation outputs, which show strong skewness in the calculated turbulent fluxes, are used as training data for the 1D-CNN. The data skewness problem is alleviated in order to reduce 1D-CNN model bias. An offline TC boundary layer test shows that the proposed 1D-CNN performs significantly better than popular schemes now used in TC simulations. Model performance across different scales is essential to the final application. It is found that high-resolution data contain the information of low-resolution data but not vice versa. The model performance on the extreme data is key to the final performance on the whole dataset. Training the model on the highest-resolution non-extreme data plus extreme data of different resolutions can secure robust performance across different scales.
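A convolutional layer can need fewer learnable parameters than a fully connected one because it reuses the same kernel at every vertical level. The sketch below counts parameters for a single layer of each kind. The layer sizes (64 levels, kernel size 3, 16 channels) are hypothetical and are not taken from the study; the "about half" saving reported above refers to the authors' full networks, which this arithmetic does not reproduce.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


def conv1d_params(kernel_size: int, channels_in: int, channels_out: int) -> int:
    """Weights plus biases of a 1D convolution; independent of the input length."""
    return kernel_size * channels_in * channels_out + channels_out


# Hypothetical column of 64 vertical levels with a single input variable.
fc_count = dense_params(64, 64)          # 64 * 64 + 64 = 4160
conv_count = conv1d_params(3, 1, 16)     # 3 * 1 * 16 + 16 = 64
```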

Performance Evaluation of Deep CNN-Based Crack Detection and Localization Techniques for Concrete Structures

Sensors ◽

10.3390/s21051688 ◽

2021 ◽

Vol 21 (5) ◽

pp. 1688

Author(s):

Luqman Ali ◽

Fady Alnajjar ◽

Hamad Al Jassmi ◽

Munkhjargal Gochoo ◽

Wasif Khan ◽

...

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Crack Detection ◽

Concrete Structures ◽

Model Performance ◽

Training Data ◽

Computational Time ◽

Data Heterogeneity ◽

Public Datasets ◽

Detection And Localization

This paper proposes a customized convolutional neural network for crack detection in concrete structures. The proposed method is compared to four existing deep learning methods based on training data size, data heterogeneity, network complexity, and the number of epochs. The performance of the proposed convolutional neural network (CNN) model is evaluated and compared to pretrained networks, i.e., the VGG-16, VGG-19, ResNet-50, and Inception V3 models, on eight datasets of different sizes, created from two public datasets. For each model, the evaluation considered computational time, crack localization results, and classification measures, e.g., accuracy, precision, recall, and F1-score. Experimental results demonstrated that training data size and heterogeneity among data samples significantly affect model performance. All models demonstrated promising performance on a limited number of diverse training data; however, increasing the training data size and reducing diversity reduced generalization performance, and led to overfitting. The proposed customized CNN and VGG-16 models outperformed the other methods in terms of classification, localization, and computational time on a small amount of data, and the results indicate that these two models demonstrate superior crack detection and localization for concrete structures.
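The classification measures named above follow directly from confusion-matrix counts. The sketch below uses the standard definitions; the function name and the example counts are illustrative and do not come from the paper.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for a crack / no-crack classifier evaluated on 200 image patches.
example_metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
```

Reporting F1 alongside accuracy matters when crack and non-crack patches are unevenly represented, since accuracy alone can hide poor recall on the rarer class.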

Possibility of Autonomous Estimation of Shiba Goat’s Estrus and Non-Estrus Behavior by Machine Learning Methods

Animals ◽

10.3390/ani10050771 ◽

2020 ◽

Vol 10 (5) ◽

pp. 771

Author(s):

Toshiya Arakawa

Keyword(s):

Neural Network ◽

Machine Learning ◽

Random Forest ◽

Markov Models ◽

Tracking System ◽

Video Tracking ◽

Training Data ◽

Support Vector ◽

Learning Methods ◽

Machine Learning Methods

Mammalian behavior is typically monitored by observation. However, direct observation requires a substantial amount of effort and time if the number of mammals to be observed is large or if the observation is conducted for a prolonged period. In this study, machine learning methods such as hidden Markov models (HMMs), random forests, support vector machines (SVMs), and neural networks were applied to estimate whether a goat is in estrus based on the goat’s behavior, and the adequacy of each method was verified. The goats’ tracking data were obtained using a video tracking system and used to estimate whether the goats, which were in “estrus” or “non-estrus”, were in either of two states: “approaching the male” or “standing near the male”. Overall, the percentage concordance (PC) of random forest appears to be the highest. However, its PC for goats other than those whose data were used in the training data sets is relatively low, which suggests that random forest tends to overfit the training data. Apart from random forest, the PCs of HMMs and SVMs are high. Considering the calculation time and the HMM’s advantage of being a time series model, the HMM is the better method. The PC of the neural network is low overall; however, if more goat data were acquired, a neural network could be an adequate method for estimation.
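The abstract does not define percentage concordance (PC). A minimal sketch, assuming PC is the percentage of observations for which the predicted label matches the observed one, could look as follows; the function name and the example labels are hypothetical.

```python
from typing import Sequence


def percentage_concordance(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Assumed definition: share of matching labels, expressed as a percentage."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("at least one observation is required")
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return 100.0 * matches / len(observed)


# Made-up labels for four observation windows of one goat.
pc_example = percentage_concordance(
    ["estrus", "estrus", "non-estrus", "estrus"],
    ["estrus", "non-estrus", "non-estrus", "estrus"],
)
```

Computing this separately for goats inside and outside the training set is what exposes the overfitting described above.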

Seasonal Prediction of Summer Precipitation in the Middle and Lower Reaches of the Yangtze River Valley: Comparison of Machine Learning and Climate Model Predictions

Water ◽

10.3390/w13223294 ◽

2021 ◽

Vol 13 (22) ◽

pp. 3294

Author(s):

Chentao He ◽

Jiangfeng Wei ◽

Yuanyuan Song ◽

Jing-Jia Luo

Keyword(s):

Neural Network ◽

Machine Learning ◽

Yangtze River ◽

Climate Models ◽

Summer Precipitation ◽

Yangtze River Valley ◽

River Valley ◽

Training Data ◽

The Yangtze River ◽

Learning Methods

The middle and lower reaches of the Yangtze River valley (YRV), which are among the most densely populated regions in China, are subject to frequent flooding. In this study, the predictor importance analysis model was used to sort and select predictors, and five methods (multiple linear regression (MLR), decision tree (DT), random forest (RF), backpropagation neural network (BPNN), and convolutional neural network (CNN)) were used to predict the interannual variation of summer precipitation over the middle and lower reaches of the YRV. Predictions from eight climate models were used for comparison. Of the five tested methods, RF demonstrated the best predictive skill. Starting the RF prediction in December, when its prediction skill was highest, the 70-year correlation coefficient from cross validation of average predictions was 0.473. Using the same five predictors in December 2019, the RF model successfully predicted the YRV wet anomaly in summer 2020, although it had weaker amplitude. It was found that the enhanced warm pool area in the Indian Ocean was the most important causal factor. The BPNN and CNN methods demonstrated the poorest performance. The RF, DT, and climate models all showed higher prediction skills when the predictions start in winter than in early spring, and the RF, DT, and MLR methods all showed better prediction skills than the numerical climate models. Lack of training data was a factor that limited the performance of the machine learning methods. Future studies should use deep learning methods to take full advantage of the potential of ocean, land, sea ice, and other factors for more accurate climate predictions.
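The prediction skill quoted above is a correlation coefficient between predicted and observed precipitation. As a small illustration, the Pearson correlation between two equally long series can be computed as below. This covers only the metric, not the authors' cross-validation procedure, and the function name is an assumption.

```python
import math
from typing import Sequence


def pearson_correlation(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two series of equal length with at least two values")
    mean_p = sum(predicted) / len(predicted)
    mean_o = sum(observed) / len(observed)
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_p == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_o)
```

In a cross-validated setting, each year's prediction would come from a model fitted without that year, and the coefficient would be computed over the resulting out-of-sample predictions.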

DANNP: an efficient artificial neural network pruning tool

PeerJ Computer Science ◽

10.7717/peerj-cs.137 ◽

2017 ◽

Vol 3 ◽

pp. e137 ◽

Cited By ~ 7

Author(s):

Mona Alshahrani ◽

Othman Soufan ◽

Arturo Magana-Mora ◽

Vladimir B. Bajic

Keyword(s):

Neural Network ◽

State Of The Art ◽

Model Performance ◽

Training Data ◽

Classification Problems ◽

Link Type ◽

On Line ◽

Pruning Algorithms ◽

Artificial Neural ◽

The Impact

Background: Artificial neural networks (ANNs) are a robust class of machine learning models and are a frequent choice for solving classification problems. However, determining the structure of an ANN is not trivial, as a large number of weights (connection links) may lead to overfitting the training data. Although several ANN pruning algorithms have been proposed for the simplification of ANNs, these algorithms are not able to efficiently cope with the intricate ANN structures required for complex classification problems. Methods: We developed DANNP, a web-based tool that implements parallelized versions of several ANN pruning algorithms. The DANNP tool uses a modified version of the Fast Compressed Neural Network software implemented in C++ to considerably reduce the running time of the ANN pruning algorithms we implemented. In addition to the performance evaluation of the pruned ANNs, we systematically compared the set of features that remained in the pruned ANN with those obtained by different state-of-the-art feature selection (FS) methods. Results: Although the ANN pruning algorithms are not entirely parallelizable, DANNP was able to speed up the ANN pruning up to eight times on a 32-core machine, compared to the serial implementations. To assess the impact of the ANN pruning by the DANNP tool, we used 16 datasets from different domains. In eight out of the 16 datasets, DANNP significantly reduced the number of weights by 70%–99%, while maintaining a competitive or better model performance compared to the unpruned ANN. Finally, we used a naïve Bayes classifier derived with the features selected as a byproduct of the ANN pruning and demonstrated that its accuracy is comparable to those obtained by the classifiers trained with the features selected by several state-of-the-art FS methods. The FS ranking methodology proposed in this study allows users to identify the most discriminant features of the problem at hand.
To the best of our knowledge, DANNP (publicly available at www.cbrc.kaust.edu.sa/dannp) is the only available and online-accessible tool that provides multiple parallelized ANN pruning options. Datasets and DANNP code can be obtained at www.cbrc.kaust.edu.sa/dannp/data.php and https://doi.org/10.5281/zenodo.1001086.
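To make the notion of removing a fraction of the weights concrete, here is a minimal sketch of magnitude-based pruning. This is a deliberately simple stand-in and is not one of the pruning algorithms implemented in DANNP, which the abstract does not name; the function name and the example weights are hypothetical.

```python
from typing import List


def prune_by_magnitude(weights: List[float], fraction: float) -> List[float]:
    """Set the given fraction of weights with the smallest absolute value to zero."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    n_prune = round(len(weights) * fraction)
    # Indices ordered from smallest to largest absolute weight.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Made-up weights of a tiny layer; prune 70% of them.
layer = [0.9, -0.05, 0.3, 0.01, -0.7, 0.02, 0.1, -0.004, 0.6, 0.03]
pruned_layer = prune_by_magnitude(layer, 0.7)
```

Input features whose outgoing weights are all zero after pruning no longer influence the network, which is the sense in which pruning yields a feature selection as a byproduct.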

Rapid identification of wood species using XRF and neural network machine learning

Scientific Reports ◽

10.1038/s41598-021-96850-2 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Aaron N. Shugar ◽

B. Lee Drake ◽

Greg Kelley

Keyword(s):

Neural Network ◽

Machine Learning ◽

Convolutional Neural Network ◽

Wood Species ◽

Cost Effective ◽

Rapid Identification ◽

Invasive Technique ◽

X Ray ◽

Non Invasive ◽

Cost Effective Alternative

An innovative approach for the rapid identification of wood species is presented. By combining X-ray fluorescence spectrometry with convolutional neural network machine learning, 48 different wood specimens were clearly differentiated and identified with a 99% accuracy. Wood species identification is imperative to assess illegally logged and transported lumber. Alternative options for identification can be time consuming and require some level of sampling. This non-invasive technique offers a viable, cost-effective alternative to rapidly and accurately identify timber in efforts to support environmental protection laws and regulations.

Lightweight Convolutional Neural Network Based Intrusion Detection System

Journal of Communications ◽

10.12720/jcm.15.11.808-817 ◽

2020 ◽

pp. 808-817

Author(s):

Vinh Pham ◽

Eunil Seo ◽

Tai-Myoung Chung

Keyword(s):

Neural Network ◽

Machine Learning ◽

Intrusion Detection ◽

Convolutional Neural Network ◽

Network Traffic ◽

Detection System ◽

Image Data ◽

Training Data ◽

Deep Packet Inspection ◽

Detection Model

Identifying threats contained within encrypted network traffic poses a great challenge to Intrusion Detection Systems (IDS). Because traditional approaches such as deep packet inspection cannot operate on encrypted network traffic, machine learning-based IDS is a promising solution. However, machine learning-based IDS requires enormous amounts of statistical data based on network traffic flow as input, demands high computing power for processing, and is still slow in detecting intrusions. We propose a lightweight IDS that transforms raw network traffic into image representations. We begin by inspecting the characteristics of malicious network traffic in the CSE-CIC-IDS2018 dataset. We then adapt methods for effectively representing those characteristics as image data. A Convolutional Neural Network (CNN) based detection model is used to identify malicious traffic within the image data. To demonstrate the feasibility of the proposed lightweight IDS, we conduct three simulations on two datasets that contain encrypted traffic with current network attack scenarios. The experimental results show that our proposed IDS can achieve 95% accuracy with a reasonable detection time while requiring a relatively small amount of training data.
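The abstract does not specify how traffic is mapped to images. One simple possibility, shown here purely as an assumption, is to treat each byte of a raw payload as a grayscale pixel value (0 to 255) and lay the bytes out row by row, padding the last row with zeros. The function name and the fixed-width layout are hypothetical.

```python
from typing import List


def bytes_to_image(payload: bytes, width: int) -> List[List[int]]:
    """Arrange raw bytes as rows of grayscale pixel values, zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    n_rows = -(-len(payload) // width)  # ceiling division
    padded = payload + bytes(n_rows * width - len(payload))
    return [list(padded[r * width:(r + 1) * width]) for r in range(n_rows)]


# Five made-up payload bytes arranged into an image that is two pixels wide.
image = bytes_to_image(b"\x01\x02\x03\x04\x05", width=2)
```

A fixed image width keeps the input shape constant for a CNN regardless of packet length, at the cost of padding or truncating individual payloads.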

Classification among Microaneurysms, Exudates, and Lesion free Retinal Regions in the Eye Images using Transfer Learned CNNs

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4539.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 5508-5512

Keyword(s):

Neural Network ◽

Machine Learning ◽

Diabetic Retinopathy ◽

Glucose Level ◽

Vision Loss ◽

Early Stage ◽

Training Data ◽

Fundus Images ◽

Diabetic Mellitus ◽

Start Process

When the pancreas fails to secrete sufficient insulin, the blood glucose level becomes either too high or too low. This fluctuation in glucose level affects different organs such as the kidneys, brain, and eyes. When complications appear in the eyes due to Diabetes Mellitus (DM), the condition is called Diabetic Retinopathy (DR). DR can be categorized into several classes based on severity: Microaneurysms (ME), Haemorrhages (HE), and Hard and Soft Exudates (EX and SE). DR progresses slowly: it starts with very mild symptoms, becomes moderate with time, and results in complete vision loss if not detected in time. Early-stage detection may greatly help prevent vision loss. However, it is impossible to detect the symptoms of DR with the naked eye. Ophthalmologists resort to several approaches and algorithms that use different Machine Learning (ML) methods and classifiers to address this disease. The growing popularity of Convolutional Neural Networks (CNNs) and their advances in extracting features from fundus images have attracted several researchers to this problem. Transfer Learning (TL) techniques make it possible to use a pre-trained CNN on a dataset that has limited training data, especially in developing countries. In this work, we propose several CNN architectures along with distinct classifiers that separate the different lesions (ME and EX) in DR images with notably high accuracy.
