kGCN: A Graph-Based Deep Learning Framework for Chemical Structures

Deep learning is emerging as an important technology for a range of cheminformatics tasks. In particular, graph convolutional neural networks (GCNs) have been reported to perform well in many types of molecular prediction tasks. Although GCNs show considerable potential in various applications, using them to obtain reasonable and reliable predictions requires a thorough understanding of both GCNs and programming. To make GCNs accessible to users ranging from chemists to cheminformaticians, an open-source GCN tool, kGCN, is introduced. To support users with different levels of programming skill, kGCN offers three interfaces: a graphical user interface (GUI) built on KNIME for users with limited programming skills, such as chemists, and command-line and Python library interfaces for users with advanced programming skills, such as cheminformaticians. To support the three steps required to build a prediction model (pre-processing, model tuning, and interpretation of results), kGCN includes typical pre-processing functions, Bayesian optimization for automatic model tuning, and visualization of atomic contributions to the prediction. kGCN supports three types of approaches: single-task, multi-task, and multi-modal prediction. As a representative case study, kGCN is used to predict compound-protein interactions for four matrix metalloproteases (MMP-3, -9, -12, and -13) in inhibition assays. The visualization of atomic contributions is useful for validating prediction models and for designing molecules based on them, realizing “explainable AI” for understanding the factors that affect AI predictions. kGCN is available at https://github.com/clinfo/kGCN.
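
To make the idea of a graph convolution over a molecule concrete, the following is a minimal sketch of one generic GCN layer (the symmetric-normalization form, ReLU(D^-1/2 (A + I) D^-1/2 X W)). It is not kGCN's API or necessarily its exact layer definition; the function names, the toy three-atom molecule, the one-hot atom features, and the weight values are all assumptions chosen for illustration.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (self-loops added)."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def graph_conv(adj, features, weights):
    """One graph convolution: mix each atom with its neighbours, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weights)]

# Hypothetical toy molecule: three heavy atoms in a chain (C-C-O).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
# Hypothetical one-hot atom features: [is_carbon, is_oxygen].
features = [[1, 0], [1, 0], [0, 1]]
# Arbitrary illustrative weights; in practice these are learned.
weights = [[0.5, -0.2], [0.1, 0.3]]

hidden = graph_conv(adj, features, weights)  # one hidden vector per atom
```

Each row of `hidden` is an atom-level representation; a molecule-level prediction would typically pool these rows (for example by summing) before a final output layer.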

kGCN: A Graph-Based Deep Learning Framework for Chemical Structures

10.26434/chemrxiv.11859684 ◽ 2020

Author(s): Ryosuke Kojima ◽ Shoichi Ishida ◽ Masateru Ohta ◽ Hiroaki Iwata ◽ Teruki Honma ◽ ...

Keyword(s): Deep Learning ◽ Prediction Model ◽ Prediction Models ◽ Bayesian Optimization ◽ Factors Affecting ◽ Learning Framework ◽ Chemical Structures ◽ Programming Skills ◽ Model Tuning ◽ Interpretation Of Results

kGCN: A graph-based deep learning framework for chemical structures

10.21203/rs.2.23904/v1 ◽ 2020

Author(s): Ryosuke Kojima ◽ Shoichi Ishida ◽ Masateru Ohta ◽ Hiroaki Iwata ◽ Teruki Honma ◽ ...

Keyword(s): Deep Learning ◽ Prediction Model ◽ Prediction Models ◽ Bayesian Optimization ◽ Factors Affecting ◽ Learning Framework ◽ Chemical Structures ◽ Programming Skills ◽ Model Tuning ◽ Interpretation Of Results

Abstract Deep learning is emerging as an important technology for a range of cheminformatics tasks. In particular, graph convolutional neural networks (GCNs) have been reported to perform well in many types of molecular prediction tasks. Although GCNs show considerable potential in various applications, using them to obtain reasonable and reliable predictions requires a thorough understanding of both GCNs and programming. To make GCNs accessible to users ranging from chemists to cheminformaticians, an open-source GCN tool, kGCN, is introduced. To support users with different levels of programming skill, kGCN offers three interfaces: a graphical user interface (GUI) built on KNIME for users with limited programming skills, such as chemists, and command-line and Python library interfaces for users with advanced programming skills, such as cheminformaticians. To support the three steps required to build a prediction model (pre-processing, model tuning, and interpretation of results), kGCN includes typical pre-processing functions, Bayesian optimization for automatic model tuning, and visualization of atomic contributions to the prediction. kGCN supports three types of approaches: single-task, multi-task, and multi-modal prediction. As a representative case study, kGCN is used to predict compound-protein interactions for four matrix metalloproteases (MMP-3, -9, -12, and -13) in inhibition assays. The visualization of atomic contributions is useful for validating prediction models and for designing molecules based on them, realizing “explainable AI” for understanding the factors that affect AI predictions. kGCN is available at https://github.com/clinfo/kGCN.

Enabling deeper learning on big data for materials informatics applications

Scientific Reports ◽ 10.1038/s41598-021-83193-1 ◽ 2021 ◽ Vol 11 (1)

Author(s): Dipendra Jha ◽ Vishu Gupta ◽ Logan Ward ◽ Zijiang Yang ◽ Christopher Wolverton ◽ ...

Keyword(s): Neural Networks ◽ Big Data ◽ Deep Learning ◽ Deep Neural Networks ◽ Materials Science ◽ Prediction Models ◽ Model Performance ◽ Materials Informatics ◽ Learning Framework ◽ Significant Attention

Abstract The application of machine learning (ML) techniques in materials science has attracted significant attention in recent years because of their ability to efficiently extract data-driven linkages from various input materials representations to their output properties. While traditional ML techniques have become ubiquitous, applications of more advanced deep learning (DL) techniques have been limited, primarily because big materials datasets are relatively rare. Given the demonstrated potential of DL and the increasing availability of big materials datasets, it is attractive to use deeper neural networks to boost model performance; in practice, however, simply adding depth degrades performance because of the vanishing gradient problem. In this paper, we address the question of how to enable deeper learning for cases where big materials data is available. We present a general deep learning framework based on Individual Residual learning (IRNet), composed of very deep neural networks that can take any vector-based materials representation as input to build accurate property prediction models. We find that the proposed IRNet models not only alleviate the vanishing gradient problem and enable deeper learning, but also achieve significantly (up to 47%) better model accuracy than plain deep neural networks and traditional ML techniques for a given input materials representation in the presence of big data.
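
The residual idea behind this abstract can be sketched as follows. This is a simplified illustration, not the IRNet implementation: it assumes that "individual" residual learning means a shortcut connection around each single fully connected layer, it omits batch normalization and training, and all function names are hypothetical. Input and output widths must match so the shortcut can be added.

```python
def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(values):
    return [max(0.0, v) for v in values]

def residual_block(x, weights, bias):
    """Return ReLU(W x + b) + x; the '+ x' term is the shortcut connection."""
    return [h + xi for h, xi in zip(relu(dense(x, weights, bias)), x)]

def deep_residual_net(x, layers):
    """Apply one residual block per (weights, bias) pair."""
    for weights, bias in layers:
        x = residual_block(x, weights, bias)
    return x

# With all-zero weights a plain 50-layer stack would output zeros,
# but the shortcuts carry the input through unchanged.
x = [0.2, -1.0, 0.7]
zero_layer = ([[0.0] * 3 for _ in range(3)], [0.0] * 3)
output = deep_residual_net(x, [zero_layer] * 50)
```

Because each block only has to learn a correction to its input, the signal (and, during training, the gradient) has a direct path through every layer, which is the usual explanation for why residual networks tolerate much greater depth.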

Molecular Image-Based Prediction Models of Nuclear Receptor Agonists and Antagonists Using the DeepSnap-Deep Learning Approach with the Tox21 10K Library

Molecules ◽ 10.3390/molecules25122764 ◽ 2020 ◽ Vol 25 (12) ◽ pp. 2764

Author(s): Yasunari Matsuzaka ◽ Yoshihiro Uesawa

Keyword(s): Deep Learning ◽ Signaling Pathways ◽ High Performance ◽ Prediction Models ◽ Environmental Chemicals ◽ Adverse Outcome Pathway ◽ Toxicological Evaluation ◽ Chemical Structures ◽ Molecular Image ◽ Wide Range

The interaction of nuclear receptors (NRs) with chemical compounds can cause dysregulation of endocrine signaling pathways, leading to adverse health outcomes due to the disruption of natural hormones. Thus, identifying possible ligands of NRs is a crucial task for understanding the adverse outcome pathway (AOP) for human toxicity as well as the development of novel drugs. However, the experimental assessment of novel ligands remains expensive and time-consuming. Therefore, an in silico approach with a wide range of applications instead of experimental examination is highly desirable. The recently developed novel molecular image-based deep learning (DL) method, DeepSnap-DL, can produce multiple snapshots from three-dimensional (3D) chemical structures and has achieved high performance in the prediction of chemicals for toxicological evaluation. In this study, we used DeepSnap-DL to construct prediction models of 35 agonist and antagonist allosteric modulators of NRs for chemicals derived from the Tox21 10K library. We demonstrate the high performance of DeepSnap-DL in constructing prediction models. These findings may aid in interpreting the key molecular events of toxicity and support the development of new fields of machine learning to identify environmental chemicals with the potential to interact with NR signaling pathways.

kGCN: a graph-based deep learning framework for chemical structures

Journal of Cheminformatics ◽ 10.1186/s13321-020-00435-6 ◽ 2020 ◽ Vol 12 (1)

Author(s): Ryosuke Kojima ◽ Shoichi Ishida ◽ Masateru Ohta ◽ Hiroaki Iwata ◽ Teruki Honma ◽ ...

Keyword(s): Deep Learning ◽ Learning Framework ◽ Chemical Structures

Construction of a prediction model for drug removal rate in hemodialysis based on chemical structures

Molecular Diversity ◽ 10.1007/s11030-021-10348-7 ◽ 2022

Author(s): Kousuke Nishikiori ◽ Kentaro Tanaka ◽ Yoshihiro Uesawa

Keyword(s): Prediction Model ◽ Model Validation ◽ Prediction Models ◽ Removal Rate ◽ Structural Characteristics ◽ Estimation Method ◽ Quantitative Structure Activity Relationship ◽ Pharmacokinetic Parameters ◽ Qsar Analysis ◽ Chemical Structures

Abstract In designing drug dosing for hemodialysis patients, the removal rate (RR) of the drug by hemodialysis is important. However, acquiring the RR is difficult, and there is a need for an estimation method that can be used in clinical settings. In this study, the RR predictive model was constructed using the RR of known drugs by quantitative structure–activity relationship (QSAR) analysis. Drugs were divided into a model construction drug set (75%) and a model validation drug set (25%). The RR was collected from 143 medicines. The objective variable (RR) and chemical structural characteristics (descriptors) of the drug (explanatory variable) were used to construct a prediction model using partial least squares (PLS) regression and artificial neural network (ANN) analyses. The determination coefficients in the PLS and ANN methods were 0.586 and 0.721 for the model validation drug set, respectively. QSAR analysis successfully constructed dialysis RR prediction models that were comparable or superior to those using pharmacokinetic parameters. Considering that the RR dataset contains potential errors, we believe that this study has achieved the most reliable RR prediction accuracy currently available. These predictive RR models can be achieved using only the chemical structure of the drug. This model is expected to be applied at the time of hemodialysis.
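
For readers unfamiliar with the determination coefficient reported above, here is a minimal sketch of one common definition, R^2 = 1 - SS_res / SS_tot, evaluated on held-out data. The abstract does not state which definition the authors used (some studies report the squared correlation instead), so this is an assumption, and the numbers below are made-up illustrative values, not data from the study.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up validation-set values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
perfect_model = [1.0, 2.0, 3.0, 4.0]   # reproduces every observation
mean_model = [2.5, 2.5, 2.5, 2.5]      # always predicts the mean

score_perfect = r_squared(observed, perfect_model)
score_mean = r_squared(observed, mean_model)
```

Under this definition a value of 1 means the predictions match the observations exactly, and a value of 0 means the model does no better than always predicting the mean of the validation set.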

Towards Inferring Nanopore Sequencing Ionic Currents from Nucleotide Chemical Structures

10.1101/2020.11.30.404947 ◽ 2020

Author(s): Hongxu Ding ◽ Ioannis Anastopoulos ◽ Andrew D. Bailey ◽ Joshua Stuart ◽ Benedict Paten

Keyword(s): Deep Learning ◽ De Novo ◽ Ionic Currents ◽ Chemical Information ◽ Nanopore Sequencing ◽ Convolutional Network ◽ Methyl Group ◽ Learning Framework ◽ Chemical Structures

Abstract The characteristic ionic currents of nucleotide k-mers are commonly used in analyzing nanopore sequencing readouts. We present a graph convolutional network-based deep learning framework for predicting k-mer characteristic ionic currents from the corresponding chemical structures. We show that such a framework can generalize the chemical information of the 5-methyl group from thymine to cytosine by correctly predicting the currents of 5-methylcytosine-containing DNA 6-mers, thus shedding light on the de novo detection of nucleotide modifications.

Role of FCBF Feature Selection in Educational Data Mining

Mehran University Research Journal of Engineering and Technology ◽ 10.22581/muet1982.2004.09 ◽ 2020 ◽ Vol 39 (4) ◽ pp. 772-778

Author(s): Maryam Zaffar ◽ Manzoor Ahmad Hashmani ◽ K.S. Savita ◽ Syed Sajjad Hussain Rizvi ◽ Mubashar Rehman

Keyword(s): Data Mining ◽ Feature Selection ◽ Prediction Model ◽ Student Performance ◽ Performance Prediction ◽ Prediction Models ◽ Educational Data Mining ◽ Action Plans ◽ Factors Affecting ◽ Academic Organization

Educational Data Mining (EDM) is a very active area of Data Mining (DM) and is helpful in predicting student performance. Student performance prediction is important not only for students but also for academic organizations seeking to detect the causes of student success and failure. Furthermore, the features selected by student performance prediction models help in developing action plans for academic welfare. Feature selection can increase the accuracy of a prediction model. In student performance prediction every feature matters, because neglecting an important feature can lead to misguided academic action plans. Feature selection is therefore a very important step in developing student performance prediction models. There are different types of feature selection algorithms. In this paper, the Fast Correlation-Based Filter (FCBF) is selected as the feature selection algorithm. This paper is a step toward identifying the factors affecting students' academic performance. The performance of FCBF is evaluated on three different student datasets. FCBF is found to perform well on a student dataset with a greater number of features.
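
As background on how FCBF works, the sketch below shows its core measure, symmetrical uncertainty SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), together with a compact version of the relevance-then-redundancy filtering. It assumes discrete features, uses a strict "greater than threshold" relevance test as a simplification, and runs on hypothetical toy data rather than the student datasets from the paper.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def symmetrical_uncertainty(x, y):
    """SU = 2 * information gain / (H(x) + H(y)), ranging from 0 to 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    info_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * info_gain / (hx + hy)

def fcbf(features, labels, threshold=0.0):
    """Keep relevant features, dropping any that a stronger kept feature makes redundant."""
    relevance = {name: symmetrical_uncertainty(col, labels) for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] > threshold),
                    key=lambda n: -relevance[n])
    selected = []
    for name in ranked:
        # Redundant if some already-selected feature predicts it at least as well
        # as it predicts the class.
        if all(symmetrical_uncertainty(features[s], features[name]) < relevance[name]
               for s in selected):
            selected.append(name)
    return selected

# Hypothetical toy data: 'a_copy' duplicates 'a'; 'noise' is unrelated to the label.
labels = [0, 0, 1, 1]
features = {"a": [0, 0, 1, 1], "a_copy": [0, 0, 1, 1], "noise": [0, 1, 0, 1]}
chosen = fcbf(features, labels)
```

In this toy case only `a` is kept: `noise` has zero relevance to the label, and `a_copy` is removed as redundant with `a`.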

RPITER: A Hierarchical Deep Learning Framework for ncRNA–Protein Interaction Prediction

International Journal of Molecular Sciences ◽ 10.3390/ijms20051070 ◽ 2019 ◽ Vol 20 (5) ◽ pp. 1070 ◽ Cited By ~ 12

Author(s): Cheng Peng ◽ Siyu Han ◽ Hui Zhang ◽ Ying Li

Keyword(s): Neural Network ◽ Deep Learning ◽ Protein Interaction ◽ Rna Binding ◽ Prediction Models ◽ Rna Binding Proteins ◽ Biological Research ◽ Sequence Information ◽ Learning Framework ◽ Coding Method

Non-coding RNAs (ncRNAs) play crucial roles in multiple fundamental biological processes, such as post-transcriptional gene regulation, and are implicated in many complex human diseases. Most ncRNAs function by interacting with corresponding RNA-binding proteins, so research on ncRNA–protein interaction is the key to understanding ncRNA function. However, experimental techniques for identifying RNA–protein interactions (RPIs) are still expensive and time-consuming. Because of the complex molecular mechanism of ncRNA–protein interaction and the lack of conservation of ncRNAs, especially long ncRNAs (lncRNAs), predicting ncRNA–protein interaction remains a challenge. Deep learning-based models have become the state of the art in a range of biological sequence analysis problems owing to their strong feature-learning ability. In this study, we proposed a hierarchical deep learning framework, RPITER, to predict RNA–protein interaction. For sequence coding, we improved the conjoint triad feature (CTF) coding method by adding more primary sequence information as well as sequence structure information. For model design, RPITER employed two basic neural network architectures: the convolutional neural network (CNN) and the stacked auto-encoder (SAE). Comprehensive experiments were performed on five benchmark datasets from the PDB and NPInter databases to analyze and compare the performance of different sequence coding methods and prediction models. We found that the CNN and SAE architectures have powerful fitting abilities for the k-mer features of RNA and protein sequences. The improved CTF coding method showed a performance gain over the original CTF method. Moreover, RPITER performed well in predicting RPIs and could outperform most previous methods. On five widely used RPI datasets, RPI369, RPI488, RPI1807, RPI2241 and NPInter, RPITER obtained AUCs of 0.821, 0.911, 0.990, 0.957 and 0.985, respectively. The proposed RPITER could be a complementary method for predicting RPIs and constructing RPI networks, which would help advance biological research on ncRNAs and lncRNAs.
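
The k-mer features mentioned in this abstract can be illustrated with a plain k-mer frequency encoding. This is a generic sketch, not RPITER's improved CTF coding, which additionally groups residues and adds structure information; the function name and the example sequence are assumptions.

```python
from itertools import product

def kmer_frequencies(sequence, k, alphabet="ACGU"):
    """Return the normalized frequency of every possible k-mer, in lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(sequence) - k + 1):
        window = sequence[i:i + k]
        if window in counts:  # skip windows containing unknown symbols
            counts[window] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]

# Hypothetical RNA fragment; a 3-mer encoding yields a 4**3 = 64-dimensional vector.
vector = kmer_frequencies("ACGUACGUAGC", 3)
```

A fixed-length vector like this can be fed to a CNN or an auto-encoder regardless of sequence length; the same function works for proteins when given an amino-acid (or reduced, grouped) alphabet.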

Prediction of Pathologic Complete Response to Neoadjuvant Chemotherapy Using Machine Learning Models in Patients with Breast Cancer

10.21203/rs.3.rs-217080/v1 ◽ 2021

Author(s): Ji-Yeon Kim ◽ Eunjoo Jeon ◽ Soonhwan Kwon ◽ Hyungsik Jung ◽ Sunghoon Joo ◽ ...

Keyword(s): Breast Cancer ◽ Machine Learning ◽ Neoadjuvant Chemotherapy ◽ Prediction Model ◽ Prediction Models ◽ Locally Advanced ◽ Pathologic Complete Response ◽ Complete Response ◽ Bayesian Optimization ◽ Pathological Characteristics

Abstract Background: The aim of this study was to develop a machine learning (ML) based model to accurately predict pathologic complete response (pCR) to neoadjuvant chemotherapy (NAC) using pretreatment clinical and pathological characteristics from electronic medical record (EMR) data in breast cancer (BC). Methods: The EMR data from patients diagnosed with early and locally advanced BC who received NAC followed by curative surgery were reviewed. A total of 16 clinical and pathological characteristics were selected to develop the ML model. We applied six ML models with default settings for multivariate analysis of the extracted variables. Results: In total, 2,065 patients were included in this analysis. Overall, 30.6% (n=632) of patients achieved pCR. Among the six ML models, LightGBM had the highest area under the curve (AUC) for pCR prediction. After hyper-parameter tuning with Bayesian optimization, the AUC was 0.810. The performance of the pCR prediction models was compared across histology-based subtypes. The AUC was highest in the HR+/HER2- subgroup and lowest in the HR-/HER2- subgroup (HR+/HER2- 0.841, HR+/HER2+ 0.716, HR-/HER2+ 0.753, HR-/HER2- 0.653). Conclusions: An ML-based pCR prediction model using pretreatment clinical and pathological characteristics provided useful information for predicting pCR during NAC. This prediction model would help determine the treatment strategy for patients with BC who are planned to receive NAC.
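
The AUC values quoted in these abstracts can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it directly from that pairwise definition; it is a generic illustration on made-up labels and scores, not the evaluation code of any study listed here.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count as half a win)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up example: 1 = responder, 0 = non-responder.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
auc = roc_auc(labels, scores)
```

This quadratic-time form is meant for clarity; for large datasets a rank-based computation or an established library routine is the usual choice.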
