OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design

Deep learning models have demonstrated outstanding results in many data-rich areas of research, such as computer vision and natural language processing. Currently, there is a rise of deep learning in computational chemistry and materials informatics, where deep learning could be effectively applied in modeling the relationship between chemical structures and their properties. With the immense growth of chemical and materials data, deep learning models can begin to outperform conventional machine learning techniques such as random forest, support vector machines, nearest neighbor, etc. Herein, we introduce OpenChem, a PyTorch-based deep learning toolkit for computational chemistry and drug design. OpenChem offers easy and fast model development, modular software design, and several data preprocessing modules. It is freely available via the GitHub repository.


OpenChem: A Deep Learning Toolkit for Computational Chemistry and Drug Design

10.26434/chemrxiv.12691943 ◽

2020 ◽

Author(s):

Mariya Popova ◽

Boris Ginsburg ◽

Alexander Tropsha ◽

Olexandr Isayev

Keyword(s):

Deep Learning ◽

Drug Design ◽

Computational Chemistry ◽

Language Processing ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Support Vector ◽

Materials Informatics ◽

Learning Models ◽

Chemical Structures


The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification

10.3233/faia210317 ◽

2021 ◽

Author(s):

Benjamin Clavié ◽

Marc Alphonsus

Keyword(s):

Deep Learning ◽

Language Processing ◽

Text Classification ◽

Traditional Approach ◽

Error Reduction ◽

Support Vector ◽

Learning Models ◽

Legal Text ◽

Classification Tasks ◽

Legal Domain

We aim to highlight an interesting trend to contribute to the ongoing debate around advances within legal Natural Language Processing. Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. In this paper, we show that a more traditional approach based on Support Vector Machine classifiers reaches competitive performance with deep learning models. We also highlight that error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain when compared to general language tasks. We discuss some hypotheses for these results to support future discussions.


Evaluating Deep Learning models for predicting ALK-5 inhibition

PLoS ONE ◽

10.1371/journal.pone.0246126 ◽

2021 ◽

Vol 16 (1) ◽

pp. e0246126

Author(s):

Gabriel Z. Espinoza ◽

Rafaela M. Angelo ◽

Patricia R. Oliveira ◽

Kathia M. Honorio

Keyword(s):

Neural Network ◽

Biological Activity ◽

Deep Learning ◽

Deep Neural Network ◽

External Validation ◽

Machine Learning Techniques ◽

Coefficient Of Determination ◽

Support Vector ◽

Learning Models ◽

Alk 5

Computational methods have been widely used in drug design. Recent developments in machine learning techniques and the ever-growing chemical and biological databases are fertile ground for discoveries in this area. In this study, we evaluated the performance of Deep Learning models in comparison to Random Forest and Support Vector Regression for predicting the biological activity (pIC50) of ALK-5 inhibitors as candidates to treat cancer. The generalization power of the models was assessed by internal and external validation procedures. A deep neural network model obtained the best performance in this comparative study, achieving a coefficient of determination of 0.658 on the external validation set, with mean squared error and mean absolute error of 0.373 and 0.450, respectively. Additionally, the relevance of the chemical descriptors for the prediction of biological activity was estimated using Permutation Importance. We conclude that the predictive model obtained by the deep neural network is suitable for the problem and can be employed to predict the biological activity of new ALK-5 inhibitors.
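The three external-validation metrics named in this abstract (coefficient of determination, mean squared error, mean absolute error) can be sketched in plain Python as follows. This is a minimal illustration, not the authors' evaluation code; the function name `regression_metrics` and the pIC50 values below are hypothetical.

```python
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return R^2, MSE and MAE for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    # Residual sum of squares: squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the observed values around their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are identical")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"r2": 1.0 - ss_res / ss_tot, "mse": ss_res / n, "mae": mae}


# Hypothetical pIC50 values, for illustration only.
observed = [5.0, 6.0, 7.0, 8.0]
predicted = [5.5, 6.0, 6.5, 8.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Note that MSE can be smaller than MAE, as in the reported 0.373 and 0.450, whenever most absolute errors are below 1.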


ASA: A framework for Arabic sentiment analysis

Journal of Information Science ◽

10.1177/0165551519849516 ◽

2019 ◽

Vol 46 (4) ◽

pp. 544-559 ◽

Cited By ~ 4

Author(s):

Ahmed Oussous ◽

Fatima-Zahra Benjelloun ◽

Ayoub Ait Lahcen ◽

Samir Belfkih

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Short Term Memory ◽

Research Area ◽

Support Vector ◽

Learning Models ◽

Arabic Natural Language Processing ◽

Arabic Sentiment Analysis

Sentiment analysis (SA), also known as opinion mining, is an increasingly important research area. Generally, it helps to automatically determine whether a text expresses a positive, negative or neutral sentiment. It makes it possible to mine the large and growing body of shared opinions found on social networks, review sites and blogs. SA is used in many fields and for various languages such as English and Arabic. However, since Arabic is a highly inflectional and derivational language, it raises many challenges, and SA of Arabic text must handle this complex morphology. To better handle these challenges, we decided to provide the research community and Arabic users with a new efficient framework for Arabic Sentiment Analysis (ASA). Our primary goal is to improve the performance of ASA by exploiting deep learning while varying the preprocessing techniques. To that end, we implement and evaluate two deep learning models, namely convolutional neural network (CNN) and long short-term memory (LSTM) models. The framework offers various preprocessing techniques for ASA (including stemming, normalisation, tokenization and stop-word removal). As a result of this work, we first provide a new rich and publicly available Arabic corpus called the Moroccan Sentiment Analysis Corpus (MSAC). Second, the proposed framework demonstrates an improvement in ASA. The experimental results show that deep learning models perform better for ASA than classical approaches (support vector machines, naive Bayes classifiers and maximum entropy). They also show the key role of morphological features in Arabic Natural Language Processing (NLP).


Deep Learning Application to Ensemble Learning—The Simple, but Effective, Approach to Sentiment Classifying

Applied Sciences ◽

10.3390/app9132760 ◽

2019 ◽

Vol 9 (13) ◽

pp. 2760 ◽

Cited By ~ 4

Author(s):

Khai Tran ◽

Thi Phan

Keyword(s):

Deep Learning ◽

Sentiment Analysis ◽

Ensemble Learning ◽

Language Processing ◽

Short Term Memory ◽

Learning Model ◽

Sentiment Classification ◽

Machine Learning Techniques ◽

Support Vector ◽

Deep Learning Model

Sentiment analysis is an active research area in natural language processing. The task aims at identifying, extracting, and classifying sentiments from user texts in blog posts, product reviews, or social networks. In this paper, an ensemble learning model for sentiment classification is presented, also known as CEM (classifier ensemble model). The model contains various data feature types, including language features, sentiment shifting, and statistical techniques. A deep learning model is adopted with word embedding representation to address explicit, implicit, and abstract sentiment factors in textual data. Experiments conducted on different real datasets found that our sentiment classification system is better than traditional machine learning techniques, such as Support Vector Machines, and other ensemble learning systems, as well as the deep learning model Long Short-Term Memory network, which has shown state-of-the-art results for sentiment analysis in almost all corpora. Our model’s distinguishing point is its effective application to different languages and different domains.


Classification of Parkinson’s disease and essential tremor based on balance and gait characteristics from wearable motion sensors via machine learning techniques: a data-driven approach

Journal of NeuroEngineering and Rehabilitation ◽

10.1186/s12984-020-00756-5 ◽

2020 ◽

Vol 17 (1) ◽

Author(s):

Sanghee Moon ◽

Hyun-Je Song ◽

Vibhash D. Sharma ◽

Kelly E. Lyons ◽

Rajesh Pahwa ◽

...

Keyword(s):

Machine Learning ◽

Nearest Neighbor ◽

Machine Learning Techniques ◽

Gradient Boosting ◽

Support Vector ◽

Motion Sensors ◽

Learning Models ◽

K Nearest Neighbor ◽

Gait Characteristics ◽

Machine Learning Models

Background: Parkinson’s disease (PD) and essential tremor (ET) are movement disorders that can have similar clinical characteristics including tremor and gait difficulty. These disorders can be misdiagnosed, leading to delay in appropriate treatment. The aim of the study was to determine whether balance and gait variables obtained with wearable inertial motion sensors can be utilized to differentiate between PD and ET using machine learning. Additionally, we compared classification performances of several machine learning models. Methods: This retrospective study included balance and gait variables collected during the instrumented stand and walk test from people with PD (n = 524) and with ET (n = 43). Performance of several machine learning techniques, including neural networks, support vector machine, k-nearest neighbor, decision tree, random forest, and gradient boosting, was compared with a dummy model or logistic regression using F1-scores. Results: Machine learning models classified PD and ET based on balance and gait characteristics better than the dummy model (F1-score = 0.48) or logistic regression (F1-score = 0.53). The highest F1-score was 0.61 for the neural network, followed by 0.59 for gradient boosting, 0.56 for random forest, 0.55 for support vector machine, 0.53 for decision tree, and 0.49 for k-nearest neighbor. Conclusions: This study demonstrated the utility of machine learning models to classify different movement disorders based on balance and gait characteristics collected from wearable sensors. Future studies using a well-balanced data set are needed to confirm the potential clinical utility of machine learning models to discern between PD and ET.
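To show why a dummy baseline matters on such an imbalanced sample, the sketch below computes an F1-score for a classifier that always predicts the majority class. Two things here are assumptions, since the abstract does not state them: that the dummy model predicts the majority class, and that F1 is macro-averaged over the two classes. The helper names are hypothetical.

```python
from typing import Hashable, Sequence


def f1_for_class(y_true: Sequence[Hashable], y_pred: Sequence[Hashable], label: Hashable) -> float:
    """F1 for one class: 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, lab) for lab in labels) / len(labels)


# Class sizes taken from the abstract: PD (n = 524), ET (n = 43).
y_true = ["PD"] * 524 + ["ET"] * 43
# Assumed dummy strategy: always predict the majority class.
y_pred = ["PD"] * len(y_true)

print(round(macro_f1(y_true, y_pred), 2))  # 0.48
```

Under these assumptions the baseline lands near the reported dummy score of 0.48, even though such a model never identifies a single ET case. This is consistent with the reported figure but does not confirm how the authors configured their baseline.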


Comparison of Deep Transfer Learning Techniques in Human Skin Burns Discrimination

Applied System Innovation ◽

10.3390/asi3020020 ◽

2020 ◽

Vol 3 (2) ◽

pp. 20 ◽

Cited By ~ 3

Author(s):

Aliyu Abubakar ◽

Mohammed Ajuji ◽

Ibrahim Usman Yahya

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Transfer Learning ◽

Fine Tuning ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Models ◽

Skin Injuries ◽

Learning Techniques ◽

Injured Skin

While visual assessment is the standard technique for burn evaluation, computer-aided diagnosis is increasingly sought due to the high number of incidents globally. Patients increasingly face challenges including, but not limited to, a shortage of experienced clinicians, lack of access to healthcare facilities and high diagnostic cost. A number of studies have proposed discriminating burn and healthy skin using machine learning, leaving an important gap unaddressed: whether burns and related skin injuries can be effectively discriminated using machine learning techniques. Therefore, because the available dataset is small, we use transfer learning with pre-trained deep learning models in this paper to discriminate two classes of skin injuries: burnt skin and injured skin. Experiments were extensively conducted using three state-of-the-art pre-trained deep learning models, ResNet50, ResNet101 and ResNet152, for image pattern extraction via two transfer learning strategies. In the first, a fine-tuning approach, the dense and classification layers were modified and trained with features extracted by the base layers; in the second, a support vector machine (SVM) replaced the top layers of the pre-trained models and was trained using off-the-shelf features from the base layers. Our proposed approach records near-perfect classification accuracy of approximately 99.9% in categorizing burnt skin and injured skin.


Comparison of Deep Transfer Learning Techniques in Human Skin Burns Discrimination

10.20944/preprints202003.0204.v1 ◽

2020 ◽

Author(s):

Aliyu Abubakar ◽

Mohammed Ajuji ◽

Ibrahim Usman Yahya

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Learning Strategies ◽

Transfer Learning ◽

Standard Technique ◽

Fine Tuning ◽

Machine Learning Techniques ◽

Support Vector ◽

Learning Models ◽

Learning Techniques

While visual assessment is the standard technique for burn evaluation, computer-aided diagnosis is increasingly sought due to the high number of incidents globally. Patients increasingly face challenges including, but not limited to, a shortage of experienced clinicians, lack of access to healthcare facilities, and high diagnostic cost. A number of studies have proposed discriminating burn and healthy skin using machine learning, leaving an important gap unaddressed: whether burns and related skin injuries can be effectively discriminated using machine learning techniques. Therefore, because the dataset is too small to train a new model from scratch, we use pre-trained deep learning models. Experiments were extensively conducted using three state-of-the-art pre-trained deep learning models, ResNet50, ResNet101 and ResNet152, for image pattern extraction via two transfer learning strategies. In the first, a fine-tuning approach, the dense and classification layers were modified and trained with features extracted by the base layers; in the second, a support vector machine (SVM) replaced the top layers of the pre-trained models and was trained using off-the-shelf features from the base layers. Our proposed approach records near-perfect classification accuracy of approximately 99.9%.


Improving the Accuracy of Protein-Ligand Binding Affinity Prediction by Deep Learning Models: Benchmark and Model

10.26434/chemrxiv.9866912 ◽

2019 ◽

Author(s):

Mohammad Rezaei ◽

Yanjun Li ◽

Xiaolin Li ◽

Chenglong Li

Keyword(s):

Deep Learning ◽

Drug Design ◽

Binding Affinity ◽

Benchmark Dataset ◽

Rational Drug Design ◽

Learning Models ◽

Structure Based Drug Design ◽

Binding Affinity Prediction ◽

Affinity Prediction ◽

Rational Drug

Introduction: The ability to discriminate among ligands binding to the same protein target in terms of their relative binding affinity lies at the heart of structure-based drug design. Any improvement in the accuracy and reliability of binding affinity prediction methods decreases the discrepancy between experimental and computational results. Objectives: The primary objectives were to find the most relevant features affecting binding affinity prediction, to minimize manual feature engineering, and to improve the reliability of binding affinity prediction using efficient deep learning models by tuning the model hyperparameters. Methods: The binding site of target proteins was represented as a grid box around their bound ligand. Both binary and distance-dependent occupancies were examined for how an atom affects its neighbor voxels in this grid. A combination of different features including ANOLEA, ligand elements, and Arpeggio atom types was used to represent the input. An efficient convolutional neural network (CNN) architecture, DeepAtom, was developed, trained and tested on the PDBbind v2016 dataset. Additionally, an extended benchmark dataset was compiled to train and evaluate the models. Results: The best DeepAtom model showed improved accuracy in binding affinity prediction on the PDBbind core subset (Pearson’s R=0.83) and is better than the recent state-of-the-art models in this field. In addition, when the DeepAtom model was trained on our proposed benchmark dataset, it yielded higher correlation compared to the baseline, which confirms the value of our model. Conclusions: The promising results for the predicted binding affinities are expected to pave the way for embedding deep learning models in virtual screening and rational drug design fields.
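The difference between binary and distance-dependent voxel occupancy described in the Methods can be sketched as below. This is an illustration under stated assumptions, not the DeepAtom implementation: the grid construction, the function names, and in particular the smooth form 1 - exp(-(r/d)^12) are assumptions chosen for the example; the abstract does not specify the formula used.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def voxel_centers(center: Point, box_size: float, resolution: float) -> List[Point]:
    """Centers of the voxels in a cubic grid box around `center` (e.g. the bound ligand)."""
    n = int(round(box_size / resolution))
    start = [c - box_size / 2 + resolution / 2 for c in center]
    return [
        (start[0] + i * resolution, start[1] + j * resolution, start[2] + k * resolution)
        for i in range(n) for j in range(n) for k in range(n)
    ]


def binary_occupancy(voxel: Point, atom: Point, radius: float) -> float:
    """1.0 if the voxel center lies within `radius` of the atom, else 0.0."""
    return 1.0 if math.dist(voxel, atom) <= radius else 0.0


def distance_occupancy(voxel: Point, atom: Point, radius: float) -> float:
    """Smoothly decaying occupancy; the functional form is an assumption for illustration."""
    d = math.dist(voxel, atom)
    if d == 0.0:
        return 1.0
    # Cap the ratio so that the 12th power cannot overflow for very small distances.
    ratio = min(radius / d, 10.0)
    return 1.0 - math.exp(-(ratio ** 12))


# Hypothetical 2 x 2 x 2 grid with one atom sitting on a voxel center.
grid = voxel_centers((0.0, 0.0, 0.0), box_size=2.0, resolution=1.0)
atom = (0.5, 0.5, 0.5)
for v in grid:
    print(v, binary_occupancy(v, atom, 1.0), round(distance_occupancy(v, atom, 1.0), 3))
```

The binary variant gives a hard cutoff at the chosen radius, while the distance-dependent variant lets an atom contribute a graded signal to neighboring voxels.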


Application of Machine Learning Approaches for the Design and Study of Anticancer Drugs

Current Drug Targets ◽

10.2174/1389450119666180809122244 ◽

2019 ◽

Vol 20 (5) ◽

pp. 488-500 ◽

Cited By ~ 6

Author(s):

Yan Hu ◽

Yi Lu ◽

Shuo Wang ◽

Mengying Zhang ◽

Xiaosheng Qu ◽

...

Keyword(s):

Machine Learning ◽

Drug Design ◽

Anticancer Drugs ◽

Nearest Neighbor ◽

Cost Effective ◽

Support Vector ◽

Learning Approaches ◽

K Nearest Neighbor ◽

Activity Prediction ◽

Linear Discriminant

Background: Globally, the number of cancer patients and deaths continues to increase yearly, and cancer has therefore become one of the world's leading causes of morbidity and mortality. In recent years, the study of anticancer drugs has become one of the most popular medical topics. Objective: In this review, to study the application of machine learning in predicting anticancer drug activity, several machine learning approaches, namely Linear Discriminant Analysis (LDA), Principal Component Analysis (PCA), Support Vector Machine (SVM), Random Forest (RF), k-Nearest Neighbor (kNN), and Naïve Bayes (NB), were selected, and examples of their applications in anticancer drug design are listed. Results: Machine learning contributes substantially to anticancer drug design and helps researchers by saving time and cost. However, it can only be an assisting tool for drug design. Conclusion: This paper introduces the application of machine learning approaches in anticancer drug design. Many examples of success in identification and prediction in the area of anticancer drug activity prediction are discussed, and anticancer drug research is still in active progress. Moreover, the merits of some web servers related to anticancer drugs are mentioned.
