Prediction of chemical compounds properties using a deep learning model

Author(s):  
Mykola Galushka ◽  
Chris Swain ◽  
Fiona Browne ◽  
Maurice D. Mulvenna ◽  
Raymond Bond ◽  
...  

Abstract
The discovery of new medications in a cost-effective manner has become the top priority for many pharmaceutical companies. Despite decades of innovation, many of their processes arguably remain relatively inefficient. One such process is the prediction of biological activity. This paper describes a new deep learning model, capable of conducting a preliminary screening of chemical compounds in silico. The model was constructed using a variational autoencoder to generate chemical compound fingerprints, which were used to build a regression model to predict the LogD property and a classification model to predict binding in selected assays from the ChEMBL dataset. The conducted experiments demonstrate accurate prediction of the properties of chemical compounds using only their structural definitions, and also suggest several opportunities to improve upon this model in the future.
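The fingerprint step can be illustrated with a minimal, framework-free sketch: hashing short substrings of a SMILES string into a fixed-length bit vector. This is a stand-in for the learned VAE fingerprint described in the abstract; the function name, fragment lengths, and bit-vector size are illustrative assumptions, not details from the paper.

```python
# Minimal sketch: a hashed substring fingerprint as a stand-in for the
# learned VAE fingerprint. All names and sizes here are illustrative.
import zlib

def fingerprint(smiles: str, n_bits: int = 64) -> list:
    """Hash every substring of length 1-3 of a SMILES string into bits."""
    bits = [0] * n_bits
    for k in (1, 2, 3):
        for i in range(len(smiles) - k + 1):
            fragment = smiles[i:i + k]
            # crc32 gives a deterministic hash across runs
            bits[zlib.crc32(fragment.encode()) % n_bits] = 1
    return bits

fp = fingerprint("CCO")  # ethanol in SMILES notation
```

A vector like this could then feed a downstream regression or classification model, which is the role the VAE-derived fingerprint plays in the paper.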

2021 ◽  
Vol 2021 ◽  
pp. 1-11
Author(s):  
Dapeng Lang ◽  
Deyun Chen ◽  
Ran Shi ◽  
Yongjun He

Deep learning has been widely used in image classification and image recognition and has achieved positive practical results. However, in recent years, a number of studies have found that the accuracy of classification-based deep learning models drops sharply when only subtle changes are made to the original examples, thus realizing attacks on the deep learning model. The main methods are as follows: adjusting the pixels of attack examples so that the changes are invisible to the human eye, inducing the deep learning model to make a wrong classification; and adding an adversarial patch to the detection target to guide and deceive the classification model into misclassifying it. These methods, however, involve strong randomness and are of very limited use in practical applications. Different from previous perturbations of traffic signs, this paper proposes a method that can successfully hide and misclassify vehicles in complex contexts. The method takes complex real scenarios into account and can perturb pictures taken by a camera or mobile phone so that a detector based on a deep learning model either fails to detect the vehicle or misclassifies it. To improve robustness, the position and size of the adversarial patch are adjusted for different detection models by introducing an attachment mechanism. Tests on different detectors show that a patch generated against a single target-detection algorithm can also attack other detectors, demonstrating good transferability. The experiments in this paper show that the proposed algorithm significantly lowers the accuracy of the detector. Under real-world conditions such as distance, light, angle, and resolution, false classification of the target is achieved by reducing the confidence level of the target and its background, which greatly perturbs the detection results of the target detector. On the COCO 2017 dataset, the success rate of this algorithm reaches 88.7%.
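The patch-placement step that the paper tunes per detector can be sketched in a minimal, framework-free way. The nested-list image representation and the function name are illustrative assumptions; a real attack optimizes the patch contents against the detector, which is omitted here.

```python
# Minimal sketch of pasting an adversarial patch into an image at a
# chosen position and size. The patch contents would normally be
# optimized against the target detector; here they are fixed.
def apply_patch(image, patch, top, left):
    """Return a copy of `image` with `patch` pasted at (top, left)."""
    out = [row[:] for row in image]  # do not mutate the input image
    for r, patch_row in enumerate(patch):
        for c, value in enumerate(patch_row):
            out[top + r][left + c] = value
    return out

img = [[0] * 6 for _ in range(6)]        # 6x6 black grayscale image
patch = [[255, 255], [255, 255]]          # 2x2 white patch
attacked = apply_patch(img, patch, 2, 2)  # paste at row 2, column 2
```

Adjusting `top`, `left`, and the patch dimensions per detection model is the role of the attachment mechanism described above.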


Genes ◽  
2021 ◽  
Vol 12 (5) ◽  
pp. 717
Author(s):  
Arslan Siraj ◽  
Dae Yeong Lim ◽  
Hilal Tayara ◽  
Kil To Chong

Protein ubiquitylation is an essential post-translational modification that plays a critical role in a wide range of biological functions, and even a degenerative role in certain diseases, and is consequently a promising target for the treatment of various diseases. Owing to the significant role of protein ubiquitylation, ubiquitylation sites can be identified by enzymatic approaches, mass spectrometry analysis, and combinations of multidimensional liquid chromatography and tandem mass spectrometry. However, these large-scale experimental screening techniques are time-consuming, expensive, and laborious. To overcome these drawbacks, machine learning and deep learning-based predictors have been considered for timely and cost-effective prediction. Several computational predictors have been published across species; however, these predictors are species-specific because of the unclear patterns in different species. In this study, we propose a novel approach for predicting plant ubiquitylation sites using a hybrid deep learning model that combines a convolutional neural network and long short-term memory. The proposed method uses the actual protein sequence and physicochemical properties as inputs to the model and provides more robust predictions. The proposed predictor achieved the best results, with accuracies of 80% and 81% and F-scores of 79% and 82% on 10-fold cross-validation and an independent dataset, respectively. Moreover, we compared performance on the independent dataset with popular ubiquitylation predictors; the results demonstrate that our model significantly outperforms the other methods in classification performance.
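The physicochemical input encoding can be sketched as mapping each residue in a sequence window around a candidate site to a numeric feature. Here the Kyte-Doolittle hydropathy scale (a few residues only) stands in for the paper's feature set, which is not reproduced; the window size and function name are illustrative.

```python
# Minimal sketch of the input encoding step: map residues in a window
# around a candidate site to a physicochemical value. The partial
# Kyte-Doolittle hydropathy scale below is a stand-in for the paper's
# actual physicochemical feature set.
HYDROPATHY = {"A": 1.8, "K": -3.9, "L": 3.8, "S": -0.8, "G": -0.4}

def encode_window(window: str) -> list:
    """Encode a residue window as hydropathy values (0.0 for unknowns)."""
    return [HYDROPATHY.get(res, 0.0) for res in window]

features = encode_window("KALSG")  # 5-residue window around a site
```

A matrix of such per-residue features is the kind of input a hybrid CNN/LSTM model consumes: the CNN picks up local motifs, the LSTM longer-range order.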


2018 ◽  
Author(s):  
Yu Li ◽  
Zhongxiao Li ◽  
Lizhong Ding ◽  
Yuhui Hu ◽  
Wei Chen ◽  
...  

ABSTRACT
Motivation: In most biological data sets, the amount of data is regularly growing and the number of classes is continuously increasing. To deal with new data from new classes, one approach is to train a classification model, e.g., a deep learning model, from scratch on both the old and new data. This approach is highly computationally costly, and the extracted features are likely very different from those extracted by a model trained on the old data alone, which leads to poor model robustness. Another approach is to fine-tune the model trained on the old data using the new data. However, this approach often cannot learn new knowledge without forgetting the previously learned knowledge, which is known as the catastrophic forgetting problem. To our knowledge, this problem has not been studied in the field of bioinformatics despite its existence in many bioinformatic problems.
Results: Here we propose a novel method, SupportNet, to solve the catastrophic forgetting problem efficiently and effectively. SupportNet combines the strengths of deep learning and the support vector machine (SVM): the SVM is used to identify the support data from the old data, which are fed to the deep learning model together with the new data for further training, so that the model can review the essential information of the old data while learning the new information. Two powerful consolidation regularizers are applied to ensure the robustness of the learned model. Comprehensive experiments on various tasks, including enzyme function prediction, subcellular structure classification, and breast tumor classification, show that SupportNet drastically outperforms state-of-the-art incremental learning methods and reaches performance similar to that of a deep learning model trained from scratch on both the old and new data.
Availability: Our program is accessible at: https://github.com/lykaust15/SupportNet.
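The rehearsal idea behind SupportNet can be sketched without any learning machinery: keep a small buffer of retained "support" examples from old classes and mix them into each new training round. Selecting the supports with an SVM and the consolidation regularizers from the paper are omitted; the function and cap are illustrative.

```python
# Minimal sketch of rehearsal-style incremental learning: retain a
# capped number of "support" examples per old class and combine them
# with the incoming new data. The SVM-based support selection and the
# consolidation regularizers of SupportNet are not modeled here.
def build_training_set(support_buffer, new_data, per_class=2):
    """Combine retained old support examples with incoming new data."""
    retained = []
    counts = {}
    for label, example in support_buffer:
        if counts.get(label, 0) < per_class:
            retained.append((label, example))
            counts[label] = counts.get(label, 0) + 1
    return retained + list(new_data)

old = [("enzyme", "x1"), ("enzyme", "x2"), ("enzyme", "x3")]
new = [("non-enzyme", "y1")]
batch = build_training_set(old, new)  # 2 retained supports + 1 new item
```

Training on `batch` instead of `new` alone is what lets the model "review" old classes while learning new ones, which is the mechanism the abstract describes.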


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Jeong-Hun Yoo ◽  
Han-Gyeol Yeom ◽  
WooSang Shin ◽  
Jong Pil Yun ◽  
Jong Hyun Lee ◽  
...  

Abstract
This paper proposes a convolutional neural network (CNN)-based deep learning model for predicting the difficulty of extracting a mandibular third molar using a panoramic radiographic image. The applied dataset includes a total of 1053 mandibular third molars from 600 preoperative panoramic radiographic images. The extraction difficulty was evaluated based on the consensus of three human observers using the Pederson difficulty score (PDS). The classification model used a ResNet-34 pretrained on the ImageNet dataset. The correlation between the PDS values determined by the proposed model and those measured by the experts was calculated. The prediction accuracies for C1 (depth), C2 (ramal relationship), and C3 (angulation) were 78.91%, 82.03%, and 90.23%, respectively. The results confirm that the proposed CNN-based deep learning model could be used to predict the difficulty of extracting a mandibular third molar using a panoramic radiographic image.
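For context, the Pederson difficulty score sums component scores for angulation, depth, and ramus relationship and buckets the total. The point values and thresholds below follow the commonly cited Pederson index, not necessarily the paper's exact implementation, so treat this sketch as an assumption.

```python
# Minimal sketch of the Pederson difficulty score (PDS). Component
# point values and difficulty thresholds follow the commonly cited
# Pederson index; the paper's implementation may differ.
ANGULATION = {"mesioangular": 1, "horizontal": 2, "vertical": 3, "distoangular": 4}
DEPTH = {"A": 1, "B": 2, "C": 3}          # occlusal level A..C
RAMUS = {"I": 1, "II": 2, "III": 3}       # ramus relationship class

def pederson(angulation, depth, ramus):
    """Return (total score, difficulty category)."""
    total = ANGULATION[angulation] + DEPTH[depth] + RAMUS[ramus]
    if total <= 4:
        category = "slightly difficult"
    elif total <= 7:
        category = "moderately difficult"
    else:
        category = "very difficult"
    return total, category

score, label = pederson("mesioangular", "A", "I")  # (3, "slightly difficult")
```

The three components here correspond to the C3 (angulation), C1 (depth), and C2 (ramal relationship) outputs whose per-component accuracies the abstract reports.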


2020 ◽  
Author(s):  
Ho Heon Kim ◽  
Jae Il An ◽  
Yu Rang Park

BACKGROUND Early detection of developmental disabilities in children is essential because early intervention during the period of rapid growth and neuroplasticity can improve the prognosis. Given the phenotypical nature of developmental disabilities, high variability may arise from the assessment process. Because a growing body of evidence indicates a relationship between developmental disability and motor function, motor skill is considered a factor that can facilitate early diagnosis of developmental disability. However, capturing motor skill in the diagnosis of developmental disorders, which is typically conducted through informal questions or surveys of parents, faces problems such as a lack of specialists and time constraints. OBJECTIVE This study aimed to 1) assess the potential of drag-and-drop data as a digital biomarker and 2) develop a classification model based on drag-and-drop data to identify children with developmental disabilities. METHODS We collected drag-and-drop data from children with normal and abnormal development between May 1, 2018, and May 1, 2020, through a mobile application (DoBrain). The study involved 223 children with normal development and 147 children with developmental disabilities. We used touch coordinates and extracted kinetic variables from these coordinates. A deep learning algorithm was developed to classify children with developmental disabilities. For interpretability of the model results, we identified which coordinates contributed to the classification by applying Grad-CAM. RESULTS Of the 370 children in the study, 223 had normal development and 147 had developmental disabilities. In all games, the number of sign changes of the acceleration along the direction of progress on both the x- and y-axes differed significantly between the two groups (p<0.001 and effect size >0.5, respectively). The deep learning convolutional neural network model showed that drag-and-drop data can help diagnose developmental disabilities, with a sensitivity of 0.71 and a specificity of 0.78. Grad-CAM, which can interpret the results of the deep learning model, was visualized for the game results of specific children. CONCLUSIONS The results of the deep learning model confirm that drag-and-drop data can serve as a new digital biomarker for the diagnosis of developmental disabilities.
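The discriminative kinetic feature named in the results, the number of acceleration sign changes along one axis of a drag trace, is easy to sketch. Uniform sampling intervals are assumed here; the app's actual sampling details are not.

```python
# Minimal sketch of the kinetic feature the study found discriminative:
# count sign changes of the acceleration along one axis of a drag
# trace. Uniform time steps between samples are assumed.
def acceleration_sign_changes(positions):
    """Count acceleration sign flips in a 1-D coordinate series."""
    velocity = [b - a for a, b in zip(positions, positions[1:])]
    accel = [b - a for a, b in zip(velocity, velocity[1:])]
    changes = 0
    for a, b in zip(accel, accel[1:]):
        if a * b < 0:  # strict sign flip between consecutive samples
            changes += 1
    return changes

x_trace = [0, 1, 3, 4, 4.5, 6, 9]          # sampled x-coordinates
n_changes = acceleration_sign_changes(x_trace)
```

Computed per axis and per game, a feature like this is what separates the two groups in the reported comparison; a smooth, steadily accelerating drag yields zero sign changes.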


2021 ◽  
Vol 12 ◽  
Author(s):  
Ning Cheng ◽  
Yue Chen ◽  
Wanqing Gao ◽  
Jiajun Liu ◽  
Qunfu Huang ◽  
...  

Purpose: This study proposes an S-TextBLCNN model for efficacy classification of traditional Chinese medicine (TCM) formulae. The model uses deep learning to analyze the relationship between herb efficacy and formula efficacy, which helps further explore the internal rules of formula combination.
Methods: First, for the TCM herbs extracted from the Chinese Pharmacopoeia, natural language processing (NLP) is used to learn and realize a quantitative expression of the different TCM herbs. Three features, herb name, herb properties, and herb efficacy, are selected to encode herbs and to construct formula vectors and herb vectors. Then, based on 2,664 formulae for stroke collected from the TCM literature and 19 formula efficacy categories extracted from Yifang Jijie, an improved deep learning model, TextBLCNN, consisting of a bidirectional long short-term memory (Bi-LSTM) neural network and a convolutional neural network (CNN), is proposed. Based on the 19 formula efficacy categories, binary classifiers are established to classify the TCM formulae. Finally, to address the imbalance of the formula data, the over-sampling method SMOTE is applied, yielding the proposed S-TextBLCNN model.
Results: The formula vector composed of herb efficacy gives the best classification performance, from which it can be inferred that there is a strong relationship between herb efficacy and formula efficacy. The TextBLCNN model achieves an accuracy of 0.858 and an F1-score of 0.762, both higher than the logistic regression (acc = 0.561, F1-score = 0.567), SVM (acc = 0.703, F1-score = 0.591), LSTM (acc = 0.723, F1-score = 0.621), and TextCNN (acc = 0.745, F1-score = 0.644) models. In addition, with the over-sampling method SMOTE used to tackle the data imbalance, the F1-score improves by an average of 47.1% across the 19 models.
Conclusion: The combination of the formula feature representation and the S-TextBLCNN model improves accuracy in formula efficacy classification and provides a new research idea for the study of TCM formula compatibility.
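SMOTE's core step, synthesizing a minority-class sample by interpolating between a sample and a same-class neighbour, can be sketched in a few lines. Real SMOTE picks among the k nearest neighbours; here the neighbour and the random source are passed in explicitly for clarity.

```python
# Minimal sketch of SMOTE's interpolation step: a synthetic minority
# sample is placed at a random point on the segment between a sample
# and one of its same-class neighbours. Neighbour search is omitted.
import random

def smote_sample(x, neighbour, rng=random.random):
    """Interpolate a synthetic point between x and a neighbour."""
    gap = rng()  # uniform in [0, 1): position along the segment
    return [a + gap * (b - a) for a, b in zip(x, neighbour)]

synthetic = smote_sample([0.0, 0.0], [1.0, 2.0])
```

Generating such points for the minority efficacy categories is how the S variant rebalances the 19 binary classifiers, which is where the reported F1-score gain comes from.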


The high-pace rise of glaucoma, an irreversible eye disease that deteriorates human vision, has prompted academia and industry to develop novel and robust computer-aided diagnosis (CAD) systems for early glaucomatous eye detection. Glaucoma progression is rooted in structural alterations of the retina, and it is essential for ophthalmologists to identify it at an early stage to stop its progression. Fundoscopy is one of the biomedical imaging techniques used to analyze the internal structure of the retina. Recently, numerous efforts have been made to exploit spatial-temporal features, including morphological values of the optic disk (OD), optic cup (OC), and neuro-retinal rim (NRR), to perform glaucoma detection in fundus images. However, issues such as suitable pre-processing, precise region-of-interest segmentation, post-segmentation, and the lack of a generalized threshold limit the efficacy of most existing approaches. Furthermore, optimal segmentation of the OD and OC and removal of nerves from the OD or OC are often tedious and demand more efficient solutions, and these approaches cumulatively turn out to be computationally complex and time-consuming. As a potential alternative, deep learning techniques have gained widespread attention, especially for image analysis and vision technologies. With this motive, this paper proposes GlaucoNet, a novel convolutional stacked auto-encoder (CSAE)-assisted deep learning model for glaucoma detection and classification. Unlike classical methods, GlaucoNet applies a stacked auto-encoder with a hierarchical CNN structure to perform deep feature extraction and learning.
To handle the complex nature of the data and the large feature sets, GlaucoNet was designed with a convolutional layer (CONV), a max-pool layer (MP), and two fully connected (FC) layers: the convolutional layer performs feature extraction and learning, while max-pooling performs feature selection and reduces the spatial resolution of each feature map to avoid a large number of parameters and high computational complexity. To avoid saturation, a dropout of 0.5 was applied. MATLAB-based simulation results on the DRISHTI-GS and DRION-DB datasets affirm that the proposed GlaucoNet model outperforms state-of-the-art neural-network-based approaches in terms of accuracy, recall, precision, F-measure, and balanced accuracy. Overall, the measured parameter values show better performance for the GlaucoNet model.
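The max-pool (MP) stage that reduces the spatial resolution of each feature map can be sketched in a framework-free way with a 2x2 window; the window size here is a common convention and an assumption, since the abstract does not specify it. The surrounding CONV and FC layers are omitted.

```python
# Minimal sketch of a max-pool stage like GlaucoNet's MP layer: halve
# the spatial resolution of a feature map by taking the maximum of
# each non-overlapping 2x2 window. The 2x2 size is an assumption.
def max_pool_2x2(feature_map):
    """Downsample a 2-D feature map with 2x2 max pooling."""
    pooled = []
    for r in range(0, len(feature_map) - 1, 2):
        row = []
        for c in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[r][c], feature_map[r][c + 1],
                           feature_map[r + 1][c], feature_map[r + 1][c + 1]))
        pooled.append(row)
    return pooled

fm = [[1, 3, 2, 0],
      [4, 2, 1, 1],
      [0, 0, 5, 6],
      [1, 2, 7, 8]]
pooled = max_pool_2x2(fm)  # [[4, 2], [2, 8]]
```

Quartering the number of activations per map in this way is exactly what keeps the parameter count of the following fully connected layers manageable, the motivation the abstract gives for the MP layer.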


2020 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Stefano Bromuri ◽  
Alexander P. Henkel ◽  
Deniz Iren ◽  
Visara Urovi

Purpose
A vast body of literature has documented the negative consequences of stress on employee performance and well-being. These deleterious effects are particularly pronounced for service agents who need to constantly endure and manage customer emotions. The purpose of this paper is to introduce and describe a deep learning model to predict in real-time service agent stress from emotion patterns in voice-to-voice service interactions.
Design/methodology/approach
A deep learning model was developed to identify emotion patterns in call center interactions based on 363 recorded service interactions, subdivided in 27,889 manually expert-labeled three-second audio snippets. In a second step, the deep learning model was deployed in a call center for a period of one month to be further trained by the data collected from 40 service agents in another 4,672 service interactions.
Findings
The deep learning emotion classifier reached a balanced accuracy of 68% in predicting discrete emotions in service interactions. Integrating this model in a binary classification model, it was able to predict service agent stress with a balanced accuracy of 80%.
Practical implications
Service managers can benefit from employing the deep learning model to continuously and unobtrusively monitor the stress level of their service agents with numerous practical applications, including real-time early warning systems for service agents, customized training and automatically linking stress to customer-related outcomes.
Originality/value
The present study is the first to document an artificial intelligence (AI)-based model that is able to identify emotions in natural (i.e. nonstaged) interactions. It is further a pioneer in developing a smart emotion-based stress measure for service agents. Finally, the study contributes to the literature on the role of emotions in service interactions and employee stress.
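The balanced-accuracy metric both findings are reported in (the mean of per-class recall, appropriate when stressed and non-stressed snippets are imbalanced) can be sketched directly; the labels below are illustrative, not from the study's data.

```python
# Minimal sketch of balanced accuracy, the metric reported in the
# findings: the unweighted mean of recall over the classes, robust to
# class imbalance. Labels here are illustrative.
def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    classes = set(y_true)
    recalls = []
    for cls in classes:
        idx = [i for i, y in enumerate(y_true) if y == cls]
        correct = sum(1 for i in idx if y_pred[i] == cls)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(classes)

score = balanced_accuracy(["stress", "stress", "calm", "calm", "calm"],
                          ["stress", "calm", "calm", "calm", "calm"])
```

With a 50/50 recall average, a majority-class classifier scores only 0.5 here regardless of class skew, which is why the 68% and 80% figures are meaningful on imbalanced audio snippets.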

