DEEPScreen: High Performance Drug-Target Interaction Prediction with Convolutional Neural Networks Using 2-D Structural Compound Representations

2018 ◽  
Author(s):  
Ahmet Sureyya Rifaioglu ◽  
Volkan Atalay ◽  
Maria Jesus Martin ◽  
Rengul Cetin-Atalay ◽  
Tunca Dogan

The identification of physical interactions between drug candidate chemical substances and target biomolecules is an important step in the process of drug discovery, where the standard procedure is the systematic screening of chemical compounds against pre-selected target proteins. However, experimental screening procedures are expensive and time-consuming, so comprehensive tests are not feasible. Within the last decade, computational approaches have been developed with the objective of aiding experimental studies by predicting novel drug-target interactions (DTI) via the construction and application of statistical models. In this study, we propose a large-scale DTI prediction system, DEEPScreen, for early-stage drug discovery, using convolutional deep neural networks. One of the main advantages of DEEPScreen is that it employs readily available, simple 2-D images of compounds at the input level instead of engineered complex feature vectors, which previously displayed limited performance in DTI prediction tasks. DEEPScreen learns complex features inherently from the 2-D molecular representations, thus producing highly accurate predictions. The DEEPScreen system was trained for 704 target proteins (using ChEMBL-curated bioactivity data) and finalized with rigorous hyper-parameter optimization tests. We compared the performance of DEEPScreen against shallow classifiers such as random forest, logistic regression and support vector machines to demonstrate the effectiveness of the proposed deep learning approach. Additionally, we compared DEEPScreen with other deep learning based state-of-the-art DTI predictors on widely used benchmark datasets and showed that DEEPScreen produces better or comparable results to the top performers. The method proposed here can be employed to computationally scan a large portion of the recorded drug candidate compound and protein spaces to aid experimentalists working in drug discovery and repurposing by providing a preselection of interesting novel DTIs.
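The core idea, rendering compounds as 2-D depictions and letting a CNN learn features directly from the image, can be sketched with standard open-source tools. The following is a minimal illustration, not the DEEPScreen implementation: RDKit draws the compound and a small Keras CNN classifies it as active or inactive for one target; the image size, layer widths and the `smiles_list`/`labels` inputs are assumptions.

```python
# Minimal sketch of an image-based DTI classifier for a single target protein.
# Not the DEEPScreen implementation; RDKit renders 2-D depictions and a small
# Keras CNN classifies them as active/inactive. Inputs below are hypothetical.
import numpy as np
from rdkit import Chem
from rdkit.Chem import Draw
from tensorflow import keras

IMG_SIZE = 200  # assumed depiction resolution

def smiles_to_image(smiles):
    """Render a SMILES string as a grayscale 2-D depiction array."""
    mol = Chem.MolFromSmiles(smiles)
    img = Draw.MolToImage(mol, size=(IMG_SIZE, IMG_SIZE)).convert("L")
    return np.asarray(img, dtype=np.float32)[..., None] / 255.0

def build_cnn():
    """Small convolutional classifier: active (1) vs. inactive (0)."""
    return keras.Sequential([
        keras.layers.Input((IMG_SIZE, IMG_SIZE, 1)),
        keras.layers.Conv2D(32, 5, activation="relu"),
        keras.layers.MaxPooling2D(2),
        keras.layers.Conv2D(64, 3, activation="relu"),
        keras.layers.MaxPooling2D(2),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dropout(0.25),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

# Hypothetical training data: SMILES strings with 0/1 bioactivity labels
# for one target, e.g. curated from ChEMBL.
smiles_list = ["CCO", "c1ccccc1O", "CC(=O)Nc1ccc(O)cc1"]
labels = np.array([0, 1, 1], dtype=np.float32)

X = np.stack([smiles_to_image(s) for s in smiles_list])
model = build_cnn()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, labels, epochs=2, batch_size=2, verbose=0)
```

In practice one such model would be trained per target on its own bioactivity records, with hyper-parameters tuned per target as the abstract describes.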

2020 ◽  
Author(s):  
Thomas R. Lane ◽  
Daniel H. Foil ◽  
Eni Minerali ◽  
Fabio Urbina ◽  
Kimberley M. Zorn ◽  
...  

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and applications beyond. In recent studies we have applied multiple machine learning algorithms and modeling metrics, and in some cases compared molecular descriptors, to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL in order to evaluate machine learning methods of interest to them. The largest of these types of studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL for use with the ECFP6 fingerprint and comparison of our proprietary software Assay Central™ with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (3 levels). Model performance was assessed using an array of five-fold cross-validation metrics including area under the curve, F1 score, Cohen's kappa and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central™ and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), no advantage was seen in this case, where minimal tuning was performed for any of the methods. If anything, Assay Central™ may have been at a slight advantage, as the activity cutoff for each of the over 5000 datasets, representing over 570,000 unique compounds, was based on Assay Central™ performance, but support vector classification seems to be a strong competitor. We also apply Assay Central™ to prospective predictions for PXR and hERG to further validate these models. This work currently appears to be the largest comparison of machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors and algorithms, as well as further refine methods for evaluating and comparing models.
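The benchmarking protocol described, ECFP6 fingerprints scored by several off-the-shelf classifiers under five-fold cross-validation, can be approximated with RDKit and scikit-learn. The sketch below is a hedged approximation: the example data, fingerprint size and per-classifier settings are placeholders rather than the study's actual configuration, and the proprietary Assay Central™ method is not shown.

```python
# Sketch of the benchmarking loop: ECFP6 (Morgan, radius 3) fingerprints scored
# by several scikit-learn classifiers under five-fold cross-validation with
# AUC, F1 and Matthews correlation. Data below are placeholders.
import numpy as np
from rdkit import Chem
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import cross_validate

def ecfp6(smiles, n_bits=2048):
    """ECFP6 corresponds to a Morgan fingerprint of radius 3."""
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=3, nBits=n_bits)
    return np.array(fp)

# Placeholder dataset: SMILES with binary activity labels.
smiles = ["CCO", "CCN", "c1ccccc1", "CC(=O)O", "CCCl", "c1ccncc1"] * 5
y = np.array([0, 1, 1, 0, 0, 1] * 5)
X = np.stack([ecfp6(s) for s in smiles])

classifiers = {
    "random_forest": RandomForestClassifier(n_estimators=100),
    "knn": KNeighborsClassifier(),
    "svc": SVC(probability=True),
    "naive_bayes": BernoulliNB(),
    "adaboost": AdaBoostClassifier(),
}
# Cohen's kappa is not a built-in scoring string and would need a custom scorer.
scoring = ["roc_auc", "f1", "matthews_corrcoef"]

for name, clf in classifiers.items():
    scores = cross_validate(clf, X, y, cv=5, scoring=scoring)
    summary = {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring}
    print(name, summary)
```

Repeating this loop over each of the several thousand ChEMBL datasets and ranking the normalized scores gives the kind of method-level comparison reported in the abstract.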


2020 ◽  
Vol 11 (9) ◽  
pp. 2531-2557 ◽  
Author(s):  
Ahmet Sureyya Rifaioglu ◽  
Esra Nalbat ◽  
Volkan Atalay ◽  
Maria Jesus Martin ◽  
Rengul Cetin-Atalay ◽  
...  

The DEEPScreen system is composed of 704 target protein specific prediction models, each independently trained using experimental bioactivity measurements against many drug candidate small molecules, and optimized according to the binding properties of the target proteins.
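That per-target organization can be shown schematically as below; the data layout, feature matrices and the random-forest stand-in are hypothetical and are not the DEEPScreen code, which uses image-based CNNs per target.

```python
# Sketch of per-target model training: one independent classifier per target,
# keyed by a target identifier. Layout and model choice are illustrative only.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_per_target(records):
    """records: dict mapping target_id -> (feature_matrix, label_vector)."""
    models = {}
    for target_id, (X, y) in records.items():
        clf = RandomForestClassifier(n_estimators=50)
        clf.fit(X, y)              # each target's model is trained independently
        models[target_id] = clf    # DEEPScreen maintains 704 such models
    return models

# Hypothetical example: two targets with random features and labels.
rng = np.random.default_rng(0)
records = {
    "CHEMBL204": (rng.random((40, 16)), rng.integers(0, 2, 40)),
    "CHEMBL279": (rng.random((40, 16)), rng.integers(0, 2, 40)),
}
models = train_per_target(records)
```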


Sensors ◽  
2021 ◽  
Vol 21 (8) ◽  
pp. 2852
Author(s):  
Parvathaneni Naga Srinivasu ◽  
Jalluri Gnana SivaSai ◽  
Muhammad Fazal Ijaz ◽  
Akash Kumar Bhoi ◽  
Wonjoon Kim ◽  
...  

Deep learning models are efficient in learning the features that assist in understanding complex patterns precisely. This study proposes a computerized process for classifying skin disease through deep learning based on MobileNet V2 and Long Short Term Memory (LSTM). The MobileNet V2 model proved to be efficient, achieving better accuracy while being able to run on lightweight computational devices. The proposed model is efficient in maintaining stateful information for precise predictions. A grey-level co-occurrence matrix is used for assessing the progress of diseased growth. The performance has been compared against other state-of-the-art models such as Fine-Tuned Neural Networks (FTNN), Convolutional Neural Networks (CNN), the Very Deep Convolutional Networks for Large-Scale Image Recognition developed by the Visual Geometry Group (VGG), and a convolutional neural network architecture expanded with a few changes. The HAM10000 dataset is used, and the proposed method outperforms the other methods with more than 85% accuracy. It recognizes the affected region much faster, with almost half the computations of the conventional MobileNet model, resulting in minimal computational effort. Furthermore, a mobile application is designed for instant and proper action; it helps the patient and dermatologists identify the type of disease from an image of the affected region at the initial stage of the skin disease. These findings suggest that the proposed system can help general practitioners efficiently and effectively diagnose skin conditions, thereby reducing further complications and morbidity.
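One plausible way to combine the two components named here, a MobileNet V2 feature extractor whose spatial feature map is read as a sequence by an LSTM, is sketched below with Keras. The input size, sequence reshaping and training setup are assumptions for illustration and not the authors' published architecture; only the seven-class output matches the HAM10000 label set.

```python
# Sketch of a MobileNetV2 + LSTM skin-lesion classifier (assumed architecture,
# not the paper's exact model). The 7x7x1280 MobileNetV2 feature map is read
# as a 49-step sequence by an LSTM before the softmax over lesion classes.
from tensorflow import keras

NUM_CLASSES = 7  # HAM10000 has seven lesion categories

base = keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: freeze the ImageNet backbone

inputs = keras.Input((224, 224, 3))
x = keras.applications.mobilenet_v2.preprocess_input(inputs)
x = base(x)                                   # (batch, 7, 7, 1280)
x = keras.layers.Reshape((49, 1280))(x)       # spatial grid read as a sequence
x = keras.layers.LSTM(128)(x)                 # LSTM aggregates the sequence
x = keras.layers.Dropout(0.3)(x)
outputs = keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Freezing the MobileNetV2 backbone keeps the model small enough for lightweight devices, which is consistent with the mobile-deployment goal described in the abstract.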


Author(s):  
Kamyab Keshtkar

As a relatively high percentage of adenoma polyps are missed, a computer-aided diagnosis (CAD) tool based on deep learning can aid the endoscopist in diagnosing colorectal polyps or colorectal cancer, in order to decrease the polyp miss rate and prevent colorectal cancer mortality. The Convolutional Neural Network (CNN) is a deep learning method that has, over the last decade, achieved better results in detecting and segmenting specific objects in images than conventional models such as regression, support vector machines or artificial neural networks. In recent years, in studies across medical imaging, CNN models have produced promising results in detecting masses and lesions in various body organs, including colorectal polyps. In this review, the structure and architecture of CNN models, and how colonoscopy images are processed as input and converted to output, are explained in detail. In most primary studies conducted in the field of colorectal polyp detection and classification, the CNN model has been regarded as a black box, since the calculations performed at the different layers during the model training process have not been clarified precisely. Furthermore, I discuss the differences between CNNs and conventional models, examine how to train a CNN model for diagnosing colorectal polyps or cancer, and evaluate model performance after the training process.
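To make the input-to-output flow concrete, a minimal Keras CNN for frame-level polyp versus non-polyp classification is sketched below; the layer sizes and the 224x224 RGB input are assumptions for illustration, not the architecture of any specific study reviewed here.

```python
# Minimal illustration of how a colonoscopy frame passes through a CNN:
# convolution/pooling layers extract features, a dense head outputs a
# polyp probability. Sizes are illustrative, not from a specific study.
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input((224, 224, 3)),            # RGB colonoscopy frame
    keras.layers.Conv2D(16, 3, activation="relu"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(32, 3, activation="relu"),
    keras.layers.MaxPooling2D(2),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # probability a polyp is present
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy", keras.metrics.AUC()])
```

Walking through the layers of such a model, convolution by convolution, is one way to open the "black box" that the review says most primary studies leave unexplained.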


Author(s):  
Benedict Irwin ◽  
Thomas Whitehead ◽  
Scott Rowland ◽  
Samar Mahmoud ◽  
Gareth Conduit ◽  
...  

More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. However, this domain presents a significant challenge for AI methods due to the sparsity of compound data and the noise inherent in results from biological experiments. In this paper, we demonstrate how data imputation using deep learning provides substantial improvements over quantitative structure-activity relationship (QSAR) machine learning models that are widely applied in drug discovery. We present the largest-to-date successful application of deep-learning imputation to datasets which are comparable in size to the corporate data repository of a pharmaceutical company (678,994 compounds by 1166 endpoints). We demonstrate this improvement for three areas of practical application linked to distinct use cases: (i) target activity data compiled from a range of drug discovery projects; (ii) a high-value and heterogeneous dataset covering complex absorption, distribution, metabolism and elimination properties; and (iii) high-throughput screening data, testing the algorithm's limits on early-stage noisy and very sparse data. Achieving median coefficients of determination, R², of 0.69, 0.36 and 0.43 respectively across these applications, the deep learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R² values of 0.28, 0.19 and 0.23 respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with the accuracies in prediction, enabling greater confidence in decision-making based on the imputed values.
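The imputation setting differs from standard QSAR in that the model sees the sparse assay matrix itself rather than only compound descriptors. A hedged sketch of that general setup follows; it is not the authors' method, and the toy matrix dimensions, network shape and masked-loss formulation are assumptions used purely to illustrate training on observed compound-endpoint entries only.

```python
# Sketch of matrix-style imputation with a masked loss: the model takes the
# sparse endpoint matrix (missing values zero-filled, plus a mask) and is
# trained only on observed entries. Illustrative only; not the paper's method.
import numpy as np
import tensorflow as tf
from tensorflow import keras

n_compounds, n_endpoints = 500, 20               # assumed toy dimensions
rng = np.random.default_rng(0)
Y_true = rng.normal(size=(n_compounds, n_endpoints)).astype("float32")
mask = (rng.random(Y_true.shape) < 0.3).astype("float32")   # ~30% observed
Y_obs = Y_true * mask                             # unobserved entries zero-filled

inputs = keras.Input((2 * n_endpoints,))          # [values, mask] concatenated
h = keras.layers.Dense(128, activation="relu")(inputs)
h = keras.layers.Dense(128, activation="relu")(h)
outputs = keras.layers.Dense(n_endpoints)(h)
model = keras.Model(inputs, outputs)

def masked_mse(y_true_and_mask, y_pred):
    """MSE over observed entries only; y_true is packed as [values, mask]."""
    y_true, m = tf.split(y_true_and_mask, 2, axis=-1)
    return tf.reduce_sum(m * tf.square(y_true - y_pred)) / tf.reduce_sum(m)

model.compile(optimizer="adam", loss=masked_mse)
X_packed = np.concatenate([Y_obs, mask], axis=1)
model.fit(X_packed, X_packed, epochs=5, batch_size=64, verbose=0)

Y_imputed = model.predict(X_packed, verbose=0)    # read off missing entries here
```

Evaluating such imputed values endpoint by endpoint, and taking the median R² across endpoints, mirrors the way the abstract summarizes performance against the random forest QSAR baseline.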


Author(s):  
Kexin Huang ◽  
Tianfan Fu ◽  
Lucas M Glass ◽  
Marinka Zitnik ◽  
Cao Xiao ◽  
...  

Abstract Summary Accurate prediction of drug-target interactions (DTI) is crucial for drug discovery. Recently, deep learning (DL) models have shown promising performance for DTI prediction. However, these models can be difficult to use, both for computer scientists entering the biomedical field and for bioinformaticians with limited DL experience. We present DeepPurpose, a comprehensive and easy-to-use DL library for DTI prediction. DeepPurpose supports training of customized DTI prediction models by implementing 15 compound and protein encoders and over 50 neural architectures, along with providing many other useful features. We demonstrate state-of-the-art performance of DeepPurpose on several benchmark datasets. Availability and implementation https://github.com/kexinhuang12345/DeepPurpose. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
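A short usage sketch is given below, following the examples in the linked repository's README; the demo dataset loader, encoder names and configuration keys are taken from that documentation and may differ between library versions, so treat the exact calls as an assumption rather than a guaranteed API.

```python
# Sketch of training a DTI model with DeepPurpose, following the patterns in
# the repository README (https://github.com/kexinhuang12345/DeepPurpose).
# Function names and arguments may vary between library versions.
from DeepPurpose import utils, dataset
from DeepPurpose import DTI as models

# Load a benchmark dataset shipped with the library (DAVIS), as in the README.
X_drugs, X_targets, y = dataset.load_process_DAVIS(path='./data', binary=False)

# Pick one of the compound encoders and one of the protein encoders.
drug_encoding, target_encoding = 'CNN', 'CNN'
train, val, test = utils.data_process(
    X_drugs, X_targets, y,
    drug_encoding, target_encoding,
    split_method='random', frac=[0.7, 0.1, 0.2])

config = utils.generate_config(
    drug_encoding=drug_encoding,
    target_encoding=target_encoding,
    train_epoch=5,
    LR=0.001,
    batch_size=128)

model = models.model_initialize(**config)
model.train(train, val, test)
```

Swapping the encoder strings is how the library exposes its many compound/protein encoder combinations without requiring the user to write model code.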


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Shahenda Sarhan ◽  
Aida A. Nasr ◽  
Mahmoud Y. Shams

Multipose face recognition is one of the recent challenges faced by researchers interested in security applications. Various studies have addressed improving the accuracy of multipose face recognition by enhancing the face detector, as in Viola-Jones, Real AdaBoost, and the Cascade Object Detector, while others have concentrated on the recognition system itself, such as support vector machines and deep convolutional neural networks. In this paper, a combined adaptive deep learning vector quantization (CADLVQ) classifier is proposed. The proposed classifier addresses the weaknesses of adaptive deep learning vector quantization classifiers by combining a majority voting algorithm with the speeded-up robust features (SURF) extractor. Experimental results indicate that the proposed classifier provides promising results in terms of sensitivity, specificity, precision, and accuracy compared to recent approaches in deep learning, statistical methods, and classical neural networks. Finally, the comparison is performed empirically using a confusion matrix to ensure the reliability and robustness of the proposed system compared to the state of the art.
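The two ingredients named here, a vector-quantization classifier and majority voting over an ensemble, can be illustrated with a minimal LVQ1 update rule and a voting step in NumPy. This is a generic sketch of those ingredients under assumed toy data, not the proposed CADLVQ classifier, and the SURF feature extraction step is omitted.

```python
# Generic sketch of an LVQ1-style prototype classifier plus majority voting.
# Not the CADLVQ implementation; feature extraction (e.g. SURF) is omitted.
import numpy as np

def train_lvq1(X, y, prototypes_per_class=2, lr=0.05, epochs=30, seed=0):
    """LVQ1: pull the nearest prototype toward same-class samples and push it
    away from other-class samples."""
    rng = np.random.default_rng(seed)
    protos, proto_y = [], []
    for c in np.unique(y):
        idx = rng.choice(np.where(y == c)[0], prototypes_per_class, replace=True)
        protos.append(X[idx].copy())
        proto_y.append(np.full(prototypes_per_class, c))
    protos, proto_y = np.vstack(protos), np.concatenate(proto_y)
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            j = np.linalg.norm(protos - X[i], axis=1).argmin()
            sign = 1.0 if proto_y[j] == y[i] else -1.0
            protos[j] += sign * lr * (X[i] - protos[j])
    return protos, proto_y

def predict_lvq(protos, proto_y, X):
    d = np.linalg.norm(X[:, None, :] - protos[None, :, :], axis=2)
    return proto_y[d.argmin(axis=1)]

def majority_vote(predictions):
    """predictions: (n_models, n_samples) array of class labels."""
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, predictions)

# Toy example: three LVQ models trained on bootstrap samples, combined by vote.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 8)), rng.normal(3, 1, (50, 8))])
y = np.array([0] * 50 + [1] * 50)
preds = []
for seed in range(3):
    boot = rng.integers(0, len(X), len(X))
    protos, proto_y = train_lvq1(X[boot], y[boot], seed=seed)
    preds.append(predict_lvq(protos, proto_y, X))
print("ensemble accuracy:", (majority_vote(np.array(preds)) == y).mean())
```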


BMC Genomics ◽  
2019 ◽  
Vol 20 (S9) ◽  
Author(s):  
Yang-Ming Lin ◽  
Ching-Tai Chen ◽  
Jia-Ming Chang

Abstract Background Tandem mass spectrometry allows biologists to identify and quantify protein samples in the form of digested peptide sequences. When performing peptide identification, spectral library search is more sensitive than traditional database search but is limited to peptides that have been previously identified. An accurate tandem mass spectrum prediction tool is thus crucial in expanding the peptide space and increasing the coverage of spectral library search. Results We propose MS2CNN, a non-linear regression model based on deep convolutional neural networks, a deep learning algorithm. The features for our model are amino acid composition, predicted secondary structure, and physical-chemical features such as isoelectric point, aromaticity, helicity, hydrophobicity, and basicity. MS2CNN was trained with five-fold cross-validation on a three-way data split on the large-scale human HCD MS2 dataset of Orbitrap LC-MS/MS downloaded from the National Institute of Standards and Technology. It was then evaluated on a publicly available independent test dataset of human HeLa cell lysate from LC-MS experiments. On average, our model shows better cosine similarity and Pearson correlation coefficient (0.690 and 0.632) than MS2PIP (0.647 and 0.601) and is comparable with pDeep (0.692 and 0.642). Notably, for the more complex MS2 spectra of 3+ peptides, MS2CNN is significantly better than both MS2PIP and pDeep. Conclusions We showed that MS2CNN outperforms MS2PIP for 2+ and 3+ peptides and pDeep for 3+ peptides. This implies that MS2CNN, the proposed convolutional neural network model, generates highly accurate MS2 spectra for LC-MS/MS experiments using Orbitrap machines, which can be of great help in protein and peptide identification. The results suggest that incorporating more data into the deep learning model may improve performance.
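The evaluation reported here, cosine similarity and Pearson correlation between predicted and observed spectra, together with a convolutional regression head, can be sketched as follows. The per-residue feature layout, intensity bin count and toy data are assumptions for illustration and are not the MS2CNN code.

```python
# Sketch of a 1-D CNN regressor from per-residue peptide features to binned
# MS2 intensities, evaluated by cosine similarity and Pearson correlation.
# Feature layout, bin count and data are illustrative, not the MS2CNN model.
import numpy as np
from scipy.stats import pearsonr
from tensorflow import keras

MAX_LEN, N_FEATURES, N_BINS = 30, 24, 200   # assumed dimensions

model = keras.Sequential([
    keras.layers.Input((MAX_LEN, N_FEATURES)),        # padded per-residue features
    keras.layers.Conv1D(64, 3, activation="relu"),
    keras.layers.Conv1D(64, 3, activation="relu"),
    keras.layers.GlobalMaxPooling1D(),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dense(N_BINS, activation="relu"),    # non-negative intensities
])
model.compile(optimizer="adam", loss="mse")

# Toy training data standing in for HCD spectra.
rng = np.random.default_rng(0)
X = rng.random((256, MAX_LEN, N_FEATURES)).astype("float32")
Y = rng.random((256, N_BINS)).astype("float32")
model.fit(X, Y, epochs=2, batch_size=32, verbose=0)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

pred = model.predict(X[:5], verbose=0)
for p, t in zip(pred, Y[:5]):
    print("cosine:", round(cosine(p, t), 3), "pearson:", round(pearsonr(p, t)[0], 3))
```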


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1753 ◽  
Author(s):  
Hassan El-Khatib ◽  
Dan Popescu ◽  
Loretta Ichim

The main purpose of the study was to develop a high-accuracy system able to diagnose skin lesions using deep learning-based methods. We propose a new decision system based on multiple classifiers, including neural networks and feature-based methods. Each classifier (method) contributes to the final decision system with a certain weight, depending on its calculated accuracy, helping the system make a better decision. First, we created a neural network (NN) that can differentiate melanoma from benign nevus. The NN architecture is analyzed by evaluating it during the training process, and biostatistical parameters such as accuracy, specificity, sensitivity, and the Dice coefficient are calculated. Then, we developed three other methods based on convolutional neural networks (CNNs). The CNNs were pre-trained using the large ImageNet and Places365 databases: GoogleNet, ResNet-101, and NasNet-Large were used, in that order. The CNN architectures were fine-tuned using transfer learning in order to distinguish the different types of skin lesions, and the classification accuracies were determined. The last proposed method uses the classical approach to image object detection, in which features are extracted from the images and then classified; in this case, the classification was done using a support vector machine. Just as in the first method, the sensitivity, specificity, Dice similarity coefficient and accuracy are determined. A comparison of the results obtained from all the methods is then carried out. As mentioned above, the novelty of this paper is the integration of these methods into a global fusion-based decision system that uses the results obtained by each individual method to establish the fusion weights. The results obtained by carrying out experiments on two different free databases show that the proposed system offers higher accuracy.
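The fusion rule described, each classifier contributing to the final decision with a weight derived from its measured accuracy, can be written compactly. The sketch below assumes the per-classifier accuracies and class-probability outputs are already available; all numbers are placeholders, not results from the study.

```python
# Sketch of the accuracy-weighted fusion step: each classifier's class
# probabilities are weighted by its validation accuracy and summed.
# Accuracies and probability vectors below are placeholders.
import numpy as np

def fuse(probabilities, accuracies):
    """probabilities: (n_classifiers, n_classes); accuracies: (n_classifiers,).
    Returns the fused class index and the fused probability vector."""
    w = np.asarray(accuracies, dtype=float)
    w = w / w.sum()                        # normalize accuracies into weights
    fused = (w[:, None] * np.asarray(probabilities)).sum(axis=0)
    return int(fused.argmax()), fused

# Hypothetical outputs for one lesion image from four methods
# (NN, two fine-tuned CNNs, SVM on handcrafted features).
probs = [
    [0.30, 0.70],   # [melanoma, benign nevus] probabilities
    [0.20, 0.80],
    [0.55, 0.45],
    [0.40, 0.60],
]
accs = [0.82, 0.90, 0.88, 0.79]           # assumed per-method accuracies
label, fused = fuse(probs, accs)
print("fused decision:", "benign nevus" if label == 1 else "melanoma", fused)
```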

