Evaluation of Cross-Validation Strategies in Sequence-Based Binding Prediction Using Deep Learning

2018 ◽  
Author(s):  
Angela Lopez-del Rio ◽  
Alfons Nonell-Canals ◽  
David Vidal ◽  
Alexandre Perera-Lluna

Binding prediction between targets and drug-like compounds through Deep Neural Networks has generated promising results in recent years, outperforming traditional machine-learning-based methods. However, the generalization capability of these classification models is still an issue to be addressed. In this work, we explored how different cross-validation strategies, applied to data from different molecular databases, affect the performance of proteochemometric binding prediction models. These strategies are: (1) random splitting, (2) splitting based on K-means clustering (of both actives and inactives), (3) splitting based on source database, and (4) splitting based on both the clustering and the source database. These schemas are applied to a Deep Learning proteochemometrics model and to a simple logistic regression model used as a baseline. Additionally, two different ways of describing molecules in the model are tested: (1) by their SMILES and (2) by three fingerprints. The classification performance of our Deep-Learning-based proteochemometrics model is comparable to the state of the art. Our results show that the lack of generalization of these models is due to a bias in public molecular databases, and that a restrictive cross-validation schema based on compound clustering leads to worse but more robust and credible results. Our results also show better performance when molecules are represented by their fingerprints.
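The cluster-based splitting schema (strategy 2) can be sketched with off-the-shelf tools. This is a minimal illustration assuming scikit-learn is available, not the authors' pipeline; the random vectors stand in for compound descriptors, and the cluster/fold counts are arbitrary:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))  # stand-in compound descriptor vectors

# Assign each compound to a K-means cluster; clusters act as CV groups,
# so similar compounds never appear on both sides of a split.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

gkf = GroupKFold(n_splits=5)
for train_idx, test_idx in gkf.split(X, groups=clusters):
    # No cluster is shared between the training and test folds.
    assert set(clusters[train_idx]).isdisjoint(set(clusters[test_idx]))
```

Because whole clusters are held out, test compounds are structurally dissimilar from training compounds, which is what makes this schema more restrictive than random splitting.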



2019 ◽  
Vol 59 (4) ◽  
pp. 1645-1657 ◽  
Author(s):  
Angela Lopez-del Rio ◽  
Alfons Nonell-Canals ◽  
David Vidal ◽  
Alexandre Perera-Lluna

2021 ◽  
pp. 27-38
Author(s):  
Rafaela Carvalho ◽  
João Pedrosa ◽  
Tudor Nedelcu

Skin cancer is one of the most common types of cancer and, with its increasing incidence, accurate early diagnosis is crucial to improve the prognosis of patients. In the process of visual inspection, dermatologists follow specific dermoscopic algorithms and identify important features to provide a diagnosis. This process can be automated, as such characteristics can be extracted by computer vision techniques. Although deep neural networks can extract useful features from digital images for skin lesion classification, performance can be improved by providing additional information. The extracted pseudo-features can be used as input (multimodal) or as output (multi-tasking) to train a robust deep learning model. This work investigates multimodal and multi-tasking techniques for more efficient training (given the simultaneous optimization of several related tasks in the latter) and for generating better diagnosis predictions. Additionally, the role of lesion segmentation is also studied. Results show that multi-tasking improves the learning of beneficial features, which leads to better predictions, and that pseudo-features inspired by the ABCD rule provide readily available, helpful information about the skin lesion.
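The multi-tasking setup can be sketched in miniature: a shared representation feeds both a diagnosis head and a pseudo-feature head, and the training loss is a weighted sum of the per-task losses. The NumPy sketch below is purely illustrative; the embedding size, head dimensions, and loss weight are assumptions, not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared trunk output for a batch of 8 lesions (hypothetical 64-d embedding).
z = rng.normal(size=(8, 64))

# Two heads sharing the trunk: diagnosis logits (e.g. 7 lesion classes)
# and ABCD-style pseudo-feature regression (asymmetry, border, color, diameter).
W_diag = rng.normal(size=(64, 7))
W_abcd = rng.normal(size=(64, 4))
diag_logits = z @ W_diag   # (8, 7) classification output
abcd_pred = z @ W_abcd     # (8, 4) auxiliary regression output

# Multi-task objective: weighted sum of per-task losses
# (dummy zero targets here, so the losses are just mean squares).
loss = np.mean(diag_logits ** 2) + 0.5 * np.mean(abcd_pred ** 2)
```

Optimizing both heads against the shared trunk is what forces the network to learn features useful for diagnosis and for the ABCD pseudo-features at once.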


2020 ◽  
Vol 10 (7) ◽  
pp. 2488 ◽  
Author(s):  
Muhammad Naseer Bajwa ◽  
Kaoru Muta ◽  
Muhammad Imran Malik ◽  
Shoaib Ahmed Siddiqui ◽  
Stephan Alexander Braun ◽  
...  

The propensity of skin diseases to manifest in a variety of forms, the lack and maldistribution of qualified dermatologists, and the exigency of timely and accurate diagnosis call for automated Computer-Aided Diagnosis (CAD). This study aims at extending previous works on CAD for dermatology by exploring the potential of Deep Learning to classify hundreds of skin diseases, improving classification performance, and utilizing disease taxonomy. We trained state-of-the-art Deep Neural Networks on two of the largest publicly available skin image datasets, namely DermNet and ISIC Archive, and also leveraged disease taxonomy, where available, to improve the classification performance of these models. On DermNet we establish a new state of the art with 80% accuracy and 98% Area Under the Curve (AUC) for classification of 23 diseases. We also set a precedent for classifying all 622 unique sub-classes in this dataset, achieving 67% accuracy and 98% AUC. On ISIC Archive we classified all 7 diseases with 93% average accuracy and 99% AUC. This study shows that Deep Learning has great potential to classify a vast array of skin diseases with near-human accuracy and far better reproducibility. It can have a promising role in practical real-time skin disease diagnosis by assisting physicians in large-scale screening using clinical or dermoscopic images.


Sensors ◽  
2021 ◽  
Vol 21 (4) ◽  
pp. 1480
Author(s):  
Nur-A-Alam Alam ◽  
Mominul Ahsan ◽  
Md. Abdul Based ◽  
Julfikar Haider ◽  
Marcin Kowalski

Currently, COVID-19, caused by the novel coronavirus, is considered one of the most dangerous and deadly diseases for the human body. In December 2019, the coronavirus, thought to have originated in Wuhan, China, spread rapidly around the world and is responsible for a large number of deaths. Earlier detection of COVID-19 through accurate diagnosis, particularly for cases with no obvious symptoms, may decrease the patient death rate. Chest X-ray images are primarily used for the diagnosis of this disease. This research proposes a machine vision approach to detect COVID-19 from chest X-ray images. The features extracted by the histogram of oriented gradients (HOG) and a convolutional neural network (CNN) from X-ray images were fused to develop the classification model through training by a CNN (VGGNet). A modified anisotropic diffusion filtering (MADF) technique was employed for better edge preservation and reduced noise in the images. A watershed segmentation algorithm was used to mark the significant fracture region in the input X-ray images. The testing stage considered generalized data for performance evaluation of the model. Cross-validation analysis revealed that a 5-fold strategy could successfully mitigate the overfitting problem. The proposed feature fusion using the deep learning technique assured satisfactory performance in identifying COVID-19 compared to the immediately relevant works, with a testing accuracy of 99.49%, specificity of 95.7%, and sensitivity of 93.65%. When compared to other classification techniques, such as ANN, KNN, and SVM, the CNN technique used in this study showed better classification performance. K-fold cross-validation demonstrated that the proposed feature fusion technique (98.36%) provided higher accuracy than the individual feature extraction methods, such as HOG (87.34%) or CNN (93.64%).
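The HOG/CNN feature-fusion step can be sketched as follows. This is a simplified NumPy illustration, not the paper's code: the helper `grad_orientation_hist` is a hypothetical stand-in for full HOG (which uses local cells and block normalization), and the CNN features here are random placeholders:

```python
import numpy as np

def grad_orientation_hist(img, bins=9):
    """Simplified stand-in for HOG: a global histogram of gradient
    orientations weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    return hist / (hist.sum() + 1e-8)

rng = np.random.default_rng(0)
xray = rng.random((64, 64))                    # stand-in chest X-ray patch
hog_feat = grad_orientation_hist(xray)         # hand-crafted features, shape (9,)
cnn_feat = rng.normal(size=128)                # stand-in deep CNN features

# Fusion: concatenate the two descriptors into one vector for the classifier.
fused = np.concatenate([hog_feat, cnn_feat])   # shape (137,)
```

The fused vector is what a downstream classifier would be trained on, combining edge-orientation statistics with learned deep features.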


2021 ◽  
Author(s):  
Chao Cong ◽  
Yoko Kato ◽  
Henrique D. Vasconcellos ◽  
Mohammad R. Ostovaneh ◽  
Joao A.C. Lima ◽  
...  

Background: Automatic coronary angiography (CAG) assessment may help in faster screening and diagnosis of patients. Current CNN-based vessel segmentation suffers from sampling imbalance, candidate frame selection, and overfitting; few methods have shown adequate performance for CAG stenosis classification. We aimed to provide an end-to-end workflow that may solve these problems.
Methods: A deep-learning-based end-to-end workflow was employed as follows: (1) candidate frame selection from CAG videograms with a CNN+LSTM network; (2) stenosis classification with Inception-v3 using 2 or 3 categories (<25%, >25%, and/or total occlusion), with and without redundancy training; and (3) stenosis localization with two methods, class activation maps (CAM) and an anchor-based feature pyramid network (FPN). Overall, 13,744 frames from 230 studies were used for stenosis classification training and 4-fold cross-validation at the image, artery, and per-patient levels. For stenosis localization training and 4-fold cross-validation, 690 images with >25% stenosis were used.
Results: Our model achieved an accuracy of 0.85, sensitivity of 0.96, and AUC of 0.86 in per-patient-level stenosis classification. Redundancy training was effective in improving classification performance. Stenosis localization was adequate, with better quantitative results in the anchor-based FPN model, which achieved global sensitivity for the LCA and RCA of 0.68 and 0.70, with mean square error (MSE) values of 39.3 and 37.6 pixels, respectively, in the 520 × 520 pixel image.
Conclusion: A fully automatic, end-to-end, deep-learning-based workflow that eliminates the vessel extraction and segmentation step was feasible for coronary artery stenosis classification and localization on CAG images.
Key Points: The fully automatic, end-to-end workflow, which eliminated the vessel extraction and segmentation step for supervised learning, was feasible for stenosis classification on CAG images, achieving an accuracy of 0.85, sensitivity of 0.96, and AUC of 0.86 at the per-patient level. Redundancy training improved the AUC values, accuracy, F1-score, and kappa score of the stenosis classification. Stenosis localization was assessed with two methods, CAM-based and anchor-based models, whose performance was acceptable, with better quantitative results in the anchor-based models.
Summary Statement: A fully automatic, end-to-end, deep-learning-based workflow that eliminated the vessel extraction and segmentation step was feasible for stenosis classification and localization on CAG images. Redundancy training improved stenosis classification performance.
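The CAM-based localization method used above can be sketched in NumPy: a class activation map weights the final convolutional feature maps by the classifier weights of the target class, and the peak of the map gives a coarse location. The feature-map size, channel count, and weights below are hypothetical placeholders, not values from the study:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in final convolutional feature maps: 8 channels on a 13x13 grid.
fmaps = rng.random((8, 13, 13))
# Stand-in classifier weights for the target class (e.g. ">25% stenosis").
w_class = rng.normal(size=8)

# CAM = class-weighted sum over channels (upsampling to image size omitted).
cam = np.tensordot(w_class, fmaps, axes=1)           # shape (13, 13)

# The argmax of the map gives a coarse localization of the lesion.
peak = np.unravel_index(np.argmax(cam), cam.shape)
```

In practice the CAM is upsampled to the input resolution (e.g. 520 × 520) and thresholded to outline the stenotic region; the anchor-based FPN alternative instead regresses bounding boxes directly.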


2021 ◽  
Vol 11 (6) ◽  
pp. 7757-7762
Author(s):  
K. Aldriwish

Internet of Things (IoT)-based systems need to be up to date on cybersecurity threats. The security of IoT networks is challenged by software piracy and malware attacks, and much important information can be stolen and used for cybercrimes. This paper attempts to improve IoT cybersecurity by proposing a combined model based on deep learning to detect malware and software piracy across the IoT network. The malware detection model is based on Deep Convolutional Neural Networks (DCNNs). Apart from this, TensorFlow Deep Neural Networks (TFDNNs) are introduced to detect software piracy threats through source code plagiarism. The investigation is conducted on the Google Code Jam (GCJ) dataset. The conducted experiments prove that the classification performance achieves a high accuracy of about 98%.


Author(s):  
Yuejun Liu ◽  
Yifei Xu ◽  
Xiangzheng Meng ◽  
Xuguang Wang ◽  
Tianxu Bai

Background: Medical imaging plays an important role in the diagnosis of thyroid diseases. In the field of machine learning, multi-dimensional deep learning algorithms are widely used in image classification and recognition and have achieved great success. Objective: A method based on multi-dimensional deep learning is employed for the auxiliary diagnosis of thyroid diseases from SPECT images. The performances of different deep learning models are evaluated and compared. Methods: Thyroid SPECT images of three types are collected: hyperthyroidism, normal, and hypothyroidism. In pre-processing, the thyroid region of interest is segmented and the data sample is augmented. Four models (CNN, Inception, VGG16, and RNN) are used to evaluate the deep learning methods. Results: The deep-learning-based methods have good classification performance, with accuracy of 92.9%-96.2% and AUC of 97.8%-99.6%. The VGG16 model has the best performance, with accuracy of 96.2% and AUC of 99.6%. In particular, the VGG16 model with a changing learning rate works best. Conclusion: The standard CNN, Inception, VGG16, and RNN deep learning models are efficient for the classification of thyroid diseases from SPECT images. The accuracy of the deep-learning-based assisted diagnostic method is higher than that of other methods reported in the literature.


Sensors ◽  
2020 ◽  
Vol 20 (6) ◽  
pp. 1579
Author(s):  
Dongqi Wang ◽  
Qinghua Meng ◽  
Dongming Chen ◽  
Hupo Zhang ◽  
Lisheng Xu

Automatic detection of arrhythmia is of great significance for the early prevention and diagnosis of cardiovascular disease. Traditional feature engineering methods based on expert knowledge lack multi-dimensional, multi-view information abstraction and data representation ability, so traditional pattern-recognition research on arrhythmia detection cannot achieve satisfactory results. Recently, with the rise of deep learning technology, automatic feature extraction from ECG data based on deep neural networks has been widely discussed. In order to exploit the complementary strengths of different schemes, in this paper we propose an arrhythmia detection method based on a multi-resolution representation (MRR) of ECG signals. This method utilizes four different state-of-the-art deep neural networks as four channel models for learning ECG vector representations. The deep-learning-based representations, together with hand-crafted ECG features, form the MRR, which is the input of the downstream classification strategy. The experimental results of multi-label classification on a large ECG dataset confirm that the F1 score of the proposed method is 0.9238, which is 1.31%, 0.62%, 1.18%, and 0.6% higher than that of each channel model. From the perspective of architecture, the proposed method is highly scalable and can serve as an example for arrhythmia recognition.


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
A. Wong ◽  
Z. Q. Lin ◽  
L. Wang ◽  
A. G. Chung ◽  
B. Shen ◽  
...  

A critical step in effective care and treatment planning for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the cause of the coronavirus disease 2019 (COVID-19) pandemic, is the assessment of the severity of disease progression. Chest X-rays (CXRs) are often used to assess SARS-CoV-2 severity, with two important assessment metrics being extent of lung involvement and degree of opacity. In this proof-of-concept study, we assess the feasibility of computer-aided scoring of CXRs of SARS-CoV-2 lung disease severity using a deep learning system. Data consisted of 396 CXRs from SARS-CoV-2 positive patient cases. Geographic extent and opacity extent were scored by two board-certified expert chest radiologists (with 20+ years of experience) and a second-year radiology resident. The deep neural networks used in this study, which we name COVID-Net S, are based on a COVID-Net network architecture. 100 versions of the network were independently learned (50 to perform geographic extent scoring and 50 to perform opacity extent scoring) using random subsets of CXRs from the study, and we evaluated the networks using stratified Monte Carlo cross-validation experiments. The COVID-Net S deep neural networks yielded R² of 0.664 ± 0.032 and 0.635 ± 0.044 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively, in stratified Monte Carlo cross-validation experiments. The best-performing COVID-Net S networks achieved R² of 0.739 and 0.741 between predicted scores and radiologist scores for geographic extent and opacity extent, respectively. The results are promising and suggest that the use of deep neural networks on CXRs could be an effective tool for computer-aided assessment of SARS-CoV-2 lung disease severity, although additional studies are needed before adoption for routine clinical use.
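The Monte Carlo cross-validation protocol (repeated random train/test splits with an R² readout, reported as mean ± standard deviation) can be sketched as follows. This is an illustrative reimplementation assuming scikit-learn, with synthetic data and a ridge regressor standing in for COVID-Net S; it is not the study's code:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import ShuffleSplit

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                             # stand-in image features
y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=200)   # stand-in severity scores

# Monte Carlo cross-validation: many independent random splits, each
# scored by R^2 between predictions and the reference scores.
scores = []
for tr, te in ShuffleSplit(n_splits=50, test_size=0.25, random_state=0).split(X):
    model = Ridge().fit(X[tr], y[tr])
    scores.append(r2_score(y[te], model.predict(X[te])))

print(f"R^2 = {np.mean(scores):.3f} ± {np.std(scores):.3f}")
```

Unlike k-fold CV, the 50 random splits are drawn independently, which matches training 50 network versions on random subsets; the stratification step of the study (balancing severity scores across splits) is omitted here for brevity.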

