iProDNA-CapsNet: identifying protein-DNA binding residues using capsule neural networks

2019 ◽  
Vol 20 (S23) ◽  
Author(s):  
Binh P. Nguyen ◽  
Quang H. Nguyen ◽  
Giang-Nam Doan-Ngoc ◽  
Thanh-Hoang Nguyen-Vo ◽  
Susanto Rahardja

Abstract Background Because protein-DNA interactions are essential to diverse biological events, accurately locating DNA-binding residues is necessary. This remains a challenging task in the post-genomic era, in which protein sequence data have expanded rapidly. In this study, we propose iProDNA-CapsNet, a new prediction model that identifies protein-DNA binding residues using an ensemble of capsule neural networks (CapsNets) on position-specific scoring matrix (PSSM) profiles. The use of CapsNets offers a new approach to locating DNA-binding residues. The benchmark datasets introduced by Hu et al. (2017), PDNA-543 and PDNA-TEST, were used to train and evaluate the model, respectively. To assess the model's performance fairly, iProDNA-CapsNet was compared with existing state-of-the-art methods. Results Under the decision threshold corresponding to a false positive rate (FPR) ≈ 5%, the accuracy, sensitivity, precision, and Matthews correlation coefficient (MCC) of our model are increased by about 2.0%, 2.0%, 14.0%, and 5.0% with respect to TargetDNA (Hu et al., 2017) and by 1.0%, 75.0%, 45.0%, and 77.0% with respect to BindN+ (Wang et al., 2010), respectively. Compared with other methods that do not report their threshold settings, iProDNA-CapsNet also shows a significant improvement on most of the evaluation metrics. Even with different patterns of change among the models, iProDNA-CapsNet remains the best model, with top performance on most of the metrics, especially MCC, which is boosted by about 8.0% to 220.0%. Conclusions According to all evaluation metrics under various decision thresholds, iProDNA-CapsNet performs better than the two current best models (BindN and TargetDNA). Our proposed approach also shows that CapsNets can potentially be adopted in other biological applications.
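The comparison above fixes the decision threshold at a target FPR and then reports accuracy, sensitivity, precision, and MCC. The following is a minimal sketch of that evaluation procedure, not the authors' code: the function names, the toy scores, and the 20% target FPR are illustrative assumptions (the abstract uses FPR ≈ 5% on PDNA-TEST).

```python
import math


def confusion_counts(scores, labels, threshold):
    """Count TP, FP, TN, FN when a residue is called binding for score >= threshold."""
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def threshold_at_fpr(scores, labels, target_fpr):
    """Return the lowest observed score whose FPR does not exceed target_fpr."""
    for candidate in sorted(set(scores)):
        if metrics(*confusion_counts(scores, labels, candidate))["fpr"] <= target_fpr:
            return candidate
    return None  # no observed score meets the target


# Toy data (hypothetical): 3 binding residues, 5 non-binding residues.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]

threshold = threshold_at_fpr(scores, labels, target_fpr=0.2)
result = metrics(*confusion_counts(scores, labels, threshold))
print(threshold, result)
```

Fixing the FPR first puts every method at the same operating point, so differences in sensitivity, precision, and MCC are not an artifact of each method choosing its own threshold.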

Author(s):  
Abhijeet Bhattacharya ◽  
Tanmay Baweja ◽  
S. P. K. Karri

The electroencephalogram (EEG) is the most promising and efficient technique for studying epilepsy, as it records the electrical activity of the brain. Automated screening of epilepsy through data-driven algorithms reduces the manual workload of doctors diagnosing epilepsy. Recent algorithms are biased towards either signal processing or deep learning, each of which has its own advantages and disadvantages. The proposed pipeline is an end-to-end automated seizure prediction framework that blends signal processing and deep learning: Fourier-transform feature extraction feeds a deep-learning transformer model, which learns to attend automatically to the relevant regions of EEG signals for effective screening. The proposed pipeline demonstrated superior performance on the benchmark datasets, with average sensitivities of 98.46% and 94.83% and false-positive rates per hour (FPR/h) of 0.12439 and 0, respectively. These results indicate strong potential for clinical use as a support system alongside medical experts monitoring patients.


2019 ◽  
Vol 2019 ◽  
pp. 1-9 ◽  
Author(s):  
Gabriele Valvano ◽  
Gianmarco Santini ◽  
Nicola Martini ◽  
Andrea Ripoli ◽  
Chiara Iacconi ◽  
...  

Clusters of microcalcifications can be an early sign of breast cancer. In this paper, we propose a novel approach based on convolutional neural networks for the detection and segmentation of microcalcification clusters. We used 283 mammograms to train and validate our model, obtaining an accuracy of 99.99% on microcalcification detection and a false positive rate of 0.005%. Our results show that deep learning could be an effective tool to support radiologists during mammogram examination.


2019 ◽  
Vol 20 (S3) ◽  
Author(s):  
Liu Liu ◽  
Xiuzhen Hu ◽  
Zhenxing Feng ◽  
Xiaojin Zhang ◽  
Shan Wang ◽  
...  

Abstract Background Proteins perform their functions by interacting with acid radical ions. Precisely predicting the binding residues of acid radical ion ligands remains a challenging task in molecular drug design. Results In this study, we propose an improved method to predict acid radical ion binding residues using a K-nearest neighbors classifier. We constructed datasets of binding residues for four acid radical ion ligands (NO₂⁻, CO₃²⁻, SO₄²⁻, PO₄³⁻) from the BioLiP database. Then, based on the optimal window length for each acid radical ion ligand, we refined the composition information and position conservation information and extracted them as feature parameters for the K-nearest neighbors classifier. In 5-fold cross-validation, the Matthews correlation coefficient was higher than 0.45, the values of accuracy, sensitivity and specificity were all higher than 69.2%, and the false positive rate was lower than 30.8%. We also performed an independent test to assess the practicability of the proposed method: the sensitivity was higher than 40.9%, the values of accuracy and specificity were higher than 84.2%, the Matthews correlation coefficient was higher than 0.116, and the false positive rate was lower than 15.4%. Finally, we identified binding residues of the six metal ion ligands. In the predicted results, the values of accuracy, sensitivity and specificity were all higher than 77.6%, the Matthews correlation coefficient was higher than 0.6, and the false positive rate was lower than 19.6%. Conclusions Taken together, the good results of our prediction method add new insights into the prediction of the binding residues of acid radical ion ligands.
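Window-based feature extraction of the kind mentioned above can be sketched as follows. This is an illustrative assumption about the general technique, not the authors' feature definition: the padding symbol, the function names, and the plain amino-acid composition vector are hypothetical, and the abstract's "refined" composition and conservation features are not reproduced here.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_windows(sequence, window_length, pad="X"):
    """Yield one window per residue, centred on it and padded at both termini."""
    if window_length % 2 == 0:
        raise ValueError("window_length must be odd so the window has a centre")
    half = window_length // 2
    padded = pad * half + sequence + pad * half
    for i in range(len(sequence)):
        yield padded[i:i + window_length]


def composition(window):
    """Fraction of each of the 20 standard amino acids in the window."""
    counts = Counter(window)
    return [counts[aa] / len(window) for aa in AMINO_ACIDS]


# Hypothetical three-residue fragment with a window length of 3.
windows = list(residue_windows("MKV", 3))
features = [composition(w) for w in windows]
print(windows)
```

Each residue then has a fixed-length feature vector that a nearest-neighbour classifier can compare by distance. The window length would be tuned per ligand, as the abstract describes.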


2020 ◽  
Vol 9 (2) ◽  
pp. 59-79
Author(s):  
Heisnam Rohen Singh ◽  
Saroj Kr Biswas

Recent trends in data mining and machine learning focus on knowledge extraction and explanation to support crucial decisions from data, but data are typically enormous in size and often noisy. Neuro-fuzzy systems are most suitable for representing knowledge in a data-driven environment. Many neuro-fuzzy systems have been proposed for feature selection and classification; however, they focus on quantitative performance (accuracy) rather than qualitative performance (transparency). Such systems include Enhance Neuro-Fuzzy (ENF) and Adaptive Dynamic Clustering Neuro-Fuzzy (ADCNF). Here, a neuro-fuzzy system is proposed for feature selection and classification with improved accuracy and transparency. The novelty of the proposed system lies in determining a significant number of linguistic features for each input and in suggesting a compelling order of classification rules using the importance of the input features and the certainty of the rules. The performance of the proposed system is tested on 8 benchmark datasets. 10-fold cross-validation is used to compare the accuracy of the systems. Other performance measures, such as false positive rate, precision, recall, F-measure, Matthews correlation coefficient and Nauck's index, are also used for comparing the systems. The experimental results show that the proposed system is superior to the existing neuro-fuzzy systems.


2019 ◽  
Vol 12 (4) ◽  
pp. 294-305
Author(s):  
Balázs Szűcs ◽  
Áron Ballagi

Machine learning and artificial neural networks are currently hot topics. These methods are gaining more and more ground in everyday life. In addition to everyday usage, increasing emphasis is being placed on industrial use. In research and development, materials science and robotics, and thanks to the spread of Industry 4.0 and digitalization, more and more machine-learning-based systems are being introduced in production. This paper gives examples of possible ways of using machine learning algorithms in manufacturing, as well as of reducing the pseudo-error (false positive) rate of machine vision quality control systems. Even the simplest algorithms and models can be very effective on real-world problems. With the use of convolutional neural networks, the pseudo-error rate of the examined system can be reduced.


Phishing attacks have risen by 209% in the last 10 years according to Anti-Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models by either accuracy or F1-score; in this paper, however, we argue that a single metric alone will never correlate with a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning the trade-off is important since a higher or lower FPR/FNR will affect user acceptance of any deployed phishing detection model. Models with a high FPR tend to block users from accessing legitimate webpages, whereas models with a high FNR allow users to inadvertently access phishing webpages. Either extreme may cause a user base to complain (due to blocked pages) or fall victim to phishing attacks. Phishing detection models should be tuned according to the security needs of a deployment (secure vs relaxed setting). In this paper, we demonstrate two effective techniques for tuning the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques on a data set of 50,000 phishing and 50,000 legitimate sites, using three common machine learning algorithms: Random Forest, Logistic Regression, and Neural Networks. Using our techniques, we are able to regulate a model's FPR/FNR. Among the three algorithms, Neural Networks performed best, achieving an F1-score of 0.98 with corresponding FPR/FNR values of 0.0003 and 0.0198, respectively.
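The second technique, adjusting the probabilistic prediction threshold, can be illustrated with a short sketch. The probabilities, labels, and threshold values below are hypothetical toy data, and `fpr_fnr` is an illustrative helper rather than anything from the paper.

```python
def fpr_fnr(probs, labels, threshold):
    """Return (FPR, FNR) when a page is flagged as phishing for prob >= threshold."""
    false_positives = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    false_negatives = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    return false_positives / labels.count(0), false_negatives / labels.count(1)


# Hypothetical predicted phishing probabilities and true labels (1 = phishing).
probs = [0.95, 0.85, 0.70, 0.55, 0.45, 0.30, 0.20, 0.05]
labels = [1, 1, 1, 0, 1, 0, 0, 0]

# A low threshold suits a "secure" setting, a high one a "relaxed" setting.
sweep = {t: fpr_fnr(probs, labels, t) for t in (0.25, 0.50, 0.75)}
for t, (fpr, fnr) in sweep.items():
    print(f"threshold={t:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

Raising the threshold lowers the FPR at the cost of a higher FNR, which is the trade-off the abstract describes. The first technique, varying the class distribution of the training data, changes the model itself and is not shown here.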


Trudy NAMI ◽  
2021 ◽  
pp. 37-47
Author(s):  
P. A. Vasin ◽  
I. A. Kulikov

Introduction (problem statement and relevance). This article deals with the problem of training artificial neural networks intended to analyze images of the surrounding space in automotive computer vision systems. The conventional training approach uses loss functions that only improve the overall identification quality and make no distinction between types of possible false predictions. However, the traffic safety risks associated with different types of prediction errors are unequal, being higher for false positive estimations. The purpose of this work is to propose improved loss functions that include penalties for false positive predictions, and to study how using these functions affects the behavior of a convolutional neural network when estimating the drivable space. Methodology and research methods. The proposed loss functions are based on the Sørensen-Dice coefficient and differ from each other in their approaches to penalizing false positive errors. The performance of the trained neural networks is evaluated using three metrics, namely, the Jaccard coefficient, False Positive Rate and False Negative Rate. The proposed solutions are compared with the conventional one by calculating the ratios of their respective metrics. Scientific novelty and results. Improved loss functions featuring penalties for false positive estimations have been proposed for training computer vision algorithms. An experimental study of the trained neural networks on a test dataset has shown that the improved loss functions reduce the False Positive Rate by 21%. The practical significance of this work lies in the proposed method of training neural networks, which makes it possible to increase the safety of automated driving through improved accuracy in analyzing the surrounding space using computer vision systems.
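One way a Dice-based loss can penalize false positives is sketched below. The abstract does not give the authors' formulas, so the weighted form here (a Tversky-style variant with a hypothetical `fp_weight` parameter) is an assumption used only to illustrate the idea on flattened masks.

```python
def soft_counts(pred, target):
    """Soft TP, FP, FN for predicted probabilities against a binary mask."""
    tp = sum(p * t for p, t in zip(pred, target))
    fp = sum(p * (1 - t) for p, t in zip(pred, target))
    fn = sum((1 - p) * t for p, t in zip(pred, target))
    return tp, fp, fn


def dice_loss(pred, target, eps=1e-7):
    """Conventional loss: 1 - Dice, where Dice = 2TP / (2TP + FP + FN)."""
    tp, fp, fn = soft_counts(pred, target)
    return 1.0 - (2 * tp + eps) / (2 * tp + fp + fn + eps)


def fp_penalized_dice_loss(pred, target, fp_weight=2.0, eps=1e-7):
    """Assumed variant: false positives count fp_weight times in the denominator."""
    tp, fp, fn = soft_counts(pred, target)
    return 1.0 - (2 * tp + eps) / (2 * tp + fp_weight * fp + fn + eps)


# Toy drivable-space mask (1 = drivable) and two hypothetical predictions.
target = [1, 1, 0, 0]
pred_with_fp = [1, 1, 1, 0]  # one non-drivable pixel marked drivable
pred_with_fn = [1, 0, 0, 0]  # one drivable pixel missed

print(dice_loss(pred_with_fp, target), fp_penalized_dice_loss(pred_with_fp, target))
print(dice_loss(pred_with_fn, target), fp_penalized_dice_loss(pred_with_fn, target))
```

With `fp_weight=1.0` the variant reduces to the conventional Dice loss. A larger weight raises the loss only for predictions that contain false positives, which pushes training away from the error type that the abstract identifies as the greater safety risk.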


2020 ◽  
Author(s):  
Yen-Tin Chen ◽  
Tzu-Yi Lin ◽  
Po-Jen Cheng ◽  
Kok-Seong Chan ◽  
Hui-Yu Huang ◽  
...  

Abstract Background First-trimester screening is essential to preeclampsia (PE) prevention. The Fetal Medicine Foundation (FMF) model combines maternal characteristics with mean arterial pressure (MAP), uterine artery pulsatility index (UtAPI) and placental growth factor (PlGF) to estimate risk. A high detection rate (DR) has been observed in Asia. This study aims to evaluate the performance of screening in Taiwan. Methods This was a prospective, non-interventional study conducted between January 2017 and June 2018. Data were collected from 700 pregnant women at 11+0 to 13+6 weeks of gestation. Maternal characteristics were recorded. MAP, UtAPI and PlGF were measured and converted into multiples of the median (MoM). Patient-specific risks were calculated with the FMF model. Screening performance was examined by ROC curve and DR. Results 25 women (3.57%) developed PE, including 8 with preterm PE (1.14%). In preterm PE, the mean MoM of MAP and UtAPI were higher (1.096 vs 1.000; 1.084 vs 1.035) and the mean MoM of PlGF was lower (0.927 vs 1.031). The DR for preterm PE was 12.5%, 50.0%, 50.0% and 62.5% at false-positive rates (FPR) of 5%, 10%, 15% and 20%, respectively. Conclusion The FMF model showed a high DR for PE in Taiwan. Integrating PE and Down syndrome screening could establish a one-step workflow.
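The reported percentages can be checked with simple arithmetic. In the sketch below, the cohort size and case counts come from the abstract, while the number of detected cases at each FPR is inferred from the reported DR values under the assumption that DR is computed over the 8 preterm PE cases.

```python
total_women = 700
pe_cases = 25
preterm_pe_cases = 8

pe_incidence = pe_cases / total_women
preterm_incidence = preterm_pe_cases / total_women

# Detected preterm PE cases at each FPR cut-off (inferred, not stated in the abstract).
detected_at_fpr = {0.05: 1, 0.10: 4, 0.15: 4, 0.20: 5}
detection_rate = {fpr: n / preterm_pe_cases for fpr, n in detected_at_fpr.items()}

print(f"PE incidence: {pe_incidence:.2%}, preterm PE incidence: {preterm_incidence:.2%}")
for fpr, dr in detection_rate.items():
    print(f"FPR {fpr:.0%}: DR {dr:.1%}")
```

With only 8 preterm cases, each additional detected case moves the DR by 12.5 percentage points, so the reported DR values are coarse estimates.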

