UPCLASS: a deep learning-based classifier for UniProtKB entry publications

Database ◽  
2020 ◽  
Vol 2020 ◽  
Author(s):  
Douglas Teodoro ◽  
Julien Knafou ◽  
Nona Naderi ◽  
Emilie Pasche ◽  
Julien Gobeill ◽  
...  

Abstract In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing computationally mapped bibliographies in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge of categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for the protein. We propose a model that divides the document into parts containing and not containing evidence for the protein annotation. Then, we use these parts to create different feature sets for each accession and feed them to separate layers of the network. The CNN model achieved a micro F1-score of 0.72 and a macro F1-score of 0.62, outperforming baseline models based on logistic regression and support vector machine by up to 22 and 18 percentage points, respectively. We believe that such an approach could be used to systematically categorize the computationally mapped bibliography in UniProtKB, which represents a significant set of the publications, and help curators to decide whether a publication is relevant for further curation for a protein accession. Database URL: https://goldorak.hesge.ch/bioexpclass/upclass/.
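The gap between the micro and macro F1-scores reported above (0.72 versus 0.62) is typical of multi-label problems with unevenly sized categories. The following is a minimal sketch of how the two averages can be computed, not the authors' evaluation code. The category names come from the abstract; the function names and the toy gold and predicted label sets are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined here as 0.0 when all counts are zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(
    gold: List[Set[str]], pred: List[Set[str]], labels: List[str]
) -> Tuple[float, float]:
    """Return (micro F1, macro F1) for multi-label predictions."""
    # Per-category counts: [true positives, false positives, false negatives]
    counts: Dict[str, List[int]] = {c: [0, 0, 0] for c in labels}
    for g, p in zip(gold, pred):
        for c in labels:
            if c in p and c in g:
                counts[c][0] += 1
            elif c in p:
                counts[c][1] += 1
            elif c in g:
                counts[c][2] += 1
    # Macro: average the per-category F1 scores, so rare categories weigh equally.
    macro = sum(f1_from_counts(*counts[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all categories, then compute a single F1.
    tp, fp, fn = (sum(counts[c][i] for c in labels) for i in range(3))
    micro = f1_from_counts(tp, fp, fn)
    return micro, macro


# Hypothetical toy data: one (publication, accession) pair per list entry.
labels = ["function", "interaction", "expression"]
gold = [{"function"}, {"function", "interaction"}, {"function"}, {"expression"}]
pred = [{"function"}, {"function"}, {"function", "interaction"}, set()]

micro, macro = micro_macro_f1(gold, pred, labels)
print(f"{micro:.3f} {macro:.3f}")  # prints "0.667 0.333"
```

In this toy case the frequent category is predicted perfectly and the two rare ones are missed, so the micro average stays well above the macro average.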

2019 ◽  
Author(s):  
Douglas Teodoro ◽  
Julien Knafou ◽  
Nona Naderi ◽  
Emilie Pasche ◽  
Julien Gobeill ◽  
...  

Abstract In the UniProt Knowledgebase (UniProtKB), publications providing evidence for a specific protein annotation entry are organized across different categories, such as function, interaction and expression, based on the type of data they contain. To provide a systematic way of categorizing the computationally mapped bibliography in UniProt, we investigate a convolutional neural network (CNN) model to classify publications with accession annotations according to UniProtKB categories. The main challenge in categorizing publications at the accession annotation level is that the same publication can be annotated with multiple proteins and thus be associated with different category sets according to the evidence provided for the protein. We propose a model that divides the document into parts containing and not containing evidence for the protein annotation. Then, we use these parts to create different feature sets for each accession and feed them to separate layers of the network. The CNN model achieved an F1-score of 0.72, outperforming baseline models based on logistic regression and support vector machine by up to 22 and 18 percentage points, respectively. We believe that such an approach could be used to systematically categorize the computationally mapped bibliography in UniProtKB, which represents a significant set of the publications, and help curators to decide whether a publication is relevant for further curation for a protein accession.


2016 ◽  
Vol 24 (2) ◽  
pp. 361-370 ◽  
Author(s):  
Edward Choi ◽  
Andy Schuetz ◽  
Walter F Stewart ◽  
Jimeng Sun

Objective: We explored whether use of deep learning to model temporal relations among events in electronic health records (EHRs) would improve model performance in predicting initial diagnosis of heart failure (HF) compared to conventional methods that ignore temporality. Materials and Methods: Data were from a health system’s EHR on 3884 incident HF cases and 28 903 controls, identified as primary care patients, between May 16, 2000, and May 23, 2013. Recurrent neural network (RNN) models using gated recurrent units (GRUs) were adapted to detect relations among time-stamped events (eg, disease diagnosis, medication orders, procedure orders, etc.) with a 12- to 18-month observation window of cases and controls. Model performance metrics were compared to regularized logistic regression, neural network, support vector machine, and K-nearest neighbor classifier approaches. Results: Using a 12-month observation window, the area under the curve (AUC) for the RNN model was 0.777, compared to AUCs for logistic regression (0.747), multilayer perceptron (MLP) with 1 hidden layer (0.765), support vector machine (SVM) (0.743), and K-nearest neighbor (KNN) (0.730). When using an 18-month observation window, the AUC for the RNN model increased to 0.883 and was significantly higher than the 0.834 AUC for the best of the baseline methods (MLP). Conclusion: Deep learning models adapted to leverage temporal relations appear to improve performance of models for detection of incident heart failure with a short observation window of 12–18 months.
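The models above are compared by area under the ROC curve (AUC). As a minimal sketch, AUC can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control, with ties counted as one half. The function name and the scores below are hypothetical and are not taken from the study.

```python
from typing import Sequence


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """AUC as the fraction of (case, control) pairs ranked correctly."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0   # case ranked above control
            elif c == k:
                wins += 0.5   # ties count as half a win
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical predicted risks for 3 cases and 4 controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.2, 0.4]
print(auc(cases, controls))  # prints 0.875
```

This pairwise form is quadratic in the number of samples. It is fine for illustration, but a rank-based computation would be preferable for a cohort of the size described above.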


2021 ◽  
Vol 16 ◽  
Author(s):  
Farida Alaaeldin Mostafa ◽  
Yasmine Mohamed Afify ◽  
Rasha Mohamed Ismail ◽  
Nagwa Lotfy Badr

Background: Protein sequence analysis helps in the prediction of protein functions. As the number of known proteins increases, analyzing and studying the similarity between them becomes a challenge for bioinformaticians. Most existing protein analysis methods use a Support Vector Machine. Deep learning has received little attention in protein analysis, and little work has focused on the classification of protein diseases. Objective: This paper presents a deep learning approach that classifies protein diseases based on protein descriptors. Methods: Different protein descriptors are used and decomposed into modified feature descriptors. Uniquely, we introduce the use of a Convolutional Neural Network model to learn and classify protein diseases. The modified feature descriptors are fed to the Convolutional Neural Network model on a dataset of 1563 protein sequences classified into 3 disease classes: AIDS, tumor suppressor, and proto-oncogene. Results: Using the modified feature descriptors significantly increases the performance of the Convolutional Neural Network model over a Support Vector Machine with different kernel functions. One modified feature descriptor improved by 19.8%, 27.9%, 17.6%, 21.5%, 17.3%, and 22% on the evaluation metrics Area Under the Curve, Matthews Correlation Coefficient, Accuracy, F1-score, Recall, and Precision, respectively. Conclusion: The results show that prediction with the proposed modified feature descriptors significantly surpasses that of the Support Vector Machine model.


2021 ◽  
Vol 9 ◽  
Author(s):  
Ashwini K ◽  
P. M. Durai Raj Vincent ◽  
Kathiravan Srinivasan ◽  
Chuan-Yu Chang

Neonatal infants communicate through cries. Infant cry signals have distinct patterns depending on the purpose of the cry. For audio signals, preprocessing, feature extraction, and feature selection require expert attention and considerable effort. Deep learning techniques automatically extract and select the most important features, but they require an enormous amount of data for effective classification. This work discriminates neonatal cries into pain, hunger, and sleepiness. The neonatal cry audio signals are transformed into spectrogram images using the short-time Fourier transform (STFT). A deep convolutional neural network (DCNN) takes the spectrogram images as input. The features obtained from the convolutional neural network are passed to a support vector machine (SVM) classifier, which classifies the neonatal cries. This work combines the advantages of machine learning and deep learning techniques to get the best results even with a moderate number of data samples. The experimental results show that CNN-based feature extraction combined with an SVM classifier provides promising results. Comparing the SVM kernels, namely radial basis function (RBF), linear, and polynomial, SVM-RBF provides the highest accuracy for the kernel-based infant cry classification system, at 88.89%.
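The spectrogram step can be sketched as follows. This is a minimal pure-Python STFT, assuming a Hann window and a naive DFT per frame. The frame length, hop size, and test tone are illustrative choices and not the settings used in the paper, and `stft_magnitude` is a hypothetical helper name.

```python
import cmath
import math
from typing import List, Sequence


def stft_magnitude(signal: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Return a magnitude spectrogram as a list of frames, each with frame_len // 2 + 1 bins."""
    # Periodic Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Naive DFT over the non-negative frequency bins of this frame.
        for k in range(frame_len // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Illustrative input: a pure tone with 4 cycles per 32-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(128)]
spec = stft_magnitude(tone, frame_len=32, hop=16)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), len(spec[0]), peak_bins[0])  # prints "7 17 4"
```

In practice an FFT-based routine from a numerical library would replace the naive DFT. The resulting frames-by-bins matrix is what would be rendered as an image and passed to the CNN.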


2020 ◽  
Author(s):  
Jian Zhan ◽  
Zuo-xi Wu ◽  
Zhen-xin Duan ◽  
Gui-ying Yang ◽  
Zhi-yong Du ◽  
...  

Abstract Background: Estimating the depth of anaesthesia (DoA) is critical in modern anaesthetic practice. Multiple DoA monitors based on electroencephalograms (EEGs) have been widely used for DoA monitoring; however, these monitors may be inaccurate under certain conditions. In this work, the hypothesis that heart rate variability (HRV)-derived features based on a deep neural network can distinguish different anaesthesia states was investigated. Methods: A novel method of distinguishing different anaesthesia states was developed based on four HRV-derived time and frequency domain features combined with a deep neural network. Four features were extracted from an electrocardiogram, including the HRV high-frequency power, low-frequency power, high-to-low-frequency power ratio, and sample entropy. Next, these features were used as inputs for the deep neural network, which used the expert assessment of consciousness level as the reference output. Finally, the deep neural network was compared with the logistic regression, support vector machine, and decision tree models. The datasets of 23 anaesthesia patients were used to assess the proposed method. Results: The accuracies of the four models in distinguishing the anaesthesia states were 86.2% (logistic regression), 87.5% (support vector machine), 87.2% (decision tree), and 90.1% (deep neural network). The accuracy of the deep neural network was higher than those of the logistic regression (p < 0.05), support vector machine (p < 0.05), and decision tree (p < 0.05) approaches. Our method outperformed the logistic regression, support vector machine, and decision tree methods. Conclusions: The incorporation of four HRV-derived time and frequency domain features and a deep neural network could accurately distinguish between different anaesthesia states; however, this is a pilot feasibility study, providing a method to supplement DoA monitoring based on EEG features to improve the accuracy of DoA estimation.
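Of the four HRV features listed above, sample entropy is the least self-explanatory. The following is a minimal sketch of the usual definition, SampEn(m, r) = -ln(A / B), where B counts pairs of length-m templates that match within tolerance r and A counts the same for length m + 1. The abstract does not state the embedding dimension m or the tolerance r, so the values below are assumptions, and `sample_entropy` is a hypothetical helper name.

```python
import math
from typing import Sequence


def sample_entropy(series: Sequence[float], m: int, r: float) -> float:
    """Sample entropy with Chebyshev distance; r is an absolute tolerance."""
    n = len(series)

    def count_matches(length: int) -> int:
        # Count template pairs (i < j) whose elementwise distance stays within r.
        # Both lengths use the same n - m starting points; self-matches are excluded.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                if max(abs(series[i + k] - series[j + k]) for k in range(length)) <= r:
                    count += 1
        return count

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return math.inf  # undefined in the strict sense; reported here as infinity
    return -math.log(a / b)


# A perfectly periodic series is fully predictable, so its sample entropy is zero.
regular = [1.0, 2.0] * 10
# One unexpected value at the end breaks half of the longer template matches.
irregular = [1.0, 2.0, 1.0, 2.0, 1.0, 3.0]
print(sample_entropy(regular, m=2, r=0.5))               # zero
print(round(sample_entropy(irregular, m=1, r=0.5), 4))   # prints 0.6931, i.e. ln 2
```

For real RR-interval data, r is commonly set relative to the standard deviation of the series rather than as an absolute value.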


Sebatik ◽  
2020 ◽  
Vol 24 (2) ◽  
Author(s):  
Anifuddin Azis

Indonesia has the second-greatest biodiversity in the world after Brazil. It has about 25,000 plant species and 400,000 species of animals and fish. An estimated 8,500 fish species live in Indonesian waters, or 45% of the species in the world, of which about 7,000 are marine fish species. Determining the number of these species requires expertise in taxonomy. In practice, identifying a fish species is not easy, because it requires particular methods and equipment as well as taxonomic literature. Automated processing of video or images of aquatic ecosystem data has begun to be developed. In this development, detecting and identifying fish species is a challenge compared with detecting and identifying other objects. Deep learning methods, which have succeeded at classifying objects in images, can analyze data directly without a dedicated feature extraction step. Such a system has parameters, or weights, that serve both as a feature extractor and as a classifier. The processed data yields an output that is expected to be as close as possible to the true output. CNN is a deep learning architecture that can reduce the dimensionality of data without discarding its characteristics or features. This study develops a hybrid model that uses a CNN (Convolutional Neural Network) to extract features and several classification algorithms to identify fish species. The classification algorithms used in this study are Logistic Regression (LR), Support Vector Machine (SVM), Decision Tree, K-Nearest Neighbor (KNN), Random Forest, and Backpropagation.


2021 ◽  
Vol 0 (0) ◽  
Author(s):  
Kavitha Senthil ◽  
Vidyaathulasiramam

Abstract Objectives This paper proposes a neural network-based segmentation model using a pre-trained Mask Convolutional Neural Network (CNN) with the VGG-19 architecture. Because the ovary is a very small tissue, it needs to be segmented with high accuracy from the annotated ovary images collected in the dataset. The model is proposed to predict and suppress the illness early and to diagnose it correctly, helping the doctor save the patient's life. Methods The paper uses neural network-based segmentation with a pre-trained Mask CNN integrated with the VGG-19 architecture to enhance ovarian cancer prediction and diagnosis. Results The proposed segmentation using the hybrid CNN will provide higher accuracy than logistic regression, Gaussian naïve Bayes, random forest, and Support Vector Machine (SVM) classifiers.


Author(s):  
P. Nagaraj ◽  
P. Deepalakshmi

Diabetes, caused by a rise in the level of glucose in the blood, can be identified from blood samples using many recent devices. When unnoticed, diabetes may lead to serious conditions such as heart attack and kidney disease. There is therefore a need for solid research and improved learning models in the field of gestational diabetes identification and analysis. The Support Vector Machine (SVM) is one of the powerful classification models in machine learning, and the Deep Neural Network is similarly powerful among deep learning models. In this work, we applied an Enhanced Support Vector Machine and a Deep Neural Network for diabetes prediction and screening. The proposed method uses a Deep Neural Network that takes its input from the output of the Enhanced Support Vector Machine, thus combining the efficacy of both. The dataset we considered includes data on 768 patients with eight major features and a target column with the result “Positive” or “Negative”. The experiment was done in Python, and the outcome shows that the deep learning model is more effective for diabetes prediction.

