Deep learning-derived evaluation metrics enable effective benchmarking of computational tools for phosphopeptide identification

2021 ◽  
pp. 100171
Author(s):  
Wen Jiang ◽  
Bo Wen ◽  
Kai Li ◽  
Wen-Feng Zeng ◽  
Felipe da Veiga Leprevost ◽  
...  
2021 ◽  
Vol 42 (Supplement_1) ◽  
Author(s):  
M Lewis ◽  
J Figueroa

Abstract   Recent health reforms have created incentives for cardiologists and accountable care organizations to participate in value-based care models for heart failure (HF). Accurate risk stratification of HF patients is critical to efficiently deploy interventions aimed at reducing preventable utilization. The goal of this paper was to compare deep learning approaches with traditional logistic regression (LR) to predict preventable utilization among HF patients. We conducted a prognostic study using data on 93,260 HF patients continuously enrolled for two years in a large U.S. commercial insurer to develop and validate prediction models for three outcomes of interest: preventable hospitalizations, preventable emergency department (ED) visits, and preventable costs. Patients were split into training, validation, and testing samples. Outcomes were modeled using traditional and enhanced LR and compared to a gradient boosting model and deep learning models using sequential and non-sequential inputs. Evaluation metrics included precision (positive predictive value) at k, cost capture, and area under the receiver operating characteristic curve (AUROC). Deep learning models consistently outperformed LR for all three outcomes with respect to the chosen evaluation metrics. Precision at 1% for preventable hospitalizations was 43% for deep learning compared to 30% for enhanced LR. Precision at 1% for preventable ED visits was 39% for deep learning compared to 33% for enhanced LR. For preventable cost, cost capture at 1% was 30% for sequential deep learning, compared to 18% for enhanced LR. The highest AUROCs for deep learning were 0.778, 0.681 and 0.727, respectively. These results offer a promising approach to identifying patients for targeted interventions. Funding Acknowledgement. Type of funding sources: Private company. Main funding source(s): internally funded by Diagnostic Robotics Inc.
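The ranking-based metrics described above, precision at k and cost capture at k, can be sketched in a few lines. The function names and toy data below are illustrative assumptions, not taken from the paper's implementation:

```python
def precision_at_k(scores, labels, k_frac):
    """Fraction of true positives among the top k_frac of patients ranked by risk score."""
    n_top = max(1, int(len(scores) * k_frac))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:n_top]) / n_top


def cost_capture_at_k(scores, costs, k_frac):
    """Share of total preventable cost concentrated in the top k_frac of patients by score."""
    n_top = max(1, int(len(scores) * k_frac))
    ranked = sorted(zip(scores, costs), key=lambda pair: pair[0], reverse=True)
    return sum(cost for _, cost in ranked[:n_top]) / sum(costs)
```

So "precision at 1%" answers: of the 1% of patients the model flags as highest risk, what fraction actually had the outcome; cost capture asks how much of the total cost that top slice accounts for.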


2019 ◽  
Author(s):  
Ido Springer ◽  
Hanan Besser ◽  
Nili Tickotsky-Moskovitz ◽  
Shirit Dvorkin ◽  
Yoram Louzoun

Abstract   Current sequencing methods allow for detailed sampling of T cell receptor (TCR) repertoires. To determine from a repertoire whether its host had been exposed to a target, computational tools that predict TCR-epitope binding are required. Current tools are based on conserved motifs and are applied to peptides with many known binding TCRs. Given any TCR and peptide, we employ new NLP-based methods to predict whether they bind. We combined large-scale TCR-peptide dictionaries with deep learning methods to produce ERGO (pEptide tcR matchinG predictiOn), a highly specific and generic TCR-peptide binding predictor. A set of standard tests is defined for the performance of peptide-TCR binding, including the detection of TCRs binding to a given peptide/antigen, choosing among a set of candidate peptides for a given TCR, and determining whether any TCR-peptide pair binds. ERGO significantly outperforms current methods in these tests even when not trained specifically for each test. The software implementation and data sets are available at https://github.com/louzounlab/ERGO
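As a rough illustration of the sequence encoding that NLP-style binding predictors start from, here is a minimal one-hot encoding of an amino-acid string. The alphabet handling and padding length are assumptions made for the sketch, not ERGO's actual preprocessing:

```python
# Standard 20-letter amino-acid alphabet; each residue maps to one column.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(seq, max_len=20):
    """Encode an amino-acid string as a max_len x 20 one-hot matrix, zero-padded."""
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq[:max_len]):
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix
```

A sequence model (e.g. an LSTM) would then consume these rows position by position, for both the TCR and the peptide, before a classification head scores the pair.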


2020 ◽  
Author(s):  
George Zaki ◽  
Prabhakar R. Gudla ◽  
Kyunghun Lee ◽  
Justin Kim ◽  
Laurent Ozbun ◽  
...  

Abstract   Deep learning is rapidly becoming the technique of choice for automated segmentation of nuclei in biological image analysis workflows. In order to evaluate the feasibility of training nuclear segmentation models on small, custom annotated image datasets that have been augmented, we have designed a computational pipeline to systematically compare different nuclear segmentation model architectures and model training strategies. Using this approach, we demonstrate that transfer learning and tuning of training parameters, such as the composition, size and pre-processing of the training image dataset, can lead to robust nuclear segmentation models, which match, and often exceed, the performance of existing, off-the-shelf deep learning models pre-trained on large image datasets. We envision a practical scenario where deep learning nuclear segmentation models trained in this way can be shared across a laboratory, facility, or institution, and continuously improved by training them on progressively larger and varied image datasets. Our work provides computational tools and a practical framework for deep learning-based biological image segmentation using small annotated image datasets.
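Segmentation models of this kind are typically compared with overlap metrics on binary masks. A minimal sketch of the Dice coefficient, a common choice for nuclear segmentation benchmarks (the metric choice here is an illustrative assumption, not taken from the paper):

```python
def dice_coefficient(pred, truth):
    """Dice overlap between two binary masks given as flat lists of 0/1 pixels."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

Dice ranges from 0 (no overlap) to 1 (identical masks), so model architectures and training strategies can be ranked by their mean Dice over a held-out test set.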


2021 ◽  
Vol 15 ◽  
Author(s):  
Zarina Rakhimberdina ◽  
Quentin Jodelet ◽  
Xin Liu ◽  
Tsuyoshi Murata

With the advent of brain imaging techniques and machine learning tools, much effort has been devoted to building computational models to capture the encoding of visual information in the human brain. One of the most challenging brain decoding tasks is the accurate reconstruction of the perceived natural images from brain activities measured by functional magnetic resonance imaging (fMRI). In this work, we survey the most recent deep learning methods for natural image reconstruction from fMRI. We examine these methods in terms of architectural design, benchmark datasets, and evaluation metrics and present a fair performance evaluation across standardized evaluation metrics. Finally, we discuss the strengths and limitations of existing studies and present potential future directions.
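One widely used metric in this reconstruction literature is the pixelwise Pearson correlation between the reconstructed image and the image actually shown to the subject. A minimal sketch on flattened pixel lists (illustrative only, not tied to any one surveyed method):

```python
import math


def pearson_correlation(x, y):
    """Pearson r between two equal-length flattened images."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)
```

In n-way identification tests, a reconstruction counts as correct when it correlates more strongly with the true stimulus than with the distractor images.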


Author(s):  
Jie Gui ◽  
Xiaofeng Cong ◽  
Yuan Cao ◽  
Wenqi Ren ◽  
Jun Zhang ◽  
...  

The presence of haze significantly reduces the quality of images. Researchers have designed a variety of algorithms for image dehazing (ID) to restore the quality of hazy images. However, few studies summarize deep learning (DL) based dehazing technologies. In this paper, we conduct a comprehensive survey of recently proposed dehazing methods. First, we summarize the commonly used datasets, loss functions and evaluation metrics. Second, we group existing ID research into two major categories: supervised ID and unsupervised ID. The core ideas of various influential dehazing models are introduced. Finally, open issues for future research on ID are pointed out.
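Among the evaluation metrics such surveys summarize, peak signal-to-noise ratio (PSNR) between the dehazed output and the clean reference is the most common. A minimal sketch on flattened pixel lists (illustrative only):

```python
import math


def psnr(reference, restored, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images given as flat pixel lists."""
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Higher PSNR means the restored image is closer to the haze-free reference; it is usually reported alongside structural metrics such as SSIM, which PSNR alone does not capture.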


2020 ◽  
Vol 15 (4) ◽  
pp. 338-348
Author(s):  
Abdelbasset Boukelia ◽  
Anouar Boucheham ◽  
Meriem Belguidoum ◽  
Mohamed Batouche ◽  
Farida Zehraoui ◽  
...  

Background: Molecular biomarkers show new ways to understand many disease processes. Noncoding RNAs (ncRNAs) as biomarkers play a crucial role in several cellular activities, which are highly correlated to many human diseases, especially cancer. The classification and identification of ncRNAs have become a critical issue due to their applications, such as serving as biomarkers in many human diseases. Objective: Most existing computational tools for ncRNA classification are mainly used for classifying only one type of ncRNA. They are based on structural information or specific known features. Furthermore, these tools suffer from a lack of significant, validated features; therefore, their performance is not always satisfactory. Methods: We propose a novel approach named imCnC for ncRNA classification based on multisource deep learning, which integrates several data sources, such as genomic and epigenomic data, to identify several ncRNA types. Also, we propose an optimization technique to visualize the feature patterns extracted from the multisource CNN model and to measure the epigenomic features of each ncRNA type. Results: The computational results using a dataset of 16 human ncRNA classes downloaded from RFAM show that imCnC outperforms the existing tools. Indeed, imCnC achieved an accuracy of 94.18%. In addition, our method enables the discovery of new ncRNA features using an optimization technique to measure and visualize the feature patterns of the imCnC classifier.


Author(s):  
Pritam Ghosh ◽  
Subhranil Mustafi ◽  
Satyendra Nath Mandal

In this paper, an attempt has been made to identify six different goat breeds from pure-breed goat images. The images of goat breeds were captured at different organized, registered goat farms in India; almost two thousand digital images of individual goats were captured in restricted (to get a similar image background) and unrestricted (natural) environments without imposing stress on the animals. A pre-trained deep learning-based object detection model called Faster R-CNN has been fine-tuned by using transfer learning on the acquired images for automatic classification and localization of goat breeds. This fine-tuned model is able to locate the goat in the image (localization) and classify its breed (identification). The Pascal VOC object detection evaluation metrics have been used to evaluate this model. Finally, a comparison has been made with the prediction accuracies of different technologies used for identifying other animal breeds.
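The Pascal VOC detection protocol matches predicted boxes to ground-truth boxes by intersection-over-union (IoU), counting a detection as correct when IoU reaches 0.5. A minimal sketch of the IoU computation on corner-format boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlap rectangle (empty if boxes are disjoint).
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

VOC then sweeps the detector's confidence threshold to build a precision-recall curve per class and reports its average precision (AP), with mean AP across classes as the headline score.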


2020 ◽  
Vol 17 (5) ◽  
pp. 702-712
Author(s):  
Manal Khayyat ◽  
Lamiaa Elrefaei

With the increasing amount of unorganized images on the internet today and the necessity to use them efficiently in various types of applications, there is a critical need for robust models that can classify and predict images successfully and instantaneously. Therefore, this study aims to collect Arabic manuscript images in a dataset and predict their handwriting styles using the most powerful and trending technologies. There are many types of Arabic handwriting styles, including Al-Reqaa, Al-Nask, Al-Thulth, Al-Kufi, Al-Hur, Al-Diwani, Al-Farsi, Al-Ejaza, Al-Maghrabi, Al-Taqraa, etc. However, the study classified the collected dataset images according to handwriting style and focused on only the six styles that existed in the collected Arabic manuscripts. To reach our goal, we applied the MobileNet pre-trained deep learning model to our classified dataset images to automatically capture and extract their features. Afterward, we evaluated the performance of the developed model by computing its recorded evaluation metrics. We found that the MobileNet convolutional neural network is a promising technology, since it reached 0.9583 as the highest recorded accuracy and 0.9633 as the average F-score.
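The reported accuracy and F-score come from standard precision/recall arithmetic over the confusion counts of each class. A minimal sketch (illustrative, not the study's evaluation code):

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F-score from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

For a multi-class problem such as six handwriting styles, an average F-score like the 0.9633 reported above is obtained by computing F1 per class and averaging across classes.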


Automatic text summarization is a technique for generating a short and accurate summary of a longer text document. Text summarization can be classified based on the number of input documents (single-document and multi-document summarization) and based on the characteristics of the summary generated (extractive and abstractive summarization). Multi-document summarization is an automatic process of creating a relevant, informative and concise summary from a cluster of related documents. This paper presents a detailed survey of the existing literature on the various approaches to text summarization. A few of the most popular approaches, such as graph-based, cluster-based and deep learning-based summarization techniques, are discussed here along with the evaluation metrics, which can provide insight to future researchers.
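Among the evaluation metrics used for summarization, ROUGE-style n-gram recall against a human reference is the most common. A minimal sketch of ROUGE-1 recall, simplified to omit stemming and stopword handling:

```python
from collections import Counter


def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams recovered by the candidate summary."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clip each word's credit at its reference count so repetition is not rewarded.
    overlap = sum(min(cand_counts[word], count) for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())
```

Graph-based, cluster-based and neural summarizers alike are typically ranked by averaging such ROUGE scores over a benchmark corpus of document-summary pairs.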


2020 ◽  
Author(s):  
Majid Ghorbani Eftekhar

Abstract   Identifying the subcellular localization of a protein is significant for understanding its molecular function. It provides valuable insights that can be of tremendous help to research on protein function and to the detection of potential cell surface/secreted drug targets. The prediction of protein subcellular localization using bioinformatics methods is an inexpensive alternative to experimental approaches. Many computational tools have been built during the past two decades; however, producing reliable predictions has always been the challenge. In this study, a deep learning (DL) technique is proposed to enhance the precision of the analytical engine of one of these tools, PSORTb v3.0. Its conventional SVM machine learning model was replaced by a state-of-the-art DL method (BiLSTM) and a data augmentation measure (SeqGAN). As a result, the combination of BiLSTM and SeqGAN outperformed SVM by improving its precision from 57.4% to 75%. This method was applied to a dataset containing 8,230 protein sequences, which was experimentally derived by the Brinkman Lab. The presented model provides promising outcomes for future research. The source code of the model is available at https://github.com/mgetech/SubLoc

