DeepMSPeptide: peptide detectability prediction using deep learning

Bioinformatics ◽

10.1093/bioinformatics/btz708 ◽

2019 ◽

Author(s):

Guillermo Serrano ◽

Elizabeth Guruceaga ◽

Victor Segura

Keyword(s):

Deep Learning ◽

Protein Detection ◽

Amino Acid Sequences ◽

Supplementary Information ◽

Learning Method ◽

Supplementary Data ◽

Stochastic Nature ◽

Bioinformatic Tool ◽

Peptide Detectability ◽

Detection And Quantification

Abstract

Summary: Protein detection and quantification using high-throughput proteomic technologies remain challenging due to the stochastic nature of peptide selection in the mass spectrometer, the difficulties in the statistical analysis of the results and the presence of degenerate peptides. However, restricting the analysis to those peptides that can be detected by mass spectrometry, also called proteotypic peptides, increases the accuracy of the results. Several approaches have been applied to predict peptide detectability based on the physicochemical properties of the peptides. In this manuscript, we present DeepMSPeptide, a bioinformatic tool that uses a deep learning method to predict proteotypic peptides based exclusively on the peptide amino acid sequences.

Availability and implementation: DeepMSPeptide is available at https://github.com/vsegurar/DeepMSPeptide.

Supplementary information: Supplementary data are available at Bioinformatics online.
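The DeepMSPeptide abstract describes prediction from amino acid sequences alone. As a rough illustration of what a sequence-only input might look like, the sketch below maps a peptide to a fixed-length list of integers. The alphabet order, the padding index, the `max_len` value and the function name `encode_peptide` are all assumptions made for this example; they are not taken from the DeepMSPeptide repository.

```python
# Illustrative sketch only: alphabet order, PAD index and max_len are
# assumptions, not the encoding used by DeepMSPeptide.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD = 0                               # index reserved for padding
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(sequence, max_len=30):
    """Map a peptide to a fixed-length list of integer codes, right-padded with PAD."""
    sequence = sequence.upper()
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    if len(sequence) > max_len:
        raise ValueError("peptide is longer than max_len")
    codes = [AA_TO_INDEX[aa] for aa in sequence]
    return codes + [PAD] * (max_len - len(codes))


encoded = encode_peptide("PEPTIDEK", max_len=10)
print(encoded)
```

A fixed-length integer vector of this kind is a common input for an embedding layer followed by convolutional or recurrent layers, which is one way a model could work from the sequence without hand-crafted physicochemical features.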

Deep-learning method for data association in particle tracking

Bioinformatics ◽

10.1093/bioinformatics/btaa597 ◽

2020 ◽

Vol 36 (19) ◽

pp. 4935-4941 ◽

Cited By ~ 1

Author(s):

Yao Yao ◽

Ihor Smal ◽

Ilya Grigoriev ◽

Anna Akhmanova ◽

Erik Meijering

Keyword(s):

Deep Learning ◽

Particle Tracking ◽

Short Term Memory ◽

Data Association ◽

Time Lapse ◽

Supplementary Information ◽

Great Promise ◽

Learning Method ◽

Biological Studies ◽

Comprehensive Evaluations

Abstract

Motivation: Biological studies of dynamic processes in living cells often require accurate particle tracking as a first step toward quantitative analysis. Although many particle tracking methods have been developed for this purpose, they are typically based on prior assumptions about the particle dynamics, and/or they involve careful tuning of various algorithm parameters by the user for each application. This may make existing methods difficult for non-expert users to apply and hard to extend to a broader range of tracking problems. Recent advances in deep-learning techniques hold great promise in eliminating these disadvantages, as they can learn how to optimally track particles from example data.

Results: Here, we present a deep-learning-based method for the data association stage of particle tracking. The proposed method uses convolutional neural networks and long short-term memory networks to extract relevant dynamics features and predict the motion of a particle and the cost of linking detected particles from one time point to the next. Comprehensive evaluations on datasets from the particle tracking challenge demonstrate the competitiveness of the proposed deep-learning method compared to the state of the art. Additional tests on real time-lapse fluorescence microscopy images of various types of intracellular particles show the method performs comparably with human experts.

Availability and implementation: The software code implementing the proposed method as well as a description of how to obtain the test data used in the presented experiments will be available for non-commercial purposes from https://github.com/yoyohoho0221/pt_linking.

Supplementary information: Supplementary data are available at Bioinformatics online.
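The data association stage mentioned in this abstract amounts to choosing which detection in one frame links to which detection in the next, given a cost for each candidate link. The sketch below shows that step with a hypothetical stand-in cost (plain Euclidean distance) in place of the learned cost described in the paper, and with a brute-force search over permutations that is only practical for a handful of particles. The function names and the requirement that both frames contain the same number of detections are assumptions of this example.

```python
from itertools import permutations
from math import dist


def link_cost(p, q):
    """Hypothetical stand-in cost: Euclidean distance between two detections."""
    return dist(p, q)


def link_frames(prev, curr, cost=link_cost):
    """Return (i, j) pairs linking prev[i] to curr[j] with minimal total cost.

    Brute force over all permutations, so only suitable for a few particles.
    Assumes both frames contain the same number of detections.
    """
    if len(prev) != len(curr):
        raise ValueError("this sketch requires equally sized frames")
    best = min(
        permutations(range(len(curr))),
        key=lambda perm: sum(cost(prev[i], curr[j]) for i, j in enumerate(perm)),
    )
    return list(enumerate(best))


frame_t = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
frame_t1 = [(5.5, 5.2), (9.6, 0.3), (0.4, -0.1)]
links = link_frames(frame_t, frame_t1)
print(links)
```

Passing a different `cost` callable is where a learned linking cost would plug in; the assignment step itself stays the same.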

DEEPrior: a deep learning tool for the prioritization of gene fusions

Bioinformatics ◽

10.1093/bioinformatics/btaa069 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3248-3250

Author(s):

Marta Lovino ◽

Maria Serena Ciaburri ◽

Gianvito Urgese ◽

Santa Di Cataldo ◽

Elisa Ficarra

Keyword(s):

Deep Learning ◽

Amino Acid ◽

Gene Fusion ◽

Supplementary Information ◽

Gene Fusions ◽

Supplementary Data ◽

Learning Tool ◽

Cancer Driver ◽

Open Issue ◽

Passenger Mutation

Abstract

Summary: In the last decade, increasing attention has been paid to the study of gene fusions. However, determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process by directly exploiting the amino acid sequence of the fused protein. Retraining mode allows users to obtain a custom prediction model that includes new data they provide.

Availability and implementation: Both DEEPrior and the protein fusions dataset are freely available from GitHub at https://github.com/bioinformatics-polito/DEEPrior. The tool was designed to operate in Python 3.7, with minimal additional libraries.

Supplementary information: Supplementary data are available at Bioinformatics online.

An improved deep learning method for predicting DNA-binding proteins based on contextual features in amino acid sequences

PLoS ONE ◽

10.1371/journal.pone.0225317 ◽

2019 ◽

Vol 14 (11) ◽

pp. e0225317 ◽

Cited By ~ 3

Author(s):

Siquan Hu ◽

Ruixiong Ma ◽

Haiou Wang

Keyword(s):

Deep Learning ◽

Amino Acid ◽

DNA Binding ◽

Binding Proteins ◽

DNA Binding Proteins ◽

Amino Acid Sequences ◽

Learning Method ◽

Contextual Features

Structured crowdsourcing enables convolutional segmentation of histology images

Bioinformatics ◽

10.1093/bioinformatics/btz083 ◽

2019 ◽

Vol 35 (18) ◽

pp. 3461-3467 ◽

Cited By ~ 12

Author(s):

Mohamed Amgad ◽

Habiba Elfandy ◽

Hagar Hussein ◽

Lamees A Atteya ◽

Mai A T Elsebaie ◽

...

Keyword(s):

Breast Cancer ◽

Deep Learning ◽

Classification Accuracy ◽

Supplementary Information ◽

Supplementary Data ◽

Digital Slide ◽

Convolutional Networks ◽

Fully Convolutional Networks ◽

Annotation Data ◽

Whole Slide Images

Abstract

Motivation: While deep-learning algorithms have demonstrated outstanding performance in semantic image segmentation tasks, large annotation datasets are needed to create accurate models. Annotation of histology images is challenging due to the effort and experience required to carefully delineate tissue structures, and difficulties related to sharing and markup of whole-slide images.

Results: We recruited 25 participants, ranging in experience from senior pathologists to medical students, to delineate tissue regions in 151 breast cancer slides using the Digital Slide Archive. Inter-participant discordance was systematically evaluated, revealing low discordance for tumor and stroma, and higher discordance for more subjectively defined or rare tissue classes. Feedback provided by senior participants enabled the generation and curation of 20 000+ annotated tissue regions. Fully convolutional networks trained using these annotations were highly accurate (mean AUC=0.945), and the scale of annotation data provided notable improvements in image classification accuracy.

Availability and implementation: The dataset is freely available at https://goo.gl/cNM4EL.

Supplementary information: Supplementary data are available at Bioinformatics online.

Precise estimation of residue relative solvent accessible area from Cα atom distance matrix using a deep learning method

Bioinformatics ◽

10.1093/bioinformatics/btab616 ◽

2021 ◽

Author(s):

Jianzhao Gao ◽

Shuangjia Zheng ◽

Mengting Yao ◽

Peikun Wu

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Protein Function ◽

Pearson Correlation ◽

Correlation Coefficients ◽

Distance Matrix ◽

Supplementary Information ◽

Learning Method ◽

Solvent Accessible Area ◽

Accessible Area

Abstract

Motivation: The solvent accessible surface is an essential structural property related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure of how exposed a residue is, that is, whether it lies on the protein surface or is buried inside the protein. However, this computation fails when residue information is missing.

Results: In this article, we propose a novel method for estimating RSA from the Cα atom distance matrix with a deep learning method (EAGERER). The new method, EAGERER, achieves Pearson correlation coefficients of 0.921–0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction.

Availability and implementation: The method is freely available at https://github.com/cliffgao/EAGERER.

Supplementary information: Supplementary data are available at Bioinformatics online.
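To make the input of this abstract concrete, the sketch below builds a Cα distance matrix from 3D coordinates and derives the coordination number, one of the baseline estimators the abstract names, by counting other Cα atoms within a cutoff radius. The 10 Å radius, the toy coordinates and the function names are assumptions chosen for illustration; this is not the EAGERER model.

```python
from math import dist


def distance_matrix(ca_coords):
    """Pairwise Euclidean distances between C-alpha coordinates (x, y, z)."""
    return [[dist(a, b) for b in ca_coords] for a in ca_coords]


def coordination_numbers(dmat, radius=10.0):
    """Count, for each residue, the other residues whose C-alpha lies within radius.

    The 10.0 default is an illustrative assumption, not a value from the paper.
    """
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= radius)
        for i, row in enumerate(dmat)
    ]


# Toy coordinates: three residues spaced 3.8 apart and one far away.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
dmat = distance_matrix(coords)
cn = coordination_numbers(dmat)
print(cn)
```

A residue with many close neighbors tends to be buried, while an isolated residue tends to be exposed, which is why a simple neighbor count can serve as a rough baseline for exposure.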

CATHER: a novel threading algorithm with predicted contacts

Bioinformatics ◽

10.1093/bioinformatics/btz876 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2119-2125 ◽

Cited By ~ 1

Author(s):

Zongyang Du ◽

Shuo Pan ◽

Qi Wu ◽

Zhenling Peng ◽

Jianyi Yang

Keyword(s):

Deep Learning ◽

Protein Structure ◽

Structure Prediction ◽

Supplementary Information ◽

Supplementary Data ◽

Contact Map ◽

Test Set ◽

Benchmark Tests ◽

Independent Test ◽

Push Forward

Abstract

Motivation: Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to explore how to make better use of predicted contacts.

Results: We have developed a new contact-assisted threading algorithm named CATHER using both conventional sequential profiles and contact maps predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made significant improvements over other methods that use only either the sequential profile or the predicted contact map. Our method was ranked in the top 10 among all 39 participating server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that it is promising to push forward threading algorithms by using predicted contacts.

Availability and implementation: http://yanglab.nankai.edu.cn/CATHER/.

Supplementary information: Supplementary data are available at Bioinformatics online.

TCN-HBP: A Deep Learning Method for Identifying Hormone-Binding Proteins from Amino Acid Sequences Based on a Temporal Convolution Neural Network

Journal of Physics: Conference Series ◽

10.1088/1742-6596/2025/1/012002 ◽

2021 ◽

Vol 2025 (1) ◽

pp. 012002

Author(s):

Jing Guo

Keyword(s):

Neural Network ◽

Deep Learning ◽

Amino Acid ◽

Binding Proteins ◽

Amino Acid Sequences ◽

Convolution Neural Network ◽

Learning Method ◽

Hormone Binding

In silico spectral libraries by deep learning facilitate data-independent acquisition proteomics

Nature Communications ◽

10.1038/s41467-019-13866-z ◽

2020 ◽

Vol 11 (1) ◽

Cited By ~ 23

Author(s):

Yi Yang ◽

Xiaohui Liu ◽

Chengpin Shen ◽

Yu Lin ◽

Pengyuan Yang ◽

...

Keyword(s):

Deep Learning ◽

In Silico ◽

State Of The Art ◽

Protein Detection ◽

Serum Samples ◽

Data Independent Acquisition ◽

Peptide Detectability ◽

Spectral Libraries ◽

Prediction In Silico

Abstract

Data-independent acquisition (DIA) is an emerging technology for quantitative proteomic analysis of large cohorts of samples. However, sample-specific spectral libraries built by data-dependent acquisition (DDA) experiments are required prior to DIA analysis, which is time-consuming and limits the identification/quantification by DIA to the peptides identified by DDA. Herein, we propose DeepDIA, a deep learning-based approach to generate in silico spectral libraries for DIA analysis. We demonstrate that the quality of in silico libraries predicted by instrument-specific models using DeepDIA is comparable to that of experimental libraries, and outperforms libraries generated by global models. With peptide detectability prediction, in silico libraries can be built directly from protein sequence databases. We further illustrate that DeepDIA can break through the limitation of DDA on peptide/protein detection, and enhance DIA analysis on human serum samples compared to the state-of-the-art protocol using a DDA library. We expect this work to expand the toolbox for DIA proteomics.

A column-based deep learning method for the detection and quantification of atrophy associated with AMD in OCT scans

Medical Image Analysis ◽

10.1016/j.media.2021.102130 ◽

2021 ◽

pp. 102130

Author(s):

Adi Szeskin ◽

Roei Yehuda ◽

Or Shmueli ◽

Jaime Levy ◽

Leo Joskowicz

Keyword(s):

Deep Learning ◽

Learning Method ◽

Detection And Quantification

keras_dna: a wrapper for fast implementation of deep learning models in genomics

Bioinformatics ◽

10.1093/bioinformatics/btaa929 ◽

2020 ◽

Author(s):

Etienne Routhier ◽

Ayman Bin Kamruddin ◽

Julien Mozziconacci

Keyword(s):

Deep Learning ◽

DNA Sequences ◽

Supplementary Information ◽

Supplementary Data ◽

Learning Models ◽

Multiple Targets ◽

Fast Implementation ◽

Model Training ◽

High Level

Abstract

Summary: Prediction of genomic annotations from DNA sequences using deep learning is becoming a flourishing field with many applications. Nevertheless, there are still difficulties in handling data in order to conveniently build and train models dedicated to specific end-user tasks. keras_dna is designed for easy implementation of Keras models (the TensorFlow high-level API) for genomics. It can handle standard bioinformatic file formats as inputs, such as bigwig, gff, bed, wig, bedGraph or fasta, and returns standardized inputs for model training. keras_dna is designed to implement existing models but also to facilitate the development of new models that can have single or multiple targets or inputs.

Availability and implementation: Freely available under an MIT License using pip install keras_dna or by cloning the GitHub repo at https://github.com/etirouthier/keras_dna.git.

Supplementary information: Supplementary data are available at Bioinformatics online.
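The "standardized inputs for model training" mentioned in this abstract are, for DNA, very often one-hot encoded sequences. The sketch below shows that encoding in plain Python. It does not use or reproduce the keras_dna API; the function name `one_hot_dna`, the A/C/G/T column order and the choice to encode unknown bases such as N as an all-zero row are assumptions of this example.

```python
BASES = "ACGT"  # column order is an assumption of this sketch


def one_hot_dna(sequence):
    """Encode a DNA string as one row per position with columns A, C, G, T.

    Bases outside ACGT (for example N) become an all-zero row; this
    convention is an illustrative choice, not taken from keras_dna.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in sequence.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


encoded_dna = one_hot_dna("ACGTN")
print(encoded_dna)
```

In practice the resulting rows would be stacked into an array of shape (sequence length, 4) and batched before being passed to a Keras model.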
