DeepMSPeptide: peptide detectability prediction using deep learning

2019 ◽  
Author(s):  
Guillermo Serrano ◽  
Elizabeth Guruceaga ◽  
Victor Segura

Abstract Summary Protein detection and quantification using high-throughput proteomic technologies remain challenging due to the stochastic nature of peptide selection in the mass spectrometer, the difficulties in the statistical analysis of the results and the presence of degenerated peptides. However, restricting the analysis to those peptides that can be detected by mass spectrometry, also called proteotypic peptides, increases the accuracy of the results. Several approaches have been applied to predict peptide detectability based on the physicochemical properties of the peptides. In this manuscript, we present DeepMSPeptide, a bioinformatic tool that uses a deep learning method to predict proteotypic peptides based exclusively on the peptide amino acid sequences. Availability and implementation DeepMSPeptide is available at https://github.com/vsegurar/DeepMSPeptide. Supplementary information Supplementary data are available at Bioinformatics online.
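As an illustration of what a sequence-only input could look like, the sketch below turns a peptide string into a fixed-length list of integer codes, a common way to feed sequences to a neural network. It is not taken from DeepMSPeptide: the function name `encode_peptide`, the residue ordering, the padding value 0 and the `max_length` default are all assumptions made for this example.

```python
# Hypothetical illustration; not taken from the DeepMSPeptide code base.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues (ordering is an assumption)

# Index 0 is reserved for padding, so residues map to 1..20.
AA_TO_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode_peptide(sequence, max_length=30):
    """Map a peptide to a fixed-length list of integer codes, zero-padded on the right."""
    sequence = sequence.upper()
    if len(sequence) > max_length:
        raise ValueError(f"peptide longer than max_length={max_length}")
    unknown = set(sequence) - set(AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    codes = [AA_TO_INDEX[aa] for aa in sequence]
    return codes + [0] * (max_length - len(codes))


if __name__ == "__main__":
    print(encode_peptide("ACD", max_length=5))
```

Reserving 0 for padding keeps real residues distinguishable from filler positions; the actual tool may encode and pad its inputs differently.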

2020 ◽  
Vol 36 (19) ◽  
pp. 4935-4941 ◽  
Author(s):  
Yao Yao ◽  
Ihor Smal ◽  
Ilya Grigoriev ◽  
Anna Akhmanova ◽  
Erik Meijering

Abstract Motivation Biological studies of dynamic processes in living cells often require accurate particle tracking as a first step toward quantitative analysis. Although many particle tracking methods have been developed for this purpose, they are typically based on prior assumptions about the particle dynamics, and/or they involve careful tuning of various algorithm parameters by the user for each application. This may make existing methods difficult for non-expert users to apply and hard to extend to a broader range of tracking problems. Recent advances in deep-learning techniques hold great promise for eliminating these disadvantages, as they can learn how to optimally track particles from example data. Results Here, we present a deep-learning-based method for the data association stage of particle tracking. The proposed method uses convolutional neural networks and long short-term memory networks to extract relevant dynamics features and predict the motion of a particle and the cost of linking detected particles from one time point to the next. Comprehensive evaluations on datasets from the particle tracking challenge demonstrate the competitiveness of the proposed deep-learning method compared to the state of the art. Additional tests on real time-lapse fluorescence microscopy images of various types of intracellular particles show that the method performs comparably with human experts. Availability and implementation The software code implementing the proposed method as well as a description of how to obtain the test data used in the presented experiments will be available for non-commercial purposes from https://github.com/yoyohoho0221/pt_linking. Supplementary information Supplementary data are available at Bioinformatics online.
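To make the data association stage concrete, the sketch below links detections in one frame to detections in the next by choosing the cheapest pairs first. It is only a stand-in: the linking cost here is plain Euclidean distance rather than a cost predicted by a network, the greedy selection is a simplification, and the names `link_greedy` and `max_cost` are hypothetical.

```python
import math


def link_greedy(prev_points, next_points, max_cost=5.0):
    """Return {prev_index: next_index} links, picking the cheapest pairs first.

    The cost is the Euclidean distance between detections (an assumption;
    a learned model would supply this cost instead).
    """
    candidates = []
    for i, p in enumerate(prev_points):
        for j, q in enumerate(next_points):
            cost = math.dist(p, q)
            if cost <= max_cost:  # pairs above the gate are never linked
                candidates.append((cost, i, j))
    candidates.sort()

    links = {}
    used_next = set()
    for cost, i, j in candidates:
        # Each detection takes part in at most one link.
        if i not in links and j not in used_next:
            links[i] = j
            used_next.add(j)
    return links


if __name__ == "__main__":
    frame_t = [(0.0, 0.0), (10.0, 10.0)]
    frame_t1 = [(10.0, 11.0), (1.0, 0.0), (50.0, 50.0)]
    print(link_greedy(frame_t, frame_t1))
```

Detections left without a link (such as the third point in `frame_t1`) would be treated as newly appearing particles, and unlinked detections in the earlier frame as track terminations.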


2020 ◽  
Vol 36 (10) ◽  
pp. 3248-3250
Author(s):  
Marta Lovino ◽  
Maria Serena Ciaburri ◽  
Gianvito Urgese ◽  
Santa Di Cataldo ◽  
Elisa Ficarra

Abstract Summary In the last decade, increasing attention has been paid to the study of gene fusions. However, the problem of determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process by directly exploiting the amino acid sequence of the fused protein. Retraining mode allows users to obtain a custom prediction model that incorporates new data they provide. Availability and implementation Both DEEPrior and the protein fusions dataset are freely available from GitHub at https://github.com/bioinformatics-polito/DEEPrior. The tool was designed to operate in Python 3.7, with minimal additional libraries. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (18) ◽  
pp. 3461-3467 ◽  
Author(s):  
Mohamed Amgad ◽  
Habiba Elfandy ◽  
Hagar Hussein ◽  
Lamees A Atteya ◽  
Mai A T Elsebaie ◽  
...  

Abstract Motivation While deep-learning algorithms have demonstrated outstanding performance in semantic image segmentation tasks, large annotation datasets are needed to create accurate models. Annotation of histology images is challenging due to the effort and experience required to carefully delineate tissue structures, and difficulties related to sharing and markup of whole-slide images. Results We recruited 25 participants, ranging in experience from senior pathologists to medical students, to delineate tissue regions in 151 breast cancer slides using the Digital Slide Archive. Inter-participant discordance was systematically evaluated, revealing low discordance for tumor and stroma, and higher discordance for more subjectively defined or rare tissue classes. Feedback provided by senior participants enabled the generation and curation of 20 000+ annotated tissue regions. Fully convolutional networks trained using these annotations were highly accurate (mean AUC=0.945), and the scale of annotation data provided notable improvements in image classification accuracy. Availability and Implementation Dataset is freely available at: https://goo.gl/cNM4EL. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Jianzhao Gao ◽  
Shuangjia Zheng ◽  
Mengting Yao ◽  
Peikun Wu

Abstract Motivation The solvent accessible surface is an essential structural property related to protein structure and protein function. Relative solvent accessible area (RSA) is a standard measure of the degree of residue exposure, indicating whether a residue lies on the protein surface or in the protein interior. However, this computation fails when residue information is missing. Results In this article, we propose a novel method, EAGERER, for estimating RSA from the Cα atom distance matrix with deep learning. EAGERER achieves Pearson correlation coefficients of 0.921–0.928 on two independent test datasets. We empirically demonstrate that EAGERER can yield better Pearson correlation coefficients than existing RSA estimators, such as coordination number, half sphere exposure and SphereCon. To the best of our knowledge, EAGERER represents the first method to estimate the solvent accessible area using limited information with a deep learning model. It could be useful for protein structure and protein function prediction. Availability and implementation The method is freely available at https://github.com/cliffgao/EAGERER. Supplementary information Supplementary data are available at Bioinformatics online.
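For readers unfamiliar with the quantities named above, the sketch below builds a Cα distance matrix from 3D coordinates and derives the coordination number, one of the baseline exposure estimators mentioned in the abstract. It does not reproduce EAGERER; the 10 Å cutoff and the function names are assumptions chosen for illustration.

```python
import math


def distance_matrix(ca_coords):
    """Pairwise Euclidean distances between C-alpha atoms, as a list of rows."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


def coordination_numbers(dist, cutoff=10.0):
    """Count, for each residue, the other C-alpha atoms within `cutoff`.

    The 10.0 default (in angstroms) is an assumed value for this example.
    """
    return [
        sum(1 for j, d in enumerate(row) if j != i and d <= cutoff)
        for i, row in enumerate(dist)
    ]


if __name__ == "__main__":
    # Toy coordinates: three residues in a row, plus one far away.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (30.0, 0.0, 0.0)]
    dist = distance_matrix(coords)
    print(coordination_numbers(dist))
```

A residue with many Cα neighbours within the cutoff tends to be buried, while one with few neighbours tends to be exposed, which is why this count can serve as a rough exposure estimate when only Cα positions are known.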


2019 ◽  
Vol 36 (7) ◽  
pp. 2119-2125 ◽  
Author(s):  
Zongyang Du ◽  
Shuo Pan ◽  
Qi Wu ◽  
Zhenling Peng ◽  
Jianyi Yang

Abstract Motivation Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue for improving the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to make better use of predicted contacts. Results We have developed a new contact-assisted threading algorithm named CATHER that uses both conventional sequential profiles and a contact map predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made a significant improvement over other methods that use only the sequential profile or only the predicted contact map. Our method was ranked in the top 10 among all 39 participating server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that using predicted contacts is a promising way to advance threading algorithms. Availability and implementation http://yanglab.nankai.edu.cn/CATHER/. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Yi Yang ◽  
Xiaohui Liu ◽  
Chengpin Shen ◽  
Yu Lin ◽  
Pengyuan Yang ◽  
...  

Abstract Data-independent acquisition (DIA) is an emerging technology for quantitative proteomic analysis of large cohorts of samples. However, sample-specific spectral libraries built by data-dependent acquisition (DDA) experiments are required prior to DIA analysis, which is time-consuming and limits the identification/quantification by DIA to the peptides identified by DDA. Herein, we propose DeepDIA, a deep learning-based approach to generate in silico spectral libraries for DIA analysis. We demonstrate that the quality of in silico libraries predicted by instrument-specific models using DeepDIA is comparable to that of experimental libraries, and that these libraries outperform those generated by global models. With peptide detectability prediction, in silico libraries can be built directly from protein sequence databases. We further illustrate that DeepDIA can break through the limitation of DDA on peptide/protein detection, and enhance DIA analysis on human serum samples compared to the state-of-the-art protocol using a DDA library. We expect this work to expand the toolbox for DIA proteomics.


2021 ◽  
pp. 102130
Author(s):  
Adi Szeskin ◽  
Roei Yehuda ◽  
Or Shmueli ◽  
Jaime Levy ◽  
Leo Joskowicz

Author(s):  
Etienne Routhier ◽  
Ayman Bin Kamruddin ◽  
Julien Mozziconacci

Abstract Summary Prediction of genomic annotations from DNA sequences using deep learning is becoming a flourishing field with many applications. Nevertheless, there are still difficulties in handling data in order to conveniently build and train models dedicated to specific end-user tasks. keras_dna is designed for easy implementation of Keras models (TensorFlow high-level API) for genomics. It can handle standard bioinformatic file formats as inputs, such as bigwig, gff, bed, wig, bedGraph or fasta, and returns standardized inputs for model training. keras_dna is designed to implement existing models but also to facilitate the development of new models that can have single or multiple targets or inputs. Availability and implementation Freely available under an MIT License via pip install keras_dna or by cloning the GitHub repository at https://github.com/etirouthier/keras_dna.git. Supplementary information Supplementary data are available at Bioinformatics online.

