DEEPrior: a deep learning tool for the prioritization of gene fusions

2020 ◽  
Vol 36 (10) ◽  
pp. 3248-3250
Author(s):  
Marta Lovino ◽  
Maria Serena Ciaburri ◽  
Gianvito Urgese ◽  
Santa Di Cataldo ◽  
Elisa Ficarra

Abstract Summary In the last decade, increasing attention has been paid to the study of gene fusions. However, determining whether a gene fusion is a cancer driver or just a passenger mutation is still an open issue. Here we present DEEPrior, an inherently flexible deep learning tool with two modes (Inference and Retraining). Inference mode predicts the probability of a gene fusion being involved in an oncogenic process by directly exploiting the amino acid sequence of the fused protein. Retraining mode produces a custom prediction model that incorporates new data provided by the user. Availability and implementation Both DEEPrior and the protein fusions dataset are freely available from GitHub at https://github.com/bioinformatics-polito/DEEPrior. The tool was designed to operate in Python 3.7, with minimal additional libraries. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Pavel Beran ◽  
Dagmar Stehlíková ◽  
Stephen P Cohen ◽  
Vladislav Čurn

Abstract Summary Searching for amino acid or nucleic acid sequences unique to one organism may be challenging depending on the size of the available datasets. K-mer elimination by cross-reference (KEC) allows users to quickly and easily find unique sequences by providing target and non-target sequences. Due to its speed, it can be used for datasets of genomic size and can be run on desktop or laptop computers with modest specifications. Availability and implementation KEC is freely available for non-commercial purposes. Source code and executable binary files compiled for Linux, Mac and Windows can be downloaded from https://github.com/berybox/KEC. Supplementary information Supplementary data are available at Bioinformatics online.
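The core idea of eliminating k-mers by cross-reference can be illustrated with a minimal Python sketch. This is not KEC's implementation: the function names are hypothetical, and the sketch assumes that a k-mer counts as unique only if it occurs in every target sequence and in no non-target sequence, which the abstract does not specify.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def unique_kmers(targets, non_targets, k):
    """K-mers shared by all targets and absent from every non-target.

    Assumption: 'unique' means present in every target sequence.
    """
    if not targets:
        return set()
    # Start from the k-mers common to all target sequences.
    shared = set.intersection(*(kmers(t, k) for t in targets))
    # Eliminate any k-mer that also appears in a non-target sequence.
    for seq in non_targets:
        shared -= kmers(seq, k)
    return shared


# Toy sequences, invented for illustration only.
targets = ["ACGTACGGA", "TTACGGAC"]
non_targets = ["ACGGTT", "GGACCA"]
result = unique_kmers(targets, non_targets, 4)
```

Set-based storage keeps the sketch short; a tool working at genomic scale would need a more memory-efficient representation.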


2019 ◽  
Vol 35 (18) ◽  
pp. 3461-3467 ◽  
Author(s):  
Mohamed Amgad ◽  
Habiba Elfandy ◽  
Hagar Hussein ◽  
Lamees A Atteya ◽  
Mai A T Elsebaie ◽  
...  

Abstract Motivation While deep-learning algorithms have demonstrated outstanding performance in semantic image segmentation tasks, large annotation datasets are needed to create accurate models. Annotation of histology images is challenging due to the effort and experience required to carefully delineate tissue structures, and difficulties related to sharing and markup of whole-slide images. Results We recruited 25 participants, ranging in experience from senior pathologists to medical students, to delineate tissue regions in 151 breast cancer slides using the Digital Slide Archive. Inter-participant discordance was systematically evaluated, revealing low discordance for tumor and stroma, and higher discordance for more subjectively defined or rare tissue classes. Feedback provided by senior participants enabled the generation and curation of 20 000+ annotated tissue regions. Fully convolutional networks trained using these annotations were highly accurate (mean AUC=0.945), and the scale of annotation data provided notable improvements in image classification accuracy. Availability and Implementation Dataset is freely available at: https://goo.gl/cNM4EL. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Guillermo Serrano ◽  
Elizabeth Guruceaga ◽  
Victor Segura

Abstract Summary Protein detection and quantification using high-throughput proteomic technologies remain challenging due to the stochastic nature of peptide selection in the mass spectrometer, the difficulties in the statistical analysis of the results and the presence of degenerated peptides. However, restricting the analysis to those peptides that can be detected by mass spectrometry, also called proteotypic peptides, increases the accuracy of the results. Several approaches have been applied to predict peptide detectability based on the physicochemical properties of the peptides. In this manuscript, we present DeepMSPeptide, a bioinformatic tool that uses a deep learning method to predict proteotypic peptides based exclusively on the peptide amino acid sequences. Availability and implementation DeepMSPeptide is available at https://github.com/vsegurar/DeepMSPeptide. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 36 (7) ◽  
pp. 2119-2125 ◽  
Author(s):  
Zongyang Du ◽  
Shuo Pan ◽  
Qi Wu ◽  
Zhenling Peng ◽  
Jianyi Yang

Abstract Motivation Threading is one of the most effective methods for protein structure prediction. In recent years, the increasing accuracy of protein contact map prediction has opened a new avenue to improve the performance of threading algorithms. Several preliminary studies suggest that with predicted contacts, the performance of threading algorithms can be improved greatly. There is still much room to make better use of predicted contacts. Results We have developed a new contact-assisted threading algorithm named CATHER using both conventional sequential profiles and a contact map predicted by a deep learning-based algorithm. Benchmark tests on an independent test set and the CASP12 targets demonstrated that CATHER made significant improvement over other methods that use only the sequential profile or only the predicted contact map. Our method was ranked in the top 10 among all 39 participating server groups on the 32 free modeling targets in the blind tests of the CASP13 experiment. These data suggest that it is promising to push forward threading algorithms by using predicted contacts. Availability and implementation http://yanglab.nankai.edu.cn/CATHER/. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 20 (7) ◽  
pp. 1645 ◽  
Author(s):  
Marta Lovino ◽  
Gianvito Urgese ◽  
Enrico Macii ◽  
Santa Di Cataldo ◽  
Elisa Ficarra

Gene fusions have a very important role in the study of cancer development. In this regard, predicting the probability that a protein fusion transcript will lead to cancer is a very challenging and not yet fully explored research problem. To date, all the available approaches in the literature try to explain the oncogenic potential of gene fusions based on protein domain analysis, which is cancer-specific and not easy to adapt to newly developed information. In our work, we choose the raw protein sequences as the input baseline, and propose the use of deep learning, and more specifically Convolutional Neural Networks, to infer the oncogenicity probability score of gene fusion transcripts and to group them into a number of categories (e.g., oncogenic/not oncogenic). This is an inherently flexible methodology that, unlike previous approaches, can be re-trained with very little effort on newly available data (for example, from a different cancer). Based on experimental results on a large dataset of pre-annotated gene fusions, our method is able to predict the oncogenic potential of gene fusion transcripts with an accuracy of about 72%, which increases to 86% if we consider only the instances that are classified with a high confidence level.
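How accuracy can rise when only high-confidence predictions are counted can be shown with a small Python sketch. It is an illustration under stated assumptions, not the authors' evaluation code: the function name, the definition of confidence as max(p, 1 - p), the 0.5 decision threshold and the toy numbers are all hypothetical.

```python
def accuracy(probs, labels, min_confidence=0.5):
    """Accuracy over predictions whose confidence is at least min_confidence.

    probs:  predicted probability of the positive (oncogenic) class.
    labels: true labels, 1 for oncogenic and 0 otherwise.
    Confidence is assumed to be max(p, 1 - p).
    Returns (accuracy, number of retained predictions).
    """
    kept = [(p, y) for p, y in zip(probs, labels)
            if max(p, 1 - p) >= min_confidence]
    if not kept:
        return float("nan"), 0
    # A prediction is positive when p >= 0.5 (assumed decision threshold).
    correct = sum((p >= 0.5) == bool(y) for p, y in kept)
    return correct / len(kept), len(kept)


# Toy scores and labels, invented for illustration only.
probs = [0.95, 0.10, 0.60, 0.45, 0.85, 0.55]
labels = [1, 0, 0, 1, 1, 1]
overall = accuracy(probs, labels)
confident = accuracy(probs, labels, min_confidence=0.8)
```

The second value in each result matters as much as the first: a stricter threshold raises accuracy only by leaving some instances unclassified.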


Author(s):  
Borja Pitarch ◽  
Juan A G Ranea ◽  
Florencio Pazos

Abstract Motivation Predicting the residues controlling a protein’s interaction specificity is important not only to better understand its interactions but also to design mutations aimed at fine-tuning or swapping them. Results In this work, we present a methodology that combines sequence information (in the form of multiple sequence alignments) with interactome information to detect such residues in paralogous families of proteins. The interactome is used to define pairwise similarities of interaction contexts for the proteins in the alignment. The method looks for alignment positions with patterns of amino-acid changes reflecting the similarities/differences in the interaction neighborhoods of the corresponding proteins. We tested this new methodology in a large set of human paralogous families with structurally characterized interactions, and discuss in detail the results for the RasH family. We show that this approach is a better predictor of interfacial residues than both sequence conservation and an equivalent ‘unsupervised’ method that does not use interactome information. Availability and implementation http://csbg.cnb.csic.es/pazos/Xdet/. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Etienne Routhier ◽  
Ayman Bin Kamruddin ◽  
Julien Mozziconacci

Abstract Summary Prediction of genomic annotations from DNA sequences using deep learning is becoming a flourishing field with many applications. Nevertheless, there are still difficulties in handling data in order to conveniently build and train models dedicated to specific end-user tasks. keras_dna is designed for an easy implementation of Keras models (TensorFlow high level API) for genomics. It can handle standard bioinformatic file formats as inputs, such as bigwig, gff, bed, wig, bedGraph or fasta, and returns standardized inputs for model training. keras_dna is designed to implement existing models but also to facilitate the development of new models that can have single or multiple targets or inputs. Availability and implementation Freely available with an MIT License using pip install keras_dna or cloning the github repo at https://github.com/etirouthier/keras_dna.git. Supplementary information Supplementary data are available at Bioinformatics online.
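As a rough idea of what a standardized model input can look like, the sketch below one-hot encodes a DNA string in plain Python. It does not use or reproduce the keras_dna API: the function name, the A/C/G/T column order and the all-zero encoding of unknown bases are assumptions made for illustration.

```python
BASES = "ACGT"  # assumed column order


def one_hot(seq):
    """Encode a DNA string as a list of 4-element rows.

    Bases outside A/C/G/T (for example N) become all-zero rows.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


encoded = one_hot("ACGTN")
```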


2020 ◽  
Author(s):  
Zijie Jin ◽  
Wenjian Huang ◽  
Ning Shen ◽  
Juan Li ◽  
Xiaochen Wang ◽  
...  

Abstract Gene fusions are widespread in tumor cells and can play important roles in tumor initiation and progression. Using full-length single cell RNA sequencing (scRNA-seq), gene fusions can now be detected at the single cell level by analyzing chimeric reads in scRNA-seq. However, scRNA-seq data has a high noise level and contains various technical artefacts. Direct application of fusion detection tools developed for bulk data can lead to spurious fusion discoveries and leave some true fusions undetected. In this paper, we present a computational tool, scFusion, for gene fusion detection based on scRNA-seq. scFusion is composed of a statistical model and a deep learning model, both of which are designed to control for potential false discoveries. The statistical model models the background noise as zero-inflated negative binomial and uses a statistical testing procedure to control for false positives. The deep learning model is trained to recognize technical chimeric artefacts and filter false fusion candidates generated by these artefacts. We compared scFusion with bulk fusion detection methods using simulation data created based on real scRNA-seq data and found that scFusion had superior performance. Applied to a T cell dataset, scFusion successfully detected the invariant TCR gene recombinations in Mucosal-associated invariant T cells that many bulk methods failed to detect. In a multiple myeloma dataset, scFusion detected the known recurrent fusion IgH-WHSC1, which was associated with overexpression of the WHSC1 oncogene. Significance A critical challenge for fusion detection based on full-length single cell RNA sequencing (scRNA-seq) is to identify the needles, or the true fusions, from a large haystack of false positives. We developed a fusion detection tool, scFusion, for scRNA-seq. scFusion is computationally more efficient and produces far fewer false discoveries while achieving detection power similar to that of fusion detection tools developed for bulk data. Application of scFusion to a multiple myeloma dataset identified subclones with the fusion IgH-WHSC1 and revealed that over-expression of the oncogene WHSC1 was strongly associated with the fusion. The models developed in this work may also be generalized for other single cell analyses such as structural variation detection and alternative splicing analysis.
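A zero-inflated negative binomial background model can be written down in a few lines of Python. This is a sketch under stated assumptions rather than scFusion's code: the mean/dispersion parameterization (mu, r), the zero-inflation weight pi, the function names and the use of an upper-tail probability as a test statistic are all illustrative choices that the abstract does not confirm.

```python
import math


def nb_pmf(k, mu, r):
    """Negative binomial P(X = k) with mean mu > 0 and dispersion r > 0."""
    log_p = (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
             + r * math.log(r / (r + mu))
             + k * math.log(mu / (r + mu)))
    return math.exp(log_p)


def zinb_pmf(k, mu, r, pi):
    """Zero-inflated NB: an extra point mass pi at zero, NB otherwise."""
    base = (1.0 - pi) * nb_pmf(k, mu, r)
    return pi + base if k == 0 else base


def zinb_upper_tail(k, mu, r, pi):
    """P(X >= k) under the background model, usable as a one-sided p-value."""
    return 1.0 - sum(zinb_pmf(i, mu, r, pi) for i in range(k))


# Hypothetical parameters: how surprising are 12 supporting reads?
p_value = zinb_upper_tail(12, mu=2.0, r=1.5, pi=0.3)
```

A small upper-tail probability means the observed count is unlikely to arise from background noise alone; controlling false positives across many candidates would additionally require a multiple-testing correction.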


Author(s):  
Denisa Bojkova ◽  
Jake E McGreig ◽  
Katie-May McLaughlin ◽  
Stuart G Masterson ◽  
Magdalena Antczak ◽  
...  

Abstract Motivation SARS-CoV-2 is a novel coronavirus currently causing a pandemic. Here, we performed a combined in-silico and cell culture comparison of SARS-CoV-2 and the closely related SARS-CoV. Results Many amino acid positions are differentially conserved between SARS-CoV-2 and SARS-CoV, which reflects the discrepancies in virus behaviour, i.e. more effective human-to-human transmission of SARS-CoV-2 and higher mortality associated with SARS-CoV. Variations in the S protein (mediates virus entry) were associated with differences in its interaction with ACE2 (cellular S receptor) and sensitivity to TMPRSS2 (enables virus entry via S cleavage) inhibition. Anti-ACE2 antibodies more strongly inhibited SARS-CoV than SARS-CoV-2 infection, probably due to a stronger SARS-CoV-2 S-ACE2 affinity relative to SARS-CoV S. Moreover, SARS-CoV-2 and SARS-CoV displayed differences in cell tropism. Cellular ACE2 and TMPRSS2 levels did not indicate susceptibility to SARS-CoV-2. In conclusion, we identified genomic variation between SARS-CoV-2 and SARS-CoV that may reflect the differences in their clinical and biological behaviour. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (16) ◽  
pp. 4527-4529
Author(s):  
Ales Saska ◽  
David Tichy ◽  
Robert Moore ◽  
Achilles Rasquinha ◽  
Caner Akdas ◽  
...  

Abstract Summary Visualizing a network provides a concise and practical understanding of the information it represents. Open-source web-based libraries help accelerate the creation of biologically based networks and their use. ccNetViz is an open-source, high speed and lightweight JavaScript library for visualization of large and complex networks. It implements customization and analytical features for easy network interpretation. These features include edge and node animations, which illustrate the flow of information through a network as well as node statistics. Properties can be defined a priori or dynamically imported from models and simulations. ccNetViz is thus a network visualization library particularly suited for systems biology. Availability and implementation The ccNetViz library, demos and documentation are freely available at http://helikarlab.github.io/ccNetViz/. Supplementary information Supplementary data are available at Bioinformatics online.

