HiLight-PTM: an online application to aid matching peptide pairs with isotopically labelled PTMs

2019 ◽  
Author(s):  
Harry J Whitwell ◽  
Peter DiMaggio

Abstract. Motivation: Database searching of isotopically labelled PTMs can be problematic, and we frequently find that only one member of a heavy/light pair, or neither, is assigned. In such cases, having a pair of MS/MS spectra that differ due to an isotopic label can assist in identifying the relevant m/z values that support the correct peptide annotation, or can be used for de novo sequencing. Results: We have developed an online application that identifies, between two MS/MS spectra, matching peaks and peaks differing by the appropriate mass shift (the difference between the heavy and light PTM). Furthermore, the application predicts, from the exact-match peaks, the mass of their complementary ions and highlights these as high-confidence matches between the two spectra. The result is a tool to visually compare two spectra, and downloadable peak lists that can be used to support de novo sequencing. Availability and implementation: HiLight-PTM is released using shinyapps.io by RStudio, and can be accessed from any internet browser at https://harrywhitwell.shinyapps.io/hilight-ptm/. Supplementary information: Supplementary data are available at Bioinformatics online.
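
The peak comparison described in this abstract can be illustrated with a minimal Python sketch. It is not the HiLight-PTM implementation (the application itself is an R Shiny app): the function names, the 0.02 m/z tolerance, the example mass shift and the assumption of singly charged fragment ions are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

PROTON = 1.007276  # proton mass in Da


def match_peaks(
    light: List[float], heavy: List[float], shift: float, tol: float = 0.02
) -> Tuple[List[Tuple[float, float]], List[Tuple[float, float]]]:
    """Pair peaks from a light and a heavy spectrum.

    Returns (exact, shifted): pairs with the same m/z within `tol`, and
    pairs whose m/z differs by `shift` (heavy minus light) within `tol`.
    The tolerance value is an assumption, not taken from the paper.
    """
    exact, shifted = [], []
    for lm in light:
        for hm in heavy:
            if abs(hm - lm) <= tol:
                exact.append((lm, hm))
            elif abs(hm - lm - shift) <= tol:
                shifted.append((lm, hm))
    return exact, shifted


def complementary_mz(fragment_mz: float, precursor_mz: float, charge: int) -> float:
    """m/z of the singly charged complement of a singly charged fragment.

    Uses b + y = M + 2 * proton, where M is the neutral precursor mass.
    """
    neutral_mass = (precursor_mz - PROTON) * charge
    return neutral_mass + 2 * PROTON - fragment_mz


# Illustrative peak lists and mass shift (hypothetical values).
light = [100.0, 200.0, 300.0]
heavy = [100.01, 203.02, 350.0]
exact, shifted = match_peaks(light, heavy, shift=3.0188)
complement = complementary_mz(300.0, precursor_mz=500.0, charge=2)
```

In this sketch, a peak found in both spectra at the same m/z would be a fragment that does not carry the label, and its predicted complement is where one would look for the labelled counterpart.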

2019 ◽  
Vol 35 (14) ◽  
pp. i183-i190 ◽  
Author(s):  
Hao Yang ◽  
Hao Chi ◽  
Wen-Feng Zeng ◽  
Wen-Jing Zhou ◽  
Si-Min He

Abstract. Motivation: De novo peptide sequencing based on tandem mass spectrometry data is the key technology of shotgun proteomics for identifying peptides without any database and assembling unknown proteins. However, owing to the low ion coverage in tandem mass spectra, the order of certain consecutive amino acids cannot be determined if all of their supporting fragment ions are missing, which results in the low precision of de novo sequencing. Results: To solve this problem, we developed pNovo 3, which uses a learning-to-rank framework to distinguish similar peptide candidates for each spectrum. Three metrics measuring the similarity between each experimental spectrum and its corresponding theoretical spectrum were used as important features; the theoretical spectra can be precisely predicted by the pDeep algorithm using deep learning. On seven benchmark datasets from six diverse species, pNovo 3 recalled 29–102% more correct spectra, and its precision was 11–89% higher than that of three other state-of-the-art de novo sequencing algorithms. Furthermore, compared with the newly developed DeepNovo, which also uses a deep learning approach, pNovo 3 still identified 21–50% more spectra on the nine datasets used in the DeepNovo study. In summary, the deep learning and learning-to-rank techniques implemented in pNovo 3 significantly improve the precision of de novo sequencing, and such a machine learning framework is worth extending to other related research fields to distinguish similar sequences. Availability and implementation: pNovo 3 can be freely downloaded from http://pfind.ict.ac.cn/software/pNovo/index.html. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Yuansheng Liu ◽  
Xiaocai Zhang ◽  
Quan Zou ◽  
Xiangxiang Zeng

Abstract. Summary: Removing duplicate and near-duplicate reads generated by high-throughput sequencing technologies can reduce the computational resources required by downstream applications. Here we develop minirmd, a de novo tool that removes duplicate reads via multiple rounds of clustering using minimizers of different lengths. Experiments demonstrate that minirmd removes more near-duplicate reads than existing clustering approaches and is faster than existing multi-core tools. To the best of our knowledge, minirmd is the first tool to remove near-duplicates on the reverse-complementary strand. Availability and implementation: https://github.com/yuansliu/minirmd. Supplementary information: Supplementary data are available at Bioinformatics online.
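
As a rough illustration of the minimizer idea mentioned in this abstract, the Python sketch below groups reads by a strand-independent minimizer, so that a read and its reverse complement fall into the same bucket. It is not minirmd's algorithm: the function names, the choice of the lexicographically smallest k-mer as the minimizer, and the single bucketing pass are assumptions; the actual tool performs several rounds with different minimizer lengths and compares reads within clusters.

```python
from collections import defaultdict
from typing import Dict, List

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return seq.translate(_COMPLEMENT)[::-1]


def minimizer(seq: str, k: int) -> str:
    """Smallest k-mer (lexicographic order) over both strands of the read."""
    strands = (seq, reverse_complement(seq))
    kmers = [s[i:i + k] for s in strands for i in range(len(s) - k + 1)]
    return min(kmers)


def bucket_by_minimizer(reads: List[str], k: int) -> Dict[str, List[str]]:
    """Group reads that share a minimizer; candidates for duplicate checks."""
    buckets = defaultdict(list)
    for read in reads:
        buckets[minimizer(read, k)].append(read)
    return dict(buckets)


# "ACGTT" is the reverse complement of "AACGT", so both share a bucket.
buckets = bucket_by_minimizer(["AACGT", "ACGTT", "GGGCC"], k=3)
```

Only reads inside the same bucket would then need to be compared with each other, which is what makes this kind of clustering cheaper than an all-against-all comparison.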


2019 ◽  
Vol 35 (16) ◽  
pp. 2724-2729 ◽  
Author(s):  
L Carron ◽  
J B Morlot ◽  
V Matthys ◽  
A Lesne ◽  
J Mozziconacci

Abstract. Motivation: Genome-wide chromosomal contact maps are widely used to uncover the 3D organization of genomes. They rely on collecting millions of contacting pairs of genomic loci. Contacts at short range are usually well measured in experiments, whereas much of the information about long-range contacts is missing. Results: We propose to use the sparse information contained in raw contact maps to infer high-confidence contact counts between all pairs of loci. Our algorithmic procedure, Boost-HiC, enables the detection of Hi-C patterns such as chromosomal compartments at a resolution that would otherwise be attainable only by sequencing the experimental Hi-C library a hundred times deeper. Boost-HiC can also be used to compare contact maps at an improved resolution. Availability and implementation: Boost-HiC is available at https://github.com/LeopoldC/Boost-HiC. Supplementary information: Supplementary data are available at Bioinformatics online.


Author(s):  
Xin Li ◽  
Haiyan Hu ◽  
Xiaoman Li

Abstract. Motivation: It is essential to study bacterial strains in environmental samples. Existing methods and tools often depend on known strains or known variations, cannot work on individual samples, are not reliable, or are not easy to use. It is thus important to develop more user-friendly tools that can identify bacterial strains more accurately. Results: We developed a new tool called mixtureS that can identify bacterial strains de novo from shotgun reads of a clonal or metagenomic sample, without prior knowledge about the strains and their variations. Tested on 243 simulated datasets and 195 experimental datasets, mixtureS reliably identified the strains, their numbers and their abundance. Compared with three tools, mixtureS showed better performance in almost all simulated datasets and the vast majority of experimental datasets. Availability and implementation: The source code and the tool mixtureS are available at http://www.cs.ucf.edu/~xiaoman/mixtureS/. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Gaëtan Benoit ◽  
Mahendra Mariadassou ◽  
Stéphane Robin ◽  
Sophie Schbath ◽  
Pierre Peterlongo ◽  
...  

Abstract. Motivation: De novo comparative metagenomics is one of the most straightforward ways to analyze large sets of metagenomic data. The latest methods use the fraction of shared k-mers to estimate genomic similarity between read sets. However, those methods, while extremely efficient, are still limited by computational needs for practical usage outside of large computing facilities. Results: We present SimkaMin, a quick comparative metagenomics tool with low disk and memory footprints, thanks to an efficient data subsampling scheme used to estimate Bray-Curtis and Jaccard dissimilarities. One billion metagenomic reads can be analyzed in <3 min, with tiny memory (1.09 GB) and disk (≈0.3 GB) requirements and without altering the quality of the downstream comparative analyses, making SimkaMin a tool perfectly tailored for very large-scale metagenomic projects. Availability and implementation: https://github.com/GATB/simka. Supplementary information: Supplementary data are available at Bioinformatics online.
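
The two dissimilarities named in this abstract can be written out on k-mer count tables as a short Python sketch. It computes them on the full k-mer counts of two toy read sets; SimkaMin's subsampling scheme is not reproduced here, and the function names and the value of k are hypothetical.

```python
from collections import Counter
from typing import Iterable


def kmer_counts(reads: Iterable[str], k: int) -> Counter:
    """Count every k-mer occurring in a set of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def jaccard_dissimilarity(a: Counter, b: Counter) -> float:
    """1 - |A ∩ B| / |A ∪ B| on the sets of distinct k-mers."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    if not union:
        return 0.0
    return 1.0 - len(keys_a & keys_b) / len(union)


def bray_curtis_dissimilarity(a: Counter, b: Counter) -> float:
    """1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b)) on k-mer abundances."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[key], b[key]) for key in a.keys() & b.keys())
    return 1.0 - 2.0 * shared / total


# Two toy read sets sharing three of their four 2-mers.
sample_a = kmer_counts(["ACGTA"], k=2)
sample_b = kmer_counts(["ACGTT"], k=2)
jaccard = jaccard_dissimilarity(sample_a, sample_b)
bray_curtis = bray_curtis_dissimilarity(sample_a, sample_b)
```

Jaccard looks only at presence or absence of k-mers, while Bray-Curtis also weighs their abundances, which is why the two values differ for the same pair of samples.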


2019 ◽  
Author(s):  
Riccardo Delli Ponti ◽  
Alexandros Armaos ◽  
Andrea Vandelli ◽  
Gian Gaetano Tartaglia

Abstract. Motivation: RNA structure is difficult to predict in vivo due to interactions with enzymes and other molecules. Here we introduce CROSSalive, an algorithm to predict the single- and double-stranded regions of RNAs in vivo using predictions of protein interactions. Results: Trained on icSHAPE data in the presence (m6a+) and absence (m6a-) of N6 methyladenosine modification, CROSSalive achieves cross-validation accuracies between 0.70 and 0.88 in identifying high-confidence single- and double-stranded regions. The algorithm was applied to the long non-coding RNA Xist (17 900 nt, not present in the training set) and shows an area under the ROC curve of 0.83 in predicting structured regions. Availability and implementation: The CROSSalive webserver is freely accessible at http://service.tartaglialab.com/new_submission/crossalive. Supplementary information: Supplementary data are available at Bioinformatics online.


2018 ◽  
Vol 35 (15) ◽  
pp. 2663-2664 ◽  
Author(s):  
Carlos de Lannoy ◽  
Judith Risse ◽  
Dick de Ridder

Abstract. Summary: Nanopore sequencing is a novel development in nucleic acid analysis. As such, nanopore-sequencing hardware and software are updated frequently and extensively, which quickly renders peer-reviewed publications on analysis pipeline benchmarking efforts outdated. To provide the user community with a faster, more flexible alternative to peer-reviewed benchmark papers for de novo assembly tool performance, we constructed poreTally, a comprehensive benchmarking tool. poreTally automatically assembles a given read set using several often-used assembly pipelines, analyzes the resulting assemblies for correctness and continuity, and finally generates a quality report, which can immediately be published on GitHub/GitLab. Availability and implementation: poreTally is available on GitHub at https://github.com/cvdelannoy/poreTally, under an MIT license. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (19) ◽  
pp. 3812-3814
Author(s):  
Mohamad Koohi-Moghadam ◽  
Mitesh J Borad ◽  
Nhan L Tran ◽  
Kristin R Swanson ◽  
Lisa A Boardman ◽  
...  

Abstract. Summary: We present MetaMarker, a pipeline for discovering metagenomic biomarkers from whole-metagenome sequencing samples. Unlike existing methods, MetaMarker is based on a de novo approach that does not require mapping raw reads to a reference database. We applied MetaMarker to whole-metagenome sequencing of colorectal cancer (CRC) stool samples from France to discover CRC-specific metagenomic biomarkers. We showed the robustness of the discovered biomarkers by validating them in independent samples from Hong Kong, Austria, Germany and Denmark. We further demonstrated that these biomarkers could be used to build a machine learning classifier for CRC prediction. Availability and implementation: MetaMarker is freely available at https://bitbucket.org/mkoohim/metamarker under the GPLv3 license. Supplementary information: Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (15) ◽  
pp. 4269-4275 ◽  
Author(s):  
Haidong Yan ◽  
Aureliano Bombarely ◽  
Song Li

Abstract. Motivation: Transposable element (TE) classification is an essential step in decoding the roles of TEs in genome evolution. With a large number of genomes from non-model species becoming available, accurate and efficient TE classification has emerged as a new challenge in genomic sequence analysis. Results: We developed a novel tool, DeepTE, which classifies unknown TEs using convolutional neural networks (CNNs). DeepTE converts sequences into input vectors based on k-mer counts. A tree-structured classification process was used in which eight models were trained to classify TEs into super families and orders. DeepTE also detects domains inside TEs to correct false classifications. An additional model was trained to distinguish between non-TEs and TEs in plants. Given unclassified TEs of different species, DeepTE can classify TEs into seven orders, which include 15, 24 and 16 super families in plants, metazoans and fungi, respectively. In several benchmarking tests, DeepTE outperformed other existing tools for TE classification. In conclusion, DeepTE successfully leverages CNNs for TE classification, and can be used to precisely classify TEs in newly sequenced eukaryotic genomes. Availability and implementation: DeepTE is accessible at https://github.com/LiLabAtVT/DeepTE. Supplementary information: Supplementary data are available at Bioinformatics online.
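
The step of turning a sequence into an input vector of k-mer counts can be sketched in Python as follows. This is a generic illustration rather than DeepTE's code: the function name, the default k = 3 and the decision to skip k-mers containing ambiguous bases are assumptions, since the abstract does not state these details.

```python
from itertools import product
from typing import List


def kmer_vector(seq: str, k: int = 3) -> List[int]:
    """Fixed-length vector of k-mer counts over the alphabet ACGT.

    Position i holds the count of the i-th k-mer in lexicographic order
    (AA..A first, TT..T last). K-mers containing other characters, such
    as N, are skipped; this handling is an assumption of the sketch.
    """
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    vec = [0] * len(index)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k].upper())
        if j is not None:
            vec[j] += 1
    return vec


# A 2-mer vector has 4**2 = 16 entries; "TN" is skipped as ambiguous.
vector = kmer_vector("ACGTN", k=2)
```

Because the vector length depends only on k (4**k entries), sequences of any length map to inputs of the same size, which is what a fixed-architecture classifier needs.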

