qpMerge: Merging different peptide isoforms using a motif centric strategy

2016 ◽  
Author(s):  
Matthew M. Hindle ◽  
Thierry Le Bihan ◽  
Johanna Krahmer ◽  
Sarah F. Martin ◽  
Zeenat B. Noordally ◽  
...  

Abstract Accurate quantification and enumeration of peptide motifs is hampered by redundancy in peptide identification. A single phosphorylation motif may be split across charge states, alternative modifications (e.g. acetylation and oxidation), and multiple missed-cleavage sites, which makes the biological interpretation of MS data a challenge. In addition, motif redundancy can affect quantitative and statistical analysis and prevent a realistic comparison of peptide numbers between datasets. In this study, we present a merging tool set developed for the Galaxy workflow environment to achieve a non-redundant set of quantifications for phospho-motifs. We present a Galaxy workflow to merge three exemplar datasets, and observe reduced phospho-motif redundancy and decreased replicate variation. The qpMerge tools provide a straightforward and reusable approach to facilitating phospho-motif analysis. The source code and wiki documentation are publicly available at http://sourceforge.net/projects/ppmerge. The Galaxy pipeline used in the exemplar analysis can be found at http://www.myexperiment.org/workflows/4186.
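The redundancy described in this abstract can be illustrated with a minimal sketch. It is not the qpMerge implementation: the record fields (`protein`, `peptide`, `phospho_offset`, `intensity`), the fixed flanking window, and the summing of intensities are all assumptions made for illustration. The idea is only that identifications differing in charge state, extra modifications, or missed cleavages collapse to one entry once they are keyed by the sequence window around the phosphosite.

```python
from collections import defaultdict


def motif_window(protein_seq, site_index, flank=3):
    """Return the residues around a 0-based phosphosite, padded with '_' at the termini."""
    padded = "_" * flank + protein_seq + "_" * flank
    centre = site_index + flank
    return padded[centre - flank:centre + flank + 1]


def merge_by_motif(identifications, proteins, flank=3):
    """Sum intensities of identifications that share the same protein and motif window.

    Hypothetical record layout: each identification is a dict with the keys
    'protein', 'peptide', 'phospho_offset' (0-based, within the peptide) and 'intensity'.
    """
    merged = defaultdict(float)
    for ident in identifications:
        seq = proteins[ident["protein"]]
        # Assumption: the first occurrence of the peptide in the protein is the right one.
        start = seq.find(ident["peptide"])
        if start < 0:
            continue  # peptide does not map to this protein; skip it
        site = start + ident["phospho_offset"]
        key = (ident["protein"], motif_window(seq, site, flank))
        merged[key] += ident["intensity"]
    return dict(merged)


proteins = {"P1": "AAKRLSPEKTTR"}
identifications = [
    {"protein": "P1", "peptide": "LSPEK", "phospho_offset": 1, "intensity": 10.0},
    {"protein": "P1", "peptide": "LSPEKTTR", "phospho_offset": 1, "intensity": 5.0},
    {"protein": "P1", "peptide": "RLSPEK", "phospho_offset": 2, "intensity": 2.5},
]
merged = merge_by_motif(identifications, proteins)
```

Here the three peptide forms all map to the same serine, so `merged` holds a single entry for that motif window rather than three separate quantifications.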

2019 ◽  
Author(s):  
Christian Ndekezi ◽  
Joseph Nkamwesiga ◽  
Sylvester Ochwo ◽  
Magambo Phillip Kimuda ◽  
Frank Norbert Mwiine ◽  
...  

Abstract Ticks are arthropod vectors of pathogens of both veterinary and public health importance. Ticks are largely controlled by acaricide application. However, acaricide efficacy is hampered by high cost, the need for regular application and selection of multi-acaricide resistant tick populations. In light of this, future tick control approaches are poised to rely on integration of rational acaricide application and other methods such as vaccination. To contribute to systematic research-guided efforts to produce anti-tick vaccines, we carried out an in silico tick Aquaporin-1 protein (AQP1) analysis to identify unique tick AQP1 peptide motifs that can be used in future peptide anti-tick vaccine development. We used multiple sequence alignment (MSA), motif analysis, homology modeling, and structural analysis to identify unique tick AQP1 peptide motifs. BepiPred, Chou & Fasman-Turn, Karplus & Schulz Flexibility and Parker-Hydrophilicity prediction models were used to assess these motifs' abilities to induce antibody-mediated immune responses. Tick AQP1 (MK334178) protein homology was largely similar to the bovine AQP1 (PDB:1J4N) (23% sequence similarity; structural superimposition RMS = 1.475). The highest similarities were observed in the transmembrane domains, while differences were observed in the extracellular and intracellular protein loops. Two unique tick AQP1 (MK334178) motifs, M7 (residues 106-125, p=5.4e-25) and M8 (residues 85-104, p=3.3e-24), were identified. These two motifs are located on the extracellular AQP1 domain and showed the highest Parker-Hydrophilicity prediction immunogenic scores of 1.153 and 2.612, respectively. The M7 and M8 motifs are a good starting point for the development of a potential peptide-based anti-tick vaccine. Further analyses such as in vivo immunization assays are required to validate these findings.


2017 ◽  
Author(s):  
Bérénice Batut ◽  
Kévin Gravouil ◽  
Clémence Defois ◽  
Saskia Hiltemann ◽  
Jean-François Brugère ◽  
...  

Abstract Background New generations of sequencing platforms coupled with numerous bioinformatics tools have led to rapid technological progress in metagenomics and metatranscriptomics for investigating complex microorganism communities. Nevertheless, a combination of different bioinformatic tools remains necessary to draw conclusions from microbiota studies. Modular and user-friendly tools would greatly improve such studies. Findings We therefore developed ASaiM, an open-source Galaxy-based framework dedicated to microbiota data analyses. ASaiM provides a curated collection of tools to explore and visualize taxonomic and functional information from raw amplicon, metagenomic or metatranscriptomic sequences. To guide different analyses, several customizable workflows are included. All workflows are supported by tutorials and Galaxy interactive tours to guide the users through the analyses step by step. ASaiM is implemented as a Galaxy Docker flavour. It is scalable to many thousands of datasets, but can also be used on a normal PC. The associated source code is available under the Apache 2 license at https://github.com/ASaiM/framework and documentation can be found online (http://asaim.readthedocs.io/). Conclusions Based on the Galaxy framework, ASaiM offers sophisticated analyses to scientists without command-line knowledge. ASaiM provides a powerful framework to easily and quickly explore microbiota data in a reproducible and transparent environment.


GigaScience ◽  
2020 ◽  
Vol 9 (4) ◽  
Author(s):  
Thomas McGowan ◽  
James E Johnson ◽  
Praveen Kumar ◽  
Ray Sajulga ◽  
Subina Mehta ◽  
...  

Abstract Background Proteogenomics integrates genomics, transcriptomics, and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate ‘omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing, and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. Findings MVP is built as an HTML Galaxy plug-in, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input—a custom data type (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrated Genomics Viewer JavaScript framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. Conclusions MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.


2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Rodolfo S. Allendes Osorio ◽  
Lokesh P. Tripathi ◽  
Kenji Mizuguchi

Abstract Background When visually comparing the results of hierarchical clustering, the differences in the arrangements of components are of special interest. However, in a biological setting, identifying such differences becomes less straightforward, as the changes in the dendrogram structure caused by permuting biological replicates do not necessarily imply a different biological interpretation. Here, we introduce a visualization tool to help identify biologically similar topologies across different clustering results, even in the presence of replicates. Results Here we introduce CLINE, an open-access web application that allows users to visualize and compare multiple dendrogram structures by visually displaying the links between areas of similarity across multiple structures. Through the use of a single page and a simple user interface, the user is able to load and remove structures from the visualization, change some aspects of their display and set the parameters used to match cluster topology across consecutive pairs of dendrograms. Conclusions We have implemented a web tool that allows users to visualize different dendrogram structures, showing not only the structures themselves, but also linking areas of similarity across multiple structures. The software is freely available at http://mizuguchilab.org/tools/cline/. Also, the source code, documentation and installation instructions are available on GitHub at https://github.com/RodolfoAllendes/cline/.


2006 ◽  
Vol 04 (01) ◽  
pp. 75-92 ◽  
Author(s):  
TUN-WEN PAI ◽  
BO-HAN SU ◽  
PEI-CHIH WU ◽  
MARGARET DAH-TSYR CHANG ◽  
HAO-TENG CHANG ◽  
...  

The human ribonuclease A (RNaseA) superfamily consists of eight RNases with high similarity, in which RNase2 and RNase3 share 76.7% identity. The evolutionary variation of RNases results in differential structures and functions of the enzymes. To distinguish the characteristics of each RNase, we developed reinforced merging algorithms (RMA) to rapidly identify the unique peptide motifs for each member of the highly conserved human RNaseA superfamily. Many motifs in RNase3 identified by RMA correlated well with the antigenic regions predicted by DNAStar. Two unique peptide motifs were experimentally confirmed to contain epitopes for monoclonal antibodies (mAbs) specifically against RNase3. Further analysis of homologous RNases in different species revealed that the unique peptide motifs were located at the corresponding positions, and one of these motifs indeed matched the epitope for a specific anti-bovine pancreatic RNaseA (bpRNaseA) antibody. Our method provides a useful tool for identification of unique peptide motifs for further experimental design. The RMA system is available and free for academic use at and .


2018 ◽  
Vol 35 (7) ◽  
pp. 1249-1251 ◽  
Author(s):  
Kai Li ◽  
Marc Vaudel ◽  
Bing Zhang ◽  
Yan Ren ◽  
Bo Wen

Abstract Summary Data visualization plays critical roles in proteomics studies, ranging from quality control of MS/MS data to validation of peptide identification results. Herein, we present PDV, an integrative proteomics data viewer that can be used to visualize a wide range of proteomics data, including database search results, de novo sequencing results, proteogenomics files, MS/MS data in mzML/mzXML format and data from public proteomics repositories. PDV is a lightweight visualization tool that enables intuitive and fast exploration of diverse, large-scale proteomics datasets on standard desktop computers in both graphical user interface and command line modes. Availability and implementation PDV software and the user manual are freely available at http://pdv.zhang-lab.org. The source code is available at https://github.com/wenbostar/PDV and is released under the GPL-3 license. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Thomas McGowan ◽  
James E. Johnson ◽  
Praveen Kumar ◽  
Ray Sajulga ◽  
Subina Mehta ◽  
...  

Abstract Background Proteogenomics integrates genomics, transcriptomics and mass spectrometry (MS)-based proteomics data to identify novel protein sequences arising from gene and transcript sequence variants. Proteogenomic data analysis requires integration of disparate ‘omic software tools, as well as customized tools to view and interpret results. The flexible Galaxy platform has proven valuable for proteogenomic data analysis. Here, we describe a novel Multi-omics Visualization Platform (MVP) for organizing, visualizing and exploring proteogenomic results, adding a critically needed tool for data exploration and interpretation. Findings MVP is built as an HTML Galaxy plugin, primarily based on JavaScript. Via the Galaxy API, MVP uses SQLite databases as input -- a custom datatype (mzSQLite) containing MS-based peptide identification information, a variant annotation table, and a coding sequence table. Users can interactively filter identified peptides based on sequence and data quality metrics, view annotated peptide MS data, and visualize protein-level information, along with genomic coordinates. Peptides that pass the user-defined thresholds can be sent back to Galaxy via the API for further analysis; processed data and visualizations can also be saved and shared. MVP leverages the Integrated Genomics Viewer JavaScript (IGVjs) framework, enabling interactive visualization of peptides and corresponding transcript and genomic coding information within the MVP interface. Conclusions MVP provides a powerful, extensible platform for automated, interactive visualization of proteogenomic results within the Galaxy environment, adding a unique and critically needed tool for empowering exploration and interpretation of results. The platform is extensible, providing a basis for further development of new functionalities for proteogenomic data visualization.


2021 ◽  
Author(s):  
Le Zhang ◽  
Geng Liu ◽  
Guixue Hou ◽  
Haitao Xiang ◽  
Xi Zhang ◽  
...  

Although database search tools originally developed for shotgun proteomics have been widely used in immunopeptidomic mass spectrometry identifications, they have been reported to achieve undesirably low sensitivities and/or high false positive rates as a result of the hugely inflated search space caused by the lack of specific enzymatic digestion in immunopeptidomics. To overcome this problem, we have developed a motif-guided immunopeptidome database building tool named IntroSpect, which is designed to first learn the peptide motifs from high-confidence hits in the initial search and then build a targeted database for a refined search. Evaluated on three representative HLA class I datasets, IntroSpect can improve the sensitivity by an average of 80% compared to conventional searches with unspecific digestion while maintaining a very high accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so that it performs equally well on both well-studied and poorly studied HLA types, unlike a previously developed method, SpectMHC. We have also designed IntroSpect to keep a global FDR that can be conveniently controlled, similar to conventional database search engines. Finally, we demonstrate the practical value of IntroSpect by discovering neoantigens from MS data directly. IntroSpect is freely available at https://github.com/BGI2016/IntroSpect.
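The two-step idea in this abstract (learn a motif from confident hits, then restrict the search database to peptides that fit it) can be sketched as follows. This is only an illustration under stated assumptions, not IntroSpect's actual method: the position frequency table with pseudocounts, the uniform background, the log-odds score, the threshold, and all function names are hypothetical choices.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def learn_position_frequencies(hits, pseudocount=1.0):
    """Build a per-position amino acid frequency table from equal-length peptides."""
    length = len(hits[0])
    if any(len(hit) != length for hit in hits):
        raise ValueError("all hits must have the same length")
    total = len(hits) + pseudocount * len(AMINO_ACIDS)
    table = []
    for pos in range(length):
        counts = Counter(hit[pos] for hit in hits)
        table.append({aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return table


def motif_score(peptide, table, background=1.0 / 20):
    """Sum of log-odds of each residue against a uniform background (an assumption)."""
    return sum(math.log(table[i][aa] / background) for i, aa in enumerate(peptide))


def build_targeted_database(candidates, table, threshold=0.0):
    """Keep candidates of the motif length whose score exceeds the threshold."""
    return [
        pep for pep in candidates
        if len(pep) == len(table) and motif_score(pep, table) > threshold
    ]


# Toy 3-mers keep the example short; real HLA class I ligands are longer.
table = learn_position_frequencies(["ALK", "ALR", "ALK"])
targeted = build_targeted_database(["ALK", "WWW", "ALKK"], table)
```

In this toy run only `"ALK"` survives: `"WWW"` scores below the background and `"ALKK"` has the wrong length. Peptides containing residues outside the 20 standard amino acids would need extra handling that this sketch leaves out.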


Author(s):  
Junliang Shang ◽  
Jing Wang ◽  
Yan Sun ◽  
Feng Li ◽  
Jin-Xing Liu ◽  
...  

Abstract Motivation For network-assisted analysis, which has become a popular method of data mining, network construction is a crucial task. Network construction relies on the accurate quantification of direct associations among variables. The existence of multiscale associations among variables presents several quantification challenges, especially when quantifying nonlinear direct interactions. Results In this study, the multiscale part mutual information (MPMI), based on part mutual information (PMI) and nonlinear partial association (NPA), was developed for effectively quantifying nonlinear direct associations among variables in networks with multiscale associations. First, we defined the MPMI in theory and derived its five important properties. Second, an experiment in a three-node network was carried out to numerically estimate its quantification ability under two cases of strong associations. Third, experiments with the MPMI and comparisons with the PMI, NPA and conditional mutual information were performed on simulated datasets and on datasets from the DREAM challenge project. Finally, the MPMI was applied to real datasets of glioblastoma and lung adenocarcinoma to validate its effectiveness. Results showed that the MPMI is an effective alternative measure for quantifying nonlinear direct associations in networks, especially those with multiscale associations. Availability The source code of MPMI is available online at https://github.com/CDMB-lab/MPMI. Supplementary information Supplementary data are available at Bioinformatics online.


2021 ◽  
Author(s):  
Tom Altenburg ◽  
Thilo Muth ◽  
Bernhard Y. Renard

Abstract Mass spectrometry-based proteomics allows the study of all proteins of a sample on a molecular level. The ever-increasing complexity and amount of proteomics MS data require powerful yet efficient computational and statistical analysis. In particular, most recent bottom-up MS-based proteomics studies consider a diverse pool of post-translational modifications, employ large databases (as in metaproteomics or proteogenomics), contain multiple isoforms of proteins, include unspecific cleavage sites, or even combine these, and thus face a computationally challenging situation regarding protein identification. To cope with the resulting large search spaces, we present a deep learning approach that jointly embeds MS/MS spectra and peptides into the same vector space, such that embeddings can be compared easily and interchangeably by using Euclidean distances. In contrast to existing spectrum embedding techniques, ours are learned jointly with their respective peptides and thus remain meaningful. By visualizing the learned manifold of both spectrum and peptide embeddings in correspondence to their physicochemical properties, our approach becomes easily interpretable. At the same time, our joint embeddings blur the lines between spectra and protein sequences, providing a powerful framework for peptide identification. In particular, we build an open search, which allows searching tens of thousands of spectra against millions of peptides within seconds. yHydra achieves identification rates that are compatible with MSFragger. Due to the open search, delta masses are assigned to each identification, which allows unrestricted characterization of post-translational modifications. Meaningful joint embeddings allow for faster open searches and generally make downstream analysis efficient and convenient, for example for integration with other omics types. Availability upon request. Contact [email protected]
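Once spectra and peptides share one vector space, matching reduces to a nearest-neighbour lookup by Euclidean distance. The sketch below shows only that lookup step. The embeddings are made-up two-dimensional vectors, the function name is hypothetical, and the brute-force loop stands in for whatever indexing the authors use, which the abstract does not describe.

```python
import math


def nearest_peptides(spectrum_embeddings, peptide_embeddings, peptides):
    """For each spectrum embedding, return the closest peptide and its Euclidean distance."""
    matches = []
    for spec in spectrum_embeddings:
        distances = [math.dist(spec, pep) for pep in peptide_embeddings]
        best = min(range(len(distances)), key=distances.__getitem__)
        matches.append((peptides[best], distances[best]))
    return matches


# Placeholder embeddings; a trained model would produce these vectors.
peptides = ["PEPTIDEK", "LSPEK"]
peptide_embeddings = [[0.0, 1.0], [3.0, 3.0]]
spectrum_embeddings = [[0.0, 0.0], [3.0, 4.0]]

matches = nearest_peptides(spectrum_embeddings, peptide_embeddings, peptides)
```

A brute-force scan like this costs one distance computation per spectrum and peptide pair, so searching against millions of peptides would in practice call for vectorised or approximate nearest-neighbour methods.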

