Protein Inference: Latest Research Papers

Pout2Prot: an efficient tool to create protein (sub)groups from Percolator output files

The protein inference problem is complicated in metaproteomics by the presence of homologous proteins from closely related species. Nevertheless, protein inference is vital for assigning taxonomy and functions to identified proteins of microbial species, a task for which specialized tools such as Prophane have been developed. Here we present Pout2Prot, which takes Percolator output (.pout) files from multiple experiments and creates protein (sub)group output files (.tsv) that can be used directly with Prophane. Pout2Prot offers different grouping strategies, allows distinction between sample categories and replicates for multiple files, and uses a weighted spectral count for protein (sub)groups to reflect (sub)group abundance. Pout2Prot is available as a web application at https://pout2prot.ugent.be and is installable via pip as a standalone command line tool and reusable software library. All code is open source under the Apache License 2.0 and is available at https://github.com/compomics/pout2prot.
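
To illustrate what forming a protein group can mean, the following is a minimal sketch of one common and very simple rule: proteins supported by exactly the same set of peptides cannot be told apart and are placed in one group. This is not Pout2Prot's code or API, and it does not reproduce its grouping strategies or its weighted spectral count; the function name `group_indistinguishable` and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def group_indistinguishable(protein_to_peptides):
    """Group proteins that are supported by exactly the same peptide set.

    protein_to_peptides: dict mapping a protein id to a set of peptide ids.
    Returns a sorted list of groups, each a sorted list of protein ids.
    Illustrative rule only, not the grouping logic of any specific tool.
    """
    groups = defaultdict(list)
    for protein in sorted(protein_to_peptides):
        # A frozenset is hashable, so the peptide set itself can be the key.
        key = frozenset(protein_to_peptides[protein])
        groups[key].append(protein)
    return sorted(groups.values())


# Toy data (hypothetical): P1 and P2 share all of their peptides.
toy = {
    "P1": {"pepA", "pepB"},
    "P2": {"pepA", "pepB"},
    "P3": {"pepB", "pepC"},
}
print(group_indistinguishable(toy))  # [['P1', 'P2'], ['P3']]
```

Homologous proteins from closely related species often end up in the same group under such a rule, which is why group-level rather than protein-level reporting is common in metaproteomics.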


Characterization of peptide-protein relationships in protein ambiguity groups via bipartite graphs

10.1101/2021.07.28.454128 ◽ 2021

Author(s): Karin Schork ◽ Michael Turewicz ◽ Julian Uszkoreit ◽ Jörg Rahnenführer ◽ Martin Eisenacher

Keyword(s): Real Data ◽ Bipartite Graphs ◽ Data Sets ◽ Complex Structures ◽ Protein Databases ◽ Protein Inference ◽ The Relationship ◽ Ambiguity Groups ◽ Proteins And Peptides

Motivation: In bottom-up proteomics, proteins are enzymatically digested before measurement with mass spectrometry. The relationship between proteins and peptides can be represented by bipartite graphs. This representation is useful to aid protein inference and quantification, which is complex due to the occurrence of shared peptides. We conducted a comprehensive analysis of bipartite graphs using theoretical peptides from in silico digestion of protein databases as well as peptides quantified from real data sets. Results: The graphs based on quantified peptides are smaller and have less complex structures than graphs based on theoretical peptides. The proportion of protein nodes without unique peptides and the proportion of graphs that contain such proteins are considerably greater for real data. Large differences between the two analyzed organisms (mouse and yeast) were observed at both the database and the quantitative level. Insights from this analysis may be useful for the development of protein inference and quantification algorithms.
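
The bipartite representation described in this abstract can be sketched in a few lines. The example below is an illustrative sketch only, not the authors' analysis code: it builds a peptide-protein graph from a toy mapping, splits it into connected components (the individual graphs), and lists protein nodes that have no unique peptide. The function names and the toy identifiers are hypothetical.

```python
from collections import defaultdict


def build_components(peptide_to_proteins):
    """Split a peptide-protein bipartite graph into connected components.

    peptide_to_proteins: dict mapping a peptide id to a set of protein ids.
    Nodes are tagged ("pep", id) or ("prot", id) so the two node types
    never collide even if an id is reused.
    """
    adjacency = defaultdict(set)
    for peptide, proteins in peptide_to_proteins.items():
        for protein in proteins:
            adjacency[("pep", peptide)].add(("prot", protein))
            adjacency[("prot", protein)].add(("pep", peptide))

    seen, components = set(), []
    for start in list(adjacency):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def proteins_without_unique_peptides(peptide_to_proteins):
    """Return proteins for which every supporting peptide is shared."""
    all_proteins = set()
    with_unique = set()
    for proteins in peptide_to_proteins.values():
        all_proteins |= set(proteins)
        if len(proteins) == 1:
            with_unique |= set(proteins)
    return all_proteins - with_unique


# Toy data (hypothetical): pepB is shared between P1 and P2.
toy = {"pepA": {"P1"}, "pepB": {"P1", "P2"}, "pepC": {"P3"}}
print(len(build_components(toy)))                     # 2
print(sorted(proteins_without_unique_peptides(toy)))  # ['P2']
```

Here P2 is supported only by a shared peptide, which is the kind of node that makes inference and quantification ambiguous.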


VIQoR: a web service for Visually supervised protein Inference and protein Quantification

10.1101/2021.06.01.446512 ◽ 2021

Author(s): Vasileios Tsiamis ◽ Veit Schwammle

Keyword(s): Quantitative Analysis ◽ Web Service ◽ Quantitative Data ◽ Missing Values ◽ Interactive Visualization ◽ Weighted Average ◽ Protein Quantification ◽ Inference Problem ◽ Protein Inference ◽ Concentration Changes

Motivation: In quantitative bottom-up mass spectrometry (MS)-based proteomics, the reliable estimation of protein concentration changes from peptide quantifications between different biological samples is essential. This estimation is not a single task but comprises two processes: protein inference and protein abundance summarization. Furthermore, due to the high complexity of proteomics data and the associated uncertainty about the performance of these processes, there is a demand for comprehensive visualization methods able to integrate protein with peptide quantitative data, including their post-translational modifications. A suitable tool that provides post-identification quantitative analysis of proteins with simultaneous interactive visualization has been lacking. Results: In this article, we present VIQoR, a user-friendly web service that accepts peptide quantitative data from both labeled and label-free experiments and carries out relative protein quantification, along with interactive visualization modules, including the novel VIQoR plot. We implemented two parsimonious algorithms to solve the protein inference problem, while protein summarization is facilitated by a well-established factor analysis algorithm called fast-FARMS, followed by a weighted average summarization function that minimizes the effect of missing values. In addition, summarization is optimized by the so-called Global Correlation Indicator (GCI). We test the tool on three publicly available ground truth datasets and demonstrate the ability of the protein inference algorithms to handle degenerate peptides. We furthermore show that GCI increases the accuracy of the quantitative analysis in data sets with replicated design. Availability and implementation: VIQoR is accessible at http://computproteomics.bmb.sdu.dk:8192/app_direct/VIQoR/. The source code is available at https://bitbucket.org/vtsiamis/viqor/.
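
The abstract does not specify which two parsimonious algorithms VIQoR implements, so the following is only a generic sketch of the parsimony idea: a greedy set-cover heuristic that repeatedly picks the protein explaining the most still-unexplained peptides. It should not be read as VIQoR's implementation; the function name `greedy_parsimony` and the toy data are hypothetical.

```python
def greedy_parsimony(protein_to_peptides):
    """Pick a small set of proteins that together explain all peptides.

    protein_to_peptides: dict mapping a protein id to a set of peptide ids.
    Greedy set cover is a heuristic; it does not guarantee a minimum-size
    protein list. Ties are broken by protein id for reproducibility.
    """
    uncovered = set().union(*protein_to_peptides.values())
    chosen = []
    while uncovered:
        # max() returns the first maximal item, so iterating over sorted
        # ids makes the tie-break deterministic.
        best = max(
            sorted(protein_to_peptides),
            key=lambda protein: len(protein_to_peptides[protein] & uncovered),
        )
        chosen.append(best)
        uncovered -= protein_to_peptides[best]
    return chosen


# Toy data (hypothetical): every peptide of P2 is also explained by P1.
toy = {
    "P1": {"pepA", "pepB", "pepC"},
    "P2": {"pepB", "pepC"},
    "P3": {"pepC", "pepD"},
}
print(greedy_parsimony(toy))  # ['P1', 'P3']
```

P2 is dropped because its degenerate (shared) peptides are already accounted for, which is the behaviour a parsimonious approach aims for.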


Improving the Protein Inference from Bottom-Up Proteomic Data Using Identifications from MS1 Spectra

Journal of the American Society for Mass Spectrometry ◽ 10.1021/jasms.1c00061 ◽ 2021

Author(s): Mark V. Ivanov ◽ Elizaveta M. Solovyeva ◽ Julia A. Bubis ◽ Mikhail V. Gorshkov

Keyword(s): Bottom Up ◽ Protein Inference ◽ Proteomic Data


Synergistic effect of short- and long-read sequencing on functional meta-omics

10.1101/2021.04.22.440869 ◽ 2021

Author(s): Valentina Galata ◽ Susheel Bhanu Busi ◽ Benoît Josef Kunath ◽ Laura de Nies ◽ Magdalena Calusinska ◽ ...

Keyword(s): Synergistic Effects ◽ Independent Solution ◽ Metagenomic Data ◽ Sequencing Data ◽ Protein Inference ◽ Long Read ◽ Omic Data Integration ◽ Functional Analyses ◽ Omic Data

Abstract: Real-world evaluations of metagenomic reconstructions are challenged by the difficulty of distinguishing reconstruction artefacts from genes and proteins present in situ. Here, we evaluate short-read-only, long-read-only, and hybrid assembly approaches on four different metagenomic samples of varying complexity and demonstrate how they affect gene and protein inference, which is particularly relevant for downstream functional analyses. For a human gut microbiome sample, we use complementary metatranscriptomic and metaproteomic data to evaluate the metagenomic data-based protein predictions. Our findings pave the way for critical assessments of metagenomic reconstructions, and we propose a reference-independent solution based on the synergistic effects of multi-omic data integration for the in situ study of microbiomes using long-read sequencing data.


Can we put Humpty Dumpty back together again? What does protein quantification mean in bottom-up proteomics?

10.1101/2021.01.25.428175 ◽ 2021

Author(s): Deanna L. Plubell ◽ Lukas Käll ◽ Bobbie-Jo Webb-Robertson ◽ Lisa Bramer ◽ Ashley Ives ◽ ...

Keyword(s): Rna Splicing ◽ Large Scale ◽ Protein Quantification ◽ Bottom Up ◽ Protein Coding ◽ Protein Inference ◽ Humpty Dumpty ◽ Peptide Data ◽ Biological Differences ◽ Quantitative Response

Abstract: Bottom-up proteomics provides peptide measurements and has been invaluable for moving proteomics into large-scale analyses. In bottom-up proteomics, protein parsimony and protein inference derived from these measured peptides are important for determining which protein-coding genes are present. However, given the complexity of RNA splicing processes, and how proteins can be modified post-translationally, it is overly simplistic to assume that all peptides that map to a single protein-coding gene will demonstrate the same quantitative response. Accordingly, by assuming that all peptides from a protein-coding sequence are representative of the same protein, we may be missing important biological differences. To better account for the complexity of the proteome, we need to think of new or better ways of handling peptide data.


A Recombinant Protein Biomarker DDA Library Increases DIA Coverage of Low Abundance Plasma Proteins

10.1101/2020.11.11.377309 ◽ 2020

Author(s): Seong Beom Ahn ◽ Karthik S. Kamath ◽ Abidali Mohamedali ◽ Zainab Noor ◽ Jemma X. Wu ◽ ...

Keyword(s): Recombinant Protein ◽ Human Plasma ◽ Recombinant Proteins ◽ Plasma Proteins ◽ Biomarker Discovery ◽ Cancer Biomarkers ◽ Human Blood Plasma ◽ Spectral Library ◽ Protein Inference ◽ High Stringency

Abstract: Credible detection and quantification of low abundance proteins from human blood plasma is a major challenge in precision medicine biomarker discovery when using mass spectrometry (MS). Here, we employed a mixture of recombinant proteins in DDA libraries to subsequently detect cancer-associated low abundance plasma proteins using SWATH/DIA. The exemplar DDA recombinant protein spectral library (rPSL) was derived from tryptic digestion of 36 human recombinant proteins that had been previously implicated as possible cancer biomarkers in both our own and other studies. The rPSL was then used to identify proteins from non-depleted colorectal cancer (CRC) plasmas by SWATH-MS. Most (32/36) of the proteins in the rPSL were reliably identified in plasma samples, including 8 proteins (BTC, CXCL10, IL1B, IL6, ITGB6, TGFα, TNF, TP53) not previously detected using high-stringency MS in human plasmas according to PeptideAtlas. The rPSL SWATH-MS protocol was compared to DDA-MS using MARS-depleted and post-digestion peptide fractionated plasmas (here referred to as a human plasma DDA library). Of the 32 proteins identified using rPSL SWATH, only 12 were identified using DDA-MS. The 20 additional proteins identified exclusively by the rPSL approach with SWATH were mostly lower abundance (i.e., <10 ng/ml) plasma proteins. To mitigate FDR concerns, and to replicate a more typical approach, the DDA rPSL was also merged into a human plasma DDA library. When SWATH identification was repeated using this merged library, the majority (33/36) of low abundance plasma proteins from the rPSL could still be identified using high-stringency HPP Guidelines v3.0 protein inference criteria.


ProtyQuant: Comparing Label-Free Shotgun Proteomics Datasets Using Accumulated Peptide Probabilities

10.26434/chemrxiv.12404363 ◽ 2020

Author(s): Robert Winkler

Keyword(s): False Positive Rate ◽ Shotgun Proteomics ◽ Label Free ◽ Plain Text ◽ Protein Inference ◽ Positive Rate ◽ Yeast Lysate ◽ Free Search ◽ Spectrum Matching ◽ Free Quantification

Comparing multiple label-free shotgun proteomics datasets requires various data processing and formatting steps, including peptide-spectrum matching, protein inference, and quantification, followed by the compilation of result files into a format that allows for downstream analyses. ProtyQuant performs protein inference and quantification calculations, and combines the results of individual datasets into plain text tables. These are lightweight, human-readable, and easy to import into databases or statistical software. ProtyQuant reads validated pepXML from proteomic workflows such as the Trans-Proteomic Pipeline (TPP), which makes it compatible with many commercial and free search engines. For protein inference and quantification, a modified version of the PIPQ program (He et al. 2016) was integrated. In contrast to simple spectral counting, PIPQ sums up peptide probabilities. For assigning peptides to proteins, three algorithms are available: Multiple Counting, Equal Division, and Linear Programming. The accumulated peptide probabilities (app) are used for both tasks: protein probability estimation and quantification. ProtyQuant was tested using a reference dataset for label-free shotgun proteomics, obtained from different concentrations of 48 human UPS proteins spiked into yeast lysate. Compared to ProteinProphet, ProtyQuant detected up to 126 (15%) more proteins in the mixture at an equal false positive rate (FPR). Using the app values for label-free quantification showed suitable sensitivity and linearity. Strikingly, the app values represent a realistic measure of ‘Protein Presence,’ an integral concept of protein probability and quantity. ProtyQuant provides a graphical user interface (GUI) and scripts for console-based processing. It is available (GNU GPL v3) for Windows, Linux, and Docker from https://bitbucket.org/lababi/protyquant/.
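
The idea of summing peptide probabilities per protein can be sketched as follows. This is a minimal illustration, not PIPQ or ProtyQuant code: the reading of "Multiple Counting" as crediting a shared peptide in full to every matching protein, and of "Equal Division" as splitting its probability evenly, is an assumption based on the strategy names alone. The Linear Programming strategy is not covered, and the function name `accumulate` and the toy data are hypothetical.

```python
from collections import defaultdict


def accumulate(peptides, mode="multiple"):
    """Sum peptide probabilities per protein.

    peptides: iterable of (probability, [protein ids]) pairs.
    mode: "multiple" credits a shared peptide in full to each protein;
          "equal" splits its probability evenly among its proteins.
    Both readings are assumptions made for illustration.
    """
    if mode not in ("multiple", "equal"):
        raise ValueError(f"unknown mode: {mode!r}")
    totals = defaultdict(float)
    for probability, proteins in peptides:
        if mode == "multiple":
            share = probability
        else:
            share = probability / len(proteins)
        for protein in proteins:
            totals[protein] += share
    return dict(totals)


# Toy data (hypothetical): the second peptide is shared by P1 and P2.
toy = [(1.0, ["P1"]), (0.5, ["P1", "P2"])]
print(accumulate(toy, "multiple"))  # {'P1': 1.5, 'P2': 0.5}
print(accumulate(toy, "equal"))     # {'P1': 1.25, 'P2': 0.25}
```

Under the equal-division reading the accumulated values sum to the total peptide probability, whereas multiple counting inflates the total whenever peptides are shared.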


METATRYP v 2.0: Metaproteomic Least Common Ancestor Analysis for Taxonomic Inference Using Specialized Sequence Assemblies - Standalone Software and Web Servers for Marine Microorganisms and Coronaviruses

10.1101/2020.05.20.107490 ◽ 2020 ◽ Cited By ~ 1

Author(s): Jaclyn K. Saunders ◽ David Gaylord ◽ Noelle Held ◽ Nick Symmonds ◽ Chris Dupont ◽ ...

Keyword(s): Influenza A ◽ Common Ancestor ◽ Marine Microorganisms ◽ Web Portal ◽ Tryptic Peptides ◽ Protein Inference ◽ Link Type ◽ Diagnostic Approaches ◽ Taxonomic Groups ◽ Rapid Deployment

Abstract: We present METATRYP version 2 software that identifies shared peptides across organisms within environmental metaproteomics studies to enable accurate taxonomic attribution of peptides during protein inference. Improvements include: ingestion of complex sequence assembly data categories (metagenomic and metatranscriptomic assemblies, single cell amplified genomes, and metagenome assembled genomes), prediction of the Least Common Ancestor (LCA) for a peptide shared across multiple organisms, increased performance through updates to the backend architecture, and development of a web portal (https://metatryp.whoi.edu). Major expansion of the marine database confirms low occurrence of shared tryptic peptides among disparate marine microorganisms, implying tractability for targeted metaproteomics. METATRYP was designed for ocean metaproteomics and has been integrated into the Ocean Protein Portal (https://oceanproteinportal.org); however, it can be readily applied to other domains. We describe the rapid deployment of a coronavirus-specific web portal (https://metatryp-coronavirus.whoi.edu/) to aid in the use of proteomics in coronavirus research during the ongoing pandemic. A coronavirus-focused METATRYP database identified potential SARS-CoV-2 peptide biomarkers and indicated very few shared tryptic peptides between SARS-CoV-2 and other disparate taxa, sharing 0.1% of peptides or less (1 peptide) with the Influenza A & B pan-proteomes, establishing that taxonomic specificity is achievable using tryptic peptide-based proteomic diagnostic approaches.

Statement of significance: When assigning taxonomic attribution in bottom-up metaproteomics, the potential for shared tryptic peptides among organisms in mixed communities should be considered. The software program METATRYP v 2 and associated interactive web portals enable users to identify the frequency of shared tryptic peptides among taxonomic groups and evaluate the occurrence of specific tryptic peptides within complex communities. METATRYP facilitates phyloproteomic studies of taxonomic groups and supports the identification and evaluation of potential metaproteomic biomarkers.
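
The Least Common Ancestor of a shared peptide can be illustrated with a short sketch. It is not METATRYP's code or data model: it assumes each organism containing the peptide is given as a root-to-leaf list of taxon names, and it returns the deepest taxon shared by all of them. The function name `least_common_ancestor` and the toy lineages are hypothetical.

```python
def least_common_ancestor(lineages):
    """Return the deepest taxon shared by all lineages, or None.

    lineages: list of root-to-leaf lists of taxon names, one list per
    organism in which the peptide occurs (an assumed input format).
    """
    if not lineages:
        return None
    lca = None
    # zip() walks the lineages rank by rank and stops at the shortest one.
    for taxa_at_rank in zip(*lineages):
        if len(set(taxa_at_rank)) != 1:
            break
        lca = taxa_at_rank[0]
    return lca


# Toy lineages (hypothetical, truncated for illustration).
shared_peptide_hits = [
    ["Bacteria", "Cyanobacteria", "Prochlorococcus"],
    ["Bacteria", "Cyanobacteria", "Synechococcus"],
]
print(least_common_ancestor(shared_peptide_hits))  # Cyanobacteria
```

A peptide found in only one organism keeps that organism's full lineage, so its LCA is the leaf itself; the more broadly a peptide is shared, the shallower and less specific its LCA becomes.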
