Integrating identification and quantification uncertainty for differential protein abundance analysis with Triqler

10.1101/2020.09.24.311605 ◽

2020 ◽

Author(s):

Matthew The ◽

Lukas Käll

Keyword(s):

Missing Values ◽

Shotgun Proteomics ◽

Protein Quantification ◽

Protein Abundance ◽

Posterior Distributions ◽

Label Free ◽

Differential Abundance ◽

Complicated Process ◽

Python Package ◽

Different Parts

Abstract: Protein quantification for shotgun proteomics is a complicated process where errors can be introduced in each of the steps. Triqler is a Python package that estimates and integrates errors of the different parts of the label-free protein quantification pipeline into a single Bayesian model. Specifically, it weighs the quantitative values by the confidence we have in the correctness of the corresponding PSM. Furthermore, it treats missing values in a way that reflects their uncertainty relative to observed values. Finally, it combines these error estimates in a single differential abundance FDR that reflects the errors and uncertainties not only in quantification but also in identification. In this tutorial, we show how to (1) generate input data for Triqler from quantification packages such as MaxQuant and Quandenser, (2) run Triqler and choose among its options, (3) interpret the results, (4) investigate the posterior distributions of a protein of interest in detail and (5) verify that the hyperparameter estimations are sensible.
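The abstract above refers to a single differential abundance FDR derived from posterior probabilities. As a rough illustration of the general idea (not Triqler's actual code), the sketch below assumes each protein already has a posterior error probability (PEP), that is, the posterior probability that it is not differentially abundant. The FDR of an accepted set is then estimated as the mean PEP of that set. The function name `count_accepted_at_fdr` and the input values are hypothetical.

```python
from typing import List


def count_accepted_at_fdr(peps: List[float], fdr_threshold: float = 0.05) -> int:
    """Count how many proteins can be reported at the given FDR threshold.

    Assumption: each value in `peps` is a posterior error probability.
    The FDR of the top-k proteins is estimated as the mean of their PEPs.
    """
    ranked = sorted(peps)  # most confident proteins first
    accepted = 0
    running_sum = 0.0
    for k, pep in enumerate(ranked, start=1):
        running_sum += pep
        if running_sum / k <= fdr_threshold:
            accepted = k
        else:
            # PEPs are sorted ascending, so the running mean never decreases;
            # once it exceeds the threshold it stays above it.
            break
    return accepted


if __name__ == "__main__":
    example_peps = [0.01, 0.02, 0.5, 0.03]  # hypothetical values
    print(count_accepted_at_fdr(example_peps))
```

Averaging PEPs over the accepted set is a standard way to turn per-item posterior probabilities into a set-level FDR estimate; how Triqler computes the underlying posteriors is described in the tutorial itself.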

Comparison of Different Label-Free Techniques for the Semi-Absolute Quantification of Protein Abundance

Proteomes ◽

10.3390/proteomes10010002 ◽

2022 ◽

Vol 10 (1) ◽

pp. 2

Author(s):

Aarón Millán-Oropeza ◽

Mélisande Blein-Nicolas ◽

Véronique Monnet ◽

Michel Zivy ◽

Céline Henry

Keyword(s):

Shotgun Proteomics ◽

Absolute Quantification ◽

Biological Data ◽

Protein Quantification ◽

Protein Abundance ◽

Label Free ◽

Absolute Abundance ◽

Yeast Saccharomyces Cerevisiae ◽

Genome Scale ◽

Free Quantification

In proteomics, it is essential to quantify proteins in absolute terms if we wish to compare results among studies and integrate high-throughput biological data into genome-scale metabolic models. While labeling target peptides with stable isotopes allows protein abundance to be accurately quantified, the utility of this technique is constrained by the low number of quantifiable proteins that it yields. Recently, label-free shotgun proteomics has become the “gold standard” for carrying out global assessments of biological samples containing thousands of proteins. However, this tool must be further improved if we wish to accurately quantify absolute levels of proteins. Here, we used different label-free quantification techniques to estimate absolute protein abundance in the model yeast Saccharomyces cerevisiae. More specifically, we evaluated the performance of seven different quantification methods, based either on spectral counting (SC) or extracted-ion chromatogram (XIC), which were applied to samples from five different proteome backgrounds. We also compared the accuracy and reproducibility of two strategies for transforming relative abundance into absolute abundance: a UPS2-based strategy and the total protein approach (TPA). This study mentions technical challenges related to UPS2 use and proposes ways of addressing them, including utilizing a smaller, more highly optimized amount of UPS2. Overall, three SC-based methods (PAI, SAF, and NSAF) yielded the best results because they struck a good balance between experimental performance and protein quantification.
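Two of the spectral-counting measures named above can be illustrated with a short sketch. It uses the commonly cited definitions: the spectral abundance factor (SAF) is a protein's spectral count divided by its length, and the normalized SAF (NSAF) divides each SAF by the sum of SAFs in the sample. The function name `nsaf`, the input layout, and the example numbers are hypothetical and are not taken from the study.

```python
from typing import Dict, Tuple


def nsaf(proteins: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute NSAF per protein.

    `proteins` maps a protein name to (spectral_count, length_in_residues).
    SAF  = spectral_count / length
    NSAF = SAF / sum of SAF over all proteins in the sample
    """
    saf = {}
    for name, (count, length) in proteins.items():
        if length <= 0:
            raise ValueError(f"protein {name!r} has non-positive length")
        saf[name] = count / length

    total = sum(saf.values())
    if total == 0:
        raise ValueError("all spectral counts are zero; NSAF is undefined")

    return {name: value / total for name, value in saf.items()}


if __name__ == "__main__":
    # Hypothetical sample: two proteins of equal length, different counts.
    sample = {"A": (10, 100), "B": (30, 100)}
    for name, value in nsaf(sample).items():
        print(name, round(value, 3))
```

Because NSAF values sum to one within a sample, they express relative abundance only; converting them to absolute amounts requires an additional step such as the UPS2-based or TPA strategies compared in the study.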

Comparison of Different Label-free Techniques for the Semi-absolute Quantification of Protein Abundance

10.20944/preprints202112.0212.v1 ◽

2021 ◽

Author(s):

Aarón Millán-Oropeza ◽

Mélisande Blein-Nicolas ◽

Véronique Monnet ◽

Michel Zivy ◽

Céline Henry

Keyword(s):

Shotgun Proteomics ◽

Absolute Quantification ◽

Biological Data ◽

Protein Quantification ◽

Protein Abundance ◽

Label Free ◽

Absolute Abundance ◽

Yeast Saccharomyces Cerevisiae ◽

Genome Scale ◽

Free Quantification

In proteomics, it is essential to quantify proteins in absolute terms if we wish to compare results among studies and integrate high-throughput biological data into genome-scale metabolic models. While labeling target peptides with stable isotopes allows protein abundance to be accurately quantified, the utility of this technique is constrained by the low number of quantifiable proteins that it yields. Recently, label-free shotgun proteomics has become the “gold standard” for carrying out global assessments of biological samples containing thousands of proteins. However, this tool must be further improved if we wish to accurately quantify absolute levels of proteins. Here, we used different label-free quantification techniques to estimate absolute protein abundance in the model yeast Saccharomyces cerevisiae. More specifically, we evaluated the performance of seven different quantification methods, based either on spectral counting (SC) or extracted-ion chromatogram (XIC), which were applied to samples from five different proteome backgrounds. We also compared the accuracy and reproducibility of two strategies for transforming relative abundance into absolute abundance: a UPS2-based strategy and the total protein approach (TPA). This study mentions technical challenges related to UPS2 use and proposes ways of addressing them, including utilizing a smaller, more highly optimized amount of UPS2. Overall, three SC-based methods (PAI, SAF, and NSAF) yielded the best results because they struck a good balance between experimental performance and protein quantification.

Improved LC-MS chromatographic alignment increases the accuracy of label-free quantitative proteomics: Comparison of spectral counting versus ion intensity-based proteomic quantification strategies

10.1101/111476 ◽

2017 ◽

Cited By ~ 1

Author(s):

Daniel H.J. Ng ◽

Jonathan D. Humphries ◽

Julian N. Selley ◽

Stacey Warwood ◽

David Knight ◽

...

Keyword(s):

Cell Biology ◽

Low Cost ◽

Quantitative Description ◽

Protein Quantification ◽

Global Changes ◽

Label Free ◽

Spectral Counting ◽

Differential Abundance ◽

Mass Spectrometers ◽

Wide Range

Abstract: The ability to provide an unbiased qualitative and quantitative description of the global changes to proteins in a cell or an organism would permit the systems-wide study of complex biological systems. Label-free quantitative shotgun proteomic strategies (including LC-MS ion intensity quantification and spectral counting) are attractive because of their relatively low cost, ease of implementation, and the lack of multiplexing restrictions when comparing multiple samples. Owing to improvements in the resolution and sensitivity of mass spectrometers, and the availability of analytical software packages, protein quantification by LC-MS ion intensity has increased in popularity. Here, we have addressed the importance of chromatographic alignment on protein quantification, and then assessed how spectral counting compares to ion intensity-based proteomic quantification. Using a spiked-in protein strategy, we analysed two situations that commonly arise in the application of proteomics to cell biology: (i) samples with a small number of proteins of differential abundance in a larger non-changing background, and (ii) samples with a larger number of proteins of differential abundance. To perform these assessments on biologically relevant samples, we used isolated integrin adhesion complexes (IACs). Technical replicate analysis of isolated IACs resulted in a range of alignment scores using the Progenesis QI software package and demonstrated that higher LC-MS chromatographic alignment scores increased the precision of protein quantification. Furthermore, implementation of a simple sample batch-running strategy enabled good chromatographic alignment for hundreds of samples over multiple batches. Finally, we applied the sample batch-running strategy and compared quantification by LC-MS ion intensity to spectral counting and found that quantification by LC-MS ion intensity was more accurate and precise.
In summary, these results demonstrate that chromatographic alignment is important for precise and accurate protein quantification based on LC-MS ion intensity and accordingly we present a simple sample re-ordering strategy to facilitate improved alignment. These findings are not only relevant to label-free quantification using Progenesis QI but may be useful to the wide range of MS-based quantification strategies that rely on chromatographic alignment.

MaxQuant software for ion mobility enhanced shotgun proteomics

10.1101/651760 ◽

2019 ◽

Cited By ~ 6

Author(s):

Nikita Prianichnikov ◽

Heiner Koch ◽

Scarlet Koch ◽

Markus Lubeck ◽

Raphael Heilig ◽

...

Keyword(s):

Ion Mobility ◽

Retention Time ◽

Signal Intensity ◽

Feature Detection ◽

Dynamic Range ◽

Shotgun Proteomics ◽

Detection Algorithm ◽

Protein Quantification ◽

Label Free ◽

Proteomics Data

Summary: Ion mobility can add a dimension to LC-MS based shotgun proteomics which has the potential to boost proteome coverage, quantification accuracy and dynamic range. Required for this is suitable software that extracts the information contained in the four-dimensional (4D) data space spanned by m/z, retention time, ion mobility and signal intensity. Here we describe the ion mobility enhanced MaxQuant software, which utilizes the added data dimension. It offers an end to end computational workflow for the identification and quantification of peptides, proteins and posttranslational modification sites in LC-IMS-MS/MS shotgun proteomics data. We apply it to trapped ion mobility spectrometry (TIMS) coupled to a quadrupole time-of-flight (QTOF) analyzer. A highly parallelizable 4D feature detection algorithm extracts peaks which are assembled to isotope patterns. Masses are recalibrated with a non-linear m/z, retention time, ion mobility and signal intensity dependent model, based on peptides from the sample. A new matching between runs (MBR) algorithm that utilizes collisional cross section (CCS) values of MS1 features in the matching process significantly gains specificity from the extra dimension. Prerequisite for using CCS values in MBR is a relative alignment of the ion mobility values between the runs. The missing value problem in protein quantification over many samples is greatly reduced by CCS aware MBR. MS1 level label-free quantification is also implemented which proves to be highly precise and accurate on a benchmark dataset with known ground truth. MaxQuant for LC-IMS-MS/MS is part of the basic MaxQuant release and can be downloaded from http://maxquant.org.

IceR improves proteome coverage and data completeness in global and single-cell proteomics

Nature Communications ◽

10.1038/s41467-021-25077-6 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Mathias Kalxdorf ◽

Torsten Müller ◽

Oliver Stegle ◽

Jeroen Krijgsveld

Keyword(s):

Single Cell ◽

Large Scale ◽

Missing Values ◽

Peptide Identification ◽

Protein Quantification ◽

Developmental Trajectory ◽

Ion Current ◽

Label Free ◽

Proteomics Data ◽

Data Completeness

Abstract: Label-free proteomics by data-dependent acquisition enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, thus prohibiting consistent protein quantification across large sample cohorts. To solve this, we here present IceR (Ion current extraction Re-quantification), an efficient and user-friendly quantification workflow that combines high identification rates of data-dependent acquisition with low missing value rates similar to data-independent acquisition. Specifically, IceR uses ion current information for a hybrid peptide identification propagation approach with superior quantification precision, accuracy, reliability and data completeness compared to other quantitative workflows. Applied to plasma and single-cell proteomics data, IceR enhanced the number of reliably quantified proteins, improved discriminability between single-cell populations, and allowed reconstruction of a developmental trajectory. IceR will be useful to improve performance of large scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R-package.

Robust determination of differential abundance in shotgun proteomics using nonparametric statistics

Molecular Omics ◽

10.1039/c8mo00077h ◽

2018 ◽

Vol 14 (6) ◽

pp. 424-436

Author(s):

Patrick Slama ◽

Michael R. Hoopmann ◽

Robert L. Moritz ◽

Donald Geman

Keyword(s):

Shotgun Proteomics ◽

Nonparametric Statistics ◽

Protein Abundance ◽

Differential Abundance ◽

Parametric Algorithm ◽

Non Parametric

A peptide-centric, non-parametric algorithm to quantify protein abundance between conditions from shotgun proteomics.

MSstatsTMT: Statistical Detection of Differentially Abundant Proteins in Experiments with Isobaric Labeling and Multiple Mixtures

Molecular & Cellular Proteomics ◽

10.1074/mcp.ra120.002105 ◽

2020 ◽

Vol 19 (10) ◽

pp. 1706-1723 ◽

Cited By ~ 1

Author(s):

Ting Huang ◽

Meena Choi ◽

Manuel Tzouros ◽

Sabrina Golling ◽

Nikhil Janak Pandya ◽

...

Keyword(s):

Large Scale ◽

Missing Values ◽

High Efficiency ◽

Protein Quantification ◽

Label Free ◽

Unbalanced Designs ◽

Biological Mixtures ◽

Proteome Discoverer ◽

General Statistical ◽

Differentially Abundant Proteins

Tandem mass tag (TMT) is a multiplexing technology widely used in proteomic research. It enables relative quantification of proteins from multiple biological samples in a single MS run with high efficiency and high throughput. However, experiments often require more biological replicates or conditions than can be accommodated by a single run, and involve multiple TMT mixtures and multiple runs. Such larger-scale experiments combine sources of biological and technical variation in patterns that are complex, unique to TMT-based workflows, and challenging for the downstream statistical analysis. These patterns cannot be adequately characterized by statistical methods designed for other technologies, such as label-free proteomics or transcriptomics. This manuscript proposes a general statistical approach for relative protein quantification in MS-based experiments with TMT labeling. It is applicable to experiments with multiple conditions, multiple biological replicate runs and multiple technical replicate runs, and unbalanced designs. It is based on a flexible family of linear mixed-effects models that handle complex patterns of technical artifacts and missing values. The approach is implemented in MSstatsTMT, a freely available open-source R/Bioconductor package compatible with data processing tools such as Proteome Discoverer, MaxQuant, OpenMS, and SpectroMine. Evaluation on a controlled mixture, simulated datasets, and three biological investigations with diverse designs demonstrated that MSstatsTMT balanced the sensitivity and the specificity of detecting differentially abundant proteins in large-scale experiments with multiple biological mixtures.

MaxQuant Software for Ion Mobility Enhanced Shotgun Proteomics

Molecular & Cellular Proteomics ◽

10.1074/mcp.tir119.001720 ◽

2020 ◽

Vol 19 (6) ◽

pp. 1058-1069 ◽

Cited By ~ 11

Author(s):

Nikita Prianichnikov ◽

Heiner Koch ◽

Scarlet Koch ◽

Markus Lubeck ◽

Raphael Heilig ◽

...

Keyword(s):

Ion Mobility ◽

Retention Time ◽

Signal Intensity ◽

Feature Detection ◽

Dynamic Range ◽

Shotgun Proteomics ◽

Detection Algorithm ◽

Protein Quantification ◽

Label Free ◽

Proteomics Data

Ion mobility can add a dimension to LC-MS based shotgun proteomics which has the potential to boost proteome coverage, quantification accuracy and dynamic range. Required for this is suitable software that extracts the information contained in the four-dimensional (4D) data space spanned by m/z, retention time, ion mobility and signal intensity. Here we describe the ion mobility enhanced MaxQuant software, which utilizes the added data dimension. It offers an end to end computational workflow for the identification and quantification of peptides and proteins in LC-IMS-MS/MS shotgun proteomics data. We apply it to trapped ion mobility spectrometry (TIMS) coupled to a quadrupole time-of-flight (QTOF) analyzer. A highly parallelizable 4D feature detection algorithm extracts peaks which are assembled to isotope patterns. Masses are recalibrated with a non-linear m/z, retention time, ion mobility and signal intensity dependent model, based on peptides from the sample. A new matching between runs (MBR) algorithm that utilizes collisional cross section (CCS) values of MS1 features in the matching process significantly gains specificity from the extra dimension. Prerequisite for using CCS values in MBR is a relative alignment of the ion mobility values between the runs. The missing value problem in protein quantification over many samples is greatly reduced by CCS aware MBR. MS1 level label-free quantification is also implemented which proves to be highly precise and accurate on a benchmark dataset with known ground truth. MaxQuant for LC-IMS-MS/MS is part of the basic MaxQuant release and can be downloaded from http://maxquant.org.

Focus on the spectra that matter by clustering of quantification data in shotgun proteomics

10.1101/488015 ◽

2018 ◽

Cited By ~ 2

Author(s):

Matthew The ◽

Lukas Käll

Keyword(s):

Protein Level ◽

Missing Values ◽

De Novo ◽

Differential Expression Analysis ◽

Search Time ◽

Shotgun Proteomics ◽

Error Rates ◽

False Positives ◽

Protein Quantification ◽

Differentially Abundant Proteins

Abstract: In shotgun proteomics, the information extractable from label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes a low sensitivity in differential expression analysis on protein level. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow. This prevents valuable information from being discarded prematurely in the identification stage and allows us to spend more effort on the identification process. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both MS1 and MS2 level to summarize all analytes of interest without assigning identities. Not only does this eliminate the need for redoing the quantification for each new set of search parameters and engines, but it also reduces search time due to the data reduction by MS2 clustering. For a dataset of partially known composition, we could now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Moreover, Quandenser reports error rates for feature matching, which we integrated into our probabilistic protein quantification method, Triqler. This propagates error probabilities from feature to protein level and appropriately deals with the noise in quantitative signals caused by false positives and missing values. Quandenser+Triqler outperformed the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins at 5% FDR: 123 vs. 117 true positives with 2 vs. 25 false positives in a dataset of partially known composition; 62 vs. 3 proteins in a bladder cancer set; 8 vs. 0 proteins in a hepatic fibrosis set; and 872 vs. 661 proteins in a nanoscale type 1 diabetes set.
Compellingly, in all three clinical datasets investigated, the differentially abundant proteins showed enrichment for functional annotation terms. The source code and binary packages for all major operating systems are available from https://github.com/statisticalbiotechnology/quandenser, under Apache 2.0 license.

IceR improves proteome coverage and data completeness in global and single-cell proteomics

10.1101/2020.11.01.363101 ◽

2020 ◽

Author(s):

Mathias Kalxdorf ◽

Torsten Müller ◽

Oliver Stegle ◽

Jeroen Krijgsveld

Keyword(s):

Single Cell ◽

Large Scale ◽

Missing Values ◽

Peptide Identification ◽

R Package ◽

Protein Quantification ◽

Developmental Trajectory ◽

Label Free ◽

Proteomics Data ◽

Data Completeness

Abstract: Label-free proteomics by data-dependent acquisition (DDA) enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, thus prohibiting consistent protein quantification across large sample cohorts. To solve this, we here present IceR, an efficient and user-friendly quantification workflow that combines high identification rates of DDA with low missing value rates similar to data-independent acquisition (DIA). Specifically, IceR uses ion current information in DDA data for a hybrid peptide identification propagation (PIP) approach with superior quantification precision, accuracy, reliability and data completeness compared to other quantitative workflows. We demonstrate greatly improved quantification sensitivity on published plasma and single-cell proteomics data, enhancing the number of reliably quantified proteins, improving discriminability between single-cell populations, and allowing reconstruction of a developmental trajectory. IceR will be useful to improve performance of large scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R-package.
