Integrating identification and quantification uncertainty for differential protein abundance analysis with Triqler

2020 ◽  
Author(s):  
Matthew The ◽  
Lukas Käll

Protein quantification for shotgun proteomics is a complicated process in which errors can be introduced at each step. Triqler is a Python package that estimates the errors of the different parts of the label-free protein quantification pipeline and integrates them into a single Bayesian model. Specifically, it weighs the quantitative values by the confidence we have in the correctness of the corresponding peptide-spectrum match (PSM). Furthermore, it treats missing values in a way that reflects their uncertainty relative to observed values. Finally, it combines these error estimates into a single differential abundance FDR that reflects the errors and uncertainties not only in quantification but also in identification. In this tutorial, we show how to (1) generate input data for Triqler from quantification packages such as MaxQuant and Quandenser, (2) run Triqler and choose among its options, (3) interpret the results, (4) investigate the posterior distributions of a protein of interest in detail, and (5) verify that the hyperparameter estimates are sensible.
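The abstract describes collapsing the error estimates into a single differential abundance FDR. A common, generic way to go from per-protein posterior error probabilities (PEPs) to an FDR is to take the mean PEP of the accepted set. The sketch below assumes such PEPs are already available; the function name and toy inputs are hypothetical, and it is not Triqler's own code.

```python
from typing import List


def qvalues_from_peps(peps: List[float]) -> List[float]:
    """Return one q-value per protein, in the same order as the input PEPs.

    The estimated FDR of the k most confident proteins is the mean of their
    PEPs. Visiting PEPs in ascending order makes this running mean
    non-decreasing, so it can serve directly as the q-value.
    """
    order = sorted(range(len(peps)), key=lambda i: peps[i])
    qvals = [0.0] * len(peps)
    running_sum = 0.0
    i, n = 0, len(order)
    while i < n:
        j = i
        # Proteins with identical PEPs are accepted or rejected together
        while j < n and peps[order[j]] == peps[order[i]]:
            running_sum += peps[order[j]]
            j += 1
        fdr = running_sum / j
        for k in range(i, j):
            qvals[order[k]] = fdr
        i = j
    return qvals


# Hypothetical PEPs for four proteins (not data from the paper)
print(qvalues_from_peps([0.01, 0.2, 0.03, 0.5]))
```

Reporting proteins with a q-value at or below 0.05 would then correspond to a 5% differential abundance FDR under this estimator.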

Proteomes ◽  
2022 ◽  
Vol 10 (1) ◽  
pp. 2
Author(s):  
Aarón Millán-Oropeza ◽  
Mélisande Blein-Nicolas ◽  
Véronique Monnet ◽  
Michel Zivy ◽  
Céline Henry

In proteomics, it is essential to quantify proteins in absolute terms if we wish to compare results among studies and integrate high-throughput biological data into genome-scale metabolic models. While labeling target peptides with stable isotopes allows protein abundance to be accurately quantified, the utility of this technique is constrained by the low number of quantifiable proteins that it yields. Recently, label-free shotgun proteomics has become the “gold standard” for carrying out global assessments of biological samples containing thousands of proteins. However, this tool must be further improved if we wish to accurately quantify absolute levels of proteins. Here, we used different label-free quantification techniques to estimate absolute protein abundance in the model yeast Saccharomyces cerevisiae. More specifically, we evaluated the performance of seven different quantification methods, based either on spectral counting (SC) or on extracted-ion chromatograms (XIC), which were applied to samples from five different proteome backgrounds. We also compared the accuracy and reproducibility of two strategies for transforming relative abundance into absolute abundance: a UPS2-based strategy and the total protein approach (TPA). This study describes technical challenges related to UPS2 use and proposes ways of addressing them, including using a smaller, more highly optimized amount of UPS2. Overall, three SC-based methods (PAI, SAF, and NSAF) yielded the best results because they struck a good balance between experimental performance and protein quantification.
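The abstract names the spectral-counting measures PAI, SAF, and NSAF and the total protein approach (TPA) without defining them. A minimal sketch of their commonly used definitions follows; the function names and dictionary inputs are illustrative assumptions, and the study's own preprocessing (for example, handling of shared peptides) may differ.

```python
from typing import Dict


def pai(observed_peptides: int, observable_peptides: int) -> float:
    """Protein abundance index: identified peptides / theoretically observable peptides."""
    return observed_peptides / observable_peptides


def saf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Spectral abundance factor: spectral count divided by protein length (residues)."""
    return {p: spectral_counts[p] / lengths[p] for p in spectral_counts}


def nsaf(spectral_counts: Dict[str, int], lengths: Dict[str, int]) -> Dict[str, float]:
    """Normalized SAF: each protein's SAF divided by the sum of SAFs in the sample."""
    factors = saf(spectral_counts, lengths)
    total = sum(factors.values())
    return {p: f / total for p, f in factors.items()}


def tpa(signals: Dict[str, float], mol_weights: Dict[str, float]) -> Dict[str, float]:
    """Total protein approach: mol of each protein per gram of total protein.

    A protein's share of the total MS signal is taken as its mass fraction of
    the proteome; dividing by molecular weight (g/mol) converts it to mol/g.
    """
    total = sum(signals.values())
    return {p: signals[p] / (total * mol_weights[p]) for p in signals}


# Hypothetical toy inputs
counts = {"P1": 10, "P2": 30}
lengths = {"P1": 100, "P2": 300}
print(nsaf(counts, lengths))  # equal SAFs, so each NSAF is 0.5
```

Dividing by protein length corrects for longer proteins producing more spectra; normalizing by the sample total then makes NSAF values comparable across runs.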


2017 ◽  
Author(s):  
Daniel H.J. Ng ◽  
Jonathan D. Humphries ◽  
Julian N. Selley ◽  
Stacey Warwood ◽  
David Knight ◽  
...  

The ability to provide an unbiased qualitative and quantitative description of the global changes to proteins in a cell or an organism would permit the systems-wide study of complex biological systems. Label-free quantitative shotgun proteomic strategies (including LC-MS ion intensity quantification and spectral counting) are attractive because of their relatively low cost, ease of implementation, and the lack of multiplexing restrictions when comparing multiple samples. Owing to improvements in the resolution and sensitivity of mass spectrometers, and the availability of analytical software packages, protein quantification by LC-MS ion intensity has increased in popularity. Here, we have addressed the importance of chromatographic alignment for protein quantification, and then assessed how spectral counting compares to ion intensity-based proteomic quantification. Using a spiked-in protein strategy, we analysed two situations that commonly arise in the application of proteomics to cell biology: (i) samples with a small number of proteins of differential abundance in a larger non-changing background, and (ii) samples with a larger number of proteins of differential abundance. To perform these assessments on biologically relevant samples, we used isolated integrin adhesion complexes (IACs). Technical replicate analysis of isolated IACs with the Progenesis QI software package yielded a range of alignment scores and demonstrated that higher LC-MS chromatographic alignment scores increased the precision of protein quantification. Furthermore, a simple sample batch-running strategy enabled good chromatographic alignment for hundreds of samples over multiple batches. Finally, we applied the sample batch-running strategy, compared quantification by LC-MS ion intensity with spectral counting, and found that quantification by LC-MS ion intensity was more accurate and precise.
In summary, these results demonstrate that chromatographic alignment is important for precise and accurate protein quantification based on LC-MS ion intensity, and accordingly, we present a simple sample re-ordering strategy to facilitate improved alignment. These findings are relevant not only to label-free quantification using Progenesis QI but may also be useful for the wide range of MS-based quantification strategies that rely on chromatographic alignment.
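Precision across technical replicates is often summarized as the per-protein coefficient of variation (CV). The abstract does not say which precision metric was used, so the sketch below is only a generic illustration with hypothetical names and data.

```python
import statistics
from typing import Dict, List


def replicate_cv(abundances: Dict[str, List[float]]) -> Dict[str, float]:
    """Coefficient of variation (sample SD / mean) of each protein across replicates."""
    cvs = {}
    for protein, values in abundances.items():
        if len(values) < 2:
            continue  # sample SD is undefined for fewer than two replicates
        mean = statistics.mean(values)
        if mean == 0:
            continue  # CV is undefined for a zero mean
        cvs[protein] = statistics.stdev(values) / mean
    return cvs


# Hypothetical abundances from three technical replicates
cvs = replicate_cv({"P1": [100.0, 110.0, 95.0], "P2": [40.0, 80.0, 60.0]})
print(statistics.median(cvs.values()))
```

Comparing the median CV between runs with high and low alignment scores would be one way to express the kind of precision difference the abstract reports.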


2019 ◽  
Author(s):  
Nikita Prianichnikov ◽  
Heiner Koch ◽  
Scarlet Koch ◽  
Markus Lubeck ◽  
Raphael Heilig ◽  
...  

Ion mobility can add a dimension to LC-MS-based shotgun proteomics that has the potential to boost proteome coverage, quantification accuracy, and dynamic range. This requires suitable software that extracts the information contained in the four-dimensional (4D) data space spanned by m/z, retention time, ion mobility, and signal intensity. Here we describe the ion mobility-enhanced MaxQuant software, which utilizes the added data dimension. It offers an end-to-end computational workflow for the identification and quantification of peptides, proteins, and posttranslational modification sites in LC-IMS-MS/MS shotgun proteomics data. We apply it to trapped ion mobility spectrometry (TIMS) coupled to a quadrupole time-of-flight (QTOF) analyzer. A highly parallelizable 4D feature detection algorithm extracts peaks, which are assembled into isotope patterns. Masses are recalibrated with a non-linear model that depends on m/z, retention time, ion mobility, and signal intensity, based on peptides from the sample. A new matching between runs (MBR) algorithm that uses the collisional cross section (CCS) values of MS1 features in the matching process gains significant specificity from the extra dimension. A prerequisite for using CCS values in MBR is relative alignment of the ion mobility values between runs. The missing value problem in protein quantification over many samples is greatly reduced by CCS-aware MBR. MS1-level label-free quantification is also implemented and proves highly precise and accurate on a benchmark dataset with known ground truth. MaxQuant for LC-IMS-MS/MS is part of the basic MaxQuant release and can be downloaded from http://maxquant.org.
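To illustrate why a CCS coordinate adds specificity to matching between runs, here is a toy transfer of identifications that requires agreement in m/z, aligned retention time, and aligned CCS. This is not MaxQuant's MBR algorithm; the `Feature` class, the tolerance values, and the distance used to pick the best candidate are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Feature:
    mz: float
    rt: float    # retention time (min), assumed already aligned between runs
    ccs: float   # collisional cross section, assumed already aligned between runs
    peptide: Optional[str] = None


def match_between_runs(identified: List[Feature], unidentified: List[Feature],
                       ppm_tol: float = 10.0, rt_tol: float = 0.5,
                       ccs_tol_rel: float = 0.01) -> List[Feature]:
    """Transfer peptide labels to unidentified features that fall inside all three windows."""
    transferred = []
    for target in unidentified:
        best, best_dist = None, float("inf")
        for ref in identified:
            ppm = abs(target.mz - ref.mz) / ref.mz * 1e6
            drt = abs(target.rt - ref.rt)
            dccs = abs(target.ccs - ref.ccs) / ref.ccs
            if ppm <= ppm_tol and drt <= rt_tol and dccs <= ccs_tol_rel:
                # Sum of tolerance-normalized distances picks the closest candidate
                dist = ppm / ppm_tol + drt / rt_tol + dccs / ccs_tol_rel
                if dist < best_dist:
                    best, best_dist = ref, dist
        if best is not None:
            transferred.append(Feature(target.mz, target.rt, target.ccs, best.peptide))
    return transferred
```

With a loose CCS tolerance, a feature that is closest in m/z and retention time can be assigned the wrong peptide; tightening the CCS window rejects candidates whose ion mobility disagrees, which is the specificity gain the abstract describes.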


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mathias Kalxdorf ◽  
Torsten Müller ◽  
Oliver Stegle ◽  
Jeroen Krijgsveld

Label-free proteomics by data-dependent acquisition enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, which prevents consistent protein quantification across large sample cohorts. To address this, we present IceR (Ion current extraction Re-quantification), an efficient and user-friendly quantification workflow that combines the high identification rates of data-dependent acquisition with missing value rates as low as those of data-independent acquisition. Specifically, IceR uses ion current information for a hybrid peptide identification propagation approach with superior quantification precision, accuracy, reliability, and data completeness compared to other quantitative workflows. Applied to plasma and single-cell proteomics data, IceR increased the number of reliably quantified proteins, improved discriminability between single-cell populations, and allowed reconstruction of a developmental trajectory. IceR will be useful for improving the performance of large-scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R package.
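A basic operation behind ion current-based re-quantification is summing MS1 signal in a window around an expected m/z and retention time. This can be done even in runs where the peptide was never selected for fragmentation. The sketch below shows only that generic step, with a hypothetical peak representation and tolerance; it does not reproduce IceR's R implementation or its statistics.

```python
from typing import List, Tuple

# (retention time in minutes, m/z, intensity) for one centroided MS1 peak
Peak = Tuple[float, float, float]


def extract_ion_current(peaks: List[Peak], target_mz: float, rt_start: float,
                        rt_end: float, ppm_tol: float = 10.0) -> float:
    """Sum intensities of MS1 peaks within an m/z tolerance and a retention time window."""
    mz_tol = target_mz * ppm_tol * 1e-6
    return sum(
        intensity
        for rt, mz, intensity in peaks
        if rt_start <= rt <= rt_end and abs(mz - target_mz) <= mz_tol
    )
```

In a propagation scheme, `target_mz` and the retention time window would come from a run where the peptide was identified, after aligning retention times between runs.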


2018 ◽  
Vol 14 (6) ◽  
pp. 424-436
Author(s):  
Patrick Slama ◽  
Michael R. Hoopmann ◽  
Robert L. Moritz ◽  
Donald Geman

A peptide-centric, non-parametric algorithm to quantify protein abundance between conditions from shotgun proteomics.


2020 ◽  
Vol 19 (10) ◽  
pp. 1706-1723 ◽  
Author(s):  
Ting Huang ◽  
Meena Choi ◽  
Manuel Tzouros ◽  
Sabrina Golling ◽  
Nikhil Janak Pandya ◽  
...  

Tandem mass tag (TMT) is a multiplexing technology widely used in proteomic research. It enables relative quantification of proteins from multiple biological samples in a single MS run with high efficiency and high throughput. However, experiments often require more biological replicates or conditions than can be accommodated by a single run, and therefore involve multiple TMT mixtures and multiple runs. Such larger-scale experiments combine sources of biological and technical variation in patterns that are complex, unique to TMT-based workflows, and challenging for downstream statistical analysis. These patterns cannot be adequately characterized by statistical methods designed for other technologies, such as label-free proteomics or transcriptomics. This manuscript proposes a general statistical approach for relative protein quantification in MS-based experiments with TMT labeling. It is applicable to experiments with multiple conditions, multiple biological replicate runs, multiple technical replicate runs, and unbalanced designs. It is based on a flexible family of linear mixed-effects models that handle complex patterns of technical artifacts and missing values. The approach is implemented in MSstatsTMT, a freely available open-source R/Bioconductor package compatible with data processing tools such as Proteome Discoverer, MaxQuant, OpenMS, and SpectroMine. Evaluation on a controlled mixture, simulated datasets, and three biological investigations with diverse designs demonstrated that MSstatsTMT balanced the sensitivity and the specificity of detecting differentially abundant proteins in large-scale experiments with multiple biological mixtures.


2020 ◽  
Vol 19 (6) ◽  
pp. 1058-1069 ◽  
Author(s):  
Nikita Prianichnikov ◽  
Heiner Koch ◽  
Scarlet Koch ◽  
Markus Lubeck ◽  
Raphael Heilig ◽  
...  

Ion mobility can add a dimension to LC-MS-based shotgun proteomics that has the potential to boost proteome coverage, quantification accuracy, and dynamic range. This requires suitable software that extracts the information contained in the four-dimensional (4D) data space spanned by m/z, retention time, ion mobility, and signal intensity. Here we describe the ion mobility-enhanced MaxQuant software, which utilizes the added data dimension. It offers an end-to-end computational workflow for the identification and quantification of peptides and proteins in LC-IMS-MS/MS shotgun proteomics data. We apply it to trapped ion mobility spectrometry (TIMS) coupled to a quadrupole time-of-flight (QTOF) analyzer. A highly parallelizable 4D feature detection algorithm extracts peaks, which are assembled into isotope patterns. Masses are recalibrated with a non-linear model that depends on m/z, retention time, ion mobility, and signal intensity, based on peptides from the sample. A new matching between runs (MBR) algorithm that uses the collisional cross section (CCS) values of MS1 features in the matching process gains significant specificity from the extra dimension. A prerequisite for using CCS values in MBR is relative alignment of the ion mobility values between runs. The missing value problem in protein quantification over many samples is greatly reduced by CCS-aware MBR. MS1-level label-free quantification is also implemented and proves highly precise and accurate on a benchmark dataset with known ground truth. MaxQuant for LC-IMS-MS/MS is part of the basic MaxQuant release and can be downloaded from http://maxquant.org.


2018 ◽  
Author(s):  
Matthew The ◽  
Lukas Käll

In shotgun proteomics, the information extractable from label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally results in low sensitivity in differential expression analysis at the protein level. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow. This prevents valuable information from being discarded prematurely in the identification stage and allows us to spend more effort on the identification process. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering at both the MS1 and MS2 levels to summarize all analytes of interest without assigning identities. Not only does this eliminate the need to redo the quantification for each new set of search parameters and engines, but it also reduces search time due to the data reduction by MS2 clustering. For a dataset of partially known composition, we could now employ open modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Moreover, Quandenser reports error rates for feature matching, which we integrated into our probabilistic protein quantification method, Triqler. This propagates error probabilities from the feature to the protein level and appropriately deals with the noise in quantitative signals caused by false positives and missing values. Quandenser+Triqler outperformed the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins at 5% FDR: 123 vs. 117 true positives with 2 vs. 25 false positives in a dataset of partially known composition; 62 vs. 3 proteins in a bladder cancer set; 8 vs. 0 proteins in a hepatic fibrosis set; and 872 vs. 661 proteins in a nanoscale type 1 diabetes set.
Compellingly, in all three clinical datasets investigated, the differentially abundant proteins showed enrichment for functional annotation terms. The source code and binary packages for all major operating systems are available from https://github.com/statisticalbiotechnology/quandenser, under the Apache 2.0 license.
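Quandenser's abstract describes clustering MS2 spectra without assigning identities, so that search effort shrinks with the number of clusters rather than the number of spectra. As a rough illustration of that idea only (not Quandenser's actual clustering method), here is greedy clustering of binned spectra by cosine similarity; the binned-dictionary representation, the first-member-as-representative rule, and the 0.8 threshold are assumptions.

```python
import math
from typing import Dict, List

# A binned spectrum: m/z bin index -> intensity (hypothetical representation)
Spectrum = Dict[int, float]


def cosine(a: Spectrum, b: Spectrum) -> float:
    """Cosine similarity between two sparse binned spectra."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def greedy_cluster(spectra: List[Spectrum], threshold: float = 0.8) -> List[int]:
    """Assign each spectrum to the first cluster whose representative is similar enough."""
    representatives: List[Spectrum] = []
    labels: List[int] = []
    for spec in spectra:
        for cid, rep in enumerate(representatives):
            if cosine(spec, rep) >= threshold:
                labels.append(cid)
                break
        else:
            # No sufficiently similar cluster: this spectrum starts a new one
            representatives.append(spec)
            labels.append(len(representatives) - 1)
    return labels
```

Searching only one representative per cluster, rather than every spectrum, is where a data reduction of the kind mentioned in the abstract would come from in a scheme like this.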


2020 ◽  
Author(s):  
Mathias Kalxdorf ◽  
Torsten Müller ◽  
Oliver Stegle ◽  
Jeroen Krijgsveld

Label-free proteomics by data-dependent acquisition (DDA) enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, which prevents consistent protein quantification across large sample cohorts. To address this, we present IceR, an efficient and user-friendly quantification workflow that combines the high identification rates of DDA with missing value rates as low as those of data-independent acquisition (DIA). Specifically, IceR uses ion current information in DDA data for a hybrid peptide identification propagation (PIP) approach with superior quantification precision, accuracy, reliability, and data completeness compared to other quantitative workflows. We demonstrate greatly improved quantification sensitivity on published plasma and single-cell proteomics data, increasing the number of reliably quantified proteins, improving discriminability between single-cell populations, and allowing reconstruction of a developmental trajectory. IceR will be useful for improving the performance of large-scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R package.