MSstatsTMT: Statistical Detection of Differentially Abundant Proteins in Experiments with Isobaric Labeling and Multiple Mixtures

2020 ◽  
Vol 19 (10) ◽  
pp. 1706-1723 ◽  
Author(s):  
Ting Huang ◽  
Meena Choi ◽  
Manuel Tzouros ◽  
Sabrina Golling ◽  
Nikhil Janak Pandya ◽  
...  

Tandem mass tag (TMT) is a multiplexing technology widely used in proteomic research. It enables relative quantification of proteins from multiple biological samples in a single MS run with high efficiency and high throughput. However, experiments often require more biological replicates or conditions than can be accommodated by a single run, and therefore involve multiple TMT mixtures and multiple runs. Such larger-scale experiments combine sources of biological and technical variation in patterns that are complex, unique to TMT-based workflows, and challenging for the downstream statistical analysis. These patterns cannot be adequately characterized by statistical methods designed for other technologies, such as label-free proteomics or transcriptomics. This manuscript proposes a general statistical approach for relative protein quantification in MS-based experiments with TMT labeling. It is applicable to experiments with multiple conditions, multiple biological replicate runs and multiple technical replicate runs, and unbalanced designs. It is based on a flexible family of linear mixed-effects models that handle complex patterns of technical artifacts and missing values. The approach is implemented in MSstatsTMT, a freely available open-source R/Bioconductor package compatible with data processing tools such as Proteome Discoverer, MaxQuant, OpenMS, and SpectroMine. Evaluation on a controlled mixture, simulated datasets, and three biological investigations with diverse designs demonstrated that MSstatsTMT balanced the sensitivity and the specificity of detecting differentially abundant proteins in large-scale experiments with multiple biological mixtures.
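
As a rough illustration of the modeling idea only (not MSstatsTMT's actual implementation, which runs in R and also accounts for channels, technical replicates, unbalanced designs and missing values), the sketch below fits a per-protein linear mixed-effects model in Python with condition as a fixed effect and the TMT mixture as a random intercept. All data and column names are invented for the example.

```python
# Conceptual sketch only: a minimal per-protein linear mixed-effects model with
# condition as a fixed effect and the TMT mixture as a random intercept.
# All data are simulated; MSstatsTMT's real model family is richer than this.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

rows = []
for mixture in ["mix1", "mix2", "mix3"]:
    mix_shift = rng.normal(0.0, 0.3)                 # mixture-level technical shift
    for condition, effect in [("control", 0.0), ("treated", 0.8)]:
        for _subject in range(3):
            rows.append({
                "mixture": mixture,
                "condition": condition,
                "log2_abundance": 20.0 + effect + mix_shift + rng.normal(0.0, 0.2),
            })
df = pd.DataFrame(rows)

# Fixed effect: condition; random intercept: mixture.
model = smf.mixedlm("log2_abundance ~ condition", df, groups=df["mixture"])
fit = model.fit()
# The 'condition[T.treated]' coefficient is the estimated log2 fold change.
print(fit.params)
print(fit.pvalues)
```

The random intercept absorbs the mixture-level technical shift, so the condition effect is estimated against within-mixture variation rather than being confounded with it, which is the core reason a mixed model is used here.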

2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Mathias Kalxdorf ◽  
Torsten Müller ◽  
Oliver Stegle ◽  
Jeroen Krijgsveld

Label-free proteomics by data-dependent acquisition enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, which prohibits consistent protein quantification across large sample cohorts. To solve this, we here present IceR (Ion current extraction Re-quantification), an efficient and user-friendly quantification workflow that combines the high identification rates of data-dependent acquisition with low missing value rates similar to data-independent acquisition. Specifically, IceR uses ion current information for a hybrid peptide identification propagation approach with superior quantification precision, accuracy, reliability and data completeness compared to other quantitative workflows. Applied to plasma and single-cell proteomics data, IceR enhanced the number of reliably quantified proteins, improved discriminability between single-cell populations, and allowed reconstruction of a developmental trajectory. IceR will be useful for improving the performance of large-scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R package.
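
The sketch below illustrates, in very simplified form, the kind of ion-current extraction that identification propagation relies on: coordinates from a run where a peptide was identified are used to integrate signal within an m/z and retention-time window in a run where it was not. It is a toy example with synthetic peaks and hypothetical tolerances, not IceR's algorithm, which among other things aligns runs and assesses the reliability of the re-quantified signals.

```python
# Simplified illustration of identification propagation via ion current
# (not IceR's implementation): re-quantify a peptide in a run where it was not
# identified by integrating MS1 intensity inside an m/z / retention-time window.
import numpy as np

def extract_ion_current(ms1_peaks, target_mz, target_rt,
                        mz_tol_ppm=10.0, rt_tol=0.5):
    """Sum MS1 intensity near (target_mz, target_rt).

    ms1_peaks: array of shape (n, 3) with columns (m/z, retention time, intensity).
    """
    mz, rt, intensity = ms1_peaks[:, 0], ms1_peaks[:, 1], ms1_peaks[:, 2]
    in_mz = np.abs(mz - target_mz) / target_mz * 1e6 <= mz_tol_ppm
    in_rt = np.abs(rt - target_rt) <= rt_tol
    return float(intensity[in_mz & in_rt].sum())

# Toy MS1 map for a run in which this peptide was never identified.
rng = np.random.default_rng(1)
peaks = np.column_stack([
    rng.uniform(400, 1200, 5000),    # m/z
    rng.uniform(0, 120, 5000),       # retention time (min)
    rng.lognormal(10, 1, 5000),      # intensity
])

# Coordinates taken from a run where the peptide WAS identified
# (after retention-time alignment between the runs).
print(extract_ion_current(peaks, target_mz=652.331, target_rt=43.2))
```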


2020 ◽  
Author(s):  
Mathias Kalxdorf ◽  
Torsten Müller ◽  
Oliver Stegle ◽  
Jeroen Krijgsveld

Label-free proteomics by data-dependent acquisition (DDA) enables the unbiased quantification of thousands of proteins; however, it notoriously suffers from high rates of missing values, which prohibits consistent protein quantification across large sample cohorts. To solve this, we here present IceR, an efficient and user-friendly quantification workflow that combines the high identification rates of DDA with low missing value rates similar to data-independent acquisition (DIA). Specifically, IceR uses ion current information in DDA data for a hybrid peptide identification propagation (PIP) approach with superior quantification precision, accuracy, reliability and data completeness compared to other quantitative workflows. We demonstrate greatly improved quantification sensitivity on published plasma and single-cell proteomics data, enhancing the number of reliably quantified proteins, improving discriminability between single-cell populations, and allowing reconstruction of a developmental trajectory. IceR will be useful for improving the performance of large-scale global as well as low-input proteomics applications, facilitated by its availability as an easy-to-use R package.


2020 ◽  
Author(s):  
Matthew The ◽  
Lukas Käll

Protein quantification for shotgun proteomics is a complicated process in which errors can be introduced at each step. Triqler is a Python package that estimates and integrates the errors of the different parts of the label-free protein quantification pipeline into a single Bayesian model. Specifically, it weighs the quantitative values by the confidence we have in the correctness of the corresponding peptide-spectrum match (PSM). Furthermore, it treats missing values in a way that reflects their uncertainty relative to observed values. Finally, it combines these error estimates into a single differential abundance false discovery rate (FDR) that reflects the errors and uncertainties not only in quantification but also in identification. In this tutorial, we show how to (1) generate input data for Triqler from quantification packages such as MaxQuant and Quandenser, (2) run Triqler and what the different options are, (3) interpret the results, (4) investigate the posterior distributions of a protein of interest in detail and (5) verify that the hyperparameter estimations are sensible.
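
As a conceptual sketch of the confidence weighting described above (not Triqler's actual Bayesian model, which works with full posterior distributions and a fold-change prior rather than point estimates), the toy Python snippet below down-weights peptide quantities by the posterior error probability of their best PSM; all numbers are invented.

```python
# Conceptual sketch only: down-weight peptide quantities by PSM posterior error
# probability (PEP) before summarizing to protein level. Triqler itself
# integrates PEPs, missing-value uncertainty and fold-change priors into
# posteriors and a single differential-abundance FDR.
import numpy as np

# Hypothetical peptide-level data for one protein in one sample.
log2_intensity = np.array([22.1, 21.7, 18.9, 23.0])
pep            = np.array([0.01, 0.05, 0.60, 0.02])   # PSM posterior error prob.

weights = 1.0 - pep                       # confidence in each identification
protein_estimate = np.average(log2_intensity, weights=weights)
print(f"confidence-weighted protein estimate: {protein_estimate:.2f}")
```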


2018 ◽  
Author(s):  
Cheng Chang ◽  
Zhiqiang Gao ◽  
Wantao Ying ◽  
Yan Zhao ◽  
Yan Fu ◽  
...  

Mass spectrometry (MS) has become a prominent choice for large-scale absolute protein quantification, but its quantification accuracy still has substantial room for improvement. A crucial issue is the bias between the peptide MS intensity and the actual peptide abundance, i.e., the fact that peptides with equal abundance may have different MS intensities. This bias is mainly caused by the diverse physicochemical properties of peptides. Here, we propose a novel algorithm for label-free absolute protein quantification, LFAQ, which can correct the biased MS intensities by using the predicted peptide quantitative factors for all identified peptides. When validated on datasets produced by different MS instruments and data acquisition modes, LFAQ presented accuracy and precision superior to those of existing methods. In particular, it reduced the quantification error by an average of 46% for low-abundance proteins.
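
The toy sketch below shows the correction idea in its simplest form: each observed peptide intensity is divided by a predicted peptide quantitative factor before the peptide values are combined into a protein estimate. The factors and intensities here are invented; LFAQ predicts its factors from peptide physicochemical properties, and its actual estimation procedure is more involved than this.

```python
# Conceptual sketch only (not LFAQ's algorithm): peptides of equal abundance can
# yield different MS intensities, so each observed intensity is divided by a
# predicted peptide "quantitative factor" before rolling peptides up to a
# protein-level estimate. The factors below are made up for illustration.
import numpy as np

intensities = np.array([4.0e7, 1.2e8, 6.5e6])   # observed peptide MS intensities
q_factors   = np.array([0.8,   2.4,   0.13])    # hypothetical predicted factors

corrected = intensities / q_factors             # bias-corrected peptide signals
protein_abundance = corrected.mean()            # simple peptide-to-protein roll-up
print(f"estimated protein abundance: {protein_abundance:.3g}")
```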


2018 ◽  
Author(s):  
Matthew The ◽  
Lukas Käll

In shotgun proteomics, the information extractable from label-free quantification experiments is typically limited by the identification rate and the noise level in the quantitative data. This generally causes low sensitivity in differential expression analysis at the protein level. Here, we propose a quantification-first approach for peptides that reverses the classical identification-first workflow. This prevents valuable information from being discarded prematurely in the identification stage and allows us to spend more effort on the identification process. Specifically, we introduce a method, Quandenser, that applies unsupervised clustering on both the MS1 and MS2 level to summarize all analytes of interest without assigning identities. Not only does this eliminate the need for redoing the quantification for each new set of search parameters and engines, but it also reduces search time due to the data reduction by MS2 clustering. For a dataset of partially known composition, we could now employ open-modification and de novo searches to identify analytes of interest that would have gone unnoticed in traditional pipelines. Moreover, Quandenser reports error rates for feature matching, which we integrated into our probabilistic protein quantification method, Triqler. This propagates error probabilities from the feature to the protein level and appropriately deals with the noise in quantitative signals caused by false positives and missing values. Quandenser+Triqler outperformed the state-of-the-art method MaxQuant+Perseus, consistently reporting more differentially abundant proteins at 5% FDR: 123 vs. 117 true positives with 2 vs. 25 false positives in a dataset of partially known composition; 62 vs. 3 proteins in a bladder cancer set; 8 vs. 0 proteins in a hepatic fibrosis set; and 872 vs. 661 proteins in a nanoscale type 1 diabetes set. Compellingly, in all three clinical datasets investigated, the differentially abundant proteins showed enrichment for functional annotation terms. The source code and binary packages for all major operating systems are available from https://github.com/statisticalbiotechnology/quandenser, under the Apache 2.0 license.
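
The snippet below sketches the "quantify first, identify later" idea on a toy scale: MS1 features from several runs are clustered by m/z and retention time so that matching analytes are grouped before any identity is assigned. The tolerances and data are made up, and Quandenser's own clustering (which also operates on MS2 spectra and reports matching error rates) is considerably more sophisticated.

```python
# Simplified sketch of quantification-first feature grouping (not Quandenser's
# algorithm): cluster MS1 features from several runs by m/z and retention time
# so that the same analyte is summarized across runs without identification.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy features from three runs: (m/z, retention time in minutes).
features = np.array([
    [652.331, 43.1], [652.333, 43.4], [652.330, 42.9],   # same analyte, 3 runs
    [701.845, 77.0], [701.847, 77.3],                     # another analyte, 2 runs
    [433.210, 12.5],                                      # seen in only one run
])

# Scale the two dimensions so that ~0.01 Th in m/z and ~0.5 min in RT count as
# comparable distances, then cut a hierarchical clustering at that tolerance.
scaled = np.column_stack([features[:, 0] / 0.01, features[:, 1] / 0.5])
clusters = fcluster(linkage(scaled, method="single"), t=1.0, criterion="distance")
print(clusters)   # features of the same analyte share a cluster label
```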


2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Liang Jin ◽  
Yingtao Bi ◽  
Chenqi Hu ◽  
Jun Qu ◽  
Shichen Shen ◽  
...  

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces the completeness of data. Imputation has been widely utilized to handle MVs, and selection of the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods with a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were incorporated into the complete part of each dataset with different combinations of MV rates and missing-not-at-random (MNAR) rates. The normalized root mean square error (NRMSE) was applied to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. Detection of true positives (TPs) and the false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that the accuracy of imputation is primarily affected by the MNAR rate rather than the MV rate, and downstream analysis can be largely impacted by the selection of imputation methods. A random forest-based imputation method consistently outperformed other popular methods by achieving the lowest NRMSE, a high number of TPs with an average FADR < 5%, and the best detection of relevant pathways and signature genes, highlighting it as the most suitable method for label-free proteomics.
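
A minimal sketch of this kind of evaluation, assuming nothing about the authors' code: simulated missing values with a chosen MNAR fraction are introduced into a complete matrix, a random-forest-based imputer fills them in, and NRMSE is computed on the held-out cells (here normalized by the standard deviation of the true values).

```python
# Minimal benchmarking sketch (not the authors' pipeline): hide a mix of MNAR
# and MCAR values in a complete matrix, impute with a random-forest-based
# imputer, and score the imputed cells with NRMSE.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

rng = np.random.default_rng(0)
true = rng.normal(25, 2, size=(200, 8))            # complete log2 protein matrix

# Simulate ~20% missing values: half MNAR (low-abundance censoring), half MCAR.
masked = true.copy()
n_missing = int(0.2 * true.size)
low = np.argsort(true, axis=None)[: n_missing // 2]            # MNAR: lowest values
random = rng.choice(true.size, n_missing // 2, replace=False)  # MCAR: random cells
masked.flat[low] = np.nan
masked.flat[random] = np.nan

imputer = IterativeImputer(estimator=RandomForestRegressor(n_estimators=50),
                           max_iter=5, random_state=0)
imputed = imputer.fit_transform(masked)

mask = np.isnan(masked)
nrmse = np.sqrt(np.mean((imputed[mask] - true[mask]) ** 2)) / np.std(true[mask])
print(f"NRMSE on imputed cells: {nrmse:.3f}")
```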


2020 ◽  
Author(s):  
Kruttika Dabke ◽  
Simion Kreimer ◽  
Michelle R. Jones ◽  
Sarah J. Parker

Missing values in proteomic data sets have real consequences on downstream data analysis and reproducibility. Although several imputation methods exist to handle missing values, no single imputation method is best suited for a diverse range of data sets, and no clear strategy exists for evaluating imputation methods for large-scale DIA-MS data sets, especially at different levels of protein quantification. To navigate the different imputation strategies available in the literature, we have established a workflow to assess imputation methods on large-scale label-free DIA-MS data sets. We used two distinct DIA-MS data sets with real missing values to evaluate eight different imputation methods with multiple parameters at different levels of protein quantification: a dilution series data set and an independent data set with actual experimental samples. We found that imputation methods based on local structures within the data, like local least squares (LLS) and random forest (RF), worked well in our dilution series data set, whereas imputation methods based on global structures within the data, like Bayesian principal component analysis (BPCA), performed well in our independent data set. We also found that imputation at the most basic level of protein quantification, the fragment level, improved accuracy and the number of proteins quantified. Overall, this study indicates that the most suitable imputation method depends on the overall structure and correlations of proteins within the data set and can be identified with the workflow presented here.
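
As a toy illustration of the fragment-level point (not the authors' workflow), the sketch below imputes missing values in a fragment-by-sample matrix and only afterwards rolls fragments up to a protein value; KNN imputation stands in here for the LLS, RF and BPCA methods evaluated in the study.

```python
# Toy sketch of fragment-level imputation followed by protein roll-up
# (illustration only; the evaluated study uses LLS, RF, BPCA and others).
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(1)
# Toy matrix: 6 fragment ions of one protein (rows) x 5 samples (columns).
fragments = rng.normal(14, 1, size=(6, 5))
fragments[2, 3] = np.nan                      # missing fragment measurements
fragments[5, 0] = np.nan

# Impute at fragment level, treating samples as observations.
imputed = KNNImputer(n_neighbors=3).fit_transform(fragments.T).T

protein_per_sample = imputed.sum(axis=0)      # fragment-to-protein roll-up
print(np.round(protein_per_sample, 2))
```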


2021 ◽  
Vol 15 (11) ◽  
pp. e0009949
Author(s):  
Teng Li ◽  
Hua Liu ◽  
Nan Jiang ◽  
Yiluo Wang ◽  
Ying Wang ◽  
...  

Cryptosporidium is a life-threatening protozoan parasite belonging to the phylum Apicomplexa, which mainly causes gastroenteritis in a variety of vertebrate hosts. Currently, there is a re-emergence of Cryptosporidium infection; however, no fully effective drug or vaccine is available to treat cryptosporidiosis. In the present study, to better understand the detailed interaction between the host and Cryptosporidium parvum, a large-scale label-free proteomics study was conducted to characterize the changes to the proteome induced by C. parvum infection. Of the 4406 proteins identified, 121 were differentially abundant (> 1.5-fold cutoff, P < 0.05) in C. parvum-infected HCT-8 cells compared with uninfected cells. Among them, 67 proteins were upregulated and 54 proteins were downregulated at 36 h post infection. Analysis of the differentially abundant proteins revealed an interferon-centered immune response of the host cells against C. parvum infection and extensive inhibition of metabolism-related enzymes in the host cells caused by infection. Several proteins were further verified using quantitative real-time reverse transcription polymerase chain reaction and western blotting. This systematic analysis of the proteome of C. parvum-infected HCT-8 cells identified a wide range of functional proteins that participate in host anti-parasite immunity or act as potential targets during infection, providing new insights into the molecular mechanism of C. parvum infection.
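
For reference, the quoted cutoff (> 1.5-fold change, P < 0.05) amounts to a simple per-protein filter like the toy example below; it illustrates the threshold only, with invented intensities, and is not the authors' processing pipeline.

```python
# Illustration of a >1.5-fold, P < 0.05 differential-abundance filter on toy
# replicate intensities (not the authors' actual analysis pipeline).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
infected   = rng.normal(2e6, 2e5, size=4)      # replicate intensities, infected
uninfected = rng.normal(1e6, 2e5, size=4)      # replicate intensities, control

fold_change = infected.mean() / uninfected.mean()
p_value = stats.ttest_ind(infected, uninfected).pvalue

if (fold_change > 1.5 or fold_change < 1 / 1.5) and p_value < 0.05:
    print(f"differentially abundant: fold change {fold_change:.2f}, P = {p_value:.3g}")
```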


2018 ◽  
Author(s):  
Matthias May ◽  
Kira Rehfeld

Greenhouse gas emissions must be cut to limit global warming to 1.5-2 °C above preindustrial levels. Yet the rate of decarbonisation is currently too low to achieve this. Policy-relevant scenarios therefore rely on the permanent removal of CO₂ from the atmosphere. However, none of the envisaged technologies has demonstrated scalability to the decarbonisation targets for the year 2050. In this analysis, we show that artificial photosynthesis for CO₂ reduction may deliver an efficient large-scale carbon sink. This technology has mainly been developed towards solar fuels, and its potential for negative emissions has been largely overlooked. With high efficiency and low sensitivity to high temperature and illumination conditions, it could, if developed into a mature technology, present a viable approach to filling the gap in the negative emissions budget.


