A generalizable method for false-discovery rate estimation in mass spectrometry-based lipidomics

2020 ◽  
Author(s):  
Grant M. Fujimoto ◽  
Jennifer E. Kyle ◽  
Joon-Yong Lee ◽  
Thomas O. Metz ◽  
Samuel H. Payne

Abstract: Mass spectrometry (MS)-based lipidomics is revolutionizing lipid research through high-throughput identification and quantification of hundreds to thousands of lipids, with the goal of elucidating lipid metabolism and function. Estimates of statistical confidence in lipid identification are essential for downstream data interpretation in a biological context. In the related field of proteomics, a variety of methods for estimating the false discovery rate are available, and understanding the statistical confidence of identifications is typically required for data analysis and hypothesis testing. However, there is currently no method for estimating the false discovery rate (FDR) or statistical confidence for MS-based lipid identifications. This has slowed the adoption of MS-based lipidomics research, as all identifications require manual inspection and validation to ensure their accuracy. We present here the first generalizable method for FDR estimation, a target/decoy approach, that allows those conducting MS-based lipidomics research to confidently adjust spectral score thresholds to minimize false discovery and to enable full automation of data analysis.
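
The target/decoy idea mentioned in the abstract can be illustrated with a minimal Python sketch. It shows the generic target-decoy estimate, in which the number of decoy matches at or above a score threshold stands in for the number of false target matches. The abstract does not describe how decoys are built for lipids or which score is used, so the function names, the toy scores, and the assumption of equally sized target and decoy search spaces are all hypothetical.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR at a score threshold as (#decoys >= t) / (#targets >= t).

    Assumes higher scores are better and that the decoy search space is the
    same size as the target search space (an assumption of this sketch).
    """
    n_target = sum(1 for s in target_scores if s >= threshold)
    n_decoy = sum(1 for s in decoy_scores if s >= threshold)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing falsely accepted
    return min(1.0, n_decoy / n_target)


def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Return the lowest observed target score whose estimated FDR is
    at most max_fdr, or None if no threshold qualifies."""
    for t in sorted(set(target_scores)):
        if target_decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None


# Toy scores, invented purely for illustration.
targets = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
decoys = [0.55, 0.45, 0.2]
print(lowest_threshold_for_fdr(targets, decoys, max_fdr=0.25))
```

Scanning thresholds from lowest to highest keeps as many identifications as possible while still meeting the requested FDR, which is the kind of automated threshold adjustment the abstract refers to.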

2019 ◽  
Author(s):  
Rebecca Beveridge ◽  
Johannes Stadlmann ◽  
Josef M. Penninger ◽  
Karl Mechtler

We have created synthetic peptide libraries to benchmark crosslinking mass spectrometry search engines for different types of crosslinkers. The unique benefit of using a library is knowing which identified crosslinks are true and which are false. Here we have used mass spectrometry data generated from measurement of the peptide libraries to evaluate the most frequently applied search algorithms in crosslinking mass spectrometry. When filtered to an estimated false discovery rate of 5%, false crosslink identification ranged from 5.2% to 11.3% for search engines with inbuilt validation strategies for error estimation. When different external validation strategies were applied to one single search output, false crosslink identification ranged from 2.4% to a surprising 32%, despite being filtered to an estimated 5% false discovery rate. Remarkably, the use of MS-cleavable crosslinkers did not reduce the false discovery rate compared to non-cleavable crosslinkers, a finding with far-reaching implications for structural biology. We anticipate that the datasets acquired during this research will further drive optimisation and development of search engines and novel data-interpretation technologies, thereby advancing our understanding of vital biological interactions.
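
The benchmarking logic described above, comparing reported crosslinks against a library whose true pairs are known, can be sketched in a few lines of Python. This is only an illustration of the comparison: the peptide identifiers, the representation of a crosslink as an unordered pair, and the function name are hypothetical and are not taken from the study.

```python
def observed_false_rate(identified, true_crosslinks):
    """Fraction of identified crosslinks that are absent from the known library.

    Each crosslink is an unordered pair of peptide identifiers, so (A, B)
    and (B, A) are treated as the same link.
    """
    truth = {frozenset(pair) for pair in true_crosslinks}
    found = {frozenset(pair) for pair in identified}
    if not found:
        return 0.0
    false_hits = sum(1 for pair in found if pair not in truth)
    return false_hits / len(found)


# Hypothetical library and search output, for illustration only.
library = [("pepA", "pepB"), ("pepC", "pepD"), ("pepE", "pepF")]
reported = [("pepB", "pepA"), ("pepC", "pepD"), ("pepA", "pepF"), ("pepE", "pepF")]
print(observed_false_rate(reported, library))
```

An observed rate above the FDR that a search engine claims to have filtered to would indicate that its error estimate is too optimistic for that dataset.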


2019 ◽  
Vol 18 (5) ◽  
pp. 2354-2358 ◽  
Author(s):  
Yulia Danilova ◽  
Anastasia Voronkova ◽  
Pavel Sulimov ◽  
Attila Kertész-Farkas

2017 ◽  
Author(s):  
Lutz Fischer ◽  
Juri Rappsilber

Abstract: False discovery rate (FDR) estimation is a cornerstone of proteomics that has recently been adapted to cross-linking/mass spectrometry. Here we demonstrate that heterobifunctional cross-linkers, while theoretically different from homobifunctional cross-linkers, need not be considered separately in practice. We develop a correct FDR formula for heterobifunctional cross-linkers, evaluate the impact of applying it, and conclude that there are minimal practical advantages. Hence a single formula can be applied to data generated from the many different non-cleavable cross-linkers.
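
For orientation, the estimate commonly used in the cross-linking literature for peptide pairs can be sketched as follows. It counts target-target (TT), target-decoy (TD) and decoy-decoy (DD) matches above a score cutoff and estimates FDR as (TD - DD) / TT. The abstract does not state the heterobifunctional formula, so it is not reproduced here; the function name and the counts in the example are hypothetical.

```python
def crosslink_fdr(n_tt, n_td, n_dd):
    """Estimate cross-link FDR from match counts above a score cutoff.

    n_tt: matches where both peptides are targets
    n_td: matches where exactly one peptide is a decoy
    n_dd: matches where both peptides are decoys

    Uses the commonly cited estimate (TD - DD) / TT, clipped at zero.
    """
    if n_tt <= 0:
        raise ValueError("need at least one target-target match")
    return max(0.0, (n_td - n_dd) / n_tt)


# Hypothetical counts, for illustration only.
print(crosslink_fdr(n_tt=1000, n_td=60, n_dd=10))
```

Subtracting DD corrects for the fact that a pair of wrong peptides can be matched against the decoy database on either side or on both, so TD alone would overcount the false target-target matches.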


2012 ◽  
Vol 9 (9) ◽  
pp. 901-903 ◽  
Author(s):  
Thomas Walzthoeni ◽  
Manfred Claassen ◽  
Alexander Leitner ◽  
Franz Herzog ◽  
Stefan Bohn ◽  
...  

2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i745-i753
Author(s):  
Yisu Peng ◽  
Shantanu Jain ◽  
Yong Fuga Li ◽  
Michal Greguš ◽  
Alexander R. Ivanov ◽  
...  

Abstract

Motivation: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra.

Results: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms.

Availability and implementation: https://github.com/shawn-peng/FDR-estimation.

Supplementary information: Supplementary data are available at Bioinformatics online.
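
The two-component mixture idea behind conventional DFAs can be illustrated with a small, dependency-free Python sketch. It is not the method of the paper above: it fits two Gaussian components instead of the multi-component skew normal mixture the authors describe, uses only the best score per spectrum, and all function names and the synthetic scores in the usage note are assumptions made for illustration.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_sf(x, mu, sigma):
    """Upper-tail probability P(X >= x) of a normal distribution."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def fit_two_component_mixture(scores, n_iter=200):
    """Fit a two-component Gaussian mixture to PSM scores by EM.

    Returns (weight, mean, sd) for the low-scoring ("incorrect") and the
    high-scoring ("correct") component, in that order.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    params = []
    # Initialise each component from one half of the sorted scores.
    for part in (ordered[:half], ordered[half:]):
        mu = sum(part) / len(part)
        var = sum((s - mu) ** 2 for s in part) / len(part)
        params.append([0.5, mu, max(math.sqrt(var), 1e-3)])

    for _ in range(n_iter):
        # E-step: posterior probability that each score is a correct match.
        resp = []
        for s in scores:
            p0 = params[0][0] * normal_pdf(s, params[0][1], params[0][2])
            p1 = params[1][0] * normal_pdf(s, params[1][1], params[1][2])
            total = p0 + p1
            resp.append(p1 / total if total > 0 else 0.5)
        # M-step: update weight, mean and standard deviation per component.
        for k in (0, 1):
            w = [r if k == 1 else 1.0 - r for r in resp]
            n_k = sum(w)
            if n_k < 1e-12:
                continue  # component is empty; keep its previous parameters
            mu = sum(wi * s for wi, s in zip(w, scores)) / n_k
            var = sum(wi * (s - mu) ** 2 for wi, s in zip(w, scores)) / n_k
            params[k] = [n_k / len(scores), mu, max(math.sqrt(var), 1e-3)]
    return tuple(params[0]), tuple(params[1])


def mixture_fdr(threshold, incorrect, correct):
    """FDR among scores >= threshold implied by the fitted mixture."""
    bad = incorrect[0] * normal_sf(threshold, incorrect[1], incorrect[2])
    good = correct[0] * normal_sf(threshold, correct[1], correct[2])
    return bad / (bad + good) if bad + good > 0 else 0.0
```

A possible usage, on synthetic scores rather than real search output: draw a few hundred values around a low mean and a few hundred around a high mean, call `fit_two_component_mixture` on the combined list, and pass the two returned components to `mixture_fdr` at the score cutoff of interest. Gaussian components are chosen here only because they keep the EM updates in closed form; real PSM score distributions are often skewed, which is the motivation the abstract gives for the skew normal family.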

