A synthetic peptide library for benchmarking crosslinking mass spectrometry search engines

2019 ◽  
Author(s):  
Rebecca Beveridge ◽  
Johannes Stadlmann ◽  
Josef M. Penninger ◽  
Karl Mechtler

We have created synthetic peptide libraries to benchmark crosslinking mass spectrometry search engines for different types of crosslinkers. The unique benefit of using a library is knowing which identified crosslinks are true and which are false. Here we have used mass spectrometry data generated from measurement of the peptide libraries to evaluate the most frequently applied search algorithms in crosslinking mass spectrometry. When filtered to an estimated false discovery rate of 5%, false crosslink identification ranged from 5.2% to 11.3% for search engines with inbuilt validation strategies for error estimation. When different external validation strategies were applied to a single search output, false crosslink identification ranged from 2.4% to a surprising 32%, despite being filtered to an estimated 5% false discovery rate. Remarkably, the use of MS-cleavable crosslinkers did not reduce the false discovery rate compared to non-cleavable crosslinkers. These results have far-reaching implications for structural biology. We anticipate that the datasets acquired during this research will further drive optimisation and development of search engines and novel data-interpretation technologies, thereby advancing our understanding of vital biological interactions.
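The benchmarking idea above (comparing the estimated false discovery rate with the rate actually observed against known ground truth) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `group_of` mapping, the peptide names, and the rule that a crosslink is true only when both peptides come from the same library group are hypothetical assumptions made for the example.

```python
def observed_false_rate(identified, group_of):
    """Fraction of identified crosslinks that are false under known ground truth.

    identified: list of (peptide_a, peptide_b) pairs reported by a search engine.
    group_of:   hypothetical mapping from peptide to the library group it was
                crosslinked in; a pair is counted as true only if both peptides
                are known and share a group (an assumption for this sketch).
    """
    if not identified:
        return 0.0
    false_hits = sum(
        1
        for a, b in identified
        if group_of.get(a) is None or group_of.get(a) != group_of.get(b)
    )
    return false_hits / len(identified)


# Toy data: four peptides in two groups, four reported crosslinks.
group_of = {"PEP1": 1, "PEP2": 1, "PEP3": 2, "PEP4": 2}
identified = [("PEP1", "PEP2"), ("PEP3", "PEP4"), ("PEP1", "PEP3"), ("PEP2", "PEP1")]
rate = observed_false_rate(identified, group_of)
print(f"observed false crosslink rate: {rate:.1%}")
```

Comparing a value computed this way with the nominal 5% cutoff is what would reveal a gap between estimated and actual error.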

2020 ◽  
Author(s):  
Grant M. Fujimoto ◽  
Jennifer E. Kyle ◽  
Joon-Yong Lee ◽  
Thomas O. Metz ◽  
Samuel H. Payne

Abstract: Mass spectrometry (MS)-based lipidomics is revolutionizing lipid research with high throughput identification and quantification of hundreds to thousands of lipids with the goal of elucidating lipid metabolism and function. Estimates of statistical confidence in lipid identification are essential for downstream data interpretation in a biological context. In the related field of proteomics, a variety of methods for estimating false discovery are available, and understanding the statistical confidence of identifications is typically required for data analysis and hypothesis testing. However, there is no current method for estimating the false discovery rate (FDR) or statistical confidence for MS-based lipid identifications. This has slowed the adoption of MS-based lipidomics research, as all identifications require manual inspection and validation to ensure their accuracy. We present here the first generalizable method for FDR estimation, a target/decoy approach, that allows those conducting MS-based lipidomics research to confidently adjust spectral score thresholds to minimize false discovery and to enable full automation of data analysis.
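To make the target/decoy idea concrete, here is a generic sketch of choosing a score threshold at a given estimated FDR. It uses the common decoys-over-targets estimate and assumes distinct scores; how decoy lipids are generated and scored is specific to the paper and is not modeled here. The function name and the toy scores are hypothetical.

```python
def score_threshold_at_fdr(matches, max_fdr=0.05):
    """Return the lowest score cutoff whose estimated FDR stays within max_fdr.

    matches: list of (score, is_decoy) tuples, higher score = better match.
    The FDR among matches at or above a cutoff is estimated as
    (number of decoys) / (number of targets). Returns None if no cutoff works.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    best_cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Accept this cutoff if the decoy-based estimate is within the limit.
        if targets > 0 and decoys / targets <= max_fdr:
            best_cutoff = score
    return best_cutoff


# Toy example with a loose 25% limit so the small list yields a cutoff.
matches = [(9.0, False), (8.0, False), (7.0, False),
           (6.0, True), (5.0, False), (4.0, True)]
cutoff = score_threshold_at_fdr(matches, max_fdr=0.25)
print(cutoff)
```

In this toy list, accepting everything scoring 5.0 or higher admits one decoy for four targets, which is exactly the 25% limit.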


Author(s):  
In Kwon Choi ◽  
Eroma Abeysinghe ◽  
Eric Coulter ◽  
Suresh Marru ◽  
Marlon Pierce ◽  
...  

2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i745-i753
Author(s):  
Yisu Peng ◽  
Shantanu Jain ◽  
Yong Fuga Li ◽  
Michal Greguš ◽  
Alexander R. Ivanov ◽  
...  

Abstract: Motivation: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. Results: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. Availability and implementation: https://github.com/shawn-peng/FDR-estimation. Supplementary information: Supplementary data are available at Bioinformatics online.
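The baseline decoy-free idea mentioned above, fitting a two-component mixture to PSM scores, can be sketched with a plain EM loop. This is deliberately simplified: it uses two ordinary Gaussians rather than the skew normal family, ignores second-best PSMs, and runs on simulated scores, so it illustrates the general DFA concept rather than the authors' framework. All names and data here are assumptions for the example.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def fit_two_gaussians(scores, iters=100):
    """EM for a 1-D mixture: component 0 = incorrect, component 1 = correct."""
    n = len(scores)
    ordered = sorted(scores)
    mu0, mu1 = ordered[n // 4], ordered[(3 * n) // 4]  # quartile initialisation
    mean = sum(scores) / n
    sd0 = sd1 = math.sqrt(sum((x - mean) ** 2 for x in scores) / n) or 1.0
    w1 = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each score is a correct match.
        resp = []
        for x in scores:
            p0 = (1.0 - w1) * normal_pdf(x, mu0, sd0)
            p1 = w1 * normal_pdf(x, mu1, sd1)
            resp.append(p1 / (p0 + p1) if p0 + p1 > 0 else 0.5)
        n1 = sum(resp)
        n0 = n - n1
        if n0 < 1e-9 or n1 < 1e-9:
            break
        # M-step: weighted means, standard deviations and mixing weight.
        mu0 = sum((1 - r) * x for r, x in zip(resp, scores)) / n0
        mu1 = sum(r * x for r, x in zip(resp, scores)) / n1
        sd0 = max(math.sqrt(sum((1 - r) * (x - mu0) ** 2 for r, x in zip(resp, scores)) / n0), 1e-6)
        sd1 = max(math.sqrt(sum(r * (x - mu1) ** 2 for r, x in zip(resp, scores)) / n1), 1e-6)
        w1 = n1 / n
    return (mu0, sd0), (mu1, sd1), w1


def estimated_fdr(scores, threshold, incorrect, correct, w_correct):
    """Mean posterior probability of being incorrect among accepted scores."""
    accepted = [x for x in scores if x >= threshold]
    if not accepted:
        return 0.0
    total = 0.0
    for x in accepted:
        p0 = (1.0 - w_correct) * normal_pdf(x, *incorrect)
        p1 = w_correct * normal_pdf(x, *correct)
        total += p0 / (p0 + p1) if p0 + p1 > 0 else 0.0
    return total / len(accepted)


# Simulated scores: 300 incorrect matches around 0, 200 correct around 6.
random.seed(0)
scores = [random.gauss(0, 1) for _ in range(300)] + [random.gauss(6, 1) for _ in range(200)]
incorrect, correct, w_correct = fit_two_gaussians(scores)
fdr = estimated_fdr(scores, 3.0, incorrect, correct, w_correct)
print(incorrect, correct, w_correct, fdr)
```

The FDR estimate here is the average posterior error probability of the accepted matches, which is one common way to turn a fitted mixture into an error rate.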


2016 ◽  
Vol 15 (11) ◽  
pp. 3961-3970 ◽  
Author(s):  
Eric W. Deutsch ◽  
Christopher M. Overall ◽  
Jennifer E. Van Eyk ◽  
Mark S. Baker ◽  
Young-Ki Paik ◽  
...  

2019 ◽  
Vol 31 (1) ◽  
pp. 154
Author(s):  
J. Miles ◽  
E. Wright-Johnson ◽  
S. Walsh ◽  
C. Corey ◽  
L. Yao ◽  
...  

Alterations in the signalling of critical molecular factors within the uterine milieu result in deficiencies in embryo elongation, leading directly to embryonic loss as well as delayed elongation. The objective of this study was to identify metabolites within the uterine environment from populations of uniform and diverse porcine conceptuses as they transition between spherical, ovoid, and tubular conceptuses during the initiation of embryo elongation. White crossbred gilts (n=38) were bred at standing oestrus (designated Day 0) and again 24 h later and randomly assigned to a collection group. At Day 9, 10, or 11 of gestation, reproductive tracts were collected immediately following harvest and flushed with 40 mL of RPMI-1640 media. Conceptus morphologies were assessed from each pregnancy to assign it to 1 of 5 treatment groups based on these morphologies: (1) uniform spherical (n=8); (2) diverse spherical and ovoid (n=8); (3) uniform ovoid (n=8); (4) diverse ovoid and tubular (n=8); and (5) uniform tubular (n=6). Subsequently, uterine flushings from these pregnancies were submitted for non-targeted profiling by gas chromatography-mass spectrometry (GC-MS) and ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) techniques. Raw spectral data were processed using the XCMS package in R (R Foundation for Statistical Computing, Vienna, Austria) and features were clustered using RAMClustR. Unsupervised multivariate principal component analysis was performed in R using the pcaMethods package, and univariate ANOVA was performed in R with a Benjamini-Hochberg false discovery rate adjustment. Principal component analysis of the GC-MS and UPLC-MS data identified 153 and 104 metabolites, respectively. Of the identified metabolites, 51 and 71 metabolites from the GC-MS and UPLC-MS analysis, respectively, corresponded to known compounds.
After false discovery rate adjustment of the GC-MS and UPLC-MS data, 38 and 59 metabolites from the GC-MS and UPLC-MS analysis, respectively, differed (P<0.05) in uterine flushings from pregnancies for the 5 conceptus stages. Some metabolites were greater (P<0.05) in abundance for uterine flushings containing earlier stage conceptuses (i.e. spherical) such as uric acid, tryptophan, 5-hydroxy-L-tryptophan, and L-tyrosine. In contrast, some metabolites were greater (P<0.05) in abundance for uterine flushings containing later stage conceptuses (i.e. tubular) such as creatinine, serine, isovaleryl-L-carnitine, and lauric diethanolamide. These data illustrate several putative metabolites that change within the uterine milieu as porcine embryos transition between spherical, ovoid, and tubular conceptuses. Funding was provided by USDA-NIFA-AFRI Grant no. 2017-67015-26456.
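The Benjamini-Hochberg adjustment used in this study can be illustrated with a short sketch. The study performed it in R; the Python version below is only a stand-in showing the arithmetic, and the four p-values are made up for the example rather than taken from the data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Each p-value is scaled by m / rank (rank 1 = smallest), then a running
    minimum from the largest rank downward enforces monotonicity.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-metabolite ANOVA p-values.
p_values = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(p_values)
significant = [i for i, q in enumerate(adjusted) if q < 0.05]
print(adjusted, significant)
```

With these toy values only the first metabolite remains below 0.05 after adjustment, which shows how the correction can reduce the number of features called significant.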


2016 ◽  
Vol 15 (11) ◽  
pp. 4082-4090 ◽  
Author(s):  
Gun Wook Park ◽  
Heeyoun Hwang ◽  
Kwang Hoe Kim ◽  
Ju Yeon Lee ◽  
Hyun Kyoung Lee ◽  
...  

2009 ◽  
Vol 07 (03) ◽  
pp. 547-569 ◽  
Author(s):  
NHA NGUYEN ◽  
HENG HUANG ◽  
SOONTORN ORAINTARA ◽  
AN VO

Mass Spectrometry (MS) is increasingly being used to discover disease-related proteomic patterns. The peak detection step is one of the most important steps in the typical analysis of MS data. Recently, many new algorithms have been proposed to increase the true positive rate while keeping a low false discovery rate in peak detection. Most of them follow one of two approaches: the denoising approach or the decomposing approach. In previous studies, the decomposing approach has shown more potential than the denoising approach. In this paper, we propose two novel methods, named GaborLocal and GaborEnvelop, both of which can detect more true peaks with a lower false discovery rate than previous methods. We employ the method of Gaussian local maxima to detect peaks, because it is robust to noise in signals. A new approach, peak rank, is defined for the first time to identify peaks instead of using the signal-to-noise ratio. Meanwhile, the Gabor filter is used to amplify important information and compress noise in the raw MS signal. Moreover, we also propose envelope analysis to improve the quantification of peaks and remove more false peaks. The proposed methods were applied to a real SELDI-TOF spectrum with known polypeptide positions. The experimental results demonstrate that our methods outperform other commonly used methods in terms of the Receiver Operating Characteristic (ROC) curve.
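As a rough illustration of detecting peaks as local maxima of a Gaussian-smoothed signal, consider the sketch below. It is not GaborLocal or GaborEnvelop: there is no Gabor filter, peak rank, or envelope analysis, and the kernel width, edge handling, and toy spectrum are assumptions chosen for the example.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalised Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(signal, sigma=1.0):
    """Convolve the signal with a Gaussian kernel, clamping indices at the edges."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out


def local_maxima(signal, sigma=1.0):
    """Indices where the smoothed signal exceeds its left neighbour and is
    at least as large as its right neighbour."""
    s = smooth(signal, sigma)
    return [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] >= s[i + 1]]


# Toy intensity trace with two bumps.
spectrum = [0, 0, 1, 5, 1, 0, 0, 0, 2, 8, 2, 0, 0]
peaks = local_maxima(spectrum, sigma=1.0)
print(peaks)
```

Smoothing first suppresses single-point noise spikes, so that only maxima supported by their neighbourhood are reported as candidate peaks.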

