PEAK DETECTION IN MASS SPECTROMETRY BY GABOR FILTERS AND ENVELOPE ANALYSIS

2009 ◽  
Vol 07 (03) ◽  
pp. 547-569 ◽  
Author(s):  
NHA NGUYEN ◽  
HENG HUANG ◽  
SOONTORN ORAINTARA ◽  
AN VO

Mass spectrometry (MS) is increasingly being used to discover disease-related proteomic patterns. Peak detection is one of the most important steps in the typical analysis of MS data. Recently, many new algorithms have been proposed to increase the true positive rate while keeping the false discovery rate low in peak detection. Most of them follow one of two approaches: denoising or decomposition. In previous studies, the decomposition approach has shown more potential than the denoising approach. In this paper, we propose two novel methods, named GaborLocal and GaborEnvelop, both of which can detect more true peaks with a lower false discovery rate than previous methods. We employ the method of Gaussian local maxima to detect peaks, because it is robust to noise in signals. A new measure, peak rank, is defined for the first time to identify peaks instead of using the signal-to-noise ratio. Meanwhile, the Gabor filter is used to amplify important information and compress noise in the raw MS signal. Moreover, we also propose envelope analysis to improve the quantification of peaks and remove more false peaks. The proposed methods have been applied to a real SELDI-TOF spectrum with known polypeptide positions. The experimental results demonstrate that our methods outperform other commonly used methods in terms of the Receiver Operating Characteristic (ROC) curve.
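
The general idea of filtering a spectrum with a Gabor kernel and then picking local maxima can be illustrated with a minimal sketch. This is not the GaborLocal or GaborEnvelop method itself: the kernel parameters, the height threshold, the synthetic spectrum, and the helper names `gabor_kernel` and `local_maxima` are all assumptions chosen for illustration, and the peak rank and envelope analysis steps are not reproduced.

```python
import numpy as np

def gabor_kernel(sigma, freq, half_width):
    """Real part of a 1-D Gabor kernel: a Gaussian window times a cosine."""
    t = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2)) * np.cos(2.0 * np.pi * freq * t)
    return kernel / np.abs(kernel).sum()  # normalise by the absolute sum

def local_maxima(signal, min_height):
    """Indices whose value exceeds both neighbours and min_height."""
    mid = signal[1:-1]
    is_peak = (mid > signal[:-2]) & (mid > signal[2:]) & (mid > min_height)
    return np.flatnonzero(is_peak) + 1  # +1 because mid starts at index 1

# Synthetic "spectrum" (an assumption): three Gaussian peaks plus white noise.
rng = np.random.default_rng(0)
x = np.arange(1000)
true_positions = [200, 500, 800]
clean = sum(np.exp(-(x - p) ** 2 / (2 * 8.0**2)) for p in true_positions)
noisy = clean + 0.05 * rng.standard_normal(x.size)

# Filter, then keep local maxima above half of the filtered maximum.
kernel = gabor_kernel(sigma=8.0, freq=0.01, half_width=32)
filtered = np.convolve(noisy, kernel, mode="same")
peaks = local_maxima(filtered, min_height=0.5 * filtered.max())
print(peaks)
```

In this toy setting the filter smooths the noise enough that a plain height threshold separates the three peaks; real spectra would need a more careful criterion, which is the role the abstract assigns to peak rank.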

2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i745-i753
Author(s):  
Yisu Peng ◽  
Shantanu Jain ◽  
Yong Fuga Li ◽  
Michal Greguš ◽  
Alexander R. Ivanov ◽  
...  

Motivation: Accurate estimation of false discovery rate (FDR) of spectral identification is a central problem in mass spectrometry-based proteomics. Over the past two decades, target-decoy approaches (TDAs) and decoy-free approaches (DFAs) have been widely used to estimate FDR. TDAs use a database of decoy species to faithfully model score distributions of incorrect peptide-spectrum matches (PSMs). DFAs, on the other hand, fit two-component mixture models to learn the parameters of correct and incorrect PSM score distributions. While conceptually straightforward, both approaches lead to problems in practice, particularly in experiments that push instrumentation to the limit and generate low fragmentation-efficiency and low signal-to-noise-ratio spectra. Results: We introduce a new decoy-free framework for FDR estimation that generalizes present DFAs while exploiting more search data in a manner similar to TDAs. Our approach relies on multi-component mixtures, in which score distributions corresponding to the correct PSMs, best incorrect PSMs and second-best incorrect PSMs are modeled by the skew normal family. We derive EM algorithms to estimate parameters of these distributions from the scores of best and second-best PSMs associated with each experimental spectrum. We evaluate our models on multiple proteomics datasets and a HeLa cell digest case study consisting of more than a million spectra in total. We provide evidence of improved performance over existing DFAs and improved stability and speed over TDAs without any performance degradation. We propose that the new strategy has the potential to extend beyond peptide identification and reduce the need for TDA on all analytical platforms. Availability and implementation: https://github.com/shawn-peng/FDR-estimation. Supplementary information: Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Rebecca Beveridge ◽  
Johannes Stadlmann ◽  
Josef M. Penninger ◽  
Karl Mechtler

We have created synthetic peptide libraries to benchmark crosslinking mass spectrometry search engines for different types of crosslinker. The unique benefit of using a library is knowing which identified crosslinks are true and which are false. Here we have used mass spectrometry data generated from measurement of the peptide libraries to evaluate the most frequently applied search algorithms in crosslinking mass-spectrometry. When filtered to an estimated false discovery rate of 5%, false crosslink identification ranged from 5.2% to 11.3% for search engines with inbuilt validation strategies for error estimation. When different external validation strategies were applied to one single search output, false crosslink identification ranged from 2.4% to a surprising 32%, despite being filtered to an estimated 5% false discovery rate. Remarkably, the use of MS-cleavable crosslinkers did not reduce the false discovery rate compared to non-cleavable crosslinkers, results from which have far-reaching implications in structural biology. We anticipate that the datasets acquired during this research will further drive optimisation and development of search engines and novel data-interpretation technologies, thereby advancing our understanding of vital biological interactions.


2020 ◽  
Author(s):  
Grant M. Fujimoto ◽  
Jennifer E. Kyle ◽  
Joon-Yong Lee ◽  
Thomas O. Metz ◽  
Samuel H. Payne

Mass spectrometry (MS)-based lipidomics is revolutionizing lipid research with high throughput identification and quantification of hundreds to thousands of lipids with the goal of elucidating lipid metabolism and function. Estimates of statistical confidence in lipid identification are essential for downstream data interpretation in a biological context. In the related field of proteomics, a variety of methods for estimating false discovery are available, and understanding the statistical confidence of identifications is typically required for data analysis and hypothesis testing. However, there is no current method for estimating the false discovery rate (FDR) or statistical confidence for MS-based lipid identifications. This has slowed the adoption of MS-based lipidomics research, as all identifications require manual inspection and validation to ensure their accuracy. We present here the first generalizable method for FDR estimation, a target/decoy approach, that allows those conducting MS-based lipidomics research to confidently adjust spectral score thresholds to minimize false discovery and to enable full automation of data analysis.
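
How a target/decoy approach turns score thresholds into FDR estimates can be sketched as follows. This is a generic illustration, not the method of the paper above: it assumes separate lists of target and decoy scores, the simple estimator (decoys above threshold) / (targets above threshold), and made-up scores; the function names `decoy_fdr` and `lowest_threshold_for_fdr` are hypothetical.

```python
def decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate FDR among targets scoring >= threshold as decoys / targets."""
    n_target = sum(s >= threshold for s in target_scores)
    n_decoy = sum(s >= threshold for s in decoy_scores)
    if n_target == 0:
        return 0.0  # nothing accepted, so nothing can be falsely accepted
    return min(1.0, n_decoy / n_target)

def lowest_threshold_for_fdr(target_scores, decoy_scores, max_fdr):
    """Smallest observed target score whose estimated FDR is <= max_fdr."""
    for t in sorted(set(target_scores)):
        if decoy_fdr(target_scores, decoy_scores, t) <= max_fdr:
            return t
    return None  # no threshold meets the requested FDR

# Made-up scores for illustration only.
targets = [9.1, 8.7, 8.2, 7.5, 6.9, 6.0, 5.2, 4.8, 4.1, 3.5]
decoys = [5.0, 4.3, 3.9, 3.0, 2.2]

print(decoy_fdr(targets, decoys, 4.8))                 # 1 decoy / 8 targets
print(lowest_threshold_for_fdr(targets, decoys, 0.2))
```

Scanning thresholds from the lowest score upward returns the most permissive cutoff that still meets the requested FDR, which is the kind of automated threshold adjustment the abstract describes.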


2019 ◽  
Vol 31 (1) ◽  
pp. 154
Author(s):  
J. Miles ◽  
E. Wright-Johnson ◽  
S. Walsh ◽  
C. Corey ◽  
L. Yao ◽  
...  

Alterations in the signalling of critical molecular factors within the uterine milieu result in deficiencies in embryo elongation, leading directly to embryonic loss as well as delayed elongation. The objective of this study was to identify metabolites within the uterine environment from populations of uniform and diverse porcine conceptuses as they transition between spherical, ovoid, and tubular conceptuses during the initiation of embryo elongation. White crossbred gilts (n=38) were bred at standing oestrus (designated Day 0) and again 24h later and randomly assigned to a collection group. At Day 9, 10, or 11 of gestation, reproductive tracts were collected immediately following harvest and flushed with 40mL of RPMI-1640 media. Conceptus morphologies were assessed from each pregnancy to assign to 1 of 5 treatment groups based on these morphologies: (1) uniform spherical (n=8); (2) diverse spherical and ovoid (n=8); (3) uniform ovoid (n=8); (4) diverse ovoid and tubular (n=8); and (5) uniform tubular (n=6). Subsequently, uterine flushings from these pregnancies were submitted for non-targeted profiling by gas chromatography-mass spectrometry (GC-MS) and ultra-performance liquid chromatography-mass spectrometry (UPLC-MS) techniques. Raw spectral data were processed using the XCMS package in R (R Foundation for Statistical Computing, Vienna, Austria) and features were clustered using RAMClustR. Unsupervised multivariate principal component analysis was performed in R using the pcaMethods package, and univariate ANOVA was performed in R with a Benjamini-Hochberg false discovery rate adjustment. Principal component analysis of the GC-MS and UPLC-MS data identified 153 and 104 metabolites, respectively. Of the identified metabolites, 51 and 71 metabolites from the GC-MS and UPLC-MS analysis, respectively, corresponded to known compounds.
After false discovery rate adjustment, 38 and 59 metabolites from the GC-MS and UPLC-MS analyses, respectively, differed (P<0.05) in uterine flushings from pregnancies for the 5 conceptus stages. Some metabolites were greater (P<0.05) in abundance for uterine flushings containing earlier stage conceptuses (i.e. spherical) such as uric acid, tryptophan, 5-hydroxy-L-tryptophan, and L-tyrosine. In contrast, some metabolites were greater (P<0.05) in abundance for uterine flushings containing later stage conceptuses (i.e. tubular) such as creatinine, serine, isovaleryl-L-carnitine, and lauric diethanolamide. These data illustrate several putative metabolites that change within the uterine milieu as porcine embryos transition between spherical, ovoid, and tubular conceptuses. Funding was provided by USDA-NIFA-AFRI Grant no. 2017-67015-26456.
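
The Benjamini-Hochberg false discovery rate adjustment applied to the per-metabolite ANOVA P-values can be sketched in a few lines. The study ran its analysis in R; the Python version below is only an illustrative re-implementation of the standard step-up adjustment, and the function name and example P-values are assumptions, not data from the study.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

# Made-up p-values for four hypothetical metabolites.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
print(adj)           # adjusted values, same order as `raw`
print(adj < 0.05)    # which features remain significant at the 0.05 level
```

A feature is reported as differing only if its adjusted value stays below the chosen level, which is why fewer metabolites remain after adjustment than before.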


2019 ◽  
Author(s):  
Yohann Couté ◽  
Christophe Bruley ◽  
Thomas Burger

In bottom-up discovery proteomics, target-decoy competition (TDC) is the most popular method for false discovery rate (FDR) control. Despite unquestionable statistical foundations, this method has drawbacks, including its hitherto unknown intrinsic lack of stability vis-à-vis practical conditions of application. Although some consequences of this instability have already been empirically described, they may have been misinterpreted. This article provides evidence that TDC has become less reliable as the accuracy of modern mass spectrometers improved. We therefore propose to replace TDC by a totally different method to control the FDR at spectrum, peptide and protein levels, while benefiting from the theoretical guarantees of the Benjamini-Hochberg framework. As this method is simpler to use, faster to compute and more stable than TDC, we argue that it is better adapted to the standardization and throughput constraints of current proteomic platforms.


2019 ◽  
Vol 18 (5) ◽  
pp. 2354-2358 ◽  
Author(s):  
Yulia Danilova ◽  
Anastasia Voronkova ◽  
Pavel Sulimov ◽  
Attila Kertész-Farkas

2018 ◽  
Author(s):  
Armin Schwartzman ◽  
Fabian Telschow

Peaks are a mainstay of neuroimage analysis for reporting localization results. The current peak detection procedure in SPM12 requires a pre-threshold for approximating p-values and a false discovery rate (FDR) nominal level for inference. However, the pre-threshold is an undesirable feature, while the FDR level is meaningless if the signal is assumed to be nonzero everywhere. This article provides: 1) a peak height distribution for smooth Gaussian error fields that does not require a screening pre-threshold; 2) a signal-plus-noise model where FDR of peaks can be controlled and properly interpreted. Matlab code for calculation of p-values using the exact peak height distribution is available as an SPM extension.


2017 ◽  
Author(s):  
Lutz Fischer ◽  
Juri Rappsilber

False discovery rate (FDR) estimation is a cornerstone of proteomics that has recently been adapted to cross-linking/mass spectrometry. Here we demonstrate that heterobifunctional cross-linkers, while theoretically different from homobifunctional cross-linkers, need not be considered separately in practice. We develop and then evaluate the impact of applying a correct FDR formula for use of heterobifunctional cross-linkers and conclude that there are minimal practical advantages. Hence a single formula can be applied to data generated from the many different non-cleavable cross-linkers.


2012 ◽  
Vol 9 (9) ◽  
pp. 901-903 ◽  
Author(s):  
Thomas Walzthoeni ◽  
Manfred Claassen ◽  
Alexander Leitner ◽  
Franz Herzog ◽  
Stefan Bohn ◽  
...  
