scholarly journals Identification of best indicators of peptide-spectrum match using a permutation resampling approach

2014 ◽  
Vol 12 (05) ◽  
pp. 1440001 ◽  
Author(s):  
Malik N. Akhtar ◽  
Bruce R. Southey ◽  
Per E. Andrén ◽  
Jonathan V. Sweedler ◽  
Sandra L. Rodriguez-Zas

Various indicators of observed-theoretical spectrum matches were compared and the resulting statistical significance was characterized using permutation resampling. Novel decoy databases built by resampling the terminal positions of peptide sequences were evaluated to identify the conditions for accurate computation of peptide match significance levels. The methodology was tested on real and manually curated tandem mass spectra from peptides across a wide range of sizes. Spectra match indicators from complementary database search programs were profiled and optimal indicators were identified. The combination of the optimal indicator and permuted decoy databases improved the calculation of the peptide match significance compared to the approaches currently implemented in the database search programs that rely on distributional assumptions. Permutation tests using p-values obtained from software-dependent matching scores and E-values outperformed permutation tests using all other indicators. The higher overlap in matches between the database search programs when using end permutation compared to existing approaches confirmed the superiority of the end permutation method to identify peptides. The combination of effective match indicators and the end permutation method is recommended for accurate detection of peptides.

2019 ◽  
Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically non-significant at at least the alpha-level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence intervals for the estimate itself, these plots appear less useful. In this paper, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate's p-value and its associated confidence interval in relation to a specified alpha-level. These plots can help the analyst interpret and report both the statistical and substantive significance of their models. Illustrations are provided using a nonprobability sample of activists and participants at a 1962 anti-Communism school.


Author(s):  
Marshall A. Taylor

Coefficient plots are a popular tool for visualizing regression estimates. The appeal of these plots is that they visualize confidence intervals around the estimates and generally center the plot around zero, meaning that any estimate that crosses zero is statistically nonsignificant at least at the alpha level around which the confidence intervals are constructed. For models with statistical significance levels determined via randomization models of inference and for which there is no standard error or confidence intervals for the estimate itself, these plots appear less useful. In this article, I illustrate a variant of the coefficient plot for regression models with p-values constructed using permutation tests. These visualizations plot each estimate’s p-value and its associated confidence interval in relation to a specified alpha level. These plots can help the analyst interpret and report the statistical and substantive significances of their models. I illustrate using a nonprobability sample of activists and participants at a 1962 anticommunism school.


Mathematics ◽  
2021 ◽  
Vol 9 (6) ◽  
pp. 603
Author(s):  
Leonid Hanin

I uncover previously underappreciated systematic sources of false and irreproducible results in natural, biomedical and social sciences that are rooted in statistical methodology. They include the inevitably occurring deviations from basic assumptions behind statistical analyses and the use of various approximations. I show through a number of examples that (a) arbitrarily small deviations from distributional homogeneity can lead to arbitrarily large deviations in the outcomes of statistical analyses; (b) samples of random size may violate the Law of Large Numbers and thus are generally unsuitable for conventional statistical inference; (c) the same is true, in particular, when random sample size and observations are stochastically dependent; and (d) the use of the Gaussian approximation based on the Central Limit Theorem has dramatic implications for p-values and statistical significance essentially making pursuit of small significance levels and p-values for a fixed sample size meaningless. The latter is proven rigorously in the case of one-sided Z test. This article could serve as a cautionary guidance to scientists and practitioners employing statistical methods in their work.


PROTEOMICS ◽  
2009 ◽  
Vol 9 (7) ◽  
pp. 1763-1770 ◽  
Author(s):  
Hua Xu ◽  
Liwen Wang ◽  
Larry Sallans ◽  
Michael A. Freitas

SciVee ◽  
2011 ◽  
Author(s):  
Kyowon Jeong ◽  
Sangtae Kim ◽  
Nuno Bandeira ◽  
Pavel Pevzner

2010 ◽  
Vol 9 (12) ◽  
pp. 2772-2782 ◽  
Author(s):  
Xiaowen Liu ◽  
Yuval Inbar ◽  
Pieter C. Dorrestein ◽  
Colin Wynne ◽  
Nathan Edwards ◽  
...  

2011 ◽  
Vol 10 (12) ◽  
pp. M111.010017 ◽  
Author(s):  
Jian Wang ◽  
Philip E. Bourne ◽  
Nuno Bandeira

2010 ◽  
Vol 9 (12) ◽  
pp. 2840-2852 ◽  
Author(s):  
Sangtae Kim ◽  
Nikolai Mischerikow ◽  
Nuno Bandeira ◽  
J. Daniel Navarro ◽  
Louis Wich ◽  
...  

2013 ◽  
Vol 12 (8) ◽  
pp. 3652-3666 ◽  
Author(s):  
Kevin Brown Chandler ◽  
Petr Pompach ◽  
Radoslav Goldman ◽  
Nathan Edwards

PLoS ONE ◽  
2021 ◽  
Vol 16 (10) ◽  
pp. e0259349
Author(s):  
Muhammad Usman Tariq ◽  
Fahad Saeed

Historically, the database search algorithms have been the de facto standard for inferring peptides from mass spectrometry (MS) data. Database search algorithms deduce peptides by transforming theoretical peptides into theoretical spectra and matching them to the experimental spectra. Heuristic similarity-scoring functions are used to match an experimental spectrum to a theoretical spectrum. However, the heuristic nature of the scoring functions and the simple transformation of the peptides into theoretical spectra, along with noisy mass spectra for the less abundant peptides, can introduce a cascade of inaccuracies. In this paper, we design and implement a Deep Cross-Modal Similarity Network called SpeCollate, which overcomes these inaccuracies by learning the similarity function between experimental spectra and peptides directly from the labeled MS data. SpeCollate transforms spectra and peptides into a shared Euclidean subspace by learning fixed size embeddings for both. Our proposed deep-learning network trains on sextuplets of positive and negative examples coupled with our custom-designed SNAP-loss function. Online hardest negative mining is used to select the appropriate negative examples for optimal training performance. We use 4.8 million sextuplets obtained from the NIST and MassIVE peptide libraries to train the network and demonstrate that for closed search, SpeCollate is able to perform better than Crux and MSFragger in terms of the number of peptide-spectrum matches (PSMs) and unique peptides identified under 1% FDR for real-world data. SpeCollate also identifies a large number of peptides not reported by either Crux or MSFragger. To the best of our knowledge, our proposed SpeCollate is the first deep-learning network that can determine the cross-modal similarity between peptides and mass-spectra for MS-based proteomics. We believe SpeCollate is significant progress towards developing machine-learning solutions for MS-based omics data analysis. SpeCollate is available at https://deepspecs.github.io/.


Sign in / Sign up

Export Citation Format

Share Document