On Comparison of SimTandem with State-of-the-Art Peptide Identification Tools, Efficiency of Precursor Mass Filter and Dealing with Variable Modifications

2013, Vol 10 (3), pp. 1-15
Author(s): Jiří Novák, David Hoksza, Tomáš Skopal, Oliver Kohlbacher

Summary
Similarity search over theoretical mass spectra generated from protein sequence databases is a widely accepted approach for identifying peptides from query mass spectra produced by shotgun proteomics. Growing protein sequence databases and noisy query spectra call for database indexing techniques and better similarity measures for comparing theoretical spectra against query spectra. We employ a modification of the previously proposed parameterized Hausdorff distance for comparing mass spectra. The new distance outperforms the original distance, the angle distance, and the state-of-the-art peptide identification tools OMSSA and X!Tandem in the number of identified peptides, even at a q-value threshold as strict as 0.001. When a precursor mass filter is used as a database indexing technique, our method searches faster than OMSSA. When variable modifications are not searched, its search time is similar to that of X!Tandem. We show that the precursor mass filter is an efficient database indexing technique for high-accuracy data, even when many variable modifications are searched. We also demonstrate that more peptides are identified when variable modifications are searched separately, in multiple runs of a peptide identification engine; otherwise, mixing unmodified and modified spectra together affects the false discovery rates and lowers the number of identified peptides. Our method is implemented in the freely available application SimTandem, which can be used within the TOPP framework based on OpenMS.
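To make the idea of a precursor mass filter concrete, here is a minimal sketch of one common way to build it, not SimTandem's actual implementation: theoretical peptides are sorted by mass once, and each query spectrum is compared only against entries whose mass lies inside a ppm window around its precursor mass, found by binary search. Variable modifications are handled by expanding each sequence into its modified forms, which is why the index grows with the number of modifications searched. Assumptions: methionine oxidation is the only variable modification, at most `max_mods` per peptide, a default tolerance of 10 ppm, and the names `PrecursorMassFilter`, `with_variable_mods` and `peptide_mass` are hypothetical.

```python
import bisect
from itertools import combinations

# Standard monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
OXIDATION = 15.99491  # assumed example variable modification on methionine


def peptide_mass(seq):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def with_variable_mods(seq, max_mods=2):
    """Yield (annotated_sequence, mass) for every placement of up to max_mods Met oxidations."""
    base = peptide_mass(seq)
    sites = [i for i, aa in enumerate(seq) if aa == "M"]
    for k in range(min(max_mods, len(sites)) + 1):
        for chosen in combinations(sites, k):
            annotated = "".join(
                aa + "(ox)" if i in chosen else aa for i, aa in enumerate(seq)
            )
            yield annotated, base + k * OXIDATION


class PrecursorMassFilter:
    """Mass-sorted index; a query returns only entries inside the ppm window."""

    def __init__(self, sequences, max_mods=2):
        entries = [e for s in sequences for e in with_variable_mods(s, max_mods)]
        entries.sort(key=lambda e: e[1])  # stable sort keeps isoforms in generation order
        self._entries = entries
        self._masses = [m for _, m in entries]

    def __len__(self):
        return len(self._entries)

    def candidates(self, precursor_mass, tol_ppm=10.0):
        """Binary-search the window [mass - delta, mass + delta]."""
        delta = precursor_mass * tol_ppm * 1e-6
        lo = bisect.bisect_left(self._masses, precursor_mass - delta)
        hi = bisect.bisect_right(self._masses, precursor_mass + delta)
        return self._entries[lo:hi]
```

For example, indexing the toy sequences "PEPTIDE", "MMK" and "SAMPLER" yields seven entries, and a query at the mass of singly oxidized MMK returns only its two positional variants, M(ox)MK and MM(ox)K. Only these candidates would then be scored with the spectral distance; with high-accuracy data the window stays narrow, so the expansion by variable modifications adds entries without adding many candidates per query.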

2019, Vol 102 (5), pp. 1263-1270
Author(s): Weili Xiong, Melinda A McFarland, Cary Pirone, Christine H Parker

Abstract
Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes.
Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow.
Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species.
Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.
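The abstract does not detail its parsimony-driven workflow, so the following is only an illustrative sketch of two generic ideas it relies on. Protein parsimony is approximated here with a greedy set cover, a common heuristic that repeatedly picks the protein explaining the most still-unexplained peptides. Marker candidates for species differentiation are simplified to peptides observed in exactly one species; real proteotypic marker selection typically applies further criteria. The function names and the toy protein and peptide identifiers are hypothetical.

```python
from collections import Counter


def parsimonious_proteins(peptide_map):
    """Greedy approximation of the minimal protein set explaining all peptides.

    peptide_map: dict mapping protein id -> set of observed peptide sequences.
    """
    uncovered = set().union(*peptide_map.values())
    chosen = []
    while uncovered:
        # Protein explaining the most still-unexplained peptides; ties go to the
        # alphabetically first id so the result is deterministic.
        best = max(sorted(peptide_map), key=lambda p: len(peptide_map[p] & uncovered))
        chosen.append(best)
        uncovered -= peptide_map[best]
    return chosen


def species_specific_peptides(species_peptides):
    """Keep, for each species, only peptides not observed in any other species."""
    counts = Counter(p for peps in species_peptides.values() for p in set(peps))
    return {
        sp: {p for p in peps if counts[p] == 1}
        for sp, peps in species_peptides.items()
    }
```

With `{"protA": {"PEP1", "PEP2", "PEP3"}, "protB": {"PEP2"}, "protC": {"PEP3", "PEP4"}}`, the greedy cover keeps protA and protC and drops protB, whose only peptide is already explained. Missing or incomplete sequences in the database directly limit both steps, which is the bottleneck the abstract describes.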


2021, Vol 13 (1), pp. 1-25
Author(s): Michael Loster, Ioannis Koumarelas, Felix Naumann

The integration of multiple data sources is a common problem in a large variety of applications. Traditionally, handcrafted similarity measures are used to discover, merge, and integrate multiple representations of the same entity (duplicates) into a large homogeneous collection of data. Often, these similarity measures do not cope well with the heterogeneity of the underlying dataset. In addition, domain experts are needed to design and configure such measures manually, which is time-consuming. We propose a deep Siamese neural network capable of learning a similarity measure tailored to the characteristics of a particular dataset. By leveraging deep learning, we eliminate the manual feature engineering process and thus considerably reduce the effort required for model construction. In addition, we show that it is possible to transfer knowledge acquired during the deduplication of one dataset to another, and thus significantly reduce the amount of data required to train a similarity measure. We evaluated our method on multiple datasets and compared it with state-of-the-art deduplication methods. Depending on the task and dataset, our approach outperforms competitors by up to +26 percent in F-measure. We further show that knowledge transfer is not only feasible but, in our experiments, improved F-measure by up to +4.7 percent.
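The defining property of a Siamese network is that both records of a candidate pair pass through the same encoder with shared weights, and the distance between the two embeddings serves as the learned similarity measure. The sketch below shows only that forward structure plus a standard contrastive loss; it is not the authors' architecture. Assumptions: hashed character trigrams as input features, a single `tanh` layer as the encoder, Euclidean distance, a margin of 1.0, and the hypothetical names `SiameseEncoder`, `trigram_features` and `contrastive_loss`.

```python
import math
import random
import zlib

DIM_IN = 64   # assumed size of the hashed trigram feature vector
DIM_OUT = 8   # assumed embedding size


def trigram_features(record, dim=DIM_IN):
    """Bag of hashed character trigrams, L2-normalised (crc32 keeps hashing deterministic)."""
    text = f"  {record.lower()}  "
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class SiameseEncoder:
    """One weight matrix applied to both records of a pair (the 'Siamese' part)."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.w = [[rng.gauss(0.0, 0.3) for _ in range(DIM_IN)] for _ in range(DIM_OUT)]

    def embed(self, record):
        x = trigram_features(record)
        return [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in self.w]

    def distance(self, a, b):
        """Euclidean distance between the two embeddings: the learned dissimilarity."""
        ea, eb = self.embed(a), self.embed(b)
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(ea, eb)))


def contrastive_loss(d, is_duplicate, margin=1.0):
    """Pull duplicates together; push non-duplicates at least `margin` apart."""
    if is_duplicate:
        return d * d
    return max(0.0, margin - d) ** 2
```

Training would adjust the shared weights `w` (in practice with a deep learning framework and automatic differentiation) to minimise this loss over labeled duplicate and non-duplicate pairs. Knowledge transfer, in this simplified picture, would mean starting from weights learned on one dataset instead of random ones.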


1994, Vol 19 (1), pp. 23-27
Author(s): Gail M. Hodge

Discusses the state of the art in computer indexing, defines indexing and computer assistance, describes the reasons for renewed interest, identifies the types of computer support in use by examining selected operational systems, describes the integration of various forms of computer support in one database production system, and speculates on the future.

