IntroSpect: motif-guided immunopeptidome database building tool to improve the sensitivity of HLA binding peptide identification

2021 ◽  
Author(s):  
Le Zhang ◽  
Geng Liu ◽  
Guixue Hou ◽  
Haitao Xiang ◽  
Xi Zhang ◽  
...  

Although database search tools originally developed for shotgun proteomics have been widely used for immunopeptidomic mass spectrometry identification, they have been reported to achieve undesirably low sensitivity and/or high false positive rates, because the lack of specific enzymatic digestion in immunopeptidomics hugely inflates the search space. To overcome this problem, we have developed a motif-guided immunopeptidome database building tool named IntroSpect, which first learns peptide motifs from high-confidence hits in an initial search and then builds a targeted database for a refined search. Evaluated on three representative HLA class I datasets, IntroSpect improves sensitivity by an average of 80% compared with conventional searches with unspecific digestion while maintaining very high accuracy (~96%), as confirmed by synthetic validation experiments. A distinct advantage of IntroSpect is that it does not depend on any external HLA data, so it performs equally well on both well-studied and poorly studied HLA types, unlike the previously developed method SpectMHC. We have also designed IntroSpect to keep a global FDR that can be conveniently controlled, as in conventional database search engines. Finally, we demonstrate the practical value of IntroSpect by discovering neoantigens directly from MS data. IntroSpect is freely available at https://github.com/BGI2016/IntroSpect.
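The two-stage idea in this abstract (learn a motif from confident hits, then restrict the database) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the fixed peptide length, the log-odds position weight matrix with a uniform background, the pseudocount, and the function names `build_pwm`, `score` and `targeted_database` are hypothetical and are not taken from IntroSpect itself.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pwm(peptides, pseudocount=1.0):
    """Log-odds position weight matrix from equal-length peptides.

    Assumes a uniform amino-acid background (an illustrative choice)."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("all peptides must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(p[pos] for p in peptides)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm

def score(pwm, peptide):
    """Sum of per-position log-odds scores for one peptide."""
    return sum(column[aa] for column, aa in zip(pwm, peptide))

def targeted_database(proteins, pwm, threshold):
    """Keep every protein subsequence whose motif score reaches the threshold."""
    length = len(pwm)
    kept = set()
    for seq in proteins:
        for start in range(len(seq) - length + 1):
            window = seq[start:start + length]
            if all(aa in AMINO_ACIDS for aa in window) and score(pwm, window) >= threshold:
                kept.add(window)
    return kept

# Toy input standing in for high-confidence first-pass hits (illustrative only).
hits = ["ALDKATVLL", "KLDETGNSL", "SLYNTVATL", "GLCTLVAML"]
pwm = build_pwm(hits)
refined = targeted_database(["MKLAAAAAALQQ"], pwm, threshold=score(pwm, "KLAAAAAAL"))
```

In this sketch the threshold controls how strongly the refined search space shrinks relative to an unspecific digest; how the real tool chooses its cutoff is not stated in the abstract.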

2016 ◽  
Author(s):  
Fengchao Yu ◽  
Ning Li ◽  
Weichuan Yu

In computational proteomics, identification of peptides with an unlimited number of post-translational modification (PTM) types is a challenging task. The computational cost increases exponentially with respect to the number of modifiable amino acids and linearly with respect to the number of potential PTM types at each amino acid. The problem becomes intractable very quickly if we want to enumerate all possible modification patterns. Existing tools (e.g., MS-Alignment, ProteinProspector, and MODa) avoid enumerating modification patterns in database search by using an alignment-based approach to localize and characterize modified amino acids. However, due to the large search space and the PTM localization issue, the sensitivity of these tools is low. This paper proposes a novel method named PIPI to achieve PTM-invariant peptide identification. PIPI first codes peptide sequences into Boolean vectors and converts experimental spectra into real-valued vectors. Then, it finds the top 10 peptide-coded vectors for each spectrum-coded vector. After that, PIPI uses a dynamic programming algorithm to localize and characterize modified amino acids. Simulations and real data experiments have shown that PIPI outperforms existing tools by identifying more peptide-spectrum matches (PSMs) and reporting fewer false positives. It also runs much faster than existing tools when the database is large.
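The coding-and-ranking step described above can be sketched as follows. The abstract does not say how peptides or spectra are coded, so this sketch assumes a Boolean bag of amino-acid 2-mers for peptides and a dictionary of hypothetical tag confidences for the spectrum; `code_peptide`, `code_spectrum` and `top_candidates` are illustrative names, and the dynamic programming localization step is not shown.

```python
import heapq
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
K = 2  # tag length; an assumption of this sketch
INDEX = {"".join(p): i for i, p in enumerate(product(AMINO_ACIDS, repeat=K))}

def code_peptide(peptide):
    """Boolean vector marking which K-mers occur in the peptide."""
    vec = [False] * len(INDEX)
    for start in range(len(peptide) - K + 1):
        slot = INDEX.get(peptide[start:start + K])
        if slot is not None:
            vec[slot] = True
    return vec

def code_spectrum(tag_scores):
    """Real-valued vector from {tag: confidence}, as if tags were read off a spectrum."""
    vec = [0.0] * len(INDEX)
    for tag, confidence in tag_scores.items():
        vec[INDEX[tag]] = max(vec[INDEX[tag]], confidence)
    return vec

def top_candidates(spectrum_vec, peptide_vecs, n=10):
    """Names of the n peptides whose Boolean code best overlaps the spectrum code."""
    def similarity(item):
        _, peptide_vec = item
        return sum(w for w, bit in zip(spectrum_vec, peptide_vec) if bit)
    ranked = heapq.nlargest(n, peptide_vecs.items(), key=similarity)
    return [peptide for peptide, _ in ranked]

database = {p: code_peptide(p) for p in ["PEPTIDE", "SAMPLER", "ELVISLIVES"]}
spectrum = code_spectrum({"PE": 0.9, "TI": 0.8, "DE": 0.5})
best = top_candidates(spectrum, database, n=1)
```

Because the peptide code ignores modification masses entirely, the ranking does not require enumerating modification patterns, which is the property the abstract calls PTM-invariant.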


2020 ◽  
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

Semi-supervised machine learning post-processors critically improve peptide identification in shotgun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, deep learning models, with their powerful training capabilities, have outperformed shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.
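The phrase "identified peptides across q-value thresholds" refers to counting accepted PSMs at a given estimated false discovery rate. A minimal sketch of the standard target-decoy estimate is given below; it is not the deep learning model of this paper, and the input format (a list of `(score, is_decoy)` pairs) and the function names are assumptions of the sketch.

```python
def q_values(psms):
    """Return (score, is_decoy, q_value) tuples sorted by descending score.

    psms is a list of (score, is_decoy) pairs, where higher scores are better."""
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among targets accepted at this score threshold.
        fdrs.append(decoys / max(targets, 1))
    # The q-value is the smallest FDR at this threshold or any more permissive one.
    q_vals = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        q_vals.append(running_min)
    q_vals.reverse()
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, q_vals)]

def accepted_targets(psms, threshold=0.01):
    """Number of target PSMs whose q-value is at or below the threshold."""
    return sum(1 for _, is_decoy, q in q_values(psms) if not is_decoy and q <= threshold)

toy_psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
```

A post-processor that rescores PSMs so that targets separate better from decoys increases `accepted_targets` at a fixed threshold, which is the kind of gain such papers report.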


2019 ◽  
Vol 9 (14) ◽  
pp. 2902
Author(s):  
Stan McClellan ◽  
Damian Valles ◽  
George Koutitas

A feedback-based architecture is presented for the distribution grid which enables the use of Machine Learning (ML) techniques for various applications, including Dynamic Voltage Optimization (DVO) and Demand Response (DR). In this architecture, sensor devices are resident on the distribution grid and therefore have a unique awareness of multiple system parameters. This enables the use of ongoing ML techniques for implementation of critical applications in the Smart Grid. Monitoring devices are placed at the endpoints and monitoring/control devices are placed along the power line on various types of grid-resident systems. Because the devices are grid-resident and interact directly with other devices on the same physical link, applications such as ML-assisted DVO can be targeted with very high confidence.


2020 ◽  
Vol 11 (1) ◽  
Author(s):  
Pan Fang ◽  
Yanlong Ji ◽  
Ivan Silbern ◽  
Carmen Doebele ◽  
Momchil Ninov ◽  
...  

Regulation of protein N-glycosylation is essential in human cells. However, large-scale, accurate, and site-specific quantification of glycosylation is still technically challenging. We here introduce SugarQuant, an integrated mass spectrometry-based pipeline comprising protein aggregation capture (PAC)-based sample preparation, multi-notch MS3 acquisition (Glyco-SPS-MS3) and a data-processing tool (GlycoBinder) that enables confident identification and quantification of intact glycopeptides in complex biological samples. PAC significantly reduces sample-handling time without compromising sensitivity. Glyco-SPS-MS3 combines high-resolution MS2 and MS3 scans, resulting in enhanced reporter signals of isobaric mass tags, improved detection of N-glycopeptide fragments, and lowered interference in multiplexed quantification. GlycoBinder enables streamlined processing of Glyco-SPS-MS3 data, followed by a two-step database search, which increases the identification rates of glycopeptides by 22% compared with conventional strategies. We apply SugarQuant to identify and quantify more than 5,000 unique glycoforms in Burkitt’s lymphoma cells, and determine with high confidence the site-specific glycosylation changes that occur upon inhibition of fucosylation.


2014 ◽  
Vol 13 (12) ◽  
pp. 3663-3673 ◽  
Author(s):  
Xusheng Wang ◽  
Yuxin Li ◽  
Zhiping Wu ◽  
Hong Wang ◽  
Haiyan Tan ◽  
...  

2019 ◽  
Author(s):  
Muhammad Haseeb ◽  
Muaaz G. Awan ◽  
Alexander S. Cadigan ◽  
Fahad Saeed

The most commonly used strategy for peptide identification in shotgun LC-MS/MS proteomics involves searching MS/MS data against an in-silico digested protein sequence database. Typically, the digested peptide sequences are indexed in memory to allow faster search times. However, subjecting a database to post-translational modifications (PTMs) during digestion results in an exponential increase in the number of peptides and therefore in memory consumption. This limits the usage of existing fragment-ion based open-search algorithms for databases with several PTMs. In this paper, we propose a novel fragment-ion indexing technique which is analogous to suffix array transformation and allows constant-time querying of indexed ions. We extend our transformation method, called SLM-Transform, by constructing ion buckets that allow querying of all indexed ions by mass while storing only information on the distribution of ion frequencies within buckets. The stored information is used with a regression technique to locate the position of ions in constant time. Moreover, the number of theoretical b- and y-ions generated and indexed for each theoretical spectrum is limited. Our results show that SLM-Transform allows indexing of up to 4x more peptides than other leading fragment-ion based database search tools within the same memory constraints. We show that an SLM-Transform based index allows indexing of over 83 million peptides within 26GB RAM, compared to the 80GB required by MSFragger. Finally, we show the constant ion retrieval time of the SLM-Transform based index, which allows ultrafast peptide search speeds. Source code will be made available at: https://github.com/pcdslab/slmindex
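The bucketed lookup idea can be illustrated with a much simpler structure than the one the abstract describes. The sketch below is a plain counting-sort style index: ions are grouped by discretized mass, and a prefix-sum array gives the start of each bucket so that a mass query needs only two array lookups. It stores one offset per bucket rather than a frequency distribution with regression, so it is not SLM-Transform; the bin width, `max_mass`, and all function names are assumptions.

```python
BINS_PER_DALTON = 100  # 0.01 Da buckets; an assumption of this sketch

def mass_bin(mass):
    """Discretize a fragment mass into an integer bucket number."""
    return int(round(mass * BINS_PER_DALTON))

def build_index(ions, max_mass=2000.0):
    """Build (offsets, slots) from (mass, peptide_id) pairs with 0 <= mass <= max_mass."""
    n_bins = mass_bin(max_mass) + 1
    offsets = [0] * (n_bins + 1)
    for mass, _ in ions:
        offsets[mass_bin(mass) + 1] += 1
    # Prefix sums: offsets[b] becomes the number of ions in buckets below b.
    for b in range(1, n_bins + 1):
        offsets[b] += offsets[b - 1]
    slots = [None] * len(ions)
    cursor = offsets[:-1]  # next free slot in each bucket (slice makes a copy)
    for mass, peptide_id in ions:
        b = mass_bin(mass)
        slots[cursor[b]] = peptide_id
        cursor[b] += 1
    return offsets, slots

def query(index, mass, tolerance=0.01):
    """Peptide ids of all indexed ions within the mass tolerance (bucket granularity)."""
    offsets, slots = index
    n_bins = len(offsets) - 1
    lo = max(mass_bin(mass - tolerance), 0)
    hi = min(mass_bin(mass + tolerance), n_bins - 1)
    if lo > hi:
        return []
    return slots[offsets[lo]:offsets[hi + 1]]

toy_ions = [(175.12, 0), (175.12, 1), (262.15, 0), (500.30, 2)]
toy_index = build_index(toy_ions)
```

Locating a bucket costs the same regardless of how many ions are indexed, which is the constant-time property the abstract emphasizes; the memory savings it reports come from the more compact representation that this sketch does not attempt.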


2013 ◽  
Vol 1 ◽  
pp. 327-340 ◽  
Author(s):  
Arianna Bisazza ◽  
Marcello Federico

Defining the reordering search space is a crucial issue in phrase-based SMT between distant languages. In fact, the optimal trade-off between accuracy and complexity of decoding is nowadays reached by harshly limiting the input permutation space. We propose a method to dynamically shape this space and thus capture long-range word movements without hurting translation quality or decoding time. The space defined by loose reordering constraints is dynamically pruned through a binary classifier that predicts whether a given input word should be translated right after another. The integration of this model into a phrase-based decoder improves a strong Arabic-English baseline already including state-of-the-art early distortion cost (Moore and Quirk, 2007) and hierarchical phrase orientation models (Galley and Manning, 2008). Significant improvements in the reordering of verbs are achieved by a system that is notably faster than the baseline, while BLEU and METEOR remain stable, or even increase, at a very high distortion limit.
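The pruning step described above can be sketched as a filter over the positions a decoder may jump to next. This is only an illustration under stated assumptions: `follow_prob` stands in for the binary classifier and is a hypothetical callable, the 0.5 threshold is arbitrary, and always keeping the monotone continuation is a choice of this sketch rather than something the abstract states.

```python
def allowed_next_words(n_words, last, covered, distortion_limit, follow_prob, threshold=0.5):
    """Source positions that may be translated right after position `last`.

    follow_prob(i, j) is a hypothetical classifier score for "j follows i"."""
    candidates = []
    for j in range(n_words):
        if j in covered:
            continue
        jump = abs(j - (last + 1))
        if jump > distortion_limit:
            continue  # outside the loose reordering constraint
        # Sketch assumption: monotone continuation is always kept,
        # any other jump must be approved by the classifier.
        if jump == 0 or follow_prob(last, j) >= threshold:
            candidates.append(j)
    return candidates

def toy_classifier(i, j):
    """Stand-in scores: only the long jump from position 0 to 5 is likely."""
    return 0.9 if (i, j) == (0, 5) else 0.1

loose = allowed_next_words(8, last=0, covered={0}, distortion_limit=6, follow_prob=toy_classifier)
tight = allowed_next_words(8, last=0, covered={0}, distortion_limit=3, follow_prob=toy_classifier)
```

With a high distortion limit the classifier leaves only a few candidate jumps per step, which is how a system of this kind can allow long-range movements without a matching growth in decoding time.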

