scholarly journals Accurate Identification of Deamidation and Citrullination from Global Shotgun Proteomics Data Using a Dual-Search Delta Score Strategy

2020 ◽  
Vol 19 (4) ◽  
pp. 1863-1872 ◽  
Author(s):  
Xi Wang ◽  
Adam C. Swensen ◽  
Tong Zhang ◽  
Paul D. Piehowski ◽  
Matthew J. Gaffrey ◽  
...  
2020 ◽  
Author(s):  
John T. Halloran ◽  
Gregor Urban ◽  
David Rocke ◽  
Pierre Baldi

AbstractSemi-supervised machine learning post-processors critically improve peptide identification of shot-gun proteomics data. Such post-processors accept the peptide-spectrum matches (PSMs) and feature vectors resulting from a database search, train a machine learning classifier, and recalibrate PSMs using the trained parameters, often yielding significantly more identified peptides across q-value thresholds. However, current state-of-the-art post-processors rely on shallow machine learning methods, such as support vector machines. In contrast, the powerful training capabilities of deep learning models have displayed superior performance to shallow models in an ever-growing number of other fields. In this work, we show that deep models significantly improve the recalibration of PSMs compared to the most accurate and widely-used post-processors, such as Percolator and PeptideProphet. Furthermore, we show that deep learning is able to adaptively analyze complex datasets and features for more accurate universal post-processing, leading to both improved Prosit analysis and markedly better recalibration of recently developed database-search functions.


2014 ◽  
Author(s):  
Zhiqiang Hu ◽  
Hamish S. Scott ◽  
Guangrong Qin ◽  
Guangyong Zheng ◽  
Xixia Chu ◽  
...  

Biological and biomedical research relies on comprehensive understanding of protein-coding transcripts. However, the total number of human proteins is still unknown due to the prevalence of alternative splicing and is much larger than the number of human genes. In this paper, we detected 31,566 novel transcripts with coding potential by filtering our ab initio predictions with 50 RNA-seq datasets from diverse tissues/cell lines. PCR followed by MiSeq sequencing showed that at least 84.1% of these predicted novel splice sites could be validated. In contrast to known transcripts, the expression of these novel transcripts were highly tissue-specific. Based on these novel transcripts, at least 36 novel proteins were detected from shotgun proteomics data of 41 breast samples. We also showed L1 retrotransposons have a more significant impact on the origin of new transcripts/genes than previously thought. Furthermore, we found that alternative splicing is extraordinarily widespread for genes involved in specific biological functions like protein binding, nucleoside binding, neuron projection, membrane organization and cell adhesion. In the end, the total number of human transcripts with protein-coding potential was estimated to be at least 204,950.


2018 ◽  
Author(s):  
Uri Keich ◽  
Kaipo Tamura ◽  
William Stafford Noble

AbstractDecoy database search with target-decoy competition (TDC) provides an intuitive, easy-to-implement method for estimating the false discovery rate (FDR) associated with spectrum identifications from shotgun proteomics data. However, the procedure can yield different results for a fixed dataset analyzed with different decoy databases, and this decoy-induced variability is particularly problematic for smaller FDR thresholds, datasets or databases. In such cases, the nominal FDR might be 1% but the true proportion of false discoveries might be 10%. The averaged TDC protocol combats this problem by exploiting multiple independently shuffled decoy databases to provide an FDR estimate with reduced variability. We provide a tutorial introduction to aTDC, describe an improved variant of the protocol that offers increased statistical power, and discuss how to deploy aTDC in practice using the Crux software toolkit.


2020 ◽  
Author(s):  
D.C.L. Handler ◽  
P.A. Haynes

AbstractAssessment of replicate quality is an important process for any shotgun proteomics experiment. One fundamental question in proteomics data analysis is whether any specific replicates in a set of analyses are biasing the downstream comparative quantitation. In this paper, we present an experimental method to address such a concern. PeptideMind uses a series of clustering Machine Learning algorithms to assess outliers when comparing proteomics data from two states with six replicates each. The program is a JVM native application written in the Kotlin language with Python sub-process calls to scikit-learn. By permuting the six data replicates provided into four hundred triplet non redundant pairwise comparisons, PeptideMind determines if any one replicate is biasing the downstream quantitation of the states. In addition, PeptideMind generates useful visual representations of the spread of the significance measures, allowing researchers a rapid, effective way to monitor the quality of those identified proteins found to be differentially expressed between sample states.


2019 ◽  
Author(s):  
Nikita Prianichnikov ◽  
Heiner Koch ◽  
Scarlet Koch ◽  
Markus Lubeck ◽  
Raphael Heilig ◽  
...  

SummaryIon mobility can add a dimension to LC-MS based shotgun proteomics which has the potential to boost proteome coverage, quantification accuracy and dynamic range. Required for this is suitable software that extracts the information contained in the four-dimensional (4D) data space spanned by m/z, retention time, ion mobility and signal intensity. Here we describe the ion mobility enhanced MaxQuant software, which utilizes the added data dimension. It offers an end to end computational workflow for the identification and quantification of peptides, proteins and posttranslational modification sites in LC-IMS-MS/MS shotgun proteomics data. We apply it to trapped ion mobility spectrometry (TIMS) coupled to a quadrupole time-of-flight (QTOF) analyzer. A highly parallelizable 4D feature detection algorithm extracts peaks which are assembled to isotope patterns. Masses are recalibrated with a non-linear m/z, retention time, ion mobility and signal intensity dependent model, based on peptides from the sample. A new matching between runs (MBR) algorithm that utilizes collisional cross section (CCS) values of MS1 features in the matching process significantly gains specificity from the extra dimension. Prerequisite for using CCS values in MBR is a relative alignment of the ion mobility values between the runs. The missing value problem in protein quantification over many samples is greatly reduced by CCS aware MBR.MS1 level label-free quantification is also implemented which proves to be highly precise and accurate on a benchmark dataset with known ground truth. MaxQuant for LC-IMS-MS/MS is part of the basic MaxQuant release and can be downloaded from http://maxquant.org.


2020 ◽  
Vol 17 (9) ◽  
pp. 869-870 ◽  
Author(s):  
Felipe da Veiga Leprevost ◽  
Sarah E. Haynes ◽  
Dmitry M. Avtonomov ◽  
Hui-Yin Chang ◽  
Avinash K. Shanmugam ◽  
...  

BMC Genomics ◽  
2010 ◽  
Vol 11 (1) ◽  
Author(s):  
Machtelt Braaksma ◽  
Elena S Martens-Uzunova ◽  
Peter J Punt ◽  
Peter J Schaap

2018 ◽  
Vol 1 (1) ◽  
pp. 207-234 ◽  
Author(s):  
Pavel Sinitcyn ◽  
Jan Daniel Rudolph ◽  
Jürgen Cox

Computational proteomics is the data science concerned with the identification and quantification of proteins from high-throughput data and the biological interpretation of their concentration changes, posttranslational modifications, interactions, and subcellular localizations. Today, these data most often originate from mass spectrometry–based shotgun proteomics experiments. In this review, we survey computational methods for the analysis of such proteomics data, focusing on the explanation of the key concepts. Starting with mass spectrometric feature detection, we then cover methods for the identification of peptides. Subsequently, protein inference and the control of false discovery rates are highly important topics covered. We then discuss methods for the quantification of peptides and proteins. A section on downstream data analysis covers exploratory statistics, network analysis, machine learning, and multiomics data integration. Finally, we discuss current developments and provide an outlook on what the near future of computational proteomics might bear.


2007 ◽  
Vol 7 (4) ◽  
pp. 631-644 ◽  
Author(s):  
Norman Pavelka ◽  
Marjorie L. Fournier ◽  
Selene K. Swanson ◽  
Mattia Pelizzola ◽  
Paola Ricciardi-Castagnoli ◽  
...  

2014 ◽  
Vol 13 (6) ◽  
pp. 1552-1562 ◽  
Author(s):  
Yafeng Zhu ◽  
Lina Hultin-Rosenberg ◽  
Jenny Forshed ◽  
Rui M. M. Branca ◽  
Lukas M. Orre ◽  
...  

Sign in / Sign up

Export Citation Format

Share Document