Cooperativity within proximal phosphorylation sites is revealed from large-scale proteomics data

Abstract: Data-independent acquisition (DIA), exemplified by sequential window acquisition of all theoretical mass spectra (SWATH-MS), provides robust quantitative proteomics data, but the lack of a public spectral library for primary human T-cells is a current resource gap. Here, we report the generation of a high-quality spectral library containing data for 4,833 distinct proteins from human T-cells across genetically unrelated donors, covering ~24% of the UniProt/SwissProt reviewed human proteome. SWATH-MS analysis of 18 primary T-cell samples using the new human T-cell spectral library reliably identified and quantified 2,850 proteins at a 1% false discovery rate (FDR). In comparison, the larger Pan-human spectral library identified and quantified 2,794 T-cell proteins in the same dataset. Because the two libraries identified only partially overlapping sets of proteins, combining them resulted in the quantification of 4,078 human T-cell proteins. Collectively, this large data archive will be a useful public resource for human T-cell proteomic studies. The human T-cell library is available at SWATHAtlas, and the data are available via ProteomeXchange (PXD019446 and PXD019542) and PeptideAtlas (PASS01587).
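
The protein counts reported above can be cross-checked with simple set arithmetic. This sketch is illustrative only. It assumes that "combining the two libraries" yields exactly the union of the two single-library result sets. The abstract does not say this, and a merged library could also recover proteins that neither library found alone. The sketch also uses a hypothetical round figure of 20,000 reviewed human entries to reproduce the ~24% coverage estimate.

```python
# Protein counts as reported in the abstract.
TCELL_LIBRARY_SIZE = 4_833   # distinct proteins in the T-cell spectral library
TCELL_QUANTIFIED = 2_850     # quantified with the T-cell library (1% FDR)
PANHUMAN_QUANTIFIED = 2_794  # quantified with the Pan-human library
COMBINED_QUANTIFIED = 4_078  # quantified after combining both libraries

# Hypothetical round figure for the reviewed human proteome (an assumption).
ASSUMED_REVIEWED_HUMAN = 20_000


def implied_overlap(n_a: int, n_b: int, n_union: int) -> int:
    """Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.

    Valid only if the combined result equals the union of the two
    single-library results (an assumption, not stated in the source).
    """
    return n_a + n_b - n_union


def coverage(n_library: int, n_reference: int) -> float:
    """Fraction of a reference proteome represented in a library."""
    return n_library / n_reference


overlap = implied_overlap(TCELL_QUANTIFIED, PANHUMAN_QUANTIFIED, COMBINED_QUANTIFIED)
print(overlap)  # 1566 proteins quantified by both libraries, under the union assumption
print(f"{coverage(TCELL_LIBRARY_SIZE, ASSUMED_REVIEWED_HUMAN):.1%}")  # 24.2%
```

Under the union assumption, roughly 1,566 proteins were quantified by both libraries. If the merged library recovered additional proteins, the true overlap would be larger.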


Identification of AMPK Phosphorylation Sites Reveals a Network of Proteins Involved in Cell Invasion and Facilitates Large-Scale Substrate Prediction

Cell Metabolism

10.1016/j.cmet.2015.09.009

2015

Vol 22 (5)

pp. 907-921

Cited By ~ 92

Author(s):

Bethany E. Schaffer

Rebecca S. Levin

Nicholas T. Hertz

Travis J. Maures

Michael L. Schoof

...

Keyword(s):

Cell Invasion

Large Scale

Phosphorylation Sites

AMPK Phosphorylation

Substrate Prediction


Denoising large-scale biological data using network filters

10.21203/rs.3.rs-66071/v2

2021

Author(s):

Andrew J Kavran

Aaron Clauset

Keyword(s):

Large Scale

Synthetic Data

Interaction Network

Learning Task

Biological Data

Data Sets

Proteomics Data

Life History Variation

Wide Range

Underlying Processes

Abstract Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may first be decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy by up to 43% compared with using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogeneous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters for applications in systems biology.
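
The core idea of a network filter can be illustrated with a minimal mean filter. This is a simplified sketch, not the authors' implementation. It assumes the measurements are centered (for example z-scores), so that flipping the sign of an anti-correlated neighbor places it on the same scale as the node being filtered. The function name and the edge-sign dictionary are hypothetical.

```python
from statistics import mean


def network_mean_filter(values, neighbors, signs=None):
    """Replace each node's value by the mean over itself and its neighbors.

    values:    dict node -> centered measurement (e.g. a z-score)
    neighbors: dict node -> iterable of adjacent nodes in the interaction network
    signs:     optional dict (u, v) -> +1 for correlated or -1 for anti-correlated
               edges; edges absent from the dict are treated as correlated (+1).
    """
    signs = signs or {}
    filtered = {}
    for node, x in values.items():
        contributions = [x]
        for nb in neighbors.get(node, ()):
            # Look the edge up in either orientation; default to a positive edge.
            s = signs.get((node, nb), signs.get((nb, node), 1))
            contributions.append(s * values[nb])
        filtered[node] = mean(contributions)
    return filtered


values = {"a": 1.0, "b": 0.8, "c": -1.2}
neighbors = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
signs = {("a", "c"): -1}  # c is anti-correlated with a
print(network_mean_filter(values, neighbors, signs))
```

Here node c is pulled toward the negative of a's value rather than toward a itself, which is how this sketch accommodates anti-correlation. The module-wise variant described in the abstract would apply a filter like this separately within each partition of the network.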


MSpectraAI: a powerful platform for deciphering proteome profiling of multi-tumor mass spectrometry data by using deep neural networks

BMC Bioinformatics

10.1186/s12859-020-03783-0

2020

Vol 21 (1)

Author(s):

Shisheng Wang

Hongwen Zhu

Hu Zhou

Jingqiu Cheng

Hao Yang

Keyword(s):

Mass Spectrometry

Neural Networks

Large Scale

Deep Neural Networks

Spectral Feature

Mass Spectrometry Data

Learning Approaches

Proteomics Data

Proteome Profiling

Analytical Technique

Abstract Background: Mass spectrometry (MS) has become a promising analytical technique for acquiring proteomics information to characterize biological samples. Nevertheless, most studies focus on the final proteins identified through a suite of algorithms that compare partial MS spectra with a sequence database, while pattern recognition and classification of raw mass-spectrometric data remain unresolved. Results: We developed an open-source, comprehensive platform, named MSpectraAI, for analyzing large-scale MS data through deep neural networks (DNNs); the system covers spectral-feature swath extraction, classification, and visualization. Moreover, the platform allows users to create their own DNN models using Keras. To evaluate this tool, we collected publicly available proteomics datasets of six tumor types (7,997,805 mass spectra in total) from the ProteomeXchange consortium and classified the samples based on their spectral profiles. The results suggest that MSpectraAI can distinguish different types of samples based on their fingerprint spectra and achieves better prediction accuracy at the MS1 level (average 0.967). Conclusion: This study deciphers the proteome profiles of raw mass spectrometry data and broadens the application of deep learning methods to the classification and prediction of proteomics data from multi-tumor samples. MSpectraAI also outperforms other classical machine learning approaches.
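
Before raw spectra can be fed to a DNN, each spectrum has to be turned into a fixed-length numeric vector. The abstract does not describe how MSpectraAI's spectral-feature extraction works. The following is therefore only a generic, hypothetical binning step for MS1 spectra. The m/z range, bin width, and max-normalization are illustrative assumptions.

```python
import math


def bin_spectrum(peaks, mz_min=400.0, mz_max=1600.0, bin_width=1.0):
    """Convert (m/z, intensity) peaks into a fixed-length, max-normalized vector.

    Peaks outside [mz_min, mz_max) are dropped, and intensities that fall into
    the same bin are summed. All parameter defaults are illustrative assumptions.
    """
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Guard against floating-point rounding at the upper edge.
            index = min(int((mz - mz_min) // bin_width), n_bins - 1)
            vector[index] += intensity
    top = max(vector)
    return [v / top for v in vector] if top > 0 else vector


spectrum = [(400.2, 10.0), (400.7, 30.0), (1000.5, 20.0), (1700.0, 99.0)]
features = bin_spectrum(spectrum)
print(len(features), features[0], features[600])  # 1200 1.0 0.5
```

Vectors produced this way could be stacked into a matrix and passed to any classifier, including a Keras model. The bin width trades spectral resolution against vector length.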


Chromosome-level de novo assembly of the pig-tailed macaque genome using linked-read sequencing and HiC proximity scaffolding

GigaScience

10.1093/gigascience/giaa069

2020

Vol 9 (7)

Author(s):

Morteza Roodgar

Afshin Babveyh

Lan H Nguyen

Wenyu Zhou

Rahul Sinha

...

Keyword(s):

Large Scale

De Novo

Innate Immune

Human Diseases

Macaca nemestrina

Proximity Ligation Assay

Immune Gene

Proteomics Data

Macaque Species

Chromosome Level

Abstract Background: Macaque species share >93% genome homology with humans and develop many disease phenotypes similar to those of humans, making them valuable animal models for the study of human diseases (e.g., HIV and neurodegenerative diseases). However, the quality of genome assembly and annotation for several macaque species lags behind the human genome effort. Results: To close this gap and enhance functional genomics approaches, we combined de novo linked-read assembly with proximity ligation assay (HiC) scaffolding to assemble the pig-tailed macaque (Macaca nemestrina) genome. This combinatorial method yielded chromosome-level scaffolds with a scaffold N50 of 127.5 Mb; the 23 largest scaffolds covered 90% of the entire genome. This assembly revealed large-scale rearrangements between pig-tailed macaque chromosomes 7, 12, and 13 and human chromosomes 2, 14, and 15. We subsequently annotated the genome using transcriptome and proteomics data from personalized induced pluripotent stem cells derived from the same animal. Reconstruction of the evolutionary tree using whole-genome annotation and orthologous comparisons among 3 macaque species, human, and mouse genomes revealed extensive homology between human and pig-tailed macaques with regard to both pluripotent stem cell genes and innate immune gene pathways. Our results confirm that rhesus and cynomolgus macaques are more closely related to each other than either species is to humans or pig-tailed macaques. Conclusions: These findings demonstrate that pig-tailed macaques can serve as an excellent animal model for the study of many human diseases, particularly with regard to pluripotency and innate immune pathways.
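
The scaffold N50 and the "largest scaffolds covering 90%" figures quoted above are standard assembly statistics. A minimal sketch of both follows. The scaffold lengths in the example are made up for illustration and do not come from the macaque assembly.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L hold at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be a non-empty list of positive lengths")


def count_largest_covering(lengths, fraction):
    """Number of largest scaffolds needed to cover `fraction` of the assembly."""
    total = sum(lengths)
    running = 0
    for i, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= fraction * total:
            return i
    raise ValueError("fraction must be in (0, 1] and lengths non-empty")


toy_lengths_mb = [100, 80, 50, 30, 20, 10]  # hypothetical scaffold lengths in Mb
print(n50(toy_lengths_mb))                          # 80
print(count_largest_covering(toy_lengths_mb, 0.9))  # 5
```

Both statistics sort scaffolds from longest to shortest and accumulate their lengths. They differ only in the stopping threshold (half of the total for N50, the chosen fraction for the coverage count).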


Connecting Histopathology Imaging and Proteomics in Kidney Cancer through Machine Learning

Journal of Clinical Medicine

10.3390/jcm8101535

2019

Vol 8 (10)

pp. 1535

Cited By ~ 7

Author(s):

Francisco Azuaje

Sang-Yoon Kim

Daniel Perez Hernandez

Gunnar Dittmar

Keyword(s):

Machine Learning

Large Scale

Diagnostic Value

Classification Model

Clinical Approach

Proteomics Data

Cell Renal Cell Carcinoma

Molecular Features

Genes Encoding

New Research

Proteomics data encode molecular features of diagnostic value and accurately reflect key underlying biological mechanisms in cancers. Histopathology imaging is a well-established clinical approach to cancer diagnosis. The predictive relationship between large-scale proteomics and H&E-stained histopathology images remains largely uncharacterized. Here we investigate such associations through the application of machine learning, including deep neural networks, to proteomics and histology imaging datasets generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) from clear cell renal cell carcinoma patients. We report robust correlations between a set of diagnostic proteins and predictions generated by an imaging-based classification model. Proteins significantly correlated with the histology-based predictions are significantly implicated in immune responses, extracellular matrix reorganization, and metabolism. Moreover, we show that the genes encoding these proteins also reliably recapitulate the biological associations with imaging-derived predictions, owing to strong gene–protein expression correlations. Our findings offer novel insights into the integrative modeling of histology and omics data through machine learning and provide a methodological basis for new research opportunities in this and other cancer types.
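
The association between protein abundance and the imaging model's output can be quantified per protein with a correlation coefficient. The abstract does not specify the statistic, so this sketch assumes Pearson correlation across patients. The protein identifiers and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def rank_by_correlation(abundance, predictions):
    """Sort proteins by |r| between their abundance and per-patient predictions."""
    scores = {p: pearson(vals, predictions) for p, vals in abundance.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical data: image-model scores and protein abundances for five patients.
predictions = [0.1, 0.4, 0.35, 0.8, 0.9]
abundance = {
    "PROT_A": [1.0, 2.1, 1.9, 3.8, 4.2],
    "PROT_B": [3.0, 2.9, 3.1, 3.0, 2.8],
}
for protein, r in rank_by_correlation(abundance, predictions):
    print(protein, round(r, 3))
```

Ranking by the absolute value keeps strongly anti-correlated proteins alongside positively correlated ones. A real analysis would also need a significance test with multiple-testing correction.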


Large-Scale Identification and Evolution Indexing of Tyrosine Phosphorylation Sites from Murine Brain

Journal of Proteome Research

10.1021/pr0701254

2008

Vol 7 (1)

pp. 311-318

Cited By ~ 122

Author(s):

Bryan A. Ballif

G. Richard Carey

Shamil R. Sunyaev

Steven P. Gygi

Keyword(s):

Tyrosine Phosphorylation

Large Scale

Phosphorylation Sites

Murine Brain


Large-scale metabolome analysis and quantitative integration with genomics and proteomics data in Mycoplasma pneumoniae

Molecular BioSystems

10.1039/c3mb70113a

2013

Vol 9 (7)

pp. 1743

Cited By ~ 17

Author(s):

Tobias Maier

Josep Marcos

Judith A. H. Wodke

Bernhard Paetzold

Manuel Liebeke

...

Keyword(s):

Mycoplasma pneumoniae

Large Scale

Proteomics Data

Metabolome Analysis

Genomics And Proteomics

Quantitative Integration


PolySTest: Robust statistical testing of proteomics data with missing values improves detection of biologically relevant features

10.1101/765818

2019

Cited By ~ 1

Author(s):

Veit Schwämmle

Christina E Hagensen

Adelina Rogowska-Wrzesinska

Ole N. Jensen

Keyword(s):

Mass Spectrometry

Large Scale

Missing Values

Statistical Tests

Ground Truth

Statistical Testing

Molecular Networks

Proteomics Data

Biologically Relevant

Data Browsing

Abstract: Statistical testing remains one of the main challenges for high-confidence detection of differentially regulated proteins or peptides in large-scale quantitative proteomics experiments by mass spectrometry. Statistical tests need to be sufficiently robust to deal with experiment-intrinsic data structures and variations, and often also with reduced feature coverage across biological samples due to ubiquitous missing values. A robust statistical test provides accurate confidence scores for large-scale proteomics results, regardless of instrument platform, experimental protocol, and software tools. However, the multitude of possible combinations of experimental strategies, mass spectrometry techniques, and informatics methods complicates the choice of appropriate statistical approaches. We address this challenge by introducing PolySTest, a user-friendly web service for statistical testing, data browsing, and data visualization. We introduce a new method, Miss Test, that simultaneously tests for missingness and feature abundance, thereby complementing common statistical tests by rescuing otherwise discarded data features. We demonstrate that PolySTest with integrated Miss Test achieves higher confidence and higher sensitivity for artificial and experimental proteomics data sets with known ground truth. Application of PolySTest to mass spectrometry-based large-scale proteomics data obtained from differentiating muscle cells resulted in the rescue of 10%-20% additional proteins in the identified molecular networks relevant to muscle differentiation. We conclude that PolySTest is a valuable addition to existing tools and instrument enhancements that improve the coverage and depth of large-scale proteomics experiments. A fully functional demo version of PolySTest and Miss Test is available via http://computproteomics.bmb.sdu.dk/Apps/PolySTest.
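
The abstract does not specify PolySTest's Miss Test, so it is not reproduced here. As a stand-in, the sketch below illustrates the general idea of treating missingness as a signal in its own right. It applies a standard two-sided Fisher's exact test to the counts of observed versus missing values for one feature in two sample groups. The function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one;
    # the small relative tolerance absorbs floating-point ties.
    p = sum(
        table_prob(x)
        for x in range(lo, hi + 1)
        if table_prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p)


def missingness_test(group_a, group_b):
    """Test whether a feature is missing more often in one group (None = missing)."""
    obs_a = sum(v is not None for v in group_a)
    obs_b = sum(v is not None for v in group_b)
    return fisher_exact_two_sided(obs_a, len(group_a) - obs_a,
                                  obs_b, len(group_b) - obs_b)


# Hypothetical feature: quantified in all six control samples, never in the treated ones.
control = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
treated = [None] * 6
print(round(missingness_test(control, treated), 4))  # 0.0022
```

A conventional t-test would drop this feature, because the treated group has no abundance values to compare; counting missingness recovers it. Unlike this sketch, the Miss Test is described as testing missingness and feature abundance simultaneously.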


FLEXIQuant-LF to quantify protein modification extent in label-free proteomics data

eLife

10.7554/elife.58783

2020

Vol 9

Author(s):

Christoph N Schlaffner

Konstantin Kahnert

Jan Muntel

Ruchi Chauhan

Bernhard Y Renard

...

Keyword(s):

Protein Modification

Large Scale

Software Tool

Label Free

Anaphase Promoting Complex

Proteomics Data

Single Experiment

Post Translational Modifications

Data Independent Acquisition

Modified Peptides

Improvements in LC-MS/MS methods and technology have enabled the identification of thousands of modified peptides in a single experiment. However, protein regulation by post-translational modifications (PTMs) is not binary, making methods to quantify the modification extent crucial to understanding the role of PTMs. Here, we introduce FLEXIQuant-LF, a software tool for large-scale identification of differentially modified peptides and quantification of their modification extent without knowledge of the types of modifications involved. We developed FLEXIQuant-LF using label-free quantification of unmodified peptides and robust linear regression to quantify the modification extent of peptides. As proof of concept, we applied FLEXIQuant-LF to data-independent acquisition (DIA) data of the anaphase promoting complex/cyclosome (APC/C) during mitosis. The unbiased FLEXIQuant-LF approach for assessing modification extent in quantitative proteomics data provides a better understanding of the function and regulation of PTMs. The software is available at https://github.com/SteenOmicsLab/FLEXIQuantLF.
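
The principle of quantifying modification extent from unmodified peptides can be sketched for a single sample. The abstract states only that FLEXIQuant-LF uses label-free intensities of unmodified peptides and robust linear regression. Here a median of intensity ratios stands in as a simple robust slope through the origin, which may differ from the estimator the tool actually uses. The peptide names, the single reference sample, and the clipping at zero are all assumptions.

```python
from statistics import median


def modification_extent(sample, reference):
    """Estimate per-peptide modification extent for one sample (simplified sketch).

    sample, reference: dict peptide -> intensity of the *unmodified* peptide.
    A robust slope through the origin (median of sample/reference ratios) models
    the expected intensity; a peptide whose unmodified form falls below that line
    is assumed to be partially modified by the missing fraction.
    """
    shared = [p for p in sample if reference.get(p, 0) > 0]
    if not shared:
        raise ValueError("no peptides with positive reference intensity")
    slope = median(sample[p] / reference[p] for p in shared)
    if slope <= 0:
        raise ValueError("robust slope must be positive")
    extent = {}
    for p in shared:
        ratio = sample[p] / (slope * reference[p])  # observed / expected intensity
        extent[p] = max(0.0, 1.0 - ratio)  # ratios above 1 are clipped to 0
    return extent


reference = {"PEP1": 100.0, "PEP2": 200.0, "PEP3": 300.0, "PEP4": 400.0, "PEP5": 500.0}
sample = {"PEP1": 200.0, "PEP2": 400.0, "PEP3": 600.0, "PEP4": 400.0, "PEP5": 1000.0}
print(modification_extent(sample, reference))  # PEP4 -> 0.5, all others -> 0.0
```

The slope is estimated from all peptides at once, so a robust estimator matters: a few heavily modified peptides should not drag the expected-intensity line down. The reference could also be built from several control samples rather than from one.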
