Optimization of Spectral Library Size Improves DIA-MS Proteome Coverage

Abstract

Efficient peptide and protein identification from data-independent acquisition mass spectrometry (DIA-MS) data typically relies on an experiment-specific spectral library of suitable size. Here, we report a computational strategy that optimizes the spectral library for a specific DIA dataset. It starts from a comprehensive spectral library and is guided by an a priori analysis of that DIA dataset. Compared with the comprehensive pan-human library strategy, this strategy increased peptide identifications by up to 44.7% and protein identifications by up to 38.1% in a test dataset of six colorectal tumor samples. We further applied this strategy to 389 carcinoma samples from 15 tumor datasets and observed increases of up to 39.2% in peptide identifications and up to 19.0% in protein identifications. In summary, we present a computational strategy for optimizing spectral library size to achieve deeper proteome coverage of DIA-MS data.

A hybrid spectral library combining DIA-MS data and a targeted virtual library substantially deepens the proteome coverage

2020. DOI: 10.1101/2020.01.16.909952

Author(s): Ronghui Lou, Pan Tang, Kang Ding, Shanshan Li, Cuiping Tian, ...

Keyword(s): Mouse Brain, Protein Identification, Transmembrane Protein, Peptide Identification, Protein Family, Virtual Library, Proteomic Profiling, Spectral Library, Proteome Coverage, Brain Tissues

Abstract

Data-independent acquisition mass spectrometry (DIA-MS) is a rapidly evolving technique that enables relatively deep proteomic profiling with superior quantification reproducibility. DIA data mining relies predominantly on a spectral library with sufficient proteome coverage, which in most cases is built from data-dependent acquisition (DDA) analysis of the same sample. To expand proteome coverage for a predetermined protein family, we report the construction of a hybrid spectral library that supplements a DIA experiment-derived library with a protein family-targeted virtual library predicted by deep learning. Using this DIA hybrid library substantially deepens the coverage of three transmembrane protein families (G protein-coupled receptors, ion channels, and transporters) in mouse brain tissues. Protein identifications increase by 37-87% and peptide identifications by 58-161%. Moreover, of the 412 novel GPCR peptides identified exclusively with the DIA hybrid library strategy, 53.6% were validated as present in mouse brain tissues by orthogonal experimental measurement.
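
The abstract does not say how the two libraries are combined. The following is a minimal sketch of one way to supplement an experiment-derived library with predicted entries for a target protein family. It assumes that precursors are keyed by peptide sequence and charge, and that experimental spectra take precedence over predicted ones. The `LibraryEntry` schema and `build_hybrid_library` function are hypothetical and not necessarily the authors' procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """One precursor in a spectral library (hypothetical, simplified schema)."""
    peptide: str   # modified peptide sequence
    charge: int    # precursor charge state
    protein: str   # protein accession
    source: str    # "experimental" or "predicted"


def build_hybrid_library(experimental, virtual, target_proteins):
    """Supplement an experimental library with predicted entries for a protein family.

    Precursors are keyed by (peptide, charge). Every experimental entry is kept;
    a predicted entry is added only if it maps to one of target_proteins and its
    precursor is not already covered by the experimental library.
    """
    hybrid = {(e.peptide, e.charge): e for e in experimental}
    for entry in virtual:
        if entry.protein in target_proteins:
            # setdefault leaves an existing experimental entry untouched
            hybrid.setdefault((entry.peptide, entry.charge), entry)
    return list(hybrid.values())
```

Consider one experimental precursor and three predicted ones, where one duplicates the experimental precursor and one comes from a protein outside the target family. In that case the hybrid library holds two entries: the experimental spectrum plus one new family-specific precursor.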

DIA-Pipe: Identification and Quantification of Post-Translational Modifications using exclusively Data-Independent Acquisition

2017. DOI: 10.1101/141382

Author(s): Jesse G. Meyer, Sushanth Mukkamalla, Alexandria K. D’Souza, Alexey I. Nesvizhskii, Bradford W. Gibson, ...

Keyword(s): Peptide Identification, Software Tool, Label Free, Spectral Library, Automated Identification, Post Translational Modifications, Data Independent Acquisition, Sample Amount, Using Data, Identification And Quantification

Label-free quantification using data-independent acquisition (DIA) is a robust method for deep and accurate proteome quantification [1,2]. However, when no spectral library exists, as is often the case in studies of novel post-translational modifications (PTMs), samples are typically analyzed several times. One or more data-dependent acquisitions (DDA) generate a spectral library, and DIA is then used for quantification. This multi-injection analysis carries a significant cost in sample consumption and instrument time for each new PTM study. It may not be possible when sample amount is limited or when a study requires a large number of biological replicates. Recently developed software (e.g., DIA-Umpire) enables combined peptide identification and quantification from DIA data without any pre-existing spectral library [3,4]. These tools, however, are designed for protein-level quantification. Here we present a software tool and workflow that extends DIA-Umpire to allow automated identification and quantification of PTM peptides from DIA data. We accomplish this with DIA-Pipe, a custom, open-source graphical user interface (https://github.com/jgmeyerucsd/PIQEDia/releases/tag/v0.1.2) (Figure 1a).

Sample Size-Comparable Spectral Library Enhances Data-Independent Acquisition-Based Proteome Coverage of Low-Input Cells

Analytical Chemistry, 2021. DOI: 10.1021/acs.analchem.1c03477

Author(s): Asad Ali Siyal, Eric Sheng-Wen Chen, Hsin-Ju Chan, Reta Birhanu Kitata, Jhih-Ci Yang, ...

Keyword(s): Sample Size, Spectral Library, Proteome Coverage, Low Input, Data Independent Acquisition

High-pH reversed-phase fractionated neural retina proteome of normal growing C57BL/6 mouse

Scientific Data, Vol 8 (1), 2021. DOI: 10.1038/s41597-021-00813-1

Author(s): Ying Hon Sze, Qian Zhao, Jimmy Ka Wai Cheung, King Kit Li, Dennis Yan Yin Tse, ...

Keyword(s): Visual Information, Protein Identification, Wild Type Mouse, Reversed Phase, Mouse Retina, Spectral Library, High pH, Data Independent Acquisition, Proteomics Approach, Extensive Reference

Abstract

The retina is a key sensory tissue composed of multiple layers of cell populations that work coherently to process and decode visual information. Mass spectrometry-based proteomics has enabled high-throughput, untargeted protein identification, demonstrating the presence of retinal proteins and their involvement in biological signalling cascades. The comprehensive wild-type mouse retina proteome was prepared using a novel sample preparation approach, the suspension trapping (S-Trap) filter. It was then fractionated by high-pH reversed-phase chromatography, for a total of 28 injections. This data-dependent acquisition (DDA) approach, using a Sciex TripleTOF 6600 mass spectrometer, identified a total of 7,122 unique proteins (1% false discovery rate, FDR). It also generated a spectral library of 5,950 proteins in the normal C57BL/6 mouse retina. Data-independent acquisition (DIA) relies on a large, high-quality spectral library to analyse chromatograms. This spectral library therefore enables SWATH-MS acquisition for unbiased, multiplexed quantification of proteins in the mouse retina. It serves as the most extensive reference library for investigating retinal diseases in the C57BL/6 mouse model.
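
The abstract reports 7,122 proteins at 1% FDR but does not say how the FDR was estimated. A common approach is target-decoy estimation. The sketch below assumes it and works on generic scored identifications rather than proteins. The function names `q_values` and `accept_at_fdr` are hypothetical, and this is not necessarily the pipeline used in the study.

```python
def q_values(scores, is_decoy):
    """Target-decoy q-value for each identification (higher score = better).

    At each score cut-off the FDR is estimated as decoys / targets among the
    identifications scoring at or above it; the q-value is the smallest such
    FDR at which the identification would still be accepted. Tied scores are
    not treated specially in this sketch.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    fdr_at_rank = []
    targets = decoys = 0
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdr_at_rank.append(decoys / max(targets, 1))

    # Walk from the lowest score upward, keeping a running minimum FDR.
    q = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdr_at_rank[rank])
        q[order[rank]] = running_min
    return q


def accept_at_fdr(ids, scores, is_decoy, threshold=0.01):
    """Return target identifications whose q-value is at most threshold."""
    q = q_values(scores, is_decoy)
    return [ids[i] for i in range(len(ids)) if not is_decoy[i] and q[i] <= threshold]
```

The running minimum gives the q-value its monotonic behaviour. A lower-scoring identification never receives a smaller q-value than a higher-scoring one.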

Computational Optimization of Spectral Library Size Improves DIA-MS Proteome Coverage and Applications to 15 Tumors

Journal of Proteome Research, 2021. DOI: 10.1021/acs.jproteome.1c00640

Author(s): Weigang Ge, Xiao Liang, Fangfei Zhang, Yifan Hu, Luang Xu, ...

Keyword(s): Spectral Library, Library Size, Proteome Coverage, Computational Optimization

DeepPhospho accelerates DIA phosphoproteome profiling through in silico library generation

Nature Communications, Vol 12 (1), 2021. DOI: 10.1038/s41467-021-26979-1

Author(s): Ronghui Lou, Weizhen Liu, Rongjie Li, Shanshan Li, Xuming He, ...

Keyword(s): Data Mining, In Silico, Deep Neural Network, Learning Models, Spectral Library, Proteome Coverage, Data Independent Acquisition, User Access, Model Training, EGF Signaling

Abstract

Phosphoproteomics integrating data-independent acquisition (DIA) enables deep phosphoproteome profiling with improved quantification reproducibility and accuracy compared with data-dependent acquisition (DDA)-based phosphoproteomics. DIA data mining relies heavily on a spectral library that in most cases is built from DDA analysis of the same sample. Constructing this project-specific DDA library reduces analytical throughput, limits proteome coverage, and increases the amount of sample required for DIA phosphoproteomics. Herein we introduce DeepPhospho, a deep neural network that differs conceptually from previous deep learning models and achieves accurate predictions of LC-MS/MS data for phosphopeptides. By leveraging in silico libraries generated by DeepPhospho, we establish a DIA workflow for phosphoproteome profiling. It combines DIA data acquisition with data mining against DeepPhospho-predicted libraries, thus circumventing the need for DDA library construction. Our DeepPhospho-empowered workflow substantially expands phosphoproteome coverage while maintaining high quantification performance. In an EGF signaling study, this led to the discovery of more signaling pathways and regulated kinases than the DDA library-based approach. DeepPhospho is provided as a web server and as an offline app to facilitate user access to model training, prediction, and library generation.

DeepPhospho: Accelerate DIA phosphoproteome profiling by Deep Learning

2021. DOI: 10.21203/rs.3.rs-393214/v1

Author(s): Wenqing Shui, Ronghui Lou, Weizhen Liu, Rongjie Li, Shanshan Li, ...

Keyword(s): Neural Network, Data Mining, Deep Learning, Deep Neural Network, Learning Models, Spectral Library, Proteome Coverage, Data Independent Acquisition, User Access, EGF Signaling

Abstract

Phosphoproteomics integrating data-independent acquisition (DIA) has enabled deep phosphoproteome profiling with improved quantification reproducibility and accuracy compared with data-dependent acquisition (DDA)-based phosphoproteomics. DIA data mining relies heavily on a spectral library that in most cases is built from DDA analysis of the same sample. Constructing this project-specific DDA library reduces analytical throughput, limits proteome coverage, and increases the amount of sample required for DIA phosphoproteomics. Herein we introduce DeepPhospho, a novel deep neural network that differs conceptually from previous deep learning models and achieves accurate predictions of LC-MS/MS data for phosphopeptides. By leveraging in silico libraries generated by DeepPhospho, we established a new DIA workflow for phosphoproteome profiling. It combines DIA data acquisition with data mining against DeepPhospho-predicted libraries, thus circumventing the need for DDA library construction. Our DeepPhospho-empowered workflow substantially expanded phosphoproteome coverage while maintaining high quantification performance. In an EGF signaling study, this led to the discovery of more signaling pathways and regulated kinases than the DDA library-based approach. DeepPhospho is provided as a web server to facilitate user access to prediction and library generation.

Alpha-Frag: a deep neural network for fragment presence prediction improves peptide identification by data independent acquisition mass spectrometry

2021. DOI: 10.1101/2021.04.07.438629

Author(s): Jian Song, Fangfei Zhang, Changbin Yu

Keyword(s): Neural Network, Mass Spectrometry, Deep Neural Network, Peptide Identification, Prediction Performance, Fragmentation Mechanism, Statistical Validation, Test Dataset, Data Independent Acquisition, Fragment Ions

Abstract

Motivation: Identification of peptides in data-independent acquisition (DIA) mass spectrometry (MS) typically relies on scoring peak groups in the extracted chromatograms of fragment ions. Scoring features that more closely reflect genuine experimental spectra can improve DIA identification. Deep learning can predict fragment presence without an explicit model of the fragmentation mechanism, which can enrich the scoring features used in DIA identification.

Results: In this work, we developed Alpha-Frag, a deep neural network-based model that predicts which fragment ions should be present for a given peptide by reporting their probabilities of existence. Prediction performance was evaluated in terms of intersection over union (IoU). Alpha-Frag achieved an average IoU of >0.7 and substantially outperformed the benchmarks across the validation datasets. Furthermore, qualitative scores based on Alpha-Frag were designed and incorporated into peptide statistical validation tools as auxiliary scores. Our preliminary experiments show that these qualitative scores benefit DIA identification, especially with short gradients. On the test dataset they yielded improvements of 10.1%-29.3% compared with the same scoring strategy using Prosit.

Availability and Implementation: Source code and the trained model are available at www.github.com/YuAirLab/Alpha-Frag.
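
The IoU metric mentioned above compares the set of fragments predicted to be present with the set actually observed. Below is a minimal sketch of that comparison. The abstract does not state how predicted probabilities are binarized, so the 0.5 threshold, the fragment-label format (for example "b3+"), and the function names are assumptions rather than Alpha-Frag's code.

```python
def fragment_iou(predicted_probs, observed, threshold=0.5):
    """IoU between predicted-present and observed fragment ion sets.

    predicted_probs maps fragment labels (e.g. "b3+", "y5+") to predicted
    probabilities of presence; observed is an iterable of labels actually
    detected. A fragment counts as predicted present if its probability is
    at least threshold (an assumed cut-off).
    """
    predicted = {frag for frag, p in predicted_probs.items() if p >= threshold}
    observed = set(observed)
    union = predicted | observed
    if not union:
        return 1.0  # both sets empty: treated as perfect agreement (a convention)
    return len(predicted & observed) / len(union)


def mean_iou(predictions, observations, threshold=0.5):
    """Average per-peptide IoU over paired predictions and observations."""
    scores = [fragment_iou(p, o, threshold) for p, o in zip(predictions, observations)]
    return sum(scores) / len(scores)
```

For a peptide where three fragments are predicted present and three are observed, with two in common, the union holds four fragments and the IoU is 0.5.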
