A general approach to explore prokaryotic protein glycosylation reveals the unique surface layer modulation of an anammox bacterium

The enormous chemical diversity and strain variability of prokaryotic protein glycosylation make large-scale exploration exceptionally challenging. Therefore, despite the universal relevance of protein glycosylation across all domains of life, the understanding of its biological significance and of the evolutionary forces shaping oligosaccharide structures remains highly limited. Here, we report on a newly established mass binning glycoproteomics approach that establishes the chemical identity of the carbohydrate components and performs untargeted exploration of prokaryotic oligosaccharides directly from large-scale proteomics data. We demonstrate our approach by exploring an enrichment culture of the globally relevant anaerobic ammonium-oxidizing bacterium Ca. Kuenenia stuttgartiensis. By doing so we resolved a remarkable array of oligosaccharides, produced by two entirely unrelated glycosylation machineries targeting the same surface-layer protein (SLP) simultaneously. More intriguingly, the investigated strain also accomplished modulation of highly specialized sugars, supposedly in response to its energy metabolism (the anaerobic oxidation of ammonium), which depends on the acquisition of substrates of opposite charge. Ultimately, we provide a systematic approach for the compositional exploration of prokaryotic protein glycosylation, and reveal for the first time a remarkable balance between maximising cellular protection through a complex array of oligosaccharides and adhering to the requirements of the ‘metabolic lifestyle’.

The ISME Journal ◽

10.1038/s41396-021-01073-y ◽

2021 ◽

Author(s):

Martin Pabst ◽

Denis S. Grouzdev ◽

Christopher E. Lawson ◽

Hugo B. C. Kleikamp ◽

Carol de Ram ◽

...

Keyword(s):

Surface Layer ◽

Protein Glycosylation ◽

Anammox Bacterium ◽

Prokaryotic Protein ◽

Unique Surface

A primary human T-cell spectral library to facilitate large scale quantitative T-cell proteomics

Scientific Data ◽

10.1038/s41597-020-00744-3 ◽

2020 ◽

Vol 7 (1) ◽

Author(s):

Harshi Weerakoon ◽

Jeremy Potriquet ◽

Alok K. Shah ◽

Sarah Reed ◽

Buddhika Jayakody ◽

...

Keyword(s):

T Cell ◽

Large Scale ◽

Large Data ◽

Spectral Library ◽

Proteomics Data ◽

Cell Library ◽

Public Resource ◽

Human T Cell ◽

Theoretical Mass ◽

Current Resource

Abstract: Data independent analysis (DIA), exemplified by sequential window acquisition of all theoretical mass spectra (SWATH-MS), provides robust quantitative proteomics data, but the lack of a public primary human T-cell spectral library is a current resource gap. Here, we report the generation of a high-quality spectral library containing data for 4,833 distinct proteins from human T-cells across genetically unrelated donors, covering ~24% of the proteins in the UniProt/SwissProt reviewed human proteome. SWATH-MS analysis of 18 primary T-cell samples using the new human T-cell spectral library reliably identified and quantified 2,850 proteins at 1% false discovery rate (FDR). In comparison, the larger Pan-human spectral library identified and quantified 2,794 T-cell proteins in the same dataset. As the libraries identified an overlapping set of proteins, combining the two libraries resulted in quantification of 4,078 human T-cell proteins. Collectively, this large data archive will be a useful public resource for human T-cell proteomic studies. The human T-cell library is available at SWATHAtlas and the data are available via ProteomeXchange (PXD019446 and PXD019542) and PeptideAtlas (PASS01587).

Vulnerability of individual fish to capture by trawling is influenced by capacity for anaerobic metabolism

Proceedings of The Royal Society B Biological Sciences ◽

10.1098/rspb.2015.0603 ◽

2015 ◽

Vol 282 (1813) ◽

pp. 20150603 ◽

Cited By ~ 27

Author(s):

Shaun S. Killen ◽

Julie J. H. Nati ◽

Cory D. Suski

Keyword(s):

Large Scale ◽

Swimming Performance ◽

Anaerobic Capacity ◽

Standard Metabolic Rate ◽

Metabolic Demand ◽

Evolutionary Forces ◽

Individual Fish ◽

Fundamental Information ◽

Behavioural Phenotypes ◽

And Function

The harvest of animals by humans may constitute one of the strongest evolutionary forces affecting wild populations. Vulnerability to harvest varies among individuals within species according to behavioural phenotypes, but we lack fundamental information regarding the physiological mechanisms underlying harvest-induced selection. It is unknown, for example, what physiological traits make some individual fish more susceptible to capture by commercial fisheries. Active fishing methods such as trawling pursue fish during harvest attempts, causing fish to use both aerobic steady-state swimming and anaerobic burst-type swimming to evade capture. Using simulated trawling procedures with schools of wild minnows Phoxinus phoxinus, we investigate two questions key to the study of fisheries-induced evolution that have been impossible to address using large-scale trawls: (i) are some individuals within a fish shoal consistently more susceptible to capture by trawling than others?; and (ii) if so, is this related to individual differences in swimming performance and metabolism? Results provide the first evidence of repeatable variation in susceptibility to trawling that is strongly related to anaerobic capacity and swimming ability. Maximum aerobic swim speed was also negatively correlated with vulnerability to trawling. Standard metabolic rate was highest among fish that were least vulnerable to trawling, but this relationship probably arose through correlations with anaerobic capacity. These results indicate that vulnerability to trawling is linked to anaerobic swimming performance and metabolic demand, drawing parallels with factors influencing susceptibility to natural predators. Selection on these traits by fisheries could induce shifts in the fundamental physiological makeup and function of descendant populations.

Dynamics of Genome Architecture in Rhizobium sp. Strain NGR234

Journal of Bacteriology ◽

10.1128/jb.184.1.171-176.2002 ◽

2002 ◽

Vol 184 (1) ◽

pp. 171-176 ◽

Cited By ~ 62

Author(s):

Patrick Mavingui ◽

Margarita Flores ◽

Xianwu Guo ◽

Guillermo Dávila ◽

Xavier Perret ◽

...

Keyword(s):

Large Scale ◽

Insertion Sequence ◽

Biological Significance ◽

Genome Architecture ◽

Bacterial Genomes ◽

Symbiotic Plasmid ◽

Sequence Elements ◽

Dynamic Structures ◽

Genome Analyses ◽

Insertion Sequence Elements

ABSTRACT Bacterial genomes are usually partitioned into several replicons, which are dynamic structures prone to mutation and genomic rearrangements, thus contributing to genome evolution. Nevertheless, much remains to be learned about the origins and dynamics of the formation of bacterial alternative genomic states and their possible biological consequences. To address these issues, we have studied the dynamics of the genome architecture in Rhizobium sp. strain NGR234 and analyzed its biological significance. The NGR234 genome consists of three replicons: the symbiotic plasmid pNGR234a (536,165 bp), the megaplasmid pNGR234b (>2,000 kb), and the chromosome (>3,700 kb). Here we report that genome analyses of cell siblings showed the occurrence of large-scale DNA rearrangements consisting of cointegrations and excisions between the three replicons. As a result, four new genomic architectures have emerged. Three consisted of the cointegrates between two replicons: chromosome-pNGR234a, chromosome-pNGR234b, and pNGR234a-pNGR234b. The other consisted of a cointegrate of the three replicons (chromosome-pNGR234a-pNGR234b). Cointegration and excision of pNGR234a with either the chromosome or pNGR234b were studied and found to proceed via a Campbell-type mechanism, mediated by insertion sequence elements. We provide evidence showing that changes in the genome architecture did not alter the growth and symbiotic proficiency of Rhizobium derivatives.

Gene Expression Imputation with Generative Adversarial Imputation Nets

10.1101/2020.06.09.141689 ◽

2020 ◽

Author(s):

Ramon Viñas ◽

Tiago Azevedo ◽

Eric R. Gamazon ◽

Pietro Liò

Keyword(s):

Gene Expression ◽

Large Scale ◽

Biological Significance ◽

Predictive Performance ◽

Cost Effective ◽

Rna Seq ◽

Comprehensive Collection ◽

Genomic Studies ◽

Biological Discovery ◽

Cancer Types

Abstract: A question of fundamental biological significance is to what extent the expression of a subset of genes can be used to recover the full transcriptome, with important implications for biological discovery and clinical application. To address this challenge, we present GAIN-GTEx, a method for gene expression imputation based on Generative Adversarial Imputation Networks. In order to increase the applicability of our approach, we leverage data from GTEx v8, a reference resource that has generated a comprehensive collection of transcriptomes from a diverse set of human tissues. We compare our model to several standard and state-of-the-art imputation methods and show that GAIN-GTEx is significantly superior in terms of predictive performance and runtime. Furthermore, our results indicate strong generalisation on RNA-Seq data from 3 cancer types across varying levels of missingness. Our work can facilitate a cost-effective integration of large-scale RNA biorepositories into genomic studies of disease, with high applicability across diverse tissue types.

Denoising large-scale biological data using network filters

10.21203/rs.3.rs-66071/v2 ◽

2021 ◽

Author(s):

Andrew J Kavran ◽

Aaron Clauset

Keyword(s):

Large Scale ◽

Synthetic Data ◽

Interaction Network ◽

Learning Task ◽

Biological Data ◽

Data Sets ◽

Proteomics Data ◽

Life History Variation ◽

Wide Range ◽

Underlying Processes

Abstract Background: Large-scale biological data sets are often contaminated by noise, which can impede accurate inferences about underlying processes. Such measurement noise can arise from endogenous biological factors like cell cycle and life history variation, and from exogenous technical factors like sample preparation and instrument variation. Results: We describe a general method for automatically reducing noise in large-scale biological data sets. This method uses an interaction network to identify groups of correlated or anti-correlated measurements that can be combined or “filtered” to better recover an underlying biological signal. Similar to the process of denoising an image, a single network filter may be applied to an entire system, or the system may be first decomposed into distinct modules and a different filter applied to each. Applied to synthetic data with known network structure and signal, network filters accurately reduce noise across a wide range of noise levels and structures. Applied to a machine learning task of predicting changes in human protein expression in healthy and cancerous tissues, network filtering prior to training increases accuracy up to 43% compared to using unfiltered data. Conclusions: Network filters are a general way to denoise biological data and can account for both correlation and anti-correlation between different measurements. Furthermore, we find that partitioning a network prior to filtering can significantly reduce errors in networks with heterogenous data and correlation patterns, and this approach outperforms existing diffusion-based methods. Our results on proteomics data indicate the broad potential utility of network filters to applications in systems biology.
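
The idea of combining a measurement with its network neighbours can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mean_network_filter`, the signed-edge representation, and the choice to reflect anti-correlated neighbours about the global mean are all assumptions made here for illustration.

```python
from statistics import mean


def mean_network_filter(values, edges):
    """Average each node's value with its neighbours' values.

    values: dict mapping node -> measured value (float).
    edges:  list of (u, v, sign) tuples; sign > 0 marks a correlated pair,
            sign < 0 an anti-correlated pair (hypothetical encoding).
    """
    mu = mean(values.values())  # global mean, used to reflect anti-correlated neighbours
    neighbours = {node: [] for node in values}
    for u, v, sign in edges:
        neighbours[u].append((v, sign))
        neighbours[v].append((u, sign))

    filtered = {}
    for node, x in values.items():
        terms = [x]  # a node always contributes its own measurement
        for other, sign in neighbours[node]:
            y = values[other]
            # Correlated neighbour: use its value as-is.
            # Anti-correlated neighbour: mirror its value about the global mean.
            terms.append(y if sign > 0 else 2 * mu - y)
        filtered[node] = sum(terms) / len(terms)
    return filtered


# Correlated pair a-b is pulled together; isolated node c is left unchanged.
smooth = mean_network_filter({"a": 1.0, "b": 3.0, "c": 2.0}, [("a", "b", 1)])

# An anti-correlated pair that sits symmetrically around the mean is preserved.
sharp = mean_network_filter({"a": 1.0, "b": 3.0}, [("a", "b", -1)])
```

In this sketch a single filter is applied to the whole network; applying a different filter per module, as the abstract describes, would amount to calling the function separately on each module's nodes and edges.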

obaDIA: one-step biological analysis pipeline for data-independent acquisition and other quantitative proteomics data

Bioinformatics ◽

10.1093/bioinformatics/btaa893 ◽

2020 ◽

Author(s):

Jun Yan ◽

Hongning Zhai ◽

Ling Zhu ◽

Sasha Sa ◽

Xiaojun Ding

Keyword(s):

Protein Level ◽

Quantitative Proteomics ◽

Differential Expression Analysis ◽

Biological Significance ◽

Enrichment Analysis ◽

Supplementary Information ◽

Proteomics Data ◽

Data Independent Acquisition ◽

One Step ◽

Automatic Tool

Abstract Motivation: Data mining and data quality evaluation are indispensable constituents of quantitative proteomics, but few integrated tools are available. Results: We introduced obaDIA, a one-step pipeline to generate visualizable and comprehensive results for quantitative proteomics data. obaDIA supports fragment-level, peptide-level and protein-level abundance matrices from the DIA technique, as well as protein-level abundance matrices from other quantitative proteomic techniques. The result contains abundance matrix statistics, differential expression analysis, protein functional annotation and enrichment analysis. Additionally, enrichment strategies that use either total proteins or expressed proteins as background are optional, and HTML-based interactive visualization of differentially expressed proteins in KEGG pathways is offered, which helps in mining biological significance. In short, obaDIA is an automatic tool for bioinformatics analysis of quantitative proteomics data. Availability and implementation: obaDIA is freely available from https://github.com/yjthu/obaDIA.git. Supplementary information: Supplementary data are available at Bioinformatics online.

MSpectraAI: a powerful platform for deciphering proteome profiling of multi-tumor mass spectrometry data by using deep neural networks

BMC Bioinformatics ◽

10.1186/s12859-020-03783-0 ◽

2020 ◽

Vol 21 (1) ◽

Author(s):

Shisheng Wang ◽

Hongwen Zhu ◽

Hu Zhou ◽

Jingqiu Cheng ◽

Hao Yang

Keyword(s):

Mass Spectrometry ◽

Neural Networks ◽

Large Scale ◽

Deep Neural Networks ◽

Spectral Feature ◽

Mass Spectrometry Data ◽

Learning Approaches ◽

Proteomics Data ◽

Proteome Profiling ◽

Analytical Technique

Abstract Background: Mass spectrometry (MS) has become a promising analytical technique to acquire proteomics information for the characterization of biological samples. Nevertheless, most studies focus on the final proteins identified through a suite of algorithms by using partial MS spectra to compare with the sequence database, while the pattern recognition and classification of raw mass-spectrometric data remain unresolved. Results: We developed an open-source and comprehensive platform, named MSpectraAI, for analyzing large-scale MS data through deep neural networks (DNNs); this system involves spectral-feature swath extraction, classification, and visualization. Moreover, this platform allows users to create their own DNN model by using Keras. To evaluate this tool, we collected the publicly available proteomics datasets of six tumor types (a total of 7,997,805 mass spectra) from the ProteomeXchange consortium and classified the samples based on the spectra profiling. The results suggest that MSpectraAI can distinguish different types of samples based on the fingerprint spectrum and achieve better prediction accuracy at the MS1 level (average 0.967). Conclusion: This study deciphers proteome profiling of raw mass spectrometry data and broadens the promising application of the classification and prediction of proteomics data from multi-tumor samples using deep learning methods. MSpectraAI also shows better performance than other classical machine learning approaches.
