dbCAN-PUL: a database of experimentally characterized CAZyme gene clusters and their substrates

Abstract PULs (polysaccharide utilization loci) are discrete gene clusters of CAZymes (Carbohydrate Active EnZymes) and other genes that work together to digest and utilize carbohydrate substrates. While PULs have been extensively characterized in Bacteroidetes, PULs from other bacterial phyla, as well as from archaea and metagenomes, remain to be catalogued in a database for efficient retrieval. We have developed an online database, dbCAN-PUL (http://bcb.unl.edu/dbCAN_PUL/), to display experimentally verified CAZyme-containing PULs from the literature with pertinent metadata, sequences, and annotation. Compared to other online CAZyme and PUL resources, dbCAN-PUL has the following new features: (i) batch download of PUL data by target substrate, species/genome, genus, or experimental characterization method; (ii) annotation for each PUL that displays associated metadata such as substrate(s), experimental characterization method(s) and protein sequence information; (iii) links to external annotation pages for CAZymes (CAZy), transporters (UniProt) and other genes; (iv) display of homologous gene clusters in GenBank sequences via an integrated MultiGeneBlast tool; and (v) an integrated BLASTX service that lets users query their sequences against PUL proteins in dbCAN-PUL. With these features, dbCAN-PUL will be an important repository for CAZyme and PUL research, complementing our other web servers and databases (dbCAN2, dbCAN-seq).

Sift-PULs: A public repository for specific functional polysaccharide utilization loci

10.1101/2021.08.04.455021, 2021

Author(s): Tao Song, Congchong Wei, Dezhi Yuan, Shengwei Xiang, Lin Liu, ...

Keyword(s): Biochemical Characterization, Gene Clusters, Public Repository, Function Annotation, Bacterial Gene, Bacterial Phyla, Signature Genes, Specific Polysaccharide, Encoding Genes, Polysaccharide Utilization Loci

Background: Polysaccharide utilization loci (PULs) are bacterial gene clusters encoding the proteins responsible for polysaccharide utilization. PUL studies have flourished in recent years, but biochemical characterization remains relatively slow, and there is a growing demand for a PUL database with functional annotations. Results: Using signature genes corresponding to specific polysaccharides, 10422 PULs specific for 6 polysaccharides (agar, alginate, pectin, carrageenan, chitin and β-mannan) from various bacterial phyla were predicted. An online website of specific functional polysaccharide utilization loci (Sift-PULs) was then constructed. Sift-PULs provides a repository where users can browse, search and download PULs of interest without registration. Conclusions: The key advantage of Sift-PULs is that it assigns a functional annotation to each PUL, which is not available in existing PUL databases. This functional annotation lays a foundation for studying novel enzymes, new pathways, PUL evolution or bioengineering. The website is available at http://sift-puls.org
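
The signature-gene idea described in the Results can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each gene cluster is given as a list of CAZyme family labels and that a cluster is assigned to a substrate when it contains at least one family from that substrate's signature set. The family-to-substrate table is a hypothetical placeholder, not the signature gene list used by Sift-PULs, and `assign_substrates` is an invented helper name.

```python
# Hypothetical signature table: substrate -> CAZyme families treated as signatures.
# These entries are illustrative placeholders, NOT the Sift-PULs signature genes.
SIGNATURES = {
    "alginate": {"PL6", "PL7", "PL17"},
    "agar": {"GH50", "GH86", "GH117"},
    "chitin": {"GH18", "GH19"},
}


def assign_substrates(cluster_families, signatures=SIGNATURES):
    """Return the sorted substrates whose signature set overlaps the cluster."""
    families = set(cluster_families)
    return sorted(
        substrate
        for substrate, signature in signatures.items()
        if families & signature  # at least one signature family is present
    )


if __name__ == "__main__":
    # Toy clusters (assumed input format: one list of family labels per cluster).
    clusters = {
        "cluster_1": ["SusC", "SusD", "PL7", "PL17"],
        "cluster_2": ["GH13", "SusC"],
    }
    for name, families in clusters.items():
        print(name, assign_substrates(families))
```

A real pipeline would likely require several signature genes per cluster and consider their genomic neighbourhood; the single-hit rule here is only the simplest possible stand-in.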

Identification, heterologous production and bioactivity of lentinulin A and dendrothelin A, two natural variants of backbone N-methylated peptide macrocycle omphalotin A

Scientific Reports, 10.1038/s41598-021-83106-2, 2021, Vol 11 (1)

Author(s): Emmanuel Matabaro, Hannelore Kaspar, Paul Dahlin, Daniel L. V. Bader, Claudia E. Murar, ...

Keyword(s): Natural Products, Lentinula Edodes, Gene Clusters, Biosynthetic Gene Cluster, Precursor Protein, Fungal Genome, Homologous Gene, C Terminus, Natural Variants, Recombinant Peptides

Abstract Backbone N-methylation and macrocyclization improve the pharmacological properties of peptides by enhancing their proteolytic stability, membrane permeability and target selectivity. Borosins are backbone N-methylated peptide macrocycles derived from a precursor protein which contains a peptide α-N-methyltransferase domain autocatalytically modifying the core peptide located at its C-terminus. Founding members of borosins are the omphalotins from the mushroom Omphalotus olearius (omphalotins A-I) with nine out of 12 L-amino acids being backbone N-methylated. The omphalotin biosynthetic gene cluster codes for the precursor protein OphMA, the protease prolyloligopeptidase OphP and other proteins that are likely to be involved in other post-translational modifications of the peptide. Mining of available fungal genome sequences revealed the existence of highly homologous gene clusters in the basidiomycetes Lentinula edodes and Dendrothele bispora. The respective borosins, referred to as lentinulins and dendrothelins, are naturally produced by L. edodes and D. bispora as shown by analysis of respective mycelial extracts. We produced all three homologous peptide natural products by coexpression of OphMA hybrid proteins and OphP in the yeast Pichia pastoris. The recombinant peptides differ in their nematotoxic activity against the plant pathogen Meloidogyne incognita. Our findings pave the way for the production of borosin peptide natural products and their potential application as novel biopharmaceuticals and biopesticides.

Selection of Tree Nut Allergen Peptide Markers: A Need for Improved Protein Sequence Databases

Journal of AOAC International, 10.1093/jaoac/102.5.1263, 2019, Vol 102 (5), pp. 1263-1270, Cited By ~ 1

Author(s): Weili Xiong, Melinda A McFarland, Cary Pirone, Christine H Parker

Keyword(s): Food Allergen, Protein Sequence, Sequence Information, Sequencing Data, Reference Tree, Candidate Peptide, Tree Nut, Allergen Detection, Sequence Databases, Selection Of

Abstract Background: To effectively safeguard the food-allergic population and support compliance with food-labeling regulations, the food industry and regulatory agencies require reliable methods for food allergen detection and quantification. MS-based detection of food allergens relies on the systematic identification of robust and selective target peptide markers. The selection of proteotypic peptide markers, however, relies on the availability of high-quality protein sequence information, a bottleneck for the analysis of many plant-based proteomes. Method: In this work, data were compiled for reference tree nut ingredients and evaluated using a parsimony-driven global proteomics workflow. Results: The utility of supplementing existing incomplete protein sequence databases with translated genomic sequencing data was evaluated for English walnut and provided enhanced selection of candidate peptide markers and differentiation between closely related species. Highlights: Future improvements of protein databases and release of genomics-derived sequences are expected to facilitate the development of robust and harmonized LC–tandem MS-based methods for food allergen detection.

Biofilm Formation in Pseudomonas aeruginosa: Fimbrial cup Gene Clusters Are Controlled by the Transcriptional Regulator MvaT

Journal of Bacteriology, 10.1128/jb.186.9.2880-2890.2004, 2004, Vol 186 (9), pp. 2880-2890, Cited By ~ 102

Author(s): Isabelle Vallet, Stephen P. Diggle, Rachael E. Stacey, Miguel Cámara, Isabelle Ventre, ...

Keyword(s): Pseudomonas Aeruginosa, Biofilm Formation, DNA Microarrays, Transcriptional Profiling, Bacterial Pathogen, Gene Clusters, Northern Blotting, Negative Regulator, Regulatory Control, Homologous Gene

ABSTRACT Pseudomonas aeruginosa is an opportunistic bacterial pathogen which poses a major threat to long-term-hospitalized patients and individuals with cystic fibrosis. The capacity of P. aeruginosa to form biofilms is an important requirement for chronic colonization of human tissues and for persistence in implanted medical devices. Various stages of biofilm formation by this organism are mediated by extracellular appendages, such as type IV pili and flagella. Recently, we identified three P. aeruginosa gene clusters that were termed cup (chaperone-usher pathway) based on their sequence relatedness to the chaperone-usher fimbrial assembly pathway in other bacteria. The cupA gene cluster, but not the cupB or cupC cluster, is required for biofilm formation on abiotic surfaces. In this study, we identified a gene (mvaT) encoding a negative regulator of cupA expression. Such regulatory control was confirmed by several approaches, including lacZ transcriptional fusions, Northern blotting, and transcriptional profiling using DNA microarrays. MvaT also represses the expression of the cupB and cupC genes, although the extent of the regulatory effect is not as pronounced as with cupA. Consistent with this finding, mvaT mutants exhibit enhanced biofilm formation. Although the P. aeruginosa genome contains a highly homologous gene, mvaU, the repression of cupA genes is MvaT specific. Thus, MvaT appears to be an important regulatory component within a complex network that controls biofilm formation and maturation in P. aeruginosa.

Microbial characterisation and Cold-Adapted Predicted Protein (CAPP) database construction from the active layer of Greenland's permafrost

FEMS Microbiology Ecology, 10.1093/femsec/fiab127, 2021

Author(s): Gilda Varliero, Muhammad Rafiq, Swati Singh, Annabel Summerfield, Fotis Sgouridis, ...

Keyword(s): Active Layer, Sequence Variation, Gene Clusters, Taxonomic Composition, Sequence Information, Biosynthetic Gene Clusters, Medical Settings, Cold Adapted, Database Construction, Medium Quality

Abstract Permafrost represents a reservoir for the biodiscovery of cold-adapted proteins which are advantageous in industrial and medical settings. Comparisons between different thermo-adapted proteins can give important information for cold-adaptation bioengineering. We collected permafrost active layer samples from 34 points along a proglacial transect in southwest Greenland. We obtained a deep read coverage assembly (>164x) from nanopore and Illumina sequences for the purposes of i) analysing metagenomic and metatranscriptomic trends of the microbial community of this area, and ii) creating the Cold-Adapted Predicted Protein (CAPP) database. The community showed a similar taxonomic composition in all samples along the transect, with a solid permafrost-shaped community, rather than microbial trends typical of proglacial systems. We retrieved 69 high- and medium-quality metagenome-assembled clusters, 213 complete biosynthetic gene clusters and more than three million predicted proteins. The latter constitute the CAPP database that can provide cold-adapted protein sequence information for protein- and taxon-focused amino acid sequence modifications for the future bioengineering of cold-adapted enzymes. As an example, we focused on the enzyme polyphenol oxidase, and demonstrated how sequence variation information could inform its protein engineering.

Deciphering protein sequence information through hydrophobic cluster analysis (HCA): current status and perspectives

Cellular and Molecular Life Sciences, 10.1007/s000180050082, 1997, Vol 53 (8), pp. 621-645, Cited By ~ 338

Author(s): I. Callebaut, G. Labesse, P. Durand, A. Poupon, L. Canard, ...

Keyword(s): Cluster Analysis, Protein Sequence, Current Status, Sequence Information, Hydrophobic Cluster Analysis, Hydrophobic Cluster

Multimodal deep representation learning for protein interaction identification and protein family classification

BMC Bioinformatics, 10.1186/s12859-019-3084-y, 2019, Vol 20 (S16), Cited By ~ 4

Author(s): Da Zhang, Mansur Kabuka

Keyword(s): Protein Interactions, Protein Sequence, Representation Learning, Superior Performance, Sequence Information, Protein Protein Interactions, Learning Framework, Topological Features, PPI Networks, PPI Prediction

Abstract Background: Protein-protein interactions (PPIs) are constantly involved in dynamic pathological and biological processes. A thorough understanding of PPIs is therefore crucial for elucidating disease occurrence, achieving optimal drug-target therapeutic effects and describing protein complex structures. However, compared to the protein sequences obtainable from various species and organisms, the number of revealed protein-protein interactions is relatively limited. To address this dilemma, many research efforts have sought to facilitate the discovery of novel PPIs. Among these methods, PPI prediction techniques that rely only on protein sequence data are more widespread than methods that require extensive biological domain knowledge. Results: In this paper, we propose a multi-modal deep representation learning structure that incorporates protein physicochemical features with the graph topological features from the PPI networks. Specifically, our method not only takes into account the protein sequence information but also learns the topological representations for each protein node in the PPI networks. We construct a stacked auto-encoder architecture together with a continuous bag-of-words (CBOW) model based on generated metapaths to study the PPI predictions. We then use supervised deep neural networks to identify the PPIs and classify the protein families. The PPI prediction accuracy for eight species ranged from 96.76% to 99.77%, which indicates that our multi-modal deep representation learning framework achieves superior performance compared to other computational methods. Conclusion: To the best of our knowledge, this is the first multi-modal deep representation learning framework for examining the PPI networks.

Hox and HOM: Homologous gene clusters in insects and vertebrates

Cell, 10.1016/0092-8674(89)90909-4, 1989, Vol 57 (3), pp. 347-349, Cited By ~ 272

Author(s): Michael Akam

Keyword(s): Gene Clusters, Homologous Gene

Robust and accurate prediction of protein–protein interactions by exploiting evolutionary information

Scientific Reports, 10.1038/s41598-021-96265-z, 2021, Vol 11 (1)

Author(s): Yang Li, Zheng Wang, Li-Ping Li, Zhu-Hong You, Wen-Zhun Huang, ...

Keyword(s): Protein Interactions, Protein Sequence, Large Scale, False Positive Rate, Computational Method, Evolutionary Information, Local Alignment, Protein Interaction Data, Sequence Information, Protein Protein Interactions

Abstract Various biochemical functions of organisms are performed by protein–protein interactions (PPIs). Therefore, recognition of protein–protein interactions is very important for understanding most life activities, such as DNA replication and transcription, protein synthesis and secretion, signal transduction and metabolism. Although high-throughput technology makes it possible to generate large-scale PPI data, it is costly in both time and labor and carries a risk of a high false positive rate. The biology community is therefore looking for computational methods to discover large amounts of protein interaction data quickly and efficiently. In this paper, we propose a computational method for predicting PPIs that combines orthogonal locality preserving projections (OLPP) and rotation forest (RoF) models, using protein sequence information. Specifically, the protein sequence is first converted into position-specific scoring matrices (PSSMs) containing protein evolutionary information by using the Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST). Then we characterize a protein as a fixed-length feature vector by applying OLPP to PSSMs. Finally, we train an RoF classifier to distinguish interacting from non-interacting protein pairs. The proposed method yielded significantly better results than existing methods, with 90.07% and 96.09% prediction accuracy on the Yeast and Human datasets, respectively. Our experiments show that the proposed method can serve as a useful tool to accelerate the process of solving key problems in proteomics.
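
The step from a variable-length PSSM to a fixed-length feature vector can be sketched in Python as follows. This is only a rough illustration under stated assumptions: column averaging is used here as a simple stand-in for OLPP (which is not implemented), a protein pair is represented by concatenating the two vectors (the abstract does not say how pairs are encoded), and the function names `pssm_to_vector` and `pair_features` are hypothetical. The rotation forest classifier is omitted.

```python
def pssm_to_vector(pssm):
    """Collapse an L x K PSSM (list of rows) into K column means.

    Column averaging is a stand-in for OLPP, NOT the paper's projection.
    """
    if not pssm:
        raise ValueError("PSSM must contain at least one row")
    width = len(pssm[0])
    if any(len(row) != width for row in pssm):
        raise ValueError("all PSSM rows must have the same length")
    n_rows = len(pssm)
    return [sum(row[j] for row in pssm) / n_rows for j in range(width)]


def pair_features(pssm_a, pssm_b):
    """Represent a protein pair by concatenating both fixed-length vectors.

    Concatenation is an assumed encoding chosen for illustration.
    """
    return pssm_to_vector(pssm_a) + pssm_to_vector(pssm_b)


if __name__ == "__main__":
    # Toy matrices with 2 columns instead of the usual 20, for readability.
    protein_a = [[1, 2], [3, 4]]   # length-2 protein
    protein_b = [[0, 0]]           # length-1 protein
    print(pair_features(protein_a, protein_b))
```

Whatever the protein lengths, every pair yields a vector of the same size, which is the property a downstream classifier needs.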

Bakterielle Mechanismen der marinen Polysaccharidverwertung (Bacterial mechanisms of marine polysaccharide utilization)

BIOspektrum, 10.1007/s12268-020-1489-9, 2020, Vol 26 (7), pp. 800-802

Author(s): Thomas Schweder, Uwe Bornscheuer, Jan-Hendrik Hehemann, Rudolf Amann

Keyword(s): Algal Biomass, Major Part, Gene Clusters, Carbohydrate Active Enzymes, High Productivity, Rapid Turnover, Conserved Gene, Complex Polymer

Abstract The oceans have been compared to a “global heterotrophic digester”. This is due to the high productivity of microalgae and the rapid turnover of the produced biomass by microbes. A major part of the algal biomass consists of diverse polysaccharides which belong to the most complex polymer structures in nature. These marine sugars are decomposed by specialized bacteria, mainly of the phyla Bacteroidetes and Gammaproteobacteria, which possess dedicated conserved gene clusters encoding a remarkable diversity of carbohydrate-active enzymes.
