sequence similarity networks
Recently Published Documents


Total documents: 27 (five years: 14)
H-index: 7 (five years: 2)

2020 ◽  
Vol 16 (12) ◽  
pp. e1007988
Author(s):  
Bojan Krtenic ◽  
Adrian Drazic ◽  
Thomas Arnesen ◽  
Nathalie Reuter

The enzymes of the GCN5-related N-acetyltransferase (GNAT) superfamily comprise more than 870 000 members across all kingdoms of life and share the same structural fold. GNAT enzymes transfer an acyl moiety from acyl coenzyme A to a wide range of substrates including aminoglycosides, serotonin, glucosamine-6-phosphate, protein N-termini and lysine residues of histones and other proteins. The GNAT subtype of protein N-terminal acetyltransferases (NATs) alone targets a majority of all eukaryotic proteins, stressing the omnipresence of the GNAT enzymes. Despite the highly conserved GNAT fold, sequence similarity is quite low between members of this superfamily, even when substrates are similar. Furthermore, this superfamily is not well characterized phylogenetically. Thus, functional annotation based on sequence similarity is unreliable and strongly hampered for thousands of GNAT members that remain biochemically uncharacterized. Here we used sequence similarity networks to map the sequence space and propose a new classification for eukaryotic GNAT acetyltransferases. Using the new classification, we built a phylogenetic tree representing the entire GNAT acetyltransferase superfamily. Our results show that protein NATs have evolved more than once on the GNAT acetylation scaffold. We use our classification to predict the function of uncharacterized sequences and verify by in vitro protein assays that two fungal genes encode NAT enzymes targeting specific protein N-terminal sequences, showing that even slight changes on the GNAT fold can lead to a change in substrate specificity. In addition to providing a new map of the relationship between eukaryotic acetyltransferases, the proposed classification constitutes a tool to improve functional annotation of GNAT acetyltransferases.
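As a rough illustration of what a sequence similarity network is, the sketch below treats each sequence as a node and links two nodes when their pairwise similarity score meets a cutoff; connected components then act as candidate groups. It is a minimal sketch, not the authors' pipeline: the function names, the sequence identifiers, the scores and the 0.5 threshold are all hypothetical, and the scores are assumed to have been computed beforehand by some alignment tool.

```python
def build_ssn(pairwise_scores, threshold):
    """Build an undirected network from precomputed pairwise scores.

    pairwise_scores maps (seq_a, seq_b) -> similarity score (hypothetical
    values here). An edge is kept only when score >= threshold.
    """
    graph = {}
    for (a, b), score in pairwise_scores.items():
        graph.setdefault(a, set())  # keep isolated sequences as nodes
        graph.setdefault(b, set())
        if score >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph):
    """Return the clusters (connected components) of the network."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


# Hypothetical similarity scores between five made-up sequences.
scores = {
    ("seqA", "seqB"): 0.82,
    ("seqB", "seqC"): 0.75,
    ("seqA", "seqC"): 0.30,
    ("seqC", "seqD"): 0.10,
    ("seqD", "seqE"): 0.90,
}
ssn = build_ssn(scores, threshold=0.5)
clusters = connected_components(ssn)
```

Raising the threshold splits the network into more, tighter clusters, which is why the cutoff is usually treated as a parameter to explore rather than a fixed value.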


2020 ◽  
Vol 401 (12) ◽  
pp. 1389-1405
Author(s):  
Lars-Oliver Essen ◽  
Marian Samuel Vogt ◽  
Hans-Ulrich Mösch

Abstract: Selective adhesion of fungal cells to one another and to foreign surfaces is fundamental for the development of multicellular growth forms and the successful colonization of substrates and host organisms. Accordingly, fungi possess diverse cell wall-associated adhesins, mostly large glycoproteins, which present N-terminal adhesion domains at the cell surface for ligand recognition and binding. In order to function as robust adhesins, these glycoproteins must be covalently linked to the cell wall via C-terminal glycosylphosphatidylinositol (GPI) anchors by transglycosylation. In this review, we summarize the current knowledge on the structural and functional diversity of the protein families of adhesion domains characterized so far and place it in a broad context through an in-depth bioinformatics analysis using sequence similarity networks. In addition, we discuss possible mechanisms for the membrane-to-cell wall transfer of fungal adhesins by membrane-anchored Dfg5 transglycosidases.


2020 ◽  
Author(s):  
Sourav Biswas ◽  
Suparna Saha ◽  
Sanghamitra Bandyopadhyay ◽  
Malay Bhattacharyya

Abstract: With an increasing number of SARS-CoV-2 sequences available day by day, new genomic information is being revealed. As SARS-CoV-2 sequences show wide changes across samples, we aim to explore whether these changes reveal the geographical origin of the corresponding samples. The k-mer distributions, denoting normalized frequency counts of all possible nucleotide combinations of size up to k, are often helpful for exploring sequence-level patterns. Given that the SARS-CoV-2 sequences are highly imbalanced by geographical origin (with a relatively higher number of samples collected from the USA), we observe that, with proper under-sampling, k-mer distributions in the SARS-CoV-2 sequences predict their geographical origin with more than 90% accuracy. The experiments are performed on samples collected from the six countries with the largest number of sequences available as of July 07, 2020. These comprise SARS-CoV-2 sequences from Australia, USA, China, India, Greece and France. Moreover, we demonstrate that the changes in genomic sequences characterize the continents as a whole. We also highlight that the network motifs present in the sequence similarity networks differ significantly across these countries. This, as a whole, is capable of predicting the geographical shift of SARS-CoV-2.
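The k-mer distribution described above can be sketched as follows. This is only one reading of the abstract, under stated assumptions: every size from 1 to k is counted, frequencies are normalized separately within each size, and windows containing characters outside A, C, G, T are ignored. The function name and the toy sequence are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import product


def kmer_distribution(sequence, k, alphabet="ACGT"):
    """Normalized frequencies of all nucleotide words of size 1..k.

    Assumption: normalization is done per word size, and windows with
    ambiguous bases (anything outside `alphabet`) are left out.
    """
    sequence = sequence.upper()
    features = {}
    for size in range(1, k + 1):
        # Count every overlapping window of the current size.
        counts = Counter(
            sequence[i:i + size] for i in range(len(sequence) - size + 1)
        )
        # Enumerate all possible words so absent k-mers get frequency 0.
        words = ["".join(p) for p in product(alphabet, repeat=size)]
        total = sum(counts[w] for w in words)
        for w in words:
            features[w] = counts[w] / total if total else 0.0
    return features


# Toy sequence, not real SARS-CoV-2 data.
dist = kmer_distribution("ACGTAC", k=2)
```

Each sequence then becomes a fixed-length feature vector (4 + 16 values for k = 2), which is the kind of input a classifier of geographical origin could be trained on.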


2020 ◽  
Author(s):  
Neil L Grenade ◽  
Dragos S. Chiriac ◽  
Graeme W. Howe ◽  
Avena Ross

Bacterial natural products are an immensely valuable source of therapeutics. As modern DNA sequencing efforts provide increasing numbers of microbial genomes, it is clear that the molecules produced by most natural product biosynthetic gene clusters (BGCs) remain unknown. Genome mining makes use of bioinformatic techniques to elucidate the natural products produced by these “orphan” BGCs. Here, we report the use of sequence similarity networks (SSNs) and genome neighborhood networks (GNNs) to identify an orphan BGC that is responsible for the production of the antitumor tambjamine BE-18591 in Streptomyces albus NRRL B-2362. Although BE-18591 is a close structural analogue of tambjamine YP1 produced by Pseudoalteromonas tunicata, the biosynthetic routes to produce these molecules differ significantly. Notably, the C12-alkylamine tail that is appended onto the bipyrrole core of tambjamine YP1 is derived from fatty acids siphoned from the primary metabolism of the pseudoalteromonad, whilst the S. albus NRRL B-2362 BGC encodes a dedicated system for the de novo biosynthesis of the alkylamine portion of tambjamine BE-18591. These remarkably different biosynthetic strategies represent a striking example of convergent BGC evolution, with selective pressure for the production of tambjamines seemingly leading to the emergence of separate biosynthetic pathways in pseudoalteromonads and streptomycetes that ultimately produce closely related compounds.


2020 ◽  
Vol 36 (9) ◽  
pp. 2740-2749
Author(s):  
Henry Xing ◽  
Steven W Kembel ◽  
Vladimir Makarenkov

Abstract
Motivation: Phylogenetic trees and the methods for their analysis have played a key role in many evolutionary, ecological and bioinformatics studies. Alternatively, phylogenetic networks have been widely used to analyze and represent complex reticulate evolutionary processes which cannot be adequately studied using traditional phylogenetic methods. These processes include, among others, hybridization, horizontal gene transfer, and genetic recombination. Nowadays, sequence similarity and genome similarity networks have become an efficient tool for community analysis of large molecular datasets in comparative studies. These networks can be used for tackling a variety of complex evolutionary problems such as the identification of horizontal gene transfer events, the recovery of mosaic genes and genomes, and the study of holobionts.
Results: The shortest path in a phylogenetic tree is used to estimate evolutionary distances between species. We show how the shortest path concept can be extended to sequence similarity networks by defining five new distances, NetUniFrac, Spp, Spep, Spelp and Spinp, and the Transfer index, between species communities present in the network. These new distances can be seen as network analogs of the traditional UniFrac distance used to assess dissimilarity between species communities in a phylogenetic tree, whereas the Transfer index is intended for estimating the rate and direction of gene transfers, or species dispersal, between different phylogenetic, or ecological, species communities. Moreover, NetUniFrac and the Transfer index can be computed in linear time with respect to the number of edges in the network. We show how these new measures can be used to analyze microbiota and antibiotic resistance gene similarity networks.
Availability and implementation: Our NetFrac program, implemented in R and C, along with its source code, is freely available on Github at the following URL address: https://github.com/XPHenry/Netfrac.
Supplementary information: Supplementary data are available at Bioinformatics online.
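To give a feel for how shortest paths can compare communities inside a network, the sketch below averages breadth-first-search path lengths between nodes of two communities in an unweighted graph. It is explicitly not NetUniFrac, Spp, Spep, Spelp, Spinp or the Transfer index, whose definitions are not given in the abstract, and it does not use the NetFrac program; the function names, the toy network and the community labels are hypothetical.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest path length (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def mean_cross_community_distance(graph, community_of, first, second):
    """Average shortest path length between two communities.

    A simple stand-in measure for illustration only; unreachable pairs
    are skipped, and None is returned if no pair is connected.
    """
    lengths = []
    for node, label in community_of.items():
        if label != first:
            continue
        distances = bfs_distances(graph, node)
        lengths.extend(
            d for other, d in distances.items()
            if community_of.get(other) == second
        )
    return sum(lengths) / len(lengths) if lengths else None


# Toy path-shaped network s1 - s2 - s3 - s4 with two made-up communities.
network = {
    "s1": {"s2"},
    "s2": {"s1", "s3"},
    "s3": {"s2", "s4"},
    "s4": {"s3"},
}
labels = {"s1": "gut", "s2": "gut", "s3": "soil", "s4": "soil"}
result = mean_cross_community_distance(network, labels, "gut", "soil")
```

This naive version runs one breadth-first search per node of the first community, so it does not reflect the linear-time behaviour reported for NetUniFrac and the Transfer index.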


2019 ◽  
Author(s):  
Tyler Stack ◽  
Katelyn Morrison ◽  
Thomas Dettmer ◽  
Brendan Wille ◽  
Chan Kim ◽  
...  

L-Ascorbate (vitamin C) is ubiquitous in both our diet and the environment. Ralstonia eutropha H16 (Cupriavidus necator ATCC 17699) uses L-ascorbate as sole carbon source but lacks the genes encoding the known catabolic pathways. RNAseq identified eight candidate catabolic genes. Sequence similarity networks and genome neighborhood networks guided predictions for function of the encoded proteins; the predictions were confirmed by in vitro assays and in vivo growth phenotypes of gene deletion mutants. L-Ascorbate, a lactone, is oxidized and ring-opened by enzymes in the cytochrome b561 and gluconolactonase families, respectively, to form 2,3-diketo-L-gulonate. A protein predicted to have a WD40-like fold catalyzes an unprecedented benzilic acid rearrangement involving migration of a carboxylate group to form 2-carboxy-L-lyxonolactone; the lactone is hydrolyzed by a member of the amidohydrolase superfamily to yield 2-carboxy-L-lyxonate. A member of the PdxA family of oxidative decarboxylases catalyzes a novel decarboxylation that uses NAD+ catalytically. The product, L-lyxonate, is catabolized to alpha-ketoglutarate by a previously characterized pathway.


2019 ◽  
Vol 201 (23) ◽  
Author(s):  
Ariana Umaña ◽  
Blake E. Sanders ◽  
Christopher C. Yoo ◽  
Michael A. Casasanta ◽  
Barath Udayasuryan ◽  
...  

Abstract: Fusobacterium spp. are Gram-negative, anaerobic, opportunistic pathogens involved in multiple diseases, including a link between the oral pathogen Fusobacterium nucleatum and the progression and severity of colorectal cancer. The identification and characterization of virulence factors in the genus Fusobacterium has been greatly hindered by a lack of properly assembled and annotated genomes. Using newly completed genomes from nine strains and seven species of Fusobacterium, we report the identification and corrected annotation of verified and potential virulence factors from the type 5 secreted autotransporter, FadA, and MORN2 protein families, with a focus on the genetically tractable strain F. nucleatum subsp. nucleatum ATCC 23726 and type strain F. nucleatum subsp. nucleatum ATCC 25586. Within the autotransporters, we used sequence similarity networks to identify protein subsets and show a clear differentiation between the prediction of outer membrane adhesins, serine proteases, and proteins with unknown function. These data have identified unique subsets of type 5a autotransporters, which are key proteins associated with virulence in F. nucleatum. However, we coupled our bioinformatic data with bacterial binding assays to show that a predicted weakly invasive strain of F. necrophorum that lacks a Fap2 autotransporter adhesin strongly binds human colonocytes. These analyses confirm a gap in our understanding of how autotransporters, MORN2 domain proteins, and FadA adhesins contribute to host interactions and invasion. In summary, we identify candidate virulence genes in Fusobacterium, and caution that experimental validation of host-microbe interactions should complement bioinformatic predictions to increase our understanding of virulence protein contributions in Fusobacterium infections and disease.
Importance: Fusobacterium spp. are emerging pathogens that contribute to mammalian and human diseases, including colorectal cancer. Despite a validated connection with disease, few proteins have been characterized that define a direct molecular mechanism for Fusobacterium pathogenesis. We report a comprehensive examination of virulence-associated protein families in multiple Fusobacterium species and show that complete genomes facilitate the correction and identification of multiple, large type 5a secreted autotransporter genes in previously misannotated or fragmented genomes. In addition, we use protein sequence similarity networks and human cell interaction experiments to show that previously predicted noninvasive strains can indeed bind to and potentially invade human cells and that this could be due to the expansion of specific virulence proteins that drive Fusobacterium infections and disease.


Genes ◽  
2019 ◽  
Vol 10 (9) ◽  
pp. 648
Author(s):  
Yaqing Ou ◽  
James O. McInerney

The formation of new genes by combining parts of existing genes is an important evolutionary process. Remodelled genes, which we call composites, have been investigated in many species; however, their distribution across all of life is still unknown. We set out to examine the extent to which genomes from cells and mobile genetic elements contain composite genes. We identify composite genes as those that show partial homology to at least two unrelated component genes. In order to identify composite and component genes, we constructed sequence similarity networks (SSNs) of more than one million genes from all three domains of life, as well as viruses and plasmids. We identified non-transitive triplets of nodes in this network and explored the homology relationships in these triplets to see if the middle nodes were indeed composite genes. In total, we identified 221,043 (18.57%) composite genes, which were distributed across all genomic and functional categories. In particular, the presence of composite genes is statistically more likely in eukaryotes than prokaryotes.
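The non-transitive triplet search can be sketched as below: a triplet (left, middle, right) qualifies when the middle node is linked to both outer nodes but the outer nodes are not linked to each other. This covers only the graph step; the follow-up check that the two components match different, non-overlapping parts of the middle gene is not shown. The function name, gene names and toy network are hypothetical and do not come from the paper.

```python
from itertools import combinations


def non_transitive_triplets(graph):
    """Find (left, middle, right) where middle links to both, but
    left and right share no edge. The middle node is a composite
    candidate that still needs a homology check on the alignments.
    """
    triplets = []
    for middle in sorted(graph):
        # Examine every unordered pair of neighbors of the middle node.
        for left, right in combinations(sorted(graph[middle]), 2):
            if right not in graph[left]:
                triplets.append((left, middle, right))
    return triplets


# Toy network: fusionXY resembles two unrelated genes, while
# geneV, geneW and geneZ form a fully connected (transitive) family.
toy_ssn = {
    "geneX": {"fusionXY"},
    "geneY": {"fusionXY"},
    "fusionXY": {"geneX", "geneY"},
    "geneV": {"geneW", "geneZ"},
    "geneW": {"geneV", "geneZ"},
    "geneZ": {"geneV", "geneW"},
}
candidates = non_transitive_triplets(toy_ssn)
```

In this toy network only fusionXY is flagged, because every pair of neighbors inside the geneV/geneW/geneZ family is itself connected.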

