Transfer index, NetUniFrac and some useful shortest path-based distances for community analysis in sequence similarity networks

Henry Xing; Steven W Kembel; Vladimir Makarenkov

doi:10.1093/bioinformatics/btaa043

Bioinformatics ◽

10.1093/bioinformatics/btaa043 ◽

2020 ◽

Vol 36 (9) ◽

pp. 2740-2749

Author(s):

Henry Xing ◽

Steven W Kembel ◽

Vladimir Makarenkov

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Phylogenetic Tree ◽

Shortest Path ◽

Sequence Similarity ◽

Community Analysis ◽

Supplementary Information ◽

Similarity Networks ◽

Transfer Index ◽

Sequence Similarity Networks

Abstract

Motivation: Phylogenetic trees and the methods for their analysis have played a key role in many evolutionary, ecological and bioinformatics studies. Alternatively, phylogenetic networks have been widely used to analyze and represent complex reticulate evolutionary processes which cannot be adequately studied using traditional phylogenetic methods. These processes include, among others, hybridization, horizontal gene transfer, and genetic recombination. Nowadays, sequence similarity and genome similarity networks have become an efficient tool for community analysis of large molecular datasets in comparative studies. These networks can be used for tackling a variety of complex evolutionary problems such as the identification of horizontal gene transfer events, the recovery of mosaic genes and genomes, and the study of holobionts.

Results: The shortest path in a phylogenetic tree is used to estimate evolutionary distances between species. We show how the shortest path concept can be extended to sequence similarity networks by defining five new distances, NetUniFrac, Spp, Spep, Spelp and Spinp, and the Transfer index, between species communities present in the network. These new distances can be seen as network analogs of the traditional UniFrac distance used to assess dissimilarity between species communities in a phylogenetic tree, whereas the Transfer index is intended for estimating the rate and direction of gene transfers, or species dispersal, between different phylogenetic, or ecological, species communities. Moreover, NetUniFrac and the Transfer index can be computed in linear time with respect to the number of edges in the network. We show how these new measures can be used to analyze microbiota and antibiotic resistance gene similarity networks.

Availability and implementation: Our NetFrac program, implemented in R and C, along with its source code, is freely available on GitHub at the following URL address: https://github.com/XPHenry/Netfrac.

Supplementary information: Supplementary data are available at Bioinformatics online.
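
The abstract above does not give the formulas for NetUniFrac or the other distances, so the following is only a rough illustration of the underlying shortest-path idea, not the authors' method and not the NetFrac program's interface. It assumes an unweighted, undirected network stored as an adjacency dictionary, and it uses a multi-source breadth-first search, which visits each edge a constant number of times. The function names and the toy network are hypothetical.

```python
from collections import deque


def nearest_community_distance(adjacency, sources):
    """Hop distance from each reachable node to the closest node in `sources`."""
    dist = {node: 0 for node in sources}
    queue = deque(dist)
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:  # first visit is the shortest in an unweighted graph
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def mean_distance_to_community(adjacency, community_a, community_b):
    """Mean hop distance from members of community_b to their nearest member of community_a.

    Illustrative summary only (an assumption); it is not one of the paper's distances.
    Unreachable members are ignored, and None is returned if none is reachable.
    """
    dist = nearest_community_distance(adjacency, community_a)
    reachable = [dist[node] for node in community_b if node in dist]
    return sum(reachable) / len(reachable) if reachable else None


# Hypothetical toy network: a1 - a2 - x - b1 - b2
toy_network = {
    "a1": ["a2"],
    "a2": ["a1", "x"],
    "x": ["a2", "b1"],
    "b1": ["x", "b2"],
    "b2": ["b1"],
}
result = mean_distance_to_community(toy_network, ["a1", "a2"], ["b1", "b2"])
```

In the toy network, b1 is two hops and b2 three hops from the nearest member of the first community, so the summary value is their mean.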

PPIT: an R package for inferring microbial taxonomy from nifH sequences

Bioinformatics ◽

10.1093/bioinformatics/btab100 ◽

2021 ◽

Author(s):

Bennett J Kapili ◽

Anne E Dekas

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Query Sequence ◽

Marker Gene ◽

R Package ◽

Supplementary Information ◽

Marker Genes ◽

Pairwise Identity ◽

Metabolic Marker ◽

Microbial Taxonomy

Abstract

Motivation: Linking microbial community members to their ecological functions is a central goal of environmental microbiology. When assigned taxonomy, amplicon sequences of metabolic marker genes can suggest such links, thereby offering an overview of the phylogenetic structure underpinning particular ecosystem functions. However, inferring microbial taxonomy from metabolic marker gene sequences remains a challenge, particularly for the frequently sequenced nitrogen fixation marker gene, nitrogenase reductase (nifH). Horizontal gene transfer in recent nifH evolutionary history can confound taxonomic inferences drawn from the pairwise identity methods used in existing software. Other methods for inferring taxonomy are not standardized and require manual inspection that is difficult to scale.

Results: We present Phylogenetic Placement for Inferring Taxonomy (PPIT), an R package that infers microbial taxonomy from nifH amplicons using both phylogenetic and sequence identity approaches. After users place query sequences on a reference nifH gene tree provided by PPIT (n = 6317 full-length nifH sequences), PPIT searches the phylogenetic neighborhood of each query sequence and attempts to infer microbial taxonomy. An inference is drawn only if references in the phylogenetic neighborhood are: (1) taxonomically consistent and (2) share sufficient pairwise identity with the query, thereby avoiding erroneous inferences due to known horizontal gene transfer events. We find that PPIT returns a higher proportion of correct taxonomic inferences than BLAST-based approaches at the cost of fewer total inferences. We demonstrate PPIT on deep-sea sediment and find that Deltaproteobacteria are the most abundant potential diazotrophs. Using this dataset we show that emending PPIT inferences based on visual inspection of query sequence placement can achieve taxonomic inferences for nearly all sequences in a query set. We additionally discuss how users can apply PPIT to the analysis of other marker genes.

Availability: PPIT is freely available to non-commercial users at https://github.com/bkapili/ppit. Installation includes a vignette that demonstrates package use and reproduces the nifH amplicon analysis discussed here. The raw nifH amplicon sequence data have been deposited in the GenBank, EMBL, and DDBJ databases under BioProject number PRJEB37167.

Supplementary information: Supplementary data are available at Bioinformatics online.
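
The two-condition rule described in the abstract above can be sketched as a small decision function. This is a hedged illustration in Python, not the PPIT R package's code or interface: the function name, the input layout and the threshold value are hypothetical. It also assumes that every reference in the neighborhood must meet the identity cutoff, which the abstract does not specify.

```python
def infer_taxon(neighbors, min_identity):
    """Return a taxon only when the neighborhood supports a safe inference.

    neighbors: list of (taxon, pairwise_identity) pairs for the reference
               sequences near the query placement (hypothetical layout).
    min_identity: assumed cutoff on pairwise identity, as a fraction.
    """
    if not neighbors:
        return None
    taxa = {taxon for taxon, _ in neighbors}
    # Condition 1: the neighborhood must be taxonomically consistent.
    if len(taxa) != 1:
        return None
    # Condition 2: references must share sufficient identity with the query.
    if any(identity < min_identity for _, identity in neighbors):
        return None
    return next(iter(taxa))


consistent = [("Deltaproteobacteria", 0.95), ("Deltaproteobacteria", 0.93)]
mixed = [("Deltaproteobacteria", 0.95), ("Gammaproteobacteria", 0.96)]
call = infer_taxon(consistent, min_identity=0.90)
```

Returning no inference when either condition fails mirrors the trade-off stated in the abstract: fewer total inferences in exchange for a higher proportion of correct ones.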


Diversity of GPI-anchored fungal adhesins

Biological Chemistry ◽

10.1515/hsz-2020-0199 ◽

2020 ◽

Vol 401 (12) ◽

pp. 1389-1405

Author(s):

Lars-Oliver Essen ◽

Marian Samuel Vogt ◽

Hans-Ulrich Mösch

Keyword(s):

Cell Wall ◽

Cell Surface ◽

Bioinformatics Analysis ◽

Current Knowledge ◽

Sequence Similarity ◽

Growth Forms ◽

Similarity Networks ◽

Fungal Adhesins ◽

Sequence Similarity Networks ◽

Successful Colonization

Abstract Selective adhesion of fungal cells to one another and to foreign surfaces is fundamental for the development of multicellular growth forms and the successful colonization of substrates and host organisms. Accordingly, fungi possess diverse cell wall-associated adhesins, mostly large glycoproteins, which present N-terminal adhesion domains at the cell surface for ligand recognition and binding. In order to function as robust adhesins, these glycoproteins must be covalently linked to the cell wall via C-terminal glycosylphosphatidylinositol (GPI) anchors by transglycosylation. In this review, we summarize the current knowledge on the structural and functional diversity of so far characterized protein families of adhesion domains and set it into a broad context by an in-depth bioinformatics analysis using sequence similarity networks. In addition, we discuss possible mechanisms for the membrane-to-cell wall transfer of fungal adhesins by membrane-anchored Dfg5 transglycosidases.


Community detection in sequence similarity networks based on attribute clustering

PLoS ONE ◽

10.1371/journal.pone.0178650 ◽

2017 ◽

Vol 12 (7) ◽

pp. e0178650

Author(s):

Janamejaya Chowdhary ◽

Frank E. Löffler ◽

Jeremy C. Smith

Keyword(s):

Community Detection ◽

Sequence Similarity ◽

Similarity Networks ◽

Sequence Similarity Networks ◽

Attribute Clustering


Bap-dependent biofilm formation by pathogenic species of Staphylococcus: evidence of horizontal gene transfer?

Microbiology ◽

10.1099/mic.0.27865-0 ◽

2005 ◽

Vol 151 (7) ◽

pp. 2465-2475 ◽

Cited By ~ 168

Author(s):

M. Ángeles Tormo ◽

Erwin Knecht ◽

Friedrich Götz ◽

Iñigo Lasa ◽

José R. Penadés

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Biofilm Formation ◽

Sequence Similarity ◽

Surface Protein ◽

Pathogenicity Island ◽

Alternative Mechanism ◽

Coagulase Negative Staphylococci ◽

High Sequence Similarity ◽

Staphylococcus Simulans

The biofilm-associated protein (Bap) is a surface protein implicated in biofilm formation by Staphylococcus aureus isolated from chronic mastitis infections. The bap gene is carried in a putative composite transposon inserted in SaPIbov2, a mobile staphylococcal pathogenicity island. In this study, bap orthologue genes from several staphylococcal species, including Staphylococcus epidermidis, Staphylococcus chromogenes, Staphylococcus xylosus, Staphylococcus simulans and Staphylococcus hyicus, were identified, cloned and sequenced. Sequence analysis comparison of the bap gene from these species revealed a very high sequence similarity, suggesting the horizontal gene transfer of SaPIbov2 amongst them. However, sequence analyses of the flanking region revealed that the bap gene of these species was not contained in the SaPIbov2 pathogenicity island. Although they did not contain the icaADBC operon, all the coagulase-negative staphylococcal isolates harbouring bap were strong biofilm producers. Disruption of the bap gene in S. epidermidis abolished its capacity to form a biofilm, whereas heterologous complementation of a biofilm-negative strain of S. aureus with the Bap protein from S. epidermidis bestowed the capacity to form a biofilm on a polystyrene surface. Altogether, these results demonstrate that Bap orthologues from coagulase-negative staphylococci induce an alternative mechanism of biofilm formation that is independent of the PIA/PNAG exopolysaccharide.


Exploring the sequence, function, and evolutionary space of protein superfamilies using sequence similarity networks and phylogenetic reconstructions

Methods in Enzymology - New Approaches for Flavin Catalysis ◽

10.1016/bs.mie.2019.03.015 ◽

2019 ◽

pp. 315-347 ◽

Cited By ~ 4

Author(s):

Janine N. Copp ◽

Dave W. Anderson ◽

Eyal Akiva ◽

Patricia C. Babbitt ◽

Nobuhiko Tokuriki

Keyword(s):

Sequence Similarity ◽

Protein Superfamilies ◽

Phylogenetic Reconstructions ◽

Similarity Networks ◽

Sequence Similarity Networks


Utilizing Whole Fusobacterium Genomes To Identify, Correct, and Characterize Potential Virulence Protein Families

Journal of Bacteriology ◽

10.1128/jb.00273-19 ◽

2019 ◽

Vol 201 (23) ◽

Cited By ~ 5

Author(s):

Ariana Umaña ◽

Blake E. Sanders ◽

Christopher C. Yoo ◽

Michael A. Casasanta ◽

Barath Udayasuryan ◽

...

Keyword(s):

Colorectal Cancer ◽

Virulence Factors ◽

Serine Proteases ◽

Sequence Similarity ◽

Protein Families ◽

Emerging Pathogens ◽

Content Type ◽

Virulence Protein ◽

Similarity Networks ◽

Sequence Similarity Networks

ABSTRACT Fusobacterium spp. are Gram-negative, anaerobic, opportunistic pathogens involved in multiple diseases, including a link between the oral pathogen Fusobacterium nucleatum and the progression and severity of colorectal cancer. The identification and characterization of virulence factors in the genus Fusobacterium has been greatly hindered by a lack of properly assembled and annotated genomes. Using newly completed genomes from nine strains and seven species of Fusobacterium, we report the identification and corrected annotation of verified and potential virulence factors from the type 5 secreted autotransporter, FadA, and MORN2 protein families, with a focus on the genetically tractable strain F. nucleatum subsp. nucleatum ATCC 23726 and type strain F. nucleatum subsp. nucleatum ATCC 25586. Within the autotransporters, we used sequence similarity networks to identify protein subsets and show a clear differentiation between the prediction of outer membrane adhesins, serine proteases, and proteins with unknown function. These data have identified unique subsets of type 5a autotransporters, which are key proteins associated with virulence in F. nucleatum. However, we coupled our bioinformatic data with bacterial binding assays to show that a predicted weakly invasive strain of F. necrophorum that lacks a Fap2 autotransporter adhesin strongly binds human colonocytes. These analyses confirm a gap in our understanding of how autotransporters, MORN2 domain proteins, and FadA adhesins contribute to host interactions and invasion. In summary, we identify candidate virulence genes in Fusobacterium, and caution that experimental validation of host-microbe interactions should complement bioinformatic predictions to increase our understanding of virulence protein contributions in Fusobacterium infections and disease.

IMPORTANCE Fusobacterium spp. are emerging pathogens that contribute to mammalian and human diseases, including colorectal cancer. Despite a validated connection with disease, few proteins have been characterized that define a direct molecular mechanism for Fusobacterium pathogenesis. We report a comprehensive examination of virulence-associated protein families in multiple Fusobacterium species and show that complete genomes facilitate the correction and identification of multiple, large type 5a secreted autotransporter genes in previously misannotated or fragmented genomes. In addition, we use protein sequence similarity networks and human cell interaction experiments to show that previously predicted noninvasive strains can indeed bind to and potentially invade human cells and that this could be due to the expansion of specific virulence proteins that drive Fusobacterium infections and disease.


A Clustering Method for Analysis of Sequence Similarity Networks of Proteins Using Maximal Components of Graphs

IPSJ Digital Courier ◽

10.2197/ipsjdc.4.207 ◽

2008 ◽

Vol 4 ◽

pp. 207-216

Author(s):

Morihiro Hayashida ◽

Tatsuya Akutsu ◽

Hiroshi Nagamochi

Keyword(s):

Sequence Similarity ◽

Clustering Method ◽

Similarity Networks ◽

Sequence Similarity Networks


Explanatory note on DNA sequence similarity searches in the context of the assessment of horizontal gene transfer from plants to microorganisms

EFSA Supporting Publications ◽

10.2903/sp.efsa.2015.en-916 ◽

2015 ◽

Vol 12 (12) ◽

Cited By ~ 2

Author(s):

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Dna Sequence ◽

Sequence Similarity ◽

Explanatory Note ◽

Similarity Searches


Novel 4-Chlorophenol Degradation Gene Cluster and Degradation Route via Hydroxyquinol in Arthrobacter chlorophenolicus A6

Applied and Environmental Microbiology ◽

10.1128/aem.71.11.6538-6544.2005 ◽

2005 ◽

Vol 71 (11) ◽

pp. 6538-6544 ◽

Cited By ~ 81

Author(s):

Karolina Nordin ◽

Maria Unell ◽

Janet K. Jansson

Keyword(s):

Gene Transfer ◽

Horizontal Gene Transfer ◽

Gene Cluster ◽

Microbial Degradation ◽

Codon Bias ◽

Sequence Similarity ◽

Open Reading Frames ◽

Genes Encoding ◽

Arthrobacter Chlorophenolicus ◽

Reading Frames

ABSTRACT Arthrobacter chlorophenolicus A6, a previously described 4-chlorophenol-degrading strain, was found to degrade 4-chlorophenol via hydroxyquinol, which is a novel route for aerobic microbial degradation of this compound. In addition, 10 open reading frames exhibiting sequence similarity to genes encoding enzymes involved in chlorophenol degradation were cloned and designated part of a chlorophenol degradation gene cluster (cph genes). Several of the open reading frames appeared to encode enzymes with similar functions; these open reading frames included two genes, cphA-I and cphA-II, which were shown to encode functional hydroxyquinol 1,2-dioxygenases. Disruption of the cphA-I gene yielded a mutant that exhibited negligible growth on 4-chlorophenol, thereby linking the cph gene cluster to functional catabolism of 4-chlorophenol in A. chlorophenolicus A6. The presence of a resolvase pseudogene in the cph gene cluster together with analyses of the G+C content and codon bias of flanking genes suggested that horizontal gene transfer was involved in assembly of the gene cluster during evolution of the ability of the strain to grow on 4-chlorophenol.
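
The G+C content mentioned in the abstract above is a simple proportion that can be compared between a gene cluster and its flanking genes. The sketch below is a minimal illustration with a hypothetical function name and made-up sequences; it is not the authors' analysis. It counts only unambiguous bases, which is an assumption about how ambiguous symbols such as N should be handled.

```python
def gc_content(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a DNA sequence.

    Returns None when the sequence contains no unambiguous bases.
    """
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        return None
    return sum(base in "GC" for base in bases) / len(bases)


# Made-up sequences, for illustration only.
cluster_like = "GGCCGGCCAT"
flanking_like = "ATATGCATAT"
difference = gc_content(cluster_like) - gc_content(flanking_like)
```

A marked difference between a region and its genomic surroundings is one of the signals that, together with codon bias, can hint at horizontal acquisition; on its own it is not conclusive.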


Sequence Similarity Networks for the Protein Universe

The FASEB Journal ◽

10.1096/fasebj.29.1_supplement.573.17 ◽

2015 ◽

Vol 29 (S1) ◽

Author(s):

Katie Whalen ◽

Boris Sadkhin ◽

Daniel Davidson ◽

John Gerlt

Keyword(s):

Sequence Similarity ◽

Protein Universe ◽

Similarity Networks ◽

Sequence Similarity Networks
