Alignment-free microbial phylogenomics under scenarios of sequence divergence, genome rearrangement and lateral genetic transfer

2016 ◽  
Vol 6 (1) ◽  
Author(s):  
Guillaume Bernard ◽  
Cheong Xin Chan ◽  
Mark A. Ragan
mSystems ◽  
2018 ◽  
Vol 3 (6) ◽  
Author(s):  
Guillaume Bernard ◽  
Paul Greenfield ◽  
Mark A. Ragan ◽  
Cheong Xin Chan

ABSTRACT Microbial genomes have been shaped by parent-to-offspring (vertical) descent and lateral genetic transfer. These processes can be distinguished by alignment-based inference and comparison of phylogenetic trees for individual gene families, but this approach is not scalable to whole-genome sequences, and a tree-like structure does not adequately capture how these processes impact microbial physiology. Here we adopted alignment-free approaches based on k-mer statistics to infer phylogenomic networks involving 2,783 completely sequenced bacterial and archaeal genomes and compared the contributions of rRNA, protein-coding, and plasmid sequences to these networks. Our results show that the phylogenomic signal arising from ribosomal RNAs is strong and extends broadly across all taxa, whereas that from plasmids is strong but restricted to closely related groups, particularly Proteobacteria. However, the signal from the other chromosomal regions is restricted in breadth. We show that mean k-mer similarity can correlate with taxonomic rank. We also link the implicated k-mers to genome annotation (thus, functions) and define core k-mers (thus, core functions) in specific phyletic groups. Highly conserved functions in most phyla include amino acid metabolism and transport as well as energy production and conversion. Intracellular trafficking and secretion are the most prominent core functions among Spirochaetes, whereas energy production and conversion are not highly conserved among the largely parasitic or commensal Tenericutes. These observations suggest that differential conservation of functions relates to niche specialization and evolutionary diversification of microbes. Our results demonstrate that k-mer approaches can be used to efficiently identify phylogenomic signals and conserved core functions at the multigenome scale.
IMPORTANCE Genome evolution of microbes involves parent-to-offspring descent, and lateral genetic transfer that convolutes the phylogenomic signal. This study investigated phylogenomic signals among thousands of microbial genomes based on short subsequences without using multiple-sequence alignment. The signal from ribosomal RNAs is strong across all taxa, and the signal of plasmids is strong only in closely related groups, particularly Proteobacteria. However, the signal from other chromosomal regions (∼99% of the genomes) is remarkably restricted in breadth. The similarity of subsequences is found to correlate with taxonomic rank and informs on conserved and differential core functions relative to niche specialization and evolutionary diversification of microbes. These results provide a comprehensive, alignment-free view of microbial genome evolution as a network, beyond a tree-like structure.
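The idea of core k-mers can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a core k-mer is one present in every genome of a group; the abstract does not state the exact criterion used in the study. The function names, the toy sequences and the choice of k = 5 are all hypothetical, and reverse complements are ignored.

```python
def kmers(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def core_kmers(genomes, k):
    """Return the k-mers found in every genome of the group.

    Assumption: "core" means present in all members; the study
    may use a different threshold.
    """
    sets = [kmers(seq, k) for seq in genomes.values()]
    return set.intersection(*sets) if sets else set()


# Toy group of three made-up sequences that share the stretch ATGCGTACG.
toy_group = {
    "genome_a": "ATGCGTACGTTAG",
    "genome_b": "TTATGCGTACGAA",
    "genome_c": "CCATGCGTACGGT",
}
core = core_kmers(toy_group, 5)
print(sorted(core))
```

In a real analysis, each core k-mer would then be mapped back to the annotated genes that contain it in order to suggest core functions, a step not shown here.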


PLoS ONE ◽  
2009 ◽  
Vol 4 (2) ◽  
pp. e4524 ◽  
Author(s):  
Cheong Xin Chan ◽  
Aaron E. Darling ◽  
Robert G. Beiko ◽  
Mark A. Ragan

2019 ◽  
Author(s):  
Tianyu Han (Former Corresponding Author) ◽  
Mimi Li ◽  
Jiawei Li ◽  
Han Lv ◽  
Bingru Ren ◽  
...  

Abstract Background: Some Gynura species have been reported to be natural anti-diabetic plants. The chloroplast genomes of four Gynura species were sequenced to support hybridization aimed at improving agronomic traits. To date, only 4 genera of the tribe Senecioneae have published chloroplast genomes in GenBank. The internal relationships of the genus Gynura, and its relationships with other genera in the tribe Senecioneae, need further research. Results: The chloroplast genomes of 4 Gynura species were sequenced, assembled and annotated. Their features were analyzed in detail in comparison with 12 other Senecioneae species. Differences in microsatellite and repeat types within the tribe were then identified. The comparison shows that IR expansion and contraction are conserved in the genera Gynura, Dendrosenecio and Ligularia. The region from 25,000 to 50,000 bp is relatively poorly conserved, but the 7 ndh genes in this region are under purifying selection, with few amino acid changes. The phylogenetic tree shows two major clades, consistent with the sequence divergence in the region from 25,000 to 50,000 bp. Divergence times were estimated based on the oldest Artemisia pollen fossil. Conclusions: Sequencing the chloroplast genomes of the 4 Gynura species helps to develop abundant genetic resources. The phylogenetic relationships and divergence times among 4 Gynura and 16 Senecioneae species were resolved by comparing the chloroplast genomes. The phylogenetic relationship between the genera Gynura and Ligularia differs from previous work, and further morphological and genome-wide analyses are needed to clarify the relationships among the genera.


F1000Research ◽  
2016 ◽  
Vol 5 ◽  
pp. 2789
Author(s):  
Guillaume Bernard ◽  
Mark A. Ragan ◽  
Cheong Xin Chan

Ernst Haeckel based his landmark Tree of Life on the supposed ontogenic recapitulation of phylogeny, i.e. that successive embryonic stages during the development of an organism re-trace the morphological forms of its ancestors over the course of evolution. Much of this idea has since been discredited. Today, phylogenies are often based on molecular sequences. A typical phylogenetic inference aims to capture and represent, in the form of a tree, the evolutionary history of a family of molecular sequences. The standard approach starts with a multiple sequence alignment, in which the sequences are arranged relative to each other in a way that maximises a measure of similarity position-by-position along their entire length. However, this approach ignores important evolutionary processes that are known to shape the genomes of microbes (bacteria, archaea and some morphologically simple eukaryotes). Recombination, genome rearrangement and lateral genetic transfer undermine the assumptions that underlie multiple sequence alignment, and imply that a tree-like structure may be too simplistic. Here, using genome sequences of 143 bacterial and archaeal genomes, we construct a network of phylogenetic relatedness based on the number of shared k-mers (subsequences at fixed length k). Our findings suggest that the network captures not only key aspects of microbial genome evolution as inferred from a tree, but also features that are not treelike. The method is highly scalable, allowing for investigation of genome evolution across a large number of genomes. Instead of using specific regions or sequences from genome sequences, or indeed Haeckel’s idea of ontogeny, we argue that genome phylogenies can be inferred using k-mers from whole-genome sequences. Representing these networks dynamically allows biological questions of interest to be formulated and addressed quickly and in a visually intuitive manner.
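A network built from shared k-mers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: relatedness is taken to be the raw count of distinct k-mers two genomes have in common, an edge is kept when that count reaches a hypothetical `min_shared` threshold, and the sequences, names and k = 5 are invented for the example.

```python
from itertools import combinations


def kmer_set(seq, k):
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_kmer_network(genomes, k, min_shared=1):
    """Return {(name_a, name_b): shared_count} for each genome pair
    that shares at least min_shared distinct k-mers."""
    sets = {name: kmer_set(seq, k) for name, seq in genomes.items()}
    edges = {}
    for a, b in combinations(sorted(sets), 2):
        shared = len(sets[a] & sets[b])
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Toy input: g1 and g2 share a stretch of sequence, g3 is unrelated.
toy_genomes = {
    "g1": "ATGCGTACGTTAG",
    "g2": "TTATGCGTACGAA",
    "g3": "GGGGCCCCAAAA",
}
edges = shared_kmer_network(toy_genomes, k=5)
print(edges)
```

Raising `min_shared` prunes weaker edges, which is one simple way to explore such a network at different levels of relatedness.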


2016 ◽  
Vol 14 (01) ◽  
pp. 1640003 ◽  
Author(s):  
Bingxin Lu ◽  
Hon Wai Leong

Genomic islands (GIs) are clusters of functionally related genes acquired by lateral genetic transfer (LGT), and they are present in many bacterial genomes. GIs are extremely important for bacterial research, because they not only promote genome evolution but also contain genes that enhance adaptation and enable antibiotic resistance. Many methods have been proposed to predict GIs, but most of them rely on either annotations or comparisons with other closely related genomes. Hence, these methods cannot be easily applied to new genomes. As the number of newly sequenced bacterial genomes rapidly increases, there is a need for methods that detect GIs based solely on the sequence of a single genome. In this paper, we propose a novel method, GI-SVM, to predict GIs given only the unannotated genome sequence. GI-SVM is based on a one-class support vector machine (SVM) and utilizes composition bias in terms of k-mer content. In our evaluations on three real genomes, GI-SVM achieves higher recall than current methods, without much loss of precision. In addition, GI-SVM allows flexible parameter tuning to obtain optimal results for each genome. In short, GI-SVM provides a more sensitive method for researchers interested in a first-pass detection of GIs in newly sequenced genomes.
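The notion of composition bias in k-mer content can be illustrated with a short Python sketch. It is not GI-SVM: the one-class SVM is replaced here by a much simpler stand-in, the Manhattan distance between each window's k-mer profile and the genome-wide profile. The function names, window size, step and toy sequence are all hypothetical.

```python
from collections import Counter
from itertools import product


def kmer_profile(seq, k):
    """Relative frequency of every possible DNA k-mer in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = max(sum(counts.values()), 1)
    return [counts["".join(p)] / total for p in product("ACGT", repeat=k)]


def window_scores(genome, k, window, step):
    """Score each window by the Manhattan distance between its k-mer
    profile and the genome-wide profile (a stand-in for an SVM score)."""
    background = kmer_profile(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        profile = kmer_profile(genome[start:start + window], k)
        distance = sum(abs(p - b) for p, b in zip(profile, background))
        scores.append((start, distance))
    return scores


# Toy genome: AT-rich background with a GC-rich insert at position 120.
toy = "AT" * 60 + "GC" * 20 + "AT" * 60
scores = window_scores(toy, k=2, window=40, step=40)
top = max(scores, key=lambda s: s[1])
print(top[0])
```

Windows with unusually high scores are candidate regions of atypical composition. A one-class classifier, as described in the abstract, would learn the boundary of typical composition from the window profiles instead of relying on a single distance.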

