Computational Synteny Block: A Framework to Identify Evolutionary Events

Abstract Motivation: De novo assembly of reference-quality genomes used to require enormously laborious work. In particular, building genome markers for ordering assembled contigs along chromosomes is extremely time-consuming; such markers are therefore available only for well-established model organisms. To resolve this issue, recent studies demonstrated that Hi-C can be a powerful and cost-effective means of producing chromosome-length scaffolds for non-model species with no genome marker resources, because the Hi-C contact frequency between a pair of loci can be a good estimator of their genomic distance, even if there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, it remains challenging to reduce errors in contig orientation because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics. Results: To reduce these contig orientation errors, we propose a new algorithm, named HiC-Hiker, which has a firm grounding in probabilistic theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA using human and worm genome contigs generated from short reads, evaluated their performance, and observed a remarkable reduction in the contig orientation error rate from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm can consider long-range information between distal contigs and precisely estimates Hi-C read contact probabilities among contigs, which may also be useful for determining the ordering of contigs. Availability and implementation: HiC-Hiker is freely available at https://github.com/ryought/hic_hiker.
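The idea of inferring the most probable orientations with the Viterbi algorithm can be illustrated with a minimal Python sketch. It is not the HiC-Hiker implementation: it assumes the contig order is already fixed, that each contig is either forward ("+") or reversed ("-"), and that a log-likelihood is available for every orientation pair of adjacent contigs only. The function name `viterbi_orientations` and the scores are hypothetical, and the long-range contacts between distal contigs that the abstract mentions are not modeled here.

```python
from typing import Dict, List, Tuple

ORIENTATIONS = ("+", "-")
# Hypothetical input type: log-likelihood for each (left, right) orientation pair.
PairScores = Dict[Tuple[str, str], float]


def viterbi_orientations(pair_scores: List[PairScores]) -> List[str]:
    """Return the highest-scoring orientation for each contig in a fixed order.

    pair_scores[i][(a, b)] is the assumed log-likelihood of the contacts
    between contig i in orientation a and contig i + 1 in orientation b.
    """
    # best[o]: best total score of any path ending with the current contig in o.
    best = {o: 0.0 for o in ORIENTATIONS}
    back: List[Dict[str, str]] = []  # back-pointers, one dict per adjacent pair

    for scores in pair_scores:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for b in ORIENTATIONS:
            # Pick the orientation of the previous contig that maximizes the score.
            prev = max(ORIENTATIONS, key=lambda a: best[a] + scores[(a, b)])
            new_best[b] = best[prev] + scores[(prev, b)]
            pointers[b] = prev
        best = new_best
        back.append(pointers)

    # Trace back from the best final orientation to the first contig.
    path = [max(ORIENTATIONS, key=lambda o: best[o])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Made-up scores for three contigs (two adjacent pairs).
scores_01 = {("+", "+"): -1.0, ("+", "-"): -5.0, ("-", "+"): -4.0, ("-", "-"): -3.0}
scores_12 = {("+", "+"): -6.0, ("+", "-"): -1.0, ("-", "+"): -2.0, ("-", "-"): -7.0}
print(viterbi_orientations([scores_01, scores_12]))  # ['+', '+', '-']
```

Restricting the model to adjacent pairs keeps the dynamic program to two states per contig; a model that uses contacts between non-adjacent contigs would need a larger state space.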

mySyntenyPortal: an application package to construct websites for synteny block analysis

BMC Bioinformatics ◽

10.1186/s12859-018-2219-x ◽

2018 ◽

Vol 19 (1) ◽

Cited By ~ 2

Author(s):

Jongin Lee ◽

Daehwan Lee ◽

Mikang Sim ◽

Daehong Kwon ◽

Juyeon Kim ◽

...

Keyword(s):

Synteny Block ◽

Application Package

Phylogenetic Reconstruction Based on Synteny Block and Gene Adjacencies

Molecular Biology and Evolution ◽

10.1093/molbev/msaa114 ◽

2020 ◽

Vol 37 (9) ◽

pp. 2747-2762 ◽

Cited By ~ 4

Author(s):

Guénola Drillon ◽

Raphaël Champeimont ◽

Francesco Oteri ◽

Gilles Fischer ◽

Alessandra Carbone

Keyword(s):

Chromosomal Rearrangements ◽

Phylogenetic Reconstruction ◽

Simulated Data ◽

Synteny Block ◽

Data Sets ◽

Reconstruction Method ◽

Reconstruction Methods ◽

Wide Range ◽

Genomic Markers ◽

Almost All

Abstract Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently from the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint issued from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, named partial splits, respectively, supporting the two block adjacencies defining the breakpoint. Considering all partial splits issued from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping sister genomes minimizing genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two data sets of 13 vertebrates and 21 yeast genomes by using up to 130,000 and 179,000 breakpoints, respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data PhyChro reconstructs phylogenies perfectly in almost all cases, and shows the highest accuracy compared with other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in <15 min.
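The notion of a synteny breakpoint between two genomes can be made concrete with a short Python sketch. This is not PhyChro and does not compute partial splits; it only counts, under simplifying assumptions, the block adjacencies of one genome that are absent from another. The assumptions are that each genome is a single chromosome given as an ordered list of shared block identifiers and that block orientation is ignored. The function names are hypothetical.

```python
from typing import FrozenSet, List, Set


def adjacencies(order: List[str]) -> Set[FrozenSet[str]]:
    """Return the set of unordered pairs of neighboring blocks."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def count_breakpoints(order_a: List[str], order_b: List[str]) -> int:
    """Count adjacencies of genome A that are not adjacencies in genome B."""
    return len(adjacencies(order_a) - adjacencies(order_b))


# Genome B differs from genome A by one inversion of the segment b3-b4.
genome_a = ["b1", "b2", "b3", "b4", "b5"]
genome_b = ["b1", "b2", "b4", "b3", "b5"]
print(count_breakpoints(genome_a, genome_b))  # 2
```

The inversion breaks the adjacencies b2-b3 and b4-b5, while b3-b4 survives because orientation is ignored in this sketch.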

halSynteny: a fast, easy-to-use conserved synteny block construction method for multiple whole-genome alignments

GigaScience ◽

10.1093/gigascience/giaa047 ◽

2020 ◽

Vol 9 (6) ◽

Author(s):

Ksenia Krasheninnikova ◽

Mark Diekhans ◽

Joel Armstrong ◽

Aleksei Dievskii ◽

Benedict Paten ◽

...

Keyword(s):

Large Scale ◽

Pairwise Alignment ◽

Synteny Block ◽

Rapid Identification ◽

Full Genome Sequence ◽

Whole Genome ◽

Full Genome ◽

Genome Data ◽

Multiple Alignments ◽

Binary Format

Abstract Background: Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequence stored in a binary format demand an accurate and efficient computational approach for synteny block production. Findings: halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is targeted to build synteny blocks for multi-hundred–way, reference-free vertebrate alignments built with the Cactus system. Conclusions: halSynteny enables an accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under the MIT license. The package is available at https://github.com/ComparativeGenomicsToolkit/hal/.

Engineering Elizabethkingia meningoseptica sp. F2 for Vitamin K2 production guided by genome analysis.

10.21203/rs.3.rs-25351/v1 ◽

2020 ◽

Author(s):

Qiang Yang ◽

Zhiming Zheng ◽

Hui Liu ◽

Peng Wang ◽

Li Wang ◽

...

Keyword(s):

Metabolic Pathway ◽

Genome Analysis ◽

Genomic Sequence ◽

Efflux Pump ◽

Synteny Block ◽

Vitamin K2 ◽

Evolutionary Status ◽

Multidrug Efflux Pump ◽

Elizabethkingia Meningoseptica ◽

A Genome

Abstract Background: Strains of Elizabethkingia meningoseptica are of interest for investigating Vitamin K2 metabolism. However, their genomic sequence, metabolic pathways, potential abilities, and evolutionary status are still unknown. Results: This study therefore sequenced the genome of Elizabethkingia meningoseptica sp. F2 and compared it with other Vitamin K2 strains to identify metabolic genes that are unique to it or shared across genomes. The 3,874,794-base pair sequence of Elizabethkingia meningoseptica sp. F2 is presented. Annotation was applied to 3,539 genes. Synteny block analysis showed that Elizabethkingia meningoseptica sp. F2 shares high levels of synteny with Elizabethkingia meningoseptica ATCC 13253 and Elizabethkingia meningoseptica NBRC 12535. The Vitamin K2 metabolic pathway in Elizabethkingia meningoseptica sp. F2 was also identified. In addition, Elizabethkingia meningoseptica sp. F2 was resistant to gentamicin, streptomycin, ampicillin and caramycin, consistent with the presence of multiple genes encoding diverse multidrug efflux pump proteins in the genome. Furthermore, co-overexpression of MenA and MenG enhanced Vitamin K2 content by 37% compared with the control strain. Conclusions: The genome analysis of Elizabethkingia meningoseptica sp. F2, in conjunction with the comparative analysis of metabolic pathways among E. coli, Bacillus subtilis and Streptomyces, provided useful information on the Vitamin K2 biosynthetic pathway and other related pathways at the systems level.

Synteny Portal: a web-based application portal for synteny block analysis

Nucleic Acids Research ◽

10.1093/nar/gkw310 ◽

2016 ◽

Vol 44 (W1) ◽

pp. W35-W40 ◽

Cited By ~ 24

Author(s):

Jongin Lee ◽

Woon-young Hong ◽

Minah Cho ◽

Mikang Sim ◽

Daehwan Lee ◽

...

Keyword(s):

Synteny Block ◽

Web Based

Computational Synteny Block: A framework to identify evolutionary events

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) ◽

10.1109/bibm.2015.7359648 ◽

2015 ◽

Author(s):

Jose A. Arjona-Medina ◽

Oswaldo Trelles

Keyword(s):

Synteny Block

HOMOLOGOUS SYNTENY BLOCK DETECTION BASED ON SUFFIX TREE ALGORITHMS

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001343004x ◽

2013 ◽

Vol 11 (06) ◽

pp. 1343004 ◽

Cited By ~ 1

Author(s):

YU-LUN CHEN ◽

CHIEN-MING CHEN ◽

TUN-WEN PAI ◽

HON-WAI LEONG ◽

KET-FAH CHONG

Keyword(s):

Suffix Tree ◽

Genome Rearrangement ◽

Gene Annotation ◽

Orthologous Gene ◽

Synteny Block ◽

Whole Genome Sequence ◽

Synteny Blocks ◽

Tree Algorithms ◽

Multiple Species ◽

Conserved Gene

A synteny block represents a set of contiguous genes located within the same chromosome and well conserved among various species. Through long evolutionary processes and genome rearrangement events, large numbers of synteny blocks remain highly conserved across multiple species. Understanding the distribution of conserved gene blocks helps evolutionary biologists trace the diversity of life, and it also plays an important role in orthologous gene detection and gene annotation in the genomic era. In this work, we focus on collinear synteny detection, in which the order of genes is required to be well conserved among multiple species. To achieve this goal, we propose suffix tree-based algorithms for efficiently identifying homologous synteny blocks. The traditional suffix tree algorithm was modified by treating a chromosome as a string and encoding each gene in the chromosome as a symbol. Hence, a suffix tree can be built for different query chromosomes from various species. We can then efficiently search for conserved synteny blocks, which are modeled as overlapped contiguous edges in our suffix tree. In addition, we defined a novel Synteny Block Conserved Index (SBCI) to evaluate the relationship of synteny block distribution between two species, which could be applied as an evolutionary indicator for constructing a phylogenetic tree from multiple species instead of incurring the large computational cost of whole-genome sequence alignment.
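Treating a chromosome as a string whose symbols are genes can be illustrated with a small Python sketch. It does not build a suffix tree: it swaps in a plain dynamic-programming scan for shared contiguous runs, which is slower but easier to follow. It assumes two chromosomes given as lists of ortholog identifiers in which the same identifier marks the same gene in both species. The function name `collinear_blocks` and the `min_len` parameter are hypothetical.

```python
from typing import List, Tuple


def collinear_blocks(
    chrom_a: List[str], chrom_b: List[str], min_len: int = 2
) -> List[Tuple[str, ...]]:
    """Return maximal gene runs that are contiguous and in the same order in both chromosomes."""
    blocks: List[Tuple[str, ...]] = []
    # prev[j]: length of the shared run ending at the previous gene of A and gene j of B.
    prev = [0] * (len(chrom_b) + 1)
    for i, gene_a in enumerate(chrom_a, start=1):
        curr = [0] * (len(chrom_b) + 1)
        for j, gene_b in enumerate(chrom_b, start=1):
            if gene_a != gene_b:
                continue
            curr[j] = prev[j - 1] + 1
            # The run is maximal when the next genes differ or a chromosome ends.
            at_end = i == len(chrom_a) or j == len(chrom_b)
            if (at_end or chrom_a[i] != chrom_b[j]) and curr[j] >= min_len:
                blocks.append(tuple(chrom_a[i - curr[j]:i]))
        prev = curr
    return blocks


# Species B carries the same genes as species A, with the two halves swapped.
species_a = ["g1", "g2", "g3", "g4", "g5", "g6"]
species_b = ["g4", "g5", "g6", "g1", "g2", "g3"]
print(collinear_blocks(species_a, species_b))
# [('g1', 'g2', 'g3'), ('g4', 'g5', 'g6')]
```

This scan takes time proportional to the product of the two chromosome lengths, which is the cost that an index structure such as a suffix tree is meant to avoid when many chromosomes are compared.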

Phylogenetic reconstruction based on synteny block and gene adjacencies

10.1101/840942 ◽

2019 ◽

Author(s):

Guénola Drillon ◽

Raphaël Champeimont ◽

Francesco Oteri ◽

Gilles Fischer ◽

Alessandra Carbone

Keyword(s):

Chromosomal Rearrangements ◽

Phylogenetic Reconstruction ◽

Simulated Data ◽

Synteny Block ◽

Reconstruction Method ◽

Reconstruction Methods ◽

Wide Range ◽

Genomic Markers ◽

Tree Topologies ◽

Almost All

Abstract Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently from the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint issued from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, named partial splits, respectively supporting the two block adjacencies defining the breakpoint. Considering all partial splits issued from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping sister genomes minimizing genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two datasets of 13 vertebrates and 21 yeast genomes by using up to 130 000 and 179 000 breakpoints respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data PhyChro reconstructs phylogenies perfectly in almost all cases, and shows the highest accuracy compared to other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in less than 15 min. Availability PhyChro will be freely available under the BSD license after [email protected]

Genome diversity and species richness in mammals

10.1101/709311 ◽

2019 ◽

Author(s):

John Herrick ◽

Bianca Sclavi

Keyword(s):

Species Richness ◽

Species Diversity ◽

Genome Size ◽

Chromosomal Rearrangements ◽

Synteny Block ◽

Genome Diversity ◽

Evolutionary Changes ◽

Family Level ◽

Size Standard ◽

Different Levels

Abstract Evolutionary changes in karyotype have long been implicated in speciation events; however, the phylogenetic relationship between karyotype diversity and species richness in closely and distantly related mammalian lineages remains to be fully elucidated. Here we examine the association between genome diversity and species diversity across the class Mammalia. We tested five different metrics of genome diversity: clade-average genome size, standard deviation of genome size, diploid and fundamental numbers (karyotype diversity), sub-chromosomal rearrangements and percent synteny block conservation. We found a significant association between species richness (phylogenetic clade diversity) and genome diversity at both order and family level clades. Karyotype diversity provided the strongest support for a relationship between genome diversity and species diversity. Our results suggest that lineage-specific variations in genome and karyotype stability can account for different levels of species diversity in mammals.
