Computational Synteny Block: A Framework to Identify Evolutionary Events

2016 ◽  
Vol 15 (4) ◽  
pp. 343-353 ◽  
Author(s):  
Jose A. Arjona-Medina ◽  
Oswaldo Trelles
2020 ◽  
Vol 36 (13) ◽  
pp. 3966-3974
Author(s):  
Ryo Nakabayashi ◽  
Shinichi Morishita

Abstract Motivation De novo assembly of reference-quality genomes used to require enormously laborious work. In particular, it is extremely time-consuming to build genome markers for ordering assembled contigs along chromosomes; thus, they are only available for well-established model organisms. To resolve this issue, recent studies demonstrated that Hi-C could be a powerful and cost-effective means to produce chromosome-length scaffolds for non-model species with no genome marker resources, because the Hi-C contact frequency between a pair of loci can be a good estimator of their genomic distance, even if there is a large gap between them. Indeed, state-of-the-art methods such as 3D-DNA are now widely used for locating contigs in chromosomes. However, it remains challenging to reduce errors in contig orientation because shorter contigs have fewer contacts with their neighboring contigs. These orientation errors lower the accuracy of gene prediction, read alignment, and synteny block estimation in comparative genomics. Results To reduce these contig orientation errors, we propose a new algorithm, named HiC-Hiker, which has a firm grounding in probabilistic theory, rigorously models Hi-C contacts across contigs, and effectively infers the most probable orientations via the Viterbi algorithm. We compared HiC-Hiker and 3D-DNA using human and worm genome contigs generated from short reads, evaluated their performances, and observed a remarkable reduction in the contig orientation error rate from 4.3% (3D-DNA) to 1.7% (HiC-Hiker). Our algorithm can consider long-range information between distal contigs and precisely estimate Hi-C read contact probabilities among contigs, which may also be useful for determining the ordering of contigs. Availability and implementation HiC-Hiker is freely available at: https://github.com/ryought/hic_hiker.
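To illustrate the decoding step named in the HiC-Hiker abstract, here is a toy Viterbi sketch in Python. It assumes a simplified first-order model in which each contig, in its given order, is either forward (+) or reversed (-), and a hypothetical `pair_score(i, oi, oj)` callback supplies the log-likelihood of contig i in orientation oi followed by contig i+1 in orientation oj. HiC-Hiker's actual model also uses long-range contacts between distal contigs, so this is not its implementation; the function name and the scores are illustrative only.

```python
def viterbi_orientations(n, pair_score):
    """Return (orientation string, total score) maximising the summed scores.

    Hypothetical first-order model: only adjacent contigs i and i+1 interact,
    via pair_score(i, oi, oj) with oi, oj in {"+", "-"}.
    """
    states = ("+", "-")
    if n == 0:
        return "", 0.0
    # best[s] = best total score of a path over contigs 0..i ending in state s
    best = {s: 0.0 for s in states}
    back = []  # back[i][s] = best orientation of contig i given contig i+1 is s
    for i in range(n - 1):
        new_best, pointers = {}, {}
        for sj in states:
            # pick the predecessor orientation that maximises the running score
            prev, score = max(
                ((si, best[si] + pair_score(i, si, sj)) for si in states),
                key=lambda t: t[1],
            )
            new_best[sj], pointers[sj] = score, prev
        best = new_best
        back.append(pointers)
    # trace back from the best orientation of the last contig
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return "".join(reversed(path)), best[last]


if __name__ == "__main__":
    # toy log-scores: key (i, oi, oj) scores contig i followed by contig i+1
    scores = {
        (0, "+", "-"): 2.0, (0, "+", "+"): 0.5, (0, "-", "+"): 0.1, (0, "-", "-"): 0.0,
        (1, "-", "+"): 3.0, (1, "-", "-"): 0.2, (1, "+", "+"): 0.4, (1, "+", "-"): 0.3,
    }
    print(viterbi_orientations(3, lambda i, a, b: scores[(i, a, b)]))  # ('+-+', 5.0)
```

Dynamic programming keeps the search linear in the number of contigs instead of enumerating all 2^n orientation combinations.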


2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Jongin Lee ◽  
Daehwan Lee ◽  
Mikang Sim ◽  
Daehong Kwon ◽  
Juyeon Kim ◽  
...  

2020 ◽  
Vol 37 (9) ◽  
pp. 2747-2762 ◽  
Author(s):  
Guénola Drillon ◽  
Raphaël Champeimont ◽  
Francesco Oteri ◽  
Gilles Fischer ◽  
Alessandra Carbone

Abstract Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently of the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint arising from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, called partial splits, each supporting one of the two block adjacencies that define the breakpoint. Considering all partial splits arising from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping the sister genomes that minimize genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two data sets of 13 vertebrate and 21 yeast genomes by using up to 130,000 and 179,000 breakpoints, respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data, PhyChro reconstructs phylogenies perfectly in almost all cases and shows the highest accuracy compared with other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in <15 min.
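A minimal Python sketch of the distance idea in the PhyChro abstract, under stated assumptions: each partial split is represented as a pair (A, B) of disjoint genome sets, the distance between two genomes is taken to be the raw count of splits that place them on opposite sides, and the next pair to group bottom-up is simply the closest one. PhyChro's exact distance formula, tie-breaking and update rules are not given in the abstract, so the helper names `split_distance` and `closest_pair` are hypothetical.

```python
from itertools import combinations


def split_distance(genomes, partial_splits):
    """Count, for every genome pair, the partial splits that separate them.

    Each partial split is a pair (A, B) of disjoint genome sets; it separates
    genomes g and h when one lies in A and the other in B.
    """
    dist = {frozenset(p): 0 for p in combinations(genomes, 2)}
    for a, b in partial_splits:
        for g in a:
            for h in b:
                dist[frozenset((g, h))] += 1
    return dist


def closest_pair(dist):
    """Return the genome pair with the smallest distance (first one on ties)."""
    return min(dist, key=dist.get)


if __name__ == "__main__":
    genomes = ["A", "B", "C", "D"]
    splits = [({"A", "B"}, {"C"}), ({"A", "B"}, {"D"}),
              ({"A"}, {"C", "D"}), ({"B"}, {"C", "D"})]
    d = split_distance(genomes, splits)
    print(sorted(closest_pair(d)))  # ['A', 'B']
```

In a full bottom-up reconstruction the chosen pair would be merged into one node and distances recomputed; that update step is left out because the abstract does not specify it.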


GigaScience ◽  
2020 ◽  
Vol 9 (6) ◽  
Author(s):  
Ksenia Krasheninnikova ◽  
Mark Diekhans ◽  
Joel Armstrong ◽  
Aleksei Dievskii ◽  
Benedict Paten ◽  
...  

Abstract Background Large-scale sequencing projects provide high-quality full-genome data that can be used for reconstruction of chromosomal exchanges and rearrangements that disrupt conserved syntenic blocks. The highest resolution of cross-species homology can be obtained on the basis of whole-genome, reference-free alignments. Very large multiple alignments of full-genome sequence stored in a binary format demand an accurate and efficient computational approach for synteny block production. Findings halSynteny performs efficient processing of pairwise alignment blocks for any pair of genomes in the alignment. The tool is part of the HAL comparative genomics suite and is targeted to build synteny blocks for multi-hundred–way, reference-free vertebrate alignments built with the Cactus system. Conclusions halSynteny enables an accurate and rapid identification of synteny in multiple full-genome alignments. The method is implemented in C++11 as a component of the halTools software and released under MIT license. The package is available at https://github.com/ComparativeGenomicsToolkit/hal/.
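The halSynteny abstract does not describe its chaining rules, so the following is only a generic Python sketch of how pairwise alignment blocks on one chromosome pair can be merged into synteny blocks. It assumes half-open coordinates, a hypothetical `max_gap` threshold on the gap allowed in both genomes, and a hypothetical `min_len` cutoff on the final reference span; it is not the algorithm in halTools.

```python
from dataclasses import dataclass


@dataclass
class Block:
    ref_start: int  # half-open [start, end) on the reference chromosome
    ref_end: int
    qry_start: int  # half-open [start, end) on the query chromosome
    qry_end: int
    strand: str     # "+" collinear, "-" inverted


def merge_blocks(blocks, max_gap, min_len):
    """Greedily chain nearby, non-overlapping, same-strand alignment blocks."""
    merged = []
    for b in sorted(blocks, key=lambda x: x.ref_start):
        if merged:
            cur = merged[-1]
            ref_gap = b.ref_start - cur.ref_end
            # on "+" the query must move forward; on "-" it must move backward
            qry_gap = (b.qry_start - cur.qry_end if b.strand == "+"
                       else cur.qry_start - b.qry_end)
            if (b.strand == cur.strand and 0 <= ref_gap <= max_gap
                    and 0 <= qry_gap <= max_gap):
                cur.ref_end = max(cur.ref_end, b.ref_end)
                cur.qry_start = min(cur.qry_start, b.qry_start)
                cur.qry_end = max(cur.qry_end, b.qry_end)
                continue
        # start a new synteny block (copy so the input is not mutated)
        merged.append(Block(b.ref_start, b.ref_end, b.qry_start, b.qry_end, b.strand))
    # drop chains whose reference span is below the size cutoff
    return [m for m in merged if m.ref_end - m.ref_start >= min_len]


if __name__ == "__main__":
    blocks = [Block(0, 100, 1000, 1100, "+"), Block(150, 300, 1130, 1280, "+"),
              Block(5000, 5050, 200, 250, "+"), Block(6000, 6400, 800, 1200, "-"),
              Block(6450, 6600, 600, 780, "-")]
    # two synteny blocks survive: ref 0-300 on "+" and ref 6000-6600 on "-"
    for sb in merge_blocks(blocks, max_gap=100, min_len=150):
        print(sb)
```

Greedy chaining in reference order is the simplest choice; overlapping blocks and nested rearrangements are deliberately left unmerged here.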


2020 ◽  
Author(s):  
Qiang Yang ◽  
Zhiming Zheng ◽
Hui Liu ◽  
Peng Wang ◽  
Li Wang ◽  
...  

Abstract Background Strains of Elizabethkingia meningoseptica are of interest for investigating vitamin K2 metabolism. However, their genomic sequences, metabolic pathways, potential capabilities, and evolutionary status remain unknown. Results This study therefore aimed to sequence the genome of Elizabethkingia meningoseptica sp. F2 and to perform a comparative analysis with other vitamin K2-producing strains, identifying unique and shared metabolic genes across genomes. The 3,874,794-base-pair genome sequence of Elizabethkingia meningoseptica sp. F2 is presented, and 3,539 genes were annotated. Synteny block analysis demonstrated that Elizabethkingia meningoseptica sp. F2 shares high levels of synteny with Elizabethkingia meningoseptica ATCC 13253 and Elizabethkingia meningoseptica NBRC 12535. The vitamin K2 metabolic pathway in Elizabethkingia meningoseptica sp. F2 was also identified. In addition, Elizabethkingia meningoseptica sp. F2 was resistant to gentamicin, streptomycin, ampicillin and caramycin, consistent with the presence of multiple genes encoding diverse multidrug efflux pump proteins in the genome. Furthermore, co-overexpression of MenA and MenG increased vitamin K2 content by 37% compared with the control strain. Conclusions The genome analysis of Elizabethkingia meningoseptica sp. F2, together with the comparative analysis of metabolic pathways in E. coli, Bacillus subtilis and Streptomyces, provides useful information on the vitamin K2 biosynthetic pathway and other related pathways at the systems level.


2016 ◽  
Vol 44 (W1) ◽  
pp. W35-W40 ◽  
Author(s):  
Jongin Lee ◽  
Woon-young Hong ◽  
Minah Cho ◽  
Mikang Sim ◽  
Daehwan Lee ◽  
...  

2013 ◽  
Vol 11 (06) ◽  
pp. 1343004 ◽  
Author(s):  
YU-LUN CHEN ◽  
CHIEN-MING CHEN ◽  
TUN-WEN PAI ◽  
HON-WAI LEONG ◽  
KET-FAH CHONG

A synteny block represents a set of contiguous genes located within the same chromosome and well conserved among various species. Throughout long evolutionary processes and genome rearrangement events, large numbers of synteny blocks remain highly conserved across multiple species. Understanding the distribution of conserved gene blocks helps evolutionary biologists trace the diversity of life, and it also plays an important role in orthologous gene detection and gene annotation in the genomic era. In this work, we focus on collinear synteny detection, in which gene order is required to be well conserved among multiple species. To achieve this goal, suffix-tree-based algorithms for efficiently identifying homologous synteny blocks were proposed. The traditional suffix tree algorithm was modified by treating a chromosome as a string in which each gene is encoded as a symbol character. Hence, a suffix tree can be built for different query chromosomes from various species, and conserved synteny blocks, modeled as overlapping contiguous edges in the suffix tree, can then be searched efficiently. In addition, we defined a novel Synteny Block Conserved Index (SBCI) to evaluate the relationship of synteny block distribution between two species, which could be applied as an evolutionary indicator for constructing a phylogenetic tree from multiple species without the heavy computational requirements of whole-genome sequence alignment.
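The suffix-tree approach described above treats each chromosome as a string of gene symbols and looks for shared contiguous runs. The Python sketch below finds the same kind of collinear runs between two chromosomes, but swaps in a quadratic dynamic-programming scan for the suffix tree to stay short; the gene symbols (for example orthogroup IDs), the `min_genes` cutoff and the function name are assumptions, and inverted blocks are not detected.

```python
def conserved_gene_runs(chrom_a, chrom_b, min_genes=2):
    """Find maximal runs of genes shared contiguously and in the same order.

    Each chromosome is a sequence of gene symbols. Returns a list of
    (start_a, start_b, length) triples. Uses O(len(a) * len(b)) dynamic
    programming instead of a suffix tree.
    """
    n, m = len(chrom_a), len(chrom_b)
    # run[i][j] = length of the common run ending at chrom_a[i-1], chrom_b[j-1]
    run = [[0] * (m + 1) for _ in range(n + 1)]
    hits = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if chrom_a[i - 1] == chrom_b[j - 1]:
                run[i][j] = run[i - 1][j - 1] + 1
                # right-maximal: the next genes on both chromosomes differ
                at_end = i == n or j == m or chrom_a[i] != chrom_b[j]
                if at_end and run[i][j] >= min_genes:
                    k = run[i][j]
                    hits.append((i - k, j - k, k))
    return hits


if __name__ == "__main__":
    a = ["g1", "g2", "g3", "g4", "g5"]
    b = ["g9", "g2", "g3", "g4", "g7", "g1"]
    print(conserved_gene_runs(a, b))  # [(1, 1, 3)]
```

A generalized suffix tree would bring the search closer to linear time in total chromosome length and extend more naturally to many chromosomes; the quadratic table is used here only for readability.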


2019 ◽  
Author(s):  
Guénola Drillon ◽  
Raphaël Champeimont ◽  
Francesco Oteri ◽  
Gilles Fischer ◽  
Alessandra Carbone

Abstract Gene order can be used as an informative character to reconstruct phylogenetic relationships between species independently of the local information present in gene/protein sequences. PhyChro is a reconstruction method based on chromosomal rearrangements, applicable to a wide range of eukaryotic genomes with different gene contents and levels of synteny conservation. For each synteny breakpoint arising from pairwise genome comparisons, the algorithm defines two disjoint sets of genomes, called partial splits, each supporting one of the two block adjacencies that define the breakpoint. Considering all partial splits arising from all pairwise comparisons, a distance between two genomes is computed from the number of partial splits separating them. Tree reconstruction is achieved through a bottom-up approach by iteratively grouping the sister genomes that minimize genome distances. PhyChro estimates branch lengths based on the number of synteny breakpoints and provides confidence scores for the branches. PhyChro performance is evaluated on two datasets of 13 vertebrate and 21 yeast genomes by using up to 130 000 and 179 000 breakpoints, respectively, a scale of genomic markers that has been out of reach until now. PhyChro reconstructs very accurate tree topologies even at known problematic branching positions. Its robustness has been benchmarked for different synteny block reconstruction methods. On simulated data, PhyChro reconstructs phylogenies perfectly in almost all cases and shows the highest accuracy compared with other existing tools. PhyChro is very fast, reconstructing the vertebrate and yeast phylogenies in less than 15 min. Availability PhyChro will be freely available under the BSD license after […]


2019 ◽  
Author(s):  
John Herrick ◽  
Bianca Sclavi

Abstract Evolutionary changes in karyotype have long been implicated in speciation events; however, the phylogenetic relationship between karyotype diversity and species richness in closely and distantly related mammalian lineages remains to be fully elucidated. Here we examine the association between genome diversity and species diversity across the class Mammalia. We tested five different metrics of genome diversity: clade-average genome size, standard deviation of genome size, diploid and fundamental numbers (karyotype diversity), sub-chromosomal rearrangements, and percent synteny block conservation. We found a significant association between species richness (phylogenetic clade diversity) and genome diversity at both order- and family-level clades. Karyotype diversity provided the strongest support for a relationship between genome diversity and species diversity. Our results suggest that lineage-specific variations in genome and karyotype stability can account for different levels of species diversity in mammals.

