genomic distance
Recently Published Documents


TOTAL DOCUMENTS

53
(FIVE YEARS 19)

H-INDEX

16
(FIVE YEARS 3)

Author(s):  
Diego P. Rubert ◽  
Daniel Doerr ◽  
Marília D. V. Braga

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither prior classification of genes into families nor further restrictions on the genomes are imposed. Given two genomes, this ILP computes an optimal matching of the genes that simultaneously takes into account local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in a second step. Our approach is implemented in a pipeline that incorporates the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results in experiments on both simulated and real data.
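The abstract does not say how the pairwise matchings are integrated into families. As a minimal sketch only, one simple possibility (an assumption here, not necessarily the pipeline's method) is to treat every matched gene pair as an edge and take connected components with a union-find structure. The function name `families_from_matchings` and the gene labels are hypothetical.

```python
from collections import defaultdict


def families_from_matchings(matchings):
    """Group genes into families as connected components of matched pairs.

    `matchings` is an iterable of pairwise matchings; each matching is an
    iterable of (gene_a, gene_b) pairs. This is an illustrative assumption,
    not the integration step of the published pipeline.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for pairs in matchings:
        for gene_a, gene_b in pairs:
            union(gene_a, gene_b)

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(group) for group in groups.values())


# Toy input: matchings for genome pairs (A, B) and (B, C); labels are made up.
toy_matchings = [
    [("A:g1", "B:g1"), ("A:g2", "B:g2")],
    [("B:g1", "C:g1")],
]
print(families_from_matchings(toy_matchings))
# [['A:g1', 'B:g1', 'C:g1'], ['A:g2', 'B:g2']]
```

Connected components merge families transitively, so a single spurious match can fuse two groups; a real pipeline would likely need a stricter rule.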


2021 ◽  
Author(s):  
Hemanoel Passarelli-Araujo ◽  
Gloria Regina Franco ◽  
Thiago Motta Venancio

The growth of sequenced bacterial genomes has revolutionized the assessment of microbial diversity. Pseudomonas is a widely diverse genus, comprising isolates associated with processes from pathogenesis to biotechnological applications. However, this high diversity has led to historical taxonomic inconsistencies. Although type strains have been employed to estimate Pseudomonas diversity, they represent a small fraction of the genomic diversity at the genus level. We used 10,035 available Pseudomonas genomes, including 210 type strains, to build a genomic distance network to estimate the number of species through community identification. We identified inconsistencies with several type strains and found that 25.65% of the Pseudomonas genomes deposited in GenBank are misclassified. We retrieved the 13 main Pseudomonas groups and proposed P. alcaligenes as a new group. Finally, this work provides new insights into the phylogenetic boundaries of Pseudomonas and highlights that Pseudomonas diversity has hitherto been overlooked.
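To illustrate the idea of a genomic distance network, the sketch below links two genomes whenever their distance is at or below a threshold and then reports connected components. Connected components are only the simplest stand-in for community identification; the abstract does not name the algorithm or threshold the authors used, so the function names, threshold, and distance values here are all hypothetical.

```python
from collections import deque


def distance_network(distances, threshold):
    """Build adjacency sets; an edge joins genomes with distance <= threshold."""
    graph = {}
    for (a, b), d in distances.items():
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if d <= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def components(graph):
    """Return connected components (a crude proxy for communities) via BFS."""
    seen, result = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(graph[node] - seen):
                seen.add(neighbour)
                queue.append(neighbour)
        result.append(sorted(group))
    return result


# Made-up pairwise distances between four genomes.
toy_distances = {
    ("g1", "g2"): 0.02, ("g2", "g3"): 0.03, ("g1", "g3"): 0.20,
    ("g1", "g4"): 0.30, ("g2", "g4"): 0.30, ("g3", "g4"): 0.30,
}
print(components(distance_network(toy_distances, threshold=0.05)))
# [['g1', 'g2', 'g3'], ['g4']]
```

Note that g1 and g3 end up together through g2 even though their direct distance exceeds the threshold; dedicated community detection methods handle such chains differently.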


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Diego P. Rubert ◽  
Fábio V. Martinez ◽  
Marília D. V. Braga

Background: A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform a given genome into another given genome. The traditional approaches in this area are family-based, i.e., they require the classification of DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which are able to compute distances in polynomial time, restrict the families to occur at most once in each genome. In contrast, the distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J Comput Biol 28:410–431, 2021) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but must maximize a matching of the genes in each multifamily, in order to prevent the free lunch artifact that would otherwise let empty or almost empty matchings give smaller distances. Results: In this paper, we adopt the alternative family-free setting that, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance that takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its bigger search space, our ILP seems to be boosted by a reduction of the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original one by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.


2021 ◽  
Author(s):  
Diego P. Rubert ◽  
Fábio V. Martinez ◽  
Marília Braga

Background: A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform a given genome into another given genome. The traditional approaches in this area are family-based, i.e., they require the classification of DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which are able to compute distances in polynomial time, restrict the families to occur at most once in each genome. In contrast, the distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Very recently, Bohnenkämper et al. (J. Comput. Biol., 2020) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but must maximize a matching of the genes in each multifamily, in order to prevent the free lunch artifact that would otherwise let empty or almost empty matchings give smaller distances. Results: In this paper, we adopt the alternative family-free setting that, instead of family classification, simply uses the pairwise similarities between DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then results in a natural family-free genomic distance that takes into consideration all given genes, without prior classification into families, and has a search space composed of matchings of any size. In spite of its bigger search space, our ILP seems to be boosted by a reduction of the number of co-optimal solutions due to the weights. Indeed, it converged faster than the original one by Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes, but also fungi and insects, or sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.


2021 ◽  
Author(s):  
Tetsuya Yamamoto ◽  
Takahiro Sakaue ◽  
Helmut Schiessel

Enhancers are DNA sequences at a long genomic distance from target genes. Recent experiments suggest that enhancers are anchored to the surfaces of condensates of transcription machinery and that the loop extrusion process enhances the transcription level of their target genes. Here we theoretically study the polymer dynamics driven by the loop extrusion of the linker DNA between an enhancer and the promoter of its target gene to calculate the contact probability of the promoter to the transcription machinery in the condensate. Our theory predicts that when the loop extrusion process is active, the contact probability increases with increasing linker DNA length. This finding reflects the fact that the relaxation time, during which the promoter stays in proximity to the surface of the transcriptional condensate, increases as the length of the linker DNA increases. This contrasts with the equilibrium case, for which the contact probability between the promoter and the transcription machinery is smaller for longer linker DNA lengths.


Viruses ◽  
2020 ◽  
Vol 12 (12) ◽  
pp. 1421
Author(s):  
Andrew C. Lewin ◽  
Lyndon M. Coghill ◽  
Melanie Mironovich ◽  
Chin-Chi Liu ◽  
Renee T. Carter ◽  
...  

Canid alphaherpesvirus 1 (CHV-1) is a widespread pathogen of dogs with multiple associated clinical signs. There has been limited prior investigation into the genomics and phylogeny of this virus using whole viral genome analysis. Fifteen CHV-1 isolates were collected from animals with ocular disease based in the USA. Viral DNA was extracted for Illumina MiSeq full genome sequencing from each isolate. These data were combined with genomes of previously sequenced CHV-1 isolates obtained from hosts in the UK, Australia and Brazil. Genomic, recombinational and phylogenetic analyses were performed using multiple programs. Two isolates were separated into a clade apart from the remaining isolates and accounted for the majority of genomic distance (0.09%): one was obtained in 2019 from a USA-based host (ELAL-1) and the other in 2012 from a host in Brazil (BTU-1). ELAL-1 was found to contain variants previously reported in BTU-1 but also novel variants in the V57 gene region. Multiple non-synonymous variants were found in USA-based isolates in regions associated with antiviral resistance. Evidence of recombination was detected between ELAL-1 and BTU-1. Collectively, this represents evidence of trans-boundary transmission of a novel form of CHV-1, which highlights the importance of surveillance for this pathogen in domestic dog populations.


iScience ◽  
2020 ◽  
Vol 23 (12) ◽  
pp. 101861
Author(s):  
Jean-Charles Walter ◽  
Jérôme Rech ◽  
Nils-Ole Walliser ◽  
Jérôme Dorignac ◽  
Frédéric Geniet ◽  
...  

2020 ◽  
Vol 6 (4) ◽  
pp. 246
Author(s):  
Cene Gostinčar

The discussion of fungal species delineation has yet to reach a consensus, despite the advancements in technology, which helped modernise traditional approaches. In particular, the phylogenetic species concept was one of the tools that has been used with considerable success across the fungal kingdom. The fast rise of fungal genomics provides an unprecedented opportunity to expand measuring the relatedness of fungal strains to the level of whole genomes. However, the use of genomic information in taxonomy has only just begun, and few methodological guidelines have been suggested so far. Here, a simple approach of computationally measuring genomic distances and their use as a standard for species delineation is investigated. A fixed threshold genomic distance calculated by the quick and easy-to-use tools Mash and Dashing proved to be an unexpectedly widely applicable and robust criterion for determining whether two genomes belong to the same or to different species. The accuracy of species delineation in an uncurated dataset of GenBank fungal genomes was close to 90%—and exceeded 90% with minimal curation. As expected, the discriminative power of this approach was lower at higher taxonomic ranks, but still significantly larger than zero. Simple instructions for calculation of a genomic distance between two genomes and species similarity thresholds at different k-mer sizes are suggested. The calculation of genomic distance is identified as a powerful approach for delineating fungal species and is proposed—not as the only criterion—but as an additional tool in the versatile toolbox of fungal taxonomy.
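For orientation, the distance reported by Mash is derived from the Jaccard index j of the two genomes' k-mer sets as D = -(1/k) ln(2j / (1 + j)). The sketch below is a toy illustration under stated assumptions: it uses exact k-mer sets rather than MinHash sketches, ignores reverse complements, and caps the result at 1, which is a choice of this sketch. The function names are hypothetical, and no species threshold is hard-coded because the abstract does not state one.

```python
import math


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Exact Jaccard index of two sets (Mash estimates this from sketches)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Mash-style distance from a Jaccard index j and k-mer size k."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: maximal distance
    if j >= 1.0:
        return 0.0  # identical k-mer sets
    # Cap at 1.0: with exact sets and small k the formula can exceed 1.
    return min(1.0, -math.log(2.0 * j / (1.0 + j)) / k)


# Toy sequences (made up); real genomes would use k around 21 or larger.
k = 4
seq_a = "ACGTACGTTAGCCGATAGGCTA"
seq_b = "ACGTACGTTAGCGGATAGGCTA"
d = mash_distance(jaccard(kmers(seq_a, k), kmers(seq_b, k)), k)
print(f"distance = {d:.4f}")
```

Two genomes would then be assigned to the same species when the distance falls below a chosen threshold, which depends on the k-mer size.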

