An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes

2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Klairton L. Brito ◽  
Andre R. Oliveira ◽  
Alexsandro O. Alexandrino ◽  
Ulisses Dias ◽  
Zanoni Dias

Abstract. Background: In comparative genomics, one of the goals is to estimate a sequence of genetic changes capable of transforming one genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements in the genome. Reversal and transposition are two of the most studied genome rearrangement events. A reversal inverts a segment of a genome, while a transposition swaps two consecutive segments. Initial studies in the area considered only the order of the genes. Recent works have incorporated other genetic information into the model, in particular the sizes of intergenic regions, which are the structures between each pair of adjacent genes and at the extremities of a linear genome. Results and conclusions: In this work, we investigate the sorting by intergenic reversals and transpositions problem on genomes sharing the same set of genes, considering the cases where the orientation of genes is known and unknown. We also explore a variant of the problem that generalizes the transposition event. As a result, we present an approximation algorithm that guarantees an approximation factor of 4 for both cases considering the reversal and transposition (classic definition) events, an improvement on the 4.5-approximation previously known for the scenario where the orientation of the genes is unknown. We also present a 3-approximation algorithm that incorporates the generalized transposition event, and we propose a greedy strategy to improve the performance of the algorithms. Practical tests on simulated data indicated that the algorithms, in both cases, tend to perform better than the best-known algorithms for the problem. Lastly, we conducted experiments using real genomes to demonstrate the applicability of the algorithms.
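The two events defined in this abstract can be illustrated with a minimal sketch. It assumes the classic gene-order-only model, where a genome is a Python list of integers and intergenic sizes are ignored; the function names and the 0-based index conventions are hypothetical choices for illustration, not the notation or the algorithm of the paper.

```python
def reversal(genome, i, j):
    """Invert the segment genome[i..j] (0-based, inclusive), unsigned model."""
    if not 0 <= i <= j < len(genome):
        raise ValueError("need 0 <= i <= j < len(genome)")
    return genome[:i] + genome[i:j + 1][::-1] + genome[j + 1:]


def signed_reversal(genome, i, j):
    """Invert genome[i..j] and flip each gene's sign (orientation known)."""
    if not 0 <= i <= j < len(genome):
        raise ValueError("need 0 <= i <= j < len(genome)")
    flipped = [-gene for gene in reversed(genome[i:j + 1])]
    return genome[:i] + flipped + genome[j + 1:]


def transposition(genome, i, j, k):
    """Swap the consecutive segments genome[i:j] and genome[j:k]."""
    if not 0 <= i < j < k <= len(genome):
        raise ValueError("need 0 <= i < j < k <= len(genome)")
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]
```

For example, `reversal([1, 2, 3, 4, 5], 1, 3)` inverts the middle three genes, and `transposition([1, 2, 3, 4, 5], 0, 2, 4)` exchanges the blocks `[1, 2]` and `[3, 4]`. The intergenic variants studied in the paper additionally track how nucleotides in the cut intergenic regions are redistributed, which this sketch omits.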

2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Andre R. Oliveira ◽  
Géraldine Jean ◽  
Guillaume Fertin ◽  
Ulisses Dias ◽  
Zanoni Dias

Abstract. Background: The evolutionary distance between two genomes can be estimated by computing a minimum-length sequence of operations, called genome rearrangements, that transforms one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most studies in the genome rearrangement literature consist of shaping biological scenarios into mathematical models. Examples include allowing different genome rearrangement operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number of genes), and assigning each rearrangement a cost that depends on its length rather than a unit cost. Most of these works, however, have overlooked some important features inside genomes, such as the presence of sequences of nucleotides between genes, called intergenic regions. Results and conclusions: In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes. The genome rearrangement operations we consider here are constrained types of reversals and transpositions, called super short reversals (SSRs) and super short transpositions (SSTs), which affect up to two (consecutive) genes. We use super short operations (SSOs) to denote any SSR or SST. We present 3-approximation algorithms for SSRs, SSTs, or SSOs when the orientation of the genes is not considered, and 5-approximation algorithms for SSRs or SSOs when it is. We also show that the approximation factors of these algorithms improve when the input permutation has a higher number of inversions: the factor decreases from 3 to either 2 or 1.5, and from 5 to either 3 or 2.
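The role of inversions mentioned above can be sketched in a simplified setting. The code below assumes an unsigned permutation with intergenic sizes ignored, where an operation on two consecutive genes reduces to swapping them; under that assumption each swap of an out-of-order adjacent pair removes exactly one inversion. The function names are hypothetical, and this is not the approximation algorithm from the paper.

```python
def count_inversions(perm):
    """Count pairs (a, b) with a < b and perm[a] > perm[b]."""
    n = len(perm)
    return sum(1 for a in range(n) for b in range(a + 1, n) if perm[a] > perm[b])


def sort_by_adjacent_swaps(perm):
    """Sort using only swaps of two consecutive genes.

    Returns the sorted list and the left index of every swap performed.
    Gene order only: intergenic sizes and orientation are ignored here.
    """
    genes = list(perm)
    swaps = []
    changed = True
    while changed:
        changed = False
        for i in range(len(genes) - 1):
            if genes[i] > genes[i + 1]:
                genes[i], genes[i + 1] = genes[i + 1], genes[i]
                swaps.append(i)
                changed = True
    return genes, swaps
```

In this simplified model the number of swaps performed always equals `count_inversions(perm)`, which is why the inversion count is a natural quantity to relate to the distance; the intergenic model in the paper needs further operations to balance intergenic sizes.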


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Gabriel Siqueira ◽  
Alexsandro Oliveira Alexandrino ◽  
Andre Rodrigues Oliveira ◽  
Zanoni Dias

Abstract. The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverses a block of the genome. When dealing with such problems, seminal works represented genomes as sequences of genes without repetition. More realistic models started to consider gene repetition or the presence of intergenic regions, which are sequences of nucleotides between genes and at the extremities of the genome. This work explores the transposition and reversal events applied to a genome representation considering both gene repetition and intergenic regions. We define two problems called Minimum Common Intergenic String Partition and Reverse Minimum Common Intergenic String Partition. Using a relation with these two problems, we show a $\Theta(k)$-approximation for the Intergenic Transposition Distance, the Intergenic Reversal Distance, and the Intergenic Reversal and Transposition Distance problems, where $k$ is the maximum number of copies of a gene in the genomes. Our practical experiments on simulated genomes show that the use of partitions improves the estimates for the distances.
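The parameter k in the approximation factor can be computed directly from the input. The following is a minimal sketch, assuming each genome is given as a sequence of hashable gene labels with orientation and intergenic regions left out; `max_gene_copies` is a hypothetical helper name, not one taken from the paper.

```python
from collections import Counter


def max_gene_copies(*genomes):
    """Return k, the largest number of copies of any single gene label
    within any one of the given genomes (0 if there are no genes)."""
    return max(
        (max(Counter(genome).values(), default=0) for genome in genomes),
        default=0,
    )
```

For genomes without repeated genes this returns 1, the setting of the seminal permutation-based models.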


1999 ◽  
Vol 17 (S1) ◽  
pp. S621-S626
Author(s):  
Li Hsu ◽  
Corinne Aragaki ◽  
Filemon Quiaoit ◽  
Xiangjing Wang ◽  
Xiubin Xu ◽  
...  

2021 ◽  
Vol 7 (29) ◽  
pp. eabc0776
Author(s):  
Nathan K. Schaefer ◽  
Beth Shapiro ◽  
Richard E. Green

Many humans carry genes from Neanderthals, a legacy of past admixture. Existing methods detect this archaic hominin ancestry within human genomes using patterns of linkage disequilibrium or direct comparison to Neanderthal genomes. Each of these methods is limited in sensitivity and scalability. We describe a new ancestral recombination graph inference algorithm that scales to large genome-wide datasets and demonstrate its accuracy on real and simulated data. We then generate a genome-wide ancestral recombination graph including human and archaic hominin genomes. From this, we generate a map within human genomes of archaic ancestry and of genomic regions not shared with archaic hominins either by admixture or incomplete lineage sorting. We find that only 1.5 to 7% of the modern human genome is uniquely human. We also find evidence of multiple bursts of adaptive changes specific to modern humans within the past 600,000 years involving genes related to brain development and function.


Vaccine ◽  
2011 ◽  
Vol 29 (10) ◽  
pp. 1863-1873 ◽  
Author(s):  
Noriyuki Otsuki ◽  
Hitoshi Abo ◽  
Toru Kubota ◽  
Yoshio Mori ◽  
Yukiko Umino ◽  
...  

Development ◽  
2021 ◽  
Author(s):  
Zoe L. Grant ◽  
Peter F. Hickey ◽  
Waruni Abeysekera ◽  
Lachlan Whitehead ◽  
Sabrina M. Lewis ◽  
...  

Blood vessel growth and remodelling are essential during embryonic development and disease pathogenesis. The diversity of endothelial cells (ECs) is transcriptionally evident and ECs undergo dynamic changes in gene expression during vessel growth and remodelling. Here, we investigated the role of the histone acetyltransferase HBO1 (KAT7), which is important for activating genes during development and for histone H3 lysine 14 acetylation (H3K14ac). Loss of HBO1 and H3K14ac impaired developmental sprouting angiogenesis and reduced pathological EC overgrowth in the retinal endothelium. Single-cell RNA-sequencing of retinal ECs revealed an increased abundance of tip cells in Hbo1-deleted retinas, which led to EC overcrowding in the retinal sprouting front and prevented efficient tip cell migration. We found that H3K14ac was highly abundant in the endothelial genome in both intra- and intergenic regions, suggesting that HBO1 acts as a genome organiser that promotes the efficient tip cell behaviour necessary for sprouting angiogenesis.


Entropy ◽  
2019 ◽  
Vol 21 (8) ◽  
pp. 802
Author(s):  
Chun-xiao Sun ◽  
Yu Yang ◽  
Hua Wang ◽  
Wen-hu Wang

Chromatin immunoprecipitation combined with next-generation sequencing (ChIP-Seq) technology has enabled the identification of transcription factor binding sites (TFBSs) on a genome-wide scale. To effectively and efficiently discover TFBSs in the thousand or more DNA sequences generated by a ChIP-Seq data set, we propose a new algorithm named AP-ChIP. First, we set two thresholds based on probabilistic analysis to construct and further filter the cluster subsets. Then, we use Affinity Propagation (AP) clustering on the candidate cluster subsets to find the potential motifs. Experimental results on simulated data show that the AP-ChIP algorithm is able to make an almost accurate prediction of TFBSs in a reasonable time. Also, the validity of the AP-ChIP algorithm is tested on a real ChIP-Seq data set.


2017 ◽  
Vol 23 (5) ◽  
pp. 349-366 ◽  
Author(s):  
Jesus Garcia-Diaz ◽  
Jairo Sanchez-Hernandez ◽  
Ricardo Menchaca-Mendez ◽  
Rolando Menchaca-Mendez

2008 ◽  
Vol 28 (17) ◽  
pp. 5446-5457 ◽  
Author(s):  
Laura Milligan ◽  
Laurence Decourty ◽  
Cosmin Saveanu ◽  
Juri Rappsilber ◽  
Hugo Ceulemans ◽  
...  

ABSTRACT A genome-wide screen for synthetic lethal (SL) interactions with loss of the nuclear exosome cofactors Rrp47/Lrp1 or Air1 identified 3′→5′ exonucleases, the THO complex required for mRNP assembly, and Ynr024w (Mpp6). SL interactions with mpp6Δ were confirmed for rrp47Δ and nuclear exosome component Rrp6. The results of bioinformatic analyses revealed homology between Mpp6 and a human exosome cofactor, underlining the high conservation of the RNA surveillance system. Mpp6 is an RNA binding protein that physically associates with the exosome and was localized throughout the nucleus. The results of functional analyses demonstrated roles for Mpp6 in the surveillance of both pre-rRNA and pre-mRNAs and in the degradation of “cryptic” noncoding RNAs (ncRNAs) derived from intergenic regions and the ribosomal DNA spacer heterochromatin. Strikingly, these ncRNAs are also targeted by other exosome cofactors, including Rrp47, the TRAMP complex (which includes Air1), and the Nrd1/Nab3 complex, and are degraded by both Rrp6 and the core exosome. Heterochromatic transcripts and other ncRNAs are characterized by very rapid degradation, and we predict that functional redundancy is an important feature of ncRNA metabolism.

