Sorting permutations by prefix and suffix rearrangements

2017 ◽  
Vol 15 (01) ◽  
pp. 1750002 ◽  
Author(s):  
Carla Negri Lintzmayer ◽  
Guillaume Fertin ◽  
Zanoni Dias

Some interesting combinatorial problems have been motivated by genome rearrangements, which are mutations that affect large portions of a genome. When we represent genomes as permutations, the goal is to transform a given permutation into the identity permutation with the minimum number of rearrangements. When they affect segments from the beginning (respectively end) of the permutation, they are called prefix (respectively suffix) rearrangements. This paper presents results for rearrangement problems that involve prefix and suffix versions of reversals and transpositions considering unsigned and signed permutations. We give 2-approximation and ([Formula: see text])-approximation algorithms for these problems, where [Formula: see text] is a constant divided by the number of breakpoints (pairs of consecutive elements that should not be consecutive in the identity permutation) in the input permutation. We also give bounds for the diameters concerning these problems and provide ways of improving the practical results of our algorithms.
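As a rough illustration of the terms in this abstract, the sketch below applies a prefix reversal and counts breakpoints in an unsigned permutation. It assumes the common convention of extending the permutation with 0 on the left and n+1 on the right and counting adjacent pairs that differ by more than 1; the paper itself may define breakpoints differently for the prefix and suffix variants. The function names are hypothetical.

```python
def prefix_reversal(perm, k):
    """Return a copy of perm with its first k elements reversed."""
    return perm[:k][::-1] + perm[k:]


def count_breakpoints(perm):
    """Count adjacent pairs that are not consecutive values.

    Assumption: the permutation is extended with 0 and n+1, as is usual
    for unsigned reversals; this is not necessarily the paper's convention.
    """
    extended = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(extended, extended[1:]) if abs(a - b) != 1)


perm = [3, 1, 2, 4]
print(count_breakpoints(perm))        # breakpoints before the operation
print(prefix_reversal(perm, 2))       # reverse the first two elements
print(count_breakpoints([1, 2, 3, 4]))  # the identity has no breakpoints
```

Under this convention the identity permutation is the only one with zero breakpoints, which is why breakpoint counts are a natural yardstick for approximation bounds.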

2019 ◽  
Vol 14 (1) ◽  
Author(s):  
Andre R. Oliveira ◽  
Géraldine Jean ◽  
Guillaume Fertin ◽  
Ulisses Dias ◽  
Zanoni Dias

Abstract. Background: The evolutionary distance between two genomes can be estimated by computing a minimum length sequence of operations, called genome rearrangements, that transform one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most of the studies in the genome rearrangement literature consist in shaping biological scenarios into mathematical models. Examples include allowing different genome rearrangement operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number of genes), and considering that a rearrangement has a cost depending on its length rather than a unit cost. Most of the works, however, have overlooked some important features inside genomes, such as the presence of sequences of nucleotides between genes, called intergenic regions. Results and conclusions: In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes. The genome rearrangement operations we consider here are constrained types of reversals and transpositions, called super short reversals (SSRs) and super short transpositions (SSTs), which affect up to two (consecutive) genes. We denote by super short operations (SSOs) any SSR or SST. We show 3-approximation algorithms when the orientation of the genes is not considered when we allow SSRs, SSTs, or SSOs, and 5-approximation algorithms when considering the orientation for either SSRs or SSOs. We also show that these algorithms improve their approximation factors when the input permutation has a higher number of inversions, where the approximation factor decreases from 3 to either 2 or 1.5, and from 5 to either 3 or 2.


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 175
Author(s):  
Guilherme Henrique Santos Miranda ◽  
Alexsandro Oliveira Alexandrino ◽  
Carla Negri Lintzmayer ◽  
Zanoni Dias

Understanding how different two organisms are is one question addressed by the comparative genomics field. A well-accepted way to estimate the evolutionary distance between genomes of two organisms is finding the rearrangement distance, which is the smallest number of rearrangements needed to transform one genome into another. By representing genomes as permutations, one of them can be represented as the identity permutation, and, so, we reduce the problem of transforming one permutation into another to the problem of sorting a permutation using the minimum number of rearrangements. This work investigates the problems of sorting permutations using reversals and/or transpositions, with some additional restrictions of biological relevance. Given a value λ, the problem now is how to sort a λ-permutation, which is a permutation whose elements are less than λ positions away from their correct places (regarding the identity), by applying the minimum number of rearrangements. Each λ-rearrangement must have size, at most, λ, and, when applied to a λ-permutation, the result should also be a λ-permutation. We present algorithms with approximation factors of O(λ²), O(λ), and O(1) for the problems of Sorting λ-Permutations by λ-Reversals, by λ-Transpositions, and by both operations.
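The two constraints described in this abstract can be checked directly. The following is a minimal sketch, assuming 1-indexed positions and reading "less than λ positions away" as |π(i) − i| < λ; the function names are hypothetical and this is not the authors' approximation algorithm, only a validity check for a single λ-reversal.

```python
def is_lambda_permutation(perm, lam):
    """True if every element is fewer than lam positions from its sorted place."""
    return all(abs(value - pos) < lam for pos, value in enumerate(perm, start=1))


def apply_lambda_reversal(perm, i, j, lam):
    """Reverse perm[i..j] (1-indexed, inclusive) if that is a valid lam-reversal.

    Returns the new permutation, or None when the segment is longer than lam
    or the result would no longer be a lam-permutation.
    """
    if not (1 <= i <= j <= len(perm)) or (j - i + 1) > lam:
        return None
    result = perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]
    return result if is_lambda_permutation(result, lam) else None


print(is_lambda_permutation([2, 1, 4, 3], 2))        # each element is 1 position away
print(apply_lambda_reversal([2, 1, 4, 3], 1, 2, 2))  # allowed: size 2, result still valid
print(apply_lambda_reversal([2, 1, 3], 2, 3, 2))     # rejected: moves 1 two positions away
```

The second constraint is what makes the problem harder than plain bounded-length reversals: a move can be short enough and still be forbidden because it pushes an element too far from its target position.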


Animals ◽  
2021 ◽  
Vol 11 (1) ◽  
pp. 241
Author(s):  
Dongwon Seo ◽  
Sunghyun Cho ◽  
Prabuddha Manjula ◽  
Nuri Choi ◽  
Young-Kuk Kim ◽  
...  

A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, which consisted of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed to determine the optimal marker combination comprising the minimum number of markers, using a 600 k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group for comparison with control chicken groups. In the processing of marker selection, a total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups, and they reduced the number of genetic components required, confirming that efficient classification of the groups was possible by using a small number of marker sets. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. 
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine the optimal marker combination with the minimum number of markers that can distinguish the target population among a large number of SNP markers.


2008 ◽  
Vol 17 (2) ◽  
pp. 203-224 ◽  
Author(s):  
ADRIAN DUMITRESCU ◽  
CSABA D. TÓTH

We formulate and give partial answers to several combinatorial problems on volumes of simplices determined by $n$ points in 3-space, and in general in $d$ dimensions. (i) The number of tetrahedra of minimum (non-zero) volume spanned by $n$ points in $\mathbb{R}^3$ is at most $\frac{2}{3}n^3-O(n^2)$, and there are point sets for which this number is $\frac{3}{16}n^3-O(n^2)$. We also present an $O(n^3)$ time algorithm for reporting all tetrahedra of minimum non-zero volume, and thereby extend an algorithm of Edelsbrunner, O'Rourke and Seidel. In general, for every $k,d\in \mathbb{N}$, $1\leq k \leq d$, the maximum number of $k$-dimensional simplices of minimum (non-zero) volume spanned by $n$ points in $\mathbb{R}^d$ is $\Theta(n^k)$. (ii) The number of unit volume tetrahedra determined by $n$ points in $\mathbb{R}^3$ is $O(n^{7/2})$, and there are point sets for which this number is $\Omega(n^3 \log\log n)$. (iii) For every $d\in \mathbb{N}$, the minimum number of distinct volumes of all full-dimensional simplices determined by $n$ points in $\mathbb{R}^d$, not all on a hyperplane, is $\Theta(n)$.


2020 ◽  
Author(s):  
Dongwon Seo ◽  
Sunghyun Cho ◽  
Prabuddha Manjula ◽  
Nuri Choi ◽  
Young Kuk Kim ◽  
...  

Abstract. Background: A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would also facilitate the protection of genetic resources, especially in developing countries. Methods: In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were used to find the minimum marker combination using a 600k high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. To verify the selected markers, a total of 182 samples from 12 lines were used to confirm whether the accuracy of target chicken breed identification changed. Results: A total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance between the case and control groups and reduced the number of genetic components, confirming that efficient classification of the groups was possible using a small number of marker sets.
In a verification study including additional chicken breeds and samples, the accuracy did not significantly change, and the target chicken group could be clearly distinguished from the other populations. Conclusions: The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to find the minimum combination of markers that can distinguish varieties among a large number of SNP markers.


2013 ◽  
Vol 23 (06) ◽  
pp. 461-477 ◽  
Author(s):  
MINATI DE ◽  
GAUTAM K. DAS ◽  
PAZ CARMI ◽  
SUBHAS C. NANDY

In this paper, we consider constant factor approximation algorithms for a variant of the discrete piercing set problem for unit disks. Here a set of points P is given; the objective is to choose the minimum number of points in P to pierce the unit disks centered at all the points in P. We first propose a very simple algorithm that produces a 12-approximation in O(n log n) time. Next, we improve the approximation factor to 4 and then to 3. The worst-case running times of these algorithms are O(n^8 log n) and O(n^15 log n), respectively. Apart from the space required for storing the input, the extra work-space requirement for each of these algorithms is O(1). Finally, we propose a PTAS for the same problem. Given a positive integer k, it can produce a solution with performance ratio [Formula: see text] in n^O(k) time.
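To make the problem statement concrete, here is a minimal sketch of a naive greedy baseline, which is not any of the algorithms from the paper. It assumes unit disks have radius 1, so a chosen point pierces the disk centered at q exactly when their distance is at most 1. The function name and the `radius` parameter are hypothetical.

```python
import math


def greedy_piercing_set(points, radius=1.0):
    """Pick points from `points` so every disk centered at an input point is pierced.

    Naive baseline (an assumption for illustration, not the paper's method):
    scan the points once and choose a point whenever its disk is not yet
    pierced by an earlier choice.
    """
    chosen = []
    for p in points:
        if not any(math.dist(p, c) <= radius for c in chosen):
            chosen.append(p)
    return chosen


pts = [(0.0, 0.0), (0.5, 0.0), (3.0, 0.0)]
print(greedy_piercing_set(pts))  # the second point is covered by the first
```

The result is always a valid piercing set, because every skipped point lies within the radius of some chosen point; the sketch makes no claim about how close it gets to the optimum.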


2007 ◽  
Vol 05 (01) ◽  
pp. 117-133 ◽  
Author(s):  
JAKKARIN SUKSAWATCHON ◽  
CHIDCHANOK LURSINSAP ◽  
MIKAEL BODÉN

Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data. Their algorithm determines the minimum number of reversals required to rearrange one genome into another, but only in the absence of gene duplicates. However, duplicates often account for 40% of a genome. In this paper, we show how to extend Hannenhalli and Pevzner's approach to deal with genomes with multi-gene families. We propose a new heuristic algorithm to compute the nearest reversal distance between two genomes with multi-gene families via binary integer programming. The experimental results on both synthetic and real biological data demonstrate that the proposed algorithm is able to find the reversal distance with high accuracy.


1980 ◽  
Vol 12 (3) ◽  
pp. 52-65 ◽  
Author(s):  
Georgii Gens ◽  
Evgenii Levner

BMC Genomics ◽  
2018 ◽  
Vol 19 (1) ◽  
Author(s):  
Georgia Giannoukos ◽  
Dawn M. Ciulla ◽  
Eugenio Marco ◽  
Hayat S. Abdulkerim ◽  
Luis A. Barrera ◽  
...  

2019 ◽  
Vol 19 (01) ◽  
pp. 1940004
Author(s):  
BOTING YANG ◽  
RUNTAO ZHANG ◽  
YI CAO ◽  
FARONG ZHONG

In this paper, we consider the problem of finding the minimum number of searchers to sweep networks/graphs with special topological structures. Such a number is called the search number. We first study graphs that contain only one cycle, and present a linear-time algorithm to compute the vertex separation and the optimal layout of such graphs; by a linear-time transformation, we can find the search number of such graphs in linear time. We also investigate graphs in which every vertex lies on at most one cycle and each cycle contains at most three vertices of degree more than two, and we propose a linear-time algorithm to compute their search number and optimal search strategy. We prove explicit formulas for the search number of the graphs obtained from complete k-ary trees by replacing vertices by cycles. We also present some results on approximation algorithms.

