Sorting permutations by prefix and suffix rearrangements

Some interesting combinatorial problems have been motivated by genome rearrangements, which are mutations that affect large portions of a genome. When genomes are represented as permutations, the goal is to transform a given permutation into the identity permutation with the minimum number of rearrangements. Rearrangements that affect a segment at the beginning (respectively, end) of the permutation are called prefix (respectively, suffix) rearrangements. This paper presents results for rearrangement problems that involve prefix and suffix versions of reversals and transpositions, considering both unsigned and signed permutations. We give 2-approximation and ([Formula: see text])-approximation algorithms for these problems, where [Formula: see text] is a constant divided by the number of breakpoints (pairs of consecutive elements that should not be consecutive in the identity permutation) in the input permutation. We also give bounds on the diameters for these problems and describe ways to improve the practical results of our algorithms.
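A minimal sketch of the objects named in this abstract, assuming unsigned permutations of 1..n and a reversal-style breakpoint convention in which the permutation is framed by 0 and n + 1 (the paper's exact conventions may differ). All function names are hypothetical. The breadth-first search computes the exact prefix/suffix reversal distance by exhaustive search, so it only works for tiny inputs and is not the paper's approximation algorithm.

```python
from collections import deque


def breakpoints(perm):
    """Count unsigned breakpoints of a permutation of 1..n.

    Assumption: the permutation is framed as (0, perm..., n + 1), and a pair of
    neighbours is a breakpoint when the two values differ by more than 1.
    """
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if abs(a - b) != 1)


def prefix_reversal(perm, j):
    """Reverse the first j elements (1 <= j <= n)."""
    return perm[:j][::-1] + perm[j:]


def suffix_reversal(perm, i):
    """Reverse the elements from 1-based position i to the end."""
    return perm[:i - 1] + perm[i - 1:][::-1]


def prefix_suffix_reversal_distance(perm):
    """Exact distance using prefix and suffix reversals, found by BFS.

    The state space has n! permutations, so this is only for tiny inputs.
    """
    start = tuple(perm)
    target = tuple(sorted(start))
    n = len(start)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        for k in range(2, n + 1):  # reversing a single element is a no-op
            # prefix reversal of the first k elements, suffix reversal of the last k
            for nxt in (prefix_reversal(cur, k), suffix_reversal(cur, n - k + 1)):
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None
```

For example, (3 2 1 4) has two breakpoints under this convention, and a single prefix reversal of length 3 sorts it.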

Super short operations on both gene order and intergenic sizes

Algorithms for Molecular Biology, Vol 14 (1), 2019. DOI: 10.1186/s13015-019-0156-5

Cited By ~ 1

Author(s): Andre R. Oliveira, Géraldine Jean, Guillaume Fertin, Ulisses Dias, Zanoni Dias

Keyword(s): Approximation Algorithms, Gene Order, Genome Rearrangement, Unit Cost, Genome Rearrangements, Minimum Length, Approximation Factor, A Genome, Number Of Genes, Intergenic Regions

Background: The evolutionary distance between two genomes can be estimated by computing a minimum-length sequence of operations, called genome rearrangements, that transforms one genome into another. Usually, a genome is modeled as an ordered sequence of genes, and most studies in the genome rearrangement literature consist of shaping biological scenarios into mathematical models: for instance, allowing different genome rearrangement operations at the same time, adding constraints to these rearrangements (e.g., each rearrangement can affect at most a given number of genes), or assigning each rearrangement a cost that depends on its length rather than a unit cost. Most of these works, however, have overlooked some important features of genomes, such as the presence of nucleotide sequences between genes, called intergenic regions.

Results and conclusions: In this work, we investigate the problem of computing the distance between two genomes, taking into account both gene order and intergenic sizes. The genome rearrangement operations we consider here are constrained types of reversals and transpositions, called super short reversals (SSRs) and super short transpositions (SSTs), which affect up to two (consecutive) genes. We denote by super short operations (SSOs) any SSR or SST. When the orientation of the genes is not considered, we show 3-approximation algorithms for SSRs, SSTs, or SSOs; when the orientation is considered, we show 5-approximation algorithms for either SSRs or SSOs. We also show that the approximation factors of these algorithms improve when the input permutation has a higher number of inversions: the factor decreases from 3 to either 2 or 1.5, and from 5 to either 3 or 2.
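A hedged sketch of the gene-order side of super short operations only; intergenic sizes, which are central to the paper, are deliberately ignored here, and all names are hypothetical. In the unsigned case, an operation affecting two consecutive genes exchanges them, so it changes the number of inversions by exactly one; bubble sort therefore uses exactly as many adjacent exchanges as there are inversions, which is the minimum when only such exchanges are allowed. The signed SSR reverses one or two genes and flips their orientations.

```python
def count_inversions(perm):
    """Pairs (i, j) with i < j and perm[i] > perm[j]; O(n^2) for clarity."""
    n = len(perm)
    return sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])


def sort_by_adjacent_swaps(perm):
    """Bubble sort on gene order only.

    Each swap exchanges two adjacent genes, i.e. an unsigned SSR or SST on two
    genes when intergenic sizes are ignored. Returns the sorted list and the
    0-based positions at which swaps were applied.
    """
    p = list(perm)
    swaps = []
    changed = True
    while changed:
        changed = False
        for i in range(len(p) - 1):
            if p[i] > p[i + 1]:
                p[i], p[i + 1] = p[i + 1], p[i]
                swaps.append(i)
                changed = True
    return p, swaps


def signed_ssr(perm, i, length):
    """Signed super short reversal of one or two genes starting at 0-based index i.

    The segment is reversed and the sign (orientation) of each gene in it is flipped.
    """
    if length not in (1, 2) or i < 0 or i + length > len(perm):
        raise ValueError("an SSR affects one or two genes inside the permutation")
    segment = [-x for x in reversed(perm[i:i + length])]
    return list(perm[:i]) + segment + list(perm[i + length:])
```

For (3 1 2), which has two inversions, bubble sort performs exactly two adjacent exchanges.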

Approximation Algorithms for Sorting λ-Permutations by λ-Operations

Algorithms, Vol 14 (6), 2021, pp. 175. DOI: 10.3390/a14060175

Author(s): Guilherme Henrique Santos Miranda, Alexsandro Oliveira Alexandrino, Carla Negri Lintzmayer, Zanoni Dias

Keyword(s): Comparative Genomics, Approximation Algorithms, Evolutionary Distance, Biological Relevance, A Value, Identity Permutation, Minimum Number

Understanding how different two organisms are is one question addressed by the field of comparative genomics. A well-accepted way to estimate the evolutionary distance between the genomes of two organisms is to find the rearrangement distance, which is the smallest number of rearrangements needed to transform one genome into another. When genomes are represented as permutations, one of them can be taken as the identity permutation, which reduces the problem of transforming one permutation into another to the problem of sorting a permutation using the minimum number of rearrangements. This work investigates the problems of sorting permutations using reversals and/or transpositions, with some additional restrictions of biological relevance. Given a value λ, the problem is to sort a λ-permutation, which is a permutation whose elements are less than λ positions away from their correct places (with respect to the identity), by applying the minimum number of rearrangements. Each λ-rearrangement must have size at most λ, and, when applied to a λ-permutation, the result must also be a λ-permutation. We present algorithms with approximation factors of O(λ²), O(λ), and O(1) for the problems of Sorting λ-Permutations by λ-Reversals, by λ-Transpositions, and by both operations.
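The two constraints in this abstract (every element is less than λ positions from its place, and every operation has size at most λ and must keep the permutation a λ-permutation) can be checked directly. The sketch below assumes 1-based positions, a reversal of positions i..j with size j − i + 1, and a transposition exchanging the adjacent blocks i..j−1 and j..k−1 with size k − i; these index conventions and all function names are illustrative assumptions, not necessarily the paper's notation.

```python
def is_lambda_permutation(perm, lam):
    """True if every element is less than lam positions from its place in the identity.

    perm[k] is the element at 1-based position k + 1.
    """
    return all(abs(x - (k + 1)) < lam for k, x in enumerate(perm))


def apply_reversal(perm, i, j):
    """Reverse the segment at 1-based positions i..j (inclusive)."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


def apply_transposition(perm, i, j, k):
    """Exchange the adjacent blocks at 1-based positions i..j-1 and j..k-1."""
    return perm[:i - 1] + perm[j - 1:k - 1] + perm[i - 1:j - 1] + perm[k - 1:]


def is_lambda_reversal(perm, i, j, lam):
    """A reversal of size j - i + 1 <= lam whose result is still a lam-permutation."""
    if not (1 <= i < j <= len(perm)) or j - i + 1 > lam:
        return False
    return is_lambda_permutation(apply_reversal(perm, i, j), lam)


def is_lambda_transposition(perm, i, j, k, lam):
    """A transposition moving k - i <= lam elements whose result is still a lam-permutation."""
    if not (1 <= i < j < k <= len(perm) + 1) or k - i > lam:
        return False
    return is_lambda_permutation(apply_transposition(perm, i, j, k), lam)
```

Note that size alone is not enough: with λ = 2, reversing positions 2..3 of (2 1 3) has size 2 but yields (2 3 1), where element 1 is two positions away from its place, so that reversal is not a valid λ-reversal.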

Identification of Target Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

Animals, Vol 11 (1), 2021, pp. 241. DOI: 10.3390/ani11010241

Author(s): Dongwon Seo, Sunghyun Cho, Prabuddha Manjula, Nuri Choi, Young-Kuk Kim, ...

Keyword(s): Machine Learning, Learning Algorithms, Machine Learning Algorithms, Fixation Index, Machine Learning Classification, Genetic Components, Marker Combination, A Genome, Minimum Number, Native Chickens

A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would facilitate the protection of native genetic resources in the market of each country. In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were analyzed with a 600K high-density single nucleotide polymorphism (SNP) array to determine the optimal marker combination comprising the minimum number of markers. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. During marker selection, a total of 47,303 SNPs were used to classify the chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance and fixation index (Fst) values between the case and control groups and reduced the number of genetic components required, confirming that the groups could be classified efficiently with a small number of markers. In a verification study including additional chicken breeds and samples (12 lines and 182 samples), the accuracy did not change significantly, and the target chicken group could be clearly distinguished from the other populations.
The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to determine, among a large number of SNP markers, the optimal marker combination with the minimum number of markers that can distinguish the target population.
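The abstract reports that the selected markers increased fixation index (Fst) values between the case and control groups. As a hedged illustration of what such a value measures, the sketch below computes Nei's G_ST, one common Fst-type estimator, for a single biallelic SNP from the frequency of one allele in each group. It is not necessarily the estimator the authors used; equal group weights and the absence of sample-size corrections are assumptions, and `nei_gst` is a hypothetical name.

```python
def nei_gst(freqs):
    """Nei's G_ST for one biallelic SNP.

    freqs: frequency of the same allele in each subpopulation (e.g. case, control).
    Assumes equal subpopulation weights and no sample-size correction.
    """
    if not freqs:
        raise ValueError("need at least one subpopulation")
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)                   # expected heterozygosity of the pooled population
    h_s = sum(2 * p * (1 - p) for p in freqs) / k   # mean expected heterozygosity within groups
    if h_t == 0:
        return 0.0  # monomorphic SNP: nothing to differentiate
    return (h_t - h_s) / h_t
```

A SNP fixed for different alleles in the two groups gives 1.0, identical frequencies give 0.0, and frequencies of 0.2 and 0.8 give (0.5 − 0.32) / 0.5 = 0.36.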

On the Number of Tetrahedra with Minimum, Unit, and Distinct Volumes in Three-Space

Combinatorics, Probability and Computing, Vol 17 (2), 2008, pp. 203-224. DOI: 10.1017/s096354830700884x

Cited By ~ 3

Author(s): Adrian Dumitrescu, Csaba D. Tóth

Keyword(s): Unit Volume, Time Algorithm, Combinatorial Problems, List Type, Type Number, Point Sets, Minimum Number, Three Space

We formulate and give partial answers to several combinatorial problems on volumes of simplices determined by $n$ points in 3-space, and in general in $d$ dimensions. (i) The number of tetrahedra of minimum (non-zero) volume spanned by $n$ points in $\mathbb{R}^3$ is at most $\frac{2}{3}n^3-O(n^2)$, and there are point sets for which this number is $\frac{3}{16}n^3-O(n^2)$. We also present an $O(n^3)$ time algorithm for reporting all tetrahedra of minimum non-zero volume, and thereby extend an algorithm of Edelsbrunner, O'Rourke and Seidel. In general, for every $k,d\in \mathbb{N}$, $1\leq k \leq d$, the maximum number of $k$-dimensional simplices of minimum (non-zero) volume spanned by $n$ points in $\mathbb{R}^d$ is $\Theta(n^k)$. (ii) The number of unit-volume tetrahedra determined by $n$ points in $\mathbb{R}^3$ is $O(n^{7/2})$, and there are point sets for which this number is $\Omega(n^3\log\log n)$. (iii) For every $d\in \mathbb{N}$, the minimum number of distinct volumes of all full-dimensional simplices determined by $n$ points in $\mathbb{R}^d$, not all on a hyperplane, is $\Theta(n)$.
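To make "tetrahedra of minimum non-zero volume" concrete, here is a brute-force sketch that uses the standard fact that six times the volume of tetrahedron abcd equals |det(b − a, c − a, d − a)|; with integer coordinates this stays exact. It enumerates all quadruples in $O(n^4)$ time, so it is a naive baseline and not the $O(n^3)$ reporting algorithm described in the abstract. Function names are hypothetical.

```python
from itertools import combinations


def six_volume(a, b, c, d):
    """Six times the volume of tetrahedron abcd: |det(b - a, c - a, d - a)|."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    w = [d[i] - a[i] for i in range(3)]
    det = (u[0] * (v[1] * w[2] - v[2] * w[1])
           - u[1] * (v[0] * w[2] - v[2] * w[0])
           + u[2] * (v[0] * w[1] - v[1] * w[0]))
    return abs(det)


def min_volume_tetrahedra(points):
    """Return (minimum non-zero six_volume, quadruples attaining it).

    Brute force over all C(n, 4) quadruples, i.e. O(n^4) time.
    """
    best, witnesses = None, []
    for quad in combinations(points, 4):
        vol = six_volume(*quad)
        if vol == 0:
            continue  # coplanar quadruple: not a tetrahedron
        if best is None or vol < best:
            best, witnesses = vol, [quad]
        elif vol == best:
            witnesses.append(quad)
    return best, witnesses
```

On the 8 vertices of the unit cube, 12 of the 70 quadruples are coplanar (6 faces and 6 diagonal planes); of the remaining 58 tetrahedra, 56 attain the minimum volume 1/6 and the 2 regular ones have volume 1/3.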

Identification of Chicken Populations by Machine Learning Models Using the Minimum Number of SNPs

2020. DOI: 10.21203/rs.3.rs-95706/v1

Author(s): Dongwon Seo, Sunghyun Cho, Prabuddha Manjula, Nuri Choi, Young Kuk Kim, ...

Keyword(s): Machine Learning, SNP Array, Machine Learning Algorithms, Case Group, Machine Learning Classification, Genetic Components, Native Chicken, Marker Combination, A Genome, Minimum Number

Background: A marker combination capable of classifying a specific chicken population could improve commercial value by increasing consumer confidence with respect to the origin of the population. This would also facilitate the protection of genetic resources, especially in developing countries.

Methods: In this study, a total of 283 samples from 20 lines, consisting of Korean native chickens, commercial native chickens, and commercial broilers with a layer population, were used to find the marker combination with the minimum number of markers using a 600K high-density single nucleotide polymorphism (SNP) array. Machine learning algorithms, a genome-wide association study (GWAS), linkage disequilibrium (LD) analysis, and principal component analysis (PCA) were used to distinguish a target (case) group from control chicken groups. To verify the selected markers, a total of 182 samples from 12 lines were used to confirm the change in the accuracy of target chicken breed identification.

Results: A total of 47,303 SNPs were used for classifying chicken populations; 96 LD-pruned SNPs (50 SNPs per LD block) served as the best marker combination for target chicken classification. Moreover, 36, 44, and 8 SNPs were selected as the minimum numbers of markers by the AdaBoost (AB), Random Forest (RF), and Decision Tree (DT) machine learning classification models, which had accuracy rates of 99.6%, 98.0%, and 97.9%, respectively. The selected marker combinations increased the genetic distance between the case and control groups and reduced the number of genetic components, confirming that efficient classification of the groups was possible using a small number of markers. In a verification study including additional chicken breeds and samples, the accuracy did not change significantly, and the target chicken group could be clearly distinguished from the other populations.

Conclusions: The GWAS, PCA, and machine learning algorithms used in this study can be applied efficiently to explore the minimum combination of markers that can distinguish varieties among a large number of SNP markers.

Approximation Algorithms for a Variant of Discrete Piercing Set Problem for Unit Disks

International Journal of Computational Geometry & Applications, Vol 23 (06), 2013, pp. 461-477. DOI: 10.1142/s021819591350009x

Cited By ~ 7

Author(s): Minati De, Gautam K. Das, Paz Carmi, Subhas C. Nandy

Keyword(s): Approximation Algorithms, Simple Algorithm, Constant Factor, Performance Ratio, Approximation Result, Worst Case, Approximation Factor, Minimum Number, Unit Disks, Set Of Points

In this paper, we consider constant-factor approximation algorithms for a variant of the discrete piercing set problem for unit disks. Here, a set of points $P$ is given; the objective is to choose the minimum number of points in $P$ that pierce the unit disks centered at all the points in $P$. We first propose a very simple algorithm that produces a 12-approximation in $O(n \log n)$ time. Next, we improve the approximation factor to 4 and then to 3. The worst-case running times of these two algorithms are $O(n^8 \log n)$ and $O(n^{15} \log n)$, respectively. Apart from the space required for storing the input, the extra work-space requirement of each of these algorithms is $O(1)$. Finally, we propose a PTAS for the same problem. Given a positive integer $k$, it produces a solution with performance ratio [Formula: see text] in $n^{O(k)}$ time.
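The problem can be stated concretely: choose $Q \subseteq P$ so that every point of $P$ is within distance 1 of some point of $Q$. The sketch below is a simple grid heuristic, not one of the paper's algorithms, and no approximation factor is claimed for it. It buckets points into square cells of side $1/\sqrt{2}$, whose diagonal is 1, so any two points in the same cell are at distance at most 1 and one representative per non-empty cell yields a valid piercing set (up to floating-point rounding at cell boundaries). Function names are hypothetical.

```python
import math
from collections import defaultdict


def grid_piercing_set(points):
    """Pick one input point per non-empty grid cell of side 1/sqrt(2).

    Two points in the same cell are at distance <= 1, so the representative
    lies in the closed unit disk centred at every point of its cell.
    """
    side = 1 / math.sqrt(2)
    cells = defaultdict(list)
    for p in points:
        cells[(math.floor(p[0] / side), math.floor(p[1] / side))].append(p)
    return [members[0] for members in cells.values()]


def pierces_all(points, chosen):
    """Check that every unit disk centred at a point of `points` contains a chosen point."""
    return all(any(math.dist(p, q) <= 1 for q in chosen) for p in points)
```

The verifier is quadratic but independent of how the set was built, so it can also be used to check the output of other heuristics on small instances.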

Computing the Reversal Distance between Genomes in the Presence of Multi-Gene Families via Binary Integer Programming

Journal of Bioinformatics and Computational Biology, Vol 05 (01), 2007, pp. 117-133. DOI: 10.1142/s0219720007002552

Cited By ~ 6

Author(s): Jakkarin Suksawatchon, Chidchanok Lursinsap, Mikael Bodén

Keyword(s): Integer Programming, Combinatorial Problem, Gene Families, Polynomial Time Algorithm, Time Algorithm, High Accuracy, Biological Data, Binary Integer Programming, A Genome, Minimum Number

Hannenhalli and Pevzner developed the first polynomial-time algorithm for the combinatorial problem of sorting signed genomic data. Their algorithm determines the minimum number of reversals required to transform one genome into another, but only in the absence of gene duplicates. However, duplicates often account for 40% of a genome. In this paper, we show how to extend Hannenhalli and Pevzner's approach to deal with genomes with multi-gene families. We propose a new heuristic algorithm to compute the nearest reversal distance between two genomes with multi-gene families via binary integer programming. Experimental results on both synthetic and real biological data demonstrate that the proposed algorithm finds the reversal distance with high accuracy.
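For intuition about the quantity being computed, the sketch below finds the exact signed reversal distance of a tiny permutation without duplicates by breadth-first search over all signed reversals. This is a swapped-in brute-force technique, neither the Hannenhalli–Pevzner algorithm nor the paper's integer programming heuristic; its state space grows as $2^n n!$, so it is only a checker for very small cases. It assumes the input contains each of 1..n exactly once up to sign, and the function names are hypothetical.

```python
from collections import deque


def signed_reversal(perm, i, j):
    """Reverse perm[i..j] (0-based, inclusive) and flip the sign of each element in it."""
    perm = tuple(perm)
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance_bfs(perm):
    """Exact signed reversal distance to (1, 2, ..., n) by breadth-first search.

    Exponential state space: only for tiny permutations without duplicate genes.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == target:
            return dist[cur]
        n = len(cur)
        for i in range(n):
            for j in range(i, n):  # i == j flips the orientation of a single gene
                nxt = signed_reversal(cur, i, j)
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
    return None
```

For example, (2 1) needs three signed reversals, while (−2 −1) needs only one, since reversing both elements also flips both signs.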

Complexity of approximation algorithms for combinatorial problems

ACM SIGACT News, Vol 12 (3), 1980, pp. 52-65. DOI: 10.1145/1008861.1008867

Cited By ~ 24

Author(s): Georgii Gens, Evgenii Levner

Keyword(s): Approximation Algorithms, Combinatorial Problems

UDiTaS™, a genome editing detection method for indels and genome rearrangements

BMC Genomics, Vol 19 (1), 2018. DOI: 10.1186/s12864-018-4561-9

Cited By ~ 21

Author(s): Georgia Giannoukos, Dawn M. Ciulla, Eugenio Marco, Hayat S. Abdulkerim, Luis A. Barrera, ...

Keyword(s): Genome Editing, Detection Method, Genome Rearrangements, A Genome

Search Numbers in Networks with Special Topologies

Journal of Interconnection Networks, Vol 19 (01), 2019, pp. 1940004. DOI: 10.1142/s0219265919400048

Author(s): Boting Yang, Runtao Zhang, Yi Cao, Farong Zhong

Keyword(s): Approximation Algorithms, Search Strategy, Linear Time, Time Algorithm, Time Transformation, Linear Time Algorithm, Optimal Layout, Explicit Formulas, Minimum Number, Search Numbers

In this paper, we consider the problem of finding the minimum number of searchers needed to sweep networks/graphs with special topological structures. Such a number is called the search number. We first study graphs that contain exactly one cycle, and present a linear-time algorithm to compute the vertex separation and the optimal layout of such graphs; by a linear-time transformation, we can then find the search number of such graphs in linear time. We also investigate graphs in which every vertex lies on at most one cycle and each cycle contains at most three vertices of degree greater than two, and we propose a linear-time algorithm to compute their search number and an optimal search strategy. We prove explicit formulas for the search number of the graphs obtained from complete k-ary trees by replacing vertices with cycles. We also present some results on approximation algorithms.
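Vertex separation, which this abstract computes for unicyclic graphs, is defined over a linear layout of the vertices: for each cut position, count the vertices at or before the cut that still have a neighbour after it, and take the maximum. The vertex separation number is the minimum over all layouts (it is known to equal pathwidth). The sketch below evaluates a given layout and brute-forces the minimum over all $n!$ layouts; it is a naive reference for tiny graphs, not the paper's linear-time algorithm, and the function names are hypothetical. The graph is assumed to be undirected, given as symmetric adjacency lists.

```python
from itertools import permutations


def layout_vertex_separation(adj, layout):
    """Max over cut positions i of the number of vertices at positions <= i
    that have a neighbour at a position > i."""
    pos = {v: k for k, v in enumerate(layout)}
    best = 0
    for i in range(len(layout)):
        count = sum(1 for v in layout[:i + 1] if any(pos[u] > i for u in adj[v]))
        best = max(best, count)
    return best


def vertex_separation_bruteforce(adj):
    """Minimum vertex separation over all layouts; O(n! * n^2), tiny graphs only."""
    return min(layout_vertex_separation(adj, list(p)) for p in permutations(adj))
```

For example, a path has vertex separation 1 when laid out in order, while any layout of a cycle leaves two vertices with neighbours beyond some cut, so a cycle has vertex separation 2.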
