Abstract
Background
Sorting by transpositions (SBT) is a classical problem in genome rearrangements. In 2012, SBT was proven to be $$\mathcal{NP}$$-hard; the best known approximation algorithm, with a ratio of 1.375, was proposed in 2006 by Elias and Hartman (the EH algorithm). Their algorithm employs simplification, a technique that transforms an input permutation $$\pi$$ into a simple permutation $${\hat{\pi }}$$, which is presumably easier to handle. The permutation $${\hat{\pi }}$$ is obtained by inserting new symbols into $$\pi$$ in such a way that the lower bound on the transposition distance of $$\pi$$ is preserved in $${\hat{\pi }}$$. Simplification is guaranteed to preserve the lower bound, not the transposition distance itself. A sequence of operations that sorts $${\hat{\pi }}$$ can be mimicked to sort $$\pi$$.
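For readers unfamiliar with the operation, the following minimal Python sketch illustrates what a transposition does (it exchanges two adjacent blocks of a permutation) and how exact transposition distances can be obtained by exhaustive breadth-first search for very small n. It is not the EH algorithm nor the algorithm proposed here; the function names `apply_transposition` and `all_transposition_distances` and the 0-based index convention are assumptions made for illustration only.

```python
from collections import deque


def apply_transposition(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k].

    Indices are 0-based with i < j < k (a convention assumed for this sketch).
    """
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def all_transposition_distances(n):
    """Return a dict mapping every permutation of 1..n to its exact distance.

    Breadth-first search starts from the identity. Because undoing a
    transposition is itself a transposition, the number of steps needed to
    reach a permutation from the identity equals the number needed to sort it.
    Only feasible for small n, since all n! permutations are visited.
    """
    identity = tuple(range(1, n + 1))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        cur = queue.popleft()
        for i in range(n - 1):
            for j in range(i + 1, n):
                for k in range(j + 1, n + 1):
                    nxt = apply_transposition(cur, i, j, k)
                    if nxt not in dist:
                        dist[nxt] = dist[cur] + 1
                        queue.append(nxt)
    return dist


if __name__ == "__main__":
    distances = all_transposition_distances(3)
    for perm, d in sorted(distances.items()):
        print(perm, d)
```

A table of exact distances like this one could serve as a reference when checking, for small n, whether the distance returned by an approximation algorithm stays within a given ratio of the optimum.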
Results and conclusions
First, using an algebraic approach, we propose a new upper bound for the transposition distance, which holds for all $$S_n$$. Next, we address a problem identified in the EH algorithm: depending on how the input permutation is simplified, the algorithm may require one extra transposition beyond what the 1.375-approximation ratio allows. Motivated by this problem, we propose a new approximation algorithm for SBT that ensures the 1.375-approximation ratio for all $$S_n$$. We implemented both our algorithm and the EH algorithm. While implementing the EH algorithm, we identified two further issues that had to be fixed. We tested both algorithms against all permutations of size n, $$2\le n \le 12$$. The results show that the EH algorithm exceeds the approximation ratio of 1.375 for permutations of size greater than 7. We also compare the percentage of distances computed by the implemented algorithms that equal the transposition distance with the percentages of other algorithms available in the literature. Finally, we investigate the performance of both implementations on longer permutations, of length up to 500. From these experiments, we conclude that the maximum and average distances computed by our algorithm are slightly better than those computed by the EH algorithm, and that the running times of both algorithms are similar, despite the higher time complexity of our algorithm.