Natural Family-Free Genomic Distance


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Diego P. Rubert ◽  
Fábio V. Martinez ◽  
Marília D. V. Braga

Abstract Background: A classical problem in comparative genomics is to compute the rearrangement distance, that is, the minimum number of large-scale rearrangements required to transform one given genome into another. The traditional approaches in this area are family-based, i.e., they require the classification of the DNA fragments of both genomes into families. Furthermore, the most elementary family-based models, which can compute distances in polynomial time, restrict each family to occur at most once in each genome. In contrast, distance computation in models that allow multifamilies (i.e., families with multiple occurrences) is NP-hard. Recently, Bohnenkämper et al. (J Comput Biol 28:410–431, 2021) proposed an ILP formulation for computing the genomic distance of genomes with multifamilies, allowing structural rearrangements, represented by the generic double cut and join (DCJ) operation, and content-modifying insertions and deletions of DNA segments. This ILP is very efficient, but it must maximize a matching of the genes in each multifamily in order to prevent the free-lunch artifact that would otherwise let empty or almost empty matchings give smaller distances. Results: In this paper, we adopt the alternative family-free setting that, instead of a family classification, simply uses the pairwise similarities between the DNA fragments of both genomes to compute their rearrangement distance. We adapted the ILP mentioned above and developed a model in which pairwise similarities are used to assign weights to both matched and unmatched genes, so that an optimal solution does not necessarily maximize the matching. Our model then yields a natural family-free genomic distance that takes all given genes into consideration, without prior classification into families, and has a search space composed of matchings of any size. Despite its larger search space, our ILP appears to be boosted by a reduction in the number of co-optimal solutions due to the weights: it converged faster than the original ILP of Bohnenkämper et al. for instances with the same number of multiple connections. We can handle not only bacterial genomes but also fungi and insects, as well as sets of chromosomes of mammals and plants. In a comparison study of six fruit fly genomes, we obtained accurate results.
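
To make the role of the weights concrete, here is a minimal brute-force sketch (not the authors' ILP): matchings of every size are enumerated and scored with an illustrative rule under which a matched pair pays off only if its similarity is high enough, so the optimum need not be a maximum-cardinality matching. Genes and similarity values are invented for the example.

```python
from itertools import combinations, permutations

genes_a = ["a1", "a2", "a3"]
genes_b = ["b1", "b2"]
sim = {("a1", "b1"): 0.9, ("a1", "b2"): 0.2,
       ("a2", "b1"): 0.1, ("a2", "b2"): 0.15,
       ("a3", "b1"): 0.05, ("a3", "b2"): 0.1}

def score(matching):
    # toy scoring rule (not the paper's objective): a matched pair pays
    # off only if its similarity exceeds 0.5, so leaving weakly similar
    # genes unmatched can beat a maximum-cardinality matching
    return sum(2 * sim[pair] - 1 for pair in matching)

# enumerate matchings of every size, from empty to maximum cardinality
best = max(
    (tuple(zip(sub_a, perm_b))
     for k in range(min(len(genes_a), len(genes_b)) + 1)
     for sub_a in combinations(genes_a, k)
     for perm_b in permutations(genes_b, k)),
    key=score,
)
print(best, score(best))  # only ('a1', 'b1') is matched, score 0.8
```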


Author(s):  
Diego P. Rubert ◽  
Daniel Doerr ◽  
Marília D. V. Braga

Recently, we proposed an efficient ILP formulation [Rubert DP, Martinez FV, Braga MDV, Natural family-free genomic distance, Algorithms Mol Biol 16:4, 2021] for exactly computing the rearrangement distance of two genomes in a family-free setting. In such a setting, neither a prior classification of genes into families nor further restrictions on the genomes are imposed. Given two genomes, the mentioned ILP computes an optimal matching of the genes, taking into account simultaneously local mutations, given by gene similarities, and large-scale genome rearrangements. Here, we explore the potential of using this ILP for inferring groups of orthologs across several species. More precisely, given a set of genomes, our method first computes all pairwise optimal gene matchings, which are then integrated into gene families in a second step. Our approach is implemented in a pipeline that incorporates the pre-computation of gene similarities. It can be downloaded from gitlab.ub.uni-bielefeld.de/gi/FFGC. We obtained promising results in experiments on both simulated and real data.
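
As a rough sketch of the second step, assuming the integration simply groups genes connected by some pairwise matching (the pipeline's actual rule may be more refined), the hypothetical matchings below are merged into families as connected components via union-find.

```python
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(x, y):
    parent[find(x)] = find(y)

# hypothetical optimal matchings from three pairwise ILP runs
matchings = [
    [("G1.geneA", "G2.geneA"), ("G1.geneB", "G2.geneB")],  # G1 vs G2
    [("G1.geneA", "G3.geneA")],                            # G1 vs G3
    [("G2.geneB", "G3.geneB")],                            # G2 vs G3
]
for matching in matchings:
    for x, y in matching:
        union(x, y)

# connected components of the union of all matchings = gene families
families = {}
for gene in parent:
    families.setdefault(find(gene), []).append(gene)
print(list(families.values()))
# [['G1.geneA', 'G2.geneA', 'G3.geneA'], ['G1.geneB', 'G2.geneB', 'G3.geneB']]
```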


2021 ◽  
Vol 2021 ◽  
pp. 1-9
Author(s):  
Ambika Aggarwal ◽  
Priti Dimri ◽  
Amit Agarwal ◽  
Madhushi Verma ◽  
Hesham A. Alhumyani ◽  
...  

Cloud computing platforms make extensive use of scientific workflows to execute large-scale applications. However, multiobjective scheduling of scientific workflows to optimize QoS parameters is a challenging task. Various metaheuristic scheduling techniques have been proposed to satisfy QoS parameters such as makespan, cost, and resource utilization. Still, traditional metaheuristic approaches struggle to maintain a suitable equilibrium between exploration and exploitation of the search space, owing to limitations such as getting trapped in local optima at later evolution stages and the high dimensionality of the nonlinear optimization problem. This paper proposes an improved Fruit Fly Optimization (IFFO) algorithm to minimize makespan and cost when scheduling multiple workflows in the cloud computing environment. The proposed algorithm is evaluated in CloudSim on multiple-workflow scheduling. The comparative results show that IFFO outperforms FFO, PSO, and GA.
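
For orientation, below is a minimal sketch of the basic FFO loop that IFFO improves upon; the paper's improvements are not reproduced, and the objective, parameters, and weights are illustrative stand-ins for a makespan/cost trade-off.

```python
import random

def fitness(x):
    # hypothetical scalarized objective standing in for makespan + cost
    return 0.6 * x[0] ** 2 + 0.4 * (x[1] - 3) ** 2

def ffo(dim=2, swarm=30, iters=200, step=1.0, seed=1):
    rng = random.Random(seed)
    center = [rng.uniform(-10, 10) for _ in range(dim)]
    best = fitness(center)
    for _ in range(iters):
        # smell-based search: each fly takes a random step around the center
        flies = [[c + rng.uniform(-step, step) for c in center]
                 for _ in range(swarm)]
        cand = min(flies, key=fitness)
        if fitness(cand) < best:
            # vision-based search: the swarm relocates to the best fly
            best, center = fitness(cand), cand
    return center, best

print(ffo())  # approaches (0, 3), the minimum of the toy objective
```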


2020 ◽  
pp. 1-12
Author(s):  
Zheping Yan ◽  
Jinzhong Zhang ◽  
Jialing Tang

The accuracy and stability of relative pose estimation between an autonomous underwater vehicle (AUV) and a target depend on whether the characteristics of the underwater image can be extracted accurately and quickly. In this paper, a whale optimization algorithm (WOA) based on lateral inhibition (LI), named the LI-WOA, is proposed to solve the image matching and vision-guided AUV docking problem. The WOA is motivated by the behavior of humpback whales, and it mainly imitates encircling prey, bubble-net attacking, and searching for prey to obtain the globally optimal solution in the search space. The WOA not only balances exploration and exploitation but also has a faster convergence speed, higher calculation accuracy, and stronger robustness than other approaches. The lateral inhibition mechanism effectively performs image enhancement and image edge extraction to improve the accuracy and stability of image matching. The LI-WOA combines the optimization efficiency of the WOA with the matching accuracy of the LI mechanism to improve convergence accuracy and the correct matching rate. To verify its effectiveness and feasibility, the algorithm is compared with others by maximizing the similarity between the original image and the template image. The experimental results show that the LI-WOA achieves a better average value, a higher correct matching rate, lower execution time, and stronger robustness than other algorithms, making it an effective and stable method for image matching and vision-guided AUV docking.
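
For reference, a condensed sketch of the standard WOA position updates (encircling prey, bubble-net spiral, and random search) on a toy objective; the lateral-inhibition image preprocessing of the LI-WOA is not included, and all parameters are illustrative.

```python
import math, random

def woa(fitness, dim, n=20, iters=100, lb=-5.0, ub=5.0, seed=0):
    rng = random.Random(seed)
    X = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    best = min(X, key=fitness)[:]
    for t in range(iters):
        a = 2 * (1 - t / iters)  # control parameter decreases from 2 to 0
        for i in range(n):
            A = [a * (2 * rng.random() - 1) for _ in range(dim)]
            C = [2 * rng.random() for _ in range(dim)]
            if rng.random() < 0.5:
                # encircle the best whale when |A| is small, else explore
                # around a random whale (per-dimension check, a simplification)
                ref = best if all(abs(v) < 1 for v in A) else rng.choice(X)
                X[i] = [ref[d] - A[d] * abs(C[d] * ref[d] - X[i][d])
                        for d in range(dim)]
            else:
                # bubble-net attack: logarithmic spiral toward the best (b = 1)
                l = rng.uniform(-1, 1)
                X[i] = [abs(best[d] - X[i][d]) * math.exp(l)
                        * math.cos(2 * math.pi * l) + best[d]
                        for d in range(dim)]
            X[i] = [min(max(v, lb), ub) for v in X[i]]  # keep within bounds
        best = min(X + [best], key=fitness)[:]
    return best

print(woa(lambda x: sum(v * v for v in x), dim=3))  # near the zero vector
```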


2021 ◽  
Vol 13 (3) ◽  
pp. 1274
Author(s):  
Loau Al-Bahrani ◽  
Mehdi Seyedmahmoudian ◽  
Ben Horan ◽  
Alex Stojcevski

Few non-traditional optimization techniques have been applied to the dynamic economic dispatch (DED) of large-scale thermal power units (TPUs), e.g., 1000 TPUs, considering the effects of valve-point loading with ramp-rate limitations. This is a complicated multimodal problem. In this investigation, a novel optimization technique, namely a multi-gradient particle swarm optimization (MG-PSO) algorithm with two stages for exploring and exploiting the search space, is employed as the optimization tool. The M particles (explorers) in the first stage explore new neighborhoods, whereas the M particles (exploiters) in the second stage exploit the best neighborhood. The negative gradient variation of the M particles in both stages maintains the equilibrium between global and local search capabilities. The algorithm is validated on five medium-scale to very large-scale power systems. The MG-PSO algorithm effectively reduces the difficulty of handling the large-scale DED problem, and simulation results confirm its suitability for such a complicated multi-objective problem in terms of fitness, performance measures, and consistency. The algorithm is also applied to estimate the generation required over 24 h to meet load-demand changes. This investigation provides useful technical references for economic dispatch operators to update their power system programs in order to achieve economic benefits.
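
As background for the two-stage scheme, here is a generic PSO core of the kind MG-PSO extends; the explorer/exploiter stages and the negative gradient schedule are not reproduced, and the single-unit cost curve with a valve-point ripple is a hypothetical stand-in.

```python
import math, random

def cost(p):
    # hypothetical one-unit cost: quadratic fuel term + valve-point ripple
    return 0.01 * p * p + 2.0 * p + 10.0 + abs(50.0 * math.sin(0.06 * p))

def pso(n=30, iters=300, lb=0.0, ub=200.0, seed=7):
    rng = random.Random(seed)
    x = [rng.uniform(lb, ub) for _ in range(n)]
    v = [0.0] * n
    pbest = x[:]
    gbest = min(pbest, key=cost)
    for t in range(iters):
        w = 0.9 - 0.5 * t / iters  # inertia decreases as the search matures
        for i in range(n):
            r1, r2 = rng.random(), rng.random()
            v[i] = (w * v[i] + 2.0 * r1 * (pbest[i] - x[i])
                    + 2.0 * r2 * (gbest - x[i]))
            x[i] = min(max(x[i] + v[i], lb), ub)
            if cost(x[i]) < cost(pbest[i]):
                pbest[i] = x[i]
        gbest = min(pbest, key=cost)
    return gbest, cost(gbest)

print(pso())  # a low-cost operating point for the toy single-unit problem
```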


2021 ◽  
Vol 13 (12) ◽  
pp. 6708
Author(s):  
Hamza Mubarak ◽  
Nurulafiqah Nadzirah Mansor ◽  
Hazlie Mokhlis ◽  
Mahazani Mohamad ◽  
Hasmaini Mohamad ◽  
...  

Demand for a continuous and reliable power supply has significantly increased, especially in this Industrial Revolution 4.0 era. In this regard, adequate planning of electrical power systems that considers persistent load growth, increased integration of distributed generators (DGs), optimal system operation during N-1 contingencies, and compliance with existing system constraints is paramount. However, these issues need to be addressed in parallel for optimum distribution system planning, which makes the planning optimization problem more complex due to the various technical and operational constraints as well as the enormous search space. To address these considerations, this paper proposes a strategy for distribution system expansion planning that considers N-1 system contingencies for all branches, optimal DG sizing and placement, and fluctuations in the load profiles. In this work, a hybrid firefly algorithm and particle swarm optimization (FA-PSO) is proposed to determine the optimal solution for the expansion planning problem. The validity of the proposed method was tested on the IEEE 33- and 69-bus systems. The results show that incorporating DGs with optimal sizing and placement minimizes the investment and power loss costs for the 33-bus system by 42.18% and 14.63%, respectively, and for the 69-bus system by 31.53% and 12%, respectively. In addition, comparative studies were conducted with a different model from the literature to verify the robustness of the proposed method.
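
One ingredient of such planning is screening N-1 contingencies; the toy sketch below removes each branch in turn and checks that the network stays connected. The topology is hypothetical, and a real feasibility check would also enforce voltage and thermal limits.

```python
nodes = {1, 2, 3, 4, 5}
branches = {(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (2, 4)}  # hypothetical feeder

def connected(edges):
    # depth-first search over the remaining branches
    seen, stack = set(), [1]
    while stack:
        u = stack.pop()
        if u in seen:
            continue
        seen.add(u)
        stack += [b for a, b in edges if a == u]
        stack += [a for a, b in edges if b == u]
    return seen == nodes

# N-1 screening: remove each branch in turn and re-check connectivity
violations = [br for br in branches if not connected(branches - {br})]
print("N-1 secure" if not violations else f"fails for outages: {violations}")
```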


Author(s):  
Ruiyang Song ◽  
Kuang Xu

We propose and analyze a temporal concatenation heuristic for solving large-scale finite-horizon Markov decision processes (MDPs), which divides the MDP into smaller sub-problems along the time horizon and generates an overall solution by simply concatenating the optimal solutions from these sub-problems. As a “black box” architecture, temporal concatenation works with a wide range of existing MDP algorithms. Our main results characterize the regret of temporal concatenation compared to the optimal solution. We provide upper bounds for general MDP instances, as well as a family of MDP instances for which the upper bounds are shown to be tight. Together, our results demonstrate temporal concatenation's potential for substantial speed-up at the expense of some performance degradation.
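
A small self-contained sketch of the heuristic on a toy deterministic MDP, with backward induction as the "black box" segment solver: the horizon is split into three segments, each is solved independently, and the segment policies are concatenated. All transition and reward numbers are invented.

```python
import itertools

S, A, T = 3, 2, 12  # states, actions, horizon
P = {(s, a): (s + a) % S for s in range(S) for a in range(A)}             # moves
R = {(s, a): float((2 * s + a) % 3) for s in range(S) for a in range(A)}  # rewards

def solve(horizon):
    """Backward induction; returns the optimal policy[t][s] for the segment."""
    V = [0.0] * S
    policy = []
    for _ in range(horizon):
        Q = [[R[s, a] + V[P[s, a]] for a in range(A)] for s in range(S)]
        policy.append([max(range(A), key=lambda a: Q[s][a]) for s in range(S)])
        V = [max(Q[s]) for s in range(S)]
    policy.reverse()  # built from the terminal stage backwards
    return policy

def rollout(policy, s=0):
    total = 0.0
    for step in policy:
        a = step[s]
        total, s = total + R[s, a], P[s, a]
    return total

exact = solve(T)                                      # one global solve
chunks = [solve(T // 3) for _ in range(3)]            # three independent sub-problems
concat = list(itertools.chain.from_iterable(chunks))  # concatenated policy
print(rollout(exact), rollout(concat))  # equal on this toy instance
```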


2021 ◽  
Vol 13 (5) ◽  
pp. 168781402110195
Author(s):  
Jianwen Guo ◽  
Xiaoyan Li ◽  
Zhenpeng Lao ◽  
Yandong Luo ◽  
Jiapeng Wu ◽  
...  

Fault diagnosis is of great significance for improving the production efficiency and accuracy of industrial robots. Compared with traditional gradient-descent training, the extreme learning machine (ELM) has the advantage of fast computing speed, but the randomly generated input weights and hidden node biases affect the accuracy and generalization performance of the ELM. The level-based learning swarm optimizer (LLSO), however, can quickly and effectively find the global optimal solution of large-scale problems and can be used to find the optimal combination of the large-scale input weights and hidden biases in the ELM. This paper proposes an extreme learning machine with a level-based learning swarm optimizer (LLSO-ELM) for fault diagnosis of the RV reducer of an industrial robot. The model is tested on attitude data of the reducer gear under different fault modes. The experimental results show that, compared with the basic ELM, this method has good stability and generalization performance.
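
For context, a compact ELM sketch: input weights and hidden biases are drawn at random, and the output weights follow in closed form by least squares. In the paper, LLSO searches over those random parameters, which is not reproduced here; the data below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))        # synthetic features (stand-in for attitude data)
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]  # synthetic regression target

hidden = 50
W = rng.normal(size=(4, hidden))     # random input weights (LLSO's search space)
b = rng.normal(size=hidden)          # random hidden biases (likewise)
H = np.tanh(X @ W + b)               # hidden-layer output matrix

# output weights in closed form: least-squares solution of H @ beta = y
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

pred = H @ beta
print("train MSE:", float(np.mean((pred - y) ** 2)))
```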


2016 ◽  
Vol 2016 ◽  
pp. 1-10 ◽  
Author(s):  
Minhua Shao ◽  
Lijun Sun ◽  
Xianzhi Shao

The sensor location problem (SLP) discussed in this paper is to find the minimum number and optimal locations of flow counting points in a road network so that the traffic flows over the whole network can be inferred uniquely. First, the flow conservation system at intersections is formulated using turning ratios as prior information, and the coefficient matrix of this system is proved to be nonsingular. Based on that, the minimum number of counting points is shown to equal the total number of exclusive incoming roads and dummy roads, the latter being added to the network to represent the trips generated on real roads. The task of the turning-ratio-based SLP model is therefore only to determine the optimal sensor locations. The analysis then shows that placing sensors on all exclusive incoming roads and dummy roads always yields a unique network flow vector for any network topology. A detection set composed only of real roads is also proven to exist, which ensures feasibility in practice. Finally, considering road importance and sensor cost, a weighted SLP model is formulated to find the optimal detection set, and a greedy algorithm is proven to provide the optimal solution for this weighted model.
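
The uniqueness argument can be seen on a toy junction: with the turning ratios known, counting only the incoming roads yields a nonsingular linear system whose solution recovers every flow. The numbers below are illustrative, not from the paper.

```python
import numpy as np

# unknown flows: x = [in1, in2, out1, out2]; turning ratios split each
# incoming flow between the two outgoing roads:
#   out1 = 0.7*in1 + 0.4*in2,  out2 = 0.3*in1 + 0.6*in2
A = np.array([
    [1.0,  0.0,  0.0, 0.0],   # sensor (count) on incoming road in1
    [0.0,  1.0,  0.0, 0.0],   # sensor (count) on incoming road in2
    [-0.7, -0.4, 1.0, 0.0],   # flow conservation at out1
    [-0.3, -0.6, 0.0, 1.0],   # flow conservation at out2
])
b = np.array([100.0, 50.0, 0.0, 0.0])  # two observed counts + conservation RHS

x = np.linalg.solve(A, b)  # unique solution because A is nonsingular
print(x)                   # [100.  50.  90.  60.]
```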


1995 ◽  
Vol 117 (1) ◽  
pp. 155-157 ◽  
Author(s):  
F. C. Anderson ◽  
J. M. Ziegler ◽  
M. G. Pandy ◽  
R. T. Whalen

We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.

