Multi-objective formulation of MSA for phylogeny estimation

2018 ◽  
Author(s):  
Muhammad Ali Nayeem ◽  
Md. Shamsuzzoha Bayzid ◽  
Atif Hasan Rahman ◽  
Rifat Shahriyar ◽  
M. Sohel Rahman

Abstract Multiple sequence alignment (MSA) is a basic step in many analyses in computational biology, including predicting the structure and function of proteins, orthology prediction, and estimating phylogenies. The objective of MSA is to infer the homology among the sequences of chosen species. Commonly, MSAs are inferred by optimizing a single function or objective. The alignments estimated under one criterion may differ from those generated under other criteria, inferring discordant homologies and thus leading to different evolutionary histories relating the sequences. To address this issue, researchers have recently advocated a multi-objective formulation of MSA, in which multiple conflicting objective functions are optimized simultaneously to generate a set of alignments. However, no theoretical or empirical justification with respect to a real-life application has been shown for any particular multi-objective formulation. In this study, we investigate the impact of multi-objective formulation in the context of phylogenetic tree estimation. Employing multi-objective metaheuristics, we demonstrate that trees estimated on the alignments generated by multi-objective formulation are substantially better than the trees estimated by state-of-the-art MSA tools, including PASTA, MUSCLE, CLUSTAL, and MAFFT. We also demonstrate that alignments that are highly accurate with respect to popular measures such as the sum-of-pairs (SP) score and total-column (TC) score do not necessarily lead to highly accurate phylogenetic trees. In essence, we ask whether a phylogeny-aware metric can guide the choice of multi-objective formulations that result in better phylogeny estimation, and we answer the question affirmatively through a carefully designed, extensive empirical study. As a by-product, we also suggest a methodology for the primary selection of a set of objective functions for a multi-objective formulation, based on their association with the resulting phylogenetic tree.
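The idea of optimizing several conflicting objectives at once and keeping a set of alignments can be illustrated with a small non-dominated (Pareto) filter. This is only a minimal sketch under stated assumptions: the candidate names and the two score values per candidate are hypothetical, both objectives are assumed to be maximized, and the abstract does not say which objectives or which metaheuristic the authors actually used.

```python
from typing import Dict, List, Tuple


def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
    """Return True if score vector a dominates b (all objectives maximized).

    a dominates b when it is at least as good on every objective
    and strictly better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    )


# Hypothetical candidate alignments, each scored on two made-up objectives.
candidates = {
    "aln_a": (0.90, 0.40),
    "aln_b": (0.70, 0.80),
    "aln_c": (0.60, 0.30),  # worse than aln_a on both objectives
    "aln_d": (0.90, 0.35),  # ties aln_a on the first, worse on the second
}

front = pareto_front(candidates)
```

Here `front` keeps `aln_a` and `aln_b`, since neither is better than the other on both objectives. A multi-objective method returns such a set rather than a single "best" alignment, which is why a downstream criterion (in the abstract, the quality of the resulting tree) is needed to choose among its members.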

2018 ◽  
Vol 20 (4) ◽  
pp. 864-885 ◽  
Author(s):  
Younggu Her ◽  
Chounghyun Seong

Abstract Multi-objective calibration can help identify parameter sets that represent a hydrological system and enable further constraining of the parameter space. Multi-objective calibration is expected to be more frequently utilized, along with the advances in optimization algorithms and computing resources. However, the impact of the number of objective functions on modeling outputs is still unclear, and the adequate number of objective functions remains an open question. We investigated the responses of model performance, equifinality, and uncertainty to the number of objective functions incorporated in a hierarchical and sequential manner in parameter calibration. The Hydrological Simulation Program – FORTRAN (HSPF) models that were prepared for bacteria total maximum daily load (TMDL) development served as a mathematical representation to simulate the hydrological processes of three watersheds located in Virginia, and the Expert System for Calibration of HSPF (HSPEXP) statistics were employed as objective functions in parameter calibration experiments. Results showed that the amount of equifinality and output uncertainty overall decreased while the model performance was maintained as the number of objective functions increased sequentially. However, there was no further significant improvement in the equifinality and uncertainty when including more than four objective functions. This study demonstrated that the introduction of an adequate number of objective functions could improve the quality of calibration without requiring additional observations.


Author(s):  
Abolfazl Seifi ◽  
Reza Hassannejad ◽  
Mohammad Ali Hamed

In this study, a new method to improve ride comfort, vehicle handling, and workspace was presented in multi-objective optimization using nonlinear asymmetrical dampers. The main aim of this research was to provide a suitable passive suspension, motivated by the higher efficiency and low cost of these dampers. Using a model with five degrees of freedom, suspension system parameters were optimized under sinusoidal road excitation. The main functions of the suspension system were chosen as objective functions. To better illustrate the impact of each objective function on the suspension parameters, the optimization problem was first considered with two objectives and finally with five. The obtained results indicated that the optimized viscous coefficients for five-objective optimization lead to a 3.58% increase in ride comfort, 0.74% in vehicle handling ability, and 2.20% in workspace changes, averaged over the front and rear suspensions.


2021 ◽  
Author(s):  
Belen Escobari ◽  
Thomas Borsch ◽  
Taylor S. Quedensley ◽  
Michael Gruenstaeudl

Abstract Premise: The genus Gynoxys and relatives form a species-rich lineage of Andean shrubs and trees with low genetic distances within the sunflower subtribe Tussilaginineae. Previous molecular phylogenetic investigations of the Tussilaginineae have included few, if any, representatives of this Gynoxoid group or reconstructed ambiguous patterns of relationships for it. Methods: We sequenced complete plastid genomes of 21 species of the Gynoxoid group and related Tussilaginineae and conducted detailed comparisons of the phylogenetic relationships supported by the gene, intron, and intergenic spacer partitions of these genomes. We also evaluated the impact of manual, motif-based adjustments of automatic DNA sequence alignments on phylogenetic tree inference. Results: Our results indicate that the inclusion of all plastid genome partitions is needed to infer fully resolved phylogenetic trees of the Gynoxoid group. Whole plastome-based tree inference suggests that the genera Gynoxys and Nordenstamia are polyphyletic and form the core clade of the Gynoxoid group. This clade is sister to a clade of Aequatorium and Paragynoxys and also includes some but not all representatives of Paracalia. Conclusions: The concatenation and combined analysis of all plastid genome partitions and the construction of manually curated, motif-based DNA sequence alignments are found to be instrumental in the recovery of strongly supported relationships of the Gynoxoid group. We demonstrate that the correct assessment of homology in genome-level plastid sequence datasets is crucial for subsequent phylogeny reconstruction and that the manual post-processing of multiple sequence alignments improves the reliability of such reconstructions amid low genetic distances between taxa.


2021 ◽  
Author(s):  
Yassmine Soussi ◽  
Nizar Rokbani ◽  
Ali Wali ◽  
Adel Alimi

In this paper, a new technique named Pareto Neighborhood (PN) topology is integrated into the Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to produce the MOPSO-PN algorithm. This technique involves iteratively selecting a set of best solutions from the Pareto-optimal fronts and exploring them in order to find better clustering results in the next iteration. MOPSO-PN was then used as a Multi-Objective Clustering Optimization (MOCO) algorithm and tested on various real-life and artificial datasets. Two scenarios were used to test the performance of MOPSO-PN for clustering. In the first scenario, MOPSO-PN used two cluster validity indices (Silhouette-Index and overall-cluster-deviation) as objective functions, three datasets for testing, four algorithms for comparison, and the average Minkowski Score as the metric for evaluating the final clustering result. In the second scenario, MOPSO-PN used three cluster validity indices (I-index, Con-index and Sym-index) as objective functions, 20 datasets for testing, ten algorithms for comparison, and the F-Measure as the metric for evaluating the final clustering result. In both scenarios, MOPSO-PN provided competitive clustering results and the correct number of clusters for all datasets.


Author(s):  
Motomu Matsui ◽  
Wataru Iwasaki

Abstract A protein superfamily contains distantly related proteins that have acquired diverse biological functions through a long evolutionary history. Phylogenetic analysis of the early evolution of protein superfamilies is a key challenge because existing phylogenetic methods perform poorly when protein sequences are too diverged to construct an informative multiple sequence alignment (MSA). Here, we propose the Graph Splitting (GS) method, which rapidly reconstructs a protein superfamily-scale phylogenetic tree using a graph-based approach. Evolutionary simulation showed that the GS method can accurately reconstruct phylogenetic trees and is robust to major problems in phylogenetic estimation, such as biased taxon sampling, heterogeneous evolutionary rates, and long-branch attraction, when sequences are substantially diverged. Its application to an empirical data set of the triosephosphate isomerase (TIM)-barrel superfamily suggests rapid evolution of protein-mediated pyrimidine biosynthesis, likely taking place after the RNA world. Furthermore, the GS method can also substantially improve the performance of widely used MSA methods by providing accurate guide trees.


2021 ◽  
Author(s):  
Yassmine Soussi ◽  
Nizar Rokbani ◽  
Seyedali Mirjalili ◽  
Ali Wali ◽  
Adel Alimi

In this paper, a new technique named Pareto Neighborhood (PN) topology is integrated into the Multi-Objective Particle Swarm Optimization (MOPSO) algorithm to produce the MOPSO-PN algorithm. This technique involves iteratively selecting a set of best solutions from the Pareto-optimal fronts and exploring them in order to find better clustering results in the next iteration. MOPSO-PN was then used as a Multi-Objective Clustering Optimization (MOCO) algorithm and tested on various real-life and artificial datasets. Two scenarios were used to test the performance of MOPSO-PN for clustering. In the first scenario, MOPSO-PN used two cluster validity indices (Silhouette-Index and overall-cluster-deviation) as objective functions, three datasets for testing, four algorithms for comparison, and the average Minkowski Score as the metric for evaluating the final clustering result. In the second scenario, MOPSO-PN used three cluster validity indices (I-index, Con-index and Sym-index) as objective functions, 20 datasets for testing, ten algorithms for comparison, and the F-Measure as the metric for evaluating the final clustering result. In both scenarios, MOPSO-PN provided competitive clustering results and the correct number of clusters for all datasets.


2020 ◽  
Author(s):  
Yassmine Soussi ◽  
Nizar Rokbani ◽  
Ali Wali ◽  
Adel Alimi

This paper defines a new version of Moth-Flame Optimization with quantum-behaved moths, QMFO. The multi-objective version of QMFO (MOQMFO) is then applied to solve clustering problems. MOQMFO uses three cluster validity criteria as objective functions (the I-index, Con-index and Sym-index) to establish the multi-objective clustering optimization. This paper details the proposal and the preliminary results obtained for clustering real-life datasets (including Iris, Cancer, Newthyroid, Wine, LiverDisorder and Glass) and artificial datasets (including Sph_5_2, Sph_4_3, Sph_6_2, Sph_10_2, Sph_9_2, Pat 1, Pat 2, Long 1, Sizes 5, Spiral, Square 1, Square 4, Twenty and Fourty). Compared with key multi-objective clustering techniques, the proposal showed interesting results, mainly for Iris, Newthyroid, Wine, LiverDisorder, Sph_4_3, Sph_6_2, Long 1, Sizes 5, Twenty and Fourty, and was able to provide the exact number of clusters for all datasets.


2016 ◽  
Author(s):  
Rhishikesh Bargaje ◽  
M. Milner Kumar ◽  
Sohan Prabhakar Modak

Abstract Background: Most molecular phylogenetic trees depict the relative closeness or the extent of similarity among a set of taxa based on comparison of sequences of homologous genes or proteins. Because the tree topology for individual monogenic traits varies among the same set of organisms and does not coincide with the taxonomic hierarchy, there is a need to generate multidimensional phylogenetic trees. Results: Phylogenetic trees were constructed for 119 prokaryotes representing 2 phyla under Archaea and 11 phyla under Bacteria after comparing multiple sequence alignments for 15 different aminoacyl-tRNA synthetase polypeptides. The topology of Neighbor Joining (NJ) trees for individual tRNA synthetase polypeptides varied substantially. We use Euclidean geometry to estimate all-pairs distances in order to construct phylogenetic trees. Further, we used a novel "Taxonomic fidelity" algorithm to estimate clade-by-clade similarity between the phylogenetic tree and the taxonomic tree. We find that, compared with trees for individual tRNA synthetase polypeptides and rDNA sequences, the topology of our Euclidean tree and that for aligned and concatenated sequences of 15 proteins are closer to the taxonomic trees and offer the best consensus. We have also aligned sequences after concatenation and find that changing the order in which sequences are joined prior to alignment changes the tree topologies. In contrast, changing the types of polypeptides in the grouping for Euclidean trees does not affect the tree topologies. Conclusions: We show that a consensus phylogenetic tree of 15 polypeptides from 14 aminoacyl-tRNA synthetases for 119 prokaryotes using Euclidean geometry exhibits better taxonomic fidelity than trees for individual tRNA synthetase polypeptides as well as 16S rDNA. We have also examined Euclidean N-dimensional trees for 15 tRNA synthetase polypeptides, which give the same topology as that constructed after amalgamating 3-dimensional Euclidean trees for groups of 3 polypeptides. Euclidean N-dimensional trees offer a reliable way forward for multi-genic molecular phylogenetics.


Genes ◽  
2019 ◽  
Vol 10 (2) ◽  
pp. 73 ◽  
Author(s):  
Yongyong Kang ◽  
Xiaofei Yang ◽  
Jiadong Lin ◽  
Kai Ye

A phylogenetic tree is essential for understanding evolution. It is usually constructed through multiple sequence alignment, which suffers from a heavy computational burden and requires sophisticated parameter tuning. Recently, alignment-free methods based on k-mer profiles or common substrings have provided alternative ways to construct phylogenetic trees. However, most of these methods ignore the global similarities between sequences or specific valuable features, e.g., frequent patterns across the whole dataset. To improve on them, we propose an alignment-free algorithm based on sequential pattern mining, in which each sequence is converted into a binary representation of the sequential patterns found among the sequences. The phylogenetic tree is then constructed by clustering a distance matrix calculated from the pattern vectors. To increase accuracy for highly divergent sequences, we take pattern weights into account and filter out redundant sub-patterns. Results on both simulated and real data demonstrate that our method outperforms other alignment-free methods, especially for large sequence sets with low similarity.
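The pipeline outlined in this abstract (binary pattern vectors, then a distance matrix, then clustering) can be sketched roughly as follows. This is an illustration under loud assumptions, not the authors' algorithm: contiguous k-mers stand in for mined sequential patterns, Jaccard distance is an assumed choice of metric, pattern weighting and redundancy filtering are omitted, and the toy sequences and all function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """All contiguous substrings of length k (a stand-in for mined patterns)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def pattern_vectors(seqs: Dict[str, str], k: int = 3) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode each sequence as a 0/1 vector over the union of all patterns."""
    patterns = sorted(set().union(*(kmers(s, k) for s in seqs.values())))
    vectors = {}
    for name, s in seqs.items():
        present = kmers(s, k)
        vectors[name] = [1 if p in present else 0 for p in patterns]
    return patterns, vectors


def jaccard_distance(u: List[int], v: List[int]) -> float:
    """1 - |shared patterns| / |patterns present in either vector|."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def distance_matrix(vectors: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Symmetric pairwise distances, keyed by (name, name)."""
    names = sorted(vectors)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = jaccard_distance(vectors[a], vectors[b])
    return dist


# Hypothetical toy sequences: s1 and s2 differ only in the last base.
toy = {"s1": "ACGTAC", "s2": "ACGTAG", "s3": "TTTTTT"}
patterns, vectors = pattern_vectors(toy, k=3)
dist = distance_matrix(vectors)
```

In this toy input, `s1` and `s2` share three of their five distinct 3-mers, so their distance is about 0.4, while `s3` shares none with either and sits at distance 1.0. The resulting matrix could then be passed to any distance-based clustering or tree-building routine, and which one the authors used is not stated in the abstract.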

