An Alignment-free Method for Phylogeny Estimation using Maximum Likelihood

Abstract: Phylogenetic analysis, i.e. the construction of an accurate phylogenetic tree from the genomic sequences of a set of species, is one of the main challenges in bioinformatics. The popular approaches require either aligning each pair of sequences to calculate pairwise distances or aligning all the sequences to construct a multiple sequence alignment. The computational complexity of alignment and the difficulty of obtaining accurate alignments have led to the development of alignment-free methods for estimating phylogenies. However, alignment-free approaches focus on computing distances between species and do not utilize statistical approaches for phylogeny estimation. Herein, we present a simple alignment-free method for phylogeny construction based on contiguous sub-sequences of length k, termed k-mers. The presence or absence of these k-mers is used to construct a phylogeny using a maximum likelihood approach. The results suggest our method is competitive with other alignment-free approaches, while outperforming them in some cases.
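
The k-mer presence/absence encoding described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `kmers` and `presence_absence_matrix`, the toy sequences, and the choice of k = 3 are all assumptions made for illustration, and the maximum likelihood tree search that would consume the resulting binary matrix is not shown.

```python
from typing import Dict, List, Set, Tuple


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of contiguous length-k substrings (k-mers) of seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence_matrix(
    seqs: Dict[str, str], k: int
) -> Tuple[List[str], Dict[str, List[int]]]:
    """Encode each species as a 0/1 vector over all k-mers seen in any species.

    Returns the sorted k-mer list (the columns) and one binary row per species.
    """
    per_species = {name: kmers(s, k) for name, s in seqs.items()}
    all_kmers = sorted(set().union(*per_species.values()))
    matrix = {
        name: [1 if km in found else 0 for km in all_kmers]
        for name, found in per_species.items()
    }
    return all_kmers, matrix


# Hypothetical toy input: three short sequences, k = 3.
toy = {"sp1": "ACGTAC", "sp2": "ACGTTC", "sp3": "TTGTAC"}
columns, matrix = presence_absence_matrix(toy, 3)

print(columns)
for name, row in matrix.items():
    print(name, row)
```

Each row is a binary character vector for one species. A binary-character substitution model in a maximum likelihood phylogenetics program could, in principle, take such a matrix as input; which model and program the authors used is not stated in the abstract.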

Joint Alignment and Tree Inference

10.1101/2021.09.28.462230 ◽

2021 ◽

Author(s):

Jūlija Pečerska ◽

Manuel Gil ◽

Maria Anisimova

Keyword(s):

Computational Complexity ◽

Maximum Likelihood ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Combinatorial Optimisation ◽

Simultaneous Inference ◽

Inference Process ◽

Multiple Sequence ◽

Tree Inference

Multiple sequence alignment and phylogenetic tree inference are connected problems that are often solved as independent steps in the inference process. Several attempts at simultaneous inference have been made; however, the currently available methods are severely limited by their computational complexity and can only handle small datasets. In this manuscript we introduce a combinatorial optimisation approach that will allow us to resolve the circularity of the problem and efficiently infer both alignments and trees under maximum likelihood.

LegumeDB: Development of Legume Medicinal Plant Database and Comparative Molecular Evolutionary Analysis of matK Proteins of Legumes and Mangroves

Current Nutrition & Food Science ◽

10.2174/1573401314666180223143523 ◽

2019 ◽

Vol 15 (4) ◽

pp. 353-362

Author(s):

Sambhaji B. Thakar ◽

Maruti J. Dhanavade ◽

Kailas D. Sonawane

Keyword(s):

Phylogenetic Analysis ◽

Medicinal Plants ◽

Homology Modeling ◽

Sequence Alignment ◽

Vigna Unguiculata ◽

Multiple Sequence Alignment ◽

Legume Species ◽

Mangrove Species ◽

Multiple Sequence ◽

Thespesia Populnea

Background: Legume plants are known for their rich medicinal and nutritional value. A large amount of medicinal information on various legume plants is dispersed in the form of text. Objective: It is essential to design and construct a legume medicinal plants database that integrates the respective classes of legumes and includes knowledge of their medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft SQL (Structured Query Language) Server 2017. The DBMS was used as the back end, ASP.Net was used to lay out front-end operations, and VB.Net was used for coding. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: This database includes information on 50 legume medicinal species, which may help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from a legume species, Vigna unguiculata, using matK of the mangrove species Thespesia populnea as a template. The matK sequence analysis results indicate the conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies may help in understanding the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in

SHOOT: phylogenetic gene search and ortholog inference

10.1101/2021.09.01.458564 ◽

2021 ◽

Author(s):

David Emms ◽

Steven Kelly

Keyword(s):

Phylogenetic Analysis ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Phylogenetic Trees ◽

Query Sequence ◽

Gene Tree ◽

Biological Research ◽

Gene Sequences ◽

Multiple Sequence ◽

Gene Search

Determining the evolutionary relationships between gene sequences is fundamental to comparative biological research. However, conducting such analyses requires a high degree of technical proficiency in several computational tools, including gene family construction, multiple sequence alignment, and phylogenetic inference. Here we present SHOOT, an easy-to-use phylogenetic search engine for fast and accurate phylogenetic analysis of biological sequences. SHOOT searches a user-provided query sequence against a database of phylogenetic trees of gene sequences (gene trees) and returns a gene tree with the given query sequence correctly grafted within it. We show that SHOOT can perform this search and placement with comparable speed to a conventional BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as conventional multiple sequence alignment and maximum likelihood tree inference approaches. We further show that SHOOT can be used to identify orthologs with equivalent accuracy to conventional orthology inference methods. In summary, SHOOT is an accurate and fast tool for complete phylogenetic analysis of novel query sequences. An easy-to-use webserver is available online at www.shoot.bio.

Multiple Sequence Alignment Reveals Diversity among Eight African Bush Mango (Irvingia gabonensis Aubry-Lecomte ex O’Rorke) Cultivars

Journal of Experimental Agriculture International ◽

10.9734/jeai/2021/v43i130635 ◽

2021 ◽

pp. 91-96

Author(s):

U. G. Adebo ◽

J. O. Matthew

Keyword(s):

Sequence Analysis ◽

Phylogenetic Tree ◽

Genetic Resources ◽

Data Base ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Multiple Sequence ◽

Bush Mango ◽

Irvingia Gabonensis ◽

Cluster 2

Multiple sequence analysis is one of the most widely used models for estimating similarity among genotypes. In a bid to access useful information for the utilization of bush mango genetic resources, nucleotide sequences of eight bush mango (Irvingia gabonensis) cultivars were retrieved from the NCBI database and evaluated for diversity and similarity using a computational biology approach. The highest alignment score (26.18), depicting the highest similarity, was shared by two pairs of sequences, BM07:BM58 and BM12:BM69, while the lowest score (19.43) was between BM01:BM13. The phylogenetic tree broadly divided the cultivars into four distinct groups: BM07 and BM58 (cluster 1); BM01 (cluster 2); BM15, BM13 and BM35 (cluster 3); and BM12 and BM69 (cluster 4). The sequences obtained from the analysis revealed only a few fully conserved regions, with the single nucleotides A and T consistent throughout evolution. Results obtained from this study indicate that the bush mango cultivars are divergent and can be useful genetic resources for bush mango improvement through breeding.

TCS: A New Multiple Sequence Alignment Reliability Measure to Estimate Alignment Accuracy and Improve Phylogenetic Tree Reconstruction

Molecular Biology and Evolution ◽

10.1093/molbev/msu117 ◽

2014 ◽

Vol 31 (6) ◽

pp. 1625-1637 ◽

Cited By ~ 113

Author(s):

Jia-Ming Chang ◽

Paolo Di Tommaso ◽

Cedric Notredame

Keyword(s):

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Alignment Accuracy ◽

Tree Reconstruction ◽

Multiple Sequence ◽

Reliability Measure ◽

Phylogenetic Tree Reconstruction

Multiple Sequence Alignment and Phylogenetic Tree Construction of Viral Protein 2 of Bluetongue virus

International Journal of Bioinformatics and Biological Science ◽

10.30954/2319-5169.01.2018.6 ◽

2018 ◽

Vol 6 (1) ◽

Keyword(s):

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Viral Protein ◽

Bluetongue Virus ◽

Multiple Sequence ◽

Phylogenetic Tree Construction ◽

Tree Construction

A benchmark for evaluation of phylogeny reconstruction programs

10.7287/peerj.preprints.2628v1 ◽

2016 ◽

Author(s):

Sergei Spirin

Keyword(s):

Maximum Likelihood ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Relative Accuracy ◽

Phylogeny Reconstruction ◽

Multiple Sequence ◽

Natural Protein ◽

Large Sets ◽

The Moment

There are many algorithms and programs for reconstructing the phylogeny of a set of proteins based on a multiple sequence alignment. Many programs allow users to choose a number of parameters, for example, the model for maximum likelihood programs. Different programs and different parameters often produce different results. However, at present all published benchmarks for evaluating the relative accuracy of programs or of different parameter choices are based on simulated sequences. The aim of the present work is to create a benchmark that allows a comparison of phylogenetic programs on large sets of alignments of natural protein sequences.

HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing

Algorithms for Molecular Biology ◽

10.1186/s13015-017-0116-x ◽

2017 ◽

Vol 12 (1) ◽

Cited By ~ 15

Author(s):

Shixiang Wan ◽

Quan Zou

Keyword(s):

Parallel Computing ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Tree Reconstruction ◽

Multiple Sequence ◽

Phylogenetic Tree Reconstruction ◽

Distributed And Parallel Computing

Implementing Hierarchical Clustering Method for Multiple Sequence Alignment and Phylogenetic Tree Construction

International Journal of Computer Science Engineering and Information Technology ◽

10.5121/ijcseit.2013.3101 ◽

2013 ◽

Vol 3 (1) ◽

pp. 1-12

Author(s):

Harmandeep Singh ◽

Er. Rajbir Singh ◽

Navjot Kaur

Keyword(s):

Phylogenetic Tree ◽

Hierarchical Clustering ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Clustering Method ◽

Multiple Sequence ◽

Phylogenetic Tree Construction ◽

Tree Construction

A Simple Genetic Algorithm for Optimizing Multiple Sequence Alignment on the Spread of the SARS Epidemic

The Open Bioinformatics Journal ◽

10.2174/1875036201912010030 ◽

2019 ◽

Vol 12 (1) ◽

pp. 30-39

Author(s):

Siti Amiroch ◽

M. Syaiful Pradana ◽

M. Isa Irawan ◽

Imam Mukhlash

Keyword(s):

Genetic Algorithms ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

Dna Sequences ◽

Multiple Sequence Alignment ◽

Network System ◽

Multiple Sequence ◽

Network Analyses ◽

Multiple Alignments ◽

Mutation Region

Background: Multiple sequence alignment is a method of obtaining genomic relationships among three or more sequences. In multiple alignments, there are three mutation network analyses, namely the topological network system, the mutation region network, and the network system of mutation mode. In general, the three analyses show stable and unstable regions that map mutation regions. These mutation regions are described further in a phylogenetic tree, which simultaneously illustrates the path of the spread of an epidemic, here the Severe Acute Respiratory Syndrome (SARS) epidemic. The process of spreading the SARS viruses, in this case, is described as the process of phylogenetic tree formation, and as a novelty of this research, multiple alignments in the process are analyzed in detail and then optimized with genetic algorithms. Methods: The data used to form the phylogenetic tree for the spread of the SARS epidemic are 14 DNA sequences, which are then optimized using genetic algorithms. The phylogenetic tree is constructed using the neighbor-joining algorithm with a distance matrix whose entries are genetic distances obtained from sequence alignment using the Needleman-Wunsch algorithm. Results & Conclusion: The analysis yielded 3649 stable areas and 19 unstable areas. The phylogenetic tree from the network system analysis indicated that the spread of the SARS epidemic extended from Guangzhou 16/12/02 to Zhongshan 27/12/02, then spread simultaneously to Guangzhou 18/02/03 and Guangzhou hospital. After that, the virus reached Metropole, Zhongshan, Hongkong, Singapore, Taiwan, Hong kong, and Hanoi, and then continued to Guangzhou 01/01/03 and Toronto at once. The results of the mutation region network system demonstrate decomposition of orthogonal mutations in the 1st order arc.
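
The distance-matrix step mentioned in the Methods above (pairwise Needleman-Wunsch alignment feeding a distance matrix) can be sketched in Python as follows. This is a hedged illustration, not the authors' code. The scoring scheme (match +1, mismatch -1, linear gap -1) is an assumption. So is the use of the p-distance (fraction of differing alignment columns, gaps counted as differences) as the "genetic distance". The names `needleman_wunsch`, `p_distance`, and `distance_matrix` are hypothetical, and the genetic-algorithm optimization and neighbor-joining steps are not shown.

```python
from typing import Dict, Tuple


def needleman_wunsch(
    a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1
) -> Tuple[int, str, str]:
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
    # Traceback from the bottom-right corner, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a: str, b: str) -> float:
    """Fraction of alignment columns that differ (gaps count as differences)."""
    _, x, y = needleman_wunsch(a, b)
    if not x:
        return 0.0
    return sum(1 for c, d in zip(x, y) if c != d) / len(x)


def distance_matrix(seqs: Dict[str, str]) -> Dict[str, Dict[str, float]]:
    """Pairwise p-distances between all named sequences."""
    names = list(seqs)
    return {x: {y: p_distance(seqs[x], seqs[y]) for y in names} for x in names}


# Hypothetical toy sequences, not data from the study.
toy = {"s1": "ACGT", "s2": "AGT", "s3": "ACGT"}
dm = distance_matrix(toy)
print(needleman_wunsch("ACGT", "AGT"))
print(dm["s1"]["s2"], dm["s1"]["s3"])
```

A matrix of this shape is what a neighbor-joining routine would take as input. Note that when several alignments share the optimal score, the traceback's tie-breaking rule determines which one is reported, so a distance derived from it can depend on that choice.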
