Multiple sequence alignment for phylogenetic purposes

I have addressed the biological rather than bioinformatics aspects of molecular sequence alignment by covering a series of topics that have been under-valued, particularly within the context of phylogenetic analysis. First, phylogenetic analysis is only one of the many objectives of sequence alignment, and the most appropriate multiple alignment may not be the same for all of these purposes. Phylogenetic alignment thus occupies a specific place within a broader context. Second, homology assessment plays an intricate role in phylogenetic analysis, with sequence alignment consisting of primary homology assessment and tree building being secondary homology assessment. The objective of phylogenetic alignment thus distinguishes it from other sorts of alignment. Third, I summarise what is known about the serious limitations of using phenetic similarity as a criterion for automated multiple alignment, and provide an overview of what is currently being done to improve these computerised procedures. This synthesises information that is apparently not widely known among phylogeneticists. Fourth, I then consider the recent development of automated procedures for combining alignment and tree building, thus integrating primary and secondary homology assessment. Finally, I outline various strategies for increasing the biological content of sequence alignment procedures, which consists of taking into account known evolutionary processes when making alignment decisions. These procedures can be objective and repeatable, and can involve computerised algorithms to automate much of the work. Perhaps the most important suggestion is that alignment should be seen as a process where new sequences are added to a pre-existing alignment that has been manually curated by the biologist.

LegumeDB: Development of Legume Medicinal Plant Database and Comparative Molecular Evolutionary Analysis of matK Proteins of Legumes and Mangroves

Current Nutrition & Food Science, 2019, Vol 15 (4), pp. 353-362, doi:10.2174/1573401314666180223143523

Author(s): Sambhaji B. Thakar, Maruti J. Dhanavade, Kailas D. Sonawane

Keyword(s): Phylogenetic Analysis, Medicinal Plants, Homology Modeling, Sequence Alignment, Vigna Unguiculata, Multiple Sequence Alignment, Legume Species, Mangrove Species, Multiple Sequence, Thespesia Populnea

Background: Legume plants are known for their rich medicinal and nutritional value. A large amount of medicinal information on various legume plants is dispersed in text form. Objective: It is essential to design and construct a legume medicinal plant database that integrates the respective classes of legumes and includes knowledge of their medicinal applications along with their protein/enzyme sequences. Methods: The Legume Medicinal Plants Database (LegumeDB) was designed and developed using Microsoft SQL Server 2017 as the back-end DBMS, with ASP.NET for the front end and VB.NET as the programming language. Multiple sequence alignment, phylogenetic analysis and homology modeling techniques were also used. Results: The database includes information on 50 legume medicinal species, which might help researchers explore this information. Further, maturase K (matK) protein sequences of legumes and mangroves were retrieved from NCBI for multiple sequence alignment and phylogenetic analysis to understand the evolutionary lineage between legumes and mangroves. Homology modeling was used to determine the three-dimensional structure of matK from a legume species, Vigna unguiculata, using matK of a mangrove species, Thespesia populnea, as a template. The matK sequence analysis indicates conserved residues among legume and mangrove species. Conclusion: Phylogenetic analysis revealed closeness between the legume species Vigna unguiculata and the mangrove species Thespesia populnea, indicating their similarity and origin from a common ancestor. Thus, these studies might help in understanding the evolutionary relationship between legumes and mangroves. LegumeDB availability: http://legumedatabase.co.in

Molecular homology and multiple-sequence alignment: an analysis of concepts and practice

Australian Systematic Botany, 2015, Vol 28 (1), pp. 46, doi:10.1071/sb15001, Cited By ~ 20

Author(s): David A. Morrison, Matthew J. Morgan, Scot A. Kelchner

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Molecular Data, Simple Relationship, Sequence Alignments, Multiple Sequence, Molecular Change, Nucleotide Homology, Tree Building, Molecular Homology

Sequence alignment is just as much a part of phylogenetics as is tree building, although it is often viewed solely as a necessary tool to construct trees. However, alignment for the purpose of phylogenetic inference is primarily about homology, as it is the procedure that expresses homology relationships among the characters, rather than the historical relationships of the taxa. Molecular homology is rather vaguely defined and understood, despite its importance in the molecular age. Indeed, homology has rarely been evaluated with respect to nucleotide sequence alignments, in spite of the fact that nucleotides are the only data that directly represent genotype. All other molecular data represent phenotype, just as do morphology and anatomy. Thus, efforts to improve sequence alignment for phylogenetic purposes should involve a more refined use of the homology concept at a molecular level. To this end, we present examples of molecular-data levels at which homology might be considered, and arrange them in a hierarchy. The concept that we propose has many levels, which link directly to the developmental and morphological components of homology. Of note, there is no simple relationship between gene homology and nucleotide homology. We also propose terminology with which to better describe and discuss molecular homology at these levels. Our over-arching conceptual framework is then used to shed light on the multitude of automated procedures that have been created for multiple-sequence alignment. Sequence alignment needs to be based on aligning homologous nucleotides, without necessary reference to homology at any other level of the hierarchy. In particular, inference of nucleotide homology involves deriving a plausible scenario for molecular change among the set of sequences. Our clarifications should allow the development of a procedure that specifically addresses homology, which is required when performing alignment for phylogenetic purposes, but which does not yet exist.

SHOOT: phylogenetic gene search and ortholog inference

2021, doi:10.1101/2021.09.01.458564

Author(s): David Emms, Steven Kelly

Keyword(s): Phylogenetic Analysis, Sequence Alignment, Multiple Sequence Alignment, Phylogenetic Trees, Query Sequence, Gene Tree, Biological Research, Gene Sequences, Multiple Sequence, Gene Search

Determining the evolutionary relationships between gene sequences is fundamental to comparative biological research. However, conducting such analyses requires a high degree of technical proficiency in several computational tools including gene family construction, multiple sequence alignment, and phylogenetic inference. Here we present SHOOT, an easy to use phylogenetic search engine for fast and accurate phylogenetic analysis of biological sequences. SHOOT searches a user-provided query sequence against a database of phylogenetic trees of gene sequences (gene trees) and returns a gene tree with the given query sequence correctly grafted within it. We show that SHOOT can perform this search and placement with comparable speed to a conventional BLAST search. We demonstrate that SHOOT phylogenetic placements are as accurate as conventional multiple sequence alignment and maximum likelihood tree inference approaches. We further show that SHOOT can be used to identify orthologs with equivalent accuracy to conventional orthology inference methods. In summary, SHOOT is an accurate and fast tool for complete phylogenetic analysis of novel query sequences. An easy to use webserver is available online at www.shoot.bio.

SuiteMSA: visual tools for multiple sequence alignment comparison and molecular sequence simulation

BMC Bioinformatics, 2011, Vol 12 (1), pp. 184, doi:10.1186/1471-2105-12-184, Cited By ~ 12

Author(s): Catherine L Anderson, Cory L Strope, Etsuko N Moriyama

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Multiple Sequence, Molecular Sequence, Sequence Simulation

Multiple sequence alignment using an exhaustive and greedy algorithm

Journal of Bioinformatics and Computational Biology, 2005, Vol 03 (02), pp. 243-255, doi:10.1142/s021972000500103x, Cited By ~ 1

Author(s): Yi Wang, Kuo-Bin Li

Keyword(s): Greedy Algorithm, Sequence Alignment, Multiple Sequence Alignment, Multiple Alignment, Initial Alignment, Progressive Alignment, Multiple Sequence, Java Programming, Multiple Alignments, Objective Score

We describe an exhaustive and greedy algorithm for improving the accuracy of multiple sequence alignment. A simple progressive alignment approach is employed to provide initial alignments. The initial alignment is then iteratively optimized against an objective function. For any working alignment, the optimization involves three operations: insertions, deletions and shuffles of gaps. The optimization is exhaustive since the algorithm applies the above operations to all eligible positions of an alignment. It is also greedy since only the operation that gives the best improvement in the objective score is accepted. The algorithms have been implemented in the EGMA (Exhaustive and Greedy Multiple Alignment) package using the Java programming language, and have been evaluated using the BAliBASE benchmark alignment database. Although EGMA is not guaranteed to produce a globally optimal alignment, the tests indicate that EGMA consistently builds high-quality alignments compared with other commonly used iterative and non-iterative alignment programs. It is also useful for refining multiple alignments obtained by other methods.
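
The exhaustive-and-greedy loop described in this abstract can be illustrated with a small Python sketch. This is not the EGMA implementation (which is written in Java): it covers only the gap "shuffle" operation, and the sum-of-pairs objective, the scoring values, and the function names `sp_score`, `gap_shuffles` and `greedy_refine` are assumptions made for illustration.

```python
from itertools import combinations


def sp_score(aln, match=1, mismatch=-1, gap=-1):
    """Sum-of-pairs score over all columns (assumed objective; gap-gap pairs score 0)."""
    total = 0
    for col in zip(*aln):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue
            if a == "-" or b == "-":
                total += gap
            elif a == b:
                total += match
            else:
                total += mismatch
    return total


def gap_shuffles(aln):
    """Yield every alignment reachable by swapping one gap with an adjacent residue."""
    aln = list(aln)
    for r, row in enumerate(aln):
        for i in range(len(row) - 1):
            a, b = row[i], row[i + 1]
            if (a == "-") != (b == "-"):  # exactly one of the two is a gap
                swapped = row[:i] + b + a + row[i + 2:]
                yield aln[:r] + [swapped] + aln[r + 1:]


def greedy_refine(aln):
    """Exhaustively score all single shuffles, accept only the best improving one, repeat."""
    aln = list(aln)
    best = sp_score(aln)
    while True:
        candidates = [(sp_score(c), c) for c in gap_shuffles(aln)]
        if not candidates:
            return aln, best
        top_score, top = max(candidates, key=lambda t: t[0])
        if top_score <= best:  # no strictly improving move: local optimum reached
            return aln, best
        aln, best = top, top_score


if __name__ == "__main__":
    refined, score = greedy_refine(["ACGT-", "ACG-T", "ACGT-"])
    print(refined, score)
```

Because each accepted move strictly increases a bounded score, the loop terminates, but only at a local optimum, which matches the abstract's caveat that a globally optimal alignment is not guaranteed.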

Procedurally Generated Artworks Based on Multiple Sequence Alignment of Orthologous Gene Copies

Leonardo, 2019, pp. 1-11, doi:10.1162/leon_a_01787

Author(s): Martin Calvino

Keyword(s): Sequence Alignment, Multiple Sequence Alignment, Orthologous Gene, Orthologous Genes, Evolutionary Processes, Multiple Sequence, Procedural Generation, Novel Approach, Gene Copies, Nucleotide Divergence

Here the author presents a novel approach to the procedural generation of artwork series based on multiple sequence alignment of orthologous gene copies. In the strategy developed, each of the nucleotides present in a string of DNA (A, G, C, T) was assigned to an existing artwork. New visual compositions were then created by collaging columns of pixels from each of the four existing artworks according to the arrangement of nucleotides after orthologous genes were aligned. The outcome was a distinctive set of artworks in which visual differences were governed by nucleotide divergence owing to evolutionary processes.
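
The column-collage idea in this abstract can be sketched in a few lines of Python. This is a toy illustration rather than the author's actual pipeline: images are represented as nested lists of pixel values, the function name `collage` is hypothetical, and the treatment of alignment gaps (filled with a background pixel) is an assumption, since the abstract does not say how gaps were handled.

```python
def collage(aligned_seq, artworks, gap_pixel=0):
    """Build a new image by taking pixel column j from the artwork mapped to base j.

    aligned_seq: one row of an alignment, e.g. "AC-T".
    artworks:    dict mapping "A", "C", "G", "T" to images stored as
                 lists of rows; every image must share the same height
                 and be at least len(aligned_seq) pixels wide.
    gap_pixel:   value used for gap columns (an assumption, see lead-in).
    """
    height = len(next(iter(artworks.values())))
    rows = [[] for _ in range(height)]
    for j, base in enumerate(aligned_seq):
        source = artworks.get(base)  # None for "-" or any unmapped symbol
        for y in range(height):
            rows[y].append(source[y][j] if source is not None else gap_pixel)
    return rows


if __name__ == "__main__":
    # Four tiny 2x3 "artworks"; each pixel encodes its source and column.
    art = {
        "A": [[10, 11, 12], [10, 11, 12]],
        "C": [[20, 21, 22], [20, 21, 22]],
        "G": [[30, 31, 32], [30, 31, 32]],
        "T": [[40, 41, 42], [40, 41, 42]],
    }
    print(collage("A-T", art))
```

Running the same function on each row of an alignment would give one composition per gene copy, so that columns differ between compositions exactly where the aligned nucleotides differ.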

Gotree/Goalign : Toolkit and Go API to facilitate the development of phylogenetic workflows

2021, doi:10.1101/2021.06.09.447704

Author(s): Frederic Lemoine, Olivier Gascuel

Keyword(s): Phylogenetic Analysis, Sequence Alignment, Multiple Sequence Alignment, Phylogenetic Analyses, Bootstrap Support, Complex Task, Important Data, Multiple Sequence, Tree Comparison, User Friendly

Besides computer-intensive steps, phylogenetic analysis workflows are usually composed of many small, recurring, but important data manipulation steps. Among these are file reformatting, sequence renaming, tree re-rooting, tree comparison, bootstrap support computation, etc. These are often performed by custom scripts or by several heterogeneous tools, which may be error-prone and hard to maintain, and may produce results that are challenging to reproduce. For all these reasons, the development and reuse of phylogenetic workflows is often a complex task. We identified many operations that are part of most phylogenetic analyses, and implemented them in a toolkit called Gotree/Goalign. The Gotree/Goalign toolkit implements more than 120 user-friendly commands and an API dedicated to multiple sequence alignment and phylogenetic tree manipulations. It is developed in Go, which makes executables efficient, easily installable, integrable in workflow environments, and parallelizable when possible. This toolkit is freely available on most platforms (Linux, MacOS and Windows) and most architectures (amd64, i386). Sources and binaries are available on GitHub at https://github.com/evolbioinfo/{gotree|goalign}, Bioconda, and DockerHub.

VP2 Gene-Based Molecular Evolutionary Patterns of Major Circulating Bluetongue Virus Serotypes Isolated during 2014–2018 from Telangana and Andhra Pradesh States of India

Intervirology, 2020, pp. 1-8, doi:10.1159/000512131

Author(s): Ravali Thota, Vishweshwar Kumar Ganji, Sharanya Machanagari, Narasimha Reddy Yella, Bhagyalakshmi Buddala, ...

Keyword(s): Phylogenetic Analysis, Sequence Alignment, Multiple Sequence Alignment, Bluetongue Virus, Andhra Pradesh, Geographical Location, Effective Control, Effective Vaccine, Multiple Sequence, Vp2 Gene

Introduction: Bluetongue disease is an economically important viral disease of livestock caused by bluetongue virus (BTV), which has multiple serotypes. BTV belongs to the genus Orbivirus of family Reoviridae and subfamily Sedoreovirinae. The genome of BTV is a 10-segmented dsRNA that codes for 7 structural and 4 nonstructural proteins, of which VP2 was reported to be serotype-specific and a major antigenic determinant. Objective: It is important to know the circulating serotypes in a particular geographical location for effective control of the disease. The present study unravels the molecular evolution of the circulating BTV serotypes during 2014–2018 in Telangana and Andhra Pradesh states of India. Methods: Multiple sequence alignment with available BTV serotypes in GenBank and phylogenetic analysis were performed for the partial VP2 sequences of major circulating BTV serotypes during the study period. Results: The multiple sequence alignment of circulating serotypes with respective reference isolates revealed variations in antigenic VP2. The phylogenetic analysis revealed that the major circulating serotypes were grouped into Eastern topotypes (BTV-1, BTV-2, BTV-4, and BTV-16) and Western topotypes (BTV-5, BTV-12, and BTV-24). Conclusion: Our study strengthens the need for development of an effective vaccine that can induce an immune response to a range of serotypes within and between topotypes.

Multiple sequence alignment and phylogenetic analysis of wheat pathogens using conserved genes for identification and development of diagnostic markers

Cereal Research Communications, 2021, doi:10.1007/s42976-021-00193-7

Author(s): Sangeeta Gupta, Rashmi Aggarwal, Sapna Sharma, Malkhan S. Gurjar, Bishnu M. Bashyal, ...

Keyword(s): Phylogenetic Analysis, Sequence Alignment, Multiple Sequence Alignment, Diagnostic Markers, Multiple Sequence, Conserved Genes

An Alignment-free Method for Phylogeny Estimation using Maximum Likelihood

2019, doi:10.1101/2019.12.13.875526

Author(s): Tasfia Zahin, Md. Hasin Abrar, Mizanur Rahman, Tahrina Tasnim, Md. Shamsuzzoha Bayzid, ...

Keyword(s): Phylogenetic Analysis, Computational Complexity, Maximum Likelihood, Phylogenetic Tree, Sequence Alignment, Multiple Sequence Alignment, Multiple Sequence, Alignment Free, Phylogeny Estimation, Statistical Approaches

Phylogenetic analysis, i.e. construction of an accurate phylogenetic tree from genomic sequences of a set of species, is one of the main challenges in bioinformatics. The popular approaches to this require aligning each pair of sequences to calculate pairwise distances or aligning all the sequences to construct a multiple sequence alignment. The computational complexity and difficulties in getting accurate alignments have led to the development of alignment-free methods to estimate phylogenies. However, the alignment-free approaches focus on computing distances between species and do not utilize statistical approaches for phylogeny estimation. Herein, we present a simple alignment-free method for phylogeny construction based on contiguous sub-sequences of length k, termed k-mers. The presence or absence of these k-mers is used to construct a phylogeny using a maximum likelihood approach. The results suggest our method is competitive with other alignment-free approaches, while outperforming them in some cases.
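
The k-mer presence/absence encoding described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `kmers` and `presence_absence` are hypothetical, and the maximum likelihood step is left out because the abstract does not name the software used for it.

```python
def kmers(seq, k):
    """Return the set of contiguous sub-sequences of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def presence_absence(seqs, k):
    """Encode each taxon as a 0/1 string over the union of all observed k-mers.

    seqs: dict mapping taxon name to its (unaligned) sequence.
    Returns the sorted k-mer list and a dict of binary character strings,
    one character per k-mer, in the same order as the list.
    """
    per_taxon = {name: kmers(s, k) for name, s in seqs.items()}
    universe = sorted(set().union(*per_taxon.values()))
    matrix = {
        name: "".join("1" if km in present else "0" for km in universe)
        for name, present in per_taxon.items()
    }
    return universe, matrix


if __name__ == "__main__":
    universe, matrix = presence_absence({"sp1": "ACGT", "sp2": "ACGA"}, k=2)
    print(universe)
    for name, row in matrix.items():
        print(name, row)
```

Each taxon ends up with a binary string of equal length, so the matrix has the same shape as an alignment of binary characters and could, in principle, be passed to a maximum likelihood program that supports a binary substitution model.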
