A benchmark for evaluation of phylogeny reconstruction programs

10.7287/peerj.preprints.2628v1 ◽

2016 ◽

Author(s):

Sergei Spirin

Keyword(s):

Maximum Likelihood ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Relative Accuracy ◽

Phylogeny Reconstruction ◽

Multiple Sequence ◽

Natural Protein ◽

Large Sets ◽

The Moment

There are many algorithms and programs for reconstructing the phylogeny of a set of proteins from a multiple sequence alignment. Many programs let users choose a number of parameters, for example, an evolutionary model in maximum likelihood programs. Different programs and different parameter choices often produce different results. However, at the moment all published benchmarks for evaluating the relative accuracy of programs or of parameter choices are based on simulated sequences. The aim of the present work is to create a benchmark that allows phylogenetic programs to be compared on large sets of alignments of natural protein sequences.
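
Comparing programs requires some measure of how closely two reconstructed trees agree. The abstract does not say which measure this benchmark uses, so the sketch below is only an illustration under that assumption: it computes the widely used Robinson-Foulds (RF) distance, the size of the symmetric difference between the sets of non-trivial splits of two trees. Trees are written as hypothetical nested tuples of leaf names, and the helper names are illustrative, not part of any tool mentioned here.

```python
from itertools import chain


def leaves(tree):
    """Return the set of leaf names in a tree given as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset(chain.from_iterable(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions (splits) induced by internal edges.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so the same bipartition always gets the same key, wherever the
    tree happens to be rooted.
    """
    all_leaves = leaves(tree)
    ref = min(all_leaves)
    result = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = clade if ref not in clade else all_leaves - clade
        # Skip trivial splits: empty, a single leaf, or all leaves but one.
        if 1 < len(side) < len(all_leaves) - 1:
            result.add(side)
        return clade

    walk(tree)
    return result


def robinson_foulds(tree_a, tree_b):
    """Unnormalised RF distance between two trees on the same leaf set."""
    if leaves(tree_a) != leaves(tree_b):
        raise ValueError("trees must share the same leaf set")
    return len(splits(tree_a) ^ splits(tree_b))
```

For example, `robinson_foulds(((("A", "B"), "C"), ("D", "E")), ((("A", "C"), "B"), ("D", "E")))` is 2: both trees share the split DE | ABC but differ in AB | CDE versus AC | BDE.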

Computational Analysis of Therapeutic Enzyme Uricase from Different Source Organisms

Current Proteomics ◽

10.2174/1570164616666190617165107 ◽

2020 ◽

Vol 17 (1) ◽

pp. 59-77

Author(s):

Anand Kumar Nelapati ◽

JagadeeshBabu PonnanEttiyappan

Keyword(s):

Uric Acid ◽

Amino Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Amino Acid Sequences ◽

Amino Acid Residues ◽

Multiple Sequence ◽

Physiochemical Properties ◽

Pharmaceutical Industries

Background: Hyperuricemia and gout are conditions caused by the accumulation of uric acid in the blood and urine. Uric acid is the product of the purine metabolic pathway in humans. Uricase is a therapeutic enzyme that lowers the concentration of uric acid in serum and urine by converting it into the more soluble allantoin. Uricases are found in many sources, such as bacteria, fungi, yeast, plants and animals. Objective: The present study aims to elucidate the structure and physicochemical properties of uricase by in silico analysis. Methods: A total of sixty uricase amino acid sequences from different sources were obtained from NCBI, and several analyses were performed: multiple sequence alignment (MSA), homology search, phylogenetic analysis, motif search, domain architecture, and physicochemical properties (pI, EC, Ai, Ii, and …). Results: Multiple sequence alignment of all the selected protein sequences showed distinct differences between bacterial, fungal, plant and animal sources, based on the position-specific occurrence of conserved amino acid residues. The region of maximum homology across all the selected sequences lies between residues 51 and 388. Within individual groups, homology lies between residues 16–337 for bacterial uricases, 14–339 for fungal uricases, 12–317 for plant uricases, and 37–361 for animal uricases. The phylogenetic tree constructed from the amino acid sequences revealed clusters corresponding to the different source organisms. The physicochemical analysis showed that the uricases are 300–338 amino acid residues long, with molecular weights of 33–39 kDa and theoretical pI values ranging from 4.95 to 8.88.
The amino acid composition analysis showed that valine has a high average frequency (8.79%) compared with other amino acids across all analyzed species. Conclusion: In the field of bioinformatics, this work may be informative and a stepping-stone for other researchers seeking insight into the physicochemical features, evolutionary history and structural motifs of uricase, an enzyme that can be widely used in the biotechnological and pharmaceutical industries. Therefore, the proposed in silico analysis can be considered for protein engineering work, as well as for gout therapy.
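
The amino acid composition and property values reported above are straightforward to compute from a sequence. The sketch below assumes that "Ai" denotes the aliphatic index as defined in ExPASy ProtParam (Ikai's formula, using mole percents of Ala, Val, Ile and Leu); the abstract does not spell this out, and the function names are illustrative only. Extinction coefficient and instability index are left out.

```python
from collections import Counter


def composition_percent(seq):
    """Percentage of each amino acid (one-letter codes) in a protein sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def aliphatic_index(seq):
    """Aliphatic index: X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)).

    X(...) are mole percents, i.e. the values from composition_percent.
    (Assumed to be what the abstract abbreviates as "Ai".)
    """
    comp = composition_percent(seq)
    return (comp.get("A", 0.0)
            + 2.9 * comp.get("V", 0.0)
            + 3.9 * (comp.get("I", 0.0) + comp.get("L", 0.0)))
```

Averaging `composition_percent(seq)["V"]` over a set of sequences gives the kind of average valine frequency quoted in the Results.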

Benchmarking Statistical Multiple Sequence Alignment

10.1101/304659 ◽

2018 ◽

Cited By ~ 1

Author(s):

Michael Nute ◽

Ehsan Saleh ◽

Tandy Warnow

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Structural Alignment ◽

Estimation Method ◽

Simulated Data ◽

Protein Sequences ◽

Data Sets ◽

Sequence Alignments ◽

Multiple Sequence ◽

Simulated Data Sets

The estimation of multiple sequence alignments of protein sequences is a basic step in many bioinformatics pipelines, including protein structure prediction, protein family identification, and phylogeny estimation. Statistical co-estimation of alignments and trees under stochastic models of sequence evolution has long been considered the most rigorous technique for estimating alignments and trees, but little is known about the accuracy of such methods on biological benchmarks. We report the results of an extensive study evaluating the most popular protein alignment methods as well as the statistical co-estimation method BAli-Phy on 1192 protein data sets from established benchmarks as well as on 120 simulated data sets. Our study (which used more than 230 CPU years for the BAli-Phy analyses alone) shows that BAli-Phy is dramatically more accurate than the other alignment methods on the simulated data sets, but is among the least accurate on the biological benchmarks. There are several potential causes for this discordance, including model misspecification, errors in the reference alignments, and conflicts between structural alignment and evolutionary alignments; future research is needed to understand the most likely explanation for our observations.

Keywords: multiple sequence alignment, BAli-Phy, protein sequences, structural alignment, homology
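
Accuracy on such benchmarks is measured against a reference alignment. The abstract does not define the score, so as an assumption for illustration the sketch below uses the sum-of-pairs (SP) score common on protein benchmarks: the fraction of residue pairs aligned in the reference that the estimated alignment also aligns. Alignments are lists of equal-length gapped strings with `-` as the gap; the function names are hypothetical.

```python
from itertools import combinations


def homology_pairs(alignment):
    """Set of aligned residue pairs ((i, pos_i), (j, pos_j)) implied by an MSA.

    Positions are 0-based indices into the ungapped sequences, so two
    alignments of the same sequences can be compared pair by pair.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows must have the same length")
    next_pos = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for col in range(length):
        residues = []
        for i, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((i, next_pos[i]))
                next_pos[i] += 1
        # Every pair of residues sharing this column is a homology statement.
        pairs.update(combinations(residues, 2))
    return pairs


def sp_score(estimated, reference):
    """Fraction of reference homology pairs recovered by the estimated MSA."""
    ref = homology_pairs(reference)
    if not ref:
        raise ValueError("reference alignment contains no aligned pairs")
    return len(homology_pairs(estimated) & ref) / len(ref)
```

For the reference `["AC-GT", "A-CGT"]`, the estimate `["ACGT", "ACGT"]` recovers all three reference pairs (score 1.0), while the shifted `["ACGT-", "-ACGT"]` recovers none (score 0.0).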

Influence of Parameters in Multiple Sequence Alignment Methods for Protein Sequences

Advances in Intelligent Systems and Computing - Progress in Computing, Analytics and Networking ◽

10.1007/978-981-10-7871-2_18 ◽

2018 ◽

pp. 183-191

Author(s):

P. Manikandan ◽

D. Ramyachitra

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Multiple Sequence

SequenceBouncer: A method to remove outlier entries from a multiple sequence alignment

10.1101/2020.11.24.395459 ◽

2020 ◽

Author(s):

Cory D. Dunn

Keyword(s):

Nucleic Acid ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Phylogenetic Analyses ◽

Protein Sequences ◽

Mitochondrial Genomes ◽

DNA Barcodes ◽

Sequence Alignments ◽

Multiple Sequence ◽

Multiple Sequence Alignments

Phylogenetic analyses can take multiple sequence alignments as input. These alignments typically consist of homologous nucleic acid or protein sequences, and the inclusion of outlier or aberrant sequences can compromise downstream analyses. Here, I describe a program, SequenceBouncer, that uses the Shannon entropy values of alignment columns to identify outlier sequences in an alignment in a manner responsive to the overall alignment context. I demonstrate the utility of this software using alignments of available mammalian mitochondrial genomes, bird cytochrome c oxidase-derived DNA barcodes, and COVID-19 sequences.
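
The abstract names the key ingredient, the Shannon entropy of each alignment column, but not SequenceBouncer's actual scoring rule. The sketch below is therefore not SequenceBouncer's algorithm: it computes per-column entropy and then applies a simple hypothetical rule, scoring each sequence by how often it disagrees with the majority character in low-entropy (well-conserved) columns. The `max_entropy` threshold and all function names are illustrative assumptions.

```python
import math
from collections import Counter


def column_entropy(column, gap="-"):
    """Shannon entropy (bits) of the non-gap characters in one alignment column."""
    chars = [c for c in column if c != gap]
    if not chars:
        return 0.0
    total = len(chars)
    return -sum((n / total) * math.log2(n / total) for n in Counter(chars).values())


def outlier_scores(alignment, max_entropy=1.0, gap="-"):
    """Hypothetical outlier score per sequence.

    A column counts as conserved when its entropy is at most `max_entropy`.
    A sequence's score is the fraction of conserved columns in which it
    differs from the column's most common character (a gap counts as a
    difference).
    """
    conserved = []
    for col in zip(*alignment):
        if column_entropy(col, gap) <= max_entropy:
            chars = [c for c in col if c != gap]
            if chars:
                conserved.append((col, Counter(chars).most_common(1)[0][0]))
    if not conserved:
        return [0.0] * len(alignment)
    return [
        sum(1 for col, majority in conserved if col[i] != majority) / len(conserved)
        for i in range(len(alignment))
    ]
```

With four copies of `"ACGTACGT"` and one `"TGCATGCA"`, every column has entropy of about 0.72 bits, so all columns count as conserved and the last sequence scores 1.0 while the others score 0.0.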

In Silico Characterization of Histidine Acid Phytase Sequences

Enzyme Research ◽

10.1155/2012/845465 ◽

2012 ◽

Vol 2012 ◽

pp. 1-8 ◽

Cited By ~ 11

Author(s):

Vinod Kumar ◽

Gopal Singh ◽

A. K. Verma ◽

Sanjeev Agrawal

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

In Silico ◽

Animal Feed ◽

Protein Sequences ◽

Homology Search ◽

Protein Database ◽

Multiple Sequence ◽

Conserved Sequence ◽

Signature Sequence

Histidine acid phytases (HAPhy) are enzymes widely distributed among bacteria, fungi, plants, and some animal tissues. They play a significant role as animal feed enzymes and in the solubilization of insoluble phosphates and minerals present as phytic acid complexes. A set of 50 reference protein sequences representing HAPhy was retrieved from the NCBI protein database and characterized for various biochemical properties, multiple sequence alignment (MSA), homology search, phylogenetic analysis, motifs, and superfamily search. MSA using MEGA5 revealed conserved sequences at the N-terminus (“RHGXRXP”) and the C-terminus (“HD”). Phylogenetic tree analysis indicates three clusters representing different HAPhy, namely PhyA, PhyB, and AppA. Analysis of 10 commonly distributed motifs in the sequences indicates a signature sequence for each class. Motif 1 “SPFCDLFTHEEWIQYDYLQSLGKYYGYGAGNPLGPAQGIGF” was present in 38 protein sequences representing clusters 1 (PhyA) and 2 (PhyB). Cluster 3 (AppA) contains motif 9 “KKGCPQSGQVAIIADVDERTRKTGEAFAAGLAPDCAITVHTQADTSSPDP” as a signature sequence. The superfamily search showed that all sequences belong to the histidine acid phosphatase family. No conserved sequence representing 3- or 6-phytase could be identified using multiple sequence alignment. This in silico analysis might contribute to the classification and future genetic engineering of this most diverse class of phytase.
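
Conserved motifs such as “RHGXRXP” are conventionally written with X standing for any amino acid. Assuming that convention, the sketch below turns such a motif into a regular expression and reports every (possibly overlapping) match position in a sequence. The helper names and the test sequence are hypothetical, not taken from the study.

```python
import re


def motif_to_regex(motif):
    """Compile a motif written with 'X' meaning 'any residue' into a regex."""
    return re.compile("".join("." if c == "X" else re.escape(c) for c in motif))


def find_motif(seq, motif):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = motif_to_regex(motif)
    # Pattern.match(string, pos) anchors the match at pos, so trying every
    # start position also finds matches that overlap each other.
    return [i for i in range(len(seq)) if pattern.match(seq, i)]
```

For instance, `find_motif("MSRHGARQPLL", "RHGXRXP")` returns `[2]`, and `find_motif("AHDKHD", "HD")` returns `[1, 4]`.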

Sequence similarity search, Multiple Sequence Alignment, Model Selection, Distance Matrix and Phylogeny Reconstruction

Protocol Exchange ◽

10.1038/protex.2013.065 ◽

2013 ◽

Cited By ~ 12

Author(s):

Felix Bast

Keyword(s):

Model Selection ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Similarity Search ◽

Sequence Similarity ◽

Distance Matrix ◽

Phylogeny Reconstruction ◽

Sequence Similarity Search ◽

Multiple Sequence ◽

Alignment Model

DNA^+Pro^: an Improved Progressive Multiple Sequence Alignment Algorithm for Evolutionary Analysis Using Combined DNA-Protein Sequences

Nature Precedings ◽

10.1038/npre.2010.4898.1 ◽

2010 ◽

Author(s):

Xiaolong Wang ◽

Shuang-yong Xu ◽

Deming Gou

Keyword(s):

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Alignment Algorithm ◽

Evolutionary Analysis ◽

Multiple Sequence ◽

Sequence Alignment Algorithm ◽

Progressive Multiple Sequence Alignment

In Silico Characterization of Pectate Lyase Protein Sequences from Different Source Organisms

Enzyme Research ◽

10.4061/2010/950230 ◽

2010 ◽

Vol 2010 ◽

pp. 1-11 ◽

Cited By ~ 12

Author(s):

Amit Kumar Dubey ◽

Sangeeta Yadav ◽

Manish Kumar ◽

Vinay Kumar Singh ◽

Bijaya Ketan Sarangi ◽

...

Keyword(s):

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Pectate Lyase ◽

Homology Search ◽

Motif Analysis ◽

Degenerate Primers ◽

Multiple Sequence ◽

Pectate Lyases

A total of 121 protein sequences of pectate lyases were subjected to homology search, multiple sequence alignment, phylogenetic tree construction, and motif analysis. The resulting phylogenetic tree revealed distinct clusters corresponding to the source organisms, representing bacterial, fungal, plant, and nematode pectate lyases. Multiple accessions of bacterial, fungal, nematode, and plant pectate lyase protein sequences were placed close together, revealing sequence-level similarity. The multiple sequence alignment of these pectate lyase sequences from different source organisms showed conserved regions at different stretches, with maximum homology at amino acid residues 439–467, 715–816, and 829–910, which could be used to design degenerate primers or probes specific for pectate lyases. The motif analysis revealed a conserved Pec_Lyase_C domain in all pectate lyases irrespective of source, suggesting a possible role in structural and enzymatic functions.
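
Designing a degenerate primer from a conserved peptide stretch starts with back-translating it into DNA using IUPAC ambiguity codes. The abstract does not give the residues in those regions or the authors' primer-design procedure, so the sketch below is a generic illustration with hypothetical helper names. As a stated simplification, the six-codon residues L, R and S are represented only by their four-codon family (CTN, CGN, TCN); `degeneracy` counts every synonymous codon of the standard genetic code, so for those residues it exceeds what the simplified primer covers.

```python
# One degenerate codon (IUPAC nucleotide codes) per amino acid.
# Simplification: for L, R and S only the four-codon family is used.
DEGENERATE_CODON = {
    "A": "GCN", "C": "TGY", "D": "GAY", "E": "GAR", "F": "TTY",
    "G": "GGN", "H": "CAY", "I": "ATH", "K": "AAR", "L": "CTN",
    "M": "ATG", "N": "AAY", "P": "CCN", "Q": "CAR", "R": "CGN",
    "S": "TCN", "T": "ACN", "V": "GTN", "W": "TGG", "Y": "TAY",
}

# Number of codons for each amino acid in the standard genetic code.
CODON_COUNT = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3,
    "K": 2, "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6,
    "T": 4, "V": 4, "W": 1, "Y": 2,
}


def back_translate(peptide):
    """Degenerate DNA sequence (5'->3') encoding a peptide."""
    return "".join(DEGENERATE_CODON[aa] for aa in peptide.upper())


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    total = 1
    for aa in peptide.upper():
        total *= CODON_COUNT[aa]
    return total
```

Stretches rich in M and W (degeneracy 1) and poor in L, R and S (degeneracy 6) are usually preferred, since a lower degeneracy means a more specific primer.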

A NEW GENETIC ALGORITHM FOR MULTIPLE SEQUENCE ALIGNMENT

International Journal of Computational Intelligence and Applications ◽

10.1142/s146902681250023x ◽

2012 ◽

Vol 11 (04) ◽

pp. 1250023 ◽

Cited By ~ 3

Author(s):

ZAHRA NARIMANI ◽

HAMID BEIGY ◽

HASSAN ABOLHASSANI

Keyword(s):

Genetic Algorithm ◽

Genetic Algorithms ◽

Computational Complexity ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Protein Sequences ◽

Approximate Solutions ◽

Heuristic Methods ◽

Multiple Sequence ◽

Mutation Operators

Multiple sequence alignment (MSA) is one of the basic and important problems in molecular biology. MSA can be used for different purposes, including finding conserved motifs and structurally important regions in protein sequences and determining evolutionary distances between sequences. Finding an optimal alignment of several sequences under common scoring schemes such as sum-of-pairs is NP-hard, so no polynomial-time exact algorithm is known, and heuristic methods such as genetic algorithms (GAs) can be used to find approximate solutions. Several GA-based algorithms have been developed for this problem in recent years. Most of them use very complicated, problem-specific and time-consuming mutation operators. In this paper, we propose a new algorithm that uses a new way of population initialization and simple mutation and recombination operators. The strength of the proposed GA lies in its simple mutation operators and a special recombination operator that avoids the problems of similar recombination operators in other GAs. The experimental results show that the proposed algorithm finds good MSAs compared with existing methods, while using simple operators with low computational complexity.
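
The abstract does not describe the paper's initialization or operators beyond calling them simple, so the sketch below shows only the general shape of a GA for MSA, not the proposed algorithm. All choices are illustrative assumptions: sum-of-pairs fitness (match +1, mismatch 0, gap -1, gap-gap 0), random gap insertion for initialization, a mutation that moves one gap, a row-wise crossover, and truncation selection.

```python
import random
from itertools import combinations

GAP = "-"


def sp_fitness(alignment, match=1, mismatch=0, gap=-1):
    """Sum-of-pairs score; a gap aligned with a gap scores 0."""
    score = 0
    for col in zip(*alignment):
        for a, b in combinations(col, 2):
            if a == GAP and b == GAP:
                continue
            if a == GAP or b == GAP:
                score += gap
            else:
                score += match if a == b else mismatch
    return score


def random_alignment(seqs, length, rng):
    """Pad each sequence to `length` by inserting gaps at random positions."""
    rows = []
    for s in seqs:
        row = list(s)
        for _ in range(length - len(s)):
            row.insert(rng.randrange(len(row) + 1), GAP)
        rows.append("".join(row))
    return rows


def mutate(alignment, rng):
    """Move one randomly chosen gap to a new random position in its row."""
    rows = list(alignment)
    i = rng.randrange(len(rows))
    gaps = [k for k, c in enumerate(rows[i]) if c == GAP]
    if gaps:
        row = list(rows[i])
        del row[rng.choice(gaps)]
        row.insert(rng.randrange(len(row) + 1), GAP)
        rows[i] = "".join(row)
    return rows


def crossover(parent_a, parent_b, rng):
    """Take each row from either parent; every row stays a valid gapped copy."""
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def ungap(row):
    return row.replace(GAP, "")


def genetic_msa(seqs, pop_size=30, generations=200, extra_gaps=2, seed=0):
    rng = random.Random(seed)
    length = max(len(s) for s in seqs) + extra_gaps
    population = [random_alignment(seqs, length, rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=sp_fitness, reverse=True)
        survivors = population[: pop_size // 2]  # truncation selection (elitist)
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = survivors + children
    return max(population, key=sp_fitness)
```

Because every row is a gapped copy of its own sequence, the row-wise crossover and the gap-moving mutation can never corrupt the residues, so no repair step is needed. For example, `genetic_msa(["ACGT", "AGT", "ACT"])` returns three gapped rows of equal length (6 here) whose ungapped forms are the input sequences.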
