Mix'n'Match: an improved multiple sequence alignment procedure for distantly related proteins using secondary structure predictions, designed to be independent of the choice of gap penalty and scoring matrix

Multiple sequence alignment is commonly performed by selecting the alignment with the maximum information content score computed from a weight matrix. However, two or more alignments can share the same highest score, which makes the choice of the best alignment ambiguous. This paper addresses the issue by introducing the concept of a joint weight matrix to eliminate the randomness in selecting the best multiple sequence alignment. Alignments with equal scores are iteratively rescored with joint weight matrices of increasing order (nucleotide pairs, triplets, and so on) until a single best alignment is found. This method for resolving ambiguity in multiple sequence alignment can be easily implemented using the improved scoring matrix.
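The iterative rescoring idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the candidate alignments are taken to be ungapped nucleotide blocks of equal length, the background distribution is assumed uniform, and the "joint" score of order k is assumed to be the information content of k-tuples of adjacent columns. The function names `joint_information_content` and `resolve_ties` are hypothetical.

```python
import math
from collections import Counter


def joint_information_content(alignment, k, alphabet_size=4):
    """Information content summed over all windows of k adjacent columns.

    Assumptions (not from the source): ungapped sequences of equal length
    and a uniform background over the alphabet_size ** k possible k-tuples.
    """
    n_seqs = len(alignment)
    length = len(alignment[0])
    background = 1.0 / (alphabet_size ** k)
    total = 0.0
    for start in range(length - k + 1):
        # Count each k-tuple observed in this window across all sequences.
        counts = Counter(seq[start:start + k] for seq in alignment)
        for count in counts.values():
            freq = count / n_seqs
            total += freq * math.log2(freq / background)
    return total


def resolve_ties(candidates, max_order=None):
    """Keep only the top-scoring candidates, raising the order k until
    a single alignment remains or max_order is reached."""
    length = len(candidates[0][0])
    max_order = max_order or length
    remaining = list(candidates)
    for k in range(1, max_order + 1):
        scores = [joint_information_content(a, k) for a in remaining]
        best = max(scores)
        remaining = [a for a, s in zip(remaining, scores)
                     if math.isclose(s, best, abs_tol=1e-9)]
        if len(remaining) == 1:
            break
    return remaining


# Two toy alignments that tie at order 1 (single columns)
# but differ at order 2 (nucleotide pairs).
alignment_a = ["AC", "AC", "GT", "GT"]
alignment_b = ["AC", "AT", "GC", "GT"]
winner = resolve_ties([alignment_b, alignment_a])
```

In this toy case both alignments have the same column-wise score, and the pair-level score separates them because `alignment_a` contains only two distinct pairs. If several candidates still tie at the highest order tried, the function returns all of them rather than picking one at random.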


Predicting Consensus Structures for RNA Alignments via Pseudo-Energy Minimization

Bioinformatics and Biology Insights, 2009, Vol 3, pp. BBI.S2578. DOI: 10.4137/bbi.s2578. Cited by ~8

Author(s): Junilda Spirollari, Jason T.L. Wang, Kaizhong Zhang, Vivian Bellofatto, Yongkyu Park, ...

Keyword(s): Free Energy, Secondary Structure, Sequence Alignment, Multiple Sequence Alignment, Energy Minimization, Secondary Structure Prediction, Sequence Alignments, RNA Sequences, Multiple Sequence, Consensus Secondary Structure

Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, single sequence-based methods have only moderate accuracy, and additional information, such as covariance scores obtained from multiple sequence alignments, is needed to improve RNA secondary structure prediction. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets, including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome, reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict.
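The two output formats mentioned above encode the same base-pairing information. As a rough sketch of how they relate, the following Python converts a dot-bracket string into a partner list and then into rows in the commonly used connectivity table layout (index, base, previous index, next index, pairing partner, index). This is not RSpredict's code; the function names, the header text, and the example sequence are made up for illustration, and pseudoknot brackets are assumed to be absent.

```python
def dot_bracket_to_pairs(structure):
    """Return a list where entry i-1 holds the 1-based partner of
    position i, or 0 if position i is unpaired."""
    stack = []
    partner = [0] * len(structure)
    for i, symbol in enumerate(structure, start=1):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()
            partner[i - 1] = j
            partner[j - 1] = i
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner


def connectivity_table(sequence, structure, title="example"):
    """Render a connectivity table as text (assumed common CT layout)."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    partner = dot_bracket_to_pairs(structure)
    lines = [f"{len(sequence)} {title}"]
    for i, (base, p) in enumerate(zip(sequence, partner), start=1):
        next_index = i + 1 if i < len(sequence) else 0
        lines.append(f"{i} {base} {i - 1} {next_index} {p} {i}")
    return "\n".join(lines)


# Hypothetical hairpin: positions 1-2 pair with positions 6-5.
pairs = dot_bracket_to_pairs("((..))")
table = connectivity_table("GCAAGC", "((..))")
```

The stack-based parse works because nested structures close in the reverse order in which they open, so each closing bracket pairs with the most recently opened one.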
