T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension

Most evolutionary analyses or structure modeling are based upon pre-estimated multiple sequence alignment (MSA) models. From a computational point of view, it is too complex to estimate a correct alignment. Hence, increasing or identifying signal inside sequence alignment has intensified over the last few years. During the presentation, I would like to share two approaches, homology extension and sampling, on this topic. The first part, transmembrane proteins (TMPs) constitute about 20~30% of all protein coding genes. The relative lack of experimental structure has so far made it hard to develop specific alignment methods and the current state of the art (PRALINE™) only manages to recapitulate 50% of the positions in the reference alignments available from the BAliBASE2-ref7. We show how homology extension can be adapted and combined with a consistency based approach in order to significantly improve the multiple sequence alignment of alpha-helical TMPs. TM-Coffee is a special mode of PSI-Coffee able to efficiently align TMPs, while using a reduced reference database for homology extension. Our benchmarking on BAliBASE2-ref7 alpha-helical TMPs shows a significant improvement over the most accurate methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The second part, homology and evolutionary modeling are the most common applications of MSAs. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. References: PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. Nucleic acids research 44, W339–343(2016). TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction. Nucleic acids research 43, W3–6 (2015). TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.Molecular biology and evolution 31, 1625–37 (2014). Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. Bmc Bioinformatics 13, S1 (2012). Website: PSI/TM-Coffee http://tcoffee.crg.cat/tmcoffee, TCS http://tcoffee.crg.cat/tcs

Download Full-text

Influence of alignment uncertainty on homology and phylogenetic modeling

10.7287/peerj.preprints.3376 ◽

2017 ◽

Author(s):

Jia-Ming Chang ◽

Cedric Notredame

Keyword(s):

Nucleic Acids ◽

Phylogenetic Tree ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Web Server ◽

Transmembrane Proteins ◽

Tree Reconstruction ◽

Multiple Sequence ◽

Data Set ◽

Phylogenetic Tree Reconstruction

Most evolutionary analyses or structure modeling are based upon pre-estimated multiple sequence alignment (MSA) models. From a computational point of view, it is too complex to estimate a correct alignment. Hence, increasing or identifying signal inside sequence alignment has intensified over the last few years. During the presentation, I would like to share two approaches, homology extension and sampling, on this topic. The first part, transmembrane proteins (TMPs) constitute about 20~30% of all protein coding genes. The relative lack of experimental structure has so far made it hard to develop specific alignment methods and the current state of the art (PRALINE™) only manages to recapitulate 50% of the positions in the reference alignments available from the BAliBASE2-ref7. We show how homology extension can be adapted and combined with a consistency based approach in order to significantly improve the multiple sequence alignment of alpha-helical TMPs. TM-Coffee is a special mode of PSI-Coffee able to efficiently align TMPs, while using a reduced reference database for homology extension. Our benchmarking on BAliBASE2-ref7 alpha-helical TMPs shows a significant improvement over the most accurate methods such as MSAProbs, Kalign, PROMALS, MAFFT, ProbCons and PRALINE™. The second part, homology and evolutionary modeling are the most common applications of MSAs. In this work, we show how this problem can be partly overcome using the transitive consistency score (TCS), an extended version of the T-Coffee scoring scheme. Using this local evaluation function, we show that one can identify the most reliable portions of an MSA, as judged from BAliBASE and PREFAB structure-based reference alignments. We also show how this measure can be used to improve phylogenetic tree reconstruction using both an established simulated data set and a novel empirical yeast data set. Our approach relies on the T-Coffee framework; it uses libraries of pairwise alignments to evaluate any third party MSA. We compared TCS with Heads-or-Tails, GUIDANCE, Gblocks, and trimAl and found it to lead to significantly better estimates of structural accuracy and more accurate phylogenetic trees. References: PSI/TM-Coffee: a web server for fast and accurate multiple sequence alignments of regular and transmembrane proteins using homology extension on reduced databases. Nucleic acids research 44, W339–343(2016). TCS: a web server for multiple sequence alignment evaluation and phylogenetic reconstruction. Nucleic acids research 43, W3–6 (2015). TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.Molecular biology and evolution 31, 1625–37 (2014). Accurate multiple sequence alignment of transmembrane proteins with PSI-Coffee. Bmc Bioinformatics 13, S1 (2012). Website: PSI/TM-Coffee http://tcoffee.crg.cat/tmcoffee, TCS http://tcoffee.crg.cat/tcs

Download Full-text

Predicting Consensus Structures for RNA Alignments via Pseudo-Energy Minimization

Bioinformatics and Biology Insights ◽

10.4137/bbi.s2578 ◽

2009 ◽

Vol 3 ◽

pp. BBI.S2578 ◽

Cited By ~ 8

Author(s):

Junilda Spirollari ◽

Jason T.L. Wang ◽

Kaizhong Zhang ◽

Vivian Bellofatto ◽

Yongkyu Park ◽

...

Keyword(s):

Free Energy ◽

Secondary Structure ◽

Sequence Alignment ◽

Multiple Sequence Alignment ◽

Energy Minimization ◽

Secondary Structure Prediction ◽

Sequence Alignments ◽

Rna Sequences ◽

Multiple Sequence ◽

Consensus Secondary Structure

Thermodynamic processes with free energy parameters are often used in algorithms that solve the free energy minimization problem to predict secondary structures of single RNA sequences. While results from these algorithms are promising, an observation is that single sequence-based methods have moderate accuracy and more information is needed to improve on RNA secondary structure prediction, such as covariance scores obtained from multiple sequence alignments. We present in this paper a new approach to predicting the consensus secondary structure of a set of aligned RNA sequences via pseudo-energy minimization. Our tool, called RSpredict, takes into account sequence covariation and employs effective heuristics for accuracy improvement. RSpredict accepts, as input data, a multiple sequence alignment in FASTA or ClustalW format and outputs the consensus secondary structure of the input sequences in both the Vienna style Dot Bracket format and the Connectivity Table format. Our method was compared with some widely used tools including KNetFold, Pfold and RNAalifold. A comprehensive test on different datasets including Rfam sequence alignments and a multiple sequence alignment obtained from our study on the Drosophila X chromosome reveals that RSpredict is competitive with the existing tools on the tested datasets. RSpredict is freely available online as a web server and also as a jar file for download at http://datalab.njit.edu/biology/RSpredict .

Download Full-text