scholarly journals PPalign: optimal alignment of Potts models representing proteins with direct coupling information

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Hugo Talibart ◽  
François Coste

Abstract Background To assign structural and functional annotations to the ever increasing amount of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use. Methods We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed with respect to a non-redundant set of reference pairwise sequence alignments from SISYPHUS benchmark which have lowest sequence identity (between $$3\%$$ 3 % and $$20\%$$ 20 % ) and enable to build reliable Potts models for each sequence to be aligned. This experimentation confirms that Potts models can be aligned in reasonable time ($$1'37''$$ 1 ′ 37 ′ ′ in average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and independent-site PPalign. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean $$F_1$$ F 1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases. Conclusions These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experimentation suggests yet that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset to perform unbiased investigations in this direction.

2020 ◽  
Author(s):  
Hugo Talibart ◽  
François Coste

AbstractBackgroundTo assign structural and functional annotations to the ever increasing amount of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models (pHMM), which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use.ResultsWe introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed with respect to a non-redundant set of reference pairwise sequence alignments from SISYPHUS benchmark which have lowest sequence identity (between 3% and 20%) and enable to build reliable Potts models for each sequence to be aligned. This experimentation confirms that Potts models can be aligned in reasonable time (1′37″ in average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and PPalign without couplings. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases.ConclusionsThese results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experimentation suggests yet that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset to perform unbiased investigations in this direction.


Author(s):  
Hugo Talibart ◽  
François Coste

AbstractTo assign structural and functional annotations to the ever increasing amount of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models (pHMMs), which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition. Due to the presence of non-local dependencies, aligning two Potts models is computationally hard. To tackle this task, we introduce an Integer Linear Programming formulation of the problem and present ComPotts, an implementation able to compute the optimal alignment of two Potts models representing proteins in tractable time. A first experimentation on 59 low sequence identity pairwise alignments, extracted from 3 reference alignments from sisyphus and BaliBase3 databases, shows that ComPotts finds better alignments than the other tested methods in the majority of these cases.


2020 ◽  
Vol 16 (11) ◽  
pp. e1008085
Author(s):  
Grey W. Wilburn ◽  
Sean R. Eddy

Most methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.


Author(s):  
Grey W. Wilburn ◽  
Sean R. Eddy

AbstractMost methods for biological sequence homology search and alignment work with primary sequence alone, neglecting higher-order correlations. Recently, statistical physics models called Potts models have been used to infer all-by-all pairwise correlations between sites in deep multiple sequence alignments, and these pairwise couplings have improved 3D structure predictions. Here we extend the use of Potts models from structure prediction to sequence alignment and homology search by developing what we call a hidden Potts model (HPM) that merges a Potts emission process to a generative probability model of insertion and deletion. Because an HPM is incompatible with efficient dynamic programming alignment algorithms, we develop an approximate algorithm based on importance sampling, using simpler probabilistic models as proposal distributions. We test an HPM implementation on RNA structure homology search benchmarks, where we can compare directly to exact alignment methods that capture nested RNA base-pairing correlations (stochastic context-free grammars). HPMs perform promisingly in these proof of principle experiments.Author summaryComputational homology search and alignment tools are used to infer the functions and evolutionary histories of biological sequences. Most widely used tools for sequence homology searches, such as BLAST and HMMER, rely on primary sequence conservation alone. It should be possible to make more powerful search tools by also considering higher-order covariation patterns induced by 3D structure conservation. Recent advances in 3D protein structure prediction have used a class of statistical physics models called Potts models to infer pairwise correlation structure in multiple sequence alignments. However, Potts models assume alignments are given and cannot build new alignments, limiting their use in homology search. We have extended Potts models to include a probability model of insertion and deletion so they can be applied to sequence alignment and remote homology search using a new model we call a hidden Potts model (HPM). Tests of our prototype HPM software show promising results in initial benchmarking experiments, though more work will be needed to use HPMs in practical tools.


2019 ◽  
Author(s):  
Martin Steinegger ◽  
Markus Meier ◽  
Milot Mirdita ◽  
Harald Vöhringer ◽  
Stephan J. Haunsberger ◽  
...  

AbstractBackgroundHH-suite is a widely used open source software suite for sensitive sequence similarity searches and protein fold recognition. It is based on pairwise alignment of profile Hidden Markov models (HMMs), which represent multiple sequence alignments of homologous sequences.ResultsWe developed a single-instruction multiple-data (SIMD) vectorized implementation of the Viterbi algorithm for profile HMM alignment and introduced various other speed-ups. This accelerated HHsearch by a factor 4 and HHblits by a factor 2 over the previous version 2.0.16. HHblits3 is ~10× faster than PSI-BLAST and ~20× faster than HMMER3. Jobs to perform HHsearch and HHblits searches with many query profile HMMs can be parallelized over cores and over servers in a cluster using OpenMP and message passing interface (MPI). The free, open-source, GNU GPL(v3)-licensed software is available at https://github.com/soedinglab/hh-suite.ConclusionThe added functionalities and increased speed of HHsearch and HHblits should facilitate their use in large-scale protein structure and function prediction, e.g. in metagenomics and genomics projects.


2016 ◽  
Vol 710 ◽  
pp. 409-414 ◽  
Author(s):  
Gianfranco De Matteis ◽  
Giuseppe Brando

This paper aims at providing an overview on the current state of the art and on possible future developments concerning the component method implementation for the classification of beam-to-column joints belonging to aluminum moment resisting frames.After a brief discussion on the component method theoretical bases, developed in the past to give a feasible calculation procedure for steel joints, recent experimental and numerical studies, carried out for investigating some aluminum components, are presented and discussed. In particular strengths and weaknesses of the current knowledge are put into evidence, also in light of the peculiarities that make aluminum alloys different from steel. The launch of new research fields, aimed at pursuing an update of the current codes dealing with aluminum structures, is therefore proposed.


2015 ◽  
Vol 2015 ◽  
pp. 1-23 ◽  
Author(s):  
Francesco Cartella ◽  
Jan Lemeire ◽  
Luca Dimiccoli ◽  
Hichem Sahli

Realistic predictive maintenance approaches are essential for condition monitoring and predictive maintenance of industrial machines. In this work, we propose Hidden Semi-Markov Models (HSMMs) with (i) no constraints on the state duration density function and (ii) being applied to continuous or discrete observation. To deal with such a type of HSMM, we also propose modifications to the learning, inference, and prediction algorithms. Finally, automatic model selection has been made possible using the Akaike Information Criterion. This paper describes the theoretical formalization of the model as well as several experiments performed on simulated and real data with the aim of methodology validation. In all performed experiments, the model is able to correctly estimate the current state and to effectively predict the time to a predefined event with a low overall average absolute error. As a consequence, its applicability to real world settings can be beneficial, especially where in real time the Remaining Useful Lifetime (RUL) of the machine is calculated.


2021 ◽  
Vol 2 (02) ◽  
pp. 52-58
Author(s):  
Sharmeen M.Saleem Abdullah Abdullah ◽  
Siddeeq Y. Ameen Ameen ◽  
Mohammed Mohammed sadeeq ◽  
Subhi Zeebaree

New research into human-computer interaction seeks to consider the consumer's emotional status to provide a seamless human-computer interface. This would make it possible for people to survive and be used in widespread fields, including education and medicine. Multiple techniques can be defined through human feelings, including expressions, facial images, physiological signs, and neuroimaging strategies. This paper presents a review of emotional recognition of multimodal signals using deep learning and comparing their applications based on current studies. Multimodal affective computing systems are studied alongside unimodal solutions as they offer higher accuracy of classification. Accuracy varies according to the number of emotions observed, features extracted, classification system and database consistency. Numerous theories on the methodology of emotional detection and recent emotional science address the following topics. This would encourage studies to understand better physiological signals of the current state of the science and its emotional awareness problems.


Sign in / Sign up

Export Citation Format

Share Document