ComPotts: Optimal alignment of coevolutionary models for protein sequences

Abstract: To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models (pHMMs), which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition. Due to the presence of non-local dependencies, aligning two Potts models is computationally hard. To tackle this task, we introduce an Integer Linear Programming formulation of the problem and present ComPotts, an implementation able to compute the optimal alignment of two Potts models representing proteins in tractable time. A first experiment on 59 low-sequence-identity pairwise alignments, extracted from 3 reference alignments from the SISYPHUS and BAliBASE 3 databases, shows that ComPotts finds better alignments than the other tested methods in the majority of these cases.
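
To make the "non-local dependencies" concrete, here is a minimal sketch of how an alignment between two Potts models could be scored. It assumes, purely for illustration, that similarity between fields and between couplings is a plain scalar product and that gaps cost nothing; the function names and the toy data are hypothetical and are not taken from ComPotts.

```python
from itertools import combinations


def dot(a, b):
    """Scalar product of two flat sequences of equal length."""
    return sum(x * y for x, y in zip(a, b))


def alignment_score(fields_a, couplings_a, fields_b, couplings_b, aligned):
    """Score an alignment between two Potts models (illustrative only).

    fields_x[i]         -- field vector of position i (one value per letter)
    couplings_x[(i, j)] -- flattened coupling matrix for positions i < j
    aligned             -- list of (i, k) pairs, strictly increasing in both
                           coordinates (position i of A aligned with k of B)
    """
    # Positional term: each aligned pair contributes independently.
    score = sum(dot(fields_a[i], fields_b[k]) for i, k in aligned)
    # Coupling term: every pair of aligned pairs contributes, so the value of
    # aligning i with k depends on all other choices made in the alignment.
    for (i, k), (j, l) in combinations(aligned, 2):
        score += dot(couplings_a[(i, j)], couplings_b[(k, l)])
    return score


# Toy data over a 2-letter alphabet (assumed values, for illustration).
fields_a = [[1.0, 0.0], [0.0, 1.0]]
couplings_a = {(0, 1): [0.0, 1.0, 0.0, 0.0]}
fields_b = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
couplings_b = {
    (0, 1): [0.0, 0.0, 0.0, 0.0],
    (0, 2): [0.0, 1.0, 0.0, 0.0],
    (1, 2): [0.0, 0.0, 0.0, 0.0],
}

print(alignment_score(fields_a, couplings_a, fields_b, couplings_b, [(0, 0), (1, 2)]))
print(alignment_score(fields_a, couplings_a, fields_b, couplings_b, [(0, 0), (1, 1)]))
```

Because the coupling term ranges over pairs of aligned positions, the score cannot be decomposed column by column, which is why standard dynamic programming does not apply directly and an exact method such as Integer Linear Programming is used instead.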

PPalign: Optimal alignment of Potts models representing proteins with direct coupling information

10.1101/2020.12.01.406504 ◽

2020 ◽

Author(s):

Hugo Talibart ◽

François Coste

Keyword(s):

Markov Models ◽

Pairwise Alignment ◽

Homology Search ◽

Sequence Alignments ◽

Potts Models ◽

Functional Annotations ◽

Current State ◽

Linear Programming Formulation ◽

Computational Bottleneck ◽

New Research

Abstract. Background: To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models (pHMMs), which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use. Results: We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed on a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark that have the lowest sequence identity (between 3% and 20%) and make it possible to build reliable Potts models for each sequence to be aligned. This experiment confirms that Potts models can be aligned in reasonable time (1′37″ on average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and PPalign without couplings. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases. Conclusions: These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experiments nonetheless suggest that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset to perform unbiased investigations in this direction.

PPalign: optimal alignment of Potts models representing proteins with direct coupling information

BMC Bioinformatics ◽

10.1186/s12859-021-04222-4 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Hugo Talibart ◽

François Coste

Keyword(s):

Markov Models ◽

Pairwise Alignment ◽

Homology Search ◽

Sequence Alignments ◽

Potts Models ◽

Functional Annotations ◽

Current State ◽

Linear Programming Formulation ◽

Computational Bottleneck ◽

New Research

Abstract. Background: To assign structural and functional annotations to the ever-increasing number of sequenced proteins, the main approach relies on sequence-based homology search methods, e.g. BLAST or the current state-of-the-art methods based on profile Hidden Markov Models, which rely on significant alignments of query sequences to annotated proteins or protein families. While powerful, these approaches do not take coevolution between residues into account. Taking advantage of recent advances in the field of contact prediction, we propose here to represent proteins by Potts models, which model direct couplings between positions in addition to positional composition, and to compare proteins by aligning these models. Due to non-local dependencies, the problem of aligning Potts models is hard and remains the main computational bottleneck for their use. Methods: We introduce here an Integer Linear Programming formulation of the problem and PPalign, a program based on this formulation, to compute the optimal pairwise alignment of Potts models representing proteins in tractable time. The approach is assessed on a non-redundant set of reference pairwise sequence alignments from the SISYPHUS benchmark that have the lowest sequence identity (between 3% and 20%) and make it possible to build reliable Potts models for each sequence to be aligned. This experiment confirms that Potts models can be aligned in reasonable time (1′37″ on average on these alignments). The contribution of couplings is evaluated in comparison with HHalign and independent-site PPalign. Although Potts models were not fully optimized for alignment purposes and simple gap scores were used, PPalign yields a better mean F1 score and finds significantly better alignments than HHalign and PPalign without couplings in some cases. Conclusions: These results show that pairwise couplings from protein Potts models can be used to improve the alignment of remotely related protein sequences in tractable time. Our experiments nonetheless suggest that new research on the inference of Potts models is now needed to make them more comparable and suitable for homology search. We think that PPalign’s guaranteed optimality will be a powerful asset to perform unbiased investigations in this direction.
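
The abstract reports a mean F1 score without defining it. The sketch below shows one common way such a score can be computed for a pairwise alignment, assuming that precision and recall are taken over aligned residue pairs with respect to the reference alignment. That definition, the function names and the toy sequences are assumptions for illustration, not details confirmed by the abstract.

```python
def aligned_pairs(row_a, row_b, gap="-"):
    """Return the set of (i, j) residue index pairs aligned by two gapped rows."""
    pairs = set()
    i = j = 0  # indices into the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != gap and b != gap:
            pairs.add((i, j))
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


def alignment_f1(predicted, reference):
    """F1 score of predicted aligned pairs against reference aligned pairs."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy example: the two alignments differ in where the gap is placed.
reference = aligned_pairs("AC-GT", "ACTGT")
predicted = aligned_pairs("ACG-T", "ACTGT")
print(alignment_f1(predicted, reference))
```

Here three of the four predicted pairs also appear in the reference, so precision and recall are both 3/4.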

An ILS-based heuristic applied to the car renter salesman problem

RAIRO - Operations Research ◽

10.1051/ro/2020053 ◽

2020 ◽

Author(s):

Sávio Soares Dias ◽

Luidi Gelabert Simonetti ◽

Luiz Satoru Ochi

Keyword(s):

Local Search ◽

Heuristic Method ◽

Minimum Cost ◽

Random Variable ◽

Iterated Local Search ◽

Variable Neighborhood Descent ◽

Current State ◽

Different Types ◽

Linear Programming Formulation ◽

Integer Linear Programming Formulation

The present paper tackles the Car Renter Salesman Problem (CaRS), a variant of the Traveling Salesman Problem. In CaRS, the goal is to travel through a set of cities using rented vehicles at minimum cost, which requires establishing an optimal route and selecting, among rented vehicles of different types, the one used for each trip. Since CaRS is NP-hard, we present a heuristic approach to tackle it. The approach is based on a Multi-Start Iterated Local Search metaheuristic, where the local search step is based on the Random Variable Neighborhood Descent methodology. An Integer Linear Programming formulation based on a quadratic formulation from the literature is also proposed. On Euclidean instances, the proposed heuristic outperforms current state-of-the-art results. The proposed formulation also has stronger bounds and a stronger relaxation than others from the literature.
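
The general shape of an Iterated Local Search can be sketched on the plain Traveling Salesman Problem. This is a generic illustration under simplifying assumptions: it uses a single start, a 2-opt local search in place of Random Variable Neighborhood Descent, a random-swap perturbation, and it ignores vehicle rental costs altogether. It is not the heuristic proposed in the paper, and all names are hypothetical.

```python
import math
import random


def tour_length(tour, pts):
    """Length of the closed tour visiting pts in the given order."""
    return sum(
        math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
        for i in range(len(tour))
    )


def two_opt(tour, pts):
    """Local search: reverse segments for as long as that shortens the tour."""
    best = tour[:]
    best_len = tour_length(best, pts)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(best) - 1):
            for j in range(i + 1, len(best)):
                cand = best[:i] + best[i:j + 1][::-1] + best[j + 1:]
                cand_len = tour_length(cand, pts)
                if cand_len < best_len - 1e-12:
                    best, best_len, improved = cand, cand_len, True
    return best


def perturb(tour, rng):
    """Perturbation: swap two random cities twice (city at index 0 stays put)."""
    new = tour[:]
    for _ in range(2):
        i, j = rng.sample(range(1, len(new)), 2)
        new[i], new[j] = new[j], new[i]
    return new


def iterated_local_search(pts, iterations=50, seed=0):
    """Alternate perturbation and local search, keeping the best tour found."""
    rng = random.Random(seed)
    best = two_opt(list(range(len(pts))), pts)
    best_len = tour_length(best, pts)
    for _ in range(iterations):
        cand = two_opt(perturb(best, rng), pts)
        cand_len = tour_length(cand, pts)
        if cand_len < best_len:
            best, best_len = cand, cand_len
    return best, best_len


# Eight points on the boundary of a 2 x 2 square, listed in scrambled order.
points = [(0, 0), (2, 2), (1, 0), (0, 2), (2, 0), (1, 2), (2, 1), (0, 1)]
tour, length = iterated_local_search(points)
print(tour, length)
```

The perturbation step is what distinguishes this scheme from a plain restart: each new local search begins near the current best tour rather than from scratch.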

Speckle Denoising With NL Filter and Stochastic Distances Under the Haar Wavelet Domain

10.5753/sibgrapi.est.2019.8307 ◽

2019 ◽

Author(s):

Pedro A. A. Penna ◽

Nelson D. A. Mascarenhas

Keyword(s):

State Of The Art ◽

Speckle Noise ◽

Haar Wavelet ◽

Imaging Systems ◽

Wavelet Domain ◽

Sar Images ◽

Local Means ◽

Current State ◽

Gamma Distributions ◽

Non Local

Synthetic aperture radar (SAR) imaging systems use coherent processing, which causes multiplicative speckle noise to appear. This noise gives a granular appearance to the terrestrial surface scene, impairing its interpretation. Current state-of-the-art filters in the remote sensing area rely on the similarity between patches. The goal of this manuscript is to present a method that adapts the non-local means (NLM) algorithm to mitigate this noise. Single-look speckle and the NLM under the Haar wavelet domain are considered in our research with intensity SAR images. To achieve our goal, we used the Exponential-Polynomial (EP) and Gamma distributions to describe the Haar coefficients. Stochastic distances based on these two distributions were also formulated and embedded in the original NLM technique. Finally, we present analyses and comparisons on real scenarios to demonstrate that the proposed method performs competitively with some recent filters from the literature.
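
For readers unfamiliar with the patch-similarity idea behind NLM, the sketch below implements the classical algorithm on a one-dimensional signal. It assumes a Euclidean patch distance and works directly on the samples, so it does not include the stochastic distances or the Haar wavelet domain described in the abstract; the function name and parameter values are illustrative only.

```python
import math


def nlm_1d(signal, patch_radius=1, search_radius=3, h=0.5):
    """Classical non-local means on a 1-D signal (illustrative sketch).

    Each output sample is a weighted average of nearby samples, where the
    weight depends on how similar the patches around the two samples are.
    """
    n = len(signal)

    def patch(center):
        # Clamp indices at the borders (edge replication).
        return [
            signal[min(max(center + offset, 0), n - 1)]
            for offset in range(-patch_radius, patch_radius + 1)
        ]

    output = []
    for i in range(n):
        patch_i = patch(i)
        weighted_sum = 0.0
        weight_total = 0.0
        for j in range(max(0, i - search_radius), min(n, i + search_radius + 1)):
            # Mean squared difference between the two patches.
            d2 = sum((a - b) ** 2 for a, b in zip(patch_i, patch(j))) / len(patch_i)
            weight = math.exp(-d2 / (h * h))
            weighted_sum += weight * signal[j]
            weight_total += weight
        output.append(weighted_sum / weight_total)
    return output


noisy = [1.0, 1.2, 0.9, 1.1, 3.0, 3.1, 2.9, 3.2]
print(nlm_1d(noisy))
```

The method in the abstract replaces the Euclidean patch distance used here with stochastic distances derived from the distributions assumed for the Haar coefficients.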

Detecting Transcriptomic Structural Variants in Heterogeneous Contexts via the Multiple Compatible Arrangements Problem

10.1101/697367 ◽

2019 ◽

Author(s):

Yutong Qiu ◽

Cong Ma ◽

Han Xie ◽

Carl Kingsford

Keyword(s):

Approximation Algorithm ◽

Confounding Factor ◽

State Of The Art ◽

Programming Formulation ◽

Rna Seq ◽

Structural Variants ◽

Linear Programming Formulation ◽

Improved Performance ◽

Sample Heterogeneity ◽

Integer Linear Programming Formulation

Abstract: Transcriptomic structural variants (TSVs), structural variants that affect expressed regions, are common, especially in cancer. Detecting TSVs is a challenging computational problem. Sample heterogeneity (including differences between alleles in diploid organisms) is a critical confounding factor when identifying TSVs. To improve TSV detection in heterogeneous RNA-seq samples, we introduce the Multiple Compatible Arrangement Problem (MCAP), which seeks k genome rearrangements to maximize the number of reads that are concordant with at least one rearrangement. This directly models the situation of a heterogeneous or diploid sample. We prove that MCAP is NP-hard and provide an approximation algorithm for k = 1 and an approximation algorithm for the diploid case (k = 2) assuming an oracle for k = 1. Combining these, we obtain an approximation algorithm for MCAP when k = 2 (without an oracle). We also present an integer linear programming formulation for general k. We completely characterize the graph structures that require k > 1 to satisfy all edges and show such structures are prevalent in cancer samples. We evaluate our algorithms on 381 TCGA samples and 2 cancer cell lines and show improved performance compared to the state-of-the-art TSV-calling tool, SQUID.

Behavioral Genetics: Concepts for Research and Practice in Language Development and Disorders

Journal of Speech Language and Hearing Research ◽

10.1044/jshr.3805.1126 ◽

1995 ◽

Vol 38 (5) ◽

pp. 1126-1142 ◽

Cited By ~ 14

Author(s):

Jeffrey W. Gilger

Keyword(s):

Language Development ◽

Behavioral Genetics ◽

State Of The Art ◽

Genetic Research ◽

Great Promise ◽

Behavioral Genetic ◽

Fine Grained ◽

Future Goals ◽

Current State ◽

Research Designs

This paper is an introduction to behavioral genetics for researchers and practitioners in language development and disorders. The specific aims are to illustrate some essential concepts and to show how behavioral genetic research can be applied to the language sciences. Past genetic research on language-related traits has tended to focus on simple etiology (i.e., the heritability or familiality of language skills). The current state of the art, however, suggests that great promise lies in addressing more complex questions through behavioral genetic paradigms. In terms of future goals it is suggested that: (a) more behavioral genetic work of all types should be done, including replications and expansions of preliminary studies already in print; (b) work should focus on fine-grained, theory-based phenotypes with research designs that can address complex questions in language development; and (c) work in this area should utilize a variety of samples and methods (e.g., twin and family samples, heritability and segregation analyses, linkage and association tests, etc.).

Schizophrenia Research: Current State of the Art

Contemporary Psychology ◽

10.1037/015267 ◽

1976 ◽

Vol 21 (7) ◽

pp. 497-498

Author(s):

STANLEY GRAND

Keyword(s):

State Of The Art ◽

Current State

Umbral Calculus

The Electronic Journal of Combinatorics ◽

10.37236/24 ◽

2002 ◽

Vol 1000 ◽

Author(s):

A. Di Bucchianico ◽

D. Loeb

Keyword(s):

19Th Century ◽

Finite Differences ◽

State Of The Art ◽

Umbral Calculus ◽

Current State ◽

Logical Foundation ◽

Complete Bibliography ◽

The 19Th Century ◽

Mathematical Literature

We survey the mathematical literature on umbral calculus (otherwise known as the calculus of finite differences) from its roots in the 19th century (and earlier) as a set of “magic rules” for lowering and raising indices, through its rebirth in the 1970s as Rota’s school set it on a firm logical foundation using operator methods, to the current state of the art with numerous generalizations and applications. The survey itself is complemented by a fairly complete bibliography (over 500 references) which we expect to update regularly.
