RADI (Reduced Alphabet Direct Information): Improving execution time for direct-coupling analysis

2018
Author(s):
Bernat Anton
Mireia Besalú
Oriol Fornes
Jaume Bonet
Gemma De las Cuevas
...

Abstract
Motivation: Direct-coupling analysis (DCA) of residue coevolution in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. Current DCA algorithms, although efficient, are computationally expensive when determining Direct Information (DI) values for large proteins or domains. In this paper, we present RADI (Reduced Alphabet Direct Information), a variation of the original DCA algorithm that simplifies the computation of DI values by grouping physicochemically equivalent residues.
Results: We compared the 40 top-ranking pairs by DI value with their closest contacting pairs in the 3D structure. We also compared this ranking with the results of a similar but faster approach based on Mutual Information (MI). When the number of symbols used to describe a protein sequence is reduced to 9, RADI achieves results similar to those of the original DCA (i.e. with the classical alphabet of 21 symbols), while reducing the computation time about 30-fold on large proteins (around 1000 residues long), and with higher accuracy than MI-based predictions. Interestingly, grouping amino acids into only two groups (polar and non-polar) still captures the physicochemical nature that characterizes protein structure and retains relevant, useful predictive value, while reducing the computation time by 100- to 2500-fold.
Availability: RADI is available at https://github.com/structuralbioinformatics/… Contact: [email protected]
Supplementary information: Supplementary data are available in the git repository.
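To make the reduced-alphabet idea concrete, here is a minimal Python sketch that recodes alignment sequences into a two-letter polar/non-polar alphabet and scores column pairs with plain mutual information, the faster baseline mentioned above. The residue grouping in `NONPOLAR`/`POLAR`, the single shared symbol for gaps and unknown residues, and the helper names `reduce_sequence`, `mutual_information` and `top_pairs` are illustrative assumptions, not RADI's actual grouping or code. DI itself requires a fitted coupling model and is not computed here.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical two-group partition (non-polar "H" vs polar "P").
# The exact grouping used by RADI is not given in the abstract.
NONPOLAR = set("AVLIMFWCGP")
POLAR = set("STYNQDEKRH")


def reduce_sequence(seq: str) -> str:
    """Map each residue to 'H' (non-polar) or 'P' (polar); anything else becomes '-'."""
    out = []
    for aa in seq.upper():
        if aa in NONPOLAR:
            out.append("H")
        elif aa in POLAR:
            out.append("P")
        else:
            out.append("-")  # gaps and unknown residues share one symbol (assumption)
    return "".join(out)


def mutual_information(msa: list[str], i: int, j: int) -> float:
    """Plain (uncorrected) mutual information, in nats, between columns i and j."""
    n = len(msa)
    fi = Counter(s[i] for s in msa)
    fj = Counter(s[j] for s in msa)
    fij = Counter((s[i], s[j]) for s in msa)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi


def top_pairs(msa: list[str], k: int) -> list[tuple[tuple[int, int], float]]:
    """Return the k column pairs with the highest MI, best first."""
    length = len(msa[0])
    scores = [((i, j), mutual_information(msa, i, j))
              for i, j in combinations(range(length), 2)]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

For example, the toy alignment `["LVK", "IAA", "DEL", "KRS"]` recodes to `["HHP", "HHH", "PPH", "PPP"]`, in which columns 0 and 1 co-vary perfectly and form the top-ranked pair. With fewer symbols, every frequency table has fewer cells to estimate, which is the intuition behind the speed-up.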

2021
Vol 3 (2)
Author(s):
Bernat Anton
Mireia Besalú
Oriol Fornes
Jaume Bonet
Alexis Molina
...

Abstract
Direct-coupling analysis (DCA) of residue coevolution in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues, combined with super-secondary structure motifs, to model protein structures. Interestingly, grouping amino acids into only two groups (polar and non-polar) still represents the physicochemical nature that characterizes protein structure, which is consistent with the role of hydrophobic forces in the funneling of protein folding. Because the alphabet is compressed, fewer sequences are required in the multiple sequence alignment. The number of predicted long-range contacts is limited; therefore, our approach also relies on contacts between neighboring sequence positions. We use the predicted secondary structure and super-secondary structure motifs to predict these local contacts. Combining RADI with raDIMod, a fragment-based protein structure modelling method, we obtain near-native conformations when the super-secondary motifs cover more than 30–50% of the sequence. Interestingly, although different alphabets predict different contacts, they produce similar structures.
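One way to see why a compressed alphabet needs fewer sequences: in the Potts model commonly fitted by DCA, each of the L positions has q field parameters and each pair of positions has a q × q coupling matrix, so the number of parameters to estimate grows with q². The Python sketch below simply counts them. It is a generic back-of-the-envelope argument under that standard model, not the analysis reported by the authors, and it ignores gauge freedom (which lowers the count of independent parameters to L(q−1) + L(L−1)/2 · (q−1)²). The function name is hypothetical.

```python
def potts_parameter_count(length: int, q: int) -> int:
    """Fields (length * q) plus couplings (q * q for each unordered pair of positions)."""
    fields = length * q
    pairs = length * (length - 1) // 2
    couplings = pairs * q * q
    return fields + couplings


if __name__ == "__main__":
    for q in (21, 9, 2):
        print(q, potts_parameter_count(100, q))
```

For a 100-position alignment, the count drops from 2,185,050 with 21 symbols to 401,850 with 9 symbols and 20,000 with 2 symbols.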


2019
Vol 36 (7)
pp. 2264-2265
Author(s):
Mehari B Zerihun
Fabrizio Pucci
Emanuel K Peter
Alexander Schug

Abstract
Motivation: Ongoing advances in sequencing technologies have greatly increased the availability of sequence data. This has made it possible to study patterns of correlated substitutions between residues in families of homologous proteins or RNAs and to retrieve structural and stability information. Direct coupling analysis (DCA) infers coevolutionary couplings between pairs of residues that indicate their spatial proximity, making such information a valuable input for subsequent structure prediction.
Results: Here, we present pydca, a standalone Python-based software package for DCA of protein and RNA homologous families. It is based on two popular inverse statistical approaches, namely mean-field approximation and pseudo-likelihood maximization, and provides functionalities ranging from multiple sequence alignment trimming to contact map visualization. Thanks to its efficient implementation, features and user-friendly command line interface, pydca is a modular and easy-to-use tool for researchers with a wide range of backgrounds.
Availability and implementation: pydca can be obtained from https://github.com/KIT-MBS/pydca or from the Python Package Index under the MIT License.
Supplementary information: Supplementary data are available at Bioinformatics online.
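For readers unfamiliar with the mean-field approach, the following is a minimal, dependency-free Python sketch of mean-field DCA scored with the Frobenius norm of the couplings. It is not pydca's code or API: it omits sequence reweighting and the average product correction (APC), inverts the connected-correlation matrix with a naive Gauss-Jordan routine that is only suitable for toy alignments, and the function names `invert` and `mean_field_fn` are hypothetical. The pseudocount default of 0.5 follows common mean-field DCA practice but should be treated as an assumption.

```python
from itertools import combinations


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (toy sizes only)."""
    n = len(matrix)
    a = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                factor = a[r][col]
                a[r] = [x - factor * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def mean_field_fn(msa, alphabet, pseudocount=0.5):
    """Frobenius-norm coupling score for every column pair (i, j), i < j."""
    L, q, n = len(msa[0]), len(alphabet), len(msa)
    idx = {c: k for k, c in enumerate(alphabet)}
    seqs = [[idx[c] for c in s] for s in msa]
    # Single-site frequencies with a uniform pseudocount.
    fi = [[0.0] * q for _ in range(L)]
    for s in seqs:
        for i, a in enumerate(s):
            fi[i][a] += 1.0 / n
    fi = [[(1 - pseudocount) * f + pseudocount / q for f in row] for row in fi]
    m = q - 1  # drop the last state so the correlation matrix is invertible
    size = L * m
    C = [[0.0] * size for _ in range(size)]
    for i in range(L):
        for j in range(L):
            for a in range(m):
                for b in range(m):
                    if i == j:
                        fij = fi[i][a] if a == b else 0.0
                    else:
                        count = sum(1 for s in seqs if s[i] == a and s[j] == b)
                        fij = (1 - pseudocount) * count / n + pseudocount / (q * q)
                    C[i * m + a][j * m + b] = fij - fi[i][a] * fi[j][b]
    Cinv = invert(C)
    scores = {}
    for i, j in combinations(range(L), 2):
        # Mean-field couplings e_ij(a, b) = -(C^-1)_ij(a, b); the dropped state gets 0.
        J = [[-Cinv[i * m + a][j * m + b] if a < m and b < m else 0.0
              for b in range(q)] for a in range(q)]
        # Shift to the zero-sum gauge before taking the Frobenius norm.
        row = [sum(J[a]) / q for a in range(q)]
        col = [sum(J[a][b] for a in range(q)) / q for b in range(q)]
        tot = sum(row) / q
        scores[(i, j)] = sum((J[a][b] - row[a] - col[b] + tot) ** 2
                             for a in range(q) for b in range(q)) ** 0.5
    return scores
```

For example, on the toy alignment `["HHH", "HHP", "PPH", "PPP"]` over the alphabet `"HP"`, columns 0 and 1 co-vary perfectly and receive the only non-zero score (4/3). Real applications use far larger alignments, sequence reweighting and APC, and an optimized linear-algebra library for the inversion.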


2020
Vol 16 (3)
pp. e1007630
Author(s):
Barbara Bravi
Riccardo Ravasio
Carolina Brito
Matthieu Wyart

2020
Vol 36 (11)
pp. 3372-3378
Author(s):
Alexander Gress
Olga V Kalinina

Abstract
Motivation: In proteins, the solvent accessibility of individual residues contributes to their importance for protein function and stability. Hence, one might wish to calculate solvent accessibility to predict the impact of mutations and their pathogenicity, or for other biomedical applications. A direct computation of solvent accessibility is only possible if all atoms of a protein's three-dimensional structure are reliably resolved.
Results: We present SphereCon, a new precise measure that can estimate residue relative solvent accessibility (RSA) from limited data. The measure is based on calculating the volume of intersection of a sphere, with a cone cut out in the direction opposite to the residue, with the surrounding atoms. We propose a method for estimating the position and volume of residue atoms when they are not known from the structure, or when the structural data are unreliable or missing. We show that for reliable input structures, SphereCon correlates almost perfectly with the directly computed RSA and outperforms other previously suggested indirect methods. Moreover, SphereCon is the only measure that yields accurate results when the identities of amino acids are unknown. A significant novel feature of SphereCon is that it can estimate RSA from inter-residue distance and contact matrices, without any information about the actual atom coordinates.
Availability and implementation: https://github.com/kalininalab/spherecon.
Contact: [email protected]
Supplementary information: Supplementary data are available at Bioinformatics online.
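The geometric ingredient behind such a measure is the volume of a sphere with a cone removed. As background only, and assuming the cone's apex sits at the sphere centre (so the removed part is a spherical sector), that volume has the closed form V = 4πr³/3 − 2πr³(1 − cos θ)/3 for a cone of half-opening angle θ. The Python sketch below checks this formula against a seeded Monte Carlo estimate. It is not SphereCon's algorithm: it does not model surrounding atoms, and the function names are hypothetical.

```python
import math
import random


def sector_volume(r: float, half_angle: float) -> float:
    """Volume of the part of a sphere of radius r inside a cone whose apex is
    the sphere centre and whose half-opening angle is half_angle (radians)."""
    return 2.0 * math.pi * r ** 3 * (1.0 - math.cos(half_angle)) / 3.0


def sphere_minus_cone(r: float, half_angle: float) -> float:
    """Volume of the sphere that remains after cutting out that cone."""
    return 4.0 * math.pi * r ** 3 / 3.0 - sector_volume(r, half_angle)


def sphere_minus_cone_mc(r, half_angle, axis=(0.0, 0.0, 1.0), n=100_000, seed=0):
    """Monte Carlo estimate: sample the bounding cube and keep points that lie
    inside the sphere but outside the cone around `axis`."""
    rng = random.Random(seed)
    norm = math.sqrt(sum(c * c for c in axis))
    unit_axis = [c / norm for c in axis]
    cos_limit = math.cos(half_angle)
    hits = 0
    for _ in range(n):
        p = [rng.uniform(-r, r) for _ in range(3)]
        d = math.sqrt(sum(c * c for c in p))
        if d == 0.0 or d > r:
            continue  # outside the sphere (or exactly at the apex)
        cos_to_axis = sum(a * b for a, b in zip(p, unit_axis)) / d
        if cos_to_axis < cos_limit:  # the point lies outside the cone
            hits += 1
    return hits / n * (2.0 * r) ** 3
```

For a unit sphere and a 60° half-angle, the closed form gives exactly π, and the Monte Carlo estimate agrees to within about one percent.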


2012
Vol 102 (3)
pp. 250a
Author(s):
Faruck Morcos
Andrea Pagnini
Bryan Lunt
Arianna Bertolino
Debora Marks
...

2021
Vol 17 (4)
pp. e1008798
Author(s):
Claudio Bassot
Arne Elofsson

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins the structure is not known, as they are difficult to crystallise. Today, it is often possible to predict a protein's structure using direct coupling analysis and deep learning. However, the unique sequence features of repeat proteins have made it challenging to predict their contacts with direct coupling analysis. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 Pfam families lacking a protein structure, we produce models of 41 families with estimated high accuracy.
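Separating predicted contacts into intra-unit and inter-unit contacts only requires the repeat-unit boundaries. The Python sketch below assumes 0-based, inclusive unit ranges and a minimum sequence separation of 6 to discard trivially local pairs. Both conventions, and the function names, are illustrative assumptions rather than the procedure used in the study.

```python
from typing import Optional


def unit_index(pos: int, units: list[tuple[int, int]]) -> Optional[int]:
    """Return the index of the repeat unit (start, end inclusive) containing pos, or None."""
    for k, (start, end) in enumerate(units):
        if start <= pos <= end:
            return k
    return None


def classify_contacts(contacts, units, min_separation=6):
    """Split (i, j) contacts into intra-unit and inter-unit lists."""
    intra, inter = [], []
    for i, j in contacts:
        if abs(i - j) < min_separation:
            continue  # too close in sequence to be informative
        ui, uj = unit_index(i, units), unit_index(j, units)
        if ui is None or uj is None:
            continue  # at least one residue lies outside the annotated units
        (intra if ui == uj else inter).append((i, j))
    return intra, inter
```

For units `[(0, 19), (20, 39), (40, 59)]`, the contact `(2, 15)` is intra-unit, while `(5, 25)` and `(30, 45)` are inter-unit; `(10, 12)` is dropped as too local and `(1, 70)` because residue 70 lies outside every unit.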

