direct coupling analysis
Recently Published Documents


Total documents: 67 (five years: 32)
H-index: 11 (five years: 3)

2021
Author(s): Andre Birgy, Clement Roussel, Harry Kemble, Jimmy Mullaert, Karine Panigoni, ...

Epistasis affects genome evolution as well as our ability to predict the effects of individual mutations. Its mechanistic basis, however, remains largely unknown. To quantify and better understand interactions between fitness-affecting mutations, we focus on an 11-amino-acid α-helix of the β-lactamase TEM-1 and build a comprehensive library of more than 15,000 double mutants. Analysis of the growth rates of these mutants shows pervasive epistasis, which can be largely explained by a non-linear two-state model in which inactivating, destabilizing, neutral, or stabilizing mutations contribute additively to the phenotype. Hence, most epistatic interactions can be predicted by a non-linear model informed only by single-point mutational measurements. Deviations from the two-state model are consistently found for a few pairs of residues, in particular when they are in contact. This result, as well as the single-point mutation parameters, can be quantitatively recovered with statistical models based on direct coupling analysis and inferred from homologous sequence data. Our results thus shed light on the existence and origins of the multiple determinants of the epistatic landscape, even at the level of small structural components of a protein, and suggest that the corresponding constraints shape the entire β-lactamase family.
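To illustrate how a two-state model generates epistasis without any explicit pairwise term, here is a minimal sketch. It assumes, for illustration only, that each mutation adds a free-energy change `ddG` (in kT units) to a wild-type folding free energy `dG_wt`, that the phenotype is the folded fraction, and that epistasis is measured on a log scale against a multiplicative expectation. These names and values are hypothetical and are not taken from the study.

```python
import math
from typing import List

# Hypothetical two-state (folded/unfolded) sketch: mutations add their ddG
# values (kT units) to the wild-type folding free energy, and the phenotype is
# the folded fraction, a sigmoid of the total. All parameters are assumptions.

def folded_fraction(dG: float) -> float:
    """Folded fraction for folding free energy dG (negative = stable)."""
    return 1.0 / (1.0 + math.exp(dG))

def phenotype(dG_wt: float, ddGs: List[float]) -> float:
    """Phenotype of a genotype whose mutations contribute additively to dG."""
    return folded_fraction(dG_wt + sum(ddGs))

def epistasis(dG_wt: float, ddG_a: float, ddG_b: float) -> float:
    """Log-scale deviation of the double mutant from the product of singles."""
    p_wt = phenotype(dG_wt, [])
    p_a = phenotype(dG_wt, [ddG_a])
    p_b = phenotype(dG_wt, [ddG_b])
    p_ab = phenotype(dG_wt, [ddG_a, ddG_b])
    return math.log(p_ab) - math.log(p_a) - math.log(p_b) + math.log(p_wt)
```

For example, `epistasis(-3.0, 2.0, 2.0)` is negative: each destabilizing mutation is tolerated alone, but together they push the protein past its folding threshold. A neutral partner (`ddG_a = 0`) gives zero epistasis, since the double mutant then equals the other single mutant.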


2021
Author(s): Andonis Gerardos, Nicola Dietler, Anne-Florence Bitbol

Inferring protein-protein interactions from sequences is an important task in computational biology. Recent methods based on Direct Coupling Analysis (DCA) or Mutual Information (MI) make it possible to find interaction partners among paralogs of two protein families. Does successful inference rely mainly on correlations from structural contacts, on correlations from phylogeny, or on both? Do these two types of signal combine constructively or hinder each other? To address these questions, we generate and analyze synthetic data produced with a minimal model that allows us to control the amounts of structural constraints and phylogeny. We show that correlations from these two sources combine constructively to increase the performance of partner inference by DCA or MI. Furthermore, signal from phylogeny can rescue partner inference when signal from contacts becomes less informative, including in the realistic case where inter-protein contacts are restricted to a small subset of sites. We also demonstrate that DCA-inferred couplings between non-contact pairs of sites improve partner inference in the presence of strong phylogeny, but degrade it otherwise. Moreover, restricting to non-contact pairs of sites preserves inference performance in the presence of strong phylogeny. In a natural dataset, as well as in realistic synthetic data based on it, we find that non-contact pairs of sites contribute positively to partner inference performance and that restricting to them preserves performance, evidencing an important role of phylogeny.
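MI-based partner inference scores column pairs in a paired alignment, with one column taken from each protein family. The sketch below computes plain MI (in nats) between two aligned columns. It is a simplification that omits the pseudocounts and sequence reweighting usually applied in practice. Because it measures any statistical dependence, it picks up shared phylogenetic history as readily as contact-driven coevolution.

```python
import math
from collections import Counter

def column_mi(col_a: str, col_b: str) -> float:
    """Mutual information (nats) between two aligned columns.

    col_a[k] and col_b[k] are the residues of the k-th paired sequence.
    No pseudocounts or phylogenetic reweighting are applied (a simplification).
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same paired sequences")
    n = len(col_a)
    p_a = Counter(col_a)
    p_b = Counter(col_b)
    p_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (x, y), count in p_ab.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_a[x] / n) * (p_b[y] / n)))
    return mi
```

Perfectly covarying columns such as `"AAKK"` and `"DDEE"` give MI = ln 2, whereas `"AKAK"` and `"DDEE"` are independent and give 0.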


2021, Vol 17 (4), pp. e1008798
Author(s): Claudio Bassot, Arne Elofsson

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis to predict their contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of 41 families with estimated high accuracy.
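The distinction between intra- and inter-unit contacts can be made concrete with a small helper. This is a hypothetical sketch, not part of the cited methods. It assumes the repeat region consists of `n_units` consecutive units of fixed length `unit_len` starting at residue `start` (0-based). Real repeat units vary in length and are usually annotated from structure or sequence.

```python
from typing import List, Optional, Tuple

Pair = Tuple[int, int]

def unit_index(residue: int, start: int, unit_len: int) -> Optional[int]:
    """0-based repeat-unit index of a residue, or None if it precedes the repeats."""
    if residue < start:
        return None
    return (residue - start) // unit_len

def classify_contacts(
    pairs: List[Pair], start: int, unit_len: int, n_units: int
) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Split predicted contacts into intra-unit, inter-unit and outside-repeat lists.

    Assumes fixed-length, consecutive repeat units (a simplification).
    """
    intra: List[Pair] = []
    inter: List[Pair] = []
    outside: List[Pair] = []
    for i, j in pairs:
        ui = unit_index(i, start, unit_len)
        uj = unit_index(j, start, unit_len)
        if ui is None or uj is None or ui >= n_units or uj >= n_units:
            outside.append((i, j))  # at least one residue lies outside the repeats
        elif ui == uj:
            intra.append((i, j))    # both residues in the same unit
        else:
            inter.append((i, j))    # residues in different units
    return intra, inter, outside
```

With three 10-residue units starting at residue 0, the contact (0, 5) is intra-unit, (3, 12) is inter-unit, and (25, 40) falls partly outside the repeat region.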


2021, Vol 3 (2)
Author(s): Bernat Anton, Mireia Besalú, Oriol Fornes, Jaume Bonet, Alexis Molina, ...

Direct-coupling analysis (DCA) of residue coevolution in proteins has been widely used to predict the three-dimensional structure of a protein from its sequence. We present RADI/raDIMod, a variation of the original DCA algorithm that groups chemically equivalent residues, combined with super-secondary structure motifs, to model protein structures. Interestingly, the simplification produced by grouping amino acids into only two groups (polar and non-polar) still captures the physicochemical nature that characterizes protein structure, in line with the role of hydrophobic forces in the protein-folding funnel. Because the alphabet is compressed, fewer sequences are required in the multiple sequence alignment. However, the number of predicted long-range contacts is limited, so our approach also requires information from neighboring sequence positions. We use the prediction of secondary structure and of super-secondary structure motifs to predict local contacts. Combining RADI with raDIMod, a fragment-based protein structure modelling method, we achieve near-native conformations when super-secondary motifs cover more than 30–50% of the sequence. Notably, although different alphabets yield different predicted contacts, they produce similar structures.
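The following sketch shows what a two-letter alphabet does to an alignment and why it reduces the amount of data needed. The residue assignment below is an assumption made for illustration, and RADI's actual grouping may differ. Gaps and unknown symbols become `-`, which leaves q = 3 states instead of the usual 21 (20 amino acids plus gap). A Potts-model coupling block J_ij between two positions has q × q entries before gauge fixing, so each position pair carries 9 parameters instead of 441.

```python
from typing import List

# Assumed polar / non-polar split (illustrative; not necessarily RADI's grouping).
NONPOLAR = frozenset("AVLIMFWCGP")
POLAR = frozenset("STYNQDEKRH")

def reduce_sequence(seq: str) -> str:
    """Map an aligned sequence to H (non-polar), P (polar) or '-' (gap/other)."""
    out = []
    for aa in seq.upper():
        if aa in NONPOLAR:
            out.append("H")
        elif aa in POLAR:
            out.append("P")
        else:
            out.append("-")
    return "".join(out)

def reduce_alignment(msa: List[str]) -> List[str]:
    """Apply the reduced alphabet to every sequence of an alignment."""
    return [reduce_sequence(s) for s in msa]

def pair_coupling_entries(q: int) -> int:
    """Entries in one q x q Potts coupling block J_ij (before gauge fixing)."""
    return q * q
```

For instance, `reduce_sequence("ACDE-K")` returns `"HHPP-P"`, and `pair_coupling_entries(21)` versus `pair_coupling_entries(3)` gives 441 versus 9.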


2021
Author(s): Yunda Si, Chengfei Yan

Direct coupling analysis (DCA) has been widely used to predict residue-residue contacts to assist protein/RNA structure and interaction prediction. However, selecting residue pairs for contact prediction from DCA results is non-trivial, because the number of highly predictive residue pairs and the coupling scores obtained from DCA depend strongly on the number and length of the homologous sequences forming the multiple sequence alignment, the detailed settings of the DCA algorithm, the functional characteristics of the macromolecule, and other factors. In this study, we present IDR-DCA, a general statistical framework for selecting predictive residue pairs through significant evolutionary coupling detection, based on reproducibility analysis of the coupling scores from replicated DCA. We applied IDR-DCA to select residue pairs for contact prediction for 150 proteins, 30 protein-protein interactions and 36 RNAs, using three widely used DCA software packages. We show that with IDR-DCA, predictive residue pairs can be effectively selected with a universal threshold that is independent of the DCA software.
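The reproducibility idea behind replicated DCA can be sketched with a simple top-k overlap between two replicate runs that score the same list of residue pairs. This is not the IDR procedure itself, which fits a statistical model to the joint ranks. It is a simpler correspondence check that conveys the intuition: genuine couplings rank high in both replicates, while noise does not. The replicate score lists are assumed inputs, for example from DCA run on two halves of an alignment.

```python
from typing import List, Sequence

def top_k(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest scores (ties keep index order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def topk_overlap(scores_a: Sequence[float], scores_b: Sequence[float], k: int) -> float:
    """Fraction of replicate A's top-k pairs that are also in replicate B's top-k."""
    if len(scores_a) != len(scores_b):
        raise ValueError("replicates must score the same residue pairs")
    if k <= 0:
        raise ValueError("k must be positive")
    shared = set(top_k(scores_a, k)) & set(top_k(scores_b, k))
    return len(shared) / k

def consistent_pairs(scores_a: Sequence[float], scores_b: Sequence[float], k: int) -> List[int]:
    """Pair indices ranked within the top k in both replicates."""
    return sorted(set(top_k(scores_a, k)) & set(top_k(scores_b, k)))
```

Tracing `topk_overlap` as k grows shows where the two rankings stop agreeing. At k equal to the number of pairs the overlap is trivially 1, so the informative region lies at small k.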


2021, Vol 33 (1), pp. 13–23
Author(s): Jun Ding, You-sheng Wu, Xin-yun Ni, Qi-bin Wang, Yu-chao Chen, ...

2020, Vol 117 (49), pp. 31519–31526
Author(s): Hong-Li Zeng, Vito Dichio, Edwin Rodríguez Horta, Kaisa Thorell, Erik Aurell

Genome-wide epistasis analysis is a powerful tool to infer gene interactions, which can guide drug and vaccine development and lead to a deeper understanding of microbial pathogenesis. We have considered all complete severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomes deposited in the Global Initiative on Sharing All Influenza Data (GISAID) repository up to four different cutoff dates, and used direct coupling analysis together with an assumption of quasi-linkage equilibrium to infer epistatic contributions to fitness from polymorphic loci. We find eight interactions, of which three are between pairs where one locus lies in gene ORF3a and both loci hold nonsynonymous mutations. We also find an interaction between two loci in gene nsp13, both holding nonsynonymous mutations, and four interactions involving one locus holding a synonymous mutation. Altogether, we infer interactions between loci in viral gene ORF3a and genes nsp2, nsp12, and nsp6, between ORF8 and nsp4, and between loci in genes nsp2, nsp13, and nsp14. The paper opens the prospect of using prominent epistatically linked pairs as a starting point to search for combinatorial weaknesses of recombinant viral pathogens.
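As a rough illustration of DCA on genomic polymorphisms, the sketch below applies naive mean-field inversion to binary loci. It assumes each polymorphic locus is encoded as 1 if a genome carries the minor allele and 0 otherwise, and it adds a small `ridge` to the diagonal so the covariance can be inverted. Both choices are assumptions. Mean-field inversion is a stand-in estimator here, and the study may use a different one, such as pseudolikelihood maximization. Under quasi-linkage equilibrium, the inferred couplings J_ij are then read as roughly proportional to epistatic fitness contributions, with a scale set by recombination.

```python
from typing import List

Matrix = List[List[float]]

def covariance(samples: List[List[int]]) -> Matrix:
    """Connected correlations C_ij = <s_i s_j> - <s_i><s_j> for 0/1 loci."""
    n, L = len(samples), len(samples[0])
    mean = [sum(s[i] for s in samples) / n for i in range(L)]
    return [[sum(s[i] * s[j] for s in samples) / n - mean[i] * mean[j]
             for j in range(L)] for i in range(L)]

def invert(M: Matrix) -> Matrix:
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(M)
    A = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [row[n:] for row in A]

def mean_field_couplings(samples: List[List[int]], ridge: float = 0.01) -> Matrix:
    """Naive mean-field DCA: J_ij = -(C^-1)_ij for i != j (ridge is an assumption)."""
    C = covariance(samples)
    for i in range(len(C)):
        C[i][i] += ridge  # regularize so C is invertible
    C_inv = invert(C)
    L = len(C)
    return [[0.0 if i == j else -C_inv[i][j] for j in range(L)] for i in range(L)]
```

With toy genomes in which loci 0 and 1 always co-occur and locus 2 varies independently, the sketch gives a strong positive J_01 and J_02 ≈ 0.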

