Fine tuned exploration of evolutionary relationships within the protein universe

Abstract In the regime of domain classifications, the protein universe unveils a discrete set of folds connected by hierarchical relationships. Instead, at sub-domain-size resolution and because of physical constraints not necessarily requiring evolution to shape polypeptide chains, networks of protein motifs depict a continuous view that lies beyond the extent of hierarchical classification schemes. A number of studies, however, suggest that universal sub-sequences could be the descendants of peptides emerged in an ancient pre-biotic world. Should this be the case, evolutionary signals retained by structurally conserved motifs, along with hierarchical features of ancient domains, could sew relationships among folds that diverged beyond the point where homology is discernable. In view of the aforementioned, this paper provides a rationale where a network with hierarchical and continuous levels of the protein space, together with sequence profiles that probe the extent of sequence similarity and contacting residues that capture the transition from pre-biotic to domain world, has been used to explore relationships between ancient folds. Statistics of detected signals have been reported. As a result, an example of an emergent sub-network that makes sense from an evolutionary perspective, where conserved signals retrieved from the assessed protein space have been co-opted, has been discussed.

Lessons from equilibrium statistical physics regarding the assembly of protein complexes

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1911028117 ◽

2019 ◽

Vol 117 (1) ◽

pp. 114-120 ◽

Cited By ~ 4

Author(s):

Pablo Sartori ◽

Stanislas Leibler

Keyword(s):

Self Assembly ◽

Statistical Physics ◽

Liquid Mixture ◽

Protein Complexes ◽

Biological Evolution ◽

Cellular Protein ◽

Cellular Functions ◽

Physical Constraints ◽

Polypeptide Chains ◽

Protein Complex Assembly

Cellular functions are established through biological evolution, but are constrained by the laws of physics. For instance, the physics of protein folding limits the lengths of cellular polypeptide chains. Consequently, many cellular functions are carried out not by long, isolated proteins, but rather by multiprotein complexes. Protein complexes themselves do not escape physical constraints, one of the most important being the difficulty of assembling reliably in the presence of cellular noise. In order to lay the foundation for a theory of reliable protein complex assembly, we study here an equilibrium thermodynamic model of self-assembly that exhibits 4 distinct assembly behaviors: diluted protein solution, liquid mixture, “chimeric assembly,” and “multifarious assembly.” In the latter regime, different protein complexes can coexist without forming erroneous chimeric structures. We show that 2 conditions have to be fulfilled to attain this regime: 1) The composition of the complexes needs to be sufficiently heterogeneous, and 2) the use of the set of components by the complexes has to be sparse. Our analysis of publicly available databases of protein complexes indicates that cellular protein systems might have indeed evolved so as to satisfy both of these conditions.

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Nucleic Acids Research ◽

10.1093/nar/gkaa1097 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D452-D457

Author(s):

Lisanna Paladin ◽

Martina Bevilacqua ◽

Sara Errigo ◽

Damiano Piovesan ◽

Ivan Mičetić ◽

...

Keyword(s):

Protein Data Bank ◽

Tandem Repeat ◽

Tandem Repeats ◽

Classification Scheme ◽

Sequence Similarity ◽

Protein Structures ◽

Hierarchical Classification ◽

Structural Similarity ◽

Data Bank ◽

Similarity Class

Abstract The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) requiring sequence similarity and describing repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
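The five-level hierarchy described in this abstract can be illustrated with a small sketch. It only shows how two classification paths might be compared level by level and whether their deepest shared level is one of the structure-based levels or one of the levels that also require sequence similarity. The function names and the dotted identifiers are hypothetical placeholders, not RepeatsDB identifiers or its API.

```python
# Level names as given in the abstract, from most general to most specific.
LEVELS = ["Class", "Topology", "Fold", "Clan", "Family"]
# Per the abstract, the top three levels rest on structural similarity only.
STRUCTURE_LEVELS = {"Class", "Topology", "Fold"}


def deepest_shared_level(path_a, path_b):
    """Return the name of the deepest level at which two paths agree, or None."""
    shared = None
    for level, a, b in zip(LEVELS, path_a, path_b):
        if a != b:
            break
        shared = level
    return shared


def similarity_basis(level):
    """Say which kind of similarity a level relies on (None if nothing is shared)."""
    if level is None:
        return None
    return "structure" if level in STRUCTURE_LEVELS else "sequence"


# Hypothetical classification paths (placeholders, not real database entries).
repeat_a = ("3", "3.1", "3.1.2", "3.1.2.1", "3.1.2.1.4")
repeat_b = ("3", "3.1", "3.1.2", "3.1.2.7", "3.1.2.7.1")
repeat_c = ("3", "3.1", "3.1.2", "3.1.2.1", "3.1.2.1.9")

level_ab = deepest_shared_level(repeat_a, repeat_b)
level_ac = deepest_shared_level(repeat_a, repeat_c)
print(level_ab, similarity_basis(level_ab))
print(level_ac, similarity_basis(level_ac))
```

In this toy data, the first pair shares a Fold but not a Clan, so the relationship is supported by structure alone, while the second pair shares a Clan and is therefore also supported by sequence similarity.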

A functional and structural study of the major metalloprotease secreted by the pathogenic fungus Aspergillus fumigatus

Acta Crystallographica Section D Biological Crystallography ◽

10.1107/s0907444913017642 ◽

2013 ◽

Vol 69 (10) ◽

pp. 1946-1957 ◽

Cited By ~ 11

Author(s):

Daniel Fernández ◽

Silvia Russi ◽

Josep Vendrell ◽

Michel Monod ◽

Irantzu Pallarès

Keyword(s):

Catalytic Domain ◽

Sequence Similarity ◽

Pathogenic Fungi ◽

Disulfide Bridge ◽

Zinc Ion ◽

Binary Complex ◽

Active Site Cleft ◽

Charged Amino Acid Residue ◽

Polypeptide Chains ◽

First Time

Fungalysins are secreted fungal peptidases with the ability to degrade the extracellular matrix proteins elastin and collagen and are thought to act as virulence factors in diseases caused by fungi. Fungalysins constitute a unique family among zinc-dependent peptidases that bears low sequence similarity to known bacterial peptidases of the thermolysin family. The crystal structure of the archetype of the fungalysin family, Aspergillus fumigatus metalloprotease (AfuMep), has been obtained for the first time. The 1.8 Å resolution structure of AfuMep corresponds to that of an autoproteolyzed proenzyme with separate polypeptide chains corresponding to the N-terminal prodomain in a binary complex with the C-terminal zinc-bound catalytic domain. The prodomain consists of a tandem of cystatin-like folds whose C-terminal end is buried into the active-site cleft of the catalytic domain. The catalytic domain harbouring the key catalytic zinc ion and its ligands, two histidines and one glutamic acid, undergoes a conspicuous rearrangement of its N-terminal end during maturation. One key positively charged amino-acid residue and the C-terminal disulfide bridge appear to contribute to its structural–functional properties. Thus, structural, biophysical and biochemical analyses were combined to provide a deeper comprehension of the underlying properties of A. fumigatus fungalysin, serving as a framework for the as yet poorly known metallopeptidases from pathogenic fungi.

Protein space: A natural method for realizing the nature of protein universe

Journal of Theoretical Biology ◽

10.1016/j.jtbi.2012.11.005 ◽

2013 ◽

Vol 318 ◽

pp. 197-204 ◽

Cited By ~ 25

Author(s):

Chenglong Yu ◽

Mo Deng ◽

Shiu-Yuen Cheng ◽

Shek-Chung Yau ◽

Rong L. He ◽

...

Keyword(s):

Protein Universe ◽

Protein Space ◽

Natural Method

Sequence Similarity Networks for the Protein Universe

The FASEB Journal ◽

10.1096/fasebj.29.1_supplement.573.17 ◽

2015 ◽

Vol 29 (S1) ◽

Author(s):

Katie Whalen ◽

Boris Sadkhin ◽

Daniel Davidson ◽

John Gerlt

Keyword(s):

Sequence Similarity ◽

Protein Universe ◽

Similarity Networks ◽

Sequence Similarity Networks

Contrastive learning on protein embeddings enlightens midnight zone at lightning speed

10.1101/2021.11.14.468528 ◽

2021 ◽

Author(s):

Michael Heinzinger ◽

Maria Littmann ◽

Ian Sillitoe ◽

Nicola Bordin ◽

Christine Orengo ◽

...

Keyword(s):

Structure Prediction ◽

Sequence Similarity ◽

3D Structure ◽

Three Dimensional ◽

Hierarchical Classification ◽

Language Models ◽

Sequence Alignments ◽

Sequence Comparisons ◽

Multiple Sequence ◽

3D Structures

Thanks to the recent advances in protein three-dimensional (3D) structure prediction, in particular through AlphaFold 2 and RoseTTAFold, the abundance of protein 3D information will explode over the next year(s). Expert resources based on 3D structures such as SCOP and CATH have been organizing the complex sequence-structure-function relations into a hierarchical classification schema. Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI) transferring annotations from a protein with experimentally known annotation to a query without annotation. Here, we presented a novel approach that expands the concept of HBI from a low-dimensional sequence-distance lookup to the level of a high-dimensional embedding-based annotation transfer (EAT). Secondly, we introduced a novel solution using single protein sequence representations from protein Language Models (pLMs), so-called embeddings (Prose, ESM-1b, ProtBERT, and ProtT5), as input to contrastive learning, by which a new set of embeddings was created that optimized constraints captured by hierarchical classifications of protein 3D structures. These new embeddings (dubbed ProtTucker) clearly improved what was historically referred to as threading or fold recognition. Thereby, the new embeddings enabled the intrusion into the midnight zone of protein comparisons, i.e., the region in which the level of pairwise sequence similarity is akin to random relations and therefore is hard to navigate by HBI methods. Cautious benchmarking showed that ProtTucker reached much further than advanced sequence comparisons without the need to compute alignments, allowing it to be orders of magnitude faster. Code is available at https://github.com/Rostlab/EAT.
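The idea of embedding-based annotation transfer described in this abstract can be sketched as a nearest-neighbour lookup: a query embedding inherits the label of the closest annotated embedding. The sketch below is a minimal illustration under stated assumptions, not the code from the Rostlab/EAT repository. The three-dimensional vectors are toy values rather than pLM embeddings, the labels are merely written in the style of CATH codes, and the choice of Euclidean distance and the optional `max_distance` cut-off are assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def transfer_annotation(query, lookup, max_distance=None):
    """Return (label, distance) of the lookup entry nearest to the query.

    If max_distance is given and the nearest entry is farther away,
    no label is transferred and (None, distance) is returned.
    """
    best_label, best_dist = None, math.inf
    for label, embedding in lookup:
        dist = euclidean(query, embedding)
        if dist < best_dist:
            best_label, best_dist = label, dist
    if max_distance is not None and best_dist > max_distance:
        return None, best_dist
    return best_label, best_dist


# Toy lookup set: (label, embedding) pairs with made-up 3-D vectors.
lookup = [
    ("3.40.50", [0.9, 0.1, 0.0]),
    ("1.10.10", [0.0, 1.0, 0.2]),
    ("2.60.40", [0.1, 0.0, 1.0]),
]
query = [0.8, 0.2, 0.1]

label, dist = transfer_annotation(query, lookup)
print(label, round(dist, 3))
```

A distance cut-off such as the hypothetical `max_distance` gives a simple way to decline a transfer when no annotated protein lies close enough in embedding space. Because the lookup needs only vector distances, no alignment has to be computed for the query.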

Genetic characterization and cloning of mothers against dpp, a gene required for decapentaplegic function in Drosophila melanogaster.

Genetics ◽

10.1093/genetics/139.3.1347 ◽

1995 ◽

Vol 139 (3) ◽

pp. 1347-1358 ◽

Cited By ~ 46

Author(s):

J J Sekelsky ◽

S J Newfeld ◽

L A Raftery ◽

E H Chartoff ◽

W M Gelbart

Keyword(s):

Drosophila Melanogaster ◽

Growth Factor ◽

Transforming Growth Factor Beta ◽

Transforming Growth Factor ◽

Genetic Characterization ◽

Genomic Sequence ◽

Sequence Similarity ◽

Transcription Unit ◽

Loss Of Function ◽

Protein Motifs

Abstract The decapentaplegic (dpp) gene of Drosophila melanogaster encodes a growth factor that belongs to the transforming growth factor-beta (TGF-beta) superfamily and that plays a central role in multiple cell-cell signaling events throughout development. Through genetic screens we are seeking to identify other functions that act upstream, downstream or in concert with dpp to mediate its signaling role. We report here the genetic characterization and cloning of Mothers against dpp (Mad), a gene identified in two such screens. Mad loss-of-function mutations interact with dpp alleles to enhance embryonic dorsal-ventral patterning defects, as well as adult appendage defects, suggesting a role for Mad in mediating some aspect of dpp function. In support of this, homozygous Mad mutant animals exhibit defects in midgut morphogenesis, imaginal disk development and embryonic dorsal-ventral patterning that are very reminiscent of dpp mutant phenotypes. We cloned the Mad region and identified the Mad transcription unit through germline transformation rescue. We sequenced a Mad cDNA and identified three Mad point mutations that alter the coding information. The predicted MAD polypeptide lacks known protein motifs, but has strong sequence similarity to three polypeptides predicted from genomic sequence from the nematode Caenorhabditis elegans. Hence, MAD is a member of a novel, highly conserved protein family.

ProtoNet: hierarchical classification of the protein space

Nucleic Acids Research ◽

10.1093/nar/gkg096 ◽

2003 ◽

Vol 31 (1) ◽

pp. 348-352 ◽

Cited By ~ 48

Author(s):

O. Sasson

Keyword(s):

Hierarchical Classification ◽

Protein Space

Reclassification of SLC22 Transporters: Analysis of OAT, OCT, OCTN, and other Family Members Reveals 8 Functional Subgroups

10.1101/2019.12.23.887299 ◽

2019 ◽

Author(s):

Darcy Engelhart ◽

Jeffry C. Granados ◽

Da Shi ◽

Milton Saier ◽

Michael Baker ◽

...

Keyword(s):

Uric Acid ◽

Sequence Similarity ◽

Organic Anion ◽

Signaling Molecules ◽

In Vitro Assays ◽

Sequence Alignments ◽

Multiple Sequence ◽

In Vivo Models ◽

Protein Motifs

Abstract Among transporters, the SLC22 family is emerging as a central hub of endogenous physiology. The family consists of organic anion transporters (OATs), organic cation transporters (OCTs) and zwitterion transporters (OCTNs). Despite being known as “drug” transporters, these multi-specific, oligo-specific, and relatively mono-specific transporters facilitate the movement of metabolites and key signaling molecules. An in-depth reanalysis supports a reassignment of these proteins into eight functional subgroups with four new subgroups arising from the previously defined OAT subclade. These OAT subgroups are: OATS1 (SLC22A6, SLC22A8, and SLC22A20), OATS2 (SLC22A7), OATS3 (SLC22A11, SLC22A12, and Slc22a22), and OATS4 (SLC22A9, SLC22A10, SLC22A24, and SLC22A25). We propose merging the OCTN (SLC22A4, SLC22A5, and Slc22a21) and OCT-related (SLC22A15 and SLC22A16) subclades into the OCTN/OCTN-related subgroup. Functional support for the eight subgroups comes from network analysis of data from GWAS, in vivo models, and in vitro assays. These data emphasize shared substrate specificity of SLC22 transporters for characteristic metabolites such as prostaglandins, uric acid, carnitine, creatinine, and estrone sulfate. Some important subgroup associations include: OATS1 with metabolites, signaling molecules, uremic toxins and odorants, OATS2 with cyclic nucleotides, OATS3 with uric acid, OATS4 with conjugated sex hormones, particularly etiocholanolone glucuronide, OCT with monoamine neurotransmitters, and OCTN/OCTN-related with ergothioneine and carnitine derivatives. The OAT-like and OAT-related subgroups remain understudied and therefore do not have assigned functionality. Relatedness within subgroups is supported by multiple sequence alignments, evolutionarily conserved protein motifs, genomic localization, and tissue expression. We also highlight low level sequence similarity of SLC22 members with other non-transport proteins. Our data suggest that the SLC22 family can work among itself, as well as with other transporters and enzymes, to optimize levels of numerous metabolites and signaling molecules, as proposed by the Remote Sensing and Signaling Theory.

Exploring Protein Space: From Hydrolase to Ligase by Substitution

Molecular Biology and Evolution ◽

10.1093/molbev/msaa215 ◽

2020 ◽

Author(s):

Nir Hecht ◽

Caroline L Monteil ◽

Guy Perrière ◽

Marina Vishkautzan ◽

Eyal Gur

Keyword(s):

Sequence Similarity ◽

Structural Elements ◽

High Sequence Similarity ◽

Conserved Residues ◽

Catalytic Function ◽

Protein Tag ◽

Catalytically Active ◽

Loop Conformation ◽

Protein Space

Abstract The understanding of how proteins evolve to perform novel functions has long been sought by biologists. In this regard, two homologous bacterial enzymes, PafA and Dop, pose an insightful case study, as both rely on similar mechanistic properties, yet catalyze different reactions. PafA conjugates a small protein tag to target proteins, whereas Dop removes the tag by hydrolysis. Given that both enzymes present a similar fold and high sequence similarity, we sought to identify the differences in the amino acid sequence and folding responsible for each distinct activity. We tackled this question using analysis of sequence–function relationships, and identified a set of uniquely conserved residues in each enzyme. Reciprocal mutagenesis of the hydrolase, Dop, completely abolished the native activity, at the same time yielding a catalytically active ligase. Based on the available Dop and PafA crystal structures, this change of activity required a conformational change of a critical loop at the vicinity of the active site. We identified the conserved positions essential for stabilization of the alternative loop conformation, and tracked alternative mutational pathways that lead to a change in activity. Remarkably, all these pathways were combined in the evolution of PafA and Dop, despite their redundant effect on activity. Overall, we identified the residues and structural elements in PafA and Dop responsible for their activity differences. This analysis delineated, in molecular terms, the changes required for the emergence of a new catalytic function from a preexisting one.
