Local comparison of protein structures highlights cases of convergent evolution in analogous functional sites

Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance due to its long duration (12 months) and does not protect against the accompanying nerve damage, which is a severe leprosy complication. The chronic infectious peripheral neuropathy associated with the disease is primarily due to the bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel or repurposed drugs that can act as short-duration, effective alternatives to the existing treatment regimens, preventing nerve damage and the consequent disability associated with the disease. M. leprae is an obligate pathogen that cannot be cultivated in vitro, which limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge about the structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, calls for concerted novel drug discovery efforts. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable for uncovering druggable targets that are essential for bacterial survival and for its predilection for human neuronal Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts made to model the proteome of M. leprae using a suite of protein modeling software developed in the Blundell laboratory. Precise template selection using sequence-structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable building of homo-oligomers are discussed in the context of interface stability. Other software is described that determines the druggable proteome using information from chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.

AUTOMATED CONSTRUCTION OF STRUCTURAL MOTIFS FOR PREDICTING FUNCTIONAL SITES ON PROTEIN STRUCTURES

Biocomputing 2003 ◽

10.1142/9789812776303_0020 ◽

2002 ◽

Cited By ~ 2

Author(s):

M. P. LIANG ◽

D. L. BRUTLAG ◽

R. B. ALTMAN

Keyword(s):

Protein Structures ◽

Structural Motifs ◽

Functional Sites

RECOGNIZING COMPLEX, ASYMMETRIC FUNCTIONAL SITES IN PROTEIN STRUCTURES USING A BAYESIAN SCORING FUNCTION

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720003000150 ◽

2003 ◽

Vol 01 (01) ◽

pp. 119-138 ◽

Cited By ~ 15

Author(s):

LIPING WEI ◽

RUSS B. ALTMAN

Keyword(s):

Binding Sites ◽

Calcium Binding ◽

Active Sites ◽

Large Scale ◽

Protein Structures ◽

Scoring Function ◽

Chemical Properties ◽

Data Bank ◽

Functional Sites ◽

Calcium Binding Sites

The increase in known three-dimensional protein structures enables us to build statistical profiles of important functional sites in protein molecules. These profiles can then be used to recognize sites in large-scale automated annotations of new protein structures. We report an improved FEATURE system which recognizes functional sites in protein structures. FEATURE defines multi-level physico-chemical properties and recognizes sites based on the spatial distribution of these properties in the sites' microenvironments. It uses a Bayesian scoring function to compare a query region with the statistical profile built from known examples of sites and control nonsites. We have previously shown that FEATURE can accurately recognize calcium-binding sites and have reported interesting results scanning for calcium-binding sites in the entire Protein Data Bank. Here we report the ability of the improved FEATURE to characterize and recognize geometrically complex and asymmetric sites such as ATP-binding sites and disulfide bond-forming sites. FEATURE does not rely on conserved residues or conserved residue geometry of the sites. We also demonstrate that, in the absence of a statistical profile of the sites, FEATURE can use an artificially constructed profile based on a priori knowledge to recognize the sites in new structures, using redoxin active sites as an example.
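
As a rough illustration of the kind of Bayesian scoring described in this abstract, the sketch below scores a query microenvironment by summing per-property log-odds between a site profile and a nonsite profile. It is a minimal naive-Bayes sketch, not FEATURE's actual implementation: the binning of properties, the pseudocount, the prior, the toy training vectors and all function names are assumptions made for this example.

```python
import math
from collections import Counter

def build_profile(examples, n_bins, pseudocount=1.0):
    """Estimate P(bin | class) for every property from training vectors.

    `examples` is a list of equal-length tuples of bin indices, one index
    per property (hypothetical layout, e.g. one per property/shell pair).
    """
    n_props = len(examples[0])
    total = len(examples) + pseudocount * n_bins
    profile = []
    for p in range(n_props):
        counts = Counter(ex[p] for ex in examples)
        profile.append([(counts[b] + pseudocount) / total for b in range(n_bins)])
    return profile

def log_odds_score(query, site_profile, nonsite_profile, prior_site=0.01):
    """Log-odds that `query` is a site, assuming independent properties."""
    score = math.log(prior_site / (1.0 - prior_site))
    for p, b in enumerate(query):
        score += math.log(site_profile[p][b] / nonsite_profile[p][b])
    return score

# Toy training data (made up): three properties, each binned into 3 levels.
sites = [(2, 0, 1), (2, 0, 2), (2, 1, 1)]
nonsites = [(0, 2, 0), (1, 2, 0), (0, 1, 1)]
site_prof = build_profile(sites, n_bins=3)
nonsite_prof = build_profile(nonsites, n_bins=3)

site_like = log_odds_score((2, 0, 1), site_prof, nonsite_prof)
nonsite_like = log_odds_score((0, 2, 0), site_prof, nonsite_prof)
```

A query resembling the site examples receives a higher score than one resembling the nonsites; in practice a score cutoff would have to be calibrated on held-out data.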

UET: a database of evolutionarily-predicted functional determinants of protein sequences that cluster as functional sites in protein structures

Nucleic Acids Research ◽

10.1093/nar/gkv1279 ◽

2015 ◽

Vol 44 (D1) ◽

pp. D308-D312 ◽

Cited By ~ 16

Author(s):

Rhonald C. Lua ◽

Stephen J. Wilson ◽

Daniel M. Konecki ◽

Angela D. Wilkins ◽

Eric Venner ◽

...

Keyword(s):

Protein Structures ◽

Protein Sequences ◽

Functional Sites

Matching of PDB chain sequences to information in public databases as a prerequisite for 3D functional site visualization

Journal of Integrative Bioinformatics ◽

10.1515/jib-2004-7 ◽

2004 ◽

Vol 1 (1) ◽

pp. 80-89

Author(s):

Guido Dieterich ◽

Dirk W. Heinz ◽

Joachim Reichelt

Keyword(s):

Genetic Disorders ◽

Protein Structures ◽

3D Structure ◽

Data Bank ◽

Mendelian Inheritance ◽

Biological Information ◽

Structure Comparison ◽

3D Structures ◽

Functional Sites ◽

And Function

The 3D structures of biomacromolecules stored in the Protein Data Bank [1] were correlated with various external biological information from public databases. We have matched the feature table of SWISS-PROT [2] entries as well as InterPro [3] domains and functional sites with the corresponding 3D structures. OMIM [4] (Online Mendelian Inheritance in Man) records, containing information on genetic disorders, were extracted and linked to the structures. The exhaustive all-against-all 3D structure comparison of protein structures stored in DALI [5] was condensed into single files for each PDB entry. Results are stored in XML format, facilitating their incorporation into related software. The resulting annotation of the protein structures allows functional sites to be identified upon visualization.
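
To make the idea of per-entry XML annotation concrete, the following sketch serializes site annotations for one PDB entry with Python's standard `xml.etree.ElementTree` module. The element names, attribute names and example values are hypothetical placeholders; the abstract does not describe the authors' schema.

```python
import xml.etree.ElementTree as ET

def annotation_to_xml(pdb_id, chain_features):
    """Serialize per-chain site annotations for one PDB entry.

    `chain_features` maps a chain ID to a list of
    (start, end, source, description) tuples. The schema is hypothetical.
    """
    root = ET.Element("entry", {"pdb_id": pdb_id})
    for chain_id in sorted(chain_features):
        chain_el = ET.SubElement(root, "chain", {"id": chain_id})
        for start, end, source, description in chain_features[chain_id]:
            site = ET.SubElement(
                chain_el,
                "site",
                {"start": str(start), "end": str(end), "source": source},
            )
            site.text = description
    return ET.tostring(root, encoding="unicode")

# Placeholder data, not taken from any real database record.
example = {"A": [(10, 12, "example-db", "placeholder active site")]}
xml_text = annotation_to_xml("XXXX", example)
parsed = ET.fromstring(xml_text)
```

Keeping one file per entry, as the abstract describes, means a viewer only has to parse the annotations for the structure it is currently displaying.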

A community proposal to integrate structural bioinformatics activities in ELIXIR (3D-Bioinfo Community)

F1000Research ◽

10.12688/f1000research.20559.1 ◽

2020 ◽

Vol 9 ◽

pp. 278

Author(s):

Christine Orengo ◽

Sameer Velankar ◽

Shoshana Wodak ◽

Vincent Zoete ◽

Alexandre M.J.J. Bonvin ◽

...

Keyword(s):

Structural Biology ◽

Protein Design ◽

Sequence Data ◽

Protein Structures ◽

Structural Bioinformatics ◽

Data Sets ◽

Educational Training ◽

Scientific Methods ◽

Functional Sites ◽

Common Interests

Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen the ties with the structural biology research communities in Europe covering life sciences, as well as chemistry and physics and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much needed innovative applications e.g. for human health, drug and protein design. Our combined efforts will be of critical importance to keep the European research efforts competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help in addressing these challenges and in coordinating structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards and seeking community agreement on benchmark data sets and strategies. We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.

Deep learning methods for designing proteins scaffolding functional sites

10.1101/2021.11.10.468128 ◽

2021 ◽

Author(s):

Jue Wang ◽

Sidney Lisanza ◽

David Juergens ◽

Doug Tischer ◽

Ivan Anishchenko ◽

...

Keyword(s):

Structure Prediction ◽

Neutralizing Antibodies ◽

De Novo ◽

Specific Interaction ◽

Protein Structures ◽

Functional Site ◽

Viral Inhibition ◽

Structure Information ◽

Functional Sites ◽

Interaction Terms

Current approaches to de novo design of proteins harboring a desired binding or catalytic motif require pre-specification of an overall fold or secondary structure composition, and hence considerable trial and error can be required to identify protein structures capable of scaffolding an arbitrary functional site. Here we describe two complementary approaches to the general functional site design problem that employ the RoseTTAFold and AlphaFold neural networks, which map input sequences to predicted structures. In the first "constrained hallucination" approach, we carry out gradient descent in sequence space to optimize a loss function which simultaneously rewards recapitulation of the desired functional site and the ideality of the surrounding scaffold, supplemented with problem-specific interaction terms, to design candidate immunogens presenting epitopes recognized by neutralizing antibodies, receptor traps for escape-resistant viral inhibition, metalloproteins and enzymes, and target binding proteins with designed interfaces expanding around known binding motifs. In the second "missing information recovery" approach, we start from the desired functional site and jointly fill in the missing sequence and structure information needed to complete the protein in a single forward pass through an updated RoseTTAFold trained to recover sequence from structure in addition to structure from sequence. We show that the two approaches have considerable synergy, and AlphaFold2 structure prediction calculations suggest that the approaches can accurately generate proteins containing a very wide array of functional sites.
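
The phrase "gradient descent in sequence space" can be illustrated with a deliberately tiny toy. The sketch below represents a sequence as per-position logits over the 20 amino acids and descends on a loss that only rewards placing chosen residues at chosen positions. This is an assumption-heavy stand-in: the real method computes its loss through a structure-prediction network and includes scaffold and interaction terms, none of which appear here, and the motif, sequence length, learning rate and function names are all hypothetical.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def softmax(row):
    """Numerically stable softmax over one position's logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def motif_loss(logits, motif):
    """Negative log-probability of the motif residues at their positions."""
    return -sum(
        math.log(softmax(logits[pos])[ALPHABET.index(res)])
        for pos, res in motif.items()
    )

def descend(logits, motif, lr=0.5, steps=200):
    """Plain gradient descent; for this loss d(loss)/d(logit) = p - onehot."""
    for _ in range(steps):
        for pos, res in motif.items():
            probs = softmax(logits[pos])
            target = ALPHABET.index(res)
            for k in range(len(ALPHABET)):
                grad = probs[k] - (1.0 if k == target else 0.0)
                logits[pos][k] -= lr * grad
    return logits

length = 8
motif = {2: "H", 5: "C"}  # hypothetical functional-site residues
logits = [[0.0] * len(ALPHABET) for _ in range(length)]
start_loss = motif_loss(logits, motif)
descend(logits, motif)
end_loss = motif_loss(logits, motif)

# Read out the most probable residue at each position.
designed = "".join(
    ALPHABET[max(range(len(ALPHABET)), key=row.__getitem__)] for row in logits
)
```

Positions outside the motif receive no gradient in this toy and stay at their initial values; in the approach described above, those positions are shaped by the scaffold-ideality term evaluated through the network.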

CATH: increased structural coverage of functional space

Nucleic Acids Research ◽

10.1093/nar/gkaa1079 ◽

2020 ◽

Vol 49 (D1) ◽

pp. D266-D273

Author(s):

Ian Sillitoe ◽

Nicola Bordin ◽

Natalie Dawson ◽

Vaishali P Waman ◽

Paul Ashford ◽

...

Keyword(s):

Sequence Data ◽

Protein Structures ◽

Functional Space ◽

Web Pages ◽

Functional Annotations ◽

Functional Sites ◽

Domain Structures ◽

Functional Families ◽

Structural Coverage ◽

Coherent Sequence

CATH (https://www.cathdb.info) identifies domains in protein structures from wwPDB and classifies these into evolutionary superfamilies, thereby providing structural and functional annotations. There are two levels: CATH-B, a daily snapshot of the latest domain structures and superfamily assignments, and CATH+, with additional derived data, such as predicted sequence domains, and functionally coherent sequence subsets (Functional Families or FunFams). The latest CATH+ release, version 4.3, significantly increases coverage of structural and sequence data, with an addition of 65,351 fully-classified domain structures (+15%), providing 500,238 structural domains, and 151 million predicted sequence domains (+59%) assigned to 5,481 superfamilies. The FunFam generation pipeline has been re-engineered to cope with the increased influx of data. Three times more sequences are captured in FunFams, with a concomitant increase in functional purity, information content and structural coverage. FunFam expansion increases the structural annotations provided for experimental GO terms (+59%). We also present CATH-FunVar web-pages displaying variations in protein sequences and their proximity to known or predicted functional sites. We present two case studies: (1) putative cancer drivers and (2) SARS-CoV-2 proteins. Finally, we have improved links to and from CATH including SCOP, InterPro, Aquaria and 2DProt.
