Characterizing disease-associated human proteins without available protein structures or homologues

Mutations in human proteins can lead to disease. The structures of these proteins can help us understand the mechanisms of such diseases and develop therapeutics against them. With improved deep learning techniques such as RoseTTAFold and AlphaFold, we can predict the structure of these proteins even in the absence of structural homologues. We modeled and extracted the domains from 553 disease-associated human proteins. We observed that model quality was higher, and the RMSD between AlphaFold and RoseTTAFold models lower, for domains that could be assigned to CATH families than for domains that could be assigned only to Pfam families of unknown structure or to neither. We predicted ligand-binding sites, protein-protein interfaces, conserved residues and the destabilising effects of residue mutations in these predicted structures. We then explored whether the disease-associated mutations were in the proximity of these predicted functional sites or whether they destabilized the protein structure, based on ddG calculations. We could explain 80% of these disease-associated mutations based on proximity to functional sites or structural destabilization. Using models from the two state-of-the-art techniques provides greater confidence in our predictions, and we explain 93 additional mutations based on RoseTTAFold models that could not be explained based solely on AlphaFold models.
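
The decision rule described in the abstract (a mutation counts as explained if it lies near a predicted functional site or is predicted to destabilise the structure) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the distance or ddG cutoffs, so `PROXIMITY_CUTOFF_A` and `DDG_CUTOFF` are hypothetical values, the sign convention (positive ddG means destabilising) is assumed, and the `Mutation` class and function names are invented for this example.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]

# Hypothetical thresholds: the abstract does not state the cutoffs used.
PROXIMITY_CUTOFF_A = 8.0  # distance in angstroms treated as "near" a functional site
DDG_CUTOFF = 1.0          # kcal/mol; assumes positive ddG means destabilising


@dataclass
class Mutation:
    name: str
    coord: Coord  # representative atom of the mutated residue, e.g. C-alpha
    ddg: float    # predicted change in folding free energy


def explain(mutation: Mutation, site_coords: Sequence[Coord]) -> List[str]:
    """Return the reasons (possibly none) that could explain a mutation."""
    reasons = []
    # Reason 1: the residue lies within the cutoff of any functional-site residue.
    if any(dist(mutation.coord, s) <= PROXIMITY_CUTOFF_A for s in site_coords):
        reasons.append("near_functional_site")
    # Reason 2: the mutation is predicted to destabilise the structure.
    if mutation.ddg >= DDG_CUTOFF:
        reasons.append("destabilising")
    return reasons


def fraction_explained(mutations: Sequence[Mutation],
                       site_coords: Sequence[Coord]) -> float:
    """Fraction of mutations with at least one explanation."""
    if not mutations:
        return 0.0
    explained = sum(1 for m in mutations if explain(m, site_coords))
    return explained / len(mutations)
```

Running the same rule separately on AlphaFold and RoseTTAFold models and taking the union of explained mutations would mirror the comparison between the two model sets reported in the abstract.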


Faculty Opinions recommendation of An accurate, sensitive, and scalable method to identify functional sites in protein structures.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1011808.182587 ◽

2003 ◽

Author(s):

Antonio Rosato

Keyword(s):

Protein Structures ◽

Functional Sites


CATH functional families predict functional sites in proteins

Bioinformatics ◽

10.1093/bioinformatics/btaa937 ◽

2020 ◽

Author(s):

Sayoni Das ◽

Harry M Scholes ◽

Neeladri Sen ◽

Christine Orengo

Keyword(s):

Functional Characterization ◽

Functional Site ◽

Training Data ◽

Supplementary Information ◽

Conserved Residues ◽

Functional Sites ◽

Protein Protein Interaction ◽

Evolutionary Features ◽

Functional Families

Abstract. Motivation: Identification of functional sites in proteins is essential for functional characterization, variant interpretation and drug design. Several methods are available for predicting either a generic functional site, or specific types of functional site. Here, we present FunSite, a machine learning predictor that identifies catalytic, ligand-binding and protein–protein interaction functional sites using features derived from protein sequence and structure, and evolutionary data from CATH functional families (FunFams). Results: FunSite’s prediction performance was rigorously benchmarked using cross-validation and a holdout dataset. FunSite outperformed other publicly available functional site prediction methods. We show that conserved residues in FunFams are enriched in functional sites. We found FunSite’s performance depends greatly on the quality of functional site annotations and the information content of FunFams in the training data. Finally, we analyze which structural and evolutionary features are most predictive for functional sites. Availability and implementation: https://github.com/UCL/cath-funsite-predictor. Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.


FTSite: high accuracy detection of ligand binding sites on unbound protein structures

Bioinformatics ◽

10.1093/bioinformatics/btr651 ◽

2011 ◽

Vol 28 (2) ◽

pp. 286-287 ◽

Cited By ~ 100

Author(s):

Chi-Ho Ngan ◽

David R. Hall ◽

Brandon Zerbe ◽

Laurie E. Grove ◽

Dima Kozakov ◽

...

Keyword(s):

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

High Accuracy ◽

Ligand Binding Sites


Improved chemistry restraints for crystallographic refinement by integrating the Amber force field into Phenix

Acta Crystallographica Section D Structural Biology ◽

10.1107/s2059798319015134 ◽

2020 ◽

Vol 76 (1) ◽

pp. 51-62 ◽

Cited By ~ 4

Author(s):

Nigel W. Moriarty ◽

Pawel A. Janowski ◽

Jason M. Swails ◽

Hai Nguyen ◽

Jane S. Richardson ◽

...

Keyword(s):

Force Field ◽

Active Sites ◽

Protein Structures ◽

Target Function ◽

Real Space ◽

Model Quality ◽

Nonbonded Interactions ◽

Amber Force Field ◽

Quantum Mechanical Representation ◽

Improved Model

The refinement of biomolecular crystallographic models relies on geometric restraints to help to address the paucity of experimental data typical in these experiments. Limitations in these restraints can degrade the quality of the resulting atomic models. Here, an integration of the full all-atom Amber molecular-dynamics force field into Phenix crystallographic refinement is presented, which enables more complete modeling of biomolecular chemistry. The advantages of the force field include a carefully derived set of torsion-angle potentials, an extensive and flexible set of atom types, Lennard–Jones treatment of nonbonded interactions and a full treatment of crystalline electrostatics. The new combined method was tested against conventional geometry restraints for over 22 000 protein structures. Structures refined with the new method show substantially improved model quality. On average, Ramachandran and rotamer scores are somewhat better, clashscores and MolProbity scores are significantly improved, and the modeling of electrostatics leads to structures that exhibit more, and more correct, hydrogen bonds than those refined using traditional geometry restraints. In general it is found that model improvements are greatest at lower resolutions, prompting plans to add the Amber target function to real-space refinement for use in electron cryo-microscopy. This work opens the door to the future development of more advanced applications such as Amber-based ensemble refinement, quantum-mechanical representation of active sites and improved geometric restraints for simulated annealing.
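
For readers unfamiliar with the Lennard–Jones treatment of nonbonded interactions mentioned above, the standard 12-6 form is E(r) = 4ε[(σ/r)^12 − (σ/r)^6]. The snippet below is a minimal sketch of that textbook formula only; it is not the Amber or Phenix implementation, and the function name and parameter values are illustrative assumptions.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """Textbook 12-6 Lennard-Jones energy for two atoms separated by r.

    epsilon is the well depth and sigma the separation at which the
    energy crosses zero. Illustrative only, not Amber's code.
    """
    if r <= 0:
        raise ValueError("separation r must be positive")
    sr6 = (sigma / r) ** 6
    # Repulsive r^-12 term minus attractive r^-6 term.
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The energy is zero at r = σ and reaches its minimum of −ε at r = 2^(1/6)·σ, which is why very short contacts (clashes) are strongly penalised while near-optimal contacts are mildly favoured.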


Structure-Guided Computational Approaches to Unravel Druggable Proteomic Landscape of Mycobacterium leprae

Frontiers in Molecular Biosciences ◽

10.3389/fmolb.2021.663301 ◽

2021 ◽

Vol 8 ◽

Author(s):

Sundeep Chaitanya Vedithi ◽

Sony Malhotra ◽

Marta Acebrón-García-de-Eulate ◽

Modestas Matusevicius ◽

Pedro Henrique Monteiro Torres ◽

...

Keyword(s):

Drug Discovery ◽

Schwann Cells ◽

Protein Structures ◽

Mycobacterium Leprae ◽

Data Bank ◽

Nerve Damage ◽

Structural Proteomics ◽

Bacterial Survival ◽

Functional Sites

Leprosy, caused by Mycobacterium leprae (M. leprae), is treated with a multidrug regimen comprising Dapsone, Rifampicin, and Clofazimine. These drugs exhibit bacteriostatic, bactericidal and anti-inflammatory properties, respectively, and control the dissemination of infection in the host. However, the current treatment is not cost-effective, does not favor patient compliance due to its long duration (12 months) and does not protect against the incumbent nerve damage, which is a severe leprosy complication. The chronic infectious peripheral neuropathy associated with the disease is primarily due to the bacterial components infiltrating the Schwann cells that protect neuronal axons, thereby inducing a demyelinating phenotype. There is a need to discover novel or repurposed drugs that can act as short-duration, effective alternatives to the existing treatment regimens, preventing nerve damage and consequent disability associated with the disease. Mycobacterium leprae is an obligate pathogen, which makes the bacillus experimentally intractable to cultivate in vitro and limits drug discovery efforts to repositioning screens in mouse footpad models. The dearth of knowledge related to structural proteomics of M. leprae, coupled with emerging antimicrobial resistance to all three drugs in the multidrug therapy, poses a need for concerted novel drug discovery efforts. A comprehensive understanding of the proteomic landscape of M. leprae is indispensable to unravel druggable targets that are essential for bacterial survival and predilection of human neuronal Schwann cells. Of the 1,614 protein-coding genes in the genome of M. leprae, only 17 protein structures are available in the Protein Data Bank. In this review, we discuss efforts made to model the proteome of M. leprae using a suite of software for protein modeling that has been developed in the Blundell laboratory. Precise template selection by employing sequence-structure homology recognition software, multi-template modeling of the monomeric models and accurate quality assessment are the hallmarks of the modeling process. Tools that map interfaces and enable building of homo-oligomers are discussed in the context of interface stability. Other software is described to determine the druggable proteome by using information related to the chokepoint analysis of the metabolic pathways, gene essentiality, homology to human proteins, functional sites, druggable pockets and fragment hotspot maps.


Computational design and experimental verification of a symmetric protein homodimer

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1505072112 ◽

2015 ◽

Vol 112 (34) ◽

pp. 10714-10719 ◽

Cited By ~ 25

Author(s):

Yun Mou ◽

Po-Ssu Huang ◽

Fang-Ciao Hsu ◽

Shing-Jong Huang ◽

Stephen L. Mayo

Keyword(s):

Protein Interactions ◽

De Novo ◽

Computational Design ◽

Atomic Level ◽

Protein Interfaces ◽

Unknown Structure ◽

α Helix ◽

Distinct Features ◽

Novel Protein

Homodimers are the most common type of protein assembly in nature and have distinct features compared with heterodimers and higher order oligomers. Understanding homodimer interactions at the atomic level is critical both for elucidating their biological mechanisms of action and for accurate modeling of complexes of unknown structure. Computation-based design of novel protein–protein interfaces can serve as a bottom-up method to further our understanding of protein interactions. Previous studies have demonstrated that the de novo design of homodimers can be achieved to atomic-level accuracy by β-strand assembly or through metal-mediated interactions. Here, we report the design and experimental characterization of an α-helix–mediated homodimer with C2 symmetry based on a monomeric Drosophila engrailed homeodomain scaffold. A solution NMR structure shows that the homodimer exhibits parallel helical packing similar to the design model. Because the mutations leading to dimer formation resulted in poor thermostability of the system, design success was facilitated by the introduction of independent thermostabilizing mutations into the scaffold. This two-step design approach, function and stabilization, is likely to be generally applicable, especially if the desired scaffold is of low thermostability.


QAcon: single model quality assessment using protein structural and contact information with machine learning techniques

Bioinformatics ◽

10.1093/bioinformatics/btw694 ◽

2016 ◽

pp. btw694 ◽

Cited By ~ 12

Author(s):

Renzhi Cao ◽

Badri Adhikari ◽

Debswapna Bhattacharya ◽

Miao Sun ◽

Jie Hou ◽

...

Keyword(s):

Machine Learning ◽

Quality Assessment ◽

Machine Learning Techniques ◽

Single Model ◽

Model Quality ◽

Model Quality Assessment ◽

Contact Information ◽

Learning Techniques


BionoiNet: ligand-binding site classification with off-the-shelf deep neural network

Bioinformatics ◽

10.1093/bioinformatics/btaa094 ◽

2020 ◽

Vol 36 (10) ◽

pp. 3077-3083

Author(s):

Wentao Shi ◽

Jeffrey M Lemoine ◽

Abd-El-Monsif A Shawky ◽

Manali Singha ◽

Limeng Pu ◽

...

Keyword(s):

Neural Network ◽

Deep Learning ◽

Ligand Binding ◽

Binding Sites ◽

Protein Structures ◽

Supplementary Information ◽

Heme Binding ◽

Unseen Data ◽

Ligand Binding Sites ◽

Binding Pockets

Abstract. Motivation: Fast and accurate classification of ligand-binding sites in proteins with respect to the class of binding molecules is invaluable not only to the automatic functional annotation of large datasets of protein structures but also to projects in protein evolution, protein engineering and drug development. Deep learning techniques, which have already been successfully applied to address challenging problems across various fields, are inherently suitable to classify ligand-binding pockets. Our goal is to demonstrate that off-the-shelf deep learning models can be employed with minimum development effort to recognize nucleotide- and heme-binding sites with accuracy comparable to that of highly specialized, voxel-based methods. Results: We developed BionoiNet, a new deep learning-based framework implementing a popular ResNet model for image classification. BionoiNet first transforms the molecular structures of ligand-binding sites to 2D Voronoi diagrams, which are then used as the input to a pretrained convolutional neural network classifier. The ResNet model generalizes well to unseen data, achieving an accuracy of 85.6% for nucleotide-binding and 91.3% for heme-binding pockets. BionoiNet also computes significance scores of pocket atoms, called BionoiScores, to provide meaningful insights into their interactions with ligand molecules. BionoiNet is a lightweight alternative to computationally expensive 3D architectures. Availability and implementation: BionoiNet is implemented in Python with the source code freely available at: https://github.com/CSBG-LSU/BionoiNet. Supplementary information: Supplementary data are available at Bioinformatics online.
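
To make the idea of a 2D Voronoi diagram as image input concrete, the sketch below rasterises a diagram by brute force: each pixel is labelled with the index of its nearest site. It is not taken from the BionoiNet source code. The function name is hypothetical, the sites stand in for pocket atoms already projected onto a plane (the projection step is assumed, not shown), and a real pipeline would additionally map each cell to a colour that encodes an atom property before passing the image to a network.

```python
from typing import List, Sequence, Tuple


def rasterize_voronoi(sites: Sequence[Tuple[float, float]],
                      width: int, height: int) -> List[List[int]]:
    """Return a height x width grid of nearest-site indices.

    Brute-force nearest-neighbour labelling of pixel centres; this
    produces a discrete Voronoi diagram of the given 2D sites.
    """
    if not sites:
        raise ValueError("at least one site is required")
    grid = []
    for y in range(height):
        row = []
        for x in range(width):
            cx, cy = x + 0.5, y + 0.5  # centre of the pixel
            # Squared distance is enough for comparing which site is closest.
            nearest = min(
                range(len(sites)),
                key=lambda i: (sites[i][0] - cx) ** 2 + (sites[i][1] - cy) ** 2,
            )
            row.append(nearest)
        grid.append(row)
    return grid
```

The brute-force approach costs O(width × height × number of sites), which is acceptable for a single small pocket image; a library routine would be preferable for large batches.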
