Metallothionein: Protein structure prediction and sequence analyses in pigeon pea (Cajanus cajan)

Protein therapeutics play an important role in controlling the functions and activities of disease-causing proteins in modern medicine. Although protein therapeutics have several advantages over traditional small-molecule therapeutics, their further development has been hindered by drug complexity and delivery issues. However, recent progress in deep learning-based protein structure prediction approaches such as AlphaFold opens new opportunities to exploit the complexity of these macro-biomolecules for highly specialised designs that inhibit, regulate or even manipulate specific disease-causing proteins. Anti-CRISPR proteins are small proteins from bacteriophages that counter-defend against the prokaryotic adaptive immunity of CRISPR-Cas systems. They are unique examples of natural protein therapeutics that have been optimized by the host-parasite evolutionary arms race to inhibit a wide variety of host proteins. Here, we show through accurate structural prediction and functional analysis that these Anti-CRISPR proteins display diverse inhibition mechanisms. We find that these phage-derived proteins are extremely distinct in structure, and that some have no homologues in the current protein structure domain. Furthermore, we find a novel family of Anti-CRISPR proteins that are structurally homologous to the recently discovered mechanism of manipulating host proteins through enzymatic activity, rather than through direct interference. Using highly accurate structure prediction, we present a wide variety of protein-manipulating strategies of Anti-CRISPR proteins for future protein drug design.

Protein Structure Prediction: Recognition of Primary, Secondary, and Tertiary Structural Features from Amino Acid Sequence

Critical Reviews in Biochemistry and Molecular Biology ◽

10.3109/10409239509085139 ◽

1995 ◽

Vol 30 (1) ◽

pp. 1-94 ◽

Cited By ~ 105

Author(s):

Frank Eisenhaber ◽

Bengt Persson ◽

Patrick Argos

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Amino Acid Sequence ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Structural Features

Bayesian inference of protein structure from chemical shift data

10.7287/peerj.preprints.692 ◽

2014 ◽

Author(s):

Lars A Bratholm ◽

Anders Steen Christensen ◽

Thomas Hamelryck ◽

Jan H Jensen

Keyword(s):

Protein Structure ◽

Chemical Shift ◽

Probability Distribution ◽

Structure Prediction ◽

Calculated Data ◽

Cauchy Distribution ◽

Joint Probability Distribution ◽

Solvent Exposure ◽

Chemical Shift Prediction ◽

Small Proteins

Protein chemical shifts are routinely used to augment molecular mechanics force fields in protein structure simulations, with the weights of the chemical shift restraints determined empirically. These weights, however, might not be an optimal descriptor of a given protein structure and predictive model, and they introduce a bias that might result in incorrect structures. In the inferential structure determination framework, both the unknown structure and the disagreement between experimental and back-calculated data are formulated as a joint probability distribution, thus utilizing the full information content of the data. Here, we present the formulation of such a probability distribution where the error in chemical shift prediction is described by either a Gaussian or a Cauchy distribution. The methodology is demonstrated and compared to a set of empirically weighted potentials through Markov chain Monte Carlo simulations of three small proteins (ENHD, Protein G and the SMN Tudor Domain) using the PROFASI force field and the chemical shift predictor CamShift. Using a clustering criterion for identifying the best structure, together with the addition of a solvent exposure scoring term, the simulations suggest that sampling both the structure and the uncertainties in chemical shift prediction leads to more accurate structures compared to conventional methods using empirically determined weights. The Cauchy distribution, using either sampled uncertainties or predetermined weights, did, however, result in overall better convergence to the native fold, suggesting that both types of distribution might be useful in different aspects of protein structure prediction.
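To make the two error models concrete, the following is a minimal sketch of the negative log-likelihood of chemical shift residuals (experimental minus back-calculated) under a zero-mean Gaussian and a zero-centred Cauchy distribution. The function names and the example residuals are hypothetical and chosen for illustration only; this is not the authors' implementation, and in the abstract's framework the scale parameter would be sampled alongside the structure rather than fixed.

```python
import math


def gaussian_nll(residuals, sigma):
    """Negative log-likelihood of residuals under a zero-mean Gaussian.

    `sigma` is the standard deviation (the uncertainty of the predictor).
    """
    n = len(residuals)
    normalisation = n * math.log(sigma * math.sqrt(2.0 * math.pi))
    misfit = sum(r * r for r in residuals) / (2.0 * sigma ** 2)
    return normalisation + misfit


def cauchy_nll(residuals, gamma):
    """Negative log-likelihood of residuals under a zero-centred Cauchy.

    `gamma` is the scale (half-width at half-maximum) of the distribution.
    """
    n = len(residuals)
    normalisation = n * math.log(math.pi * gamma)
    misfit = sum(math.log(1.0 + (r / gamma) ** 2) for r in residuals)
    return normalisation + misfit


if __name__ == "__main__":
    # Hypothetical residuals in ppm; the last value plays the role of an outlier.
    residuals = [0.1, -0.3, 0.2, 4.0]
    print("Gaussian NLL:", gaussian_nll(residuals, sigma=1.0))
    print("Cauchy NLL:  ", cauchy_nll(residuals, gamma=1.0))
```

The Gaussian misfit grows quadratically with the residual while the Cauchy misfit grows only logarithmically, so a single badly predicted shift dominates the Gaussian score far more than the Cauchy score.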

Fast and adaptive protein structure representations for machine learning

10.1101/2021.04.07.438777 ◽

2021 ◽

Author(s):

Janani Durairaj ◽

Mehmet Akdel ◽

Dick de Ridder ◽

Aalt D.J. van Dijk

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Structural Alignment ◽

Structural Similarity ◽

Structural Features ◽

Structure Alignment ◽

Learning Tasks ◽

Alignment Free ◽

Functional Hierarchy ◽

Invariant Shape

The growing prevalence and popularity of protein structure data, both experimental and computationally modelled, necessitates fast tools and algorithms to enable exploratory and interpretable structure-based machine learning. Alignment-free approaches have been developed for divergent proteins, but proteins sharing functional and structural similarity are often better understood via structural alignment, which has typically been too computationally expensive for larger datasets. Here, we introduce the concept of rotation-invariant shape-mers to multiple structure alignment, creating a structure aligner that scales well with the number of proteins and allows for aligning over a thousand structures in 20 minutes. We demonstrate how alignment-free shape-mer counts and aligned structural features, when used in machine learning tasks, can adapt to different levels of functional hierarchy in protein kinases, pinpointing residues and structural fragments that play a role in catalytic activity.

MPDB: a unified multi-domain protein structure database integrating structural analogue detection

10.1101/2021.10.27.466092 ◽

2021 ◽

Author(s):

Chunxiang Peng ◽

Xiaogen Zhou ◽

Yuhao Xia ◽

Yang Zhang ◽

Guijun Zhang

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Protein Structures ◽

Structural Similarity ◽

Structure Database ◽

Eukaryotic Proteins ◽

Domain Models ◽

Input Domain ◽

Domain Protein ◽

Multiple Domains

With the development of protein structure prediction methods and experimental structure determination techniques, the structures of single-domain proteins have become relatively easy to model or solve experimentally. However, more than 80% of eukaryotic proteins and 67% of prokaryotic proteins contain multiple domains. Constructing a unified multi-domain protein structure database will promote research on multi-domain proteins, especially the modeling of multi-domain protein structures. In this work, we develop a unified multi-domain protein structure database (MPDB). Based on MPDB, we also develop a server with two functional modules: (1) the culling module, which filters the whole MPDB according to input criteria; (2) the detection module, which identifies structural analogues of the full chain according to the structural similarity between the input domain models and the proteins in MPDB. The detection module can discover potential analogue structures, which will contribute to high-quality multi-domain protein structure modeling.

Protein language model embeddings for fast, accurate, alignment-free protein structure prediction

10.1101/2021.07.31.454572 ◽

2021 ◽

Author(s):

Konstantin Weissenow ◽

Michael Heinzinger ◽

Burkhard Rost

Keyword(s):

Protein Structure ◽

Structure Prediction ◽

Prediction Models ◽

Language Model ◽

Structural Features ◽

Language Models ◽

Evolutionary Information ◽

Major Advance ◽

Sequence Alignments ◽

Multiple Sequence

All state-of-the-art (SOTA) protein structure predictions rely on evolutionary information captured in multiple sequence alignments (MSAs), primarily on evolutionary couplings (co-evolution). Such information is not available for all proteins and is computationally expensive to generate. Prediction models based on Artificial Intelligence (AI) using only single sequences as input are easier and cheaper but perform so poorly that speed becomes irrelevant. Here, we described the first competitive AI solution that exclusively inputs embeddings extracted from pre-trained protein Language Models (pLMs), namely from the transformer pLM ProtT5, from single sequences into a relatively shallow (few free parameters) convolutional neural network (CNN) trained on inter-residue distances, i.e. protein structure in 2D. The major advance originated from processing the attention heads learned by ProtT5. Although these models never required an MSA, they matched the performance of methods relying on co-evolution. Although not reaching the very top, our lean approach came close at substantially lower costs, thereby speeding up development and each future prediction. By generating protein-specific rather than family-averaged predictions, these new solutions could distinguish between structural features differentiating members of the same family of proteins with similar structure that are predicted alike by all other top methods.
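The phrase "protein structure in 2D" refers to representing a 3D structure as a matrix of pairwise inter-residue distances. As a minimal sketch of that representation only (not of the ProtT5 embeddings or the CNN), the function below builds such a matrix from one 3D coordinate per residue. The function name is hypothetical, and using one representative atom per residue (for example the C-alpha atom) is an assumption made here for illustration.

```python
import math


def distance_map(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    `coords` is a sequence of (x, y, z) tuples, one per residue. Which atom
    represents each residue is an assumption left to the caller.
    """
    return [[math.dist(a, b) for b in coords] for a in coords]


if __name__ == "__main__":
    # Three hypothetical residue positions in angstroms.
    toy_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
    for row in distance_map(toy_coords):
        print(" ".join(f"{d:6.2f}" for d in row))
```

Because the matrix depends only on distances, it is unchanged by rotating or translating the structure, which is one reason such maps are a convenient training target.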

A systematic structural comparison of all solved small proteins (4-6 kDa) reveals the weight of disulfide bonds in proteins' foldability

10.1101/2021.03.30.437752 ◽

2021 ◽

Author(s):

Mariana Hoyer Moreira ◽

Fabio C. L. Almeida ◽

Tatiana Domitrovic ◽

Fernando L. Palhano

Keyword(s):

Protein Structure ◽

Disulfide Bond ◽

Structure Prediction ◽

Disulfide Bonds ◽

Hydrophobic Core ◽

Structural Comparison ◽

Small Proteins ◽

Protein Prediction ◽

Aqueous Solvent ◽

Tertiary Protein Structure

Defensins are small proteins, usually ranging from 4 to 6 kDa, that are amphipathic and disulfide-rich and have a small or even absent hydrophobic core. Since a hydrophobic core is generally found in globular proteins that fold in an aqueous solvent, the peculiar fold of defensins can challenge tertiary protein structure predictors. We performed a PDB-wide survey of small proteins (4-6 kDa) to understand the similarities of defensins with other small disulfide-rich proteins. We found no differences between defensins and non-defensins in the proportion and solvent exposure of apolar, polar, and charged residues. We then divided all small proteins (4-6 kDa) deposited in the PDB into two groups: one with at least one disulfide bond (bonded, defensins included) and another without any disulfide bond (unbonded). In the bonded group, apolar residues were more exposed to the solvent than in the unbonded group. Robetta, an ab initio algorithm for tertiary protein structure prediction, predicted unbonded proteins more accurately than bonded proteins. Our work highlights one more layer of complexity for tertiary protein structure prediction: the ability of small disulfide-rich proteins to fold even with a poor hydrophobic core.

Complementing sequence-derived features with structural information extracted from fragment libraries for protein structure prediction

BMC Bioinformatics ◽

10.1186/s12859-021-04258-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Siyuan Liu ◽

Tong Wang ◽

Qijiang Xu ◽

Bin Shao ◽

Jian Yin ◽

...

Keyword(s):

Protein Folding ◽

Protein Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Structural Information ◽

Three Dimensional ◽

Structural Features ◽

Dimensional Structure ◽

Fragment Assembly ◽

Fragment Libraries

Background: Fragment libraries play a key role in fragment-assembly based protein structure prediction, where protein fragments are assembled to form a complete three-dimensional structure. Rich and accurate structural information embedded in fragment libraries has not been systematically extracted and used beyond fragment assembly. Methods: To better leverage the valuable structural information for protein structure prediction, we extracted seven types of structural information from fragment libraries. We broadened the usage of such structural information by transforming fragment libraries into protein-specific potentials for gradient-descent based protein folding and by encoding fragment libraries as structural features for protein property prediction. Results: Fragment libraries improved the accuracy of protein folding and outperformed state-of-the-art algorithms with respect to predicted properties, such as torsion angles and inter-residue distances. Conclusion: Our work implies that the rich structural information extracted from fragment libraries can complement sequence-derived features to help protein structure prediction.

Comparing Parallel Algorithms for Van der Waals Energy with Cell-List Technique for Protein Structure Prediction

10.5753/wperformance.2018.3322 ◽

2018 ◽

Author(s):

Daniel R. F. Bonetti ◽

Gesiel Rios Lopes ◽

Alexandre C. B. Delbem ◽

Paulo S. L. Souza ◽

Kalinka C. Branco ◽

...

Keyword(s):

Protein Structure ◽

Parallel Algorithms ◽

Ab Initio ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Van Der Waals ◽

Parallel Execution ◽

Programming Models ◽

Preliminary Results ◽

Small Proteins

This paper compares the runtime of three distinct parallel algorithms for the evaluation of an ab initio, full-atom approach based on GA and the cell-list technique, in order to minimize the van der Waals energy. The three parallel algorithms are developed in C and each uses one of these programming models: MPI, OpenMP or hybrid (MPI+OpenMP). Our preliminary results show that van der Waals energy calculations run faster and achieve better speedups when hybrid and more flexible parallel algorithms are used to predict the structure of larger proteins. We also show that for small proteins MPI communication imposes a high overhead on parallel execution, so OpenMP offers a better cost-benefit ratio in such cases.
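For readers unfamiliar with the cell-list technique, the following is a minimal serial sketch in Python, not the paper's C code and with no MPI or OpenMP parallelism. It assumes a 12-6 Lennard-Jones form for the van der Waals term with a single hypothetical `epsilon` and `sigma` for all atom pairs, and a hard cutoff. All function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones pair energy (an assumed form of the vdW term)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def vdw_energy_bruteforce(coords, cutoff, epsilon=1.0, sigma=1.0):
    """Reference O(N^2) sum over all atom pairs closer than `cutoff`."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            if r < cutoff:
                energy += lennard_jones(r, epsilon, sigma)
    return energy


def vdw_energy_cell_list(coords, cutoff, epsilon=1.0, sigma=1.0):
    """Same sum, but only atoms in the same or adjacent cells are compared."""
    # Bin atoms into cubic cells whose edge length equals the cutoff, so any
    # two atoms within the cutoff must lie in the same or neighbouring cells.
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        cells[key].append(index)

    energy = 0.0
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = cells.get((cx + dx, cy + dy, cz + dz))
            if neighbours is None:
                continue
            for i in members:
                for j in neighbours:
                    if i < j:  # count each pair exactly once
                        r = math.dist(coords[i], coords[j])
                        if r < cutoff:
                            energy += lennard_jones(r, epsilon, sigma)
    return energy


if __name__ == "__main__":
    # Hypothetical atom positions, for illustration only.
    atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0),
             (5.0, 5.0, 5.0), (5.9, 5.0, 5.0), (-0.5, -0.5, -0.5)]
    print(vdw_energy_bruteforce(atoms, cutoff=2.5))
    print(vdw_energy_cell_list(atoms, cutoff=2.5))
```

For roughly uniform atom density the cell list reduces the number of distance checks from quadratic to approximately linear in the number of atoms. In this sketch the outer loop over cells is independent per cell apart from the final sum, which is the kind of loop a shared-memory or message-passing scheme could distribute.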
