An extension of Wang protein design model using Blosum62 substitution matrix

Mapping Intimacies
10.1101/2021.06.07.447415
2021

Author(s): Amin Rahmani, Fatemeh Zare-Mirakabad

Keyword(s): Protein Structure, Amino Acid, Protein Design, Tertiary Structure, Amino Acid Replacement, Vital Role, The Body, Substitution Matrix, Evolutionary Information, Deep Learning Model

Human life depends on the functionality of molecules in the body. Proteins are among the most essential of these molecules: they play a vital role in our lives, and their malfunction can cause severe damage. These roles make it necessary to understand protein structure and function. One of the problems that helps us understand the relationship between protein structure and function is the well-known protein design problem, which attempts to find an amino acid sequence that can fold into a desired tertiary structure. However, although existing protein design models achieve acceptable accuracy, this accuracy is measured as the percentage of amino acids recovered identically. At the same time, it is well known that amino acids can replace each other during evolution while the structure and function of the protein stay the same. Under an identity-only measure, therefore, a designed sequence gets no credit for being close to the target from an evolutionary point of view. This paper presents an extension of Wang's deep learning model that uses the evolutionary information in the Blosum62 substitution matrix to take amino acid replacement probabilities into account while designing a sequence.
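
The contrast between identity-based recovery and an evolution-aware score can be made concrete with a short sketch. It compares plain identity recovery with a BLOSUM62-weighted similarity. The function names and the normalisation by the target's self-score are hypothetical choices for illustration, not the authors' loss or evaluation metric. Only the handful of BLOSUM62 entries the example needs are included.

```python
# A few entries copied from BLOSUM62 (log-odds scores in half-bit units).
# Only the pairs needed for the example are included; this is NOT the full matrix.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("D", "D"): 6, ("K", "K"): 5,
    ("I", "L"): 2, ("D", "E"): 2,
}


def blosum(a: str, b: str) -> int:
    """Look up a symmetric substitution score, trying both key orders."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]  # raises KeyError if the pair is not in the excerpt


def identity_recovery(target: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one."""
    if len(target) != len(designed):
        raise ValueError("sequences must have equal length")
    return sum(t == d for t, d in zip(target, designed)) / len(target)


def blosum_similarity(target: str, designed: str) -> float:
    """Hypothetical metric: summed BLOSUM62 score divided by the target's self-score."""
    if len(target) != len(designed):
        raise ValueError("sequences must have equal length")
    score = sum(blosum(t, d) for t, d in zip(target, designed))
    best = sum(blosum(t, t) for t in target)
    return score / best


target, designed = "IDKL", "LEKI"
print(identity_recovery(target, designed))            # 0.25: only K is recovered exactly
print(round(blosum_similarity(target, designed), 3))  # 0.579, i.e. 11/19
```

Under identity recovery the design scores 0.25. The BLOSUM62-weighted score also credits the conservative I↔L and D→E replacements, which is the kind of evolutionary closeness the abstract argues should count.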

Evolutionary divergence and salinity-mediated selection in halophilic archaea

Microbiology and Molecular Biology Reviews
10.1128/mmbr.61.1.90-104.1997
1997
Vol 61 (1)
pp. 90-104

Author(s): P P Dennis, L C Shimmin

Keyword(s): Amino Acid, Tertiary Structure, Halophilic Archaea, Amino Acid Replacement, Evolutionary Divergence, Ionic Balance, Amino Acid Residues, Nucleotide Substitutions, Nonsynonymous Substitutions, Environmental Salinity

Halophilic (literally salt-loving) archaea are a highly evolved group of organisms that are uniquely able to survive in and exploit hypersaline environments. In this review, we examine the potential interplay between fluctuations in environmental salinity and the primary sequence and tertiary structure of halophilic proteins. The proteins of halophilic archaea are highly adapted and magnificently engineered to function in an intracellular milieu that is in ionic balance with an external environment containing between 2 and 5 M inorganic salt. To understand the nature of halophilic adaptation and to visualize this interplay, the sequences of genes encoding the L11, L1, L10, and L12 proteins of the large ribosomal subunit and Mn/Fe superoxide dismutase proteins from three genera of halophilic archaea have been aligned and analyzed for the presence of synonymous and nonsynonymous nucleotide substitutions. Compared to homologous eubacterial genes, these halophilic genes exhibit an inordinately high proportion of nonsynonymous nucleotide substitutions that result in amino acid replacement in the encoded proteins. More than one-third of the replacements involve acidic amino acid residues. We suggest that fluctuations in environmental salinity provide the driving force for fixation of the excessive number of nonsynonymous substitutions. Tinkering with the number, location, and arrangement of acidic and other amino acid residues influences the fitness (i.e., hydrophobicity, surface hydration, and structural stability) of the halophilic protein. Tinkering is also evident at halophilic protein positions monomorphic or polymorphic for serine; more than one-third of these positions use both the TCN and the AGY serine codons, indicating that there have been multiple nonsynonymous substitutions at these positions.
Our model suggests that fluctuating environmental salinity prevents optimization of fitness for many halophilic proteins and helps to explain the unusual evolutionary divergence of their encoding genes.
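
The serine observation rests on a property of the standard genetic code. Serine is encoded by two codon families, TCN and AGY, that differ at the first two positions. Moving from one family to the other therefore takes at least two substitutions, and any mutational path between them passes through a codon for a different amino acid or a stop codon. The sketch below checks this with the standard code (NCBI translation table 1). The helper names are hypothetical, and this is not the authors' analysis pipeline.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated
# TTT, TTC, TTA, TTG, TCT, ... GGG, with '*' marking stop codons.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify(codon_from: str, codon_to: str) -> str:
    """Label a single-nucleotide codon change as synonymous or nonsynonymous."""
    if sum(a != b for a, b in zip(codon_from, codon_to)) != 1:
        raise ValueError("expected exactly one nucleotide difference")
    return "synonymous" if CODE[codon_from] == CODE[codon_to] else "nonsynonymous"


def first_step_intermediates(codon_from: str, codon_to: str) -> list[str]:
    """Codons reached by changing one of the positions where the two codons differ."""
    return [
        codon_from[:i] + codon_to[i] + codon_from[i + 1:]
        for i in range(3)
        if codon_from[i] != codon_to[i]
    ]


serine_tcn = [c for c in CODE if c.startswith("TC")]
serine_agy = ["AGT", "AGC"]
min_dist = min(sum(a != b for a, b in zip(x, y)) for x in serine_tcn for y in serine_agy)
print(min_dist)  # 2: no single mutation links the two serine codon families
for mid in first_step_intermediates("TCT", "AGT"):
    print(mid, CODE[mid], classify("TCT", mid))  # ACT T ..., then TGT C ...
```

Both first steps from TCT toward AGT (to ACT, threonine, or TGT, cysteine) are nonsynonymous. This is why a serine position that uses both codon families implies nonsynonymous substitutions in its history.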

Protein structure prediction and design in a biologically-realistic implicit membrane

10.1101/630715
2019

Author(s): Rebecca F. Alford, Patrick J. Fleming, Karen G. Fleming, Jeffrey J. Gray

Keyword(s): Protein Structure, Amino Acid, Membrane Proteins, Membrane Protein, Protein Structure Prediction, Protein Design, Structure Prediction, De Novo, Computational Design, Amino Acid Distribution

Protein design is a powerful tool for elucidating mechanisms of function and engineering new therapeutics and nanotechnologies. While soluble protein design has advanced, membrane protein design remains challenging due to difficulties in modeling the lipid bilayer. In this work, we developed an implicit approach that captures the anisotropic structure, shape of water-filled pores, and nanoscale dimensions of membranes with different lipid compositions. The model improves performance in computational benchmarks against experimental targets, including prediction of protein orientations in the bilayer, ΔΔG calculations, native structure discrimination, and native sequence recovery. When applied to de novo protein design, this approach designs sequences with an amino acid distribution near the native amino acid distribution in membrane proteins, overcoming a critical flaw in previous membrane models, which were prone to generating leucine-rich designs. Further, the proteins designed in the new membrane model exhibit native-like features, including interfacial aromatic side chains, hydrophobic lengths compatible with bilayer thickness, and polar pores. Our method advances high-resolution membrane protein structure prediction and design toward tackling key biological questions and engineering challenges.

Significance Statement

Membrane proteins participate in many life processes, including transport, signaling, and catalysis. They constitute over 30% of all proteins and are targets for over 60% of pharmaceuticals. Computational design tools for membrane proteins will transform the interrogation of basic science questions such as membrane protein thermodynamics and the pipeline for engineering new therapeutics and nanotechnologies. Existing tools are either too expensive to compute or rely on manual design strategies. In this work, we developed a fast and accurate method for membrane protein design. The tool is available to the public and will accelerate the experimental design pipeline for membrane proteins.
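
Whether designed sequences match the native amino acid distribution, rather than drifting toward leucine-rich designs, can be quantified in several ways. One simple option, sketched below, is the total variation distance between two residue compositions. This metric and the toy sequences are illustrative assumptions, not the measure used in the paper.

```python
from collections import Counter


def composition(seq: str) -> dict[str, float]:
    """Relative frequency of each residue type in a sequence."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Half the L1 distance between two compositions (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


native = "LAVG"        # hypothetical toy "native" sequence
leucine_rich = "LLLL"  # hypothetical toy leucine-rich design
print(total_variation(composition(native), composition(leucine_rich)))  # 0.75
print(composition(leucine_rich)["L"])  # 1.0
```

In practice the compositions would be pooled over many native and designed membrane proteins rather than single short sequences.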

Computational prediction of the tolerance to amino-acid deletion in green-fluorescent protein

10.1101/079061
2016

Author(s): Eleisha L. Jackson, Stephanie J. Spielman, Claus O. Wilke

Keyword(s): Green Fluorescent Protein, Protein Structure, Amino Acid, Protein Design, Fluorescent Protein, Computational Prediction, Single Amino Acid, Dimensional Structure, Amino Acid Deletion, Green Fluorescent

Proteins evolve through two primary mechanisms: substitution, where mutations alter a protein’s amino-acid sequence, and insertions and deletions (indels), where amino acids are either added to or removed from the sequence. Protein structure has been shown to influence the rate at which substitutions accumulate across sites in proteins, but whether structure similarly constrains the occurrence of indels has not been rigorously studied. Here, we investigate the extent to which structural properties known to covary with protein evolutionary rates might also predict protein tolerance to indels. Specifically, we analyze a publicly available dataset of single–amino-acid deletion mutations in enhanced green fluorescent protein (eGFP) to assess how well the functional effect of deletions can be predicted from protein structure. We find that weighted contact number (WCN), which measures how densely packed a residue is within the protein’s three-dimensional structure, provides the best single predictor for whether eGFP will tolerate a given deletion. We additionally find that using protein design to explicitly model deletions results in improved predictions of functional status when combined with other structural predictors. Our work suggests that structure plays a fundamental role in constraining deletions at sites in proteins, and further that similar biophysical constraints influence both substitutions and deletions. This study therefore provides a solid foundation for future work to examine how protein structure influences tolerance of more complex indel events, such as insertions or large deletions.
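
Weighted contact number is commonly defined as WCN_i = Σ_{j≠i} 1/r_ij², where r_ij is the distance between residues i and j. Close neighbours therefore contribute much more than distant ones. The sketch below assumes one representative coordinate per residue, for example the Cα atom or a side-chain centroid; which representation the authors used is not stated here. The coordinates are made up.

```python
import math


def weighted_contact_number(coords: list[tuple[float, float, float]]) -> list[float]:
    """WCN_i = sum over j != i of 1 / r_ij**2, with r_ij the inter-residue distance."""
    wcn = []
    for i, ci in enumerate(coords):
        total = 0.0
        for j, cj in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(ci, cj) ** 2
        wcn.append(total)
    return wcn


# Hypothetical coordinates (in angstroms) for three residues on a line.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
print(weighted_contact_number(coords))  # [1.25, 2.0, 1.25]
```

The central residue has the most close neighbours and gets the highest WCN. In a real structure, buried residues score higher than surface residues in the same way.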

Protein Structure, Models of Sequence Evolution, and Data Type Effects in Phylogenetic Analyses of Mitochondrial Data: A Case Study in Birds

Diversity
10.3390/d13110555
2021
Vol 13 (11)
pp. 555

Author(s): Emily L. Gordon, Rebecca T. Kimball, Edward L. Braun

Keyword(s): Amino Acids, Protein Structure, Amino Acid, Bird Species, Data Type, Substitution Matrix, Sequence Evolution, Transmembrane Helices, Encoded Proteins, The Impact

Phylogenomic analyses have revolutionized the study of biodiversity, but they have revealed that estimated tree topologies can depend, at least in part, on the subset of the genome that is analyzed. For example, estimates of trees for avian orders differ if protein-coding or non-coding data are analyzed. The bird tree is a good study system because the historical signal for relationships among orders is very weak, which should permit subtle non-historical signals to be identified, while monophyly of orders is strongly corroborated, allowing identification of strong non-historical signals. Hydrophobic amino acids in mitochondrially-encoded proteins, which are expected to be found in transmembrane helices, have been hypothesized to be associated with non-historical signals. We tested this hypothesis by comparing the evolution of transmembrane helices and extramembrane segments of mitochondrial proteins from 420 bird species, sampled from most avian orders. We estimated amino acid exchangeabilities for both structural environments and assessed the performance of phylogenetic analysis using each data type. We compared those relative exchangeabilities with values calculated using a substitution matrix for transmembrane helices estimated using a variety of nuclear- and mitochondrially-encoded proteins, allowing us to compare the bird-specific mitochondrial models with a general model of transmembrane protein evolution. To complement our amino acid analyses, we examined the impact of protein structure on patterns of nucleotide evolution. Models of transmembrane and extramembrane sequence evolution for amino acids and nucleotides exhibited striking differences, but there was no evidence for strong topological data type effects. However, incorporating protein structure into analyses of mitochondrially-encoded proteins improved model fit. Thus, we believe that considering protein structure will improve analyses of mitogenomic data, both in birds and in other taxa.
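
In the standard time-reversible framework, a substitution rate matrix is built from symmetric exchangeabilities r_ij and equilibrium frequencies π_j. Off-diagonal entries are Q_ij = r_ij π_j. Diagonal entries make each row sum to zero, and the whole matrix is scaled to one expected substitution per unit time. The sketch below uses a hypothetical three-state alphabet to keep the numbers readable; an amino acid model is the same construction with 20 states. It is a generic illustration, not the authors' estimation procedure.

```python
def rate_matrix(exch: list[list[float]], freqs: list[float]) -> list[list[float]]:
    """Build a normalised reversible rate matrix Q from exchangeabilities and frequencies."""
    n = len(freqs)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                q[i][j] = exch[i][j] * freqs[j]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    # Scale so the expected substitution rate, -sum_i pi_i * Q_ii, equals 1.
    mean_rate = -sum(freqs[i] * q[i][i] for i in range(n))
    return [[x / mean_rate for x in row] for row in q]


# Hypothetical three-state example: symmetric exchangeabilities, frequencies summing to 1.
exch = [[0.0, 1.0, 2.0],
        [1.0, 0.0, 0.5],
        [2.0, 0.5, 0.0]]
freqs = [0.5, 0.3, 0.2]
Q = rate_matrix(exch, freqs)
for row in Q:
    print([round(x, 3) for x in row])
```

Because r_ij is symmetric, the resulting Q satisfies detailed balance (π_i Q_ij = π_j Q_ji). Comparing two structural environments then amounts to comparing their fitted exchangeabilities and frequencies.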

Evolutionary Profile for (Host and Viral) MLKL Indicates Its Activities as a Battlefront for Extensive Counteradaptation

Molecular Biology and Evolution
10.1093/molbev/msab256
2021

Author(s): Suzette N Palmer, Sruthi Chappidi, Chelsea Pinkham, Dustin C Hancks

Keyword(s): Amino Acid, Rapid Evolution, Amino Acid Replacement, Vital Role, Evolutionary Time, Cytokine Induction, Regulated Cell Death, Mouse Cells, Selection For, Human And Mouse

Pathogen infection triggers host innate defenses, which may result in the activation of regulated cell death (RCD) pathways such as apoptosis. Given their vital role in immunity, apoptotic effectors are often counteracted by pathogen-encoded antagonists. Mounting evidence indicates that programmed necrosis, which is mediated by the RIPK3/MLKL axis and termed necroptosis, evolved as a countermeasure to pathogen-mediated inhibition of apoptosis. Yet, it is unclear whether components of this emerging RCD pathway display signatures associated with pathogen conflict that are rare in combination but common to key host defense factors, namely rapid evolution, a viral homolog (virolog), and cytokine induction. We leveraged evolutionary sequence analysis that examines rates of amino acid replacement, which revealed: 1) strong and recurrent signatures of positive selection for primate and bat RIPK3 and MLKL, and 2) elevated rates of amino acid substitution on multiple RIPK3/MLKL surfaces, suggestive of past antagonism with multiple, distinct pathogen-encoded inhibitors. Furthermore, our phylogenomic analysis across poxvirus genomes illuminated volatile patterns of evolution for a recently described MLKL viral homolog. Specifically, poxviral MLKLs have undergone numerous gene replacements mediated by duplication and deletion events. In addition, MLKL protein expression is stimulated by interferons in human and mouse cells. Thus, MLKL displays all three hallmarks of pivotal immune factors, a combination exhibited by only a handful of factors such as OAS1. These data support the hypothesis that, over evolutionary time, MLKL functions, which may include execution of necroptosis, have served as a major determinant of infection outcomes despite gene loss in some host genomes.
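
Elevated rates of amino acid replacement are usually expressed by comparing nonsynonymous and synonymous divergence (dN/dS, or ω), with ω > 1 taken as a signature of positive selection. Formal tests typically fit codon substitution models by maximum likelihood (for example, with PAML's codeml). The sketch below is only a simplified counting estimate in the spirit of Nei and Gojobori. It assumes aligned, gap-free coding sequences whose codons differ at no more than one position, and it is not the authors' pipeline. Counting changes to stop codons as nonsynonymous is a choice made for this sketch.

```python
import math
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons ordered TTT, TTC, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon: str) -> float:
    """Expected synonymous sites: per position, the fraction of the 3 changes that are silent."""
    s = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i]:
                mutant = codon[:i] + b + codon[i + 1:]
                if CODE[mutant] == CODE[codon]:  # changes to stop codons count as nonsynonymous
                    s += 1 / 3
    return s


def pn_ps(seq1: str, seq2: str) -> tuple[float, float]:
    """Proportions of nonsynonymous (pN) and synonymous (pS) differences per site."""
    codons1 = [seq1[k:k + 3] for k in range(0, len(seq1), 3)]
    codons2 = [seq2[k:k + 3] for k in range(0, len(seq2), 3)]
    s_sites = sum(synonymous_sites(a) + synonymous_sites(b) for a, b in zip(codons1, codons2)) / 2
    n_sites = 3 * len(codons1) - s_sites
    sd = nd = 0
    for a, b in zip(codons1, codons2):
        diffs = sum(x != y for x, y in zip(a, b))
        if diffs > 1:
            raise ValueError("this sketch only handles codons differing at one position")
        if diffs == 1:
            if CODE[a] == CODE[b]:
                sd += 1
            else:
                nd += 1
    return nd / n_sites, sd / s_sites


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differences for multiple hits: -3/4 ln(1 - 4p/3)."""
    return 0.75 * math.log(1.0 / (1.0 - 4.0 * p / 3.0))


pn, ps = pn_ps("CTGCTG", "CTACCG")  # one silent (CTG->CTA) and one Leu->Pro change
print(round(pn, 3), round(ps, 3))  # 0.286 0.4
omega = jukes_cantor(pn) / jukes_cantor(ps)
print(round(omega, 2))
```

With realistic data the estimate is computed over whole alignments, and values of ω well above 1 at particular sites or lineages are what "signatures of positive selection" refers to.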

Phylogenetic mixture models for proteins

Philosophical Transactions of the Royal Society B Biological Sciences
10.1098/rstb.2008.0180
2008
Vol 363 (1512)
pp. 3965-3976
Cited By ~ 119

Author(s): Si Quang Le, Nicolas Lartillot, Olivier Gascuel

Keyword(s): Amino Acid, Mixture Models, Model Comparison, Tertiary Structure, Amino Acid Replacement, Single Amino Acid, Learning Approaches, Solvent Exposure, Substitution Pattern, Better Than

Standard protein substitution models use a single amino acid replacement rate matrix that summarizes the biological, chemical and physical properties of amino acids. However, site evolution is highly heterogeneous and depends on many factors: genetic code; solvent exposure; secondary and tertiary structure; protein function; etc. These impact the substitution pattern and, in most cases, a single replacement matrix is not enough to represent all the complexity of the evolutionary processes. This paper explores, in a maximum-likelihood framework, phylogenetic mixture models that combine several amino acid replacement matrices to better fit protein evolution. We learn these mixture models from a large alignment database extracted from HSSP, and test their performance using independent alignments from TreeBase. We compare unsupervised learning approaches, where the site categories are unknown, with supervised ones, where estimation uses the known category of each site, based on its exposure or its secondary structure. All our models are combined with gamma-distributed rates across sites. Results show that highly significant likelihood gains are obtained when using mixture models compared with the best available single replacement matrices. Mixtures of matrices also improve over mixtures of profiles in the manner of the CAT model. The unsupervised approach tends to be better than the supervised one, but it appears difficult to implement and highly sensitive to the starting values of the parameters, meaning that the supervised approach is still of interest for initialization and model comparison. Using an unsupervised model involving three matrices, the average AIC gain per site with TreeBase test alignments is 0.31, 0.49 and 0.61 compared with LG (named after Le & Gascuel 2008 Mol. Biol. Evol. 25, 1307–1320), WAG and JTT, respectively. This three-matrix model is significantly better than LG for 34 alignments (out of 57), and significantly worse for only 1 alignment.
Moreover, tree topologies inferred with our mixture models frequently differ from those obtained with single matrices, indicating that using these mixtures impacts not only the likelihood value but also the output tree. All our models and a PhyML implementation are available from http://atgc.lirmm.fr/mixtures.
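
Under a mixture model, the likelihood of a site is a weighted sum over the component matrices, L(site) = Σ_k w_k L_k(site). Models are then compared with AIC = 2p - 2 ln L, where p is the number of free parameters. The sketch below shows both computations and an AIC gain per site. The per-component site log-likelihoods and parameter counts are made-up numbers standing in for values a phylogenetic program such as PhyML would compute on a tree. Gamma rate heterogeneity is omitted for simplicity.

```python
import math


def mixture_site_loglik(component_logliks: list[float], weights: list[float]) -> float:
    """log(sum_k w_k * L_k) for one site, computed stably with the log-sum-exp trick."""
    terms = [math.log(w) + ll for w, ll in zip(weights, component_logliks)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * loglik


# Hypothetical per-site log-likelihoods under two replacement matrices (rows = sites).
site_logliks = [[-3.2, -2.9], [-5.1, -6.0], [-4.0, -3.7]]
weights = [0.6, 0.4]
mix_lnl = sum(mixture_site_loglik(s, weights) for s in site_logliks)
single_lnl = sum(s[0] for s in site_logliks)  # first matrix alone
# Hypothetical parameter counts: the mixture adds one free weight.
gain_per_site = (aic(single_lnl, 0) - aic(mix_lnl, 1)) / len(site_logliks)
print(round(mix_lnl, 3), round(single_lnl, 3), round(gain_per_site, 3))
```

A positive gain per site means the mixture's better fit outweighs its extra parameters. This is the sense in which the abstract reports average AIC gains of 0.31 to 0.61 per site.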

PUResNet: prediction of protein-ligand binding sites using deep residual neural network

Journal of Cheminformatics
10.1186/s13321-021-00547-7
2021
Vol 13 (1)

Author(s): Jeevan Kandel, Hilal Tayara, Kil To Chong

Keyword(s): Protein Structure, Ligand Binding, Binding Sites, Protein Structures, Structural Similarity, Vital Role, Biological Functions, True Nature, Ligand Binding Sites, Deep Learning Model

Background: Predicting protein-ligand binding sites is a fundamental step in understanding the functional characteristics of proteins; it plays a vital role in elucidating different biological functions and is a crucial step in drug discovery. A protein exhibits its true nature after binding to its interacting molecule, known as a ligand, which binds only at a favorable binding site in the protein structure. Different computational methods exploiting protein features have been developed to identify binding sites in protein structures, but none seems to provide promising results, and therefore further investigation is required.

Results: In this study, we present PUResNet, a deep learning model, together with a novel data-cleaning process based on structural similarity, for predicting protein-ligand binding sites. From the whole scPDB database (an annotated database of druggable binding sites extracted from the Protein Data Bank), 5020 protein structures were selected and used to train PUResNet. PUResNet achieved better and more justifiable performance than existing methods when evaluated on two independent sets using distance, volume, and proportion metrics.
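
Binding-site predictions are often scored by two measures. One is the distance between the centres of the predicted and true pockets (DCC). The other is their volume overlap on a voxel grid (DVO, the intersection over union of voxels). The sketch below assumes both pockets are given as sets of integer voxel indices and uses a hypothetical grid spacing. It illustrates these common metrics and is not necessarily how the PUResNet paper defines its distance, volume, and proportion metrics.

```python
import math


def centre(voxels: set[tuple[int, int, int]]) -> tuple[float, ...]:
    """Geometric centre of a set of voxel indices."""
    n = len(voxels)
    return tuple(sum(v[k] for v in voxels) / n for k in range(3))


def dcc(pred: set[tuple[int, int, int]], true: set[tuple[int, int, int]],
        voxel_size: float = 1.0) -> float:
    """Distance between pocket centres, converted to angstroms via the grid spacing."""
    return math.dist(centre(pred), centre(true)) * voxel_size


def dvo(pred: set[tuple[int, int, int]], true: set[tuple[int, int, int]]) -> float:
    """Discretised volume overlap: |intersection| / |union| of the voxel sets."""
    return len(pred & true) / len(pred | true)


pred = {(0, 0, 0), (1, 0, 0)}  # hypothetical predicted pocket
true = {(1, 0, 0), (2, 0, 0)}  # hypothetical true pocket
print(dcc(pred, true, voxel_size=2.0), dvo(pred, true))  # 2.0 and 1/3
```

DCC rewards getting the pocket location right even when its shape is off, while DVO also penalises predicted pockets that are too large or too small.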

Improved protein structure prediction by deep learning irrespective of co-evolution information

10.1101/2020.10.12.336859
2020
Cited By ~ 1

Author(s): Jinbo Xu, Matthew Mcpartlon, Jin Li

Keyword(s): Deep Learning, Protein Structure, Protein Structure Prediction, Protein Design, Structure Prediction, Model Building, Evolutionary Information, Designed Proteins, Structure Relationship, Over The Top

We describe our latest study of deep convolutional residual neural networks (ResNet) for protein structure prediction, including deeper and wider ResNets, the efficacy of different input features, and improved 3D model-building methods. Our ResNet can predict correct folds (TM-score > 0.5) for 26 out of 32 CASP13 FM (template-free modeling) targets, and can predict the top L/5 long-range contacts for these targets with precision over 80%, a significant improvement over the CASP13 results. Although co-evolution analysis plays an important role in the most successful structure prediction methods, we show that when co-evolution is not used, our ResNet can still predict correct folds for 18 of the 32 CASP13 FM targets, including several large ones. This marks a significant improvement over the top co-evolution-based, non-deep-learning methods at CASP13 and over other non-co-evolution-based deep learning models, such as the popular recurrent geometric network (RGN). With only the primary sequence, our ResNet can also predict correct folds for all 21 human-designed proteins we tested. In contrast, RGN predicts correct folds for only 3 human-designed proteins and for none of the CASP13 FM targets. In addition, we find that ResNet may fare better on the human-designed proteins when trained without co-evolution information than with it. These results suggest that ResNet does not simply denoise co-evolution signals but is instead able to learn important sequence-structure relationships from experimental structures. This has important implications for protein design and engineering, especially when evolutionary information is not available.
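
Contact-prediction accuracy in CASP is conventionally reported as the precision of the top-ranked L/5 predicted contacts. Here L is the sequence length and "long-range" means a sequence separation of at least 24 residues. True contacts are typically residue pairs with a Cβ-Cβ distance below 8 Å. The sketch below assumes predictions arrive as a dict from residue pairs (i < j) to contact probabilities and true contacts as a set of pairs. The toy data are made up.

```python
def top_l5_long_range_precision(probs: dict[tuple[int, int], float],
                                true_contacts: set[tuple[int, int]],
                                length: int,
                                min_sep: int = 24) -> float:
    """Precision of the top L/5 predicted contacts with |i - j| >= min_sep."""
    long_range = [(p, pair) for pair, p in probs.items() if abs(pair[0] - pair[1]) >= min_sep]
    long_range.sort(key=lambda item: item[0], reverse=True)
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    hits = sum(1 for _, pair in top if pair in true_contacts)
    return hits / len(top)


# Toy example: 60 residues, so the top 12 long-range predictions are scored.
probs = {(i, i + 30): 1.0 - i / 100 for i in range(30)}
probs[(0, 5)] = 0.995  # high-scoring but short-range, so it is ignored
true_contacts = {(i, i + 30) for i in range(0, 30, 2)}
print(top_l5_long_range_precision(probs, true_contacts, length=60))  # 0.5
```

Restricting the score to long-range pairs matters because short-range contacts are largely determined by local secondary structure and are much easier to predict.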

D76V, L161R, and C117S are the most pathogenic amino acid substitutions with several dangerous consequences on leptin structure, function, and stability

Egyptian Journal of Medical Human Genetics
10.1186/s43042-019-0033-2
2019
Vol 20 (1)

Author(s): Mohammed Baqur S. Al-Shuhaib

Keyword(s): Protein Structure, Structure Function, Deleterious Mutation, The Body, Evolutionary Information, Missense Mutations, Nucleotide Polymorphisms, Single Nucleotide, Obesity And Diabetes, Polar Interactions

Background: Leptin is a versatile hormone with a variety of functions, including regulation of food intake by inhibiting hunger. Any deleterious mutation in this protein can have serious consequences for the body. This study was conducted to identify the most deleterious non-synonymous single-nucleotide polymorphisms (nsSNPs) of the human LEP gene and their impact on the encoded protein.

Methods: To predict the possible impact of nsSNPs on leptin, a total of 90 nsSNPs were retrieved from dbSNP and investigated using many in silico tools specially designed to analyze the consequences of nsSNPs for protein structure, function, and stability.

Results: Three nsSNPs, namely D76V, L161R, and C117S, were predicted to be deleterious by all of the nsSNP prediction tools used, and thus to affect leptin structure, biological activity, and stability. Evolutionary information placed the L161R and C117S mutations at extremely highly conserved positions. Furthermore, several deleterious mechanisms associated with the L161R and C117S mutations, which alter several motifs in the secondary structure of leptin, were detected. In addition, all three mutations (D76V, L161R, and C117S) altered polar interactions at their respective positions. Further in-depth analyses revealed several harmful structural effects of the three nsSNPs on leptin, which may lead to multiple intrinsic disorders in the altered protein forms.

Conclusions: This study provides the first comprehensive computational analysis of the effect of the most damaging nsSNPs on leptin. The exploration of these missense mutations may offer new perspectives on the deleterious consequences of such amino acid substitutions. Leptin's performance in many biological pathways may therefore be altered, resulting in a variety of disorders such as obesity and diabetes. These findings will help in detecting the most harmful variants that need to be screened for in clinically diagnosed patients with leptin disorders.

Trial registration: ISRCTN73824458
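
Variant names such as D76V follow the usual shorthand: reference residue, 1-based position in the protein sequence, then substituted residue. A small parser makes the convention explicit and checks that the reference residue matches the sequence before the change is applied. The function names and the toy sequence are hypothetical; the real leptin sequence is not reproduced here.

```python
import re

# One-letter reference residue, 1-based position, one-letter substituted residue.
PATTERN = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$")


def parse_substitution(name: str) -> tuple[str, int, str]:
    """Split a name like 'D76V' into ('D', 76, 'V')."""
    match = PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a single amino acid substitution: {name!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def apply_substitution(sequence: str, name: str) -> str:
    """Return the mutated sequence after checking the reference residue."""
    ref, pos, alt = parse_substitution(name)
    if not 1 <= pos <= len(sequence):
        raise ValueError(f"position {pos} outside sequence of length {len(sequence)}")
    if sequence[pos - 1] != ref:
        raise ValueError(f"expected {ref} at position {pos}, found {sequence[pos - 1]}")
    return sequence[: pos - 1] + alt + sequence[pos:]


toy = "MCDLK"  # hypothetical sequence, not leptin
print(parse_substitution("C117S"))    # ('C', 117, 'S')
print(apply_substitution(toy, "D3V"))  # MCVLK
```

The reference check catches a common source of error in variant analyses: numbering that refers to a different isoform or to a sequence with or without its signal peptide.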

A divergent Articulavirus in an Australian gecko identified using meta-transcriptomics and protein structure comparisons

10.1101/2020.05.21.109603
2020
Cited By ~ 1

Author(s): Ayda Susana Ortiz-Baez, John-Sebastian Eden, Craig Moritz, Edward C. Holmes

Keyword(s): Protein Structure, Amino Acid, Protein Structure Prediction, Structure Prediction, Rna Viruses, Sequence Data, Gene Segment, Amino Acid Sequences, Evolutionary Information, Negative Sense

The discovery of highly divergent RNA viruses is compromised by their limited sequence similarity to known viruses. Evolutionary information obtained from protein structural modelling offers a powerful approach to detect distantly related viruses based on the conservation of tertiary structures in key proteins such as the viral RNA-dependent RNA polymerase (RdRp). We utilised a template-based approach for protein structure prediction from amino acid sequences to identify distant evolutionary relationships among viruses detected in meta-transcriptomic sequencing data from Australian wildlife. The best predicted protein structural model was compared with the results of similarity searches against protein databases based on amino acid sequence data. Using this combination of meta-transcriptomics and protein structure prediction, we identified the RdRp (PB1) gene segment of a divergent negative-sense RNA virus in a native Australian gecko (Gehyra lauta) that was confirmed by PCR and Sanger sequencing. Phylogenetic analysis identified the Gecko articulavirus (GECV) as a newly described genus within the family Amnoonviridae, order Articulavirales, that is most closely related to the fish virus Tilapia tilapinevirus (TiLV). These findings provide important insights into the evolution of negative-sense RNA viruses and structural conservation of the viral replicase among members of the order Articulavirales.
