Thousands of protein linear motif classes may still be undiscovered

Linear motifs are short protein subsequences that mediate protein interactions. Hundreds of motif classes, comprising thousands of motif instances, are known. Our theory estimates how many motif classes remain undiscovered. As is commonly done, we describe motif classes as regular expressions specifying motif length and the allowed amino acids at each motif position. We measure motif specificity for a pair of motif classes by quantifying how many motif-discriminating positions prevent a protein subsequence from matching the two classes at once. We derive theorems for the maximal number of motif classes that can simultaneously maintain a certain number of motif-discriminating positions between all pairs of classes in the motif universe, for a given amino acid alphabet. We also calculate the fraction of all protein subsequences that would belong to a motif class if all potential motif classes came into existence. Naturally occurring pairs of motif classes most often present a single motif-discriminating position. This mild specificity maximizes the potential number of coexisting motif classes, the expansion of the motif universe due to amino acid modifications, and the fraction of amino acid sequences that code for a motif instance. As a result, thousands of linear motif classes may remain undiscovered.
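The idea of a motif-discriminating position can be illustrated with a minimal Python sketch. It assumes that two motif classes have the same length and are compared position by position, and that a position discriminates when the two sets of allowed amino acids share no residue. The abstract does not spell out these details, so the paper's formal definition may differ. The two example classes and the function names are hypothetical.

```python
from typing import List, Set

# A motif class is modelled as one set of allowed amino acids per position,
# which mirrors a fixed-length regular expression such as P..[KR].
MotifClass = List[Set[str]]

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def discriminating_positions(a: MotifClass, b: MotifClass) -> int:
    """Count positions where the two classes share no allowed amino acid."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares classes of equal length")
    return sum(1 for allowed_a, allowed_b in zip(a, b) if not (allowed_a & allowed_b))


def matches(seq: str, motif: MotifClass) -> bool:
    """Return True if every residue of seq is allowed at its position."""
    return len(seq) == len(motif) and all(
        residue in allowed for residue, allowed in zip(seq, motif)
    )


# Hypothetical classes, written as regular expressions P..P and P..[KR].
class_a = [set("P"), AMINO_ACIDS, AMINO_ACIDS, set("P")]
class_b = [set("P"), AMINO_ACIDS, AMINO_ACIDS, set("KR")]

n_discriminating = discriminating_positions(class_a, class_b)
```

Here only the last position discriminates, so no four-residue subsequence can match both classes at once, which is the "single motif-discriminating position" case the abstract describes as most common.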

Comparison of Naturally Occurring Vitamin K-Dependent Proteins: Correlation of Amino Acid Sequences and Membrane Binding Properties Suggests a Membrane Contact Site†

Biochemistry ◽

10.1021/bi9626160 ◽

1997 ◽

Vol 36 (17) ◽

pp. 5120-5127 ◽

Cited By ~ 75

Author(s):

John F. McDonald ◽

Amit M. Shah ◽

Ruth A. Schwalbe ◽

Walter Kisiel ◽

Björn Dahlbäck ◽

...

Keyword(s):

Amino Acid ◽

Vitamin K ◽

Membrane Binding ◽

Amino Acid Sequences ◽

Contact Site ◽

Binding Properties ◽

Naturally Occurring ◽

Membrane Contact

Predicting Protein-Protein Interactions from Amino Acid Sequences Using SaE-ELM Combined with Continuous Wavelet Descriptor and PseAA Composition

Intelligent Computing Theories and Methodologies - Lecture Notes in Computer Science ◽

10.1007/978-3-319-22186-1_63 ◽

2015 ◽

pp. 634-645 ◽

Cited By ~ 2

Author(s):

Yu-An Huang ◽

Zhu-Hong You ◽

Jianqiang Li ◽

Leon Wong ◽

Shubin Cai

Keyword(s):

Amino Acid ◽

Protein Interactions ◽

Amino Acid Sequences ◽

Continuous Wavelet ◽

Protein Protein Interactions ◽

Wavelet Descriptor

Detection of Protein-Protein Interactions from Amino Acid Sequences Using a Rotation Forest Model with a Novel PR-LPQ Descriptor

Lecture Notes in Computer Science - Advanced Intelligent Computing Theories and Applications ◽

10.1007/978-3-319-22053-6_75 ◽

2015 ◽

pp. 713-720 ◽

Cited By ~ 13

Author(s):

Leon Wong ◽

Zhu-Hong You ◽

Shuai Li ◽

Yu-An Huang ◽

Gang Liu

Keyword(s):

Amino Acid ◽

Protein Interactions ◽

Amino Acid Sequences ◽

Protein Protein Interactions ◽

Rotation Forest ◽

Forest Model

Prediction of protein-protein interactions from amino acid sequences with ensemble extreme learning machines and principal component analysis

BMC Bioinformatics ◽

10.1186/1471-2105-14-s8-s10 ◽

2013 ◽

Vol 14 (S8) ◽

Cited By ~ 140

Author(s):

Zhu-Hong You ◽

Ying-Ke Lei ◽

Lin Zhu ◽

Junfeng Xia ◽

Bing Wang

Keyword(s):

Principal Component Analysis ◽

Amino Acid ◽

Protein Interactions ◽

Principal Component ◽

Component Analysis ◽

Amino Acid Sequences ◽

Extreme Learning Machines ◽

Protein Protein Interactions ◽

Learning Machines

Co-evolution analysis to predict protein–protein interactions within influenza virus envelope

Journal of Bioinformatics and Computational Biology ◽

10.1142/s021972001441008x ◽

2014 ◽

Vol 12 (02) ◽

pp. 1441008 ◽

Cited By ~ 8

Author(s):

Ramil R. Mintaev ◽

Andrei V. Alexeevski ◽

Larisa V. Kordyukova

Keyword(s):

Amino Acid ◽

Protein Interactions ◽

Influenza A ◽

Matrix Protein ◽

Amino Acid Sequences ◽

Viral Envelope ◽

Research Database ◽

Virus Envelope ◽

Physical Constraints ◽

Matrix Protein M1

Interactions between the integral membrane proteins hemagglutinin (HA), neuraminidase (NA), and M2 and the membrane-associated matrix protein M1 of influenza A virus are thought to be crucial for assembly of functionally competent virions. We hypothesized that the amino acid residues located at the interface of two different proteins are under physical constraints and thus probably co-evolve. To predict co-evolving residue pairs, the EvFold (http://evfold.org) program, which searches the (nontransitive) Direct Information scores, was applied to large samples of amino acid sequences from the Influenza Research Database (http://www.fludb.org/). Focusing on the HA, NA, and M2 cytoplasmic tails as well as the C-terminal domain of M1 (the less conserved of the protein's domains), we captured six pairs of correlated positions. Among them, there were one, two, and three position pairs for the HA–M2, HA–M1, and M2–M1 protein pairs, respectively. As expected, no co-varying positions were found for the NA–HA, NA–M1, and NA–M2 pairs, evidently because of the high conservation of the NA cytoplasmic tail. The sum of frequencies calculated for the two major amino acid patterns observed in pairs of correlated positions was up to 0.99, indicating high to extreme evolutionary sustainability. Based on the predictions, a hypothetical model of pair-wise protein interactions within the viral envelope was proposed.
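The summed frequency of the two major amino acid patterns at a pair of correlated positions can be sketched as follows. This is an illustrative calculation, not the authors' pipeline: it assumes each position is given as an alignment column with one residue per sequence, and the toy columns are made up.

```python
from collections import Counter


def top_two_pattern_frequency(col_i: str, col_j: str) -> float:
    """Summed frequency of the two most common residue pairs at two positions."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    # Each sequence contributes one (residue_i, residue_j) pattern.
    counts = Counter(zip(col_i, col_j))
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / len(col_i)


# Toy alignment columns for ten sequences (hypothetical data).
column_i = "KKKKEEEEEQ"
column_j = "DDDDRRRRRS"

freq = top_two_pattern_frequency(column_i, column_j)
```

In this toy case the patterns K/D and E/R cover nine of the ten sequences, so the summed frequency is 0.9. A value close to 1, as reported in the abstract, means that almost all sequences carry one of just two residue combinations.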

Multi-scale encoding of amino acid sequences for predicting protein interactions using gradient boosting decision tree

PLoS ONE ◽

10.1371/journal.pone.0181426 ◽

2017 ◽

Vol 12 (8) ◽

pp. e0181426 ◽

Cited By ~ 15

Author(s):

Chang Zhou ◽

Hua Yu ◽

Yijie Ding ◽

Fei Guo ◽

Xiu-Jun Gong

Keyword(s):

Amino Acid ◽

Decision Tree ◽

Protein Interactions ◽

Amino Acid Sequences ◽

Gradient Boosting ◽

Multi Scale

Proteome-scale amino-acid resolution footprinting of protein-binding sites in the intrinsically disordered regions of the human proteome

10.1101/2021.04.13.439572 ◽

2021 ◽

Author(s):

Caroline Benz ◽

Muhammad Ali ◽

Izabella Krystkowiak ◽

Leandro Simonetti ◽

Ahmed Sayadi ◽

...

Keyword(s):

Amino Acid ◽

Protein Interactions ◽

Human Proteome ◽

Cell Physiology ◽

Protein Protein Interactions ◽

Linear Motif ◽

Intrinsically Disordered ◽

Intrinsically Disordered Regions ◽

Wide Range ◽

Disordered Regions

Specific protein-protein interactions are central to all processes that underlie cell physiology. Numerous studies using a wide range of experimental approaches have identified tens of thousands of human protein-protein interactions. However, many interactions remain to be discovered, and low affinity, conditional and cell type-specific interactions are likely to be disproportionately under-represented. Moreover, for most known protein-protein interactions the binding regions remain uncharacterized. We previously developed proteomic peptide phage display (ProP-PD), a method for simultaneous proteome-scale identification of short linear motif (SLiM)-mediated interactions and footprinting of the binding region with amino acid resolution. Here, we describe the second-generation human disorderome (HD2), an optimized ProP-PD library that tiles all disordered regions of the human proteome and allows the screening of ~1,000,000 overlapping peptides in a single binding assay. We define guidelines for how to process, filter and rank the results and provide PepTools, a toolkit for annotation and analysis of identified hits. We uncovered 2,161 interaction pairs for 35 known SLiM-binding domains and confirmed a subset of 38 interactions by biophysical or cell-based assays. Finally, we show how the amino acid resolution binding site information can be used to pinpoint functionally important disease mutations and phosphorylation events in intrinsically disordered regions of the human proteome. The HD2 ProP-PD library paired with PepTools represents a powerful pipeline for unbiased proteome-wide discovery of SLiM-based interactions.
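Tiling a disordered region into overlapping peptides can be sketched in a few lines of Python. The abstract gives neither the peptide length nor the overlap used for HD2, so `peptide_len` and `step` below are hypothetical parameters, and the toy sequences are not taken from the library.

```python
from typing import List, Tuple


def tile_region(region: str, peptide_len: int, step: int) -> List[Tuple[int, str]]:
    """Split a region into overlapping peptides, returned as (start, peptide)."""
    if peptide_len <= 0 or step <= 0:
        raise ValueError("peptide_len and step must be positive")
    if not region:
        return []
    if len(region) <= peptide_len:
        return [(0, region)]
    last_start = len(region) - peptide_len
    starts = list(range(0, last_start + 1, step))
    # Add a final peptide so the C-terminal residues are always covered.
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, region[s:s + peptide_len]) for s in starts]


# Toy examples: peptides of length 4 with a step of 3 overlap by one residue.
tiles_even = tile_region("MSTPEQKLDG", peptide_len=4, step=3)
tiles_uneven = tile_region("MSTPEQKLDGA", peptide_len=4, step=3)
```

Keeping the start offset with each peptide is a design choice that lets hits from a binding assay be mapped back to residue positions in the parent protein, which is what footprinting at amino acid resolution requires.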

An in silico approach to analyze HCV genotype-specific binding-site variation and its effect on drug–protein interaction

Scientific Reports ◽

10.1038/s41598-020-77720-9 ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Ramsha Khalid ◽

Muhammad Faraz Anwar ◽

Muhammad Aanish Raees ◽

Sadaf Naeem ◽

Syed Hani Abidi ◽

...

Keyword(s):

Amino Acid ◽

Protein Interaction ◽

Protein Interactions ◽

In Silico ◽

Specific Binding ◽

Antiviral Treatment ◽

Amino Acid Sequences ◽

Genotype 1A ◽

Genotype Variation ◽

Site Variation

Genotype variation in viruses can affect the response to antiviral treatment. Several studies have established approaches to determine genotype-specific variations; however, the effect of these variations on drug–protein interactions remains largely unexplored. We present an in silico approach to explore genotype-specific variations and their effect on drug–protein interaction. We used HCV NS3 helicase and fluoroquinolones as a model for drug–protein interaction and investigated the effect of amino acid variations in HCV NS3 of genotypes 1a, 1b, 2b and 3a on the NS3–fluoroquinolone interaction. We retrieved 687, 667, 101 and 248 nucleotide sequences of HCV NS3 genotypes 1a, 1b, 2b, and 3a, respectively, and translated them into amino acid sequences. These were used for genotype variation analysis and to construct 3D protein models for the 2b and 3a genotypes. For 1a and 1b, crystal structures were used. Drug–protein interactions were determined using molecular docking analyses. Our results revealed that individual genotype-specific HCV NS3 showed substantial sequence heterogeneity that resulted in variations in docking interactions. We believe that our approach can be extended to other viruses to study the clinical significance of genotype-specific variations in drug–protein interactions.

Computer Analysis of Phytochrome Sequences from Five Species: Implications for the Mechanism of Action

Zeitschrift für Naturforschung C ◽

10.1515/znc-1990-9-1010 ◽

1990 ◽

Vol 45 (9-10) ◽

pp. 987-998 ◽

Cited By ~ 10

Author(s):

Michael D. Partis ◽

Rudolf Grimm

Keyword(s):

Arabidopsis Thaliana ◽

Amino Acid ◽

Lipid Bilayers ◽

Protein Interactions ◽

Avena Sativa ◽

Sequence Similarity ◽

Amino Acid Sequences ◽

Amino Acid Sequence Similarity ◽

Post Translational Modification ◽

C Terminus

The amino acid sequences of phytochrome from Avena sativa, Oryza sativa, Cucurbita pepo, Pisum sativum and Arabidopsis thaliana have been analyzed with a variety of computer programs, with a view to identifying areas of the protein which contribute to the properties of this photoreceptor. A region at the C-terminus has been shown to be amphiphilic, and by analogy with surface-seeking peptides, may be responsible for interaction of phytochrome with lipid bilayers. Possible targeting sequences in phytochromes have been identified, including a series of four basic residues which correspond to those responsible for transport of nuclear-located proteins. Sites capable of post-translational modification have been found in monocot sequences, but not in dicot sequences. Areas of the phytochrome molecule which are exposed on the surface of the protein, and which are therefore capable of interaction with other cellular macromolecules, have been identified. Analogies with other biliproteins have been used to define minimum chromophore-protein interactions. Possible enzymic activities associated with phytochromes have been discussed with respect to local amino acid sequence similarity with enzymes.

Cloning and functional expression of B chains of β-bungarotoxins from Bungarus multicinctus (Taiwan banded krait)

Biochemical Journal ◽

10.1042/bj3340087 ◽

1998 ◽

Vol 334 (1) ◽

pp. 87-92 ◽

Cited By ~ 27

Author(s):

Pei-Fung WU ◽

Sheng-Nan WU ◽

Chun-Chang CHANG ◽

Long-Sen CHANG

Keyword(s):

Amino Acid ◽

Fusion Protein ◽

Protein Interactions ◽

Functional Expression ◽

Coli Strain ◽

Amino Acid Sequences ◽

Protein Sequencing ◽

K Channel ◽

Venom Glands ◽

A Chain

The cDNA species encoding the B chains (B1 and B2) of β-bungarotoxins (β-Bgt) were constructed from the cellular RNA isolated from the venom glands of Bungarus multicinctus (Taiwan banded krait). The deduced amino acid sequences of the B chains were different from those determined previously by a protein sequencing technique. One additional Arg residue is inserted between Val-19 and Arg-20 of the B1 chain. Similarly the insertion of one additional Val residue between Val-19 and Arg-20 of the B2 chain is noted. Thus the B chains should comprise 61 amino acid residues. Moreover, the residues at positions 44–46 are Gly-Asn-His, in contrast with a previous result showing the sequence His-Gly-Asn. Instead of Asp, the residues at positions 41 and 43 are Asn. The B chain was subcloned into the expression vector pET-32a(+) and transformed into Escherichia coli strain BL21(DE3). The recombinant B chain was expressed as a fusion protein and purified on a His-Bind resin column. The yield of affinity-purified fusion protein was increased markedly by replacing Cys-55 of the B chain with Ser. However, the isolated B(C55S) chain became insoluble in aqueous solution after removal of the fused protein from the affinity-purified product, suggesting that protein–protein interactions might be crucial for stabilizing the structure of the B chain. The B(C55S) chain fusion protein showed activity in blocking the voltage-dependent K+ channel, but did not inhibit the binding of β-Bgt to synaptosomal membranes. These results, together with the finding that modification of His-48 of the A chain of β-Bgt caused a marked decrease in the ability to bind toxin to its acceptor proteins, suggest that the B chain is involved in the K+ channel blocking action observed with β-Bgt, and that the binding of β-Bgt to neuronal receptors is not heavily dependent on the B chain.
