Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method

Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), which play important roles in the molecular interaction network of cells and are implicated in many serious human diseases. Identifying MoRFs is essential for both functional studies of IDPs and drug design. This study applies a deep learning approach to develop a model for improving MoRF prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor). It combines the outcomes of two independent deep convolutional neural network (DCNN) classifiers that take advantage of different features. The first, DCNNMoRF1, employs a position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors to describe protein sequences. The second, DCNNMoRF2, employs a PSSM and 13 types of amino acid indexes to describe protein sequences. Both classifiers use a DCNN with a novel two-dimensional attention mechanism, and an averaging step further processes the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two models by averaging their final scores. When compared with other well-known tools on the same datasets, the accuracy of the proposed method was comparable with that of state-of-the-art methods. The related web server can be accessed freely via http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php .
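The two averaging steps described in this abstract can be illustrated with a minimal Python sketch. The abstract does not specify how the per-model averaging works, so the centered sliding window used here, the `half_window` parameter, and the function names `smooth` and `ensemble` are all assumptions made for illustration; the example scores are invented and are not outputs of en_DCNNMoRF.

```python
from statistics import fmean


def smooth(scores, half_window=2):
    """Replace each per-residue score with the mean over a centered window.

    The window is truncated at the sequence ends. The window shape and
    size are assumptions; the abstract only mentions "an average strategy".
    """
    n = len(scores)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(fmean(scores[lo:hi]))
    return smoothed


def ensemble(scores_a, scores_b):
    """Average the final per-residue scores of two models."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same residues")
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Invented per-residue probabilities from two hypothetical classifiers.
model1 = [0.1, 0.9, 0.8, 0.2]
model2 = [0.3, 0.7, 0.6, 0.4]

final = ensemble(smooth(model1, half_window=1), smooth(model2, half_window=1))
print(final)
```

Smoothing before combining reflects the order given in the abstract: each model's probabilities are post-processed first, and the two final scores are averaged afterwards.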


Prediction of MoRFs Based on n-gram Convolutional Neural Network

10.29007/5k4z ◽

2019 ◽

Author(s):

Fang Chun ◽

Yoshitaka Moriwaki ◽

Caihong Li ◽

Kentaro Shimizu

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Intrinsically Disordered Proteins ◽

Interaction Networks ◽

Disordered Proteins ◽

Single Model ◽

One Dimensional ◽

Intrinsically Disordered ◽

Machine Learning Model ◽

N Gram

MoRFs usually act as "hub" sites in the interaction networks of intrinsically disordered proteins. With more and more serious diseases being found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we introduce a multichannel convolutional neural network (CNN) model for MoRF prediction. The model expands the standard one-dimensional CNN by using multiple parallel CNNs that read the sequence with different n-gram sizes (groups of residues). In addition, we add an averaging step to refine the output of the machine learning model. When compared with other methods on the same dataset, our approach achieved a balanced accuracy of 0.682 and an AUC of 0.723, which is the best performance among the single model-based approaches.
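To make the idea of reading a sequence with different n-gram sizes concrete, the following Python sketch lists the overlapping residue groups that each parallel channel would see. It only illustrates the windowing; it is not the authors' network, and the function names, the chosen sizes (1, 2, 3), and the toy sequence are assumptions.

```python
def ngrams(sequence, n):
    """Return all overlapping n-grams (groups of n residues) of a sequence."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [sequence[i:i + n] for i in range(len(sequence) - n + 1)]


def multichannel_views(sequence, sizes=(1, 2, 3)):
    """Build one n-gram view of the sequence per channel.

    In a multichannel 1D CNN, each channel would correspond to a
    convolution whose kernel width equals the n-gram size; here we only
    enumerate the residue groups each kernel position would cover.
    """
    return {n: ngrams(sequence, n) for n in sizes}


views = multichannel_views("MKTAY")  # toy sequence, not from any dataset
for n, grams in views.items():
    print(n, grams)
```

Each channel covers the same sequence at a different granularity, which is what allows the parallel CNNs to pick up both single-residue and short-motif signals.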


MoRFPred_en: Sequence-based prediction of MoRFs using an ensemble learning strategy

Journal of Bioinformatics and Computational Biology ◽

10.1142/s0219720019400158 ◽

2019 ◽

Vol 17 (06) ◽

pp. 1940015

Author(s):

Chun Fang ◽

Yoshitaka Moriwaki ◽

Caihong Li ◽

Kentaro Shimizu

Keyword(s):

Neural Network ◽

Convolutional Neural Network ◽

Ensemble Learning ◽

Intrinsically Disordered Proteins ◽

Learning Strategy ◽

Disordered Proteins ◽

Support Vector ◽

One Dimensional ◽

Intrinsically Disordered ◽

Molecular Recognition Features

Molecular recognition features (MoRFs) usually act as “hub” sites in the interaction networks of intrinsically disordered proteins (IDPs). Because an increasing number of serious diseases have been found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we propose an ensemble learning strategy, named MoRFPred_en, to predict MoRFs from protein sequences. This approach combines four submodels that utilize different sequence-derived features for the prediction, including a multichannel one-dimensional convolutional neural network (CNN_1D multichannel) based model, two deep two-dimensional convolutional neural network (DCNN_2D) based models, and a support vector machine (SVM) based model. When compared with other methods on the same datasets, the MoRFPred_en approach produced better results than existing state-of-the-art MoRF prediction methods, achieving an AUC of 0.762 on the VALIDATION419 dataset, 0.795 on the TEST45 dataset, and 0.776 on the TEST49 dataset. Availability: http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/MoRFPred_en.php .
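The AUC values quoted here summarize how well per-residue scores rank MoRF residues above non-MoRF residues. As a minimal sketch of that metric, the Python function below computes the AUC as the fraction of (positive, negative) pairs in which the positive residue scores higher, counting ties as half. The labels and scores are invented for illustration and are unrelated to the VALIDATION419, TEST45, or TEST49 datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a MoRF residue, 0 otherwise.
    scores: predicted propensity for each residue.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")

    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Invented example: six residues, three of them labeled as MoRF.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of residues, so it suits small illustrations; rank-based implementations are preferable for full datasets.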


Faculty Opinions recommendation of Interaction between intrinsically disordered proteins frequently occurs in a human protein-protein interaction network.

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1164197.624888 ◽

2009 ◽

Author(s):

Vladimir Uversky

Keyword(s):

Protein Interaction ◽

Protein Interaction Network ◽

Intrinsically Disordered Proteins ◽

Interaction Network ◽

Human Protein ◽

Disordered Proteins ◽

Protein Protein Interaction ◽

Intrinsically Disordered ◽

Protein Protein Interaction Network


Sequence Versus Composition: What Prescribes IDP Biophysical Properties?

Entropy ◽

10.3390/e21070654 ◽

2019 ◽

Vol 21 (7) ◽

pp. 654 ◽

Cited By ~ 2

Author(s):

Jiří Vymětal ◽

Jiří Vondrášek ◽

Klára Hlouchová

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Intrinsically Disordered Proteins ◽

Random Sequence ◽

Bioinformatic Analysis ◽

Globular Proteins ◽

Disordered Proteins ◽

Compositional Bias ◽

Intrinsically Disordered ◽

Sequence Versus

Intrinsically disordered proteins (IDPs) represent a distinct class of proteins and are distinguished from globular proteins by conformational plasticity, high evolvability and a broad functional repertoire. Some of their properties are reminiscent of early proteins, but their abundance in eukaryotes, functional properties and compositional bias suggest that IDPs appeared at later evolutionary stages. The spectrum of IDP properties and their determinants are still not well defined. This study compares rudimentary physicochemical properties of IDPs and globular proteins using bioinformatic analysis on the level of their native sequences and random sequence permutations, addressing the contributions of composition versus sequence as determinants of the properties. IDPs have, on average, lower predicted secondary structure contents and aggregation propensities and biased amino acid compositions. However, our study shows that IDPs exhibit a broad range of these properties. Induced fold IDPs exhibit very similar compositions and secondary structure/aggregation propensities to globular proteins, and can be distinguished from unfoldable IDPs based on analysis of these sequence properties. While amino acid composition seems to be a major determinant of aggregation and secondary structure propensities, sequence randomization does not result in dramatic changes to these properties, but for both IDPs and globular proteins seems to fine-tune the tradeoff between folding and aggregation.
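The comparison between native sequences and random permutations rests on one fact: shuffling a sequence keeps its amino acid composition fixed while changing residue order. The Python sketch below illustrates this under stated assumptions. The toy sequence, the hydrophobic residue set, and the `max_run` property are hypothetical stand-ins chosen for illustration; they are not the predictors or properties used in the study.

```python
import random
from collections import Counter

# Illustrative residue set; an assumption, not the study's definition.
HYDROPHOBIC = set("AVILMFWY")


def composition(sequence):
    """Count how often each residue occurs (order-independent)."""
    return Counter(sequence)


def shuffled(sequence, rng):
    """Return a random permutation of the sequence with identical composition."""
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


def max_run(sequence, residue_set):
    """Length of the longest stretch of consecutive residues from residue_set.

    A toy order-dependent property: it can change under shuffling even
    though the composition cannot.
    """
    best = current = 0
    for residue in sequence:
        current = current + 1 if residue in residue_set else 0
        best = max(best, current)
    return best


native = "MKVLAAGIEEKDDSR"  # invented toy sequence
rng = random.Random(0)       # seeded for reproducibility
permuted = shuffled(native, rng)

print(composition(native) == composition(permuted))
print(max_run(native, HYDROPHOBIC), max_run(permuted, HYDROPHOBIC))
```

A property that stays the same across many permutations is determined by composition, whereas one that shifts depends on residue order, which is the distinction the abstract draws.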


Analysis of Heterodimeric “Mutual Synergistic Folding”-Complexes

International Journal of Molecular Sciences ◽

10.3390/ijms20205136 ◽

2019 ◽

Vol 20 (20) ◽

pp. 5136 ◽

Cited By ~ 3

Author(s):

Mentes ◽

Magyar ◽

Fichó ◽

Simon

Keyword(s):

Amino Acid ◽

Amino Acid Composition ◽

Acid Composition ◽

Intrinsically Disordered Proteins ◽

Driving Forces ◽

Disordered Proteins ◽

Subunit Interactions ◽

Amino Acid Compositions ◽

Intrinsically Disordered ◽

Β Sheet

Several intrinsically disordered proteins (IDPs) are capable of adopting stable structures without interacting with a folded partner. When the folding of all interacting partners happens at the same time, coupled with the interaction in a synergistic manner, the process is called Mutual Synergistic Folding (MSF). These complexes represent a discrete subset of IDPs. Recently, we collected information on their complexes and created the MFIB (Mutual Folding Induced by Binding) database. In a previous study, we compared homodimeric MSF complexes with homodimeric and monomeric globular proteins with similar amino acid sequence lengths. We concluded that MSF homodimers, compared to globular homodimeric proteins, have a greater solvent accessible main-chain surface area on the contact surface of the subunits, which becomes buried during dimerization. The main driving force of the folding is the mutual shielding of the water-accessible backbones, but the formation of further intermolecular interactions can also be relevant. In this paper, we report analyses of heterodimeric MSF complexes. Our results indicate that the amino acid composition of the heterodimeric MSF monomer subunits diverges slightly from that of globular monomer proteins, while after dimerization, the amino acid composition of the overall MSF complexes becomes more similar to the overall amino acid compositions of globular complexes. We found that inter-subunit interactions are strengthened and that, in addition to the shielding of the solvent accessible backbone, other factors might play an important role in the stabilization of the heterodimeric structures, such as the energy gain resulting from the interaction of the two subunits with different amino acid compositions. We suggest that the shielding of the β-sheet backbones and the formation of a buried structural core, along with the general strengthening of inter-subunit interactions, could together be the driving forces of MSF protein structural ordering upon dimerization.


Roles, Characteristics, and Analysis of Intrinsically Disordered Proteins: A Minireview

Life ◽

10.3390/life10120320 ◽

2020 ◽

Vol 10 (12) ◽

pp. 320

Author(s):

Frederik Lermyte

Keyword(s):

Amino Acid ◽

Amino Acid Sequence ◽

Neurological Disorders ◽

Intrinsically Disordered Proteins ◽

Intrinsic Disorder ◽

Disordered Proteins ◽

Dynamic Nature ◽

Intrinsically Disordered ◽

Unfolded State ◽

Underlying Causes

In recent years, there has been a growing understanding that a significant fraction of the eukaryotic proteome is intrinsically disordered, and that these conformationally dynamic proteins play a myriad of vital biological roles in both normal and pathological states. In this review, selected examples of intrinsically disordered proteins are highlighted, with particular attention to a few that are relevant in neurological disorders and in viral infection. Next, the underlying causes of intrinsic disorder are discussed, along with computational methods used to predict whether a given amino acid sequence is likely to adopt a folded or unfolded state in solution. Finally, biophysical methods for the analysis of intrinsically disordered proteins are discussed, as well as the unique challenges these proteins pose in this context due to their highly dynamic nature.


Exclusively Heteronuclear 13C-Detected Amino-Acid-Selective NMR Experiments for the Study of Intrinsically Disordered Proteins (IDPs)

ChemBioChem ◽

10.1002/cbic.201200447 ◽

2012 ◽

Vol 13 (16) ◽

pp. 2425-2432 ◽

Cited By ~ 35

Author(s):

Wolfgang Bermel ◽

Ivano Bertini ◽

Jordan Chill ◽

Isabella C. Felli ◽

Noam Haba ◽

...

Keyword(s):

Amino Acid ◽

Intrinsically Disordered Proteins ◽

Disordered Proteins ◽

Intrinsically Disordered


NOT THAT RIGID MIDGETS AND NOT SO FLEXIBLE GIANTS: ON THE ABUNDANCE AND ROLES OF INTRINSIC DISORDER IN SHORT AND LONG PROTEINS

Journal of Biological Systems ◽

10.1142/s0218339012400086 ◽

2012 ◽

Vol 20 (04) ◽

pp. 471-511 ◽

Cited By ~ 12

Author(s):

MARK HOWELL ◽

RYAN GREEN ◽

ALEXIS KILLEEN ◽

LAMAR WEDDERBURN ◽

VINCENT PICASCIO ◽

...

Keyword(s):

Amino Acid ◽

Intrinsically Disordered Proteins ◽

Biological Activities ◽

Intrinsic Disorder ◽

Amino Acid Sequences ◽

Disordered Proteins ◽

Biological Functions ◽

Intrinsically Disordered ◽

Eukaryotic Proteins ◽

Disordered Regions

Intrinsically disordered proteins or proteins with disordered regions are very common in nature. These proteins have numerous biological functions which are complementary to the biological activities of traditional ordered proteins. A noticeable difference in the amino acid sequences encoding long and short disordered regions was found and this difference was used in the development of length-dependent predictors of intrinsic disorder. In this study, we analyze the scaling of intrinsic disorder in eukaryotic proteins and investigate the presence of length-dependent functions attributed to proteins containing long disordered regions.


Conformational Entropy of Intrinsically Disordered Proteins from Amino Acid Triads

Scientific Reports ◽

10.1038/srep11740 ◽

2015 ◽

Vol 5 (1) ◽

Cited By ~ 24

Author(s):

Anupaul Baruah ◽

Pooja Rani ◽

Parbati Biswas

Keyword(s):

Amino Acid ◽

Intrinsically Disordered Proteins ◽

Disordered Proteins ◽

Conformational Entropy ◽

Intrinsically Disordered


Expanding the proteome: disordered and alternatively folded proteins

Quarterly Reviews of Biophysics ◽

10.1017/s0033583511000060 ◽

2011 ◽

Vol 44 (4) ◽

pp. 467-518 ◽

Cited By ~ 116

Author(s):

H. Jane Dyson

Keyword(s):

Intrinsically Disordered Proteins ◽

Significant Proportion ◽

Experimental Studies ◽

Three Dimensional ◽

Protein Sequences ◽

Open Reading Frames ◽

Disordered Proteins ◽

Characteristic Functions ◽

Intrinsically Disordered ◽

Folded Proteins

Proteins provide much of the scaffolding for life, as well as undertaking a variety of essential catalytic reactions. These characteristic functions have led us to presuppose that proteins are in general functional only when well structured and correctly folded. As we begin to explore the repertoire of possible protein sequences inherent in the human and other genomes, two stark facts that belie this supposition become clear: firstly, the number of apparent open reading frames in the human genome is significantly smaller than appears to be necessary to code for all of the diverse proteins in higher organisms, and secondly that a significant proportion of the protein sequences that would be coded by the genome would not be expected to form stable three-dimensional (3D) structures. Clearly the genome must include coding for a multitude of alternative forms of proteins, some of which may be partly or fully disordered or incompletely structured in their functional states. At the same time as this likelihood was recognized, experimental studies also began to uncover examples of important protein molecules and domains that were incompletely structured or completely disordered in solution, yet remained perfectly functional. In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence. Answers to the question ‘why would a particular domain need to be unstructured?’ are as varied as the systems where such domains are found. This review provides a survey of recent new directions in this field, and includes an evaluation of the role not only of intrinsically disordered proteins but also of partially structured and highly dynamic members of the disorder–order continuum.
