Identifying short disorder-to-order binding regions in disordered proteins with a deep convolutional neural network method

2019 ◽  
Vol 17 (01) ◽  
pp. 1950004 ◽  
Author(s):  
Chun Fang ◽  
Yoshitaka Moriwaki ◽  
Aikui Tian ◽  
Caihong Li ◽  
Kentaro Shimizu

Molecular recognition features (MoRFs) are key functional regions of intrinsically disordered proteins (IDPs), which play important roles in the molecular interaction network of cells and are implicated in many serious human diseases. Identifying MoRFs is essential for both functional studies of IDPs and drug design. This study applies deep learning to improve MoRF prediction. We propose a method named en_DCNNMoRF (ensemble deep convolutional neural network-based MoRF predictor). It combines the outcomes of two independent deep convolutional neural network (DCNN) classifiers that take advantage of different features. The first, DCNNMoRF1, employs a position-specific scoring matrix (PSSM) and 22 types of amino acid-related factors to describe protein sequences. The second, DCNNMoRF2, employs a PSSM and 13 types of amino acid indexes to describe protein sequences. For both single classifiers, a DCNN with a novel two-dimensional attention mechanism was adopted, and an averaging strategy was added to further process the output probabilities of each DCNN model. Finally, en_DCNNMoRF combines the two models by averaging their final scores. When compared with other well-known tools applied to the same datasets, the accuracy of the proposed method was comparable with that of state-of-the-art methods. The related web server can be accessed freely via http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/en_MoRFs.php .
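The two averaging steps described in this abstract can be illustrated with a minimal sketch. It assumes each classifier emits one probability per residue; the function names `smooth` and `ensemble`, the centered-window form of the averaging, and the window size of 3 are hypothetical choices for illustration, not details taken from the paper.

```python
def smooth(scores, window=5):
    """Replace each per-residue score by the mean over a centered window.

    The window is truncated at the sequence ends. The window size is an
    assumption for illustration, not a value reported in the abstract.
    """
    half = window // 2
    out = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def ensemble(scores_a, scores_b):
    """Average the final per-residue scores of two models."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must score the same residues")
    return [(a + b) / 2 for a, b in zip(scores_a, scores_b)]


# Hypothetical per-residue outputs of two classifiers for one short sequence.
model1 = [0.1, 0.9, 0.8, 0.2]
model2 = [0.3, 0.7, 0.6, 0.4]
final_scores = ensemble(smooth(model1, 3), smooth(model2, 3))
```

Smoothing before combining reflects the intuition that MoRFs are contiguous regions, so an isolated high or low score within a run of residues is more likely to be noise than signal.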

10.29007/5k4z ◽  
2019 ◽  
Author(s):  
Fang Chun ◽  
Yoshitaka Moriwaki ◽  
Caihong Li ◽  
Kentaro Shimizu

MoRFs usually act as "hub" sites in the interaction networks of intrinsically disordered proteins. As more and more serious diseases are found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we introduce a multichannel convolutional neural network (CNN) model for MoRF prediction. This model is generated by expanding the standard one-dimensional CNN model using multiple parallel CNNs that read the sequence with different n-gram sizes (groups of residues). In addition, we add an averaging step to refine the output of the machine learning model. When compared with other methods on the same dataset, our approach achieved a balanced accuracy of 0.682 and an AUC of 0.723, which is the best performance among the single model-based approaches.
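To make the idea of reading a sequence with different n-gram sizes concrete, the sketch below builds one residue-centered window per position for each of several window sizes, which is the kind of input each parallel channel would see. The sizes (3, 5, 7), the padding symbol `X`, and the function names are assumptions for illustration; the abstract does not specify them, and no network is trained here.

```python
def ngram_windows(sequence, n, pad="X"):
    """Return one length-n window centered on each residue (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so the window has a center residue")
    half = n // 2
    padded = pad * half + sequence + pad * half
    return [padded[i:i + n] for i in range(len(sequence))]


def multichannel_views(sequence, sizes=(3, 5, 7)):
    """Map each n-gram size to its list of windows over the same sequence."""
    return {n: ngram_windows(sequence, n) for n in sizes}


# Each channel yields exactly one window per residue, so their outputs
# can be aligned position by position before being merged.
views = multichannel_views("MKVL")
```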


2019 ◽  
Vol 17 (06) ◽  
pp. 1940015
Author(s):  
Chun Fang ◽  
Yoshitaka Moriwaki ◽  
Caihong Li ◽  
Kentaro Shimizu

Molecular recognition features (MoRFs) usually act as “hub” sites in the interaction networks of intrinsically disordered proteins (IDPs). Because an increasing number of serious diseases have been found to be associated with disordered proteins, identifying MoRFs has become increasingly important. In this study, we propose an ensemble learning strategy, named MoRFPred_en, to predict MoRFs from protein sequences. This approach combines four submodels that utilize different sequence-derived features for the prediction, including a multichannel one-dimensional convolutional neural network (CNN_1D multichannel) based model, two deep two-dimensional convolutional neural network (DCNN_2D) based models, and a support vector machine (SVM) based model. When compared with other methods on the same datasets, the MoRFPred_en approach produced better results than existing state-of-the-art MoRF prediction methods, achieving an AUC of 0.762 on the VALIDATION419 dataset, 0.795 on the TEST45 dataset, and 0.776 on the TEST49 dataset. Availability: http://vivace.bi.a.u-tokyo.ac.jp:8008/fang/MoRFPred_en.php .
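For readers unfamiliar with the AUC figures quoted above, the following sketch computes the area under the ROC curve as the fraction of (MoRF residue, non-MoRF residue) pairs in which the MoRF residue receives the higher score, with ties counted as half. The function name `auc` and the toy labels and scores are hypothetical; this is a generic definition of the metric, not the authors' evaluation code.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for positive (e.g. MoRF) residues, 0 for negative residues.
    scores: predicted score for each residue, in the same order.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy example: one of the four positive/negative pairs is ranked wrongly.
example_auc = auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

The pairwise form is quadratic in the number of residues, which is fine for illustration but slower than rank-based implementations on large datasets.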


Entropy ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. 654 ◽  
Author(s):  
Jiří Vymětal ◽  
Jiří Vondrášek ◽  
Klára Hlouchová

Intrinsically disordered proteins (IDPs) represent a distinct class of proteins and are distinguished from globular proteins by conformational plasticity, high evolvability and a broad functional repertoire. Some of their properties are reminiscent of early proteins, but their abundance in eukaryotes, functional properties and compositional bias suggest that IDPs appeared at later evolutionary stages. The spectrum of IDP properties and their determinants are still not well defined. This study compares rudimentary physicochemical properties of IDPs and globular proteins using bioinformatic analysis on the level of their native sequences and random sequence permutations, addressing the contributions of composition versus sequence as determinants of the properties. IDPs have, on average, lower predicted secondary structure contents and aggregation propensities and biased amino acid compositions. However, our study shows that IDPs exhibit a broad range of these properties. Induced fold IDPs exhibit very similar compositions and secondary structure/aggregation propensities to globular proteins, and can be distinguished from unfoldable IDPs based on analysis of these sequence properties. While amino acid composition seems to be a major determinant of aggregation and secondary structure propensities, sequence randomization does not result in dramatic changes to these properties, but for both IDPs and globular proteins seems to fine-tune the tradeoff between folding and aggregation.
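The comparison of native sequences with their random permutations rests on the fact that shuffling preserves amino acid composition while destroying residue order. A minimal sketch of that control, with hypothetical function names and an arbitrary example sequence, might look like this:

```python
import random
from collections import Counter


def composition(sequence):
    """Fraction of each amino acid in the sequence."""
    counts = Counter(sequence)
    total = len(sequence)
    return {aa: count / total for aa, count in counts.items()}


def permute(sequence, seed=None):
    """Return a random permutation of the sequence (same composition)."""
    rng = random.Random(seed)
    residues = list(sequence)
    rng.shuffle(residues)
    return "".join(residues)


native = "MKVLAAGE"  # arbitrary illustrative sequence, not from the study
shuffled = permute(native, seed=0)
# Composition is identical by construction, so any change in a predicted
# property between native and shuffled can be attributed to residue order.
same_composition = composition(native) == composition(shuffled)
```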


2019 ◽  
Vol 20 (20) ◽  
pp. 5136 ◽  
Author(s):  
Mentes ◽  
Magyar ◽  
Fichó ◽  
Simon

Several intrinsically disordered proteins (IDPs) are capable of adopting stable structures without interacting with a folded partner. When the folding of all interacting partners happens at the same time, coupled with the interaction in a synergistic manner, the process is called Mutual Synergistic Folding (MSF). These complexes represent a discrete subset of IDPs. Recently, we collected information on their complexes and created the MFIB (Mutual Folding Induced by Binding) database. In a previous study, we compared homodimeric MSF complexes with homodimeric and monomeric globular proteins with similar amino acid sequence lengths. We concluded that MSF homodimers, compared to globular homodimeric proteins, have a greater solvent-accessible main-chain surface area on the contact surface of the subunits, which becomes buried during dimerization. The main driving force of the folding is the mutual shielding of the water-accessible backbones, but the formation of further intermolecular interactions can also be relevant. In this paper, we report analyses of heterodimeric MSF complexes. Our results indicate that the amino acid composition of the heterodimeric MSF monomer subunits slightly diverges from that of globular monomer proteins, while after dimerization, the amino acid composition of the overall MSF complexes becomes more similar to the overall amino acid compositions of globular complexes. We found that inter-subunit interactions are strengthened and that, in addition to the shielding of the solvent-accessible backbone, other factors might play an important role in the stabilization of the heterodimeric structures, such as the energy gain resulting from the interaction of two subunits with different amino acid compositions. We suggest that the shielding of the β-sheet backbones and the formation of a buried structural core, along with the general strengthening of inter-subunit interactions, could together be the driving forces of MSF protein structural ordering upon dimerization.


Life ◽  
2020 ◽  
Vol 10 (12) ◽  
pp. 320
Author(s):  
Frederik Lermyte

In recent years, there has been a growing understanding that a significant fraction of the eukaryotic proteome is intrinsically disordered, and that these conformationally dynamic proteins play a myriad of vital biological roles in both normal and pathological states. In this review, selected examples of intrinsically disordered proteins are highlighted, with particular attention to a few that are relevant in neurological disorders and in viral infection. Next, the underlying causes of intrinsic disorder are discussed, along with computational methods used to predict whether a given amino acid sequence is likely to adopt a folded or unfolded state in solution. Finally, biophysical methods for the analysis of intrinsically disordered proteins are discussed, as well as the unique challenges these proteins pose in this context due to their highly dynamic nature.


ChemBioChem ◽  
2012 ◽  
Vol 13 (16) ◽  
pp. 2425-2432 ◽  
Author(s):  
Wolfgang Bermel ◽  
Ivano Bertini ◽  
Jordan Chill ◽  
Isabella C. Felli ◽  
Noam Haba ◽  
...  

2012 ◽  
Vol 20 (04) ◽  
pp. 471-511 ◽  
Author(s):  
MARK HOWELL ◽  
RYAN GREEN ◽  
ALEXIS KILLEEN ◽  
LAMAR WEDDERBURN ◽  
VINCENT PICASCIO ◽  
...  

Intrinsically disordered proteins or proteins with disordered regions are very common in nature. These proteins have numerous biological functions which are complementary to the biological activities of traditional ordered proteins. A noticeable difference in the amino acid sequences encoding long and short disordered regions was found and this difference was used in the development of length-dependent predictors of intrinsic disorder. In this study, we analyze the scaling of intrinsic disorder in eukaryotic proteins and investigate the presence of length-dependent functions attributed to proteins containing long disordered regions.


2011 ◽  
Vol 44 (4) ◽  
pp. 467-518 ◽  
Author(s):  
H. Jane Dyson

Proteins provide much of the scaffolding for life, as well as undertaking a variety of essential catalytic reactions. These characteristic functions have led us to presuppose that proteins are in general functional only when well structured and correctly folded. As we begin to explore the repertoire of possible protein sequences inherent in the human and other genomes, two stark facts that belie this supposition become clear: firstly, the number of apparent open reading frames in the human genome is significantly smaller than appears to be necessary to code for all of the diverse proteins in higher organisms, and secondly that a significant proportion of the protein sequences that would be coded by the genome would not be expected to form stable three-dimensional (3D) structures. Clearly the genome must include coding for a multitude of alternative forms of proteins, some of which may be partly or fully disordered or incompletely structured in their functional states. At the same time as this likelihood was recognized, experimental studies also began to uncover examples of important protein molecules and domains that were incompletely structured or completely disordered in solution, yet remained perfectly functional. In the ensuing years, we have seen an explosion of experimental and genome-annotation studies that have mapped the extent of the intrinsic disorder phenomenon and explored the possible biological rationales for its widespread occurrence. Answers to the question ‘why would a particular domain need to be unstructured?’ are as varied as the systems where such domains are found. This review provides a survey of recent new directions in this field, and includes an evaluation of the role not only of intrinsically disordered proteins but also of partially structured and highly dynamic members of the disorder–order continuum.

