Uncovering non-random sequence patterns within intrinsically disordered proteins

2021 ◽  
Author(s):  
Megan C. Cohan ◽  
Min Kyung Shinn ◽  
Jared M. Lalmansingh ◽  
Rohit V. Pappu

Intrinsically disordered proteins / regions (IDPs / IDRs) pose unique challenges for deriving sequence-function relationships from multiple sequence alignments. These challenges arise from variations in sequence lengths, similarities, and identities across orthologs. Recent computational efforts have demonstrated the utility of comparing large numbers of distinct sequence features as a strategy to identify conserved sequence-function relationships in IDPs / IDRs. Inspired by these efforts, and by biophysical studies that have established the importance of binary patterning features in IDPs / IDRs, we present here a computational method, NARDINI (Non-random Arrangement of Residues in Disordered Regions Inferred using Numerical Intermixing), to uncover truly non-random binary patterns within disordered proteins / regions. Binary patterns refer to the linear clustering or dispersion of specific residues or residue types with respect to all other residues or specific types of residues. Our approach neither uses nor requires sequence alignments. Instead, for each IDR, we generate an ensemble of scrambled sequences and use this ensemble to set up expectations from a composition-specific null model for the patterning parameters of interest. We annotate each IDR in terms of pattern-specific z-score matrices by computing how specific patterns deviate from the null model. The z-scores help in identifying the non-random linear sequence patterns within an IDR. We tested the accuracy of NARDINI-derived z-scores by assessing their ability to identify sequence patterns that have been identified as determinants of sequence-function relationships in specific IDPs / IDRs.
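The scrambling procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration and not NARDINI's implementation: the `blockiness` score is a hypothetical stand-in for the patterning parameters the authors use, and the function names, window size, and number of scrambles are assumptions chosen for the example.

```python
import random
import statistics


def blockiness(seq, residues, window=5):
    """Hypothetical patterning score (an assumption, not a NARDINI parameter):
    variance of the fraction of `residues` across sliding windows.
    Clustered residues give a high value, well-mixed residues a low one."""
    fractions = [
        sum(ch in residues for ch in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]
    return statistics.pvariance(fractions)


def pattern_zscore(seq, residues, n_scrambles=500, window=5, seed=0):
    """z-score of the observed score against a composition-preserving null
    built from shuffled copies of the same sequence."""
    rng = random.Random(seed)
    observed = blockiness(seq, residues, window)

    chars = list(seq)
    null_scores = []
    for _ in range(n_scrambles):
        rng.shuffle(chars)  # scrambling keeps composition, destroys order
        null_scores.append(blockiness("".join(chars), residues, window))

    mean = statistics.fmean(null_scores)
    spread = statistics.pstdev(null_scores)
    if spread == 0:
        return 0.0  # every scramble scored the same, so no deviation is defined
    return (observed - mean) / spread


# Made-up toy sequences: lysines fully segregated versus strictly alternating.
segregated = "K" * 10 + "G" * 10
alternating = "KG" * 10
z_segregated = pattern_zscore(segregated, set("K"))
z_alternating = pattern_zscore(alternating, set("K"))
```

Under these assumptions, a positive z-score marks residues that are more clustered than the scrambled ensemble would suggest, and a negative one marks residues that are more evenly dispersed. Repeating the calculation for every pair of residue types would give a matrix of z-scores in the spirit of the one the abstract describes.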

Entropy ◽  
2019 ◽  
Vol 21 (7) ◽  
pp. 654 ◽  
Author(s):  
Jiří Vymětal ◽  
Jiří Vondrášek ◽  
Klára Hlouchová

Intrinsically disordered proteins (IDPs) represent a distinct class of proteins and are distinguished from globular proteins by conformational plasticity, high evolvability and a broad functional repertoire. Some of their properties are reminiscent of early proteins, but their abundance in eukaryotes, functional properties and compositional bias suggest that IDPs appeared at later evolutionary stages. The spectrum of IDP properties and their determinants are still not well defined. This study compares rudimentary physicochemical properties of IDPs and globular proteins using bioinformatic analysis on the level of their native sequences and random sequence permutations, addressing the contributions of composition versus sequence as determinants of the properties. IDPs have, on average, lower predicted secondary structure contents and aggregation propensities and biased amino acid compositions. However, our study shows that IDPs exhibit a broad range of these properties. Induced fold IDPs exhibit very similar compositions and secondary structure/aggregation propensities to globular proteins, and can be distinguished from unfoldable IDPs based on analysis of these sequence properties. While amino acid composition seems to be a major determinant of aggregation and secondary structure propensities, sequence randomization does not result in dramatic changes to these properties, but for both IDPs and globular proteins seems to fine-tune the tradeoff between folding and aggregation.


2018 ◽  
Author(s):  
Ricardo J. Cordeiro Rodrigues ◽  
António Miguel de Jesus Domingues ◽  
Svenja Hellmann ◽  
Sabrina Dietz ◽  
Bruno F. M. de Albuquerque ◽  
...  

Piwi proteins are important for germ cell development in almost all animals studied thus far. These proteins are guided to specific targets, such as transposable elements, by small guide RNAs, often referred to as piRNAs, or 21U RNAs in C. elegans. In this organism, even though genetic screens have uncovered a number of potential 21U RNA biogenesis factors, little is known about how these factors interact or what they do. Based on the previously identified 21U biogenesis factor PID-1, we here define a novel protein complex, PETISCO, that is required for 21U RNA biogenesis. PETISCO contains both potential 5’-cap and 5’-phosphate RNA binding domains, suggesting involvement in 5’ end processing. We define the interaction architecture of PETISCO and reveal a second function for PETISCO in embryonic development. This essential function of PETISCO is not mediated by PID-1, but by TOST-1. Vice versa, TOST-1 is not involved in 21U RNA biogenesis. Both PID-1 and TOST-1 are small, intrinsically disordered proteins that interact directly with the PETISCO protein ERH-2 (enhancer of rudimentary homolog 2) using a conserved sequence motif. Finally, our data suggest an important role for TOST-1:PETISCO in SL1 homeostasis in the early embryo. Our work describes the first molecular platform for 21U RNA production in C. elegans, and strengthens the view that 21U RNA biogenesis is built upon a much more widely used, snRNA-related pathway.


2021 ◽  
Vol 22 (12) ◽  
pp. 6190
Author(s):  
Nikoletta Murvai ◽  
Lajos Kalmar ◽  
Beata Szabo ◽  
Eva Schad ◽  
András Micsonai ◽  
...  

Disordered plant chaperones play key roles in helping plants survive in harsh conditions, and they are indispensable for seeds to remain viable. Aside from well-known and thoroughly characterized globular chaperone proteins, there are a number of intrinsically disordered proteins (IDPs) that can also serve as highly effective protecting agents in the cells. One of the largest groups of disordered chaperones is the group of dehydrins, proteins that are expressed at high levels under different abiotic stress conditions, such as drought, high temperature, or osmotic stress. Dehydrins are characterized by the presence of different conserved sequence motifs that also serve as the basis for their categorization. Despite their accepted importance, the exact role and relevance of the conserved regions have not yet been formally addressed. Here, we explored the involvement of each conserved segment in the protective function of the A. thaliana intrinsically disordered stress protein (IDSP) Early Response to Dehydration 14 (ERD14). We show that segments that are directly involved in partner binding, and others that are not, are equally necessary for proper function and that cellular protection emerges from the balanced interplay of different regions of ERD14.


2018 ◽  
Vol 19 (9) ◽  
pp. 2483 ◽  
Author(s):  
Yumeng Liu ◽  
Xiaolong Wang ◽  
Bin Liu

Accurate prediction of intrinsically disordered proteins/regions is one of the most important tasks in bioinformatics, and some computational predictors have been proposed to solve this problem. How to efficiently incorporate the sequence-order effect is critical for constructing an accurate predictor because disordered region distributions show global sequence patterns. In order to capture these sequence patterns, several sequence labelling models have been applied to this field, such as conditional random fields (CRFs). However, these methods suffer from certain disadvantages. In this study, we proposed a new computational predictor called IDP–CRF, which is trained on an updated benchmark dataset based on the MobiDB database and the DisProt database, and incorporates more comprehensive sequence-based features, including PSSMs (position-specific scoring matrices), kmer, predicted secondary structures, and relative solvent accessibilities. Experimental results on the benchmark dataset and two independent datasets show that IDP–CRF outperforms 25 existing state-of-the-art methods in this field, demonstrating that IDP–CRF is a very useful tool for identifying IDPs/IDRs (intrinsically disordered proteins/regions). We anticipate that IDP–CRF will facilitate the development of protein sequence analysis.
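Sequence-labelling models such as CRFs take one feature set per residue. The sketch below shows how a k-mer feature of the kind this abstract mentions might be built. It is an assumption-laden illustration, not the IDP–CRF feature pipeline: the function name, padding character, and feature keys are hypothetical, and PSSMs, predicted secondary structure, and solvent accessibility are omitted because they require external tools.

```python
def kmer_features(seq, k=3, pad="-"):
    """Return one feature dict per residue.

    Each dict holds the residue itself, the k-mer surrounding that residue
    (the sequence is padded at both ends so terminal residues also get a
    full-length k-mer), and the relative position along the chain.
    An odd k keeps the k-mer centred on the residue.
    """
    half = k // 2
    padded = pad * half + seq + pad * half
    last_index = max(len(seq) - 1, 1)  # avoid division by zero for length-1 input
    features = []
    for i, residue in enumerate(seq):
        features.append({
            "residue": residue,
            "kmer": padded[i:i + k],
            "relative_position": i / last_index,
        })
    return features


# Made-up four-residue example.
example = kmer_features("MKVL")
```

A list of dicts like `example`, paired with a per-residue ordered/disordered label sequence, is the general input shape a CRF toolkit would expect. The neighbouring residues carried in each k-mer are one simple way to expose local sequence-order information to the model.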


2021 ◽  
Vol 3 ◽  
pp. e20
Author(s):  
Naoki Matsuo ◽  
Natsuko Goda ◽  
Takeshi Tenno ◽  
Hidekazu Hiroaki

Background: Intrinsically disordered proteins (IDPs) have been shown to exhibit cryoprotective activity toward other cellular enzymes without any obvious conserved sequence motifs. This study investigated relationships between the physical properties of several human genome-derived IDPs and their cryoprotective activities. Methods: The cryoprotective activity of three human genome-derived IDPs and their truncated peptides toward lactate dehydrogenase (LDH) and glutathione S-transferase (GST) was examined. After the shortest cryoprotective peptide was defined (named FK20), the cryoprotective activity of the all-D-enantiomeric isoform of FK20 (FK20-D), as well as that of a racemic mixture of FK20 and FK20-D, was examined. To test whether the peptides increase the thermal stability of the target enzymes, the CD spectra of GST and LDH in the presence of a racemic mixture of FK20 and FK20-D were measured at varying temperatures and used to estimate Tm. Results: The cryoprotective activity of IDPs longer than 20 amino acids was nearly independent of peptide length. The shortest IDP-derived peptide with sufficient cryoprotective activity, a 20-amino-acid peptide named FK20, was developed from a series of TNFRSF11B fragments. FK20, FK20-D, and an equimolar mixture of FK20 and FK20-D showed similar cryoprotective activity toward LDH and GST. The Tm of GST was similar in the presence and absence of an equimolar mixture of FK20 and FK20-D, suggesting that the cryoprotective mechanism of IDPs arises partly from a molecular shielding effect rather than from direct interaction with the target enzymes.


2020 ◽  
Vol 117 (38) ◽  
pp. 23356-23364 ◽  
Author(s):  
Micayla A. Bowman ◽  
Joshua A. Riback ◽  
Anabel Rodriguez ◽  
Hongyu Guo ◽  
Jun Li ◽  
...  

Much attention is being paid to conformational biases in the ensembles of intrinsically disordered proteins. However, it is currently unknown whether or how conformational biases within the disordered ensembles of foldable proteins affect function in vivo. Recently, we demonstrated that water can be a good solvent for unfolded polypeptide chains, even those with a hydrophobic and charged sequence composition typical of folded proteins. These results run counter to the generally accepted model that protein folding begins with hydrophobicity-driven chain collapse. Here we investigate what other features, beyond amino acid composition, govern chain collapse. We found that local clustering of hydrophobic and/or charged residues leads to significant collapse of the unfolded ensemble of pertactin, a secreted autotransporter virulence protein from Bordetella pertussis, as measured by small angle X-ray scattering (SAXS). Sequence patterns that lead to collapse also correlate with increased intermolecular polypeptide chain association and aggregation. Crucially, sequence patterns that support an expanded conformational ensemble enhance pertactin secretion to the bacterial cell surface. Similar sequence pattern features are enriched across the large and diverse family of autotransporter virulence proteins, suggesting sequence patterns that favor an expanded conformational ensemble are under selection for efficient autotransporter protein secretion, a necessary prerequisite for virulence. More broadly, we found that sequence patterns that lead to more expanded conformational ensembles are enriched across water-soluble proteins in general, suggesting protein sequences are under selection to regulate collapse and minimize protein aggregation, in addition to their roles in stabilizing folded protein structures.


2019 ◽  
Author(s):  
Alex S Holehouse ◽  
Shahar Sukenik

Intrinsically disordered proteins or regions (IDRs) differ from their well-folded counterparts by lacking a stable tertiary state. Instead, IDRs exist in an ensemble of conformations and often possess localized, loosely held residual structure that can be a key determinant of their activity. With no extensive network of non-covalent bonds and a high propensity for exposed surface areas, the various features of an IDR’s ensemble – including local residual structure and global conformational biases – are an emergent property of both the amino acid sequence and the solution environment. Here, we attempt to understand how shifting solution conditions can alter an IDR’s ensemble. We present an efficient computational method for altering solution-protein interactions, which we term Solution Space (SolSpace) Scanning. SolSpace scanning uses all-atom Monte Carlo simulations to construct ensembles under a wide range of distinct solution conditions. By tuning the interactions of specific protein moieties with the solution in a systematic manner, we can both enhance and reduce local residual structure. This approach allows the ‘design’ of distinct residual structures in IDRs, offering an alternative approach to mutational studies for exploring sequence-to-ensemble relationships. Our results raise the possibility of solution-based regulation of protein functions both outside and within the dynamic solution environment of cells.

