Assessing the low complexity of protein sequences via the low complexity triangle

PLoS ONE ◽  
2020 ◽  
Vol 15 (12) ◽  
pp. e0239154
Author(s):  
Pablo Mier ◽  
Miguel A. Andrade-Navarro

Background: Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition deviates from the expected composition, determined proteome-wide, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat.

Results: We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins with extreme compositions. We apply a representation called the ‘low complexity triangle’ as a proof of concept to represent the measured low complexity values. Results show that proteomes have distinct signatures in the low complexity triangle, and that these signatures are associated with complexity features of the sequences. We developed a web tool called LCT (http://cbdm-01.zdv.uni-mainz.de/~munoz/lct/) to allow users to calculate the low complexity triangle of a given protein or region of interest.

Conclusions: The low complexity triangle proves to be a suitable procedure to represent the general low complexity of a sequence or protein dataset. Homorepeats, direpeats, compositionally biased regions and globular regions occupy characteristic positions in the triangle. The described pipeline can be used to characterize LCRs and may help in quantifying the content of degenerated tandem repeats in proteins and proteomes.
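Of the two measures named in the abstract, the fraction of the most frequent amino acid is simple enough to sketch directly. The Python snippet below is an illustration only, not the authors' LCT code: the function name `most_frequent_fraction` and the example sequences are hypothetical, and the repeatability measure (the RES algorithm) is not reproduced here because the abstract does not describe it.

```python
from collections import Counter
from typing import Tuple


def most_frequent_fraction(region: str) -> Tuple[str, float]:
    """Return the most frequent residue in `region` and its share of the region length.

    Hypothetical helper for illustration; not part of the LCT tool.
    """
    region = region.upper()
    if not region:
        raise ValueError("region must not be empty")
    # most_common(1) yields the single (residue, count) pair with the highest count.
    residue, count = Counter(region).most_common(1)[0]
    return residue, count / len(region)


if __name__ == "__main__":
    # Made-up example regions: a homorepeat and a direpeat.
    print(most_frequent_fraction("QQQQQQQQQQ"))
    print(most_frequent_fraction("SRSRSRSRSR"))
```

A pure homorepeat gives a fraction of 1.0 and a perfect direpeat gives 0.5, which is consistent with the abstract's statement that these region types occupy characteristic positions in the triangle.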

2018 ◽  
Author(s):  
Sean M. Cascarina ◽  
Eric D. Ross

Abstract: Proteins with low-complexity domains continue to emerge as key players in both normal and pathological cellular processes. Although low-complexity domains are often grouped into a single class, individual low-complexity domains can differ substantially with respect to amino acid composition. These differences may strongly influence the physical properties, cellular regulation, and molecular functions of low-complexity domains. Therefore, we developed a bioinformatic approach to explore relationships between amino acid composition, protein metabolism, and protein function. We find that local compositional enrichment within protein sequences affects the translation efficiency, abundance, half-life, subcellular localization, and molecular functions of proteins on a proteome-wide scale. However, these effects depend upon the type of amino acid enriched in a given sequence, highlighting the importance of distinguishing between different types of low-complexity domains. Furthermore, many of these effects are discernible at amino acid compositions below those required for classification as low-complexity or statistically biased by traditional methods, and in the absence of homopolymeric amino acid repeats, indicating that thresholds employed by classical methods may not reflect biologically relevant criteria. Application of our analyses to composition-driven processes, such as the formation of membraneless organelles, reveals distinct composition profiles even for closely related organelles. Collectively, these results provide a unique perspective and detailed insights into relationships between amino acid composition, protein metabolism, and protein functions.

Author Summary: Low-complexity domains in protein sequences are regions that are composed of only a few amino acids in the protein “alphabet”. These domains often have unique chemical properties and play important biological roles in both normal and disease-related processes. While a number of approaches have been developed to define low-complexity domains, these methods each possess conceptual limitations. Therefore, we developed a complementary approach that focuses on local amino acid composition (i.e. the amino acid composition within small regions of proteins). We find that high local composition of individual amino acids is associated with pervasive effects on protein metabolism, subcellular localization, and molecular function on a proteome-wide scale. Importantly, the nature of the effects depends on the type of amino acid enriched within the examined domains, and the effects are observable in the absence of classically defined low-complexity (and related) domains. Furthermore, we define the compositions of proteins involved in the formation of membraneless, protein-rich organelles such as stress granules and P-bodies. Our results provide a coherent view and unprecedented resolution of the effects of local amino acid enrichment on protein biology.
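The idea of local amino acid composition within small regions can be illustrated with a sliding-window scan. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the function name `local_enrichment`, the window size, and the threshold are hypothetical choices made for illustration, since the abstract does not give the values used.

```python
from typing import List, Tuple


def local_enrichment(seq: str, residue: str, window: int = 20,
                     threshold: float = 0.4) -> List[Tuple[int, float]]:
    """Return (start, fraction) for every window in which `residue` reaches `threshold`.

    `window` and `threshold` are illustrative defaults, not values from the paper.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    residue = residue.upper()
    hits = []
    # Slide a fixed-length window along the sequence, one residue at a time.
    for start in range(len(seq) - window + 1):
        fraction = seq[start:start + window].count(residue) / window
        if fraction >= threshold:
            hits.append((start, fraction))
    return hits


if __name__ == "__main__":
    # Made-up sequence with a glutamine-rich stretch.
    example = "MKTQQQQQQQQQQLAVDE"
    for start, fraction in local_enrichment(example, "Q", window=10, threshold=0.8):
        print(start, fraction)
```

Scanning one residue type at a time reflects the abstract's point that the effects depend on which amino acid is enriched, so each residue type would be scanned separately.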


2021 ◽  
Author(s):  
Sarah N. Medley ◽  
Alyssa Beaudet ◽  
Helen Piontkivska ◽  
Fabia U. Battistuzzi

Abstract: Despite decades-long efforts to eradicate malaria, the genomic complexity and variability of the pathogens continue to pose major challenges for vaccine and drug development. Here we examined the evolutionary history of epitopes and epitope-like regions to determine whether they share underlying evolutionary mechanisms and potential functions that are relevant to pathogen interactions with the host immune response. Our comparative sequence analyses contrasted patterns of sequence conservation, amino acid composition, and protein structure of epitopes and low complexity regions (LCRs) in 21 Plasmodium species. Our results revealed many similarities in amino acid composition and preferred secondary structures between epitopes and LCRs; however, we also identified differences in evolutionary trends, where LCRs exhibit overall lower sequence conservation and higher disorder. We also found that both epitopes and LCRs have a wide array of configurations, with various levels of sequence conservation and structural order. We propose that such a combination of different levels of conservation and structural order between epitopes and LCRs in the same gene plays a role in maintaining the functional integrity required by the pathogen along with the variability necessary to evade the host immune response, with LCRs playing a role in the evasion particularly in the vicinity of conserved epitopes. Overall, our results suggest that there are at least two categories of LCRs, where some LCRs play a potential protective role for conserved (ordered) epitopes because of their variable (or disordered) sequence, while others are less disordered and are as conserved as epitopes. The former may be an evolutionary necessity for Plasmodium to maintain the diversity of epitopes, while the latter category may serve currently unknown function(s) and deserves to be examined in greater detail. Our findings show that there may be many more candidate targets for future anti-malarial treatments than initially thought and that some of these targets may work across strains and species.


1989 ◽  
Vol 9 (1) ◽  
pp. 268-277 ◽  
Author(s):  
R W Graham ◽  
D Jones ◽  
E P Candido

Ubiquitin is a multifunctional 76-amino-acid protein which plays critical roles in many aspects of cellular metabolism. In Caenorhabditis elegans, the major source of ubiquitin RNA is the polyubiquitin locus, UbiA. UbiA is transcribed as a polycistronic mRNA which contains 11 tandem repeats of ubiquitin sequence and possesses a 2-amino-acid carboxy-terminal extension on the final repeat. The UbiA locus possesses several unusual features not seen in the ubiquitin genes of other organisms studied to date. Mature UbiA mRNA acquires a 22-nucleotide leader sequence via a trans-splicing reaction involving a 100-nucleotide splice leader RNA derived from a different chromosome. UbiA is also unique among known polyubiquitin genes in containing four cis-spliced introns within its coding sequence. Thus, UbiA is one of a small class of genes found in higher eucaryotes whose heterogeneous nuclear RNA undergoes both cis and trans splicing. The putative promoter region of UbiA contains a number of potential regulatory elements: (i) a cytosine-rich block, (ii) two sequences resembling the heat shock regulatory element, and (iii) a palindromic sequence with homology to the DNA-binding site of the mammalian steroid hormone receptor. The expression of the UbiA gene has been studied under various heat shock conditions and has been monitored during larval moulting and throughout the major stages of development. These studies indicate that the expression of the UbiA gene is not inducible by acute or chronic heat shock and does not appear to be under nutritional or developmental regulation in C. elegans.


Gene ◽  
2006 ◽  
Vol 378 ◽  
pp. 19-30 ◽  
Author(s):  
Mark A. DePristo ◽  
Martine M. Zilversmit ◽  
Daniel L. Hartl

2012 ◽  
Vol 13 (S3) ◽  
Author(s):  
Marco Pellegrini ◽  
Maria Elena Renda ◽  
Alessio Vecchio

1993 ◽  
Vol 294 (2) ◽  
pp. 465-472 ◽  
Author(s):  
T Ohsumi ◽  
T Ichimura ◽  
H Sugano ◽  
S Omata ◽  
T Isobe ◽  
...  

Protein p34 is a non-glycosylated membrane protein characteristic of rough microsomes and is believed to play a role in the ribosome-membrane association. In the present study we isolated cDNA encoding p34 from a rat liver cDNA library and determined its complete amino acid sequence. p34 mRNA is 3.2 kb long and encodes a polypeptide of 307 amino acids with a molecular mass of about 34.9 kDa. Primary sequence analysis, coupled with biochemical studies on the topology, suggested that p34 is a type II signal-anchor protein; it is composed of a large cytoplasmic domain, a membrane-spanning segment and a 38-amino-acid-long luminally disposed C-terminus. The cytoplasmic domain of p34 has several noteworthy structural features, including a region of 4.5 tandem repeats of 23-24 amino acids. The repeated motif shows structural similarity to the leucine-rich repeat which is found in a variety of proteins widely distributed among eukaryotic cells and which potentially functions in mediating protein-protein interactions. The cytoplasmic domain also contains a characteristic hydrophilic region with abundant charged amino acids. These structural regions may be important for the observed ribosome-binding activity of the p34 protein.


2021 ◽  
Vol 22 (13) ◽  
pp. 7096
Author(s):  
Valentina Rudenko ◽  
Eugene Korotkov

We report a Method to Search for Highly Divergent Tandem Repeats (MSHDTR) in protein sequences which considers pairwise correlations between adjacent residues. MSHDTR was compared with some previously developed methods for searching for tandem repeats (TRs) in amino acid sequences, such as T-REKS and XSTREAM, which focus on the identification of TRs with significant sequence similarity, whereas MSHDTR detects repeats that significantly diverged during evolution, accumulating deletions, insertions, and substitutions. The application of MSHDTR to a search of the Swiss-Prot databank revealed over 15 thousand TR-containing amino acid sequences that were difficult to find using the other methods. Among the detected TRs, the most representative were those with consensus lengths of two and seven residues; these TRs were subjected to cluster analysis and the classes of patterns were identified. All TRs detected in this study have been combined into a databank accessible over the WWW.
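To make the contrast between exact and divergent tandem repeats concrete, the sketch below implements a deliberately naive exact-match search in Python. It is not MSHDTR, T-REKS, or XSTREAM; the function name `exact_tandem_repeats` and its parameters are hypothetical. Because it requires identical consecutive copies, a single substitution is enough to hide a repeat from it, which is the gap that methods for divergent repeats aim to close.

```python
from typing import List, Tuple


def exact_tandem_repeats(seq: str, unit_len: int,
                         min_copies: int = 3) -> List[Tuple[int, str, int]]:
    """Find runs of identical consecutive units; returns (start, unit, copies).

    Naive illustration only: it tolerates no insertions, deletions, or substitutions.
    """
    if unit_len <= 0:
        raise ValueError("unit_len must be positive")
    seq = seq.upper()
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        # Extend the run while the next block is an exact copy of the unit.
        while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
            copies += 1
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i += copies * unit_len  # skip past the run just reported
        else:
            i += 1
    return hits


if __name__ == "__main__":
    # Made-up sequences: a perfect direpeat and one interrupted by a substitution.
    print(exact_tandem_repeats("MASRSRSRSRKL", 2))
    print(exact_tandem_repeats("SRSKSR", 2))
```

The unit length of two used in the example matches one of the consensus lengths that the abstract reports as most represented among the detected repeats.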

