Atypical Structural Tendencies Among Low-Complexity Domains in the Protein Data Bank Proteome

2019
Author(s):
Sean M. Cascarina
Mikaela R. Elder
Eric D. Ross

A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. We therefore wondered whether the structural conformations of LCDs likewise depend on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the enriched amino acid and its degree of enrichment. Furthermore, secondary structure profiles often diverged for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales yielded relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs, parsed by the nature and magnitude of single amino acid enrichment.
Author Summary: The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs differ according to the types of amino acids found in each domain.
For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
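Traditional complexity-based methods flag all three example sequences as LCDs. The minimal sketch below assumes Shannon entropy of residue composition as the complexity measure (a common choice, not necessarily the measure used by the traditional methods mentioned here, and not the authors' categorization method); the helper names are hypothetical. It illustrates that the three sequences all have low entropy (the maximum for 20 residue types is log2(20), about 4.32 bits) yet differ in which residue dominates, which is the distinction the study builds on.

```python
import math
from collections import Counter
from typing import Tuple


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the residue composition of seq; 0 means one residue type."""
    n = len(seq)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(seq).values())


def dominant_residue(seq: str) -> Tuple[str, float]:
    """Return the most frequent residue and the fraction of seq it occupies."""
    residue, count = Counter(seq).most_common(1)[0]
    return residue, count / len(seq)


if __name__ == "__main__":
    # The three example sequences from the Author Summary.
    for s in ["AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"]:
        aa, frac = dominant_residue(s)
        print(f"{s}  entropy={shannon_entropy(s):.2f} bits  dominant={aa} ({frac:.0%})")
```

A composition-aware view (which residue is enriched, and by how much) separates "AAAAAAAAAA" from "EEEEEEEEEE" even though an entropy-only view scores them identically.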

2021
Author(s):  
Oliviero Carugo

A novel and simple procedure (RaSPDB) for Protein Data Bank mining is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimates of the average value of a generic feature F (the features examined are the length of the protein chain, the amino acid composition, the crystallographic resolution, and the secondary structure composition). These 10 estimates are then used to compute an average estimate of F together with its standard error. It is verified heuristically that a subset size of 7000 protein chains is small enough to avoid redundancy within each subset and large enough to guarantee stable estimates across different subsets. RaSPDB has two major advantages over classical procedures that build a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and the standard error of F can be estimated.
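The subset-averaging idea can be sketched as follows. This is a minimal illustration, not the published RaSPDB code: it assumes each subset is an independent random draw without replacement (so different subsets may share chains), and that the standard error is the standard deviation of the 10 subset means divided by the square root of 10. The function name `raspdb_estimate` is hypothetical, and the chain lengths in the usage example are synthetic, not PDB data.

```python
import random
import statistics
from typing import Sequence, Tuple


def raspdb_estimate(
    values: Sequence[float],
    n_subsets: int = 10,
    subset_size: int = 7000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Estimate the mean of a per-chain feature F and its standard error.

    values holds one F value per protein chain (e.g. chain length). Each subset
    is a random sample drawn without replacement; subsets are drawn
    independently of one another (an assumption).
    """
    rng = random.Random(seed)
    subset_means = [
        statistics.fmean(rng.sample(values, subset_size)) for _ in range(n_subsets)
    ]
    mean_f = statistics.fmean(subset_means)
    # Standard error of the mean of the n_subsets estimates (assumed definition).
    std_err = statistics.stdev(subset_means) / n_subsets ** 0.5
    return mean_f, std_err


if __name__ == "__main__":
    gen = random.Random(42)
    # Synthetic chain lengths standing in for real PDB data.
    lengths = [gen.randint(50, 800) for _ in range(100_000)]
    mean_f, std_err = raspdb_estimate(lengths)
    print(f"mean chain length = {mean_f:.1f} +/- {std_err:.1f}")
```

`random.sample` raises `ValueError` if `subset_size` exceeds the number of available chains, which is a useful guard when the feature is only defined for part of the PDB (for example, resolution for crystal structures only).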


2007
Vol 88 (9)
pp. 2611-2618
Author(s):
Hajime Yaegashi
Masamichi Isogai
Hiroko Tajima
Teruo Sano
Nobuyuki Yoshikawa

Amino acid sequences of apple chlorotic leaf spot virus (ACLSV) coat protein (CP) were compared between 12 isolates from apple, plum and cherry, and 109 cDNA clones that were amplified directly from infected apple tissues. Phylogenetic analysis based on the amino acid sequences of CP showed that the isolates and cDNA clones were separated into two major clusters in which the combinations of the five amino acids at positions 40, 59, 75, 130 and 184 (Ala40-Val59-Phe75-Ser130-Met184 or Ser40-Leu59-Tyr75-Thr130-Leu184) were highly conserved within each cluster. Site-directed mutagenesis using an infectious cDNA clone of ACLSV indicated that the combinations of two amino acids (Ala40 and Phe75, or Ser40 and Tyr75) are necessary for infection of Chenopodium quinoa plants by mechanical inoculation. Moreover, an agroinoculation assay indicated that the substitution of a single amino acid (Ala40 to Ser40 or Phe75 to Tyr75) resulted in an extreme reduction in the accumulation of viral genomic RNA, double-stranded RNAs and viral proteins (movement protein and CP) in infiltrated tissues, suggesting that the combinations of the two amino acids at positions 40 and 75 are important for effective replication in host plant cells.
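The two conserved residue combinations lend themselves to a quick sequence screen. The sketch below is a hypothetical helper, not part of the study: it assumes the CP sequence is numbered from residue 1 exactly as in the abstract and contains no alignment gaps, and the names `CLUSTER_SIGNATURES`, `assign_cluster`, and `has_compatible_40_75_pair` are illustrative. The clusters themselves were defined by phylogenetic analysis; matching the five signature residues only checks consistency with the reported conservation.

```python
from typing import Optional

# Cluster signature residues at 1-based CP positions, as reported in the abstract.
CLUSTER_SIGNATURES = {
    "Ala40-Val59-Phe75-Ser130-Met184": {40: "A", 59: "V", 75: "F", 130: "S", 184: "M"},
    "Ser40-Leu59-Tyr75-Thr130-Leu184": {40: "S", 59: "L", 75: "Y", 130: "T", 184: "L"},
}

# Residue pairs at positions 40 and 75 reported as necessary for infectivity.
INFECTIVE_PAIRS = {("A", "F"), ("S", "Y")}


def assign_cluster(cp_seq: str) -> Optional[str]:
    """Return the matching signature name, or None if neither full signature matches."""
    if len(cp_seq) < 184:
        raise ValueError("expected a full-length CP sequence (at least 184 residues)")
    for name, signature in CLUSTER_SIGNATURES.items():
        if all(cp_seq[pos - 1] == aa for pos, aa in signature.items()):
            return name
    return None


def has_compatible_40_75_pair(cp_seq: str) -> bool:
    """True if residues 40 and 75 form one of the two combinations linked to infectivity."""
    return (cp_seq[39], cp_seq[74]) in INFECTIVE_PAIRS


if __name__ == "__main__":
    # Hypothetical placeholder sequence (poly-Gly) carrying the first cluster's signature.
    residues = ["G"] * 200
    for pos, aa in CLUSTER_SIGNATURES["Ala40-Val59-Phe75-Ser130-Met184"].items():
        residues[pos - 1] = aa
    seq = "".join(residues)
    print(assign_cluster(seq), has_compatible_40_75_pair(seq))
    # A single Ala40 -> Ser substitution breaks both the signature and the 40/75 pairing.
    mutant = seq[:39] + "S" + seq[40:]
    print(assign_cluster(mutant), has_compatible_40_75_pair(mutant))
```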


2021
Vol 11 (1)
Author(s):  
Oliviero Carugo

A novel and simple procedure (RaSPDB) for Protein Data Bank mining is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimates of the average value of a generic feature F (the features examined are the length of the protein chain, the amino acid composition, the crystallographic resolution, and the secondary structure composition). These 10 estimates are then used to compute an average estimate of F together with its standard error. It is verified heuristically that a subset size of 7000 protein chains is small enough to avoid redundancy within each subset and large enough to guarantee stable estimates across different subsets. RaSPDB has two major advantages over classical procedures that build a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and the standard error of F can be estimated.


2020
Author(s):
Kun Tian
Xin Zhao
Xiaogeng Wan
Stephen Yau

Background: Protein structure can provide insights that help biologists predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques for determining protein structure, such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, are complex, and analyzing their experimental results can take considerable time, especially for large proteins. These limitations motivated us to create a new approach to protein structure prediction.
Results: Here we describe a new approach that integrates and analyzes torsion angle information from the Protein Data Bank to predict protein structures from amino acid sequences. When applied to the structural classification of two CATH datasets with more than 5000 protein domains, our prediction model performed well compared with previous methods, achieving an average accuracy of 92.5% for structure classification, which is higher than that reported in previous research. We also used our model to predict four known protein structures from a single amino acid sequence, whereas many other existing methods can obtain only one possible structure for a given sequence.
Conclusions: The results show that our method provides a new, effective, and reliable tool for protein structure prediction research.
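Backbone torsion angles are dihedral angles defined by four consecutive atoms (phi by C(i-1), N(i), CA(i), C(i); psi by N(i), CA(i), C(i), N(i+1)). The sketch below only shows how such an angle is computed from atomic coordinates; it uses the standard atan2 formula and is not the authors' prediction model. The function and type names are illustrative.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed torsion angle (degrees, in [-180, 180]) about the p1-p2 bond.

    Uses phi = atan2(|b2| * b1.(b2 x b3), (b1 x b2).(b2 x b3)), which follows
    the IUPAC sign convention (0 = cis/eclipsed, 180 = trans).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    # Toy planar geometries, not real backbone coordinates.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)))  # trans arrangement
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (1, 1, 0)))   # cis arrangement
```

Using atan2 rather than arccos of a normalized dot product keeps the sign of the angle and stays numerically stable near 0 and 180 degrees.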


2018
Author(s):
Nidhi Gour
Bharti Koshti
Chandra Kanth P.
Dhruvi Shah
Vivek Shinh Kshatriya
...

We report, for the first time, the self-assembly of cysteine and methionine into discernible structures under neutral conditions. To gain insight into structure formation, thioflavin T and Congo red binding assays were performed; these revealed that the aggregates may not have amyloid-like characteristics. The nature of the interactions that lead to such self-assemblies was probed by co-incubating the assemblies in urea and mercaptoethanol. The interaction of the aggregates with the short amyloidogenic dipeptide diphenylalanine (FF) was also assessed: whereas cysteine aggregates completely disrupted FF fibres, methionine instead triggered fibrillation. Cytotoxicity assays of the cysteine and methionine structures performed on human neuroblastoma IMR-32 cells suggested that the aggregates are not cytotoxic and thus may not have an amyloid-like etiology. These results are striking because, to the best of our knowledge, this is the first report demonstrating that even non-aromatic amino acids (cysteine and methionine) can undergo spontaneous self-assembly to form ordered aggregates.


2021
Vol 22 (3)
pp. 1018
Author(s):  
Hiroaki Yokota

Helicases are nucleic acid-unwinding enzymes involved in maintaining genome integrity. Several parts of helicase amino acid sequences are highly similar, and these well-conserved sequences are termed “helicase motifs”. Previous X-ray crystallography and single-molecule studies have suggested a common underlying mechanism for helicase function and indicate a role for the helicase motifs in unwinding nucleic acids. In contrast, the sequence and length of the C-terminal amino acids of helicases are highly variable. In this paper, I review past and recent studies that proposed helicase mechanisms and studies that investigated the roles of the C-terminal amino acids in helicase and dimerization activities, primarily on the non-hexameric Escherichia coli (E. coli) UvrD helicase. I then concentrate on my recent single-molecule direct-visualization study of a UvrD mutant lacking the C-terminal 40 amino acids (UvrDΔ40C), a mutant used in studies proposing the monomer helicase model. The study demonstrated that multiple UvrDΔ40C molecules jointly participated in DNA unwinding, presumably by forming an oligomer. Thus, the single-molecule observation addressed how the C-terminal amino acids affect the number of helicases bound to DNA, oligomerization, and unwinding activity, an approach that can be applied to other helicases.


2015
Vol 24 (4)
pp. 197-205
Author(s):  
Dwi Wulandari ◽  
Lisnawati Rachmadi ◽  
Tjahjani M. Sudiro

Background: E6 and E7 are oncoproteins of HPV16. Natural amino acid variation in HPV16 E6 can alter its carcinogenic potential. The aim of this study was to phylogenetically analyze the E6 and E7 genes and proteins of HPV16 from Indonesia and to predict the effects of single amino acid substitutions on protein function. Such analysis can save time, effort, and research costs by serving as an initial screen when selecting proteins or isolates to be tested in vitro or in vivo.
Methods: E6 and E7 gene sequences were obtained from 12 Indonesian isolates and compared with HPV16R (the prototype) and with 6 standard isolates from GenBank representing the European (E), Asian (As), Asian-American (AA), African-1 (Af-1), African-2 (Af-2), and North American (NA) branches. BioEdit v7.0.0 was used to analyze amino acid composition and single amino acid substitutions. Phylogenetic analysis of the E6 and E7 genes and proteins was performed using Clustal X (1.81) and NJplot. The effects of single amino acid substitutions on E6 and E7 protein function were analysed with SNAP.
Results: The Java variants and isolate ui66* belonged to the European branch, while the others belonged to the Asian and African branches. Twelve amino acid changes were found in E6 and one in E7. SNAP analysis identified two non-neutral mutations in E6, R10I and C63G. The R10I mutation was found in the Af-2 genotype (AF472509) and in Indonesian isolates (Af2*), while the C63G mutation was found only in Af2*.
Conclusion: E6 proteins of HPV16 variants were more variable than E7 proteins. SNAP analysis showed that only the E6 protein of the African-2 branch had functional differences compared with HPV16R.
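Substitutions such as R10I and C63G are written as reference residue, position, and variant residue. The minimal sketch below lists such substitutions by comparing a variant protein against a reference (here, the role played by HPV16R); it does not reproduce BioEdit or SNAP, and it assumes the two sequences are already aligned without gaps and numbered identically. The function name and the example fragments are hypothetical, not actual HPV16 E6 sequences.

```python
from typing import List


def list_substitutions(reference: str, variant: str) -> List[str]:
    """List single amino acid substitutions as '<ref><position><variant>' (e.g. R10I).

    Assumes the two protein sequences are already aligned, ungapped, and of equal
    length, so that position i in one corresponds to position i in the other.
    """
    if len(reference) != len(variant):
        raise ValueError("reference and variant must have the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, variant), start=1)
        if ref != alt
    ]


if __name__ == "__main__":
    # Hypothetical 12-residue fragments, not actual HPV16 E6 sequences.
    print(list_substitutions("MFQDPQERPRKL", "MFQDPQERPIKL"))
```

Predicting whether each listed substitution is neutral or not is a separate step; in the study that step was done with SNAP.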


1988
Vol 8 (3)
pp. 1247-1252
Author(s):
E Lazar
S Watanabe
S Dalton
M B Sporn

To study the relationship between the primary structure of transforming growth factor alpha (TGF-alpha) and some of its functional properties (competition with epidermal growth factor (EGF) for binding to the EGF receptor and induction of anchorage-independent growth), we introduced single amino acid mutations into the sequence for the fully processed, 50-amino-acid human TGF-alpha. The wild-type and mutant proteins were expressed from a vector using a yeast alpha mating pheromone promoter. Mutations of two amino acids that are conserved in the family of the EGF-like peptides and are located in the carboxy-terminal part of TGF-alpha resulted in different biological effects. When aspartic acid 47 was mutated to alanine or asparagine, biological activity was retained; in contrast, substitutions of this residue with serine or glutamic acid generated mutants with reduced binding and colony-forming capacities. When leucine 48 was mutated to alanine, a complete loss of binding and colony-forming abilities resulted; mutation of leucine 48 to isoleucine or methionine resulted in very low activities. Our data suggest that these two adjacent conserved amino acids in positions 47 and 48 play different roles in defining the structure and/or biological activity of TGF-alpha and that the carboxy terminus of TGF-alpha is involved in interactions with cellular TGF-alpha receptors. The side chain of leucine 48 appears to be crucial either indirectly in determining the biologically active conformation of TGF-alpha or directly in the molecular recognition of TGF-alpha by its receptor.


1973
Vol 131 (3)
pp. 485-498
Author(s):
R. P. Ambler
Margaret Wynn

The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5.
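The two quantities reported here (residues identical in all four sequences, and pairwise percentage differences) can be computed as sketched below. This is a minimal illustration under stated assumptions: since each protein is a single 82-residue chain, it assumes the sequences align position by position without gaps, and it counts a difference as any aligned position with different residues. The function names are hypothetical, and the toy sequences in the tests are not the cytochrome c-551 sequences.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def percent_difference(a: str, b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


def count_identical_positions(seqs: List[str]) -> int:
    """Number of positions at which every sequence carries the same residue.

    zip(*seqs) stops at the shortest sequence, so callers should pass
    equal-length aligned sequences.
    """
    return sum(len(set(column)) == 1 for column in zip(*seqs))


def pairwise_differences(named: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """Percent difference for every unordered pair of named sequences."""
    return {
        (n1, n2): percent_difference(s1, s2)
        for (n1, s1), (n2, s2) in combinations(named.items(), 2)
    }
```

With four aligned sequences, `pairwise_differences` returns six pairs, the set over which a 22% to 39% range would be reported.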

