Atypical Structural Tendencies Among Low-Complexity Domains in the Protein Data Bank Proteome

Abstract

A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g., valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs parsed by the nature and magnitude of single amino acid enrichment.

Author Summary

The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain. For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
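The point that the three example sequences are all low in complexity yet differ in composition can be illustrated with a small sketch. It is not the authors' classification method: it simply assumes that Shannon entropy of the residue composition is an acceptable stand-in for "complexity" and that the most frequent residue is an acceptable stand-in for "enriched amino acid". The function names `shannon_entropy` and `dominant_residue` are hypothetical.

```python
from collections import Counter
from math import log2


def shannon_entropy(seq: str) -> float:
    """Shannon entropy (in bits) of the residue composition of seq."""
    n = len(seq)
    # p * log2(1 / p) is the contribution of a residue with frequency p.
    return sum((c / n) * log2(n / c) for c in Counter(seq).values())


def dominant_residue(seq: str) -> tuple[str, float]:
    """Most frequent residue in seq and the fraction of positions it occupies."""
    residue, count = Counter(seq).most_common(1)[0]
    return residue, count / len(seq)


# The three example sequences quoted in the Author Summary.
for seq in ("AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"):
    residue, fraction = dominant_residue(seq)
    print(f"{seq}  entropy={shannon_entropy(seq):.3f} bits  "
          f"dominant={residue} ({fraction:.0%})")
```

Under this simplified measure the two homopolymers have identical (zero) entropy but different dominant residues, while the mixed E/K/R sequence has higher entropy that is still far below the roughly 4.3 bits possible for a sequence using all 20 amino acids equally. A complexity score alone therefore cannot separate the three cases, which is the motivation for also tracking which residue is enriched.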


Random Sampling of the Protein Data Bank - RaSPDB

10.21203/rs.3.rs-952385/v1 ◽

2021 ◽

Author(s):

Oliviero Carugo

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Protein Data Bank ◽

Standard Error ◽

Random Sampling ◽

Simple Procedure ◽

Data Bank ◽

Protein Chain ◽

Average Value ◽

Secondary Structure Composition

Abstract A novel and simple procedure (RaSPDB) for Protein Data Bank mining is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimations of the average value of a generic feature F, such as the length of the protein chain, the amino acid composition, the crystallographic resolution, or the secondary structure composition. These 10 estimations are then used to compute an average estimation of F together with its standard error. It is heuristically verified that the size of these 10 subsets (7000 protein chains each) is sufficiently small to avoid redundancy within each subset and sufficiently large to guarantee stable estimations amongst different subsets. RaSPDB has two major advantages over classical procedures that aim to build a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and an estimation of the standard error of F is possible.
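The sampling scheme described in this abstract can be sketched in a few lines. This is an illustration under stated assumptions, not the published implementation: the function name `raspdb_estimate` is hypothetical, each subset is assumed to be drawn without replacement and independently of the others, and the standard error is assumed to be the sample standard deviation of the 10 subset means divided by the square root of 10. The input values are synthetic, not real PDB data.

```python
import random
import statistics


def raspdb_estimate(values, n_subsets=10, subset_size=7000, seed=0):
    """Estimate the mean of a per-chain feature F and its standard error.

    values holds one value of F per protein chain (e.g. the chain length).
    """
    rng = random.Random(seed)
    # One estimate of the average of F per randomly drawn subset
    # (random.sample draws without replacement within a subset).
    subset_means = [
        statistics.fmean(rng.sample(values, subset_size))
        for _ in range(n_subsets)
    ]
    mean = statistics.fmean(subset_means)
    # Assumption: standard error = sample std. dev. of the subset means / sqrt(n).
    std_err = statistics.stdev(subset_means) / n_subsets ** 0.5
    return mean, std_err


# Synthetic stand-in for per-chain lengths; these are NOT real PDB values.
data_rng = random.Random(1)
chain_lengths = [data_rng.randint(30, 1000) for _ in range(100_000)]

mean, std_err = raspdb_estimate(chain_lengths)
print(f"mean chain length = {mean:.1f} +/- {std_err:.1f}")
```

Because every subset is a fresh random draw, the spread among the subset means gives a direct handle on the uncertainty of the estimate, which a single curated non-redundant subset cannot provide.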


Combinations of two amino acids (Ala40 and Phe75 or Ser40 and Tyr75) in the coat protein of apple chlorotic leaf spot virus are crucial for infectivity

Journal of General Virology ◽

10.1099/vir.0.82984-0 ◽

2007 ◽

Vol 88 (9) ◽

pp. 2611-2618 ◽

Cited By ~ 29

Author(s):

Hajime Yaegashi ◽

Masamichi Isogai ◽

Hiroko Tajima ◽

Teruo Sano ◽

Nobuyuki Yoshikawa

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Coat Protein ◽

Leaf Spot ◽

Amino Acid Sequences ◽

Site Directed Mutagenesis ◽

Single Amino Acid ◽

Cdna Clones ◽

Spot Virus ◽

Chlorotic Leaf

Amino acid sequences of apple chlorotic leaf spot virus (ACLSV) coat protein (CP) were compared between 12 isolates from apple, plum and cherry, and 109 cDNA clones that were amplified directly from infected apple tissues. Phylogenetic analysis based on the amino acid sequences of CP showed that the isolates and cDNA clones were separated into two major clusters in which the combinations of the five amino acids at positions 40, 59, 75, 130 and 184 (Ala40-Val59-Phe75-Ser130-Met184 or Ser40-Leu59-Tyr75-Thr130-Leu184) were highly conserved within each cluster. Site-directed mutagenesis using an infectious cDNA clone of ACLSV indicated that the combinations of two amino acids (Ala40 and Phe75 or Ser40 and Tyr75) are necessary for infectivity to Chenopodium quinoa plants by mechanical inoculation. Moreover, an agroinoculation assay indicated that the substitution of a single amino acid (Ala40 to Ser40 or Phe75 to Tyr75) resulted in extreme reduction in the accumulation of viral genomic RNA, double-stranded RNAs and viral proteins (movement protein and CP) in infiltrated tissues, suggesting that the combinations of the two amino acids at positions 40 and 75 are important for effective replication in host plant cells.


Random sampling of the Protein Data Bank: RaSPDB

Scientific Reports ◽

10.1038/s41598-021-03615-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Oliviero Carugo

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Protein Data Bank ◽

Standard Error ◽

Random Sampling ◽

Simple Procedure ◽

Data Bank ◽

Protein Chain ◽

Average Value ◽

Secondary Structure Composition

Abstract A novel and simple procedure (RaSPDB) for Protein Data Bank mining is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimations of the average value of a generic feature F, such as the length of the protein chain, the amino acid composition, the crystallographic resolution, or the secondary structure composition. These 10 estimations are then used to compute an average estimation of F together with its standard error. It is heuristically verified that the size of these 10 subsets (7000 protein chains each) is sufficiently small to avoid redundancy within each subset and sufficiently large to guarantee stable estimations amongst different subsets. RaSPDB has two major advantages over classical procedures that aim to build a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and an estimation of the standard error of F is possible.


Amino acid torsion angles enable prediction of protein fold classification

10.21203/rs.2.20475/v1 ◽

2020 ◽

Author(s):

Kun Tian ◽

Xin Zhao ◽

Xiaogeng Wan ◽

Stephen Yau

Keyword(s):

Protein Structure ◽

Amino Acid ◽

Prediction Model ◽

Structure Prediction ◽

High Throughput Sequencing ◽

Protein Structures ◽

Data Bank ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

New Approach

Abstract

Background: Protein structure can provide insights that help biologists to predict and understand protein functions and interactions. However, the number of known protein structures has not kept pace with the number of protein sequences determined by high-throughput sequencing. Current techniques used to determine the structure of proteins, such as X-ray crystallography and nuclear magnetic resonance (NMR) spectroscopy, are complex, and analyzing the experimental results can take a long time, especially for large protein molecules. The limitations of these methods have motivated us to create a new approach for protein structure prediction.

Results: Here we describe a new approach that integrates and analyzes torsion angle information from the Protein Data Bank to enable prediction of protein structures from amino acid sequences. Our prediction model performed well in comparison with previous methods when applied to the structural classification of two CATH datasets with more than 5000 protein domains. The model achieves an average accuracy of 92.5% for structure classification, which is higher than that reported in previous research. We also used our model to predict four known protein structures with a single amino acid sequence, while many other existing methods could only obtain one possible structure for a given sequence.

Conclusions: The results show that our method provides a new, effective, and reliable tool for protein structure prediction research.
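For readers unfamiliar with the quantity involved, a torsion (dihedral) angle is defined by four consecutive points, for example four backbone atoms. The sketch below shows only that geometric calculation; it does not reproduce the authors' prediction model, and the function name `torsion_angle` and the test coordinates are hypothetical. The sign follows the usual convention in which the cis arrangement is 0 degrees and the trans arrangement is 180 degrees.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_angle(p0, p1, p2, p3):
    """Signed dihedral angle, in degrees, defined by four 3D points."""
    b0 = _sub(p1, p0)  # first bond vector
    b1 = _sub(p2, p1)  # central bond vector (the rotation axis)
    b2 = _sub(p3, p2)  # last bond vector
    n1 = _cross(b0, b1)  # normal to the plane through p0, p1, p2
    n2 = _cross(b1, b2)  # normal to the plane through p1, p2, p3
    # atan2 form of the dihedral: numerically stable and keeps the sign.
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates chosen so the expected geometry is easy to see.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
quarter = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(cis, quarter, trans)
```

Applied to backbone atoms, the same function would give the phi angle from the atoms C(i-1), N(i), CA(i), C(i) and the psi angle from N(i), CA(i), C(i), N(i+1).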


Single Amino Acid Based Self-Assemblies of Cysteine and Methionine

10.26434/chemrxiv.5972173.v1 ◽

2018 ◽

Author(s):

Nidhi Gour ◽

Bharti Koshti ◽

Chandra Kanth P. ◽

Dhruvi Shah ◽

Vivek Shinh Kshatriya ◽

...

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Self Assembly ◽

Aromatic Amino Acids ◽

Neutral Condition ◽

Single Amino Acid ◽

Binding Assays ◽

Human Neuroblastoma ◽

Cytotoxicity Assays ◽

First Time

We report, for the first time, the self-assembly of cysteine and methionine into discernible structures under neutral conditions. To gain insight into the structure formation, thioflavin T and Congo red binding assays were performed, which revealed that the aggregates may not have amyloid-like characteristics. The nature of the interactions that lead to such self-assemblies was probed by coincubating the assemblies with urea and mercaptoethanol. The interaction of the aggregates with the short amyloidogenic dipeptide diphenylalanine (FF) was also assessed. While cysteine aggregates completely disrupted FF fibres, methionine instead triggered fibrillation. Cytotoxicity assays of the cysteine and methionine structures were performed on human neuroblastoma IMR-32 cells and suggested that the aggregates are not cytotoxic and thus may not have an amyloid-like etiology. The results presented in the manuscript are striking since, to the best of our knowledge, this is the first report to demonstrate that even non-aromatic amino acids (cysteine and methionine) can undergo spontaneous self-assembly to form ordered aggregates.


Contmann: A Tool to Calculate Contact Distances Between Amino Acid and Mannose Using Protein Data Bank File at Distance Cutoff

Bioscience Biotechnology Research Communications ◽

10.21786/bbrc/13.4/35 ◽

2020 ◽

Vol 13 (4) ◽

pp. 1868-1870

Author(s):

Afnan Abdalrhman Slama Alomrani

Keyword(s):

Amino Acid ◽

Protein Data Bank ◽

Data Bank ◽

Protein Data Bank File ◽

Distance Cutoff


Roles of the C-Terminal Amino Acids of Non-Hexameric Helicases: Insights from Escherichia coli UvrD

International Journal of Molecular Sciences ◽

10.3390/ijms22031018 ◽

2021 ◽

Vol 22 (3) ◽

pp. 1018

Author(s):

Hiroaki Yokota

Keyword(s):

Escherichia Coli ◽

Amino Acids ◽

Amino Acid ◽

Single Molecule ◽

Underlying Mechanism ◽

Amino Acid Sequences ◽

E Coli ◽

X Ray Crystallography ◽

Terminal Amino

Helicases are nucleic acid-unwinding enzymes that are involved in the maintenance of genome integrity. Several parts of the amino acid sequences of helicases are very similar, and these well-conserved amino acid sequences are termed “helicase motifs”. Previous studies by X-ray crystallography and single-molecule measurements have suggested a common underlying mechanism for their function. These studies indicate the role of the helicase motifs in unwinding nucleic acids. In contrast, the sequence and length of the C-terminal amino acids of helicases are highly variable. In this paper, I review past and recent studies that proposed helicase mechanisms and studies that investigated the roles of the C-terminal amino acids in helicase and dimerization activities, primarily for the non-hexameric Escherichia coli (E. coli) UvrD helicase. I then focus on my recent study of single-molecule direct visualization of a UvrD mutant lacking the C-terminal 40 amino acids (UvrDΔ40C), which was used in studies proposing the monomer helicase model. The study demonstrated that multiple UvrDΔ40C molecules jointly participated in DNA unwinding, presumably by forming an oligomer. Thus, the single-molecule observation addressed how the C-terminal amino acids affect the number of helicases bound to DNA, oligomerization, and unwinding activity, an approach that can be applied to other helicases.


Phylogenetic analysis and predicted functional effect of protein mutations of E6 and E7 HPV16 strains isolated in Indonesia

Medical Journal of Indonesia ◽

10.13181/mji.v24i4.1197 ◽

2015 ◽

Vol 24 (4) ◽

pp. 197-205

Author(s):

Dwi Wulandari ◽

Lisnawati Rachmadi ◽

Tjahjani M. Sudiro

Keyword(s):

Amino Acids ◽

Phylogenetic Analysis ◽

Amino Acid ◽

Asian American ◽

Protein Function ◽

Single Amino Acid ◽

Hpv16 E6 ◽

E6 And E7 ◽

Neutral Mutations

Background: E6 and E7 are oncoproteins of HPV16. Natural amino acid variation in HPV16 E6 can alter its carcinogenic potential. The aim of this study was to phylogenetically analyze the E6 and E7 genes and proteins of HPV16 from Indonesia and to predict the effects of single amino acid substitutions on protein function. This analysis could serve as an initial screen for selecting proteins or isolates to be tested in vitro or in vivo, reducing time, effort, and research cost.

Methods: In this study, E6 and E7 gene sequences were obtained from 12 samples of Indonesian isolates and compared with HPV16R (prototype) and 6 standard isolates from GenBank in the European (E), Asian (As), Asian-American (AA), African-1 (Af-1), African-2 (Af-2), and North American (NA) branches. BioEdit v.7.0.0 was used to analyze the composition and substitution of single amino acids. Phylogenetic analysis of E6 and E7 genes and proteins was performed using Clustal X (1.81) and NJPLOT software. Effects of single amino acid substitutions on the protein function of E6 and E7 were analysed by SNAP.

Results: Java variants and isolate ui66* belonged to the European branch, while the others belonged to the Asian and African branches. Twelve amino acid changes were found in E6 and one in E7 proteins. SNAP analysis showed two non-neutral mutations, i.e. R10I and C63G, in E6 proteins. R10I mutations were found in the Af-2 genotype (AF472509) and Indonesian isolates (Af2*), while the C63G mutation was found only in Af2*.

Conclusion: E6 proteins of HPV16 variants were more variable than E7. SNAP analysis showed that only the E6 protein of the African-2 branch had functional differences compared to HPV16R.


Transforming growth factor alpha: mutation of aspartic acid 47 and leucine 48 results in different biological activities.

Molecular and Cellular Biology ◽

10.1128/mcb.8.3.1247 ◽

1988 ◽

Vol 8 (3) ◽

pp. 1247-1252 ◽

Cited By ~ 35

Author(s):

E Lazar ◽

S Watanabe ◽

S Dalton ◽

M B Sporn

Keyword(s):

Amino Acids ◽

Biological Activity ◽

Amino Acid ◽

Growth Factor ◽

Aspartic Acid ◽

Transforming Growth Factor ◽

Biologically Active ◽

Single Amino Acid ◽

Transforming Growth Factor Alpha ◽

Factor Alpha

To study the relationship between the primary structure of transforming growth factor alpha (TGF-alpha) and some of its functional properties (competition with epidermal growth factor (EGF) for binding to the EGF receptor and induction of anchorage-independent growth), we introduced single amino acid mutations into the sequence for the fully processed, 50-amino-acid human TGF-alpha. The wild-type and mutant proteins were expressed in a vector by using a yeast alpha mating pheromone promoter. Mutations of two amino acids that are conserved in the family of the EGF-like peptides and are located in the carboxy-terminal part of TGF-alpha resulted in different biological effects. When aspartic acid 47 was mutated to alanine or asparagine, biological activity was retained; in contrast, substitutions of this residue with serine or glutamic acid generated mutants with reduced binding and colony-forming capacities. When leucine 48 was mutated to alanine, a complete loss of binding and colony-forming abilities resulted; mutation of leucine 48 to isoleucine or methionine resulted in very low activities. Our data suggest that these two adjacent conserved amino acids in positions 47 and 48 play different roles in defining the structure and/or biological activity of TGF-alpha and that the carboxy terminus of TGF-alpha is involved in interactions with cellular TGF-alpha receptors. The side chain of leucine 48 appears to be crucial either indirectly in determining the biologically active conformation of TGF-alpha or directly in the molecular recognition of TGF-alpha by its receptor.


The amino acid sequence of cytochromes c-551 from three species of Pseudomonas

Biochemical Journal ◽

10.1042/bj1310485 ◽

1973 ◽

Vol 131 (3) ◽

pp. 485-498 ◽

Cited By ~ 113

Author(s):

R. P. Ambler ◽

Margaret Wynn

Keyword(s):

Amino Acids ◽

Pseudomonas Aeruginosa ◽

Amino Acid ◽

Cytochrome C ◽

Amino Acid Sequence ◽

Amino Acid Sequences ◽

Mitochondrial Cytochrome ◽

Cytochromes C ◽

Lending Library ◽

Single Peptide

The amino acid sequences of the cytochromes c-551 from three species of Pseudomonas have been determined. Each resembles the protein from Pseudomonas strain P6009 (now known to be Pseudomonas aeruginosa, not Pseudomonas fluorescens) in containing 82 amino acids in a single peptide chain, with a haem group covalently attached to cysteine residues 12 and 15. In all four sequences 43 residues are identical. Although by bacteriological criteria the organisms are closely related, the differences between pairs of sequences range from 22% to 39%. These values should be compared with the differences in the sequence of mitochondrial cytochrome c between mammals and amphibians (about 18%) or between mammals and insects (about 33%). Detailed evidence for the amino acid sequences of the proteins has been deposited as Supplementary Publication SUP 50015 at the National Lending Library for Science and Technology, Boston Spa, Yorks. LS23 7BQ, U.K., from whom copies can be obtained on the terms indicated in Biochem. J. (1973), 131, 5.
