Random Sampling of the Protein Data Bank - RaSPDB

Mapping Intimacies ◽

10.21203/rs.3.rs-952385/v1 ◽

2021 ◽

Author(s):

Oliviero Carugo

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Protein Data Bank ◽

Standard Error ◽

Random Sampling ◽

Simple Procedure ◽

Data Bank ◽

Protein Chain ◽

Average Value ◽

Secondary Structure Composition

Abstract: A novel and simple procedure (RaSPDB) for Protein Data Bank (PDB) mining is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimations of the average value of a generic feature F (here, the protein chain length, the amino acid composition, the crystallographic resolution, and the secondary structure composition). These 10 estimations are then used to compute an average estimation of F together with its standard error. It is heuristically verified that the size of these subsets (7000 protein chains) is small enough to avoid redundancy within each subset and large enough to guarantee stable estimations across different subsets. RaSPDB has two major advantages over classical procedures aimed at building a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and the standard error of F can be estimated.
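
The averaging step can be sketched in Python. This is a minimal sketch under stated assumptions, not the author's code: the function name `raspdb_estimate` is hypothetical, each subset is assumed to be drawn independently from the full list of chains (without replacement within a subset), and the standard error is assumed to be the standard deviation of the 10 subset estimates divided by the square root of 10.

```python
import random
import statistics


def raspdb_estimate(values, n_subsets=10, subset_size=7000, seed=None):
    """Estimate the average of a per-chain feature F and its standard error.

    values: one value of F per protein chain (e.g. chain length in residues).
    Returns (average of the subset estimates, standard error of that average).
    """
    if subset_size > len(values):
        raise ValueError("subset_size exceeds the number of available chains")
    rng = random.Random(seed)
    # Assumption: each subset is drawn independently from the full list,
    # sampling chains without replacement within a subset.
    estimates = [
        statistics.fmean(rng.sample(values, subset_size))
        for _ in range(n_subsets)
    ]
    average = statistics.fmean(estimates)
    # Assumption: standard error = SD of the subset estimates / sqrt(n_subsets).
    std_error = statistics.stdev(estimates) / n_subsets ** 0.5
    return average, std_error
```

For example, `raspdb_estimate(chain_lengths, seed=0)`, with a hypothetical list `chain_lengths` holding one length per PDB chain, returns the averaged length and its standard error. Fixing `seed` makes the random subsets reproducible.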

Random sampling of the Protein Data Bank: RaSPDB

Scientific Reports ◽

10.1038/s41598-021-03615-y ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Oliviero Carugo

Keyword(s):

Amino Acid ◽

Secondary Structure ◽

Protein Data Bank ◽

Standard Error ◽

Random Sampling ◽

Simple Procedure ◽

Data Bank ◽

Protein Chain ◽

Average Value ◽

Secondary Structure Composition

Atypical Structural Tendencies Among Low-Complexity Domains in the Protein Data Bank Proteome

10.1101/807438 ◽

2019 ◽

Cited By ~ 1

Author(s):

Sean M. Cascarina ◽

Mikaela R. Elder ◽

Eric D. Ross

Keyword(s):

Amino Acids ◽

Amino Acid ◽

Secondary Structure ◽

Physical Properties ◽

Protein Data Bank ◽

Data Bank ◽

Low Complexity ◽

Amino Acid Sequences ◽

Single Amino Acid ◽

Intrinsically Disordered

Abstract: A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the enriched amino acid and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g., valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs parsed by the nature and magnitude of single amino acid enrichment.

Author Summary: The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain.
For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
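
To see why the three example sequences all count as low-complexity yet differ in character, the sketch below computes two simple quantities: the Shannon entropy of the residue composition and the identity and fraction of the most enriched residue. Shannon entropy is used here only as a common, generic complexity measure; it is not necessarily the measure used by the authors or by the "traditional methods" mentioned above, and the function names are hypothetical. Real LCD detection would scan windows along a full sequence rather than score one short segment.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (bits) of the residue composition of seq."""
    counts = Counter(seq)
    n = len(seq)
    # sum of p * log2(1/p) over the residue types present
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def dominant_residue(seq):
    """Most frequent residue in seq and its fraction of the sequence."""
    residue, count = Counter(seq).most_common(1)[0]
    return residue, count / len(seq)


for s in ("AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"):
    print(s, round(shannon_entropy(s), 2), dominant_residue(s))
```

For "AAAAAAAAAA" and "EEEEEEEEEE" the entropy is 0 bits and for "EEKRKEEEKE" it is about 1.30 bits, all well below the log2(10) ≈ 3.32 bits of a 10-residue segment in which every residue differs; the dominant residues (A vs. E), however, differ, which a complexity score alone does not capture.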

Contmann: A Tool to Calculate Contact Distances Between Amino Acid and Mannose Using Protein Data Bank File at Distance Cutoff

Bioscience Biotechnology Research Communications ◽

10.21786/bbrc/13.4/35 ◽

2020 ◽

Vol 13 (4) ◽

pp. 1868-1870

Author(s):

Afnan Abdalrhman Slama Alomrani

Keyword(s):

Amino Acid ◽

Protein Data Bank ◽

Data Bank ◽

Protein Data Bank File ◽

Distance Cutoff
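
The source gives only the title of this tool, so the following is a hypothetical sketch of the kind of calculation the title names, not Contmann itself. It reads ATOM/HETATM records from a PDB-format file using the format's fixed column layout and reports every amino acid atom within a distance cutoff of a mannose atom. The function names, the 4.0 Å default cutoff, and the use of the residue names `MAN` and `BMA` (the Chemical Component Dictionary IDs for α- and β-D-mannose) are assumptions.

```python
import math

# The 20 standard amino acid residue names used in PDB files.
STANDARD_AA = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}
# Assumption: mannose residues appear as MAN (alpha-D-mannose) or BMA (beta-D-mannose).
MANNOSE_IDS = {"MAN", "BMA"}


def parse_atoms(pdb_lines):
    """Read ATOM/HETATM records using the fixed PDB column layout."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),   # columns 13-16: atom name
                "res": line[17:20].strip(),    # columns 18-20: residue name
                "chain": line[21],             # column 22: chain ID
                "seq": int(line[22:26]),       # columns 23-26: residue number
                "xyz": (float(line[30:38]),    # columns 31-38: x
                        float(line[38:46]),    # columns 39-46: y
                        float(line[46:54])),   # columns 47-54: z
            })
    return atoms


def mannose_contacts(pdb_lines, cutoff=4.0):
    """Return (chain, aa, aa_seq, aa_atom, sugar, sugar_seq, sugar_atom, distance)
    for every amino acid atom within `cutoff` angstroms of a mannose atom."""
    atoms = parse_atoms(pdb_lines)
    protein = [a for a in atoms if a["res"] in STANDARD_AA]
    sugar = [a for a in atoms if a["res"] in MANNOSE_IDS]
    contacts = []
    for s in sugar:
        for p in protein:
            d = math.dist(s["xyz"], p["xyz"])
            if d <= cutoff:
                contacts.append((p["chain"], p["res"], p["seq"], p["name"],
                                 s["res"], s["seq"], s["name"], round(d, 2)))
    return contacts
```

A file can be passed directly, e.g. `with open("structure.pdb") as fh: contacts = mannose_contacts(fh, cutoff=3.5)`, where the file name is hypothetical. Alternate locations and insertion codes are ignored in this sketch.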

IMAAAGINE: a webserver for searching hypothetical 3D amino acid side chain arrangements in the Protein Data Bank

Nucleic Acids Research ◽

10.1093/nar/gkt431 ◽

2013 ◽

Vol 41 (W1) ◽

pp. W432-W440 ◽

Cited By ~ 11

Author(s):

Nurul Nadzirin ◽

Peter Willett ◽

Peter J. Artymiuk ◽

Mohd Firdaus-Raih

Keyword(s):

Amino Acid ◽

Protein Data Bank ◽

Data Bank ◽

Side Chain ◽

Amino Acid Side Chain

Amino acid modifications for conformationally constraining naturally occurring and engineered peptide backbones: Insights from the Protein Data Bank

Biopolymers ◽

10.1002/bip.23230 ◽

2018 ◽

Vol 109 (10) ◽

pp. e23230 ◽

Cited By ~ 1

Author(s):

Luigi Di Costanzo ◽

Shuchismita Dutta ◽

Stephen K. Burley

Keyword(s):

Amino Acid ◽

Protein Data Bank ◽

Data Bank ◽

Naturally Occurring

On the diffusion of alpha-helical proteins in solvents

International Journal of Modern Physics Conference Series ◽

10.1142/s2010194515600186 ◽

2015 ◽

Vol 36 ◽

pp. 1560018

Author(s):

Wilson I. Barredo ◽

Jinky B. Bornales ◽

Christopher C. Bernido ◽

Henry P. Aringa

Keyword(s):

Diffusion Coefficient ◽

Amino Acid ◽

Protein Data Bank ◽

Empirical Formula ◽

Bessel Functions ◽

Probability Function ◽

Data Bank ◽

Alpha Helical Proteins ◽

Helical Proteins ◽

Drift Coefficient

The winding probability function for a biopolymer diffusing in a crowded cell is obtained with a drift coefficient $f(s)$ involving Bessel functions of the general form $f(s) = k\,J_{2p+1}(\nu s)$. The variable $s$ is the length along the chain, and $\nu$ is a constant that can be used to simulate the frequency of appearance of a certain type of amino acid. The particular case $p = 3$ is applied to the chains of different alpha-helical proteins found in the Protein Data Bank (PDB). Analysis of our results leads us to an empirical formula that can be used to conveniently predict $k/D$ and $\nu$, where $D$ is the diffusion coefficient of various alpha-helical proteins in solvents.
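
To make the drift coefficient concrete, the sketch below evaluates $f(s) = k\,J_{2p+1}(\nu s)$ for the case $p = 3$ (so the Bessel function is $J_7$), using the power series of the Bessel function of the first kind, $J_n(x) = \sum_{m \ge 0} \frac{(-1)^m}{m!\,(m+n)!}\left(\frac{x}{2}\right)^{2m+n}$. The function names and the values of $k$ and $\nu$ are illustrative assumptions, not values from the paper, and the truncated series is only accurate for moderate arguments; where SciPy is available, `scipy.special.jv(n, x)` computes the same function.

```python
import math


def bessel_j(n, x, terms=40):
    """Bessel function of the first kind J_n(x), integer n >= 0, via its power series.

    The truncated series is accurate only for moderate |x|.
    """
    half = x / 2.0
    total = 0.0
    for m in range(terms):
        total += (-1) ** m * half ** (2 * m + n) / (
            math.factorial(m) * math.factorial(m + n)
        )
    return total


def drift_coefficient(s, k, nu, p=3):
    """Drift coefficient f(s) = k * J_{2p+1}(nu * s); p = 3 gives J_7."""
    return k * bessel_j(2 * p + 1, nu * s)


# Illustrative (made-up) parameters: k = 1.0, nu = 0.5, s = 0, 5, ..., 20.
profile = [drift_coefficient(s, k=1.0, nu=0.5) for s in range(0, 21, 5)]
```

Because $J_{2p+1}(0) = 0$ for $p \ge 0$, the drift vanishes at $s = 0$ and then oscillates along the chain with a spatial frequency set by $\nu$.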

Potency of Plant Bioactive Compounds from the Genus Phyllanthus as Hepatitis B Virus Replication Inhibitor

Jurnal Bioteknologi & Biosains Indonesia (JBBI) ◽

10.29122/jbbi.v4i2.2589 ◽

2017 ◽

Vol 4 (2) ◽

pp. 85

Author(s):

Firdayani ◽

Susi Kusumaningrum ◽

Yosephine Ria Miranti

Keyword(s):

Molecular Docking ◽

Amino Acid ◽

Hepatitis B ◽

Protein Data Bank ◽

Bioactive Compounds ◽

Core Protein ◽

Data Bank ◽

Amino Acid Residues ◽

Virus Hepatitis ◽

Molegro Virtual Docker

In this research, molecular docking simulations of Phyllanthus bioactive compounds into the hepatitis B virus (HBV) core protein were performed. The simulations aimed to predict interactions between the compounds and the viral core protein that could disrupt capsid formation and thereby inhibit viral replication. Docking was carried out with Molegro Virtual Docker 6.0: stable 3D conformations of the compounds were docked into the HBV core protein structure downloaded from the Protein Data Bank, and the results were analyzed for minimum energy and the interactions formed. Docking was performed at the same coordinates as the previously docked and validated reference ligand. Repandusinic acid formed the most stable complex (lowest binding affinity energy) with the amino acid residues of the viral core protein: in chain B it forms hydrogen bonds with Thr 33, Trp 102, Phe 23, Leu 140, Tyr 118 and Ser 141, and in chain C with Thr 128, Val 124 and Glu 117. These compounds can be used as anti-HBV markers.

Keywords: bioactive compounds, core protein, HBV, molecular docking, Phyllanthus

Received: 11 December 2017; Accepted: 27 December 2017; Published: 31 December 2017

On the evolution of protein–adenine binding

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1911349117 ◽

2020 ◽

Vol 117 (9) ◽

pp. 4701-4709 ◽

Cited By ~ 4

Author(s):

Aya Narunsky ◽

Amit Kessel ◽

Ron Solan ◽

Vikram Alva ◽

Rachel Kolodny ◽

...

Keyword(s):

Amino Acid ◽

Molecular Recognition ◽

Hydrogen Bonds ◽

Protein Data Bank ◽

Data Bank ◽

Computational Pipeline ◽

Specific Amino Acid ◽

Proteins Interactions

Proteins’ interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein–adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein–adenine interactions in the Watson–Crick edge of adenine and shows that all of adenine’s edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments (“themes”) that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.
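
The hydrogen-bond detection step can be illustrated with a purely distance-based sketch. The published pipeline's criteria are not given here, so the 3.5 Å donor-acceptor cutoff, the restriction to protein N and O atoms, the absence of any angle check, and the assignment of adenine's polar atoms to the Watson-Crick (N1, N6), Hoogsteen (N6, N7) and sugar (N3) edges are all assumptions; the function names are hypothetical. The structural superposition step is not shown, and C-H donors are ignored.

```python
import math

# Assumed mapping of adenine polar atoms to the edges they lie on;
# N6 (the amino group) borders both the Watson-Crick and Hoogsteen edges.
ADENINE_EDGES = {
    "N1": ("Watson-Crick",),
    "N6": ("Watson-Crick", "Hoogsteen"),
    "N7": ("Hoogsteen",),
    "N3": ("Sugar",),
}
POLAR_ELEMENTS = {"N", "O"}


def adenine_hbonds(adenine_atoms, protein_atoms, max_dist=3.5):
    """Flag candidate hydrogen bonds by heavy-atom distance only.

    adenine_atoms: {atom_name: (x, y, z)}
    protein_atoms: list of (label, element, (x, y, z))
    """
    hits = []
    for a_name, edges in ADENINE_EDGES.items():
        if a_name not in adenine_atoms:
            continue
        for label, element, xyz in protein_atoms:
            if element in POLAR_ELEMENTS:
                d = math.dist(adenine_atoms[a_name], xyz)
                if d <= max_dist:
                    hits.append((a_name, label, edges, round(d, 2)))
    return hits


def edges_used(hits):
    """Sorted list of adenine edges that take part in at least one contact."""
    return sorted({edge for _, _, edges, _ in hits for edge in edges})
```

Tallying `edges_used` over many complexes is one simple way to ask whether recognition is confined to the Watson-Crick edge or spread over all of adenine's edges.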

Strength and Character of R–X···π Interactions Involving Aromatic Amino Acid Sidechains in Protein-Ligand Complexes Derived from Crystal Structures in the Protein Data Bank

Crystals ◽

10.3390/cryst7090273 ◽

2017 ◽

Vol 7 (9) ◽

pp. 273 ◽

Cited By ~ 5

Author(s):

Kevin Riley ◽

Khanh-An Tran

Keyword(s):

Amino Acid ◽

Crystal Structures ◽

Protein Data Bank ◽

Aromatic Amino Acid ◽

Data Bank ◽

π Interactions

Making glycoproteins a little bit sweeter with PDB-REDO

Acta Crystallographica Section F Structural Biology Communications ◽

10.1107/s2053230x18004016 ◽

2018 ◽

Vol 74 (8) ◽

pp. 463-472 ◽

Cited By ~ 8

Author(s):

Bart van Beusekom ◽

Thomas Lütteke ◽

Robbie P. Joosten

Keyword(s):

Experimental Data ◽

Amino Acid ◽

Protein Data Bank ◽

Model Building ◽

Data Bank ◽

Structure Model ◽

Amino Acid Residues ◽

Post Translational Modification ◽

High Quality ◽

Glycoprotein Structure

Glycosylation is one of the most common forms of protein post-translational modification, but is also the most complex. Dealing with glycoproteins in structure model building, refinement, validation and PDB deposition is more error-prone than dealing with nonglycosylated proteins owing to limitations of the experimental data and available software tools. Also, experimentalists are typically less experienced in dealing with carbohydrate residues than with amino-acid residues. The results of the reannotation and re-refinement by PDB-REDO of 8114 glycoprotein structure models from the Protein Data Bank are analyzed. The positive aspects of 3620 reannotations and subsequent refinement, as well as the remaining challenges to obtaining consistently high-quality carbohydrate models, are discussed.
