Random Sampling of the Protein Data Bank - RaSPDB

Author(s):  
Oliviero Carugo

Abstract A novel and simple procedure (RaSPDB) for Protein Data Bank mining is described. Ten PDB subsets, each containing 7000 randomly selected protein chains, are built and used to make 10 estimations of the average value of a generic feature F (the length of the protein chain, the amino acid composition, the crystallographic resolution, and the secondary structure composition). These 10 estimations are then used to compute an average estimation of F together with its standard error. It is heuristically verified that the size of these 10 subsets (7000 protein chains) is sufficiently small to avoid redundancy within each subset and sufficiently large to guarantee stable estimations among different subsets. RaSPDB has two major advantages over classical procedures that aim to build a single, non-redundant PDB subset: a larger fraction of the information stored in the PDB is used, and an estimation of the standard error of F is possible.
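The sampling scheme described in the abstract can be sketched in a few lines of Python. This is a minimal illustration and not the author's implementation: the function name `raspdb_estimate`, the use of sampling without replacement within each subset, and the standard-error formula (sample standard deviation of the 10 subset means divided by the square root of 10) are all assumptions, and the chain lengths below are synthetic values rather than PDB data.

```python
import random
import statistics


def raspdb_estimate(values, n_subsets=10, subset_size=7000, seed=0):
    """Estimate the mean of a feature F from several random subsets.

    `values` holds one value of F per protein chain (hypothetical input).
    Returns the average of the subset means, its standard error, and the
    individual subset means.
    """
    rng = random.Random(seed)
    subset_means = []
    for _ in range(n_subsets):
        # Assumption: chains are drawn without replacement within a subset,
        # and each subset is drawn independently of the others.
        subset = rng.sample(values, subset_size)
        subset_means.append(statistics.fmean(subset))
    mean = statistics.fmean(subset_means)
    # Assumption: standard error = sample stdev of subset means / sqrt(n).
    std_err = statistics.stdev(subset_means) / n_subsets ** 0.5
    return mean, std_err, subset_means


# Synthetic chain lengths standing in for real PDB data.
demo_rng = random.Random(42)
chain_lengths = [demo_rng.randint(30, 1000) for _ in range(100_000)]
avg_len, err_len, _ = raspdb_estimate(chain_lengths)
print(f"average chain length: {avg_len:.1f} +/- {err_len:.1f}")
```

Drawing several subsets rather than one is what makes the spread between subset means, and hence a standard error, available at all.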

2021 ◽  
Vol 11 (1) ◽  


2019 ◽  
Author(s):  
Sean M. Cascarina ◽  
Mikaela R. Elder ◽  
Eric D. Ross

Abstract A variety of studies have suggested that low-complexity domains (LCDs) tend to be intrinsically disordered and are relatively rare within structured proteins in the Protein Data Bank (PDB). Although LCDs are often treated as a single class, we previously found that LCDs enriched in different amino acids can exhibit substantial differences in protein metabolism and function. Therefore, we wondered whether the structural conformations of LCDs are likewise dependent on which specific amino acids are enriched within each LCD. Here, we directly examined relationships between enrichment of individual amino acids and secondary structure preferences across the entire PDB proteome. Secondary structure preferences varied as a function of the identity of the amino acid enriched and its degree of enrichment. Furthermore, divergence in secondary structure profiles often occurred for LCDs enriched in physicochemically similar amino acids (e.g. valine vs. leucine), indicating that LCDs composed of related amino acids can have distinct secondary structure preferences. Comparison of LCD secondary structure preferences with numerous pre-existing secondary structure propensity scales resulted in relatively poor correlations for certain types of LCDs, indicating that these scales may not capture secondary structure preferences as sequence complexity decreases. Collectively, these observations provide a highly resolved view of structural preferences among LCDs parsed by the nature and magnitude of single amino acid enrichment.

Author Summary The structures that proteins adopt are directly related to their amino acid sequences. Low-complexity domains (LCDs) in protein sequences are unusual regions made up of only a few different types of amino acids. Although this is the key feature that classifies sequences as LCDs, the physical properties of LCDs will differ based on the types of amino acids that are found in each domain.
For example, the sequences “AAAAAAAAAA”, “EEEEEEEEEE”, and “EEKRKEEEKE” will have very different properties, even though they would all be classified as LCDs by traditional methods. In a previous study, we developed a new method to further divide LCDs into categories that more closely reflect the differences in their physical properties. In this study, we apply that approach to examine the structures of LCDs when sorted into different categories based on their amino acids. This allowed us to define relationships between the types of amino acids in the LCDs and their corresponding structures. Since protein structure is closely related to protein function, this has important implications for understanding the basic functions and properties of LCDs in a variety of proteins.
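To make the three example sequences concrete, the sketch below computes their amino acid composition and a simple complexity score. It is an illustration only: Shannon entropy is used here as a generic stand-in for sequence complexity and is an assumption, not the classification method the authors developed, and the helper names `composition` and `shannon_entropy` are hypothetical.

```python
from collections import Counter
import math


def composition(seq):
    """Return the fraction of each amino acid in a sequence."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}


def shannon_entropy(seq):
    """Shannon entropy in bits; lower values mean lower complexity."""
    return sum(f * math.log2(1 / f) for f in composition(seq).values())


for seq in ("AAAAAAAAAA", "EEEEEEEEEE", "EEKRKEEEKE"):
    comp = composition(seq)
    top = max(comp, key=comp.get)  # most enriched amino acid
    print(seq, top, comp[top], round(shannon_entropy(seq), 3))
```

The first two sequences are equally simple by this score yet enriched in different residues (A versus E), which is the distinction a complexity measure alone does not capture.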


2013 ◽  
Vol 41 (W1) ◽  
pp. W432-W440 ◽  
Author(s):  
Nurul Nadzirin ◽  
Peter Willett ◽  
Peter J. Artymiuk ◽  
Mohd Firdaus-Raih

2015 ◽  
Vol 36 ◽  
pp. 1560018
Author(s):  
Wilson I. Barredo ◽  
Jinky B. Bornales ◽  
Christopher C. Bernido ◽  
Henry P. Aringa

The winding probability function for a biopolymer diffusing in a crowded cell is obtained with the drift coefficient f(s) involving Bessel functions of the general form $f(s) = k\,J_{2p+1}(\nu s)$. The variable s is the length along the chain and ν is a constant which can be used to simulate the frequency of appearance of a certain type of amino acid. Application of the particular case p = 3 to protein chains is carried out for different alpha helical proteins found in the Protein Data Bank (PDB). Analysis of our results leads us to an empirical formula that can be used to conveniently predict k/D and ν, where D is the diffusion coefficient of various α-helical proteins in solvents.
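For the particular case p = 3 mentioned above, the order of the Bessel function is 2p + 1 = 7. Assuming J denotes the Bessel function of the first kind (the usual meaning of this notation, though the abstract does not say so explicitly), the drift coefficient reduces to:

```latex
f(s) = k\,J_{2p+1}(\nu s)\,\Big|_{p=3} = k\,J_{7}(\nu s)
```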


2017 ◽  
Vol 4 (2) ◽  
pp. 85
Author(s):  
. Firdayani ◽  
Susi Kusumaningrum ◽  
Yosephine Ria Miranti

Potency of Plant Bioactive Compounds from the Genus Phyllanthus as Hepatitis B Virus Replication Inhibitor

In this research, molecular docking simulations of Phyllanthus bioactive compounds into the core protein of hepatitis B virus (HBV) were performed. The simulations aimed to predict the interactions between the compounds and the viral core protein that disrupt capsid formation and inhibit viral replication. Docking was carried out with Molegro Virtual Docker 6.0. The stable 3D conformations of the compounds were docked into the 3D structure of the HBV core protein downloaded from the Protein Data Bank, and the results were analyzed for minimum energy and the interactions that occurred. Docking was performed at the same coordinates as the previously docked and validated reference ligand. The results showed that repandusinic acid formed the most stable complex, with the lowest binding affinity energy, with amino acid residues of the viral core protein. The interaction involved chain B, which formed hydrogen bonds with Thr 33, Trp 102, Phe 23, Leu 140, Tyr 118 and Ser 141, and chain C, with Thr 128, Val 124 and Glu 117. This compound can be used as a marker for anti-HBV activity.

Keywords: bioactive compounds, core protein, HBV, molecular docking, Phyllanthus

Received: 11 December 2017; Accepted: 27 December 2017; Published: 31 December 2017


2020 ◽  
Vol 117 (9) ◽  
pp. 4701-4709 ◽  
Author(s):  
Aya Narunsky ◽  
Amit Kessel ◽  
Ron Solan ◽  
Vikram Alva ◽  
Rachel Kolodny ◽  
...  

Proteins’ interactions with ancient ligands may reveal how molecular recognition emerged and evolved. We explore how proteins recognize adenine: a planar rigid fragment found in the most common and ancient ligands. We have developed a computational pipeline that extracts protein–adenine complexes from the Protein Data Bank, structurally superimposes their adenine fragments, and detects the hydrogen bonds mediating the interaction. Our analysis extends the known motifs of protein–adenine interactions in the Watson–Crick edge of adenine and shows that all of adenine’s edges may contribute to molecular recognition. We further show that, on the proteins' side, binding is often mediated by specific amino acid segments (“themes”) that recur across different proteins, such that different proteins use the same themes when binding the same adenine-containing ligands. We identify numerous proteins that feature these themes and are thus likely to bind adenine-containing ligands. Our analysis suggests that adenine binding has emerged multiple times in evolution.


Author(s):  
Bart van Beusekom ◽  
Thomas Lütteke ◽  
Robbie P. Joosten

Glycosylation is one of the most common forms of protein post-translational modification, but is also the most complex. Dealing with glycoproteins in structure model building, refinement, validation and PDB deposition is more error-prone than dealing with nonglycosylated proteins owing to limitations of the experimental data and available software tools. Also, experimentalists are typically less experienced in dealing with carbohydrate residues than with amino-acid residues. The results of the reannotation and re-refinement by PDB-REDO of 8114 glycoprotein structure models from the Protein Data Bank are analyzed. The positive aspects of 3620 reannotations and subsequent refinement, as well as the remaining challenges to obtaining consistently high-quality carbohydrate models, are discussed.

