DNAproDB: an expanded database and web-based tool for structural analysis of DNA–protein complexes

Author(s):  
Jared M Sagendorf ◽  
Nicholas Markarian ◽  
Helen M Berman ◽  
Remo Rohs

DNAproDB (https://dnaprodb.usc.edu) is a web-based database and structural analysis tool that offers a combination of data visualization, data processing and search functionality that improves the speed and ease with which researchers can analyze, access and visualize structural data of DNA–protein complexes. In this paper, we report significant improvements made to DNAproDB since its initial release. DNAproDB now supports any DNA secondary structure from typical B-form DNA to single-stranded DNA to G-quadruplexes. We have updated the structure of our data files to support complex DNA conformations, multiple DNA–protein complexes within a DNAproDB entry and model indexing for analysis of ensemble data. Support for chemically modified residues and nucleotides has been significantly improved along with the addition of new structural features, improved structural moiety assignment and use of more sequence-based annotations. We have redesigned our report pages and search forms to support these enhancements, and the DNAproDB website has been improved to be more responsive and user-friendly. DNAproDB is now integrated with the Nucleic Acid Database, and we have increased our coverage of available Protein Data Bank entries. Our database now contains 95% of all available DNA–protein complexes, making our tools for analysis of these structures accessible to a broad community.
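One elementary step in analysing a DNA–protein complex is deciding which protein residues lie close to which nucleotides. The sketch below illustrates that idea with a plain interatomic distance cutoff. It is not DNAproDB's own pipeline. The `Atom` record, the residue-name sets and the 4.5 Å default cutoff are all illustrative assumptions.

```python
import math
from typing import Iterable, NamedTuple

class Atom(NamedTuple):
    """Hypothetical minimal atom record (not DNAproDB's data format)."""
    chain: str
    res_name: str  # e.g. "ARG" for protein, "DG" for a deoxynucleotide
    res_seq: int
    x: float
    y: float
    z: float

# Assumed residue-name sets; modified residues/nucleotides, water and ligands are ignored.
DNA_RESIDUES = {"DA", "DC", "DG", "DT"}
PROTEIN_RESIDUES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
}

ResidueKey = tuple[str, int, str]  # (chain, residue number, residue name)

def dna_protein_contacts(atoms: Iterable[Atom],
                         cutoff: float = 4.5) -> set[tuple[ResidueKey, ResidueKey]]:
    """Return (protein residue, nucleotide) pairs with any atom pair within `cutoff` angstroms."""
    atoms = list(atoms)
    dna = [a for a in atoms if a.res_name in DNA_RESIDUES]
    protein = [a for a in atoms if a.res_name in PROTEIN_RESIDUES]
    contacts: set[tuple[ResidueKey, ResidueKey]] = set()
    # Brute-force O(N*M) comparison; a spatial grid or k-d tree would scale better.
    for p in protein:
        for d in dna:
            if math.dist((p.x, p.y, p.z), (d.x, d.y, d.z)) <= cutoff:
                contacts.add(((p.chain, p.res_seq, p.res_name),
                              (d.chain, d.res_seq, d.res_name)))
    return contacts
```

For instance, an arginine atom 3 Å from a guanine atom is reported as a contact. A thymine atom 10 Å away is not.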

2008 ◽  
Vol 3 ◽  
pp. ACI.S551 ◽  
Author(s):  
John Geraldine Sandana Mala ◽  
Satoru Takeuchi

The structural elucidation of microbial lipases has been of prime interest since the 1980s. Knowledge of structural features plays an important role in designing and engineering lipases for specific purposes. Significant structural data have been presented for only a few microbial lipases, and there is still a structure deficit, that is, most lipase structures are yet to be resolved. A search for ‘lipase structure’ in the RCSB Protein Data Bank (http://www.rcsb.org/pdb/) returns only 93 hits (as of September 2007), and the NCBI database (http://www.ncbi.nlm.nih.gov) reports 89 lipase structures as compared to 14719 core nucleotide records. It is therefore worthwhile to investigate the structural analysis of microbial lipases. This review is intended to provide a collection of resources on the instrumental, chemical and bioinformatics approaches for structure analyses. X-ray crystallography is a versatile tool for structural biochemists and is still widely exploited today. The chemical methods of recent interest include molecular modeling and combinatorial designs. Bioinformatics has attracted striking interest in protein structural analysis with the advent of innumerable tools. Furthermore, a literature platform of the structural elucidations investigated so far is presented, with detailed descriptions as applicable to microbial lipases. A case study of Candida rugosa lipase (CRL) is also discussed, highlighting important structural features common to most lipases. Finally, a general profile of lipases is described, together with an overview of past lipase research.


Molecules ◽  
2019 ◽  
Vol 24 (1) ◽  
pp. 179 ◽  
Author(s):  
Dariusz Mrozek ◽  
Tomasz Dąbek ◽  
Bożena Małysiak-Mrozek

Calculating structural features of proteins, nucleic acids, and nucleic acid–protein complexes from their geometries, and studying various interactions within these macromolecules, whose high-resolution structures are stored in the Protein Data Bank (PDB), requires parsing and extracting suitable data from text files. To carry out these operations at a large scale, given the growing amount of macromolecular data in public repositories, we propose running them in the distributed environment of Azure Data Lake and scaling the calculations in the Cloud. In this paper, we present dedicated data extractors for PDB files that can be used in various types of calculations performed over protein and nucleic acid structures in the Azure Data Lake. Results of our tests show that the Cloud storage space occupied by the macromolecular data can be successfully reduced by compressing PDB files, without significant loss of data processing efficiency. Moreover, our experiments show that the calculations can be significantly accelerated by using large sequential files for storing macromolecular data and by parallelizing the calculations and the data extractions that precede them. Finally, the paper shows how all the calculations can be performed declaratively in U-SQL scripts for Data Lake Analytics.
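The legacy PDB format stores coordinates in fixed columns, so an extractor can slice each line instead of tokenizing it. The Python sketch below is an analogue of that kind of extractor, not the authors' U-SQL code. It reads `ATOM`/`HETATM` records from a gzip-compressed file, mirroring the compressed storage discussed above. The `AtomRecord` type and the function names are hypothetical.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple

class AtomRecord(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float

def parse_atom_records(lines: Iterable[str]) -> Iterator[AtomRecord]:
    """Yield coordinate records using the fixed PDB column layout (0-based slices)."""
    for line in lines:
        if line[0:6] not in ("ATOM  ", "HETATM"):
            continue  # skip HEADER, REMARK, TER, END, etc.
        yield AtomRecord(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain=line[21],                # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
        )

def read_pdb_gz(path: str) -> list[AtomRecord]:
    """Decompress on the fly, so the file never has to be stored uncompressed."""
    with gzip.open(path, "rt") as handle:
        return list(parse_atom_records(handle))
```

Very large entries that exceed the fixed-width limits are distributed only in PDBx/mmCIF format, which this sketch does not handle.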


2002 ◽  
Vol 11 (03) ◽  
pp. 369-387 ◽  
Author(s):  
PETRI MYLLYMÄKI ◽  
TOMI SILANDER ◽  
HENRY TIRRI ◽  
PEKKA URONEN

B-Course is a free, web-based online data analysis tool that allows users to analyze their data for multivariate probabilistic dependencies. These dependencies are represented as Bayesian network models. In addition, B-Course offers facilities for inferring certain types of causal dependencies from the data. The software uses a novel "tutorial style" user-friendly interface that intertwines the steps of the data analysis with support material giving an informal introduction to the Bayesian approach adopted. Although the analysis methods, modeling assumptions and restrictions are fully transparent to the user, this transparency is not achieved at the expense of analysis power: within the restrictions stated in the support material, B-Course is a powerful analysis tool that exploits several theoretically elaborate results developed recently in the fields of Bayesian and causal modeling. B-Course can be used with most web browsers (even Lynx), and its facilities include automatic missing-data handling and discretization, a flexible graphical interface for probabilistic inference on the constructed Bayesian network models (for Java-enabled browsers), automatic pretty-printed layout of the networks, export of the models, and analysis of the importance of the derived dependencies. In this paper we discuss both the theoretical design principles underlying the B-Course tool and the pragmatic methods adopted in implementing the software.
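The sketch below roughly illustrates how a Bayesian approach decides whether two discrete variables are dependent. It compares the marginal likelihood of a two-node network X → Y with that of a network without the arc, using Dirichlet priors in the BDeu style. The abstract does not specify B-Course's scoring function or prior. The equivalent sample size `ess` and the function names are therefore assumptions. A positive log Bayes factor favours the dependency.

```python
from collections import Counter
from math import lgamma
from typing import Hashable, Sequence

def log_marginal(counts: Sequence[int], alpha: float) -> float:
    """Log marginal likelihood of categorical counts under a symmetric Dirichlet(alpha) prior."""
    n = sum(counts)
    k = len(counts)
    return (lgamma(k * alpha) - lgamma(k * alpha + n)
            + sum(lgamma(alpha + c) - lgamma(alpha) for c in counts))

def log_bayes_factor(xs: Sequence[Hashable], ys: Sequence[Hashable],
                     x_states: Sequence[Hashable], y_states: Sequence[Hashable],
                     ess: float = 1.0) -> float:
    """log P(data | X -> Y) - log P(data | X, Y independent), with BDeu-style priors."""
    cy = Counter(ys)
    joint = Counter(zip(xs, ys))
    # The marginal term for X is identical in both networks, so it cancels out.
    independent = log_marginal([cy[y] for y in y_states], ess / len(y_states))
    cell_alpha = ess / (len(x_states) * len(y_states))
    dependent = sum(
        log_marginal([joint[(x, y)] for y in y_states], cell_alpha) for x in x_states
    )
    return dependent - independent
```

With perfectly coupled binary data the factor is strongly positive. With perfectly balanced, unrelated data it is negative, because the extra parameters of the dependent model are penalised.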


2020 ◽  
Author(s):  
Joeri van Strien ◽  
Alexander Haupt ◽  
Uwe Schulte ◽  
Hans-Peter Braun ◽  
Alfredo Cabrero-Orefice ◽  
...  

Complexome profiling is an emerging 'omics approach that systematically interrogates the composition of protein complexes (the complexome) of a sample by combining biochemical separation of native protein complexes with mass-spectrometry-based quantitative proteomics. The resulting fractionation profiles hold comprehensive information on the abundance and composition of the complexome and have high potential for reuse by experimental and computational researchers. However, the lack of a central resource that provides access to these data, reported with adequate descriptions and an analysis tool, has limited their reuse. We therefore established the ComplexomE profiling DAta Resource (CEDAR, www3.cmbi.umcn.nl/cedar/), an openly accessible database for depositing and exploring mass spectrometry data from complexome profiling studies. Compatibility and reusability of the data are ensured by a standardized data and reporting format containing the "minimum information required for a complexome profiling experiment" (MIACE). The data can be accessed through a user-friendly web interface, as well as programmatically through the REST API portal. Additionally, all complexome profiles available on CEDAR can be inspected directly on the website with the profile viewer tool, which allows the detection of correlated profiles and the inference of potential complexes. In conclusion, CEDAR is a unique, growing and invaluable resource for the study of protein complex composition and dynamics across biological systems.
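Detecting correlated profiles amounts to comparing fractionation profiles pairwise, protein by protein. The abstract does not say which similarity measure CEDAR's profile viewer uses. The sketch below assumes Pearson correlation and an arbitrary 0.9 threshold. The protein identifiers in the usage note are placeholders.

```python
from itertools import combinations
from math import sqrt
from typing import Mapping, Sequence

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation of two equal-length profiles; 0.0 if either profile is flat."""
    if len(a) != len(b) or not a:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a flat profile carries no co-migration signal
    return cov / (norm_a * norm_b)

def correlated_pairs(profiles: Mapping[str, Sequence[float]],
                     threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """All protein pairs whose fractionation profiles correlate at or above `threshold`."""
    pairs = []
    for (p, a), (q, b) in combinations(sorted(profiles.items()), 2):
        r = pearson(a, b)
        if r >= threshold:
            pairs.append((p, q, r))
    return pairs
```

For example, `correlated_pairs({"P1": [0, 1, 5, 1, 0], "P2": [0, 2, 10, 2, 0], "P3": [5, 1, 0, 1, 5]})` keeps only the P1/P2 pair. Those two profiles peak in the same fraction and differ only in scale.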


2017 ◽  
Author(s):  
Amir I. Mina ◽  
Raymond A. LeClair ◽  
Katherine B. LeClair ◽  
David E. Cohen ◽  
Louise Lantier ◽  
...  

We report a web-based tool for the analysis of indirect calorimetry experiments, which measure physiological energy balance. CalR easily imports raw data files, generates plots, and determines the most appropriate statistical tests for interpretation. Analysis with the general linear model (which includes ANOVA and ANCOVA) provides flexibility in interpreting experiments on obesity and thermogenesis. Users may also produce standardized output files of an experiment, which can be shared and subsequently re-evaluated using CalR. This framework will provide the transparency necessary to enhance consistency and reproducibility in experiments of energy expenditure. CalR analysis software will greatly increase the speed and efficiency with which metabolic experiments can be organized, analyzed according to accepted norms, and reproduced, and will likely become a standard tool for the field. CalR is accessible at https://CalR.bwh.harvard.edu.
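Indirect calorimetry infers energy expenditure from gas exchange. The abstract does not state which equation CalR applies. This sketch assumes the abbreviated Weir equation with the urinary nitrogen term omitted. It also assumes gas volumes in mL/h, a common convention for rodent metabolic cages. The respiratory exchange ratio (RER) is reported alongside.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class GasExchange:
    vo2_ml_per_h: float   # oxygen consumption (assumed unit: mL/h)
    vco2_ml_per_h: float  # carbon dioxide production (assumed unit: mL/h)

def energy_expenditure_kcal_per_h(sample: GasExchange) -> float:
    """Abbreviated Weir equation: kcal = 3.941 * VO2(L) + 1.106 * VCO2(L)."""
    vo2_l = sample.vo2_ml_per_h / 1000.0
    vco2_l = sample.vco2_ml_per_h / 1000.0
    return 3.941 * vo2_l + 1.106 * vco2_l

def respiratory_exchange_ratio(sample: GasExchange) -> float:
    """RER = VCO2 / VO2; about 0.7 indicates fat oxidation, about 1.0 carbohydrate oxidation."""
    if sample.vo2_ml_per_h <= 0:
        raise ValueError("VO2 must be positive")
    return sample.vco2_ml_per_h / sample.vo2_ml_per_h
```

A sample with VO2 = 100 mL/h and VCO2 = 80 mL/h gives an RER of 0.8 and roughly 0.483 kcal/h.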


2020 ◽  
Vol 36 (2) ◽  
pp. 28-32
Author(s):  
Vinay Kumar ◽  
Sarita Rani ◽  
Ram Niwas ◽  
O.P. Sheoran ◽  
Komal Malik

In biological and field experiments, the Augmented Randomized Complete Block Design (ARCBD) is widely used for screening and selecting large numbers of germplasm lines/varieties/entries/test treatments. In this design, test treatments are not replicated, while control treatments are replicated to estimate the experimental error. A web-based online module for the analysis of ARCBD was developed using the scripting language Active Server Pages (ASP), based on a client-server architecture. Test data were taken from Federer (1956), and the outputs produced by the module agree with those generated by the SAS package. An attempt was made to provide a user-friendly interface for entering/pasting the data, character names, number of observations and number of characters for the analysis of an augmented randomized complete block design. The module produces output tables such as the check × block table, block effects, control means and control effects, adjusted means for test genotypes and genotypic effects. It also computes sums of squares in the analysis of variance tables after ignoring/eliminating treatments and eliminating/ignoring blocks, for block and treatment effects respectively. A critical difference table for comparing mean differences at the 5% and 1% levels of significance is also given. A complete procedure is provided in the help file to guide users through the analysis of the design.
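The adjusted means for test genotypes rest on block effects estimated from the replicated checks. Below is a minimal least-squares sketch. It assumes that every check appears exactly once in every block, so the unreplicated test entries carry no information about blocks. It also assumes that block effects are constrained to sum to zero. Under those assumptions, each block effect is the block's check mean minus the grand check mean, and each test value is corrected by the effect of its block. This is not the ASP module's code; the data layout and names are hypothetical.

```python
from statistics import mean
from typing import Mapping, Sequence

def block_effects(check_yields: Mapping[int, Sequence[float]]) -> dict[int, float]:
    """check_yields[block] = yields of all checks in that block (same check set in every block)."""
    grand_check_mean = mean(y for ys in check_yields.values() for y in ys)
    return {block: mean(ys) - grand_check_mean for block, ys in check_yields.items()}

def adjusted_test_means(test_obs: Sequence[tuple[str, int, float]],
                        check_yields: Mapping[int, Sequence[float]]) -> dict[str, float]:
    """test_obs holds (entry, block, observed value); each test entry occurs once."""
    effects = block_effects(check_yields)
    # Remove the estimated block effect so entries from different blocks are comparable.
    return {entry: value - effects[block] for entry, block, value in test_obs}
```

Take two blocks whose checks average 11 and 15 (grand check mean 13). A test entry observed at 20 in the poorer block is adjusted up to 22. The same observation in the better block is adjusted down to 18.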


2021 ◽  
Author(s):  
Mehmet Akdel ◽  
Douglas EV Pires ◽  
Eduard Porta-Pardo ◽  
Jurgen Janes ◽  
Arthur O Zalevsky ◽  
...  

Most proteins fold into 3D structures that determine how they function and orchestrate the biological processes of the cell. Recent developments in computational methods have led to protein structure predictions that have reached the accuracy of experimentally determined models. While this has been independently verified, the implementation of these methods across structural biology applications remains to be tested. Here, we evaluate the use of AlphaFold 2 (AF2) predictions in the study of characteristic structural elements; the impact of missense variants; function and ligand binding site predictions; modelling of interactions; and modelling of experimental structural data. For 11 proteomes, an average of 25% additional residues can be confidently modelled when compared to homology modelling, identifying structural features rarely seen in the PDB. AF2-based predictions of protein disorder and protein complexes surpass state-of-the-art tools, and AF2 models can be used across diverse applications as effectively as experimentally determined structures when the confidence metrics are critically considered. In summary, we find that these advances are likely to have a transformative impact in structural biology and broader life science research.
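Critically considering the confidence metrics usually starts from pLDDT. AlphaFold writes this per-residue score into the B-factor column of its PDB-format models. The sketch counts the fraction of residues whose Cα pLDDT reaches a cutoff. The cutoff of 70 is an assumption, not necessarily the threshold used in this study. It is the lower bound of the "confident" band used by the AlphaFold Protein Structure Database.

```python
from typing import Iterable

def plddt_per_residue(pdb_lines: Iterable[str]) -> dict[tuple[str, int], float]:
    """Map (chain, residue number) to the pLDDT stored in the B-factor column of the CA atom."""
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain, res_seq = line[21], int(line[22:26])
            scores[(chain, res_seq)] = float(line[60:66])  # B-factor, columns 61-66
    return scores

def confident_fraction(scores: dict[tuple[str, int], float], cutoff: float = 70.0) -> float:
    """Fraction of residues with pLDDT >= cutoff (0.0 for an empty model)."""
    if not scores:
        return 0.0
    return sum(score >= cutoff for score in scores.values()) / len(scores)
```

An open file handle can be passed directly, e.g. `confident_fraction(plddt_per_residue(open(path)))`, where `path` is a hypothetical local model file.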


2021 ◽  
Author(s):  
Ian Kotthoff ◽  
Petras Kundrotas ◽  
Ilya Vakser

Membrane proteins play an essential role in cellular mechanisms. Despite that, and despite major progress in experimental structure determination, they are still significantly underrepresented in the Protein Data Bank. Thus, computational approaches to protein structure determination, which are important in general, are especially valuable for membrane proteins and protein-protein assemblies. For a number of reasons, not least the much greater availability of structural data, structure prediction techniques have focused mainly on soluble proteins. Structure prediction of protein-protein complexes is a well-developed field of study. However, because of the different physicochemical environment in the membrane and the spatial constraints it imposes, generic protein-protein docking approaches are not optimal for membrane proteins. Thus, specialized computational methods for docking membrane proteins must be developed. Developing and benchmarking such methods requires high-quality datasets of membrane protein-protein complexes. In this study we present a new dataset of 456 non-redundant alpha-helical binary complexes. The set is significantly larger and more representative than previously developed ones. In the future, this set will become the basis for developing docking and scoring benchmarks, similar to those developed for soluble proteins in the DOCKGROUND resource http://dockground.compbio.ku.edu.


2021 ◽  
Vol 22 (21) ◽  
pp. 11627
Author(s):  
Narcis Fernandez-Fuentes ◽  
Ruben Molina ◽  
Baldo Oliva

The angiotensin-converting enzyme 2 (ACE2) is the receptor used by the SARS-CoV and SARS-CoV-2 coronaviruses to attach to cells via the receptor-binding domain (RBD) of their viral spike protein. Since the start of the COVID-19 pandemic, several structures of protein complexes involving ACE2 and RBD, as well as monoclonal antibodies and nanobodies, have become available. We have leveraged these structural data to design peptides targeting three surfaces: the interaction between the RBD of SARS-CoV-2 and ACE2; the interaction between the RBD of SARS-CoV and ACE2, as a contrasting example; and the dimerization surface of ACE2 monomers. The peptides were modelled using our original method, PiPreD, which takes native elements of the interaction between the targeted protein and its cognate partner(s) and includes them in the designed peptides. These peptides recapitulate stretches of residues present in the native interface, plus novel and highly diverse conformations that act as surrogates for key interactions at the interface. To facilitate access to this information, we have created a freely available, dedicated web-based repository, the PepI-Covid19 database. It gives the scientific community convenient access to this wealth of information, with a view to maximizing its potential impact on the development of novel therapeutic and diagnostic agents.


2021 ◽  
Author(s):  
Ruihan Zhang ◽  
Shoupeng Ren ◽  
Qi Dai ◽  
Tianze Shen ◽  
Xiaoli Li ◽  
...  

Natural products (NPs) are a valuable source for anti-inflammatory drug discovery. However, their use is limited by the unpredictability of their structures and functions. Computational and data-driven pre-evaluation could therefore enable more efficient NP-inspired drug development. Since NPs possess structural features that differ from those of synthetic compounds, models trained on synthetic compounds may not perform well on NPs. There is also an urgent demand for well-curated databases and user-friendly predictive tools. We present a comprehensive online web platform (InflamNat, http://www.inflamnat.com/ or http://39.104.56.4/) for anti-inflammatory natural product research. InflamNat is a database containing the physicochemical properties, cellular anti-inflammatory bioactivities, and molecular targets of 1351 NPs that have been tested for anti-inflammatory activity. InflamNat provides two machine learning-based predictive tools specifically designed for NPs. The first predicts the anti-inflammatory activity of NPs. The second predicts compound-target relationships for compounds and targets that are collected in the database but lack existing relationship data. A novel multi-tokenization transformer model (MTT) was proposed as the sequential encoder for both predictive tools to obtain a high-quality representation of sequential data. Experimental results demonstrated that the proposed predictive tools achieved the desired performance in terms of AUC.
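AUC here presumably refers to the area under the ROC curve. The sketch below is a reference point only, not InflamNat's evaluation code. It computes ROC AUC through its rank interpretation: the probability that a randomly chosen active compound scores above a randomly chosen inactive one, with ties counted as one half. Labels are assumed to be 1 (active) and 0 (inactive).

```python
from typing import Sequence

def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties counting 1/2."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # O(P*N) pairwise comparison; sorting-based rank formulas are faster for large sets.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

A model that ranks every active above every inactive scores 1.0. A model whose ranking is no better than chance scores about 0.5.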

