The importance of residue-level filtering, and the Top2018 best-parts dataset of high-quality protein residues

2021 ◽  
Author(s):  
Christopher J. Williams ◽  
David C. Richardson ◽  
Jane S. Richardson

Abstract
We have curated a high-quality, “best parts” reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron-density support for a physically acceptable model conformation. The resulting pre-filtered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the total set of files was selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Each residue was then critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 pre-filtered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue-filtered on mainchain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at 4 different sequence-identity levels. Here, we illustrate both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore the open distribution of these very large, pre-filtered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.
The Top2018 dataset provides the first representative sample of 3D protein structure for which excellence of experimental data constrains the detailed local conformation to be correct for essentially all 3 million residues included. Earlier generations of residue-filtered datasets were central in developing the MolProbity validation used worldwide, and now Zenodo has enabled anyone to use our latest version as a sound basis for structural bioinformatics, protein design, prediction, improving biomedically important structures, or other applications.
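The residue-level filtering described above can be sketched as a simple per-residue predicate over local quality metrics. The field names and thresholds below are illustrative assumptions for the sketch, not the actual Top2018 criteria:

```python
from dataclasses import dataclass

@dataclass
class Residue:
    name: str
    density_fit: float      # local map support, e.g. real-space correlation (0..1)
    worst_clash: float      # worst steric overlap with another atom, in Angstroms
    geometry_outliers: int  # count of covalent bond/angle outliers

def keep_residue(r: Residue,
                 min_density_fit: float = 0.7,
                 max_clash: float = 0.4) -> bool:
    """Illustrative residue filter: keep only residues whose map support
    and covalent geometry pass every threshold (values are assumptions)."""
    return (r.density_fit >= min_density_fit
            and r.worst_clash <= max_clash
            and r.geometry_outliers == 0)

chain = [
    Residue("ALA12", 0.92, 0.1, 0),
    Residue("LYS13", 0.55, 0.6, 1),  # poor density and a clash: dropped
]
filtered = [r for r in chain if keep_residue(r)]
```

The key design point illustrated here is that filtering is applied per residue, not per chain, so a mostly excellent structure keeps its well-ordered core while its few disordered regions are excluded.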

F1000Research ◽  
2020 ◽  
Vol 9 ◽  
pp. 278
Author(s):  
Christine Orengo ◽  
Sameer Velankar ◽  
Shoshana Wodak ◽  
Vincent Zoete ◽  
Alexandre M.J.J. Bonvin ◽  
...  

Structural bioinformatics provides the scientific methods and tools to analyse, archive, validate, and present the biomolecular structure data generated by the structural biology community. It also provides an important link with the genomics community, as structural bioinformaticians also use the extensive sequence data to predict protein structures and their functional sites. A very broad and active community of structural bioinformaticians exists across Europe, and 3D-Bioinfo will establish formal platforms to address their needs and better integrate their activities and initiatives. Our mission will be to strengthen ties with the structural biology research communities in Europe, covering the life sciences as well as chemistry and physics, and to bridge the gap between these researchers in order to fully realize the potential of structural bioinformatics. Our Community will also undertake dedicated educational, training and outreach efforts to facilitate this, bringing new insights and thus facilitating the development of much-needed innovative applications, e.g. for human health and drug and protein design. Our combined efforts will be of critical importance to keep European research competitive in this respect. Here we highlight the major European contributions to the field of structural bioinformatics, the most pressing challenges remaining, and how Europe-wide interactions, enabled by ELIXIR and its platforms, will help address these challenges and coordinate structural bioinformatics resources across Europe. In particular, we present recent activities and future plans to consolidate an ELIXIR 3D-Bioinfo Community in structural bioinformatics and propose means to develop better links across the community. These include building new consortia, organising workshops to establish data standards, and seeking community agreement on benchmark data sets and strategies.
We also highlight existing and planned collaborations with other ELIXIR Communities and other European infrastructures, such as the structural biology community supported by Instruct-ERIC, with whom we have synergies and overlapping common interests.


2020 ◽  
Vol 3 (1) ◽  
pp. 5-18
Author(s):  
Max Dratwa ◽  
Christian Verger

In January 2020, the International Society for Peritoneal Dialysis published, in open access, its latest recommendations for prescribing high-quality goal-directed peritoneal dialysis. These recommendations are an important guide for medical and nursing teams in all countries. They were immediately translated into several languages to ensure the widest possible dissemination. As with previous recommendations, the Registre de Dialyse Péritonéale de Langue Française (RDPLF) carried out the translation of this text. For any reference in a publication, only the original text should be cited: International Society for Peritoneal Dialysis practice recommendations: Prescribing high-quality goal-directed peritoneal dialysis. Edwina A Brown, Peter G Blake, Neil Boudville et al. https://journals.sagepub.com/doi/10.1177/0896860819895364 On behalf of the French-speaking nephrology community, we warmly thank the ISPD for granting us permission to produce this translation. This translation adheres to the copyright of the original English version. This work is made available under the terms of the Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License.


2015 ◽  
Author(s):  
Dan Ofer ◽  
Nadav Brandes ◽  
Michal Linial

Determining residue-level protein properties, such as the sites of post-translational modifications (PTMs), is vital to understanding proteins at all levels of function. Experimental methods are costly and time-consuming, so high-confidence predictions become essential for functional knowledge at a genomic scale. Traditional computational methods based on strict rules (e.g. regular expressions) fail to annotate sites that lack substantial similarity. Thus, Machine Learning (ML) methods become fundamental in annotating proteins with unknown function. We present ASAP (Amino-acid Sequence Annotation Prediction), a universal ML framework for residue-level predictions. ASAP efficiently extracts a large set of window-based features from raw sequences. The platform also supports easy integration of external features such as secondary structure or PSSM profiles. The features are then combined to train underlying ML classifiers. We present a detailed case study in which ASAP was used to train CleavePred, a state-of-the-art predictor of protein precursor cleavage sites. Protein cleavage is a fundamental PTM shared by a wide variety of protein groups with minimal sequence similarity. Current computational methods have high false positive rates, making them suboptimal for this task. CleavePred has a simple Python API and is freely accessible via a web-based application. The high performance of ASAP on the task of precursor cleavage makes it well suited for analyzing new proteomes at a genomic scale. The tool is attractive for protein design, mass spectrometry search engines, and the discovery of new peptide hormones. In summary, we illustrate ASAP as an entry point for predicting PTMs. The approach and flexibility of the platform can easily be extended to additional residue-specific tasks. The ASAP and CleavePred source code is available at https://github.com/ddofer/asap.
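The window-based feature extraction underlying frameworks like ASAP can be illustrated with a minimal sliding-window sketch. The window size and padding character here are assumptions for illustration, not ASAP's actual defaults:

```python
def window_features(seq: str, half_window: int = 2, pad: str = "X"):
    """For each residue, return the fixed-width window of residues around it.
    Sequence ends are padded so every position yields a full-length window,
    which downstream ML classifiers require."""
    padded = pad * half_window + seq + pad * half_window
    width = 2 * half_window + 1
    return [padded[i:i + width] for i in range(len(seq))]

# one window per residue; each window is 5 residues centred on that position
windows = window_features("MKTAYIAK", half_window=2)
```

In a real pipeline each window would then be encoded (e.g. one-hot per position) and fed to a classifier that predicts a residue-level label such as "cleavage site" or "not".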


2013 ◽  
Vol 4 (1) ◽  
pp. 1-30 ◽  
Author(s):  
Peter C. R. Lane ◽  
Fernand Gobet

Abstract
Creating robust, reproducible and optimal computational models is a key challenge for theorists in many sciences. Psychology and cognitive science face particular challenges, as large amounts of data are collected and many models are not amenable to analytical techniques for calculating parameter sets. Particular problems are to locate the full range of acceptable model parameters for a given dataset, and to confirm the consistency of model parameters across different datasets. Resolving these problems will provide a better understanding of the behaviour of computational models, and so support the development of general and robust models. In this article, we address these problems by using evolutionary algorithms to develop parameters for computational models against multiple sets of experimental data; in particular, we propose the ‘speciated non-dominated sorting genetic algorithm’ for evolving models in several theories. We discuss the problem of developing a model of categorisation using twenty-nine sets of data and models drawn from four different theories. We find that the evolutionary algorithms generate high-quality models, adapted to provide a good fit to all available data.
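The core idea of evolving model parameters against multiple datasets can be illustrated with a minimal (1+1) evolution strategy. This toy example fits a linear model to two datasets at once; it is a deliberately simplified stand-in, not the speciated non-dominated sorting genetic algorithm the article proposes:

```python
import random

def fitness(params, datasets):
    """Sum of squared errors of a toy linear model y = a*x + b,
    accumulated across every dataset (model and data are illustrative)."""
    a, b = params
    return sum((a * x + b - y) ** 2 for data in datasets for x, y in data)

def evolve(datasets, generations=2000, sigma=0.1, seed=0):
    """Minimal (1+1) evolution strategy: mutate the current best parameter
    set with Gaussian noise and keep the mutant only if it fits better."""
    rng = random.Random(seed)
    best = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    best_fit = fitness(best, datasets)
    for _ in range(generations):
        cand = tuple(p + rng.gauss(0, sigma) for p in best)
        cand_fit = fitness(cand, datasets)
        if cand_fit < best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

# two "experimental" datasets generated from the same rule y = 2x + 1
data1 = [(x, 2 * x + 1) for x in range(5)]
data2 = [(x, 2 * x + 1) for x in range(-3, 3)]
params, err = evolve([data1, data2])
```

Summing the error over all datasets forces a single parameter set to fit every dataset, which is the consistency-across-datasets requirement the abstract describes; multi-objective variants such as NSGA instead keep a front of trade-off solutions.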


Author(s):  
Sung K. Koh ◽  
G. K. Ananthasuresh

The sequence of 20 types of amino-acid residues in the heteropolymer chain of a protein is believed to be the basis for the 3-D conformation (folded structure) that the protein assumes to serve its functions. We present a deterministic optimization method to design the sequence of a simplified model of a protein for a desired conformation. A design methodology developed for the topology optimization of compliant mechanisms is adapted here by converting the discrete combinatorial problem of protein sequence design into a continuous optimization problem. It builds upon our recent work, which used a minimum-energy criterion in a deterministic approach to protein design with continuous models. This paper focuses on the energy-gap criterion, which is argued to be one of the most important characteristics determining the stable folding of a protein chain. The concepts, methodology, and illustrative examples are presented using HP models of proteins, in which only two types of monomers (H: hydrophobic and P: polar) are considered instead of 20. The highlight of the method presented in this paper is the drastic reduction in computational cost.
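In the HP model mentioned above, the energy of a conformation is conventionally the count of non-bonded H-H contacts, each contributing -1. A minimal 2-D square-lattice sketch (the standard textbook convention, which may differ in detail from the paper's formulation):

```python
def hp_energy(sequence, coords):
    """Energy of an HP-model conformation on a 2-D square lattice:
    each pair of H residues that are lattice neighbours but not chain
    neighbours contributes -1 (standard HP convention)."""
    energy = 0
    for i in range(len(sequence)):
        for j in range(i + 2, len(sequence)):  # j >= i+2 skips chain bonds
            if sequence[i] == "H" and sequence[j] == "H":
                (xi, yi), (xj, yj) = coords[i], coords[j]
                if abs(xi - xj) + abs(yi - yj) == 1:  # adjacent lattice sites
                    energy -= 1
    return energy

# a 4-residue chain H-P-P-H folded into a unit square:
# the two H ends become lattice neighbours, giving one H-H contact
seq = "HPPH"
fold = [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Sequence design then searches for an H/P assignment whose desired conformation has both low energy and a large energy gap to competing conformations; the continuous relaxation in the paper replaces the discrete H/P choice at each site with a continuous variable.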


Biotechnology ◽  
2019 ◽  
pp. 322-343
Author(s):  
Dariusz Mrozek

Bioinformatics as a scientific domain develops tools that help make sense of the wealth of information hidden in huge volumes of biological data. However, several problems in bioinformatics, although already solved or at least equipped with promising algorithms, still require huge computing power in order to be completed in a reasonable time. Cloud computing responds to these demands. This chapter shows several cloud-based computing architectures for solving hot issues in structural bioinformatics, such as protein structure similarity searching or 3D protein structure prediction. The presented architectures have been implemented in the Microsoft Azure public cloud and tested in several projects developed by the Cloud4Proteins research group.


