Predicting residue solvent accessibility from protein sequence by considering the sequence environment

2000 ◽  
Vol 13 (9) ◽  
pp. 607-609 ◽  
Author(s):  
O. Carugo
2012 ◽  
Vol 19 (1) ◽  
pp. 50-56 ◽  
Author(s):  
Ganesan Pugalenthi ◽  
Krishna Kumar Kandaswamy ◽  
Kuo-Chen Chou ◽  
Saravanan Vivekanandan ◽  
Prasanna Kolatkar

2013 ◽  
Author(s):  
Xin Deng

Protein sequence and profile alignment is essential to most bioinformatics tasks, such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods that do not use structural information, and that it performed comparably to, with only slightly lower scores than, the method that uses structural features and additional homologous sequences. We also developed HHpacom, a new profile-profile pairwise alignment method that integrates secondary structure, solvent accessibility, torsion angle, and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility, and torsion angle information significantly improved alignment accuracy in comparison with the state-of-the-art methods HHsearch and HHsuite. The evolutionary constraint information did help in some cases, especially for alignments of short proteins, typically 100 to 500 residues long. Protein model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods that take the query-template alignment as input. The assessment results illustrated that this could help improve model selection, protein structure prediction, and many other bioinformatics problems. Moreover, we developed a protein tertiary structure prediction pipeline, many components of which were built in our group’s MULTICOM system. MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.


2019 ◽  
Author(s):  
Pathmanaban Ramasamy ◽  
Demet Turan ◽  
Natalia Tichshenko ◽  
Niels Hulstaert ◽  
Elien Vandermarliere ◽  
...  

Abstract Protein phosphorylation is a key post-translational modification (PTM) in many biological processes and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation and functional analysis of phosphosites is therefore crucial to understanding their various roles. Phosphosites (P-sites) are mainly analysed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosphoproteomics data (PRIDE) to annotate all known human phosphosites. Furthermore, these sites are put into biophysical context by annotating each phosphoprotein with per-residue structural propensity, solvent accessibility, disorder probability, and early folding information. Scop3P, available at https://iomics.ugent.be/scop3p, presents a unique resource for visualization and analysis of phosphosites, and for understanding phosphosite structure-function relationships.


2014 ◽  
Vol 2014 ◽  
pp. 1-10 ◽  
Author(s):  
So-Wei Yeh ◽  
Tsun-Tsao Huang ◽  
Jen-Wei Liu ◽  
Sung-Huan Yu ◽  
Chien-Hua Shih ◽  
...  

Functional and biophysical constraints result in site-dependent patterns of protein sequence variability. It is commonly assumed that the key structural determinant of site-specific rates of evolution is the Relative Solvent Accessibility (RSA). However, a recent study found that amino acid substitution rates correlate better with two Local Packing Density (LPD) measures, the Weighted Contact Number (WCN) and the Contact Number (CN), than with RSA. This work aims at a more thorough assessment. To this end, in addition to substitution rates, we considered four other sequence variability scores, four measures of solvent accessibility (SA), and other CN measures. We compared all properties for each protein of a structurally and functionally diverse representative dataset of monomeric enzymes. We show that the best sequence variability measures take into account phylogenetic tree topology. More importantly, we show that both LPD measures (WCN and CN) correlate better than all of the SA measures, regardless of the sequence variability score used. Moreover, the independent contribution of the best LPD measure is approximately four times larger than that of the best SA measure. This study strongly supports the conclusion that a site’s packing density rather than its solvent accessibility is the main structural determinant of its rate of evolution.
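
The two local packing density measures named in this abstract can be illustrated with a minimal Python sketch. It assumes the commonly used definitions: CN counts the other sites within a distance cutoff, and WCN sums the inverse squared distances to all other sites. The abstract does not state the cutoff or the exact formulas, so the 13 Å default, the function names, and the use of one representative coordinate per residue are illustrative assumptions rather than the authors' settings.

```python
import math

def contact_number(coords, cutoff=13.0):
    """CN: for each site, count the other sites within `cutoff` angstroms.

    The default cutoff is an assumption; the abstract does not give one.
    """
    return [
        sum(1 for j, b in enumerate(coords) if j != i and math.dist(a, b) <= cutoff)
        for i, a in enumerate(coords)
    ]

def weighted_contact_number(coords):
    """WCN: for each site, sum 1 / r^2 over all other sites (assumed definition)."""
    return [
        sum(1.0 / math.dist(a, b) ** 2 for j, b in enumerate(coords) if j != i)
        for i, a in enumerate(coords)
    ]

# Toy input: four sites spaced 4 angstroms apart on a line (not real protein data).
toy_coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
cn = contact_number(toy_coords, cutoff=5.0)
wcn = weighted_contact_number(toy_coords)
```

In this toy chain the interior sites have higher CN and WCN than the terminal ones, which is the sense in which both measures capture how densely a site is packed.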


2019 ◽  
Vol 36 (1) ◽  
pp. 136-144 ◽  
Author(s):  
Peng Xiong ◽  
Xiuhong Hu ◽  
Bin Huang ◽  
Jiahai Zhang ◽  
Quan Chen ◽  
...  

Abstract Motivation The ABACUS (a backbone-based amino acid usage survey) method uses unique statistical energy functions to carry out protein sequence design. Although some of its results have been experimentally verified, its accuracy can still be improved because several important components of the method have not been specifically optimized for sequence design or in the context of other parts of the method. The computational efficiency also needs to be improved to support interactive online applications or the consideration of a large number of alternative backbone structures. Results We derived a model to measure solvent accessibility with larger mutual information with residue types than previous models, optimized a set of rotamers which can approximate the sidechain atomic positions more accurately, and devised an empirical function to treat inter-atomic packing with parameters fitted to native structures and optimized for consistency with the rotamer set. Energy calculations have been accelerated by interpolation between pre-determined representative points in high-dimensional structural feature spaces. Sidechain repacking tests showed that ABACUS2 can accurately reproduce the conformation of native sidechains. In sequence design tests, the native residue type recovery rate reached 37.7%, exceeding the value of 32.7% for ABACUS1. Applying ABACUS2 to design sequences on three native backbones produced proteins shown to be well-folded by experiments. Availability and implementation The ABACUS2 sequence design server can be visited at http://biocomp.ustc.edu.cn/servers/abacus-design.php. Supplementary information Supplementary data are available at Bioinformatics online.
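
The native residue type recovery rate quoted above is usually understood as the fraction of positions at which the designed sequence has the same residue type as the native one. The Python sketch below illustrates that reading only. The function name and the toy sequences are hypothetical, and the abstract does not say how ABACUS2 aggregates the rate over positions or over proteins.

```python
def native_recovery_rate(native, designed):
    """Fraction of aligned positions where the designed residue equals the native one.

    Assumes both sequences are the same length and already position-matched.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must not be empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)

# Toy sequences (hypothetical): the design differs from the native at two of ten positions.
rate = native_recovery_rate("ACDEFGHIKL", "ACDEYGHIKV")
```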


Author(s):  
Farisa T S ◽  
Elizabeth Isaac

Proteins and DNA play vital roles in biological processes. To predict DNA-binding proteins (DBPs) accurately, a new sequence-based prediction method is developed; a sequence-based method takes only protein sequence information as input. First, a reliable benchmark data set is built from the Protein Data Bank. Second, Amino Acid Composition (AAC), Position Specific Scoring Matrix (PSSM), Predicted Solvent Accessibility (PSA), and Predicted Probabilities of DNA-Binding Sites (PDBS) are used to produce four specific protein sequence baseline features. The weights of these features are learned with a differential evolution algorithm. The features are then merged with their weights to create an original super feature, and TensorFlow is used to parallelize the weights. A suitable feature selection algorithm from TensorFlow’s binary classifier extracts the best subset of the weighted feature vector. After the final features are generated, the training sample set is obtained in the training process. Classification is learned with a support vector machine and TensorFlow, and the output is measured using a tensor surface. The decision is based on a likelihood threshold: a protein with above-threshold probability is considered a DBP, and all others are non-DBPs.
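
Two steps in this description can be sketched in a few lines of Python: merging the four feature groups into one weighted "super feature", and turning a predicted probability into a label. The sketch assumes one scalar weight per feature group and simple concatenation, and it takes the weights as given rather than learning them with differential evolution. The function names, the 0.5 default threshold, and the toy values are all hypothetical and are not taken from the paper.

```python
FEATURE_GROUPS = ("AAC", "PSSM", "PSA", "PDBS")

def merge_weighted(features, weights):
    """Scale each feature group by its weight and concatenate in a fixed order.

    Assumes one scalar weight per group; the source does not specify the scheme.
    """
    merged = []
    for name in FEATURE_GROUPS:
        merged.extend(weights[name] * value for value in features[name])
    return merged

def label_from_probability(probability, threshold=0.5):
    """Call a protein a DBP only if its probability is strictly above the threshold."""
    return "DBP" if probability > threshold else "non-DBP"

# Toy values (hypothetical), far shorter than real feature vectors.
toy_features = {"AAC": [0.1, 0.2], "PSSM": [1.0], "PSA": [0.5], "PDBS": [0.25]}
toy_weights = {"AAC": 2.0, "PSSM": 0.5, "PSA": 1.0, "PDBS": 4.0}
super_feature = merge_weighted(toy_features, toy_weights)
```

In a full pipeline the merged vector would go through feature selection and into the classifier, whose output probability would then be passed to `label_from_probability`.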


2021 ◽  
Author(s):  
Michael Bernhofer ◽  
Christian Dallago ◽  
Tim Karl ◽  
Venkata Satagopam ◽  
Michael Heinzinger ◽  
...  

Abstract Since 1992, PredictProtein (https://predictprotein.org) has been a one-stop online resource for protein sequence analysis, with its main site hosted at the Luxembourg Centre for Systems Biomedicine (LCSB) and queried monthly by over 3,000 users in 2020. PredictProtein was the first Internet server for protein predictions. It pioneered combining evolutionary information and machine learning. Given a protein sequence as input, the server outputs multiple sequence alignments, predictions of protein structure in 1D and 2D (secondary structure, solvent accessibility, transmembrane segments, disordered regions, protein flexibility, and disulfide bridges) and predictions of protein function (functional effects of sequence variation or point mutations, Gene Ontology (GO) terms, subcellular localization, and protein-, RNA-, and DNA binding). PredictProtein’s infrastructure has moved to the LCSB, increasing throughput; the use of MMseqs2 sequence search reduced runtime five-fold; user interface elements improved usability; and new prediction methods were added. PredictProtein recently included predictions from deep learning embeddings (GO and secondary structure) and a method for the prediction of proteins and residues binding DNA, RNA, or other proteins. PredictProtein.org aspires to provide reliable predictions to computational and experimental biologists alike. All scripts and methods are freely available for offline execution in high-throughput settings. Availability Freely accessible webserver: PredictProtein.org; source and docker images: github.com/rostlab


Author(s):  
Arundhati Banerjee ◽  
Sujay Ray

A computationally optimized molecular analysis of the cell-fate regulation that begins in embryonic development is one of the unexplored areas of the human neurogenic field. This regulation is governed by the interaction of the SOX11 (Sex determining regions-Y bOX-11) protein domain with DNA. In the present study, a 3D monomer of the responsible domain of SOX11 was constructed, simulated and analyzed. Residues involved in the DNA interaction were examined. The observed conserved residues, Arg3 and Arg16, in the wild-type SOX11-DNA interaction were mutated to Ala3 and Ala16. The mutated SOX11-HMG protein sequence was re-modeled and optimized. Residue-level alteration of the DNA interaction was examined. On mutation, the stability of the proteins (on DNA interaction) and of the protein-DNA complexes was assessed via energy-calculating parameters, solvent-accessible area, electrostatic surface potential and conformational switching, with supporting statistical significance. This study therefore offers an outlook on how mutations may let SOX11 interact more firmly with DNA and thereby perform cell-fate determination more efficiently.

