Computational Methods for Protein Fold Prediction: an Ab-initio Topological Approach

Author(s):  
G. Ceci ◽  
A. Mucherino ◽  
M. D’Apuzzo ◽  
D. Di Serafino ◽  
S. Costantini ◽  
...  
2020 ◽  
Vol 10 (5) ◽  
pp. 6306-6316

Protein fold prediction is a milestone step towards predicting protein tertiary structure from protein sequence. It is one of the most researched topics in computational biology, with applications in structural biology and medicine. Extracting sensitive features is a key step in protein fold prediction. The actionable features are extracted from keywords in the sequence header and from secondary structure representations of the protein sequence. Keywords holding species information are used as features after verification against the UniRef100 dataset using TaxId. Prominent patterns are identified experimentally based on the nature of the protein structural class and protein fold, and global and native features are extracted to capture these patterns. Keyword-based features are found to correlate positively with protein folds. Keywords indicating species are important for observing functional differences, which help guide the prediction process. The SCOPe 2.07 and EDD datasets are used: EDD is a benchmark dataset, and SCOPe 2.07 is the latest and largest dataset holding ASTRAL protein sequences. A Random Forest classifier is trained on the SCOPe 2.07 training set using a 93-dimensional feature vector. The prediction accuracy on the SCOPe 2.07 test set is better than 95%. The accuracy achieved on the EDD benchmark dataset is better than 93%, which is the best reported to our knowledge.
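As a rough illustration of what "global" features of a secondary structure representation might look like, the sketch below computes state fractions and segment counts from a three-state string. The H/E/C alphabet, the function name `ss_global_features`, and the choice of features are assumptions made for this example; they are not the 93 features used in the abstract above.

```python
from itertools import groupby


def ss_global_features(ss):
    """Return state fractions and segment counts for an H/E/C string.

    Hypothetical helper: H = helix, E = strand, C = coil is an assumed
    three-state encoding, not necessarily the one used by the authors.
    """
    if not ss:
        raise ValueError("empty secondary-structure string")
    states = "HEC"
    n = len(ss)
    # Fraction of residues in each state.
    features = {f"frac_{s}": ss.count(s) / n for s in states}
    # Collapse runs of identical states into one entry per segment.
    segments = [state for state, _ in groupby(ss)]
    for s in states:
        features[f"segments_{s}"] = segments.count(s)
    return features


# Made-up example string, for illustration only.
example = ss_global_features("CCHHHHCCEEEECCHHHH")
```

A feature dictionary of this kind would be flattened into a fixed-length vector before being passed to a classifier such as a random forest.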


Author(s):  
S. Kaur ◽  
J. Gomez-Blanco ◽  
A. Khalifa ◽  
S. Adinarayanan ◽  
R. Sanchez-Garcia ◽  
...  

Abstract: Cryo-electron microscopy (cryo-EM) maps usually show heterogeneous distributions of B-factors and electron density occupancies and are typically B-factor sharpened to improve their contrast and interpretability at high resolution. However, ‘over-sharpening’ due to the application of a single global B-factor can distort processed maps, causing connected densities to appear broken and disconnected. This issue limits the interpretability of cryo-EM maps, in particular for ab initio modelling. In this work, we propose 1) approaches to enhance high-resolution features of cryo-EM maps while preventing map distortions and 2) methods to obtain local B-factors and electron density occupancy maps. These algorithms share the use of the spiral phase transformation and are called LocSpiral, LocBSharpen, LocBFactor and LocOccupancy. Our results, which include improved maps of recent SARS-CoV-2 structures, show that our methods can improve the interpretability and analysis of the obtained reconstructions.
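For orientation, global B-factor sharpening is conventionally written as a resolution-dependent reweighting of the map's Fourier coefficients. The expression below is the standard textbook form, not a formula taken from the abstract; the symbols $F$, $s$ and $B_{\mathrm{sharp}}$ are introduced here for illustration.

```latex
% Standard global B-factor sharpening (assumed notation):
%   F(\mathbf{s})     Fourier coefficient of the unsharpened map
%   s = |\mathbf{s}|  spatial frequency (1/resolution, in 1/\AA)
%   B_{sharp} < 0     a single sharpening B-factor (in \AA^2) for the whole map
\[
  F_{\mathrm{sharp}}(\mathbf{s}) = F(\mathbf{s})\,
  \exp\!\left(-\frac{B_{\mathrm{sharp}}\, s^{2}}{4}\right)
\]
```

Because $B_{\mathrm{sharp}}$ is negative, the exponential grows with $s$ and amplifies high-resolution terms everywhere by the same factor, which is how a single global value can over-sharpen regions whose local B-factor differs from the map average.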


2011 ◽  
Vol 8 (1) ◽  
pp. 66-77 ◽  
Author(s):  
Tabrez Anwar Shamim Mohammad ◽  
Hampapathalu Adimurthy Nagarajaram

Summary: Fold recognition, the assignment of novel proteins to known structures, forms an important component of the overall protein structure discovery process. The available methods for protein fold recognition are limited by low fold-coverage and/or low prediction accuracy. We describe here a new Support Vector Machine (SVM)-based method for protein fold prediction with high prediction accuracy and high fold-coverage. The method was developed by training and testing on a large number of folds in order to make it suitable for large-scale fold prediction. However, the presence of a large number of folds in the training set made the classification task difficult, as a consequence of the increased complexity involved in the binary classifications of SVMs. To overcome this complexity, we adopted a hierarchical approach in which fold prediction is made in two steps: first the structural class of the query is predicted, and then the fold is predicted within the predicted structural class. This decreased the complexity of the classification problem and also improved the overall fold prediction accuracy. To the best of our knowledge, this is the first taxonomic fold recognition method to cover over 700 protein folds, and it gives a prediction accuracy of around 70% on a benchmark dataset. Since the new method gives state-of-the-art prediction performance, it can be very useful for the structural characterization of proteins discovered in various genomes.
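The two-step scheme described above can be sketched in a few lines. To stay dependency-free, this example swaps the SVMs for a toy nearest-centroid classifier, so it shows only the hierarchical structure (class first, then fold within that class), not the authors' method. The class names, feature vectors, and fold labels are all made up for illustration.

```python
from collections import defaultdict
from math import dist


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))


class NearestCentroid:
    """Toy stand-in for an SVM: predicts the label of the closest centroid."""

    def fit(self, X, y):
        groups = defaultdict(list)
        for x, label in zip(X, y):
            groups[label].append(x)
        self.centroids = {lab: centroid(vs) for lab, vs in groups.items()}
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda lab: dist(x, self.centroids[lab]))


class TwoStepFoldPredictor:
    """Step 1 predicts the structural class; step 2 predicts the fold
    using a model trained only on folds from that class."""

    def fit(self, X, classes, folds):
        self.class_model = NearestCentroid().fit(X, classes)
        by_class = defaultdict(lambda: ([], []))
        for x, c, f in zip(X, classes, folds):
            by_class[c][0].append(x)
            by_class[c][1].append(f)
        self.fold_models = {
            c: NearestCentroid().fit(xs, fs) for c, (xs, fs) in by_class.items()
        }
        return self

    def predict_one(self, x):
        c = self.class_model.predict_one(x)
        return c, self.fold_models[c].predict_one(x)


# Hypothetical 3-dimensional feature vectors and labels.
X = [(0.9, 0.1, 0.2), (0.8, 0.1, 0.9), (0.1, 0.9, 0.2), (0.2, 0.8, 0.9)]
classes = ["alpha", "alpha", "beta", "beta"]
folds = ["a.1", "a.2", "b.1", "b.2"]

model = TwoStepFoldPredictor().fit(X, classes, folds)
prediction = model.predict_one((0.85, 0.15, 0.8))  # ("alpha", "a.2")
```

Each second-stage model only has to separate the folds of one structural class, which is the reduction in complexity the summary refers to. The trade-off is that a wrong class prediction in step 1 cannot be corrected in step 2.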


2010 ◽  
Vol 11 (1) ◽  
pp. 172 ◽  
Author(s):  
Jonathan J Ellis ◽  
Fabien PE Huard ◽  
Charlotte M Deane ◽  
Sheenal Srivastava ◽  
Graham R Wood