scholarly journals SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity

2014 ◽  
Vol 30 (18) ◽  
pp. 2592-2597 ◽  
Author(s):  
C. N. Magnan ◽  
P. Baldi
2021 ◽  
Vol 11 (1) ◽  
Author(s):  
Thanh Thi Nguyen ◽  
Pubudu N. Pathirana ◽  
Thin Nguyen ◽  
Quoc Viet Hung Nguyen ◽  
Asim Bhatti ◽  
...  

AbstractSevere acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly pathogenic virus that has caused the global COVID-19 pandemic. Tracing the evolution and transmission of the virus is crucial to respond to and control the pandemic through appropriate intervention strategies. This paper reports and analyses genomic mutations in the coding regions of SARS-CoV-2 and their probable protein secondary structure and solvent accessibility changes, which are predicted using deep learning models. Prediction results suggest that mutation D614G in the virus spike protein, which has attracted much attention from researchers, is unlikely to make changes in protein secondary structure and relative solvent accessibility. Based on 6324 viral genome sequences, we create a spreadsheet dataset of point mutations that can facilitate the investigation of SARS-CoV-2 in many perspectives, especially in tracing the evolution and worldwide spread of the virus. Our analysis results also show that coding genes E, M, ORF6, ORF7a, ORF7b and ORF10 are most stable, potentially suitable to be targeted for vaccine and drug development.


2020 ◽  
Author(s):  
Thanh Thi Nguyen ◽  
Pubudu N. Pathirana ◽  
Thin Nguyen ◽  
Henry Nguyen ◽  
Asim Bhatti ◽  
...  

ABSTRACTSevere acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly pathogenic virus that has caused the global COVID-19 pandemic. Tracing the evolution and transmission of the virus is crucial to respond to and control the pandemic through appropriate intervention strategies. This paper reports and analyses genomic mutations in the coding regions of SARS-CoV-2 and their probable protein secondary structure and solvent accessibility changes, which are predicted using deep learning models. Prediction results suggest that mutation D614G in the virus spike protein, which has attracted much attention from researchers, is unlikely to make changes in protein secondary structure and relative solvent accessibility. Based on 6,324 viral genome sequences, we create a spreadsheet dataset of point mutations that can facilitate the investigation of SARS-CoV-2 in many perspectives, especially in tracing the evolution and worldwide spread of the virus. Our analysis results also show that coding genes E, M, ORF6, ORF7a, ORF7b and ORF10 are most stable, potentially suitable to be targeted for vaccine and drug development.


2012 ◽  
Vol 1 (1) ◽  
pp. 79-87 ◽  
Author(s):  
Shangping Wang ◽  
Harriëtte Oldenhof ◽  
Andres Hilfiker ◽  
Michael Harder ◽  
Willem F. Wolkers

2019 ◽  
Author(s):  
Larry Bliss ◽  
Ben Pascoe ◽  
Samuel K Sheppard

AbstractMotivationProtein structure predictions, that combine theoretical chemistry and bioinformatics, are an increasingly important technique in biotechnology and biomedical research, for example in the design of novel enzymes and drugs. Here, we present a new ensemble bi-layered machine learning architecture, that directly builds on ten existing pipelines providing rapid, high accuracy, 3-State secondary structure prediction of proteins.ResultsAfter training on 1348 solved protein structures, we evaluated the model with four independent datasets: JPRED4 - compiled by the authors of the successful predictor with the same name, and CASP11, CASP12 & CASP13 - assembled by the Critical Assessment of protein Structure Prediction consortium who run biannual experiments focused on objective testing of predictors. These rigorous, pre-established protocols included 7-fold cross-validation and blind testing. This led to a mean Hermes accuracy of 95.5%, significantly (p<0.05) better than the ten previously published models analysed in this paper. Furthermore, Hermes yielded a reduction in standard deviation, lower boundary outliers, and reduced dependency on solved structures of homologous proteins, as measured by NEFF score. This architecture provides advantages over other pipelines, while remaining accessible to users at any level of bioinformatics experience.Availability and ImplementationThe source code for Hermes is freely available at: https://github.com/HermesPrediction/Hermes. This page also includes the cross-validation with corresponding models, and all training/testing data presented in this study with predictions and accuracy.


2013 ◽  
Author(s):  
◽  
Xin Deng

Protein sequence and profile alignment has been used essentially in most bioinformatics tasks such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm MSACompro to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods without using the structural information and performed comparably to the method using structural features and additional homologous sequences by slightly lower scores. We also developed HHpacom, a new profile-profile pairwise alignment by integrating secondary structure, solvent accessibility, torsion angle and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility and torsion angle information significantly improved the alignment accuracy in comparison with the state of the art methods HHsearch and HHsuite. The evolutionary constraint information did help in some cases, especially the alignments of the proteins which are of short lengths, typically 100 to 500 residues. Protein Model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods taking query-template alignment as input. The assessment results illustrated that this could help improve the model selection, protein structure prediction and many other bioinformatics problems. Moreover, we also developed a protein tertiary structure prediction pipeline, of which many components were built in our group’s MULTICOM system. The MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.


Sign in / Sign up

Export Citation Format

Share Document