Genomic Mutations and Changes in Protein Secondary Structure and Solvent Accessibility of SARS-CoV-2 (COVID-19 Virus)

ABSTRACT Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly pathogenic virus that has caused the global COVID-19 pandemic. Tracing the evolution and transmission of the virus is crucial for responding to and controlling the pandemic through appropriate intervention strategies. This paper reports and analyses genomic mutations in the coding regions of SARS-CoV-2 and the probable changes they cause in protein secondary structure and solvent accessibility, which are predicted using deep learning models. The prediction results suggest that mutation D614G in the virus spike protein, which has attracted much attention from researchers, is unlikely to change protein secondary structure or relative solvent accessibility. Based on 6,324 viral genome sequences, we create a spreadsheet dataset of point mutations that can facilitate the investigation of SARS-CoV-2 from many perspectives, especially in tracing the evolution and worldwide spread of the virus. Our analysis also shows that coding genes E, M, ORF6, ORF7a, ORF7b and ORF10 are the most stable, making them potentially suitable targets for vaccine and drug development.
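As an illustration of the point-mutation notation used in the abstract (for example D614G: reference residue, 1-based position, observed residue), the following is a minimal sketch in Python. It assumes two protein sequences that are already aligned to the same length; the function name `point_mutations` and the toy sequences are hypothetical and are not taken from the paper, whose actual pipeline is not described here.

```python
def point_mutations(reference: str, sample: str) -> list[str]:
    """List substitutions such as 'D614G' between two pre-aligned protein sequences.

    Assumption: both sequences have the same length and share a coordinate system.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    mutations = []
    # Positions are 1-based, matching the usual mutation notation.
    for position, (ref_aa, alt_aa) in enumerate(zip(reference, sample), start=1):
        # Skip identical residues, alignment gaps ('-') and unknown residues ('X').
        if ref_aa != alt_aa and ref_aa not in "-X" and alt_aa not in "-X":
            mutations.append(f"{ref_aa}{position}{alt_aa}")
    return mutations


# Toy 5-residue fragments (hypothetical, not real spike protein sequence).
toy_reference = "MKDLV"
toy_sample = "MKGLV"
print(point_mutations(toy_reference, toy_sample))  # ['D3G']
```

Applied to each genome against a reference, a list like this could be written out one row per mutation to build a spreadsheet-style table.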

Genomic mutations and changes in protein secondary structure and solvent accessibility of SARS-CoV-2 (COVID-19 virus)

Scientific Reports ◽

10.1038/s41598-021-83105-3 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Thanh Thi Nguyen ◽

Pubudu N. Pathirana ◽

Thin Nguyen ◽

Quoc Viet Hung Nguyen ◽

Asim Bhatti ◽

...

Keyword(s):

Secondary Structure ◽

Solvent Accessibility ◽

Point Mutations ◽

Protein Secondary Structure ◽

Intervention Strategies ◽

Relative Solvent Accessibility ◽

Highly Pathogenic ◽

Coding Regions ◽

Pathogenic Virus ◽

And Control

SSpro/ACCpro 5: almost perfect prediction of protein secondary structure and relative solvent accessibility using profiles, machine learning and structural similarity

Bioinformatics ◽

10.1093/bioinformatics/btu352 ◽

2014 ◽

Vol 30 (18) ◽

pp. 2592-2597 ◽

Cited By ~ 188

Author(s):

C. N. Magnan ◽

P. Baldi

Keyword(s):

Machine Learning ◽

Secondary Structure ◽

Solvent Accessibility ◽

Protein Secondary Structure ◽

Structural Similarity ◽

Relative Solvent Accessibility

Porter, PaleAle 4.0: high-accuracy prediction of protein secondary structure and relative solvent accessibility

Bioinformatics ◽

10.1093/bioinformatics/btt344 ◽

2013 ◽

Vol 29 (16) ◽

pp. 2056-2058 ◽

Cited By ~ 61

Author(s):

C. Mirabello ◽

G. Pollastri

Keyword(s):

Secondary Structure ◽

Solvent Accessibility ◽

Protein Secondary Structure ◽

High Accuracy ◽

Relative Solvent Accessibility

Fast learning optimized prediction methodology for protein secondary structure prediction, relative solvent accessibility prediction and phosphorylation prediction

10.31274/etd-180810-4315 ◽

2011 ◽

Author(s):

Saraswathi Sundararajan

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Solvent Accessibility ◽

Protein Secondary Structure ◽

Relative Solvent Accessibility ◽

Protein Secondary Structure Prediction ◽

Fast Learning ◽

Solvent Accessibility Prediction

Protein secondary structure and solvent accessibility of proteins in decellularized heart valve scaffolds

Biomedical Spectroscopy and Imaging ◽

10.3233/bsi-2012-0007 ◽

2012 ◽

Vol 1 (1) ◽

pp. 79-87 ◽

Cited By ~ 3

Author(s):

Shangping Wang ◽

Harriëtte Oldenhof ◽

Andres Hilfiker ◽

Michael Harder ◽

Willem F. Wolkers

Keyword(s):

Secondary Structure ◽

Heart Valve ◽

Solvent Accessibility ◽

Protein Secondary Structure

Mutation screening of the UBE3A gene in Chinese Han population with autism

BMC Psychiatry ◽

10.1186/s12888-020-03000-5 ◽

2020 ◽

Vol 20 (1) ◽

Author(s):

Xue Zhao ◽

Ran Zhang ◽

Shunying Yu

Keyword(s):

Secondary Structure ◽

Sample Size ◽

Association Analysis ◽

Sanger Sequencing ◽

Healthy Controls ◽

Control Groups ◽

Chinese Han ◽

Han Population ◽

Coding Regions ◽

And Control

Abstract Background: The 15q11–13 region is one of the most complex chromosomal regions in the human genome. UBE3A, an important candidate gene for autism spectrum disorder (ASD), is located in the 15q11–13 region and encodes ubiquitin-protein ligase E3A. Previous studies of the UBE3A gene and ASD have shown inconsistent results, and few were performed in Chinese populations. This study aimed to detect genetic mutations of the UBE3A gene in a Chinese Han population with ASD and to analyze the genetic association between these variants and ASD. Methods: The samples consisted of 192 patients with autism diagnosed according to the DSM-IV criteria and 192 healthy controls. We searched for mutations in the coding sequence (CDS) regions of the UBE3A gene and their adjacent non-coding regions using high resolution melting (HRM) and Sanger sequencing. We then increased the sample size to validate the detected variants using HRM and conducted an association analysis between the case and control groups. Results: A known single nucleotide polymorphism (T > C, rs150331504) located in CDS4 and a known 5 bp insertion/deletion variation (AACTC+/−, rs71127053) located in the intron region 288 bp upstream of CDS2 of the UBE3A gene were detected by Sanger sequencing. The association analysis used 391 ASD samples for rs71127053, 384 ASD samples for rs150331504, and 384 healthy controls. It suggested no significant difference in the allele and genotype frequencies of rs71127053 and rs150331504 between the case and control groups after the sample size was extended. In addition, rs150331504 is a synonymous mutation, and we compared the secondary structure and minimum free energy (MFE) of mRNA harboring allele T or C of rs150331504 using RNAfold software. We found that the centroid secondary structure apparently differs between the T and C alleles, which suggests that this variant might change the secondary structure of UBE3A mRNA. We did not detect mutations in other coding regions of the UBE3A gene. Conclusions: These findings suggest that the UBE3A gene might not be a major disease gene in Chinese ASD cases.
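To make the term "synonymous mutation" concrete, here is a minimal sketch in Python that checks whether a single-nucleotide change within a codon alters the encoded amino acid under the standard genetic code. The helper name `is_synonymous` and the example codon are hypothetical: the abstract does not state which codon rs150331504 falls in, so the example only illustrates a generic T > C change at the third codon position.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon: str, index: int, new_base: str) -> bool:
    """Return True if replacing codon[index] (0-based) with new_base keeps the amino acid."""
    codon = codon.upper()
    mutated = codon[:index] + new_base.upper() + codon[index + 1:]
    return CODON_TABLE[codon] == CODON_TABLE[mutated]


# Hypothetical example: GAT -> GAC, both of which encode aspartate (D).
print(is_synonymous("GAT", 2, "C"))  # True
# Changing the second base instead (GAT -> GGT) swaps aspartate for glycine.
print(is_synonymous("GAT", 1, "G"))  # False
```

A synonymous change leaves the protein sequence intact, which is why the abstract turns to mRNA secondary structure rather than protein structure when assessing this variant.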

Accurate prediction of protein secondary structure and solvent accessibility by consensus combiners of sequence and structure information

BMC Bioinformatics ◽

10.1186/1471-2105-8-201 ◽

2007 ◽

Vol 8 (1) ◽

Cited By ~ 74

Author(s):

Gianluca Pollastri ◽

Alberto JM Martin ◽

Catherine Mooney ◽

Alessandro Vullo

Keyword(s):

Secondary Structure ◽

Solvent Accessibility ◽

Protein Secondary Structure ◽

Accurate Prediction ◽

Structure Information

Determination of Protein Secondary Structure and Solvent Accessibility Using Site-Directed Fluorescence Labeling. Studies of T4 Lysozyme Using the Fluorescent Probe Monobromobimane

Biochemistry ◽

10.1021/bi991331v ◽

1999 ◽

Vol 38 (49) ◽

pp. 16383-16393 ◽

Cited By ~ 43

Author(s):

Steven E. Mansoor ◽

Hassane S. Mchaourab ◽

David L. Farrens

Keyword(s):

Secondary Structure ◽

Fluorescent Probe ◽

Solvent Accessibility ◽

Protein Secondary Structure ◽

Fluorescence Labeling ◽

T4 Lysozyme

Improved computational methods of protein sequence alignment, model selection and tertiary structure prediction

10.32469/10355/46126 ◽

2013 ◽

Author(s):

Xin Deng

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Model Selection ◽

Sequence Alignment ◽

Protein Sequence ◽

Structure Prediction ◽

Tertiary Structure ◽

Solvent Accessibility ◽

Relative Solvent Accessibility ◽

Tertiary Structure Prediction

Protein sequence and profile alignment is essential to most bioinformatics tasks, such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods that do not use structural information, and that it performed comparably, with slightly lower scores, to the method that uses structural features and additional homologous sequences. We also developed HHpacom, a new profile-profile pairwise alignment method that integrates secondary structure, solvent accessibility, torsion angle and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility and torsion angle information significantly improved alignment accuracy in comparison with the state-of-the-art methods HHsearch and HHsuite. The evolutionary constraint information helped in some cases, especially for alignments of short proteins, typically 100 to 500 residues. Protein model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods that take query-template alignments as input. The assessment results illustrated that this could help improve model selection, protein structure prediction and many other bioinformatics problems. Moreover, we developed a protein tertiary structure prediction pipeline, many components of which were built in our group's MULTICOM system. MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.
