An evolutionary analysis of the SARS-CoV-2 genomes from the countries in the same meridian

Mapping Intimacies ◽

10.1101/2020.11.12.380816 ◽

2020 ◽

Author(s):

Emilio Mastriani ◽

Alexey V. Rakov ◽

Shu-Lin Liu

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Selective Pressure ◽

Virus Evolution ◽

Evolutionary Analysis ◽

Uncharacterized Protein ◽

Binding Probability ◽

Ontological Analysis ◽

Codon Mutation

Abstract: In the current study, we analyzed the genomes of SARS-CoV-2 strains isolated from Italy, Sweden, and Congo (countries on the same meridian) and from Brazil as an outgroup country. Evolutionary analysis revealed that codon 9628 is under episodic selective pressure in all four countries, suggesting that it is a key site for virus evolution. Because the codon belongs to the uncharacterized protein 14, P0DTD3 (Y14_SARS2), we investigated further and found that the codon mutation is responsible for a helical modification in the secondary structure. According to the predictions, the codon lies in the more ordered region of the gene (41-59) and close to the region acting as a transmembrane segment (54-67), suggesting its involvement in the attachment phase of the virus. The predicted structures of mutated and wild-type P0DTD3 confirmed the importance of the codon in defining the protein structure, and the ontological analysis of the protein emphasized that the mutation enhances the binding probability.

An evolutionary analysis of the SARS-CoV-2 genomes from the countries in the same meridian (Preprint)

10.2196/preprints.25995 ◽

2020 ◽

Author(s):

Emilio Mastriani ◽

Alexey V. Rakov ◽

Shu-Lin Liu

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Selective Pressure ◽

Virus Evolution ◽

Evolutionary Analysis ◽

Uncharacterized Protein ◽

Binding Probability ◽

Ontological Analysis ◽

Codon Mutation

In the current study, we analyzed the genomes of SARS-CoV-2 strains isolated from Italy, Sweden, and Congo (countries on the same meridian) and from Brazil as an outgroup country. Evolutionary analysis revealed that codon 9628 is under episodic selective pressure in all four countries, suggesting that it is a key site for virus evolution. Because the codon belongs to the uncharacterized protein 14, P0DTD3 (Y14_SARS2), we investigated further and found that the codon mutation is responsible for a helical modification in the secondary structure. According to the predictions, the codon lies in the more ordered region of the gene (41-59) and close to the region acting as a transmembrane segment (54-67), suggesting its involvement in the attachment phase of the virus. The predicted structures of mutated and wild-type P0DTD3 confirmed the importance of the codon in defining the protein structure, and the ontological analysis of the protein emphasized that the mutation enhances the binding probability.

Isolating SARS-CoV-2 Strains From Countries in the Same Meridian: Genome Evolutionary Analysis

JMIR Bioinformatics and Biotechnology ◽

10.2196/25995 ◽

2021 ◽

Vol 2 (1) ◽

pp. e25995

Author(s):

Emilio Mastriani ◽

Alexey V Rakov ◽

Shu-Lin Liu

Keyword(s):

Secondary Structure ◽

Structure Prediction ◽

Secondary Structure Prediction ◽

Protein Structures ◽

Data Repository ◽

Evolutionary Analysis ◽

Wild Type ◽

Uncharacterized Protein ◽

Climate Conditions

Background: COVID-19, caused by the novel SARS-CoV-2, is considered the most threatening respiratory infection in the world, with over 40 million people infected and over 0.934 million related deaths reported worldwide. It is speculated that epidemiological and clinical features of COVID-19 may differ across countries or continents. Genomic comparison of 48,635 SARS-CoV-2 genomes has shown that the average number of mutations per sample was 7.23, and most SARS-CoV-2 strains belong to one of 3 clades characterized by geographic and genomic specificity: Europe, Asia, and North America. Objective: The aim of this study was to compare the genomes of SARS-CoV-2 strains isolated from Italy, Sweden, and Congo, that is, 3 different countries on the same meridian (longitude) but with different climate conditions, and from Brazil (as an outgroup country), to analyze similarities or differences in patterns of possible evolutionary pressure signatures in their genomes. Methods: We obtained data from the Global Initiative on Sharing All Influenza Data repository by sampling all genomes available on that date. Using HyPhy, we performed recombination analysis with the genetic algorithm recombination detection method, trimming, removal of the stop codons, and phylogenetic tree and mixed effects model of evolution analyses. We also performed secondary structure prediction analysis for both sequences (mutated and wild-type) and "disorder" and "transmembrane" analyses of the protein. We analyzed both protein structures with an ab initio approach to predict their ontologies and 3D structures. Results: Evolutionary analysis revealed that codon 9628 is under episodic selective pressure for all SARS-CoV-2 strains isolated from the 4 countries, suggesting it is a key site for virus evolution. Codon 9628 encodes the P0DTD3 (Y14_SARS2) uncharacterized protein 14. Further investigation showed that the codon mutation was responsible for helical modification in the secondary structure. The codon was positioned in the more ordered region of the gene (41-59) and near the area acting as the transmembrane region (54-67), suggesting its involvement in the attachment phase of the virus. The predicted protein structures of both wild-type and mutated P0DTD3 confirmed the importance of the codon in defining the protein structure. Moreover, ontological analysis of the protein emphasized that the mutation enhances the binding probability. Conclusions: Our results suggest that RNA secondary structure may be affected and, consequently, the protein product changes T (threonine) to G (glycine) in position 50 of the protein. This position is located close to the predicted transmembrane region. Mutation analysis revealed that the change from G (glycine) to D (aspartic acid) may confer a new function on the protein, binding activity, which in turn may be responsible for attaching the virus to human eukaryotic cells. These findings can help design in vitro experiments and possibly facilitate vaccine design and successful antiviral strategies.
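The stop-codon removal step named in the Methods can be illustrated with a minimal Python sketch. It assumes an in-frame nucleotide sequence and replaces each stop codon with a gap triplet; the function name `mask_stop_codons` and the choice of `---` as the replacement are illustrative assumptions, not the authors' HyPhy procedure.

```python
# Hypothetical helper: not taken from the paper or from HyPhy.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def mask_stop_codons(seq: str) -> str:
    """Replace every in-frame stop codon with a gap triplet ('---')."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    # Split into consecutive codons, starting at position 0 (frame 1).
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    return "".join("---" if codon in STOP_CODONS else codon for codon in codons)

masked = mask_stop_codons("ATGTAAGGG")
```

Masking rather than deleting keeps the codon alignment columns in register, which matters for site-wise selection analyses that report codon positions.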

AlphaFold at CASP13

Bioinformatics ◽

10.1093/bioinformatics/btz422 ◽

2019 ◽

Vol 35 (22) ◽

pp. 4862-4865 ◽

Cited By ~ 48

Author(s):

Mohammed AlQuraishi

Keyword(s):

Protein Structure ◽

Protein Sequence ◽

Structure Prediction ◽

Computational Prediction ◽

Data Bank ◽

Academic Community ◽

Physical Contact ◽

Evolutionary Analysis ◽

History Of ◽

First Time

Summary: Computational prediction of protein structure from sequence is broadly viewed as a foundational problem of biochemistry and one of the most difficult challenges in bioinformatics. Once every two years the Critical Assessment of protein Structure Prediction (CASP) experiments are held to assess the state of the art in the field in a blind fashion, by presenting predictor groups with protein sequences whose structures have been solved but have not yet been made publicly available. The first CASP was organized in 1994, and the latest, CASP13, took place last December, when for the first time the industrial laboratory DeepMind entered the competition. DeepMind's entry, AlphaFold, placed first in the Free Modeling (FM) category, which assesses methods on their ability to predict novel protein folds (the Zhang group placed first in the Template-Based Modeling (TBM) category, which assesses methods on predicting proteins whose folds are related to ones already in the Protein Data Bank). DeepMind's success generated significant public interest. Their approach builds on two ideas developed in the academic community during the preceding decade: (i) the use of co-evolutionary analysis to map residue co-variation in protein sequence to physical contact in protein structure, and (ii) the application of deep neural networks to robustly identify patterns in protein sequence and co-evolutionary couplings and convert them into contact maps. In this Letter, we contextualize the significance of DeepMind's entry within the broader history of CASP, relate AlphaFold's methodological advances to prior work, and speculate on the future of this important problem.
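The idea in point (i), that residues in contact tend to co-vary across homologous sequences, can be illustrated with a minimal Python sketch. It scores two columns of a toy alignment by mutual information. This is a deliberately simple stand-in and not the method used by AlphaFold or described in the Letter; the function names and the toy alignment are illustrative assumptions.

```python
from collections import Counter
from math import log2

def column(msa, k):
    """Return column k of an alignment given as equal-length strings."""
    return [seq[k] for seq in msa]

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns."""
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_i)
    freq_i = Counter(col_i)              # residue counts in column i
    freq_j = Counter(col_j)              # residue counts in column j
    freq_ij = Counter(zip(col_i, col_j)) # joint residue-pair counts
    return sum(
        (c / n) * log2((c / n) / ((freq_i[a] / n) * (freq_j[b] / n)))
        for (a, b), c in freq_ij.items()
    )

# Toy alignment: columns 0 and 1 co-vary perfectly, column 2 is conserved.
toy_msa = ["AKL", "AKL", "GEL", "GEL"]
mi_covarying = mutual_information(column(toy_msa, 0), column(toy_msa, 1))
mi_conserved = mutual_information(column(toy_msa, 0), column(toy_msa, 2))
```

In this toy case the co-varying pair scores 1 bit and the pair involving the conserved column scores 0. Raw mutual information is confounded by phylogeny and indirect couplings, which is one reason practical pipelines use more elaborate statistical models.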

Protein Structure Prediction: Assembly of Secondary Structure Elements by Basin-Hopping

ChemPhysChem ◽

10.1002/cphc.201402247 ◽

2014 ◽

Vol 15 (15) ◽

pp. 3378-3390 ◽

Cited By ~ 1

Author(s):

Falk Hoffmann ◽

Ioan Vancea ◽

Sanjay G. Kamat ◽

Birgit Strodel

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Basin Hopping

Advances in Protein Super-Secondary Structure Prediction and Application to Protein Structure Prediction

Methods in Molecular Biology - Protein Supersecondary Structures ◽

10.1007/978-1-4939-9161-7_2 ◽

2019 ◽

pp. 15-45 ◽

Cited By ~ 3

Author(s):

Elijah MacCarthy ◽

Derrick Perry ◽

Dukka B. KC

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Protein Structure Prediction ◽

Structure Prediction ◽

Secondary Structure Prediction

Protein Structure Abstraction and Automatic Clustering Using Secondary Structure Element Sequences

Computational Science and Its Applications – ICCSA 2005 - Lecture Notes in Computer Science ◽

10.1007/11424826_136 ◽

2005 ◽

pp. 1284-1292 ◽

Cited By ~ 1

Author(s):

Sung Hee Park ◽

Chan Yong Park ◽

Dae Hee Kim ◽

Seon Hee Park ◽

Jeong Seop Sim

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Secondary Structure Element ◽

Automatic Clustering

SSNN, a method for neural network protein secondary structure fitting using circular dichroism data

Analytical Methods ◽

10.1039/c3ay41831f ◽

2014 ◽

Vol 6 (17) ◽

pp. 6721-6726 ◽

Cited By ~ 6

Author(s):

Vincent Hall ◽

Anthony Nash ◽

Alison Rodger

Keyword(s):

Neural Network ◽

Circular Dichroism ◽

Protein Structure ◽

Secondary Structure ◽

Protein Secondary Structure ◽

Network Approach ◽

Cd Spectra ◽

Neural Network Approach ◽

Self Organising Map ◽

Circular Dichroïsm

SSNN is a self-organising map neural network approach for estimating protein structure from circular dichroism (CD) spectra. The method for using SSNN is described here, and SSNN is compared with CDSSTR, a well-known methodology for finding secondary structures from CD. SSNN compares well with similar methodologies.

Hermes: an ensemble machine learning architecture for protein secondary structure prediction

10.1101/640656 ◽

2019 ◽

Author(s):

Larry Bliss ◽

Ben Pascoe ◽

Samuel K Sheppard

Keyword(s):

Machine Learning ◽

Protein Structure ◽

Secondary Structure ◽

Structure Prediction ◽

Cross Validation ◽

Secondary Structure Prediction ◽

Protein Structures ◽

Lower Boundary ◽

Protein Secondary Structure ◽

Homologous Proteins

Motivation: Protein structure prediction, which combines theoretical chemistry and bioinformatics, is an increasingly important technique in biotechnology and biomedical research, for example in the design of novel enzymes and drugs. Here, we present a new ensemble bi-layered machine learning architecture that builds directly on ten existing pipelines, providing rapid, high-accuracy, 3-state secondary structure prediction of proteins. Results: After training on 1348 solved protein structures, we evaluated the model with four independent datasets: JPRED4, compiled by the authors of the successful predictor of the same name, and CASP11, CASP12 and CASP13, assembled by the Critical Assessment of protein Structure Prediction consortium, who run biennial experiments focused on objective testing of predictors. These rigorous, pre-established protocols included 7-fold cross-validation and blind testing. This led to a mean Hermes accuracy of 95.5%, significantly (p<0.05) better than the ten previously published models analysed in this paper. Furthermore, Hermes yielded a reduction in standard deviation, fewer lower-boundary outliers, and reduced dependency on solved structures of homologous proteins, as measured by NEFF score. This architecture provides advantages over other pipelines, while remaining accessible to users at any level of bioinformatics experience. Availability and Implementation: The source code for Hermes is freely available at: https://github.com/HermesPrediction/Hermes. This page also includes the cross-validation with corresponding models, and all training/testing data presented in this study with predictions and accuracy.
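The abstract does not say how accuracy was computed. A common convention for 3-state secondary structure prediction is the per-residue Q3 score, the fraction of residues whose predicted state (helix H, strand E, coil C) matches the observed one. The Python sketch below assumes that convention; the function name `q3_accuracy` and the example strings are illustrative and are not taken from the Hermes repository.

```python
# Hypothetical helper illustrating the conventional Q3 measure.
ALLOWED_STATES = set("HEC")  # helix, strand, coil

def q3_accuracy(predicted: str, observed: str) -> float:
    """Fraction of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("sequences must not be empty")
    if not (set(predicted) <= ALLOWED_STATES and set(observed) <= ALLOWED_STATES):
        raise ValueError("states must be H, E or C")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# One mismatch in six residues.
score = q3_accuracy("HHHECC", "HHHEEC")
```

A mean accuracy over a benchmark would then be the average of this score across proteins, or across all residues pooled, depending on the protocol. Which of the two the authors used is not stated in the abstract.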

Bioinformatic analysis of multi-drug resistant class 1 integron-coded protein of Citrobacter freundii

African Journal of Clinical and Experimental Microbiology ◽

10.4314/ajcem.v22i3.10 ◽

2021 ◽

Vol 22 (3) ◽

pp. 391-396

Author(s):

O.D. Popoola ◽

B.T. Thomas

Keyword(s):

Secondary Structure ◽

Alpha Helix ◽

Random Coil ◽

Pairwise Distance ◽

Evolutionary Analysis ◽

Citrobacter Freundii ◽

Beta Turns ◽

Class 1 Integron ◽

Neutrality Test ◽

Class 1

Background: Understanding the secondary structure of the class 1 integron-coded protein is necessary to identify potential drug targets and to infer evolutionary ancestry at the proteomic level. This study therefore aimed to determine the secondary structure of the class 1 integron-coded protein and to provide information on its evolutionary ancestry. Methodology: Five sequences of Citrobacter freundii with the accession numbers KP902625.1, KP902624.1, KP902623.1, KP901093.1 and KP902609.1 were obtained using nucleotide BLAST (http://blast.ncbi.nlm.nih.gov/Blast.cgi) and subjected to evolutionary analysis, pairwise distance calculation, secondary structure prediction and a neutrality test using MEGA explorer, Kimura 2 parameter, the SOPMA tool and Tajima's test, respectively. Results: The NCBI queries revealed significant identity with the class 1 integron of the studied Citrobacter freundii. The nucleotide sequence alignment depicted several conserved regions with varying degrees of transitions, transversions, insertions, and deletions, while the amino acid sequences of the nucleotides showed 42 conserved sites among all the sequences. The secondary structure of the class 1 integron-coded protein showed greater representation of the random coil (43.74±3.24), alpha helix (25.69±6.29) and extended strands (22.42±2.41) than of the beta turns (8.15±1.12). The Tajima's neutrality test of the five nucleotide sequences of Citrobacter freundii, analyzed by considering the first, second and third codon positions as well as the non-coding regions, covered a total of 127 positions in the final dataset, and the test statistic was estimated to be -0.1038. Conclusion: The study confirmed a common evolutionary ancestor for the class 1 integron-coded protein found in Citrobacter freundii. Our study also documents the higher representation of random coil, alpha helix and extended strands than of beta turns. The negative value of the Tajima's neutrality test suggests higher levels of both low and high frequency polymorphisms, thus indicating a decrease in the class 1 integron population size and balancing selection. Keywords: Evolutionary, Protein structure, Class 1 integrons, Citrobacter freundii
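For readers unfamiliar with the statistic reported above, Tajima's D compares the mean number of pairwise differences with the number of segregating sites, scaled by its expected standard deviation. The Python sketch below follows the standard formula from Tajima (1989) for aligned, gap-free sequences. It is an illustration on toy data: it does not reproduce MEGA's handling of gaps or missing data, and the function name `tajimas_d` is an assumption.

```python
from itertools import combinations
from math import sqrt

def tajimas_d(seqs):
    """Tajima's D for a list of equal-length, gap-free aligned sequences."""
    n = len(seqs)
    if n < 4:
        raise ValueError("at least 4 sequences are needed")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    # S: number of segregating (polymorphic) sites.
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if S == 0:
        raise ValueError("D is undefined when there are no segregating sites")
    # pi: mean number of differences over all sequence pairs.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))

# Toy data: every polymorphism is a singleton (rare variants only).
d_rare = tajimas_d(["AAAA", "TAAA", "ATAA", "AATA"])
# Toy data: both polymorphisms are at intermediate frequency.
d_balanced = tajimas_d(["AA", "AA", "TT", "TT"])
```

On these toy inputs the singleton-only sample yields a negative D and the intermediate-frequency sample yields a positive D, which shows how the sign of the statistic tracks the frequency spectrum of the polymorphisms.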

Improved computational methods of protein sequence alignment, model selection and tertiary structure prediction

10.32469/10355/46126 ◽

2013 ◽

Author(s):

Xin Deng

Keyword(s):

Protein Structure ◽

Secondary Structure ◽

Model Selection ◽

Sequence Alignment ◽

Protein Sequence ◽

Structure Prediction ◽

Tertiary Structure ◽

Solvent Accessibility ◽

Relative Solvent Accessibility ◽

Tertiary Structure Prediction

Protein sequence and profile alignment is essential to most bioinformatics tasks, such as protein structure modeling, function prediction, and phylogenetic analysis. We designed a new algorithm, MSACompro, to incorporate predicted secondary structure, relative solvent accessibility, and residue-residue contact information into multiple protein sequence alignment. Our experiments showed that it improved multiple sequence alignment accuracy over most existing methods that do not use structural information, and that it performed comparably, with slightly lower scores, to the method that uses structural features and additional homologous sequences. We also developed HHpacom, a new profile-profile pairwise alignment method that integrates secondary structure, solvent accessibility, torsion angle and inferred residue pair coupling information. The evaluation showed that the secondary structure, relative solvent accessibility and torsion angle information significantly improved the alignment accuracy in comparison with the state-of-the-art methods HHsearch and HHsuite. The evolutionary constraint information did help in some cases, especially in alignments of short proteins, typically 100 to 500 residues. Protein model selection is also a key step in protein tertiary structure prediction. We developed two SVM model quality assessment methods that take the query-template alignment as input. The assessment results illustrated that this could help improve model selection, protein structure prediction and many other bioinformatics problems. Moreover, we also developed a protein tertiary structure prediction pipeline, many components of which were built into our group's MULTICOM system. MULTICOM performed well in the CASP10 (Critical Assessment of Techniques for Protein Structure Prediction) competition.
