World-wide Sequence Variant and Non-synonymous Amino Acid Substitution Signature in SARS-COV-2 Structural Proteins

Author(s):  
Jayanta Das ◽  
Swarup Roy

Like other viruses, SARS-COV-2 is mutating and thus creating divergent variants across the world. Protein sequence variation occurs due to non-synonymous single-nucleotide polymorphisms (SNPs) that alter the amino acid. Amino acid substitutions on homooligomer interfaces may change the structure of the protein and hence alter the regular or known functional activities of a viral protein. Studies reveal that even a single point mutation in a viral protein can significantly change its biology, leading to peculiar pathogenic properties. Therefore, an in-depth investigation of the amino acid substitutions in the genomic signature of a protein is essential for a rapidly evolving virus like SARS-COV-2. Investigation of world-wide and country-specific substitution features may be crucial for deciphering pathogenicity. It might also help in precise structure prediction and in the identification of possible therapeutic targets for effective drug design. We perform an extensive analysis to highlight and characterize the amino acid substitution signature occurring in the four structural proteins (Spike-S, Nucleocapsid-N, Membrane-M, Envelope-E) of SARS-COV-2. We use a total of 9587 viral sequences reported from 49 different countries across the globe. We study the amino acid substitution patterns and their impact on biochemical properties, and thereby possible changes in protein structures. We perform the following analyses: a) isolating and grouping the variants considered for the different protein sequences; b) identifying the amino acid substitution types that occur frequently and rarely, and reporting their locations within the sequence; c) assessing the change in chemical properties due to amino acid substitution; and d) highlighting country-specific divergent variation and substitution signatures. In terms of mutational changes, the E and M proteins are relatively more stable than the N and S proteins. A significant number of variations is observed in the spike (S) protein. Our study further reveals that the substitution locations are random in the N protein, whereas the substitution sites in the M protein vary little and are almost stable. Substitutions specific to active sub-domains in the S and N proteins reveal that sub-domains such as Heptapeptide Repeat (HR2), Fusion Peptide (FP), and Transmembrane (TM), which are involved in cellular membrane fusion and entry of the virus into host cells, are significantly mutated. The majority of the substitutions lead to a change in the biochemical properties (side chain and hydropathy) of the amino acid. A good number of exclusive variants are found specific to a particular country. We strongly believe that the current findings will be helpful for structure analysis of viral structural proteins and for antiviral drug discovery.
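The kind of analysis described above (listing substitutions and checking whether side chain class or hydropathy changes) can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the two sequences are made-up toy strings, the sequences are assumed to be pre-aligned and of equal length, the hydropathy values are the standard Kyte-Doolittle scale, and the four-way side-chain grouping is one common convention chosen here as an assumption.

```python
# Kyte-Doolittle hydropathy index (standard published scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Coarse side-chain classes. This grouping is an illustrative assumption;
# the source does not say which classification scheme was used.
SIDE_CHAIN = {}
for cls, residues in {
    "nonpolar": "AVLIMFWPG",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}.items():
    for residue in residues:
        SIDE_CHAIN[residue] = cls


def substitutions(reference, variant):
    """Return (ref_aa, 1-based position, var_aa) for every mismatch.

    Both sequences must already be aligned and have the same length.
    """
    if len(reference) != len(variant):
        raise ValueError("sequences must be aligned to equal length")
    return [
        (r, i, v)
        for i, (r, v) in enumerate(zip(reference, variant), start=1)
        if r != v
    ]


def describe(ref_aa, pos, var_aa):
    """Summarise one substitution: label, hydropathy shift, class change."""
    return {
        "label": f"{ref_aa}{pos}{var_aa}",
        "hydropathy_shift": round(KD[var_aa] - KD[ref_aa], 1),
        "side_chain_changed": SIDE_CHAIN[ref_aa] != SIDE_CHAIN[var_aa],
    }


# Toy sequences (hypothetical, not taken from any SARS-COV-2 protein).
reference = "MKDLAG"
variant = "MKGLVG"
report = [describe(*s) for s in substitutions(reference, variant)]
for row in report:
    print(row)
```

Real data would first need a multiple sequence alignment against a reference, since insertions and deletions break the position-by-position comparison used here.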

Author(s):  
Ina Baļķe ◽  
Gunta Resēviča ◽  
Dace Skrastiņa ◽  
Andris Zeltiņš

Expression and characterisation of the ryegrass mottle virus non-structural proteins

The Ryegrass mottle virus (RGMoV) single-stranded RNA genome is organised into four open reading frames (ORFs) which encode several proteins: ORF1 encodes protein P1; ORF2a contains the membrane-associated 3C-like serine protease, the genome-linked protein VPg and a P16 protein gene; ORF2b encodes the replicase RdRP; and the only structural protein, the coat protein, is synthesised from ORF3. To obtain the non-structural proteins in preparative quantities and to characterise them, the corresponding RGMoV gene cDNAs were cloned in pET- and pColdI-derived expression vectors and overexpressed in several E. coli host cells. For the protease and RdRP, the best expression system was found to be the pColdI vector combined with the E. coli WK6 strain. The VPg and P16 proteins were obtained from pET- or pACYC-derived vectors in E. coli BL21 (DE3) host cells and purified using Ni-Sepharose affinity chromatography. Attempts to crystallize VPg and P16 were unsuccessful, possibly due to non-structured amino acid sequences in both proteins. Bioinformatic analysis indicated that the entire VPg domain and the C-terminal part of P16 contain unstructured amino acid stretches, which possibly prevented the formation of crystals.


2012 ◽  
Vol 10 (03) ◽  
pp. 1242010 ◽  
Author(s):  
FILIP JAGODZINSKI ◽  
JEANNE HARDY ◽  
ILEANA STREINU

Predicting the effect of a single amino acid substitution on the stability of a protein structure is a fundamental task in macromolecular modeling. It has relevance to drug design and understanding of disease-causing protein variants. We present KINARI-Mutagen, a web server for performing in silico mutation experiments on protein structures from the Protein Data Bank. Our rigidity-theoretical approach permits fast evaluation of the effects of mutations that may not be easy to perform in vitro, because it is not always possible to express a protein with a specific amino acid substitution. We use KINARI-Mutagen to identify critical residues, and we show that our predictions correlate with destabilizing mutations to glycine. In two in-depth case studies we show that the mutated residues identified by KINARI-Mutagen as critical correlate with experimental data, and would not have been identified by other methods such as Solvent Accessible Surface Area measurements or residue ranking by contributions to stabilizing interactions. We also generate 48 mutants for 14 proteins, and compare our rigidity-based results against experimental mutation stability data. KINARI-Mutagen is available at http://kinari.cs.umass.edu .


Author(s):  
O. Borisova ◽  
N. Gadua ◽  
A. Pimenova ◽  
A. Chaplin ◽  
I. Chagina ◽  
...  

The aim of the study was to characterize toxigenic strains of Corynebacterium diphtheriae by examining 12 toxigenic strains of C. diphtheriae isolated in Russia between January 2017 and June 2019. The morphological, toxigenic and biochemical properties of C. diphtheriae were studied. Genotyping of C. diphtheriae strains was performed using MLST and dtxR gene sequencing with subsequent phylogenetic analysis. Results. Toxigenic strains of C. diphtheriae were isolated in the Novosibirsk, Samara and Chelyabinsk regions, the Khanty-Mansi Autonomous Okrug - Yugra, and the Republic of North Ossetia - Alania. Of these strains, 5 were isolated from diphtheria patients (moderate disease in one case, a mild course in the remaining patients) and 7 were isolated from bacterial carriers. Among the strains from diphtheria patients, two were identified as the ST25 sequence type, gravis variant; one as the ST8 type, gravis variant; and two as the ST67 sequence type, mitis variant. Among asymptomatic carriers, the tox-positive C. diphtheriae strains belonged to the ST25 sequence type, gravis variant, in two cases and to the ST67 type, mitis variant, in four cases. The sequence type was not identified in one case. All sequence types were globally widespread, being represented by a large number of isolates in PubMLST and characterized by a substantial number of derivative sequence types. At the same time, they belonged to different clonal complexes and differed markedly from each other, which makes them reliably distinguishable by MLST. The study of dtxR gene sequence diversity showed that all allelic variants were typical for representatives of these sequence types. No new dtxR alleles were found in the strains examined. The non-synonymous substitution C440T, leading to the amino acid substitution A147V, was found solely in one allele distributed in the ST8, ST185, ST195 and ST451 types, suggesting a late mutation. In contrast, the polymorphism C640A, resulting in the amino acid substitution L214I, was found not only in the same allele but also in the basal tree branches, indicating that isoleucine was present in the ancestral sequence of the protein.
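The correspondence between the nucleotide and amino acid labels above (C440T with A147V, C640A with L214I) follows from simple codon arithmetic. The sketch below assumes that nucleotide positions are 1-based and counted from the first base of the coding sequence, which is the usual convention but is not stated explicitly in the abstract; the function name is illustrative.

```python
def locate_in_codon(nt_pos):
    """Map a 1-based coding-sequence position to (codon number, position in codon).

    Both returned values are 1-based. Assumes numbering starts at the first
    base of the start codon.
    """
    if nt_pos < 1:
        raise ValueError("nucleotide positions are 1-based")
    codon_number = (nt_pos - 1) // 3 + 1
    position_in_codon = (nt_pos - 1) % 3 + 1
    return codon_number, position_in_codon


c440 = locate_in_codon(440)
c640 = locate_in_codon(640)
print(c440)  # (147, 2): second base of codon 147
print(c640)  # (214, 1): first base of codon 214
```

Under that numbering, position 440 is the second base of codon 147, where a C to T change turns an alanine codon (GCN) into a valine codon (GTN), and position 640 is the first base of codon 214, where a C to A change turns a CTN leucine codon into an ATN codon (isoleucine for ATT, ATC or ATA).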


2020 ◽  
Author(s):  
Christoffer Norn ◽  
Ingemar André ◽  
Douglas L. Theobald

Proteins evolve under a myriad of biophysical selection pressures that collectively control the patterns of amino acid substitutions. Averaged over time and across proteins, these evolutionary pressures are sufficiently consistent to produce global substitution patterns that can be used to successfully find homologues, infer phylogenies, and reconstruct ancestral sequences. Although the factors which govern the variation of protein substitution rates have received much attention, the influence of thermodynamic stability constraints remains unresolved. Here we develop a simple model to calculate amino acid rate matrices from evolutionary dynamics controlled by a fitness function that reports on the thermodynamic effects of amino acid mutations in protein structures. This hybrid biophysical and evolutionary model accounts for nucleotide transition/transversion rate bias, multi-nucleotide codon changes, the number of codons per amino acid, and thermodynamic protein stability. We find that our theoretical model accurately recapitulates the complex pattern of empirical rates observed in common global amino acid substitution matrices used in phylogenetics. These results suggest that selection for thermodynamically stable proteins, coupled with nucleotide mutation bias filtered by the structure of the genetic code, is the primary global driver behind the amino acid substitution patterns observed in proteins throughout the tree of life.
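To make "nucleotide mutation bias filtered by the structure of the genetic code" concrete, the sketch below computes only the mutational part of such a model: for every pair of amino acids, it sums the weights of the single-nucleotide codon changes that connect them, with transitions weighted by a bias factor. It is not the authors' model. It ignores multi-nucleotide changes and the stability-based fitness function entirely, and the parameter name `kappa` and the function name are assumptions made for illustration. The codon table is the standard genetic code.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def mutation_flux(kappa):
    """Sum single-nucleotide change weights between different amino acids.

    Each transition contributes `kappa`, each transversion contributes 1.
    Changes to or from stop codons and synonymous changes are skipped.
    """
    flux = defaultdict(float)
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for pos, new_base in product(range(3), BASES):
            old_base = codon[pos]
            if new_base == old_base:
                continue
            mutant = codon[:pos] + new_base + codon[pos + 1:]
            new_aa = CODON_TABLE[mutant]
            if new_aa == "*" or new_aa == aa:
                continue
            weight = kappa if (old_base, new_base) in TRANSITIONS else 1.0
            flux[(aa, new_aa)] += weight
    return dict(flux)


flux = mutation_flux(kappa=2.0)
print(flux[("A", "V")])      # four GCN -> GTN transitions
print(("W", "M") in flux)    # no single-nucleotide path from TGG to ATG
```

Even this stripped-down version shows why amino acids with more codons, or with codons one transition apart, exchange more readily before any selection is applied.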


2018 ◽  
Author(s):  
Elias Primetis ◽  
Spyridon Chavlis ◽  
Pavlos Pavlidis

Intra-protein residue vicinities depend on the amino acids involved. Energetically favorable vicinities (or interactions) have been preserved during evolution, while unfavorable vicinities have been eliminated. We describe, statistically, the interactions between amino acids using resolved protein structures. Based on the frequency of amino acid interactions, we have devised an amino acid substitution model that implements the following idea: amino acids that have similar neighbors in the protein tertiary structure can replace each other, while substitution is more difficult between amino acids that prefer different spatial neighbors. Using known tertiary structures for α-helical membrane (HM) proteins, we build evolutionary substitution matrices. We constructed maximum likelihood phylogenies using our amino acid substitution matrices and compared them to widely used methods. Our results suggest that amino acid substitutions are associated with the spatial neighborhoods of amino acid residues, thereby providing insights into the amino acid substitution process.
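The core idea, that residues with similar spatial neighbors should be easier to exchange, can be illustrated with a small similarity calculation. The sketch below is only one possible reading of that idea and is not the authors' model: the contact list is invented toy data, and cosine similarity between neighbor-count profiles is an assumption chosen for simplicity.

```python
from collections import Counter
from math import sqrt


def neighbor_profiles(contacts):
    """Count, for each amino acid, how often each other residue is its neighbor."""
    profiles = {}
    for a, b in contacts:
        profiles.setdefault(a, Counter())[b] += 1
        profiles.setdefault(b, Counter())[a] += 1
    return profiles


def cosine(p, q):
    """Cosine similarity between two neighbor-count profiles."""
    dot = sum(p[k] * q[k] for k in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


# Toy contact pairs (hypothetical, not derived from any real structure).
contacts = [
    ("L", "I"), ("L", "V"), ("I", "V"), ("L", "F"), ("I", "F"),
    ("K", "D"), ("K", "E"), ("R", "D"), ("R", "E"),
]
profiles = neighbor_profiles(contacts)
sim_KR = cosine(profiles["K"], profiles["R"])  # same neighbors (D, E)
sim_LK = cosine(profiles["L"], profiles["K"])  # no shared neighbors
print(round(sim_KR, 3), round(sim_LK, 3))
```

In a real setting the contacts would come from distance cutoffs on resolved structures, and the similarities would still have to be converted into substitution rates or scores.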


1994 ◽  
Vol 71 (06) ◽  
pp. 773-777 ◽  
Author(s):  
Michiaki Ohiwa ◽  
Tatsuya Hayashi ◽  
Hideo Wada ◽  
Kouzou Minamikawa ◽  
Shigeru Shirakawa ◽  
...  

We found hereditary factor VII deficiency in a clinically asymptomatic family, and characterized their factor VII gene and the abnormal molecule using recombinant DNA techniques. The propositus was a 45-year-old woman who was noted to have a prolonged prothrombin time. The level of factor VII antigen of the patient was 25.9% of that of normal individuals and the level of factor VII activity was 28% and 24% when tested using rabbit brain tissue factor and human placental tissue factor, respectively, in a one-stage clotting assay. Two of her sisters had almost the same reduced levels of factor VII antigen and activity, and her parents, who are first cousins, a son, a daughter and a niece had moderately reduced levels of both factor VII activity and antigen. To identify the mutation site, all the coding exons and exon-intron boundaries of the factor VII gene of the propositus were amplified using the polymerase chain reaction (PCR), then subcloned and sequenced. One missense mutation (G to A) was identified in exon VIII of the gene, resulting in an amino acid substitution of His(CAC) for Arg(247)(CGC) in the gene product. PCR using a mutagenic primer to introduce a new kpaL I site into the mutant allele of the patient’s factor VII gene revealed that this allele was inherited in the affected individuals in the pedigree. Transient expression assays using BHK cells transfected with an expression vector containing the mutant factor VII cDNA suggested that this mutation leads to factor VII deficiency by impairing secretion of the mutated factor VII. This is the first report of a single point mutation which induces factor VII deficiency with both activity and antigen reduced in parallel.
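The reported change, a G to A substitution turning codon CGC into CAC, can be checked against the standard genetic code with a few lines. This is a generic sketch: the function name and the return format are illustrative, and only the two codons themselves come from the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_point_mutation(codon, pos, new_base):
    """Apply a single-base change and classify its effect.

    `pos` is the 1-based position within the codon. Returns
    (mutant codon, original amino acid, new amino acid, kind).
    """
    mutant = codon[:pos - 1] + new_base + codon[pos:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutant]
    if before == after:
        kind = "synonymous"
    elif after == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return mutant, before, after, kind


# G -> A at the second base of the Arg codon CGC.
result = classify_point_mutation("CGC", 2, "A")
print(result)  # ('CAC', 'R', 'H', 'missense')
```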


2019 ◽  
Vol 68 (2) ◽  
pp. 233-246
Author(s):  
KLAUDIA BRODZIK ◽  
KATARZYNA KRYSZTOPA-GRZYBOWSKA ◽  
MACIEJ POLAK ◽  
JAKUB LACH ◽  
DOMINIK STRAPAGIEL ◽  
...  

The aim of this study was to identify potential vaccine antigens in Corynebacterium diphtheriae strains by in silico analysis of the amino acid variation in the 67–72p surface protein, which is involved in colonization and in the induction of epithelial cell apoptosis in the early stages of infection. An analysis of pili structural proteins, which are involved in bacterial adherence to host cells and related to various types of infections, was also performed. A polymerase chain reaction (PCR) was carried out to amplify the genes encoding the 67–72p protein and three pili structural proteins (SpaC, SpaI, SpaD), and the products obtained were sequenced. The nucleotide sequences of the particular genes were translated into amino acid sequences, which were then compared among all the tested strains using bioinformatics tools. In the last step, the affinity of the tested proteins to major histocompatibility complex (MHC) classes I and II, and linear B-cell epitopes, were analyzed. Variations in the nucleotide sequences of the 67–72p protein and pili structural protein genes were noted among C. diphtheriae strains isolated from various infections. A transposition of an insertion sequence within the gene encoding the SpaC pili structural protein was also detected. In addition, the bioinformatics analyses enabled the identification of epitopes for B-cells and T-cells in the conserved regions of the proteins, thus demonstrating that these proteins could be used as antigens in potential vaccine development. The results identified the most conserved regions in all tested proteins that are exposed on the surface of C. diphtheriae cells.


2011 ◽  
Vol 45 (1) ◽  
pp. 127-129 ◽  
Author(s):  
V. Suresh ◽  
K. Ganesan ◽  
S. Parthasarathy

This article describes the development of a curated online protein block sequence database, PDB-2-PB. The protein block sequences for protein structures with complete backbone coordinates have been encoded using the encoding procedure of de Brevern, Etchebest & Hazout [Proteins (2000), 41, 271–287]. In the current release of the PDB-2-PB database (version 1.0), the protein entries from a recent release of the World Wide Protein Data Bank (wwPDB), which has 74 297 solved PDB entries as of 7 July 2011, have been used as a primary source. The PDB-2-PB database stores the protein block sequences for all the chains present in a protein structure. PDB-2-PB version 1.0 has the curated protein block sequences for 103 252 PDB chain entries (93 547 X-ray, 7033 NMR and 2672 other experimental chain entries). From the PDB-2-PB database, users can extract the curated protein block sequence and its corresponding amino acid sequence, which is extracted from the PDB ATOM records. Users can download these sequences either by using the PDB code or by using various parameters listed in the database. The PDB-2-PB database is freely available at http://bioinfo.bdu.ac.in/~pb/.


Author(s):  
Renganayaki G. ◽  
Achuthsankar S. Nair

Sequence alignment algorithms and database search methods use BLOSUM and PAM substitution matrices constructed from general proteins. These de facto matrices are not optimal for accurately aligning sequences of proteins with a markedly different compositional bias in their amino acids. In this work, a new amino acid substitution matrix, Hubsm, is calculated for the disorder-rich and low-complexity-rich regions of Hub proteins, based on residue characteristics. The amino acid background frequencies and the substitution scores obtained from Hubsm reveal residue substitution patterns which differ from those of commonly used scoring matrices. When comparing Hub protein sequences to detect homologs, the Hubsm matrix yields better results than the PAM and BLOSUM matrices. The Hubsm matrix can be optimal for database searches and for the construction of more accurate sequence alignments of Hub proteins.
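Substitution scores of the BLOSUM kind are log-odds values: the observed frequency of an aligned residue pair divided by the frequency expected from the background composition. The sketch below shows that general calculation on invented toy pairs. It is not the Hubsm construction, whose details are not given in the abstract, and the function name and input format are assumptions.

```python
from collections import Counter
from math import log2


def log_odds_scores(aligned_pairs):
    """Compute log2(observed / expected) for each unordered residue pair.

    `aligned_pairs` is a list of (residue, residue) tuples taken from
    aligned columns. Background frequencies are estimated from the pairs.
    """
    pair_counts = Counter(tuple(sorted(p)) for p in aligned_pairs)
    total = sum(pair_counts.values())

    residue_counts = Counter()
    for (a, b), n in pair_counts.items():
        residue_counts[a] += n
        residue_counts[b] += n
    background = {r: n / (2 * total) for r, n in residue_counts.items()}

    scores = {}
    for (a, b), n in pair_counts.items():
        observed = n / total
        # An unordered mixed pair can arise in two orders, hence the factor 2.
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = log2(observed / expected)
    return scores


# Toy aligned pairs (hypothetical): S and P mostly align with themselves.
pairs = [("S", "S")] * 4 + [("P", "P")] * 2 + [("S", "P")] * 2
scores = log_odds_scores(pairs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 2))
```

Because the background frequencies enter the denominator, a matrix derived from compositionally biased regions can score the same residue pair quite differently from one derived from general proteins, which is the motivation given above for a Hub-specific matrix.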

