Large-scale analysis of SARS-CoV-2 spike-glycoprotein mutants demonstrates the need for continuous screening of virus isolates

PLoS ONE ◽  
2021 ◽  
Vol 16 (9) ◽  
pp. e0249254
Author(s):  
Barbara Schrörs ◽  
Pablo Riesgo-Ferreiro ◽  
Patrick Sorn ◽  
Ranganath Gudimella ◽  
Thomas Bukur ◽  
...  

Due to the wide spread of the COVID-19 pandemic, the SARS-CoV-2 genome is evolving in diverse human populations. Several studies have already reported different strains and an increase in the mutation rate. Mutations in the SARS-CoV-2 spike glycoprotein are of particular interest because it mediates infection in humans and recently approved mRNA vaccines are designed to induce immune responses against it. We analyzed 1,036,030 SARS-CoV-2 genome assemblies and 30,806 NGS datasets from GISAID and the European Nucleotide Archive (ENA), focusing on non-synonymous mutations in the spike protein. Only around 2.5% of the samples contained the wild-type spike protein with no variation from the reference. Among the spike protein mutants, we confirmed a low mutation rate, with fewer than 10 non-synonymous mutations in 99.6% of the analyzed sequences, but the mean and median number of spike protein mutations per sample increased over time. In total, 5,472 distinct variants were found. The majority of the observed variants were recurrent, but only 21 and 14 recurrent variants were found in at least 1% of the mutant genome assemblies and NGS samples, respectively. Further, we found high-confidence subclonal variants in about 2.6% of the NGS data sets with mutant spike protein, which might indicate co-infection with various SARS-CoV-2 strains and/or intra-host evolution. Lastly, some variants might have an effect on antibody binding or T-cell recognition. These findings demonstrate the continuing importance of monitoring SARS-CoV-2 sequences for early detection of variants that require adaptations in preventive and therapeutic strategies.
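The kind of bookkeeping described in this abstract (listing non-synonymous changes per sample, counting how often each variant recurs, and measuring the share of wild-type samples) can be sketched as follows. This is only an illustration under stated assumptions: the sequences are invented ten-residue toy strings rather than real spike proteins, the sequences are assumed to be pre-aligned and of equal length, and the helper name `spike_substitutions` is hypothetical and not taken from the authors' pipeline.

```python
from collections import Counter


def spike_substitutions(reference: str, sample: str) -> list[str]:
    """Return amino-acid substitutions such as 'L8F' (1-based positions).

    Assumption: both protein sequences are already aligned and have the
    same length, so insertions and deletions are out of scope here.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        f"{ref}{pos}{alt}"
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Toy data (hypothetical, not real spike sequences).
reference = "MFVFLVLLPL"
samples = {
    "sample_a": "MFVFLVLLPL",  # identical to the reference (wild type)
    "sample_b": "MFVFLVLFPL",  # one substitution
    "sample_c": "MFAFLVLFPL",  # two substitutions
}

# Substitutions per sample, then how many samples carry each variant.
per_sample = {name: spike_substitutions(reference, seq) for name, seq in samples.items()}
variant_counts = Counter(v for subs in per_sample.values() for v in subs)

# Fraction of samples with no variation from the reference.
wild_type_fraction = sum(not subs for subs in per_sample.values()) / len(per_sample)

# A variant is called recurrent here if it appears in more than one sample.
recurrent = sorted(v for v, n in variant_counts.items() if n > 1)
```

A real analysis would start from nucleotide assemblies or reads, call variants against the reference genome, and handle indels; the sketch only shows the counting step on already translated sequences.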

2021 ◽  
Author(s):  
Barbara Schrörs ◽  
Ranganath Gudimella ◽  
Thomas Bukur ◽  
Thomas Rösler ◽  
Martin Löwer ◽  
...  

Abstract Due to the wide spread of the COVID-19 pandemic, the SARS-CoV-2 genome is evolving in diverse human populations. Several studies have already reported different strains and an increase in the mutation rate. Mutations in the SARS-CoV-2 spike glycoprotein are of particular interest because it mediates infection in humans and recently approved mRNA vaccines are designed to induce immune responses against it. We analyzed 146,920 SARS-CoV-2 genome assemblies and 2,393 NGS datasets from the GISAID, NCBI Virus and NCBI SRA archives, focusing on non-synonymous mutations in the spike protein. Only around 13.6% of the samples contained the wild-type spike protein with no variation from the reference. Among the spike protein mutants, we confirmed a low mutation rate, with fewer than 10 non-synonymous mutations in 99.98% of the analyzed sequences, but the mean and median number of spike protein mutations per sample increased over time. In total, 2,592 distinct variants were found. The majority of the observed variants were recurrent, but only nine and 23 recurrent variants were found in at least 0.5% of the mutant genome assemblies and NGS samples, respectively. Further, we found high-confidence subclonal variants in about 15.1% of the NGS data sets with mutant spike protein, which might indicate co-infection with various SARS-CoV-2 strains and/or intra-host evolution. Lastly, some variants might have an effect on antibody binding or T-cell recognition. These findings demonstrate the increasing importance of monitoring SARS-CoV-2 sequences for early detection of variants that require adaptations in preventive and therapeutic strategies.


2016 ◽  
Author(s):  
Yuval B. Simons ◽  
Guy Sella

Abstract Over the past decade, there has been both great interest and confusion about whether recent demographic events, notably the Out-of-Africa bottleneck and recent population growth, have led to differences in mutation load among human populations. The confusion can be traced to the use of different summary statistics to measure load, which lead to apparently conflicting results. We argue, however, that when statistics more directly related to load are used, the results of different studies and data sets consistently reveal little or no difference in the load of non-synonymous mutations among human populations. Theory helps to understand why no such differences are seen, as well as to predict in what settings they are to be expected. In particular, as predicted by modeling, there is evidence for changes in the load of recessive loss-of-function mutations in founder and inbred human populations. Also as predicted, eastern subspecies of gorilla, Neanderthals and Denisovans, which are thought to have undergone reductions in population size that exceed the human Out-of-Africa bottleneck in duration and severity, show evidence for increased load of non-synonymous mutations (relative to western subspecies of gorillas and modern humans, respectively). A coherent picture is thus starting to emerge about the effects of demographic history on the mutation load in populations of humans and close evolutionary relatives.


Mobile DNA ◽  
2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Malte Petersen ◽  
Sven Winter ◽  
Raphael Coimbra ◽  
Menno J. de Jong ◽  
Vladimir V. Kapitonov ◽  
...  

Abstract Background: The majority of structural variation in genomes is caused by insertions of transposable elements (TEs). In mammalian genomes, the main TE fraction is made up of autonomous and non-autonomous non-LTR retrotransposons commonly known as LINEs and SINEs (Long and Short Interspersed Nuclear Elements). Here we present one of the first population-level analyses of TE insertions in a non-model organism, the giraffe. Giraffes are ruminant artiodactyls, one of the few mammalian groups with genomes that are colonized by putatively active LINEs of two different clades of non-LTR retrotransposons, namely the LINE1 and RTE/BovB LINEs, as well as their associated SINEs. We analyzed TE insertions of both types and their associated SINEs in three giraffe genome assemblies, as well as across a population-level sampling of 48 individuals covering all extant giraffe species. Results: The comparative genome screen identified 139,525 recent LINE1 and RTE insertions in the sampled giraffe population. The analysis revealed a drastically reduced RTE activity in giraffes, whereas LINE1 is still actively propagating in the genomes of extant (sub)-species. In concert with the extremely low activity of the giraffe RTE, we also found that RTE-dependent SINEs, namely Bov-tA and Bov-A2, have been virtually immobile in the last 2 million years. Despite the high current activity of the giraffe LINE1, we did not find evidence for the presence of currently active LINE1-dependent SINEs. TE insertion heterozygosity rates differ among the different (sub)-species, likely due to divergent population histories. Conclusions: The horizontally transferred RTE/BovB and its derived SINEs appear to be close to inactivation and subsequent extinction in the genomes of extant giraffe species. This is the first time that the decline of a TE family has been meticulously analyzed from a population genetics perspective. Our study shows how detailed information about past and present TE activity can be obtained by analyzing large-scale population-level genomic data sets.


2018 ◽  
Author(s):  
Patrick K. Albers ◽  
Gil McVean

Abstract The origin and fate of new mutations within species is the fundamental process underlying evolution. However, while much attention has been focused on characterizing the presence, frequency, and phenotypic impact of genetic variation, the evolutionary histories of most variants are largely unexplored. We have developed a non-parametric approach for estimating the date of origin of genetic variants in large-scale sequencing data sets. The accuracy and robustness of the approach are demonstrated through simulation. Using data from two publicly available human genomic diversity resources, we estimated the age of more than 45 million single nucleotide polymorphisms (SNPs) in the human genome and release the Atlas of Variant Age as a public online database. We characterize the relationship between variant age and frequency in different geographical regions, and demonstrate the value of age information in interpreting variants of functional and selective importance. Finally, we use allele age estimates to power a rapid approach for inferring the ancestry shared between individual genomes, to quantify genealogical relationships at different points in the past, and to describe and explore the evolutionary history of modern human populations.


Author(s):  
Syed Mohammad Lokman ◽  
Md. Rasheduzzaman ◽  
Asma Salauddin ◽  
Rocktim Barua ◽  
Afsana Yeasmin Tanzina ◽  
...  

Abstract The newly identified SARS-CoV-2 has now been reported from around 183 countries, with more than a million confirmed human cases including more than 68,000 deaths. The genomes of SARS-CoV-2 strains isolated from different parts of the world are now available, and the unique features of their constituent genes and proteins have recently received substantial attention. The spike glycoprotein is widely considered a possible target to be explored because of its role during the entry of coronaviruses into host cells. We analyzed 320 whole-genome sequences and 320 spike protein sequences of SARS-CoV-2 using multiple sequence alignment tools. In this study, 483 unique variations were identified among the genomes, including 25 non-synonymous mutations and one deletion in the spike protein of SARS-CoV-2. Among the 26 variations detected, 12 were located in the N-terminal domain and 6 in the receptor-binding domain (RBD), which might alter the interaction with receptor molecules. In addition, 22 amino acid insertions were identified in the spike protein of SARS-CoV-2 in comparison with that of SARS-CoV. Phylogenetic analyses of the spike protein revealed that bat coronaviruses have a close evolutionary relationship with circulating SARS-CoV-2. The genetic variation analysis data presented in this study can support a better understanding of SARS-CoV-2 pathogenesis. Based on our findings, potential inhibitors targeting these proposed sites of variation can be designed and tested.


2021 ◽  
Vol 22 (12) ◽  
pp. 6490
Author(s):  
Olga A. Postnikova ◽  
Sheetal Uppal ◽  
Weiliang Huang ◽  
Maureen A. Kane ◽  
Rafael Villasmil ◽  
...  

The SARS-CoV-2 Spike glycoprotein (S protein) acquired a unique new 4 amino acid -PRRA- insertion sequence at amino acid residues (aa) 681–684 that forms a new furin cleavage site in the S protein as well as several new glycosylation sites. We studied various statistical properties of the -PRRA- insertion at the RNA level (CCUCGGCGGGCA). The nucleotide composition and codon usage of this sequence are different from the rest of the SARS-CoV-2 genome. One such feature is the presence of two tandem CGG codons, although the CGG codon is the rarest codon in the SARS-CoV-2 genome. This suggests that the insertion sequence could cause ribosome pausing as a result of these rare codons. Due to population variants, the Nextstrain divergence measure of the CCU codon is extremely large. We cannot exclude that this divergence might affect host immune responses or the effectiveness of SARS-CoV-2 vaccines, possibilities that await further investigation. Our experimental studies show that the expression level of the original RNA sequence ("wildtype") spike protein is much lower than that of the codon-optimized spike protein in all studied cell lines. Interestingly, the original spike sequence produces a higher titer of pseudoviral particles and a higher level of infection. Further mutagenesis experiments suggest that this dual-effect insert, composed of a combination of overlapping translation pausing and furin sites, has allowed SARS-CoV-2 to infect its new host (human) more readily. This underlines the importance of ribosome pausing in allowing efficient regulation of protein expression and of cotranslational subdomain folding.
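The correspondence between the RNA sequence CCUCGGCGGGCA and the -PRRA- insertion, including the two tandem CGG codons, can be checked with a short sketch. The codon table below is a deliberately minimal excerpt of the standard genetic code containing only the three codons that occur in this sequence; the helper name `split_codons` is hypothetical, and the GC fraction is included only as one simple example of a nucleotide-composition statistic, not as the authors' specific measure.

```python
from collections import Counter

# Excerpt of the standard genetic code: only the codons present in the insert.
CODON_TO_AA = {"CCU": "P", "CGG": "R", "GCA": "A"}


def split_codons(rna: str) -> list[str]:
    """Split an in-frame RNA string into consecutive three-base codons."""
    if len(rna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [rna[i:i + 3] for i in range(0, len(rna), 3)]


insert_rna = "CCUCGGCGGGCA"  # sequence quoted in the abstract

codons = split_codons(insert_rna)                  # ['CCU', 'CGG', 'CGG', 'GCA']
peptide = "".join(CODON_TO_AA[c] for c in codons)  # 'PRRA'
codon_counts = Counter(codons)                     # CGG occurs twice, in tandem

# Fraction of G or C bases in the insert (10 of 12 bases).
gc_fraction = sum(base in "GC" for base in insert_rna) / len(insert_rna)
```

Comparing such counts against genome-wide codon usage would require the full SARS-CoV-2 coding sequences, which are not part of this sketch.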


Author(s):  
Lior Shamir

Abstract Several recent observations using large data sets of galaxies showed a non-random distribution of the spin directions of spiral galaxies, even when the galaxies are too far from each other to have gravitational interaction. Here, a data set of $\sim8.7\cdot10^3$ spiral galaxies imaged by the Hubble Space Telescope (HST) is used to test and profile a possible asymmetry between galaxy spin directions. The asymmetry between galaxies with opposite spin directions is compared to the asymmetry of galaxies from the Sloan Digital Sky Survey (SDSS). The two data sets contain different galaxies at different redshift ranges, and each data set was annotated using a different annotation method. The results show that both data sets show a similar asymmetry in the COSMOS field, which is covered by both telescopes. Fitting the asymmetry of the galaxies to cosine dependence shows a dipole axis with statistical significance of $\sim2.8\sigma$ and $\sim7.38\sigma$ in HST and SDSS, respectively. The most likely dipole axis identified in the HST galaxies is at $(\alpha=78^\circ,\delta=47^\circ)$ and is well within the $1\sigma$ error range compared to the location of the most likely dipole axis in the SDSS galaxies with $z>0.15$, identified at $(\alpha=71^\circ,\delta=61^\circ)$.


Algorithms ◽  
2021 ◽  
Vol 14 (5) ◽  
pp. 154
Author(s):  
Marcus Walldén ◽  
Masao Okita ◽  
Fumihiko Ino ◽  
Dimitris Drikakis ◽  
Ioannis Kokkinakis

Increasing processing capabilities and input/output constraints of supercomputers have increased the use of co-processing approaches, i.e., visualizing and analyzing data sets of simulations on the fly. We present a method that evaluates the importance of different regions of simulation data and a data-driven approach that uses the proposed method to accelerate in-transit co-processing of large-scale simulations. We use the importance metrics to simultaneously employ multiple compression methods on different data regions to accelerate the in-transit co-processing. Our approach strives to adaptively compress data on the fly and uses load balancing to counteract memory imbalances. We demonstrate the method's efficiency through a fluid mechanics application, a Richtmyer–Meshkov instability simulation, showing how to accelerate the in-transit co-processing of simulations. The results show that the proposed method can expeditiously identify regions of interest, even when using multiple metrics. Our approach achieved a speedup of 1.29× in a lossless scenario. The data decompression time was sped up by 2× compared to using a single compression method uniformly.


Genetics ◽  
1999 ◽  
Vol 153 (1) ◽  
pp. 497-506 ◽  
Author(s):  
Rasmus Nielsen ◽  
Daniel M Weinreich

Abstract McDonald/Kreitman tests performed on animal mtDNA consistently reveal significant deviations from strict neutrality in the direction of an excess number of polymorphic nonsynonymous sites, which is consistent with purifying selection acting on nonsynonymous sites. We show that under models of recurrent neutral and deleterious mutations, the mean age of segregating neutral mutations is greater than the mean age of segregating selected mutations, even in the absence of recombination. We develop a test of the hypothesis that the mean age of segregating synonymous mutations equals the mean age of segregating nonsynonymous mutations in a sample of DNA sequences. The power of this age-of-mutation test and the power of the McDonald/Kreitman test are explored by computer simulations. We apply the new test to 25 previously published mitochondrial data sets and find weak evidence for selection against nonsynonymous mutations.
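For orientation, the classical McDonald/Kreitman comparison mentioned in this abstract reduces to a 2×2 table of polymorphic versus fixed differences at nonsynonymous versus synonymous sites. The sketch below evaluates such a table with a two-sided Fisher exact test and the neutrality index; it does not implement the authors' age-of-mutation test. The counts are invented toy values, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


def neutrality_index(pn: int, ps: int, dn: int, ds: int) -> float:
    """NI = (Pn/Ps) / (Dn/Ds); values above 1 indicate an excess of
    nonsynonymous polymorphism relative to divergence."""
    return (pn / ps) / (dn / ds)


# Toy counts (hypothetical): polymorphic and fixed differences by site class.
pn, ps = 3, 1  # polymorphic nonsynonymous, polymorphic synonymous
dn, ds = 1, 3  # fixed nonsynonymous, fixed synonymous

p_value = fisher_exact_two_sided(pn, ps, dn, ds)  # 34/70 for this table
ni = neutrality_index(pn, ps, dn, ds)             # 9.0 for this table
```

With counts this small the test has little power, which is one reason the abstract compares the power of the two tests by simulation.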

