Whole-genome Sequencing of SARS-CoV-2: Using Phylogeny and Structural Modeling to Contextualize Local Viral Evolution

2021 ◽  
Author(s):  
Ashley E Nazario-Toole ◽  
Hui Xia ◽  
Thomas F Gibbons

ABSTRACT
Introduction: The outbreak of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has created a global pandemic resulting in over 1 million deaths worldwide. In the Department of Defense (DoD), over 129,000 personnel (civilians, dependents, and active duty) have been infected with the virus to date. Rapid estimations of transmission and mutational patterns of virus outbreaks can be accomplished using whole-genome viral sequencing. Deriving interpretable and actionable results from pathogen sequence data is accomplished by the construction of phylogenetic trees (from local and global virus sequences) and by the creation of protein maps, to visualize and predict the effects of structural protein amino acid mutations.
Materials and Methods: We developed a sequencing and bioinformatics workflow for molecular epidemiological SARS-CoV-2 surveillance using excess clinical specimens collected under an institutional review board exempt protocol at Joint Base San Antonio (JBSA), Lackland AFB. This workflow includes viral RNA isolation, viral load quantification, tiling-based next-generation sequencing, sequencing and bioinformatics analysis, and data visualization via phylogenetic trees and protein mapping.
Results: Sequencing of 37 clinical specimens collected at JBSA/Lackland revealed that by June 2020, SARS-CoV-2 strains carrying the 614G mutation were the predominant cause of local coronavirus disease 2019 infections. We identified 109 nucleotide changes in the coding region of the SARS-CoV-2 genome (which lead to 63 unique, non-synonymous amino acid mutations), one mutation in the 5ʹ untranslated region (UTR), and two mutations in the 3ʹ UTR. Furthermore, we identified and mapped six additional spike protein amino acid changes, information that could potentially aid vaccine design.
Conclusion: The workflow presented here is designed to enable DoD public health officials to track viral evolution and conduct near real-time evaluation of future outbreaks. The generation of molecular epidemiological sequence data is critical for the development of disease intervention strategies, most notably vaccine design. Overall, we present a streamlined sequencing and bioinformatics methodology aimed at improving long-term readiness efforts in the DoD.
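The distinction between nucleotide changes and non-synonymous amino acid mutations reported above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes the standard genetic code, DNA-alphabet codons, and a hypothetical helper named `classify_codon_change`. The example codons are illustrative only and show an aspartate-to-glycine change of the kind denoted by 614G.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon, alt_codon):
    """Return (reference aa, alternate aa, effect) for a codon substitution.

    Hypothetical helper for illustration; codons use the DNA alphabet.
    """
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    effect = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return ref_aa, alt_aa, effect


# Illustrative codons: one change that alters the amino acid, one that does not.
print(classify_codon_change("GAT", "GGT"))
print(classify_codon_change("GAT", "GAC"))
```

Because several codons map to the same amino acid, a count of nucleotide changes in a coding region is generally larger than the count of amino acid changes they produce.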

1980 ◽  
Vol 187 (1) ◽  
pp. 65-74 ◽  
Author(s):  
D Penny ◽  
M D Hendy ◽  
L R Foulds

We have recently reported a method to identify the shortest possible phylogenetic tree for a set of protein sequences [Foulds, Hendy & Penny (1979) J. Mol. Evol. 13, 127–150; Foulds, Penny & Hendy (1979) J. Mol. Evol. 13, 151–166]. The present paper discusses issues that arise during the construction of minimal phylogenetic trees from protein-sequence data. The conversion of the data from amino acid sequences into nucleotide sequences is shown to be advantageous. A new variation of a method for constructing a minimal tree is presented. Our previous methods have involved first constructing a tree and then either proving that it is minimal or transforming it into a minimal tree. The approach presented in the present paper progressively builds up a tree, taxon by taxon. We illustrate this approach by using it to construct a minimal tree for ten mammalian haemoglobin alpha-chain sequences. Finally we define a measure of the complexity of the data and illustrate a method to derive a directed phylogenetic tree from the minimal tree.
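What "shortest" means for a tree can be made concrete with a small sketch. This is not the taxon-by-taxon construction described in the abstract; it uses Fitch's small-parsimony algorithm, a different and widely used technique, to count the minimum number of substitutions a fixed tree requires. The function names and the toy alignment are hypothetical.

```python
def fitch_site_cost(tree, leaf_states):
    """Minimum number of state changes for one alignment column on a fixed tree.

    tree: nested 2-tuples of leaf names, e.g. (("a", "b"), ("c", "d")).
    leaf_states: dict mapping leaf name -> character at this column.
    """
    def visit(node):
        if isinstance(node, str):            # leaf: its own state, no cost
            return {leaf_states[node]}, 0
        left, right = node
        left_set, left_cost = visit(left)
        right_set, right_cost = visit(right)
        shared = left_set & right_set
        if shared:                           # children agree: no extra change
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def tree_length(tree, alignment):
    """Sum the per-column Fitch costs over an alignment (name -> sequence)."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch_site_cost(tree, {name: seq[i] for name, seq in alignment.items()})
        for i in range(n_sites)
    )


# Toy nucleotide alignment (made up for illustration).
alignment = {"a": "AAG", "b": "AAG", "c": "ACT", "d": "ACT"}
print(tree_length((("a", "b"), ("c", "d")), alignment))
print(tree_length((("a", "c"), ("b", "d")), alignment))
```

A minimal tree is one whose length is no greater than that of any other topology for the same data. Scoring every topology this way is only feasible for small numbers of taxa.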


Viruses ◽  
2019 ◽  
Vol 11 (8) ◽  
pp. 701 ◽  
Author(s):  
Kumar ◽  
Chaudhary ◽  
Lu ◽  
Duff ◽  
Heffel ◽  
...  

Viruses belonging to the genus Bocaparvovirus (BoV) are a genetically diverse group of DNA viruses known to cause respiratory, enteric, and neurological diseases in animals, including humans. An intestinal sample from an alpaca (Vicugna pacos) herd with recurring diarrhea and respiratory disease was submitted for next-generation sequencing, revealing the presence of a BoV strain. The alpaca BoV strain (AlBoV) had a 58.58% whole genome nucleotide percent identity to a camel BoV from Dubai, belonging to a tentative ungulate BoV 8 species (UBoV8). Recombination events were lacking with other UBoV strains. The AlBoV genome encoded the NS1, NP1, and VP1 proteins. The NS1 protein had the highest amino acid percent identity range (57.89–67.85%) to the members of UBoV8, which was below the 85% cut-off set by the International Committee on Taxonomy of Viruses. The low NS1 amino acid identity suggests that AlBoV is a tentative new species. The whole genome, NS1, NP1, and VP1 phylogenetic trees illustrated distinct branching of AlBoV, sharing a common ancestor with UBoV8. Walker loop and Phospholipase A2 (PLA2) motifs that are vital for virus infectivity were identified in NS1 and VP1 proteins, respectively. Our study reports a novel BoV strain in an alpaca intestinal sample and highlights the need for additional BoV research.
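The percent-identity comparison against the 85% cut-off can be sketched as follows. This is a minimal illustration, not the software used in the study: it assumes the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator, which is one of several conventions in use. The function name and the toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences, ignoring gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap or y == gap:
            continue                 # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


SPECIES_CUTOFF = 85.0  # NS1 amino acid identity cut-off cited in the abstract

identity = percent_identity("ACGT-A", "ACCTTA")  # toy aligned sequences
print(identity, identity >= SPECIES_CUTOFF)
```

Under this rule, a strain whose NS1 identity to every member of a species falls below the cut-off, as reported for AlBoV, would not be assigned to that species.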


1993 ◽  
Vol 4 (3) ◽  
pp. 287-292 ◽  
Author(s):  
D.L. Kauffman ◽  
P.J. Keller ◽  
A. Bennick ◽  
M. Blum

Human proline-rich proteins (PRPs) constitute a complex family of salivary proteins that are encoded by a small number of genes. The primary gene product is cleaved by proteases, thereby giving rise to about 20 secreted proteins. To determine the genes for the secreted PRPs, therefore, it is necessary to obtain sequences of both the secreted proteins and the DNA encoding these proteins. We have sequenced most PRPs from one donor (D.K.) and aligned the protein sequences with available DNA sequences from unrelated individuals. Partial sequence data have now been obtained for an additional PRP from D.K. named II-1. This protein was purified from parotid saliva by gel filtration and ion-exchange chromatography. Peptides were obtained by cleavage with trypsin, clostripain, and N-bromosuccinimide, followed by column chromatography. The peptides were sequenced on a gas-phase protein sequenator. Overlapping peptide sequences were obtained for most of II-1 and aligned with translated DNA sequences. The best fit was obtained with clones containing sequences for the allele PRB4" (Lyons et al., 1988). However, there was not complete identity of the protein amino acid sequence and the DNA-derived sequences, indicating that II-1 is not encoded by PRB4". Other PRPs isolated from D.K. also fail to conform to any DNA structure so far reported. This shows the need to obtain amino acid sequences and corresponding DNA sequences from the same person to assign genes for the PRPs and to determine the location of the postribosomal cleavage points in the primary translation product.


2018 ◽  
Author(s):  
Marianne Aspbury ◽  
James Sciberras ◽  
Jukka Corander ◽  
Sion C. Bayliss ◽  
Tjibbe Donker ◽  
...  

Abstract: Whole genome sequence (WGS) data for bacterial pathogens can provide evidence as to the source of nosocomial infection, and more specifically the ability to distinguish between intra- and inter-hospital transmission. This is currently achieved either through using SNP thresholds, which can lack statistical robustness, or by constructing phylogenetic trees, which can be computationally expensive and difficult to interpret. Here we compare two alternative statistical approaches using 1022 genomes of methicillin-resistant Staphylococcus aureus (MRSA) clone ST22. In 71% of cases both methods predict the same hospital origin, which is also supported by the ML tree. Robust assignments are divided approximately equally between intra-hospital transmission and inter-hospital transmission. Our approaches are rapid and produce intuitive output that could inform on immediate infection control priorities, as well as providing long-term data on inter-hospital transmission networks. We discuss the strengths and weaknesses of our methods, and the generalisability of this approach.
One Sentence Summary: We present rapid statistical methods for distinguishing intra- versus inter-hospital transmission of bacterial pathogens using whole genome sequence data; these methods do not require the use of SNP thresholds or the generation and interpretation of phylogenetic trees.


Author(s):  
Tao Zhang ◽  
Qunfu Wu ◽  
Zhigang Zhang

Abstract: Identifying the potential intermediate host of a novel coronavirus is vital to rapidly controlling the continued spread of COVID-19. We found genomic and evolutionary evidence of the occurrence of a 2019-nCoV-like coronavirus (named Pangolin-CoV) in dead Malayan pangolins. Pangolin-CoV is 91.02% and 90.55% identical at the whole genome level to 2019-nCoV and BatCoV RaTG13, respectively. Pangolin-CoV is the lowest common ancestor of 2019-nCoV and RaTG13. The S1 protein of Pangolin-CoV is much more closely related to 2019-nCoV than RaTG13. Five key amino-acid residues involved in the interaction with human ACE2 are completely consistent between Pangolin-CoV and 2019-nCoV, but four amino-acid mutations occur in RaTG13. This indicates that Pangolin-CoV has similar pathogenic potential to 2019-nCoV and would be helpful in tracing the origin and probable intermediate host of 2019-nCoV.


2020 ◽  
Author(s):  
Babatunde Olarenwaju Motayo ◽  
Olukunle Oluwapamilerin Oluwasemowo ◽  
Paul Akiniyi Akinduti ◽  
Babatunde Adebiyi Olusola ◽  
Olumide T Aerege ◽  
...  

ABSTRACT: The ongoing SARS-CoV-2 pandemic was introduced into Africa on 14th February 2020 and has rapidly spread across the continent, causing a severe public health crisis and mortality. We investigated the genetic diversity and evolution of this virus during the early outbreak months using whole genome sequences. We performed recombination analysis against closely related CoVs, constructed a Bayesian time-scaled phylogeny, and investigated spike protein amino acid mutations. Our analysis showed recombination signals between the AfrSARSCoV-2 sequences and reference sequences within the N and S genes. The evolutionary rate of AfrSARSCoV-2 was 4.133 × 10⁻⁴ substitutions/site/year (highest posterior density, HPD: 4.132 × 10⁻⁴ to 4.134 × 10⁻⁴). The time to most recent common ancestor (TMRCA) of the African strains was December 7th 2019. The AfrSARSCoV-2 sequences diversified into two lineages, A and B, with B being more diverse and containing multiple sub-lineages, as confirmed by both the maximum clade credibility (MCC) tree and PANGOLIN software. There was a high prevalence of the D614G spike protein amino acid mutation (82.61%) among the African strains. Our study has revealed a rapidly diversifying viral population with the G614 spike protein variant dominating. We advocate upscaling NGS sequencing platforms across Africa to enhance surveillance and aid control efforts against SARS-CoV-2 in Africa.
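To put the reported per-site rate on a more intuitive scale, it can be multiplied by the genome length. The short calculation below is a back-of-the-envelope sketch, not part of the study. It assumes the 29,903-nucleotide length of the Wuhan-Hu-1 reference genome (NC_045512.2) and a uniform rate across sites.

```python
# Rate reported in the abstract (substitutions per site per year).
RATE_PER_SITE_PER_YEAR = 4.133e-4

# Assumption: length of the Wuhan-Hu-1 reference genome, NC_045512.2.
GENOME_LENGTH_NT = 29_903

subs_per_genome_per_year = RATE_PER_SITE_PER_YEAR * GENOME_LENGTH_NT
subs_per_genome_per_month = subs_per_genome_per_year / 12

print(f"{subs_per_genome_per_year:.1f} substitutions per genome per year")
print(f"{subs_per_genome_per_month:.1f} substitutions per genome per month")
```

Under these assumptions the rate corresponds to about 12 substitutions per genome per year, or roughly one per month.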


2013 ◽  
Vol 94 (1) ◽  
pp. 128-135 ◽  
Author(s):  
Junichi Soma ◽  
Hiroshi Tsunemitsu ◽  
Takeshi Miyamoto ◽  
Goro Suzuki ◽  
Takashi Sasaki ◽  
...  

Rotavirus C (RVC) has been detected frequently in epidemic cases and/or outbreaks of diarrhoea in humans and animals worldwide. Because it is difficult to cultivate RVCs serially in cell culture, the sequence data available for RVCs are limited, despite their potential economic and epidemiological impact. Although whole-genome sequences of one porcine RVC and seven human RVC strains have been analysed, this has not yet been done for a bovine RVC strain. In the present study, we first determined the nucleotide sequences for five as-yet underresearched genes, including the NSP4 gene, from a cultivable bovine RVC, the Shintoku strain, identified in Hokkaido Prefecture, Japan, in 1991. In addition, we elucidated the ORF sequences of all segments from another bovine RVC, the Toyama strain, detected in Toyama Prefecture, Japan, in 2010, in order to investigate genetic divergence among bovine RVCs. Comparison of segmental nucleotide and deduced amino acid sequences among RVCs indicates high identity among bovine RVCs and low identity between human and porcine RVCs. Phylogenetic analysis of each gene showed that the two bovine RVCs belong to a cluster distinct from human and porcine RVCs. These data demonstrate that RVCs can be classified into different genotypes according to host species. Moreover, RVC NSP1, NSP2 and VP1 amino acid sequences contain a unique motif that is highly conserved among rotavirus A (RVA) strains and, hence, several proteins from bovine RVCs are suggested to play important roles that are similar to those of RVAs.


2021 ◽  
Author(s):  
Julia Doelger ◽  
Mehran Kardar ◽  
Arup K. Chakraborty

There still are no effective long-term protective vaccines against viruses that continuously evolve under immune pressure such as seasonal influenza, which has caused, and can cause, devastating epidemics in the human population. For finding such a broadly protective immunization strategy it is useful to know how easily the virus can escape via mutation from specific antibody responses. This information is encoded in the fitness landscape of the viral proteins (i.e., knowledge of the viral fitness as a function of sequence). Here we present a computational method to infer the intrinsic mutational fitness landscape of influenza-like evolving antigens from yearly sequence data. We test inference performance with computer-generated sequence data that are based on stochastic simulations mimicking basic features of immune-driven viral evolution. Although the numerically simulated model does create a phylogeny based on the allowed mutations, the inference scheme does not use this information. This provides a contrast to other methods that rely on reconstruction of phylogenetic trees. Our method just needs a sufficient number of samples over multiple years. With our method we are able to infer single- as well as pairwise mutational fitness effects from the simulated sequence time series for short antigenic proteins. Our fitness inference approach may have potential future use for design of immunization protocols by identifying intrinsically vulnerable immune target combinations on antigens that evolve under immune-driven selection. This approach may in the future be applied to influenza and other novel viruses such as SARS-CoV-2, which evolves and, like influenza, might continue to escape the natural and vaccine-mediated immune pressures.


2019 ◽  
Vol 400 (11) ◽  
pp. 1519-1527 ◽  
Author(s):  
Martin Peng ◽  
Manfred Maier ◽  
Jan Esch ◽  
Alexander Schug ◽  
Kersten S. Rabe

Abstract The optimization of enzyme properties for specific reaction conditions enables their tailored use in biotechnology. Predictions using established computer-based methods, however, remain challenging, especially regarding physical parameters such as thermostability without concurrent loss of activity. Employing established computational methods such as energy calculations using FoldX can lead to the identification of beneficial single amino acid substitutions for the thermostabilization of enzymes. However, these methods require a three-dimensional (3D)-structure of the enzyme. In contrast, coevolutionary analysis is a computational method, which is solely based on sequence data. To enable a comparison, we employed coevolutionary analysis together with structure-based approaches to identify mutations, which stabilize an enzyme while retaining its activity. As an example, we used the delicate dimeric, thiamine pyrophosphate dependent enzyme ketoisovalerate decarboxylase (Kivd) and experimentally determined its stability represented by a T50 value indicating the temperature where 50% of enzymatic activity remained after incubation for 10 min. Coevolutionary analysis suggested 12 beneficial mutations, which were not identified by previously established methods, out of which four mutations led to a functional Kivd with an increased T50 value of up to 3.9°C.
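The T50 definition above (the temperature at which 50% of activity remains after incubation) can be illustrated with a short sketch. It is not the authors' analysis: it assumes residual activity is expressed as a percentage of an unheated control and estimates T50 by linear interpolation between the two measurements that bracket 50%, whereas a fitted sigmoidal curve is a common alternative. The function name and data points are hypothetical.

```python
def estimate_t50(temperatures, residual_activity):
    """Estimate T50 by linear interpolation where activity crosses 50%.

    temperatures: incubation temperatures in °C.
    residual_activity: percent activity remaining at each temperature.
    """
    points = sorted(zip(temperatures, residual_activity))
    for (t_low, a_low), (t_high, a_high) in zip(points, points[1:]):
        # Look for the interval where activity falls through 50%.
        if a_low >= 50.0 >= a_high and a_low != a_high:
            fraction = (a_low - 50.0) / (a_low - a_high)
            return t_low + fraction * (t_high - t_low)
    raise ValueError("residual activity never crosses 50%")


# Made-up measurements for illustration only.
temps = [40, 45, 50, 55]
activity = [100, 80, 40, 5]
print(estimate_t50(temps, activity))
```

Comparing the estimate for a variant with that of the wild type gives the shift in T50, the quantity the abstract reports as an increase of up to 3.9°C.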

