Worldwide tracing of mutations and the evolutionary dynamics of SARS-CoV-2

Abstract Understanding the mutational and evolutionary dynamics of SARS-CoV-2 is essential for treating COVID-19 and developing a vaccine. Here, we analyzed 15,818 publicly available assembled SARS-CoV-2 genome sequences, along with 2,350 raw sequence datasets sampled worldwide. We investigated the distribution of inter-host single nucleotide polymorphisms (inter-host SNPs) and intra-host single nucleotide variations (iSNVs). Mutations have been observed at 35.6% (10,649/29,903) of the bases in the genome. The substitution rate in some protein-coding regions is higher than the average in SARS-CoV-2 viruses, and the high substitution rate in some regions might be driven by diversifying selection to escape immune recognition. Both recurrent mutations and human-to-human transmission are mechanisms that generate fitness-advantageous mutations. Furthermore, the frequency of three mutations (S protein, F400L; ORF3a protein, T164I; and ORF1a protein, Q6383H) has gradually increased over time on lineages, which provides new clues for the early detection of fitness-advantageous mutations. Our study provides theoretical support for vaccine development and the optimization of treatment for COVID-19. We call on researchers to submit raw sequence data to public databases.
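
The 35.6% figure quoted in the abstract above follows directly from the two counts it reports. A minimal arithmetic check, in which the function and constant names are illustrative and not taken from the paper:

```python
# Counts taken from the abstract: 10,649 mutated positions out of 29,903 bases.
GENOME_LENGTH = 29_903
MUTATED_SITES = 10_649


def mutated_fraction(mutated_sites: int, genome_length: int) -> float:
    """Return the fraction of genome positions at which a mutation was seen."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return mutated_sites / genome_length


# Express the fraction as a percentage rounded to one decimal place.
percent = round(100 * mutated_fraction(MUTATED_SITES, GENOME_LENGTH), 1)
print(f"{percent}% of bases carry at least one observed mutation")
```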


Analysis of Inter-Chromosomal Distribution of Disease-Related Genes in Human Genome

Current Protein and Peptide Science ◽

10.2174/1389203721666200426233158 ◽

2020 ◽

Vol 21 (11) ◽

pp. 1068-1077

Author(s):

Xiaochao Sun ◽

Bin Yang ◽

Qunye Zhang

Keyword(s):

Spatial Distribution ◽

Model Organisms ◽

Nucleotide Polymorphisms ◽

Chromosomal Distribution ◽

Single Nucleotide ◽

Protein Coding ◽

Single Chromosome ◽

Deletion Mutations ◽

Protein Coding Genes ◽

Disease Related Genes

Many studies have shown that the spatial distribution of genes within a single chromosome exhibits distinct patterns. However, little is known about the characteristics of the inter-chromosomal distribution of genes (including protein-coding genes, processed transcripts and pseudogenes) in different genomes. In this study, we explored these issues using the available genomic data of both human and model organisms. Moreover, we also analyzed the distribution pattern of protein-coding genes that have been associated with 14 common diseases and the insertion/deletion mutations and single nucleotide polymorphisms detected by whole genome sequencing in an acute promyelocytic leukemia patient. We obtained the following novel findings. Firstly, the inter-chromosomal distribution of genes displays a nonstochastic pattern and the gene densities in different chromosomes are heterogeneous. This kind of heterogeneity is observed in genomes of both lower and higher species. Secondly, protein-coding genes involved in certain biological processes tend to be enriched in one or a few chromosomes. Our findings add new insights into the spatial distribution of genomic and disease-related genes across chromosomes. These results could be useful in improving the efficiency of disease-associated gene screening studies by targeting specific chromosomes.


PSI-40 Two mitochondrial lineages revealed in North American yak

Journal of Animal Science ◽

10.1093/jas/skaa278.833 ◽

2020 ◽

Vol 98 (Supplement_4) ◽

pp. 477-477

Author(s):

Leah K Treffer ◽

Edward S Rice ◽

Anna M Fuller ◽

Samuel Cutler ◽

Jessica L Petersen

Keyword(s):

Sequence Data ◽

Haplotype Network ◽

Ovis Aries ◽

Similar Species ◽

Nucleotide Polymorphisms ◽

Mt Dna ◽

Protein Coding ◽

Sister Clade ◽

Mtdna Sequence ◽

The Impact

Abstract Domestic yak (Bos grunniens) are bovids native to the Asian Qinghai-Tibetan Plateau. Studies of Asian yak have revealed that introgression with domestic cattle has contributed to the evolution of the species. When imported to North America (NA), some hybridization with B. taurus did occur. The objective of this study was to use mitochondrial (mt) DNA sequence data to better understand the mtDNA origin of NA yak and their relationship to Asian yak and related species. The complete mtDNA sequence of 14 individuals (12 NA yak, 1 Tibetan yak, 1 Tibetan B. indicus) was generated and compared with sequences of similar species from GenBank (B. indicus, B. grunniens (Chinese), B. taurus, B. gaurus, B. primigenius, B. frontalis, Bison bison, and Ovis aries). Individuals were aligned to the B. grunniens reference genome (ARS_UNL_BGru_maternal_1.0), which was also included in the analyses. The mtDNA genes were annotated using the ARS-UCD1.2 cattle sequence as a reference. Ten unique NA yak haplotypes were identified, which a haplotype network separated into two clusters. Variation among the NA haplotypes included 93 nonsynonymous single nucleotide polymorphisms. A maximum likelihood tree including all taxa was made using IQtree after the data were partitioned into twenty-two subgroups using PartitionFinder2. Notably, six NA yak haplotypes formed a clade with B. indicus; the other four haplotypes grouped with B. grunniens and fell as a sister clade to bison, gaur and gayal. These data demonstrate two mitochondrial origins of NA yak with genetic variation in protein coding genes. Although these data suggest yak introgression with B. indicus, it appears to date prior to importation into NA. In addition to contributing to our understanding of the species history, these results suggest the two major mtDNA haplotypes in NA yak may functionally differ. Characterization of the impact of these differences on cellular function is currently underway.


Different Within-Host Viral Evolution Dynamics in Severely Immunosuppressed Cases with Persistent SARS-CoV-2

Biomedicines ◽

10.3390/biomedicines9070808 ◽

2021 ◽

Vol 9 (7) ◽

pp. 808

Author(s):

Laura Pérez-Lago ◽

Teresa Aldámiz-Echevarría ◽

Rita García-Martínez ◽

Leire Pérez-Latorre ◽

Marta Herranz ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Evolutionary Dynamics ◽

Viral Evolution ◽

Special Focus ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

New Variant ◽

History Of ◽

Evolution Dynamics ◽

The Uk

A successful Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) variant, B.1.1.7, has recently been reported in the UK, causing global alarm. Most likely, the new variant emerged in a persistently infected patient, justifying a special focus on these cases. Our aim in this study was to explore certain clinical profiles involving severe immunosuppression that may help explain the prolonged persistence of viable viruses. We present three severely immunosuppressed cases (A, B, and C) with a history of lymphoma and prolonged SARS-CoV-2 shedding (2, 4, and 6 months), two of whom finally died. Whole-genome sequencing of 9 and 10 specimens from Cases A and B revealed extensive within-patient acquisition of diversity, 12 and 28 new single nucleotide polymorphisms, respectively, which suggests ongoing SARS-CoV-2 replication. This diversity was not observed for Case C after analysing 5 sequential nasopharyngeal specimens and one plasma specimen, and was only observed in one bronchoaspirate specimen, although viral viability was still considered based on constant low Ct values throughout the disease and recovery of the virus in cell cultures. The acquired viral diversity in Cases A and B followed different dynamics. For Case A, new single nucleotide polymorphisms were quickly fixed (13–15 days) after emerging as minority variants, while for Case B, higher diversity was observed at a slower emergence-to-fixation pace (1–2 months). Slower SARS-CoV-2 evolutionary pace was observed for Case A following the administration of hyperimmune plasma. This work adds knowledge on SARS-CoV-2 prolonged shedding in severely immunocompromised patients and demonstrates viral viability, noteworthy acquired intra-patient diversity, and different SARS-CoV-2 evolutionary dynamics in persistent cases.


Stability of SARS-CoV-2 Phylogenies

10.1101/2020.06.08.141127 ◽

2020 ◽

Cited By ~ 3

Author(s):

Yatish Turakhia ◽

Bryan Thornlow ◽

Landen Gozashti ◽

Angie S. Hinrichs ◽

Jason D. Fernandes ◽

...

Keyword(s):

Binding Sites ◽

Sequence Data ◽

Scientific Discovery ◽

Lineage Tracing ◽

Protein Coding ◽

Sequencing Errors ◽

Scientific Inference ◽

Recurrent Mutations ◽

Sequence Quality ◽

Essential Sequence

Abstract The SARS-CoV-2 pandemic has led to unprecedented, nearly real-time genetic tracing due to the rapid community sequencing response. Researchers immediately leveraged these data to infer the evolutionary relationships among viral samples and to study key biological questions, including whether host viral genome editing and recombination are features of SARS-CoV-2 evolution. This global sequencing effort is inherently decentralized and must rely on data collected by many labs using a wide variety of molecular and bioinformatic techniques. There is thus a strong possibility that systematic errors associated with lab-specific practices affect some sequences in the repositories. We find that some recurrent mutations in reported SARS-CoV-2 genome sequences have been observed predominantly or exclusively by single labs, co-localize with commonly used primer binding sites and are more likely to affect the protein coding sequences than other similarly recurrent mutations. We show that their inclusion can affect phylogenetic inference on scales relevant to local lineage tracing, and make it appear as though there has been an excess of recurrent mutation and/or recombination among viral lineages. We suggest how samples can be screened and problematic mutations removed. We also develop tools for comparing and visualizing differences among phylogenies and we show that consistent clade- and tree-based comparisons can be made between phylogenies produced by different groups. These will facilitate evolutionary inferences and comparisons among phylogenies produced for a wide array of purposes. Building on the SARS-CoV-2 Genome Browser at UCSC, we present a toolkit to compare, analyze and combine SARS-CoV-2 phylogenies, find and remove potential sequencing errors and establish a widely shared, stable clade structure for a more accurate scientific inference and discourse.

Foreword We wish to thank all groups that responded rapidly by producing these invaluable and essential sequence data. Their contributions have enabled an unprecedented, lightning-fast process of scientific discovery, truly an incredible benefit for humanity and for the scientific community. We emphasize that most lab groups with whom we associate specific suspicious alleles are also those who have produced the most sequence data at a time when it was urgently needed. We commend their efforts. We have already contacted each group and many have updated their sequences. Our goal with this work is not to highlight potential errors, but to understand the impacts of these and other kinds of highly recurrent mutations so as to identify commonalities among the suspicious examples that can improve sequence quality and analysis going forward.
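
The abstract above describes recurrent mutations that are "observed predominantly or exclusively by single labs". One way such a screen could look is sketched below. It is not the authors' toolkit: the `(mutation, lab)` record format, the function name `flag_lab_specific`, the thresholds `min_count` and `min_share`, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def flag_lab_specific(observations, min_count=5, min_share=0.8):
    """Flag mutations whose observations come mostly from one submitting lab.

    observations: iterable of (mutation, lab) pairs (hypothetical format).
    min_count:    ignore mutations seen fewer than this many times (assumed).
    min_share:    minimum share of observations from the top lab (assumed).
    Returns {mutation: (top_lab, top_lab_count, total_count)}.
    """
    per_mutation = defaultdict(Counter)
    for mutation, lab in observations:
        per_mutation[mutation][lab] += 1

    flagged = {}
    for mutation, labs in per_mutation.items():
        total = sum(labs.values())
        top_lab, top_count = labs.most_common(1)[0]
        if total >= min_count and top_count / total >= min_share:
            flagged[mutation] = (top_lab, top_count, total)
    return flagged


# Toy data: mutation names and lab labels are invented for the example.
observations = (
    [("C100T", "lab_A")] * 9 + [("C100T", "lab_B")]
    + [("G200A", "lab_A")] * 3 + [("G200A", "lab_B")] * 3 + [("G200A", "lab_C")] * 4
    + [("A300G", "lab_C")] * 2
)
flagged = flag_lab_specific(observations)
print(flagged)
```

A flagged mutation is only a candidate for closer inspection, for example against primer binding sites as the abstract mentions. A high single-lab share can also reflect a genuine local lineage, so the thresholds here should not be read as a validated cutoff.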


Draft genome sequence of Solanum aethiopicum provides insights into disease resistance, drought tolerance, and the evolution of the genome

GigaScience ◽

10.1093/gigascience/giz115 ◽

2019 ◽

Vol 8 (10) ◽

Cited By ~ 3

Author(s):

Bo Song ◽

Yue Song ◽

Yuan Fu ◽

Elizabeth Balyejusa Kizito ◽

Sandra Ndagire Kamenya ◽

...

Keyword(s):

Disease Resistance ◽

Single Nucleotide Polymorphisms ◽

Drought Tolerance ◽

Resistance Genes ◽

Draft Genome ◽

Disease Resistance Genes ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Protein Coding Genes

Abstract Background The African eggplant (Solanum aethiopicum) is a nutritious traditional vegetable used in many African countries, including Uganda and Nigeria. It is thought to have been domesticated in Africa from its wild relative, Solanum anguivi. S. aethiopicum has been routinely used as a source of disease resistance genes for several Solanaceae crops, including Solanum melongena. A lack of genomic resources has meant that breeding of S. aethiopicum has lagged behind other vegetable crops. Results We assembled a 1.02-Gb draft genome of S. aethiopicum, which contained predominantly repetitive sequences (78.9%). We annotated 37,681 gene models, including 34,906 protein-coding genes. Expansion of disease resistance genes was observed via 2 rounds of amplification of long terminal repeat retrotransposons, which may have occurred ∼1.25 and 3.5 million years ago, respectively. By resequencing 65 S. aethiopicum and S. anguivi genotypes, 18,614,838 single-nucleotide polymorphisms were identified, of which 34,171 were located within disease resistance genes. Analysis of domestication and demographic history revealed active selection for genes involved in drought tolerance in both “Gilo” and “Shum” groups. A pan-genome of S. aethiopicum was assembled, containing 51,351 protein-coding genes; 7,069 of these genes were missing from the reference genome. Conclusions The genome sequence of S. aethiopicum enhances our understanding of its biotic and abiotic resistance. The single-nucleotide polymorphisms identified are immediately available for use by breeders. The information provided here will accelerate selection and breeding of the African eggplant, as well as other crops within the Solanaceae family.


Complete Genome Sequence of Yersinia pestis Strains Antiqua and Nepal516: Evidence of Gene Reduction in an Emerging Pathogen

Journal of Bacteriology ◽

10.1128/jb.00124-06 ◽

2006 ◽

Vol 188 (12) ◽

pp. 4453-4463 ◽

Cited By ~ 114

Author(s):

Patrick S. G. Chain ◽

Ping Hu ◽

Stephanie A. Malfatti ◽

Lyndsay Radnedge ◽

Frank Larimer ◽

...

Keyword(s):

Single Nucleotide Polymorphisms ◽

Yersinia Pestis ◽

Yersinia Pseudotuberculosis ◽

Open Reading Frames ◽

Genomic Diversity ◽

Avirulent Strain ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Definition Of

ABSTRACT Yersinia pestis, the causative agent of bubonic and pneumonic plagues, has undergone detailed study at the molecular level. To further investigate the genomic diversity among this group and to help characterize lineages of the plague organism that have no sequenced members, we present here the genomes of two isolates of the “classical” antiqua biovar, strains Antiqua and Nepal516. The genomes of Antiqua and Nepal516 are 4.7 Mb and 4.5 Mb and encode 4,138 and 3,956 open reading frames, respectively. Though both strains belong to one of the three classical biovars, they represent separate lineages defined by recent phylogenetic studies. We compare all five currently sequenced Y. pestis genomes and the corresponding features in Yersinia pseudotuberculosis. There are strain-specific rearrangements, insertions, deletions, single nucleotide polymorphisms, and a unique distribution of insertion sequences. We found 453 single nucleotide polymorphisms in protein-coding regions, which were used to assess the evolutionary relationships of these Y. pestis strains. Gene reduction analysis revealed that the gene deletion processes are under selective pressure, and many of the inactivations are probably related to the organism's interaction with its host environment. The results presented here clearly demonstrate the differences between the two biovar antiqua lineages and support the notion that grouping Y. pestis strains based strictly on the classical definition of biovars (predicated upon two biochemical assays) does not accurately reflect the phylogenetic relationships within this species. A comparison of four virulent Y. pestis strains with the human-avirulent strain 91001 provides further insight into the genetic basis of virulence to humans.


Complete overview of protein-inactivating sequence variations in 36 sequenced mouse inbred strains

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1706168114 ◽

2017 ◽

Vol 114 (34) ◽

pp. 9158-9163 ◽

Cited By ~ 14

Author(s):

Steven Timmermans ◽

Marc Van Montagu ◽

Claude Libert

Keyword(s):

Mouse Genome ◽

Inbred Strains ◽

Nucleotide Polymorphisms ◽

Genome Sequences ◽

Single Nucleotide ◽

Protein Coding ◽

Stop Codons ◽

Protein Coding Genes ◽

Sequence Variations ◽

Genetic Background Effects

Mouse inbred strains remain essential in science. We have analyzed the publicly available genome sequences of 36 popular inbred strains and provide lists for each strain of protein-coding genes that acquired sequence variations that cause premature STOP codons, loss of STOP codons and single nucleotide polymorphisms, and short in-frame insertions and deletions. Our data give an overview of predicted defective proteins, including predicted impact scores, of all these strains compared with the reference mouse genome of C57BL/6J. These data can also be retrieved via a searchable website (mousepost.be) and allow a global, better interpretation of genetic background effects and a source of naturally defective alleles in these 36 sequenced classical and high-priority mouse inbred strains.


Regulatory Variants and Disease: The E-Cadherin −160C/A SNP as an Example

Molecular Biology International ◽

10.1155/2014/967565 ◽

2014 ◽

Vol 2014 ◽

pp. 1-9 ◽

Cited By ~ 17

Author(s):

Gongcheng Li ◽

Tiejun Pan ◽

Dan Guo ◽

Long-Cheng Li

Keyword(s):

Association Studies ◽

Genome Wide Association Studies ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Protein Coding ◽

Regulate Gene Expression ◽

Regulatory Variants ◽

Genome Wide ◽

Regulatory Snps ◽

E Cadherin

Single nucleotide polymorphisms (SNPs) occurring in noncoding sequences have largely been ignored in genome-wide association studies (GWAS). Yet, mounting evidence suggests that many noncoding SNPs, especially those in the vicinity of protein-coding genes, play important roles in shaping chromatin structure and regulating gene expression and, as such, are implicated in a wide variety of diseases. One such regulatory SNP (rSNP) is the E-cadherin (CDH1) promoter −160C/A SNP (rs16260), which is known to affect E-cadherin promoter transcription by displacing transcription factor binding and has been extensively scrutinized for its association with several diseases, especially malignancies. Findings from studying this SNP highlight the clinical relevance of rSNPs and justify their inclusion in future GWAS to identify novel disease-causing SNPs.


Discovery of single‐nucleotide polymorphisms (SNPs) in the uncharacterized genome of the ascomycete Ophiognomonia clavigignenti‐juglandacearum from 454 sequence data

Molecular Ecology Resources ◽

10.1111/j.1755-0998.2011.02998.x ◽

2011 ◽

Vol 11 (4) ◽

pp. 693-702 ◽

Cited By ~ 16

Author(s):

K. D. BRODERS ◽

K. E. WOESTE ◽

P. J. SanMIGUEL ◽

R. P. WESTERMAN ◽

G. J. BOLAND

Keyword(s):

Single Nucleotide Polymorphisms ◽

Sequence Data ◽

Nucleotide Polymorphisms ◽

Single Nucleotide


Limited genetic diversity of blaCMY-2-containing IncI1-pST12 plasmids from Enterobacteriaceae of human and broiler chicken origin in the Netherlands

10.1101/2020.07.09.195461 ◽

2020 ◽

Author(s):

Evert den Drijver ◽

Joep J.J.M. Stohr ◽

Jaco J. Verweij ◽

Carlo Verhulst ◽

Francisca C. Velkers ◽

...

Keyword(s):

Escherichia Coli ◽

Sequence Data ◽

Sequence Similarity ◽

Nucleotide Polymorphisms ◽

Single Nucleotide ◽

Pan Genome ◽

Long Read ◽

Short Read Sequence ◽

High Degree ◽

Limited Genetic Diversity

Abstract Distinguishing epidemiologically related and unrelated plasmids is essential to confirm plasmid transmission. We compared IncI1-pST12 plasmids of both human and livestock origin and explored the degree of sequence similarity between plasmids from Enterobacteriaceae with different epidemiological links. Short-read sequence data of Enterobacteriaceae cultured from humans and broilers were screened for the presence of both a blaCMY-2 gene and an IncI1-pST12 replicon. Isolates were long-read sequenced on a MinION sequencer (Oxford Nanopore Technologies). After plasmid reconstruction using hybrid assembly, pairwise single nucleotide polymorphisms (SNPs) were determined. The plasmids were annotated, and a pan-genome was constructed to compare genes variably present between the different plasmids. Nine Escherichia coli sequences of broiler origin, four Escherichia coli sequences and one Salmonella enterica sequence of human origin were selected for the current analysis. A circular contig with the IncI1-pST12 replicon and blaCMY-2 gene was extracted from the assembly graph of all fourteen isolates. Analysis of the IncI1-pST12 plasmids revealed a low number of SNP differences (range of 0-9 SNPs). The range of SNP differences overlapped in isolates with different epidemiological links. One hundred and twelve of a total of 113 genes of the pan-genome were present in all plasmid constructs. NGS analysis of blaCMY-2-containing IncI1-pST12 plasmids isolated from Enterobacteriaceae with different epidemiological links shows a high degree of sequence similarity in terms of SNP differences and the number of shared genes. Therefore, statements on the horizontal transfer of these plasmids based on genetic identity should be made with caution.
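
The abstract above reports a range of pairwise SNP differences between plasmids but does not say how they were computed. A minimal sketch of one common approach follows: counting mismatched positions between already-aligned sequences. The function names, the choice to skip gaps and ambiguous bases, and the toy sequences are assumptions for illustration only.

```python
from itertools import combinations

# Only unambiguous bases are compared; gaps ("-") and "N" are skipped (assumption).
VALID_BASES = set("ACGT")


def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned sequences carry different bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in VALID_BASES and b in VALID_BASES and a != b
    )


def pairwise_snp_range(aligned: dict):
    """Return all pairwise distances plus their minimum and maximum.

    Requires at least two sequences in `aligned`.
    """
    distances = {
        (name_1, name_2): snp_distance(aligned[name_1], aligned[name_2])
        for name_1, name_2 in combinations(sorted(aligned), 2)
    }
    return distances, min(distances.values()), max(distances.values())


# Toy alignment; names and sequences are invented for the example.
aligned = {
    "plasmid_1": "ACGTACGTAC",
    "plasmid_2": "ACGTACGTAC",
    "plasmid_3": "ACGAACGTNC",
    "plasmid_4": "ACGAAC-TAT",
}
distances, lowest, highest = pairwise_snp_range(aligned)
print(f"SNP differences range from {lowest} to {highest}")
```

Comparing the distance ranges between epidemiologically linked and unlinked pairs is then a matter of grouping the keys of `distances` by their known link status.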
