Comparative genomics suggests limited variability and similar evolutionary patterns between major clades of SARS-CoV-2

Author(s):  
Matteo Chiara ◽  
David S. Horner ◽  
Carmela Gissi ◽  
Graziano Pesole

Abstract Phylogenomic analysis of SARS-CoV-2 genomes available from public repositories suggests the presence of 3 prevalent groups of viral episomes (super-clades), which are mostly associated with outbreaks in distinct geographic locations (China, USA and Europe). While levels of genomic variability between SARS-CoV-2 isolates are limited, to our knowledge, it is not clear whether the observed patterns of variability in viral super-clades reflect ongoing adaptation of SARS-CoV-2, or merely genetic drift and founder effects. Here, we analyze more than 1100 complete, high-quality SARS-CoV-2 genome sequences, and provide evidence for the absence of distinct evolutionary patterns/signatures in the genomes of the currently known major clades of SARS-CoV-2. Our analyses suggest that the presence of distinct viral episomes at different geographic locations is consistent with founder effects, coupled with the rapid spread of this novel virus. We observe that while cross-species adaptation of the virus is associated with hypervariability of specific protein coding regions (including the RBD domain of the spike protein), the more variable genomic regions between extant SARS-CoV-2 episomes correspond to the 3’ and 5’ UTRs, suggesting that at present viral protein coding genes should not be subjected to different adaptive evolutionary pressures in different viral strains. Although this study cannot be conclusive, we believe that the evidence presented here is strongly consistent with the notion that the biased geographic distribution of SARS-CoV-2 isolates should not be associated with adaptive evolution of this novel pathogen.

Zootaxa ◽  
2020 ◽  
Vol 4748 (1) ◽  
pp. 182-194 ◽  
Author(s):  
JING ZHANG ◽  
ERNST BROCKMANN ◽  
QIAN CONG ◽  
JINHUI SHEN ◽  
NICK V. GRISHIN

We obtained whole genome shotgun sequences and phylogenetically analyzed protein-coding regions of representative skipper butterflies from the genus Carcharodus Hübner, [1819] and its close relatives. Type species of all available genus-group names were sequenced. We find that species attributed to four exclusively Old World genera (Spialia Swinhoe, 1912, Gomalia Moore, 1879, Carcharodus Hübner, [1819] and Muschampia Tutt, 1906) form a monophyletic group that we call a subtribe Carcharodina Verity, 1940. In the phylogenetic trees built from various genomic regions, these species form 7 (not 4) groups that we treat as genera. We find that Muschampia Tutt, 1906 is not monophyletic, and the 5th group is formed by currently monotypic genus Favria Tutt, 1906 new status (type species Hesperia cribrellum Eversmann, 1841), which is sister to Gomalia. The 6th and 7th groups are composed of mostly African species presently placed in Spialia. These groups do not have names and are described here as Ernsta Grishin, gen. n. (type species Pyrgus colotes Druce, 1875) and Agyllia Grishin, gen. n. (type species Pyrgus agylla Trimen, 1889). Two subgroups are recognized in Ernsta: the nominal subgenus and a new one: Delaga Grishin, subgen. n. (type species Pyrgus delagoae Trimen, 1898). Next, we observe that Carcharodus is not monophyletic, and species formerly placed in subgenera Reverdinus Ragusa, 1919 and Lavatheria Verity, 1940 are here transferred to Muschampia. Furthermore, due to differences in male genitalia or DNA sequences, we reinstate Gomalia albofasciata Moore, 1879 and Gomalia jeanneli (Picard, 1949) as species, not subspecies or synonyms of Gomalia elma (Trimen, 1862), and Spialia bifida (Higgins, 1924) as a species, not subspecies of Spialia zebra (Butler, 1888). 
Sequencing of the type specimens reveals a 2.2-3.2% difference in COI barcodes; combined with wing pattern differences, this evidence suggests a new status of species for Spialia lugens (Staudinger, 1886) and Spialia carnea (Reverdin, 1927), formerly subspecies of Spialia orbifer (Hübner, [1823]).


Entropy ◽  
2021 ◽  
Vol 23 (10) ◽  
pp. 1324
Author(s):  
Garin Newcomb ◽  
Khalid Sayood

One of the important steps in the annotation of genomes is the identification of regions in the genome which code for proteins. Most annotation approaches rely on signals extracted from a genomic region to decide whether that region is protein coding. Motivated by the fact that these regions are information-bearing structures, we propose signals based on measures motivated by the average mutual information for use in this task. We show that these signals can be used to identify coding and noncoding sequences with high accuracy. We also show that these signals are robust across species, phyla, and kingdoms and can, therefore, be used in species-agnostic genome annotation algorithms for identifying protein coding regions. These in turn could be used for gene identification.
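As an illustration of the quantity this abstract refers to, the following is a minimal sketch of the textbook average mutual information (AMI) between bases separated by a fixed lag. It is not the authors' signal, which the abstract describes only as "motivated by" AMI; the function name `average_mutual_information` and the `lag` parameter are assumptions made for this example.

```python
import math
from collections import Counter


def average_mutual_information(seq: str, lag: int) -> float:
    """Return the AMI (in bits) between bases `lag` positions apart.

    Hypothetical helper for illustration only; probabilities are
    estimated from raw pair counts with no pseudocounts.
    """
    seq = seq.upper()
    # All (base at i, base at i + lag) pairs along the sequence.
    pairs = [(seq[i], seq[i + lag]) for i in range(len(seq) - lag)]
    if not pairs:
        return 0.0
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)    # marginal of the first base
    right = Counter(b for _, b in pairs)   # marginal of the second base
    ami = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) written with counts: count * n / (left * right)
        ami += p_ab * math.log2(count * n / (left[a] * right[b]))
    return ami
```

Coding regions often show a period-3 structure from codon usage, so comparing values at lags such as 3, 6 and 9 with neighbouring lags is one plausible way to turn this quantity into a signal; whether the paper does so is not stated in the abstract.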


Viruses ◽  
2021 ◽  
Vol 13 (8) ◽  
pp. 1592
Author(s):  
Enikő Fehér ◽  
Szilvia Jakab ◽  
Krisztina Bali ◽  
Eszter Kaszab ◽  
Borbála Nagy ◽  
...  

Duck hepatitis A virus (DHAV), an avian picornavirus, causes high-mortality acute disease in ducklings. Among the three serotypes, DHAV-1 is globally distributed, whereas DHAV-2 and DHAV-3 serotypes are chiefly restricted to Southeast Asia. In this study, we analyzed the genomic evolution of DHAV-1 strains using extant GenBank records and genomic sequences of 10 DHAV-1 strains originating from a large disease outbreak in 2004–2005, in Hungary. Recombination analysis revealed intragenotype recombination within DHAV-1 as well as intergenotype recombination events involving DHAV-1 and DHAV-3 strains. The intergenotype recombination occurred in the VP0 region. Diversifying selection seems to act at sites of certain genomic regions. Calculations estimated slightly lower rates of evolution of DHAV-1 (mean rates for individual protein coding regions, 5.6286 × 10⁻⁴ to 1.1147 × 10⁻³ substitutions per site per year) compared to other picornaviruses. The observed evolutionary mechanisms indicate that whole-genome-based analysis of DHAV strains is needed to better understand the emergence of novel strains and their geographical dispersal.


2019 ◽  
Author(s):  
Chaitanya Erady ◽  
David Chong ◽  
Narendra Meena ◽  
Shraddha Puntambekar ◽  
Ruchi Chauhan ◽  
...  

Abstract Translation products encoded by noncanonical or novel open reading frame (ORF) genomic regions are generally considered too small to play any significant biological role, and dismissed as inconsequential. In this study, we show that mutations mapping to novel ORFs have significantly higher pathogenicity scores than mutations in protein-coding regions. Importantly, novel ORFs can translate into protein-like structures with putative independent biological functions that can be of relevance in disease states, including cancer. We thus provide strong evidence to support the systematic study of novel ORFs to gain new insights into normal biological and disease processes.
One Sentence Summary: Non-coding regions may encode protein-like products that are important to understand diseases.


2021 ◽  
Vol 17 (7) ◽  
pp. e1009147
Author(s):  
Lukasz Jaroszewski ◽  
Mallika Iyer ◽  
Arghavan Alisoltani ◽  
Mayya Sedova ◽  
Adam Godzik

The unprecedented pace of the sequencing of the SARS-CoV-2 virus genomes provides us with unique information about the genetic changes in a single pathogen during an ongoing pandemic. By the analysis of close to 200,000 genomes we show that the patterns of the SARS-CoV-2 virus mutations along its genome are closely correlated with the structural and functional features of the encoded proteins. Requirements of foldability of proteins’ 3D structures and the conservation of their key functional regions, such as protein-protein interaction interfaces, are the dominant factors driving evolutionary selection in protein-coding genes. At the same time, avoidance of the host immunity leads to the abundance of mutations in other regions, resulting in high variability of the missense mutation rate along the genome. “Unexplained” peaks and valleys in the mutation rate provide hints on function for as yet uncharacterized genomic regions and the specific protein structural and functional features they code for. Some of these observations have immediate practical implications for the selection of target regions for PCR-based COVID-19 tests and for evaluating the risk of mutations in epitopes targeted by specific antibodies and vaccine design strategies.


2021 ◽  
Vol 33 (2) ◽  
pp. 157-165
Author(s):  
Xuanzong Guo ◽  
Uwe Ohler ◽  
Ferah Yildirim

Abstract Genetic variants associated with human diseases are often located outside the protein coding regions of the genome. Identification and functional characterization of the regulatory elements in the non-coding genome is therefore of crucial importance for understanding the consequences of genetic variation and the mechanisms of disease. The past decade has seen rapid progress in high-throughput analysis and mapping of chromatin accessibility, looping, structure, and occupancy by transcription factors, as well as epigenetic modifications, all of which contribute to the proper execution of regulatory functions in the non-coding genome. Here, we review the current technologies for the definition and functional validation of non-coding regulatory regions in the genome.


2004 ◽  
Vol 78 (12) ◽  
pp. 6666-6675 ◽  
Author(s):  
Han-Xin Lin ◽  
Luis Rubio ◽  
Ashleigh B. Smythe ◽  
Bryce W. Falk

ABSTRACT The structure and genetic diversity of a California Cucumber mosaic virus (CMV) population was assessed by single-strand conformation polymorphism and nucleotide sequence analyses of genomic regions 2b, CP, MP, and the 3′ nontranslated region of RNA3. The California CMV population exhibited low genetic diversity and was composed of one to three predominant haplotypes and a large number of minor haplotypes for specific genomic regions. Extremely low diversity and close evolutionary relationships among isolates in a subpopulation suggested that founder effects might play a role in shaping the genetic structure. Phylogenetic analysis indicated a naturally occurring reassortant between subgroup IA and IB isolates and potential reassortants between subgroup IA isolates, suggesting that genetic exchange by reassortment contributed to the evolution of the California CMV population. Analysis of various population genetics parameters and distribution of synonymous and nonsynonymous mutations revealed that different coding regions and even different parts of coding regions were under different evolutionary constraints, including a short region of the 2b gene for which evidence suggests possible positive selection.


2013 ◽  
Vol 94 (10) ◽  
pp. 2360-2365 ◽  
Author(s):  
Go Atsumi ◽  
Reiko Tomita ◽  
Kappei Kobayashi ◽  
Ken-Taro Sekine

Gentian Kobu-sho-associated virus (GKaV) is a recently discovered novel virus from Kobu-sho (a hyperplastic or tumorous disorder)-affected Japanese gentians. To obtain insight into GKaV transmission and pathogenesis, the genetic diversity of the virus in the putative helicase and RNA-dependent RNA polymerase coding regions was studied. The extent of GKaV sequence diversity within single host plants differed within samples and between viral genomic regions. Phylogenetic analysis of 30 Kobu-sho-affected samples from different production areas and host cultivars revealed that GKaV populations have diverged as they became prevalent in different geographical regions. The diversification of GKaV was shown to be driven by geographical isolation rather than host adaptation; however, no geographical patterns were found. Therefore, it was not feasible to trace the pathway of GKaV spread.


2016 ◽  
Author(s):  
Florian Massip ◽  
Michael Sheinman ◽  
Sophie Schbath ◽  
Peter F. Arndt

For several decades, sequence alignment has been a widely used tool in bioinformatics. For instance, finding homologous sequences with known function in large databases is used to gain insight into the function of non-annotated genomic regions. Very efficient tools, such as BLAST, have been developed to identify and rank possible homologous sequences. To estimate the significance of the homology, the ranking of alignment scores takes a background model for random sequences into account. Using this model, one can estimate the probability of finding two exactly matching subsequences by chance in two unrelated sequences. The corresponding probability for two homologous sequences is much higher, which allows them to be identified. Here we focus on the distribution of lengths of exact sequence matches between protein-coding regions of pairs of evolutionarily distant genomes. We show that this distribution exhibits a power-law tail with exponent α = −5. Developing a simple model of sequence evolution by substitutions and segmental duplications, we show analytically that paralogous and orthologous gene pairs contribute differently to this distribution. Our model explains the differences observed in the comparison of coding and non-coding parts of genomes, thus providing a better understanding of the statistical properties of genomic sequences and their evolution.
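The match-length distribution discussed in this abstract can be made concrete with a small sketch. The code below counts maximal exact matches (runs of identical bases that cannot be extended in either direction) between two sequences by scanning every alignment diagonal. It is a brute-force illustration under stated assumptions: the function name `exact_match_lengths` and the `min_len` parameter are hypothetical, the abstract does not describe the authors' actual procedure, and the quadratic running time makes this suitable only for short sequences.

```python
from collections import Counter


def exact_match_lengths(a: str, b: str, min_len: int = 1) -> Counter:
    """Histogram of maximal exact match lengths between `a` and `b`.

    Hypothetical helper for illustration: each diagonal (fixed offset
    j - i) is scanned once and every maximal run of equal characters
    of length >= min_len is counted.
    """
    counts = Counter()
    n, m = len(a), len(b)
    for offset in range(-(n - 1), m):
        i = max(0, -offset)   # start position in a
        j = i + offset        # matching start position in b
        run = 0
        while i < n and j < m:
            if a[i] == b[j]:
                run += 1
            else:
                if run >= min_len:
                    counts[run] += 1
                run = 0
            i += 1
            j += 1
        # Close a run that reaches the end of the diagonal.
        if run >= min_len:
            counts[run] += 1
    return counts
```

Plotting the resulting counts against match length on log-log axes is one way to inspect the tail of the distribution; genome-scale comparisons would need an index-based method rather than this direct scan.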


Author(s):  
Debaleena Bhowmik ◽  
Sourav Pal ◽  
Abhishake Lahiri ◽  
Arindam Talukdar ◽  
Sandip Paul

Abstract This study explores the divergence pattern of SARS-CoV-2 using whole genome sequences of the isolates from various COVID-19 affected countries. The phylogenomic analysis indicates the presence of at least four distinct groups of the SARS-CoV-2 genomes. The emergent groups have been found to be associated with signature structural changes in specific proteins. Also, this study reveals the differential levels of divergence patterns for the protein coding regions. Moreover, we have predicted the impact of structural changes on a couple of important viral proteins via structural modelling techniques. This study further advocates for more viral genetic studies with associated clinical outcomes and hosts’ response for better understanding of SARS-CoV-2 pathogenesis enabling better mitigation of this pandemic situation.

