Stepwise evolution and exceptional conservation of ORF1a/b overlap in coronaviruses

2021 ◽  
Author(s):  
Han Mei ◽  
Anton Nekrutenko

The programmed frameshift element (PFE) rerouting translation from ORF1a to ORF1b is essential for propagation of coronaviruses. The combination of genomic features that make up the PFE (the overlap between the two reading frames, a slippery sequence, and an ensemble of complex secondary structure elements) puts severe constraints on this region, as most possible nucleotide substitutions may disrupt one or more of these elements. The vast amount of SARS-CoV-2 sequencing data generated within the past year provides an opportunity to assess the evolutionary dynamics of the PFE in great detail. Here we performed a comparative analysis of all coronaviral genomic data available to date. We show that the overlap between ORF1a and ORF1b evolved as a set of discrete 7-, 16-, 22-, 25-, and 31-nucleotide stretches with a well-defined phylogenetic specificity. We further examined sequencing data from over 350,000 complete genomes and 55,000 raw read datasets to demonstrate exceptional conservation of the PFE region.
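To make the "slippery sequence" component concrete, here is a minimal Python sketch that scans a nucleotide string for heptamers of the form X XXY YYZ, the motif commonly described for -1 programmed ribosomal frameshifting (for example, UUUAAAC in SARS-CoV-2). The motif definition, the function name `find_slippery_sites`, and the toy sequence are illustrative assumptions, not taken from the abstract, and the sketch ignores the reading-frame overlap and secondary structure elements entirely.

```python
import re

# Assumed canonical -1 frameshift heptamer X XXY YYZ:
#   XXX = any three identical bases, YYY = AAA or TTT, Z = A, C or T.
# The lookahead lets overlapping candidates be reported as well.
SLIPPERY = re.compile(r"(?=(([ACGT])\2\2([AT])\3\3[ACT]))")


def find_slippery_sites(seq):
    """Return (0-based start, heptamer) for each candidate slippery site."""
    # Normalise case and treat RNA (U) as DNA (T).
    seq = seq.upper().replace("U", "T")
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(seq)]


# Toy example (hypothetical sequence, not real genomic data).
sites = find_slippery_sites("GGCTTTAAACGG")
print(sites)
```

A pattern match alone only flags candidates; whether a site actually promotes frameshifting depends on the downstream structure and the frame it sits in.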

Author(s):  
Han Mei ◽  
Sergei Kosakovsky Pond ◽  
Anton Nekrutenko

Abstract The programmed frameshift element (PFE) rerouting translation from ORF1a to ORF1b is essential for propagation of coronaviruses. The overlap between the two reading frames, a slippery sequence, and an ensemble of secondary structure elements place severe constraints on this region, as most possible nucleotide substitutions may disrupt one or more of these features. Here we performed a comparative analysis of all coronaviral genomic data available to date to demonstrate exceptional conservation and detect signatures of selection within the PFE region.


2019 ◽  
Author(s):  
Kate Chkhaidze ◽  
Timon Heide ◽  
Benjamin Werner ◽  
Marc J. Williams ◽  
Weini Huang ◽  
...  

Abstract Quantification of the effect of spatial tumour sampling on the patterns of mutations detected in next-generation sequencing data is largely lacking. Here we use a spatial stochastic cellular automaton model of tumour growth that accounts for somatic mutations, selection, drift and spatial constraints to simulate multi-region sequencing data derived from spatial sampling of a neoplasm. We show that the spatial structure of a solid cancer has a major impact on the detection of clonal selection and genetic drift from bulk sequencing data and single-cell sequencing data. Our results indicate that spatial constraints can introduce significant sampling biases when performing multi-region bulk sampling and that such bias becomes a major confounding factor for the measurement of the evolutionary dynamics of human tumours. We present a statistical inference framework that takes into account the spatial effects of a growing tumour and allows inferring the evolutionary dynamics from patient genomic data. Our analysis shows that measuring cancer evolution using next-generation sequencing while accounting for the numerous confounding factors requires a mechanistic model-based approach that captures the sources of noise in the data.
Summary Sequencing the DNA of cancer cells from human tumours has become one of the main tools to study cancer biology. However, sequencing data are complex and often difficult to interpret. In particular, the way in which the tissue is sampled and the data are collected significantly impacts the interpretation of the results. We argue that understanding cancer genomic data requires mathematical models and computer simulations that tell us what we expect the data to look like, with the aim of understanding the impact of confounding factors and biases in the data generation step. In this study, we develop a spatial simulation of tumour growth that also simulates the data generation process, and demonstrate that biases in the sampling step and current technological limitations severely impact the interpretation of the results. We then provide a statistical framework that can be used to overcome these biases and more robustly measure aspects of the biology of tumours from the data.


2015 ◽  
Author(s):  
Andrea Sottoriva ◽  
Trevor Graham

Despite extraordinary efforts to profile cancer genomes on a large scale, interpreting the vast amount of genomic data in the light of cancer evolution and in a clinically relevant manner remains challenging. Here we demonstrate that cancer next-generation sequencing data is dominated by the signature of growth governed by a power-law distribution of mutant allele frequencies. The power-law signature is common to multiple tumor types and is a consequence of the effectively-neutral evolutionary dynamics that underpin the evolution of a large proportion of cancers, giving rise to the abundance of mutations responsible for intra-tumor heterogeneity. Importantly, the law allows the measurement, in each individual cancer, of the in vivo mutation rate and the timing of mutations with remarkable precision. This result provides a new way to interpret cancer genomic data by considering the physics of tumor growth in a way that is both patient-specific and clinically relevant.


2021 ◽  
Vol 54 (1) ◽  
pp. 1-22
Author(s):  
Rayan Chikhi ◽  
Jan Holub ◽  
Paul Medvedev

The analysis of biological sequencing data has been one of the biggest applications of string algorithms. The approaches used in many such applications are based on the analysis of k-mers, which are short fixed-length strings present in a dataset. While these approaches are rather diverse, storing and querying a k-mer set has emerged as a shared underlying component. A set of k-mers has unique features and applications that, over the past 10 years, have resulted in many specialized approaches for its representation. In this survey, we give a unified presentation and comparison of the data structures that have been proposed to store and query a k-mer set. We hope this survey will serve as a resource for researchers in the field as well as make the area more accessible to researchers outside the field.
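As a point of reference for what "storing and querying a k-mer set" means, the following Python sketch builds the set with a plain built-in hash set and answers membership queries. This is only a naive baseline under assumed names (`kmers`, `build_kmer_set`) and toy reads; it is not one of the specialized data structures the survey compares.

```python
def kmers(seq, k):
    """Yield every length-k substring (k-mer) of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_kmer_set(reads, k):
    """Collect the distinct k-mers occurring in any of the reads."""
    kmer_set = set()
    for read in reads:
        kmer_set.update(kmers(read, k))
    return kmer_set


# Toy reads (hypothetical data).
reads = ["ACGTAC", "GTACGG"]
index = build_kmer_set(reads, 3)

# Membership query: is this k-mer present anywhere in the dataset?
print("CGG" in index, "AAA" in index)
```

A hash set stores every k-mer explicitly, so its memory use grows with the number of distinct k-mers; that cost is the usual motivation for more compact representations.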


2020 ◽  
Vol 367 (11) ◽  
Author(s):  
Andrea Fasolo ◽  
Laura Treu ◽  
Piergiorgio Stevanato ◽  
Giuseppe Concheri ◽  
Stefano Campanaro ◽  
...  

ABSTRACT Microbial metabarcoding is the standard approach to assess the diversity of communities. However, reports are often limited to simple OTU abundances for each phylum, giving rather one-dimensional views of microbial assemblages and overlooking other accessible aspects. The first is masked by database incompleteness: OTU picking involves clustering at 97% (near-species) sequence identity, but different OTUs regularly end up under the same taxon name. When diversity is expressed as the number of taxonomic names obtained, a large portion of the real diversity lying within the data remains underestimated. Using the 16S sequencing results of an environmental transect across a gradient of 17 coastal habitats, we first extracted the number of OTUs hidden under the same name. Further, we observed the deepest rank yielded by annotation, revealing the microbial groups for which most knowledge is missing. Data were then used to infer an evolutionary aspect: in each phylum, what is the success of the present-time individuals (abundances for each OTU) in relation to their prior evolutionary success in differentiation (number of OTUs)? This information reveals whether the past speciation/diversification force is matched by the present competitiveness in reproduction/persistence. The final layer explored is functional diversity, i.e. abundances of groups involved in specific environmental processes.
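The idea of counting the OTUs "hidden" under one taxon name, and setting that count against present-day abundance, can be sketched in a few lines of Python. The table layout `(otu_id, taxon_name, read_count)`, the function name `summarize_by_taxon`, and the example rows are hypothetical; the abstract does not describe the authors' actual pipeline or file formats.

```python
from collections import defaultdict


def summarize_by_taxon(otu_table):
    """Group OTUs by assigned taxon name.

    otu_table: iterable of (otu_id, taxon_name, read_count) tuples
    (an assumed layout). Returns {taxon: {"n_otus": ..., "reads": ...}}.
    """
    summary = defaultdict(lambda: {"n_otus": 0, "reads": 0})
    for _otu_id, taxon, count in otu_table:
        summary[taxon]["n_otus"] += 1    # OTUs sharing this name
        summary[taxon]["reads"] += count  # total abundance under this name
    return dict(summary)


# Hypothetical example rows.
table = [
    ("OTU_1", "Proteobacteria", 120),
    ("OTU_2", "Proteobacteria", 30),
    ("OTU_3", "Firmicutes", 50),
]
for taxon, stats in summarize_by_taxon(table).items():
    # Mean reads per OTU relates present abundance to number of OTUs.
    print(taxon, stats["n_otus"], stats["reads"] / stats["n_otus"])
```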


PeerJ ◽  
2018 ◽  
Vol 6 ◽  
pp. e4840 ◽  
Author(s):  
Kai Wei ◽  
Tingting Zhang ◽  
Lei Ma

Housekeeping genes are ubiquitously expressed and maintain basic cellular functions across tissue/cell type conditions. The present study aimed to develop a set of pig housekeeping genes and compare the structure, evolution and function of housekeeping genes in the human–pig lineage. By using RNA sequencing data, we identified 3,136 pig housekeeping genes. Compared with human housekeeping genes, we found that pig housekeeping genes were longer and subjected to slightly weaker purifying selection pressure and faster neutral evolution. Common housekeeping genes, shared by the two species, are under stronger purifying selection than species-specific genes. However, pig- and human-specific housekeeping genes have similar functions. Some species-specific housekeeping genes have evolved independently to form similar protein active sites or structure, such as the classical catalytic serine–histidine–aspartate triad, implying that they have converged for maintaining the basic cellular function, which allows them to adapt to the environment. Human and pig housekeeping genes have varied structures and gene lists, but they have converged to maintain basic cellular functions essential for the existence of a cell, regardless of its specific role in the species. The results of our study shed light on the evolutionary dynamics of housekeeping genes.


2020 ◽  
Author(s):  
Sungsik Kong ◽  
Laura S. Kubatko

Abstract Interspecific hybridization is an important evolutionary phenomenon that generates genetic variability in a population and fosters species diversity in nature. The availability of large genome scale datasets has revolutionized hybridization studies to shift from the examination of the presence or absence of hybrids in nature to the investigation of the genomic constitution of hybrids and their genome-specific evolutionary dynamics. Although a handful of methods have been proposed in an attempt to identify hybrids, accurate detection of hybridization from genomic data remains a challenging task. The available methods can be classified broadly as site pattern frequency based and population genetic clustering approaches, though the performance of the two classes of methods under different hybridization scenarios has not been extensively examined. Here, we use simulated data to comparatively evaluate the performance of four tools that are commonly used to infer hybridization events: the site pattern frequency based methods HyDe and the D-statistic (i.e., the ABBA-BABA test), and the population clustering approaches structure and ADMIXTURE. We consider single hybridization scenarios that vary in the time of hybridization and the amount of incomplete lineage sorting (ILS) for different proportions of parental contributions (γ); introgressive hybridization; multiple hybridization scenarios; and a mixture of ancestral and recent hybridization scenarios. We focus on the statistical power to detect hybridization, the false discovery rate (FDR) for the D-statistic and HyDe, and the accuracy of the estimates of γ as measured by the mean squared error for HyDe, structure, and ADMIXTURE. Both HyDe and the D-statistic demonstrate a high level of detection power in all scenarios except those with high ILS, although the D-statistic often has an unacceptably high FDR. 
The estimates of γ in HyDe are impressively robust and accurate whereas structure and ADMIXTURE sometimes fail to identify hybrids, particularly when the proportional parental contributions are asymmetric (i.e., when γ is close to 0). Moreover, the posterior distribution estimated using structure exhibits multimodality in many scenarios, making interpretation difficult. Our results provide guidance in selecting appropriate methods for identifying hybrid populations from genomic data.
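For readers unfamiliar with the ABBA-BABA test mentioned above, the Python sketch below computes the D-statistic in its basic form, D = (nABBA - nBABA) / (nABBA + nBABA), from one aligned sequence per taxon (P1, P2, P3, outgroup). The function names and toy alignment are assumptions for illustration; the sketch uses single sequences rather than population allele frequencies and omits any significance test, so it is not the procedure evaluated in the study.

```python
def count_abba_baba(p1, p2, p3, outgroup):
    """Count ABBA and BABA site patterns in four aligned sequences.

    The outgroup allele is treated as ancestral (A), the other as derived (B).
    """
    abba = baba = 0
    for a, b, c, o in zip(p1, p2, p3, outgroup):
        if any(x not in "ACGT" for x in (a, b, c, o)):
            continue  # skip gaps and ambiguous bases
        if a == o and b == c and b != o:
            abba += 1  # P1 ancestral; P2 and P3 share the derived allele
        elif b == o and a == c and a != o:
            baba += 1  # P2 ancestral; P1 and P3 share the derived allele
    return abba, baba


def d_statistic(p1, p2, p3, outgroup):
    """Return D, or None when there are no informative sites."""
    abba, baba = count_abba_baba(p1, p2, p3, outgroup)
    total = abba + baba
    return (abba - baba) / total if total else None


# Toy alignment (hypothetical data): two ABBA sites and one BABA site.
print(d_statistic("AAGAA", "GAAAT", "GAGAT", "AAAAA"))
```

An excess of one pattern over the other (D away from 0) is what the test reads as a signal of gene flow; equal counts are what incomplete lineage sorting alone is expected to produce.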


Blood ◽  
2020 ◽  
Vol 136 (Supplement 1) ◽  
pp. 37-37
Author(s):  
Kimberly Skead ◽  
Armande Ang Houle ◽  
Sagi Abelson ◽  
Marie-Julie Fave ◽  
Boxi Lin ◽  
...  

The age-associated accumulation of somatic mutations and large-scale structural variants (SVs) in the early hematopoietic hierarchy has been linked to premalignant stages for cancer and cardiovascular disease (CVD). However, only a small proportion of individuals harboring these mutations progress to disease, and the mechanisms driving the transformation to malignancy remain unclear. Hematopoietic evolution, and cancer evolution more broadly, has largely been studied through a lens of adaptive evolution, and the contribution of functionally neutral or mildly damaging mutations to early disease-associated clonal expansions has not been well characterised despite these mutations comprising the majority of the mutational burden in healthy or tumoural tissues. Through combining deep learning with population genetics, we interrogate the hematopoietic system to capture signatures of selection acting in healthy and pre-cancerous blood populations. Here, we leverage high-coverage sequencing data from healthy and pre-cancerous individuals from the European Prospective Investigation into Cancer and Nutrition Study (n=477) and dense genotyping from the Canadian Partnership for Tomorrow's Health (n=5,000) to show that blood rejects the paradigm of strictly adaptive or neutral evolution and is subject to pervasive negative selection. We observe clear age associations across hematopoietic populations and the dominant class of selection driving evolutionary dynamics acting at an individual level. We find that both the location and ratio of passenger to driver mutations are critical in determining if positive selection acting on driver mutations is able to overwhelm regulated hematopoiesis and allow clones harbouring disease-predisposing mutations to rise to dominance. 
Certain genes are enriched for passenger mutations in healthy individuals fitting purifying models of evolution, suggesting that the presence of passenger mutations in a subset of genes might confer a protective role against disease-predisposing clonal expansions. Finally, we find that the density of gene disruption events with known pathogenic associations in somatic SVs impacts the frequency at which the SV segregates in the population, with variants displaying higher gene disruption density segregating at lower frequencies. Understanding how blood evolves towards malignancy will allow us to capture cancer in its earliest stages and identify events initiating departures from healthy blood evolution. Further, as the majority of mutations are passengers, studying their contribution to tumorigenesis will unveil novel therapeutic targets, thus enabling us to better understand patterns of clonal evolution in order to diagnose and treat disease in its infancy. Disclosures Dick: Bristol-Myers Squibb/Celgene: Research Funding.


2021 ◽  
Author(s):  
Juan F Cornejo-Franco ◽  
Francisco Flores ◽  
Dimitre Mollov ◽  
Diego Fernando Quito-Avila

Abstract The complete sequence of a new viral RNA from babaco (Vasconcellea x heilbornii) was determined. The genome consisted of 4,584 nucleotides organized in two non-overlapping open reading frames (ORFs 1 and 2), a 9-nt-long noncoding region (NCR) at the 5’ terminus and a 1,843-nt-long NCR at the 3’ terminus. Sequence comparisons of ORF 2 revealed homology to the RNA-dependent RNA polymerase (RdRp) of several umbra- and umbra-related viruses. Phylogenetic analysis of the RdRp placed the new virus in a well-supported and cohesive clade that includes umbra-like viruses reported from papaya, citrus, opuntia, maize and sugarcane hosts. This clade shares a most recent ancestor with the umbraviruses but has different genomic features. The creation of a new genus, within the Tombusviridae, is proposed for the classification of these novel viruses.


2020 ◽  
Vol 7 ◽  
Author(s):  
Xiao Chen ◽  
Chundi Wang ◽  
Bo Pan ◽  
Borong Lu ◽  
Chao Li ◽  
...  

Peritrichs are one of the largest groups of ciliates with over 1,000 species described so far. However, their genomic features are largely unknown. By single-cell genomic sequencing, we acquired the genomic data of three sessilid peritrichs (Cothurnia ceramicola, Vaginicola sp., and Zoothamnium sp. 2). Using genomic data from another 53 ciliates including 14 peritrichs, we reconstructed their evolutionary relationships and confirmed genome skimming as an efficient approach for expanding sampling. In addition, we profiled the stop codon usage and programmed ribosomal frameshifting (PRF) events in peritrichs for the first time. Our analysis reveals no evidence of stop codon reassignment for peritrichs, but they have prevalent +1 or -1 PRF events. These genomic features are distinguishable from other ciliates, and our observations suggest a unique evolutionary strategy for peritrichs.

