Expansion of known ssRNA phage genomes: From tens to over a thousand

2020 ◽  
Vol 6 (6) ◽  
pp. eaay5981 ◽  
Author(s):  
J. Callanan ◽  
S. R. Stockdale ◽  
A. Shkoporov ◽  
L. A. Draper ◽  
R. P. Ross ◽  
...  

The first sequenced genome was that of the 3569-nucleotide single-stranded RNA (ssRNA) bacteriophage MS2. Despite the recent accumulation of vast amounts of DNA and RNA sequence data, only 12 representative ssRNA phage genome sequences are available from the NCBI Genome database (June 2019). The difficulty in detecting RNA phages in metagenomic datasets raises questions as to their abundance, taxonomic structure, and ecological importance. In this study, we iteratively applied profile hidden Markov models to detect conserved ssRNA phage proteins in 82 publicly available metatranscriptomic datasets generated from activated sludge and aquatic environments. We identified 15,611 nonredundant ssRNA phage sequences, including 1015 near-complete genomes. This expansion in the number of known sequences enabled us to complete a phylogenetic assessment of both sequences identified in this study and known ssRNA phage genomes. Our expansion of these viruses from two environments suggests that they have been overlooked within microbiome studies.

mBio ◽  
2020 ◽  
Vol 11 (5) ◽  
Author(s):  
Ignacio de la Higuera ◽  
George W. Kasun ◽  
Ellis L. Torrance ◽  
Alyssa A. Pratt ◽  
Amberlee Maluenda ◽  
...  

ABSTRACT The discovery of cruciviruses revealed the most explicit example of a common protein homologue between DNA and RNA viruses to date. Cruciviruses are a novel group of circular Rep-encoding single-stranded DNA (ssDNA) (CRESS-DNA) viruses that encode capsid proteins that are most closely related to those encoded by RNA viruses in the family Tombusviridae. The apparent chimeric nature of the two core proteins encoded by crucivirus genomes suggests horizontal gene transfer of capsid genes between DNA and RNA viruses. Here, we identified and characterized 451 new crucivirus genomes and 10 capsid-encoding circular genetic elements through de novo assembly and mining of metagenomic data. These genomes are highly diverse, as demonstrated by sequence comparisons and phylogenetic analysis of subsets of the protein sequences they encode. Most of the variation is reflected in the replication-associated protein (Rep) sequences, and much of the sequence diversity appears to be due to recombination. Our results suggest that recombination tends to occur more frequently among groups of cruciviruses with relatively similar capsid proteins and that the exchange of Rep protein domains between cruciviruses is rarer than intergenic recombination. Additionally, we suggest members of the stramenopiles/alveolates/Rhizaria supergroup as possible crucivirus hosts. Altogether, we provide a comprehensive and descriptive characterization of cruciviruses.

IMPORTANCE Viruses are the most abundant biological entities on Earth. In addition to their impact on animal and plant health, viruses have important roles in ecosystem dynamics as well as in the evolution of the biosphere. Circular Rep-encoding single-stranded (CRESS) DNA viruses are ubiquitous in nature, many are agriculturally important, and they appear to have multiple origins from prokaryotic plasmids. A subset of CRESS-DNA viruses, the cruciviruses, have homologues of capsid proteins encoded by RNA viruses. The genetic structure of cruciviruses attests to the transfer of capsid genes between disparate groups of viruses. However, the evolutionary history of cruciviruses is still unclear. By collecting and analyzing cruciviral sequence data, we provide a deeper insight into the evolutionary intricacies of cruciviruses. Our results reveal an unexpected diversity of this virus group, with frequent recombination as an important determinant of variability.


2004 ◽  
Vol 85 (1) ◽  
pp. 45-48 ◽  
Author(s):  
Linda M. Kohn

Abstract Phylogenetic or genealogical interpretation of DNA sequence data from multiple genomic regions has become the gold standard for species delimitation and population genetics. Precise species concepts can inform quarantine decisions but are likely to reflect evolutionary events too far in the past to impact disease management. On the other hand, multilocus approaches at the population level can identify patterns of endemism or migration directly associated with episodes of disease, including host shifts and associated changes in determinants of pathogenicity and avirulence. We used the genome database of Magnaporthe grisea to frame a comparative, multilocus genomics approach from which we demonstrate a single origin for rice-infecting genotypes with concomitant loss of sex in pandemic clonal lineages, and patterns of gain and loss of avirulence genes. In the Sclerotinia sclerotiorum pathosystem, we identified significant associations of multilocus haplotypes with specific pathogen populations in North America. Following the introduction of a new crop, endemic pathogen genotypes and newly evolved migrant genotypes caused novel, early-season symptoms.


2018 ◽  
Vol 30 (1) ◽  
pp. 216-236
Author(s):  
Rasmus Troelsgaard ◽  
Lars Kai Hansen

Model-based classification of sequence data using a set of hidden Markov models is a well-known technique. The involved score function, which is often based on the class-conditional likelihood, can, however, be computationally demanding, especially for long data sequences. Inspired by recent theoretical advances in spectral learning of hidden Markov models, we propose a score function based on third-order moments. In particular, we propose to use the Kullback-Leibler divergence between theoretical and empirical third-order moments for classification of sequence data with discrete observations. The proposed method provides lower computational complexity at classification time than the usual likelihood-based methods. In order to demonstrate the properties of the proposed method, we perform classification of both simulated data and empirical data from a human activity recognition study.
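The moment-based score described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the pseudocount smoothing, the direction of the Kullback-Leibler divergence (empirical relative to theoretical), and the use of the initial distribution `pi` as a stand-in for the stationary distribution are all assumptions made for illustration. Third-order moments are taken here to be the joint probabilities of three consecutive discrete observations.

```python
import math
from collections import Counter
from itertools import product


def theoretical_trigram_moments(pi, trans, emit):
    """P(x1, x2, x3) under an HMM (pi[i], trans[i][j], emit[i][x]).

    Assumption: pi is used as the state distribution at the first position.
    """
    n_states = len(pi)
    n_symbols = len(emit[0])
    moments = {}
    for x1, x2, x3 in product(range(n_symbols), repeat=3):
        total = 0.0
        # Sum over all hidden-state triples that could emit (x1, x2, x3).
        for h1, h2, h3 in product(range(n_states), repeat=3):
            total += (pi[h1] * emit[h1][x1]
                      * trans[h1][h2] * emit[h2][x2]
                      * trans[h2][h3] * emit[h3][x3])
        moments[(x1, x2, x3)] = total
    return moments


def empirical_trigram_moments(seq, n_symbols, pseudocount=1e-6):
    """Smoothed relative frequencies of consecutive triples in seq."""
    counts = Counter(zip(seq, seq[1:], seq[2:]))
    total = sum(counts.values()) + pseudocount * n_symbols ** 3
    return {t: (counts.get(t, 0) + pseudocount) / total
            for t in product(range(n_symbols), repeat=3)}


def kl_divergence(p, q, floor=1e-12):
    """KL(p || q); q is floored to avoid division by zero."""
    return sum(p[k] * math.log(p[k] / max(q[k], floor)) for k in p if p[k] > 0)


def classify(seq, models, n_symbols):
    """Return the label of the model whose moments are closest to the data."""
    emp = empirical_trigram_moments(seq, n_symbols)
    scores = {name: kl_divergence(emp, theoretical_trigram_moments(*params))
              for name, params in models.items()}
    return min(scores, key=scores.get), scores
```

The theoretical moments depend only on the model, so they can be computed once per class and cached. Scoring a sequence then needs a single pass to count triples, which hints at why such a score can be cheaper at classification time than a forward-algorithm likelihood for each class.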


2010 ◽  
Vol 10 (4-6) ◽  
pp. 449-464
Author(s):  
HENNING CHRISTIANSEN ◽  
CHRISTIAN THEIL HAVE ◽  
OLE TORP LASSEN ◽  
MATTHIEU PETIT

Abstract A Hidden Markov Model (HMM) is a common statistical model which is widely used for analysis of biological sequence data and other sequential phenomena. In the present paper we show how HMMs can be extended with side-constraints and present constraint solving techniques for efficient inference. Defining HMMs with side-constraints in Constraint Logic Programming has advantages in terms of more compact expression and pruning opportunities during inference. We present a PRISM-based framework for extending HMMs with side-constraints and show how well-known constraints such as cardinality and all_different are integrated. We experimentally validate our approach on the biologically motivated problem of global pairwise alignment.
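To make the idea of a side-constraint concrete, the following sketch adds a cardinality constraint to ordinary Viterbi decoding. It is written in plain Python rather than PRISM or Constraint Logic Programming, so it is not the framework the paper presents. The function name, the parameters, and the choice of constraint ("a given state may be visited at most `max_visits` times") are assumptions for illustration. The constraint is enforced by extending the dynamic-programming state with a visit counter and pruning any extension that would exceed the limit.

```python
import math


def constrained_viterbi(obs, states, start_p, trans_p, emit_p,
                        limited_state, max_visits):
    """Most probable state path that visits limited_state at most max_visits times.

    Returns (log probability, path); the path is empty if no path satisfies
    the constraint.
    """
    neg_inf = float("-inf")

    def log(p):
        return math.log(p) if p > 0 else neg_inf

    # best[(state, visits_used)] = (log probability, path so far)
    best = {}
    for s in states:
        used = 1 if s == limited_state else 0
        if used <= max_visits:
            best[(s, used)] = (log(start_p[s]) + log(emit_p[s][obs[0]]), [s])

    for symbol in obs[1:]:
        nxt = {}
        for (prev, used), (score, path) in best.items():
            for s in states:
                new_used = used + (1 if s == limited_state else 0)
                if new_used > max_visits:
                    continue  # pruned: would violate the cardinality constraint
                cand = score + log(trans_p[prev][s]) + log(emit_p[s][symbol])
                key = (s, new_used)
                if key not in nxt or cand > nxt[key][0]:
                    nxt[key] = (cand, path + [s])
        best = nxt

    if not best:
        return neg_inf, []
    return max(best.values(), key=lambda entry: entry[0])
```

Setting `max_visits` to the sequence length recovers unconstrained Viterbi decoding. The counter multiplies the number of dynamic-programming cells by at most `max_visits + 1`.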


2017 ◽  
Author(s):  
Scott A. Funkhouser ◽  
Juan P. Steibel ◽  
Ronald O. Bates ◽  
Nancy E. Raney ◽  
Darius Schenk ◽  
...  

Abstract

Background: RNA editing by ADAR (adenosine deaminase acting on RNA) proteins is a form of transcriptional regulation that is widespread among humans and other primates. Based on high-throughput scans used to identify putative RNA editing sites, ADAR appears to catalyze a substantial number of adenosine to inosine transitions within repetitive regions of the primate transcriptome, thereby dramatically enhancing genetic variation beyond what is encoded in the genome.

Results: Here, we demonstrate the editing potential of the pig transcriptome by utilizing DNA and RNA sequence data from the same pig. We identified a total of 8550 mismatches between DNA and RNA sequences across three tissues, with 75% of these exhibiting an A-to-G (DNA to RNA) discrepancy, indicative of a canonical ADAR-catalyzed RNA editing event. When we consider only mismatches within repetitive regions of the genome, the A-to-G percentage increases to 94%, with the majority of these located within the swine specific SINE retrotransposon PRE-1. We also observe evidence of A-to-G editing within coding regions that were previously verified in primates.

Conclusions: Thus, our high-throughput evidence suggests that pervasive RNA editing by ADAR can exist outside of the primate lineage to dramatically enhance genetic variation in pigs.
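The A-to-G percentage reported above is a proportion over DNA-to-RNA mismatch types. A minimal sketch of that tally follows, assuming the mismatches have already been called and are available as (DNA base, RNA base) pairs on a common strand. The function name and input format are hypothetical, and the alignment, filtering, and strand handling a real pipeline would need are omitted.

```python
from collections import Counter


def mismatch_profile(pairs):
    """Fraction of each DNA-to-RNA mismatch type among all mismatching sites.

    pairs: iterable of (dna_base, rna_base) tuples; matching sites are ignored.
    """
    counts = Counter((dna, rna) for dna, rna in pairs if dna != rna)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {f"{dna}-to-{rna}": n / total for (dna, rna), n in counts.items()}


# Toy input (not data from the study): three A-to-G sites, one C-to-T site,
# and one matching site that is ignored.
example = [("A", "G")] * 3 + [("C", "T")] + [("A", "A")]
profile = mismatch_profile(example)
```

Restricting `pairs` to sites inside annotated repetitive regions before calling the function would give the kind of subset percentage the abstract reports for repeats.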


2021 ◽  
Author(s):  
Erich Kucs ◽  
Peter Schönswetter ◽  
Gerald M. Schneeweiss

Abstract Draba (Brassicaceae), a model group for diversification and evolution in Arctic and mountain habitats, is taxonomically challenging and many of its species are insufficiently investigated. One such species is D. pacheri, an endemic of the eastern European Alps and the western Carpathians (here presumably extinct). Several hypotheses exist with respect to the phylogenetic position and the taxonomy of this species, but none of these has ever been tested using molecular data. In this article we examine (i) DNA sequence data to assess the phylogenetic position of D. pacheri within the genus and (ii) AFLP fingerprint data as well as morphometric data to address whether this species can be divided taxonomically into species or subspecies. DNA sequence data firmly place D. pacheri within the Core Draba Group III, whose internal relationships are, however, insufficiently resolved to precisely identify the closest relative of D. pacheri. AFLP data identify several genetically divergent lineages corresponding to geographically distinct regions. Although these lineages are congruent with hypotheses distinguishing either two species (D. pacheri s. str., D. norica) or one species with several subspecies, the lack of clear morphological separation, both with respect to the entire set of traits and single presumably diagnostic characters such as trichome morphology, renders recognition of a single species D. pacheri, as suggested previously, the best taxonomic solution. The deep and geographically strongly structured splits of D. pacheri likely are the result of isolation in several Pleistocene refugia and warrant conservation efforts that involve populations from each of the main geographic subgroups.


2017 ◽  
Author(s):  
Harun Mustafa ◽  
André Kahles ◽  
Mikhail Karasikov ◽  
Gunnar Rätsch

Abstract Much of the DNA and RNA sequencing data available is in the form of high-throughput sequencing (HTS) reads and is currently unindexed by established sequence search databases. Recent succinct data structures for indexing both reference sequences and HTS data, along with associated metadata, have been based on either hashing or graph models, but many of these structures are static in nature, and thus, not well-suited as backends for dynamic databases. We propose a parallel construction method for and novel application of the wavelet trie as a dynamic data structure for compressing and indexing graph metadata. By developing an algorithm for merging wavelet tries, we are able to construct large tries in parallel by merging smaller tries constructed concurrently from batches of data. When compared against general compression algorithms and those developed specifically for graph colors (VARI and Rainbowfish), our method achieves compression ratios superior to gzip and VARI, converging to compression ratios of 6.5% to 2% on data sets constructed from over 600 virus genomes. While marginally worse than compression by bzip2 or Rainbowfish, this structure allows for both fast extension and query. We also found that additionally encoding graph topology metadata improved compression ratios, particularly on data sets consisting of several mutually-exclusive reference genomes. It was also observed that the compression ratio of wavelet tries grew sublinearly with the density of the annotation matrices. This work is a significant step towards implementing a dynamic data structure for indexing large annotated sequence data sets that supports fast query and update operations. At the time of writing, no established standard tool has filled this niche.


Viruses ◽  
2018 ◽  
Vol 10 (9) ◽  
pp. 506 ◽  
Author(s):  
Jean-Michel Claverie ◽  
Chantal Abergel

Since 1998, when Jim van Etten’s team initiated its characterization, Paramecium bursaria Chlorella virus 1 (PBCV-1) had been the largest known DNA virus, both in terms of particle size and genome complexity. In 2003, the Acanthamoeba-infecting Mimivirus unexpectedly superseded PBCV-1, opening the era of giant viruses, i.e., with virions large enough to be visible by light microscopy and genomes encoding more proteins than many bacteria. During the following 15 years, the isolation of many Mimivirus relatives has made Mimiviridae one of the largest and most diverse families of eukaryotic viruses, most of which have been isolated from aquatic environments. Metagenomic studies of various ecosystems (including soils) suggest that many more remain to be isolated. As Mimiviridae members are found to infect an increasing range of phytoplankton species, their taxonomic position compared to the traditional Phycodnaviridae (i.e., etymologically “algal viruses”) became a source of confusion in the literature. Following a quick historical review of the key discoveries that established the Mimiviridae family, we describe its current taxonomic structure and propose a set of operational criteria to help in the classification of future isolates.


2021 ◽  
Author(s):  
Antonio Pedro Camargo ◽  
Rafael Soares Correa de Souza ◽  
Juliana Jose ◽  
Isabel Rodrigues Gerhardt ◽  
Ricardo Augusto Dante ◽  
...  

The substrates of the Brazilian campos rupestres have extremely low concentrations of key nutrients, mainly phosphorus, imposing severe restrictions to plant growth. Regardless, this ecosystem harbors enormous biodiversity which raises the question of how nutrients are cycled and acquired by the biosphere. To uncover the nutrient turnover potential of plant-associated microorganisms in the campos rupestres, we investigated the compositions and functions of microbiomes associated with two species of the Velloziaceae family that grow over distinct substrates (soil and rock). Amplicon, metagenomic, and metagenome-assembled genome sequence data showed that the campos rupestres harbor a novel assemblage of plant-associated prokaryotes and fungi. Compositional analysis revealed that the plant-associated soil and rock communities differed in taxonomic structure but shared a core of highly efficient colonizers that were strongly coupled with nutrient mobilization. Investigation of functional and abundance data revealed that the plant hosts actively recruit communities by exuding organic compounds and that the root-associated microbiomes possess a diverse repertoire of phosphorus turnover mechanisms. We also showed that the microbiomes of both plant species encompass novel populations capable of mobilizing nitrogen and that the substrate strongly influences the dynamics of this cycle. Our results show that the interplay between plants and their microbiomes shapes nutrient turnover in the campos rupestres. We highlight that investigation of microbial diversity is fundamental to understand plant fitness in stressful environments.


2016 ◽  
Vol 67 (3) ◽  
pp. 380 ◽  
Author(s):  
Michael Shackleton ◽  
Gavin N. Rees

Identification of macroinvertebrates is a key component of monitoring programs that seek to understand the condition of aquatic environments. Classical identification approaches underpin such programs, but molecular approaches are gaining recognition as valuable ways to identify organisms for research and monitoring programs. We applied DNA barcoding data to specimens collected as part of monitoring programs in the Murray–Darling Basin, to investigate the possible informational benefits these data may provide. We also tested the performances of two online DNA databases in assigning taxon names to our sequence data. We found that relying on the online databases to determine species identifications was currently problematic for the Australian freshwater fauna because of a lack of available sequence data. However, we also found that collecting and applying barcode data to our monitoring programs gave considerable informational benefits by providing greater resolution of specimen identity, highlighting the presence of potential cryptic species, providing information on larval and adult associations, demonstrating instances where misidentification had occurred through classical approaches, and providing confirmation of the performance of diagnostic characters currently used in keys to determine species identities.

