Comparative genomics identifies thousands of candidate structured RNAs in human microbiomes

2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Brayon J. Fremin ◽  
Ami S. Bhatt

Abstract Background: Structured RNAs play varied bioregulatory roles within microbes. To date, hundreds of candidate structured RNAs have been predicted using informatic approaches that search for motif structures in genomic sequence data. The human microbiome contains thousands of species and strains of microbes. Yet, much of the metagenomic data from the human microbiome remains unmined for structured RNA motifs primarily due to computational limitations. Results: We sought to apply a large-scale, comparative genomics approach to these organisms to identify candidate structured RNAs. With a carefully constructed, though computationally intensive automated analysis, we identify 3161 conserved candidate structured RNAs in intergenic regions, as well as 2022 additional candidate structured RNAs that may overlap coding regions. We validate the RNA expression of 177 of these candidate structures by analyzing small fragment RNA-seq data from four human fecal samples. Conclusions: This approach identifies a wide variety of candidate structured RNAs, including tmRNAs, antitoxins, and likely ribosome protein leaders, from a wide variety of taxa. Overall, our pipeline enables conservative predictions of thousands of novel candidate structured RNAs from human microbiomes.

2016 ◽  
Author(s):  
Shea N Gardner ◽  
Sasha K Ames ◽  
Maya B Gokhale ◽  
Tom R Slezak ◽  
Jonathan Allen

Software for rapid, accurate, and comprehensive microbial profiling of metagenomic sequence data on a desktop will play an important role in large scale clinical use of metagenomic data. Here we describe LMAT-ML (Livermore Metagenomics Analysis Toolkit-Marker Library) which can be run with 24 GB of DRAM memory, an amount available on many clusters, or with 16 GB DRAM plus a 24 GB low cost commodity flash drive (NVRAM), a cost effective alternative for desktop or laptop users. We compared results from LMAT with five other rapid, low-memory tools for metagenome analysis for 131 Human Microbiome Project samples, and assessed discordant calls with BLAST. All the tools except LMAT-ML reported overly specific or incorrect species and strain resolution of reads that were in fact much more widely conserved across species, genera, and even families. Several of the tools misclassified reads from synthetic or vector sequence as microbial or human reads as viral. We attribute the high numbers of false positive and false negative calls to a limited reference database with inadequate representation of known diversity. Our comparisons with real world samples show that LMAT-ML is the only tool tested that classifies the majority of reads, and does so with high accuracy.


Author(s):  
Brayon J. Fremin ◽  
Ami S. Bhatt

Abstract Structured RNAs play varied bioregulatory roles within microbes. To date, hundreds of candidate structured RNAs have been predicted using informatic approaches by searching for motif structures in genomic sequence data. However, only a subset of these candidate structured RNAs, those from culturable, well-studied microbes, have been shown to be transcribed. As the human microbiome contains thousands of species and strains of microbes, we sought to apply both informatic and experimental approaches to these organisms to identify novel transcribed structured RNAs. We combine an experimental approach, RNA-Seq, with an informatic approach, comparative genomics across the human microbiome project, to discover 1,085 candidate, conserved structured RNAs that are actively transcribed in human fecal microbiomes. These predictions include novel tracrRNAs that associate with Cas9 and RNA structures encoded in overlapping regions of the genome that are in opposing orientations. In summary, this combined experimental and computational approach enables the discovery of thousands of novel candidate structured RNAs.


Author(s):  
Anna Lavecchia ◽  
Matteo Chiara ◽  
Caterina De Virgilio ◽  
Caterina Manzari ◽  
Carlo Pazzani ◽  
...  

Abstract Staphylococcus cohnii (SC), a coagulase-negative bacterium, was first isolated in 1975 from human skin. Early phenotypic analyses led to the delineation of two subspecies (subsp.), Staphylococcus cohnii subsp. cohnii (SCC) and Staphylococcus cohnii subsp. urealyticus (SCU). SCC was considered to be specific to humans whereas SCU apparently demonstrated a wider host range, from lower primates to humans. The type strains ATCC 29974 and ATCC 49330 have been designated for SCC and SCU, respectively. Comparative analysis of 66 complete genome sequences—including a novel SC isolate—revealed unexpected patterns within the SC complex, both in terms of genomic sequence identity and gene content, highlighting the presence of 3 phylogenetically distinct groups. Based on our observations, and on the current guidelines for taxonomic classification for bacterial species, we propose a revision of the SC species complex. We suggest that SCC and SCU should be regarded as two distinct species: SC and SU (Staphylococcus urealyticus), and that two distinct subspecies, SCC and SCB (SC subsp. barensis, represented by the novel strain isolated in Bari) should be recognized within SC. Furthermore, since large scale comparative genomics studies recurrently suggest inconsistencies or conflicts in taxonomic assignments of bacterial species, we believe that the approach proposed here might be considered for more general application.


F1000Research ◽  
2017 ◽  
Vol 6 ◽  
pp. 2060
Author(s):  
Aleksandr Agafonov ◽  
Kimmo Mattila ◽  
Cuong Duong Tuan ◽  
Lars Tiede ◽  
Inge Alexander Raknes ◽  
...  

META-pipe is a complete service for the analysis of marine metagenomic data. It provides assembly of high-throughput sequence data, functional annotation of predicted genes, and taxonomic profiling. The functional annotation is computationally demanding and is therefore currently run on a high-performance computing cluster in Norway. However, additional compute resources are necessary to open the service to all ELIXIR users. We describe our approach for setting up and executing the functional analysis of META-pipe on additional academic and commercial clouds. Our goal is to provide a powerful analysis service that is easy to use and to maintain. Our design therefore uses a distributed architecture where we combine central servers with multiple distributed backends that execute the computationally intensive jobs. We believe our experiences developing and operating META-pipe provide a useful model for others that plan to provide a portal-based data analysis service in ELIXIR and other organizations with geographically distributed compute and storage resources.


2019 ◽  
Vol 225 (3) ◽  
pp. 1355-1369 ◽  
Author(s):  
Erik J. M. Koenen ◽  
Dario I. Ojeda ◽  
Royce Steeves ◽  
Jérémy Migliore ◽  
Freek T. Bakker ◽  
...  

2019 ◽  
Author(s):  
Erik J.M. Koenen ◽  
Dario I. Ojeda ◽  
Royce Steeves ◽  
Jérémy Migliore ◽  
Freek T. Bakker ◽  
...  

Abstract The consequences of the Cretaceous-Paleogene (K-Pg) boundary (KPB) mass extinction for the evolution of plant diversity are poorly understood, even though evolutionary turnover of plant lineages at the KPB is central to understanding the assembly of the Cenozoic biota. One aspect that has received considerable attention is the apparent concentration of whole genome duplication (WGD) events around the KPB, which may have played a role in survival and subsequent diversification of plant lineages. In order to gain new insights into the origins of Cenozoic biodiversity, we examine the origin and early evolution of the legume family, one of the most important angiosperm clades that rose to prominence after the KPB and for which multiple WGD events are found to have occurred early in its evolution. The legume family (Leguminosae or Fabaceae), with c. 20,000 species, is the third largest family of Angiospermae, and is globally widespread and second only to the grasses (Poaceae) in economic importance. Accordingly, it has been intensively studied in botanical, systematic and agronomic research, but a robust phylogenetic framework and timescale for legume evolution based on large-scale genomic sequence data is lacking, and key questions about the origin and early evolution of the family remain unresolved. We extend previous phylogenetic knowledge to gain insights into the early evolution of the family, analysing an alignment of 72 protein-coding chloroplast genes and a large set of nuclear genomic sequence data, sampling thousands of genes. We use a concatenation approach with heterogeneous models of sequence evolution to minimize inference artefacts, and evaluate support and conflict among individual nuclear gene trees with internode certainty calculations, a multi-species coalescent method, and phylogenetic supernetwork reconstruction.
Using a set of 20 fossil calibrations we estimate a revised timeline of legume evolution based on a selection of genes that are both informative and evolving in an approximately clock-like fashion. We find that the root of the family is particularly difficult to resolve, with strong conflict among gene trees suggesting incomplete lineage sorting and/or reticulation. Mapping of duplications in gene family trees suggests that a WGD event occurred along the stem of the family and is shared by all legumes, with additional nested WGDs subtending subfamilies Papilionoideae and Detarioideae. We propose that the difficulty of resolving the root of the family is caused by a combination of ancient polyploidy and an alternation of long and very short internodes, shaped respectively by extinction and rapid divergence. Our results show that the crown age of the legumes dates back to the Maastrichtian or Paleocene and suggest that it is most likely close to the KPB. We conclude that the origin and early evolution of the legumes followed a complex history, in which multiple nested polyploidy events coupled with rapid diversification are associated with the mass extinction event at the KPB, ultimately underpinning the evolutionary success of the Leguminosae in the Cenozoic.


2015 ◽  
Vol 2015 ◽  
pp. 1-8 ◽  
Author(s):  
Akihito Kikuchi ◽  
Toshimichi Ikemura ◽  
Takashi Abe

With the remarkable increase in genomic sequence data from various organisms, novel tools are needed for comprehensive analyses of available big sequence data. We previously developed a Batch-Learning Self-Organizing Map (BLSOM), which clusters genomic fragment sequences according to phylotype based solely on oligonucleotide composition, and applied it to genome and metagenomic studies. BLSOM is suitable for high-performance parallel computing and can analyze big data simultaneously, but a large-scale BLSOM needs a large computational resource. We have developed Self-Compressing BLSOM (SC-BLSOM) for reduction of computation time, which allows us to carry out comprehensive analysis of big sequence data without the use of high-performance supercomputers. The strategy of SC-BLSOM is to hierarchically construct BLSOMs according to data class, such as phylotype. The first-layer BLSOM was constructed with each of the divided input data pieces that represents the data subclass, such as phylotype division, resulting in compression of the number of data pieces. The second BLSOM was constructed with a total of weight vectors obtained in the first-layer BLSOMs. We compared SC-BLSOM with the conventional BLSOM by analyzing bacterial genome sequences. SC-BLSOM could be constructed faster than BLSOM and cluster the sequences according to phylotype with high accuracy, showing the method’s suitability for efficient knowledge discovery from big sequence data.
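To make "oligonucleotide composition" concrete, the sketch below computes a normalized k-mer frequency vector for one genomic fragment, the kind of feature vector a self-organizing map could cluster. It is only an illustration: the function name `oligo_composition`, the default `k=4`, and the choice to skip windows containing ambiguous bases are assumptions, not details taken from the BLSOM or SC-BLSOM software.

```python
from itertools import product


def oligo_composition(sequence, k=4):
    """Return the normalized k-mer frequency vector of a DNA fragment.

    Hypothetical helper for illustration; not part of BLSOM itself.
    Entries follow the lexicographic order of all 4**k k-mers over ACGT.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    # Slide a window of width k; windows with non-ACGT characters
    # (for example N) are skipped, which is an assumption of this sketch.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vector = oligo_composition("ACGTACGTTTGACCA", k=2)
    print(len(vector), round(sum(vector), 6))
```

Each fragment becomes a fixed-length vector (256 entries for tetranucleotides), so fragments of different lengths can be compared directly.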


2015 ◽  
Author(s):  
Sandeep J. Joseph ◽  
Ben Li ◽  
Robert A. Petit ◽  
Zhaohui S. Qin ◽  
Lyndsey A. Darrow ◽  
...  

Abstract Metagenome shotgun sequence projects offer the potential for large scale biogeographic analysis of microbial species. In this project we developed a method for detecting 33 common subtypes of the pathogenic bacterium Staphylococcus aureus. We used a binomial mixture model implemented in the binstrain software and the coverage counts at > 100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of each subtype in metagenome samples. Using this pipeline we were able to obtain > 87% sensitivity and > 94% specificity when testing on low genome coverage samples of diverse S. aureus strains (0.025X). We found that 321 and 149 metagenome samples from the Human Microbiome Project and metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage > 0.025. In both projects, CC8 and CC30 were the most common S. aureus subtypes encountered. We found evidence that the subtype composition at different body sites of the same individual was more similar than expected from random sampling and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are risk factors for S. aureus but there was limited power to find factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering but other unknown factors influence taxonomic distribution of the species around the city. We argue that pathogen detection in metagenome samples requires the use of subtypes based on whole species population genomic analysis rather than using ad hoc collections of reference strains.
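The idea of estimating subtype proportions from coverage counts at known SNP sites can be sketched with a small expectation-maximization loop over a binomial mixture. This is not the binstrain implementation, whose internals the abstract does not describe; the function name, the 0/1 allele `patterns` matrix, and the fixed `error_rate` are all assumptions made for illustration.

```python
def estimate_subtype_proportions(alt_counts, depths, patterns,
                                 error_rate=0.01, iterations=200):
    """Estimate mixture proportions of subtypes from SNP read counts.

    Illustrative EM sketch (not binstrain). Assumed inputs:
      alt_counts[i]  reads carrying the alternate allele at site i
      depths[i]      total reads covering site i
      patterns[i][k] 1 if subtype k carries the alternate allele at site i
    """
    n_subtypes = len(patterns[0])
    pi = [1.0 / n_subtypes] * n_subtypes
    for _ in range(iterations):
        expected = [0.0] * n_subtypes
        for alt, depth, pattern in zip(alt_counts, depths, patterns):
            # Probability that a read from subtype k shows the alt allele:
            # 1 - error_rate if the subtype carries it, else error_rate.
            p_alt = [error_rate + a * (1.0 - 2.0 * error_rate) for a in pattern]
            p_ref = [1.0 - p for p in p_alt]
            for n_reads, emit in ((alt, p_alt), (depth - alt, p_ref)):
                if n_reads == 0:
                    continue
                # E-step: share the reads among subtypes by posterior weight.
                weights = [w * e for w, e in zip(pi, emit)]
                norm = sum(weights)
                for k in range(n_subtypes):
                    expected[k] += n_reads * weights[k] / norm
        # M-step: proportions are the expected fraction of reads per subtype.
        total = sum(expected)
        pi = [e / total for e in expected]
    return pi


if __name__ == "__main__":
    patterns = [(1, 0), (0, 1), (1, 1), (0, 0)]
    print(estimate_subtype_proportions([70, 30, 100, 0], [100] * 4, patterns))
```

Sites where every subtype carries the same allele contribute nothing to telling subtypes apart, which is one reason such models rely on many discriminating SNP sites.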


2017 ◽  
Author(s):  
Stuart M. Brown ◽  
Yuhan Hao ◽  
Hao Chen ◽  
Bobby P. Laungani ◽  
Thahmina A. Ali ◽  
...  

Abstract Background: Metagenomic shotgun sequencing is becoming increasingly popular to study microbes associated with the human body and in environmental samples. A key goal of shotgun metagenomic sequencing is to identify gene functions and metabolic pathways that differ between samples or conditions. However, current methods to identify function in the large number of reads in a high-throughput sequence data file rely on the computationally intensive and low stringency approach of mapping each read to a generic database of proteins or reference microbial genomes. Results: We have developed an alternative analysis approach for shotgun metagenomic sequence data utilizing Bowtie2 DNA-DNA alignment of the reads to a database of well annotated genes compiled from human microbiome data. This method is rapid, and provides high stringency matches (>90% DNA sequence identity) of shotgun metagenomics reads to genes with annotated functions. We demonstrate the use of this method with synthetic data, Human Microbiome Project shotgun metagenomic data sets, and data from a study of liver disease. Differentially abundant KEGG gene functions can be detected in these experiments. Conclusions: Functional annotation of metagenomic shotgun sequence reads can be accomplished by rapid DNA-DNA matching to a custom database of microbial sequences using the Bowtie2 sequence alignment tool. This method can be used for a variety of microbiome studies and allows functional analysis which is otherwise computationally demanding. This rapid annotation method is freely available as a Galaxy workflow within a Docker image.
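The abstract does not say how the >90% identity threshold is computed, so the following is only one plausible reading: derive identity from a SAM record's CIGAR string and its `NM` (edit distance) tag, which Bowtie2 reports. The function names and the definition of identity as one minus edit distance over alignment columns are assumptions of this sketch.

```python
import re

# One CIGAR element: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def alignment_identity(cigar, edit_distance):
    """Approximate identity of an alignment from its CIGAR and NM tag.

    Assumed definition: 1 - NM / alignment columns, where columns are the
    summed lengths of the M, I, D, = and X operations.
    """
    columns = sum(int(length) for length, op in CIGAR_OP.findall(cigar)
                  if op in "MID=X")
    if columns == 0:
        return 0.0
    return 1.0 - edit_distance / columns


def passes_stringency(cigar, edit_distance, min_identity=0.90):
    """Keep only alignments strictly above the identity threshold."""
    return alignment_identity(cigar, edit_distance) > min_identity


if __name__ == "__main__":
    print(alignment_identity("50M2I48M", 4))
    print(passes_stringency("100M", 15))
```

Soft-clipped bases (`S`) are excluded from the column count here; a stricter filter might also require a minimum aligned fraction of the read.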


2018 ◽  
Author(s):  
S. A. Kitchen ◽  
A. Ratan ◽  
O. C. Bedoya-Reina ◽  
R. Burhans ◽  
N. D. Fogarty ◽  
...  

Abstract Genomic sequence data for non-model organisms are increasingly available, requiring the development of efficient and reproducible workflows. Here, we develop the first genomic resources and reproducible workflows for two threatened members of the reef-building coral genus Acropora. We generated genomic sequence data from multiple samples of the Caribbean A. cervicornis (staghorn coral) and A. palmata (elkhorn coral), and predicted millions of nucleotide variants among these two species and the Pacific A. digitifera. A subset of predicted nucleotide variants were verified using restriction length polymorphism assays and proved useful in distinguishing the two Caribbean Acroporids and the hybrid they form (“A. prolifera”). Nucleotide variants are freely available from the Galaxy server (usegalaxy.org), and can be analyzed there with computational tools and stored workflows that require only an internet browser. We describe these data and some of the analysis tools, concentrating on fixed differences between A. cervicornis and A. palmata. In particular, we found that fixed amino acid differences between these two species were enriched in proteins associated with development, cellular stress response and the host’s interactions with associated microbes, for instance in the Wnt pathway, ABC transporters and superoxide dismutase. Identified candidate genes may underlie functional differences in the way these threatened species respond to changing environments. Users can expand the presented analyses easily by adding genomic data from additional species as they become available. Article Summary: We provide the first comprehensive genomic resources for two threatened Caribbean reef-building corals in the genus Acropora. We identified genetic differences in key pathways and genes known to be important in the animals’ response to the environmental disturbances and larval development.
We further provide a list of candidate loci for large scale genotyping of these species to gather intra- and interspecies differences between A. cervicornis and A. palmata across their geographic range. All analyses and workflows are made available and can be used as a resource to analyze not only these corals but also other non-model organisms.

