Strain- and plasmid-level deconvolution of a synthetic metagenome by sequencing proximity ligation products

2014 ◽  
Author(s):  
Christopher W. Beitel ◽  
Lutz Froenicke ◽  
Jenna M. Lang ◽  
Ian F. Korf ◽  
Richard W. Michelmore ◽  
...  

Metagenomics is a valuable tool for the study of microbial communities but has been limited by the difficulty of “binning” the resulting sequences into groups corresponding to the individual species and strains that constitute the community. Moreover, there are presently no methods to track the flow of mobile DNA elements such as plasmids through communities or to determine which of these are co-localized within the same cell. We address these limitations by applying Hi-C, a technology originally designed for the study of three-dimensional genome structure in eukaryotes, to measure the cellular co-localization of DNA sequences. We leveraged Hi-C data generated from a synthetic metagenome sample to accurately cluster metagenome assembly contigs into groups that contain nearly complete genomes of each species. The Hi-C data also reliably associated plasmids with the chromosomes of their host and with each other. We further demonstrated that Hi-C data provide a long-range signal of strain-specific genotypes, indicating such data may be useful for high-resolution genotyping of microbial populations. Our work demonstrates that Hi-C sequencing data provide valuable information for metagenome analyses that is not currently obtainable by other methods. This metagenomic Hi-C method could facilitate future studies of the fine-scale population structure of microbes, as well as studies of how antibiotic resistance plasmids (or other genetic elements) mobilize in microbial communities. The method is not limited to microbiology; the genetic architecture of other heterogeneous populations of cells could also be studied with this technique.
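The idea of grouping contigs by Hi-C co-localization can be illustrated with a minimal sketch. This is not the authors' clustering procedure: it assumes that Hi-C read pairs have already been counted per contig pair, and it simply merges contigs joined by at least a hypothetical `min_links` read pairs using union-find. The contig names, the counts, and the threshold are all invented for illustration.

```python
from collections import defaultdict


def cluster_contigs(links, min_links=2):
    """Group contigs joined by at least `min_links` Hi-C read pairs.

    `links` maps an (contig_a, contig_b) pair to a read-pair count.
    Both the input layout and the threshold are illustrative assumptions.
    """
    parent = {}

    def find(x):
        # Register unseen contigs as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), count in links.items():
        root_a, root_b = find(a), find(b)
        # Weak links (likely ligation noise) do not merge clusters.
        if count >= min_links and root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for contig in list(parent):
        groups[find(contig)].add(contig)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical link counts: a plasmid contig links to its host chromosome,
# while a single spurious pair connects the two species.
example_links = {
    ("chrA_1", "chrA_2"): 40,
    ("chrA_2", "plasmidA"): 12,
    ("chrB_1", "chrB_2"): 35,
    ("chrA_1", "chrB_1"): 1,
}
clusters = cluster_contigs(example_links, min_links=2)
```

In this toy input the plasmid contig ends up in the same group as its host chromosome contigs, and the single spurious inter-species pair is ignored. A real analysis would likely also need to normalize counts by contig length and restriction-site density, which this sketch omits.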


2021 ◽  
Author(s):  
Barış Ekim ◽  
Bonnie Berger ◽  
Rayan Chikhi

DNA sequencing data continues to progress towards longer reads with increasingly lower sequencing error rates. We focus on the problem of assembling such reads into genomes, which poses challenges in terms of accuracy and computational resources when using cutting-edge assembly approaches, e.g. those based on overlapping reads using minimizer sketches. Here, we introduce the concept of minimizer-space sequencing data analysis, where the minimizers rather than DNA nucleotides are the atomic tokens of the alphabet. By projecting DNA sequences into ordered lists of minimizers, our key idea is to enumerate what we call k-min-mers, which are k-mers over a larger alphabet consisting of minimizer tokens. Our approach, mdBG or minimizer-dBG, achieves orders-of-magnitude improvement in both speed and memory usage over existing methods without much loss of accuracy. We demonstrate three use cases of mdBG: human genome assembly, metagenome assembly, and the representation of large pangenomes. For assembly, we implemented mdBG in software we call rust-mdbg, resulting in ultra-fast, low-memory and highly contiguous assembly of PacBio HiFi reads. A human genome is assembled in under 10 minutes using 8 cores and 10 GB RAM, and 60 Gbp of metagenome reads are assembled in 4 minutes using 1 GB RAM. For pangenome graphs, we newly allow a graphical representation of a collection of 661,405 bacterial genomes as an mdBG and successfully search it (in minimizer-space) for anti-microbial resistance (AMR) genes. We expect our advances to be essential to sequence analysis, given the rise of long-read sequencing in genomics, metagenomics and pangenomics.
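The projection into minimizer space can be sketched as follows. This is a toy illustration rather than the rust-mdbg implementation: it assumes that an l-mer is kept as a minimizer token when its hash falls below a chosen density, and the function names, the hash choice (BLAKE2b), and the parameter values are hypothetical.

```python
import hashlib


def lmer_hash(lmer):
    """Map an l-mer to a 64-bit integer (hash choice is an assumption)."""
    digest = hashlib.blake2b(lmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minimizer_tokens(seq, l=4, density=0.3):
    """Keep, in sequence order, the l-mers whose hash is below `density`."""
    threshold = density * 2**64
    lmers = (seq[i:i + l] for i in range(len(seq) - l + 1))
    return [lmer for lmer in lmers if lmer_hash(lmer) < threshold]


def k_min_mers(tokens, k=3):
    """Enumerate k consecutive minimizer tokens: k-mers over the token alphabet."""
    return [tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)]


read = "ACGTTGCATGCCGATAGCTAGCTTACG"
tokens = minimizer_tokens(read, l=4, density=0.3)
kmms = k_min_mers(tokens, k=3)
```

Because each token stands for several nucleotides and only a fraction of l-mers is retained, a read becomes a much shorter list of tokens, which is the intuition behind the speed and memory savings described above.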


2021 ◽  
Author(s):  
Chen Yang ◽  
Theodora Lo ◽  
Ka Ming Nip ◽  
Saber Hafezqorani ◽  
René L Warren ◽  
...  

Abstract Background: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, non-uniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical tools, such as microbial abundance estimation and metagenome assembly algorithms. When developing and testing bioinformatics tools and pipelines, the use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to provide a ground truth and assess the performance in a controlled environment. Results: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes, and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. Conclusions: The Meta-NanoSim characterization module investigates read features including chimeric information and abundance levels, while the simulation module simulates large and complex multi-sample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.


2019 ◽  
Author(s):  
Jacqueline Jufen Zhu ◽  
Zofia Parteka ◽  
Byoungkoo Lee ◽  
Przemyslaw Szalaj ◽  
Ping Wang ◽  
...  

Abstract The three-dimensional genome structure plays a fundamental role in gene regulation and cellular functions. Recent studies in genomics based on sequencing technologies inferred the very basic functional chromatin folding structures of the genome known as chromatin loops, the long-range chromatin interactions that are often mediated by protein factors. To visualize the looping structure of chromatin, we applied super-resolution microscopy iPALM to image a specific chromatin loop in GM12878 cells. In total, we generated six images of the target chromatin region at single-molecule resolution. To infer the chromatin structures from the captured images, we modeled them as looping conformations using different computational algorithms and then evaluated the models by comparing them with Hi-C data to examine the concordance. The results showed a good correlation between the imaging data and sequencing data, suggesting that the visualization of higher-order chromatin structures for very short genomic segments can be realized by microscopic imaging.


Author(s):  
Lucy M. McCully ◽  
Jasmine Graslie ◽  
Alana R. McGraw ◽  
Adam S. Bitzer ◽  
Auður M. Sigurbjörnsdóttir ◽  
...  

Within soil, bacteria are found in multi-species communities, where interactions can lead to emergent community properties. Studying bacteria in a social context is critical for investigation of community-level functions. We previously showed that co-cultured Pseudomonas fluorescens Pf0-1 and Pedobacter sp. V48 engage in interspecies social spreading (ISS) on a hard agar surface, a behavior which required close contact and depended on the nutritional environment. Here, we investigate whether social spreading is widespread among P. fluorescens and Pedobacter isolates, and whether the requirements for interaction vary. We find that this phenotype is not restricted to the interaction between P. fluorescens Pf0-1 and Pedobacter sp. V48, but is a prevalent behavior found in one clade in the P. fluorescens group and two clades in the Pedobacter genus. We show that the interaction with certain Pedobacter isolates occurred without close contact, indicating induction of spreading by a putative diffusible signal. As with ISS by Pf0-1+V48, motility of interacting pairs is influenced by the environment, with no spreading behaviors (or induction of motility) observed under high nutrient conditions. While Pf0-1+V48 require low nutrient but high NaCl conditions, in the broader range of interacting pairs the high salt influence was variable. The prevalence of motility phenotypes observed here and found within the literature indicates that community-induced locomotion in general, and social spreading in particular, is likely important within the environment. It is crucial that we continue to study microbial interactions and their emergent properties to gain a fuller understanding of the functions of microbial communities. Importance Interspecies social spreading (ISS) is an emergent behavior observed when P. fluorescens Pf0-1 and Pedobacter sp. V48 interact, during which both species move together across a surface. 
Importantly, this environment does not permit movement of either individual species. This group behavior suggests that communities of microbes can function in ways not predictable by knowledge of the individual members. Here we have asked whether ISS is widespread and thus potentially of importance in soil microbial communities. The significance of this research is the demonstration that surface spreading behaviors are not unique to the Pf0-1-V48 interaction, but rather are a more widespread phenomenon observed among members of distinct clades of both P. fluorescens and Pedobacter isolates. Further, we identify differences in mechanism of signaling and nutritional requirements for ISS. Emergent traits resulting from bacterial interactions are widespread and their characterization is necessary for a complete understanding of microbial community function.


2019 ◽  
Author(s):  
J. Yuyang Lu ◽  
Lei Chang ◽  
Tong Li ◽  
Ting Wang ◽  
Yafei Yin ◽  
...  

SUMMARY Despite extensive mapping of three-dimensional (3D) chromatin structures, the basic principles underlying genome folding remain unknown. Here, we report a fundamental role for L1 and B1 retrotransposons in shaping the macroscopic 3D genome structure. Homotypic clustering of B1 and L1 repeats in the nuclear interior or at the nuclear and nucleolar peripheries, respectively, segregates the genome into mutually exclusive nuclear compartments. This spatial segregation of L1 and B1 is conserved in mouse and human cells, and occurs dynamically during establishment of the 3D chromatin structure in early embryogenesis and the cell cycle. Depletion of L1 transcripts drastically disrupts the spatial distributions of L1- and B1-rich compartments. L1 transcripts are strongly associated with L1 DNA sequences and induce phase separation of the heterochromatin protein HP1α. Our results suggest that genomic repeats act as the blueprint of chromatin macrostructure, thus explaining the conserved higher-order structure of chromatin across mammalian cells.


2018 ◽  
Author(s):  
Alexander A. Boulgakov ◽  
Erhu Xiong ◽  
Sanchita Bhadra ◽  
Andrew D. Ellington ◽  
Edward M. Marcotte

Abstract We extend the concept of DNA proximity ligation from a single readout per oligonucleotide pair to multiple reversible, iterative ligations re-using the same oligonucleotide molecules. Using iterative proximity ligation (IPL), we can in principle capture multiple ligation events between each oligonucleotide and its various neighbors and thus recover a far richer knowledge about their relative positions than single, irreversible ligation events. IPL would thus act to sample and record local molecular neighborhoods. By integrating a unique DNA barcode into each participating oligonucleotide, we can catalog the individual ligation events and thus capture the positional information contained therein in a high-throughput manner using next-generation DNA sequencing. We propose that by interpreting IPL sequencing results in the context of graph theory and by applying spring layout algorithms, we can recover geometric patterns of objects labeled by DNA. Using simulations, we demonstrate that we can in principle recover letter patterns photolithographed onto slide surfaces using only IPL sequencing data, illustrating how our technique maps complex spatial configurations into DNA sequences and then – using only this sequence information – recovers them. We complement our theoretical work with an experimental proof-of-concept of iterative proximity ligation on an oligonucleotide population.


2021 ◽  
Author(s):  
Andreas Schneider ◽  
John Sundh ◽  
Görel Sundström ◽  
Kerstin Richau ◽  
Nicolas Delhomme ◽  
...  

Microbial communities are major players in carbon and nitrogen cycling globally and are of particular importance for plant communities in the nutrient poor soils of boreal forests. Especially relevant are the fungal communities in the soil that interact with the plants in multiple ways, indirectly through their pivotal role in the breakdown of organic matter and, more directly, through mycorrhizal symbiosis with plant roots. Large-scale disturbances of these complex microbial communities can lead to shifts in soil carbon storage with unknown and global-scale long-term consequences. To understand the dynamics of these communities and their relationship to associated plants in response to climate change and anthropogenic influence, we need a better understanding of how modern “omics” methods can help us to understand compositional and functional shifts of these microbiomes. Microbial gene expression and functional activity can be assayed with RNA sequencing (RNA-Seq) data from environmental samples. In contrast, currently phylogenetic marker gene amplicon sequencing data is generally used to assess taxonomic composition and community structure of the microbiome. Few studies have considered how much of this structural and taxonomic information is included in RNA-Seq transcriptomic data from matched samples. Here we describe fungal communities using both RNA-Seq and fungal ITS1 DNA amplicon sequencing to compare the outcomes between the methods. We used a panel of root and needle samples from mature stands of the coniferous tree species Picea abies (Norway spruce) growing in untreated (nutrient deficient) and nutrient enriched plots at the Flakaliden forest research site in boreal northern Sweden.
We created an assembly-based, reproducible and hardware agnostic workflow to taxonomically and functionally annotate fungal RNA-Seq data obtained from Norway spruce roots, which we compared to matching ITS amplicon sequencing data. We show that the community structure indicated by the fungal transcriptome is in agreement with that generated by the ITS data, while also identifying limitations imposed by current database coverage. Furthermore, we show examples to demonstrate how metatranscriptomics data additionally provides biologically informative functional insight at the community and individual species level. These findings highlight the potential of metatranscriptomics to advance our understanding of interaction, response and effect both between host plants and their associated microbial communities, and among the members of microbial communities in environmental samples in general.


2016 ◽  
Author(s):  
Serghei Mangul ◽  
David Koslicki

ABSTRACT Microbial communities inhabiting the human body exhibit significant variability across different individuals and tissues, and are suggested to play an important role in health and disease. High-throughput sequencing offers unprecedented possibilities to profile microbial community composition, but limitations of existing taxonomic classification methods (including incompleteness of existing microbial reference databases) limit the ability to accurately compare microbial communities across different samples. In this paper, we present a method able to overcome these limitations by circumventing the classification step and directly using the sequencing data to compare microbial communities. The proposed method provides a powerful reference-free way to assess differences in microbial abundances across samples. This method, called EMDeBruijn, condenses the sequencing data into a de Bruijn graph. The Earth Mover's Distance (EMD) is then used to measure similarities and differences of the microbial communities associated with the individual graphs. We apply this method to RNA-Seq data sets from a coronary artery calcification (CAC) study and show that EMDeBruijn is able to differentiate between case and control CAC samples while utilizing all the candidate microbial reads. We compare these results to current reference-based methods, which are shown to have a limited capacity to discriminate between case and control samples. We conclude that this reference-free approach is a viable choice in comparative metatranscriptomic studies.
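The first step described above, condensing reads into a de Bruijn graph, can be sketched in a few lines. This is a generic textbook construction and not the EMDeBruijn code: nodes are (k-1)-mers, each observed k-mer contributes a weighted edge, and the value of `k` and the reads are made up. The Earth Mover's Distance step is deliberately left out.

```python
from collections import Counter


def de_bruijn_edges(reads, k=4):
    """Count de Bruijn edges: each k-mer links its (k-1)-mer prefix to its suffix."""
    edges = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


# Hypothetical reads; real input would be the candidate microbial reads of a sample.
sample_edges = de_bruijn_edges(["ACGTA", "ACGT"], k=4)
```

The edge counts give a per-sample abundance profile over a shared graph, which is the kind of representation a graph-based distance could then compare across samples.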


Author(s):  
B. Carragher ◽  
M. Whittaker

Techniques for three-dimensional reconstruction of macromolecular complexes from electron micrographs have been successfully used for many years. These include methods which take advantage of the natural symmetry properties of the structure (for example helical or icosahedral) as well as those that use single axis or other tilting geometries to reconstruct from a set of projection images. These techniques have traditionally relied on a very experienced operator to manually perform the often numerous and time-consuming steps required to obtain the final reconstruction. While the guidance and oversight of an experienced and critical operator will always be an essential component of these techniques, recent advances in computer technology, microprocessor-controlled microscopes and the availability of high-quality CCD cameras have provided the means to automate many of the individual steps. During the acquisition of data, automation provides benefits not only in terms of convenience and time saving but also in circumstances where manual procedures limit the quality of the final reconstruction.

