Electronic Mapping of a Bacterial Genome with Dual Solid-State Nanopores and Active Single-Molecule Control

2021 ◽  
Author(s):  
Arthur Rand ◽  
Philip Zimny ◽  
Roland Nagel ◽  
Chaitra Telang ◽  
Justin Mollison ◽  
...  

We present the first electronic mapping of a bacterial genome using solid-state nanopore technology. A dual-nanopore architecture and active control logic are used to produce single-molecule data that enables estimation of distances between physical tags installed at sequence motifs within double-stranded DNA (dsDNA). Previously developed dual-pore "DNA flossing" control generates multiple scans of tagged regions of each captured DNA. The control logic was extended here in two ways: first, to automate "zooming out" on each molecule to progressively increase the number of tags scanned during DNA flossing; and second, to automate recapture of a molecule that exited flossing to enable interrogation of the same and/or different regions of the molecule. New analysis methods were developed to produce consensus alignments from each multi-scan event. The combined multi-scanning and multi-capture method was applied to the challenge of mapping from a heterogeneous mixture of single-molecule fragments that make up the Escherichia coli (E. coli) chromosome. Coverage of 3.1X across 2,355 resolvable sites (68% of reference sites) of the E. coli genome was achieved after 5.6 hours of recording time. The recapture method showed a 38% increase in the merged-event alignment length compared to single-scan alignments. The observed inter-tag resolution was 150 bp in engineered DNA molecules and 166 bp natively within fragments of E. coli DNA, with detection of 133 inter-site intervals shorter than 200 bp in the E. coli reference map. Proof of concept results on estimating distances in repetitive regions of the E. coli genome are also provided. With an appropriately designed array and future refinements to the control logic, higher throughput implementations can enable human-sized genome and epigenome mapping applications.
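The abstract above reports inter-tag resolution in base pairs and counts the reference-map intervals shorter than 200 bp. As a minimal sketch of that bookkeeping, the Python below computes distances between consecutive tag sites and counts the short ones. The function names and the toy positions are hypothetical illustrations, not the authors' analysis code or data.

```python
from typing import List


def inter_site_intervals(site_positions: List[int]) -> List[int]:
    """Return distances (bp) between consecutive tag sites on a linear map."""
    ordered = sorted(site_positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def count_short_intervals(site_positions: List[int], threshold_bp: int = 200) -> int:
    """Count consecutive-site intervals strictly shorter than threshold_bp."""
    return sum(1 for d in inter_site_intervals(site_positions) if d < threshold_bp)


# Hypothetical toy positions (bp); NOT taken from the E. coli reference map.
toy_sites = [1000, 1150, 1900, 2066, 5000]
print(inter_site_intervals(toy_sites))
print(count_short_intervals(toy_sites))
```

Sorting first means the input need not be ordered; a circular chromosome would also need the wrap-around interval, which this sketch deliberately omits.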

2021 ◽  
Author(s):  
Yusuke Takahashi ◽  
Massa Shoura ◽  
Andrew Fire ◽  
Shinichi Morishita

Abstract Background Single-molecule measurements of DNA polymerization kinetics provide a sensitive means to detect both secondary structures in DNA and deviations from primary chemical structure as a result of modified bases. In one approach to such analysis, deviations can be inferred by monitoring the behavior of DNA polymerase using single-molecule, real-time sequencing with zero-mode waveguides. This approach measures the time between fluorescence pulse signals from consecutive nucleosides incorporated during DNA replication, called the interpulse duration (IPD). Results In this paper we present an analysis of loci with high IPDs in two genomes, a bacterial genome (E. coli) and a eukaryotic genome (C. elegans). To distinguish the potential effects of DNA modification on DNA polymerization speed, we paired an analysis of native genomic DNA with whole-genome amplified (WGA) material in which DNA modifications were effectively removed. Modification sites for E. coli are known and we observed the expected IPD shifts at these sites in the native but not WGA samples. For C. elegans, such differences were not observed. Instead, we found a number of novel sequence contexts where IPDs were raised relative to the average IPDs for each of the four nucleotides, but for which the raised IPD was present in both native and WGA samples. Conclusion The latter results argue strongly against DNA modification as the underlying driver for high IPD segments for C. elegans, and provide a framework for separating effects of DNA modification from context-dependent DNA polymerase kinetic patterns inherent in underlying DNA sequence for a complex eukaryotic genome.
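The native-versus-WGA comparison described above can be illustrated with a small sketch. It normalizes each IPD by the mean IPD of its nucleotide, then labels a position by whether the elevated ratio appears in the native sample only (consistent with a modification) or in both samples (consistent with sequence context). The function names, the cutoff of 2.0, and the toy values are all assumptions made for illustration; the paper's actual statistics are not given in the abstract.

```python
from collections import defaultdict
from statistics import mean


def ipd_ratios(bases, ipds):
    """Divide each IPD by the mean IPD observed for the same nucleotide."""
    by_base = defaultdict(list)
    for base, value in zip(bases, ipds):
        by_base[base].append(value)
    base_mean = {base: mean(values) for base, values in by_base.items()}
    return [value / base_mean[base] for base, value in zip(bases, ipds)]


def classify_positions(bases, native_ipds, wga_ipds, ratio_cutoff=2.0):
    """Label each position from its native and WGA IPD ratios.

    ratio_cutoff is a hypothetical threshold chosen for this sketch.
    """
    labels = []
    for rn, rw in zip(ipd_ratios(bases, native_ipds), ipd_ratios(bases, wga_ipds)):
        if rn >= ratio_cutoff and rw >= ratio_cutoff:
            labels.append("sequence_context")        # raised in both samples
        elif rn >= ratio_cutoff:
            labels.append("modification_candidate")  # raised in native only
        else:
            labels.append("not_elevated")
    return labels


# Hypothetical toy data: eight positions, IPDs in arbitrary units.
bases = "AAAACCCC"
native = [1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 5.0]
wga = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0]
print(classify_positions(bases, native, wga))
```

In the toy data, position 3 is elevated only in the native sample while position 7 is elevated in both, mirroring the E. coli and C. elegans patterns the abstract contrasts.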


2020 ◽  
Author(s):  
Barbara Zehentner ◽  
Zachary Ardern ◽  
Michaela Kreitmeier ◽  
Siegfried Scherer ◽  
Klaus Neuhaus

SUMMARY The genetic code allows six reading frames at a double-stranded DNA locus, and many open reading frames (ORFs) overlap extensively with ORFs of annotated genes (e.g., at least 30 bp or having an embedded ORF). Currently, bacterial genome annotation systematically discards embedded overlapping ORFs of genes (OLGs) due to an assumed information-content constraint, and, consequently, very few OLGs are known. Here we use strand-specific RNAseq and ribosome profiling, detecting about 200 embedded or partially overlapping ORFs of gene candidates in the pathogen E. coli O157:H7 EDL933. These are typically short, many of them show clear promoter motifs as determined by Cappable-seq, indistinguishable from those of annotated genes, and are expressed at a low level. We could express most of them as stable proteins, and 49 displayed a potential phenotype. Ribosome profiling analyses in three other E. coli strains predicted between 84 and 190 embedded antisense OLGs per strain except in E. coli K-12, which is an atypical lab strain. We also found evidence of homology to annotated genes for 100 to 300 OLGs per E. coli strain investigated. Based on this evidence we suggest that bacterial OLGs deserve attention with respect to genome annotation and coding complexity of bacterial genomes. Such sequences may constitute an important coding reserve, opening up new research in genetics and evolutionary biology.
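The six reading frames mentioned above (three per strand) can be enumerated with a short scan. The sketch below is a generic textbook ORF finder, assuming ATG as the only start codon and the three standard stop codons; it is not the annotation pipeline used in the study, and the function names and `min_codons` parameter are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOPS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_frame(seq, frame, min_codons=2):
    """Yield (start, end) for ATG...stop ORFs in one frame (0-based, end exclusive)."""
    start = None
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if start is None and codon == "ATG":
            start = i
        elif start is not None and codon in STOPS:
            if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                yield (start, i + 3)
            start = None


def six_frame_orfs(seq, min_codons=2):
    """Collect ORFs from all six frames.

    Minus-strand coordinates are relative to the reverse-complemented sequence.
    """
    seq = seq.upper()
    found = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for start, end in orfs_in_frame(s, frame, min_codons):
                found.append((strand, frame, start, end))
    return found


print(six_frame_orfs("ATGAAATAG"))
```

Overlapping genes of the kind the abstract describes would show up here as ORFs from different frames or strands whose coordinates intersect once minus-strand positions are mapped back to the forward strand.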


2008 ◽  
Author(s):  
Henk Bolink ◽  
Rubén D. Costa ◽  
Enrique Orti ◽  
Michele Sessolo ◽  
Stefan Graber ◽  
...  

Author(s):  
Fabrice Pointillart ◽  
Bertrand Lefeuvre ◽  
Carlo Andrea Mattei ◽  
Jessica Flores Gonzalez ◽  
Frédéric Gendron ◽  
...  

Author(s):  
Eric S Tvedte ◽  
Mark Gasser ◽  
Benjamin C Sparklin ◽  
Jane Michalski ◽  
Carl E Hjelmen ◽  
...  

Abstract The newest generation of DNA sequencing technology is highlighted by the ability to generate sequence reads hundreds of kilobases in length. Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT) have pioneered competitive long read platforms, with more recent work focused on improving sequencing throughput and per-base accuracy. We used whole-genome sequencing data produced by three PacBio protocols (Sequel II CLR, Sequel II HiFi, RS II) and two ONT protocols (Rapid Sequencing and Ligation Sequencing) to compare assemblies of the bacteria Escherichia coli and the fruit fly Drosophila ananassae. In both organisms tested, Sequel II assemblies had the highest consensus accuracy, even after accounting for differences in sequencing throughput. ONT and PacBio CLR had the longest reads sequenced compared to PacBio RS II and HiFi, and genome contiguity was highest when assembling these datasets. ONT Rapid Sequencing libraries had the fewest chimeric reads in addition to superior quantification of E. coli plasmids versus ligation-based libraries. The quality of assemblies can be enhanced by adopting hybrid approaches using Illumina libraries for bacterial genome assembly or polishing eukaryotic genome assemblies, and an ONT-Illumina hybrid approach would be more cost-effective for many users. Genome-wide DNA methylation could be detected using both technologies, however ONT libraries enabled the identification of a broader range of known E. coli methyltransferase recognition motifs in addition to undocumented D. ananassae motifs. The ideal choice of long read technology may depend on several factors including the question or hypothesis under examination. No single technology outperformed others in all metrics examined.


Author(s):  
Daniella F Lato ◽  
G Brian Golding

Abstract Increasing evidence supports the notion that different regions of a genome have unique rates of molecular change. This variation is particularly evident in bacterial genomes, where previous studies have reported that gene expression and essentiality tend to decrease, while substitution rates usually increase, with increasing distance from the origin of replication. Genomic reorganizations such as rearrangements occur frequently in bacteria and allow for the introduction and restructuring of genetic content, creating gradients of molecular traits along genomes. Here, we explore the interplay of these phenomena by mapping substitutions to the genomes of Escherichia coli, Bacillus subtilis, Streptomyces, and Sinorhizobium meliloti, quantifying how many substitutions have occurred at each position in the genome. Preceding work indicates that substitution rate significantly increases with distance from the origin. Using a larger sample size and accounting for genome rearrangements through ancestral reconstruction, our analysis demonstrates that the correlation between the number of substitutions and distance from the origin of replication is often significant but small and inconsistent in direction. Some replicons had a significantly decreasing trend (E. coli and the chromosome of S. meliloti), while others showed the opposite significant trend (B. subtilis, Streptomyces, pSymA and pSymB in S. meliloti). dN, dS and ω were examined across all genes and there was no significant correlation between those values and distance from the origin. This study highlights the impact that genomic rearrangements and location have on molecular trends in some bacteria, illustrating the importance of considering spatial trends in molecular evolutionary analysis. Assuming that molecular trends are exclusively in one direction can be problematic.
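To make the quantity "distance from the origin of replication" concrete, the sketch below measures the shorter arc between a position and the origin on a circular replicon and then correlates that distance with per-position substitution counts. A plain Pearson correlation is used here purely as a stand-in; the abstract does not specify the authors' statistical model, and the function names and toy numbers are hypothetical.

```python
from math import sqrt


def distance_from_origin(pos, origin, genome_len):
    """Shortest distance (bp) along a circular replicon from origin to pos."""
    d = abs(pos - origin) % genome_len
    return min(d, genome_len - d)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy replicon: 1000 bp, origin at position 100.
positions = [100, 300, 600, 900]
substitutions = [0, 2, 5, 2]  # made-up counts per position
distances = [distance_from_origin(p, 100, 1000) for p in positions]
print(distances)
print(pearson_r(distances, substitutions))
```

Taking the minimum of the two arcs reflects the assumption of bidirectional replication from a single origin; replicons with other replication modes would need a different distance definition.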


Microbiome ◽  
2021 ◽  
Vol 9 (1) ◽  
Author(s):  
David Pellow ◽  
Alvah Zorea ◽  
Maraike Probst ◽  
Ori Furman ◽  
Arik Segal ◽  
...  

Abstract Background Metagenomic sequencing has led to the identification and assembly of many new bacterial genome sequences. These bacteria often contain plasmids: usually small, circular double-stranded DNA molecules that may transfer across bacterial species and confer antibiotic resistance. These plasmids are generally less studied and understood than their bacterial hosts. Part of the reason for this is insufficient computational tools enabling the analysis of plasmids in metagenomic samples. Results We developed SCAPP (Sequence Contents-Aware Plasmid Peeler)—an algorithm and tool to assemble plasmid sequences from metagenomic sequencing. SCAPP builds on some key ideas from the Recycler algorithm while improving plasmid assemblies by integrating biological knowledge about plasmids. We compared the performance of SCAPP to Recycler and metaplasmidSPAdes on simulated metagenomes, real human gut microbiome samples, and a human gut plasmidome dataset that we generated. We also created plasmidome and metagenome data from the same cow rumen sample and used the parallel sequencing data to create a novel assessment procedure. Overall, SCAPP outperformed Recycler and metaplasmidSPAdes across this wide range of datasets. Conclusions SCAPP is an easy to use Python package that enables the assembly of full plasmid sequences from metagenomic samples. It outperformed existing metagenomic plasmid assemblers in most cases and assembled novel and clinically relevant plasmids in samples we generated such as a human gut plasmidome. SCAPP is open-source software available from: https://github.com/Shamir-Lab/SCAPP.


2021 ◽  
Vol 22 (3) ◽  
pp. 1018
Author(s):  
Hiroaki Yokota

Helicases are nucleic acid-unwinding enzymes that are involved in the maintenance of genome integrity. Several parts of the amino acid sequences of helicases are very similar, and these quite well-conserved amino acid sequences are termed “helicase motifs”. Previous studies by X-ray crystallography and single-molecule measurements have suggested a common underlying mechanism for their function. These studies indicate the role of the helicase motifs in unwinding nucleic acids. In contrast, the sequence and length of the C-terminal amino acids of helicases are highly variable. In this paper, I review past and recent studies that proposed helicase mechanisms and studies that investigated the roles of the C-terminal amino acids on helicase and dimerization activities, primarily on the non-hexameric Escherichia coli (E. coli) UvrD helicase. Then, I center on my recent study of single-molecule direct visualization of a UvrD mutant lacking the C-terminal 40 amino acids (UvrDΔ40C) used in studies proposing the monomer helicase model. The study demonstrated that multiple UvrDΔ40C molecules jointly participated in DNA unwinding, presumably by forming an oligomer. Thus, the single-molecule observation addressed how the C-terminal amino acids affect the number of helicases bound to DNA, oligomerization, and unwinding activity, which can be applied to other helicases.


2021 ◽  
Vol 16 (1) ◽  
Author(s):  
Kingshuk Mukherjee ◽  
Massimiliano Rossi ◽  
Leena Salmela ◽  
Christina Boucher

Abstract Genome-wide optical maps are high-resolution restriction maps that give a unique numeric representation to a genome. They are produced by assembling hundreds of thousands of single-molecule optical maps, which are called Rmaps. Unfortunately, there are very few choices for assembling Rmap data. There exists only one publicly available non-proprietary method for assembly and one proprietary software package that is available via an executable. Furthermore, the publicly available method, by Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006), follows the overlap-layout-consensus (OLC) paradigm and is therefore unable to scale to relatively large genomes. The algorithm behind the proprietary method, Bionano Genomics’ Solve, is largely unknown. In this paper, we extend the definition of bi-labels in the paired de Bruijn graph to the context of optical mapping data, and present the first de Bruijn graph based method for Rmap assembly. We implement our approach, which we refer to as rmapper, and compare its performance against the assembler of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) and Solve by Bionano Genomics on data from three genomes: E. coli, human, and climbing perch fish (Anabas testudineus). Our method was able to successfully run on all three genomes. The method of Valouev et al. (Proc Natl Acad Sci USA 103(43):15770–15775, 2006) only successfully ran on E. coli. Moreover, on the human genome rmapper was at least 130 times faster than Bionano Solve, used five times less memory and produced the highest genome fraction with zero mis-assemblies. Our software, rmapper, is written in C++ and is publicly available under the GNU General Public License at https://github.com/kingufl/Rmapper.


2017 ◽  
Vol 112 (3) ◽  
pp. 524a
Author(s):  
Pradeep Sathyanarayana ◽  
Satyaghosh Maurya ◽  
Ganapathy Ayappa ◽  
Sandhya S. Visweswariah ◽  
Rahul Roy
