Genome-wide identification of 5-methylcytosine sites in bacterial genomes by high-throughput sequencing of MspJI restriction fragments

Single-molecule Real-Time (SMRT) sequencing can easily identify sites of N6-methyladenine and N4-methylcytosine within DNA sequences, but similar identification of 5-methylcytosine sites is not as straightforward. In prokaryotic DNA, methylation typically occurs within specific sequence contexts, or motifs, that are a property of the methyltransferases that “write” these epigenetic marks. We present here a straightforward, cost-effective alternative to both SMRT and bisulfite sequencing for the determination of prokaryotic 5-methylcytosine methylation motifs. The method, called MFRE-Seq, relies on excision and isolation of fully methylated fragments of predictable size using MspJI-Family Restriction Enzymes (MFREs), which depend on the presence of 5-methylcytosine for cleavage. We demonstrate that MFRE-Seq is compatible with both Illumina and Ion Torrent sequencing platforms and requires only a digestion step and simple column purification of size-selected digest fragments prior to standard library preparation procedures. We applied MFRE-Seq to numerous bacterial and archaeal genomic DNA preparations and successfully confirmed known motifs and identified novel ones. This method should be a useful complement to existing methodologies for studying prokaryotic methylomes and characterizing the contributing methyltransferases.
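The motif-calling idea can be sketched in a few lines. This is only an illustration, not the authors' pipeline: it assumes the size-selected fragments have already been sequenced, are of equal length, and carry the methylated site at a fixed position. The function name `consensus_motif`, the `min_fraction` threshold, and the toy fragments are all hypothetical.

```python
from collections import Counter


def consensus_motif(fragments, min_fraction=0.9):
    """Return a consensus string for equal-length fragments.

    Positions where one base reaches `min_fraction` keep that base;
    all other positions are reported as 'N'.
    """
    if not fragments:
        raise ValueError("no fragments given")
    length = len(fragments[0])
    if any(len(f) != length for f in fragments):
        raise ValueError("fragments must all have the same length")

    consensus = []
    for column in zip(*fragments):
        base, count = Counter(column).most_common(1)[0]
        conserved = count / len(fragments) >= min_fraction
        consensus.append(base if conserved else "N")
    return "".join(consensus)


# Hypothetical toy data: four short fragments sharing a central CCAGG.
toy_fragments = [
    "ATCCAGGTA",
    "GGCCAGGAC",
    "TACCAGGCT",
    "CTCCAGGTG",
]
print(consensus_motif(toy_fragments))  # NNCCAGGNN
```

Real data would need many more fragments and a statistical test against the genome's background base composition; the threshold here is only a placeholder.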


10.1101/2021.02.10.430591 ◽

2021 ◽

Author(s):

Brian P. Anton ◽

Alexey Fomenkov ◽

Victoria Wu ◽

Richard J. Roberts

Keyword(s):

Single Molecule ◽

DNA Sequences ◽

High Throughput Sequencing ◽

Cost Effective ◽

Restriction Enzymes ◽

Specific Sequence ◽

Genome Wide ◽

Cost Effective Alternative ◽

Simple Column ◽

Sequencing Platforms


Hairpin structure facilitates high-fidelity DNA amplification reactions in both qPCR and high-throughput sequencing

10.1101/2021.11.10.21266179 ◽

2021 ◽

Author(s):

Kerou Zhang ◽

Alessandro Pinto ◽

Peng Dai ◽

Michael Wang ◽

Lauren Yuxuan Cheng ◽

...

Keyword(s):

DNA Sequences ◽

High Throughput Sequencing ◽

Limit Of Detection ◽

DNA Polymerases ◽

DNA Amplification ◽

Cost Effective ◽

High Fidelity ◽

Resistance Mutations ◽

Chain Reactions ◽

Early Cancer Diagnosis

Effective polymerase chain reactions (PCR) are important in biological laboratories. Detecting rare DNA-sequence variants is essential for early cancer diagnosis and for identifying drug-resistance mutations. Some common quantitative PCR (qPCR) detection methods are restricted in their limit of detection (LoD) because of the high misincorporation rate of Taq DNA polymerases. High-fidelity (HiFi) DNA polymerases have a 50- to 250-fold higher fidelity, yet there are currently no proper designs for multiplexed HiFi qPCR reactions. Moreover, the popularity of highly multiplexed targeting of DNA sequences requires minimizing PCR side products, as the potential for dimerization grows quadratically with the number of primers. Previous efforts either added a step, were technology-specific, or required high-level computing skills; an easy-to-apply and cost-effective method for reducing dimerization is lacking. Here, we present the Occlusion System, composed of a 5'-overhanged primer and a probe with a short-stem hairpin. We demonstrate that it allows multiplexed high-fidelity qPCR and is compatible with the current variant-enrichment method, improving the LoD by 10-fold. Further, we found that the Occlusion System reduced dimerization by up to 10-fold in highly multiplexed PCR. Thus, the Occlusion System satisfactorily improved both qPCR sensitivity and PCR efficiency.
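The quadratic growth in dimerization potential is simple pair counting. The sketch below is an illustration only; the function name is hypothetical, and it assumes one forward and one reverse primer per target and that any two primers (including two copies of the same primer) could in principle dimerize.

```python
def potential_dimer_pairs(n_primers, include_self=True):
    """Count unordered primer pairs that could, in principle, form dimers."""
    if n_primers < 0:
        raise ValueError("n_primers must be non-negative")
    if include_self:
        # n choose 2 hetero-pairs plus n self-pairs
        return n_primers * (n_primers + 1) // 2
    return n_primers * (n_primers - 1) // 2


for plex in (1, 10, 100):
    n = 2 * plex  # assumption: one forward and one reverse primer per target
    print(plex, n, potential_dimer_pairs(n))
# 1 2 3
# 10 20 210
# 100 200 20100
```

A 100-fold increase in plex raises the number of candidate pairs by roughly 6,700-fold in this toy count, which is why side-product suppression matters more as panels grow.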


Specific chromatin changes mark lateral organ founder cells in the Arabidopsis inflorescence meristem

Journal of Experimental Botany ◽

10.1093/jxb/erz181 ◽

2019 ◽

Vol 70 (15) ◽

pp. 3867-3879 ◽

Cited By ~ 7

Author(s):

Anneke Frerichs ◽

Julia Engelhorn ◽

Janine Altmüller ◽

Jose Gutierrez-Marcos ◽

Wolfgang Werr

Keyword(s):

DNA Sequences ◽

High Throughput Sequencing ◽

Gene Activation ◽

Regulatory Elements ◽

Inflorescence Meristem ◽

Genome Wide ◽

A Genome ◽

Hypersensitive Sites ◽

Lateral Organ ◽

Founder Cells

Fluorescence-activated cell sorting (FACS) and assay for transposase-accessible chromatin with high-throughput sequencing (ATAC-seq) were combined to analyse the chromatin state of lateral organ founder cells (LOFCs) in the peripheral zone of the Arabidopsis apetala1-1 cauliflower-1 double mutant inflorescence meristem. On a genome-wide level, we observed a striking correlation between transposase hypersensitive sites (THSs) detected by ATAC-seq and DNase I hypersensitive sites (DHSs). The mostly expanded DHSs were often substructured into several individual THSs, which correlated with phylogenetically conserved DNA sequences or enhancer elements. Comparing chromatin accessibility with available RNA-seq data, THS change configuration was reflected by gene activation or repression and chromatin regions acquired or lost transposase accessibility in direct correlation with gene expression levels in LOFCs. This was most pronounced immediately upstream of the transcription start, where genome-wide THSs were abundant in a complementary pattern to established H3K4me3 activation or H3K27me3 repression marks. At this resolution, the combined application of FACS/ATAC-seq is widely applicable to detect chromatin changes during cell-type specification and facilitates the detection of regulatory elements in plant promoters.
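The comparison of THSs with DHSs comes down to interval overlap. The following is a minimal sketch under stated assumptions, not the authors' analysis: intervals are half-open `(start, end)` tuples on a single chromosome, the function names are hypothetical, and the coordinates are invented for illustration.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def fraction_overlapping(query, reference):
    """Fraction of query intervals that overlap any reference interval."""
    if not query:
        return 0.0
    hits = sum(1 for q in query if any(overlaps(q, r) for r in reference))
    return hits / len(query)


def substructure_counts(query, reference):
    """For each reference interval, count the query intervals inside or across it."""
    return [sum(1 for q in query if overlaps(q, r)) for r in reference]


# Hypothetical coordinates: one broad DHS containing several narrow THSs.
toy_dhs = [(100, 500), (900, 1000)]
toy_ths = [(120, 180), (300, 350), (450, 520), (700, 750)]

print(fraction_overlapping(toy_ths, toy_dhs))  # 0.75
print(substructure_counts(toy_ths, toy_dhs))   # [3, 0]
```

The quadratic scan is fine for a toy example; genome-scale peak sets would normally use sorted intervals or an interval tree.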


Kohdista: an efficient method to index and query possible Rmap alignments

Algorithms for Molecular Biology ◽

10.1186/s13015-019-0160-9 ◽

2019 ◽

Vol 14 (1) ◽

Cited By ~ 1

Author(s):

Martin D. Muggli ◽

Simon J. Puglisi ◽

Christina Boucher

Keyword(s):

Single Molecule ◽

Restriction Enzymes ◽

Consensus Approach ◽

E Coli ◽

Genome Wide ◽

Alignment Problem ◽

Optical Map ◽

Optical Maps ◽

Map Data ◽

Genomic Regions

Background: Genome-wide optical maps are ordered high-resolution restriction maps that give the position of occurrence of restriction cut sites corresponding to one or more restriction enzymes. These genome-wide optical maps are assembled using an overlap-layout-consensus approach using raw optical map data, which are referred to as Rmaps. Due to the high error-rate of Rmap data, finding the overlap between Rmaps remains challenging. Results: We present Kohdista, which is an index-based algorithm for finding pairwise alignments between single molecule maps (Rmaps). The novelty of our approach is the formulation of the alignment problem as automaton path matching, and the application of modern index-based data structures. In particular, we combine the use of the Generalized Compressed Suffix Array (GCSA) index with the wavelet tree in order to build Kohdista. We validate Kohdista on simulated E. coli data, showing the approach successfully finds alignments between Rmaps simulated from overlapping genomic regions. Conclusion: We demonstrate Kohdista is the only method that is capable of finding a significant number of high quality pairwise Rmap alignments for large eukaryote organisms in reasonable time.
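To make the data type concrete, an error-free Rmap can be simulated as the ordered list of fragment lengths between restriction sites. This sketch is not part of Kohdista; the function name is hypothetical, it cuts at the first base of each recognition site as a simplification, and it ignores the sizing errors and missing or spurious cuts that make real Rmap alignment hard.

```python
def in_silico_rmap(sequence, site):
    """Return ordered fragment lengths after cutting at each occurrence of `site`.

    Simplification: the cut is placed at the first base of the recognition
    site; real enzymes cut at a defined offset within or near the site.
    """
    if not site:
        raise ValueError("recognition site must be non-empty")
    cuts = []
    pos = sequence.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = sequence.find(site, pos + 1)
    boundaries = [0] + cuts + [len(sequence)]
    # Drop zero-length fragments (e.g. a site at the very start).
    return [b - a for a, b in zip(boundaries, boundaries[1:]) if b > a]


toy_sequence = "AAGAATTCTTTTGAATTCAA"  # hypothetical 20-base molecule
print(in_silico_rmap(toy_sequence, "GAATTC"))  # [2, 10, 8]
```

Two molecules from overlapping genomic regions would share a run of consecutive fragment lengths, which is the signal an Rmap aligner searches for under noise.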


A sorghum Practical Haplotype Graph facilitates genome-wide imputation and cost-effective genomic prediction

10.1101/775221 ◽

2019 ◽

Author(s):

Sarah E. Jensen ◽

Jean Rigaud Charles ◽

Kebede Muleta ◽

Peter Bradbury ◽

Terry Casstevens ◽

...

Keyword(s):

Genomic Selection ◽

Genomic Prediction ◽

Sequence Data ◽

Input Sequence ◽

Genotyping By Sequencing ◽

Cost Effective ◽

Genome Wide ◽

Variant Information ◽

Sequencing Platforms ◽

Low Coverage

Successful management and utilization of increasingly large genomic datasets is essential for breeding programs to increase genetic gain and accelerate cultivar development. To help with data management and storage, we developed a sorghum Practical Haplotype Graph (PHG) pangenome database that stores all identified haplotypes and variant information for a given set of individuals. We developed two PHGs in sorghum, one with 24 individuals and another with 398 individuals, that reflect the diversity across genic regions of the sorghum genome. Twenty-four founders of the Chibas sorghum breeding program were sequenced at low coverage (0.01x) and processed through the PHG to identify genome-wide variants. The PHG called SNPs with only 5.9% error at 0.01x coverage, only 3% lower than its accuracy when calling SNPs from 8x coverage sequence. Additionally, 207 progeny from the Chibas genomic selection (GS) training population were sequenced and processed through the PHG. Missing genotypes in the progeny were imputed from the parental haplotypes available in the PHG and used for genomic prediction. Mean prediction accuracies with PHG SNP calls range from 0.57 to 0.73 for different traits, and are similar to prediction accuracies obtained with genotyping-by-sequencing (GBS) or markers from sequencing targeted amplicons (rhAmpSeq). This study provides a proof of concept for using a sorghum PHG to call and impute SNPs from low-coverage sequence data and also shows that the PHG can unify genotype calls from different sequencing platforms. By reducing the amount of input sequence needed, the PHG has the potential to decrease the cost of genotyping for genomic selection, making GS more feasible and facilitating larger breeding populations that can capture maximum recombination. Our results demonstrate that the PHG is a useful research and breeding tool that can maintain variant information from a diverse group of taxa, store sequence data in a condensed but readily accessible format, unify genotypes from different genotyping methods, and provide a cost-effective option for genomic selection for any species.


The theory and practice of measuring broad-range recombination rate from marker selected pools

10.1101/762575 ◽

2019 ◽

Author(s):

Kevin H.-C. Wei ◽

Aditya Mantha ◽

Doris Bachtrog

Keyword(s):

Genetic Distance ◽

Allele Frequency ◽

Recombination Rate ◽

High Throughput Sequencing ◽

Genetic Material ◽

Cost Effective ◽

Theory And Practice ◽

Rate Variation ◽

Sequencing Data ◽

Genome Wide

Recombination is the exchange of genetic material between homologous chromosomes via physical crossovers. Pioneered by T. H. Morgan and A. Sturtevant over a century ago, methods to estimate recombination rate and genetic distance require scoring large numbers of recombinant individuals between molecular or visible markers. While high-throughput sequencing methods have allowed for genome-wide crossover detection producing high-resolution maps, such methods rely on large numbers of individually sequenced recombinants and are therefore difficult to scale. Here, we present a simple and scalable method to infer near chromosome-wide recombination rate from marker-selected pools and the corresponding analytical software MarSuPial. Rather than genotyping individuals from recombinant backcrosses, we bulk sequence marker-selected pools to infer the allele frequency decay around the selected locus; since the number of recombinant individuals increases proportionally to the genetic distance from the selected locus, the allele frequency across the chromosome can be used to estimate the genetic distance and recombination rate. We mathematically demonstrate the relationship between allele frequency attenuation, recombinant fraction, genetic distance, and recombination rate in marker-selected pools. Based on available chromosome-wide recombination rate models of Drosophila, we simulated read counts and determined that nonlinear local regressions (LOESS) produce robust estimates despite the high noise inherent to sequencing data. To empirically validate this approach, we show that (single) marker-selected pools closely recapitulate genetic distances inferred from scoring recombinants between double markers. We theoretically determine how secondary loci with viability impacts can modulate the allele frequency decay and how to account for such effects directly from the data. We generated the recombinant maps of three wild-derived strains, which strongly correlate with previous genome-wide measurements. Interestingly, amidst extensive recombination rate variation, multiple regions of the genomes show elevated rates across all strains. Lastly, we apply this method to estimate chromosome-wide crossover interference. Altogether, we find that marker-selected pools are a simple and cost-effective method for broad recombination rate estimates. Although it does not identify instances of crossovers, it can generate near chromosome-wide recombination maps in as little as one or two libraries.
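The link between allele frequency decay and genetic distance can be illustrated with a small calculation. This is a sketch under assumptions that are not taken from the paper or from MarSuPial: a simple backcross in which every pooled individual carries the selected allele on one chromosome (so its frequency is 0.5 at the selected locus and (1 - r)/2 at recombinant fraction r), no viability effects, and Haldane's mapping function (no crossover interference). The function names are hypothetical.

```python
import math


def recombinant_fraction(allele_frequency):
    """Recombinant fraction r implied by a pooled allele frequency f.

    Assumed backcross model: f = (1 - r) / 2, hence r = 1 - 2 f.
    """
    r = 1.0 - 2.0 * allele_frequency
    if not 0.0 <= r <= 0.5:
        raise ValueError("frequency outside the range expected under this model")
    return r


def haldane_distance_cm(r):
    """Map distance in centimorgans from recombinant fraction (Haldane)."""
    if not 0.0 <= r < 0.5:
        raise ValueError("r must be in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical pooled frequencies at increasing distance from the selected locus.
for f in (0.50, 0.45, 0.40, 0.30):
    r = recombinant_fraction(f)
    print(f, round(r, 3), round(haldane_distance_cm(r), 2))
```

In practice the per-site frequencies are noisy read-count ratios, which is why the abstract describes smoothing them with LOESS before converting to distances.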


Confirmation of the Sequence of ‘Candidatus Liberibacter asiaticus’ and Assessment of Microbial Diversity in Huanglongbing-Infected Citrus Phloem Using a Metagenomic Approach

Molecular Plant-Microbe Interactions ◽

10.1094/mpmi-22-12-1624 ◽

2009 ◽

Vol 22 (12) ◽

pp. 1624-1634 ◽

Cited By ~ 66

Author(s):

Heather L. Tyler ◽

Luiz F. W. Roesch ◽

Siddarame Gowda ◽

William O. Dawson ◽

Eric W. Triplett

Keyword(s):

DNA Sequences ◽

High Throughput Sequencing ◽

Metagenomic Data ◽

Candidatus Liberibacter Asiaticus ◽

Metagenomic DNA ◽

RNA Sequences ◽

Culture Independent ◽

Candidatus Liberibacter ◽

Liberibacter Asiaticus ◽

Sequencing Platforms

The citrus disease Huanglongbing (HLB) is highly destructive in many citrus-growing regions of the world. The putative causal agent of this disease, ‘Candidatus Liberibacter asiaticus’, is difficult to culture, and Koch's postulates have not yet been fulfilled. As a result, efforts have focused on obtaining the genome sequence of ‘Ca. L. asiaticus’ in order to give insight into the physiology of this organism. In this work, three next-generation high-throughput sequencing platforms, 454, Solexa, and SOLiD, were used to obtain metagenomic DNA sequences from phloem tissue of Florida citrus trees infected with HLB. A culture-independent, polymerase chain reaction (PCR)-independent analysis of 16S ribosomal RNA sequences showed that the only bacterium present within the phloem metagenome was ‘Ca. L. asiaticus’. No viral or viroid sequences were identified within the metagenome. By reference assembly, the phloem metagenome contained sequences that provided 26-fold coverage of the ‘Ca. L. asiaticus’ contigs in GenBank. By the same approach, phloem metagenomic data yielded less than 0.2-fold coverage of five other alphaproteobacterial genomes. Thus, phloem metagenomic DNA provided a PCR-independent means of verifying the presence of ‘Ca. L. asiaticus’ in infected tissue and strongly suggests that no other disease agent was present in phloem. Analysis of these metagenomic data suggests that this approach has a detection limit of one ‘Ca. Liberibacter’ cell for every 52 phloem cells. The phloem sample sequenced here is estimated to have contained 1.7 ‘Ca. Liberibacter’ cells per phloem cell.
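Fold coverage of a reference is the total number of aligned bases divided by the reference length. The sketch below only illustrates that arithmetic; the function name is hypothetical and the read and reference sizes are invented round numbers, not values from the study.

```python
def fold_coverage(aligned_read_lengths, reference_length):
    """Mean depth: total aligned bases divided by reference length."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return sum(aligned_read_lengths) / reference_length


# Hypothetical numbers: 260,000 aligned reads of 100 bases on a 1 Mb reference.
toy_reads = [100] * 260_000
print(fold_coverage(toy_reads, 1_000_000))  # 26.0
```

The same calculation run against an unrelated genome would give a value near zero, which is the contrast the abstract draws between the target and the five other alphaproteobacterial genomes.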


DNA Metabarcoding for the Characterization of Terrestrial Microbiota—Pitfalls and Solutions

Microorganisms ◽

10.3390/microorganisms9020361 ◽

2021 ◽

Vol 9 (2) ◽

pp. 361

Author(s):

Davide Francioli ◽

Guillaume Lentendu ◽

Simon Lewin ◽

Steffen Kolb

Keyword(s):

High Throughput Sequencing ◽

Methodological Approach ◽

Soil Microbial Communities ◽

Cost Effective ◽

Trophic Levels ◽

Gene Markers ◽

DNA Metabarcoding ◽

Terrestrial Environments ◽

Sequencing Platforms

Soil-borne microbes are major ecological players in terrestrial environments since they cycle organic matter, channel nutrients across trophic levels and influence plant growth and health. Therefore, the identification, taxonomic characterization and determination of the ecological role of members of soil microbial communities have become major topics of interest. The development and continuous improvement of high-throughput sequencing platforms have further stimulated the study of complex microbiota in soils and plants. The most frequently used approach to study microbiota composition, diversity and dynamics is polymerase chain reaction (PCR), amplifying specific taxonomically informative gene markers with the subsequent sequencing of the amplicons. This methodological approach is called DNA metabarcoding. Over the last decade, DNA metabarcoding has rapidly emerged as a powerful and cost-effective method for the description of microbiota in environmental samples. However, this approach involves several processing steps, each of which might introduce significant biases that can considerably compromise the reliability of the metabarcoding output. The aim of this review is to provide state-of-the-art background knowledge needed to make appropriate decisions at each step of a DNA metabarcoding workflow, highlighting crucial steps that, if considered, ensure an accurate and standardized characterization of microbiota in environmental studies.


Nanopore sequencing of RNA and cDNA molecules in Escherichia coli

RNA ◽

10.1261/rna.078937.121 ◽

2021 ◽

pp. rna.078937.121

Author(s):

Felix Grünberger ◽

Sébastien Ferreira-Cerca ◽

Dina Grohmann

Keyword(s):

Escherichia Coli ◽

RNA Sequencing ◽

Single Molecule ◽

High Throughput Sequencing ◽

Model Organism ◽

Cost Effective ◽

RNA Seq ◽

Sequencing Platform ◽

Quantitative Measurements ◽

Oxford Nanopore

High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely employed to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that (PCR)-cDNA-seq offers improved yield and accuracy compared to direct RNA sequencing. Notably, (PCR)-cDNA-seq is suitable for quantitative measurements and can be readily used for simultaneous and accurate detection of transcript 5′ and 3′ boundaries, analysis of transcriptional units and transcriptional heterogeneity. In summary, based on our comprehensive study, we show Nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features. Thereby Nanopore RNA-seq holds the potential to become a valuable alternative method for RNA analysis in prokaryotes.
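One way to picture boundary detection from full-length reads is to pile up read end positions and keep those with repeated support. This is a hedged illustration only and not the workflow described in the study: it assumes reads are already aligned, strand-specific, and reduced to a list of 5′ (or 3′) end coordinates, and the function name, `min_support` threshold, and coordinates are hypothetical.

```python
from collections import Counter


def boundary_candidates(read_ends, min_support=2):
    """Return (position, count) pairs with at least `min_support` reads,
    sorted by descending support and then by position."""
    counts = Counter(read_ends)
    supported = [(pos, n) for pos, n in counts.items() if n >= min_support]
    return sorted(supported, key=lambda item: (-item[1], item[0]))


# Hypothetical 5' end coordinates of reads mapped to one strand.
toy_five_prime_ends = [1002, 1000, 1000, 1001, 1000, 1450, 1000, 1450]
print(boundary_candidates(toy_five_prime_ends))  # [(1000, 4), (1450, 2)]
```

Real data would call for clustering nearby positions and correcting for read truncation, both of which are omitted here.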


Nanopore sequencing of RNA and cDNA molecules expands the transcriptomic toolbox in prokaryotes

10.1101/2021.06.14.448286 ◽

2021 ◽

Author(s):

Felix Gruenberger ◽

Sebastien Ferreira-Cerca ◽

Dina Grohmann

Keyword(s):

RNA Sequencing ◽

Single Molecule ◽

High Throughput Sequencing ◽

Model Organism ◽

Cost Effective ◽

RNA Seq ◽

Sequencing Platform ◽

Transcript Quantification ◽

Oxford Nanopore ◽

Bacterial Model

High-throughput sequencing dramatically changed our view of transcriptome architectures and allowed for ground-breaking discoveries in RNA biology. Recently, sequencing of full-length transcripts based on the single-molecule sequencing platform from Oxford Nanopore Technologies (ONT) was introduced and is widely employed to sequence eukaryotic and viral RNAs. However, experimental approaches implementing this technique for prokaryotic transcriptomes remain scarce. Here, we present an experimental and bioinformatic workflow for ONT RNA-seq in the bacterial model organism Escherichia coli, which can be applied to any microorganism. Our study highlights critical steps of library preparation and computational analysis and compares the results to gold standards in the field. Furthermore, we comprehensively evaluate the applicability and advantages of different ONT-based RNA sequencing protocols, including direct RNA, direct cDNA, and PCR-cDNA. We find that cDNA-seq offers improved yield and accuracy without bias in quantification compared to direct RNA sequencing. Notably, cDNA-seq can be readily used for simultaneous transcript quantification, accurate detection of transcript 5′ and 3′ boundaries, analysis of transcriptional units and transcriptional heterogeneity. In summary, we establish Nanopore RNA-seq to be a ready-to-use tool allowing rapid, cost-effective, and accurate annotation of multiple transcriptomic features thereby advancing it to become a standard method for RNA analysis in prokaryotes.
