Episo: quantitative estimation of RNA 5-methylcytosine at isoform level by high-throughput sequencing of RNA treated with bisulfite

Junfeng Liu; Ziyang An; Jianjun Luo; Jing Li; Feifei Li; Zhihua Zhang

doi:10.1093/bioinformatics/btz900

Bioinformatics ◽

10.1093/bioinformatics/btz900 ◽

2019 ◽

Vol 36 (7) ◽

pp. 2033-2039 ◽

Cited By ~ 2

Author(s):

Junfeng Liu ◽

Ziyang An ◽

Jianjun Luo ◽

Jing Li ◽

Feifei Li ◽

...

Keyword(s):

High Throughput ◽

High Throughput Sequencing ◽

Quantitative Estimation ◽

Supplementary Information ◽

Biological Processes ◽

Single Nucleotide ◽

RNA Immunoprecipitation ◽

Nucleotide Resolution ◽

Human And Mouse ◽

Single Nucleotide Resolution

Abstract Motivation RNA 5-methylcytosine (m5C) is a type of post-transcriptional modification that may be involved in numerous biological processes and tumorigenesis. RNA m5C can be profiled at single-nucleotide resolution by high-throughput sequencing of RNA treated with bisulfite (RNA-BisSeq). However, the transcriptome-wide profile and potential function of m5C in splicing remain to be elucidated due to the lack of an isoform-level m5C quantification tool. Results We developed a computational package to quantify Epitranscriptomal RNA m5C at the transcript isoform level (named Episo). Episo consists of three tools: mapper, quant and Bisulfitefq, for mapping, quantifying and simulating RNA-BisSeq data, respectively. The high accuracy of Episo was validated using an improved m5C-specific methylated RNA immunoprecipitation (meRIP) protocol, as well as a set of in silico experiments. By applying Episo to public human and mouse RNA-BisSeq data, we found that RNA m5C is not evenly distributed among the transcript isoforms, implying that m5C may be subject to regulation at the isoform level. Availability and implementation Episo is released under the GNU GPLv3+ license. The source code of Episo is freely accessible from https://github.com/liujunfengtop/Episo (with TopHat/Cufflinks) and https://github.com/liujunfengtop/Episo/tree/master/Episo_Kallisto (with Kallisto). Supplementary information Supplementary data are available at Bioinformatics online.

High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution

Nucleic Acids Research ◽

10.1093/nar/gkz569 ◽

2019 ◽

Vol 47 (18) ◽

pp. e103-e103 ◽

Cited By ~ 58

Author(s):

Benjamin J Callahan ◽

Joan Wong ◽

Cheryl Heiner ◽

Steve Oh ◽

Casey M Theriot ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

High Throughput ◽

Amplicon Sequencing ◽

Full Length ◽

rRNA Gene ◽

Single Nucleotide ◽

Full Complement ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

Abstract Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed Escherichia coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use.

LongAGE: defining breakpoints of genomic structural variants through optimal and memory efficient alignments of long reads

Bioinformatics ◽

10.1093/bioinformatics/btaa703 ◽

2020 ◽

Author(s):

Quang Tran ◽

Alexej Abyzov

Keyword(s):

Copy Number Variants ◽

Supplementary Information ◽

Segmental Duplications ◽

Structural Variations ◽

Single Nucleotide ◽

Long Reads ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution ◽

Genomic Structural Variants ◽

Memory Efficient

Abstract Summary Defining the precise location of structural variations (SVs) at single-nucleotide breakpoint resolution is a challenging problem due to large gaps in alignment. Previously, Alignment with Gap Excision (AGE) enabled us to define breakpoints of SVs at single-nucleotide resolution; however, AGE requires a vast amount of memory when aligning a pair of long sequences. To address this, we developed a memory-efficient implementation—LongAGE—based on the classical Hirschberg algorithm. We demonstrate an application of LongAGE for resolving breakpoints of SVs embedded into segmental duplications on Pacific Biosciences (PacBio) reads that can be longer than 10 kb. Furthermore, we observed different breakpoints for a deletion and a duplication in the same locus, providing direct evidence that such multi-allelic copy number variants (mCNVs) arise from two or more independent ancestral mutations. Availability and implementation LongAGE is implemented in C++ and available on Github at https://github.com/Coaxecva/LongAGE. Supplementary information Supplementary data are available at Bioinformatics online.

5-methylcytosine modification by Plasmodium NSUN2 stabilizes mRNA and mediates the development of gametocytes

10.1101/2021.06.06.447275 ◽

2021 ◽

Author(s):

Meng Liu ◽

Gangqiang Guo ◽

Pengge Qian ◽

Jianbing Mu ◽

Binbin Lu ◽

...

Keyword(s):

Translation Efficiency ◽

Biological Processes ◽

Malaria Parasites ◽

Single Nucleotide ◽

Dynamic Regulation ◽

Knock Out ◽

Transcript Stability ◽

Sexual Stages ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

5-methylcytosine (m5C) is an important epi-transcriptomic modification involved in mRNA stability and translation efficiency in various biological processes. However, it remains unclear if m5C modification contributes to the dynamic regulation of the transcriptome during the developmental cycles of Plasmodium parasites. Here, we characterize the landscape of m5C mRNA modifications at single nucleotide resolution in the asexual replication stages and gametocyte sexual stages of rodent (P. yoelii) and human (P. falciparum) malaria parasites. While different representations of m5C-modified mRNAs are associated with the different stages, the abundance of the m5C marker is strikingly enhanced in the transcriptomes of gametocytes. Our results show that m5C modifications confer stability to the Plasmodium transcripts and that a Plasmodium ortholog of NSUN2 is a major mRNA m5C methyltransferase in malaria parasites. Upon knock-out of P. yoelii nsun2 (pynsun2), marked reductions of m5C modification were observed in a panel of gametocytogenesis-associated transcripts. These reductions correlated with impaired gametocyte production in rodent and human malaria parasites. Restoration of the nsun2 gene in the knock-out parasites rescued the gametocyte production phenotype as well as m5C modification of the gametocytogenesis-associated transcripts. Together with the mRNA m5C profiles for two species of Plasmodium, our findings demonstrate a major role for NSUN2-mediated m5C modifications in mRNA transcript stability and sexual differentiation in malaria parasites.

High-throughput profiling of influenza A virus hemagglutinin gene at single-nucleotide resolution

Scientific Reports ◽

10.1038/srep04942 ◽

2014 ◽

Vol 4 (1) ◽

Cited By ~ 54

Author(s):

Nicholas C. Wu ◽

Arthur P. Young ◽

Laith Q. Al-Mawsawi ◽

C. Anders Olson ◽

Jun Feng ◽

...

Keyword(s):

Influenza A Virus ◽

High Throughput ◽

Influenza A ◽

Single Nucleotide ◽

Hemagglutinin Gene ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution

10.1101/392332 ◽

2018 ◽

Cited By ~ 5

Author(s):

Benjamin J Callahan ◽

Joan Wong ◽

Cheryl Heiner ◽

Steve Oh ◽

Casey M Theriot ◽

...

Keyword(s):

16S rRNA ◽

16S rRNA Gene ◽

High Throughput ◽

Amplicon Sequencing ◽

Full Length ◽

rRNA Gene ◽

Single Nucleotide ◽

Full Complement ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

Abstract Targeted PCR amplification and high-throughput sequencing (amplicon sequencing) of 16S rRNA gene fragments is widely used to profile microbial communities. New long-read sequencing technologies can sequence the entire 16S rRNA gene, but higher error rates have limited their attractiveness when accuracy is important. Here we present a high-throughput amplicon sequencing methodology based on PacBio circular consensus sequencing and the DADA2 sample inference method that measures the full-length 16S rRNA gene with single-nucleotide resolution and a near-zero error rate. In two artificial communities of known composition, our method recovered the full complement of full-length 16S sequence variants from expected community members without residual errors. The measured abundances of intra-genomic sequence variants were in the integral ratios expected from the genuine allelic variants within a genome. The full-length 16S gene sequences recovered by our approach allowed E. coli strains to be correctly classified to the O157:H7 and K12 sub-species clades. In human fecal samples, our method showed strong technical replication and was able to recover the full complement of 16S rRNA alleles in several E. coli strains. There are likely many applications beyond microbial profiling for which high-throughput amplicon sequencing of complete genes with single-nucleotide resolution will be of use.

Chromosomal landscape of UV damage formation and repair at single-nucleotide resolution

Proceedings of the National Academy of Sciences ◽

10.1073/pnas.1606667113 ◽

2016 ◽

Vol 113 (32) ◽

pp. 9057-9062 ◽

Cited By ~ 60

Author(s):

Peng Mao ◽

Michael J. Smerdon ◽

Steven A. Roberts ◽

John J. Wyrick

Keyword(s):

Transcription Factors ◽

High Throughput Sequencing ◽

DNA Lesions ◽

Yeast Genome ◽

UV Damage ◽

Single Nucleotide ◽

High Resolution Data ◽

Pyrimidine Dimers ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

UV-induced DNA lesions are important contributors to mutagenesis and cancer, but it is not fully understood how the chromosomal landscape influences UV lesion formation and repair. Genome-wide profiling of repair activity in UV irradiated cells has revealed significant variations in repair kinetics across the genome, not only among large chromatin domains, but also at individual transcription factor binding sites. Here we report that there is also a striking but predictable variation in initial UV damage levels across a eukaryotic genome. We used a new high-throughput sequencing method, known as CPD-seq, to precisely map UV-induced cyclobutane pyrimidine dimers (CPDs) at single-nucleotide resolution throughout the yeast genome. This analysis revealed that individual nucleosomes significantly alter CPD formation, protecting nucleosomal DNA with an inward rotational setting, even though such DNA is, on average, more intrinsically prone to form CPD lesions. CPD formation is also inhibited by DNA-bound transcription factors, in effect shielding important DNA elements from UV damage. Analysis of CPD repair revealed that initial differences in CPD damage formation often persist, even at later repair time points. Furthermore, our high-resolution data demonstrate, to our knowledge for the first time, that CPD repair is significantly less efficient at translational positions near the dyad of strongly positioned nucleosomes in the yeast genome. These findings define the global roles of nucleosomes and transcription factors in both UV damage formation and repair, and have important implications for our understanding of UV-induced mutagenesis in human cancers.

BreakID: genomics breakpoints identification to detect gene fusion events using discordant pairs and split reads

Bioinformatics ◽

10.1093/bioinformatics/bty1070 ◽

2019 ◽

Vol 35 (16) ◽

pp. 2859-2861

Author(s):

Linfang Jin ◽

Jinhuo Lai ◽

Yang Zhang ◽

Ying Fu ◽

Shuhang Wang ◽

...

Keyword(s):

Gene Fusion ◽

Source Code ◽

High Sensitivity ◽

Supplementary Information ◽

Sequencing Data ◽

Single Nucleotide ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution ◽

Fusion Detection ◽

Better Than

Abstract Summary Here we developed a tool called Breakpoint Identification (BreakID) to identify fusion events from targeted sequencing data. Taking discordant read pairs and split reads as supporting evidence, BreakID can identify gene fusion breakpoints at single-nucleotide resolution. After validation with confirmed fusion events in cancer cell lines, we have shown that BreakID can achieve a high sensitivity of 90.63% along with a PPV of 100% at a sequencing depth of 500× and perform better than other available fusion detection tools. We anticipate that BreakID will be widely used in the detection and analysis of fusions in clinical and research sequencing scenarios. Availability and implementation Source code is freely available at https://github.com/SinOncology/BreakID. Supplementary information Supplementary data are available at Bioinformatics online.

Identification of m6A residues at single-nucleotide resolution using eCLIP and an accessible custom analysis pipeline

10.1101/2020.03.11.986174 ◽

2020 ◽

Author(s):

Justin T. Roberts ◽

Allison M. Porman ◽

Aaron M. Johnson

Keyword(s):

Chemical Properties ◽

RNA Modifications ◽

Single Nucleotide ◽

UV Crosslinking ◽

RNA Immunoprecipitation ◽

Input Sample ◽

Library Complexity ◽

RNA Fragments ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

Abstract Methylation at the N6 position of adenosine (m6A) is one of the most abundant RNA modifications found in eukaryotes; however, accurate detection of specific m6A nucleotides within transcripts has been historically challenging because m6A and unmodified adenosine have virtually indistinguishable chemical properties. While previous strategies such as methyl-RNA immunoprecipitation and sequencing (MeRIP-Seq) have relied on m6A-specific antibodies to isolate RNA fragments containing the modification, these methods do not allow for precise identification of individual m6A residues. More recently, modified cross-linking and immunoprecipitation (CLIP) based approaches that rely on inducing specific mutations during reverse transcription via UV crosslinking of the anti-m6A antibody to methylated RNA have been employed to overcome this limitation. However, the most utilized version of this approach, miCLIP, can be technically challenging to use for achieving high-complexity libraries. Here we present an improved methodology that yields high library complexity and allows for the straightforward identification of individual m6A residues with reliable confidence metrics. Based on enhanced CLIP (eCLIP), our m6A-eCLIP (meCLIP) approach couples the improvements of eCLIP with the inclusion of an input sample and an easy-to-use computational pipeline to allow for precise calling of m6A sites at true single-nucleotide resolution. As the effort to accurately identify m6As in an efficient and straightforward way intensifies, this method is a valuable tool for investigators interested in unraveling the m6A epitranscriptome.

DNAscent v2: detecting replication forks in nanopore sequencing data with deep learning

BMC Genomics ◽

10.1186/s12864-021-07736-6 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Michael A. Boemo

Keyword(s):

DNA Replication ◽

High Throughput ◽

Single Molecule ◽

Replication Fork ◽

Genome Stability ◽

BrdU Incorporation ◽

Sequencing Data ◽

Single Nucleotide ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution

Abstract Background Measuring DNA replication dynamics with high throughput and single-molecule resolution is critical for understanding both the basic biology behind how cells replicate their DNA and how DNA replication can be used as a therapeutic target for diseases like cancer. In recent years, the detection of base analogues in Oxford Nanopore Technologies (ONT) sequencing reads has become a promising new method to supersede existing single-molecule methods such as DNA fibre analysis: ONT sequencing yields long reads with high throughput, and sequenced molecules can be mapped to the genome using standard sequence alignment software. Results This paper introduces DNAscent v2, software that uses a residual neural network to achieve fast, accurate detection of the thymidine analogue BrdU with single-nucleotide resolution. DNAscent v2 also comes equipped with an autoencoder that interprets the pattern of BrdU incorporation on each ONT-sequenced molecule into replication fork direction to call the location of replication origins and termination sites. DNAscent v2 surpasses previous versions of DNAscent in BrdU calling accuracy, origin calling accuracy, speed, and versatility across different experimental protocols. Unlike NanoMod, DNAscent v2 positively identifies BrdU without the need for sequencing unmodified DNA. Unlike RepNano, DNAscent v2 calls BrdU with single-nucleotide resolution and detects more origins than RepNano from the same sequencing data. DNAscent v2 is open-source and available at https://github.com/MBoemo/DNAscent. Conclusions This paper shows that DNAscent v2 is the new state-of-the-art in the high-throughput, single-molecule detection of replication fork dynamics. These improvements in DNAscent v2 mark an important step towards measuring DNA replication dynamics in large genomes with single-molecule resolution. Looking forward, the increase in accuracy in single-nucleotide resolution BrdU calls will also allow DNAscent v2 to branch out into other areas of genome stability research, particularly the detection of DNA repair.

Faculty Opinions recommendation of RNA structure analysis at single nucleotide resolution by selective 2'-hydroxyl acylation and primer extension (SHAPE).

Faculty Opinions – Post-Publication Peer Review of the Biomedical Literature ◽

10.3410/f.1024946.293765 ◽

2005 ◽

Author(s):

Douglas Turner

Keyword(s):

Structure Analysis ◽

RNA Structure ◽

Primer Extension ◽

Single Nucleotide ◽

Nucleotide Resolution ◽

Single Nucleotide Resolution
