WALT: fast and accurate read mapping for bisulfite sequencing

2016, pp. btw490
Author(s): Haifeng Chen, Andrew D. Smith, Ting Chen

2021, Vol 3 (4)
Author(s): Guilherme de Sena Brandine, Andrew D Smith

Abstract DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole-genome bisulfite sequencing is the gold standard for interrogating cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. For mappers that first filter for candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity than the existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and with less memory than most bisulfite sequencing read mapping tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed on a wider range of computers, including machines with limited hardware.
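The entropy argument above can be made concrete with a small calculation. The sketch below (not abismal's code, and not necessarily the exact measure used in the paper) computes the per-base Shannon entropy of a toy sequence under the usual four-letter alphabet, a forward-strand three-letter encoding (C and T collapsed to T), and a purine/pyrimidine two-letter encoding (A, G to R; C, T to Y). It also reports each entropy as a fraction of its maximum, log2 of the alphabet size. The function names and the toy sequence are hypothetical. Note that bisulfite C-to-T and G-to-A changes each stay within one class (pyrimidine or purine), so the two-letter encoding of a read does not depend on conversion on either strand.

```python
import math
from collections import Counter


def shannon_entropy(seq: str) -> float:
    """Shannon entropy, in bits per base, of the letter frequencies in seq."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def three_letter(seq: str) -> str:
    """Forward-strand C->T reduction: C and T become a single letter (T)."""
    return seq.replace("C", "T")


def two_letter(seq: str) -> str:
    """Purine/pyrimidine reduction: A, G -> R and C, T -> Y."""
    return seq.translate(str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"}))


def fraction_of_max(seq: str, alphabet_size: int) -> float:
    """Entropy divided by its maximum possible value, log2(alphabet_size)."""
    return shannon_entropy(seq) / math.log2(alphabet_size)


if __name__ == "__main__":
    genome = "ACGT" * 4  # hypothetical toy sequence with equal base frequencies
    for name, encoded, k in [
        ("four-letter", genome, 4),
        ("three-letter", three_letter(genome), 3),
        ("two-letter", two_letter(genome), 2),
    ]:
        print(f"{name}: {shannon_entropy(encoded):.2f} bits/base, "
              f"{fraction_of_max(encoded, k):.2f} of max")
```

On this toy input the three-letter encoding leaves its letters unbalanced (T becomes twice as frequent as A or G), while the two-letter encoding keeps R and Y equally frequent. Whether that balance yields better seed specificity on a real genome is the paper's claim; this sketch only illustrates the entropy bookkeeping.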


2019, Vol 35 (18), pp. 3273-3278
Author(s): Peng Wu, Yan Gao, Weilong Guo, Ping Zhu

Abstract
Motivation: Single-cell bisulfite sequencing (BS-seq) techniques have been developed for detecting DNA methylation heterogeneity and for studies with limited material. However, data deficiencies such as low read-mapping ratios remain a critical issue.
Results: We comprehensively characterize single-cell BS-seq data and show that chimeric molecules are the major source of alignment failures. These chimeric molecules are produced by recombination of genomically proximal sequences with microhomology regions (MR) after bisulfite conversion. In addition, we find that DNA methylation within MR is highly variable, suggesting that these regions must be removed to estimate DNA methylation levels accurately. We further develop scBS-map to perform quality control and local alignment of bisulfite sequencing data, chimeric molecule identification and MR removal. Using scBS-map, we show remarkable increases in uniquely mapped reads, genomic coverage and number of CpG sites, and recover more functional elements with precise DNA methylation estimates.
Availability and implementation: The scBS-map software is freely available at https://github.com/wupengomics/scBS-map.
Supplementary information: Supplementary data are available at Bioinformatics online.


BMC Genomics, 2015, Vol 16 (Suppl 11), pp. S2
Author(s): Jacob Porter, Ming-an Sun, Hehuang Xie, Liqing Zhang

2022, Vol 4 (1)
Author(s): Takashi Okada, Xin Sun, Stephen McIlfatrick, Justin C St. John

Abstract Mitochondrial DNA (mtDNA) methylation in vertebrates has been hotly debated for over 40 years. Most of the conflicting results have come from bisulfite sequencing (BS-seq) analyses. We addressed whether BS-seq experimental and analysis conditions influenced the estimated levels of methylation in specific mtDNA sequences. We found false positive non-CpG methylation in the CHH context (fpCHH) using BS-seq data from unmethylated Sus scrofa mtDNA. fpCHH methylation was detected on the top/plus strand of mtDNA within low guanine content regions. After BS conversion, the top/plus strand sequences of fpCHH regions become extremely AT-rich, whilst the bottom/minus strand sequences remain almost unchanged. These sequences caused BS-seq aligners to misassign the strand of origin of reads in fpCHH regions, resulting in false methylation calls. Detection of fpCHH methylation was enhanced by short sequence reads, short library inserts, skewed top/bottom read ratios and non-directional read-mapping modes. We confirmed by BS-amplicon sequencing that there was no detectable CHH methylation in fpCHH regions. The fpCHH peaks were located in the D-loop, ATP6, ND2, ND4L, ND5 and ND6 regions and were identified both in our S. scrofa ovary and oocyte data and in human BS-seq data sets. We conclude that non-CpG methylation could potentially be overestimated in specific sequence regions by BS-seq analysis.
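Why a guanine-poor top strand becomes AT-rich while its partner barely changes follows from base pairing: few G on the top strand means few C on the bottom strand, and bisulfite conversion only alters C. The sketch below performs an in-silico conversion of both original strands of a hypothetical G-free stretch, assuming a fully unmethylated molecule (every C read as T) and ignoring the complementary copies produced by PCR. The sequence and function names are illustrative only, not taken from the study.

```python
def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def bisulfite_convert(seq: str) -> str:
    """In-silico conversion assuming no methylation: every C is read as T."""
    return seq.replace("C", "T")


def at_fraction(seq: str) -> float:
    """Fraction of bases that are A or T."""
    return sum(base in "AT" for base in seq) / len(seq)


# Hypothetical guanine-free stretch on the top/plus strand.
TOP = "ACCTCATCCACTAACCTCA"
BOTTOM = reverse_complement(TOP)

if __name__ == "__main__":
    for strand, seq in [("top", TOP), ("bottom", BOTTOM)]:
        converted = bisulfite_convert(seq)
        print(f"{strand}: {seq} -> {converted} "
              f"(AT fraction {at_fraction(seq):.2f} -> {at_fraction(converted):.2f})")
```

The converted top strand contains only A and T, so it carries little information about which strand it came from, while the bottom strand has no C to convert and is unchanged.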


2021, Vol 8 (1)
Author(s): Brennan Hyden, Craig H. Carlson, Fred E. Gouker, Jeremy Schmutz, Kerrie Barry, ...

Abstract Sex dimorphism and gene expression were studied in developing catkins in 159 F2 individuals from the bioenergy crop Salix purpurea, and potential mechanisms and pathways for regulating sex development were explored. Differential expression, eQTL, bisulfite sequencing, and network analysis were used to characterize sex dimorphism, detect candidate master regulator genes, and identify pathways through which the sex determination region (SDR) may mediate sex dimorphism. Eleven genes are presented as candidates for master regulators of sex, supported by gene expression and network analyses. These include genes putatively involved in hormone signaling, epigenetic modification, and regulation of transcription. eQTL analysis revealed a suite of transcription factors and genes involved in secondary metabolism and floral development that were predicted to be under direct control of the sex determination region. Furthermore, data from bisulfite sequencing and small RNA sequencing revealed strong differences in expression between males and females that would implicate both of these processes in sex dimorphism pathways. These data indicate that the mechanism of sex determination in Salix purpurea is likely different from that observed in the related genus Populus. This further demonstrates the dynamic nature of SDRs in plants, which involves a multitude of mechanisms of sex determination and a high rate of turnover.


2021, Vol 11 (1)
Author(s): Hirotaka Yamagata, Hiroyuki Ogihara, Koji Matsuo, Shusaku Uchida, Ayumi Kobayashi, ...

Abstract The heterogeneity of major depressive disorder (MDD) is attributed to the fact that diagnostic criteria (e.g., DSM-5) are based only on clinical symptoms. The discovery of blood biomarkers has the potential to change the diagnosis of MDD. The purpose of this study was to identify blood biomarkers of DNA methylation by strategically subtyping patients with MDD by onset age. We analyzed genome-wide DNA methylation of patients with adult-onset depression (AOD; age ≥ 50 years, age at depression onset < 50 years; N = 10) and late-onset depression (LOD; age ≥ 50 years, age at depression onset ≥ 50 years; N = 25) in comparison to that of 30 healthy subjects. The methylation profile of the AOD group was not only different from that of the LOD group but also more homogeneous. Six identified CpG methylation sites were validated by pyrosequencing and amplicon bisulfite sequencing as potential markers for AOD in a second set of independent patients with AOD and healthy control subjects (N = 11). The combination of three specific methylation markers achieved the highest accuracy (sensitivity, 64%; specificity, 91%; accuracy, 77%). Taken together, our findings suggest that DNA methylation markers are more suitable for AOD than for LOD patients.
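The three performance figures reported above come from a 2x2 table of marker calls against diagnosis. The sketch below shows the standard definitions. The counts in the demo are hypothetical: they were chosen only because they round to the reported percentages if each group had 11 subjects. The abstract does not report the confusion matrix, and other counts (for example 16 of 25 cases detected, which is also 64%) fit the same rounded figures.

```python
from typing import Tuple


def diagnostic_metrics(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Sensitivity, specificity and accuracy from a 2x2 confusion matrix.

    tp/fn: cases (e.g. patients) the marker panel calls positive/negative;
    tn/fp: controls the panel calls negative/positive.
    """
    sensitivity = tp / (tp + fn)                 # fraction of cases detected
    specificity = tn / (tn + fp)                 # fraction of controls correctly excluded
    accuracy = (tp + tn) / (tp + fn + tn + fp)   # fraction of all subjects classified correctly
    return sensitivity, specificity, accuracy


if __name__ == "__main__":
    # Hypothetical counts: 11 cases and 11 controls (not taken from the study).
    sens, spec, acc = diagnostic_metrics(tp=7, fn=4, tn=10, fp=1)
    print(f"sensitivity {sens:.0%}, specificity {spec:.0%}, accuracy {acc:.0%}")
```

With these counts the output is `sensitivity 64%, specificity 91%, accuracy 77%`. Accuracy depends on the case/control balance of the validation set, whereas sensitivity and specificity do not.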


Author(s): Adrien Oliva, Raymond Tobler, Alan Cooper, Bastien Llamas, Yassine Souilmi

Abstract The current standard practice for assembling individual genomes involves mapping millions of short DNA sequences (also known as DNA ‘reads’) against a pre-constructed reference genome. Mapping vast amounts of short reads in a timely manner is a computationally challenging task that inevitably produces artefacts, including biases against alleles not found in the reference genome. This reference bias and other mapping artefacts are expected to be exacerbated in ancient DNA (aDNA) studies, which rely on the analysis of low quantities of damaged and very short DNA fragments (~30–80 bp). Nevertheless, the current gold-standard mapping strategies for aDNA studies have remained effectively unchanged for nearly a decade, during which time new software has emerged. In this study, we used simulated aDNA reads from three different human populations to benchmark the performance of 30 distinct mapping strategies implemented across four read-mapping programs (BWA-aln, BWA-mem, NovoAlign and Bowtie2) and quantified the impact of reference bias on downstream population genetic analyses. We show that specific NovoAlign, BWA-aln and BWA-mem parameterizations achieve high mapping precision with low levels of reference bias, particularly after filtering out reads with low mapping qualities. However, unbiased NovoAlign results required the use of an IUPAC reference genome. While relevant only to aDNA projects where reference population data are available, the benefit of using an IUPAC reference demonstrates the value of incorporating population genetic information into the aDNA mapping process, echoing recent results based on graph genome representations.
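An IUPAC reference genome replaces the single reference base at known polymorphic positions with a nucleotide ambiguity code covering all alleles (R for A/G, Y for C/T, and so on). The sketch below builds such a sequence from a toy reference string and a hypothetical variant table mapping 0-based positions to alternate alleles. Real pipelines would read FASTA and VCF files, and how ambiguity codes are scored during alignment is specific to each mapper; nothing here reflects NovoAlign's own tooling.

```python
from typing import Dict, FrozenSet

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC: Dict[FrozenSet[str], str] = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def iupac_reference(ref: str, variants: Dict[int, str]) -> str:
    """Return ref with each variant position replaced by the IUPAC code for
    {reference base} plus its alternate alleles (uppercase A/C/G/T assumed)."""
    bases = list(ref)
    for pos, alts in variants.items():
        alleles = frozenset(bases[pos] + alts)
        bases[pos] = IUPAC[alleles]
    return "".join(bases)


if __name__ == "__main__":
    # Hypothetical SNPs: C/T at position 1 and G/A at position 6.
    print(iupac_reference("ACGTACGT", {1: "T", 6: "A"}))
```

For this input the result is `AYGTACRT`. Encoding alleles this way lets a mapper that understands ambiguity codes avoid penalizing reads that carry the non-reference allele at known sites.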


2021, Vol 12 (1)
Author(s): Laura Santini, Florian Halbritter, Fabian Titz-Teixeira, Toru Suzuki, Maki Asami, ...

Abstract In mammalian genomes, differentially methylated regions (DMRs) and histone marks including trimethylation of histone 3 lysine 27 (H3K27me3) at imprinted genes are asymmetrically inherited to control parentally-biased gene expression. However, neither parent-of-origin-specific transcription nor imprints have been comprehensively mapped at the blastocyst stage of preimplantation development. Here, we address this by integrating transcriptomic and epigenomic approaches in mouse preimplantation embryos. We find that seventy-one genes exhibit previously unreported parent-of-origin-specific expression in blastocysts (nBiX: novel blastocyst-imprinted expressed). Uniparental expression of nBiX genes disappears soon after implantation. Micro-whole-genome bisulfite sequencing (µWGBS) of individual uniparental blastocysts detects 859 DMRs. We further find that 16% of nBiX genes are associated with a DMR, whereas most are associated with parentally-biased H3K27me3, suggesting a role for Polycomb-mediated imprinting in blastocysts. nBiX genes are clustered: five clusters contained at least one published imprinted gene, and five clusters exclusively contained nBiX genes. These data suggest that early development undergoes a complex program of stage-specific imprinting involving different tiers of regulation.


GigaScience, 2021, Vol 10 (5)
Author(s): Colin Farrell, Michael Thompson, Anela Tosevska, Adewale Oyetunde, Matteo Pellegrini

Abstract
Background: Bisulfite sequencing is commonly used to measure DNA methylation. Processing bisulfite sequencing data is often challenging owing to the computational demands of mapping a low-complexity, asymmetrical library and the lack of a unified processing toolset to produce an analysis-ready methylation matrix from read alignments. To address these shortcomings, we have developed BiSulfite Bolt (BSBolt), a fast and scalable bisulfite sequencing analysis platform. BSBolt performs a pre-alignment sequencing read assessment step to improve efficiency when handling asymmetrical bisulfite sequencing libraries.
Findings: We evaluated BSBolt against simulated and real bisulfite sequencing libraries. We found that BSBolt provides accurate and fast bisulfite sequencing alignments and methylation calls. We also compared BSBolt to several existing bisulfite alignment tools and found that BSBolt outperforms Bismark, BSSeeker2, BISCUIT, and BWA-Meth in alignment accuracy and methylation calling accuracy.
Conclusion: BSBolt offers streamlined processing of bisulfite sequencing data through an integrated toolset that supports simulation, alignment, methylation calling, and data aggregation. BSBolt is implemented as a Python package and command line utility for flexibility when building informatics pipelines. BSBolt is available at https://github.com/NuttyLogic/BSBolt under an MIT license.
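The "analysis-ready methylation matrix" mentioned above is typically a table of CpG sites by samples whose cells hold the methylation level, the fraction of informative reads showing methylation: methylated / (methylated + unmethylated). The sketch below aggregates hypothetical per-site call counts into such a matrix. It does not use BSBolt's API or file formats; the input tuple layout, function name and `min_coverage` parameter are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

Site = Tuple[str, int]              # (chromosome, position)
Call = Tuple[str, Site, int, int]   # (sample, site, methylated reads, unmethylated reads)


def methylation_matrix(
    calls: List[Call], min_coverage: int = 1
) -> Tuple[List[Site], List[str], List[List[Optional[float]]]]:
    """Build a sites x samples matrix of methylation levels, meth / (meth + unmeth).

    Counts for repeated (site, sample) pairs are summed; cells with fewer than
    min_coverage informative reads (and never fewer than 1) are left as None.
    """
    counts: Dict[Tuple[Site, str], Tuple[int, int]] = {}
    for sample, site, meth, unmeth in calls:
        m, u = counts.get((site, sample), (0, 0))
        counts[(site, sample)] = (m + meth, u + unmeth)

    sites = sorted({site for site, _ in counts})        # chromosome, then numeric position
    samples = sorted({sample for _, sample in counts})
    threshold = max(min_coverage, 1)                     # avoid dividing by zero coverage
    matrix: List[List[Optional[float]]] = []
    for site in sites:
        row: List[Optional[float]] = []
        for sample in samples:
            m, u = counts.get((site, sample), (0, 0))
            row.append(m / (m + u) if m + u >= threshold else None)
        matrix.append(row)
    return sites, samples, matrix


if __name__ == "__main__":
    calls = [  # hypothetical per-CpG read counts
        ("s1", ("chr1", 100), 8, 2),
        ("s2", ("chr1", 100), 1, 3),
        ("s1", ("chr1", 250), 5, 5),
    ]
    sites, samples, matrix = methylation_matrix(calls)
    print("site", *samples, sep="\t")
    for site, row in zip(sites, matrix):
        print(f"{site[0]}:{site[1]}", *("NA" if v is None else f"{v:.2f}" for v in row), sep="\t")
```

Missing cells are kept as `None` (printed as `NA`) rather than zero, so an uncovered site is not mistaken for an unmethylated one.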

