Fast and memory-efficient mapping of short bisulfite sequencing reads using a two-letter alphabet

Abstract: DNA cytosine methylation is an important epigenomic mark with a wide range of functions in many organisms. Whole genome bisulfite sequencing is the gold standard for interrogating cytosine methylation genome-wide. Algorithms used to map bisulfite-converted reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. This encoding substantially reduces the entropy of nucleotide frequencies in the resulting reference genome. Within the paradigm of read mapping by first filtering possible candidate alignments, reduced entropy in the sequence space can increase the required computing effort. We introduce another bisulfite mapping algorithm (abismal), based on the idea of encoding a four-letter DNA sequence as only two letters, one for purines and one for pyrimidines. We show that this encoding can lead to greater specificity compared to existing encodings used to map bisulfite sequencing reads. Through the two-letter encoding, the abismal software tool maps reads in less time and using less memory than most bisulfite sequencing read mapping software tools, while attaining similar accuracy. This allows in silico methylation analysis to be performed on a wider range of machines, including those with limited hardware.
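The two-letter idea in the abstract above can be illustrated with a minimal Python sketch. It is not abismal's implementation: the function names (`two_letter`, `three_letter`, `bisulfite_convert`) and the letters R/Y are hypothetical choices made for this illustration, and only the forward-strand C-to-T conversion is simulated. It shows that a bisulfite-converted read still matches the reference once both are reduced to purines and pyrimidines.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def two_letter(seq: str) -> str:
    """Encode purines (A, G) as R and pyrimidines (C, T) as Y; anything else as N."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            out.append("N")
    return "".join(out)


def three_letter(seq: str) -> str:
    """Three-letter encoding for comparison: collapse C into T."""
    return seq.upper().replace("C", "T")


def bisulfite_convert(seq: str, methylated=frozenset()) -> str:
    """Simulate conversion: every C becomes T unless its index is in `methylated`."""
    return "".join(
        "T" if base == "C" and i not in methylated else base
        for i, base in enumerate(seq.upper())
    )


# Hypothetical toy reference; the C at index 1 is treated as methylated.
reference = "ACGTTCGGAC"
read = bisulfite_convert(reference, methylated={1})

# C and T are both pyrimidines, so conversion does not change the encoded read.
print(two_letter(reference) == two_letter(read))
print(three_letter(reference) == three_letter(read))
```

Both comparisons hold in this toy case because each encoding places C and T in the same class; the difference between the encodings lies in how many classes remain and how evenly letters are spread across them.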

Increased accuracy and speed in whole genome bisulfite read mapping using a two-letter alphabet

10.1101/2020.12.21.423849 ◽

2020 ◽

Author(s):

Guilherme de Sena Brandine ◽

Andrew D. Smith

Keyword(s):

Dna Sequence ◽

Cytosine Methylation ◽

Software Tool ◽

Whole Genome ◽

Mapping Algorithm ◽

Letter Alphabet ◽

Wide Range ◽

Genome Bisulfite Sequencing ◽

Range Of Functions ◽

Using Data

Abstract: DNA methylation, characterized by the presence of a methyl group at cytosines in a DNA sequence, is an important epigenomic mark with a wide range of functions across diverse organisms. Whole genome bisulfite sequencing (WGBS) has emerged as the gold standard to interrogate cytosine methylation. Accurately mapping WGBS reads to a reference genome allows reconstruction of tissue methylomes at single-base resolution. Algorithms used to map WGBS reads often encode the four-base DNA alphabet with three letters by reducing two bases to a common letter. We introduce another bisulfite mapping algorithm (abismal), based on the novel idea of encoding a four-letter DNA sequence as two letters, one for purines and one for pyrimidines. We show theoretically that this encoding benefits from higher uniformity and specificity when subsequences are selected from reads for filtration. In our implementation, this leads to a decreased mapping time relative to the three-letter encoding. We demonstrate, using data from multiple public studies, that the abismal software tool improves mapping accuracy at significantly lower mapping times compared to commonly used mappers, with most notable improvements observed in samples originating from the random priming post-bisulfite adapter tagging protocol.
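One way to get a feel for the "higher uniformity" claim is to compare letter frequencies under each encoding. The sketch below is an illustration only, not the authors' theoretical argument. It assumes the four bases are equally frequent (real genomes are not), and the helper name `shannon_entropy` is hypothetical. Uniformity is measured here as entropy divided by the maximum entropy possible for that alphabet size.

```python
import math


def shannon_entropy(freqs):
    """Shannon entropy, in bits, of a list of letter frequencies."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


# Assumption for illustration: all four bases are equally frequent.
base = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

# Three-letter encoding: C and T share one letter.
three_letter = [base["A"], base["G"], base["C"] + base["T"]]
# Two-letter encoding: purines (A, G) and pyrimidines (C, T).
two_letter = [base["A"] + base["G"], base["C"] + base["T"]]

h3 = shannon_entropy(three_letter)
h2 = shannon_entropy(two_letter)

# Uniformity: entropy relative to the maximum for the alphabet size.
uniformity3 = h3 / math.log2(3)
uniformity2 = h2 / math.log2(2)

print(f"three-letter: {h3:.3f} bits, uniformity {uniformity3:.3f}")
print(f"two-letter:   {h2:.3f} bits, uniformity {uniformity2:.3f}")
```

Under this assumption the two-letter alphabet carries fewer bits per position but its letters are perfectly balanced, whereas the merged letter in the three-letter alphabet is twice as frequent as the other two.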

URMAP, an ultra-fast read mapper

10.1101/2020.01.12.903351 ◽

2020 ◽

Cited By ~ 1

Author(s):

Robert C. Edgar

Keyword(s):

Read Mapping ◽

Mapping Algorithm ◽

Sequencing Technologies ◽

Large Size ◽

Mapping Software ◽

Biological Studies ◽

Wide Range ◽

Order Of Magnitude ◽

Comparable Accuracy ◽

Generation Sequencing

Abstract: Mapping of reads to reference sequences is an essential step in a wide range of biological studies. The large size of datasets generated with next-generation sequencing technologies motivates the development of fast mapping software. Here, I describe URMAP, a new read mapping algorithm. URMAP is an order of magnitude faster than BWA and Bowtie2 with comparable accuracy on a benchmark test using simulated paired 150nt reads of a well-studied human genome. Software is freely available at https://drive5.com/urmap.

Reliable variant calling during runtime of Illumina sequencing

10.1101/387662 ◽

2018 ◽

Cited By ~ 1

Author(s):

Tobias P. Loka ◽

Simon H. Tausch ◽

Bernhard Y. Renard

Keyword(s):

Real Time ◽

Disease Outbreaks ◽

Variant Calling ◽

Read Mapping ◽

Mapping Algorithm ◽

Infectious Disease Outbreaks ◽

Whole Exome ◽

Similar Accuracy ◽

Post Hoc

Abstract: The sequential paradigm of data acquisition and analysis in next-generation sequencing leads to high turnaround times for the generation of interpretable results. We combined a novel real-time read mapping algorithm with fast variant calling to obtain reliable variant calls while sequencing is still in progress. Our new algorithm provides accurate read mapping results for intermediate cycles and supports large reference genomes such as the complete human reference. This enables the combination of real-time read mapping results with complex follow-up analysis. In this study, we showed the accuracy and scalability of our approach by applying real-time read mapping and variant calling to seven publicly available human whole exome sequencing datasets. Up to 89% of all detected SNPs were already identified after 40 sequencing cycles, with precision similar to that at the end of sequencing. Final results showed similar accuracy to those of conventional post-hoc analysis methods. When compared to standard routines, our live approach enables considerably faster interventions in clinical applications and infectious disease outbreaks. Besides variant calling, our approach can be adapted for a plethora of other mapping-based analyses.

epiGBS2: an improved protocol and automated snakemake workflow for highly multiplexed reduced representation bisulfite sequencing

10.1101/2020.06.23.137091 ◽

2020 ◽

Author(s):

Fleur Gawehns ◽

Maarten Postuma ◽

Thomas P. van Gurp ◽

Niels C. A. M. Wagemaker ◽

Samar Fatma ◽

...

Keyword(s):

Cytosine Methylation ◽

Bisulfite Sequencing ◽

De Novo ◽

Bioinformatics Pipeline ◽

Reduced Representation ◽

Link Type ◽

Time Investment ◽

Reduced Representation Bisulfite Sequencing ◽

Wide Range ◽

Laboratory Protocol

Abstract: epiGBS is an existing reduced representation bisulfite sequencing method to determine cytosine methylation and genetic polymorphisms de novo. Here, we present epiGBS2, an improved epiGBS laboratory protocol and user-friendly bioinformatics pipeline for a wide range of species with or without a reference genome. epiGBS2 decreases costs and time investment and increases user-friendliness and reproducibility. The library protocol was adjusted to allow for a flexible choice of restriction enzymes and a double digest. Instead of fully methylated adapters, semi-methylated adapters are now used. The bioinformatics pipeline was improved in speed and integrated into the snakemake workflow management system, which makes the pipeline modular and easy to execute, with flexible parameter settings. We also provide a detailed description of the laboratory protocol, an extensive manual of the bioinformatics pipeline, which is publicly accessible on github (https://github.com/nioo-knaw/epiGBS2) and zenodo (https://doi.org/10.5281/zenodo.3819996), and example output.

URMAP, an ultra-fast read mapper

PeerJ ◽

10.7717/peerj.9338 ◽

2020 ◽

Vol 8 ◽

pp. e9338

Author(s):

Robert Edgar

Keyword(s):

Variant Calling ◽

Mapping Algorithm ◽

Sequencing Technologies ◽

Mapping Software ◽

A Genome ◽

Biological Studies ◽

Wide Range ◽

Order Of Magnitude ◽

Comparable Accuracy ◽

Validation Tests

Mapping of reads to reference sequences is an essential step in a wide range of biological studies. The large size of datasets generated with next-generation sequencing technologies motivates the development of fast mapping software. Here, I describe URMAP, a new read mapping algorithm. URMAP is an order of magnitude faster than BWA with comparable accuracy on several validation tests. On a Genome in a Bottle (GIAB) variant calling test with 30× coverage 2×150 reads, URMAP achieves high accuracy (precision 0.998, sensitivity 0.982 and F-measure 0.990) with the strelka2 caller. However, GIAB reference variants are shown to be biased against repetitive regions which are difficult to map and may therefore pose an unrealistically easy challenge to read mappers and variant callers.
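As a quick arithmetic check on the figures quoted above, the F-measure can be recomputed from the reported precision and sensitivity. This sketch assumes the F-measure is the standard F1 score, that is, the harmonic mean of precision and sensitivity (recall); the abstract does not state the formula, and the function name `f_measure` is hypothetical.

```python
def f_measure(precision: float, sensitivity: float) -> float:
    """F1 score: harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Values reported in the abstract for the GIAB test with the strelka2 caller.
f = f_measure(precision=0.998, sensitivity=0.982)
print(f"{f:.3f}")  # prints 0.990
```

Under that assumption the result rounds to the reported 0.990.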

Reliable variant calling during runtime of Illumina sequencing

Scientific Reports ◽

10.1038/s41598-019-52991-z ◽

2019 ◽

Vol 9 (1) ◽

Cited By ~ 1

Author(s):

Tobias P. Loka ◽

Simon H. Tausch ◽

Bernhard Y. Renard

Keyword(s):

Real Time ◽

Disease Outbreaks ◽

Variant Calling ◽

Read Mapping ◽

Mapping Algorithm ◽

Infectious Disease Outbreaks ◽

Whole Exome ◽

Similar Accuracy ◽

Post Hoc

Abstract: The sequential paradigm of data acquisition and analysis in next-generation sequencing leads to high turnaround times for the generation of interpretable results. We combined a novel real-time read mapping algorithm with fast variant calling to obtain reliable variant calls while sequencing is still in progress. Our new algorithm provides accurate read mapping results for intermediate cycles and supports large reference genomes such as the complete human reference. This enables the combination of real-time read mapping results with complex follow-up analysis. In this study, we showed the accuracy and scalability of our approach by applying real-time read mapping and variant calling to seven publicly available human whole exome sequencing datasets. Up to 89% of all detected SNPs were already identified after 40 sequencing cycles, with precision similar to that at the end of sequencing. Final results showed similar accuracy to those of conventional post-hoc analysis methods. When compared to standard routines, our live approach enables considerably faster interventions in clinical applications and infectious disease outbreaks. Besides variant calling, our approach can be adapted for a plethora of other mapping-based analyses.

Structure, Function and Interactions of Tau: Particular Focus on Potential Drug Targets for the Treatment of Tauopathies

CNS & Neurological Disorders - Drug Targets ◽

10.2174/1871527317666180525112008 ◽

2018 ◽

Vol 17 (5) ◽

pp. 325-337 ◽

Cited By ~ 2

Author(s):

Hojjat Borna ◽

Kasim Assadoulahei ◽

Gholamhossein Riazi ◽

Asghar Beigi Harchegani ◽

Alireza Shahriary

Keyword(s):

Tau Protein ◽

Drug Targets ◽

Brain Injuries ◽

Potential Drug ◽

Microtubule Network ◽

Wide Range ◽

Potential Risk Factors ◽

Range Of Functions ◽

Potential Drug Targets

Background & Objective: Neurodegenerative diseases are among the most widespread life-threatening disorders affecting the elderly around the world. The common feature of a group of neurodegenerative disorders, called tauopathies, is an accumulation of microtubule-associated protein tau inside the neurons. The exact mechanism underlying tauopathies is not well understood, but several factors such as traumatic brain injuries and genetics are considered potential risk factors. Although tau protein is well known for its key role in stabilizing and organizing the axonal microtubule network, it has a broad range of functions including DNA protection and participation in signaling pathways. Moreover, the flexible unfolded structure of tau facilitates its modification by a wide range of intracellular enzymes, which in turn broadens the spectrum of tau functions and interactions. The distinctive properties of tau protein, together with the crucial role of tau interaction partners in the progression of neurodegeneration, suggest tau and its binding partners as potential drug targets for the treatment of neurodegenerative diseases. Conclusion: This review aims to give a detailed description of the structure, functions and interactions of tau protein in order to provide insight into potential therapeutic targets for the treatment of tauopathies.

Role of Thioredoxin-Interacting Protein in Diseases and Its Therapeutic Outlook

International Journal of Molecular Sciences ◽

10.3390/ijms22052754 ◽

2021 ◽

Vol 22 (5) ◽

pp. 2754

Author(s):

Naila Qayyum ◽

Muhammad Haseeb ◽

Moon Suk Kim ◽

Sangdun Choi

Keyword(s):

Interacting Protein ◽

Thioredoxin Interacting Protein ◽

Signaling Complex ◽

Current Review ◽

Wide Range ◽

Range Of Functions ◽

Species Production ◽

And Oxidative Stress ◽

Significant Attention

Thioredoxin-interacting protein (TXNIP), widely known as thioredoxin-binding protein 2 (TBP2), is a major binding mediator in the thioredoxin (TXN) antioxidant system, which involves a reduction-oxidation (redox) signaling complex and is pivotal for the pathophysiology of some diseases. TXNIP increases reactive oxygen species production and oxidative stress and thereby contributes to apoptosis. Recent studies indicate an evolving role of TXNIP in the pathogenesis of complex diseases such as metabolic disorders, neurological disorders, and inflammatory illnesses. In addition, TXNIP has gained significant attention due to its wide range of functions in energy metabolism, insulin sensitivity, improved insulin secretion, and also in the regulation of glucose and tumor suppressor activities in various cancers. This review aims to highlight the roles of TXNIP in the field of diabetology, neurodegenerative diseases, and inflammation. TXNIP is found to be a promising novel therapeutic target in the current review, not only in the aforementioned diseases but also in prolonged microvascular and macrovascular diseases. Therefore, TXNIP inhibitors hold promise for preventing the growing incidence of complications in relevant diseases.

A Comparison of Some Methods of Deriving the Instantaneous Unit Hydrograph

Hydrology Research ◽

10.2166/nh.1985.0001 ◽

1985 ◽

Vol 16 (1) ◽

pp. 1-10 ◽

Cited By ~ 2

Author(s):

V. P. Singh ◽

C. Corradini ◽

F. Melone

Keyword(s):

Peak Flow ◽

Central Italy ◽

Effective Rainfall ◽

Unit Hydrograph ◽

Time To Peak ◽

Wide Range ◽

Instantaneous Unit Hydrograph ◽

Similar Accuracy ◽

Two Parameters ◽

Range Of Values

The geomorphological instantaneous unit hydrograph (IUH) proposed by Gupta et al. (1980) was compared with the IUH derived by the commonly used time-area and Nash methods. This comparison was performed by analyzing the effective rainfall-direct runoff relationship for four large basins in Central Italy ranging in area from 934 to 4,147 km². The Nash method was found to be the most accurate of the three methods. The geomorphological method, with only one parameter estimated in advance from the observed data, was found to be slightly less accurate than the Nash method, which has two parameters determined from observations. Furthermore, if the geomorphological and Nash methods employed the same information represented by basin lag, then they produced similar accuracy provided the other Nash parameter, expressed by the product of peak flow and time to peak, was empirically assessed within a wide range of values. It was concluded that it was more appropriate to use the geomorphological method for ungaged basins and the Nash method for gaged basins.

Biodiversity of Secondary Metabolites Compounds Isolated from Phylum Actinobacteria and Its Therapeutic Applications

Molecules ◽

10.3390/molecules26154504 ◽

2021 ◽

Vol 26 (15) ◽

pp. 4504

Author(s):

Muhanna Al-shaibani ◽

Radin Maya Saphira Radin Mohamed ◽

Nik Sidik ◽

Hesham Enshasy ◽

Adel Al-Gheethi ◽

...

Keyword(s):

Secondary Metabolites ◽

Bioactive Compounds ◽

Extreme Environments ◽

Software Tool ◽

Gene Clusters ◽

Well Being ◽

Bioactive Substances ◽

Diverse Range ◽

Wide Range ◽

Potential Applications

The current review aims to summarise the biodiversity and biosynthesis of novel secondary metabolites of the phylum Actinobacteria, and the diverse range of secondary metabolites produced, which vary depending on the ecological environments these organisms inhabit. Actinobacteria create a wide range of bioactive substances that can be of great value to public health and the pharmaceutical industry. The literature analysis process for this review was conducted using the VOSviewer software tool to visualise the bibliometric networks of the most relevant literature in the Scopus database in the period between 2010 and 22 March 2021. Screening and exploring the available literature relating to the extreme environments and ecosystems that Actinobacteria inhabit aims to identify new strains of this major microorganism class that produce unique novel bioactive compounds. The knowledge gained from these studies is intended to encourage scientists in the natural product discovery field to identify and characterise novel strains containing various bioactive gene clusters with potential clinical applications. It is evident that Actinobacteria adapted to survive in extreme environments represent an important source of a wide range of bioactive compounds. Actinobacteria have a large number of secondary metabolite biosynthetic gene clusters. They can synthesise thousands of secondary metabolites with different biological actions, such as anti-bacterial, anti-parasitic, anti-fungal, antiviral, anti-cancer and growth-promoting compounds. These are highly significant economically due to their potential applications in the food, nutrition and health industries and thus support our communities' well-being.
