scholarly journals ipDMR: identification of differentially methylated regions with interval P-values

Author(s):  
Zongli Xu ◽  
Changchun Xie ◽  
Jack A Taylor ◽  
Liang Niu

Abstract Summary ipDMR is an R software tool for identification of differentially methylated regions (DMRs) using auto-correlated P-values for individual CpGs from epigenome-wide association analysis using array or bisulfite sequencing data. It summarizes P-values for adjacent CpGs, identifies association peaks and then extends peaks to find boundaries of DMRs. ipDMR uses BED format files as input and is easy to use. Simulations guided by real data found that ipDMR outperformed current available methods and provided slightly higher true positive rates and much lower false discovery rates. Availability and implementation ipDMR is available at https://bioconductor.org/packages/release/bioc/html/ENmix.html. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Peter Hettegger ◽  
Klemens Vierlinger ◽  
Andreas Weinhaeusel

Abstract Motivation Data generated from high-throughput technologies such as sequencing, microarray and bead-chip technologies are unavoidably affected by batch effects (BEs). Large effort has been put into developing methods for correcting these effects. Often, BE correction and hypothesis testing cannot be done with one single model, but are done successively with separate models in data analysis pipelines. This potentially leads to biased P-values or false discovery rates due to the influence of BE correction on the data. Results We present a novel approach for estimating null distributions of test statistics in data analysis pipelines where BE correction is followed by linear model analysis. The approach is based on generating simulated datasets by random rotation and thereby retains the dependence structure of genes adequately. This allows estimating null distributions of dependent test statistics, and thus the calculation of resampling-based P-values and false-discovery rates following BE correction while maintaining the alpha level. Availability The described methods are implemented as randRotation package on Bioconductor: https://bioconductor.org/packages/randRotation/ Supplementary information Supplementary data are available at Bioinformatics online.


2017 ◽  
Author(s):  
Sebastian Deorowicz ◽  
Agnieszka Debudaj-Grabysz ◽  
Adam Gudyś ◽  
Szymon Grabowski

AbstractMotivationMapping reads to a reference genome is often the first step in a sequencing data analysis pipeline. Mistakes made at this computationally challenging stage cannot be recovered easily.ResultsWe present Whisper, an accurate and high-performant mapping tool, based on the idea of sorting reads and then mapping them against suffix arrays for the reference genome and its reverse complement. Employing task and data parallelism as well as storing temporary data on disk result in superior time efficiency at reasonable memory requirements. Whisper excels at large NGS read collections, in particular Illumina reads with typical WGS coverage. The experiments with real data indicate that our solution works in about 15% of the time needed by the well-known Bowtie2 and BWA-MEM tools at a comparable accuracy (validated in variant calling pipeline).AvailabilityWhisper is available for free from https://github.com/refresh-bio/Whisper or http://sun.aei.polsl.pl/REFRESH/Whisper/[email protected] informationSupplementary data are available at publisher Web site.


2018 ◽  
Author(s):  
Aaron T. L. Lun ◽  
Samantha Riesenfeld ◽  
Tallulah Andrews ◽  
Tomas Gomes ◽  
John C. Marioni ◽  
...  

AbstractDroplet-based single-cell RNA sequencing protocols have dramatically increased the throughput and efficiency of single-cell transcriptomics studies. A key computational challenge when processing these data is to distinguish libraries for real cells from empty droplets. Existing methods for cell calling set a minimum threshold on the total unique molecular identifier (UMI) count for each library, which indiscriminately discards cell libraries with low UMI counts. Here, we describe a new statistical method for calling cells from droplet-based data, based on detecting significant deviations from the expression profile of the ambient solution. Using simulations, we demonstrate that our method has greater power than existing approaches for detecting cell libraries with low UMI counts, while controlling the false discovery rate among detected cells. We also apply our method to real data, where we show that the use of our method results in the retention of distinct cell types that would otherwise have been discarded.


2019 ◽  
Vol 35 (22) ◽  
pp. 4716-4723 ◽  
Author(s):  
Daniel Tello ◽  
Juanita Gil ◽  
Cristian D Loaiza ◽  
John J Riascos ◽  
Nicolás Cardozo ◽  
...  

Abstract Motivation Accurate detection, genotyping and downstream analysis of genomic variants from high-throughput sequencing data are fundamental features in modern production pipelines for genetic-based diagnosis in medicine or genomic selection in plant and animal breeding. Our research group maintains the Next-Generation Sequencing Experience Platform (NGSEP) as a precise, efficient and easy-to-use software solution for these features. Results Understanding that incorrect alignments around short tandem repeats are an important source of genotyping errors, we implemented in NGSEP new algorithms for realignment and haplotype clustering of reads spanning indels and short tandem repeats. We performed extensive benchmark experiments comparing NGSEP to state-of-the-art software using real data from three sequencing protocols and four species with different distributions of repetitive elements. NGSEP consistently shows comparative accuracy and better efficiency compared to the existing solutions. We expect that this work will contribute to the continuous improvement of quality in variant calling needed for modern applications in medicine and agriculture. Availability and implementation NGSEP is available as open source software at http://ngsep.sf.net. Supplementary information Supplementary data are available at Bioinformatics online.


2020 ◽  
Vol 36 (10) ◽  
pp. 3115-3123 ◽  
Author(s):  
Teng Fei ◽  
Tianwei Yu

Abstract Motivation Batch effect is a frequent challenge in deep sequencing data analysis that can lead to misleading conclusions. Existing methods do not correct batch effects satisfactorily, especially with single-cell RNA sequencing (RNA-seq) data. Results We present scBatch, a numerical algorithm for batch-effect correction on bulk and single-cell RNA-seq data with emphasis on improving both clustering and gene differential expression analysis. scBatch is not restricted by assumptions on the mechanism of batch-effect generation. As shown in simulations and real data analyses, scBatch outperforms benchmark batch-effect correction methods. Availability and implementation The R package is available at github.com/tengfei-emory/scBatch. The code to generate results and figures in this article is available at github.com/tengfei-emory/scBatch-paper-scripts. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Jorge Alvarez-Jarreta ◽  
Patricia R S Rodrigues ◽  
Eoin Fahy ◽  
Anne O’Connor ◽  
Anna Price ◽  
...  

Abstract Summary We present LipidFinder 2.0, incorporating four new modules that apply artefact filters, remove lipid and contaminant stacks, in-source fragments and salt clusters, and a new isotope deletion method which is significantly more sensitive than available open-access alternatives. We also incorporate a novel false discovery rate method, utilizing a target–decoy strategy, which allows users to assess data quality. A renewed lipid profiling method is introduced which searches three different databases from LIPID MAPS and returns bulk lipid structures only, and a lipid category scatter plot with color blind friendly pallet. An API interface with XCMS Online is made available on LipidFinder’s online version. We show using real data that LipidFinder 2.0 provides a significant improvement over non-lipid metabolite filtering and lipid profiling, compared to available tools. Availability and implementation LipidFinder 2.0 is freely available at https://github.com/ODonnell-Lipidomics/LipidFinder and http://lipidmaps.org/resources/tools/lipidfinder. Contact [email protected] or [email protected] Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Vol 35 (19) ◽  
pp. 3592-3598 ◽  
Author(s):  
Justin G Chitpin ◽  
Aseel Awdeh ◽  
Theodore J Perkins

Abstract Motivation Chromatin Immunopreciptation (ChIP)-seq is used extensively to identify sites of transcription factor binding or regions of epigenetic modifications to the genome. A key step in ChIP-seq analysis is peak calling, where genomic regions enriched for ChIP versus control reads are identified. Many programs have been designed to solve this task, but nearly all fall into the statistical trap of using the data twice—once to determine candidate enriched regions, and again to assess enrichment by classical statistical hypothesis testing. This double use of the data invalidates the statistical significance assigned to enriched regions, thus the true significance or reliability of peak calls remains unknown. Results Using simulated and real ChIP-seq data, we show that three well-known peak callers, MACS, SICER and diffReps, output biased P-values and false discovery rate estimates that can be many orders of magnitude too optimistic. We propose a wrapper algorithm, RECAP, that uses resampling of ChIP-seq and control data to estimate a monotone transform correcting for biases built into peak calling algorithms. When applied to null hypothesis data, where there is no enrichment between ChIP-seq and control, P-values recalibrated by RECAP are approximately uniformly distributed. On data where there is genuine enrichment, RECAP P-values give a better estimate of the true statistical significance of candidate peaks and better false discovery rate estimates, which correlate better with empirical reproducibility. RECAP is a powerful new tool for assessing the true statistical significance of ChIP-seq peak calls. Availability and implementation The RECAP software is available through www.perkinslab.ca or on github at https://github.com/theodorejperkins/RECAP. Supplementary information Supplementary data are available at Bioinformatics online.


Sign in / Sign up

Export Citation Format

Share Document