ChIPdig: a comprehensive user-friendly tool for mining multi-sample ChIP-seq data

F1000Research ◽  
2019 ◽  
Vol 8 ◽  
pp. 1295 ◽  
Author(s):  
Ruben Esse

In recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyses aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data. Here we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyses, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, and 5' and 3' untranslated regions), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example.
ChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig.

2017 ◽  
Author(s):  
Ruben Esse ◽  
Alla Grishok

Abstract
Background: In recent years, epigenetic research has enjoyed explosive growth as high-throughput sequencing technologies become more accessible and affordable. However, this advancement has not been matched with similar progress in data analysis capabilities from the perspective of experimental biologists not versed in bioinformatic languages. For instance, chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is at present widely used to identify genomic loci of transcription factor binding and histone modifications. Basic ChIP-seq data analysis, including read mapping and peak calling, can be accomplished through several well-established tools, but more sophisticated analyses aimed at comparing data derived from different conditions or experimental designs constitute a significant bottleneck. We reason that the implementation of a single comprehensive ChIP-seq analysis pipeline could be beneficial for many experimental (wet lab) researchers who would like to generate genomic data.
Results: Here we present ChIPdig, a stand-alone application with adjustable parameters designed to allow researchers to perform several analyses, namely read mapping to a reference genome, peak calling, annotation of regions based on reference coordinates (e.g. transcription start and termination sites, exons, introns, 5′ UTRs and 3′ UTRs), and generation of heatmaps and metaplots for visualizing coverage. Importantly, ChIPdig accepts multiple ChIP-seq datasets as input, allowing genome-wide differential enrichment analysis in regions of interest to be performed. ChIPdig is written in R and enables access to several existing and highly utilized packages through a simple user interface powered by the Shiny package. Here, we illustrate the utility and user-friendly features of ChIPdig by analyzing H3K36me3 and H3K4me3 ChIP-seq profiles generated by the modENCODE project as an example.
Conclusions: ChIPdig offers a comprehensive and user-friendly pipeline for analysis of multiple sets of ChIP-seq data by both experimental and computational researchers. It is open source and available at https://github.com/rmesse/ChIPdig.


2016 ◽  
Author(s):  
Christophe D Chabbert ◽  
Lars M Steinmetz ◽  
Bernd Klaus

The genome-wide study of epigenetic states requires the integrative analysis of histone modification ChIP-seq data. Here, we introduce an easy-to-use analytic framework to compare profiles of enrichment in histone modifications around classes of genomic elements, e.g. transcription start sites (TSS). Our framework is available via the user-friendly R/Bioconductor package DChIPRep. DChIPRep uses biological replicate information as well as chromatin Input data to allow for a rigorous assessment of differential enrichment. DChIPRep is available for download through the Bioconductor project at http://bioconductor.org/packages/DChIPRep. Contact [email protected]


2020 ◽  
Author(s):  
Marius Welzel ◽  
Anja Lange ◽  
Dominik Heider ◽  
Michael Schwarz ◽  
Bernd Freisleben ◽  
...  

Abstract: Sequencing of marker genes amplified from environmental samples, known as amplicon sequencing, allows us to resolve some of the hidden diversity and elucidate evolutionary relationships and ecological processes among complex microbial communities. The analysis of large numbers of samples at high sequencing depths generated by high-throughput sequencing technologies requires efficient, flexible, and reproducible bioinformatics pipelines. Only a few existing workflows can be run in a user-friendly, scalable, and reproducible manner on different computing devices using an efficient workflow management system. We present Natrix, an open-source bioinformatics workflow for preprocessing raw amplicon sequencing data. The workflow contains all analysis steps from quality assessment, read assembly, dereplication, chimera detection, split-sample merging, sequence representative assignment (OTUs or ASVs) to the taxonomic assignment of sequence representatives. The workflow is written using Snakemake, a workflow management engine for developing data analysis workflows. In addition, Conda is used for version control. Thus, Snakemake ensures reproducibility and Conda offers version control of the utilized programs. The encapsulation of rules and their dependencies supports hassle-free sharing of rules between workflows and easy adaptation and extension of existing workflows. Natrix is freely available on GitHub (https://github.com/MW55/Natrix).
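As an illustration of one step named in the abstract above, dereplication collapses identical reads into unique sequences with abundance counts. The sketch below is a minimal, exact-match version written for this listing; it is not taken from Natrix, and the function name `dereplicate` and the toy reads are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def dereplicate(reads: Iterable[str]) -> List[Tuple[str, int]]:
    """Collapse identical sequences into (sequence, abundance) pairs.

    This is exact-match dereplication only: two reads are merged
    only if their sequences are identical character for character.
    Results are sorted by decreasing abundance, then alphabetically
    so that ties are ordered deterministically.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy input; real pipelines would read sequences from FASTQ/FASTA files.
toy_reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
for sequence, abundance in dereplicate(toy_reads):
    print(sequence, abundance)
```

Sorting by abundance mirrors the common practice of treating the most frequent sequences first in downstream steps such as chimera detection, although whether a given workflow relies on that ordering is an assumption here.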


2018 ◽  
Author(s):  
Daniel Capurso ◽  
Jiahui Wang ◽  
Simon Zhongyuan Tian ◽  
Liuyang Cai ◽  
Sandeep Namburi ◽  
...  

Abstract: ChIA-PET enables the genome-wide discovery of chromatin interactions involving specific protein factors, with base-pair resolution. Interpreting ChIA-PET data depends on having a robust analytic pipeline. Here, we introduce ChIA-PIPE, a fully automated pipeline for ChIA-PET data processing, quality assessment, analysis, and visualization. ChIA-PIPE performs linker filtering, read mapping, peak calling, loop calling, and chromatin-contact-domain calling, and can resolve allele-specific peaks and loops. ChIA-PIPE also automates quality-control assessment for each dataset. Furthermore, ChIA-PIPE generates input files for visualizing 2D contact maps with Juicebox and HiGlass, and provides a new dockerized visualization tool for high-resolution, browser-based exploration of peaks and loops. With minimal adjustment, ChIA-PIPE can also be adapted for the analysis of other related chromatin-mapping data.


2007 ◽  
Vol 8 (1) ◽  
pp. 3-12
Author(s):  
David B. Kushner

DNA microarrays have significantly impacted the study of gene expression on a genome-wide level but also have forced a more global consideration of research questions. As such, it has become critical to introduce undergraduate students to genomics approaches to research. A challenge with performing a DNA microarray experiment in the teaching lab is determining the time required for the study and how to handle the voluminous data generated. At an unexpectedly low cost, a 6-week, project-based lab module has been developed that provides 3 weeks for wet lab (hands-on work with the DNA microarrays) and 3 weeks for dry lab (analyzing data, using databases to help with data analysis, and considering the meaning of data within the large dataset). Options exist for extending the number of weeks dedicated to the project, but 6 weeks is sufficient for providing an introduction to both experimental genomics and data analysis. Students indicate that being able to both perform array experiments and thoroughly analyze data enriches their understanding of genomics and the complexity of biological systems.


GigaScience ◽  
2020 ◽  
Vol 9 (11) ◽  
Author(s):  
Florian Heyl ◽  
Daniel Maticzka ◽  
Michael Uhl ◽  
Rolf Backofen

Abstract
Background: Post-transcriptional regulation via RNA-binding proteins plays a fundamental role in every organism, but the regulatory mechanisms are still poorly understood. They can be elucidated by cross-linking immunoprecipitation in combination with high-throughput sequencing (CLIP-Seq). CLIP-Seq answers questions about the functional role of an RNA-binding protein and its targets by determining binding sites at nucleotide resolution, together with the associated sequence and structural binding patterns. In recent years the amount of CLIP-Seq data has skyrocketed, creating a need for automated data analysis that can deal with different experimental set-ups. However, noncanonical data, new protocols, and a huge variety of tools, especially for peak calling, have made it difficult to define a standard.
Findings: CLIP-Explorer is a flexible and reproducible data analysis pipeline for iCLIP data that supports, for the first time, eCLIP, FLASH, and uvCLAP data. Individual steps like peak calling can be changed to adapt to different experimental settings. We validate CLIP-Explorer on eCLIP data, finding similar or nearly identical motifs for various proteins in comparison with other databases. In addition, we detect new sequence motifs for PTBP1 and U2AF2. Finally, we optimize the peak calling with 3 different peak callers on RBFOX2 data, discuss the difficulty of the peak-calling step, and give advice for different experimental set-ups.
Conclusion: CLIP-Explorer fills the demand for a flexible CLIP-Seq data analysis pipeline that is applicable to up-to-date CLIP protocols. The article further shows the limitations of current peak-calling algorithms and the importance of robust peak detection.


Plants ◽  
2020 ◽  
Vol 9 (4) ◽  
pp. 439 ◽  
Author(s):  
Hanna Marie Schilbert ◽  
Andreas Rempel ◽  
Boas Pucker

High-throughput sequencing technologies have rapidly developed during the past years and have become an essential tool in plant sciences. However, the analysis of genomic data remains challenging and relies mostly on the performance of automatic pipelines. Frequently applied pipelines involve the alignment of sequence reads against a reference sequence and the identification of sequence variants. Since most benchmarking studies of bioinformatics tools for this purpose have been conducted on human datasets, there is a lack of benchmarking studies in plant sciences. In this study, we evaluated the performance of 50 different variant calling pipelines, including five read mappers and ten variant callers, on six real plant datasets of the model organism Arabidopsis thaliana. Sets of variants were evaluated based on various parameters including sensitivity and specificity. We found that all investigated tools are suitable for analysis of NGS data in plant research. When looking at different performance metrics, BWA-MEM and Novoalign were the best mappers and GATK returned the best results in the variant calling step.
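The abstract above evaluates variant sets by sensitivity and specificity. As a rough sketch of how those two metrics are commonly computed against a gold-standard variant set, the Python below compares two sets of variant records. It is not the authors' benchmarking code; the function `evaluate_calls`, the tuple layout, and the parameter `n_sites` (the number of positions considered in the evaluation) are assumptions made for illustration.

```python
from typing import Set, Tuple

# Hypothetical variant record: (chromosome, position, reference allele, alternate allele)
Variant = Tuple[str, int, str, str]


def evaluate_calls(called: Set[Variant], truth: Set[Variant], n_sites: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a call set against a truth set.

    n_sites is the total number of evaluated sites, so that true negatives
    can be derived as everything that is neither called nor a true variant.
    """
    true_pos = len(called & truth)    # called and present in the truth set
    false_pos = len(called - truth)   # called but absent from the truth set
    false_neg = len(truth - called)   # true variants that were missed
    true_neg = n_sites - true_pos - false_pos - false_neg

    # Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    sensitivity = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else float("nan")
    specificity = true_neg / (true_neg + false_pos) if (true_neg + false_pos) else float("nan")
    return sensitivity, specificity


# Toy data, invented for this example only.
truth_set = {("Chr1", 100, "A", "G"), ("Chr1", 250, "C", "T"),
             ("Chr2", 40, "G", "A"), ("Chr3", 7, "T", "C")}
call_set = {("Chr1", 100, "A", "G"), ("Chr1", 250, "C", "T"),
            ("Chr2", 40, "G", "A"), ("Chr5", 900, "A", "T")}
print(evaluate_calls(call_set, truth_set, n_sites=100))
```

Because true negatives vastly outnumber variants in a real genome, specificity computed this way is usually very close to 1, which is one reason benchmarks often report additional metrics alongside it.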


2017 ◽  
Author(s):  
Julian Garneau ◽  
Florence Depardieu ◽  
Louis-Charles Fortier ◽  
David Bikard ◽  
Marc Monot

Abstract: Bacteriophages are the most abundant viruses on earth and display an impressive genetic as well as morphologic diversity. Among those, the most common order of phages is the Caudovirales, whose viral particles package linear double-stranded DNA (dsDNA). In this study we investigated how the information gathered by high-throughput sequencing technologies can be used to determine the DNA termini and packaging mechanisms of dsDNA phages. The wet-lab procedures traditionally used for this purpose rely on the identification and cloning of restriction fragments, which can be delicate and cumbersome. Here, we developed a theoretical and statistical framework to analyze DNA termini and phage packaging mechanisms using next-generation sequencing data. Our methods, implemented in the PhageTerm software, work with sequencing reads in fastq format and the corresponding assembled phage genome. PhageTerm was validated on a set of phages with well-established packaging mechanisms representative of the termini diversity: 5’cos (lambda), 3’cos (HK97), pac (P1), headful without a pac site (T4), DTR (T7) and host fragment (Mu). In addition, we determined the termini of 9 Clostridium difficile phages and 6 phages whose sequences were retrieved from the sequence read archive (SRA). A direct graphical interface is available as a Galaxy wrapper version at https://galaxy.pasteur.fr and a standalone version is accessible at https://sourceforge.net/projects/phageterm/.


2016 ◽  
Author(s):  
Arun Durvasula ◽  
Paul J Hoffman ◽  
Tyler V Kent ◽  
Chaochih Liu ◽  
Thomas J Y Kono ◽  
...  

High-throughput sequencing has changed many aspects of population genetics, molecular ecology, and related fields, affecting both experimental design and data analysis. The software package ANGSD allows users to perform a number of population genetic analyses on high-throughput sequencing data. ANGSD uses probabilistic approaches to calculate genome-wide descriptive statistics. The package makes use of genotype likelihood estimates rather than SNP calls and is specifically designed to produce more accurate results for samples with low sequencing depth. ANGSD makes use of full genome data while handling a wide array of sampling and experimental designs. Here we present ANGSD-wrapper, a set of wrapper scripts that provide a user-friendly interface for running ANGSD and visualizing results. ANGSD-wrapper supports multiple types of analyses, including estimation of nucleotide sequence diversity, neutrality tests, principal component analysis, estimation of admixture proportions for individual samples, and calculation of statistics that quantify recent introgression. ANGSD-wrapper also provides interactive graphing of ANGSD results to enhance data exploration. We demonstrate the usefulness of ANGSD-wrapper by analyzing resequencing data from populations of wild and domesticated Zea. ANGSD-wrapper is freely available from https://github.com/mojaveazure/angsd-wrapper.

