CHIPS: A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data

2021 ◽  
Author(s):  
Len Taing ◽  
Clara Cousins ◽  
Gali Bai ◽  
Paloma Cejas ◽  
Xintao Qiu ◽  
...  

Abstract: Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights on biological processes such as diseases and development. However, quality control and processing of chromatin profiling data involves many steps, and different bioinformatics tools are used at each step. It can be challenging to manage the analysis. Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrichment Processor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and is flexible to start with FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, PCR bottleneck coefficient, the fraction of reads in peaks, percentage of peaks overlapping with the union of public DNaseI hypersensitivity sites, and conservation profile of the peaks. For downstream analysis, it carries out peak annotations, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible. Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS

F1000Research ◽  
2021 ◽  
Vol 10 ◽  
pp. 517
Author(s):  
Len Taing ◽  
Gali Bai ◽  
Clara Cousins ◽  
Paloma Cejas ◽  
Xintao Qiu ◽  
...  

Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights on biological processes such as diseases and development. However, quality control and processing chromatin profiling data involves many steps, and different bioinformatics tools are used at each step. It can be challenging to manage the analysis. Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrIchment ProcesSor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and is flexible to start with FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, polymerase chain reaction bottleneck coefficient, the fraction of reads in peaks, percentage of peaks overlapping with the union of public DNaseI hypersensitivity sites, and conservation profile of the peaks. For downstream analysis, it carries out peak annotations, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible. Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS.
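Two of the quality control metrics named in this abstract, the PCR bottleneck coefficient and the fraction of reads in peaks, can be illustrated with a minimal Python sketch. It assumes the commonly used definitions (PBC as the number of genomic locations with exactly one mapped read divided by the number of locations with at least one read; FRiP as reads falling inside peaks divided by all reads). The function names, the `(chrom, position, strand)` read representation, and the peak dictionary are hypothetical and are not taken from the CHIPS code base, whose implementation may differ (for example in read subsampling).

```python
from bisect import bisect_right
from collections import Counter


def pcr_bottleneck_coefficient(reads):
    """PBC = locations with exactly one read / locations with >= 1 read.

    `reads` is a list of (chrom, position, strand) tuples (assumed format).
    """
    counts = Counter(reads)
    if not counts:
        return 0.0
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(counts)


def fraction_of_reads_in_peaks(reads, peaks):
    """FRiP = reads whose position lies inside a peak / all reads.

    `peaks` maps chrom -> sorted, non-overlapping half-open (start, end)
    intervals (assumed format).
    """
    if not reads:
        return 0.0
    # Precompute interval starts once per chromosome for binary search.
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in peaks.items()}
    in_peaks = 0
    for chrom, pos, _strand in reads:
        intervals = peaks.get(chrom, [])
        # Index of the last interval starting at or before `pos`.
        i = bisect_right(starts.get(chrom, []), pos) - 1
        if i >= 0 and pos < intervals[i][1]:
            in_peaks += 1
    return in_peaks / len(reads)
```

A low PBC suggests that many reads are PCR duplicates of a small number of original fragments, while a low FRiP suggests weak enrichment; the thresholds applied to either metric are a matter of convention and are not specified in the abstract.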


2020 ◽  
Author(s):  
Jason P. Smith ◽  
M. Ryan Corces ◽  
Jin Xu ◽  
Vincent P. Reuter ◽  
Howard Y. Chang ◽  
...  

Motivation: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. Results: PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. Availability: BSD2-licensed code and documentation at https://pepatac.databio.org.


2020 ◽  
Author(s):  
Rui Hong ◽  
Yusuke Koga ◽  
Shruthi Bandyadka ◽  
Anastasia Leshchyk ◽  
Zhe Wang ◽  
...  

Abstract: Performing comprehensive quality control is necessary to remove technical or biological artifacts in single-cell RNA sequencing (scRNA-seq) data. Artifacts in the scRNA-seq data, such as doublets or ambient RNA, can also hinder downstream clustering and marker selection and need to be assessed. While several algorithms have been developed to perform various quality control tasks, they are only available in different packages across various programming environments. No standardized workflow has been developed to streamline the generation and reporting of all quality control metrics from these tools. We have built an easy-to-use pipeline, named SCTK-QC, in the singleCellTK package that generates a comprehensive set of quality control metrics from a plethora of packages for quality control. We are able to import data from several preprocessing tools including CellRanger, STARSolo, BUSTools, dropEST, Optimus, and SEQC. Standard quality control metrics for each cell are calculated including the total number of UMIs, total number of genes detected, and the percentage of counts mapping to predefined gene sets such as mitochondrial genes. Doublet detection algorithms employed include scrublet, scds, doubletCells, and doubletFinder. DecontX is used to identify contamination in each individual cell. To make the data accessible in downstream analysis workflows, the results can be exported to common data structures in R and Python or to text files for use in any generic workflow. Overall, this pipeline will streamline and standardize quality control analyses for single-cell RNA-seq data across different platforms.
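The standard per-cell metrics listed in this abstract (total UMIs, genes detected, and the percentage of counts in a predefined gene set) can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic under assumed inputs: a genes-by-cells count matrix given as nested lists. The function name `per_cell_qc` and the output field names are hypothetical; this is not the singleCellTK API, which is an R package.

```python
def per_cell_qc(counts, gene_names, gene_set):
    """Compute simple per-cell QC metrics from a genes x cells count matrix.

    counts     : list of rows, one per gene, each a list of per-cell counts
    gene_names : list of gene identifiers, aligned with the rows of `counts`
    gene_set   : set of gene identifiers (e.g. mitochondrial genes)
    """
    n_cells = len(counts[0]) if counts else 0
    in_set = [name in gene_set for name in gene_names]
    metrics = []
    for j in range(n_cells):
        column = [row[j] for row in counts]
        total = sum(column)                           # total UMIs in the cell
        detected = sum(1 for c in column if c > 0)    # genes with >= 1 count
        subset = sum(c for c, flag in zip(column, in_set) if flag)
        pct = 100.0 * subset / total if total else 0.0
        metrics.append(
            {"total_umis": total, "genes_detected": detected, "pct_in_set": pct}
        )
    return metrics
```

For real data sets a sparse matrix representation would be preferable; nested lists are used here only to keep the sketch free of dependencies.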


2021 ◽  
Author(s):  
Byoungjoo Yoo ◽  
Hae-yoon Kim ◽  
Xi Chen ◽  
Weiping Shen ◽  
Ji Sun Jang ◽  
...  

Abstract: Steroid hormones influence diverse biological processes throughout the animal life cycle, including metabolism, stress resistance, reproduction, and lifespan. In insects, the steroid hormone 20-hydroxyecdysone (20E) is the central regulator of molting and metamorphosis, and has been shown to play roles in tissue morphogenesis. For example, amnioserosa contraction, which is a major driving force in Drosophila dorsal closure (DC), is defective in embryos mutant for 20E biosynthesis. Here, we show that 20E signaling modulates the transcription of several DC participants in the amnioserosa and other dorsal tissues during late embryonic development, including the zipper locus, which encodes non-muscle myosin II heavy chain. Canonical 20E signaling typically involves the binding of Ecdysone receptor (EcR) and Ultraspiracle heterodimers to ecdysone-response elements (EcREs) within the promoters of ecdysone-responsive genes to drive their expression. During DC, we provide evidence that 20E signaling instead acts in parallel to the JNK cascade via a direct interaction between EcR and the AP-1 component, Jun, which together bind to genomic regions containing AP-1 binding sites but no EcREs to control gene expression. Our work demonstrates a novel mode of action for 20E signaling in Drosophila that likely functions beyond DC, and may provide further insights into mammalian steroid hormone receptor interactions with AP-1.


2017 ◽  
Author(s):  
Caleb Lareau ◽  
Martin Aryee

Mumbach et al. recently described HiChIP, a novel protein-mediated chromatin conformation assay that lowers cellular input requirements while simultaneously increasing the yield of informative reads compared to previous methods. To facilitate the dissemination and adoption of this assay, we introduce hichipper (http://aryeelab.org/hichipper), an open-source HiChIP data preprocessing tool, with features that include bias-corrected peak calling, library quality control, DNA loop calling, and output of processed data for downstream analysis and visualization.


Biology Open ◽  
2021 ◽  
Author(s):  
Byoungjoo Yoo ◽  
Hae-yoon Kim ◽  
Xi Chen ◽  
Weiping Shen ◽  
Ji Sun Jang ◽  
...  

Steroid hormones influence diverse biological processes throughout the animal life cycle, including metabolism, stress resistance, reproduction, and lifespan. In insects, the steroid hormone 20-hydroxyecdysone (20E) is the central hormone regulator of molting and metamorphosis, and plays roles in tissue morphogenesis. For example, amnioserosa contraction, which is a major driving force in Drosophila dorsal closure (DC), is defective in embryos mutant for 20E biosynthesis. Here, we show that 20E signaling modulates the transcription of several DC participants in the amnioserosa and other dorsal tissues during late embryonic development, including zipper, which encodes non-muscle myosin. Canonical ecdysone signaling typically involves the binding of Ecdysone receptor (EcR) and Ultraspiracle heterodimers to ecdysone-response elements (EcREs) within the promoters of responsive genes to drive expression. During DC, however, we provide evidence that 20E signaling instead acts in parallel to the JNK cascade via a direct interaction between EcR and the AP-1 transcription factor subunit, Jun, which together bind to genomic regions containing AP-1 binding sites but no EcREs to control gene expression. Our work demonstrates a novel mode of action for 20E signaling in Drosophila that likely functions beyond DC, and may provide further insights into mammalian steroid hormone receptor interactions with AP-1.


2021 ◽  
Vol 3 (4) ◽  
Author(s):  
Jason P Smith ◽  
M Ryan Corces ◽  
Jin Xu ◽  
Vincent P Reuter ◽  
Howard Y Chang ◽  
...  

Abstract: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. BSD2-licensed code and documentation are available at https://pepatac.databio.org.


2019 ◽  
Vol 23 (15) ◽  
pp. 1663-1670 ◽  
Author(s):  
Chunyan Ao ◽  
Shunshan Jin ◽  
Yuan Lin ◽  
Quan Zou

Protein methylation is an important and reversible post-translational modification that regulates many biological processes in cells. It occurs mainly on lysine and arginine residues and involves many important biological processes, including transcriptional activity, signal transduction, and the regulation of gene expression. Protein methylation and its regulatory enzymes are related to a variety of human diseases, so improved identification of methylation sites is useful for designing drugs for a variety of related diseases. In this review, we systematically summarize and analyze the tools used for the prediction of protein methylation sites on arginine and lysine residues over the last decade.


Author(s):  
Rianne R. Campbell ◽  
Siwei Chen ◽  
Joy H. Beardwood ◽  
Alberto J. López ◽  
Lilyana V. Pham ◽  
...  

Abstract: During the initial stages of drug use, cocaine-induced neuroadaptations within the ventral tegmental area (VTA) are critical for drug-associated cue learning and drug reinforcement processes. These neuroadaptations occur, in part, from alterations to the transcriptome. Although cocaine-induced transcriptional mechanisms within the VTA have been examined, various regimens and paradigms have been employed to examine candidate target genes. In order to identify key genes and biological processes regulating cocaine-induced processes, we employed genome-wide RNA-sequencing to analyze transcriptional profiles within the VTA from male mice that underwent one of four commonly used paradigms: acute home cage injections of cocaine, chronic home cage injections of cocaine, cocaine-conditioning, or intravenous self-administration of cocaine. We found that cocaine alters distinct sets of VTA genes within each exposure paradigm. Using behavioral measures from cocaine self-administering mice, we also found several genes whose expression patterns correlate with cocaine intake. In addition to overall gene expression levels, we identified several predicted upstream regulators of cocaine-induced transcription shared across all paradigms. Although distinct gene sets were altered across cocaine exposure paradigms, we found, from Gene Ontology (GO) term analysis, that biological processes important for energy regulation and synaptic plasticity were affected across all cocaine paradigms. Coexpression analysis also identified gene networks that are altered by cocaine. These data indicate that cocaine alters networks enriched with glial cell markers of the VTA that are involved in gene regulation and synaptic processes. Our analyses demonstrate that transcriptional changes within the VTA depend on the route, dose and context of cocaine exposure, and highlight several biological processes affected by cocaine. Overall, these findings provide a unique resource of gene expression data for future studies examining novel cocaine gene targets that regulate drug-associated behaviors.

