Quality Control Metrics at Different Stages of Genomic Assembly in the Parallel Sequencing Using the Nanofor SPS

Motivation: As chromatin accessibility data from ATAC-seq experiments continues to expand, there is a continuing need for standardized analysis pipelines. Here, we present PEPATAC, an ATAC-seq pipeline that is easily applied to ATAC-seq projects of any size, from one-off experiments to large-scale sequencing projects. Results: PEPATAC leverages unique features of ATAC-seq data to optimize for speed and accuracy, and it provides several unique analytical approaches. Output includes convenient quality control plots, summary statistics, and a variety of generally useful data formats to set the groundwork for subsequent project-specific data analysis. Downstream analysis is simplified by a standard definition format, modularity of components, and metadata APIs in R and Python. It is restartable, fault-tolerant, and can be run on local hardware, using any cluster resource manager, or in provided Linux containers. We also demonstrate the advantage of aligning to the mitochondrial genome serially, which improves the accuracy of alignment statistics and quality control metrics. PEPATAC is a robust and portable first step for any ATAC-seq project. Availability: BSD2-licensed code and documentation at https://pepatac.databio.org.


Genomics informatics - Quality control metrics for DNA sequencing

10.3403/30375446 ◽

2020 ◽

Keyword(s):

Quality Control ◽

DNA Sequencing ◽

Quality Control Metrics


Comprehensive generation, visualization, and reporting of quality control metrics for single-cell RNA sequencing data

10.1101/2020.11.16.385328 ◽

2020 ◽

Author(s):

Rui Hong ◽

Yusuke Koga ◽

Shruthi Bandyadka ◽

Anastasia Leshchyk ◽

Zhe Wang ◽

...

Keyword(s):

Quality Control ◽

Single Cell ◽

RNA Sequencing ◽

Sequencing Data ◽

Programming Environments ◽

Marker Selection ◽

Quality Control Metrics ◽

Single Cell RNA Sequencing ◽

Standard Quality ◽

Downstream Analysis

Abstract: Performing comprehensive quality control is necessary to remove technical or biological artifacts in single-cell RNA sequencing (scRNA-seq) data. Artifacts in the scRNA-seq data, such as doublets or ambient RNA, can also hinder downstream clustering and marker selection and need to be assessed. While several algorithms have been developed to perform various quality control tasks, they are only available in different packages across various programming environments. No standardized workflow has been developed to streamline the generation and reporting of all quality control metrics from these tools. We have built an easy-to-use pipeline, named SCTK-QC, in the singleCellTK package that generates a comprehensive set of quality control metrics from a plethora of packages for quality control. We are able to import data from several preprocessing tools including CellRanger, STARSolo, BUSTools, dropEST, Optimus, and SEQC. Standard quality control metrics for each cell are calculated including the total number of UMIs, total number of genes detected, and the percentage of counts mapping to predefined gene sets such as mitochondrial genes. Doublet detection algorithms employed include scrublet, scds, doubletCells, and doubletFinder. DecontX is used to identify contamination in each individual cell. To make the data accessible in downstream analysis workflows, the results can be exported to common data structures in R and Python or to text files for use in any generic workflow. Overall, this pipeline will streamline and standardize quality control analyses for single cell RNA-seq data across different platforms.
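The three per-cell metrics named in the abstract (total UMIs, genes detected, and the percentage of counts in a predefined gene set) can be illustrated with a minimal sketch. This is not the SCTK-QC implementation: the function name `per_cell_qc`, the genes-by-cells matrix layout, and the "MT-" prefix used to pick mitochondrial genes are all assumptions made for illustration.

```python
import numpy as np

def per_cell_qc(counts, gene_names, mito_prefix="MT-"):
    """Compute simple per-cell QC metrics from a genes x cells count matrix.

    Hypothetical helper, not part of singleCellTK. Assumes rows are genes,
    columns are cells, and mitochondrial genes share a name prefix.
    """
    counts = np.asarray(counts)
    # Boolean mask marking the predefined gene set (here: mitochondrial genes).
    is_mito = np.array([g.upper().startswith(mito_prefix) for g in gene_names])

    total_umis = counts.sum(axis=0)            # total counts per cell
    genes_detected = (counts > 0).sum(axis=0)  # genes with at least one count
    mito_counts = counts[is_mito, :].sum(axis=0)

    # Guard against empty cells to avoid division by zero.
    pct_mito = np.where(
        total_umis > 0,
        100.0 * mito_counts / np.maximum(total_umis, 1),
        0.0,
    )
    return {
        "total_umis": total_umis,
        "genes_detected": genes_detected,
        "pct_mito": pct_mito,
    }

# Toy example: 3 genes, 2 cells.
genes = ["MT-CO1", "ACTB", "GAPDH"]
matrix = [[1, 0],
          [3, 0],
          [0, 5]]
qc = per_cell_qc(matrix, genes)
```

In this toy input the first cell has 4 counts across 2 genes with 25% mitochondrial counts, and the second has 5 counts from a single non-mitochondrial gene.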


CHIPS: A Snakemake pipeline for quality control and reproducible processing of chromatin profiling data

F1000Research ◽

10.12688/f1000research.52878.1 ◽

2021 ◽

Vol 10 ◽

pp. 517

Author(s):

Len Taing ◽

Gali Bai ◽

Clara Cousins ◽

Paloma Cejas ◽

Xintao Qiu ◽

...

Keyword(s):

Quality Control ◽

Motif Finding ◽

Biological Processes ◽

Chain Reaction ◽

Quality Control Metrics ◽

Regulatory Potential ◽

Polymerase Chain ◽

Downstream Analysis ◽

Chromatin Profiling ◽

Genomic Regions

Motivation: The chromatin profile measured by ATAC-seq, ChIP-seq, or DNase-seq experiments can identify genomic regions critical in regulating gene expression and provide insights on biological processes such as diseases and development. However, quality control and processing chromatin profiling data involves many steps, and different bioinformatics tools are used at each step. It can be challenging to manage the analysis. Results: We developed a Snakemake pipeline called CHIPS (CHromatin enrIchment ProcesSor) to streamline the processing of ChIP-seq, ATAC-seq, and DNase-seq data. The pipeline supports single- and paired-end data and is flexible to start with FASTQ or BAM files. It includes basic steps such as read trimming, mapping, and peak calling. In addition, it calculates quality control metrics such as contamination profiles, polymerase chain reaction bottleneck coefficient, the fraction of reads in peaks, percentage of peaks overlapping with the union of public DNaseI hypersensitivity sites, and conservation profile of the peaks. For downstream analysis, it carries out peak annotations, motif finding, and regulatory potential calculation for all genes. The pipeline ensures that the processing is robust and reproducible. Availability: CHIPS is available at https://github.com/liulab-dfci/CHIPS.
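Two of the quality control metrics listed above, the polymerase chain reaction bottleneck coefficient and the fraction of reads in peaks, can be sketched in a few lines. This is an illustration only and does not reproduce the CHIPS code: the bottleneck coefficient here follows the commonly used definition (locations covered by exactly one read divided by all distinct locations), which is an assumption about what the pipeline computes, and the function names `pbc1` and `frip` and the tuple-based read and peak formats are hypothetical.

```python
from collections import Counter

def pbc1(read_positions):
    """PCR bottleneck coefficient: locations seen exactly once / distinct locations.

    `read_positions` is an iterable of hashable location keys,
    e.g. (chrom, start, strand). Hypothetical input format.
    """
    location_counts = Counter(read_positions)
    distinct = len(location_counts)
    if distinct == 0:
        return float("nan")
    singletons = sum(1 for n in location_counts.values() if n == 1)
    return singletons / distinct

def frip(reads, peaks):
    """Fraction of reads overlapping any peak by at least one base.

    Reads and peaks are (chrom, start, end) tuples with half-open
    coordinates. Uses a simple quadratic scan, fine for small examples only.
    """
    if not reads:
        return float("nan")
    in_peaks = sum(
        1
        for (r_chrom, r_start, r_end) in reads
        if any(
            r_chrom == p_chrom and r_start < p_end and p_start < r_end
            for (p_chrom, p_start, p_end) in peaks
        )
    )
    return in_peaks / len(reads)

# Toy data: one duplicated location, one peak on chr1.
positions = [("chr1", 100, "+"), ("chr1", 100, "+"),
             ("chr1", 200, "+"), ("chr2", 50, "-")]
toy_reads = [("chr1", 100, 150), ("chr1", 300, 350),
             ("chr2", 10, 60), ("chr1", 190, 210)]
toy_peaks = [("chr1", 120, 200)]

pbc_value = pbc1(positions)
frip_value = frip(toy_reads, toy_peaks)
```

For real data the overlap test would normally be done with an interval index or a dedicated tool rather than a nested loop; the scan above is kept only for readability.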
