Differential Enriched Scan 2 (DEScan2): a fast pipeline for broad peak analysis

The analysis of RNA-seq and BS-Seq data is now well established, whereas the analysis of broad-peak data, such as Sono-Seq/ATAC-Seq and histone modification (HM) ChIP-Seq, remains challenging. To fill this gap in existing methods, we present DEScan2, a novel Bioconductor package [2] for the analysis of broad-peak data. The method consists of three main steps: 1) a peak caller, 2) peak filtering and alignment across replicates, and 3) a method to efficiently compute a count matrix of the filtered peaks. Applied to a previously published ATAC-Seq chromatin accessibility dataset, our method shows interesting results, including in comparison with other well-known tools for this kind of data analysis.
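Steps 2 and 3 of the pipeline described above can be illustrated with a minimal Python sketch. This is not the DEScan2 API (DEScan2 is an R/Bioconductor package). The function names `filter_peaks` and `count_matrix`, the `min_samples` threshold, and the toy intervals are all assumptions made for illustration. The sketch merges overlapping peaks across replicates, keeps regions supported by a minimum number of replicates, and then counts the reads that overlap each retained region.

```python
def overlaps(a, b):
    """Return True if two (chrom, start, end) half-open intervals overlap."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def filter_peaks(peaks_by_sample, min_samples):
    """Merge overlapping peaks across samples and keep merged regions
    that are supported by at least `min_samples` distinct samples."""
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, peaks in peaks_by_sample.items()
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set_of_supporting_samples]
    for chrom, start, end, sample in tagged:
        if merged and merged[-1][0] == chrom and start < merged[-1][2]:
            # Peak overlaps the current merged region: extend it.
            merged[-1][2] = max(merged[-1][2], end)
            merged[-1][3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [(c, s, e) for c, s, e, support in merged if len(support) >= min_samples]


def count_matrix(regions, reads_by_sample):
    """Count, per sample, the reads overlapping each region.
    Rows are regions, columns are samples in sorted order."""
    samples = sorted(reads_by_sample)
    matrix = [
        [sum(overlaps(region, read) for read in reads_by_sample[s]) for s in samples]
        for region in regions
    ]
    return samples, matrix


# Hypothetical toy input: peaks called independently in three replicates.
peaks = {
    "rep1": [("chr1", 100, 500), ("chr1", 2000, 2600)],
    "rep2": [("chr1", 300, 800)],
    "rep3": [("chr1", 450, 900), ("chr2", 50, 400)],
}
# Hypothetical toy reads as (chrom, start, end) intervals.
reads = {
    "rep1": [("chr1", 120, 170), ("chr1", 880, 930), ("chr1", 2100, 2150)],
    "rep2": [("chr1", 400, 450)],
    "rep3": [],
}

regions = filter_peaks(peaks, min_samples=2)
samples, matrix = count_matrix(regions, reads)
print(regions)
print(samples, matrix)
```

In this toy example only the chr1 region that is seen in several replicates survives filtering; peaks present in a single replicate are dropped. A real implementation would work on alignment files and use an interval index rather than the quadratic scan used here.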

10.7287/peerj.preprints.27357v1 ◽

2018 ◽

Cited By ~ 1

Author(s):

Dario Righelli ◽

John Koberstein ◽

Nancy Zhang ◽

Claudia Angelini ◽

Lucia Peixoto ◽

...

Keyword(s):

Data Analysis ◽

Histone Modification ◽

Chromatin Accessibility ◽

Broad Peak ◽

Bioconductor Package ◽

Rna Seq ◽

Peak Caller

Advancing clinical genomics and precision medicine with GVViZ: FAIR bioinformatics platform for variable gene-disease annotation, visualization, and expression analysis

Human Genomics ◽

10.1186/s40246-021-00336-1 ◽

2021 ◽

Vol 15 (1) ◽

Author(s):

Zeeshan Ahmed ◽

Eduard Gibert Renart ◽

Saman Zeeshan ◽

XinQi Dong

Keyword(s):

Data Analysis ◽

Patient Care ◽

Expression Analysis ◽

High Throughput ◽

Gene Annotation ◽

Next Generation Sequencing Data ◽

Rna Seq ◽

Sequencing Data ◽

Complex Disorders ◽

Transcriptomics Data

Abstract

Background: Genetic disposition is considered critical for identifying subjects at high risk for disease development. Investigating disease-causing and high and low expressed genes can support finding the root causes of uncertainties in patient care. However, independent and timely high-throughput next-generation sequencing data analysis is still a challenge for non-computational biologists and geneticists.

Results: In this manuscript, we present a findable, accessible, interactive, and reusable (FAIR) bioinformatics platform, i.e., GVViZ (visualizing genes with disease-causing variants). GVViZ is a user-friendly, cross-platform, and database application for RNA-seq-driven variable and complex gene-disease data annotation and expression analysis with a dynamic heat map visualization. GVViZ has the potential to find patterns across millions of features and extract actionable information, which can support the early detection of complex disorders and the development of new therapies for personalized patient care. The execution of GVViZ is based on a set of simple instructions that users without a computational background can follow to design and perform customized data analysis. It can assimilate patients’ transcriptomics data with the public, proprietary, and our in-house developed gene-disease databases to query, easily explore, and access information on gene annotation and classified disease phenotypes with greater visibility and customization. To test its performance and understand the clinical and scientific impact of GVViZ, we present GVViZ analysis for different chronic diseases and conditions, including Alzheimer’s disease, arthritis, asthma, diabetes mellitus, heart failure, hypertension, obesity, osteoporosis, and multiple cancer disorders. The results are visualized using GVViZ and can be exported as image (PNG/TIFF) and text (CSV) files that include gene names, Ensembl (ENSG) IDs, quantified abundances, expressed transcript lengths, and annotated oncology and non-oncology diseases.

Conclusions: We emphasize that automated and interactive visualization should be an indispensable component of modern RNA-seq analysis, which is currently not the case. However, experts in clinics and researchers in life sciences can use GVViZ to visualize and interpret the transcriptomics data, making it a powerful tool to study the dynamics of gene expression and regulation. Furthermore, with successful deployment in clinical settings, GVViZ has the potential to enable high-throughput correlations between patient diagnoses based on clinical and transcriptomics data.

Expression and Co-expression Analyses of WRKY, MYB, bHLH and bZIP Transcription Factor Genes in Potato (Solanum tuberosum) Under Abiotic Stress Conditions: RNA-seq Data Analysis

Potato Research ◽

10.1007/s11540-021-09502-3 ◽

2021 ◽

Author(s):

Ertugrul Filiz ◽

Firat Kurt

Keyword(s):

Transcription Factor ◽

Abiotic Stress ◽

Solanum Tuberosum ◽

Data Analysis ◽

Stress Conditions ◽

Bzip Transcription Factor ◽

Rna Seq ◽

Transcription Factor Genes

Impact of RNA-seq data analysis algorithms on gene expression estimation and downstream prediction

Scientific Reports ◽

10.1038/s41598-020-74567-y ◽

2020 ◽

Vol 10 (1) ◽

Author(s):

Li Tong ◽

Po-Yen Wu ◽

John H. Phan ◽

Hamid R. Hassazadeh ◽

...

Keyword(s):

Gene Expression ◽

Data Analysis ◽

Disease Outcome ◽

Rna Seq ◽

Next Generation Sequencing Technology ◽

Normalization Methods ◽

Sequencing Quality ◽

Improved Accuracy

Abstract To use next-generation sequencing technology such as RNA-seq for medical and health applications, choosing proper analysis methods for biomarker identification remains a critical challenge for most users. The US Food and Drug Administration (FDA) has led the Sequencing Quality Control (SEQC) project to conduct a comprehensive investigation of 278 representative RNA-seq data analysis pipelines consisting of 13 sequence mapping, three quantification, and seven normalization methods. In this article, we focused on the impact of the joint effects of RNA-seq pipelines on gene expression estimation as well as the downstream prediction of disease outcomes. First, we developed and applied three metrics (i.e., accuracy, precision, and reliability) to quantitatively evaluate each pipeline’s performance on gene expression estimation. We then investigated the correlation between the proposed metrics and the downstream prediction performance using two real-world cancer datasets (i.e., SEQC neuroblastoma dataset and the NIH/NCI TCGA lung adenocarcinoma dataset). We found that RNA-seq pipeline components jointly and significantly impacted the accuracy of gene expression estimation, and its impact was extended to the downstream prediction of these cancer outcomes. Specifically, RNA-seq pipelines that produced more accurate, precise, and reliable gene expression estimation tended to perform better in the prediction of disease outcome. In the end, we provided scenarios as guidelines for users to use these three metrics to select sensible RNA-seq pipelines for the improved accuracy, precision, and reliability of gene expression estimation, which lead to the improved downstream gene expression-based prediction of disease outcome.
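The abstract above describes correlating pipeline-level metrics with downstream prediction performance, without stating which correlation coefficient was used. As one possible illustration, the sketch below computes a Spearman rank correlation in plain Python. The choice of Spearman, the function names, and the numbers are assumptions. The values are made-up toy data, not results from the study.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy values for five pipelines (NOT from the study):
# a per-pipeline metric score and a downstream prediction score.
metric_score = [0.61, 0.72, 0.55, 0.80, 0.68]
prediction_score = [0.70, 0.73, 0.66, 0.79, 0.74]

rho = spearman(metric_score, prediction_score)
print(round(rho, 3))
```

A rank-based coefficient is used here because it depends only on the ordering of pipelines, not on the scale of either score.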

BioJupies: Automated Generation of Interactive Notebooks for RNA-Seq Data Analysis in the Cloud

Cell Systems ◽

10.1016/j.cels.2018.10.007 ◽

2018 ◽

Vol 7 (5) ◽

pp. 556-561.e3 ◽

Cited By ~ 61

Author(s):

Denis Torre ◽

Alexander Lachmann ◽

Avi Ma’ayan

Keyword(s):

Data Analysis ◽

Rna Seq ◽

Automated Generation

RNA-Seq Data Analysis (Bowtie-TopHat-Cufflinks) v3 (protocols.io.x9qfr5w)

protocols.io ◽

10.17504/protocols.io.x9qfr5w ◽

2019 ◽

Author(s):

Kiichi Hirota

Keyword(s):

Data Analysis ◽

Rna Seq

Chromatin Accessibility Mapping of Primary Erythroid Cell Populations Leads to Identification and Validation of Nuclear Factor I X (NFIX) As a Novel Fetal Hemoglobin (HbF) Repressor

Blood ◽

10.1182/blood-2019-124337 ◽

2019 ◽

Vol 134 (Supplement_1) ◽

pp. 812-812

Author(s):

Mudit Chaand ◽

Chris Fiore ◽

Brian T Johnston ◽

Diane H Moon ◽

John P Carulli ◽

...

Keyword(s):

Gene Expression ◽

Globin Gene ◽

Chromatin Accessibility ◽

Cell Populations ◽

Rna Seq ◽

Globin Mrna ◽

Erythroid Maturation ◽

Globin Gene Expression

Human beta-like globin gene expression is developmentally regulated. Erythroblasts (EBs) derived from fetal tissues, such as umbilical cord blood (CB), primarily express gamma globin mRNA (HBG) and HbF, while EBs derived from adult tissues, such as bone marrow (BM), predominantly express beta globin mRNA (HBB) and adult hemoglobin. Human genetics has validated de-repression of HBG in adult EBs as a powerful therapeutic paradigm in diseases involving defective HBB, such as sickle cell anemia. To identify novel factors involved in the switch from HBG to HBB expression, and to better understand the global regulatory networks driving the fetal and adult cell states, we performed transcriptome profiling (RNA-seq) and chromatin accessibility profiling (ATAC-seq) on sorted EB cell populations from CB or BM. This approach improves upon previous studies that used unsorted cells (Huang J, Dev Cell 2016) or that did not measure chromatin accessibility (Yan H, Am J Hematol 2018). CD34+ cells from CB and BM were differentiated using a 3-phase in vitro culture system (Giarratana M, Blood 2011). Fluorescence-activated cell sorting and the cell surface markers CD36 and GYPA were used to isolate 7 discrete populations, with each sorting gate representing increasingly mature, stage-matched EBs from CB or BM (Fig 1A, B). RNA-seq analysis revealed expected expression patterns of the beta-like globins, with total levels increasing during erythroid maturation and primarily composed of HBB or HBG transcripts in BM or CB, respectively (Fig 1C). Erythroid maturation led to progressive increases in chromatin accessibility at the HBB promoter in BM populations. In CB-derived cells, erythroid maturation led to progressive increases in chromatin accessibility at the HBG promoters through the CD36+GYPA+ stage (Pops 1-5). 
Chromatin accessibility shifted from the HBG promoters to the HBB promoter during the final stages of differentiation (Pops 6-7), suggesting that HBG gene activation is transient in CB EBs (Fig 1D). Hierarchical clustering and principal component analysis of ATAC-seq data revealed that cell populations cluster based on differentiation stage rather than by BM or CB lineage, suggesting most molecular changes are stage-specific, not lineage-specific (Fig 2A, B). To identify transcription factors driving cell state, and potentially beta-like globin expression preference, we searched for DNA binding motifs within regions of differential chromatin accessibility and found NFI factor motifs enriched under peaks that were larger in BM relative to CB (Fig 2C). Transcription factor footprinting analysis showed that both flanking accessibility and footprint depth at NFI motifs were also increased in BM relative to CB (Fig 2D). Increased chromatin accessibility was observed at the NFIX promoter in BM relative to CB populations, and in HUDEP-2 relative to HUDEP-1 cell lines (Fig 2E). Furthermore, accessibility at the NFIX promoter correlated with elevated NFIX mRNA in BM and HUDEP-2 relative to CB and HUDEP-1, respectively. Together these data implicated NFIX in HbF repression, a finding consistent with previous genome-wide association and DNA methylation studies that suggested a possible role for NFIX in regulating beta-like globin gene expression (Fabrice D, Nat Genet 2016; Lessard S, Genome Med 2015). To directly test the hypothesis that NFIX represses HbF, short hairpin RNAs were used to knockdown (KD) NFIX in primary erythroblasts derived from human CD34+ BM cells (Fig 3A). NFIX KD led to a time-dependent induction of HBG mRNA, HbF, and F-cells comparable to KD of the known HbF repressor BCL11A (Fig 3B-D). A similar effect on HbF was observed in HUDEP-2 cells following NFIX KD (Fig 3E). 
Consistent with HbF induction, NFIX KD also increased chromatin accessibility and decreased DNA methylation at the HBG promoters in primary EBs (Fig 3F, G). NFIX KD led to a delay in erythroid differentiation as measured by CD36 and GYPA expression (Fig 3H). Despite this delay, by day 14 a high proportion of fully enucleated erythroblasts was observed, suggesting NFIX KD cells are capable of terminal differentiation (Fig 3H). Collectively, these data have enabled identification and validation of NFIX as a novel repressor of HbF, a finding that enhances the understanding of beta-like globin gene regulation and has potential implications in the development of therapeutics for sickle cell disease.

Disclosures: Chaand: Syros Pharmaceuticals: Employment, Equity Ownership. Fiore: Syros Pharmaceuticals: Employment, Equity Ownership. Johnston: Syros Pharmaceuticals: Employment, Equity Ownership. Moon: Syros Pharmaceuticals: Employment, Equity Ownership. Carulli: Syros Pharmaceuticals: Employment, Equity Ownership. Shearstone: Syros Pharmaceuticals: Employment, Equity Ownership.

Human and rat skeletal muscle single-nuclei multi-omic integrative analyses nominate causal cell types, regulatory elements, and SNPs for complex traits

Genome Research ◽

10.1101/gr.268482.120 ◽

2021 ◽

Author(s):

Peter Orchard ◽

Nandini Manickam ◽

Christa Ventresca ◽

Swarooparani Vadlamudi ◽

Arushi Varshney ◽

...

Keyword(s):

Skeletal Muscle ◽

Muscle Cell ◽

Muscle Fiber ◽

Cell Types ◽

Chromatin Accessibility ◽

Skeletal Muscle Cell ◽

Rna Seq ◽

Rat Skeletal Muscle ◽

Integrative Analyses ◽

Single Nucleus

Skeletal muscle accounts for the largest proportion of human body mass, on average, and is a key tissue in complex diseases and mobility. It is composed of several different cell and muscle fiber types. Here, we optimize single-nucleus ATAC-seq (snATAC-seq) to map skeletal muscle cell–specific chromatin accessibility landscapes in frozen human and rat samples, and single-nucleus RNA-seq (snRNA-seq) to map cell-specific transcriptomes in human. We additionally perform multi-omics profiling (gene expression and chromatin accessibility) on human and rat muscle samples. We capture type I and type II muscle fiber signatures, which are generally missed by existing single-cell RNA-seq methods. We perform cross-modality and cross-species integrative analyses on 33,862 nuclei and identify seven cell types ranging in abundance from 59.6% to 1.0% of all nuclei. We introduce a regression-based approach to infer cell types by comparing transcription start site–distal ATAC-seq peaks to reference enhancer maps and show consistency with RNA-based marker gene cell type assignments. We find heterogeneity in enrichment of genetic variants linked to complex phenotypes from the UK Biobank and diabetes genome-wide association studies in cell-specific ATAC-seq peaks, with the most striking enrichment patterns in muscle mesenchymal stem cells (∼3.5% of nuclei). Finally, we overlay these chromatin accessibility maps on GWAS data to nominate causal cell types, SNPs, transcription factor motifs, and target genes for type 2 diabetes signals. These chromatin accessibility profiles for human and rat skeletal muscle cell types are a useful resource for nominating causal GWAS SNPs and cell types.
