OEFinder: A user interface to identify and visualize ordering effects in single-cell RNA-seq data

2015 ◽  
Author(s):  
Ning Leng ◽  
Jeea Choi ◽  
Li-Fang Chu ◽  
James Thomson ◽  
Christina Kendziorski ◽  
...  

A recent paper identified an artifact in multiple single-cell RNA-seq (scRNA-seq) data sets generated by the Fluidigm C1 platform. Specifically, Leng et al. showed significantly increased gene expression in cells captured from sites with small or large plate output IDs. We refer to this artifact as an ordering effect (OE). Including OE genes in downstream analyses could lead to biased results. To address this problem, we developed a statistical method and software called OEFinder that identifies OE genes and returns them as a sorted list. OEFinder is available as an R package along with user-friendly graphical interface implementations that allow users to check for potential artifacts in scRNA-seq data generated by the Fluidigm C1 platform.
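The abstract does not spell out OEFinder's test statistic. As a minimal sketch, assuming an ordering effect shows up as a U-shaped trend (higher expression at both small and large capture-site IDs), one can regress each gene's expression on the centered site ID and its square and rank genes by the quadratic coefficient. The helper names `ordering_effect_score` and `rank_oe_genes` are hypothetical and are not part of the OEFinder package.

```python
import numpy as np

def ordering_effect_score(expression, site_ids):
    """Hypothetical helper: fit expression ~ b0 + b1*x + b2*x^2 by least squares,
    where x is the centered and scaled capture-site ID, and return b2.
    A large positive b2 indicates a U shape (higher expression at both ends)."""
    x = np.asarray(site_ids, dtype=float)
    y = np.asarray(expression, dtype=float)
    # Center and scale positions so coefficients are comparable across plates
    x = (x - x.mean()) / x.std()
    design = np.column_stack([np.ones_like(x), x, x ** 2])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[2]

def rank_oe_genes(matrix, site_ids, gene_names):
    """Hypothetical helper: return gene names sorted by decreasing U-shape
    coefficient. `matrix` is genes x cells; `site_ids` gives each cell's site ID."""
    scores = [ordering_effect_score(row, site_ids) for row in matrix]
    order = np.argsort(scores)[::-1]
    return [gene_names[i] for i in order]
```

A real analysis would also need a significance test and a multiple-testing correction before calling a gene an OE gene; the coefficient alone only orders genes.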

2019 ◽  
Vol 47 (18) ◽  
pp. e111-e111 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

Abstract A key challenge in modeling single-cell RNA-seq data is to capture the diversity of gene expression states regulated by different transcriptional regulatory inputs across individual cells, which is further complicated by the large number of observed zero and low expression values. We developed a left truncated mixture Gaussian (LTMG) model from the kinetic relationships among transcriptional regulatory inputs, mRNA metabolism and mRNA abundance in single cells. LTMG infers the multimodality of expression across single cells, while dropouts and low expression values are treated as left truncated. We demonstrated that LTMG achieves a significantly better goodness of fit on a large number of scRNA-seq data sets compared with three other state-of-the-art models. Our biological assumption about low non-zero expression, the validity of the multimodality setting, and the capability of LTMG to extract expression states specific to cell types or functions are validated on independent experimental data sets. A differential gene expression test and a co-regulation module identification method are further developed. We experimentally validated that our differential expression test has higher sensitivity and specificity than five other popular methods. The co-regulation analysis is capable of retrieving gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package providing all of these analyses is available at https://github.com/zy26/LTMGSCA.


2018 ◽  
Author(s):  
Changlin Wan ◽  
Wennan Chang ◽  
Yu Zhang ◽  
Fenil Shah ◽  
Xiaoyu Lu ◽  
...  

Abstract A key challenge in modeling single-cell RNA-seq (scRNA-seq) data is to capture the diverse gene expression states regulated by different transcriptional regulatory inputs across single cells, which is further complicated by a large number of observed zero and low expression values. We developed a left truncated mixture Gaussian (LTMG) model that stems from the kinetic relationships between the transcriptional regulatory inputs, mRNA metabolism and gene expression abundance in a cell. LTMG infers the multimodality of expression across single cells, representing a gene's diverse expression states; meanwhile, dropouts and low expression values are treated as left truncated, specifically representing an expression state that is under suppression. We demonstrated that LTMG achieves a significantly better goodness of fit on a large number of single-cell data sets compared with three other state-of-the-art models. In addition, our systems-kinetics approach to handling low and zero expression values and the correctness of the identified multimodality are validated on several independent experimental data sets. Application to data from complex tissues demonstrated the capability of LTMG to extract varied expression states specific to cell types or cell functions. Based on LTMG, a differential gene expression test and a co-regulation module identification method, namely LTMG-DGE and LTMG-GCR, were further developed. We experimentally validated that LTMG-DGE has higher sensitivity and specificity in detecting differentially expressed genes than five other popular methods, and that LTMG-GCR is capable of retrieving the gene co-regulation modules corresponding to perturbed transcriptional regulations. A user-friendly R package providing all of these analyses is available at https://github.com/zy26/LTMGSCA.
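As a rough illustration of how low values can enter a likelihood without being modeled at their exact value, the sketch below scores one gene's log expression under a Gaussian mixture in which every value below a cutoff contributes only the probability of falling below that cutoff (a censoring-style treatment, which is our reading of "left truncated" here). The cutoff, mixture parameters and function names are assumptions for illustration, not the LTMGSCA implementation, and fitting the parameters (e.g. by EM) is omitted.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2)."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

def left_truncated_mixture_loglik(values, weights, means, sds, cutoff):
    """Hypothetical helper: log-likelihood of `values` (e.g. log expression of one
    gene across cells) under a Gaussian mixture, where any value below `cutoff`
    (dropouts and low expression) contributes only the mixture probability mass
    below the cutoff instead of a density at its exact value."""
    total = 0.0
    for v in values:
        if v < cutoff:
            # Low or zero expression: only "somewhere below the cutoff" is used
            p = sum(w * normal_cdf(cutoff, m, s) for w, m, s in zip(weights, means, sds))
        else:
            # Observed expression above the cutoff: ordinary mixture density
            p = sum(w * normal_pdf(v, m, s) for w, m, s in zip(weights, means, sds))
        total += math.log(p)
    return total
```

Because every sub-cutoff value contributes the same term, a true zero and a very low count are treated identically, which is the point of handling the low end separately.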


2019 ◽  
Author(s):  
Marcus Alvarez ◽  
Elior Rahmani ◽  
Brandon Jew ◽  
Kristina M. Garske ◽  
Zong Miao ◽  
...  

Abstract Single-nucleus RNA sequencing (snRNA-seq) measures gene expression in individual nuclei instead of cells, allowing for unbiased cell type characterization in solid tissues. We observe that, unlike single-cell RNA-seq (scRNA-seq), snRNA-seq is commonly subject to contamination by high amounts of extranuclear background RNA, which, if overlooked, can lead to the identification of spurious cell types in downstream clustering analyses. We present a novel approach to remove debris-contaminated droplets in snRNA-seq experiments, called Debris Identification using Expectation Maximization (DIEM). Our likelihood-based approach models the gene expression distributions of debris and cell types, which are estimated using EM. We evaluated DIEM using three snRNA-seq data sets: 1) human differentiating preadipocytes in vitro, 2) fresh mouse brain tissue, and 3) human frozen adipose tissue (AT) from six individuals. All three data sets showed various degrees of extranuclear RNA contamination. We observed that existing methods fail to account for contaminated droplets and lead to spurious cell types. Compared with filtering using these state-of-the-art methods, DIEM better removed droplets containing high levels of extranuclear RNA and led to higher-quality clusters. Although DIEM was designed for snRNA-seq data, we also successfully applied DIEM to single-cell data. In conclusion, DIEM removes debris-contaminated droplets from single-cell-based data quickly and effectively, leading to cleaner downstream analysis. Our code is freely available at https://github.com/marcalva/diem.
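DIEM's exact model is given in the paper rather than the abstract. To illustrate the general likelihood-based idea, the sketch below runs EM for a simple mixture of multinomials over droplet gene counts, where one component is meant to capture debris and the other nuclei. The function name, the hard initial labels (for instance, low-count droplets as debris) and the pseudocount are our assumptions, not DIEM's API.

```python
import numpy as np

def multinomial_mixture_em(counts, init_labels, n_iter=50, pseudocount=1e-3):
    """Hypothetical helper: EM for a mixture of multinomials over gene counts.
    counts: droplets x genes array of counts.
    init_labels: initial component per droplet (e.g. 0 = debris, 1 = nucleus).
    Returns (mixing weights, per-component gene profiles, responsibilities)."""
    counts = np.asarray(counts, dtype=float)
    n_droplets = counts.shape[0]
    init_labels = np.asarray(init_labels, dtype=int)
    k = init_labels.max() + 1
    # Start from the hard assignments supplied by the caller
    resp = np.zeros((n_droplets, k))
    resp[np.arange(n_droplets), init_labels] = 1.0
    for _ in range(n_iter):
        # M-step: mixing weights (tiny floor avoids log(0)) and gene profiles
        weights = resp.mean(axis=0) + 1e-12
        weights /= weights.sum()
        profiles = resp.T @ counts + pseudocount
        profiles /= profiles.sum(axis=1, keepdims=True)
        # E-step: log prior + multinomial log-likelihood per droplet and component;
        # the multinomial coefficient is identical across components and cancels
        log_post = counts @ np.log(profiles).T + np.log(weights)
        log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(log_post)
        resp /= resp.sum(axis=1, keepdims=True)
    return weights, profiles, resp
```

Droplets whose responsibility for the debris component exceeds, say, 0.5 would then be candidates for removal before clustering. DIEM models debris alongside several cell types, which this two-component sketch does not.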


2021 ◽  
Author(s):  
Fei Wu ◽  
Yaozhong Liu ◽  
Binhua Ling

RNA-seq data contain not only host transcriptomes but also non-host information, including transcripts from active microbiota in the host cells. Therefore, metatranscriptomics can reveal gene expression of the entire microbial community in a given sample. However, there is no single tool that can simultaneously analyze host-microbiota interactions and quantify the microbiome at the single-cell level, particularly for users with limited bioinformatics expertise. Here, we developed a novel software program that can comprehensively and synergistically analyze gene expression of the host and microbiome, as well as their association, using bulk and single-cell RNA-seq data. Our pipeline, named Meta-Transcriptome Detector (MTD), can identify and quantify a broad range of microbiome components, including viruses, bacteria, protozoa, fungi, plasmids, and vectors. MTD is easy to install and user-friendly. This software enables researchers to study the interactions between microbiota and the host by analyzing gene expression and pathways, providing further insights into host responses to microorganisms.


2020 ◽  
Author(s):  
Xianjun Dong ◽  
Xiaoqi Li ◽  
Tzuu-Wang Chang ◽  
Scott T Weiss ◽  
Weiliang Qiu

Genome-wide association studies (GWAS) have revealed thousands of genetic loci for common diseases. One of the main challenges in the post-GWAS era is to understand the causality of the genetic variants. Expression quantitative trait locus (eQTL) analysis has proven to be an effective way to address this question by examining the relationship between gene expression and genetic variation in a sufficiently powered cohort. However, it is often difficult to determine the sample size at which a variant with a given allele frequency can be detected as associated with gene expression with sufficient power. This is particularly challenging in single-cell RNA-seq studies. Therefore, a user-friendly tool to perform power analysis for eQTL studies at both the bulk tissue and single-cell level is critical. Here, we present an R package called powerEQTL with flexible functions to calculate power, minimal sample size, or detectable minor allele frequency in both bulk tissue and single-cell eQTL analysis. A user-friendly, programming-free web application is also provided, allowing users to calculate and visualize the parameters interactively.
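For intuition about the quantities powerEQTL computes, the sketch below uses a textbook normal approximation for a bulk-tissue design: expression is regressed on an additively coded genotype (0, 1 or 2 copies of the minor allele) whose variance under Hardy-Weinberg equilibrium is 2p(1-p) for minor allele frequency p, and a two-sided Wald test is assumed. The function names and the approximation are our assumptions; powerEQTL's own functions, particularly for single-cell designs, may use different models.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def eqtl_power(n, maf, beta, sigma, alpha=0.05):
    """Hypothetical helper: approximate power to detect an additive eQTL effect
    `beta` (change in mean expression per allele) with residual SD `sigma`,
    sample size `n` and minor allele frequency `maf` (two-sided test at `alpha`)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected z statistic: beta / SE(beta), with SE ~ sigma / sqrt(n * Var(genotype))
    noncentrality = abs(beta) * math.sqrt(n * 2 * maf * (1 - maf)) / sigma
    # Probability the statistic lands in either rejection region
    return STD_NORMAL.cdf(noncentrality - z_crit) + STD_NORMAL.cdf(-noncentrality - z_crit)

def eqtl_min_sample_size(maf, beta, sigma, power=0.8, alpha=0.05):
    """Hypothetical helper: smallest n reaching `power` under the same
    approximation (the negligible opposite rejection tail is ignored)."""
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_pow = STD_NORMAL.inv_cdf(power)
    n = ((z_crit + z_pow) * sigma / beta) ** 2 / (2 * maf * (1 - maf))
    return math.ceil(n)
```

For example, with maf=0.2, beta=0.5 and sigma=1.0, `eqtl_min_sample_size` returns 99 under this approximation. Solving the same relation for p instead of n would give the detectable minor allele frequency, the third quantity listed in the abstract.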


BMC Genomics ◽  
2020 ◽  
Vol 21 (S11) ◽  
Author(s):  
Yingying Cao ◽  
Simo Kitanovski ◽  
Daniel Hoffmann

Abstract Background RNA-Seq, the high-throughput sequencing (HT-Seq) of mRNAs, has become an essential tool for characterizing gene expression differences between different cell types and conditions. Gene expression is regulated by several mechanisms, including epigenetic regulation by post-translational histone modifications, which can be assessed by ChIP-Seq (Chromatin Immuno-Precipitation Sequencing). As more and more biological samples are analyzed by the combination of ChIP-Seq and RNA-Seq, the integrated analysis of the corresponding data sets becomes, in principle, a unique opportunity to study gene regulation. However, technically, such analyses are still in their infancy. Results Here we introduce intePareto, a computational tool for the integrative analysis of RNA-Seq and ChIP-Seq data. With intePareto we match RNA-Seq and ChIP-Seq data at the level of genes, perform differential expression analysis between biological conditions, and prioritize genes with consistent changes in RNA-Seq and ChIP-Seq data using Pareto optimization. Conclusion intePareto facilitates a comprehensive understanding of high-dimensional transcriptomic and epigenomic data. Its superiority over a naive differential gene expression analysis with RNA-Seq and over an available integrative approach is demonstrated by analyzing a public dataset.
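To make the Pareto step concrete, the sketch below sorts genes into successive Pareto fronts when each gene is summarized by two scores to be maximized, for example an RNA-Seq log fold change and a ChIP-Seq log fold change for an activating histone mark. The choice of scores, the dictionary layout and the function names are hypothetical; intePareto's own matching, scoring and interface are not reproduced here.

```python
def dominates(a, b):
    """True if point a is at least as good as b in every objective and strictly
    better in at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_fronts(scores):
    """Hypothetical helper: split genes into successive Pareto fronts.
    scores: dict mapping gene name -> tuple of objective values.
    Returns a list of fronts (sorted lists of gene names), best front first."""
    remaining = dict(scores)
    fronts = []
    while remaining:
        # Genes not dominated by any other remaining gene form the next front
        front = [g for g, s in remaining.items()
                 if not any(dominates(t, s) for h, t in remaining.items() if h != g)]
        fronts.append(sorted(front))
        for g in front:
            del remaining[g]
    return fronts
```

Genes on the first front are those for which no other gene shows a stronger change in one data type without a weaker change in the other, which is what "consistent changes in RNA-Seq and ChIP-Seq" rewards.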


2019 ◽  
Author(s):  
Wenbo Guo ◽  
Dongfang Wang ◽  
Shicheng Wang ◽  
Yiran Shan ◽  
Jin Gu

Abstract Summary Molecular heterogeneity brings great challenges for cancer diagnosis and treatment. Recent advances in single-cell RNA sequencing (scRNA-seq) technology make it possible to study cancer transcriptomic heterogeneity at the single-cell level. Here, we develop an R package named scCancer, which focuses on processing and analyzing scRNA-seq data for cancer research. Beyond basic data processing steps, this package includes several features that address cancer-specific characteristics. First, the package introduces comprehensive quality control metrics. Second, it uses a data-driven machine learning algorithm to accurately identify major cancer microenvironment cell populations. Third, it estimates a malignancy score to classify malignant (cancerous) and non-malignant cells. Then, it analyzes intra-tumor heterogeneity by key cellular phenotypes (such as cell cycle and stemness) and gene signatures. Finally, a user-friendly graphical report is generated for all the analyses. Availability http://lifeome.net/software/sccancer/


2020 ◽  
Author(s):  
Daniel Dimitrov ◽  
Quan Gu

Abstract RNA sequencing is a high-throughput sequencing technique that has become an indispensable research tool in a broad range of transcriptome analysis studies. The most common application of RNA sequencing is differential expression analysis, which is used to determine genetic loci with distinct expression across different conditions. In contrast, the emerging field of single-cell RNA sequencing is used for transcriptome profiling at the individual cell level. The standard protocols for both types of analysis include the processing of sequencing libraries and result in the generation of count matrices. An obstacle to these analyses and to obtaining meaningful results is that both require programming expertise. BingleSeq was developed as an intuitive application that provides a user-friendly solution for the analysis of count matrices produced by both bulk and single-cell RNA-seq experiments. This was achieved by building an interactive dashboard-like user interface and incorporating three state-of-the-art software packages for each type of analysis, alongside additional features such as key visualisation techniques, functional gene annotation analysis and rank-based consensus for differential expression results, among others. As a result, BingleSeq puts the best and most widely used packages and tools for RNA-seq analysis at the fingertips of biologists with no programming experience.
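The abstract mentions a rank-based consensus of differential expression results without giving the scheme. A simple possibility, shown below as an assumption rather than BingleSeq's actual method, is mean-rank (Borda-style) aggregation of the ranked gene lists returned by different packages; the function name and the rule for genes missing from a list are hypothetical.

```python
def consensus_ranking(rankings):
    """Hypothetical helper: combine ranked gene lists (best first) into one
    consensus list by average rank. A gene missing from a list is given rank
    len(list) + 1 in that list. Ties are broken alphabetically."""
    genes = set().union(*rankings)

    def avg_rank(gene):
        total = 0
        for ranking in rankings:
            if gene in ranking:
                total += ranking.index(gene) + 1
            else:
                total += len(ranking) + 1
        return total / len(rankings)

    return sorted(genes, key=lambda g: (avg_rank(g), g))
```

Mean-rank aggregation is easy to explain but sensitive to list length; more formal rank-aggregation statistics exist if significance is needed.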


Author(s):  
Tobias Tekath ◽  
Martin Dugas

Abstract Motivation Each year, the number of published bulk and single-cell RNA-seq data sets grows exponentially. Studies analyzing such data commonly look at gene-level differences, while the collected RNA-seq data inherently represent reads of transcript isoform sequences. Using transcriptomic quantifiers, RNA-seq reads can be attributed to specific isoforms, allowing for the analysis of transcript-level differences. A differential transcript usage (DTU) analysis tests for proportional differences in a gene's transcript composition and has been of rising interest for many research questions, such as the analysis of differential splicing or cell type identification. Results We present the R package DTUrtle, the first DTU analysis workflow for both bulk and single-cell RNA-seq data sets, and the first package to conduct a ‘classical’ DTU analysis in a single-cell context. DTUrtle extends established statistical frameworks and offers various result aggregation and visualization options as well as a novel detection probability score for tagged-end data. It has been successfully applied to bulk and single-cell RNA-seq data from human and mouse, confirming and extending key results. Additionally, we present novel potential DTU applications, such as the identification of cell type-specific transcript isoforms as biomarkers. Availability The R package DTUrtle is available at https://github.com/TobiTekath/DTUrtle with extensive vignettes and documentation at https://tobitekath.github.io/DTUrtle/. Supplementary information Supplementary data are available at Bioinformatics online.
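Because DTU concerns proportions rather than totals, a gene can change its overall expression without any change in transcript usage. The sketch below computes, for a single gene, each transcript's share of the gene's counts in two groups and the per-transcript difference in those shares. It is a descriptive illustration with hypothetical function names; DTUrtle's statistical testing, which extends established frameworks, is not reproduced.

```python
def transcript_proportions(counts):
    """Hypothetical helper: counts is a dict transcript -> count for one gene in
    one group. Returns dict transcript -> share of the gene's total count."""
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: c / total for t, c in counts.items()}

def proportion_differences(group_a, group_b):
    """Hypothetical helper: per-transcript usage difference (share in B minus
    share in A) for a single gene; transcripts absent from a group count as 0."""
    pa = transcript_proportions(group_a)
    pb = transcript_proportions(group_b)
    transcripts = sorted(set(pa) | set(pb))
    return {t: pb.get(t, 0.0) - pa.get(t, 0.0) for t in transcripts}
```

Scaling every count of a gene by the same factor leaves these differences at zero, which is why DTU and gene-level differential expression can disagree.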

