Zeros in scRNA-seq data: good or bad? How to embrace or tackle zeros in scRNA-seq data analysis?

2020 ◽  
Author(s):  
Ruochen Jiang ◽  
Tianyi Sun ◽  
Dongyuan Song ◽  
Jingyi Jessica Li

Abstract Single-cell RNA sequencing (scRNA-seq) technologies have revolutionized biomedical sciences by enabling genome-wide profiling of gene expression levels at an unprecedented single-cell resolution. A distinct characteristic of scRNA-seq data is the vast proportion of zeros unseen in bulk RNA-seq data. Researchers view these zeros differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as false signals or missing data to be corrected. As a result, the scRNA-seq field faces much controversy regarding how to handle zeros in data analysis. In this paper, we first discuss the origins of biological and non-biological zeros in scRNA-seq data. Second, we clarify the definitions of several commonly-used but ambiguous terms, including “dropouts,” “excess zeros,” and “zero inflation.” Third, we evaluate the impacts of non-biological zeros on cell clustering and differential gene expression analysis. Fourth, we summarize the advantages, disadvantages, and suitable users of three input data types: original counts, imputed counts, and binarized counts. Finally, we discuss the open questions regarding non-biological zeros, the need for benchmarking, and the importance of transparent analysis.

Circulation ◽  
2020 ◽  
Vol 142 (14) ◽  
pp. 1374-1388
Author(s):  
Yanming Li ◽  
Pingping Ren ◽  
Ashley Dawson ◽  
Hernan G. Vasquez ◽  
Waleed Ageedi ◽  
...  

Background: Ascending thoracic aortic aneurysm (ATAA) is caused by the progressive weakening and dilatation of the aortic wall and can lead to aortic dissection, rupture, and other life-threatening complications. To improve our understanding of ATAA pathogenesis, we aimed to comprehensively characterize the cellular composition of the ascending aortic wall and to identify molecular alterations in each cell population of human ATAA tissues. Methods: We performed single-cell RNA sequencing analysis of ascending aortic tissues from 11 study participants, including 8 patients with ATAA (4 women and 4 men) and 3 control subjects (2 women and 1 man). Cells extracted from aortic tissue were analyzed with single-cell RNA sequencing and categorized by cluster identification. ATAA-related changes were then examined by comparing the proportions of each cell type and the gene expression profiles between ATAA and control tissues. We also examined which genes may be critical for ATAA by integrating our single-cell RNA sequencing data with publicly available data from genome-wide association studies. Results: We identified 11 major cell types in human ascending aortic tissue; high-resolution reclustering of these cells further divided them into 40 subtypes. Multiple subtypes were observed for smooth muscle cells, macrophages, and T lymphocytes, suggesting that these cells have multiple functional populations in the aortic wall. In general, ATAA tissues had fewer nonimmune cells and more immune cells, especially T lymphocytes, than control tissues did. Differential gene expression data suggested the presence of extensive mitochondrial dysfunction in ATAA tissues. 
In addition, integrative analysis of our single-cell RNA sequencing data with public genome-wide association study data and promoter capture Hi-C data suggested that the erythroblast transformation-specific related gene (ERG) plays an important role in maintaining normal aortic wall function. Conclusions: Our study provides a comprehensive evaluation of the cellular composition of the ascending aortic wall and reveals how the gene expression landscape is altered in human ATAA tissue. The information from this study makes important contributions to our understanding of ATAA formation and progression.
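Comparing cell-type composition between groups starts from per-group proportions. The minimal sketch below assumes that each cell already carries a cluster label and computes those proportions together with their case-minus-control difference. The function names and labels are hypothetical, and a real analysis would usually compute proportions per donor and apply a statistical test rather than pool all cells.

```python
from collections import Counter


def cell_type_proportions(labels):
    """Fraction of cells assigned to each cell-type label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def proportion_shift(case_labels, control_labels):
    """Case-minus-control difference in cell-type proportions.

    A cell type absent from one group contributes a proportion of 0.0.
    """
    case = cell_type_proportions(case_labels)
    control = cell_type_proportions(control_labels)
    cell_types = sorted(set(case) | set(control))
    return {ct: case.get(ct, 0.0) - control.get(ct, 0.0) for ct in cell_types}
```

For example, with hypothetical labels `["T", "T", "SMC", "Mac"]` for cases and `["SMC", "SMC", "SMC", "T"]` for controls, the T-cell proportion rises by 0.25 and the smooth-muscle-cell proportion falls by 0.5.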


2003 ◽  
Vol 01 (03) ◽  
pp. 541-586 ◽  
Author(s):  
Tero Aittokallio ◽  
Markus Kurki ◽  
Olli Nevalainen ◽  
Tuomas Nikula ◽  
Anne West ◽  
...  

Microarray analysis has become a widely used method for generating gene expression data on a genomic scale. Microarrays have been enthusiastically applied in many fields of biological research, even though several open questions remain about the analysis of such data. A wide range of approaches is available for computational analysis, but no consensus exists on a standard protocol for microarray data analysis. Consequently, the choice of data analysis technique is a crucial decision that depends both on the data and on the goals of the experiment. Therefore, a basic understanding of bioinformatics is required for optimal experimental design and meaningful interpretation of the results. This review summarizes some of the common themes in DNA microarray data analysis, including data normalization and detection of differential expression. Algorithms are demonstrated by analyzing cDNA microarray data from an experiment monitoring gene expression in T helper cells. Several computational biology strategies and their relative merits are reviewed, and potential areas for further research are discussed. The goal of the review is to provide a computational framework for applying and evaluating such bioinformatics strategies. Solid knowledge of microarray informatics contributes to more efficient computational protocols for the data obtained from microarray experiments.
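As an illustration of the normalization step, the minimal sketch below applies global median centering to two-channel cDNA microarray log-ratios. It assumes that most genes are not differentially expressed, so the median log-ratio should be zero after normalization. This is a simple baseline and not necessarily one of the methods the review recommends; intensity-dependent approaches such as lowess are common alternatives. The function name is hypothetical.

```python
import math
from statistics import median


def median_normalize(red, green):
    """Global median normalization of two-channel log-ratios.

    red, green: background-corrected intensities (positive) per spot.
    Returns M = log2(red / green) shifted so that its median is 0.
    """
    m_values = [math.log2(r / g) for r, g in zip(red, green)]
    center = median(m_values)  # estimated dye/scanner bias
    return [m - center for m in m_values]
```

In this toy input, every spot except the last has a raw red/green ratio of 2, which the method treats as dye bias. After centering, only the last spot keeps a positive log-ratio (2.0, a fourfold relative change), which would make it a candidate for differential expression.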


2019 ◽  
Author(s):  
Wei Wang ◽  
Gang Ren ◽  
Ni Hong ◽  
Wenfei Jin

Abstract Background: CCCTC-Binding Factor (CTCF), also known as 11-zinc finger protein, participates in many cellular processes, including insulator activity, transcriptional regulation and organization of chromatin architecture. Based on single cell flow cytometry and single cell RNA-FISH analyses, our previous study showed that deletion of a CTCF binding site led to a significant increase in the cellular variation of its target gene. However, the effect of CTCF on the genome-wide landscape of cell-to-cell variation is unclear. Results: We knocked down CTCF in EL4 cells using shRNA, and conducted single cell RNA-seq on both wild type (WT) cells and CTCF-Knockdown (CTCF-KD) cells using the Fluidigm C1 system. Principal component analysis of single cell RNA-seq data showed that WT and CTCF-KD cells concentrated in two different clusters on PC1, indicating that the gene expression profiles of WT and CTCF-KD cells were systematically different. Interestingly, GO terms including regulation of transcription, DNA binding, Zinc finger and transcription factor binding were significantly enriched among CTCF-KD-specific highly variable genes, indicating that tissue-specific genes such as transcription factors were highly sensitive to CTCF level. The dysregulation of transcription factors may explain why knockdown of CTCF led to systematic changes in gene expression. In contrast, housekeeping functions such as rRNA processing, DNA repair and tRNA processing were significantly enriched among WT-specific highly variable genes, potentially due to higher cellular variation of cell activity in WT cells compared to CTCF-KD cells. We further found that genes with increased cellular variation were significantly enriched among down-regulated genes, indicating that CTCF knockdown simultaneously reduced the expression levels and increased the expression noise of its regulated genes. Conclusions: To our knowledge, this is the first attempt to explore the genome-wide landscape of cellular variation after CTCF knockdown. 
Our study not only advances our understanding of CTCF function in maintaining gene expression and reducing expression noise, but also provides a framework for examining gene function.
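The abstract does not state which variation statistic was used. As one common choice, assumed here, the sketch below measures cell-to-cell variation by the squared coefficient of variation (CV²) and flags genes whose CV² is higher in knockdown cells than in wild-type cells. The function names, gene names and dictionary layout are hypothetical. Real analyses typically also account for the dependence of CV² on mean expression.

```python
from statistics import mean, pstdev


def cv_squared(values):
    """Squared coefficient of variation of one gene across cells.

    Returns NaN for a gene with zero mean expression; NaN never
    compares greater than anything, so such genes are never flagged.
    """
    mu = mean(values)
    if mu == 0:
        return float("nan")
    return (pstdev(values) / mu) ** 2


def variation_increased(wt, kd):
    """Genes whose CV^2 is higher in knockdown (kd) than wild-type (wt).

    wt, kd: dicts mapping gene name -> list of per-cell expression values.
    Only genes present in both dicts are compared.
    """
    flagged = []
    for gene in wt:
        if gene in kd and cv_squared(kd[gene]) > cv_squared(wt[gene]):
            flagged.append(gene)
    return flagged
```

Comparing the per-gene statistic between conditions keeps the logic explicit. A fuller analysis would test whether each increase is larger than expected by chance.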


2019 ◽  
Vol 20 (S24) ◽  
Author(s):  
Yu Zhang ◽  
Changlin Wan ◽  
Pengcheng Wang ◽  
Wennan Chang ◽  
Yan Huo ◽  
...  

Abstract Background Various statistical models have been developed to model single cell RNA-seq expression profiles, capture their multimodality, and conduct differential gene expression tests. However, for expression data generated by different experimental designs and platforms, there is currently no way to determine the most appropriate statistical model. Results We developed an R package, Multi-Modal Model Selection (M3S), for gene-wise selection of the most appropriate multi-modality statistical model and for downstream analysis, applicable to single-cell or large-scale bulk tissue transcriptomic data. M3S features (1) gene-wise selection, among the 11 most commonly used models, of the most parsimonious model that best fits the expression distribution of the gene, (2) parameter estimation of the selected model, and (3) differential gene expression testing based on the selected model. Conclusion A comprehensive evaluation suggested that M3S can accurately capture the multimodality of simulated and real single cell data. The open-source package is available through GitHub at https://github.com/zy26/M3S.
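As a hypothetical illustration of gene-wise model selection in general, the sketch below fits one- and two-component Gaussian mixtures to a single gene's expression values (for example, log-transformed). The two-component mixture is fitted by expectation-maximization (EM). The sketch keeps whichever model has the lower Bayesian information criterion (BIC). This is not M3S's code or API, and it covers only two candidate models rather than the 11 the package considers. All function names are hypothetical, and at least two values are required.

```python
import math
from statistics import mean, pvariance

VAR_FLOOR = 1e-6  # keeps variances positive so log-densities stay finite


def normal_logpdf(x, mu, var):
    """Log density of a univariate normal distribution."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def fit_one_component(xs):
    """Maximum-likelihood single Gaussian; returns (log-likelihood, n_params)."""
    mu = mean(xs)
    var = max(pvariance(xs), VAR_FLOOR)
    return sum(normal_logpdf(x, mu, var) for x in xs), 2


def _mixture_logs(x, w, mu, var):
    """Per-component log(weight * density) for one observation."""
    return [math.log(w[k]) + normal_logpdf(x, mu[k], var[k]) for k in range(2)]


def fit_two_components(xs, iters=200):
    """Two-component Gaussian mixture fitted by EM; returns (log-likelihood, n_params)."""
    n = len(xs)
    ordered = sorted(xs)
    half = n // 2
    # Initialise the means from the lower and upper halves of the data.
    mu = [mean(ordered[:half]), mean(ordered[half:])]
    var = [max(pvariance(xs), VAR_FLOOR)] * 2
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = _mixture_logs(x, w, mu, var)
            top = max(logs)
            p = [math.exp(v - top) for v in logs]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and variances from the responsibilities.
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            w[k] = max(nk / n, 1e-12)
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var[k] = max(sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk, VAR_FLOOR)
    # Log-likelihood via log-sum-exp over components.
    ll = 0.0
    for x in xs:
        logs = _mixture_logs(x, w, mu, var)
        top = max(logs)
        ll += top + math.log(sum(math.exp(v - top) for v in logs))
    # Free parameters: 2 means + 2 variances + 1 independent weight.
    return ll, 5


def select_components(xs):
    """Return 1 or 2, whichever model has the lower BIC = p * ln(n) - 2 * ll."""
    n = len(xs)
    bic = {}
    for k, (ll, p) in ((1, fit_one_component(xs)), (2, fit_two_components(xs))):
        bic[k] = p * math.log(n) - 2 * ll
    return min(bic, key=bic.get)
```

BIC penalizes the three extra parameters of the mixture, so a second mode is only accepted when it improves the fit substantially. A clearly bimodal gene selects two components; a gene with a single tight mode selects one.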


2019 ◽  
Vol 40 (5) ◽  
pp. 958-973 ◽  
Author(s):  
Melanie A. Huntley ◽  
Karpagam Srinivasan ◽  
Brad A. Friedman ◽  
Tzu-Ming Wang ◽  
Ada X. Yee ◽  
...  

PLoS ONE ◽  
2015 ◽  
Vol 10 (8) ◽  
pp. e0134865 ◽  
Author(s):  
Matthew N. Davies ◽  
Serena Verdi ◽  
Andrea Burri ◽  
Maciej Trzaskowski ◽  
Minyoung Lee ◽  
...  

2014 ◽  
Vol 42 (18) ◽  
pp. 11363-11382 ◽  
Author(s):  
Lawryn H. Kasper ◽  
Chunxu Qu ◽  
John C. Obenauer ◽  
Daniel J. McGoldrick ◽  
Paul K. Brindle
