Bfimpute: A Bayesian factorization method to recover single-cell RNA sequencing data

Abstract Single-cell RNA-seq (scRNA-seq) offers opportunities to study gene expression of tens of thousands of single cells simultaneously, to investigate cell-to-cell variation, and to reconstruct cell-type-specific gene regulatory networks. Recovering dropout events in a sparse gene expression matrix for scRNA-seq data is a long-standing matrix completion problem. We introduce Bfimpute, a Bayesian factorization imputation algorithm that reconstructs two latent matrices, one for genes and one for cells, to impute the final gene expression matrix within each cell group, with or without the aid of cell type labels or bulk data. Bfimpute achieves better accuracy than six other well-known scRNA-seq imputation methods on simulated and real scRNA-seq data, as measured by several evaluation metrics. Bfimpute can also flexibly integrate any gene- or cell-related information that users provide to improve performance. Availability: Bfimpute is implemented in R and is freely available at https://github.com/maiziezhoulab/Bfimpute.
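The factorization idea described in this abstract can be illustrated with a minimal sketch. The Python code below is not Bfimpute itself, which is a Bayesian method implemented in R. It assumes a simpler point-estimate stand-in: ridge-regularized alternating least squares fitted only on the observed entries, which corresponds to a MAP estimate under Gaussian priors on the latent factors. The function name `als_impute`, the `mask` argument, and all parameter defaults are hypothetical.

```python
import numpy as np

def als_impute(X, mask, rank=2, lam=0.1, n_iter=50, seed=0):
    """Fill entries where mask is False using a low-rank fit X ~ G @ C.T.

    X    : genes x cells expression matrix
    mask : boolean matrix, True where the entry is treated as observed
    """
    rng = np.random.default_rng(seed)
    n_genes, n_cells = X.shape
    G = rng.normal(scale=0.1, size=(n_genes, rank))  # latent gene factors
    C = rng.normal(scale=0.1, size=(n_cells, rank))  # latent cell factors
    ridge = lam * np.eye(rank)
    for _ in range(n_iter):
        # Update each gene's factors from the cells where it was observed.
        for i in range(n_genes):
            obs = mask[i]
            if obs.any():
                Ci = C[obs]
                G[i] = np.linalg.solve(Ci.T @ Ci + ridge, Ci.T @ X[i, obs])
        # Update each cell's factors from the genes observed in it.
        for j in range(n_cells):
            obs = mask[:, j]
            if obs.any():
                Gj = G[obs]
                C[j] = np.linalg.solve(Gj.T @ Gj + ridge, Gj.T @ X[obs, j])
    # Keep observed values; use the reconstruction only for masked entries.
    return np.where(mask, X, G @ C.T)
```

In this sketch, deciding which zeros are dropouts (the `mask`) is left to the caller. A full Bayesian treatment would instead sample the latent matrices from their posterior rather than solving for a single estimate.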


Single-cell RNA-seq data reveals TNBC tumor heterogeneity through characterizing subclone compositions and proportions

10.1101/858290 ◽

2019 ◽

Author(s):

Weida Wang ◽

Jinyuan Xu ◽

Shuyuan Wang ◽

Peng Xia ◽

Li Zhang ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Tumor Heterogeneity ◽

Single Cells ◽

Rna Seq ◽

Biological Functions ◽

Gene Markers ◽

Gene Expression Matrix ◽

Deconvolution Algorithm ◽

Expression Matrix

Abstract Understanding subclonal architecture and the biological functions of subclones is one of the key challenges in characterizing and investigating the causes of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity by characterizing subclone compositions and proportions. Based on single-cell RNA-seq data (GSE118389), we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the results of functional annotation, C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then, based on the subclonal basis gene expression matrix, we applied a deconvolution algorithm to TCGA tissue RNA-seq data and observed that the microenvironment is diverse among TNBC subclones; in particular, C1 is closely related to T cells. Moreover, we found that high C5 proportions lead to poor survival outcomes: the log-rank test p-value and HR [95% CI] for five-year overall survival in the GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636], respectively. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with the subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the combination of subclones dictating poor outcomes in this disease. Highlights We applied a deconvolution algorithm to the subclonal basis gene expression matrix to link single cells and bulk tissue together.


Analysis of Phase-Specific Gene Expression at the Single-Cell Level in the White-Opaque Switching System of Candida albicans

Journal of Bacteriology ◽

10.1128/jb.183.12.3761-3769.2001 ◽

2001 ◽

Vol 183 (12) ◽

pp. 3761-3769 ◽

Cited By ~ 23

Author(s):

Anja Strauß ◽

Sonja Michel ◽

Joachim Morschhäuser

Keyword(s):

Gene Expression ◽

Single Cell ◽

Reporter Gene ◽

Single Cells ◽

Phenotypic Switching ◽

Specific Gene ◽

Single Cell Level ◽

Switching System ◽

Cell Type ◽

Cell Level

ABSTRACT The opportunistic fungal pathogen Candida albicans can switch spontaneously and reversibly between different cell forms, a capacity that may enhance adaptation to different host niches and evasion of host defense mechanisms. Phenotypic switching has been studied intensively for the white-opaque switching system of strain WO-1. To facilitate the molecular analysis of phenotypic switching, we have constructed homozygous ura3 mutants from strain WO-1 by targeted gene deletion. The two URA3 alleles were sequentially inactivated using the MPAR-flipping strategy, which is based on the selection of integrative transformants carrying a mycophenolic acid (MPA) resistance marker that is subsequently deleted again by site-specific, FLP-mediated recombination. To investigate a possible cell type-independent switching in the expression of individual phase-specific genes, two different reporter genes that allowed the analysis of gene expression at the single-cell level were integrated into the genome, using URA3 as a selection marker. Fluorescence microscopic analysis of cells in which a GFP reporter gene was placed under the control of phase-specific promoters demonstrated that the opaque-phase-specific SAP1 gene was detectably expressed only in opaque cells and that the white-phase-specific WH11 gene was detectably expressed only in white cells. When MPAR was used as a reporter gene, it conferred an MPA-resistant phenotype on opaque but not white cells in strains expressing it from the SAP1 promoter, which was monitored at the level of single cells by a significantly enlarged size of the corresponding colonies on MPA-containing indicator plates. Similarly, white but not opaque cells became MPA resistant when MPAR was placed under the control of the WH11 promoter. The analysis of these reporter strains showed that cell type-independent phase variation in the expression of the SAP1 and WH11 genes did not occur at a detectable frequency. The expression of these phase-specific genes of C. albicans in vitro, therefore, is tightly linked to the cell type.


A novel method for predicting cell abundance based on single-cell RNA-seq data

BMC Bioinformatics ◽

10.1186/s12859-021-04187-4 ◽

2021 ◽

Vol 22 (S9) ◽

Author(s):

Jiajie Peng ◽

Lu Han ◽

Xuequn Shang

Keyword(s):

Gene Expression ◽

Least Squares ◽

Single Cell ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Gene Expression Matrix ◽

Cell Type Specific ◽

Novel Method ◽

Signature Matrix

Abstract Background It is important to understand the composition of cell types and their proportions in intact tissues, as changes in certain cell types are the underlying cause of disease in humans. Although cell type compositions and ratios can be obtained by single-cell sequencing, single-cell sequencing is currently expensive and cannot be applied in clinical studies involving a large number of subjects. Therefore, it is useful to combine a bulk RNA-seq dataset with a single-cell RNA-seq dataset to deconvolute and obtain the cell type composition of the tissue. Results By analyzing the existing cell population prediction methods, we found that most of them need a cell-type-specific gene expression profile as the input signature matrix. However, in real applications, it is not always possible to find an available signature matrix. To solve this problem, we proposed a novel method, named DCap, to predict cell abundance. DCap is a deconvolution method based on non-negative least squares. During the non-negative least squares calculation, DCap considers the weights resulting from the measurement noise of bulk RNA-seq and the calculation error of single-cell RNA-seq data, and performs a weighted iterative calculation based on least squares. By weighting the bulk tissue gene expression matrix and the single-cell gene expression matrix, DCap minimizes the measurement error of bulk RNA-seq and also reduces errors resulting from differences in the number of expressed genes in the same type of cells in different samples. Evaluation tests show that DCap performs better in cell type abundance prediction than existing methods. Conclusion DCap solves the deconvolution problem using weighted non-negative least squares to predict cell type abundance in tissues. DCap has better prediction results and does not require a signature matrix that gives the cell-type-specific gene expression profile to be prepared in advance. By using DCap, we can better study the changes in cell proportion in diseased tissues and provide more information on the follow-up treatment of diseases.
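The weighted non-negative least squares idea summarized in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not DCap's published procedure: the reweighting rule used here (weights inversely proportional to the absolute residual of each gene) is a generic robust-regression choice, the NNLS solver is plain projected gradient descent, and the names `nnls_projected_gradient` and `weighted_deconvolve` are hypothetical.

```python
import numpy as np

def nnls_projected_gradient(A, b, n_iter=2000):
    """Minimise 0.5 * ||A x - b||^2 subject to x >= 0."""
    step = 1.0 / (np.linalg.norm(A, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        # Gradient step followed by projection onto the non-negative orthant.
        x = np.maximum(0.0, x - step * (A.T @ (A @ x - b)))
    return x

def weighted_deconvolve(signature, bulk, n_rounds=5, eps=1e-6):
    """Estimate cell type proportions from a genes x cell-types signature matrix.

    Assumption: genes with large residuals are down-weighted in the next round.
    """
    weights = np.ones_like(bulk, dtype=float)
    for _ in range(n_rounds):
        root_w = np.sqrt(weights)
        # Scaling rows by sqrt(w) turns weighted least squares into ordinary NNLS.
        x = nnls_projected_gradient(signature * root_w[:, None], bulk * root_w)
        residual = np.abs(signature @ x - bulk)
        weights = 1.0 / (residual + eps)
        weights /= weights.mean()
    total = x.sum()
    return x / total if total > 0 else x  # normalise to proportions
```

Calling `weighted_deconvolve(signature, bulk)` with a genes x cell-types matrix and a bulk expression vector returns non-negative values that sum to one whenever at least one coefficient is positive.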


Improved estimation of cell type-specific gene expression through deconvolution of bulk tissues with matrix completion

10.1101/2021.06.30.450493 ◽

2021 ◽

Author(s):

Weixu Wang ◽

Jun Yao ◽

Yi Wang ◽

Chao Zhang ◽

Wei Tao ◽

...

Keyword(s):

Gene Expression ◽

Single Cells ◽

Matrix Completion ◽

Superior Performance ◽

Specific Gene ◽

Rna Seq ◽

Cell Type ◽

Specific Gene Expression ◽

Large Patient ◽

Cell Type Specific

Cell type-specific gene expression (CSE) brings novel biological insights into both physiological and pathological processes compared with bulk tissue gene expression. Although fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) are two widely used techniques to detect gene expression in a cell type-specific manner, constraints of cost and labor make them impractical as routine techniques for large patient cohorts. Here, we present ENIGMA, an algorithm that deconvolutes bulk RNA-seq into cell type-specific expression matrices and cell type fraction matrices without the need for physical sorting or sequencing of single cells. ENIGMA uses a cell type signature matrix generated from either FACS RNA-seq or scRNA-seq as a reference, and applies a matrix completion technique to achieve fast and accurate deconvolution. We demonstrated the superior performance of ENIGMA over previously published algorithms (TCA, bMIND and CIBERSORTx), while requiring much less running time, on both simulated and realistic datasets. To prove its value in biological discovery, we applied ENIGMA to bulk RNA-seq from arthritis patients and revealed a pseudo-differentiation trajectory that could reflect the monocyte-to-macrophage transition. We also applied ENIGMA to bulk RNA-seq data of pancreatic islet tissue from type 2 diabetes (T2D) patients and discovered a beta cell-specific gene co-expression module related to senescence and apoptosis that possibly contributed to the pathogenesis of T2D. Together, ENIGMA provides a new framework to improve CSE estimation by integrating FACS RNA-seq and scRNA-seq with tissue bulk RNA-seq data, and will extend our understanding of cell heterogeneity at the population level with no need for experimental tissue disaggregation.


RNA splicing programs define tissue compartments and cell types at single cell resolution

10.1101/2021.05.01.442281 ◽

2021 ◽

Author(s):

Julia Eve Olivieri ◽

Roozbeh Dehghannasiri ◽

Peter Wang ◽

SoRi Jang ◽

Antoine de Morree ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

High Throughput ◽

Rna Splicing ◽

Single Cells ◽

Cell Types ◽

Mouse Lemur ◽

Cell Type ◽

Multiple Organs ◽

Single Cell Pcr

More than 95% of human genes are alternatively spliced. Yet the extent to which splicing is regulated at single-cell resolution has remained controversial due to limitations in both the available data and the methods used to interpret them. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in >110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are cell-type-specifically spliced. These results are validated with RNA FISH, single-cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as actin light chain subunit MYL6 and ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type-specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing during sperm development, 10 of which are conserved in mouse and mouse lemur. The statistical properties of the SpliZ allow model-based identification of subpopulations within cells that are otherwise indistinguishable based on gene expression, illustrated by subpopulations of classical monocytes with stereotyped splicing, including an un-annotated exon, in SAT1, a diamine acetyltransferase. Together, this unsupervised and annotation-free analysis of differential splicing in ultra-high-throughput droplet-based sequencing of human cells across multiple organs establishes that splicing is regulated cell-type-specifically, independent of gene expression.


Abstract 4689: Subclone-specific evolution of tumor phenotypes – A framework to study subclone-specific gene expression from a combination of bulk DNA and single cell RNA sequencing data

10.1158/1538-7445.sabcs18-4689 ◽

2019 ◽

Author(s):

Yi Qiao ◽

Xiaomeng Huang ◽

Samuel Brady ◽

Andrea Bild ◽

David Bowtell ◽

...

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Specific Gene ◽

Sequencing Data ◽

Specific Gene Expression ◽

Single Cell Rna Sequencing ◽

Tumor Phenotypes


Exploring and analysing single cell multi-omics data with VDJView

10.21203/rs.2.14949/v2 ◽

2019 ◽

Author(s):

Jerome Samir ◽

Simone Rizzetto ◽

Money Gupta ◽

Fabio Luciani

Keyword(s):

Gene Expression ◽

T Cells ◽

B Cells ◽

Hypothesis Testing ◽

Single Cell ◽

Single Cells ◽

Cellular Heterogeneity ◽

Specific Gene ◽

T And B Cells ◽

Gene Signatures

Abstract Background Single cell RNA sequencing provides unprecedented opportunity to simultaneously explore the transcriptomic and immune receptor diversity of T and B cells. However, there are limited tools available that simultaneously analyse large multi-omics datasets integrated with metadata such as patient and clinical information. Results We developed VDJView, which permits the simultaneous or independent analysis and visualisation of gene expression, immune receptors, and clinical metadata of both T and B cells. This tool is implemented as an easy-to-use R shiny web-application, which integrates numerous gene expression and TCR analysis tools, and accepts data from plate-based sorted or high-throughput single cell platforms. We utilised VDJView to analyse several 10X scRNA-seq datasets, including a recent dataset of 150,000 CD8+ T cells with available gene expression, TCR sequences, quantification of 15 surface proteins, and 44 antigen specificities (across viruses, cancer, and self-antigens). We performed quality control, filtering of tetramer non-specific cells, clustering, random sampling and hypothesis testing to discover antigen specific gene signatures which were associated with immune cell differentiation states and clonal expansion across the pathogen specific T cells. We also analysed 563 single cells (plate-based sorted) obtained from 11 subjects, revealing clonally expanded T and B cells across primary cancer tissues and metastatic lymph-node. These immune cells clustered with distinct gene signatures according to the breast cancer molecular subtype. VDJView has been tested in lab meetings and peer-to-peer discussions, showing effective data generation and discussion without the need to consult bioinformaticians. Conclusions VDJView enables researchers without profound bioinformatics skills to analyse immune scRNA-seq data, integrating and visualising this with clonality and metadata profiles, thus accelerating the process of hypothesis testing, data interpretation and discovery of cellular heterogeneity. VDJView is freely available at https://bitbucket.org/kirbyvisp/vdjview .


G2S3: a gene graph-based imputation method for single-cell RNA sequencing data

10.1101/2020.04.01.020586 ◽

2020 ◽

Author(s):

Weimiao Wu ◽

Qile Dai ◽

Yunqing Liu ◽

Xiting Yan ◽

Zuoheng Wang

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Sequencing ◽

Expression Profiles ◽

Gene Expression Profiles ◽

Sequencing Data ◽

High Data ◽

Study Gene Expression ◽

Single Cell Rna Sequencing ◽

Novel Method

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.
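The notion of borrowing information from adjacent genes in a gene graph can be made concrete with a small Python sketch. It is not the G2S3 algorithm: the abstract does not specify how the sparse graph is learned, so the code below assumes a simple k-nearest-neighbour graph built from gene-gene correlations, followed by a single smoothing step that mixes each gene with the weighted average of its neighbours. The function names and the `k` and `alpha` parameters are hypothetical.

```python
import numpy as np

def knn_gene_graph(X, k=2):
    """Connect each gene (row of X) to its k most correlated genes."""
    corr = np.nan_to_num(np.corrcoef(X))   # constant genes yield NaN; treat as 0
    np.fill_diagonal(corr, -np.inf)        # a gene is not its own neighbour
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(corr[i])[-k:]    # indices of the k largest correlations
        W[i, nbrs] = np.maximum(corr[i, nbrs], 0.0)  # drop negative correlations
    return np.maximum(W, W.T)              # symmetrise the adjacency matrix

def graph_impute(X, W, alpha=0.5):
    """Mix each gene's profile with the weighted average of its graph neighbours."""
    deg = W.sum(axis=1)
    P = np.zeros_like(W)
    has_nbrs = deg > 0
    P[has_nbrs] = W[has_nbrs] / deg[has_nbrs, None]  # row-normalised weights
    isolated = np.flatnonzero(~has_nbrs)
    P[isolated, isolated] = 1.0            # isolated genes keep their own values
    return alpha * X + (1 - alpha) * (P @ X)
```

Note that this smoothing step changes every entry, not only the zeros, and it assumes that genes have been put on a comparable scale beforehand. Both points would need more care in a real pipeline.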


Genetic, parental and lifestyle factors influence telomere length

10.1101/2021.12.14.472541 ◽

2021 ◽

Author(s):

Sergio Andreu-Sanchez ◽

Geraldine Aubert ◽

Aida Ripoll-Cladellas ◽

Sandra Henkelman ◽

Daria V. Zhernakova ◽

...

Keyword(s):

Gene Expression ◽

Average Length ◽

Cell Types ◽

Paternal Age ◽

Specific Gene ◽

Cell Type ◽

Sequencing Data ◽

Nonsense Mediated Decay ◽

Cell Type Specific ◽

Telomere Biology

The average length of telomere repeats (TL) declines with age and is considered to be a marker of biological ageing. Here, we measured TL in six blood cell types from 1,046 individuals using the clinically validated Flow-FISH method. We identified remarkable cell-type-specific variations in TL. Host genetics, environmental, parental and intrinsic factors such as sex, parental age, and smoking are associated with variations in TL. By analysing genome-wide methylation patterns, we found that the association of maternal, but not paternal, age with TL is mediated by epigenetics. Coupling these measurements to single-cell RNA-sequencing data for 62 participants revealed differential gene expression in T-cells. Genes negatively associated with TL were enriched for pathways related to translation and nonsense-mediated decay. Altogether, this study addresses cell-type-specific differences in telomere biology and its relation to cell-type-specific gene expression and highlights how perinatal factors play a role in determining TL, on top of genetics and lifestyle.


ReadZS detects developmentally regulated RNA processing programs in single cell RNA-seq and defines subpopulations independent of gene expression

10.1101/2021.09.29.462469 ◽

2021 ◽

Author(s):

Elisabeth Meyer ◽

Roozbeh Dehghannasiri ◽

Kaitlin Chaung ◽

Julia Salzman

Keyword(s):

Gene Expression ◽

Single Cell ◽

Rna Processing ◽

Single Cells ◽

Alternative Polyadenylation ◽

Cell Type ◽

Computationally Efficient ◽

Developmentally Regulated ◽

Eukaryotic Gene ◽

Human Spermatogenesis

Post-transcriptional regulation of RNA processing (RNAP), including splicing and alternative polyadenylation (APA), controls eukaryotic gene function. Conservative estimates based on bulk tissue studies conclude that at least 50% of mammalian genes undergo APA. Single-cell RNA sequencing (scRNA-seq) could enable a near-complete estimate of the extent, function, and regulation of these and other forms of RNA processing. Yet statistical methods to detect regulated RNAP are limited in their detection power because they rely on (a) incomplete annotations of 3' untranslated regions (3' UTRs), (b) peak calling heuristics, (c) analysis based on measurements collapsed over all cells in a cell type (pseudobulking), or (d) APA-specific detection. Here, we introduce ReadZS, a computationally efficient and annotation-free statistical approach to identify regulated RNAP, including but not limited to APA, in single cells. ReadZS rediscovers and substantially extends the scope of known cell type-specific RNAP in the human lung and during human spermatogenesis. The unique single-cell resolution and statistical properties of ReadZS enable discovery of new evolutionarily conserved, developmentally regulated RNAP and of subpopulations of lung-resident macrophages that are homogeneous by gene expression alone.
