Bfimpute: A Bayesian factorization method to recover single-cell RNA sequencing data

2021 ◽  
Author(s):  
Zi-Hang Wen ◽  
Jeremy L. Langsam ◽  
Lu Zhang ◽  
Wenjun Shen ◽  
Xin Zhou

Abstract Single-cell RNA-seq (scRNA-seq) offers opportunities to study gene expression of tens of thousands of single cells simultaneously, to investigate cell-to-cell variation, and to reconstruct cell-type-specific gene regulatory networks. Recovering dropout events in a sparse gene expression matrix for scRNA-seq data is a long-standing matrix completion problem. We introduce Bfimpute, a Bayesian factorization imputation algorithm that reconstructs two latent matrices, one for genes and one for cells, to impute the final gene expression matrix within each cell group, with or without the aid of cell type labels or bulk data. Bfimpute achieves better accuracy than six other well-known scRNA-seq imputation methods on simulated and real scRNA-seq data, as measured by several different evaluation metrics. Bfimpute can also flexibly integrate any gene- or cell-related information that users provide to improve performance. Availability: Bfimpute is implemented in R and is freely available at https://github.com/maiziezhoulab/Bfimpute.
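To make the matrix completion framing concrete, the following is a minimal sketch of factorization-based imputation. It is not the Bfimpute algorithm: it is a non-Bayesian, rank-1 simplification fitted by alternating least squares, and it assumes that every zero entry is a dropout. The function name `impute_rank1` and its parameters are hypothetical and chosen only for illustration.

```python
def impute_rank1(x, n_iter=50, lam=1e-6):
    """Fill zero entries of x (genes x cells) from a rank-1 fit to the nonzero entries.

    Assumption: zeros are dropouts (missing), nonzeros are observed values.
    g holds one latent factor per gene, c one latent factor per cell.
    """
    n_genes, n_cells = len(x), len(x[0])
    g = [1.0] * n_genes
    c = [1.0] * n_cells
    for _ in range(n_iter):
        # Update each gene factor by least squares over its observed cells.
        for i in range(n_genes):
            obs = [j for j in range(n_cells) if x[i][j] != 0]
            num = sum(x[i][j] * c[j] for j in obs)
            den = sum(c[j] ** 2 for j in obs)
            g[i] = num / (den + lam)
        # Update each cell factor by least squares over its observed genes.
        for j in range(n_cells):
            obs = [i for i in range(n_genes) if x[i][j] != 0]
            num = sum(x[i][j] * g[i] for i in obs)
            den = sum(g[i] ** 2 for i in obs)
            c[j] = num / (den + lam)
    # Keep observed values; replace zeros with the low-rank prediction.
    return [
        [x[i][j] if x[i][j] != 0 else g[i] * c[j] for j in range(n_cells)]
        for i in range(n_genes)
    ]


# Toy matrix: the outer product of [1, 2, 3] and [1, 2, 4] with two entries zeroed out.
observed = [
    [1.0, 2.0, 4.0],
    [2.0, 0.0, 8.0],
    [0.0, 6.0, 12.0],
]
imputed = impute_rank1(observed)
```

On this toy input the two zeroed entries are recovered close to their original values (4 and 3). A Bayesian treatment such as the one the abstract describes would instead place priors on the latent matrices and could use a higher rank.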

2019 ◽  
Author(s):  
Weida Wang ◽  
Jinyuan Xu ◽  
Shuyuan Wang ◽  
Peng Xia ◽  
Li Zhang ◽  
...  

Abstract Understanding subclonal architecture and its biological functions is one of the key challenges in characterizing and investigating the causes of triple-negative breast cancer (TNBC). Here we combine single-cell and bulk sequencing data to analyze tumor heterogeneity by characterizing subclone compositions and proportions. Based on single-cell RNA-seq data (GSE118389), we identified five distinct cell subpopulations and characterized their biological functions based on their gene markers. According to the results of functional annotation, we found that C1 and C2 are related to immune functions, while C5 is related to programmed cell death. Then, based on the subclonal basis gene expression matrix, we applied a deconvolution algorithm to TCGA tissue RNA-seq data and observed that the microenvironment is diverse among TNBC subclones; in particular, C1 is closely related to T cells. Moreover, we found that high C5 proportions were associated with poor survival outcome: the log-rank test p-value and HR [95% CI] for five-year overall survival in the GSE96058 dataset were 0.0158 and 2.557 [1.160-5.636], respectively. Collectively, our analysis reveals both intra-tumor and inter-tumor heterogeneity and their association with the subclonal microenvironment in TNBC (subclone compositions and proportions), and uncovers the organic combination of subclones dictating poor outcomes in this disease. Highlights We applied a deconvolution algorithm to the subclonal basis gene expression matrix to link single cells and bulk tissue together.


2001 ◽  
Vol 183 (12) ◽  
pp. 3761-3769 ◽  
Author(s):  
Anja Strauß ◽  
Sonja Michel ◽  
Joachim Morschhäuser

ABSTRACT The opportunistic fungal pathogen Candida albicans can switch spontaneously and reversibly between different cell forms, a capacity that may enhance adaptation to different host niches and evasion of host defense mechanisms. Phenotypic switching has been studied intensively for the white-opaque switching system of strain WO-1. To facilitate the molecular analysis of phenotypic switching, we have constructed homozygous ura3 mutants from strain WO-1 by targeted gene deletion. The two URA3 alleles were sequentially inactivated using the MPA(R)-flipping strategy, which is based on the selection of integrative transformants carrying a mycophenolic acid (MPA) resistance marker that is subsequently deleted again by site-specific, FLP-mediated recombination. To investigate a possible cell type-independent switching in the expression of individual phase-specific genes, two different reporter genes that allowed the analysis of gene expression at the single-cell level were integrated into the genome, using URA3 as a selection marker. Fluorescence microscopic analysis of cells in which a GFP reporter gene was placed under the control of phase-specific promoters demonstrated that the opaque-phase-specific SAP1 gene was detectably expressed only in opaque cells and that the white-phase-specific WH11 gene was detectably expressed only in white cells. When MPA(R) was used as a reporter gene, it conferred an MPA-resistant phenotype on opaque but not white cells in strains expressing it from the SAP1 promoter, which was monitored at the level of single cells by a significantly enlarged size of the corresponding colonies on MPA-containing indicator plates. Similarly, white but not opaque cells became MPA resistant when MPA(R) was placed under the control of the WH11 promoter. The analysis of these reporter strains showed that cell type-independent phase variation in the expression of the SAP1 and WH11 genes did not occur at a detectable frequency. The expression of these phase-specific genes of C. albicans in vitro, therefore, is tightly linked to the cell type.


2021 ◽  
Vol 22 (S9) ◽  
Author(s):  
Jiajie Peng ◽  
Lu Han ◽  
Xuequn Shang

Abstract Background It is important to understand the cell type composition of intact tissues and the proportion of each cell type, as changes in certain cell types are the underlying cause of disease in humans. Although cell type compositions and ratios can be obtained by single-cell sequencing, single-cell sequencing is currently expensive and cannot be applied in clinical studies involving a large number of subjects. Therefore, it is useful to combine a bulk RNA-seq dataset with a single-cell RNA-seq dataset to deconvolute the bulk data and obtain the cell type composition of the tissue. Results By analyzing the existing cell population prediction methods, we found that most of them need a cell-type-specific gene expression profile as the input signature matrix. However, in real applications, it is not always possible to find an available signature matrix. To solve this problem, we propose a novel method, named DCap, to predict cell abundance. DCap is a deconvolution method based on non-negative least squares. During the non-negative least squares calculation, DCap takes into account weights that reflect the measurement noise of bulk RNA-seq and the calculation error of single-cell RNA-seq data, and performs a weighted iterative calculation based on least squares. By weighting the bulk tissue gene expression matrix and the single-cell gene expression matrix, DCap minimizes the measurement error of bulk RNA-seq and also reduces errors resulting from differences in the number of expressed genes in the same type of cells in different samples. Evaluation tests show that DCap performs better in cell type abundance prediction than existing methods. Conclusion DCap solves the deconvolution problem using weighted non-negative least squares to predict cell type abundance in tissues. DCap has better prediction results and does not require a signature matrix giving the cell-type-specific gene expression profile to be prepared in advance. By using DCap, we can better study the changes in cell proportion in diseased tissues and provide more information on the follow-up treatment of diseases.
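The weighted non-negative least squares problem named in this abstract can be sketched as follows. This is not the DCap implementation: the weights are assumed to be given and fixed rather than re-estimated iteratively, the signature matrix is assumed to be available (for example, averaged from single-cell data), and the solver is plain projected gradient descent. The function name `weighted_nnls` is hypothetical.

```python
def weighted_nnls(sig, bulk, weights, n_iter=500):
    """Minimize sum_g w_g * (bulk_g - sum_k sig[g][k] * p_k)^2 subject to p_k >= 0.

    sig:     genes x cell-types signature matrix (assumed known here)
    bulk:    bulk expression value per gene
    weights: non-negative weight per gene (assumed fixed here)
    """
    n_genes, n_types = len(sig), len(sig[0])
    # The trace of S^T W S bounds its largest eigenvalue, so this step size is safe.
    trace = sum(weights[g] * sig[g][k] ** 2
                for g in range(n_genes) for k in range(n_types))
    step = 1.0 / (2.0 * trace)
    p = [1.0 / n_types] * n_types
    for _ in range(n_iter):
        resid = [sum(sig[g][k] * p[k] for k in range(n_types)) - bulk[g]
                 for g in range(n_genes)]
        grad = [2.0 * sum(weights[g] * sig[g][k] * resid[g] for g in range(n_genes))
                for k in range(n_types)]
        # Gradient step followed by projection onto the non-negative orthant.
        p = [max(0.0, p[k] - step * grad[k]) for k in range(n_types)]
    return p


# Toy example: two cell types mixed at 30% / 70%, three genes, noise-free bulk.
signature = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
bulk_expr = [3.0, 7.0, 5.0]
gene_weights = [1.0, 0.5, 2.0]
abundance = weighted_nnls(signature, bulk_expr, gene_weights)
fractions = [a / sum(abundance) for a in abundance]
```

In this noise-free toy case the weights do not change the solution; with noisy data they would down-weight genes considered less reliable. Projected gradient descent is used only because it is short and dependency-free; an active-set NNLS solver would be the more usual choice.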


2021 ◽  
Author(s):  
Weixu Wang ◽  
Jun Yao ◽  
Yi Wang ◽  
Chao Zhang ◽  
Wei Tao ◽  
...  

Cell type-specific gene expression (CSE) brings novel biological insights into both physiological and pathological processes compared with bulk tissue gene expression. Although fluorescence-activated cell sorting (FACS) and single-cell RNA sequencing (scRNA-seq) are two widely used techniques to detect gene expression in a cell type-specific manner, constraints of cost and labor make them impractical for routine use on large patient cohorts. Here, we present ENIGMA, an algorithm that deconvolutes bulk RNA-seq into cell type-specific expression matrices and cell type fraction matrices without the need for physical sorting or sequencing of single cells. ENIGMA uses a cell type signature matrix generated from either FACS RNA-seq or scRNA-seq as a reference, and applies a matrix completion technique to achieve fast and accurate deconvolution. We demonstrate the superior performance of ENIGMA over previously published algorithms (TCA, bMIND and CIBERSORTx), with much less running time, on both simulated and realistic datasets. To prove its value in biological discovery, we applied ENIGMA to bulk RNA-seq from arthritis patients and revealed a pseudo-differentiation trajectory that could reflect the monocyte-to-macrophage transition. We also applied ENIGMA to bulk RNA-seq data of pancreatic islet tissue from type 2 diabetes (T2D) patients and discovered a beta cell-specific gene co-expression module related to senescence and apoptosis that possibly contributes to the pathogenesis of T2D. Together, ENIGMA provides a new framework to improve CSE estimation by integrating FACS RNA-seq and scRNA-seq with tissue bulk RNA-seq data, and will extend our understanding of cell heterogeneity at the population level with no need for experimental tissue disaggregation.


2021 ◽  
Author(s):  
Julia Eve Olivieri ◽  
Roozbeh Dehghannasiri ◽  
Peter Wang ◽  
SoRi Jang ◽  
Antoine de Morree ◽  
...  

More than 95% of human genes are alternatively spliced. Yet, the extent to which splicing is regulated at single-cell resolution has remained controversial due to both the available data and the methods used to interpret them. We apply the SpliZ, a new statistical approach that is agnostic to transcript annotation, to detect cell-type-specific regulated splicing in > 110K carefully annotated single cells from 12 human tissues. Using 10x data for discovery, 9.1% of genes with computable SpliZ scores are cell-type-specifically spliced. These results are validated with RNA FISH, single-cell PCR, and in high throughput with Smart-seq2. Regulated splicing is found in ubiquitously expressed genes such as the actin light chain subunit MYL6 and the ribosomal protein RPS24, which has an epithelial-specific microexon. 13% of the statistically most variable splice sites in cell-type-specifically regulated genes are also most variable in mouse lemur or mouse. SpliZ analysis further reveals 170 genes with regulated splicing during sperm development, 10 of which are conserved in mouse and mouse lemur. The statistical properties of the SpliZ allow model-based identification of subpopulations within cells that are otherwise indistinguishable based on gene expression, illustrated by subpopulations of classical monocytes with stereotyped splicing, including an unannotated exon, in SAT1, a diamine acetyltransferase. Together, this unsupervised and annotation-free analysis of differential splicing in ultra-high-throughput droplet-based sequencing of human cells across multiple organs establishes that splicing is regulated cell-type-specifically, independent of gene expression.


2019 ◽  
Author(s):  
Jerome Samir ◽  
Simone Rizzetto ◽  
Money Gupta ◽  
Fabio Luciani

Abstract Background Single cell RNA sequencing provides an unprecedented opportunity to simultaneously explore the transcriptomic and immune receptor diversity of T and B cells. However, there are limited tools available that simultaneously analyse large multi-omics datasets integrated with metadata such as patient and clinical information. Results We developed VDJView, which permits the simultaneous or independent analysis and visualisation of gene expression, immune receptors, and clinical metadata of both T and B cells. This tool is implemented as an easy-to-use R shiny web-application, which integrates numerous gene expression and TCR analysis tools, and accepts data from plate-based sorted or high-throughput single cell platforms. We utilised VDJView to analyse several 10X scRNA-seq datasets, including a recent dataset of 150,000 CD8+ T cells with available gene expression, TCR sequences, quantification of 15 surface proteins, and 44 antigen specificities (across viruses, cancer, and self-antigens). We performed quality control, filtering of tetramer non-specific cells, clustering, random sampling and hypothesis testing to discover antigen specific gene signatures which were associated with immune cell differentiation states and clonal expansion across the pathogen specific T cells. We also analysed 563 single cells (plate-based sorted) obtained from 11 subjects, revealing clonally expanded T and B cells across primary cancer tissues and metastatic lymph-node. These immune cells clustered with distinct gene signatures according to the breast cancer molecular subtype. VDJView has been tested in lab meetings and peer-to-peer discussions, showing effective data generation and discussion without the need to consult bioinformaticians. Conclusions VDJView enables researchers without profound bioinformatics skills to analyse immune scRNA-seq data, integrating and visualising this with clonality and metadata profiles, thus accelerating the process of hypothesis testing, data interpretation and discovery of cellular heterogeneity. VDJView is freely available at https://bitbucket.org/kirbyvisp/vdjview .


2020 ◽  
Author(s):  
Weimiao Wu ◽  
Qile Dai ◽  
Yunqing Liu ◽  
Xiting Yan ◽  
Zuoheng Wang

Abstract Single-cell RNA sequencing provides an opportunity to study gene expression at single-cell resolution. However, prevalent dropout events result in high data sparsity and noise that may obscure downstream analyses. We propose a novel method, G2S3, that imputes dropouts by borrowing information from adjacent genes in a sparse gene graph learned from gene expression profiles across cells. We applied G2S3 and other existing methods to seven single-cell datasets to compare their performance. Our results demonstrated that G2S3 is superior in recovering true expression levels, identifying cell subtypes, improving differential expression analyses, and recovering gene regulatory relationships, especially for mildly expressed genes.
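The idea of borrowing information from adjacent genes in a gene graph can be illustrated with a minimal sketch. This is not the G2S3 method: the graph is assumed to be given rather than learned from the data, and the smoothing rule (a single blend of each gene's profile with the weighted average of its neighbours) is an assumption made for illustration. The names `graph_smooth` and `alpha` are hypothetical.

```python
def graph_smooth(expr, adj, alpha=0.5):
    """Blend each gene's expression with the weighted mean of its graph neighbours.

    expr:  genes x cells expression matrix
    adj:   genes x genes non-negative adjacency weights (assumed given, zero diagonal)
    alpha: share of the smoothed value taken from the neighbours (0 = no smoothing)
    """
    n_genes, n_cells = len(expr), len(expr[0])
    out = []
    for i in range(n_genes):
        degree = sum(adj[i])
        if degree == 0:
            # Isolated gene: nothing to borrow from, keep its profile unchanged.
            out.append(list(expr[i]))
            continue
        neighbour_mean = [
            sum(adj[i][k] * expr[k][j] for k in range(n_genes)) / degree
            for j in range(n_cells)
        ]
        out.append([(1 - alpha) * expr[i][j] + alpha * neighbour_mean[j]
                    for j in range(n_cells)])
    return out


# Toy example: genes 0 and 1 are adjacent, gene 2 is isolated.
expression = [[4, 0], [2, 2], [1, 1]]
adjacency = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
smoothed = graph_smooth(expression, adjacency)
# smoothed == [[3.0, 1.0], [3.0, 1.0], [1, 1]]
```

Here the zero in gene 0's second cell is raised towards the value of its neighbour, gene 1, while the isolated gene is left untouched. How the graph is learned, which the abstract identifies as part of the method, is outside the scope of this sketch.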


2021 ◽  
Author(s):  
Sergio Andreu-Sanchez ◽  
Geraldine Aubert ◽  
Aida Ripoll-Cladellas ◽  
Sandra Henkelman ◽  
Daria V. Zhernakova ◽  
...  

The average length of telomere repeats (TL) declines with age and is considered to be a marker of biological ageing. Here, we measured TL in six blood cell types from 1,046 individuals using the clinically validated Flow-FISH method. We identified remarkable cell-type-specific variations in TL. Host genetic, environmental, parental and intrinsic factors such as sex, parental age, and smoking are associated with variations in TL. By analysing genome-wide methylation patterns, we found that the association of maternal, but not paternal, age with TL is mediated by epigenetics. Coupling these measurements to single-cell RNA-sequencing data for 62 participants revealed differential gene expression in T-cells. Genes negatively associated with TL were enriched for pathways related to translation and nonsense-mediated decay. Altogether, this study addresses cell-type-specific differences in telomere biology and its relation to cell-type-specific gene expression and highlights how perinatal factors play a role in determining TL, on top of genetics and lifestyle.


2021 ◽  
Author(s):  
Elisabeth Meyer ◽  
Roozbeh Dehghannasiri ◽  
Kaitlin Chaung ◽  
Julia Salzman

Post-transcriptional regulation of RNA processing (RNAP), including splicing and alternative polyadenylation (APA), controls eukaryotic gene function. Conservative estimates based on bulk tissue studies conclude that at least 50% of mammalian genes undergo APA. Single-cell RNA sequencing (scRNA-seq) could enable a near-complete estimate of the extent, function, and regulation of these and other forms of RNA processing. Yet, statistical methods to detect regulated RNAP are limited in their detection power because they rely on (a) incomplete annotations of 3' untranslated regions (3' UTRs), (b) peak calling heuristics, (c) analysis based on measurements collapsed over all cells in a cell type (pseudobulking), or (d) APA-specific detection. Here, we introduce ReadZS, a computationally efficient and annotation-free statistical approach to identify regulated RNAP, including but not limited to APA, in single cells. ReadZS rediscovers and substantially extends the scope of known cell type-specific RNAP in the human lung and during human spermatogenesis. The unique single-cell resolution and statistical properties of ReadZS enable discovery of new evolutionarily conserved, developmentally regulated RNAP and of subpopulations of lung-resident macrophages that are homogeneous by gene expression alone.

