baredSC: Bayesian Approach to Retrieve Expression Distribution of Single-Cell

2021 ◽  
Author(s):  
Lucille Lopez-Delisle ◽  
Jean-Baptiste Delisle

The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, the commonly used droplet-based method often produces very sparse samples. Sparsity can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes. We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling (i.e., Poisson noise). We developed a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell, which infers the intrinsic expression distribution in noisy single-cell data using a Gaussian mixture model (GMM). baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation between the two genes' expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with known genetic interaction in the embryonic limb. Then, we study the expression of Pitx1 in the embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods to analyze scRNA-seq data are too sensitive to sampling noise, baredSC reveals this trimodal distribution.
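The noise model in this abstract has two layers: an intrinsic expression distribution (a Gaussian mixture) and Poisson sampling on top of it. The sketch below shows how those layers combine into the probability of an observed count. It assumes, for illustration only, the following:

- a gene's intrinsic log-expression per cell follows a 1D Gaussian mixture;
- the observed UMI count is Poisson with mean `scale * exp(x)`;
- `scale` is a hypothetical per-cell size factor.

The parameterization, integration bounds and function names are assumptions, not baredSC's actual implementation.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of k counts given mean mu (computed in log space)."""
    if mu <= 0:
        return 1.0 if k == 0 else 0.0
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def gmm_pdf(x, weights, means, sds):
    """Density of a 1D Gaussian mixture at x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, sds)
    )


def count_likelihood(k, scale, weights, means, sds, lo=-4.0, hi=6.0, n=2000):
    """P(k) = integral of Poisson(k | scale * exp(x)) * GMM(x) dx, midpoint rule.

    Hypothetical parameterization: x is the natural log of the expected count
    per unit of `scale`, where `scale` stands in for a per-cell size factor.
    """
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += poisson_pmf(k, scale * math.exp(x)) * gmm_pdf(x, weights, means, sds) * dx
    return total


if __name__ == "__main__":
    # A bimodal intrinsic distribution: an "off" mode and an "on" mode.
    params = dict(weights=[0.5, 0.5], means=[-1.0, 2.0], sds=[0.5, 0.5])
    for scale in (1.0, 0.1):
        p0 = count_likelihood(0, scale, **params)
        print(f"scale={scale}: P(count = 0) = {p0:.3f}")
```

Lowering `scale` (shallower sampling) raises the probability of a zero count even though the intrinsic distribution is unchanged. This is the sparsity effect the abstract attributes to Poisson noise. A method like baredSC works in the opposite direction, inferring the mixture parameters from the observed counts.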

2022 ◽  
Vol 23 (1) ◽  
Author(s):  
Lucille Lopez-Delisle ◽  
Jean-Baptiste Delisle

Background: The number of studies using single-cell RNA sequencing (scRNA-seq) is constantly growing. This powerful technique provides a sampling of the whole transcriptome of a cell. However, sparsity of the data can be a major hurdle when studying the distribution of the expression of a specific gene or the correlation between the expressions of two genes.
Results: We show that the main technical noise associated with these scRNA-seq experiments is due to the sampling, i.e., Poisson noise. We present a new tool named baredSC, for Bayesian Approach to Retrieve Expression Distribution of Single-Cell data, which infers the intrinsic expression distribution in scRNA-seq data using a Gaussian mixture model. baredSC can be used to obtain the distribution in one dimension for individual genes and in two dimensions for pairs of genes, in particular to estimate the correlation between the two genes’ expressions. We apply baredSC to simulated scRNA-seq data and show that the algorithm is able to uncover the expression distribution used to simulate the data, even in multi-modal cases with very sparse data. We also apply baredSC to two real biological data sets. First, we use it to measure the anti-correlation between Hoxd13 and Hoxa11, two genes with known genetic interaction in the embryonic limb. Then, we study the expression of Pitx1 in the embryonic hindlimb, for which a trimodal distribution has been identified through flow cytometry. While other methods to analyze scRNA-seq data are too sensitive to sampling noise, baredSC reveals this trimodal distribution.
Conclusion: baredSC is a powerful tool that aims to retrieve the expression distribution of a few genes of interest from scRNA-seq data.
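This abstract describes estimating the correlation between two genes' expressions from a two-dimensional distribution. If that distribution is a 2D Gaussian mixture, its overall Pearson correlation follows in closed form from the component weights, means and covariances. The sketch below applies this standard mixture identity. It is an illustration under that assumption, not necessarily how baredSC computes or reports its correlation estimate, and `mixture_correlation` is a hypothetical name.

```python
import math


def mixture_correlation(weights, means, covs):
    """Pearson correlation of a 2D Gaussian mixture.

    weights: component weights (assumed to sum to 1)
    means:   list of (mu_x, mu_y), one per component
    covs:    list of ((sxx, sxy), (sxy, syy)), one per component
    Uses E[XY] = sum_k w_k * (sxy_k + mu_xk * mu_yk), and likewise for E[X^2], E[Y^2].
    """
    mx = sum(w * m[0] for w, m in zip(weights, means))
    my = sum(w * m[1] for w, m in zip(weights, means))
    exx = sum(w * (c[0][0] + m[0] ** 2) for w, m, c in zip(weights, means, covs))
    eyy = sum(w * (c[1][1] + m[1] ** 2) for w, m, c in zip(weights, means, covs))
    exy = sum(w * (c[0][1] + m[0] * m[1]) for w, m, c in zip(weights, means, covs))
    var_x = exx - mx ** 2
    var_y = eyy - my ** 2
    return (exy - mx * my) / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    iso = ((0.25, 0.0), (0.0, 0.25))  # no correlation within each component
    # One component high in gene X and low in gene Y, the other the reverse.
    print(mixture_correlation([0.5, 0.5], [(0.0, 2.0), (2.0, 0.0)], [iso, iso]))
```

The example call returns -0.8. Neither component has any internal correlation, yet the mixture is strongly anti-correlated because the two populations sit on opposite sides of the diagonal. A correlation summary over the whole 2D distribution captures this, whereas looking within each component alone would not.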


2021 ◽  
Author(s):  
Austė Kanapeckaitė ◽  
Neringa Burokienė

At present, heart failure (HF) treatment only targets the symptoms, based on the severity of left ventricle dysfunction. However, the lack of systemic ‘omics’ studies and of available biological data to uncover the heterogeneous underlying mechanisms signals the need to shift the analytical paradigm towards network-centric and data mining approaches. This study, for the first time, aimed to investigate how bulk and single-cell RNA sequencing, as well as proteomics analysis of human heart tissue, can be integrated to uncover HF-specific networks and potential therapeutic targets or biomarkers. We also aimed to address the issue of dealing with a limited number of samples and to show how appropriate statistical models, enrichment with other datasets, and machine learning-guided analysis can aid in such cases. Furthermore, we elucidated specific gene expression profiles using transcriptomic data and data mined from public databases. This was achieved using a two-step machine learning algorithm to predict the likelihood of therapeutic target or biomarker tractability based on a novel scoring system, which is also introduced in this study. The described methodology could be very useful for target or biomarker selection and evaluation during the pre-clinical therapeutics development stage, as well as for disease progression monitoring. In addition, the present study sheds new light on the complex aetiology of HF. It differentiates between subtle changes in dilated cardiomyopathies (DCs) and ischemic cardiomyopathies (ICs) at the single-cell, proteome and whole-transcriptome level, and demonstrates that HF might depend not only on cardiomyocytes but also on other cell populations. The identified tissue remodelling and inflammatory processes can be beneficial when selecting targeted pharmacological management for DCs or ICs, respectively.


2020 ◽  
Author(s):  
Auste Kanapeckaite ◽  
Neringa Burokiene

At present, heart failure treatment targets symptoms based on the severity of left ventricle dysfunction. However, the lack of systemic studies and of available biological data to uncover heterogeneous underlying mechanisms at the genomic, transcriptional and expressed-protein level signals the need to shift the analytical paradigm toward network-centric and data mining approaches. This study, for the first time, aimed to investigate how bulk and single-cell RNA sequencing, as well as proteomics analysis of human heart tissue, can be integrated to uncover heart failure-specific networks and potential therapeutic targets or biomarkers. Furthermore, it was demonstrated that transcriptomics data, in combination with data mined from public databases, can be used to elucidate specific gene expression profiles. This was achieved using machine learning algorithms to predict the likelihood of therapeutic target or biomarker tractability based on a novel scoring system, also introduced in this study. The described methodology could be very useful for target selection and evaluation during the pre-clinical therapeutics development stage. Finally, the present study sheds new light on the complex etiology of heart failure, differentiating between subtle changes in dilated and ischemic cardiomyopathy at the single-cell, proteome and whole-transcriptome level.


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Rongxin Fang ◽  
Sebastian Preissl ◽  
Yang Li ◽  
Xiaomeng Hou ◽  
Jacinta Lucero ◽  
...  

Identification of the cis-regulatory elements controlling cell-type-specific gene expression patterns is essential for understanding the origin of cellular diversity. Conventional assays to map regulatory elements via open chromatin analysis of primary tissues are hindered by sample heterogeneity. Single-cell analysis of accessible chromatin (scATAC-seq) can overcome this limitation. However, the high level of noise in each single-cell profile and the large volume of data pose unique computational challenges. Here, we introduce SnapATAC, a software package for analyzing scATAC-seq datasets. SnapATAC dissects cellular heterogeneity in an unbiased manner and maps the trajectories of cellular states. Using the Nyström method, SnapATAC can process data from up to a million cells. Furthermore, SnapATAC incorporates existing tools into a comprehensive package for analyzing single-cell ATAC-seq datasets. As a demonstration of its utility, SnapATAC is applied to 55,592 single-nucleus ATAC-seq profiles from the mouse secondary motor cortex. The analysis reveals ~370,000 candidate regulatory elements in 31 distinct cell populations in this brain region and infers candidate cell-type-specific transcriptional regulators.
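This abstract credits the Nyström method with letting SnapATAC scale to a million cells. The general technique approximates a full n × n similarity matrix K from similarities to a small set of m landmarks: K ≈ C W⁻¹ Cᵀ. Here C is the n × m matrix of similarities between every point and the landmarks, and W is the m × m matrix among the landmarks. The pure-Python sketch below illustrates only that idea. It assumes a linear (dot-product) kernel so the result is easy to check, and it is not SnapATAC's code or its actual similarity measure. All function names are hypothetical.

```python
def dot(a, b):
    """Linear kernel: the dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))


def invert(m):
    """Gauss-Jordan inverse (partial pivoting) of a small, non-singular square matrix."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def nystrom_approx(points, landmark_idx, kernel=dot):
    """Approximate the n x n kernel matrix as C W^{-1} C^T using the chosen landmarks."""
    landmarks = [points[i] for i in landmark_idx]
    m = len(landmarks)
    c = [[kernel(p, l) for l in landmarks] for p in points]     # n x m
    w = [[kernel(a, b) for b in landmarks] for a in landmarks]  # m x m
    w_inv = invert(w)
    # Rows of C W^{-1}
    cw = [[sum(row[t] * w_inv[t][j] for t in range(m)) for j in range(m)] for row in c]
    # (C W^{-1}) C^T
    return [[dot(cw_i, c_j) for c_j in c] for cw_i in cw]


if __name__ == "__main__":
    pts = [(1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (-1.0, 4.0)]
    for row in nystrom_approx(pts, landmark_idx=[2, 3]):
        print([round(v, 6) for v in row])
```

Only n·m kernel values are evaluated instead of n². Because the linear kernel on 2D points has rank 2, two independent landmarks reproduce K exactly here. With a higher-rank similarity, the result is a low-rank approximation. A scalable implementation would keep the low-rank factors rather than materializing the n × n matrix as this toy does.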


2021 ◽  
Vol 12 (1) ◽  
Author(s):  
Lars Velten ◽  
Benjamin A. Story ◽  
Pablo Hernández-Malmierca ◽  
Simon Raffel ◽  
Daniel R. Leonce ◽  
...  

Cancer stem cells drive disease progression and relapse in many types of cancer. Despite this, a thorough characterization of these cells remains elusive, and with it the ability to eradicate cancer at its source. In acute myeloid leukemia (AML), leukemic stem cells (LSCs) underlie mortality but are difficult to isolate due to their low abundance and high similarity to healthy hematopoietic stem cells (HSCs). Here, we demonstrate that LSCs, HSCs, and pre-leukemic stem cells can be identified and molecularly profiled by combining single-cell transcriptomics with lineage tracing using both nuclear and mitochondrial somatic variants. While mutational status discriminates between healthy and cancerous cells, gene expression distinguishes stem and progenitor cell populations. Our approach enables the identification of LSC-specific gene expression programs and the characterization of differentiation blocks induced by leukemic mutations. Taken together, we demonstrate the power of single-cell multi-omic approaches in characterizing cancer stem cells.


2017 ◽  
Vol 114 (37) ◽  
pp. E7786-E7795 ◽  
Author(s):  
Jason C. H. Tsang ◽  
Joaquim S. L. Vong ◽  
Lu Ji ◽  
Liona C. Y. Poon ◽  
Peiyong Jiang ◽  
...  

The human placenta is a dynamic and heterogeneous organ critical in the establishment of the fetomaternal interface and the maintenance of gestational well-being. It is also the major source of cell-free fetal nucleic acids in the maternal circulation. Placental dysfunction contributes to significant complications, such as preeclampsia, a potentially lethal hypertensive disorder during pregnancy. Previous studies have identified significant changes in the expression profiles of preeclamptic placentas using whole-tissue analysis. Moreover, studies have shown increased levels of targeted RNA transcripts, and increased overall and placental contributions to maternal cell-free nucleic acids, during pregnancy progression and gestational complications. However, it remains infeasible to noninvasively delineate placental cellular dynamics and dysfunction at the cellular level using maternal cell-free nucleic acid analysis. In this study, we addressed this issue by first dissecting the cellular heterogeneity of the human placenta and defining individual cell-type–specific gene signatures. To do so, we analyzed more than 24,000 non-marker-selected cells from full-term and early preeclamptic placentas using large-scale microfluidic single-cell transcriptomic technology. Our dataset identified diverse cellular subtypes in the human placenta and enabled reconstruction of the trophoblast differentiation trajectory. Through integrative analysis with maternal plasma cell-free RNA, we resolved the longitudinal cellular dynamics of hematopoietic and placental cells in pregnancy progression. Furthermore, we were able to noninvasively uncover the cellular dysfunction of extravillous trophoblasts in early preeclamptic placentas. Our work shows the potential of integrating transcriptomic information derived from single cells into the interpretation of cell-free plasma RNA, enabling the noninvasive elucidation of cellular dynamics in complex pathological conditions.
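This abstract does not spell out how the single-cell-derived signatures were scored in plasma cell-free RNA. As an assumed illustration only, a common and simple approach is to average the log-transformed expression of a cell type's signature genes within each sample, then track that score across samples over time. The function name, pseudocount and gene labels below are hypothetical placeholders, not the study's method or data.

```python
import math


def signature_score(expression, signature_genes, pseudocount=1.0):
    """Mean log2(expression + pseudocount) over signature genes present in a sample.

    expression: dict mapping gene name -> normalized expression (hypothetical units)
    Genes missing from the sample are skipped; returns None if none are present.
    """
    values = [
        math.log2(expression[g] + pseudocount)
        for g in signature_genes
        if g in expression
    ]
    if not values:
        return None
    return sum(values) / len(values)


if __name__ == "__main__":
    signature = ["GENE_A", "GENE_B"]  # placeholder gene names
    timepoints = {
        "early": {"GENE_A": 1.0, "GENE_B": 3.0, "GENE_C": 50.0},
        "late": {"GENE_A": 7.0, "GENE_B": 15.0, "GENE_C": 50.0},
    }
    for label, sample in timepoints.items():
        print(label, signature_score(sample, signature))
```

Averaging over several genes makes the score less sensitive to any single noisy transcript. A real cell-free RNA analysis would also need normalization across samples and a check that signature genes are specific to the cell type of interest.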


Author(s):  
Qi Qiu ◽  
Peng Hu ◽  
Kiya W. Govek ◽  
Pablo G. Camara ◽  
Hao Wu

Single-cell RNA sequencing offers snapshots of whole transcriptomes but obscures the temporal dynamics of RNA biogenesis and decay. Here we present single-cell new transcript tagging sequencing (scNT-Seq), a method for massively parallel analysis of newly transcribed and pre-existing RNAs from the same cell. This droplet microfluidics-based method enables high-throughput chemical conversion on barcoded beads, efficiently marking metabolically labeled, newly transcribed RNAs with T-to-C substitutions. By simultaneously measuring new and old transcriptomes, scNT-Seq reveals neuronal subtype-specific gene regulatory networks and time-resolved RNA trajectories in response to brief (minutes) versus sustained (hours) neuronal activation. Integrating scNT-Seq with genetic perturbation reveals that DNA methylcytosine dioxygenases may inhibit the stepwise transition from the pluripotent embryonic stem cell state to intermediate and totipotent two-cell-embryo-like (2C-like) states by promoting global RNA biogenesis. Furthermore, pulse-chase scNT-Seq enables transcriptome-wide measurements of RNA stability in rare 2C-like cells. Time-resolved single-cell transcriptomic analysis thus opens new lines of inquiry regarding cell-type-specific RNA regulatory mechanisms.
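The labeling readout in this abstract can be made concrete at the level of individual reads. Once a read is aligned, it can be called "new" if it carries T-to-C mismatches relative to the reference. The sketch below assumes gap-free alignments on the same strand and uses a hypothetical threshold, `min_conversions`. It ignores base quality, known SNPs and background conversion rates, all of which a real scNT-Seq analysis would need to handle. The function names are illustrative and not part of any published pipeline.

```python
def count_t_to_c(reference, read):
    """Count positions where the reference has T and the aligned read has C.

    Assumes the read is already aligned to the reference, same strand, no indels.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must have equal length")
    return sum(
        1 for r, q in zip(reference.upper(), read.upper()) if r == "T" and q == "C"
    )


def new_fraction(reference, reads, min_conversions=1):
    """Fraction of reads classified as newly transcribed (>= min_conversions T-to-C)."""
    if not reads:
        return 0.0
    new = sum(1 for read in reads if count_t_to_c(reference, read) >= min_conversions)
    return new / len(reads)


if __name__ == "__main__":
    ref = "ATTGCT"
    reads = ["ATTGCT", "ACTGCT", "ACCGCC"]  # 0, 1 and 3 T-to-C conversions
    print(new_fraction(ref, reads))                     # threshold of 1 conversion
    print(new_fraction(ref, reads, min_conversions=2))  # stricter threshold
```

Raising `min_conversions` trades sensitivity for robustness against sequencing errors that happen to look like T-to-C conversions. That is why the threshold is exposed as a parameter rather than fixed.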


2020 ◽  
Author(s):  
Tatyana Dobreva ◽  
David Brown ◽  
Jong Hwee Park ◽  
Matt Thomson

An individual’s immune system is driven by both genetic and environmental factors that vary over time. To better understand the temporal and inter-individual variability of gene expression within distinct immune cell types, we developed a platform that leverages multiplexed single-cell sequencing and out-of-clinic capillary blood extraction to enable simplified, cost-effective profiling of the human immune system across people and time at single-cell resolution. Using the platform, we detect widespread differences in cell type-specific gene expression between subjects that are stable over multiple days.
Summary: Increasing evidence implicates the immune system in an overwhelming number of diseases, and distinct cell types play specific roles in their pathogenesis [1,2]. Studies of peripheral blood have uncovered a wealth of associations between gene expression, environmental factors, disease risk, and therapeutic efficacy [4]. For example, in rheumatoid arthritis, multiple mechanistic paths have been found that lead to disease, and gene expression of specific immune cell types can be used as a predictor of therapeutic non-response [12]. Furthermore, vaccines, drugs, and chemotherapy have been shown to yield different efficacy based on time of administration, and such findings have been linked to the time-dependence of gene expression in downstream pathways [21,22,23]. However, human immune studies of gene expression between individuals and across time remain limited to a few cell types or time points per subject, constraining our understanding of how networks of heterogeneous cells making up each individual’s immune system respond to adverse events and change over time.

