Benchmarking scRNA-seq imputation tools with respect to network inference highlights deficits in performance at high levels of sparsity

2021
Author(s):
Lisa Maria Steinheuer
Sebastian Canzler
Jörg Hackermüller

Abstract: Gene correlation network inference from single-cell transcriptomics data potentially allows unprecedented insights into cell type-specific regulatory programs. ScRNA-seq data are severely affected by dropout, which significantly hampers current downstream analysis. Although newly developed tools are capable of dealing with sparse data, no appropriate single-cell network inference workflow has been established. A potential way to end this deadlock is the application of data imputation methods, which have already proved useful in specific contexts of single-cell data analysis, e.g., recovering cell clusters. In order to infer cell type-specific networks, two prerequisites must be met: the identification of cluster-specific cell types and the network inference itself. Here, we propose a benchmarking framework to investigate both prerequisites. By using suitable reference data with inherent correlation structure, six representative imputation tools, and appropriate evaluation measures, we were able to systematically assess the impact of data imputation on network inference. Major network structures were found to be preserved in low dropout data sets. For moderately sparse data sets, DCA was able to recover gene correlation structures, although it systematically introduced higher correlation values. No imputation tool was able to recover true signals from high dropout data. However, by using an additional biological data set we could show that cell-cell correlation by means of specific marker gene expression was not compromised through data imputation. Our analysis showed that network inference is feasible for low and moderately sparse data sets by using the unimputed and DCA-prepared data, respectively. High sparsity data, on the other hand, still pose a major problem since current imputation techniques are not able to facilitate network inference. The annotation of cluster-specific cell types as a prerequisite is not hampered by data imputation, but the power of imputation tools to restore deeply hidden correlation structures is still not sufficient.
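
The central effect in this abstract, dropout eroding gene-gene correlation structure, can be illustrated with a small simulation. The sketch below is not the authors' benchmarking framework. It assumes a toy two-gene Gaussian model, a hypothetical `apply_dropout` helper that zeroes entries uniformly at random, and Pearson correlation as the network edge weight; none of these choices come from the paper.

```python
import numpy as np

def simulate_expression(n_cells, rho, rng):
    """Draw two 'genes' with true Pearson correlation rho (toy Gaussian model)."""
    cov = np.array([[1.0, rho], [rho, 1.0]])
    # Shift the mean so that a zero is clearly distinct from typical expression.
    return rng.multivariate_normal(mean=[5.0, 5.0], cov=cov, size=n_cells)

def apply_dropout(expr, rate, rng):
    """Hypothetical dropout model: each entry is set to zero with probability `rate`."""
    mask = rng.random(expr.shape) < rate
    return np.where(mask, 0.0, expr)

def gene_correlation(expr):
    """Pearson correlation between the two gene columns."""
    return float(np.corrcoef(expr, rowvar=False)[0, 1])

rng = np.random.default_rng(0)
expr = simulate_expression(n_cells=5000, rho=0.8, rng=rng)

# Observed correlation at increasing dropout rates (0.0 means unperturbed data).
corr_by_rate = {
    rate: gene_correlation(apply_dropout(expr, rate, rng))
    for rate in (0.0, 0.1, 0.5, 0.9)
}
for rate, corr in corr_by_rate.items():
    print(f"dropout={rate:.1f}  observed correlation={corr:.2f}")
```

In this toy setting the observed correlation falls well below the true value once a sizeable fraction of entries is zeroed, which is the kind of signal loss an imputation tool would need to reverse before network inference.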

2019
Author(s):
Pawel F. Przytycki
Katherine S. Pollard

Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell-type specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve enhancers to specific cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their enhancers.


2021
Author(s):
Gulden Olgun
Vishaka Gopalan
Sridhar Hannenhalli

Micro-RNAs (miRNA) are critical in development, homeostasis, and diseases, including cancer. However, our understanding of miRNA function at cellular resolution is thwarted by the inability of the standard single cell RNA-seq protocols to capture miRNAs. Here we introduce a machine learning tool -- miRSCAPE -- to infer miRNA expression in a sample from its RNA-seq profile. We establish miRSCAPE's accuracy separately in 10 tissues comprising ~10,000 tumor and normal bulk samples and demonstrate that miRSCAPE accurately infers cell type-specific miRNA activities (predicted vs observed fold-difference correlation ~ 0.81) in two independent datasets where miRNA profiles of specific cell types are available (HEK-GBM, Kidney-Breast-Skin). When trained on human hematopoietic cancers, miRSCAPE can identify active miRNAs in 8 hematopoietic cell lines in mouse with a reasonable accuracy (auROC = 0.67). Finally, we apply miRSCAPE to infer miRNA activities in scRNA clusters in Pancreatic and Lung cancers, as well as in 56 cell types in the Human Cell Landscape (HCL). Across the board, miRSCAPE recapitulates and provides a refined view of known miRNA biology. miRSCAPE is freely available and promises to substantially expand our understanding of gene regulatory networks at cellular resolution.


Author(s):  
Jiebiao Wang
Kathryn Roeder
Bernie Devlin

Abstract: When assessed over a large number of samples, bulk RNA sequencing provides reliable data for gene expression at the tissue level. Single-cell RNA sequencing (scRNA-seq) deepens those analyses by evaluating gene expression at the cellular level. Both data types lend insights into disease etiology. With current technologies, however, scRNA-seq data are known to be noisy. Moreover, constrained by costs, scRNA-seq data are typically generated from a relatively small number of subjects, which limits their utility for some analyses, such as identification of gene expression quantitative trait loci (eQTLs). To address these issues while maintaining the unique advantages of each data type, we develop a Bayesian method (bMIND) to integrate bulk and scRNA-seq data. With a prior derived from scRNA-seq data, we propose to estimate sample-level cell-type-specific (CTS) expression from bulk expression data. The CTS expression enables large-scale sample-level downstream analyses, such as detecting CTS differentially expressed genes (DEGs) and eQTLs. Through simulations, we demonstrate that bMIND improves the accuracy of sample-level CTS expression estimates and power to discover CTS-DEGs when compared to existing methods. To further our understanding of two complex phenotypes, autism spectrum disorder and Alzheimer’s disease, we apply bMIND to gene expression data of relevant brain tissue to identify CTS-DEGs. Our results complement findings for CTS-DEGs obtained from snRNA-seq studies, replicating certain DEGs in specific cell types while nominating other novel genes in those cell types. Finally, we calculate CTS-eQTLs for eleven brain regions by analyzing GTEx V8 data, creating a new resource for biological insights.


BMC Biology
2020
Vol 18 (1)
Author(s):
Elin Lundin
Chenglin Wu
Albin Widmark
Mikaela Behm
Jens Hjerling-Leffler
...  

Abstract. Background: Adenosine-to-inosine (A-to-I) RNA editing is a process that contributes to the diversification of proteins and that has been shown to be essential for neurotransmission and other neuronal functions. However, the spatiotemporal and diversification properties of RNA editing in the brain are largely unknown. Here, we applied in situ sequencing to distinguish between edited and unedited transcripts in distinct regions of the mouse brain at four developmental stages, and investigated the diversity of the RNA landscape. Results: We analyzed RNA editing at codon-altering sites using in situ sequencing at single-cell resolution, in combination with the detection of individual ADAR enzymes and specific cell type marker transcripts. This approach revealed cell-type-specific regulation of RNA editing of a set of transcripts, and developmental and regional variation in editing levels for many of the targeted sites. We found increasing editing diversity throughout development, which arises through regional- and cell type-specific regulation of ADAR enzymes and target transcripts. Conclusions: Our single-cell in situ sequencing method has proved useful to study the complex landscape of RNA editing, and our results indicate that this complexity arises due to distinct mechanisms of regulating individual RNA editing sites, acting both regionally and in specific cell types.


2021
Author(s):
Kai Kang
Caizhi David Huang
Yuanyuan Li
David M. Umbach
Leping Li

Abstract. Background: Biological tissues consist of heterogeneous populations of cells. Because gene expression patterns from bulk tissue samples reflect the contributions from all cells in the tissue, understanding the contribution of individual cell types to the overall gene expression in the tissue is fundamentally important. We recently developed a computational method, CDSeq, that can simultaneously estimate both sample-specific cell-type proportions and cell-type-specific gene expression profiles using only bulk RNA-Seq counts from multiple samples. Here we present an R implementation of CDSeq (CDSeqR) with significant performance improvement over the original implementation in MATLAB and with a new function to aid interpretation of deconvolution outcomes. The R package would be of interest for the broader R community. Results: We developed a novel strategy to substantially improve computational efficiency in both speed and memory usage. In addition, we designed and implemented a new function for annotating CDSeq-estimated cell types using publicly available single-cell RNA sequencing (scRNA-seq) data (single-cell data from 20 major organs are included in the R package). This function allows users to readily interpret and visualize the CDSeq-estimated cell types. We carried out additional validations of the CDSeqR software with in silico and in vitro mixtures and with real experimental data including RNA-seq data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression (GTEx) project. Conclusions: The existing bulk RNA-seq repositories, such as TCGA and GTEx, provide enormous resources for better understanding changes in transcriptomics and human diseases. They are also potentially useful for studying cell-cell interactions in the tissue microenvironment. However, bulk level analyses neglect tissue heterogeneity and hinder investigation in a cell-type-specific fashion. The CDSeqR package can be viewed as providing in silico single-cell dissection of bulk measurements. It enables researchers to gain cell-type-specific information from bulk RNA-seq data.


2021
Author(s):
Meriem Mekedem
Patrice Ravel
Jacques Colinge

The development of high-throughput genomic technologies associated with recent genetic perturbation techniques such as short hairpin RNA (shRNA), gene trapping, or gene editing (CRISPR/Cas9) has made it possible to obtain large perturbation data sets. These data sets are invaluable sources of information regarding the function of genes, and they offer unique opportunities to reverse engineer gene regulatory networks in specific cell types. Modular response analysis (MRA) is a well-accepted mathematical modeling method that is precisely aimed at such network inference tasks, but its use has been limited to rather small biological systems so far. In this study, we show that MRA can be employed on large systems with almost 1,000 network components. In particular, we show that MRA performance surpasses general-purpose mutual information-based algorithms. Part of these competitive results was obtained by the application of a novel heuristic that pruned MRA-inferred interactions a posteriori. We also exploited a block structure in MRA linear algebra to parallelize large system resolutions.
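
For readers unfamiliar with MRA, its core linear-algebra step can be sketched as follows. The sketch assumes the classical formulation, in which the local response matrix r (direct interactions, with every r_ii = -1) is recovered from the measured global response matrix R as r = -[dg(R^-1)]^-1 R^-1. The function name `local_responses` and the toy three-node network are illustrative only, and the pruning heuristic and block-parallel resolution mentioned in the abstract are not reproduced here.

```python
import numpy as np

def local_responses(global_R):
    """Recover local response coefficients from a global response matrix.

    Assumes each perturbation j directly affects only node j, so that
    r = -diag(R^-1)^-1 @ R^-1 and every diagonal entry of r equals -1.
    """
    R_inv = np.linalg.inv(global_R)
    # Dividing row i by the diagonal entry R_inv[i, i] applies diag(R^-1)^-1.
    return -R_inv / np.diag(R_inv)[:, None]

# Toy 3-node network (illustrative values): r[i, j] is the direct effect of node j on node i.
r_true = np.array([
    [-1.0, 0.0, 0.5],
    [0.8, -1.0, 0.0],
    [0.0, -0.6, -1.0],
])
# Direct sensitivity of each node to its own perturbation (assumed nonzero).
r_p = np.array([0.7, 1.2, 0.9])

# Forward model: the steady-state condition r @ R + diag(r_p) = 0 gives R = -r^-1 @ diag(r_p).
R_global = -np.linalg.inv(r_true) @ np.diag(r_p)

r_est = local_responses(R_global)
print(np.round(r_est, 3))
```

With noise-free data the direct interactions are recovered without knowing `r_p`, because the normalization by the diagonal of R^-1 cancels it. With noisy measurements and roughly 1,000 components, small spurious coefficients would be expected, which is presumably where an a posteriori pruning step becomes useful.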


Author(s):  
Welles Robinson
Fiorella Schischlik
E. Michael Gertz
Alejandro A. Schäffer
Eytan Ruppin

Abstract: Microbial taxa that are differentially abundant between cell types are likely to be intracellular. Here we describe a new computational pipeline called CSI-Microbes (computational identification of Cell type Specific Intracellular Microbes) that aims to identify such putative intracellular species from single cell RNA-seq data in a given tumor sample. CSI-Microbes also includes additional steps that can be applied to filter out microbial contaminants from the bona fide microbial residents of cells in the patients. We first test and validate CSI-Microbes on a dataset of immune cells deliberately infected with Salmonella. We then apply CSI-Microbes to identify intracellular microbes in breast cancer and melanoma. We identify Streptomyces as differentially abundant in the tumor cells of one breast cancer sample. We further identify three bacterial genera and four fungal genera that are differentially abundant and hence likely to be intracellular in the tumor cells in melanoma samples. No cell type specific bacteria were identified in our analysis of brain tumor samples. In sum, CSI-Microbes offers a new way to identify likely intracellular microbes living within specific cell populations in malignant tumors, markedly extending previous studies aimed at inferring microbial abundance from bulk tumor expression data.


2021
Vol 22 (1)
Author(s):
Pawel F. Przytycki
Katherine S. Pollard

Abstract: Single-cell and bulk genomics assays have complementary strengths and weaknesses, and alone neither strategy can fully capture regulatory elements across the diversity of cells in complex tissues. We present CellWalker, a method that integrates single-cell open chromatin (scATAC-seq) data with gene expression (RNA-seq) and other data types using a network model that simultaneously improves cell labeling in noisy scATAC-seq and annotates cell type-specific regulatory elements in bulk data. We demonstrate CellWalker’s robustness to sparse annotations and noise using simulations and combined RNA-seq and ATAC-seq in individual cells. We then apply CellWalker to the developing brain. We identify cells transitioning between transcriptional states, resolve regulatory elements to cell types, and observe that autism and other neurological traits can be mapped to specific cell types through their regulatory elements.


2021
Vol 12 (1)
Author(s):
Deepa Bhartiya

Abstract: Life-long homeostasis of adult tissues is supposedly maintained by the resident stem cells. These stem cells are quiescent in nature and rarely divide to self-renew and give rise to tissue-specific “progenitors” (lineage-restricted and tissue-committed), which divide rapidly and differentiate into tissue-specific cell types. However, it has proved difficult to isolate these quiescent stem cells as a physical entity. Recent single-cell RNAseq studies on several adult tissues including ovary, prostate, and cardiac tissues have not been able to detect stem cells. Thus, it has been postulated that adult cells dedifferentiate to a stem-like state to ensure regeneration and can be defined as cells capable of replacing lost cells through mitosis. This idea challenges the basic paradigm of developmental biology regarding plasticity, namely that a cell enters a point of no return once it initiates differentiation. The underlying reason for this dilemma is that we are putting stem cells and somatic cells together while processing for various studies. Stem cells and adult mature cell types are distinct entities; stem cells are quiescent, small in size, and with minimal organelles, whereas the mature cells are metabolically active and have multiple organelles lying in abundant cytoplasm. As a result, they do not pellet down together when centrifuged at 100–350g. At this speed, mature cells get collected but stem cells remain buoyant and can be pelleted by centrifuging at 1000g. Thus, the inability to detect stem cells in recently published single-cell RNAseq studies is because the stem cells were unknowingly discarded while processing and were never subjected to RNAseq. This needs to be kept in mind before proposing to redefine adult stem cells.


Author(s):  
Hee-Dae Kim
Jing Wei
Tanessa Call
Nicole Teru Quintus
Alexander J. Summers
...  

Abstract: Depression is the leading cause of disability and produces enormous health and economic burdens. Current treatment approaches for depression are largely ineffective and leave more than 50% of patients symptomatic, mainly because of the non-selective and broad action of antidepressants. Thus, there is an urgent need to design and develop novel therapeutics to treat depression. Given the heterogeneity and complexity of the brain, identification of molecular mechanisms within specific cell types responsible for producing depression-like behaviors will advance development of therapies. In the reward circuitry, the nucleus accumbens (NAc) is a key brain region of depression pathophysiology, possibly based on differential activity of D1- or D2-medium spiny neurons (MSNs). Here we report a circuit- and cell-type specific molecular target for depression, Shisa6, recently defined as an AMPAR component, which is increased only in D1-MSNs in the NAc of susceptible mice. Using the Ribotag approach, we dissected the transcriptional profile of D1- and D2-MSNs by RNA sequencing following a mouse model of depression, chronic social defeat stress (CSDS). Bioinformatic analyses identified cell-type specific genes that may contribute to the pathogenesis of depression, including Shisa6. We found that selective optogenetic activation of the ventral tegmental area (VTA) to NAc circuit increases Shisa6 expression in D1-MSNs. Shisa6 is specifically located in excitatory synapses of D1-MSNs and increases excitability of neurons, which promotes anxiety- and depression-like behaviors in mice. Cell-type and circuit-specific action of Shisa6, which directly modulates excitatory synapses that convey aversive information, identifies the protein as a potential rapid-antidepressant target for aberrant circuit function in depression.

