SigHotSpotter: scRNA-seq-based computational tool to control cell subpopulation phenotypes for cellular rejuvenation strategies

Author(s):  
Srikanth Ravichandran ◽  
András Hartmann ◽  
Antonio del Sol

Abstract Summary Single-cell RNA-sequencing is increasingly employed to characterize disease or ageing cell subpopulation phenotypes. Despite the exponential increase in data generation, systematic identification of key regulatory factors for controlling cellular phenotype to enable cell rejuvenation in disease or ageing remains a challenge. Here, we present SigHotSpotter, a computational tool to predict hotspots of signaling pathways responsible for the stable maintenance of cell subpopulation phenotypes, by integrating signaling and transcriptional networks. Targeted perturbation of these signaling hotspots can enable precise control of cell subpopulation phenotypes. SigHotSpotter correctly predicts the signaling hotspots with known experimental validations in different cellular systems. The tool is simple and user-friendly, and is available as a web server or as stand-alone software. We believe SigHotSpotter will serve as a general-purpose tool for the systematic prediction of signaling hotspots based on single-cell RNA-seq data, and potentiate novel cell rejuvenation strategies in the context of disease and ageing. Availability and implementation SigHotSpotter is available as a web tool at https://SigHotSpotter.lcsb.uni.lu. Source code, example datasets and other information are available at https://gitlab.com/srikanth.ravichandran/sighotspotter. Supplementary information Supplementary data are available at Bioinformatics online.

Author(s):  
Yang Xu ◽  
Priyojit Das ◽  
Rachel Patton McCord

Abstract Motivation Deep learning approaches have empowered single-cell omics data analysis in many ways and generated new insights from complex cellular systems. As there is an increasing need for single-cell omics data to be integrated across sources, types, and features of data, the challenges of integrating single-cell omics data are rising. Here, we present SMILE (Single-cell Mutual Information Learning), an unsupervised deep learning algorithm that learns discriminative representations for single-cell data by maximizing mutual information. Results Using a unique cell-pairing design, SMILE successfully integrates multi-source single-cell transcriptome data, removing batch effects and projecting similar cell types, even from different tissues, into the shared space. SMILE can also integrate data from two or more modalities, such as joint profiling technologies using single-cell ATAC-seq, RNA-seq, DNA methylation, Hi-C, and ChIP data. When paired cells are known, SMILE can integrate data with unmatched features, such as genes for RNA-seq and genome-wide peaks for ATAC-seq. Integrated representations learned from joint profiling technologies can then be used as a framework for comparing independent single-source data. Supplementary information Supplementary data are available at Bioinformatics online. The source code of SMILE, including analyses of key results in the study, can be found at: https://github.com/rpmccordlab/SMILE.


2018 ◽  
Vol 34 (12) ◽  
pp. 2077-2086 ◽  
Author(s):  
Suoqin Jin ◽  
Adam L MacLean ◽  
Tao Peng ◽  
Qing Nie

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) offers unprecedented resolution for studying cellular decision-making processes. Robust inference of cell state transition paths and probabilities is an important yet challenging step in the analysis of these data. Results Here we present scEpath, an algorithm that calculates energy landscapes and probabilistic directed graphs in order to reconstruct developmental trajectories. We quantify the energy landscape using ‘single-cell energy’ and distance-based measures, and find that the combination of these enables robust inference of the transition probabilities and lineage relationships between cell states. We also identify marker genes and gene expression patterns associated with cell state transitions. Our approach produces pseudotemporal orderings that are—in combination—more robust and accurate than current methods, and offers higher resolution dynamics of the cell state transitions, leading to new insight into key transition events during differentiation and development. Moreover, scEpath is robust to variation in the size of the input gene set, and is broadly unsupervised, requiring few parameters to be set by the user. Applications of scEpath led to the identification of a cell-cell communication network implicated in early human embryo development, and novel transcription factors important for myoblast differentiation. scEpath allows us to identify common and specific temporal dynamics and transcriptional factor programs along branched lineages, as well as the transition probabilities that control cell fates. Availability and implementation A MATLAB package of scEpath is available at https://github.com/sqjin/scEpath. Supplementary information Supplementary data are available at Bioinformatics online.


Author(s):  
Congting Ye ◽  
Qian Zhou ◽  
Xiaohui Wu ◽  
Chen Yu ◽  
Guoli Ji ◽  
...  

Abstract Motivation Alternative polyadenylation (APA) plays a key post-transcriptional regulatory role in mRNA stability and function in eukaryotes. Single-cell RNA-seq (scRNA-seq) is a powerful tool to discover cellular heterogeneity at the gene expression level. Given the 3′-enriched strategy used in library construction, the most commonly used scRNA-seq protocol, 10× Genomics, enables us to improve the resolution of APA studies to the single-cell level. However, there is currently no computational tool available for investigating APA profiles from scRNA-seq data. Results Here, we present scDAPA, a package for detecting and visualizing dynamic APA from scRNA-seq data. Taking bam/sam files and cell cluster labels as inputs, scDAPA detects APA dynamics using a histogram-based method and the Wilcoxon rank-sum test, and visualizes candidate genes with dynamic APA. Benchmarking results demonstrated that scDAPA can effectively identify genes with dynamic APA among different cell groups from scRNA-seq data. Availability and implementation The scDAPA package is implemented in Shell and R, and is freely available at https://scdapa.sourceforge.io. Contact [email protected] Supplementary information Supplementary data are available at Bioinformatics online.
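
The scDAPA abstract above names the Wilcoxon rank-sum test as the statistical step for comparing cell groups. The following is a minimal sketch of that test only, written in Python for illustration (scDAPA itself is implemented in Shell and R, and this is not its code). It assumes hypothetical per-read 3′-end positions for one gene in two cell clusters, and it uses the plain normal approximation without tie or continuity correction; the function names and input values are invented for the example.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Returns (z, p). No tie or continuity correction is applied, so this
    is a teaching sketch rather than a replacement for a statistics library.
    """
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])                       # rank sum of the first group
    mean_w = n1 * (n1 + n2 + 1) / 2           # expected rank sum under H0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    p = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    return z, p


# Hypothetical 3'-end positions (distance from an annotated gene end, in nt)
# for reads of one gene in two cell clusters.
cluster_a = [100, 110, 120, 130, 140, 150, 160, 170]
cluster_b = [300, 310, 320, 330, 340, 350, 360, 370]
z_stat, p_value = rank_sum_test(cluster_a, cluster_b)
```

A small p-value here would flag the gene as a candidate for differing poly(A) site usage between the two clusters; how scDAPA builds its histograms and applies multiple-testing control is not described in the abstract and is not modelled here.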


2020 ◽  
Vol 36 (15) ◽  
pp. 4233-4239
Author(s):  
Di Ran ◽  
Shanshan Zhang ◽  
Nicholas Lytal ◽  
Lingling An

Abstract Motivation Single-cell RNA-sequencing (scRNA-seq) has become an important tool to unravel cellular heterogeneity, discover new cell (sub)types, and understand cell development at single-cell resolution. However, one major challenge to scRNA-seq research is the presence of ‘drop-out’ events, which are usually due to extremely low mRNA input or the stochastic nature of gene expression. In this article, we present a novel single-cell RNA-seq drop-out correction (scDoc) method, which imputes drop-out events by borrowing information for the same gene from highly similar cells. Results scDoc is the first method that directly incorporates drop-out information into cell-to-cell similarity estimation, which is crucial in scRNA-seq drop-out imputation but has not been appropriately examined. We evaluated the performance of scDoc using both simulated data and real scRNA-seq studies. Results show that scDoc outperforms the existing imputation methods with respect to data visualization, cell subpopulation identification and differential expression detection in scRNA-seq data. Availability and implementation R code is available at https://github.com/anlingUA/scDoc. Supplementary information Supplementary data are available at Bioinformatics online.


2019 ◽  
Author(s):  
Shu Wang ◽  
Jia-Ren Lin ◽  
Eduardo D. Sontag ◽  
Peter K. Sorger

Abstract The goal of many single-cell studies on eukaryotic cells is to gain insight into the biochemical reactions that control cell fate and state. In this paper we introduce the concept of effective stoichiometric space (ESS) to guide the reconstruction of biochemical networks from multiplexed, fixed time-point, single-cell data. In contrast to methods based solely on statistical models of data, the ESS method leverages the power of the geometric theory of toric varieties to begin unraveling the structure of chemical reaction networks (CRN). This application of toric theory enables a data-driven mapping of covariance relationships in single cell measurements into stoichiometric information, one in which each cell subpopulation has its associated ESS interpreted in terms of CRN theory. In the development of ESS we reframe certain aspects of the theory of CRN to better match data analysis. As an application of our approach we process cytometry- and image-based single-cell datasets and identify differences in cells treated with kinase inhibitors. Our approach is directly applicable to data acquired using readily accessible experimental methods such as Fluorescence Activated Cell Sorting (FACS) and multiplex immunofluorescence. Author summary We introduce a new notion, which we call the effective stoichiometric space (ESS), that elucidates network structure from the covariances of single-cell multiplexed data. The ESS approach differs from methods that are based on purely statistical models of data: it allows a completely new and data-driven translation of the theory of toric varieties in geometry and specifically their role in chemical reaction networks (CRN). In the process, we reframe certain aspects of the theory of CRN. As illustrations of our approach, we find stoichiometry in different single-cell datasets, and pinpoint dose-dependence of network perturbations in drug-treated cells.


2021 ◽  
Author(s):  
A. Ali Heydari ◽  
Oscar A. Davalos ◽  
Lihong Zhao ◽  
Katrina K. Hoyer ◽  
Suzanne S. Sindi

Motivation Single-cell RNA sequencing (scRNAseq) technologies allow for measurements of gene expression at a single-cell resolution. This provides researchers with a tremendous advantage for detecting heterogeneity, delineating cellular maps or identifying rare subpopulations. However, a critical complication remains: the low number of single-cell observations due to limitations by cost or rarity of subpopulation. This absence of sufficient data may cause inaccuracy or irreproducibility of downstream analysis. In this work, we present ACTIVA (Automated Cell-Type-informed Introspective Variational Autoencoder): a novel framework for generating realistic synthetic data using a single-stream adversarial variational autoencoder conditioned with cell-type information. Data generation and augmentation with ACTIVA can enhance scRNAseq pipelines and analysis, such as benchmarking new algorithms, studying the accuracy of classifiers and detecting marker genes. ACTIVA will facilitate analysis of smaller datasets, potentially reducing the number of patients and animals necessary in initial studies. Results We train and evaluate models on multiple public scRNAseq datasets. Under the same conditions, ACTIVA trains up to 17 times faster than the GAN-based state-of-the-art model, scGAN (2.2 hours compared to 39.5 hours on Brain Small) while performing better or comparably in our quantitative and qualitative evaluations. We show that augmenting rare populations with ACTIVA can significantly increase the classification accuracy of the rare population (more than 45% improvement in our rarest test case). Availability of data and code Links to raw, pre- and post-processed data, source code and tutorials are available at https://github.com/SindiLab. Supplementary information Supplementary material can be found as a separate file with the same pre-print submission.


2021 ◽  
Author(s):  
Thomas P Prescott ◽  
Kan Zhu ◽  
Min Zhao ◽  
Ruth E Baker

ABSTRACT Cell motility in response to environmental cues forms the basis of many developmental processes in multicellular organisms. One such environmental cue is an electric field (EF), which induces a form of motility known as electrotaxis. Electrotaxis has evolved in a number of cell types to guide wound healing, and has been associated with different cellular processes, suggesting that observed electrotactic behaviour is likely a combination of multiple distinct effects arising from the presence of an EF. In order to determine the different mechanisms by which observed electrotactic behaviour emerges, and thus to design EFs that can be applied to direct and control electrotaxis, researchers require accurate quantitative predictions of cellular responses to externally-applied fields. Here, we use mathematical modelling to formulate and parametrise a variety of hypothetical descriptions of how cell motility may change in response to an EF. We calibrate our model to observed data using synthetic likelihoods and Bayesian sequential learning techniques, and demonstrate that EFs impact cellular motility in three distinct ways. We also demonstrate how the model allows us to make predictions about cellular motility under different EFs. The resulting model and calibration methodology will thus form the basis for future data-driven and model-based feedback control strategies based on electric actuation. SIGNIFICANCE Electrotaxis is attracting much interest and development as a technique to control cell migration due to the precision of electric fields as actuation signals. However, precise control of electrotactic migration relies on an accurate model of how cell motility changes in response to applied electric fields. We present and calibrate a parametrised stochastic model that accurately replicates experimental single-cell data and enables the prediction of input–output behaviour while quantifying uncertainty and stochasticity. The model allows us to quantify three distinct ways in which electric fields perturb the motile behaviour of the cell. This model and the associated simulation-based calibration methodology will be central to future developments in the control of electrotaxis.


Author(s):  
Irzam Sarfraz ◽  
Muhammad Asif ◽  
Joshua D Campbell

Abstract Motivation R Experiment objects such as the SummarizedExperiment or SingleCellExperiment are data containers for storing one or more matrix-like assays along with associated row and column data. These objects have been used to facilitate the storage and analysis of high-throughput genomic data generated from technologies such as single-cell RNA sequencing. One common computational task in many genomics analysis workflows is to perform subsetting of the data matrix before applying down-stream analytical methods. For example, one may need to subset the columns of the assay matrix to exclude poor-quality samples or subset the rows of the matrix to select the most variable features. Traditionally, a second object is created that contains the desired subset of assay from the original object. However, this approach is inefficient as it requires the creation of an additional object containing a copy of the original assay and leads to challenges with data provenance. Results To overcome these challenges, we developed an R package called ExperimentSubset, which is a data container that implements classes for efficient storage and streamlined retrieval of assays that have been subsetted by rows and/or columns. These classes are able to inherently provide data provenance by maintaining the relationship between the subsetted and parent assays. We demonstrate the utility of this package on a single-cell RNA-seq dataset by storing and retrieving subsets at different stages of the analysis while maintaining a lower memory footprint. Overall, the ExperimentSubset is a flexible container for the efficient management of subsets. Availability and implementation ExperimentSubset package is available at Bioconductor: https://bioconductor.org/packages/ExperimentSubset/ and Github: https://github.com/campbio/ExperimentSubset. Supplementary information Supplementary data are available at Bioinformatics online.
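
The storage idea described in the ExperimentSubset abstract above (keep one parent assay and record each subset by its row and column indices instead of copying data) can be sketched in a few lines. This is a conceptual illustration in Python, not the ExperimentSubset R API: the class name `SubsetContainer`, its methods, and the toy count matrix are all hypothetical.

```python
from typing import Dict, List, Tuple


class SubsetContainer:
    """Toy container: one parent matrix plus named subsets stored as indices."""

    def __init__(self, assay: List[List[float]]):
        # Parent matrix: rows are features, columns are samples/cells.
        self.assay = assay
        self.subsets: Dict[str, Tuple[List[int], List[int]]] = {}

    def create_subset(self, name: str, rows: List[int], cols: List[int]) -> None:
        """Record only the row/column indices; no assay values are copied."""
        self.subsets[name] = (list(rows), list(cols))

    def get_subset(self, name: str) -> List[List[float]]:
        """Materialise the subset on demand from the parent assay."""
        rows, cols = self.subsets[name]
        return [[self.assay[r][c] for c in cols] for r in rows]

    def parent_indices(self, name: str) -> Tuple[List[int], List[int]]:
        """Provenance: which parent rows and columns the subset refers to."""
        return self.subsets[name]


# Hypothetical 3 features x 3 cells count matrix.
counts = [
    [5, 0, 3],
    [1, 2, 0],
    [0, 7, 4],
]
container = SubsetContainer(counts)
# Keep features 0 and 2, and drop the middle cell (e.g. a poor-quality sample).
container.create_subset("filtered", rows=[0, 2], cols=[0, 2])
filtered = container.get_subset("filtered")
```

Because a subset is just a pair of index lists, its link to the parent assay is retained by construction, which is the provenance property the abstract highlights; the memory saving comes from never duplicating the assay values.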


Author(s):  
Givanna H Putri ◽  
Irena Koprinska ◽  
Thomas M Ashhurst ◽  
Nicholas J C King ◽  
Mark N Read

Abstract Motivation Many ‘automated gating’ algorithms now exist to cluster cytometry and single-cell sequencing data into discrete populations. Comparative algorithm evaluations on benchmark datasets rely either on a single performance metric, or a few metrics considered independently of one another. However, single metrics emphasize different aspects of clustering performance and do not rank clustering solutions in the same order. This underlies the lack of consensus between comparative studies regarding optimal clustering algorithms and undermines the translatability of results onto other non-benchmark datasets. Results We propose the Pareto fronts framework as an integrative evaluation protocol, wherein individual metrics are instead leveraged as complementary perspectives. Judged superior are algorithms that provide the best trade-off between the multiple metrics considered simultaneously. This yields a more comprehensive and complete view of clustering performance. Moreover, by broadly and systematically sampling algorithm parameter values using the Latin Hypercube sampling method, our evaluation protocol minimizes (un)fortunate parameter value selections as confounding factors. Furthermore, it reveals how meticulously each algorithm must be tuned in order to obtain good results, vital knowledge for users with novel data. We exemplify the protocol by conducting a comparative study between three clustering algorithms (ChronoClust, FlowSOM and Phenograph) using four common performance metrics applied across four cytometry benchmark datasets. To our knowledge, this is the first time Pareto fronts have been used to evaluate the performance of clustering algorithms in any application domain. Availability and implementation Implementation of our Pareto front methodology and all scripts and datasets to reproduce this article are available at https://github.com/ghar1821/ParetoBench. Supplementary information Supplementary data are available at Bioinformatics online.
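
The Pareto-front idea in the abstract above can be illustrated with a minimal sketch: a clustering solution is kept if no other solution is at least as good on every metric and strictly better on at least one. The code below is not taken from the ParetoBench repository; the run names and scores are hypothetical, and all metrics are assumed to be "higher is better".

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector `a` Pareto-dominates `b` (higher is better)."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(solutions: Dict[str, Sequence[float]]) -> List[str]:
    """Return the names of solutions that no other solution dominates."""
    front = []
    for name, scores in solutions.items():
        dominated = any(
            dominates(other, scores)
            for other_name, other in solutions.items()
            if other_name != name
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical (metric_1, metric_2) scores for four parameter settings.
runs = {
    "run_a": (0.90, 0.60),
    "run_b": (0.80, 0.85),
    "run_c": (0.70, 0.80),  # worse than run_b on both metrics
    "run_d": (0.60, 0.90),
}
front = pareto_front(runs)
```

Here `run_c` is excluded because `run_b` beats it on both metrics, while the other three each represent a different trade-off and remain on the front. The quadratic pairwise comparison is adequate for the modest number of solutions in a benchmark sweep.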


2020 ◽  
Vol 22 (Supplement_3) ◽  
pp. iii312-iii312
Author(s):  
Andrea Griesinger ◽  
Eric Prince ◽  
Andrew Donson ◽  
Kent Riemondy ◽  
Timothy Ritzman ◽  
...  

Abstract We have previously shown immune gene phenotype variations between posterior fossa ependymoma subgroups. PFA1 tumors chronically secrete IL-6, which pushes the infiltrating myeloid cells to an immune suppressive function. In contrast, PFA2 tumors have a more immune activated phenotype and have a better prognosis. The objective of this study was to use single-cell (sc) RNAseq to descriptively characterize the infiltrating myeloid cells. We analyzed approximately 8500 cells from 21 PFA patient samples and used advanced machine learning techniques to identify distinct myeloid and lymphoid subpopulations. The myeloid compartment was difficult to interpret, as the data show that a continuum of gene expression profiles exists within PFA1 and PFA2. Through lineage tracing, we were able to tease out that PFA2 myeloid cells expressed more genes associated with an anti-viral response (MHC II, TNF-a, interferon-gamma signaling), while PFA1 myeloid cells had genes associated with an immune suppressive phenotype (angiogenesis, wound healing, IL-10). Specifically, we found expression of IKZF1 was upregulated in PFA2 myeloid cells. IKZF1 regulates differentiation of myeloid cells toward M1 or M2 phenotype through upregulation of either IRF5 or IRF4 respectively. IRF5 expression correlated with IKZF1, being predominantly expressed in the PFA2 myeloid cell subset. IKZF1 is also involved in T-cell activation. While we have not completed our characterization of the T-cell subpopulation, we did find significantly more T-cell infiltration in PFA2 than PFA1. Moving forward, these studies will provide us with valuable information regarding the molecular switches involved in the tumor-immune microenvironment and help us better develop immunotherapy for PFA ependymoma.

