Integrative analysis of single-cell expression data reveals distinct regulatory states in bidirectional promoters

2018 ◽  
Vol 11 (1) ◽  
Author(s):  
Fatemeh Behjati Ardakani ◽  
Kathrin Kattler ◽  
Karl Nordström ◽  
Nina Gasparoni ◽  
Gilles Gasparoni ◽  
...  
2019 ◽  
Author(s):  
Debajyoti Sinha ◽  
Pradyumn Sinha ◽  
Ritwik Saha ◽  
Sanghamitra Bandyopadhyay ◽  
Debarka Sengupta

Abstract Summary: DropClust leverages Locality Sensitive Hashing (LSH) to speed up clustering of large-scale single-cell expression data. Here we present the improved dropClust, a complete R package that is fast, interoperable and minimally resource-intensive. The new dropClust features a novel batch effect removal algorithm that allows integrative analysis of single-cell RNA-seq (scRNA-seq) datasets. Availability and implementation: dropClust is freely available at https://github.com/debsin/dropClust as an R package. A lightweight online version of dropClust is available at https://debsinha.shinyapps.io/dropClust/. Supplementary information: Supplementary data are available at Bioinformatics online.
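To make the role of LSH in the abstract above concrete, here is a minimal sketch of one common LSH family, signed random projections, applied to toy expression vectors. It is not dropClust's implementation: the abstract does not say which hash family or parameters the package uses, and the function names (`make_hyperplanes`, `signature`, `bucket_cells`), the toy data and the choice of 8 hyperplanes are all illustrative assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw n_planes random Gaussian hyperplanes in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def bucket_cells(cells, planes):
    """Group cell indices by signature; cells sharing a bucket are candidate neighbours."""
    buckets = defaultdict(list)
    for idx, vec in enumerate(cells):
        buckets[signature(vec, planes)].append(idx)
    return dict(buckets)


# Toy "expression profiles" (assumed data, three genes per cell).
cells = [
    [5.0, 0.1, 0.0],
    [10.0, 0.2, 0.0],   # same direction as cell 0, scaled by 2
    [-5.0, -0.1, 0.0],  # opposite direction to cell 0
]
planes = make_hyperplanes(n_planes=8, dim=3, seed=42)
buckets = bucket_cells(cells, planes)
```

Vectors pointing in similar directions tend to share a signature, so a nearest-neighbour search only has to compare cells within the same bucket rather than all pairs. That reduction in comparisons is the general source of the speed-up LSH offers.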


2018 ◽  
Vol 9 (1) ◽  
Author(s):  
Aashi Jindal ◽  
Prashant Gupta ◽  
Jayadeva ◽  
Debarka Sengupta

2018 ◽  
Vol 47 (D1) ◽  
pp. D711-D715 ◽  
Author(s):  
Awais Athar ◽  
Anja Füllgrabe ◽  
Nancy George ◽  
Haider Iqbal ◽  
Laura Huerta ◽  
...  

Author(s):  
Dongshunyi Li ◽  
Jun Ding ◽  
Ziv Bar-Joseph

Abstract Motivation: Recent technological advances enable the profiling of spatial single-cell expression data. Such data present a unique opportunity to study cell–cell interactions and the signaling genes that mediate them. However, most current methods for the analysis of these data focus on unsupervised descriptive modeling, making it hard to identify key signaling genes and quantitatively assess their impact. Results: We developed a Mixture of Experts for Spatial Signaling genes Identification (MESSI) method to identify active signaling genes within and between cells. The mixture of experts strategy enables MESSI to subdivide cells into subtypes. MESSI relies on multi-task learning using information from neighboring cells to improve the prediction of response genes within a cell. Applying the method to three spatial single-cell expression datasets, we show that MESSI accurately predicts the levels of response genes, improves upon prior methods, and provides useful biological insights about key signaling genes and subtypes of excitatory neuron cells. Availability and implementation: MESSI is available at: https://github.com/doraadong/MESSI
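The mixture of experts idea mentioned in the abstract above can be illustrated with a generic sketch: a gate assigns each cell soft weights over several experts, and the prediction is the weighted sum of the experts' outputs. This is not MESSI's code. The linear experts, the softmax gate, and the names `softmax`, `linear` and `moe_predict` are assumptions for illustration, and the multi-task learning and neighboring-cell features described in the abstract are not reproduced.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Plain linear model: w . x + b."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def moe_predict(x, gate_params, expert_params):
    """Return (prediction, gate weights) for one feature vector x.

    gate_params and expert_params are lists of (weights, bias) pairs,
    one pair per expert (hypothetical parameterization).
    """
    gates = softmax([linear(w, b, x) for w, b in gate_params])
    outputs = [linear(w, b, x) for w, b in expert_params]
    return sum(g * y for g, y in zip(gates, outputs)), gates


# Two experts with an uninformative gate (all-zero gate weights),
# so each expert receives weight 0.5.
gate_params = [([0.0, 0.0], 0.0), ([0.0, 0.0], 0.0)]
expert_params = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
prediction, gates = moe_predict([2.0, 4.0], gate_params, expert_params)
```

In a trained model the gate weights would differ from cell to cell, and the expert receiving the largest weight can be read as a soft subtype assignment for that cell.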


2017 ◽  
Author(s):  
Patrick S Stumpf ◽  
Ben D MacArthur

Abstract The molecular regulatory network underlying stem cell pluripotency has been intensively studied, and we now have a reliable ensemble model for the ‘average’ pluripotent cell. However, evidence of significant cell-to-cell variability suggests that the activity of this network varies within individual stem cells, leading to differential processing of environmental signals and variability in cell fates. Here, we adapt a method originally designed for face recognition to infer regulatory network patterns within individual cells from single-cell expression data. Using this method we identify three distinct network configurations in cultured mouse embryonic stem cells – corresponding to naïve and formative pluripotent states and an early primitive endoderm state – and associate these configurations with particular combinations of regulatory network activity archetypes that govern different aspects of the cell’s response to environmental stimuli, cell cycle status and core information processing circuitry. These results show how variability in cell identities arises naturally from alterations in underlying regulatory network dynamics and demonstrate how methods from machine learning may be used to better understand single-cell biology and the collective dynamics of cell communities.


2019 ◽  
Vol 14 (3) ◽  
pp. 255-268 ◽  
Author(s):  
Wei Zhang ◽  
Wenchao Li ◽  
Jianming Zhang ◽  
Ning Wang

Background: Gene Regulatory Network (GRN) inference algorithms aim to explore causal interactions between genes and transcription factors. High-throughput transcriptomics data, including DNA microarray and single-cell expression data, contain complementary information for network inference. Objective: To enhance GRN inference, data integration across various types of expression data is an economical and efficient solution. Method: In this paper, a novel E-alpha integration rule-based ensemble inference algorithm is proposed to merge complementary information from microarray and single-cell expression data. This paper implements a Gradient Boosting Tree (GBT) inference algorithm to compute importance scores for candidate gene-gene pairs. The proposed E-alpha rule quantitatively evaluates the credibility level of each information source and determines the final ranked list. Results: Two groups of in silico gene networks are used to illustrate the effectiveness of the proposed E-alpha integration. Experimental outcomes with size50 and size100 in silico gene networks suggest that the proposed E-alpha rule significantly improves performance metrics compared with a single information source. Conclusion: In GRN inference, integrating hybrid expression data using the E-alpha rule provides a feasible and efficient way to improve performance metrics, compared with solely increasing sample sizes.


2019 ◽  
Vol 35 (20) ◽  
pp. 4011-4019 ◽  
Author(s):  
Ghislain Durif ◽  
Laurent Modolo ◽  
Jeff E Mold ◽  
Sophie Lambert-Lacroix ◽  
Franck Picard

Abstract Motivation: The development of high-throughput single-cell sequencing technologies now allows the investigation of the population diversity of cellular transcriptomes. The expression dynamics (gene-to-gene variability) can be quantified more accurately, thanks to the measurement of lowly expressed genes. In addition, the cell-to-cell variability is high, with a low proportion of cells expressing the same genes at the same time/level. Those emerging patterns appear to be very challenging from the statistical point of view, especially to represent a summarized view of single-cell expression data. Principal component analysis (PCA) is a powerful tool for high-dimensional data representation, searching for latent directions that capture the most variability in the data. Unfortunately, classical PCA is based on Euclidean distance and projections that work poorly in the presence of over-dispersed count data with dropout events, such as single-cell expression data. Results: We propose a probabilistic Count Matrix Factorization (pCMF) approach for single-cell expression data analysis that relies on a sparse Gamma-Poisson factor model. This hierarchical model is inferred using a variational EM algorithm. It is able to jointly build a low-dimensional representation of cells and genes. We show how this probabilistic framework induces a geometry that is suitable for single-cell data visualization, and produces a compression of the data that is very powerful for clustering purposes. Our method is compared against other standard representation methods like t-SNE, and we illustrate its performance for the representation of single-cell expression data. Availability and implementation: Our work is implemented in the pCMF R-package (https://github.com/gdurif/pCMF). Supplementary information: Supplementary data are available at Bioinformatics online.
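For orientation, a generic Gamma-Poisson factor model for a count matrix $X$ (cells $i$, genes $j$, $K$ latent factors) can be written as below. This is the textbook form only. The symbols $U$, $V$ and the hyperparameters $\alpha_{k,1}, \alpha_{k,2}, \beta_{k,1}, \beta_{k,2}$ are assumed notation, and the sparsity and dropout components that the abstract attributes to pCMF are not shown because the abstract does not specify them.

```latex
\begin{aligned}
U_{ik} &\sim \mathrm{Gamma}(\alpha_{k,1}, \alpha_{k,2}), \\
V_{jk} &\sim \mathrm{Gamma}(\beta_{k,1}, \beta_{k,2}), \\
X_{ij} \mid U, V &\sim \mathrm{Poisson}\Big(\sum_{k=1}^{K} U_{ik} V_{jk}\Big).
\end{aligned}
```

Under this form the rows of $U$ give a $K$-dimensional representation of cells and the rows of $V$ one of genes, which is the sense in which such a model builds both representations jointly.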


2017 ◽  
Author(s):  
Chieh Lin ◽  
Siddhartha Jain ◽  
Hannah Kim ◽  
Ziv Bar-Joseph

Abstract While only recently developed, the ability to profile expression data in single cells (scRNA-Seq) has already led to several important studies and findings. However, this technology has also raised several new computational challenges, including how to handle the noisy and sometimes incomplete data, how to identify unique groups of cells in such experiments, and how to determine the state or function of specific cells based on their expression profile. To address these issues we develop and test a method based on neural networks (NN) for the analysis and retrieval of single-cell RNA-Seq data. We tested various NN architectures, some biologically motivated, and used these to obtain a reduced-dimension representation of the single-cell expression data. We show that the NN method improves upon prior methods in both the ability to correctly group cells in experiments not used in the training and the ability to correctly infer cell type or state by querying a database of tens of thousands of single-cell profiles. Such database queries (which can be performed using our web server) will enable researchers to better characterize cells when analyzing heterogeneous scRNA-Seq samples. Supporting website: http://sb.cs.cmu.edu/scnn/. Password for accessing the retrieval task webserver: scRNA-Seq

