EpiScanpy: integrated single-cell epigenomic analysis

2019
Author(s): Anna Danese, Maria L. Richter, David S. Fischer, Fabian J. Theis, Maria Colomé-Tatché

ABSTRACT Epigenetic single-cell measurements reveal a layer of regulatory information not accessible to single-cell transcriptomics; however, single-cell omics analysis tools mainly focus on gene expression data. To address this issue, we present epiScanpy, a computational framework for the analysis of single-cell DNA methylation and single-cell ATAC-seq data. EpiScanpy makes the many existing RNA-seq workflows from scanpy available to large-scale single-cell data from other omics modalities. We introduce and compare multiple feature space constructions for epigenetic data and show the feasibility of common clustering, dimension reduction and trajectory learning techniques. We benchmark epiScanpy by interrogating different single-cell mouse brain atlases of DNA methylation, ATAC-seq and transcriptomics. We find that differentially methylated and differentially open markers between cell clusters enrich transcriptome-based cell type labels with orthogonal epigenetic information.
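To make the idea of a feature space concrete: one common construction for single-cell methylation data is to average per-CpG calls over fixed-size genomic windows. The sketch below is a minimal illustration under stated assumptions, not epiScanpy's own code; the input format (cell, chromosome, position, 0/1 call), the function name `methylation_bin_matrix` and the 100 kb default window are all hypothetical choices.

```python
from collections import defaultdict


def methylation_bin_matrix(calls, bin_size=100_000):
    """Build a cell x genomic-window matrix of mean CpG methylation.

    calls: iterable of (cell_id, chrom, position, methylated) tuples, where
    methylated is 1 or 0 for one CpG read-out (hypothetical input format).
    Returns (cells, bins, matrix); matrix[i][j] is the mean methylation of
    cell i in window j, or None if the cell has no coverage there.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    cells, bins = set(), set()
    for cell, chrom, pos, meth in calls:
        key = (chrom, pos // bin_size)  # window index on this chromosome
        sums[(cell, key)] += meth
        counts[(cell, key)] += 1
        cells.add(cell)
        bins.add(key)
    cells, bins = sorted(cells), sorted(bins)
    matrix = [
        [sums[(c, b)] / counts[(c, b)] if counts[(c, b)] else None for b in bins]
        for c in cells
    ]
    return cells, bins, matrix


calls = [("c1", "chr1", 10, 1), ("c1", "chr1", 20, 0), ("c2", "chr1", 150_000, 1)]
cells, bins, m = methylation_bin_matrix(calls)
```

Windows in which a cell has no reads are left as `None` rather than 0, since with sparse coverage "missing" is not the same as "unmethylated"; how such gaps are imputed is a separate design choice.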

2021, Vol 12 (1)
Author(s): Anna Danese, Maria L. Richter, Kridsadakorn Chaichoompu, David S. Fischer, Fabian J. Theis, ...

Abstract EpiScanpy is a toolkit for the analysis of single-cell epigenomic data, namely single-cell DNA methylation and single-cell ATAC-seq data. To address the modality-specific challenges of epigenomic data, epiScanpy quantifies the epigenome using multiple feature space constructions and builds a nearest neighbour graph using the epigenomic distance between cells. EpiScanpy makes the many existing scRNA-seq workflows from scanpy available to large-scale single-cell data from other omics modalities, including common clustering, dimension reduction, cell type identification and trajectory learning methods, as well as an atlas integration tool for scATAC-seq datasets. The toolkit also provides numerous downstream functions, such as calling differential methylation and differential openness, mapping epigenomic features of interest to their nearest gene, and constructing gene activity matrices from chromatin openness. We benchmark epiScanpy against other scATAC-seq analysis tools and show that it outperforms them at discriminating cell types.
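A nearest neighbour graph of this kind can be sketched with a set-based distance on binarized accessibility. Below, each cell is represented as a set of open peak IDs and linked to its k closest cells by Jaccard distance. This is an illustrative assumption: the abstract does not say which epigenomic distance epiScanpy uses, and `knn_graph` and its input format are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two sets of open peaks: 1 - |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_graph(cells, k=2):
    """Return {cell: [k nearest cells]} using Jaccard distance on open peaks.

    cells: dict mapping cell id -> set of accessible peak ids (hypothetical
    input). Ties are broken by cell id so the result is deterministic.
    """
    graph = {}
    for c, peaks in cells.items():
        dists = sorted(
            (jaccard_distance(peaks, other_peaks), o)
            for o, other_peaks in cells.items()
            if o != c
        )
        graph[c] = [o for _, o in dists[:k]]
    return graph


cells = {"a": {1, 2, 3}, "b": {1, 2, 4}, "c": {7, 8}, "d": {7, 8, 9}}
graph = knn_graph(cells, k=1)
```

Jaccard distance ignores peaks that are closed in both cells, which suits sparse binary scATAC-seq data, where shared absence carries little information.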


2018
Author(s): Xianwen Ren, Liangtao Zheng, Zemin Zhang

ABSTRACT Clustering is a prevalent means of analyzing single-cell RNA sequencing data, but the rapidly expanding data volume can make this process computationally challenging. Clustering methods that are both accurate and efficient are urgently needed. Here we propose a new clustering framework for large-scale single-cell RNA sequencing data based on random projection and feature construction, which greatly improves clustering accuracy, robustness and computational efficiency for various state-of-the-art algorithms benchmarked on multiple real datasets. On a dataset of 68,578 human blood cells, our method achieved a 20% improvement in clustering accuracy and a 50-fold speed-up while using only 66% of the memory required by the widely used software package SC3. Compared to k-means, the accuracy improvement can reach 3-fold, depending on the dataset. An R implementation of the framework is available from https://github.com/Japrin/sscClust.
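The random-projection step can be illustrated with a short sketch. It projects cells onto a random Gaussian subspace, which approximately preserves pairwise distances (the Johnson-Lindenstrauss idea), and then clusters in that smaller space. sscClust itself is an R package and its feature-construction step is not reproduced here; plain k-means with farthest-point initialization is swapped in as the downstream algorithm, and all function names are hypothetical.

```python
import numpy as np


def random_projection(X, n_components, seed=0):
    """Project cells (rows of X) onto a random Gaussian subspace.

    Scaling by 1/sqrt(n_components) approximately preserves squared
    pairwise distances on average.
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(size=(X.shape[1], n_components)) / np.sqrt(n_components)
    return X @ R


def kmeans(Z, k, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialization."""
    rng = np.random.default_rng(seed)
    centroids = [Z[rng.integers(len(Z))]]
    while len(centroids) < k:
        # next centroid: the cell farthest from all centroids chosen so far
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centroids], axis=0)
        centroids.append(Z[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign each cell to its nearest centroid
        d = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute centroids; keep the old one if a cluster empties
        new = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 500)), rng.normal(5, 1, (20, 500))])
labels = kmeans(random_projection(X, 50), k=2)
```

Clustering in 50 instead of 500 dimensions shrinks the cost of every distance computation tenfold here, which is the kind of saving the framework exploits at scale.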


2022, Vol 23 (1)
Author(s): Huijian Feng, Lihui Lin, Jiekai Chen

Abstract
Background: Single-cell RNA sequencing is becoming a powerful tool to identify cell states, reconstruct developmental trajectories, and deconvolute spatial expression. The rapid development of computational methods is deepening insight into heterogeneous single-cell data. A growing number of tools are available to biological analysts, most of them written in one of two programming languages widely used among researchers: R and Python. The two are complementary, as many methods are implemented in only one of them. However, this split immediately creates problems for data sharing and conversion, especially between Scanpy, Seurat, and SingleCellExperiment. Currently, there is no efficient and user-friendly software to convert single-cell omics data between platforms, so users spend excessive time on data input and output (IO), which significantly reduces the efficiency of data analysis.
Results: We developed scDIOR for single-cell data transformation between R and Python based on Hierarchical Data Format version 5 (HDF5). We have created a data IO ecosystem between three R packages (Seurat, SingleCellExperiment, Monocle) and a Python package (Scanpy). Importantly, scDIOR accommodates a variety of data types across programming languages and platforms at ultrafast speed, including single-cell RNA-seq and spatially resolved transcriptomics data, using only a few lines of code in an IDE or a command-line interface. For large-scale datasets, users can partially load only the information they need, e.g., cell annotations without the gene expression matrices. scDIOR connects the analytical tasks of different platforms, which makes it easy to compare the performance of algorithms between them.
Conclusions: scDIOR contains two modules, dior in R and diopy in Python. scDIOR is a versatile and user-friendly tool that implements single-cell data transformation between R and Python rapidly and stably.
The software is freely accessible at https://github.com/JiekaiLab/scDIOR.


2018
Author(s): Giovanni Iacono, Ramon Massoni-Badosa, Holger Heyn

SUMMARY Single-cell RNA sequencing (scRNA-seq) plays a pivotal role in our understanding of cellular heterogeneity. Current analytical workflows are driven by categorizing principles that consider cells as individual entities and classify them into complex taxonomies. We have devised a conceptually different computational framework based on a holistic view, where single-cell datasets are used to infer global, large-scale regulatory networks. We developed correlation metrics that are specifically tailored to single-cell data, and then generated, validated and interpreted single-cell-derived regulatory networks from organs and perturbed systems, such as diabetes and Alzheimer’s disease. Using advanced tools from graph theory, we computed an unbiased quantification of a gene’s biological relevance, and accurately pinpointed key players in organ function and drivers of diseases. Our approach detected multiple latent regulatory changes that are invisible to single-cell workflows based on clustering or differential expression analysis. In summary, we have established the feasibility and value of regulatory network analysis using scRNA-seq datasets, which significantly broadens the biological insights that can be obtained with this leading technology.
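A minimal sketch of the network idea: link genes whose expression is correlated across cells, then rank genes by a graph centrality. The abstract does not specify the tailored correlation metrics or which graph-theoretic measures were used, so plain Pearson correlation, a hypothetical 0.5 threshold and degree centrality are substituted here for illustration.

```python
import numpy as np


def coexpression_network(expr, threshold=0.5):
    """Adjacency matrix linking genes whose expression is correlated across cells.

    expr: genes x cells matrix. Plain Pearson correlation is a stand-in for
    single-cell-tailored correlation metrics; the threshold is hypothetical.
    """
    corr = np.corrcoef(expr)
    adj = (np.abs(corr) >= threshold).astype(int)
    np.fill_diagonal(adj, 0)  # no self-loops
    return adj


def degree_centrality(adj):
    """Fraction of the other genes that each gene is connected to."""
    n = adj.shape[0]
    return adj.sum(axis=1) / (n - 1)


rng = np.random.default_rng(0)
base = rng.normal(size=100)
expr = np.vstack([
    base,                                 # gene 0
    base + 0.1 * rng.normal(size=100),    # gene 1: co-expressed with gene 0
    -base + 0.1 * rng.normal(size=100),   # gene 2: anti-correlated with gene 0
    rng.normal(size=100),                 # gene 3: independent
])
centrality = degree_centrality(coexpression_network(expr))
```

Using the absolute correlation keeps anti-correlated (possibly repressive) links in the network; richer measures such as betweenness or PageRank would be drop-in replacements for the centrality step.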


2020, Vol 22 (Supplement_3), pp. iii311-iii312
Author(s): Bernhard Englinger, Johannes Gojo, Li Jiang, Jens M Hübner, McKenzie L Shaw, ...

Abstract Ependymoma is a heterogeneous disease affecting the entire neuraxis. Extensive molecular profiling efforts have identified molecular ependymoma subgroups based on DNA methylation. However, the intratumoral heterogeneity and developmental origins of these groups are only partially understood, and effective treatments are still lacking for about 50% of patients with high-risk tumors. We interrogated the cellular architecture of ependymoma using single-cell/single-nucleus RNA sequencing to analyze 24 tumor specimens across major molecular subgroups and anatomic locations. We additionally analyzed ten patient-derived ependymoma cell models and two patient-derived xenografts (PDXs). Interestingly, we identified an analogous cellular hierarchy across all ependymoma groups, originating from undifferentiated neural stem cell-like populations and progressing towards impaired differentiation states of varying degree, comprising neuronal precursor-like, astroglial-like, and ependymal-like tumor cells. While prognostically favorable ependymoma groups predominantly harbored differentiated cell populations, aggressive groups were enriched for undifferentiated subpopulations. Projection of transcriptomic signatures onto an independent bulk RNA-seq cohort stratified patient survival even within known molecular groups, thus refining the prognostic power of DNA methylation-based profiling. Furthermore, we identified novel, potentially druggable targets, including IGF and FGF signaling, within poorly prognostic transcriptional programs. Ependymoma-derived cell models and PDXs widely recapitulated the transcriptional programs identified within fresh tumors and were leveraged to validate identified target genes in functional follow-up analyses. Taken together, our analyses reveal a developmental hierarchy and transcriptomic context underlying the biologically and clinically distinct behavior of ependymoma groups.
The newly characterized cellular states and underlying regulatory networks could serve as a basis for future therapeutic target identification and reveal biomarkers for clinical trials.


2021, Vol 13 (1)
Author(s): Chayaporn Suphavilai, Shumei Chia, Ankur Sharma, Lorna Tu, Rafael Peres Da Silva, ...

Abstract While understanding molecular heterogeneity across patients underpins precision oncology, there is increasing appreciation for taking intra-tumor heterogeneity into account. Based on large-scale analysis of cancer omics datasets, we highlight the importance of intra-tumor transcriptomic heterogeneity (ITTH) for predicting clinical outcomes. Leveraging single-cell RNA-seq (scRNA-seq) with a recommender system (CaDRReS-Sc), we show that heterogeneous gene-expression signatures can predict drug response with high accuracy (80%). Using patient-proximal cell lines, we established the validity of CaDRReS-Sc’s monotherapy predictions (Pearson r > 0.6) and of its combinatorial predictions targeting clone-specific vulnerabilities (>10% improvement). Applying CaDRReS-Sc to rapidly expanding scRNA-seq compendiums can serve as an in silico screen to accelerate drug-repurposing studies. Availability: https://github.com/CSB5/CaDRReS-Sc.
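The abstract does not describe CaDRReS-Sc’s model, so the sketch below swaps in a simpler recommender: a neighborhood-based (memory-based collaborative filtering) predictor that scores a cell cluster by the measured responses of its most correlated reference cell lines. It also assumes, as one plausible way of using heterogeneous signatures, that the tumor-level prediction is the average of per-cluster predictions weighted by cluster size. All names are hypothetical.

```python
import numpy as np


def predict_response(profile, ref_profiles, ref_responses, k=2):
    """Neighborhood-based drug-response prediction for one expression profile.

    profile: (genes,) expression of one cell cluster.
    ref_profiles: (lines, genes) expression of reference cell lines.
    ref_responses: (lines, drugs) measured responses of those lines.
    The k most positively correlated lines vote, weighted by correlation.
    """
    sims = np.array([np.corrcoef(profile, ref)[0, 1] for ref in ref_profiles])
    top = np.argsort(sims)[::-1][:k]
    top = top[sims[top] > 0]  # keep only positively correlated neighbors
    if top.size == 0:
        return ref_responses.mean(axis=0)  # fall back to the population mean
    w = sims[top]
    return (w[:, None] * ref_responses[top]).sum(axis=0) / w.sum()


def tumor_response(cluster_profiles, cluster_fractions, ref_profiles, ref_responses, k=2):
    """Combine per-cluster predictions, weighted by each cluster's share of cells."""
    fractions = np.asarray(cluster_fractions, dtype=float)
    fractions = fractions / fractions.sum()
    preds = np.array([
        predict_response(p, ref_profiles, ref_responses, k) for p in cluster_profiles
    ])
    return fractions @ preds


ref_profiles = np.array([[1, 2, 3, 4], [4, 3, 2, 1]], dtype=float)
ref_responses = np.array([[0.9, 0.1], [0.1, 0.9]])  # rows: lines, cols: drugs
clusters = np.array([[1, 2, 3, 5], [5, 3, 2, 1]], dtype=float)
overall = tumor_response(clusters, [0.75, 0.25], ref_profiles, ref_responses, k=1)
```

Weighting by cluster fraction means a drug that only hits a minor clone scores low overall, which is the situation where combinations targeting clone-specific vulnerabilities become interesting.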


2019
Author(s): Ning Wang, Andrew E. Teschendorff

Abstract Inferring the activity of transcription factors in single cells is a key task in improving our understanding of development and complex genetic diseases. This task is, however, challenging due to the relatively high dropout rate and noisy nature of single-cell RNA-seq data. Here we present a novel statistical inference framework called SCIRA (Single Cell Inference of Regulatory Activity), which leverages the power of large-scale bulk RNA-seq datasets to infer high-quality tissue-specific regulatory networks, from which regulatory activity estimates in single cells can subsequently be obtained. We show that SCIRA can correctly infer the regulatory activity of transcription factors affected by high technical dropout. In particular, SCIRA can improve sensitivity by as much as 70% compared to differential expression analysis and current state-of-the-art methods. Importantly, SCIRA can reveal novel regulators of cell fate in tissue development, even for cell types that make up only 5% of the tissue, and can identify key novel tumor suppressor genes in cancer at single-cell resolution. In summary, SCIRA will be an invaluable tool for single-cell studies aiming to accurately map activity patterns of key transcription factors during development, and how these are altered in disease.
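How SCIRA converts a regulatory network into per-cell activity is not spelled out in the abstract. One common regulon-scoring choice, assumed here, is to regress a cell’s normalized target-gene expression on the signs of the TF-target interactions (+1 activated, -1 repressed) and use the t-statistic of the slope as the activity estimate. The function name and inputs are hypothetical.

```python
import numpy as np


def regulon_activity(expr, signs):
    """t-statistic of regressing a cell's target-gene expression on regulon signs.

    expr: (targets,) normalized, gene-centered expression of one cell.
    signs: (targets,) +1 for activated targets, -1 for repressed targets.
    Needs at least one target of each sign, otherwise the slope is undefined.
    A large positive t suggests the TF is active in this cell.
    """
    x = np.asarray(signs, dtype=float)
    y = np.asarray(expr, dtype=float)
    n = len(x)
    xc = x - x.mean()
    slope = (xc * (y - y.mean())).sum() / (xc ** 2).sum()
    intercept = y.mean() - slope * x.mean()
    resid = y - (intercept + slope * x)
    se = np.sqrt((resid ** 2).sum() / (n - 2) / (xc ** 2).sum())
    return slope / se


signs = [1, 1, 1, -1, -1, -1]
expr = [2.0, 1.5, 2.5, -1.0, -2.0, -1.5]
t = regulon_activity(expr, signs)  # about 8.57: targets follow the expected pattern
```

Because the score pools evidence over all targets of a regulon, dropout in any single target (or in the TF itself) moves it only modestly, which is the general intuition behind regulon-based activity estimates.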


Author(s): A. Nivaggioli, J. F. Hullo, G. Thibault

Abstract. Industrial companies often require complete inventories of their infrastructure. In many cases, a better inventory directly reduces engineering cost and uncertainty. While large-scale panoramic surveys now allow these inventories to be performed remotely and reduce time on site, the time and money required to visually segment the many types of components on thousands of high-resolution panoramas can make the process infeasible. Recent studies have shown that deep learning techniques, namely deep neural networks, can accurately perform panoptic segmentation of "things" and "stuff" and can hence be used to inventory the components of a picture. To train these deep architectures on specific industrial equipment not available in public datasets, our approach uses an as-built 3D model of an industrial building to procedurally generate labels. Our results show that, despite errors during the generation of the dataset, our method is able to accurately perform panoptic segmentation on images of industrial scenes. In our testing, 80% of the generated labels were correctly identified (non-null intersection over union, i.e. true positives) by the panoptic segmentation, with strong performance even for difficult classes such as reflective heat insulators. We then visually inspected the remaining 20% (false negatives) and found that 80% of them were in fact correctly segmented but counted as false negatives because of errors in the dataset generation. Demonstrating this level of accuracy for panoptic segmentation on industrial panoramas for inventories also opens new perspectives for 3D laser scan processing.
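The matching criterion described above (a generated label counts as identified if some predicted segment overlaps it with non-null IoU) can be written in a few lines. This sketch works on boolean pixel masks and, as a simplifying assumption, ignores class labels, which a full panoptic evaluation would also compare; the function names are hypothetical.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boolean masks of the same shape."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def fraction_identified(label_masks, predicted_masks):
    """Share of generated labels that overlap some prediction with non-null IoU."""
    hits = sum(
        any(iou(lab, pred) > 0 for pred in predicted_masks) for lab in label_masks
    )
    return hits / len(label_masks)


a = np.zeros((4, 4), dtype=bool); a[0:2, 0:2] = True  # generated label
b = np.zeros((4, 4), dtype=bool); b[1:3, 1:3] = True  # prediction, overlaps a by 1 px
c = np.zeros((4, 4), dtype=bool); c[3, 3] = True      # generated label, never predicted
print(iou(a, b))                       # 1 shared pixel out of 7 in the union
print(fraction_identified([a, c], [b]))
```

A non-null IoU threshold is very permissive (one shared pixel suffices); stricter evaluations typically require IoU above 0.5 so that each label can match at most one prediction.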

