cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices

2021 ◽  
Author(s):  
Colin Megill ◽  
Bruce Martin ◽  
Charlotte Weaver ◽  
Sidney Bell ◽  
Lia Prins ◽  
...  

Quickly and flexibly exploring high-dimensional datasets, such as scRNAseq data, is underserved but critical for hypothesis generation, dataset annotation, publication, sharing, and community reuse. cellxgene is a highly generalizable, web-based interface for exploring high dimensional datasets along categorical, continuous and spatial dimensions, as well as feature annotation. cellxgene is differentiated by its ability to performantly handle millions of observations, and bridges a critical gap by enabling computational and experimental biologists to iteratively ask questions of private and public datasets. In doing so, cellxgene increases the utility and reusability of datasets across the single-cell ecosystem. The codebase can be accessed at https://github.com/chanzuckerberg/cellxgene. For questions and inquiries, please contact [email protected].

2017 ◽  
Author(s):  
Christine P’ng ◽  
Jeffrey Green ◽  
Lauren C. Chong ◽  
Daryl Waggott ◽  
Stephenie D. Prokopec ◽  
...  

Abstract: We introduce BPG, an easy-to-use framework for generating publication-quality, highly-customizable plots in the R statistical environment. This open-source package includes novel methods of displaying high-dimensional datasets and facilitates generation of complex multi-panel figures, making it ideal for complex datasets. A web-based interactive tool allows online figure customization, from which R code can be downloaded for seamless integration with computational pipelines. BPG is available at http://labs.oicr.on.ca/boutros-lab/software/bpg


2016 ◽  
Author(s):  
Robert Verity ◽  
Caitlin Collins ◽  
Daren C. Card ◽  
Sara M. Schaal ◽  
Liuyang Wang ◽  
...  

Abstract: Genome scans are widely used to identify “outliers” in genomic data: loci with different patterns compared with the rest of the genome due to the action of selection or other non-adaptive forces of evolution. These genomic datasets are often high-dimensional, with complex correlation structures among variables, making it a challenge to identify outliers in a robust way. The Mahalanobis distance has been widely used for this purpose, but has the major limitation of assuming that data follow a simple parametric distribution. Here we develop three new metrics that can be used to identify outliers in multivariate space, while making no strong assumptions about the distribution of the data. These metrics are implemented in the R package MINOTAUR, which also includes an interactive web-based application for visualizing outliers in high-dimensional datasets. We illustrate how these metrics can be used to identify outliers from simulated genetic data, and discuss some of the limitations they may face in application.
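For readers unfamiliar with the baseline the abstract mentions, the following is a minimal sketch of Mahalanobis-distance outlier scoring. It does not implement MINOTAUR's three new metrics, and the toy data and helper names (`mean_vector`, `covariance_matrix`, `solve`, `mahalanobis`) are hypothetical, chosen only for illustration.

```python
from math import sqrt


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows, mu):
    """Sample covariance matrix (divides by n - 1)."""
    n, d = len(rows), len(mu)
    cov = [[0.0] * d for _ in range(d)]
    for r in rows:
        dev = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += dev[i] * dev[j] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(d):
        pivot = max(range(col, d), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, d):
            f = m[r][col] / m[col][col]
            for c in range(col, d + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * d
    for i in reversed(range(d)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, d))
        x[i] = (m[i][d] - s) / m[i][i]
    return x


def mahalanobis(point, mu, cov):
    """sqrt((x - mu)^T cov^{-1} (x - mu)), without forming the inverse."""
    dev = [x - m for x, m in zip(point, mu)]
    z = solve(cov, dev)  # z = cov^{-1} (x - mu)
    return sqrt(sum(a * b for a, b in zip(dev, z)))


# Toy data (assumed): two correlated variables plus one point off the trend.
loci = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 8.1], [5.0, 9.8], [3.0, 1.0]]
mu = mean_vector(loci)
cov = covariance_matrix(loci, mu)
distances = [mahalanobis(p, mu, cov) for p in loci]
outlier_index = max(range(len(loci)), key=distances.__getitem__)
```

Because the mean and covariance are estimated from the same data, including the outlier, the distance implicitly treats the data as one elliptical cloud. This is the parametric assumption the abstract describes as a limitation.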


Author(s):  
Thomas Myles Ashhurst ◽  
Felix Marsh-Wakefield ◽  
Givanna Haryono Putri ◽  
Alanna Gabrielle Spiteri ◽  
Diana Shinko ◽  
...  

Abstract: As the size and complexity of high-dimensional cytometry data continue to expand, comprehensive, scalable, and methodical computational analysis approaches are essential. Yet, contemporary clustering and dimensionality reduction tools alone are insufficient to analyze or reproduce analyses across large numbers of samples, batches, or experiments. Moreover, approaches that allow for the integration of data across batches or experiments are not well incorporated into computational toolkits to allow for streamlined workflows. Here we present Spectre, an R package that enables comprehensive end-to-end integration and analysis of high-dimensional cytometry data from different batches or experiments. Spectre streamlines the analytical stages of raw data pre-processing, batch alignment, data integration, clustering, dimensionality reduction, visualization and population labelling, as well as quantitative and statistical analysis. Critically, the fundamental data structures used within Spectre, along with the implementation of machine learning classifiers, allow for the scalable analysis of very large high-dimensional datasets, generated by flow cytometry, mass cytometry (CyTOF), or spectral cytometry. Using open and flexible data structures, Spectre can also be used to analyze data generated by single-cell RNA sequencing (scRNAseq) or high-dimensional imaging technologies, such as Imaging Mass Cytometry (IMC). The simple, clear, and modular design of analysis workflows allows these tools to be used by bioinformaticians and laboratory scientists alike. Spectre is available as an R package or Docker container. R code is available on GitHub (https://github.com/immunedynamics/spectre).


2020 ◽  
Vol 8 (Suppl 3) ◽  
pp. A822-A822
Author(s):  
Sri Krishna ◽  
Frank Lowery ◽  
Amy Copeland ◽  
Stephanie Goff ◽  
Grégoire Altan-Bonnet ◽  
...  

Background: Adoptive T cell therapy (ACT) utilizing ex vivo-expanded autologous tumor infiltrating lymphocytes (TILs) can result in complete regression of human cancers [1]. Successful immunotherapy is influenced by several tumor-intrinsic factors [2, 3]. Recently, T cell-intrinsic factors have been associated with immunotherapy response in murine and human studies [4, 5]. Analyses of tumor-reactive TILs have concluded that anti-tumor neoantigen-specific TILs are enriched in subsets defined by the expression of PD-1 or CD39 [6, 7]. Thus, there is a lack of consensus regarding the tumor-reactive TIL subset that is directly responsible for successful immunotherapies such as ICB and ACT. In this study, we attempted to define the fitness landscape of TIL-enriched infusion products to specifically understand its phenotypic impact on human immunotherapy responses.
Methods: We compared the phenotypic differences that could distinguish bulk ACT infusion products (I.P.) administered to patients who had complete response to therapy (complete responders, CRs, N = 24) from those whose disease progressed following ACT (non-responders, NRs, N = 30) by high dimensional single cell protein and RNA analysis of the I.P. We further analyzed the phenotypic states of anti-tumor neoantigen specific TILs from patient I.P. (N = 26) by flow cytometry and single cell transcriptomics.
Results: We identified two CD8+ TIL populations associated with clinical outcomes: a memory-progenitor CD39-negative stem-like TIL (CD39-CD69-) in the I.P. associated with complete cancer regression (overall survival, P < 0.0001, HR = 0.217, 95% CI 0.101 to 0.463) and TIL persistence, and a terminally differentiated CD39-positive TIL (CD39+CD69+) population associated with poor TIL persistence post-treatment. Although the majority (>65%) of neoantigen-reactive TILs in both responders and non-responders to ACT were found in the differentiated CD39+ state, CR infusion products also contained a pool of CD39- stem-like neoantigen-specific TILs (median = 8.8%) that was lacking in NR infusion products (median = 23.6%, P = 1.86 × 10^-5). Tumor-reactive stem-like T cells were capable of self-renewal, expansion, and persistence, and mediated superior anti-tumor response in vivo.
Conclusions: Our results support the hypothesis that responders to ACT received infusion products containing a pool of stem-like neoantigen-specific TILs that are able to undergo prolific expansion, give rise to differentiated subsets, and mediate long-term tumor control and T cell persistence, in line with recent murine ICB studies mediated by TCF+ progenitor T cells [4, 5]. Our data also suggest that TIL subsets mediating ACT-response (stem-like CD39-) might be distinct from TIL subsets enriched for anti-tumor-reactivity (terminally differentiated CD39+) in human TIL [6, 7].
Acknowledgements: We thank Don White for curating the melanoma patient cohort, and J. Panopoulos (Flowjo) for helpful discussions on high-dimensional analysis, and NCI Surgery Branch members for helpful insights and suggestions. S. Krishna acknowledges funding support from NCI Director’s Innovation Award from the National Cancer Institute.
Trial Registration: NA
Ethics Approval: The study was approved by NCI’s IRB ethics board.
References:
1. Goff SL, et al. Randomized, prospective evaluation comparing intensity of lymphodepletion before adoptive transfer of tumor-infiltrating lymphocytes for patients with metastatic melanoma. J Clin Oncol 2016;34:2389–2397.
2. Snyder A, et al. Genetic basis for clinical response to CTLA-4 blockade in melanoma. N Engl J Med 2014;371:2189–2199.
3. McGranahan N, et al. Clonal neoantigens elicit T cell immunoreactivity and sensitivity to immune checkpoint blockade. Science 2016;351:1463–1469.
4. Sade-Feldman M, et al. Defining T cell states associated with response to checkpoint immunotherapy in melanoma. Cell 2019;176:404.
5. Miller BC, et al. Subsets of exhausted CD8 T cells differentially mediate tumor control and respond to checkpoint blockade. Nat Immunol 2019;20:326–336.
6. Simoni Y, et al. Bystander CD8 T cells are abundant and phenotypically distinct in human tumour infiltrates. Nature 2018;557:575–579.
7. Gros A, et al. PD-1 identifies the patient-specific CD8+ tumor-reactive repertoire infiltrating human tumors. J Clin Invest 2014;124:2246–2259.


2021 ◽  
Vol 18 (1) ◽  
Author(s):  
Christos Nikolaou ◽  
Kerstin Muehle ◽  
Stephan Schlickeiser ◽  
Alberto Sada Japp ◽  
Nadine Matzmohr ◽  
...  

An amendment to this paper has been published and can be accessed via the original article.


Sensors ◽  
2021 ◽  
Vol 21 (9) ◽  
pp. 3240
Author(s):  
Tehreem Syed ◽  
Vijay Kakani ◽  
Xuenan Cui ◽  
Hakil Kim

In recent times, the use of modern neuromorphic hardware for brain-inspired spiking neural networks (SNNs) has grown exponentially. With sparse input data, SNNs offer low power consumption on event-based neuromorphic hardware, specifically in the deeper layers. However, using deep artificial neural networks (ANNs) to train spiking models is still considered a tedious task. Various ANN-to-SNN conversion methods have been proposed in the literature to train deep SNN models. Nevertheless, these methods require hundreds to thousands of time-steps for training and still cannot attain good SNN performance. This work proposes customized model architectures (VGG, ResNet) to train deep convolutional spiking neural networks. Training is carried out with surrogate gradient descent backpropagation in a customized layer architecture similar to deep artificial neural networks. Moreover, this work proposes using fewer time-steps for training SNNs with surrogate gradient descent. Overfitting problems were encountered during training with surrogate gradient descent backpropagation. To overcome them, this work refines the SNN-based dropout technique with surrogate gradient descent. The proposed customized SNN models achieve good classification results on both private and public datasets. Several experiments were carried out on an embedded platform (NVIDIA JETSON TX2 board), on which the customized SNN models were deployed. Performance was validated in terms of processing time and inference accuracy between PC and embedded platforms, showing that the proposed customized models and training techniques are feasible for achieving better performance on various datasets such as CIFAR-10, MNIST, SVHN, and the private KITTI and Korean license plate datasets.
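The abstract does not say which surrogate function or neuron model the authors use, so the following is only a minimal sketch of the general idea behind surrogate gradient training. The forward pass uses a hard threshold, whose true derivative is zero almost everywhere, and the backward pass substitutes a smooth stand-in. The leaky integrate-and-fire dynamics, the soft reset, the fast-sigmoid surrogate, and all parameter values (`beta`, `threshold`, `slope`) are assumptions made for illustration.

```python
def spike(v, threshold=1.0):
    """Forward pass: hard threshold (Heaviside step) on membrane potential."""
    return 1.0 if v >= threshold else 0.0


def surrogate_grad(v, threshold=1.0, slope=10.0):
    """Backward pass stand-in: derivative of a fast sigmoid.

    Used in place of the step function's derivative, which is zero
    everywhere except at the threshold.
    """
    return 1.0 / (1.0 + slope * abs(v - threshold)) ** 2


def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron unrolled over time-steps.

    Returns the spike train and the pre-reset membrane potentials,
    which are the values a backward pass would feed to surrogate_grad.
    """
    v = 0.0
    spikes, potentials = [], []
    for x in inputs:
        v = beta * v + x          # leak, then integrate the input current
        potentials.append(v)
        s = spike(v, threshold)
        spikes.append(s)
        v -= s * threshold        # soft reset after a spike
    return spikes, potentials


spikes, potentials = lif_forward([0.6, 0.6, 0.0, 0.6])
grads = [surrogate_grad(v) for v in potentials]
```

The surrogate is largest when the membrane potential is at the threshold and decays on either side, so time-steps where the neuron nearly fired still contribute a gradient signal.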


2020 ◽  
Vol 36 (Supplement_2) ◽  
pp. i610-i617
Author(s):  
Mohammad Lotfollahi ◽  
Mohsen Naghipourfar ◽  
Fabian J Theis ◽  
F Alexander Wolf

Abstract
Motivation: While generative models have shown great success in sampling high-dimensional samples conditional on low-dimensional descriptors (stroke thickness in MNIST, hair color in CelebA, speaker identity in WaveNet), their generation out-of-distribution poses fundamental problems due to the difficulty of learning a compact joint distribution across conditions. The canonical example of the conditional variational autoencoder (CVAE), for instance, does not explicitly relate conditions during training and, hence, has no explicit incentive of learning such a compact representation.
Results: We overcome the limitation of the CVAE by matching distributions across conditions using maximum mean discrepancy in the decoder layer that follows the bottleneck. This introduces a strong regularization both for reconstructing samples within the same condition and for transforming samples across conditions, resulting in much improved generalization. As this amounts to solving a style-transfer problem, we refer to the model as transfer VAE (trVAE). Benchmarking trVAE on high-dimensional image and single-cell RNA-seq, we demonstrate higher robustness and higher accuracy than existing approaches. We also show qualitatively improved predictions by tackling previously problematic minority classes and multiple conditions in the context of cellular perturbation response to treatment and disease based on high-dimensional single-cell gene expression data. For generic tasks, we improve Pearson correlations of high-dimensional estimated means and variances with their ground truths from 0.89 to 0.97 and 0.75 to 0.87, respectively. We further demonstrate that trVAE learns cell-type-specific responses after perturbation and improves the prediction of most cell-type-specific genes by 65%.
Availability and implementation: The trVAE implementation is available via github.com/theislab/trvae. The results of this article can be reproduced via github.com/theislab/trvae_reproducibility.
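To make the distribution-matching term concrete, here is a minimal sketch of a squared maximum mean discrepancy (MMD) estimate between two sets of vectors. It is not the trVAE implementation: the RBF kernel, the `gamma` value, the biased estimator, and the toy "activation" vectors are all assumptions made for illustration. In trVAE the compared vectors would be decoder-layer activations for samples from different conditions, and the term would be added to the training loss.

```python
from math import exp


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two vectors."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs (x, y)."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


# Toy vectors (assumed) standing in for activations under two conditions.
control = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2]]
treated_near = [[0.1, 0.1], [0.3, 0.0], [0.2, 0.2]]
treated_far = [[2.0, 2.1], [2.2, 2.0], [2.1, 2.2]]

mmd_same = mmd_squared(control, control)
mmd_near = mmd_squared(control, treated_near)
mmd_far = mmd_squared(control, treated_far)
```

The estimate is zero when the two sets are identical and grows as the sets move apart, which is why minimizing it pulls the representations of different conditions toward a shared distribution.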


2021 ◽  
Vol 22 (1) ◽  
Author(s):  
Van Hoan Do ◽  
Stefan Canzar

Abstract: Emerging single-cell technologies profile multiple types of molecules within individual cells. A fundamental step in the analysis of the produced high-dimensional data is their visualization using dimensionality reduction techniques such as t-SNE and UMAP. We introduce j-SNE and j-UMAP as their natural generalizations to the joint visualization of multimodal omics data. Our approach automatically learns the relative contribution of each modality to a concise representation of cellular identity that promotes discriminative features but suppresses noise. On eight datasets, j-SNE and j-UMAP produce unified embeddings that better agree with known cell types and that harmonize RNA and protein velocity landscapes.

