Immunology Driven by Large-Scale Single-Cell Sequencing

The infinite sites assumption, which states that every genomic position mutates at most once over the lifetime of a tumor, is central to current approaches for reconstructing mutation histories of tumors, but has never been tested explicitly. We developed a rigorous statistical framework to test the assumption with single-cell sequencing data. The framework accounts for the high noise and contamination present in such data. We found strong evidence for recurrent mutations at the same site in 8 out of 9 single-cell sequencing datasets from human tumors. Six cases involved the loss of earlier mutations, five of which occurred at sites unaffected by large-scale genomic deletions. Two cases exhibited parallel mutation, including the dataset with the strongest evidence of recurrence. Our results refute the general validity of the infinite sites assumption and indicate that more complex models are needed to adequately quantify intra-tumor heterogeneity.
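To make the assumption concrete, here is a minimal sketch of the classical four-gamete test, which is not the statistical framework described in the abstract. On a noise-free binary genotype matrix, two sites that show all four combinations (00, 01, 10, 11) across cells cannot both have mutated exactly once on a single tree. The function name and the toy matrices are hypothetical, and missing calls are assumed to be encoded as `None`.

```python
from itertools import combinations


def four_gamete_conflicts(genotypes):
    """Return pairs of site indices for which all four gametes are observed.

    genotypes: list of rows (cells), each a list of 0, 1 or None (missing call).
    """
    if not genotypes:
        return []
    n_sites = len(genotypes[0])
    conflicts = []
    for i, j in combinations(range(n_sites), 2):
        # Collect the distinct (site i, site j) states, skipping missing calls.
        observed = {
            (cell[i], cell[j])
            for cell in genotypes
            if cell[i] in (0, 1) and cell[j] in (0, 1)
        }
        if len(observed) == 4:
            conflicts.append((i, j))
    return conflicts


# Toy data (hypothetical): rows are cells, columns are sites.
compatible = [
    [1, 0, 0],
    [1, 1, 0],
    [1, 1, None],
    [0, 0, 1],
]
conflicting = [
    [0, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
]

print(four_gamete_conflicts(compatible))
print(four_gamete_conflicts(conflicting))
```

On real single-cell data, false positives and allelic dropout would trigger this test spuriously, which is presumably why a framework that models noise and contamination is needed instead.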


Integrating single-cell datasets with ambiguous batch information by incorporating molecular network features

Briefings in Bioinformatics, DOI 10.1093/bib/bbab366, 2021

Author(s): Ji Dong, Peijie Zhou, Yichong Wu, Yidong Chen, Haoling Xie, ...

Keyword(s): Single Cell, Large Scale, Developmental Stages, Rapid Development, Molecular Network, Rna Seq, Single Cell Sequencing, The World, Information Score, Simple Network

Abstract: With the rapid development of single-cell sequencing techniques, several large-scale cell atlas projects have been launched across the world. However, it is still challenging to integrate single-cell RNA-seq (scRNA-seq) datasets with diverse tissue sources, developmental stages and/or few overlaps, due to the ambiguity in determining the batch information, which is particularly important for current batch-effect correction methods. Here, we present SCORE, a simple network-based integration methodology, which incorporates curated molecular network features to infer cellular states and generate a unified workflow for integrating scRNA-seq datasets. Validating on real single-cell datasets, we showed that regardless of batch information, SCORE outperforms existing methods in accuracy, robustness, scalability and data integration.


Large-scale simultaneous measurement of epitopes and transcriptomes in single cells

DOI 10.1101/113068, 2017, Cited By ~ 10

Author(s): Marlon Stoeckius, Christoph Hafemeister, William Stephenson, Brian Houck-Loomis, Pratip K. Chattopadhyay, ...

Keyword(s): Single Cell, Large Scale, Single Cells, Cellular Proteins, Surface Markers, Cell Surface Markers, Complex Cell, Single Cell Sequencing, Protein Levels, Phenotypic Information

Recent high-throughput single-cell sequencing approaches have been transformative for understanding complex cell populations, but are unable to provide additional phenotypic information, such as protein levels of cell-surface markers. Using oligonucleotide-labeled antibodies, we integrate measurements of cellular proteins and transcriptomes into an efficient, sequencing-based readout of single cells. This method is compatible with existing single-cell sequencing approaches and will readily scale as the throughput of these methods increases.


RobustClone: A robust PCA method of tumor clone and evolution inference from single-cell sequencing data

DOI 10.1101/666271, 2019, Cited By ~ 1

Author(s): Ziwei Chen, Fuzhou Gong, Liang Ma, Lin Wan

Keyword(s): Single Cell, Large Scale, Principal Component, Low Rank, Breast Cancer Dataset, Sequencing Data, Cancer Dataset, Large Reservoir, Single Cell Sequencing, Model Free

Abstract: Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and build phylogenetic relationships of tumor cells/clones. However, high technical error rates introduce substantial noise into the genetic data, thus limiting the application of evolutionary tools in the large reservoir. To recover the low-dimensional subspace of tumor subpopulations from error-prone SCS data in the presence of corrupted and/or missing elements, we developed an efficient computational framework, termed RobustClone, to recover the true genotypes of subclones based on the low-rank matrix factorization method of extended robust principal component analysis (RPCA) and reconstruct the subclonal evolutionary tree. RobustClone is a model-free method that is fast and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods, both in accuracy and efficiency. We further validated RobustClone on 2 single-cell SNV and 2 single-cell CNV datasets and demonstrated that RobustClone could recover the genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. RobustClone software is available at https://github.com/ucasdp/RobustClone.


RobustClone: a robust PCA method for tumor clone and evolution inference from single-cell sequencing data

Bioinformatics, DOI 10.1093/bioinformatics/btaa172, 2020, Vol 36 (11), pp. 3299-3306

Author(s): Ziwei Chen, Fuzhou Gong, Lin Wan, Liang Ma

Keyword(s): Single Cell, Large Scale, Clonal Evolution, Low Rank, Supplementary Information, Breast Cancer Dataset, Sequencing Data, Cancer Dataset, Single Cell Sequencing, Model Free

Abstract. Motivation: Single-cell sequencing (SCS) data provide unprecedented insights into intratumoral heterogeneity. With SCS, we can better characterize clonal genotypes and reconstruct phylogenetic relationships of tumor cells/clones. However, SCS data are often error-prone, making their computational analysis challenging. Results: To infer clonal evolution in tumors from error-prone SCS data, we developed an efficient computational framework, termed RobustClone. It recovers the true genotypes of subclones based on the extended robust principal component analysis, a low-rank matrix decomposition method, and reconstructs the subclonal evolutionary tree. RobustClone is a model-free method, which can be applied to both single-cell single nucleotide variation (scSNV) and single-cell copy-number variation (scCNV) data. It is efficient and scalable to large-scale datasets. We conducted a set of systematic evaluations on simulated datasets and demonstrated that RobustClone outperforms state-of-the-art methods on large-scale data in both accuracy and efficiency. We further validated RobustClone on two scSNV and two scCNV datasets and demonstrated that RobustClone could recover the genotype matrix and infer the subclonal evolution tree accurately under various scenarios. In particular, RobustClone revealed the spatial progression patterns of subclonal evolution on the large-scale 10X Genomics scCNV breast cancer dataset. Availability and implementation: RobustClone software is available at https://github.com/ucasdp/RobustClone. Supplementary information: Supplementary data are available at Bioinformatics online.


PTX3 Mediates the Infiltration, Migration, and M2-Polarization of Macrophages in Glioblastoma by Large-Scale Single Cell Sequencing Analysis and in vitro Experiments

SSRN Electronic Journal, DOI 10.2139/ssrn.3935664, 2021

Author(s): Hao Zhang, Yifan Wang, Yihan Zhao, Tao Liu, Zeyu Wang, ...

Keyword(s): Single Cell, Large Scale, Sequencing Analysis, In Vitro Experiments, Single Cell Sequencing, M2 Polarization


Accurate Single-Cell Clustering through Ensemble Similarity Learning

Genes, DOI 10.3390/genes12111670, 2021, Vol 12 (11), pp. 1670

Author(s): Hyundoo Jeong, Sungtae Shin, Hong-Gi Yeom

Keyword(s): Single Cell, Large Scale, Clustering Algorithm, Accurate Estimation, Artificial Noise, Similarity Learning, Cell Clusters, Single Cell Sequencing, Cell Clustering, Depth Analysis

Single-cell sequencing provides novel means to interpret the transcriptomic profiles of individual cells. In-depth analysis of single-cell sequencing data requires effective computational methods that accurately predict single-cell clusters, because single-cell sequencing techniques only provide the transcriptomic profile of each cell. Although an accurate estimate of the cell-to-cell similarity is an essential first step toward reliable single-cell clustering results, obtaining an accurate similarity measurement is challenging because it depends heavily on the selection of genes used for similarity evaluation, and the optimal set of genes for accurate similarity estimation is typically unknown. Moreover, due to technical limitations, single-cell sequencing data include a large number of artificial zeros, and the technical noise makes it difficult to develop effective single-cell clustering algorithms. Here, we describe a novel single-cell clustering algorithm that can accurately predict single-cell clusters in large-scale single-cell sequencing data by effectively reducing the zero-inflated noise and accurately estimating the cell-to-cell similarities. First, we construct an ensemble similarity network based on different similarity estimates, and reduce the artificial noise using a random walk with restart framework. Finally, starting from a large number of small but highly consistent clusters, we iteratively merge the pair of clusters with the maximum similarity until the predicted number of clusters is reached. Extensive performance evaluation shows that the proposed single-cell clustering algorithm yields accurate single-cell clustering results and can help decipher the key messages underlying complex biological mechanisms.
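The random walk with restart step mentioned above can be illustrated with a minimal, dependency-free sketch. It is a generic version of the technique, not the authors' implementation: the function name, the restart probability of 0.5, the convergence tolerance, and the four-cell toy similarity matrix are all assumptions made for illustration.

```python
def random_walk_with_restart(similarity, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the steady-state visiting probabilities of a walk restarting at `seed`.

    similarity: square list-of-lists of non-negative cell-to-cell similarities.
    """
    n = len(similarity)
    # Column-normalize so that transition[i][j] is the probability of moving j -> i.
    col_sums = [sum(similarity[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [similarity[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    start = [1.0 if i == seed else 0.0 for i in range(n)]
    p = start[:]
    for _ in range(max_iter):
        # One step: diffuse along the network, then jump back to the seed with prob. `restart`.
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * start[i]
            for i in range(n)
        ]
        delta = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if delta < tol:
            break
    return p


# Toy network (hypothetical): cells 0-1 and cells 2-3 form two tight groups,
# joined only by a weak, possibly spurious link between cells 1 and 2.
similarity = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
scores = random_walk_with_restart(similarity, seed=0)
print([round(s, 4) for s in scores])
```

Because the walk keeps returning to the seed, cells reachable only through the weak link receive very little probability mass, which is how this kind of diffusion can damp noisy edges in a similarity network.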


Graph Drawing-based Dimensionality Reduction to Identify Hidden Communities in Single-Cell Sequencing Spatial Representation

DOI 10.1101/2020.05.05.078550, 2020

Author(s): Alireza Khodadadi-Jamayran, Aristotelis Tsirigos

Keyword(s): Dimensionality Reduction, Single Cell, Large Scale, Graph Drawing, Dimensional Space, K Nearest Neighbor, Network Graph, Gene Expressions, Single Cell Sequencing, Spring Force

Summary: With the rapid growth of single cell sequencing technologies, finding cell communities with high accuracy has become crucial for large scale projects. Employing the current commonly used dimensionality reduction techniques such as tSNE and UMAP, it is often difficult to clearly distinguish cell communities in high dimensional space. Usually cell communities with similar origin and trajectories cluster so closely to each other that their subtle but important differences do not become readily apparent. This creates a problem for clustering, as clustering is also performed on dimensionality reduction results. In order to identify such communities, scientists either perform broad clustering and then extract each cluster and perform re-clustering to identify sub-populations, or they over-cluster the data and then merge the clusters with similar gene expressions. This is an incredibly cumbersome and time-consuming process. To solve this problem, we propose K-nearest-neighbor-based Network graph drawing Layout (KNetL, pronounced like ‘nettle’) for dimensionality reduction. In our method, we use force-directed graph drawing, whereby the attractive force (analogous to a spring force) and the repulsive force (analogous to an electrical force in atomic particles) between the cells are evaluated, and the cell communities are organized in a structural visualization. The coordinates of the force-compacted nodes are then extracted, and we employ dimensionality reduction methods, such as tSNE and UMAP, to unpack the nodes. The final plot, a KNetL map, shows a visually appealing and distinctive separation between cell communities. Our results show that KNetL maps bring significant resolution to visualizing and identifying otherwise hidden cell communities. All the algorithms are implemented in the iCellR package and available through the CRAN repository.
The Single (i) Cell R package (iCellR) provides great flexibility at every step of the analysis pipeline, including normalization, clustering, dimensionality reduction, interactive 2D and 3D visualizations, batch alignment or data integration, imputation, and interactive cell gating tools, which allow users to manually gate around the cells.
