Predicting cellular position in the Drosophila embryo from Single-Cell Transcriptomics data

Single-cell RNA-seq technologies are rapidly evolving. Although they are very informative, standard scRNAseq experiments lose the spatial organization of the cells in the tissue of origin. Conversely, spatial RNA-seq technologies designed to keep the localization of the cells have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To bridge the gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging as gold standard genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, and were able to correctly localize rare subpopulations of cells. Selection of predictor genes was essential for this task. Such genes showed a relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue-defining markers.

Gene selection for optimal prediction of cell position in tissues from single-cell transcriptomics data

Life Science Alliance ◽

10.26508/lsa.202000867 ◽

2020 ◽

Vol 3 (11) ◽

pp. e202000867 ◽

Cited By ~ 1

Author(s):

Jovan Tanevski ◽

Thin Nguyen ◽

Buu Truong ◽

Nikos Karaiskos ◽

Mehmet Eren Ahsen ◽

...

Keyword(s):

Single Cell ◽

Spatial Organization ◽

Gene Selection ◽

Spatial Information ◽

Spatial Clustering ◽

Spatial Location ◽

Spatial Arrangement ◽

Fish Embryo ◽

Hybridization Data ◽

Silver Standard

Single-cell RNA-sequencing (scRNAseq) technologies are rapidly evolving. Although very informative, in standard scRNAseq experiments, the spatial organization of the cells in the tissue of origin is lost. Conversely, spatial RNA-seq technologies designed to maintain cell localization have limited throughput and gene coverage. Mapping scRNAseq to genes with spatial information increases coverage while providing spatial location. However, methods to perform such mapping have not yet been benchmarked. To fill this gap, we organized the DREAM Single-Cell Transcriptomics challenge focused on the spatial reconstruction of cells from the Drosophila embryo from scRNAseq data, leveraging as silver standard genes with in situ hybridization data from the Berkeley Drosophila Transcription Network Project reference atlas. The 34 participating teams used diverse algorithms for gene selection and location prediction, and were able to correctly localize clusters of cells. Selection of predictor genes was essential for this task. Predictor genes showed a relatively high expression entropy and high spatial clustering, and included prominent developmental genes such as gap and pair-rule genes and tissue markers. Application of the top 10 methods to a zebrafish embryo dataset yielded performance and statistical properties of the selected genes similar to those in the Drosophila data. This suggests that methods developed in this challenge are able to extract generalizable properties of genes that are useful to accurately reconstruct the spatial arrangement of cells in tissues.
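As a rough illustration of what "expression entropy" of a gene can mean, the sketch below computes the Shannon entropy of each gene's expression after discretizing it into bins. This is a minimal sketch under stated assumptions, not the scoring code used in the challenge: the function name `expression_entropy`, the cells-by-genes input layout and the equal-width binning are all hypothetical choices.

```python
import numpy as np

def expression_entropy(expr, n_bins=2):
    """Shannon entropy (in bits) of each gene's discretized expression.

    expr: array of shape (n_cells, n_genes); this layout is an assumption.
    n_bins: number of equal-width bins used to discretize each gene.
    """
    expr = np.asarray(expr, dtype=float)
    entropies = np.zeros(expr.shape[1])
    for g in range(expr.shape[1]):
        # Count how many cells fall into each expression bin for gene g.
        counts, _ = np.histogram(expr[:, g], bins=n_bins)
        # Keep non-empty bins only, so that log2 is well defined.
        p = counts[counts > 0] / counts.sum()
        entropies[g] = -np.sum(p * np.log2(p))
    return entropies

# Toy matrix: gene 0 is expressed in half of the cells, gene 1 is constant.
toy = np.array([[0.0, 1.0],
                [0.0, 1.0],
                [1.0, 1.0],
                [1.0, 1.0]])
toy_entropy = expression_entropy(toy)
```

Under this definition, a gene that is on in about half of the cells carries more entropy than a gene that is uniformly on or off, which is one way to read the "relatively high expression entropy" of the selected predictor genes.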

Feature Selection for Topological Proximity Prediction of Single-Cell Transcriptomic Profiles in Drosophila Embryo Using Genetic Algorithm

Genes ◽

10.3390/genes12010028 ◽

2020 ◽

Vol 12 (1) ◽

pp. 28

Author(s):

Shruti Gupta ◽

Ajay Kumar Verma ◽

Shandar Ahmad

Keyword(s):

Genetic Algorithm ◽

Feature Selection ◽

Single Cell ◽

Gene Selection ◽

Spatial Information ◽

Single Cells ◽

Drosophila Embryo ◽

Main Challenge ◽

Selection For

Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help recover the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition, the DREAM Single Cell Transcriptomics Challenge (SCTC), to predict the masked locations of single cells using sets of 60, 40 and 20 genes out of the 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA) to predict the most important genes that carry positional and proximity information about the single-cell origins, in combination with the base distance mapping algorithm DistMap. The resulting gene selection performed well and was ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it into the main challenge publication, owing to an intricate aggregation ranking. In this work, we discuss the detailed implementation of the GA and its post-challenge parameterization, with a view to identifying areas where GA-based approaches to gene-set selection for topological association prediction may be made more effective. We believe this work provides additional insights into feature-selection strategies and their relevance to single-cell similarity prediction, and will form a strong addendum to the recently published work from the consortium.
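The general idea of selecting a fixed-size gene subset with a genetic algorithm can be sketched as follows. This is an illustrative toy, not the authors' implementation: `ga_select`, its parameters and the `toy_fitness` function are hypothetical, and in a real setting the fitness would be some measure of how well a location-mapping method such as DistMap performs with the candidate gene set.

```python
import random

def ga_select(n_genes, k, fitness, pop_size=30, generations=40,
              mutation_rate=0.2, seed=0):
    """Toy genetic algorithm that searches for a k-gene subset maximizing `fitness`.

    fitness: callable taking a sorted tuple of gene indices, returning a number.
    """
    rng = random.Random(seed)
    # Initial population: random subsets of size k.
    population = [tuple(sorted(rng.sample(range(n_genes), k)))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Selection: keep the fitter half of the population as parents.
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Crossover: draw k genes from the union of both parents.
            pool = sorted(set(a) | set(b))
            child = set(rng.sample(pool, k))
            # Mutation: occasionally swap one gene for a gene outside the subset.
            if rng.random() < mutation_rate:
                outside = [g for g in range(n_genes) if g not in child]
                if outside:
                    child.remove(rng.choice(sorted(child)))
                    child.add(rng.choice(outside))
            children.append(tuple(sorted(child)))
        population = parents + children
        best = max(population + [best], key=fitness)
    return best

# Hypothetical fitness: reward subsets that overlap a hidden "informative" set.
informative = {0, 1, 2, 3, 4}

def toy_fitness(subset):
    return len(informative.intersection(subset))

selected = ga_select(n_genes=20, k=5, fitness=toy_fitness)
```

Keeping the parents in the next generation means the best fitness in the population never decreases from one generation to the next, which is a common and simple form of elitism.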

Gene selection and classification combining information gain ratio with fruit fly optimisation algorithm for single-cell RNA-seq data

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2021.10041500 ◽

2021 ◽

Vol 24 (5) ◽

pp. 495

Author(s):

Jie Zhang ◽

Junhong Feng ◽

Xiani Yang ◽

Jianming Liu

Keyword(s):

Single Cell ◽

Gene Selection ◽

Information Gain ◽

Fruit Fly ◽

Rna Seq ◽

Gain Ratio ◽

Optimisation Algorithm ◽

Information Gain Ratio ◽

Combining Information

sc-REnF: An entropy guided robust feature selection for clustering of single-cell RNA-seq data

10.1101/2020.10.10.334573 ◽

2020 ◽

Author(s):

Snehalika Lall ◽

Abhik Ghosh ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Rna Seq ◽

Technical Noise ◽

Marker Selection ◽

Cell Clustering ◽

Typing Methods ◽

Original Application ◽

Downstream Analysis ◽

Cell Typing

Many single-cell typing methods require pure clustering of cells, which is susceptible to technical noise and depends heavily on high-quality informative genes selected in the preliminary steps of downstream analysis. Techniques for gene selection in single-cell RNA sequencing (scRNA-seq) data are seemingly simple, which causes problems for the resolution of (sub-)type detection and marker selection, and ultimately affects cell annotation. We introduce sc-REnF, a novel and robust entropy-based feature (gene) selection method that brings the advantages 'Renyi' and 'Tsallis' entropy showed in their original applications to single-cell clustering. Gene selection is thereby robust and less sensitive to the technical noise present in the data, producing a pure clustering of cells, in addition to classifying independent and unknown samples with high accuracy. The corresponding software is available at: https://github.com/Snehalikalall/sc-REnF
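For readers unfamiliar with the two entropies named above, the following sketch computes the standard Renyi and Tsallis entropies of a discrete probability distribution. It shows the textbook definitions only and is not the sc-REnF feature-selection procedure; the function names are hypothetical, and both functions fall back to Shannon entropy (natural log) when the order parameter equals 1, which is the limiting case of both definitions.

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha(p) = log(sum_i p_i^alpha) / (1 - alpha), in nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # zero-probability outcomes do not contribute
    if np.isclose(alpha, 1.0):
        # Limit alpha -> 1 recovers Shannon entropy.
        return float(-np.sum(p * np.log(p)))
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def tsallis_entropy(p, q):
    """Tsallis entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if np.isclose(q, 1.0):
        # Limit q -> 1 recovers Shannon entropy.
        return float(-np.sum(p * np.log(p)))
    return float((1.0 - np.sum(p ** q)) / (q - 1.0))

# Uniform distribution over four outcomes as a simple check case.
uniform = [0.25, 0.25, 0.25, 0.25]
h_renyi = renyi_entropy(uniform, alpha=2.0)
s_tsallis = tsallis_entropy(uniform, q=2.0)
```

For a uniform distribution over n outcomes, the Renyi entropy equals log(n) for every order, whereas the Tsallis entropy depends on q.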

RgCop-A regularized copula based method for gene selection in single cell rna-seq data

PLoS Computational Biology ◽

10.1371/journal.pcbi.1009464 ◽

2021 ◽

Vol 17 (10) ◽

pp. e1009464

Author(s):

Snehalika Lall ◽

Sumanta Ray ◽

Sanghamitra Bandyopadhyay

Keyword(s):

Single Cell ◽

Gene Selection ◽

Real Life ◽

Classification Performance ◽

Rna Seq ◽

Scale Invariant ◽

Dependence Measure ◽

Highly Expressed Genes ◽

The Stability ◽

Downstream Analysis

Gene selection in large unannotated single-cell RNA sequencing (scRNA-seq) data is a crucial preliminary step of downstream analysis. Existing approaches, which are primarily based on high variation (highly variable genes) or significantly high expression (highly expressed genes), fail to provide a stable and predictive feature set because of the technical noise present in the data. Here, we propose RgCop, a novel regularized copula-based method for gene selection from large single-cell RNA-seq data. RgCop utilizes copula correlation (Ccor), a robust equitable dependence measure that captures multivariate dependency among a set of genes in single-cell expression data. We formulate an objective function by adding an l1 regularization term to Ccor to penalize the redundant coefficients of features/genes, resulting in a non-redundant, effective set of features/genes. Results show a significant improvement in the clustering/classification performance on real-life scRNA-seq data over other state-of-the-art methods. RgCop performs extremely well in capturing dependence among the features of noisy data owing to the scale-invariant property of copulas, thereby improving the stability of the method. Moreover, the differentially expressed (DE) genes identified from the clusters of scRNA-seq data are found to provide an accurate annotation of cells. Finally, the features/genes obtained from RgCop are able to annotate unknown cells with high accuracy.

Gene selection and classification combining information gain ratio with fruit fly optimisation algorithm for single-cell RNA-seq data

International Journal of Computational Science and Engineering ◽

10.1504/ijcse.2021.118098 ◽

2021 ◽

Vol 24 (5) ◽

pp. 495

Author(s):

Jie Zhang ◽

Junhong Feng ◽

Xiani Yang ◽

Jianming Liu

Keyword(s):

Single Cell ◽

Gene Selection ◽

Information Gain ◽

Fruit Fly ◽

Rna Seq ◽

Gain Ratio ◽

Optimisation Algorithm ◽

Information Gain Ratio ◽

Combining Information

Gene Selection for Single-Cell RNA-Seq Data Based on Information Gain and Genetic Algorithm

2018 14th International Conference on Computational Intelligence and Security (CIS) ◽

10.1109/cis2018.2018.00021 ◽

2018 ◽

Author(s):

Jie Zhang ◽

Junhong Feng

Keyword(s):

Genetic Algorithm ◽

Single Cell ◽

Gene Selection ◽

Information Gain ◽

Rna Seq ◽

Selection For

Spatial mapping of single cells in the Drosophila embryo from transcriptomic data based on topological consistency

F1000Research ◽

10.12688/f1000research.24163.1 ◽

2020 ◽

Vol 9 ◽

pp. 1014

Author(s):

Maryam Zand ◽

Jianhua Ruan

Keyword(s):

Single Cell ◽

Gold Standard ◽

Gene Selection ◽

Single Cells ◽

Expression Patterns ◽

Drosophila Embryo ◽

Weighted Averaging ◽

Marker Genes ◽

Spatial Mapping ◽

Successful Recovery

Advances in single-cell RNA sequencing technologies allow us to obtain transcriptomes at single-cell resolution. However, the original spatial context of cells, which is crucial for understanding cellular and tissue-level functions, is often lost during sequencing. To address this issue, the DREAM Single Cell Transcriptomics Challenge launched a community-wide effort to seek computational solutions for spatial mapping of single cells in tissues using single-cell RNAseq (scRNA-seq) data and a reference atlas obtained from in situ hybridization data. As a top-performing team in this competition, we approach this problem in three steps. The first step involves identifying a set of most informative genes based on the consistency between gene expression similarity and cell proximity. For this step, we propose two different approaches: an unsupervised approach that does not use the gold standard locations of the cells provided by the challenge organizers, and a supervised approach that relies on the gold standard locations. In the second step, a Particle Swarm Optimization algorithm is used to optimize the weights of different genes in order to maximize matches between the predicted locations and the gold standard locations. Finally, the information embedded in the cell topology is used to improve the predicted cell-location scores by weighted averaging of scores from neighboring locations. Evaluation results based on DREAM scores show that our method accurately predicts the location of single cells, and the predictions lead to successful recovery of the spatial expression patterns for most of the landmark genes. In addition, investigating the selected genes demonstrates that the most predictive genes are cluster specific and stable across our supervised and unsupervised gene selection frameworks.
Overall, the promising results obtained by our methods in the DREAM challenge demonstrate that topological consistency is a useful concept in identifying marker genes and constructing predictive models for spatial mapping of single cells.
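The third step, refining cell-location scores by weighted averaging over neighboring locations, can be sketched as follows. This is a minimal sketch under assumptions and not the authors' code: the function name `smooth_scores`, the use of the k nearest locations by Euclidean distance, and the single mixing parameter `self_weight` are hypothetical choices made for illustration.

```python
import numpy as np

def smooth_scores(scores, coords, k=3, self_weight=0.5):
    """Blend each cell-location score with the mean score of nearby locations.

    scores: (n_cells, n_locations) matrix of cell-to-location scores.
    coords: (n_locations, n_dims) coordinates of the candidate locations.
    k: number of nearest neighboring locations to average over (assumption).
    self_weight: weight kept on the original score (assumption).
    """
    scores = np.asarray(scores, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise Euclidean distances between locations.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a location is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]  # (n_locations, k)
    # scores[:, neighbors] has shape (n_cells, n_locations, k).
    neighbor_mean = scores[:, neighbors].mean(axis=2)
    return self_weight * scores + (1.0 - self_weight) * neighbor_mean

# Four locations on a line; one cell with a single sharp peak at location 2.
line_coords = np.array([[0.0], [1.0], [2.0], [3.0]])
raw_scores = np.array([[0.0, 0.0, 1.0, 0.0]])
smoothed = smooth_scores(raw_scores, line_coords, k=2, self_weight=0.5)
```

The effect is that an isolated high score is spread to adjacent locations while the original peak remains the best-scoring location, which reflects the intuition that a cell's true position should be supported by its surroundings.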

Spatial charting of single cell transcriptomes in tissues

10.1101/2021.11.24.469915 ◽

2021 ◽

Author(s):

Nicholas Navin ◽

Runmin Wei ◽

Siyuan He ◽

Shanshan Bai ◽

Emi Sei ◽

...

Keyword(s):

Single Cell ◽

Spatial Organization ◽

Spatial Information ◽

Ductal Carcinoma ◽

Single Cells ◽

Cell Types ◽

Spatial Structures ◽

Tissue Sections ◽

Normal Mouse Brain

Single cell RNA sequencing (scRNA-seq) methods can profile the transcriptomes of single cells but cannot preserve spatial information. Conversely, spatial transcriptomics (ST) assays can profile spatial regions in tissue sections, but do not have single cell genomic resolution. Here, we developed a computational approach called SChart that combines these two datasets to achieve single cell spatial mapping of cell types, cell states and continuous phenotypes. To validate our approach, we applied SChart to reconstruct cellular spatial structures in existing datasets from normal mouse brain and kidney tissues. We also performed scRNA-seq and ST experiments on two ductal carcinoma in situ (DCIS) tissues and applied SChart to identify subclones that were restricted to different ducts, and specific T cell states adjacent to the tumor areas. Our data show that SChart can accurately map single cells in diverse tissue types to resolve their spatial organization into cellular neighborhoods and tissue structures.

STACAS: Sub-Type Anchor Correction for Alignment in Seurat to integrate single-cell RNA-seq data

10.1101/2020.06.15.152306 ◽

2020 ◽

Cited By ~ 1

Author(s):

Massimo Andreatta ◽

Santiago J. Carmona

Keyword(s):

Single Cell ◽

Distance Measure ◽

Cell Types ◽

R Package ◽

Rna Seq ◽

Batch Effects ◽

Link Type ◽

Transcriptomics Data ◽

Public Repositories ◽

Cell Data

Computational tools for the integration of single-cell transcriptomics data are designed to correct batch effects between technical replicates or different technologies applied to the same population of cells. However, they have inherent limitations when applied to heterogeneous sets of data with moderate overlap in cell states or sub-types. STACAS is a package for the identification of integration anchors in the Seurat environment, optimized for the integration of datasets that share only a subset of cell types. We demonstrate that by i) correcting batch effects while preserving relevant biological variability across datasets, ii) filtering aberrant integration anchors with a quantitative distance measure, and iii) constructing optimal guide trees for integration, STACAS can accurately align scRNA-seq datasets composed of only partially overlapping cell populations. We anticipate that the algorithm will be a useful tool for the construction of comprehensive single-cell atlases by integration of the growing amount of single-cell data becoming available in public repositories.

Code availability

R package: https://github.com/carmonalab/STACAS

Docker image: https://hub.docker.com/repository/docker/mandrea1/stacas_demo
