Single-cell manifold-preserving feature selection for detecting rare cell populations

Abstract

Motivation: Most genomes contain thousands of genes, but for most functional responses only a subset of those genes is relevant. To facilitate many single-cell RNASeq (scRNASeq) analyses, the set of genes is often reduced through feature selection, i.e. by removing genes only subject to technical noise.

Results: We present M3Drop, an R package that implements popular existing feature selection methods and two novel methods which take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show these new methods outperform existing methods on simulated and real datasets.

Availability and implementation: M3Drop is freely available on GitHub as an R package and is compatible with other popular scRNASeq tools: https://github.com/tallulandrews/M3Drop.

Supplementary information: Supplementary data are available at Bioinformatics online.
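The dropout idea can be illustrated with a Michaelis-Menten curve relating a gene's mean expression S to its dropout rate P, P = 1 - S/(S + K), which is, to our understanding, the form of model M3Drop is named after. The sketch below is a minimal illustration under stated assumptions, not the package's implementation: K is chosen by a least-squares grid search, and genes are flagged with a fixed, hypothetical residual margin rather than the package's own fitting and significance testing. The toy matrix is invented.

```python
# Minimal sketch of Michaelis-Menten dropout modelling. Assumptions: a simple
# least-squares grid fit for K and a fixed residual margin, not M3Drop's own
# fitting procedure or significance test.


def gene_stats(matrix):
    """Per-gene (mean expression S, dropout rate P); rows = genes, cols = cells."""
    stats = []
    for row in matrix:
        n = len(row)
        stats.append((sum(row) / n, sum(1 for x in row if x == 0) / n))
    return stats


def mm_dropout(s, k):
    """Expected dropout rate on the Michaelis-Menten curve: 1 - S / (S + K)."""
    return 1 - s / (s + k)


def fit_k(stats):
    """Pick K from a log-spaced grid (1e-3 to 1e3) minimizing squared error."""
    grid = [10 ** (e / 20) for e in range(-60, 61)]
    expressed = [(s, p) for s, p in stats if s > 0]

    def sse(k):
        return sum((p - mm_dropout(s, k)) ** 2 for s, p in expressed)

    return min(grid, key=sse)


def flag_high_dropout(matrix, margin=0.2):
    """Return fitted K and indices of genes whose dropout rate exceeds the
    fitted curve by more than `margin` (hypothetical threshold)."""
    stats = gene_stats(matrix)
    k = fit_k(stats)
    flagged = [i for i, (s, p) in enumerate(stats)
               if s > 0 and p - mm_dropout(s, k) > margin]
    return k, flagged


# Invented toy matrix: genes 0-2 are uniformly expressed; gene 3 has the same
# mean as gene 1 but is zero in 8 of 10 cells; gene 4 is lowly expressed.
toy = [
    [1] * 10,
    [5] * 10,
    [2] * 10,
    [0] * 8 + [25, 25],
    [0, 0, 1, 1, 1, 1, 1, 1, 1, 2],
]
k, flagged = flag_high_dropout(toy)
print(f"K = {k:.3f}, high-dropout genes: {flagged}")  # flags gene 3 only
```

Gene 3 is flagged because, for its mean expression, far more cells show zero counts than the fitted curve predicts.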


Feature Selection for Topological Proximity Prediction of Single-Cell Transcriptomic Profiles in Drosophila Embryo Using Genetic Algorithm

Genes, 2020, Vol 12 (1), pp. 28. DOI: 10.3390/genes12010028

Author(s): Shruti Gupta, Ajay Kumar Verma, Shandar Ahmad

Keyword(s): Genetic Algorithm, Feature Selection, Single Cell, Gene Selection, Spatial Information, Single Cells, Drosophila Embryo, Main Challenge, Selection For

Single-cell transcriptomics data, when combined with in situ hybridization patterns of specific genes, can help recover the spatial information lost during cell isolation. The Dialogue for Reverse Engineering Assessments and Methods (DREAM) consortium conducted a crowd-sourced competition, the DREAM Single Cell Transcriptomics Challenge (SCTC), to predict the masked locations of single cells from subsets of 60, 40 and 20 genes out of the 84 in situ gene patterns known in the Drosophila embryo. We applied a genetic algorithm (GA), in combination with the base distance-mapping algorithm DistMap, to identify the genes that carry the most positional and proximity information about the cells' origins. The resulting gene selection performed well and ranked among the top 10 in two of the three sub-challenges. However, the details of the method did not make it into the main challenge publication because of an intricate aggregation ranking. In this work, we discuss the detailed implementation of the GA and its post-challenge parameterization, with a view to identifying areas where GA-based gene-set selection for topological association prediction could be made more effective. We believe this work provides additional insights into feature-selection strategies and their relevance to single-cell similarity prediction, and will form a strong addendum to the recently published work from the consortium.
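A GA search over fixed-size gene subsets of the kind described above can be sketched as follows. This is a generic GA under stated assumptions, not the authors' implementation: DistMap is not reimplemented, and the fitness function `proximity_fitness` (hypothetical) scores a subset by how well cell-cell distances over the selected genes correlate with the cells' known spatial distances. Population size, tournament size, two-individual elitism, mutation rate and the toy data are arbitrary illustrative choices.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return 0.0 if sxx == 0 or syy == 0 else sxy / math.sqrt(sxx * syy)


def proximity_fitness(subset, expr, coords):
    """Hypothetical fitness: correlation between cell-cell distances computed on
    the selected genes and the cells' known spatial distances."""
    d_expr, d_space = [], []
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d_expr.append(math.dist([expr[i][g] for g in subset],
                                    [expr[j][g] for g in subset]))
            d_space.append(math.dist(coords[i], coords[j]))
    return pearson(d_expr, d_space)


def ga_select(expr, coords, k, pop_size=20, generations=25,
              mutation_rate=0.3, seed=0):
    """Evolve fixed-size gene subsets; return the best subset and the best
    fitness seen in each generation."""
    rng = random.Random(seed)
    genes = list(range(len(expr[0])))
    cache = {}

    def fitness(ind):
        if ind not in cache:
            cache[ind] = proximity_fitness(ind, expr, coords)
        return cache[ind]

    pop = [tuple(sorted(rng.sample(genes, k))) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        history.append(fitness(ranked[0]))
        nxt = ranked[:2]  # elitism: keep the two best subsets unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of three random individuals, twice.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Crossover: draw k genes from the union of both parents.
            child = set(rng.sample(sorted(set(p1) | set(p2)), k))
            # Mutation: swap one selected gene for a currently unselected one.
            if rng.random() < mutation_rate:
                child.remove(rng.choice(sorted(child)))
                child.add(rng.choice([g for g in genes if g not in child]))
            nxt.append(tuple(sorted(child)))
        pop = nxt
    return max(pop, key=fitness), history


# Invented toy data: 12 cells on a line; genes 0-1 follow position, 2-5 are noise.
noise = random.Random(1)
coords = [[float(x)] for x in range(12)]
expr = [[x, 11 - x] + [noise.uniform(0, 11) for _ in range(4)]
        for x in range(12)]
best, history = ga_select(expr, coords, k=2)
print("best subset:", best, "best fitness per generation:", history[-1])
```

Because the two best subsets are always carried over, the best fitness can never decrease from one generation to the next; the real challenge setting would replace the toy fitness with a score derived from the mapping algorithm.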


Ensemble Feature Selection for Single Cell Chromatin Conformation Analysis

2021. DOI: 10.1145/3473258.3473290

Author(s): Amirreza Rouhi, Luca Nanni, Arif Canakoglu, Pietro Pinoli, Stefano Ceri

Keyword(s): Feature Selection, Single Cell, Chromatin Conformation, Conformation Analysis, Selection For


Dropout-based feature selection for scRNASeq

2016. DOI: 10.1101/065094. Cited by ~14

Author(s): Tallulah S. Andrews, Martin Hemberg

Keyword(s): Feature Selection, Dimensionality Reduction, Single Cell, Relevant Information, Features Selection, Technical Noise, Biologically Relevant, Selection For, Cell Expression, Variable Genes

Abstract

Feature selection is a key step in many single-cell RNASeq (scRNASeq) analyses. It is intended to preserve biologically relevant information while removing genes only subject to technical noise. Because it is frequently performed prior to dimensionality reduction, clustering and pseudotime analyses, feature selection can have a major impact on the results. Several approaches have been proposed for unsupervised feature selection from unprocessed single-cell expression matrices, most based upon identifying highly variable genes in the dataset. We present two methods that take advantage of the prevalence of zeros (dropouts) in scRNASeq data to identify features. We show that dropout-based feature selection outperforms variance-based feature selection for multiple applications of single-cell RNASeq.
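The contrast between variance-based and dropout-based scoring can be made concrete on a toy count matrix. The sketch below is illustrative only and does not reproduce the preprint's methods or results: the variance-based score is the squared coefficient of variation (CV²), and the dropout-based score is a simplified stand-in (assumed here) that subtracts the zero fraction a Poisson model with the same mean would predict, exp(-mean), from the observed zero fraction. The counts are invented.

```python
import math


def gene_scores(row):
    """Return (cv2, dropout_excess) for one gene's counts across cells.

    cv2: squared coefficient of variation, a common variance-based score.
    dropout_excess: observed fraction of zeros minus exp(-mean), the zero
    fraction a Poisson model with the same mean would predict.
    """
    n = len(row)
    mean = sum(row) / n
    if mean == 0:
        return 0.0, 0.0
    var = sum((x - mean) ** 2 for x in row) / (n - 1)
    zero_frac = sum(1 for x in row if x == 0) / n
    return var / mean ** 2, zero_frac - math.exp(-mean)


def rank_genes(matrix, score):
    """Gene indices sorted by the chosen score ("cv2" or "dropout"), high first."""
    idx = {"cv2": 0, "dropout": 1}[score]
    scores = [gene_scores(row) for row in matrix]
    return sorted(range(len(matrix)), key=lambda i: scores[i][idx], reverse=True)


# Invented toy counts (rows = genes, columns = 10 cells).
toy = [
    [0] * 8 + [5, 5],                # gene 0: mostly zeros, mean 1
    [1] * 10,                        # gene 1: constant, no zeros
    [0, 0, 0, 0, 1, 1, 1, 2, 2, 3],  # gene 2: roughly Poisson-like, mean 1
    [1] * 9 + [91],                  # gene 3: one extreme outlier, no zeros
]
print("variance-based:", rank_genes(toy, "cv2"))      # [3, 0, 2, 1]
print("dropout-based: ", rank_genes(toy, "dropout"))  # [0, 2, 3, 1]
```

The single outlier cell puts gene 3 first by CV², whereas the zero-based score ranks the gene with a block of silent cells (gene 0) first; which ranking is preferable in practice is the empirical question the abstract addresses.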


Triku: a feature selection method based on nearest neighbors for single-cell data

2021. DOI: 10.1101/2021.02.12.430764

Author(s): Alex M. Ascensión, Olga Ibañez-Solé, Inaki Inza, Ander Izeta, Marcos J. Araúzo-Bravo

Keyword(s): Feature Selection, Single Cell, Nearest Neighbor, Feature Selection Method, Selection Method, Cell Populations, Neighbor Graph, Gene Sets, Nearest Neighbor Graph, Cell Data

Abstract

Feature selection is an important step in the analysis of single-cell RNA sequencing datasets. Triku is a feature selection method that favours genes defining the main cell populations. It does so by selecting genes that are expressed by groups of cells that are close in the nearest neighbor graph. Triku efficiently recovers the cell populations present in artificial and biological benchmarking datasets, as measured by mutual information and the silhouette coefficient. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms and contain fewer ribosomal and mitochondrial genes. Triku is available at https://gitlab.com/alexmascension/triku.
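The core intuition, that informative genes are expressed by cells that sit close together in the nearest neighbor graph, can be sketched with a simple neighbour-coherence score. This is a simplified stand-in, not triku's actual statistic or null model: the score `neighbour_coherence` (hypothetical) compares how often the k nearest neighbours of expressing cells also express the gene against the rate expected for randomly chosen other cells. The 2-D cell embedding, the choice k = 3 and the toy genes are invented for illustration.

```python
import math


def knn(points, k):
    """Brute-force k nearest neighbours (Euclidean) of every point."""
    result = []
    for i, p in enumerate(points):
        order = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in order[:k]])
    return result


def neighbour_coherence(expr_by_gene, neighbours):
    """Hypothetical per-gene score: how much more often the kNN neighbours of
    expressing cells also express the gene than random other cells would."""
    n = len(neighbours)
    scores = []
    for row in expr_by_gene:
        expressing = [i for i, x in enumerate(row) if x > 0]
        m = len(expressing)
        if m < 2 or m == n:
            scores.append(0.0)  # nothing to compare for absent/ubiquitous genes
            continue
        observed = sum(
            sum(1 for j in neighbours[i] if row[j] > 0) / len(neighbours[i])
            for i in expressing
        ) / m
        expected = (m - 1) / (n - 1)  # chance a random other cell expresses it
        scores.append(observed - expected)
    return scores


# Invented toy data: two tight groups of five cells in a 2-D embedding.
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
         (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
genes = [
    [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # gene 0: marks the first group
    [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],  # gene 1: scattered across both groups
]
scores = neighbour_coherence(genes, knn(cells, k=3))
print([round(s, 3) for s in scores])  # [0.556, 0.089]
```

The group-specific gene scores well above the scattered one even though both are expressed in five of the ten cells, which is the property a neighbour-graph criterion exploits and a purely variance-based score would not distinguish here.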
