Triku: a feature selection method based on nearest neighbors for single-cell data

AbstractFeature selection is a relevant step in the analysis of single-cell RNA sequencing datasets. Triku is a feature selection method that favours genes defining the main cell populations. It does so by selecting genes expressed by groups of cells that are close in the nearest neighbor graph. Triku efficiently recovers cell populations present in artificial and biological benchmarking datasets, based on mutual information and silhouette coefficient measurements. Additionally, gene sets selected by triku are more likely to be related to relevant Gene Ontology terms, and contain fewer ribosomal and mitochondrial genes. Triku is available at https://gitlab.com/alexmascension/triku.

Download Full-text

Unsupervised feature selection with least-squares quadratic mutual information

Indonesian Journal of Electrical Engineering and Computer Science ◽

10.11591/ijeecs.v22.i3.pp1619-1628 ◽

2021 ◽

Vol 22 (3) ◽

pp. 1619

Author(s):

Janya Sainui ◽

Chouvanee Srivisal

Keyword(s):

Feature Selection ◽

Mutual Information ◽

Nearest Neighbor ◽

Feature Selection Method ◽

K Nearest Neighbor ◽

Underlying Assumption ◽

Neighbor Graph ◽

High Dependency ◽

Benchmark Datasets ◽

Nearest Neighbor Graph

We propose the feature selection method based on the dependency between features in an unsupervised manner. The underlying assumption is that the most important feature should provide high dependency between itself and the rest of the features. Therefore, the top m features with maximum dependency scores should be selected, but the redundant features should be ignored. To deal with this problem, the objective function that is applied to evaluate the dependency between features plays a crucial role. However, previous methods mainly used the mutual information (MI), where the MI estimator based on the k-nearest neighbor graph, resulting in its estimation dependent on the selection of parameter, k, without a systematic way to select it. This implies that the MI estimator tends to be less reliable. Here, we introduce the leastsquares quadratic mutual information (LSQMI) that is more sensible because its tuning parameters can be selected by cross-validation. We show through the experiments that the use of LSQMI performed better than that of MI. In addition, we compared the proposed method to the three counterpart methods using six UCI benchmark datasets. The results demonstrated that the proposed method is useful for selecting the informative features as well as discarding the redundant ones.

Download Full-text

Feature Selection Method for Hydraulic System Faults Diagnosis Based on GA-PLS

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.44-47.1130 ◽

2010 ◽

Vol 44-47 ◽

pp. 1130-1134

Author(s):

Sheng Li ◽

Pei Lin Zhang ◽

Bing Li

Keyword(s):

Feature Selection ◽

Hydraulic System ◽

Nearest Neighbor ◽

Feature Selection Method ◽

Original Data ◽

Selection Method ◽

Classification Model ◽

K Nearest Neighbor ◽

K Nearest Neighbor Algorithm ◽

Faults Diagnosis

Feature selection is a key step in hydraulic system fault diagnosis. Some of the collected features are unrelated to classification model, and some are high correlated to other features. These features are harmful for establishing classification model. In order to solve this problem, genetic algorithm-partial least squares (GA-PLS) is proposed for selecting the representative and optimal features. K nearest neighbor algorithm (KNN) is used for diagnosing and classifying hydraulic system faults. For expressing better performance of GA-PLS, the original data of a model engineering hydraulic system is used, and the results of GA-PLS are compared with all feature used and GA. The experimental results show that, the proposed feature method can diagnose and classify hydraulic system faults more efficiently with using fewer features.

Download Full-text

A novel feature selection method for nearest neighbor search in binary embedding codes

2015 24th Wireless and Optical Communication Conference (WOCC) ◽

10.1109/wocc.2015.7346205 ◽

2015 ◽

Cited By ~ 1

Author(s):

Chih-Yi Chiu ◽

Yu-Cyuan Liou

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Feature Selection Method ◽

Selection Method ◽

Nearest Neighbor Search ◽

Neighbor Search

Download Full-text

Feature selection revisited in the single-cell era

Genome Biology ◽

10.1186/s13059-021-02544-3 ◽

2021 ◽

Vol 22 (1) ◽

Author(s):

Pengyi Yang ◽

Hao Huang ◽

Chunlei Liu

Keyword(s):

Feature Selection ◽

Single Cell ◽

Feature Selection Method ◽

Future Research ◽

Data Types ◽

Future Directions ◽

Recent Developments ◽

Feature Selection Techniques ◽

High Dimensional Datasets ◽

Cell Data

AbstractRecent advances in single-cell biotechnologies have resulted in high-dimensional datasets with increased complexity, making feature selection an essential technique for single-cell data analysis. Here, we revisit feature selection techniques and summarise recent developments. We review their application to a range of single-cell data types generated from traditional cytometry and imaging technologies and the latest array of single-cell omics technologies. We highlight some of the challenges and future directions and finally consider their scalability and make general recommendations on each type of feature selection method. We hope this review stimulates future research and application of feature selection in the single-cell era.

Download Full-text

A New Feature Selection Method Based on K-Nearest Neighbor Approach

Proceedings of the 2016 7th International Conference on Education, Management, Computer and Medicine (EMCM 2016) ◽

10.2991/emcm-16.2017.127 ◽

2017 ◽

Author(s):

Xianchang Wang ◽

Lishi Zhang ◽

Yonggang Ma

Keyword(s):

Feature Selection ◽

Nearest Neighbor ◽

Feature Selection Method ◽

Selection Method ◽

K Nearest Neighbor ◽

New Feature

Download Full-text

DUBStepR is a scalable correlation-based feature selection method for accurately clustering single-cell data

Nature Communications ◽

10.1038/s41467-021-26085-2 ◽

2021 ◽

Vol 12 (1) ◽

Author(s):

Bobby Ranjan ◽

Wenjie Sun ◽

Jinyu Park ◽

Kunal Mishra ◽

Florian Schmidt ◽

...

Keyword(s):

Feature Selection ◽

Single Cell ◽

Gene Selection ◽

Marker Gene ◽

Feature Selection Method ◽

Cell Types ◽

Selection Marker ◽

Selection Methods ◽

Data Types ◽

Cell Data

AbstractFeature selection (marker gene selection) is widely believed to improve clustering accuracy, and is thus a key component of single cell clustering pipelines. Existing feature selection methods perform inconsistently across datasets, occasionally even resulting in poorer clustering accuracy than without feature selection. Moreover, existing methods ignore information contained in gene-gene correlations. Here, we introduce DUBStepR (Determining the Underlying Basis using Stepwise Regression), a feature selection algorithm that leverages gene-gene correlations with a novel measure of inhomogeneity in feature space, termed the Density Index (DI). Despite selecting a relatively small number of genes, DUBStepR substantially outperformed existing single-cell feature selection methods across diverse clustering benchmarks. Additionally, DUBStepR was the only method to robustly deconvolve T and NK heterogeneity by identifying disease-associated common and rare cell types and subtypes in PBMCs from rheumatoid arthritis patients. DUBStepR is scalable to over a million cells, and can be straightforwardly applied to other data types such as single-cell ATAC-seq. We propose DUBStepR as a general-purpose feature selection solution for accurately clustering single-cell data.

Download Full-text

DRSA: a non-hierarchical clustering algorithm using k-NN graph and its application in vegetation classification

Vegetation of Russia ◽

10.31111/vegrus/2015.27.125 ◽

2015 ◽

pp. 125-138 ◽

Cited By ~ 2

Author(s):

I. V. Goncharenko

Keyword(s):

Cluster Analysis ◽

Clustering Algorithm ◽

Nearest Neighbor ◽

Clustering Algorithms ◽

Protein Structures ◽

Hierarchical Cluster ◽

Vegetation Classification ◽

K Nearest Neighbor ◽

Neighbor Graph ◽

Nearest Neighbor Graph

In this article we proposed a new method of non-hierarchical cluster analysis using k-nearest-neighbor graph and discussed it with respect to vegetation classification. The method of k-nearest neighbor (k-NN) classiﬁcation was originally developed in 1951 (Fix, Hodges, 1951). Later a term “k-NN graph” and a few algorithms of k-NN clustering appeared (Cover, Hart, 1967; Brito et al., 1997). In biology k-NN is used in analysis of protein structures and genome sequences. Most of k-NN clustering algorithms build «excessive» graph firstly, so called hypergraph, and then truncate it to subgraphs, just partitioning and coarsening hypergraph. We developed other strategy, the “upward” clustering in forming (assembling consequentially) one cluster after the other. Until today graph-based cluster analysis has not been considered concerning classification of vegetation datasets.

Download Full-text

Improvement of feature selection method in spam filtering

Journal of Computer Applications ◽

10.3724/sp.j.1087.2009.02812 ◽

2009 ◽

Vol 29 (10) ◽

pp. 2812-2815

Author(s):

Yang-zhu LU ◽

Xin-you ZHANG ◽

Yu QI

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Spam Filtering

Download Full-text

Feature Selection for Histopathological Image Classification using levy Flight Salp Swarm Optimizer

Recent Patents on Computer Science ◽

10.2174/2213275912666181210165129 ◽

2019 ◽

Vol 12 (4) ◽

pp. 329-337 ◽

Cited By ~ 2

Author(s):

Venubabu Rachapudi ◽

Golagani Lavanya Devi

Keyword(s):

Feature Selection ◽

Image Classification ◽

Feature Selection Method ◽

Selection Method ◽

Lévy Flight ◽

Levy Flight ◽

Local Optima ◽

Histopathological Image ◽

Surf Features ◽

Histopathological Image Classification

Background: An efficient feature selection method for Histopathological image classification plays an important role to eliminate irrelevant and redundant features. Therefore, this paper proposes a new levy flight salp swarm optimizer based feature selection method. Methods: The proposed levy flight salp swarm optimizer based feature selection method uses the levy flight steps for each follower salp to deviate them from local optima. The best solution returns the relevant and non-redundant features, which are fed to different classifiers for efficient and robust image classification. Results: The efficiency of the proposed levy flight salp swarm optimizer has been verified on 20 benchmark functions. The anticipated scheme beats the other considered meta-heuristic approaches. Furthermore, the anticipated feature selection method has shown better reduction in SURF features than other considered methods and performed well for histopathological image classification. Conclusion: This paper proposes an efficient levy flight salp Swarm Optimizer by modifying the step size of follower salp. The proposed modification reduces the chances of sticking into local optima. Furthermore, levy flight salp Swarm Optimizer has been utilized in the selection of optimum features from SURF features for the histopathological image classification. The simulation results validate that proposed method provides optimal values and high classification performance in comparison to other methods.

Download Full-text

The Effectiveness of the Fused Weighted Filter Feature Selection Method to Improve Software Fault Prediction

Journal of Communications Technology Electronics and Computer Science ◽

10.22385/jctecs.v8i0.96 ◽

2016 ◽

Vol 8 ◽

pp. 5 ◽

Cited By ~ 1

Author(s):

Fatemeh Alighardashi ◽

Mohammad Ali Zare Chahooki

Keyword(s):

Feature Selection ◽

Feature Selection Method ◽

Selection Method ◽

Machine Learning Algorithms ◽

Fault Prediction ◽

Filter Method ◽

Selection Methods ◽

Software Projects ◽

Software Fault Prediction ◽

Software Fault

Improving the software product quality before releasing by periodic tests is one of the most expensive activities in software projects. Due to limited resources to modules test in software projects, it is important to identify fault-prone modules and use the test sources for fault prediction in these modules. Software fault predictors based on machine learning algorithms, are effective tools for identifying fault-prone modules. Extensive studies are being done in this field to find the connection between features of software modules, and their fault-prone. Some of features in predictive algorithms are ineffective and reduce the accuracy of prediction process. So, feature selection methods to increase performance of prediction models in fault-prone modules are widely used. In this study, we proposed a feature selection method for effective selection of features, by using combination of filter feature selection methods. In the proposed filter method, the combination of several filter feature selection methods presented as fused weighed filter method. Then, the proposed method caused convergence rate of feature selection as well as the accuracy improvement. The obtained results on NASA and PROMISE with ten datasets, indicates the effectiveness of proposed method in improvement of accuracy and convergence of software fault prediction.

Download Full-text