Multi-Cluster Feature Selection Based on Isometric Mapping

2022, Vol 9 (3), pp. 570-572
Author(s): Yadi Wang, Zefeng Zhang, Yinghao Lin

2018, Vol 07 (01), pp. 1750015
Author(s): Bingqing Lin, Zhen Pang, Qihua Wang

This paper is concerned with variable screening when highly correlated variables exist in high-dimensional linear models. We propose a novel cluster feature selection (CFS) procedure that combines the elastic net with linear correlation variable screening to gain the benefits of both methods. When calculating the correlation between the predictors and the response, we consider highly correlated groups of predictors instead of individual ones, in contrast to the usual linear correlation variable screening. Within each correlated group, we apply the elastic net to select variables and estimate their parameters. This avoids the drawback of LASSO [R. Tibshirani, Regression shrinkage and selection via the lasso, J. R. Stat. Soc. Ser. B 58 (1996) 268–288], which can mistakenly eliminate truly relevant variables when they are highly correlated. After applying the CFS procedure, the maximum absolute correlation coefficient between clusters becomes smaller, and any common model selection method, such as sure independence screening (SIS) [J. Fan and J. Lv, Sure independence screening for ultrahigh dimensional feature space, J. R. Stat. Soc. Ser. B 70 (2008) 849–911] or LASSO, can be applied to improve the results. Extensive numerical examples, including pure simulation examples and semi-real examples, are conducted to show the good performance of our procedure.
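
The group-level screening idea in this abstract can be illustrated with a minimal sketch. It is not the authors' CFS procedure: the grouping rule (connected components of a thresholded correlation graph), the group score (largest absolute correlation with the response inside the group), and all function names are assumptions made for illustration, and the elastic net step within each group is left out.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def correlated_groups(columns, threshold):
    """Group predictor indices whose |correlation| >= threshold (assumed rule).

    Groups are the connected components of the thresholded correlation graph,
    found with a small union-find structure.
    """
    p = len(columns)
    parent = list(range(p))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(p):
        for j in range(i + 1, p):
            if abs(pearson(columns[i], columns[j])) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(p):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def screen_groups(columns, y, threshold, keep):
    """Keep the `keep` groups with the largest within-group |corr(x_j, y)| (assumed score)."""
    groups = correlated_groups(columns, threshold)
    scored = [(max(abs(pearson(columns[j], y)) for j in g), g) for g in groups]
    scored.sort(key=lambda item: -item[0])
    return [g for _, g in scored[:keep]]


# Toy data: predictors 0 and 1 are nearly collinear, and y depends on predictor 0.
columns = [
    [1, 2, 3, 4, 5, 6],
    [1.1, 2.0, 2.9, 4.2, 5.0, 6.1],
    [1, -1, 1, -1, 1, -1],
    [3, 1, 2, 3, 1, 2],
]
y = [2.1, 3.9, 6.2, 8.0, 9.9, 12.1]

groups = correlated_groups(columns, threshold=0.9)
selected = screen_groups(columns, y, threshold=0.9, keep=1)
```

Because screening acts on whole groups, the two collinear predictors are retained or discarded together, which is the behaviour the abstract contrasts with screening predictors one at a time.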


Author(s): Le Nguyen Hoai Nam, Ho Bao Quoc

The bag-of-words technique is often used to represent a document in text categorization. However, for a large set of documents where the dimension of the bag-of-words vector is very high, text categorization becomes a serious challenge as a result of sparse data, over-fitting, and irrelevant features. A filter feature selection method reduces the number of features by eliminating irrelevant features from the bag-of-words vector. In this paper, we analyze the strengths and weaknesses of two filter feature selection approaches: the frequency-based approach and the cluster-based approach. Based on this analysis, we propose hybrid filter feature selection methods, named the Frequency-Cluster Feature Selection (FCFS) and the Detailed Frequency-Cluster Feature Selection (DtFCFS), to further improve the performance of the filter feature selection process in text categorization. The FCFS is a combination of the frequency-based approach and the cluster-based approach, while the DtFCFS, a detailed version of the FCFS, is a comprehensively hybrid cluster-based method. We run experiments on four benchmark datasets (the Reuters-21578 and Newsgroup datasets for news classification, the Ohsumed dataset for medical document classification, and the LingSpam dataset for email classification) to compare the proposed methods with six related well-known methods: the Comprehensive Measurement Feature Selection (CMFS), the Optimal Orthogonal Centroid Feature Selection (OCFS), the Crossed Centroid Feature Selection (CIIC), the Information Gain (IG), the Chi-square (CHI), and the Deviation from Poisson Feature Selection (DFPFS). In terms of the Micro-F1, the Macro-F1, and the dimension reduction rate, the DtFCFS is superior to the other methods, while the FCFS is competitive with, and in some cases better than, the strongest of the compared methods, especially for the Macro-F1.
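
To make the idea of a filter method concrete, here is a small sketch of the chi-square (CHI) baseline named in the abstract. It is not FCFS or DtFCFS, whose details the abstract does not give. Documents are assumed to be sets of tokens, a term's score is taken as its maximum chi-square value over the classes, and the function names are hypothetical.

```python
def chi_square(docs, labels, term, cls):
    """Chi-square statistic between a term and a class from document counts.

    a: documents in cls containing term      b: documents outside cls containing term
    c: documents in cls without term         d: documents outside cls without term
    """
    a = b = c = d = 0
    for words, label in zip(docs, labels):
        has_term = term in words
        if label == cls and has_term:
            a += 1
        elif label != cls and has_term:
            b += 1
        elif label == cls:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term occurs in every document or in none
    return n * (a * d - c * b) ** 2 / denom


def select_terms(docs, labels, k):
    """Return the k terms with the highest chi-square score (max over classes)."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scored = [
        (max(chi_square(docs, labels, t, c) for c in classes), t) for t in vocab
    ]
    # Highest score first; ties broken alphabetically for a stable result.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [t for _, t in scored[:k]]


# Toy corpus: "the" appears everywhere and carries no class information.
docs = [
    {"goal", "match", "the"},
    {"goal", "team", "the"},
    {"vote", "law", "the"},
    {"vote", "senate", "the"},
]
labels = ["sport", "sport", "politics", "politics"]

top_terms = select_terms(docs, labels, k=2)
```

The filter scores each term independently of any classifier, so the reduced vocabulary can be computed once and reused across models.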


Author(s): Azadeh Dinparastdjadid, Ehsan T. Esfahani

The ability to study the activity of single neurons will facilitate studies in many areas, including cognitive sciences and brain-computer interface applications. Because every neuron has its own unique spike waveform, spike-sorting methods can separate neurons based on their associated spikes. Spike sorting is an unsupervised learning problem in the realm of data mining and machine learning. This study introduces a new method that improves the accuracy of spike sorting in comparison to existing methods. This method, named Multi Cluster Feature Selection (MCFS), selects a reduced number of features from the original data set that best differentiate the existing clusters by solving a Lasso optimization problem. MCFS was also applied to data obtained from multi-channel recordings of a rat's brain. With MCFS, each channel was studied and the neurons in each channel were sorted with an improved rate in comparison to conventional methods such as PCA.
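
The Lasso-based selection step described here can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the cluster-structure vectors (`embeddings`) are taken as given, for example from a spectral embedding computed elsewhere; each feature is scored by its largest absolute Lasso coefficient across those vectors, following the commonly described MCFS formulation; and the solver is a plain cyclic coordinate descent with hypothetical function names.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t (the proximal operator of the L1 penalty)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of rows; columns are assumed to be centred.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    resid = list(y)  # residual y - Xw, with w starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue  # constant (all-zero) column keeps a zero weight
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (resid[i] + w[j] * X[i][j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_w - w[j]
            if delta != 0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                w[j] = new_w
    return w


def mcfs_scores(X, embeddings, alpha):
    """Score each feature by its largest |Lasso coefficient| over all embedding vectors."""
    scores = [0.0] * len(X[0])
    for target in embeddings:
        coefs = lasso_cd(X, target, alpha)
        scores = [max(s, abs(c)) for s, c in zip(scores, coefs)]
    return scores


# Toy data: feature 0 separates two clusters, feature 1 is unrelated, feature 2 is constant.
X = [
    [-1.0, 0.5, 0.0],
    [-1.1, -0.5, 0.0],
    [-0.9, 0.0, 0.0],
    [1.0, 0.5, 0.0],
    [1.1, -0.5, 0.0],
    [0.9, 0.0, 0.0],
]
embeddings = [[-1.0, -1.0, -1.0, 1.0, 1.0, 1.0]]  # assumed cluster-structure vector

scores = mcfs_scores(X, embeddings, alpha=0.1)
best_feature = max(range(len(scores)), key=lambda j: scores[j])
```

The L1 penalty drives the coefficients of uninformative features to exactly zero, so ranking by coefficient magnitude yields a sparse set of features that track the cluster structure.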


Author(s): Lindsey M. Kitchell, Francisco J. Parada, Brandi L. Emerick, Tom A. Busey

2012, Vol 19 (2), pp. 97-111
Author(s): Muhammad Ahmad, Syungyoung Lee, Ihsan Ul Haq, Qaisar Mushtaq
