DECODING GRATING ORIENTATION FROM MICROELECTRODE ARRAY RECORDINGS IN MONKEY CORTICAL AREA V4

2010 ◽  
Vol 20 (02) ◽  
pp. 95-108 ◽  
Author(s):  
NIKOLAY V. MANYAKOV ◽  
MARC M. VAN HULLE

We propose an invasive brain-machine interface (BMI) that decodes the orientation of a visual grating from spike train recordings made with a 96-microelectrode array chronically implanted into the prelunate gyrus (area V4) of a rhesus monkey. The orientation is decoded irrespective of the grating's spatial frequency. Since pyramidal cells are less prominent in visual areas than in (pre)motor areas, the recordings contain spikes with smaller amplitudes relative to the noise level. Hence, rather than performing spike decoding, feature selection algorithms are applied to extract the information required by the decoder. Two types of feature selection procedures are compared: filter and wrapper. The wrapper is combined with a linear discriminant analysis classifier, and the filter is followed by a radial-basis function support vector machine classifier. In addition, since we have a multiclass classification problem, different methods for combining pairwise classifiers are compared.
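The pipeline described above can be sketched with generic scikit-learn components; the simulated firing-rate matrix, the orientation labels, and the specific selector and classifier settings below are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): filter- and wrapper-style feature selection
# followed by the two classifier types named in the abstract, with pairwise
# (one-vs-one) combination for the multiclass orientation problem.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif, SequentialFeatureSelector
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.multiclass import OneVsOneClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 96))          # placeholder per-channel firing-rate features
y = rng.integers(0, 8, size=400)        # 8 hypothetical grating orientations

# Filter selection (ANOVA F-score) followed by an RBF-kernel SVM,
# with the pairwise classifiers combined one-vs-one.
filter_svm = make_pipeline(
    SelectKBest(f_classif, k=20),
    OneVsOneClassifier(SVC(kernel="rbf", gamma="scale")),
)

# Wrapper selection (greedy forward search scored by the classifier itself)
# around a linear discriminant analysis classifier.
wrapper_lda = make_pipeline(
    SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                              n_features_to_select=10, cv=3),
    LinearDiscriminantAnalysis(),
)

for name, model in [("filter + RBF-SVM", filter_svm), ("wrapper + LDA", wrapper_lda)]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.2f}")
```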

Author(s):  
Nazila Darabi ◽  
Abdalhossein Rezai ◽  
Seyedeh Shahrbanoo Falahieh Hamidpour

Breast cancer is a common cancer in females. Accurate and early detection of breast cancer can play a vital role in treatment. This paper presents and evaluates a thermogram-based Computer-Aided Detection (CAD) system for the detection of breast cancer. In this CAD system, the Random Subset Feature Selection (RSFS) algorithm and hybrids of the minimum Redundancy Maximum Relevance (mRMR) algorithm and the Genetic Algorithm (GA) with the RSFS algorithm are utilized for feature selection. In addition, the Support Vector Machine (SVM) and k-Nearest Neighbors (kNN) algorithms are utilized as classifiers. The proposed CAD system is verified using MATLAB 2017 and a dataset composed of breast images from 78 patients. The implementation results demonstrate that using the RSFS algorithm for feature selection with the kNN and SVM classifiers yields accuracies of 85.36% and 75%, and sensitivities of 94.11% and 79.31%, respectively. Using the hybrid GA and RSFS algorithm for feature selection with the kNN and SVM classifiers yields accuracies of 83.87% and 69.56%, and sensitivities of 96% and 81.81%, respectively, and using the hybrid mRMR and RSFS algorithms for feature selection with the kNN and SVM classifiers yields accuracies of 77.41% and 73.07%, and sensitivities of 98% and 72.72%, respectively.
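A rough Python sketch of the random-subset idea behind RSFS, paired with kNN and SVM classifiers; the scoring rule, the synthetic thermogram features, and all parameter values are assumptions for illustration and do not reproduce the paper's MATLAB implementation.

```python
# Simplified random-subset feature selection loop (in the spirit of RSFS):
# each feature is scored by the average accuracy of the random subsets that
# contain it, then kNN and SVM are trained on the top-ranked features.
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

def random_subset_scores(X, y, n_iter=200, subset_size=10, seed=0):
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    relevance = np.zeros(n_features)
    counts = np.zeros(n_features)
    for _ in range(n_iter):
        subset = rng.choice(n_features, size=subset_size, replace=False)
        acc = cross_val_score(KNeighborsClassifier(5), X[:, subset], y, cv=3).mean()
        relevance[subset] += acc
        counts[subset] += 1
    return relevance / np.maximum(counts, 1)   # mean accuracy when included

# Placeholder thermogram feature matrix X and labels y (not the paper's data).
rng = np.random.default_rng(1)
X = rng.normal(size=(78, 40))
y = rng.integers(0, 2, size=78)

scores = random_subset_scores(X, y)
top = np.argsort(scores)[-10:]                 # keep the 10 best-scoring features
Xtr, Xte, ytr, yte = train_test_split(X[:, top], y, test_size=0.3, random_state=0)
for clf in (KNeighborsClassifier(3), SVC(kernel="rbf")):
    print(type(clf).__name__, clf.fit(Xtr, ytr).score(Xte, yte))
```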


Author(s):  
Ricco Rakotomalala ◽  
Faouzi Mhamdi

In this chapter, we are interested in protein classification starting from primary structures. The goal is to automatically assign protein sequences to their families. The main originality of the approach is that we directly apply the text categorization framework to protein classification with very minor modifications. The main steps of the task are clearly identified: we must extract features from the unstructured dataset, for which we use fixed-length n-gram descriptors; we select and combine the most relevant ones for the learning phase; and then we select the most promising learning algorithm in order to produce an accurate predictive model. We obtain essentially two main results. First, the approach is credible, giving accurate results with descriptors of length 2 (2-grams) only. Second, in our context where many irrelevant descriptors are automatically generated, we must combine aggressive feature selection algorithms with low-variance classifiers such as the SVM (Support Vector Machine).
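A minimal sketch of the described text-categorization pipeline applied to protein sequences, assuming character 2-grams as descriptors, a chi-square filter, and a linear SVM; the toy sequences and family labels are invented for illustration.

```python
# Fixed-length 2-gram descriptors extracted from primary sequences, aggressive
# filter feature selection, and a low-variance linear SVM classifier.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sequences = ["MKTAYIAKQR", "MKLVINGKTL", "GAVLIPFMWS", "GAVLMKTAYI"]  # toy data
families  = ["kinase", "kinase", "transporter", "transporter"]       # toy labels

model = make_pipeline(
    # character 2-grams play the role of the fixed-length descriptors
    CountVectorizer(analyzer="char", ngram_range=(2, 2), lowercase=False),
    SelectKBest(chi2, k=10),            # keep only the most relevant 2-grams
    LinearSVC(),
)
model.fit(sequences, families)
print(model.predict(["MKTAYINGKT"]))
```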


2014 ◽  
Vol 701-702 ◽  
pp. 110-113
Author(s):  
Qi Rui Zhang ◽  
He Xian Wang ◽  
Jiang Wei Qin

This paper reports a comparative study of feature selection algorithms on a hyperlipidemia data set. Three methods of feature selection were evaluated: document frequency (DF), information gain (IG) and the χ2 statistic (CHI). The classification systems use a vector to represent a document and use tfidfie (term frequency, inverse document frequency, and inverse entropy) to compute term weights. In order to compare the effectiveness of feature selection, we used three classification methods: Naïve Bayes (NB), k-Nearest Neighbor (kNN) and Support Vector Machines (SVM). The experimental results show that IG and CHI significantly outperform DF, and that SVM and NB are more effective than kNN when the macro-averaged F1 measure is used. DF is suitable for the task of large-scale text classification.
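The comparison can be approximated with standard scikit-learn components, as sketched below; chi-square stands in for CHI, mutual information stands in for IG, plain tf-idf replaces the paper's tfidfie weighting, and the documents and labels are placeholders.

```python
# Cross of two filter selectors with three classifiers on a toy corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

docs = ["high cholesterol diet record", "normal lipid panel result",
        "elevated triglycerides noted", "routine checkup no findings"] * 10
labels = [1, 0, 1, 0] * 10

selectors = {"CHI": chi2, "IG (mutual information)": mutual_info_classif}
classifiers = {"NB": MultinomialNB(), "kNN": KNeighborsClassifier(3),
               "SVM": LinearSVC()}

for s_name, score_fn in selectors.items():
    for c_name, clf in classifiers.items():
        model = make_pipeline(TfidfVectorizer(), SelectKBest(score_fn, k=8), clf)
        acc = cross_val_score(model, docs, labels, cv=5).mean()
        print(f"{s_name} + {c_name}: accuracy = {acc:.2f}")
```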


2021 ◽  
Vol 11 ◽  
Author(s):  
Qi Wan ◽  
Jiaxuan Zhou ◽  
Xiaoying Xia ◽  
Jianfeng Hu ◽  
Peng Wang ◽  
...  

Objective: To evaluate the performance of 2D and 3D radiomics features with different machine learning approaches to classify SPLs based on magnetic resonance (MR) T2-weighted imaging (T2WI). Materials and Methods: A total of 132 patients with pathologically confirmed SPLs were examined and randomly divided into training (n = 92) and test (n = 40) datasets. A total of 1692 3D and 1231 2D radiomics features per patient were extracted. Both radiomics features and clinical data were evaluated. A total of 1260 classification models, comprising 3 normalization methods, 2 dimension reduction algorithms, 3 feature selection methods, and 10 classifiers with 7 different feature numbers (confined to 3-9), were compared. Ten-fold cross-validation on the training dataset was applied to choose the candidate final model. The area under the receiver operating characteristic curve (AUC), the precision-recall plot, and the Matthews correlation coefficient (MCC) were used to evaluate the performance of the machine learning approaches. Results: The 3D features were significantly superior to the 2D features, yielding many more machine learning combinations with AUC greater than 0.7 in both the validation and test groups (129 vs. 11). The feature selection methods Analysis of Variance (ANOVA) and Recursive Feature Elimination (RFE) and the classifiers Logistic Regression (LR), Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and Gaussian Process (GP) had relatively better performance. The best performance of the 3D radiomics features in the test dataset (AUC = 0.824, AUC-PR = 0.927, MCC = 0.514) was higher than that of the 2D features (AUC = 0.740, AUC-PR = 0.846, MCC = 0.404). The joint 3D and 2D features (AUC = 0.813, AUC-PR = 0.926, MCC = 0.563) showed results similar to the 3D features alone. Incorporating clinical features with the 3D and 2D radiomics features slightly improved the AUC to 0.836 (AUC-PR = 0.918, MCC = 0.620) and 0.780 (AUC-PR = 0.900, MCC = 0.574), respectively. Conclusions: After algorithm optimization, 2D feature-based radiomics models yield favorable results in differentiating malignant from benign SPLs, but 3D features are still preferred because more machine learning algorithmic combinations with better performance are available. The feature selection methods ANOVA and RFE and the classifiers LR, LDA, SVM and GP are more likely to demonstrate better diagnostic performance for 3D features in the current study.
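One chain out of the many combinations evaluated above can be sketched as follows, assuming generic scikit-learn stand-ins (standard scaling, an ANOVA filter, logistic regression) and synthetic data in place of the radiomics features.

```python
# One normalization / feature-selection / classifier chain evaluated with
# ten-fold cross-validation and the reported metrics (ROC AUC, AUC-PR, MCC).
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, StratifiedKFold
from sklearn.metrics import roc_auc_score, average_precision_score, matthews_corrcoef

rng = np.random.default_rng(0)
X = rng.normal(size=(132, 1692))            # placeholder for 3D radiomics features
y = rng.integers(0, 2, size=132)            # placeholder benign/malignant labels

pipe = Pipeline([
    ("scale", StandardScaler()),              # one of the normalization choices
    ("select", SelectKBest(f_classif, k=7)),  # ANOVA filter, feature count in 3-9
    ("clf", LogisticRegression(max_iter=1000)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
proba = cross_val_predict(pipe, X, y, cv=cv, method="predict_proba")[:, 1]
pred = (proba >= 0.5).astype(int)
print("AUC   ", roc_auc_score(y, proba))
print("AUC-PR", average_precision_score(y, proba))
print("MCC   ", matthews_corrcoef(y, pred))
```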


2021 ◽  
Vol ahead-of-print (ahead-of-print) ◽  
Author(s):  
Renuka Devi D. ◽  
Sasikala S.

Purpose: The purpose of this paper is to enhance the accuracy of classification of streaming big data sets with less processing time. This kind of social analytics would contribute to society with inferred decisions at the correct time. The work is intended for the streaming nature of Twitter data sets. Design/methodology/approach: It is a demanding task to analyse the increasing Twitter data by conventional methods. MapReduce (MR) is used for the quickest analytics. The online feature selection (OFS) accelerated bat algorithm (ABA) and an ensemble incremental deep multiple layer perceptron (EIDMLP) classifier are proposed for feature selection and classification. Three Twitter data sets under varied categories (product, service and emotions) are investigated. The proposed model is compared with the Particle Swarm Optimization, Accelerated Particle Swarm Optimization and accelerated simulated annealing and mutation operator (ASAMO) feature selection algorithms, and with classifiers such as Naïve Bayes, support vector machine, Hoeffding tree and fuzzy minimal consistent class subset coverage with the k-nearest neighbour (FMCCSC-KNN). Findings: The proposed model is compared with the PSO, APSO and ASAMO feature selection algorithms, and with classifiers such as Naïve Bayes (NB), support vector machine (SVM), Hoeffding Tree (HT) and Fuzzy Minimal Consistent Class Subset Coverage with the K-Nearest Neighbour (FMCCSC-KNN). The work achieved accuracies of 99%, 99.48% and 98.9% for the given data sets, with processing times of 0.0034, 0.0024 and 0.0053 seconds, respectively. Originality/value: A novel framework is proposed for feature selection and classification. The work is compared with the authors' previously developed classifiers and with other state-of-the-art feature selection and classification algorithms.
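The authors' OFS-ABA selector and EIDMLP ensemble are not reproduced here; the sketch below only illustrates the generic streaming pattern such a system builds on, with a hashing vectorizer and an incremental linear classifier as stand-ins and placeholder tweets.

```python
# Tweets arrive in chunks, are hashed into a fixed feature space, and an
# incremental classifier is updated with partial_fit as each chunk arrives.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**12, alternate_sign=False)
clf = SGDClassifier()                      # incremental linear model stand-in
classes = [0, 1]                           # e.g. negative / positive sentiment

def stream_of_chunks():
    # Placeholder generator; a real system would read Twitter data here.
    yield (["great product", "terrible service"], [1, 0])
    yield (["love it", "worst purchase ever"], [1, 0])

for texts, labels in stream_of_chunks():
    X = vectorizer.transform(texts)        # stateless, so safe for streams
    clf.partial_fit(X, labels, classes=classes)

print(clf.predict(vectorizer.transform(["great service"])))
```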


2021 ◽  
pp. 134-146
Author(s):  
Surbhi Sharma ◽  
Anthony J. Bustamante

In this paper, we focus on improving the performance of a speech-based uni-modal depression detection system, which is non-invasive and involves lower cost and computation time than multi-modal systems. The performance of a decision system mainly depends on the choice of feature selection method and classifier. We have investigated combinations of four well-known multivariate filter methods (minimum Redundancy Maximum Relevance, Scatter Ratio, Mahalanobis Distance, Fast Correlation-Based feature selection) and four well-known classifiers (k-Nearest Neighbour, Linear Discriminant Classifier, Decision Tree, Support Vector Machine) to obtain a minimal set of relevant and non-redundant features that improves performance. This speeds up the acquisition of features from speech and keeps the cost and complexity of the decision system low. Experimental results on the high- and low-level features of recent work on the DAIC-WOZ dataset demonstrate the superior performance of the combination of Scatter Ratio and LDC, as well as that of Mahalanobis Distance and LDC, in comparison to other combinations and existing speech-based depression results, for both gender-independent and gender-based studies. Further, these combinations have also outperformed a few multimodal systems. It was noted that low-level features are more discriminatory and provide a better F1 score.
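One of the better-performing combinations reported above, Scatter Ratio filtering followed by a linear discriminant classifier, can be sketched as follows; the per-feature scatter-ratio definition, the subset size, and the synthetic speech features are assumptions for illustration.

```python
# Scatter-ratio filter (between-class over within-class variance, per feature)
# followed by a linear discriminant classifier.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def scatter_ratio(X, y):
    """Per-feature ratio of between-class to within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 60))             # placeholder speech features
y = rng.integers(0, 2, size=200)           # placeholder depressed / control labels

ranking = np.argsort(scatter_ratio(X, y))[::-1]
selected = ranking[:15]                    # subset size is an arbitrary choice here
acc = cross_val_score(LinearDiscriminantAnalysis(), X[:, selected], y, cv=5).mean()
print(f"Scatter Ratio + LDC accuracy: {acc:.2f}")
```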


2017 ◽  
Vol 2017 ◽  
pp. 1-11 ◽  
Author(s):  
Yunfeng Wu ◽  
Pinnan Chen ◽  
Yuchen Yao ◽  
Xiaoquan Ye ◽  
Yugui Xiao ◽  
...  

Analysis of quantified voice patterns is useful in the detection and assessment of dysphonia and related phonation disorders. In this paper, we first study the linear correlations between 22 voice parameters covering fundamental frequency variability, amplitude variations, and nonlinear measures. The highly correlated vocal parameters are combined using the linear discriminant analysis method. Based on the probability density functions estimated with the Parzen-window technique, we propose an interclass probability risk (ICPR) method to select the vocal parameters with small ICPR values as dominant features, and compare it with the modified Kullback-Leibler divergence (MKLD) feature selection approach. The experimental results show that the generalized logistic regression analysis (GLRA), support vector machine (SVM), and Bagging ensemble algorithm fed with the ICPR features provide better classification results than the same classifiers with the MKLD-selected features. The SVM is much better at distinguishing normal vocal patterns, with a specificity of 0.8542. Among the three classification methods, the Bagging ensemble algorithm with ICPR features can identify 90.77% of vocal patterns, with the highest sensitivity of 0.9796 and the largest area under the receiver operating characteristic curve of 0.9558. The classification results demonstrate the effectiveness of our feature selection and pattern analysis methods for dysphonic voice detection and measurement.
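The intuition behind the ICPR ranking can be sketched as follows, under the assumption that a feature's risk is approximated by the overlap of its Parzen-window class-conditional densities; the exact formulation in the paper may differ, and the vocal parameters below are simulated.

```python
# Estimate each class-conditional density with a Parzen window (Gaussian KDE)
# and score a feature by the overlap of the two densities: smaller overlap
# means lower misclassification risk, so the feature is more discriminative.
import numpy as np
from scipy.stats import gaussian_kde

def overlap_risk(feature_values, labels):
    """Approximate overlap area of the two class-conditional densities."""
    x0 = feature_values[labels == 0]
    x1 = feature_values[labels == 1]
    kde0, kde1 = gaussian_kde(x0), gaussian_kde(x1)
    grid = np.linspace(feature_values.min(), feature_values.max(), 512)
    p0, p1 = kde0(grid), kde1(grid)
    # Riemann-sum approximation of the area under the smaller density.
    return np.minimum(p0, p1).mean() * (grid[-1] - grid[0])

rng = np.random.default_rng(0)
n = 200
labels = rng.integers(0, 2, size=n)             # placeholder normal/dysphonic labels
# Two placeholder vocal parameters: one discriminative, one not.
jitter_like  = rng.normal(loc=labels * 1.5, scale=1.0)
shimmer_like = rng.normal(size=n)

for name, values in [("jitter-like", jitter_like), ("shimmer-like", shimmer_like)]:
    print(f"{name}: overlap risk = {overlap_risk(values, labels):.3f}")
```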


2014 ◽  
Vol 2014 ◽  
pp. 1-17 ◽  
Author(s):  
Jieming Yang ◽  
Zhaoyang Qu ◽  
Zhiying Liu

Filtering feature-selection algorithms are an important approach to dimensionality reduction in the field of text categorization. Most filtering feature-selection algorithms evaluate the significance of a feature for a category on the assumption of a balanced dataset and do not consider the imbalance of the dataset. In this paper, a new scheme is proposed that can weaken the adverse effect caused by the imbalance of the corpus. We evaluated the improved versions of nine well-known feature-selection methods (Information Gain, Chi statistic, Document Frequency, Orthogonal Centroid Feature Selection, DIA association factor, Comprehensive Measurement Feature Selection, Deviation from Poisson Feature Selection, improved Gini index, and Mutual Information) using naïve Bayes and support vector machines on three benchmark document collections (20-Newsgroups, Reuters-21578, and WebKB). The experimental results show that the improved scheme can significantly enhance the performance of the feature-selection methods.
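A baseline version of such a filtering setup, without the proposed imbalance correction, might look like the sketch below; mutual information is used as an Information Gain stand-in, and only two 20-Newsgroups categories are loaded to keep the example small.

```python
# Score terms with a standard filter, keep the top-scoring ones, and train
# naive Bayes / SVM classifiers on the reduced term space.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Two 20-Newsgroups categories as a small, reproducible example corpus.
data = fetch_20newsgroups(subset="train",
                          categories=["sci.med", "rec.autos"],
                          remove=("headers", "footers", "quotes"))

for clf in (MultinomialNB(), LinearSVC()):
    model = make_pipeline(CountVectorizer(max_features=2000),
                          SelectKBest(mutual_info_classif, k=300),
                          clf)
    acc = cross_val_score(model, data.data, data.target, cv=3).mean()
    print(type(clf).__name__, f"accuracy = {acc:.3f}")
```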


2013 ◽  
Vol 23 (05) ◽  
pp. 1350020 ◽  
Author(s):  
DANIEL ÁLVAREZ ◽  
ROBERTO HORNERO ◽  
J. VÍCTOR MARCOS ◽  
NIELS WESSEL ◽  
THOMAS PENZEL ◽  
...  

This study is aimed at assessing the usefulness of different feature selection and classification methodologies in the context of sleep apnea hypopnea syndrome (SAHS) detection. Feature extraction, selection and classification stages were applied to analyze blood oxygen saturation (SaO2) recordings in order to simplify polysomnography (PSG), the gold standard diagnostic methodology for SAHS. Statistical, spectral and nonlinear measures were computed to compose the initial feature set. Principal component analysis (PCA), forward stepwise feature selection (FSFS) and genetic algorithms (GAs) were applied to select feature subsets. Fisher's linear discriminant (FLD), logistic regression (LR) and support vector machines (SVMs) were applied in the classification stage. Optimum classification algorithms from each combination of these feature selection and classification approaches were prospectively validated on datasets from two independent sleep units. FSFS + LR achieved the highest diagnostic performance using a small feature subset (4 features), reaching 83.2% accuracy in the validation set and 88.7% accuracy in the test set. Similarly, GAs + SVM also achieved high generalization capability using a small number of input features (7 features), with 84.2% accuracy on the validation set and 84.5% accuracy in the test set. Our results suggest that reduced subsets of complementary features (25% to 50% of total features) and classifiers with high generalization ability could provide high-performance screening tools in the context of SAHS.
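The FSFS + LR combination can be sketched with scikit-learn's greedy forward selector, as below; the synthetic feature matrix, the subset size of four, and the scoring setup are illustrative assumptions rather than the study's implementation.

```python
# Forward stepwise selection of a small feature subset, scored by
# cross-validated logistic regression, then a final LR classifier.
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 16))        # placeholder statistical/spectral/nonlinear features
y = rng.integers(0, 2, size=300)      # placeholder SAHS-positive / negative labels

model = make_pipeline(
    StandardScaler(),
    SequentialFeatureSelector(LogisticRegression(max_iter=1000),
                              n_features_to_select=4,     # small subset, as reported
                              direction="forward", cv=5),
    LogisticRegression(max_iter=1000),
)
print("CV accuracy:", cross_val_score(model, X, y, cv=5).mean().round(3))
```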


2020 ◽  
Vol 16 (2) ◽  
pp. 155014772090523
Author(s):  
ZhenLong Li ◽  
HaoXin Wang ◽  
YaoWei Zhang ◽  
XiaoHua Zhao

A method for drunk driving detection using feature selection based on the random forest is proposed. First, driving behavior data were collected using a driving simulator at Beijing University of Technology. Second, features were selected according to their Feature Importance in the random forest. Third, a dummy variable was introduced to encode the geometric characteristics of different roads so that drunk driving under different road conditions can be detected with the same random-forest-based classifier. Finally, linear discriminant analysis, support vector machine, and AdaBoost classifiers were used and compared with the random forest. The accuracy, F1 score, receiver operating characteristic curve, and area under the curve were used to evaluate the performance of the classifiers. The results show that Accelerator Depth, Speed, Distance to the Center of the Lane, Acceleration, Engine Revolution, Brake Depth, and Steering Angle have an important influence on identifying the drivers' states and can be used to detect drunk driving. Specifically, classifiers with Accelerator Depth outperformed the same classifiers without it, which indicates that Accelerator Depth is an important feature. Both the AdaBoost and random forest classifiers achieved an accuracy of 81.48%, which verifies the effectiveness of the proposed method.
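A simplified sketch of the described workflow, with placeholder driving-simulator features and labels: features are ranked by random-forest importance, the top-ranked ones are kept, and LDA, SVM, AdaBoost and the random forest itself are compared on the reduced set.

```python
# Random-forest feature importance for selection, then classifier comparison.
import numpy as np
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
feature_names = ["accelerator_depth", "speed", "dist_to_lane_center", "acceleration",
                 "engine_rev", "brake_depth", "steering_angle", "road_dummy"]
X = rng.normal(size=(500, len(feature_names)))   # placeholder simulator data
y = rng.integers(0, 2, size=500)                 # placeholder sober / drunk labels

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
order = np.argsort(forest.feature_importances_)[::-1]
top = order[:6]                                   # keep the most important features
print("selected:", [feature_names[i] for i in top])

for clf in (LinearDiscriminantAnalysis(), SVC(), AdaBoostClassifier(),
            RandomForestClassifier(n_estimators=200, random_state=0)):
    acc = cross_val_score(clf, X[:, top], y, cv=5).mean()
    print(type(clf).__name__, f"accuracy = {acc:.2f}")
```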

