Classification of Imbalanced Data Represented as Binary Features

Classification is typically conducted on a dataset that consists of numerical features and target classes. For instance, a grayscale image is usually represented as a matrix of integers ranging from 0 to 255, which allows various classification algorithms to be applied to image classification tasks. Datasets represented as binary features, however, are not handled optimally by many standard machine learning algorithms, even though such datasets are far from rare. Separately, when a classification dataset is imbalanced, oversampling algorithms such as the synthetic minority oversampling technique (SMOTE) and its variants are often used. Because SMOTE and its variants synthesize new minority samples from the original samples, the diversity of samples synthesized from binary features is highly limited by the poor representation of the original features. To solve this problem, a preprocessing approach is studied: binary features are converted into numerical ones using feature extraction methods, so that subsequent oversampling methods can realize their full potential in improving classifier performance. Comprehensive experiments on benchmark datasets and real medical datasets showed that a converted dataset consisting of numerical features is better suited to oversampling methods (the maximum improvements in accuracy and F1-score were 35.11% and 42.17%, respectively). In addition, the experiments confirmed that feature extraction and oversampling contribute synergistically to the improvement of classification performance.
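The limited diversity described in this abstract can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' code: the function `smote_like`, the toy minority set, and the parameters `k` and `seed` are all assumptions made for illustration. Each synthetic point lies on the line segment between a minority sample and one of its nearest minority neighbours, so any binary feature on which the minority samples agree can never vary.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n_new, k=2, seed=0):
    """Interpolate between a minority sample and one of its k nearest
    minority neighbours (a simplified, hypothetical SMOTE variant)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: squared_distance(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy binary minority class: features 0 and 3 are identical in every sample.
binary_minority = [[1, 0, 1, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
new_points = smote_like(binary_minority, n_new=5)
for point in new_points:
    print(point)
```

In this toy run the first and last coordinates of every synthetic point stay fixed at 1 and 0, which is one way to read the claim that binary features constrain what interpolation can produce.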

Genetic Programming with Image-Related Operators and A Flexible Program Structure for Feature Learning in Image Classification

10.26686/wgtn.13158323.v1 ◽

2020 ◽

Author(s):

Ying Bi ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Feature Extraction ◽

Genetic Programming ◽

Image Classification ◽

Domain Knowledge ◽

Extraction Methods ◽

Classification Performance ◽

Model Complexity ◽

Program Structure ◽

New Approach ◽

Classification Tasks

Feature extraction is essential for solving image classification tasks because it transforms low-level pixel values into high-level features. However, extracting effective features from images is challenging due to high variations across images in scale, rotation, illumination, and background. Existing methods often have a fixed model complexity and require domain expertise. Genetic programming with a flexible representation can find the best solution without the use of domain knowledge. This paper proposes a new genetic programming-based approach to automatically learning informative features for different image classification tasks. In the new approach, a number of image-related operators, including filters, pooling operators and feature extraction methods, are employed as functions. A flexible program structure is developed to integrate different functions and terminals into a single tree/solution. The new approach can evolve solutions of variable depths to extract various numbers and types of features from the images. The new approach is examined on 12 different image classification tasks of varying difficulty and compared with a large number of effective algorithms. The results show that the new approach achieves better classification performance than most benchmark methods. The analysis of the evolved programs/solutions and the visualisation of the learned features provide deep insights into the proposed approach.
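The idea of combining image-related operators into a single tree can be sketched as follows. This is only an illustration under stated assumptions: the operator names (`max_pool`, `mean`, `concat`), the nested-tuple program encoding, and the `evaluate` helper are hypothetical and are not taken from the paper, and no evolutionary search is shown.

```python
def max_pool(image, size=2):
    """Non-overlapping max pooling over size x size blocks."""
    rows, cols = len(image), len(image[0])
    return [
        [
            max(image[r + dr][c + dc] for dr in range(size) for dc in range(size))
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def mean_feature(image):
    """Reduce an image to a one-element feature vector (its mean)."""
    values = [v for row in image for v in row]
    return [sum(values) / len(values)]


def concat(left, right):
    """Join two feature vectors into one."""
    return left + right


# Hypothetical function set; the terminal "img" stands for the input image.
OPERATORS = {"max_pool": max_pool, "mean": mean_feature, "concat": concat}


def evaluate(node, image):
    """Recursively evaluate a program tree encoded as nested tuples."""
    if node == "img":
        return image
    name, *children = node
    return OPERATORS[name](*(evaluate(child, image) for child in children))


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
program = ("concat", ("mean", ("max_pool", "img")), ("mean", "img"))
features = evaluate(program, image)
print(features)  # [10.0, 7.5]
```

Because subtrees can be nested to any depth and `concat` can appear any number of times, trees of this kind can yield different numbers and types of features, which is the flexibility the abstract refers to.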

Genetic Programming with Image-Related Operators and A Flexible Program Structure for Feature Learning in Image Classification

10.26686/wgtn.13158323 ◽

2020 ◽

Author(s):

Ying Bi ◽

Bing Xue ◽

Mengjie Zhang

Keyword(s):

Feature Extraction ◽

Genetic Programming ◽

Image Classification ◽

Domain Knowledge ◽

Extraction Methods ◽

Classification Performance ◽

Model Complexity ◽

Program Structure ◽

New Approach ◽

Classification Tasks

Analysis of PCA Based Feature Extraction Methods for Classification of Hyperspectral Image

2019 2nd International Conference on Innovation in Engineering and Technology (ICIET) ◽

10.1109/iciet48527.2019.9290629 ◽

2019 ◽

Author(s):

U. A. Md. Ehsan Ali ◽

Md. Ali Hossain ◽

Md. Rashedul Islam

Keyword(s):

Feature Extraction ◽

Hyperspectral Image ◽

Extraction Methods

Feature Extraction and Classification of Citrus Juice by Using an Enhanced L-KSVD on Data Obtained from Electronic Nose

Sensors ◽

10.3390/s19040916 ◽

2019 ◽

Vol 19 (4) ◽

pp. 916 ◽

Cited By ~ 2

Author(s):

Wen Cao ◽

Chunmei Liu ◽

Pengfei Jia

Keyword(s):

Feature Extraction ◽

Kernel Function ◽

Electronic Nose ◽

Classification Accuracy ◽

Extraction Methods ◽

Object Function ◽

Optimal Value ◽

Processed Products

Aroma plays a significant role in the quality of citrus fruits and processed products. Citrus volatiles can be detected and analyzed with an electronic nose (E-nose); in this paper, an E-nose is employed to classify juice that has been stored for different numbers of days. Feature extraction and classification are two important requirements for an E-nose. During training, a classifier can optimize its own parameters to achieve better classification accuracy, but it cannot choose its input data, which are produced by feature extraction methods, so the classification result is not always ideal. Label consistent KSVD (L-KSVD) is a novel technique that extracts features and classifies the data at the same time, which can improve classification accuracy. We propose an enhanced L-KSVD, called E-LCKSVD, for the E-nose. In E-LCKSVD, we introduce a kernel function into the traditional L-KSVD and present a new technique for initializing its dictionary; finally, the weighted coefficients of the different parts of its objective function are studied, and enhanced quantum-behaved particle swarm optimization (EQPSO) is employed to optimize these coefficients. In the experiments, we first find that the classification accuracy of KSVD and L-KSVD is improved with the help of the kernel function, which shows that their ability to deal with nonlinear data is improved. Then, we compare the results of different dictionary initialization techniques and show that our proposed method is better. Finally, we find the optimal values of the weighted coefficients of the objective function of E-LCKSVD, which enable the E-nose to reach better performance.

Machine learning, waveform preprocessing and feature extraction methods for classification of acoustic startle waveforms

MethodsX ◽

10.1016/j.mex.2020.101166 ◽

2021 ◽

Vol 8 ◽

pp. 101166

Author(s):

Timothy J. Fawcett ◽

Chad S. Cooper ◽

Ryan J. Longenecker ◽

Joseph P. Walton

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Acoustic Startle ◽

Extraction Methods

ECG Beats Classification Using Mixture of Features

International Scholarly Research Notices ◽

10.1155/2014/178436 ◽

2014 ◽

Vol 2014 ◽

pp. 1-12 ◽

Cited By ~ 21

Author(s):

Manab Kumar Das ◽

Samit Ari

Keyword(s):

Feature Extraction ◽

Extraction Methods ◽

Ectopic Beat ◽

Extraction Techniques ◽

Efficient System ◽

Ecg Signals ◽

Temporal Features ◽

S Transform ◽

Electrocardiogram Ecg

Classification of electrocardiogram (ECG) signals plays an important role in the clinical diagnosis of heart disease. This paper proposes the design of an efficient system for classifying the normal beat (N), ventricular ectopic beat (V), supraventricular ectopic beat (S), fusion beat (F), and unknown beat (Q) using a mixture of features. Two different feature extraction methods are proposed for classification of ECG beats: (i) S-transform (ST) based features along with temporal features and (ii) a mixture of ST and WT based features along with temporal features. The extracted feature set is independently classified using a multilayer perceptron neural network (MLP-NN). The performances are evaluated on several normal and abnormal ECG signals from 44 recordings of the MIT-BIH arrhythmia database. In this work, the performances of three feature extraction techniques with the MLP-NN classifier are compared using the five classes of ECG beat recommended by the AAMI (Association for the Advancement of Medical Instrumentation) standards. The average sensitivities of the proposed feature extraction technique for N, S, F, V, and Q are 95.70%, 78.05%, 49.60%, 89.68%, and 33.89%, respectively. The experimental results demonstrate that the proposed feature extraction techniques perform better than other existing feature extraction techniques.
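Per-class sensitivity, the metric reported in this abstract, is the fraction of beats of a given true class that the classifier labels correctly (true positives divided by true positives plus false negatives). The sketch below shows the computation on made-up labels; the helper name `per_class_sensitivity` and the toy data are assumptions and bear no relation to the MIT-BIH results quoted above.

```python
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Return {class: TP / (TP + FN)} for every class present in y_true."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / totals[label] for label in totals}


# Hypothetical beat labels using the five AAMI class symbols.
y_true = ["N", "N", "N", "N", "V", "V", "S", "S", "F", "Q"]
y_pred = ["N", "N", "N", "V", "V", "V", "S", "N", "N", "Q"]

scores = per_class_sensitivity(y_true, y_pred)
average = sum(scores.values()) / len(scores)
print(scores)
print(round(average, 2))
```

Reporting sensitivity per class, rather than overall accuracy alone, keeps rare classes such as F and Q visible even when normal beats dominate the recordings.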

Performance Analysis of Machine Learning Algorithms and Feature Extraction Methods for Sentiment Analysis

10.1109/icses52305.2021.9633882 ◽

2021 ◽

Author(s):

Anshumaan Chauhan ◽

Ayushi Agarwal ◽

Razia Sulthana

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Performance Analysis ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms

Headnote Prediction Using Machine Learning

The International Arab Journal of Information Technology ◽

10.34028/iajit/18/5/7 ◽

2021 ◽

Vol 18 (5) ◽

Author(s):

Sarmad Mahar ◽

Sahar Zafar ◽

Kamran Nishat

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Active Learning ◽

Text Classification ◽

Extraction Methods ◽

Text Summarization ◽

Training Data ◽

Second Step ◽

Support Vector ◽

Classification Algorithms

Headnotes are precise explanations and summaries of the legal points in an issued judgment. Law journals hire experienced lawyers to write these headnotes, which help the reader quickly determine the issue discussed in the case. A headnote comprises two parts: the first states the topic discussed in the judgment, and the second contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without human involvement. We divide this task into a two-step process. In the first step, we predict the law points used in the judgment by using text classification algorithms. The second step generates a summary of the judgment using text summarization techniques. To achieve this task, we created a databank by extracting data from different law sources in Pakistan, and we labelled training data generated from Pakistani law websites. We tested different feature extraction methods on judiciary data to improve our system. Using these feature extraction methods, we developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy by using Linear Support Vector Classification with tri-grams and without a stemmer. Using active learning, our system can continuously improve its accuracy as users of the system provide more labelled examples.
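The tri-gram features mentioned in this abstract can be sketched as below. The abstract does not say whether the tri-grams are word-level or character-level, so this example assumes word-level tri-grams; the function name `word_trigrams`, the whitespace tokenization, and the sample sentence are likewise assumptions. No stemming is applied, and the classifier itself is not shown.

```python
from collections import Counter


def word_trigrams(text):
    """Count overlapping word tri-grams after lower-casing and
    whitespace tokenization (no stemming)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:], tokens[2:]))


# Hypothetical sentence standing in for a judgment excerpt.
counts = word_trigrams("The court held that the appeal fails")
for trigram, count in counts.items():
    print(" ".join(trigram), count)
```

Counts of this kind would typically be turned into a sparse vector per document before being passed to a linear classifier.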

A Comparison of Feature Extraction Methods for the Classification of Dynamic Activities From Accelerometer Data

IEEE Transactions on Biomedical Engineering ◽

10.1109/tbme.2008.2006190 ◽

2009 ◽

Vol 56 (3) ◽

pp. 871-879 ◽

Cited By ~ 304

Author(s):

Stephen J. Preece ◽

John Yannis Goulermas ◽

Laurence P. J. Kenney ◽

David Howard

Keyword(s):

Feature Extraction ◽

Extraction Methods ◽

Accelerometer Data

Superpixel-Based Minimum Noise Fraction Feature Extraction for Classification of Hyperspectral Images

Traitement du signal ◽

10.18280/ts.370514 ◽

2020 ◽

Vol 37 (5) ◽

pp. 812-822

Author(s):

Behnam Asghari Beirami ◽

Mehdi Mokhtarzade

Keyword(s):

Feature Extraction ◽

Extraction Methods ◽

Hyperspectral Images ◽

Support Vector ◽

Minimum Noise Fraction ◽

Vector Machines ◽

Noise Covariance ◽

Noise Fraction ◽

Minimum Noise

In this paper, a novel feature extraction technique called SuperMNF is proposed, which is an extension of the minimum noise fraction (MNF) transformation. In SuperMNF, each superpixel has its own transformation matrix, and the MNF transformation is performed on each superpixel individually. The basic idea behind SuperMNF is that each superpixel has its own signal and noise covariance matrices, which differ from those of the adjacent superpixels. The extracted features, which carry spatial-spectral content in a lower dimension, are classified by a maximum likelihood classifier and by support vector machines. Experiments conducted on two real hyperspectral images, Indian Pines and Pavia University, demonstrate the efficiency of SuperMNF, since it yielded more promising results than several other feature extraction methods (MNF, PCA, SuperPCA, KPCA, and MMP).
