scholarly journals Classification of Imbalanced Data Represented as Binary Features

2021 ◽  
Vol 11 (17) ◽  
pp. 7825
Author(s):  
Kunti Robiatul Mahmudah ◽  
Fatma Indriani ◽  
Yukiko Takemori-Sakai ◽  
Yasunori Iwata ◽  
Takashi Wada ◽  
...  

Typically, classification is conducted on a dataset that consists of numerical features and target classes. For instance, a grayscale image, which is usually represented as a matrix of integers varying from 0 to 255, enables one to apply various classification algorithms to image classification tasks. However, datasets represented as binary features cannot use many standard machine learning algorithms optimally, yet their amount is not negligible. On the other hand, oversampling algorithms such as synthetic minority oversampling technique (SMOTE) and its variants are often used if the dataset for classification is imbalanced. However, since SMOTE and its variants synthesize new minority samples based on the original samples, the diversity of the samples synthesized from binary features is highly limited due to the poor representation of original features. To solve this problem, a preprocessing approach is studied. By converting binary features into numerical ones using feature extraction methods, succeeding oversampling methods can fully display their potential in improving the classifiers’ performances. Through comprehensive experiments using benchmark datasets and real medical datasets, it was observed that a converted dataset consisting of numerical features is better for oversampling methods (maximum improvements of accuracy and F1-score were 35.11% and 42.17%, respectively). In addition, it is confirmed that feature extraction and oversampling synergistically contribute to the improvement of classification performance.

2020 ◽  
Author(s):  
Ying Bi ◽  
Bing Xue ◽  
Mengjie Zhang

IEEE Feature extraction is essential for solving image classification by transforming low-level pixel values into high-level features. However, extracting effective features from images is challenging due to high variations across images in scale, rotation, illumination, and background. Existing methods often have a fixed model complexity and require domain expertise. Genetic programming with a flexible representation can find the best solution without the use of domain knowledge. This paper proposes a new genetic programming-based approach to automatically learning informative features for different image classification tasks. In the new approach, a number of image-related operators, including filters, pooling operators and feature extraction methods, are employed as functions. A flexible program structure is developed to integrate different functions and terminals into a single tree/solution. The new approach can evolve solutions of variable depths to extract various numbers and types of features from the images. The new approach is examined on 12 different image classification tasks of varying difficulty and compared with a large number of effective algorithms. The results show that the new approach achieves better classification performance than most benchmark methods. The analysis of the evolved programs/solutions and the visualisation of the learned features provide deep insights on the proposed approach.


2020 ◽  
Author(s):  
Ying Bi ◽  
Bing Xue ◽  
Mengjie Zhang

IEEE Feature extraction is essential for solving image classification by transforming low-level pixel values into high-level features. However, extracting effective features from images is challenging due to high variations across images in scale, rotation, illumination, and background. Existing methods often have a fixed model complexity and require domain expertise. Genetic programming with a flexible representation can find the best solution without the use of domain knowledge. This paper proposes a new genetic programming-based approach to automatically learning informative features for different image classification tasks. In the new approach, a number of image-related operators, including filters, pooling operators and feature extraction methods, are employed as functions. A flexible program structure is developed to integrate different functions and terminals into a single tree/solution. The new approach can evolve solutions of variable depths to extract various numbers and types of features from the images. The new approach is examined on 12 different image classification tasks of varying difficulty and compared with a large number of effective algorithms. The results show that the new approach achieves better classification performance than most benchmark methods. The analysis of the evolved programs/solutions and the visualisation of the learned features provide deep insights on the proposed approach.


Sensors ◽  
2019 ◽  
Vol 19 (4) ◽  
pp. 916 ◽  
Author(s):  
Wen Cao ◽  
Chunmei Liu ◽  
Pengfei Jia

Aroma plays a significant role in the quality of citrus fruits and processed products. The detection and analysis of citrus volatiles can be measured by an electronic nose (E-nose); in this paper, an E-nose is employed to classify the juice which is stored for different days. Feature extraction and classification are two important requirements for an E-nose. During the training process, a classifier can optimize its own parameters to achieve a better classification accuracy but cannot decide its input data which is treated by feature extraction methods, so the classification result is not always ideal. Label consistent KSVD (L-KSVD) is a novel technique which can extract the feature and classify the data at the same time, and such an operation can improve the classification accuracy. We propose an enhanced L-KSVD called E-LCKSVD for E-nose in this paper. During E-LCKSVD, we introduce a kernel function to the traditional L-KSVD and present a new initialization technique of its dictionary; finally, the weighted coefficients of different parts of its object function is studied, and enhanced quantum-behaved particle swarm optimization (EQPSO) is employed to optimize these coefficients. During the experimental section, we firstly find the classification accuracy of KSVD, and L-KSVD is improved with the help of the kernel function; this can prove that their ability of dealing nonlinear data is improved. Then, we compare the results of different dictionary initialization techniques and prove our proposed method is better. Finally, we find the optimal value of the weighted coefficients of the object function of E-LCKSVD that can make E-nose reach a better performance.


MethodsX ◽  
2021 ◽  
Vol 8 ◽  
pp. 101166
Author(s):  
Timothy J. Fawcett ◽  
Chad S. Cooper ◽  
Ryan J. Longenecker ◽  
Joseph P. Walton

2014 ◽  
Vol 2014 ◽  
pp. 1-12 ◽  
Author(s):  
Manab Kumar Das ◽  
Samit Ari

Classification of electrocardiogram (ECG) signals plays an important role in clinical diagnosis of heart disease. This paper proposes the design of an efficient system for classification of the normal beat (N), ventricular ectopic beat (V), supraventricular ectopic beat (S), fusion beat (F), and unknown beat (Q) using a mixture of features. In this paper, two different feature extraction methods are proposed for classification of ECG beats: (i) S-transform based features along with temporal features and (ii) mixture of ST and WT based features along with temporal features. The extracted feature set is independently classified using multilayer perceptron neural network (MLPNN). The performances are evaluated on several normal and abnormal ECG signals from 44 recordings of the MIT-BIH arrhythmia database. In this work, the performances of three feature extraction techniques with MLP-NN classifier are compared using five classes of ECG beat recommended by AAMI (Association for the Advancement of Medical Instrumentation) standards. The average sensitivity performances of the proposed feature extraction technique for N, S, F, V, and Q are 95.70%, 78.05%, 49.60%, 89.68%, and 33.89%, respectively. The experimental results demonstrate that the proposed feature extraction techniques show better performances compared to other existing features extraction techniques.


Author(s):  
Sarmad Mahar ◽  
Sahar Zafar ◽  
Kamran Nishat

Headnotes are the precise explanation and summary of legal points in an issued judgment. Law journals hire experienced lawyers to write these headnotes. These headnotes help the reader quickly determine the issue discussed in the case. Headnotes comprise two parts. The first part comprises the topic discussed in the judgment, and the second part contains a summary of that judgment. In this thesis, we design, develop and evaluate headnote prediction using machine learning, without involving human involvement. We divided this task into a two steps process. In the first step, we predict law points used in the judgment by using text classification algorithms. The second step generates a summary of the judgment using text summarization techniques. To achieve this task, we created a Databank by extracting data from different law sources in Pakistan. We labelled training data generated based on Pakistan law websites. We tested different feature extraction methods on judiciary data to improve our system. Using these feature extraction methods, we developed a dictionary of terminology for ease of reference and utility. Our approach achieves 65% accuracy by using Linear Support Vector Classification with tri-gram and without stemmer. Using active learning our system can continuously improve the accuracy with the increased labelled examples provided by the users of the system.


2009 ◽  
Vol 56 (3) ◽  
pp. 871-879 ◽  
Author(s):  
Stephen J. Preece ◽  
John Yannis Goulermas ◽  
Laurence P. J. Kenney ◽  
David Howard

2020 ◽  
Vol 37 (5) ◽  
pp. 812-822
Author(s):  
Behnam Asghari Beirami ◽  
Mehdi Mokhtarzade

In this paper, a novel feature extraction technique called SuperMNF is proposed, which is an extension of the minimum noise fraction (MNF) transformation. In SuperMNF, each superpixel has its own transformation matrix and MNF transformation is performed on each superpixel individually. The basic idea behind the SuperMNF is that each superpixel contains its specific signal and noise covariance matrices which are different from the adjacent superpixels. The extracted features, owning spatial-spectral content and provided in the lower dimension, are classified by maximum likelihood classifier and support vector machines. Experiments that are conducted on two real hyperspectral images, named Indian Pines and Pavia University, demonstrate the efficiency of SuperMNF since it yielded more promising results than some other feature extraction methods (MNF, PCA, SuperPCA, KPCA, and MMP).


Sign in / Sign up

Export Citation Format

Share Document