Feature selected cost-sensitive twin SVM for imbalanced data

2020 ◽  
Vol 309 ◽  
pp. 05013
Author(s):  
Xiaopeng Li ◽  
Xianrong Zhang

In this paper, we propose a cost-sensitive twin SVM (CS-TWSVM) and apply it to imbalanced data. Each instance is assigned a weight according to its misclassification cost, which is related to its position. In the preprocessing step, features are scored by the difference between the majority and minority classes, and a feature is selected when its difference value is higher than the average. Experiments are conducted on UCI datasets, with G-mean, AUC, and accuracy as evaluation metrics. The experimental results show that feature selection with CS-TWSVM is useful for high-dimensional datasets.
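The selection rule in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not define the "difference" measure, so the sketch assumes it is the absolute gap between the per-class feature means. The function name `select_features_by_class_difference` and the toy data are hypothetical.

```python
import numpy as np

def select_features_by_class_difference(X, y, minority_label=1):
    """Return indices of features whose class gap exceeds the average gap.

    Assumption (not stated in the abstract): the per-feature "difference"
    is the absolute gap between the majority-class and minority-class means.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_mean = X[y == minority_label].mean(axis=0)
    majority_mean = X[y != minority_label].mean(axis=0)
    diff = np.abs(majority_mean - minority_mean)
    # Keep a feature only when its difference is above the average difference.
    return np.flatnonzero(diff > diff.mean())

# Toy imbalanced data: three majority samples, one minority sample.
X = np.array([[0.0, 1.0, 5.0],
              [0.1, 1.1, 5.2],
              [0.2, 0.9, 4.8],
              [3.0, 1.0, 5.1]])
y = np.array([0, 0, 0, 1])
selected = select_features_by_class_difference(X, y)
```

On this toy data only the first feature separates the two classes, so it is the only one kept. With real data, features would normally be scaled first so that differences are comparable across features.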

IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Mohammed Qaraad ◽  
Souad Amjad ◽  
Ibrahim I.M. Manhrawy ◽  
Hanaa Fathi ◽  
Bayoumi A. Hassan ◽  
...  

2017 ◽  
Vol 9 (1) ◽  
pp. 168781401668529 ◽  
Author(s):  
Sheng-wei Fei

This article proposes a bearing fault diagnosis method based on a relevance vector machine classifier with an improved binary bat algorithm, in which the improved binary bat algorithm selects the appropriate features and the kernel parameter of the relevance vector machine. The improved binary bat algorithm introduces a new velocity update method for the bats. It decreases the probability of changing an element of a bat’s position vector when that element equals the corresponding element of the current best location, and increases the probability when the element differs, which helps strengthen the optimization ability of the binary bat algorithm. The traditional relevance vector machine, trained on the training samples with unreduced features, serves as the baseline for the proposed improved binary bat algorithm–relevance vector machine method. The experimental results indicate that the improved binary bat algorithm–relevance vector machine has a stronger bearing fault diagnosis ability than this baseline, and that bearing fault diagnosis based on the improved binary bat algorithm–relevance vector machine is feasible.


Author(s):  
Sébastien Gadat

Variable selection for classification is a crucial paradigm in image analysis. Indeed, images are generally described by a large number of features (pixels, edges …), although it is difficult to obtain enough samples to draw reliable inferences for classification using the full set of features. The authors describe in this chapter some simple and effective feature selection methods based on the filter strategy. They also provide some more sophisticated methods based on a margin criterion or stochastic approximation techniques that achieve strong classification performance with a very small proportion of variables. Most of these “wrapper” methods are dedicated to a specific classifier, except the Optimal Features Weighting algorithm (denoted OFW in the sequel), which is a meta-algorithm and works with any classifier. A large part of this chapter is dedicated to the description of OFW and hybrid OFW algorithms. The authors also illustrate several other methods on practical examples of face detection problems.


Genes ◽  
2020 ◽  
Vol 11 (7) ◽  
pp. 717
Author(s):  
Garba Abdulrauf Sharifai ◽  
Zurinahni Zainol

Training a machine learning algorithm on an imbalanced data set is an inherently challenging task. It becomes more demanding with limited samples and a massive number of features (high dimensionality). High dimensional and imbalanced data sets pose severe challenges in many real-world applications, such as biomedical data sets. Numerous researchers have investigated either imbalanced class or high dimensional data sets and proposed various methods. Nonetheless, few approaches reported in the literature address the intersection of the high dimensional and imbalanced class problems, owing to their complicated interactions. Lately, feature selection has become a well-known technique for overcoming this problem by selecting discriminative features that represent both the minority and majority classes. This paper proposes a new method called Robust Correlation Based Redundancy and Binary Grasshopper Optimization Algorithm (rCBR-BGOA); rCBR-BGOA employs an ensemble of multi-filters coupled with the Correlation-Based Redundancy method to select optimal feature subsets. A binary Grasshopper optimisation algorithm (BGOA) is used to formulate the feature selection process as an optimisation problem and select the best (near-optimal) combination of features from the majority and minority classes. The obtained results, supported by statistical analysis, indicate that rCBR-BGOA can improve the classification performance for high dimensional and imbalanced datasets in terms of the G-mean and the Area Under the Curve (AUC) performance metrics.
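The two metrics named in the abstract above are standard for imbalanced classification, and a small sketch shows why they are preferred to accuracy. G-mean is taken here as the square root of sensitivity times specificity, and AUC is computed through its rank interpretation (the probability that a random positive is scored above a random negative, with ties counted as one half). The function names and example labels are illustrative and not taken from the paper.

```python
import math

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (recall on positives) and specificity."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)

def auc(y_true, scores, positive=1):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample from each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# A classifier that always predicts the majority class on an 8:2 data set.
y_true = [0] * 8 + [1] * 2
always_majority = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_majority)) / len(y_true)
gm = g_mean(y_true, always_majority)

# AUC on a small scored example.
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

The always-majority classifier reaches 80% accuracy while its G-mean is zero, because it never detects the minority class.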


2020 ◽  
Vol 2020 ◽  
pp. 1-12
Author(s):  
Yanjuan Li ◽  
Zitong Zhang ◽  
Zhixia Teng ◽  
Xiaoyan Liu

Amyloid is generally an aggregate of insoluble fibrin; its abnormal deposition is the pathogenic mechanism of various diseases, such as Alzheimer’s disease and type II diabetes. Therefore, accurately identifying amyloid is necessary to understand its role in pathology. We propose a machine learning-based prediction model called PredAmyl-MLP, which consists of three steps: feature extraction, feature selection, and classification. In the feature extraction step, seven feature extraction algorithms and different combinations of them are investigated, and the combination of SVMProt-188D and tripeptide composition (TPC) is selected according to the experimental results. In the feature selection step, maximum relevant maximum distance (MRMD) and binomial distribution (BD) are each used to remove redundant or noisy features, and the appropriate features are selected according to the experimental results. In the classification step, we employ a multilayer perceptron (MLP) to train the prediction model. The 10-fold cross-validation results show that the overall accuracy of PredAmyl-MLP reached 91.59%, and that its performance was better than that of existing methods.
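To make the feature extraction step above more concrete, the sketch below computes a tripeptide composition (TPC) vector under its usual definition: the relative frequency of each of the 20³ = 8000 possible tripeptides in a protein sequence. It is not the PredAmyl-MLP code. The function name `tripeptide_composition`, the handling of non-standard residues (windows containing them are skipped), and the example sequence are assumptions made for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
TRIPEPTIDES = ["".join(t) for t in product(AMINO_ACIDS, repeat=3)]
INDEX = {tri: i for i, tri in enumerate(TRIPEPTIDES)}

def tripeptide_composition(sequence):
    """Return an 8000-dimensional vector of tripeptide frequencies.

    Assumption: windows containing a non-standard residue are skipped,
    and counts are normalised by the number of windows actually counted.
    """
    seq = sequence.upper()
    counts = [0.0] * len(TRIPEPTIDES)
    windows = 0
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1.0
            windows += 1
    if windows == 0:
        return counts  # sequence too short or no valid window: all zeros
    return [c / windows for c in counts]

# Hypothetical toy sequence with four overlapping windows: ACD, CDA, DAC, ACD.
features = tripeptide_composition("ACDACD")
```

A vector like this would then be concatenated with the other descriptors, reduced by a feature selection step, and passed to a classifier such as an MLP.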

