AGNES-SMOTE: An Oversampling Algorithm Based on Hierarchical Clustering and Improved SMOTE

To address the low classification accuracy on imbalanced datasets, an oversampling algorithm, AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), based on hierarchical clustering and improved SMOTE is proposed. Its key procedures are: hierarchically clustering the majority samples and the minority samples separately; dividing the minority subclusters on the basis of the obtained majority subclusters; selecting a “seed sample” based on the sampling weight and probability distribution of each minority subcluster; and restricting the generation of new samples to a certain area by the centroid method during sampling. AGNES-SMOTE is combined with an SVM (Support Vector Machine) to classify imbalanced datasets. Experiments on UCI datasets compare its performance with that of other algorithms from the literature. The results indicate that AGNES-SMOTE excels at synthesizing new samples and improves SVM classification performance on imbalanced datasets.
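
The interpolation step that SMOTE-style oversamplers share can be sketched as follows. This is a minimal illustration of classic SMOTE only, not the AGNES-SMOTE procedure: the clustering, the weighted seed selection, and the centroid restriction described above are omitted, and the function name `smote_interpolate`, the uniform seed choice, and the parameter `k` are assumptions made for the example.

```python
import numpy as np

def smote_interpolate(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by classic SMOTE interpolation."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    # Pairwise Euclidean distances between minority samples.
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)            # a point is not its own neighbor
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = np.empty((n_new, minority.shape[1]))
    for j in range(n_new):
        i = rng.integers(len(minority))       # uniform seed choice (assumption)
        nb = neighbors[i, rng.integers(k)]    # one of the k nearest neighbors
        gap = rng.random()                    # position along the segment, in [0, 1)
        synthetic[j] = minority[i] + gap * (minority[nb] - minority[i])
    return synthetic

minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
new_points = smote_interpolate(minority, n_new=4)
```

Each synthetic point lies on a segment between two existing minority samples, so it stays inside their bounding box. Restricting generation further, as the abstract describes, would replace the uniform seed choice and the neighbor segment with the paper's own rules.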

Sentiment polarity classification of tweets using a extended dictionary

INTELIGENCIA ARTIFICIAL ◽

10.4114/intartif.vol21iss62pp1-12 ◽

2018 ◽

Vol 21 (62) ◽

pp. 1

Author(s):

Jorge E. Camargo ◽

Vladimir Vargas-Calderon ◽

Nelson Vargas ◽

Liliana Calderón-Benavides

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Classification Performance ◽

Semantic Relations ◽

Support Vector ◽

The Real ◽

Polarity Classification ◽

Real Academia ◽

Word Definitions

With the purpose of classifying text by its sentiment polarity (positive or negative), we propose extending a corpus of 68,000 tweets with word definitions from a dictionary of the Real Academia Española de la Lengua (RAE). A set of 28,000 combinations of 6 Word2Vec and support vector machine parameters was considered in order to evaluate how much the inclusion of the RAE dictionary definitions would improve classification performance. We found that this corpus extension significantly improves classification accuracy. Therefore, we conclude that including the RAE dictionary increases the semantic relations learned by Word2Vec, allowing better classification accuracy.

Kernel Parameter Selection for SVM Classification

Strategic Pervasive Computing Applications ◽

10.4018/978-1-61520-753-4.ch002 ◽

2011 ◽

pp. 44-55

Author(s):

Manju Bala ◽

R. K. Agrawal

Keyword(s):

Support Vector Machine ◽

Kernel Function ◽

Classification Accuracy ◽

Parameter Selection ◽

Kernel Functions ◽

Gaussian Kernel ◽

Support Vector ◽

Svm Classification ◽

Kernel Parameter ◽

Benchmark Datasets

The choice of kernel function and its parameters is very important for the performance of a support vector machine. In this chapter, the authors propose a few new kernel functions that satisfy Mercer’s conditions, together with a robust AdaBoost-based algorithm that automatically determines a suitable kernel function and its parameters to improve the performance of the support vector machine. The performance of the proposed algorithm is evaluated on several benchmark datasets from the UCI repository. The experimental results for different datasets show that the Gaussian kernel is not always the best choice for achieving high generalization with a support vector machine classifier. However, with a proper choice of kernel function and its parameters using the proposed algorithm, it is possible to achieve maximum classification accuracy for all datasets.

Epileptic seizure detection from EEG signals with phase–amplitude cross-frequency coupling and support vector machine

International Journal of Modern Physics B ◽

10.1142/s0217979218500868 ◽

2018 ◽

Vol 32 (08) ◽

pp. 1850086 ◽

Cited By ~ 3

Author(s):

Yang Liu ◽

Jiang Wang ◽

Lihui Cai ◽

Yingyuan Chen ◽

Yingmei Qin

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Epileptic Seizures ◽

Classification Performance ◽

Support Vector ◽

Svm Classifier ◽

Eeg Signals ◽

Frequency Coupling ◽

Phase Amplitude ◽

Cross Frequency Coupling

As a pattern of cross-frequency coupling (CFC), phase–amplitude coupling (PAC) depicts the interaction between the phase and amplitude of distinct frequency bands from the same signal, and has been shown to be closely related to the brain’s cognitive and memory activities. This work uses PAC and a support vector machine (SVM) classifier to identify epileptic seizures from electroencephalogram (EEG) data. Entropy-based modulation index (MI) matrices are used to express the strength of PAC, and features extracted from them serve as the input to the classifier. On the Bonn database, which contains five datasets of EEG segments obtained from healthy volunteers and epileptic subjects, a 100% classification accuracy is achieved in distinguishing ictal seizure data from healthy data, and an accuracy of 97.67% is reached in separating ictal EEG signals from inter-ictal EEGs. On the CHB–MIT database, a group of epileptic EEGs continuously recorded by scalp electrodes, a 97.50% classification accuracy is obtained, and a rise in the MI value is found 6 s before seizure onset. The classification performance in this work is effective, and PAC can be considered a useful tool for detecting and predicting epileptic seizures and for providing a reference for clinical diagnosis.
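
To make the entropy-based modulation index concrete, the sketch below computes one MI value from a phase series and an amplitude series. It assumes the widely used definition in which the mean amplitude per phase bin is normalized into a distribution P and MI = (log N - H(P)) / log N for N bins; the abstract does not state the exact formula, and the function name `modulation_index` and the default of 18 bins are choices made for this example. In practice the phase and amplitude would come from band-pass filtering and a Hilbert transform, which are not shown.

```python
import numpy as np

def modulation_index(phase, amplitude, n_bins=18):
    """Entropy-based MI: 0 for no coupling, approaching 1 for strong coupling."""
    phase = np.asarray(phase, dtype=float)
    amplitude = np.asarray(amplitude, dtype=float)
    # Map phases into [0, 2*pi) and assign each sample to a phase bin.
    wrapped = np.mod(phase, 2 * np.pi)
    bins = np.minimum((wrapped / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
    # Mean amplitude within each phase bin (empty bins count as zero).
    sums = np.bincount(bins, weights=amplitude, minlength=n_bins)
    counts = np.bincount(bins, minlength=n_bins)
    mean_amp = np.divide(sums, counts, out=np.zeros(n_bins), where=counts > 0)
    total = mean_amp.sum()
    if total == 0:
        return 0.0
    p = mean_amp / total                      # amplitude distribution over phase
    nz = p > 0
    entropy = -np.sum(p[nz] * np.log(p[nz]))  # Shannon entropy H(P)
    return (np.log(n_bins) - entropy) / np.log(n_bins)

phase = np.linspace(0, 2 * np.pi, 3600, endpoint=False)
flat = modulation_index(phase, np.ones_like(phase))      # amplitude ignores phase
coupled = modulation_index(phase, 1 + np.cos(phase))     # amplitude follows phase
```

An amplitude that does not depend on phase gives a uniform P and an MI of essentially zero, while an amplitude that follows the phase gives a positive MI. Repeating the calculation over a grid of phase and amplitude frequency bands would produce an MI matrix of the kind the abstract mentions.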

Imbalanced Support Vector Machine Classification Based on Hyper-Sphere

Applied Mechanics and Materials ◽

10.4028/www.scientific.net/amm.339.384 ◽

2013 ◽

Vol 339 ◽

pp. 384-388

Author(s):

Cun He Li ◽

Rui Xue Chen ◽

Yi Zhao Ouyang

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Learning Algorithm ◽

Classification Performance ◽

Training Data ◽

Support Vector ◽

Classification Rate ◽

Important Method ◽

Selection Technique ◽

Unbalanced Classification

In classification, when the distribution of the training data between classes is uneven, the learning algorithm is generally dominated by the features of the majority classes, and features of the minority classes are normally difficult to recognize fully. The hyper-sphere support vector machine is an important method for unbalanced classification, but the algorithm has a defect. In order to significantly improve the classification performance on imbalanced datasets, we propose a new method based on the Generalized Hyper-sphere Support Vector Machine to enhance the classification accuracy for the minority classes. A support vector machine (SVM) is then used as the base classifier to train on the reprocessed dataset. Our experimental results demonstrate that the proposed selection technique improves the classification rate of the rare events, and it also improves the overall accuracy of SVM without data pre-processing.

Simulation and Performance Analysis of Tilted Time Window and Support Vector Machine Based Learning Object Ranking Method

Recent Advances in Electrical & Electronic Engineering (Formerly Recent Patents on Electrical & Electronic Engineering) ◽

10.2174/2213111607666190215120017 ◽

2020 ◽

Vol 13 (2) ◽

pp. 153-164

Author(s):

Narina Thakur ◽

Deepti Mehrotra ◽

Abhay Bansal ◽

Manju Bala

Keyword(s):

Support Vector Machine ◽

Classification Accuracy ◽

Time Window ◽

Optimal Solution ◽

Similarity Score ◽

Learning Objects ◽

Retrieval Algorithm ◽

Support Vector ◽

Computationally Efficient ◽

User Query

Objective: Because the adequacy of Learning Objects (LO) is a dynamic concept that changes with use, needs and evolution, the main objective of the proposed research is to take the importance of an LO over time into account when assessing its relevance. Another goal is to increase classification accuracy and precision. Methods: With existing IR and ranking algorithms, MAP optimization either does not lead to a comprehensively optimal solution or is expensive and time-consuming. Support Vector Machine learning, by contrast, leads to a globally optimal solution. SVM is a powerful classification method with high classification accuracy, and the Tilted Time window based model is computationally efficient. Results: This paper proposes and implements an LO ranking and retrieval algorithm based on the Tilted Time window and the Support Vector Machine, which combines the merits of both methods. The proposed model is implemented in MATLAB on the NCBI dataset. Conclusion: The experiments have been carried out on the NCBI dataset, and LO weights are assigned as relevant or non-relevant for a given user query according to the Tilted Time series and the cosine similarity score. Results showed that the proposed model has much better accuracy.

Combining Binary Particle Swarm Optimization with Support Vector Machine for Enhancing Rice Varieties Classification Accuracy

IEEE Access ◽

10.1109/access.2021.3076130 ◽

2021 ◽

pp. 1-1

Author(s):

Tran Thi Kim Nga ◽

Tuan Pham-Viet ◽

Dang Minh Tam ◽

Insoo Koo ◽

Vladimir Y. Mariano ◽

...

Keyword(s):

Support Vector Machine ◽

Particle Swarm Optimization ◽

Classification Accuracy ◽

Particle Swarm ◽

Support Vector ◽

Binary Particle Swarm Optimization ◽

Rice Varieties ◽

Swarm Optimization

A visual terrain classification method for mobile robots’ navigation based on convolutional neural network and support vector machine

Transactions of the Institute of Measurement and Control ◽

10.1177/0142331220987917 ◽

2021 ◽

pp. 014233122098791

Author(s):

Wanli Wang ◽

Botao Zhang ◽

Kaiqi Wu ◽

Sergey A Chepinskiy ◽

Anton A Zhilenkov ◽

...

Keyword(s):

Neural Network ◽

Support Vector Machine ◽

Mobile Robots ◽

Convolutional Neural Network ◽

Hybrid Method ◽

Classification Accuracy ◽

Support Vector ◽

High Classification Accuracy ◽

Enhance Efficiency ◽

Multi Class Classification

In this paper, a hybrid method based on deep learning is proposed to visually classify terrains encountered by mobile robots. Considering the limited computing resources on mobile robots and the requirement for high classification accuracy, the proposed hybrid method combines a convolutional neural network with a support vector machine to maintain high classification accuracy while improving work efficiency. The key idea is that the convolutional neural network performs a multi-class classification while, simultaneously, the support vector machine performs a two-class classification. The two-class classification performed by the support vector machine targets the one kind of terrain that users are most concerned with. The results of the two classifications are consolidated to obtain the final classification result. The convolutional neural network used in this method is modified for on-board use on mobile robots. In order to enhance efficiency, the convolutional neural network has a simple architecture. The convolutional neural network and the support vector machine are trained and tested using RGB images of six kinds of common terrain. Experimental results demonstrate that this method can help robots classify terrains accurately and efficiently. Therefore, the proposed method has significant potential for on-board use on mobile robots.

A Method Based on Support Vector Machine for Feature Selection of Latent Semantic Features

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.181-182.830 ◽

2011 ◽

Vol 181-182 ◽

pp. 830-835

Author(s):

Min Song Li

Keyword(s):

Support Vector Machine ◽

Text Categorization ◽

Latent Semantic Indexing ◽

Classification Performance ◽

Compact Representation ◽

Support Vector ◽

Semantic Features ◽

Semantic Indexing ◽

Feature Extraction Method ◽

Feature Subspace

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate way to select a feature subspace for text categorization, since the method orders the extracted features according to their variance, not their classification power. We propose a method based on the support vector machine to extract features and select a Latent Semantic Indexing subspace that is suited to classification. Experimental results indicate that the method improves classification performance with a more compact representation.
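
The gap between variance order and classification power can be illustrated with a small sketch. It computes LSI features by singular value decomposition and then re-ranks them by a simple class-separation score. The Fisher-style score is a stand-in chosen for brevity and is an assumption of this example; the paper's own criterion is SVM-based and is not specified in the abstract. The function name `rank_latent_features` and the toy matrix are likewise hypothetical.

```python
import numpy as np

def rank_latent_features(X, y, n_components):
    """Return latent-feature indices ordered by class separation, plus scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # LSI step: SVD of the document-term matrix; rows of vt are latent
    # directions, ordered by singular value (that is, by captured variance).
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    Z = X @ vt[:n_components].T               # documents in latent space
    a, b = Z[y == 0], Z[y == 1]
    # Fisher-style score: squared mean gap over summed within-class variance.
    gap = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    score = gap / (a.var(axis=0) + b.var(axis=0) + 1e-12)
    return np.argsort(-score), score

# Toy data: term 0 varies a lot but ignores the class; term 1 varies little
# but separates the two classes.
X = np.array([[-10, -1.2], [10, -1.2], [-10, -0.8], [10, -0.8],
              [-10,  1.2], [10,  1.2], [-10,  0.8], [10,  0.8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
order, score = rank_latent_features(X, y, n_components=2)
```

Here the first latent feature carries most of the variance yet has almost no class separation, so the re-ranking places the second feature first. A selection step driven by classifier performance, as the abstract proposes, targets the same mismatch.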

Financial Distress Prediction Based on Support Vector Machine with a Modified Kernel Function

Journal of Intelligent Systems ◽

10.1515/jisys-2014-0132 ◽

2016 ◽

Vol 25 (3) ◽

pp. 417-429

Author(s):

Chong Wu ◽

Lu Wang ◽

Zhe Shi

Keyword(s):

Support Vector Machine ◽

Kernel Function ◽

Financial Distress ◽

Classification Accuracy ◽

Feature Space ◽

Support Vector ◽

Input Space ◽

Financial Distress Prediction ◽

Support Vectors ◽

Distress Prediction

For the financial distress prediction model based on the support vector machine, there are no theories concerning how to choose a proper kernel function in a data-dependent way. This paper proposes a modified kernel function method that can effectively enhance classification accuracy. We apply an information-geometric method to modify a kernel, based on the structure of the Riemannian geometry induced in the input space by the kernel. A conformal transformation of a kernel from the input space to a higher-dimensional feature space enlarges volume elements locally near the support vectors situated around the classification boundary and reduces the number of support vectors. This paper takes the Gaussian radial basis function as the internal kernel. Additionally, this paper combines the above method with the theories of standard regularization and non-dimensionalization to construct the new model. In the empirical analysis section, the paper uses the financial data of Chinese listed companies and runs five groups of experiments with different parameters to compare classification accuracy. We conclude that the modified kernel function model can effectively reduce the number of support vectors and improve classification accuracy.
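
A conformal transformation of a kernel can be sketched in a few lines. The example assumes one common construction, K'(x, x') = c(x) c(x') K(x, x'), with a positive factor c(x) formed as a sum of Gaussians centered on the support vectors so that it is largest near them. The abstract does not give the paper's exact factor, so the equal weighting of support vectors, the width `tau`, and the function names are assumptions of this sketch.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    """Gaussian radial basis function kernel, used as the internal kernel."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def conformal_kernel(A, B, support_vectors, gamma, tau):
    """Rescale the RBF kernel by c(a) * c(b), with c peaked at support vectors."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sv = np.asarray(support_vectors, dtype=float)

    def c(X):
        # Sum of Gaussians centered on the support vectors (equal weights assumed).
        sq = ((X[:, None, :] - sv[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-sq / (2 * tau ** 2)).sum(axis=1)

    return c(A)[:, None] * rbf_kernel(A, B, gamma) * c(B)[None, :]

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [2.0, 2.0], [3.0, 1.0], [5.0, 5.0]])
# Pretend the first two points were the support vectors of a first-pass SVM.
K = conformal_kernel(X, X, X[:2], gamma=0.5, tau=1.0)
```

Because the factor multiplies a valid kernel symmetrically by a positive function, the result is still symmetric and positive semidefinite, so it can be passed to a standard SVM solver as a precomputed Gram matrix. A typical workflow would train once with the plain kernel, read off the support vectors, and retrain with the modified kernel.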

Shape-restricted support vector machine (SR-SVM): a SVM classifier taking supplementary shape information of input

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-202155 ◽

2021 ◽

Vol 40 (1) ◽

pp. 1481-1494

Author(s):

Geng Deng ◽

Yaoguo Xie ◽

Xindong Wang ◽

Qiang Fu

Keyword(s):

Support Vector Machine ◽

Classification Performance ◽

Research Literature ◽

Support Vector ◽

Svm Classifier ◽

Classification Problems ◽

Active Set ◽

Shape Information ◽

Convex Optimization Problem ◽

Shape Restrictions

Many classification problems contain shape information about the input features, such as monotonicity, convexity, and concavity. In this research, we propose a new classifier, called the Shape-Restricted Support Vector Machine (SR-SVM), which uses component-wise shape information to enhance classification accuracy. There is a vast research literature on monotonic classification covering monotonic or ordinal shapes. Our proposed classifier extends this to handle convex and concave types of features, and combinations of these types. While the standard SVM uses linear separating hyperplanes, our novel SR-SVM essentially constructs non-parametric and nonlinear separating planes subject to component-wise shape restrictions. We formulate the SR-SVM classifier as a convex optimization problem and solve it using an active-set algorithm. The approach applies basis function expansions to the input and effectively utilizes the standard SVM solver. We illustrate our methodology using simulation and real-world examples, and show that SR-SVM improves classification performance when additional shape information about the input is available.
