AGNES-SMOTE: An Oversampling Algorithm Based on Hierarchical Clustering and Improved SMOTE

2020 ◽  
Vol 2020 ◽  
pp. 1-9
Author(s):  
Xin Wang ◽  
Yue Yang ◽  
Mingsong Chen ◽  
Qin Wang ◽  
Qin Qin ◽  
...  

To address the low classification accuracy on imbalanced datasets, an oversampling algorithm, AGNES-SMOTE (Agglomerative Nesting-Synthetic Minority Oversampling Technique), based on hierarchical clustering and improved SMOTE, is proposed. Its key procedures are: hierarchically clustering the majority samples and the minority samples separately; dividing minority subclusters on the basis of the obtained majority subclusters; selecting “seed samples” based on the sampling weight and probability distribution of each minority subcluster; and restricting the generation of new samples to a certain area by the centroid method during sampling. The combination of AGNES-SMOTE and SVM (Support Vector Machine) is presented for classifying imbalanced datasets. Experiments on UCI datasets compare the performance of different algorithms from the literature. The results indicate that AGNES-SMOTE excels at synthesizing new samples and improves SVM classification performance on imbalanced datasets.
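The abstract does not spell out the sampling formulas, so the following is only a minimal sketch of the general idea, assuming the standard SMOTE interpolation step and a simplified centroid restriction. The function names (`centroid`, `interpolate`, `oversample_cluster`) are hypothetical, seeds are drawn uniformly rather than by the paper's sampling weights, and the hierarchical clustering stage is assumed to have already produced the minority subcluster.

```python
import random

def centroid(points):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def interpolate(seed, target, rng):
    """SMOTE-style step: a random point on the segment from seed to target."""
    u = rng.random()  # u is drawn from [0, 1)
    return [s + u * (t - s) for s, t in zip(seed, target)]

def oversample_cluster(cluster, n_new, rng):
    """Assumed centroid restriction (not the paper's exact rule): every
    synthetic sample lies between a seed and the centroid of its own
    subcluster, so it stays inside the subcluster's convex hull."""
    c = centroid(cluster)
    # Simplification: seeds are chosen uniformly, not by sampling weight.
    return [interpolate(rng.choice(cluster), c, rng) for _ in range(n_new)]

rng = random.Random(0)
minority_subcluster = [[1.0, 1.0], [2.0, 1.0], [1.5, 2.0]]
synthetic = oversample_cluster(minority_subcluster, 5, rng)
```

Interpolating toward the centroid is one simple way to keep synthetic points away from majority regions; the published method may restrict the generation area differently.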

2018 ◽  
Vol 21 (62) ◽  
pp. 1
Author(s):  
Jorge E. Camargo ◽  
Vladimir Vargas-Calderon ◽  
Nelson Vargas ◽  
Liliana Calderón-Benavides

With the purpose of classifying text based on its sentiment polarity (positive or negative), we proposed an extension of a 68,000-tweet corpus through the inclusion of word definitions from a dictionary of the Real Academia Española de la Lengua (RAE). A set of 28,000 combinations of 6 Word2Vec and support vector machine parameters was considered in order to evaluate how much the inclusion of RAE dictionary definitions would improve classification performance. We found that such a corpus extension significantly improves the classification accuracy. Therefore, we conclude that the inclusion of the RAE dictionary increases the semantic relations learned by Word2Vec, allowing better classification accuracy.


Author(s):  
Manju Bala ◽  
R. K. Agrawal

The choice of kernel function and its parameters is very important for the performance of a support vector machine. In this chapter, the authors propose a few new kernel functions that satisfy Mercer’s conditions, and a robust AdaBoost-based algorithm that automatically determines a suitable kernel function and its parameters to improve the performance of the support vector machine. The performance of the proposed algorithm is evaluated on several benchmark datasets from the UCI repository. The experimental results for different datasets show that the Gaussian kernel is not always the best choice for achieving high generalization of the support vector machine classifier. However, with a proper choice of kernel function and its parameters using the proposed algorithm, it is possible to achieve maximum classification accuracy for all datasets.


2018 ◽  
Vol 32 (08) ◽  
pp. 1850086
Author(s):  
Yang Liu ◽  
Jiang Wang ◽  
Lihui Cai ◽  
Yingyuan Chen ◽  
Yingmei Qin

As a pattern of cross-frequency coupling (CFC), phase–amplitude coupling (PAC) depicts the interaction between the phase and amplitude of distinct frequency bands from the same signal, and has been shown to be closely related to the brain’s cognitive and memory activities. This work used PAC and a support vector machine (SVM) classifier to identify epileptic seizures from electroencephalogram (EEG) data. Entropy-based modulation index (MI) matrices are used to express the strength of PAC, and features extracted from them serve as the input to the classifier. On the Bonn database, which contains five datasets of EEG segments obtained from healthy volunteers and epileptic subjects, a classification accuracy of 100% is achieved for distinguishing ictal (seizure) data from healthy data, and an accuracy of 97.67% is reached in distinguishing ictal EEG signals from inter-ictal EEGs. On the CHB–MIT database, a group of continuously recorded epileptic EEGs from scalp electrodes, a classification accuracy of 97.50% is obtained, and a rise in the MI value is found 6 s before seizure onset. The classification performance in this work is effective, and PAC can be considered a useful tool for detecting and predicting epileptic seizures and for providing a reference for clinical diagnosis.
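The abstract does not give the MI formula. A common entropy-based definition, assumed here, bins the mean amplitude of the faster band by the phase of the slower band and measures how far that distribution departs from uniform: MI = (log N − H(P)) / log N, where N is the number of phase bins and H is the Shannon entropy. The sketch below takes phase and amplitude series as already extracted (for example by band-pass filtering and a Hilbert transform, not shown); the function name and the choice of 18 bins are illustrative assumptions.

```python
import math

def modulation_index(phases, amplitudes, n_bins=18):
    """Entropy-based MI under the assumed definition above.
    phases: radians in [-pi, pi]; amplitudes: non-negative, not all zero."""
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    width = 2 * math.pi / n_bins
    for ph, amp in zip(phases, amplitudes):
        b = min(int((ph + math.pi) / width), n_bins - 1)  # phase bin index
        sums[b] += amp
        counts[b] += 1
    # Mean amplitude per phase bin, normalized into a distribution P.
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    total = sum(means)
    p = [m / total for m in means]
    entropy = -sum(q * math.log(q) for q in p if q > 0)
    return (math.log(n_bins) - entropy) / math.log(n_bins)

# Synthetic check: one full cycle of phase values.
phases = [-math.pi + 2 * math.pi * k / 360 for k in range(360)]
flat = modulation_index(phases, [1.0] * 360)              # no coupling
coupled = modulation_index(phases, [1.0 + math.cos(p) for p in phases])
```

An amplitude that ignores phase gives an MI near 0, while an amplitude that peaks at a preferred phase gives a larger value. An MI matrix such as the one described above would repeat this computation over pairs of phase and amplitude frequency bands.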


2013 ◽  
Vol 339 ◽  
pp. 384-388
Author(s):  
Cun He Li ◽  
Rui Xue Chen ◽  
Yi Zhao Ouyang

In classification, when the training data are unevenly distributed between classes, the learning algorithm is generally dominated by the features of the majority classes, and features of the minority classes are difficult to recognize fully. The hyper-sphere support vector machine is an important method for imbalanced classification, but the algorithm has a defect. In order to significantly improve classification performance on imbalanced datasets, we propose a new method based on the Generalized Hyper-sphere Support Vector Machine that enhances classification accuracy for the minority classes. A support vector machine (SVM) is then used as the base classifier to train on the reprocessed dataset. Our experimental results demonstrate that the proposed selection technique improves the classification rate of rare events and also improves overall accuracy compared with SVM without data pre-processing.


Author(s):  
Narina Thakur ◽  
Deepti Mehrotra ◽  
Abhay Bansal ◽  
Manju Bala

Objective: Since the adequacy of Learning Objects (LO) is a dynamic concept that changes with use, needs and evolution, the main objective of the proposed research is to consider the importance of an LO over time in order to assess its relevance. Another goal is to increase classification accuracy and precision. Methods: With existing IR and ranking algorithms, MAP optimization either does not lead to a comprehensively optimal solution or is expensive and time-consuming. Support Vector Machine learning, by contrast, leads to a globally optimal solution. SVM is a powerful classification method with high accuracy, and the Tilted Time window based model is computationally efficient. Results: This paper proposes and implements an LO ranking and retrieval algorithm based on the Tilted Time window and the Support Vector Machine, which combines the merits of both methods. The proposed model is implemented in MATLAB for the NCBI dataset. Conclusion: The experiments were carried out on the NCBI dataset, and LO weights are assigned as relevant or non-relevant for a given user query according to the Tilted Time series and the cosine similarity score. Results showed that the proposed model has much better accuracy.


Author(s):  
Wanli Wang ◽  
Botao Zhang ◽  
Kaiqi Wu ◽  
Sergey A Chepinskiy ◽  
Anton A Zhilenkov ◽  
...  

In this paper, a hybrid method based on deep learning is proposed to visually classify terrains encountered by mobile robots. Considering the limited computing resources on mobile robots and the requirement for high classification accuracy, the proposed hybrid method combines a convolutional neural network with a support vector machine to maintain high classification accuracy while improving efficiency. The key idea is that the convolutional neural network performs a multi-class classification while, simultaneously, the support vector machine performs a two-class classification. The two-class classification performed by the support vector machine targets the one kind of terrain that users are most concerned with. The results of the two classifications are consolidated to obtain the final classification result. The convolutional neural network used in this method is modified for on-board use on mobile robots and, to enhance efficiency, has a simple architecture. The convolutional neural network and the support vector machine are trained and tested using RGB images of six kinds of common terrain. Experimental results demonstrate that this method can help robots classify terrains accurately and efficiently. Therefore, the proposed method has significant potential for on-board use on mobile robots.
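The abstract does not state how the two results are consolidated. Purely as an illustration, the sketch below assumes a simple rule in which the binary SVM has the final say on the terrain of interest and the CNN decides among the remaining classes. The function name `consolidate`, the terrain labels, and the rule itself are hypothetical.

```python
def consolidate(cnn_probs, svm_is_target, target_label):
    """Hypothetical consolidation rule (not taken from the paper).
    cnn_probs: dict mapping terrain label -> CNN probability.
    svm_is_target: True if the binary SVM detects the target terrain."""
    if svm_is_target:
        # The SVM specializes in the target terrain, so trust it.
        return target_label
    # The SVM rejected the target: take the best CNN class among the rest.
    others = {k: v for k, v in cnn_probs.items() if k != target_label}
    return max(others, key=others.get)

cnn_output = {"grass": 0.5, "sand": 0.3, "asphalt": 0.2}
label_if_detected = consolidate(cnn_output, True, "sand")
label_if_rejected = consolidate(cnn_output, False, "grass")
```

Other rules are equally plausible, for example weighting the SVM decision by its margin instead of letting it override the CNN outright.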


2011 ◽  
Vol 181-182 ◽  
pp. 830-835
Author(s):  
Min Song Li

Latent Semantic Indexing (LSI) is an effective feature extraction method that can capture the underlying latent semantic structure between words in documents. However, it is probably not the most appropriate method for selecting a feature subspace for text categorization, since it orders the extracted features according to their variance, not their classification power. We propose a method based on the support vector machine to extract features and select a Latent Semantic Indexing subspace that is suited for classification. Experimental results indicate that the method improves classification performance with a more compact representation.


2016 ◽  
Vol 25 (3) ◽  
pp. 417-429
Author(s):  
Chong Wu ◽  
Lu Wang ◽  
Zhe Shi

For the financial distress prediction model based on the support vector machine, there are no theories concerning how to choose a proper kernel function in a data-dependent way. This paper proposes a modified kernel function method that can effectively enhance classification accuracy. We apply an information-geometric method to modify a kernel, based on the structure of the Riemannian geometry induced in the input space by the kernel. A conformal transformation of a kernel from input space to higher-dimensional feature space enlarges volume elements locally near support vectors that are situated around the classification boundary and reduces the number of support vectors. This paper takes the Gaussian radial basis function as the internal kernel. Additionally, this paper combines the above method with the theories of standard regularization and non-dimensionalization to construct the new model. In the empirical analysis section, the paper uses the financial data of Chinese listed companies and runs five groups of experiments with different parameters to compare classification accuracy. We conclude that the modified kernel function model can effectively reduce the number of support vectors and improve classification accuracy.
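The abstract does not give the transformation itself. In the information-geometric literature, a conformal transformation of a kernel is usually written K̃(x, x') = c(x) c(x') K(x, x'), with a positive factor c(x) that is large near the support vectors. The sketch below assumes that form, with a Gaussian RBF as the internal kernel and c(x) built as a sum of Gaussian bumps centered on the support vectors. The parameter names `gamma` and `tau`, their values, and the example support vectors are illustrative assumptions.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def rbf(a, b, gamma=0.5):
    """Gaussian radial basis function used as the internal kernel."""
    return math.exp(-gamma * sq_dist(a, b))

def conformal_factor(x, support_vectors, tau=1.0):
    """Assumed c(x): Gaussian bumps centered on the support vectors,
    so c(x) is largest close to the classification boundary."""
    return sum(math.exp(-sq_dist(x, sv) / (2 * tau ** 2))
               for sv in support_vectors)

def modified_kernel(a, b, support_vectors, gamma=0.5, tau=1.0):
    """Conformally transformed kernel c(a) * c(b) * K(a, b)."""
    return (conformal_factor(a, support_vectors, tau)
            * conformal_factor(b, support_vectors, tau)
            * rbf(a, b, gamma))

support_vectors = [[0.0, 0.0], [1.0, 1.0]]  # e.g. from a first SVM pass
near = conformal_factor([0.1, 0.0], support_vectors)
far = conformal_factor([5.0, 5.0], support_vectors)
k_val = modified_kernel([0.1, 0.0], [0.9, 1.0], support_vectors)
```

In a typical workflow of this kind, an SVM is first trained with the internal kernel to find the support vectors and is then retrained with the modified kernel; whether the paper follows exactly this two-pass procedure is not stated in the abstract.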


2021 ◽  
Vol 40 (1) ◽  
pp. 1481-1494
Author(s):  
Geng Deng ◽  
Yaoguo Xie ◽  
Xindong Wang ◽  
Qiang Fu

Many classification problems carry shape information about the input features, such as monotonicity, convexity, and concavity. In this research, we propose a new classifier, called the Shape-Restricted Support Vector Machine (SR-SVM), which uses component-wise shape information to enhance classification accuracy. There is a vast research literature on monotonic classification covering monotonic or ordinal shapes. Our proposed classifier extends this to handle convex and concave feature types, and combinations of these types. While the standard SVM uses linear separating hyperplanes, our novel SR-SVM essentially constructs non-parametric and nonlinear separating surfaces subject to component-wise shape restrictions. We formulate the SR-SVM classifier as a convex optimization problem and solve it using an active-set algorithm. The approach applies basis function expansions to the input and effectively utilizes the standard SVM solver. We illustrate our methodology using simulation and real-world examples, and show that SR-SVM improves classification performance when additional shape information about the input is available.

