A New SVM Multi-Class Classification Algorithm Based on Sample Scale and Distribution Area

2013 ◽  
Vol 712-715 ◽  
pp. 2529-2533
Author(s):  
Yu Ping Qin ◽  
Peng Da Qin ◽  
Shu Xian Lun ◽  
Yi Wang

A new SVM multi-class classification algorithm is proposed. First, an optimal binary tree is constructed according to the sample scale and the distribution area of each class, and a sub-classifier is then trained for every non-leaf node of the tree. A sample to be classified is passed down from the root node until it reaches a leaf node, and the class of that leaf node is taken as the class of the sample. The experimental results show that the algorithm improves both classification precision and classification speed, and that the improvement is greatest when a class has few samples but a large distribution area.
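
The root-to-leaf classification described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not give the exact tree-construction criterion, so the ordering used here (widest class first, with sample count as a tie-breaker) is an assumption, `radius` is a hypothetical stand-in for "distribution area", and `centroid_rule` is a dependency-free stand-in for a trained binary SVM.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence

Point = Sequence[float]


@dataclass
class Node:
    label: Optional[str] = None                       # set on leaf nodes only
    decide: Optional[Callable[[Point], bool]] = None  # True -> take the left branch
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def centroid(points: List[Point]) -> List[float]:
    return [sum(coords) / len(points) for coords in zip(*points)]


def radius(points: List[Point]) -> float:
    # Hypothetical proxy for "distribution area": farthest sample from the centroid.
    c = centroid(points)
    return max(math.dist(p, c) for p in points)


def centroid_rule(pos: List[Point], neg: List[Point]) -> Callable[[Point], bool]:
    # Stand-in for a binary SVM: True when x is closer to the positive centroid.
    cp, cn = centroid(pos), centroid(neg)
    return lambda x: math.dist(x, cp) <= math.dist(x, cn)


def build_tree(data: Dict[str, List[Point]]) -> Node:
    # Assumed ordering: widest class is separated first; larger class breaks ties.
    order = sorted(data, key=lambda k: (radius(data[k]), len(data[k])), reverse=True)

    def build(labels: List[str]) -> Node:
        if len(labels) == 1:
            return Node(label=labels[0])
        head, rest = labels[0], labels[1:]
        rest_points = [p for k in rest for p in data[k]]
        # One sub-classifier per non-leaf node: "head" versus all remaining classes.
        return Node(decide=centroid_rule(data[head], rest_points),
                    left=Node(label=head), right=build(rest))

    return build(order)


def classify(node: Node, x: Point) -> str:
    # Walk down from the root until a leaf is reached; its label is the prediction.
    while node.label is None:
        node = node.left if node.decide(x) else node.right
    return node.label


train = {
    "a": [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    "b": [(10.0, 10.0), (10.0, 11.0)],
    "c": [(-10.0, 5.0), (-11.0, 5.0), (-10.0, 6.0)],
}
tree = build_tree(train)
print(classify(tree, (0.2, 0.3)), classify(tree, (10.0, 10.5)))
```

With k classes the tree has k - 1 sub-classifiers, and a sample is evaluated by at most k - 1 of them, which is where the speed advantage of a binary tree structure over one-versus-one voting would come from.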

2013 ◽  
Vol 373-375 ◽  
pp. 1085-1088 ◽  
Author(s):  
Yu Ping Qin ◽  
Peng Da Qin ◽  
Yi Wang ◽  
Shu Xian Lun

An improved binary tree SVM multi-class classification algorithm is proposed. First, the minimum hyper-ellipsoid enclosing the samples of each class is constructed in the feature space; an optimal binary tree is then generated according to the hyper-ellipsoid volumes, and a sub-classifier is trained for every non-leaf node of the tree at the same time. For a sample to be classified, the sub-classifiers are applied from the root node until a leaf node is reached, and the class of that leaf node is the class of the sample. Experiments on the Statlog database show that the algorithm improves classification precision and classification speed, especially when the number of classes is large and their distribution areas are approximately equal.


Author(s):  
Jun Zhang ◽  
Jinglu Hu

In this paper, we propose a Hierarchical Frequency Sensitive Competitive Learning (HFSCL) method to achieve Color Quantization (CQ). In HFSCL, the appropriate number of quantized colors and the palette are obtained by an adaptive procedure that follows a binary tree structure with nodes and layers. Starting from the root node, which contains all colors in an image, nodes are split until every node has been examined against the split conditions, which generates the binary tree. In each node of the tree, a Frequency Sensitive Competitive Learning (FSCL) network is used to achieve a two-way division. To avoid over-splitting, a merging condition is defined to merge clusters that are close enough to each other at each layer. Experimental results show that the proposed HFSCL achieves the desired CQ performance.
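
The two-way division performed at each tree node can be illustrated with a small FSCL sketch. It assumes a common FSCL formulation in which each unit's squared distance to the input is scaled by the number of times that unit has already won; the learning rate, the number of epochs, the deterministic initialisation and the function name `fscl_split` are all illustrative assumptions, and the split and merge conditions of HFSCL are not shown because the abstract does not specify them.

```python
from typing import List, Sequence, Tuple

Color = Sequence[float]


def sq_dist(a: Color, b: Color) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fscl_split(colors: List[Color], epochs: int = 20,
               lr: float = 0.05) -> Tuple[List[List[float]], Tuple[list, list]]:
    # Assumed initialisation: the first color and the color farthest from it.
    first = colors[0]
    far = max(colors, key=lambda c: sq_dist(c, first))
    units = [list(first), list(far)]
    wins = [1, 1]  # how often each unit has won so far

    for _ in range(epochs):
        for c in colors:
            # Frequency-sensitive competition: frequent winners are penalised.
            k = min((0, 1), key=lambda i: wins[i] * sq_dist(c, units[i]))
            # Move only the winning unit toward the input color.
            units[k] = [w + lr * (x - w) for w, x in zip(units[k], c)]
            wins[k] += 1

    # Final two-way division: assign each color to its nearest unit.
    groups: Tuple[list, list] = ([], [])
    for c in colors:
        k = min((0, 1), key=lambda i: sq_dist(c, units[i]))
        groups[k].append(c)
    return units, groups


dark = [(10, 10, 10), (12, 8, 11), (9, 11, 10)]
bright = [(240, 240, 240), (238, 242, 241), (241, 239, 240)]
palette = dark + bright
units, groups = fscl_split(palette)
print(groups)
```

The win counter is what distinguishes FSCL from plain competitive learning: a unit that keeps winning sees its effective distance grow, so the other unit gets a chance to claim inputs instead of staying unused.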


2013 ◽  
Vol 303-306 ◽  
pp. 1609-1612
Author(s):  
Huai Lin Dong ◽  
Xiao Dan Zhu ◽  
Qing Feng Wu ◽  
Juan Juan Huang

The Naïve Bayes classification algorithm based on validity (NBCABV) optimizes the training data by using validity to eliminate noise samples, which improves classification, but it ignores the associations between properties. To take these associations into account, an improved method, the classification algorithm for Naïve Bayes based on validity and correlation (CANBBVC), is proposed. It uses both validity and correlation to delete more noise samples, resulting in better classification performance. Experimental results show that this model has higher classification accuracy than the one based solely on validity.


2021 ◽  
Author(s):  
Hoai Nguyen

Classification aims to identify a class label of an instance according to the information from its characteristics or features. Unfortunately, many classification problems have a large feature set containing irrelevant and redundant features, which reduce the classification performance. In order to address the above problem, feature selection is proposed to select a small subset of relevant features. There are three main types of feature selection methods, i.e. wrapper, embedded and filter approaches. Wrappers use a classification algorithm to evaluate candidate feature subsets. In embedded approaches, the selection process is embedded in the training process of a classification algorithm. Different from the other two approaches, filters do not involve any classification algorithm during the selection process. Feature selection is an important process but it is not an easy task due to its large search space and complex feature interactions. Because of the potential global search ability, Evolutionary Computation (EC), especially Particle Swarm Optimization (PSO), has been widely and successfully applied to feature selection. However, there is potential to improve the effectiveness and efficiency of EC-based feature selection. The overall goal of this thesis is to investigate and improve the capability of EC for feature selection to select small feature subsets while maintaining or even improving the classification performance compared to using all features. Different aspects of feature selection are considered in this thesis such as the number of objectives (single-objective/multi-objective), the fitness function (filter/wrapper), and the searching mechanism. This thesis introduces a new fitness function based on mutual information which is calculated by an estimation approach instead of the traditional counting approach. Results show that the estimation approach works well on both continuous and discrete data. More importantly, mutual information calculated by the estimation approach can capture feature interactions better than the traditional counting approach. This thesis develops a novel binary PSO algorithm, which is the first work to redefine some core concepts of PSO such as velocity and momentum to suit the characteristics of binary search spaces. Experimental results show that the proposed binary PSO algorithm evolves better solutions than other binary EC algorithms when the search spaces are large and complex. Specifically, on feature selection, the proposed binary PSO algorithm can select smaller feature subsets with similar or better classification accuracies, especially when there are a large number of features. This thesis proposes surrogate models for wrapper-based feature selection. The surrogate models use surrogate training sets which are subsets of informative instances selected from the training set. Experimental results show that the proposed surrogate models assist PSO to reduce the computational cost while maintaining or even improving the classification performance compared to using only the original training set. The thesis develops the first wrapper-based multi-objective feature selection algorithm using MOEA/D. A new decomposition strategy using multiple reference points for MOEA/D is designed, which can deal with different characteristics of multi-objective feature selection such as highly discontinuous Pareto fronts and complex relationships between objectives. The experimental results show that the proposed algorithm can evolve more diverse non-dominated sets than other multi-objective algorithms. This thesis introduces the first PSO-based feature selection algorithm for transfer learning. In the proposed algorithm, the fitness function uses classification performance to reduce the differences between domains while maintaining the discriminative ability on the target domain. The experimental results show that the proposed algorithm can select feature subsets which achieve better classification performance than four state-of-the-art feature-based transfer learning algorithms.
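
For context, the "traditional counting approach" to mutual information can be sketched as below, assuming the phrase refers to the usual plug-in estimate built from frequency counts of discrete values. The function name `mutual_information` is illustrative, and the thesis's estimation approach is not shown because the abstract does not describe it.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete observations."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), with probabilities as counts / n.
    return sum(
        (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


feature = [0, 0, 1, 1]
label_same = [0, 0, 1, 1]      # fully determined by the feature
label_indep = [0, 1, 0, 1]     # independent of the feature
print(mutual_information(feature, label_same))
print(mutual_information(feature, label_indep))
```

Because the estimate relies on counting identical values, continuous features have to be discretised before it can be applied, which is one reason an estimation-based alternative is of interest for continuous data.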


2014 ◽  
Vol 513-517 ◽  
pp. 1840-1844 ◽  
Author(s):  
Long Jie Cui ◽  
Hong Li Wang ◽  
Rong Yi Cui

The classification performance of a classifier is weakened in Tri-training because the use of unlabeled samples introduces noise samples. In this paper, a new Tri-training style algorithm named AR-Tri-training (Tri-training with assistant and rich strategy) is proposed. First, the assistant learning strategy is proposed. Then the supporting learner is designed by combining the assistant learning strategy with the rich information strategy. Using the supporting learner reduces the number of mislabeled samples produced while the three classifiers label samples for one another, and allows the unlabeled samples and the misclassified samples of the validation set to be fully used. The proposed algorithm is applied to voice recognition. The experimental results show that the AR-Tri-training algorithm can compensate for the shortcomings of the Tri-training algorithm and further improve the testing rate.


2015 ◽  
Vol 7 (3) ◽  
pp. 18 ◽  
Author(s):  
Natarajan Meghanathan

We propose a generic algorithm to determine maximum bottleneck node weight-based data gathering (MaxBNW-DG) trees for wireless sensor networks (WSNs) and compare the performance of the MaxBNW-DG trees with that of maximum and minimum link weight-based data gathering trees (MaxLW-DG and MinLW-DG trees). Assuming each node in a WSN graph has a weight, the bottleneck weight for the path from a node u to the root node of the DG tree is the minimum of the node weights on the path (inclusive of the weights of the end nodes). The MaxBNW-DG tree algorithm determines a DG tree such that each node has a path of the largest bottleneck weight to the root node. We observe that the MaxBNW-DG trees incur a lower height, a larger percentage of nodes as leaf nodes, and a larger weight per intermediate node compared to the leaf nodes; the tradeoff is a larger network-wide data aggregation delay due to the larger number of child nodes per intermediate node. The MaxBNW-DG algorithm could be used to determine DG trees with a larger trust score or larger energy (or other such criteria for node weight) per intermediate node compared to the leaf nodes.
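
The bottleneck definition above can be made concrete with a short sketch. The abstract does not give the steps of the MaxBNW-DG algorithm, so the code below swaps in a standard widest-path variant of Dijkstra's algorithm adapted to node weights; the function name `max_bottleneck_tree` and the example graph are illustrative assumptions.

```python
import heapq
from typing import Dict, Hashable, List, Optional, Tuple

Graph = Dict[Hashable, List[Hashable]]


def max_bottleneck_tree(adj: Graph, weight: Dict[Hashable, float],
                        root: Hashable) -> Tuple[dict, dict]:
    """Return (parent, best): a tree rooted at `root` in which every node's
    path to the root has the largest possible minimum node weight."""
    best = {root: weight[root]}                 # best known bottleneck per node
    parent: Dict[Hashable, Optional[Hashable]] = {root: None}
    heap = [(-weight[root], root)]              # max-heap via negated bottleneck
    done = set()

    while heap:
        neg_b, u = heapq.heappop(heap)
        if u in done:
            continue                            # stale heap entry
        done.add(u)
        for v in adj[u]:
            # Extending the path to v can only keep or lower the bottleneck.
            cand = min(-neg_b, weight[v])
            if v not in done and cand > best.get(v, float("-inf")):
                best[v] = cand
                parent[v] = u
                heapq.heappush(heap, (-cand, v))
    return parent, best


# Hypothetical four-node network: "c" can reach the root "r" via "a" or "b".
adj = {"r": ["a", "b"], "a": ["r", "c"], "b": ["r", "c"], "c": ["a", "b"]}
weight = {"r": 10, "a": 3, "b": 8, "c": 6}
parent, best = max_bottleneck_tree(adj, weight, "r")
print(parent, best)
```

In this example node "c" attaches to "b" rather than "a": the path through "a" has bottleneck min(10, 3, 6) = 3, while the path through "b" has bottleneck min(10, 8, 6) = 6.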


2018 ◽  
Vol 72 (2) ◽  
pp. 430-446
Author(s):  
Shuaidong Jia ◽  
Zeyuan Dai ◽  
Lihua Zhang

Existing methods (for example, the route binary tree method) can only generate routes automatically from a single chart. To overcome this limitation, a method for automatically generating the shortest-distance route based on an obstacle spatial database is proposed. Using this method, the route between two arbitrary points at sea can be generated automatically. First, the differences in accuracy and updating time of charts are quantitatively analysed. Next, the mechanism for updating obstacles is designed, an obstacle spatial database is constructed, and the obstacle data extracted from multiple charts are fused. Finally, considering the effect of the amount of obstacle data on efficiency, a route window and an improved R-tree index are designed for quickly extracting and querying the obstacle database. The experimental results demonstrate that, compared with existing methods, the proposed method can generate the shortest-distance route between two arbitrary points at sea and eliminates the limitation of the area of the chart. In addition, with data from multiple charts, the route generated by the proposed method is more reliable than that of the existing methods, and the method is more efficient.


2019 ◽  
Author(s):  
Seda Bilaloglu ◽  
Joyce Wu ◽  
Eduardo Fierro ◽  
Raul Delgado Sanchez ◽  
Paolo Santiago Ocampo ◽  
...  

Visual analysis of solid tissue mounted on glass slides is currently the primary method used by pathologists for determining the stage, type and subtypes of cancer. Although whole slide images are usually large (tens to hundreds of thousands of pixels wide), an exhaustive though time-consuming assessment is necessary to reduce the risk of misdiagnosis. In an effort to address the many diagnostic challenges faced by trained experts, recent research has been focused on developing automatic prediction systems for this multi-class classification problem. Typically, complex convolutional neural network (CNN) architectures, such as Google’s Inception, are used to tackle this problem. Here, we introduce a greatly simplified CNN architecture, PathCNN, which allows for more efficient use of computational resources and better classification performance. Using this improved architecture, we trained simultaneously on whole-slide images from multiple tumor sites and corresponding non-neoplastic tissue. Dimensionality reduction analysis of the weights of the last layer of the network captures groups of images that faithfully represent the different types of cancer, highlighting at the same time differences in staining and capturing outliers, artifacts and misclassification errors. Our code is available online at: https://github.com/sedab/PathCNN.

