Multiclass Contour-Preserving Classification with Support Vector Machine (SVM)

2017 ◽  
Vol 26 (2) ◽  
pp. 323-334 ◽  
Author(s):  
Piyabute Fuangkhon

Multiclass contour-preserving classification (MCOV) preserves the contour of a data set and improves the classification accuracy of a feed-forward neural network. It synthesizes two types of new instances, the fundamental multiclass outpost vector (FMCOV) and the additional multiclass outpost vector (AMCOV), in the middle of the decision boundary between consecutive classes of data. This paper compares the generalization of support vector machine (SVM) on final training sets that include FMCOVs, AMCOVs, or both. The experiments were carried out using MATLAB R2015a and LIBSVM v3.20 on seven types of final training sets generated from each of the synthetic and real-world data sets from the University of California Irvine machine learning repository and the ELENA project. The experimental results confirm that including FMCOVs in final training sets containing raw data can significantly improve SVM classification accuracy.
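The core idea of synthesizing new instances in the middle of the decision boundary can be sketched with a minimal numpy example. This is an illustrative simplification, not the paper's full FMCOV/AMCOV procedure: for each instance of one class, it finds the nearest instance of the other class and emits the midpoint of the pair. The function name and the toy data are assumptions for demonstration only.

```python
import numpy as np

def midpoint_outpost_vectors(class_a, class_b):
    """For each instance in class_a, find its nearest neighbour in class_b
    and synthesize a new vector at the midpoint of the pair (one synthetic
    vector per instance of class_a)."""
    synth = []
    for a in class_a:
        dists = np.linalg.norm(class_b - a, axis=1)
        b = class_b[np.argmin(dists)]
        synth.append((a + b) / 2.0)
    return np.array(synth)

# Two 2-D clusters separated along the x axis
rng = np.random.default_rng(0)
a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
b = rng.normal(loc=[2.0, 0.0], scale=0.3, size=(20, 2))
mid = midpoint_outpost_vectors(a, b)
print(mid.shape)  # (20, 2)
# The synthesized vectors lie between the two clusters
print(0.0 < mid[:, 0].mean() < 2.0)  # True
```

In the paper, such synthesized vectors are labeled and added to the final training set so the learner places its decision surface along the preserved contour.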

2021 ◽  
Vol 5 (2) ◽  
pp. 62-70
Author(s):  
Ömer KASIM

Cardiotocography (CTG) is used to monitor fetal heart rate signals during pregnancy. Evaluation of these signals by specialists provides information about fetal status. Introducing a clinical decision support system that can automatically classify these signals makes it easier for experts to examine CTG data. In this study, CTG data were analysed with the Extreme Learning Machine (ELM) algorithm and classified as normal, suspicious, and pathological, as well as benign and malicious. The proposed method is validated with the University of California, Irvine CTG data set. Its performance is evaluated with the accuracy, F1 score, Cohen's kappa, precision, and recall metrics. In the experiments, a binary classification accuracy of 99.29% was obtained, with only one false positive. For multi-class classification, the accuracy was 98.12%, with two false positives. The training and testing times of the ELM algorithm were considerably shorter than those of the support vector machine and multi-layer perceptron. These results show that high classification accuracy can be obtained by analysing CTG data in both binary and multi-class settings.
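The speed advantage reported for ELM comes from its training procedure: the input-to-hidden weights are random and never trained, so only the output weights are fitted, in closed form via a pseudoinverse. A minimal numpy sketch of this idea (toy data and function names are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def train_elm(X, y_onehot, n_hidden=50, seed=0):
    """Minimal Extreme Learning Machine: random input weights, sigmoid
    hidden layer, output weights solved in closed form by least squares."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))       # hidden activations
    beta = np.linalg.pinv(H) @ y_onehot          # least-squares output weights
    return W, b, beta

def predict_elm(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)

# Toy two-class problem with well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
y = np.array([0] * 50 + [1] * 50)
W, b, beta = train_elm(X, np.eye(2)[y])
acc = (predict_elm(X, W, b, beta) == y).mean()
print(acc > 0.9)
```

Because training reduces to one pseudoinverse instead of iterative optimization, ELM trains far faster than an SVM or a back-propagated multi-layer perceptron on the same data, which matches the timing result reported above.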


2019 ◽  
Vol 47 (3) ◽  
pp. 154-170
Author(s):  
Janani Balakumar ◽  
S. Vijayarani Mohan

Purpose
Owing to the huge volume of documents available on the internet, text classification becomes a necessary task for handling them. To achieve optimal text classification results, feature selection, an important stage, is used to curtail the dimensionality of text documents by choosing suitable features. The main purpose of this research work is to classify personal computer documents based on their content.

Design/methodology/approach
This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm (ABCFS) is evaluated on real and benchmark data sets and compared against existing feature selection approaches such as information gain and the χ2 statistic. To demonstrate the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used in this paper.

Findings
The experiment was conducted on real and benchmark data sets. The real data set was collected in the form of documents stored on a personal computer, and the benchmark data sets were collected from the Reuters and 20 Newsgroups corpora. The results demonstrate the performance of the proposed feature selection algorithm by enhancing text document classification accuracy.

Originality/value
This paper proposes a new ABCFS algorithm for feature selection, evaluates its efficiency, and improves the support vector machine. Here, the ABCFS algorithm selects features from unstructured text documents, whereas existing work applies artificial bee colony only to structured data features; no such text feature selection algorithm exists in prior work.
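An artificial-bee-colony wrapper search over feature subsets can be sketched compactly. This is a simplified illustration, not the paper's ABCFS: the onlooker phase is omitted, and a nearest-centroid classifier stands in for the SVM as the wrapper fitness. All names, parameters, and the toy data are assumptions.

```python
import numpy as np

def fitness(mask, X, y):
    """Wrapper fitness: nearest-centroid accuracy on the selected features
    (a stand-in for the SVM used in the paper)."""
    if mask.sum() == 0:
        return 0.0
    Xs = X[:, mask.astype(bool)]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)
    return (pred == y).mean()

def abc_feature_select(X, y, n_sources=10, n_iter=30, limit=5, seed=0):
    """Simplified ABC over binary feature masks: each employed bee flips one
    bit of its food source, keeps the change greedily, and a scout
    re-initializes any source that stagnates past `limit` trials."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    sources = rng.integers(0, 2, size=(n_sources, d))
    fits = np.array([fitness(s, X, y) for s in sources])
    trials = np.zeros(n_sources, dtype=int)
    for _ in range(n_iter):
        for i in range(n_sources):
            cand = sources[i].copy()
            cand[rng.integers(d)] ^= 1          # neighbour: flip one feature bit
            f = fitness(cand, X, y)
            if f > fits[i]:
                sources[i], fits[i], trials[i] = cand, f, 0
            else:
                trials[i] += 1
            if trials[i] > limit:               # scout phase: abandon the source
                sources[i] = rng.integers(0, 2, size=d)
                fits[i] = fitness(sources[i], X, y)
                trials[i] = 0
    best = np.argmax(fits)
    return sources[best], fits[best]

# 2 informative features (0 and 1) plus 4 noise features
rng = np.random.default_rng(2)
X = rng.normal(size=(120, 6))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask, best_fit = abc_feature_select(X, y)
print(best_fit > 0.8)
```

The same loop applies to text data once documents are vectorized (e.g. bag-of-words), with each bit of the mask toggling one term feature.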


2017 ◽  
Vol 26 (1) ◽  
pp. 109-121 ◽  
Author(s):  
Piyabute Fuangkhon

Serial multi-class contour-preserving classification improves the representation of the contour of the data to raise the classification accuracy of a feed-forward neural network (FFNN). The algorithm synthesizes fundamental multi-class outpost vectors (FMCOVs) and additional multi-class outpost vectors (AMCOVs) at the decision boundary between consecutive classes of data to narrow the space of the data. Both FMCOVs and AMCOVs help the FFNN place its hyperplanes so that the data are classified more accurately. However, the technique was designed to utilize only one processor, so its execution time is significantly long. This article presents an improved version of serial multi-class contour-preserving classification that overcomes this time deficiency by using thread-level parallelism on multi-processor or multi-core systems. The parallel algorithm distributes the data set and the processing of the FMCOV and AMCOV generators across the available threads to increase CPU utilization and the speedup factors of both generators. The technique has been carefully designed to avoid data dependency issues. Experiments were conducted on both synthetic and real-world data sets. The results confirm that the parallel multi-class contour-preserving classification clearly outperforms the serial version in terms of CPU utilization and speedup factor.


2021 ◽  
Vol 11 (1) ◽  
pp. 29-49
Author(s):  
Amit Kumar ◽  
Bikash Kanti Sarkar

Research in disease diagnosis is a challenging task owing to the inconsistency, class imbalance, conflicting instances, and high dimensionality of medical data sets. Selecting the best features of each data set plays an important role in improving the performance of classifiers, which may follow either iterative or non-iterative approaches. The present study compares the performance of iterative and non-iterative classifiers combined with a genetic algorithm (GA)-based feature selection approach over several widely used medical data sets. The experiment helps identify the clinical data sets for which feature reduction is necessary to improve classifier performance. For the iterative approaches, two popular classifiers, C4.5 and RIPPER, are chosen, whereas k-NN and naïve Bayes are taken as the non-iterative learners. Fourteen real-world medical domain data sets are selected from the University of California, Irvine (UCI) repository for the experiments.
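A GA-based wrapper for feature selection of the kind described here can be sketched in a few dozen lines. This is an illustrative sketch under assumed settings, not the study's implementation: a 1-NN hold-out accuracy serves as the wrapper fitness (echoing the k-NN learner above), and all names, operators, and the toy data are assumptions.

```python
import numpy as np

def knn1_accuracy(mask, Xtr, ytr, Xte, yte):
    """Wrapper fitness: 1-NN hold-out accuracy on the selected features."""
    if mask.sum() == 0:
        return 0.0
    A, B = Xtr[:, mask.astype(bool)], Xte[:, mask.astype(bool)]
    nearest = np.argmin(((B[:, None, :] - A[None]) ** 2).sum(-1), axis=1)
    return (ytr[nearest] == yte).mean()

def ga_feature_select(Xtr, ytr, Xte, yte, pop=16, gens=25, pmut=0.1, seed=0):
    """Minimal generational GA over binary feature masks: size-two tournament
    selection, uniform crossover, bit-flip mutation, one elite survivor."""
    rng = np.random.default_rng(seed)
    d = Xtr.shape[1]
    P = rng.integers(0, 2, size=(pop, d))
    for _ in range(gens):
        f = np.array([knn1_accuracy(m, Xtr, ytr, Xte, yte) for m in P])
        nxt = [P[np.argmax(f)].copy()]                    # elitism
        while len(nxt) < pop:
            i, j = rng.integers(pop, size=2)              # tournament of two
            a = P[i] if f[i] >= f[j] else P[j]
            i, j = rng.integers(pop, size=2)
            b = P[i] if f[i] >= f[j] else P[j]
            child = np.where(rng.random(d) < 0.5, a, b)   # uniform crossover
            child = child ^ (rng.random(d) < pmut)        # bit-flip mutation
            nxt.append(child)
        P = np.array(nxt)
    f = np.array([knn1_accuracy(m, Xtr, ytr, Xte, yte) for m in P])
    return P[np.argmax(f)], f.max()

# Toy set: only feature 0 is informative, features 1-4 are noise
rng = np.random.default_rng(3)
Xtr, Xte = rng.normal(size=(80, 5)), rng.normal(size=(40, 5))
ytr, yte = (Xtr[:, 0] > 0).astype(int), (Xte[:, 0] > 0).astype(int)
mask, best = ga_feature_select(Xtr, ytr, Xte, yte)
print(best > 0.8)
```

The same loop works unchanged with any of the four learners above as the fitness function; only `knn1_accuracy` would be swapped out.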


2015 ◽  
Vol 46 (4) ◽  
pp. 138 ◽  
Author(s):  
Roberto Romaniello ◽  
Alessandro Leone ◽  
Giorgio Peri

The aim of this work is to evaluate the potential of least squares support vector machine (LS-SVM) regression for developing an efficient method to measure the colour of food materials in L*a*b* units by means of a computer vision system (CVS). A laboratory CVS based on a colour digital camera (CDC) was implemented, and three LS-SVM models were trained and validated, one for each output variable (L*, a*, and b*), using the RGB signals generated by the CDC as input variables. The colour-target-based approach was used for camera characterization, and a standard reference target of 242 colour samples was acquired with the CVS and a colorimeter. This data set was split into two sets of equal size for training and validating the LS-SVM models. An effective two-stage grid search over the parameter space was performed in MATLAB to tune the regularization parameter γ and the kernel parameter σ<sup>2</sup> of the three LS-SVM models. A 3-8-3 multilayer feed-forward neural network (MFNN), following the research conducted by León <em>et al.</em> (2006), was also trained to compare its performance with that of the LS-SVM models. The LS-SVM models developed in this research showed better generalization capability than the MFNN and yielded high correlations between the L*a*b* data acquired with the colorimeter and the corresponding data obtained by transforming the RGB data acquired by the CVS. In particular, for the validation set, R<sup>2</sup> values of 0.9989, 0.9987, and 0.9994 were obtained for the L*, a*, and b* parameters. The root mean square error values were 0.6443, 0.3226, and 0.2702 for L*, a*, and b*, respectively, and the average colour difference ΔE<sub>ab</sub> was 0.8232±0.5033 units. Thus, LS-SVM regression appears to be a useful tool for measuring food colour with a low-cost CVS.
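Unlike a standard SVM, LS-SVM regression replaces the quadratic program with a single linear system, which is why training is fast and a grid search over (γ, σ<sup>2</sup>) is cheap. A minimal numpy sketch of that system with an RBF kernel (the 1-D toy target and all parameter values here are illustrative assumptions, not the paper's colour data):

```python
import numpy as np

def rbf(A, B, sigma2):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / sigma2)

def lssvm_fit(X, y, gamma=100.0, sigma2=0.5):
    """LS-SVM regression: the dual reduces to one linear system
    [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]."""
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = rbf(X, X, sigma2) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                      # bias b, dual weights alpha

def lssvm_predict(Xq, X, b, alpha, sigma2=0.5):
    return rbf(Xq, X, sigma2) @ alpha + b

# One-dimensional toy regression: y = sin(x)
X = np.linspace(0, 3, 40).reshape(-1, 1)
y = np.sin(X).ravel()
b, alpha = lssvm_fit(X, y)
rmse = np.sqrt(((lssvm_predict(X, X, b, alpha) - y) ** 2).mean())
print(rmse < 0.05)
```

In the paper's setting, three such models map the (R, G, B) inputs to L*, a*, and b* respectively, with γ and σ<sup>2</sup> tuned per model by the two-stage grid search.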


2021 ◽  
Vol 5 (1) ◽  
pp. 11-20
Author(s):  
Wahyu Hidayat ◽  
Mursyid Ardiansyah ◽  
Arief Setyanto ◽  
...  

Traveling activities are increasingly carried out by people around the world. Hotels are difficult to find near some tourist attractions because those attractions are far from the city center; Airbnb is a platform that provides home- or apartment-based rentals. Among lodging offers there are two types of hosts, non-super hosts and super hosts. The super-host badge is awarded when an innkeeper has a good reputation and meets the platform's requirements. Being a super host carries advantages such as more visibility, increased earning potential, and exclusive rewards. The Support Vector Machine (SVM) algorithm was used to classify hosts by these criteria. The data set is unbalanced: the super-host population is smaller than the non-super-host population. To overcome the imbalance, oversampling was carried out using ADASYN and SMOTE. The research goal was to determine the performance of the ADASYN and SMOTE oversampling techniques with the SVM algorithm. Oversampling was applied to handle the unbalanced data set, and a confusion matrix was used to measure precision, recall, F1-score, and accuracy. The research shows that SMOTE SVM increases the accuracy rate by one percentage point, from 80% to 81%, driven by improved results on the True (minority) label and fewer errors on the False (majority) label; SMOTE SVM outperforms both ADASYN SVM and SVM without oversampling.
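The core of SMOTE is simple: each synthetic minority sample is a random interpolation between a minority instance and one of its k nearest minority neighbours. A minimal numpy sketch (function name, k, and the toy data are illustrative assumptions):

```python
import numpy as np

def smote(X_min, n_new, k=5, seed=0):
    """Minimal SMOTE: generate n_new synthetic minority samples, each a
    random interpolation between a minority instance and one of its
    k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    d2 = ((X_min[:, None, :] - X_min[None]) ** 2).sum(-1)
    nn = np.argsort(d2, axis=1)[:, 1:k + 1]   # k nearest neighbours, self excluded
    out = []
    for _ in range(n_new):
        i = rng.integers(n)
        j = nn[i, rng.integers(k)]
        lam = rng.random()                     # interpolation factor in [0, 1)
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

# Imbalanced toy case: oversample 10 minority instances up to 100
rng = np.random.default_rng(4)
minority = rng.normal(loc=2.0, scale=0.2, size=(10, 3))
synth = smote(minority, n_new=90)
print(synth.shape)  # (90, 3)
```

ADASYN differs mainly in where it generates samples: instead of drawing minority seeds uniformly as above, it generates more synthetic points around minority instances that have many majority-class neighbours, i.e. the harder-to-learn regions.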


2020 ◽  
Vol 8 (5) ◽  
pp. 1557-1560

Support vector machine (SVM) is a well-known, efficient supervised learning algorithm for classification problems. However, the classification accuracy of an SVM classifier depends on both its training parameters and the training data set. The main objective of this paper is to optimize the SVM's parameters and feature weighting simultaneously in order to improve its strength. The Imperialist Competitive Algorithm based Support Vector Machine (ICA-SVM) classifier is proposed for efficient weed detection. The enhanced ICA-SVM classifier is able to select the appropriate input features and optimize the SVM parameters, thereby improving classification accuracy. Experimental results show that the ICA-SVM classification algorithm reduces computational complexity tremendously and improves classification accuracy.


2006 ◽  
Vol 18 (6) ◽  
pp. 1472-1510 ◽  
Author(s):  
Sepp Hochreiter ◽  
Klaus Obermayer

We describe a new technique for the analysis of dyadic data, where two sets of objects (row and column objects) are characterized by a matrix of numerical values that describe their mutual relationships. The new technique, called potential support vector machine (P-SVM), is a large-margin method for the construction of classifiers and regression functions for the column objects. Contrary to standard support vector machine approaches, the P-SVM minimizes a scale-invariant capacity measure and requires a new set of constraints. As a result, the P-SVM method leads to a usually sparse expansion of the classification and regression functions in terms of the row rather than the column objects and can handle data and kernel matrices that are neither positive definite nor square. We then describe two complementary regularization schemes. The first scheme improves generalization performance for classification and regression tasks; the second scheme leads to the selection of a small, informative set of row support objects and can be applied to feature selection. Benchmarks for classification, regression, and feature selection tasks are performed with toy data as well as with several real-world data sets. The results show that the new method is at least competitive with but often performs better than the benchmarked standard methods for standard vectorial as well as true dyadic data sets. In addition, a theoretical justification is provided for the new approach.


2017 ◽  
Vol 26 (2) ◽  
pp. 335-358 ◽  
Author(s):  
Piyabute Fuangkhon

Instance selection endeavors to decide which instances from a data set should be retained for further use during the learning process. It can increase the generalization of the learning model, shorten the learning process, or scale up to large data sources. This paper presents a parallel distance-based instance selection approach for a feed-forward neural network (FFNN), which can utilize all available processing power to reduce the data set while obtaining levels of classification accuracy similar to those achieved with the original data set. The algorithm identifies the instances at the decision boundary between consecutive classes of data, which are essential for placing hyperplane decision surfaces, and retains these instances in the reduced data set (subset). Each identified instance, called a prototype, is a representative of the decision boundary of its class and contributes to the shape or distribution model of the data set. No feature or dimension is sacrificed in the reduction process. Regarding reduction capability, the algorithm obtains approximately 85% reduction power on non-overlapping two-class synthetic data sets, 70% on highly overlapping two-class synthetic data sets, and 77% on multiclass real-world data sets. Regarding generalization, the reduced data sets obtain levels of classification accuracy similar to those of the original data set on both FFNN and support vector machine. Regarding execution time, the speedup of the parallel algorithm over the serial algorithm is proportional to the number of threads the processor can run concurrently.
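The distance-based selection idea can be illustrated with a small sketch: rank every instance by its distance to the nearest instance of another class and keep only the fraction closest to the decision boundary. This is a simplification of the paper's prototype selection, with an assumed `keep_frac` parameter and toy data; note the O(n²) distance matrix is for clarity, not scale.

```python
import numpy as np

def select_near_boundary(X, y, keep_frac=0.3):
    """Distance-based instance selection sketch: keep the instances closest
    to an opposite-class instance (i.e. near the decision boundary).
    All features are retained; only instances are dropped."""
    d2 = ((X[:, None, :] - X[None]) ** 2).sum(-1)
    cross = np.where(y[None, :] != y[:, None], d2, np.inf)
    margin = cross.min(axis=1)            # distance to nearest other-class point
    k = max(1, int(keep_frac * len(X)))
    keep = np.argsort(margin)[:k]         # smallest margins = boundary prototypes
    return X[keep], y[keep]

# Two overlapping 2-D classes, 200 instances in total
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (100, 2)), rng.normal(2, 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
Xr, yr = select_near_boundary(X, y)
print(len(Xr))  # 60 of 200 instances kept (70% reduction)
```

Because each instance's margin is computed independently, the loop over instances parallelizes naturally across threads, which is the property the paper's parallel algorithm exploits.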


2021 ◽  
Author(s):  
Mehrnaz Ahmadi ◽  
Mehdi Khashei

Support vector machines (SVMs) are among the most popular and widely used modeling approaches. Various kinds of SVM models have been developed in the prediction and classification literature to cover different purposes. Crisp and fuzzy support vector machines are well-known branches of modeling, frequently applied to certain and uncertain modeling, respectively. However, each of these models can only be used efficiently in its specified domain and cannot yield appropriate, accurate results when the opposite situation occurs, whereas real-world systems and data sets often contain both certain and uncertain patterns that are intricately mixed together and need to be modeled simultaneously. In this paper, a generalized support vector machine (GSVM) is proposed that can simultaneously benefit from the unique advantages of the certain and uncertain versions of the traditional support vector machine in their own specialized categories. In the proposed model, the underlying data set is first categorized into two classes of certain and uncertain patterns. Then, the certain patterns are modeled by a support vector machine, and the uncertain patterns are modeled by a fuzzy support vector machine. After that, the relationship function, as well as the relative importance of each component, is estimated by another support vector machine, and the final forecasts of the proposed model are calculated. Empirical results for wind speed forecasting indicate that the proposed method not only achieves more accurate results than support vector machines (SVMs) and fuzzy support vector machines (FSVMs) but also yields better forecasting performance than traditional fuzzy and nonfuzzy single models and traditional preprocessing-based hybrid models of SVMs.

