A Modified Incremental Support Vector Machine for Regression

2011 ◽  
Vol 135-136 ◽  
pp. 63-69 ◽  
Author(s):  
Jian Guo Wang ◽  
Liang Wu Cheng ◽  
Wen Xing Zhang ◽  
Bo Qin

Support vector machine (SVM) has been shown to exhibit superior predictive power compared to traditional approaches in many studies, such as mechanical equipment monitoring and diagnosis. However, SVM training is very costly in terms of time and memory consumption due to the enormous amounts of training data and the quadratic programming problem. To improve SVM training speed and accuracy, we propose a modified incremental support vector machine (MISVM) for regression problems in this paper. The main idea is to use the distance from each margin vector that violates the Karush-Kuhn-Tucker (KKT) condition to the final decision hyperplane to evaluate the importance of that margin vector: margin vectors whose distance is below a specified value are preserved, and the others are eliminated. Then the original SVs and the remaining margin vectors are used to train a new SVM. The proposed MISVM not only eliminates unimportant samples such as noise samples, but also preserves the important samples. The effectiveness of the proposed MISVM is demonstrated with two UCI data sets. These experiments also show that the proposed MISVM is competitive with previously published methods.
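The sample-selection step described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes an epsilon-insensitive SVR, treats a new sample as a KKT violator when its absolute residual exceeds `epsilon`, and uses that residual as a stand-in for the distance to the regression function. The names `select_candidates`, `epsilon`, `max_distance`, and the toy predictor `current_model` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]  # (features, target)


def select_candidates(
    predict: Callable[[Sequence[float]], float],
    new_samples: List[Sample],
    epsilon: float,
    max_distance: float,
) -> List[Sample]:
    """Keep new samples that violate the KKT condition but stay close to the fit."""
    kept = []
    for x, y in new_samples:
        residual = abs(y - predict(x))
        violates_kkt = residual > epsilon       # outside the epsilon-tube
        close_enough = residual < max_distance  # assumed not to be noise
        if violates_kkt and close_enough:
            kept.append((x, y))
    return kept


# Hypothetical current model f(x) = 2 * x[0], standing in for a trained SVR.
def current_model(x: Sequence[float]) -> float:
    return 2.0 * x[0]


batch = [([1.0], 2.05), ([2.0], 4.5), ([3.0], 12.0)]
retained = select_candidates(current_model, batch, epsilon=0.1, max_distance=1.0)
# The retraining set would then be the old support vectors plus `retained`.
```

In this toy batch the first sample lies inside the tube, the third is treated as noise because it is too far from the fit, and only the second is retained for retraining.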

Author(s):  
Robin C. Gilbert ◽  
Shivakumar Raman ◽  
Theodore B. Trafalis ◽  
Suleiman M. Obeidat ◽  
Juan A. Aguirre-Cruz

Nonlinear forms such as the cone, sphere, cylinder, and torus present significant problems in representation and verification. In this paper we examine linear and nonlinear forms using a heavily modified support vector machine (SVM) technique. The SVM approach applied to regression problems is used to derive quadratic programming problems that allow for generalized symbolic solutions to nonlinear regression. We have tested our approach on several geometries and achieved excellent results even with small data sets, making this method robust and efficient. More importantly, we identify process or inspection tendencies that could help in designing the processes better. Adaptive feature verification can be achieved through effective identification of the manufacturing pattern.


Author(s):  
PAK KIN WONG ◽  
CHI MAN VONG ◽  
CHUN SHUN CHEUNG ◽  
KA IN WONG

To predict the performance of a diesel engine, current practice relies on black-box identification, where numerous experiments must be carried out in order to obtain numerical values for model training. Although many diesel engine models based on artificial neural networks (ANNs) have already been developed, they have many drawbacks such as local minima, the user burden of selecting the optimal network structure, large training data size, and poor generalization performance, which make them difficult to put into practice. This paper proposes to use the extreme learning machine (ELM), which can overcome most of the aforementioned drawbacks, to model the emission characteristics and the brake-specific fuel consumption of the diesel engine under scarce and exponential sample data sets. The resulting ELM model is compared with those developed using popular ANNs such as the radial basis function neural network (RBFNN) and advanced techniques such as the support vector machine (SVM) and its variants, namely the least squares support vector machine (LS-SVM) and the relevance vector machine (RVM). Furthermore, some emission outputs of diesel engines suffer from the problem of exponentiality (i.e., the output y grows exponentially with the input x), which deteriorates the prediction accuracy. A logarithmic transformation is therefore applied to preprocess and post-process the sample data sets in order to improve the prediction accuracy of the model. Evaluation results show that ELM with the logarithmic transformation is better than SVM, LS-SVM, RVM and RBFNN with or without the logarithmic transformation, in terms of both model accuracy and training time.
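The logarithmic pre- and post-processing mentioned in this abstract can be illustrated with a small sketch. It is not the paper's ELM: a plain least-squares line is swapped in as the regressor purely to keep the example short, and the data are synthetic. The names `fit_line` and `predict` are hypothetical.

```python
import math
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Synthetic output that grows exponentially with the input (assumed data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [3.0 * math.exp(0.8 * x) for x in xs]

# Pre-process: train on log(y) instead of y, which makes the trend linear.
intercept, slope = fit_line(xs, [math.log(y) for y in ys])


def predict(x: float) -> float:
    # Post-process: map the model output back to the original scale.
    return math.exp(intercept + slope * x)
```

The same wrapper applies to any regressor: the model only ever sees `log(y)`, and `exp` is applied to its predictions. The transformation requires strictly positive outputs.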


2020 ◽  
Vol 12 (3) ◽  
pp. 516 ◽  
Author(s):  
Anita Sabat-Tomala ◽  
Edwin Raczko ◽  
Bogdan Zagajewski

Invasive and expansive plant species are considered a threat to natural biodiversity because of their high adaptability and low habitat requirements. Species investigated in this research, including Solidago spp., Calamagrostis epigejos, and Rubus spp., are successfully displacing native vegetation and claiming new areas, which in turn severely decreases natural ecosystem richness, as they rapidly encroach on protected areas (e.g., Natura 2000 habitats). Because of the damage caused, the European Union (EU) has committed all its member countries to monitor biodiversity. In this paper we compared two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), to identify Solidago spp., Calamagrostis epigejos, and Rubus spp. on HySpex hyperspectral aerial images. SVM and RF are reliable and well-known classifiers that achieve satisfactory results in the literature. Data sets containing 30, 50, 100, 200, and 300 pixels per class in the training data set were used to train SVM and RF classifiers. The classifications were performed on 430 spectral bands and on the most informative 30 bands extracted using the Minimum Noise Fraction (MNF) transformation. As a result, maps of the spatial distribution of the analyzed species were obtained; high accuracies were observed for all data sets and classifiers (an average F1 score above 0.78). The highest accuracies were obtained using 30 MNF bands and 300 sample pixels per class in the training data set (average F1 score > 0.9). Smaller training data sets resulted in lower average F1 scores, by up to 13 percentage points in the case of 30-pixel samples per class.
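For readers unfamiliar with the average F1 score reported here, a minimal sketch of a per-class F1 and its unweighted (macro) average follows. Whether the paper averaged exactly this way is an assumption, and the class labels and predictions below are invented for illustration.

```python
from typing import Dict, List


def f1_per_class(y_true: List[str], y_pred: List[str]) -> Dict[str, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each true class."""
    scores = {}
    for label in sorted(set(y_true)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def average_f1(y_true: List[str], y_pred: List[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical pixel labels, not data from the study.
truth = ["solidago", "solidago", "rubus", "rubus", "other", "other"]
guess = ["solidago", "rubus", "rubus", "rubus", "other", "solidago"]
per_class = f1_per_class(truth, guess)
mean_f1 = average_f1(truth, guess)
```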


2019 ◽  
Vol 6 (5) ◽  
pp. 190001 ◽  
Author(s):  
Katherine E. Klug ◽  
Christian M. Jennings ◽  
Nicholas Lytal ◽  
Lingling An ◽  
Jeong-Yeol Yoon

A straightforward method for classifying heavy metal ions in water is proposed using statistical classification and clustering techniques from non-specific microparticle scattering data. A set of carboxylated polystyrene microparticles of sizes 0.91, 0.75 and 0.40 µm was mixed with the solutions of nine heavy metal ions and two control cations, and scattering measurements were collected at two angles optimized for scattering from non-aggregated and aggregated particles. Classification of these observations was conducted and compared among several machine learning techniques, including linear discriminant analysis, support vector machine analysis, K-means clustering and K-medians clustering. This study found the highest classification accuracy using the linear discriminant and support vector machine analysis, each reporting high classification rates for heavy metal ions with respect to the model. This may be attributed to moderate correlation between detection angle and particle size. These classification models provide reasonable discrimination between most ion species, with the highest distinction seen for Pb(II), Cd(II), Ni(II) and Co(II), followed by Fe(II) and Fe(III), potentially due to their known sorption with carboxyl groups. The support vector machine analysis was also applied to three different mixture solutions representing leaching from pipes and mine tailings, and showed good correlation with single-species data, specifically with Pb(II) and Ni(II). With more expansive training data and further processing, this method shows promise for low-cost and portable heavy metal identification and sensing.


2021 ◽  
Vol 15 ◽  
Author(s):  
Justine Staal ◽  
Francesco Mattace-Raso ◽  
Hennie A. M. Daniels ◽  
Johannes van der Steen ◽  
Johan J. M. Pel

Background: Research into Alzheimer’s disease has shifted toward the identification of minimally invasive and less time-consuming modalities to define preclinical stages of Alzheimer’s disease. Method: Here, we propose visuomotor network dysfunctions as a potential biomarker in AD and its prodromal stage, mild cognitive impairment with underlying Alzheimer’s disease pathology. The functionality of this network was tested in terms of timing, accuracy, and speed with goal-directed eye-hand tasks. The predictive power was determined by comparing the classification performance of a zero-rule algorithm (baseline), a decision tree, a support vector machine, and a neural network using functional parameters to classify controls without cognitive disorders, mild cognitive impaired patients, and Alzheimer’s disease patients. Results: Fair to good classification was achieved between controls and patients, controls and mild cognitive impaired patients, and between controls and Alzheimer’s disease patients with the support vector machine (77–82% accuracy, 57–93% sensitivity, 63–90% specificity, 0.74–0.78 area under the curve). Classification between mild cognitive impaired patients and Alzheimer’s disease patients was poor, as no algorithm outperformed the baseline (63% accuracy, 0% sensitivity, 100% specificity, 0.50 area under the curve). Comparison with Existing Method(s): The classification performance found in the present study is comparable to that of the existing CSF and MRI biomarkers. Conclusion: The data suggest that visuomotor network dysfunctions have potential in biomarker research and the proposed eye-hand tasks could add to existing tests to form a clear definition of the preclinical phenotype of AD.
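The baseline figures quoted in the results (63% accuracy, 0% sensitivity, 100% specificity) are what a zero-rule classifier produces when it always predicts the majority class. The sketch below reproduces that pattern. The group sizes are hypothetical, chosen only to give a 63% majority, and treating Alzheimer's disease as the positive class is an assumption.

```python
from collections import Counter
from typing import Dict, List


def zero_rule(train_labels: List[str]) -> str:
    """Return the most frequent training label; this is the whole 'model'."""
    return Counter(train_labels).most_common(1)[0][0]


def binary_metrics(y_true: List[str], y_pred: List[str], positive: str) -> Dict[str, float]:
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Hypothetical group sizes, not the study's actual cohort.
labels = ["MCI"] * 63 + ["AD"] * 37
majority = zero_rule(labels)
predictions = [majority] * len(labels)
metrics = binary_metrics(labels, predictions, positive="AD")
```

Because the baseline never predicts the minority class, its accuracy equals the majority share while its sensitivity is zero, which is why accuracy alone can be misleading on imbalanced groups.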


2019 ◽  
Vol 11 (2) ◽  
pp. 144
Author(s):  
Danar Wido Seno ◽  
Arief Wibowo

The growth of written content on social media produces many new words and abbreviations on Twitter, which makes it increasingly difficult for sentiment analysis to reach high accuracy on Twitter text. In this study, the authors conducted sentiment analysis of the pairs of candidates for President and Vice President of Indonesia in the 2019 elections. To obtain higher accuracy and to accommodate the evolving nature of Twitter text, the authors combined unsupervised and supervised methods, namely the Lexicon Based method and Support Vector Machine. The study used Twitter data from October 2018, collected with search keywords containing the names of each pair of candidates, totaling 800 data items. On these 800 items, the best accuracy was 92.5%, obtained with 80% training data and 20% testing data, with per-class precision between 85.7% and 97.2% and per-class recall between 78.2% and 93.5%. Because the Lexicon Based method labels the data set, the Support Vector Machine training data no longer have to be labeled manually, and the lexicon dictionary can be extended as content on Twitter evolves.
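A minimal sketch of the labeling idea follows: a lexicon assigns a sentiment label to each text, and the labeled items are then split 80/20 for training and testing a supervised classifier. The tiny English lexicon, the three class names, and the function names are all hypothetical; the study's actual lexicon and class set are not given in the abstract.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon; a real one would hold many more (Indonesian) terms.
LEXICON: Dict[str, int] = {"good": 1, "great": 1, "honest": 1, "bad": -1, "corrupt": -1}


def lexicon_label(text: str, lexicon: Dict[str, int] = LEXICON) -> str:
    """Sum the polarity of known words and map the sign to a class label."""
    score = sum(lexicon.get(token, 0) for token in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def split_train_test(items: List, train_percent: int = 80) -> Tuple[List, List]:
    """Split a list into leading train and trailing test parts (no shuffling)."""
    cut = len(items) * train_percent // 100
    return items[:cut], items[cut:]


tweets = ["great and honest leader", "corrupt and bad policy", "debate tonight"]
labeled = [(tweet, lexicon_label(tweet)) for tweet in tweets]
train, test = split_train_test(list(range(800)))
```

New slang or abbreviations can be handled by adding entries to the lexicon, without relabeling any data by hand.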


2020 ◽  
Author(s):  
Harith Al-Sahaf ◽  
Mengjie Zhang ◽  
M Johnston

In machine learning, it is common to require a large number of instances to train a model for classification. In many cases, it is hard or expensive to acquire a large number of instances. In this paper, we propose a novel genetic programming (GP) based method for the problem of automatic image classification by adopting a one-shot learning approach. The proposed method relies on the combination of GP and Local Binary Patterns (LBP) techniques to detect a predefined number of informative regions that aim at maximising the between-class scatter and minimising the within-class scatter. Moreover, the proposed method uses only two instances of each class to evolve a classifier. To test the effectiveness of the proposed method, four different texture data sets are used and the performance is compared against two other GP-based methods, namely Conventional GP and Two-tier GP. The experiments revealed that the proposed method outperforms these two methods on all the data sets. Moreover, the Naïve Bayes, Support Vector Machine, and Decision Trees (J48) methods achieved better performance when using features extracted by the proposed method than when using domain-specific or Two-tier GP extracted features. © Springer International Publishing 2013.
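As background for the Local Binary Patterns component, here is a sketch of the basic 8-neighbour LBP operator and the histogram of codes that typically serves as a texture feature. The neighbour ordering is one common convention and is an assumption here; the GP part of the method is not shown.

```python
from typing import List

# Clockwise neighbour offsets starting at the top-left pixel (assumed convention).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image: List[List[int]], row: int, col: int) -> int:
    """8-bit LBP code for an interior pixel; neighbours >= centre set a 1 bit."""
    centre = image[row][col]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        if image[row + dr][col + dc] >= centre:
            code |= 1 << bit
    return code


def lbp_histogram(image: List[List[int]]) -> List[int]:
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 3, 8],
]
code = lbp_code(patch, 1, 1)
histogram = lbp_histogram(patch)
```

For this patch the neighbours 9, 7, and 8 are at least as large as the centre value 6, so bits 1, 3, and 4 are set.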


2021 ◽  
Vol 5 (11) ◽  
pp. 303
Author(s):  
Kian K. Sepahvand

Damage detection using vibrational properties, such as eigenfrequencies, is an efficient and straightforward method for detecting damage in structures, components, and machines. The method, however, is very inefficient when the values of the natural frequencies of damaged and undamaged specimens exhibit slight differences. This is particularly the case with lightweight structures, such as fiber-reinforced composites. The nonlinear support vector machine (SVM) provides enhanced results under such conditions by transforming the original features into a new space or applying a kernel trick. In this work, the natural frequencies of damaged and undamaged components are used for classification, employing the nonlinear SVM. The proposed methodology assumes that the frequencies are identified sequentially from an experimental modal analysis; for the purposes of this study, however, the training data are generated from FEM simulations for damaged and undamaged samples. It is shown that the nonlinear SVM using a kernel function yields a clear classification boundary between damaged and undamaged specimens, even for minor variations in natural frequencies.
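The kernel trick mentioned here replaces an explicit feature transformation with a kernel function evaluated on pairs of samples. The abstract does not say which kernel was used, so the sketch below assumes the widely used Gaussian (RBF) kernel; the frequency values and `gamma` are invented for illustration.

```python
import math
from typing import Sequence


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float) -> float:
    """Gaussian kernel k(a, b) = exp(-gamma * ||a - b||^2)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


# Hypothetical first two natural frequencies in Hz, not values from the study.
undamaged = [100.0, 250.0]
damaged = [99.5, 248.0]

self_similarity = rbf_kernel(undamaged, undamaged, gamma=0.5)
cross_similarity = rbf_kernel(undamaged, damaged, gamma=0.5)
```

A larger `gamma` makes the kernel more sensitive to small frequency shifts, which is one reason a nonlinear SVM can separate specimens whose frequencies differ only slightly.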


Author(s):  
Noviah Dwi Putranti ◽  
Edi Winarko

Sentiment analysis in this research is the process of classifying textual documents into two classes, positive and negative sentiment. Opinion data were obtained from the social network Twitter using queries in Indonesian. This research aims to determine public sentiment toward a particular object expressed on Twitter in Indonesian, and thereby to help businesses conduct market research on public opinion. The collected data are preprocessed and POS-tagged to produce a classification model through a training process. Sentiment-bearing words were collected with a dictionary-based approach, which produced 18,069 words in this study. The Maximum Entropy algorithm is used for the POS tagger, and the Support Vector Machine is used to build the classification model on the training data. The features are unigrams with TF-IDF weighting. The classification achieved an accuracy of 86.81% in 7-fold cross-validation with the sigmoid kernel. Manual class labeling with the POS tagger yielded an accuracy of 81.67%. Keywords: sentiment analysis, classification, maximum entropy POS tagger, support vector machine, Twitter.
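The unigram TF-IDF weighting named in this abstract can be sketched as below. TF-IDF has several variants; this version uses relative term frequency multiplied by `log(N / df)`, which is an assumption rather than the formula confirmed by the paper. The example texts are invented.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(documents: List[str]) -> List[Dict[str, float]]:
    """Return one {unigram: weight} mapping per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each unigram occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


docs = ["good service", "bad service"]
weights = tfidf(docs)
```

A word that occurs in every document, such as "service" here, receives weight zero, so the classifier is driven by the words that distinguish one text from another.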


2017 ◽  
Vol 9 (4) ◽  
pp. 416 ◽  
Author(s):  
Nelly Indriani Widiastuti ◽  
Ednawati Rainarli ◽  
Kania Evita Dewi

Classification is the process of grouping objects that have the same features or characteristics into several classes. Automatic document classification uses the frequency of words that appear in the training data as features. A large number of documents increases the number of words that appear as features. Therefore, summaries are used to reduce the number of words used in classification. The classification uses the multiclass Support Vector Machine (SVM) method, which has a good reputation for classification. This research tests the effect of using summaries as feature selection for document classification. The summaries reduce the text to 50%. The results show that the summaries did not affect the accuracy of document classification with SVM, but they did improve the accuracy of the Simple Logistic classifier. The classification tests also show that the accuracy of Naïve Bayes Multinomial (NBM) is better than that of SVM.

