Optimization of Support Vector Machine Method Using Feature Selection to Improve Classification Results

2021, Vol 4 (1), pp. 22-27
Author(s): Saikin Saikin, Sofiansyah Fadli, Maulana Ashari, ...

An organization's performance depends on the quality of its employees. Good or bad employee performance affects productivity and, in turn, the company's profits. Support Vector Machine (SVM) is a machine learning method based on statistical learning theory that can handle highly non-linear problems, regression, and more. In machine learning, model optimization aims to improve the accuracy of the learned model. Several techniques are used, one of which is feature selection, which reduces the dimensionality of the data and thereby the computation needed for modeling. This study applies machine learning to employee data from Bank Rakyat Indonesia (BRI). The method is SVM, with accuracy improved through feature selection using a wrapper algorithm. In the classification test that combines SVM with cross-validation, the average accuracy is 72 percent, with a precision of 71 percent and a recall rounded to 72 percent. The data come from Kaggle and consist of training data with 30 columns and 22005 rows and testing data with 29 columns and 6000 rows. The study obtains a classification score of 82 percent, with a precision rounded to 82 percent, a recall of 86 percent, and an f1-score of 81 percent.
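The accuracy, precision, recall, and f1-score reported above can be illustrated with a small worked example. This is a minimal sketch for a binary problem; the labels below are made up for illustration and are not the BRI data, and the study may have used a different averaging scheme (for example a weighted average over classes).

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall can differ noticeably from accuracy when the classes are imbalanced, which is one reason abstracts like this one report all of them.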

2020, Vol 8 (6), pp. 2862-2867

E-commerce is a website or mobile application platform that helps people buy products. Before purchasing, customers usually decide whether to buy by reading reviews from previous buyers. The problem is that there are so many reviews that reading them all takes a long time. This research uses sentiment analysis to classify review data. Sentiment analysis, or opinion mining, is a machine learning approach for classifying and analysing texts or documents according to the sentiments, emotions, and opinions they express. Here it is used to classify product reviews from e-commerce websites into positive or negative classes. The results can be processed further to summarize customers' opinions about a product without reading every single review. The goal of this research is to optimize classification performance by using feature selection. Term Frequency-Inverse Document Frequency (TF-IDF) feature extraction, Backward Elimination feature selection, and five classifiers (Naïve Bayes, Support Vector Machine, K-Nearest Neighbour, Decision Tree, Random Forest) were used to analyse the sentiment of the reviews. The dataset is in the Indonesian language and is labelled with two classes (positive and negative). The best accuracy, 85.97%, is achieved with TF-IDF, Backward Elimination, and Support Vector Machine (SVM), an increase of 7.91% compared with the same process without feature selection. Based on the results, Backward Elimination feature selection improved performance for all classifiers used in this research.
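Backward elimination can be sketched as a greedy loop that starts from all features and keeps dropping one as long as the score does not get worse. The sketch below is generic and is not the authors' implementation: `score_fn` stands in for something like cross-validated classifier accuracy on TF-IDF features, and `toy_score` with its term weights is a made-up scorer used only to make the example runnable.

```python
def backward_elimination(features, score_fn):
    """Greedy backward elimination over a list of feature names.

    At each step, try removing every remaining feature, keep the removal
    that yields the best score, and stop once every removal hurts.
    """
    selected = list(features)
    best_score = score_fn(selected)
    while len(selected) > 1:
        candidates = []
        for feature in selected:
            subset = [f for f in selected if f != feature]
            candidates.append((score_fn(subset), subset))
        cand_score, cand_subset = max(candidates, key=lambda c: c[0])
        if cand_score < best_score:
            break  # every possible removal lowers the score
        best_score, selected = cand_score, cand_subset
    return selected, best_score


# Hypothetical scorer: informative terms raise the score, noise terms lower it.
USEFUL = {"bagus": 0.15, "buruk": 0.12, "cepat": 0.05}


def toy_score(subset):
    gain = sum(USEFUL.get(f, 0.0) for f in subset)
    noise = 0.02 * sum(1 for f in subset if f not in USEFUL)
    return 0.5 + gain - noise


terms = ["bagus", "buruk", "cepat", "dan", "yang"]
selected, score = backward_elimination(terms, toy_score)
print(selected, round(score, 2))
```

In a real pipeline each call to `score_fn` retrains and evaluates a classifier, so the cost grows quickly with the number of features; that cost is the usual trade-off of wrapper-style selection.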


2016, Vol 78 (5-10)
Author(s): Farzana Kabir Ahmad, Abdullah Yousef Awwad Al-Qammaz, Yuhanis Yusof

Human-computer intelligent interaction (HCII) is a rising field of science that aims to refine and enhance the interaction between computers and humans. Since emotion plays a vital role in daily human life, the ability of a computer to interpret and respond to human emotion is a crucial element of future intelligent systems. Accordingly, several studies have sought to recognise human emotion using different signals such as facial expression, speech, galvanic skin response (GSR), or heart rate (HR). However, such techniques have problems, mainly in terms of credibility and reliability, because people can fake their feelings and responses. Electroencephalogram (EEG), on the other hand, has been shown to be a very effective way of recognising human emotion, as it records brain activity, which can hardly be altered by voluntary control. Despite the popularity of EEG for recognising human emotion, the field is challenging because the EEG signal is nonlinear, involves myriad factors, and is chaotic in nature. These issues lead to high dimensionality and poor classification results. To address these problems, this study proposes a novel computational model that consists of three main stages: a) feature extraction, b) feature selection, and c) classification. Discrete wavelet packet transform (DWPT) is used to extract features from the EEG signals, and ultimately 204,800 features are obtained from 32 subjects in a subject-independent setting. Genetic Algorithm (GA) and least squares support vector machine (LS-SVM) are used as the feature selection technique and the classifier, respectively. The computational model is tested on the common DEAP pre-processed EEG dataset to classify three levels of valence and arousal. The empirical results show that the proposed GA-LSSVM improves the classification results to 49.22% and 54.83% for valence and arousal respectively, whereas 46.33% for valence and 48.30% for arousal were achieved when no feature selection technique was applied with the identical classifier.


2020, Vol 2020, pp. 1-12
Author(s): A. M. Umbrajkaar, A. Krishnamoorthy, R. B. Dhumale

The Industry 4.0 revolution strongly pushes for the use of machine learning-based processes and condition monitoring. This paper focuses on a machine learning-based approach to condition monitoring of shaft misalignment. It presents a combined approach of artificial neural network and support vector machine for identifying and measuring shaft misalignment. Measuring misalignment under variable load conditions normally requires more features to be extracted, so the primary objective is to measure misalignment with a minimum number of extracted features. This is achieved by normalizing the vibration signal. An experimental setup is prepared to collect the required vibration signals. The normalized, nonstationary time-domain signals are passed to a discrete wavelet transform for feature extraction. Statistics of the extracted detail coefficients, namely skewness, kurtosis, maximum, minimum, root mean square, and entropy, are considered for feature selection. The ReliefF algorithm ranks the features to decide the best one. The ratio of maximum energy to Shannon entropy is used for wavelet selection. The best feature is used to train the machine learning algorithm. The rank-based feature selection improved the classification accuracy of the support vector machine. The results obtained with the combined approach are discussed for different misalignment conditions.
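The statistical features and the energy-to-entropy criterion mentioned above can be written down in a few lines. This is a hedged sketch, not the paper's code: it assumes population (biased) moments, non-excess kurtosis, and a base-2 Shannon entropy over the normalized coefficient energies, and the coefficient values are made up. Computing the wavelet coefficients themselves is left to a wavelet library.

```python
import math


def rms(x):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def central_moment(x, k):
    """k-th central moment using the population (1/n) convention."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** k for v in x) / len(x)


def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5


def kurtosis(x):
    # Non-excess kurtosis: a normal distribution would give about 3.
    return central_moment(x, 4) / central_moment(x, 2) ** 2


def energy_to_entropy_ratio(coeffs):
    """Energy of the coefficients divided by their Shannon entropy.

    A higher ratio is taken to indicate a better-matched wavelet.
    """
    energy = sum(c * c for c in coeffs)
    probs = [c * c / energy for c in coeffs if c != 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return math.inf if entropy == 0 else energy / entropy


# Hypothetical detail coefficients from one decomposition level.
detail = [1.0, -1.0, 1.0, -1.0]
features = {
    "rms": rms(detail),
    "skewness": skewness(detail),
    "kurtosis": kurtosis(detail),
    "max": max(detail),
    "min": min(detail),
}
ratio = energy_to_entropy_ratio(detail)
print(features, ratio)
```

To choose a wavelet, one would compute this ratio for the coefficients produced by each candidate wavelet and keep the candidate with the largest value.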


Author(s): Wahyu Caesarendra

This paper presents EMG signal classification based on PCA and SVM. The data are acquired from 5 subjects, each performing 7 hand gestures: tripod, power, precision closed, finger point, mouse, hand open, and hand close. Each gesture is repeated 10 times (5 repetitions as training data and the remaining 5 as testing data). The training and testing data are each processed by extracting 16 time-domain features, which are then reduced using principal component analysis (PCA) to obtain a new set of features. A support vector machine classifies the new feature set for each subject, giving a training classification rate of 85% to 89%. The trained classifier is then tested on the EMG testing data, reaching an accuracy of 80% to 86%.


2019, Vol 10 (1), pp. 47-54
Author(s): Abdullah Jafari Chashmi, Mehdi Chehel Amirani

Early recognition of heart disease using computer-aided diagnosis (CAD) systems reduces the high fatality rate among cardiac patients. Recognizing heart abnormalities is a demanding task because small changes in ECG signals may not be identified precisely by eye. In this paper, an efficient approach to ECG arrhythmia diagnosis is proposed, based on a combination of discrete wavelet transform and higher-order statistics for feature extraction and entropy-based feature selection. Five classes of heartbeat are classified using a neural network and a support vector machine. With these two classifiers, the proposed system classifies the arrhythmia classes with high accuracy (99.83% and 99.03%, respectively). The advantage of the presented procedure in terms of accuracy has been demonstrated experimentally against other recently presented methods.


2021
Author(s): Qifei Zhao, Xiaojun Li, Yunning Cao, Zhikun Li, Jixin Fan

Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing it is costly. In this study, a total of 4,256 loess samples are collected from the north, east, west and middle regions of Xining. 70% of the samples are used to generate the training data set, and the rest are used to generate the verification data set, in order to construct and validate the machine learning models. The six most important factors are selected from thirteen using Grey Relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void rate, geostatic stress and plasticity limit. To predict the collapsibility of loess, four machine learning methods are studied and compared: Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree). The receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence interval (CI) are used to verify and compare the models in the different research areas. The results show that the RF model is the most efficient at predicting the collapsibility of loess in Xining, with an average AUC above 80%, and can be used in engineering practice.
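The 70/30 split and the AUC indicator described above can be illustrated without any library. This is a minimal sketch under stated assumptions: the shuffle seed, the sample identifiers, and the labels and scores are all invented for illustration, and AUC is computed through its rank interpretation (the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half).

```python
import random


def split_70_30(samples, seed=0):
    """Shuffle the samples and split them 70% training / 30% verification."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.7 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# 4,256 placeholder sample ids, matching the sample count in the abstract.
train, verify = split_70_30(range(4256))
print(len(train), len(verify))

# Hypothetical labels (1 = collapsible) and model scores.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 4))
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; sorting-based implementations are preferable for data sets of realistic size.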


Author(s): Nguyen Thi Ngoc Anh, Nguyen Danh Tu, Vijender Kumar Solanki, Nguyen Linh Giang, Vu Hoai Thu, ...

Background: In recent years, human resource management has played a crucial role in the operation of every company or organization. Both loyal employees and churning employees influence how the organization operates, and the impact of a churning employee differs according to their role in the organization. Objective: We therefore define two Employee Value Models (EVMs) for organizations or companies, based on employee features that are common to almost all companies. Methods: With the development of artificial intelligence, machine learning can now produce data-based predictive models with high accuracy. This paper therefore proposes integrating churn prediction, EVM, and machine learning methods such as support vector machine, logistic regression, and random forest. The strengths of each model are exploited and the weaknesses reduced, to help companies or organizations prevent high-value employees from leaving in the future. The prediction process that integrates churn, employee value, and machine learning is described in detail in 6 steps. The advantage of the integrated model is that it gives a company more useful results than a churn prediction model alone; the disadvantages are the complexity of the model and algorithms and the computing speed. Results: A case study of an organization with 1470 employee positions is carried out to demonstrate the whole process of integrating churn prediction, EVM, and machine learning. The accuracy of the integrated model is high, from 82% to 85%. In addition, some results on churn and employee value are analyzed. Conclusion: This paper proposes upgraded models for predicting which employees may leave an organization, and shows that integrating the employee value model with churn prediction is feasible.


Repositor, 2019, Vol 1 (1), pp. 1
Author(s): Hendra Saputra, Setio Basuki, Mahar Faiqurahman

Android malware has grown significantly as technology advances and the variety of techniques used in Android development increases. Machine learning can now be used to model the static and dynamic feature patterns of Android malware. To classify malware types accurately, the researchers relate the features of an application to the features required by each malware category. The malware categories used are those most widely circulating today. Support Vector Machine (SVM) is used to classify the malware types, specifically a multi-class one-against-one SVM with an RBF kernel. The features used for classification are Permission and Broadcast Receiver. Feature selection is applied in an attempt to improve classification accuracy; the methods used are Correlation-based Feature Selection (CFS), Gain Ratio (GR), and Chi-Square (CHI). Results with feature selection are evaluated alongside results without it. CFS yields an accuracy of 90.83%, GR and CHI yield 91.25%, and the data without feature selection yield 91.67%. The tests show that Permission and Broadcast Receiver can be used to classify malware types, but the feature selection methods used give accuracy slightly below that obtained without feature selection. Keywords: Android malware classification, feature selection, SVM, multi-class one-against-one SVM
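Of the three selection methods named above, chi-square scoring is the simplest to show by hand. The sketch below computes the Pearson chi-square statistic between one categorical feature (for example, whether an app requests a given permission) and the class label; features with larger statistics are ranked higher. The feature values and malware type names are invented for illustration and do not come from the study's dataset.

```python
from collections import Counter


def chi_square(feature_values, labels):
    """Pearson chi-square statistic between a categorical feature and the class.

    Sums (observed - expected)^2 / expected over every cell of the
    feature-value by class contingency table.
    """
    n = len(labels)
    observed = Counter(zip(feature_values, labels))
    feature_totals = Counter(feature_values)
    class_totals = Counter(labels)
    stat = 0.0
    for value, value_count in feature_totals.items():
        for cls, cls_count in class_totals.items():
            expected = value_count * cls_count / n
            stat += (observed[(value, cls)] - expected) ** 2 / expected
    return stat


# Hypothetical data: 8 apps, two malware types.
malware_type = ["sms_trojan"] * 4 + ["adware"] * 4
uses_send_sms = [1, 1, 1, 1, 0, 0, 0, 0]   # perfectly tied to the class
uses_internet = [1, 0, 1, 0, 1, 0, 1, 0]   # independent of the class

informative = chi_square(uses_send_sms, malware_type)
uninformative = chi_square(uses_internet, malware_type)
print(informative, uninformative)
```

A ranking like this looks at each feature in isolation, which may be one reason a filter method can end up slightly below the full feature set, as the abstract reports.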


2019, Vol 11 (21), pp. 2548
Author(s): Dong Luo, Douglas G. Goodin, Marcellus M. Caldas

Disasters change land use and land cover in unpredictable ways. Improving the accuracy of mapping a disaster area at different times is an essential step in analyzing the relationship between human activity and the environment. The goals of this study were to test the performance of different processing procedures and to examine the effect of adding the normalized difference vegetation index (NDVI) as an additional classification feature for mapping land cover changes due to a disaster. Using Landsat ETM+ and OLI images of the Bento Rodrigues mine tailing disaster area, we created two datasets, one with six bands and the other with six bands plus the NDVI. We used support vector machine (SVM) and decision tree (DT) algorithms to build classifier models and validated model performance using 10-fold cross-validation, resulting in accuracies higher than 90%. The processed results indicated that the accuracy could reach or exceed 80%, and the support vector machine performed better than the decision tree. We also calculated each land cover type's sensitivity (true positive rate) and found that Agriculture, Forest and Mine sites had higher values while Bareland and Water had lower values. We then visualized land cover maps for 2000 and 2017 and found that the area of Mine sites roughly doubled, while Forest decreased by 12.43%. Our findings show that it is feasible to create a training data pool and use machine learning algorithms to classify Landsat products from a different year, and that NDVI can improve the classification of vegetation-covered land. Furthermore, this approach can provide a way to analyze land pattern change in a disaster area over time.
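The "six bands plus NDVI" dataset can be sketched as appending one derived value to each pixel's feature vector, using the standard definition NDVI = (NIR - Red) / (NIR + Red). The band order (blue, green, red, NIR, SWIR1, SWIR2), the index positions, and the reflectance values below are assumptions made for illustration, not details taken from the study.

```python
RED_INDEX = 2   # assumed position of the red band in the six-band vector
NIR_INDEX = 3   # assumed position of the near-infrared band


def ndvi(nir, red):
    """Normalized difference vegetation index, in the range [-1, 1]."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def add_ndvi(pixel):
    """Return the six band values with NDVI appended as a seventh feature."""
    return list(pixel) + [ndvi(pixel[NIR_INDEX], pixel[RED_INDEX])]


# Hypothetical surface reflectance for one vegetated pixel.
pixel = [0.05, 0.08, 0.06, 0.42, 0.20, 0.10]
pixel_with_ndvi = add_ndvi(pixel)
print(pixel_with_ndvi)
```

Dense vegetation reflects strongly in the near infrared and weakly in red, so its NDVI is high, which is consistent with the abstract's finding that the index helps most on vegetation-covered classes.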

