Pengaruh Algoritma ADASYN dan SMOTE terhadap Performa Support Vector Machine pada Ketidakseimbangan Dataset Airbnb (The Effect of the ADASYN and SMOTE Algorithms on Support Vector Machine Performance on an Imbalanced Airbnb Dataset)

2021
Vol 5 (1)
pp. 11-20
Author(s):
Wahyu Hidayat
Mursyid Ardiansyah
Arief Setyanto
...

Travel is increasingly common around the world. Some tourist attractions are far from the city center, which makes hotels hard to reach from them; Airbnb is a platform that offers home- and apartment-based rentals. Its lodging offers come from two types of hosts: non-super hosts and super hosts. A host earns the super-host badge by having a good reputation and meeting the platform's requirements. Super hosts gain advantages such as more visibility, increased earning potential, and exclusive rewards. This study uses the Support Vector Machine (SVM) algorithm to classify hosts based on these criteria. The data set is imbalanced: the super-host population is smaller than the non-super-host population. To overcome the imbalance, oversampling is applied using ADASYN and SMOTE. The goal of the research was to determine the performance of the SVM algorithm combined with the ADASYN and SMOTE oversampling techniques. The analysis used oversampling to handle the imbalanced data set and a confusion matrix to compute precision, recall, F1-score, and accuracy. The results show that SMOTE with SVM raises accuracy by 1 percentage point, from 80% to 81%, driven by an increase in the test results for the True (minority) label and a decrease in those for the False (majority) label. SMOTE SVM performs better than both ADASYN SVM and SVM without oversampling.
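The oversampling step can be illustrated with a minimal sketch of SMOTE-style interpolation. This is not the authors' code: the function name `smote_sketch`, the toy points, and the value of `k` are illustrative assumptions, and a real study would more likely rely on a library implementation. Each synthetic point is placed on the line segment between a minority sample and one of its k nearest minority neighbors. ADASYN follows the same idea but generates more synthetic points for minority samples that have more majority-class neighbors.

```python
import math
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbor = rng.choice(others[:k])
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbor))
        )
    return synthetic


# Hypothetical toy data: four minority-class (super host) points in 2-D.
minority_points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_sketch(minority_points, n_new=6)
print(len(new_points))
```

Because every synthetic point is a convex combination of two existing minority points, the new samples stay inside the region the minority class already occupies. Oversampling of this kind is normally applied to the training split only, so that the test set keeps the original class ratio.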

2019
Vol 47 (3)
pp. 154-170
Author(s):
Janani Balakumar
S. Vijayarani Mohan

Purpose: Owing to the huge volume of documents available on the internet, text classification has become a necessary task for handling them. To achieve optimal text classification results, feature selection, an important stage, is used to reduce the dimensionality of text documents by choosing suitable features. The main purpose of this research is to classify documents stored on a personal computer based on their content. Design/methodology/approach: This paper proposes a new feature selection algorithm based on the artificial bee colony (ABCFS) to enhance text classification accuracy. The proposed algorithm is evaluated on real and benchmark data sets and compared with existing feature selection approaches such as information gain and the χ2 statistic. To demonstrate the efficiency of the proposed algorithm, the support vector machine (SVM) and an improved SVM classifier are used. Findings: The experiment was conducted on real and benchmark data sets. The real data set was collected from documents stored on a personal computer, and the benchmark data sets were drawn from the Reuters and 20 Newsgroups corpora. The results show that the proposed feature selection algorithm improves text document classification accuracy. Originality/value: This paper proposes a new ABCFS algorithm for feature selection, evaluates its efficiency, and improves the support vector machine. Here the ABCFS algorithm is used to select features from text (unstructured) documents; in existing work, the algorithm has been used to select features from structured data rather than text. The proposed algorithm classifies documents automatically based on their content.


2017
Vol 26 (2)
pp. 323-334
Author(s):
Piyabute Fuangkhon

Abstract Multiclass contour-preserving classification (MCOV) has been used to preserve the contour of the data set and improve the classification accuracy of a feed-forward neural network. It synthesizes two types of new instances, called fundamental multiclass outpost vectors (FMCOVs) and additional multiclass outpost vectors (AMCOVs), in the middle of the decision boundary between consecutive classes of data. This paper compares the generalization obtained with a support vector machine (SVM) when FMCOVs, AMCOVs, or both kinds of MCOVs are included in the final training sets. The experiments were carried out using MATLAB R2015a and LIBSVM v3.20 on seven types of final training sets generated from each of the synthetic and real-world data sets taken from the University of California Irvine machine learning repository and the ELENA project. The experimental results confirm that including FMCOVs in final training sets that contain the raw data can significantly improve SVM classification accuracy.


2021
Vol 9 (4)
pp. 467
Author(s):
Putu Agus Prawira Dharma Yuda
I Putu Gede Hendra Suputra

The internet has grown significantly: worldwide it has reached more than 4 billion users, and in Indonesia there are more than 171 million users out of a total population of more than 273 million people. This growth is due to the very fast development of information technology and of various kinds of media and functions. However, advances in internet technology have not escaped internet attacks, one of which is phishing. Phishing is an activity that threatens or traps someone by luring them, that is, by tricking the person into indirectly providing all the information the attacker needs. Phishing is a form of cybercrime, crime carried out through computer networks. Over time, crime has become increasingly widespread throughout the world, so many of today's threats also arrive via computers. Given such cases, this study aims to predict phishing sites with a classification algorithm, namely the Support Vector Machine (SVM). The research classified a phishing website data set and then calculated the accuracy for each kernel. The results show that SVM with the Gaussian RBF kernel performs best, with 88.92% accuracy, and SVM with the sigmoid kernel performs worst, with 79.33% accuracy.
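The two kernels at the extremes of this comparison differ only in how they score the similarity of two feature vectors. The sketch below writes out the standard textbook forms of the Gaussian RBF and sigmoid kernels; the `gamma` and `coef0` values and the toy vectors are assumptions chosen for illustration, not settings reported by the study.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)


# Hypothetical feature vectors.
u = (1.0, 0.0)
v = (0.0, 1.0)
print(rbf_kernel(u, u), rbf_kernel(u, v))
print(sigmoid_kernel(u, u), sigmoid_kernel(u, v))
```

The RBF kernel depends only on the distance between the two vectors and always equals 1 for identical inputs, while the sigmoid kernel depends on their dot product and is not positive semidefinite for every parameter choice. Whether that property explains the accuracy gap reported here is not something the abstract states.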


Author(s):  
Christ Memory Sitorus ◽  
Adhi Rizal ◽  
Mohamad Jajuli

Ride-hailing services are booming thanks to internet technology, which is why many people call them online transportation. The large potential for growth in the number of online transportation users also increases the risk that user satisfaction declines, so companies are improving their service, both in the application and in the services provided by their partners/drivers. During each trip, the online transportation application records device movement data and sends it to the server. This data set is usually called telematics data, and it can yield enormous benefits if processed. This study predicts the risk of online transportation trips using the Support Vector Machine (SVM) algorithm on the obtained telematics data. Because the data is raw telematics data, it is first processed with feature engineering to obtain 51 features and then used to train an SVM with an RBF kernel and varying C values. For every C value, K-fold cross-validation with k = 5 is first used to separate the testing data from the training data. Each trial reports accuracy, the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC); the best results are obtained at C = 100 and the worst at C = 0.001.
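The evaluation loop described above, 5-fold cross-validation repeated for each C value, can be sketched as follows. This is an illustrative outline rather than the study's code: `k_fold_indices`, `cross_validate`, and the `evaluate` callback are hypothetical names, and only the endpoints 0.001 and 100 of the C grid come from the abstract. The dummy scorer stands in for training an RBF-kernel SVM so that the sketch runs without any machine learning library.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and yield (train, test) index lists for k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(evaluate, n_samples, c_values, k=5):
    """Average a user-supplied score over k folds for each C value.

    evaluate(C, train, test) is a hypothetical callback that would train an
    RBF-kernel SVM on the train indices and return its score on the test
    indices.
    """
    results = {}
    for c in c_values:
        scores = [evaluate(c, train, test)
                  for train, test in k_fold_indices(n_samples, k)]
        results[c] = sum(scores) / len(scores)
    return results


# Assumed grid: only the endpoints 0.001 and 100 come from the abstract.
C_GRID = [0.001, 0.01, 0.1, 1, 10, 100]

# Dummy scorer standing in for real SVM training; it only reports the
# fraction of samples held out in each fold.
demo = cross_validate(lambda c, tr, te: len(te) / (len(tr) + len(te)),
                      n_samples=20, c_values=C_GRID)
print(demo)
```

Using the same shuffled folds for every C value keeps the comparison between C values fair, since each setting is scored on identical train/test splits.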


2021
Vol 5 (2)
pp. 475
Author(s):
Ade Clinton Sitepu
Wanayumini Wanayumini
Zakarias Situmorang

Cyberbullying is bullying carried out through media technology. It has become frequent as social media technology has spread through society. Techniques are needed to filter out bullying comments because they indirectly affect the psychological condition of readers, especially the person they are aimed at. With data mining techniques, a system is expected to be able to classify information circulating in the community. This research uses Support Vector Machine (SVM) classification because the algorithm performs well on classification tasks. The research uses a data set of about 1000 comments. The data are first labeled manually as "bully" or "not bully" and then divided into training data and test data. To test the system's capability, the data are analyzed using a confusion matrix. The results show that the SVM algorithm classifies with 87.75% accuracy, 89% precision, and 91% recall. On the training data, the SVM algorithm reaches 98.3% accuracy.
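The accuracy, precision, and recall figures quoted above are all derived from the four cells of a binary confusion matrix. The sketch below shows the standard formulas; the counts passed in are hypothetical and are not the study's confusion matrix, which the abstract does not report.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard metrics from binary confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives, with "bully" treated as the positive class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=10, fn=10, tn=90)
print(metrics)
```

Precision answers how many comments flagged as "bully" really were, while recall answers how many actual "bully" comments were caught, so the two can move in opposite directions when the decision threshold changes.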


Author(s):  
BING-YU SUN ◽  
DE-SHUANG HUANG ◽  
HAI-TAO FANG ◽  
XING-MING YANG

Lidar is an active remote sensing instrument, but its effective range is often limited by the signal-to-noise ratio (SNR), because noise and fluctuations always strongly affect the measured results. To resolve this problem, this paper proposes a novel approach that uses the least-squares support vector machine (LS-SVM) to reconstruct the Lidar signal. LS-SVM has been shown to be robust to noisy data, and the Lidar signal, which is strongly corrupted by noise or fluctuations, can be regarded as a function of distance. Detecting Lidar signals in a high-noise regime can therefore be treated as a robust regression procedure that estimates the underlying relationship from the detected signal data set. To apply LS-SVM to Lidar signal regression, the noise in the Lidar signal is first analyzed, and the traditional LS-SVM algorithm is then modified to incorporate a priori knowledge of the Lidar signal into LS-SVM training. The experimental results demonstrate the effectiveness and efficiency of the approach.


2021 ◽  
Author(s):  
Mehrnaz Ahmadi ◽  
Mehdi Khashei

Abstract Support vector machines (SVMs) are among the most popular and widely used approaches in modeling. Various kinds of SVM models have been developed in the prediction and classification literature to serve different purposes. Crisp and fuzzy support vector machines are a well-known branch of modeling approaches that are frequently applied to certain and uncertain modeling, respectively. However, each of these models can be used efficiently only in its own domain and cannot yield appropriate and accurate results in the opposite situation, whereas real-world systems and data sets often contain both certain and uncertain patterns that are intricately mixed together and need to be modeled simultaneously. In this paper, a generalized support vector machine (GSVM) is proposed that simultaneously benefits from the unique advantages of the certain and uncertain versions of traditional support vector machines in their own specialized categories. In the proposed model, the underlying data set is first categorized into two classes of certain and uncertain patterns. Then, certain patterns are modeled by a support vector machine, and uncertain patterns are modeled by a fuzzy support vector machine. After that, the function of the relationship, as well as the relative importance of each component, is estimated by another support vector machine, and the final forecasts of the proposed model are then calculated. Empirical results for wind speed forecasting indicate that the proposed method not only achieves more accurate results than support vector machines (SVMs) and fuzzy support vector machines (FSVMs) but also yields better forecasting performance than traditional fuzzy and nonfuzzy single models and traditional preprocessing-based hybrid models of SVMs.


2022
Vol 12 (1)
Author(s):
Yusuf Essam
Yuk Feng Huang
Ahmed H. Birima
Ali Najah Ahmed
Ahmed El-Shafie

Abstract High loads of suspended sediments in rivers are known to have detrimental effects on potable water sources, river water quality, irrigation activities, and dam or reservoir operations. For this reason, the study of suspended sediment load (SSL) prediction is important for monitoring and damage mitigation purposes. The present study develops and tests machine learning (ML) models, based on the support vector machine (SVM), artificial neural network (ANN), and long short-term memory (LSTM) algorithms, to predict SSL from 11 different river data sets comprising streamflow (SF) and SSL data obtained from the Malaysian Department of Irrigation and Drainage. The main objective of the present study is to propose a single model that is capable of accurately predicting SSLs for any river data set within Peninsular Malaysia. The ANN3 model, based on the ANN algorithm and input scenario 3 (inputs consisting of current-day SF, previous-day SF, and previous-day SSL), is determined to be the best model in the present study: it produced the best predictive performance for 5 of the 11 tested data sets and obtained the highest average RM, with a score of 2.64, among the tested models, indicating that it is the most reliable at producing relatively high-accuracy SSL predictions for different data sets. Therefore, the ANN3 model is proposed as a universal model for the prediction of SSL within Peninsular Malaysia.
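Input scenario 3 amounts to building lagged features from the two daily series. The sketch below shows one way to assemble those inputs; the function name `build_scenario3`, the toy values, and the assumption that the target is the current-day SSL are illustrative and are not taken from the paper.

```python
def build_scenario3(streamflow, ssl):
    """Pair each day's inputs (SF_t, SF_{t-1}, SSL_{t-1}) with a target.

    The target is assumed here to be the current-day SSL (SSL_t). The first
    day is skipped because it has no previous-day values.
    """
    if len(streamflow) != len(ssl):
        raise ValueError("series must have the same length")
    inputs, targets = [], []
    for t in range(1, len(ssl)):
        inputs.append((streamflow[t], streamflow[t - 1], ssl[t - 1]))
        targets.append(ssl[t])
    return inputs, targets


# Hypothetical daily values, not data from the study.
sf = [10.0, 12.0, 15.0, 11.0]
ssl = [100.0, 130.0, 170.0, 120.0]
X, y = build_scenario3(sf, ssl)
print(X[0], y[0])
```

Each row of `X` can then be fed to any of the compared model families. With time series like these, splitting training and test data chronologically avoids leaking future values into training.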


2020
Vol 12 (3)
pp. 516
Author(s):
Anita Sabat-Tomala
Edwin Raczko
Bogdan Zagajewski

Invasive and expansive plant species are considered a threat to natural biodiversity because of their high adaptability and low habitat requirements. The species investigated in this research, Solidago spp., Calamagrostis epigejos, and Rubus spp., are successfully displacing native vegetation and claiming new areas, which in turn severely decreases natural ecosystem richness as they rapidly encroach on protected areas (e.g., Natura 2000 habitats). Because of the damage caused, the European Union (EU) has committed all its member countries to monitoring biodiversity. In this paper we compared two machine learning algorithms, Support Vector Machine (SVM) and Random Forest (RF), for identifying Solidago spp., Calamagrostis epigejos, and Rubus spp. on HySpex hyperspectral aerial images. SVM and RF are reliable and well-known classifiers that achieve satisfactory results in the literature. Training data sets containing 30, 50, 100, 200, and 300 pixels per class were used to train the SVM and RF classifiers. The classifications were performed on 430 spectral bands and on the 30 most informative bands extracted using the Minimum Noise Fraction (MNF) transformation. As a result, maps of the spatial distribution of the analyzed species were produced; high accuracies were observed for all data sets and classifiers (an average F1 score above 0.78). The highest accuracies were obtained using 30 MNF bands and 300 sample pixels per class in the training data set (average F1 score > 0.9). Smaller training data sets resulted in lower average F1 scores, by up to 13 percentage points in the case of 30-pixel samples per class.


2019
Vol 7 (4)
pp. 166-171
Author(s):
Hapsoro Agung Nugroho
Haryas Subyantara Wicaksana

The threat of earthquakes extends across most of the Indonesian archipelago. The use of smartphone accelerometers as seismic parameter detectors in Indonesia is hampered by noise, mainly from human activity. This study aims to classify linear acceleration signals caused by human activity and earthquake acceleration signals, as an initial effort to reduce human-activity noise in smartphone accelerometer signals. Both kinds of signal are classified using the Support Vector Machine (SVM) algorithm in several steps: data collection, data preprocessing, data segmentation, feature extraction, and classification. The algorithms are tested on 2545 human activity signals recorded in a trouser pocket, 2430 human activity signals recorded in a shirt pocket, and earthquake acceleration signals. Based on confusion-matrix test results, linear acceleration signals caused by human activity and earthquake acceleration signals can be classified properly using an SVM with a polynomial or Gaussian kernel and a small kernel scale value. The algorithms achieve accuracies of 87.74% to 97.94%.
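The segmentation and feature extraction steps of such a pipeline can be sketched in a few lines. Everything specific here is an assumption made for illustration: the abstract does not state the window length, the overlap, or which features were extracted, so the window size, step, and the three summary statistics below are hypothetical choices.

```python
import statistics


def segment(signal, window, step):
    """Split a 1-D signal into fixed-length windows taken every `step` samples."""
    return [signal[i:i + window]
            for i in range(0, len(signal) - window + 1, step)]


def extract_features(window):
    """Summary statistics for one window (an assumed feature set)."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),
        "peak": max(abs(v) for v in window),
    }


# Hypothetical linear-acceleration samples, not data from the study.
samples = [0.0, 0.1, -0.2, 0.4, -0.1, 0.3, 0.0, -0.3]
windows = segment(samples, window=4, step=2)
features = [extract_features(w) for w in windows]
print(len(windows), features[0])
```

Each feature dictionary would become one row of the classifier's input, labeled as human activity or earthquake according to the recording it came from.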

