Prediksi Risiko Perjalanan Transportasi Online Dari Data Telematik Menggunakan Algoritma Support Vector Machine (Prediction of Online Transportation Trip Risk from Telematics Data Using the Support Vector Machine Algorithm)

Author(s):  
Christ Memory Sitorus ◽  
Adhi Rizal ◽  
Mohamad Jajuli

The ride-hailing service is now booming thanks to internet technology, which is why many call it online transportation. The large growth potential of the user base also raises the risk of declining user satisfaction, so companies keep improving their service, both in the application itself and in the service provided by partner drivers. During each trip, the online transportation application records device movement data and sends it to the server; this data set is usually called telematics data. If processed, telematics data can yield enormous benefits. In this study, an analysis is conducted to predict the risk of online transportation trips using the Support Vector Machine (SVM) algorithm based on the obtained telematics data. The raw telematics data is first processed with feature engineering to obtain 51 features, then trained using the SVM algorithm with an RBF kernel and varying values of the regularization parameter C. For each C value, 5-fold cross-validation is used to separate the training and testing data. Accuracy, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) are reported for each trial; the best results are obtained at C = 100 and the worst at C = 0.001.
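
As an illustration of the evaluation loop described above (a sketch, not the authors' code), the snippet below sweeps the C values with an RBF-kernel SVM and scores each with 5-fold cross-validation; the 51-feature matrix X and risk labels y are random placeholders.

```python
# Illustrative sketch (not the authors' code) of the evaluation loop: an
# RBF-kernel SVM over the engineered telematics features, sweeping C and
# scoring each value with 5-fold cross-validation on accuracy and ROC AUC.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 51))      # placeholder for the 51 engineered features
y = rng.integers(0, 2, size=500)    # placeholder trip-risk labels

for C in [0.001, 0.01, 0.1, 1, 10, 100]:
    clf = SVC(kernel="rbf", C=C)
    scores = cross_validate(clf, X, y, cv=5, scoring=["accuracy", "roc_auc"])
    print(f"C={C}: acc={scores['test_accuracy'].mean():.3f}, "
          f"auc={scores['test_roc_auc'].mean():.3f}")
```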

2021 ◽  
Vol 9 (4) ◽  
pp. 467
Author(s):  
Putu Agus Prawira Dharma Yuda ◽  
I Putu Gede Hendra Suputra

The development of the internet has been significant: globally there are more than 4 billion internet users, and in Indonesia more than 171 million users out of a total population of more than 273 million people, driven by the very fast development of information technology and the variety of media and functions it supports. However, these advances have not escaped internet attacks, one of which is phishing. Phishing is an activity that traps someone by luring them, tricking the victim into indirectly handing over all the information the attacker needs. Phishing is a form of cybercrime, crime committed through computer networks, and such threats continue to spread worldwide. Given such cases, this study aims to predict phishing sites with a classification algorithm, namely the SVM (Support Vector Machine) algorithm. The research classifies a phishing-website data set and calculates the accuracy for each kernel. The results show that the SVM with a Gaussian RBF kernel has the best performance with 88.92% accuracy, and the SVM with a sigmoid kernel has the worst performance with 79.33% accuracy.
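
A minimal sketch of this kind of kernel comparison follows; the synthetic feature matrix generated here merely stands in for the phishing-website data set used in the study.

```python
# Hedged sketch: compare SVM kernels by held-out accuracy on a synthetic
# stand-in for a numeric phishing-website feature matrix.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=30, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

for kernel in ["linear", "poly", "rbf", "sigmoid"]:   # kernels to compare
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(kernel, round(accuracy_score(y_te, clf.predict(X_te)), 4))
```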


2018 ◽  
Vol 2018 ◽  
pp. 1-9 ◽  
Author(s):  
Hyungsik Shin ◽  
Jeongyeup Paek

Automatic task classification is a core part of personal assistant systems that are widely used in mobile devices such as smartphones and tablets. Even though many industry leaders are providing their own personal assistant services, their proprietary internals and implementations are not well known to the public. In this work, we show through real implementation and evaluation that automatic task classification can be implemented for mobile devices by using the support vector machine algorithm and crowdsourcing. To train our task classifier, we collected our training data set via crowdsourcing using the Amazon Mechanical Turk platform. Our classifier can classify a short English sentence into one of the thirty-two predefined tasks that are frequently requested while using personal mobile devices. Evaluation results show high prediction accuracy for our classifier, ranging from 82% to 99%. By using a large amount of crowdsourced data, we also illustrate the relationship between training data size and the prediction accuracy of our task classifier.
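
As a hedged illustration of the classification step (not the authors' implementation), a short-sentence task classifier can be built from TF-IDF features and a linear SVM; the sentences and task labels below are invented examples, not the crowdsourced training set.

```python
# Illustrative sketch: mapping short English sentences to task labels with
# TF-IDF features and a linear SVM. Example data is made up, not the MTurk set.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

sentences = ["set an alarm for 7 am", "text mom I'm on my way",
             "play some jazz music", "remind me to buy milk"]
labels = ["alarm", "message", "music", "reminder"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(sentences, labels)
print(clf.predict(["wake me up at six"]))   # expected to map to the "alarm" task
```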


2021 ◽  
Author(s):  
Qifei Zhao ◽  
Xiaojun Li ◽  
Yunning Cao ◽  
Zhikun Li ◽  
Jixin Fan

Collapsibility of loess is a significant factor affecting engineering construction in loess areas, and testing the collapsibility of loess is costly. In this study, a total of 4,256 loess samples are collected from the north, east, west and middle regions of Xining. 70% of the samples are used to generate the training data set and the rest to generate the verification data set, so as to construct and validate the machine learning models. The six most important factors are selected from thirteen factors using grey relational analysis and multicollinearity analysis: burial depth, water content, specific gravity of soil particles, void ratio, geostatic stress and plasticity limit. To predict the collapsibility of loess, four machine learning methods are studied and compared: Support Vector Machine (SVM), Random Subspace Based Support Vector Machine (RSSVM), Random Forest (RF) and Naïve Bayes Tree (NBTree). Receiver operating characteristic (ROC) curve indicators, standard error (SD) and 95% confidence interval (CI) are used to verify and compare the models in the different research areas. The results show that the RF model is the most efficient in predicting the collapsibility of loess in Xining, with an average AUC above 80%, and can be used in engineering practice.
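
The random subspace SVM (RSSVM) mentioned above can be approximated, for illustration only, by bagging SVMs over random feature subsets; the sketch below uses random placeholder data in place of the six selected factors and is not the authors' implementation.

```python
# Hedged sketch of a random-subspace SVM (RSSVM): an ensemble of RBF-SVMs,
# each trained on a random subset of the feature columns, scored by ROC AUC.
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 6))      # placeholder for the six selected factors
y = rng.integers(0, 2, size=1000)   # placeholder collapsibility labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, train_size=0.7, random_state=1)
# each of the 10 SVMs sees a random half of the feature columns
rssvm = BaggingClassifier(SVC(probability=True), n_estimators=10,
                          max_features=0.5, bootstrap=False)
rssvm.fit(X_tr, y_tr)
print("AUC:", round(roc_auc_score(y_te, rssvm.predict_proba(X_te)[:, 1]), 3))
```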


Jurnal Segara ◽  
2020 ◽  
Vol 16 (3) ◽  
Author(s):  
Arip Rahman

Shallow water bathymetry estimation from remote sensing data has become increasingly widespread as an alternative to traditional bathymetry measurement, which is hampered by technical and logistical problems. Bathymetry was derived from Sentinel-2A images at visible wavelengths (blue, green and red) with 10 meter spatial resolution around the waters of Kemujan Island, Karimunjawa National Park, Central Java. A total of 1,280 points were used as the training data set and 854 points as the test data set, produced from sounding. Dark Object Subtraction (DOS) was applied to atmospherically correct the Sentinel-2A images. Several algorithms were applied to derive bathymetry, including the linear transform, the ratio transform and the support vector machine (SVM). The highest correlation between predicted and observed depth was obtained with the SVM algorithm, with a coefficient of determination (R2) of 0.71 (training data) and 0.56 (test data). In the accuracy assessment of the three methods using RMSE and MAE, the SVM algorithm has the smallest values (< 1 m), indicating higher accuracy than the other two methods. The bathymetry map derived from Sentinel-2A imagery nevertheless cannot be used as a reference for navigation.
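
A minimal sketch of the SVM depth model (not the paper's processing chain) is shown below: depth is regressed on the blue, green and red reflectances with an RBF-kernel SVR and scored with R2, RMSE and MAE on held-out points; all values are synthetic.

```python
# Illustrative sketch: SVM regression of depth from three visible bands,
# reporting R2, RMSE and MAE on a held-out set of synthetic sounding points.
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error

rng = np.random.default_rng(2)
bands_tr = rng.uniform(0.05, 0.2, size=(1280, 3))   # blue, green, red reflectance
depth_tr = 10 * bands_tr[:, 1] / bands_tr[:, 0] + rng.normal(0, 0.3, 1280)
bands_te = rng.uniform(0.05, 0.2, size=(854, 3))
depth_te = 10 * bands_te[:, 1] / bands_te[:, 0] + rng.normal(0, 0.3, 854)

svr = SVR(kernel="rbf").fit(bands_tr, depth_tr)
pred = svr.predict(bands_te)
print("R2  :", round(r2_score(depth_te, pred), 2))
print("RMSE:", round(mean_squared_error(depth_te, pred) ** 0.5, 2))
print("MAE :", round(mean_absolute_error(depth_te, pred), 2))
```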


2012 ◽  
Vol 461 ◽  
pp. 818-821
Author(s):  
Shi Hu Zhang

Real estate prices are currently a focus of public concern. The Support Vector Machine is a comparatively new machine learning algorithm whose excellent learning performance, especially on small samples, gives it unique advantages, and it is now used in many areas. Determining real estate prices is a complicated problem due to its non-linearity and the small quantity of training data. In this study, the support vector machine (SVM) is proposed to forecast real estate prices in China. The experimental results indicate that the SVM method achieves greater accuracy than the grey model and the artificial neural network under the circumstance of small training data. It was also found that the predictive ability of the SVM outperformed those of some traditional pattern recognition methods for the data set used here.
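
As a hedged sketch of the idea (not the paper's model), an SVR can be fitted to a very small price series using lagged values as features and compared with a naive last-value baseline; the price figures below are invented.

```python
# Illustrative sketch: one-step price forecast from a tiny training set,
# comparing an RBF-kernel SVR against a naive "repeat last value" baseline.
import numpy as np
from sklearn.svm import SVR

prices = np.array([4200., 4500., 4900., 5400., 6100., 6800., 7400., 8300., 9100.])
X = np.column_stack([prices[:-2], prices[1:-1]])    # two lagged prices as features
y = prices[2:]

svr = SVR(kernel="rbf", C=1e4, gamma="scale", epsilon=50).fit(X[:-1], y[:-1])
print("SVR forecast :", round(svr.predict(X[-1:])[0], 1), "actual:", y[-1])
print("naive (last) :", prices[-2], "actual:", y[-1])
```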


2019 ◽  
Vol 9 (18) ◽  
pp. 3800
Author(s):  
Rebekka Weixer ◽  
Jonas Koch ◽  
Patrick Plany ◽  
Simon Ohlendorf ◽  
Stephan Pachnicke

A support vector machine (SVM) based detection is applied to different equalization schemes for a data center interconnect link using coherent 64 GBd 64-QAM over 100 km standard single mode fiber (SSMF). Without any prior knowledge or heuristic assumptions, the SVM is able to learn and capture the transmission characteristics from only a short training data set. We show that, with the use of suitable kernel functions, the SVM can create nonlinear decision thresholds and reduce the errors caused by nonlinear phase noise (NLPN), laser phase noise, I/Q imbalances and so forth. In order to apply the SVM to 64-QAM we introduce a binary coding SVM, which provides a binary multiclass classification with reduced complexity. We investigate the performance of this SVM and show how it can improve the bit-error rate (BER) of the entire system. After 100 km the fiber-induced nonlinear penalty is reduced by 2 dB at a BER of 3.7 × 10⁻³. Furthermore, we apply a nonlinear Volterra equalizer (NLVE), which is based on the nonlinear Volterra theory, as another method for mitigating nonlinear effects. The combination of SVM and NLVE reduces the large computational complexity of the NLVE and allows more accurate compensation of nonlinear transmission impairments.
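
An illustrative sketch of the binary-coding idea (assumptions, not the authors' implementation): the six bits of each 64-QAM symbol can be predicted by six independent binary RBF-SVMs operating on the received (I, Q) sample, so the multiclass decision reduces to six binary ones.

```python
# Sketch of a binary-coding SVM for 64-QAM on synthetic noisy (I, Q) samples:
# one binary RBF-SVM per bit, combined into a symbol/bit decision.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
levels = np.array([-7., -5., -3., -1., 1., 3., 5., 7.])
symbols = rng.integers(0, 64, size=4000)
bits = (symbols[:, None] >> np.arange(6)) & 1                  # 6 bit labels per symbol
iq = np.column_stack([levels[symbols >> 3], levels[symbols & 7]])
iq += rng.normal(0, 0.6, iq.shape)                             # noisy received samples

train, test = slice(0, 3000), slice(3000, None)
svms = [SVC(kernel="rbf").fit(iq[train], bits[train, b]) for b in range(6)]
pred = np.column_stack([svm.predict(iq[test]) for svm in svms])
print("bit error rate:", np.mean(pred != bits[test]))
```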


2019 ◽  
Author(s):  
Tao Jiang ◽  
Martin Buchkovich ◽  
Alison Motsinger-Reif

Motivation: Same-species contamination detection is an important quality control step in genetic data analysis. Compared with the widely discussed cross-species contamination, same-species contamination is more challenging to detect, and there is a scarcity of methods to detect and correct for this quality control issue. Same-species contamination may be due to contamination by lab technicians or samples from other contributors. Here, we introduce a novel machine learning algorithm to detect same-species contamination in next generation sequencing data using support vector machines. Our approach uniquely detects such contamination using variant calling information stored in variant call format (VCF) files (either DNA or RNA), and importantly can differentiate between same-species contamination and mixtures of tumor and normal cells. Methods: In the first stage of our approach, a change-point detection method is used to identify copy number variations or copy number aberrations (CNVs or CNAs) for filtering prior to testing for contamination. Next, single nucleotide polymorphism (SNP) data is used to test for same-species contamination using a support vector machine model. Based on the assumption that alternative allele frequencies in next generation sequencing follow the beta-binomial distribution, the deviation parameter ρ is estimated by the maximum likelihood method. All features of a radial basis function (RBF) kernel support vector machine (SVM) are generated using either publicly available or private training data. Lastly, the generated SVM is applied to the test data to detect contamination. If training data is not available, a default RBF kernel SVM model is used. Results: We demonstrate the potential of our approach using simulation experiments, creating datasets with varying levels of contamination. The datasets combine, in silico, exome sequencing data of DNA from two lymphoblastoid cell lines (NA12878 and NA10855). We generated VCF files using variants identified in these data, and then evaluated the power and false positive rate of our approach to detect same-species contamination. Our simulation experiments show that our method can detect levels of contamination as low as 5% with reasonable false positive rates. Results in real data have sensitivity above 99.99% and specificity at 90.24%, even in the presence of DNA degradation that has similar features to contaminated samples. Additionally, the approach can identify the difference between a mixture of tumor and normal cells and contamination. We provide an R software implementation of our approach as the defcon() function in the vanquish (Variant Quality Investigation Helper) R package on CRAN.
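
A sketch of one intermediate step under stated assumptions (not the package's own code): the beta-binomial deviation parameter ρ can be estimated by maximum likelihood from alternative-allele read counts, here with the mean fixed at 0.5 as at heterozygous sites; the counts are simulated, not real VCF data.

```python
# Illustrative sketch: maximum-likelihood estimate of the beta-binomial
# overdispersion (deviation) parameter rho from per-site allele counts.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import betabinom

rng = np.random.default_rng(4)
depth = rng.integers(30, 100, size=500)        # total read depth per site
alt = rng.binomial(depth, 0.5)                 # toy alternative-allele counts

def neg_loglik(rho, p=0.5):
    a = p * (1 - rho) / rho                    # beta-binomial shape parameters
    b = (1 - p) * (1 - rho) / rho
    return -betabinom.logpmf(alt, depth, a, b).sum()

res = minimize_scalar(neg_loglik, bounds=(1e-4, 0.5), method="bounded")
print("estimated rho:", round(res.x, 4))
```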


Author(s):  
Dmitrii Dikii

Introduction: For the development of cyberphysical systems, new technologies and data transfer protocols are being developed in order to reduce the energy costs of communication devices. One of the modern approaches to data transmission in cyberphysical systems is the publish-subscribe model, which is subject to denial-of-service attacks. Purpose: Development of a model for detecting a DoS attack implemented at the application level of publish-subscribe networks, based on the analysis of their traffic using machine learning methods. Results: A model is developed for detecting a DoS attack, operating with three classifiers depending on the message type: connection, subscription, and publication. This approach makes it possible to identify the source of an attack, which can be a network node, a particular device, or a user account. A multi-layer perceptron, the random forest algorithm, and support vector machines of various configurations were considered as classifiers. Training and test data sets were generated for the proposed feature vector. The classification quality was evaluated by calculating the F1 score, the Matthews correlation coefficient, and accuracy. The multilayer perceptron model and the support vector machine with a polynomial kernel and the SMO optimization method showed the best values of all metrics. However, in the case of the support vector machine, a slight decrease in prediction quality was detected when the width of the traffic analysis window was close to the longest period of sending legitimate messages in the training data set. Practical relevance: The results of the research can be used in the development of intrusion detection features for cyberphysical systems using the publish-subscribe model, or other systems based on the same approach.
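
A hedged sketch of the evaluation described above: one binary classifier per message type (connect / subscribe / publish), here a polynomial-kernel SVM, scored with F1, Matthews correlation and accuracy; the feature vectors and labels are random placeholders rather than real publish-subscribe traffic.

```python
# Illustrative sketch: per-message-type SVM classifiers with a polynomial
# kernel, evaluated with F1 score, Matthews correlation coefficient, accuracy.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, matthews_corrcoef, accuracy_score

rng = np.random.default_rng(5)
traffic = {msg: (rng.normal(size=(600, 8)), rng.integers(0, 2, size=600))
           for msg in ("connect", "subscribe", "publish")}

for msg, (X, y) in traffic.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = SVC(kernel="poly", degree=3).fit(X_tr, y_tr)
    p = clf.predict(X_te)
    print(msg, round(f1_score(y_te, p), 2), round(matthews_corrcoef(y_te, p), 2),
          round(accuracy_score(y_te, p), 2))
```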


2015 ◽  
Vol 7 (4) ◽  
pp. 3383-3408 ◽  
Author(s):  
F. Khan ◽  
F. Enzmann ◽  
M. Kersten

Abstract. In X-ray computed microtomography (μXCT), image processing is the most important operation prior to image analysis. Such processing mainly involves artefact reduction and image segmentation. We propose a new two-stage post-reconstruction procedure for an image of a geological rock core obtained by polychromatic cone-beam μXCT technology. In the first stage, the beam-hardening (BH) is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. The final BH-corrected image is extracted from the residual data, or the difference between the surface elevation values and the original grey-scale values. For the second stage, we propose using a least-squares support vector machine (a non-linear classifier algorithm) to segment the BH-corrected data as a pixel-based multi-classification task. A combination of the two approaches was used to classify a complex multi-mineral rock sample. The Matlab code for this approach is provided in the Appendix. A minor drawback is that the proposed segmentation algorithm may become computationally demanding in the case of a high dimensional training data set.
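
A numpy sketch of the first stage under stated assumptions (the paper's own implementation is the Matlab code in its Appendix): fit a best-fit quadratic surface to the slice grey values by least squares and keep the residual as the beam-hardening-corrected image.

```python
# Illustrative sketch: least-squares fit of a quadratic surface to a slice and
# subtraction of that surface, leaving the residual as the BH-corrected image.
import numpy as np

def correct_beam_hardening(slice_img):
    ny, nx = slice_img.shape
    xx, yy = np.meshgrid(np.arange(nx), np.arange(ny))
    x = xx.ravel().astype(float)
    y = yy.ravel().astype(float)
    z = slice_img.ravel().astype(float)
    # design matrix for z = a + b*x + c*y + d*x^2 + e*y^2 + f*x*y
    A = np.column_stack([np.ones_like(x), x, y, x**2, y**2, x * y])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    surface = (A @ coeffs).reshape(slice_img.shape)
    return slice_img - surface            # residual = BH-corrected image

corrected = correct_beam_hardening(np.random.default_rng(6).normal(size=(64, 64)))
```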


2021 ◽  
Vol 5 (1) ◽  
pp. 11-20
Author(s):  
Wahyu Hidayat ◽  
Mursyid Ardiansyah ◽  
Arief Setyanto ◽  
...  

Traveling activities are increasingly being carried out by people around the world. Hotels are difficult to reach from some tourist attractions because those attractions lie far from the city center; Airbnb is a platform that provides home- or apartment-based rentals. Among lodging offers there are two types of hosts, non-super hosts and super hosts. The super-host badge is awarded if the innkeeper has a good reputation and meets the requirements, and being a super host brings advantages such as more visibility, increased earning potential and exclusive rewards. The Support Vector Machine (SVM) algorithm is used to classify hosts based on these criteria. The data set is unbalanced: the super-host population is smaller than the non-super-host population. To overcome the imbalance, oversampling is carried out using ADASYN and SMOTE. The research goal was to determine the performance of the ADASYN and SMOTE oversampling techniques combined with the SVM algorithm. The analysis used oversampling to handle the unbalanced data set, and a confusion matrix to evaluate precision, recall, F1-score, and accuracy. The research shows that SMOTE SVM increases the accuracy rate by 1 percentage point, from 80% to 81%, driven by an increase in the test results for the True (minority) label and a decrease for the False (majority) label, so SMOTE SVM performs better than ADASYN SVM and than SVM without oversampling.
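
An illustrative sketch of this setup, assuming the imbalanced-learn package: oversample the minority (super host) class with SMOTE inside a pipeline before the SVM, then report precision, recall, F1-score and accuracy; the listing features are random placeholders, not Airbnb data.

```python
# Hedged sketch: SMOTE oversampling combined with an RBF-kernel SVM in an
# imbalanced-learn pipeline, evaluated with a classification report.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(7)
X = rng.normal(size=(2000, 12))                   # placeholder listing features
y = (rng.random(2000) < 0.2).astype(int)          # imbalanced: ~20% super hosts

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(SMOTE(random_state=0), SVC(kernel="rbf"))
model.fit(X_tr, y_tr)
print(classification_report(y_te, model.predict(X_te)))
```

ADASYN can be swapped in for SMOTE in the same pipeline to reproduce the other oversampling variant compared in the study.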

