Implementation of the Support Vector Machine (SVM) Algorithm in Classifying Website Phishing

Putu Agus Prawira Dharma Yuda; I Putu Gede Hendra Suputra

doi:10.24843/jlk.2021.v09.i04.p03

Implementation of the Support Vector Machine (SVM) Algorithm in Classifying Website Phishing

JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) ◽

10.24843/jlk.2021.v09.i04.p03 ◽

2021 ◽

Vol 9 (4) ◽

pp. 467

Author(s):

Putu Agus Prawira Dharma Yuda ◽

I Putu Gede Hendra Suputra

Keyword(s):

Support Vector Machine ◽

Total Population ◽

Internet Technology ◽

Support Vector ◽

The Internet ◽

Data Set ◽

Svm Algorithm ◽

The World ◽

Fast Development ◽

The Times

Internet use has grown dramatically: there are now more than 4 billion users worldwide, and in Indonesia more than 171 million users out of a total population of more than 273 million. This growth is driven by the very fast development of information technology and of various kinds of media and functions. Advances in internet technology, however, have not escaped internet attacks, one of which is phishing. Phishing lures a victim and tricks them into indirectly handing over all the information the attacker needs. It is a form of cybercrime, that is, crime carried out through computer networks, and as such crime spreads around the world, today's threats increasingly arrive via computers. This study therefore aims to predict phishing sites with a classification algorithm, the Support Vector Machine (SVM). The research classifies a phishing website data set and then calculates the accuracy for each kernel. The results show that SVM with the Gaussian RBF kernel performs best, with 88.92% accuracy, and SVM with the sigmoid kernel performs worst, with 79.33% accuracy.
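
As a rough illustration of the per-kernel comparison described in this abstract, the following is a minimal Python sketch of three common SVM kernel functions and of the accuracy measure. The parameter names `gamma` and `coef0` and their default values are assumptions made for illustration; the abstract does not state the settings the authors used, and a linear kernel is included only for reference.

```python
import math


def linear_kernel(x, z):
    """Plain dot product between two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + coef0)."""
    return math.tanh(gamma * linear_kernel(x, z) + coef0)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

In a kernel comparison of this kind, the same classifier is trained once per kernel and `accuracy` is computed on held-out data for each, which is how figures such as 88.92% and 79.33% can be placed side by side.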

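Prediksi Risiko Perjalanan Transportasi Online Dari Data Telematik Menggunakan Algoritma Support Vector Machine (Predicting Online Transportation Trip Risk from Telematics Data Using the Support Vector Machine Algorithm)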

Jurnal Teknik Informatika dan Sistem Informasi ◽

10.28932/jutisi.v6i2.2672 ◽

2020 ◽

Vol 6 (2) ◽

Author(s):

Christ Memory Sitorus ◽

Adhi Rizal ◽

Mohamad Jajuli

Keyword(s):

Support Vector Machine ◽

User Satisfaction ◽

Internet Technology ◽

Training Data ◽

Support Vector ◽

Data Set ◽

K Value ◽

Movement Data ◽

Svm Algorithm ◽

Rbf Kernel

Ride-hailing services are booming thanks to internet technology, which is why many people call them online transportation. The large growth potential in users of online transportation also raises the risk that user satisfaction will decline, so companies are improving their service, both in the application and in the service provided by their partners/drivers. During each trip, the online transportation application records device movement data and sends it to the server. This data set is usually called telematics data, and it can bring enormous benefits if processed. This study predicts the risk of online transportation trips using the Support Vector Machine (SVM) algorithm, based on the telematics data obtained. Because the data is raw telematics data, it is first processed with feature engineering to obtain 51 features, and an SVM with an RBF kernel is then trained with varying C values. For every C value, K-fold cross-validation with k = 5 is used to separate the testing data from the training data. Each trial is evaluated by accuracy, the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC); the best results are obtained at C = 100 and the worst at C = 0.001.
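
The evaluation loop described in this abstract (one 5-fold cross-validation per C value) can be sketched as follows. This is a minimal sketch under stated assumptions: `score_fold` is a hypothetical caller-supplied function that trains and scores a model on one fold, and the intermediate values in `C_GRID` are assumptions, since the abstract only names C = 100 and C = 0.001.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def grid_search_c(c_values, n_samples, score_fold, k=5):
    """Return {C: mean fold score}; score_fold(C, train_idx, test_idx) is hypothetical."""
    results = {}
    for c in c_values:
        scores = [score_fold(c, tr, te) for tr, te in k_fold_indices(n_samples, k)]
        results[c] = sum(scores) / len(scores)
    return results


# Assumed grid: only the end points 0.001 and 100 appear in the abstract.
C_GRID = [0.001, 0.01, 0.1, 1, 10, 100]
```

Averaging the fold scores per C value gives one number per setting, which is what allows a best and a worst C to be reported.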

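Analisis Sentimen Twitter Kuliah Online Pasca Covid-19 Menggunakan Algoritma Support Vector Machine dan Naive Bayes (Twitter Sentiment Analysis of Online Lectures after Covid-19 Using the Support Vector Machine and Naive Bayes Algorithms)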

Jurnal Komtika ◽

10.31603/komtika.v5i1.5189 ◽

2021 ◽

Vol 5 (1) ◽

pp. 43-51

Author(s):

Hendrik Setiawan ◽

Ema Utami ◽

Sudarmawan Sudarmawan

Keyword(s):

Support Vector Machine ◽

World Health ◽

Support Vector ◽

Online Lectures ◽

Svm Algorithm ◽

The World ◽

The Government ◽

Bayes Algorithm ◽

Health Organization ◽

Performance Results

According to the World Health Organization (WHO), COVID-19 is an infectious disease caused by a coronavirus. It originated in an outbreak in the city of Wuhan, China, in December 2019 and later became a pandemic affecting many countries around the world. The disease led the government to impose regional lockdowns and to keep students at home, with lectures held online. This prompted a range of sentiments from students responding to online lectures on the social media platform Twitter. For sentiment analysis, the researchers apply the Naive Bayes algorithm and the support vector machine (SVM). Naive Bayes achieves an accuracy of 81.20%, a time of 9.00 seconds, a recall of 79.60% and a precision of 79.40%, while SVM achieves an accuracy of 85%, a time of 31.60 seconds, a recall of 84% and a precision of 83.60%. These results are obtained in the 1st iteration for Naive Bayes and the 423rd iteration for SVM.
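
The accuracy, recall and precision figures quoted in this abstract are standard confusion-matrix metrics. The sketch below shows how they are usually computed for a binary task; it assumes a two-class (positive/negative) setting and uses made-up labels, since the abstract does not give the class layout or the underlying counts.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def precision_recall_accuracy(tp, fp, tn, fn):
    """Return (precision, recall, accuracy); empty denominators give 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


# Illustrative labels only (1 = positive sentiment, 0 = negative sentiment).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
metrics = precision_recall_accuracy(*counts)
```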

Automatic Evaluation of Heart Condition According to the Sounds Emitted and Implementing Six Classification Methods

Healthcare ◽

10.3390/healthcare9030317 ◽

2021 ◽

Vol 9 (3) ◽

pp. 317

Author(s):

Manuel A. Soto-Murillo ◽

Jorge I. Galván-Tejada ◽

Carlos E. Galván-Tejada ◽

Jose M. Celaya-Padilla ◽

Huizilopoztli Luna-García ◽

...

Keyword(s):

Support Vector Machine ◽

Logistic Regression ◽

Roc Curve ◽

Heart Sounds ◽

Machine Learning Algorithms ◽

World Health ◽

Support Vector ◽

Computer Assisted ◽

Data Set ◽

The World

Heart disease is the main cause of death in Mexico and the world, and according to data from the World Health Organization (WHO) and the National Institute of Statistics and Geography (INEGI) it will continue to lead the death rate in the next decade. The objective of this work is therefore to implement, compare and evaluate machine learning algorithms capable of classifying normal and abnormal heart sounds. Three different sounds were analyzed in this study: normal heart sounds, heart murmur sounds and extra systolic sounds, which were labeled as healthy sounds (normal sounds) and unhealthy sounds (murmur and extra systolic sounds). From these sounds, fifty-two features were calculated to create a numerical dataset: thirty-six statistical features, eight Linear Predictive Coding (LPC) coefficients and eight Mel-Frequency Cepstral Coefficients (MFCC). From this dataset two more were created, one normalized and one standardized. These datasets were analyzed with six classifiers: k-Nearest Neighbors, Naive Bayes, Decision Trees, Logistic Regression, Support Vector Machine and Artificial Neural Networks. All of them were evaluated with six metrics: accuracy, specificity, sensitivity, ROC curve, precision and F1-score. The performances of all the models were statistically significant, but the models that performed best for this problem were logistic regression for the standardized data set, with a specificity of 0.7500 and a ROC curve of 0.8405; logistic regression for the normalized data set, with a specificity of 0.7083 and a ROC curve of 0.8407; and Support Vector Machine with a linear kernel for the non-normalized data, with a specificity of 0.6842 and a ROC curve of 0.7703. Both of these metrics are of utmost importance in evaluating the performance of computer-assisted diagnostic systems.
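
The abstract mentions deriving one normalized and one standardized dataset from the original features. A common reading of those terms is min-max scaling to [0, 1] and z-score scaling to zero mean and unit variance. The sketch below assumes that reading, since the abstract does not spell out the formulas used, and works on a single feature column.

```python
import statistics


def min_max_normalize(values):
    """Rescale a feature column linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale a feature column to zero mean and unit (population) variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

Applying either function to each of the fifty-two feature columns in turn would give the two derived datasets; classifiers that depend on distances or dot products, such as k-Nearest Neighbors and SVM, are typically the most sensitive to this choice.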

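Pengaruh Algoritma ADASYN dan SMOTE terhadap Performa Support Vector Machine pada Ketidakseimbangan Dataset Airbnb (The Effect of the ADASYN and SMOTE Algorithms on Support Vector Machine Performance on the Imbalanced Airbnb Dataset)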

EDUMATIC Jurnal Pendidikan Informatika ◽

10.29408/edumatic.v5i1.3125 ◽

2021 ◽

Vol 5 (1) ◽

pp. 11-20

Author(s):

Wahyu Hidayat ◽

Mursyid Ardiansyah ◽

Arief Setyanto ◽

...

Keyword(s):

Support Vector Machine ◽

Confusion Matrix ◽

Sampling Technique ◽

Host Population ◽

Support Vector ◽

Data Sets ◽

Test Results ◽

Data Set ◽

Tourist Attractions ◽

Svm Algorithm

People around the world are traveling more and more. Because some tourist attractions are far from the city center, hotels can be hard to reach from them; Airbnb is a platform that offers home- or apartment-based rentals. Lodging offers come from two types of hosts: non-super hosts and super hosts. The super-host badge is awarded if the host has a good reputation and meets the requirements. Being a super host brings advantages such as more visibility, increased earning potential and exclusive rewards. In this study, the Support Vector Machine (SVM) algorithm classifies hosts based on these criteria. The data set is unbalanced: the super-host population is smaller than the non-super-host population. To overcome the imbalance, oversampling is carried out using ADASYN and SMOTE. The goal of the research is to determine the effect of the ADASYN and SMOTE sampling techniques on the performance of the SVM algorithm. The data analysis uses oversampling to handle the unbalanced data set and a confusion matrix to test precision, recall, F1-score and accuracy. The research shows that SMOTE with SVM increases accuracy by 1 percentage point, from 80% to 81%, driven by an increase in the True (minority) label test results and a decrease in the False (majority) label test results. SMOTE with SVM performs better than both ADASYN with SVM and SVM without oversampling.
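
The core idea behind SMOTE-style oversampling is to create synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following minimal sketch illustrates that idea only; the function name, the `k` and `seed` parameters and the toy data are assumptions, and it is not the implementation used in the paper. ADASYN follows the same interpolation step but adaptively generates more samples for minority points that are harder to learn.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_samples(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen base point.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: squared_distance(base, p))[:k]
        neighbour = rng.choice(neighbours)
        u = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + u * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy minority class (hypothetical feature values).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
new_points = smote_like_samples(minority_points, n_new=5)
```

Because every synthetic point lies on a segment between two real minority points, the oversampled class stays inside the region the minority already occupies.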

A NOVEL ROBUST REGRESSION APPROACH OF LIDAR SIGNAL BASED ON MODIFIED LEAST SQUARES SUPPORT VECTOR MACHINE

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s021800140500423x ◽

2005 ◽

Vol 19 (05) ◽

pp. 715-729 ◽

Cited By ~ 3

Author(s):

BING-YU SUN ◽

DE-SHUANG HUANG ◽

HAI-TAO FANG ◽

XING-MING YANG

Keyword(s):

Support Vector Machine ◽

Least Squares ◽

Robust Regression ◽

A Priori ◽

Support Vector ◽

Lidar Signal ◽

Data Set ◽

Novel Approach ◽

Svm Algorithm ◽

Effectiveness And Efficiency

Lidar is an active remote sensing instrument, but its effective range is often limited by the signal-to-noise ratio (SNR), because noise and fluctuations strongly affect the measured results. To resolve this problem, this paper proposes a novel approach that uses a least-squares support vector machine (LS-SVM) to reconstruct the Lidar signal. LS-SVM has been shown to be robust to noisy data, and the Lidar signal, which is strongly corrupted by noise and fluctuations, can be regarded as a function of distance. Detecting Lidar signals in a high-noise regime can therefore be treated as a robust regression procedure that estimates the underlying relationship from the detected signal data set. To apply LS-SVM to Lidar signal regression, the noise in the Lidar signal is first analyzed, and the traditional LS-SVM algorithm is then modified to incorporate a priori knowledge of the Lidar signal into the training of the LS-SVM. The experimental results demonstrate the effectiveness and efficiency of our approach.
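
For orientation, the standard (unmodified) LS-SVM regression reduces training to one linear system in a bias `b` and coefficients `alpha`. The sketch below implements that textbook formulation for a one-dimensional input such as distance. It does not include the paper's modification for a priori Lidar knowledge, and the RBF kernel, `gamma`, `sigma` and the helper names are assumptions chosen for illustration.

```python
import math


def rbf(a, b, sigma=1.0):
    """Gaussian RBF kernel for scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2))


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * solution[c] for c in range(r + 1, n))
        solution[r] = (aug[r][n] - tail) / aug[r][r]
    return solution


def fit_ls_svm(xs, ys, gamma=100.0, sigma=1.0):
    """Solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]; return (b, alpha)."""
    n = len(xs)
    system = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0]
        for j in range(n):
            ridge = 1.0 / gamma if i == j else 0.0
            row.append(rbf(xs[i], xs[j], sigma) + ridge)
        system.append(row)
    solution = solve_linear_system(system, [0.0] + list(ys))
    return solution[0], solution[1:]


def predict(x, xs, bias, alphas, sigma=1.0):
    """Evaluate f(x) = b + sum_i alpha_i * K(x, x_i)."""
    return bias + sum(a * rbf(x, xi, sigma) for a, xi in zip(alphas, xs))
```

The regularization parameter `gamma` controls how closely the fitted curve follows noisy samples: smaller values smooth more, which is the property that makes this family of models attractive for noisy signals.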

Algorithm Comparation of Naive Bayes and Support Vector Machine based on Particle Swarm Optimization in Sentiment Analysis of Freight Forwarding Services

Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) ◽

10.29207/resti.v4i2.1840 ◽

2020 ◽

Vol 4 (2) ◽

pp. 362-369

Author(s):

Sharazita Dyah Anggita ◽

Ikmah

Keyword(s):

Support Vector Machine ◽

Sentiment Analysis ◽

Naive Bayes ◽

Naïve Bayes ◽

Support Vector ◽

The Public ◽

Svm Algorithm ◽

Bayes Algorithm ◽

Freight Forwarding ◽

Improved Accuracy

Community demand for freight forwarding is increasing with the rise of the marketplace. The public currently voices opinions about freight forwarding services through many channels, one of which is the social media platform Twitter. Sentiment analysis shows whether an opinion tends to be positive or negative. Methods applicable to sentiment analysis include the Naive Bayes algorithm and the Support Vector Machine (SVM). This research implements both algorithms, optimized using the Particle Swarm Optimization (PSO) algorithm, for sentiment analysis. Testing is done by setting the PSO parameters for each classifier. The results show an accuracy increase of 15.11% for the PSO-based Naive Bayes algorithm and an accuracy improvement of 1.74% for the PSO-based SVM algorithm with the sigmoid kernel.

A Computational Method for the Identification of Endolysins and Autolysins

Protein and Peptide Letters ◽

10.2174/0929866526666191002104735 ◽

2020 ◽

Vol 27 (4) ◽

pp. 329-336 ◽

Cited By ~ 1

Author(s):

Lei Xu ◽

Guangmin Liang ◽

Baowen Chen ◽

Xu Tan ◽

Huaikun Xiang ◽

...

Keyword(s):

Support Vector Machine ◽

Cell Wall ◽

Experimental Results ◽

Computational Method ◽

Lytic Enzyme ◽

Support Vector ◽

Lytic Enzymes ◽

Data Set ◽

Optimal Feature ◽

Better Than

Background: Cell lytic enzymes are highly evolved proteins that can destroy the cell structure and kill bacteria. Unlike antibiotics, cell lytic enzymes do not cause a serious problem of drug resistance in pathogenic bacteria, whereas with antibiotics the problem of drug resistance is becoming more serious. The study of cell wall lytic enzymes therefore aims at finding an efficient way to cure bacterial infections, and cell lytic enzymes are a good choice for this purpose. Cell lytic enzymes include endolysins and autolysins, which differ in the purpose for which the cell wall is broken. Identifying the type of a cell lytic enzyme is meaningful for the study of cell wall enzymes. Objective: Our motivation in this article is to predict the type of cell lytic enzyme. Because cell lytic enzymes help kill bacteria, studying their type is meaningful; however, detecting the type by experimental methods is time consuming. We therefore propose an efficient computational method for predicting the type of cell lytic enzyme. Method: We propose a computational method for the prediction of endolysins and autolysins. First, a data set containing 27 endolysins and 41 autolysins is built. Each protein is then represented by its tripeptide composition, and the features with larger confidence degree are selected. Finally, a support vector machine classifier is trained on the labeled vectors, and the learned classifier is used to predict the type of cell lytic enzyme. Results: With the proposed method, the experimental results show that the overall accuracy reaches 97.06% when 44 features are selected. Compared with Ding's method, our method improves the overall accuracy by nearly 4.5% ((97.06-92.9)/92.9). The performance of the proposed method is stable when the number of selected features is between 40 and 70. The overall accuracy of the tripeptides optimal feature set is 94.12%, and the overall accuracy of Chou's amphiphilic PseAAC method is 76.2%; the experimental results thus also show that the overall accuracy improves by nearly 18% when the tripeptides optimal feature set is used. Conclusion: The paper proposes an efficient method for identifying endolysins and autolysins, using a support vector machine to predict the type of cell lytic enzyme. The experimental results show that the overall accuracy of the proposed method is 94.12%, which is better than some existing methods. In conclusion, the selected 44 features can improve the overall accuracy of identifying the type of cell lytic enzyme. The support vector machine performs better than other classifiers when using the selected feature set on the benchmark data set.
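
Two steps of this abstract lend themselves to a short worked sketch: representing a protein by its tripeptide composition, and the relative-improvement arithmetic in the Results. The code below assumes the usual definition of tripeptide composition (frequency of each overlapping three-residue window, giving up to 20^3 = 8000 possible features); the example sequence is hypothetical, and the feature-selection step is not reproduced.

```python
from collections import Counter


def tripeptide_composition(sequence):
    """Map each overlapping 3-residue window to its relative frequency."""
    total = len(sequence) - 2
    if total <= 0:
        return {}
    counts = Counter(sequence[i:i + 3] for i in range(total))
    return {tri: n / total for tri, n in counts.items()}


def relative_improvement(new, old):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# Hypothetical toy sequence, not taken from the paper's data set.
composition = tripeptide_composition("MKTMKT")

# Figures quoted in the abstract: 97.06% against 92.9% for Ding's method.
gain_vs_ding = relative_improvement(97.06, 92.9)
```

Note that the "nearly 4.5%" figure is a relative improvement, as the formula in the abstract shows, whereas the "nearly 18%" figure matches the absolute difference between 94.12% and 76.2%.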

Correlation between the structure and skin permeability of compounds

Scientific Reports ◽

10.1038/s41598-021-89587-5 ◽

2021 ◽

Vol 11 (1) ◽

Author(s):

Ruolan Zeng ◽

Jiyong Deng ◽

Limin Dang ◽

Xinliang Yu

Keyword(s):

Large Data ◽

Qsar Model ◽

Coefficient Of Determination ◽

Support Vector ◽

Skin Permeability ◽

Data Set ◽

Test Set ◽

Svm Algorithm ◽

Svm Model ◽

Toxicity Relationship

A three-descriptor quantitative structure–activity/toxicity relationship (QSAR/QSTR) model was developed for the skin permeability of a sufficiently large data set of 274 compounds by applying a support vector machine (SVM) together with a genetic algorithm. The optimal SVM model has a coefficient of determination R² of 0.946 and a root mean square (rms) error of 0.253 for the training set of 139 compounds, and an R² of 0.872 and an rms error of 0.302 for the test set of 135 compounds. Compared with other models reported in the literature, our SVM model shows better statistical performance while handling more samples in the test set. Applying an SVM algorithm to develop a nonlinear QSAR model for skin permeability was therefore successful.
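
The two statistics reported in this abstract can be computed as sketched below. The sketch assumes the common definition of the coefficient of determination as 1 minus the ratio of residual to total sum of squares; the abstract does not say which definition the authors used (some QSAR studies report the squared correlation coefficient instead), and the example values are made up.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rms_error(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)
```

Reporting both values for the training set and the test set, as the abstract does, makes it possible to see how much performance drops on unseen compounds.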

Multi-phase classification by a least-squares support vector machine approach in tomography images of geological samples

Solid Earth ◽

10.5194/se-7-481-2016 ◽

2016 ◽

Vol 7 (2) ◽

pp. 481-492 ◽

Cited By ~ 8

Author(s):

Faisal Khan ◽

Frieder Enzmann ◽

Michael Kersten

Keyword(s):

Support Vector Machine ◽

Least Squares ◽

Image Data ◽

Main Beam ◽

Support Vector ◽

Geological Samples ◽

Data Set ◽

Corrected Image ◽

Phase Classification ◽

Multi Phase

Image processing of X-ray-computed polychromatic cone-beam micro-tomography (μXCT) data of geological samples mainly involves artefact reduction and phase segmentation. For the former, the main beam-hardening (BH) artefact is removed by applying a best-fit quadratic surface algorithm to a given image data set (reconstructed slice), which minimizes the BH offsets of the attenuation data points from that surface. A Matlab code for this approach is provided in the Appendix. The final BH-corrected image is extracted from the residual data or from the difference between the surface elevation values and the original grey-scale values. For the segmentation, we propose a novel least-squares support vector machine (LS-SVM, an algorithm for pixel-based multi-phase classification) approach. A receiver operating characteristic (ROC) analysis was performed on BH-corrected and uncorrected samples to show that BH correction is in fact an important prerequisite for accurate multi-phase classification. The combination of the two approaches was thus used to successfully classify three different, more or less complex multi-phase rock core samples.

Cybercafé, Cybercommunity

M/C Journal ◽

10.5204/mcj.1699 ◽

1998 ◽

Vol 1 (1) ◽

Author(s):

Joseph Crawfoot

Keyword(s):

Social Interaction ◽

Middle Class ◽

Internet Technology ◽

The Internet ◽

Bourgeois Society ◽

The Past ◽

Mud Cake ◽

The People ◽

The World ◽

A Site

Cities are an important symbol of our contemporary era. They are not just places of commerce, but are emblems of the people who live within them. A significant feature of cities is their meeting places: areas that have either been designed or appropriated by the people. An example of this is the café. Cafés hold a unique place in history, as sites that have witnessed the growth of revolution, relationships great and small, between people and ideas, and more recently, technology. Computers are transcending their place in the private home or office and are now finding their way into café culture. What I am suggesting is that this is bringing about a new way of understanding how cafés foster community and act as media for social interaction. To explore this idea further I will look at the historical background of the café, particularly within Parisian culture. For W. Scott Haine, cities such as Paris have highly influential abilities. As he points out, "the Paris milieu determined the consciousness of workers as much as their labor" (114). While specifically related to Paris, Haine is highlighting an important aspect in the relationship between people and the built environment. He suggests that buildings and streets are not just inanimate objects, but structures that shape our habits and our beliefs. Towards the middle of the nineteenth century, Paris was developing a new cultural level, referred to as Bohemia. Derived from the French word for Gypsy (Seigel 5), it was used to denote a class of people who in the eyes of Honoré de Balzac were the talent of the future (Seigel 4). These were people who would be diplomats, artists, journalists, soldiers, who at that moment existed in a transient state with much social but little material wealth. Emerging within this Bohemian identity were the bourgeois. They were individuals who led a working class existence and usually held property, but more importantly they helped provide the physical environment for Bohemian culture to flourish.
Bourgeois society had the money to patronize Bohemian artists. As Seigel says, "Bohemian and bourgeois were -- and are -- parts of a single field: they imply, require, and attract each other" (5). Cafés were a site of symbiosis between these two groups. As Seigel points out, they were not so much established to create a Bohemian world away from the reality of working life, but to provide a space where the predominantly bourgeois clientèle could be entertained (216). These ideas of entertainment saw the rise of the literary café, a venue not just for drinking and socialization but where potential writers and orators could perform for an audience. Contemporary society has seen a strong decline in Bohemian culture, with the (franchised) café being appropriated by the upper class as a site of lattes and mud cake. Recent developments in Internet technology, however, have prompted a change in this trend. Whereas in the past cafés had brought about a symbiosis between the classes of Bohemian and bourgeois society, they are now becoming sites that foster relationships between the middle class and computer technology. Computers and the Internet have their origins within a privileged community of government departments, defence forces and universities. It is only in the past three years that Internet technology has moved out of a realm of expert knowledge to achieve a broad level of usage in the average household. Certain barriers still exist, though, in terms of a person's ability to gain access to this medium. Just as Bohemian culture arose out of a population of educated people lacking skills of manual labor and social status (Seigel 217), computers and Internet culture offer a means for people to go beyond their social boundaries. Cafés were sites for Bohemians to transcend the social, political, and economic dictates that had shaped their lives. In a similar fashion the Internet offers a means for people to explore beyond their physical world.
Internet cafés have been growing steadily around the world. What they represent is a change in the concept of social interaction. As in the past with the Paris café and the exchange of ideas, Internet cafés have become places where people can interact not just on a face-to-face basis but also through computer-mediated communication. What this points to is a broadening in the idea of the café as a medium of social interaction. This is where the latte and mud cake trend is beginning to break down. By placing Internet technology within cafés, proprietors are inviting a far greater section of the community within their walls. While these experiences still attract a price tag, they suggest a change in the idea that would have seen both the café and the Internet as commodities of the élite. What this is doing is re-invigorating the idea of the streets belonging to the middle class and other sub-cultures, allowing people access to space so that relationships and communities can be formed.

References

Haine, W. Scott. The World of the Paris Cafe: Sociability amongst the French Working Class 1789 - 1914. Baltimore: Johns Hopkins UP, 1996.

Seigel, Jerrold. Bohemian Paris: Culture, Politics and the Boundaries of Bourgeois Life, 1830 - 1930. New York: Penguin Books, 1987.

Citation reference for this article

MLA style: Joseph Crawfoot. "Cybercafé, Cybercommunity." M/C: A Journal of Media and Culture 1.1 (1998). [your date of access] <http://www.uq.edu.au/mc/9807/cafe.php>.

Chicago style: Joseph Crawfoot, "Cybercafé, Cybercommunity," M/C: A Journal of Media and Culture 1, no. 1 (1998), <http://www.uq.edu.au/mc/9807/cafe.php> ([your date of access]).

APA style: Joseph Crawfoot. (1998) Cybercafé, cybercommunity. M/C: A Journal of Media and Culture 1(1). <http://www.uq.edu.au/mc/9807/cafe.php> ([your date of access]).