Detecting Phishing Website Using Machine Learning

Phishing is a fraudulent attempt to obtain sensitive personal information, such as usernames, passwords, and credit/debit card details, by masquerading as a trustworthy organization in electronic communication. A phishing website looks the same as the legitimate website and directs the user to a page where they enter their personal details on the fake site. Machine learning algorithms can improve the accuracy of phishing prediction. The proposed method detects phishing websites from uniform resource locator (URL) features: we identified features that phishing site URLs tend to contain and used those features for detection. The authors report that the proposed system predicts URL-based phishing websites with high accuracy.
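The abstract does not list the URL features it uses, so the following is only a minimal sketch of what URL-based feature extraction could look like. The function name `extract_url_features` and the specific features (URL length, dots and hyphens in the host, an `@` symbol, an IP-address host, HTTPS) are assumptions chosen for illustration, not the authors' feature set.

```python
import re
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Return a few illustrative (hypothetical) lexical features of a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Very long URLs are sometimes used to hide the real destination.
        "url_length": len(url),
        # Many dots in the host can indicate stacked subdomains.
        "num_dots_in_host": host.count("."),
        # Hyphens in the host are often used to imitate brand names.
        "num_hyphens_in_host": host.count("-"),
        # An "@" anywhere in the URL can obscure the true host.
        "has_at_symbol": "@" in url,
        # A raw IPv4 address used in place of a domain name.
        "has_ip_host": bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host)),
        # Whether the scheme is HTTPS.
        "uses_https": parsed.scheme == "https",
    }


if __name__ == "__main__":
    print(extract_url_features("http://192.168.0.1/login@secure"))
    print(extract_url_features("https://www.example.com/"))
```

A feature dictionary like this would then be turned into a numeric vector and passed to a classifier; which classifier the paper uses is not stated in the abstract.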

Phishing Detection Based on Machine Learning and Feature Selection Methods

International Journal of Interactive Mobile Technologies (iJIM) ◽

10.3991/ijim.v13i12.11411 ◽

2019 ◽

Vol 13 (12) ◽

pp. 171 ◽

Cited By ~ 1

Author(s):

Mohammad Almseidin ◽

AlMaha Abu Zuraiq ◽

Mouhammd Al-kasassbeh ◽

Nidal Alnidami

Keyword(s):

Machine Learning ◽

Feature Selection ◽

Random Forest ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Web Pages ◽

Selection Methods ◽

Random Forest Algorithm ◽

Phishing Detection ◽

Enormous Number

With ongoing technological development, the Internet has become ubiquitous and accessible to everyone. There is a considerable number of web pages offering different benefits, but not all of these sites are legitimate: so-called phishing sites deceive users in order to serve their own interests. This paper addresses the problem using machine learning algorithms and a novel phishing detection dataset that contains 5000 legitimate web pages and 5000 phishing ones. To obtain the best results, various machine learning algorithms were tested, and J48, Random Forest, and Multilayer Perceptron were then chosen. Different feature selection tools were applied to the dataset to improve the efficiency of the models. The best result was achieved by using 20 of the 48 features with the Random Forest algorithm, giving an accuracy of 98.11%.
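The abstract does not say which feature selection tools were used. As one common possibility, the sketch below ranks discrete features by information gain and keeps the top `k` (the paper keeps 20 of 48). The function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_top_k(rows, labels, k):
    """Return the indices of the k features with the highest information gain."""
    n_features = len(rows[0])
    gains = [information_gain([row[j] for row in rows], labels)
             for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: feature 0 determines the label, feature 1 is noise.
    rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
    labels = [1, 1, 0, 0]
    print(select_top_k(rows, labels, k=1))
```

The selected column indices would then be used to reduce the dataset before training a model such as Random Forest.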

Detection of Malicious Uniform Resource Locator

International Journal of Recent Technology and Engineering - 2 ◽

10.35940/ijrte.a1265.078219 ◽

2019 ◽

Vol 8 (2) ◽

pp. 41-47

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Text Messages ◽

Machine Learning Algorithms ◽

The Internet ◽

Application Layer ◽

Uniform Resource Locator ◽

Network Characteristics ◽

The World ◽

Use Of Internet

With the growing use of the internet across the world, the threats it poses are numerous. The information you receive and share over the internet is accessible and can be tracked and modified. Malicious websites play a pivotal role in affecting your system. These websites reach users through emails, text messages, pop-ups, or deceptive advertisements. The outcome of visiting these websites or Uniform Resource Locators (URLs) is often downloaded malware, spyware, or ransomware, and compromised accounts. A malicious website or URL usually requires action on the user's side; in the case of drive-by downloads, however, the website attempts to install software on the computer without first asking for the user's permission. We put forward a model to predict whether a URL is malicious or benign, based on application layer and network characteristics. Machine learning classification algorithms are used to develop a classifier from the targeted dataset. The dataset is divided into training and validation sets, which are used to train and validate the classifier model. The hyperparameters are tuned to refine the model and generate better results.
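The train/validate/tune workflow described above can be sketched as follows. The abstract does not name the classifier or the hyperparameters, so this example assumes a tiny k-nearest-neighbour classifier with `k` as the tuned hyperparameter; all names and the toy data are hypothetical.

```python
import random


def split(data, val_fraction=0.25, seed=0):
    """Shuffle and split labelled data into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


def knn_predict(train, x, k):
    """Predict a 0/1 label by majority vote among the k nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, val, k):
    """Fraction of validation points classified correctly for a given k."""
    return sum(knn_predict(train, x, k) == y for x, y in val) / len(val)


def tune_k(train, val, candidates=(1, 3, 5)):
    """Pick the candidate k with the best validation accuracy."""
    return max(candidates, key=lambda k: accuracy(train, val, k))


if __name__ == "__main__":
    # Toy data: two well-separated clusters standing in for benign (0)
    # and malicious (1) URLs described by two numeric features.
    data = [((0.0 + i * 0.01, 0.0), 0) for i in range(10)]
    data += [((5.0 + i * 0.01, 5.0), 1) for i in range(10)]
    train, val = split(data)
    best_k = tune_k(train, val)
    print(best_k, accuracy(train, val, best_k))
```

Tuning on a held-out validation set rather than on the training data keeps the chosen hyperparameter from simply memorising the training examples.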

COMPARATIVE STUDY OF MACHINE LEARNING ALGORITHMS FOR PHISHING WEBSITE DETECTION

International Journal of Engineering Applied Sciences and Technology ◽

10.33564/ijeast.2021.v06i01.017 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Anuraag Velamati

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Personal Information ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cyber Attack ◽

Modern Era

Phishing is among the most common cyber-attacks of the modern era. Through such attacks, the phisher targets innocent users by tricking them into revealing their secure and personal information, with the purpose of using it fraudulently. To avoid being phished, users should be aware of phishing websites or rely on a blacklist of phishing websites, which requires knowing which websites have been detected as phishing.

Performance Analysis of Machine Learning Algorithms Used for Web Based Phishing Detection

Journal of University of Shanghai for Science and Technology ◽

10.51201/jusst/21/05187 ◽

2021 ◽

Vol 23 (05) ◽

pp. 650-656

Author(s):

Shailendra Baliram Torane ◽

Dr. Narendra Shekokar ◽

Keyword(s):

Machine Learning ◽

Performance Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Support Vector ◽

The Real ◽

Detection Algorithms ◽

Confidential Data ◽

Accuracy Parameter ◽

Phishing Detection

Phishing is a cybercrime technique in which the attacker creates a copy of a genuine website with the same color scheme, layout, font, and logo, and with a domain name that closely matches the real one. The attacker then distributes this fake website through online channels such as email and social media, using attractive offers or discounts to lure people into clicking the phishing link. Once users click the link, they are directed to the duplicate website that the attacker created. Believing it to be the real website, they enter their login details and other confidential data. This data is stored on the attacker’s server, giving the attacker full access to the victim’s data. Phishing attacks mainly aim to collect the victim’s confidential data, including usernames, passwords, bank details, and credit card numbers. Machine learning algorithms are widely used to detect phishing websites. This paper presents a performance analysis of three machine learning algorithms used for URL phishing detection: Extreme Learning Machine, Support Vector Machine, and Naïve Bayes. The paper analyses these algorithms in terms of accuracy, precision, recall, F1 score, and the confusion matrix. The dataset includes 11,000 entries and 30 features from the UC Irvine dataset repository. The literature survey shows that importance is usually given to only one parameter, accuracy, when analyzing the performance of URL phishing detection algorithms. The paper concludes that accuracy alone does not give the full picture of the overall performance of these algorithms, and that precision and recall are very important for understanding how they behave.
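The point that accuracy alone can mislead is easy to see on imbalanced data. The sketch below computes the four metrics named in the abstract from confusion-matrix counts; the toy labels are made up for illustration and are not results from the paper.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from the confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Hypothetical example: 10 phishing URLs (1) among 100, and a model
    # that catches only 2 of them while raising no false alarms.
    y_true = [1] * 10 + [0] * 90
    y_pred = [1] * 2 + [0] * 8 + [0] * 90
    print(metrics(y_true, y_pred))
```

In this made-up case accuracy is 0.92 even though recall is only 0.2, which is the kind of gap between metrics that the paper argues should not be overlooked.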

Offline handwritten signature verification using various Machine Learning Algorithms

ITM Web of Conferences ◽

10.1051/itmconf/20214003010 ◽

2021 ◽

Vol 40 ◽

pp. 03010

Author(s):

Chinmay Lokare ◽

Rachana Patil ◽

Saloni Rane ◽

Deepakkumar Kathirasen ◽

Yogita Mistry

Keyword(s):

Machine Learning ◽

Personal Information ◽

Learning Algorithms ◽

Principal Component ◽

Component Analysis ◽

Machine Learning Algorithms ◽

Kernel Principal Component Analysis ◽

Signature Verification ◽

Bayes Algorithm ◽

Handwritten Signature

In today’s world it is necessary to protect one’s authenticity in order to ensure the protection of personal information that only a person's authentic credentials should give access to. There is an increasing number of malpractices, such as signature forgery, aimed at accessing a person's important information. To counter the signature verification problem, there have been a number of advances in verifying the authenticity of signatures using various techniques, including machine learning and deep learning. This paper introduces a novel approach to signature verification that combines difference of Gaussians filtering, gray level co-occurrence matrix feature extraction, principal component analysis, and kernel principal component analysis with various machine learning algorithms. The publicly available Kaggle offline handwritten signature dataset is used for training. The article compares the accuracy of various machine learning algorithms on this dataset. After training, the lowest accuracy achieved is 56.66% for the Naive Bayes algorithm. The highest accuracy achieved is 82% for K-Nearest Neighbour (KNN) and 81.66% for Random Forest, using the principal components and kernel principal components of the dataset.
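To make the gray level co-occurrence matrix (GLCM) step concrete, here is a minimal pure-Python sketch that counts how often pairs of gray levels occur at a fixed pixel offset and derives the standard contrast statistic. The offset, the number of gray levels, and the tiny image are assumptions for illustration; the abstract does not give the authors' settings.

```python
def glcm(image, levels, dr=0, dc=1):
    """Count co-occurrences of gray levels at offset (dr, dc).

    image is a list of rows of integer gray levels in range(levels).
    matrix[i][j] is the number of pixel pairs where the first pixel has
    level i and the pixel at the given offset has level j.
    """
    matrix = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                matrix[image[r][c]][image[r2][c2]] += 1
    return matrix


def contrast(matrix):
    """GLCM contrast: sum of (i - j)^2 weighted by normalised pair counts."""
    total = sum(sum(row) for row in matrix)
    return sum((i - j) ** 2 * count / total
               for i, row in enumerate(matrix)
               for j, count in enumerate(row))


if __name__ == "__main__":
    # Hypothetical 2x3 binary image; offset (0, 1) pairs each pixel with
    # its right-hand neighbour.
    image = [[0, 0, 1],
             [1, 1, 0]]
    m = glcm(image, levels=2)
    print(m, contrast(m))
```

Texture statistics such as contrast, computed per image, could serve as the features that are later reduced with principal component analysis.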

Supplemental Material for One Model to Rule Them All? Using Machine Learning Algorithms to Determine the Number of Factors in Exploratory Factor Analysis

Psychological Methods ◽

10.1037/met0000262.supp ◽

2020 ◽

Keyword(s):

Machine Learning ◽

Factor Analysis ◽

Exploratory Factor Analysis ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Number Of Factors

Forecasting US movies box office performances in Turkey using machine learning algorithms

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189120 ◽

2020 ◽

Vol 39 (5) ◽

pp. 6579-6590

Author(s):

Sandy Çağlıyor ◽

Başar Öztayşi ◽

Selime Sezgin

Keyword(s):

Machine Learning ◽

Global Economy ◽

Learning Algorithms ◽

Forecast Model ◽

Machine Learning Algorithms ◽

Gradient Boosting ◽

High Stakes ◽

Box Office ◽

Industry Forecast ◽

The Impact

The motion picture industry is one of the largest industries worldwide and has significant importance in the global economy. Considering the high stakes and high risks in the industry, forecast models and decision support systems are gaining importance. Several attempts have been made to estimate the theatrical performance of a movie before or at the early stages of its release. Nevertheless, these models are mostly used for predicting domestic performance, and the industry still struggles to predict box office performance in overseas markets. The aim of this study is to design a forecast model using different machine learning algorithms to estimate the theatrical success of US movies in Turkey. A dataset of 1559 movies is constructed from various sources. First, the independent variables are grouped as pre-release, distributor type, and international distribution, based on their characteristics. The number of attendances is discretized into three classes. Four popular machine learning algorithms (artificial neural networks, decision tree regression, gradient boosting tree, and random forest) are employed, and the impact of each variable group is observed by comparing the performance of the models. The number of target classes is then increased to five and eight, and the results are compared with previously developed models in the literature.
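The abstract does not say how attendance figures are discretized into classes. One simple option, shown here purely as an assumption, is equal-frequency binning, where each class receives roughly the same number of movies; the function name and the attendance figures are made up.

```python
def equal_frequency_bins(values, n_classes):
    """Assign each value a class in range(n_classes) by its rank.

    Values are ranked from smallest to largest and the ranks are cut into
    n_classes groups of (nearly) equal size.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, index in enumerate(order):
        labels[index] = rank * n_classes // len(values)
    return labels


if __name__ == "__main__":
    # Hypothetical attendance figures for six movies.
    attendance = [100, 5000, 250, 90000, 1200, 40]
    print(equal_frequency_bins(attendance, 3))
```

Calling the same function with `n_classes` set to 5 or 8 would mirror the later experiments with more target classes.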

Intelligent system of English composition scoring model based on improved machine learning algorithm

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189235 ◽

2020 ◽

pp. 1-11

Author(s):

Jie Liu ◽

Lin Lin ◽

Xiufang Liang

Keyword(s):

Machine Learning ◽

Evaluation System ◽

Intelligent System ◽

Learning Algorithm ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Assessment System ◽

English Composition ◽

Region Extraction ◽

Constraint Model

The online English teaching system places certain requirements on the intelligent scoring system, and the most difficult stage of intelligent scoring in an English test is scoring the English composition with an intelligent model. To improve the intelligence of English composition scoring, this study combines machine learning algorithms with intelligent image recognition technology, and proposes an improved MSER-based character candidate region extraction algorithm and a convolutional neural network-based pseudo-character region filtering algorithm. In addition, to verify whether the proposed algorithm model meets the requirements of the group text, that is, to verify the feasibility of the algorithm, the performance of the model is analyzed through designed experiments. Moreover, the basic conditions for composition scoring are input into the model as a constraint model. The research results show that the proposed algorithm has a certain practical effect and can be applied to English assessment systems and to online homework evaluation systems.

The Unlearnable Checkerboard Pattern

Communications of the Blyth Institute ◽

10.33014/issn.2640-5652.1.2.holloway.1 ◽

2019 ◽

Vol 1 (2) ◽

pp. 78-80

Author(s):

Eric Holloway

Keyword(s):

Machine Learning ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Checkerboard Pattern ◽

Simple Task

Detecting some patterns is a simple task for humans, but nearly impossible for current machine learning algorithms. Here, the "checkerboard" pattern is examined, where human prediction nears 100% and machine prediction drops significantly below 50%.
