Detection of Phishing Websites using an Efficient Feature-Based Machine Learning Framework

Phishing is a socially engineered cyber-attack that tricks naive online users into revealing sensitive information such as user data, login credentials, social security numbers, and banking information. Attackers fool Internet users by posing as a legitimate webpage to retrieve personal information. This can also be done by sending emails that pose as reputable companies or businesses. Phishing exploits several vulnerabilities effectively, and no single solution protects users from all of them. A classification/prediction model is designed based on heuristic features extracted from the website domain, URL, web protocol, and source code to eliminate the drawbacks of existing anti-phishing techniques. In the model we combine existing solutions such as blacklisting and whitelisting, heuristics, and visual-based similarity, which together provide a higher level of security. We use the model with different machine learning algorithms, namely Logistic Regression, Decision Trees, K-Nearest Neighbours and Random Forests, and compare the results to find the most efficient machine learning framework.
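
The layered combination of whitelisting, blacklisting, and heuristics described above can be sketched roughly as follows. This is a minimal illustration, not the authors' model: the list contents, the `classify` function, and the placeholder heuristic `simple_heuristic` are all assumptions, and the visual-similarity check and the trained classifiers are left out.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical example lists; a real system would load maintained feeds.
WHITELIST = {"example.com"}
BLACKLIST = {"phish.example.net"}


def simple_heuristic(url: str) -> bool:
    """Placeholder heuristic (an assumption): flag URLs containing '@'."""
    return "@" in url


def classify(url: str, is_suspicious: Callable[[str], bool] = simple_heuristic) -> str:
    """Check the whitelist first, then the blacklist, then fall back to a heuristic."""
    host = urlparse(url).hostname or ""
    if host in WHITELIST:
        return "legitimate"
    if host in BLACKLIST:
        return "phishing"
    # Unknown host: defer to the heuristic, which stands in for a trained model.
    return "phishing" if is_suspicious(url) else "legitimate"
```

The `is_suspicious` parameter is where a trained classifier (for example, one of the four algorithms named in the abstract) could be plugged in; the list lookups only short-circuit hosts that are already known.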

COMPARATIVE STUDY OF MACHINE LEARNING ALGORITHMS FOR PHISHING WEBSITE DETECTION

International Journal of Engineering Applied Sciences and Technology ◽

10.33564/ijeast.2021.v06i01.017 ◽

2021 ◽

Vol 6 (1) ◽

Author(s):

Anuraag Velamati

Keyword(s):

Machine Learning ◽

Comparative Study ◽

Personal Information ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cyber Attack ◽

Modern Era

Phishing is the most common cyber-attack of the modern era. In such attacks, the phisher targets unsuspecting users by tricking them into revealing secure and personal information, with the aim of using it fraudulently. To avoid being phished, users should be aware of phishing websites and keep a blacklist of them, which requires knowing that a website has already been detected as phishing.

Detecting Phishing Websites Using an Efficient Feature-based Machine Learning Framework

Revista Gestão Inovação e Tecnologias ◽

10.47059/revistageintec.v11i2.1832 ◽

2021 ◽

Vol 11 (2) ◽

pp. 2106-2112

Author(s):

K. Mohana Sundaram ◽

R. Sasikumar ◽

Atthipalli Sai Meghana ◽

Arava Anuja ◽

Chandolu Praneetha

Keyword(s):

Machine Learning ◽

Predictive Accuracy ◽

Sensitive Information ◽

Learning Methods ◽

Modified Model ◽

Learning Framework ◽

Machine Learning Methods ◽

Test Models ◽

Digital Crime ◽

Feature Based

Phishing is a form of digital crime where spam messages and spam sites lure users into giving up sensitive information to phishers. Sensitive information obtained is used to take notes or to access money. To combat the crime of identity theft, Microsoft's cloud-based program attempts to use logical testing to determine how you can build trust with the characters. The purpose of this paper is to create a molded channel using a variety of machine learning methods. Classification is a machine learning method that can be used effectively to identify phishing: to assemble and test models, use different combinations of settings, compare different machine learning processes, measure the accuracy of the modified model, and report multiple evaluation measures. The current study compares the predictive accuracy, F1 scores, precision, and recall of multiple machine learning methods, including Naïve Bayes (NB) and Random Forest (RF), for detecting criminal messages that aim to steal sensitive information, and improves the process by choosing feature selection strategies and improving classification accuracy.

Feature based Phishing Website Detection using Random Forest Classifier

International Journal for Research in Applied Science and Engineering Technology ◽

10.22214/ijraset.2021.35400 ◽

2021 ◽

Vol 9 (VI) ◽

pp. 1902-1906

Author(s):

Sonali Kadam

Keyword(s):

Machine Learning ◽

Random Forest Classifier ◽

The Internet ◽

Classification Algorithms ◽

Sensitive Information ◽

Security Threat ◽

Uniform Resource Locator ◽

Internet Users ◽

Feature Based ◽

Phishing Detection

In today’s world, phishing is one of the most serious security threats facing internet users. Phishing is an attack made to steal users' sensitive information, such as passwords, PINs, and card details. In a phishing attack, the attacker creates a fake website, gets users to click through to it, and steals their sensitive information. In this paper, we propose a feature-based phishing detection technique that uses uniform resource locator (URL) features. This paper focuses on extracting the features, which are then classified based on their effect within a website. The feature groups include address-bar-related features, abnormal-based features, HTML and JavaScript-based features, and domain-based features. We plan to use machine learning, implement several classification algorithms, and compare the performance of these algorithms on our dataset.
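
As a rough illustration of what address-bar-related URL features might look like, the sketch below extracts a handful of them with the Python standard library. The specific features and the `url_features` function are assumptions chosen for illustration; the abstract does not list the exact features the paper uses.

```python
import ipaddress
from urllib.parse import urlparse


def url_features(url: str) -> dict:
    """Extract a few illustrative address-bar features from a URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # A raw IP address in place of a domain name is a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "host_is_ip": host_is_ip,
        "has_at_symbol": "@" in url,
        "has_hyphen_in_host": "-" in host,
        "dot_count_in_host": host.count("."),
    }
```

A dictionary like this would typically be converted to a numeric vector before being passed to a classifier; the other feature groups (HTML, JavaScript, domain) would need the page source or registration data and are not covered here.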

Hybrid Machine Learning: A Tool to Detect Phishing Attacks in Communication Networks

ECTI Transactions on Computer and Information Technology (ECTI-CIT) ◽

10.37936/ecti-cit.2021153.240565 ◽

2021 ◽

Vol 15 (3) ◽

pp. 374-389

Author(s):

Ademola Philip Abidoye ◽

Boniface Kabaso

Keyword(s):

Machine Learning ◽

Communication Networks ◽

Credit Card ◽

Personal Information ◽

False Positive Rate ◽

False Negative ◽

False Negative Rate ◽

Machine Learning Techniques ◽

Sensitive Information ◽

Cyber Attack

Phishing is a cyber-attack that uses disguised email as a weapon and has been on the rise in recent times. An innocent Internet user who happens to click on a fraudulent link may fall victim and divulge personal information such as credit card PINs, login credentials, banking information, and other sensitive information. There are many ways in which attackers can trick victims into revealing their personal information. In this article, we select important phishing URL features that attackers can use to trick Internet users into taking the attacker’s desired action. We use two machine learning techniques to accurately classify our data sets. We compare the performance of other related techniques with our scheme. The results of the experiments show that the approach is highly effective in detecting phishing URLs and attained an accuracy of 97.8% with 1.06% false positive rate, 0.5% false negative rate, and an error rate of 0.3%. The proposed scheme performs better than the other selected related work. This shows that our approach can be used for real-time applications in detecting phishing URLs.
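
For readers unfamiliar with the metrics quoted above, the sketch below computes accuracy, error rate, false positive rate, and false negative rate from confusion-matrix counts using their standard textbook definitions. The abstract does not state which definitions the paper uses, so treating them as the standard ones is an assumption, and the function name `classification_rates` is hypothetical.

```python
def classification_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute standard rates from confusion-matrix counts.

    tp/fn count phishing URLs classified correctly/incorrectly;
    tn/fp count legitimate URLs classified correctly/incorrectly.
    """
    total = tp + tn + fp + fn
    if total == 0 or (fp + tn) == 0 or (fn + tp) == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }
```

With made-up counts of 90 true positives, 95 true negatives, 5 false positives, and 10 false negatives, `classification_rates(90, 95, 5, 10)` gives an accuracy of 0.925, an error rate of 0.075, a false positive rate of 0.05, and a false negative rate of 0.1. These counts are illustrative only and are not taken from the paper.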

Real time cyber attack analysis on Hadoop ecosystem using machine learning algorithms

2015 2nd Asia-Pacific World Congress on Computer Science and Engineering (APWC on CSE) ◽

10.1109/apwccse.2015.7476223 ◽

2015 ◽

Author(s):

Md Tanzim Khorshed ◽

Neeraj Anand Sharma ◽

Aaron Vinek Dutt ◽

A.B.M. Shawkat Ali ◽

Yang Xiang

Keyword(s):

Machine Learning ◽

Real Time ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Cyber Attack ◽

Hadoop Ecosystem

Intelligent Malware Detection Using Deep Dilated Residual Networks for Cyber Security

Research Anthology on Artificial Intelligence Applications in Security ◽

10.4018/978-1-7998-7705-9.ch050 ◽

2021 ◽

pp. 1085-1099

Author(s):

S. Abijah Roseline ◽

S. Geetha

Keyword(s):

Machine Learning ◽

Cyber Security ◽

Machine Learning Algorithms ◽

Human Interaction ◽

Machine Learning Techniques ◽

Detection Methods ◽

Security Threat ◽

Signature Detection ◽

Learning Techniques ◽

Feature Based

Malware is one of the most serious security threats and can target billions of devices, such as personal computers and smartphones, across the world. Malware classification and detection is a challenging task due to the targeted, zero-day, and stealthy nature of advanced and new malware. Traditional signature detection methods, such as antivirus software, were effective for detecting known malware. At present, there are various solutions for detecting unknown malware that employ feature-based machine learning algorithms. Machine learning techniques detect known malware effectively but are not optimal and show a low accuracy rate for unknown malware. This chapter explores a novel deep learning model, called the deep dilated residual network model, for malware image classification. The proposed model showed a higher accuracy of 98.50% and 99.14% on the Kaggle Malimg and BIG 2015 datasets, respectively. New malware can be handled in real time with minimal human interaction using the proposed deep residual model.

Automatic Pulmonary Nodule Detection Applying Deep Learning or Machine Learning Algorithms to the LIDC-IDRI Database: A Systematic Review

Diagnostics ◽

10.3390/diagnostics9010029 ◽

2019 ◽

Vol 9 (1) ◽

pp. 29 ◽

Cited By ~ 20

Author(s):

Lea Pehrson ◽

Michael Nielsen ◽

Carsten Ammitzbøl Lauridsen

Keyword(s):

Machine Learning ◽

Systematic Review ◽

Deep Learning ◽

Machine Learning Algorithms ◽

Ct Scans ◽

Lung Nodules ◽

Original Research ◽

Feature Based ◽

High Level ◽

Meta Analyses

The aim of this study was to provide an overview of the literature available on machine learning (ML) algorithms applied to the Lung Image Database Consortium Image Collection (LIDC-IDRI) database as a tool for the optimization of detecting lung nodules in thoracic CT scans. This systematic review was compiled according to Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Only original research articles concerning algorithms applied to the LIDC-IDRI database were included. The initial search yielded 1972 publications after removing duplicates, and 41 of these articles were included in this study. The articles were divided into two subcategories describing their overall architecture. The majority of feature-based algorithms achieved an accuracy >90% compared to the deep learning (DL) algorithms that achieved an accuracy in the range of 82.2%–97.6%. In conclusion, ML and DL algorithms are able to detect lung nodules with a high level of accuracy, sensitivity, and specificity using ML, when applied to an annotated archive of CT scans of the lung. However, there is no consensus on the method applied to determine the efficiency of ML algorithms.

Effective Parameter Optimization & Classification using Bat-Inspired Algorithm with Improving NSSA

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a1498.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 3343-3349

Keyword(s):

Machine Learning ◽

Optimal Parameter ◽

Personal Information ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Gradient Boosting ◽

Security Measures ◽

End User ◽

Effective Strategies ◽

Made In

Network security is an important aspect of communication-related activities. In recent times, the advent of more sophisticated technologies has changed the way information is shared with everyone in any part of the world. At the same time, these advancements are misused to intentionally compromise end-user devices and steal personal information. The number of attacks made on targeted devices is increasing over time. Even though the security mechanisms used to defend the network are enhanced and updated periodically, intruders develop new advanced methods to penetrate the system. To counter this, effective strategies must be applied to enhance the security measures in the network. In this paper, a machine learning-based approach is proposed to identify the patterns of different categories of attacks made in the past. The KDD Cup 1999 dataset is used to develop this predictive model. The bat optimization algorithm identifies the optimal parameter subset. Supervised machine learning algorithms were employed to train the model from the data to make predictions. The performance of the system is evaluated through metrics such as accuracy and precision. Four classification algorithms were used, of which the gradient boosting model outperformed the benchmarked algorithms and proved its value for data classification based on the accuracy obtained.

Detecting Phishing Website Using Machine Learning

International Journal of Advanced Research in Science, Communication and Technology ◽

10.48175/ijarsct-1082 ◽

2021 ◽

pp. 16-19

Author(s):

Aarti Chile ◽

Mrunal Jadhav ◽

Shital Thakare ◽

Prof. Yogita Chavan

Keyword(s):

Machine Learning ◽

Electronic Communication ◽

Personal Information ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Uniform Resource Locator ◽

Debit Card ◽

Maximum Accuracy ◽

Phishing Detection

Phishing is a fraudulent attempt to obtain sensitive and personal information, such as passwords, usernames, and bank details like credit/debit card details, by masquerading as a reliable organization in electronic communication. The phishing website appears the same as the legitimate website and directs the user to a page where they enter their personal details on the fake website. Machine learning algorithms can improve the accuracy of the prediction. This method uses uniform resource locator (URL) features. We identified features that phishing site URLs contain, and the proposed method employs those features for phishing detection. The proposed system predicts URL-based phishing websites with maximum accuracy.

Feature-Based Opinion Mining and Managed Machine Learning with Sentiment Classification Models

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4555.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 3992-3998

Keyword(s):

Machine Learning ◽

Sentiment Analysis ◽

Language Processing ◽

Opinion Mining ◽

Machine Learning Algorithms ◽

Support Vector ◽

K Nearest Neighbors ◽

Data Intensive ◽

Learning Tasks ◽

Feature Based

Sentiment analysis is the study of individuals' opinions and feedback about a subject, which can be items, services, movies, people or events. The opinions are mostly expressed as remarks or reviews. With social networks, forums and websites, these reviews have become a significant factor in a client’s decision to buy something or not. These days, vast scalable computing environments provide sophisticated ways of carrying out various data-intensive natural language processing (NLP) and machine learning tasks to examine these reviews. One such example is text classification, a compelling method for predicting clients' sentiment. In this paper, we focus our sentiment analysis work on a movie review database. We examine the sentiment expressions to rank the polarity of the movie reviews on a scale of 0 (highly disliked) to 4 (highly preferred), perform feature extraction and ranking, and use these features to train our multilabel classifier to assign each movie review to its correct rating. This paper incorporates sentiment analysis using feature-based opinion mining and managed machine learning. The main focus is to determine the polarity of reviews using nouns, verbs, and adjectives as opinion words. In addition, a comparative study of different classification approaches has been performed to determine the most appropriate classifier for our problem space. In our study, we used six distinct machine learning algorithms: Naïve Bayes, Logistic Regression, SVM (Support Vector Machine), RF (Random Forest), KNN (K nearest neighbors) and SoftMax Regression.
