Detecting phishing websites using machine learning technique

In recent years, advances in Internet and cloud technologies have led to a significant increase in electronic trading, in which consumers make purchases and transactions online. This growth has also exposed users' sensitive information to unauthorized access and damaged enterprise resources. Phishing is one of the most common attacks: it tricks users into accessing malicious content in order to obtain their information. In both website interface and uniform resource locator (URL), most phishing webpages look identical to the genuine webpages. Various strategies for detecting phishing websites, such as blacklist-based and heuristic methods, have been suggested. However, because existing security technologies are inefficient, the number of victims is increasing exponentially. The anonymous and uncontrolled structure of the Internet makes it particularly vulnerable to phishing attacks. Existing research shows that the performance of phishing detection systems remains limited, so there is a demand for an intelligent technique to protect users from such cyber-attacks. In this study, the author proposes a URL detection technique based on machine learning, in which a recurrent neural network is employed to detect phishing URLs. The proposed method was evaluated on 7,900 malicious and 5,800 legitimate sites. The experimental results show that the proposed method outperforms recent approaches to malicious URL detection.
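
The abstract names a recurrent neural network over URLs but gives no architecture. The sketch below is a minimal, assumed character-level Elman RNN in plain Python that shows only the forward pass: each URL character updates a hidden state, and the final state is squashed into a phishing probability. The alphabet, hidden size and names such as `CharRNN` are hypothetical, and the weights are random, so outputs are meaningless until the model is trained (e.g. with backpropagation through time in a deep learning framework).

```python
import math
import random

# Assumed character vocabulary for URLs; anything else maps to one "unknown" slot.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=%"
CHAR_INDEX = {c: i for i, c in enumerate(ALPHABET)}
UNKNOWN = len(ALPHABET)


def encode(url):
    """Map each (lower-cased) URL character to an integer index."""
    return [CHAR_INDEX.get(c, UNKNOWN) for c in url.lower()]


class CharRNN:
    """Minimal Elman RNN over URL characters (forward pass only, untrained).

    h_t = tanh(W_xh[x_t] + W_hh @ h_{t-1} + b_h)
    p   = sigmoid(w_out . h_T + b_out)
    """

    def __init__(self, hidden_size=16, seed=0):
        rng = random.Random(seed)
        vocab = len(ALPHABET) + 1  # +1 for the unknown-character slot
        self.hidden_size = hidden_size
        # One input-weight row per character acts like a learned embedding.
        self.w_xh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(vocab)]
        self.w_hh = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)] for _ in range(hidden_size)]
        self.b_h = [0.0] * hidden_size
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
        self.b_out = 0.0

    def predict_proba(self, url):
        """Return the model's probability that `url` is phishing."""
        h = [0.0] * self.hidden_size
        for idx in encode(url):
            # The comprehension reads the previous h before it is replaced.
            h = [
                math.tanh(
                    self.w_xh[idx][j]
                    + sum(self.w_hh[j][k] * h[k] for k in range(self.hidden_size))
                    + self.b_h[j]
                )
                for j in range(self.hidden_size)
            ]
        logit = sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))


model = CharRNN(hidden_size=8)
print(round(model.predict_proba("http://secure-login.example.com/verify"), 3))
```

Reading the URL one character at a time lets the hidden state pick up lexical cues (unusual hyphenation, look-alike tokens) without hand-crafted features, which is the usual motivation for applying RNNs to URL strings.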

IDS for Industrial Applications: A Federated Learning Approach with Active Personalization

Sensors ◽

10.3390/s21206743 ◽

2021 ◽

Vol 21 (20) ◽

pp. 6743

Author(s):

Vasiliki Kelli ◽

Vasileios Argyriou ◽

Thomas Lagkas ◽

George Fragulis ◽

Elisavet Grigoriou ◽

...

Keyword(s):

Machine Learning ◽

Active Learning ◽

Network Flow ◽

Detection System ◽

Human Life ◽

Industrial Sector ◽

Machine Learning Techniques ◽

Sensitive Information ◽

Learning Approaches ◽

Monitoring And Control

The Internet of Things (IoT) is a concept adopted in nearly every aspect of human life, leading to an explosive growth in the use of intelligent devices. Notably, such solutions are especially integrated in the industrial sector to allow the remote monitoring and control of critical infrastructure. This global integration of IoT solutions has expanded the attack surface of IoT-enabled infrastructures. Artificial intelligence and machine learning have demonstrated their ability to resolve issues that would otherwise have been impossible or difficult to address; thus, such solutions are closely associated with securing IoT. However, classical collaborative and distributed machine learning approaches are known to compromise sensitive information. In our paper, we demonstrate the creation of a network flow-based Intrusion Detection System (IDS) aimed at protecting critical infrastructures, built by pairing two machine learning techniques, namely federated learning and active learning. The former is used to train models privately in federation, while the latter is a semi-supervised approach applied to adapt the global model to each participant's traffic. Experimental results indicate that global models perform significantly better for each participant when locally personalized with just a few active learning queries. Specifically, we demonstrate that the accuracy increase can reach 7.07% with only 10 queries.
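
The two techniques paired in this abstract can be illustrated with two small functions. The sketch below assumes plain Federated Averaging (FedAvg) for the federated step, where client parameter vectors are averaged weighted by local dataset size, and least-confidence uncertainty sampling for the active learning step. Both choices and the function names are assumptions for illustration; the abstract does not say which aggregation rule or query strategy the authors use.

```python
def federated_average(client_weights, client_sizes):
    """FedAvg: average client parameter vectors, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def select_queries(probabilities, budget):
    """Least-confidence active learning: return indices of the `budget`
    unlabelled flows whose predicted attack probability is closest to 0.5."""
    ranked = sorted(range(len(probabilities)), key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:budget]


# Two clients; the second holds three times as much data.
print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
# Ask the local operator to label the two most uncertain flows.
print(select_queries([0.9, 0.48, 0.1, 0.6], budget=2))  # [1, 3]
```

Only the averaged parameters leave each site, which is what keeps raw traffic private; the queried labels are then used to fine-tune the global model locally.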

A Novel PCA-Firefly Based XGBoost Classification Model for Intrusion Detection in Networks Using GPU

Electronics ◽

10.3390/electronics9020219 ◽

2020 ◽

Vol 9 (2) ◽

pp. 219 ◽

Cited By ~ 37

Author(s):

Sweta Bhattacharya ◽

Siva Rama Krishnan S ◽

Praveen Kumar Reddy Maddikunta ◽

Rajesh Kaluri ◽

Saurabh Singh ◽

...

Keyword(s):

Machine Learning ◽

Intrusion Detection ◽

Comprehensive Evaluation ◽

Detection System ◽

Human Life ◽

Principal Component ◽

Cyber Attacks ◽

Classification Model ◽

Learning Approaches ◽

Machine Learning Model

The enormous popularity of the internet across all spheres of human life has introduced various risks of malicious attacks in the network. Activities performed over the network can proliferate effortlessly, which has led to the emergence of intrusion detection systems. Attack patterns are also dynamic, which necessitates efficient classification and prediction of cyber attacks. In this paper we propose a hybrid principal component analysis (PCA)-firefly based machine learning model to classify intrusion detection system (IDS) datasets. The dataset used in the study was collected from Kaggle. The model first applies one-hot encoding to transform the IDS datasets. The hybrid PCA-firefly algorithm is then used for dimensionality reduction, and the XGBoost algorithm is applied to the reduced dataset for classification. A comprehensive evaluation against state-of-the-art machine learning approaches is conducted to justify the superiority of the proposed approach. The experimental results confirm that the proposed model performs better than the existing machine learning models.
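
The first two preprocessing steps described above, one-hot encoding of categorical IDS fields and PCA-based dimensionality reduction, can be sketched with NumPy. The column values are hypothetical, and the firefly optimisation and the XGBoost classifier are not shown; this is a minimal sketch of plain PCA via singular value decomposition, not the authors' hybrid PCA-firefly algorithm.

```python
import numpy as np


def one_hot(column):
    """One-hot encode a list of categorical values (e.g. a protocol column)."""
    categories = sorted(set(column))
    index = {value: i for i, value in enumerate(categories)}
    encoded = np.zeros((len(column), len(categories)))
    for row, value in enumerate(column):
        encoded[row, index[value]] = 1.0
    return encoded, categories


def pca_reduce(X, n_components):
    """Project mean-centred data onto its top principal components via SVD.

    Returns the reduced data and the fraction of variance each kept component explains.
    """
    X_centred = X - X.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(X_centred, full_matrices=False)
    explained_ratio = singular_values**2 / np.sum(singular_values**2)
    return X_centred @ vt[:n_components].T, explained_ratio[:n_components]


protocol, categories = one_hot(["tcp", "udp", "tcp", "icmp"])
numeric = np.array([[0.1, 200.0], [0.3, 150.0], [0.2, 180.0], [0.9, 20.0]])  # hypothetical duration, bytes
X = np.hstack([protocol, numeric])
reduced, ratio = pca_reduce(X, n_components=2)
print(categories, reduced.shape, ratio.round(3))
```

In practice numeric columns are usually standardised before PCA, since otherwise large-valued features such as byte counts dominate the principal components.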

Phisher Fighter: Website Phishing Detection System Based on URL and Term Frequency-Inverse Document Frequency Values

Journal of Cyber Security and Mobility ◽

10.13052/jcsm2245-1439.1114 ◽

2021 ◽

Author(s):

E. Sri Vishva ◽

D. Aju

Keyword(s):

Machine Learning ◽

Detection System ◽

Stochastic Gradient Descent ◽

Support Vector ◽

Sensitive Information ◽

Inverse Document Frequency ◽

Term Frequency ◽

Document Frequency ◽

Single Piece ◽

Phishing Detection

Fundamentally, phishing is a common cybercrime committed by intruders or hackers against naive and trusting individuals, who are tricked into revealing their unique and sensitive information through fictitious websites. The primary intention of this kind of cybercrime is to gain access to the recipients' personal or classified information. The obtained data includes information that can readily be used to identify an individual. The stolen personal or sensitive information is commonly sold on online dark markets and subsequently bought by identity thieves. Depending on the sensitivity and importance of the stolen information, the price of a single piece of it can vary from a few dollars to thousands of dollars. Machine learning (ML) and deep learning (DL) are powerful methods for analysing and countering these phishing attacks. A machine learning based phishing detection system is proposed to protect websites and users from such attacks. To improve the results, the TF-IDF (Term Frequency-Inverse Document Frequency) values of webpages are used within the system. ML methods such as LR (Logistic Regression), RF (Random Forest), SVM (Support Vector Machine), NB (Naive Bayes) and SGD (Stochastic Gradient Descent) are applied to train and test on the obtained dataset. The result is a robust phishing website detection system with 90.68% accuracy.
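
TF-IDF weights a token highly when it is frequent in one document but rare across the collection. The sketch below computes it from scratch, assuming URL tokens split on non-alphanumeric characters, term frequency as count divided by document length, and the unsmoothed inverse document frequency log(N / df). These definitions and the tokenizer are assumptions; libraries commonly use smoothed variants, and the paper may weight page text rather than URLs. The resulting vectors would then feed the classifiers listed above, which are not shown.

```python
import math
import re
from collections import Counter


def tokenize_url(url):
    """Split a URL into lowercase tokens on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


def tf_idf(documents):
    """Return one {token: weight} dict per document, with
    tf = count / document_length and idf = log(N / document_frequency)."""
    tokenized = [tokenize_url(d) for d in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each token appears at least once.
    df = Counter(token for tokens in tokenized for token in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        weights.append({
            token: (count / length) * math.log(n_docs / df[token])
            for token, count in counts.items()
        })
    return weights


urls = [
    "http://paypal.com/login",
    "http://paypal-secure-login.xyz/verify",
    "http://example.com/",
]
for url, w in zip(urls, tf_idf(urls)):
    print(url, {t: round(v, 3) for t, v in w.items()})
```

A token such as `http` that occurs in every URL gets weight 0, while rarer tokens such as `verify` or `secure` receive higher weights and therefore carry more signal for the classifier.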

Machine Learning Based Hybrid Intrusion Detection For Virtualized Infrastructures In Cloud Computing Environments

Journal of Physics Conference Series ◽

10.1088/1742-6596/2089/1/012072 ◽

2021 ◽

Vol 2089 (1) ◽

pp. 012072

Author(s):

Ayesha Sarosh

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Intrusion Detection ◽

Detection System ◽

Cyber Attacks ◽

Support Vector ◽

Suggested Technique ◽

Secure Environment ◽

Computing Environments ◽

Cloud Technologies

In recent years, technology has steadily shifted from conventional software models to cloud technologies. Cloud computing is rapidly becoming the standard by fulfilling the computing infrastructure demands of enterprises of all sizes. Given ubiquitous cyber attacks that can proliferate and morph dynamically and rapidly, intrusion detection is one of the essential tools for building a trustworthy and secure cloud computing environment. This paper presents machine learning (ML) based hybrid intrusion detection for virtualized infrastructures in cloud computing environments. The approach uses a hybrid algorithm combining the support vector machine (SVM) classifier and K-means clustering to improve the accuracy of the anomaly detection system. It is evaluated on the UNSW-NB15 dataset, and the results are compared with earlier techniques using performance measures such as average detection time. The approach achieves better accuracy than earlier approaches.
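
The abstract does not specify how SVM and K-means are combined. One common pattern, assumed here purely for illustration, is to cluster the traffic with K-means, append each record's distance to every centroid as extra features, and train a linear SVM on the augmented features. The sketch uses Lloyd's algorithm for K-means and the Pegasos sub-gradient method for the hinge-loss SVM; the toy points stand in for UNSW-NB15 records, which are not loaded here.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm; returns a (k, n_features) array of centroids."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (unchanged if the cluster is empty).
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return centroids


def cluster_features(X, centroids):
    """Hypothetical hybrid step: append distance-to-each-centroid and a bias column."""
    X = np.asarray(X, dtype=float)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.hstack([X, dists, np.ones((len(X), 1))])


def train_linear_svm(F, y, lam=0.01, n_iter=2000, seed=0):
    """Pegasos stochastic sub-gradient descent for a hinge-loss linear SVM (y in {-1, +1})."""
    rng = np.random.default_rng(seed)
    w = np.zeros(F.shape[1])
    for t in range(1, n_iter + 1):
        i = rng.integers(len(F))
        eta = 1.0 / (lam * t)
        if y[i] * (F[i] @ w) < 1:  # margin violated: hinge sub-gradient is non-zero
            w = (1 - eta * lam) * w + eta * y[i] * F[i]
        else:
            w = (1 - eta * lam) * w
    return w


def predict(F, w):
    """+1 = anomalous/attack, -1 = normal."""
    return np.where(F @ w >= 0, 1, -1)


X = np.array([[0.0, 0.0], [0.5, 0.2], [0.1, 0.4], [5.0, 5.0], [5.3, 4.8], [4.7, 5.2]])
y = np.array([-1, -1, -1, 1, 1, 1])
centroids = kmeans(X, k=2)
F = cluster_features(X, centroids)
w = train_linear_svm(F, y)
print(predict(F, w))
```

The clustering step is unsupervised, so it can summarise the structure of unlabelled traffic, while the SVM supplies the supervised decision boundary; whether this matches the paper's actual coupling is not stated in the abstract.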

Directed adversarial sampling attacks on phishing detection

Journal of Computer Security ◽

10.3233/jcs-191411 ◽

2021 ◽

Vol 29 (1) ◽

pp. 1-23

Author(s):

Hossein Shirazi ◽

Bruhadeshwar Bezawada ◽

Indrakshi Ray ◽

Chuck Anderson

Keyword(s):

Machine Learning ◽

Credit Card ◽

Personal Information ◽

Success Probability ◽

Training Dataset ◽

Sensitive Information ◽

Learning Approaches ◽

Adversarial Learning ◽

The Face ◽

Phishing Detection

Phishing websites trick honest users into believing that they are interacting with a legitimate website, and capture sensitive information such as user names, passwords, credit card numbers, and other personal information. Machine learning is a promising technique to distinguish between phishing and legitimate websites. However, machine learning approaches are susceptible to adversarial learning attacks, in which a phishing sample can bypass classifiers. Our experiments on publicly available datasets reveal that phishing detection mechanisms are vulnerable to adversarial learning attacks. We investigate the robustness of machine learning-based phishing detection in the face of adversarial learning attacks. We propose a practical approach to simulate such attacks by generating adversarial samples through direct feature manipulation. To enhance a sample's success probability, we describe a clustering approach that guides an attacker to select the phishing samples most likely to bypass the classifier by appearing as legitimate samples. We define the notion of vulnerability level for each dataset, which measures the number of features that can be manipulated and the cost of such manipulation. Further, we clustered phishing samples and showed that some clusters of samples are more likely to exhibit higher vulnerability levels than others. This helps an adversary identify the best phishing samples from which to generate adversarial samples at a lower cost. Our findings can be used to refine the dataset and develop better learning models that compensate for the weak samples in the training dataset.
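
Direct feature manipulation under a cost budget can be illustrated against a simple target. The sketch below assumes a hypothetical linear classifier over binary features (score above 0 means phishing), a set of features the attacker is able to change, and a per-feature manipulation cost. It greedily flips the features with the best score reduction per unit cost until the sample is classified as legitimate. This is a generic illustration of the idea, not the authors' clustering-guided procedure.

```python
def craft_adversarial(sample, weights, bias, manipulable, cost, budget):
    """Greedily flip manipulable binary features of a phishing sample until the
    linear score (bias + w . x, > 0 means phishing) drops below 0.

    Returns (adversarial_sample, total_cost), or None if the budget runs out first.
    """
    x = list(sample)
    score = bias + sum(w * v for w, v in zip(weights, x))
    spent = 0.0

    def gain(i):
        # Score decrease from flipping feature i: w_i if x_i == 1, -w_i if x_i == 0.
        return weights[i] * (2 * x[i] - 1)

    # Most score reduction per unit of manipulation cost first.
    candidates = sorted(manipulable, key=lambda i: gain(i) / cost[i], reverse=True)
    for i in candidates:
        if score < 0:
            break
        g = gain(i)
        if g <= 0 or spent + cost[i] > budget:
            continue  # flipping would not help, or is too expensive
        x[i] = 1 - x[i]
        score -= g
        spent += cost[i]
    return (x, spent) if score < 0 else None


weights = [2.0, 1.5, -1.0, 0.5]  # hypothetical trained linear model
print(craft_adversarial([1, 1, 0, 1], weights, -1.2, [1, 2, 3], [1.0] * 4, budget=3.0))
print(craft_adversarial([1, 1, 0, 1], weights, -1.2, [1, 2, 3], [1.0] * 4, budget=2.0))
```

Counting how many features must be flipped, and at what total cost, gives a rough per-sample analogue of the vulnerability level described in the abstract.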

A Study: Machine Learning and Deep Learning Approaches for Intrusion Detection System

Second International Conference on Computer Networks and Communication Technologies - Lecture Notes on Data Engineering and Communications Technologies ◽

10.1007/978-3-030-37051-0_94 ◽

2020 ◽

pp. 845-849

Author(s):

C. H. Sekhar ◽

K. Venkata Rao

Keyword(s):

Machine Learning ◽

Deep Learning ◽

Intrusion Detection ◽

Intrusion Detection System ◽

Detection System ◽

Learning Approaches

Attack and Anomaly Detection in IoT Networks Using Supervised Machine Learning Approaches

Revue d'Intelligence Artificielle ◽

10.18280/ria.350102 ◽

2021 ◽

Vol 35 (1) ◽

pp. 11-21

Author(s):

Himani Tyagi ◽

Rajendra Kumar

Keyword(s):

Machine Learning ◽

Performance Metrics ◽

Detection System ◽

Feature Reduction ◽

Machine Learning Algorithms ◽

Supervised Machine Learning ◽

Testing Time ◽

Learning Approaches ◽

Reduction Techniques ◽

Share Data

IoT is characterized by communication between things (devices) that constantly share data, analyze it, and make decisions while connected to the internet. This interconnected architecture attracts cyber criminals seeking to expose the IoT system to failure. Therefore, it is imperative to develop a system that can accurately and automatically detect anomalies and attacks occurring in IoT networks. In this paper, an Intrusion Detection System (IDS) based on a novel feature set extracted from the BoT-IoT dataset is developed that can swiftly, accurately and automatically differentiate benign and malicious traffic. Instead of using available feature reduction techniques such as PCA, which can change the core meaning of variables, a unique feature set consisting of only seven lightweight features is developed that is IoT specific and independent of attack traffic. The results also demonstrate the effectiveness of the seven engineered features in detecting four widely varying attacks, namely DDoS, DoS, Reconnaissance, and Information Theft. Furthermore, this study demonstrates the applicability and efficiency of supervised machine learning algorithms (KNN, LR, SVM, MLP, DT, RF) in IoT security. The performance of the proposed system is validated using performance metrics such as accuracy, precision, recall, F-score and ROC. Although the Decision Tree and Random Forest classifiers achieve the same accuracy (99.9%), other metrics such as training and testing time show Random Forest to be comparatively better.
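
The evaluation metrics named above follow directly from the confusion matrix. The sketch below computes accuracy, precision, recall and F-score for binary labels, assuming 1 marks malicious traffic; ROC analysis, which needs continuous scores rather than hard labels, is not shown. The function name and example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F-score from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


# 1 = malicious flow, 0 = benign flow (hypothetical predictions).
print(classification_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

On heavily imbalanced IoT traffic, precision and recall on the attack class are more informative than accuracy alone, which is one reason studies such as this report all of them.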

FedPARL: Client Activity and Resource-Oriented Lightweight Federated Learning Model for Resource-Constrained Heterogeneous IoT Environment

Frontiers in Communications and Networks ◽

10.3389/frcmn.2021.657653 ◽

2021 ◽

Vol 2 ◽

Author(s):

Ahmed Imteaj ◽

M. Hadi Amini

Keyword(s):

Machine Learning ◽

Resource Availability ◽

Resource Constraints ◽

Training Model ◽

Convergence Time ◽

Battery Life ◽

Sensitive Information ◽

Learning Approaches ◽

Resource Constrained ◽

Distributed Machine Learning

Federated Learning (FL) is a recently invented distributed machine learning technique that allows available network clients to perform model training at the edge, rather than sharing their data with a centralized server. Unlike conventional distributed machine learning approaches, the hallmark feature of FL is that local computation and model generation are performed on the client side, ultimately protecting sensitive information. Most existing FL approaches assume that each FL client has sufficient computational resources and can accomplish a given task without facing any resource-related issues. However, if we consider FL in a heterogeneous Internet of Things (IoT) environment, a major portion of the FL clients may face low resource availability (e.g., lower computational power, limited bandwidth, and limited battery life). Consequently, resource-constrained FL clients may respond very slowly, or may be unable to execute the expected number of local iterations. Further, any FL client can inject an inappropriate model during a training phase, which can prolong convergence time and waste the resources of all network clients. In this paper, we propose a novel tri-layer FL scheme, Federated Proximal, Activity and Resource-Aware Lightweight model (FedPARL), that reduces model size by performing sample-based pruning, avoids misbehaving clients by examining their trust scores, and allows partial work by considering each client's resource availability. The pruning mechanism is particularly useful when dealing with resource-constrained FL-based IoT (FL-IoT) clients. In this scenario, the lightweight training model consumes fewer resources to reach a target convergence. We evaluate each interested client's resource availability before assigning a task, monitor their activities, and update their trust scores based on their previous performance.
To tackle system and statistical heterogeneities, we adapt a re-parameterization and generalization of the current state-of-the-art Federated Averaging (FedAvg) algorithm. The modified FedAvg algorithm allows clients to perform variable or partial amounts of work considering their resource constraints. We demonstrate that coupling pruning, resource and activity awareness, and the re-parameterization of the FedAvg algorithm leads to more robust convergence of FL in IoT environments.
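
Three of the mechanisms described above (trust-based client filtering, resource-proportional partial work, and a proximal term that keeps local models near the global one) can be sketched in a few functions. Everything here is an assumption for illustration: the thresholds, the exponential-moving-average trust update, and names such as `local_update` are hypothetical, the proximal objective follows the general FedProx form f_k(w) + (mu/2)||w - w_global||^2, and sample-based pruning is not shown. This is not FedPARL's published algorithm.

```python
def eligible(clients, min_trust=0.5, min_resource=0.1):
    """Keep clients whose trust score and resource availability pass the thresholds.

    `clients` maps client id -> (trust_score, resource_availability in [0, 1]).
    """
    return [cid for cid, (trust, resource) in clients.items()
            if trust >= min_trust and resource >= min_resource]


def local_update(w_global, grad_fn, resource, max_steps=10, lr=0.1, mu=0.5):
    """Proximal local training with partial work: a client runs a number of
    gradient steps proportional to its resource availability, minimising
    f_k(w) + (mu / 2) * ||w - w_global||^2."""
    steps = max(1, int(max_steps * resource))
    w = list(w_global)
    for _ in range(steps):
        g = grad_fn(w)
        w = [wi - lr * (gi + mu * (wi - wg)) for wi, gi, wg in zip(w, g, w_global)]
    return w, steps


def update_trust(trust, completed, alpha=0.3):
    """Exponential moving average of whether the client delivered its assigned work."""
    return (1 - alpha) * trust + alpha * (1.0 if completed else 0.0)


clients = {"a": (0.9, 0.8), "b": (0.2, 0.9), "c": (0.7, 0.05)}
print(eligible(clients))  # ['a']
# Toy local loss f(w) = 0.5 * (w - 4)^2, so grad f(w) = w - 4.
print(local_update([0.0], lambda w: [w[0] - 4.0], resource=0.5))
```

The proximal term pulls each local model back towards the global parameters, which limits drift when clients run different numbers of steps; with the toy loss the local optimum becomes 4 / (1 + mu) instead of 4.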

Effective Intrusion Detection System by using LOS Classifier

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.b7396.129219 ◽

2019 ◽

Vol 9 (2) ◽

pp. 2961-2966

Keyword(s):

Machine Learning ◽

Cloud Computing ◽

Intrusion Detection ◽

Social Networking ◽

Intrusion Detection System ◽

Detection System ◽

Learning Approaches ◽

Area Unit ◽

Hybrid Development

With prevailing advances such as the Internet of Things, Cloud Computing and Social Networking, massive amounts of network traffic data are being generated. An Intrusion Detection System for network security is a method of detecting unauthorized access in network traffic. For Intrusion Detection Systems, we focus on Machine Learning approaches. Machine learning is an emerging field of computing that can operate with far less human involvement: the system learns from data and makes appropriate decisions. In this paper, we examine different types of Machine Learning algorithms and carry out a comparative analysis of them. Finally, we propose the idea of a hybrid approach that combines host-based and network-based Intrusion Detection Systems.
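
The abstract only names the idea of combining host-based and network-based detection. A minimal, hypothetical way to fuse the two is score-level fusion: each detector outputs an anomaly score in [0, 1], either one can raise an alert on its own when it is very confident, and otherwise a weighted average is compared with a threshold. The weights, thresholds and function name below are illustrative assumptions, not values from the paper.

```python
def hybrid_alert(host_score, network_score, host_weight=0.5, threshold=0.6, veto=0.95):
    """Fuse anomaly scores in [0, 1] from a host-based and a network-based detector.

    Either detector alone raises an alert if its score reaches `veto`; otherwise
    a weighted average of the two scores is compared against `threshold`.
    """
    if max(host_score, network_score) >= veto:
        return True
    fused = host_weight * host_score + (1 - host_weight) * network_score
    return fused >= threshold


print(hybrid_alert(0.97, 0.10))  # True: the host detector alone is very confident
print(hybrid_alert(0.70, 0.60))  # True: fused score 0.65 exceeds the threshold
print(hybrid_alert(0.40, 0.50))  # False: fused score 0.45 stays below it
```

The veto rule keeps attacks visible to only one vantage point (for example, a local privilege escalation that produces no unusual traffic) from being averaged away.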

Towards a Multi-Layered Phishing Detection

Sensors ◽

10.3390/s20164540 ◽

2020 ◽

Vol 20 (16) ◽

pp. 4540

Author(s):

Kieran Rendall ◽

Antonia Nisioti ◽

Alexios Mylonas

Keyword(s):

Machine Learning ◽

State Of The Art ◽

Detection System ◽

Single Layer ◽

Supervised Machine Learning ◽

Data Driven ◽

Feature Sets ◽

Phishing Attacks ◽

Production Environments ◽

Phishing Detection

Phishing is one of the most common threats that users face while browsing the web. In the current threat landscape, a targeted phishing attack (i.e., spear phishing) often constitutes the first action of a threat actor during an intrusion campaign. To tackle this threat, many data-driven approaches have been proposed, which mostly rely on the use of supervised machine learning under a single-layer approach. However, such approaches are resource-demanding and, thus, their deployment in production environments is infeasible. Moreover, most previous works utilise a feature set that can be easily tampered with by adversaries. In this paper, we investigate the use of a multi-layered detection framework in which a potential phishing domain is classified multiple times by models using different feature sets. In our work, an additional classification takes place only when the initial one scores below a predefined confidence level, which is set by the system owner. We demonstrate our approach by implementing a two-layered detection system, which uses supervised machine learning to identify phishing attacks. We evaluate our system with a dataset consisting of active phishing attacks and find that its performance is comparable to the state of the art.
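
The escalation rule described above, where a further classification runs only when the current layer's confidence falls below an owner-defined threshold, can be sketched as a simple cascade. The two layers in the usage example are hypothetical stand-ins (a cheap lexical check and a costlier content check); each layer is any callable returning a `(label, confidence)` pair.

```python
def multi_layer_classify(domain, layers, confidence_threshold):
    """Run layers in order, escalating only while confidence is below the threshold.

    Returns (label, confidence, number_of_layers_used); the last layer's verdict
    is returned even if it is still below the threshold.
    """
    if not layers:
        raise ValueError("at least one layer is required")
    for depth, layer in enumerate(layers, start=1):
        label, confidence = layer(domain)
        if confidence >= confidence_threshold:
            break
    return label, confidence, depth


def lexical_layer(domain):
    """Hypothetical cheap first layer: many hyphens look suspicious."""
    return ("phishing", 0.95) if domain.count("-") >= 2 else ("legitimate", 0.6)


def content_layer(domain):
    """Hypothetical expensive second layer (e.g. fetching and analysing the page)."""
    return ("legitimate", 0.9)


layers = [lexical_layer, content_layer]
print(multi_layer_classify("paypal-secure-login.xyz", layers, 0.8))  # ('phishing', 0.95, 1)
print(multi_layer_classify("example.com", layers, 0.8))  # ('legitimate', 0.9, 2)
```

Because most domains are settled by the cheap first layer, the expensive layer runs only on uncertain cases, which is how a multi-layered design can cut the resource cost that the abstract identifies as a barrier to production deployment.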
