PHISHING DETECTION FROM URLS BY USING NEURAL NETWORKS

Author(s): Ozgur Koray Sahingoz, Saide Işılay Baykal, Deniz Bulut

The objective of this work is to apply neural networks to phishing email detection and to evaluate the effectiveness of this approach. We design the feature set, process the phishing dataset, and implement the neural network systems. We then compare their performance against that of other major machine learning techniques: decision trees (DT), k-nearest neighbors, naive Bayes (NB), and support vector machines (SVM). The same dataset and feature set are used in the comparison. From the statistical analysis, we conclude that neural networks with an appropriate number of hidden units can achieve satisfactory accuracy even when training examples are scarce. Moreover, our feature selection is effective in capturing the characteristics of phishing emails, as most machine learning algorithms yield reasonable results with it.


Phishing attacks have risen by 209% in the last 10 years, according to Anti-Phishing Working Group (APWG) statistics [19]. Machine learning is commonly used to detect phishing attacks. Researchers have traditionally judged phishing detection models by either accuracy or F1-score. In this paper, however, we argue that a single metric alone will never correlate with a successful deployment of a machine learning phishing detection model. This is because every machine learning model has an inherent trade-off between its False Positive Rate (FPR) and False Negative Rate (FNR). Tuning this trade-off is important, since a higher or lower FPR/FNR affects the user acceptance rate of any deployment of a phishing detection model. A model with a high FPR tends to block users from accessing legitimate webpages, whereas a model with a high FNR allows users to inadvertently access phishing webpages. At either extreme, the user base may complain (because of blocked pages) or fall victim to phishing attacks. Phishing detection models should therefore be tuned to the security needs of the deployment (secure vs. relaxed setting). In this paper, we demonstrate two effective techniques for tuning the trade-off between FPR and FNR: varying the class distribution of the training data and adjusting the probabilistic prediction threshold. We demonstrate both techniques on a data set of 50,000 phishing and 50,000 legitimate sites, running all experiments with three common machine learning algorithms: Random Forest, Logistic Regression, and Neural Networks. Using our techniques, we are able to regulate a model's FPR/FNR. Among the three algorithms we used, Neural Networks performed best, achieving an F1-score of 0.98 with corresponding FPR and FNR values of 0.0003 and 0.0198, respectively.
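
The two tuning techniques named in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names `fpr_fnr` and `rebalance`, the undersampling strategy, and the toy labels and scores are all assumptions made for illustration. The first function shows how moving the prediction threshold shifts errors between FPR and FNR; the second shows one simple way to change the class distribution of a training set before fitting a model.

```python
import random


def fpr_fnr(labels, scores, threshold):
    """Return (FPR, FNR) when items scoring >= threshold are flagged as phishing.

    labels: 1 = phishing, 0 = legitimate.
    scores: a model's predicted phishing probability for each item.
    """
    pairs = list(zip(labels, scores))
    false_pos = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    false_neg = sum(1 for y, s in pairs if y == 1 and s < threshold)
    return false_pos / labels.count(0), false_neg / labels.count(1)


def rebalance(items, labels, phishing_fraction, seed=0):
    """Undersample one class so phishing is about phishing_fraction of the result.

    Assumes 0 < phishing_fraction < 1. Undersampling is only one possible
    way to vary the class distribution; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    phish = [x for x, y in zip(items, labels) if y == 1]
    legit = [x for x, y in zip(items, labels) if y == 0]
    want_legit = round(len(phish) * (1 - phishing_fraction) / phishing_fraction)
    if want_legit <= len(legit):
        legit = rng.sample(legit, want_legit)
    else:
        want_phish = round(len(legit) * phishing_fraction / (1 - phishing_fraction))
        phish = rng.sample(phish, min(want_phish, len(phish)))
    data = [(x, 1) for x in phish] + [(x, 0) for x in legit]
    rng.shuffle(data)
    return data


# Toy data (invented): four legitimate and four phishing items.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.05, 0.20, 0.40, 0.70, 0.30, 0.60, 0.80, 0.95]

for threshold in (0.25, 0.50, 0.75):
    fpr, fnr = fpr_fnr(labels, scores, threshold)
    print(f"threshold={threshold:.2f}  FPR={fpr:.2f}  FNR={fnr:.2f}")
```

In this sketch a lower threshold flags more items, which reduces FNR at the cost of FPR and would suit a stricter security setting. A higher threshold does the opposite. Training on a set produced by `rebalance` with a larger `phishing_fraction` would be expected to push a model in the same direction as lowering the threshold, although the size of that effect depends on the model.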

