Phishing websites blacklisting using machine learning algorithms

The growth of phishing sites is striking. Although web users are aware of phishing attacks, many still fall victim to them. Numerous attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, and phishing is one of them. Phishing keeps growing because it is easy to copy an entire website using its HTML source code. By making slight changes to the source code, an attacker can direct the victim to the phishing site. Phishers use many techniques to lure unsuspecting web users. Consequently, an efficient mechanism is required to distinguish phishing sites from legitimate sites in order to protect credential data. To detect phishing websites and identify them as information-leaking sites, the system proposes data mining algorithms. In this paper, machine-learning algorithms are used to model the prediction task. The processes of identity extraction and feature extraction are discussed, and the experiments carried out to evaluate the performance of the models are presented.

Benchmarking Data Mining Algorithms

Data Warehousing and Web Engineering ◽

10.4018/978-1-931777-02-5.ch003 ◽

2011 ◽

pp. 77-99

Author(s):

Balaji Rajagopalan ◽

Ravi Krovi

Keyword(s):

Machine Learning ◽

Data Mining ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Successful Implementation ◽

Basic Premise ◽

Data Mining Algorithms ◽

External Data ◽

Mining Algorithms ◽

Careful Assessment

Data mining is the process of sifting through the mass of organizational (internal and external) data to identify patterns critical for decision support. Successful implementation of a data mining effort requires a careful assessment of the various tools and algorithms available. The basic premise of this study is that machine-learning algorithms, which are assumption-free, should outperform their traditional counterparts when mining business databases. The objective of this study is to test this proposition by investigating the performance of the algorithms in several scenarios. The scenarios are based on simulations designed to reflect the extent to which typical statistical assumptions are violated in the business domain. The results of the computational experiments support the proposition that machine learning algorithms generally outperform their statistical counterparts under certain conditions. These results can be used as prescriptive guidelines for the applicability of data mining techniques.

Dr. Phish: Phishing Website Detector

E3S Web of Conferences ◽

10.1051/e3sconf/202129701032 ◽

2021 ◽

Vol 297 ◽

pp. 01032

Author(s):

Harish Kumar ◽

Anshal Prasad ◽

Ninad Rane ◽

Nilay Tamane ◽

Anjali Yeole

Keyword(s):

Machine Learning ◽

Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Cyber Crime ◽

Data Mining Algorithms ◽

Learning Techniques ◽

Mining Algorithms ◽

Host Properties ◽

New Strategies

Phishing is a common attack that tricks credulous people into disclosing their personal information. It is a type of cyber-crime in which fake sites lure victims into giving up sensitive data. This paper deals with methods for detecting phishing websites by analyzing various features of URLs with machine learning techniques. It discusses the methods used for detection of phishing websites based on lexical features, host properties and page importance properties. We consider various data mining algorithms for evaluation of the features in order to get a better understanding of the structure of URLs that spread phishing. To protect end users from visiting these sites, we can try to identify phishing URLs by analyzing their lexical and host-based features. A particular challenge in this domain is that criminals are constantly devising new strategies to counter our defense measures. To succeed in this contest, we need machine learning algorithms that continually adapt to new examples and features of phishing URLs.
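
As a rough illustration of what "lexical features" of a URL can look like, the sketch below derives a handful of them with the Python standard library. The feature set, the function name lexical_features, and the example URLs are assumptions made for illustration; the abstract does not list the features the authors actually used.

```python
import re
from urllib.parse import urlparse

# Matches a dotted-quad IPv4 host such as 192.168.0.1 (illustrative only;
# it does not check that each octet is <= 255).
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def lexical_features(url: str) -> dict:
    """Return a few simple lexical features of a URL (hypothetical feature set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_dots": host.count("."),          # many dots can mean nested subdomains
        "host_hyphens": host.count("-"),
        "has_at_symbol": "@" in url,           # "@" can hide the real host
        "host_is_ip": bool(_IPV4.match(host)),
        "uses_https": parsed.scheme == "https",
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }


if __name__ == "__main__":
    print(lexical_features("http://192.168.0.1/login/verify.php"))
```

A dictionary like this would typically be turned into a numeric vector and passed to whichever classifier is being evaluated; host-based and page-importance features would need external lookups and are not covered here.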

Feature Extraction with Machine Learning and Data Mining Algorithms

Multisensor Data Fusion and Machine Learning for Environmental Remote Sensing ◽

10.1201/9781315154602-7 ◽

2018 ◽

pp. 127-163

Author(s):

Ni-Bin Chang ◽

Kaixu Bai

Keyword(s):

Machine Learning ◽

Data Mining ◽

Feature Extraction ◽

Data Mining Algorithms ◽

Mining Algorithms

Benchmarking data mining approaches for traveler segmentation

International Journal of Electrical and Computer Engineering (IJECE) ◽

10.11591/ijece.v11i1.pp409-415 ◽

2021 ◽

Vol 11 (1) ◽

pp. 409

Author(s):

Tamer Uçar ◽

Adem Karahoca

Keyword(s):

Machine Learning ◽

Data Mining ◽

Machine Learning Algorithms ◽

Travel Agency ◽

Data Set ◽

Data Mining Algorithms ◽

Travel Agencies ◽

User Data ◽

Hybrid Data ◽

Mining Algorithms

The purpose of this study is to propose a hybrid data mining solution for traveler segmentation in the tourism domain, which can be used for planning user-oriented trips, arranging travel campaigns or similar services. The data set used in this work was provided by a travel agency and contains flight and hotel bookings of travelers. Initially, the data set was prepared for running data mining algorithms. Then, various machine learning algorithms were benchmarked for performing accurate traveler segmentation and prediction tasks. Fuzzy C-means and X-means algorithms were applied for clustering user data. J48 and multilayer perceptron (MLP) algorithms were applied for classifying instances based on the segmented user data. According to the findings of this study, J48 gives the most effective classification results when applied to the data set clustered with the X-means algorithm. The proposed hybrid data mining solution can be used by travel agencies to plan trip campaigns for similar travelers.
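
For readers unfamiliar with Fuzzy C-means, the sketch below shows only its membership step: given fixed cluster centers, each point receives a degree of membership in every cluster, and the degrees sum to 1. It uses the standard membership formula with fuzzifier m; the one-dimensional data, the function name fuzzy_memberships, and the example values are assumptions for illustration and are not taken from the study.

```python
def fuzzy_memberships(point: float, centers: list, m: float = 2.0) -> list:
    """Membership of a 1-D point in each cluster, for fixed centers.

    Standard Fuzzy C-means membership with fuzzifier m > 1:
    u_i = 1 / sum_c (d_i / d_c) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    dists = [abs(point - c) for c in centers]
    # A point that coincides with a center belongs fully to that center
    # (shared equally if several centers coincide with it).
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_c) ** exponent for d_c in dists) for d_i in dists]


if __name__ == "__main__":
    # Point at 1.0, hypothetical centers at 0.0 and 4.0
    print(fuzzy_memberships(1.0, [0.0, 4.0]))
```

A full Fuzzy C-means run alternates this step with re-estimating the centers from the memberships until they stop changing; that loop is omitted here.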

Performance Analysis of Machine Learning Algorithms and Feature Extraction Methods for Sentiment Analysis

10.1109/icses52305.2021.9633882 ◽

2021 ◽

Author(s):

Anshumaan Chauhan ◽

Ayushi Agarwal ◽

Razia Sulthana

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Performance Analysis ◽

Sentiment Analysis ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms

Comparison of Machine Learning Algorithms to Recognize Human Activities from Images and Videos Using Pose Estimation and Feature Extraction

Proceedings of the Future Technologies Conference (FTC) 2020, Volume 1 - Advances in Intelligent Systems and Computing ◽

10.1007/978-3-030-63128-4_7 ◽

2020 ◽

pp. 78-87

Author(s):

Md Hasibul Huq ◽

Mohammed Alnakli ◽

Zakiya Jafrin ◽

Tanjima Nasreen Jenia

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Pose Estimation ◽

Human Activities ◽

Learning Algorithms ◽

Machine Learning Algorithms

Predicting Student Failure in University Examination using Machine Learning Algorithms

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.e2643.039520 ◽

2020 ◽

Vol 9 (5) ◽

pp. 956-959

Keyword(s):

Machine Learning ◽

Data Mining ◽

Performance Management ◽

Student Performance ◽

Learning Algorithms ◽

Educational Data Mining ◽

Machine Learning Algorithms ◽

Machine Learning Techniques ◽

Social Characteristics ◽

Student Failure

Student performance management is one of the key pillars of higher education institutions, since it directly impacts students' career prospects and college rankings. This paper follows the path of learning analytics and educational data mining by applying machine learning techniques to student data to identify students who are more likely to fail university examinations, so that the needed interventions can be provided to improve student performance. The paper uses a data mining approach with 10-fold cross-validation to classify students based on predictors that are demographic and social characteristics of the students. It compares five popular machine learning algorithms (Rep Tree, Jrip, Random Forest, Random Tree, and Naive Bayes) based on overall classifier accuracy as well as class-specific indicators, i.e. precision, recall, and F-measure. The results showed that the Rep Tree algorithm outperformed the other machine learning algorithms in classifying students who are more likely to fail the examinations.
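
The evaluation protocol described above (k-fold cross-validation plus class-specific precision, recall, and F-measure) can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: the function names, the "fail"/"pass" labels, and the contiguous, unshuffled fold layout are illustrative choices, not the paper's actual setup.

```python
def k_fold_indices(n_samples: int, k: int = 10):
    """Yield (train_idx, test_idx) for k contiguous folds of near-equal size.

    No shuffling or stratification is done here; real experiments usually
    shuffle (and often stratify by class) before splitting.
    """
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class (the `positive` label)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    y_true = ["fail", "fail", "pass", "pass", "fail"]
    y_pred = ["fail", "pass", "pass", "fail", "fail"]
    print(precision_recall_f1(y_true, y_pred, positive="fail"))
```

In a comparison like the one in the abstract, each classifier would be trained on every train_idx split, scored on the matching test_idx split, and the per-fold metrics for the "fail" class averaged.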

Lead-based virtual screening and prediction of EGFR inhibitors using PubChem’s database with data mining and machine learning algorithms

10.1021/scimeetings.0c03836 ◽

2020 ◽

Cited By ~ 1

Author(s):

Kedan He

Keyword(s):

Machine Learning ◽

Data Mining ◽

Virtual Screening ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Egfr Inhibitors

Abnormal Behavior Detection: A Comparative Study of Machine Learning Algorithms Using Feature Extraction and a Fully Labeled Dataset

2019 International Conference on Information Systems and Computer Science (INCISCOS) ◽

10.1109/inciscos49368.2019.00019 ◽

2019 ◽

Author(s):

Mateo Hervas ◽

Christian Fernandez-Medina ◽

Pedro Shiguihara-Juarez ◽

Ricardo Gonzalez-Valenzuela

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Comparative Study ◽

Learning Algorithms ◽

Machine Learning Algorithms ◽

Abnormal Behavior ◽

Behavior Detection ◽

Abnormal Behavior Detection

Amino Acid k-mer Feature Extraction for Quantitative Antimicrobial Resistance (AMR) Prediction by Machine Learning and Model Interpretation for Biological Insights

Biology ◽

10.3390/biology9110365 ◽

2020 ◽

Vol 9 (11) ◽

pp. 365

Author(s):

Taha ValizadehAslani ◽

Zhengqiao Zhao ◽

Bahrad A. Sokhansanj ◽

Gail L. Rosen

Keyword(s):

Machine Learning ◽

Feature Extraction ◽

Amino Acid ◽

Computational Complexity ◽

Antimicrobial Resistance ◽

Learning Algorithms ◽

Extraction Methods ◽

Machine Learning Algorithms ◽

Model Interpretation ◽

New Feature

Machine learning algorithms can learn mechanisms of antimicrobial resistance from DNA sequence data without any a priori information. Interpreting a trained machine learning algorithm can be exploited for validating the model and obtaining new information about resistance mechanisms. Different feature extraction methods, such as SNP calling and counting nucleotide k-mers, have been proposed for presenting DNA sequences to the model. However, there are trade-offs between interpretability, computational complexity and accuracy for different feature extraction methods. In this study, we propose a new feature extraction method, counting amino acid k-mers or oligopeptides, which provides easier model interpretation than counting nucleotide k-mers and reaches the same or even better accuracy in comparison with other methods. Additionally, we have trained machine learning algorithms using different feature extraction methods and compared the results in terms of accuracy, model interpretability and computational complexity. We have built a new feature selection pipeline for extraction of important features so that new AMR determinants can be discovered by analyzing these features. This pipeline allows the construction of models that use only a small number of features and can predict resistance accurately.
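
To make the idea of "counting amino acid k-mers" concrete, the sketch below counts overlapping k-mers in a protein sequence given as a string of one-letter amino acid codes. The function name count_kmers and the toy sequence are assumptions for illustration; translating DNA to protein and the authors' feature selection pipeline are outside its scope.

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count overlapping k-mers (substrings of length k) in a sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence shorter than k yields an empty range and an empty Counter.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


if __name__ == "__main__":
    # Toy protein fragment (hypothetical), one-letter amino acid codes
    print(count_kmers("MKTMKT", 3))
```

Each distinct k-mer becomes one feature, and its count in a given genome's proteins becomes that feature's value, which is what makes an important feature directly readable as a short peptide.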
