Comparison of Classification Algorithms to Detect Phishing Web Pages Using Feature Selection and Extraction

Phishing is a kind of e-commerce lure that tries to steal a web user's confidential information by creating a near-identical copy of a legitimate website, in which the content and images remain almost the same apart from small changes. Another approach is to make minor changes to the URL or domain of the legitimate website. This paper discusses a number of anti-phishing toolbars and proposes a system model to counter phishing attacks. The proposed anti-phishing system is built as a plug-in for the web browser. Its performance is studied with three data mining classification algorithms: Random Forest, Nearest Neighbour Classification (NNC), and Bayesian Classifier (BC). To evaluate the system's detection of phishing websites, 7690 legitimate websites and 2280 phishing websites were collected from authoritative sources such as the APWG database and PhishTank. The analysis of these algorithms on phishing web pages finds that the Bayesian algorithm responds faster and gives more accurate results than the other algorithms.
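
The "minor changes in the URL or domain" mentioned above can be illustrated with a small look-alike check. This is only a sketch under stated assumptions and not the feature set used in the paper: the function names, the list of trusted domains, and the distance threshold of 2 are all hypothetical. It flags a domain whose Levenshtein edit distance to a trusted domain is small but non-zero.

```python
def edit_distance(a, b):
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def looks_like(domain, trusted, max_distance=2):
    """Return the trusted domain that `domain` closely imitates, or None.

    `trusted` and `max_distance` are illustrative assumptions.
    An exact match (distance 0) is the legitimate site and is not flagged.
    """
    domain = domain.lower()
    for legit in trusted:
        distance = edit_distance(domain, legit.lower())
        if 0 < distance <= max_distance:
            return legit
    return None


if __name__ == "__main__":
    trusted = ["paypal.com"]                  # hypothetical whitelist
    print(looks_like("paypa1.com", trusted))  # one substituted character
    print(looks_like("paypal.com", trusted))  # the legitimate domain itself
```

A check like this would be one input feature among many; on its own it cannot separate phishing pages from legitimate ones.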

Phishing websites blacklisting using machine learning algorithms

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i1.7.10646 ◽ 2018 ◽ Vol 7 (1.7) ◽ pp. 179

Author(s): Nivedhitha G ◽ Carmel Mary Belinda M.J ◽ Rupavathy N

Keyword(s): Machine Learning ◽ Data Mining ◽ Feature Extraction ◽ Learning Algorithms ◽ Source Code ◽ Machine Learning Algorithms ◽ Paper Machine ◽ Data Mining Algorithms ◽ Mining Algorithms ◽ The Web

The growth of phishing sites is striking. Even though web users are aware of these kinds of phishing attacks, many still fall victim to them. Many attacks are launched with the aim of making web users believe that they are communicating with a trusted entity, and phishing is one of them. Phishing keeps growing because it is easy to copy an entire site using its HTML source code. By making slight changes to the source code, an attacker can direct the victim to the phishing site. Phishers use many techniques to lure unsuspecting web users. An efficient mechanism is therefore required to distinguish phishing sites from genuine sites in order to protect credential data. To detect phishing websites and flag them as information-leaking sites, the system applies data mining algorithms. In this paper, machine learning algorithms are used to model the prediction task. The processes of identity extraction and feature extraction are discussed, and the experiments carried out to measure the performance of the models are presented.
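
As an illustration of feature extraction from HTML source, the sketch below computes one commonly used signal: the share of absolute link and form targets that point to a host other than the one serving the page. A copied page often keeps links to the original site. The abstract does not list the paper's actual features, so the class name, the function name, and the choice of this feature are assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the targets of <a href=...> and <form action=...> tags."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.targets.append(attrs["href"])
        elif tag == "form" and attrs.get("action"):
            self.targets.append(attrs["action"])


def foreign_link_ratio(html, page_url):
    """Fraction of absolute link/form targets whose host differs from the page host.

    Relative targets carry no host and are ignored. Returns 0.0 when the
    page has no absolute targets at all.
    """
    page_host = urlparse(page_url).hostname
    collector = LinkCollector()
    collector.feed(html)
    hosts = [urlparse(target).hostname for target in collector.targets]
    absolute = [host for host in hosts if host is not None]
    if not absolute:
        return 0.0
    foreign = sum(1 for host in absolute if host != page_host)
    return foreign / len(absolute)


if __name__ == "__main__":
    # Hypothetical page served from evil.test that still links to bank.example.
    sample = (
        '<a href="https://bank.example/login">Sign in</a>'
        '<form action="http://evil.test/steal"></form>'
        '<a href="/about">About</a>'
    )
    print(foreign_link_ratio(sample, "http://evil.test/index.html"))
```

The resulting number would be passed, together with other features, to whichever classifier is being evaluated.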

Multimedia Filtering Analysis of Massive Information Combined with Data Mining Algorithms

Advances in Multimedia ◽ 10.1155/2021/7461874 ◽ 2021 ◽ Vol 2021 ◽ pp. 1-7

Author(s): Bo Wang

Keyword(s): Data Mining ◽ Big Data ◽ Real Time ◽ System Model ◽ Information Presentation ◽ Packet Filtering ◽ Data Mining Algorithms ◽ Content Recognition ◽ Mining Algorithms ◽ Massive Information

With the advent of the big data era, the forms of information presentation have multiplied. Rich media such as audio and video carry more information, but they also bring a large amount of harmful information. To address this, the paper uses data mining algorithms to build a multimedia filtering system model for massive information, and combines content recognition, packet filtering, and other technologies to ensure the integrity and real-time performance of filtering. Practical results show that the method is effective.

A New Associative Classification Algorithm for Predicting Groundwater Locations

Journal of Information & Knowledge Management ◽ 10.1142/s0219649218500430 ◽ 2018 ◽ Vol 17 (04) ◽ pp. 1850043

Author(s): Faisal Aburub ◽ Wa’el Hadi

Keyword(s): Data Mining ◽ Groundwater Exploration ◽ Classification Algorithms ◽ Associative Classification ◽ Detection Problem ◽ Classification Rate ◽ Data Mining Algorithms ◽ New Locations ◽ Using Data ◽ Mining Algorithms

In this paper, we study the problem of predicting new locations of groundwater in Jordan by applying a proposed new method, Groundwater Prediction using Associative Classification (GwPAC). We identify features that differentiate locations of groundwater wells according to whether or not they contain water. In addition, we survey intelligent methods related to groundwater exploration and management. Three experimental analyses were conducted to evaluate the capability of data mining algorithms using real groundwater data from the Ministry of Water and Irrigation. In the first experiment, we compared the performance of GwPAC against three well-known associative classification algorithms, namely CBA, CMAR and FACA. In the second experiment, three rule-based algorithms (C4.5, Random Forest and PBC4cip) were investigated. In the third experiment, to generalise the capability of data mining for solving the groundwater detection problem, four benchmark algorithms (SVMs, NB, KNN and ANNs) were evaluated. Across all the experiments, the results indicated that all considered data mining algorithms predict locations of groundwater with an acceptable classification rate (all classification accuracies [Formula: see text]%), and can be useful methods for exploring new groundwater locations.
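
Associative classifiers such as CBA build on class association rules of the form "feature set implies class", ranked by support and confidence. The abstract does not describe how GwPAC works internally, so the sketch below shows only these two generic measures. The function name and the well features in the toy records are invented for illustration and do not come from the Ministry data.

```python
def rule_stats(records, antecedent, label):
    """Support and confidence of the class association rule antecedent -> label.

    records: list of (set_of_items, class_label) pairs.
    support    = share of all records that contain the antecedent AND have the label
    confidence = share of records containing the antecedent that have the label
    """
    antecedent = set(antecedent)
    covered = [cls for items, cls in records if antecedent <= items]
    matches = sum(1 for cls in covered if cls == label)
    support = matches / len(records)
    confidence = matches / len(covered) if covered else 0.0
    return support, confidence


if __name__ == "__main__":
    # Hypothetical well records: (observed features, outcome).
    wells = [
        ({"near_fault", "high_rainfall"}, "water"),
        ({"near_fault"}, "water"),
        ({"near_fault", "high_rainfall"}, "dry"),
        ({"high_rainfall"}, "dry"),
    ]
    support, confidence = rule_stats(wells, {"near_fault"}, "water")
    print(f"support={support:.2f} confidence={confidence:.2f}")
```

Rules that pass minimum support and confidence thresholds are then ordered and used to classify new records; the thresholds and the ordering strategy differ between algorithms.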

Data Mining Using $\mathcal{MLC}++$, a Machine Learning Library in C++

International Journal of Artificial Intelligence Tools ◽ 10.1142/s021821309700027x ◽ 1997 ◽ Vol 06 (04) ◽ pp. 537-566 ◽ Cited By ~ 69

Author(s): Ron Kohavi ◽ Dan Sommerfield ◽ James Dougherty

Keyword(s): Machine Learning ◽ Data Mining ◽ Pattern Recognition ◽ Statistical Analysis ◽ Classification Algorithms ◽ Pattern Recognition Techniques ◽ Data Mining Algorithms ◽ Multiple Classification ◽ Mining Algorithms ◽ New Algorithms

Data mining algorithms, including machine learning, statistical analysis, and pattern recognition techniques, can greatly improve our understanding of data warehouses, which are now becoming more widespread. In this paper, we focus on classification algorithms and review the need for multiple classification algorithms. We describe a system called $\mathcal{MLC}++$, which was designed to help choose the appropriate classification algorithm for a given dataset by making it easy to compare the utility of different algorithms on a specific dataset of interest. $\mathcal{MLC}++$ not only provides a workbench for such comparisons, but also provides a library of C++ classes to aid in the development of new algorithms, especially hybrid algorithms and multi-strategy algorithms. Such algorithms are generally hard to code from scratch. We discuss design issues, interfaces to other programs, and visualization of the resulting classifiers.

Big Data Analysis of Web Data Extraction

International Journal of Engineering & Technology ◽ 10.14419/ijet.v7i4.37.24095 ◽ 2018 ◽ Vol 7 (4.37) ◽ pp. 168

Author(s): Nadia Ibrahim ◽ Alaa Hassan ◽ Marwah Nihad

Keyword(s): Data Mining ◽ Data Analysis ◽ High Performance ◽ Data Extraction ◽ Large Data ◽ Heterogeneous Data ◽ Web Pages ◽ Target Domain ◽ Data Mining Algorithms ◽ The Web

This study covers large-scale data extraction techniques, including the detection of patterns and hidden relationships among numerous factors and the retrieval of the required information. Rapid analysis of massive data can lead to innovation and to concepts of theoretical value. Compared with mining traditional data sets, mining vast amounts of large, heterogeneous, interdependent data can expand knowledge and ideas about the target domain. In this research we studied data mining on the Internet. The various networks used to extract data from different locations can be complex, and web technology has been used for information extraction and data analysis (Marwah et al., 2016). We extracted information from a large number of web pages, examined the pages of each site using Java code, and added the extracted information to a dedicated database for the web page. We used the data network function to obtain accurate results when evaluating and categorizing the pages found, identifying web pages as trusted or risky, and exported the data to a CSV file. We then examined and categorized these data using WEKA to obtain accurate results. We concluded from the results that the applied data mining algorithms outperform other techniques in classification, data extraction, and performance.

Novel Adverse Events of Iloperidone: A Disproportionality Analysis in US Food and Drug Administration Adverse Event Reporting System (FAERS) Database

Current Drug Safety ◽ 10.2174/1574886313666181026100000 ◽ 2019 ◽ Vol 14 (1) ◽ pp. 21-26 ◽ Cited By ~ 2

Author(s): Viswam Subeesh ◽ Eswaran Maheswari ◽ Hemendra Singh ◽ Thomas Elsa Beulah ◽ Ann Mary Swaroop

Keyword(s): Data Mining ◽ Adverse Event ◽ Adverse Events ◽ Reporting System ◽ Adverse Event Reporting System ◽ Adverse Event Reporting ◽ Disproportionality Analysis ◽ Positive Signal ◽ Data Mining Algorithms ◽ Mining Algorithms

Background: The signal is defined as “reported information on a possible causal relationship between an adverse event and a drug, of which the relationship is unknown or incompletely documented previously”. Objective: To detect novel adverse events of iloperidone by disproportionality analysis in the FDA Adverse Event Reporting System (FAERS) database using Data Mining Algorithms (DMAs). Methodology: The US FAERS database contains 1028 iloperidone-associated Drug Event Combinations (DECs) reported from 2010 Q1 to 2016 Q3. We considered DECs for disproportionality analysis only if at least ten reports were present in the database for the given adverse event and the event had not been detected earlier (in clinical trials). Two data mining algorithms, namely Reporting Odds Ratio (ROR) and Information Component (IC), were applied retrospectively over this period. Values of ROR - 1.96SE > 1 and IC - 2SD > 0 were taken as the threshold for a positive signal. Results: The mean age of patients with iloperidone-associated events was 44 years [95% CI: 36-51]; age was not mentioned in twenty-one reports. The data mining algorithms showed a positive signal for akathisia (ROR - 1.96SE = 43.15, IC - 2SD = 2.99), dyskinesia (21.24, 3.06), peripheral oedema (6.67, 1.08), priapism (425.7, 9.09) and sexual dysfunction (26.6, 1.5), as these were well above the pre-set threshold. Conclusion: Data mining in the FDA AERS database generated five potential signals associated with iloperidone. The results call for further clinical surveillance to quantify and validate the possible risks of the adverse events reported for iloperidone.
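
The two disproportionality measures can be sketched from a 2x2 table of report counts. The code below uses the standard textbook definitions and rests on two assumptions: that "ROR - 1.96SE" denotes the lower limit of the 95% confidence interval computed on the log scale, and that a point estimate of IC is enough for illustration. The standard deviation needed for "IC - 2SD" comes from a Bayesian model that is not reproduced here. The function names and the counts in the example are hypothetical and are not taken from FAERS.

```python
import math


def reporting_odds_ratio(a, b, c, d):
    """ROR and the lower limit of its 95% confidence interval.

    a: reports with the drug and the event
    b: reports with the drug and other events
    c: reports with other drugs and the event
    d: reports with other drugs and other events
    """
    ror = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(ROR)
    lower = math.exp(math.log(ror) - 1.96 * se_log)
    return ror, lower


def information_component(a, b, c, d):
    """Point estimate of IC: log2 of the observed over the expected count."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # count expected if drug and event were independent
    return math.log2(a / expected)


if __name__ == "__main__":
    # Hypothetical counts for one drug-event combination.
    a, b, c, d = 20, 80, 100, 9800
    ror, lower = reporting_odds_ratio(a, b, c, d)
    ic = information_component(a, b, c, d)
    print(f"ROR={ror:.2f} lower95={lower:.2f} IC={ic:.2f}")
    print("positive ROR signal:", lower > 1)
```

When the drug and the event are reported independently of each other, ROR is 1 and IC is 0, which is why the thresholds sit at 1 and 0 respectively.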
