A Novel Approach for Extraction of Relevant Web Pages from WWW Using Data Mining

Weblog analysis takes raw data from access logs and analyzes it to extract statistical information. This information covers several aspects of website activity: usage statistics such as the average number of hits, total number of user visits, failed and successful cached hits, average viewing time, and average path length through the website; diagnostic information such as page-not-found errors and server errors; server information, which includes entry and exit pages, single-access pages, and top visited pages; and requester information such as the search engines used, keywords, and top referring sites. In general, the website administrator uses this knowledge to improve system performance, to help restructure the site, and to support marketing decisions. Most advanced web mining systems use this information to derive more complex interpretations through data mining techniques such as association rules, clustering, and classification.
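As a rough illustration of the statistics listed in this abstract, the sketch below counts hits, distinct hosts, page-not-found errors, server errors, top pages, and top referrers. It assumes the access log uses the Apache Combined Log Format, which the abstract does not specify; the function name `summarize` and the sample lines are hypothetical.

```python
import re
from collections import Counter

# Assumption: Apache Combined Log Format. The abstract names no format.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

# Hypothetical sample lines, for illustration only.
SAMPLE_LINES = [
    '10.0.0.1 - - [01/Jan/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" '
    '200 512 "https://search.example/?q=mining" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2024:10:00:05 +0000] "GET /index.html HTTP/1.1" '
    '200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2024:10:00:09 +0000] "GET /missing HTTP/1.1" '
    '404 0 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Jan/2024:10:00:12 +0000] "GET /report HTTP/1.1" '
    '500 0 "-" "Mozilla/5.0"',
]


def summarize(lines, top_n=3):
    """Return basic usage and error statistics for an iterable of log lines."""
    hits = 0
    not_found = 0
    server_errors = 0
    hosts = set()
    pages = Counter()
    referrers = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        hits += 1
        hosts.add(match.group("host"))
        pages[match.group("path")] += 1
        status = int(match.group("status"))
        if status == 404:
            not_found += 1
        elif 500 <= status <= 599:
            server_errors += 1
        referrer = match.group("referrer")
        if referrer and referrer != "-":
            referrers[referrer] += 1
    return {
        "hits": hits,
        "distinct_hosts": len(hosts),
        "not_found": not_found,
        "server_errors": server_errors,
        "top_pages": pages.most_common(top_n),
        "top_referrers": referrers.most_common(top_n),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE_LINES))
```

Counting distinct hosts is only a crude stand-in for user visits; a real analysis would need sessionization, which is outside this sketch.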


Qualitative usability feature selection with ranking: a novel approach for ranking the identified usability problematic attributes for academic websites using data-mining techniques

Human-centric Computing and Information Sciences ◽

10.1186/s13673-017-0111-8 ◽

2017 ◽

Vol 7 (1) ◽

Cited By ~ 5

Author(s):

Kalpna Sagar ◽

Anju Saha

Keyword(s):

Data Mining ◽

Feature Selection ◽

Data Mining Techniques ◽

Novel Approach ◽

Academic Websites ◽

Using Data


A Novel Approach to Extracting Casing Status Features Using Data Mining

Entropy ◽

10.3390/e16010389 ◽

2013 ◽

Vol 16 (1) ◽

pp. 389-404 ◽

Cited By ~ 2

Author(s):

Jikai Chen ◽

Haoyu Li ◽

Yanjun Wang ◽

Ronghua Xie ◽

Xingbin Liu

Keyword(s):

Data Mining ◽

Novel Approach ◽

Using Data


A novel approach for the prediction of treadmill test in cardiology using data mining algorithms implemented as a mobile application

Indian Heart Journal ◽

10.1016/j.ihj.2018.01.011 ◽

2018 ◽

Vol 70 (4) ◽

pp. 511-518 ◽

Cited By ~ 2

Author(s):

A. Jerline Amutha ◽

R. Padmajavalli ◽

D. Prabhakar

Keyword(s):

Data Mining ◽

Mobile Application ◽

Treadmill Test ◽

Data Mining Algorithms ◽

Novel Approach ◽

Using Data ◽

Mining Algorithms


Web Search using Improved Concept Based Query Refinement

International Journal of Computer and Communication Technology ◽

10.47893/ijcct.2013.1193 ◽

2013 ◽

pp. 186-189

Author(s):

Ralla Suresh ◽

Saritha Vemuri ◽

Swetha V

Keyword(s):

Query Expansion ◽

Web Search ◽

Data Extraction ◽

Web Pages ◽

Web Content ◽

Novel Approach ◽

Improve Accuracy ◽

Mining Methods ◽

Using Data ◽

Semantically Heterogeneous

The information extracted from Web pages can be used for effective query expansion. Improving the accuracy of web search engines requires the inclusion of metadata, not only to analyze Web content but also to interpret it. Because today's Web is unstructured and semantically heterogeneous, keyword-based queries are likely to miss important results. Using data mining methods, our system derives dependency rules and applies them to concept-based queries. This paper presents a novel approach for query expansion that applies dependency rules mined from a large Web World, combines several existing techniques for data extraction and mining, and integrates the system into COMPACT, our prototype implementation of a concept-based search engine.
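The abstract does not describe how its dependency rules are mined, so the following is only a minimal sketch of one common stand-in: pairwise association rules filtered by support and confidence, then used to expand a query. It is not the COMPACT implementation. The names `mine_rules` and `expand_query`, the thresholds, and the toy documents are all hypothetical.

```python
from collections import Counter
from itertools import permutations

# Hypothetical toy collection: each document is a set of concept terms.
DOCS = [
    {"jaguar", "car", "engine"},
    {"jaguar", "car", "dealer"},
    {"jaguar", "cat", "habitat"},
    {"car", "engine", "repair"},
    {"car", "dealer"},
]


def mine_rules(documents, min_support=0.4, min_confidence=0.6):
    """Mine rules 'a -> b' from term co-occurrence (an assumed technique).

    support(a -> b)    = fraction of documents containing both a and b
    confidence(a -> b) = documents with both a and b / documents with a
    """
    total = len(documents)
    term_count = Counter()
    pair_count = Counter()
    for doc in documents:
        terms = set(doc)
        term_count.update(terms)
        # Ordered pairs, so a -> b and b -> a are scored separately.
        pair_count.update(permutations(sorted(terms), 2))
    rules = {}
    for (a, b), count in pair_count.items():
        support = count / total
        confidence = count / term_count[a]
        if support >= min_support and confidence >= min_confidence:
            rules.setdefault(a, set()).add(b)
    return rules


def expand_query(query_terms, rules):
    """Append the consequents of every rule triggered by a query term."""
    expanded = list(query_terms)
    for term in query_terms:
        for extra in sorted(rules.get(term, ())):
            if extra not in expanded:
                expanded.append(extra)
    return expanded


if __name__ == "__main__":
    rules = mine_rules(DOCS)
    print(expand_query(["jaguar"], rules))
```

With these toy documents, "jaguar" co-occurs with "car" often enough to pass both thresholds, so the query gains the term "car"; the reverse rule fails the confidence threshold because "car" appears in many documents without "jaguar".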


A Novel Approach for Heart Disease Diagnosis using Data Mining and Fuzzy Logic

International Journal of Computer Applications ◽

10.5120/8658-2498 ◽

2012 ◽

Vol 54 (17) ◽

pp. 16-21 ◽

Cited By ~ 16

Author(s):

Nidhi Bhatla ◽

Kiran Jyoti

Keyword(s):

Data Mining ◽

Fuzzy Logic ◽

Heart Disease ◽

Disease Diagnosis ◽

Novel Approach ◽

Using Data ◽

Heart Disease Diagnosis


Detection of Drive-by Download Attacks Using Machine Learning Approach

Cognitive Analytics ◽

10.4018/978-1-7998-2460-2.ch082 ◽

2020 ◽

pp. 1598-1611

Author(s):

Monther Aldwairi ◽

Musaab Hasan ◽

Zayed Balbahaith

Keyword(s):

Machine Learning ◽

False Positive Rate ◽

Detection Accuracy ◽

Web Pages ◽

Financial Loss ◽

Detection Model ◽

Detection Systems ◽

Novel Approach ◽

Positive Rate ◽

Using Data

Drive-by download refers to attacks that automatically download malware to a user's computer without the user's knowledge or consent. This type of attack is accomplished by exploiting vulnerabilities in web browsers and plugins. The damage may include data leakage leading to financial loss. Traditional antivirus and intrusion detection systems are not efficient against such attacks. Researchers have proposed many detection approaches, mostly passive blacklisting; a few have proposed dynamic classification techniques, which suffer from clear shortcomings. In this paper, we propose a novel approach to detect drive-by download infected web pages based on features extracted from their source code. We test 23 different machine learning classifiers using a data set of 5435 webpages and, based on detection accuracy, select the top five to build our detection model. The approach is expected to serve as a base for implementing and developing anti drive-by download programs. We develop a graphical user interface program to allow the end user to examine a URL before visiting the website. The Bagged Trees classifier exhibited the highest accuracy of 90.1% and reported a 96.24% true positive rate and a 26.07% false positive rate.
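The three figures reported at the end of this abstract are standard confusion-matrix metrics. The sketch below shows how accuracy, true positive rate, and false positive rate are computed from raw counts. The counts used here are hypothetical toy values, not the paper's data, and the function name `detection_metrics` is an assumption for illustration.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute accuracy, true positive rate, and false positive rate.

    tp: infected pages flagged as infected
    fn: infected pages missed
    fp: benign pages flagged as infected
    tn: benign pages correctly passed
    """
    total = tp + fn + fp + tn
    if total == 0 or (tp + fn) == 0 or (fp + tn) == 0:
        raise ValueError("each class needs at least one sample")
    return {
        "accuracy": (tp + tn) / total,
        "true_positive_rate": tp / (tp + fn),
        "false_positive_rate": fp / (fp + tn),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen for easy arithmetic, not taken from the paper.
    metrics = detection_metrics(tp=90, fn=10, fp=20, tn=80)
    for name, value in metrics.items():
        print(f"{name}: {value:.2%}")
```

Because the true positive rate and the false positive rate are each normalized within their own class, accuracy alone can look strong while the false positive rate remains high, particularly when one class dominates the data set.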
