Efficient Web Mining for Traversal Path Patterns

Web Mining ◽  
2011 ◽  
pp. 322-338 ◽  
Author(s):  
Zhixiang Chen ◽  
Richard H. Fowler ◽  
Ada Wai-Chee Fu ◽  
Chunyue Wang

A maximal forward reference of a Web user is a longest consecutive sequence of Web pages visited by the user in a session without revisiting any previously visited page in the sequence. Efficient mining of frequent traversal path patterns, that is, large reference sequences of maximal forward references, from very large Web logs is a fundamental problem in Web mining. This chapter aims at designing algorithms for this problem with the best possible efficiency. First, two optimal linear-time algorithms are designed for finding maximal forward references from Web logs. Second, two algorithms for mining frequent traversal path patterns are devised with the help of a fast construction of shallow generalized suffix trees over a very large alphabet. These two algorithms have provably linear and sublinear time complexity, respectively, and their performance is analyzed in comparison with Apriori-like algorithms and the Ukkonen algorithm. It is shown that the two new algorithms are substantially more efficient than both.
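The definition above translates into a single left-to-right scan of a session: keep the current forward path on a stack and emit it as a maximal forward reference whenever the user backtracks. The Python sketch below illustrates that definition only; it is not one of the chapter's two optimal algorithms, and the session format (a list of page identifiers in visit order) is an assumption.

# A minimal sketch illustrating the definition of maximal forward references,
# not the chapter's optimized algorithms. A revisit of a page on the current
# forward path is treated as a backward reference.
def maximal_forward_references(session):
    """Return the maximal forward references of one user session."""
    path = []          # current forward path (used as a stack)
    position = {}      # page -> index in `path`
    refs = []
    moving_forward = True
    for page in session:
        if page in position:
            # Backward reference: emit the path if we were extending it,
            # then truncate back to the revisited page.
            if moving_forward and len(path) > 1:
                refs.append(list(path))
            keep = position[page] + 1
            for p in path[keep:]:
                del position[p]
            path = path[:keep]
            moving_forward = False
        else:
            position[page] = len(path)
            path.append(page)
            moving_forward = True
    if moving_forward and len(path) > 1:
        refs.append(list(path))
    return refs

# Example: the session A -> B -> C -> B -> D yields the references ABC and ABD.
print(maximal_forward_references(["A", "B", "C", "B", "D"]))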

2007 ◽  
Vol 16 (05) ◽  
pp. 793-828 ◽  
Author(s):  
JUAN D. VELÁSQUEZ ◽  
VASILE PALADE

Understanding web users' browsing behaviour in order to adapt a web site to the needs of a particular user is a key issue for many commercial companies that do business over the Internet. This paper presents the implementation of a Knowledge Base (KB) for building web-based computerized recommender systems. The Knowledge Base consists of a Pattern Repository, which contains patterns extracted from web logs and web pages by applying various web mining tools, and a Rule Repository, which contains rules describing how the discovered patterns are used to build navigation or web site modification recommendations. The paper also focuses on testing the effectiveness of the proposed online and offline recommendations. An extensive real-world experiment was carried out on the web site of a bank.
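As a concrete illustration of the two repositories, the hypothetical sketch below models a Pattern Repository of navigation patterns and a Rule Repository of condition/recommendation pairs. The class names, fields and the example rule are assumptions made for exposition, not the paper's actual schema.

# A hypothetical sketch of the Knowledge Base described above; names and
# fields are illustrative assumptions, not the paper's schema.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class NavigationPattern:
    pages: List[str]      # ordered pages extracted from the web logs
    support: float        # fraction of sessions containing the pattern

@dataclass
class Rule:
    condition: Callable[["NavigationPattern", List[str]], bool]
    recommendation: str   # e.g. a page to suggest or a site modification

@dataclass
class KnowledgeBase:
    patterns: List[NavigationPattern] = field(default_factory=list)
    rules: List[Rule] = field(default_factory=list)

    def recommend(self, current_session: List[str]) -> List[str]:
        """Apply every rule to every stored pattern for the live session."""
        return [rule.recommendation
                for pattern in self.patterns
                for rule in self.rules
                if rule.condition(pattern, current_session)]

# Example: suggest the last page of a pattern the user appears to be following.
kb = KnowledgeBase(
    patterns=[NavigationPattern(["home", "loans", "rates"], support=0.12)],
    rules=[Rule(lambda p, s: bool(s) and s[-1] in p.pages[:-1],
                "suggest: rates page")],
)
print(kb.recommend(["home", "loans"]))   # -> ['suggest: rates page']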


Algorithms ◽  
2021 ◽  
Vol 14 (6) ◽  
pp. 161
Author(s):  
Dominik Köppl

We present linear-time algorithms computing the reversed Lempel–Ziv factorization [Kolpakov and Kucherov, TCS’09] within the space bounds of two different suffix tree representations. We can adapt these algorithms to compute the longest previous non-overlapping reverse factor table [Crochemore et al., JDA’12] within the same space, at the cost of a multiplicative logarithmic time penalty.
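For readers unfamiliar with the notion, the sketch below illustrates one common formulation of the reversed Lempel–Ziv factorization with a naive quadratic-time scan: each factor is the longest prefix of the unprocessed suffix whose reverse occurs in the already processed prefix, with a single fresh character as the fallback. It illustrates the definition only, not the paper's linear-time algorithms within suffix-tree space bounds.

# Naive quadratic-time illustration of the reversed LZ factorization under
# one common formulation; NOT the paper's linear-time, space-bounded method.
def reversed_lz_factorization(text):
    factors = []
    i, n = 0, len(text)
    while i < n:
        best = 0
        # Try ever longer prefixes of text[i:] and test their reverses.
        for length in range(1, n - i + 1):
            if text[i:i + length][::-1] in text[:i]:
                best = length
            else:
                break
        length = max(best, 1)    # fall back to a single fresh character
        factors.append(text[i:i + length])
        i += length
    return factors

# Example: "abbab" factorizes as ['a', 'b', 'ba', 'b'] under this formulation,
# since "ab" (the reverse of "ba") already occurs in the processed prefix "ab".
print(reversed_lz_factorization("abbab"))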


2021 ◽  
Vol 3 (1) ◽  
pp. 95-122
Author(s):  
Kilho Shin ◽  
Taichi Ishikawa ◽  
Yu-Lu Liu ◽  
David Lawrence Shepard

The subpath kernel is a class of positive definite kernels defined over trees, which has the following advantages for classification, regression and clustering: it can be incorporated into a variety of powerful kernel machines, including SVM; it is invariant to whether input trees are ordered or unordered; it can be computed by fast linear-time algorithms; and its excellent learning performance has been demonstrated through intensive experiments in the literature. In this paper, we leverage recent advances in tree kernels to solve real problems. As an example, we apply our method to the problem of detecting fake e-commerce sites. Although the problem is similar to phishing site detection, the fact that mimicking existing authentic sites is harmful for fake e-commerce sites marks a clear difference between the two problems. We focus on fake e-commerce site detection for three reasons: e-commerce fraud is a real problem that companies and law enforcement have been cooperating to solve; existing approaches are hampered by inefficiency because datasets tend to be large, whereas subpath kernel learning overcomes these performance challenges; and our method offers increased resiliency against attempts to subvert existing detection methods by incorporating robust features that adversaries cannot change: the DOM trees of websites. Our real-world results are remarkable: our method exhibited accuracy as high as 0.998 when training an SVM with 1000 instances and evaluating accuracy on almost 7000 independent instances. Its generalization efficiency is also excellent: with only 100 training instances, the accuracy score reached 0.996.
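To make the kernel concrete, the sketch below computes an unweighted subpath kernel between two tiny DOM-like trees in the naive way, by counting common downward label paths. The tree representation and the unweighted counting scheme are illustrative assumptions; this is not the linear-time algorithm the paper builds on.

# Naive illustration of a subpath kernel: count common downward label paths
# between two trees. The unweighted scheme and the Node class are assumptions;
# this is not the linear-time algorithm referenced above.
from collections import Counter
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

def subpaths(root):
    """Count every contiguous descending label sequence in the tree."""
    counts = Counter()

    def walk(node):
        # All downward paths starting at `node`: the node itself, plus the
        # node prepended to every path starting at one of its children.
        here = [(node.label,)]
        for child in node.children:
            for path in walk(child):
                here.append((node.label,) + path)
        counts.update(here)
        return here

    walk(root)
    return counts

def subpath_kernel(t1, t2):
    c1, c2 = subpaths(t1), subpaths(t2)
    return sum(c1[p] * c2[p] for p in c1 if p in c2)

# Two DOM-like trees sharing the subpaths (html), (body) and (html, body).
a = Node("html", [Node("body", [Node("div")])])
b = Node("html", [Node("body", [Node("p")])])
print(subpath_kernel(a, b))   # -> 3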


Author(s):  
Yuya Higashikawa ◽  
Naoki Katoh ◽  
Junichi Teruyama ◽  
Koji Watase

2020 ◽  
Vol 2020 ◽  
pp. 1-18
Author(s):  
Sonia Setia ◽  
Verma Jyoti ◽  
Neelam Duhan

The continuous growth of the World Wide Web has led to the problem of long access delays. To reduce this delay, prefetching techniques are used to predict a user's browsing behavior and fetch web pages before the user explicitly demands them. Making near-accurate predictions of users' search behavior is a complex task that researchers have faced for many years, and various web mining techniques have been applied to it. However, each of these methods has its own set of drawbacks. In this paper, a novel hybrid prediction model is proposed that integrates usage mining and content mining techniques to tackle the individual challenges of both approaches. The proposed method uses N-gram parsing along with the click counts of queries to capture more contextual information, in an effort to improve the prediction of web pages. The proposed hybrid approach was evaluated on AOL search logs and shows, on average, a 26% increase in prediction precision and a 10% increase in hit ratio compared with other mining techniques.
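The usage-mining half of the idea can be sketched as a simple N-gram predictor trained on past sessions and weighted by click counts. The data format and weighting scheme below are assumptions, and the paper's hybrid model additionally folds in content mining; this is only a rough illustration of next-page prediction for prefetching.

# A hedged sketch of N-gram next-page prediction with click-count weighting;
# the session format and weighting are illustrative assumptions.
from collections import defaultdict

class NGramPredictor:
    def __init__(self, n=2):
        self.n = n
        self.counts = defaultdict(lambda: defaultdict(float))

    def train(self, sessions, click_weight=None):
        """sessions: iterable of page-id lists; click_weight: optional
        mapping page -> click count used to weight transitions."""
        for session in sessions:
            for i in range(len(session) - self.n + 1):
                context = tuple(session[i:i + self.n - 1])
                nxt = session[i + self.n - 1]
                w = click_weight.get(nxt, 1.0) if click_weight else 1.0
                self.counts[context][nxt] += w

    def predict(self, recent_pages):
        """Return the most likely next page for the last n-1 pages seen."""
        context = tuple(recent_pages[-(self.n - 1):])
        candidates = self.counts.get(context)
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

# Example: train on two sessions, then decide which page to prefetch next.
model = NGramPredictor(n=2)
model.train([["home", "search", "results"], ["home", "offers"]],
            click_weight={"search": 3})
print(model.predict(["home"]))   # -> "search" (favoured by its click count)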


Algorithmica ◽  
2013 ◽  
Vol 71 (2) ◽  
pp. 471-495 ◽  
Author(s):  
Maw-Shang Chang ◽  
Ming-Tat Ko ◽  
Hsueh-I Lu

1996 ◽  
Vol 06 (01) ◽  
pp. 127-136 ◽  
Author(s):  
QIAN-PING GU ◽  
SHIETUNG PENG

In this paper, we give two linear-time algorithms for the node-to-node fault-tolerant routing problem in n-dimensional hypercubes Hn and star graphs Gn. The first algorithm, given at most n−1 arbitrary fault nodes and two non-fault nodes s and t in Hn, finds a fault-free path s→t of length at most [Formula: see text] in O(n) time, where d(s, t) is the distance between s and t. Our second algorithm, given at most n−2 fault nodes and two non-fault nodes s and t in Gn, finds a fault-free path s→t of length at most d(Gn)+3 in O(n) time, where d(Gn) is the diameter of Gn. When the time efficiency of finding the routing path is more important than the length of the path, the algorithms in this paper are better than the previous ones.
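For context, the following brute-force sketch solves the same node-to-node problem in Hn by breadth-first search around a given fault set. It only illustrates the problem statement; the paper's algorithms produce a fault-free path in O(n) time without exploring the cube.

# Brute-force illustration of fault-tolerant routing in the hypercube H_n
# (NOT the paper's O(n) algorithm). Nodes are n-bit integers; neighbours
# differ in exactly one bit.
from collections import deque

def fault_free_path(n, s, t, faults):
    """Shortest fault-free s->t path in H_n, or None if every path is blocked."""
    faults = set(faults)
    if s in faults or t in faults:
        return None
    parent = {s: None}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for d in range(n):
            v = u ^ (1 << d)          # flip dimension d
            if v not in parent and v not in faults:
                parent[v] = u
                queue.append(v)
    return None

# Example in H_3: route 000 -> 111 around the faulty node 011.
print(fault_free_path(3, 0b000, 0b111, faults={0b011}))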

