A scalable pattern mining approach to web graph compression with communities

Abstract: A sitemap is an explicit specification of the design concept and knowledge organization of a website and is therefore considered the website's basic ontology. It not only presents the main usage flows for users but also organizes the concepts of the website hierarchically. Typically, sitemaps are defined by webmasters in the very early stages of website design. However, over their lifetime websites change significantly in structure, content, and possible navigation paths. Even when they do not, webmasters may fail either to define sitemaps that reflect the actual website content or, conversely, to organize pages and links in a way that reflects the intended organization of the content encoded in the sitemap. In this paper we propose an approach that automatically generates sitemaps. Unlike other approaches in the literature, which mainly generate sitemaps from the textual content of the pages, in this work sitemaps are generated by analyzing the Web graph of a website. This allows us to: i) automatically generate a sitemap on the basis of possible navigation paths, ii) compare the generated sitemap with either the sitemap provided by the Web designer or the intended sitemap of the website, and consequently iii) plan a possible reorganization of the website. The proposed solution is based on closed frequent sequence extraction and considers only hyperlinks organized in “Web lists”, which are logical lists embedded in the pages. These “Web lists” are typically used to support users in website navigation and include menus, navbars, and tables of contents. Experiments performed on three real datasets show that the extracted sitemaps are much more similar to those defined by website curators than those obtained by competing algorithms.
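To make "closed frequent sequence extraction" concrete, here is a minimal Python sketch. It is not the authors' algorithm. It treats each navigation sequence (for example, pages reached in order through Web lists) as a list of page identifiers. It enumerates frequent subsequences level by level and then keeps only the closed ones, meaning those with no longer supersequence of equal support. The function names, the `min_support` parameter, and the toy page labels are hypothetical, and the brute-force support counting is only suitable for small inputs.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """Return True if `pattern` occurs in `seq` in order (gaps allowed)."""
    it = iter(seq)
    # `item in it` advances the shared iterator, so item order is enforced.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], sequences: List[List[str]]) -> int:
    """Number of sequences that contain `pattern` as a subsequence."""
    return sum(1 for s in sequences if is_subsequence(pattern, s))


def frequent_sequences(sequences: List[List[str]], min_support: int) -> Dict[Pattern, int]:
    """Level-wise enumeration; only frequent patterns are extended (Apriori property)."""
    if min_support < 1:
        raise ValueError("min_support must be >= 1 so that pattern length stays bounded")
    items = sorted({x for s in sequences for x in s})
    result: Dict[Pattern, int] = {}
    frontier: List[Pattern] = [()]
    while frontier:
        next_frontier: List[Pattern] = []
        for prefix in frontier:
            for item in items:
                candidate = prefix + (item,)
                sup = support(candidate, sequences)
                if sup >= min_support:
                    result[candidate] = sup
                    next_frontier.append(candidate)
        frontier = next_frontier
    return result


def closed_sequences(freq: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep patterns that have no strictly longer supersequence with the same support."""
    return {
        p: sup
        for p, sup in freq.items()
        if not any(
            len(q) > len(p) and sq == sup and is_subsequence(p, q)
            for q, sq in freq.items()
        )
    }


if __name__ == "__main__":
    # Hypothetical navigation sequences extracted from Web lists.
    paths = [
        ["home", "products", "contact"],
        ["home", "products", "about"],
        ["home", "news", "contact"],
    ]
    print(closed_sequences(frequent_sequences(paths, min_support=2)))
```

With a minimum support of 2, the frequent patterns for these three toy paths are ('home',), ('contact',), ('products',), ('home', 'contact') and ('home', 'products'). Only ('home',), ('home', 'contact') and ('home', 'products') are closed. The single-page patterns 'contact' and 'products' are absorbed by longer patterns that have the same support.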


Merging Adjacency Lists for Efficient Web Graph Compression

Advances in Intelligent and Soft Computing - Man-Machine Interactions 2, 2011, pp. 385-392. DOI: 10.1007/978-3-642-23169-8_42. Cited By ~ 7

Author(s): Szymon Grabowski, Wojciech Bieniecki

Keyword(s): Graph Compression, Web Graph


Tight and simple Web graph compression for forward and reverse neighbor queries

Discrete Applied Mathematics, Vol 163, 2014, pp. 298-306. DOI: 10.1016/j.dam.2013.05.028. Cited By ~ 4

Author(s): Szymon Grabowski, Wojciech Bieniecki

Keyword(s): Graph Compression, Web Graph


Web Graph Compression by Edge Elimination

Data Compression Conference (DCC'06), 2006. DOI: 10.1109/dcc.2006.84. Cited By ~ 1

Author(s): A. Mahdian, H. Khalili, E. Nourbakhsh, M. Ghodsi

Keyword(s): Graph Compression, Web Graph


An Adaptive Data Distribution Through Tree Rules in Frequent Pattern Mining

International Journal of Scientific Research in Computer Science Engineering and Information Technology, 2018, pp. 300-305. DOI: 10.32628/cseit183894

Keyword(s): Information Sharing, Pattern Mining, Data Distribution, Frequent Pattern Mining, Frequent Pattern, General Development, Secure Information, Evaluation Parameters, Secure Information Sharing

Information sharing among organizations is common practice in several areas, such as business development and marketing. Because some of the sensitive rules that should be kept private may be revealed, and such disclosure of sensitive patterns may harm the interests of the organization that owns the data, sensitive rules must be protected before the data is shared. In this paper, to enable secure information sharing, the sensitive rules, which are found using a frequent pattern tree, are first perturbed. The set of sensitive rules is perturbed by substitution. Compared with other techniques, this kind of substitution reduces the risk and increases the utility of the dataset. Experiments are performed on a real-world dataset. Results show that, on the chosen evaluation parameters, the proposed work outperforms several previous methods.
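The abstract does not detail the perturbation step. The following is therefore only a minimal Python sketch of the general idea of hiding a sensitive itemset by item substitution. It counts support directly instead of building a frequent pattern tree. The function names, the `victim`/`substitute` parameters, and the toy transactions are all hypothetical, and the paper's actual substitution strategy may differ.

```python
from typing import FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> int:
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def hide_by_substitution(
    transactions: List[Set[str]],
    sensitive: FrozenSet[str],
    min_support: int,
    victim: str,
    substitute: str,
) -> List[Set[str]]:
    """Return a copy of `transactions` in which `sensitive` is no longer frequent.

    In supporting transactions, the hypothetical `victim` item (a member of
    `sensitive`) is replaced by `substitute` until the support of
    `sensitive` drops below `min_support`.
    """
    if victim not in sensitive or substitute in sensitive:
        raise ValueError("victim must be in, and substitute outside, the sensitive itemset")
    sanitized = [set(t) for t in transactions]  # copy so the input is not mutated
    current = support(sensitive, sanitized)
    for t in sanitized:
        if current < min_support:
            break
        if sensitive <= t:
            t.discard(victim)
            t.add(substitute)
            current -= 1  # t lost `victim`, so it no longer supports `sensitive`
    return sanitized


if __name__ == "__main__":
    data = [
        {"bread", "milk", "eggs"},
        {"bread", "milk"},
        {"bread", "milk", "butter"},
        {"eggs", "butter"},
    ]
    rule = frozenset({"bread", "milk"})
    cleaned = hide_by_substitution(data, rule, min_support=2, victim="milk", substitute="juice")
    print(support(rule, cleaned))  # 1
```

With a minimum support of 2, two of the three transactions containing {bread, milk} are rewritten. This leaves the itemset with a support of 1, below the threshold. This sketch simply rewrites the first supporting transactions it meets. A practical sanitizer would choose which transactions and items to alter so as to limit side effects on non-sensitive rules, which is the risk/utility trade-off the abstract refers to.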


924-P: Pattern Mining of Trajectories of Glucose Values of Continuous Glucose Monitoring System by Artificial Intelligence in Type 2 Diabetes Patients

Diabetes, Vol 68 (Supplement 1), 2019, pp. 924-P. DOI: 10.2337/db19-924-p

Author(s): Masaki Makino, Ryo Yoshimoto, Mizuho Kondo-Ando, Yasumasa Yoshino, Izumi Hiratsuka, ...

Keyword(s): Artificial Intelligence, Type 2 Diabetes, Monitoring System, Continuous Glucose Monitoring, Pattern Mining, Glucose Monitoring, Continuous Glucose Monitoring System, Diabetes Patients


Knowledge Point Recommendation Algorithm based on Enhanced Correction Factor and Weighted Sequential Pattern Mining

International Journal of Performability Engineering, Vol 16 (4), 2020, pp. 549. DOI: 10.23940/ijpe.20.04.p6.549559

Author(s): Zhaoyu Shou, Yanguo Wang, Yiru Wen, Huibing Zhang

Keyword(s): Correction Factor, Pattern Mining, Sequential Pattern Mining, Sequential Pattern, Recommendation Algorithm


Review of Improvement of Web Search Based on Web Log File

International Journal of Computers & Technology, Vol 3 (2), 2012, pp. 298-300. DOI: 10.24297/ijct.v3i2b.2880. Cited By ~ 1

Author(s): Soniya P. Chaudhari, Prof. Hitesh Gupta, S. J. Patil

Keyword(s): Neural Network, Fuzzy Logic, Web Search, Pattern Mining, Efficiency Improvement, Web Searching, Important Method, Web Log, Journal Paper, Log File

In this paper we review various journal papers on improving the efficiency of Web search. Some important methods are based on sequential pattern mining, while others are based on supervised or unsupervised learning. Other methods, such as fuzzy logic and neural networks, are also covered.
