Web Access Log Mining, Information Extraction, and Deep Web Mining

With the wide application of the WWW and the emergence of Web technologies, data mining research has entered a new stage. Web log mining applies data mining techniques to the analysis of server logs. This paper addresses the early stage of that process and proposes a log data preprocessing method whose purpose is to divide server logs into multiple access sequences, each belonging to one unique user during a single visit, and it presents an algorithm for doing so.
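The preprocessing step described above can be sketched as follows. This is a minimal illustration, not the paper's algorithm: it assumes each log record has already been parsed into a hypothetical `(user, timestamp, url)` tuple, that a user is identified by a single key such as an IP address, and that a gap longer than 30 minutes starts a new access sequence. None of these choices is stated in the abstract.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(records, timeout=timedelta(minutes=30)):
    """Split (user, timestamp, url) records into per-user access sequences.

    A new sequence starts whenever the gap between two consecutive
    requests of the same user exceeds `timeout` (an assumed threshold).
    """
    by_user = defaultdict(list)
    for user, ts, url in records:
        by_user[user].append((ts, url))

    sessions = []
    for user, hits in by_user.items():
        hits.sort(key=lambda h: h[0])  # order each user's requests by time
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:
                sessions.append((user, current))  # close the previous visit
                current = []
            current.append(url)
        sessions.append((user, current))
    return sessions


records = [
    ("10.0.0.1", datetime(2024, 1, 1, 10, 0), "/home"),
    ("10.0.0.2", datetime(2024, 1, 1, 10, 1), "/search"),
    ("10.0.0.1", datetime(2024, 1, 1, 10, 5), "/products"),
    ("10.0.0.1", datetime(2024, 1, 1, 11, 0), "/cart"),
]
for user, sequence in sessionize(records):
    print(user, sequence)
```

Real logs usually need extra cleaning before this step, such as dropping image requests and robot traffic, which the sketch leaves out.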


A Constraint Programming Approach for Web Log Mining

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2016100102 ◽

2016 ◽

Vol 11 (4) ◽

pp. 24-42 ◽

Cited By ~ 2

Author(s):

Amina Kemmar ◽

Yahia Lebbah ◽

Samir Loudni

Keyword(s):

Constraint Programming ◽

Pattern Mining ◽

Programming Approach ◽

Web Log Mining ◽

Web Log ◽

Web Access ◽

Log Mining ◽

Log File ◽

Access Patterns ◽

The Web

Mining web access patterns consists of extracting knowledge from server log files. The problem is modeled as a sequential pattern mining (SPM) problem, in which the patterns to extract are sequences of accesses that occur frequently in the web log file. The literature offers many efficient algorithms for SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite their effectiveness, these methods cannot express or handle new constraints defined on patterns without a new implementation. Recently, many approaches based on constraint programming (CP) have been proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
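To make the underlying SPM task concrete, the sketch below mines frequent access subsequences with a simplified PrefixSpan-style recursion. It is not the authors' CP-based approach; it only illustrates what "sequences of accesses that occur frequently" means. The function name `prefixspan`, the `min_support` parameter, and the sample sessions are assumptions made for illustration, and each session is treated as a plain list of page identifiers.

```python
def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent subsequence.

    Support is the number of input sequences that contain the pattern
    as a (not necessarily contiguous) subsequence.
    """
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1

        for item, support in sorted(counts.items()):
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project: keep what follows the first occurrence of `item`.
            suffixes = [seq[seq.index(item) + 1:]
                        for seq in projected if item in seq]
            mine(pattern, suffixes)

    mine((), [list(s) for s in sequences])
    return results


sessions = [
    ["home", "products", "cart"],
    ["home", "cart"],
    ["products", "home", "cart"],
]
for pattern, support in prefixspan(sessions, min_support=2).items():
    print(" -> ".join(pattern), support)
```

In a procedural miner like this one, adding a constraint such as a maximum gap between accesses means changing the projection code itself, which is the limitation the abstract attributes to specialized algorithms.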


Learning Information Extraction Rules for Web Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch129 ◽

2011 ◽

pp. 678-683

Author(s):

Chia-Hui Chang ◽

Chun-Nan Hsu

Keyword(s):

Information Extraction ◽

Web Mining ◽

World Wide ◽

Information Sources ◽

Structured Data ◽

Comparison Shopping ◽

Data Formats ◽

The World ◽

Document Collection ◽

Keyword Searching

The explosive growth and popularity of the World Wide Web have resulted in a huge number of information sources on the Internet. However, due to the heterogeneity and lack of structure of Web information sources, access to this huge collection of information has been limited to browsing and keyword searching. Sophisticated Web-mining applications, such as comparison shopping, incur high maintenance costs to deal with different data formats. The problem of translating the contents of input documents into structured data is called information extraction (IE). Unlike information retrieval (IR), which concerns how to identify relevant documents in a document collection, IE produces structured data ready for post-processing, which is crucial to many applications of Web mining and search tools.
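The difference between IE and IR can be illustrated with a hand-written extraction rule. The sketch below is an assumption-laden toy: the HTML layout, the `RULE` pattern, and the field names `title` and `price` are all hypothetical, and the rule is written by hand, whereas the entry above is about learning such rules. It shows only the output side of IE, structured records rather than a ranked list of documents.

```python
import re

# Hypothetical extraction rule for one site's product listing layout.
RULE = re.compile(
    r"<li>\s*<b>(?P<title>[^<]+)</b>\s*-\s*\$(?P<price>\d+(?:\.\d+)?)\s*</li>"
)


def extract(html):
    """Turn matching list items into structured records."""
    return [
        {"title": m["title"].strip(), "price": float(m["price"])}
        for m in RULE.finditer(html)
    ]


page = "<ul><li><b>Laptop A</b> - $899.00</li><li><b>Laptop B</b> - $1049.50</li></ul>"
for record in extract(page):
    print(record)
```

A rule like this breaks as soon as the site changes its markup, which is one reason applications such as comparison shopping are costly to maintain across many data formats.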


DWDE-IR: An Efficient Deep Web Data Extraction for Information Retrieval on Web Mining

Journal of Emerging Technologies in Web Intelligence ◽

10.4304/jetwi.6.1.133-141 ◽

2014 ◽

Vol 6 (1) ◽

Cited By ~ 1

Author(s):

Aysha Banu ◽

M. Chitra

Keyword(s):

Information Retrieval ◽

Web Mining ◽

Data Extraction ◽

Deep Web ◽

Web Data ◽

Web Data Extraction


Query Intensive Interface Information Extraction Protocol for deep web

2009 International Conference on Intelligent Agent & Multi-Agent Systems ◽

10.1109/iama.2009.5228052 ◽

2009 ◽

Cited By ~ 8

Author(s):

Dilip Kumar Sharma ◽

A. K. Sharma

Keyword(s):

Information Extraction ◽

Deep Web ◽

Extraction Protocol


Testbed for information extraction from deep web

Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters - WWW Alt. '04 ◽

10.1145/1013367.1013468 ◽

2004 ◽

Cited By ~ 14

Author(s):

Yasuhiro Yamada ◽

Nick Craswell ◽

Tetsuya Nakatoh ◽

Sachio Hirokawa

Keyword(s):

Information Extraction ◽

Deep Web


Information extraction for deep web using repetitive subject pattern

World Wide Web ◽

10.1007/s11280-013-0248-y ◽

2013 ◽

Vol 17 (5) ◽

pp. 1109-1139 ◽

Cited By ~ 10

Author(s):

Wachirawut Thamviset ◽

Sartra Wongthanavasu

Keyword(s):

Information Extraction ◽

Deep Web


Efficient Methodology for Deep Web Data Extraction

Turkish Journal of Computer and Mathematics Education (TURCOMAT) ◽

10.17762/turcomat.v12i1s.1769 ◽

2021 ◽

Vol 12 (1S) ◽

pp. 286-293

Author(s):

Shilpa Deshmukh et al.

Keyword(s):

Information Extraction ◽

Data Extraction ◽

Deep Web ◽

Web Pages ◽

Web Data ◽

Web Information Extraction ◽

Web Data Extraction ◽

Web Information ◽

Assessment Measure ◽

Enormous Number

Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structures of such pages. Until now, a large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the Web page programming language. As a popular two-dimensional medium, the contents of Web pages are always displayed in a regular layout for users to browse. This motivates us to seek a different way of performing deep Web data extraction that overcomes the limitations of previous work by using some interesting common visual features of deep Web pages. In this paper, a novel vision-based approach, the Visual Based Deep Web Data Extraction (VBDWDE) algorithm, is proposed. This approach primarily uses the visual features of deep Web pages to implement deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.


UNDERSTANDING WEB TRAFFIC ACTIVITIES USING WEB MINING TECHNIQUES

International Journal of Engineering Technologies and Management Research ◽

10.29121/ijetmr.v4.i9.2017.96 ◽

2020 ◽

Vol 4 (9) ◽

pp. 18-26

Author(s):

Ng Qi Yau ◽

Wan Zainon

Keyword(s):

Web Mining ◽

Large Data ◽

Database Systems ◽

Data Sets ◽

Web Traffic ◽

Data Repositories ◽

Data Mining Algorithms ◽

Log Mining ◽

Roc Area ◽

Mining Algorithms

Web usage mining is a computational process of discovering patterns in large data sets, using methods from artificial intelligence, machine learning, statistical analysis, and database systems. Its goal is to extract valuable information from the server access logs of World Wide Web data repositories and transform it into an understandable structure for further use. This paper focuses on exploring methods that expedite the log mining process, presenting the results of log mining through data visualization, and comparing data mining algorithms. The classification techniques are compared using precision, recall, and ROC area. In this study, Naïve Bayes and Bayes Network perform best on these measures.
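The three comparison measures named above can be computed as in the following sketch. It assumes binary labels (1 for the positive class, 0 otherwise) and, for the ROC area, a real-valued score per example. The function names and sample data are illustrative and do not come from the paper.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def roc_area(y_true, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive example is scored
    higher than a random negative one, with ties counted as one half.
    Requires at least one positive and one negative example.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
scores = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
print(precision_recall(y_true, y_pred))
print(roc_area(y_true, scores))
```

Precision and recall depend on the decision threshold that turned scores into `y_pred`, while the ROC area summarizes ranking quality across all thresholds, which is why the two kinds of measure are usually reported together.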


Application of Density-Based Adaptive K-Means Clustering Algorithm in Web Log Mining

Advanced Materials Research ◽

10.4028/www.scientific.net/amr.433-440.5152 ◽

2012 ◽

Vol 433-440 ◽

pp. 5152-5156

Author(s):

Guang Nan Guo ◽

Yong Gang Yun ◽

Mei Chu ◽

Hong Yan Shi ◽

Ke Gong Yin

Keyword(s):

Web Mining ◽

Clustering Algorithm ◽

Web Log Mining ◽

Personalized Service ◽

Cluster Number ◽

Clustering Techniques ◽

Web Log ◽

Clustering Quality ◽

Log Mining ◽

Initial Classification

To address the main current challenges of Web mining and personalized service, the basic K-Means clustering algorithm was studied, including its flow and limitations. The basic algorithm requires the number of clusters to be fixed in advance, depends heavily on the choice of initial centers, and is particularly sensitive to noise and edge data. To overcome these shortcomings, an improved density-based adaptive K-Means algorithm is presented. It performs an initial classification step followed by K-Means iteration to reduce the impact of these problems and improve clustering quality. Experiments on Web log clustering verified its effectiveness.
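One way to read the two steps above is sketched below. The abstract does not give the algorithm's details, so everything here is an assumption: density is taken as the number of neighbors within a hypothetical `radius`, points with fewer than `min_neighbors` neighbors are dropped as noise, and initial centers are the densest points that lie more than `2 * radius` apart, so the number of clusters is not fixed in advance. Standard K-Means iteration then refines the centers.

```python
import math


def density_kmeans(points, radius, min_neighbors, iterations=20):
    """Density-seeded K-Means sketch (assumed reading, not the paper's code)."""
    # Density of a point = number of other points within `radius`.
    dens = [
        sum(1 for j, q in enumerate(points)
            if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]
    # Drop sparse points so noise and edge data cannot pull the centers.
    core = [(d, p) for d, p in zip(dens, points) if d >= min_neighbors]
    kept = [p for _, p in core]

    # Initial classification: densest points first, each new center
    # must be well separated from the centers chosen so far.
    centers = []
    for _, p in sorted(core, key=lambda dp: -dp[0]):
        if all(math.dist(p, c) > 2 * radius for c in centers):
            centers.append(p)

    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each kept point to its nearest center.
        clusters = [[] for _ in centers]
        for p in kept:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(x) / len(c) for x in zip(*c)) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]  # the last point is an isolated outlier
centers, clusters = density_kmeans(points, radius=1.5, min_neighbors=2)
print(centers)
```

For Web log clustering, each point would be a feature vector derived from a user session, for example visit counts per page category. That encoding is likewise an assumption rather than something the abstract specifies.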
