Web Access Log Mining, Information Extraction, and Deep Web Mining

2015 ◽  
pp. 185-200
2014 ◽  
Vol 687-691 ◽  
pp. 1592-1595
Author(s):  
Yun Peng Duan ◽  
Chun Xi Zhao ◽  
Ying Shi

With the wide application of the WWW and the emergence of Web technologies, data mining research has entered a new stage. Web log mining applies the ideas of data mining to the analysis of server logs. This paper addresses the early stage of the mining process and proposes a log data preprocessing method whose purpose is to divide server logs into the access sequences of individual users, one visit at a time. An algorithm for this task is given.
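
The preprocessing step described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes Common Log Format input, identifies a user by IP address alone, and splits visits with a 30-minute inactivity timeout, a common heuristic that the abstract does not mention. The names `parse_line` and `sessionize` are hypothetical.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input: Common Log Format lines (an assumption, not stated in the abstract).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?:GET|POST|HEAD) (?P<url>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_line(line):
    """Return (ip, timestamp, url, status), or None for an unparseable line."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
    return m.group("ip"), ts, m.group("url"), int(m.group("status"))

def sessionize(lines, timeout=timedelta(minutes=30)):
    """Split log lines into per-user access sequences (one list of URLs per visit)."""
    hits_by_user = defaultdict(list)
    for line in lines:
        rec = parse_line(line)
        if rec is None or rec[3] >= 400:  # data cleaning: drop bad lines and errors
            continue
        ip, ts, url, _ = rec
        hits_by_user[ip].append((ts, url))

    sessions = []
    for ip, hits in hits_by_user.items():
        hits.sort()  # chronological order per user
        current = [hits[0][1]]
        for (prev_ts, _), (ts, url) in zip(hits, hits[1:]):
            if ts - prev_ts > timeout:  # long gap: start a new visit
                sessions.append((ip, current))
                current = []
            current.append(url)
        sessions.append((ip, current))
    return sessions
```

Identifying users by IP alone is the weakest assumption here; real preprocessing often combines IP, user agent, and referrer information.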


Author(s):  
Amina Kemmar ◽  
Yahia Lebbah ◽  
Samir Loudni

Mining web access patterns consists of extracting knowledge from server log files. The problem is represented as a sequential pattern mining (SPM) problem, in which the patterns to extract are sequences of accesses that occur frequently in the web log file. The literature offers many efficient algorithms for SPM (e.g., GSP, SPADE, PrefixSpan, WAP-tree, LAPIN, PLWAP). Despite their effectiveness, these methods cannot express or handle new constraints defined on patterns without a new implementation. Recently, many approaches based on constraint programming (CP) were proposed to solve SPM in a declarative and generic way. Since no CP-based approach had been applied to mining web access patterns, the authors introduce in this paper an efficient CP-based approach for solving the web log mining problem. They reduce the problem of web log mining to SPM within a CP environment, which makes it possible to handle various constraints. Experimental results on non-trivial web log mining problems show the effectiveness of the authors' CP-based mining approach.
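
To make the SPM formulation concrete, the following is a minimal PrefixSpan-style sketch over sequences of single page accesses. It is not the CP-based method of the paper; it only shows what a frequent sequential pattern is and how one simple constraint (a maximum pattern length) can be pushed into the search. The function name `prefixspan` and the `max_length` parameter are hypothetical.

```python
def prefixspan(sequences, min_support, max_length=None):
    """Return {pattern: support} for patterns occurring as subsequences
    in at least min_support of the input sequences."""
    results = {}

    def mine(prefix, projected):
        # projected: (sequence index, start position) pairs marking the
        # suffix that remains after the first match of the current prefix.
        if max_length is not None and len(prefix) >= max_length:
            return  # length constraint: do not extend further
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):  # count once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item in sorted(counts):
            support = counts[item]
            if support < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = support
            # Project each suffix past the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results
```

For example, with the access sequences `[["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]` and `min_support=2`, the pattern `("a", "c")` is found with support 3 because the accesses need not be adjacent.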


Author(s):  
Chia-Hui Chang ◽  
Chun-Nan Hsu

The explosive growth and popularity of the World Wide Web have resulted in a huge number of information sources on the Internet. However, due to the heterogeneity and lack of structure of Web information sources, access to this huge collection of information has been limited to browsing and keyword searching. Sophisticated Web-mining applications, such as comparison shopping, incur high maintenance costs to deal with different data formats. The problem of translating the contents of input documents into structured data is called information extraction (IE). Unlike information retrieval (IR), which concerns how to identify relevant documents in a document collection, IE produces structured data ready for post-processing, which is crucial to many applications of Web mining and search tools.


2013 ◽  
Vol 17 (5) ◽  
pp. 1109-1139 ◽  
Author(s):  
Wachirawut Thamviset ◽  
Sartra Wongthanavasu

Author(s):  
Shilpa Deshmukh et al.

Deep Web contents are accessed through queries submitted to Web databases, and the returned data records are wrapped in dynamically generated Web pages (called deep Web pages in this paper). Extracting structured data from deep Web pages is a challenging problem because of the intricate underlying structure of such pages. Until now, a large number of techniques have been proposed to address this problem, but all of them have inherent limitations because they depend on the Web page programming language. As a popular two-dimensional medium, Web pages always display their contents in a regular layout for users to browse. This motivates us to seek a different way of extracting deep Web data, one that overcomes the limitations of previous work by using common visual features of deep Web pages. In this paper, a novel vision-based approach, the Visual Based Deep Web Data Extraction (VBDWDE) Algorithm, is proposed. This approach primarily uses the visual features of deep Web pages to perform deep Web data extraction, including data record extraction and data item extraction. We also propose a new evaluation measure, revision, to capture the amount of human effort needed to produce a perfect extraction. Our experiments on a large set of Web databases show that the proposed vision-based approach is highly effective for deep Web data extraction.


Author(s):  
Ng Qi Yau ◽  
Wan Zainon

Web Usage Mining is the computational process of discovering patterns in large data sets using methods from artificial intelligence, machine learning, statistical analysis and database systems. Its goal is to extract valuable information from the server access logs of World Wide Web data repositories and transform it into an understandable structure for further use. This paper focuses on exploring methods that expedite the log mining process, presenting the results of log mining through data visualization, and comparing data mining algorithms. Classification techniques are compared using precision, recall and ROC area as the measures. The study finds that Naïve Bayes and Bayes Network are the best-performing algorithms under these measures.
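
The three comparison measures named above can be computed as in the following sketch for a binary classifier. It is a plain illustration of the standard definitions, not code from the paper; the function names are hypothetical, labels are assumed to be 0 or 1, and ROC area is computed as the fraction of positive/negative pairs ranked correctly (ties count as half).

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def roc_auc(y_true, scores):
    """Area under the ROC curve via pairwise ranking of positives vs. negatives."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC area needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))
```

The pairwise form is quadratic in the number of examples, which is acceptable for a sketch; a rank-based computation would be preferable on large logs.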


2012 ◽  
Vol 433-440 ◽  
pp. 5152-5156
Author(s):  
Guang Nan Guo ◽  
Yong Gang Yun ◽  
Mei Chu ◽  
Hong Yan Shi ◽  
Ke Gong Yin

To address the main current challenges of Web mining and personalized services, the basic K-Means clustering algorithm was studied, including its algorithm flow and limitations. The basic algorithm requires the number of clusters to be fixed in advance, depends heavily on the selection of initial centers, and is particularly sensitive to noise and edge data. To overcome these shortcomings, an improved density-based adaptive K-Means algorithm is presented. It performs an initial classification step followed by K-Means iteration to reduce the impact of these problems and improve clustering quality. Experiments on Web log clustering verified its effectiveness.
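
For reference, the basic K-Means flow whose limitations the abstract lists can be sketched as follows. This is the standard algorithm only, not the improved density-based variant, whose details the abstract does not give. The caller supplies the initial centers, which makes both the fixed cluster count and the dependence on initialization explicit; the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
        for p in points
    ]

def kmeans(points, initial_centers, max_iter=100):
    """Basic K-Means: alternate assignment and mean update until stable."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                dim = len(old)
                new_centers.append(
                    tuple(sum(p[d] for p in members) / len(members) for d in range(dim))
                )
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assign(points, centers)
```

Because each center is a plain mean, a single distant outlier can drag it away from the bulk of its cluster, which is the noise sensitivity the abstract refers to.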

