Implementation of Web Data Mining Technology Based on Python

2021, Vol. 2066 (1), pp. 012033
Author(s): Guilian Feng

With the arrival of the era of big data, people have gradually realized the importance of data: data is not just a resource but an asset. This paper studies the implementation of Web data mining technology based on Python. It analyzes the overall architecture of a distributed web crawler system and then examines in detail the principles of the crawler's functional modules, including URL management, web page crawling, web page parsing and data storage. Each functional module was tested on an experimental computer, and the resulting data were summarized for comparative analysis. The main contribution of this paper is the design and implementation of a distributed web crawler system that, to a certain extent, solves the slow speed, low efficiency and poor scalability of traditional single-machine web crawlers, and improves the speed and efficiency with which a crawler captures information and web page data.
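The four modules named in this abstract (URL, crawl, parse, store) can be illustrated with a minimal single-node Python sketch using only the standard library. The paper's own code is not reproduced here: every name below (`UrlManager`, `assign_node`, `fetch`, `PageParser`, `crawl`) is hypothetical, and hashing the host to pick a worker is an assumed way of distributing URLs across nodes, not necessarily the paper's scheme.

```python
import sqlite3
import urllib.request
import zlib
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class UrlManager:
    """URL module (hypothetical): FIFO frontier plus a 'seen' set to avoid re-crawling."""

    def __init__(self, seeds):
        self.frontier = deque()
        self.seen = set()
        for url in seeds:
            self.add(url)

    def add(self, url):
        url, _ = urldefrag(url)  # drop '#fragment' so URL variants deduplicate
        if url not in self.seen:
            self.seen.add(url)
            self.frontier.append(url)

    def next(self):
        return self.frontier.popleft() if self.frontier else None


def assign_node(url, num_nodes):
    """Assumed partitioning rule: hash the host so each host maps to one worker node."""
    host = urlparse(url).netloc.encode()
    return zlib.crc32(host) % num_nodes


def fetch(url, timeout=10):
    """Crawl module: download a page and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


class PageParser(HTMLParser):
    """Parsing module: collect the <title> text and absolute <a href> links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(seeds, fetcher=fetch, max_pages=50, db_path=":memory:"):
    """Run the URL -> crawl -> parse -> store loop on one node."""
    db = sqlite3.connect(db_path)  # storage module
    db.execute("CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, title TEXT)")
    urls = UrlManager(seeds)
    count = 0
    while count < max_pages:
        url = urls.next()
        if url is None:
            break  # frontier exhausted
        try:
            html = fetcher(url)
        except (OSError, ValueError):
            continue  # skip pages that fail to download or have bad URLs
        parser = PageParser(url)
        parser.feed(html)
        parser.close()
        db.execute("INSERT OR REPLACE INTO pages VALUES (?, ?)", (url, parser.title.strip()))
        for link in parser.links:
            if link.startswith(("http://", "https://")):
                urls.add(link)
        count += 1
    db.commit()
    return db
```

In a distributed deployment, each node would run `crawl` only on URLs for which `assign_node(url, num_nodes)` returns its own index and forward the rest; passing a `fetcher` argument also lets the loop be tested without network access.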

2014, Vol. 543-547, pp. 3490-3493
Author(s): Yan Zhang

With the rapid development of cloud computing technology, traditional centralized data mining has become unsuitable for today's rapidly growing volumes of data. Web data mining based on cloud computing has come into use because it is a reliable and efficient approach. This article introduces the meaning, characteristics and current state of cloud computing, analyzes the advantages of performing Web data mining on top of cloud computing, surveys the current state, challenges and problems of research on cloud-computing-based Web data mining, and proposes corresponding methods to solve these problems.


2014, Vol. 644-650, pp. 2124-2127
Author(s): Fen Liu

With the rapid development of the Internet, it has become an important resource for transmitting and sharing information. Web data is semi-structured, heterogeneous and massive in volume, so traditional data mining techniques cannot be applied to Web data sources directly. Web data mining refers to extracting potentially useful patterns from Web documents or Web activity. Because XML is structured and extensible, research combining XML with Web data mining has also become popular.


2011, Vol. 403-408, pp. 1062-1067
Author(s): Payalpreet Kaur, Raghu Garg, Ravinder Singh, Mandeep Singh

Web data mining is a field that has gained popularity in recent years with the advancement of web mining technologies. Web data mining is the extraction of data from the Web: it crawls through various web resources to collect required information, enabling an individual or a company to promote a business and to understand market dynamics, new promotions appearing on the Internet, and so on. Data on the Web is unstructured and irregular and lacks a fixed, unified pattern, because it is published in HTML, which describes how data is presented and cannot handle semi-structured or unstructured data. These difficulties led to the emergence of XML-based web data mining. XML was created so that richly structured documents could be used over the Web, and it provides a standard for data exchange and data storage. This paper presents a web data mining model based on XML. In this model, unstructured data is first transformed into XML, and the XML document is then stored in a database in the form of a string tree; specific records are then searched for using a LINQ query. If a record does not exist in the database, the model checks the specific website for updates and repeats the same steps. Finally, the data selected by the LINQ query is displayed in a web browser. Data extraction is faster because the database keeps data previously extracted by one user, which other users can retrieve by issuing a LINQ query. Because the XML document is stored in the database as a string tree, the model does not require a separate XSL file. The model is implemented in C# with XML.
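The store-then-query-then-refresh cycle described in this abstract can be sketched in Python, although the paper itself uses C# and LINQ. This is a hypothetical stand-in, not the authors' implementation: a list comprehension plays the role of the LINQ query, SQLite holds each site's XML document as a plain string, and `scrape` is an assumed caller-supplied function that returns records as dictionaries.

```python
import sqlite3
import xml.etree.ElementTree as ET


def open_db(path=":memory:"):
    """Create the cache table: one XML document string per site."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS docs (site TEXT PRIMARY KEY, xml TEXT)")
    return db


def records_to_xml(site, records):
    """Transform scraped field dictionaries into one XML document string."""
    root = ET.Element("site", url=site)
    for rec in records:
        item = ET.SubElement(root, "record")
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def store(db, site, xml_text):
    """Cache the XML document as a string so later users can reuse it."""
    db.execute("INSERT OR REPLACE INTO docs (site, xml) VALUES (?, ?)", (site, xml_text))
    db.commit()


def query(db, site, field, value):
    """LINQ stand-in: return records whose <field> text equals value."""
    row = db.execute("SELECT xml FROM docs WHERE site = ?", (site,)).fetchone()
    if row is None:
        return []
    root = ET.fromstring(row[0])
    return [
        {child.tag: child.text for child in rec}
        for rec in root.iter("record")
        if rec.findtext(field) == value
    ]


def lookup(db, site, field, value, scrape):
    """Query the cache; on a miss, re-scrape the site once, store, and retry."""
    hits = query(db, site, field, value)
    if not hits:
        store(db, site, records_to_xml(site, scrape(site)))
        hits = query(db, site, field, value)
    return hits
```

Only a cache miss triggers `scrape`, which mirrors the abstract's point that previously extracted data can be served to other users without re-crawling the site.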

