web data mining
Recently Published Documents


TOTAL DOCUMENTS

189
(FIVE YEARS 21)

H-INDEX

9
(FIVE YEARS 0)

2021 ◽  
Vol 2066 (1) ◽  
pp. 012033
Author(s):  
Guilian Feng

With the arrival of the era of big data, people have gradually realized the importance of data: it is not just a resource but an asset. This paper studies the implementation of Web data mining technology in Python. It analyzes the overall architecture of a distributed web crawler system, and then examines in detail the principles of the crawler's URL module, page-fetching module, page-parsing module, data storage module, and other components. Each module of the crawler system was tested on an experimental computer, and the results were summarized for comparative analysis. The main contribution is the design and implementation of a distributed web crawler system which, to a certain extent, addresses the slow speed, low efficiency, and poor scalability of traditional single-machine web crawlers, and improves the speed and efficiency with which the crawler collects information and web page data.
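
The module breakdown named in this abstract (URL handling, fetching, parsing, storage) can be illustrated with a minimal single-process sketch. It is not the paper's system: the `PAGES` dictionary, the `example.test` URLs, and all function names are hypothetical, the "web" is held in memory so the code runs without network access, and no distribution across machines is attempted.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical in-memory "web" (assumption) so the sketch needs no network.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}


class LinkParser(HTMLParser):
    """Parsing module: collects href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Fetch module: returns the page HTML, or None for an unknown URL."""
    return PAGES.get(url)


def parse_links(base_url, html):
    """Resolves every link on the page against the page's own URL."""
    parser = LinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.links]


def crawl(seed, max_pages=10):
    frontier = deque([seed])  # URL module: queue of URLs still to visit
    seen = {seed}             # URLs already queued, to avoid revisiting
    store = {}                # storage module: url -> outgoing links
    while frontier and len(store) < max_pages:
        url = frontier.popleft()
        html = fetch(url)
        if html is None:
            continue
        links = parse_links(url, html)
        store[url] = links
        for link in links:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return store


result = crawl("http://example.test/")
```

In a distributed design the frontier and the seen-set would typically be shared between worker processes; the abstract does not say how the paper does this, so the sketch leaves it out.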


2021 ◽  
Author(s):  
Yonghong Ma ◽  
Jiao Tan ◽  
Dongning Zhang ◽  
Ke Men ◽  
Mingjuan Shi ◽  
...  

Author(s):  
Ya Wang

A good understanding of user behavior and consumption preferences can help website operators improve their service quality. However, existing personalized recommendation systems generally suffer from low Web data mining efficiency, a low degree of automated recommendation, and low durability. To address these issues, this paper first establishes a user behavior identification and personalized recommendation model based on Web data mining. It describes the user behavior analysis process based on Web data mining, improves the traditional k-means algorithm, and gives the detailed execution steps of the improved algorithm. It also elaborates on the K-nearest-neighbor model based on user scoring information, the score matrix decomposition method, and the personalized recommendation method for network users. Finally, experimental results verify the effectiveness of the constructed model.
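
As an illustration of a K-nearest-neighbor model built on user scoring information, the following is a generic user-based sketch, not the authors' model. The `RATINGS` data, the choice of cosine similarity, and the function names are all assumptions made for the example.

```python
import math

# Hypothetical user -> {item: score} ratings (assumption, for illustration).
RATINGS = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 5, "b": 3, "c": 4, "d": 5},
    "u3": {"a": 1, "b": 5, "d": 1},
}


def cosine(r1, r2):
    """Cosine similarity over the items both users have scored."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = math.sqrt(sum(r1[i] ** 2 for i in common))
    n2 = math.sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def predict(user, item, k=2):
    """Similarity-weighted mean score of the k most similar users who rated item."""
    neighbors = [
        (cosine(RATINGS[user], scores), scores[item])
        for other, scores in RATINGS.items()
        if other != user and item in scores
    ]
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbors
    return sum(sim * score for sim, score in top) / total


prediction = predict("u1", "d")
```

Here `u2` agrees with `u1` on every shared item and rated `d` highly, while `u3` is less similar and rated it low, so the prediction lands between the two scores but closer to that of `u2`.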


2021 ◽  
pp. 1-10
Author(s):  
Wenjing Wang ◽  
Shanti C. Sandaran

To improve the translation of metaphors in political texts, this paper constructs a political text metaphor translation system based on Web data mining technology. To address two shortcomings of the K-Means algorithm, the selection of the initial center points and the sensitivity to isolated points, the paper proposes the ICKM algorithm, which combines the density parameter with the coordinate rotation algorithm. The algorithm uses the object with the largest density parameter as the first center point and uses the KCR algorithm to find the next center point, which to a certain extent avoids the influence of isolated points on the data sample. The system must translate political texts accurately and also meet the requirements of metaphor translation. Finally, the paper designs experiments to verify the system's performance. The results show that the constructed system can meet the needs of political text metaphor translation.
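
The idea of seeding K-Means from the densest object can be sketched as follows. This is not ICKM itself: the abstract does not define the density parameter or the KCR step, so the sketch assumes that density is the number of other points within a radius `eps`, and it substitutes a plain farthest-point heuristic, restricted to non-isolated points, for KCR. All names and sample data are hypothetical.

```python
import math


def density(points, i, eps):
    """Assumed density parameter: count of other points within eps of points[i]."""
    return sum(
        1
        for j, q in enumerate(points)
        if j != i and math.dist(points[i], q) <= eps
    )


def init_centers(points, k, eps):
    """Picks k initial centers: densest point first, then farthest dense points."""
    dens = [density(points, i, eps) for i in range(len(points))]
    first = max(range(len(points)), key=lambda i: dens[i])
    centers = [points[first]]
    while len(centers) < k:
        # Stand-in for KCR (assumption): skip isolated points (density 0)
        # and take the candidate farthest from every center chosen so far.
        candidates = [
            p for i, p in enumerate(points) if dens[i] > 0 and p not in centers
        ]
        if not candidates:
            break
        nxt = max(candidates, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers


POINTS = [
    (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),  # dense group
    (10, 10), (10, 11), (11, 11),                # second group
    (50, 50),                                    # isolated point
]
centers = init_centers(POINTS, k=2, eps=1.2)
```

With this data the first center is the middle of the dense group, and the isolated point at `(50, 50)` is never selected even though it is the farthest point from that center.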


Author(s):  
Poli Venkata Subba Reddy

Data mining is a knowledge discovery process. It has to deal with both exact and inexact information. Statistical methods deal with inexact information on the basis of likelihood. Zadeh's fuzzy logic also deals with inexact information, but it is based on belief and is simple to use. Data mining consists of methods and classifications, which are discussed for both exact and inexact information. Retrieval of information is important in data mining. Time and space complexity are high for big data and need to be reduced. Time complexity is reduced through the consecutive retrieval (C-R) property, and space complexity is reduced with blackboard systems. Data mining for web-based data is discussed. In web data mining, the original data have to be disclosed. Fuzzy web data mining is discussed for data security. Fuzzy web programming is also discussed. Data mining, fuzzy data mining, and web data mining are discussed through MapReduce algorithms.
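
A MapReduce-style computation combined with a fuzzy membership value can be sketched in a single process as below. This is an illustration only, not the chapter's algorithm: the `LOG` records, the fuzzy set "popular page", and the linear membership function with its `full_at` threshold are assumptions.

```python
from collections import defaultdict

# Hypothetical page-request log (assumption, for illustration).
LOG = ["/home", "/cart", "/home", "/help", "/home", "/cart"]


def map_phase(records):
    """Map: emit a (page, 1) pair for every request."""
    for page in records:
        yield page, 1


def shuffle(pairs):
    """Shuffle: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each key."""
    return {key: sum(values) for key, values in groups.items()}


def popular_membership(count, full_at=3):
    """Assumed membership in the fuzzy set 'popular page', a value in [0, 1]."""
    return min(1.0, count / full_at)


counts = reduce_phase(shuffle(map_phase(LOG)))
memberships = {page: popular_membership(c) for page, c in counts.items()}
```

Instead of a crisp popular/unpopular label, each page receives a degree of membership, which is the kind of inexact information fuzzy logic is meant to represent.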

