Research on the Web Data Preprocessing

Author(s):  
Chaodong Lu ◽  
Xin Xiong
Keyword(s):  
Web Data ◽  
Author(s):  
SUPRIYA KUMAR DE ◽  
P. RADHA KRISHNA

Clustering of data in a large dimension space is of great interest in many data mining applications. In this paper, we propose a method for clustering of web usage data in a high-dimensional space based on a concept hierarchy model. In this method, the relationship present in the web usage data are mapped into a fuzzy proximity relation of user transactions. We also described an approach to present the preference set of URLs to a new user transaction based on the match score with the clusters. The study demonstrates that our approach is general and effective for mining the web data for web personalization.


Author(s):  
Wei Shen ◽  
Jianyong Wang ◽  
Ping Luo ◽  
Min Wang

Relation extraction from the Web data has attracted a lot of attention recently. However, little work has been done when it comes to the enterprise data regardless of the urgent needs to such work in real applications (e.g., E-discovery). One distinct characteristic of the enterprise data (in comparison with the Web data) is its low redundancy. Previous work on relation extraction from the Web data largely relies on the data's high redundancy level and thus cannot be applied to the enterprise data effectively. This chapter reviews related work on relation extraction and introduces an unsupervised hybrid framework REACTOR for semantic relation extraction over enterprise data. REACTOR combines a statistical method, classification, and clustering to identify various types of relations among entities appearing in the enterprise data automatically. REACTOR was evaluated over a real-world enterprise data set from HP that contains over three million pages and the experimental results show its effectiveness.


Author(s):  
Charles Greenidge ◽  
Hadrian Peter

Data warehouses have established themselves as necessary components of an effective Information Technology (IT) strategy for large businesses. In addition to utilizing operational databases data warehouses must also integrate increasing amounts of external data to assist in decision support. An important source of such external data is the Web. In an effort to ensure the availability and quality of Web data for the data warehouse we propose an intermediate data-staging layer called the Meta-Data Engine (M-DE). A major challenge, however, is the conversion of data originating in the Web, and brought in by robust search engines, to data in the data warehouse. The authors therefore also propose a framework, the Semantic Web Application (SEMWAP) framework, which facilitates semi-automatic matching of instance data from opaque web databases using ontology terms. Their framework combines Information Retrieval (IR), Information Extraction (IE), Natural Language Processing (NLP), and ontology techniques to produce a matching and thus provide a viable building block for Semantic Web (SW) Applications.


Author(s):  
Ahmed El Azab ◽  
Mahmood A. Mahmood ◽  
Abd El-Aziz

Web usage mining techniques and applications across industries is still exploratory and, despite an increase in academic research, there are challenge of analyze web which quantitatively capture web users' common interests and characterize their underlying tasks. This chapter addresses the problem of how to support web usage mining techniques and applications across industries by combining language of web pages and algorithms that used in web data mining. Existing research in web usage mining techniques tend to focus on finding out how each techniques can apply in different industries fields. However, there is little evidence that researchers have approached the issue of web usage mining across industries. Consequently, the aim of this chapter is to provide an overview of how the web usage mining techniques and applications across industries can be supported.


2017 ◽  
Vol 7 (1.1) ◽  
pp. 286
Author(s):  
B. Sekhar Babu ◽  
P. Lakshmi Prasanna ◽  
P. Vidyullatha

 In current days, World Wide Web has grown into a familiar medium to investigate the new information, Business trends, trading strategies so on. Several organizations and companies are also contracting the web in order to present their products or services across the world. E-commerce is a kind of business or saleable transaction that comprises the transfer of statistics across the web or internet. In this situation huge amount of data is obtained and dumped into the web services. This data overhead tends to arise difficulties in determining the accurate and valuable information, hence the web data mining is used as a tool to determine and mine the knowledge from the web. Web data mining technology can be applied by the E-commerce organizations to offer personalized E-commerce solutions and better meet the desires of customers. By using data mining algorithm such as ontology based association rule mining using apriori algorithms extracts the various useful information from the large data sets .We are implementing the above data mining technique in JAVA and data sets are dynamically generated while transaction is processing and extracting various patterns.


Author(s):  
Amrapali Zaveri ◽  
Andrea Maurino ◽  
Laure-Berti Equille

The standardization and adoption of Semantic Web technologies has resulted in an unprecedented volume of data being published as Linked Data (LD). However, the “publish first, refine later” philosophy leads to various quality problems arising in the underlying data such as incompleteness, inconsistency and semantic ambiguities. In this article, we describe the current state of Data Quality in the Web of Data along with details of the three papers accepted for the International Journal on Semantic Web and Information Systems' (IJSWIS) Special Issue on Web Data Quality. Additionally, we identify new challenges that are specific to the Web of Data and provide insights into the current progress and future directions for each of those challenges.


Author(s):  
Qiankun Zhao ◽  
Sourav Saha Bhowmick

Nowadays the Web poses itself as the largest data repository ever available in the history of humankind (Reis et al., 2004). However, the availability of huge amount of Web data does not imply that users can get whatever they want more easily. On the contrary, the massive amount of data on the Web has overwhelmed their abilities to find the desired information. It has been claimed that 99% of the data reachable on the Web is useless to 99% of the users (Han & Kamber, 2000, pp. 436). That is, an individual may be interested in only a tiny fragment of the Web data. However, the huge and diverse properties of Web data do imply that Web data provides a rich and unprecedented data mining source.


Author(s):  
Lan-zhong Wang

The purpose of this study is to develop a distance personalized teaching platform. The web data mining is used for the construction of the system and by analyzing the character of web data mining (WDM) and the essence of personalization teaching and instruction, based on WDM, The system contains knowledge base, individual database, WDM and web server four modules. The web data mining is used for the construction of the system and by analyzing the character of web data mining (WDM) and the essence of personalization teaching and instruction. Simulation results show that model has important enlightenment and pushing effect for promoting the individual service and improving teaching quality of modern distance education.


Sign in / Sign up

Export Citation Format

Share Document