From Web Log to Data Warehouse

Author(s):  
John M. Artz

Data warehousing is an emerging technology that greatly extends the capabilities of relational databases specifically in the analysis of very large sets of time-oriented data. The emergence of data warehousing has been somewhat eclipsed by the simultaneous emergence of Web technologies. However, Web technologies and data warehousing have some natural synergies that are just now being recognized. First, Web technologies make data warehouse data more easily available to a much wider variety of users both internally and externally. Since the value of data is directly related to its availability for exploitation, Internets and intranets help increase the value of the data in the warehouse. Second, data warehouse technologies can be used to analyze traffic to a Web site in a wide variety of ways in order to make the Web site more effective. This chapter will focus on the latter of these synergies and show, through an evolving example, how a simple data set from the Web log can be enhanced, in a step-wise fashion, into a full-fledged market research data warehouse.
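As a rough illustration of the first step in such an evolving example, the sketch below turns one raw Web log line into a row that could be loaded into a page-hit fact table. It assumes the server writes Common Log Format; the column names (`date_key`, `visitor_host`, and so on) are hypothetical and are not taken from the chapter.

```python
import re
from datetime import datetime

# Common Log Format: host ident user [time] "request" status bytes
CLF = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\d+|-)'
)

def to_fact_row(line):
    """Turn one log line into a hypothetical page-hit fact row, or None if it does not parse."""
    m = CLF.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return {
        "date_key": ts.strftime("%Y%m%d"),   # join key for a date dimension
        "hour": ts.hour,                      # supports time-of-day analysis
        "visitor_host": m.group("host"),
        "page": m.group("path"),
        "status": int(m.group("status")),
        "bytes_sent": 0 if m.group("size") == "-" else int(m.group("size")),
    }

sample = '192.0.2.7 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
row = to_fact_row(sample)
```

Further dimensions (visitor, page category, referrer) could then be added to each row one at a time, which is the kind of step-wise enhancement the abstract describes.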

2008 ◽  
pp. 3411-3415
Author(s):  
John M. Artz

Data warehousing is an emerging technology that greatly extends the capabilities of relational databases specifically in the analysis of very large sets of time-oriented data. The emergence of data warehousing has been somewhat eclipsed over the past decade by the simultaneous emergence of Web technologies. However, Web technologies and data warehousing have some natural synergies that are not immediately obvious. First, Web technologies make data warehouse data more easily available to a much wider variety of users. Second, data warehouse technologies can be used to analyze traffic to a Web site in order to gain a much better understanding of the visitors to the Web site. It is this second synergy that is the focus of this article.


2008 ◽  
pp. 2364-2370
Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, until recently, data warehouses were restricted to modeling essentially numerical data – examples being sales figures in the business arena (e.g. Wal-Mart’s data warehouse) and astronomical data (e.g. SKICAT) in scientific research, with textual data playing a descriptive rather than a central role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for non-numeric data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


Author(s):  
Janet Delve

Data Warehousing is now a well-established part of the business and scientific worlds. However, until recently, data warehouses were restricted to modeling essentially numerical data – examples being sales figures in the business arena (for example, Wal-Mart’s data warehouse (Westerman, 2000)) and astronomical data (for example, SKICAT) in scientific research, with textual data playing a descriptive rather than a central analytic role. The inability of data warehouses to cope with mainly non-numeric data is particularly problematic for humanities research utilizing material such as memoirs and trade directories. Recent innovations have opened up possibilities for ‘non-numeric’ data warehouses, making them widely accessible to humanities research for the first time. Due to its irregular and complex nature, humanities research data is often difficult to model, and manipulating time shifts in a relational database is problematic, as is fitting such data into a normalized data model. History and linguistics are exemplars of areas where relational databases are cumbersome and which would benefit from the greater freedom afforded by data warehouse dimensional modeling.


Author(s):  
Wilfred Ng ◽  
Mark Levene

Data warehousing is a corporate strategy that needs to integrate information from several sources of separately developed Database Management Systems (DBMSs). A future DBMS of a data warehouse should provide adequate facilities to manage a wide range of information arising from such integration. We propose that the capabilities of database languages should be enhanced to manipulate user-defined data orderings, since business queries in an enterprise usually involve order. We extend the relational model to incorporate partial orderings into data domains and describe the ordered relational model. We have already defined and implemented a minimal extension of SQL, called OSQL, which allows querying over ordered relational databases. One of the important facilities provided by OSQL is that it allows users to capture the underlying semantics of the ordering of the data for a given application. Herein we demonstrate that OSQL aided with a package discipline can be an effective means to manage the inter-related operations and the underlying data domains of a wide range of advanced applications that are vital in data warehousing, such as temporal, incomplete and fuzzy information. We present the details of the generic operations arising from these applications in the form of three OSQL packages called: OSQL_TIME, OSQL_INCOMP and OSQL_FUZZY.
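To make the idea of a user-defined ordering on a data domain concrete, here is a minimal sketch in Python. It is not OSQL and does not reproduce its syntax; the `ORDER` relation, the `less_than` helper, and the employee data are all hypothetical. It only shows a query that filters tuples by a partial order the user supplies, rather than by the built-in alphabetical or numeric order.

```python
# Hypothetical user-defined partial order on a domain of job ranks,
# given as (lower, higher) pairs. "clerk" and "engineer" are incomparable.
ORDER = {("clerk", "manager"), ("engineer", "manager"), ("manager", "director")}

def less_than(a, b, order=ORDER):
    """True if a strictly precedes b in the transitive closure of the order."""
    frontier, seen = [a], set()
    while frontier:
        x = frontier.pop()
        for lo, hi in order:
            if lo == x and hi not in seen:
                if hi == b:
                    return True
                seen.add(hi)
                frontier.append(hi)
    return False

employees = [("Ann", "clerk"), ("Bob", "manager"), ("Cy", "engineer"), ("Di", "director")]

# "Select the employees whose rank is above engineer" under the user-defined order.
above_engineer = [name for name, rank in employees if less_than("engineer", rank)]
```

Because the order is partial, neither `less_than("clerk", "engineer")` nor `less_than("engineer", "clerk")` holds, which a plain sort key could not express.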


1999 ◽  
Vol 08 (02) ◽  
pp. 207-227 ◽  
Author(s):  
CHEN-CHUNG LIU ◽  
GWO-DONG CHEN ◽  
KUO-LIANG OU ◽  
BAW-JHIUNE LIU ◽  
JORNG-TZONG HORNG

The World Wide Web has been widely accepted as a viable communication infrastructure to support collaborative activities on computer networks. While cooperating objects of different roles can easily and freely communicate knowledge on the web, the web site managers/developers must write programs to manage the communication behavior in collaborative activities. However, the current hypertext model for the web concentrates on the static structure of hypertext. Few conceptual specifications are capable of effectively integrating the hypertext model with activity dynamics to clarify the dynamic interaction and constraints of desired collaborative activities on the web. Furthermore, decision-makers must observe communication behavior on the web to adapt collaborative activities. Although web servers register each web access in a web log, up to now, only a few query or report mechanisms have been available to obtain required information from the web log. This study presents a specification to capture the static and dynamic structure of intended collaborative activities, and a query mechanism to obtain required information from the web log. The specification and query mechanism make it possible to construct a web site that will provide group activity space and flexibly interpret roles, encourage individuals to commit to responsibilities, and enable activities to be observed.


Author(s):  
Panagiotis Giannikopoulos ◽  
Iraklis Varlamis ◽  
Magdalini Eirinaki

The Web is a continuously evolving environment, since its content is updated on a regular basis. As a result, the traditional usage-based approach to generating recommendations, which takes as input the navigation paths recorded at the Web page level, is not as effective. Moreover, most of the content available online is either explicitly or implicitly characterized by a set of categories organized in a taxonomy, allowing the page-level navigation patterns to be generalized to a higher, aggregate level. In this direction, the authors present the Frequent Generalized Pattern (FGP) algorithm. FGP takes as input the transaction data and a hierarchy of categories and produces generalized association rules that contain transaction items and/or item categories. The results can be used to generate association rules and subsequently recommendations for the users. The algorithm can be applied to the log files of a typical Web site; however, it can be more helpful in a Web 2.0 application, such as a feed aggregator or a digital library mediator, where content is semantically annotated and the taxonomic nature is more complex, which requires extending FGP into a version called FGP+. The authors experimentally evaluate both algorithms using Web log data collected from a newspaper Web site.
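The sketch below illustrates the general idea of generalizing page-level patterns through a taxonomy. It is not the authors' FGP algorithm: it uses the simpler, well-known approach of extending each transaction with the ancestor categories of its items and then counting co-occurring pairs. The taxonomy, the session data, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy: page -> parent category
PARENT = {
    "/sports/football": "sports",
    "/sports/tennis": "sports",
    "/politics/election": "politics",
}

def ancestors(item):
    """Return all categories above an item in the taxonomy."""
    out = []
    while item in PARENT:
        item = PARENT[item]
        out.append(item)
    return out

def extend(transaction):
    """Add every ancestor category of every item in the transaction."""
    items = set(transaction)
    for it in transaction:
        items.update(ancestors(it))
    return items

def frequent_pairs(transactions, min_support):
    """Count pairs of items and/or categories that co-occur in at least min_support transactions."""
    counts = Counter()
    for t in transactions:
        for a, b in combinations(sorted(extend(t)), 2):
            # An item always co-occurs with its own ancestors, so skip those pairs.
            if a in ancestors(b) or b in ancestors(a):
                continue
            counts[(a, b)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

sessions = [
    ["/sports/football", "/politics/election"],
    ["/sports/tennis", "/politics/election"],
    ["/sports/football"],
]
result = frequent_pairs(sessions, min_support=2)
```

In this toy data no pair of individual pages reaches a support of 2, but the generalized pairs `("/politics/election", "sports")` and `("politics", "sports")` do, which is the kind of pattern that page-level mining alone would miss.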


2018 ◽  
Vol 4 (1) ◽  
pp. 11-16
Author(s):  
Fawaid Badri

Sentiment analysis is a field of text- and information-based research. The text documents in this study come from the web and concern social issues. The study uses the MapReduce algorithm to count the words that are then used to find meaning in the context of public opinion. MapReduce takes a data set and converts it into another data set in which individual elements are broken down into tuples. In the map stage, the input text stored in HDFS (Hadoop Distributed File System) is read and converted into key-value tuples. The next step is the shuffle and reduce process, which produces the result from the processed data set. The study reports that applying MapReduce in sentiment analysis reduces the amount of data very well.
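The map, shuffle, and reduce stages described above can be sketched in plain Python without a Hadoop cluster. This is a minimal in-memory word count, not the study's implementation; the function names and the sample sentences are hypothetical, and reading from HDFS is replaced by a list of strings.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) tuple for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse the values of one key into a single count."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())

counts = word_count(["good service", "Good price bad service"])
```

The resulting counts could then be matched against a sentiment lexicon; that step is outside this sketch.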

