Data warehouse clustering on the web

2005 ◽  
Vol 160 (2) ◽  
pp. 353-364 ◽  
Author(s):  
Aristides Triantafillakis ◽  
Panagiotis Kanellis ◽  
Drakoulis Martakos
Author(s):  
Charles Greenidge ◽  
Hadrian Peter

Data warehouses have established themselves as necessary components of an effective Information Technology (IT) strategy for large businesses. In addition to utilizing operational databases, data warehouses must also integrate increasing amounts of external data to assist in decision support. An important source of such external data is the Web. In an effort to ensure the availability and quality of Web data for the data warehouse, we propose an intermediate data-staging layer called the Meta-Data Engine (M-DE). A major challenge, however, is the conversion of data originating on the Web, and brought in by robust search engines, into data in the data warehouse. The authors therefore also propose a framework, the Semantic Web Application (SEMWAP) framework, which facilitates semi-automatic matching of instance data from opaque web databases using ontology terms. Their framework combines Information Retrieval (IR), Information Extraction (IE), Natural Language Processing (NLP), and ontology techniques to produce a matching and thus provide a viable building block for Semantic Web (SW) applications.
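The abstract does not specify how SEMWAP scores candidate matches. Purely as an illustration of semi-automatic matching against ontology terms, the Python sketch below compares field labels extracted from a Web source with ontology term labels using token-set Jaccard similarity, a simple technique swapped in here for the unspecified IR/IE/NLP pipeline. The function names, the 0.5 threshold, and the sample labels are all hypothetical. Labels scoring below the threshold are left unmatched for a human to review.

```python
import re

# Hypothetical sketch: Jaccard token overlap stands in for SEMWAP's
# (unspecified) IR/IE/NLP matching; names and threshold are assumptions.

def tokens(text: str) -> set[str]:
    """Lowercase a label and split it into alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def match_to_ontology(labels, ontology_terms, threshold=0.5):
    """Map each extracted label to its best-scoring ontology term.

    Labels whose best score is below `threshold` map to None, leaving
    them for manual confirmation (the "semi-automatic" part).
    """
    result = {}
    for label in labels:
        label_tokens = tokens(label)
        best_term, best_score = None, 0.0
        for term in ontology_terms:
            score = jaccard(label_tokens, tokens(term))
            if score > best_score:
                best_term, best_score = term, score
        result[label] = best_term if best_score >= threshold else None
    return result

print(match_to_ontology(
    ["Product Price", "ship date", "colour"],
    ["price of product", "shipping date", "delivery address"],
))
# prints {'Product Price': 'price of product', 'ship date': None, 'colour': None}
```

Note that "ship date" is not matched to "shipping date" because plain token overlap ignores stemming; a real pipeline would need NLP normalization of this kind.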


Author(s):  
Giorgio Poletti

An analysis of the reality surrounding us clearly reveals the vast amount of information available in different forms and through different media. Volumes of information available in real time and via the Web are perceived as closely related concepts. This perception is supported by the observation that the objective of the Web was the definition and construction of a universal archive, a virtual site in which documents could be accessed without limits of time or space. In this digital library, documents have to be equipped with logical connections that make it possible for each user to define a reading map that expands as their demand for knowledge gradually builds up. This perspective now points toward the Semantic Web: a network that satisfies our requests by understanding them, not through some magical telepathic communication between browser and navigator, but because it is a data warehouse in which documents are matched to meta-data, letting specialized software distinguish the fields, importance, and correlations of documents. The Semantic Web and library concepts are ever more closely related, a relationship fundamental to progress and to didactic efficiency in the knowledge society.


Author(s):  
Hadrian Peter

Data warehouses have established themselves as necessary components of an effective IT strategy for large businesses. To augment the streams of data being siphoned from transactional/operational databases, warehouses must also integrate increasing amounts of external data to assist in decision support. Modern warehouses can be expected to handle 100 terabytes or more of data (Berson and Smith, 1997; Devlin, 1998; Inmon, 2002; Imhoff et al., 2003; Schwartz, 2003; Day, 2004; Peter and Greenidge, 2005; Winter and Burns, 2006; Ladley, 2007). The arrival of newer generations of tools and database vendor support has smoothed the way for current warehouses to meet the needs of the challenging global business environment (Kimball and Ross, 2002; Imhoff et al., 2003; Ross, 2006). We cannot ignore the role of the Internet in modern business and its impact on data warehouse strategies. The Web represents the richest source of external data known to man (Zhenyu et al., 2002; Chakrabarti, 2002; Laender et al., 2002), but we must be able to couple raw text or poorly structured data on the Web with descriptions, annotations and other forms of summary meta-data (Crescenzi et al., 2001). In recent years the Semantic Web initiative has focussed on the production of “smarter data”. The basic idea is that, instead of building programs with near-human intelligence, we carefully add meta-data to existing stores so that the data becomes “marked up” with all the information necessary to allow not-so-intelligent software to perform analysis with minimal human intervention (Kalfoglou et al., 2004). The Semantic Web builds on established building-block technologies such as Unicode, URIs (Uniform Resource Identifiers) and XML (Extensible Markup Language) (Dumbill, 2000; Daconta et al., 2003; Decker et al., 2000). The modern data warehouse must embrace these emerging web initiatives.
In this paper we propose a model which provides mechanisms for sourcing external data resources for analysts in the warehouse.
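As a minimal illustration of the “marked up” idea, assuming a simple subject, predicate, object (triple) representation in the style of RDF, the sketch below shows how a trivial pattern-matching function with no language understanding at all can find pages about a topic once the meta-data is present. The example.org URIs, the vocabulary terms, and the `match` helper are hypothetical placeholders and are not part of the authors' model.

```python
from typing import Optional

# Hypothetical triple store: each entry is (subject, predicate, object).
# All example.org URIs are illustrative placeholders.
Triple = tuple[str, str, str]

ABOUT = "http://example.org/vocab#about"

triples: list[Triple] = [
    ("http://example.org/page/42", ABOUT, "http://example.org/topic/steel"),
    ("http://example.org/page/42", "http://example.org/vocab#published", "2005-03-01"),
    ("http://example.org/page/7", ABOUT, "http://example.org/topic/weather"),
]

def match(pattern: tuple[Optional[str], Optional[str], Optional[str]],
          store: list[Triple]) -> list[Triple]:
    """Return every triple equal to `pattern`, treating None as a wildcard."""
    return [t for t in store
            if all(p is None or p == v for p, v in zip(pattern, t))]

# "Not-so-intelligent" software: an exact-match lookup, made useful
# only because the pages were annotated beforehand.
steel_pages = [s for s, _, _ in match((None, ABOUT, "http://example.org/topic/steel"), triples)]
print(steel_pages)  # prints ['http://example.org/page/42']
```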


Author(s):  
Anthony Scime

Data warehouses are constructed to provide valuable and current information for decision-making. Typically this information is derived from the organization’s functional databases. The data warehouse then provides a consolidated, convenient source of data for the decision-maker. However, the available organizational information may not be sufficient to reach a decision. Information external to the organization is also often necessary for management to arrive at strategic decisions. Such external information may be available on the World Wide Web and, when added to the data warehouse, extends decision-making power. The Web can be considered a large repository of data. This data is largely unstructured and must be gathered and extracted to be made into something valuable for the organizational decision-maker. Gathering this data and placing it into the organization’s data warehouse requires an understanding of the data warehouse metadata and the use of Web mining techniques (Laware, 2005). Typically, a user conducts a Web search by using a search engine to find documents that refer to the desired subject. This requires the user to define the domain of interest as a keyword or a collection of keywords that the search engine can process. The searcher may not know how to break the domain down, thus limiting the search to the domain name. However, even given the ability to break down the domain and conduct a search, the search results have two significant problems. First, Web searches return information about a very large number of documents. Second, much of the returned information may be marginally relevant or completely irrelevant to the domain. The decision-maker may not have time to sift through the results to find the meaningful information.
A data warehouse that has already found domain-relevant Web pages can relieve the decision-maker from having to choose search keywords and from having to determine which of the documents found in a search are relevant. Such a data warehouse requires previously conducted searches to add Web information.
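The abstract does not say how relevance is judged. One simple possibility, assumed here for illustration only, is to score each search result by the fraction of domain keywords it contains and to store only results above a threshold, keyed by URL, so that later questions can be answered from the stored pages instead of a fresh search. `SearchResult`, `relevance`, `store_relevant`, the 0.5 threshold, and the example URLs are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical sketch of storing previously found, domain-relevant pages.

@dataclass
class SearchResult:
    url: str
    text: str

def relevance(text: str, domain_keywords: set[str]) -> float:
    """Fraction of the (lowercase) domain keywords that occur in the text."""
    if not domain_keywords:
        return 0.0
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(words & domain_keywords) / len(domain_keywords)

def store_relevant(results, domain_keywords, warehouse, min_score=0.5):
    """Add results scoring at least `min_score` to `warehouse`, keyed by URL.

    Irrelevant hits are discarded, so the decision-maker never sees them.
    """
    for r in results:
        score = relevance(r.text, domain_keywords)
        if score >= min_score:
            warehouse[r.url] = (score, r.text)
    return warehouse

domain = {"tariff", "steel", "import"}
hits = [
    SearchResult("https://a.example.com", "New steel import tariff announced"),
    SearchResult("https://b.example.com", "Steel guitar concert tonight"),
]
warehouse = store_relevant(hits, domain, {})
print(sorted(warehouse))  # prints ['https://a.example.com']
```

The second hit matches only one of three keywords (score 1/3), so it falls below the threshold and is not stored.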


Author(s):  
Krzysztof Wecel ◽  
Witold Abramowicz ◽  
Pawel Jan Kalczynski

Enhanced knowledge warehouse (eKW) is an extension of the enhanced data warehouse (eDW) system (Abramowicz, 2002). eKW is a Web services-based system that allows the automatic filtering of information from the Web to the data warehouse and automatic retrieval through the data warehouse. Web services technology extends eKW beyond the organization. It makes the system open and allows utilization of external software components, thus enabling the creation of distributed applications.


Author(s):  
Witold Abramowicz ◽  
Pawel Jan Kalczynski ◽  
Krzysztof Wecel

The data warehouse is considered to be the best way to organize transactional data. However, as many researchers claim, the data warehouse should be augmented with external textual information. The objective of this chapter is to examine the requirements for profiling in the data warehouse environment. Profiles created in the data warehouse are then utilized to filter information. The goal of the sketched system is to support users in their situated actions. We explore many issues concerning personalization, such as information overload, user models, and situatedness. We also analyze the factors that influence the filtering process. Finally, we draw some conclusions that should be considered when extending the evaluated system.
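The chapter's profiling method is not given in the abstract. A common baseline for profile-based filtering, used here only as a stand-in, represents the user profile as weighted terms and each incoming document as a term-frequency vector, and passes documents whose cosine similarity to the profile meets a threshold. The profile weights, the 0.2 threshold, and the function names are assumptions.

```python
import math
import re
from collections import Counter

# Hypothetical sketch: vector-space filtering as a stand-in for the
# chapter's (unspecified) profile-based filtering.

def term_vector(text: str) -> Counter:
    """Raw term-frequency vector of lowercase alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(u: dict, v: dict) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def filter_for_profile(profile: dict, documents: list[str], threshold=0.2):
    """Return documents whose similarity to the profile meets `threshold`, best first."""
    scored = [(cosine(profile, term_vector(d)), d) for d in documents]
    return [d for s, d in sorted(scored, reverse=True) if s >= threshold]

profile = {"sales": 2.0, "region": 1.0}  # hypothetical user interests
docs = ["Regional sales fell in the north region", "Weather forecast for tomorrow"]
print(filter_for_profile(profile, docs))
# prints ['Regional sales fell in the north region']
```

Under this simplification, situatedness could be approximated by keeping a separate profile for each context the user works in and filtering with whichever one is active.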


Data Mining ◽  
2013 ◽  
pp. 1422-1448
Author(s):  
Fadila Bentayeb ◽  
Nora Maïz ◽  
Hadj Mahboubi ◽  
Cécile Favre ◽  
Sabine Loudcher ◽  
...  

Research in data warehousing and OLAP has produced important technologies for the design, management, and use of Information Systems for decision support. With the development of the Internet, the availability of various types of data has increased. Thus, users require applications to help them obtain knowledge from the Web. One possible solution to facilitate this task is to extract information from the Web, transform it, and load it into a Web Warehouse, which provides uniform access methods for automatic processing of the data. In this chapter, we present three innovative research directions recently introduced to extend the capabilities of decision support systems, namely (1) the use of XML as a logical and physical model for complex data warehouses, (2) associating data mining with OLAP to allow elaborate analysis tasks for complex data, and (3) schema evolution in complex data warehouses for personalized analyses. Our contributions cover the main phases of the data warehouse design process: data integration and modeling, and user-driven OLAP analysis.
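To make the transform-and-load path concrete, here is a small sketch that takes records already extracted from Web pages (the extraction step itself is assumed to have happened upstream), normalizes them, and loads each one as an element of an XML document, in the spirit of using XML as the warehouse model. The element names (`webWarehouse`, `fact`, `product`, `price`), the sample record, and the helper functions are hypothetical; the authors' actual XML schema is not described in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of the transform and load steps into an XML warehouse.
# Element names and the record layout are illustrative assumptions.

def transform(raw: dict) -> dict:
    """Normalize an extracted record: trim strings, parse the price as a float."""
    return {
        "source": raw["source"].strip(),
        "product": raw["product"].strip().lower(),
        "price": float(raw["price"].replace(",", "").strip()),
    }

def load(warehouse: ET.Element, record: dict) -> ET.Element:
    """Append one fact element; price is the measure, product a dimension."""
    fact = ET.SubElement(warehouse, "fact", source=record["source"])
    ET.SubElement(fact, "product").text = record["product"]
    ET.SubElement(fact, "price").text = f"{record['price']:.2f}"
    return fact

warehouse = ET.Element("webWarehouse")
extracted = [{"source": " https://shop.example.com/1 ", "product": " Laptop ", "price": "1,299.00"}]
for raw in extracted:
    load(warehouse, transform(raw))

print(ET.tostring(warehouse, encoding="unicode"))
# prints <webWarehouse><fact source="https://shop.example.com/1"><product>laptop</product><price>1299.00</price></fact></webWarehouse>
```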


2019 ◽  
Author(s):  
yuda fahrozi

A database server is a computer program that provides database management services and serves computers or database application programs that use the client/server model. The term also refers to a computer (generally a server) dedicated to running that program. Database management systems (DBMSs) generally provide database server functions, and some DBMSs (such as MySQL or Microsoft SQL Server) depend heavily on the client-server model to access their databases.

The Origin of the Term “Database”

The term “database” originated in computer science. Although its meaning later broadened to include things outside the field of electronics, this article concerns computer databases. Records similar to databases actually existed before the industrial revolution, in the form of ledgers, receipts, and collections of business-related data.

Types of Database

There are 12 types of database, namely: operational database, analytical database, data warehouse, distributed database, end-user database, external database, hypermedia databases on the web, navigational database, in-memory databases, document-oriented databases, real-time databases, and relational database.

Keywords: Server Capacity and Database

