Innovative Approaches for Efficiently Warehousing Complex Data from the Web


Data Mining ◽  
2013 ◽  
pp. 1422-1448
Author(s):  
Fadila Bentayeb ◽  
Nora Maïz ◽  
Hadj Mahboubi ◽  
Cécile Favre ◽  
Sabine Loudcher ◽  
...  

Research in data warehousing and OLAP has produced important technologies for the design, management, and use of information systems for decision support. With the development of the Internet, the availability of various types of data has increased, and users require applications to help them obtain knowledge from the Web. One possible solution to facilitate this task is to extract information from the Web, transform it, and load it into a Web warehouse, which provides uniform access methods for automatic processing of the data. In this chapter, we present three recent lines of research that extend the capabilities of decision support systems, namely (1) the use of XML as a logical and physical model for complex data warehouses, (2) associating data mining with OLAP to allow elaborate analysis tasks over complex data, and (3) schema evolution in complex data warehouses for personalized analyses. Our contributions cover the main phases of the data warehouse design process: data integration and modeling, and user-driven OLAP analysis.
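
To make the extract-transform-load idea above concrete, here is a minimal Python sketch (not the chapter's implementation) that extracts records from a stand-in Web source, cleans them, and loads them into an XML document serving as the warehouse's physical model; the record fields, schema, and file name are assumptions.

```python
# Minimal ETL sketch: extract Web records, transform them, and load them
# into an XML document acting as the warehouse's physical model.
# The record fields and warehouse schema below are illustrative assumptions.
import xml.etree.ElementTree as ET

def extract():
    # Stand-in for scraping or calling a Web API; returns raw records.
    return [
        {"product": "book", "price": "12.50", "country": "FR", "date": "2012-03-01"},
        {"product": "dvd", "price": "7.90", "country": "DE", "date": "2012-03-02"},
    ]

def transform(records):
    # Clean and type the raw values; drop records that cannot be parsed.
    for r in records:
        try:
            yield {"product": r["product"], "price": float(r["price"]),
                   "country": r["country"], "date": r["date"]}
        except (KeyError, ValueError):
            continue

def load(facts, path="web_warehouse.xml"):
    # Store the facts as an XML document: one <fact> element per sale,
    # with dimension references as attributes and the measure as text content.
    root = ET.Element("FactTable", name="sales")
    for f in facts:
        fact = ET.SubElement(root, "fact", product=f["product"],
                             country=f["country"], date=f["date"])
        fact.text = str(f["price"])
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    load(transform(extract()))
```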


Author(s):  
Hadj Mahboubi ◽  
Jérôme Darmont

XML data warehouses form an interesting basis for decision-support applications that exploit complex data. However, native XML database management systems (DBMSs) currently suffer from limited performance, and it is necessary to find ways to optimize them. In this chapter, the authors present two such techniques. First, they propose an XML join index that is specifically adapted to the multidimensional architecture of XML warehouses. It eliminates join operations while preserving the information contained in the original warehouse. Second, the authors present a strategy for selecting XML materialized views by clustering the query workload. To validate these proposals, the authors measure the response time of a set of decision-support XQueries over an XML data warehouse, with and without their optimization techniques. The experimental results demonstrate their efficiency, even when queries are complex and data are voluminous.
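
As a rough illustration of workload-driven view selection (not the authors' algorithm), the following Python sketch groups queries by the dimension attributes they reference and derives one candidate materialized view per group; the query representation, similarity measure, and threshold are assumptions.

```python
# Illustrative sketch of workload-driven view selection: queries are grouped
# by the dimension attributes they reference, and one candidate materialized
# view is derived per group. This simplifies the general idea and is not the
# authors' clustering strategy.

def jaccard(a, b):
    # Similarity between two attribute sets.
    return len(a & b) / len(a | b)

def cluster_workload(queries, threshold=0.5):
    # queries: list of sets of referenced dimension attributes.
    clusters = []
    for q in queries:
        for c in clusters:
            if jaccard(q, set.union(*c)) >= threshold:
                c.append(q)
                break
        else:
            clusters.append([q])
    return clusters

def candidate_views(clusters):
    # A candidate view groups by every attribute used in its cluster,
    # so each query in the cluster can be rewritten over the view.
    return [set.union(*c) for c in clusters]

workload = [
    {"customer.country", "date.year"},
    {"customer.country", "date.year", "product.category"},
    {"supplier.region"},
]
for view in candidate_views(cluster_workload(workload)):
    print(sorted(view))
```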


2012 ◽  
Vol 2 (2) ◽  
pp. 39-42
Author(s):  
Murtadha M. Hamad ◽  
Muhammed Abdul Raheem

Bitmap indices have become popular access methods for data warehouse applications and decision support systems with large amounts of read-mostly data. This paper arrives at a number of results: bitmap indexing greatly improves the performance of query answering in data warehouses, and it greatly increases the efficiency of complex query processing by using bitwise operations (AND, OR). A prototype data warehouse, "STUDENTS DW", has been built according to W. Inmon's conditions for data warehouses. This prototype is built for student information.
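
Below is a minimal sketch of how a bitmap index answers such queries with bitwise operations, using Python integers as bit vectors; the table, columns, and values are illustrative and not taken from the paper.

```python
# Minimal bitmap-index sketch: one bit vector per (column, value) pair,
# with Python integers used as arbitrary-length bit vectors.
# The table rows and column values below are illustrative.
rows = [
    {"degree": "MSc", "dept": "CS"},
    {"degree": "BSc", "dept": "Math"},
    {"degree": "MSc", "dept": "Math"},
    {"degree": "PhD", "dept": "CS"},
]

def build_bitmaps(rows, column):
    # One bitmap per distinct value: bit i is set if row i has that value.
    bitmaps = {}
    for i, row in enumerate(rows):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << i)
    return bitmaps

degree = build_bitmaps(rows, "degree")
dept = build_bitmaps(rows, "dept")

# WHERE degree = 'MSc' AND dept = 'Math'  ->  bitwise AND of two bitmaps.
hits = degree["MSc"] & dept["Math"]
# WHERE degree = 'MSc' OR degree = 'PhD'  ->  bitwise OR of two bitmaps.
either = degree["MSc"] | degree["PhD"]

print([i for i in range(len(rows)) if (hits >> i) & 1])    # [2]
print([i for i in range(len(rows)) if (either >> i) & 1])  # [0, 2, 3]
```

Because each predicate reduces to a single bitwise operation over compact bit vectors, complex conjunctions and disjunctions can be evaluated without scanning the underlying rows.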


Author(s):  
François Pinet ◽  
Myoung-Ah Kang ◽  
Kamal Boulil ◽  
Sandro Bimonte ◽  
Gil De Sousa ◽  
...  

Recent research works propose using object-oriented (OO) approaches, such as UML, to model data warehouses. This paper overviews these recent OO techniques, describing the facts and the different analysis dimensions of the data. The authors provide a tutorial on the Object Constraint Language (OCL) and show how this language can be used to specify constraints in OO-based models of data warehouses. Previously, OCL had only been applied to describe constraints in software applications and transactional databases; here, the authors demonstrate how to use OCL to represent the different types of data warehouse constraints. This paper helps researchers working in the fields of business intelligence and decision support systems who wish to learn about the major possibilities that OCL offers in the context of data warehouses. The authors also provide general information about the possible types of implementation of multidimensional models and their constraints.
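
To give a flavour of the kind of constraint the paper discusses, here is an assumed OCL-style invariant over a hypothetical Sale fact class, together with a Python sketch that checks the equivalent condition on in-memory objects; the class, attribute names, and invariant are illustrative, not taken from the paper.

```python
# Illustrative OCL-style constraint on a hypothetical multidimensional model.
# Assumed invariant (not taken from the paper):
#   context Sale inv: self.quantity > 0 and self.saleDate <= today
# The Python check below enforces the same condition on in-memory fact objects.
from dataclasses import dataclass
from datetime import date

@dataclass
class Sale:               # hypothetical fact class
    quantity: int         # measure
    sale_date: date       # link to the time dimension

def invariant_violations(sales, today=None):
    # Return the facts that violate the OCL-style invariant above.
    today = today or date.today()
    return [s for s in sales if not (s.quantity > 0 and s.sale_date <= today)]

facts = [
    Sale(quantity=3, sale_date=date(2011, 5, 2)),
    Sale(quantity=-1, sale_date=date(2011, 6, 1)),  # violates quantity > 0
]
print(invariant_violations(facts))
```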


Author(s):  
Charles Greenidge ◽  
Hadrian Peter

Data warehouses have established themselves as necessary components of an effective Information Technology (IT) strategy for large businesses. In addition to utilizing operational databases, data warehouses must also integrate increasing amounts of external data to assist in decision support. An important source of such external data is the Web. In an effort to ensure the availability and quality of Web data for the data warehouse, we propose an intermediate data-staging layer called the Meta-Data Engine (M-DE). A major challenge, however, is the conversion of data originating on the Web, and brought in by robust search engines, into data for the data warehouse. The authors therefore also propose a framework, the Semantic Web Application (SEMWAP) framework, which facilitates semi-automatic matching of instance data from opaque Web databases using ontology terms. Their framework combines Information Retrieval (IR), Information Extraction (IE), Natural Language Processing (NLP), and ontology techniques to produce a matching, and thus provides a viable building block for Semantic Web (SW) applications.
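
As a simplified illustration of matching Web instance data against ontology terms (a stand-in for the IR/IE/NLP pipeline, not the SEMWAP framework itself), the following Python sketch scores candidate terms by token overlap; the ontology terms and threshold are assumptions.

```python
# Illustrative matching of Web instance data against ontology terms using a
# simple token-overlap score. This is not the SEMWAP framework; the terms and
# threshold below are assumptions.
def tokens(text):
    # Normalize a label into a set of lowercase word tokens.
    return set(text.lower().replace("_", " ").split())

def best_match(value, ontology_terms, threshold=0.5):
    # Return the ontology term with the highest overlap, if above threshold.
    scored = []
    for term in ontology_terms:
        t_term, t_value = tokens(term), tokens(value)
        overlap = len(t_term & t_value) / len(t_term | t_value)
        scored.append((overlap, term))
    score, term = max(scored)
    return term if score >= threshold else None

ontology = ["postal_code", "company_name", "annual_revenue"]
print(best_match("Company Name", ontology))   # company_name
print(best_match("ZIP", ontology))            # None (below threshold)
```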


Author(s):  
Johann Eder ◽  
Karl Wiggisser

Data warehouses typically are building blocks of decision support systems in companies and public administration. The data contained in a data warehouse is analyzed by means of OnLine Analytical Processing (OLAP) tools, which provide sophisticated features for aggregating and comparing data. Decision support applications depend on the reliability and accuracy of the contained data. Typically, a data warehouse does not only comprise the current snapshot data but also historical data to enable, for instance, analysis over several years. And, as we live in a changing world, one criterion for the reliability and accuracy of the results of such long-period queries is their comparability. Whereas data warehouse systems are well prepared for changes in the transactional data, they are, surprisingly, not able to deal with changes in the master data. Nonetheless, such changes do frequently occur. The crucial point for supporting changes is, first of all, being aware of their existence. Second, once you know that a change took place, it is important to know which change occurred (i.e., knowing the differences between versions and the relations between the elements of different versions). For data warehouses this means that changes are identified and represented, the validity of data and structures is recorded, and this knowledge is used for computing correct results for OLAP queries. This chapter is intended to motivate the need for powerful maintenance mechanisms for data warehouse cubes. It presents some basic terms and definitions for common understanding and introduces the different aspects of data warehouse maintenance. Furthermore, several approaches addressing the problem are presented and classified by their capabilities.
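
One common ingredient of such maintenance mechanisms is recording the validity of master-data versions. The following Python sketch (an illustration, not any specific approach from the chapter) keeps versioned dimension members with validity intervals so that a query can be resolved against the version valid at its reference date; the member and attribute names are assumptions.

```python
# Minimal sketch of versioned master data: each dimension member keeps a list
# of versions with validity intervals, so an OLAP query can be resolved
# against the version that was valid at the query's reference date.
from datetime import date

class VersionedMember:
    def __init__(self, key):
        self.key = key
        self.versions = []          # list of (valid_from, valid_to, attributes)

    def add_version(self, valid_from, valid_to, **attributes):
        self.versions.append((valid_from, valid_to, attributes))

    def attributes_at(self, when):
        # Return the attribute values that were valid on the given date.
        for valid_from, valid_to, attributes in self.versions:
            if valid_from <= when <= valid_to:
                return attributes
        raise LookupError(f"no version of {self.key} valid on {when}")

branch = VersionedMember("branch_42")
branch.add_version(date(2000, 1, 1), date(2004, 12, 31), region="North")
branch.add_version(date(2005, 1, 1), date(9999, 12, 31), region="North-East")

# A query over 2003 data aggregates branch_42 under "North",
# a query over 2007 data under "North-East".
print(branch.attributes_at(date(2003, 6, 1))["region"])
print(branch.attributes_at(date(2007, 6, 1))["region"])
```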


2013 ◽  
Vol 9 (2) ◽  
pp. 89-109 ◽  
Author(s):  
Marie-Aude Aufaure ◽  
Alfredo Cuzzocrea ◽  
Cécile Favre ◽  
Patrick Marcel ◽  
Rokia Missaoui

In this vision paper, the authors discuss models and techniques for integrating, processing, and querying data, information, and knowledge within data warehouses in a user-centric manner. The user-centric emphasis allows them to achieve a number of clear advantages with respect to classical data warehouse architectures, the most relevant of which are the following: (i) a unified and meaningful representation of multidimensional data and knowledge patterns throughout the data warehouse layers (i.e., loading, storage, metadata, etc.); (ii) advanced query mechanisms and guidance that are capable of extracting targeted information and knowledge by means of innovative information retrieval and data mining techniques. Following this main framework, the authors first outline the importance of knowledge representation and management in data warehouses, where knowledge is expressed by existing ontologies or by patterns discovered from data. Then, the authors propose a user-centric architecture for OLAP query processing, which is the typical applicative interface to data warehouse systems. Finally, the authors propose insights towards cooperative query answering that make use of knowledge management principles and exploit the peculiarities of data warehouses (e.g., multidimensionality, multi-resolution, and so forth).


2017 ◽  
Vol 19 (1) ◽  
pp. 17-28 ◽  
Author(s):  
Siew-Phek T. Su ◽  
Ashwin Needamangala

Data warehousing technology has been defined by John Ladley as "a set of methods, techniques, and tools that are leveraged together and used to produce a vehicle that delivers data to end users on an integrated platform." (1) This concept has been applied increasingly by industries worldwide to develop data warehouses for decision support and knowledge discovery. In the academic sector, several universities have developed data warehouses containing the universities' financial, payroll, personnel, budget, and student data. (2) These data warehouses across all industries and academia have met with varying degrees of success. Data warehousing technology and its related issues have been widely discussed and published. (3) Little has been done, however, on the application of this cutting-edge technology in the library environment using library data.


2012 ◽  
Vol 17 (4) ◽  
pp. 496-506 ◽  
Author(s):  
Frans Cornelissen ◽  
Miroslav Cik ◽  
Emmanuel Gustin

High-content screening (HCS) has brought new dimensions to cellular assays by generating rich data sets that characterize cell populations in great detail and detect subtle phenotypes. To derive relevant, reliable conclusions from these complex data, it is crucial to have informatics tools supporting quality control, data reduction, and data mining. These tools must reconcile the complexity of advanced analysis methods with the user-friendliness demanded by the user community. After reviewing existing applications, we saw the possibility of adding innovative new analysis options. Phaedra was developed to support workflows for drug screening and target discovery, interact with several laboratory information management systems, and process data generated by a range of techniques including high-content imaging, multicolor flow cytometry, and traditional high-throughput screening assays. The application is modular and flexible, with an interface that can be tuned to specific user roles. It offers user-friendly data visualization and reduction tools for HCS but also integrates Matlab for custom image analysis and the Konstanz Information Miner (KNIME) framework for data mining. Phaedra features efficient JPEG2000 compression and full drill-down functionality from dose-response curves down to individual cells, with exclusion and annotation options, cell classification, statistical quality controls, and reporting.


Author(s):  
ZHENGXIN CHEN

The knowledge economy requires data mining to be more goal-oriented so that more tangible results can be produced. This requirement implies that the semantics of the data should be incorporated into the mining process. Data mining is ready to deal with this challenge, because recent developments in data mining have shown an increasing interest in mining complex data (as exemplified by graph mining, text mining, etc.). By incorporating the relationships of the data along with the data itself (rather than focusing on the data alone), complex data injects semantics into the mining process, thus enhancing the potential of making a better contribution to the knowledge economy. Since the relationships between the data reveal certain behavioral aspects underlying the plain data, this shift of mining from simple data to complex data signals a fundamental change to a new stage in the research and practice of knowledge discovery, which can be termed behavior mining. Behavior mining also has the potential of unifying some other recent activities in data mining. We discuss important aspects of behavior mining and its implications for the future of data mining.

