Evolving Application Domains of Data Warehousing and Mining
Latest Publications


TOTAL DOCUMENTS

13
(FIVE YEARS 0)

H-INDEX

2
(FIVE YEARS 0)

Published By IGI Global

9781605668161, 9781605668178

Author(s):  
Symeon Papadopoulos
Fotis Menemenis
Athena Vakali
Ioannis Kompatsiaris

The recent advent and wide adoption of Social Bookmarking Systems (SBS) have disrupted the traditional model of online content publishing and consumption. Until recently, most of the content people consumed was published as the result of a centralized selection process. Today, the large-scale adoption of the Web 2.0 paradigm has diffused the content selection process to the masses. Modern SBS-based applications let their users submit their preferred content, comment on and rate the content of other users, and establish social relations with each other. As a result, the evolution of the popularity of socially bookmarked content is now a highly complex phenomenon that calls for a multi-aspect analysis approach. This chapter attempts to provide a unified treatment of the phenomenon by studying four aspects of the popularity of socially bookmarked content: (a) the distributional properties of content consumption, (b) its evolution in time, (c) the correlation between the semantics of online content and its popularity, and (d) the impact of online social networks on the content consumption behavior of individuals. To this end, a case study is presented in which the proposed analysis framework is applied to a large dataset collected from Digg, a popular social bookmarking and rating application.


Author(s):  
Claudia Plant
Christian Böhm

Clustering, or finding a natural grouping of a data set, is essential for knowledge discovery in many applications. This chapter provides an overview of emerging trends within the vital research area of clustering, including subspace and projected clustering, correlation clustering, semi-supervised clustering, spectral clustering and parameter-free clustering. To make the reader aware of the challenges associated with clustering, the chapter first provides a general problem specification and introduces basic clustering paradigms. The requirements of concrete example applications in the life sciences and the web provide the motivation for the discussion of novel approaches to clustering. Thus, this chapter is intended to appeal to all those interested in the state of the art in clustering, including basic researchers as well as practitioners.


Author(s):  
Sandro Bimonte
Marlène Villanova-Oliver
Jerome Gensel

Spatial OLAP refers to the integration of spatial data in multidimensional applications at the physical, logical and conceptual levels. The multidimensional aggregation of geographic objects (geographic measures) raises theoretical and implementation problems. In this chapter, the authors present a panorama of aggregation issues in multidimensional, geostatistic, GIS and Spatial OLAP models. Then, they illustrate how overlapping geometries and the dependency between spatial and alphanumeric aggregation must be taken into account to aggregate geographic measures correctly. Consequently, they present an extension of the logical multidimensional model GeoCube (Bimonte et al., 2006) to deal with these issues.


Author(s):  
Cândida G. Silva
Pedro Gabriel Ferreira
Paulo J. Azevedo
Rui M.M. Brito

The protein folding problem, i.e., the identification of the rules that determine how a protein acquires its native, functional, three-dimensional structure from its linear sequence of amino acids, is still a major challenge in structural molecular biology. Moreover, the identification of a series of neurodegenerative diseases as protein unfolding/misfolding disorders highlights the importance of a detailed characterisation of the molecular events driving the unfolding and misfolding processes in proteins. One way of exploring these processes is through the use of molecular dynamics simulations. The analysis and comparison of the enormous amount of data generated by multiple protein folding or unfolding simulations is not a trivial task and presents many interesting challenges to the data mining community. Considering the central role of the hydrophobic effect in protein folding, we show here the application of two data mining methods, hierarchical clustering and association rules, to the analysis and comparison of the solvent accessible surface area (SASA) variation profiles of each of the 127 amino-acid residues in the amyloidogenic protein Transthyretin, across multiple molecular dynamics protein unfolding simulations.


Author(s):  
Martine Collard
Leila Kefi-Khelif
Van Trang Tran
Olivier Corby

DNA micro-arrays are one of the fastest-growing technologies in molecular biology and bioinformatics. Based on series of microscopic spots of DNA sequences, they allow the measurement of gene expression under specific conditions at whole-genome scale. Micro-array experiments produce large sets of expression data that help biologists investigate various biological questions. Experimental micro-array data and sources of biological knowledge are now available in public repositories. As a consequence, comparative analyses involving several experiments become conceivable and potentially hold relevant knowledge. Nevertheless, manually navigating and searching for similar tendencies in such huge spaces is largely impracticable for the investigator and leads to limited results. In this context, the authors propose a semantic data warehousing solution based on semantic web technologies that allows both the diversity and the volume of all related data to be monitored.


Author(s):  
Xuegang Huang

The wide adoption of business intelligence applications has led more and more organizations to build and maintain data warehouse systems. Concepts like “unified view of data” and “one version of the truth” have been the main drivers for creating data warehouses. The dynamics of the business world pose the challenge of managing large volumes of complex data in data warehouses while also meeting real-time integration and master data needs. This chapter summarizes the past and present patterns of typical data warehouse architectures and describes how the concept of service-oriented architecture influences the future evolution of data warehouse architecture. The discussion draws on many real-world requirements of data warehouse solutions and lists considerations on how architecture patterns can address these requirements.


Author(s):  
Chao Luo
Yanchang Zhao
Dan Luo
Yuming Ou
Li Liu

This chapter aims to provide a comprehensive survey of current advanced technologies for exception mining in the stock market. Stock market surveillance aims to identify market anomalies so as to provide a fair and efficient trading platform. Market surveillance technologies have developed from simple statistical rules to more advanced technologies, such as data mining and artificial intelligence. This chapter first presents the basic concepts of exception mining in the stock market. Then the recent advances in exception mining in this domain are presented and the key issues are discussed. The advantages and disadvantages of the advanced technologies are analyzed. Furthermore, our model, OMM (Outlier Mining on Multiple time series), is introduced. Finally, the chapter points out future research directions and related practical issues.
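
As a rough illustration of the kind of "simple statistical rule" the abstract above contrasts with more advanced methods, the sketch below flags observations whose z-score exceeds a threshold. It is not the chapter's OMM model; the function name `flag_exceptions`, the default threshold of 3.0, and the use of a single series of values are all assumptions made for this example.

```python
from statistics import mean, stdev


def flag_exceptions(values, threshold=3.0):
    """Return the indices of values whose z-score exceeds `threshold`.

    Hypothetical helper for illustration only: a basic z-score rule,
    not the OMM model mentioned in the abstract.
    """
    # The sample standard deviation needs at least two observations.
    if len(values) < 2:
        return []

    mu = mean(values)
    sigma = stdev(values)

    # A constant series has no spread, so nothing can be exceptional.
    if sigma == 0:
        return []

    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]
```

A rule like this treats each series in isolation and assumes roughly stable behaviour, which is one reason surveillance systems moved toward the data mining approaches the chapter surveys.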


Author(s):  
Nan Jiang

Recent advances in sensor technologies have made these small devices much cheaper and more convenient to use in many different applications, for example weather and environmental monitoring, hospital and factory operation sites, and sensor devices on roads and moving vehicles. The data collected from sensors forms a sensor stream and is transferred to a server, which performs data warehousing and mining tasks so that the end user can analyze the data. Several data preprocessing steps are necessary to enrich the data with domain information for the data warehousing and mining tasks in sensor stream applications. This chapter presents a general framework for domain-driven mining of sensor stream applications. The proposed framework is able to enrich sensor streams with additional domain information that meets the application requirements. Experimental studies of the proposed framework are performed on real data for two applications: a traffic management application and an environmental monitoring site.


Author(s):  
Claudia Cherubini

Most data required for cleanup risk assessment are intrinsically characterized by a high degree of variability and uncertainty. Moreover, environmental datasets typically contain extreme values, such as a few random ‘hot spots’ of high concentration within a background of data below the detection limit. In the field of environmental pollution, risk assessment is a decision-support method for determining whether an area needs to undergo remediation. The analysis should therefore include elements that take into account the nature of the data themselves, particularly their uncertainty. In this context, this chapter focuses on the application of a geostatistics-based uncertainty modeling approach to the parameters that serve as inputs to the probabilistic risk assessment procedure. Compared with a traditional approach, the applied method makes it possible to quantify the uncertainty and variability of input parameters and integrate them into the determination of risk. Moreover, it has proved successful in capturing and concisely describing the relations and tendencies intrinsic to the data set, characteristics that a traditional approach neglects.


Author(s):  
Christoph Quix
Xiang Li
David Kensche
Sandra Geisler

Data streams are continuous, rapid, time-varying, and transient streams of data that provide new opportunities for the analysis of timely information. Data processing in data streams faces challenges similar to those of view management in data warehousing: continuous query processing is related to view maintenance in data warehousing, and multi-query optimization for continuous queries is closely related to view selection in conventional relational DBMS and data warehouses. In this chapter, we give an overview of view maintenance and view selection methods, explain the fundamental issues of data stream management, and discuss how view management techniques from data warehousing are related to data stream management. We also give directions for future research in view management, data streams, and data warehousing.
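
The link between continuous queries and view maintenance drawn in the abstract above can be sketched with a small example: a materialized aggregate that is updated per change instead of being recomputed from the base data. The class name `IncrementalSumView` and its methods are hypothetical and chosen only for this illustration; the chapter's own methods are not reproduced here.

```python
class IncrementalSumView:
    """Materialized view for a query of the form
    SELECT key, SUM(value) FROM base GROUP BY key,
    maintained incrementally as rows are inserted or deleted.

    Illustrative sketch only; names and structure are assumptions.
    """

    def __init__(self):
        self._sums = {}    # key -> running sum of values
        self._counts = {}  # key -> number of base rows in the group

    def insert(self, key, value):
        # Apply the change to the affected group only.
        self._sums[key] = self._sums.get(key, 0) + value
        self._counts[key] = self._counts.get(key, 0) + 1

    def delete(self, key, value):
        if self._counts.get(key, 0) == 0:
            raise KeyError(f"no rows for key {key!r}")
        self._counts[key] -= 1
        if self._counts[key] == 0:
            # The group disappears from the view with its last row.
            del self._counts[key]
            del self._sums[key]
        else:
            self._sums[key] -= value

    def result(self):
        # Return a copy so callers cannot corrupt the view state.
        return dict(self._sums)
```

Each arriving stream tuple plays the role of an `insert`, so the query answer stays current at a cost proportional to the change, not to the size of the base data. The row count per group is kept so that a group can be dropped once its last row is deleted.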

