Data Warehousing and Mining
Latest Publications

Published By IGI Global

9781599049519, 9781599049526

2008
pp. 146-168
Jose D. Montero

This chapter provides a brief introduction to data mining, the data mining process, and its applications to manufacturing. Several examples illustrate how data mining, a key area of computational intelligence, offers great promise to manufacturing companies. The chapter also gives a brief overview of data warehousing as a strategic resource for quality improvement and as a major enabler of data mining applications. Although data mining has been used extensively in several industries, its use in manufacturing is newer and more limited. The published examples of data mining in manufacturing point to a bright future for a broader expansion of data mining, and of business intelligence in general, into manufacturing. The author believes that data mining will become a mainstream application in manufacturing and will extend the organization's analytical capabilities beyond what today's statistical methods offer.

2008
pp. 2051-2066
Yan Zhao
Yaohua Chen
Yiyu Yao

While many data mining models concentrate on automation and efficiency, interactive data mining models focus on adaptive and effective communication between human users and computer systems. User requirements and preferences play the most important role in human-machine interaction and guide the selection of target knowledge representations, operations, and measurements. In practice, user requirements and preferences also determine the strategies for handling abnormal situations and for explaining mined patterns. In this article, we discuss these fundamental issues based on a user-centered, three-layer framework of interactive data mining.

2008
pp. 2226-2247
Alex Burns
Shital Shah
Andrew Kusiak

This paper presents a hybrid approach that integrates a genetic algorithm (GA) and data mining to produce control signatures. The control signatures define the best parameter intervals leading to a desired outcome. This hybrid method integrates multiple rule sets generated by a data mining algorithm with the fitness function of a GA. The solutions of the GA represent intersections among rules, providing tight parameter bounds. The integration of intuitive rules provides an explanation for each generated control setting and offers insights into the decision-making process. Another benefit of the proposed hybrid method is the ability to analyze parameter trends, and the feasible solutions generated by the GA, with respect to the outcomes. The presented approach for deriving control signatures is applicable to various domains, such as energy, medical protocols, manufacturing, airline operations, and customer service. Control signatures were developed and tested for the control of a power plant boiler, and they revealed insightful relationships among parameters. The paper discusses the results and benefits of the proposed method for the power plant boiler.
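The idea that GA solutions represent intersections among rules can be illustrated with a minimal Python sketch. It assumes each mined rule is a hypothetical mapping from parameter names to (low, high) intervals that lead to the desired outcome, together with a confidence value; a chromosome is a bit mask selecting rules, and the fitness is the total confidence of the selected rules when their intervals intersect (zero otherwise). The encoding, the fitness function, and the names `intersect`, `fitness`, and `evolve` are illustrative assumptions, not the authors' method.

```python
import random
from typing import Dict, List, Optional, Tuple

Interval = Tuple[float, float]
# Hypothetical rule format: {parameter: (low, high)} plus the rule's confidence.
Rule = Tuple[Dict[str, Interval], float]


def intersect(rules: List[Rule]) -> Optional[Dict[str, Interval]]:
    """Tightest bounds satisfying every rule, or None if the rules conflict."""
    bounds: Dict[str, Interval] = {}
    for conditions, _ in rules:
        for param, (lo, hi) in conditions.items():
            cur_lo, cur_hi = bounds.get(param, (float("-inf"), float("inf")))
            new_lo, new_hi = max(cur_lo, lo), min(cur_hi, hi)
            if new_lo > new_hi:  # empty intersection: incompatible rules
                return None
            bounds[param] = (new_lo, new_hi)
    return bounds


def fitness(mask: List[int], rules: List[Rule]) -> float:
    """Sum of confidences of the selected rules if they are compatible, else 0."""
    chosen = [rule for bit, rule in zip(mask, rules) if bit]
    if not chosen or intersect(chosen) is None:
        return 0.0
    return sum(conf for _, conf in chosen)


def evolve(rules: List[Rule], pop_size: int = 20, generations: int = 40,
           mutation_rate: float = 0.1, seed: int = 0):
    """Simple GA: truncation selection, one-point crossover, bit-flip mutation."""
    rng = random.Random(seed)
    n = len(rules)  # assumes at least two rules, so a crossover point exists
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, rules), reverse=True)
        parents = pop[: pop_size // 2]  # the better half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n - 1)
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in a[:cut] + b[cut:]]
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, rules))
    return best, intersect([rule for bit, rule in zip(best, rules) if bit])
```

For example, with the made-up boiler rules `({"temp": (500, 560)}, 0.9)` and `({"temp": (540, 600)}, 0.8)`, `intersect` returns the tighter bound `{"temp": (540, 560)}`; keeping the better half of each generation unchanged means the best fitness found never decreases.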

2008
pp. 2289-2295
Hamid R. Nemati
Christopher D. Barko

An increasing number of organizations are struggling to overcome “information paralysis”: there is so much data available that it is difficult to understand what is and is not relevant. In addition, managerial intuition and instinct are more prevalent than hard facts in driving organizational decisions. Organizational Data Mining (ODM) is defined as leveraging data mining tools and technologies to enhance the decision-making process by transforming data into valuable and actionable knowledge to gain a competitive advantage (Nemati & Barko, 2001). The fundamentals of ODM can be categorized into three fields: Artificial Intelligence (AI), Information Technology (IT), and Organizational Theory (OT), with OT being the core differentiator between ODM and data mining. We take a brief look at the current status of ODM research and at how a sample of organizations is benefiting from it. Next, we examine the evolution of ODM, and we conclude the chapter by considering its challenging yet opportunity-rich future.

2008
pp. 622-641
Lionel Savary
Georges Gardarin
Karine Zeitouni

GML is a promising model for integrating geodata into data warehouses. The resulting databases are generally large and require spatial operators to handle them. Depending on the size of the target geographical data and on the number and complexity of the operators in a query, processing time may quickly become prohibitive. To optimize spatial queries over GML-encoded data, this chapter introduces a novel cache-based architecture and then proposes a new cache replacement policy. The policy takes into account the containment properties of geographical data and predicates, which allows the least relevant values to be evicted from the cache. Experiments with the GeoCache prototype show the effectiveness of the proposed architecture and its associated replacement policy compared with existing work.
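A containment-aware replacement policy can be sketched in Python as follows. This is not the GeoCache policy; it is a minimal illustration under stated assumptions: cache entries are keyed by a rectangular query window (a hypothetical `BBox` of min/max coordinates), cached features are points, and a query whose window lies inside a cached window is answered by filtering that entry. On eviction, the sketch first drops an entry whose window is contained in another window, since the larger entry can still answer its queries, and otherwise falls back to least-recently-used order.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)
Feature = Tuple[str, float, float]        # hypothetical point feature: (id, x, y)


def contains(outer: BBox, inner: BBox) -> bool:
    """True if rectangle `inner` lies entirely inside rectangle `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def point_in(window: BBox, feature: Feature) -> bool:
    _, x, y = feature
    return window[0] <= x <= window[2] and window[1] <= y <= window[3]


class ContainmentAwareCache:
    """Caches spatial query results keyed by query window (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumes capacity >= 1
        # Dict order doubles as LRU order: least recently used entries come first.
        self.entries: "OrderedDict[BBox, List[Feature]]" = OrderedDict()

    def lookup(self, window: BBox) -> Optional[List[Feature]]:
        """Answer from any cached window that contains the query window."""
        hit = next((w for w in self.entries if contains(w, window)), None)
        if hit is None:
            return None
        self.entries.move_to_end(hit)  # mark as most recently used
        return [f for f in self.entries[hit] if point_in(window, f)]

    def put(self, window: BBox, features: List[Feature]) -> None:
        if window in self.entries:
            self.entries.move_to_end(window)
        elif len(self.entries) >= self.capacity:
            del self.entries[self._victim(window)]
        self.entries[window] = features

    def _victim(self, incoming: BBox) -> BBox:
        """Prefer evicting a window contained in another one; else the LRU entry."""
        windows = list(self.entries) + [incoming]
        for w in self.entries:  # scanned from least to most recently used
            if any(o != w and contains(o, w) for o in windows):
                return w
        return next(iter(self.entries))
```

Preferring contained windows means a small window cached after a large, enclosing one is evicted first, even though plain LRU would have evicted the older, more general entry.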

2008
pp. 3524-3530
Protima Banerjee
Xiaohua Hu
Illhoi Yoo

Over the past few decades, data mining has emerged as a field of research critical to understanding and assimilating the large stores of data accumulated by corporations, government agencies, and laboratories. Early on, mining algorithms and techniques were limited to relational data sets coming directly from Online Transaction Processing (OLTP) systems, or from a consolidated enterprise data warehouse. However, recent work has begun to extend the limits of data mining strategies to include “semi-structured data such as HTML and XML texts, symbolic sequences, ordered trees and relations represented by advanced logics” (Washio & Motoda, 2003).

2008
pp. 3611-3620
Janusz Swierzowicz

The development of information technology is particularly noticeable in the methods and techniques of data acquisition, high-performance computing, and communication bandwidth. According to a recently observed phenomenon called the storage law (Fayyad & Uthurusamy, 2002), the capacity of digital data storage per unit price doubles every 9 months. Data can be stored in many forms of digital media, for example, still images taken by a digital camera, MP3 songs, or MPEG videos from desktops, cell phones, or video cameras. Such data exceeds the total cumulative output of handwriting and printing during all of recorded human history (Fayyad, 2001). According to a recent analysis carried out by IBM Almaden Research (Swierzowicz, 2002), data volumes are growing at different rates. The fastest growth is in Internet resources, which will reach the digital online threshold of exabytes within a few years (Liautaud, 2001). In such fast-growing data environments, the main restriction is the limited human ability to analyze data of high complexity and dimensionality. Investigations into combining different media data (multimedia) in one application began as early as the 1960s, when text and images were combined in a document. During the research and development process, audio, video, and animation were synchronized using a timeline to specify when they should be played (Rowe & Jain, 2004). Since the mid-1990s, the problems of multimedia data capture, storage, transmission, and presentation have been investigated extensively. Over the past few years, research on multimedia standards (e.g., MPEG-4, X3D, MPEG-7) has continued to grow. These standards are designed to represent very complex multimedia data sets; they can transparently handle sound, images, videos, and 3-D (three-dimensional) objects combined with events, synchronization, and scripting languages, and they can describe the content of any multimedia object.
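As a worked example of the storage law as stated above (storage capacity per unit price doubling every 9 months), growth over t months starting from an initial capacity C_0 can be written as follows; the 36-month horizon is only an illustrative choice.

```latex
C(t) = C_0 \cdot 2^{t/9}, \qquad t \text{ in months}

C(36) = C_0 \cdot 2^{36/9} = C_0 \cdot 2^{4} = 16\,C_0
```

In other words, three years of doubling at this pace would buy roughly sixteen times the storage capacity for the same price.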
Different algorithms are needed in multimedia distribution and multimedia database applications. An example is an image database that stores pictures of birds together with a sound database that stores recordings of birds (Kossmann, 2000). Kossmann (2000, p. 436) describes the distributed query that asks for the “top ten different kinds of birds that have black feathers and a high voice.”
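The bird query combines scores from two independent sources, which is the classic setting for top-k aggregation. The Python sketch below uses Fagin's Threshold Algorithm, a well-known technique for this problem that is not necessarily the one Kossmann describes. It assumes each database returns a hypothetical similarity score per kind of bird (black feathers from the image database, high voice from the sound database), that a missing score counts as 0, and that the combined score is the sum of the two.

```python
import heapq
from typing import Dict, List, Tuple


def threshold_top_k(image_scores: Dict[str, float], sound_scores: Dict[str, float],
                    k: int) -> List[Tuple[str, float]]:
    """Top-k kinds by image_score + sound_score, via the Threshold Algorithm."""
    if k <= 0:
        return []
    sources = [image_scores, sound_scores]
    # Sorted access: each source ranked by descending score.
    ranked = [sorted(s.items(), key=lambda kv: -kv[1]) for s in sources]
    seen: Dict[str, float] = {}
    for depth in range(max(len(r) for r in ranked)):
        last_scores = []
        for lst in ranked:
            if depth < len(lst):
                kind, score = lst[depth]
                last_scores.append(score)
                if kind not in seen:
                    # Random access: fetch this kind's score from every source.
                    seen[kind] = sum(s.get(kind, 0.0) for s in sources)
            else:
                last_scores.append(0.0)  # exhausted list: unseen kinds score 0 there
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        threshold = sum(last_scores)  # upper bound on any unseen kind's total
        if len(top) == k and top[-1][1] >= threshold:
            return top  # no unseen kind can beat the current k-th best
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
```

The early exit is the point of the algorithm: it can stop once the k-th best combined score reaches the threshold formed by the last scores read from each ranked list, without scanning either source to the end.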

2008
pp. 3639-3644
Bhavani Thuraisingham

Data mining is the process of posing queries to large quantities of data and extracting information, often previously unknown, using mathematical, statistical, and machine-learning techniques. Data mining has applications in many areas, including marketing and sales, medicine, law, manufacturing, and, more recently, homeland security. Using data mining, one can uncover hidden dependencies between terrorist groups and possibly predict terrorist events based on past experience. One data mining technique that is being investigated a great deal for homeland security is link analysis, in which links are drawn between various nodes, possibly revealing hidden links.
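Link analysis covers many techniques. One simple way to surface candidate hidden links is the common-neighbours score from link prediction, sketched below in Python; it is an illustrative choice, not necessarily a technique discussed in the chapter. The sketch assumes nodes are hypothetical entity identifiers, observed links are undirected pairs, and any two unlinked nodes that share at least `min_shared` neighbours (an assumed threshold) are reported as a candidate link.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple


def candidate_links(edges: Iterable[Tuple[str, str]],
                    min_shared: int = 2) -> List[Tuple[str, str, int]]:
    """Rank unlinked node pairs by how many neighbours they share."""
    adjacency: Dict[str, Set[str]] = {}
    for a, b in edges:  # build an undirected adjacency map
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    candidates = []
    for u, v in combinations(sorted(adjacency), 2):
        if v in adjacency[u]:
            continue  # already an observed link
        shared = len(adjacency[u] & adjacency[v])
        if shared >= min_shared:
            candidates.append((u, v, shared))
    # Highest score first; ties keep alphabetical pair order (sort is stable).
    return sorted(candidates, key=lambda c: -c[2])
```

Two entities that never appear together directly but share several contacts would be flagged by this score; a real investigation would weigh such candidates with far more context.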

2008
pp. 119-145
Alex Freitas
André C.P.L.F. de Carvalho

In machine learning and data mining, most work on classification problems deals with flat classification, where each instance is assigned to one of a set of possible classes and there is no hierarchical relationship between the classes. There are, however, more complex classification problems in which the classes to be predicted are hierarchically related. This chapter presents a tutorial on the hierarchical classification techniques found in the literature. We also discuss how hierarchical classification techniques have been applied in bioinformatics (particularly to the prediction of protein function), where hierarchical classification problems are often found.
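One common strategy in the hierarchical classification literature is the top-down approach: a separate classifier at each parent node chooses among that node's children, and prediction descends from the root. The Python sketch below illustrates it with a nearest-centroid classifier per node. The class paths (for example, a hypothetical ("enzyme", "hydrolase") functional category), the feature vectors, and the choice of nearest centroid are assumptions made for brevity, not the chapter's recommendations.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Path = Tuple[str, ...]  # class path from the root, e.g. ("enzyme", "hydrolase")
Model = Dict[Path, Dict[str, List[float]]]


def train(X: Sequence[Sequence[float]], paths: Sequence[Path]) -> Model:
    """For every parent node (a path prefix), store the centroid of each child class."""
    sums: Dict[Path, Dict[str, List[float]]] = defaultdict(dict)
    counts: Dict[Path, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for x, path in zip(X, paths):
        for depth, label in enumerate(path):
            parent = path[:depth]  # () is the root
            acc = sums[parent].setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[parent][label] += 1
    return {parent: {label: [s / counts[parent][label] for s in acc]
                     for label, acc in children.items()}
            for parent, children in sums.items()}


def predict(model: Model, x: Sequence[float]) -> Path:
    """Descend from the root, picking the nearest child centroid at each level."""
    path: Path = ()
    while path in model:  # stop once the current node has no children
        children = model[path]
        best = min(children,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, children[c])))
        path = path + (best,)
    return path
```

Because each node only separates its own children, an early mistake near the root cannot be corrected further down; this error propagation is the usual trade-off of the top-down approach.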

2008
pp. 974-1003
Alfredo Cuzzocrea
Domenico Sacca
Paolo Serafino

Efficiently supporting advanced OLAP visualization of multidimensional data cubes is a novel and challenging research topic, and it is of interest to a large family of data warehouse applications that rely on managing spatio-temporal (e.g., mobile) data, scientific and statistical data, sensor network data, biological data, and so on. On the other hand, the issue of visualizing multidimensional data domains has been largely neglected by the research community, since it does not belong to the well-founded conceptual-logical-physical design hierarchy inherited from relational database methodologies. Inspired by these considerations, in this article we propose an innovative advanced OLAP visualization technique that meaningfully combines (i) the so-called OLAP dimension flattening process, which allows us to extract two-dimensional OLAP views from multidimensional data cubes, and (ii) very efficient data compression techniques for such views, which allow us to generate “semantics-aware” compressed representations in which data are grouped along OLAP hierarchies.
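Dimension flattening can be pictured as mapping several cube dimensions onto one axis of a two-dimensional view. The Python sketch below does this for a small cube stored as a dictionary of cells; the cube layout, the member lists, and the `flatten` function are hypothetical, and composite members are simply enumerated with the outer dimension varying slowest, so that the columns belonging to each outer member stay adjacent. It does not reproduce the authors' flattening process or their compression step.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Cell = Tuple[str, ...]


def flatten(cube: Dict[Cell, float], members: Sequence[Sequence[str]],
            row_dims: Sequence[int], col_dims: Sequence[int]):
    """Build a 2-D view whose rows/columns are combinations of the given dimensions.

    Assumes row_dims and col_dims together cover every dimension exactly once.
    """
    # product() varies the last dimension fastest, so members of the first
    # (outer) dimension stay grouped together along each axis.
    rows = list(product(*(members[d] for d in row_dims)))
    cols = list(product(*(members[d] for d in col_dims)))
    grid: List[List[float]] = []
    for r in rows:
        line = []
        for c in cols:
            key: List[Optional[str]] = [None] * len(members)
            for d, m in zip(row_dims, r):
                key[d] = m
            for d, m in zip(col_dims, c):
                key[d] = m
            line.append(cube.get(tuple(key), 0.0))  # empty cells read as 0
        grid.append(line)
    return rows, cols, grid
```

For a cube over year, region, and product, flattening region and product onto the columns yields one row per year and one column per (region, product) pair, with all of a region's products side by side.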
