Encyclopedia of Data Warehousing and Mining
Published by IGI Global (ISBN 9781591405573, 9781591405597)
Total documents: 234 | H-index: 8

Latest Publications

Author(s): Jung Hwan Oh, Jeong Kyu Lee, Sae Hwang

Data mining, defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are now available. However, most of these efforts have focused on corporate data, typically held in alpha-numeric databases, and relatively little work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). Although these data are diverse in their underlying models and formats, they are synchronized and integrated and hence can be treated as integrated data records. The collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining, a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data are required in many applications, including financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) applications (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread, and multimedia data sets are extremely large. Tools exist for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.
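
To make the notion of an integrated data record concrete, here is a minimal Python sketch (our illustration, not the authors’ representation): each record ties together the synchronized text, image, and audio features for one time segment of a video. All field names and values are assumptions.

    # Minimal sketch of an "integrated data record" combining synchronized
    # media streams; the field names and types are illustrative only.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class MultimediaRecord:
        timestamp: float              # synchronization point, in seconds
        transcript: str               # text extracted for this segment
        image_features: List[float]   # e.g. a color-histogram vector
        audio_features: List[float]   # e.g. MFCC coefficients
        shot_id: int = 0              # video shot the segment belongs to

    # A multimedia data set is then a time-ordered collection of records.
    dataset: List[MultimediaRecord] = [
        MultimediaRecord(0.0, "opening scene", [0.2, 0.5], [1.1, 0.3], shot_id=1),
        MultimediaRecord(2.5, "dialogue", [0.4, 0.1], [0.9, 0.7], shot_id=2),
    ]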


Author(s): Dimitrios Katsaros, Yannis Manolopoulos

During the past decade, we have witnessed explosive growth in our capabilities to both generate and collect data. Various data mining techniques have been proposed and widely employed to discover valid, novel, and potentially useful patterns in these data. Data mining involves the discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in huge collections of data.
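
As a toy illustration of the kind of pattern discovery described here (not an example drawn from the article), the Python sketch below counts frequent item pairs, a basic building block of association-rule mining; the transactions and the support threshold are invented.

    # Count item pairs that co-occur often enough across transactions.
    from itertools import combinations
    from collections import Counter

    transactions = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"butter", "milk"},
        {"bread", "butter"},
    ]
    min_support = 2  # "frequent" = appears in at least 2 transactions

    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1

    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    print(frequent_pairs)  # all three pairs occur twice in this toy data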


Author(s): Hakikur Rahman

Today’s fast-moving business world faces continuous challenges and abrupt changes in real-life situations in the context of data and information management. In the current trend of information explosion, businesses recognize the value of the information they can gather from various sources. The information that drives business decisions can take many forms, including archived data, transactional data, e-mail, Web input, survey data, data repositories, and data marts. The organization’s business strategy should be to deliver high-quality information to the right people at the right time.


Author(s): Maria Vardaki

The term metadata is encountered in many different sciences. Statistical metadata is a term generally used to denote data about data. Modern statistical information systems (SIS) use metadata templates or complex object-oriented metadata models, making extensive and active use of metadata.
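
As a minimal illustration of such a metadata template (our sketch, not a model from the article), the following Python fragment records data about the data of a hypothetical survey; the attribute set is invented and is not a standard schema.

    # A small object-oriented metadata model: the objects describe the
    # data set and its variables rather than the observations themselves.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class VariableMetadata:
        name: str        # variable identifier, e.g. "unemployment_rate"
        unit: str        # unit of measurement, e.g. "percent"
        definition: str  # human-readable meaning of the variable

    @dataclass
    class DatasetMetadata:
        title: str
        producer: str          # agency responsible for the data
        reference_period: str  # period the data refer to
        variables: List[VariableMetadata]

    meta = DatasetMetadata(
        title="Labour Force Survey",
        producer="National Statistical Office",
        reference_period="2004-Q4",
        variables=[VariableMetadata("unemployment_rate", "percent",
                                    "Share of labour force without work")],
    )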


Author(s): Tobias Scheffer

For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs. Semi-supervised classification algorithms aim at utilizing information contained in unlabeled data in addition to the (few) labeled data.
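
One common strategy fitting this description is self-training: fit a classifier on the labeled data, pseudo-label the unlabeled points it is most confident about, and refit. The Python sketch below (our illustration, not necessarily the method the article discusses) uses scikit-learn; the data and the 0.9 confidence threshold are assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_labeled = np.array([[0.0], [0.2], [1.8], [2.0]])  # few labeled points
    y_labeled = np.array([0, 0, 1, 1])
    X_unlabeled = rng.uniform(0.0, 2.0, size=(50, 1))   # cheap unlabeled points

    clf = LogisticRegression()
    for _ in range(5):  # a few self-training rounds
        clf.fit(X_labeled, y_labeled)
        if len(X_unlabeled) == 0:
            break
        proba = clf.predict_proba(X_unlabeled)
        confident = proba.max(axis=1) >= 0.9  # pseudo-label confident points only
        if not confident.any():
            break
        X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
        y_labeled = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
        X_unlabeled = X_unlabeled[~confident]

    print(clf.predict([[0.1], [1.9]]))  # expected: [0 1]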


Author(s): Mike Thelwall

Scientific Web Intelligence (SWI) is a research field that combines techniques from data mining, Web intelligence, and scientometrics to extract useful information from the links and text of academic-related Web pages using various clustering, visualization, and counting techniques. Its origins lie in previous scientometric research into mining off-line academic data sources such as journal citation databases. Typical scientometric objectives are either evaluative (assessing the impact of research) or relational (identifying patterns of communication within and among research fields). From scientometrics, SWI also inherits a need to validate its methods and results so that the methods can be justified to end users, and the causes of the results can be found and explained.
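
As a hedged illustration of the counting techniques mentioned above (not taken from the article), the Python sketch below tallies inter-site hyperlinks as a crude relational indicator; the link list stands in for links a crawler would actually extract from academic pages.

    # Count links between academic sites; inlink counts give a simple
    # impact proxy, pair counts a simple relational indicator.
    from collections import Counter

    # (source site, target site) pairs extracted from crawled pages
    links = [
        ("uni-a.edu", "uni-b.edu"),
        ("uni-a.edu", "uni-b.edu"),
        ("uni-b.edu", "uni-c.edu"),
        ("uni-a.edu", "uni-c.edu"),
    ]

    inter_site_counts = Counter(links)
    inlinks = Counter(target for _, target in links)

    print(inter_site_counts.most_common(1))  # strongest pairwise connection
    print(inlinks)                           # sites ranked by links received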


Author(s): Hee Beng Kuan Tan, Yuan Zhao

Today, many companies have to deal with problems in maintaining legacy database applications, which were developed on old database technology. These applications are getting harder and harder to maintain. Reengineering is an important means of addressing these problems and upgrading the applications to newer technology (Hainaut, Englebert, Henrard, Hick, & Roland, 1995). However, much of the design of a legacy database, including its data dependencies, is buried in the transactions that update the database and is not explicitly stated anywhere else. The recovery of designed data dependencies from transactions is essential both to the reengineering of database applications and to frequently encountered maintenance tasks. Without an automated approach, the recovery is difficult and time-consuming. This issue is important in data mining, as it entails mining the relationships between data from program source code. However, until recently, no such approach had been proposed in the literature.
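
To illustrate the idea (this is our toy sketch, not the authors’ approach, which would analyze programs far more rigorously), the following Python fragment scans the SQL embedded in a transaction for the tables it writes; the sample transaction and the regular expression are assumptions.

    # Recover a transaction's write set from its embedded SQL: the tables
    # named in UPDATE/INSERT statements hint at designed dependencies.
    import re

    transaction_src = """
    UPDATE account SET balance = balance - :amt WHERE id = :src;
    UPDATE account SET balance = balance + :amt WHERE id = :dst;
    INSERT INTO transfer_log (src, dst, amt) VALUES (:src, :dst, :amt);
    """

    writes = set()
    for _stmt, table in re.findall(
            r"\b(UPDATE|INSERT\s+INTO)\s+(\w+)", transaction_src, re.IGNORECASE):
        writes.add(table.lower())

    print(writes)  # {'account', 'transfer_log'}: tables this transaction updates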


Author(s): Ladjel Bellatreche, Mukesh Mohania

Recently, organizations have increasingly emphasized applications in which current and historical data are analyzed and explored comprehensively, identifying useful trends and creating summaries of the data to support high-level decision making. Every organization keeps accumulating data from different functional units so that, after integration, the data can be analyzed and important decisions can be made from the analytical results. Conceptually, a data warehouse is extremely simple. As popularized by Inmon (1992), it is a “subject-oriented, integrated, time-invariant, non-updatable collection of data used to support management decision-making processes and business intelligence”. A data warehouse is a repository into which are placed all data relevant to the management of an organization and from which emerge the information and knowledge needed to manage the organization effectively. This management can be done using data-mining techniques, comparisons of historical data, and trend analysis. For such analysis, it is vital that (1) the data be accurate, complete, consistent, well defined, and time-stamped for informational purposes, and (2) the data follow business rules and satisfy integrity constraints. Designing a data warehouse is a lengthy, time-consuming, and iterative process. Because data warehouse applications are interactive, fast query response time is a critical performance goal, and the physical design of the warehouse has therefore received the lion’s share of research in the data warehousing area. Several techniques have been developed to meet the performance requirements of such applications, including materialized views, indexing techniques, partitioning, and parallel processing. Next, we briefly outline the architecture of a data warehousing system.
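
As an illustration of why materialized views speed up interactive queries (a sketch in Python rather than SQL, and not taken from the article), the fragment below precomputes an aggregate once so that subsequent queries read a small summary instead of scanning the full fact table; the toy fact rows are invented.

    # "Materialized view" in miniature: build the aggregate once, then
    # answer interactive queries from the summary, not the fact table.
    from collections import defaultdict

    # fact table rows: (product, region, sales_amount)
    sales_facts = [
        ("tv", "east", 100.0), ("tv", "west", 250.0),
        ("radio", "east", 40.0), ("tv", "east", 75.0),
    ]

    sales_by_product_region = defaultdict(float)
    for product, region, amount in sales_facts:
        sales_by_product_region[(product, region)] += amount

    # a query now costs one dictionary lookup instead of a full scan
    print(sales_by_product_region[("tv", "east")])  # 175.0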


Author(s): Nanxiang Ge, Li Liu

During the last 10 years, and particularly within the last few years, there has been a data explosion associated with the completion of the human genome project (HGP) in 2001 (IHGMC and Venter et al., 2001) and with many sophisticated genomics technologies. The human genome, together with the genomes of other species, now provides an enormous amount of data waiting to be transformed into useful information and scientific knowledge. The availability of genome sequence data has also sparked the development of many new technology platforms. Among the different available platforms, the microarray is one that has matured considerably and has been widely used as a tool for scientific discovery. The major application of microarrays is the simultaneous measurement of the expression levels of thousands of genes in the cell. Microarrays have been widely used in drug discovery and are starting to impact the drug development process.
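
As a minimal illustration of the kind of analysis microarray data supports (our sketch, not a method from the article), the following Python fragment applies a per-gene two-sample t-test to simulated expression values; real input would be normalized array intensities.

    # Flag genes expressed differently between treated and control arrays.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_genes = 1000
    control = rng.normal(0.0, 1.0, size=(n_genes, 5))  # 5 control arrays
    treated = rng.normal(0.0, 1.0, size=(n_genes, 5))  # 5 treated arrays
    treated[:10] += 3.0                                # 10 truly changed genes

    t_stat, p_val = stats.ttest_ind(treated, control, axis=1)
    flagged = np.where(p_val < 0.01)[0]  # naive cutoff; real studies must
    print(flagged)                       # correct for multiple testing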


Author(s): Qiankun Zhao, Sourav Saha Bhowmick

Nowadays, the Web presents itself as the largest data repository ever available in the history of humankind (Reis et al., 2004). However, the availability of a huge amount of Web data does not imply that users can get what they want more easily. On the contrary, the massive amount of data on the Web has overwhelmed users’ ability to find the desired information. It has been claimed that 99% of the data reachable on the Web is useless to 99% of its users (Han & Kamber, 2000, p. 436); that is, an individual may be interested in only a tiny fragment of the Web’s data. At the same time, the sheer scale and diversity of Web data make the Web a rich and unprecedented source for data mining.
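
To illustrate the point that each user cares about only a tiny fragment of the Web, here is a deliberately simple Python sketch (not from the article) that scores pages by overlap with a user’s interest terms; the URLs, page texts, and terms are invented.

    # Rank pages by how many of the user's interest terms they contain.
    interest_terms = {"web", "mining", "patterns"}

    pages = {
        "http://example.org/a": "web mining extracts patterns from web data",
        "http://example.org/b": "recipes for sourdough bread",
        "http://example.org/c": "mining association patterns on the web",
    }

    def relevance(text: str) -> int:
        return len(interest_terms & set(text.lower().split()))

    relevant = sorted(pages, key=lambda url: relevance(pages[url]), reverse=True)
    print(relevant[0])  # the page most relevant to this user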

