Encyclopedia of Data Warehousing and Mining
Published by IGI Global (ISBN 9781591405573, 9781591405597)
Total documents: 234 | H-index: 8

Latest Publications

Author(s): Jung Hwan Oh, Jeong Kyu Lee, Sae Hwang

Data mining, defined as the process of extracting previously unknown knowledge and detecting interesting patterns from a massive set of data, has been an active research area. As a result, several commercial products and research prototypes are now available. However, most of these efforts have focused on corporate data, typically held in alpha-numeric databases, and relatively little work has been pursued on the mining of multimedia data (Zaïane, Han, & Zhu, 2000). Digital multimedia differs from previous forms of combined media in that the bits representing text, images, audio, and video can be treated as data by computer programs (Simoff, Djeraba, & Zaïane, 2002). Although these data are diverse in their underlying models and formats, they are synchronized and integrated and hence can be treated as integrated data records. The collection of such integrated data records constitutes a multimedia data set. The challenge of extracting meaningful patterns from such data sets has led to research and development in the area of multimedia data mining, a challenging field due to the non-structured nature of multimedia data. Such ubiquitous data are required in many applications, including financial, medical, advertising, and Command, Control, Communications and Intelligence (C3I) applications (Thuraisingham, Clifton, Maurer, & Ceruti, 2001). Multimedia databases are widespread, and multimedia data sets are extremely large. Tools exist for managing and searching within such collections, but the need for tools to extract hidden and useful knowledge embedded within multimedia data is becoming critical for many decision-making applications.
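
To make the notion of an integrated data record concrete, here is a minimal Python sketch (our illustration, not the authors’ representation): each record ties together the synchronized text, image, and audio features for one time segment of a video. All field names and values are assumptions.

    # Minimal sketch of an "integrated data record" combining synchronized
    # media streams; the field names and types are illustrative only.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class MultimediaRecord:
        timestamp: float              # synchronization point, in seconds
        transcript: str               # text extracted for this segment
        image_features: List[float]   # e.g. a color-histogram vector
        audio_features: List[float]   # e.g. MFCC coefficients
        shot_id: int = 0              # video shot the segment belongs to

    # A multimedia data set is then a time-ordered collection of records.
    dataset: List[MultimediaRecord] = [
        MultimediaRecord(0.0, "opening scene", [0.2, 0.5], [1.1, 0.3], shot_id=1),
        MultimediaRecord(2.5, "dialogue", [0.4, 0.1], [0.9, 0.7], shot_id=2),
    ]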


Author(s): Dimitrios Katsaros, Yannis Manolopoulos

During the past decade, we have witnessed explosive growth in our capabilities to both generate and collect data. Various data mining techniques have been proposed and widely employed to discover valid, novel, and potentially useful patterns in these data. Data mining involves the discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in huge collections of data.
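
As a toy illustration of the kind of pattern discovery described here (not an example drawn from the article), the Python sketch below counts frequent item pairs, a basic building block of association-rule mining; the transactions and the support threshold are invented.

    # Count item pairs that co-occur often enough across transactions.
    from itertools import combinations
    from collections import Counter

    transactions = [
        {"bread", "milk"},
        {"bread", "butter", "milk"},
        {"butter", "milk"},
        {"bread", "butter"},
    ]
    min_support = 2  # "frequent" = appears in at least 2 transactions

    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1

    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    print(frequent_pairs)  # all three pairs occur twice in this toy data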


Author(s): Hakikur Rahman

Today’s fast-moving business world faces continuous challenges and abrupt changes in real-life situations in the context of data and information management. In the current trend of information explosion, businesses recognize the value of the information they can gather from various sources. The information that drives business decisions can take many forms, including archived data, transactional data, e-mail, Web input, survey data, data repositories, and data marts. The organization’s business strategy should be to deliver high-quality information to the right people at the right time.


Author(s): Maria Vardaki

The term metadata is encountered in many different sciences. Statistical metadata is a term generally used to denote data about data. Modern statistical information systems (SIS) use metadata templates or complex object-oriented metadata models, making extensive and active use of metadata.
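
As a minimal illustration of such a metadata template (our sketch, not a model from the article), the following Python fragment records data about the data of a hypothetical survey; the attribute set is invented and is not a standard schema.

    # A small object-oriented metadata model: the objects describe the
    # data set and its variables rather than the observations themselves.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class VariableMetadata:
        name: str        # variable identifier, e.g. "unemployment_rate"
        unit: str        # unit of measurement, e.g. "percent"
        definition: str  # human-readable meaning of the variable

    @dataclass
    class DatasetMetadata:
        title: str
        producer: str          # agency responsible for the data
        reference_period: str  # period the data refer to
        variables: List[VariableMetadata]

    meta = DatasetMetadata(
        title="Labour Force Survey",
        producer="National Statistical Office",
        reference_period="2004-Q4",
        variables=[VariableMetadata("unemployment_rate", "percent",
                                    "Share of labour force without work")],
    )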


Author(s): Tobias Scheffer

For many classification problems, unlabeled training data are inexpensive and readily available, whereas labeling training data imposes costs. Semi-supervised classification algorithms aim at utilizing information contained in unlabeled data in addition to the (few) labeled data.
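
One common strategy fitting this description is self-training: fit a classifier on the labeled data, pseudo-label the unlabeled points it is most confident about, and refit. The Python sketch below (our illustration, not necessarily the method the article discusses) uses scikit-learn; the data and the 0.9 confidence threshold are assumptions.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_labeled = np.array([[0.0], [0.2], [1.8], [2.0]])  # few labeled points
    y_labeled = np.array([0, 0, 1, 1])
    X_unlabeled = rng.uniform(0.0, 2.0, size=(50, 1))   # cheap unlabeled points

    clf = LogisticRegression()
    for _ in range(5):  # a few self-training rounds
        clf.fit(X_labeled, y_labeled)
        if len(X_unlabeled) == 0:
            break
        proba = clf.predict_proba(X_unlabeled)
        confident = proba.max(axis=1) >= 0.9  # pseudo-label confident points only
        if not confident.any():
            break
        X_labeled = np.vstack([X_labeled, X_unlabeled[confident]])
        y_labeled = np.concatenate([y_labeled, proba[confident].argmax(axis=1)])
        X_unlabeled = X_unlabeled[~confident]

    print(clf.predict([[0.1], [1.9]]))  # expected: [0 1]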


Author(s): Mike Thelwall

Scientific Web Intelligence (SWI) is a research field that combines techniques from data mining, Web intelligence, and scientometrics to extract useful information from the links and text of academic-related Web pages using various clustering, visualization, and counting techniques. Its origins lie in previous scientometric research into mining off-line academic data sources such as journal citation databases. Typical scientometric objectives are either evaluative (assessing the impact of research) or relational (identifying patterns of communication within and among research fields). From scientometrics, SWI also inherits a need to validate its methods and results so that the methods can be justified to end users, and the causes of the results can be found and explained.
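
As a hedged illustration of the counting techniques mentioned above (not taken from the article), the Python sketch below tallies inter-site hyperlinks as a crude relational indicator; the link list stands in for links a crawler would actually extract from academic pages.

    # Count links between academic sites; inlink counts give a simple
    # impact proxy, pair counts a simple relational indicator.
    from collections import Counter

    # (source site, target site) pairs extracted from crawled pages
    links = [
        ("uni-a.edu", "uni-b.edu"),
        ("uni-a.edu", "uni-b.edu"),
        ("uni-b.edu", "uni-c.edu"),
        ("uni-a.edu", "uni-c.edu"),
    ]

    inter_site_counts = Counter(links)
    inlinks = Counter(target for _, target in links)

    print(inter_site_counts.most_common(1))  # strongest pairwise connection
    print(inlinks)                           # sites ranked by links received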


Author(s): Hee Beng Kuan Tan, Yuan Zhao

Today, many companies have to deal with problems in maintaining legacy database applications, which were developed on old database technology. These applications are getting harder and harder to maintain. Reengineering is an important means of addressing these problems and upgrading the applications to newer technology (Hainaut, Englebert, Henrard, Hick, & Roland, 1995). However, much of the design of a legacy database, including its data dependencies, is buried in the transactions that update the database and is not explicitly stated anywhere else. The recovery of designed data dependencies from transactions is essential both to the reengineering of database applications and to frequently encountered maintenance tasks. Without an automated approach, the recovery is difficult and time-consuming. This issue is important in data mining, as it entails mining the relationships between data from program source code. However, until recently, no such approach had been proposed in the literature.
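
To illustrate the idea (this is our toy sketch, not the authors’ approach, which would analyze programs far more rigorously), the following Python fragment scans the SQL embedded in a transaction for the tables it writes; the sample transaction and the regular expression are assumptions.

    # Recover a transaction's write set from its embedded SQL: the tables
    # named in UPDATE/INSERT statements hint at designed dependencies.
    import re

    transaction_src = """
    UPDATE account SET balance = balance - :amt WHERE id = :src;
    UPDATE account SET balance = balance + :amt WHERE id = :dst;
    INSERT INTO transfer_log (src, dst, amt) VALUES (:src, :dst, :amt);
    """

    writes = set()
    for _stmt, table in re.findall(
            r"\b(UPDATE|INSERT\s+INTO)\s+(\w+)", transaction_src, re.IGNORECASE):
        writes.add(table.lower())

    print(writes)  # {'account', 'transfer_log'}: tables this transaction updates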


Author(s): Ladjel Bellatreche, Mukesh Mohania

Recently, organizations have increasingly emphasized applications in which current and historical data are analyzed and explored comprehensively, identifying useful trends and creating summaries of the data to support high-level decision making. Every organization keeps accumulating data from different functional units so that, after integration, the data can be analyzed and important decisions can be made from the analytical results. Conceptually, a data warehouse is extremely simple. As popularized by Inmon (1992), it is a “subject-oriented, integrated, time-invariant, non-updatable collection of data used to support management decision-making processes and business intelligence”. A data warehouse is a repository into which are placed all data relevant to the management of an organization and from which emerge the information and knowledge needed to manage the organization effectively. This management can be done using data-mining techniques, comparisons of historical data, and trend analysis. For such analysis, it is vital that (1) the data be accurate, complete, consistent, well defined, and time-stamped for informational purposes, and (2) the data follow business rules and satisfy integrity constraints. Designing a data warehouse is a lengthy, time-consuming, and iterative process. Because data warehouse applications are interactive, fast query response time is a critical performance goal, and the physical design of the warehouse has therefore received the lion’s share of research in the data warehousing area. Several techniques have been developed to meet the performance requirements of such applications, including materialized views, indexing techniques, partitioning, and parallel processing. Next, we briefly outline the architecture of a data warehousing system.
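
As an illustration of why materialized views speed up interactive queries (a sketch in Python rather than SQL, and not taken from the article), the fragment below precomputes an aggregate once so that subsequent queries read a small summary instead of scanning the full fact table; the toy fact rows are invented.

    # "Materialized view" in miniature: build the aggregate once, then
    # answer interactive queries from the summary, not the fact table.
    from collections import defaultdict

    # fact table rows: (product, region, sales_amount)
    sales_facts = [
        ("tv", "east", 100.0), ("tv", "west", 250.0),
        ("radio", "east", 40.0), ("tv", "east", 75.0),
    ]

    sales_by_product_region = defaultdict(float)
    for product, region, amount in sales_facts:
        sales_by_product_region[(product, region)] += amount

    # a query now costs one dictionary lookup instead of a full scan
    print(sales_by_product_region[("tv", "east")])  # 175.0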


Author(s): Nanxiang Ge, Li Liu

During the last 10 years, and particularly within the last few years, there has been a data explosion associated with the completion of the human genome project (HGP) in 2001 (IHGMC and Venter et al., 2001) and with many sophisticated genomics technologies. The human genome, together with the genomes of other species, now provides an enormous amount of data waiting to be transformed into useful information and scientific knowledge. The availability of genome sequence data has also sparked the development of many new technology platforms. Among the different available platforms, the microarray is one that has matured considerably and has been widely used as a tool for scientific discovery. The major application of microarrays is the simultaneous measurement of the expression levels of thousands of genes in the cell. Microarrays have been widely used in drug discovery and are starting to impact the drug development process.
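
As a minimal illustration of the kind of analysis microarray data supports (our sketch, not a method from the article), the following Python fragment applies a per-gene two-sample t-test to simulated expression values; real input would be normalized array intensities.

    # Flag genes expressed differently between treated and control arrays.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    n_genes = 1000
    control = rng.normal(0.0, 1.0, size=(n_genes, 5))  # 5 control arrays
    treated = rng.normal(0.0, 1.0, size=(n_genes, 5))  # 5 treated arrays
    treated[:10] += 3.0                                # 10 truly changed genes

    t_stat, p_val = stats.ttest_ind(treated, control, axis=1)
    flagged = np.where(p_val < 0.01)[0]  # naive cutoff; real studies must
    print(flagged)                       # correct for multiple testing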


Author(s): Qiankun Zhao, Sourav Saha Bhowmick

Nowadays, the Web presents itself as the largest data repository ever available in the history of humankind (Reis et al., 2004). However, the availability of a huge amount of Web data does not imply that users can get what they want more easily. On the contrary, the massive amount of data on the Web has overwhelmed users’ ability to find the desired information. It has been claimed that 99% of the data reachable on the Web is useless to 99% of its users (Han & Kamber, 2000, p. 436); that is, an individual may be interested in only a tiny fragment of the Web’s data. At the same time, the sheer scale and diversity of Web data make the Web a rich and unprecedented source for data mining.
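
To illustrate the point that each user cares about only a tiny fragment of the Web, here is a deliberately simple Python sketch (not from the article) that scores pages by overlap with a user’s interest terms; the URLs, page texts, and terms are invented.

    # Rank pages by how many of the user's interest terms they contain.
    interest_terms = {"web", "mining", "patterns"}

    pages = {
        "http://example.org/a": "web mining extracts patterns from web data",
        "http://example.org/b": "recipes for sourdough bread",
        "http://example.org/c": "mining association patterns on the web",
    }

    def relevance(text: str) -> int:
        return len(interest_terms & set(text.lower().split()))

    relevant = sorted(pages, key=lambda url: relevance(pages[url]), reverse=True)
    print(relevant[0])  # the page most relevant to this user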

