Vertical Database Design for Scalable Data Mining

2008 ◽  
pp. 3694-3699
Author(s):  
William Perrizo ◽  
Qiang Ding ◽  
Masum Serazi ◽  
Taufik Abidin ◽  
Baoying Wang

For several decades and especially with the preeminence of relational database systems, data is almost always formed into horizontal record structures and then processed vertically (vertical scans of files of horizontal records). This makes good sense when the requested result is a set of horizontal records. In knowledge discovery and data mining, however, researchers are typically interested in collective properties or predictions that can be expressed very briefly. Therefore, the approaches for scan-based processing of horizontal records are known to be inadequate for data mining in very large data repositories (Han & Kamber, 2001; Han, Pei, & Yin, 2000; Shafer, Agrawal, & Mehta, 1996).
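The contrast drawn in this abstract can be illustrated with a minimal sketch. It is not the authors' method: the toy records, the column meanings, and the per-value bitmap layout are all assumptions made for illustration. A collective property (here, a count) is computed once by scanning whole horizontal records and once from a vertical layout that never reassembles a record.

```python
# Hypothetical toy data: each record is (age_band, purchased), both small integers.
records = [(3, 1), (1, 0), (2, 1), (3, 0), (3, 1)]


def count_horizontal(rows, band):
    """Scan every full record and test it: the scan-based approach."""
    return sum(1 for age_band, purchased in rows
               if age_band == band and purchased == 1)


def to_vertical(rows):
    """Build one bitmap per (column, value); bit i is set when row i holds that value."""
    bitmaps = {}
    for i, row in enumerate(rows):
        for col, value in enumerate(row):
            key = (col, value)
            bitmaps[key] = bitmaps.get(key, 0) | (1 << i)
    return bitmaps


def count_vertical(bitmaps, band):
    """AND two column bitmaps and count the set bits; no record is rebuilt."""
    both = bitmaps.get((0, band), 0) & bitmaps.get((1, 1), 0)
    return bin(both).count("1")


vertical = to_vertical(records)
# Both layouts must agree on the collective property.
assert count_horizontal(records, 3) == count_vertical(vertical, 3)
```

The vertical version touches only the two bitmaps the question needs, which is the general appeal of column-oriented layouts for aggregate queries; the horizontal version reads every field of every record.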


2008 ◽  
Vol 12 (1) ◽  
pp. 17-24
Author(s):  
Ihssan Alkadi

Data mining has recently become more popular in the information industry because huge amounts of data are now available, and industry needs to turn such data into useful information and knowledge. This information and knowledge can be used in many applications, ranging from business management, production control, and market analysis to engineering design and science exploration. Database and information technology have evolved systematically from primitive file processing systems to sophisticated and powerful database systems. Research and development in database systems has led to relational database systems, data modeling tools, and indexing and data organization techniques. In relational database systems, data are stored in relational tables, and users get convenient and flexible access to data through query languages, optimized query processing, user interfaces, transaction management, and optimized methods for On-Line Transaction Processing (OLTP). Abundant data without powerful data analysis tools has been described as a data-rich but information-poor situation. A fast-growing, tremendous amount of data is collected and stored in large and numerous databases, and humans cannot analyze it unaided, so powerful analysis tools are needed. Without such tools, data collected in large databases become data tombs: data archives that are seldom visited. Important decisions are then often based not on the information-rich data stored in databases but on a decision maker's intuition, because the decision maker does not have the tools to extract the valuable knowledge embedded in the vast amounts of data. Data mining tools that perform data analysis may uncover important data patterns, contributing greatly to business strategies, knowledge bases, and scientific and medical research. Data mining tools can thus turn data tombs into golden nuggets of knowledge.


Author(s):  
Vasudha Bhatnagar ◽  
S. K. Gupta

Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large databases” (Fayyad, Piatetsky-Shapiro & Smyth, 1996a). The recently developed KDD technology is based on a well-defined, multi-step “KDD process” for discovering knowledge from large data repositories. The basic problem addressed by the KDD process is one of mapping low-level data (operational in nature and too voluminous) to a more abstract form (descriptive approximation or model of the process that generated the data) or a useful form (for example, a predictive model) (Fayyad, Piatetsky-Shapiro & Smyth, 1996b). The KDD process evolves with pro-active intervention of the domain experts, data mining analysts, and the end-users. It is a ‘continuous’ process in the sense that the results of the process may fuel new motivations for further discoveries (Chapman et al., 2000). Modeling and planning of the KDD process has been recognized as a new research field (John, 2000). In this chapter we provide an introduction to the process of knowledge discovery in databases (KDD process), and present some models (conceptual as well as practical) to carry out the KDD endeavor.


Author(s):  
Carlos Ordonez ◽  
Javier García-García ◽  
Carlos Garcia-Alvarado ◽  
Wellington Cabrera ◽  
Veerabhadran Baladandayuthapani ◽  
...  

Author(s):  
Lutz Hamel

Modern, commercially available relational database systems now routinely include a cadre of data retrieval and analysis tools. Here we shed some light on the interrelationships between the most common tools and components included in today’s database systems: query language engines, data mining components, and on-line analytical processing (OLAP) tools. We do so by pair-wise juxtaposition which will underscore their differences and highlight their complementary value.


Author(s):  
Lutz Hamel

Modern, commercially available relational database systems now routinely include a cadre of data retrieval and analysis tools. Here we shed some light on the interrelationships between the most common tools and components included in today’s database systems: query language engines, data mining components, and online analytical processing (OLAP) tools. We do so by pair-wise juxtaposition, which will underscore their differences and highlight their complementary value.


Data Mining ◽  
2011 ◽  
pp. 55-79
Author(s):  
Herna Viktor ◽  
Eric Paquet ◽  
Gys le Roux

Data mining concerns the discovery and extraction of knowledge chunks from large data repositories. In a cooperative data mining environment, more than one data mining tool collaborates during the knowledge discovery process. This chapter describes a data mining approach used to visualize the cooperative data mining process. According to this approach, visual data mining consists of both data and knowledge visualization. First, the data are visualized during both data preprocessing and data mining. In this way, the quality of the data is assessed and improved throughout the knowledge discovery process. Second, the knowledge, as discovered by the individual learners, is assessed and modified through the interactive visualization of the cooperative data mining process and its results. The knowledge obtained from the human domain expert also forms part of the process. Finally, the use of virtual reality-based visualization is proposed as a new method to model both the data and its descriptors.

