Opportunity explorer: Navigating large databases using knowledge discovery templates

1995 ◽  
Vol 4 (1) ◽  
pp. 27-37 ◽  
Author(s):  
Tej Anand


2008 ◽  
pp. 3235-3251
Author(s):  
Yongqiao Xiao ◽  
Jenq-Foung Yao ◽  
Guizhen Yang

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees, which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like bioinformatic data. One major challenge is that embedded subtrees are no longer ordinary subtrees, but preserve only part of the ancestor-descendant relationships in the original trees. To solve the embedded subtree mining problem, we propose a novel algorithm, called TreeGrow, which is optimized in two important respects. First, it obtains frequency counts of root-to-leaf paths through efficient compression of trees, and can therefore grow an embedded subtree pattern path by path instead of node by node. Second, candidate subtree generation is highly localized so as to avoid unnecessary computational overhead. Experimental results on benchmark synthetic data sets show that our algorithm can outperform unoptimized methods by up to 20 times.
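
To make the idea of frequency counts over root-to-leaf paths concrete, here is a minimal sketch. It is not TreeGrow: it performs no tree compression and grows no embedded subtree patterns. The tree encoding as `(label, children)` pairs, the function names, and the toy Web-log data are all assumptions made for illustration.

```python
from collections import Counter


def root_to_leaf_paths(tree):
    """Yield each root-to-leaf path of `tree` as a tuple of labels.

    Assumed encoding: a tree is a (label, children) pair,
    where children is a list of trees.
    """
    label, children = tree
    if not children:
        yield (label,)
        return
    for child in children:
        for path in root_to_leaf_paths(child):
            yield (label,) + path


def path_support(trees):
    """Count, for each distinct root-to-leaf path, how many trees contain it."""
    support = Counter()
    for tree in trees:
        # set() ensures a path is counted at most once per tree
        support.update(set(root_to_leaf_paths(tree)))
    return support


def frequent_paths(trees, min_support):
    """Keep only the paths that occur in at least `min_support` trees."""
    return {p: c for p, c in path_support(trees).items() if c >= min_support}


# Hypothetical Web usage sessions, each encoded as a small tree.
sessions = [
    ("home", [("products", [("laptops", [])]), ("about", [])]),
    ("home", [("products", [("laptops", []), ("phones", [])])]),
    ("home", [("about", [])]),
]

support = path_support(sessions)
frequent = frequent_paths(sessions, min_support=2)
```

With a minimum support of 2, only the paths ending in `laptops` and `about` survive in this toy data; a path-by-path miner would then combine such frequent paths into larger patterns.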


1994 ◽  
Vol 9 (1) ◽  
pp. 57-60 ◽  
Author(s):  
Gregory Piatetsky-Shapiro

As the number and size of very large databases continue to grow rapidly, so does the need to make sense of them. This need is addressed by the field called Knowledge Discovery in Databases (KDD), which combines approaches from machine learning, statistics, intelligent databases, and knowledge acquisition. KDD encompasses a number of different discovery methods, such as clustering, data summarization, learning classification rules, finding dependency networks, analysing changes, and detecting anomalies (Matheus et al., 1993).


2002 ◽  
Vol 1 (1) ◽  
pp. 80-91 ◽  
Author(s):  
Donna J. Peuquet ◽  
Menno-Jan Kraak

In the modern computing context, the map is no longer just a final product. Maps are now being used in a fundamentally different way – as a self-directed tool for deriving the desired information from geographic data. This, along with developments in GIScience and computer graphics, has led to the new field of geographic visualization. A central issue is how to design visualization capabilities that, as a process, facilitate creative thinking for discovering previously unknown information from large databases. The authors propose the term ‘geobrowsing’ to designate this process. A number of interrelated ways that visualization can be used to spark the imagination in order to derive new insights are discussed and a brief example is provided. Based upon the cognitive literature, specific properties of a visual image that promote discovery and insight are discussed. These are known as preinventive properties, and include novelty, incongruence, abstraction, and ambiguity. All of these properties, either individually or in combination, tend to produce features that are unanticipated by the viewer, and often not explicitly created or anticipated by the person generating the visual display. While traditional (i.e. non-computer generated) images can also possess these properties, as shown in the historical examples in this discussion, it is the capability of the viewer to directly and quickly manipulate these properties that provides the real power of ‘geobrowsing’ for uncovering new insights.


1998 ◽  
Vol 07 (02) ◽  
pp. 189-220 ◽  
Author(s):  
ROBERT J. HILDERMAN ◽  
HOWARD J. HAMILTON ◽  
COLIN L. CARTER ◽  
NICK CERCONE

We propose the share-confidence framework for knowledge discovery from databases, which addresses the problem of mining characterized association rules from market basket data (i.e., itemsets). Our goal is not only to discover the buying patterns of customers, but also to discover customer profiles by partitioning customers into distinct classes. We present a new algorithm for classifying itemsets based upon characteristic attributes extracted from census or lifestyle data. Our algorithm combines the Apriori algorithm for discovering association rules between items in large databases, and the AOG algorithm for attribute-oriented generalization in large databases. We show how characterized itemsets can be generalized according to concept hierarchies associated with the characteristic attributes. Finally, we present experimental results that demonstrate the utility of the share-confidence framework.
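
For readers unfamiliar with the Apriori step mentioned above, the following is a minimal sketch of level-wise frequent itemset mining with the classic support and confidence measures. It does not implement the share measure, the characteristic attributes, or the AOG generalization described in the abstract; the function names and the toy baskets are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent from support counts."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

frequent = apriori(baskets, min_support=2)
butter_to_bread = confidence(frequent, {"butter"}, {"bread"})
```

In this toy data every basket that contains butter also contains bread, so the rule butter -> bread has confidence 1.0, while the pair {milk, butter} appears only once and is dropped at the second level.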


Author(s):  
A. de Carvalho ◽  
A. P. Braga ◽  
S. O. Rezende ◽  
E. Martineli ◽  
T. Ludermir

In the last few years, a large number of companies have started to realize the value of their databases. These databases, which usually cover transactions performed over several years, may lead to a better understanding of the customer’s profile, thus supporting the offer of new products or services. The treatment of these large databases surpasses the human ability to understand and efficiently deal with these data, creating the need for a new generation of tools and techniques to perform automatic and intelligent analyses of large databases. The extraction of useful knowledge from large databases is named knowledge discovery. Knowledge discovery is a very demanding task and requires the use of sophisticated techniques. Recent advances in hardware and software make possible the development of new computing tools to support such tasks. Knowledge discovery in databases comprises a sequence of stages. One of its main stages, the data mining process, provides efficient methods and tools to extract meaningful information from large databases. In this chapter, data mining methods are used to predict the behavior of credit card users. These methods are employed to extract meaningful knowledge from a credit card database using machine learning techniques. The performance of these techniques is compared by analyzing both their correct classification rates and the knowledge extracted in a linguistic representation (rule sets or decision trees). The use of a linguistic representation for expressing knowledge acquired by learning systems aims to improve the user’s understanding. Under this assumption, and to make sure that these systems will be accepted, several techniques have been developed by the artificial intelligence community, using both the symbolic and the connectionist approaches.


Author(s):  
Vasudha Bhatnagar ◽  
S. K. Gupta

Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large databases” (Fayyad, Piatetsky-Shapiro & Smyth, 1996a). The recently developed KDD technology is based on a well-defined, multi-step “KDD process” for discovering knowledge from large data repositories. The basic problem addressed by the KDD process is one of mapping low-level data (operational in nature and too voluminous) to a more abstract form (a descriptive approximation or model of the process that generated the data) or a useful form (for example, a predictive model) (Fayyad, Piatetsky-Shapiro & Smyth, 1996b). The KDD process evolves with the pro-active intervention of domain experts, data mining analysts and end-users. It is a ‘continuous’ process in the sense that the results of the process may fuel new motivations for further discoveries (Chapman et al., 2000). Modeling and planning of the KDD process has been recognized as a new research field (John, 2000). In this chapter we provide an introduction to the process of knowledge discovery in databases (KDD process), and present some models (conceptual as well as practical) to carry out the KDD endeavor.

