Opportunity explorer: Navigating large databases using knowledge discovery templates

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees, which is motivated by important applications such as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures in tree-like bioinformatics data. A major challenge is that embedded subtrees are not ordinary subtrees: they preserve only part of the ancestor-descendant relationships of the original trees. To solve the embedded subtree mining problem, we propose a novel algorithm, called TreeGrow, which is optimized in two important respects. First, it obtains frequency counts of root-to-leaf paths through efficient compression of trees, so it can grow an embedded subtree pattern quickly, path by path instead of node by node. Second, candidate subtree generation is highly localized, which avoids unnecessary computational overhead. Experimental results on synthetic benchmark data sets show that our algorithm can outperform unoptimized methods by a factor of up to 20.
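The path-counting idea in the abstract can be illustrated with a small sketch. The following Python is not TreeGrow: it performs no tree compression and no localized candidate generation, and the `(label, children)` tree encoding and all function names are assumptions made for illustration. It shows two ingredients the abstract relies on: support counts of root-to-leaf label paths, and the embedded relation, under which a path pattern matches when each label is an ancestor of the next, so intermediate nodes may be skipped.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical encoding: a tree is (label, [child trees]).
Tree = Tuple[str, List["Tree"]]
Path = Tuple[str, ...]


def root_to_leaf_paths(tree: Tree) -> List[Path]:
    """Return every root-to-leaf sequence of labels in the tree."""
    label, children = tree
    if not children:
        return [(label,)]
    return [(label,) + sub for child in children for sub in root_to_leaf_paths(child)]


def path_support(forest: List[Tree]) -> Counter:
    """Number of trees in which each distinct root-to-leaf path occurs.

    Each tree contributes at most once per path, so the count is a support
    value rather than a raw occurrence count.
    """
    support: Counter = Counter()
    for tree in forest:
        support.update(set(root_to_leaf_paths(tree)))
    return support


def is_subsequence(pattern: Path, path: Path) -> bool:
    """True if the pattern's labels appear in order (not necessarily adjacent) in path."""
    it = iter(path)
    return all(label in it for label in pattern)


def embedded_path_support(pattern: Path, forest: List[Tree]) -> int:
    """Trees containing `pattern` as an embedded path: each label is an
    ancestor of the next, so intermediate nodes may be skipped."""
    return sum(
        any(is_subsequence(pattern, p) for p in root_to_leaf_paths(tree))
        for tree in forest
    )


forest = [
    ("A", [("B", [("C", [])]), ("D", [])]),
    ("A", [("C", [])]),
    ("B", [("C", [])]),
]
print(path_support(forest)[("A", "B", "C")])      # 1
print(embedded_path_support(("A", "C"), forest))  # 2
```

In the toy forest, `("A", "C")` is not a root-to-leaf path of the first tree, yet it occurs there as an embedded path because B can be skipped; this gap between ordinary and embedded matches is what makes embedded subtree mining harder than ordinary subtree mining.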

Efficient probability density balancing for supporting distributed knowledge discovery in large databases

IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339) ◽

10.1109/ijcnn.1999.833472 ◽

2003 ◽

Author(s):

D. Obradovic ◽

Z. Obradovic

Keyword(s):

Knowledge Discovery ◽

Probability Density ◽

Distributed Knowledge ◽

Large Databases

Knowledge Discovery from Very Large Databases Using Frequent Concept Lattices

Machine Learning: ECML 2000 - Lecture Notes in Computer Science ◽

10.1007/3-540-45164-1_44 ◽

2000 ◽

pp. 437-445 ◽

Cited By ~ 6

Author(s):

Kitsana Waiyamai ◽

Lotfi Lakhal

Keyword(s):

Knowledge Discovery ◽

Concept Lattices ◽

Large Databases ◽

Very Large Databases

Knowledge discovery in databases: Progress report

The Knowledge Engineering Review ◽

10.1017/s0269888900006573 ◽

1994 ◽

Vol 9 (1) ◽

pp. 57-60 ◽

Cited By ~ 5

Author(s):

Gregory Piatetsky-Shapiro

Keyword(s):

Knowledge Discovery ◽

Progress Report ◽

Knowledge Discovery In Databases ◽

Classification Rules ◽

Data Summarization ◽

Dependency Networks ◽

Large Databases ◽

Very Large Databases ◽

Intelligent Databases ◽

Clustering Data

As the number and size of very large databases continue to grow rapidly, so does the need to make sense of them. This need is addressed by the field called Knowledge Discovery in Databases (KDD), which combines approaches from machine learning, statistics, intelligent databases, and knowledge acquisition. KDD encompasses a number of different discovery methods, such as clustering, data summarization, learning classification rules, finding dependency networks, analysing changes, and detecting anomalies (Matheus et al., 1993).

Self-organizing systems for knowledge discovery in large databases

IJCNN'99. International Joint Conference on Neural Networks. Proceedings (Cat. No.99CH36339) ◽

10.1109/ijcnn.1999.833461 ◽

2003 ◽

Cited By ~ 1

Author(s):

W.H. Hsu ◽

L.S. Auvil ◽

W.M. Pottenger ◽

D. Tcheng ◽

M. Welge

Keyword(s):

Knowledge Discovery ◽

Large Databases ◽

Self Organizing

Geobrowsing: Creative Thinking and Knowledge Discovery Using Geographic Visualization

Information Visualization ◽

10.1057/palgrave.ivs.9500007 ◽

2002 ◽

Vol 1 (1) ◽

pp. 80-91 ◽

Cited By ~ 21

Author(s):

Donna J. Peuquet ◽

Menno-Jan Kraak

Keyword(s):

Computer Graphics ◽

Knowledge Discovery ◽

Creative Thinking ◽

Visual Image ◽

Visual Display ◽

Geographic Visualization ◽

Real Power ◽

New Information ◽

Large Databases ◽

Product Maps

In the modern computing context, the map is no longer just a final product. Maps are now being used in a fundamentally different way: as a self-directed tool for deriving the desired information from geographic data. This, along with developments in GIScience and computer graphics, has led to the new field of geographic visualization. A central issue is how to design visualization capabilities that, as a process, facilitate creative thinking for discovering new information in large databases. The authors propose the term ‘geobrowsing’ to designate this process. A number of interrelated ways in which visualization can be used to spark the imagination and derive new insights are discussed, and a brief example is provided. Drawing on the cognitive literature, the authors discuss specific properties of a visual image that promote discovery and insight. These are known as preinventive properties and include novelty, incongruence, abstraction, and ambiguity. All of these properties, individually or in combination, tend to produce features that are unanticipated by the viewer and often not explicitly created or anticipated by the person generating the visual display. While traditional (i.e. non-computer-generated) images can also possess these properties, as the historical examples in this discussion show, it is the viewer's capability to manipulate these properties directly and quickly that provides the real power of ‘geobrowsing’ for uncovering new insights.

Mining Association Rules from Market Basket Data Using Share Measures and Characterized Itemsets

International Journal of Artificial Intelligence Tools ◽

10.1142/s0218213098000111 ◽

1998 ◽

Vol 07 (02) ◽

pp. 189-220 ◽

Cited By ~ 14

Author(s):

Robert J. Hilderman ◽

Howard J. Hamilton ◽

Colin L. Carter ◽

Nick Cercone

Keyword(s):

Knowledge Discovery ◽

Association Rules ◽

A Priori ◽

Experimental Results ◽

Market Basket ◽

Large Databases ◽

Mining Association Rules ◽

Customer Profiles ◽

Concept Hierarchies

We propose the share-confidence framework for knowledge discovery from databases, which addresses the problem of mining characterized association rules from market basket data (i.e., itemsets). Our goal is not only to discover the buying patterns of customers, but also to discover customer profiles by partitioning customers into distinct classes. We present a new algorithm for classifying itemsets based on characteristic attributes extracted from census or lifestyle data. Our algorithm combines the Apriori algorithm for discovering association rules between items in large databases with the AOG algorithm for attribute-oriented generalization in large databases. We show how characterized itemsets can be generalized according to concept hierarchies associated with the characteristic attributes. Finally, we present experimental results that demonstrate the utility of the share-confidence framework.
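For readers unfamiliar with the Apriori step this framework builds on, here is a minimal sketch of level-wise frequent-itemset mining followed by confidence-based rule generation. It uses plain transaction counts as support; it does not implement the share measure, characterized itemsets, or AOG generalization from the paper, and the function names and the set-of-item-names transaction format are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Return every itemset contained in at least `min_support` transactions."""
    # Level 1: count individual items.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = list(frequent)
        # Join: unite pairs of frequent (k-1)-itemsets that form a k-itemset.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a frequent k-itemset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        # Count surviving candidates in one pass over the data.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


def association_rules(freq: Dict[Itemset, int],
                      min_conf: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Rules A -> B with confidence support(A | B) / support(A) >= min_conf."""
    rules = []
    for itemset, support in freq.items():
        for r in range(1, len(itemset)):
            for antecedent in combinations(itemset, r):
                a = frozenset(antecedent)
                conf = support / freq[a]  # subsets of frequent itemsets are frequent
                if conf >= min_conf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical market baskets, for illustration only.
baskets = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
freq = apriori(baskets, min_support=2)
for a, b, conf in association_rules(freq, min_conf=0.9):
    print(sorted(a), "->", sorted(b), round(conf, 2))  # ['butter'] -> ['bread'] 1.0
```

With these four hypothetical baskets and a minimum support of 2, {milk, butter} occurs only once, so the candidate {bread, milk, butter} is pruned without being counted; this subset-based pruning is the core of Apriori's efficiency on large databases.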

Understanding Credit Card User's Behaviour

Heuristic and Optimization for Knowledge Discovery ◽

10.4018/978-1-930708-26-6.ch013 ◽

2011 ◽

pp. 241-262 ◽

Cited By ~ 1

Author(s):

A. de Carvalho ◽

A. P. Braga ◽

S. O. Rezende ◽

E. Martineli ◽

T. Ludermir

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Credit Card ◽

Machine Learning Techniques ◽

Knowledge Discovery In Databases ◽

Intelligence Community ◽

Linguistic Representation ◽

Large Databases ◽

Rule Sets ◽

Tools And Techniques

In the last few years, a large number of companies have started to realize the value of their databases. These databases, which usually cover transactions performed over several years, may lead to a better understanding of customer profiles, thus supporting the offer of new products or services. Handling these large databases surpasses the human ability to understand and deal efficiently with the data, creating the need for a new generation of tools and techniques to perform automatic and intelligent analyses of large databases. The extraction of useful knowledge from large databases is called knowledge discovery. Knowledge discovery is a very demanding task and requires sophisticated techniques. Recent advances in hardware and software make possible the development of new computing tools to support such tasks. Knowledge discovery in databases comprises a sequence of stages. One of its main stages, data mining, provides efficient methods and tools to extract meaningful information from large databases. In this chapter, data mining methods are used to predict the behavior of credit card users. These methods employ machine learning techniques to extract meaningful knowledge from a credit card database. The performance of these techniques is compared by analyzing both their correct classification rates and the knowledge extracted in a linguistic representation (rule sets or decision trees). Expressing the knowledge acquired by learning systems in a linguistic representation aims to improve user understanding. Under this assumption, and to ensure that these systems will be accepted, the artificial intelligence community has developed several techniques, using both symbolic and connectionist approaches.
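The evaluation criteria described in this abstract, a correct classification rate plus a human-readable rule set, can be sketched as follows. This is not the chapter's method: the rules, the attribute names (`late_payments`, `utilisation`, `income`), and the records are hypothetical and hand-written rather than learned, and serve only to show how an ordered IF-THEN rule list is applied and scored.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


@dataclass
class Rule:
    """One IF-THEN rule in both readable and executable form."""
    text: str                            # linguistic form shown to the user
    condition: Callable[[Record], bool]
    label: str


def classify(rules: List[Rule], default: str, record: Record) -> str:
    """Ordered rule list: the first rule whose condition holds decides the class."""
    for rule in rules:
        if rule.condition(record):
            return rule.label
    return default


def correct_classification_rate(rules: List[Rule], default: str,
                                data: List[Tuple[Record, str]]) -> float:
    """Fraction of labelled records whose predicted class matches the true class."""
    hits = sum(classify(rules, default, x) == y for x, y in data)
    return hits / len(data)


# Hypothetical hand-written rules and records, for illustration only.
rules = [
    Rule("IF late_payments >= 3 THEN bad",
         lambda r: r["late_payments"] >= 3, "bad"),
    Rule("IF utilisation > 0.9 AND income < 20000 THEN bad",
         lambda r: r["utilisation"] > 0.9 and r["income"] < 20000, "bad"),
]
data = [
    ({"late_payments": 4, "utilisation": 0.5, "income": 30000}, "bad"),
    ({"late_payments": 0, "utilisation": 0.95, "income": 15000}, "bad"),
    ({"late_payments": 1, "utilisation": 0.3, "income": 50000}, "good"),
    ({"late_payments": 0, "utilisation": 0.2, "income": 40000}, "bad"),
]
for rule in rules:
    print(rule.text)
rate = correct_classification_rate(rules, "good", data)
print(f"correct classification rate: {rate:.2f}")  # 0.75
```

Three of the four toy records are classified correctly; the last is misclassified because no rule fires and the default class applies. Keeping each rule's readable text alongside its executable condition is one simple way to report accuracy and the linguistic representation together.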

Modeling the KDD Process

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch207 ◽

2011 ◽

pp. 1337-1345 ◽

Cited By ~ 1

Author(s):

Vasudha Bhatnagar ◽

S. K. Gupta

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Continuous Process ◽

Large Data ◽

Research Field ◽

Knowledge Discovery In Databases ◽

Data Repositories ◽

Domain Experts ◽

Large Databases ◽

New Research

Knowledge Discovery in Databases (KDD) is classically defined as the “nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large databases” (Fayyad, Piatetsky-Shapiro & Smyth, 1996a). The recently developed KDD technology is based on a well-defined, multi-step “KDD process” for discovering knowledge from large data repositories. The basic problem addressed by the KDD process is one of mapping low-level data (operational in nature and too voluminous) to a more abstract form (a descriptive approximation or model of the process that generated the data) or a useful form (for example, a predictive model) (Fayyad, Piatetsky-Shapiro & Smyth, 1996b). The KDD process evolves with the proactive intervention of domain experts, data mining analysts, and end-users. It is a ‘continuous’ process in the sense that its results may fuel new motivations for further discoveries (Chapman et al., 2000). Modeling and planning of the KDD process has been recognized as a new research field (John, 2000). In this chapter we provide an introduction to the process of knowledge discovery in databases (KDD process) and present some models (conceptual as well as practical) for carrying out the KDD endeavor.

Genetic Algorithm for Optimization of Multiple Objectives in Knowledge Discovery from Large Databases

Multi-Objective Evolutionary Algorithms for Knowledge Discovery from Databases - Studies in Computational Intelligence ◽

10.1007/978-3-540-77467-9_1 ◽

2008 ◽

pp. 1-22 ◽

Cited By ~ 3

Author(s):

Satchidananda Dehuri ◽

Susmita Ghosh ◽

Ashish Ghosh

Keyword(s):

Genetic Algorithm ◽

Knowledge Discovery ◽

Multiple Objectives ◽

Large Databases