Knowledge Discovery from Very Large Databases Using Frequent Concept Lattices

Author(s):  
Kitsana Waiyamai ◽  
Lotfi Lakhal
1994 ◽  
Vol 9 (1) ◽  
pp. 57-60 ◽  
Author(s):  
Gregory Piatetsky-Shapiro

As the number and size of very large databases continue to grow rapidly, so does the need to make sense of them. This need is addressed by the field of Knowledge Discovery in Databases (KDD), which combines approaches from machine learning, statistics, intelligent databases, and knowledge acquisition. KDD encompasses a number of different discovery methods, such as clustering, data summarization, learning classification rules, finding dependency networks, analyzing changes, and detecting anomalies (Matheus et al., 1993).


Author(s):  
Gautam Das

In recent years, advances in data collection and management technologies have led to a proliferation of very large databases. These large data repositories typically are created in the hope that, through analysis such as data mining and decision support, they will yield new insights into the data and the real-world processes that created them. In practice, however, while the collection and storage of massive datasets has become relatively straightforward, effective data analysis has proven more difficult to achieve. One reason that data analysis successes have proven elusive is that most analysis queries, by their nature, require aggregation or summarization of large portions of the data being analyzed. For multi-gigabyte data repositories, this means that processing even a single analysis query involves accessing enormous amounts of data, leading to prohibitively expensive running times. This severely limits the feasibility of many types of analysis applications, especially those that depend on timeliness or interactivity.
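The cost argument above can be made concrete with a toy approximate-aggregation sketch: instead of scanning every row to answer an aggregate query, estimate it from a small uniform sample and scale the result up. This is an illustrative sketch only; the function names, the 1% sampling fraction, and the in-memory list standing in for a large table are assumptions, not the abstract's actual techniques.

```python
import random

def exact_sum(rows):
    # Full scan: touches every row, which is what makes aggregation
    # over multi-gigabyte tables prohibitively expensive.
    return sum(rows)

def sampled_sum(rows, sample_fraction=0.01, seed=42):
    # Uniform-sample estimate: scan only a small fraction of rows
    # and scale the partial sum back up by 1/sample_fraction.
    rng = random.Random(seed)
    k = max(1, int(len(rows) * sample_fraction))
    sample = rng.sample(rows, k)
    return sum(sample) * (len(rows) / k)

data = list(range(1_000_000))   # stand-in for a large fact table
print(exact_sum(data))          # exact answer, full scan
print(round(sampled_sum(data))) # approximate answer from ~1% of the data
```

The trade-off is exactly the one the abstract describes: the sampled estimate carries some error, but its running time is proportional to the sample size rather than to the full data size, which restores interactivity.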


Author(s):  
Andrew Borthwick ◽  
Stephen Ash ◽  
Bin Pang ◽  
Shehzad Qureshi ◽  
Timothy Jones

Author(s):  
Ivan Bruha

Research in intelligent information systems investigates the possibilities of enhancing their overall performance, particularly their prediction accuracy and time complexity. One such discipline, data mining (DM), typically processes very large databases in a profound and robust way (Fayyad et al., 1996). DM refers to the overall process of deriving useful knowledge from databases, that is, extracting high-level knowledge from low-level data in the context of large databases. This article discusses two newer directions in this field, namely knowledge combination and meta-learning (Vilalta & Drissi, 2002). There exist approaches that combine various paradigms into one robust (hybrid, multistrategy) system which utilizes the advantages of each subsystem and tries to eliminate their drawbacks. There is a general belief that integrating results obtained from multiple lower-level decision-making systems, each usually (but not necessarily) based on a different paradigm, produces better performance. Such multi-level knowledge-based systems are usually referred to as knowledge integration systems. One subset of these systems is called knowledge combination (Fan et al., 1996). We focus on a common topology of the knowledge combination strategy with base learners and base classifiers (Bruha, 2004). Meta-learning investigates how learning systems may improve their performance through experience in order to become flexible. Its goal is to search dynamically for the best learning strategy. We define the fundamental characteristics of meta-learning, such as bias and hypothesis space. Section 2 surveys the various directions in algorithms and topologies utilized in knowledge combination and meta-learning. Section 3 presents the main focus of this article: a description of knowledge combination techniques, meta-learning, and a particular application, including the corresponding flow charts. The last section presents future trends in these topics.
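The knowledge-combination topology described above — several base classifiers whose decisions a higher-level combiner integrates — can be sketched minimally. The base classifiers and the plurality-vote combiner below are hypothetical stand-ins for illustration, not Bruha's actual components:

```python
from collections import Counter

# Hypothetical base classifiers, standing in for learners built on
# different paradigms (rule-based, threshold-based, heuristic).
def rule_based(x):
    return "positive" if x["score"] > 0.5 else "negative"

def threshold_based(x):
    return "positive" if x["votes"] >= 3 else "negative"

def heuristic(x):
    return "positive" if x["score"] + 0.1 * x["votes"] > 0.7 else "negative"

def combine(classifiers, x):
    # Knowledge combination by plurality vote: each base classifier
    # contributes one decision; the combiner returns the most common one.
    decisions = [clf(x) for clf in classifiers]
    return Counter(decisions).most_common(1)[0][0]

example = {"score": 0.6, "votes": 2}
print(combine([rule_based, threshold_based, heuristic], example))
```

A meta-learner would go one step further than this fixed vote: it would learn, from experience, which combiner or which weighting of base classifiers performs best.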


2008 ◽  
pp. 3235-3251
Author(s):  
Yongqiao Xiao ◽  
Jenq-Foung Yao ◽  
Guizhen Yang

Recent years have witnessed a surge of research interest in knowledge discovery from data domains with complex structures, such as trees and graphs. In this paper, we address the problem of mining maximal frequent embedded subtrees, which is motivated by such important applications as mining “hot” spots of Web sites from Web usage logs and discovering significant “deep” structures from tree-like bioinformatic data. One major challenge arises from the fact that embedded subtrees are no longer ordinary subtrees, but preserve only part of the ancestor-descendant relationships in the original trees. To solve the embedded subtree mining problem, we propose a novel algorithm, called TreeGrow, which is optimized in two important respects. First, it obtains frequency counts of root-to-leaf paths through efficient compression of trees, thereby being able to quickly grow an embedded subtree pattern path by path instead of node by node. Second, candidate subtree generation is highly localized so as to avoid unnecessary computational overhead. Experimental results on benchmark synthetic data sets have shown that our algorithm can outperform unoptimized methods by up to 20 times.
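The first optimization — growing patterns path by path from frequency counts of root-to-leaf paths — rests on counting how often each root-to-leaf label path occurs across a forest of trees. A minimal sketch of that counting step, using a simple nested-tuple tree encoding as an assumption rather than TreeGrow's compressed representation:

```python
from collections import Counter

# A tree is a (label, children) pair; children is a list of subtrees.
# This plain nested-tuple encoding is a hypothetical stand-in, not the
# compressed representation TreeGrow itself uses.
def root_to_leaf_paths(tree):
    label, children = tree
    if not children:
        return [(label,)]
    paths = []
    for child in children:
        for p in root_to_leaf_paths(child):
            paths.append((label,) + p)
    return paths

def frequent_paths(forest, min_support):
    # Count each distinct root-to-leaf label path once per tree it
    # occurs in, then keep those meeting the minimum support threshold.
    counts = Counter()
    for tree in forest:
        counts.update(set(root_to_leaf_paths(tree)))
    return {p: c for p, c in counts.items() if c >= min_support}

forest = [
    ("a", [("b", []), ("c", [("d", [])])]),
    ("a", [("c", [("d", [])])]),
    ("a", [("b", [])]),
]
print(frequent_paths(forest, min_support=2))
```

In TreeGrow these frequent paths become the units by which an embedded subtree pattern is extended, one whole path at a time, instead of extending node by node.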


Author(s):  
Rashed Mustafa ◽  
Md Javed Hossain ◽  
Thomas Chowdhury

Distributed Database Management System (DDBMS) is one of the prime concerns in distributed computing. The driving force behind the development of DDBMSs is the demand from applications that need to query very large databases (on the order of terabytes). Traditional client-server database systems are too slow to handle such applications. This paper presents a better way to find the optimal number of nodes in a distributed database management system.

Keywords: DDBMS, Data Fragmentation, Linear Search, RMI.

DOI: 10.3329/diujst.v4i2.4362
Daffodil International University Journal of Science and Technology Vol.4(2) 2009 pp.19-22
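The linear-search idea from the keywords can be illustrated with a toy cost model: per-node scan work shrinks as nodes are added, while coordination overhead grows, and a linear search over candidate cluster sizes picks the minimum. The cost function and its constants below are hypothetical assumptions for illustration, not measurements or formulas from the paper:

```python
def total_cost(n_nodes, scan_cost=1000.0, overhead_per_node=5.0):
    # Hypothetical cost model: scan work parallelizes across nodes,
    # while coordination/communication overhead grows with node count.
    return scan_cost / n_nodes + overhead_per_node * n_nodes

def optimal_nodes(max_nodes=100):
    # Linear search over candidate cluster sizes, keeping the size
    # with the lowest modeled total cost.
    best_n = 1
    for n in range(1, max_nodes + 1):
        if total_cost(n) < total_cost(best_n):
            best_n = n
    return best_n

print(optimal_nodes())
```

Under this model the optimum sits where the two terms balance, near the square root of the ratio of scan cost to per-node overhead; any real DDBMS would need measured costs in place of these constants.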

