Developments in Data Extraction, Management, and Analysis

Automatic Item Weight Generation for Pattern Mining and its Application

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch009 ◽

2013 ◽

pp. 187-207 ◽

Cited By ~ 1

Author(s):

Yun Sing Koh ◽

Russel Pears ◽

Gillian Dobbie

Keyword(s):

Uniform Distribution ◽

Real World ◽

Association Rule ◽

Association Rule Mining ◽

Natural Variation ◽

Pattern Mining ◽

Weighting Scheme ◽

Rule Mining ◽

Real World Application

Association rule mining discovers relationships among items in a transactional database. Most approaches assume that all items within a dataset have a uniform distribution with respect to support. However, this is not always the case, and weighted association rule mining (WARM) was introduced to assign importance to individual items. Previous approaches to the weighted association rule mining problem require users to assign weights to items, yet in certain cases it is difficult to provide weights for all items within a dataset. In this paper, the authors propose a method based on a novel Valency model that automatically infers item weights from the interactions between items. The authors' experiment shows that the weighting scheme results in rules that better capture the natural variation that occurs in a dataset when compared with a miner that does not employ a weighting scheme. The authors applied the model in a real-world application to mine text from a given collection of documents. The use of item weighting enabled the authors to attach more importance to terms that are distinctive. The results demonstrate that keyword discrimination via item weighting leads to informative rules.
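To make the contrast between unweighted and weighted mining concrete, here is a minimal sketch. It is not the Valency model from the chapter: the weights are supplied by hand, and the function names, the sample transactions, and the choice to scale support by the mean item weight are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Iterable, List


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def weighted_support(
    itemset: Iterable[str],
    transactions: List[FrozenSet[str]],
    weights: Dict[str, float],
) -> float:
    """Support scaled by the mean weight of the items (one common convention)."""
    items = frozenset(itemset)
    mean_weight = sum(weights[i] for i in items) / len(items)
    return mean_weight * support(items, transactions)


# Hypothetical toy data: "caviar" is rare but considered important.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "caviar"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "caviar"}),
]
weights = {"bread": 0.2, "milk": 0.2, "caviar": 1.0}  # hand-assigned, hypothetical

plain = support({"bread", "caviar"}, transactions)
weighted_common = weighted_support({"bread", "milk"}, transactions, weights)
weighted_rare = weighted_support({"bread", "caviar"}, transactions, weights)
```

In this toy data both itemsets have the same plain support, but the weighted score ranks the itemset containing the highly weighted item above the other. A scheme that infers weights automatically would replace the hand-written `weights` dictionary.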

Finding Associations in Composite Data Sets

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch008 ◽

2013 ◽

pp. 162-186

Author(s):

M. Sulaiman Khan ◽

Maybin Muyeba ◽

Frans Coenen ◽

David Reid ◽

Hissam Tawfik

Keyword(s):

Synthetic Data ◽

Data Sets ◽

Formal Definition ◽

Rule Mining ◽

Fuzzy Association Rules ◽

Back Ground ◽

Fuzzy Association Rule ◽

Definition Of ◽

Fuzzy Association Rule Mining ◽

Composite Data

In this paper, a composite fuzzy association rule mining mechanism (CFARM), directed at identifying patterns in datasets composed of composite attributes, is described. Composite attributes are defined as attributes that can simultaneously take two or more values that subscribe to a common schema. The objective is to generate fuzzy association rules using “properties” associated with these composite attributes. The exemplar application is the analysis of the nutrients contained in items found in grocery data sets. The paper commences with a review of the background and related work, and a formal definition of the CFARM concepts. The CFARM algorithm is then fully described and evaluated using both real and synthetic data sets.

Data Field for Hierarchical Clustering

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch014 ◽

2013 ◽

pp. 303-324

Author(s):

Shuliang Wang ◽

Wenyan Gan ◽

Deyi Li ◽

Deren Li

Keyword(s):

Hierarchical Clustering ◽

Physical Space ◽

Core Data ◽

Data Object ◽

Self Organized ◽

Data Field ◽

Data Objects ◽

The Masses ◽

Space Data ◽

Object Features

In this paper, a data field is proposed to group data objects for hierarchical clustering by simulating their mutual interactions and opposite movements. Inspired by fields in physical space, a data field that simulates the nuclear field is presented to illuminate the interaction between objects in data space. In the data field, the self-organized process of equipotential lines over many data objects reveals their hierarchical clustering characteristics. During the clustering process, a random sample is first generated to optimize the impact factor. The masses of the data objects are then estimated to select core data objects with nonzero masses. Taking the core data objects as the initial clusters, the clusters are iteratively merged hierarchy by hierarchy with good performance. The results of a case study show that the data field is capable of hierarchical clustering on objects of varying size, shape, or granularity without user-specified parameters, while also considering the object features inside the clusters and removing outliers from noisy data. The comparisons indicate that data field clustering performs better than K-means, BIRCH, CURE, and CHAMELEON.
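The idea of a potential that each data object contributes to its surroundings can be sketched as follows. The abstract does not state the potential function, so the short-range Gaussian-like form used here, the parameter name `sigma` for the impact factor, and the sample points are assumptions for illustration only.

```python
import math
from typing import Sequence


def potential(
    x: Sequence[float],
    points: Sequence[Sequence[float]],
    masses: Sequence[float],
    sigma: float,
) -> float:
    """Sum of the contributions of all data objects at position `x`.

    Each object contributes mass * exp(-(distance / sigma) ** 2), so its
    influence fades quickly with distance (an assumed short-range form).
    """
    total = 0.0
    for p, m in zip(points, masses):
        dist = math.dist(x, p)  # Euclidean distance, Python 3.8+
        total += m * math.exp(-((dist / sigma) ** 2))
    return total


# Hypothetical data: two nearby objects and one distant object.
points = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0)]
masses = [1.0, 1.0, 1.0]

near_pair = potential((0.25, 0.0), points, masses, sigma=1.0)
far_away = potential((2.5, 2.5), points, masses, sigma=1.0)
```

Positions between the two nearby objects receive a much higher potential than positions in the empty region, which is the kind of structure that equipotential lines would trace out.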

Incremental Algorithm for Discovering Frequent Subsequences in Multiple Data Streams

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch012 ◽

2013 ◽

pp. 259-279

Author(s):

Reem Al-Mulla ◽

Zaher Al Aghbari

Keyword(s):

Data Streams ◽

Data Stream ◽

Clustering Algorithms ◽

Arrival Rate ◽

Time Algorithm ◽

Incremental Algorithm ◽

Multiple Data ◽

Multiple Data Streams ◽

Mining Data Streams ◽

New Applications

In recent years, new applications that produce data streams, such as stock data and sensor networks, have emerged. Therefore, finding frequent subsequences, or clusters of subsequences, in data streams is an essential task in data mining. Data streams are continuous in nature, unbounded in size, and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to find clusters in data streams effectively. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The described approach finds frequent subsequences by clustering the subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. Further, it does not recompute the clustering results for the whole data stream at every window, but rather builds on the clustering results of previous windows. The proposed approach also employs a decay value for each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the proposed algorithm is efficient, as it scans the data streams only once, and it is considered an any-time algorithm, since the frequent subsequences are ready at the end of every window.
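The combination of a window model, incremental updates, and a per-cluster decay value can be sketched in a few lines. This is a simplified illustration, not the chapter's algorithm: each stream element is a single number rather than a subsequence, and the names `process_window`, `radius`, `decay`, and `min_weight` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cluster:
    centroid: float
    count: int
    weight: float = 1.0  # freshness; decays every window without new members


def process_window(
    clusters: List[Cluster],
    window: List[float],
    radius: float = 1.0,
    decay: float = 0.5,
    min_weight: float = 0.2,
) -> List[Cluster]:
    """Update existing clusters with one window instead of reclustering everything."""
    # Age every cluster carried over from earlier windows.
    for c in clusters:
        c.weight *= decay
    for value in window:
        nearest = min(clusters, key=lambda c: abs(c.centroid - value), default=None)
        if nearest is not None and abs(nearest.centroid - value) <= radius:
            # Absorb the value: running-mean update, and refresh the weight.
            nearest.count += 1
            nearest.centroid += (value - nearest.centroid) / nearest.count
            nearest.weight = 1.0
        else:
            clusters.append(Cluster(centroid=value, count=1))
    # Drop clusters that have not been refreshed for too long.
    return [c for c in clusters if c.weight >= min_weight]


clusters: List[Cluster] = []
for window in ([1.0, 1.2], [10.0], [10.1], [10.2]):
    clusters = process_window(clusters, window)
```

Each window is scanned once, and a result is available after every call, which mirrors the any-time property described above. The cluster formed in the first window fades out once three windows pass without a nearby value.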

Weak Ratio Rules

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch010 ◽

2013 ◽

pp. 208-244

Author(s):

Baoqing Jiang ◽

Xiaohua Hu ◽

Qing Wei ◽

Jingjing Song ◽

Chong Han ◽

...

Keyword(s):

Mathematical Model ◽

Association Rules ◽

Association Rule ◽

Rule Following ◽

Uncertainty Reasoning ◽

Ratio Rule ◽

The Mathematical Model

This paper examines the problem of weak ratio rules between nonnegative real-valued data in a transactional database. The weak ratio rule is a weaker form of Flip Korn’s ratio rule. After analyzing the mathematical model of the weak ratio rules problem, the authors conclude that it is a generalization of the Boolean association rules problem and that every weak ratio rule is supported by a Boolean association rule. Following the properties of weak ratio rules, the authors propose an algorithm for mining an important subset of weak ratio rules and construct a weak ratio rule uncertainty reasoning method. An example is given to show how to apply weak ratio rules to reconstruct lost data, and to forecast and detect outliers.

Dynamic View Management System for Query Prediction to View Materialization

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch007 ◽

2013 ◽

pp. 132-161

Author(s):

Negin Daneshpour ◽

Ahmad Abdollahzadeh Barfourosh

Keyword(s):

Management System ◽

Probabilistic Reasoning ◽

Bayes Rule ◽

Managerial Decision ◽

View Selection ◽

Dynamic View ◽

On Line ◽

Analytical Processing ◽

Selection Algorithms ◽

View Management

On-Line Analytical Processing (OLAP) systems based on data warehouses are the main systems for managerial decision making and must have a quick response time. Several algorithms, known as view selection algorithms, have been presented to select the proper set of data to materialize and to provide suitably structured environments for handling the queries submitted to OLAP systems. As users’ requirements may change during run time, materialization must be handled dynamically. In this work, the authors propose and operate a dynamic view management system, with a new and improved architecture, to select and materialize views. The system predicts incoming queries through association rule mining and three probabilistic reasoning approaches: conditional probability, Bayes’ rule, and Naïve Bayes’ rule. The proposed system is compared with the DynaMat system and the Hybrid system using two standard measures. Experimental results show that the proposed dynamic view selection system improves these measures and outperforms DynaMat and Hybrid for each type of query and each sequence of incoming queries.
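Of the three probabilistic approaches named above, conditional probability is the simplest to illustrate. The sketch below estimates P(next query | current query) from counts in a query log and returns the most likely successor, which a system could then use to decide what to materialize. The log format, the query labels, and the function names are hypothetical, and this is not the architecture proposed in the chapter.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple


def build_transitions(sessions: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each query is immediately followed by each other query."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(
    counts: Dict[str, Counter], current: str
) -> Optional[Tuple[str, float]]:
    """Return the most likely next query and its estimated conditional probability."""
    followers = counts.get(current)
    if not followers:
        return None  # the current query was never followed by anything in the log
    total = sum(followers.values())
    query, n = followers.most_common(1)[0]
    return query, n / total


# Hypothetical query log: each inner list is one user session.
sessions = [
    ["sales_by_region", "sales_by_city", "sales_by_store"],
    ["sales_by_region", "sales_by_city"],
    ["sales_by_region", "sales_by_year"],
]
transitions = build_transitions(sessions)
prediction = predict_next(transitions, "sales_by_region")
```

With this log, `sales_by_city` follows `sales_by_region` in two of three sessions, so it is the predicted next query with an estimated probability of 2/3.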

An Empirical Evaluation of Similarity Coefficients for Binary Valued Data

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch006 ◽

2013 ◽

pp. 109-131

Author(s):

David M. Lewis ◽

Vandana P. Janeja

Keyword(s):

Sensor Networks ◽

Web Mining ◽

Empirical Evaluation ◽

High Dimensionality ◽

Similarity Coefficients ◽

Network Mining ◽

Feature Vectors ◽

Data Similarity ◽

Document Search

In this paper, the authors present an empirical evaluation of similarity coefficients for binary valued data. Similarity coefficients provide a means to measure the similarity or distance between two binary valued objects in a dataset such that the attributes qualifying each object have a 0-1 value. This is useful in several domains, such as similarity of feature vectors in sensor networks, document search, router network mining, and web mining. The authors survey 35 similarity coefficients used in various domains and present conclusions about the efficacy of the similarity computed in (1) labeled data to quantify the accuracy of the similarity coefficients, (2) varying density of the data to evaluate the effect of sparsity of the values, and (3) varying number of attributes to see the effect of high dimensionality in the data on the similarity computed.
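As a concrete instance of what a similarity coefficient for binary valued data computes, the sketch below implements two widely used ones, the Jaccard coefficient and the simple matching coefficient. Whether these two are among the 35 surveyed in the chapter is not stated in the abstract; the helper names and the convention of returning 1.0 when both vectors are all zeros are assumptions.

```python
from typing import Sequence, Tuple


def match_counts(x: Sequence[int], y: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n11, n10, n01, n00): counts of each 0/1 combination per attribute."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("vectors must be non-empty and of equal length")
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    return n11, n10, n01, n00


def jaccard(x: Sequence[int], y: Sequence[int]) -> float:
    """Shared 1s divided by attributes where at least one vector has a 1."""
    n11, n10, n01, _ = match_counts(x, y)
    denom = n11 + n10 + n01
    return n11 / denom if denom else 1.0  # assumed convention for all-zero vectors


def simple_matching(x: Sequence[int], y: Sequence[int]) -> float:
    """Fraction of attributes on which the two vectors agree (0-0 counts too)."""
    n11, n10, n01, n00 = match_counts(x, y)
    return (n11 + n00) / (n11 + n10 + n01 + n00)


# Sparse vectors: most attributes are 0 in both objects.
x = [1, 1, 0, 0, 0, 0]
y = [1, 0, 0, 0, 0, 0]
j = jaccard(x, y)
s = simple_matching(x, y)
```

On sparse data the two coefficients diverge: the simple matching coefficient is inflated by the many shared zeros, while Jaccard ignores them. This is one reason the density of the data matters when comparing coefficients.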

Query Recommendations for OLAP Discovery-Driven Analysis

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch004 ◽

2013 ◽

pp. 66-90

Author(s):

Arnaud Giacometti ◽

Patrick Marcel ◽

Elsa Negre ◽

Arnaud Soulet

Keyword(s):

Open Source ◽

Recommender System ◽

Database Queries ◽

Query Log ◽

Levels Of Detail ◽

On Line ◽

A Current ◽

Tedious Process

Recommending database queries is an emerging and promising field of research and is of particular interest in the domain of OLAP systems, where the user is left with the tedious process of navigating large datacubes. In this paper, the authors present a framework for a recommender system for OLAP users that leverages former users’ investigations to enhance discovery-driven analysis. This framework recommends the discoveries detected in former sessions that investigated the same unexpected data as the current session. This task is accomplished by (1) analysing the query log to discover pairs of cells at various levels of detail for which the measure values differ significantly, and (2) analysing a current query to detect if a particular pair of cells for which the measure values differ significantly can be related to what is discovered in the log. This framework is implemented in a system that uses the open source Mondrian server and recommends MDX queries. Preliminary experiments were conducted to assess the quality of the recommendations in terms of precision and recall, as well as the efficiency of their on-line computation.

Distributed Privacy Preserving Clustering via Homomorphic Secret Sharing and its Application to (Vertically) Partitioned Spatio-Temporal Data

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch003 ◽

2013 ◽

pp. 45-65

Author(s):

Can Brochmann Yildizli ◽

Thomas Pedersen ◽

Yucel Saygin ◽

Erkay Savas ◽

Albert Levi

Keyword(s):

Data Mining ◽

Real World ◽

Privacy Preserving ◽

Secure Multiparty Computation ◽

Multiparty Computation ◽

Privacy Preserving Data Mining ◽

Computational Costs ◽

Data Mining Algorithms ◽

Spatio Temporal ◽

Mining Algorithms

Recent concerns about privacy issues have motivated data mining researchers to develop methods for performing data mining while preserving the privacy of individuals. One approach to developing privacy preserving data mining algorithms is secure multiparty computation, which allows for privacy preserving data mining algorithms that do not trade accuracy for privacy. However, earlier methods suffer from very high communication and computational costs, making them infeasible to use in any real world scenario. Moreover, these algorithms make strict assumptions about the involved parties, assuming that they will not collude with each other. In this paper, the authors propose a new secure multiparty computation-based k-means clustering algorithm that is both secure and efficient enough to be used in a real world scenario. Experiments based on realistic scenarios reveal that this protocol has lower communication costs and significantly lower computational costs.
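The homomorphic property that makes secret sharing useful for clustering, namely that parties can add shared values without revealing them, can be shown with a minimal additive sharing scheme. This is a generic textbook construction, not the protocol from the chapter, which may use a different sharing scheme; the modulus `PRIME` and all function names are assumptions chosen for illustration.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # a Mersenne prime used as the modulus (illustrative choice)


def share(value: int, n_parties: int) -> List[int]:
    """Split `value` into n random-looking shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def add_shares(a: List[int], b: List[int]) -> List[int]:
    """Each party adds its own two shares locally; no secret is exchanged."""
    return [(x + y) % PRIME for x, y in zip(a, b)]


def reconstruct(shares: List[int]) -> int:
    """Combine all shares to recover the shared value."""
    return sum(shares) % PRIME


# Two private coordinates held by different data owners (hypothetical values).
shares_a = share(12, n_parties=3)
shares_b = share(30, n_parties=3)
total = reconstruct(add_shares(shares_a, shares_b))
```

Sums of this kind are the building block for computing cluster centroids in k-means without any party seeing another party's raw values. In this simple scheme a secret stays hidden only while at least one party keeps its share private, which is why assumptions about collusion matter.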

Visual Mobility Analysis using T-Warehouse

Developments in Data Extraction, Management, and Analysis ◽

10.4018/978-1-4666-2148-0.ch001 ◽

2013 ◽

pp. 1-22

Author(s):

A. Raffaetà ◽

L. Leonardi ◽

G. Marketos ◽

G. Andrienko ◽

N. Andrienko ◽

...

Keyword(s):

Data Warehousing ◽

Data Cube ◽

Trajectory Data ◽

Useful Knowledge ◽

Mobility Data ◽

Research Fields ◽

Technological Advances ◽

Wireless Telecommunication ◽

Warehousing Systems ◽

Olap Analysis

Technological advances in sensing technologies and wireless telecommunication devices enable research fields related to the management of trajectory data. The challenge after storing the data is the implementation of appropriate analytics for extracting useful knowledge. However, traditional data warehousing systems and techniques were not designed for analyzing trajectory data. In this paper, the authors demonstrate a framework that transforms the traditional data cube model into a trajectory warehouse. As a proof-of-concept, the authors implement T-Warehouse, a system that incorporates all the required steps for Visual Trajectory Data Warehousing, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data.

Published By IGI Global
