Developments in Data Extraction, Management, and Analysis
Latest Publications


TOTAL DOCUMENTS

15
(FIVE YEARS 0)

H-INDEX

1
(FIVE YEARS 0)

Published By IGI Global

9781466621480, 9781466621497

Author(s):  
Yun Sing Koh ◽  
Russel Pears ◽  
Gillian Dobbie

Association rule mining discovers relationships among items in a transactional database. Most approaches assume that all items within a dataset have a uniform distribution with respect to support. However, this is not always the case, and weighted association rule mining (WARM) was introduced to assign importance to individual items. Previous approaches to the weighted association rule mining problem require users to assign weights to items. In certain cases, it is difficult to provide weights for all items within a dataset. In this paper, the authors propose a method based on a novel Valency model that automatically infers item weights from interactions between items. The authors' experiment shows that the weighting scheme results in rules that better capture the natural variation that occurs in a dataset when compared with a miner that does not employ a weighting scheme. The authors applied the model in a real-world application to mine text from a given collection of documents. The use of item weighting enabled the authors to attach more importance to terms that are distinctive. The results demonstrate that keyword discrimination via item weighting leads to informative rules.
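To make the difference between plain and weighted support concrete, here is a minimal Python sketch. It does not implement the Valency model, whose details the abstract does not give. The item weights are hand-picked, hypothetical values, and the weighted support used here (mean item weight times support) is only one simple convention assumed for illustration.

```python
from typing import Dict, FrozenSet, List


def support(itemset: FrozenSet[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item of the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def weighted_support(
    itemset: FrozenSet[str],
    transactions: List[FrozenSet[str]],
    weights: Dict[str, float],
) -> float:
    """Assumed convention: mean item weight multiplied by plain support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


# Hypothetical toy data; the weights are illustrative, not inferred.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "caviar"}),
    frozenset({"milk"}),
    frozenset({"bread", "milk", "caviar"}),
]
weights = {"bread": 0.2, "milk": 0.2, "caviar": 0.9}

for items in (frozenset({"bread", "milk"}), frozenset({"bread", "caviar"})):
    print(sorted(items), support(items, transactions),
          weighted_support(items, transactions, weights))
```

Both itemsets have the same plain support, but the one containing the heavily weighted item ranks higher under weighted support, which is the effect item weighting is meant to produce.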


Author(s):  
M. Sulaiman Khan ◽  
Maybin Muyeba ◽  
Frans Coenen ◽  
David Reid ◽  
Hissam Tawfik

In this paper, a composite fuzzy association rule mining mechanism (CFARM), directed at identifying patterns in datasets comprised of composite attributes, is described. Composite attributes are defined as attributes that can simultaneously take two or more values that subscribe to a common schema. The objective is to generate fuzzy association rules using “properties” associated with these composite attributes. The exemplar application is the analysis of the nutrients contained in items found in grocery data sets. The paper commences with a review of the background and related work, and a formal definition of the CFARM concepts. The CFARM algorithm is then fully described and evaluated using both real and synthetic data sets.


Author(s):  
Shuliang Wang ◽  
Wenyan Gan ◽  
Deyi Li ◽  
Deren Li

In this paper, a data field is proposed to group data objects by simulating their mutual interactions and opposite movements for hierarchical clustering. Inspired by fields in physical space, a data field that simulates a nuclear field is presented to describe the interaction between objects in data space. In the data field, the self-organized process of equipotential lines over many data objects reveals their hierarchical clustering characteristics. During the clustering process, a random sample is first generated to optimize the impact factor. The masses of data objects are then estimated to select core data objects with nonzero masses. Taking the core data objects as the initial clusters, the clusters are iteratively merged hierarchy by hierarchy with good performance. The results of a case study show that the data field is capable of hierarchical clustering on objects of varying size, shape, or granularity without user-specified parameters, while also considering the object features inside the clusters and removing outliers from noisy data. The comparisons illustrate that data field clustering performs better than K-means, BIRCH, CURE, and CHAMELEON.
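The idea of a potential built from object masses and an impact factor can be sketched as follows. The abstract does not state the potential function, so the Gaussian-like, short-range form below is an assumption, as are the function name `potential`, the equal masses, and the value chosen for the impact factor `sigma`.

```python
import math
from typing import List, Sequence


def potential(
    x: Sequence[float],
    points: List[Sequence[float]],
    masses: List[float],
    sigma: float,
) -> float:
    """Assumed potential at position x: sum of m_i * exp(-(d_i / sigma)^2),
    where d_i is the Euclidean distance from x to data object i."""
    total = 0.0
    for p, m in zip(points, masses):
        d = math.dist(x, p)  # Euclidean distance (Python 3.8+)
        total += m * math.exp(-((d / sigma) ** 2))
    return total


# Hypothetical data: two nearby objects and one isolated object.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
masses = [1 / 3, 1 / 3, 1 / 3]  # equal masses, an assumption for illustration

dense = potential((0.0, 0.0), points, masses, sigma=1.0)
sparse = potential((5.0, 5.0), points, masses, sigma=1.0)
print(dense > sparse)  # the potential is higher where objects are concentrated
```

Under this assumed form, positions inside dense regions receive a higher potential than isolated ones, which is what would let equipotential lines trace out cluster structure.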


Author(s):  
Reem Al-Mulla ◽  
Zaher Al Aghbari

In recent years, new applications have emerged that produce data streams, such as stock data and sensor networks. Therefore, finding frequent subsequences, or clusters of subsequences, in data streams is an essential task in data mining. Data streams are continuous in nature, unbounded in size, and have a high arrival rate. Due to these characteristics, traditional clustering algorithms fail to effectively find clusters in data streams. Thus, an efficient incremental algorithm is proposed to find frequent subsequences in multiple data streams. The described approach finds frequent subsequences by clustering subsequences of a data stream. The proposed algorithm uses a window model to buffer the continuous data streams. Further, it does not recompute the clustering results for the whole data stream at every window, but rather builds on the clustering results of previous windows. The proposed approach also employs a decay value for each discovered cluster to determine when to remove old clusters and retain recent ones. In addition, the proposed algorithm is efficient, as it scans the data streams only once, and it is considered an anytime algorithm, since the frequent subsequences are ready at the end of every window.
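The window-by-window update with a per-cluster decay value might look roughly like the sketch below. The abstract does not specify the decay rule, so exponential decay, the pruning threshold, and the names `update_clusters`, `decay`, and `min_weight` are all assumptions made for illustration.

```python
from typing import Dict


def update_clusters(
    weights: Dict[str, float],
    window_hits: Dict[str, int],
    decay: float = 0.5,
    min_weight: float = 0.2,
) -> Dict[str, float]:
    """Fold one window into the running cluster weights.

    Old weights are multiplied by `decay`, subsequences assigned to a
    cluster in the current window add to its weight, and clusters whose
    weight falls below `min_weight` are dropped.
    """
    updated: Dict[str, float] = {}
    for cluster_id in set(weights) | set(window_hits):
        w = weights.get(cluster_id, 0.0) * decay + window_hits.get(cluster_id, 0)
        if w >= min_weight:
            updated[cluster_id] = w
    return updated


# Hypothetical state: cluster "b" receives no new members and fades out.
state = {"a": 1.0, "b": 0.3}
state = update_clusters(state, {"a": 2, "c": 1})
print(sorted(state.items()))
```

Because each call touches only the previous summary and the current window, the stream is scanned once and a result is available at the end of every window.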


Author(s):  
Baoqing Jiang ◽  
Xiaohua Hu ◽  
Qing Wei ◽  
Jingjing Song ◽  
Chong Han ◽  
...  

This paper examines the problem of weak ratio rules between nonnegative real-valued data in a transactional database. The weak ratio rule is a weaker form of Flip Korn’s ratio rule. After analyzing the mathematical model of the weak ratio rules problem, the authors conclude that it is a generalization of the Boolean association rules problem and that every weak ratio rule is supported by a Boolean association rule. Following the properties of weak ratio rules, the authors propose an algorithm for mining an important subset of weak ratio rules and construct a weak ratio rule uncertainty reasoning method. An example is given to show how to apply weak ratio rules to reconstruct lost data, and to forecast and detect outliers.


Author(s):  
Negin Daneshpour ◽  
Ahmad Abdollahzadeh Barfourosh

On-Line Analytical Processing (OLAP) systems based on data warehouses are the main systems for managerial decision making and must have a quick response time. Several algorithms, called view selection algorithms, have been presented to select the proper set of data to materialize and to elicit suitably structured environments for handling the queries submitted to OLAP systems. As users’ requirements may change at run time, materialization must be handled dynamically. In this work, the authors propose and operate a dynamic view management system that selects and materializes views with a new and improved architecture, which predicts incoming queries through association rule mining and three probabilistic reasoning approaches: conditional probability, Bayes’ rule, and Naïve Bayes’ rule. The proposed system is compared with the DynaMat system and the Hybrid system through two standard measures. Experimental results show that the proposed dynamic view selection system improves these measures. This system outperforms DynaMat and Hybrid for each type of query and each sequence of incoming queries.
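As a rough illustration of predicting incoming queries with conditional probability, the Python sketch below estimates P(next query | current query) from a query log. It is not the authors' system: the session format, the query identifiers, and the function name `transition_probabilities` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(sessions: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next | current) from consecutive query pairs in each session."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for current, followers in counts.items()
    }


# Hypothetical query log: each inner list is one user session.
log = [["q1", "q2", "q3"], ["q1", "q2"], ["q1", "q3"]]
probs = transition_probabilities(log)

# The most probable follower of q1 is a candidate view to materialize ahead of time.
print(max(probs["q1"], key=probs["q1"].get))
```

A view manager could use such estimates to decide which views to keep materialized for the queries most likely to arrive next.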


Author(s):  
David M. Lewis ◽  
Vandana P. Janeja

In this paper, the authors present an empirical evaluation of similarity coefficients for binary valued data. Similarity coefficients provide a means to measure the similarity or distance between two binary valued objects in a dataset such that the attributes qualifying each object have a 0-1 value. This is useful in several domains, such as similarity of feature vectors in sensor networks, document search, router network mining, and web mining. The authors survey 35 similarity coefficients used in various domains and present conclusions about the efficacy of the similarity computed in (1) labeled data to quantify the accuracy of the similarity coefficients, (2) varying density of the data to evaluate the effect of sparsity of the values, and (3) varying number of attributes to see the effect of high dimensionality in the data on the similarity computed.
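For readers unfamiliar with similarity coefficients on 0-1 data, the following Python sketch computes three widely used ones (Jaccard, simple matching, and Dice) from the four match counts of two binary vectors. The abstract does not list which 35 coefficients were surveyed, so these three are chosen here only as familiar examples, and the helper names are hypothetical.

```python
from typing import Sequence, Tuple


def match_counts(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (n11, n10, n01, n00) for two equal-length 0-1 vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    n11 = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    n10 = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    n01 = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    n00 = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    return n11, n10, n01, n00


def jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Shared 1s over positions where at least one vector is 1."""
    n11, n10, n01, _ = match_counts(a, b)
    denom = n11 + n10 + n01
    return n11 / denom if denom else 0.0


def simple_matching(a: Sequence[int], b: Sequence[int]) -> float:
    """Agreeing positions (1-1 and 0-0) over all positions."""
    n11, n10, n01, n00 = match_counts(a, b)
    total = n11 + n10 + n01 + n00
    return (n11 + n00) / total if total else 0.0


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    """Like Jaccard, but shared 1s are counted twice."""
    n11, n10, n01, _ = match_counts(a, b)
    denom = 2 * n11 + n10 + n01
    return 2 * n11 / denom if denom else 0.0


u = [1, 1, 0, 0, 1]
v = [1, 0, 0, 1, 1]
print(jaccard(u, v), simple_matching(u, v), dice(u, v))
```

Jaccard and Dice ignore joint absences (0-0 matches) while simple matching counts them, which is one reason coefficients can behave differently as data sparsity varies.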


Author(s):  
Arnaud Giacometti ◽  
Patrick Marcel ◽  
Elsa Negre ◽  
Arnaud Soulet

Recommending database queries is an emerging and promising field of research and is of particular interest in the domain of OLAP systems, where the user is left with the tedious process of navigating large datacubes. In this paper, the authors present a framework for a recommender system for OLAP users that leverages former users’ investigations to enhance discovery-driven analysis. This framework recommends the discoveries detected in former sessions that investigated the same unexpected data as the current session. This task is accomplished by (1) analysing the query log to discover pairs of cells at various levels of detail for which the measure values differ significantly, and (2) analysing a current query to detect if a particular pair of cells for which the measure values differ significantly can be related to what is discovered in the log. This framework is implemented in a system that uses the open source Mondrian server and recommends MDX queries. Preliminary experiments were conducted to assess the quality of the recommendations in terms of precision and recall, as well as the efficiency of their on-line computation.


Author(s):  
Can Brochmann Yildizli ◽  
Thomas Pedersen ◽  
Yucel Saygin ◽  
Erkay Savas ◽  
Albert Levi

Recent concerns about privacy issues have motivated data mining researchers to develop methods for performing data mining while preserving the privacy of individuals. One approach to developing privacy-preserving data mining algorithms is secure multiparty computation, which allows for privacy-preserving data mining algorithms that do not trade accuracy for privacy. However, earlier methods suffer from very high communication and computational costs, making them infeasible to use in any real-world scenario. Moreover, these algorithms make strict assumptions about the involved parties, assuming that they will not collude with each other. In this paper, the authors propose a new secure multiparty computation based k-means clustering algorithm that is both secure and efficient enough to be used in a real-world scenario. Experiments based on realistic scenarios reveal that this protocol has lower communication costs and significantly lower computational costs.
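A building block often used in secure multiparty computation is a secure sum based on additive secret sharing, sketched below in Python. This is not the authors' protocol, which the abstract does not describe. The sketch assumes honest-but-curious parties and nonnegative integer inputs smaller than the modulus, and it simulates all parties in one process; `MODULUS`, `share`, and `secure_sum` are hypothetical names.

```python
import secrets
from typing import List

MODULUS = 2**61 - 1  # assumed modulus; inputs must be in [0, MODULUS)


def share(value: int, n_parties: int) -> List[int]:
    """Split a value into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values: List[int]) -> int:
    """Simulate a secure sum: each party shares its value, each party adds
    the shares it receives, and only the partial sums are combined."""
    n = len(private_values)
    all_shares = [share(v, n) for v in private_values]
    # Party j sees only the j-th share from every party, never a raw value.
    partial_sums = [sum(s[j] for s in all_shares) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


# Hypothetical example: three parties each hold a local count.
print(secure_sum([10, 20, 12]))
```

In a k-means setting, sums of this kind could in principle be used to aggregate per-cluster coordinate totals and counts across parties without revealing individual contributions. Resistance to collusion and the communication cost depend on the concrete protocol.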


Author(s):  
A. Raffaetà ◽  
L. Leonardi ◽  
G. Marketos ◽  
G. Andrienko ◽  
N. Andrienko ◽  
...  

Technological advances in sensing technologies and wireless telecommunication devices enable research fields related to the management of trajectory data. The challenge after storing the data is the implementation of appropriate analytics for extracting useful knowledge. However, traditional data warehousing systems and techniques were not designed for analyzing trajectory data. In this paper, the authors demonstrate a framework that transforms the traditional data cube model into a trajectory warehouse. As a proof-of-concept, the authors implement T-Warehouse, a system that incorporates all the required steps for Visual Trajectory Data Warehousing, from trajectory reconstruction and ETL processing to Visual OLAP analysis on mobility data.

