Distributed Association Rule Mining

Author(s):  
Mafruz Zaman Ashrafi ◽  
David Taniar ◽  
Kate A. Smith

Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful patterns (Mohammed, 1999). Because such data may run to terabytes of records, data mining aims to find patterns using computationally efficient techniques. It is related to a subarea of statistics called exploratory data analysis. During the past decade, data mining techniques have been used in various business, government, and scientific applications. Association rule mining (Agrawal, Imielinski & Swami, 1993) is one of the most studied fields in the data-mining domain. Its key strength is completeness: it can discover all associations within a given dataset. Two important constraints of association rule mining are support and confidence (Agrawal & Srikant, 1994), which are used to measure the interestingness of a rule. The motivation for association rule mining comes from market-basket analysis, which aims to discover customer purchase behavior. However, its applications are not limited to market-basket analysis; it is also used in areas such as network intrusion detection and credit card fraud detection. The widespread use of computers and advances in network technologies have enabled modern organizations to distribute their computing resources among different sites. The business applications used by such organizations normally store their day-to-day data at each respective site, and this data grows every day. Discovering useful patterns in such data with a centralized data mining approach is not always feasible, because merging datasets from different sites into a centralized site incurs large network communication costs (Ashrafi, Taniar & Smith, 2004). Furthermore, data from these organizations are not only distributed over various locations, but are also fragmented vertically.
This makes it more difficult, if not impossible, to combine them in a central location. As a result, Distributed Association Rule Mining (DARM) has emerged as an active subarea of data-mining research. Consider the following example. A supermarket may have several data centers spread over various regions across the country, each holding gigabytes of data. To find customer purchase behavior in these datasets, one can run an association rule mining algorithm in one of the regional data centers. However, mining a single data center will not reveal all the potential patterns, because customer purchase patterns vary from one region to another. To obtain all potential patterns, we therefore rely on a distributed association rule mining algorithm that incorporates all data centers. Distributed systems, by nature, require communication. Since distributed association rule mining algorithms generate rules from different datasets spread over various geographical sites, they require external communication in every step of the process (Ashrafi, Taniar & Smith, 2004; Assaf & Ron, 2002; Cheung, Ng, Fu & Fu, 1996). DARM algorithms therefore aim to reduce communication costs so that the total cost of generating global association rules is less than the cost of combining the datasets of all participating sites into a centralized site.
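The support and confidence measures mentioned above can be illustrated with a minimal sketch. It assumes horizontally partitioned data, where each site holds complete transactions, and every name in it (`sites`, `local_count`, `global_support`, `confidence`) is hypothetical. It is not the algorithm of any of the cited papers. It only shows that global support and confidence can be computed from per-site counts, so sites need to exchange counts and never raw transactions.

```python
from typing import List, Set

Transaction = Set[str]
Site = List[Transaction]


def local_count(site: Site, itemset: Set[str]) -> int:
    """Count the transactions at one site that contain every item in itemset."""
    return sum(1 for transaction in site if itemset <= transaction)


def global_count(sites: List[Site], itemset: Set[str]) -> int:
    """Sum the local counts; only these integers would cross the network."""
    return sum(local_count(site, itemset) for site in sites)


def global_support(sites: List[Site], itemset: Set[str]) -> float:
    """Fraction of all transactions, across all sites, containing itemset."""
    total = sum(len(site) for site in sites)
    return global_count(sites, itemset) / total if total else 0.0


def confidence(sites: List[Site], antecedent: Set[str], consequent: Set[str]) -> float:
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    antecedent_hits = global_count(sites, antecedent)
    if antecedent_hits == 0:
        return 0.0
    return global_count(sites, antecedent | consequent) / antecedent_hits


# Two hypothetical regional data centers with made-up transactions.
sites = [
    [{"bread", "milk"}, {"bread", "butter"}, {"milk"}],
    [{"bread", "milk", "butter"}, {"bread", "milk"}],
]

support_bread_milk = global_support(sites, {"bread", "milk"})
confidence_bread_to_milk = confidence(sites, {"bread"}, {"milk"})
```

In this toy data, three of the five transactions contain both bread and milk, and four contain bread, so the rule bread -> milk has support 3/5 and confidence 3/4. A rule is usually reported only if both values exceed user-chosen thresholds.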


Author(s):  
Brian D. Haig

Chapter 2 is concerned with modern data analysis. It focuses primarily on the nature, role, and importance of exploratory data analysis, although it gives some attention to computer-intensive resampling methods. Exploratory data analysis is a process in which data are examined to reveal potential patterns of interest. However, the use of traditional confirmatory methods in data analysis remains the dominant practice. Different perspectives on data analysis, as they are shaped by four different accounts of scientific method, are provided. A brief discussion of John Tukey’s philosophy of teaching data analysis is presented. The chapter does not consider the more recent exploratory data analytic developments, such as the practice of statistical modeling, the employment of data-mining techniques, and more flexible resampling methods.


Author(s):  
Dafydd Evans

Mutual information quantifies the determinism that exists in a relationship between random variables, and thus plays an important role in exploratory data analysis. We investigate a class of non-parametric estimators for mutual information, based on the nearest neighbour structure of observations in both the joint and marginal spaces. Unless both marginal spaces are one-dimensional, we demonstrate that a well-known estimator of this type can be computationally expensive under certain conditions, and propose a computationally efficient alternative that has a time complexity of order O(N log N) as the number of observations N → ∞.
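For reference, the standard definition of mutual information for two continuous random variables is given below. The notation is an assumption made here and is not taken from the abstract: $p_{X,Y}$ denotes the joint density of $X$ and $Y$, and $p_X$, $p_Y$ their marginal densities.

```latex
I(X;Y) = \iint p_{X,Y}(x,y)\,
         \log \frac{p_{X,Y}(x,y)}{p_X(x)\,p_Y(y)} \,dx\,dy
```

The quantity is non-negative and equals zero exactly when $X$ and $Y$ are independent. The densities are unknown in practice, which is why it has to be estimated from observations.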


2020 ◽  
Vol 17 (11) ◽  
pp. 5162-5166
Author(s):  
Puninder Kaur ◽  
Amandeep Kaur ◽  
Rajwinder Kaur

In the IT world, predicting the academic performance of a huge student population poses a big challenge. Educational data mining techniques contribute significantly to solving this problem. Several prediction methods are available for data classification and clustering, to extract information and provide accurate results. In this paper, different prediction methodologies are highlighted for real-time analysis of the dynamic academic behavior of students. The main focus is to give a brief overview of the data mining techniques and to highlight the differences among the various methods in order to provide the best results for students.


Author(s):  
Feyza Gürbüz ◽  
Fatma Gökçe Önen

The previous decades have witnessed major change within the Information Systems (IS) environment, with a corresponding emphasis on the importance of specifying timely and accurate information strategies. Currently, there is increasing interest in data mining and information systems optimization, which makes data mining for the optimization of information systems a new and growing research area. This chapter surveys the application of data mining to the optimization of information systems. These systems have different data sources and, accordingly, different objectives for knowledge discovery. After the preprocessing stage, data mining techniques can be applied to the data that suit the objective of the information system. These techniques are prediction, classification, association rule mining, statistics and visualization, clustering, and outlier detection.


2020 ◽  
pp. 277-293
Author(s):  
Mahima Goyal ◽  
Vishal Bhatnagar ◽  
Arushi Jain

The importance of data analysis across different domains is growing day by day. This is evident in the fact that crucial information is retrieved through data analysis, using the different tools available. The use of data mining as a tool to uncover nuggets of critical information is evident in modern-day scenarios. This chapter presents a discussion of the use of data mining tools and techniques in the area of criminal science and investigations. The application of data mining techniques in criminal science helps in understanding criminal psychology and consequently provides insight into effective measures to curb crime. This chapter provides a state-of-the-art report on the research conducted in this domain, using a classification scheme and providing a road map for the use of various data mining tools and techniques. Furthermore, the challenges and opportunities in the application of data mining techniques to criminal investigation are explored and detailed in this chapter.


Author(s):  
Luminita Dumitriu

The concept of Quantitative Structure-Activity Relationship (QSAR), introduced by Hansch and co-workers in the 1960s, attempts to discover the relationship between the structure and the activity of chemical compounds (SAR), in order to allow the prediction of the activity of new compounds based on knowledge of their chemical structure alone. These predictions can be achieved by quantifying the SAR. Initially, statistical methods were applied to solve the QSAR problem. For example, pattern recognition techniques facilitate data dimension reduction and transformation techniques from multiple experiments to the underlying patterns of information. Partial least squares (PLS) is used for performing the same operations on the target properties. The predictive ability of this method can be tested using cross-validation on the test set of compounds. Later, data mining techniques were considered for this prediction problem. Among data mining techniques, the most popular ones are based on neural networks (Wang, Durst, Eberhart, Boyd, & Ben-Miled, 2004), on neuro-fuzzy approaches (Neagu, Benfenati, Gini, Mazzatorta, & Roncaglioni, 2002), or on genetic programming (Langdon & Barrett, 2004). All these approaches predict the activity of a chemical compound without being able to explain the predicted value. To increase understanding of the prediction process, descriptive data mining techniques have started to be applied to the QSAR problem. These techniques are based on association rule mining. In this chapter, we describe the use of association rule-based approaches to the QSAR problem.


Author(s):  
Dimitrios Katsaros ◽  
Yannis Manolopoulos

During the past decade, we have witnessed an explosive growth in our capabilities to both generate and collect data. Various data mining techniques have been proposed and widely employed to discover valid, novel and potentially useful patterns in these data. Data mining involves the discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in huge collections of data.


1990 ◽  
Vol 83 (2) ◽  
pp. 90-93
Author(s):  
Richard L. Scheaffer

Recent years have witnessed a strong movement away from what might be termed classical statistics to a more empirical, data-oriented approach to statistics, sometimes termed exploratory data analysis, or EDA. This movement has been active among professional statisticians for twenty or twenty-five years but has begun permeating the area of statistical education for nonstatisticians only in the past five to ten years. At this point, there seems to be little doubt that EDA approaches to applied statistics will gain support over classical approaches in the years to come. That is not to say that classical statistics will disappear. The two approaches begin with different assumptions and have different objectives, but both are important. These differences will be outlined in this article.

