Distributed Association Rule Mining

Encyclopedia of Data Warehousing and Mining, Second Edition; DOI 10.4018/978-1-60566-010-3.ch108; 2011; pp. 695-700

Author(s): Mafruz Zaman Ashrafi

Keyword(s): Data Mining; Association Rule; Association Rule Mining; Data Centers; Digital Data; Purchase Behavior; Rule Mining; Communication Costs; Mining Algorithm; Distributed Association

Data mining is an iterative and interactive process that explores and analyzes voluminous digital data to discover valid, novel, and meaningful patterns (Mohammed, 1999). Because digital datasets may contain terabytes of records, data mining techniques aim to find patterns in a computationally efficient way. The field is related to a subarea of statistics called exploratory data analysis. During the past decade, data mining techniques have been used in a variety of business, government, and scientific applications.

Association rule mining (Agrawal, Imielinski & Swami, 1993) is one of the most studied fields in the data-mining domain. Its key strength is completeness: it can discover all associations in a given dataset that satisfy the chosen thresholds. Two important constraints of association rule mining are support and confidence (Agrawal & Srikant, 1994), which are used to measure the interestingness of a rule. Association rule mining was originally motivated by market-basket analysis, which aims to discover customer purchase behavior. However, its applications are not limited to market-basket analysis; it is also used in network intrusion detection, credit card fraud detection, and so forth.

The widespread use of computers and advances in network technologies have enabled modern organizations to distribute their computing resources among different sites. The business applications used by such organizations normally store their day-to-day data at each respective site, and this data grows in size every day. Discovering useful patterns in such organizations' data with a centralized data mining approach is not always feasible, because merging datasets from different sites into a central site incurs large network communication costs (Ashrafi, David & Kate, 2004). Furthermore, the data of these organizations are not only distributed over various locations but also fragmented vertically.
As a result, it becomes more difficult, if not impossible, to combine the data at a central location. For these reasons, Distributed Association Rule Mining (DARM) has emerged as an active subarea of data-mining research.

Consider the following example. A supermarket may have several data centers spread over various regions across the country, each holding gigabytes of data. To find customer purchase behavior in these datasets, one could run an association rule mining algorithm at one of the regional data centers. However, mining a single data center will not reveal all the potential patterns, because customer purchase patterns in one region differ from those in others. To obtain all potential patterns, we therefore need a distributed association rule mining algorithm that incorporates all data centers.

Distributed systems, by nature, require communication. Because DARM algorithms generate rules from datasets spread over various geographical sites, they require communication between sites at every step of the process (Ashrafi, David & Kate, 2004; Assaf & Ron, 2002; Cheung, Ng, Fu & Fu, 1996). DARM algorithms therefore aim to reduce communication so that the total cost of generating global association rules is less than the cost of combining the datasets of all participating sites at a central site.
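Support, confidence, and the communication trade-off that DARM targets can be illustrated together in a small Python sketch. It uses the standard definitions (the support of an itemset is the fraction of transactions containing it; the confidence of a rule X ⇒ Y is count(X ∪ Y) / count(X)) and a simple count-exchange scheme in the spirit of the count-distribution idea: each site counts candidate itemsets over its own transactions and sends only those counts to a coordinator, which sums them. The site names, transactions, and function names are hypothetical; this is not the algorithm of any of the cited papers, and real DARM algorithms add candidate generation and pruning to cut messages further.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical basket data held at three regional data centers (one set per transaction).
SITES = {
    "north": [{"bread", "milk"}, {"bread", "butter"}, {"milk", "tea"}],
    "south": [{"bread", "milk"}, {"bread", "milk", "tea"}],
    "west": [{"milk", "tea"}, {"bread", "milk"}, {"tea"}],
}

def local_counts(transactions, candidates):
    """Run at each site: count local transactions containing each candidate itemset."""
    counts = Counter({c: 0 for c in candidates})
    for t in transactions:
        for c in candidates:
            if c <= t:
                counts[c] += 1
    return counts, len(transactions)

def exchange_counts(sites, candidates):
    """Coordinator: sum the per-site counts; raw transactions never leave their site."""
    total, n_total, numbers_sent = Counter(), 0, 0
    for transactions in sites.values():
        counts, n = local_counts(transactions, candidates)
        total.update(counts)  # Counter.update adds counts rather than replacing them
        n_total += n
        numbers_sent += len(counts) + 1  # one count per candidate plus the site size
    return total, n_total, numbers_sent

def support(itemset, total, n_total):
    """Global support: fraction of all transactions that contain the itemset."""
    return Fraction(total[itemset], n_total)

def confidence(antecedent, consequent, total):
    """Global confidence of antecedent => consequent: count(A | C) / count(A)."""
    return Fraction(total[antecedent | consequent], total[antecedent])

if __name__ == "__main__":
    bread, milk = frozenset({"bread"}), frozenset({"milk"})
    candidates = [bread, milk, bread | milk]
    total, n_total, sent = exchange_counts(SITES, candidates)
    print(support(bread | milk, total, n_total))  # 1/2
    print(confidence(bread, milk, total))         # 4/5
    print(sent)                                   # 12
```

Because the coordinator only adds counts, the global support and confidence are exactly what a centralized miner would compute over the merged data, while the number of values sent grows with the number of candidates and sites rather than with the number of transactions.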


Exploratory Data Analysis

DOI 10.1093/oso/9780190222055.003.0002; 2018

Author(s): Brian D. Haig

Keyword(s): Data Mining; Data Analysis; Statistical Modeling; Scientific Method; Exploratory Data Analysis; Resampling Methods; Data Mining Techniques; Exploratory Data; Data Analytic

Chapter 2 is concerned with modern data analysis. It focuses primarily on the nature, role, and importance of exploratory data analysis, although it gives some attention to computer-intensive resampling methods. Exploratory data analysis is a process in which data are examined to reveal potential patterns of interest. However, the use of traditional confirmatory methods in data analysis remains the dominant practice. Different perspectives on data analysis, as they are shaped by four different accounts of scientific method, are provided. A brief discussion of John Tukey’s philosophy of teaching data analysis is presented. The chapter does not consider the more recent exploratory data analytic developments, such as the practice of statistical modeling, the employment of data-mining techniques, and more flexible resampling methods.


Crop Yield Prediction and Soil Data Analysis Using Data Mining Techniques in Krishnagiri District

International Journal of Computer Sciences and Engineering; DOI 10.26438/ijcse/v6si8.4955; 2018; Vol 06 (08); pp. 49-55

Author(s): K. Samundeeswari, K. Srinivasan

Keyword(s): Data Mining; Data Analysis; Crop Yield; Yield Prediction; Data Mining Techniques; Using Data


A computationally efficient estimator for mutual information

Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences; DOI 10.1098/rspa.2007.0196; 2008; Vol 464 (2093); pp. 1203-1215; Cited by ~16

Author(s): Dafydd Evans

Keyword(s): Data Analysis; Mutual Information; Time Complexity; Exploratory Data Analysis; Nearest Neighbour; Computationally Efficient; One Dimensional; Exploratory Data; Efficient Alternative; Computationally Expensive

Mutual information quantifies the determinism that exists in a relationship between random variables, and thus plays an important role in exploratory data analysis. We investigate a class of non-parametric estimators for mutual information, based on the nearest neighbour structure of observations in both the joint and marginal spaces. Unless both marginal spaces are one-dimensional, we demonstrate that a well-known estimator of this type can be computationally expensive under certain conditions, and propose a computationally efficient alternative that has a time complexity of order $N \log N$ as the number of observations $N \to \infty$.
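The abstract does not spell out the estimators it studies. One widely used member of this nearest-neighbour class is the Kraskov-Stögbauer-Grassberger (KSG) estimator; the Python sketch below implements its first variant for one-dimensional X and Y, assuming max-norm distances and a brute-force O(N²) neighbour search. It only shows how joint-space and marginal-space neighbour counts combine into an estimate: it is not the paper's $N \log N$ method, and we cannot confirm that KSG is the specific "well-known estimator" the paper analyses. The function names are hypothetical.

```python
import random

EULER_GAMMA = 0.5772156649015329

def digamma_table(n_max):
    """psi(m) for integers 1..n_max via psi(1) = -gamma and psi(m) = psi(m-1) + 1/(m-1)."""
    table = [float("nan"), -EULER_GAMMA]  # index 0 unused
    for m in range(2, n_max + 1):
        table.append(table[-1] + 1.0 / (m - 1))
    return table

def ksg_mutual_information(xs, ys, k=3):
    """KSG (variant 1) estimate of I(X;Y) in nats for paired 1-D samples.

    Brute-force O(N^2) neighbour search: an illustration, not an efficient method.
    """
    n = len(xs)
    if k < 1 or n != len(ys) or n <= k:
        raise ValueError("need k >= 1 and two equal-length samples with more than k points")
    psi = digamma_table(n)
    acc = 0.0
    for i in range(n):
        # Max-norm distance from point i to every other point in the joint (x, y) space.
        joint = sorted(
            max(abs(xs[i] - xs[j]), abs(ys[i] - ys[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest joint-space neighbour
        # Marginal counts: points strictly closer than eps in x alone, and in y alone.
        nx = sum(1 for j in range(n) if j != i and abs(xs[i] - xs[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(ys[i] - ys[j]) < eps)
        acc += psi[nx + 1] + psi[ny + 1]
    return psi[k] + psi[n] - acc / n

if __name__ == "__main__":
    random.seed(1)
    xs = [random.random() for _ in range(300)]
    independent = [random.random() for _ in range(300)]
    dependent = [x + random.gauss(0.0, 0.05) for x in xs]
    # Expect a value close to 0 for independent data and a clearly positive one otherwise.
    print("independent:", round(ksg_mutual_information(xs, independent), 3))
    print("dependent:  ", round(ksg_mutual_information(xs, dependent), 3))
```

Replacing the inner linear scans with sorted arrays or k-d trees is what makes estimators of this kind fast; the brute-force version is kept here so each step of the formula is visible.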


A Systematic Review About Prediction of Academic Behavior Through Data Mining Techniques

Journal of Computational and Theoretical Nanoscience; DOI 10.1166/jctn.2020.9358; 2020; Vol 17 (11); pp. 5162-5166

Author(s): Puninder Kaur, Amandeep Kaur, Rajwinder Kaur

Keyword(s): Systematic Review; Data Mining; Data Analysis; Educational Data Mining; Prediction Methods; Time Data; Academic Behavior; Data Mining Techniques; Real Time Data; Extract Information

In the IT world, predicting the academic performance of a large student population poses a major challenge. Educational data mining techniques contribute significantly to solving this problem. Several prediction methods are available for data classification and clustering that extract information and provide accurate results. This paper highlights different prediction methodologies for the real-time analysis of students' dynamic academic behavior. Its main aim is to give a brief overview of data mining techniques and to highlight the differences among the various methods, so as to provide the best results for students.


Informational Data Mining

Enterprise Business Modeling, Optimization Techniques, and Flexible Information Systems; DOI 10.4018/978-1-4666-3946-1.ch005; 2013; pp. 58-65

Author(s): Feyza Gürbüz, Fatma Gökçe Önen

Keyword(s): Data Mining; Information Systems; Knowledge Discovery; Major Change; Research Community; Data Sources; Accurate Information; Rule Mining; Data Mining Techniques; Information Strategies

The previous decades have witnessed major change in the Information Systems (IS) environment, with a corresponding emphasis on the importance of specifying timely and accurate information strategies. There is currently growing interest in data mining and information systems optimization, which has made data mining for the optimization of information systems a new and growing research area. This chapter surveys the application of data mining to the optimization of information systems. These systems have different data sources and, accordingly, different objectives for knowledge discovery. After the preprocessing stage, data mining techniques can be applied to the data suited to the objective of the information system. These techniques include prediction, classification, association rule mining, statistics and visualization, clustering, and outlier detection.


A Classification Framework for Data Mining Applications in Criminal Science and Investigations

Cyber Warfare and Terrorism; DOI 10.4018/978-1-7998-2466-4.ch018; 2020; pp. 277-293

Author(s): Mahima Goyal, Vishal Bhatnagar, Arushi Jain

Keyword(s): Data Mining; Data Analysis; Data Mining Techniques; Road Map; Classification Framework; Challenges And Opportunities; Crucial Information; Tools And Techniques; Day By Day; Mining Tools

The importance of data analysis across different domains is growing day by day, as crucial information is increasingly retrieved through data analysis using a range of available tools. In modern scenarios, data mining is clearly used as a tool for uncovering nuggets of critical information. This chapter discusses the use of data mining tools and techniques in criminal science and investigations. Applying data mining techniques in criminal science helps in understanding criminal psychology and consequently provides insight into effective measures to curb crime. The chapter provides a state-of-the-art report on research in this domain, using a classification scheme and providing a road map of the various data mining tools and techniques in use. Furthermore, it explores and details the challenges and opportunities in applying data mining techniques to criminal investigation.


On Association Rule Mining for the QSAR Problem

Encyclopedia of Data Warehousing and Mining, Second Edition; DOI 10.4018/978-1-60566-010-3.ch014; 2011; pp. 83-86

Author(s): Luminita Dumitriu

Keyword(s): Data Mining; Association Rule; Association Rule Mining; Predictive Ability; Quantitative Structure Activity Relationship; Rule Mining; Data Mining Techniques; Neuro Fuzzy; The 1960S; New Compounds

The concept of Quantitative Structure-Activity Relationship (QSAR), introduced by Hansch and co-workers in the 1960s, attempts to discover the relationship between the structure and the activity of chemical compounds (SAR), in order to predict the activity of new compounds from knowledge of their chemical structure alone. Such predictions can be achieved by quantifying the SAR. Initially, statistical methods were applied to the QSAR problem. For example, pattern recognition techniques facilitate dimension reduction and the transformation of data from multiple experiments into the underlying patterns of information. Partial least squares (PLS) is used to perform the same operations on the target properties. The predictive ability of this method can be tested using cross-validation on the test set of compounds. Later, data mining techniques were considered for this prediction problem. Among them, the most popular are based on neural networks (Wang, Durst, Eberhart, Boyd, & Ben-Miled, 2004), neuro-fuzzy approaches (Neagu, Benfenati, Gini, Mazzatorta, & Roncaglioni, 2002), or genetic programming (Langdon & Barrett, 2004). All these approaches predict the activity of a chemical compound without being able to explain the predicted value. To improve understanding of the prediction process, descriptive data mining techniques based on association rule mining have begun to be applied to the QSAR problem. In this chapter, we describe the use of association rule-based approaches for the QSAR problem.
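To illustrate how association rules can make a structure-activity model descriptive, here is a minimal Python sketch that mines class association rules: rules whose antecedent is a set of structural descriptors and whose consequent is the activity class, kept only if they meet support and confidence thresholds. The descriptor names, compounds, thresholds, and the function `class_rules` are hypothetical, and brute-force enumeration of antecedents up to size two is a generic illustration, not the specific approach the chapter describes.

```python
from itertools import combinations

# Hypothetical training set: each compound is a set of discretized structural
# descriptors paired with its measured activity class.
COMPOUNDS = [
    ({"logP=high", "ring=aromatic"}, "active"),
    ({"logP=high", "ring=aromatic", "mw=low"}, "active"),
    ({"logP=low", "ring=aromatic"}, "inactive"),
    ({"logP=high", "mw=low"}, "active"),
    ({"logP=low", "mw=high"}, "inactive"),
]

def class_rules(data, target, min_sup, min_conf, max_len=2):
    """Enumerate rules `descriptors => target` that meet both thresholds.

    Returns (antecedent, support, confidence) tuples; brute force, so only
    suitable for a handful of descriptors.
    """
    n = len(data)
    descriptors = sorted(set().union(*(d for d, _ in data)))
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(descriptors, size):
            a = set(antecedent)
            covered = [label for d, label in data if a <= d]  # compounds matching the antecedent
            if not covered:
                continue
            hits = covered.count(target)
            sup, conf = hits / n, hits / len(covered)
            if sup >= min_sup and conf >= min_conf:
                rules.append((antecedent, sup, conf))
    return rules

if __name__ == "__main__":
    for antecedent, sup, conf in class_rules(COMPOUNDS, "active", min_sup=0.4, min_conf=0.8):
        print(" AND ".join(antecedent), "=> active", f"(support={sup:.1f}, confidence={conf:.1f})")
```

On this toy data the sketch yields rules such as logP=high => active (support 0.6, confidence 1.0); unlike a neural-network prediction, each rule states which structural features the predicted activity rests on.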


Tree and Graph Mining

Encyclopedia of Data Warehousing and Mining; DOI 10.4018/978-1-59140-557-3.ch214; 2011; pp. 1140-1145

Author(s): Dimitrios Katsaros, Yannis Manolopoulos

Keyword(s): Data Mining; Graph Mining; Data Mining Techniques; The Past

During the past decade, we have witnessed an explosive growth in our capabilities to both generate and collect data. Various data mining techniques have been proposed and widely employed to discover valid, novel and potentially useful patterns in these data. Data mining involves the discovery of patterns, associations, changes, anomalies, and statistically significant structures and events in huge collections of data.


Why Data Analysis?

Mathematics Teacher; DOI 10.5951/mt.83.2.0090; 1990; Vol 83 (2); pp. 90-93

Author(s): Richard L. Scheaffer

Keyword(s): Data Analysis; Empirical Data; Exploratory Data Analysis; Applied Statistics; Statistical Education; The Past; Classical Statistics; Exploratory Data; To Come; Oriented Approach

Recent years have witnessed a strong movement away from what might be termed classical statistics to a more empirical, data-oriented approach to statistics, sometimes termed exploratory data analysis, or EDA. This movement has been active among professional statisticians for twenty or twenty-five years but has begun permeating the area of statistical education for nonstatisticians only in the past five to ten years. At this point, there seems to be little doubt that EDA approaches to applied statistics will gain support over classical approaches in the years to come. That is not to say that classical statistics will disappear. The two approaches begin with different assumptions and have different objectives, but both are important. These differences will be outlined in this article.
