Data Warehousing for Association Mining

Encyclopedia of Data Warehousing and Mining, Second Edition ◽

10.4018/978-1-60566-010-3.ch093 ◽

2010 ◽

pp. 592-597

Author(s):

Yuefeng Li

Keyword(s):

Data Mining ◽

Association Rules ◽

Data Warehousing ◽

Association Mining ◽

Second Phase ◽

Frequent Patterns ◽

Search Spaces ◽

Decision Attributes ◽

Long Time ◽

Two Phases

With the phenomenal growth of electronic data and information, there is increasing demand for efficient and effective systems (tools) for performing data mining tasks on data warehouses or multidimensional databases. Association rules describe associations between itemsets (i.e., sets of data items) or granules. Association mining (also called association rule mining) finds interesting or useful association rules in databases and is a crucial technique in data mining. It can be used in many application areas, for example, the discovery of associations between customers’ locations and shopping behaviours in market basket analysis. Association mining includes two phases. The first phase, pattern mining, is the discovery of frequent patterns. The second phase, rule generation, is the discovery of interesting and useful association rules in the discovered patterns. The first phase, however, often takes a long time to find all frequent patterns, which also include much noise (Pei and Han, 2002). The second phase is also time consuming (Han and Kamber, 2000) and can generate many redundant rules (Zaki, 2004; Xu and Li, 2007). To reduce search spaces, user constraint-based techniques attempt to find knowledge that meets certain constraints. Two interesting concepts have been used in user constraint-based techniques: meta-rules (Han and Kamber, 2000) and granule mining (Li et al., 2006). The aim of this chapter is to present the latest research results on data warehousing techniques that can be used to improve the performance of association mining. The chapter will introduce two important approaches based on user constraint-based techniques. The first approach asks users to input meta-rules that describe their desires for certain data dimensions. It then creates data cubes based on these meta-rules and provides interesting association rules. The second approach first asks users to provide condition and decision attributes, which are used to describe the antecedent and consequent of rules, respectively. It then finds all possible data granules based on the condition attributes and decision attributes. It also creates a multi-tier structure to store the associations between granules, and association mappings to provide interesting rules.
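
The two phases described in this abstract can be sketched as follows. This is a minimal brute-force illustration, not the chapter's data-cube or granule-mining approach; the function names, the toy transactions, and the thresholds are all assumptions made for the example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Phase 1 (pattern mining): return {itemset: support} for frequent itemsets."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            support = sum(1 for t in transactions if cset <= t) / n
            if support >= min_support:
                frequent[cset] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return frequent

def generate_rules(frequent, min_confidence):
    """Phase 2 (rule generation): derive antecedent -> consequent rules."""
    rules = []
    for itemset, support in frequent.items():
        for k in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), k):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                confidence = support / frequent[a]
                if confidence >= min_confidence:
                    rules.append((a, itemset - a, confidence))
    return rules

# Hypothetical toy data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = frequent_itemsets(transactions, min_support=0.5)
rules = generate_rules(frequent, min_confidence=0.6)
```

The exhaustive enumeration in phase 1 is what makes the search space large; the user constraints discussed in the abstract would restrict which candidates and rules are considered at all.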


Self-Adaptive K-Means Based on a Covering Algorithm

Complexity ◽

10.1155/2018/7698274 ◽

2018 ◽

Vol 2018 ◽

pp. 1-16 ◽

Cited By ~ 1

Author(s):

Yiwen Zhang ◽

Yuanyuan Zhou ◽

Xing Guo ◽

Jintao Wu ◽

Qiang He ◽

...

Keyword(s):

Large Scale ◽

Clustering Algorithm ◽

Real Data ◽

Second Phase ◽

Data Sets ◽

Number Of Clusters ◽

Large Scale Data ◽

Long Time ◽

Two Phases ◽

Selection Of

The K-means algorithm is one of the ten classic algorithms in the area of data mining and has been studied by researchers in numerous fields for a long time. However, the value of the clustering number k in the K-means algorithm is not always easy to determine, and the selection of the initial centers is vulnerable to outliers. This paper proposes an improved K-means clustering algorithm called the covering K-means algorithm (C-K-means). The C-K-means algorithm can not only acquire efficient and accurate clustering results but also self-adaptively provide a reasonable number of clusters based on the data features. It includes two phases: the initialization of the covering algorithm (CA) and the Lloyd iteration of the K-means. The first phase executes the CA. CA self-organizes and recognizes the number of clusters k based on the similarities in the data, and it requires neither the number of clusters to be prespecified nor the initial centers to be manually selected. Therefore, it has a “blind” feature, that is, k is not preselected. The second phase performs the Lloyd iteration based on the results of the first phase. The C-K-means algorithm combines the advantages of CA and K-means. Experiments are carried out on the Spark platform, and the results verify the good scalability of the C-K-means algorithm. This algorithm can effectively solve the problem of large-scale data clustering. Extensive experiments on real data sets show that the C-K-means algorithm outperforms existing algorithms in accuracy and efficiency under both sequential and parallel conditions.
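
The two-phase structure described above can be sketched as follows. The abstract does not specify the covering algorithm, so `greedy_cover` is a simple radius-based stand-in that only illustrates the idea of obtaining k and the initial centers from the data; it is not the paper's CA. The `radius` parameter and the sample points are assumptions. The second function is the standard Lloyd iteration.

```python
import math

def greedy_cover(points, radius):
    """Stand-in for phase 1: choose centers so every point is within `radius` of one."""
    centers = []
    for p in points:
        if all(math.dist(p, c) > radius for c in centers):
            centers.append(p)
    return centers  # len(centers) plays the role of k

def lloyd(points, centers, max_iter=100):
    """Phase 2: standard Lloyd iteration starting from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, clusters

# Hypothetical toy data: two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
initial = greedy_cover(points, radius=3.0)
centers, clusters = lloyd(points, initial)
```

Here the number of clusters is never passed in; it falls out of the first phase, which is the "blind" property the abstract refers to.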


Using Association Rules for Query Reformulation

Data Mining ◽

10.4018/978-1-4666-2455-9.ch024 ◽

2013 ◽

pp. 503-514

Author(s):

Ismaïl Biskri ◽

Louis Rompré

Keyword(s):

Data Mining ◽

Information Retrieval ◽

Association Rules ◽

Text Classification ◽

Large Volume ◽

Query Reformulation ◽

Long Time ◽

Hidden Knowledge

In this paper the authors present research on the combination of two data mining methods: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules brings a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors show how this combination can improve the process of information retrieval.


Association Rule Mining in Collaborative Filtering

Collaborative Filtering Using Data Mining and Analysis - Advances in Data Mining and Database Management ◽

10.4018/978-1-5225-0489-4.ch009 ◽

2017 ◽

pp. 159-179 ◽

Cited By ~ 8

Author(s):

Carson K.-S. Leung ◽

Fan Jiang ◽

Edson M. Dela Cruz ◽

Vijay Sekar Elango

Keyword(s):

Data Mining ◽

Collaborative Filtering ◽

Association Rules ◽

Data Structures ◽

Association Rule ◽

Association Rule Mining ◽

Real Life ◽

Frequent Patterns ◽

Rule Mining ◽

Association Rule Miner

Collaborative filtering uses data mining and analysis to develop a system that helps users make appropriate decisions in real-life applications by removing redundant information and providing valuable information to users. Data mining aims to extract implicit, previously unknown, and potentially useful information from data, such as association rules that reveal relationships between frequently co-occurring patterns in their antecedent and consequent parts. This chapter presents an algorithm called CF-Miner for collaborative filtering with an association rule miner. The CF-Miner algorithm first constructs bitwise data structures to capture important contents in the data. It then finds frequent patterns from the bitwise structures. Based on the mined frequent patterns, the algorithm forms association rules. Finally, the algorithm ranks the mined association rules to recommend appropriate merchandise products, goods or services to users. Evaluation results show the effectiveness of CF-Miner in using association rule mining in collaborative filtering.
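
One common way to use bitwise structures for frequent pattern mining is to keep one bit vector per item and count support with a bitwise AND. The sketch below illustrates that general idea only; the abstract does not describe CF-Miner's actual structures, and the function names and toy transactions are assumptions.

```python
from functools import reduce

def build_bitmaps(transactions):
    """Map each item to an integer whose bit i is set when transaction i contains it."""
    bitmaps = {}
    for i, transaction in enumerate(transactions):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps

def support_count(bitmaps, itemset):
    """Count transactions containing every item of a non-empty itemset."""
    combined = reduce(lambda a, b: a & b, (bitmaps.get(item, 0) for item in itemset))
    return bin(combined).count("1")  # number of set bits

# Hypothetical toy data.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
bitmaps = build_bitmaps(transactions)
ab_count = support_count(bitmaps, ["a", "b"])
bc_count = support_count(bitmaps, ["b", "c"])
```

With this layout the data are scanned once to build the bit vectors, and each later support count is a few integer operations rather than another pass over the transactions.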


Physics-Based Simulations of Flow and Fire Development Downstream of a Canopy

Atmosphere ◽

10.3390/atmos11070683 ◽

2020 ◽

Vol 11 (7) ◽

pp. 683

Author(s):

Gilbert Accary ◽

Duncan Sutherland ◽

Nicolas Frangieh ◽

Khalid Moinuddin ◽

Ibrahim Shamseddine ◽

...

Keyword(s):

Turbulent Flow ◽

Forest Canopy ◽

Prescribed Burning ◽

Second Phase ◽

Flow Conditions ◽

Developed Turbulence ◽

Large Eddy ◽

Long Time ◽

Two Phases ◽

The Impact

The behavior of a grassland fire propagating downstream of a forest canopy has been simulated numerically using the fully physics-based wildfire model FIRESTAR3D. This configuration reproduces quite accurately the situation encountered when a wildfire spreads from a forest to an open grassland, as can be the case in a fuel break or a clearing, or during a prescribed burning operation. One of the objectives of this study was to evaluate the impact of the presence of a canopy upstream of a grassfire, especially the modifications of the local wind conditions before and inside a clearing or a fuel break. This kind of information is a major element in improving the safety conditions of forest managers and firefighters in charge of firefighting or prescribed burning operations in such configurations. Another objective was to study the behavior of the fire under realistic turbulent flow conditions, i.e., flow resulting from the interaction between an atmospheric boundary layer (ABL) and a surrounding canopy. Therefore, the study was divided into two phases. The first phase consisted of generating an ABL/canopy turbulent flow above a pine forest (10 m high, 200 m long) using periodic boundary conditions along the streamwise direction. Large Eddy Simulations (LES) were carried out for a sufficiently long time to achieve a quasi-fully developed turbulence. The second phase consisted of simulating the propagation of a surface fire through a grassland, bordered upstream by a forest section (having the same characteristics as in the first phase), while imposing the turbulent flow obtained from the first phase as a dynamic inlet condition to the domain. The simulations were carried out for wind speeds ranging between 1 and 12 m/s; these values allowed the simulations to cover the two regimes of propagation of surface fires, namely plume-dominated and wind-driven fires.


A Comparative Study of Tree-Based and Apriori-Based Approaches for Incremental Data Mining

International Journal of Engineering Research in Africa ◽

10.4028/www.scientific.net/jera.23.120 ◽

2016 ◽

Vol 23 ◽

pp. 120-130

Author(s):

Manoj Kumar ◽

Hemant Kumar Soni

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Association Rule Mining ◽

Future Research ◽

Frequent Patterns ◽

Rule Mining ◽

Business Decisions ◽

Depth Analysis ◽

Intelligent Tools

Association rule mining is an iterative and interactive process of discovering valid, novel, useful, understandable and hidden associations from massive databases. Colossal databases require powerful and intelligent tools for analysis and discovery of frequent patterns and association rules. Several researchers have proposed algorithms for generating item sets and association rules, for discovering frequent patterns, and for mining association rules. These proposals are validated on static data. A dynamic database may introduce some new association rules, which may be interesting and helpful in making better business decisions. In association rule mining, the performance and cost of the existing algorithms on incremental data are less explored. Hence, there is a strong need for a comprehensive study and in-depth analysis of the existing proposals for association rule mining. In this paper, the existing tree-based algorithms for incremental data mining are presented and compared on the basis of number of scans, structure, size and type of database. It is concluded that the Can-Tree approach dominates the other algorithms such as FP-Tree, FUFP-Tree, FELINE Algorithm with CATS-Tree, etc. This study also highlights some hot issues and future research directions, and points out that there is a strong need for devising a new and efficient algorithm for incremental data mining.


Using the interestingness measure lift to generate association rules

Journal of Advanced Computer Science & Technology ◽

10.14419/jacst.v4i1.4398 ◽

2015 ◽

Vol 4 (1) ◽

pp. 156 ◽

Cited By ~ 4

Author(s):

Nada Hussein ◽

Abdallah Alashqur ◽

Bilal Sowan

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

Search Space ◽

The Other ◽

Frequent Patterns ◽

New Approach ◽

Left Hand ◽

Interestingness Measure ◽

The Right

In this digital age, organizations have to deal with huge amounts of data, sometimes called Big Data. In recent years, the volume of data has increased substantially. Consequently, finding efficient and automated techniques for discovering useful patterns and relationships in the data becomes very important. In data mining, patterns and relationships can be represented in the form of association rules. Current techniques for discovering association rules rely on measures such as support for finding frequent patterns and confidence for finding association rules. A shortcoming of confidence is that it does not capture the correlation that exists between the left-hand side (LHS) and the right-hand side (RHS) of an association rule. On the other hand, the interestingness measure lift captures such correlation in the sense that it tells us whether the LHS influences the RHS positively or negatively. Therefore, using lift instead of confidence as a criterion for discovering association rules can be more effective. It also gives the user more choices in determining the kind of association rules to be discovered. This in turn helps to narrow down the search space and, consequently, improves performance. In this paper, we describe a new approach for discovering association rules that is based on lift and not on confidence.
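
The contrast between confidence and lift can be illustrated with the standard definitions: confidence(LHS -> RHS) = support(LHS and RHS) / support(LHS), and lift(LHS -> RHS) = support(LHS and RHS) / (support(LHS) * support(RHS)). The sketch below is not the paper's algorithm; the function names and the support values are assumptions chosen to show a rule with high confidence but lift below 1.

```python
def confidence(support_lhs, support_both):
    """Fraction of transactions containing the LHS that also contain the RHS."""
    if support_lhs <= 0:
        raise ValueError("support_lhs must be positive")
    return support_both / support_lhs

def lift(support_lhs, support_rhs, support_both):
    """Ratio of the observed joint support to the support expected under independence."""
    if support_lhs <= 0 or support_rhs <= 0:
        raise ValueError("supports must be positive")
    return support_both / (support_lhs * support_rhs)

# Hypothetical supports: the RHS is very common on its own.
support_lhs, support_rhs, support_both = 0.5, 0.9, 0.4

rule_confidence = confidence(support_lhs, support_both)
rule_lift = lift(support_lhs, support_rhs, support_both)

# Lift above 1 suggests a positive influence, below 1 a negative one.
direction = "positive" if rule_lift > 1 else "negative" if rule_lift < 1 else "none"
```

In this example the rule would pass a typical confidence threshold, yet the RHS is less likely when the LHS is present than it is overall, which is the kind of case confidence alone does not reveal.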


Using Association Rules for Query Reformulation

Next Generation Search Engines ◽

10.4018/978-1-4666-0330-1.ch013 ◽

2012 ◽

pp. 291-303

Author(s):

Ismaïl Biskri ◽

Louis Rompré

Keyword(s):

Data Mining ◽

Information Retrieval ◽

Association Rules ◽

Text Classification ◽

Large Volume ◽

Query Reformulation ◽

Long Time ◽

Hidden Knowledge

In this paper the authors present research on the combination of two data mining methods: text classification and maximal association rules. Text classification has been the focus of interest of many researchers for a long time. However, the results take the form of lists of words (classes) that people often do not know what to do with. The use of maximal association rules brings a number of advantages: (1) the detection of dependencies and correlations between the relevant units of information (words) of different classes, and (2) the extraction of hidden knowledge, often relevant, from a large volume of data. The authors show how this combination can improve the process of information retrieval.


Reasoning about Frequent Patterns with Negation

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch177 ◽

2011 ◽

pp. 941-946 ◽

Cited By ~ 3

Author(s):

Marzena Kryszkiewicz

Keyword(s):

Data Mining ◽

Association Rules ◽

Association Rule ◽

White Wine ◽

Frequent Patterns ◽

Sales Managers ◽

Important Data ◽

Large Databases ◽

Transaction Database ◽

Significant Patterns

Discovering frequent patterns in large databases is an important data mining problem. The problem was introduced by Agrawal, Imielinski, and Swami (1993) for a sales transaction database. Frequent patterns were defined there as sets of items that are purchased together frequently. Frequent patterns are commonly used for building association rules. For example, an association rule may state that 80% of customers who buy fish also buy white wine. This rule is derivable from the fact that fish occurs in 5% of sales transactions and the set {fish, white wine} occurs in 4% of transactions. Patterns and association rules can be generalized by admitting negation. A sample association rule with negation could state that 75% of customers who buy coke also buy chips and neither beer nor milk. Knowledge of this kind is important not only for sales managers, but also in medical areas (Tsumoto, 2002). Admitting negation in patterns usually results in an abundance of mined patterns, which makes analysis of the discovered knowledge infeasible. It is thus preferable to discover and store a possibly small fraction of patterns, from which one can derive all other significant patterns when required. In this chapter, we introduce the first lossless representations of frequent patterns with negation.
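
The arithmetic behind the fish and white wine rule can be written out as a short sketch. The confidence figures come from the abstract; the negation step uses the standard identity sup(X and not a) = sup(X) - sup(X with a), and the function names are assumptions made for illustration.

```python
def confidence(support_antecedent, support_union):
    """Confidence of antecedent -> consequent from the two supports."""
    return support_union / support_antecedent

def support_with_negated_item(support_x, support_x_with_item):
    """Support of 'X and not item', derived from supports of positive patterns only."""
    return support_x - support_x_with_item

support_fish = 0.05            # fish occurs in 5% of transactions
support_fish_and_wine = 0.04   # {fish, white wine} occurs in 4% of transactions

rule_confidence = confidence(support_fish, support_fish_and_wine)
support_fish_no_wine = support_with_negated_item(support_fish, support_fish_and_wine)
```

The second value shows why patterns with negation need not be stored separately: the support of "fish without white wine" follows from the supports of the positive patterns.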


Development of Paramphistomum sukari Dinnik, 1954 (Trematoda: Paramphistomidae) in a snail host

Parasitology ◽

10.1017/s0031182000021934 ◽

1957 ◽

Vol 47 (1-2) ◽

pp. 209-216 ◽

Cited By ~ 8

Author(s):

J. A. Dinnik ◽

N. N. Dinnik

Keyword(s):

Intermediate Host ◽

First Generation ◽

Second Generation ◽

Infected Snail ◽

Snail Host ◽

Second Phase ◽

Long Time ◽

Two Phases ◽

Successive Generations ◽

First And Second Generation

The development of Paramphistomum sukari Dinnik in a snail host is described, with the emphasis laid on the succession of redial generations. The sporocyst gives birth to about twenty to thirty rediae. These rediae of the first generation commence with the production of daughter rediae and then enter the second phase of their productivity, during which they produce cercariae. The daughter rediae, or the rediae of the second generation, repeat these two phases during their lives, commencing with redial production and after that changing to the production of cercariae. Both the first- and second-generation rediae are able to produce a few daughter rediae at the end of their life. There is evidence that the subsequent generations of rediae are also able to give birth to daughter rediae and cercariae. As a result, the successive generations of rediae maintain the infection in an intermediate host for a long time, probably as long as the infected snail can survive.
