Author(s):  
Rosa Meo ◽  
Giuseppe Psaila

Mining of association rules is one of the most widely adopted data mining techniques across application domains. A great deal of work has been carried out in recent years on the development of efficient algorithms for association rule extraction. Indeed, the problem is computationally difficult, known to be NP-hard (Calders, 2004), and is compounded by the fact that association rules are normally extracted from very large databases. Moreover, in order to increase the relevance and interestingness of the results and to reduce the volume of the overall output, constraints on association rules are introduced and must be evaluated (Ng et al., 1998; Srikant et al., 1997). In this contribution, however, we focus not on the problem of developing efficient algorithms but on the semantic problem behind the extraction of association rules (see Tsur et al. [1998] for an interesting generalization of this problem).
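To make the notions of support, confidence, and rule constraints concrete, the following is a minimal brute-force sketch (not any of the cited algorithms; the `constraint` callback and all names are illustrative) of constraint-based rule extraction in the spirit of Ng et al. (1998) and Srikant et al. (1997):

```python
from itertools import combinations

def mine_rules(transactions, min_support, min_confidence,
               constraint=lambda ante, cons: True):
    """Brute-force association-rule miner with a user-supplied rule constraint.

    `constraint(antecedent, consequent)` lets the caller prune the output,
    illustrating how constraints reduce the volume of the result.
    """
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    # Count support for every candidate itemset up to size 3
    # (exponential in general -- hence the need for efficient algorithms).
    support = {}
    for size in range(1, 4):
        for cand in combinations(items, size):
            count = sum(1 for t in transactions if set(cand) <= t)
            if count / n >= min_support:
                support[frozenset(cand)] = count / n

    rules = []
    for itemset, sup in support.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), k):
                ante = frozenset(ante)
                cons = itemset - ante
                conf = sup / support[ante]  # subsets of a frequent set are frequent
                if conf >= min_confidence and constraint(ante, cons):
                    rules.append((set(ante), set(cons), sup, conf))
    return rules

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
rules = mine_rules(baskets, min_support=0.5, min_confidence=0.6)
```

Passing, say, `constraint=lambda a, c: "bread" in c` would keep only rules whose consequent mentions bread, evaluated during extraction rather than by post-filtering.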


Author(s):  
LAWRENCE MAZLACK

Determining causality has been a tantalizing goal throughout human history. Proper sacrifices to the gods were thought to bring rewards; failure to make suitable observations was thought to lead to disaster. Today, data mining holds the promise of extracting unsuspected information from very large databases. Methods have been developed to build association rules from large data sets. Association rules indicate the strength of association between two or more data attributes. In many ways, the interest in association rules lies in their promise (or illusion) of causal, or at least predictive, relationships. However, association rules only calculate a joint probability; they do not express a causal relationship. Discovering genuine causal relationships would be very useful. Our goal is to explore causality in the data mining context.
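The point that a rule's confidence is merely a conditional relative frequency, not evidence of causation, can be seen in a tiny sketch with hypothetical data (the rain/umbrella/accident scenario below is an invented confounding example, not from the article):

```python
# Hypothetical data: rain drives both umbrella sales and traffic accidents.
# The rule {umbrella} -> {accident} scores a high confidence even though
# umbrellas do not cause accidents -- the association reflects a confounder.
days = [
    {"rain", "umbrella", "accident"},
    {"rain", "umbrella", "accident"},
    {"rain", "umbrella"},
    {"sunny"},
    {"sunny"},
]

def confidence(antecedent, consequent, data):
    """Rule confidence is just P(consequent | antecedent) estimated from
    co-occurrence counts -- a joint-probability statement, not a causal one."""
    has_ante = [d for d in data if antecedent <= d]
    return sum(1 for d in has_ante if consequent <= d) / len(has_ante)

conf = confidence({"umbrella"}, {"accident"}, days)  # 2 of 3 umbrella days
```

Any intervention-free dataset with this co-occurrence pattern yields the same confidence, regardless of which variable actually causes which.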


2011 ◽  
Vol 1 (2) ◽  
Author(s):  
Venkatapathy Umarani ◽  
Muthusamy Punithavalli

The discovery of association rules is an important and challenging data mining task. Most existing algorithms for finding association rules require multiple passes over the entire database, and the I/O overhead incurred is extremely high for very large databases. An obvious approach to reducing the complexity of association rule mining is sampling. In recent times, several sampling-based approaches have been developed to speed up association rule mining. Here, a proficient progressive sampling-based approach is presented for mining association rules from large databases. First, frequent itemsets are mined from an initial sample; subsequently, the negative border is computed from the mined frequent itemsets. Based on the support computed for the midpoint itemset of the sorted negative border, either the sample size is increased or association rules are mined from the current sample. In this paper, we present an extensive analysis of the progressive sampling-based approach on different real-life datasets and, in addition, compare its performance with the well-known association rule mining algorithm Apriori. The experimental results show that both the accuracy and the computation time of mining association rules from real-life datasets are effectively improved by the progressive sampling-based approach.
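The control loop described above can be sketched as follows. This is a minimal, brute-force reconstruction under stated assumptions (exhaustive itemset counting, a simple doubling schedule, and a stop test on the midpoint border itemset); it is not the authors' implementation, and all function names are illustrative:

```python
import random
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_size=3):
    """Exhaustively count itemsets up to max_size (illustrative only)."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            s = sum(1 for t in transactions if set(cand) <= t) / n
            if s >= min_support:
                freq[frozenset(cand)] = s
    return freq

def negative_border(freq, items, max_size=3):
    """Minimal itemsets that are infrequent but whose proper subsets are all frequent."""
    border = []
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            cs = frozenset(cand)
            if cs in freq:
                continue
            if all(frozenset(sub) in freq
                   for k in range(1, size)
                   for sub in combinations(cand, k)):
                border.append(cs)
    return border

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def progressive_mine(db, min_support, start=0.25, growth=2.0, seed=0):
    """Grow the sample until the midpoint negative-border itemset is
    confirmed infrequent in the full database, then return the sample's
    frequent itemsets."""
    rng = random.Random(seed)
    frac = start
    while True:
        sample = rng.sample(db, max(1, int(frac * len(db))))
        freq = frequent_itemsets(sample, min_support)
        items = sorted({i for t in db for i in t})
        border = sorted(negative_border(freq, items),
                        key=lambda s: support(s, sample))
        if not border:
            return freq
        mid = border[len(border) // 2]
        # If the midpoint border itemset is actually frequent in the full
        # database, the sample was too small: enlarge it and retry.
        if support(mid, db) >= min_support and frac < 1.0:
            frac = min(1.0, frac * growth)
            continue
        return freq
```

The negative border acts as a self-check: if itemsets just outside the sample's frequent set turn out frequent in the full database, the sample has underestimated support and must grow.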


2017 ◽  
Vol 26 (1) ◽  
pp. 69-85
Author(s):  
Mohammed M. Fouad ◽  
Mostafa G.M. Mostafa ◽  
Abdulfattah S. Mashat ◽  
Tarek F. Gharib

Association rules provide important knowledge that can be extracted from transactional databases. Owing to the massive exchange of information nowadays, databases have become dynamic, changing rapidly and periodically: new transactions are added to the database, and/or old transactions are updated or removed from it. Incremental mining was introduced to overcome the problem of maintaining previously generated association rules in dynamic databases. In this paper, we propose an efficient algorithm (IMIDB) for incremental itemset mining in large databases. The algorithm utilizes the trie data structure to index the transactions of the dynamic database. Performance comparison of the proposed algorithm with recently cited algorithms shows that it achieves a significant improvement of about two orders of magnitude. The proposed algorithm also exhibits linear scalability with respect to database size.
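To illustrate why a trie suits dynamic transaction databases, here is a minimal sketch of a transaction trie supporting insertion, deletion, and support queries. This is an assumption-laden illustration of the general technique, not the IMIDB algorithm itself; all names are hypothetical:

```python
class Node:
    def __init__(self):
        self.children = {}   # item -> Node
        self.passing = 0     # transactions whose sorted path goes through here

class TransactionTrie:
    """Trie over lexicographically sorted transactions: insert and delete
    cost O(|t|) each, so the index can follow a changing database without
    re-reading it."""

    def __init__(self):
        self.root = Node()

    def _walk(self, transaction, delta):
        node = self.root
        node.passing += delta
        for item in sorted(transaction):
            node = node.children.setdefault(item, Node())
            node.passing += delta

    def add(self, transaction):
        self._walk(transaction, +1)

    def remove(self, transaction):
        # Sketch: assumes the transaction was previously added.
        self._walk(transaction, -1)

    def support(self, itemset):
        """Count indexed transactions containing `itemset`."""
        target = sorted(itemset)

        def count(node, idx):
            if idx == len(target):
                return node.passing  # all remaining items already matched
            total = 0
            for item, child in node.children.items():
                if item == target[idx]:
                    total += count(child, idx + 1)
                elif item < target[idx]:
                    total += count(child, idx)
                # item > target[idx]: prune -- paths are sorted, so
                # target[idx] cannot appear deeper in this branch
            return total

        return count(self.root, 0)
```

Because updates only touch one root-to-leaf path, adding or removing a transaction never requires rescanning the database, which is the core benefit incremental miners exploit.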


2011 ◽  
Vol 145 ◽  
pp. 292-296
Author(s):  
Lee Wen Huang

Data mining is the process of nontrivial extraction of implicit, previously unknown, and potentially useful information from data in databases. Mining closed large itemsets extends the mining of association rules: the aim is to find the set of necessary subsets of large itemsets that can represent all large itemsets. In this paper, we design a hybrid approach that takes the character of the data into account to mine closed large itemsets efficiently. Two features of market basket analysis are considered: the number of items is large, while the number of items associated with each item is small. By combining the cut-point method and the hash concept, the new algorithm finds closed large itemsets efficiently. The simulation results show that the new algorithm outperforms the FP-CLOSE algorithm in both execution time and storage space.
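The defining property of closed itemsets (a frequent itemset is closed iff no proper superset has the same support, so the closed sets losslessly represent all frequent sets) can be shown with a naive sketch. This illustrates only the definition, not the paper's cut-point/hash algorithm:

```python
from itertools import combinations

def closed_itemsets(transactions, min_support):
    """Naive closed-itemset miner: enumerate frequent itemsets, then keep
    those with no equally-supported proper superset. Exponential; for
    definition purposes only."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})

    support = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            c = sum(1 for t in transactions if set(cand) <= t)
            if c / n >= min_support:
                support[frozenset(cand)] = c

    # A superset always has support <= its subset, so an equal-count
    # superset of a frequent itemset is itself frequent: checking only
    # within `support` is sufficient.
    closed = {}
    for itemset, c in support.items():
        if not any(itemset < other and cnt == c
                   for other, cnt in support.items()):
            closed[itemset] = c / n
    return closed
```

For example, if every basket containing bread also contains milk, {bread} is not closed ({bread, milk} has the same support), and reporting only {bread, milk} loses no information.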


Author(s):  
Gautam Das

In recent years, advances in data collection and management technologies have led to a proliferation of very large databases. These large data repositories are typically created in the hope that, through analysis such as data mining and decision support, they will yield new insights into the data and the real-world processes that created them. In practice, however, while the collection and storage of massive datasets have become relatively straightforward, effective data analysis has proven more difficult to achieve. One reason data analysis successes have proven elusive is that most analysis queries, by their nature, require aggregation or summarization of large portions of the data being analyzed. For multi-gigabyte data repositories, this means that processing even a single analysis query involves accessing enormous amounts of data, leading to prohibitively expensive running times. This severely limits the feasibility of many types of analysis applications, especially those that depend on timeliness or interactivity.
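One standard response to this cost, common in approximate query processing, is to answer aggregate queries from a small uniform sample and scale the result, trading exact answers for interactive latency. The sketch below (illustrative names and a textbook normal-approximation error bound, not any specific system) shows the idea for SUM:

```python
import math
import random
import statistics

def approximate_sum(table, column, sample_frac=0.01, seed=0):
    """Estimate SUM(column) from a uniform random sample by scaling the
    sample sum by N/k, returning the estimate and a rough 95% confidence
    half-width from the sample standard deviation."""
    rng = random.Random(seed)
    k = max(2, int(len(table) * sample_frac))
    sample = [row[column] for row in rng.sample(table, k)]

    estimate = sum(sample) * (len(table) / k)
    std_err = statistics.stdev(sample) / math.sqrt(k)
    half_width = 1.96 * std_err * len(table)
    return estimate, half_width
```

Reading a 1% sample touches roughly 1% of the data, so a query that scanned gigabytes now scans tens of megabytes, at the cost of a quantifiable error bar.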


Author(s):  
Andrew Borthwick ◽  
Stephen Ash ◽  
Bin Pang ◽  
Shehzad Qureshi ◽  
Timothy Jones
