Towards Efficient Mining of Periodic High-Utility Itemsets in Large Databases

High-utility itemset (HUI) mining has attracted many researchers in recent years. However, HUI mining methods involve an exponential search space and return a very large number of high-utility itemsets. Temporal periodicity of itemsets has recently been considered an important interestingness criterion for mining high-utility itemsets in many applications. Existing periodic high-utility itemset mining methods have the limitation that they do not consider frequency and are not suitable for large databases. To address this problem, we propose two efficient algorithms for mining periodic frequent high-utility itemsets: FPHUI (mining periodic frequent HUIs) and MFPHM (efficient mining of periodic frequent HUIs). The first algorithm, the FPHUI miner, generates all periodic frequent itemsets. Mining periodic frequent high-utility itemsets incurs a high computational cost in very large databases, so we further developed MFPHM to overcome this limitation. The performance of the FPHUI miner is evaluated by conducting experiments on various real datasets. Experimental results show that the proposed algorithms are efficient and effective.
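
To make the three criteria concrete, the following is a minimal sketch of how the support, utility, and maximum period of a single itemset could be computed on a toy database. It is not the FPHUI or MFPHM algorithm, whose internals the abstract does not describe. The function names, the toy data, and the period definition (gaps between consecutive occurrences, including the gaps at the start and end of the database) are assumptions for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy data: each transaction maps an item to its utility in that
# transaction (for example quantity * unit profit). Values are illustrative.
Transaction = Dict[str, int]


def measures(db: List[Transaction], itemset: FrozenSet[str]) -> Tuple[int, int, int]:
    """Return (support, utility, max_period) of `itemset` in `db`."""
    # 1-based ids of the transactions that contain every item of the itemset.
    tids = [i for i, t in enumerate(db, start=1) if itemset.issubset(t)]
    support = len(tids)
    utility = sum(db[tid - 1][item] for tid in tids for item in itemset)
    # Assumed period definition: gaps between consecutive occurrences,
    # counting the gap from the start (0) and up to the last transaction.
    points = [0] + tids + [len(db)]
    max_period = max(b - a for a, b in zip(points, points[1:]))
    return support, utility, max_period


def is_periodic_frequent_hui(db, itemset, min_sup, min_util, max_per) -> bool:
    """Check the three thresholds (all three are hypothetical parameters)."""
    support, utility, period = measures(db, itemset)
    return support >= min_sup and utility >= min_util and period <= max_per


db = [
    {"a": 5, "b": 2},
    {"a": 5, "c": 1},
    {"a": 10, "b": 4, "c": 1},
    {"b": 2},
    {"a": 5, "b": 6},
]
print(measures(db, frozenset({"a", "b"})))
```

A real miner would avoid rescanning the database for every candidate; the sketch only illustrates what is being measured.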

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems ◽ 10.3233/jifs-201357 ◽ 2020 ◽ pp. 1-16

Author(s): Rui Sun ◽ Meng Han ◽ Chunyan Zhang ◽ Mingyao Shen ◽ Shiyu Du

Keyword(s): Data Mining ◽ Search Space ◽ Experimental Results ◽ Effective Algorithm ◽ Memory Usage ◽ Utility Value ◽ Itemset Mining ◽ High Utility ◽ High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, it can only mine itemsets with positive items, and itemsets are missed when it is applied to data with negative items. To solve this problem, we propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It uses a strategy for automatically increasing the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
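
The idea of raising the minimum utility threshold automatically can be sketched with a min-heap that holds the k best itemsets found so far. This is not THN: the sketch enumerates every itemset exhaustively and uses none of the merging, projection, or pruning techniques named in the abstract. The function name, the toy data, and the starting threshold of 0 are assumptions.

```python
import heapq
from itertools import combinations


def top_k_itemsets(db, k):
    """Return (top-k list of (utility, itemset), final threshold).

    `db` is a list of dicts mapping item -> utility in that transaction;
    utilities may be negative.
    """
    items = sorted({item for t in db for item in t})
    heap = []        # min-heap; once full, heap[0] is the k-th best itemset
    min_util = 0     # assumed starting threshold, raised as the heap fills
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            rows = [t for t in db if set(combo).issubset(t)]
            if not rows:
                continue
            utility = sum(t[item] for t in rows for item in combo)
            if utility <= min_util:
                continue                      # cannot enter the top k
            heapq.heappush(heap, (utility, combo))
            if len(heap) > k:
                heapq.heappop(heap)           # drop the weakest itemset
            if len(heap) == k:
                min_util = heap[0][0]         # raise the threshold
    return sorted(heap, reverse=True), min_util


db = [
    {"a": 4, "b": -2, "c": 3},
    {"a": 2, "c": 2},
    {"b": -1, "c": 7},
]
result, threshold = top_k_itemsets(db, k=2)
print(result, threshold)
```

The user supplies only k; the threshold ends up equal to the utility of the k-th best itemset.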

E-MsNFIS: Efficient Negative Frequent Itemsets Mining Based on Multiple Minimum Supports

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.411-414.386 ◽ 2013 ◽ Vol 411-414 ◽ pp. 386-389 ◽ Cited By ~ 1

Author(s): Tian Tian Xu ◽ Xiang Jun Dong

Keyword(s): Association Rules ◽ Real Life ◽ Frequent Itemsets ◽ Negative Association ◽ Experimental Results ◽ New Method ◽ Minimum Support ◽ Frequent Itemsets Mining ◽ Negative Association Rules ◽ Multiple Minimum Supports

Negative frequent itemsets (NFIS) such as (a1a2¬a3a4) play important roles in real applications because valuable negative association rules can be mined from them. In one of our previous works, we proposed a method, named e-NFIS, to mine NFIS from positive frequent itemsets (PFIS). However, e-NFIS uses only a single minimum support, which implicitly assumes that all items in the database are of the same nature or have similar frequencies. This is often not the case in real-life applications, so many methods for mining frequent itemsets with multiple minimum supports have been proposed. These methods allow users to assign different minimum supports to different items, but they mine only PFIS and do not consider negative ones. In this paper, we propose a new method, named e-msNFIS, to mine NFIS from PFIS based on multiple minimum supports. e-msNFIS contains three steps: 1) using existing methods to mine PFIS with multiple minimum supports; 2) using the same method as in e-NFIS to generate NCIS from the PFIS obtained in step 1; 3) calculating the support of these NCIS using only the supports of PFIS, and then obtaining the NFIS. Experimental results show that e-msNFIS is efficient.
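
Step 3 relies on the fact that the support of an itemset with negated items can be derived from positive supports alone. One standard way to do this is inclusion-exclusion, sketched below. The abstract does not state the formula e-msNFIS actually uses, so the function names and the toy data are assumptions, and the multiple-minimum-support part is not modeled here.

```python
from itertools import combinations


def support_count(db, itemset):
    """Number of transactions (sets of items) that contain `itemset`."""
    return sum(1 for t in db if itemset <= t)


def negative_support(pos_supp, positive, negated):
    """Support of `positive` together with the absence of every item in
    `negated`, using only supports of positive itemsets.

    supp(X and not Y) = sum over Z subset of Y of (-1)^|Z| * supp(X | Z)
    `pos_supp` maps frozenset -> support count.
    """
    positive = frozenset(positive)
    total = 0
    for r in range(len(negated) + 1):
        for extra in combinations(sorted(negated), r):
            total += (-1) ** r * pos_supp[positive | frozenset(extra)]
    return total


db = [
    frozenset({"a1", "a2", "a4"}),
    frozenset({"a1", "a2", "a3", "a4"}),
    frozenset({"a1", "a2", "a4"}),
    frozenset({"a3"}),
    frozenset({"a1"}),
]
needed = [frozenset({"a1", "a2", "a4"}), frozenset({"a1", "a2", "a3", "a4"})]
pos_supp = {s: support_count(db, s) for s in needed}

# Support of (a1 a2 ¬a3 a4) = supp(a1 a2 a4) - supp(a1 a2 a3 a4)
print(negative_support(pos_supp, {"a1", "a2", "a4"}, {"a3"}))
```

For the example itemset (a1a2¬a3a4), only two positive supports are needed, so no additional database scan is required once the PFIS supports are known.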

FIT: A Fast Algorithm for Discovering Frequent Itemsets in Large Databases

Computing Letters ◽ 10.1163/1574040054861285 ◽ 2005 ◽ Vol 1 (3) ◽ pp. 129-135

Author(s): Jun Luo ◽ Sanguthevar Rajasekaran

Keyword(s): Data Mining ◽ Association Rules ◽ Fast Algorithm ◽ Frequent Itemsets ◽ Experimental Results ◽ Important Data ◽ Computational Performance ◽ Large Databases ◽ Intersection Operation ◽ Better Than

Association rule mining is an important data mining problem that has been studied extensively. In this paper, a simple but Fast algorithm for Intersecting attributes lists using hash Tables (FIT) is presented. FIT is designed for efficiently computing all the frequent itemsets in large databases. It deploys an idea similar to Eclat but has much better computational performance than Eclat for two reasons: 1) FIT makes fewer comparisons in total for each intersection operation between two attribute lists, and 2) FIT significantly reduces the total number of intersection operations. Our experimental results demonstrate that the performance of FIT is much better than that of the Eclat and Apriori algorithms.
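
The Eclat-style idea that FIT is said to resemble can be sketched as follows: store, for each item, the set of transaction ids that contain it, and obtain the support of a larger itemset by intersecting those sets. This is a plain Eclat-like baseline, not FIT itself, whose specific optimizations the abstract does not detail. Python sets stand in for the hash tables, and all names and data are assumptions.

```python
from collections import defaultdict


def vertical(db):
    """Build item -> set of transaction ids (the vertical layout)."""
    tidlists = defaultdict(set)
    for tid, transaction in enumerate(db):
        for item in transaction:
            tidlists[item].add(tid)
    return dict(tidlists)


def _extend(prefix, candidates, min_sup, out):
    """Depth-first extension of `prefix` with each candidate item."""
    for i, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)
        extensions = []
        for other, other_tids in candidates[i + 1:]:
            common = tids & other_tids      # hash-based set intersection
            if len(common) >= min_sup:
                extensions.append((other, common))
        if extensions:
            _extend(itemset, extensions, min_sup, out)


def frequent_itemsets(db, min_sup):
    """Return {itemset tuple: support count} for all frequent itemsets."""
    tidlists = vertical(db)
    candidates = sorted(
        ((item, tids) for item, tids in tidlists.items() if len(tids) >= min_sup),
        key=lambda pair: pair[0],
    )
    out = {}
    _extend((), candidates, min_sup, out)
    return out


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
print(frequent_itemsets(db, min_sup=3))
```

Each intersection costs time proportional to the smaller of the two sets, which is why the number and size of intersections dominate the running time of this family of algorithms.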

Review of Association Mining Methods for the Extraction of Rules Based on the Frequency and Utility Factors

International Journal of Information Technology Project Management ◽ 10.4018/ijitpm.2021100101 ◽ 2021 ◽ Vol 12 (4) ◽ pp. 1-10

Author(s): Subba Reddy Meruva ◽ Venkateswarlu Bondu

Keyword(s): Association Rules ◽ Association Rule ◽ Strong Association ◽ Vital Role ◽ Frequent Pattern ◽ Association Mining ◽ Observation Study ◽ Mining Methods ◽ High Utility ◽ High Utility Itemsets

An association rule defines a relationship among items, and association rule mining discovers frequent itemsets using a support-confidence framework. This framework establishes user-interesting, or strong, association rules with two thresholds (i.e., minimum support and minimum confidence). Traditional association rule mining methods (i.e., Apriori and frequent pattern growth [FP-growth]) are widely used for discovering frequent itemsets; their limitation is that they do not consider key factors of the items, such as profit, quantity, or cost, during the mining process. Applications such as e-commerce, marketing, healthcare, and web recommendation involve items with an associated utility or profit. In such cases, utility-based itemset mining methods play a vital role in generating effective association rules and are also useful for mining high utility itemsets. This paper presents a survey of high-utility itemset mining methods and discusses observations on existing methods together with an experimental study using benchmark datasets.
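
The support-confidence framework mentioned above can be illustrated with a short sketch. The function names, thresholds, and toy transactions are assumptions. Note that nothing in it looks at profit or quantity, which is the limitation the survey highlights.

```python
def support(db, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(db, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    return support(db, antecedent | consequent) / support(db, antecedent)


def is_strong(db, antecedent, consequent, min_sup, min_conf):
    """A rule is 'strong' when it meets both thresholds."""
    return (support(db, antecedent | consequent) >= min_sup
            and confidence(db, antecedent, consequent) >= min_conf)


db = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]
rule = (frozenset({"bread"}), frozenset({"milk"}))
print(support(db, rule[0] | rule[1]), confidence(db, *rule))
```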

An Approach to the Mining of User Focused Frequent Itemsets Based on Attention

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.536-537.520 ◽ 2014 ◽ Vol 536-537 ◽ pp. 520-523

Author(s): Jia Liu ◽ Zhen Ya Zhang ◽ Hong Mei Cheng ◽ Qian Sheng Fang

Keyword(s): Association Rules ◽ Association Rule ◽ Frequent Itemsets ◽ Selection Model ◽ Experimental Results ◽ Apriori Algorithm ◽ Mining Method ◽ Information Filter ◽ Frequent Itemsets Mining ◽ Log File

Usually, non-trivial network visiting behaviors implied in a network visiting log can be treated as frequent itemsets or association rules if the data in the log file are transformed into transactions, and association rule techniques can then be used to mine the frequent itemsets that a user or application focuses on. To mine non-trivial network visiting behaviors effectively, an attention-based frequent itemset mining method is proposed in this paper. In the proposed method, the properties of the user's focus are described as an attention set, and the early selection model of attention as an information filter is referenced in the design of the method. Experimental results show that the proposed method is faster than the Apriori algorithm at mining the frequent itemsets on which attention is focused.

A Parallel Method for Mining High Utility Itemsets Based on Projection Indexing

Research and Development on Information and Communication Technology ◽ 10.32913/rd-ict.vol1.no37.349 ◽ 2017 ◽ pp. 31

Author(s): Đậu Hải Phong

Keyword(s): Data Mining ◽ Parallel Algorithms ◽ Experimental Results ◽ Sequential Algorithms ◽ Parallel Method ◽ Speed Up ◽ High Utility ◽ High Utility Itemsets ◽ Better Than

High utility itemset (HUI) mining is one of the popular problems in data mining. Several parallel and sequential algorithms have been proposed in the literature to solve this problem. All the parallel algorithms try to reduce the cost of synchronization and of calculating the global profit of itemsets. In this paper, we present a parallel method for mining HUIs based on projection-based indexing to speed up performance and reduce memory requirements. The experimental results show that the performance and the number of candidates of our algorithm are better than those of some non-parallel algorithms.

A False Negative Maximal Frequent Itemsets Mining Algorithm over Stream

Applied Mechanics and Materials ◽ 10.4028/www.scientific.net/amm.135-136.21 ◽ 2011 ◽ Vol 135-136 ◽ pp. 21-25

Author(s): Hai Feng Li ◽ Ning Zhang

Keyword(s): Real World ◽ False Negative ◽ Frequent Itemsets ◽ Experimental Results ◽ Mining Algorithm ◽ Chernoff Bound ◽ Frequent Itemsets Mining ◽ Condensed Representations ◽ Maximal Frequent Itemsets ◽ Landmark Model

Maximal frequent itemsets are one of several condensed representations of frequent itemsets; they store most of the information contained in frequent itemsets using less space and are thus more suitable for stream mining. This paper focuses on approximately mining maximal frequent itemsets over a stream under the landmark model. A false-negative method based on the Chernoff bound is proposed to reduce computing and memory costs. Our experimental results on a real-world dataset show that our algorithm is effective and efficient.
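
As a rough illustration of how a probabilistic bound can drive pruning over a stream, the sketch below uses the additive Chernoff-Hoeffding form, in which the observed support after n transactions deviates from the true support by more than ε with probability at most δ when ε = sqrt(ln(2/δ) / (2n)). The exact bound and pruning rule used in the paper are not given in the abstract, so both functions and all parameter values are assumptions.

```python
import math


def error_bound(n: int, delta: float) -> float:
    """Additive Chernoff-Hoeffding error after n transactions, holding with
    probability at least 1 - delta (assumed form, not taken from the paper)."""
    return math.sqrt(math.log(2.0 / delta) / (2.0 * n))


def keep_candidate(observed_support: float, min_sup: float,
                   n: int, delta: float) -> bool:
    """Hypothetical pruning rule: keep an itemset in memory while its observed
    support is within the error bound of the minimum support."""
    return observed_support >= min_sup - error_bound(n, delta)


for n in (1_000, 10_000, 100_000):
    print(n, round(error_bound(n, delta=0.05), 4))
```

The bound shrinks as more of the stream is seen, so fewer borderline itemsets need to be kept in memory over time.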

FCILINK: Mining Frequent Closed Itemsets Based on a Link Structure between Transactions

Journal of Information & Knowledge Management ◽ 10.1142/s0219649205001213 ◽ 2005 ◽ Vol 04 (04) ◽ pp. 257-267

Author(s): Kyong Rok Han ◽ Jae Yearn Kim

Keyword(s): Association Rules ◽ Efficient Algorithm ◽ Frequent Itemsets ◽ Experimental Results ◽ Link Structure ◽ The Past ◽ Large Databases ◽ Closure Mechanism ◽ Closed Itemsets ◽ Significant Patterns

The problem of discovering association rules between items in a database is an emerging area of research. Its goal is to extract significant patterns or interesting rules from large databases. Recent studies of mining association rules have proposed a closure mechanism. It is no longer necessary to mine the set of all of the frequent itemsets and their association rules. Rather, it is sufficient to mine the frequent closed itemsets and their corresponding rules. In the past, a number of algorithms for mining frequent closed itemsets have been based on items. In this paper, we use the transaction itself for mining frequent closed itemsets. An efficient algorithm called FCILINK is proposed that is based on a link structure between transactions. A given database is scanned once and then a much smaller sub-database is scanned twice. Our experimental results show that our algorithm is faster than previously proposed methods. Furthermore, our approach is significantly more efficient for dense databases.
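
The closure mechanism can be illustrated with a brute-force sketch: the closure of an itemset is the intersection of all transactions that contain it, and an itemset is closed when it equals its own closure. This is not FCILINK, whose link structure between transactions the abstract does not describe. The function names and toy data are assumptions.

```python
from itertools import combinations


def closure(db, itemset):
    """Intersection of all transactions containing `itemset`.
    Returns the itemset itself when no transaction contains it."""
    rows = [t for t in db if itemset <= t]
    if not rows:
        return frozenset(itemset)
    return frozenset.intersection(*rows)


def closed_frequent_itemsets(db, min_sup):
    """Exhaustively list closed itemsets with support count >= min_sup."""
    items = sorted(set().union(*db))
    result = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in db if itemset <= t)
            if count >= min_sup and closure(db, itemset) == itemset:
                result[itemset] = count
    return result


db = [frozenset("abc"), frozenset("abd"), frozenset("ab"), frozenset("c")]
print(closure(db, frozenset("a")))
print(closed_frequent_itemsets(db, min_sup=2))
```

In this toy database, {a} and {b} have the same support as {a, b}, so only {a, b} needs to be reported; the supports of its subsets can be recovered from it.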

Analysis of the progressive sampling-based approach using real life datasets

Open Computer Science ◽ 10.2478/s13537-011-0016-y ◽ 2011 ◽ Vol 1 (2) ◽ Cited By ~ 1

Author(s): Venkatapathy Umarani ◽ Muthusamy Punithavalli

Keyword(s): Association Rules ◽ Association Rule ◽ Association Rule Mining ◽ Real Life ◽ Computation Time ◽ Frequent Itemsets ◽ Rule Mining ◽ Large Databases ◽ Very Large Databases ◽ Progressive Sampling

The discovery of association rules is an important and challenging data mining task. Most of the existing algorithms for finding association rules require multiple passes over the entire database, and the I/O overhead incurred is extremely high for very large databases. An obvious approach to reducing the complexity of association rule mining is sampling. In recent times, several sampling-based approaches have been developed for speeding up the process of association rule mining. A proficient progressive sampling-based approach is presented for mining association rules from large databases. At first, frequent itemsets are mined from an initial sample, and subsequently the negative border is computed from the mined frequent itemsets. Based on the support computed for the midpoint itemset in the sorted negative border, either the sample size is increased or association rules are mined from the sample. In this paper, we present an extensive analysis of the progressive sampling-based approach with different real-life datasets and, in addition, evaluate the performance of the approach against the well-known association rule mining algorithm Apriori. The experimental results show that the accuracy and computation time of the progressive sampling-based approach are effectively improved in mining association rules from the real-life datasets.
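
The negative border used in this approach is commonly defined as the set of itemsets that are not frequent themselves but whose immediate subsets are all frequent. A brute-force sketch of that definition on a small sample is shown below. It does not reproduce the paper's progressive sampling loop or its midpoint test, and the function names and toy sample are assumptions.

```python
from itertools import combinations


def frequent_itemsets(sample, min_count):
    """Exhaustively find itemsets with support count >= min_count."""
    items = sorted(set().union(*sample))
    freq = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = sum(1 for t in sample if itemset <= t)
            if count >= min_count:
                freq[itemset] = count
    return freq


def negative_border(sample, min_count):
    """Infrequent itemsets whose immediate subsets are all frequent.
    The empty set is treated as frequent, so infrequent single items qualify."""
    items = sorted(set().union(*sample))
    freq = frequent_itemsets(sample, min_count)
    border = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if itemset in freq:
                continue
            if all(len(itemset) == 1 or (itemset - {i}) in freq for i in itemset):
                border.add(itemset)
    return border


sample = [frozenset("ab"), frozenset("ab"), frozenset("ac"),
          frozenset("ac"), frozenset("b"), frozenset("c")]
print(negative_border(sample, min_count=2))
```

Itemsets in the negative border are the closest misses of the sample, which is why their supports are a natural signal for deciding whether the sample is large enough.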
