A Scalable Approach for Data Mining – AHUIM

Webology ◽  
2021 ◽  
Vol 18 (1) ◽  
pp. 92-103
Author(s):  
Vandna Dahiya ◽  
Sandeep Dalal

Utility itemset mining, which finds itemsets based on utility factors, has established itself as an essential form of data mining. Utility is defined in terms of quantity and some interest factor. Researchers have developed various methods to mine these itemsets, but most of them do not scale. A scalable approach is now required to meet the growing needs of data mining. This paper recommends a novel Spark-based technique, called Absolute High Utility Itemset Mining (AHUIM), for mining data in a distributed way. The technique is suitable for small as well as large datasets, and its performance is measured on parameters such as speed, scalability, and accuracy.
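The abstract includes no code; the following is a minimal PySpark sketch, in the spirit of a Spark-based approach such as AHUIM, of the distributed first phase of utility mining: summing each item's utility across partitioned transactions and pruning items below a minimum utility. The transaction layout, threshold, and names are illustrative assumptions, not the paper's actual design.

# Minimal PySpark sketch of distributed per-item utility counting.
# The data layout and min_utility value are illustrative assumptions.
from pyspark import SparkContext

sc = SparkContext("local[*]", "utility-mining-sketch")

# Each transaction maps an item to its utility (quantity x unit interest).
transactions = sc.parallelize([
    {"a": 5, "b": 2, "c": 1},
    {"a": 3, "c": 6},
    {"b": 4, "c": 2, "d": 8},
])

min_utility = 5

item_utilities = (
    transactions
    .flatMap(lambda t: t.items())     # emit (item, utility) pairs
    .reduceByKey(lambda x, y: x + y)  # total utility per item across partitions
)

high_utility_items = item_utilities.filter(lambda kv: kv[1] >= min_utility)
print(high_utility_items.collect())   # e.g. [('a', 8), ('c', 9), ('b', 6), ('d', 8)]
sc.stop()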

2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is a persistent challenge when mining high utility itemsets (HUIs) that contain negative items. Although the top-k HUIM approach is common, existing methods mine only itemsets with positive items, and itemsets are missed when negative items are involved. To solve this problem, we propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN introduces a strategy for automatically raising the minimum utility threshold, uses transaction merging and dataset projection to avoid multiple scans of the database, and prunes the search space with redefined sub-tree utility and local utility values. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and has excellent scalability; moreover, it performs particularly well on dense datasets.
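As a rough illustration of the automatic threshold-raising idea the abstract describes (a hedged sketch, not the authors' THN implementation): keep the k largest utilities seen so far in a min-heap; the heap's root is the current minimum utility threshold, and it only rises as better itemsets are found.

# Sketch of top-k threshold raising with a min-heap; names are illustrative.
import heapq

def raise_threshold(utilities, k):
    """Return the running minimum utility threshold after scanning candidates."""
    heap = []  # min-heap holding the k largest utilities seen so far
    for u in utilities:
        if len(heap) < k:
            heapq.heappush(heap, u)
        elif u > heap[0]:
            heapq.heapreplace(heap, u)  # evict the smallest of the current top k
    # With fewer than k candidates the threshold stays at its floor (0).
    return heap[0] if len(heap) == k else 0

# Candidate itemset utilities; negative items can drag a utility below zero.
print(raise_threshold([12, -3, 7, 25, 4, 9], k=3))  # -> 9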


2020 ◽  
Vol 32 ◽  
pp. 03012
Author(s):  
Harshal Bhope ◽  
Yash Mahajan ◽  
Swapnil Deore ◽  
Vimla Jethani

High utility itemset (HUI) mining is an evolving field in data mining that centers on finding all itemsets whose utility meets a user-specified minimum utility. Setting the minimum utility exactly is difficult for users: if it is set too low, too many itemsets are generated and the mining process becomes quite inefficient; if it is set too high, no itemsets are found. This research focuses on generating high utility itemsets using the transaction-weighted utility of each product. The UP-Growth methodology for discovering high utility items from large datasets takes more time and consumes more memory, which makes it less efficient. To overcome these drawbacks, we use a Top-K algorithm, which does not require a minimum threshold and makes mining more scalable and efficient.
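The transaction-weighted utility (TWU) measure the abstract relies on can be sketched briefly; this minimal Python example (with an illustrative data layout, not the paper's code) credits every item with the full utility of each transaction containing it, yielding an anti-monotone upper bound that is safe for pruning.

# Sketch of transaction-weighted utility (TWU) computation.
from collections import defaultdict

transactions = [
    {"a": 5, "b": 2, "c": 1},
    {"a": 3, "c": 6},
    {"b": 4, "c": 2, "d": 8},
]

twu = defaultdict(int)
for t in transactions:
    tu = sum(t.values())   # total utility of the whole transaction
    for item in t:
        twu[item] += tu    # each member item is credited with the full tu

print(dict(twu))  # -> {'a': 17, 'b': 22, 'c': 31, 'd': 14}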


2021 ◽  
pp. 107422
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Unil Yun ◽  
Philippe Fournier-Viger

Author(s):  
Amit Verma ◽  
Siddharth Dawar ◽  
Raman Kumar ◽  
Shamkant Navathe ◽  
Vikram Goyal

Author(s):  
Jimmy Ming-Tai Wu ◽  
Qian Teng ◽  
Shahab Tayeb ◽  
Jerry Chun-Wei Lin

High average-utility itemset mining (HAUIM) was established to provide a fairer measure than generic high-utility itemset mining (HUIM) for revealing satisfying and interesting patterns. In practical applications, databases change dynamically as insertion and deletion operations are performed. Several works were designed to handle the insertion process, but fewer studies focus on the deletion process for knowledge maintenance. In this paper, we develop a PRE-HAUI-DEL algorithm that applies the pre-large concept to HAUIM to handle transaction deletion in dynamic databases. The pre-large concept serves as a buffer on HAUIM that reduces the number of database scans when the database is updated, particularly under transaction deletion. Two upper-bound values are also established to prune unpromising candidates early, which reduces the computational cost. Experimental results show that the designed PRE-HAUI-DEL algorithm performs well compared to an Apriori-like model in terms of runtime, memory, and scalability on dynamic databases.
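A minimal sketch, under assumed names and thresholds rather than the authors' PRE-HAUI-DEL code, of the two ideas the abstract combines: the average-utility measure, which divides an itemset's utility by its length, and the pre-large classification against an upper and a lower threshold, which buffers patterns so that a small batch of deletions does not force a full database rescan.

# Sketch of the average-utility measure and pre-large classification.

def average_utility(utility, itemset_size):
    # HAUIM divides an itemset's utility by its length for a fairer measure.
    return utility / itemset_size

def classify(avg_utility, upper, lower):
    """Pre-large classification against two thresholds (values assumed)."""
    if avg_utility >= upper:
        return "large"      # a high average-utility pattern
    if avg_utility >= lower:
        return "pre-large"  # buffered: may become large/small after updates
    return "small"          # safely ignored until a threshold is crossed

print(classify(average_utility(18, 3), upper=5.0, lower=3.0))  # -> large
print(classify(average_utility(10, 3), upper=5.0, lower=3.0))  # -> pre-large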

