Efficient Algorithm for Mining High Utility Item Sets

Efficient discovery of frequent itemsets in large datasets is a key challenge in data mining. Various approaches for generating high utility patterns have been proposed in recent years, and they raise several issues, for instance the generation of a larger-than-usual number of candidate itemsets for high utility itemsets, which degrades mining performance in terms of speed and space. The compact tree structures that have recently been proposed, i.e., FP-Tree and UP-Tree, hold information on transactions and itemsets, improve mining performance, and avoid repeatedly scanning the original data. This paper presents an improved UP-Tree that scans the data only twice to obtain the candidates.

2019 ◽  
Vol 15 (3) ◽  
pp. 1-27
Author(s):  
Kuldeep Singh ◽  
Bhaskar Biswas

High utility itemset (HUI) mining is one of the popular and important data mining tasks. Several studies have been carried out on this topic, but they often discover a very large number of itemsets and rules, which reduces not only the efficiency but also the effectiveness of HUI mining. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose an algorithm named EHIL (Efficient High utility Itemsets with Length constraints), which discovers HUIs under length constraints and decreases the number of HUIs by removing tiny itemsets. EHIL adopts two new upper bounds, named sub-tree utility and local utility, for pruning and modifies them by incorporating length constraints. To reduce the number of dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. The execution time improvements range from a modest five percent to two orders of magnitude across benchmark datasets. Memory usage is up to twenty-eight times lower than that of the state-of-the-art algorithm FHM+.
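To illustrate how a length constraint can tighten a pruning bound, here is a minimal Python sketch. It does not reproduce EHIL's sub-tree or local utility bounds. It uses a simpler, related idea as an assumption: if an itemset may contain at most `max_length` items, the most utility it can draw from a transaction is the sum of that transaction's `max_length` largest item utilities. The toy transactions, the function names, and the threshold value are all hypothetical, and item utilities are assumed to be non-negative.

```python
import heapq

# Hypothetical toy data: each transaction maps item -> utility of that item
# in the transaction (quantity * unit profit, assumed non-negative).
transactions = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6},
    {"b": 4, "c": 3, "d": 9},
]


def length_bounded_utility(transaction, max_length):
    """Largest utility an itemset of at most max_length items can get here."""
    return sum(heapq.nlargest(max_length, transaction.values()))


def length_bounded_twu(transactions, max_length):
    """Per-item upper bound: sum of the length-bounded utilities of the
    transactions that contain the item."""
    bound = {}
    for t in transactions:
        tu = length_bounded_utility(t, max_length)
        for item in t:
            bound[item] = bound.get(item, 0) + tu
    return bound


def promising_items(transactions, max_length, min_util):
    """Items whose upper bound still reaches min_util; the rest can be pruned."""
    bound = length_bounded_twu(transactions, max_length)
    return sorted(item for item, b in bound.items() if b >= min_util)


if __name__ == "__main__":
    print(promising_items(transactions, max_length=4, min_util=30))
    print(promising_items(transactions, max_length=2, min_util=30))
```

With no effective length limit (`max_length=4`), items `a`, `b`, and `c` survive a threshold of 30. With `max_length=2`, the bound for `b` drops to 28, so `b` is pruned before any itemset containing it is explored.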


Data mining is used to find patterns in large amounts of raw data. These patterns are then analyzed to extract useful information. Data mining has many branches, and one of the most interesting is frequent itemset mining (FIM). FIM deals with finding items that are frequently bought together by customers. For example, a customer who purchases a mobile phone also tends to purchase a phone cover, earphones, and similar accessories along with it. Such patterns are not always useful to all stakeholders, however, because they do not take into account the profit obtained from a sale, i.e., the utility of a product. To overcome this problem, the concept of high utility itemset mining (HUIM) was introduced. HUIM is used to find the utility, or profit, obtained from the items in transaction data. There are various algorithms for HUIM; TKU (Top-K Utility) and TKO (Top-K in One phase) are two well-known ones. A detailed study and practical analysis of these two algorithms shows that each has certain drawbacks. The TKO algorithm executes in very little time but gives incorrect output, whereas the TKU algorithm gives accurate results when applied to a database but has a very high execution time. Hence, to enhance the performance of these two HUIM algorithms, a hybrid algorithm that combines TKO with TKU is proposed in this paper. The combined algorithm gives accurate results and executes in considerably less time.
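The notion of utility described above can be made concrete with a small Python sketch. It is a brute-force illustration only, not the TKU, TKO, or hybrid algorithm: it enumerates every itemset, computes its utility as quantity times unit profit summed over the transactions that contain the whole itemset, and keeps the k best. The transactions, profit values, and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical toy data: each transaction maps item -> purchased quantity.
transactions = [
    {"phone": 1, "cover": 2, "earphones": 1},
    {"phone": 1, "cover": 1},
    {"cover": 3, "earphones": 2},
]
# Hypothetical unit profit per item.
profit = {"phone": 50, "cover": 2, "earphones": 5}


def utility(itemset, transactions, profit):
    """Total profit of itemset over the transactions containing all its items."""
    total = 0
    for t in transactions:
        if all(item in t for item in itemset):
            total += sum(t[item] * profit[item] for item in itemset)
    return total


def top_k_itemsets(transactions, profit, k):
    """Exhaustively score every itemset and return the k with highest utility."""
    items = sorted(profit)
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u > 0:
                scored.append((u, itemset))
    # Highest utility first; ties broken by itemset for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


if __name__ == "__main__":
    for u, itemset in top_k_itemsets(transactions, profit, k=3):
        print(u, itemset)
```

In this toy data the pair `("cover", "phone")` has utility 106, higher than `phone` alone at 100, even though `cover` is the most frequent item and has low utility on its own. Frequency and utility rank itemsets differently, which is the motivation for HUIM.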


Author(s):  
Krzysztof Jurczuk ◽  
Marcin Czajkowski ◽  
Marek Kretowski

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously and thus improves the prediction and size of the resulting classifiers in many situations. However, it is a population-based and iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient utilization of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. When the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.


Webology ◽  
2021 ◽  
Vol 18 (1) ◽  
pp. 92-103
Author(s):  
Vandna Dahiya ◽  
Sandeep Dalal

Utility itemset mining, which finds itemsets based on utility factors, has established itself as an essential form of data mining. Utility is defined in terms of quantity and some interest factor. Researchers have developed various methods to mine these itemsets, but most of them are not scalable. A scalable approach is now required to meet the growing needs of data mining. This paper proposes a novel Spark-based technique, called Absolute High Utility Itemset Mining (AHUIM), for mining the data in a distributed way. The technique is suitable for small as well as large datasets. Its performance is measured on parameters such as speed, scalability, and accuracy.


2018 ◽  
Vol 95 ◽  
pp. 77-92 ◽  
Author(s):  
Bac Le ◽  
Duy-Tai Dinh ◽  
Van-Nam Huynh ◽  
Quang-Minh Nguyen ◽  
Philippe Fournier-Viger

2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, it can only mine itemsets with positive items, and itemsets are missed when it is applied to data with negative items. To solve this problem, we propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It uses a strategy for automatically increasing the minimum utility threshold. To avoid multiple scans of the database, it uses transaction merging and dataset projection. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
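The idea of automatically raising the minimum utility threshold can be sketched in a few lines of Python. This is not the THN algorithm: it enumerates itemsets exhaustively and omits transaction merging, dataset projection, and the redefined upper bounds. It only shows one common way to raise a threshold during top-k mining, assumed here for illustration: keep the k best itemsets found so far in a min-heap and set the threshold to the smallest utility in that heap once it is full. The data (including the negative unit profit for item `b`) and all names are hypothetical.

```python
import heapq
from itertools import combinations

# Hypothetical toy data: item -> purchased quantity per transaction.
transactions = [
    {"a": 2, "b": 1, "c": 1},
    {"a": 1, "c": 2},
    {"b": 3, "c": 1},
]
# Hypothetical unit profits; item "b" is sold at a loss (negative utility).
profit = {"a": 5, "b": -2, "c": 3}


def utility(itemset, transactions, profit):
    """Total profit of itemset over the transactions containing all its items."""
    return sum(
        sum(t[item] * profit[item] for item in itemset)
        for t in transactions
        if all(item in t for item in itemset)
    )


def top_k_with_threshold_raising(transactions, profit, k):
    """Return (top-k itemsets, final threshold) using a min-heap of size k."""
    min_util = 0   # internal threshold, raised as better itemsets are found
    heap = []      # min-heap of (utility, itemset); heap[0] is the weakest
    items = sorted(profit)
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            u = utility(itemset, transactions, profit)
            if u <= min_util:
                continue  # cannot enter the current top-k
            if len(heap) < k:
                heapq.heappush(heap, (u, itemset))
            else:
                heapq.heapreplace(heap, (u, itemset))
            if len(heap) == k:
                min_util = heap[0][0]  # raise threshold to the k-th best utility
    return sorted(heap, reverse=True), min_util


if __name__ == "__main__":
    results, threshold = top_k_with_threshold_raising(transactions, profit, k=2)
    print(results, threshold)
```

In this toy data, adding the negative item `b` to `("a", "c")` lowers the utility from 24 to 11. Bounds that assume utility contributions are non-negative no longer hold in this setting, which is why the abstract describes redefined upper bounds.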


2016 ◽  
Vol 111 ◽  
pp. 283-298 ◽  
Author(s):  
Jerry Chun-Wei Lin ◽  
Philippe Fournier-Viger ◽  
Wensheng Gan

2020 ◽  
Vol 24 (4) ◽  
pp. 831-845
Author(s):  
Vy Huynh Trieu ◽  
Hai Le Quoc ◽  
Chau Truong Ngoc
