Efficient Algorithm for Mining High Utility Item Sets

doi:10.35940/ijeat.f1192.0886s219

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.f1192.0886s219 ◽

2019 ◽

Vol 8 (6S2) ◽

pp. 656-659

Keyword(s):

Data Mining ◽

Efficient Algorithm ◽

Tree Structure ◽

Utility Models ◽

High Utility ◽

Mining Capacity

Efficient discovery of frequent itemsets in large datasets is a crucial task for data mining. Various approaches for mining high utility patterns have been proposed in recent years, and they raise several issues. For instance, they generate a large number of candidate itemsets for high utility itemsets, which may degrade mining performance in terms of speed and space. Recently proposed compact tree structures, i.e., FP-Tree and UP-Tree, maintain information about transactions and itemsets, support the mining process, and avoid repeatedly scanning the original database. In this paper, a compact UP-Tree is used that scans the database only twice to obtain the candidate itemsets.
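
The first of the two database scans can be illustrated roughly as follows. The sketch computes the transaction-weighted utility (TWU) of each item and discards unpromising items before a tree would be built. It is a minimal Python sketch under stated assumptions, not the algorithm of this paper. The toy database, the `item -> utility` transaction encoding and the function names `twu_per_item` and `remove_unpromising` are hypothetical, and the tree construction of the second scan is omitted.

```python
from collections import defaultdict

# Hypothetical encoding: each transaction maps item -> utility in that
# transaction (purchase quantity already multiplied by unit profit).
Transaction = dict[str, int]

def transaction_utility(t: Transaction) -> int:
    """TU(T): sum of the utilities of all items in transaction T."""
    return sum(t.values())

def twu_per_item(db: list[Transaction]) -> dict[str, int]:
    """First scan: TWU(i) = sum of TU(T) over the transactions containing i."""
    twu: dict[str, int] = defaultdict(int)
    for t in db:
        tu = transaction_utility(t)
        for item in t:
            twu[item] += tu
    return dict(twu)

def remove_unpromising(db: list[Transaction], min_util: int):
    """Drop items whose TWU is below min_util. TWU over-estimates the utility
    of an item and of every itemset containing it, so such items cannot be
    part of any high utility itemset. The reorganized transactions are what
    a second scan would insert into the tree (not shown here)."""
    twu = twu_per_item(db)
    promising = {i for i, w in twu.items() if w >= min_util}
    reorganized = []
    for t in db:
        kept = {i: u for i, u in t.items() if i in promising}
        if kept:
            reorganized.append(kept)
    return promising, reorganized

db = [
    {"a": 5, "c": 1, "d": 2},
    {"a": 10, "c": 6, "e": 6},
    {"b": 4, "c": 3, "d": 3, "e": 6},
    {"b": 2, "c": 2, "e": 3},
]
promising, reorganized = remove_unpromising(db, min_util=25)
```

With `min_util = 25`, items `b` (TWU 23) and `d` (TWU 24) are dropped. Only `a`, `c` and `e` would then be inserted into the tree during the second scan.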

Efficient Algorithm for Mining High Utility Pattern Considering Length Constraints

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2019070101 ◽

2019 ◽

Vol 15 (3) ◽

pp. 1-27

Author(s):

Kuldeep Singh ◽

Bhaskar Biswas

Keyword(s):

Data Mining ◽

Upper Bound ◽

Efficient Algorithm ◽

State Of The Art ◽

Memory Usage ◽

Important Data ◽

Benchmark Datasets ◽

Projection Techniques ◽

High Utility ◽

High Utility Itemsets

High utility itemset (HUI) mining is a popular and important data mining task. Several studies have addressed this topic. However, HUI mining often discovers a very large number of itemsets and rules, which reduces both the efficiency and the effectiveness of the mining process. Constraint-based mining plays an important role in increasing efficiency and discovering more interesting HUIs. To address this issue, the authors propose EHIL (Efficient High utility Itemsets with Length constraints), an algorithm that discovers HUIs under length constraints and reduces the number of HUIs by removing tiny itemsets. EHIL adopts two upper bounds, named sub-tree utility and local utility, for pruning, and modifies them by incorporating length constraints. To reduce the number of dataset scans, the proposed algorithm uses transaction merging and dataset projection techniques. Execution time improvements range from a modest five percent to two orders of magnitude across benchmark datasets. Memory usage is up to twenty-eight times lower than that of the state-of-the-art algorithm FHM+.
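
A brute-force reference miner is enough to check the effect of a length constraint and of transaction merging on toy data. This is a hedged Python sketch, not EHIL: it enumerates every itemset instead of pruning with sub-tree or local utility bounds. The names `merge_transactions` and `huis_with_length` and the sample database are hypothetical. Merging is safe here because the utility of an itemset is a sum over transactions. Adding up the item utilities of transactions that contain exactly the same items therefore leaves every itemset's utility unchanged.

```python
from collections import defaultdict
from itertools import combinations

Transaction = dict[str, int]  # hypothetical encoding: item -> utility

def merge_transactions(db: list[Transaction]) -> list[Transaction]:
    """Replace transactions containing exactly the same items by a single
    transaction whose per-item utilities are summed."""
    merged: dict[frozenset, dict[str, int]] = {}
    for t in db:
        acc = merged.setdefault(frozenset(t), defaultdict(int))
        for item, u in t.items():
            acc[item] += u
    return [dict(t) for t in merged.values()]

def utility(itemset, db: list[Transaction]) -> int:
    """u(X): sum of u(X, T) over the transactions T that contain all of X."""
    return sum(
        sum(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )

def huis_with_length(db, min_util, min_len=1, max_len=None):
    """Exhaustively enumerate itemsets whose length lies in
    [min_len, max_len] and keep those with utility >= min_util."""
    items = sorted({i for t in db for i in t})
    max_len = len(items) if max_len is None else max_len
    result = {}
    for k in range(min_len, max_len + 1):
        for itemset in combinations(items, k):
            u = utility(itemset, db)
            if u >= min_util:
                result[itemset] = u
    return result

db = [{"a": 5, "c": 1}, {"a": 10, "c": 6, "e": 6}, {"a": 4, "c": 2}, {"c": 3, "e": 6}]
merged = merge_transactions(db)  # the two {a, c} transactions become one
```

On this data, `huis_with_length(db, 20, 2, 2)` keeps `{a, c}` (28) and `{c, e}` (21). The maximum length of 2 excludes `{a, c, e}` (22), even though its utility exceeds the threshold.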

Proposed Design for Data Retrieval using Efficient Algorithm

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.a2043.109119 ◽

2019 ◽

Vol 9 (1) ◽

pp. 6331-6335

Keyword(s):

Data Mining ◽

Efficient Algorithm ◽

Hybrid Algorithm ◽

Accurate Result ◽

Data Retrieval ◽

Transaction Data ◽

Customer Purchases ◽

Practical Analysis ◽

High Utility ◽

Very High

Data mining is used to find patterns in large amounts of raw data. These patterns are then analyzed to extract useful information. Data mining has many branches, and one of the most interesting is frequent itemset mining (FIM). FIM finds items that customers frequently buy together. For example, a customer who purchases a mobile phone also tends to purchase a phone cover, earphones, etc. However, such patterns are not always useful to all stakeholders, because they do not take into account the profit obtained from a sale, i.e., the utility of a product. To overcome this problem, high utility itemset mining (HUIM) was introduced. HUIM is used to find the utility, or profit, obtained from the items in transaction data. Among the various HUIM algorithms, TKU (Top-K Utility) and TKO (Top-K in One phase) are two well-known ones. A detailed study and practical analysis of these two algorithms shows that each has certain drawbacks. TKO executes in very little time but gives incorrect output, whereas TKU gives accurate results but its execution time is very high. Hence, to improve the performance of these two HUIM algorithms, this paper proposes a hybrid algorithm that combines TKO with TKU. The combined algorithm gives accurate results and also executes in considerably less time.
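
A small example makes the difference between frequency and utility concrete. The unit profits and purchase quantities below are invented for illustration. The code computes the standard definitions by brute force: support, and utility as quantity times unit profit. It is not TKU or TKO.

```python
from itertools import combinations

# Hypothetical unit profits and per-transaction purchase quantities.
profit = {"phone": 120, "cover": 2, "earphones": 8}
db = [
    {"phone": 1, "cover": 1},
    {"cover": 2, "earphones": 1},
    {"cover": 1, "earphones": 2},
    {"phone": 1, "earphones": 1},
    {"cover": 3, "earphones": 1},
]

def support(itemset, db):
    """FIM measure: number of transactions containing every item of the set."""
    return sum(1 for t in db if all(i in t for i in itemset))

def utility(itemset, db, profit):
    """HUIM measure: sum over transactions T containing X of
    sum_{i in X} quantity(i, T) * profit(i)."""
    return sum(
        sum(t[i] * profit[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )

pairs = list(combinations(sorted(profit), 2))
most_frequent = max(pairs, key=lambda p: support(p, db))
most_profitable = max(pairs, key=lambda p: utility(p, db, profit))
```

Here the most frequent pair, cover with earphones (support 3, utility 44), is not the most profitable one. Phone with earphones occurs only once but has utility 128.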

Multi-GPU approach to global induction of classification trees for large-scale data mining

Applied Intelligence ◽

10.1007/s10489-020-01952-5 ◽

2021 ◽

Author(s):

Krzysztof Jurczuk ◽

Marcin Czajkowski ◽

Marek Kretowski

Keyword(s):

Data Mining ◽

Large Scale ◽

Real Life ◽

Population Based ◽

Tree Structure ◽

Global Approach ◽

Data Parallel ◽

Large Scale Data ◽

The Impact ◽

Scale Data

This paper concerns the evolutionary induction of decision trees (DT) for large-scale data. Such a global approach is one of the alternatives to top-down inducers. It searches for the tree structure and the tests simultaneously, and thus in many situations improves prediction and reduces the size of the resulting classifiers. However, it is a population-based, iterative approach that can be too computationally demanding to apply directly to big data mining. The paper demonstrates that this barrier can be overcome by smart distributed/parallel processing. Moreover, we ask whether the global approach can truly compete with greedy systems on large-scale data. For this purpose, we propose a novel multi-GPU approach. It combines knowledge of global DT induction and evolutionary algorithm parallelization with efficient use of GPU memory and computing resources. The searches for the tree structure and tests are performed simultaneously on a CPU, while the fitness calculations are delegated to GPUs. A data-parallel decomposition strategy and the CUDA framework are applied. Experimental validation is performed on both artificial and real-life datasets. In both cases, the obtained acceleration is very satisfactory. The solution is able to process even billions of instances in a few hours on a single workstation equipped with 4 GPUs. The impact of data characteristics (size and dimension) on the convergence and speedup of the evolutionary search is also shown. As the number of GPUs grows, nearly linear scalability is observed, which suggests that data size boundaries for evolutionary DT mining are fading.
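
The data-parallel decomposition of fitness evaluation can be sketched in Python, with threads standing in for GPUs. This is only a hedged illustration under assumptions. The tuple-based tree encoding and the function names are hypothetical. Plain misclassification count is used as the fitness because the abstract does not give the fitness function. CPython threads will not speed up this CPU-bound loop, so the sketch shows only the decomposition: every worker evaluates the same candidate tree on its own slice of the data, and the partial error counts are summed.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree encoding: a leaf is ("leaf", label); an internal node is
# ("split", feature_index, threshold, left_subtree, right_subtree).

def predict(tree, x):
    """Route one instance from the root down to a leaf and return its label."""
    while tree[0] == "split":
        _, feature, threshold, left, right = tree
        tree = left if x[feature] <= threshold else right
    return tree[1]

def chunk_errors(tree, X, y):
    """Work assigned to one device: count misclassified instances in its chunk."""
    return sum(1 for xi, yi in zip(X, y) if predict(tree, xi) != yi)

def fitness_data_parallel(tree, X, y, n_workers=4):
    """Split the data into contiguous chunks, evaluate the same candidate tree
    on every chunk concurrently, then reduce by summing the partial counts."""
    if not X:
        return 0
    size = -(-len(X) // n_workers)  # ceiling division
    chunks = [(X[i:i + size], y[i:i + size]) for i in range(0, len(X), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return sum(pool.map(lambda c: chunk_errors(tree, c[0], c[1]), chunks))

tree = ("split", 0, 0.5, ("leaf", 0), ("leaf", 1))
X = [[0.1], [0.4], [0.6], [0.9], [0.3], [0.7]]
y = [0, 1, 1, 1, 0, 0]
errors = fitness_data_parallel(tree, X, y)
```

The per-chunk error counts are simply added, so the result equals a sequential evaluation over the whole dataset. Only the final reduction has to combine results across devices.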

A Scalable Approach for Data Mining – AHUIM

Webology ◽

10.14704/web/v18i1/web18029 ◽

2021 ◽

Vol 18 (1) ◽

pp. 92-103

Author(s):

Vandna Dahiya ◽

Sandeep Dalal

Keyword(s):

Data Mining ◽

Research Paper ◽

Large Datasets ◽

Novel Technique ◽

Itemset Mining ◽

Essential Form ◽

High Utility

Utility itemset mining, which finds itemsets based on utility factors, has established itself as an essential form of data mining. The utility is defined in terms of quantity and an interest factor. Researchers have developed various methods to mine these itemsets, but most of them are not scalable. A scalable approach is now required to meet the growing needs of data mining. This research paper proposes a novel Spark-based technique for mining data in a distributed way, called Absolute High Utility Itemset Mining (AHUIM). The technique is suitable for small as well as large datasets. Its performance is measured in terms of parameters such as speed, scalability, and accuracy.
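
The utility definition above (quantity times an interest factor) fits naturally into a map/reduce style of distributed computation. The sketch below is a hedged, Spark-free Python illustration, not AHUIM. Partitions are plain lists and the interest factors are hypothetical. Each partition computes local item utilities, and the partial results are then merged.

```python
from collections import Counter
from functools import reduce

# Hypothetical interest factor (e.g. unit profit) for each item.
interest = {"a": 3, "b": 1, "c": 5}

def partition_item_utility(partition):
    """'Map' side: local utility of each item within one partition,
    where u(i, T) = quantity(i, T) * interest(i)."""
    local = Counter()
    for t in partition:
        for item, qty in t.items():
            local[item] += qty * interest[item]
    return local

def merge(acc, local):
    """'Reduce' side: combine partial results. Counter.update adds counts
    and, unlike the + operator, keeps non-positive totals."""
    acc.update(local)
    return acc

# Two hypothetical partitions of a transaction database (item -> quantity).
partitions = [
    [{"a": 2, "c": 1}, {"b": 4}],
    [{"a": 1, "b": 2, "c": 3}],
]
total = reduce(merge, map(partition_item_utility, partitions), Counter())
```

In PySpark the same pattern would typically be written with RDD operations such as `flatMap` and `reduceByKey`. How AHUIM actually distributes its work is not described in this abstract.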

An efficient algorithm for Hiding High Utility Sequential Patterns

International Journal of Approximate Reasoning ◽

10.1016/j.ijar.2018.01.005 ◽

2018 ◽

Vol 95 ◽

pp. 77-92 ◽

Cited By ~ 10

Author(s):

Bac Le ◽

Duy-Tai Dinh ◽

Van-Nam Huynh ◽

Quang-Minh Nguyen ◽

Philippe Fournier-Viger

Keyword(s):

Efficient Algorithm ◽

Sequential Patterns ◽

High Utility

An Efficient Algorithm for High Utility Sequential Pattern Mining

Lecture Notes in Electrical Engineering - Frontier and Innovation in Future Computing and Communications ◽

10.1007/978-94-017-8798-7_7 ◽

2014 ◽

pp. 49-56 ◽

Cited By ~ 4

Author(s):

Jun-Zhe Wang ◽

Zong-Hua Yang ◽

Jiun-Long Huang

Keyword(s):

Efficient Algorithm ◽

Pattern Mining ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

High Utility

Mining of top-k high utility itemsets with negative utility

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-201357 ◽

2020 ◽

pp. 1-16

Author(s):

Rui Sun ◽

Meng Han ◽

Chunyan Zhang ◽

Mingyao Shen ◽

Shiyu Du

Keyword(s):

Data Mining ◽

Search Space ◽

Experimental Results ◽

Effective Algorithm ◽

Memory Usage ◽

Utility Value ◽

Itemset Mining ◽

High Utility ◽

High Utility Itemsets

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, setting the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although top-k HUIM methods are common, they can only mine itemsets with positive items, and itemsets are missed when they are applied to data with negative items. To solve this problem, we propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). THN uses a strategy that automatically raises the minimum utility threshold. To avoid multiple database scans, it uses transaction merging and dataset projection techniques. It prunes the search space with redefined sub-tree utility and local utility values. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
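
A brute-force depth-first search is enough to sketch threshold raising for top-k mining with negative utilities. This is a hedged Python sketch under assumptions, not THN. Instead of THN's redefined sub-tree and local utilities, it prunes with a transaction-weighted utility that counts only positive item utilities; that bound is an assumption here. The names are hypothetical, and no transaction merging or projection is performed. The minimum utility starts at negative infinity. Once k itemsets have been collected, it is raised to the k-th best utility found so far.

```python
import heapq

Transaction = dict[str, int]  # hypothetical: item -> utility, negative for loss-making items

def contains(t, itemset):
    return all(i in t for i in itemset)

def utility(itemset, db):
    """Exact u(X), negative item utilities included."""
    return sum(sum(t[i] for i in itemset) for t in db if contains(t, itemset))

def positive_twu(itemset, db):
    """Sum of the positive item utilities of every transaction containing X.
    Negative items can only lower u(X), so this bounds u(X) and the utility
    of every superset of X (assumed pruning bound, not THN's)."""
    return sum(
        sum(u for u in t.values() if u > 0)
        for t in db
        if contains(t, itemset)
    )

def top_k_hui(db, k):
    items = sorted({i for t in db for i in t})
    heap = []                 # min-heap of (utility, itemset); heap[0] is the k-th best
    min_util = float("-inf")  # raised automatically as better itemsets are found

    def search(prefix, start):
        nonlocal min_util
        for j in range(start, len(items)):
            x = prefix + (items[j],)
            if positive_twu(x, db) < min_util:
                continue  # neither x nor any extension of it can enter the top-k
            u = utility(x, db)
            if len(heap) < k:
                heapq.heappush(heap, (u, x))
            elif u > heap[0][0]:
                heapq.heapreplace(heap, (u, x))
            if len(heap) == k:
                min_util = heap[0][0]  # raise the threshold
            search(x, j + 1)

    search((), 0)
    return sorted(heap, reverse=True)

db = [
    {"a": 5, "b": -2, "c": 4},
    {"a": 3, "c": 6},
    {"b": -1, "c": 2, "d": 7},
    {"a": 2, "d": 4},
]
```

On this data, `top_k_hui(db, 3)` returns `{a, c}` (18), `{c}` (12) and `{d}` (11). Item `b` has only negative utilities, yet itemsets containing it, such as `{b, c, d}` (8), are still scored correctly because the exact utility keeps negative values.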
