A Survey of High Utility Pattern Mining Algorithms for Big Data

Author(s):  
Morteza Zihayat ◽  
Mehdi Kargar ◽  
Jaroslaw Szlichta
Author(s):  
Logeswaran K. ◽  
Suresh P. ◽  
Savitha S. ◽  
Prasanna Kumar K. R.

In recent years, data analysts have faced many challenges in mining high utility itemsets (HUIs) from transactional databases with existing traditional techniques. The main challenges in utility mining are the exponentially growing search space and the difficulty of setting a minimum utility threshold appropriate to the given database. To overcome these challenges, evolutionary algorithm-based techniques can be used to mine HUIs from a transactional database. However, testing each of the supporting fitness functions in the optimization problem is very inefficient and increases the time complexity of the algorithm. To overcome this drawback, a reinforcement learning-based approach is proposed to improve the efficiency of the algorithm: the most appropriate fitness function for evaluation is selected automatically while the algorithm executes. Furthermore, when distinct functions perform well at different stages of the optimization process, the currently optimal function is selected dynamically.
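The abstract leaves the selection mechanism unspecified; the sketch below is one plausible reading, assuming an epsilon-greedy multi-armed bandit over the candidate fitness functions, where each function is rewarded by how much it improves the best utility found per generation. All names (`FitnessSelector`, the stand-in generation step) are illustrative, not from the paper.

```python
import random

# Hypothetical sketch: treat each candidate fitness function as an arm
# of an epsilon-greedy bandit; the arm's reward is the improvement in
# the best utility found after one generation evaluated with it.
class FitnessSelector:
    def __init__(self, num_fns, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * num_fns
        self.values = [0.0] * num_fns  # running mean reward per function

    def select(self):
        # explore with probability epsilon, otherwise exploit the best mean
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, fn_index, reward):
        # incremental mean: v_n = v_{n-1} + (r_n - v_{n-1}) / n
        self.counts[fn_index] += 1
        self.values[fn_index] += (reward - self.values[fn_index]) / self.counts[fn_index]

# Usage inside a (hypothetical) evolutionary HUI miner: pick a fitness
# function per generation and reward it by the utility gain it produced.
selector = FitnessSelector(num_fns=3)
best_utility = 0.0
for generation in range(100):
    fn_index = selector.select()
    new_best = best_utility + random.random()  # stand-in for one GA generation
    selector.update(fn_index, new_best - best_utility)
    best_utility = new_best
```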


Author(s):  
Alind Khare ◽  
Vikram Goyal ◽  
Srikanth Baride ◽  
Sushil K. Prasad ◽  
Michael McDermott ◽  
...  

Author(s):  
Sudhir Tirumalasetty ◽  
A. Divya ◽  
D. Rahitya Lakshmi ◽  
Ch. Durga Bhavani ◽  
D. Anusha

Frequent pattern mining is an essential data-mining task whose goal is to discover knowledge in the form of repeated patterns. Many efficient pattern-mining algorithms have been proposed in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called "Big Data". Scalable parallel algorithms hold the key to solving the problem in this context. This paper reviews recent advances in parallel frequent pattern mining, analysing them through the Big Data lens. Load balancing and work partitioning are the major challenges to be conquered, and they continually call for innovative methods as Big Data grows without bound. An even bigger challenge is mining frequent patterns from unstructured data; to accomplish this, a Semi-Structured Doc-Model and ranking of patterns are used.
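As a concrete illustration of the work-partitioning idea the survey discusses (a minimal sketch with illustrative names, not any specific algorithm from the paper): split the transaction database across workers, count candidate itemsets locally, then merge the partial counts.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations

def count_pairs(transactions):
    """Count all 2-itemsets in one partition of the database."""
    counts = Counter()
    for tx in transactions:
        for pair in combinations(sorted(set(tx)), 2):
            counts[pair] += 1
    return counts

def parallel_frequent_pairs(transactions, min_support, workers=4):
    # Work partitioning: deal transactions round-robin across workers;
    # load balance is only as good as the evenness of the partitions.
    chunks = [transactions[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_pairs, chunks))
    total = Counter()
    for partial in partials:   # merge step: sum the local counts
        total.update(partial)
    return {p: c for p, c in total.items() if c >= min_support}

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c"]]
    print(parallel_frequent_pairs(db, min_support=2, workers=2))
    # {('a', 'b'): 3, ('a', 'c'): 2, ('b', 'c'): 3}
```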


2021 ◽  
pp. 1-26
Author(s):  
Haodong Cheng ◽  
Meng Han ◽  
Ni Zhang ◽  
Xiaojuan Li ◽  
Le Wang

Traditional association rule mining has been widely studied, but it is not applicable to practical applications that must consider factors such as the unit profit of an item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated by inserting new data dynamically. Some researchers have therefore proposed algorithms for finding high-utility itemsets in dynamically updated databases. Unlike batch algorithms, which always process the database from scratch, incremental HUIM algorithms update and output high-utility itemsets incrementally, thereby reducing the cost of finding them. This paper surveys the latest research on incremental high-utility itemset mining algorithms, including methods that store itemsets and utilities in tree, list, array, and hash set structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.
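To make the incremental idea concrete, here is a minimal sketch (with illustrative names, not any particular surveyed algorithm) of maintaining per-item transaction-weighted utility (TWU) in place as new transactions arrive, instead of rescanning the whole database as a batch algorithm would:

```python
from collections import defaultdict

# Hypothetical sketch of the incremental principle: keep per-item TWU
# and update it in place on each insertion. TWU is an upper bound on
# an item's utility in any itemset, so low-TWU items can be pruned.
class IncrementalTWU:
    def __init__(self):
        self.twu = defaultdict(float)   # item -> accumulated TWU

    def insert(self, transaction):
        """transaction: dict item -> utility (quantity * unit profit)."""
        tu = sum(transaction.values())  # transaction utility
        for item in transaction:
            self.twu[item] += tu        # each item inherits the full TU

    def promising_items(self, min_util):
        # Items with TWU below min_util cannot appear in any high-utility
        # itemset, so only these survive as candidates.
        return {i for i, u in self.twu.items() if u >= min_util}

index = IncrementalTWU()
index.insert({"a": 5, "b": 3})      # initial batch
index.insert({"b": 4, "c": 10})     # dynamically inserted later
print(index.promising_items(min_util=12))  # {'b', 'c'}; a's TWU of 8 is pruned
```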


2017 ◽  
Vol 7 (1.2) ◽  
pp. 211
Author(s):  
Sandeep Dalal ◽  
Vandna Dahiya

This paper has been withdrawn. 


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Le Wang ◽  
Shui Wang ◽  
Haiyan Li ◽  
Chunliang Zhou

High-utility pattern mining is a research hotspot in the field of pattern mining, and one of its main research topics is how to improve the efficiency of mining algorithms. Based on a study of state-of-the-art high-utility pattern mining algorithms, this paper proposes an improved strategy that removes noncandidate items from the global and local header tables as early as possible, thus reducing the search space and improving the efficiency of the algorithm. The proposed strategy is applied to the algorithm EFIM (EFficient high-utility Itemset Mining). Experimental verification was carried out on nine typical datasets (including two large datasets); the results show that the strategy can effectively improve the temporal efficiency of mining high-utility patterns.
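The following is a minimal sketch of the pruning idea under stated assumptions (illustrative names, not EFIM's actual data structures): drop an item from a header table as soon as its utility upper bound falls below the minimum utility threshold, so later projections never explore it.

```python
# Hypothetical sketch: filter a header table by a per-item utility
# upper bound (e.g. local utility); items below min_util can never
# take part in a high-utility itemset, so they are removed early.
def prune_header_table(header_table, upper_bounds, min_util):
    return [item for item in header_table
            if upper_bounds.get(item, 0) >= min_util]

global_header = ["a", "b", "c", "d"]
bounds = {"a": 40, "b": 15, "c": 60, "d": 8}   # assumed upper bounds
print(prune_header_table(global_header, bounds, min_util=20))  # ['a', 'c']
```

Applying the same filter to each local header table during projection is what shrinks the search space recursively.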


Author(s):  
Youcef Djenouri ◽  
Asma Belhadi ◽  
Djamel Djenouri ◽  
Jerry Chun-Wei Lin

This paper addresses the problem of responding to user queries by fetching the most relevant object from a clustered set of objects. It addresses the common drawbacks of cluster-based approaches and targets fast, high-quality information retrieval. For this purpose, a novel cluster-based information retrieval approach is proposed, named Cluster-based Retrieval using Pattern Mining (CRPM). This approach integrates various clustering and pattern mining algorithms. First, it generates clusters of similar objects. Three clustering algorithms, based on k-means, DBSCAN (density-based spatial clustering of applications with noise), and spectral clustering, are suggested to minimize the number of shared terms among the clusters of objects. Second, frequent and high-utility pattern mining algorithms are performed on each cluster to extract the pattern bases. Third, the clusters of objects are ranked for every query. In this context, two ranking strategies are proposed: (i) Score Pattern Computing (SPC), which calculates a score representing the similarity between a user query and a cluster; and (ii) Weighted Terms in Clusters (WTC), which calculates a weight for every term and uses the relevant terms to compute the score between a user query and each cluster. Irrelevant information derived from the pattern bases is also used to deal with unexpected user queries. To evaluate the proposed approach, extensive experiments were carried out on two use cases: a documents corpus and a tweets corpus. The results showed that the designed approach outperformed traditional and cluster-based information retrieval approaches in terms of the quality of the returned objects while being very competitive in terms of runtime.
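A minimal sketch of an SPC-style scoring, assuming a simple overlap-times-support formula (the paper's exact formula may differ; all names here are illustrative): score each cluster by how strongly the query terms overlap the patterns mined from it, then rank the clusters by that score.

```python
# Hypothetical sketch of Score Pattern Computing: a cluster's score is
# the support-weighted fraction of each mined pattern covered by the query.
def spc_score(query_terms, pattern_base):
    """pattern_base: list of (pattern_terms, support) mined from one cluster."""
    q = set(query_terms)
    return sum(support * len(q & set(pattern)) / len(pattern)
               for pattern, support in pattern_base)

def rank_clusters(query_terms, cluster_patterns):
    """cluster_patterns: cluster_id -> pattern base for that cluster."""
    return sorted(cluster_patterns,
                  key=lambda cid: spc_score(query_terms, cluster_patterns[cid]),
                  reverse=True)

patterns = {
    0: [(("data", "mining"), 12), (("utility", "mining"), 7)],
    1: [(("neural", "network"), 9)],
}
print(rank_clusters(["utility", "mining"], patterns))  # [0, 1]
```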


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, massive amounts of bioinformation big data have been collected by sensor-based IoT devices, and the collected data are classified into different types of health big data using various techniques. A personalized analysis technique is the basis for judging the risk factors of personal cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart-condition classification that combines a fast and effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model learns the input data, develops an approximation function, and helps users recognize risk situations. For the analysis of the pulse frequency, a fast Fourier transform is applied in the preprocessing step, and data reduction is performed using the frequency-by-frequency ratios of the extracted power spectrum. To analyze the meanings of the preprocessed data, a neural network algorithm is applied; in particular, a deep neural network, which stacks multiple layers of nodes trained by gradient descent, is used to analyze and evaluate the linear data. The completed model was trained by classifying previously collected ECG signals into normal, control, and noise groups; thereafter, ECG signals input in real time were classified by the trained deep neural network into normal, control, and noise. To evaluate the performance of the proposed model, this study utilized the data-operation-cost reduction ratio and the F-measure. With the use of the fast Fourier transform and cumulative frequency percentages, the size of the ECG data was reduced at a ratio of 1:32, and the F-measure analysis showed that the deep neural network model achieved 83.83% accuracy. Given these results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective system for reducing operation time.
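A minimal sketch of the described preprocessing under assumed parameters (the sampling rate, band count, and all names are illustrative, not from the paper): take the FFT of an ECG window, form the power spectrum, and reduce it to per-band power ratios before feeding a neural network.

```python
import numpy as np

# Hypothetical sketch of the FFT-based reduction: an ECG window is
# transformed to a power spectrum, which is collapsed into a handful
# of frequency-band ratios, shrinking the input to the classifier.
def ecg_band_ratios(signal, n_bands=8):
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum) ** 2             # power spectrum
    bands = np.array_split(power, n_bands)    # contiguous frequency bands
    band_power = np.array([b.sum() for b in bands])
    return band_power / band_power.sum()      # frequency-by-frequency ratios

rng = np.random.default_rng(0)
window = rng.standard_normal(256)             # stand-in for 1 s of ECG at 256 Hz
features = ecg_band_ratios(window)
print(features.shape)  # (8,) -- 256 samples reduced to 8 ratio features
```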


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Rashad S. Almoqbily ◽  
Azhar Rauf ◽  
Fahmi H. Quradaa