A Survey of High Utility Pattern Mining Algorithms for Big Data

Author(s):  
Morteza Zihayat ◽  
Mehdi Kargar ◽  
Jaroslaw Szlichta
Author(s):  
Logeswaran K. ◽  
Suresh P. ◽  
Savitha S. ◽  
Prasanna Kumar K. R.

In recent years, data analysts have faced many challenges in mining high utility itemsets (HUIs) from transactional databases with existing traditional techniques. The main challenges in utility mining are the exponentially growing search space and the difficulty of setting a minimum utility threshold appropriate to the given database. To overcome these challenges, evolutionary algorithm-based techniques can be used to mine HUIs from a transactional database. However, testing each of the supporting fitness functions in the optimization problem is very inefficient and increases the time complexity of the algorithm. To overcome this drawback, a reinforcement learning-based approach is proposed to improve the efficiency of the algorithm: the most appropriate fitness function for evaluation is selected automatically while the algorithm executes. Furthermore, when distinct functions perform well at different stages of the optimization process, the currently optimal function is selected dynamically.
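The abstract leaves the selection mechanism unspecified; the sketch below is one plausible reading, assuming an epsilon-greedy multi-armed bandit over the candidate fitness functions, where each function is rewarded by how much it improves the best utility found per generation. All names (`FitnessSelector`, the stand-in generation step) are illustrative, not from the paper.

```python
import random

# Hypothetical sketch: treat each candidate fitness function as an arm
# of an epsilon-greedy bandit; the arm's reward is the improvement in
# the best utility found after one generation evaluated with it.
class FitnessSelector:
    def __init__(self, num_fns, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * num_fns
        self.values = [0.0] * num_fns  # running mean reward per function

    def select(self):
        # explore with probability epsilon, otherwise exploit the best mean
        if random.random() < self.epsilon:
            return random.randrange(len(self.values))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, fn_index, reward):
        # incremental mean: v_n = v_{n-1} + (r_n - v_{n-1}) / n
        self.counts[fn_index] += 1
        self.values[fn_index] += (reward - self.values[fn_index]) / self.counts[fn_index]

# Usage inside a (hypothetical) evolutionary HUI miner: pick a fitness
# function per generation and reward it by the utility gain it produced.
selector = FitnessSelector(num_fns=3)
best_utility = 0.0
for generation in range(100):
    fn_index = selector.select()
    new_best = best_utility + random.random()  # stand-in for one GA generation
    selector.update(fn_index, new_best - best_utility)
    best_utility = new_best
```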


Author(s):  
Alind Khare ◽  
Vikram Goyal ◽  
Srikanth Baride ◽  
Sushil K. Prasad ◽  
Michael McDermott ◽  
...  

Author(s):  
Sudhir Tirumalasetty ◽  
A. Divya ◽  
D. Rahitya Lakshmi ◽  
Ch. Durga Bhavani ◽  
D. Anusha

Frequent pattern mining is an essential data-mining task whose goal is to discover knowledge in the form of repeated patterns. Many efficient pattern-mining algorithms have been proposed in the last two decades, yet most do not scale to the type of data we are presented with today, the so-called "Big Data". Scalable parallel algorithms hold the key to solving the problem in this context. This paper reviews recent advances in parallel frequent pattern mining, analysing them through the Big Data lens. Load balancing and work partitioning are the major challenges to be conquered, and they continually call for innovative methods as Big Data grows without bound. An even bigger challenge is mining frequent patterns from unstructured data; to accomplish this, a Semi-Structured Doc-Model and ranking of patterns are used.
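As a concrete illustration of the work-partitioning idea the survey discusses (a minimal sketch with illustrative names, not any specific algorithm from the paper): split the transaction database across workers, count candidate itemsets locally, then merge the partial counts.

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from itertools import combinations

def count_pairs(transactions):
    """Count all 2-itemsets in one partition of the database."""
    counts = Counter()
    for tx in transactions:
        for pair in combinations(sorted(set(tx)), 2):
            counts[pair] += 1
    return counts

def parallel_frequent_pairs(transactions, min_support, workers=4):
    # Work partitioning: deal transactions round-robin across workers;
    # load balance is only as good as the evenness of the partitions.
    chunks = [transactions[i::workers] for i in range(workers)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_pairs, chunks))
    total = Counter()
    for partial in partials:   # merge step: sum the local counts
        total.update(partial)
    return {p: c for p, c in total.items() if c >= min_support}

if __name__ == "__main__":
    db = [["a", "b", "c"], ["a", "b"], ["b", "c"], ["a", "b", "c"]]
    print(parallel_frequent_pairs(db, min_support=2, workers=2))
    # {('a', 'b'): 3, ('a', 'c'): 2, ('b', 'c'): 3}
```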


2021 ◽  
pp. 1-26
Author(s):  
Haodong Cheng ◽  
Meng Han ◽  
Ni Zhang ◽  
Xiaojuan Li ◽  
Le Wang

Traditional association rule mining has been widely studied, but it is not applicable to practical applications that must consider factors such as the unit profit of an item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated by inserting new data dynamically. Some researchers have therefore proposed algorithms for finding high-utility itemsets in dynamically updated databases. Unlike batch algorithms, which always process the database from scratch, incremental HUIM algorithms update and output high-utility itemsets incrementally, thereby reducing the cost of finding them. This paper surveys the latest research on incremental high-utility itemset mining algorithms, including methods that store itemsets and utilities in tree, list, array, and hash set structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.
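To make the incremental idea concrete, here is a minimal sketch (with illustrative names, not any particular surveyed algorithm) of maintaining per-item transaction-weighted utility (TWU) in place as new transactions arrive, instead of rescanning the whole database as a batch algorithm would:

```python
from collections import defaultdict

# Hypothetical sketch of the incremental principle: keep per-item TWU
# and update it in place on each insertion. TWU is an upper bound on
# an item's utility in any itemset, so low-TWU items can be pruned.
class IncrementalTWU:
    def __init__(self):
        self.twu = defaultdict(float)   # item -> accumulated TWU

    def insert(self, transaction):
        """transaction: dict item -> utility (quantity * unit profit)."""
        tu = sum(transaction.values())  # transaction utility
        for item in transaction:
            self.twu[item] += tu        # each item inherits the full TU

    def promising_items(self, min_util):
        # Items with TWU below min_util cannot appear in any high-utility
        # itemset, so only these survive as candidates.
        return {i for i, u in self.twu.items() if u >= min_util}

index = IncrementalTWU()
index.insert({"a": 5, "b": 3})      # initial batch
index.insert({"b": 4, "c": 10})     # dynamically inserted later
print(index.promising_items(min_util=12))  # {'b', 'c'}; a's TWU of 8 is pruned
```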


2017 ◽  
Vol 7 (1.2) ◽  
pp. 211
Author(s):  
Sandeep Dalal ◽  
Vandna Dahiya

This paper has been withdrawn. 


2020 ◽  
Vol 2020 ◽  
pp. 1-11
Author(s):  
Le Wang ◽  
Shui Wang ◽  
Haiyan Li ◽  
Chunliang Zhou

High-utility pattern mining is a research hotspot in the field of pattern mining, and one of its main research topics is how to improve the efficiency of mining algorithms. Based on a study of state-of-the-art high-utility pattern mining algorithms, this paper proposes an improved strategy that removes noncandidate items from the global and local header tables as early as possible, thus reducing the search space and improving the efficiency of the algorithm. The proposed strategy is applied to the algorithm EFIM (EFficient high-utility Itemset Mining). Experimental verification was carried out on nine typical datasets (including two large datasets); the results show that the strategy can effectively improve the temporal efficiency of mining high-utility patterns.
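The following is a minimal sketch of the pruning idea under stated assumptions (illustrative names, not EFIM's actual data structures): drop an item from a header table as soon as its utility upper bound falls below the minimum utility threshold, so later projections never explore it.

```python
# Hypothetical sketch: filter a header table by a per-item utility
# upper bound (e.g. local utility); items below min_util can never
# take part in a high-utility itemset, so they are removed early.
def prune_header_table(header_table, upper_bounds, min_util):
    return [item for item in header_table
            if upper_bounds.get(item, 0) >= min_util]

global_header = ["a", "b", "c", "d"]
bounds = {"a": 40, "b": 15, "c": 60, "d": 8}   # assumed upper bounds
print(prune_header_table(global_header, bounds, min_util=20))  # ['a', 'c']
```

Applying the same filter to each local header table during projection is what shrinks the search space recursively.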


Author(s):  
Youcef Djenouri ◽  
Asma Belhadi ◽  
Djamel Djenouri ◽  
Jerry Chun-Wei Lin

This paper addresses the problem of responding to user queries by fetching the most relevant object from a clustered set of objects. It addresses the common drawbacks of cluster-based approaches and targets fast, high-quality information retrieval. For this purpose, a novel cluster-based information retrieval approach is proposed, named Cluster-based Retrieval using Pattern Mining (CRPM). This approach integrates various clustering and pattern mining algorithms. First, it generates clusters of similar objects. Three clustering algorithms, based on k-means, DBSCAN (density-based spatial clustering of applications with noise), and spectral clustering, are suggested to minimize the number of shared terms among the clusters of objects. Second, frequent and high-utility pattern mining algorithms are performed on each cluster to extract the pattern bases. Third, the clusters of objects are ranked for every query. In this context, two ranking strategies are proposed: (i) Score Pattern Computing (SPC), which calculates a score representing the similarity between a user query and a cluster; and (ii) Weighted Terms in Clusters (WTC), which calculates a weight for every term and uses the relevant terms to compute the score between a user query and each cluster. Irrelevant information derived from the pattern bases is also used to deal with unexpected user queries. To evaluate the proposed approach, extensive experiments were carried out on two use cases: a documents corpus and a tweets corpus. The results showed that the designed approach outperformed traditional and cluster-based information retrieval approaches in terms of the quality of the returned objects while being very competitive in terms of runtime.
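A minimal sketch of an SPC-style scoring, assuming a simple overlap-times-support formula (the paper's exact formula may differ; all names here are illustrative): score each cluster by how strongly the query terms overlap the patterns mined from it, then rank the clusters by that score.

```python
# Hypothetical sketch of Score Pattern Computing: a cluster's score is
# the support-weighted fraction of each mined pattern covered by the query.
def spc_score(query_terms, pattern_base):
    """pattern_base: list of (pattern_terms, support) mined from one cluster."""
    q = set(query_terms)
    return sum(support * len(q & set(pattern)) / len(pattern)
               for pattern, support in pattern_base)

def rank_clusters(query_terms, cluster_patterns):
    """cluster_patterns: cluster_id -> pattern base for that cluster."""
    return sorted(cluster_patterns,
                  key=lambda cid: spc_score(query_terms, cluster_patterns[cid]),
                  reverse=True)

patterns = {
    0: [(("data", "mining"), 12), (("utility", "mining"), 7)],
    1: [(("neural", "network"), 9)],
}
print(rank_clusters(["utility", "mining"], patterns))  # [0, 1]
```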


Healthcare ◽  
2020 ◽  
Vol 8 (3) ◽  
pp. 234 ◽  
Author(s):  
Hyun Yoo ◽  
Soyoung Han ◽  
Kyungyong Chung

Recently, massive amounts of bioinformation big data have been collected by sensor-based IoT devices, and the collected data are classified into different types of health big data using various techniques. A personalized analysis technique is the basis for judging the risk factors of personal cardiovascular disorders in real time. The objective of this paper is to provide a model for personalized heart-condition classification that combines a fast and effective preprocessing technique with a deep neural network in order to process biosensor input data accumulated in real time. The model learns the input data, develops an approximation function, and helps users recognize risk situations. For the analysis of the pulse frequency, a fast Fourier transform is applied in the preprocessing step, and data reduction is performed using the frequency-by-frequency ratios of the extracted power spectrum. To analyze the meanings of the preprocessed data, a neural network algorithm is applied; in particular, a deep neural network, which stacks multiple layers of nodes trained by gradient descent, is used to analyze and evaluate the linear data. The completed model was trained by classifying previously collected ECG signals into normal, control, and noise groups; thereafter, ECG signals input in real time were classified by the trained deep neural network into normal, control, and noise. To evaluate the performance of the proposed model, this study utilized the data-operation-cost reduction ratio and the F-measure. With the use of the fast Fourier transform and cumulative frequency percentages, the size of the ECG data was reduced at a ratio of 1:32, and the F-measure analysis showed that the deep neural network model achieved 83.83% accuracy. Given these results, the modified deep neural network technique can reduce the size of big data in terms of computing work, and it is an effective system for reducing operation time.
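A minimal sketch of the described preprocessing under assumed parameters (the sampling rate, band count, and all names are illustrative, not from the paper): take the FFT of an ECG window, form the power spectrum, and reduce it to per-band power ratios before feeding a neural network.

```python
import numpy as np

# Hypothetical sketch of the FFT-based reduction: an ECG window is
# transformed to a power spectrum, which is collapsed into a handful
# of frequency-band ratios, shrinking the input to the classifier.
def ecg_band_ratios(signal, n_bands=8):
    spectrum = np.fft.rfft(signal)
    power = np.abs(spectrum) ** 2             # power spectrum
    bands = np.array_split(power, n_bands)    # contiguous frequency bands
    band_power = np.array([b.sum() for b in bands])
    return band_power / band_power.sum()      # frequency-by-frequency ratios

rng = np.random.default_rng(0)
window = rng.standard_normal(256)             # stand-in for 1 s of ECG at 256 Hz
features = ecg_band_ratios(window)
print(features.shape)  # (8,) -- 256 samples reduced to 8 ratio features
```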


IEEE Access ◽  
2021 ◽  
pp. 1-1
Author(s):  
Rashad S. Almoqbily ◽  
Azhar Rauf ◽  
Fahmi H. Quradaa