The Efficient Mining of Skyline Patterns from a Volunteer Computing Network

2021 ◽  
Vol 21 (4) ◽  
pp. 1-20
Author(s):  
Jimmy Ming-Tai Wu ◽  
Qian Teng ◽  
Gautam Srivastava ◽  
Matin Pirouz ◽  
Jerry Chun-Wei Lin

In the ever-growing world, the concepts of High-utility Itemset Mining (HUIM) as well as Frequent Itemset Mining (FIM) are fundamental works in knowledge discovery. Several algorithms have been designed successfully. However, these algorithms only used one factor to estimate an itemset. In the past, skyline pattern mining by considering both aspects of frequency and utility has been extensively discussed. In most cases, however, people tend to focus on purchase quantities of itemsets rather than frequencies. In this article, we propose a new knowledge called skyline quantity-utility pattern (SQUP) to provide better estimations in the decision-making process by considering quantity and utility together. Two algorithms, respectively, called SQU-Miner and SKYQUP are presented to efficiently mine the set of SQUPs. Moreover, the usage of volunteer computing is proposed to show the potential in real supermarket applications. Two new efficient utility-max structures are also mentioned for the reduction of the candidate itemsets, respectively, utilized in SQU-Miner and SKYQUP. These two new utility-max structures are used to store the upper-bound of utility for itemsets under the quantity constraint instead of frequency constraint, and the second proposed utility-max structure moreover applies a recursive updated process to further obtain strict upper-bound of utility. Our in-depth experimental results prove that SKYQUP has stronger performance when a comparison is made to SQU-Miner in terms of memory usage, runtime, and the number of candidates.

2019 ◽  
Vol 8 (4) ◽  
pp. 2455-2461

Pattern mining is a technique, which discovers interesting, hidden, unpredicted and useful patterns of data from the database. Most of the research work in pattern mining has been focused on the traditional way of Frequent Itemset Mining (FIM) and Association Rule Mining (ARM) for patterndiscovery. Patterns in frequent itemset mining are based on the occurrence frequency of items. Although frequent pattern mining is useful, the assumption that ‘frequent patterns are interesting,’ doesn’t hold for numerous applications. High Utility Itemset Mining (UIM) overcomes this limitation of frequent itemset mining. The aim of HUIM is to find the patterns based on a utility function where the utility can be measured in terms of revenue, profit, weight, frequency, interestingness or time spent on some webpage, etc. Mining patterns with high utility can be seen as a generalization of FIM where the transaction database is the input and every item is having a utility factor representing its importance and might have non-binary quantities in the transactions. This paper surveys various recent advances and research opportunities in the field of high utility itemset mining


Author(s):  
Jimmy Ming-Tai Wu ◽  
Qian Teng ◽  
Shahab Tayeb ◽  
Jerry Chun-Wei Lin

AbstractThe high average-utility itemset mining (HAUIM) was established to provide a fair measure instead of genetic high-utility itemset mining (HUIM) for revealing the satisfied and interesting patterns. In practical applications, the database is dynamically changed when insertion/deletion operations are performed on databases. Several works were designed to handle the insertion process but fewer studies focused on processing the deletion process for knowledge maintenance. In this paper, we then develop a PRE-HAUI-DEL algorithm that utilizes the pre-large concept on HAUIM for handling transaction deletion in the dynamic databases. The pre-large concept is served as the buffer on HAUIM that reduces the number of database scans while the database is updated particularly in transaction deletion. Two upper-bound values are also established here to reduce the unpromising candidates early which can speed up the computational cost. From the experimental results, the designed PRE-HAUI-DEL algorithm is well performed compared to the Apriori-like model in terms of runtime, memory, and scalability in dynamic databases.


2018 ◽  
Vol 7 (3.8) ◽  
pp. 77
Author(s):  
Prof. Mangesh Ghonge ◽  
Miss Neha Rane

Essentially the most primary and crucial part of data mining is pattern mining. For acquiring important corre-lations among the information, method called itemset mining plays vital role Earlier, the notion of itemset mining was used to acquire the absolute most often occurring items in the itemset. In some situation, though having utility value less than threshold it is necessary to locate such items because they are of great use. Considering the thought of weight for each and every apparent items brings effectiveness for mining the pattern efficiently. Different mining algorithms are utilized to obtain the correlations among the information items based on frequency with the items in the dataset occurs. In frequent itemset, those things which occurs frequently whereas, in infrequent itemset the items that occur very rarely are obtained. Determining such form of data is tougher than to locate data which occurs frequently. Frequent Itemset Mining (FISM) locates large and frequent itemsets in huge data for example market baskets. Such data has two properties that are not addressed by FISM; Mixture property and projection property. Here the proposed system combines both mixture as well as projection property further providing automated support thresholds.  


2019 ◽  
Vol 18 (04) ◽  
pp. 1113-1185 ◽  
Author(s):  
Bahareh Rahmati ◽  
Mohammad Karim Sohrabi

High utility itemset mining considers unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. Downward closure property, which causes significant pruning in frequent itemset mining, is not established in the utility of itemsets and so the mining problem will require alternative solutions to reduce its search space and to enhance its efficiency. Using an anti-monotonic upper bound of the utility function and exploiting efficient data structures for storing and compacting the dataset to perform efficient pruning strategies are the main solutions to address high utility itemset mining problem. Different mining methods and techniques have attempted to improve performance of extracting high utility itemsets and their several variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to represent a comprehensive systematic review for high utility itemset mining techniques and to classify them based on their problem-solving approaches.


2011 ◽  
pp. 32-56
Author(s):  
Osmar R. Zaïane ◽  
Mohammed El-Hajj

Frequent Itemset Mining (FIM) is a key component of many algorithms that extract patterns from transactional databases. For example, FIM can be leveraged to produce association rules, clusters, classifiers or contrast sets. This capability provides a strategic resource for decision support, and is most commonly used for market basket analysis. One challenge for frequent itemset mining is the potentially huge number of extracted patterns, which can eclipse the original database in size. In addition to increasing the cost of mining, this makes it more difficult for users to find the valuable patterns. Introducing constraints to the mining process helps mitigate both issues. Decision makers can restrict discovered patterns according to specified rules. By applying these restrictions as early as possible, the cost of mining can be constrained. For example, users may be interested in purchases whose total price exceeds $100, or whose items cost between $50 and $100. In cases of extremely large data sets, pushing constraints sequentially is not enough and parallelization becomes a must. However, specific design is needed to achieve sizes never reported before in the literature.


2022 ◽  
Vol 16 (3) ◽  
pp. 1-26
Author(s):  
Jerry Chun-Wei Lin ◽  
Youcef Djenouri ◽  
Gautam Srivastava ◽  
Yuanfa Li ◽  
Philip S. Yu

High-utility sequential pattern mining (HUSPM) is a hot research topic in recent decades since it combines both sequential and utility properties to reveal more information and knowledge rather than the traditional frequent itemset mining or sequential pattern mining. Several works of HUSPM have been presented but most of them are based on main memory to speed up mining performance. However, this assumption is not realistic and not suitable in large-scale environments since in real industry, the size of the collected data is very huge and it is impossible to fit the data into the main memory of a single machine. In this article, we first develop a parallel and distributed three-stage MapReduce model for mining high-utility sequential patterns based on large-scale databases. Two properties are then developed to hold the correctness and completeness of the discovered patterns in the developed framework. In addition, two data structures called sidset and utility-linked list are utilized in the developed framework to accelerate the computation for mining the required patterns. From the results, we can observe that the designed model has good performance in large-scale datasets in terms of runtime, memory, efficiency of the number of distributed nodes, and scalability compared to the serial HUSP-Span approach.


2020 ◽  
Vol 50 (11) ◽  
pp. 3788-3807
Author(s):  
Jerry Chun-Wei Lin ◽  
Matin Pirouz ◽  
Youcef Djenouri ◽  
Chien-Fu Cheng ◽  
Usman Ahmed

Abstract High-utility itemset mining (HUIM) is considered as an emerging approach to detect the high-utility patterns from databases. Most existing algorithms of HUIM only consider the itemset utility regardless of the length. This limitation raises the utility as a result of a growing itemset size. High average-utility itemset mining (HAUIM) considers the size of the itemset, thus providing a more balanced scale to measure the average-utility for decision-making. Several algorithms were presented to efficiently mine the set of high average-utility itemsets (HAUIs) but most of them focus on handling static databases. In the past, a fast-updated (FUP)-based algorithm was developed to efficiently handle the incremental problem but it still has to re-scan the database when the itemset in the original database is small but there is a high average-utility upper-bound itemset (HAUUBI) in the newly inserted transactions. In this paper, an efficient framework called PRE-HAUIMI for transaction insertion in dynamic databases is developed, which relies on the average-utility-list (AUL) structures. Moreover, we apply the pre-large concept on HAUIM. A pre-large concept is used to speed up the mining performance, which can ensure that if the total utility in the newly inserted transaction is within the safety bound, the small itemsets in the original database could not be the large ones after the database is updated. This, in turn, reduces the recurring database scans and obtains the correct HAUIs. Experiments demonstrate that the PRE-HAUIMI outperforms the state-of-the-art batch mode HAUI-Miner, and the state-of-the-art incremental IHAUPM and FUP-based algorithms in terms of runtime, memory, number of assessed patterns and scalability.


Author(s):  
Jimmy Ming-Tai Wu ◽  
Zhongcui Li ◽  
Gautam Srivastava ◽  
Unil Yun ◽  
Jerry Chun-Wei Lin

AbstractRecently, revealing more valuable information except for quantity value for a database is an essential research field. High utility itemset mining (HAUIM) was suggested to reveal useful patterns by average-utility measure for pattern analytics and evaluations. HAUIM provides a more fair assessment than generic high utility itemset mining and ignores the influence of the length of itemsets. There are several high-performance HAUIM algorithms proposed to gain knowledge from a disorganized database. However, most existing works do not concern the uncertainty factor, which is one of the characteristics of data gathered from IoT equipment. In this work, an efficient algorithm for HAUIM to handle the uncertainty databases in IoTs is presented. Two upper-bound values are estimated to early diminish the search space for discovering meaningful patterns that greatly solve the limitations of pattern mining in IoTs. Experimental results showed several evaluations of the proposed approach compared to the existing algorithms, and the results are acceptable to state that the designed approach efficiently reveals high average utility itemsets from an uncertain situation.


Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Scott Buffett

A ubiquitous challenge throughout all areas of data mining, particularly in the mining of frequent patterns in large databases, is centered on the necessity to reduce the time and space required to perform the search. The extent of this reduction proportionally facilitates the ability to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks to identify frequent patterns that are (1) sequential in nature and (2) hold a significant magnitude of utility in a sequence database, by considering the aspect of item value or importance. While traditional sequential pattern mining relies on the downward closure property to significantly reduce the required search space, with HUSPM, this property does not hold. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are deemed potential candidates for concatenation. Such candidates are provably the only items that are ever needed for any extension of a given sequential pattern or its descendants in the search tree. This list is then exploited to significantly further tighten the upper bound on the utilities of descendent patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, resulting in a massive reduction in the number of candidate sequential patterns that need to be generated in the search. Sequential pattern mining methods implementing these new techniques for bound reduction and further candidate list reduction are demonstrated via the introduction of the CRUSP and CRUSPPivot algorithms, respectively. Validation of the techniques was conducted on six public datasets. Tests show that use of the CRUSP algorithm results in a significant reduction in the overall number of candidate sequential patterns that need to be considered, and subsequently a significant reduction in run time, when compared to the current state of the art in bounding techniques. When employing the CRUSPPivot algorithm, the further reduction in the size of the search space was found to be dramatic, with the reduction in run time found to be dramatic to moderate, depending on the dataset. Demonstrating the practical significance of the work, experiments showed that time required for one particularly complex dataset was reduced from many hours to less than one minute.


2021 ◽  
pp. 1-26
Author(s):  
Haodong Cheng ◽  
Meng Han ◽  
Ni Zhang ◽  
Xiaojuan Li ◽  
Le Wang

Traditional association rule mining has been widely studied, but this is not applicable to practical applications that must consider factors such as the unit profit of the item and the purchase quantity. High-utility itemset mining (HUIM) aims to find high-utility patterns by considering the number of items purchased and the unit profit. However, most high-utility itemset mining algorithms are designed for static databases. In real-world applications (such as market analysis and business decisions), databases are usually updated by inserting new data dynamically. Some researchers have proposed algorithms for finding high-utility itemsets in dynamically updated databases. Different from the batch processing algorithms that always process the databases from scratch, the incremental HUIM algorithms update and output high-utility itemsets in an incremental manner, thereby reducing the cost of finding high-utility itemsets. This paper provides the latest research on incremental high-utility itemset mining algorithms, including methods of storing itemsets and utilities based on tree, list, array and hash set storage structures. It also points out several important derivative algorithms and research challenges for incremental high-utility itemset mining.


Sign in / Sign up

Export Citation Format

Share Document