Mining High Utility Sequential Patterns with Negative Item Values

Author(s):  
Tiantian Xu ◽  
Xiangjun Dong ◽  
Jianliang Xu ◽  
Xue Dong

High utility sequential patterns (HUSP) are sequential patterns with high utility (such as profit), and they play a crucial role in many real-life applications. Existing studies of HUSP consider only positive utility values. In some applications, however, a sequence may contain items with negative values (NIV). For example, a supermarket may sell a cartridge at a negative profit in a package with a printer that yields a higher positive return. Although a few methods have been proposed to mine high utility itemsets (HUI) with NIV, they are not suitable for mining HUSP with NIV because an item may occur more than once in a sequence and its utility may have multiple values. In this paper, we propose a novel method, High Utility Sequential Patterns with Negative Item Values (HUSP-NIV), to efficiently mine HUSP with NIV from sequential utility-based databases. HUSP-NIV works as follows: (1) using the lexicographic quantitative sequence tree (LQS-tree) to extract the complete set of high utility sequences and using I-Concatenation and S-Concatenation mechanisms to generate newly concatenated sequences; (2) using three pruning methods to reduce the search space in the LQS-tree; (3) traversing the LQS-tree and outputting all the high utility sequential patterns. To the best of our knowledge, HUSP-NIV is the first method to mine HUSP with NIV, and experiments show that it is efficient on both synthetic and real datasets.
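The point that an item can occur several times in a sequence, each time with a different utility, can be illustrated with a small sketch. It is not the HUSP-NIV algorithm: it only computes the utility of one pattern in one quantitative sequence, assuming the common convention that this utility is the maximum over all occurrences of the pattern. The data model, the function name `pattern_utility`, and the sample profits are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical data model: a q-sequence is a list of itemsets,
# each itemset mapping item -> purchased quantity.
QSequence = List[Dict[str, int]]


def pattern_utility(
    pattern: Sequence[Set[str]],
    qseq: QSequence,
    unit_profit: Dict[str, float],
) -> Optional[float]:
    """Return the maximum utility over all occurrences of `pattern` in `qseq`,
    or None if the pattern does not occur (assumed convention: max over matches)."""
    # best[i] = best utility found so far for matching the first i pattern elements
    best: List[Optional[float]] = [0.0] + [None] * len(pattern)
    for itemset in qseq:
        # Go backwards so that one itemset matches at most one pattern element.
        for i in range(len(pattern), 0, -1):
            element = pattern[i - 1]
            prev = best[i - 1]
            if prev is None or not all(x in itemset for x in element):
                continue
            gain = sum(itemset[x] * unit_profit[x] for x in element)
            if best[i] is None or prev + gain > best[i]:
                best[i] = prev + gain
    return best[len(pattern)]


# Illustrative values only: the cartridge is sold at a loss.
unit_profit = {"printer": 50.0, "cartridge": -5.0, "paper": 2.0}
qseq = [
    {"printer": 1, "cartridge": 2},
    {"paper": 3},
    {"cartridge": 1, "paper": 1},
]
print(pattern_utility([{"cartridge"}, {"paper"}], qseq, unit_profit))  # -4.0
print(pattern_utility([{"printer", "cartridge"}], qseq, unit_profit))  # 40.0
```

Here the pattern "cartridge, then paper" has two occurrences with utilities -4 and -8, so a single pattern can carry several (possibly negative) utility values within one sequence.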

Author(s):  
Tiantian Xu ◽  
Jianliang Xu ◽  
Xiangjun Dong

High utility sequential pattern (HUSP) mining has recently received a lot of attention from researchers. Many algorithms have been proposed to mine HUSP, and most of them use only a single minimum utility, which implicitly assumes that all items in the database are of equal importance (for example, in terms of profit or another measure the user cares about). This is often not the case in real-life applications. Although a few methods have been proposed to mine high utility itemsets (HUI) with multiple minimum utility (MMU), they are not suitable for mining HUSP with MMU because an item may occur more than once in a sequence and may have multiple utility values. In this paper, we propose a novel method, called HUSpan-MMU, to efficiently mine HUSP with MMU from sequential utility-based databases. A lexicographic quantitative sequence tree (LQS-tree) is used to extract the complete set of HUSP. Meanwhile, two pruning methods are used to reduce the search space in the LQS-tree. Experimental results on both synthetic and real datasets show that HUSpan-MMU can efficiently mine HUSP with MMU from utility-based databases.


Author(s):  
Tiantian Xu ◽  
Xiangjun Dong ◽  
Jianliang Xu ◽  
Yongshun Gong

Mining negative sequential patterns (NSP) has been an important research area in data mining and knowledge discovery, and it is much more challenging than mining positive sequential patterns (PSP) due to its computational complexity and search space. Only a few methods have been proposed to mine NSP, and most of them use only a single minimum support, which implicitly assumes that all items in the database are of the same nature or of similar frequencies. This is often not the case in real-life applications. There are several methods to mine sequential patterns with multiple minimum supports (MMS), but these methods only consider PSP and do not handle NSP. In this paper, we propose a new method, called e-msNSP, to mine NSP with multiple minimum supports. We also solve the problem of how to set the minimum support of a sequence with negative item(s). E-msNSP consists of three major steps: (i) using the improved MS-GSP method to mine PSP with multiple minimum supports and storing all positive sequential candidates’ (PSC) related information simultaneously; (ii) using the same method as e-NSP to generate negative sequential candidates (NSC) based on the PSP mined above; (iii) calculating the support of these NSC based only on the corresponding PSP and then obtaining the NSP. To the best of our knowledge, e-msNSP is the first method to mine NSP with MMS, and it does not impose strict constraints. Experimental results show that e-msNSP is highly effective and efficient.


2022 ◽  
Vol 13 (1) ◽  
pp. 1-22
Author(s):  
M. Saqib Nawaz ◽  
Philippe Fournier-Viger ◽  
Unil Yun ◽  
Youxi Wu ◽  
Wei Song

High utility itemset mining (HUIM) is the task of finding all sets of items purchased together that generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space while finding HUIs when the size of the database and the total number of items increase. Recently, evolutionary and heuristic algorithms were designed to mine HUIs, which provided considerable performance improvement. However, they can still have a long runtime and some may miss many HUIs. To address this problem, this article proposes two algorithms for HUIM based on Hill Climbing (HUIM-HC) and Simulated Annealing (HUIM-SA). Both algorithms transform the input database into a bitmap for efficient utility computation and for search space pruning. To improve population diversity, HUIs discovered by evolution are used as target values for the next population instead of keeping the current optimal values in the next population. Through experiments on real-life datasets, it was found that the proposed algorithms are faster than state-of-the-art heuristic and evolutionary HUIM algorithms, that HUIM-SA discovers similar HUIs, and that HUIM-SA evolves linearly with the number of iterations.
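The bitmap idea mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the encoding used by HUIM-HC or HUIM-SA: each item gets one Python integer whose bit `tid` is set when the item appears in transaction `tid`, and the transactions containing an itemset are found with a bitwise AND. The function names and the toy database are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical data model: item -> utility of that item in the transaction.
Transaction = Dict[str, int]


def build_bitmaps(db: List[Transaction]) -> Dict[str, int]:
    """Map each item to an integer bitmap; bit `tid` is set if the item is in db[tid]."""
    bitmaps: Dict[str, int] = {}
    for tid, transaction in enumerate(db):
        for item in transaction:
            bitmaps[item] = bitmaps.get(item, 0) | (1 << tid)
    return bitmaps


def itemset_utility(itemset: Iterable[str], db: List[Transaction],
                    bitmaps: Dict[str, int]) -> int:
    """Sum the itemset's utility over the transactions that contain all its items."""
    items = list(itemset)
    if not items:
        return 0
    cover = bitmaps.get(items[0], 0)
    for item in items[1:]:
        cover &= bitmaps.get(item, 0)  # keep only transactions containing every item
    total, tid = 0, 0
    while cover:
        if cover & 1:
            total += sum(db[tid][item] for item in items)
        cover >>= 1
        tid += 1
    return total


db = [{"a": 5, "b": 2}, {"a": 3, "c": 4}, {"a": 1, "b": 6, "c": 2}]
bitmaps = build_bitmaps(db)
print(itemset_utility({"a", "b"}, db, bitmaps))  # 14
```

An empty AND result also shows immediately that an itemset occurs in no transaction, which is one way a bitmap can support pruning.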


2021 ◽  
Vol 16 (2) ◽  
pp. 1-31
Author(s):  
Chunkai Zhang ◽  
Zilin Du ◽  
Yuting Yang ◽  
Wensheng Gan ◽  
Philip S. Yu

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods have a bias toward items that have longer on-shelf time as they have a greater chance to generate a high utility. To eliminate the bias, the problem of on-shelf utility mining (OSUM) is introduced. In this article, we focus on the task of OSUM of sequence data, where the sequential database is divided into several partitions according to time periods and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies to reduce the search space and avoid redundant calculation with two upper bounds, time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed for facilitating the calculation of upper bounds and utilities. Substantial experimental results on certain real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.


2020 ◽  
pp. 1-16
Author(s):  
Rui Sun ◽  
Meng Han ◽  
Chunyan Zhang ◽  
Mingyao Shen ◽  
Shiyu Du

High utility itemset mining (HUIM) with negative utility is an emerging data mining task. However, the setting of the minimum utility threshold is always a challenge when mining high utility itemsets (HUIs) with negative items. Although the top-k HUIM method is very common, this method can only mine itemsets with positive items, and the problem of missing itemsets occurs when mining itemsets with negative items. To solve this problem, we first propose an effective algorithm called THN (Top-k High Utility Itemset Mining with Negative Utility). It proposes a strategy for automatically increasing the minimum utility threshold. In order to solve the problem of multiple scans of the database, it uses transaction merging and dataset projection technology. It uses a redefined sub-tree utility value and a redefined local utility value to prune the search space. Experimental results on real datasets show that THN is efficient in terms of runtime and memory usage, and has excellent scalability. Moreover, experiments show that THN performs particularly well on dense datasets.
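The idea of automatically raising the minimum utility threshold can be sketched with a min-heap of the k best utilities seen so far. This is a generic top-k pattern and not THN's specific strategy, which the abstract does not detail. The class name `TopKThreshold`, the initial threshold of 0, and the sample utilities are assumptions made for illustration.

```python
import heapq
from typing import List, Tuple

Itemset = Tuple[str, ...]


class TopKThreshold:
    """Keep the k highest-utility itemsets and raise min_util as better ones arrive."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.heap: List[Tuple[float, Itemset]] = []  # min-heap ordered by utility
        self.min_util = 0.0  # assumed starting threshold

    def offer(self, itemset: Itemset, utility: float) -> bool:
        """Try to add a candidate; return True if it enters the current top k."""
        if utility < self.min_util:
            return False
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, (utility, itemset))
        elif utility > self.heap[0][0]:
            heapq.heapreplace(self.heap, (utility, itemset))
        else:
            return False
        if len(self.heap) == self.k:
            # The k-th best utility becomes the new pruning threshold.
            self.min_util = self.heap[0][0]
        return True

    def results(self) -> List[Tuple[float, Itemset]]:
        return sorted(self.heap, reverse=True)


top = TopKThreshold(k=2)
for itemset, utility in [(("a",), 10.0), (("b",), 4.0), (("a", "b"), 7.0), (("c",), 3.0)]:
    top.offer(itemset, utility)
print(top.min_util)   # 7.0
print(top.results())  # [(10.0, ('a',)), (7.0, ('a', 'b'))]
```

Once the heap is full, every candidate whose utility (or utility upper bound) falls below `min_util` can be discarded, so the user only has to choose k rather than a threshold.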


2019 ◽  
Vol 18 (04) ◽  
pp. 1113-1185
Author(s):  
Bahareh Rahmati ◽  
Mohammad Karim Sohrabi

High utility itemset mining considers unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and to enhance its efficiency. Using an anti-monotonic upper bound of the utility function and exploiting efficient data structures for storing and compacting the dataset to perform efficient pruning strategies are the main solutions to the high utility itemset mining problem. Different mining methods and techniques have attempted to improve the performance of extracting high utility itemsets and their several variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to present a comprehensive systematic review of high utility itemset mining techniques and to classify them based on their problem-solving approaches.
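A toy example makes the missing downward closure property concrete. The sketch below compares the exact utility of an itemset with one well-known anti-monotonic upper bound, the transaction-weighted utilization (TWU), which sums the full utility of every transaction containing the itemset. The database and profits are made up for illustration, and the bound as written assumes non-negative utilities.

```python
from typing import Dict, List, Set

# Hypothetical toy data: unit profit per item, and item -> quantity per transaction.
unit_profit: Dict[str, int] = {"a": 1, "b": 5, "c": 2}
db: List[Dict[str, int]] = [
    {"a": 1, "b": 1},
    {"a": 2, "c": 3},
    {"b": 2, "c": 1},
]


def contains(transaction: Dict[str, int], itemset: Set[str]) -> bool:
    return all(item in transaction for item in itemset)


def utility(itemset: Set[str]) -> int:
    """Exact utility: quantity * unit profit of the itemset's items, over supporting transactions."""
    return sum(
        sum(t[item] * unit_profit[item] for item in itemset)
        for t in db
        if contains(t, itemset)
    )


def twu(itemset: Set[str]) -> int:
    """Upper bound: total utility of every transaction that contains the itemset."""
    return sum(
        sum(qty * unit_profit[item] for item, qty in t.items())
        for t in db
        if contains(t, itemset)
    )


print(utility({"a"}), utility({"a", "b"}))  # 3 6   -> a superset can have higher utility
print(twu({"a"}), twu({"a", "b"}))          # 14 6  -> TWU never grows when items are added
```

Because `twu` can only shrink as items are added and always bounds `utility` from above, any itemset whose TWU is below the threshold can be pruned together with all its supersets.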


Author(s):  
MEHDI Haj Ali ◽  
Qun-Xiong Zhu ◽  
Yan-Lin He

Sequential pattern mining is important not only in the data mining field but also as the basis of many applications. However, running such applications costs time and memory, especially when dealing with dense datasets. Setting the proper minimum support threshold is one of the factors that affect memory and time consumption. It is also difficult for users to obtain appropriate patterns: mining may return too many sequential patterns, which makes it difficult for users to comprehend the results. The problem becomes worse when dealing with long click stream sequences or huge datasets. As a solution, we developed an efficient algorithm, called TopK (Top-K click stream sequence pattern mining), which outputs the top-k patterns, that is, the k most important and relevant patterns (those with high support). Our algorithm is based on pseudo-projection to avoid consuming more time and memory, and uses several efficient search space pruning methods together with BI-Directional Extension. Our extensive study and experiments on real click stream datasets show that TopK significantly outperforms the previous algorithms.


2021 ◽  
Vol 39 (2) ◽  
pp. 1-27
Author(s):  
Wei Wang ◽  
Longbing Cao

Negative sequential patterns (NSPs) capture more informative and actionable knowledge than classic positive sequential patterns (PSPs) due to the involvement of both occurring and nonoccurring behaviors and events, which can contribute to many relevant applications. However, NSP mining is nontrivial, as it involves fundamental challenges requiring distinct theoretical foundations and is not directly addressable by PSP mining. In the very limited research reported on NSP mining, a negative element constraint (NEC) is incorporated to only consider the NSPs composed of specific forms of elements (containing either positive or negative items), which results in many valuable NSPs being missed. Here, we loosen the NEC (called loose negative element constraint (LNEC)) to include partial negative elements containing both positive and negative items, which enables the discovery of more flexible patterns but incorporates significant new learning challenges, such as representing and mining complete NSPs. Accordingly, we formalize the LNEC-based NSP mining problem and propose a novel vertical NSP mining framework, VM-NSP, to efficiently mine the complete set of NSPs by a vertical representation (VR) of each sequence. An efficient bitmap-based vertical NSP mining algorithm, bM-NSP, introduces a bitmap hash table-based VR and a prefix-based negative sequential candidate generation strategy to optimize the discovery performance. VM-NSP and its implementation bM-NSP form the first VR-based approach for complete NSP mining with LNEC. Theoretical analyses and experiments confirm the performance superiority of bM-NSP on synthetic and real-life datasets w.r.t. diverse data factors, which substantially expands existing NSP mining methods toward flexible NSP discovery.


Information ◽  
2020 ◽  
Vol 11 (1) ◽  
pp. 44
Author(s):  
Scott Buffett

A ubiquitous challenge throughout all areas of data mining, particularly in the mining of frequent patterns in large databases, is centered on the necessity to reduce the time and space required to perform the search. The extent of this reduction proportionally facilitates the ability to identify patterns of interest. High utility sequential pattern mining (HUSPM) seeks to identify frequent patterns that are (1) sequential in nature and (2) hold a significant magnitude of utility in a sequence database, by considering the aspect of item value or importance. While traditional sequential pattern mining relies on the downward closure property to significantly reduce the required search space, with HUSPM, this property does not hold. To address this drawback, an approach is proposed that establishes a tight upper bound on the utility of future candidate sequential patterns by maintaining a list of items that are deemed potential candidates for concatenation. Such candidates are provably the only items that are ever needed for any extension of a given sequential pattern or its descendants in the search tree. This list is then exploited to significantly further tighten the upper bound on the utilities of descendent patterns. An extension of this work is then proposed that significantly reduces the computational cost of updating database utilities each time a candidate item is removed from the list, resulting in a massive reduction in the number of candidate sequential patterns that need to be generated in the search. Sequential pattern mining methods implementing these new techniques for bound reduction and further candidate list reduction are demonstrated via the introduction of the CRUSP and CRUSPPivot algorithms, respectively. Validation of the techniques was conducted on six public datasets. 
Tests show that use of the CRUSP algorithm results in a significant reduction in the overall number of candidate sequential patterns that need to be considered, and subsequently a significant reduction in run time, when compared to the current state of the art in bounding techniques. When employing the CRUSPPivot algorithm, the further reduction in the size of the search space was found to be dramatic, with the reduction in run time found to be dramatic to moderate, depending on the dataset. Demonstrating the practical significance of the work, experiments showed that time required for one particularly complex dataset was reduced from many hours to less than one minute.

