On-Shelf Utility Mining of Sequence Data

Utility mining has emerged as an important and interesting topic owing to its wide application and considerable popularity. However, conventional utility mining methods are biased toward items with longer on-shelf time, since such items have a greater chance of generating high utility. To eliminate this bias, the problem of on-shelf utility mining (OSUM) was introduced. In this article, we focus on OSUM of sequence data, where the sequential database is divided into several partitions according to time periods, and items are associated with utilities and several on-shelf time periods. To address the problem, we propose two methods, OSUM of sequence data (OSUMS) and OSUMS+, to extract on-shelf high-utility sequential patterns. For further efficiency, we also design several strategies that reduce the search space and avoid redundant calculation, using two upper bounds: time prefix extension utility (TPEU) and time reduced sequence utility (TRSU). In addition, two novel data structures are developed to facilitate the calculation of upper bounds and utilities. Extensive experiments on real and synthetic datasets show that the two methods outperform the state-of-the-art algorithm. In conclusion, OSUMS may consume a large amount of memory and is unsuitable for cases with limited memory, while OSUMS+ has wider real-life applications owing to its high efficiency.
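The bias described above can be made concrete with a small sketch. It is not the OSUMS algorithm: the function name, the data layout, and the choice of dividing a pattern's utility by the total utility of the periods in which it was on shelf are all assumptions made for illustration.

```python
def relative_utility(pattern_utility_by_period, period_total_utility, on_shelf_periods):
    """Utility of a pattern relative to the periods in which it was on shelf.

    All three arguments are hypothetical structures for this sketch:
    - pattern_utility_by_period: {period: utility the pattern earned}
    - period_total_utility: {period: total utility of the whole partition}
    - on_shelf_periods: periods in which every item of the pattern was on shelf
    """
    gained = sum(pattern_utility_by_period.get(p, 0) for p in on_shelf_periods)
    available = sum(period_total_utility[p] for p in on_shelf_periods)
    return gained / available if available else 0.0


period_total = {1: 100, 2: 100, 3: 100}

# Pattern A was on shelf in all three periods, pattern B only in period 3.
a = relative_utility({1: 10, 2: 10, 3: 10}, period_total, [1, 2, 3])
b = relative_utility({3: 20}, period_total, [3])
```

In absolute terms A earns more than B (30 versus 20), yet B performs better relative to the time it was actually available, which is the kind of pattern a fixed absolute threshold would tend to miss.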


e-HUNSR: An Efficient Algorithm for Mining High Utility Negative Sequential Rules

Symmetry ◽

10.3390/sym12081211 ◽

2020 ◽

Vol 12 (8) ◽

pp. 1211

Author(s):

Mengjiao Zhang ◽

Tiantian Xu ◽

Zhao Li ◽

Xiqing Han ◽

Xiangjun Dong

Keyword(s):

Decision Making ◽

Real Life ◽

The Other ◽

Utility Value ◽

Science Data ◽

Related Information ◽

Sequential Rule ◽

Pruning Strategy ◽

High Utility ◽

Synthetic Datasets

As an important technology in computer science, data mining aims to mine hidden, previously unknown, and potentially valuable patterns from databases. High utility negative sequential rule (HUNSR) mining can provide more comprehensive decision-making information than high utility sequential rule (HUSR) mining by taking non-occurring events into account. HUNSR mining is much more difficult than HUSR mining because of two key intrinsic complexities. One is how to define the HUNSR mining problem, and the other is how to calculate the antecedent's local utility value in a HUNSR, a key issue in calculating the utility-confidence of the HUNSR. To address these complexities, we propose a comprehensive algorithm called e-HUNSR, with the following contributions. (1) We formalize the problem of HUNSR mining by proposing a series of concepts. (2) We propose a novel data structure to store the related information of an HUNSR candidate (HUNSRC) and a method to efficiently calculate the local utility value and utility of the HUNSRC's antecedent. (3) We propose an efficient method to generate HUNSRCs based on high utility negative sequential patterns (HUNSPs) and a pruning strategy to prune meaningless HUNSRCs. To the best of our knowledge, e-HUNSR is the first algorithm to efficiently mine HUNSRs. The experimental results on two real-life and 12 synthetic datasets show that e-HUNSR is very efficient.


Mining Dense Periodic Patterns in Time Series Databases

Temporal and Spatio-Temporal Data Mining ◽

10.4018/978-1-59904-387-6.ch003 ◽

2008 ◽

pp. 44-62

Author(s):

Wynne Hsu ◽

Mong Li Lee ◽

Junmei Wang

Keyword(s):

Time Series ◽

Pattern Mining ◽

Real Life ◽

Search Space ◽

Detection Algorithm ◽

Limited Range ◽

Periodic Pattern ◽

Periodic Patterns ◽

Pruning Strategy ◽

Synthetic Datasets

In this chapter, we describe a new periodicity detection algorithm to efficiently discover short period patterns that may exist in only a limited range of the time series. We refer to these patterns as the dense periodic patterns, where the periodicity is focused on part of the time series. We present a dense periodic pattern mining algorithm called DPMiner to find dense periodic patterns, and design a pruning strategy to limit the search space to the feasible periods. Experimental results on both real-life and synthetic datasets indicate that DPMiner is both scalable and efficient.


Utility Mining Across Multi-Dimensional Sequences

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3446938 ◽

2021 ◽

Vol 15 (5) ◽

pp. 1-24

Author(s):

Wensheng Gan ◽

Jerry Chun-Wei Lin ◽

Jiexiong Zhang ◽

Hongzhi Yin ◽

Philippe Fournier-Viger ◽

...

Keyword(s):

Sequence Data ◽

Real Life ◽

Auxiliary Information ◽

Route Planning ◽

Quantitative Information ◽

Time Dependent ◽

Utility Mining ◽

Dependent Sequence ◽

Wide Range ◽

Targeted Marketing

Knowledge extraction from databases is a fundamental task in the database and data mining communities, and it has been applied to a wide range of real-world applications and situations. Unlike support-based mining models, the utility-oriented mining framework integrates utility theory to provide more informative and useful patterns. Time-dependent sequence data are commonly seen in real life. Sequence data have been widely utilized in many applications, such as analyzing sequential user behavior on the Web, influence maximization, route planning, and targeted marketing. Unfortunately, all the existing algorithms lose sight of the fact that the processed data not only contain rich features (e.g., occurrence quantity, risk, and profit), but may also be associated with multi-dimensional auxiliary information; for example, a transaction sequence can be associated with purchaser profile information. In this article, we first formulate the problem of utility mining across multi-dimensional sequences, and propose a novel framework named MDUS to extract Multi-Dimensional Utility-oriented Sequential useful patterns. To the best of our knowledge, this is the first study that incorporates the time-dependent sequence order, quantitative information, utility factor, and auxiliary dimension. Two algorithms, named MDUS_EM and MDUS_SD respectively, are presented to address the formulated problem. The former algorithm is based on database transformation, and the latter performs pattern joins and a searching method to identify desired patterns across multi-dimensional sequences. Extensive experiments are carried out on six real-life datasets and one synthetic dataset to show that the proposed algorithms can effectively and efficiently discover useful knowledge from multi-dimensional sequential databases. Moreover, the MDUS framework can provide better insight, and it is more adaptable to real-life situations than existing models.


An Efficient Algorithm for Extracting High-Utility Hierarchical Sequential Patterns

Wireless Communications and Mobile Computing ◽

10.1155/2020/8816228 ◽

2020 ◽

Vol 2020 ◽

pp. 1-12

Author(s):

Chunkai Zhang ◽

Zilin Du ◽

Yiwen Zu

Keyword(s):

Pattern Mining ◽

Search Space ◽

Sequential Pattern Mining ◽

Sequential Pattern ◽

Sequential Patterns ◽

Second Phase ◽

Two Phase ◽

High Utility ◽

Synthetic Datasets ◽

Hierarchical Relation

High-utility sequential pattern mining (HUSPM) is an emerging topic in data mining, where utility is used to measure the importance or weight of a sequence. However, the underlying informative knowledge of the hierarchical relation between different items is ignored in HUSPM, which prevents HUSPM from extracting more interesting patterns. In this paper, we incorporate the hierarchical relation of items into HUSPM and propose a two-phase algorithm, MHUH, the first algorithm for high-utility hierarchical sequential pattern mining (HUHSPM). In the first phase, named Extension, we use our previously proposed algorithm FHUSpan to efficiently mine the general high-utility sequences (g-sequences); in the second phase, named Replacement, we mine the special high-utility sequences with the hierarchical relation (s-sequences) as high-utility hierarchical sequential patterns from the g-sequences. To further improve efficiency, MHUH adopts several strategies, such as Reduction, FGS, and PBS, and a novel upper bound, TSWU, which can greatly reduce the search space. Substantial experiments were conducted on both real and synthetic datasets to assess the performance of MHUH in terms of runtime, number of patterns, and scalability. The experiments show that MHUH efficiently extracts more interesting patterns with underlying informative knowledge in HUHSPM.


Mining Top-k Regular High-Utility Itemsets in Transactional Databases

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2019010104 ◽

2019 ◽

Vol 15 (1) ◽

pp. 58-79 ◽

Cited By ~ 1

Author(s):

P. Lalitha Kumari ◽

S. G. Sanjeevi ◽

T.V. Madhusudhana Rao

Keyword(s):

High Efficiency ◽

Threshold Value ◽

Search Space ◽

List Structure ◽

High Profit ◽

Transactional Databases ◽

High Utility ◽

High Utility Itemsets ◽

Pruning Techniques ◽

Novel Algorithm

Mining high-utility itemsets is an important task in data mining. It involves an exponential search space and returns a very large number of high-utility itemsets. In a real-time scenario, it is often sufficient to mine a small number of high-utility itemsets based on user-specified interestingness. Recently, the temporal regularity of an itemset has been considered an important interestingness criterion for many applications. Methods for finding regular high-utility itemsets suffer from the difficulty of setting the threshold value. To address this problem, a novel algorithm called TKRHU (Top-k Regular High Utility Itemset) Miner is proposed to mine the top-k high-utility itemsets that appear regularly, where k represents the desired number of regular high-utility itemsets. A novel list structure, RUL, and efficient pruning techniques for reducing the search space are developed to discover the top-k regular itemsets with high profit. Experimental results show that the proposed algorithm, using the novel list structure, achieves high efficiency in terms of runtime and space.
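The general top-k idea, replacing a user-set threshold with one that rises as better itemsets are found, can be sketched with a min-heap. This is a generic illustration and not the TKRHU Miner or its RUL structure; the function name and the candidate format are hypothetical, and the candidates are assumed to have already passed a regularity check.

```python
import heapq


def top_k_by_utility(candidates, k):
    """Keep the k highest-utility candidates seen so far.

    `candidates` is an iterable of (itemset_name, utility) pairs, a
    hypothetical format. The returned threshold is the smallest utility
    among the kept candidates; a miner could use it to prune any itemset
    whose utility upper bound falls below it.
    """
    heap = []       # min-heap of (utility, itemset_name)
    threshold = 0   # stays 0 until k candidates have been collected
    for name, utility in candidates:
        if len(heap) < k:
            heapq.heappush(heap, (utility, name))
        elif utility > heap[0][0]:
            heapq.heapreplace(heap, (utility, name))
        if len(heap) == k:
            threshold = heap[0][0]
    return sorted(heap, reverse=True), threshold


result, threshold = top_k_by_utility(
    [("ab", 40), ("c", 15), ("bd", 55), ("acd", 30)], k=2
)
```

With k = 2 the internal threshold rises from 15 to 40 as better itemsets arrive, so the user never has to choose it.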


Mining High Utility Itemsets with Hill Climbing and Simulated Annealing

ACM Transactions on Management Information Systems ◽

10.1145/3462636 ◽

2022 ◽

Vol 13 (1) ◽

pp. 1-22

Author(s):

M. Saqib Nawaz ◽

Philippe Fournier-Viger ◽

Unil Yun ◽

Youxi Wu ◽

Wei Song

Keyword(s):

Simulated Annealing ◽

Heuristic Algorithms ◽

Real Life ◽

Search Space ◽

Population Diversity ◽

Hill Climbing ◽

Target Values ◽

High Utility ◽

High Utility Itemsets ◽

Search Space Pruning

High utility itemset mining (HUIM) is the task of finding all sets of items, purchased together, that generate a high profit in a transaction database. In the past, several algorithms have been developed to mine high utility itemsets (HUIs). However, most of them cannot properly handle the exponential search space when the size of the database and the total number of items increase. Recently, evolutionary and heuristic algorithms were designed to mine HUIs, which provided considerable performance improvement. However, they can still have a long runtime, and some may miss many HUIs. To address this problem, this article proposes two algorithms for HUIM based on Hill Climbing (HUIM-HC) and Simulated Annealing (HUIM-SA). Both algorithms transform the input database into a bitmap for efficient utility computation and for search space pruning. To improve population diversity, HUIs discovered by evolution are used as target values for the next population instead of keeping the current optimal values in the next population. Through experiments on real-life datasets, it was found that the proposed algorithms are faster than state-of-the-art heuristic and evolutionary HUIM algorithms, that HUIM-SA discovers similar HUIs, and that HUIM-SA evolves linearly with the number of iterations.
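A minimal sketch of the two ingredients named above, a bitmap for utility computation and a hill-climbing search, is given below. It is plain steepest-ascent hill climbing over a toy database, not the authors' HUIM-HC; the database, the names `DB`, `BITMAP`, `utility`, and `hill_climb`, and the neighbourhood (toggle one item) are assumptions for illustration.

```python
# Hypothetical toy database: one dict per transaction, item -> utility.
DB = [
    {"a": 5, "b": 2},
    {"a": 5, "c": 6},
    {"a": 5, "b": 2, "c": 6},
    {"b": 2},
]
ITEMS = sorted({item for transaction in DB for item in transaction})

# Bitmap: for each item, an integer whose bit `tid` is set when
# transaction `tid` contains the item.
BITMAP = {
    item: sum(1 << tid for tid, t in enumerate(DB) if item in t)
    for item in ITEMS
}


def utility(itemset):
    """Sum the itemset's utility over the transactions containing all its items."""
    if not itemset:
        return 0
    cover = (1 << len(DB)) - 1
    for item in itemset:
        cover &= BITMAP[item]  # bitwise AND keeps only common transactions
    return sum(
        sum(DB[tid][item] for item in itemset)
        for tid in range(len(DB))
        if (cover >> tid) & 1
    )


def hill_climb(start):
    """Move to the best neighbour (one item toggled) until none improves."""
    current = frozenset(start)
    while True:
        neighbours = [current ^ {item} for item in ITEMS]
        best = max(neighbours, key=utility)
        if utility(best) <= utility(current):
            return current
        current = best


found = hill_climb({"b"})
```

Starting from {b}, the search passes through {a, b} and {a} before stopping at {a, c}. Like any hill climber it can stop at a local optimum, which is the usual motivation for adding simulated annealing.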


Mining High Utility Sequential Patterns with Negative Item Values

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001417500355 ◽

2017 ◽

Vol 31 (10) ◽

pp. 1750035 ◽

Cited By ~ 8

Author(s):

Tiantian Xu ◽

Xiangjun Dong ◽

Jianliang Xu ◽

Xue Dong

Keyword(s):

Real Life ◽

Search Space ◽

Sequential Patterns ◽

Negative Item ◽

Novel Method ◽

Complete Set ◽

High Utility ◽

High Utility Itemsets ◽

Positive Return ◽

Pruning Methods

High utility sequential patterns (HUSP) are sequential patterns with high utility (such as profit), and they play a crucial role in many real-life applications. Relevant studies of HUSP only consider positive values of sequence utility. In some applications, however, a sequence contains items with negative values (NIV). For example, a supermarket sells a cartridge at a negative profit in a package with a printer that yields a higher positive return. Although a few methods have been proposed to mine high utility itemsets (HUI) with NIV, they are not suitable for mining HUSP with NIV because an item may occur more than once in a sequence and its utility may have multiple values. In this paper, we propose a novel method, High Utility Sequential Patterns with Negative Item Values (HUSP-NIV), to efficiently mine HUSP with NIV from sequential utility-based databases. HUSP-NIV works as follows: (1) using the lexicographic quantitative sequence tree (LQS-tree) to extract the complete set of high utility sequences and using I-Concatenation and S-Concatenation mechanisms to generate newly concatenated sequences; (2) using three pruning methods to reduce the search space in the LQS-tree; (3) traversing the LQS-tree and outputting all the high utility sequential patterns. To the best of our knowledge, HUSP-NIV is the first method to mine HUSP with NIV, and it is shown to be efficient on both synthetic and real datasets.
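The difficulty mentioned above, that a pattern can occur several times in one sequence with different utilities, some of them negative, can be illustrated as follows. The sketch assumes the common convention of taking the maximum utility over all occurrences. It is not the HUSP-NIV algorithm or its LQS-tree, and the function name, the restriction to single-item pattern elements, and the data are hypothetical.

```python
def max_utility(pattern, qseq):
    """Maximum utility over all occurrences of `pattern` in `qseq`.

    `pattern` is a list of items, each matched in a strictly later itemset
    than the previous one. `qseq` is a list of {item: utility} itemsets, and
    utilities may be negative. Returns None when the pattern does not occur.
    """
    best = {0: 0}  # number of pattern items matched -> best utility so far
    for itemset in qseq:
        updated = dict(best)
        for matched, total in best.items():
            if matched < len(pattern) and pattern[matched] in itemset:
                candidate = total + itemset[pattern[matched]]
                if matched + 1 not in updated or candidate > updated[matched + 1]:
                    updated[matched + 1] = candidate
        best = updated
    return best.get(len(pattern))


# Hypothetical sequence: the cartridge is sold at a loss in the first two
# itemsets and at a profit in the last one.
qseq = [
    {"printer": 50, "cartridge": -5},
    {"cartridge": -5},
    {"printer": 40},
    {"cartridge": 8},
]
u = max_utility(["printer", "cartridge"], qseq)
```

The pattern occurs three times here, with utilities 45, 58, and 48, so the value depends on which occurrence is counted; the sketch keeps the largest.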


Mining High Utility Sequential Patterns Using Multiple Minimum Utility

International Journal of Pattern Recognition and Artificial Intelligence ◽

10.1142/s0218001418590176 ◽

2018 ◽

Vol 32 (10) ◽

pp. 1859017 ◽

Cited By ~ 1

Author(s):

Tiantian Xu ◽

Jianliang Xu ◽

Xiangjun Dong

Keyword(s):

Real Life ◽

Search Space ◽

Experimental Results ◽

Sequential Patterns ◽

Other Information ◽

Novel Method ◽

Complete Set ◽

High Utility ◽

High Utility Itemsets ◽

Pruning Methods

High utility sequential pattern (HUSP) mining has recently received a lot of attention from researchers. Many algorithms have been proposed to mine HUSP, and most of them use only a single minimum utility, which implicitly assumes that all items in the database have the same importance (measured, for example, by profit or by other information of interest to users). This is often not the case in real-life applications. Although a few methods have been proposed to mine high utility itemsets (HUI) with multiple minimum utility (MMU), they are not suitable for mining HUSP with MMU because an item may occur more than once in a sequence and may have multiple utility values. In this paper, we propose a novel method, called HUSpan-MMU, to efficiently mine HUSP with MMU from sequential utility-based databases. A lexicographic quantitative sequence tree (LQS-tree) is used to extract the complete set of HUSP. Meanwhile, two pruning methods are used to reduce the search space in the LQS-tree. Experimental results on both synthetic and real datasets show that HUSpan-MMU can efficiently mine HUSP with MMU from utility-based databases.
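One way to picture multiple minimum utilities is to give every item its own threshold and derive a pattern's threshold from its items. The sketch below assumes the pattern threshold is the smallest threshold among its items, a convention used in parts of the MMU literature; the abstract does not state which definition HUSpan-MMU uses, and all names and values are hypothetical.

```python
def pattern_threshold(pattern, item_min_utility):
    """Threshold of a sequential pattern under per-item minimum utilities.

    `pattern` is a list of itemsets (tuples of items). Assumption: the
    pattern's threshold is the smallest minimum utility among its items.
    """
    return min(item_min_utility[item] for itemset in pattern for item in itemset)


def is_high_utility(pattern, utility, item_min_utility):
    """Compare a pattern's utility with its own derived threshold."""
    return utility >= pattern_threshold(pattern, item_min_utility)


# Hypothetical per-item thresholds: "a" is a cheap item, "b" an expensive one.
mmu = {"a": 20, "b": 50, "c": 35}

with_a = is_high_utility([("a",), ("b", "c")], 30, mmu)
without_a = is_high_utility([("b",), ("c",)], 30, mmu)
```

A utility of 30 qualifies the pattern containing the low-threshold item "a" but not the pattern made only of "b" and "c", which a single global threshold could not express.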


Legal document recommendation system: A cluster based pairwise similarity computation

Journal of Intelligent & Fuzzy Systems ◽

10.3233/jifs-189871 ◽

2021 ◽

pp. 1-13

Author(s):

Jenish Dhanani ◽

Rupa Mehta ◽

Dipti Rana

Keyword(s):

Recommender Systems ◽

Recommendation System ◽

Real Life ◽

Citation Network ◽

Search Space ◽

Pairwise Similarity ◽

Large Numbers ◽

Legal Document ◽

Legal Domain ◽

Similarity Scores

Legal practitioners analyze relevant previous judgments to prepare favorable and advantageous arguments for an ongoing case. In the legal domain, recommender systems (RS) effectively identify and recommend referentially and/or semantically relevant judgments. Because of the enormous number of judgments available, an RS needs to compute pairwise similarity scores for all unique judgment pairs in advance in order to minimize the recommendation response time. This practice introduces a scalability issue, as the number of pairs to be computed increases quadratically with the number of judgments, i.e., O(n^2). However, only a limited number of pairs exhibit strong relevance between the judgments, so computing similarities for pairs with trivial relevance is unnecessary. To address the scalability issue, this research proposes a novel graph-clustering-based Legal Document Recommendation System (LDRS) that forms clusters of referentially similar judgments and finds semantically relevant judgments within those clusters. Pairwise similarity scores are therefore computed within each cluster, restricting the search space to the cluster instead of the entire corpus. Thus, the proposed LDRS substantially reduces the number of similarity computations, which enables large numbers of judgments to be handled. It exploits the highly scalable Louvain approach to cluster the judgment citation network, and Doc2Vec to capture the semantic relevance among judgments within a cluster. The efficacy and efficiency of the proposed LDRS are evaluated and analyzed using a large set of real-life judgments of the Supreme Court of India. The experimental results demonstrate the encouraging performance of the proposed LDRS in terms of accuracy, F1 scores, MCC scores, and computational complexity, which validates its applicability for scalable recommender systems.
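The saving from restricting comparisons to clusters follows from simple counting. The sketch below uses made-up corpus and cluster sizes, not figures from the paper, and the helper name is hypothetical.

```python
def pair_count(n):
    """Number of unique unordered pairs among n documents: n(n-1)/2."""
    return n * (n - 1) // 2


# Hypothetical corpus of 10,000 judgments split into four equal clusters.
corpus_pairs = pair_count(10_000)
cluster_sizes = [2_500, 2_500, 2_500, 2_500]
clustered_pairs = sum(pair_count(size) for size in cluster_sizes)
```

Here the whole corpus needs 49,995,000 comparisons, while the four clusters together need 12,495,000, roughly a quarter. With more and smaller clusters the total shrinks further, at the cost of never comparing judgments that fall in different clusters.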


A Family of Efficient Sloshing Liquid Dampers for Suppression of Wind-Induced Instabilities

Journal of Vibration and Control ◽

10.1177/107754603030773 ◽

2003 ◽

Vol 9 (3-4) ◽

pp. 361-386 ◽

Cited By ~ 9

Author(s):

V. J. Modi ◽

A. Akinturk ◽

W. Tse

Keyword(s):

Energy Dissipation ◽

High Efficiency ◽

Damping Ratio ◽

Real Life ◽

Wind Tunnel Test ◽

Tall Buildings ◽

Two Phases ◽

Liquid Dampers ◽

Obstacle Geometry ◽

Floating Particles

Bluff structures in the form of tall buildings, smokestacks, control towers, bridges, etc., are susceptible to vortex resonance and galloping type of instabilities. One approach to vibration control of such systems is through energy dissipation using sloshing liquid dampers. In this paper we focus on enhancing the energy dissipation efficiency of a rectangular liquid damper through the introduction of two-dimensional obstacles as well as floating particles. The investigation has two phases. To begin with, a parametric free vibration study aimed at the optimization of the obstacle geometry is undertaken to arrive at configurations promising increased damping ratio and hence higher energy dissipation. The study is complemented by an extensive wind tunnel test program, which substantiates the effectiveness of this class of damper in suppressing both vortex resonance and galloping type of instabilities. Simplicity of design, ease of implementation, minimal maintenance, reliability as well as high efficiency make such liquid dampers quite attractive for real-life applications.
