An Intelligent Decision in Smart Systems Using A Weighted Frequent Itemset Mining Algorithm

Intelligent decision-making is a key technology of smart systems, and data mining plays an increasingly important role in decision-making activities. Introducing weights means that weighted frequent itemsets no longer satisfy the downward closure property. As a result, the search space of frequent itemsets cannot be narrowed using downward closure, which leads to poor time efficiency. In this paper, the weight judgment downward closure property for weighted frequent itemsets and the existence property of weighted frequent subsets are first introduced and proved. The Fuzzy-based WARM satisfies the downward closure property and prunes insignificant rules by assigning weights to itemsets, which reduces computation and execution time. This paper presents an Enhanced Fuzzy-based Weighted Association Rule Mining (E-FWARM) algorithm for efficient mining of frequent itemsets. A pre-filtering method is applied to the input dataset to remove items with low variance. Data discretization is then performed, and E-FWARM is applied to mine the frequent itemsets. The experimental results show that the proposed E-FWARM algorithm yields more frequent items and association rules, higher accuracy, and lower execution time than the existing algorithms.
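
The loss of downward closure under weighting can be illustrated with a small Python sketch. It is not the E-FWARM algorithm: the toy transactions, the item weights, and the definition of weighted support (mean item weight times plain support) are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy data: item weights and transactions are made up.
WEIGHTS = {"a": 0.1, "b": 0.9, "c": 0.8}
TRANSACTIONS = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Assumed definition: mean item weight times plain support."""
    mean_weight = sum(weights[i] for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


def closure_violations(transactions, weights, min_wsup):
    """Return (subset, superset) pairs where the superset is weighted-frequent
    but the subset is not, i.e. where downward closure fails."""
    items = sorted(weights)
    violations = []
    for size in range(2, len(items) + 1):
        for superset in combinations(items, size):
            if weighted_support(superset, transactions, weights) < min_wsup:
                continue
            for sub_size in range(1, size):
                for subset in combinations(superset, sub_size):
                    if weighted_support(subset, transactions, weights) < min_wsup:
                        violations.append((subset, superset))
    return violations
```

With a threshold of 0.15, `closure_violations(TRANSACTIONS, WEIGHTS, 0.15)` reports that the low-weight item `a` is not weighted-frequent although `{a, b}` and `{a, c}` are, so pruning a candidate because one of its subsets is infrequent would wrongly discard them.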

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling-based methods, such as the popular reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopsis for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove that it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
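
For orientation, the basic KMV idea can be sketched in Python as follows. This shows only the standard KMV distinct-count estimate, (k - 1) divided by the k-th smallest hash value, and not the itemset estimator proposed in the article. The SHA-1 based hash and the function names are assumptions made for illustration.

```python
import hashlib
import heapq


def unit_hash(key):
    """Map a key to a pseudo-uniform float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_synopsis(keys, k):
    """Scan `keys` once and keep the k smallest distinct hash values."""
    heap = []        # max-heap via negated values, holds at most k hashes
    members = set()  # hash values currently stored in the heap
    for key in keys:
        h = unit_hash(key)
        if h in members:
            continue  # repeated key, e.g. another transaction of one customer
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    return sorted(members)


def estimate_distinct(synopsis, k):
    """Standard KMV estimate of the number of distinct keys."""
    if len(synopsis) < k:
        # Fewer than k distinct keys were seen, so the count is exact.
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]
```

Because repeated keys hash to the same value, several transactions from one customer count once, which is the property that makes such a synopsis attractive for multi-transaction streams. The synopsis size is bounded by k regardless of stream length.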

A Systematic Survey on High Utility Itemset Mining

International Journal of Information Technology & Decision Making ◽

10.1142/s0219622019300027 ◽

2019 ◽

Vol 18 (04) ◽

pp. 1113-1185 ◽

Cited By ~ 2

Author(s):

Bahareh Rahmati ◽

Mohammad Karim Sohrabi

Keyword(s):

Data Structures ◽

Search Space ◽

Frequent Itemset ◽

Itemset Mining ◽

Efficient Data ◽

Average Utility ◽

High Utility ◽

High Utility Itemsets ◽

Downward Closure ◽

Efficient Data Structures

High utility itemset mining considers the unit profits and quantities of items in a transaction database to extract more applicable and more useful association rules. The downward closure property, which enables significant pruning in frequent itemset mining, does not hold for the utility of itemsets, so the mining problem requires alternative solutions to reduce its search space and improve its efficiency. The main solutions are to use an anti-monotonic upper bound of the utility function and to exploit efficient data structures that store and compact the dataset so that pruning strategies can be applied efficiently. Different mining methods and techniques have attempted to improve the performance of extracting high utility itemsets and their several variants, including high-average utility itemsets, top-k high utility itemsets, and high utility itemsets with negative values, using more efficient data structures, more appropriate anti-monotonic upper bounds, and stronger pruning strategies. This paper aims to present a comprehensive systematic review of high utility itemset mining techniques and to classify them based on their problem-solving approaches.
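
A minimal Python sketch, with made-up profits and transactions, shows why utility is not downward closed and how an anti-monotonic upper bound restores pruning. The bound used here is the transaction-weighted utilization (TWU), one well-known choice; the survey covers others, and the data and function names are assumptions for illustration.

```python
# Hypothetical unit profits and transactions (item -> purchased quantity).
PROFIT = {"a": 5, "b": 2, "c": 1}
DB = [
    {"a": 1, "b": 2},
    {"b": 4, "c": 3},
    {"a": 2, "c": 1},
    {"a": 1, "b": 1, "c": 1},
]


def transaction_utility(tx):
    """Total profit of one transaction: sum of quantity * unit profit."""
    return sum(qty * PROFIT[item] for item, qty in tx.items())


def utility(itemset, db):
    """Profit contributed by the itemset's own items, summed over the
    transactions that contain the whole itemset."""
    return sum(
        sum(tx[item] * PROFIT[item] for item in itemset)
        for tx in db
        if all(item in tx for item in itemset)
    )


def twu(itemset, db):
    """Transaction-weighted utilization: total utility of every transaction
    containing the itemset. It never underestimates utility() and can only
    shrink when the itemset grows."""
    return sum(
        transaction_utility(tx)
        for tx in db
        if all(item in tx for item in itemset)
    )
```

On this data `utility(("c",), DB)` is 5 while `utility(("a", "c"), DB)` is 17, so a superset can have higher utility than its subset. In contrast, `twu` only decreases as items are added, so any itemset whose TWU falls below the minimum utility threshold can be pruned together with all of its supersets.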

An Efficient Method for Frequent Itemset Mining on Temporal Data

International Journal of Scientific Research in Computer Science Engineering and Information Technology ◽

10.32628/cseit1953162 ◽

2019 ◽

pp. 558-568

Author(s):

Fathima Sherin T K ◽

Anish Kumar B.

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Edge Density ◽

Time Interval ◽

Related Data ◽

Itemset Mining ◽

A Value

Frequent itemset mining (FIM) is a data mining task that extracts frequent itemsets from a database. Existing methods for finding frequent itemsets assume that datasets are static and that the induced rules are relevant across the entire dataset. However, this is not the case when the data is temporal, that is, when it contains time-related information that changes the data mining results. Patterns may hold during all intervals or only some of them. To restrict the time intervals, frequent itemset mining with time cubes is proposed to handle time hierarchies in the mining process. This is how patterns that occur periodically, within a time interval, or both are recognized. This paper therefore focuses on developing an efficient algorithm to mine frequent itemsets and their associated time intervals from a transactional database by extending the Apriori algorithm based on support, with density as an additional threshold. Density is proposed to deal with the problem of overestimated time periods and to ensure the validity of the patterns found. As an extension of the existing framework, the density rate and minimum threshold, which were previously user-defined parameters, are generated dynamically. In addition, computation time is compared between partitioned and unpartitioned datasets, which shows that computation time is lower with the partitioning technique.

Quality-Oriented Study on Mapping Island Model Genetic Algorithm onto CUDA GPU

Symmetry ◽

10.3390/sym11030318 ◽

2019 ◽

Vol 11 (3) ◽

pp. 318 ◽

Cited By ~ 1

Author(s):

Xue Sun ◽

Ping Chou ◽

Chao-Chin Wu ◽

Liang-Rui Chen

Keyword(s):

Genetic Algorithm ◽

Execution Time ◽

Large Scale ◽

Parallel Architecture ◽

Computation Time ◽

Search Space ◽

Island Model ◽

Process Unit ◽

Np Hard ◽

Solution Quality

Genetic algorithm (GA), a global search method, has widespread applications in various fields. One very promising variant of GA is the island model GA (IMGA), which introduces the key idea of migration to explore a wider search space. Migration exchanges chromosomes between islands, resulting in better-quality solutions. However, IMGA takes a long time to solve large-scale NP-hard problems. To shorten the computation time, the modern graphics processing unit (GPU), a highly parallel architecture, has been widely adopted to accelerate algorithms for NP-hard problems. However, most previous studies on GPUs focus on performance only, because the solution qualities found by the CPU and GPU implementations of the same method are exactly the same. Therefore, previous work usually did not report on solution quality. In this paper, we investigate how to find a better solution within a reasonable time when parallelizing IMGA on GPU, and we take the UA-FLP as a study example. Firstly, we propose an efficient parallel tournament selection operator on GPU to achieve better solution quality in a shorter amount of time. Secondly, we focus on how to tune three important parameters of IMGA to obtain a better solution efficiently: the number of islands, the number of generations, and the number of chromosomes. In particular, different parameters have different impacts on solution quality improvement and execution time increment. We address the challenge of how to trade off solution quality against execution time for these parameters. Finally, experiments and statistical analyses are conducted to help researchers set parameters more efficiently to obtain better solutions when GPUs are used to accelerate IMGA.
It has been observed that the order of influence on solution quality is: the number of chromosomes, the number of generations, and the number of islands, which can guide users to obtain better solutions efficiently with a moderate increment of execution time. Furthermore, if we give higher priority to reducing execution time on GPU, the quality of the best solution can be improved by about 3%, with an acceleration of 29 times over the CPU counterpart, after applying our suggested parameter settings. However, if we give solution quality a higher priority, i.e., the GPU execution time is close to the CPU's, the solution quality can be improved by up to 8%.
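
The roles of the three tuned parameters (islands, generations, chromosomes) and of migration can be seen in a small sequential Python sketch. It is not the paper's CUDA implementation and does not solve the UA-FLP. The toy OneMax objective, the ring migration policy, the elitist replacement, and all parameter values are assumptions chosen for illustration.

```python
import random


def fitness(chrom):
    """Toy OneMax objective (assumed here): number of 1-bits."""
    return sum(chrom)


def tournament(pop, rng, size=2):
    """Return the fittest of `size` randomly drawn chromosomes."""
    return max(rng.sample(pop, size), key=fitness)


def next_generation(pop, rng, mutation_rate):
    """Elitist step: keep the best, fill the rest by crossover and mutation."""
    new_pop = [max(pop, key=fitness)]
    while len(new_pop) < len(pop):
        p1, p2 = tournament(pop, rng), tournament(pop, rng)
        cut = rng.randrange(1, len(p1))  # one-point crossover position
        child = p1[:cut] + p2[cut:]
        child = [bit ^ 1 if rng.random() < mutation_rate else bit for bit in child]
        new_pop.append(child)
    return new_pop


def migrate(islands):
    """Ring migration: each island's best replaces the next island's worst."""
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = list(bests[i - 1])  # index -1 wraps around the ring


def island_ga(n_islands=4, pop_size=20, n_bits=32, generations=40,
              migration_interval=5, mutation_rate=0.02, seed=0):
    """Run the island model and return the best fitness after each generation."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    history = []
    for gen in range(1, generations + 1):
        islands = [next_generation(pop, rng, mutation_rate) for pop in islands]
        if gen % migration_interval == 0:
            migrate(islands)
        history.append(max(fitness(c) for pop in islands for c in pop))
    return history
```

Each island evolves independently between migrations, which is what makes the model a natural fit for parallel hardware: in a GPU setting the per-island loop is the part that would run concurrently. Varying `n_islands`, `generations`, and `pop_size` in this sketch gives a rough feel for the quality versus time trade-off the abstract describes.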

HIGH CANDIDATES GENERATION: A NEW EFFICIENT METHOD FOR MINING SHARE-FREQUENT PATTERNS

Jurnal Teknologi ◽

10.11113/jt.v79.10292 ◽

2017 ◽

Vol 79 (7) ◽

Author(s):

Chayanan Nawapornanan ◽

Sarun Intakosum ◽

Veera Boonjing

Keyword(s):

Data Mining ◽

Efficient Method ◽

Execution Time ◽

Experimental Results ◽

Closure Property ◽

Frequent Patterns ◽

Research Issue ◽

Important Research ◽

Useful Knowledge ◽

Downward Closure

Share-frequent pattern mining is more practical than traditional frequent pattern mining because it can reflect useful knowledge such as the total costs and profits of patterns. Mining share-frequent patterns has become one of the most important research issues in data mining. However, previous algorithms extract a large number of candidates and spend a lot of time generating and testing useless candidates in the mining process. This paper proposes a new efficient method for discovering share-frequent patterns. The new method reduces the number of candidates by generating candidates only from high transaction-measure-value patterns. The downward closure property of transaction-measure-value patterns ensures the correctness of the proposed method. Experimental results on dense and sparse datasets show that the proposed method is very efficient in terms of execution time. It also decreases the number of useless candidates generated in the mining process by at least 70%.

Optimized Incremental Mining of Customer Buying Behavior using Temporal Association Rules

International Journal of Innovative Technology and Exploring Engineering - Special Issue ◽

10.35940/ijitee.l2482.1081219 ◽

2019 ◽

Vol 8 (12) ◽

pp. 5853-5861

Keyword(s):

Association Rules ◽

Execution Time ◽

Search Space ◽

Frequent Itemsets ◽

Customer Behavior ◽

Temporal Association ◽

Incremental Mining ◽

Buying Behavior ◽

Memory Space ◽

Variant Database

In the retail business, customer behavior analytics is the study of customers' buying behavior, undertaken to better understand customer needs and provide service accordingly. Buying behavior is largely influenced by the preferences of a customer. However, a customer's preferences change over time due to various factors such as changes in income, taste, culture, or newer products. Understanding these changes in customer behavior is a very challenging task, especially in a dynamic, ever-changing environment. Various customer behavior mining models and techniques are available in the data mining domain that are designed to work on static and dynamic databases. Traditional incremental mining techniques consider all the previous datasets in order to update the patterns. However, in a dynamic database, the size of the database grows with every update. To mine customers' behavior in a time-variant database, the updated database must be re-mined, which further increases the processing cost in terms of execution time and memory space with every update. The purpose of this paper is to propose a method that can analyze the changes in customers' behavior in time-variant databases without mining all the transactions. In this paper, an optimized incremental technique is proposed that utilizes temporal association rule mining in a time-variant database for mining customer behavioral patterns in an updated database. The proposed algorithm, named 'Autoregressive Moving Average model-based Incremental Temporal Association Rules Mining (ARMA-ITARM)', utilizes the ARMA model to substantially reduce the database and maintains temporal frequent patterns in the updated database. Inspired by the sliding window and pre-large concepts, the algorithm utilizes past frequent itemsets and probable frequent itemsets from customers' purchase history along with frequent itemsets and probable frequent itemsets that reduce the search space.
Consequently, the entire database is scanned only once to count the frequency of occurrence of a few candidate itemsets. As a result, the execution time and memory requirements of the algorithm are very small. Experimental results demonstrate that the proposed technique performs better than recent techniques such as ITARM and SWF.

Deriving Frequent Itemsets from Lossless Condensed Representation

International Journal of Engineering and Advanced Technology - Regular Issue ◽

10.35940/ijeat.b4438.029320 ◽

2020 ◽

Vol 9 (3) ◽

pp. 209-214

Keyword(s):

Execution Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Result ◽

Memory Usage ◽

Major Research ◽

Condensed Representation ◽

Time Usage ◽

Multiple Scan ◽

Closed Itemsets

In data mining, frequent itemset mining (FIM) is a major research topic. Mining frequent itemsets (FIs) usually generates a large number of itemsets from a database, which causes high memory usage and long execution time. Frequent Closed Itemsets (FCIs) and Frequent Maximal Itemsets (FMIs) are reduced lossless representations of frequent itemsets. FCIs decrease memory usage and execution time compared to FMIs. The complete set of FIs can be derived from FCIs and FMIs with the correct methods. Various studies have presented efficient approaches for mining FCIs and FMIs. In view of this, we propose an algorithm called DCFI-Mine to efficiently derive FIs from FCIs, and an algorithm called RFMI to derive FIs from FMIs. DCFI-Mine has two advantages. First, efficiency: unlike existing algorithms, which tend to generate an enormous number of itemsets during processing, DCFI-Mine processes the itemsets directly without candidate generation. In the proposed RFMI, by contrast, multiple scans occur when searching for item supports, so it is less efficient than DCFI-Mine. Second, losslessness: DCFI-Mine and RFMI can discover the complete set of frequent itemsets without loss. Experimental results show that DCFI-Mine performs best at deriving FIs in terms of memory usage and execution time.
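
The reason closed itemsets are a lossless representation is that the support of any frequent itemset equals the largest support among the closed itemsets that contain it. The Python sketch below applies that rule by brute-force subset enumeration. It is not the DCFI-Mine algorithm, and the function name and input layout are assumptions for illustration.

```python
from itertools import combinations


def derive_frequent_itemsets(closed):
    """Expand {closed itemset: support} into {every frequent itemset: support}.

    The support of an itemset is the largest support among the closed
    itemsets that contain it.
    """
    frequent = {}
    for itemset, sup in closed.items():
        items = sorted(itemset)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                key = frozenset(subset)
                frequent[key] = max(sup, frequent.get(key, 0))
    return frequent


# Closed itemsets of the toy database {a,b,c}, {a,b}, {a,c}, {a}
# at a minimum support count of 1.
CLOSED = {
    frozenset("a"): 4,
    frozenset("ab"): 2,
    frozenset("ac"): 2,
    frozenset("abc"): 1,
}
```

Calling `derive_frequent_itemsets(CLOSED)` recovers all seven frequent itemsets with their exact supports, for example support 2 for `{b}` (inherited from `{a, b}`) and support 1 for `{b, c}` (inherited from `{a, b, c}`). The enumeration is exponential in the size of the largest closed itemset, so it serves only to show the principle.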

A Weighted Frequent Itemsets Mining Algorithm for Intelligent Decision in Smart Systems

Lecture Notes in Electrical Engineering - Advances in Smart Grid and Renewable Energy ◽

10.1007/978-981-15-7511-2_71 ◽

2021 ◽

pp. 695-702

Author(s):

P. Gopinath Reddy ◽

P. Avinash ◽

A. Velmurugan

Keyword(s):

Frequent Itemsets ◽

Smart Systems ◽

Mining Algorithm ◽

Intelligent Decision ◽

Frequent Itemsets Mining

EOBAA: Enhanced Ontology Based Alignment Algorithm for Mining Frequent Patterns

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.12.15908 ◽

2018 ◽

Vol 7 (3.12) ◽

pp. 157

Author(s):

D Srinivasa Rao ◽

V Sucharitha ◽

K V.V Satyanarayana

Keyword(s):

Real Time ◽

High Performance ◽

Computation Time ◽

Search Space ◽

Search Tree ◽

Frequent Itemsets ◽

Alignment Algorithm ◽

Frequent Patterns ◽

Pruning Strategy ◽

Real Time Applications

Frequent pattern mining is widely used in many applications such as supermarkets, diagnostics, and other real-time applications. The performance of an algorithm is evaluated based on its computation. Computing frequent patterns is very tedious. Many algorithms and techniques have been implemented and studied to produce high-performance algorithms such as PrePost+, which employs the N-list to represent itemsets and directly discovers frequent itemsets using a set-enumeration search tree. However, due to its pruning strategy, its computation time for processing the search space is known to be high. It enumerates all itemsets from the datasets exhaustively and does not sort them by utility; it provides only statistical evidence of the most recurring itemsets. This paper proposes the Enhanced Ontology Based Alignment Algorithm (EOBAA) to identify, extract, and sort the high utility itemsets (HUIs) from the frequent itemsets (FIs). To improve the similarity measure, the proposed system adopts cosine similarity. Experiments conducted on 1 real dataset show the performance of EOBAA in terms of computation time and accuracy.

A Parallel Apriori Algorithm and FP- Growth Based on SPARK

ITM Web of Conferences ◽

10.1051/itmconf/20214003046 ◽

2021 ◽

Vol 40 ◽

pp. 03046

Author(s):

Priyanka Gupta ◽

Vinaya Sawant

Keyword(s):

Data Mining ◽

Computation Time ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Distributed Data ◽

Apriori Algorithm ◽

Multiple Datasets ◽

Computation Technique ◽

Real World Applications ◽

Spark Framework

Frequent itemset mining is an important data mining task in real-world applications. Distributed parallel Apriori and FP-Growth are the most important data mining algorithms for finding frequent itemsets. Originally, frequent itemset mining was implemented with MapReduce-based algorithms on Hadoop. Hadoop was introduced to handle big data, but its implementation does not meet expectations for parallel distributed data mining algorithms because of its high disk I/O. According to research, Spark uses an in-memory computation technique that gives faster results than Hadoop, which makes it well suited to parallel data-processing algorithms. The algorithms are run on multiple datasets to find the frequent itemsets and obtain accurate computation-time results. In this paper, we propose parallel Apriori and FP-Growth algorithms that find frequent itemsets on multiple datasets using the Apache Spark framework. Our experimental results depend on the chosen support value.
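
As background for the parallel variants, the level-wise Apriori procedure can be sketched on a single machine in plain Python. This is not the Spark-based implementation described in the paper; the function name and the use of absolute support counts are assumptions for illustration. In a distributed setting, the support-counting step is typically the part that gets spread across workers.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset: support count} for all itemsets whose count
    is at least `min_support`."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: n for s, n in counts.items() if n >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (downward closure): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count step: one pass over the transactions per level.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: n for s, n in counts.items() if n >= min_support}
        frequent.update(current)
        k += 1
    return frequent
```

For the five transactions `{a,b,c}`, `{a,b}`, `{a,c}`, `{b,c}`, `{a,b,c}` and a minimum support count of 3, the sketch returns the three single items with count 4 and the three pairs with count 3; `{a,b,c}` occurs only twice and is dropped.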
