A Proposed Frequent Itemset Discovery Algorithm Based on Item Weights and Uncertainty

Most frequent itemset mining algorithms (FIMA) discover hidden relationships among seemingly unrelated items. They find the most frequent itemsets based only on how often each item occurs in the dataset. These algorithms give all items the same importance and neglect differences in importance among items. They also assume that the data is fully certain, but in most cases real-world data may be uncertain, that is, incomplete and/or imprecise. These two problems are the most common challenges facing FIM algorithms. Some recent algorithms address the two issues separately: some handle item importance only, and others handle uncertainty only. Few algorithms deal with both issues together. In this article, the single scan for weighted itemsets over the uncertain database (SSU-Wfim) algorithm is proposed. It builds on the single scan frequent itemsets algorithm (SS_FIM) and enhances it to deal with weighted items in an uncertain database. SSU-Wfim handles data uncertainty by giving each item in a transaction an additional value that indicates its likelihood of occurrence. It assigns items different values to define their weights. It uses a table called Ptable to store the items and their probability values. This table is used to generate all possible candidate itemsets. The results indicate high performance in terms of runtime, memory consumption, and scalability for SSU-Wfim compared with the UApriori algorithm. The proposed algorithm saves more than 70% of time and memory on all tested datasets.
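The abstract does not give the scoring formula or the layout of Ptable, so the following is only a minimal sketch of how weights and occurrence probabilities might be combined. It assumes the expected-support definition commonly used for uncertain databases (sum over transactions of the product of item probabilities, with items treated as independent) and scales it by the mean item weight. The data, the names `expected_support`, `weighted_expected_support` and `mine`, and the brute-force enumeration are all illustrative assumptions, not the SSU-Wfim procedure.

```python
from itertools import combinations
from math import prod

# Hypothetical uncertain database: each transaction maps item -> existence probability.
transactions = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "c": 1.0},
    {"a": 0.8, "b": 0.5, "c": 0.5},
]
# Hypothetical item weights (importance), each in (0, 1].
weights = {"a": 1.0, "b": 0.5, "c": 0.5}


def expected_support(itemset, db):
    """Sum over transactions of the product of member probabilities (independence assumed)."""
    return sum(
        prod(t[i] for i in itemset)
        for t in db
        if all(i in t for i in itemset)
    )


def weighted_expected_support(itemset, db, w):
    """Expected support scaled by the mean weight of the itemset's items (an assumed convention)."""
    mean_weight = sum(w[i] for i in itemset) / len(itemset)
    return mean_weight * expected_support(itemset, db)


def mine(db, w, min_score, max_size=2):
    """Brute-force enumeration of itemsets up to max_size whose score reaches min_score."""
    items = sorted({i for t in db for i in t})
    result = {}
    for size in range(1, max_size + 1):
        for cand in combinations(items, size):
            score = weighted_expected_support(cand, db, w)
            if score >= min_score:
                result[cand] = score
    return result
```

Calling `mine(transactions, weights, 0.7)` on this toy data keeps only the single items `a` and `c`; the pair `(a, b)` is rejected because its low co-occurrence probability and the low weight of `b` both pull the score down.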

Security and Verification of Server Data Using Frequent Itemset Mining in Ecommerce

International Journal of Synthetic Emotions ◽

10.4018/ijse.2017010103 ◽

2017 ◽

Vol 8 (1) ◽

pp. 31-43

Author(s):

Zuber Shaikh ◽

Antara Mohadikar ◽

Rachana Nayak ◽

Rohith Padamadan

Keyword(s):

Data Mining ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Graphical Password ◽

Itemset Mining ◽

Frequent Item ◽

Data Mining Algorithms ◽

Shoulder Surfing ◽

Mining Algorithms ◽

Frequent Item Sets

Frequent itemsets refer to a set of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. The intended method implements a basic completeness verification and authentication approach in which the client uses a set of frequent itemsets as evidence and checks whether the server has omitted any of these evidence itemsets from its returned result. This helps the client detect an untrusted server, and the system becomes more efficient by reducing time. In the authentication process, CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.
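The completeness check can be illustrated with a small sketch. The abstract does not say how the evidence itemsets are built or verified, so this example assumes the client can recount support on its own copy of the transactions. The function names `support` and `missing_evidence` are hypothetical.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def missing_evidence(evidence, server_result, transactions, min_support):
    """Return evidence itemsets that are truly frequent but absent from the server's answer."""
    returned = {frozenset(s) for s in server_result}
    return [
        e
        for e in map(frozenset, evidence)
        if support(e, transactions) >= min_support and e not in returned
    ]


# Toy data: the server omits the frequent itemset {bread, eggs}.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "eggs"},
]
evidence = [{"milk", "bread"}, {"bread", "eggs"}]
server_result = [{"milk"}, {"bread"}, {"milk", "bread"}]

omitted = missing_evidence(evidence, server_result, transactions, min_support=2)
```

A non-empty `omitted` list signals that the server's result is incomplete with respect to the evidence. An empty list only shows that no evidence itemset was dropped, not that the whole result is complete.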

A Synopsis Based Approach for Itemset Frequency Estimation over Massive Multi-Transaction Stream

ACM Transactions on Knowledge Discovery from Data ◽

10.1145/3465238 ◽

2021 ◽

Vol 16 (2) ◽

pp. 1-30

Author(s):

Guangtao Wang ◽

Gao Cong ◽

Ying Zhang ◽

Zhen Hai ◽

Jieping Ye

Keyword(s):

Frequency Estimation ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Experimental Results ◽

Closure Property ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Minimum Value ◽

Downward Closure ◽

Bounded Size

Streams in which multiple transactions are associated with the same key are prevalent in practice, e.g., a customer has multiple shopping records arriving at different times. Itemset frequency estimation on such streams is very challenging since sampling-based methods, such as the popular reservoir sampling, cannot be used. In this article, we propose a novel k-Minimum Value (KMV) synopsis based method to estimate the frequency of itemsets over multi-transaction streams. First, we extract the KMV synopses for each item from the stream. Then, we propose a novel estimator to estimate the frequency of an itemset over the KMV synopses. Compared to the existing estimator, our method is not only more accurate and more efficient to calculate but also follows the downward-closure property. These properties enable the incorporation of our new estimator into existing frequent itemset mining (FIM) algorithms (e.g., FP-Growth) to mine frequent itemsets over multi-transaction streams. To demonstrate this, we implement a KMV synopsis based FIM algorithm by integrating our estimator into existing FIM algorithms, and we prove it is capable of guaranteeing the accuracy of FIM with a bounded size of KMV synopsis. Experimental results on massive streams show that our estimator can significantly improve the accuracy of both itemset frequency estimation and FIM compared to the existing estimators.
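For readers unfamiliar with KMV synopses, here is a minimal sketch of the classic building block: keep the k smallest hash values of the keys seen, and estimate the number of distinct keys as (k - 1) divided by the k-th smallest hash. This is the textbook KMV estimator, not the paper's new itemset estimator, which the abstract does not specify. The hash choice (SHA-1 mapped to the unit interval) and all function names are assumptions, and for brevity the sketch materialises all hashes rather than maintaining a bounded heap over a stream.

```python
import hashlib
import heapq


def unit_hash(value):
    """Map a value to a pseudo-uniform float in (0, 1] via SHA-1."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def kmv_synopsis(values, k):
    """Return the k smallest distinct hash values, sorted ascending."""
    return heapq.nsmallest(k, {unit_hash(v) for v in values})


def estimate_distinct(synopsis, k):
    """Classic KMV estimate (k - 1) / (k-th smallest hash); exact count if fewer than k seen."""
    if len(synopsis) < k:
        return float(len(synopsis))
    return (k - 1) / synopsis[k - 1]


# Hypothetical per-item usage: the keys (e.g. customer ids) whose records contain one item.
keys_with_item = [f"customer-{i % 10000}" for i in range(30000)]
synopsis = kmv_synopsis(keys_with_item, k=256)
approx_keys = estimate_distinct(synopsis, k=256)
```

Because repeated keys hash to the same value, duplicates from multiple transactions of the same customer do not inflate the estimate, which is the property that makes this kind of synopsis attractive for multi-transaction streams.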

Comparative Analysis on Frequent Itemset Mining Algorithms in Vertically Partitioned Cloud Data

10.1007/978-981-16-4625-6_38 ◽

2021 ◽

pp. 395-402

Author(s):

M. Yogasini ◽

B. N. Prathibha

Keyword(s):

Comparative Analysis ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Cloud Data ◽

Itemset Mining ◽

Mining Algorithms

Frequent Itemset Mining Algorithms—A Literature Survey

10.1007/978-981-16-2422-3_13 ◽

2021 ◽

pp. 159-166

Author(s):

M. Sinthuja ◽

D. Evangeline ◽

S. Pravinth Raja ◽

G. Shanmugarathinam

Keyword(s):

Literature Survey ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithms

DMA

International Journal of Data Warehousing and Mining ◽

10.4018/ijdwm.2013100104 ◽

2013 ◽

Vol 9 (4) ◽

pp. 62-75 ◽

Cited By ~ 4

Author(s):

Damla Oguz ◽

Baris Yildiz ◽

Belgin Ergenc

Keyword(s):

Performance Evaluation ◽

Evaluation Study ◽

Frequent Itemsets ◽

Current Work ◽

Apriori Algorithm ◽

Itemset Mining ◽

Support Threshold ◽

Benchmark Datasets ◽

Mining Algorithms

Updates on an operational database bring forth the challenge of keeping the frequent itemsets up to date without re-running the itemset mining algorithms. Studies on dynamic itemset mining, which is the solution to such an update problem, have to address challenges such as handling i) updates without re-running the base algorithm, ii) changes in the support threshold, iii) new items and iv) additions/deletions in updates. The study in this paper extends the Incremental Matrix Apriori Algorithm, which proposes solutions to the first three challenges while inheriting the advantages of the base algorithm, which works without candidate generation. In their current work, the authors have improved the former algorithm to handle updates that are composed of additions and deletions. The authors have also carried out a detailed performance evaluation study on one real and two benchmark datasets.

Mining Rare Patterns by Using Automated Threshold Support

International Journal of Engineering & Technology ◽

10.14419/ijet.v7i3.8.15225 ◽

2018 ◽

Vol 7 (3.8) ◽

pp. 77

Author(s):

Prof. Mangesh Ghonge ◽

Miss Neha Rane

Keyword(s):

Pattern Mining ◽

Vital Role ◽

Frequent Itemset ◽

Projection Property ◽

Itemset Mining ◽

Huge Data ◽

Information Method ◽

Automated Support ◽

Mining Algorithms ◽

Crucial Part

The most basic and crucial part of data mining is pattern mining. Itemset mining plays a vital role in finding important correlations in the data. Earlier, itemset mining was used to find the most frequently occurring items in a dataset. In some situations, items whose utility value is below the threshold must still be located because they are of great use. Assigning a weight to each item makes pattern mining more effective. Different mining algorithms are used to obtain correlations among data items based on how frequently the items occur in the dataset. Frequent itemset mining obtains items that occur frequently, whereas infrequent itemset mining obtains items that occur very rarely. Finding such rare data is harder than finding data that occurs frequently. Frequent Itemset Mining (FISM) locates large, frequent itemsets in huge data such as market baskets. Such data has two properties that are not addressed by FISM: the mixture property and the projection property. The proposed system combines both the mixture and the projection property and additionally provides automated support thresholds.

Apriori-based frequent itemset mining algorithms on MapReduce

Proceedings of the 6th International Conference on Ubiquitous Information Management and Communication - ICUIMC '12 ◽

10.1145/2184751.2184842 ◽

2012 ◽

Cited By ~ 107

Author(s):

Ming-Yen Lin ◽

Pei-Yu Lee ◽

Sue-Chen Hsueh

Keyword(s):

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Mining Algorithms

Data Mining Itemset of Big Data Using Pre-Processing Based on Mapreduce FrameWork with ETL Tools

APTIKOM Journal on Computer Science and Information Technologies ◽

10.11591/aptikom.j.csit.103 ◽

2017 ◽

Vol 2 (2) ◽

pp. 57-62

Author(s):

Padmanathan Anantharaman ◽

H.V. Ramakrishan

Keyword(s):

Big Data ◽

Clustering Algorithm ◽

Programming Model ◽

Hybrid Approach ◽

Processing Technique ◽

Frequent Itemsets ◽

Frequent Itemset ◽

Frequent Itemset Mining ◽

Itemset Mining ◽

Dataset Size

As data volumes continue to grow, they quickly consume the capacity of data warehouses and application databases. IT organizations can be forced into costly upgrades of expensive databases and data warehouse hardware appliances, and an enormous amount of data is arriving through the Internet of Things (IoT) as technologies advance and people use them in day-to-day activities. This data is termed Big Data, with its own characteristics and challenges. Frequent itemset mining algorithms aim to discover frequent itemsets in a transactional database, but as the dataset size increases, traditional frequent itemset mining cannot handle it. The MapReduce programming model solves the problem of large datasets, but it has a large communication cost, which reduces execution efficiency. The proposed technique applies k-means pre-processing to the BigFIM algorithm. ClustBigFIM uses a hybrid approach: clustering with the k-means algorithm to generate clusters from huge datasets, and Apriori and Eclat to mine frequent itemsets from the generated clusters using the MapReduce programming model. Results show that the execution efficiency of the ClustBigFIM algorithm is increased by applying the k-means clustering algorithm before the BigFIM algorithm as a pre-processing technique.
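Of the components named above, Eclat is the easiest to show in a few lines. The sketch below is a single-machine, in-memory version of the vertical (tidset intersection) idea only. It does not reproduce the k-means pre-processing, the MapReduce distribution, or any ClustBigFIM specifics, and the function names `eclat` and `mine_vertical` are illustrative.

```python
def eclat(prefix, candidates, min_support, out):
    """Depth-first extension; candidates is a list of (item, tidset) pairs, each already frequent."""
    for idx, (item, tids) in enumerate(candidates):
        itemset = prefix + (item,)
        out[itemset] = len(tids)  # support equals the tidset size
        extensions = []
        for other, other_tids in candidates[idx + 1:]:
            shared = tids & other_tids  # transactions containing both
            if len(shared) >= min_support:
                extensions.append((other, shared))
        if extensions:
            eclat(itemset, extensions, min_support, out)
    return out


def mine_vertical(transactions, min_support):
    """Build item -> transaction-id sets, then run the recursive intersection search."""
    tidsets = {}
    for tid, t in enumerate(transactions):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    start = sorted(
        ((i, s) for i, s in tidsets.items() if len(s) >= min_support),
        key=lambda pair: pair[0],
    )
    return eclat((), start, min_support, {})


cluster = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
frequent = mine_vertical(cluster, min_support=2)
```

Here `cluster` stands in for the transactions of one generated cluster. In a distributed setting each such call would presumably run inside a mapper or reducer, but that wiring is outside what the abstract describes.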

INDEX-MAXMINER: A NEW MAXIMAL FREQUENT ITEMSET MINING ALGORITHM

International Journal of Artificial Intelligence Tools ◽

10.1142/s021821300800390x ◽

2008 ◽

Vol 17 (02) ◽

pp. 303-320 ◽

Cited By ~ 11

Author(s):

WEI SONG ◽

BINGRU YANG ◽

ZHANGYAN XU

Keyword(s):

Search Strategy ◽

Search Space ◽

Frequent Itemset ◽

Breadth First Search ◽

Hybrid Search ◽

Itemset Mining ◽

Frequent Item ◽

Enumeration Tree ◽

Frequent Items ◽

Mining Algorithms

Because of the inherent computational complexity, mining the complete set of frequent item-sets in dense datasets remains a challenging task. Mining Maximal Frequent Item-sets (MFI) is an alternative that addresses the problem. The Set-Enumeration Tree (SET) is a common data structure used in several MFI mining algorithms. For this kind of algorithm, the process of mining MFI's can also be viewed as the process of searching in the set-enumeration tree. To reduce the search space, in this paper, a new algorithm, Index-MaxMiner, for mining MFI is proposed by employing a hybrid search strategy blending breadth-first and depth-first search. First, an index array is proposed, and a bitmap-based algorithm for computing the index array is presented. By adding a subsume index to frequent items, Index-MaxMiner discovers the candidate MFI's using breadth-first search at one time, which avoids first-level nodes that would not participate in the answer set and drastically reduces the number of candidate itemsets. Then, for candidate MFI's, a depth-first search strategy is used to generate all MFI's. Thus, the jumping search in SET is implemented, and the search space is reduced greatly. The experimental results show that the proposed algorithm is efficient, especially for dense datasets.
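To make the notion of a maximal frequent itemset concrete, the following sketch enumerates frequent itemsets level by level and then keeps only those with no frequent proper superset. It is a brute-force illustration of the definition on toy data, not Index-MaxMiner: there is no index array, bitmap, or hybrid search here, and the function names are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise brute force: all itemsets contained in at least min_support transactions."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        level = [
            frozenset(c)
            for c in combinations(items, size)
            if sum(1 for t in transactions if set(c) <= t) >= min_support
        ]
        if not level:
            break  # downward closure: no larger itemset can be frequent
        frequent.extend(level)
    return frequent


def maximal_only(frequent):
    """Keep itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]


dense = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "d"}, {"d"}]
all_frequent = frequent_itemsets(dense, min_support=2)
maximal = maximal_only(all_frequent)
```

On this data eight itemsets are frequent but only `{a, b, c}` and `{d}` are maximal, which shows why reporting MFI's is a much more compact answer on dense data.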

Modern Applications and Challenges for Rare Itemset Mining

International Journal of Machine Learning and Computing ◽

10.18178/ijmlc.2021.11.3.1037 ◽

2021 ◽

Vol 11 (3) ◽

pp. 208-218

Author(s):

Sadeq Darrab ◽

David Broneske ◽

Gunter Saake

Keyword(s):

Data Mining ◽

Real Life ◽

The State ◽

Frequent Itemset ◽

Future Research ◽

Comprehensive Overview ◽

Itemset Mining ◽

Equipment Failures ◽

Mining Algorithms ◽

Rare Itemsets

Data mining is the process of extracting useful unknown knowledge from large datasets. Frequent itemset mining is a fundamental task of data mining that aims at discovering interesting itemsets that frequently appear together in a dataset. However, mining infrequent (rare) itemsets may be more interesting in many real-life applications such as predicting telecommunication equipment failures, genetics, medical diagnosis, or anomaly detection. In this paper, we survey up-to-date methods of rare itemset mining. The main goal of this survey is to provide a comprehensive overview of the state-of-the-art algorithms of rare itemset mining and its applications. The main contributions of this survey can be summarized as follows. In the first part, we define the task of rare itemset mining by explaining key concepts and terminology, motivating examples, and comparisons with underlying concepts. Then, we highlight the state-of-the-art methods for rare itemset mining. Furthermore, we present variations of the task of rare itemset mining to discuss limitations of traditional rare itemset mining algorithms. After that, we highlight the fundamental applications of rare itemset mining. Finally, we point out research opportunities and challenges for rare itemset mining for future research.
