scholarly journals Modern Applications and Challenges for Rare Itemset Mining

2021 ◽  
Vol 11 (3) ◽  
pp. 208-218
Author(s):  
Sadeq Darrab ◽  
◽  
David Broneske ◽  
Gunter Saake

Data mining is the process of extracting useful unknown knowledge from large datasets. Frequent itemset mining is the fundamental task of data mining that aims at discovering interesting itemsets that frequently appear together in a dataset. However, mining infrequent (rare) itemsets may be more interesting in many real-life applications such as predicting telecommunication equipment failures, genetics, medical diagnosis, or anomaly detection. In this paper, we survey up-to-date methods of rare itemset mining. The main goal of this survey is to provide a comprehensive overview of the state-of-the-art algorithms of rare itemset mining and its applications. The main contributions of this survey can be summarized as follows. In the first part, we define the task of rare itemset mining by explaining key concepts and terminology, motivation examples, and comparisons with underlying concepts. Then, we highlight the state-of-art methods for rare itemsets mining. Furthermore, we present variations of the task of rare itemset mining to discuss limitations of traditional rare itemset mining algorithms. After that, we highlight the fundamental applications of rare itemset mining. In the last, we point out research opportunities and challenges for rare itemset mining for future research.

2017 ◽  
Vol 8 (1) ◽  
pp. 31-43
Author(s):  
Zuber Shaikh ◽  
Antara Mohadikar ◽  
Rachana Nayak ◽  
Rohith Padamadan

Frequent itemsets refer to a set of data values (e.g., product items) whose number of co-occurrences exceeds a given threshold. The challenge is that the design of proofs and verification objects has to be customized for different data mining algorithms. Intended method will implement a basic idea of completeness verification and authentication approach in which the client will uses a set of frequent item sets as the evidence, and checks whether the server has missed any frequent item set as evidence in its returned result. It will help client detect untrusted server and system will become much more efficiency by reducing time. In authentication process CaRP is both a captcha and a graphical password scheme. CaRP addresses a number of security problems altogether, such as online guessing attacks, relay attacks, and, if combined with dual-view technologies, shoulder-surfing attacks.


2018 ◽  
Vol 7 (2.28) ◽  
pp. 197
Author(s):  
W A.W.A. Bakar ◽  
M A. Jalil ◽  
M Man ◽  
Z Abdullah ◽  
F Mohd

Frequent itemset mining is a major field in data mining techniques. This is because it deals with usual and normal occurrences of set of items in a database transaction. Originated from market basket analysis, frequent itemset generation may lead to the formulation of association rule as to derive correlation or patterns.  Association rule mining still remains as one of the most prominent areas in data mining that aims to extract interesting correlations, frequent patterns, association or casual structures among set of items in the transaction databases. Underlying structure of association rules mining algorithms are based upon horizontal or vertical data formats. These two data formats have been widely discussed by showing few examples of algorithm of each data formats. The works on horizontal approaches suffer in many candidate generation and multiple database scans that contributes to higher memory consumptions. In response to improve on horizontal approach, the works on vertical approaches are established. Eclat algorithm is one example of algorithm in vertical approach database format. Motivated to its ‘fast intersection’, in this paper, we review and analyze the fundamental Eclat and Eclat-variants such as tidset, diffset, and sortdiffset. In response to vertical data format and as a continuity to Eclat extension, we propose a postdiffset algorithm as a new member in Eclat variants that use tidset format in the first looping and diffset in the later looping. We present the performance of postdiffset results in time execution as to indicate some improvements has been achieved in frequent itemset mining. 


2021 ◽  
Author(s):  
Martha ◽  
Ramdas Vankdothu ◽  
Hameed Mohd Abdul ◽  
Rekha Gangula

Abstract The revolution in technology for storing and processing big data leads to data intensive computing as a new paradigm. To find the valuable and precise big data knowledge, efficient and scalable data mining techniques are required. In data mining, different techniques are applied depending on the kind of knowledge to be mined. Association rules are generated from the frequent itemsets computed by frequent itemset mining (FIM) algorithms. The problem of designing scalable and efficient frequent itemset mining algorithms on the Spark RDD framework. The research done in this thesis aims to improve the performance (in terms of execution time) of the existing Spark-based frequent itemset mining algorithms and efficiently re-design other frequent itemset mining algorithms on Spark. The particular problem of interest is re-designing the Eclat algorithm in the distributed computing environment of the Spark. The paper proposes and implements a parallel Eclat algorithm using the Spark RDD architecture, dubbed RDD-Eclat. EclatV1 is the earliest version, followed by EclatV2, EclatV3, EclatV4, and EclatV5. Each version is the consequence of a different technique and heuristic being applied to the preceding variant. Following EclatV1, the filtered transaction technique is used, followed by heuristics for equivalence class partitioning in EclatV4 and EclatV5. EclatV2 and EclatV3 are slightly different algorithmically, as are EclatV4 and EclatV5. Experiments on synthetic and real-world datasets.


2017 ◽  
Vol 16 (06) ◽  
pp. 1549-1579 ◽  
Author(s):  
Jerry Chun-Wei Lin ◽  
Wensheng Gan ◽  
Philippe Fournier-Viger ◽  
Tzung-Pei Hong ◽  
Han-Chieh Chao

Frequent itemset mining (FIM) is a fundamental set of techniques used to discover useful and meaningful relationships between items in transaction databases. In recent decades, extensions of FIM such as weighted frequent itemset mining (WFIM) and frequent itemset mining in uncertain databases (UFIM) have been proposed. WFIM considers that items may have different weight/importance. It can thus discover itemsets that are more useful and meaningful by ignoring irrelevant itemsets with lower weights. UFIM takes into account that data collected in a real-life environment may often be inaccurate, imprecise, or incomplete. Recently, these two ideas have been combined in the HEWI-Uapriori algorithm. This latter considers both item weights and transaction uncertainty to mine the high expected weighted itemsets (HEWIs) using a two-phase Apriori-based approach. Although the upper-bound proposed in HEWI-Uapriori can reduce the size of the search space, it still generates a large amount of candidates and uses a level-wise search. In this paper, a more efficient algorithm named HEWI-Utree is developed to efficiently mine HEWIs without performing multiple database scans and without generating candidates. This algorithm relies on three novel structures named element (E)-table, weighted-probability (WP)-table and WP-tree to maintain the information required for identifying and pruning unpromising itemsets early. Experimental results show that the proposed algorithm is generally much more efficient than traditional methods for WFIM and UFIM, as well as the state-of-the-art HEWI-Uapriori algorithm, in terms of runtime, memory consumption, and scalability.


Author(s):  
Arun Thotapalli Sundararaman

Study of data quality for data mining application has always been a complex topic; in the recent years, this topic has gained further complexity with the advent of big data as the source for data mining and business intelligence (BI) applications. In a big data environment, data is consumed in various states and various forms serving as input for data mining, and this is the main source of added complexity. These new complexities and challenges arise from the underlying dimensions of big data (volume, variety, velocity, and value) together with the ability to consume data at various stages of transition from raw data to standardized datasets. These have created a need for expanding the traditional data quality (DQ) factors into BDQ (big data quality) factors besides the need for new BDQ assessment and measurement frameworks for data mining and BI applications. However, very limited advancement has been made in research and industry in the topic of BDQ and their relevance and criticality for data mining and BI applications. Data quality in data mining refers to the quality of the patterns or results of the models built using mining algorithms. DQ for data mining in business intelligence applications should be aligned with the objectives of the BI application. Objective measures, training/modeling approaches, and subjective measures are three major approaches that exist to measure DQ for data mining. However, there is no agreement yet on definitions or measurements or interpretations of DQ for data mining. Defining the factors of DQ for data mining and their measurement for a BI system has been one of the major challenges for researchers as well as practitioners. This chapter provides an overview of existing research in the area of BDQ definitions and measurement for data mining for BI, analyzes the gaps therein, and provides a direction for future research and practice in this area.


2021 ◽  
pp. 159-166
Author(s):  
M. Sinthuja ◽  
D. Evangeline ◽  
S. Pravinth Raja ◽  
G. Shanmugarathinam

2018 ◽  
Vol 7 (3.8) ◽  
pp. 77
Author(s):  
Prof. Mangesh Ghonge ◽  
Miss Neha Rane

Essentially the most primary and crucial part of data mining is pattern mining. For acquiring important corre-lations among the information, method called itemset mining plays vital role Earlier, the notion of itemset mining was used to acquire the absolute most often occurring items in the itemset. In some situation, though having utility value less than threshold it is necessary to locate such items because they are of great use. Considering the thought of weight for each and every apparent items brings effectiveness for mining the pattern efficiently. Different mining algorithms are utilized to obtain the correlations among the information items based on frequency with the items in the dataset occurs. In frequent itemset, those things which occurs frequently whereas, in infrequent itemset the items that occur very rarely are obtained. Determining such form of data is tougher than to locate data which occurs frequently. Frequent Itemset Mining (FISM) locates large and frequent itemsets in huge data for example market baskets. Such data has two properties that are not addressed by FISM; Mixture property and projection property. Here the proposed system combines both mixture as well as projection property further providing automated support thresholds.  


2018 ◽  
Vol 23 (3) ◽  
pp. 294-311 ◽  
Author(s):  
Christian Pieter Hoffmann ◽  
Sandra Tietz ◽  
Kerstin Hammann

PurposeThe purpose of this paper is to present a comprehensive, interdisciplinary review of international investor relations (IR) research published since 1990. It highlights the development of IR research, its disciplinary foundations and key areas of inquiry. Research is shown to reflect the rising importance of IR as a corporate communications function, its interdisciplinary character, and the recognition of its contribution to strategic management.Design/methodology/approachFindings are based on an interdisciplinary systematic literature review focusing on peer-reviewed journal articles published in English since 1990.FindingsThe authors differentiate five strands of research focusing on the organization, strategy, instruments, content and effects of IR. IR research is shown to have strong roots in the business and management, accounting and communications literature. The authors document a rising interest in the topic and a steady development beyond descriptive accounts of the function to distinctive lines of inquiry. The authors summarize the state of the field and derive a number of suggestions for future research.Research limitations/implicationsThe review is limited in scope to the applied research process, including the choice of keywords, databases as well as peer-reviewed journal publications published in English since 1990.Originality/valueThis study contributes to the necessary structuration and consolidation of the emergent field of IR research by identify salient perspectives and common subfields. It provides both a comprehensive overview of the state of research and specific suggestions for future endeavors.


Sign in / Sign up

Export Citation Format

Share Document