Combining Data-Driven and User-Driven Evaluation Measures to Identify Interesting Rules

Author(s):  
Solange Oliveira Rezende ◽  
Edson Augusto Melanda ◽  
Magaly Lika Fujimoto ◽  
Roberta Akemi Sinoara ◽  
Veronica Oliveira de Carvalho

Association rule mining is a data mining task applied to many real-world problems. However, because of the huge number of association rules that can be generated, the knowledge post-processing phase becomes complex and challenging. Several evaluation measures can be used in this phase to assist users in finding interesting rules. These measures, which can be divided into data-driven (objective) and user-driven (subjective) measures, are first discussed and then analyzed for their pros and cons. A new methodology that combines them is presented, aiming to exploit the advantages of each kind of measure and to make user participation easier. In this way, data-driven measures can be used to select some potentially interesting rules for the user’s evaluation. These rules, together with the knowledge obtained during the evaluation, can then be used to calculate user-driven measures, which aid the user in identifying interesting rules. An approach that applies our methodology to identify interesting rules is described, along with an exploratory environment and a case study showing that the proposed methodology is feasible. Interesting results were obtained. At the end of the chapter, trends related to the subject are discussed.
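The data-driven measures mentioned above can be illustrated with a minimal sketch. The transactions, items, and thresholds below are illustrative assumptions, not data from the chapter; the measure definitions (support, confidence, lift) are the standard ones:

```python
# Toy transaction database; in practice these come from the mined data set.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "coffee"},
    {"bread", "milk"},
    {"bread", "butter", "coffee"},
]

def support(itemset):
    """Fraction of transactions containing every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_measures(antecedent, consequent):
    """Common data-driven (objective) measures for a rule antecedent -> consequent."""
    sup_a = support(antecedent)
    sup_c = support(consequent)
    sup_ac = support(antecedent | consequent)
    confidence = sup_ac / sup_a          # P(consequent | antecedent)
    lift = confidence / sup_c            # >1 means positive correlation
    return {"support": sup_ac, "confidence": confidence, "lift": lift}

m = rule_measures({"bread"}, {"butter"})  # {'support': 0.6, 'confidence': 0.75, 'lift': 1.25}
```

Rules ranked highly by such measures would then be the candidates shown to the user, whose feedback feeds the user-driven measures.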

Author(s):  
Carson Kai-Sang Leung

The problem of association rule mining was introduced in 1993 (Agrawal et al., 1993). Since then, it has been the subject of numerous studies. Most of these studies focused on either performance issues or functionality issues. The former considered how to compute association rules efficiently, whereas the latter considered what kinds of rules to compute. Examples of the former include the Apriori-based mining framework (Agrawal & Srikant, 1994), its performance enhancements (Park et al., 1997; Leung et al., 2002), and the tree-based mining framework (Han et al., 2000); examples of the latter include extensions of the initial notion of association rules to other rules such as dependence rules (Silverstein et al., 1998) and ratio rules (Korn et al., 1998). Most of these studies, however, considered the data mining exercise in isolation. They did not explore how data mining can interact with the human user, which is a key component in the broader picture of knowledge discovery in databases. Hence, they provided little or no support for user focus. Consequently, the user usually needs to wait for a long period of time to get numerous association rules, out of which only a small fraction may be interesting. In other words, the user often incurs a high computational cost that is disproportionate to what they want to get. This calls for constraint-based association rule mining.
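The idea of constraint-based mining is to push a user constraint into the mining loop so that uninteresting candidates are pruned early rather than filtered afterwards. The sketch below is a deliberately tiny Apriori-style miner with an anti-monotone budget constraint; the transactions, prices, and thresholds are illustrative assumptions:

```python
# Toy data: transactions, per-item prices, and a user constraint that the
# total price of an itemset must stay within a budget (anti-monotone:
# if a set violates it, every superset does too, so it can be pruned early).
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "c"},
]
price = {"a": 10, "b": 20, "c": 5, "d": 50}
BUDGET = 40
MIN_SUP = 2  # absolute support threshold

def satisfies_constraint(itemset):
    return sum(price[i] for i in itemset) <= BUDGET

def apriori_with_constraint():
    """Level-wise mining with the constraint pushed into candidate pruning."""
    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items
             if satisfies_constraint({i})
             and sum(i in t for t in transactions) >= MIN_SUP]
    frequent = list(level)
    k = 2
    while level:
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Constraint check before the (expensive) support count prunes work.
        level = [c for c in candidates
                 if satisfies_constraint(c)
                 and sum(c <= t for t in transactions) >= MIN_SUP]
        frequent.extend(level)
        k += 1
    return frequent
```

Here the expensive item "d" is discarded at the first level, so no superset containing it is ever counted; this is the computational saving that motivates constraint-based mining.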


Author(s):  
W. Robert Daasch

Abstract The subject of this paper is statistical post-processing of wafer-sort test data. Statistical post-processing (SPP) has successfully separated many of the effects of defects from normal wafer-to-wafer variation. The data-driven method is used with parametric data such as IDDQ, minVDD, and others. The neighboring die are used to form an estimate of a die’s expected value. The resulting SPP residual has smaller variance than the original measurement variance and filters out most of the spatial patterns that obscure data outliers within normal variation. The method is applicable to a wide variety of process parameter variation issues of concern to both the test and failure analysis (FA) communities.
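The neighbour-estimate idea can be sketched as follows. The wafer map, the smooth spatial gradient, and the injected outlier below are synthetic illustrative assumptions; the core step, subtracting the median of each die's neighbours so that smooth spatial patterns cancel in the residual, follows the paper's description:

```python
import numpy as np

# Hypothetical 8x8 wafer map of a parametric measurement (e.g., IDDQ per die).
rng = np.random.default_rng(0)
wafer = 1.0 + 0.05 * rng.standard_normal((8, 8))   # normal die-to-die variation
wafer += np.linspace(0.0, 0.3, 8)[:, None]         # smooth spatial gradient
wafer[3, 4] += 1.0                                 # one defective die (outlier)

def spp_residual(wafer):
    """Residual of each die against the median of its (up to 8) neighbours.

    The smooth spatial pattern is shared with the neighbours and cancels,
    leaving the outlier exposed against a much tighter distribution.
    """
    rows, cols = wafer.shape
    resid = np.zeros_like(wafer)
    for r in range(rows):
        for c in range(cols):
            neigh = [wafer[i, j]
                     for i in range(max(r - 1, 0), min(r + 2, rows))
                     for j in range(max(c - 1, 0), min(c + 2, cols))
                     if (i, j) != (r, c)]
            resid[r, c] = wafer[r, c] - np.median(neigh)
    return resid

resid = spp_residual(wafer)
```

Using the median rather than the mean keeps the neighbour estimate robust when the outlier itself sits inside a neighbourhood.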


2019 ◽  
Vol 3 (1) ◽  
pp. 70-83
Author(s):  
Jennifer Clara Herrmann ◽  
Tim Van Wesemael ◽  
Hervé Caralp ◽  
Jorge Ricardo Nova Blanco

Abstract In this opinion piece we argue for combining data-centered hackathons with transdisciplinarity to better understand wicked problems such as food insecurity. Hackathons represent unique opportunities for answering previously identified and consequently well-defined questions in the context of high-dimensional data. However, the possibilities for providing participants with extensive and potentially quintessential background knowledge, and for enabling them to develop a shared understanding of the explicit and implicit meanings of variables associated with the respective problem, are limited. Thus, the inherently difficult step of deriving realistic strategic implications from provided or otherwise available data is further aggravated. In the context of this evident void, a format combining transdisciplinary and data-driven approaches could be promising. In such a format, quantitative (and preferably unbiased) analyses could run in parallel with qualitative, concept-centered analyses, enabling a synergistic and incremental understanding of both the relationships between model variables and the meaning of those variables themselves. Furthermore, transdisciplinary approaches are fundamentally stakeholder-focused. Such an approach could thus not only support the development of strategic recommendations concerning the chosen problem but also facilitate stakeholder engagement, which is central to ensuring that proposed strategies are realistic, implementable, and accepted. Food insecurity represents a prime example of a complex, multidimensional problem of extreme urgency. Despite the availability of a myriad of data relating to several aspects of food insecurity, including data on transport networks, food policy decisions, and climate change, grasping the phenomenon of food insecurity in its entirety remains challenging.
Given its relevance and complexity and the amount of related data available, food insecurity represents an ideal challenge for exploring the feasibility of combining data-driven and transdisciplinary approaches. Therefore, during the 2018–2019 academic year, a group of students organized a hackathon around food insecurity and drew inspiration from that hackathon to write a challenge document to be taken up during the ‘Transdisciplinary Insights’ Honours Programme of the 2019–2020 academic year.


2020 ◽  
Vol 10 (2) ◽  
pp. 167-174
Author(s):  
Nadide Gizem Akgülgil Mutlu

Since the term ‘big data’ came onto the scene, it has left almost no industry unaffected. Even the art world has taken advantage of its benefits. One of the youngest art forms, cinema, has started using analytics to predict audiences and their tastes through data mining. In addition to online platforms such as Netflix and Amazon Prime, which operate on a different basis, the industry itself has evolved into a new phase that uses AI in the pre-production, production, post-production, and distribution phases. This paper examines software such as Cinelytic, ScriptBook, and LargoAI, and their working strategies, to understand the role of directors and producers in digital-era film-making. The research aims to assess the capabilities of data-driven movie-making techniques and, accordingly, makes a number of predictions about the role of human beings in the production of an artwork and analyses the role of the software. The research also investigates the pros and cons of using big data in the film-making industry. Keywords: artificial intelligence, cinema, data mining, film-making.


2017 ◽  
Vol 1 (2) ◽  
pp. 269-281
Author(s):  
Carlo Battini ◽  
Elena Sorge

The work presented aims to show how different rapid survey techniques can be combined to better describe the subject under study. Digital acquisition techniques such as laser scanning, topography, and Structure from Motion can be used simultaneously, interacting with each other to create a rich database of colorimetric and metric information. At the same time, each methodology presents the peculiarities and errors inherent in the technology employed. The case study examined in this research is the discovery of the amphitheatre of Volterra. Found in July 2015 during reclamation work on a stream, it is located close to Porta Diana, a few hundred metres from the Roman theatre discovered in the last century. An excavation campaign undertaken between October and November 2015 brought to light the crests of the structure's supporting walls, revealing the presence of three orders and a depth of about ten metres. The post-processing step finally saw the use of the acquired three-dimensional models both to create the metric images needed to study the stratigraphic units and to develop a mobile application that makes the 3D models and excavation data easy to use for transmitting the information collected.


Global Jurist ◽  
2019 ◽  
Vol 20 (1) ◽  
Author(s):  
Claudio Sarra

Abstract Data Mining (DM) is the analytical activity aimed at revealing new “knowledge” from data that is useful for further decision-making processes. These techniques have recently acquired enormous importance as they seem to fit perfectly the demands of the so-called “Data Driven World”. In this paper, I first give an overview of DM and of the most relevant criticisms raised so far. Then, using a well-known case study and the European General Data Protection Regulation as benchmarks, I show that there are some specific ambiguities in this use of “knowledge” which are relevant for the ethical and legal assessment of DM.


2021 ◽  
Vol 11 (1) ◽  
pp. 204
Author(s):  
Kwesi Atta Sakyi ◽  
Geoffrey K. Mweshi ◽  
David Musona ◽  
Esnart Mwaba Tayali

Diversity is a topic that has gained much momentum and currency in modern academic discourse, partly because of globalisation and partly because of the increased use of information technology in global transactions. The complex operations of multinational corporations across the globe require prudent and efficient management of employees from different backgrounds. Management of diversity means many things to many people. In this article, the authors delineate the importance and the pros and cons of diversity management for firms, and they also analyse some case-study videos to bring to the fore the growing importance of the phenomenon of diversity. The authors used secondary data and qualitative analysis in their discourse. They reviewed literature from diverse sources to give a theoretical foundation to the article and, at the same time, approached the topic in a multi-faceted manner to whet the appetite of both theoreticians and practitioners. The philosophical underpinning of their approach was based on Grounded Theory, as can be seen in the video case-study narratives and in their own interpretative narrative of the subject.


2018 ◽  
Vol 7 (1) ◽  
Author(s):  
Hércules Antonio Do Prado ◽  
Paulo de Tarso Costa de Sousa ◽  
Eduardo Amadeu Moresi ◽  
Marcelo Ladeira

Knowledge Discovery in Databases (KDD), like any organizational process, is carried out under a Knowledge Management (KM) model adopted (even informally) by a corporation. KDD is broadly described in three steps: pre-processing, data mining, and post-processing. The latter is mainly concerned with transforming the patterns produced in the data mining step into knowledge. KM, on the other hand, comprises the following phases, in which knowledge is the subject of the actions: identification of abilities, acquisition, selection and validation, organization and storage, sharing, application, and creation. Although there are many overlaps between KDD and KM, one of them is broadly recognized: the point at which knowledge arises. This paper concerns a study aimed at clarifying the relations between the overlapping areas of KDD and knowledge creation in KM. The work is conducted by means of a case study using data from the Electoral Court of the Federal District (ECFD), Brazil. The study was developed over a 1,717,000-citizen data set from which data mining models were built by applying algorithms from Weka. It was observed that, although the importance of Information Technology is well recognized in the KM realm, the techniques of KDD deserve a special place in the knowledge creation phase of KM. Moreover, beyond the overlap of post-processing and knowledge creation, other steps of KDD can contribute significantly to KM. One example is the fact that an important decision by the ECFD board was made on the basis of knowledge acquired during the pre-processing step of KDD.


Author(s):  
R. SUBASH CHADRA BOSE ◽  
R. SIVAKUMAR

Knowledge discovery in databases (KDD) deals with the overall process of discovering useful knowledge from data. Data mining is a particular step in this process, applying specific algorithms to extract hidden patterns from the data. Association rule mining is one of the data mining techniques that generate a large number of rules. Several methods have been proposed in the literature to filter and prune the discovered rules so as to retain only the interesting ones, in order to help the decision-maker in a business process. We propose a new approach that integrates user knowledge, using ontologies and rule schemas, at the post-mining stage of association rules. General Terms: lattice, post-processing, pruning, itemset.
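The rule-schema idea can be sketched as matching mined rules against a user-supplied pattern whose items are lifted to ontology concepts. The toy ontology, rules, and schema below are illustrative assumptions, not the paper's actual formalism:

```python
# Toy "ontology": an is-a map from concrete items to concepts.
ontology = {
    "beer": "beverage",
    "wine": "beverage",
    "diapers": "baby_product",
    "formula": "baby_product",
}

def generalize(item):
    """Lift a concrete item to its ontology concept (identity if unknown)."""
    return ontology.get(item, item)

def matches_schema(rule, schema):
    """True if the schema's concepts are all covered by the rule's items,
    after lifting the rule's items to their ontology concepts."""
    antecedent, consequent = rule
    ant_concepts = {generalize(i) for i in antecedent}
    con_concepts = {generalize(i) for i in consequent}
    return (schema["antecedent"] <= ant_concepts
            and schema["consequent"] <= con_concepts)

# Mined rules as (antecedent, consequent) pairs.
rules = [
    ({"beer"}, {"diapers"}),
    ({"bread"}, {"butter"}),
    ({"wine"}, {"formula"}),
]
# User schema: "show me rules linking beverages to baby products".
schema = {"antecedent": {"beverage"}, "consequent": {"baby_product"}}
interesting = [r for r in rules if matches_schema(r, schema)]
```

The schema thus acts as a post-mining filter: the rule base is pruned down to the rules the user's domain knowledge declares relevant.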


Author(s):  
Zbigniew W. Ras ◽  
Angelina Tzacheva ◽  
Li-Shiang Tsay

There are two aspects of rule interestingness that have been studied in the data mining literature: objective and subjective measures (Liu, 1997; Adomavicius & Tuzhilin, 1997; Silberschatz & Tuzhilin, 1995, 1996). Objective measures are data-driven and domain-independent. Generally, they evaluate rules based on their quality and the similarity between them. Subjective measures, including unexpectedness, novelty, and actionability, are user-driven and domain-dependent.
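A subjective measure such as unexpectedness can be sketched by comparing a mined rule's confidence against the user's stated beliefs. The belief table and numbers below are illustrative assumptions, not from the cited works:

```python
# User beliefs: (antecedent, consequent) -> the confidence the user expects.
beliefs = {
    ("smoker", "health_risk"): 0.9,
    ("exercise", "health_risk"): 0.1,
}

def unexpectedness(rule, confidence):
    """Absolute deviation of the mined confidence from the user's belief.

    Returns None when the user holds no belief about the rule, which would
    be scored by a novelty measure instead.
    """
    if rule not in beliefs:
        return None
    return abs(confidence - beliefs[rule])

# A mined rule contradicting the belief scores high on unexpectedness:
u = unexpectedness(("exercise", "health_risk"), 0.6)  # deviation of 0.5
```

This illustrates why subjective measures are domain-dependent: the same mined rule is unexpected or mundane depending entirely on the belief base supplied by the user.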

