A survey of data mining and knowledge discovery process models and methodologies

Abstract Up to now, many data mining and knowledge discovery methodologies and process models have been developed, with varying degrees of success. In this paper, we describe the most used (in industrial and academic projects) and most cited (in the scientific literature) data mining and knowledge discovery methodologies and process models, providing an overview of their evolution throughout the history of data mining and knowledge discovery and setting down the state of the art in this topic. For every approach, we provide a brief description of the proposed knowledge discovery in databases (KDD) process and discuss its special features and its main advantages and disadvantages. In addition, a global comparison of all the presented data mining approaches is provided, focusing on the different steps and tasks through which each approach interprets the whole KDD process. As a result of the comparison, we propose a new data mining and knowledge discovery process, named refined data mining process, for developing any kind of data mining and knowledge discovery project. The refined data mining process is built on specific steps taken from the analyzed approaches.

Knowledge Discovery Process Models

Advances in Business Information Systems and Analytics - Business Intelligence and Agile Methodologies for Knowledge-Based Organizations ◽

10.4018/978-1-61350-050-7.ch004 ◽

2012 ◽

pp. 72-100 ◽

Cited By ~ 5

Author(s):

Mouhib Alnoukari ◽

Asim El Sheikh

Keyword(s):

Life Cycle ◽

Knowledge Discovery ◽

Process Model ◽

Common Factor ◽

Final Outcome ◽

Process Models ◽

Data Driven ◽

Discovery Process ◽

The Common ◽

Discovery Process Models

The Knowledge Discovery (KD) process model was first discussed in 1989. Different models have been suggested since, starting with the process model of Fayyad et al. (1996). The common factor of all data-driven discovery processes is that knowledge is their final outcome. In this chapter, the authors analyze most of the KD process models suggested in the literature, discuss in detail the KD process models that have innovative life cycle steps, and propose a categorization of the existing KD models. The chapter analyzes in depth the strengths and weaknesses of the leading KD process models, together with the commercial systems that support them, their reported applications, and their matrix characteristics.
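
As a rough illustration of what a KD process model describes, the five steps commonly attributed to Fayyad et al. (1996), namely selection, preprocessing, transformation, data mining, and interpretation/evaluation, can be sketched as a chain of functions. This is a minimal sketch under assumptions of our own: the record format, the min-max scaling, the toy "pattern" (the share of records that are high in one attribute and also high in another), and the 0.7 acceptance threshold are all hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Tuple

Record = Dict[str, float]

def select(records: List[Record], fields: List[str]) -> List[Record]:
    """Selection: project every record onto the attributes of interest."""
    return [{f: r[f] for f in fields if f in r} for r in records]

def preprocess(records: List[Record], fields: List[str]) -> List[Record]:
    """Preprocessing: drop incomplete records (one simple cleaning policy)."""
    return [r for r in records if all(f in r for f in fields)]

def transform(records: List[Record], fields: List[str]) -> List[Record]:
    """Transformation: min-max scale every attribute to [0, 1]."""
    if not records:
        return []
    out = [dict(r) for r in records]
    for f in fields:
        lo = min(r[f] for r in records)
        hi = max(r[f] for r in records)
        span = (hi - lo) or 1.0  # a constant column maps to 0.0 instead of dividing by zero
        for r in out:
            r[f] = (r[f] - lo) / span
    return out

def mine(records: List[Record], x: str, y: str, cut: float = 0.5) -> float:
    """Data mining (hypothetical toy pattern): share of records high in x that are also high in y."""
    high_x = [r for r in records if r[x] >= cut]
    if not high_x:
        return 0.0
    return sum(1 for r in high_x if r[y] >= cut) / len(high_x)

def evaluate(score: float, min_score: float = 0.7) -> bool:
    """Interpretation/evaluation: accept the pattern only if it clears a hypothetical threshold."""
    return score >= min_score

def kdd(records: List[Record], x: str, y: str) -> Tuple[float, bool]:
    """Run the five steps in order; return the pattern score and whether it was accepted."""
    fields = [x, y]
    prepared = transform(preprocess(select(records, fields), fields), fields)
    score = mine(prepared, x, y)
    return score, evaluate(score)

if __name__ == "__main__":
    raw = [
        {"id": 1, "age": 20, "spend": 10},
        {"id": 2, "age": 60, "spend": 90},
        {"id": 3, "age": 50, "spend": 80},
        {"id": 4, "age": 30},  # missing "spend": removed during preprocessing
    ]
    print(kdd(raw, "age", "spend"))  # (1.0, True)
```

This chain is strictly linear; the process model of Fayyad et al. is interactive and iterative, with loops back to earlier steps (for example, returning to preprocessing after evaluation), which the sketch omits.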

Mining Association Rules

Handbook of Research on Emerging Rule-Based Languages and Technologies ◽

10.4018/978-1-60566-402-6.ch027 ◽

2010 ◽

pp. 647-673 ◽

Cited By ~ 2

Author(s):

Mihai Gabroveanu

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Association Rules ◽

Association Rule ◽

Knowledge Discovery In Databases ◽

Discovery Process ◽

Fuzzy Association Rules ◽

Large Databases ◽

Hidden Knowledge ◽

Basic Concepts

In recent years, the amount of data stored in databases has grown very fast. Data mining, also known as knowledge discovery in databases, is the process of discovering potentially useful hidden knowledge or relations among data in large databases. An important task in the data mining process is the discovery of association rules. An association rule describes an interesting relationship between different attributes. There are different kinds of association rules: Boolean (crisp) association rules, quantitative association rules, fuzzy association rules, and others. In this chapter, we present the basic concepts of Boolean and fuzzy association rules and describe the methods used to discover association rules by presenting the most important algorithms.
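
For Boolean association rules, the two standard measures are support (the fraction of transactions that contain an itemset) and confidence (support(X ∪ Y) / support(X) for a rule X → Y). The sketch below computes both and finds rules by brute-force enumeration; the shopping-basket transactions and the thresholds are made up for illustration. It is not one of the algorithms presented in the chapter: Apriori, for example, avoids testing every itemset by using the fact that every subset of a frequent itemset must itself be frequent.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]  # (lhs, rhs, support, confidence)

def support(itemset: Itemset, transactions: List[Itemset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs: Itemset, rhs: Itemset, transactions: List[Itemset]) -> float:
    """support(lhs | rhs) / support(lhs): how often rhs holds when lhs does."""
    s_lhs = support(lhs, transactions)
    return support(lhs | rhs, transactions) / s_lhs if s_lhs else 0.0

def mine_rules(transactions: List[Itemset], min_sup: float, min_conf: float) -> List[Rule]:
    """Brute force: test every itemset of size >= 2, then every split into lhs -> rhs."""
    items = sorted(set().union(*transactions))
    found: List[Rule] = []
    for k in range(2, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            sup = support(itemset, transactions)
            if sup < min_sup:
                continue  # infrequent itemsets cannot yield rules above min_sup
            for r in range(1, k):
                for lhs_items in combinations(combo, r):
                    lhs = frozenset(lhs_items)
                    rhs = itemset - lhs
                    conf = confidence(lhs, rhs, transactions)
                    if conf >= min_conf:
                        found.append((lhs, rhs, sup, conf))
    return found

if __name__ == "__main__":
    baskets = [frozenset(t) for t in (
        {"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"},
    )]
    for lhs, rhs, sup, conf in mine_rules(baskets, min_sup=0.5, min_conf=0.8):
        print(sorted(lhs), "->", sorted(rhs), sup, conf)  # ['butter'] -> ['bread'] 0.5 1.0
```

Here bread → milk reaches the support threshold (0.5) but only about 0.67 confidence, so it is rejected, while butter → bread holds in every basket that contains butter.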

Particularities of data mining in medicine: lessons learned from patient medical time series data analysis

EURASIP Journal on Wireless Communications and Networking ◽

10.1186/s13638-019-1582-2 ◽

2019 ◽

Vol 2019 (1) ◽

Cited By ~ 2

Author(s):

Shadi Aljawarneh ◽

Aurea Anguera ◽

John William Atwood ◽

Juan A. Lara ◽

David Lizcano

Keyword(s):

Data Mining ◽

Time Series ◽

Knowledge Discovery ◽

Time Series Data ◽

Medical Patient ◽

Lessons Learned ◽

Physiological Signals ◽

Knowledge Discovery In Databases ◽

Series Data ◽

Data Mining Techniques

Abstract Nowadays, large amounts of data are generated in the medical domain. Various physiological signals generated by different organs can be recorded to extract interesting information about patients’ health. The analysis of physiological signals is a hard task that requires specific approaches such as the Knowledge Discovery in Databases process. Applying this process in medicine has a series of implications and difficulties, especially regarding the application of data mining techniques to data, mainly time series, gathered from medical examinations of patients. The goal of this paper is to describe the lessons learned and the experience gathered by the authors in applying data mining techniques to real medical patient data, including time series. In this research, we carried out an exhaustive case study on data from two medical fields: stabilometry (15 professional basketball players, 18 elite ice skaters) and electroencephalography (100 healthy patients, 100 epileptic patients). We applied a previously proposed knowledge discovery framework for classification purposes and obtained good results in terms of classification accuracy (greater than 99% in both fields). These results are the groundwork for the lessons learned and recommendations made in this position paper, which is intended as a guide for experts facing similar medical data mining projects.

Scientific Discovery, Process Models, and the Social Sciences

Scientific Discovery in the Social Sciences ◽

10.1007/978-3-030-23769-1_11 ◽

2019 ◽

pp. 173-190

Author(s):

Pat Langley ◽

Adam Arvay

Keyword(s):

Social Sciences ◽

Scientific Discovery ◽

Process Models ◽

Discovery Process ◽

The Social ◽

Discovery Process Models

Proposta de metodologia para a criação de etiqueta de classificação – estudo de caso: desempenho escolar [Proposed methodology for creating a classification label – case study: school performance]

Gestão & Produção ◽

10.1590/0104-530x810-13 ◽

2016 ◽

Vol 23 (1) ◽

pp. 177-191

Author(s):

Anderson Roges Teixeira Góes ◽

Maria Teresinha Arns Steiner

Keyword(s):

Data Mining ◽

Support Vector Machines ◽

Knowledge Discovery ◽

Knowledge Discovery In Databases ◽

Support Vector ◽

Vector Machines

Abstract Quality in education has been the subject of much discussion, in schools and among their administrators as well as in the media and in the literature. However, a deeper analysis of the literature does not seem to point to techniques that explore databases in order to obtain classifications of school performance, nor is there a consensus on what “educational quality” is. In this context, this article proposes a methodology, framed within the KDD (Knowledge Discovery in Databases) process, for classifying the performance of educational institutions comparatively, based on the scores obtained in Prova Brasil, one of the components of the Índice de Desenvolvimento da Educação Básica (IDEB, Basic Education Development Index) in Brazil. To illustrate the methodology, it was applied to the municipal public schools of Araucária, PR, in the metropolitan region of Curitiba, PR, a total of 17 schools that, at the time of the research, offered Ensino Fundamental (elementary education), considering the scores obtained by all students in the initial years (1st to 5th year of elementary education) and in the final years (6th to 9th year of elementary education). In the Data Mining step, the main step of the KDD process, three pattern recognition techniques were used comparatively: Artificial Neural Networks, Support Vector Machines, and Genetic Algorithms. These techniques gave satisfactory results in classifying the schools, represented by means of a “Performance Classification Label”. With this label, educational administrators will have a better basis for defining the measures to be adopted for each school and can define more clearly the targets to be met.

Overview of Knowledge Discovery and Data Mining Process Models

Knowledge Discovery Process and Methods to Enhance Organizational Performance ◽

10.1201/b18231-6 ◽

2015 ◽

pp. 30-43

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Process Models

Data Mining of Association Rules and the Process of Knowledge Discovery in Databases

Advances in Data Mining - Lecture Notes in Computer Science ◽

10.1007/3-540-46131-0_2 ◽

2002 ◽

pp. 15-36 ◽

Cited By ~ 5

Author(s):

Jochen Hipp ◽

Ulrich Güntzer ◽

Gholamreza Nakhaeizadeh

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Association Rules ◽

Knowledge Discovery In Databases

Knowledge Discovery in Databases und Data Mining

Analytische Informationssysteme ◽

10.1007/978-3-662-05710-0_16 ◽

1999 ◽

pp. 345-353 ◽

Cited By ~ 3

Author(s):

Roland Düsing

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Knowledge Discovery In Databases

Multi-Objective Genetic and Fuzzy Approaches in Rule Mining Problem of Knowledge Discovery in Databases

Global Trends in Intelligent Computing Research and Development - Advances in Computational Intelligence and Robotics ◽

10.4018/978-1-4666-4936-1.ch007 ◽

2014 ◽

pp. 147-179

Author(s):

Harihar Kalia ◽

Satchidananda Dehuri ◽

Ashish Ghosh

Keyword(s):

Knowledge Discovery ◽

Optimization Problems ◽

Fuzzy Rule ◽

Numerical Data ◽

Knowledge Discovery In Databases ◽

Rule Mining ◽

Boundary Problems ◽

Multi Objective ◽

Advantages And Disadvantages

Knowledge Discovery in Databases (KDD) is the process of automatically searching for patterns in large volumes of data by using specific data mining techniques. Classification, association, and associative classification (the integration of classification and association) rule mining are popular rule mining techniques in KDD for harvesting knowledge in the form of rules. Classical rule mining techniques based on crisp sets suffer from “sharp boundary problems” when mining rules from numerical data. Fuzzy rule mining approaches eliminate these problems and generate rules that are easier for humans to understand. Several quality measures are used to quantify the quality of the discovered rules; however, most of these objectives/criteria conflict with each other. Fuzzy rule mining problems are therefore modeled as multi-objective rather than single-objective optimization problems. Because they can find diverse trade-off solutions for several objectives in a single run, multi-objective genetic algorithms are popularly employed in rule mining. In this chapter, the authors discuss the multi-objective genetic-fuzzy approaches used in rule mining along with their advantages and disadvantages. In addition, some popular applications of these approaches are discussed.
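
Two ideas in this abstract can be made concrete with a small sketch. The first is the sharp boundary problem: with a crisp set such as “young = age < 30”, ages of 29.9 and 30 fall on opposite sides of the boundary, whereas a fuzzy membership function changes gradually. The second is the multi-objective view: a rule survives if no other rule is at least as good on every quality measure and strictly better on at least one (Pareto dominance). The sketch assumes hypothetical breakpoints (25 and 35) for the fuzzy set and hypothetical (confidence, comprehensibility) scores for four rules; it shows only the dominance test on which Pareto-based multi-objective genetic algorithms (for example NSGA-II) rank candidates, not a genetic algorithm itself.

```python
from typing import Dict, List, Sequence, Tuple

def crisp_young(age: float) -> float:
    """Crisp set 'young' = age < 30: membership jumps from 1 to 0 at the boundary."""
    return 1.0 if age < 30 else 0.0

def fuzzy_young(age: float) -> float:
    """Fuzzy set 'young' (hypothetical breakpoints): 1 up to 25, falling linearly to 0 at 35."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """All objectives maximized: a is no worse than b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(scores: Dict[str, Tuple[float, ...]]) -> List[str]:
    """Names of the rules whose score vectors are not dominated by any other rule."""
    return [
        name
        for name, s in scores.items()
        if not any(dominates(t, s) for other, t in scores.items() if other != name)
    ]

if __name__ == "__main__":
    print(crisp_young(29.9), crisp_young(30.0))  # 1.0 0.0
    print(fuzzy_young(29.9), fuzzy_young(30.0))  # nearly equal memberships
    # Hypothetical (confidence, comprehensibility) scores for four candidate rules.
    rules = {"R1": (0.9, 0.2), "R2": (0.7, 0.6), "R3": (0.6, 0.5), "R4": (0.5, 0.9)}
    print(pareto_front(rules))  # ['R1', 'R2', 'R4']
```

R3 is discarded because R2 beats it on both objectives, while R1, R2, and R4 are incomparable trade-offs; returning such a set rather than a single best rule is what “diverse trade-off solutions in a single run” refers to.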

Data Mining and Knowledge Discovery in Databases

Advances in Computer and Electrical Engineering - Advanced Methodologies and Technologies in Network Architecture, Mobile Computing, and Data Analytics ◽

10.4018/978-1-5225-7598-6.ch037 ◽

2019 ◽

pp. 502-514 ◽

Cited By ~ 1

Author(s):

Ana Azevedo

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Knowledge Discovery In Databases ◽

Great Relevance ◽

The Subject ◽

Inductive Databases ◽

High Level

The term knowledge discovery in databases, or KDD for short, was coined in 1989 to refer to the broad process of finding knowledge in data and to emphasize the “high-level” application of particular data mining (DM) methods. The DM phase mainly concerns the means by which patterns are extracted and enumerated from data. Nowadays, the two terms are usually used interchangeably. Efforts are under way to create standards and rules in the field of DM, with great relevance being given to the subject of inductive databases. Within the context of inductive databases, special importance is given to the so-called DM languages. This chapter explores DM in KDD.
