Knowledge Discovery and Data Mining
Latest Publications

Total documents: 13 (five years: 0)
H-index: 3 (five years: 0)
Published by: IGI Global
ISBN: 9781599042527, 9781599042541

Author(s):  
Elena Irina Neaga

This chapter presents a roadmap for the bidirectional interaction and support between knowledge discovery (KD) processes and ontology engineering (Onto), aimed mainly at providing refined models through common methodologies. The approach offers the holistic literature review required for the further definition of a comprehensive framework and an associated meta-methodology (KD4Onto4DM), grounded in existing theories, paradigms, and practices in knowledge discovery and ontology engineering, as well as in closely related areas such as knowledge engineering, machine and ontology learning, standardization issues, and architectural models. The suggested framework may adhere to the ISO Reference Model for Open Distributed Processing and the OMG Model-Driven Architecture, and associated dedicated software architectures should be defined.


Author(s):
Jose Ma. J. Alvir, Javier Cabrera

Mining clinical trials is becoming an important tool for extracting information that might help design better clinical trials. One important objective is to identify characteristics of a subset of cases that responds substantially differently from the rest. For example, what are the characteristics of placebo responders? Who has the best or worst response to a particular treatment? Are there subsets among the treated group who perform particularly well? In this chapter we give an overview of the process of conducting clinical trials and the places where data mining might be of interest. We also introduce an algorithm for constructing data mining trees that are very useful for answering the above questions by detecting interesting features of the data. We illustrate the ARF method with an analysis of data from four placebo-controlled trials of ziprasidone in schizophrenia.
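The chapter's tree-construction algorithm is not specified here, but the core idea of hunting for a subgroup whose response differs from the rest can be sketched as a single greedy split search. All field and function names below are invented; the real method is recursive and statistically guarded, which this sketch omits.

```python
# Hypothetical sketch: find the single feature/value split whose subset's
# mean response differs most from the rest of the sample.
def best_split(records, response_key, feature_keys):
    best = None  # (absolute difference in means, feature, value)
    for f in feature_keys:
        for v in {r[f] for r in records}:
            inside = [r[response_key] for r in records if r[f] == v]
            outside = [r[response_key] for r in records if r[f] != v]
            if not inside or not outside:
                continue
            diff = abs(sum(inside) / len(inside) - sum(outside) / len(outside))
            if best is None or diff > best[0]:
                best = (diff, f, v)
    return best
```

Applied recursively to the winning subset, a search of this kind grows the sort of tree the chapter describes.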


Author(s):  
Malcolm J. Beynon

The efficacy of data mining lies in its ability to identify relationships amongst data. This chapter argues that this efficacy is constrained by the quality of the data analysed, including whether the data is imprecise or, in the worst case, incomplete. Through a description of Dempster-Shafer theory (DST), a general methodology based on uncertain reasoning, it argues that traditional data mining techniques are not structured to handle such imperfect data and instead require the external management of missing values, and so forth. One DST-based technique is classification and ranking belief simplex (CaRBS), which allows intelligent data mining through the acceptance of missing values in the data analysed, treating them as a factor of ignorance rather than requiring their external management. Results presented here, using CaRBS and a number of simplex plots, show the effect of managing and not managing imperfect data.
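As background for the DST machinery underlying CaRBS, Dempster's rule of combination, which fuses two independent bodies of evidence, can be sketched as follows. This is a generic textbook illustration, not the CaRBS algorithm itself.

```python
# Dempster's rule of combination: two mass functions (dicts mapping
# frozensets of hypotheses to belief mass) are fused; mass assigned to
# empty intersections is conflict and is renormalised away.
def combine(m1, m2):
    raw = {}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                raw[inter] = raw.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    return {s: m / (1.0 - conflict) for s, m in raw.items()}
```

Mass on a non-singleton set such as {x, y} expresses ignorance between x and y, which is how DST-based methods can absorb missing values without external imputation.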


Author(s):  
Jason H. Moore

Human genetics is an evolving discipline that is being driven by rapid advances in technologies that make it possible to measure enormous quantities of genetic information. An important goal of human genetics is to understand the mapping relationship between interindividual variation in DNA sequences (i.e., the genome) and variability in disease susceptibility (i.e., the phenotype). The focus of the present study is the detection and characterization of nonlinear interactions among DNA sequence variations in human populations using data mining and machine learning methods. We first review the concept difficulty and then review a multifactor dimensionality reduction (MDR) approach that was developed specifically for this domain. We then present some ideas about how to scale the MDR approach to datasets with thousands of attributes (i.e., genome-wide analysis). Finally, we end with some ideas about how nonlinear genetic models might be statistically interpreted to facilitate making biological inferences.
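A much-simplified sketch of the MDR idea, for a single pair of attributes: pool each two-locus genotype cell into "high risk" when its case/control ratio exceeds the overall ratio, then score the resulting two-class pooling. This illustrates the principle only; it assumes both cases and controls are present, and the real method adds cross-validation and an exhaustive search over attribute combinations.

```python
from collections import defaultdict

# Simplified MDR step for one attribute pair (i, j).
# genotypes: list of tuples of genotype codes; labels: 1 = case, 0 = control.
def mdr_score(genotypes, labels, i, j):
    cases = defaultdict(int)
    ctrls = defaultdict(int)
    for g, y in zip(genotypes, labels):
        cell = (g[i], g[j])
        if y:
            cases[cell] += 1
        else:
            ctrls[cell] += 1
    overall = sum(labels) / (len(labels) - sum(labels))  # case:control ratio
    high = {c for c in set(cases) | set(ctrls)
            if ctrls[c] == 0 or cases[c] / ctrls[c] > overall}
    correct = sum(1 for g, y in zip(genotypes, labels)
                  if ((g[i], g[j]) in high) == bool(y))
    return correct / len(labels)
```

On a purely epistatic (XOR-like) two-locus model, neither locus alone separates cases from controls, but the pooled two-locus cells do, which is exactly the nonlinearity MDR targets.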


Author(s):
Naeem Seliya, Taghi M. Khoshgoftaar

In machine learning, the limited availability of labeled data for supervised learning is a challenging practical problem. We address a similar problem in the context of software quality modeling. Knowledge-based software engineering includes the use of quantitative software quality estimation models. Such models are trained using a priori software quality knowledge in the form of software metrics and defect data from previously developed software projects. However, various practical issues limit the availability of defect data for all modules in the training data. We present two solutions to the problem of software quality modeling when only a limited number of training modules have known defect data: a semisupervised clustering scheme with expert input, and a semisupervised classification approach based on the expectation-maximization algorithm. Software measurement datasets obtained from multiple NASA software projects are used in our empirical investigation. The software quality knowledge learnt during the semisupervised learning processes generalized well to multiple test datasets. In addition, both solutions provided better predictions than a supervised learner trained on the initial labeled dataset.
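The chapter's EM formulation is not reproduced here, but the labelled-plus-unlabelled loop behind semisupervised classification can be sketched with a minimal self-training scheme around a nearest-centroid learner. All names and the learner itself are illustrative substitutes.

```python
# Minimal self-training sketch: fit class centroids on the labelled points,
# assign the unlabelled points, then refit with those assignments included.
def centroid(points):
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def self_train(labelled, unlabelled, rounds=3):
    """labelled: {point_tuple: class}; unlabelled: list of point tuples."""
    labels = dict(labelled)
    for _ in range(rounds):
        cents = {c: centroid([p for p, pc in labels.items() if pc == c])
                 for c in set(labels.values())}
        for p in unlabelled:
            labels[p] = min(cents, key=lambda c: dist2(p, cents[c]))
    return labels
```

After the first round the unlabelled points also shape the centroids, which is the sense in which unlabelled modules contribute to the fitted model.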


Author(s):
Fedja Hadzic, Tharam S. Dillon

Real-world datasets are often accompanied by various types of anomalous or exceptional entries, often referred to as outliers. Detecting outliers and distinguishing noise from true exceptions is important for effective data mining. This chapter presents two methods for outlier detection and analysis using the self-organizing map (SOM), one more suitable for categorical and the other for continuous data. Both are based on filtering out the instances that are not captured by, or are contradictory to, the concept hierarchy obtained for the domain. We demonstrate how the dimension of the output space plays an important role in the kind of patterns that will be detected as outlying. Furthermore, the concept hierarchy itself provides extra criteria for distinguishing noise from true exceptions. The effectiveness of the proposed outlier detection and analysis strategy is demonstrated through experiments on publicly available real-world datasets.
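A toy illustration of the general SOM-based filtering idea (not the chapter's methods, which also exploit the learned concept hierarchy): train a small one-dimensional SOM, then flag instances whose quantization error, their distance to the best-matching unit, is far above average. All parameters are arbitrary.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Tiny 1-D SOM: units on a line, neighbourhood radius 1, decaying learning rate.
def train_som(data, units=4, epochs=50, lr=0.5, seed=0):
    rng = random.Random(seed)
    weights = [list(rng.choice(data)) for _ in range(units)]
    for e in range(epochs):
        rate = lr * (1 - e / epochs)
        for x in data:
            bmu = min(range(units), key=lambda u: dist2(weights[u], x))
            for u in (bmu - 1, bmu, bmu + 1):  # move BMU and its neighbours
                if 0 <= u < units:
                    for k in range(len(x)):
                        weights[u][k] += rate * (x[k] - weights[u][k])
    return weights

# Flag instances whose quantization error greatly exceeds the average.
def outliers(data, weights, factor=3.0):
    qe = [min(dist2(w, x) for w in weights) for x in data]
    cutoff = factor * sum(qe) / len(qe)
    return [x for x, e in zip(data, qe) if e > cutoff]
```

In the chapter's terms, an instance poorly represented by any map unit is a candidate outlier; whether it is noise or a true exception then needs the extra criteria discussed above.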


Author(s):  
Petra Perner

This chapter introduces image mining as a method of discovering implicit, previously unknown, and potentially useful information from digital image and video repositories. It argues that image mining is a discipline in its own right because of the special type of data involved; therefore, image-mining methods that take into account this data representation and the different aspects of image mining have to be developed. Furthermore, a bridge has to be established between image mining and image processing, feature extraction, and image understanding, since the latter topics are concerned with the development of methods for the automatic extraction of higher-level image representations. We introduce our methodology, the developed methods, and the system for image mining, which we have successfully applied to several medical image-diagnostic tasks.


Author(s):
Amandeep S. Sidhu, Paul J. Kennedy, Simeon Simoff

In some real-world areas it is important to enrich the data with external background knowledge, so as to provide context and to facilitate pattern recognition. Such areas may be described as data rich but knowledge poor. There are two challenges in incorporating this background knowledge into the data mining cycle: (1) generating the ontologies, and (2) adapting the data mining algorithms to make use of them. This chapter presents the state of the art in bringing background ontology knowledge into the pattern recognition task for biomedical data.
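A tiny, invented illustration of challenge (2): enriching raw identifiers with ontology context before mining, so that a pattern can be matched at any level of abstraction. The hierarchy and terms below are made up for the example.

```python
# Toy is-a hierarchy: each term maps to its parent (terms are invented).
PARENT = {
    'kinase': 'enzyme',
    'protease': 'enzyme',
    'enzyme': 'protein',
}

# Expand a term with all of its ancestors, so two records annotated with
# 'kinase' and 'protease' can still co-occur at the 'enzyme' level.
def enrich(term):
    out = [term]
    while out[-1] in PARENT:
        out.append(PARENT[out[-1]])
    return out
```

Running a mining algorithm over the enriched annotations rather than the raw identifiers is one simple way an ontology enters the pattern recognition step.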


Author(s):  
Anna Olecka

This chapter focuses on the challenges of modeling credit risk for the new-accounts acquisition process in the credit card industry. The first section provides an overview and a brief history of credit scoring. The second section looks at some of the challenges specific to the credit industry. In many of these applications, the business objective is tied only indirectly to the classification scheme. Opposing objectives, such as response, profit, and risk, often play a tug of war with each other. Solving a business problem of such complexity often requires multiple models working jointly. The challenges for data mining lie in exploring solutions that go beyond traditional, well-documented methodology, and in the need for simplifying assumptions, often necessitated by the reality of dataset sizes and/or implementation issues. Examples of such challenges form an illustrative case of the compromise between data mining theory and applications.


Author(s):
Jia-Yu Pan, Hyung-Jeong Yang, Christos Faloutsos

Multimedia objects like video clips or captioned images contain data of various modalities such as image, audio, and transcript text. Correlations across different modalities provide information about the multimedia content, and are useful in applications ranging from summarization to semantic captioning. We propose a graph-based method, MAGIC, which represents multimedia data as a graph and can find cross-modal correlations using “random walks with restarts.” MAGIC has several desirable properties: (a) it is general and domain-independent; (b) it can detect correlations across any two modalities; (c) it is insensitive to parameter settings; (d) it scales up well for large datasets; (e) it enables novel multimedia applications (e.g., group captioning); and (f) it creates the opportunity to apply graph algorithms to multimedia problems. When applied to automatic image captioning, MAGIC finds correlations between text and image and achieves a relative improvement of 58% in captioning accuracy as compared to recent machine learning techniques.
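The "random walks with restarts" primitive that MAGIC is described as using can be sketched on a toy cross-modal graph. The graph and node names below are invented; MAGIC builds its graph from the multimedia objects and their attributes.

```python
# Random walk with restart: from the query node, repeatedly spread score
# along edges while returning to the query with probability `restart`.
# The steady-state scores rank all nodes by correlation with the query.
def rwr(graph, start, restart=0.15, iters=100):
    """graph: {node: [neighbours]}; returns node -> steady-state score."""
    nodes = list(graph)
    score = {n: 1.0 if n == start else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: (restart if n == start else 0.0) for n in nodes}
        for n in nodes:
            nbrs = graph[n]
            if not nbrs:
                nxt[start] += (1 - restart) * score[n]  # dangling mass restarts
                continue
            share = (1 - restart) * score[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        score = nxt
    return score
```

For captioning, querying from an uncaptioned image node and reading off the highest-scoring text nodes yields the cross-modal correlations the chapter describes.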

