Mathematical Methods for Knowledge Discovery and Data Mining
Latest Publications


TOTAL DOCUMENTS: 19 (FIVE YEARS: 0)

H-INDEX: 2 (FIVE YEARS: 0)

Published by: IGI Global

ISBN: 9781599045283, 9781599045306

Author(s):  
Nikos Pelekis ◽  
Babis Theodoulidis ◽  
Ioannis Kopanakis ◽  
Yannis Theodoridis

Quality of Service Open Shortest Path First (QOSPF), based on QoS routing, has been recognized as a missing piece in the evolution of QoS-based services in the Internet. Data mining has emerged as a tool for data analysis, discovery of new information, and autonomous decision-making. This paper focuses on routing algorithms and their applications for computing QoS routes in the OSPF protocol. The proposed approach applies rough set theory, building an attribute-value system for network links from the network topology. Rough set theory offers a knowledge discovery approach to extracting routing decisions from the attribute set. The extracted rules can then be used to select significant routing attributes and make routing selections in routers. A case study demonstrates that rough set theory is effective in finding the most significant attribute set. It is shown that the algorithm based on data mining and rough sets offers a promising approach to the attribute-selection problem in Internet routing.
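The core rough-set computation here is the dependency degree: the fraction of links whose condition-attribute values determine a unique routing decision, with an attribute's significance measured by how much the dependency drops when that attribute is removed. Below is a minimal sketch of that computation in Python; the link table, attribute names, and decision values are invented for illustration and are not taken from the paper.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, cond_attrs, dec_attr):
    """Rough-set dependency degree: fraction of rows whose condition
    class is consistent (all members share one decision value)."""
    pos = 0
    for block in partition(rows, cond_attrs):
        if len({rows[i][dec_attr] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)

# Hypothetical link table: condition attributes and a routing decision.
links = [
    {"bandwidth": "high", "delay": "low",  "loss": "low",  "route": "yes"},
    {"bandwidth": "high", "delay": "high", "loss": "low",  "route": "no"},
    {"bandwidth": "low",  "delay": "low",  "loss": "high", "route": "no"},
    {"bandwidth": "high", "delay": "low",  "loss": "high", "route": "yes"},
    {"bandwidth": "low",  "delay": "high", "loss": "low",  "route": "no"},
]
conds = ["bandwidth", "delay", "loss"]
full = dependency(links, conds, "route")
for a in conds:
    rest = [a2 for a2 in conds if a2 != a]
    print(f"significance of {a}: {full - dependency(links, rest, 'route'):.2f}")
```

In this toy table, removing "loss" leaves the dependency unchanged, marking it as dispensable, while "bandwidth" and "delay" carry positive significance.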


Author(s):  
Antonio Congiusta ◽  
Domenico Talia ◽  
Paolo Trunfio

Knowledge discovery is a compute- and data-intensive process for finding patterns, trends, and models in large datasets. The Grid can be effectively exploited for deploying knowledge discovery applications because of the high performance it offers and its distributed infrastructure. For effective use of Grids in knowledge discovery, the development of middleware is critical to support data management, data transfer, data mining, and knowledge representation. To this end, we designed the Knowledge Grid, a high-level environment providing Grid-based knowledge discovery tools and services. Such services allow users to create and manage complex knowledge discovery applications, composed as workflows that integrate data sources and data mining tools provided as distributed Grid services. This chapter presents the Knowledge Grid architecture and describes how its components can be used to design and implement distributed knowledge discovery applications. It then explains how the Knowledge Grid services can be made accessible using the Open Grid Services Architecture (OGSA) model.
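To make the workflow idea concrete, here is a minimal sketch of composing a knowledge discovery application as a dependency graph of tasks executed in topological order. The task names and local function calls are illustrative stand-ins for the Knowledge Grid's remote data-source and mining services; this is not the Knowledge Grid API.

```python
# Sketch: a knowledge discovery workflow as a DAG of tasks. Each local
# function stands in for a remote Grid/OGSA service invocation.
from graphlib import TopologicalSorter

def load_dataset():
    return [[1.0, 2.0], [1.5, 1.8], [8.0, 8.2]]   # stand-in data source

def mine(data):
    return f"model over {len(data)} records"      # stand-in mining service

def present(model):
    print("result:", model)

tasks = {"load": load_dataset, "mine": mine, "present": present}
deps = {"mine": {"load"}, "present": {"mine"}}    # task -> prerequisites

results = {}
for name in TopologicalSorter(deps).static_order():
    prereqs = [results[p] for p in sorted(deps.get(name, ()))]
    results[name] = tasks[name](*prereqs)
```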


Author(s):  
T. Warren Liao

In this chapter, we present genetic algorithm (GA)-based methods developed for clustering univariate time series of equal or unequal length as an exploratory step of data mining. These methods essentially implement the k-medoids algorithm: each chromosome encodes, in binary, the data objects serving as the k medoids. To compare their performance, both fixed-parameter and adaptive GAs were used. We first employed the synthetic control chart data set to investigate the performance of three fitness functions, two distance measures, and other GA parameters such as population size, crossover rate, and mutation rate. We also experimented with two more sets of time series, one with and one without a known number of clusters: the cylinder-bell-funnel data and the novel battle simulation data. The clustering results are presented and discussed.
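As a rough illustration of the GA/k-medoids combination, the sketch below evolves candidate medoid sets for equal-length series under Euclidean distance. It simplifies the chapter's design: the chromosome stores medoid indices rather than the binary encoding described above, and the GA parameters are fixed rather than adaptive.

```python
import math
import random

def fitness(medoids, data):
    """Negative total distance of each series to its nearest medoid."""
    return -sum(min(math.dist(s, data[m]) for m in medoids) for s in data)

def ga_kmedoids(data, k=2, pop_size=30, gens=50, mut=0.1):
    n = len(data)
    pop = [random.sample(range(n), k) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        nxt = pop[:2]                               # elitism: keep the two best
        while len(nxt) < pop_size:
            p1, p2 = random.sample(pop[:10], 2)     # select among the top ten
            pool = list(set(p1) | set(p2))          # crossover: mix parents' medoids
            child = random.sample(pool, k)
            if random.random() < mut:               # mutation: swap in a random object
                child[random.randrange(k)] = random.randrange(n)
            nxt.append(child)
        pop = nxt
    return max(pop, key=lambda c: fitness(c, data))

# Toy equal-length "time series"; the best two medoids should split them.
series = [(0, 0, 1), (0, 1, 1), (5, 5, 4), (5, 4, 4)]
print(ga_kmedoids(series, k=2))
```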


Author(s):  
Monica Chis

Clustering is an important technique for discovering the inherent structure present in data. The purpose of cluster analysis is to partition a given data set into a number of groups such that objects in a particular cluster are more similar to each other than to objects in different clusters. Hierarchical clustering refers to the formation of a recursive clustering of the data points: a partition into many clusters, each of which is itself hierarchically clustered. This paper proposes a new evolutionary algorithm for detecting the hierarchical structure of an input data set, a problem with applications in economics, market segmentation, management, biological taxonomy, and other domains. A new linear representation of the cluster structure within the data set is proposed, and an evolutionary algorithm evolves a population of clustering hierarchies, using mutation and crossover as (search) variation operators. The final goal is a clustering representation that finds a hierarchical clustering structure quickly.
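One way to realize a linear encoding of a hierarchy is a parent-pointer vector: gene i stores the parent of point i, with every parent index smaller than i, so any chromosome decodes to a valid tree and one-point crossover preserves validity. The sketch below illustrates this idea with an invented fitness (total parent-child distance); it is not the paper's exact representation or objective.

```python
import math
import random

def fitness(parents, data):
    """Negative total parent-child distance: tighter hierarchies score higher."""
    return -sum(math.dist(data[i], data[p])
                for i, p in enumerate(parents) if i > 0)

def random_chrom(n):
    return [0] + [random.randrange(i) for i in range(1, n)]

def mutate(chrom):
    c = list(chrom)
    i = random.randrange(1, len(c))
    c[i] = random.randrange(i)            # re-attach point i to an earlier point
    return c

def crossover(a, b):
    cut = random.randrange(1, len(a))     # one-point crossover: parent indices stay
    return a[:cut] + b[cut:]              # below their node index, so the child is valid

def evolve(data, pop_size=40, gens=100):
    pop = [random_chrom(len(data)) for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda c: fitness(c, data), reverse=True)
        pop = pop[:pop_size // 2]
        while len(pop) < pop_size:
            a, b = random.sample(pop[:10], 2)
            pop.append(mutate(crossover(a, b)))
    return max(pop, key=lambda c: fitness(c, data))

data = [(0, 0), (0, 1), (5, 5), (5, 6), (2, 3)]
print(evolve(data))    # parent vector encoding the evolved hierarchy
```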


Author(s):  
Antonino Staiano ◽  
Lara De Vinco ◽  
Giuseppe Longo ◽  
Roberto Tagliaferri

Probabilistic Principal Surfaces (PPS) is a nonlinear latent variable model with very powerful visualization and classification capabilities, which seems able to overcome most of the shortcomings of other neural tools. PPS builds a probability density function of a given set of patterns lying in a high-dimensional space, expressed in terms of a fixed number of latent variables lying in a Q-dimensional latent space. Usually, the Q-space is either two- or three-dimensional, and thus the density function can be used to visualize the data within it. The case Q = 3 allows the patterns to be projected onto a spherical manifold, which turns out to be optimal when dealing with sparse data. PPS may also be arranged in ensembles to tackle complex classification tasks. As template cases we discuss the application of PPS to two real-world data sets from astronomy and genetics.
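The sketch below only mimics the final spherical view: it reduces patterns to Q = 3 dimensions with PCA and normalizes them onto the unit sphere. Actual PPS instead fits a constrained Gaussian-mixture density via EM, so treat this as a crude visualization stand-in, not the PPS algorithm.

```python
import numpy as np

def spherical_projection(X):
    """Project patterns to 3-D and push them onto the unit sphere."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # top principal directions
    Y = Xc @ Vt[:3].T                                  # latent Q = 3 coordinates
    return Y / np.linalg.norm(Y, axis=1, keepdims=True)

X = np.random.default_rng(0).normal(size=(100, 10))    # placeholder patterns
S = spherical_projection(X)
print(S.shape, np.allclose(np.linalg.norm(S, axis=1), 1.0))
```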


Author(s):  
Alex Burns ◽  
Shital Shah ◽  
Andrew Kusiak

This paper presents a hybrid approach that integrates a genetic algorithm (GA) and data mining to produce control signatures, which define the best parameter intervals leading to a desired outcome. The hybrid method integrates multiple rule sets generated by a data mining algorithm with the fitness function of a GA. The solutions of the GA represent intersections among rules, providing tight parameter bounds. The integration of intuitive rules supplies an explanation for each generated control setting and offers insight into the decision-making process. The ability to analyze parameter trends and the feasible solutions generated by the GA with respect to the outcomes is another benefit of the proposed hybrid method. The presented approach for deriving control signatures is applicable to various domains, such as energy, medical protocols, manufacturing, airline operations, and customer service. Control signatures were developed and tested for the control of a power plant boiler, and they revealed insightful relationships among parameters. The results and benefits of the proposed method for the power plant boiler are discussed in the paper.
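The fitness idea can be sketched as follows: each mined rule is a set of parameter intervals implying the desired outcome, and a candidate parameter vector scores by how many rules it satisfies, so high-fitness solutions sit in the intersection of many rule intervals. The rules, parameter names, and bounds below are hypothetical, not the boiler data from the paper.

```python
import random

# Hypothetical mined rules: (low, high) intervals per parameter that
# imply the desired outcome.
rules = [
    {"temp": (480, 520), "pressure": (95, 110)},
    {"temp": (490, 530), "flow": (9, 14)},
    {"pressure": (100, 115), "flow": (10, 15)},
]
bounds = {"temp": (400, 600), "pressure": (80, 130), "flow": (5, 20)}

def satisfied(rule, x):
    return all(lo <= x[p] <= hi for p, (lo, hi) in rule.items())

def fitness(x):
    # Settings in the intersection of many rule intervals score highest.
    return sum(satisfied(r, x) for r in rules)

def random_setting():
    return {p: random.uniform(lo, hi) for p, (lo, hi) in bounds.items()}

def evolve(pop_size=50, gens=200, mut=0.2):
    pop = [random_setting() for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=fitness, reverse=True)
        pop = pop[:pop_size // 2]
        while len(pop) < pop_size:
            child = dict(random.choice(pop[:10]))   # clone a strong parent
            if random.random() < mut:               # mutate one parameter
                p = random.choice(list(bounds))
                child[p] = random.uniform(*bounds[p])
            pop.append(child)
    return pop[0]

best = evolve()
print(best, "satisfies", fitness(best), "of", len(rules), "rules")
```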


Author(s):  
Ali Smith ◽  
Kate A. Smith

The most critical component of kernel-based learning algorithms is the choice of an appropriate kernel and its optimal parameters. In this paper we propose a rule-based meta-learning approach for automatic selection of the radial basis function (RBF) kernel and its parameters for Support Vector Machine (SVM) classification. First, the best parameters are selected on the basis of prior information about the data, using the Maximum Likelihood (ML) method and the Nelder-Mead (N-M) simplex method. Then the new rule-based meta-learning approach is constructed and tested on 112 datasets of varying sizes, covering both binary-class and multi-class classification problems. We observe that our rule-based methodology provides a significant improvement in computational time, as well as in accuracy in some specific cases.
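As one concrete reading of the parameter-selection step, the sketch below tunes the RBF width gamma and the SVM cost C by Nelder-Mead simplex search over cross-validated accuracy, using SciPy and scikit-learn on a synthetic dataset. The dataset and starting point are placeholders, and the paper's ML-based initialization is omitted.

```python
import numpy as np
from scipy.optimize import minimize
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def neg_cv_accuracy(log_params):
    gamma, C = np.exp(log_params)          # search in log space to stay positive
    svm = SVC(kernel="rbf", gamma=gamma, C=C)
    return -cross_val_score(svm, X, y, cv=5).mean()

res = minimize(neg_cv_accuracy, x0=np.log([0.1, 1.0]), method="Nelder-Mead")
gamma, C = np.exp(res.x)
print(f"gamma={gamma:.4f}, C={C:.4f}, cv accuracy={-res.fun:.3f}")
```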


Author(s):  
Jonathan Mugan ◽  
Klaus Truemper

Frequently, one wants to extend the use of a classification method that, in principle, requires records with True/False values, so that records with rational numbers can be processed. In such cases, the rational numbers must first be replaced by True/False values before the method can be applied. In other cases, a classification method can in principle process records with rational numbers directly, but replacement by True/False values improves its performance. This replacement process is usually called discretization or binarization. This chapter describes a recursive discretization process called Cutpoint, whose key step detects points where classification patterns change abruptly. The chapter includes computational results in which Cutpoint is compared with entropy-based methods that, to date, have been found to be the best discretization schemes. The results indicate that Cutpoint is preferred by certain classification schemes, while entropy-based methods are better for others. Thus, one may view Cutpoint as an additional discretization tool worth considering.
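A simplified version of the key step might look like this: sort the numeric values, consider cuts only where the class label changes, keep the cut with the sharpest shift in class composition between the two sides, and recurse into each half. This sketch assumes binary 0/1 labels and illustrates the idea of cutting where patterns change abruptly; it is not the chapter's exact Cutpoint rule.

```python
def best_cut(pairs):
    """pairs: sorted (value, label) tuples; return the cut value with the
    largest shift in the proportion of class 1 between the two sides."""
    best, best_shift = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] == pairs[i][1] or pairs[i - 1][0] == pairs[i][0]:
            continue   # cut only between distinct values with differing labels
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        shift = abs(left.count(1) / len(left) - right.count(1) / len(right))
        if shift > best_shift:
            best, best_shift = (pairs[i - 1][0] + pairs[i][0]) / 2, shift
    return best

def cutpoints(values, labels, min_size=4):
    """Recursively discretize: cut, then recurse into each half."""
    pairs = sorted(zip(values, labels))
    if len(pairs) < min_size or len({lab for _, lab in pairs}) < 2:
        return []
    cut = best_cut(pairs)
    if cut is None:
        return []
    left = [(v, lab) for v, lab in pairs if v <= cut]
    right = [(v, lab) for v, lab in pairs if v > cut]
    return (cutpoints(*zip(*left), min_size) + [cut] +
            cutpoints(*zip(*right), min_size))

vals = [1, 2, 3, 4, 10, 11, 12, 13]
labs = [0, 0, 0, 0, 1, 1, 1, 1]
print(cutpoints(vals, labs))   # -> [7.0]
```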


Author(s):  
Mehmed Kantardzic ◽  
Pedram Sadeghian ◽  
Walaa M. Sheta

Advances in computing techniques, as well as reductions in the cost of technology, have made large virtual environments viable and widespread. However, efficient navigation within these environments remains problematic for novice users, who often report being lost, disoriented, and lacking the spatial knowledge to make appropriate decisions concerning navigation tasks. In this chapter, we propose the Frequent Wayfinding-Sequence (FWS) methodology to mine the sequences representing the routes taken by experienced users of a virtual environment in order to derive informative navigation models. The models are used to build a navigation assistance interface. We conducted several experiments using our methodology in simulated virtual environments. The results indicate that our approach is efficient in extracting and formalizing recommended routes of travel from the navigation data of previous users of large virtual environments.
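The sequence-mining core can be sketched as counting contiguous landmark subsequences across recorded routes and keeping those above a support threshold. The FWS methodology itself is richer (spatial generalization, route recommendation, and the assistance interface), and the landmark names below are invented.

```python
from collections import Counter

def frequent_sequences(routes, min_support=2, min_len=2):
    """Count contiguous subsequences (once per route) above min_support."""
    counts = Counter()
    for route in routes:
        seen = set()
        for i in range(len(route)):
            for j in range(i + min_len, len(route) + 1):
                seen.add(tuple(route[i:j]))
        counts.update(seen)
    return {s: c for s, c in counts.items() if c >= min_support}

routes = [
    ["gate", "hall", "fountain", "tower"],
    ["gate", "hall", "fountain", "market"],
    ["dock", "hall", "fountain", "tower"],
]
for seq, support in sorted(frequent_sequences(routes).items(),
                           key=lambda kv: -kv[1]):
    print(support, " -> ".join(seq))
```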


Author(s):  
Brian C. Lovell ◽  
Christian J. Walder

This chapter discusses the use of support vector machines (SVM) for business applications. It provides a brief historical background on inductive learning and pattern recognition, and then an intuitive motivation for SVM methods. The method is compared to other approaches, and the tools and background theory required to successfully apply SVM to business applications are introduced. The authors hope that the chapter will help practitioners to understand when the SVM should be the method of choice, as well as how to achieve good results in minimal time.
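For readers who want a starting point, here is a minimal usage sketch of an RBF-kernel SVM on placeholder business-style data with scikit-learn; the chapter itself does not prescribe a particular toolkit. Note the feature scaling step, which matters for kernel methods because features on different ranges distort the kernel.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Placeholder stand-in for a business dataset (e.g., churn prediction).
X, y = make_classification(n_samples=500, n_features=8, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

# Scale features, then fit an RBF-kernel SVM with default-ish settings.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_tr, y_tr)
print("held-out accuracy:", model.score(X_te, y_te))
```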

