Data Mining
Latest Publications


TOTAL DOCUMENTS: 20 (FIVE YEARS: 0)

H-INDEX: 4 (FIVE YEARS: 0)

Published By IGI Global

9781591400516, 9781591400950

Data Mining ◽  
2011 ◽  
pp. 437-452 ◽  
Author(s):  
Jeffrey Hsu

Every day, enormous amounts of information are generated from all sectors, whether business, education, the scientific community, the World Wide Web (WWW), or one of many readily available off-line and online data sources. From all of this, which represents a sizable repository of data and information, it is possible to generate worthwhile and usable knowledge. As a result, the field of Data Mining (DM) and knowledge discovery in databases (KDD) has grown by leaps and bounds and has shown great potential for the future (Han & Kamber, 2001). The purpose of this chapter is to survey many of the critical and future trends in the field of DM, with a focus on those which are thought to have the most promise and applicability to future DM applications.


Data Mining ◽  
2011 ◽  
pp. 350-365 ◽  
Author(s):  
Fay Cobb Payton

Recent attention has turned to the healthcare industry and its use of voluntary community health information network (CHIN) models for e-health and care delivery. This chapter suggests that competition, economic dimensions, political issues, and a group of enablers are the primary determinants of implementation success. Most critical to these implementations is the issue of data management and utilization. Thus, healthcare organizations are finding value in, as well as strategic applications for, mining patient data in general and community data in particular. While significant gains can be obtained and have been noted at the organizational level of analysis, much attention has been given to the individual, where the focal points have centered on privacy and security of patient data. While the privacy debate is a salient issue, data mining (DM) offers broader community-based gains that enable and improve healthcare forecasting, analyses, and visualization.


Data Mining ◽  
2011 ◽  
pp. 1-26 ◽  
Author(s):  
Stefan Arnborg

This chapter reviews the fundamentals of inference and gives a motivation for Bayesian analysis. The method is illustrated with dependency tests in data sets with categorical variables and with Dirichlet prior distributions. Principles and problems in deriving causal conclusions are reviewed and illustrated with Simpson’s paradox. The selection of decomposable and directed graphical models illustrates the Bayesian approach. Bayesian and EM classification are briefly described. The material is illustrated with two cases, one in personalization of media distribution and one in schizophrenia research. These cases illustrate how to approach problem types that exist in many other application areas.


Data Mining ◽  
2011 ◽  
pp. 366-381 ◽  
Author(s):  
Lori K. Long ◽  
Marvin D. Troutt

This chapter focuses on the potential contributions that Data Mining (DM) could make within the Human Resource (HR) function in organizations. We first provide a basic introduction to DM techniques and processes and a survey of the literature on the steps involved in successfully mining this information. We also discuss the importance of data warehousing and datamart considerations. An examination of the contrast between DM and more routine statistical studies is given, and the value of HR information to support a firm’s competitive position and organizational decision-making is considered. Examples of potential applications are outlined in terms of data that is ordinarily captured in HR information systems.


Data Mining ◽  
2011 ◽  
pp. 199-219 ◽  
Author(s):  
Hsin-Chang Yang ◽  
Chung-Hong Lee

Recently, many approaches have been devised for mining various kinds of knowledge from texts. One important application of text mining is to identify themes and the semantic relations among these themes for text categorization. Traditionally, these themes were arranged in a hierarchical manner to achieve effective searching and indexing as well as easy comprehension for human beings. The determination of category themes and their hierarchical structures was mostly done by human experts. In this work, we developed an approach to automatically generate category themes and reveal the hierarchical structure among them. We also used the generated structure to categorize text documents. The document collection was used to train a self-organizing map to form two feature maps. We then analyzed these maps and obtained the category themes and their structure. Although the test corpus contains documents written in Chinese, the proposed approach can be applied to documents written in any language, provided that such documents can be transformed into a list of separated terms.


Data Mining ◽  
2011 ◽  
pp. 142-173 ◽  
Author(s):  
Jerzy W. Grzymala-Busse ◽  
Wojciech Ziarko

The chapter is focused on the data mining aspect of the applications of rough set theory. Consequently, the theoretical part is minimized to emphasize the practical application side of the rough set approach in the context of data analysis and model-building applications. Initially, the original rough set approach is presented and illustrated with detailed examples showing how data can be analyzed with this approach. The next section illustrates the Variable Precision Rough Set Model (VPRSM) to expose similarities and differences between these two approaches. Then, the data mining system LERS, based on a different generalization of the original rough set theory than VPRSM, is presented. Brief descriptions of algorithms are also cited. Finally, some applications of the LERS data mining system are listed.


Data Mining ◽  
2011 ◽  
pp. 106-141
Author(s):  
Massimo Coppola ◽  
Marco Vanneschi

We consider the application of parallel programming environments to develop portable and efficient high-performance data mining (DM) tools. We first assess the need for parallel and distributed DM applications by pointing out the scalability problems of some mining techniques and the need to mine large, possibly geographically distributed databases. We discuss the main issues of exploiting parallel and distributed computation for DM algorithms. A high-level programming language enhances the software engineering aspects of parallel DM, and it simplifies the problems of integration with existing sequential and parallel data management systems, thus leading to programming-efficient and high-performance implementations of applications. We describe a programming environment we have implemented that is based on the parallel skeleton model, and we examine the addition of object-like interfaces toward external libraries and system software layers. These kinds of abstractions will be included in the forthcoming programming environment ASSIST. In the main part of the chapter, as a proof of concept, we describe three well-known DM algorithms: Apriori, C4.5, and DBSCAN. For each problem, we explain the sequential algorithm and a structured parallel version, which is discussed and compared to parallel solutions found in the literature. We also discuss the potential gain in performance and expressiveness from the addition of external objects on the basis of the experiments we have performed so far. We evaluate the approach with respect to performance results, design, and implementation considerations.


Data Mining ◽  
2011 ◽  
pp. 395-420 ◽  
Author(s):  
Jack S. Cook ◽  
Laura L. Cook

This chapter highlights both the positive and negative aspects of Data Mining (DM). Specifically, the social, ethical, and legal implications of DM are examined through recent case law, current public opinion, and small industry-specific examples. There are many issues concerning this topic. Therefore, the purpose of this chapter is to expose the reader to some of the more interesting ones and provide insight into how information systems (IS) professionals and businesses may protect themselves from the negative ramifications associated with improper use of data. The more experience with and exposure to social, ethical, and legal concerns with respect to DM, the better prepared you will be to prevent trouble down the road.


Data Mining ◽  
2011 ◽  
pp. 323-349 ◽  
Author(s):  
Tomas Eklund ◽  
Barbro Back ◽  
Hannu Vanharanta ◽  
Ari Visa

Performing financial benchmarks in today’s information-rich society can be a daunting task. With the evolution of the Internet, access to massive amounts of financial data, typically in the form of financial statements, is widespread. Managers and stakeholders are in need of a tool that allows them to quickly and accurately analyze these data. An emerging technique that may be suited for this application is the self-organizing map. The purpose of this study was to evaluate the performance of self-organizing maps in the financial benchmarking of international pulp and paper companies. For the study, financial data in the form of seven financial ratios were collected, using the Internet as the primary source of information. A total of 77 companies and six regional averages were included in the study. The time frame of the study was the period 1995-2000. A number of benchmarks were performed, and the results were analyzed based on information contained in the annual reports. The results of the study indicate that self-organizing maps can be feasible tools for the financial benchmarking of large amounts of financial data.


Data Mining ◽  
2011 ◽  
pp. 278-300
Author(s):  
Vladimir A. Kulyukin ◽  
Robin Burke

Knowledge of the structural organization of information in documents can be of significant assistance to information systems that use documents as their knowledge bases. In particular, such knowledge is of use to information retrieval systems that retrieve documents in response to user queries. This chapter presents an approach to mining free-text documents for structure that is qualitative in nature. It complements the statistical and machine-learning approaches, insomuch as the structural organization of information in documents is discovered through mining free text for content markers left behind by document writers. The ultimate objective is to find scalable data mining (DM) solutions for free-text documents in exchange for modest knowledge-engineering requirements. The problem of mining free text for structure is addressed in the context of finding structural components of files of frequently asked questions (FAQs) associated with many USENET newsgroups. The chapter describes a system that mines FAQs for structural components. The chapter concludes with an outline of possible future trends in the structural mining of free text.

