Privacy Preserving Data Mining, Concepts, Techniques, and Evaluation Methodologies

Successes and New Directions in Data Mining ◽

2008 ◽

pp. 277-301

Author(s):

Igor Nai Fovino

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Privacy Preserving ◽

Research Field ◽

Sensitive Information ◽

Discovery Process ◽

Privacy Preserving Data Mining ◽

Mining Technology ◽

Evaluation Methodologies ◽

New Research

Intense work on data mining technology and its applications to several domains has produced a large variety of techniques and tools that can automatically and intelligently transform large amounts of data into knowledge relevant to users. However, as with other useful technologies, the knowledge discovery process can be misused. It can be used, for example, by malicious subjects to reconstruct sensitive information for which they do not have explicit access authorization. This type of “attack” cannot easily be detected because the data used to guess the protected information is usually freely accessible. For this reason, many research efforts have recently been devoted to the problem of privacy preservation in data mining. This chapter therefore introduces the reader to this new research field and provides the proper instruments (in terms of concepts, techniques, and examples) to allow a critical understanding of the advantages, limitations, and open issues of privacy preserving data mining techniques.

Semantic Integration and Knowledge Discovery for Environmental Research

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch010 ◽

2008 ◽

pp. 213-235

Author(s):

Zhiyuan Chen ◽

Aryya Gangopadhyay ◽

George Karabatis ◽

Michael McGuire ◽

Claire Welty

Keyword(s):

Data Mining ◽

Knowledge Discovery ◽

Semantic Network ◽

Semantic Integration ◽

Environmental Data ◽

Data Sources ◽

Environmental Research ◽

Related Data ◽

Use Of Data ◽

Data Semantics

Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantics-based techniques to assist users in integrating, navigating, and mining multiple environmental data sources. Our system contains specifications of various environmental data sources and the relationships that are formed among them. User requests are augmented with semantically related data sources and automatically presented as a visual semantic network. In addition, we present a methodology for data navigation and pattern discovery using multi-resolution browsing and data mining. The data semantics are captured and utilized in terms of their patterns and trends at multiple levels of resolution. We present the efficacy of our methodology through experimental results.

Pattern Mining and Clustering on Image Databases

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch009 ◽

2008 ◽

pp. 187-212

Author(s):

Marinette Bouet ◽

Pierre Gançarski ◽

Marie-Aude Aufaure ◽

Omar Boussaïd

Keyword(s):

Pattern Mining ◽

Frequent Pattern Mining ◽

Image Data ◽

Research Direction ◽

Relevant Information ◽

Frequent Pattern ◽

Image Clustering ◽

Image Mining ◽

Clustering Methods ◽

New Research

Analysing and mining image data to derive potentially useful information is a very challenging task. Image mining concerns the extraction of implicit knowledge, image data relationships, associations between image data and other data, or patterns not explicitly stored in the images. Another crucial task is to organize large image volumes so that relevant information can be extracted. In fact, decision support systems are evolving to store and analyse these complex data. This paper presents a survey of the relevant research related to image data processing. We present data warehouse advances that organize large volumes of data linked with images, and then focus on two techniques widely used in image mining. We present clustering methods applied to image analysis and introduce the new research direction concerning pattern mining from large collections of images. While considerable advances have been made in image clustering, there is little research dealing with frequent pattern mining on images. We shall try to understand why.

SeqPAM

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch002 ◽

2008 ◽

pp. 17-38 ◽

Cited By ~ 1

Author(s):

Pradeep Kumar ◽

Raju S. Bapi ◽

P. Radha Krishna

Keyword(s):

Similarity Measure ◽

Clustering Algorithm ◽

Clustering Algorithms ◽

Similarity Measures ◽

Sequential Data ◽

Cluster Validation ◽

Web Personalization ◽

Similarity Preserving ◽

Validation Technique ◽

The Web

With the growth in the number of web users and the need to make information available on the web, the problem of web personalization has become both critical and popular. Developers are trying to customize a web site to the needs of specific users with the help of knowledge acquired from user navigational behavior. Since user page visits are intrinsically sequential in nature, efficient clustering algorithms for sequential data are needed. In this paper, we introduce a similarity preserving function called the sequence and set similarity measure (S3M), which captures both the order of occurrence of page visits and the content of pages. We conducted pilot experiments comparing the results of PAM, a standard clustering algorithm, with two similarity measures: Cosine and S3M. The goodness of the clusters resulting from both measures was computed using a cluster validation technique based on average Levenshtein distance. Results on the pilot dataset established the effectiveness of S3M for sequential data. Based on these results, we proposed a new clustering algorithm, SeqPAM, for clustering sequential data. We tested the new algorithm on two datasets, namely the cti and msnbc datasets. We provided recommendations for web personalization based on the clusters obtained from SeqPAM for the msnbc dataset.
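
The abstract says only that S3M captures both the order of page visits and the content of pages; it does not give the formula. As a minimal sketch, assume the measure is a weighted blend of an order-sensitive term (longest common subsequence length, normalized by the longer sequence) and an order-insensitive term (Jaccard similarity of the page sets). The function names, the weight `p`, and the sample sessions are hypothetical and are not taken from the chapter.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Set overlap of the items in two sequences, ignoring order."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def s3m_like(a, b, p=0.5):
    """Assumed blend: p * order similarity + (1 - p) * content similarity."""
    if not a and not b:
        return 1.0
    order_sim = lcs_length(a, b) / max(len(a), len(b))
    return p * order_sim + (1 - p) * jaccard(a, b)


# Two hypothetical sessions that visit the same pages in reverse order.
session_a = ["home", "news", "sports"]
session_b = ["sports", "news", "home"]
print(s3m_like(session_a, session_b))         # content matches, order mostly differs
print(s3m_like(session_a, session_b, p=0.0))  # pure set similarity
```

With `p = 0` the score reduces to a set-based comparison that cannot tell the two sessions apart, which is the limitation an order-aware measure is meant to address.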

Deterministic Motif Mining in Protein Databases

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch006 ◽

2008 ◽

pp. 116-140 ◽

Cited By ~ 1

Author(s):

Pedro Gabriel Ferreira ◽

Paulo Jorge Azevedo

Keyword(s):

Protein Sequence ◽

Sequence Motif ◽

Structural Level ◽

Sequence Motifs ◽

Motif Analysis ◽

Level Information ◽

Information Patterns ◽

Definition Of ◽

Protein Sequence Motifs ◽

Related Proteins

Protein sequence motifs describe, by means of an enhanced regular expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have implications at the structural and functional level of the proteins. Sequence motif analysis can bring significant improvements towards a better understanding of the protein sequence-structure-function relation. In this chapter we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and their respective specificities. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural level information patterns is also provided.
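
To illustrate how a deterministic motif written in a regular-expression-like syntax can be matched against a sequence, the sketch below assumes a PROSITE-style notation (`x` for any residue, `[...]` for alternatives, `{...}` for exclusions, `(n,m)` for repetition). The chapter's own motif syntax may differ, and the motif and sequence used here are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style motif into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        element = element.replace("x", ".")                      # x = any residue
        element = element.replace("{", "[^").replace("}", "]")   # {P} = anything but P
        element = element.replace("(", "{").replace(")", "}")    # (2,4) = repeat count
        parts.append(element)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return (start, matched_text) for each non-overlapping motif occurrence."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


# Hypothetical motif and sequence, not taken from any motif repository.
motif = "C-x(2,4)-C-{P}-[LIV]"
print(prosite_to_regex(motif))
print(find_motif("MKCAACGLT", motif))
```

The translation handles only the elements listed above; anchors and other extensions of the notation are left out to keep the sketch short.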

On the Usage of Structural Information in Constrained Semi-Supervised Clustering of XML Documents

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch004 ◽

2008 ◽

pp. 67-86

Author(s):

Eduardo Bezerra ◽

Geraldo Xexéo ◽

Marta Mattoso

Keyword(s):

Prior Knowledge ◽

Structural Information ◽

Logical Structure ◽

Structured Data ◽

Constrained Clustering ◽

Xml Documents ◽

Clustering Model ◽

Xml Document ◽

Document Collection

In this chapter, we consider the problem of constrained clustering of documents. We focus on documents that present some form of structural information, in which prior knowledge is provided. Such structured data can guide the algorithm to a better clustering model. We consider the existence of a particular form of information to be clustered: textual documents that present a logical structure represented in XML format. Based on this consideration, we present algorithms that take advantage of XML metadata (structural information), thus improving the quality of the generated clustering models. This chapter also addresses the problem of inconsistent constraints and defines algorithms that eliminate inconsistencies, also based on the existence of structural information associated with the XML document collection.
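
The abstract does not say which constraint types the chapter uses or how its algorithms detect inconsistencies. As a generic illustration only, the sketch below assumes pairwise must-link and cannot-link constraints, a common formulation in constrained clustering, and flags every cannot-link pair whose two documents are already tied together by a chain of must-links. It does not use XML structural information, and the function name and document indices are hypothetical.

```python
def find_inconsistent(n_docs, must_link, cannot_link):
    """Return cannot-link pairs that contradict the must-link constraints."""
    parent = list(range(n_docs))

    def find(i):
        # Follow parent pointers to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Must-link is transitive, so merge linked documents into groups.
    for a, b in must_link:
        parent[find(a)] = find(b)

    # A cannot-link inside a single group can never be satisfied.
    return [(a, b) for a, b in cannot_link if find(a) == find(b)]


# Hypothetical constraints over four documents indexed 0..3.
conflicts = find_inconsistent(
    4,
    must_link=[(0, 1), (1, 2)],
    cannot_link=[(0, 2), (2, 3)],
)
print(conflicts)
```

Which of the conflicting constraints to drop is a separate decision; the chapter bases that choice on structural information, which this sketch does not model.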

Handling Local Patterns in Collaborative Structuring

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch008 ◽

2008 ◽

pp. 167-186

Author(s):

Ingo Mierswa ◽

Katharina Morik ◽

Michael Wurst

Keyword(s):

Machine Learning ◽

Learning Task ◽

Media Management ◽

Cluster Ensembles ◽

Learning Tasks ◽

New Learning ◽

Local Patterns ◽

Uniform Manner ◽

Personal Media ◽

Private Collections

Media collections on the Internet have become a commercial success, and the structuring of large media collections has thus become an issue. Personal media collections are locally structured in very different ways by different users. The level of detail, the chosen categories, and the extensions can differ completely from user to user. Can machine learning also help with structuring personal collections? Since users do not want to have their hand-made structures overwritten, one could deny the benefit of automatic structuring. We argue that what seems to exclude machine learning actually poses a new learning task. We propose a notation which allows us to describe machine learning tasks in a uniform manner. Keeping the demands of structuring private collections in mind, we define the new learning task of localized alternative cluster ensembles. An algorithm solving the new task is presented together with its application to distributed media management.

Modeling and Managing Heterogeneous Patterns

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch005 ◽

2008 ◽

pp. 87-115

Author(s):

Anna Maddalena ◽

Barbara Catania

Keyword(s):

Association Rules ◽

System Architecture ◽

Ad Hoc ◽

Common Market ◽

Market Basket Analysis ◽

Logical Model ◽

Market Basket ◽

Integrated Environment ◽

Different Types ◽

System Prototype

Patterns can be defined as concise, but semantically rich, representations of data. Due to the characteristics of patterns, ad hoc systems are required to manage them efficiently and effectively. Several approaches have been proposed, by both the scientific and industrial communities, to cope with pattern management problems. Unfortunately, most of them deal with only a few types of patterns and mainly concern extraction issues. Little effort has been devoted to defining an overall framework dedicated to the management of different types of patterns, possibly user-defined, in a homogeneous way. In this chapter we present PSYCHO (Pattern based SYstem arCHitecture prOtotype), a system prototype providing an integrated environment for generating, representing, and manipulating heterogeneous patterns, possibly user-defined. After presenting the PSYCHO logical model and architecture, we focus on several examples of its usage concerning common market basket analysis patterns, i.e., association rules and clusters.
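
For readers unfamiliar with the association rules mentioned above, the sketch below computes the two standard quality measures of a rule over a set of market baskets: support (the fraction of baskets containing all items of the rule) and confidence (the fraction of baskets containing the antecedent that also contain the consequent). It illustrates the textbook definitions only; it is not PSYCHO's pattern model or language, and the baskets are invented.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))       # share of baskets with both items
print(confidence(baskets, {"bread"}, {"milk"}))  # rule: bread -> milk
```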

Mining Data-Streams

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch013 ◽

2008 ◽

pp. 302-324

Author(s):

Hanady Abdulsalam ◽

David B. Skillicorn ◽

Pat Martin

Keyword(s):

Data Mining ◽

Data Analysis ◽

Road Traffic ◽

Research Work ◽

Traffic Monitoring ◽

Data Stream Mining ◽

Standard Data ◽

Advantages And Disadvantages ◽

Mining Data Streams ◽

Future Work

Data analysis, or data mining, has been applied to data produced by many kinds of systems. Some systems, for example road traffic monitoring, produce data continuously, and often at high rates. Analyzing such data creates new issues, because it is neither appropriate, nor perhaps possible, to accumulate it and process it using standard data-mining techniques. The information implicit in each data record must be extracted in a limited amount of time and, usually, without the possibility of going back to consider it again. Existing algorithms must be modified to apply in this new setting. This chapter outlines and analyzes the most recent research work in the area of data-stream mining. It gives some sample research ideas or algorithms in this field and concludes with a comparison that shows the main advantages and disadvantages of the algorithms. It also includes a discussion and possible future work in the area.
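
The single-pass constraint described above (each record is seen once, processed in bounded time, and then discarded) can be illustrated with Welford's online algorithm for the running mean and variance. This is a standard streaming technique chosen here for illustration; the abstract does not name the algorithms the chapter surveys, and the class name and the speed readings are hypothetical.

```python
class RunningStats:
    """Mean and sample variance of a stream, updated one record at a time."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Constant time and memory per record; the record is not stored.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical vehicle speed readings arriving from a traffic sensor.
stats = RunningStats()
for speed in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(speed)
print(stats.n, stats.mean, stats.variance)
```

The state kept between records is three numbers regardless of how long the stream runs, which is the property that batch algorithms requiring the full dataset lack.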

Using Mined Patterns for XML Query Answering

Successes and New Directions in Data Mining ◽

10.4018/978-1-59904-645-7.ch003 ◽

2008 ◽

pp. 39-66

Author(s):

Elena Baralis ◽

Paolo Garza ◽

Elisa Quintarelli ◽

Letizia Tanca

Keyword(s):

Association Rules ◽

Query Answering ◽

Semistructured Data ◽

Actual Data ◽

Storage Space ◽

Xml Data ◽

Compact Representations ◽

Approximate Answers ◽

Data Summaries

XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. Several summarized representations of XML data have been proposed, which can both provide succinct information and be directly queried. In this chapter we focus on compact representations based on the extraction of association rules from XML datasets. In particular, we show how patterns can be exploited to (possibly partially) answer queries, either when fast (and approximate) answers are required, or when the actual dataset is not available, e.g., it is currently unreachable. We focus on (a) schema patterns, representing exact or approximate dataset constraints, (b) instance patterns, which represent actual data summaries, and their use for answering queries.

Successes and New Directions in Data Mining
Latest Publications

Published By IGI Global

Privacy Preserving Data Mining, Concepts, Techniques, and Evaluation Methodologies

Semantic Integration and Knowledge Discovery for Environmental Research

Pattern Mining and Clustering on Image Databases

SeqPAM

Deterministic Motif Mining in Protein Databases

On the Usage of Structural Information in Constrained Semi-Supervised Clustering of XML Documents

Handling Local Patterns in Collaborative Structuring

Modeling and Managing Heterogeneous Patterns

Mining Data-Streams

Using Mined Patterns for XML Query Answering
