Successes and New Directions in Data Mining
Latest Publications


TOTAL DOCUMENTS

13
(FIVE YEARS 0)

H-INDEX

1
(FIVE YEARS 0)

Published By IGI Global

9781599046457, 9781599046471

Author(s):  
Igor Nai Fovino

Intense work on data mining technology and its applications in several domains has produced a large variety of techniques and tools that automatically and intelligently transform large amounts of data into knowledge relevant to users. However, as with other useful technologies, the knowledge discovery process can be misused. It can be used, for example, by malicious parties to reconstruct sensitive information that they are not explicitly authorized to access. This type of “attack” cannot easily be detected because the data used to infer the protected information is usually freely accessible. For this reason, much recent research has been devoted to the problem of privacy preservation in data mining. The aim of this chapter is therefore to introduce the reader to this new research field and to provide the proper instruments (in terms of concepts, techniques, and examples) to allow a critical understanding of the advantages, limitations, and open issues of privacy-preserving data mining techniques.


Author(s):  
Zhiyuan Chen ◽  
Aryya Gangopadhyay ◽  
George Karabatis ◽  
Michael McGuire ◽  
Claire Welty

Environmental research and knowledge discovery both require extensive use of data stored in various sources and created in different ways for diverse purposes. We describe a new metadata approach to elicit semantic information from environmental data and implement semantics-based techniques to assist users in integrating, navigating, and mining multiple environmental data sources. Our system contains specifications of various environmental data sources and the relationships that are formed among them. User requests are augmented with semantically related data sources and automatically presented as a visual semantic network. In addition, we present a methodology for data navigation and pattern discovery using multi-resolution browsing and data mining. The data semantics are captured and utilized in terms of their patterns and trends at multiple levels of resolution. We present the efficacy of our methodology through experimental results.


Author(s):  
Marinette Bouet ◽  
Pierre Gançarski ◽  
Marie-Aude Aufaure ◽  
Omar Boussaïd

Analysing and mining image data to derive potentially useful information is a very challenging task. Image mining concerns the extraction of implicit knowledge, image data relationships, associations between image data and other data, or patterns not explicitly stored in the images. Another crucial task is to organize large image volumes so that relevant information can be extracted. In fact, decision support systems are evolving to store and analyse these complex data. This paper presents a survey of the relevant research related to image data processing. We present data warehouse advances that organize large volumes of data linked with images, and then focus on two techniques widely used in image mining. We present clustering methods applied to image analysis and introduce the new research direction of pattern mining from large collections of images. While considerable advances have been made in image clustering, there is little research dealing with frequent pattern mining in images. We shall try to understand why.


Author(s):  
Pradeep Kumar Kumar ◽  
Raju S. Bapi ◽  
P. Radha Krishna

With the growth in the number of web users and the need to make information available on the web, the problem of web personalization has become both critical and popular. Developers try to customize a web site to the needs of specific users with the help of knowledge acquired from user navigational behavior. Since user page visits are intrinsically sequential in nature, efficient clustering algorithms for sequential data are needed. In this paper, we introduce a similarity-preserving function called the sequence and set similarity measure (S3M), which captures both the order of occurrence of page visits and the content of pages. We conducted pilot experiments comparing the results of PAM, a standard clustering algorithm, with two similarity measures: Cosine and S3M. The goodness of the clusters resulting from both measures was computed using a cluster validation technique based on average Levenshtein distance. Results on the pilot dataset established the effectiveness of S3M for sequential data. Based on these results, we proposed a new clustering algorithm, SeqPAM, for clustering sequential data. We tested the new algorithm on two datasets, namely cti and msnbc. We provided recommendations for web personalization based on the clusters obtained from SeqPAM for the msnbc dataset.
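The abstract does not give the formula for S3M, so the following is only a minimal sketch of how a measure combining order and content could look. It assumes the order component is the longest common subsequence length normalized by the longer sequence, the content component is the Jaccard similarity of the visited page sets, and the two are blended with a weight `p`. The function names and the default weight are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def jaccard(a, b):
    """Set overlap of two sequences, ignoring order and repetition."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def seq_set_similarity(a, b, p=0.5):
    """Assumed blend: p * order similarity + (1 - p) * set similarity."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    longest = max(len(a), len(b))
    order_sim = lcs_length(a, b) / longest if longest else 1.0
    return p * order_sim + (1.0 - p) * jaccard(a, b)


# Two hypothetical sessions visiting the same pages in a different order.
session_a = ["home", "news", "sports", "weather"]
session_b = ["home", "sports", "news", "weather"]
print(seq_set_similarity(session_a, session_b))
```

A purely set-based or cosine-style measure would treat these two sessions as identical, whereas the blended score drops below 1 because the order of visits differs.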


Author(s):  
Pedro Gabriel Ferreira ◽  
Paulo Jorge Azevedo

Protein sequence motifs describe, by means of an enhanced regular expression syntax, regions of amino acids that have been conserved across several functionally related proteins. These regions may have implications at the structural and functional level of the proteins. Sequence motif analysis can significantly improve our understanding of the protein sequence-structure-function relation. In this chapter we review the subject of mining deterministic motifs from protein sequence databases. We start by giving a formal definition of the different types of motifs and their respective specificities. Then, we explore the methods available to evaluate the quality and interest of such patterns. Examples of applications and motif repositories are described. We discuss the algorithmic aspects and different methodologies for motif extraction. A brief description of how sequence motifs can be used to extract structural-level information patterns is also provided.
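As an illustration of a motif expressed in a regular-expression-like syntax, the sketch below translates a small subset of the PROSITE-style pattern notation into a Python regular expression and searches a sequence with it. The helper name `prosite_to_regex`, the example pattern, and the example sequence are made up for illustration. Anchors and other notation extensions are not handled, and the chapter's own motif definitions may differ.

```python
import re

# One pattern element: x, a residue letter, [allowed] or {forbidden},
# optionally followed by a repetition such as (3) or (2,4).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (subset) into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = _ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            regex = "."                          # any amino acid
        elif residue.startswith("{"):
            regex = "[^" + residue[1:-1] + "]"   # any amino acid except these
        else:
            regex = residue                      # single residue or [allowed set]
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


# Illustrative pattern and sequence, not taken from any motif repository.
motif = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVM]-{P}-H")
hit = re.search(motif, "AACGGCAAALGHTT")
print(motif, hit.group(0) if hit else None)
```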


Author(s):  
Eduardo Bezerra ◽  
Geraldo Xexéo ◽  
Marta Mattoso

In this chapter, we consider the problem of constrained clustering of documents. We focus on documents that present some form of structural information, in which prior knowledge is provided. Such structured data can guide the algorithm to a better clustering model. We consider a particular form of information to be clustered: textual documents that present a logical structure represented in XML format. Based on this consideration, we present algorithms that take advantage of XML metadata (structural information), thus improving the quality of the generated clustering models. This chapter also addresses the problem of inconsistent constraints and defines algorithms that eliminate inconsistencies, again based on the structural information associated with the XML document collection.


Author(s):  
Ingo Mierswa ◽  
Katharina Morik ◽  
Michael Wurst

Media collections on the internet have become a commercial success, and the structuring of large media collections has thus become an issue. Personal media collections are locally structured in very different ways by different users. The level of detail, the chosen categories, and the extensions can differ completely from user to user. Can machine learning also help in structuring personal collections? Since users do not want to have their hand-made structures overwritten, one could deny the benefit of automatic structuring. We argue that what seems to exclude machine learning actually poses a new learning task. We propose a notation that allows us to describe machine learning tasks in a uniform manner. Keeping the demands of structuring private collections in mind, we define the new learning task of localized alternative cluster ensembles. An algorithm solving the new task is presented together with its application to distributed media management.


Author(s):  
Anna Maddalena ◽  
Barbara Catania

Patterns can be defined as concise but semantically rich representations of data. Because of these characteristics, ad hoc systems are required to manage patterns efficiently and effectively. Several approaches have been proposed, by both the scientific and industrial communities, to cope with pattern management problems. Unfortunately, most of them deal with only a few types of patterns and mainly concern extraction issues. Little effort has been devoted to defining an overall framework for managing different types of patterns, possibly user-defined, in a homogeneous way. In this chapter we present PSYCHO (Pattern based SYstem arCHitecture prOtotype), a system prototype providing an integrated environment for generating, representing, and manipulating heterogeneous patterns, possibly user-defined. After presenting the PSYCHO logical model and architecture, we focus on several examples of its usage concerning common market basket analysis patterns, i.e., association rules and clusters.


Author(s):  
Hanady Abdulsalam ◽  
David B. Skillicorn ◽  
Pat Martin

Data analysis, or data mining, has been applied to data produced by many kinds of systems. Some systems, for example road traffic monitoring, produce data continuously and often at high rates. Analyzing such data creates new issues, because it is neither appropriate, nor perhaps possible, to accumulate it and process it using standard data-mining techniques. The information implicit in each data record must be extracted in a limited amount of time and, usually, without the possibility of going back to consider it again. Existing algorithms must be modified to apply in this new setting. This chapter outlines and analyzes the most recent research work in the area of data-stream mining. It gives some sample research ideas and algorithms in this field and concludes with a comparison that shows the main advantages and disadvantages of the algorithms. It also includes a discussion of possible future work in the area.
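The single-pass constraint described above can be illustrated with a small sketch that is not taken from the chapter. It uses Welford's online update to maintain the mean and sample variance of a stream while examining each record exactly once and keeping only constant state. The class name and the sample readings are hypothetical.

```python
class RunningStats:
    """Mean and sample variance of a stream, updated one record at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value):
        """Fold one record into the summary; the record is never revisited."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def variance(self):
        """Sample variance of the records seen so far (0.0 for fewer than two)."""
        return self._m2 / (self.count - 1) if self.count > 1 else 0.0


# Hypothetical vehicle counts arriving from a road traffic sensor.
stats = RunningStats()
for reading in [2, 4, 4, 4, 5, 5, 7, 9]:
    stats.update(reading)
print(stats.count, stats.mean, stats.variance)
```

Because only three numbers are stored regardless of how many records arrive, the summary fits the bounded-time, no-second-look setting that stream mining algorithms must respect.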


Author(s):  
Elena Baralis ◽  
Paolo Garza ◽  
Elisa Quintarelli ◽  
Letizia Tanca

XML is a rather verbose representation of semistructured data, which may require huge amounts of storage space. Several summarized representations of XML data have been proposed, which can both provide succinct information and be directly queried. In this chapter we focus on compact representations based on the extraction of association rules from XML datasets. In particular, we show how patterns can be exploited to (possibly partially) answer queries, either when fast (and approximate) answers are required, or when the actual dataset is not available, e.g., it is currently unreachable. We focus on (a) schema patterns, representing exact or approximate dataset constraints, (b) instance patterns, which represent actual data summaries, and their use for answering queries.

