Data Mining with Ontologies
Latest Publications


Published By IGI Global

9781599046181, 9781599046204

2011 ◽  
pp. 106-122
Author(s):  
Amandeep S. Sidhu ◽  
Tharam S. Dillon ◽  
Elizabeth Chang

Traditional approaches to integrating protein data generally involve keyword searches, which immediately exclude unannotated or poorly annotated data. An alternative protein annotation approach is to rely on sequence identity, structural similarity, or functional identification. Some proteins have a high degree of sequence identity, structural similarity, or similarity in functions that are unique to members of that family alone. Consequently, this approach cannot be generalized to integrate protein data. Clearly, these traditional approaches have limitations in capturing and integrating data for protein annotation. For these reasons, we have adopted an alternative method that does not rely on keywords or similarity metrics, but instead uses an ontology. In this chapter we discuss the conceptual framework of Protein Ontology, which comprises a hierarchical classification of concepts represented as classes, from general to specific; a list of attributes related to each concept, for each class; a set of relations between classes that link concepts in the ontology in more complicated ways than implied by the hierarchy, to promote reuse of concepts in the ontology; and a set of algebraic operators for querying protein ontology instances.


2011 ◽  
pp. 84-105 ◽  
Author(s):  
Ana Isabel Canhoto

The use of automated systems to collect, process and analyse vast amounts of data is now integral to the operations of many corporations and government agencies; in particular, it has gained recognition as a strategic tool in the war on crime. Data mining, the technology behind such analysis, has its origins in quantitative sciences. Yet analysts face important issues of a cognitive nature, both in terms of the input for the data mining effort and in terms of the analysis of the output. Domain knowledge and bias information influence which patterns in the data are deemed useful and, ultimately, valid. This chapter addresses the role of cognition and context in the interpretation and validation of mined knowledge. We propose the use of ontology charts and norm specifications to map how varying levels of access to information and exposure to specific social norms lead to divergent views of mined knowledge.


2011 ◽  
pp. 37-64 ◽  
Author(s):  
Brigitte Trousse ◽  
Marie-Aude Aufaure ◽  
Bénédicte Le Grand ◽  
Yves Lechevallier ◽  
Florent Masseglia

This chapter proposes an original approach for ontology management in the context of Web-based information systems. Our approach relies on usage analysis of the chosen Web site, as a complement to existing approaches based on content analysis of Web pages. Our methodology is based on knowledge discovery techniques, applied mainly to HTTP Web logs, and aims at comparing the knowledge discovered about usage with the existing ontology in order to propose new relations between concepts. We illustrate our approach on a Web site provided by French local tourism authorities (for the city of Metz), using clustering and sequential pattern discovery methods. One major contribution of this chapter is thus the application of usage analysis to support ontology evolution and/or Web site reorganization.


Author(s):  
Sofia Stamou ◽  
Alexandros Ntoulas ◽  
Dimitris Christodoulakis

In this paper we study how we can organize the continuously proliferating Web content into topical categories, also known as Web directories. In this respect, we have implemented a system, named TODE, that uses a Topical Ontology for Directories’ Editing. First, we describe the process for building our ontology of Web topics, which are treated in TODE as directories’ topics. Then, we present how TODE interacts with the ontology in order to categorize Web pages into the ontology’s topics, and we experimentally study our system’s efficiency in grouping Web pages thematically. We evaluate TODE’s performance by comparing its resulting categorization for a number of pages to the categorization the same pages display in Google Directory, as well as to the categorizations delivered for the same set of pages and topics by a Bayesian classifier. Results indicate that our model has noticeable potential for reducing the human-effort overheads associated with populating Web directories. Furthermore, experimental results imply that the use of a rich topical ontology significantly increases classification accuracy for dynamic contents.


2011 ◽  
pp. 237-255
Author(s):  
Evangelos Kotsifakos ◽  
Gerasimos Marketos ◽  
Yannis Theodoridis

Pattern Base Management Systems (PBMS) have been introduced as an effective way to manage the high volume of patterns available nowadays. PBMS provide pattern management functionality in the same way that a Database Management System provides data management functionality. However, not all the extracted patterns are interesting; some are trivial and insignificant because they do not make sense according to the domain knowledge. Thus, in order to automate the pattern evaluation process, we need to incorporate the domain knowledge into it. We propose the integration of PBMS and ontologies as a solution to the need of many scientific fields for efficient extraction of useful information from large databases and for the exploitation of knowledge. In this chapter, we describe the potential of this integration and the issues that should be considered when introducing an XML-based PBMS. We use a case study of data mining over scientific (seismological) data to illustrate the proposed integrated PBMS and ontology environment.


2011 ◽  
pp. 182-210 ◽  
Author(s):  
Peter Brezany ◽  
Ivan Janciak ◽  
A Min Tjoa

This chapter introduces an ontology-based framework for automated construction of complex interactive data mining workflows as a means of improving the productivity of Grid-enabled data exploration systems. The authors first characterize existing manual and automated workflow composition approaches and then present their solution, called GridMiner Assistant (GMA), which addresses the whole life cycle of the knowledge discovery process. GMA is specified in the OWL language and is being developed around a novel data mining ontology, which is based on concepts of industry standards like the Predictive Model Markup Language, Cross Industry Standard Process for Data Mining and Java Data Mining API. The ontology introduces basic data mining concepts like data mining elements, tasks, services, etc. In addition, conceptual and implementation architectures of the framework are presented and its application to an example taken from the medical domain is illustrated. The authors hope that further research and development of this framework can lead to productivity improvements, which can have a significant impact on many real-life spheres. For example, it can be a crucial factor in the achievement of scientific discoveries, optimal treatment of patients, productive decision making, cutting costs, etc.


2011 ◽  
pp. 145-158
Author(s):  
Stanley Loh ◽  
Daniel Lichtnow ◽  
Thyago Borges ◽  
Gustavo Piltcher

This chapter investigates different aspects of the construction of a domain ontology for a content-based recommender system. The recommender system suggests textual electronic documents from a Digital Library, based on documents read by the users and on textual messages posted in electronic discussions through a web chat. The domain ontology is used to represent the user’s interests and the content of the documents. In this context, the ontology is composed of a hierarchy of concepts and keywords. Each concept has a vector of keywords with associated weights. Keywords are used to identify the content of the texts (documents and messages) through the application of text mining techniques. The chapter discusses different approaches for constructing the domain ontology, including the use of text mining software tools for supervised learning, the intervention of domain experts in the engineering process and the use of a normalization step.
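The idea of a concept carrying a vector of weighted keywords can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the concept names, keywords, weights, and the simple weighted-count scoring rule are hypothetical and are not taken from the chapter.

```python
import re
from collections import Counter

# Hypothetical toy ontology: each concept has a vector of weighted keywords.
CONCEPT_KEYWORDS = {
    "databases": {"sql": 0.9, "index": 0.5, "query": 0.7},
    "machine_learning": {"training": 0.8, "classifier": 0.9, "query": 0.2},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def score_concepts(text, concept_keywords=CONCEPT_KEYWORDS):
    """Score each concept as the sum of keyword weight times keyword count."""
    counts = Counter(tokenize(text))
    return {
        concept: sum(weight * counts[kw] for kw, weight in keywords.items())
        for concept, keywords in concept_keywords.items()
    }


def best_concept(text, concept_keywords=CONCEPT_KEYWORDS):
    """Return the highest-scoring concept, or None if no keyword matched."""
    scores = score_concepts(text, concept_keywords)
    concept = max(scores, key=scores.get)
    return concept if scores[concept] > 0 else None


if __name__ == "__main__":
    print(best_concept("How do I index a SQL query?"))
```

The same scoring could be applied to a chat message and to a library document, so that both are described by concepts of the same ontology before a recommendation is made.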


2011 ◽  
pp. 65-82 ◽  
Author(s):  
Minh Hai Pham ◽  
Delphine Bernhard ◽  
Gayo Diallo ◽  
Radja Messai ◽  
Michel Simonet

Clustering similar documents is a difficult task for text data mining. Difficulties stem especially from the way documents are translated into numerical vectors. In this paper, we present a method which uses a Self-Organizing Map (SOM) to cluster medical documents. The originality of the method is that it does not rely on the words shared by documents but rather on concepts taken from an ontology. Our goal is to cluster various medical documents into thematically consistent groups (e.g. grouping all the documents related to cardiovascular diseases). Before applying the SOM algorithm, documents have to go through several pre-processing steps. First, textual data have to be extracted from the documents, which can be in either PDF or HTML format. Documents are then indexed, using two kinds of indexing units: stems and concepts. After indexing, documents can be numerically represented by vectors whose dimensions correspond to indexing units. These vectors store the weight of each indexing unit within the document they represent. They are given as inputs to a SOM, which arranges the corresponding documents on a two-dimensional map. We have compared the results for two indexing schemes: stem-based indexing and conceptual indexing. We show that using an ontology for document clustering has several advantages. It is possible to cluster documents written in several languages, since concepts are language-independent. This is especially helpful in the medical domain, where research articles are written in different languages. Another advantage is that the use of concepts helps reduce the size of the vectors, which, in turn, reduces processing time.
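The conceptual indexing step can be sketched in Python as follows. This is only an illustration under stated assumptions: the term-to-concept lexicon and its concept identifiers are invented, and relative frequency is used as the weight because the abstract does not say which weighting scheme the authors apply. The SOM itself is not shown.

```python
import re
from collections import Counter

# Hypothetical lexicon mapping terms in several languages to concept IDs.
TERM_TO_CONCEPT = {
    "heart": "C_HEART", "coeur": "C_HEART", "cardiac": "C_HEART",
    "artery": "C_ARTERY", "artère": "C_ARTERY",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def concept_counts(text):
    """Count occurrences of each concept referred to by the text."""
    return Counter(
        TERM_TO_CONCEPT[token] for token in tokenize(text)
        if token in TERM_TO_CONCEPT
    )


def to_vector(counts, dimensions):
    """Build a vector of relative frequencies, one entry per indexing unit."""
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in dimensions]


DIMENSIONS = sorted(set(TERM_TO_CONCEPT.values()))

if __name__ == "__main__":
    english = to_vector(concept_counts("The heart and the artery"), DIMENSIONS)
    french = to_vector(concept_counts("Le coeur et l'artère"), DIMENSIONS)
    print(DIMENSIONS, english, french)
```

In this toy case the English and French sentences share no words, yet they receive identical concept vectors, and the vector has one dimension per concept rather than one per distinct word.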


2011 ◽  
pp. 18-36 ◽  
Author(s):  
Xuan Zhou ◽  
James Geller

This chapter introduces Raising as an operation which is used as a pre-processing step for Data Mining. In the Web Marketing Project, people’s demographic and interest information has been collected from the Web. Rules have been derived using this information as input for data mining. The Raising step takes advantage of an interest ontology to advance data mining and to improve rule quality. The definition and implementation of Raising are presented in this chapter. Furthermore, the effects caused by Raising are analyzed in detail, showing an improvement of the support and confidence values of useful association rules for marketing purposes.
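A small Python sketch can show why generalizing interests through an ontology may raise support and confidence. It rests on assumptions: Raising is read here as replacing a specific interest with its parent in the interest ontology, only one level is raised, and the ontology, records, and rule are invented for illustration. The chapter's actual definition may differ.

```python
# Hypothetical interest ontology, stored as child -> parent links.
PARENT = {
    "tennis": "sports", "soccer": "sports",
    "jazz": "music", "opera": "music",
}

# Hypothetical (demographic, interest) records.
RECORDS = [
    ("age_20_29", "tennis"),
    ("age_20_29", "soccer"),
    ("age_20_29", "jazz"),
    ("age_30_39", "opera"),
]


def raise_interest(interest, parent=PARENT):
    """Replace an interest by its parent concept; keep it if it has none."""
    return parent.get(interest, interest)


def support_confidence(records, antecedent, consequent):
    """Compute support and confidence of the rule antecedent -> consequent."""
    n_ante = sum(1 for demo, _ in records if demo == antecedent)
    n_both = sum(
        1 for demo, interest in records
        if demo == antecedent and interest == consequent
    )
    support = n_both / len(records)
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence


RAISED = [(demo, raise_interest(interest)) for demo, interest in RECORDS]

if __name__ == "__main__":
    print(support_confidence(RECORDS, "age_20_29", "tennis"))
    print(support_confidence(RAISED, "age_20_29", "sports"))
```

On this toy data the rule about the specific interest has support 1/4 and confidence 1/3, while the raised rule about the parent concept has support 1/2 and confidence 2/3, because several specific interests now count toward the same general one.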


2011 ◽  
pp. 123-144
Author(s):  
Josiane Mothe ◽  
Nathalie Hernandez

This chapter introduces a method for reusing a thesaurus built for a given domain in order to create new resources of a higher semantic level in the form of an ontology. Considering ontologies for data-mining tasks relies on the intuition that the meaning of textual information depends on the conceptual relations between the objects to which it refers rather than on the linguistic and statistical relations of its content. To put forward such advanced mechanisms, the first step is to build the ontologies. The originality of the method is that it is based both on the knowledge extracted from a thesaurus and on the knowledge semi-automatically extracted from a textual corpus. The whole process is semi-automated and experts’ tasks are limited to validating certain steps. In parallel, we have developed mechanisms based on the obtained ontology to accomplish a science monitoring task. An example will be given.

