Semantic Data Mining

Over the past few decades, data mining has emerged as a field of research critical to understanding and assimilating the large stores of data accumulated by corporations, government agencies, and laboratories. Early on, mining algorithms and techniques were limited to relational data sets coming directly from Online Transaction Processing (OLTP) systems, or from a consolidated enterprise data warehouse. However, recent work has begun to extend the limits of data mining strategies to include “semi-structured data such as HTML and XML texts, symbolic sequences, ordered trees and relations represented by advanced logics” (Washio & Motoda, 2003).

Semantic Data Mining

Encyclopedia of Data Warehousing and Mining ◽

10.4018/978-1-59140-557-3.ch190 ◽

2011 ◽

pp. 1010-1014

Author(s):

Protima Banerjee ◽

Xiaohua Hu ◽

Illhoi Yoo

Keyword(s):

Data Mining ◽

Data Warehouse ◽

Transaction Processing ◽

Data Sets ◽

Ordered Trees ◽

Semantic Data ◽

Symbolic Sequences ◽

Semantic Data Mining ◽

Enterprise Data Warehouse ◽

Mining Algorithms

Semantic Data Mining of Financial News Articles

Discovery Science - Lecture Notes in Computer Science ◽

10.1007/978-3-642-40897-7_20 ◽

2013 ◽

pp. 294-307 ◽

Cited By ~ 8

Author(s):

Anže Vavpetič ◽

Petra Kralj Novak ◽

Miha Grčar ◽

Igor Mozetič ◽

Nada Lavrač

Keyword(s):

Data Mining ◽

Financial News ◽

Semantic Data ◽

Semantic Data Mining

Pattern Based Feature Construction in Semantic Data Mining

International Journal on Semantic Web and Information Systems ◽

10.4018/ijswis.2014010102 ◽

2014 ◽

Vol 10 (1) ◽

pp. 27-65 ◽

Cited By ~ 11

Author(s):

Agnieszka Ławrynowicz ◽

Jędrzej Potoniec

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Semantic Features ◽

Semantic Data ◽

Data Mining Approach ◽

Meta Learning ◽

New Type ◽

Domain Ontologies ◽

Semantic Data Mining

The authors propose a new method for mining sets of patterns for classification, where patterns are represented as SPARQL queries over RDFS. The method contributes to so-called semantic data mining, a data mining approach in which domain ontologies are used as background knowledge and the new challenge is to mine knowledge encoded in those ontologies rather than purely empirical data alone. The authors have developed a tool that implements this approach. Using this tool, they conducted an experimental evaluation that includes a comparison of their method with state-of-the-art approaches to the classification of semantic data, and an experimental study within an emerging subfield of meta-learning called semantic meta-mining. The paper's most important contributions to the state of the art are as follows. For pattern mining research, and relational learning in general, it contributes a new algorithm for discovering a new type of pattern. For Semantic Web research, it illustrates theoretically and empirically how semantic, structured data can be used in traditional machine learning methods through a pattern-based approach to constructing semantic features.
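The idea of turning graph patterns into classification features can be illustrated with a minimal sketch. It is not the authors' tool or algorithm: the triples, predicate names, and patterns below are hypothetical, the patterns are hand-written rather than mined, and a small in-memory matcher stands in for a SPARQL engine. Each pattern becomes one binary feature that records whether the pattern matches a given example.

```python
# Toy knowledge graph as (subject, predicate, object) triples.
# All entity and predicate names are hypothetical.
TRIPLES = {
    ("alice", "worksFor", "acme"),
    ("bob", "worksFor", "initech"),
    ("acme", "type", "TechCompany"),
    ("initech", "type", "Bank"),
    ("alice", "hasDegree", "phd"),
}

# Each pattern is a conjunction of triple patterns. "?x" is the example being
# described; other "?"-prefixed terms are existential variables.
PATTERNS = [
    [("?x", "worksFor", "?c"), ("?c", "type", "TechCompany")],
    [("?x", "hasDegree", "phd")],
    [("?x", "worksFor", "?c"), ("?c", "type", "Bank")],
]


def matches(pattern, bindings, triples):
    """Return True if all triple patterns hold under some extension of bindings."""
    if not pattern:
        return True
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new_bindings = dict(bindings)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its existing binding.
                if new_bindings.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok and matches(rest, new_bindings, triples):
            return True
    return False


def feature_vector(entity, patterns=PATTERNS, triples=TRIPLES):
    """One binary feature per pattern: 1 if the pattern matches the entity."""
    return [int(matches(p, {"?x": entity}, triples)) for p in patterns]


if __name__ == "__main__":
    for person in ("alice", "bob"):
        print(person, feature_vector(person))
```

The resulting vectors can be passed to any conventional classifier, which is the sense in which structured semantic data becomes usable by traditional machine learning methods.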

An Unsupervised Approach for Determining Link Specifications

International Journal of Information Technology and Web Engineering ◽

10.4018/ijitwe.2018100106 ◽

2018 ◽

Vol 13 (4) ◽

pp. 104-123

Author(s):

Khayra Bencherif ◽

Mimoun Malki ◽

Djamel Amar Bensaber

Keyword(s):

Linked Data ◽

Open Data ◽

Real Data ◽

Knowledge Bases ◽

Structured Data ◽

Data Sets ◽

Novel Approach ◽

Link Discovery ◽

Unsupervised Approach

This article describes how the Linked Open Data Cloud project allows data providers to publish structured data on the web according to the Linked Data principles. In this context, several link discovery frameworks have been developed for connecting entities contained in knowledge bases. To be effective, the link discovery task requires a suitable link configuration that specifies the similarity conditions. Unfortunately, such configurations are specified manually, which makes the link discovery task tedious and more difficult for users. In this article, the authors address this drawback by proposing a novel approach for the automatic determination of link specifications. The proposed approach uses a neural network model to combine a set of existing metrics into a compound one. The authors evaluate the effectiveness of the proposed approach in three experiments using real data sets from the LOD Cloud. In addition, the proposed approach is compared against existing link specification approaches, and it outperforms them in most experiments.

Using Ontologies in Semantic Data Mining with SEGS and g-SEGS

Discovery Science - Lecture Notes in Computer Science ◽

10.1007/978-3-642-24477-3_15 ◽

2011 ◽

pp. 165-178 ◽

Cited By ~ 15

Author(s):

Nada Lavrač ◽

Anže Vavpetič ◽

Larisa Soldatova ◽

Igor Trajkovski ◽

Petra Kralj Novak

Keyword(s):

Data Mining ◽

Semantic Data ◽

Semantic Data Mining

Semantic data mining of short utterances

IEEE Transactions on Speech and Audio Processing ◽

10.1109/tsa.2005.851875 ◽

2005 ◽

Vol 13 (5) ◽

pp. 672-680 ◽

Cited By ~ 1

Author(s):

Lee Begeja ◽

H. Drucker ◽

D. Gibbon ◽

P. Haffner ◽

Zhu Liu ◽

...

Keyword(s):

Data Mining ◽

Semantic Data ◽

Semantic Data Mining ◽

Short Utterances

Taxonomy-based data representation for data mining: an example of the magnitude of risk associated with H. pylori infection

BioData Mining ◽

10.1186/s13040-021-00271-w ◽

2021 ◽

Vol 14 (1) ◽

Author(s):

Inese Polaka ◽

Danute Razuka-Ebela ◽

Jin Young Park ◽

Marcis Leja

Keyword(s):

Data Mining ◽

Data Representation ◽

Study Data ◽

Significant Loss ◽

Specific Information ◽

Data Sets ◽

Potential Risk Factors ◽

H Pylori ◽

The Individual ◽

Study Participants

Background: The amount of available and potentially significant data describing study subjects is ever growing with the introduction and integration of different registries and data banks. The individual attributes of these data are not always necessary; more often, membership in a specific group (e.g. diet, social ‘bubble’, living area) is enough to build a successful machine learning or data mining model without overfitting it. Therefore, in this article we propose an approach to building taxonomies using clustering to replace detailed data from large heterogeneous data sets from different sources, while improving interpretability. We used the GISTAR study database, which holds exhaustive self-assessment questionnaire data, to demonstrate this approach in the task of differentiating between H. pylori positive and negative study participants and assessing their potential risk factors. We compared the results of taxonomy-based classification with the results of classification using raw data. Results: Our approach was evaluated using 6 classification algorithms that induce rule-based or tree-based classifiers. The taxonomy-based classification results show no significant loss of information, with similar or up to 2.5% better classification accuracy. Information held by 10 or more attributes can be replaced by one attribute indicating membership in a cluster in a hierarchy at a specific cut. The clusters created this way can be easily interpreted by researchers (doctors, epidemiologists) and describe the co-occurring features in the group, which is significant for the specific task. Conclusions: While there are always features and measurements that must be used in data analysis as they are, using taxonomies in parallel to describe study subjects makes it possible to use membership in specific naturally occurring groups and to assess the impact of that membership on an outcome. This can decrease the risk of overfitting (picking attributes and values specific to the training set without explaining the underlying conditions), improve the accuracy of the models, and improve privacy protection of study participants by decreasing the amount of specific information used to identify the individual.
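The replacement of several attributes by one cluster-membership attribute can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: the questionnaire answers are invented, and the Hamming distance, average linkage, and the choice of a two-cluster cut are all hypothetical choices made for the example.

```python
def hamming(a, b):
    """Number of positions at which two answer vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def agglomerate(rows, n_clusters):
    """Average-linkage agglomerative clustering.

    Merging stops when n_clusters remain, which corresponds to cutting the
    hierarchy at that level. Returns one cluster label per row.
    """
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(
                    hamming(rows[a], rows[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                dist = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or dist < best[0]:
                    best = (dist, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for member in members:
            labels[member] = label
    return labels


# Hypothetical yes/no answers to five related questions, one row per participant.
answers = [
    (1, 1, 1, 0, 0),
    (1, 1, 0, 0, 0),
    (0, 0, 0, 1, 1),
    (0, 0, 1, 1, 1),
]

# One derived attribute (the cluster label) stands in for the five originals.
group = agglomerate(answers, n_clusters=2)

if __name__ == "__main__":
    print(group)
```

A classifier trained on `group` instead of the five raw columns sees only group membership, which is the source of the interpretability and privacy benefits the abstract describes.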

Pattern Based Feature Construction in Semantic Data Mining

Mobile Computing and Wireless Networks ◽

10.4018/978-1-4666-8751-6.ch036 ◽

2016 ◽

pp. 823-864

Author(s):

Agnieszka Ławrynowicz ◽

Jędrzej Potoniec

Keyword(s):

Data Mining ◽

Pattern Mining ◽

Semantic Features ◽

Semantic Data ◽

Data Mining Approach ◽

Meta Learning ◽

New Type ◽

Domain Ontologies ◽

Semantic Data Mining

A Survey Paper on Ontology-Based Approaches for Semantic Data Mining

International Journal on Recent and Innovation Trends in Computing and Communication ◽

10.17762/ijritcc2321-8169.150480 ◽

2015 ◽

Vol 3 (4) ◽

pp. 2137-2141 ◽

Cited By ~ 1

Author(s):

Priti V ◽

Keyword(s):

Data Mining ◽

Survey Paper ◽

Semantic Data ◽

Semantic Data Mining
