Semantic Web
Latest Publications

Total documents: 478 (five years: 166)
H-index: 38 (five years: 6)
Published by: IOS Press
ISSN: 2210-4968, 1570-0844

Semantic Web ◽  
2022 ◽  
pp. 1-24
Author(s):  
Marlene Goncalves ◽  
David Chaves-Fraga ◽  
Oscar Corcho

With the increase of data volume in heterogeneous datasets that are being published following Open Data initiatives, new operators are necessary to help users find the subset of data that best satisfies their preference criteria. Quantitative approaches such as top-k queries may not be the most appropriate, as they require the user to assign weights, which may not be known beforehand, to a scoring function. Under the qualitative approach, which includes the well-known skyline operator, preference criteria are in certain cases more intuitive and can be expressed more naturally. In this paper, we address the problem of evaluating SPARQL qualitative preference queries over an Ontology-Based Data Access (OBDA) approach, which provides uniform access over multiple, heterogeneous data sources. Our main contribution is Morph-Skyline++, a framework for processing SPARQL qualitative preference queries by directly querying relational databases. Our framework implements a technique that translates SPARQL qualitative preference queries directly into queries that can be evaluated by a relational database management system. We evaluate our approach over different scenarios, reporting the effects of data distribution, data size, and query complexity on the performance of our proposed technique in comparison with state-of-the-art techniques. The results suggest that execution time can be reduced by up to two orders of magnitude in comparison to current techniques, scaling up to larger datasets while precisely identifying the result set.
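
A minimal illustration of the kind of query such a translation can produce, assuming a hypothetical relational table apartments(price, distance) and the classic NOT EXISTS formulation of the skyline operator (this is a sketch, not the Morph-Skyline++ implementation itself):

```python
# Skyline sketch over a toy relational table: a tuple survives if no other tuple
# is at least as good on every criterion (lower price, lower distance) and
# strictly better on at least one.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE apartments (id INTEGER, price REAL, distance REAL)")
conn.executemany("INSERT INTO apartments VALUES (?, ?, ?)",
                 [(1, 500, 2.0), (2, 450, 3.5), (3, 600, 1.0), (4, 700, 4.0)])

skyline_sql = """
SELECT a.id, a.price, a.distance
FROM apartments a
WHERE NOT EXISTS (
    SELECT 1 FROM apartments b
    WHERE b.price <= a.price AND b.distance <= a.distance
      AND (b.price < a.price OR b.distance < a.distance)
)
"""
print(conn.execute(skyline_sql).fetchall())  # ids 1, 2 and 3 remain in the skyline
```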


Semantic Web ◽  
2022 ◽  
pp. 1-8
Author(s):  
Robert Forkel ◽  
Harald Hammarström

Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org). In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, the technical infrastructure, and the update/version-tracking systematics. Since our understanding of the target domain – the dialects, languages and language families of the entire world – is continually evolving, changes and updates are relatively common. The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such, the glottocode system responds to an important challenge in the realm of Linguistic Linked Data, with numerous NLP applications.
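
As a small illustration of the identifier system (not taken from the paper), the sketch below checks the typical glottocode shape of four lowercase alphanumeric characters followed by four digits and builds the corresponding Glottolog resource URI; the code "abcd1234" is a hypothetical example:

```python
# Validate a glottocode's shape and derive its stable Glottolog URI.
import re

GLOTTOCODE = re.compile(r"[a-z0-9]{4}\d{4}")  # typical pattern, e.g. "abcd1234"

def glottocode_uri(code: str) -> str:
    """Return the Glottolog resource URI for a syntactically valid glottocode."""
    if not GLOTTOCODE.fullmatch(code):
        raise ValueError(f"not a well-formed glottocode: {code!r}")
    return f"https://glottolog.org/resource/languoid/id/{code}"

print(glottocode_uri("abcd1234"))  # hypothetical code, for illustration only
```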


Semantic Web ◽  
2022 ◽  
pp. 1-24
Author(s):  
Jan Portisch ◽  
Nicolas Heist ◽  
Heiko Paulheim

Knowledge Graph Embeddings, i.e., projections of entities and relations to lower-dimensional spaces, have been proposed for two purposes: (1) providing an encoding for data mining tasks, and (2) predicting links in a knowledge graph. So far, both lines of research have been pursued largely in isolation from each other, each with its own benchmarks and evaluation methodologies. In this paper, we argue that both tasks are actually related, and we show that the first family of approaches can also be used for the second task and vice versa. In two series of experiments, we provide a comparison of both families of approaches on both tasks, which, to the best of our knowledge, has not been done before. Furthermore, we discuss the differences in the similarity functions evoked by the different embedding approaches.
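
A toy sketch of how a single trained embedding can serve both purposes, assuming TransE-style vectors (h + r ≈ t) and illustrative entity names; it is not tied to any specific system from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 50
# Pretend these vectors were learned by a TransE-style model.
entity_vecs = {e: rng.normal(size=dim) for e in ["Berlin", "Germany", "Paris", "France"]}
relation_vecs = {"capitalOf": rng.normal(size=dim)}

# (1) Link prediction: score a candidate triple; higher (less negative) = more plausible.
def transe_score(h, r, t):
    return -np.linalg.norm(entity_vecs[h] + relation_vecs[r] - entity_vecs[t])

print(transe_score("Berlin", "capitalOf", "Germany"))

# (2) Data mining: reuse the same entity vectors as features for a downstream task,
# here a toy nearest-centroid "is this entity a city?" decision.
cities = np.stack([entity_vecs["Berlin"], entity_vecs["Paris"]])
countries = np.stack([entity_vecs["Germany"], entity_vecs["France"]])

def is_city(e):
    v = entity_vecs[e]
    return np.linalg.norm(v - cities.mean(0)) < np.linalg.norm(v - countries.mean(0))

print(is_city("Paris"))
```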


Semantic Web ◽  
2022 ◽  
pp. 1-16
Author(s):  
Hu Zhang ◽  
Jingjing Zhou ◽  
Ru Li ◽  
Yue Fan

With the rapid development of neural networks, much attention has been focused on network embedding for complex network data, which aims to learn low-dimensional representations of nodes in a network and to apply the learned representations effectively to various graph-based analytical tasks. Two typical kinds of models exist: shallow random-walk network representation methods and deep learning models such as graph convolutional networks (GCNs). The former capture the linear structure of the network using depth-first search (DFS) and breadth-first search (BFS), whereas the Hierarchical GCN (HGCN) is an unsupervised graph embedding method that describes the global nonlinear structure of the network by aggregating node information. However, neither kind of model can simultaneously capture the nonlinear and linear structural information of nodes. Thus, the nodal characteristics of nonlinear and linear structures are explored in this paper, and an unsupervised representation method based on HGCN that jointly learns shallow and deep models is proposed. Experiments on node classification and dimension-reduction visualization are carried out on citation, language, and traffic networks. The results show that, compared with existing shallow network representation models and deep network models, the proposed model achieves better performance in terms of micro-F1, macro-F1 and accuracy scores.
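
The following toy sketch (not the authors' model) illustrates the general idea of combining a deep, GCN-style propagation step with shallow random-walk statistics by concatenating the two representations:

```python
import numpy as np

# Toy 4-node graph and one-hot node features.
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)

# Deep / nonlinear part: one symmetrically normalised propagation, ReLU(A_norm X W).
A_hat = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
deep_emb = np.maximum(A_norm @ X @ W, 0)

# Shallow / linear part: co-occurrence counts from short random walks,
# a stand-in for DeepWalk/node2vec-style statistics.
walks = np.zeros((4, 4))
for start in range(4):
    node = start
    for _ in range(20):
        node = rng.choice(np.flatnonzero(A[node]))
        walks[start, node] += 1

joint_emb = np.hstack([deep_emb, walks / walks.sum(1, keepdims=True)])
print(joint_emb.shape)  # (4, 12): deep (8) + shallow (4) dimensions per node
```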


Semantic Web ◽  
2022 ◽  
pp. 1-17
Author(s):  
Sukhwan Jung ◽  
Aviv Segev

Topic evolution helps the understanding of current research topics and their histories by automatically modeling and detecting the set of shared research fields in academic publications as topics. This paper provides a generalized analysis of a topic evolution method for predicting the emergence of new topics, which can operate on any dataset in which topics are defined by the relationships of their past neighborhoods and extrapolated to future topics. Twenty sample topic networks were built with various fields-of-study keywords as seeds, covering domains such as business, materials, diseases, and computer science, from the Microsoft Academic Graph dataset. A binary classifier was trained for each topic network using 15 structural features of emerging and existing topics, and it consistently achieved accuracy and F1 above 0.91 for all twenty datasets over the period 2000 to 2019. Feature selection showed that the models retained most of their performance with only one-third of the tested features. Incremental learning was tested within the same topic over time and between different topics, resulting in slight performance improvements in both cases. This indicates that there is an underlying pattern to the neighbors of new topics that is common across research domains, likely beyond the sample topics used in the experiment. The results show that network-based new topic prediction can be applied to various research domains with different research patterns.
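
A minimal sketch of the core experimental step, training a binary classifier on structural features of candidate topics; the features and data below are made up and merely stand in for the 15 features used in the paper:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 500
# Illustrative per-candidate features (e.g. degree, clustering coefficient,
# mean neighbour age), not the actual feature set from the paper.
X = rng.normal(size=(n, 3))
# Synthetic label: "emerging topic" depends on the first two features plus noise.
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("F1:", f1_score(y_te, clf.predict(X_te)))
```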


Semantic Web ◽  
2022 ◽  
pp. 1-34
Author(s):  
Sebastian Monka ◽  
Lavdim Halilaj ◽  
Achim Rettinger

The information perceived via visual observations of real-world phenomena is unstructured and complex. Computer vision (CV) is the field of research that attempts to make use of that information. Recent CV approaches rely on deep learning (DL) methods, which perform quite well as long as the training and testing domains follow the same underlying data distribution. However, it has been shown that minor variations in the images that occur when these methods are used in the real world can lead to unpredictable and catastrophic errors. Transfer learning is the area of machine learning that tries to prevent these errors. In particular, approaches that augment image data with auxiliary knowledge encoded in language embeddings or knowledge graphs (KGs) have achieved promising results in recent years. This survey focuses on visual transfer learning approaches using KGs, as we believe that KGs are well suited to store and represent any kind of auxiliary knowledge. KGs can represent auxiliary knowledge either in an underlying graph-structured schema or in a vector-based knowledge graph embedding. To enable the reader to solve visual transfer learning problems with the help of specific KG-DL configurations, we start with a description of relevant KG modeling structures of varying expressivity, such as directed labeled graphs, hypergraphs, and hyper-relational graphs. We explain the notion of a feature extractor, referring specifically to visual and semantic features. We provide a broad overview of knowledge graph embedding methods and describe several joint training objectives suitable for combining them with high-dimensional visual embeddings. The main section introduces four categories of how a KG can be combined with a DL pipeline: (1) Knowledge Graph as a Reviewer; (2) Knowledge Graph as a Trainee; (3) Knowledge Graph as a Trainer; and (4) Knowledge Graph as a Peer. To help researchers find meaningful evaluation benchmarks, we provide an overview of generic KGs and a set of image processing datasets and benchmarks that include various types of auxiliary knowledge. Finally, we summarize related surveys and give an outlook on challenges and open issues for future research.
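
As a toy example of one point in this design space (illustrative only, not a specific method from the survey), the sketch below projects a visual feature vector into a KG embedding space and penalises its distance to the class's pretrained KG embedding, one simple form of a joint training objective:

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretend these class vectors come from a pretrained knowledge graph embedding.
kg_emb = {"cat": rng.normal(size=32), "dog": rng.normal(size=32)}
# Learnable projection from a 128-d visual feature space into the 32-d KG space.
W = rng.normal(size=(128, 32)) * 0.01

def joint_loss(visual_feat, label):
    """Squared distance between the projected visual feature and the KG class embedding."""
    projected = visual_feat @ W
    return float(np.sum((projected - kg_emb[label]) ** 2))

print(joint_loss(rng.normal(size=128), "cat"))
```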


Semantic Web ◽  
2022 ◽  
pp. 1-35
Author(s):  
Katarina Boland ◽  
Pavlos Fafalios ◽  
Andon Tchechmedjiev ◽  
Stefan Dietze ◽  
Konstantin Todorov

Analyzing statements of facts and claims in online discourse is the subject of a multitude of research areas. Methods from natural language processing and computational linguistics help investigate issues such as the spread of biased narratives and falsehoods on the Web. Related tasks include fact-checking, stance detection and argumentation mining. Knowledge-based approaches, in particular work on knowledge base construction and augmentation, are concerned with mining, verifying and representing factual knowledge. While all these fields deal with strongly related notions, such as claims, facts and evidence, the terminology and conceptualisations used across and within communities vary heavily, making it hard to assess the commonalities and relations between related works and how research in one field may contribute to addressing problems in another. We survey the state of the art across a range of fields and research tasks in this interdisciplinary area. We assess the varying definitions and propose a conceptual model – Open Claims – for claims and related notions that takes their inherent complexity into consideration, distinguishing between their meaning, linguistic representation and context. We also introduce an implementation of this model using established vocabularies and discuss applications across various tasks related to online discourse analysis.
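
A minimal sketch of how such a separation of meaning, linguistic representation and context could be expressed in RDF with rdflib; the namespace and property names below are hypothetical placeholders, not the published Open Claims vocabulary:

```python
from rdflib import Graph, Literal, Namespace, RDF

EX = Namespace("http://example.org/openclaims/")  # placeholder namespace
g = Graph()
g.bind("ex", EX)

g.add((EX.claim1, RDF.type, EX.Claim))                  # the claim as a unit of meaning
g.add((EX.claim1, EX.hasRepresentation, EX.sentence1))  # its linguistic surface form
g.add((EX.sentence1, EX.text, Literal("Vitamin C cures the common cold.")))
g.add((EX.claim1, EX.hasContext, EX.context1))          # where and by whom it was made
g.add((EX.context1, EX.source, Literal("forum post, 2021-03-14")))

print(g.serialize(format="turtle"))
```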


Semantic Web ◽  
2022 ◽  
pp. 1-34
Author(s):  
Fajar J. Ekaputra ◽  
Andreas Ekelhart ◽  
Rudolf Mayer ◽  
Tomasz Miksa ◽  
Tanja Šarčević ◽  
...  

Small and medium-sized organisations face challenges in acquiring, storing and analysing personal data, particularly sensitive data (e.g., data of a medical nature), due to data protection regulations such as the GDPR in the EU, which stipulates high standards of data protection. Consequently, these organisations often refrain from collecting data centrally, which means losing the potential of data analytics and of learning from aggregated user data. To enable organisations to leverage the full potential of the collected personal data, two main technical challenges need to be addressed: (i) organisations must preserve the privacy of individual users and honour their consent, while (ii) being able to provide data and algorithmic governance, e.g., in the form of audit trails, to increase trust in the results and support reproducibility of the data analysis tasks performed on the collected data. Such auditable, privacy-preserving data analysis is currently challenging to achieve, as existing methods and tools offer only partial solutions to this problem, e.g., data representation of audit trails and user consent, automatic checking of usage policies, or data anonymisation. To the best of our knowledge, no existing approach provides an integrated architecture for auditable, privacy-preserving data analysis. To address these gaps, as the main contribution of this paper we propose the WellFort approach, a semantic-enabled architecture for auditable, privacy-preserving data analysis which provides secure storage for users’ sensitive data with explicit consent, and delivers a trusted, auditable analysis environment for executing data analytic processes in a privacy-preserving manner. Additional contributions include the adaptation of Semantic Web technologies as an integral part of the WellFort architecture, and the demonstration of the approach through a feasibility study with a prototype supporting use cases from the medical domain. Our evaluation shows that WellFort enables privacy-preserving analysis of data while automatically collecting sufficient information to support its auditability.
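
A minimal sketch of the consent-checking and audit-trail idea (not the WellFort implementation; all names and structures are illustrative):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Study:
    consented_users: set            # user ids with explicit consent on record
    audit_trail: list = field(default_factory=list)

    def run_analysis(self, user_ids, analysis_name):
        """Filter out non-consenting users and record an audit-trail entry."""
        allowed = [u for u in user_ids if u in self.consented_users]
        denied = [u for u in user_ids if u not in self.consented_users]
        self.audit_trail.append({
            "analysis": analysis_name,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "included_users": allowed,
            "excluded_for_missing_consent": denied,
        })
        return allowed  # only consented users' data reaches the analysis environment

study = Study(consented_users={"u1", "u3"})
print(study.run_analysis(["u1", "u2", "u3"], "cohort-statistics"))
print(study.audit_trail)
```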


Semantic Web ◽  
2021 ◽  
pp. 1-3
Author(s):  
Krzysztof Janowicz ◽  
Cogan Shimizu ◽  
Pascal Hitzler ◽  
Gengchen Mai ◽  
Shirly Stephen ◽  
...  

One of the key value propositions for knowledge graphs and semantic web technologies is fostering semantic interoperability, i.e., integrating data across different themes and domains. But why do we aim at interoperability in the first place? A common answer to this question is that each individual data source only contains partial information about some phenomenon of interest. Consequently, combining multiple diverse datasets provides a more holistic perspective and enables us to answer more complex questions, e.g., those that span the physical and the social sciences. Interestingly, while these arguments are well established and go by different names, e.g., variety in the realm of big data, we seem less clear about whether the same arguments apply at the level of schemata. Put differently, we want diverse data, but do we also want diverse schemata, or a single one to rule them all?


Semantic Web ◽  
2021 ◽  
pp. 1-19
Author(s):  
Edna Ruckhaus ◽  
Adolfo Anton-Bravo ◽  
Mario Scrocca ◽  
Oscar Corcho

We present an ontology that describes the domain of public transport by bus, which is common in cities around the world. The ontology is aligned with Transmodel, a reference model available as a UML specification that was developed to foster the interoperability of data about transport systems across Europe. The alignment with this non-ontological resource required adapting the Linked Open Terms (LOT) methodology, which our team has used as the methodological framework for developing many of the ontologies used for publishing open city data. The ontology is structured into three main modules: (1) agencies, operators and the lines they manage; (2) lines, routes, stops and journey patterns; and (3) planned vehicle journeys with their timetables and service calendars. Besides reusing Transmodel concepts, the ontology also reuses common ontology design patterns from GeoSPARQL and the SOSA ontology. As part of the LOT data-driven validation stage, RDF data was generated taking as input the GTFS (General Transit Feed Specification) feeds provided by the Madrid public bus transport provider (EMT). Mapping rules from the structured data sources to RDF were developed using the RDF Mapping Language (RML), and queries corresponding to competency questions were tested.
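
A small sketch of the competency-question style of validation with rdflib, using a hypothetical prefix in place of the published ontology namespace:

```python
from rdflib import Graph

g = Graph()
# g.parse("emt-madrid.ttl")  # RDF generated from the GTFS feeds via RML (file name illustrative)

# Example competency question: which stops belong to the routes of each line?
COMPETENCY_QUESTION = """
PREFIX bus: <http://example.org/transport/bus#>
SELECT ?line ?stop WHERE {
    ?line a bus:Line ;
          bus:hasRoute ?route .
    ?route bus:hasStop ?stop .
}
"""
for row in g.query(COMPETENCY_QUESTION):
    print(row.line, row.stop)
```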

