Raising Awareness of Data Sharing Consent Through Knowledge Graph Visualisation

Knowledge graphs facilitate systematic large-scale data analysis by providing both human and machine-readable structures, which can be shared across different domains and platforms. Nowadays, knowledge graphs can be used to standardise the collection and sharing of user information in many different sectors such as transport, insurance, smart cities and internet of things. Regulations such as the GDPR make sure that users are not taken advantage of when they share data. From a legal standpoint it is necessary to have the user’s consent to collect information. This consent is only valid if the user is aware about the information collected at all times. To increase this awareness, we present a knowledge graph visualisation approach, which informs users about the activities linked to their data sharing agreements, especially after they have already given their consent. To visualise the graph, we introduce a user-centred application which showcases sensor data collection and distribution to different data processors. Finally, we present the results of a user study conducted to find out whether this visualisation leads to more legal awareness and trust. We show that with our visualisation tool data sharing consent rates increase from 48% to 81.5%.

Download Full-text

Data Communities: Empowering Researcher-Driven Data Sharing in the Sciences

International Journal of Digital Curation ◽

10.2218/ijdc.v15i1.695 ◽

1970 ◽

Vol 15 (1) ◽

pp. 7

Author(s):

Rebecca Springer ◽

Danielle Cooper

Keyword(s):

Data Sharing ◽

Large Scale ◽

Data Repository ◽

Disciplinary Boundaries ◽

Success Stories ◽

Scholarly Communications ◽

Information Technologists ◽

Share Data ◽

Technological Intervention ◽

Informal Groups

There is a growing perception that science can progress more quickly, more innovatively, and more rigorously when researchers share data with each other. However many scientists are not engaging in data sharing and remain skeptical of its relevance to their work. As organizations and initiatives designed to promote STEM data sharing multiply – within, across, and outside academic institutions – there is a pressing need to decide strategically on the best ways to move forward. In this paper, we propose a new mechanism for conceptualizing and supporting STEM research data sharing.. Successful data sharing happens within data communities, formal or informal groups of scholars who share a certain type of data with each other, regardless of disciplinary boundaries. Drawing on the findings of four large-scale qualitative studies of research practices conducted by Ithaka S+R, as well as the scholarly literature, we identify what constitutes a data community and outline its most important features by studying three success stories, investigating the circumstances under which intensive data sharing is already happening. We contend that stakeholders who wish to promote data sharing – librarians, information technologists, scholarly communications professionals, and research funders, to name a few – should work to identify and empower emergent data communities. These are groups of scholars for whom a relatively straightforward technological intervention, usually the establishment of a data repository, could kickstart the growth of a more active data sharing culture. We conclude by offering recommendations for ways forward.

Download Full-text

Knowledge Graphs

Biodiversity Information Science and Standards ◽

10.3897/biss.5.73796 ◽

2021 ◽

Vol 5 ◽

Author(s):

Roderic Page

Keyword(s):

Knowledge Management ◽

Large Scale ◽

Personal Knowledge ◽

Knowledge Graph ◽

Specific Knowledge ◽

Management Tools ◽

Global Projects ◽

Knowledge Graphs ◽

Constructing Knowledge ◽

Knowledge Management Tools

Knowledge graphs embody the idea of "everything connected to everything else." As attractive as this seems, there is a substantial gap between the dream of fully interconnected knowledge and the reality of data that is still mostly siloed, or weakly connected by shared strings such as taxonomic names. How do we move forward? Do we focus on building our own domain- or project-specific knowledge graphs, or do we engage with global projects such as Wikidata? Do we construct knowledge graphs, or focus on making our data "knowledge graph ready" by adopting structured markup in the hope that knowledge graphs will spontaneously self-assemble from that data? Do we focus on large-scale, database-driven projects (e.g., triple stores in the cloud), or do we rely on more localised and distributed approaches, such as annotations (e.g., hypothes.is), "content-hash" systems where a cryptographic hash of the data is also its identifier (Elliott et al. 2020), or the growing number of personal knowledge management tools (e.g., Roam, Obsidian, LogSeq)? This talk will share experiences (the good, bad, and the ugly) as I have tried to transition from naïve advocacy to constructing knowledge graphs (Page 2019), or participating in their construction (Page 2021).

Download Full-text

End-to-End Argumentation Knowledge Graph Construction

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6231 ◽

2020 ◽

Vol 34 (05) ◽

pp. 7367-7374

Author(s):

Khalid Al-Khatib ◽

Yufang Hou ◽

Henning Wachsmuth ◽

Charles Jochim ◽

Francesca Bonin ◽

...

Keyword(s):

Large Scale ◽

Question Answering ◽

Knowledge Graph ◽

Exploratory Search ◽

Text Generation ◽

Fake News ◽

High Quality ◽

Web Based ◽

Knowledge Graphs ◽

End To End

This paper studies the end-to-end construction of an argumentation knowledge graph that is intended to support argument synthesis, argumentative question answering, or fake news detection, among others. The study is motivated by the proven effectiveness of knowledge graphs for interpretable and controllable text generation and exploratory search. Original in our work is that we propose a model of the knowledge encapsulated in arguments. Based on this model, we build a new corpus that comprises about 16k manual annotations of 4740 claims with instances of the model's elements, and we develop an end-to-end framework that automatically identifies all modeled types of instances. The results of experiments show the potential of the framework for building a web-based argumentation graph that is of high quality and large scale.

Download Full-text

A Method to Learn Embedding of a Probabilistic Medical Knowledge Graph: Algorithm Development (Preprint)

10.2196/preprints.17645 ◽

2019 ◽

Author(s):

Linfeng Li ◽

Peng Wang ◽

Yao Wang ◽

Shenghui Wang ◽

Jun Yan ◽

...

Keyword(s):

Medical Records ◽

Large Scale ◽

Semantic Representation ◽

Medical Knowledge ◽

Mapping Function ◽

Graph Algorithm ◽

Knowledge Graph ◽

Knowledge Graphs ◽

Representation Method ◽

Better Than

BACKGROUND Knowledge graph embedding is an effective semantic representation method for entities and relations in knowledge graphs. Several translation-based algorithms, including TransE, TransH, TransR, TransD, and TranSparse, have been proposed to learn effective embedding vectors from typical knowledge graphs in which the relations between head and tail entities are deterministic. However, in medical knowledge graphs, the relations between head and tail entities are inherently probabilistic. This difference introduces a challenge in embedding medical knowledge graphs. OBJECTIVE We aimed to address the challenge of how to learn the probability values of triplets into representation vectors by making enhancements to existing TransX (where X is E, H, R, D, or Sparse) algorithms, including the following: (1) constructing a mapping function between the score value and the probability, and (2) introducing probability-based loss of triplets into the original margin-based loss function. METHODS We performed the proposed PrTransX algorithm on a medical knowledge graph that we built from large-scale real-world electronic medical records data. We evaluated the embeddings using link prediction task. RESULTS Compared with the corresponding TransX algorithms, the proposed PrTransX performed better than the TransX model in all evaluation indicators, achieving a higher proportion of corrected entities ranked in the top 10 and normalized discounted cumulative gain of the top 10 predicted tail entities, and lower mean rank. CONCLUSIONS The proposed PrTransX successfully incorporated the uncertainty of the knowledge triplets into the embedding vectors.

Download Full-text

End-to-end Relation-Enhanced Learnable Graph Self-attention Network for Knowledge Graphs Embedding

10.21203/rs.3.rs-396932/v1 ◽

2021 ◽

Author(s):

Shengchen Jiang ◽

Hongbin Wang ◽

Xiang Hou

Keyword(s):

Large Scale ◽

Structural Characteristics ◽

Graph Embedding ◽

Knowledge Graph ◽

Data Sets ◽

Relevance Ranking ◽

Convolutional Network ◽

Attention Network ◽

Knowledge Graphs ◽

End To End

Abstract The existing methods ignore the adverse effect of knowledge graph incompleteness on knowledge graph embedding. In addition, the complexity and large-scale of knowledge information hinder knowledge graph embedding performance of the classic graph convolutional network. In this paper, we analyzed the structural characteristics of knowledge graph and the imbalance of knowledge information. Complex knowledge information requires that the model should have better learnability, rather than linearly weighted qualitative constraints, so the method of end-to-end relation-enhanced learnable graph self-attention network for knowledge graphs embedding is proposed. Firstly, we construct the relation-enhanced adjacency matrix to consider the incompleteness of the knowledge graph. Secondly, the graph self-attention network is employed to obtain the global encoding and relevance ranking of entity node information. Thirdly, we propose the concept of convolutional knowledge subgraph, it is constructed according to the entity relevance ranking. Finally, we improve the training effect of the convKB model by changing the construction of negative samples to obtain a better reliability score in the decoder. The experimental results based on the data sets FB15k-237 and WN18RR show that the proposed method facilitates more comprehensive representation of knowledge information than the existing methods, in terms of Hits@10 and MRR.

Download Full-text

Generative Adversarial Zero-Shot Relational Learning for Knowledge Graphs

Proceedings of the AAAI Conference on Artificial Intelligence ◽

10.1609/aaai.v34i05.6392 ◽

2020 ◽

Vol 34 (05) ◽

pp. 8673-8680

Author(s):

Pengda Qin ◽

Xin Wang ◽

Wenhu Chen ◽

Chunyun Zhang ◽

Weiran Xu ◽

...

Keyword(s):

Supervised Classification ◽

Large Scale ◽

Relational Learning ◽

Generative Adversarial Networks ◽

Knowledge Graph ◽

Semantic Features ◽

Performance Improvements ◽

Adversarial Networks ◽

Knowledge Graphs ◽

Noisy Text

Large-scale knowledge graphs (KGs) are shown to become more important in current information systems. To expand the coverage of KGs, previous studies on knowledge graph completion need to collect adequate training instances for newly-added relations. In this paper, we consider a novel formulation, zero-shot learning, to free this cumbersome curation. For newly-added relations, we attempt to learn their semantic features from their text descriptions and hence recognize the facts of unseen relations with no examples being seen. For this purpose, we leverage Generative Adversarial Networks (GANs) to establish the connection between text and knowledge graph domain: The generator learns to generate the reasonable relation embeddings merely with noisy text descriptions. Under this setting, zero-shot learning is naturally converted to a traditional supervised classification task. Empirically, our method is model-agnostic that could be potentially applied to any version of KG embeddings, and consistently yields performance improvements on NELL and Wiki dataset.

Download Full-text

On the application of Big Data in future large-scale intelligent Smart City installations

International Journal of Pervasive Computing and Communications ◽

10.1108/ijpcc-03-2014-0022 ◽

2014 ◽

Vol 10 (2) ◽

pp. 168-182 ◽

Cited By ~ 10

Author(s):

Sylva Girtelschmid ◽

Matthias Steinbauer ◽

Vikash Kumar ◽

Anna Fensel ◽

Gabriele Kotsis

Keyword(s):

Big Data ◽

Real World ◽

Smart City ◽

Large Scale ◽

Smart Cities ◽

Sensor Data ◽

Data Streaming ◽

Content Type ◽

Semantic Models ◽

Intelligent Filtering

Purpose – The purpose of this article is to propose and evaluate a novel system architecture for Smart City applications which uses ontology reasoning and a distributed stream processing framework on the cloud. In the domain of Smart City, often methodologies of semantic modeling and automated inference are applied. However, semantic models often face performance problems when applied in large scale. Design/methodology/approach – The problem domain is addressed by using methods from Big Data processing in combination with semantic models. The architecture is designed in a way that for the Smart City model still traditional semantic models and rule engines can be used. However, sensor data occurring at such Smart Cities are pre-processed by a Big Data streaming platform to lower the workload to be processed by the rule engine. Findings – By creating a real-world implementation of the proposed architecture and running simulations of Smart Cities of different sizes, on top of this implementation, the authors found that the combination of Big Data streaming platforms with semantic reasoning is a valid approach to the problem. Research limitations/implications – In this article, real-world sensor data from only two buildings were extrapolated for the simulations. Obviously, real-world scenarios will have a more complex set of sensor input values, which needs to be addressed in future work. Originality/value – The simulations show that merely using a streaming platform as a buffer for sensor input values already increases the sensor data throughput and that by applying intelligent filtering in the streaming platform, the actual number of rule executions can be limited to a minimum.

Download Full-text

FTRLIM: Distributed Instance Matching Framework for Large-Scale Knowledge Graph Fusion

Entropy ◽

10.3390/e23050602 ◽

2021 ◽

Vol 23 (5) ◽

pp. 602

Author(s):

Hongming Zhu ◽

Xiaowen Wang ◽

Yizhi Jiang ◽

Hongfei Fan ◽

Bowen Du ◽

...

Keyword(s):

Real World ◽

Large Scale ◽

Linear Time ◽

Knowledge Graph ◽

Multiple Objects ◽

Instance Matching ◽

Distributed Framework ◽

Knowledge Graphs ◽

Real World Datasets ◽

Data Collections

Instance matching is a key task in knowledge graph fusion, and it is critical to improving the efficiency of instance matching, given the increasing scale of knowledge graphs. Blocking algorithms selecting candidate instance pairs for comparison is one of the effective methods to achieve the goal. In this paper, we propose a novel blocking algorithm named MultiObJ, which constructs indexes for instances based on the Ordered Joint of Multiple Objects’ features to limit the number of candidate instance pairs. Based on MultiObJ, we further propose a distributed framework named Follow-the-Regular-Leader Instance Matching (FTRLIM), which matches instances between large-scale knowledge graphs with approximately linear time complexity. FTRLIM has participated in OAEI 2019 and achieved the best matching quality with significantly efficiency. In this research, we construct three data collections based on a real-world large-scale knowledge graph. Experiment results on the constructed data collections and two real-world datasets indicate that MultiObJ and FTRLIM outperform other state-of-the-art methods.

Download Full-text

Data sharing practices and data availability upon request differ across scientific disciplines

Scientific Data ◽

10.1038/s41597-021-00981-0 ◽

2021 ◽

Vol 8 (1) ◽

Author(s):

Leho Tedersoo ◽

Rainer Küngas ◽

Ester Oras ◽

Kajar Köster ◽

Helen Eenmaa ◽

...

Keyword(s):

Data Sharing ◽

Large Scale ◽

Modern Science ◽

Research Data ◽

Data Availability ◽

Research Articles ◽

Scientific Disciplines ◽

Funding Agencies ◽

Share Data ◽

Bonus Points

AbstractData sharing is one of the cornerstones of modern science that enables large-scale analyses and reproducibility. We evaluated data availability in research articles across nine disciplines in Nature and Science magazines and recorded corresponding authors’ concerns, requests and reasons for declining data sharing. Although data sharing has improved in the last decade and particularly in recent years, data availability and willingness to share data still differ greatly among disciplines. We observed that statements of data availability upon (reasonable) request are inefficient and should not be allowed by journals. To improve data sharing at the time of manuscript acceptance, researchers should be better motivated to release their data with real benefits such as recognition, or bonus points in grant and job applications. We recommend that data management costs should be covered by funding agencies; publicly available research data ought to be included in the evaluation of applications; and surveillance of data sharing should be enforced by both academic publishers and funders. These cross-discipline survey data are available from the plutoF repository.

Download Full-text

Resource Description Framework reification for trustworthiness in knowledge graphs

F1000Research ◽

10.12688/f1000research.72843.1 ◽

2021 ◽

Vol 10 ◽

pp. 881

Author(s):

Sini Govindapillai ◽

Lay-Ki Soon ◽

Su-Cheng Haw

Keyword(s):

Resource Description Framework ◽

Structured Data ◽

Knowledge Graph ◽

Provenance Data ◽

Context Data ◽

Knowledge Graphs ◽

Description Framework ◽

Machine Readable ◽

Resource Description

Knowledge graph (KG) publishes machine-readable representation of knowledge on the Web. Structured data in the knowledge graph is published using Resource Description Framework (RDF) where knowledge is represented as a triple (subject, predicate, object). Due to the presence of erroneous, outdated or conflicting data in the knowledge graph, the quality of facts cannot be guaranteed. Therefore, the provenance of knowledge can assist in building up the trust of these knowledge graphs. In this paper, we have provided an analysis of popular, general knowledge graphs Wikidata and YAGO4 with regard to the representation of provenance and context data. Since RDF does not support metadata for providing provenance and contextualization, an alternate method, RDF reification is employed by most of the knowledge graphs. Trustworthiness of facts in knowledge graph can be enhanced by the addition of metadata like the source of information, location and time of the fact occurrence. Wikidata employs qualifiers to include metadata to facts, while YAGO4 collects metadata from Wikidata qualifiers. RDF reification increases the magnitude of data as several statements are required to represent a single fact. However, facts in Wikidata and YAGO4 can be fetched without using reification. Another limitation for applications that uses provenance data is that not all facts in these knowledge graphs are annotated with provenance data. Structured data in the knowledge graph is noisy. Therefore, the reliability of data in knowledge graphs can be increased by provenance data. To the best of our knowledge, this is the first paper that investigates the method and the extent of the addition of metadata of two prominent KGs, Wikidata and YAGO4.

Download Full-text