Tabular Data Cleaning and Linked Data Generation with Grafterizer

Author(s):  
Dina Sukhobok ◽  
Nikolay Nikolov ◽  
Antoine Pultier ◽  
Xianglin Ye ◽  
Arne Berre ◽  
...  
Author(s):  
Anastasia Dimou ◽  
Gerald Haesendonck ◽  
Martin Vanbrabant ◽  
Laurens De Vocht ◽  
Ruben Verborgh ◽  
...  

2019 ◽  
Vol 46 (3) ◽  
pp. 419-433 ◽  
Author(s):  
Övünç Öztürk ◽  
Tuğba Özacar

This article is a proof-of-concept case study that evaluates the functionality of a block metaphor–based linked data generator. In this work, we chose to produce a linked data repository of recipes, which provides a medium for people to share their regional and healthy recipes with the masses. However, the same approach can easily be adapted to other domains, so its applicability extends well beyond the food domain considered in this article. As a medium for information sharing and understanding between heterogeneous systems, ontologies will play an important role in the realisation of the Internet of Things (IoT) vision. Therefore, an ontology-based recipe repository would also be one of the basic building blocks of a smart kitchen environment. However, building ontologies is a challenging task, especially for users who are not conversant in ontology building languages. This article proposes an approach that can be used even by non-experts and that facilitates the sharing and searching of recipe data. In our case, we exploit the features of the block paradigm to publish recipes in Linked Data format. In this way, users do not have to know the OWL (Web Ontology Language) syntax, and text input is kept minimal. As far as we know, this article is the first study in the literature that produces linked data using Blockly. We also conducted a user-based evaluation of the proposed approach using the System Usability Scale (SUS) questionnaire.
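To make the output of such a generator concrete, the following minimal sketch serialises one recipe as N-Triples. It is not the article's Blockly-based tool: the `recipe_to_ntriples`, `slug`, and `escape_literal` helpers, the `http://example.org/` namespaces, and the property names are hypothetical placeholders chosen only for illustration.

```python
# Hypothetical namespaces and property names; none of them come from the article.
BASE = "http://example.org/recipe/"
VOCAB = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(text):
    """Escape the characters that must be escaped in an N-Triples string literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def slug(text):
    """Build a simple identifier from a label, keeping ASCII letters and digits."""
    cleaned = "".join(c if c.isascii() and c.isalnum() else " " for c in text.lower())
    return "-".join(cleaned.split())


def recipe_to_ntriples(recipe):
    """Return a list of N-Triples lines describing one recipe dictionary."""
    subject = f"<{BASE}{slug(recipe['name'])}>"
    name = escape_literal(recipe["name"])
    lines = [
        f"{subject} <{RDF_TYPE}> <{VOCAB}Recipe> .",
        f'{subject} <{VOCAB}name> "{name}" .',
    ]
    for ingredient in recipe.get("ingredients", []):
        value = escape_literal(ingredient)
        lines.append(f'{subject} <{VOCAB}ingredient> "{value}" .')
    return lines


if __name__ == "__main__":
    example = {"name": "Lentil Soup", "ingredients": ["red lentils", "olive oil"]}
    print("\n".join(recipe_to_ntriples(example)))
```

A block-based editor would presumably collect the same fields through visual blocks instead of a dictionary, so that the user never writes the serialisation by hand.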


2011 ◽  
Vol 2 (3) ◽  
pp. 21-31 ◽  
Author(s):  
Arup Sarkar ◽  
Ujjal Marjit ◽  
Utpal Biswas

Author(s):  
JOSEP MARIA BRUNETTI ◽  
ROSA GIL ◽  
JUAN MANUEL GIMENO ◽  
ROBERTO GARCIA

Thanks to Open Data initiatives, the amount of data available on the Web is rapidly increasing. Unfortunately, most of these initiatives only publish raw tabular data, which makes its analysis and reuse very difficult. Linked Data principles allow for a more sophisticated approach by making explicit both the structure and the semantics of the data. However, from the user experience viewpoint, published datasets continue to be monolithic files that are completely opaque or difficult to explore through complex semantic queries. Our objective is to help the user grasp what kinds of entities are in the dataset, how they are interrelated, what their main properties and values are, etc. Rhizomer is a data publishing tool whose interface provides a set of components borrowed from Information Architecture (IA) that help users gain insight into the dataset at hand. Rhizomer automatically generates navigation menus and facets based on the kinds of things in the dataset and how they are described through metadata properties and values. This tool is currently being evaluated with end users, who discover a whole new perspective on the Web of Data.
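The idea of deriving menus and facets from the data itself can be sketched as follows. This is an illustrative approximation, not Rhizomer's implementation: the `build_facets` function, the prefixed-name strings, and the in-memory list of triples are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# Assumed shorthand for the rdf:type predicate in this toy representation.
RDF_TYPE = "rdf:type"


def build_facets(triples):
    """Derive a navigation menu and facets from (subject, predicate, object) triples.

    Returns (menu, facets):
      menu   maps class -> number of instances of that class
      facets maps class -> property -> Counter of values for that property
    """
    triples = list(triples)

    # First pass: record which classes each subject belongs to.
    classes_of = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            classes_of[s].add(o)

    # The menu lists each class with its instance count.
    menu = Counter()
    for classes in classes_of.values():
        for c in classes:
            menu[c] += 1

    # Second pass: count property values per class to form the facets.
    facets = defaultdict(lambda: defaultdict(Counter))
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for c in classes_of.get(s, ()):
            facets[c][p][o] += 1
    return menu, facets


if __name__ == "__main__":
    data = [
        ("ex:a", "rdf:type", "ex:City"),
        ("ex:b", "rdf:type", "ex:City"),
        ("ex:a", "ex:country", "ex:Spain"),
        ("ex:b", "ex:country", "ex:Spain"),
        ("ex:c", "rdf:type", "ex:Person"),
    ]
    menu, facets = build_facets(data)
    print(menu.most_common())
    print(dict(facets["ex:City"]["ex:country"]))
```

In this sketch the menu entries correspond to the "kinds of things" in the dataset, and each facet lists the values of one property together with how many instances carry them.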


Author(s):  
Insaf Ashrapov

GANs are well known for their success in realistic image generation. However, they can be applied to tabular data generation as well. We will review and examine some recent papers about tabular GANs in action. We will generate data to bring the train distribution closer to the test distribution. We then compare the performance of a model trained on the initial train dataset with that of a model trained on the train dataset with GAN-generated data; we also train the model on a sample of the train data selected by adversarial training. We show that using a GAN might be an option in the case of uneven data distribution between train and test data.


2021 ◽  
Author(s):  
Mikel Hernandez ◽  
Gorka Epelde ◽  
Ane Alberdi ◽  
Rodrigo Cilla ◽  
Debbie Rankin

Synthetic Tabular Data Generation (STDG) is a potentially valuable technology with great promise to augment real data and preserve privacy. However, prior to adoption, an empirical assessment of synthetic tabular data (STD) is required across the three dimensions of resemblance, utility, and privacy, seeking a trade-off between them. A lack of standardised and objective metrics and methods for this assessment has been found in the literature, and no organised pipeline or process for coordinating this evaluation has been identified. Therefore, in this work we propose a collection of metrics and methods to evaluate STD in the previously defined dimensions, presenting a meaningful orchestration of them and a pipeline unifying all of them. Additionally, we present a methodology to categorise the performance of STDG approaches in each dimension. Finally, we conducted an extensive analysis and evaluation to verify the usability of the proposed pipeline across six healthcare-related datasets, using four STDG approaches. The results of these analyses showed that the proposed pipeline can effectively be used to evaluate and benchmark the STD generated with one or more different STDG approaches, helping the scientific community to select the most suitable approaches for their data and application of interest.


Information ◽  
2021 ◽  
Vol 12 (9) ◽  
pp. 375
Author(s):  
Stavroula Bourou ◽  
Andreas El Saer ◽  
Terpsichori-Helen Velivassaki ◽  
Artemis Voulkidis ◽  
Theodore Zahariadis

Recent technological innovations, along with the vast amount of available data worldwide, have led to the rise of cyberattacks against network systems. Intrusion Detection Systems (IDS) play a crucial role as a defense mechanism in networks against adversarial attackers. Machine Learning methods provide various cybersecurity tools. However, these methods require plenty of data to be trained efficiently, which may be hard to collect or to use due to privacy reasons. One of the most notable Machine Learning tools is the Generative Adversarial Network (GAN), which has great potential for tabular data synthesis. In this work, we start by briefly presenting the most popular GAN architectures: VanillaGAN, WGAN, and WGAN-GP. Focusing on tabular data generation, the CTGAN, CopulaGAN, and TableGAN models are used to create synthetic IDS data. Specifically, the models are trained and evaluated on the NSL-KDD dataset, considering the limitations and requirements of this procedure. Finally, based on certain quantitative and qualitative methods, we evaluate and discuss the most prominent GANs for tabular network data synthesis.


Author(s):  
Wouter Maroy ◽  
Anastasia Dimou ◽  
Dimitris Kontokostas ◽  
Ben De Meester ◽  
Ruben Verborgh ◽  
...  

Polysemy, when a single term has multiple meanings, and synonymy, when multiple terms have the same meaning, are common phenomena in linguistics as well as in scientific knowledge. In ontology engineering, it is vital to detect synonym annotations and the multiple inheritances caused by polysemy. The persistence of these issues in the semantic description of a knowledge domain causes problems for interoperability and data processing. Disambiguating the sense of the entities, properties and relationships in a semantic web ontology significantly improves linked data generation and information retrieval. We explore synonymy and polysemy in the setting of a cardiology terminology generated from textbooks on the basis of field coverage, professionals’ associations’ recommendations and bibliometrics, for the building of a cardiologic ontology. Of the 56,134 terms collected, we found that 67.7% were unique. The indexed terms included single words, compound words and multi-word expressions. The frequency of their appearances in the combined master index was calculated and used as a marker of their significance. To cope with the linguistic polysemy and synonymy of terms, we examined them in WordNet, MeSH and BioPortal, as well as by latent semantic analysis (LSA) through singular value decomposition (SVD). Through these approaches we managed to identify and decipher semantic associations and relationships between the terms. We proposed a roadmap for building an ontology from scratch by utilizing intrinsic and extrinsic knowledge resources and reusing metadata. We anticipate that this approach is applicable to ontology engineering in different knowledge domains for setting relationships and contextualizing linked data.
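The frequency count over a combined master index can be illustrated with a short sketch. It assumes that terms are normalised by lower-casing and collapsing whitespace, which the abstract does not specify; the `combined_index_frequencies` function and the toy term lists are hypothetical.

```python
from collections import Counter


def combined_index_frequencies(indexes):
    """Merge several term lists and count how often each normalised term occurs.

    Normalisation (lower-casing, collapsing whitespace) is an assumption of
    this sketch, not a procedure described in the abstract.
    """
    counts = Counter()
    for index in indexes:
        for term in index:
            normalised = " ".join(term.lower().split())
            if normalised:
                counts[normalised] += 1
    return counts


if __name__ == "__main__":
    # Toy back-of-book indexes; the terms are illustrative only.
    book_a = ["Atrial fibrillation", "Heart failure", "Stent"]
    book_b = ["heart  failure", "Atrial Fibrillation", "Angina"]
    book_c = ["Heart failure"]

    frequencies = combined_index_frequencies([book_a, book_b, book_c])
    # Terms that appear in more indexes rank higher.
    for term, count in frequencies.most_common():
        print(count, term)
```

Ranking terms by this count gives a simple significance marker; resolving synonyms and polysemous terms would still require the external resources named in the abstract.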

