A collaborative methodology for developing a semantic model for interlinking Cancer Chemoprevention linked-data sources

Semantic Web ◽  
2014 ◽  
Vol 5 (2) ◽  
pp. 127-142 ◽  
Author(s):  
Dimitris Zeginis ◽  
Ali Hasnain ◽  
Nikolaos Loutas ◽  
Helena Futscher Deus ◽  
Ronan Fox ◽  
...  

Semantic Web ◽  
2020 ◽  
pp. 1-29
Author(s):  
Bettina Klimek ◽  
Markus Ackermann ◽  
Martin Brümmer ◽  
Sebastian Hellmann

In recent years, lexical resources have emerged rapidly on the Semantic Web. While most of this linguistic information is already machine-readable, we found that morphological information is largely absent or only contained in semi-structured strings. An integration of morphemic data has not yet been undertaken due to the lack of domain-specific ontologies and explicit morphemic data. In this paper, we present the Multilingual Morpheme Ontology, MMoOn Core, which can be regarded as the first comprehensive ontology for the linguistic domain of morphological language data. We describe how crucial concepts such as morphs, morphemes, word forms and meanings are represented and interrelated, and how language-specific morpheme inventories can be created as a new kind of morphological dataset. The MMoOn Core ontology aims to serve as a shared semantic model for linguists and NLP researchers alike, enabling the creation, conversion, exchange, reuse and enrichment of morphological language data across different data-dependent language sciences. Various use cases illustrate the cross-disciplinary potential that can be realized with the MMoOn Core ontology in the context of the existing Linguistic Linked Data research landscape.
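To make the kind of modelling described above concrete, here is a minimal sketch of how a word form, its morphs, and the morphemes they realize might be related in RDF, written in Python with rdflib. The namespace IRIs and property names are hypothetical placeholders for illustration only, not the actual MMoOn Core vocabulary:

from rdflib import Graph, Namespace, Literal, RDF

# Hypothetical namespaces; the real MMoOn Core IRIs may differ.
MMOON = Namespace("http://example.org/mmoon/")
INV = Namespace("http://example.org/inventory/eng/")

g = Graph()
g.bind("mmoon", MMOON)

# A word form decomposed into a morph that realizes a negation morpheme.
g.add((INV.unbreakable, RDF.type, MMOON.Wordform))
g.add((INV.morph_un, RDF.type, MMOON.Morph))
g.add((INV.unbreakable, MMOON.consistsOfMorph, INV.morph_un))
g.add((INV.morph_un, MMOON.realizes, INV.morpheme_un_negation))
g.add((INV.morpheme_un_negation, MMOON.hasMeaning, Literal("negation")))

print(g.serialize(format="turtle"))

A language-specific morpheme inventory, in this reading, is simply a graph of such entries that other datasets can link into and query.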


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: SDType added 3.4 million missing type statements, while SDValidate removed 13,000 erroneous RDF statements from the knowledge base.
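The core idea behind SDType — weighted voting over the type distributions associated with a resource's properties — can be sketched roughly as follows. The distributions and weights below are toy numbers standing in for statistics mined from the knowledge base itself, not actual DBpedia figures:

from collections import defaultdict

# Toy conditional distributions P(type | subject uses this property),
# standing in for statistics computed from the knowledge base itself.
type_dist = {
    "dbo:author":   {"dbo:Book": 0.6, "dbo:Film": 0.3, "dbo:Person": 0.1},
    "dbo:director": {"dbo:Film": 0.9, "dbo:Book": 0.1},
}
# Per-property weights, e.g. reflecting how discriminative a property is.
weight = {"dbo:author": 0.8, "dbo:director": 1.0}

def sdtype_scores(properties):
    """Aggregate weighted type votes from each observed property."""
    scores = defaultdict(float)
    total = sum(weight[p] for p in properties)
    for p in properties:
        for t, prob in type_dist[p].items():
            scores[t] += weight[p] * prob / total
    return dict(scores)

# A resource used with both properties scores highest as dbo:Film.
print(sdtype_scores(["dbo:author", "dbo:director"]))

Types whose aggregated score exceeds a confidence threshold would then be added as new type statements; SDValidate works in the opposite direction, flagging statements whose observed subject/object types are statistically implausible.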


Author(s):  
Jonathan Bishop

The current phenomenon of Big Data – the use of datasets that are too big for the traditional business analysis tools used in industry – is driving a shift in how social and economic problems are understood and analysed. This chapter explores the role Big Data can play in analysing the effectiveness of crowd-funding projects, using the data from such a project, which aimed to fund the development of a software plug-in called 'QPress'. The data analysed included the website metrics of impressions, clicks and average position, which an ANOVA showed to be significantly connected with geographical factors. These were combined with other country data in t-tests to form a geo-demographic understanding of those who are shown advertisements inviting participation in crowd-funding. The chapter concludes that there are a number of interacting variables and that, for Big Data studies to be effective, their amalgamation with other data sources, including linked data, is essential to providing an overall picture of the social phenomenon being studied.
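The kind of analysis described — testing whether ad metrics vary with geography — might look roughly like the following sketch. The column names and figures are invented for illustration; the chapter's actual data came from the QPress campaign:

import pandas as pd
from scipy import stats

# Invented per-country ad metrics; real figures came from the campaign.
df = pd.DataFrame({
    "country":     ["UK", "UK", "US", "US", "DE", "DE"],
    "impressions": [120, 150, 300, 280, 90, 110],
})

# One-way ANOVA: do impressions differ significantly across countries?
groups = [g["impressions"].values for _, g in df.groupby("country")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")

# Follow-up t-test between two specific countries.
uk = df.loc[df.country == "UK", "impressions"]
us = df.loc[df.country == "US", "impressions"]
t_stat, p_t = stats.ttest_ind(uk, us)
print(f"t = {t_stat:.2f}, p = {p_t:.3f}")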


2018 ◽  
Vol 14 (3) ◽  
pp. 134-166 ◽  
Author(s):  
Amit Singh ◽  
Aditi Sharan

This article describes how semantic web data sources follow linked data principles to facilitate efficient information retrieval and knowledge sharing. These data sources may provide complementary, overlapping or contradicting information. To integrate such data sources, the authors perform entity linking. Entity linking is the task of identifying and linking entities across data sources that refer to the same real-world entities. In this work, the authors propose a genetic fuzzy approach to learning linkage rules for entity linking. The method is domain-independent, automatic and scalable. The approach uses fuzzy logic to adapt the mutation and crossover rates of genetic programming to ensure guided convergence. The experimental evaluation demonstrates that the approach is competitive and makes significant improvements over state-of-the-art methods.
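A simplified stand-in for the fuzzy adaptation idea is sketched below: individuals near the population's best fitness get low mutation and high crossover (exploitation), while weak individuals get the reverse (exploration). The membership function and rate ranges are illustrative assumptions, not the paper's actual fuzzy controller:

def fuzzy_adapted_rates(fitness, avg_fitness, best_fitness):
    """Adapt GP mutation/crossover rates from relative fitness.

    A simplified stand-in for the paper's fuzzy controller: degree of
    membership in the "fit" set drives the operator rates.
    """
    # Membership in the "fit" fuzzy set, clipped to [0, 1].
    spread = max(best_fitness - avg_fitness, 1e-9)
    fit_degree = min(max((fitness - avg_fitness) / spread, 0.0), 1.0)
    mutation_rate = 0.05 + 0.25 * (1.0 - fit_degree)   # 0.05 .. 0.30
    crossover_rate = 0.60 + 0.35 * fit_degree          # 0.60 .. 0.95
    return mutation_rate, crossover_rate

# Weak individual: high mutation. Strong individual: high crossover.
print(fuzzy_adapted_rates(0.40, avg_fitness=0.55, best_fitness=0.90))
print(fuzzy_adapted_rates(0.88, avg_fitness=0.55, best_fitness=0.90))

Each individual in the population would be a candidate linkage rule (a similarity expression over entity attributes), evolved under these adaptive rates.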


2018 ◽  
Vol 10 (8) ◽  
pp. 2613
Author(s):  
Dandan He ◽  
Zhongfu Li ◽  
Chunlin Wu ◽  
Xin Ning

Industrialized construction has raised the requirements for procurement methods used in the construction industry. The rapid development of e-commerce offers efficient and effective solutions; however, the large number of participants in the construction industry means that the data involved are complex, and problems arise related to volume, heterogeneity, and fragmentation. Thus, the sector lags behind others in the adoption of e-commerce. In particular, data integration has become a barrier preventing further development. Traditional e-commerce platforms, which consider data integration only for common product data, cannot meet the requirements of construction product data integration. This study aimed to build an information-integrated e-commerce platform for industrialized construction procurement (ICP) to overcome some of the shortcomings of existing platforms. We proposed a platform based on Building Information Modelling (BIM) and linked data, taking an innovative approach to data integration. It uses industrialized construction technology to support product standardization, BIM to support the procurement process, and linked data to connect different data sources. The platform was validated using a case study. With the development of an e-commerce ontology, industrialized construction component information was extracted from BIM models and converted to Resource Description Framework (RDF) format. Related information from different data sources was also converted to RDF format, and SPARQL (SPARQL Protocol and RDF Query Language) queries were implemented. The platform provides a solution for the development of e-commerce platforms in the construction industry.
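A minimal sketch of the convert-then-query step with rdflib follows. The ontology namespace, class and property names, and the component fields are invented placeholders, not the platform's actual e-commerce ontology:

from rdflib import Graph, Namespace, Literal, RDF

EC = Namespace("http://example.org/icp-ecommerce#")  # hypothetical ontology
g = Graph()

# A component record as it might be extracted from a BIM model.
component = {"id": "wall_panel_01", "material": "precast concrete", "width_mm": 2400}

subj = EC[component["id"]]
g.add((subj, RDF.type, EC.PrecastComponent))
g.add((subj, EC.material, Literal(component["material"])))
g.add((subj, EC.widthMm, Literal(component["width_mm"])))

# SPARQL query over the integrated RDF graph.
q = """
PREFIX ec: <http://example.org/icp-ecommerce#>
SELECT ?c ?m WHERE { ?c a ec:PrecastComponent ; ec:material ?m . }
"""
for row in g.query(q):
    print(row.c, row.m)

Supplier catalogues, pricing feeds and other sources would be converted into the same RDF vocabulary, so a single SPARQL query can span all of them.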


2015 ◽  
Vol 31 (3) ◽  
pp. 415-429 ◽  
Author(s):  
Loredana Di Consiglio ◽  
Tiziana Tuoto

Abstract: The capture-recapture method is a well-known solution for estimating the unknown size of a population. Administrative data represent sources of independent counts of a population and can be jointly exploited to apply the capture-recapture method. Of course, administrative sources are affected by over- or undercoverage when considered separately. The standard Petersen approach is based on strong assumptions, including perfect record linkage between lists. In reality, record linkage results can be affected by errors. A simple method for achieving linkage-error-unbiased estimates of the population total is proposed in Ding and Fienberg (1994). In this article, an extension of the Ding and Fienberg model that relaxes their conditions is proposed. The procedures are illustrated by estimating the total number of road casualties on the basis of a probabilistic record linkage between two administrative data sources. Moreover, a simulation study is developed, providing evidence that the adjusted estimator always performs better than the Petersen estimator.
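For orientation, the standard Petersen estimator for two lists of sizes $n_1$ and $n_2$ sharing $m$ matched units is

\[ \hat{N} = \frac{n_1\, n_2}{m}. \]

With imperfect linkage, the observed match count $m^*$ understates $m$ when true links are missed and overstates it when false links occur. A first-order correction in the spirit of Ding and Fienberg (1994) — written here in simplified, paraphrased notation rather than the authors' exact formulation — deflates the estimator by an estimate $\hat{\lambda}$ of the probability that a true match is correctly linked:

\[ \hat{N}_{\text{adj}} = \frac{n_1\, n_2\, \hat{\lambda}}{m^*}. \]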


Author(s):  
Nadine Bachbauer

Background: NEPS-SC6-ADIAB is a new linked data product containing survey data from Starting Cohort 6 of the German National Educational Panel Study (NEPS) and administrative employment data from the Institute for Employment Research (IAB), the research institute of the Federal Employment Agency. NEPS is provided by the Leibniz Institute for Educational Trajectories (LIfBi). Starting Cohort 6 of this panel survey covers adults in their professional life; the survey focuses on education in adulthood and lifelong learning. The administrative data in NEPS-SC6-ADIAB consist of comprehensive information on employment histories.
Objectives: Combining these two data sources increases, for example, the information available on individual employment histories. Overall, the linkage between the survey data and the administrative data increases the data volume.
Methods: A record linkage process was used to link the two data sources. Data access is free for the whole scientific community. In addition to a large number of on-site access locations within Germany, there are also international on-site access locations, including London and Colchester. Remote data access is offered as well.
Conclusions: This data linkage project is highly innovative and creates an extensive database with considerable analytical potential. A short application example illustrates the comprehensive analytical potential of NEPS-SC6-ADIAB. This ongoing project deals with nonresponse in survey data. The linked data contain a variety of variables collected in both data sources, administratively and through the NEPS survey, allowing for comparative analyses. In this case, an approach to compensating for nonresponse in income data with administrative data is outlined.


2021 ◽  
pp. 29-40
Author(s):  
Elena Doynikova ◽  
Andrey Fedorchenko ◽  
Igor Kotenko ◽  
Evgenia Novikova ◽  
...  

The purpose of the article: the development of a semantic model of metrics and data, and of a technique for security assessment based on this model, in order to obtain objective scores of information system security. Research method: theoretical and system analysis of open security data sources and security metrics, semantic analysis and classification of security data, development of the security assessment technique based on the semantic model and methods of logical inference, and functional testing of the developed technique. The result obtained: an approach based on the semantic model of metrics and data is proposed. The model is an ontology generated by considering the relations among data sources, information system objects and the data about them, primary metrics of information system objects, and the integral metrics and goals of assessment. A technique for metric calculation and for assessing the security level of unspecified information systems in real time using the proposed model is developed. A case study demonstrating the applicability of the developed technique and ontology to answering security assessment questions is provided. The intended area of use of the proposed approach is security assessment components of information security monitoring and management systems, aimed at increasing their efficiency.
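A toy illustration of how such an ontology-backed assessment could be queried is sketched below with rdflib. The class and property names (and the CVSS-style primary metric) are invented for illustration, not the authors' actual ontology:

from rdflib import Graph, Namespace, Literal, RDF

SEC = Namespace("http://example.org/sec-assessment#")  # hypothetical ontology
g = Graph()

# Primary metrics attached to information-system objects.
g.add((SEC.host_a, RDF.type, SEC.Host))
g.add((SEC.host_a, SEC.cvssScore, Literal(7.8)))
g.add((SEC.host_b, RDF.type, SEC.Host))
g.add((SEC.host_b, SEC.cvssScore, Literal(4.1)))

# Integral metric: flag objects whose primary metric exceeds a threshold.
q = """
PREFIX sec: <http://example.org/sec-assessment#>
SELECT ?h ?s WHERE { ?h a sec:Host ; sec:cvssScore ?s . FILTER(?s > 7.0) }
"""
for row in g.query(q):
    print(f"high-risk object: {row.h} (score {row.s})")

In the proposed approach, such queries and logical inference rules over the ontology would roll primary metrics up into integral scores answering concrete assessment questions.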

