Interlinking Linked Data Sources Using a Domain-Independent System

Author(s):  
Khai Nguyen ◽  
Ryutaro Ichise ◽  
Bac Le


2018 ◽  
Vol 14 (3) ◽  
pp. 134-166 ◽  
Author(s):  
Amit Singh ◽  
Aditi Sharan

This article describes how semantic web data sources follow linked data principles to facilitate efficient information retrieval and knowledge sharing. These data sources may provide complementary, overlapping, or contradicting information. In order to integrate these data sources, the authors perform entity linking. Entity linking is the task of identifying and linking entities across data sources that refer to the same real-world entities. In this work, the authors propose a genetic fuzzy approach to learn linkage rules for entity linking. The method is domain-independent, automatic, and scalable. Their approach uses fuzzy logic to adapt the mutation and crossover rates of genetic programming to ensure guided convergence. The experimental evaluation demonstrates that the approach is competitive and makes significant improvements over state-of-the-art methods.
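The adaptive mechanism described above can be pictured as an ordinary genetic-programming loop whose mutation and crossover rates are re-tuned every generation by fuzzy rules over population statistics. The following Python sketch illustrates that idea only; the fitness function, rule representation, membership cut-offs, and rate ranges are assumptions for illustration, not the authors' actual formulation.

```python
import random
import statistics

def fuzzy_adapt(diversity, improvement):
    """Toy fuzzy controller: low fitness diversity or stalled improvement raises
    the mutation rate and lowers the crossover rate. The membership cut-offs
    (0.2 and 0.01) and rate ranges are illustrative assumptions."""
    low_diversity = min(1.0, max(0.0, (0.2 - diversity) / 0.2))   # "diversity is low"
    stalled = min(1.0, max(0.0, (0.01 - improvement) / 0.01))     # "improvement is small"
    pressure = max(low_diversity, stalled)                        # fuzzy OR of the two conditions
    mutation_rate = 0.05 + 0.25 * pressure                        # defuzzify into concrete rates
    crossover_rate = 0.90 - 0.30 * pressure
    return mutation_rate, crossover_rate

def evolve_linkage_rule(fitness, random_rule, mutate, crossover,
                        pop_size=50, generations=100):
    """Generic genetic-programming skeleton. 'fitness' would score a candidate
    linkage rule against labelled entity pairs (e.g. F-measure on training data);
    'random_rule', 'mutate' and 'crossover' build and vary candidate rules."""
    population = [random_rule() for _ in range(pop_size)]
    best_previous = 0.0
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        scores = [fitness(rule) for rule in scored]
        mutation_rate, crossover_rate = fuzzy_adapt(
            diversity=statistics.pstdev(scores),
            improvement=scores[0] - best_previous,
        )
        best_previous = scores[0]
        elite = scored[: pop_size // 5]                           # keep the top 20%
        offspring = []
        while len(elite) + len(offspring) < pop_size:
            parent_a, parent_b = random.sample(elite, 2)
            child = crossover(parent_a, parent_b) if random.random() < crossover_rate else parent_a
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        population = elite + offspring
    return max(population, key=fitness)
```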


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
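SDType's core idea can be summarised as weighted voting: every property used with a resource votes for the types that subjects of that property typically carry, and aggregated votes above a threshold become inferred types. The Python sketch below captures that intuition under simplifying assumptions (equal property weights and an arbitrary threshold); the actual SDType weighting and the SDValidate scoring are more elaborate.

```python
from collections import Counter, defaultdict

def learn_type_distributions(triples, types):
    """For each property, estimate how often its subjects carry each type.
    'triples' is a list of (s, p, o) tuples; 'types' maps resource -> set of types."""
    prop_type_counts = defaultdict(Counter)
    prop_totals = Counter()
    for s, p, _o in triples:
        prop_totals[p] += 1
        for t in types.get(s, ()):
            prop_type_counts[p][t] += 1
    return {
        p: {t: count / prop_totals[p] for t, count in counts.items()}
        for p, counts in prop_type_counts.items()
    }

def infer_types(resource, triples, distributions, threshold=0.4):
    """Weighted vote over the type distributions of all properties the resource
    uses as a subject. Here every property gets equal weight; SDType itself
    weights properties by how discriminative their type distribution is."""
    votes = Counter()
    used_properties = [p for s, p, _o in triples if s == resource]
    for p in used_properties:
        for t, probability in distributions.get(p, {}).items():
            votes[t] += probability / len(used_properties)
    return {t for t, score in votes.items() if score >= threshold}
```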


Author(s):  
Jonathan Bishop

The current phenomenon of Big Data – the use of datasets that are too big for traditional business analysis tools used in industry – is driving a shift in how social and economic problems are understood and analysed. This chapter explores the role Big Data can play in analysing the effectiveness of crowd-funding projects, using the data from such a project, which aimed to fund the development of a software plug-in called 'QPress'. The data analysed included the website metrics of impressions, clicks, and average position, which were found to be significantly connected with geographical factors using an ANOVA. These were combined with other country data to perform t-tests in order to form a geo-demographic understanding of those to whom advertisements inviting participation in crowd-funding are displayed. The chapter concludes that there are a number of interacting variables and that, for Big Data studies to be effective, their amalgamation with other data sources, including linked data, is essential to providing an overall picture of the social phenomenon being studied.
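The statistical steps described in the chapter (an ANOVA relating advert metrics to geography, followed by t-tests against country-level groupings) can be reproduced with standard tooling. The sketch below is purely illustrative; the file name and column names are hypothetical placeholders, not the chapter's dataset.

```python
import pandas as pd
from scipy import stats

# Hypothetical advert-metrics table; the file and column names are placeholders.
df = pd.read_csv("qpress_ad_metrics.csv")  # impressions, clicks, avg_position, region, country_group

# One-way ANOVA: do impressions differ across geographic regions?
groups = [g["impressions"].values for _, g in df.groupby("region")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"ANOVA on impressions by region: F={f_stat:.2f}, p={p_value:.4f}")

# t-test: compare clicks between two country groupings (e.g. 'EU' vs 'non-EU').
eu = df.loc[df["country_group"] == "EU", "clicks"]
non_eu = df.loc[df["country_group"] == "non-EU", "clicks"]
t_stat, p_value = stats.ttest_ind(eu, non_eu, equal_var=False)
print(f"Welch t-test on clicks: t={t_stat:.2f}, p={p_value:.4f}")
```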


Semantic Web ◽  
2014 ◽  
Vol 5 (2) ◽  
pp. 127-142 ◽  
Author(s):  
Dimitris Zeginis ◽  
Ali Hasnain ◽  
Nikolaos Loutas ◽  
Helena Futscher Deus ◽  
Ronan Fox ◽  
...  

2018 ◽  
Vol 10 (8) ◽  
pp. 2613
Author(s):  
Dandan He ◽  
Zhongfu Li ◽  
Chunlin Wu ◽  
Xin Ning

Industrialized construction has raised the requirements of procurement methods used in the construction industry. The rapid development of e-commerce offers efficient and effective solutions; however, the large number of participants in the construction industry means that the data involved are complex, and problems arise related to volume, heterogeneity, and fragmentation. Thus, the sector lags behind others in the adoption of e-commerce. In particular, data integration has become a barrier preventing further development. Traditional e-commerce platforms, which consider data integration only for common product data, cannot meet the requirements of construction product data integration. This study aimed to build an information-integrated e-commerce platform for industrialized construction procurement (ICP) to overcome some of the shortcomings of existing platforms. We proposed a platform based on Building Information Modelling (BIM) and linked data, taking an innovative approach to data integration. It uses industrialized construction technology to support product standardization, BIM to support the procurement process, and linked data to connect different data sources. The platform was validated using a case study. With the development of an e-commerce ontology, industrialized construction component information was extracted from BIM models and converted to Resource Description Framework (RDF) format. Related information from different data sources was also converted to RDF format, and SPARQL (SPARQL Protocol and RDF Query Language) queries were implemented. The platform provides a solution for the development of e-commerce platforms in the construction industry.
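The linked-data side of such a platform (component information serialised as RDF and retrieved with SPARQL) can be illustrated with a small snippet. The namespace, class, and property names below are invented for illustration and are not the ontology described in the study.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical e-commerce ontology namespace; not the study's actual vocabulary.
EC = Namespace("http://example.org/icp-ecommerce#")

g = Graph()
g.bind("ec", EC)

# A precast wall panel extracted from a BIM model, described as RDF triples.
panel = EC["component/WallPanel_001"]
g.add((panel, RDF.type, EC.PrecastComponent))
g.add((panel, EC.hasMaterial, Literal("reinforced concrete")))
g.add((panel, EC.widthMillimetres, Literal(3000)))
g.add((panel, EC.suppliedBy, EC["supplier/Supplier_A"]))

# SPARQL query: list all precast components together with their suppliers.
query = """
PREFIX ec: <http://example.org/icp-ecommerce#>
SELECT ?component ?supplier WHERE {
    ?component a ec:PrecastComponent ;
               ec:suppliedBy ?supplier .
}
"""
for row in g.query(query):
    print(row.component, row.supplier)
```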


2015 ◽  
Vol 31 (3) ◽  
pp. 415-429 ◽  
Author(s):  
Loredana Di Consiglio ◽  
Tiziana Tuoto

The capture-recapture method is a well-known solution for evaluating the unknown size of a population. Administrative data represent sources of independent counts of a population and can be jointly exploited for applying the capture-recapture method. Of course, administrative sources are affected by over- or undercoverage when considered separately. The standard Petersen approach is based on strong assumptions, including perfect record linkage between lists. In reality, record linkage results can be affected by errors. A simple method for achieving linkage-error-unbiased population total estimates is proposed in Ding and Fienberg (1994). In this article, an extension of the Ding and Fienberg model, obtained by relaxing their conditions, is proposed. The procedures are illustrated by estimating the total number of road casualties on the basis of a probabilistic record linkage between two administrative data sources. Moreover, a simulation study is developed, providing evidence that the adjusted estimator always performs better than the Petersen estimator.
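For reference, the Petersen estimator is N̂ = (n1 × n2) / m, where n1 and n2 are the sizes of the two lists and m is the number of linked records; linkage-error adjustments replace the observed match count with a corrected one. The sketch below uses a deliberately simplified correction (an assumed true-link rate and false-link count supplied by the analyst), not the model proposed in the article.

```python
def petersen_estimate(n1, n2, matches):
    """Classical Lincoln-Petersen population estimate."""
    return n1 * n2 / matches

def linkage_adjusted_estimate(n1, n2, observed_matches,
                              true_link_rate, false_link_count=0):
    """Simplified linkage-error correction: subtract assumed false links and
    scale up for missed links. The error rates here are analyst-supplied
    assumptions (e.g. from a linkage quality study), not estimated jointly
    as in the Ding and Fienberg-style models."""
    corrected_matches = (observed_matches - false_link_count) / true_link_rate
    return n1 * n2 / corrected_matches

# Example: two administrative lists of road casualties.
n1, n2 = 1200, 950        # records in each source
observed = 600            # record pairs linked by probabilistic matching
print(petersen_estimate(n1, n2, observed))                    # 1900.0
print(linkage_adjusted_estimate(n1, n2, observed, 0.95, 15))  # about 1851
```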

