Interlinking Linked Data Sources Using a Domain-Independent System

Author(s):  
Khai Nguyen ◽  
Ryutaro Ichise ◽  
Bac Le


2018 ◽  
Vol 14 (3) ◽  
pp. 134-166 ◽  
Author(s):  
Amit Singh ◽  
Aditi Sharan

This article describes how semantic web data sources follow linked data principles to facilitate efficient information retrieval and knowledge sharing. These data sources may provide complementary, overlapping, or contradicting information. In order to integrate these data sources, the authors perform entity linking. Entity linking is the task of identifying and linking entities across data sources that refer to the same real-world entities. In this work, the authors propose a genetic fuzzy approach to learn linkage rules for entity linking. The method is domain-independent, automatic, and scalable. Their approach uses fuzzy logic to adapt the mutation and crossover rates of genetic programming to ensure guided convergence. The experimental evaluation demonstrates that the approach is competitive and makes significant improvements over state-of-the-art methods.
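The adaptive mechanism described above can be pictured as an ordinary genetic-programming loop whose mutation and crossover rates are re-tuned every generation by fuzzy rules over population statistics. The following Python sketch illustrates that idea only; the fitness function, rule representation, membership cut-offs, and rate ranges are assumptions for illustration, not the authors' actual formulation.

```python
import random
import statistics

def fuzzy_adapt(diversity, improvement):
    """Toy fuzzy controller: low fitness diversity or stalled improvement raises
    the mutation rate and lowers the crossover rate. The membership cut-offs
    (0.2 and 0.01) and rate ranges are illustrative assumptions."""
    low_diversity = min(1.0, max(0.0, (0.2 - diversity) / 0.2))   # "diversity is low"
    stalled = min(1.0, max(0.0, (0.01 - improvement) / 0.01))     # "improvement is small"
    pressure = max(low_diversity, stalled)                        # fuzzy OR of the two conditions
    mutation_rate = 0.05 + 0.25 * pressure                        # defuzzify into concrete rates
    crossover_rate = 0.90 - 0.30 * pressure
    return mutation_rate, crossover_rate

def evolve_linkage_rule(fitness, random_rule, mutate, crossover,
                        pop_size=50, generations=100):
    """Generic genetic-programming skeleton. 'fitness' would score a candidate
    linkage rule against labelled entity pairs (e.g. F-measure on training data);
    'random_rule', 'mutate' and 'crossover' build and vary candidate rules."""
    population = [random_rule() for _ in range(pop_size)]
    best_previous = 0.0
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        scores = [fitness(rule) for rule in scored]
        mutation_rate, crossover_rate = fuzzy_adapt(
            diversity=statistics.pstdev(scores),
            improvement=scores[0] - best_previous,
        )
        best_previous = scores[0]
        elite = scored[: pop_size // 5]                           # keep the top 20%
        offspring = []
        while len(elite) + len(offspring) < pop_size:
            parent_a, parent_b = random.sample(elite, 2)
            child = crossover(parent_a, parent_b) if random.random() < crossover_rate else parent_a
            if random.random() < mutation_rate:
                child = mutate(child)
            offspring.append(child)
        population = elite + offspring
    return max(population, key=fitness)
```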


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
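SDType's core idea can be summarised as weighted voting: every property used with a resource votes for the types that subjects of that property typically carry, and aggregated votes above a threshold become inferred types. The Python sketch below captures that intuition under simplifying assumptions (equal property weights and an arbitrary threshold); the actual SDType weighting and the SDValidate scoring are more elaborate.

```python
from collections import Counter, defaultdict

def learn_type_distributions(triples, types):
    """For each property, estimate how often its subjects carry each type.
    'triples' is a list of (s, p, o) tuples; 'types' maps resource -> set of types."""
    prop_type_counts = defaultdict(Counter)
    prop_totals = Counter()
    for s, p, _o in triples:
        prop_totals[p] += 1
        for t in types.get(s, ()):
            prop_type_counts[p][t] += 1
    return {
        p: {t: count / prop_totals[p] for t, count in counts.items()}
        for p, counts in prop_type_counts.items()
    }

def infer_types(resource, triples, distributions, threshold=0.4):
    """Weighted vote over the type distributions of all properties the resource
    uses as a subject. Here every property gets equal weight; SDType itself
    weights properties by how discriminative their type distribution is."""
    votes = Counter()
    used_properties = [p for s, p, _o in triples if s == resource]
    for p in used_properties:
        for t, probability in distributions.get(p, {}).items():
            votes[t] += probability / len(used_properties)
    return {t for t, score in votes.items() if score >= threshold}
```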


Author(s):  
Jonathan Bishop

The current phenomenon of Big Data – the use of datasets that are too big for traditional business analysis tools used in industry – is driving a shift in how social and economic problems are understood and analysed. This chapter explores the role Big Data can play in analysing the effectiveness of crowd-funding projects, using the data from such a project, which aimed to fund the development of a software plug-in called 'QPress'. The data analysed included the website metrics of impressions, clicks, and average position, which were found to be significantly connected with geographical factors using an ANOVA. These were combined with other country data to perform t-tests in order to form a geo-demographic understanding of those to whom advertisements inviting participation in crowd-funding are displayed. The chapter concludes that there are a number of interacting variables and that, for Big Data studies to be effective, their amalgamation with other data sources, including linked data, is essential to providing an overall picture of the social phenomenon being studied.
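The statistical steps described in the chapter (an ANOVA relating advert metrics to geography, followed by t-tests against country-level groupings) can be reproduced with standard tooling. The sketch below is purely illustrative; the file name and column names are hypothetical placeholders, not the chapter's dataset.

```python
import pandas as pd
from scipy import stats

# Hypothetical advert-metrics table; the file and column names are placeholders.
df = pd.read_csv("qpress_ad_metrics.csv")  # impressions, clicks, avg_position, region, country_group

# One-way ANOVA: do impressions differ across geographic regions?
groups = [g["impressions"].values for _, g in df.groupby("region")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"ANOVA on impressions by region: F={f_stat:.2f}, p={p_value:.4f}")

# t-test: compare clicks between two country groupings (e.g. 'EU' vs 'non-EU').
eu = df.loc[df["country_group"] == "EU", "clicks"]
non_eu = df.loc[df["country_group"] == "non-EU", "clicks"]
t_stat, p_value = stats.ttest_ind(eu, non_eu, equal_var=False)
print(f"Welch t-test on clicks: t={t_stat:.2f}, p={p_value:.4f}")
```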


Semantic Web ◽  
2014 ◽  
Vol 5 (2) ◽  
pp. 127-142 ◽  
Author(s):  
Dimitris Zeginis ◽  
Ali Hasnain ◽  
Nikolaos Loutas ◽  
Helena Futscher Deus ◽  
Ronan Fox ◽  
...  

2018 ◽  
Vol 10 (8) ◽  
pp. 2613
Author(s):  
Dandan He ◽  
Zhongfu Li ◽  
Chunlin Wu ◽  
Xin Ning

Industrialized construction has raised the requirements of procurement methods used in the construction industry. The rapid development of e-commerce offers efficient and effective solutions; however, the large number of participants in the construction industry means that the data involved are complex, and problems arise related to volume, heterogeneity, and fragmentation. Thus, the sector lags behind others in the adoption of e-commerce. In particular, data integration has become a barrier preventing further development. Traditional e-commerce platforms, which consider data integration only for common product data, cannot meet the requirements of construction product data integration. This study aimed to build an information-integrated e-commerce platform for industrialized construction procurement (ICP) to overcome some of the shortcomings of existing platforms. We proposed a platform based on Building Information Modelling (BIM) and linked data, taking an innovative approach to data integration. It uses industrialized construction technology to support product standardization, BIM to support the procurement process, and linked data to connect different data sources. The platform was validated using a case study. With the development of an e-commerce ontology, industrialized construction component information was extracted from BIM models and converted to Resource Description Framework (RDF) format. Related information from different data sources was also converted to RDF format, and SPARQL (SPARQL Protocol and RDF Query Language) queries were implemented. The platform provides a solution for the development of e-commerce platforms in the construction industry.
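The linked-data side of such a platform (component information serialised as RDF and retrieved with SPARQL) can be illustrated with a small snippet. The namespace, class, and property names below are invented for illustration and are not the ontology described in the study.

```python
from rdflib import Graph, Literal, Namespace, RDF

# Hypothetical e-commerce ontology namespace; not the study's actual vocabulary.
EC = Namespace("http://example.org/icp-ecommerce#")

g = Graph()
g.bind("ec", EC)

# A precast wall panel extracted from a BIM model, described as RDF triples.
panel = EC["component/WallPanel_001"]
g.add((panel, RDF.type, EC.PrecastComponent))
g.add((panel, EC.hasMaterial, Literal("reinforced concrete")))
g.add((panel, EC.widthMillimetres, Literal(3000)))
g.add((panel, EC.suppliedBy, EC["supplier/Supplier_A"]))

# SPARQL query: list all precast components together with their suppliers.
query = """
PREFIX ec: <http://example.org/icp-ecommerce#>
SELECT ?component ?supplier WHERE {
    ?component a ec:PrecastComponent ;
               ec:suppliedBy ?supplier .
}
"""
for row in g.query(query):
    print(row.component, row.supplier)
```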


2015 ◽  
Vol 31 (3) ◽  
pp. 415-429 ◽  
Author(s):  
Loredana Di Consiglio ◽  
Tiziana Tuoto

The capture-recapture method is a well-known solution for evaluating the unknown size of a population. Administrative data represent sources of independent counts of a population and can be jointly exploited for applying the capture-recapture method. Of course, administrative sources are affected by over- or undercoverage when considered separately. The standard Petersen approach is based on strong assumptions, including perfect record linkage between lists. In reality, record linkage results can be affected by errors. A simple method for achieving linkage-error-unbiased population total estimates is proposed in Ding and Fienberg (1994). In this article, an extension of the Ding and Fienberg model, obtained by relaxing their conditions, is proposed. The procedures are illustrated by estimating the total number of road casualties on the basis of a probabilistic record linkage between two administrative data sources. Moreover, a simulation study is developed, providing evidence that the adjusted estimator always performs better than the Petersen estimator.
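For reference, the Petersen estimator is N̂ = (n1 × n2) / m, where n1 and n2 are the sizes of the two lists and m is the number of linked records; linkage-error adjustments replace the observed match count with a corrected one. The sketch below uses a deliberately simplified correction (an assumed true-link rate and false-link count supplied by the analyst), not the model proposed in the article.

```python
def petersen_estimate(n1, n2, matches):
    """Classical Lincoln-Petersen population estimate."""
    return n1 * n2 / matches

def linkage_adjusted_estimate(n1, n2, observed_matches,
                              true_link_rate, false_link_count=0):
    """Simplified linkage-error correction: subtract assumed false links and
    scale up for missed links. The error rates here are analyst-supplied
    assumptions (e.g. from a linkage quality study), not estimated jointly
    as in the Ding and Fienberg-style models."""
    corrected_matches = (observed_matches - false_link_count) / true_link_rate
    return n1 * n2 / corrected_matches

# Example: two administrative lists of road casualties.
n1, n2 = 1200, 950        # records in each source
observed = 600            # record pairs linked by probabilistic matching
print(petersen_estimate(n1, n2, observed))                    # 1900.0
print(linkage_adjusted_estimate(n1, n2, observed, 0.95, 15))  # about 1851
```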

