Mapping Quality Problems in Linked Data (Mapeamento de Problemas de Qualidade no Linked Data)

Author(s):  
Jessica Oliveira De Souza ◽  
Jose Eduardo Santarem Segundo

The Semantic Web was created to improve the current web user experience, and Linked Data is the primary means by which Semantic Web applications can be fully realized, provided appropriate criteria and requirements are respected. The quality of the data and information stored in Linked Data sets is therefore essential to meeting the basic objectives of the Semantic Web. This article aims to describe and present specific quality dimensions and their related quality issues.

Author(s):  
Georg Neubauer

The main subject of this work is the visualization of typed links in Linked Data. The academic fields relevant to the paper are the Semantic Web, the Web of Data, and information visualization. The Semantic Web, proposed by Tim Berners-Lee in 2001, was announced as an extension of the World Wide Web. The actual area of investigation concerns the connectivity of information on the World Wide Web. To explore such interconnections, visualizations are a critical requirement as well as a major part of data processing in their own right. In the context of the Semantic Web, relationships between pieces of information can be represented as graphs. The primary aim of the article is to describe the arrangement of Linked Data visualization concepts by establishing their principles in a theoretical approach; putting design restrictions into context then leads to practical guidelines. Their compatibility was tested by creating two alternative network visualizations for a commonly used web application that represents Linked Data. The application-oriented part covers the design phase, its results, and the future requirements of the project that can be derived from this test.
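
As an editorial illustration only: the kind of network visualization of typed links discussed above can be sketched as a labeled directed graph in which each edge carries its RDF predicate as the link type. The resources and predicates below are placeholder, DBpedia-style names, not material from the article.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Build a small directed graph of Linked Data resources; each edge stores
# its RDF predicate so the link type can be rendered as an edge label.
G = nx.DiGraph()
G.add_edge("dbr:Vienna", "dbr:Austria", predicate="dbo:country")
G.add_edge("dbr:Austria", "dbr:Vienna", predicate="dbo:capital")

pos = nx.spring_layout(G, seed=42)
nx.draw_networkx_nodes(G, pos, node_color="lightblue")
nx.draw_networkx_labels(G, pos, font_size=8)
nx.draw_networkx_edges(G, pos, arrows=True)
nx.draw_networkx_edge_labels(
    G, pos, edge_labels=nx.get_edge_attributes(G, "predicate"), font_size=7
)
plt.axis("off")
plt.show()
```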


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is created either from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used in building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements were added, while with SDValidate, 13,000 erroneous RDF statements were removed from the knowledge base.
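
As a rough editorial sketch of the idea behind SDType (not the authors' implementation): each property a resource uses "votes" for the types its subjects usually carry, the votes are weighted and aggregated, and types scoring above a threshold are added. The distributions, weights, and threshold below are invented for illustration.

```python
from collections import defaultdict

# P(type | resource appears as subject of property) -- invented toy numbers.
type_dist = {
    "dbo:author":   {"dbo:Book": 0.70, "dbo:Film": 0.20, "dbo:Software": 0.10},
    "dbo:director": {"dbo:Film": 0.90, "dbo:Book": 0.05, "dbo:Software": 0.05},
}
# Per-property weight (uniform here; SDType derives weights from how
# discriminative each property's type distribution is).
weight = {"dbo:author": 1.0, "dbo:director": 1.0}

def infer_types(properties, threshold=0.4):
    """Aggregate weighted type votes from the properties a resource uses."""
    scores = defaultdict(float)
    total = sum(weight[p] for p in properties)
    for p in properties:
        for t, prob in type_dist[p].items():
            scores[t] += weight[p] * prob / total
    return {t: s for t, s in scores.items() if s >= threshold}

# dbo:Film (score 0.55) is the only type that clears the threshold here.
print(infer_types(["dbo:author", "dbo:director"]))
```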


2016 ◽  
Vol 25 (3) ◽  
pp. 431-440 ◽  
Author(s):  
Archana Purwar ◽  
Sandeep Kumar Singh

Data quality is an important issue in data mining. The validity of mining algorithms is reduced if the data are not of good quality. The quality of data can be assessed in terms of missing values (MVs) as well as noise present in the data set. Various imputation techniques have been studied for MVs, but little attention has been given to noise in earlier work. Moreover, to the best of our knowledge, no one has used density-based spatial clustering of applications with noise (DBSCAN) for MV imputation. This paper proposes a novel technique, density-based imputation (DBSCANI), built on density-based clustering to deal with incomplete values in the presence of noise. The density-based clustering algorithm proposed by Kriegel groups objects according to their density in spatial databases: high-density regions form clusters, and low-density regions correspond to the noise objects in the data set. Experiments were performed on the Iris data set from the life-science domain and on Jain's (2D) data set from the shape data sets. The performance of the proposed method is evaluated using the root mean square error (RMSE) and compared with the existing K-means imputation (KMI). Results show that the proposed method is more noise-resistant than KMI on the data sets studied.
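
The exact DBSCANI procedure is specified in the paper; the following is only one plausible reading, sketched for illustration: cluster the complete rows with DBSCAN, then fill a missing value from the nearest cluster's feature mean, falling back to a global mean when the nearest point is a noise object. The data and parameters are invented.

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
X[3, 1] = np.nan                        # introduce one missing value

complete = X[~np.isnan(X).any(axis=1)]  # rows without missing values
labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(complete)

def impute_row(row):
    filled = row.copy()
    miss = np.isnan(row)
    # Distance to complete rows, computed on the observed features only.
    dists = np.linalg.norm(complete[:, ~miss] - row[~miss], axis=1)
    nearest_label = labels[np.argmin(dists)]
    if nearest_label == -1:             # nearest point is noise: global fallback
        filled[miss] = np.nanmean(X[:, miss], axis=0)
    else:                               # mean of the nearest dense cluster
        filled[miss] = complete[labels == nearest_label][:, miss].mean(axis=0)
    return filled

X[3] = impute_row(X[3])
```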


2016 ◽  
Vol 35 (1) ◽  
pp. 51 ◽  
Author(s):  
Juliet L. Hardesty

Metadata, particularly within the academic library setting, is often expressed in eXtensible Markup Language (XML) and managed with XML tools, technologies, and workflows. Managing a library’s metadata currently takes on a greater level of complexity as libraries increasingly adopt the Resource Description Framework (RDF). Semantic Web initiatives are surfacing in the library context, with experiments in publishing metadata as Linked Data sets and with development efforts such as BIBFRAME and the Fedora 4 Digital Repository incorporating RDF. Use cases show that transitions to RDF are occurring both in XML standards and in libraries with metadata encoded in XML. It is vital to understand that transitioning from XML to RDF requires a shift in perspective, from replicating structures in XML to defining meaningful relationships in RDF. Establishing coordination and communication among these efforts will help as more libraries move to use RDF, produce Linked Data, and approach the Semantic Web.
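
A minimal sketch (not from the article) of the shift it describes: a descriptive element that would be nested inside an XML record becomes an explicit relationship between identified resources in RDF. The item URI and literal values below are made-up placeholders; only the Dublin Core namespace is real.

```python
from rdflib import Graph, Literal, Namespace, URIRef

DC = Namespace("http://purl.org/dc/elements/1.1/")

g = Graph()
g.bind("dc", DC)

# The facts an XML record would nest as <dc:title> and <dc:creator>
# elements become subject-predicate-object statements about a resource.
item = URIRef("http://example.org/items/42")
g.add((item, DC.title, Literal("Example digital object")))
g.add((item, DC.creator, Literal("Doe, Jane")))

print(g.serialize(format="turtle"))
```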


2021 ◽  
Author(s):  
Rishabh Deo Pandey ◽  
Itu Snigdh

Data quality became significant with the emergence of data warehouse systems. While accuracy is an intrinsic aspect of data quality, validity presents a wider perspective that is more representational and contextual in nature. In this article we present a different perspective on data collection and collation. We focus on faults experienced in data sets and present validity as a function of allied parameters such as completeness, usability, availability, and timeliness for determining data quality. We also analyze the applicability of these metrics and modify them to suit IoT applications. Another major focus of this article is to verify these metrics on aggregated data sets instead of separate data values. This work focuses on using the different validation parameters to determine the quality of data generated in a pervasive environment. The analysis approach presented is simple and can be employed to test the validity of collected data, isolate faults in the data set, and measure the suitability of data before applying analysis algorithms.
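
As an illustrative sketch only (the article's exact metric definitions may differ), two of the parameters named above, completeness and timeliness, can be computed over an aggregated sensor data set rather than over individual values. The column names, expected reporting interval, and formulas are assumptions made for the example.

```python
import pandas as pd

readings = pd.DataFrame({
    "sensor": ["s1", "s1", "s2", "s2"],
    "value": [21.5, None, 19.8, 20.1],
    "timestamp": pd.to_datetime([
        "2021-06-01 10:00", "2021-06-01 10:05",
        "2021-06-01 10:00", "2021-06-01 10:20",
    ]),
})

# Completeness: share of non-missing values in the aggregated data set.
completeness = readings["value"].notna().mean()

# Timeliness: share of readings that arrived within the expected interval.
expected = pd.Timedelta(minutes=10)
gaps = readings.sort_values("timestamp").groupby("sensor")["timestamp"].diff()
timeliness = (gaps.dropna() <= expected).mean()

print(f"completeness={completeness:.2f}, timeliness={timeliness:.2f}")
```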


2011 ◽  
pp. 541-570
Author(s):  
Marco Brambilla ◽  
Federico M. Facca

This chapter presents an extension of Web application conceptual models toward the Semantic Web. Conceptual models and model-driven methodologies are widely applied to the development of Web applications because of the advantages they grant in terms of productivity and quality of the outcome. Although some of these approaches are meant to address Semantic Web applications too, they do not fully exploit the potential deriving from interaction with ontological data sources and from semantic annotations. The authors claim that Semantic Web applications represent an emerging category of software artifacts, with peculiar characteristics and software structures, and hence need specific methods and primitives to achieve good design results. In particular, the contribution presented in this chapter is an extension of the WebML modeling framework that fulfils most of the design requirements emerging in the new area of the Semantic Web. The authors generalize the development process to cover Semantic Web needs and devise a set of new primitives for ontology importing and querying. The chapter also compares the proposed approach with the most relevant existing proposals and positions it with respect to the background and the adopted technologies.
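
For illustration only (the chapter's primitives are modeling-level WebML units, not code): at run time, an ontology-import primitive roughly corresponds to loading an ontology into a graph, and a query primitive to evaluating a SPARQL query against it. The tiny ontology below is invented for the example.

```python
from rdflib import Graph

ontology = """
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/schema#> .
ex:Product a rdfs:Class ; rdfs:label "Product" .
ex:Order   a rdfs:Class ; rdfs:label "Order" .
"""

g = Graph()
g.parse(data=ontology, format="turtle")   # roughly: the "import" step

results = g.query("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?cls ?label WHERE { ?cls a rdfs:Class ; rdfs:label ?label . }
""")                                      # roughly: the "query" step
for cls, label in results:
    print(cls, label)
```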


Author(s):  
Pankaj Kamthan

In this chapter, we view the development and maintenance of Web applications from an engineering perspective. A methodology, termed POWEM, for deploying patterns as a means of improving the quality of Web applications is presented. To that end, relevant quality attributes and corresponding stakeholder types are identified. The role of a process, the challenges in making optimal use of patterns, and the feasibility issues involved in doing so are analyzed. The activities of systematically selecting and applying patterns are explored. Following a top-down approach to design, examples illustrating the use of patterns during the macro- and micro-architecture design of a Web application are given. Finally, the implications for Semantic Web applications and Web 2.0 applications are briefly outlined.


Author(s):  
Anh Duy Tran ◽  
Somjit Arch-int ◽  
Ngamnij Arch-int

Conditional functional dependencies (CFDs) have been used to improve the quality of data, including detecting and repairing data inconsistencies. Approximation measures are of significant importance for data dependencies in data mining. To adapt to exceptions in real data, such measures are used to relax the strictness of CFDs into more generalized dependencies, called approximate conditional functional dependencies (ACFDs). This paper analyzes the weaknesses of the dependency degree, confidence, and conviction measures for general CFDs (constant and variable CFDs). A new measure for general CFDs based on incomplete knowledge granularity is proposed to measure the approximation of these dependencies as well as the distribution of data tuples across the conditional equivalence classes. Finally, the effectiveness of stripped conditional partitions and the new measure is evaluated on synthetic and real data sets. These results are important to the study of the theory of approximate dependencies and to the improvement of discovery algorithms for CFDs and ACFDs.
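
As a toy illustration of one of the existing measures the paper analyzes (the confidence of a CFD), not of the proposed granularity-based measure: restrict the relation to tuples matching the constant pattern, then, within each equivalence class of the left-hand-side attributes, keep the most frequent right-hand-side value and count how many tuples survive. The data and the dependency below are invented.

```python
import pandas as pd

def cfd_confidence(df, lhs, rhs, pattern=None):
    """Fraction of pattern-matching tuples that can be kept so the FD holds."""
    if pattern:                                  # constant conditions, e.g. {"country": "UK"}
        for col, const in pattern.items():
            df = df[df[col] == const]
    if df.empty:
        return 1.0
    # For each LHS equivalence class, keep the most frequent RHS value.
    kept = df.groupby(lhs)[rhs].agg(lambda s: s.value_counts().iloc[0]).sum()
    return kept / len(df)

data = pd.DataFrame({
    "country": ["UK", "UK", "UK", "NL"],
    "zip":     ["EH4", "EH4", "EH4", "3011"],
    "city":    ["Edinburgh", "Edinburgh", "Glasgow", "Rotterdam"],
})

# CFD: (country = "UK": zip -> city); one of the three UK tuples violates it.
print(cfd_confidence(data, lhs=["zip"], rhs="city", pattern={"country": "UK"}))  # ~0.67
```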

