PatchR

Author(s):  
Magnus Knuth ◽  
Harald Sack

Incorrect or outdated data is a common problem when working with Linked Data in real-world applications. Linked Data is distributed over the web and under the control of various dataset publishers. It is difficult for data publishers to ensure the quality and timeliness of the data all by themselves, though they may receive individual complaints from data users who have identified incorrect or missing data. Indeed, the authors see Linked Data consumers as equally responsible for the quality of the datasets they use. PatchR provides a vocabulary to report incorrect data and to propose changes to correct it. Based on the PatchR ontology, a framework is suggested that allows users to efficiently report change requests and data publishers to handle them for their datasets.
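As an illustration of how such a change request could be expressed in RDF, the following sketch builds a patch that asks a publisher to delete one triple. The namespace and the terms pat:Patch, pat:appliesTo, and pat:delete are hypothetical stand-ins; the published PatchR vocabulary may use different names.

```python
from rdflib import Graph, Namespace, URIRef, BNode, Literal
from rdflib.namespace import RDF

# Hypothetical namespace and terms; the actual PatchR vocabulary may differ.
PAT = Namespace("http://example.org/patchr#")
DBR = Namespace("http://dbpedia.org/resource/")
DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
patch = URIRef("http://example.org/patches/42")
stmt = BNode()

# Reify the triple the data consumer believes is incorrect ...
g.add((stmt, RDF.type, RDF.Statement))
g.add((stmt, RDF.subject, DBR.Berlin))
g.add((stmt, RDF.predicate, DBO.populationTotal))
g.add((stmt, RDF.object, Literal(1000)))

# ... and wrap it in a patch request asking the publisher to remove it.
g.add((patch, RDF.type, PAT.Patch))
g.add((patch, PAT.appliesTo, URIRef("http://dbpedia.org")))
g.add((patch, PAT.delete, stmt))

print(g.serialize(format="turtle"))
```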

Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is created either from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to enhance the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: with SDType, 3.4 million missing type statements have been added, while with SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
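The core statistical idea can be sketched as follows: learn, for every property, the distribution of its subjects' types, then predict types for a resource by aggregating the distributions of the properties it uses. This is a simplified illustration, not the exact SDType weighting scheme (which additionally weights properties by how discriminative they are).

```python
from collections import Counter, defaultdict

def train_type_distributions(triples, types):
    """For each property, record the distribution of its subjects' types.
    `triples` is an iterable of (s, p, o); `types` maps a resource to a set of types."""
    dist = defaultdict(Counter)
    counts = Counter()
    for s, p, o in triples:
        counts[p] += 1
        for t in types.get(s, ()):
            dist[p][t] += 1
    # Normalize to conditional probabilities P(type | resource uses property p).
    return {p: {t: c / counts[p] for t, c in ctr.items()} for p, ctr in dist.items()}

def predict_types(resource_props, dist, threshold=0.5):
    """Average the per-property type distributions over a resource's outgoing
    properties and keep types whose aggregated score exceeds the threshold."""
    scores = Counter()
    for p in resource_props:
        for t, prob in dist.get(p, {}).items():
            scores[t] += prob / len(resource_props)
    return {t for t, score in scores.items() if score >= threshold}
```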


2015 ◽  
Vol 2015 ◽  
pp. 1-14 ◽  
Author(s):  
Jaemun Sim ◽  
Jonathan Sangyun Lee ◽  
Ohbyung Kwon

In a ubiquitous environment, high-accuracy data analysis is essential because it affects real-world decision-making. However, in the real world, user-related data from information systems are often missing due to users' concerns about privacy or the lack of an obligation to provide complete data. This data incompleteness can impair the accuracy of data analysis using classification algorithms, which degrades the value of the data. Many studies have attempted to overcome these data incompleteness issues and to improve the quality of data analysis using classification algorithms. The performance of classification algorithms may be affected by the characteristics and patterns of the missing data, such as the ratio of missing data to complete data. We perform a concrete causal analysis of differences in the performance of classification algorithms based on various factors, examining the characteristics of missing values, datasets, and imputation methods. We also propose imputation and classification algorithms appropriate to different datasets and circumstances.
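The kind of experiment this suggests can be sketched with scikit-learn: inject missing values completely at random at several ratios, impute with different strategies, and compare downstream classification accuracy. The dataset, ratios, and methods below are illustrative placeholders, not the study's actual setup.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer, KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Illustrative experiment: vary the missing-data ratio and compare imputation
# strategies by the accuracy of the downstream classifier.
X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

for missing_ratio in (0.1, 0.3, 0.5):
    X_miss = X.copy()
    X_miss[rng.random(X.shape) < missing_ratio] = np.nan   # missing completely at random
    for name, imputer in (("mean", SimpleImputer(strategy="mean")),
                          ("knn", KNNImputer(n_neighbors=5))):
        pipe = make_pipeline(imputer, RandomForestClassifier(random_state=0))
        acc = cross_val_score(pipe, X_miss, y, cv=5).mean()
        print(f"missing={missing_ratio:.0%} imputer={name} accuracy={acc:.3f}")
```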


2020 ◽  
Vol 8 ◽  
pp. 539-555
Author(s):  
Marina Fomicheva ◽  
Shuo Sun ◽  
Lisa Yankovskaya ◽  
Frédéric Blain ◽  
Francisco Guzmán ◽  
...  

Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it aims to inform the user of the quality of the MT output at test time. Existing approaches require large amounts of expert-annotated data, computation, and time for training. As an alternative, we devise an unsupervised approach to QE where no training or access to additional resources besides the MT system itself is required. Different from most current work, which treats the MT system as a black box, we explore useful information that can be extracted from the MT system as a by-product of translation. By utilizing methods for uncertainty quantification, we achieve very good correlation with human judgments of quality, rivaling state-of-the-art supervised QE models. To evaluate our approach, we collect the first dataset that enables work on both black-box and glass-box approaches to QE.
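A glass-box signal of this kind can be computed directly from the probabilities the decoder assigns to its own output tokens. The sketch below derives a few simple sentence-level confidence features; the feature names and the exact set are illustrative, and the paper additionally explores other uncertainty measures such as Monte Carlo dropout.

```python
import math

def glassbox_qe_features(token_probs):
    """Sentence-level confidence features derived from the probabilities an MT
    decoder assigns to its own output tokens (a glass-box signal; no training
    data or external resources needed). `token_probs` holds one probability
    per generated target token."""
    logps = [math.log(p) for p in token_probs]
    mean = sum(logps) / len(logps)
    return {
        # Length-normalised log-probability: the basic unsupervised QE score.
        "mean_logprob": mean,
        # The weakest token often flags a local translation error.
        "min_logprob": min(logps),
        # Dispersion of token confidences.
        "std_logprob": (sum((lp - mean) ** 2 for lp in logps) / len(logps)) ** 0.5,
    }

# Example: probabilities the decoder assigned to a five-token hypothesis.
print(glassbox_qe_features([0.91, 0.85, 0.40, 0.95, 0.88]))
```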


2018 ◽  
Vol 52 (3) ◽  
pp. 405-423 ◽  
Author(s):  
Riccardo Albertoni ◽  
Monica De Martino ◽  
Paola Podestà

Purpose: The purpose of this paper is to focus on the quality of the connections (linksets) among thesauri published as Linked Data on the Web. It extends the cross-walking measures with two new measures able to evaluate the enrichment brought by the information reached through the linkset (lexical enrichment, browsing space enrichment). It fosters the adoption of cross-walking linkset quality measures besides the well-known and widely deployed cardinality-based measures (linkset cardinality and linkset coverage).
Design/methodology/approach: The paper applies the linkset measures to the Linked Thesaurus fRamework for Environment (LusTRE). LusTRE is selected as a testbed because it is encoded using the Simple Knowledge Organisation System (SKOS), published as Linked Data, and explicitly exploits the cross-walking measures on its validated linksets.
Findings: The application to LusTRE offers an insight into the complementarities among the considered linkset measures. In particular, it shows that the cross-walking measures deepen the cardinality-based measures by analysing quality facets that were not previously considered. The actual value of LusTRE's linksets regarding the improvement of multilingualism and concept spaces is assessed.
Research limitations/implications: The paper considers skos:exactMatch linksets, which are a rather specific but quite common kind of linkset. The cross-walking measures explicitly assume correctness and completeness of linksets; third-party approaches and tools can help to meet these assumptions.
Originality/value: This paper fulfils an identified need to study the quality of linksets. Several approaches formalise and evaluate Linked Data quality focusing on dataset quality but disregarding the other essential component: the connections among datasets.
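As a rough illustration of how the cardinality-based measures relate to an enrichment-oriented one, the sketch below computes linkset cardinality, coverage, and a simplified lexical gain for a skos:exactMatch linkset. The formulas are an informal reading of the measures, not the authors' formalisation.

```python
def linkset_measures(linkset, source_concepts, source_labels, target_labels):
    """Cardinality-based measures plus a simplified lexical-enrichment count
    for a skos:exactMatch linkset. `linkset` is a set of (source, target)
    concept pairs; `*_labels` map a concept to a set of (language, label)
    pairs. Illustrative formulas only."""
    linked_sources = {s for s, _ in linkset}
    cardinality = len(linkset)
    coverage = len(linked_sources) / len(source_concepts)
    # Lexical enrichment: labels (e.g. in new languages) reachable through the
    # linkset that the source concept does not already have.
    gained = sum(len(target_labels.get(t, set()) - source_labels.get(s, set()))
                 for s, t in linkset)
    return {"cardinality": cardinality,
            "coverage": coverage,
            "lexical_gain_per_link": gained / cardinality if cardinality else 0.0}
```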


Mathematics ◽  
2021 ◽  
Vol 9 (19) ◽  
pp. 2374
Author(s):  
Oswaldo Ulises Juarez-Sandoval ◽  
Francisco Javier Garcia-Ugalde ◽  
Manuel Cedillo-Hernandez ◽  
Jazmin Ramirez-Hernandez ◽  
Leobardo Hernandez-Gonzalez

Digital image watermarking algorithms have been designed for intellectual property and copyright protection, medical data management, and other related fields; furthermore, in real-world applications such as official documents, banknotes, etc., they are used to deliver additional information about the documents' authenticity. In this context, the imperceptible-visible watermarking (IVW) algorithm has been designed as a digital reproduction of real-world watermarks. This paper presents a new, improved IVW algorithm for copyright protection that can deliver additional information on top of the image content. The proposed algorithm is divided into two stages: in the embedding stage, a human visual system-based strategy is used to embed an owner logotype or a 2D quick response (QR) code as a watermark into a color image, maintaining high watermark imperceptibility and low image-quality degradation; in the exhibition stage, a new histogram binarization function is introduced to exhibit the watermark with enough quality to be recognized or decoded by any QR code reading application. The experimental results show that the proposed algorithm can embed one or more watermark patterns while maintaining the high imperceptibility and visual quality of both the embedded and the exhibited watermark. The performance evaluation shows that the method overcomes several drawbacks reported for previous algorithms, including vulnerability to geometric and image-processing attacks such as JPEG and JPEG2000 compression.
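To make the two-stage idea concrete, here is a toy embedding and exhibition of a binary watermark by slightly modulating one color channel. This is only an illustration: the paper's HVS-based embedding is adaptive, and its histogram-binarization exhibition is blind (it does not need the original image), unlike the comparison used below.

```python
import numpy as np

def embed_imperceptible_watermark(image, watermark_bits, strength=2):
    """Toy imperceptible embedding: nudge the blue channel up or down by a small
    amount depending on the 0/1 watermark bit at each pixel. `image` is an
    (H, W, 3) uint8 array and `watermark_bits` an (H, W) 0/1 array. Channel
    saturation at 0/255 is ignored in this toy version."""
    img = image.astype(np.int16)
    wm = np.where(watermark_bits > 0, strength, -strength)
    img[..., 2] = np.clip(img[..., 2] + wm, 0, 255)
    return img.astype(np.uint8)

def exhibit_watermark(original, watermarked):
    """Toy exhibition step: the sign of the channel difference recovers the
    embedded binary pattern (e.g. a QR code) so it can be decoded further."""
    diff = watermarked.astype(np.int16)[..., 2] - original.astype(np.int16)[..., 2]
    return (diff > 0).astype(np.uint8) * 255
```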


1997 ◽  
Vol 12 (01) ◽  
pp. 95-98 ◽  
Author(s):  
Marie-Christine Rousset ◽  
Susan Craw

Ensuring the reliability and enhancing the quality of Knowledge-Based Systems (KBS) are critical factors for their successful deployment in real-world applications. This is a broad task involving both methodological and formal approaches for designing rigorous Validation, Verification and Testing (VVT) methods and tools. Some of these can be adapted from conventional software engineering, while others rely on aspects specific to KBS.


Author(s):  
Lei Tang ◽  
Huan Liu ◽  
Jiangping Zhang

The unregulated and open nature of the Internet and the explosive growth of the Web create a pressing need to provide various services for content categorization. Hierarchical classification attempts to achieve both accurate classification and increased comprehensibility. It has also been shown in the literature that hierarchical models outperform flat models in training efficiency, classification efficiency, and classification accuracy (Koller & Sahami, 1997; McCallum, Rosenfeld, Mitchell & Ng, 1998; Ruiz & Srinivasan, 1999; Dumais & Chen, 2000; Yang, Zhang & Kisiel, 2003; Cai & Hofmann, 2004; Liu, Yang, Wan, Zeng, Cheng & Ma, 2005). However, the quality of the taxonomy has attracted little attention in past work. Different taxonomies can result in different classification outcomes, so the quality of the taxonomy should be considered in real-world classification tasks. Even a semantically sound taxonomy does not necessarily lead to the intended classification performance (Tang, Zhang & Liu, 2006). Therefore, it is desirable to construct or modify a hierarchy to better suit the hierarchical content classification task.
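A minimal top-down sketch of hierarchical classification, assuming a toy taxonomy and scikit-learn text models: one classifier is trained at each internal node, and a document is routed downward until it reaches a leaf. Node names, data layout, and model choices are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy taxonomy: internal nodes map to their children; everything else is a leaf.
taxonomy = {"root": ["sports", "science"],
            "sports": ["soccer", "tennis"],
            "science": ["physics", "biology"]}

def train_node_classifiers(docs, paths):
    """Train one text classifier per internal node. `paths` gives, for each
    document, its root-to-leaf path, e.g. ["root", "sports", "soccer"]."""
    models = {}
    for node in taxonomy:
        X, y = [], []
        for doc, path in zip(docs, paths):
            if node in path[:-1]:                  # node is an ancestor of the leaf
                X.append(doc)
                y.append(path[path.index(node) + 1])
        models[node] = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(X, y)
    return models

def classify(doc, models):
    """Route the document down the taxonomy, one local decision per level."""
    node = "root"
    while node in taxonomy:
        node = models[node].predict([doc])[0]
    return node
```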


2017 ◽  
Vol 29 (2) ◽  
pp. 226-254 ◽  
Author(s):  
Susumu Shikano ◽  
Michael F Stoffel ◽  
Markus Tepe

The relationship between legislatures and bureaucracies is typically modeled as a principal–agent game. Legislators can acquire information about the (non-)compliance of bureaucrats at some specific cost. Previous studies consider the information from oversight to be perfect, which contradicts most real-world applications. We therefore provide a model that includes random noise as part of the information. The quality of provided goods usually increases with information accuracy while simultaneously requiring less oversight. However, bureaucrats never provide high quality if information accuracy is below a specific threshold. We assess the empirical validity of our predictions in a lab experiment. Our data show that information accuracy is indeed an important determinant of both legislator and bureaucrat decision-making.
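The role of noise in the oversight signal can be illustrated with a simple Bayesian reading: the legislator observes the bureaucrat's true compliance state only with some accuracy, and when that accuracy drops to 0.5 the signal carries no information at all. This is only an illustration of the information structure, not the authors' full principal-agent game or its equilibrium analysis.

```python
def posterior_compliance(prior, accuracy, signal_says_comply):
    """Posterior belief that the bureaucrat complied, given a noisy oversight
    signal that reports the true state with probability `accuracy`."""
    p_signal_given_comply = accuracy if signal_says_comply else 1 - accuracy
    p_signal_given_shirk = 1 - accuracy if signal_says_comply else accuracy
    p_signal = prior * p_signal_given_comply + (1 - prior) * p_signal_given_shirk
    return prior * p_signal_given_comply / p_signal

# With accuracy 0.5 the signal is uninformative: the posterior equals the prior,
# so oversight cannot discipline the bureaucrat.
print(posterior_compliance(prior=0.7, accuracy=0.5, signal_says_comply=False))  # 0.70
print(posterior_compliance(prior=0.7, accuracy=0.9, signal_says_comply=False))  # ~0.21
```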


2019 ◽  
Vol 16 (1) ◽  
pp. 38-50
Author(s):  
Kridsda Nimmanunta ◽  
Thunyarat (Bam) Amornpetchkul

One of Bangkok’s perennial problems was the misbehaviour of taxi drivers. In only four months, from October 2015 to January 2016, the Department of Land Transport under the Ministry of Transport (MOT) of Thailand received almost 15,000 complaints regarding the quality of services provided by Bangkok’s taxi drivers. The number one complaint was passenger refusal. Anybody taking a taxi, particularly during rush hour, was likely to get frustrated with taxi drivers who got flagged down but refused to go to the requested destinations. Several attempts had been made by the MOT to resolve the issue, including imposing fines on and suspending taxi drivers and allowing fares to rise to improve taxi drivers’ well-being, in the hope that they would provide higher quality services and abide by the laws and regulations. So far, the results had been unsatisfactory. This case aims to show the beauty and usefulness of real options in real-world applications by looking at this perennial problem of taxi drivers refusing passengers. A real option is a powerful framework for business, finance and economic decisions. It is also a versatile tool for resolving social issues.
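For readers unfamiliar with the framework, the value of a real option comes from deciding only after uncertainty resolves. The one-period sketch below, with made-up numbers unrelated to the case and actual rather than risk-neutral probabilities, shows how the option to wait can be worth more than committing immediately.

```python
def one_step_real_option_value(v_up, v_down, investment, up_prob, discount=1.0):
    """One-period illustration of real-option thinking: the option to invest is
    exercised only in the state where it pays off, so its value is the expected
    value of max(payoff, 0) and is never negative."""
    payoff_up = max(v_up - investment, 0)
    payoff_down = max(v_down - investment, 0)
    return discount * (up_prob * payoff_up + (1 - up_prob) * payoff_down)

# Committing now is worth E[V] - I = 0.5*150 + 0.5*60 - 100 = 5,
# but waiting for the uncertainty to resolve is worth 0.5*(150-100) = 25.
print(one_step_real_option_value(v_up=150, v_down=60, investment=100, up_prob=0.5))
```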

