Increasing Quality of Austrian Open Data by Linking Them to Linked Data Sources: Lessons Learned

Author(s):  
Tomáš Knap


Author(s):  
Heiko Paulheim ◽  
Christian Bizer

Linked Data on the Web is either created from structured data sources (such as relational databases), from semi-structured sources (such as Wikipedia), or from unstructured sources (such as text). In the latter two cases, the generated Linked Data will likely be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types for enhancing the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither of the algorithms uses external knowledge, i.e., they operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases, showing that they are both accurate and scalable. Both algorithms have been used for building the DBpedia 3.9 release: With SDType, 3.4 million missing type statements have been added, while using SDValidate, 13,000 erroneous RDF statements have been removed from the knowledge base.
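The core idea behind SDType can be illustrated with a toy sketch: learn, for each predicate, the type distribution of its subjects, and let an untyped entity's predicates vote for its type. This is a simplified illustration only (the entities and predicates are invented, and the actual SDType additionally weights each predicate by how strongly its type distribution deviates from the prior):

```python
from collections import Counter, defaultdict

# Toy knowledge base: (subject, predicate, object) triples plus known types.
triples = [
    ("Berlin", "capitalOf", "Germany"),
    ("Paris", "capitalOf", "France"),
    ("Einstein", "bornIn", "Ulm"),
    ("Curie", "bornIn", "Warsaw"),
]
types = {"Berlin": "City", "Paris": "City", "Einstein": "Person", "Curie": "Person"}

# Step 1: for each predicate, learn the type distribution of its subjects.
dist = defaultdict(Counter)
for s, p, _ in triples:
    if s in types:
        dist[p][types[s]] += 1

def sdtype_guess(entity):
    """Let each predicate used by an untyped entity vote for a type,
    weighted by the relative frequency of that type among its subjects."""
    votes = Counter()
    for s, p, _ in triples:
        if s == entity:
            total = sum(dist[p].values())
            for t, n in dist[p].items():
                votes[t] += n / total
    return votes.most_common(1)[0][0] if votes else None

# Vienna has no type statement, but it uses capitalOf, whose known
# subjects are all Cities, so the vote infers "City".
triples.append(("Vienna", "capitalOf", "Austria"))
print(sdtype_guess("Vienna"))
```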


2016 ◽  
Vol 12 (3) ◽  
pp. 111-133 ◽  
Author(s):  
Ahmad Assaf ◽  
Aline Senart ◽  
Raphaël Troncy

Ensuring data quality in Linked Open Data is a complex process, as it consists of structured information supported by models, ontologies and vocabularies and contains queryable endpoints and links. In this paper, the authors first propose an objective assessment framework for Linked Data quality. The authors build upon previous efforts that have identified potential quality issues, but focus only on objective quality indicators that can be measured regardless of the underlying use case. Secondly, the authors present an extensible quality measurement tool that helps, on the one hand, data owners to rate the quality of their datasets and, on the other hand, data consumers to choose their data sources from a ranked set. The authors evaluate this tool by measuring the quality of the LOD cloud. The results demonstrate that the general state of the datasets needs attention, as they mostly have low completeness, provenance, licensing and comprehensibility quality scores.
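The ranking idea is straightforward to sketch: score each dataset on a set of objective indicators in [0, 1] and sort by an aggregate. The dataset names, indicator values, and the unweighted mean are all invented for illustration; the authors' framework may weight the quality dimensions differently:

```python
# Hypothetical indicator scores in [0, 1] for three made-up datasets.
datasets = {
    "dataset-a": {"completeness": 0.4, "provenance": 0.2, "licensing": 0.9},
    "dataset-b": {"completeness": 0.8, "provenance": 0.7, "licensing": 0.5},
    "dataset-c": {"completeness": 0.3, "provenance": 0.1, "licensing": 0.2},
}

def overall(scores):
    # Unweighted mean of the objective indicators (a simplifying assumption).
    return sum(scores.values()) / len(scores)

# Data consumers can then pick sources from the top of the ranked set.
ranked = sorted(datasets, key=lambda d: overall(datasets[d]), reverse=True)
print(ranked)  # ['dataset-b', 'dataset-a', 'dataset-c']
```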


Author(s):  
Abdulbaki Uzun ◽  
Eric Neidhardt ◽  
Axel Küpper

Mobile network operators maintain data about their mobile network topology, which is mainly used for network provisioning and planning purposes, leaving much of its business potential untapped. By utilizing this data in combination with the extensive pool of semantically modeled data in the Linking Open Data Cloud, innovative applications can be realized that would establish network operators as service providers and enablers in the highly competitive services market. In this article, the authors introduce the OpenMobileNetwork (available at http://www.openmobilenetwork.org/) as an open solution for providing approximated network topology data based on the principles of Linked Data, along with a business concept for network operators to exploit their valuable asset. Since the quality of the estimated network topology is crucial when providing services on top of it, the authors further analyze and evaluate state-of-the-art approaches for estimating base station positions from crowdsourced data and discuss the results in comparison to real base station locations.
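One common baseline for estimating a base station position from crowdsourced measurements is a weighted centroid of the observation points, with weights derived from received signal strength. The sketch below uses invented coordinates and RSSI values and is only one of the approaches such an evaluation might cover, not necessarily the one the authors found best:

```python
# Crowdsourced observations of one cell: (lat, lon, rssi_dbm). Values invented.
obs = [(48.200, 16.370, -60), (48.202, 16.374, -75), (48.199, 16.372, -90)]

def weighted_centroid(observations):
    """Estimate a base station position as the centroid of observation
    points, weighted by linearized received signal strength (dBm -> mW),
    so that stronger measurements pull the estimate toward them."""
    weights = [10 ** (rssi / 10) for _, _, rssi in observations]
    total = sum(weights)
    lat = sum(w * la for (la, _, _), w in zip(observations, weights)) / total
    lon = sum(w * lo for (_, lo, _), w in zip(observations, weights)) / total
    return lat, lon

lat, lon = weighted_centroid(obs)
print(round(lat, 4), round(lon, 4))
```

The strongest (-60 dBm) measurement carries most of the weight, so the estimate lands close to its location; evaluating such estimates against ground-truth base station locations is exactly the kind of comparison the article describes.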


Author(s):  
Bonnie MacKellar ◽  
Christina Schweikert ◽  
Soon Ae Chun

Patients often want to participate in relevant clinical trials for new or more effective alternative treatments. The clinical trial search system made available by the NIH is a step forward in supporting the patient's decision making, but it is difficult to use and requires the patient to sift through lengthy text descriptions for relevant information. In addition, patients deciding whether to pursue a given trial often want more information, such as drug information. The authors' overall aim is to develop an intelligent patient-centered clinical trial decision support system. Their approach is to integrate Open Data sources related to clinical trials using the Semantic Web's Linked Data framework. The linked data representation, in terms of RDF triples, allows the development of a clinical trial knowledge base that includes entities from different open data sources and relationships among those entities. The authors consider Open Data sources such as the clinical trials provided by NIH as well as the drug side effects dataset SIDER. The authors use UMLS (Unified Medical Language System) to provide consistent semantics and ontological knowledge for clinical trial related entities and terms. The authors' semantic approach is a step toward a cognitive system that provides not only patient-centered integrated data search but also automated reasoning in search, analysis and decision making using the semantic relationships embedded in the Linked Data. The authors present their integrated clinical trial knowledge base development and a prototype, patient-centered Clinical Trial Decision Support System that includes capabilities of semantic search and query with reasoning ability, and semantic-link browsing, where the exploration of one concept leads easily to related concepts via links, providing visual search for end users.
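The benefit of the RDF triple representation is that cross-dataset questions become simple link traversals, e.g., from a trial to the drug it tests (clinical trials data) and on to that drug's side effects (SIDER). A minimal sketch with an in-memory triple set; the identifiers and predicate names are invented, not the actual NIH or SIDER IRIs:

```python
# Minimal triple store linking trials to drugs and drugs to side effects.
triples = {
    ("trial:NCT001", "tests_drug", "drug:aspirin"),
    ("trial:NCT002", "tests_drug", "drug:warfarin"),
    ("drug:aspirin", "has_side_effect", "effect:nausea"),
    ("drug:warfarin", "has_side_effect", "effect:bleeding"),
}

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the store."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def side_effects_of_trial(trial):
    """Follow trial -> drug -> side-effect links: the kind of two-hop
    traversal that semantic-link browsing makes cheap for the end user."""
    return [e for drug in objects(trial, "tests_drug")
              for e in objects(drug, "has_side_effect")]

print(side_effects_of_trial("trial:NCT001"))  # ['effect:nausea']
```

In a full implementation the same question would be a two-triple-pattern SPARQL query against the integrated knowledge base.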


2021 ◽  
Vol 5 (2) ◽  
pp. 299-307
Author(s):  
Filippo Candela ◽  
Paolo Mulassano

Abstract The paper presents and discusses the method adopted by Compagnia di San Paolo, one of the largest European philanthropic institutions, to monitor progress despite the COVID-19 situation and to provide specific input to the decision-making process for dedicated projects. An innovative approach based on the use of daily open data was adopted to monitor the metropolitan area from a multidimensional perspective. Several open data indicators related to the economy, society, culture, environment, and climate were identified and incorporated into the decision support system dashboard. The indicators are presented and discussed to highlight how open data could be integrated into the foundation's strategic approach and potentially replicated on a large scale by local institutions. Moreover, starting from the lessons learned from this experience, the paper analyzes the opportunities and critical issues surrounding the use of open data, not only to improve the quality of life during the COVID-19 epidemic but also for the effective regulation of society, the participation of citizens, and their well-being.



2017 ◽  
Author(s):  
Peb Ruswono Aryan ◽  
Fajar Juang Ekaputra ◽  
Kabul Kurniawan ◽  
Elmar Kiesling ◽  
A Min Tjoa

Recent advances in linked data generation through mapping languages such as RML (RDF Mapping Language) allow large-scale RDF data to be provided in a more automated way. However, a considerable amount of data in open data portals remains inaccessible as linked data. This is due to the nature of data portals hosting a large number of small datasets, which makes writing mapping descriptions tedious and error-prone. Moreover, these data sources require additional preprocessing before they can be mapped. To solve this challenge, we introduce extensions to RML to support the required tasks and developed RMLx, a visual web interface for creating RML mappings. Using this interface, the process of creating mapping descriptions becomes faster and less error-prone. Furthermore, the process of linked data generation can be wrapped to enable integration with other data in a linked data exploration environment. We explore four different use cases to identify the requirements, and then describe how these are addressed.
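What an RML mapping describes declaratively (a subject template plus predicate-object maps applied per source row) can be sketched procedurally. The sketch below hand-codes such a mapping for one tiny CSV table; the column names, IRI template, and predicates are invented, and RML's point is precisely that a tool like RMLx generates this logic from a declarative description instead of having it written by hand for every small dataset:

```python
import csv
import io

# A small open-data table of the kind found on data portals. Invented values.
raw = "id,name,population\n1,Vienna,1900000\n2,Graz,290000\n"

def map_rows(text):
    """Turn each CSV row into (subject, predicate, object) triples,
    mimicking an RML subject map plus two predicate-object maps."""
    out = []
    for row in csv.DictReader(io.StringIO(text)):
        subj = f"http://example.org/city/{row['id']}"  # subject IRI template
        out.append((subj, "rdfs:label", row["name"]))
        out.append((subj, "ex:population", int(row["population"])))
    return out

for triple in map_rows(raw):
    print(triple)
```

Note the `int(...)` call standing in for the kind of preprocessing (type coercion, cleaning) that real portal datasets often need before mapping.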


2013 ◽  
Vol 31 (15_suppl) ◽  
pp. 6523-6523
Author(s):  
Joseph Lipscomb ◽  
Kevin C. Ward ◽  
Kathleen Adams ◽  
Peter Joski ◽  
Douglas Roblin ◽  
...  

Background: The value of linking population-based cancer registry data with insurance claims files to assess quality of care has been demonstrated in numerous studies, including those using NCI’s linked SEER-Medicare database, covering patients age 65+ in fee-for-service plans, and studies linking registry data with Medicaid, private insurance, or managed care data covering the under-65 population. We describe a prototype program linking registry data with multiple data sources to assess quality of care for at-risk populations in a defined geographical area. Methods: Data exchange agreements were executed among the investigative site (Emory University), Georgia state government, and the claims data sources/vendors. We linked Georgia Cancer Registry (GCR) records for 1999-2005 incident cases of breast and colorectal cancer with enrollment and medical services records from Medicare, Medicaid, Kaiser Permanente of Georgia, and the State Health Benefit Plan (SHBP) which covers all state workers and dependents. Following data quality checks, algorithms based on National Quality Forum (NQF) endorsed breast and colorectal cancer quality measures were applied to each linked data set to assess performance. Results: The linked data sets included 60% of all breast and colorectal cancer cases in the GCR over the study period. Quality measure performance rates varied notably across payers. For example, the percent of Stage III colon cancer patients meeting the NQF standard for adjuvant chemotherapy in the linked GCR-Medicaid, GCR-Kaiser, and GCR-SHBP data were, respectively, 75%, 92%, and 92% (p<0.05). The rates for breast cancer patients meeting standards for adjuvant chemotherapy were 86%, 84%, and 87% (p=NS), respectively. Patients in the linked GCR-Medicare data (all age 65+) generally had lower performance rates for each NQF measure.
Conclusions: Linking state cancer registry data with multiple public and private sources of administrative data is technically feasible, and may represent a viable strategy for building a national cancer data system for quality improvement, as recommended in 1999 by the Institute of Medicine.
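The per-payer comparison reduces to computing, for each linked data set, the share of measure-eligible patients whose care met the NQF standard. A minimal sketch; the denominator counts are invented, and only the resulting rates echo the percentages reported above:

```python
# Illustrative counts only (denominators invented); rates mirror the
# Stage III colon cancer adjuvant chemotherapy example in the abstract.
payers = {
    "GCR-Medicaid": {"eligible": 100, "met": 75},
    "GCR-Kaiser":   {"eligible": 100, "met": 92},
    "GCR-SHBP":     {"eligible": 100, "met": 92},
}

def performance_rate(counts):
    """NQF-style measure: fraction of eligible patients whose care
    met the endorsed standard in this linked registry-claims data set."""
    return counts["met"] / counts["eligible"]

for payer, counts in payers.items():
    print(payer, f"{performance_rate(counts):.0%}")
```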

