Measuring and Diffusing Data Quality in a Peer-to-Peer Architecture

Author(s):  
Diego Milano

Data quality is a complex concept defined by various dimensions such as accuracy, currency, completeness, and consistency (Wang & Strong, 1996). Recent research has highlighted the importance of data quality issues in various contexts. In particular, in some specific environments characterized by extensive data replication, high quality of data is a strict requirement. Among such environments, this article focuses on Cooperative Information Systems. Cooperative information systems (CISs) are distributed and heterogeneous information systems that cooperate by sharing information, constraints, and goals (Mylopoulos & Papazoglou, 1997). Quality of data is a necessary requirement for a CIS: a system in the CIS will not easily exchange data with another system without knowledge of the quality of the data provided by that system, thus resulting in reduced cooperation. Also, when the quality of exchanged data is poor, there is a progressive deterioration of the overall data quality in the CIS. On the other hand, the high degree of data replication that characterizes a CIS can be exploited for improving data quality, as different copies of the same data may be compared in order to detect quality problems and possibly solve them. In Scannapieco, Virgillito, Marchetti, Mecella, and Baldoni (2004) and Mecella et al. (2003), the DaQuinCIS architecture is described as an architecture for managing data quality in cooperative contexts, in order to avoid the spread of low-quality data and to exploit data replication for the improvement of the overall quality of cooperative data. In this article we describe the design of a component of our system, named the quality factory, whose purpose is to evaluate the quality of the XML data sources of the cooperative system. While the need for such a component had been previously identified, this article presents for the first time the design of the quality factory and proposes an overall methodology to evaluate the quality of XML data sources. Quality values measured by the quality factory are used by the data quality broker. The data quality broker has two main functionalities: (i) quality brokering, which allows users to select data in the CIS according to their quality, and (ii) quality improvement, which diffuses the best-quality copies of data in the CIS.
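As an illustration of the brokering step described above, the following Python sketch selects the best-quality copy among replicated records annotated with per-dimension quality scores. It is not the DaQuinCIS implementation; the class names, quality dimensions, and weights are hypothetical.

```python
# Illustrative sketch (not the DaQuinCIS implementation): quality-based
# brokering over replicated records, each copy annotated with scores for
# a few data quality dimensions. All names and weights are hypothetical.
from dataclasses import dataclass, field

@dataclass
class DataCopy:
    source: str                                    # cooperating system owning this copy
    record: dict                                   # the data values themselves
    quality: dict = field(default_factory=dict)    # dimension -> score in [0, 1]

def overall_quality(copy: DataCopy, weights: dict) -> float:
    """Aggregate per-dimension scores into a single value (weighted mean)."""
    total_weight = sum(weights.values())
    return sum(weights[d] * copy.quality.get(d, 0.0) for d in weights) / total_weight

def broker_best_copy(copies: list[DataCopy], weights: dict) -> DataCopy:
    """Quality brokering: return the copy with the highest aggregate quality."""
    return max(copies, key=lambda c: overall_quality(c, weights))

if __name__ == "__main__":
    weights = {"accuracy": 0.4, "currency": 0.3, "completeness": 0.3}
    copies = [
        DataCopy("org_A", {"citizen_id": "123", "address": "Via Roma 1"},
                 {"accuracy": 0.9, "currency": 0.6, "completeness": 1.0}),
        DataCopy("org_B", {"citizen_id": "123", "address": "Via Roma 10"},
                 {"accuracy": 0.7, "currency": 0.9, "completeness": 0.8}),
    ]
    best = broker_best_copy(copies, weights)
    # "Quality improvement" could then diffuse this best copy to the other
    # systems; here we simply report which source would be propagated.
    print(f"Best copy comes from {best.source}")
```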

Author(s):  
Benjamin Ngugi
Jafar Mana
Lydia Segal

As the nation confronts a growing tide of security breaches, the importance of having quality data breach information systems becomes paramount. Yet too little attention is paid to evaluating these systems. This article draws on data quality scholarship to develop a yardstick that assesses the quality of data breach notification systems in the U.S. at both the state and national levels from the perspective of key stakeholders, who include law enforcement agencies, consumers, shareholders, investors, researchers, and businesses that sell security products. Findings reveal major shortcomings that reduce the value of data breach information to these stakeholders. The study concludes with detailed recommendations for reform.


2021
Vol 280
pp. 08012
Author(s):  
Yordanka Anastasova
Nikolay Yanev

The purpose of this article is to present modern approaches to data storage and processing, as well as technologies to achieve the quality of data needed for specific purposes in the mining industry. With respect to data formats, the article looks at NoSQL and NewSQL technologies, with the focus shifting from common solutions (traditional RDBMS) to specific ones aimed at integrating data into industrial information systems. The information systems used in the mining industry are characterized by their specificity and diversity, which makes them well suited to the integration of NoSQL data models owing to the flexibility of those models. In modern industrial information systems, data is considered high-quality if it actually reflects the described object and serves to make effective management decisions. The article also discusses the criteria for data quality from the point of view of information technology and from that of its users. Technologies are also presented that provide an optimal set of functions ensuring the desired quality of data in the information systems applicable in the industry. The format and quality of data in client-server information systems are of particular importance, especially given the dynamics of data input and processing in the information systems used in the mining industry.
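To make the idea of checking quality criteria over schema-flexible (NoSQL-style) records concrete, here is a minimal Python sketch; the field names, required keys, and freshness threshold are assumptions made for the example, not anything prescribed by the article.

```python
# Illustrative sketch: a schema-flexible, document-style record (as one might
# store in a NoSQL system) checked against simple data quality criteria.
# Field names, required keys and the freshness threshold are hypothetical.
from datetime import datetime, timedelta

REQUIRED_FIELDS = {"sensor_id", "site", "measured_at", "ore_grade"}
MAX_AGE = timedelta(hours=24)  # data older than this is not considered current

def quality_report(document: dict) -> dict:
    """Return completeness and currency indicators for one document."""
    present = REQUIRED_FIELDS & document.keys()
    completeness = len(present) / len(REQUIRED_FIELDS)
    measured_at = document.get("measured_at")
    is_current = measured_at is not None and datetime.now() - measured_at <= MAX_AGE
    return {"completeness": completeness, "current": is_current}

if __name__ == "__main__":
    doc = {
        "sensor_id": "S-17",
        "site": "open-pit-2",
        "measured_at": datetime.now() - timedelta(hours=3),
        # "ore_grade" is missing, so completeness will be below 1.0
        "extra_field": "NoSQL documents tolerate fields the schema never named",
    }
    print(quality_report(doc))
```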


Author(s):  
Carla Marchetti
Massimo Mecella
Monica Scannapieco
Antonino Virgillito

A Cooperative Information System (CIS) is a large-scale information system that interconnects various systems of different and autonomous organizations, geographically distributed and sharing common objectives (De Michelis et al., 1997). Among the different resources that organizations share, data are fundamental; in real-world scenarios, organization A may not request data from organization B if it does not trust B's data (i.e., if A does not know that the quality of the data that B can provide is high). As an example, in an e-government scenario in which public administrations cooperate in order to fulfill service requests from citizens and enterprises (Batini & Mecella, 2001), administrations very often prefer to ask citizens for data rather than obtain them from other administrations that store the same data, because the quality of those data is not known. Therefore, lack of cooperation may occur due to lack of quality certification.


2018
Vol 10 (1)
Author(s):  
Brian E. Dixon
Chen Wen
Tony French
Jennifer Williams
Shaun J. Grannis

Objective: To extend an open source analytics and visualization platform for measuring the quality of electronic health data transmitted to syndromic surveillance systems.

Introduction: Effective clinical and public health practice in the twenty-first century requires access to data from an increasing array of information systems. However, the quality of data in these systems can be poor or "unfit for use." Therefore, measuring and monitoring data quality is an essential activity for clinical and public health professionals as well as researchers [1]. Current methods for examining data quality largely rely on manual queries and processes conducted by epidemiologists. Better, automated tools for examining data quality are desired by the surveillance community.

Methods: Using the existing, open-source platform Atlas developed by the Observational Health Data Sciences and Informatics collaborative (OHDSI; www.ohdsi.org), we added new functionality to measure and visualize the quality of data electronically reported from disparate information systems. Our extensions focused on the analysis of data reported electronically to public health agencies for disease surveillance. Specifically, we created methods for examining the completeness and timeliness of reported data as well as the information entropy of the data within syndromic surveillance messages sent from emergency department information systems.

Results: To date we have transformed 111 million syndromic surveillance message segments, pertaining to 16.4 million emergency department encounters representing 6 million patients, into the OHDSI common data model. We further measured completeness, timeliness, and entropy of the syndromic surveillance data. In Figure 1, the OHDSI tool Atlas summarizes the analysis of data completeness for key fields in over one million syndromic surveillance messages sent to Indiana's health department in 2014. Completeness is reported by age category (e.g., 0-10, 20-30, 60+). Gender is generally complete, but both race and ethnicity fields are often complete for less than half of the patients in the cohort. These results suggest areas for improvement with respect to data quality that could be actionable by the syndromic surveillance coordinator at the state health department.

Conclusions: Our project remains a work in progress. While functions that assess completeness, timeliness, and entropy are complete, there may be other functions important to public health that need to be developed. We are currently soliciting feedback from syndromic surveillance stakeholders to gather ideas for what other functions would be useful to epidemiologists. Suggestions could be developed into functions over the next year. We are further working with the OHDSI collaborative to distribute the Atlas enhancements to other platforms, including the National Syndromic Surveillance Platform (NSSP). Our goal is to enable epidemiologists to quickly analyze data quality at scale.

References
1. Dixon BE, Rosenman M, Xia Y, Grannis SJ. A vision for the systematic monitoring and improvement of the quality of electronic health data. Studies in Health Technology and Informatics. 2013;192:884-8.
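The completeness and entropy measures described in the Methods can be illustrated with a short Python sketch; this is not the Atlas/OHDSI code, and the field names and sample messages are invented for the example.

```python
# Hedged sketch of per-field completeness and information entropy checks of
# the kind described above; not the Atlas/OHDSI implementation.
import math
from collections import Counter

def completeness(messages: list[dict], field: str) -> float:
    """Fraction of messages in which the field is present and non-empty."""
    filled = sum(1 for m in messages if m.get(field) not in (None, ""))
    return filled / len(messages) if messages else 0.0

def entropy(messages: list[dict], field: str) -> float:
    """Shannon entropy (bits) of the observed non-empty values of a field."""
    values = [m.get(field) for m in messages if m.get(field) not in (None, "")]
    if not values:
        return 0.0
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

if __name__ == "__main__":
    sample = [
        {"gender": "F", "race": "", "ethnicity": None},
        {"gender": "M", "race": "White", "ethnicity": "Not Hispanic"},
        {"gender": "F", "race": None, "ethnicity": "Hispanic"},
    ]
    for f in ("gender", "race", "ethnicity"):
        print(f, round(completeness(sample, f), 2), round(entropy(sample, f), 2))
```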


Author(s):  
Nishita Shewale

Abstract: This paper looks at unified information systems that give different establishments insight into how data-related activities take place and whether their results are of assured quality. Since problems such as data accumulation, replication, missing entities, incorrect formatting, and anomalies can come to light when data are collected in different information systems, and since these can cause an array of adverse effects on data quality, the subject of data quality needs careful treatment. This paper inspects data quality problems in information systems and introduces techniques that enable organizations to improve the quality of their data.
Keywords: Information Systems (IS), Data Quality, Data Cleaning, Data Profiling, Standardization, Database, Organization
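As a concrete companion to the cleaning and profiling techniques just mentioned, the following Python sketch flags missing values, duplicates, and date-format anomalies in a small record set; the record layout and the format rule are assumptions made for illustration only.

```python
# Minimal data profiling sketch illustrating checks commonly grouped under
# profiling, cleaning, and standardization. Layout and rules are assumptions.
import re

DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")  # expected ISO date format

def profile(records: list[dict]) -> dict:
    issues = {"missing": 0, "duplicates": 0, "bad_date_format": 0}
    seen = set()
    for r in records:
        if any(v in (None, "") for v in r.values()):
            issues["missing"] += 1
        key = (r.get("id"), r.get("name"))
        if key in seen:
            issues["duplicates"] += 1
        seen.add(key)
        if r.get("joined") and not DATE_PATTERN.match(r["joined"]):
            issues["bad_date_format"] += 1
    return issues

if __name__ == "__main__":
    data = [
        {"id": 1, "name": "Ada", "joined": "2021-03-01"},
        {"id": 1, "name": "Ada", "joined": "03/01/2021"},   # duplicate + bad format
        {"id": 2, "name": "", "joined": "2021-05-12"},      # missing value
    ]
    print(profile(data))
```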


2017
Vol 4 (1)
pp. 25-31
Author(s):  
Diana Effendi

The Information Product Approach (IP Approach) is an information management approach that can be used to manage product information and to analyze data quality. An IP-Map can be used by organizations to facilitate the management of knowledge in collecting, storing, maintaining, and using data in an organized way. The process of managing data on academic activities at X University has not yet used the IP approach. X University has not paid attention to managing the quality of its information; so far it has only been concerned with the system applications used to support the automation of data management in its academic activities. The IP-Map produced in this paper can be used as a basis for analyzing the quality of data and information. With the IP-Map, X University is expected to know which parts of the process need improvement in the quality of data and information management.
Index terms: IP Approach, IP-Map, information quality, data quality.
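To give a sense of how an IP-Map can be put to work programmatically, here is a toy Python sketch that represents construct blocks as a directed graph and traces the upstream steps feeding an information product; the block names describe a hypothetical academic workflow and do not come from the paper.

```python
# Illustrative sketch only: a toy IP-Map as a directed graph of construct
# blocks, used to trace which upstream steps feed a given information product.
IP_MAP = {
    "transcript_report":   ["grade_storage"],        # information product
    "grade_storage":       ["grade_entry_process"],  # storage block
    "grade_entry_process": ["lecturer_source"],      # processing block
    "lecturer_source":     [],                       # raw data source
}

def upstream_blocks(block: str, ip_map: dict) -> list[str]:
    """Return every block that directly or indirectly feeds `block`."""
    result, stack = [], list(ip_map.get(block, []))
    while stack:
        current = stack.pop()
        if current not in result:
            result.append(current)
            stack.extend(ip_map.get(current, []))
    return result

if __name__ == "__main__":
    # Any quality problem in these blocks propagates into the final report.
    print(upstream_blocks("transcript_report", IP_MAP))
```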


2021
pp. 004912412199553
Author(s):  
Jan-Lucas Schanze

An increasing age of respondents and cognitive impairment are usual suspects for increasing difficulties in survey interviews and decreasing data quality. This is why survey researchers tend to label residents of retirement and nursing homes as hard to interview and exclude them from most social surveys. In this article, I examine to what extent this label is justified and whether the quality of data collected among residents of institutions for the elderly really differs from data collected within private households. For this purpose, I analyze response behavior and quality indicators in three waves of the Survey of Health, Ageing and Retirement in Europe (SHARE). To control for confounding variables, I use propensity score matching to identify respondents in private households who share similar characteristics with institutionalized residents. My results confirm that most indicators of response behavior and data quality are worse in institutions than in private households. However, when controlling for sociodemographic and health-related variables, the differences become very small. These results suggest that health is important for data quality irrespective of the housing situation.
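The matching step can be sketched in Python as follows; this is a schematic illustration with synthetic covariates, not the article's actual SHARE analysis.

```python
# Schematic propensity score matching: estimate the probability of living in
# an institution from covariates, then pair each institutionalized respondent
# with the private-household respondent whose score is closest. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
age = rng.normal(75, 8, n)
health = rng.normal(0, 1, n)  # higher value = worse health
institution = (rng.random(n) < 1 / (1 + np.exp(-(0.1 * (age - 80) + 0.8 * health)))).astype(int)
X = np.column_stack([age, health])

# 1. Propensity scores: probability of institutionalization given covariates.
scores = LogisticRegression().fit(X, institution).predict_proba(X)[:, 1]

# 2. Nearest-neighbor matching on the propensity score.
treated = np.where(institution == 1)[0]
control = np.where(institution == 0)[0]
matches = {i: control[np.argmin(np.abs(scores[control] - scores[i]))] for i in treated}

# Matched controls now resemble institutionalized respondents on age/health,
# so remaining differences in data quality indicators can be compared fairly.
print(f"{len(matches)} institutionalized respondents matched to controls")
```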


2016
Vol 48 (1)
pp. 17-28
Author(s):  
Tadeusz Pastusiak

Abstract: Research on the ice cover of waterways, rivers, lakes, seas and oceans by satellite remote sensing methods began at the end of the twentieth century. There are many data sources in diverse file formats, but a comparative assessment of their usefulness had not yet been carried out. In the research process, a synthetic indicator of the quality of data sources was developed, combining map resolution, file publication, time delay and functionality for the user. It reflects the usefulness of maps well and allows them to be compared. Qualitative differences in map content have relatively little impact on the overall assessment of the data sources. Map resolution is generally acceptable. Actuality has the greatest impact on map content quality for planning a vessel's current voyage in ice. The highest quality among all studied sources is offered by the regional maps in GIF format issued by NWS/NOAA, the general maps of the Arctic Ocean in NetCDF format issued by OSI SAF, and the general maps of the Arctic Ocean in GRIB-2 format issued by NCEP/NOAA. Among them are maps containing information on the quality of the presented parameter. The leaders among maps containing all three basic characteristics of ice cover (ice concentration, ice thickness and ice floe size) are the vector maps in GML format, which are the new standard of electronic vector maps for the navigation of ships in ice. Publishing ice cover maps in the S-411 standard electronic map format for navigation of vessels in ice, adopted by the International Hydrographic Organization, is advisable if commercial navigation is to be launched on lagoons, rivers and canals. Wide availability and exchange of information on the state of ice cover on rivers, lakes, estuaries and bays that are used exclusively for water sports, ice sports and ice fishing is possible using handheld mobile phones, smartphones and tablets.
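A composite indicator of the kind described could look like the Python sketch below; the normalization of the four factors and their equal weighting are assumptions made for illustration and may differ from the article's actual indicator.

```python
# Hedged sketch: a composite quality score for ice-chart data sources built
# from the four factors named in the abstract. Scaling and weights are assumed.
def synthetic_quality(resolution_km: float, updates_per_week: float,
                      delay_hours: float, functionality: float) -> float:
    """Combine four partial scores (each scaled to [0, 1]) into one value."""
    res_score = min(1.0, 5.0 / resolution_km)          # finer grid -> better
    freq_score = min(1.0, updates_per_week / 7.0)      # daily issue -> 1.0
    delay_score = max(0.0, 1.0 - delay_hours / 48.0)   # older than 2 days -> 0
    func_score = max(0.0, min(1.0, functionality))     # user-functionality rating
    return (res_score + freq_score + delay_score + func_score) / 4.0

if __name__ == "__main__":
    # Hypothetical comparison of two chart products.
    print(round(synthetic_quality(10.0, 7, 6, 0.9), 2))   # frequent, low delay
    print(round(synthetic_quality(2.0, 1, 72, 0.6), 2))   # fine grid, but stale
```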


2008
Vol 13 (5)
pp. 378-389
Author(s):  
Xiaohua Douglas Zhang
Amy S. Espeseth
Eric N. Johnson
Jayne Chin
Adam Gates
...  

RNA interference (RNAi) not only plays an important role in drug discovery but can also be developed directly into drugs. RNAi high-throughput screening (HTS) biotechnology allows us to conduct genome-wide RNAi research. A central challenge in genome-wide RNAi research is to integrate both experimental and computational approaches to obtain high quality RNAi HTS assays. Based on our daily practice in RNAi HTS experiments, we propose the implementation of 3 experimental and analytic processes to improve the quality of data from RNAi HTS biotechnology: (1) select effective biological controls; (2) adopt appropriate plate designs to display and/or adjust for systematic errors of measurement; and (3) use effective analytic metrics to assess data quality. The applications in 5 real RNAi HTS experiments demonstrate the effectiveness of integrating these processes to improve data quality. Due to the effectiveness in improving data quality in RNAi HTS experiments, the methods and guidelines contained in the 3 experimental and analytic processes are likely to have broad utility in genome-wide RNAi research. (Journal of Biomolecular Screening 2008:378-389)
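Two widely used control-based quality measures can illustrate the third process (effective analytic metrics for assessing data quality); whether these are exactly the metrics applied in the article is an assumption, and the control values below are synthetic.

```python
# Hedged illustration of control-based HTS quality metrics: Z'-factor and
# strictly standardized mean difference (SSMD). Control readings are synthetic.
import statistics

def z_prime(pos: list[float], neg: list[float]) -> float:
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    return 1 - 3 * (statistics.stdev(pos) + statistics.stdev(neg)) / abs(
        statistics.mean(pos) - statistics.mean(neg))

def ssmd(pos: list[float], neg: list[float]) -> float:
    """SSMD: (mean_pos - mean_neg) / sqrt(var_pos + var_neg)."""
    return (statistics.mean(pos) - statistics.mean(neg)) / (
        statistics.variance(pos) + statistics.variance(neg)) ** 0.5

if __name__ == "__main__":
    positive_controls = [95.0, 98.2, 93.5, 97.1, 96.4]
    negative_controls = [10.2, 12.5, 9.8, 11.1, 10.7]
    print("Z'-factor:", round(z_prime(positive_controls, negative_controls), 2))
    print("SSMD:", round(ssmd(positive_controls, negative_controls), 2))
```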

